A Pipeline To Data Preprocessing For Lipreading And Audio-Visual Speech Recognition

Studies show that only about 30 to 45 percent of spoken English can be understood through lipreading alone. Even the most skilled lip readers cannot recover a complete message from lip movements by themselves, although they are often adept at interpreting facial expressions, body language, and context...

Bibliographic Details
Main Authors: Ngo, Hea Choon, Hashim, Ummi Rabaah, Raja Ikram, Raja Rina, Salahuddin, Lizawati, Teoh, Mok Lee
Format: Article
Language:English
Published: World Academy of Research in Science and Engineering 2020
Online Access:http://eprints.utem.edu.my/id/eprint/25010/2/2020%2C%20NGO%2C%20AUDIO-VISUAL%20SPEECH%20-%20IJATCSE_01.PDF
http://eprints.utem.edu.my/id/eprint/25010/
http://www.warse.org/IJATCSE/static/pdf/file/ijatcse58942020.pdf
id my.utem.eprints.25010
record_format eprints
spelling my.utem.eprints.25010 2021-04-20T12:28:13Z http://eprints.utem.edu.my/id/eprint/25010/ A Pipeline To Data Preprocessing For Lipreading And Audio-Visual Speech Recognition Ngo, Hea Choon Hashim, Ummi Rabaah Raja Ikram, Raja Rina Salahuddin, Lizawati Teoh, Mok Lee Studies show that only about 30 to 45 percent of spoken English can be understood through lipreading alone. Even the most skilled lip readers cannot recover a complete message from lip movements by themselves, although they are often adept at interpreting facial expressions, body language, and context to fill in the gaps. This effort is cognitively demanding and becomes exhausting over time. When a deaf person who reads lips converses with hearing people, the hearing participants may not appreciate how challenging even a simple one-on-one conversation can be; they may grow frustrated at being asked to repeat themselves or to speak more slowly and clearly, lose patience, and break off the conversation. In a modern world where technology connects us in ways once thought impossible, there are many ways to communicate, and deaf people come from all walks of life and backgrounds. In this study, a lipreading model is developed that records, analyzes, and translates the movement of the lips and displays the result as subtitles. The model is trained on the GRID Corpus and MIRACL-VC1 datasets together with a pre-trained LipNet model to build a system that decodes text from the movement of a speaker's mouth, helping deaf people understand what others are saying and communicate with them more effectively.
World Academy of Research in Science and Engineering 2020-08 Article PeerReviewed text en http://eprints.utem.edu.my/id/eprint/25010/2/2020%2C%20NGO%2C%20AUDIO-VISUAL%20SPEECH%20-%20IJATCSE_01.PDF Ngo, Hea Choon and Hashim, Ummi Rabaah and Raja Ikram, Raja Rina and Salahuddin, Lizawati and Teoh, Mok Lee (2020) A Pipeline To Data Preprocessing For Lipreading And Audio-Visual Speech Recognition. International Journal of Advanced Trends in Computer Science and Engineering, 9 (4). pp. 4589-4596. ISSN 2278-3091 http://www.warse.org/IJATCSE/static/pdf/file/ijatcse58942020.pdf 10.30534/ijatcse/2020/58942020
institution Universiti Teknikal Malaysia Melaka
building UTEM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Teknikal Malaysia Melaka
content_source UTEM Institutional Repository
url_provider http://eprints.utem.edu.my/
language English
description Studies show that only about 30 to 45 percent of spoken English can be understood through lipreading alone. Even the most skilled lip readers cannot recover a complete message from lip movements by themselves, although they are often adept at interpreting facial expressions, body language, and context to fill in the gaps. This effort is cognitively demanding and becomes exhausting over time. When a deaf person who reads lips converses with hearing people, the hearing participants may not appreciate how challenging even a simple one-on-one conversation can be; they may grow frustrated at being asked to repeat themselves or to speak more slowly and clearly, lose patience, and break off the conversation. In a modern world where technology connects us in ways once thought impossible, there are many ways to communicate, and deaf people come from all walks of life and backgrounds. In this study, a lipreading model is developed that records, analyzes, and translates the movement of the lips and displays the result as subtitles. The model is trained on the GRID Corpus and MIRACL-VC1 datasets together with a pre-trained LipNet model to build a system that decodes text from the movement of a speaker's mouth, helping deaf people understand what others are saying and communicate with them more effectively.
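The preprocessing stage the title refers to — isolating and normalizing the mouth region from video frames before feeding a LipNet-style model — can be sketched roughly as below. This is a minimal illustration, not the paper's actual implementation: the fixed crop box stands in for coordinates a face/landmark detector would supply, and mean-pooling stands in for a proper image resize.

```python
import numpy as np

def preprocess_frames(frames, mouth_box=(100, 160, 140, 220), size=(30, 40)):
    """Crop a fixed mouth region from grayscale video frames,
    downsample it by mean-pooling, and standardize intensities
    across the clip, yielding a (T, H, W) tensor for the model.

    frames    : (T, H, W) uint8 grayscale frames.
    mouth_box : (top, bottom, left, right) crop; a placeholder for
                the box a face/landmark detector would provide.
    size      : target (height, width); in this simple sketch it
                must divide the crop dimensions evenly.
    """
    top, bottom, left, right = mouth_box
    crops = frames[:, top:bottom, left:right].astype(np.float32)
    t, h, w = crops.shape
    th, tw = size
    if h % th or w % tw:
        raise ValueError("crop size must be a multiple of target size")
    # block mean-pooling down to (th, tw) -- a stand-in for cv2.resize
    pooled = crops.reshape(t, th, h // th, tw, w // tw).mean(axis=(2, 4))
    # standardize the whole clip to zero mean and unit variance
    return (pooled - pooled.mean()) / (pooled.std() + 1e-8)

# 75 frames corresponds to a 3-second GRID utterance at 25 fps,
# the clip length LipNet expects (synthetic frames shown here).
frames = np.random.default_rng(0).integers(0, 256, (75, 288, 360), dtype=np.uint8)
clip = preprocess_frames(frames)  # shape (75, 30, 40)
```

A real pipeline would locate the mouth per frame with a face/landmark detector rather than a fixed box, but the crop-resize-normalize sequence shown is the common shape of lipreading preprocessing.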
format Article
author Ngo, Hea Choon
Hashim, Ummi Rabaah
Raja Ikram, Raja Rina
Salahuddin, Lizawati
Teoh, Mok Lee
spellingShingle Ngo, Hea Choon
Hashim, Ummi Rabaah
Raja Ikram, Raja Rina
Salahuddin, Lizawati
Teoh, Mok Lee
A Pipeline To Data Preprocessing For Lipreading And Audio-Visual Speech Recognition
author_facet Ngo, Hea Choon
Hashim, Ummi Rabaah
Raja Ikram, Raja Rina
Salahuddin, Lizawati
Teoh, Mok Lee
author_sort Ngo, Hea Choon
title A Pipeline To Data Preprocessing For Lipreading And Audio-Visual Speech Recognition
title_short A Pipeline To Data Preprocessing For Lipreading And Audio-Visual Speech Recognition
title_full A Pipeline To Data Preprocessing For Lipreading And Audio-Visual Speech Recognition
title_fullStr A Pipeline To Data Preprocessing For Lipreading And Audio-Visual Speech Recognition
title_full_unstemmed A Pipeline To Data Preprocessing For Lipreading And Audio-Visual Speech Recognition
title_sort pipeline to data preprocessing for lipreading and audio-visual speech recognition
publisher World Academy of Research in Science and Engineering
publishDate 2020
url http://eprints.utem.edu.my/id/eprint/25010/2/2020%2C%20NGO%2C%20AUDIO-VISUAL%20SPEECH%20-%20IJATCSE_01.PDF
http://eprints.utem.edu.my/id/eprint/25010/
http://www.warse.org/IJATCSE/static/pdf/file/ijatcse58942020.pdf