A Pipeline To Data Preprocessing For Lipreading And Audio-Visual Speech Recognition
Studies show that only about 30 to 45 percent of the English language can be understood through lipreading alone. Even the most talented lip readers cannot capture a complete message from lipreading by itself, although they are often very good at interpreting facial expressions, body language, and context...
| Main Authors: | Ngo, Hea Choon; Hashim, Ummi Rabaah; Raja Ikram, Raja Rina; Salahuddin, Lizawati; Teoh, Mok Lee |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | World Academy of Research in Science and Engineering, 2020 |
| Online Access: | http://eprints.utem.edu.my/id/eprint/25010/2/2020%2C%20NGO%2C%20AUDIO-VISUAL%20SPEECH%20-%20IJATCSE_01.PDF http://eprints.utem.edu.my/id/eprint/25010/ http://www.warse.org/IJATCSE/static/pdf/file/ijatcse58942020.pdf |
| id | my.utem.eprints.25010 |
|---|---|
| record_format | eprints |
| spelling | my.utem.eprints.25010 2021-04-20T12:28:13Z http://eprints.utem.edu.my/id/eprint/25010/ A Pipeline To Data Preprocessing For Lipreading And Audio-Visual Speech Recognition. Ngo, Hea Choon; Hashim, Ummi Rabaah; Raja Ikram, Raja Rina; Salahuddin, Lizawati; Teoh, Mok Lee. Studies show that only about 30 to 45 percent of the English language can be understood through lipreading alone. Even the most talented lip readers cannot capture a complete message from lipreading by itself, although they are often very good at interpreting facial expressions, body language, and context to fill in the gaps. As one can imagine, this technique taxes the brain in different ways and becomes exhausting over time. When a deaf person who uses spoken language and can read lips holds a simple one-on-one conversation, hearing people may not appreciate the challenges involved. The hearing person may be annoyed at being asked to repeat themselves or to speak more slowly and clearly, and may lose patience and break off the conversation. In our modern world, where technology connects us in ways never thought possible, there are many ways to communicate with another person, and deaf people come from all walks of life and backgrounds. In this study, a lipreading model is developed that records, analyzes, and translates the movement of the lips and displays the result as subtitles. A model is trained on the GRID corpus, the MIRACL-VC1 dataset, and a pre-trained dataset, using the LipNet architecture, to build a system with which deaf people can decode text from the movement of a speaker's mouth. In conclusion, this system helps deaf people understand what others are actually saying and communicate with them more effectively. World Academy of Research in Science and Engineering, 2020-08. Article, PeerReviewed, text, en. http://eprints.utem.edu.my/id/eprint/25010/2/2020%2C%20NGO%2C%20AUDIO-VISUAL%20SPEECH%20-%20IJATCSE_01.PDF Ngo, Hea Choon and Hashim, Ummi Rabaah and Raja Ikram, Raja Rina and Salahuddin, Lizawati and Teoh, Mok Lee (2020) A Pipeline To Data Preprocessing For Lipreading And Audio-Visual Speech Recognition. International Journal of Advanced Trends in Computer Science and Engineering, 9 (4). pp. 4589-4596. ISSN 2278-3091. http://www.warse.org/IJATCSE/static/pdf/file/ijatcse58942020.pdf DOI: 10.30534/ijatcse/2020/58942020 |
| institution | Universiti Teknikal Malaysia Melaka |
| building | UTEM Library |
| collection | Institutional Repository |
| continent | Asia |
| country | Malaysia |
| content_provider | Universiti Teknikal Malaysia Melaka |
| content_source | UTEM Institutional Repository |
| url_provider | http://eprints.utem.edu.my/ |
| language | English |
| description | Studies show that only about 30 to 45 percent of the English language can be understood through lipreading alone. Even the most talented lip readers cannot capture a complete message from lipreading by itself, although they are often very good at interpreting facial expressions, body language, and context to fill in the gaps. As one can imagine, this technique taxes the brain in different ways and becomes exhausting over time. When a deaf person who uses spoken language and can read lips holds a simple one-on-one conversation, hearing people may not appreciate the challenges involved. The hearing person may be annoyed at being asked to repeat themselves or to speak more slowly and clearly, and may lose patience and break off the conversation. In our modern world, where technology connects us in ways never thought possible, there are many ways to communicate with another person, and deaf people come from all walks of life and backgrounds. In this study, a lipreading model is developed that records, analyzes, and translates the movement of the lips and displays the result as subtitles. A model is trained on the GRID corpus, the MIRACL-VC1 dataset, and a pre-trained dataset, using the LipNet architecture, to build a system with which deaf people can decode text from the movement of a speaker's mouth. In conclusion, this system helps deaf people understand what others are actually saying and communicate with them more effectively. |
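The preprocessing pipeline the abstract describes isolates the speaker's mouth region from each video frame before the frames are fed to a lipreading model such as LipNet. As a minimal, illustrative sketch (the function names, padding value, and frame representation are assumptions for illustration, not taken from the paper), cropping a fixed mouth region of interest (ROI) around detected lip landmarks might look like this:

```python
# Illustrative sketch of a mouth-ROI cropping step for lipreading
# preprocessing. Landmark points and padding are hypothetical; in a real
# pipeline they would come from a face-landmark detector.

def mouth_bounding_box(landmarks, pad=10):
    """Given (x, y) lip-landmark points, return a padded bounding box
    as (x_min, y_min, x_max, y_max)."""
    xs = [p[0] for p in landmarks]
    ys = [p[1] for p in landmarks]
    return (min(xs) - pad, min(ys) - pad, max(xs) + pad, max(ys) + pad)

def crop_roi(frame, box):
    """Crop a 2-D frame (a list of pixel rows) to the bounding box,
    clamped so the box never runs outside the frame."""
    x0, y0, x1, y1 = box
    h, w = len(frame), len(frame[0])
    x0, y0 = max(0, x0), max(0, y0)
    x1, y1 = min(w, x1), min(h, y1)
    return [row[x0:x1] for row in frame[y0:y1]]

# Example: crop a 20x20 frame around two lip landmarks.
frame = [[col for col in range(20)] for _ in range(20)]
box = mouth_bounding_box([(5, 8), (12, 10)], pad=2)   # (3, 6, 14, 12)
roi = crop_roi(frame, box)                            # 6 rows x 11 columns
```

In practice, the landmarks would come from a face-landmark detector (for example, dlib's 68-point model), and the cropped region would then be resized to the fixed input resolution the lipreading network expects.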
| format | Article |
| author | Ngo, Hea Choon; Hashim, Ummi Rabaah; Raja Ikram, Raja Rina; Salahuddin, Lizawati; Teoh, Mok Lee |
| spellingShingle | Ngo, Hea Choon; Hashim, Ummi Rabaah; Raja Ikram, Raja Rina; Salahuddin, Lizawati; Teoh, Mok Lee. A Pipeline To Data Preprocessing For Lipreading And Audio-Visual Speech Recognition |
| author_facet | Ngo, Hea Choon; Hashim, Ummi Rabaah; Raja Ikram, Raja Rina; Salahuddin, Lizawati; Teoh, Mok Lee |
| author_sort | Ngo, Hea Choon |
| title | A Pipeline To Data Preprocessing For Lipreading And Audio-Visual Speech Recognition |
| title_short | A Pipeline To Data Preprocessing For Lipreading And Audio-Visual Speech Recognition |
| title_full | A Pipeline To Data Preprocessing For Lipreading And Audio-Visual Speech Recognition |
| title_fullStr | A Pipeline To Data Preprocessing For Lipreading And Audio-Visual Speech Recognition |
| title_full_unstemmed | A Pipeline To Data Preprocessing For Lipreading And Audio-Visual Speech Recognition |
| title_sort | pipeline to data preprocessing for lipreading and audio-visual speech recognition |
| publisher | World Academy of Research in Science and Engineering |
| publishDate | 2020 |
| url | http://eprints.utem.edu.my/id/eprint/25010/2/2020%2C%20NGO%2C%20AUDIO-VISUAL%20SPEECH%20-%20IJATCSE_01.PDF http://eprints.utem.edu.my/id/eprint/25010/ http://www.warse.org/IJATCSE/static/pdf/file/ijatcse58942020.pdf |