Development on SNR estimator for audio-visual speech recognition based on waveform amplitude distribution analysis

Audio-visual speech recognition (AVSR) combines the audio modality with the visual modality to improve speech recognition performance, particularly in noisy environments. The audio modality is easily corrupted by ambient noise, and this causes difficulty in...

Bibliographic Details
Main Author: Thum, Wei Seong
Format: Thesis
Language: English
Published: 2018
Subjects: TK Electrical engineering. Electronics Nuclear engineering
Online Access: http://umpir.ump.edu.my/id/eprint/27969/1/Development%20on%20SNR%20estimator%20for%20audio-visual%20speech%20recognition%20based%20on%20waveform%20amplitude.pdf
http://umpir.ump.edu.my/id/eprint/27969/
id my.ump.umpir.27969
record_format eprints
spelling my.ump.umpir.27969 2020-02-25T04:17:26Z http://umpir.ump.edu.my/id/eprint/27969/ Development on SNR estimator for audio-visual speech recognition based on waveform amplitude distribution analysis Thum, Wei Seong TK Electrical engineering. Electronics Nuclear engineering 2018-12 Thesis NonPeerReviewed pdf en http://umpir.ump.edu.my/id/eprint/27969/1/Development%20on%20SNR%20estimator%20for%20audio-visual%20speech%20recognition%20based%20on%20waveform%20amplitude.pdf Thum, Wei Seong (2018) Development on SNR estimator for audio-visual speech recognition based on waveform amplitude distribution analysis. Masters thesis, Universiti Malaysia Pahang.
institution Universiti Malaysia Pahang
building UMP Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaysia Pahang
content_source UMP Institutional Repository
url_provider http://umpir.ump.edu.my/
language English
topic TK Electrical engineering. Electronics Nuclear engineering
description Audio-visual speech recognition (AVSR) combines the audio modality with the visual modality to improve speech recognition performance, particularly in noisy environments. The audio modality is easily corrupted by ambient noise, which makes it difficult to distinguish the actual speech signal from the noise signal. The signal-to-noise ratio (SNR) is the ratio of signal power to noise power, expressed in decibels (dB). One of the best-known SNR estimation techniques is waveform amplitude distribution analysis (WADA), which assumes that the amplitudes of speech and noise follow gamma and Gaussian distributions, respectively. It has been used as a benchmark for result comparison in several research works. However, there are no clear instructions on how to build its look-up table. This work proposes rebuilding the look-up table using its own database corrupted with white noise as the noise reference. The resulting technique, waveform amplitude distribution analysis-white (WADA-W), improves SNR estimation by referring to the reconstructed WADA-W look-up table instead of the general precomputed WADA look-up table. The proposed WADA-W SNR estimation technique was evaluated by developing an AVSR system that used mel-frequency cepstral coefficient (MFCC) features and shape-based visual features from two speech databases: LUNA-V and CUAVE. The experimental results showed that, by referring to the WADA-W look-up table, SNR estimation was consistent, more accurate, and less biased than with the original WADA technique under four types of noise from the NOISEX-92 dataset: white, babble, factory1, and factory2.
The overall deviation of the SNR estimates for the LUNA-V database using the proposed WADA-W technique was approximately 9.6 dB, whereas the deviations of the NIST and WADA techniques were approximately 42.3 dB and 67.3 dB, respectively. For the CUAVE database, the overall deviation with the proposed technique was 13.3 dB, whereas the deviations of the NIST and WADA techniques were 50.6 dB and 62.3 dB, respectively. Classification was performed using a multi-stream hidden Markov model (MSHMM) with leave-one-out cross-validation (LOOCV). The experiments showed that the proposed AVSR system achieved its highest accuracy of 96.6% on the LUNA-V database and 95.2% on the CUAVE database under clean conditions. In conclusion, the proposed WADA-W SNR estimator improved on the original WADA technique by 4.5% and 12.7% on the LUNA-V and CUAVE databases, respectively.
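The look-up-table idea summarised in the description can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It assumes NumPy, a synthetic stand-in for clean speech (gamma-distributed amplitudes with an assumed shape parameter of 0.4 and random signs), an assumed SNR grid of -5 to 25 dB, and the amplitude statistic G = ln(mean|x|) - mean(ln|x|) as the table key. All function names are hypothetical.

```python
import numpy as np


def snr_db(signal, noise):
    """SNR in dB: 10 * log10(signal power / noise power)."""
    return 10.0 * np.log10(np.mean(signal ** 2) / np.mean(noise ** 2))


def add_white_noise(signal, target_snr_db, rng):
    """Return (noisy, noise), with white Gaussian noise scaled to the target SNR."""
    noise = rng.standard_normal(signal.shape)
    wanted_noise_power = np.mean(signal ** 2) / 10.0 ** (target_snr_db / 10.0)
    noise *= np.sqrt(wanted_noise_power / np.mean(noise ** 2))
    return signal + noise, noise


def amplitude_statistic(x, eps=1e-10):
    """G = ln(mean|x|) - mean(ln|x|); eps guards against log(0)."""
    a = np.abs(x) + eps
    return np.log(np.mean(a)) - np.mean(np.log(a))


def build_lookup_table(clean, snr_grid_db, rng):
    """Measure G for the clean signal mixed with white noise at each known SNR."""
    table = np.array([
        amplitude_statistic(add_white_noise(clean, snr, rng)[0])
        for snr in snr_grid_db
    ])
    # Sampling noise can break monotonicity where the curve is flat (low SNR);
    # force a non-decreasing table so that interpolation behaves sensibly.
    return np.maximum.accumulate(table)


def estimate_snr_db(x, snr_grid_db, table):
    """Map the measured statistic back to an SNR by linear interpolation."""
    return float(np.interp(amplitude_statistic(x), table, snr_grid_db))


def synthetic_speech(n, rng, shape=0.4):
    """Hypothetical stand-in for clean speech: gamma amplitudes, random signs."""
    return rng.gamma(shape, 1.0, n) * rng.choice([-1.0, 1.0], n)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grid = np.arange(-5.0, 26.0)  # SNR grid in dB (assumed range)
    table = build_lookup_table(synthetic_speech(200_000, rng), grid, rng)
    noisy, _ = add_white_noise(synthetic_speech(200_000, rng), 10.0, rng)
    print(f"estimated SNR: {estimate_snr_db(noisy, grid, table):.1f} dB")
```

Because the table here is built from white-noise mixtures only, estimates for other noise types (for example babble or factory noise) may be biased; real recordings would replace the synthetic signal in practice.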
format Thesis
author Thum, Wei Seong
title Development on SNR estimator for audio-visual speech recognition based on waveform amplitude distribution analysis
publishDate 2018
url http://umpir.ump.edu.my/id/eprint/27969/1/Development%20on%20SNR%20estimator%20for%20audio-visual%20speech%20recognition%20based%20on%20waveform%20amplitude.pdf
http://umpir.ump.edu.my/id/eprint/27969/