Isolated English alphabet speech recognition using wavelet cepstral coefficients and neural network
Speech recognition has many applications in various fields. One of the most important phases in speech recognition is feature extraction, in which the relevant information is extracted from the speech signal. However, two important issues that affect feature extraction are noise r...
Main Author: | Adam, Tarmizi |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2014 |
Subjects: | QA75 Electronic computers. Computer science |
Online Access: | http://eprints.utm.my/id/eprint/78047/1/TarmiziAdamMFC20141.pdf http://eprints.utm.my/id/eprint/78047/ http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:83709 |
id |
my.utm.78047 |
---|---|
record_format |
eprints |
spelling |
my.utm.78047 2018-07-23T05:33:14Z http://eprints.utm.my/id/eprint/78047/ Isolated English alphabet speech recognition using wavelet cepstral coefficients and neural network Adam, Tarmizi QA75 Electronic computers. Computer science 2014-03 Thesis NonPeerReviewed application/pdf en http://eprints.utm.my/id/eprint/78047/1/TarmiziAdamMFC20141.pdf Adam, Tarmizi (2014) Isolated English alphabet speech recognition using wavelet cepstral coefficients and neural network. Masters thesis, Universiti Teknologi Malaysia, Faculty of Computing. http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:83709 |
institution |
Universiti Teknologi Malaysia |
building |
UTM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Teknologi Malaysia |
content_source |
UTM Institutional Repository |
url_provider |
http://eprints.utm.my/ |
language |
English |
topic |
QA75 Electronic computers. Computer science |
spellingShingle |
QA75 Electronic computers. Computer science Adam, Tarmizi Isolated English alphabet speech recognition using wavelet cepstral coefficients and neural network |
description |
Speech recognition has many applications in various fields. One of the most important phases in speech recognition is feature extraction, in which the relevant information is extracted from the speech signal. However, two important issues that affect feature extraction are noise robustness and high feature dimension. Existing feature extraction methods that use fixed-window processing and spectral analysis, such as the Mel-Frequency Cepstral Coefficient (MFCC), cannot address the robustness and high feature dimension problems. This research proposes using the Discrete Wavelet Transform (DWT) in place of the Discrete Fourier Transform (DFT) when calculating the cepstrum coefficients, producing a newly proposed feature, the Wavelet Cepstral Coefficient (WCC). The DWT is used to gain the advantages of wavelets in analyzing non-stationary signals. The WCC is computed frame by frame: each speech frame is decomposed using the DWT and the log energy of its coefficients is taken. The final stage of the WCC computation takes the Discrete Cosine Transform (DCT) of these log energies to form the WCC. The WCC are then fed into a Neural Network (NN) for classification. To test the proposed WCC, a series of experiments was conducted on the TI-ALPHA dataset to compare its performance with that of the MFCC. The experiments were conducted under several noise levels, using Additive White Gaussian Noise (AWGN), and with varying numbers of coefficients, for speaker-dependent and speaker-independent tasks. The results show that the WCC withstands noisy conditions better than the MFCC, especially with a small number of features, for both speaker-dependent and speaker-independent tasks. The best result under a noisy condition of 25 dB shows that 30 WCC coefficients using Daubechies 12 achieved a 71.79% recognition rate, compared with only 37.62% using MFCC under the same constraint. The main contribution of this research is the development of the WCC features, which perform better than the MFCC under noisy signals and with a reduced number of feature coefficients. |
format |
Thesis |
author |
Adam, Tarmizi |
title |
Isolated English alphabet speech recognition using wavelet cepstral coefficients and neural network |
title_sort |
isolated english alphabet speech recognition using wavelet cepstral coefficients and neural network |
publishDate |
2014 |
url |
http://eprints.utm.my/id/eprint/78047/1/TarmiziAdamMFC20141.pdf http://eprints.utm.my/id/eprint/78047/ http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:83709 |
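
The abstract in this record outlines the WCC pipeline as: split the speech into frames, decompose each frame with the DWT, take the log energy of the coefficients, then apply the DCT. The following is a minimal, standard-library-only sketch of that outline, not the thesis implementation. Several choices are assumptions made for illustration: a Haar wavelet stands in for the Daubechies 12 wavelet named in the abstract, one log energy is computed per subband, and the frame length, hop size, number of decomposition levels, and all function names (`haar_dwt`, `subband_log_energies`, `dct2`, `wcc`, `add_awgn`) are hypothetical. The AWGN helper treats the quoted dB figure as a signal-to-noise ratio, which is also an assumption.

```python
import math
import random


def haar_dwt(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail


def subband_log_energies(frame, levels):
    """Decompose a frame `levels` times and return the log energy per subband.

    Assumption: one energy per detail subband plus one for the final
    approximation; the record does not say how coefficients are grouped.
    """
    energies = []
    approx = list(frame)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        energies.append(sum(c * c for c in detail))
    energies.append(sum(c * c for c in approx))
    eps = 1e-12  # avoids log(0) on silent frames
    return [math.log(e + eps) for e in energies]


def dct2(values):
    """Unnormalised DCT-II of a list of numbers."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
            for i, v in enumerate(values))
        for k in range(n)
    ]


def wcc(signal, frame_len=256, hop=128, levels=4):
    """Frame-by-frame features: DWT -> log subband energies -> DCT."""
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        features.append(dct2(subband_log_energies(frame, levels)))
    return features


def add_awgn(signal, snr_db, rng):
    """Add white Gaussian noise at the requested signal-to-noise ratio (dB)."""
    power = sum(x * x for x in signal) / len(signal)
    noise_std = math.sqrt(power / (10.0 ** (snr_db / 10.0)))
    return [x + rng.gauss(0.0, noise_std) for x in signal]


# Synthetic stand-in for a speech recording: a 440 Hz tone sampled at 8 kHz.
rng = random.Random(0)
clean = [math.sin(2.0 * math.pi * 440.0 * t / 8000.0) for t in range(1024)]
noisy = add_awgn(clean, 25, rng)
features = wcc(noisy)
print(len(features), "frames,", len(features[0]), "coefficients per frame")
```

With these settings each frame yields `levels + 1` coefficients, so reaching a feature count such as the 30 coefficients quoted in the abstract would need a different grouping of the DWT coefficients than the one assumed here. The resulting per-frame vectors are what would be passed on to a classifier such as the neural network the abstract mentions.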