Staff View: Dynamic Time Warping fixed - frame coefficient with pitch feature for speech recognition system with neural network

Bibliographic Details
Main Author:	Sudirman, Rubita
Format:	Thesis
Language:	English
Published:	2007
Subjects:	TK Electrical engineering. Electronics Nuclear engineering
Online Access:	http://eprints.utm.my/id/eprint/18678/1/RubitaSudirmanPFKE2007.pdf http://eprints.utm.my/id/eprint/18678/ http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:73534

id	my.utm.18678
record_format	eprints
spelling	my.utm.18678 2018-08-26T04:53:08Z http://eprints.utm.my/id/eprint/18678/ Dynamic Time Warping fixed - frame coefficient with pitch feature for speech recognition system with neural network Sudirman, Rubita TK Electrical engineering. Electronics Nuclear engineering Automatic Speech Recognition products have been available on the market for many years. Intensive research and development continues to improve speech technology. Typical methods applied to speech technology include the Hidden Markov Model (HMM), Dynamic Time Warping (DTW), and the Neural Network (NN). However, previous research relied heavily on the HMM without paying much attention to the NN. In this research, an NN with the back-propagation algorithm is used to perform the recognition, with inputs derived from Linear Predictive Coefficients (LPC) and a pitch feature. Back-propagation NNs are known to be capable of handling large learning problems and are a very promising method because of their ability to learn from training data and classify it. The NN has not been fully employed as a successful speech recognition engine because it requires a normalized input length. Nonlinear time normalization based on DTW is identified as a suitable tool to overcome the time variation problem by expanding or compressing the speech to a desired number of data points. The proposed DTW frame fixing (DTW-FF) algorithm is an extended DTW algorithm that reduces the number of inputs into the NN. This method reduced the amount of computation and the network complexity by reducing the number of inputs by 90%, so faster recognition is achieved. Recognition using DTW showed the same results whether LPC or DTW-FF features were used, which indicates that no information was lost during data manipulation. The pitch estimate is another feature introduced to the NN that helped to increase recognition accuracy. An average improvement of 10.32% was recorded when pitch was added to the DTW-FF feature as input to the back-propagation NN using Malay digit samples. The back-propagation algorithm was then designed with both the Quasi-Newton and Conjugate Gradient methods in order to compare which method is better able to reach the optimal global minimum. Results showed that Conjugate Gradient performed better. 2007-08 Thesis NonPeerReviewed application/pdf en http://eprints.utm.my/id/eprint/18678/1/RubitaSudirmanPFKE2007.pdf Sudirman, Rubita (2007) Dynamic Time Warping fixed - frame coefficient with pitch feature for speech recognition system with neural network. PhD thesis, Universiti Teknologi Malaysia, Fakulti Kejuruteraan Elektrik. http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:73534
institution	Universiti Teknologi Malaysia
building	UTM Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Teknologi Malaysia
content_source	UTM Institutional Repository
url_provider	http://eprints.utm.my/
language	English
topic	TK Electrical engineering. Electronics Nuclear engineering
spellingShingle	TK Electrical engineering. Electronics Nuclear engineering Sudirman, Rubita Dynamic Time Warping fixed - frame coefficient with pitch feature for speech recognition system with neural network
description	Automatic Speech Recognition products have been available on the market for many years. Intensive research and development continues to improve speech technology. Typical methods applied to speech technology include the Hidden Markov Model (HMM), Dynamic Time Warping (DTW), and the Neural Network (NN). However, previous research relied heavily on the HMM without paying much attention to the NN. In this research, an NN with the back-propagation algorithm is used to perform the recognition, with inputs derived from Linear Predictive Coefficients (LPC) and a pitch feature. Back-propagation NNs are known to be capable of handling large learning problems and are a very promising method because of their ability to learn from training data and classify it. The NN has not been fully employed as a successful speech recognition engine because it requires a normalized input length. Nonlinear time normalization based on DTW is identified as a suitable tool to overcome the time variation problem by expanding or compressing the speech to a desired number of data points. The proposed DTW frame fixing (DTW-FF) algorithm is an extended DTW algorithm that reduces the number of inputs into the NN. This method reduced the amount of computation and the network complexity by reducing the number of inputs by 90%, so faster recognition is achieved. Recognition using DTW showed the same results whether LPC or DTW-FF features were used, which indicates that no information was lost during data manipulation. The pitch estimate is another feature introduced to the NN that helped to increase recognition accuracy. An average improvement of 10.32% was recorded when pitch was added to the DTW-FF feature as input to the back-propagation NN using Malay digit samples. The back-propagation algorithm was then designed with both the Quasi-Newton and Conjugate Gradient methods in order to compare which method is better able to reach the optimal global minimum. Results showed that Conjugate Gradient performed better.
format	Thesis
author	Sudirman, Rubita
author_facet	Sudirman, Rubita
author_sort	Sudirman, Rubita
title	Dynamic Time Warping fixed - frame coefficient with pitch feature for speech recognition system with neural network
title_short	Dynamic Time Warping fixed - frame coefficient with pitch feature for speech recognition system with neural network
title_full	Dynamic Time Warping fixed - frame coefficient with pitch feature for speech recognition system with neural network
title_fullStr	Dynamic Time Warping fixed - frame coefficient with pitch feature for speech recognition system with neural network
title_full_unstemmed	Dynamic Time Warping fixed - frame coefficient with pitch feature for speech recognition system with neural network
title_sort	dynamic time warping fixed - frame coefficient with pitch feature for speech recognition system with neural network
publishDate	2007
url	http://eprints.utm.my/id/eprint/18678/1/RubitaSudirmanPFKE2007.pdf http://eprints.utm.my/id/eprint/18678/ http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:73534
_version_	1643646968795234304
score	13.188404
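
The description above mentions DTW-based time normalization, which expands or compresses an utterance to a fixed length before it is fed to a neural network. The following is a minimal sketch of that general idea using textbook DTW on one-dimensional sequences. It is not the thesis's DTW-FF algorithm, whose details are not given in this record. The function names `dtw_path` and `warp_to_reference`, the absolute-difference distance, and the averaging rule for compressed frames are all assumptions made for illustration.

```python
from typing import List, Tuple


def dtw_path(ref: List[float], test: List[float]) -> Tuple[float, List[Tuple[int, int]]]:
    """Textbook DTW between two 1-D sequences (illustrative, not DTW-FF).

    Returns the accumulated distance and the warping path as (ref_idx, test_idx) pairs.
    """
    n, m = len(ref), len(test)
    inf = float("inf")
    # cost[i][j] = minimal accumulated distance aligning ref[:i] with test[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(ref[i - 1] - test[j - 1])  # assumed local distance
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from the end of both sequences to recover the alignment.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),  # diagonal move
            (cost[i - 1][j], i - 1, j),          # advance in ref only
            (cost[i][j - 1], i, j - 1),          # advance in test only
        ]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path


def warp_to_reference(ref: List[float], test: List[float]) -> List[float]:
    """Expand or compress `test` so that it has exactly len(ref) values."""
    _, path = dtw_path(ref, test)
    buckets: List[List[float]] = [[] for _ in ref]
    for i, j in path:
        buckets[i].append(test[j])
    # Several test values aligned to one reference index are averaged (compression);
    # one test value aligned to several reference indices is repeated (expansion).
    return [sum(b) / len(b) for b in buckets]


if __name__ == "__main__":
    reference = [0.0, 1.0, 2.0, 3.0]
    utterance = [0.0, 0.0, 1.0, 2.0, 2.0, 3.0]
    print(warp_to_reference(reference, utterance))
```

In this sketch the warped output always has the same length as the reference, which is the property a fixed-size neural network input layer needs. A real system would align multi-dimensional feature frames, such as LPC vectors, rather than scalars.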

Similar Items