The Effectiveness of DTW-FF Coefficients and Pitch Feature in NN Speech Recognition
Main Authors: | , , |
---|---|
Format: | Conference or Workshop Item |
Language: | English |
Published: | 2006 |
Subjects: | |
Online Access: | http://eprints.utm.my/id/eprint/1971/1/rubita06_Effectiveness_of_DTWFF_coefficients.pdf http://eprints.utm.my/id/eprint/1971/ |
Summary: | This paper presents a method for extracting speech features from the dynamic time warping (DTW) path, which was originally derived from linear predictive coding (LPC). For recognition, the extracted features serve as the inputs to a back-propagation neural network. The novelty lies in how the features are extracted and how the coefficients are normalized against the template pattern, according to the average number of frames selected over the collected samples. The method is motivated by a limitation of neural networks (NN): a fixed number of input nodes is required for every input class, especially in applications with multiple inputs. The main objective of this research is therefore to find an alternative method that reduces the amount of computation and the complexity of a neural network, in this case for speech recognition. One way to achieve this is to reduce the number of inputs to the network. This is done through the dynamic warping process, in which the local distance scores along the warping path are used instead of the global distance score. According to the literature review, past and most current research uses the global distance score or the LPC coefficients as input to the neural network. Feeding LPC coefficients directly presents the network with a large number of coefficients for each speech frame. |
---|
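The idea described in the summary, using local distance scores along the warping path as a fixed-size input vector, can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function names (`frame_distance`, `dtw_path`, `fixed_length_local_scores`), the Euclidean frame distance, the symmetric step pattern, and the rule of averaging the local scores that map to each template frame are all assumptions made here for illustration. The paper's own normalization procedure is not specified in this record.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature frames (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_path(template, utterance):
    """Return (path, local, total) for a basic DTW alignment.

    path  : list of (template_index, utterance_index) pairs
    local : matrix of local frame-to-frame distances
    total : global (accumulated) distance of the best path
    """
    n, m = len(template), len(utterance)
    inf = float("inf")
    local = [[frame_distance(t, u) for u in utterance] for t in template]
    acc = [[inf] * m for _ in range(n)]
    acc[0][0] = local[0][0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best_prev = min(
                acc[i - 1][j] if i > 0 else inf,
                acc[i][j - 1] if j > 0 else inf,
                acc[i - 1][j - 1] if i > 0 and j > 0 else inf,
            )
            acc[i][j] = local[i][j] + best_prev

    # Backtrack from the end of both sequences to the start.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((acc[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((acc[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((acc[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return path, local, acc[n - 1][m - 1]


def fixed_length_local_scores(template, utterance):
    """One averaged local score per template frame (assumed normalization).

    The output length always equals len(template), so it can feed a
    network with a fixed number of input nodes.
    """
    path, local, _ = dtw_path(template, utterance)
    sums = [0.0] * len(template)
    counts = [0] * len(template)
    for i, j in path:
        sums[i] += local[i][j]
        counts[i] += 1
    return [s / c for s, c in zip(sums, counts)]
```

Under these assumptions, an utterance of any length yields a vector with exactly as many entries as the template has frames, that is, one scalar per frame rather than a full set of LPC coefficients per frame. This is the kind of input reduction the summary describes.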