Development of an Isolated Digit Speech Recognition Based on Multilayer Perceptron Model
Automatic speech recognition (ASR) has become one of the leading areas of speech technology. ASR research has long emphasized man-machine communication that promises greater ease of use than the traditional keyboard and mouse. The speech recognition task is...
Main Author: | Mohamad Hussin, Ummu Salmah |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2004 |
Online Access: | http://psasir.upm.edu.my/id/eprint/6352/1/FSKTM_2004_7a.pdf http://psasir.upm.edu.my/id/eprint/6352/ |
id |
my.upm.eprints.6352 |
---|---|
record_format |
eprints |
institution |
Universiti Putra Malaysia |
building |
UPM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Putra Malaysia |
content_source |
UPM Institutional Repository |
url_provider |
http://psasir.upm.edu.my/ |
language |
English |
description |
Automatic speech recognition (ASR) has become one of the leading areas of speech
technology. ASR research has long emphasized man-machine communication that
promises greater ease of use than the traditional keyboard and mouse. Recognizing
speech is simple for a human but a very complex process for a machine.
Various methods have been introduced to develop an efficient ASR system. The
neural network (NN) approach is one of the best-known and most widely used methods
in this field, and the multilayer perceptron (MLP) is a popular NN model for ASR. In
this study, an MLP with the back-propagation learning algorithm is implemented to
perform isolated digit speech recognition for the Malay language. However, one
of the current problems faced by the MLP and most NN models in ASR is the long
learning time. In addition, the requirement for a high recognition rate in
MLP-based isolated digit speech recognition is not trivial, because such systems
are widely used in many applications. Thus, this study focuses on improving the
learning time and recognition rate of the MLP neural network for a Malay isolated
digit speech recognition system. The study proposes three new methods to meet this
objective, with improvements made in the preprocessing and recognition phases. In
the preprocessing phase, a new endpoint detection method, known as the variance
method, is proposed. It is introduced to overcome the disadvantages of the
conventional method, whose threshold is unstable and difficult to set during
silence detection, which results in a poor recognition rate. Another contribution
in the preprocessing phase concerns normalization. Three normalization methods are
introduced to normalize the speech data before it is propagated to the NN:
exponent, hybrid I and hybrid II. These methods are compared with four widely used
conventional normalization methods: range I, range II, simple and variance. The
conventional methods have two limitations. First, some of them, such as the
variance and range I methods, produce a good recognition rate but are very slow in
the learning phase. Second, others, such as the simple and range II methods, are
very fast in the learning phase but produce a low recognition rate. Therefore, the
new normalization methods are proposed to shorten learning time while producing a
high recognition rate. In the recognition phase, a simple novel approach is
introduced to increase the recognition rate: an adaptive sigmoid function. A
typical, fixed sigmoid function is used in the learning phase, and an adaptive
sigmoid function is employed in the recognition phase.
That is, the slope of the activation function is adjusted to obtain the highest
recognition rate. This study focuses on 10 Malay words, “sifar” to “sembilan”
(“0” to “9”). All utterances were recorded from a single male speaker, and each
word was repeated 100 times, so the data set consists of 1000 utterances of Malay
words. Four hundred utterances were used in the learning phase and the remaining
600 in the recognition phase. The TI46 standard data set was also used to evaluate
the performance of all the proposed methods, using 10 English words, “zero” to
“nine” (“0” to “9”).
Eight male and eight female speakers uttered each word 8 times. The total data set
across both groups is 1600 utterances. The male and female data sets were trained
separately: 400 male utterances were used in the learning phase, while the other
400 were kept as test data. The same approach was used in the learning and
recognition phases for the female data. Linear Predictive Coding (LPC) was
implemented as the feature extraction method to represent the speech data. The
experimental results show that the proposed endpoint detection (variance method)
produced promising results in terms of learning time and recognition rate.
Meanwhile, the proposed normalization methods showed excellent results in all
experiments. The adaptive sigmoid function also increased the recognition rate in
most of the experiments. Overall, the highest recognition rate for the Malay data
set is 99.83%, with a convergence time of 82 s. For the TI46 data set, the
convergence times are 55 s and 111 s, with recognition rates of 96.75% and 94.75%,
for the female and male data sets respectively. |
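
The abstract describes training with a fixed sigmoid and then adjusting the slope of the activation function at recognition time. The record does not give the thesis's actual procedure, so the following is only a minimal sketch of that idea under stated assumptions: a one-hidden-layer MLP without biases, one slope shared by all units, and a simple search over candidate slopes on labeled samples. The names `sigmoid`, `layer`, `classify` and `pick_slope` are hypothetical.

```python
import math

def sigmoid(x, slope=1.0):
    """Logistic activation; `slope` scales the steepness around x = 0."""
    return 1.0 / (1.0 + math.exp(-slope * x))

def layer(inputs, weights, slope):
    """One fully connected layer; weights[j] holds the input weights of unit j."""
    return [sigmoid(sum(w * v for w, v in zip(row, inputs)), slope) for row in weights]

def classify(features, w_hidden, w_out, slope=1.0):
    """Index of the strongest output unit (biases omitted; an assumption)."""
    outputs = layer(layer(features, w_hidden, slope), w_out, slope)
    return max(range(len(outputs)), key=outputs.__getitem__)

def pick_slope(samples, labels, w_hidden, w_out, candidates):
    """Try each candidate slope with the trained weights held fixed and
    return (slope, accuracy) for the most accurate candidate."""
    best = (None, -1.0)
    for s in candidates:
        hits = sum(classify(x, w_hidden, w_out, s) == y for x, y in zip(samples, labels))
        acc = hits / len(samples)
        if acc > best[1]:
            best = (s, acc)
    return best

# Toy weights standing in for a trained network (illustrative values only).
w_hidden = [[1.0, -1.0], [-1.0, 1.0]]
w_out = [[2.0, -2.0], [-2.0, 2.0]]
slope, accuracy = pick_slope([[1.0, 0.0], [0.0, 1.0]], [0, 1], w_hidden, w_out, [0.5, 1.0, 2.0])
```

Only the slope changes here; the weights learned with the fixed sigmoid are reused as they are, which is why this adjustment adds no training time.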
format |
Thesis |
author |
Mohamad Hussin, Ummu Salmah |
title |
Development of an Isolated Digit Speech Recognition Based on Multilayer Perceptron Model |
publishDate |
2004 |
url |
http://psasir.upm.edu.my/id/eprint/6352/1/FSKTM_2004_7a.pdf http://psasir.upm.edu.my/id/eprint/6352/ |
spelling |
my.upm.eprints.6352 2015-05-22T01:56:37Z http://psasir.upm.edu.my/id/eprint/6352/ Development of an Isolated Digit Speech Recognition Based on Multilayer Perceptron Model Mohamad Hussin, Ummu Salmah 2004-08 Thesis NonPeerReviewed application/pdf en http://psasir.upm.edu.my/id/eprint/6352/1/FSKTM_2004_7a.pdf Mohamad Hussin, Ummu Salmah (2004) Development of an Isolated Digit Speech Recognition Based on Multilayer Perceptron Model. PhD thesis, Universiti Putra Malaysia. |
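
The abstract names Linear Predictive Coding (LPC) as the feature extraction method but gives no order, frame length or windowing. As a hedged illustration only, the sketch below computes LPC coefficients for one frame with the standard autocorrelation method and the Levinson-Durbin recursion. The function names and the choice of order are assumptions, and windowing (for example a Hamming window) is left out.

```python
def autocorrelation(frame, max_lag):
    """r[k] = sum over n of frame[n] * frame[n + k], for k = 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def lpc(frame, order):
    """Return (coeffs, err) where frame[n] is approximated by
    sum_k coeffs[k-1] * frame[n-k], and err is the residual energy."""
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)  # a[0] is unused; a[1..order] are the predictors
    err = r[0]
    for i in range(1, order + 1):
        if err == 0.0:
            break  # silent or perfectly predicted frame: nothing left to model
        # Reflection coefficient for this recursion step.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:], err

# Toy frame: a decaying signal in which each sample is half the previous one.
frame = [0.5 ** n for n in range(50)]
coeffs, residual = lpc(frame, order=2)  # coeffs[0] comes out close to 0.5
```

In a pipeline like the one described, the coefficients of each frame would form the feature vector passed on to normalization and then to the MLP.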