Ethnic recognition system for Malay language speakers using gammatone frequency cepstral coefficients pitch (GFCCP) and pattern classification

Malaysia is a multi-racial, multilingual country consisting of many ethnic groups such as the Malay, Chinese, Indian, and Bumiputera. The Malay language is a non-tonal language, which does not need lexical stress. The study on recognizing the speaker's ethnicity is impor...

Bibliographic Details
Main Author: Mohd Hanifa, Rafizah
Format: Thesis
Language: English
Published: 2022
Subjects: T Technology (General)
Online Access: http://eprints.uthm.edu.my/10809/1/24p%20RAFIZAH%20MOHD%20HANIFA.pdf
http://eprints.uthm.edu.my/10809/2/RAFIZAH%20MOHD%20HANIFA%20COPYRIGHT%20DECLARATION.pdf
http://eprints.uthm.edu.my/10809/3/RAFIZAH%20MOHD%20HANIFA%20WATERMARK.pdf
http://eprints.uthm.edu.my/10809/
id my.uthm.eprints.10809
record_format eprints
spelling my.uthm.eprints.10809 2024-05-13T06:56:32Z http://eprints.uthm.edu.my/10809/ Mohd Hanifa, Rafizah (2022) Ethnic recognition system for Malay language speakers using gammatone frequency cepstral coefficients pitch (GFCCP) and pattern classification. Doctoral thesis, Universiti Tun Hussein Onn Malaysia. 2022-03 Thesis NonPeerReviewed
institution Universiti Tun Hussein Onn Malaysia
building UTHM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Tun Hussein Onn Malaysia
content_source UTHM Institutional Repository
url_provider http://eprints.uthm.edu.my/
language English
topic T Technology (General)
description Malaysia is a multi-racial, multilingual country consisting of many ethnic groups such as the Malay, Chinese, Indian, and Bumiputera. The Malay language is a non-tonal language, which does not need lexical stress. Recognizing a speaker's ethnicity is important because it has many potential applications, such as improving the interaction between robots and humans, audio forensics, telephone banking, and electronic commerce. Feature extraction, text independence, and variability coverage are open issues in speaker recognition systems. The research focused on establishing a novel method, Gammatone Frequency Cepstral Coefficients and pitch (GFCCP), coupled with a K-Nearest Neighbours (KNN) classifier in a text-independent system, to identify the speaker's ethnicity. The speech corpus consisted of readings of Malay texts by speakers of both genders, aged 10 to 48 years, classified into three ethnic groups: Malay, Chinese, and Indian. GFCC and Mel Frequency Cepstral Coefficients (MFCC) were used to represent the human auditory system. Pitch was added to MFCC and GFCC because it contributes to the differences between human voices and is difficult to imitate. Naïve Bayes, Support Vector Machine (SVM), and KNN classifiers were used to quantify the pattern classification performance. The dataset was split using hold-out validation (80% training, 20% testing). The system's performance was assessed on validation and prediction accuracy. The results revealed that GFCCP obtained the highest validation and prediction accuracy with the KNN classifier. The validation accuracy was 100%, 99.6%, and 99.2% for 12, 24, and 34 speakers, respectively, while the prediction accuracy was 89.98%, 73.56%, and 72.36% for 12, 24, and 34 speakers, respectively. An important finding of the study is that, under noisy conditions, adding pitch to MFCC and GFCC gave better accuracy than MFCC and GFCC alone, with the GFCC-based combination performing better than the MFCC-based one.
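The evaluation scheme described in the abstract (a pitch value appended to a cepstral vector, an 80%/20% hold-out split, and a KNN classifier) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the thesis: the function names, the value of k, the Euclidean distance, the four-coefficient vectors, and the synthetic clusters with neutral labels that stand in for real GFCC and pitch measurements.

```python
import math
import random

def append_pitch(cepstral, pitch_hz):
    """Concatenate one pitch value onto a cepstral vector (assumed fusion scheme)."""
    return list(cepstral) + [pitch_hz]

def holdout_split(samples, train_fraction=0.8, seed=0):
    """Shuffle the samples, then split them into training and testing parts."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def knn_predict(train, query, k=3):
    """Majority vote among the k training vectors closest to query (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

def accuracy(train, test, k=3):
    """Fraction of test samples whose predicted label matches the true label."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)

def make_toy_corpus(per_class=50, seed=1):
    """Synthetic stand-in data: three labelled clusters, NOT real speech features."""
    rng = random.Random(seed)
    centres = {"group_a": (0.0, 120.0), "group_b": (5.0, 180.0), "group_c": (10.0, 240.0)}
    corpus = []
    for label, (cepstral_mean, pitch_mean) in centres.items():
        for _ in range(per_class):
            cepstral = [rng.gauss(cepstral_mean, 0.5) for _ in range(4)]
            pitch = rng.gauss(pitch_mean, 5.0)
            corpus.append((append_pitch(cepstral, pitch), label))
    return corpus

corpus = make_toy_corpus()
train, test = holdout_split(corpus)          # 80% training, 20% testing
print(len(train), len(test))
print(accuracy(train, test))
```

In this sketch pitch is in Hz and so has a much larger numeric range than the cepstral values, which means it dominates the Euclidean distance. A real pipeline would probably normalise the features before KNN; that step is left out here to keep the example short.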
format Thesis
author Mohd Hanifa, Rafizah
title Ethnic recognition system for Malay language speakers using gammatone frequency cepstral coefficients pitch (GFCCP) and pattern classification
publishDate 2022