Gender and accent identification for Malaysian English using MFCC and Gaussian mixture model
Speaker and speech variability are a challenge in speaker and speech recognition. In the context of Malaysian English speakers, the variability is highly complex due to sociolinguistic and cultural background. Past research focused on vowel classification of Malaysian English on a small dataset wi...
Main Author: | Goh, Eng Lyn |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2013 |
Subjects: | TK Electrical engineering. Electronics Nuclear engineering |
Online Access: | http://eprints.utm.my/id/eprint/35850/5/GohEngLynMFSKSM2013.pdf http://eprints.utm.my/id/eprint/35850/ http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:70422?site_name=Restricted Repository |
id |
my.utm.35850 |
---|---|
record_format |
eprints |
spelling |
my.utm.35850 2017-07-13T06:19:11Z http://eprints.utm.my/id/eprint/35850/ Gender and accent identification for Malaysian English using MFCC and Gaussian mixture model Goh, Eng Lyn TK Electrical engineering. Electronics Nuclear engineering
2013-06 Thesis NonPeerReviewed application/pdf en http://eprints.utm.my/id/eprint/35850/5/GohEngLynMFSKSM2013.pdf Goh, Eng Lyn (2013) Gender and accent identification for Malaysian English using MFCC and Gaussian mixture model. Masters thesis, Universiti Teknologi Malaysia, Faculty of Computing. http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:70422?site_name=Restricted Repository |
institution |
Universiti Teknologi Malaysia |
building |
UTM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Teknologi Malaysia |
content_source |
UTM Institutional Repository |
url_provider |
http://eprints.utm.my/ |
language |
English |
topic |
TK Electrical engineering. Electronics Nuclear engineering |
description |
Speaker and speech variability are a challenge in speaker and speech recognition. In the context of Malaysian English speakers, the variability is highly complex due to sociolinguistic and cultural background. Past research focused on vowel classification of Malaysian English on a small dataset with a limited number of speakers and a limited text corpus. Among speaker-specific characteristics, gender is the most prominent feature, followed by accent. This project approaches the issues of speaker and speech variability in Malaysian English by proposing an identifier that combines the gender and accent aspects. This is done by training the classifier with gender-dependent data and an accent-prone text corpus. The gender-accent database collected comprises 120 speakers categorized into four gender-accent groups, namely Malay Female (MF), Malay Male (MM), Chinese Female (CF) and Chinese Male (CM). Mel-frequency cepstral coefficients (MFCC) are used to extract the features, while a Gaussian mixture model (GMM) is used to model the identifier. The test results show that female and Chinese speakers have a higher degree of distinctiveness than the other accent-gender groups. Chinese Female is the best recognized accent-gender group, while Malay Male is the least recognized, owing to code-mixing of Malay and English. The optimum GMM configuration is also studied across three numbers of Gaussians (12, 24, and 32); 24 is found to be the best configuration for MFCC-GMM. MFCC-GMM also performs better than LPC-KNN on a noisy dataset. Overall, the MFCC-GMM identifier achieved a 99.34% gender identification rate, a 67.5% accent identification rate, and a 65.83% accent-gender identification rate. |
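The GMM-based identification described in the abstract can be illustrated with a minimal sketch. It assumes one diagonal-covariance GMM per accent-gender group and a decision rule that picks the group with the highest average log-likelihood over the feature frames. This record does not state the covariance type, feature dimension, or decision rule used in the thesis, so these are assumptions. All names (`identify`, `TOY_MODELS`) and all parameter values are hypothetical, and the 2-D frames merely stand in for real MFCC vectors.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, gmm):
    """Log p(x) under a GMM given as a list of (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gaussian_diag(x, mean, var) for w, mean, var in gmm]
    peak = max(terms)
    # Log-sum-exp keeps the sum numerically stable for very small densities.
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def average_log_likelihood(frames, gmm):
    """Mean per-frame log-likelihood of an utterance under one GMM."""
    return sum(gmm_log_likelihood(f, gmm) for f in frames) / len(frames)


def identify(frames, models):
    """Return the label whose GMM scores the frames highest."""
    return max(models, key=lambda label: average_log_likelihood(frames, models[label]))


# Invented toy parameters (assumption): two components per group in 2-D.
# The thesis compares 12, 24, and 32 Gaussians on real MFCC features.
TOY_MODELS = {
    "MF": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "MM": [(0.5, [5.0, 0.0], [1.0, 1.0]), (0.5, [6.0, 1.0], [1.0, 1.0])],
    "CF": [(0.5, [0.0, 5.0], [1.0, 1.0]), (0.5, [1.0, 6.0], [1.0, 1.0])],
    "CM": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [6.0, 6.0], [1.0, 1.0])],
}

# Hypothetical utterance: three made-up feature frames.
frames = [[5.1, 4.9], [5.8, 6.2], [4.7, 5.3]]
print(identify(frames, TOY_MODELS))
```

In practice the GMM parameters would be estimated from training data (typically with expectation-maximization) rather than written by hand, and the frames would come from an MFCC front end.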
format |
Thesis |
author |
Goh, Eng Lyn |
title |
Gender and accent identification for Malaysian English using MFCC and Gaussian mixture model |
publishDate |
2013 |
url |
http://eprints.utm.my/id/eprint/35850/5/GohEngLynMFSKSM2013.pdf http://eprints.utm.my/id/eprint/35850/ http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:70422?site_name=Restricted Repository |