Implementation and analysis of GMM-based speaker identification on FPGA

Highly accurate identification systems are required in today’s society. Existing mechanisms such as PINs and passwords are easily forgotten or forged and are no longer considered to offer a high level of security. The use of biological features (biometrics) is becoming widely a...

Bibliographic Details
Main Author: Phaklen, Ehkan
Format: Thesis
Language: English
Published: Universiti Malaysia Perlis (UniMAP) 2014
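Subjects: Field Programmable Gate Array (FPGA); Gaussian Mixture Model (GMM); Personal identification systems; Biometric-based speaker identification; Speech signals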
Online Access: http://dspace.unimap.edu.my:80/dspace/handle/123456789/32469
id my.unimap-32469
record_format dspace
spelling my.unimap-32469 2014-03-09T12:53:43Z 2014-03-09T12:53:43Z 2012 Thesis http://dspace.unimap.edu.my:80/dspace/handle/123456789/32469 en Universiti Malaysia Perlis (UniMAP) School of Computer and Communication Engineering
institution Universiti Malaysia Perlis
building UniMAP Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaysia Perlis
content_source UniMAP Library Digital Repository
url_provider http://dspace.unimap.edu.my/
language English
topic Field Programmable Gate Array (FPGA)
Gaussian Mixture Model (GMM)
Personal identification systems
Biometric-based speaker identification
Speech signals
description Highly accurate identification systems are required in today’s society. Existing mechanisms such as PINs and passwords are easily forgotten or forged and are no longer considered to offer a high level of security. The use of biological features (biometrics) is becoming widely accepted as the next level for security systems. One such biometric is the human voice, which leads to the task of speaker identification. Speaker identification is the process of determining whether a speaker belongs to a group of known speakers and, if so, identifying that speaker within the group. Speaker-specific characteristics exist in speech signals because different speakers have different vocal tract resonances. These differences can be exploited by extracting Mel-frequency Cepstral Coefficients (MFCCs) from the speech signal. A statistical modelling technique, the Gaussian Mixture Model (GMM), is used to model the distribution of each speaker’s MFCCs in a multi-dimensional acoustic space. GMM-based identification involves two phases: training and classification. The training phase is complex and is better suited to implementation in software. The classification phase is well suited to implementation in hardware, which allows real-time processing of multiple voice streams for large speaker populations. Several innovative techniques are demonstrated that enable the hardware system to obtain two orders of magnitude speedup over software while maintaining comparable levels of accuracy. A speedup factor of 86 is achieved on the FPGA-based hardware compared with a software implementation on a standard PC.
format Thesis
author Phaklen, Ehkan
title Implementation and analysis of GMM-based speaker identification on FPGA
publisher Universiti Malaysia Perlis (UniMAP)
publishDate 2014
url http://dspace.unimap.edu.my:80/dspace/handle/123456789/32469
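
The classification phase described in the abstract (scoring a stream of MFCC frames against each speaker's GMM and picking the best match) can be illustrated with a minimal software sketch. This is not the thesis's hardware design: the diagonal covariance matrices, the function names, and the toy two-dimensional model values are all assumptions made for illustration, and the sketch covers only closed-set identification (it always returns one of the known speakers).

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(frames, gmm):
    """Total log-likelihood of all frames under one speaker's GMM.

    gmm is a list of (weight, mean, var) tuples, one per mixture component.
    """
    total = 0.0
    for x in frames:
        # log( sum_k w_k * N(x | mean_k, var_k) ), computed with log-sum-exp
        # so that very small densities do not underflow to zero.
        terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
        peak = max(terms)
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total

def identify_speaker(frames, models):
    """Return the name of the speaker model with the highest log-likelihood."""
    return max(models, key=lambda name: gmm_log_likelihood(frames, models[name]))

# Toy 2-dimensional "MFCC" models (hypothetical values, not from the thesis).
models = {
    "speaker_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "speaker_b": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [6.0, 6.0], [1.0, 1.0])],
}
frames = [[0.2, -0.1], [0.9, 1.1], [0.4, 0.6]]
print(identify_speaker(frames, models))
```

Each frame's score depends only on that frame and the fixed model parameters, so the per-frame and per-speaker computations are independent of one another. That independence is one plausible reason the abstract describes classification as well suited to hardware, whereas training requires iterative parameter estimation.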