Staff View: Voice to text conversion app with speaker recognition

This is a final-year project by a computer science student. Voice recognition is still a relatively underdeveloped field. Many obstacles must be overcome before a voice recognition system can identify all speakers correctly under all kinds of conditions...

Bibliographic Details
Main Author:	Ang, Sea Zhe
Format:	Final Year Project / Dissertation / Thesis
Published:	2022
Subjects:	Q Science (General) T Technology (General)
Online Access:	http://eprints.utar.edu.my/4637/1/fyp_CS_2022_ASZ.pdf http://eprints.utar.edu.my/4637/

id	my-utar-eprints.4637
record_format	eprints
spelling	my-utar-eprints.4637 2023-01-15T13:20:17Z Voice to text conversion app with speaker recognition Ang, Sea Zhe Q Science (General) T Technology (General) 2022-04-20 Final Year Project / Dissertation / Thesis NonPeerReviewed application/pdf http://eprints.utar.edu.my/4637/1/fyp_CS_2022_ASZ.pdf Ang, Sea Zhe (2022) Voice to text conversion app with speaker recognition. Final Year Project, UTAR. http://eprints.utar.edu.my/4637/
institution	Universiti Tunku Abdul Rahman
building	UTAR Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Tunku Abdul Rahman
content_source	UTAR Institutional Repository
url_provider	http://eprints.utar.edu.my
topic	Q Science (General) T Technology (General)
spellingShingle	Q Science (General) T Technology (General) Ang, Sea Zhe Voice to text conversion app with speaker recognition
description	This is a final-year project by a computer science student. Voice recognition is still a relatively underdeveloped field: many obstacles must be overcome before a voice recognition system can identify all speakers correctly under all kinds of conditions. Such a system would be useful for speaker verification and for speech recognition personalized to the user. In this paper, Mel Frequency Cepstral Coefficients (MFCCs) and their deltas are used to describe a person's vocal traits; MFCCs are widely used in this field to describe the phonemes of a voice. A Gaussian Mixture Model is then used to represent each speaker or each pair of speakers. The first part of the experiments uses self-generated datasets with 5 speakers. For single-speaker identification, 25 voice recordings (5 per speaker) are used as input to the system. For identification of two simultaneous speakers, 65 voice recordings are used as input, 40 of which are artificially mixed. The second part of the experiments uses the LibriSpeech dataset with 20 speakers in total: the author and speakers from LibriSpeech. There are 20 single-speaker models in total. In addition, 5 single-word speaker models are trained to identify who spoke each short word. Finally, a Universal Background Model is built for speaker verification. All the models are built using Gaussian distribution techniques.
The experiments in the second part are single-speaker identification; non-overlapped multi-speaker identification with speech extraction for known speakers; speaker verification; and speaker verification with noise estimation and speech extraction for unknown speakers, which is the main part of the project, Voice To Text Conversion With Speaker Recognition.
format	Final Year Project / Dissertation / Thesis
author	Ang, Sea Zhe
author_facet	Ang, Sea Zhe
author_sort	Ang, Sea Zhe
title	Voice to text conversion app with speaker recognition
title_short	Voice to text conversion app with speaker recognition
title_full	Voice to text conversion app with speaker recognition
title_fullStr	Voice to text conversion app with speaker recognition
title_full_unstemmed	Voice to text conversion app with speaker recognition
title_sort	voice to text conversion app with speaker recognition
publishDate	2022
url	http://eprints.utar.edu.my/4637/1/fyp_CS_2022_ASZ.pdf http://eprints.utar.edu.my/4637/
_version_	1755876963266854912
score	13.160551
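
The description above outlines speaker identification and verification with per-speaker Gaussian models and a Universal Background Model. The following is a minimal sketch of that scoring idea, not the thesis's implementation. It makes several assumptions: each speaker is modelled by a single diagonal-covariance Gaussian (a one-component stand-in for a full Gaussian Mixture Model), synthetic 3-dimensional vectors stand in for MFCC and delta features, and all names (`fit_diag_gaussian`, `identify`, `verify`, `speaker_a`, `speaker_b`, the threshold of 0.0) are hypothetical.

```python
import math
import random

def fit_diag_gaussian(frames):
    """Fit one diagonal-covariance Gaussian (a one-component GMM) to feature frames."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Floor the variance so the log-likelihood stays finite.
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, 1e-6)
           for d in range(dim)]
    return mean, var

def avg_log_likelihood(model, frames):
    """Average per-frame log-likelihood of the frames under a diagonal Gaussian."""
    mean, var = model
    total = 0.0
    for f in frames:
        for x, m, v in zip(f, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)

def identify(models, frames):
    """Identification: pick the speaker whose model scores the frames highest."""
    return max(models, key=lambda name: avg_log_likelihood(models[name], frames))

def verify(models, ubm, claimed, frames, threshold=0.0):
    """Verification: accept if the claimed speaker's model beats the background model."""
    llr = avg_log_likelihood(models[claimed], frames) - avg_log_likelihood(ubm, frames)
    return llr > threshold

def synthetic_frames(rng, centre, n=200, spread=1.0):
    """Assumed stand-in for MFCC + delta frames extracted from real audio."""
    return [[rng.gauss(c, spread) for c in centre] for _ in range(n)]

rng = random.Random(0)
centres = {"speaker_a": [0.0, 0.0, 0.0], "speaker_b": [4.0, -4.0, 2.0]}
train = {name: synthetic_frames(rng, c) for name, c in centres.items()}

# One model per enrolled speaker, plus a background model pooled over all speakers.
models = {name: fit_diag_gaussian(frames) for name, frames in train.items()}
ubm = fit_diag_gaussian([f for frames in train.values() for f in frames])

test_b = synthetic_frames(rng, centres["speaker_b"], n=50)
predicted = identify(models, test_b)
accepted_true = verify(models, ubm, "speaker_b", test_b)
accepted_impostor = verify(models, ubm, "speaker_a", test_b)
print(predicted, accepted_true, accepted_impostor)
```

In a real system the synthetic frames would be replaced by features extracted from recordings, each model would typically have several mixture components, and the verification threshold would be tuned on held-out data.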

Similar Items