Staff View: On the Identification of FOSD-based Non-zero Onset Speech Dataset

On the Identification of FOSD-based Non-zero Onset Speech Dataset

Recent trends in voicebot and chatbot application development have enabled utilization of speech-to-text (STT) and text-to-speech (TTS) generation techniques. In order to develop such TTS or STT engines, text and the corresponding recorded speech in an audio file used for training, validating and te...

Bibliographic Details
Main Authors:	Tran, D.C., Ibrahim, R.
Format:	Conference or Workshop Item
Published:	Institute of Electrical and Electronics Engineers Inc. 2020
Online Access:	https://www.scopus.com/inward/record.uri?eid=2-s2.0-85097784542&doi=10.1109%2fSCOReD50371.2020.9251018&partnerID=40&md5=1d2be75eae4378ac3c59411b41438a1c http://eprints.utp.edu.my/29962/

id	my.utp.eprints.29962
record_format	eprints
spelling	my.utp.eprints.299622022-03-25T03:17:05Z On the Identification of FOSD-based Non-zero Onset Speech Dataset Tran, D.C. Ibrahim, R. Recent trends in voicebot and chatbot application development have enabled utilization of speech-to-text (STT) and text-to-speech (TTS) generation techniques. In order to develop such TTS or STT engines, text and the corresponding recorded speech in an audio file used for training, validating and testing must be aligned. This is to ensure the developed engines achieve the desired conversion quality. In order to align speech and text, an audio alignment tool should be used. In such tools, often onset detection algorithms are utilized for labeling the audio file's speech start and end times. This information is then stored together with the file's transcript. In this work, an open nonzero onset Vietnamese speech dataset is provided. This dataset contains 348 audio files filtered from over 25,000 (approximately 30-hours) Vietnamese speech records released publicly by FPT Corporation, Vietnam in 2018. This amount of labeled data is considered to be more than sufficient for a typical onset detection algorithm researches. © 2020 IEEE. Institute of Electrical and Electronics Engineers Inc. 2020 Conference or Workshop Item NonPeerReviewed https://www.scopus.com/inward/record.uri?eid=2-s2.0-85097784542&doi=10.1109%2fSCOReD50371.2020.9251018&partnerID=40&md5=1d2be75eae4378ac3c59411b41438a1c Tran, D.C. and Ibrahim, R. (2020) On the Identification of FOSD-based Non-zero Onset Speech Dataset. In: UNSPECIFIED. http://eprints.utp.edu.my/29962/
institution	Universiti Teknologi Petronas
building	UTP Resource Centre
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Teknologi Petronas
content_source	UTP Institutional Repository
url_provider	http://eprints.utp.edu.my/
description	Recent trends in voicebot and chatbot application development have enabled the use of speech-to-text (STT) and text-to-speech (TTS) generation techniques. To develop such TTS or STT engines, the text and the corresponding recorded speech in the audio files used for training, validation and testing must be aligned. This ensures that the developed engines achieve the desired conversion quality. To align speech and text, an audio alignment tool should be used. Such tools often use onset detection algorithms to label the speech start and end times of an audio file. This information is then stored together with the file's transcript. In this work, an open nonzero onset Vietnamese speech dataset is provided. This dataset contains 348 audio files filtered from over 25,000 (approximately 30 hours) Vietnamese speech records released publicly by FPT Corporation, Vietnam in 2018. This amount of labeled data is considered to be more than sufficient for typical onset detection algorithm research. © 2020 IEEE.
format	Conference or Workshop Item
author	Tran, D.C. Ibrahim, R.
spellingShingle	Tran, D.C. Ibrahim, R. On the Identification of FOSD-based Non-zero Onset Speech Dataset
author_facet	Tran, D.C. Ibrahim, R.
author_sort	Tran, D.C.
title	On the Identification of FOSD-based Non-zero Onset Speech Dataset
title_short	On the Identification of FOSD-based Non-zero Onset Speech Dataset
title_full	On the Identification of FOSD-based Non-zero Onset Speech Dataset
title_fullStr	On the Identification of FOSD-based Non-zero Onset Speech Dataset
title_full_unstemmed	On the Identification of FOSD-based Non-zero Onset Speech Dataset
title_sort	on the identification of fosd-based non-zero onset speech dataset
publisher	Institute of Electrical and Electronics Engineers Inc.
publishDate	2020
url	https://www.scopus.com/inward/record.uri?eid=2-s2.0-85097784542&doi=10.1109%2fSCOReD50371.2020.9251018&partnerID=40&md5=1d2be75eae4378ac3c59411b41438a1c http://eprints.utp.edu.my/29962/
_version_	1738657040183590912
score	13.211869

Similar Items