Staff View: Lightweight real-time recurrent models for speech enhancement and automatic speech recognition

Traditional recurrent neural networks (RNNs) encounter difficulty in capturing long-term temporal dependencies. However, lightweight recurrent models for speech enhancement are important to improve noisy speech, while being computationally efficient and able to capture long-term temporal dependencie...

Bibliographic Details
Main Authors:	Dhahbi, Sami, Saleem, Nasir, Gunawan, Teddy Surya, Bourouis, Sami, Ali, Imad, Trigui, Aymen, Algarni, Abeer D
Format:	Article
Language:	English
Published:	Universidad Internacional de la Rioja 2024
Subjects:	TK7885 Computer engineering
Online Access:	http://irep.iium.edu.my/113757/1/113757_Lightweight%20real-time%20recurrent%20models.pdf http://irep.iium.edu.my/113757/2/113757_Lightweight%20real-time%20recurrent%20models_SCOPUS.pdf http://irep.iium.edu.my/113757/3/113757_Lightweight%20real-time%20recurrent%20models_WOS.pdf http://irep.iium.edu.my/113757/ https://www.ijimai.org/journal/bibcite/reference/3450

id	my.iium.irep.113757
record_format	dspace
spelling	my.iium.irep.113757 2024-08-07T08:50:20Z http://irep.iium.edu.my/113757/ Lightweight real-time recurrent models for speech enhancement and automatic speech recognition Dhahbi, Sami Saleem, Nasir Gunawan, Teddy Surya Bourouis, Sami Ali, Imad Trigui, Aymen Algarni, Abeer D TK7885 Computer engineering Traditional recurrent neural networks (RNNs) have difficulty capturing long-term temporal dependencies. However, lightweight recurrent models for speech enhancement are important for improving noisy speech while remaining computationally efficient and capturing long-term temporal dependencies. This study proposes a lightweight hourglass-shaped model for speech enhancement (SE) and automatic speech recognition (ASR). Simple recurrent units (SRUs) with skip connections are implemented, and attention gates are added to the skip connections to highlight the important features and spectral regions. The model operates without relying on future information, which makes it well suited for real-time processing. Combined acoustic features and two training objectives are estimated. Experimental evaluations using the short-time objective intelligibility (STOI), perceptual evaluation of speech quality (PESQ), and word error rate (WER) measures indicate better intelligibility, perceptual quality, and word recognition rates. The composite measures further confirm the performance in terms of residual noise and speech distortion. With the TIMIT database, the proposed model improves STOI by 16.21% and PESQ by 0.69 (31.1%), whereas with the LibriSpeech database, it improves STOI by 16.41% and PESQ by 0.71 (32.9%) over the noisy speech. Further, our model outperforms other deep neural networks (DNNs) in seen and unseen conditions. The ASR performance is measured using the Kaldi toolkit, and the model achieves a WER of 15.13% in noisy backgrounds.
Universidad Internacional de la Rioja 2024-06 Article PeerReviewed application/pdf en http://irep.iium.edu.my/113757/1/113757_Lightweight%20real-time%20recurrent%20models.pdf application/pdf en http://irep.iium.edu.my/113757/2/113757_Lightweight%20real-time%20recurrent%20models_SCOPUS.pdf application/pdf en http://irep.iium.edu.my/113757/3/113757_Lightweight%20real-time%20recurrent%20models_WOS.pdf Dhahbi, Sami and Saleem, Nasir and Gunawan, Teddy Surya and Bourouis, Sami and Ali, Imad and Trigui, Aymen and Algarni, Abeer D (2024) Lightweight real-time recurrent models for speech enhancement and automatic speech recognition. International Journal of Interactive Multimedia and Artificial Intelligence, 8 (6). pp. 74-85. ISSN 1989-1660 https://www.ijimai.org/journal/bibcite/reference/3450 10.9781/ijimai.2024.04.003
institution	Universiti Islam Antarabangsa Malaysia
building	IIUM Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	International Islamic University Malaysia
content_source	IIUM Repository (IREP)
url_provider	http://irep.iium.edu.my/
language	English
topic	TK7885 Computer engineering
spellingShingle	TK7885 Computer engineering Dhahbi, Sami Saleem, Nasir Gunawan, Teddy Surya Bourouis, Sami Ali, Imad Trigui, Aymen Algarni, Abeer D Lightweight real-time recurrent models for speech enhancement and automatic speech recognition
description	Traditional recurrent neural networks (RNNs) have difficulty capturing long-term temporal dependencies. However, lightweight recurrent models for speech enhancement are important for improving noisy speech while remaining computationally efficient and capturing long-term temporal dependencies. This study proposes a lightweight hourglass-shaped model for speech enhancement (SE) and automatic speech recognition (ASR). Simple recurrent units (SRUs) with skip connections are implemented, and attention gates are added to the skip connections to highlight the important features and spectral regions. The model operates without relying on future information, which makes it well suited for real-time processing. Combined acoustic features and two training objectives are estimated. Experimental evaluations using the short-time objective intelligibility (STOI), perceptual evaluation of speech quality (PESQ), and word error rate (WER) measures indicate better intelligibility, perceptual quality, and word recognition rates. The composite measures further confirm the performance in terms of residual noise and speech distortion. With the TIMIT database, the proposed model improves STOI by 16.21% and PESQ by 0.69 (31.1%), whereas with the LibriSpeech database, it improves STOI by 16.41% and PESQ by 0.71 (32.9%) over the noisy speech. Further, our model outperforms other deep neural networks (DNNs) in seen and unseen conditions. The ASR performance is measured using the Kaldi toolkit, and the model achieves a WER of 15.13% in noisy backgrounds.
format	Article
author	Dhahbi, Sami Saleem, Nasir Gunawan, Teddy Surya Bourouis, Sami Ali, Imad Trigui, Aymen Algarni, Abeer D
author_facet	Dhahbi, Sami Saleem, Nasir Gunawan, Teddy Surya Bourouis, Sami Ali, Imad Trigui, Aymen Algarni, Abeer D
author_sort	Dhahbi, Sami
title	Lightweight real-time recurrent models for speech enhancement and automatic speech recognition
title_short	Lightweight real-time recurrent models for speech enhancement and automatic speech recognition
title_full	Lightweight real-time recurrent models for speech enhancement and automatic speech recognition
title_fullStr	Lightweight real-time recurrent models for speech enhancement and automatic speech recognition
title_full_unstemmed	Lightweight real-time recurrent models for speech enhancement and automatic speech recognition
title_sort	lightweight real-time recurrent models for speech enhancement and automatic speech recognition
publisher	Universidad Internacional de la Rioja
publishDate	2024
url	http://irep.iium.edu.my/113757/1/113757_Lightweight%20real-time%20recurrent%20models.pdf http://irep.iium.edu.my/113757/2/113757_Lightweight%20real-time%20recurrent%20models_SCOPUS.pdf http://irep.iium.edu.my/113757/3/113757_Lightweight%20real-time%20recurrent%20models_WOS.pdf http://irep.iium.edu.my/113757/ https://www.ijimai.org/journal/bibcite/reference/3450
_version_	1807048414593024000
score	13.214268
