The Efficacy of Deep Learning-Based Mixed Model for Speech Emotion Recognition

Human speech indirectly conveys the speaker's mental state or emotion. Artificial Intelligence (AI)-based techniques that recognize emotion from speech could prove transformative. In this study, we introduced a robust method for emotion recognition from human speech using...

Bibliographic Details
Main Authors: Uddin, Mohammad Amaz; Chowdury, Mohammad Salah Uddin; Khandaker, Mayeen Uddin *; Tamam, Nissren; Sulieman, Abdelmoneim
Format: Article
Language: English
Published: Tech Science Press 2022
Subjects:
Online Access: http://eprints.sunway.edu.my/2250/1/28.pdf
http://eprints.sunway.edu.my/2250/
https://doi.org/10.32604/cmc.2023.031177
id my.sunway.eprints.2250
record_format eprints
spelling my.sunway.eprints.2250 2023-06-16T01:38:32Z http://eprints.sunway.edu.my/2250/
Tech Science Press 2022-09-22 Article PeerReviewed text en cc_by_4
Uddin, Mohammad Amaz and Chowdury, Mohammad Salah Uddin and Khandaker, Mayeen Uddin * and Tamam, Nissren and Sulieman, Abdelmoneim (2022) The Efficacy of Deep Learning-Based Mixed Model for Speech Emotion Recognition. Computers, Materials & Continua, 74 (1). pp. 1709-1722. ISSN 1546-2226 https://doi.org/10.32604/cmc.2023.031177
institution Sunway University
building Sunway Campus Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Sunway University
content_source Sunway Institutional Repository
url_provider http://eprints.sunway.edu.my/
language English
topic BF Psychology
Q Science (General)
TA Engineering (General). Civil engineering (General)
description Human speech indirectly conveys the speaker's mental state or emotion. Artificial Intelligence (AI)-based techniques that recognize emotion from speech could prove transformative. In this study, we introduced a robust method for emotion recognition from human speech that combines a well-performing preprocessing technique with a deep learning-based mixed model consisting of Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) components. About 2800 audio files were taken from the Toronto emotional speech set (TESS) database. A high-pass filter and a Savitzky-Golay filter were used to obtain noise-free, smooth audio data. Seven emotion classes were used: Angry, Disgust, Fear, Happy, Neutral, Pleasant-surprise, and Sad. Energy, fundamental frequency, and Mel Frequency Cepstral Coefficients (MFCC) were used as emotion features, and with these features the mixed LSTM+CNN model reached 97.5% accuracy. The mixed model was found to perform better than the usual state-of-the-art models for speech emotion recognition. The result also indicates that the mixed model could be used effectively in advanced research dealing with sound processing.
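The preprocessing and energy feature named in the description can be illustrated with a minimal sketch. It is not the authors' implementation: the record gives no cutoff frequency, filter order, window length, or frame size, so the first-order high-pass filter, the fixed 5-point quadratic Savitzky-Golay kernel, and all function and parameter names below are assumptions chosen for illustration. Fundamental frequency, MFCC extraction, and the LSTM+CNN model are not covered.

```python
import math

# Assumed 5-point quadratic Savitzky-Golay smoothing kernel (window and
# polynomial order are illustrative; the record does not state them).
SG_COEFFS = (-3.0, 12.0, 17.0, 12.0, -3.0)
SG_NORM = 35.0


def high_pass(samples, sample_rate, cutoff_hz):
    """First-order RC high-pass filter, a simple stand-in for the
    unspecified high-pass filter mentioned in the description."""
    if not samples:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = [samples[0]]
    for i in range(1, len(samples)):
        # Pass changes in the input, let the accumulated level decay.
        out.append(alpha * (out[-1] + samples[i] - samples[i - 1]))
    return out


def savgol_smooth(samples):
    """Smooth interior samples with the 5-point kernel above.
    The two samples at each edge are left unchanged."""
    out = list(samples)
    for i in range(2, len(samples) - 2):
        window = samples[i - 2:i + 3]
        out[i] = sum(c * x for c, x in zip(SG_COEFFS, window)) / SG_NORM
    return out


def short_time_energy(samples, frame_len, hop):
    """Sum of squared samples for each full frame."""
    return [
        sum(x * x for x in samples[start:start + frame_len])
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]


if __name__ == "__main__":
    # Synthetic test signal (hypothetical): 200 Hz tone plus a DC offset.
    rate = 8000
    signal = [0.5 + math.sin(2 * math.pi * 200 * n / rate) for n in range(800)]
    cleaned = savgol_smooth(high_pass(signal, rate, cutoff_hz=50.0))
    print(short_time_energy(cleaned, frame_len=200, hop=100))
```

In this sketch the high-pass stage removes slow drift and DC offset, the smoothing stage suppresses sample-level jitter while preserving locally quadratic shape, and the per-frame energies would form one of the feature streams passed to a classifier.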
format Article
author Uddin, Mohammad Amaz
Chowdury, Mohammad Salah Uddin
Khandaker, Mayeen Uddin *
Tamam, Nissren
Sulieman, Abdelmoneim
title The Efficacy of Deep Learning-Based Mixed Model for Speech Emotion Recognition
publisher Tech Science Press
publishDate 2022