Speech emotion recognition using deep feedforward neural network

Speech emotion recognition (SER) is currently a research hotspot because it is both challenging and promising. The objective of this research is to use deep neural networks (DNNs) to recognize emotion in human speech. First, the chosen speech feature Mel-frequency cepstral coefficie...

Bibliographic Details
Main Authors: Alghifari, Muhammad Fahreza; Gunawan, Teddy Surya; Kartiwi, Mira
Format: Article
Language: English
Published: IAES 2018
Subjects: TK7885 Computer engineering
Online Access: http://irep.iium.edu.my/62495/7/62495%20Speech%20emotion%20recognition%20SCOPUS.pdf
http://irep.iium.edu.my/62495/13/62495_Speech%20emotion%20recognition%20using%20deep%20feedforward%20neural%20network_article.pdf
http://irep.iium.edu.my/62495/
http://www.iaescore.com/journals/index.php/IJEECS/article/view/11765/8301
id my.iium.irep.62495
record_format dspace
spelling my.iium.irep.62495 2018-08-09T07:46:50Z http://irep.iium.edu.my/62495/ TK7885 Computer engineering IAES 2018-05 Article PeerReviewed application/pdf en Alghifari, Muhammad Fahreza and Gunawan, Teddy Surya and Kartiwi, Mira (2018) Speech emotion recognition using deep feedforward neural network. Indonesian Journal of Electrical Engineering and Computer Science, 10 (2). pp. 554-561. ISSN 2502-4752 http://www.iaescore.com/journals/index.php/IJEECS/article/view/11765/8301 10.11591/ijeecs.v10.i2.pp554-561
institution Universiti Islam Antarabangsa Malaysia
building IIUM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider International Islamic University Malaysia
content_source IIUM Repository (IREP)
url_provider http://irep.iium.edu.my/
language English
topic TK7885 Computer engineering
description Speech emotion recognition (SER) is currently a research hotspot because it is both challenging and promising. The objective of this research is to use deep neural networks (DNNs) to recognize emotion in human speech. First, the chosen speech features, Mel-frequency cepstral coefficients (MFCCs), were extracted from raw audio data. Second, the extracted features were fed into the DNN to train the network. The trained network was then tested on a set of labelled emotional speech audio, and the recognition rate was evaluated. Based on the resulting accuracy, the number of MFCCs, neurons, and layers was adjusted for optimization. Moreover, a custom-made database is introduced and validated using the optimized network. The optimum configuration for SER is 13 MFCCs, 12 neurons, and 2 layers for 3 emotions, and 25 MFCCs, 21 neurons, and 4 layers for 4 emotions, achieving a total recognition rate of 96.3% for 3 emotions and 97.1% for 4 emotions.
format Article
author Alghifari, Muhammad Fahreza
Gunawan, Teddy Surya
Kartiwi, Mira
title Speech emotion recognition using deep feedforward neural network
publisher IAES
publishDate 2018
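
The abstract reports two network configurations (13 MFCC, 12 neurons, 2 layers for 3 emotions; 25 MFCC, 21 neurons, 4 layers for 4 emotions) and a recognition rate as the evaluation measure. The following is a minimal sketch of what such a feedforward network could look like, not the authors' implementation. It assumes that "neurons" means neurons per hidden layer, that "layers" counts hidden layers, that one MFCC vector is the network input, and that hidden units use tanh with a softmax output; none of these details are stated in the record. All function names are hypothetical, and the weights are random rather than trained.

```python
import math
import random


def layer_sizes(n_mfcc, n_neurons, n_hidden_layers, n_emotions):
    # Assumption: one MFCC vector in, equal-width hidden layers, one output per emotion.
    return [n_mfcc] + [n_neurons] * n_hidden_layers + [n_emotions]


def count_parameters(sizes):
    # Weights plus biases for every pair of consecutive fully connected layers.
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


def init_network(sizes, seed=0):
    # Random, untrained weights; each layer is (weight rows, biases).
    rng = random.Random(seed)
    return [
        ([[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)],
         [0.0] * n_out)
        for n_in, n_out in zip(sizes, sizes[1:])
    ]


def forward(network, features):
    # Feedforward pass: tanh on hidden layers (assumed), softmax on the output layer.
    activations = features
    for index, (weights, biases) in enumerate(network):
        z = [sum(w * a for w, a in zip(row, activations)) + b
             for row, b in zip(weights, biases)]
        if index < len(network) - 1:
            activations = [math.tanh(v) for v in z]
        else:
            peak = max(z)  # subtract the maximum for numerical stability
            exps = [math.exp(v - peak) for v in z]
            total = sum(exps)
            activations = [e / total for e in exps]
    return activations


def recognition_rate(predicted, actual):
    # Fraction of utterances whose predicted emotion label matches the true label.
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# The two configurations reported in the abstract, under the assumptions above.
sizes_3 = layer_sizes(13, 12, 2, 3)
sizes_4 = layer_sizes(25, 21, 4, 4)

network = init_network(sizes_3)
probabilities = forward(network, [0.1] * 13)  # placeholder MFCC vector, not real data
```

Under these assumptions both networks are small: the sketch lets a reader compare the layer layouts and parameter counts of the two configurations with `count_parameters(sizes_3)` and `count_parameters(sizes_4)`. Training, MFCC extraction from audio, and the search over MFCC count, neurons, and layers are left out.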