Speech-based depression detection for Bahasa Malaysia female speakers using deep learning

Bibliographic Details
Main Authors: Ahmed Ezzi, Mugahed Al-Ezzi; Nik Hashim, Nik Nur Wahidah; Ahmad Basri, Nadzirah; Toha, Siti Fauziah
Format: Article
Language: English
Published: Penerbit UTM Press 2021
Online Access: http://irep.iium.edu.my/94038/7/94038_Speech-based%20depression%20detection%20for%20Bahasa%20Malaysia.pdf
http://irep.iium.edu.my/94038/
https://elektrika.utm.my/index.php/ELEKTRIKA_Journal/article/view/318/195
Description
Summary: Depression is a highly prevalent mental disorder that negatively affects individuals, society, and the economy. Traditional clinical diagnosis methods are subjective and require extensive expert involvement. Furthermore, the severe shortage of psychiatrists relative to the population in Malaysia delays patients in seeking treatment and contributes to poor follow-up compliance. The social stigma of visiting psychiatric clinics also deters patients from seeking early treatment. Automatic depression detection from speech signals is promising because it is fast, convenient, and non-invasive. This research attempts to develop an end-to-end deep learning model to classify depression from female Bahasa Malaysia speech using our dataset. Depression status was identified by the Patient Health Questionnaire 9, the Malay Beck Depression Inventory-II, and subjects’ declaration of a Major Depressive Disorder diagnosis by a trained clinician. The dataset consists of 110 female participants. We provide a detailed implementation of deep learning models using raw audio input. Multiple combinations of speech types were analyzed using various deep neural network models. After hyperparameter tuning, the AttCRNN model with raw audio input from combined female read and spontaneous speech achieved an accuracy of 91%.