Performance evaluation of multilabel emotion classification using data augmentation techniques

One of the challenges of emotion classification is the scarcity of annotated datasets, which makes the task more complex. Existing datasets also often suffer from imbalanced emotion classes. Several data augmentation approaches can help to overcome the challenges regarding imbala...

Bibliographic Details
Main Authors: Ahanin, Zahra; Ismail, Maizatul Akmar; Herawan, Tutut
Format: Article
Published: Faculty of Computer Science and Information Technology, University of Malaya 2024
Subjects: QA75 Electronic computers. Computer science
Online Access: http://eprints.um.edu.my/45835/
https://doi.org/10.22452/mjcs.vol37no2.4
id my.um.eprints.45835
record_format eprints
spelling my.um.eprints.45835 2024-11-12T07:33:21Z http://eprints.um.edu.my/45835/ Ahanin, Zahra and Ismail, Maizatul Akmar and Herawan, Tutut (2024) Performance evaluation of multilabel emotion classification using data augmentation techniques. Malaysian Journal of Computer Science, 37 (2). pp. 154-168. ISSN 0127-9084, DOI https://doi.org/10.22452/mjcs.vol37no2.4. Article, PeerReviewed.
institution Universiti Malaya
building UM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaya
content_source UM Research Repository
url_provider http://eprints.um.edu.my/
topic QA75 Electronic computers. Computer science
description One of the challenges of emotion classification is the scarcity of annotated datasets, which makes the task more complex. Existing datasets also often suffer from imbalanced emotion classes. Several data augmentation approaches can help to overcome the challenges posed by imbalanced datasets. However, existing data augmentation techniques for emotion classification do not account for the contextual nuances of emotions, and this area remains relatively underexplored. In this work, we study the impact of data augmentation on the classification performance of three machine learning models (Logistic Regression, BiLSTM, and BERT) and compare frequently used methods for addressing the issue. Specifically, we assess Easy Data Augmentation (EDA) and contextual embedding-based data augmentation (BERT) on two datasets. Based on the experimental results, we combine two BERT-based augmentation techniques, insert and substitute, to generate data for minority emotion classes. Furthermore, we propose a data augmentation method using ChatGPT. Compared to the baseline models, combining the BERT augmentation techniques with the BERT model improves the Macro F1 score by +4.34% and +5.56% on the SemEval-2018 and GoEmotions datasets, respectively. Moreover, the proposed ChatGPT-based augmentation technique yields improvements of +3.55% and +4.83% on the same datasets.