Bilingual sentiment analysis on Malaysian social media using vader and normalisation heuristics
This research addresses a number of important issues involved in performing Sentiment Analysis (SA) on Malaysian Social Media (SM), including an analysis of bilingual or mixed language, choice of sentiment lexicon, normalisation heuristics, and the use of public datasets. This work is the first to q...
Main Authors: | James Mountstephens; Tan, Mathieson Zui Quen; Lai, Po Hung |
---|---|
Format: | Article |
Language: | English |
Published: | Journal of Theoretical and Applied Information Technology, 2023 |
Subjects: | QA76.75-76.765 Computer software; T10.5-11.9 Communication of technical information |
Online Access: | https://eprints.ums.edu.my/id/eprint/39226/1/ABSTRACT.pdf https://eprints.ums.edu.my/id/eprint/39226/2/FULL%20TEXT.pdf https://eprints.ums.edu.my/id/eprint/39226/ |
id |
my.ums.eprints.39226 |
---|---|
record_format |
eprints |
spelling |
my.ums.eprints.39226 2024-07-19T08:13:59Z https://eprints.ums.edu.my/id/eprint/39226/ Bilingual sentiment analysis on Malaysian social media using vader and normalisation heuristics. James Mountstephens; Tan, Mathieson Zui Quen; Lai, Po Hung. QA76.75-76.765 Computer software; T10.5-11.9 Communication of technical information. This research addresses a number of important issues involved in performing Sentiment Analysis (SA) on Malaysian Social Media (SM), including an analysis of bilingual or mixed language, choice of sentiment lexicon, normalisation heuristics, and the use of public datasets. This work is the first to quantify the level of language mixing in informal Malaysian text. Analysis of the 2M-tweet Malaya dataset revealed a significant level of English sentiment content in Malaysian social media (13.5%), demonstrating the necessity of a bilingual approach to Malaysian Sentiment Analysis. Significant patterns in noisy Malaysian SM text were identified and heuristics for normalising them were devised. The popular and effective English lexicon-based SA system VADER (Valence Aware Dictionary and sEntiment Reasoner) was translated to Malay using automatic and manual methods, with the combination of English and Malay VADER yielding a bilingual SA system. A subset of the Malaya dataset was both corrected and extended from two to three classes in order to properly test the bilingual SA system. Bilingual VADER with normalisation heuristics was able to achieve an impressive level of performance on a three-class problem (accuracy=0.71, mean F1=0.72), as compared to Malay VADER alone and several popular machine learning-based algorithms.
Journal of Theoretical and Applied Information Technology 2023 Article NonPeerReviewed text en https://eprints.ums.edu.my/id/eprint/39226/1/ABSTRACT.pdf text en https://eprints.ums.edu.my/id/eprint/39226/2/FULL%20TEXT.pdf James Mountstephens and Tan, Mathieson Zui Quen and Lai, Po Hung (2023) Bilingual sentiment analysis on Malaysian social media using vader and normalisation heuristics. Journal of Theoretical and Applied Information Technology, 101 (12). pp. 5037-5050. ISSN 1992-8645 |
institution |
Universiti Malaysia Sabah |
building |
UMS Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Malaysia Sabah |
content_source |
UMS Institutional Repository |
url_provider |
http://eprints.ums.edu.my/ |
language |
English |
topic |
QA76.75-76.765 Computer software; T10.5-11.9 Communication of technical information |
description |
This research addresses a number of important issues involved in performing Sentiment Analysis (SA) on Malaysian Social Media (SM), including an analysis of bilingual or mixed language, choice of sentiment lexicon, normalisation heuristics, and the use of public datasets. This work is the first to quantify the level of language mixing in informal Malaysian text. Analysis of the 2M-tweet Malaya dataset revealed a significant level of English sentiment content in Malaysian social media (13.5%), demonstrating the necessity of a bilingual approach to Malaysian Sentiment Analysis. Significant patterns in noisy Malaysian SM text were identified and heuristics for normalising them were devised. The popular and effective English lexicon-based SA system VADER (Valence Aware Dictionary and sEntiment Reasoner) was translated to Malay using automatic and manual methods, with the combination of English and Malay VADER yielding a bilingual SA system. A subset of the Malaya dataset was both corrected and extended from two to three classes in order to properly test the bilingual SA system. Bilingual VADER with normalisation heuristics was able to achieve an impressive level of performance on a three-class problem (accuracy=0.71, mean F1=0.72), as compared to Malay VADER alone and several popular machine learning-based algorithms. |
format |
Article |
author |
James Mountstephens; Tan, Mathieson Zui Quen; Lai, Po Hung |
author_sort |
James Mountstephens |
title |
Bilingual sentiment analysis on Malaysian social media using vader and normalisation heuristics |
publisher |
Journal of Theoretical and Applied Information Technology |
publishDate |
2023 |
url |
https://eprints.ums.edu.my/id/eprint/39226/1/ABSTRACT.pdf https://eprints.ums.edu.my/id/eprint/39226/2/FULL%20TEXT.pdf https://eprints.ums.edu.my/id/eprint/39226/ |
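The abstract describes a pipeline of normalisation heuristics followed by lookup in combined English and Malay valence lexicons, with a three-class output. The sketch below is only a rough illustration of that general idea and is not the authors' implementation: the lexicon entries, the short-form map, the one-token negation rule, and the `threshold` value are all hypothetical, and real VADER applies many more rules (intensifiers, punctuation, capitalisation) than are shown here.

```python
import re

# Hypothetical toy lexicons with VADER-style valences (roughly -4 to +4).
# These entries are illustrative only; they are not taken from the paper.
EN_LEXICON = {"good": 1.9, "love": 3.2, "bad": -2.5, "hate": -2.7}
MS_LEXICON = {"bagus": 1.9, "suka": 1.5, "teruk": -2.5, "benci": -2.7}

# Hypothetical map from informal short forms to standard Malay words.
SHORT_FORMS = {"x": "tidak", "tak": "tidak", "sgt": "sangat", "bgs": "bagus"}
NEGATORS = {"not", "tidak", "bukan"}


def normalise(text):
    """Apply a few simple normalisation heuristics and return a token list."""
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)  # drop URLs and mentions
    # Crude elongation fix: 3+ repeats of a character collapse to one,
    # so "bagusss" becomes "bagus" (this would also damage e.g. "goood").
    text = re.sub(r"(.)\1{2,}", r"\1", text)
    tokens = re.findall(r"[a-z']+", text)
    return [SHORT_FORMS.get(tok, tok) for tok in tokens]


def score(tokens):
    """Sum valences from both lexicons, flipping the sign after a negator."""
    total = 0.0
    for i, tok in enumerate(tokens):
        valence = EN_LEXICON.get(tok, MS_LEXICON.get(tok, 0.0))
        if i > 0 and tokens[i - 1] in NEGATORS:
            valence = -valence
        total += valence
    return total


def classify(text, threshold=0.5):
    """Map the summed valence to one of three classes."""
    s = score(normalise(text))
    if s >= threshold:
        return "positive"
    if s <= -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for example in ["Makanan ni bagusss, I love it!",
                    "servis x bgs",
                    "Saya pergi ke pasar"]:
        print(example, "->", classify(example))
```

Looking up each token in both lexicons is what lets a single mixed-language post contribute English and Malay sentiment words to the same score, which is the motivation the abstract gives for a bilingual system.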