Staff View: TOWARDS CURBING CYBER-BULLYING IN MALAYSIA BY AUTHOR IDENTIFICATION OF IBAN AND KADAZANDUSUN OSN TEXT USING DEEP LEARNING

TOWARDS CURBING CYBER-BULLYING IN MALAYSIA BY AUTHOR IDENTIFICATION OF IBAN AND KADAZANDUSUN OSN TEXT USING DEEP LEARNING

Online Social Network (OSN) is frequently used to carry out cyber-criminal actions such as cyberbullying. As a developing country in Asia that keeps abreast of ICT advancement, Malaysia is no exception when it comes to cyberbullying. Author Identification (AI) task plays a vital role in social media...

Full description

Saved in:

Bibliographic Details
Main Authors:	Nursyahirah, Tarmizi, Suhaila, Saee, Dayang Hanani, Abang Ibrahim
Format:	Article
Language:	English
Published:	Penerbit UTM Press 2023
Subjects:	QA75 Electronic computers. Computer science
Online Access:	http://ir.unimas.my/id/eprint/42630/3/TOWARDS%20CURBING%20-%20Copy.pdf http://ir.unimas.my/id/eprint/42630/ https://journals.utm.my/aej/article/view/19171 https://doi.org/10.11113/aej.v13.19171
Tags:	Add Tag No Tags, Be the first to tag this record!

id	my.unimas.ir.42630
record_format	eprints
spelling	my.unimas.ir.426302023-08-21T04:06:46Z http://ir.unimas.my/id/eprint/42630/ TOWARDS CURBING CYBER-BULLYING IN MALAYSIA BY AUTHOR IDENTIFICATION OF IBAN AND KADAZANDUSUN OSN TEXT USING DEEP LEARNING Nursyahirah, Tarmizi Suhaila, Saee Dayang Hanani, Abang Ibrahim QA75 Electronic computers. Computer science Online Social Network (OSN) is frequently used to carry out cyber-criminal actions such as cyberbullying. As a developing country in Asia that keeps abreast of ICT advancement, Malaysia is no exception when it comes to cyberbullying. Author Identification (AI) task plays a vital role in social media forensic investigation (SMF) to unveil the genuine identity of the offender by analysing the text written in OSN by the candidate culprits. Several challenges in AI dealing with OSN text, including limited text length and informal language full of internet jargon and grammatical errors that further impact AI's performance in SMF. The traditional AI system that analyses long text documents seems inadequate to analyse short OSN text's writing style. N-gram features are proven to efficiently represent the authors' writing style for shot text. However, representing N-grams in traditional representation like Tf-IDF resulted in sparse and difficult in grasping the semantic information from text. Besides, most AI works have been done in English but receive less attention in indigenous languages. In West Malaysia, the supreme languages that transcend ethnic boundaries are Iban of Sarawak and KadazanDusun of Sabah, which both are inherently under-resourced. This paper presented a proposed workflow of AI for short OSN text using two Under-Resourced Language (U-RL), Iban and KadazanDusun tweets, to curb the cyberbullying issue in Malaysia. This paper compares Tf-Idf (sparse) and SoA embedding-based (dense) feature representations to observe which representations best represent the stylistic features of the authors’ writing. N-grams of word, character, and POS were extracted as the features. The representation models were learned by different classifiers using machine learning (Naïve Bayes, Random Forest, and SVM). The convolutional neural network (CNN), a SoA deep learning model in sentence classification, was tested against the traditional classifiers. The result was observed by combining different representation models and classifiers on three datasets (English, Iban, and KadazanDusun). The best result was achieved when CNN learned embedding-based models with a combination of all features. KadazanDusun achieved the highest accuracy with 95.76%, English with 95.02%, and Iban with 94%.. Penerbit UTM Press 2023-05-31 Article PeerReviewed text en http://ir.unimas.my/id/eprint/42630/3/TOWARDS%20CURBING%20-%20Copy.pdf Nursyahirah, Tarmizi and Suhaila, Saee and Dayang Hanani, Abang Ibrahim (2023) TOWARDS CURBING CYBER-BULLYING IN MALAYSIA BY AUTHOR IDENTIFICATION OF IBAN AND KADAZANDUSUN OSN TEXT USING DEEP LEARNING. ASEAN Engineering Journal, 13 (2). pp. 145-157. ISSN 2586–9159 https://journals.utm.my/aej/article/view/19171 https://doi.org/10.11113/aej.v13.19171
institution	Universiti Malaysia Sarawak
building	Centre for Academic Information Services (CAIS)
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Malaysia Sarawak
content_source	UNIMAS Institutional Repository
url_provider	http://ir.unimas.my/
language	English
topic	QA75 Electronic computers. Computer science
spellingShingle	QA75 Electronic computers. Computer science Nursyahirah, Tarmizi Suhaila, Saee Dayang Hanani, Abang Ibrahim TOWARDS CURBING CYBER-BULLYING IN MALAYSIA BY AUTHOR IDENTIFICATION OF IBAN AND KADAZANDUSUN OSN TEXT USING DEEP LEARNING
description	Online Social Network (OSN) is frequently used to carry out cyber-criminal actions such as cyberbullying. As a developing country in Asia that keeps abreast of ICT advancement, Malaysia is no exception when it comes to cyberbullying. Author Identification (AI) task plays a vital role in social media forensic investigation (SMF) to unveil the genuine identity of the offender by analysing the text written in OSN by the candidate culprits. Several challenges in AI dealing with OSN text, including limited text length and informal language full of internet jargon and grammatical errors that further impact AI's performance in SMF. The traditional AI system that analyses long text documents seems inadequate to analyse short OSN text's writing style. N-gram features are proven to efficiently represent the authors' writing style for shot text. However, representing N-grams in traditional representation like Tf-IDF resulted in sparse and difficult in grasping the semantic information from text. Besides, most AI works have been done in English but receive less attention in indigenous languages. In West Malaysia, the supreme languages that transcend ethnic boundaries are Iban of Sarawak and KadazanDusun of Sabah, which both are inherently under-resourced. This paper presented a proposed workflow of AI for short OSN text using two Under-Resourced Language (U-RL), Iban and KadazanDusun tweets, to curb the cyberbullying issue in Malaysia. This paper compares Tf-Idf (sparse) and SoA embedding-based (dense) feature representations to observe which representations best represent the stylistic features of the authors’ writing. N-grams of word, character, and POS were extracted as the features. The representation models were learned by different classifiers using machine learning (Naïve Bayes, Random Forest, and SVM). The convolutional neural network (CNN), a SoA deep learning model in sentence classification, was tested against the traditional classifiers. The result was observed by combining different representation models and classifiers on three datasets (English, Iban, and KadazanDusun). The best result was achieved when CNN learned embedding-based models with a combination of all features. KadazanDusun achieved the highest accuracy with 95.76%, English with 95.02%, and Iban with 94%..
format	Article
author	Nursyahirah, Tarmizi Suhaila, Saee Dayang Hanani, Abang Ibrahim
author_facet	Nursyahirah, Tarmizi Suhaila, Saee Dayang Hanani, Abang Ibrahim
author_sort	Nursyahirah, Tarmizi
title	TOWARDS CURBING CYBER-BULLYING IN MALAYSIA BY AUTHOR IDENTIFICATION OF IBAN AND KADAZANDUSUN OSN TEXT USING DEEP LEARNING
title_short	TOWARDS CURBING CYBER-BULLYING IN MALAYSIA BY AUTHOR IDENTIFICATION OF IBAN AND KADAZANDUSUN OSN TEXT USING DEEP LEARNING
title_full	TOWARDS CURBING CYBER-BULLYING IN MALAYSIA BY AUTHOR IDENTIFICATION OF IBAN AND KADAZANDUSUN OSN TEXT USING DEEP LEARNING
title_fullStr	TOWARDS CURBING CYBER-BULLYING IN MALAYSIA BY AUTHOR IDENTIFICATION OF IBAN AND KADAZANDUSUN OSN TEXT USING DEEP LEARNING
title_full_unstemmed	TOWARDS CURBING CYBER-BULLYING IN MALAYSIA BY AUTHOR IDENTIFICATION OF IBAN AND KADAZANDUSUN OSN TEXT USING DEEP LEARNING
title_sort	towards curbing cyber-bullying in malaysia by author identification of iban and kadazandusun osn text using deep learning
publisher	Penerbit UTM Press
publishDate	2023
url	http://ir.unimas.my/id/eprint/42630/3/TOWARDS%20CURBING%20-%20Copy.pdf http://ir.unimas.my/id/eprint/42630/ https://journals.utm.my/aej/article/view/19171 https://doi.org/10.11113/aej.v13.19171
_version_	1775627331618996224
score	13.18916

TOWARDS CURBING CYBER-BULLYING IN MALAYSIA BY AUTHOR IDENTIFICATION OF IBAN AND KADAZANDUSUN OSN TEXT USING DEEP LEARNING

Similar Items