Staff View: Enhancement of feature sets for subjectivity analysis on Malay-English code-switching text

Enhancement of feature sets for subjectivity analysis on Malay-English code-switching text

A code-switching sentence is a sentence constructed using two or more languages. Multilingual speakers commonly use code-switching sentences to share objective and subjective textual information on public platforms such as blogs and social media. Classifying a voluminous code-sw...

Bibliographic Details
Main Author:	Kasmuri, Emaliana
Format:	Thesis
Language:	English
Published:	2023
Subjects:	Q Science (General)
Online Access:	http://eprints.utem.edu.my/id/eprint/26959/1/Enhancement%20of%20feature%20sets%20for%20subjectivity%20analysis%20on%20Malay-English%20code-switching%20text.pdf http://eprints.utem.edu.my/id/eprint/26959/2/Enhancement%20of%20feature%20sets%20for%20subjectivity%20analysis%20on%20Malay-English%20code-switching%20text.pdf http://eprints.utem.edu.my/id/eprint/26959/ https://plh.utem.edu.my/cgi-bin/koha/opac-detail.pl?biblionumber=122648

id	my.utem.eprints.26959
record_format	eprints
spelling	my.utem.eprints.26959 2023-09-14T14:39:00Z http://eprints.utem.edu.my/id/eprint/26959/ Enhancement of feature sets for subjectivity analysis on Malay-English code-switching text Kasmuri, Emaliana Q Science (General) A code-switching sentence is a sentence constructed using two or more languages. Multilingual speakers commonly use code-switching sentences to share objective and subjective textual information on public platforms such as blogs and social media. Classifying voluminous code-switching text into subjective and objective classes poses a new challenge to current solutions for subjectivity analysis. Current solutions are designed to process only monolingual text, so subjective code-switching text is ignored. This omission limits their capability to generate accurate subjectivity analysis results on code-switching text. Therefore, this research aims to find a set of solutions for subjectivity analysis on code-switching text. The research process begins by addressing the absence of a subjectivity code-switching corpus. A subjective Malay-English code-switching corpus was built. The corpus contains 35,067 Malay-English code-switching sentences harvested from Malay-English blog posts. Each sentence was annotated with either a subjective or an objective label. The research process continues with designing feature sets that represent the subjectivity of the Malay-English code-switching sentences in the corpus. The feature sets were enhanced from the subjective monolingual feature set, which was initially designed to represent the subjectivity of English text. The initial subjective monolingual feature set consists of pronoun, adjective, cardinal number, modal and adverb. The enhanced feature sets consist of three feature sets: the embedded code-switching feature set, the unified code-switching feature set and the stylistic feature set.
The embedded code-switching feature set uses the initial monolingual feature set for English and embeds Malay-language features in it. In the unified code-switching feature set, the extracted Malay and English features were unified using an adapted algorithm known as the Malay-English Unified POS. The algorithm predicts the type of each word in a code-switching sentence according to the language of the word. In the stylistic feature set, emoticons, interjections, signs of subjectivity such as exclamation marks, and words with exaggerated spelling were extracted to represent the subjectivity in the code-switching sentences. The effectiveness of the enhanced feature sets was evaluated using the Malay-English code-switching subjectivity corpus as the data set and two machine learning classifiers, Naïve Bayes and Support Vector Machine. 10-fold cross-validation was used on different experimental settings and combinations of feature sets to obtain the performance of the enhanced feature sets. The combination of the unified code-switching and stylistic feature sets outperformed the other feature sets. The combination consistently achieved an accuracy of 59% using both machine learning classifiers. The consistent performance indicates that the combined feature sets are a viable solution for subjectivity analysis on Malay-English code-switching text. 2023 Thesis NonPeerReviewed text en http://eprints.utem.edu.my/id/eprint/26959/1/Enhancement%20of%20feature%20sets%20for%20subjectivity%20analysis%20on%20Malay-English%20code-switching%20text.pdf text en http://eprints.utem.edu.my/id/eprint/26959/2/Enhancement%20of%20feature%20sets%20for%20subjectivity%20analysis%20on%20Malay-English%20code-switching%20text.pdf Kasmuri, Emaliana (2023) Enhancement of feature sets for subjectivity analysis on Malay-English code-switching text. Doctoral thesis, Universiti Teknikal Malaysia Melaka.
https://plh.utem.edu.my/cgi-bin/koha/opac-detail.pl?biblionumber=122648
institution	Universiti Teknikal Malaysia Melaka
building	UTEM Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Teknikal Malaysia Melaka
content_source	UTEM Institutional Repository
url_provider	http://eprints.utem.edu.my/
language	English
topic	Q Science (General)
spellingShingle	Q Science (General) Kasmuri, Emaliana Enhancement of feature sets for subjectivity analysis on Malay-English code-switching text
description	A code-switching sentence is a sentence constructed using two or more languages. Multilingual speakers commonly use code-switching sentences to share objective and subjective textual information on public platforms such as blogs and social media. Classifying voluminous code-switching text into subjective and objective classes poses a new challenge to current solutions for subjectivity analysis. Current solutions are designed to process only monolingual text, so subjective code-switching text is ignored. This omission limits their capability to generate accurate subjectivity analysis results on code-switching text. Therefore, this research aims to find a set of solutions for subjectivity analysis on code-switching text. The research process begins by addressing the absence of a subjectivity code-switching corpus. A subjective Malay-English code-switching corpus was built. The corpus contains 35,067 Malay-English code-switching sentences harvested from Malay-English blog posts. Each sentence was annotated with either a subjective or an objective label. The research process continues with designing feature sets that represent the subjectivity of the Malay-English code-switching sentences in the corpus. The feature sets were enhanced from the subjective monolingual feature set, which was initially designed to represent the subjectivity of English text. The initial subjective monolingual feature set consists of pronoun, adjective, cardinal number, modal and adverb. The enhanced feature sets consist of three feature sets: the embedded code-switching feature set, the unified code-switching feature set and the stylistic feature set. The embedded code-switching feature set uses the initial monolingual feature set for English and embeds Malay-language features in it.
In the unified code-switching feature set, the extracted Malay and English features were unified using an adapted algorithm known as the Malay-English Unified POS. The algorithm predicts the type of each word in a code-switching sentence according to the language of the word. In the stylistic feature set, emoticons, interjections, signs of subjectivity such as exclamation marks, and words with exaggerated spelling were extracted to represent the subjectivity in the code-switching sentences. The effectiveness of the enhanced feature sets was evaluated using the Malay-English code-switching subjectivity corpus as the data set and two machine learning classifiers, Naïve Bayes and Support Vector Machine. 10-fold cross-validation was used on different experimental settings and combinations of feature sets to obtain the performance of the enhanced feature sets. The combination of the unified code-switching and stylistic feature sets outperformed the other feature sets. The combination consistently achieved an accuracy of 59% using both machine learning classifiers. The consistent performance indicates that the combined feature sets are a viable solution for subjectivity analysis on Malay-English code-switching text.
format	Thesis
author	Kasmuri, Emaliana
author_facet	Kasmuri, Emaliana
author_sort	Kasmuri, Emaliana
title	Enhancement of feature sets for subjectivity analysis on Malay-English code-switching text
title_short	Enhancement of feature sets for subjectivity analysis on Malay-English code-switching text
title_full	Enhancement of feature sets for subjectivity analysis on Malay-English code-switching text
title_fullStr	Enhancement of feature sets for subjectivity analysis on Malay-English code-switching text
title_full_unstemmed	Enhancement of feature sets for subjectivity analysis on Malay-English code-switching text
title_sort	enhancement of feature sets for subjectivity analysis on malay-english code-switching text
publishDate	2023
url	http://eprints.utem.edu.my/id/eprint/26959/1/Enhancement%20of%20feature%20sets%20for%20subjectivity%20analysis%20on%20Malay-English%20code-switching%20text.pdf http://eprints.utem.edu.my/id/eprint/26959/2/Enhancement%20of%20feature%20sets%20for%20subjectivity%20analysis%20on%20Malay-English%20code-switching%20text.pdf http://eprints.utem.edu.my/id/eprint/26959/ https://plh.utem.edu.my/cgi-bin/koha/opac-detail.pl?biblionumber=122648
_version_	1778166431741902848
score	13.211869
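
Illustrative sketch (not taken from the thesis): the stylistic feature set named in the description counts emoticons, interjections, exclamation marks and words with exaggerated spelling. A minimal Python version of that idea might look like the following. The regular expressions, the small mixed Malay-English interjection list and the function name `stylistic_features` are all assumptions made for illustration; the record does not state the lexicons or rules the author actually used.

```python
import re

# Assumed pattern for simple Western-style emoticons such as :) ;-( =D
EMOTICON_RE = re.compile(r"[:;=][\-o']?[)(\]\[dDpP/\\|]")

# Assumed rule for exaggerated spelling: a letter repeated three or more times
ELONGATED_RE = re.compile(r"([a-z])\1{2,}")

# Assumed, tiny interjection list mixing English and Malay words
INTERJECTIONS = {"wow", "oh", "haha", "alamak", "aduh", "wah"}


def stylistic_features(sentence: str) -> dict:
    """Count surface cues of subjectivity in one sentence."""
    tokens = re.findall(r"\w+", sentence.lower())
    return {
        "emoticons": len(EMOTICON_RE.findall(sentence)),
        "exclamations": sentence.count("!"),
        "interjections": sum(1 for t in tokens if t in INTERJECTIONS),
        "elongated_words": sum(1 for t in tokens if ELONGATED_RE.search(t)),
    }


if __name__ == "__main__":
    print(stylistic_features("Wah, makanan ni sedap gilerrr!!! :)"))
    print(stylistic_features("The meeting starts at 10 pagi."))
```

Counts like these could be concatenated with part-of-speech counts to form a feature vector for a classifier; how the thesis combines them is not described in this record.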
