Building a Malay-English code-switching subjectivity corpus for sentiment analysis

Combining a local and a foreign language in a single utterance has become the norm in multi-ethnic regions. This phenomenon is known as code-switching. Code-switching has become a new challenge in sentiment analysis as Internet users express their opinions in blogs, reviews and social network sites....

Bibliographic Details
Main Authors: Kasmuri, Emaliana; Basiron, Halizah
Format: Article
Language: English
Published: International Center for Scientific Research and Studies 2019
Online Access: http://eprints.utem.edu.my/id/eprint/24469/2/ARTICLE-IJASCA2019-BUILDING-A-MALAY-ENGLISH-CODE-SWITCHING-SUBJECTIVITY-CORPUS-FOR-SENTIMENT-ANALYSIS.PDF
http://eprints.utem.edu.my/id/eprint/24469/
http://home.ijasca.com/data/documents/8_page-112-130_Building-a-Malay-English-Code-Switching-Subjectivity-Corpus-for-Sentiment-Analysis.pdf
id my.utem.eprints.24469
record_format eprints
citation Kasmuri, Emaliana and Basiron, Halizah (2019) Building a Malay-English code-switching subjectivity corpus for sentiment analysis. International Journal of Advances in Soft Computing and its Applications, 11 (1). pp. 112-130. ISSN 2074-8523. Published 2019-03 by International Center for Scientific Research and Studies. Peer reviewed. Record timestamp: 2023-07-12T11:22:34Z.
institution Universiti Teknikal Malaysia Melaka
building UTEM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Teknikal Malaysia Melaka
content_source UTEM Institutional Repository
url_provider http://eprints.utem.edu.my/
language English
description Combining a local and a foreign language in a single utterance has become the norm in multi-ethnic regions. This phenomenon is known as code-switching. Code-switching has become a new challenge in sentiment analysis as Internet users express their opinions in blogs, reviews and social network sites. Resources for processing code-switching text in sentiment analysis are scarce, especially annotated corpora. This paper develops a guideline for building a code-switching subjectivity corpus for a mix of Malay and English, known as MY-EN-CS. The guideline is suitable for any code-switching textual document. To demonstrate the guideline, the paper builds a new MY-EN-CS corpus. The corpus consists of opinionated and factual sentences constructed from a combination of words from the two languages. The sentences were retrieved from blogs, and the MY-EN-CS sentences were identified and annotated as either opinionated or factual. The annotation task yields a Kappa value of 0.83, which indicates the reliability of the corpus.
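The Kappa value quoted in the description is an inter-annotator agreement statistic. The record does not say which variant was used or how many annotators took part, so the following is only a minimal sketch that assumes Cohen's kappa for two annotators and two labels. The function name cohen_kappa and the toy label lists are hypothetical and are not taken from the MY-EN-CS corpus.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each annotator's own label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy annotations: "O" = opinionated, "F" = factual.
annotator_1 = ["O", "O", "F", "F", "O", "F", "O", "F", "F", "O"]
annotator_2 = ["O", "O", "O", "F", "O", "F", "O", "F", "F", "O"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

In this toy case the annotators agree on 9 of 10 sentences (observed agreement 0.9) and the chance agreement is 0.5, so kappa is (0.9 - 0.5) / (1 - 0.5) = 0.8. The statistic corrects raw agreement for the agreement expected by chance, which is why it is reported instead of a plain percentage.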
format Article
author Kasmuri, Emaliana
Basiron, Halizah
title Building a Malay-English code-switching subjectivity corpus for sentiment analysis
publisher International Center for Scientific Research and Studies
publishDate 2019