Building a Malay-English code-switching subjectivity corpus for sentiment analysis
Combining a local and a foreign language in a single utterance has become the norm in multi-ethnic regions. This phenomenon is known as code-switching. Code-switching has become a new challenge in sentiment analysis when Internet users express their opinions in blogs, reviews and social network sites....
Main Authors: | Kasmuri, Emaliana; Basiron, Halizah |
---|---|
Format: | Article |
Language: | English |
Published: | International Center for Scientific Research and Studies, 2019 |
Online Access: | http://eprints.utem.edu.my/id/eprint/24469/2/ARTICLE-IJASCA2019-BUILDING-A-MALAY-ENGLISH-CODE-SWITCHING-SUBJECTIVITY-CORPUS-FOR-SENTIMENT-ANALYSIS.PDF http://eprints.utem.edu.my/id/eprint/24469/ http://home.ijasca.com/data/documents/8_page-112-130_Building-a-Malay-English-Code-Switching-Subjectivity-Corpus-for-Sentiment-Analysis.pdf |
id |
my.utem.eprints.24469 |
---|---|
record_format |
eprints |
spelling |
my.utem.eprints.24469 2023-07-12T11:22:34Z http://eprints.utem.edu.my/id/eprint/24469/ Building a malay-english code-switching subjectivity corpus for sentiment analysis Kasmuri, Emaliana Basiron, Halizah Combining a local and a foreign language in a single utterance has become the norm in multi-ethnic regions. This phenomenon is known as code-switching. Code-switching has become a new challenge in sentiment analysis when Internet users express their opinions in blogs, reviews and social network sites. Resources for processing code-switching text in sentiment analysis are scarce, especially annotated corpora. This paper develops a guideline for building a code-switching subjectivity corpus for a mix of Malay and English, known as MY-EN-CS. The guideline is suitable for any code-switching textual document. To demonstrate the guideline, this paper builds a new MY-EN-CS corpus. The corpus consists of opinionated and factual sentences constructed from a combination of words from the two languages. The sentences were retrieved from blogs, and the MY-EN-CS sentences were identified and annotated as either opinionated or factual. The annotation task yields a Kappa value of 0.83, which indicates the reliability of the corpus. International Center for Scientific Research and Studies 2019-03 Article PeerReviewed text en http://eprints.utem.edu.my/id/eprint/24469/2/ARTICLE-IJASCA2019-BUILDING-A-MALAY-ENGLISH-CODE-SWITCHING-SUBJECTIVITY-CORPUS-FOR-SENTIMENT-ANALYSIS.PDF Kasmuri, Emaliana and Basiron, Halizah (2019) Building a malay-english code-switching subjectivity corpus for sentiment analysis. International Journal of Advances in Soft Computing and its Applications, 11 (1). pp. 112-130. ISSN 2074-8523 http://home.ijasca.com/data/documents/8_page-112-130_Building-a-Malay-English-Code-Switching-Subjectivity-Corpus-for-Sentiment-Analysis.pdf |
institution |
Universiti Teknikal Malaysia Melaka |
building |
UTEM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Teknikal Malaysia Melaka |
content_source |
UTEM Institutional Repository |
url_provider |
http://eprints.utem.edu.my/ |
language |
English |
description |
Combining a local and a foreign language in a single utterance has become the norm in multi-ethnic regions. This phenomenon is known as code-switching. Code-switching has become a new challenge in sentiment analysis when Internet users express their opinions in blogs, reviews and social network sites. Resources for processing code-switching text in sentiment analysis are scarce, especially annotated corpora. This paper develops a guideline for building a code-switching subjectivity corpus for a mix of Malay and English, known as MY-EN-CS. The guideline is suitable for any code-switching textual document. To demonstrate the guideline, this paper builds a new MY-EN-CS corpus. The corpus consists of opinionated and factual sentences constructed from a combination of words from the two languages. The sentences were retrieved from blogs, and the MY-EN-CS sentences were identified and annotated as either opinionated or factual. The annotation task yields a Kappa value of 0.83, which indicates the reliability of the corpus. |
format |
Article |
author |
Kasmuri, Emaliana Basiron, Halizah |
spellingShingle |
Kasmuri, Emaliana Basiron, Halizah Building a malay-english code-switching subjectivity corpus for sentiment analysis |
author_facet |
Kasmuri, Emaliana Basiron, Halizah |
author_sort |
Kasmuri, Emaliana |
title |
Building a malay-english code-switching subjectivity corpus for sentiment analysis |
title_short |
Building a malay-english code-switching subjectivity corpus for sentiment analysis |
title_full |
Building a malay-english code-switching subjectivity corpus for sentiment analysis |
title_fullStr |
Building a malay-english code-switching subjectivity corpus for sentiment analysis |
title_full_unstemmed |
Building a malay-english code-switching subjectivity corpus for sentiment analysis |
title_sort |
building a malay-english code-switching subjectivity corpus for sentiment analysis |
publisher |
International Center for Scientific Research and Studies |
publishDate |
2019 |
url |
http://eprints.utem.edu.my/id/eprint/24469/2/ARTICLE-IJASCA2019-BUILDING-A-MALAY-ENGLISH-CODE-SWITCHING-SUBJECTIVITY-CORPUS-FOR-SENTIMENT-ANALYSIS.PDF http://eprints.utem.edu.my/id/eprint/24469/ http://home.ijasca.com/data/documents/8_page-112-130_Building-a-Malay-English-Code-Switching-Subjectivity-Corpus-for-Sentiment-Analysis.pdf |
_version_ |
1772816017275224064 |
score |
13.211869 |
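
The description above reports a Kappa value of 0.83 for the annotation task. The record does not say which Kappa statistic was used or how many annotators took part, so the following is only a minimal sketch of how such an agreement score could be computed, assuming two annotators and Cohen's kappa. The function name `cohen_kappa` and the ten example labels are hypothetical and are not taken from the MY-EN-CS corpus.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumption: two annotators and nominal labels; the record itself
    does not state which Kappa variant the authors used.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each label, the product of the two annotators'
    # marginal proportions, summed over all labels.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    all_labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[l] * counts_b[l] for l in all_labels) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical label for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for ten sentences (illustration only, not corpus data).
annotator_1 = ["opinionated", "opinionated", "factual", "factual", "opinionated",
               "factual", "factual", "opinionated", "factual", "factual"]
annotator_2 = ["opinionated", "opinionated", "factual", "factual", "opinionated",
               "factual", "opinionated", "opinionated", "factual", "factual"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

In this toy data the annotators agree on nine of ten sentences, while chance agreement is 0.5, so kappa comes out lower than the raw 90% agreement. That correction for chance is why a Kappa value, rather than plain percentage agreement, is commonly reported as evidence of annotation reliability.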