Development of multilingual social media data corpus for sentiment classification
The purpose of this study is to develop a corpus covering two languages: Bahasa Indonesia and Bahasa Melayu. The two languages share a number of similar words that have different meanings. The data for the corpus were taken from social media, namely Twitter and Facebook. Each l...
Main Authors: | Rumaisa, Fitrah; Basiron, Halizah; Saaya, Zurina |
---|---|
Format: | Article |
Language: | English |
Published: | Institute of Advanced Scientific Research, Inc., 2019 |
Online Access: | http://eprints.utem.edu.my/id/eprint/24479/2/ARTICLE-FITRAH-JARDCS.PDF http://eprints.utem.edu.my/id/eprint/24479/ https://www.jardcs.org/abstract.php?id=794 |
id |
my.utem.eprints.24479 |
---|---|
record_format |
eprints |
spelling |
my.utem.eprints.24479 2023-07-12T11:24:39Z http://eprints.utem.edu.my/id/eprint/24479/ Development of multilingual social media data corpus for sentiment classification Rumaisa, Fitrah; Basiron, Halizah; Saaya, Zurina The purpose of this study is to develop a corpus covering two languages: Bahasa Indonesia and Bahasa Melayu. The two languages share a number of similar words that have different meanings. The data for the corpus were taken from social media, namely Twitter and Facebook. For each language, 2100 words were collected. After manual selection, 300 words were found to have different meanings. These words form the core of the corpus; the remaining words are set aside. The corpus focuses on the polarity of each word in each language, assigned by automatic annotation. The result is two corpora, one for Bahasa Indonesia and one for Bahasa Melayu. The corpus will be used in subsequent research on sentence-level annotation and will be demonstrated with manual annotation by human annotators. Institute of Advanced Scientific Research, Inc. 2019 Article PeerReviewed text en http://eprints.utem.edu.my/id/eprint/24479/2/ARTICLE-FITRAH-JARDCS.PDF Rumaisa, Fitrah and Basiron, Halizah and Saaya, Zurina (2019) Development of multilingual social media data corpus for sentiment classification. Journal of Advanced Research in Dynamical and Control Systems, 11 (3). pp. 286-293. ISSN 1943-023X https://www.jardcs.org/abstract.php?id=794 |
institution |
Universiti Teknikal Malaysia Melaka |
building |
UTEM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Teknikal Malaysia Melaka |
content_source |
UTEM Institutional Repository |
url_provider |
http://eprints.utem.edu.my/ |
language |
English |
description |
The purpose of this study is to develop a corpus covering two languages: Bahasa Indonesia and Bahasa Melayu. The two languages share a number of similar words that have different meanings. The data for the corpus were taken from social media, namely Twitter and Facebook. For each language, 2100 words were collected. After manual selection, 300 words were found to have different meanings. These words form the core of the corpus; the remaining words are set aside. The corpus focuses on the polarity of each word in each language, assigned by automatic annotation. The result is two corpora, one for Bahasa Indonesia and one for Bahasa Melayu. The corpus will be used in subsequent research on sentence-level annotation and will be demonstrated with manual annotation by human annotators. |
format |
Article |
author |
Rumaisa, Fitrah; Basiron, Halizah; Saaya, Zurina |
author_sort |
Rumaisa, Fitrah |
title |
Development of multilingual social media data corpus for sentiment classification |
publisher |
Institute of Advanced Scientific Research, Inc. |
publishDate |
2019 |
url |
http://eprints.utem.edu.my/id/eprint/24479/2/ARTICLE-FITRAH-JARDCS.PDF http://eprints.utem.edu.my/id/eprint/24479/ https://www.jardcs.org/abstract.php?id=794 |