Staff View: Combination of active learning and self-training for cross-lingual sentiment classification with density analysis of unlabelled samples

In recent years, research in sentiment classification has received considerable attention from natural language processing researchers. Annotated sentiment corpora are the most important resources used in sentiment classification. However, since most recent research in this field has focused on...

Bibliographic Details
Main Authors:	Hajmohammadi, Mohammad Sadegh; Ibrahim, Roliana; Selamat, Ali; Fujita, Hamido
Format:	Article
Published:	Elsevier Inc. 2015
Subjects:	QA76 Computer software
Online Access:	http://eprints.utm.my/id/eprint/58077/ http://dx.doi.org/10.1016/j.ins.2015.04.003

id	my.utm.58077
record_format	eprints
spelling	my.utm.580772022-04-05T04:57:46Z http://eprints.utm.my/id/eprint/58077/ Combination of active learning and self-training for cross-lingual sentiment classification with density analysis of unlabelled samples Hajmohammadi, Mohammad Sadegh Ibrahim, Roliana Selamat, Ali Fujita, Hamido QA76 Computer software In recent years, research in sentiment classification has received considerable attention from natural language processing researchers. Annotated sentiment corpora are the most important resources used in sentiment classification. However, since most recent research in this field has focused on the English language, there are not enough annotated sentiment resources in other languages. Manual construction of reliable annotated sentiment corpora for a new language is a labour-intensive and time-consuming task. Projecting a sentiment corpus from one language into another is a natural solution used in cross-lingual sentiment classification. Automatic machine translation services are the most commonly used tools for directly projecting information from one language into another. However, since term distribution across languages may differ due to variations in linguistic terms and writing styles, cross-lingual methods cannot reach the performance of monolingual methods. In this paper, a novel learning model is proposed based on the combination of uncertainty-based active learning and semi-supervised self-training approaches to incorporate unlabelled sentiment documents from the target language in order to improve the performance of cross-lingual methods. Further, in this model, the density measures of unlabelled examples are considered in the active learning part in order to avoid outlier selection. The empirical evaluation on book review datasets in three different languages shows that the proposed model can significantly improve the performance of cross-lingual sentiment classification in comparison with other existing and baseline methods. Elsevier Inc. 2015 Article PeerReviewed Hajmohammadi, Mohammad Sadegh and Ibrahim, Roliana and Selamat, Ali and Fujita, Hamido (2015) Combination of active learning and self-training for cross-lingual sentiment classification with density analysis of unlabelled samples. Information Sciences, 317 . pp. 67-77. ISSN 0020-0255 http://dx.doi.org/10.1016/j.ins.2015.04.003
institution	Universiti Teknologi Malaysia
building	UTM Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Teknologi Malaysia
content_source	UTM Institutional Repository
url_provider	http://eprints.utm.my/
topic	QA76 Computer software
spellingShingle	QA76 Computer software Hajmohammadi, Mohammad Sadegh Ibrahim, Roliana Selamat, Ali Fujita, Hamido Combination of active learning and self-training for cross-lingual sentiment classification with density analysis of unlabelled samples
description	In recent years, research in sentiment classification has received considerable attention from natural language processing researchers. Annotated sentiment corpora are the most important resources used in sentiment classification. However, since most recent research in this field has focused on the English language, there are not enough annotated sentiment resources in other languages. Manual construction of reliable annotated sentiment corpora for a new language is a labour-intensive and time-consuming task. Projecting a sentiment corpus from one language into another is a natural solution used in cross-lingual sentiment classification. Automatic machine translation services are the most commonly used tools for directly projecting information from one language into another. However, since term distribution across languages may differ due to variations in linguistic terms and writing styles, cross-lingual methods cannot reach the performance of monolingual methods. In this paper, a novel learning model is proposed based on the combination of uncertainty-based active learning and semi-supervised self-training approaches to incorporate unlabelled sentiment documents from the target language in order to improve the performance of cross-lingual methods. Further, in this model, the density measures of unlabelled examples are considered in the active learning part in order to avoid outlier selection. The empirical evaluation on book review datasets in three different languages shows that the proposed model can significantly improve the performance of cross-lingual sentiment classification in comparison with other existing and baseline methods.
format	Article
author	Hajmohammadi, Mohammad Sadegh Ibrahim, Roliana Selamat, Ali Fujita, Hamido
author_facet	Hajmohammadi, Mohammad Sadegh Ibrahim, Roliana Selamat, Ali Fujita, Hamido
author_sort	Hajmohammadi, Mohammad Sadegh
title	Combination of active learning and self-training for cross-lingual sentiment classification with density analysis of unlabelled samples
title_short	Combination of active learning and self-training for cross-lingual sentiment classification with density analysis of unlabelled samples
title_full	Combination of active learning and self-training for cross-lingual sentiment classification with density analysis of unlabelled samples
title_fullStr	Combination of active learning and self-training for cross-lingual sentiment classification with density analysis of unlabelled samples
title_full_unstemmed	Combination of active learning and self-training for cross-lingual sentiment classification with density analysis of unlabelled samples
title_sort	combination of active learning and self-training for cross-lingual sentiment classification with density analysis of unlabelled samples
publisher	Elsevier Inc.
publishDate	2015
url	http://eprints.utm.my/id/eprint/58077/ http://dx.doi.org/10.1016/j.ins.2015.04.003
_version_	1729703224697421824
score	13.15806

Similar Items