Senti2Vec: an effective feature extraction technique for sentiment analysis based on Word2Vec

The discovery of an active feature extraction technique has been the focus of many researchers to improve the performance of classification methods, such as for sentiment analysis. Many of them have shown interest in using word embeddings especially Word2Vec as the features for text classification t...

Full description

Saved in:
Bibliographic Details
Main Authors: M.Alshari, Eissa, Azman, Azreen, Doraisamy, Shyamala, Mustapha, Norwati, Alksher, Mostafa
Format: Article
Published: University of Malaya * Faculty of Computer Science and Information Technology 2020
Online Access:http://psasir.upm.edu.my/id/eprint/85796/
https://ejournal.um.edu.my/index.php/MJCS/article/view/25280
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:The discovery of an active feature extraction technique has been the focus of many researchers to improve the performance of classification methods, such as for sentiment analysis. Many of them have shown interest in using word embeddings especially Word2Vec as the features for text classification tasks. Its ability to model high-quality distributional semantics among words has contributed to its success in many of the functions. Despite the success, Word2Vec features are high dimensional that lead to an increase in the complexity of the classifier. In this paper, an effective method for feature extraction based on Word2Vec is proposed for sentiment analysis. The process discovers polarity clusters of the terms in the vocabulary through Word2Vec and opinion lexical dictionary. The features vector for each text is constructed from the polarity clusters, which lead to a lower-dimensional vector to represent the text. This paper also investigates the effect of two opinion lexical dictionaries on the performance of sentiment analysis, and one of the dictionaries are created based on SentiWordNet. The effectiveness of the proposed method is evaluated on the IMDB with two classifiers, namely the Logistic Regression and the Support Vector Machine. The result is promising, showing that the proposed method can be more effective than the baseline approaches.