Performance evaluation of hybrid feature selection technique for sentiment classification based on food reviews

Bibliographic Details
Main Authors: Awang, Suryanti, Mohd Nafis, Nur Syafiqah
Format: Conference or Workshop Item
Language: English
Published: IEEE 2021
Online Access: http://umpir.ump.edu.my/id/eprint/33473/1/Performance%20evaluation%20of%20hybrid%20feature%20selection%20technique_FULL.pdf
http://umpir.ump.edu.my/id/eprint/33473/2/Performance%20evaluation%20of%20hybrid%20feature%20selection%20technique.pdf
http://umpir.ump.edu.my/id/eprint/33473/
https://doi.org/10.1109/ICSECS52883.2021.00038
Description
Summary: This paper evaluates the sentiment classification performance of a hybrid feature selection technique. The technique addresses the lack of feature importance evaluation by combining Term Frequency-Inverse Document Frequency (TF-IDF) with Support Vector Machine-Recursive Feature Elimination (SVM-RFE), referred to as TF-IDF+SVM-RFE. Feature importance is measured, and significant features are selected recursively until a specified number of significant features, known as the k-top features, remains. The technique was tested on a food reviews dataset from Kaggle to classify reviews as positive or negative, with an SVM deployed as the classifier to evaluate classification performance. Performance was measured by accuracy, precision, recall and F-measure; the highest values obtained were 80% accuracy, 82% precision, 76% recall and 79% F-measure. These results were obtained with 24.5% fewer features to classify. This reduction allows computational resources to be used more efficiently while classification performance is maintained.