An expandable Arabic lexicon and valence shifter rules for sentiment analysis on Twitter

Bibliographic Details
Main Author: Ihnaini, Baha' Najim Salman
Format: Thesis
Language: English
Published: 2019
Online Access: https://etd.uum.edu.my/8699/1/s900147_01.pdf
https://etd.uum.edu.my/8699/2/s900147_02.pdf
https://etd.uum.edu.my/8699/3/s900147_references.docx
https://etd.uum.edu.my/8699/
Description
Summary: Sentiment analysis (SA) refers to the computational and natural language processing techniques used to extract subjective information expressed in text. This study addresses three main problems: a) the absence of resources for the Palestinian Arabic dialect (PAL); b) the emergence of new sentiment words, which decreases the performance of sentiment analysis models when they are applied to collected tweets; and c) the handling of valence shifter words, which has not been thoroughly addressed in Arabic sentiment analysis. The study therefore aims to construct a PAL lexicon for Palestinian tweets and to design an Expandable and Up-to-date Lexicon for Arabic (EULA). A new set of valence shifter rules is also constructed to enhance the performance of lexicon-based sentiment analysis on Arabic tweets. The PAL lexicon is built using a phonology matching algorithm, while EULA is constructed by applying a general lexicon to a tweet dataset to find new terms and predict their polarity through a set of linguistic rules. Furthermore, a set of rules is proposed to handle valence shifter words by determining the scope of each shifter and the shifting value it produces. Palestinian and Arabic tweet datasets collected from March to May 2018 are used to evaluate the proposed approach. Experimental results indicate that the proposed PAL lexicon produces better results than other lexicons when tested on the Palestinian dataset. EULA, meanwhile, improves the lexicon-based approach to the point where it is competitive with the machine learning approach. Applying the proposed valence shifter rules increases overall performance by 5% on average. The proposed PAL sentiment lexicon is thus able to handle the Palestinian dialect, EULA overcomes the emergence of new slang words in social media, and the constructed valence shifter rules are capable of handling negation, intensifiers and contrasts to enhance the performance of Arabic sentiment analysis.
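
The general idea of lexicon-based scoring with valence shifters can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the lexicon entries, the shifter lists, the two-token scope window, the shifting values, and the function name score are all assumptions chosen for illustration, and English placeholder tokens stand in for Arabic words.

```python
# Toy tables: hypothetical values for illustration, not taken from the thesis.
LEXICON = {"good": 1.0, "bad": -1.0, "great": 2.0}
NEGATORS = {"not"}
INTENSIFIERS = {"very": 1.5, "slightly": 0.5}
CONTRAST = {"but"}
SCOPE = 2  # assumed scope: a shifter affects sentiment words up to 2 tokens after it


def score(tokens):
    """Sum lexicon polarities after applying simple valence shifter rules."""
    # Contrast rule (assumed): only the clause after the last contrast word counts.
    for i in range(len(tokens) - 1, -1, -1):
        if tokens[i] in CONTRAST:
            tokens = tokens[i + 1:]
            break

    total = 0.0
    for i, tok in enumerate(tokens):
        if tok not in LEXICON:
            continue
        value = LEXICON[tok]
        # Look back over the assumed scope window for shifters.
        for prev in tokens[max(0, i - SCOPE):i]:
            if prev in NEGATORS:
                value = -value  # negation flips polarity
            elif prev in INTENSIFIERS:
                value *= INTENSIFIERS[prev]  # intensifier scales polarity
        total += value
    return total


if __name__ == "__main__":
    for text in ["good", "not good", "not very good", "good but bad"]:
        print(text, "->", score(text.split()))
```

In this sketch the scope of a shifter is a fixed window and the shifting value is a fixed multiplier. Both would need to be replaced by language-specific rules in practice, for example to cover shifters that follow the word they modify.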