Sarcasm detection model based on tweets’ strength using hashtags and non-hashtags sentiment analysis

Bibliographic Details
Main Author: Nadali, Samaneh
Format: Thesis
Language: English
Published: 2016
Online Access: http://psasir.upm.edu.my/id/eprint/69401/1/FSKTM%202016%2046%20-%20IR.pdf
http://psasir.upm.edu.my/id/eprint/69401/
Description
Summary: Microblogging platforms such as Twitter are growing more popular by the day. People use Twitter to build common ground, share information and opinions on a variety of topics, and discuss current issues. Twitter has thus become a source of opinions, and understanding the sentiment of those opinions is needed. Over the last decades, sentiment analysis (SA) in social media has been one of the most active research areas in Natural Language Processing (NLP). The aim of sentiment analysis is to automatically identify the polarity of a document, and misinterpreting irony and sarcasm is a major challenge. The boundary in meaning between irony, sarcasm and satire is weak, so this thesis uses only the term sarcasm. Sarcasm is a common phenomenon in social media: a nuanced form of language for expressing the opposite of what is inferred. Sarcasm generally changes the polarity of an utterance from positive or negative into its opposite. Identifying sarcasm correctly can therefore enhance the performance of sentiment classification. Sarcasm analysis is difficult not only for machines but also for humans, because of the intentional ambiguity. Although sarcasm detection has an important effect on sentiment analysis, it is usually ignored in social media analysis because it is too complicated. Several techniques have been used in sarcasm detection, such as semi-supervised approaches, detection based on intensifiers and exclamations, the impact of lexical and pragmatic factors, the contrast between positive and negative situation verb phrases, and hashtag-based sentiment analysis. This thesis extends two existing works: sarcasm as a contrast between positive sentiment and negative situation phrases, and hashtag-based sentiment analysis.
For the former, the authors presented a novel bootstrapping algorithm that automatically learns lists of positive sentiment phrases and negative situation phrases from sarcastic tweets. Their results showed a contrast between positive and negative phrases that can be used to recognize sarcastic tweets. However, the work identified only one type of sarcastic tweet (i.e. positive verb phrases followed by negative situation phrases). In addition, it did not identify sarcasm when a negative situation phrase is followed by positive sentiment in a separate sentence. Moreover, it considered neither the intensity of the negativity nor hashtags and their sentiment. A hashtag is a topic or keyword attached to a tweet. Since many hashtags carry polarity, detecting sarcasm at the hashtag level will have a positive effect on polarity classification. The latter work extended in this thesis is based on hashtag sentiment analysis. Its authors identified sarcastic tweets based on sarcasm indicators and the contrast between the sentiment orientation of the tweets and that of the hashtags. Although it was a primary work at the level of hashtag sentiment analysis, it did not use a systematic approach for identifying sarcasm indicators. Moreover, it relied only on the contrast between the sentiment orientation of the tweets and the hashtags. Since sarcastic utterances contain hyperbole and exaggeration, and some hashtags are used to emphasize the text, identification based only on the contrast between the sentiment of the tweets and the hashtags is not sufficient. To address these problems, a Sarcasm Detection Model (SDM) is proposed. The proposed model uses three classifiers: the SentiStrength Sarcasm Classifier (SSC), the Sarcasm Hashtags Classifier (SHC) and the Hashtags-SentiStrength Sarcasm Classifier (HSSC).
SSC works at the level of non-hashtag sentiment analysis, whereas SHC and HSSC work at the level of hashtag sentiment analysis. In SSC, sarcasm is identified based on the strength level of tweets. Several lexical and pragmatic features, such as emoticons, interjections, capitalized words and elongated words, are applied in the proposed SentiStrength formula. The Sarcasm Hashtags Classifier (SHC) identifies sarcastic tweets based on the Sarcasm Hashtags Indicator (SHI) and Sentiment Hashtags Analysis (SHA). In SHC, a bootstrapping algorithm is used to identify the SHI, a list of hashtags that help to identify sarcastic tweets easily. In the proposed model (SDM), if a tweet contains an SHI hashtag, it is labeled as a sarcastic tweet; otherwise SHA is applied. SHA is based on the contrast between the sentiment orientation of the tweets and that of the hashtags. In this part, the hashtags are retokenized through preprocessing and their orientation is identified. Next, the orientation of the tweet without its hashtags is also identified. The tweet is considered sarcastic if there is a contrast between the orientation of the tweet and that of the hashtags. HSSC is based on the strength level of tweets and hashtags. This classifier considers the effect of the hashtags' sentiment in increasing the polarity of the tweets. The Sarcasm Detection Model (SDM) has been tested on two datasets, each containing 3000 sarcastic and non-sarcastic tweets. All of the tweets were extracted randomly using the Twitter API. So far, no work has been done on sarcasm detection at the level of both hashtag and non-hashtag sentiment analysis. The novelty of the proposed model (SDM) therefore lies in identifying sarcastic tweets by analyzing the strength of the tweets at the level of hashtag and non-hashtag sentiment analysis.
The results of the study (0.85% of precision) demonstrate that the SDM is more accurate and effective than the existing works, which were based on the contrast between positive and negative situation phrases and on hashtag-based sentiment analysis.
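
The decision flow described for the Sarcasm Hashtags Classifier (SHC) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the indicator list, the two word lists, the CamelCase hashtag splitter and all function names are hypothetical stand-ins, since the summary does not specify the actual lexicons, preprocessing or bootstrapping procedure.

```python
import re

# Hypothetical toy resources; the thesis learns its indicator list by
# bootstrapping and its sentiment lexicon is not given in the summary.
SARCASM_HASHTAG_INDICATORS = {"sarcasm", "sarcastic"}
POSITIVE_WORDS = {"love", "great", "best", "awesome"}
NEGATIVE_WORDS = {"hate", "worst", "awful", "terrible"}

HASHTAG_RE = re.compile(r"#(\w+)")


def split_hashtag(tag):
    """Retokenize a hashtag on CamelCase boundaries (an assumed heuristic)."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", tag)
    return [p.lower() for p in parts] or [tag.lower()]


def orientation(tokens):
    """Return +1, -1 or 0 from simple lexicon counts."""
    score = sum((t in POSITIVE_WORDS) - (t in NEGATIVE_WORDS) for t in tokens)
    return (score > 0) - (score < 0)


def classify_shc(tweet):
    """Return True if the tweet is labeled sarcastic by the sketched SHC flow."""
    hashtags = HASHTAG_RE.findall(tweet)

    # Step 1 (SHI): a known sarcasm-indicator hashtag labels the tweet directly.
    if any(h.lower() in SARCASM_HASHTAG_INDICATORS for h in hashtags):
        return True

    # Step 2 (SHA): compare the orientation of the hashtags with the
    # orientation of the tweet text once the hashtags are removed.
    hashtag_tokens = [tok for h in hashtags for tok in split_hashtag(h)]
    text_tokens = re.findall(r"[a-z']+", HASHTAG_RE.sub(" ", tweet).lower())
    return orientation(text_tokens) * orientation(hashtag_tokens) == -1
```

Under these toy lexicons, `classify_shc("I love Mondays #WorstDayEver")` is labeled sarcastic because the text is positive while the retokenized hashtag is negative, whereas a tweet whose text and hashtags agree in orientation is not.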