Sarcasm detection model based on tweets’ strength using hashtags and non-hashtags sentiment analysis
Main Author: | |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2016 |
Online Access: | http://psasir.upm.edu.my/id/eprint/69401/1/FSKTM%202016%2046%20-%20IR.pdf http://psasir.upm.edu.my/id/eprint/69401/ |
Summary: | Microblogging platforms such as Twitter are becoming more popular day by
day. People use Twitter for building common ground, sharing information
and opinions on a variety of topics, and discussing current issues. Thus,
Twitter has become a source of opinions, and understanding the sentiment of
these opinions is needed.
Over the last decades, sentiment analysis (SA) in social media has been one
of the most researched areas in Natural Language Processing (NLP). The aim of
sentiment analysis is to automatically identify the polarity of a document, where
misinterpreting irony and sarcasm is a big challenge. There is a weak boundary
in meaning between irony, sarcasm and satire; therefore, in this thesis only
the term sarcasm is employed.
Sarcasm is a common phenomenon in social media; it is a nuanced form of
language for expressing the opposite of what is inferred. Sarcasm generally
changes the polarity of an utterance from positive or negative into its opposite.
Therefore, identifying sarcasm correctly can enhance the performance of sentiment
classification. Sarcasm analysis is a difficult task not only for a machine
but also for a human, because of the intentional ambiguity. Although sarcasm
detection has an important effect on sentiment analysis, it is usually ignored in
social media analysis because sarcasm analysis is too complicated.
Several techniques have been used in sarcasm detection, such as semi-supervised
approaches, detection based on intensifiers and exclamations, the impact of lexical
and pragmatic factors, the contrast between positive and negative situation verb
phrases, and hashtag-based sentiment analysis. In this thesis, two existing
works are extended: sarcasm as a contrast between positive sentiment and negative
situation phrases, and hashtag-based sentiment analysis. For the former
task, the authors presented a novel bootstrapping algorithm
that automatically learns a list of positive sentiment phrases and negative
situation phrases from sarcastic tweets. The results showed a contrast between
positive and negative phrases that can be used in recognizing sarcastic tweets.
However, the work identified only one type of sarcastic tweet (i.e. positive verb
phrases followed by negative situation phrases). In addition, it did not
identify sarcasm when a negative situation phrase is followed by positive
sentiment in a separate sentence. Moreover, the intensity of the negativity is
not considered in that work, nor are hashtags and the sentiment analysis of
hashtags. A hashtag is a topic or keyword attached to a tweet. Since many
hashtags carry polarity, detecting sarcasm at the hashtag level will have a
positive effect on polarity classification.
The latter work extended in this thesis is based on hashtag
sentiment analysis. The authors identified sarcastic tweets based on sarcasm
indicators and the contrast between the sentiment orientation of the tweets and
hashtags. Although it was a primary work at the level of hashtag
sentiment analysis, the authors did not use a systematic approach for identifying
sarcasm indicators. Moreover, they relied only on the contrast between the sentiment
orientation of the tweets and hashtags. Since sarcastic utterances contain
hyperbole and exaggeration, and some hashtags are used for emphasizing the
text, identifying sarcasm based only on the contrast between the sentiment of the
tweets and hashtags is not sufficient.
To address these problems, a Sarcasm Detection Model (SDM) is proposed. The
proposed model uses three classifiers: the SentiStrength Sarcasm Classifier (SSC),
the Sarcasm Hashtags Classifier (SHC) and the Hashtags-SentiStrength Sarcasm
Classifier (HSSC). SSC works at the level of non-hashtags sentiment
analysis, whereas SHC and HSSC work at the level of hashtags sentiment analysis.
In the SSC, sarcasm is identified based on the strength level of tweets.
Several lexical and pragmatic features such as emoticons, interjections, capitalized
words and elongated words are applied in the proposed SentiStrength formula.
The Sarcasm Hashtags Classifier (SHC) identifies sarcastic tweets based on
the Sarcasm Hashtags Indicator (SHI) and Sentiment Hashtags Analysis (SHA).
In this classifier, a bootstrapping algorithm is used to identify the Sarcasm
Hashtags Indicator (SHI), a list of hashtags that help to identify
sarcastic tweets easily. In the proposed model (SDM), if a tweet contains an SHI
hashtag, it is labeled as a sarcastic tweet; otherwise the Sentiment Hashtags
Analysis (SHA) is applied. SHA is based on the contrast between the sentiment
orientation of the tweet and its hashtags. In this part, the hashtags are retokenized
through preprocessing and their orientation is identified. Next, the
orientation of the tweet without hashtags is also identified. The tweet is
considered sarcastic if there is a contrast between the orientation of
the tweet and that of its hashtags.
The HSSC works based on the strength level of tweets and hashtags. In this
classifier, the effect of the sentiment of the hashtags in increasing the polarity of
the tweets is considered.
The Sarcasm Detection Model (SDM) has been tested on two datasets,
each containing 3000 sarcastic and non-sarcastic tweets. All of the tweets
were extracted randomly using the Twitter API. So far, no work has been done
in sarcasm detection at the level of hashtags and non-hashtags based sentiment
analysis. The novelty of the proposed model (SDM) is therefore in identifying
sarcastic tweets by analyzing the strength of the tweets at the level of hashtags
and non-hashtags sentiment analysis. The results of the study (0.85% of precision)
demonstrate that the SDM is more accurate and effective than the existing
works, which were based on the contrast between positive and negative
situation phrases and on hashtags-based sentiment analysis. |
---|
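
The SHC decision rule described in the summary (label a tweet sarcastic if it contains an SHI hashtag, otherwise compare the orientation of the hashtags with that of the remaining text) can be sketched roughly as follows. This is a minimal illustration and not the thesis implementation: the indicator set, the two word lists, the camel-case hashtag splitter and the function names are all hypothetical stand-ins. In the thesis, the SHI list is learned by bootstrapping, and the record does not specify the sentiment resources used.

```python
import re

# Hypothetical stand-ins: the thesis learns SHI by bootstrapping and does not
# list its lexicon here; these small hand-written sets are for illustration.
SARCASM_HASHTAG_INDICATORS = {"#sarcasm", "#not"}
POSITIVE_WORDS = {"love", "great", "happy", "best"}
NEGATIVE_WORDS = {"hate", "awful", "worst", "fail"}

HASHTAG_RE = re.compile(r"#\w+")


def retokenize_hashtag(tag):
    """Split a hashtag such as '#WorstDayEver' into lowercase words."""
    words = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", tag.lstrip("#"))
    return [w.lower() for w in words]


def orientation(words):
    """Return +1, -1 or 0 from a simple count of lexicon hits."""
    score = sum((w in POSITIVE_WORDS) - (w in NEGATIVE_WORDS) for w in words)
    return (score > 0) - (score < 0)


def classify_shc(tweet):
    """Return True if the tweet is labeled sarcastic by the sketched rule."""
    hashtags = HASHTAG_RE.findall(tweet)

    # Step 1: an indicator hashtag labels the tweet sarcastic directly.
    if any(tag.lower() in SARCASM_HASHTAG_INDICATORS for tag in hashtags):
        return True

    # Step 2: otherwise compare the orientation of the hashtags with the
    # orientation of the tweet text once the hashtags are removed.
    text_words = re.findall(r"[a-z']+", HASHTAG_RE.sub(" ", tweet).lower())
    tag_words = [w for tag in hashtags for w in retokenize_hashtag(tag)]

    # Opposite non-zero signs mean the text and hashtags contrast.
    return orientation(text_words) * orientation(tag_words) < 0
```

With these toy word lists, `classify_shc("I love waiting in line #WorstDayEver")` is flagged because the text is positive and the hashtag negative, while `classify_shc("I love this song #happy")` is not. A neutral orientation on either side never counts as a contrast in this sketch, which is a simplifying choice rather than something the summary states.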