Multi-feature fusion framework for automatic sarcasm identification in Twitter data / Christopher Ifeanyi Eke

Bibliographic Details
Main Author: Christopher, Ifeanyi Eke
Format: Thesis
Published: 2021
Subjects:
Online Access: http://studentsrepo.um.edu.my/14518/1/Eke_Christopher.pdf
http://studentsrepo.um.edu.my/14518/2/Christopher.pdf
http://studentsrepo.um.edu.my/14518/
id my.um.stud.14518
record_format eprints
institution Universiti Malaya
building UM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaya
content_source UM Student Repository
url_provider http://studentsrepo.um.edu.my/
topic QA75 Electronic computers. Computer science
ZA4050 Electronic information resources
description Sentiment analysis has recently gained considerable attention in social network research. Its aim is to determine the polarity of the emotion words in an expression, that is, to identify subjective information in source documents. Identifying people’s opinions (sentiments) about products, politics, services, or individuals brings many benefits to organizations. Sarcasm is a type of sentiment in which people express negative emotions using positive or intensified positive words. In a sarcastic utterance, the intended meaning usually differs from the literal meaning of the words used. Various feature engineering techniques, such as Bag-of-Words (BoW), N-grams, and word embeddings, have been investigated for detecting sarcasm in textual data automatically. However, these features lose contextual information because the methods ignore the context of words in the text. Furthermore, training data for sarcastic expressions is sparse: because of the microblog’s word limit, the feature vector that BoW constructs for each sample is mostly zeros. Moreover, many deep learning methods in Natural Language Processing use word embedding learning as the standard approach to feature vector representation. A major drawback of word embeddings is that they do not consider the sentiment polarity of words; consequently, words with opposite polarities can be mapped to nearby vectors. To address these problems and enhance predictive performance in sarcasm identification, a Multi-Feature Fusion Framework with two classification stages is proposed.
The first classification stage uses a lexical feature only, extracted with the BoW technique, and trains five standard classifiers (Support Vector Machine, Decision Tree, K-Nearest Neighbor, Logistic Regression, and Random Forest) to predict sarcastic tendency. In stage two, the extracted lexical feature is fused with the length of the microblog, hashtag, discourse marker, emoticon, syntactic, pragmatic, semantic (GloVe embedding), and sentiment-related features to form a fused feature set, which is modelled using various classifiers, including Support Vector Machine, Decision Tree, K-Nearest Neighbor, Logistic Regression, and Random Forest. The effectiveness of the framework was tested through a series of experiments measuring classifier performance. The evaluation shows that the classification models built on the framework achieved a highest precision of 94.7%, obtained with the Random Forest classifier. Finally, the results were compared with baseline approaches, and the proposed framework attained an average detection precision 11.2% to 27.1% higher than the baseline methods. The comparison shows the significance of the proposed framework for sarcasm identification. Thus, the data sparsity issue can be resolved by selecting discriminative features from the sparse training set before the modelling phase, and bolstering the content-based feature with contextual information can enhance the predictive performance of sarcasm classification in textual data.
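The fusion step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it covers only four of the listed contextual features (microblog length, hashtags, emoticons, discourse markers), it omits the syntactic, pragmatic, GloVe, and sentiment features and the classifiers, and the marker list, emoticon pattern, and all function names are hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Hypothetical marker list and emoticon pattern, for illustration only.
DISCOURSE_MARKERS = {"yeah", "oh", "well", "sure", "right"}
EMOTICON_RE = re.compile(r"[:;=][-']?[)(DPp]")
TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text):
    """Drop emoticons, lowercase the text, and return its word tokens."""
    return TOKEN_RE.findall(EMOTICON_RE.sub(" ", text).lower())


def build_vocabulary(texts):
    """Map each distinct token in the corpus to a column index."""
    vocab = sorted({tok for text in texts for tok in tokenize(text)})
    return {tok: i for i, tok in enumerate(vocab)}


def bow_vector(text, vocab):
    """Lexical (stage-one style) feature: raw term counts per vocabulary word."""
    counts = Counter(tokenize(text))
    return [counts.get(tok, 0) for tok in vocab]


def context_features(text):
    """A small subset of contextual features: length, hashtags, emoticons, markers."""
    tokens = tokenize(text)
    return [
        len(tokens),                                      # microblog length in tokens
        text.count("#"),                                  # number of hashtags
        len(EMOTICON_RE.findall(text)),                   # number of emoticons
        sum(tok in DISCOURSE_MARKERS for tok in tokens),  # discourse markers
    ]


def fused_vector(text, vocab):
    """Stage-two style fusion: concatenate lexical and contextual features."""
    return bow_vector(text, vocab) + context_features(text)


if __name__ == "__main__":
    tweets = ["Oh great, another Monday :) #sarcasm", "I love mondays"]
    vocab = build_vocabulary(tweets)
    for tweet in tweets:
        print(fused_vector(tweet, vocab))
```

Even on this two-tweet toy corpus, most of the BoW columns for each tweet are zero, which is the sparsity the abstract describes; the appended contextual columns are dense and carry signal that does not depend on the vocabulary.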
format Thesis
author Christopher, Ifeanyi Eke
title Multi-feature fusion framework for automatic sarcasm identification in Twitter data / Christopher Ifeanyi Eke
publishDate 2021
2021-10 Thesis NonPeerReviewed application/pdf http://studentsrepo.um.edu.my/14518/1/Eke_Christopher.pdf application/pdf http://studentsrepo.um.edu.my/14518/2/Christopher.pdf Christopher , Ifeanyi Eke (2021) Multi-feature fusion framework for automatic sarcasm identification in Twitter data / Christopher Ifeanyi Eke. PhD thesis, Universiti Malaya. http://studentsrepo.um.edu.my/14518/