An enhanced sequential exception technique for semantic-based text anomaly detection

The detection of semantic-based text anomalies is a research area that has gained considerable attention from the data mining community. Text anomaly detection identifies information that deviates from the general information contained in documents. Text data are characterized by having problems...

Bibliographic Details
Main Author: Taiye, Mohammed Ahmed
Format: Thesis
Language: English
Published: 2019
Subjects: T58.5-58.64 Information technology; QA273-280 Probabilities. Mathematical statistics
Online Access: https://etd.uum.edu.my/8112/1/s900757_01.pdf
https://etd.uum.edu.my/8112/2/s900757_02.pdf
https://etd.uum.edu.my/8112/
id my.uum.etd.8112
record_format eprints
spelling my.uum.etd.8112 2022-05-09T08:13:52Z https://etd.uum.edu.my/8112/ 2019 Thesis NonPeerReviewed text en https://etd.uum.edu.my/8112/1/s900757_01.pdf text en https://etd.uum.edu.my/8112/2/s900757_02.pdf Taiye, Mohammed Ahmed (2019) An enhanced sequential exception technique for semantic-based text anomaly detection. Doctoral thesis, Universiti Utara Malaysia.
institution Universiti Utara Malaysia
building UUM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Utara Malaysia
content_source UUM Electronic Theses
url_provider http://etd.uum.edu.my/
language English
topic T58.5-58.64 Information technology
QA273-280 Probabilities. Mathematical statistics
description The detection of semantic-based text anomalies is a research area that has gained considerable attention from the data mining community. Text anomaly detection identifies information that deviates from the general information contained in documents. Text data are characterized by problems of ambiguity, high dimensionality, sparsity and text representation. If these challenges are not properly resolved, the identification of semantic-based text anomalies will be less accurate. This study proposes an Enhanced Sequential Exception Technique (ESET) to detect semantic-based text anomalies by achieving five objectives: (1) to modify the Sequential Exception Technique (SET) for processing unstructured text; (2) to optimize Cosine Similarity for identifying similar and dissimilar text data; (3) to hybridize the modified SET with Latent Semantic Analysis (LSA); (4) to integrate the Lesk and Selectional Preference algorithms for disambiguating senses and identifying the canonical form of text; and (5) to represent semantic-based text anomalies using First Order Logic (FOL) and a Concept Network Graph (CNG). ESET performs text anomaly detection by employing the optimized Cosine Similarity, hybridizing LSA with the modified SET, and integrating the result with Word Sense Disambiguation algorithms, specifically Lesk and Selectional Preference. FOL and CNG are then proposed to represent the detected semantic-based text anomalies. To demonstrate the feasibility of the technique, experiments were run on four selected datasets: NIPS data, ENRON, Daily Koss blog, and 20Newsgroups. The experimental evaluation showed that ESET significantly improved the accuracy of detecting semantic-based text anomalies in documents. Compared with the benchmarked methods, ESET achieved an improved F1-score on every dataset: NIPS data 0.75, ENRON 0.82, Daily Koss blog 0.93 and 20Newsgroups 0.97. The results generated by ESET proved to be significant and support the growing notion of semantic-based text anomaly that is increasingly evident in the existing literature. Practically, this study contributes to topic modelling and concept coherence for the purposes of information visualization, knowledge sharing and optimized decision making.
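The abstract names Latent Semantic Analysis and Cosine Similarity as two building blocks of ESET but does not give their details in this record. The following is only a minimal sketch of how those two generic techniques can be combined to flag a deviating document; it is not the thesis's ESET, and it leaves out the modified SET, Lesk, and Selectional Preference steps. The function names (`lsa_embed`, `cosine`, `most_anomalous`), the whitespace tokenizer, the raw term counts, and the "lowest mean similarity" rule are all assumptions made for illustration.

```python
import numpy as np

def lsa_embed(docs, k=2):
    """Project documents into a k-dimensional latent space via truncated SVD.

    Assumption: plain whitespace tokenization and raw term counts; a real
    pipeline would normally add stop-word removal and TF-IDF weighting.
    """
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(docs), len(vocab)))
    for row, d in enumerate(docs):
        for w in d.lower().split():
            counts[row, index[w]] += 1
    # Keep only the k largest singular directions (the LSA step).
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    k = min(k, len(s))
    return u[:, :k] * s[:k]

def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def most_anomalous(docs, k=2):
    """Return the index of the document least similar to the rest, plus all scores.

    Hypothetical scoring rule: the mean cosine similarity of each document
    to every other document in the latent space.
    """
    emb = lsa_embed(docs, k)
    scores = []
    for i in range(len(docs)):
        others = np.delete(emb, i, axis=0)
        scores.append(float(np.mean([cosine(emb[i], o) for o in others])))
    return int(np.argmin(scores)), scores

if __name__ == "__main__":
    docs = [
        "neural network training data",
        "neural network model training",
        "network model data training",
        "neural model data network",
        "fresh tomato soup recipe",
    ]
    idx, scores = most_anomalous(docs, k=2)
    print(idx, [round(s, 3) for s in scores])
```

In this toy corpus the recipe sentence shares no vocabulary with the other four documents, so it receives the lowest mean similarity and is the one flagged. On real data the choice of `k` and the scoring rule would need tuning.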
format Thesis
author Taiye, Mohammed Ahmed
title An enhanced sequential exception technique for semantic-based text anomaly detection
title_sort enhanced sequential exception technique for semantic-based text anomaly detection
publishDate 2019
url https://etd.uum.edu.my/8112/1/s900757_01.pdf
https://etd.uum.edu.my/8112/2/s900757_02.pdf
https://etd.uum.edu.my/8112/