Deviation detection in text using conceptual graph interchange format and error tolerance dissimilarity function

The rapid increase in the amount of textual data has brought forward a growing research interest towards mining text to detect deviations. Specialized methods for specific domains have emerged to satisfy various needs in discovering rare patterns in text. This paper focuses on a graph-based approach...

Bibliographic Details
Main Authors: Kamaruddin, Siti Sakira, Hamdan, Abdul Razak, Abu Bakar, Azuraliza, Mat Nor, Fauzias
Format: Article
Language: English
Published: IOS Press 2012
Subjects: QA76 Computer software
Online Access: http://repo.uum.edu.my/15356/1/ida%252F2012%252F16-3%252FIDA00535.pdf
http://repo.uum.edu.my/15356/
http://dx.doi.org/10.3233/IDA-2012-0535
id my.uum.repo.15356
record_format eprints
spelling my.uum.repo.15356 2015-09-02T01:17:19Z http://repo.uum.edu.my/15356/ IOS Press 2012 Article PeerReviewed application/pdf en cc_by Kamaruddin, Siti Sakira and Hamdan, Abdul Razak and Abu Bakar, Azuraliza and Mat Nor, Fauzias (2012) Deviation detection in text using conceptual graph interchange format and error tolerance dissimilarity function. Intelligent Data Analysis, 16 (3). pp. 487-511. ISSN 1088-467X http://dx.doi.org/10.3233/IDA-2012-0535
institution Universiti Utara Malaysia
building UUM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Utara Malaysia
content_source UUM Institutional Repository
url_provider http://repo.uum.edu.my/
language English
topic QA76 Computer software
description The rapid growth of textual data has spurred research interest in mining text to detect deviations. Specialized methods have emerged for specific domains to meet various needs in discovering rare patterns in text. This paper focuses on a graph-based approach to text representation and presents a novel error tolerance dissimilarity algorithm for deviation detection. We resolve two non-trivial problems: the semantic representation of text and the complexity of graph matching. We employ the conceptual graph interchange format (CGIF), a knowledge representation formalism, to capture the structure and semantics of sentences, and we propose a novel error tolerance dissimilarity algorithm to detect deviations in the CGIFs. We evaluate our method by analyzing real-world financial statements to identify deviating performance indicators. We show that our method performs better than two related text-based graph similarity measuring methods. The proposed method identifies deviating sentences, and its results correlate strongly with expert judgments. Furthermore, it offers error-tolerant matching of CGIFs and retains linear complexity as the number of CGIFs increases.
format Article
author Kamaruddin, Siti Sakira
Hamdan, Abdul Razak
Abu Bakar, Azuraliza
Mat Nor, Fauzias
title Deviation detection in text using conceptual graph interchange format and error tolerance dissimilarity function
publisher IOS Press
publishDate 2012