Staff View: Exploring Canonical Data Model for Text Clustering (S/O 12828)

Exploring Canonical Data Model for Text Clustering (S/O 12828)

The abundance of text data has been witnessed with the growth of the web and other text repositories. There is an important need to provide improved mechanisms to effectively represent and retrieve text data. This paper advocates the construction of canonical data models for mapping contents of multi do...

Bibliographic Details
Main Authors:	Kamaruddin, Siti Sakira; Yusof, Yuhanis
Format:	Monograph
Language:	English
Published:	UUM
Subjects:	T Technology (General)
Online Access:	https://repo.uum.edu.my/id/eprint/31505/1/12828.pdf https://repo.uum.edu.my/id/eprint/31505/

id	my.uum.repo.31505
record_format	eprints
spelling	my.uum.repo.31505 2024-11-18T08:46:44Z https://repo.uum.edu.my/id/eprint/31505/ Exploring Canonical Data Model for Text Clustering (S/O 12828) Kamaruddin, Siti Sakira Yusof, Yuhanis T Technology (General) The abundance of text data has been witnessed with the growth of the web and other text repositories. There is an important need to provide improved mechanisms to effectively represent and retrieve text data. This paper advocates the construction of canonical data models for mapping the contents of multiple documents into a few general models that can represent the corpus. However, constructing a canonical data model for text involves non-trivial text mining techniques prior to the actual construction process. Furthermore, constructing canonical data models for all terms in a set of documents will be costly and will not reduce the sparsity problem that is associated with text document processing. To solve this problem, we propose a two-tier dimensionality reduction step adopting commonly used feature extraction and feature selection methods. The reduced features are then used to construct a canonical data model. A canonical data model for text documents can be used as a general model that has the potential to act as a reference model for text comparison in a wide variety of text mining tasks such as text clustering, text classification, text summarization and text deviation detection. Experimental results reveal that the proposed approach produces better results compared to methods without a canonical data model. UUM Monograph NonPeerReviewed application/pdf en https://repo.uum.edu.my/id/eprint/31505/1/12828.pdf Kamaruddin, Siti Sakira and Yusof, Yuhanis Exploring Canonical Data Model for Text Clustering (S/O 12828). Project Report. UUM. (Submitted)
institution	Universiti Utara Malaysia
building	UUM Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Utara Malaysia
content_source	UUM Institutional Repository
url_provider	http://repo.uum.edu.my/
language	English
topic	T Technology (General)
spellingShingle	T Technology (General) Kamaruddin, Siti Sakira Yusof, Yuhanis Exploring Canonical Data Model for Text Clustering (S/O 12828)
description	The abundance of text data has been witnessed with the growth of the web and other text repositories. There is an important need to provide improved mechanisms to effectively represent and retrieve text data. This paper advocates the construction of canonical data models for mapping the contents of multiple documents into a few general models that can represent the corpus. However, constructing a canonical data model for text involves non-trivial text mining techniques prior to the actual construction process. Furthermore, constructing canonical data models for all terms in a set of documents will be costly and will not reduce the sparsity problem that is associated with text document processing. To solve this problem, we propose a two-tier dimensionality reduction step adopting commonly used feature extraction and feature selection methods. The reduced features are then used to construct a canonical data model. A canonical data model for text documents can be used as a general model that has the potential to act as a reference model for text comparison in a wide variety of text mining tasks such as text clustering, text classification, text summarization and text deviation detection. Experimental results reveal that the proposed approach produces better results compared to methods without a canonical data model.
format	Monograph
author	Kamaruddin, Siti Sakira Yusof, Yuhanis
author_facet	Kamaruddin, Siti Sakira Yusof, Yuhanis
author_sort	Kamaruddin, Siti Sakira
title	Exploring Canonical Data Model for Text Clustering (S/O 12828)
title_short	Exploring Canonical Data Model for Text Clustering (S/O 12828)
title_full	Exploring Canonical Data Model for Text Clustering (S/O 12828)
title_fullStr	Exploring Canonical Data Model for Text Clustering (S/O 12828)
title_full_unstemmed	Exploring Canonical Data Model for Text Clustering (S/O 12828)
title_sort	exploring canonical data model for text clustering (s/o 12828)
publisher	UUM
url	https://repo.uum.edu.my/id/eprint/31505/1/12828.pdf https://repo.uum.edu.my/id/eprint/31505/
_version_	1816134263486021632
score	13.214268
