Staff View: Enhanced ontology-based text classification algorithm for structurally organized documents

Enhanced ontology-based text classification algorithm for structurally organized documents

Text classification (TC) is an important foundation of information retrieval and text mining. The main task of TC is to predict a text's class according to the type of tag given in advance. Most TC algorithms use terms to represent the document, which does not consider the relations among th...

Bibliographic Details
Main Author:	Oleiwi, Suha Sahib
Format:	Thesis
Language:	English
Published:	2015
Subjects:	QA Mathematics QA76.76 Fuzzy System.
Online Access:	http://etd.uum.edu.my/5358/

id	my.uum.etd.5358
record_format	eprints
spelling	my.uum.etd.5358 2021-03-18T08:38:01Z http://etd.uum.edu.my/5358/ Enhanced ontology-based text classification algorithm for structurally organized documents Oleiwi, Suha Sahib QA Mathematics QA76.76 Fuzzy System. Text classification (TC) is an important foundation of information retrieval and text mining. The main task of TC is to predict a text's class according to the type of tag given in advance. Most TC algorithms use terms to represent the document, which does not consider the relations among the terms. These algorithms represent documents in a space where every word is assumed to be a dimension. As a result, such representations generate high dimensionality, which has a negative effect on classification performance. The objectives of this thesis are to formulate algorithms for classifying text by creating suitable feature vectors and reducing the dimension of the data, which will enhance classification accuracy. This research combines ontology and text representation for classification by developing five algorithms. The first and second algorithms, namely Concept Feature Vector (CFV) and Structure Feature Vector (SFV), create feature vectors to represent the document. The third algorithm, Ontology Based Text Classification (OBTC), is designed to reduce the dimensionality of training sets. The fourth and fifth algorithms, Concept Feature Vector_Text Classification (CFV_TC) and Structure Feature Vector_Text Classification (SFV_TC), classify the document into its related set of classes. These proposed algorithms were tested on five different scientific paper datasets downloaded from different digital libraries and repositories. Experimental results obtained from the proposed algorithms CFV_TC and SFV_TC showed better average results in terms of precision, recall, f-measure and accuracy compared with the SVM and RSS approaches. The work in this study contributes to exploring related documents in information retrieval and text mining research by using ontology in TC. 2015 Thesis NonPeerReviewed text en /5358/1/s91731.pdf text en /5358/2/s91731_abstract.pdf Oleiwi, Suha Sahib (2015) Enhanced ontology-based text classification algorithm for structurally organized documents. PhD. thesis, Universiti Utara Malaysia.
institution	Universiti Utara Malaysia
building	UUM Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Utara Malaysia
content_source	UUM Electronic Theses
url_provider	http://etd.uum.edu.my/
language	English
topic	QA Mathematics QA76.76 Fuzzy System.
spellingShingle	QA Mathematics QA76.76 Fuzzy System. Oleiwi, Suha Sahib Enhanced ontology-based text classification algorithm for structurally organized documents
description	Text classification (TC) is an important foundation of information retrieval and text mining. The main task of TC is to predict a text's class according to the type of tag given in advance. Most TC algorithms use terms to represent the document, which does not consider the relations among the terms. These algorithms represent documents in a space where every word is assumed to be a dimension. As a result, such representations generate high dimensionality, which has a negative effect on classification performance. The objectives of this thesis are to formulate algorithms for classifying text by creating suitable feature vectors and reducing the dimension of the data, which will enhance classification accuracy. This research combines ontology and text representation for classification by developing five algorithms. The first and second algorithms, namely Concept Feature Vector (CFV) and Structure Feature Vector (SFV), create feature vectors to represent the document. The third algorithm, Ontology Based Text Classification (OBTC), is designed to reduce the dimensionality of training sets. The fourth and fifth algorithms, Concept Feature Vector_Text Classification (CFV_TC) and Structure Feature Vector_Text Classification (SFV_TC), classify the document into its related set of classes. These proposed algorithms were tested on five different scientific paper datasets downloaded from different digital libraries and repositories. Experimental results obtained from the proposed algorithms CFV_TC and SFV_TC showed better average results in terms of precision, recall, f-measure and accuracy compared with the SVM and RSS approaches. The work in this study contributes to exploring related documents in information retrieval and text mining research by using ontology in TC.
format	Thesis
author	Oleiwi, Suha Sahib
author_facet	Oleiwi, Suha Sahib
author_sort	Oleiwi, Suha Sahib
title	Enhanced ontology-based text classification algorithm for structurally organized documents
title_short	Enhanced ontology-based text classification algorithm for structurally organized documents
title_full	Enhanced ontology-based text classification algorithm for structurally organized documents
title_fullStr	Enhanced ontology-based text classification algorithm for structurally organized documents
title_full_unstemmed	Enhanced ontology-based text classification algorithm for structurally organized documents
title_sort	enhanced ontology-based text classification algorithm for structurally organized documents
publishDate	2015
url	http://etd.uum.edu.my/5358/
_version_	1695533671769964544
score	13.188486

Similar Items