Improved relative discriminative criterion using rare and informative terms and ringed seal search-support vector machine techniques for text classification

Classification has become an important task for automatically assigning documents to their respective categories. For text classification, feature selection techniques are normally used to identify important features and to remove irrelevant and noisy features for minimizing the dimensionalit...

Bibliographic Details
Main Author: Sharif, Wareesa
Format: Thesis
Language: English
Published: 2019
Subjects: QA76 Computer software
Online Access: http://eprints.uthm.edu.my/135/1/24p%20WAREESA%20SHARIF.pdf
http://eprints.uthm.edu.my/135/2/WAREESA%20SHARIF%20COPYRIGHT%20DECLARATION.pdf
http://eprints.uthm.edu.my/135/3/WAREESA%20SHARIF%20WATERMARK.pdf
http://eprints.uthm.edu.my/135/
id my.uthm.eprints.135
record_format eprints
spelling my.uthm.eprints.135 2021-07-05T02:26:42Z http://eprints.uthm.edu.my/135/ Sharif, Wareesa QA76 Computer software 2019-12 Thesis NonPeerReviewed Sharif, Wareesa (2019) Improved relative discriminative criterion using rare and informative terms and ringed seal search-support vector machine techniques for text classification. Doctoral thesis, Universiti Tun Hussein Onn Malaysia.
institution Universiti Tun Hussein Onn Malaysia
building UTHM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Tun Hussein Onn Malaysia
content_source UTHM Institutional Repository
url_provider http://eprints.uthm.edu.my/
language English
topic QA76 Computer software
description Classification has become an important task for automatically assigning documents to their respective categories. For text classification, feature selection techniques are normally used to identify important features and to remove irrelevant and noisy features, reducing the dimensionality of the feature space. These techniques are expected to improve the efficiency, accuracy, and comprehensibility of classification models in text labeling problems. Most feature selection techniques use document and term frequencies to rank a term. Existing feature selection techniques (e.g. RDC, NRDC) consider frequently occurring terms and ignore the counts of rarely occurring terms in a class. This study proposes the Improved Relative Discriminative Criterion (IRDC) technique, which also considers the counts of rarely occurring terms. It is argued that rarely occurring terms are as meaningful and important as frequently occurring terms in a class. The proposed IRDC is compared to the most recent feature selection techniques, RDC and NRDC. The results reveal a significant improvement by the proposed IRDC technique for feature selection in terms of precision (27%), recall (30%), macro-average (35%), and micro-average (30%). This study also proposes a hybrid algorithm, Ringed Seal Search-Support Vector Machine (RSS-SVM), to improve the generalization and learning capability of the SVM. The proposed RSS-SVM optimizes the kernel and penalty parameters using the RSS algorithm. The proposed RSS-SVM is compared to the most recent techniques, GA-SVM and CS-SVM. The results show a significant improvement by the proposed RSS-SVM for classification in terms of accuracy (18.8%), recall (15.68%), precision (15.62%), and specificity (13.69%). In conclusion, the proposed IRDC performs better than existing techniques because of its capability to consider rare and informative terms, and the proposed RSS-SVM performs better than existing techniques because of its capability to improve the balance between exploration and exploitation.
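The description reports macro-average and micro-average figures without saying which measure is averaged. As a minimal sketch, assuming the averaged measure is F1 (an assumption, not stated in the record), the two averages can be computed as below. The function names and the toy labels are hypothetical and are not taken from the thesis.

```python
from collections import Counter


def per_class_counts(y_true, y_pred):
    """Return {label: (tp, fp, fn)} for every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # label p was predicted but is wrong
            fn[t] += 1  # true label t was missed
    return {c: (tp[c], fp[c], fn[c]) for c in labels}


def f1(tp, fp, fn):
    """F1 from raw counts; taken as 0.0 when the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(counts):
    """Unweighted mean of per-class F1: every class weighs the same."""
    return sum(f1(*c) for c in counts.values()) / len(counts)


def micro_f1(counts):
    """F1 over pooled counts: every document weighs the same."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f1(tp, fp, fn)


# Hypothetical toy data: one frequent class and one rare class.
y_true = ["sport"] * 8 + ["law"] * 2
y_pred = ["sport"] * 8 + ["sport", "law"]
counts = per_class_counts(y_true, y_pred)
print("macro-F1:", round(macro_f1(counts), 3))
print("micro-F1:", round(micro_f1(counts), 3))
```

On this toy data the macro average is lower than the micro average, because the single error falls on the rare class and macro averaging gives that class the same weight as the frequent one.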
format Thesis
author Sharif, Wareesa
title Improved relative discriminative criterion using rare and informative terms and ringed seal search-support vector machine techniques for text classification
publishDate 2019