Indonesian Enhanced Bracewell's Text Classification Method For Indonesian News Documents


Bibliographic Details
Main Author: KUSUMAAGAMA FUDDOLY, AINI RACHMANIA
Format: Thesis
Language: English
Published: 2014
Online Access: http://utpedia.utp.edu.my/15129/1/Thesis%20Final%20-%20AINI%20RACHMANIA.pdf
http://utpedia.utp.edu.my/15129/
Description
Summary: Text classification is a popular research field in computer science. It deals with assigning labels to groups of similar text documents. However, very few approaches focus on the distinctive characteristics of news corpora, and fewer still on Indonesian news documents. In addition, only a few aim at both categorising documents and identifying their topics. This study addresses four problems in text classification for online news: the large volume of data, sparsely distributed articles, the classification of unseen data, and the limited number of text classification approaches for Indonesian news documents. Category classification is performed using a likelihood calculation, whereas topic identification uses a cosine similarity calculation. Two sets of data were used in the experiments: a training corpus and a testing corpus. The training corpus consists of 900 documents and serves as the learning material for the classifier. The testing corpus consists of 455 documents and is used to measure the accuracy of the classifier. Classification was conducted offline and online using an Indonesian online news dataset from 2011 to 2012. The enhanced method achieved an accuracy of up to 93.84% for category classification and 95.64% for topic identification. In terms of computational time, the results show that the proposed classifier works optimally at n = 20, with an average computational time of 2.81 seconds. Compared with human evaluation, the integrated method performed better by 13%. An in-depth study was also conducted to investigate the human annotators' responses to the experimental process. These results indicate that the enhanced method has an advantage over manual classification and is suitable for Indonesian news classification.
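The summary names cosine similarity as the calculation behind topic identification but does not say how documents are turned into vectors. The following is a minimal sketch of that general technique, not the thesis's implementation. It assumes raw term-frequency vectors built by lowercasing and whitespace splitting; the function names (`term_frequencies`, `cosine_similarity`, `identify_topic`) and the sample topic texts are hypothetical and chosen for illustration only.

```python
import math
from collections import Counter


def term_frequencies(text):
    """Count tokens after lowercasing and whitespace splitting (assumed tokenization)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


def identify_topic(document, topics):
    """Return the label of the topic text most similar to the document."""
    doc_vec = term_frequencies(document)
    return max(
        topics,
        key=lambda label: cosine_similarity(doc_vec, term_frequencies(topics[label])),
    )


# Invented sample data, not taken from the thesis corpus.
topics = {
    "banjir": "banjir jakarta hujan deras",
    "pemilu": "pemilu presiden partai suara",
}
best = identify_topic("hujan deras memicu banjir di jakarta", topics)
```

Here `best` is the label whose reference text shares the most weighted vocabulary with the input document. A real system would likely add stemming, stop-word removal, or TF-IDF weighting, none of which the summary specifies.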