AN INTEGRATED GENERIC TEXT CLASSIFICATION ALGORITHM FOR INDONESIAN AND MALAY NEWS DOCUMENTS

Bibliographic Details
Main Author: ZUL INDRA
Format: Thesis
Language: English
Published: 2016
Online Access: http://utpedia.utp.edu.my/id/eprint/21420/1/2015%20-IT%20-%20AN%20INTEGRATED%20GENERIC%20TEXT%20CLASSIFICATION%20ALGORITHM%20FOR%20INDONESIAN%20AND%20MALAY%20NEWS%20DOCUMENT%20-%20ZUL%20INDRA%20-%20MASTER.pdf
http://utpedia.utp.edu.my/id/eprint/21420/
Description
Summary: Text classification (TC) provides a better way to organize information since it allows better understanding and interpretation of the content. It deals with the assignment of labels to groups of similar textual documents. However, TC research for Asian-language documents is relatively limited compared to English documents, and even more limited for news articles in particular. Apart from that, TC research on classifying textual documents in languages with similar morphology, such as Indonesian and Malay, is still scarce. Hence, the aim of this study is to develop an integrated generic TC algorithm that is able to identify the language and then classify the category of the identified news documents. Furthermore, a top-ra feature selection method is utilised to improve TC performance and to overcome the challenges of classifying online news corpora: the rapid data growth of online news documents and the high computational time.