Support Vector Machines (SVM) in Test Extraction

Text categorization is the process of grouping documents or words into predefined categories. Each category consists of documents or words with similar attributes. Numerous algorithms address text categorization, including Naive Bayes, k-nearest-neighbor classifier, an...

Bibliographic Details
Main Author: Ghazali, Nadirah
Format: Final Year Project
Language: English
Published: Universiti Teknologi PETRONAS 2006
Subjects: T Technology (General)
Online Access: http://utpedia.utp.edu.my/9323/1/2006%20-%20Support%20Vector%20Machine%20%28SVM%29%20in%20Test%20Extraction.pdf
http://utpedia.utp.edu.my/9323/
citation Ghazali, Nadirah (2006) Support Vector Machines (SVM) in Test Extraction. Universiti Teknologi PETRONAS, 2006-11. Final Year Project. NonPeerReviewed. (Unpublished)
institution Universiti Teknologi Petronas
building UTP Resource Centre
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Teknologi Petronas
content_source UTP Electronic and Digitized Intellectual Asset
url_provider http://utpedia.utp.edu.my/
language English
topic T Technology (General)
description Text categorization is the process of grouping documents or words into predefined categories. Each category consists of documents or words with similar attributes. Numerous algorithms address text categorization, including Naive Bayes, the k-nearest-neighbor classifier, and decision trees. In this project, Support Vector Machines (SVM) are studied and tested through the implementation of a textual extractor. The algorithm extracts important points from a lengthy document: it classifies each word in the document under its relevant category and builds the structure of the summary from the categorized words. The extractor's performance is evaluated against an existing summarizer, which uses a different approach, using a similar corpus. Summarization is part of text categorization; it is considered essential in today's information-led society and has been a growing area of research for over 40 years. This project's objective is to create a summarizer, or extractor, based on two machine learning algorithms, SVM and K-Means. Each word in a document is processed by both algorithms to determine its actual occurrence in the document. K-Means first clusters the words into categories based on parts of speech (verb, noun, adjective). SVM then determines the actual occurrence of each word in each cluster, taking into account whether words have similar meanings to other words in the subsequent cluster. The corpus chosen to evaluate the application is the Reuters-21578 dataset, which comprises newspaper articles.
The application is evaluated against an extract generated by a system already on the market, to observe how many sentences overlap between the tested applications, in this case the Text Extractor and Microsoft Word AutoSummarizer. Results show that the Text Extractor performs best at compression rates of 10-20% and 35-45%.
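The sentence-overlap evaluation described above can be illustrated with a minimal sketch. It is not the project's actual code. It assumes that "compression rate" means the fraction of a document's sentences kept in the extract, and that overlap is the share of one extract's sentences that also appear in the other; the record defines neither term. The function names, the sample text, and the two sentence rankings are hypothetical.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def build_extract(sentences, ranked_indices, rate):
    """Keep the top-ranked sentences, returned in document order.

    `rate` is the assumed compression rate: the fraction of sentences kept.
    At least one sentence is always kept.
    """
    keep = max(1, int(len(sentences) * rate + 0.5))
    chosen = sorted(ranked_indices[:keep])
    return [sentences[i] for i in chosen]


def sentence_overlap(extract_a, extract_b):
    """Fraction of sentences in extract_a that also appear in extract_b."""
    def normalize(sentence):
        return " ".join(sentence.lower().split())

    if not extract_a:
        return 0.0
    in_b = {normalize(s) for s in extract_b}
    shared = sum(1 for s in extract_a if normalize(s) in in_b)
    return shared / len(extract_a)


# Hypothetical document and rankings, standing in for two summarizers.
document = (
    "Oil prices rose on Monday. Traders cited supply worries. "
    "The dollar was flat. Analysts expect more gains. "
    "Markets close early on Friday."
)
sentences = split_sentences(document)

# Each list ranks sentence indices from most to least important.
ranking_a = [0, 3, 1, 2, 4]
ranking_b = [0, 1, 4, 2, 3]

extract_a = build_extract(sentences, ranking_a, rate=0.4)
extract_b = build_extract(sentences, ranking_b, rate=0.4)
overlap = sentence_overlap(extract_a, extract_b)
print(f"kept {len(extract_a)} of {len(sentences)} sentences, overlap = {overlap:.2f}")
```

Repeating the comparison over a range of rates (for example 0.1 to 0.5) would show where two extractors agree most, which is the kind of comparison the abstract reports.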