Support Vector Machines (SVM) in Test Extraction
Text categorization is the process of grouping documents or words into predefined categories. Each category consists of documents or words having similar attributes. There exist numerous algorithms to address the need of text categorization including Naive Bayes, k-nearest-neighbor classifier, an...
Main Author: | Ghazali, Nadirah |
---|---|
Format: | Final Year Project |
Language: | English |
Published: | Universiti Teknologi PETRONAS, 2006 |
Subjects: | T Technology (General) |
Online Access: | http://utpedia.utp.edu.my/9323/1/2006%20-%20Support%20Vector%20Machine%20%28SVM%29%20in%20Test%20Extraction.pdf http://utpedia.utp.edu.my/9323/ |
id |
my-utp-utpedia.9323 |
---|---|
record_format |
eprints |
spelling |
my-utp-utpedia.9323 2017-01-25T09:46:05Z http://utpedia.utp.edu.my/9323/ Support Vector Machines (SVM) in Test Extraction Ghazali, Nadirah T Technology (General) Universiti Teknologi PETRONAS 2006-11 Final Year Project NonPeerReviewed application/pdf en http://utpedia.utp.edu.my/9323/1/2006%20-%20Support%20Vector%20Machine%20%28SVM%29%20in%20Test%20Extraction.pdf Ghazali, Nadirah (2006) Support Vector Machines (SVM) in Test Extraction. Universiti Teknologi PETRONAS. (Unpublished) |
institution |
Universiti Teknologi Petronas |
building |
UTP Resource Centre |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Teknologi Petronas |
content_source |
UTP Electronic and Digitized Intellectual Asset |
url_provider |
http://utpedia.utp.edu.my/ |
language |
English |
topic |
T Technology (General) |
spellingShingle |
T Technology (General) Ghazali, Nadirah Support Vector Machines (SVM) in Test Extraction |
description |
Text categorization is the process of grouping documents or words into predefined
categories. Each category consists of documents or words that share similar attributes.
Numerous algorithms address text categorization, including Naive Bayes, the
k-nearest-neighbor classifier, and decision trees. In this project, Support Vector
Machines (SVM) are studied and tested through the implementation of a textual
extractor. The algorithm extracts important points from a lengthy document: it
classifies each word in the document under its relevant category and constructs the
structure of the summary with reference to the categorized words. The performance of
the extractor is evaluated using a similar corpus against an existing summarizer, which
uses a different kind of approach. Summarization, treated here as part of text
categorization, is considered essential in today's information-led society and has been
a growing area of research for over 40 years. This project's objective is to create a
summarizer, or extractor, based on two machine learning algorithms, namely SVM and
K-Means. Each word in a document is processed by both algorithms to determine its
actual occurrence in the document. K-Means first clusters the words into categories
based on parts of speech (verb, noun, adjective). SVM then determines the actual
occurrence of each word in each cluster, taking into account whether the word has a
similar meaning to other words in the subsequent cluster. The corpus chosen to evaluate
the application is the Reuters-21578 dataset, which comprises newspaper articles. The
evaluation is carried out against another system-generated extract from a product
already on the market, as a means of observing how many sentences overlap with those
produced by the tested applications, in this case the Text Extractor and Microsoft Word
AutoSummarizer. Results show that the Text Extractor performs best at compression
rates of 10 - 20% and 35 - 45%. |
format |
Final Year Project |
author |
Ghazali, Nadirah |
author_facet |
Ghazali, Nadirah |
author_sort |
Ghazali, Nadirah |
title |
Support Vector Machines (SVM) in Test Extraction |
title_short |
Support Vector Machines (SVM) in Test Extraction |
title_full |
Support Vector Machines (SVM) in Test Extraction |
title_fullStr |
Support Vector Machines (SVM) in Test Extraction |
title_full_unstemmed |
Support Vector Machines (SVM) in Test Extraction |
title_sort |
support vector machines (svm) in test extraction |
publisher |
Universiti Teknologi PETRONAS |
publishDate |
2006 |
url |
http://utpedia.utp.edu.my/9323/1/2006%20-%20Support%20Vector%20Machine%20%28SVM%29%20in%20Test%20Extraction.pdf http://utpedia.utp.edu.my/9323/ |
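The description above evaluates extracts by sentence overlap at a given compression rate, but the record does not define either measure. The following is a minimal sketch under stated assumptions, not the project's actual evaluation code: compression rate is taken to be the fraction of a document's sentences kept in the extract, and overlap is the share of one extract's sentences that also appear in another. The function names, the naive sentence splitter, and the sample text are all hypothetical.

```python
import re


def split_sentences(text):
    """Naive splitter (an assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def compression_rate(extract, document_sentences):
    """Assumed definition: fraction of the document's sentences kept in the extract."""
    return len(extract) / len(document_sentences)


def sentence_overlap(extract_a, extract_b):
    """Share of sentences in extract_a that also occur in extract_b.

    Sentences are compared after lowercasing and collapsing whitespace.
    """
    def normalize(sentence):
        return " ".join(sentence.lower().split())

    seen = {normalize(s) for s in extract_b}
    if not extract_a:
        return 0.0
    shared = sum(1 for s in extract_a if normalize(s) in seen)
    return shared / len(extract_a)


# Made-up sample text, for illustration only.
document = split_sentences(
    "Oil prices rose on Monday. Analysts expect further gains. "
    "The company reported record profit. Shares fell in early trading. "
    "The board will meet next week."
)
extract_a = [document[0], document[2]]  # stands in for one system's extract
extract_b = [document[0], document[3]]  # stands in for a reference extract

print(compression_rate(extract_a, document))   # 0.4
print(sentence_overlap(extract_a, extract_b))  # 0.5
```

Under these assumptions, an extract of 2 sentences from a 5-sentence document has a compression rate of 40%, and sharing 1 of its 2 sentences with the reference gives an overlap of 50%.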