Staff View: Support Vector Machines (SVM) in Test Extraction

Support Vector Machines (SVM) in Test Extraction

Text categorization is the process of grouping documents or words into predefined categories. Each category consists of documents or words having similar attributes. There exist numerous algorithms to address the need of text categorization including Naive Bayes, k-nearest-neighbor classifier, an...

Bibliographic Details
Main Author:	Ghazali, Nadirah
Format:	Final Year Project
Language:	English
Published:	Universiti Teknologi PETRONAS 2006
Subjects:	T Technology (General)
Online Access:	http://utpedia.utp.edu.my/9323/1/2006%20-%20Support%20Vector%20Machine%20%28SVM%29%20in%20Test%20Extraction.pdf http://utpedia.utp.edu.my/9323/

id	my-utp-utpedia.9323
record_format	eprints
spelling	my-utp-utpedia.9323 2017-01-25T09:46:05Z http://utpedia.utp.edu.my/9323/ Support Vector Machines (SVM) in Test Extraction Ghazali, Nadirah T Technology (General) Text categorization is the process of grouping documents or words into predefined categories. Each category consists of documents or words having similar attributes. Numerous algorithms exist for text categorization, including Naive Bayes, the k-nearest-neighbor classifier, and decision trees. In this project, Support Vector Machines (SVM) are studied and tested through the implementation of a textual extractor. The algorithm is used to extract important points from a lengthy document: it classifies each word in the document under its relevant category and constructs the structure of the summary with reference to the categorized words. The performance of the extractor is evaluated using a similar corpus against an existing summarizer, which uses a different kind of approach. Summarization is part of text categorization; it is considered an essential part of today's information-led society and has been a growing area of research for over 40 years. This project's objective is to create a summarizer, or extractor, based on two machine learning algorithms, namely SVM and K-Means. Each word in a given document is processed by both algorithms to determine its actual occurrence in the document: it is first clustered, or grouped into categories based on parts of speech (verb, noun, adjective), by K-Means, and then processed by SVM to determine the actual occurrence of each word in each of the clusters, taking into account whether the words have similar meanings to other words in the subsequent cluster. The corpus chosen to evaluate the application is the Reuters-21578 dataset, comprising newspaper articles. 
Evaluation of the application is carried out against another system-generated extract which is already on the market, as a means to observe the amount of sentence overlap between the tested applications, in this case the Text Extractor and Microsoft Word AutoSummarizer. Results show that the Text Extractor has optimal results at compression rates of 10 - 20% and 35 - 45%. Universiti Teknologi PETRONAS 2006-11 Final Year Project NonPeerReviewed application/pdf en http://utpedia.utp.edu.my/9323/1/2006%20-%20Support%20Vector%20Machine%20%28SVM%29%20in%20Test%20Extraction.pdf Ghazali, Nadirah (2006) Support Vector Machines (SVM) in Test Extraction. Universiti Teknologi PETRONAS. (Unpublished)
institution	Universiti Teknologi Petronas
building	UTP Resource Centre
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Teknologi Petronas
content_source	UTP Electronic and Digitized Intellectual Asset
url_provider	http://utpedia.utp.edu.my/
language	English
topic	T Technology (General)
spellingShingle	T Technology (General) Ghazali, Nadirah Support Vector Machines (SVM) in Test Extraction
description	Text categorization is the process of grouping documents or words into predefined categories. Each category consists of documents or words having similar attributes. Numerous algorithms exist for text categorization, including Naive Bayes, the k-nearest-neighbor classifier, and decision trees. In this project, Support Vector Machines (SVM) are studied and tested through the implementation of a textual extractor. The algorithm is used to extract important points from a lengthy document: it classifies each word in the document under its relevant category and constructs the structure of the summary with reference to the categorized words. The performance of the extractor is evaluated using a similar corpus against an existing summarizer, which uses a different kind of approach. Summarization is part of text categorization; it is considered an essential part of today's information-led society and has been a growing area of research for over 40 years. This project's objective is to create a summarizer, or extractor, based on two machine learning algorithms, namely SVM and K-Means. Each word in a given document is processed by both algorithms to determine its actual occurrence in the document: it is first clustered, or grouped into categories based on parts of speech (verb, noun, adjective), by K-Means, and then processed by SVM to determine the actual occurrence of each word in each of the clusters, taking into account whether the words have similar meanings to other words in the subsequent cluster. The corpus chosen to evaluate the application is the Reuters-21578 dataset, comprising newspaper articles. 
Evaluation of the application is carried out against another system-generated extract which is already on the market, as a means to observe the amount of sentence overlap between the tested applications, in this case the Text Extractor and Microsoft Word AutoSummarizer. Results show that the Text Extractor has optimal results at compression rates of 10 - 20% and 35 - 45%.
format	Final Year Project
author	Ghazali, Nadirah
author_facet	Ghazali, Nadirah
author_sort	Ghazali, Nadirah
title	Support Vector Machines (SVM) in Test Extraction
title_short	Support Vector Machines (SVM) in Test Extraction
title_full	Support Vector Machines (SVM) in Test Extraction
title_fullStr	Support Vector Machines (SVM) in Test Extraction
title_full_unstemmed	Support Vector Machines (SVM) in Test Extraction
title_sort	support vector machines (svm) in test extraction
publisher	Universiti Teknologi PETRONAS
publishDate	2006
url	http://utpedia.utp.edu.my/9323/1/2006%20-%20Support%20Vector%20Machine%20%28SVM%29%20in%20Test%20Extraction.pdf http://utpedia.utp.edu.my/9323/
_version_	1739831659053711360
score	13.211869
