Information Retrieval - using Porter Stemming Algorithm


Bibliographic Details
Main Author: Zulkifly, Zurida Azita
Format: Final Year Project
Language: English
Published: Universiti Teknologi Petronas 2006
Online Access: http://utpedia.utp.edu.my/7082/1/2006%20-%20loformation%20Retrieval%20-%20using%20Porter%20Stemming%20Algorithm.pdf
http://utpedia.utp.edu.my/7082/
Description
Summary: Stemming is the process of removing or transforming affixes found on a word: inflectional endings (-s, -ing, -ed, etc.), derivational endings (-ion, -ative, -ity, -ment, -less, etc.) and prefixes (un-, in-, etc.). The rationale for stemming is that words with similar forms usually have similar meanings, so matching a query against words similar in meaning to those it originally contained should increase retrieval effectiveness. Many stemming methods have been developed; this project focuses on the Porter Stemming Algorithm, developed by M.F. Porter in 1980. The objective of the project is to develop a system that demonstrates information retrieval using the Porter Stemming Algorithm. The central problem in information retrieval is returning documents that are relevant to the user's query, and performance is measured with two metrics, precision and recall. The scope of the project is to implement the original Porter Stemming Algorithm in the application in order to improve precision and recall when retrieving documents. Although many improvements have been made to the Porter algorithm, this project focuses on the original. The Porter Stemming Algorithm has five phases, each with its own rules for stripping suffixes. By implementing the algorithm, the application is expected to retrieve only documents that are relevant to the user's query.
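
To make the two ideas in the summary concrete, the following is a minimal Python sketch and not the project's own code. It assumes only the first rule group of the original 1980 algorithm (step 1a, which handles plural endings) and the standard set-based definitions of precision and recall. The function names `porter_step_1a` and `precision_recall`, and the document identifiers in the usage lines, are hypothetical and chosen for illustration.

```python
def porter_step_1a(word: str) -> str:
    """Apply only step 1a of the original Porter algorithm (plural endings).

    Rules are tried in order and the first (longest) matching suffix wins:
    SSES -> SS, IES -> I, SS -> SS, S -> (removed).
    The remaining steps of the full algorithm are deliberately omitted.
    """
    if word.endswith("sses"):
        return word[:-2]   # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]   # ponies -> poni
    if word.endswith("ss"):
        return word        # caress -> caress (unchanged)
    if word.endswith("s"):
        return word[:-1]   # cats -> cat
    return word


def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    An empty denominator yields 0.0 (an assumption made for this sketch).
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical usage: stem query terms, then score one retrieval run.
stems = [porter_step_1a(w) for w in ["caresses", "ponies", "cats"]]
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
```

In this example two of the four retrieved documents are relevant, so precision is 0.5, and two of the three relevant documents were retrieved, so recall is 2/3. A retrieval system built on stemming would apply the stemmer to both the indexed documents and the query terms before matching.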