To study the performance of stemming algorithm on Malay words beginning with the letter "S" / Rohana Jantan

Bibliographic Details
Main Author: Jantan, Rohana
Format: Thesis
Language: English
Published: 2000
Online Access: https://ir.uitm.edu.my/id/eprint/98222/1/98222.PDF
https://ir.uitm.edu.my/id/eprint/98222/
Description
Summary: This thesis studies a Malay stemming algorithm applied to words beginning with the letter "S". The test collection is a Malay-language document, namely a Malay translation of the Quran. The experiments apply a Malay stemming algorithm known as Rules Application Order (RAO), together with dictionaries of Malay root words and combinations of morphological rules. The algorithm's performance is evaluated by applying it to the "S" words while removing different combinations of prefixes. Each "S" word, and each stemmed word that results, is checked against the dictionaries; once a word is found there, stemming of that word stops. The resulting words are then analyzed, and the percentages obtained for each prefix combination are compared to find the best combination. The results show that overstemming, understemming and unstemmed words remain problems: of 411 unique "S" words, 0.73% are overstemmed, 0.73% are understemmed and 2.68% remain unstemmed. The algorithm therefore needs to be modified to improve its performance on Malay words.
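The dictionary-checked prefix removal described in the summary can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the RAO algorithm or the thesis's actual rule set: the function names `stem` and `percent`, the single-prefix order, and the toy root dictionary are all hypothetical. The word is looked up before any stripping and again after each prefix removal, and stemming stops at the first dictionary hit; a word that never reaches a dictionary root is reported as unstemmed.

```python
from typing import Iterable, Optional, Set


def stem(word: str, roots: Set[str], prefixes: Iterable[str]) -> Optional[str]:
    """Strip prefixes in the given (hypothetical) order, checking the root
    dictionary before stripping and after each removal. Returns the first
    dictionary match, or None if the word stays unstemmed."""
    candidate = word.lower()
    if candidate in roots:  # already a root word: stop immediately
        return candidate
    for prefix in prefixes:
        # Only strip when something would remain after removing the prefix.
        if candidate.startswith(prefix) and len(candidate) > len(prefix):
            candidate = candidate[len(prefix):]
            if candidate in roots:  # dictionary hit: stop stemming
                return candidate
    return None  # no dictionary root reached: unstemmed


def percent(count: int, total: int) -> float:
    """Share of `total` words as a percentage, rounded to two decimals."""
    return round(100 * count / total, 2)


if __name__ == "__main__":
    # Hypothetical toy dictionary and prefix order, for illustration only.
    roots = {"buah", "kali", "sabar"}
    prefixes = ("se",)
    for w in ["sebuah", "sekali", "sabar", "sombong"]:
        # sebuah -> buah, sekali -> kali, sabar -> sabar, sombong -> None
        print(w, "->", stem(w, roots, prefixes))
```

Checking the dictionary before stripping is what keeps a root such as "sabar" from losing letters. Applied to the reported figures, `percent(3, 411)` gives 0.73 and `percent(11, 411)` gives 2.68, so the rates in the summary correspond to 3 overstemmed, 3 understemmed and 11 unstemmed words out of 411.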