Malay Language Stemmer

A stemmer is a language processing tool, widely used in artificial intelligence applications, that removes affixes (prefixes, infixes, and suffixes) from a word to produce its root word. This study designs an algorithm and develops a Malay language stemmer. It is given that most...

Bibliographic Details
Main Authors: Khan Ullah, Rehman; Fitri Suraya, Mohamad; Muh Inam, Ulhaq; Shahren Ahmad, Zaidi Adruce; Philip Nuli, Anding; Sajjad, Nawaz Khan; Abdulrazak Yahya, Saleh
Format: Article
Language: English
Published: International Journal for Research in Emerging Science and Technology 2017
Subjects: P Philology. Linguistics
Online Access: http://ir.unimas.my/id/eprint/31722/1/Malay%20Language%20Stemmer%20-%20Copy.pdf
http://ir.unimas.my/id/eprint/31722/
https://ijrest.net/vol-4-issue-12.html
id my.unimas.ir.31722
record_format eprints
spelling my.unimas.ir.31722 2023-08-22T02:45:59Z http://ir.unimas.my/id/eprint/31722/ Malay Language Stemmer P Philology. Linguistics
International Journal for Research in Emerging Science and Technology 2017-12 Article PeerReviewed text en http://ir.unimas.my/id/eprint/31722/1/Malay%20Language%20Stemmer%20-%20Copy.pdf
Khan Ullah, Rehman and Fitri Suraya, Mohamad and Muh Inam, Ulhaq and Shahren Ahmad, Zaidi Adruce and Philip Nuli, Anding and Sajjad, Nawaz Khan and Abdulrazak Yahya, Saleh (2017) Malay Language Stemmer. International Journal for Research in Emerging Science and Technology, 4 (12). pp. 1-9. ISSN 2349-7610 https://ijrest.net/vol-4-issue-12.html
institution Universiti Malaysia Sarawak
building Centre for Academic Information Services (CAIS)
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaysia Sarawak
content_source UNIMAS Institutional Repository
url_provider http://ir.unimas.my/
language English
topic P Philology. Linguistics
description A stemmer is a language processing tool, widely used in artificial intelligence applications, that removes affixes (prefixes, infixes, and suffixes) from a word to produce its root word. This study designs an algorithm and develops a Malay language stemmer. Most existing Malay language stemmers have problems in stemming because they depend on online dictionaries, which return false results during stemming. In addition, affixes in Malay words are more complex than those in English words. Therefore, this study introduces an offline dictionary of 9,512 words to handle ambiguity when stemming Malay words. The algorithm has five steps: stem-extra-suffix, stem-plural, stem-infix, stem-prefix, and stem-suffix. At each step, it first checks whether the word appears in the local dictionary as a root word; if it does not, the step processes the word. The affix rules are extracted from Kamus Tatabahasa, and Kamus Dewan (4th Ed.) is used to confirm the accuracy of the stemmed words. The results show that the proposed stemmer can stem prefixes, suffixes, and infixes with high accuracy, and the study concludes that it can handle the complexity of Malay words. The stemmer can be further enhanced with a look-up table or dictionary of overlapping words to address the limitation of overlapping prefixes and suffixes.
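The pipeline described in the abstract (check the dictionary before each of the five steps, and otherwise strip one kind of affix) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `ROOTS` set stands in for the 9,512-word offline dictionary, the affix lists are a few common Malay affixes chosen for the example rather than the rules taken from Kamus Tatabahasa, and reading "extra suffix" as particles such as -lah, -kah, and -nya is an assumption.

```python
# Illustrative sketch only: ROOTS and the affix lists are small hypothetical
# samples, not the paper's 9,512-word dictionary or its actual rule set.
ROOTS = {"buku", "makan", "baca", "main", "tunjuk", "ajar"}

EXTRA_SUFFIXES = ("lah", "kah", "nya")  # assumed meaning of "extra suffix"
PREFIXES = ("mem", "ber", "ter", "di", "me", "pe")  # longer forms first
SUFFIXES = ("kan", "an", "i")
INFIXES = ("el", "em", "er")


def stem_extra_suffix(word):
    # Strip one trailing particle, keeping at least three characters.
    for s in EXTRA_SUFFIXES:
        if word.endswith(s) and len(word) > len(s) + 2:
            return word[:-len(s)]
    return word


def stem_plural(word):
    # Reduce a reduplicated plural such as "buku-buku" to "buku".
    left, sep, right = word.partition("-")
    return left if sep and left == right else word


def stem_infix(word):
    # An infix sits after the first letter, e.g. "telunjuk" -> "tunjuk".
    # Accept the result only if it is a known root, to avoid false hits.
    for inf in INFIXES:
        if word[1:1 + len(inf)] == inf:
            candidate = word[0] + word[1 + len(inf):]
            if candidate in ROOTS:
                return candidate
    return word


def stem_prefix(word):
    for p in PREFIXES:
        if word.startswith(p) and len(word) > len(p) + 2:
            return word[len(p):]
    return word


def stem_suffix(word):
    for s in SUFFIXES:
        if word.endswith(s) and len(word) > len(s) + 2:
            return word[:-len(s)]
    return word


STEPS = (stem_extra_suffix, stem_plural, stem_infix, stem_prefix, stem_suffix)


def stem(word):
    word = word.lower()
    for step in STEPS:
        # Dictionary check first: stop as soon as the word is a known root.
        if word in ROOTS:
            return word
        word = step(word)
    return word


for w in ("buku-buku", "bukunya", "telunjuk", "dimakan", "membaca", "ajarkan"):
    print(w, "->", stem(w))
```

The dictionary check before each step is what prevents over-stemming in this sketch: once "dimakan" has lost its prefix, "makan" is found in `ROOTS` and returned before the suffix step could cut it down to "mak".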
format Article
author Khan Ullah, Rehman
Fitri Suraya, Mohamad
Muh Inam, Ulhaq
Shahren Ahmad, Zaidi Adruce
Philip Nuli, Anding
Sajjad, Nawaz Khan
Abdulrazak Yahya, Saleh
author_sort Khan Ullah, Rehman
title Malay Language Stemmer
publisher International Journal for Research in Emerging Science and Technology
publishDate 2017