Malay Language Stemmer
A stemmer is a language-processing tool, widely used in artificial intelligence applications, that removes affixes (prefixes, infixes, and suffixes) from a word to produce its root word. This study designs an algorithm for, and develops, a Malay language stemmer. It is given that most...
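Main Authors: | Khan Ullah, Rehman; Fitri Suraya, Mohamad; Muh Inam, Ulhaq; Shahren Ahmad, Zaidi Adruce; Philip Nuli, Anding; Sajjad, Nawaz Khan; Abdulrazak Yahya, Saleh |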
---|---|
Format: | Article |
Language: | English |
Published: | International Journal for Research in Emerging Science and Technology, 2017 |
Subjects: | P Philology. Linguistics |
Online Access: | http://ir.unimas.my/id/eprint/31722/1/Malay%20Language%20Stemmer%20-%20Copy.pdf http://ir.unimas.my/id/eprint/31722/ https://ijrest.net/vol-4-issue-12.html |
id |
my.unimas.ir.31722 |
---|---|
record_format |
eprints |
spelling |
my.unimas.ir.31722 2023-08-22T02:45:59Z http://ir.unimas.my/id/eprint/31722/ Malay Language Stemmer
Khan Ullah, Rehman; Fitri Suraya, Mohamad; Muh Inam, Ulhaq; Shahren Ahmad, Zaidi Adruce; Philip Nuli, Anding; Sajjad, Nawaz Khan; Abdulrazak Yahya, Saleh
P Philology. Linguistics
A stemmer is a language-processing tool, widely used in artificial intelligence applications, that removes affixes (prefixes, infixes, and suffixes) from a word to produce its root word. This study designs an algorithm for, and develops, a Malay language stemmer. Most existing Malay language stemmers have problems because they depend on online dictionaries, which can return false results during stemming. Affixation in Malay words is also more complex than in English words. An offline dictionary of 9,512 words is therefore introduced in this study to handle ambiguity when stemming Malay words. The algorithm has five steps: stem-extra-suffix, stem-plural, stem-infix, stem-prefix, and stem-suffix. At each step, it first looks the word up in the local dictionary; if the word is found there, it is treated as a root word, otherwise the step processes the word. The affix rules are extracted from Kamus Tatabahasa, and Kamus Dewan (4th ed.) is used to confirm the accuracy of the stemmed words. The results show that the proposed stemmer can stem prefixes, suffixes, and infixes with high accuracy, and the study concludes that it can handle the complexity of Malay words. The stemmer could be further enhanced with a look-up table or dictionary of overlapping words to address its limitation with overlapping prefixes and suffixes.
International Journal for Research in Emerging Science and Technology 2017-12 Article PeerReviewed text en http://ir.unimas.my/id/eprint/31722/1/Malay%20Language%20Stemmer%20-%20Copy.pdf
Khan Ullah, Rehman and Fitri Suraya, Mohamad and Muh Inam, Ulhaq and Shahren Ahmad, Zaidi Adruce and Philip Nuli, Anding and Sajjad, Nawaz Khan and Abdulrazak Yahya, Saleh (2017) Malay Language Stemmer. International Journal for Research in Emerging Science and Technology, 4 (12). pp. 1-9. ISSN 2349-7610 https://ijrest.net/vol-4-issue-12.html |
institution |
Universiti Malaysia Sarawak |
building |
Centre for Academic Information Services (CAIS) |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Malaysia Sarawak |
content_source |
UNIMAS Institutional Repository |
url_provider |
http://ir.unimas.my/ |
language |
English |
topic |
P Philology. Linguistics |
description |
A stemmer is a language-processing tool, widely used in artificial intelligence applications, that removes affixes (prefixes, infixes, and suffixes) from a word to produce its root word. This study designs an algorithm for, and develops, a Malay language stemmer. Most existing Malay language stemmers have problems because they depend on online dictionaries, which can return false results during stemming. Affixation in Malay words is also more complex than in English words. An offline dictionary of 9,512 words is therefore introduced in this study to handle ambiguity when stemming Malay words. The algorithm has five steps: stem-extra-suffix, stem-plural, stem-infix, stem-prefix, and stem-suffix. At each step, it first looks the word up in the local dictionary; if the word is found there, it is treated as a root word, otherwise the step processes the word. The affix rules are extracted from Kamus Tatabahasa, and Kamus Dewan (4th ed.) is used to confirm the accuracy of the stemmed words. The results show that the proposed stemmer can stem prefixes, suffixes, and infixes with high accuracy, and the study concludes that it can handle the complexity of Malay words. The stemmer could be further enhanced with a look-up table or dictionary of overlapping words to address its limitation with overlapping prefixes and suffixes. |
format |
Article |
author |
Khan Ullah, Rehman; Fitri Suraya, Mohamad; Muh Inam, Ulhaq; Shahren Ahmad, Zaidi Adruce; Philip Nuli, Anding; Sajjad, Nawaz Khan; Abdulrazak Yahya, Saleh |
author_sort |
Khan Ullah, Rehman |
title |
Malay Language Stemmer |
publisher |
International Journal for Research in Emerging Science and Technology |
publishDate |
2017 |
url |
http://ir.unimas.my/id/eprint/31722/1/Malay%20Language%20Stemmer%20-%20Copy.pdf http://ir.unimas.my/id/eprint/31722/ https://ijrest.net/vol-4-issue-12.html |
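
The five-step pipeline described in the abstract (stem-extra-suffix, stem-plural, stem-infix, stem-prefix, stem-suffix, with a dictionary check before each step) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the tiny `ROOT_WORDS` set stands in for the 9,512-word offline dictionary, the affix lists are a handful of common Malay affixes rather than the rules extracted from Kamus Tatabahasa, and all function names are hypothetical.

```python
# Illustrative sketch only. ROOT_WORDS and the affix tuples are small,
# hypothetical stand-ins for the paper's dictionary and affix rules.
ROOT_WORDS = {"buku", "makan", "ajar", "main", "baca", "gigi", "tunjuk"}

EXTRA_SUFFIXES = ("lah", "kah", "nya")   # particles and clitics
PREFIXES = ("meng", "mem", "men", "ber", "ter", "me", "di", "pe")  # longest first
SUFFIXES = ("kan", "an", "i")
INFIXES = ("el", "em", "er")


def is_root(word):
    """Dictionary check performed before every step."""
    return word in ROOT_WORDS


def stem_extra_suffix(word):
    for suffix in EXTRA_SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def stem_plural(word):
    # Reduplicated plural, e.g. "buku-buku" -> "buku".
    first, sep, second = word.partition("-")
    if sep and first == second:
        return first
    return word


def stem_infix(word):
    # An infix sits after the first letter, e.g. "g" + "er" + "igi".
    # Only accept the result if it is a known root, to avoid false hits.
    for infix in INFIXES:
        if word[1:1 + len(infix)] == infix:
            candidate = word[0] + word[1 + len(infix):]
            if is_root(candidate):
                return candidate
    return word


def stem_prefix(word):
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) > len(prefix) + 2:
            return word[len(prefix):]
    return word


def stem_suffix(word):
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


STEPS = (stem_extra_suffix, stem_plural, stem_infix, stem_prefix, stem_suffix)


def stem(word):
    word = word.lower()
    for step in STEPS:
        if is_root(word):      # already a root word: stop early
            return word
        word = step(word)
    return word


for w in ("buku-bukunya", "membacakan", "gerigi", "makanan"):
    print(w, "->", stem(w))
```

Checking the dictionary before each step is what keeps a word such as "makan" from being cut further once it is recognised as a root; a word that matches no rule and is not in the dictionary is returned unchanged.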