Enhanced text stemmer with noisy text normalization for Malay texts

Bibliographic Details
Main Authors: Kassim, Mohamad Nizam; Mat Jali, Shaiful Hisham; Maarof, Mohd. Aizaini; Zainal, Anazida; Abdul Wahab, Amirudin
Format: Book Section
Published: Springer, Singapore 2020
Subjects: QA75 Electronic computers. Computer science
Online Access: http://eprints.utm.my/id/eprint/92796/
http://dx.doi.org/10.1007/978-981-15-0077-0_44
id my.utm.92796
record_format eprints
datestamp 2023-04-04T07:41:06Z
review_status PeerReviewed
citation Kassim, Mohamad Nizam and Mat Jali, Shaiful Hisham and Maarof, Mohd. Aizaini and Zainal, Anazida and Abdul Wahab, Amirudin (2020) Enhanced text stemmer with noisy text normalization for Malay texts. In: Smart Trends in Computing and Communications: Proceedings of SmartCom 2019. Smart Innovation, Systems and Technologies, 165. Springer, Singapore, pp. 433-444. ISBN 978-981-15-0076-3. DOI: 10.1007/978-981-15-0077-0_44
institution Universiti Teknologi Malaysia
building UTM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Teknologi Malaysia
content_source UTM Institutional Repository
url_provider http://eprints.utm.my/
topic QA75 Electronic computers. Computer science
description Existing text stemmers for Malay were generally not developed for social media texts. There is therefore a need for an enhanced text stemmer that can map morphological variants based on the characteristics of non-standard derived word patterns on social media platforms. Such a stemmer deals with non-compliant word patterns (also called noisy text or microtext), such as misspelled words and texting language, which are often used in informal conversation. This paper proposes an enhanced text stemmer for social media texts. The investigation focuses on different patterns of non-standard non-derived words (mechanics, non-standard word formation, code-switching, and slang words) and of non-standard derived words. The experimental results show that the performance of the proposed text stemmer depends on how much “noise” is in social media texts.
format Book Section
author Kassim, Mohamad Nizam
Mat Jali, Shaiful Hisham
Maarof, Mohd. Aizaini
Zainal, Anazida
Abdul Wahab, Amirudin
title Enhanced text stemmer with noisy text normalization for Malay texts
publisher Springer, Singapore
publishDate 2020
url http://eprints.utm.my/id/eprint/92796/
http://dx.doi.org/10.1007/978-981-15-0077-0_44