Staff View: Part-of-speech tagger for Malay social media texts

Processing the meaning of words in social media texts, such as tweets, is challenging in natural language processing. Malay tweets are no exception because they demonstrate distinct linguistic phenomena, such as the use of dialects from each state in Malaysia; borrowing foreign language terms in...

Bibliographic Details
Main Authors:	Siti Noor Allia Noor Ariffin, Sabrina Tiun
Format:	Article
Language:	English
Published:	Penerbit Universiti Kebangsaan Malaysia 2018
Online Access:	http://journalarticle.ukm.my/17663/1/28357-89214-1-PB.pdf http://journalarticle.ukm.my/17663/ https://ejournal.ukm.my/gema/issue/view/1146

id	my-ukm.journal.17663
record_format	eprints
spelling	my-ukm.journal.176632021-11-24T00:47:45Z http://journalarticle.ukm.my/17663/ Part-of-speech tagger for Malay social media texts Siti Noor Allia Noor Ariffin, Sabrina Tiun, Processing the meaning of words in social media texts, such as tweets, is challenging in natural language processing. Malay tweets are no exception because they demonstrate distinct linguistic phenomena, such as the use of dialects from each state in Malaysia; borrowing foreign language terms in the context of Malay language; and using mixed languages, abbreviations and spelling errors or mistakes in sentence structure. Tagging the word class of tweets is an arduous task because tweets are characterised by their distinctive style, linguistic sounds and errors. Currently, existing works on Malay part-of-speech (POS) are based only on standard Malay and formal texts and are thus unsuitable for tagging tweet texts. Thus, a POS model of tweet tagging for non-standardised Malay language must be developed. This study aims to design and implement a non-standardised Malay POS model for tweets and performs assessment on the basis of the word tagging accuracy of test data of unnormalised and normalised tweet texts. A solution that adopts a probabilistic POS tagging called QTAG is proposed. Results show that the Malay QTAG achieves best average POS tagging accuracies of 90% and 88.8% for normalised and unnormalised test datasets, respectively. Penerbit Universiti Kebangsaan Malaysia 2018-11 Article PeerReviewed application/pdf en http://journalarticle.ukm.my/17663/1/28357-89214-1-PB.pdf Siti Noor Allia Noor Ariffin, and Sabrina Tiun, (2018) Part-of-speech tagger for Malay social media texts. GEMA: Online Journal of Language Studies, 18 (4). pp. 124-142. ISSN 1675-8021 https://ejournal.ukm.my/gema/issue/view/1146
institution	Universiti Kebangsaan Malaysia
building	Tun Sri Lanang Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Kebangsaan Malaysia
content_source	UKM Journal Article Repository
url_provider	http://journalarticle.ukm.my/
language	English
description	Processing the meaning of words in social media texts, such as tweets, is challenging in natural language processing. Malay tweets are no exception because they demonstrate distinct linguistic phenomena, such as the use of dialects from each state in Malaysia; the borrowing of foreign-language terms in the context of the Malay language; and the use of mixed languages, abbreviations, spelling errors and mistakes in sentence structure. Tagging the word class of tweets is an arduous task because tweets are characterised by their distinctive style, linguistic sounds and errors. Existing work on Malay part-of-speech (POS) tagging is based only on standard Malay and formal texts and is thus unsuitable for tagging tweets. A POS tagging model for non-standardised Malay tweets must therefore be developed. This study aims to design and implement a non-standardised Malay POS model for tweets and to assess it on the basis of word tagging accuracy on test data of unnormalised and normalised tweet texts. A solution that adopts a probabilistic POS tagger called QTAG is proposed. Results show that the Malay QTAG achieves best average POS tagging accuracies of 90% and 88.8% for normalised and unnormalised test datasets, respectively.
format	Article
author	Siti Noor Allia Noor Ariffin, Sabrina Tiun
spellingShingle	Siti Noor Allia Noor Ariffin, Sabrina Tiun, Part-of-speech tagger for Malay social media texts
author_facet	Siti Noor Allia Noor Ariffin, Sabrina Tiun
author_sort	Siti Noor Allia Noor Ariffin
title	Part-of-speech tagger for Malay social media texts
title_short	Part-of-speech tagger for Malay social media texts
title_full	Part-of-speech tagger for Malay social media texts
title_fullStr	Part-of-speech tagger for Malay social media texts
title_full_unstemmed	Part-of-speech tagger for Malay social media texts
title_sort	part-of-speech tagger for malay social media texts
publisher	Penerbit Universiti Kebangsaan Malaysia
publishDate	2018
url	http://journalarticle.ukm.my/17663/1/28357-89214-1-PB.pdf http://journalarticle.ukm.my/17663/ https://ejournal.ukm.my/gema/issue/view/1146
_version_	1718927143058014208
score	13.160551

Similar Items