A rule-based named-entity recognition for Malay articles

Named-Entity Recognition (NER) is a Text Mining step used for information extraction. An NER tool helps users identify and detect entities such as persons, locations or organizations. Different languages may have different morphologies and thus require differ...

Bibliographic Details
Main Authors: Rayner Alfred; Leong, Leow Ching; On, Chin Kim; Patricia Anthony; Tan, Soo Fun; Mohd Norhisham Razali; Mohd Hanafi Ahmad Hijazi
Format: Conference or Workshop Item
Language: English
Published: 2013
Online Access: https://eprints.ums.edu.my/id/eprint/15044/1/A_rule.pdf
https://eprints.ums.edu.my/id/eprint/15044/
http://dx.doi.org/10.1007/978-3-642-53914-5_25
id my.ums.eprints.15044
record_format eprints
spelling my.ums.eprints.15044 2017-10-11T07:10:54Z Conference or Workshop Item PeerReviewed text en Rayner Alfred and Leong, Leow Ching and On, Chin Kim and Patricia Anthony and Tan, Soo Fun and Mohd Norhisham Razali and Mohd Hanafi Ahmad Hijazi (2013) A rule-based named-entity recognition for Malay articles. In: International Conference on Advanced Data Mining and Applications, 14-16 December 2013, Hangzhou, China. http://dx.doi.org/10.1007/978-3-642-53914-5_25
institution Universiti Malaysia Sabah
building UMS Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaysia Sabah
content_source UMS Institutional Repository
url_provider http://eprints.ums.edu.my/
language English
description Named-Entity Recognition (NER) is a Text Mining step used for information extraction. An NER tool helps users identify and detect entities such as persons, locations or organizations. Different languages may have different morphologies and thus require different NER processes. For instance, an English NER process cannot be applied to Malay articles because the two languages differ in morphology. This paper proposes a rule-based Named-Entity Recognition algorithm for Malay articles. The proposed Malay NER is designed around Malay part-of-speech (POS) tagging features and contextual features that have been implemented to handle Malay articles. Based on the POS results, proper names are identified as candidates for annotation. In addition, certain symbols and conjunctions are considered when identifying named entities in Malay articles. Several manually constructed dictionaries are used to handle three named-entity types: Person, Location and Organization. The experimental results show a reasonable F-Measure of 89.47%. The proposed Malay NER algorithm can be further improved with more complete dictionaries and refined rules for identifying the correct Malay entities.
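The pipeline outlined in the description (find proper-name candidates, then classify them with contextual features and dictionaries) could be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: capitalization stands in for the Malay POS tagger, the small word lists are hypothetical placeholders for the paper's manually constructed dictionaries, all function names are invented here, and the symbol and conjunction rules are omitted.

```python
import re

# Hypothetical mini-dictionaries (assumptions); the paper's dictionaries are not reproduced.
PERSON_TITLES = {"Encik", "Puan", "Datuk"}            # honorifics that precede a person name
LOCATION_CUES = {"di", "ke", "dari"}                  # Malay prepositions: at/in, to, from
ORG_KEYWORDS = {"Universiti", "Syarikat", "Kementerian", "Bank"}
LOCATIONS = {"Sabah", "Kota Kinabalu", "Malaysia"}


def tokenize(text):
    """Split text into word tokens and single punctuation symbols."""
    return re.findall(r"\w+|[^\w\s]", text)


def is_candidate_token(token):
    """Stand-in for a POS tagger: a capitalized non-title token counts as a proper noun."""
    return token[0].isupper() and token not in PERSON_TITLES


def find_candidates(tokens):
    """Group consecutive proper-noun tokens into (start, end) candidate spans."""
    spans, i = [], 0
    while i < len(tokens):
        if is_candidate_token(tokens[i]):
            j = i
            while j < len(tokens) and is_candidate_token(tokens[j]):
                j += 1
            spans.append((i, j))
            i = j
        else:
            i += 1
    return spans


def classify(tokens, start, end):
    """Label a span using the preceding token (context) and the dictionaries."""
    name = " ".join(tokens[start:end])
    prev = tokens[start - 1] if start > 0 else ""
    if prev in PERSON_TITLES:
        return "Person"
    if tokens[start] in ORG_KEYWORDS:
        return "Organization"
    if name in LOCATIONS or prev in LOCATION_CUES:
        return "Location"
    return None  # no rule fired; leave the candidate unlabeled


def recognize(text):
    """Return (entity, label) pairs for every candidate that a rule can classify."""
    tokens = tokenize(text)
    results = []
    for start, end in find_candidates(tokens):
        label = classify(tokens, start, end)
        if label:
            results.append((" ".join(tokens[start:end]), label))
    return results


print(recognize("Encik Ahmad bekerja di Universiti Malaysia Sabah di Kota Kinabalu."))
```

The rule order in `classify` is a design choice of this sketch: the organization keyword is checked before the location cue, so "di Universiti Malaysia Sabah" is labeled Organization rather than Location. A real system would need fuller dictionaries and a proper POS tagger, as the description itself notes.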
format Conference or Workshop Item
author Rayner Alfred
Leong, Leow Ching
On, Chin Kim
Patricia Anthony
Tan, Soo Fun
Mohd Norhisham Razali
Mohd Hanafi Ahmad Hijazi
title A rule-based named-entity recognition for Malay articles
publishDate 2013
url https://eprints.ums.edu.my/id/eprint/15044/1/A_rule.pdf
https://eprints.ums.edu.my/id/eprint/15044/
http://dx.doi.org/10.1007/978-3-642-53914-5_25