Named entity recognition for quranic text using rule based approaches

Differences between textual domains require customizing Natural Language Processing components, especially Named Entity Recognition (NER), because different domains contain different types of entities. Current NER models are considered unfit to accurately extract entities...

Bibliographic Details
Main Authors: Shasha Arzila Tarmizi, Saidah Saad
Format: Article
Language: English
Published: Penerbit Universiti Kebangsaan Malaysia 2022
Online Access: http://journalarticle.ukm.my/20852/1/9.pdf
http://journalarticle.ukm.my/20852/
https://www.ukm.my/apjitm/articles-issues
id my-ukm.journal.20852
record_format eprints
citation Shasha Arzila Tarmizi and Saidah Saad (2022) Named entity recognition for quranic text using rule based approaches. Asia-Pacific Journal of Information Technology and Multimedia, 11 (2). pp. 112-122. ISSN 2289-2192. Peer reviewed. Published 2022-12; record last modified 2022-12-21T08:30:27Z.
institution Universiti Kebangsaan Malaysia
building Tun Sri Lanang Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Kebangsaan Malaysia
content_source UKM Journal Article Repository
url_provider http://journalarticle.ukm.my/
language English
description Differences between textual domains require customizing Natural Language Processing components, especially Named Entity Recognition (NER), because different domains contain different types of entities. Current NER models are considered unfit to accurately extract entities from Quranic text because of its unique content. This paper describes the construction of a rule-based NER method that extracts the entities found in the English translation of the meaning of the Quranic text, together with an evaluation of its performance. Named entity tagging is a common text-annotation task in which entities (nouns) in unstructured text are identified and assigned a class. A small set of rules is built to extract several types of entities: the names of prophets and people, creation, location, time, and the various names of God. The rules are built mainly from regular expressions and gazetteers. They achieve high precision and recall, with an F-score of over 90%. The output of this experiment can serve as annotation for building a machine learning model that extracts entities from text in the same domain, specifically the Quranic text or, more generally, Islamic-domain text.
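The description above mentions rules built from regular expressions and gazetteers, evaluated by precision, recall, and F-score. The following is only a minimal sketch of that general idea, not the authors' rules: the gazetteer entries, the `TIME_RULE` pattern, the class labels, and the function names are all hypothetical and chosen purely for illustration.

```python
import re

# Hypothetical gazetteer: entity class -> known surface forms (illustrative only).
GAZETTEER = {
    "PROPHET": ["Moses", "Abraham", "Noah"],
    "LOCATION": ["Mount Sinai", "Egypt"],
    "GOD": ["Allah", "the Most Merciful"],
}

# Hypothetical hand-written regex rule for a few time expressions.
TIME_RULE = re.compile(r"\b(?:Day of (?:Judgment|Resurrection)|forty nights)\b")


def build_gazetteer_rule(terms):
    """Compile a gazetteer list into one regex, trying longer terms first."""
    ordered = sorted(terms, key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b")


RULES = {label: build_gazetteer_rule(terms) for label, terms in GAZETTEER.items()}
RULES["TIME"] = TIME_RULE


def tag_entities(text):
    """Return (start, end, surface, label) tuples sorted by position in the text."""
    found = []
    for label, rule in RULES.items():
        for match in rule.finditer(text):
            found.append((match.start(), match.end(), match.group(0), label))
    return sorted(found)


def precision_recall_f1(predicted, gold):
    """Compare predicted and gold entity sets using exact matching."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Invented sample sentence, not a quotation from any translation.
    sample = "And We appointed for Moses forty nights on Mount Sinai."
    for start, end, surface, label in tag_entities(sample):
        print(f"{label:<9} {surface!r} [{start}:{end}]")
```

Sorting gazetteer terms longest-first keeps a multi-word entry from being shadowed by a shorter entry that is its prefix. A real system would also need to resolve overlaps between rules of different classes, which this sketch does not attempt.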
format Article
author Shasha Arzila Tarmizi
Saidah Saad
title Named entity recognition for quranic text using rule based approaches
publisher Penerbit Universiti Kebangsaan Malaysia
publishDate 2022