Projecting named entity tags from a resource rich language to a resource poor language

Named Entities (NE) are the prominent entities appearing in textual documents. Automatic classification of NE in a textual corpus is a vital process in Information Extraction and Information Retrieval research. Named Entity Recognition (NER) is the identification of words in text that correspond to...

Bibliographic Details
Main Authors: Zamin, N., Oxley, A., Bakar, Z.A.
Format: Article
Published: 2013
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-84893004882&partnerID=40&md5=f8a6bb4af5b4d7f06f9e5e8666f91c47
http://eprints.utp.edu.my/32692/
id my.utp.eprints.32692
record_format eprints
spelling my.utp.eprints.32692 2022-03-30T01:04:28Z 2013 Article NonPeerReviewed Zamin, N. and Oxley, A. and Bakar, Z.A. (2013) Projecting named entity tags from a resource rich language to a resource poor language. Journal of Information and Communication Technology, 12 (1). pp. 121-146.
institution Universiti Teknologi Petronas
building UTP Resource Centre
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Teknologi Petronas
content_source UTP Institutional Repository
url_provider http://eprints.utp.edu.my/
description Named Entities (NE) are the prominent entities appearing in textual documents. Automatic classification of NE in a textual corpus is a vital process in Information Extraction and Information Retrieval research. Named Entity Recognition (NER) is the identification of words in text that correspond to a pre-defined taxonomy such as person, organization, location, date, and time. This article focuses on the person (PER), organization (ORG) and location (LOC) entities in a Malay journalistic corpus on terrorism. A projection algorithm, using the Dice Coefficient function and a bigram scoring method with domain-specific rules, is proposed to map the NE information from the English corpus to the Malay corpus. The English corpus is a translation of the Malay corpus, so the two are treated as parallel corpora. The method computes the string similarity between the English words and the lexemes available in a pre-built lexicon in order to approximate the best NE mapping. The algorithm was evaluated on our own terrorism-tagged corpus and achieved satisfactory results in terms of precision, recall, and F-measure. An evaluation of the selected open source NER tool for English is also presented.
format Article
author Zamin, N.
Oxley, A.
Bakar, Z.A.
title Projecting named entity tags from a resource rich language to a resource poor language
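
The string-similarity step mentioned in the description can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the Dice Coefficient is computed over character bigrams, it omits the domain-specific rules, and the lexicon entries, the function names and the 0.5 threshold are hypothetical values chosen only for illustration.

```python
from collections import Counter


def bigrams(word):
    """Return the multiset of character bigrams of a lower-cased word."""
    w = word.lower()
    return Counter(w[i:i + 2] for i in range(len(w) - 1))


def dice(a, b):
    """Dice Coefficient: 2 * |shared bigrams| / (|bigrams(a)| + |bigrams(b)|)."""
    x, y = bigrams(a), bigrams(b)
    total = sum(x.values()) + sum(y.values())
    if total == 0:
        # Both strings are shorter than two characters; fall back to equality.
        return 1.0 if a.lower() == b.lower() else 0.0
    shared = sum((x & y).values())  # multiset intersection
    return 2.0 * shared / total


def best_match(word, lexicon, threshold=0.5):
    """Return (lexeme, tag, score) for the most similar lexicon entry,
    or None if no entry reaches the (assumed) threshold."""
    if not lexicon:
        return None
    score, lexeme = max((dice(word, entry), entry) for entry in lexicon)
    if score < threshold:
        return None
    return lexeme, lexicon[lexeme], score


# Hypothetical lexicon mapping lexemes to NE tags (illustration only).
LEXICON = {"Malaysia": "LOC", "Indonesia": "LOC", "Polis": "ORG"}

if __name__ == "__main__":
    print(best_match("Malaysian", LEXICON))
    print(best_match("bomb", LEXICON))
```

In this sketch an English word is tagged with the NE label of its closest lexicon entry, and a word with no sufficiently similar entry is left untagged. Using a multiset of bigrams rather than a set keeps repeated bigrams from being undercounted.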