Minimizing Human Labelling Effort for Annotating Named Entities in Historical Newspaper

To accelerate the annotation of named entities (NEs) in historical newspapers like Sarawak Gazette, only two choices are possible: an automatic approach or a semi-automatic approach. This paper presents a fully automatic annotation of NEs occurring in Sarawak Gazette. At the initial stage, a subset...

Bibliographic Details
Main Authors: Wan Tamlikha, W.M.F., Ranaivo-Malançon, Bali, Chua, S.
Format: Article
Language: English
Published: Universiti Teknikal Malaysia (UTEM) 2017
Subjects:
Online Access: http://ir.unimas.my/id/eprint/21732/1/Minimizing.pdf
http://ir.unimas.my/id/eprint/21732/
http://journal.utem.edu.my/index.php/jtec/article/view/2704
id my.unimas.ir.21732
record_format eprints
citation Wan Tamlikha, W.M.F. and Ranaivo-Malançon, Bali and Chua, S. (2017) Minimizing Human Labelling Effort for Annotating Named Entities in Historical Newspaper. Journal of Telecommunication, Electronic and Computer Engineering, 9 (2-10). ISSN 2289-8131. PeerReviewed. http://journal.utem.edu.my/index.php/jtec/article/view/2704
institution Universiti Malaysia Sarawak
building Centre for Academic Information Services (CAIS)
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaysia Sarawak
content_source UNIMAS Institutional Repository
url_provider http://ir.unimas.my/
language English
topic QA75 Electronic computers. Computer science
description To accelerate the annotation of named entities (NEs) in historical newspapers such as the Sarawak Gazette, only two choices are possible: an automatic approach or a semi-automatic approach. This paper presents a fully automatic annotation of NEs occurring in the Sarawak Gazette. At the initial stage, a subset of the historical newspapers is fed to an established rule-based named entity recognizer (NER), namely ANNIE. Then, the pre-annotated corpus is used as training and testing data for three supervised-learning NER systems, which are based on Naïve Bayes, J48 decision trees, and SVM-SMO methods. These methods are not always accurate, and SVM-SMO and J48 appear to perform better than Naïve Bayes. A thorough study of the errors made by SVM-SMO and J48 therefore led to the creation of ad hoc rules to correct the errors automatically. The proposed approach is promising, although more experiments are needed to refine the rules.
format Article
author Wan Tamlikha, W.M.F.
Ranaivo-Malançon, Bali
Chua, S.
title Minimizing Human Labelling Effort for Annotating Named Entities in Historical Newspaper
publisher Universiti Teknikal Malaysia (UTEM)
publishDate 2017