Minimizing Human Labelling Effort for Annotating Named Entities in Historical Newspaper
To accelerate the annotation of named entities (NEs) in historical newspapers such as the Sarawak Gazette, only two approaches are possible: automatic or semi-automatic. This paper presents a fully automatic annotation of NEs occurring in the Sarawak Gazette. In the initial stage, a subset...
Main Authors: | Wan Tamlikha, W.M.F.; Ranaivo-Malançon, Bali; Chua, S. |
---|---|
Format: | Article |
Language: | English |
Published: | Universiti Teknikal Malaysia (UTEM), 2017 |
Subjects: | QA75 Electronic computers. Computer science |
Online Access: | http://ir.unimas.my/id/eprint/21732/1/Minimizing.pdf http://ir.unimas.my/id/eprint/21732/ http://journal.utem.edu.my/index.php/jtec/article/view/2704 |
id |
my.unimas.ir.21732 |
---|---|
record_format |
eprints |
citation |
Wan Tamlikha, W.M.F. and Ranaivo-Malançon, Bali and Chua, S. (2017) Minimizing Human Labelling Effort for Annotating Named Entities in Historical Newspaper. Journal of Telecommunication, Electronic and Computer Engineering, 9 (2-10). ISSN 2289-8131. Peer-reviewed article. http://journal.utem.edu.my/index.php/jtec/article/view/2704 |
institution |
Universiti Malaysia Sarawak |
building |
Centre for Academic Information Services (CAIS) |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Malaysia Sarawak |
content_source |
UNIMAS Institutional Repository |
url_provider |
http://ir.unimas.my/ |
language |
English |
topic |
QA75 Electronic computers. Computer science |
description |
To accelerate the annotation of named entities (NEs) in historical newspapers such as the Sarawak Gazette, only two approaches are possible: automatic or semi-automatic. This paper presents a fully automatic annotation of NEs occurring in the Sarawak Gazette. In the initial stage, a subset of the historical newspapers is fed to an established rule-based named entity recognizer (NER), ANNIE. The pre-annotated corpus is then used as training and testing data for three supervised-learning NERs, based respectively on Naïve Bayes, J48 decision trees, and SVM-SMO. These methods are not always accurate, and SVM-SMO and J48 appear to perform better than Naïve Bayes. A thorough study of the errors made by SVM-SMO and J48 therefore led to the creation of ad hoc rules that correct these errors automatically. The proposed approach is promising, although more experiments are needed to refine the rules. |
format |
Article |
author |
Wan Tamlikha, W.M.F. Ranaivo-Malançon, Bali Chua, S. |
title |
Minimizing Human Labelling Effort for Annotating Named Entities in Historical Newspaper |
publisher |
Universiti Teknikal Malaysia (UTEM) |
publishDate |
2017 |
url |
http://ir.unimas.my/id/eprint/21732/1/Minimizing.pdf http://ir.unimas.my/id/eprint/21732/ http://journal.utem.edu.my/index.php/jtec/article/view/2704 |