A framework for English and Malay cross-lingual document alignment method

Issues of the information divide in multilingual information retrieval are usually addressed by translating users’ queries into a language that the users understand. However, dictionaries and other translation resources are scarce for some Asian languages. The objective of this study was to automat...

Bibliographic Details
Main Authors: Nasharuddin, Nurul Amelina; Azman, Azreen; Abdullah, Muhamad Taufik; Abdul Kadir, Rabiah
Format: Article
Language: English
Published: The World Academy of Research in Science and Engineering 2019
Online Access: http://psasir.upm.edu.my/id/eprint/80417/1/LINGUAL.pdf
http://psasir.upm.edu.my/id/eprint/80417/
http://www.warse.org/IJATCSE/static/pdf/file/ijatcse38813sl2019.pdf
id my.upm.eprints.80417
record_format eprints
spelling my.upm.eprints.80417 2020-11-10T07:23:02Z http://psasir.upm.edu.my/id/eprint/80417/ Nasharuddin, Nurul Amelina and Azman, Azreen and Abdullah, Muhamad Taufik and Abdul Kadir, Rabiah (2019) A framework for English and Malay cross-lingual document alignment method. International Journal of Advanced Trends in Computer Science and Engineering, 8 (1.3). pp. 190-195. ISSN 2278-3091. Article, PeerReviewed, text, en. http://psasir.upm.edu.my/id/eprint/80417/1/LINGUAL.pdf http://www.warse.org/IJATCSE/static/pdf/file/ijatcse38813sl2019.pdf DOI: 10.30534/ijatcse/2019/3881.32019
institution Universiti Putra Malaysia
building UPM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Putra Malaysia
content_source UPM Institutional Repository
url_provider http://psasir.upm.edu.my/
language English
description Issues of the information divide in multilingual information retrieval are usually addressed by translating users’ queries into a language that the users understand. However, dictionaries and other translation resources are scarce for some Asian languages. The objective of this study was to automatically align English and Malay news documents into a comparable corpus that can serve as a translation resource for improving query translation in cross-lingual information retrieval. The study proposes a direct alignment framework that uses the similarity of each document’s own textual features, together with a novel approach that uses the similarity of the documents’ sentiment to improve the effectiveness of the alignment method. The proposed sentiment-based approach outperformed existing alignment methods and was more effective at differentiating related from unrelated documents. The aligned comparable documents can be further utilised in translation research for English and Malay cross-lingual information retrieval tasks.
format Article
author Nasharuddin, Nurul Amelina
Azman, Azreen
Abdullah, Muhamad Taufik
Abdul Kadir, Rabiah
title A framework for English and Malay cross-lingual document alignment method
publisher The World Academy of Research in Science and Engineering
publishDate 2019
url http://psasir.upm.edu.my/id/eprint/80417/1/LINGUAL.pdf
http://psasir.upm.edu.my/id/eprint/80417/
http://www.warse.org/IJATCSE/static/pdf/file/ijatcse38813sl2019.pdf