A framework for English and Malay cross-lingual document alignment method

Bibliographic Details
Main Authors: Nasharuddin, Nurul Amelina; Azman, Azreen; Abdullah, Muhamad Taufik; Abdul Kadir, Rabiah
Format: Article
Language: English
Published: The World Academy of Research in Science and Engineering 2019
Online Access: http://psasir.upm.edu.my/id/eprint/80417/1/LINGUAL.pdf
http://psasir.upm.edu.my/id/eprint/80417/
http://www.warse.org/IJATCSE/static/pdf/file/ijatcse38813sl2019.pdf
Description
Summary: The information divide in multilingual information retrieval is usually addressed by translating users’ queries into a language that the users understand. However, dictionaries and other translation resources are scarce for some Asian languages. The objective of this study was to automatically align English and Malay news documents into a comparable corpus that could serve as a translation resource for improving query translation in cross-lingual information retrieval. The study proposes a direct alignment framework that uses the similarity of each document’s textual features, together with a novel approach that uses the similarity of the documents’ sentiment to improve the effectiveness of the alignment method. The proposed sentiment-based approach outperformed existing alignment methods and improved effectiveness in differentiating related from unrelated documents. The aligned comparable documents can be further utilised in translation research for English and Malay cross-lingual information retrieval tasks.