A framework for English and Malay cross-lingual document alignment method
Main Authors: | , , , |
---|---|
Format: | Article |
Language: | English |
Published: | The World Academy of Research in Science and Engineering, 2019 |
Online Access: | http://psasir.upm.edu.my/id/eprint/80417/1/LINGUAL.pdf http://psasir.upm.edu.my/id/eprint/80417/ http://www.warse.org/IJATCSE/static/pdf/file/ijatcse38813sl2019.pdf |
Summary: | The information divide in multilingual information
retrieval is usually addressed by translating users’ queries
into a language that the users understand. However, dictionaries
and other translation resources are scarce for some Asian
languages. The objective of this study was to automatically
align English and Malay news documents into a comparable
corpus that can serve as a translation resource for improving
query translation in cross-lingual information retrieval. The
study proposes a direct alignment framework that uses the
similarity of each document’s own textual features and, as a
novel addition, the similarity of the documents’ sentiment to
improve the effectiveness of the alignment method. The proposed
sentiment-based approach outperformed existing alignment
methods and was more effective at differentiating related from
unrelated documents. The aligned comparable documents can be
further utilised in translation research for English and Malay
cross-lingual information retrieval tasks. |
---|
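The summary describes scoring document pairs by combining textual feature similarity with sentiment similarity. The record does not give the article's actual features, sentiment model, or weighting, so the following is only a minimal sketch under stated assumptions: textual similarity is taken as cosine similarity over tokens that appear identically in both languages (names, numbers, loanwords), sentiment is a precomputed polarity score in [-1, 1], and the two are mixed with a hypothetical `weight` parameter. All function names and the sample documents are illustrative.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def text_similarity(en_tokens, ms_tokens):
    """Assumption: compare tokens directly, without a dictionary, so only
    surface forms shared by both languages (names, numbers) contribute."""
    return cosine(Counter(en_tokens), Counter(ms_tokens))


def sentiment_similarity(s_en, s_ms):
    """Assumption: polarity scores lie in [-1, 1]; the result is 1.0 for
    identical polarity and 0.0 for opposite extremes."""
    return 1.0 - abs(s_en - s_ms) / 2.0


def alignment_score(en_doc, ms_doc, weight=0.7):
    """Hypothetical linear mix of textual and sentiment similarity."""
    text = text_similarity(en_doc["tokens"], ms_doc["tokens"])
    sent = sentiment_similarity(en_doc["sentiment"], ms_doc["sentiment"])
    return weight * text + (1.0 - weight) * sent


# Illustrative documents (invented for this sketch, not from the article).
en_doc = {"tokens": ["najib", "kuala", "lumpur", "2019", "budget"], "sentiment": 0.4}
ms_related = {"tokens": ["najib", "kuala", "lumpur", "2019", "bajet"], "sentiment": 0.5}
ms_unrelated = {"tokens": ["harimau", "malaya", "menang", "3", "gol"], "sentiment": -0.6}

related_score = alignment_score(en_doc, ms_related)
unrelated_score = alignment_score(en_doc, ms_unrelated)
```

In this sketch the related pair scores higher than the unrelated pair because it shares surface tokens and has a closer polarity. Note that the sentiment term alone cannot separate pairs that merely share a tone, which is why it is given the smaller weight here.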