A framework for English and Malay cross-lingual document alignment method
Main Authors: | Nasharuddin, Nurul Amelina; Azman, Azreen; Abdullah, Muhamad Taufik; Abdul Kadir, Rabiah |
---|---|
Format: | Article |
Language: | English |
Published: | The World Academy of Research in Science and Engineering, 2019 |
Online Access: | http://psasir.upm.edu.my/id/eprint/80417/1/LINGUAL.pdf http://psasir.upm.edu.my/id/eprint/80417/ http://www.warse.org/IJATCSE/static/pdf/file/ijatcse38813sl2019.pdf |
id |
my.upm.eprints.80417 |
---|---|
record_format |
eprints |
citation |
Nasharuddin, Nurul Amelina; Azman, Azreen; Abdullah, Muhamad Taufik; Abdul Kadir, Rabiah (2019). A framework for English and Malay cross-lingual document alignment method. International Journal of Advanced Trends in Computer Science and Engineering, 8 (1.3), pp. 190-195. ISSN 2278-3091. DOI: 10.30534/ijatcse/2019/3881.32019. Peer reviewed. Record last modified 2020-11-10T07:23:02Z. |
institution |
Universiti Putra Malaysia |
building |
UPM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Putra Malaysia |
content_source |
UPM Institutional Repository |
url_provider |
http://psasir.upm.edu.my/ |
language |
English |
description |
The information divide in multilingual information retrieval is usually addressed by translating users' queries into a language that the users understand. However, dictionaries and other translation resources are scarce for some Asian languages. The objective of this study was to automatically align English and Malay news documents into a comparable corpus, which could serve as a translation resource for improving query translation in cross-lingual information retrieval. The study proposes a direct alignment framework that uses the similarity of each document's own textual features, and it attempts a novel approach of using the similarity of the documents' sentiment to improve the effectiveness of the alignment method. The proposed sentiment-based approach outperformed existing alignment methods and was more effective at differentiating related from unrelated documents. These aligned comparable documents can further be utilised in translation research for English and Malay cross-lingual information retrieval tasks. |
format |
Article |
author |
Nasharuddin, Nurul Amelina; Azman, Azreen; Abdullah, Muhamad Taufik; Abdul Kadir, Rabiah |
title |
A framework for English and Malay cross-lingual document alignment method |
publisher |
The World Academy of Research in Science and Engineering |
publishDate |
2019 |
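The description above outlines the core idea: score each candidate English-Malay document pair by textual-feature similarity and by sentiment similarity, then separate related from unrelated pairs. The record does not specify the features, sentiment resources, weighting or threshold. The Python sketch below therefore fills them in with hypothetical choices: language-independent tokens (numbers and capitalised names) as textual features, two toy sentiment lexicons, a linear weight of 0.3 and a threshold of 0.5. It illustrates the general technique only and is not the authors' implementation.

```python
"""Hypothetical sketch of sentiment-aware English-Malay document alignment.

All features, lexicons, weights and thresholds here are illustrative
assumptions, not the settings of the paper described in this record.
"""
import math
import re
from collections import Counter

# Assumed language-independent textual features: numbers and capitalised
# tokens (names, places), which often survive translation unchanged.
FEATURE_RE = re.compile(r"\b(?:\d[\d.,]*|[A-Z][a-zA-Z]+)\b")

# Tiny hypothetical sentiment lexicons (English + Malay cue words).
POSITIVE = {"win", "growth", "success", "strong",
            "menang", "pertumbuhan", "kejayaan", "kukuh"}
NEGATIVE = {"loss", "losses", "crash", "flood", "floods",
            "kalah", "kerugian", "banjir"}


def features(text: str) -> Counter:
    """Bag of assumed language-independent tokens."""
    return Counter(FEATURE_RE.findall(text))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two bags of tokens (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def sentiment(text: str) -> float:
    """Polarity in [-1, 1]: (pos - neg) / (pos + neg), or 0 with no cue words."""
    tokens = re.findall(r"\w+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / (pos + neg) if pos + neg else 0.0


def pair_score(en: str, ms: str, weight: float = 0.3) -> float:
    """Linear mix of textual similarity and sentiment similarity (assumed)."""
    text_sim = cosine(features(en), features(ms))
    # Map the polarity gap (0..2) onto a similarity in [0, 1].
    sent_sim = 1.0 - abs(sentiment(en) - sentiment(ms)) / 2.0
    return (1 - weight) * text_sim + weight * sent_sim


def align(english: list[str], malay: list[str],
          threshold: float = 0.5) -> list[tuple[int, int, float]]:
    """For each English doc, keep its best-scoring Malay doc if the score
    reaches the threshold. Greedy: one-to-one matching is NOT enforced."""
    pairs = []
    for i, en in enumerate(english):
        scores = [pair_score(en, ms) for ms in malay]
        if not scores:
            continue
        j = max(range(len(malay)), key=scores.__getitem__)
        if scores[j] >= threshold:
            pairs.append((i, j, round(scores[j], 3)))
    return pairs


if __name__ == "__main__":
    english = [
        "Floods hit Kelantan in 2014, forcing 200,000 people to evacuate.",
        "Malaysia reports strong growth of 5.9 percent in 2017.",
    ]
    malay = [
        "Ekonomi Malaysia mencatat pertumbuhan 5.9 peratus pada 2017.",
        "Banjir melanda Kelantan pada 2014, memaksa 200,000 penduduk berpindah.",
    ]
    print(align(english, malay))  # [(0, 1, 0.825), (1, 0, 0.906)]
```

With these toy inputs, `align` pairs each English article with the Malay article about the same event. An unrelated article such as "Football match ends." scores only about 0.15 against both Malay articles and is left unaligned. The greedy matching does not enforce one-to-one alignment; a practical system would likely add that constraint and use properly built sentiment resources.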