Sentence-based alignment for parallel text corpora preparation for machine translation.

Bibliographic Details
Main Author: Lee, Yong Wei
Format: Final Year Project / Dissertation / Thesis
Published: 2021
Subjects: QA75 Electronic computers. Computer science; T Technology (General)
Online Access: http://eprints.utar.edu.my/4261/1/17ACB04464_FYP.pdf
http://eprints.utar.edu.my/4261/
id my-utar-eprints.4261
record_format eprints
spelling my-utar-eprints.4261 2022-03-09T13:04:36Z Sentence-based alignment for parallel text corpora preparation for machine translation. Lee, Yong Wei QA75 Electronic computers. Computer science T Technology (General) 2021-04-15 Final Year Project / Dissertation / Thesis NonPeerReviewed application/pdf http://eprints.utar.edu.my/4261/1/17ACB04464_FYP.pdf Lee, Yong Wei (2021) Sentence-based alignment for parallel text corpora preparation for machine translation. Final Year Project, UTAR. http://eprints.utar.edu.my/4261/
institution Universiti Tunku Abdul Rahman
building UTAR Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Tunku Abdul Rahman
content_source UTAR Institutional Repository
url_provider http://eprints.utar.edu.my
topic QA75 Electronic computers. Computer science
T Technology (General)
description Natural Language Processing (NLP) now underpins many everyday technologies, supporting downstream applications such as speech recognition and machine translation. Machine translation matters in daily life because it can translate large volumes of text faster than human translators, which saves time, and it is also cheaper than hiring a human translator. In machine translation, a parallel corpus plays a significant role as a resource for translation training and language teaching, and a high-quality parallel corpus greatly improves translation accuracy. Sentence-based alignment of parallel text corpora is therefore important for NLP, and for machine translation in particular. However, parallel corpus resources are limited for some source and target language pairs, and machine translation accuracy for some target languages remains low. This study therefore proposes an approach for generating a parallel corpus for a source language and a target language. A parallel corpus of English (source language) and Malay (target language) is collected, and a machine translation system is developed using a recurrent neural network (RNN) model for neural machine translation. The model reaches a training accuracy of 0.9, and the translated Malay text achieves a BLEU score of 0.65, which is considered a good score.
format Final Year Project / Dissertation / Thesis
author Lee, Yong Wei
title Sentence-based alignment for parallel text corpora preparation for machine translation.
publishDate 2021
url http://eprints.utar.edu.my/4261/1/17ACB04464_FYP.pdf
http://eprints.utar.edu.my/4261/