A model for islamic istilahnet in Malay manuscripts for big data analytics and linguistics consortium
Main Authors: | , , , |
---|---|
Format: | Monograph |
Language: | English |
Published: | 2019 |
Subjects: | |
Online Access: | http://irep.iium.edu.my/73052/1/Research%20Report%20RIGS%202015%20-%20MSAS.pdf http://irep.iium.edu.my/73052/ |
Summary: | Information technology research on the content of Malay manuscripts is limited, and statistical approaches are less explored than rule-based ones. This research proposes a hybrid model that combines the two approaches for Jawi-Roman transliteration of Malay manuscript contents. Using an exploratory approach, it assesses the quality scores obtained with a prevalent statistical model, Statistical Model Transliteration (SMT), for Jawi-Roman transliteration. The data were extracted from three Malay manuscripts acquired from ISTAC: Bidāyat al-Mubtadī bi-Faḍlillāh al-Muhdī, Kashf al-Asrār and Hujjat al-Balighah, giving a total of 3,420 rows of data transliterated into old Jawi, modern Jawi and Roman form. SMT output is evaluated with the Bilingual Evaluation Understudy (BLEU) score and word error rate. The findings show that, on the initial data, the E-Jawi.net word error rate is 55.8% for old Jawi-Roman and 32.42% for modern Jawi-Roman. The research therefore opted to have a human expert develop a quality corpus for SMT, consisting of multiple transliterations of the manuscript contents in modern Jawi and Roman. Significantly, the model depends on a quality parallel corpus. |
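The summary reports word error rate (WER) as one of its quality scores. As a rough illustration of how such a figure is commonly computed, the sketch below uses the standard definition: word-level edit distance (substitutions, insertions, deletions) divided by the number of words in the reference. This is an assumption about the metric, not the report's own evaluation code; the function name `word_error_rate` and the sample strings are hypothetical and are not taken from the manuscripts.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Invented romanized strings, for illustration only.
reference = "maka kata segala ulama"
hypothesis = "maka kata segla ulama"
print(f"WER = {word_error_rate(reference, hypothesis):.2%}")
```

Under this definition, a WER of 32.42% would mean roughly one word-level edit for every three reference words.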