Quranic ontology for resolving query translation disambiguation in English-Malay cross-language information retrieval

Bibliographic Details
Main Author: Yahya, Zulaini
Format: Thesis
Language: English
Published: 2012
Online Access: http://psasir.upm.edu.my/id/eprint/31652/1/FSKTM%202012%2027R.pdf
http://psasir.upm.edu.my/id/eprint/31652/
Description
Summary: This research proposes a Cross-Language Information Retrieval (CLIR) method that uses a domain-specific ontology and its concepts to disambiguate query translation. The experiments use the Quran as the specific domain: its English and Malay versions serve as a bilingual parallel corpus, and Quranic concepts serve as a resource for cross-language query translation alongside dictionary-based translation. The study evaluates the effectiveness of dictionary-based and ontology-based query translation in a CLIR system. Two basic translation approaches serve as benchmarks: 1) the first translation listed in the dictionary; and 2) all translation candidates listed in the dictionary. The proposed CLIR method uses three approaches: 1) verse list; 2) concept similarity; and 3) concept expansion. Concepts are matched before and after query translation in two ways: 1) query concepts; and 2) translation concepts. The experimental results show that dictionary-based retrieval performs worse than monolingual retrieval on both the English and the Malay document collections. Direct translation returns many possible results, which lowers document retrieval performance on both collections. With the proposed CLIR method, the verse list, concept similarity, and concept expansion approaches all outperform dictionary-based translation on the English document collection, whether query-concept or translation-concept matching is used, but not on the Malay document collection, where retrieval performance improves only with the concept expansion approach. The thesis attributes this difference to language structure, describing English as better structured than Malay. A single Malay word may have several meanings, determined not only by the word itself but also by the meaning of the verse or chapter in which it appears. This is one reason retrieval performance decreases on the Malay document collection.
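The two dictionary-based benchmark approaches named in the summary can be illustrated with a minimal Python sketch. This is not the thesis implementation: the toy dictionary, its entries, and the function names `translate_first` and `translate_all` are assumptions made for illustration only.

```python
# Toy English-to-Malay dictionary. The entries are illustrative assumptions,
# not data from the thesis. Candidates appear in (hypothetical) dictionary order.
TOY_DICTIONARY = {
    "fast": ["cepat", "puasa", "laju"],
    "prayer": ["doa", "solat", "sembahyang"],
}


def translate_first(query, dictionary):
    """Benchmark 1: keep only the first translation listed for each term."""
    translated = []
    for term in query.lower().split():
        candidates = dictionary.get(term)
        # Terms missing from the dictionary are passed through unchanged.
        translated.append(candidates[0] if candidates else term)
    return translated


def translate_all(query, dictionary):
    """Benchmark 2: keep every translation candidate listed for each term."""
    translated = []
    for term in query.lower().split():
        # Unknown terms fall back to the untranslated source term.
        translated.extend(dictionary.get(term, [term]))
    return translated


if __name__ == "__main__":
    print(translate_first("fast prayer", TOY_DICTIONARY))
    print(translate_all("fast prayer", TOY_DICTIONARY))
```

In this toy case the first-listed translation of "fast" is the "quick" sense rather than the "fasting" sense, while keeping all candidates adds unrelated terms to the query. Both effects are the kind of ambiguity the ontology-based approaches in the summary aim to reduce.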