A method to enrich domain ontology using synonym and probability theory

Bibliographic Details
Main Author: Mohd Rafei Heng, Nur Fatin Nabila
Format: Thesis
Language: English
Published: 2016
Online Access: http://psasir.upm.edu.my/id/eprint/69348/1/FSKTM%202016%2015%20IR.pdf
http://psasir.upm.edu.my/id/eprint/69348/
Description
Summary: Ontology has become a popular research topic in many areas of computer science, such as question answering, information retrieval, and the Semantic Web. Considerable effort has gone into constructing ontologies because the task is complex and time-consuming. Concepts, taxonomic relations, and non-taxonomic relations are three important components in ontology development. These three components are used to represent the knowledge of the domain texts. Most existing techniques focus on extracting concepts, taxonomic relations, and non-taxonomic relations within a single sentence. These techniques discard a sentence when its subject or object is missing or unclear. As a result, the knowledge of the domain texts is not properly represented, because some relations cannot be identified. This thesis proposes a solution that enriches the knowledge of domain texts by finding possible relations. The proposed method uses probability theory to suggest the most likely term for an uncertain subject or object of a sentence. In addition, by using synonyms of the predicates, the method can extract relations between concepts (i.e. subject and object) that appear not only in a single sentence but also in different sentences. The proposed method was tested and evaluated on three collections of domain texts covering computers, tourism, and science. Precision, recall, and F-score were used to evaluate the experimental results, which were compared with results produced manually by domain experts. The proposed solution achieved an F-score of 62.33% on the computer dataset, 78.98% on the science dataset, and 81.58% on the tourism dataset.
The results show that the proposed method increases and enriches the relations extracted from domain texts, giving better results than several existing methods. The method is shown to be useful in assisting ontology engineers in the conceptualization process of ontology engineering.
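
The summary reports precision, recall, and F-score against relations identified manually by domain experts. The sketch below shows how these metrics are commonly computed for extracted relations; it is not taken from the thesis. It assumes relations are represented as (subject, predicate, object) triples and that a match must be exact. The function name and the example triples are hypothetical.

```python
def precision_recall_f1(extracted, reference):
    """Compare extracted relation triples with an expert reference set.

    Assumption: a relation counts as correct only if the whole
    (subject, predicate, object) triple matches exactly.
    """
    extracted, reference = set(extracted), set(reference)
    true_positives = len(extracted & reference)

    # Precision: share of extracted relations that the experts also listed.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: share of expert relations that the method found.
    recall = true_positives / len(reference) if reference else 0.0
    # F-score: harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical triples, for illustration only.
expert_relations = {
    ("cpu", "execute", "instruction"),
    ("memory", "store", "data"),
    ("printer", "produce", "document"),
    ("keyboard", "send", "input"),
    ("monitor", "display", "image"),
}
extracted_relations = {
    ("cpu", "execute", "instruction"),
    ("memory", "store", "data"),
    ("printer", "produce", "document"),
    ("mouse", "store", "data"),  # not in the expert set
}

precision, recall, f1 = precision_recall_f1(extracted_relations, expert_relations)
print(f"precision={precision:.2%} recall={recall:.2%} f-score={f1:.2%}")
```

In this illustration three of the four extracted triples match the five expert triples, so precision is 3/4 and recall is 3/5.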