Description: Computational Morphological Resources Management System

Computational Morphological Resources Management System

Bibliographic Details
Main Author:	Jovianna, Juk
Format:	Final Year Project Report
Language:	English
Published:	Universiti Malaysia Sarawak, (UNIMAS) 2014
Subjects:	P Philology. Linguistics; QA75 Electronic computers. Computer science; QA76 Computer software
Online Access:	http://ir.unimas.my/id/eprint/39301/1/JOVIANNA%20%2824%20pgs%29.pdf http://ir.unimas.my/id/eprint/39301/4/JOVIANNA%20%28fulltext%29.pdf http://ir.unimas.my/id/eprint/39301/

Description
Summary:	In Natural Language Processing (NLP), a morphological analyser is one of the most basic processing tools, because it allows the structure of a word to be studied. To analyse word structure, the analyser needs morphological resources as a crucial input. Currently, these resources are acquired manually, which consumes a lot of time and energy. We therefore propose the Computational Morphological Resources Management System (CMRMS), a management system that eases the linguist's work during pre-processing and allows the linguist to induce morphological information from the resulting wordlist. To overcome the time and energy problem, CMRMS combines manual pre-processing with an automatic file management system to produce a wordlist and segmented data. CMRMS has three main modules: tokenization, conversion and segmentation tools. The tokenization module splits any text file, whether obtained from hardcopy data, softcopy data or existing data, into individual words. The conversion module converts two types of softcopy data: PDF files and HTML files. Lastly, the segmentation tools module provides two segmentation tools, Linguistica and Morfessor, to analyse the tokenized data. To test the functionality of CMRMS, three types of testing were carried out: system, component and integration testing. Each gave good results, showing that CMRMS is able to produce the expected output. The system helps linguists manage their time more efficiently, since they no longer have to carry out the pre-processing manually. Using CMRMS, they can obtain the wordlist easily. In addition, the produced wordlist can be reused as input for other segmentation processes.
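The tokenization step described in the summary, turning a text file's contents into a reusable wordlist, might look roughly like the following sketch. It is not taken from the report: the function name `build_wordlist`, the token definition (runs of letters, optionally joined by internal hyphens or apostrophes), the lowercasing, and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter


def build_wordlist(text):
    """Split raw text into lowercase word tokens and count each distinct word.

    Hypothetical helper: the report does not specify how CMRMS defines a token.
    """
    # Assumption: a token is a run of letters, optionally joined by
    # internal hyphens or apostrophes (e.g. "re-used", "linguist's").
    tokens = re.findall(r"[^\W\d_]+(?:[-'][^\W\d_]+)*", text.lower())
    return Counter(tokens)


# Illustrative input only; real input would be the contents of a text file.
sample = "The readers read the books. The books were re-used."
wordlist = build_wordlist(sample)

# Print one entry per line as "count word", sorted alphabetically.
for word, count in sorted(wordlist.items()):
    print(f"{count} {word}")
```

Keeping the counts alongside the words is a design choice of this sketch; a plain list of distinct words could be obtained from `sorted(wordlist)` instead.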

Similar Items