Keyword indexing for text documents using signature files / Abdul Hakim A. Gafa

Bibliographic Details
Main Author: A. Gafa, Abdul Hakim
Format: Thesis
Language: English
Published: 2008
Online Access: https://ir.uitm.edu.my/id/eprint/98182/1/98182.pdf
https://ir.uitm.edu.my/id/eprint/98182/
Description
Summary: Information retrieval is the first step in developing retrieval systems for collections of text documents. Besides Inverted Files, Signature Files are a popular and effective method for searching and retrieval (Zobel and Moffat, 2006). This project explores the potential and limitations of prototype text search engines that use Signature Files on Malaysian Text Documents. The Malaysian Text Documents are the official reports of parliamentary proceedings and debates, written in the Malay language and maintained by the House of Parliament. These documents are categorized into House of Commons and House of Lords. Currently, information in Malay-language text documents is searched for and retrieved manually, a process that is tedious, very time-consuming, and inefficient. A prototype text search engine based on signature files can speed up searching and retrieving information from Malaysian text documents. The main objective of this project is to compare the effectiveness of searching text documents with the Signature Files algorithm against the Inverted Files algorithm. To achieve this objective, the Signature Files indexing algorithm must first be understood and implemented. A prototype text search engine for Malay text documents will be developed as a tool to evaluate the effectiveness of searching text documents using Signature Files and Inverted Files.
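The summary names the two indexing methods being compared without describing them. As a rough illustration only, and not the thesis's own implementation, the sketch below builds a superimposed-coding signature file and a simple inverted file over the same documents and runs conjunctive keyword queries against both. The signature width (`SIG_BITS`), the number of bits set per word (`BITS_PER_WORD`), the SHA-256-based bit selection, the tokenizer, and the sample Malay sentences are all hypothetical choices made for this example.

```python
import hashlib
import re
from collections import defaultdict

# Hypothetical parameters for this sketch; the thesis does not specify them.
SIG_BITS = 64      # width of each signature in bits
BITS_PER_WORD = 3  # bits set per word (superimposed coding)


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def word_signature(word: str) -> int:
    """Set BITS_PER_WORD pseudo-random bit positions derived from the word."""
    sig = 0
    for i in range(BITS_PER_WORD):
        digest = hashlib.sha256(f"{word}:{i}".encode("utf-8")).digest()
        sig |= 1 << (int.from_bytes(digest[:4], "big") % SIG_BITS)
    return sig


class SignatureFile:
    """One signature per document: the bitwise OR of its word signatures."""

    def __init__(self, docs: list[str]):
        self.docs = docs
        self.signatures = []
        for doc in docs:
            sig = 0
            for word in tokenize(doc):
                sig |= word_signature(word)
            self.signatures.append(sig)

    def search(self, query: str) -> list[int]:
        words = tokenize(query)
        if not words:
            return []
        q_sig = 0
        for word in words:
            q_sig |= word_signature(word)
        # Filter step: a document can contain every query word only if its
        # signature has all bits of the query signature set.
        candidates = [d for d, s in enumerate(self.signatures) if s & q_sig == q_sig]
        # False-drop resolution: confirm each candidate against its actual text.
        return [d for d in candidates if set(words) <= set(tokenize(self.docs[d]))]


class InvertedFile:
    """Map each word to the set of document IDs that contain it."""

    def __init__(self, docs: list[str]):
        self.postings = defaultdict(set)
        for doc_id, doc in enumerate(docs):
            for word in tokenize(doc):
                self.postings[word].add(doc_id)

    def search(self, query: str) -> list[int]:
        words = tokenize(query)
        if not words:
            return []
        # Intersect the posting sets of all query words.
        result = set.intersection(*(self.postings.get(w, set()) for w in words))
        return sorted(result)


if __name__ == "__main__":
    # Hypothetical sample documents, not taken from the thesis corpus.
    docs = [
        "Dewan Rakyat membahaskan rang undang-undang bekalan",
        "Menteri menjawab soalan mengenai pendidikan",
        "Rang undang-undang pendidikan dibentangkan di Dewan Negara",
    ]
    print(SignatureFile(docs).search("rang pendidikan"))  # [2]
    print(InvertedFile(docs).search("rang pendidikan"))   # [2]
```

The two structures return the same answers here but get there differently. The signature file scans every document signature with a cheap bitwise test and may admit false drops (documents whose bits match by coincidence), so candidates have to be verified against the text. The inverted file looks up each query word's postings directly and returns exact matches without a verification pass.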