Development of automated web traversing tool / Tai Sock Yin

Bibliographic Details
Main Author: Tai, Sock Yin
Format: Thesis
Published: 2004
Online Access: http://studentsrepo.um.edu.my/10820/1/Tai_Sock_Yin.pdf
http://studentsrepo.um.edu.my/10820/
Description
Summary: As the World Wide Web (WWW) grows rapidly and relevant web sites proliferate, locating information becomes increasingly challenging. We in Malaysia are among the 215 million Internet users in the South East Asia region (SIL, 2000), and the number of Malaysian web pages is growing exponentially, mirroring the trend of the WWW as a whole. Collecting Malaysian web pages has therefore become a difficult problem: manually checking pages from candidate portals, directories, or even search engines requires a considerable amount of time and effort. A significant aspect of finding these pages is the choice of how to traverse automatically from one web page to another, since different choices lead to different search results. This study develops an automated traversing prototype that implements breadth-first and depth-first approaches to gather Malaysian web pages from the WWW, allowing a systematic study of the navigational aspects of web sites. It also describes how these traversal approaches achieve different results.

The dissertation therefore spans three major areas. The first is understanding the structure of the web as a directed but unstructured graph and becoming familiar with the two elementary traversal approaches. The second is building a working prototype of the traversing tool to experiment with these approaches. The third is investigating how to assess the quality of the web pages gathered by the two approaches in terms of recall (a measure of the prototype's ability to find all of the relevant items in the database) and precision (a measure of the accuracy of the traversing process).
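The two traversal approaches and the two quality measures named in the summary can be illustrated with a small sketch. It is not the thesis prototype: it assumes a hypothetical in-memory link graph instead of live HTTP fetching, and a hypothetical rule that a page is a relevant "Malaysia page" if its host name ends in ".my". The names `LINKS`, `is_relevant`, `crawl`, and `precision_recall` are illustrative only. Breadth-first traversal takes URLs from a FIFO queue; depth-first takes them from a LIFO stack. Precision is the fraction of visited pages that are relevant, and recall is the fraction of all relevant pages in the collection that were visited.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical toy "web": each URL maps to the URLs it links to.
# The web is a directed graph that may contain cycles (a -> b -> a).
LINKS = {
    "http://a.my/": ["http://b.my/", "http://c.com/"],
    "http://b.my/": ["http://d.my/", "http://a.my/"],
    "http://c.com/": ["http://e.my/"],
    "http://d.my/": [],
    "http://e.my/": ["http://b.my/"],
}


def is_relevant(url: str) -> bool:
    # Assumption for illustration only: a page counts as a "Malaysia page"
    # if its host name ends in ".my".
    host = urlparse(url).hostname or ""
    return host.endswith(".my")


def crawl(seed: str, links: dict, max_pages: int, strategy: str = "bfs") -> list:
    """Visit up to max_pages pages from seed, breadth-first or depth-first."""
    frontier = deque([seed])
    visited = set()
    order = []
    while frontier and len(order) < max_pages:
        # BFS takes the oldest URL (FIFO queue); DFS takes the newest (LIFO stack).
        url = frontier.popleft() if strategy == "bfs" else frontier.pop()
        if url in visited:
            continue  # already reached via another path (cycles are common)
        visited.add(url)
        order.append(url)
        children = links.get(url, [])
        if strategy == "dfs":
            # Push in reverse so the first link on the page is explored first.
            children = list(reversed(children))
        for nxt in children:
            if nxt not in visited:
                frontier.append(nxt)
    return order


def precision_recall(retrieved: list, all_pages) -> tuple:
    """Precision = relevant visited / visited; recall = relevant visited / all relevant."""
    relevant = {u for u in all_pages if is_relevant(u)}
    hits = {u for u in retrieved if u in relevant}
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    for strategy in ("bfs", "dfs"):
        pages = crawl("http://a.my/", LINKS, max_pages=3, strategy=strategy)
        p, r = precision_recall(pages, LINKS)
        print(strategy, pages, f"precision={p:.2f} recall={r:.2f}")
```

With a budget of three pages, breadth-first visits a.my, b.my, and c.com (precision 0.67, recall 0.50), while depth-first visits a.my, b.my, and d.my (precision 1.00, recall 0.75). With an unlimited budget both strategies reach every page reachable from the seed, so the measures coincide; the visiting order, and therefore the result under a page budget, is what distinguishes them.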