REDUCING DISTRIBUTED URLS CRAWLING TIME : A COMPARISON OF GUIDS AND IDS

A web crawler visits websites for the purpose of indexing. The dynamic nature of today’s web makes the crawling process harder than before, as web content is continuously updated. In addition, crawling speed is important considering the tsunami of big data that needs to be indexed among competitive sear...

Bibliographic Details
Main Authors: Shakir, I.; Shibghatullah, Abdul Samad; Hussin, Burairah; Gede, Pramudya Ananta; Shafei, Suhailan
Format: Article
Language: English
Published: Little Lion Scientific Islamabad Pakistan 2014
Subjects: T Technology (General)
Online Access: http://eprints.utem.edu.my/id/eprint/13289/3/GUIDs.pdf
http://eprints.utem.edu.my/id/eprint/13289/
id my.utem.eprints.13289
record_format eprints
spelling my.utem.eprints.13289 2015-05-28T04:30:50Z http://eprints.utem.edu.my/id/eprint/13289/ Little Lion Scientific Islamabad Pakistan 2014-09-10 Article PeerReviewed application/pdf en http://eprints.utem.edu.my/id/eprint/13289/3/GUIDs.pdf Shakir, I. and Shibghatullah, Abdul Samad and Hussin, Burairah and Gede, Pramudya Ananta and Shafei, Suhailan (2014) REDUCING DISTRIBUTED URLS CRAWLING TIME : A COMPARISON OF GUIDS AND IDS. Journal of Theoretical and Applied Information Technology. pp. 1-8. ISSN 1992-8645
institution Universiti Teknikal Malaysia Melaka
building UTEM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Teknikal Malaysia Melaka
content_source UTEM Institutional Repository
url_provider http://eprints.utem.edu.my/
language English
topic T Technology (General)
description A web crawler visits websites for the purpose of indexing. The dynamic nature of today’s web makes the crawling process harder than before, as web content is continuously updated. In addition, crawling speed is important considering the tsunami of big data that needs to be indexed among competitive search engines. This research project aims to provide a survey of current problems in distributed web crawlers. It then investigates which technique yields the better crawling speed: dynamic globally unique identifiers (GUIDs) or traditional static identifiers (IDs). Experiments were done by implementing Arachnot.net web crawlers to index up to 20000 locally generated URLs using both techniques. The results show that URL crawling time can be reduced by up to 7% by using the GUID technique instead of IDs.
format Article
author Shakir, I.
Shibghatullah, Abdul Samad
Hussin, Burairah
Gede, Pramudya Ananta
Shafei, Suhailan
title REDUCING DISTRIBUTED URLS CRAWLING TIME : A COMPARISON OF GUIDS AND IDS
publisher Little Lion Scientific Islamabad Pakistan
publishDate 2014
url http://eprints.utem.edu.my/id/eprint/13289/3/GUIDs.pdf
http://eprints.utem.edu.my/id/eprint/13289/