REDUCING DISTRIBUTED URLS CRAWLING TIME : A COMPARISON OF GUIDS AND IDS
Main Authors: | Shakir, I.; Shibghatullah, Abdul Samad; Hussin, Burairah; Gede, Pramudya Ananta; Shafei, Suhailan |
---|---|
Format: | Article |
Language: | English |
Published: | Little Lion Scientific, Islamabad, Pakistan, 2014 |
Subjects: | T Technology (General) |
Online Access: | http://eprints.utem.edu.my/id/eprint/13289/3/GUIDs.pdf http://eprints.utem.edu.my/id/eprint/13289/ |
id | my.utem.eprints.13289 |
---|---|
record_format | eprints |
citation | Shakir, I. and Shibghatullah, Abdul Samad and Hussin, Burairah and Gede, Pramudya Ananta and Shafei, Suhailan (2014) REDUCING DISTRIBUTED URLS CRAWLING TIME : A COMPARISON OF GUIDS AND IDS. Journal of Theoretical and Applied Information Technology. pp. 1-8. ISSN 1992-8645. Published 2014-09-10; peer reviewed; full text (application/pdf, en): http://eprints.utem.edu.my/id/eprint/13289/3/GUIDs.pdf. Repository datestamp: 2015-05-28T04:30:50Z. |
institution | Universiti Teknikal Malaysia Melaka |
building | UTEM Library |
collection | Institutional Repository |
continent | Asia |
country | Malaysia |
content_provider | Universiti Teknikal Malaysia Melaka |
content_source | UTEM Institutional Repository |
url_provider | http://eprints.utem.edu.my/ |
language | English |
topic | T Technology (General) |
description | A web crawler visits websites in order to index them. The dynamic nature of today’s web makes crawling harder than before, because web content is continuously updated. Crawling speed also matters given the tsunami of big data that competing search engines need to index. This research project aims to survey current problems in distributed web crawlers. It then investigates whether dynamic globally unique identifiers (GUIDs) or traditional static identifiers (IDs) give the faster crawling speed. Experiments were carried out by implementing Arachnot.net web crawlers to index up to 20,000 locally generated URLs with each technique. The results show that URL crawling time can be reduced by up to 7% by using GUIDs instead of IDs. |
format | Article |
author | Shakir, I.; Shibghatullah, Abdul Samad; Hussin, Burairah; Gede, Pramudya Ananta; Shafei, Suhailan |
author_sort | Shakir, I. |
title | REDUCING DISTRIBUTED URLS CRAWLING TIME : A COMPARISON OF GUIDS AND IDS |
publisher | Little Lion Scientific Islamabad Pakistan |
publishDate | 2014 |
url | http://eprints.utem.edu.my/id/eprint/13289/3/GUIDs.pdf http://eprints.utem.edu.my/id/eprint/13289/ |
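The record does not say how the authors generated or stored their identifiers, so the following is only a minimal illustrative sketch, under stated assumptions, of the difference between the two schemes named in the abstract. It assumes that "static IDs" means sequential integers drawn from one shared counter (a coordination point every worker must lock) and that "dynamic GUIDs" means random UUIDs that each worker generates independently. The names `SequentialIdAllocator`, `GuidAllocator` and `label_urls` are hypothetical, and the sketch makes no attempt to reproduce the reported 7% result.

```python
import threading
import uuid
from itertools import count


class SequentialIdAllocator:
    """Hypothetical static-ID scheme: one shared counter that every
    worker must lock before it can label a URL."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counter = count(1)

    def assign(self, url):
        # Coordination point: all workers serialize on this lock.
        with self._lock:
            return next(self._counter), url


class GuidAllocator:
    """Hypothetical GUID scheme: each worker labels URLs on its own
    with a random UUID, so no shared counter is involved."""

    def assign(self, url):
        return uuid.uuid4(), url


def label_urls(allocator, urls, workers=4):
    """Split the URL list across worker threads and collect (key, url) pairs."""
    results = []
    results_lock = threading.Lock()

    def work(chunk):
        labelled = [allocator.assign(u) for u in chunk]
        with results_lock:
            results.extend(labelled)

    # Round-robin split of the URL list, one chunk per worker thread.
    chunks = [urls[i::workers] for i in range(workers)]
    threads = [threading.Thread(target=work, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Locally generated URLs, mirroring the abstract's setup only loosely.
    urls = [f"http://localhost/page/{i}" for i in range(20000)]
    print(len(label_urls(SequentialIdAllocator(), urls)))
    print(len(label_urls(GuidAllocator(), urls)))
```

Both allocators give every URL a unique key. The design difference is where uniqueness comes from. The sequential scheme obtains it from a single shared counter. In a multi-machine crawler, such a counter would typically live in a database or coordinator service. The GUID scheme instead relies on the very low collision probability of random version-4 UUIDs, which lets workers label URLs without asking anyone. Whether that difference accounts for the paper's measured speed-up is not stated in this record.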