An architecture for a focused trend parallel web crawler with the application of clickstream analysis

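The tremendous growth of the Web poses many challenges for all-purpose single-process crawlers, including the presence of irrelevant answers among search results and the coverage and scaling issues arising from the enormous size of the World Wide Web. Hence, more advanced algorithms are in demand to yield more precise and relevant search results in an appropriate amount of time. Since employing link-based Web page importance metrics within a multi-process crawler imposes a considerable communication overhead on the overall system and cannot produce a precise answer set, employing these metrics in search engines is not a complete solution for identifying the best search answer set. Thus, a link-independent Web page importance metric is required to govern the priority rule within the queue of fetched URLs. The aim of this paper is to propose a modest weighted architecture for a focused structured parallel Web crawler which employs a link-independent, clickstream-based Web page importance metric. Experiments with this metric over the restricted-boundary Web zone of our crowded UTM University Web site show the efficiency of the proposed metric.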

Bibliographic Details
Main Authors: Ahmadi-Abkenari, Fatemeh; Selamat, Ali
Format: Article
Published: Elsevier Inc. 2012
Subjects: QA75 Electronic computers. Computer science
Online Access: http://eprints.utm.my/id/eprint/28674/
http://dx.doi.org/10.1016/j.ins.2011.08.022
id my.utm.28674
record_format eprints
spelling my.utm.28674 2019-01-28T03:38:10Z http://eprints.utm.my/id/eprint/28674/ Elsevier Inc. 2012-02 Article PeerReviewed Ahmadi-Abkenari, Fatemeh and Selamat, Ali (2012) An architecture for a focused trend parallel web crawler with the application of clickstream analysis. Information Sciences, 184 (1). pp. 266-281. ISSN 0020-0255 http://dx.doi.org/10.1016/j.ins.2011.08.022 DOI:10.1016/j.ins.2011.08.022
institution Universiti Teknologi Malaysia
building UTM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Teknologi Malaysia
content_source UTM Institutional Repository
url_provider http://eprints.utm.my/
topic QA75 Electronic computers. Computer science
description The tremendous growth of the Web poses many challenges for all-purpose single-process crawlers, including the presence of irrelevant answers among search results and the coverage and scaling issues arising from the enormous size of the World Wide Web. Hence, more advanced algorithms are in demand to yield more precise and relevant search results in an appropriate amount of time. Since employing link-based Web page importance metrics within a multi-process crawler imposes a considerable communication overhead on the overall system and cannot produce a precise answer set, employing these metrics in search engines is not a complete solution for identifying the best search answer set. Thus, a link-independent Web page importance metric is required to govern the priority rule within the queue of fetched URLs. The aim of this paper is to propose a modest weighted architecture for a focused structured parallel Web crawler which employs a link-independent, clickstream-based Web page importance metric. Experiments with this metric over the restricted-boundary Web zone of our crowded UTM University Web site show the efficiency of the proposed metric.
format Article
author Ahmadi-Abkenari, Fatemeh
Selamat, Ali
title An architecture for a focused trend parallel web crawler with the application of clickstream analysis
publisher Elsevier Inc.
publishDate 2012
url http://eprints.utm.my/id/eprint/28674/
http://dx.doi.org/10.1016/j.ins.2011.08.022