Architecture for a parallel focused crawler for clickstream analysis

The tremendous growth of the Web poses many challenges for all-purpose single-process crawlers including the presence of some irrelevant answers among search results and the coverage and scaling issues regarding the enormous dimension of the World Wide Web. Meanwhile, more enhanced and convincing al...

Bibliographic Details
Main Authors: Selamat, Ali, Ahmadi-Abkenari, Fatemeh
Format: Conference or Workshop Item
Published: 2011
Online Access: http://eprints.utm.my/id/eprint/45605/
http://dx.doi.org/10.1007/978-3-642-20039-7-3
id my.utm.45605
record_format eprints
spelling my.utm.45605 2017-08-29T00:57:18Z 2011 Conference or Workshop Item PeerReviewed Selamat, Ali and Ahmadi-Abkenari, Fatemeh (2011) Architecture for a parallel focused crawler for clickstream analysis. In: The 3rd Asian Conference On Intelligent And Database System. http://dx.doi.org/10.1007/978-3-642-20039-7-3
institution Universiti Teknologi Malaysia
building UTM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Teknologi Malaysia
content_source UTM Institutional Repository
url_provider http://eprints.utm.my/
description The tremendous growth of the Web poses many challenges for all-purpose single-process crawlers, including irrelevant answers among search results and coverage and scaling issues arising from the enormous size of the World Wide Web. Meanwhile, more enhanced and convincing algorithms are in demand to yield more precise and relevant search results in an appropriate amount of time. Because link-based Web page importance metrics are not an absolute solution for identifying the best answer set, and because employing such metrics within a multi-process crawler imposes considerable communication overhead on the overall system, a link-independent Web page importance metric is required to govern the priority rule within the queue of fetched URLs. The aim of this paper is to propose a modest weighted architecture for a focused structured parallel crawler in which credit is assigned to discovered URLs using a combined metric based on clickstream analysis and analysis of Web page text similarity to the specified mapped topic(s).
format Conference or Workshop Item
author Selamat, Ali
Ahmadi-Abkenari, Fatemeh
title Architecture for a parallel focused crawler for clickstream analysis
publishDate 2011
url http://eprints.utm.my/id/eprint/45605/
http://dx.doi.org/10.1007/978-3-642-20039-7-3
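
The abstract describes a URL queue whose priority is governed by a combined metric of clickstream analysis and text similarity to the target topic. This record does not give the paper's formula, so the following is only a minimal sketch under stated assumptions: the linear weighting, the weight `w_click`, the bag-of-words cosine similarity, and the names `combined_score` and `Frontier` are all hypothetical illustrations, not the authors' method.

```python
import heapq
import math
from collections import Counter


def cosine_similarity(text: str, topic: str) -> float:
    """Cosine similarity between bag-of-words term counts (assumed text metric)."""
    a, b = Counter(text.lower().split()), Counter(topic.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def combined_score(click_score: float, similarity: float, w_click: float = 0.5) -> float:
    """Hypothetical linear mix of two scores, each assumed to lie in [0, 1]."""
    return w_click * click_score + (1.0 - w_click) * similarity


class Frontier:
    """URL queue that returns the highest-scoring URL first."""

    def __init__(self) -> None:
        self._heap = []
        self._count = 0  # tie-breaker: equal scores pop in insertion order

    def push(self, url: str, score: float) -> None:
        # heapq is a min-heap, so the score is negated to pop the maximum first.
        heapq.heappush(self._heap, (-score, self._count, url))
        self._count += 1

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


# Illustrative data only: URLs, click scores, and page texts are made up.
topic = "parallel web crawler"
frontier = Frontier()
for url, click_score, text in [
    ("http://example.org/a", 0.10, "recipes for a quick dinner"),
    ("http://example.org/b", 0.40, "design of a parallel web crawler"),
]:
    frontier.push(url, combined_score(click_score, cosine_similarity(text, topic)))
```

Because each score depends only on a page's own click data and text, a crawler process could in principle rank its URLs without exchanging link information with other processes, which is the communication overhead the abstract attributes to link-based metrics.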