Architecture for a parallel focused crawler for clickstream analysis
The tremendous growth of the Web poses many challenges for all-purpose single-process crawlers, including irrelevant answers among search results and coverage and scaling issues arising from the enormous size of the World Wide Web. Meanwhile, more advanced and convincing al...
Main Authors: | Selamat, Ali; Ahmadi-Abkenari, Fatemeh |
---|---|
Format: | Conference or Workshop Item |
Published: | 2011 |
Online Access: | http://eprints.utm.my/id/eprint/45605/ http://dx.doi.org/10.1007/978-3-642-20039-7-3 |
id |
my.utm.45605 |
---|---|
record_format |
eprints |
spelling |
my.utm.45605 2017-08-29T00:57:18Z http://eprints.utm.my/id/eprint/45605/ Architecture for a parallel focused crawler for clickstream analysis. Selamat, Ali; Ahmadi-Abkenari, Fatemeh. 2011. Conference or Workshop Item. PeerReviewed. Selamat, Ali and Ahmadi-Abkenari, Fatemeh (2011) Architecture for a parallel focused crawler for clickstream analysis. In: The 3rd Asian Conference On Intelligent And Database System. http://dx.doi.org/10.1007/978-3-642-20039-7-3 |
institution |
Universiti Teknologi Malaysia |
building |
UTM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Teknologi Malaysia |
content_source |
UTM Institutional Repository |
url_provider |
http://eprints.utm.my/ |
description |
The tremendous growth of the Web poses many challenges for all-purpose single-process crawlers, including irrelevant answers among search results and coverage and scaling issues arising from the enormous size of the World Wide Web. Meanwhile, more advanced and convincing algorithms are in demand to yield more precise and relevant search results in a reasonable amount of time. Because link-based Web page importance metrics are not a complete solution for identifying the best answer set in the overall search system, and because employing such metrics within a multi-process crawler imposes considerable communication overhead on the overall system, a link-independent Web page importance metric is required to govern the priority rule within the queue of fetched URLs. The aim of this paper is to propose a modest weighted architecture for a focused, structured parallel crawler in which credit is assigned to discovered URLs using a combined metric based on clickstream analysis and analysis of Web page text similarity to the specified mapped topic(s). |
format |
Conference or Workshop Item |
author |
Selamat, Ali; Ahmadi-Abkenari, Fatemeh |
title |
Architecture for a parallel focused crawler for clickstream analysis |
publishDate |
2011 |