Parallel web crawler architecture for clickstream analysis

The tremendous growth of the Web poses many challenges for single-process crawlers, including irrelevant answers among search results and coverage and scaling issues. As a result, more robust algorithms are needed to produce more precise and relevant search results in an appropr...

Bibliographic Details
Main Authors: Ahmadi-Abkenari, Fatemeh, Selamat, Ali
Format: Book Section
Published: Springer 2012
Subjects: QA75 Electronic computers. Computer science
Online Access: http://eprints.utm.my/id/eprint/35741/
http://dx.doi.org/10.1007/978-3-642-32826-8_13
id my.utm.35741
record_format eprints
spelling my.utm.35741 2017-02-02T04:57:11Z http://eprints.utm.my/id/eprint/35741/ QA75 Electronic computers. Computer science. PeerReviewed. Ahmadi-Abkenari, Fatemeh and Selamat, Ali (2012) Parallel web crawler architecture for clickstream analysis. In: Communications in Computer and Information Science. Springer, Berlin, pp. 123-132. ISBN 978-3-642-32825-1 (Print); 978-3-642-32826-8 (Electronic). http://dx.doi.org/10.1007/978-3-642-32826-8_13 DOI:10.1007/978-3-642-32826-8_13
institution Universiti Teknologi Malaysia
building UTM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Teknologi Malaysia
content_source UTM Institutional Repository
url_provider http://eprints.utm.my/
topic QA75 Electronic computers. Computer science
description The tremendous growth of the Web poses many challenges for single-process crawlers, including irrelevant answers among search results and coverage and scaling issues. As a result, more robust algorithms are needed to produce more precise and relevant search results in a timely manner. Existing Web crawlers mostly implement link-dependent Web page importance metrics. One barrier to applying these metrics is that they produce considerable communication overhead in multi-agent crawlers. Moreover, they depend heavily on the size of the crawler's own index, so they fail to rank Web pages with complete accuracy. Hence, more enhanced metrics are needed in this area. Proposing a new Web page importance metric requires defining a new architecture as a framework in which to implement the metric. The aim of this paper is to propose an architecture for a focused parallel crawler. In this framework, decisions on Web page importance are based on a combined metric of clickstream analysis and analysis of context similarity to the issued queries.
format Book Section
author Ahmadi-Abkenari, Fatemeh
Selamat, Ali
title Parallel web crawler architecture for clickstream analysis
publisher Springer
publishDate 2012