A clickstream-based focused trend parallel web crawler

The rapid growth of the World Wide Web creates many obstacles for all-purpose single-process crawlers, including incorrect answers among search results and poor scaling. As a result, more refined heuristics are needed to provide more accurate search outcome...

Bibliographic Details
Main Authors: Ahmadi-Abkenari, F., Selamat, Ali
Format: Article
Language: English
Published: Foundation of Computer Science, USA 2010
Subjects: QA75 Electronic computers. Computer science
Online Access: http://eprints.utm.my/id/eprint/37000/2/pxc3871866.pdf
http://eprints.utm.my/id/eprint/37000/
http://www.ijcaonline.org/volume9/number5/pxc3871866.pdf
id my.utm.37000
record_format eprints
spelling my.utm.37000 2017-02-15T00:33:41Z http://eprints.utm.my/id/eprint/37000/ A clickstream-based focused trend parallel web crawler Ahmadi-Abkenari, F. Selamat, Ali QA75 Electronic computers. Computer science Foundation of Computer Science, USA 2010-11 Article PeerReviewed text/html en http://eprints.utm.my/id/eprint/37000/2/pxc3871866.pdf Ahmadi-Abkenari, F. and Selamat, Ali (2010) A clickstream-based focused trend parallel web crawler. International Journal of Computer Applications, 9 (5). pp. 1-8. ISSN 0975-8887 http://www.ijcaonline.org/volume9/number5/pxc3871866.pdf DOI:10.5120/1385-1866
institution Universiti Teknologi Malaysia
building UTM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Teknologi Malaysia
content_source UTM Institutional Repository
url_provider http://eprints.utm.my/
language English
topic QA75 Electronic computers. Computer science
description The rapid growth of the World Wide Web creates many obstacles for all-purpose single-process crawlers, including incorrect answers among search results and poor scaling. As a result, more refined heuristics are needed to deliver more accurate search results in a timely manner. Employing link-dependent Web page importance metrics within a parallel crawler imposes considerable overhead on the overall search system, and such metrics cannot cover authorized Web content in the dark net or authorized fresh pages; they are therefore not a complete solution within a search engine's architecture. This paper proposes applying a link-independent Web page importance metric to govern the priority rule within the crawl frontier. It presents a modest weighted architecture for a focused structured parallel Web crawler (CFP crawler) in which credit is assigned to URLs in the crawl frontier according to a clickstream-based prioritizing algorithm.
format Article
author Ahmadi-Abkenari, F.
Selamat, Ali
title A clickstream-based focused trend parallel web crawler
publisher Foundation of Computer Science, USA
publishDate 2010
url http://eprints.utm.my/id/eprint/37000/2/pxc3871866.pdf
http://eprints.utm.my/id/eprint/37000/
http://www.ijcaonline.org/volume9/number5/pxc3871866.pdf