A clickstream-based focused trend parallel web crawler
The rapid growth of the World Wide Web creates many obstacles for general-purpose single-process crawlers, including incorrect answers among search results and poor scaling. As a result, more enhanced heuristics are needed to provide more accurate search outcome...
Main Authors: | Ahmadi-Abkenari, F., Selamat, Ali |
---|---|
Format: | Article |
Language: | English |
Published: | Foundation of Computer Science, USA, 2010 |
Subjects: | QA75 Electronic computers. Computer science |
Online Access: | http://eprints.utm.my/id/eprint/37000/2/pxc3871866.pdf http://eprints.utm.my/id/eprint/37000/ http://www.ijcaonline.org/volume9/number5/pxc3871866.pdf |
id |
my.utm.37000 |
---|---|
record_format |
eprints |
spelling |
my.utm.37000 2017-02-15T00:33:41Z http://eprints.utm.my/id/eprint/37000/ A clickstream-based focused trend parallel web crawler Ahmadi-Abkenari, F. Selamat, Ali QA75 Electronic computers. Computer science The immense growing dimension of the World Wide Web induces many obstacles for all-purpose single-process crawlers including the presence of some incorrect answers among search results and the scaling drawbacks. As a result, more enhanced heuristics are needed to provide more accurate search outcomes in an appropriate timely manner. Regarding the fact that employing link dependent Web page importance metrics within a parallel crawler yields a considerable overhead on the overall searching system, and also because such a metric is not able to cover the authorized Web content in dark net and authorized fresh pages, therefore employing these metrics is not an absolute solution within search engines’ architecture. This paper proposes the application of a link independent Web page importance metric to govern the priority rule within the crawl frontier through proposing a modest weighted architecture for a focused structured parallel Web crawler (CFP crawler) in which the credit assignment to URLs in crawl frontier is done according to a clickstream-based prioritizing algorithm. Foundation of Computer Science, USA 2010-11 Article PeerReviewed text/html en http://eprints.utm.my/id/eprint/37000/2/pxc3871866.pdf Ahmadi-Abkenari, F. and Selamat, Ali (2010) A clickstream-based focused trend parallel web crawler. International Journal of Computer Applications, 9 (5). pp. 1-8. ISSN 0975-8887 http://www.ijcaonline.org/volume9/number5/pxc3871866.pdf DOI:10.5120/1385-1866 |
institution |
Universiti Teknologi Malaysia |
building |
UTM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Teknologi Malaysia |
content_source |
UTM Institutional Repository |
url_provider |
http://eprints.utm.my/ |
language |
English |
topic |
QA75 Electronic computers. Computer science |
spellingShingle |
QA75 Electronic computers. Computer science Ahmadi-Abkenari, F. Selamat, Ali A clickstream-based focused trend parallel web crawler |
description |
The rapid growth of the World Wide Web creates many obstacles for general-purpose single-process crawlers, including incorrect answers among search results and poor scaling. As a result, better heuristics are needed to provide more accurate search results in a timely manner. Link-dependent Web page importance metrics impose considerable overhead on the overall search system when used within a parallel crawler, and they cannot cover authorized Web content in the dark net or authorized fresh pages; employing these metrics is therefore not a complete solution within search engine architecture. This paper proposes applying a link-independent Web page importance metric to govern the priority rule within the crawl frontier. It does so through a modest weighted architecture for a focused structured parallel Web crawler (CFP crawler), in which credit is assigned to URLs in the crawl frontier according to a clickstream-based prioritizing algorithm. |
format |
Article |
author |
Ahmadi-Abkenari, F. Selamat, Ali |
author_facet |
Ahmadi-Abkenari, F. Selamat, Ali |
author_sort |
Ahmadi-Abkenari, F. |
title |
A clickstream-based focused trend parallel web crawler |
title_sort |
clickstream-based focused trend parallel web crawler |
publisher |
Foundation of Computer Science, USA |
publishDate |
2010 |
url |
http://eprints.utm.my/id/eprint/37000/2/pxc3871866.pdf http://eprints.utm.my/id/eprint/37000/ http://www.ijcaonline.org/volume9/number5/pxc3871866.pdf |
_version_ |
1643650059506548736 |
score |
13.160551 |
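
The description field in this record mentions a crawl frontier whose URLs receive credit from a clickstream-based prioritizing algorithm, but the record does not give the algorithm itself. As an illustration only, the following minimal Python sketch assumes the credit of a URL is simply its visit count in a server log. The names `clickstream_scores` and `CrawlFrontier`, the `(session_id, url)` log layout, and the `example.org` URLs are hypothetical and are not taken from the paper.

```python
import heapq
from collections import Counter


def clickstream_scores(log_entries):
    """Count visits per URL from (session_id, url) pairs (assumed log layout)."""
    return Counter(url for _session, url in log_entries)


class CrawlFrontier:
    """Priority queue of URLs; the URL with the highest score is popped first."""

    def __init__(self, scores):
        self._scores = scores
        self._heap = []
        self._seen = set()
        self._counter = 0  # tie-breaker that preserves insertion order

    def add(self, url):
        # Ignore URLs that were already queued once.
        if url in self._seen:
            return
        self._seen.add(url)
        # heapq is a min-heap, so the score is negated to pop the largest first.
        heapq.heappush(self._heap, (-self._scores.get(url, 0), self._counter, url))
        self._counter += 1

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


# Hypothetical log: page "b" was visited twice, "a" once, "c" never.
log = [
    ("s1", "http://example.org/a"),
    ("s1", "http://example.org/b"),
    ("s2", "http://example.org/b"),
]
frontier = CrawlFrontier(clickstream_scores(log))
for url in ["http://example.org/a", "http://example.org/b", "http://example.org/c"]:
    frontier.add(url)

order = [frontier.pop() for _ in range(len(frontier))]
print(order)
```

Because the score here depends only on logged visits and not on the link graph, a page with no known inbound links can still be prioritized, which is the kind of link independence the description refers to. A real system would likely weight visits in some way; this sketch does not attempt that.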