Geographically Distributed Web Crawler full report

Geographically Distributed Web Crawler

Introduction

Web crawling is a resource-intensive process, in terms of both processing and communication. Distributing the crawling activity among multiple machines spreads the processing load, and distributing those machines geographically can significantly reduce the communication cost. Communication is reduced for two reasons. First, when the crawler chosen is near the web server being crawled, the HTTP fetch of the server's content travels a shorter distance. Second, each crawler can compress the index it sends back to the central indexing location, whereas the raw content would otherwise have traveled uncompressed over HTTP.
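
The two savings described above can be illustrated with a minimal sketch. It is not the authors' implementation: the crawler sites, their coordinates, and the function names are all hypothetical, and great-circle distance is used only as a rough stand-in for network distance, which the post does not define. The compression step uses zlib simply as one readily available choice.

```python
import math
import zlib

# Hypothetical crawler sites: name -> (latitude, longitude) in degrees.
CRAWLERS = {
    "ithaca": (42.44, -76.50),
    "london": (51.51, -0.13),
    "tokyo": (35.68, 139.69),
}

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_crawler(server_location, crawlers=CRAWLERS):
    """Pick the crawler site closest to the web server (assumed proxy for a short fetch path)."""
    return min(crawlers, key=lambda name: haversine_km(crawlers[name], server_location))


def pack_index(index_text):
    """Compress index data before sending it to the central indexing location."""
    return zlib.compress(index_text.encode("utf-8"), 9)


def unpack_index(payload):
    """Reverse of pack_index, as the central location would run it."""
    return zlib.decompress(payload).decode("utf-8")
```

For example, a server located in Paris (roughly 48.86, 2.35) would be assigned to the hypothetical "london" site, and only the compressed index would then cross the long-haul link to the central location. A real deployment would more likely measure network latency or hop count than geographic distance.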



Presented By:
Aseem Bajaj and Emin Gun Sirer
Cornell University, Ithaca, NY


For more, please read:
http://research.yahoofiles/paper_0.pdf
http://aseempapers/GeoDistCrawler.pdf