Breadth-First Search Crawling Yields High-Quality Pages
Marc Najork and Janet Wiener
Reviewed by Jonathan Ledlie (jonathan@eecs)
October 2, 2001

The authors argue that the PageRank algorithm for computing page quality, while very good at finding pages of high quality, is too computationally intensive to scale to the billions of pages on the Internet, and that it is well approximated by a breadth-first search. Their analysis compares a breadth-first crawl of 351 million pages with a PageRank analysis of the same set of pages and finds that the pages with the highest PageRank are heavily concentrated at the beginning of the crawl.

They do not perform a strict breadth-first search to download the pages they analyze. Instead, their web crawler, called Mercator, pauses briefly between downloads of pages from the same web server. It seemed unclear to me that this design change -- originally made to avoid overloading a crawled web server -- would actually lead to significantly different results. Still, it does show that crawlers which observe this politeness do not get poor results.

One area the authors leave unresolved is how local links should be weighted in comparison with remote ones. To me, they successfully show that, essentially whatever further analysis is going to be performed on the crawled pages, a breadth-first search will yield better, more timely results than other crawling methods.
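A minimal sketch of the crawling strategy described above -- a FIFO frontier for breadth-first order, plus a brief pause between downloads from the same host -- might look like the following. This is my own illustration, not the authors' Mercator code (which is a large, distributed Java system); the toy in-memory link graph and the delay value are assumptions for demonstration only:

```python
import time
from collections import deque

# Hypothetical in-memory "web": url -> outgoing links (illustrative data only).
WEB = {
    "a.com/1": ["a.com/2", "b.com/1"],
    "a.com/2": ["c.com/1"],
    "b.com/1": ["a.com/1", "c.com/1"],
    "c.com/1": [],
}

def host(url):
    return url.split("/")[0]

def bfs_crawl(seeds, delay=0.0):
    """Breadth-first crawl with a per-host politeness delay.

    A plain FIFO frontier yields strict BFS order; sleeping until
    `delay` seconds have elapsed since the last request to a host
    approximates Mercator's pause between downloads from one server.
    """
    frontier = deque(seeds)
    seen = set(seeds)
    last_hit = {}              # host -> time of previous download
    order = []
    while frontier:
        url = frontier.popleft()
        h = host(url)
        last = last_hit.get(h)
        if last is not None:
            wait = last + delay - time.monotonic()
            if wait > 0:
                time.sleep(wait)   # be polite to this server
        last_hit[h] = time.monotonic()
        order.append(url)          # stand-in for downloading the page
        for link in WEB.get(url, []):
            if link not in seen:   # enqueue each page at most once
                seen.add(link)
                frontier.append(link)
    return order
```

Note that the politeness delay only reorders downloads in time, not in the frontier: pages are still dequeued in breadth-first order, which matches the reviewer's point that the pause is unlikely to change the results dramatically.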