Mar 19 2008
The Fallacy in our Search for Information
Jan. 8th, 2008 marked the official 10-year anniversary of Google’s famous PageRank algorithm. The idea was conceived earlier, but Larry Page filed the PageRank algorithm patent (U.S. Patent 6,285,999) on Jan. 8th, 1998. PageRank, together with the business acumen of the Google folks, powered Google to the throne of the Internet. Is this worth celebrating? Maybe not. Here’s why.

Clicks on Internet search results follow a power law (Pareto distribution), i.e. listings on the first few pages of search results get nearly all the traffic. Based on AOL search data covering 20 million searches by 650,000+ users over a 3-month period, the 1st search result was chosen 42.3% of the time, and a result on the 1st page (the first 10 results) 89.6% of the time. The percentage drops off sharply after the 1st page, and nobody ever picked a search result beyond the 502nd position. See the diagram.
The order of search results on Google, the planet’s most popular search engine, is governed by the PageRank algorithm. The PageRank algorithm assigns a numerical weight to each HTML page based on the number and the weight of other pages linking to it (called incoming links).
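To make the idea of "weight based on the number and weight of incoming links" concrete, here is a minimal sketch of the textbook power-iteration form of PageRank. It is not Google's production code. The function name `pagerank`, the toy link graph, the damping factor of 0.85 (the value commonly quoted for the algorithm) and the fixed iteration count are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Return an approximate PageRank score for every page.

    `links` maps each page to the list of pages it links to.
    `damping` and `iterations` are illustrative defaults, not Google's settings.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with the weight spread evenly over all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline weight regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current weight equally among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its weight over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: a, b and d all link to c; c links back to a.
toy_web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(toy_web)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this toy web, page c collects the most incoming links and ends up with the highest score, and page a outranks b and d only because its single incoming link comes from the heavily weighted c. That is exactly the lever the link sellers described below are pulling.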
It’s of utmost importance to Internet marketers that their websites are listed on the first page of results for their chosen keywords. And this requires a lot of incoming links with high PageRank. Tragically, much of the creativity of our generation of netizens has been consumed by paid listings, link exchanges, SEO, spam comments, etc. Sites that do not engage in self-promotion for incoming links disappear into oblivion in the search space. Plenty of SEO marketers promise a 1st-page listing within a week, for a fee. And they fulfill the promise most of the time.
Can true knowledge compete with the assault of SEO marketers armed with an excessive array of modern weapons (tools, bots and phony web sites)? The competition is like that between the European settlers and the American Indians in the New World.
What Google search results present are, at best, our popular beliefs. And the large collection of Google Bombs is living proof of how easily and regularly popular beliefs are manipulated. Try searching for “French Victory”, “Dangerous cult” and “Miserable failure”. The No. 1 Google results point to pages about “French defeat”, “Scientology” and “George Bush” respectively. Google defused some of the bombs in Jan. 2008. But God knows how many more are still at large or in the making.
Pages filled with disinformation swamp the ones with useful information, pushing them down in the ranking. To see this effect, you only need to search for the term “credit card”. You won’t find a credit card provider on the first page. To get noticed, the credit card providers have to pay, normally over $2 per click. One may speculate that it is precisely this effect which sustains the growth of Google’s revenue.
Alas, we have created the great Internet in our own intellectual image. Are we aware of how fallible and distorted this image has become? The image was shaped not by our understanding and truth, but rather by the commercial value of keywords. Most of us will come to think alike because we read exactly the same information in our Google searches. I wish Google had a “research philosopher” on staff to understand the philosophical impact of its search engine.
Back to PageRank. The PageRank idea is similar to citation ranking in academic papers, where the most cited papers are considered the most original. Researchers in the early ’70s showed that a high citation ranking often reflects not importance but rather a phenomenon called the Matthew Effect (famous authors get disproportionately more citations than less famous ones). As Nassim Taleb put it in “The Black Swan”,
“Academic success is partly (but significantly) a lottery” (page 217).
PageRank has shown its age. To prolong the life of the aging PageRank algorithm, Google proposed the “NoFollow” link attribute (rel="nofollow" on HTML links) in 2005. Google’s recent crackdown on link farms and paid links is another attempt to maintain the meaning of PageRank. Will paid links be completely stamped out? Betting on it would mean admitting that our generation of netizens has no ingenuity.
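For readers who have not seen NoFollow in practice: it is a hint placed on an individual link, asking search engines not to count that link as a vote. The sketch below shows how a hypothetical crawler might separate the two kinds of links using Python's standard html.parser module. The class name, the helper `split_links` and the sample HTML are made up for illustration, and how Google actually treats such links internally is not something this sketch claims to know.

```python
from html.parser import HTMLParser


class FollowedLinkCollector(HTMLParser):
    """Sort the links on a page into followed and nofollow lists."""

    def __init__(self):
        super().__init__()
        self.followed = []  # links a ranking algorithm would count as votes
        self.skipped = []   # links marked rel="nofollow"

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        # rel may hold several space-separated tokens, or be missing entirely.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.skipped.append(href)
        else:
            self.followed.append(href)


def split_links(html):
    """Return (followed, skipped) link lists for a chunk of HTML."""
    collector = FollowedLinkCollector()
    collector.feed(html)
    collector.close()
    return collector.followed, collector.skipped


# Hypothetical blog page: one editorial link and one link from a spam comment.
sample = (
    '<p><a href="http://example.com/article">a good article</a></p>'
    '<p><a rel="nofollow" href="http://example.com/pills">cheap pills</a></p>'
)
followed, skipped = split_links(sample)
print("counted:", followed)
print("ignored:", skipped)
```

The attribute only works when the site owner applies it, which is why it helps against comment spam but does little against links that a site owner has been paid to place.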
That said, I believe Google only wants to prolong the life of PageRank until the next search ranking algorithm is ready.
PageRank is dead; long live PageRank!
What will the next algorithm look like? Well, for the sake of brevity, I’ll leave that to the next post. But here’s a hint. Why was the Google Toolbar promoted so aggressively? Why do other companies, e.g. Yahoo and Alexa/Amazon, promote their toolbars equally aggressively?
Until next time, please write down your praise, disapproval, criticism and suggestions.
The Matthew Effect has nothing whatever to do with randomness. It is about giving credit for scientific papers to the best known author of a paper regardless of how much the best known author contributed to the work.
John, I stand corrected. Text updated. Thanks for pointing this out.