fantomTip:
“The Googleness of Being” Further Qualified
(rt) Very geeky stuff, as was Michael Martinez‘ original piece → On the Googleness of Being: this followup is a full three pages’ read – certainly not the easiest of undertakings, even for the experienced SEO, but he’s done it yet again – probably the most important reverse-engineering based analysis of Google’s ranking algorithm published in the past 2 years or so.
Things have changed even since February (when “The Googleness of Being” was originally published), so he takes into account Google’s recently detected lifting of the → 101K limit for cached pages.
More noteworthy is his inclusion of → Google’s recent patent application and what it may import. While many of his original assertions seem to be borne out by Google’s paper, he introduces a set of additional concepts to better describe the overall ranking process: Fixed Content (referring to originally fresh content turned static), Reassociation (a switch in a document’s relevance from one expression to another), Productive History (a site’s performance in search results), and several others.
To give an example of his reasoning:
Since millions of queries are conducted across Google each day, the simplest method of measuring a URL’s performance would be to create a ranking vector which records every position from 1 to 1000 that a URL is returned for (without regard for the queries). This could lead to an exorbitant amount of data for extremely popular sites, but the daily vector could be averaged and then stored in a monthly vector. The monthly vector would have up to 31 elements, of which the first 28 would be most significant (alternatively, Google could just go with an artificial 28-day month to match its approximate weekly update cycles and allow for a 4-week rebuild process). The average of a URL’s vector performances could be used as a measure of the site’s popularity, scope of content, and general importance.
Some other interesting inferences and conclusions based on actual search results:
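To make the vector idea a little more tangible, here is a minimal Python sketch of how such a record might be kept. It is purely our own illustration, not Google's or Michael's implementation: the function names, the sample log, and the choice of a plain arithmetic mean are all assumptions.

```python
from statistics import mean

MAX_POSITION = 1000  # positions 1..1000, as in the quoted passage


def daily_average(positions):
    """Average all result positions a URL was returned at during one day.

    Positions outside 1..MAX_POSITION are ignored (an assumption of this sketch).
    """
    valid = [p for p in positions if 1 <= p <= MAX_POSITION]
    return mean(valid) if valid else None


def monthly_vector(daily_positions, days=28):
    """Build one averaged element per day; None marks days without appearances."""
    return [daily_average(daily_positions.get(day, [])) for day in range(1, days + 1)]


def performance_score(vector):
    """Average the daily averages, skipping empty days (lower means better ranked)."""
    observed = [v for v in vector if v is not None]
    return mean(observed) if observed else None


# Hypothetical log for one URL: day of month -> positions observed that day.
log = {1: [3, 5, 10], 2: [4, 4], 3: [1200, 7]}
vector = monthly_vector(log)
score = performance_score(vector)
```

The sketch uses the "artificial 28-day month" variant mentioned in the quote; passing `days=31` would give the calendar-month version instead.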
[…] we can conclude that Google is indeed adapting user behavior to modify its ranking algorithm, which implies that search results rankings will be more dependent upon where query results terminate in chains of successive searches than upon other off-page factors (such as inbound link anchor text). Google may be attempting to learn what is relevant by watching for how users refine their queries.1
1 This seems to fit seamlessly into Google’s freshly beta-launched “personalized search” tracking setup → My Search History, effectively creating a prime self-serving data resource. This in turn, of course, fits in neatly with Google’s ambitions towards becoming the world’s #1 data mining company. Vide our recent op-ed:
↗ fantOpEd:
Search Engines as Data Gobblers
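As a rough illustration of what "where query results terminate in chains of successive searches" could mean in practice, the following Python sketch simply counts the final query of each search session. The session data and the function name are hypothetical, and we do not claim Google works this way.

```python
from collections import Counter


def terminal_queries(sessions):
    """Count how often each query is the last one in a chain of refinements."""
    return Counter(chain[-1] for chain in sessions if chain)


# Hypothetical sessions: each list is one user's successive query refinements.
sessions = [
    ["hotels", "hotels paris", "cheap hotels paris"],
    ["paris lodging", "cheap hotels paris"],
    ["hotels"],
]
counts = terminal_queries(sessions)
```

In this toy data most chains end at the same refined query, which is the kind of signal the quoted passage suggests could be weighted above anchor text.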
If his assumption should prove correct, Michael foresees the development of query spam as a means of influencing Google’s then-predetermined weighting of popular search queries and the pre-listed results tied to them.
He further articulates a very soft-spoken but no less devastating critique of Google’s proprietary PageRank algorithm:
[…] they have demonstrated a resourcefulness in refining what is essentially a flawed concept. Relevance and authority have never been ascertainable by link analysis because the vast majority of Web documents are unaware of each other. That is, studies have shown that popular Web sites get links faster than other Web sites→ , creating a self-sustaining cycle of favoring well-known sites over poorly known sites.
Who gets linked to, therefore, is not determined by quality, but by visibility, and visibility is no measure of quality. Good content can only overcome its competition by achieving high visibility, but if it has no visibility to begin with, it will accrue less visibility than any site already visible.
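The self-sustaining cycle described here can be sketched with a simple proportional model in Python. This is our own simplification (new links are handed out in proportion to existing link counts); it is not taken from the paper, and the starting figures are invented.

```python
def simulate(initial_links, new_links_per_round, rounds):
    """Distribute new links each round in proportion to current link counts.

    Assumes at least one site starts with a positive link count.
    """
    links = [float(n) for n in initial_links]
    for _ in range(rounds):
        total = sum(links)
        links = [n + new_links_per_round * n / total for n in links]
    return links


# Hypothetical starting point: a well-known, a modest, and a nearly invisible site.
start = [100, 10, 1]
final = simulate(start, new_links_per_round=50, rounds=10)
gap_before = start[0] - start[2]
gap_after = final[0] - final[2]
```

Under this assumption each site's share of links never changes, so the absolute gap between the visible and the invisible site keeps widening regardless of content quality.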
Next comes an equally lethal criticism of the prevalent “link building frenzy” amongst the SEO crowd: if Google should develop, as Michael anticipates, a set of methodologies for ascertaining relevance, “each with approximate equal probability of success to the others, they can cycle through the methodologies in satisfying queries” – effectively leaving SEOs out in the cold, at least until they have caught up with this approach and have succeeded in developing new strategies to overcome it.
Not that they would have too many choices: if Google capitalizes on its databases of historical search queries by structuring them into blocks with parametrized classes, it will be fairly easy to detect attempts at manipulation via botnets spreading weight-skewing spam queries, which in any case would probably have to be spread across the globe so as not to trigger IP class alerts etc. – a daunting task well beyond the resources of most present-day SEO agencies.
The only minor contention we have with this view is the sheer volume of queries. While combinations of search terms (keywords) may not actually be infinite, they do come pretty close. This would entail massive scaling problems not easily addressed.
Also, changing search behavior must be factored into the overall methodology: if searchers were still wont to type in one to two term queries only a few years ago, they seem to have become a lot more sophisticated these days – or maybe they are simply fed up with being delivered tons and tons of generic search results not reflecting their real targets. In any case, as → research has shown, four term queries seem to deliver the highest sales conversions, indicating a radical evolution of differentiated search operations. Thus, historical data will probably be of limited use.
On the other hand, Google’s prime advantage over any and all SEOs is its direct interface with user behavior. Moreover, their My Search History service, once launched in its final version and adopted by millions of users, will enable them to gauge individualized search behavior to a degree of precision never before achieved. (And of course, the chief advantage of this portalization rests in the fact that all data will be server rather than client based: no having to rely on deletable cookies anymore!)
Another interesting point he makes is the importance of outgoing rather than incoming links for new sites (which won’t have many inbound links organically to begin with anyway):
Google can assess a document’s purpose as much by what the document links to as by what links to the document. A newer page, having few if any inbound links, should be able to influence the determination of its relevance by what it links to. That is, the page makes an initial footprint in its history of association with other pages by saying, “These are the documents I want to be associated with”.
There is lots more, including a brief outlook on viable future optimization strategies, and we strongly recommend you not only peruse but actually study this paper scrupulously if you want to get a shrewd indication of the world of search to come.
This is the (abridged) version on Spider-Food Forum:
→ “Paper: Changes in Google Ranking Strategies”
Read the full paper here:
→ Google: Changes in Google Ranking Strategies
And here’s our original coverage of Michael’s first version:
↗ fantomTip:
Rating the Googleness of Being
You’ll also find our reference to Google’s recently published patent application here:
↗ Where Google Is Heading – the Final Road Map?
There’s a budding thread on it now at Threadwatch:
→ “Trust & Inheritance – Testing Hypotheticals and Google”
[Keywords: Child Inheritance, consumer tracking, contextual advertising, Context Delivery, Correspondence Analysis, Fixed Content, latent semantic indexing, linkage, List Inheritance, LSI, Page Peering, PageRank, ranking algorithms, Reassociation, search analytics, search engine research, semantic indexing, traffic analysis, Trusted Content Site, web analytics ]
Trackback link: http://fantomaster.com/fantomNews/archives/2005/04/20/fantomtip-the-googleness-of-being-further-qualified/trackback/