fantomNews™ — the ultimate know


Lies, Damn Lies and Search Engine Stats: War Diary of a Con Job

(rt) Ever since Yahoo and MSN joined the fray with newly consolidated and/or fresh resources, the notorious Search Engine Wars have flared up full scale again.

The contenders are competing public corporations, constantly in the limelight of shareholders, investment analysts and consultants, so it is hardly surprising to see them flaunt their index size figures all over the place. Size, after all, is a concept much easier for most people (including those working in the media) to grasp than “quality” or “relevancy”. Being traded on the stock exchange equates with access to funding by banks and investors, a crucial entrepreneurial factor in an industry requiring vast amounts of resources for research and development in an ever more sophisticated technological environment.

However, the more frantic the competition gets, the more some hard-nosed analysis of their fundamental claims seems called for. What is verifiable data, what are merely wild claims … and what are investors and search marketers supposed to make of it all?

Admittedly, the material presented below is old hat for many of the more Google-critical and number-crunching SEOs out there. However, for those (presumably the vast majority) who haven’t seen it yet, it seems more than justified to point it out. The main reason is that, rather than constituting yet another contribution towards Google bashing if not trashing, the research outlined impacts all your search analytics. It ranges from the actual size of Google’s index (in blatant contrast to their on-site claims, which effectively inflate their figures by 66%!) via their loss of millions of web pages to the perennial “my index rocks, your index sucks” race between the major search operators, broken search algos, and much, much more.

So let’s introduce a French-and-English blog maintained by Jean Véronis of Aix-en-Provence, France, professor of Linguistics and Information Technology and director of CILSH, the Centre Informatique pour les Lettres et Sciences Humaines (roughly: Computing Center for Letters and Humanities), at the Université de Provence in Aix. Véronis also doubles as director of DELIC, DEscription Linguistique Informatisée sur Corpus (Computerized Corpus-Based Linguistic Description), a major research center for Corpus Linguistics.

The blog is named Technologie du langage and, as the title makes clear, is focused on language technology.

Some of Véronis’ published findings are quite spectacular and have set in motion a flurry of corrective activities at the search engines’ headquarters. Nor is this a mere frivolous claim on the author’s part, as this (French only) piece shows: Web: Le futur selon Yahoo. Here, he mentions Jan Pedersen, Chief Scientist at Yahoo!, citing Véronis at length and even displaying his graphs in the course of a presentation at the 10th Search Engine Meeting in Boston, Massachusetts, which took place only last month.

For the following overview we have only listed those of Véronis’ postings available in English. We may present a digest of his French material at some later point. The titles are fairly self-explanatory, so we will restrict ourselves to short quotes – highly recommended reading!

The entries are sorted in chronological order.

 

January 26, 2005
 Web: Google’s counts faked?
Quote:

In any case, I would not recommend professional uses of Google’s counts (such as  “Google linguistics”). Yahoo! seems more reliable — or are they simply cleverer?

February 08, 2005
 Web: Google’s missing pages: mystery solved?
Here’s his take on Google’s botched Boolean search algo:

In all likelihood, the Google engineers simply forgot to plug the extrapolation routine at the end of the boolean module! Therefore, if you want to know the real index count for any word, simply type it twice:

    Word Count
    stuttering 749,000
    stuttering stuttering 452,000

The second line is likely to be the real count…
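As a quick sanity check on the figures quoted above, the arithmetic can be sketched as follows. This is our own minimal illustration, not Véronis’ method or code: the function name `inflation_percent` is hypothetical, and the sketch simply assumes, as the quote suggests, that the doubled-query count is the closer approximation of the real figure.

```python
def inflation_percent(reported: int, doubled_query: int) -> float:
    """Percentage by which the reported count exceeds the doubled-query count.

    Assumption (taken from the quote above): the count returned for the
    word typed twice is the closer approximation of the real index count.
    """
    if doubled_query <= 0:
        raise ValueError("doubled_query must be positive")
    return (reported / doubled_query - 1.0) * 100.0


# Counts for "stuttering" as listed in the table above.
reported = 749_000
doubled = 452_000

print(f"{inflation_percent(reported, doubled):.1f}% inflation")
```

For the “stuttering” example this works out to roughly two thirds, which is in line with the 66% figure mentioned elsewhere in this piece.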

February 28, 2005
 Web: MSN cheating too?
Quote:

Google : 66% inflation ; MSN : 33% inflation. About half. Coincidental ?

In any case, so far only Yahoo’s results seem coherent (should I say sincere ?). The irony is that Google probably inflated its count because of MSN’s pressure, when MSN announced 5 billion pages, but it seems that MSN is playing a trick too!

March 09, 2005
 Web: Yahoo doubles its counts!
Quote:

Yahoo has clearly caught up on Google in terms of size and quality (relevance, freshness, etc.), and is beginning to gain more and more respect among professional users, experts, academics (a good step was the release of a very nice API a few days ago).

March 13, 2005
 Web: Google adjusts its counts
Quote:

The Googlers must have been slightly embarrassed, and since the study was published (Feb. 8th), they have been adjusting the counts in a major way to correct the situation.

March 23, 2005
 Google: 5 billion “the” have disappeared overnight
Quote:

Interestingly enough, the new results reveal very clearly that Yahoo indexes more pages than Google

March 25, 2005
 Google: A snapshot of the update
Quote:

Google is currently undergoing major modifications, in which the problem is no more a simple index update, but an in-depth correction of extrapolation routines and boolean logic, in order to fix the count aberrations

In other words: this is no mere “slightly tweaking the algo to improve search results” – it’s a full-fledged upscaling problem, an issue regular readers of fantomNews have been familiar with for years.

And the show goes on …


Trackback link: http://fantomaster.com/fantomNews/archives/2005/05/18/lies-damn-lies-and-search-engine-stats-war-diary-of-a-con-job/trackback/


