Scraping Google Down to Size
(rt) Microsoft tried this approach with their infamous “Smart Tags” feature back in 2001 – and had to retract it following an avalanche of protests. After all, webmasters rightly resented their content being tampered with by outside parties solely interested in serving their own commercial agenda.
Now, Google is trying to get away with it, too. But why should they?
Their new Beta Toolbar includes a feature called “AutoLink” (quickly tagged → AutoThink – “Let the toolbar think so you don’t have to!” – by its critics). The toolbar scans through the current Web page and links any addresses or ISBN numbers to Google’s services. Meaning that they will mix your site’s content with competitors’ sales links – without asking your permission, of course. (But then again there’s some consistency in this, for when has Google ever asked web site owners’ permission for anything, including displaying their copyright-protected content within a different context via their Cached Page function …)
Well, here’s a way to level the playing field again. This cute script will stop the toolbar from placing a link in the Web page. It even allows you to substitute your own Amazon affiliate code in any Amazon links the toolbar features.
The AutoBlink Plus version will fill the toolbar’s pull-down menu with junk ISBNs, effectively defeating its function.
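To make the “junk ISBNs” idea a little more concrete, here is a minimal TypeScript sketch. It is not the AutoBlink source, and the function names (isbn10CheckChar, isValidIsbn10, makeJunkIsbns) are made up for illustration. It assumes the classic 10-digit ISBN format and only shows how to produce strings that pass the standard ISBN-10 check digit test while pointing at no particular book.

```typescript
// Hypothetical helpers for illustration only; not taken from AutoBlink.

/** Compute the ISBN-10 check character for a 9-digit body. */
function isbn10CheckChar(body: string): string {
  if (!/^\d{9}$/.test(body)) {
    throw new Error("ISBN-10 body must be exactly 9 digits");
  }
  // Weighted sum: first digit times 10, second times 9, ... ninth times 2.
  let sum = 0;
  for (let i = 0; i < 9; i++) {
    sum += (10 - i) * Number(body.charAt(i));
  }
  // The check value makes the full weighted sum divisible by 11.
  const check = (11 - (sum % 11)) % 11;
  return check === 10 ? "X" : String(check);
}

/** Validate an unhyphenated 10-character ISBN-10 string. */
function isValidIsbn10(isbn: string): boolean {
  return (
    /^\d{9}[\dX]$/.test(isbn) &&
    isbn10CheckChar(isbn.slice(0, 9)) === isbn.charAt(9)
  );
}

/** Build `count` well-formed but arbitrary ISBN-10 strings. */
function makeJunkIsbns(
  count: number,
  random: () => number = Math.random
): string[] {
  const result: string[] = [];
  for (let n = 0; n < count; n++) {
    let body = "";
    for (let i = 0; i < 9; i++) {
      // Clamp to 9 in case a custom random source returns exactly 1.
      body += String(Math.min(9, Math.floor(random() * 10)));
    }
    result.push(body + isbn10CheckChar(body));
  }
  return result;
}
```

Calling makeJunkIsbns(20) yields twenty such strings. How the actual script places its junk numbers on the page is not described here, so that part is left out.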
Check it out here: ↗ www.searchguild.com …
Alternatively, try this one: ↗ javascript.internet.com …
And here is a whole pile of solutions (JavaScript and otherwise) that will help you disable Google’s AutoLink feature:
↗ www.threadwatch.org …
The latest effort by David Naylor:
↗ www.davidnaylor.co.uk/archives/2005/02/25/google-autolink-code/ …
Further to this, Mark Pilgrim has written a Firefox browser extension named Butler that effectively busts Google’s approach to smithereens. It’s free, and should it catch on in a big way with surfers, it could seriously impact Google’s business model. Read all about it here: ↗ diveintomark.org …
All of this blends seamlessly into the growing irritation, annoyance and mistrust Google has begun to foster among ever larger Web constituencies. In case you haven’t noticed: while Google bombing really qualifies as a yawn-inspiring old black hat by now, there’s another Web sport of sorts around called Google scraping. It has actually been going on for quite a while, too, and it is flourishing nicely, as the following examples show.
Sure, it may violate Google’s Terms of Service, but CustomizeGoogle, a further development of Mark Pilgrim’s Firefox extension Butler, is truly going over the top: you use regular Google search in Firefox, but it removes ads, adds links to other products, even removes ads from Gmail, and so on. So is it spyware? Here’s what they say:
No, CustomizeGoogle is not spyware. It does not track the pages you visit, display ads, hijack Amazon affiliate links, log keystrokes, steal passwords, set cookies, “phone home,” or install any bundled software on your computer. It is simply a Firefox script that modifies a few Google services in a useful way. If you don’t like it, you can easily uninstall it.
And so you can. Check it out here:
→ CustomizeGoogle
This goes way beyond Scroogle, the now-classic AdWords scraper with an attitude – sure, they may be in flagrant violation of Google’s Terms of Service, offering automatic search, but that’s anything but a coincidence:
This step that we have taken has implications for all search engines. These engines crawl the public web without asking permission, and cache and reproduce the content without asking permission, and then use this information as a carrier for ads that generate private profit. We are convinced that if citizens scrape Google and strip the ads, and make the scraped results available as a nonprofit public service, that this is legal. This is especially the case if there are public policy concerns behind the scraping.
Google Watch has been the most prominent critic of Google’s outrageous privacy policies for more than two years. This is why we started the proxy, and it’s why we continue the proxy. We invite Google to serve us with a cease and desist letter as a first step toward resolving this issue. So far, we have yet to hear from Google’s lawyers. By releasing the source code for our proxy, we’re trying to escalate the issue.
Nor are they beyond quoting Google’s top echelons in pursuit of their agenda:
“We are moving to a Google that knows more about you.”
— Google CEO Eric Schmidt, speaking to financial analysts,
February 9, 2005, as quoted in the New York Times the next day
Indeed, Scroogle’s primary concern seems to be Google’s data mining activity and that search engine’s fairly shoddy Privacy Policy. Thus, it only makes sense that they’re sporting the motto “Not traceable by Big Brother”, deleting their access log every seven days. By the way – they’ll scrape Yahoo! for you as well …
Here are the links:
→ Scroogle homepage
→ Scroogle’s Google and Yahoo! scraper
If you’d like to proactively do something against the sorry abuse of webmasters’ efforts, here’s a petition to sign against the Google toolbar’s Autolink feature with its potential to wreck just about any commercial site’s business:
Finally, let’s not forget our all-time favorite (covered extensively before in fantomNews), Daniel Brandt’s → Google Watch.
[Keywords: AutoBlink, Firefox extensions, Google bombing, Google scraping, Google toolbar ]
Trackback link: http://fantomaster.com/fantomNews/archives/2005/04/03/fantomtipautoblink-cutting-google-down-to-size-2/trackback/