fantomTip: How To Block Google’s Web Accelerator
Note: We have published an update to this article here:
fantomTip: How To Block Google’s Web Accelerator – Update #2
Please make sure you adjust your existing .htaccess code for optimum protection!
(bro) A fast and effective way of denying traffic funneled through Google Web Accelerator (GWA) access to
your web site is to block the IP addresses it requests your pages from.
The current GWA IPs are allocated to the following IP range: 72.14.192.0 – 72.14.192.255
The best procedure for Linux/Unix systems is working with the .htaccess file and Apache’s module mod_rewrite.
For this to work, the following server configuration is required:
- Apache web server with module mod_rewrite installed.
- .htaccess functionality enabled.
If you don’t know whether these requirements are fulfilled, please ask your system administrator – provided he or she
knows. They really should, but unfortunately some plainly do not …
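If you cannot get a definite answer, one cautious option is to wrap the rewrite directives in an <IfModule> container. The following is only a sketch, assuming your host allows these directives in .htaccess at all: where mod_rewrite is missing, the block is skipped instead of producing a server error.

```apache
# Sketch: only evaluate the rewrite directives if mod_rewrite is loaded.
<IfModule mod_rewrite.c>
    RewriteEngine on
    # The rewrite conditions and rules discussed in this article go here.
</IfModule>
```

The trade-off is that a missing module then fails silently: the site keeps working, but nothing is blocked, so test the result once the file is in place.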
The .htaccess File
For the mod_rewrite technique to work, you will need to upload a file named .htaccess (please note the period/dot “.” at the beginning of the file name!) to your server’s HTML directory.
This can be done via Telnet or FTP.
Warning! .htaccess should only be uploaded in ASCII mode, i.e. never in binary mode! Otherwise the file will not be processed properly.
If you already have a .htaccess file on your system, e.g. one with the following entries:
Options +Includes +ExecCGI
AddType text/x-server-parsed-html .html
simply add our code sample (see below) to it.
IMPORTANT!
For all ADJUSTMENTS IN FILE .htaccess:
please edit in an ASCII or plain-text editor such as Notepad.
(Doing it in MS Word will usually insert formatting elements which will cause the file not to work as intended.)
The first two entries will trigger the module:
RewriteEngine on
Options +FollowSymLinks
Tip: The entry “RewriteEngine off” will disable all rewrite commands in the file. This is a very useful feature: instead of having to comment out all subsequent lines, all you need to do is set the switch to “off”.
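As a minimal sketch of that switch, this is what a temporarily disabled file could look like. Only the first directive changes; everything else stays exactly as it was.

```apache
# Switch set to "off": the rewrite rules below stay in the file but are not applied.
RewriteEngine off
Options +FollowSymLinks
# Your existing RewriteBase, RewriteCond and RewriteRule lines remain here, untouched.
```

Setting the entry back to “RewriteEngine on” re-activates the rules without any further editing.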
The next required entry is this one:
RewriteBase /
The “/” symbol stands for the base URL. Should you have another one, you will want to include it.
However, “/” is normally the entry for “http://www.YourDomain.com”.
And now to the entries proper!
RewriteCond %{REMOTE_ADDR} ^72\.14\.192\.
RewriteRule ^.*$ - [F]
This rule translates to:
If someone using an IP address in the range 72.14.192.0 – 72.14.192.255 tries to access any file, the server shall respond with HTTP status code 403 (“Forbidden”).
The regular expression “^.*$” consists of some meta symbols:
^ = Start of line anchor
$ = End of line anchor
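The dot “.” in the regular expression is the wildcard meta symbol and matches any single character. To match a literal period, as between the numbers of an IP address, it has to be escaped as “\.”.
“*” signifies that the preceding element may occur any number of times, including zero. So “^.*$” matches every requested path: regardless of which specific page is called, a “Forbidden” message will be displayed.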
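Since the rule pattern only has to match every request, a shorter pattern does the same job. The variant below is a sketch of that common shorthand, not a change in behavior: a lone “^” matches at the start of any requested path, and therefore matches all of them.

```apache
# Same IP condition as above, with the dots escaped so they match literal periods.
RewriteCond %{REMOTE_ADDR} ^72\.14\.192\.
# "^" on its own matches the start of every request path, i.e. every request.
RewriteRule ^ - [F]
```

Which of the two spellings you use is a matter of taste; “^.*$” is more explicit, “^” is shorter.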
So the complete .htaccess file will now consist of these lines:
RewriteEngine on
Options +FollowSymLinks
RewriteBase /
RewriteCond %{REMOTE_ADDR} ^72\.14\.192\.
RewriteRule ^.*$ - [F]
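Should GWA start using further IP ranges, additional conditions can be chained with the [OR] flag. The sketch below assumes a second range purely for illustration: 192.0.2.x is a documentation-only placeholder, not an address range GWA is known to use.

```apache
RewriteEngine on
Options +FollowSymLinks
RewriteBase /
# GWA range named in this article
RewriteCond %{REMOTE_ADDR} ^72\.14\.192\. [OR]
# Hypothetical second range (placeholder only, replace with a real one)
RewriteCond %{REMOTE_ADDR} ^192\.0\.2\.
RewriteRule ^.*$ - [F]
```

Without [OR], consecutive RewriteCond lines are combined with a logical AND, so a visitor would have to match both ranges at once and nobody would ever be blocked.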
Redirecting traffic to a designated page
If you don’t want to block the IPs but, rather, wish to send visitors to a designated page (e.g. one where you will explain to them what’s happening and how to proceed from there), you can expand your .htaccess file accordingly:
RewriteCond %{REMOTE_ADDR} ^72\.14\.192\.
RewriteCond %{REQUEST_URI} !^/gwa-forbidden\.html$
RewriteRule ^.*$ /gwa-forbidden.html
This translates to:
If someone using an IP address in the range 72.14.192.0 – 72.14.192.255 tries to access any file, and the file is not the gwa-forbidden.html file, the server shall deliver the page gwa-forbidden.html instead.
The second RewriteCond will prevent an internal loop.
The complete .htaccess file will now consist of these lines:
RewriteEngine on
Options +FollowSymLinks
RewriteBase /
RewriteCond %{REMOTE_ADDR} ^72\.14\.192\.
RewriteCond %{REQUEST_URI} !^/gwa-forbidden\.html$
RewriteRule ^.*$ /gwa-forbidden.html
Note that the file name gwa-forbidden.html only acts as a placeholder here: you can replace it with the name of your own file.
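The rule above rewrites the request internally, so the address shown to the visitor stays the same while the content of gwa-forbidden.html is delivered. If you would rather send a genuine HTTP redirect, a variant along these lines should work; it is a sketch using the same placeholder file name, with the redirect status 302 chosen here as an assumption.

```apache
RewriteCond %{REMOTE_ADDR} ^72\.14\.192\.
# Skip the target page itself, otherwise the redirect would loop.
RewriteCond %{REQUEST_URI} !^/gwa-forbidden\.html$
# R=302 sends a temporary external redirect; L stops further rule processing.
RewriteRule ^.*$ /gwa-forbidden.html [R=302,L]
```

A temporary redirect is the more careful choice here, since it is easy to withdraw again once you drop the rule.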
Thus, with a few lines of code you can effectively prevent the Google Web Accelerator from accessing and scraping your web site.
For the privacy and bandwidth issues tied to the use of Google Web Accelerator, see:
- Google – The Coming Out of a Datascraper Spook
Some comments on the Web:
- The Google Web Accelerator Fiasco
- The Google Web Accelerator Fiasco, Part Two