Anti-Spam Tip #004
Blocking the Spambots
(rt) A very professional method of weeding out email harvesters or extractor bots by blocking them from access to your web site altogether.
Blocking is usually done by UserAgent rather than by IP: most of these extractor programs run on the harvester's own machine, so there are typically no fixed IPs to work from. Hence, tools like our own freeware fantomas blockFrog™, being IP focused, will do you little good in this respect.
One way to go about it, though, is to make use of your web server's own resources. For brevity's sake we will limit the examples given in this tip to systems running the Apache web server.
The Apache module mod_rewrite ships with every version of Apache. Note, however, that it is not installed and enabled by default! So if your web host doesn't offer it yet, you will have to ask them to install it for you. You will also need .htaccess functionality to make the following tip work.
For an in-depth discussion of this functionality, see our four-part Module mod_rewrite Tutorial.
Let's say you want to block users parsing your site with the email harvester programs Extractor Pro and EmailSiphon. The UserAgent strings transmitted by these programs are “ExtractorPro” and “EmailSiphon” respectively.
The method presented will generate an “access forbidden” message whenever a visitor with one of these UserAgents tries to access your site.
.htaccess code
In your .htaccess file, add the commands given in the box below and upload it to your main directory (DocumentRoot).
Note: Please unwrap any lines (6 in all) before copying and pasting to your system if your browser should wrap them — this is crucial, as the code won't work otherwise!
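As an illustration of the approach described above, here is a minimal sketch of such a rule set. It is not necessarily identical to the original listing. It assumes that the two UserAgent strings appear somewhere in the User-Agent header and that a case-insensitive match is wanted.

```apache
# Switch on the rewrite engine for this directory and everything below it.
RewriteEngine On

# Match requests whose User-Agent header contains either harvester name.
# [NC] makes the match case-insensitive; [OR] chains the two conditions.
# Assumption: the names are not anchored to the start of the header.
RewriteCond %{HTTP_USER_AGENT} ExtractorPro [NC,OR]
RewriteCond %{HTTP_USER_AGENT} EmailSiphon [NC]

# Leave the URL untouched ("-") and answer with 403 Forbidden ([F]).
RewriteRule .* - [F]
```

Each directive must stay on a single line, and comments must sit on lines of their own. To test the block, request a page while sending one of the two UserAgent strings and check that the server answers with "403 Forbidden".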
However, this method does have its drawbacks, too:
- You are restricted to systems featuring Apache and .htaccess as well as module mod_rewrite functionality. (IIS-based web sites are out of luck!)
- You must follow the mod_rewrite and .htaccess syntax to the letter. Otherwise you risk blocking site access altogether, even for yourself!
- This is a one-for-all approach: the block commands apply to all content in the directory you uploaded the .htaccess file to and in every directory below it, unless you place a different .htaccess file in the directories you wish to exclude from the ban. Also, the method as presented does not block by individual web page. (While this is basically possible under mod_rewrite, it involves a more complex syntax we cannot cover here.) Not every webmaster will want to accept these restrictions.
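The exemption mentioned in the last point might be sketched as follows. The directory name /downloads/ is hypothetical, and the sketch assumes that no "RewriteOptions Inherit" setting is in effect. Without that option, a subdirectory's own mod_rewrite settings replace the rules of the parent directory rather than adding to them.

```apache
# Hypothetical file: /downloads/.htaccess
# Placed in a subdirectory that should NOT be covered by the ban.
# Switching the engine off here means the parent's blocking rules
# are not applied to this directory or to anything below it.
RewriteEngine Off
```

This file goes into the directory to be exempted, not into the main directory, where it would switch off the block for the entire site.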
Still, it will probably cover the requirements of most small web sites nicely.
For a more flexible (and comfortable!) way of doing it, involving CGI scripts and SSI and letting you block individual pages not only by UserAgent but by IP and by host as well, you may want to take a look at our fantomas multiBlocker™. This will work on IIS systems, too, provided they offer Perl and CGI functionality, which is usually a given.
Professional IP Blocker Program
The heavy-duty spam, snoop and fraud protector: block an unlimited number of predefined IPs, UserAgents and referrer hosts from accessing your web pages.
Avoid code napping and frivolous litigation for purported rights violations; protect your code and your privacy from established snoops and dumb, misbehaving spiders running rampant on your site.
Weed out domains or entire countries notable for their high rate of fraudulent credit card chargebacks.
