Fast and easy way to block bots from your website using Apache
Some weeks ago the site I work on started having severe outages. It looked like the system could not fulfill incoming requests fast enough, so the Passenger queue grew faster than requests could be served.
Looking at the Rails logs, it appeared that some Chinese bot was crawling the entire site, including a long list of dynamic pages that take a long time to generate and are not usually visited. Those pages were not yet cached, so every request went through the full Rails pipeline. Once your Passenger queue starts growing faster than it can drain, you are usually doomed.
Since you can’t expect some of the malicious bots out there to respect the robots.txt file, I had to filter those requests at the Apache level so they never reached the application. These past few months I’ve been learning a lot of systems administration, basically because it’s us, the developers, who also handle this part of the business.
Since all those requests came from the same user agent, I looked for a simple way to filter requests based on this criterion. It can easily be done with the mod_access Apache module (renamed mod_authz_host in Apache 2.2). All you need to do is make use of the Allow and Deny directives, together with SetEnvIf from mod_setenvif. Here’s a simple example that filters the ezooms user agent:
<Directory "/home/rails/sites/prod/your_site/current/public">
    SetEnvIf User-Agent "ezooms" BlockUA
    Order allow,deny
    Deny from env=BlockUA
    Allow from all
</Directory>
What this piece of code does is fairly self-explanatory. The SetEnvIf line tells Apache to set an environment variable called BlockUA if the user agent of the request matches the “ezooms” string. The pattern is a case-sensitive regular expression, so use SetEnvIfNoCase if the bot varies its capitalization. The Order line then tells Apache how to evaluate access control for the directory: it first evaluates the Allow directives and then the Deny ones, and a request that matches a Deny is rejected even if an Allow also matched. After that you set up both directives. Allow from all basically allows everything in. Deny from env=BlockUA denies all requests in which the environment variable BlockUA has been set. Since that variable is set when the user agent matches our desired string, the config denies access to the application for all requests with the “ezooms” user agent.
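On Apache 2.4 the Order, Allow and Deny directives are deprecated and only available through mod_access_compat. A rough equivalent using the newer Require syntax might look like the sketch below. It assumes Apache 2.4 with mod_authz_core and mod_setenvif loaded, and reuses the same directory path and BlockUA variable as the example above. Treat it as a starting point rather than a drop-in config.

```apache
<Directory "/home/rails/sites/prod/your_site/current/public">
    # Flag requests whose User-Agent header matches "ezooms"
    SetEnvIf User-Agent "ezooms" BlockUA

    # Assumes Apache 2.4: grant access to everyone except flagged requests
    <RequireAll>
        Require all granted
        Require not env BlockUA
    </RequireAll>
</Directory>
```

With either version, blocked requests should get a 403 Forbidden response from Apache without ever reaching Passenger.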
This way you can easily protect yourself from basic bot attacks.