I was recently looking for a way to throttle requests from bad bots that sometimes flood a server with dozens of requests in a few seconds. I found a nice, simple function by Charlie Arehart:
http://www.carehart.org/blog/client/index.cfm/2010/5/21/throttling_by_ip_address
Before proceeding, you might want to look over Charlie's article, which nicely lays out many of the considerations and caveats that come with limiting the number of requests by IP address.
Aside from all the caveats, Charlie's function had a couple of things I wanted to adjust. One was that, as written, it would block even "good" bots, like Googlebot, if they kept making requests at intervals shorter than the "duration" parameter. For instance, if you set it up to allow up to 6 requests every 3 seconds (as I did), it would end up blocking Googlebot when it made a steady stream of requests at a rate of one per second. So even though Googlebot was only making 3 requests every 3 seconds, it was getting blocked.
Adjusting Charlie's function to stop doing that was very easy. But the other thing I was concerned about was whether keeping a list of IP addresses in an application variable would scale very well. How long would the list have to be before it affected performance? I didn't really know, but felt it would be better to have a solution where old IP addresses were pruned from the list periodically. I toyed with the idea of using a scheduled task to go through the list and remove old entries. That seemed cumbersome, though. In the end, since the site I was working on was running ColdFusion 9, I thought it would be a good opportunity to use ColdFusion 9's new caching functions. Using ColdFusion's built-in caching functions means I can have old IP addresses cleaned up automatically.
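As a rough illustration of the pruning idea (a sketch only, not part of the limiter itself): an entry written with cachePut and a time span is discarded by the cache once that time span has elapsed, so no scheduled cleanup is needed. The key name "demo_key" and the ten-second lifetime are made up for this example.

```cfml
<!--- Hypothetical key and lifetime, chosen only for illustration --->
<cfset demoKey = "demo_key">

<!--- Store a struct that the cache should discard after 10 seconds --->
<cfset cachePut(demoKey, {attempts = 1, start = Now()}, createTimeSpan(0,0,0,10))>

<!--- While the entry is still live, cacheGet returns the struct;
      once it has expired, cacheGet returns null --->
<cfset entry = cacheGet(demoKey)>
<cfif isNull(entry)>
    <cfoutput>Entry has expired or was never stored.</cfoutput>
<cfelse>
    <cfoutput>Attempts so far: #entry.attempts#</cfoutput>
</cfif>
```

The limiter uses the same pattern with a one-minute time span, so an IP address that stops making requests drops out of the cache on its own.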
So here is my version of the rate limiter. It is generally more forgiving than Charlie's version of the function. In Charlie's version, if a bot is blocked but continues to make requests (e.g. Googlebot), it will continue to be blocked until it pauses for the "duration". In my version, the bot is blocked for the "duration" and then allowed to make more requests before being blocked again. My main goal is to block the rogue bots that flood the system with 10 or 20 requests a second for a short time, and this does a pretty good job of that.
<cffunction name="limiter">
    <cfargument name="duration" type="numeric" default="3">
    <cfargument name="count" type="numeric" default="6">
    <!--- One cache entry per client IP address --->
    <cfset var cacheId = "rate_limiter_" & CGI.REMOTE_ADDR>
    <cfset var rate = cacheGet(cacheId)>
    <cfif isNull(rate)>
        <!--- Create cached object --->
        <cfset cachePut(cacheId, {attempts = 1, start = Now()}, createTimeSpan(0,0,1,0))>
    <cfelseif DateDiff("s", rate.start, Now()) LT arguments.duration>
        <!--- Still inside the current window --->
        <cfif rate.attempts gte arguments.count>
            <cfoutput>
                <p>You are making too many requests too fast,
                please slow down and wait #arguments.duration# seconds</p>
            </cfoutput>
            <cfheader statuscode="503" statustext="Service Unavailable">
            <cfheader name="Retry-After" value="#arguments.duration#">
            <cflog file="limiter" text="#cgi.remote_addr# #rate.attempts# #cgi.request_method# #cgi.SCRIPT_NAME# #cgi.QUERY_STRING# #cgi.http_user_agent# #rate.start#">
            <cfif rate.attempts is arguments.count>
                <!--- Lock out for duration, starting from the first blocked request --->
                <cfset cachePut(cacheId, {attempts = rate.attempts + 1, start = Now()}, createTimeSpan(0,0,1,0))>
            </cfif>
            <cfabort>
        <cfelse>
            <!--- Increment attempts --->
            <cfset cachePut(cacheId, {attempts = rate.attempts + 1, start = rate.start}, createTimeSpan(0,0,1,0))>
        </cfif>
    <cfelse>
        <!--- Window has passed: reset attempts --->
        <cfset cachePut(cacheId, {attempts = 1, start = Now()}, createTimeSpan(0,0,1,0))>
    </cfif>
</cffunction>
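One way to wire this in, sketched here under a few assumptions, is to call the function at the start of every request from Application.cfc. The application name "myApp" and the file name "limiter.cfm" (assumed to contain the function above) are hypothetical; adjust them to your own setup.

```cfml
<!--- Application.cfc (sketch). Assumes the limiter function above has been
      saved in a file named limiter.cfm next to this component. --->
<cfcomponent output="false">
    <cfset this.name = "myApp">

    <!--- Pull the limiter function into this component --->
    <cfinclude template="limiter.cfm">

    <cffunction name="onRequestStart" returntype="boolean" output="true">
        <cfargument name="targetPage" type="string" required="true">
        <!--- Allow at most 6 requests per 3 seconds from each IP address;
              limiter aborts the request itself when the limit is exceeded --->
        <cfset limiter(duration = 3, count = 6)>
        <cfreturn true>
    </cffunction>
</cfcomponent>
```

Because the function writes its own message and calls cfabort, onRequestStart only needs to call it. Requests that are within the limit fall through to the normal page.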
Posted on April 2, 2012 2:58:45 PM EDT by David Hammond