scalability


26 Dec 09

calculating dequeue time to tune load balancer queue handling

Maybe there’s already an obvious answer to this that I’ve overlooked over the years, but at some point you may use a load balancer that lets you queue requests up when the backend apps are running slower than normal. Instead of killing the apps with more traffic, the load balancer sets requests aside until the app is ready for more, then sends them off.

However, what happens if the app is REALLY in trouble? Your connection queue could very easily grow into thousands of piled-up requests all waiting to be serviced. At some point, you’re going to have to start dropping requests from the queue: it can get so large that even after the app comes back, you may never recover from the pile of requests (or if you do, it might be 60 seconds’ worth, at which point your user will probably have given up anyway).

So, here we go.

First: how many requests per second (max_requests) can your application slice handle? In this case, your application slice is however many services are being load balanced. Knowing max_requests tells us how much headroom is available at a given slice utilization; that headroom is the number of requests per second, over and above the current utilization, that can go toward dequeueing.

Second: what is our average utilization? If on a given day we are 60% utilized, then in theory we have 40% of our capacity available to help dequeue in the event of a surge/backup/whatever. That is, once we begin to dequeue, we will actually be running at a full 100% until the queue is drained, then we’ll be back at the usual 60% (with 40% headroom).

Third: how long were we backing up for, i.e. what is the worst-case scenario we want to prepare for? If the underlying apps rely on a database, and the database locks up for 10 seconds, then that is 10 seconds of queueing we get to enjoy. If we can’t afford 10 seconds, then what is the most we’re willing to deal with?

Equation time! Here are the three variables we have to deal with:

max_requests = number of requests/sec that our load balanced slice can handle (e.g. 400 .. as in 400 req/s)

utilization = fraction of max_requests we use on average and want to plan around (e.g. 0.6 .. as in 60% avg utilization)

queue_time = number of seconds we were queueing for (e.g. 10 .. the time the db was locked up)

And here are some things we can derive:

current_requests = ( max_requests * utilization ) <--- i.e. req/s utilized

available_requests = ( max_requests * ( 1 - utilization ) ) <--- req/s available ... this is also what you should consider headroom

queue_size = ( current_requests * queue_time )

At this point, it’s simple:

time_to_dequeue = ( queue_size / available_requests )

Basically, all we’re saying is, take the number of current requests/sec we’re handling, and multiply it by the number of seconds we were queueing. This will give us the number of requests that are backed up.

Once the queueing has stopped and we can begin to dequeue, we will still be handling the usual rate of incoming requests (i.e. 60% of our max_requests), but we have that 40% of headroom (i.e. available_requests) to work through the queue. So we divide the total queue_size by the available_requests rate, and that gives us the number of seconds our load balancer will take to dequeue.
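As a rough sketch, the steps so far can be written out in Python. The function name is my own choice for illustration, and the sketch assumes (as the math here does) that nothing gets serviced while we are queueing and that traffic keeps arriving at the average rate the whole time.

```python
def time_to_dequeue(max_requests, utilization, queue_time):
    """Seconds needed to drain the queue once the backend recovers.

    Assumes every request that arrived during queue_time was queued,
    and that new requests keep arriving at the average rate while draining.
    """
    if not 0 <= utilization < 1:
        # At 100% utilization there is no headroom, so the queue never drains.
        raise ValueError("utilization must be in the range [0, 1)")

    current_requests = max_requests * utilization            # req/s utilized
    available_requests = max_requests * (1 - utilization)    # req/s of headroom
    queue_size = current_requests * queue_time               # requests backed up

    return queue_size / available_requests


# Example values from above: 400 req/s, 60% utilized, 10 seconds of queueing.
print(time_to_dequeue(400, 0.6, 10))
```

With those example numbers, 2400 requests are backed up and there is 160 req/s of headroom, so the queue takes about 15 seconds to drain.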

Expanded, the equation is this:

time_to_dequeue = ( ( ( max_requests * utilization ) * queue_time ) / ( max_requests * ( 1 - utilization ) ) )

But wait! The beauty of this is that time_to_dequeue is the same for any given max_requests: it appears in both the numerator and the denominator, so it cancels out. Regardless of our request rate, the time it takes to dequeue depends only on the utilization and the queue_time.

So we can simplify, removing anything to do with request rates, which lets us calculate the dequeue time for any situation where we know the utilization and how long we were queueing.

time_to_dequeue = ( ( utilization * queue_time ) / ( 1 - utilization ) )
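A quick way to convince yourself that max_requests really does drop out is to compute both forms side by side. This is only a sketch; the function names and the sample request rates are made up for illustration.

```python
import math


def time_to_dequeue_full(max_requests, utilization, queue_time):
    # Expanded form: queue_size / available_requests
    queue_size = (max_requests * utilization) * queue_time
    available_requests = max_requests * (1 - utilization)
    return queue_size / available_requests


def time_to_dequeue_simple(utilization, queue_time):
    # Simplified form: max_requests cancelled out of top and bottom
    return (utilization * queue_time) / (1 - utilization)


utilization, queue_time = 0.6, 10
simple = time_to_dequeue_simple(utilization, queue_time)

# Hypothetical slice sizes; any positive rate should give the same answer.
for max_requests in (50, 400, 10_000):
    full = time_to_dequeue_full(max_requests, utilization, queue_time)
    print(max_requests, full, math.isclose(full, simple))
```

For 60% utilization and 10 seconds of queueing, every row comes out at about 15 seconds (0.6 * 10 / 0.4), whatever the request rate.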

Now, a lot of the time we also need to know how LARGE the queue will be at its peak (to configure the load balancer accordingly), and in that case we do need the request rates: queue_size = ( max_requests * utilization * queue_time ).
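Here is a minimal sketch of that queue size calculation, using the same example numbers as before. The function name is hypothetical, and what you do with the result depends on whatever queue limit setting your particular load balancer exposes.

```python
def peak_queue_size(max_requests, utilization, queue_time):
    """Requests piled up after queue_time seconds of a stalled backend.

    Assumes traffic arrives at the average rate and nothing is serviced
    during the stall, so this is the worst case for that utilization.
    """
    current_requests = max_requests * utilization  # req/s arriving
    return current_requests * queue_time


# 400 req/s slice, 60% utilized, database locked up for 10 seconds.
print(peak_queue_size(400, 0.6, 10))
```

With 400 req/s at 60% utilization and a 10 second stall, that works out to roughly 2400 queued requests, which is the number to compare against the load balancer’s queue limit.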

Either way, I found this to be an interesting little exercise: it got my mind thinking about math, and having an equation made it much easier to build a table of data and pump it into gnuplot.
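For anyone who wants to build a similar table, here is one possible sketch. It is not the script I originally used; the function name, the 10% steps, and the tab-separated two-column layout are all assumptions, chosen because gnuplot reads whitespace-separated columns.

```python
def dequeue_table(queue_time, percents=range(10, 100, 10)):
    """Return (utilization, time_to_dequeue) rows for a given queue_time."""
    rows = []
    for pct in percents:
        utilization = pct / 100  # step in whole percents to avoid float drift
        seconds = (utilization * queue_time) / (1 - utilization)
        rows.append((utilization, seconds))
    return rows


# Two columns: utilization, seconds to dequeue after a 10 second stall.
for utilization, seconds in dequeue_table(10):
    print(f"{utilization:.2f}\t{seconds:.2f}")
```

Redirect the output to a file and plot the first column against the second. The curve climbs steeply as utilization approaches 100%, which is the whole point: the less headroom you have, the longer a short stall hurts.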

Enjoy!