Distributed Ratelimiter

Post Share

Subroto Biswas, Developer, Azuga Inc.

Azuga’s IOT platform has been growing consistently and having a distributed rate limiter in place has become paramount for both internal and external services. 

Rate limiting is a well-studied problem when it comes to controlling resources in a high transaction SaaS environment. At a high level, a rate limiter throttles the number of events an entity can perform during window period distributed over a cluster of services.

We’ve had our own implementation for limiting usage of 3rd party APIs, which is good in its own way, but as the number of calls/second to the APIs increases so does the system resource utilization.

Rate limiting rules can include:
  • “A single IP can only create 20 accounts per day”.
  • “Devices are allowed 5 failed credit card transactions per day”.
  • “Users may only send 1 message per day with risky keywords”.

While exploring open source tools that can throttle the calls, we found that Network Congestion is a great solution for our needs.  We looked into Guava and the solution we wanted was a scalable and distributed solution which was missing here. Guava works great for a local rate limiter within a single JVM.

We then came across Bucket4J, based on token bucket algorithm – provides distributed capabilities, allow overflowing of calls beyond the defined threshold and even unpredictable.  

In the end, none of the open source solutions solved our problem and we went ahead with our in-house implementation from the ground up. Token Bucket based algorithm, with distributed capabilities, leveraging Redis in-memory data structure store.

The Algorithm

Each bucket has few properties:

Key: a unique id to identify the bucket

Tokens: at any instant of time, the number of tokens presents in the bucket

Previous refill time: previous time at which the bucket was refilled

Refill amount: the rate or number of tokens that needs to be added in each refill cycle

Every other modified implementation of Token Bucket uses an Initial and a max fill, to support Burst effect. While this implementation is good for limiting the total number of requests received on a server with few extra requests processed than the actual rate, we had to restrict the total calls made to 3rd party APIs to a hard limit (no more than the allowed calls per second).

We achieved this by removing the Max Fill and creating a Bucket that can be refilled periodically with the max amount of tokens.

First, if the Bucket didn’t exist:

setCurrentTime(key, System.nanoTime());

We are not using the background service to keep refilling the bucket periodically, instead, a Thread tries to get a Token, and checks if the bucket can be refilled or not. This is done only to support rolling rate limiter. As the interval for refill is not constant, it changes as per first thread that consumes a token during any interval.

For the refill logic, instead of adding one token per rate window time (1/r – where r is rate) we have added the max tokens logic around this algorithm:

long current = System.nanoTime();

long lastFill = getLastRefillTime(counterKey);

long diff = current – lastFill;

if(diff > totalInterval + TimeUnit.MILLISECONDS.toNanos(roundingError)) {

  setCurrentTime(key, current);

  setMaxTokens(bucketKey, totaltokens);


Where bucketKey is a unique key for bucket tokens.

After refill, the thread will be provided with a token from the bucket by decrementing tokens from the total available tokens in the bucket:

long count = decreaseAToken(bucketKey);

So the main logic of consuming tokens looks something like this:




return getToken() >= 0;  // returns true if tokens exists, else false

The lock is an instance of a special Redis based Java library – Redisson, with support for various distributed Java objects. It’s a mutex applied on refill() to synchronize the operation across multiple JVM instances. We are planning to open source the library.