I. Introduction
1. What is Rate Limiting?
Rate limiting is a strategy for controlling network traffic that is commonly used on free-to-use APIs. It caps how often a client may call an API within a given time frame. This protects you against certain kinds of malicious or abusive activity, and it also protects your hardware and your resource costs!
2. Why is Rate Limiting Crucial for APIs?
These days most APIs are hosted in the cloud, so their monthly costs depend on how many resources they consume. Rate limiting helps you curb excessive or outright abusive usage of those resources, keeping your costs low and ensuring the uptime of your API or service. It can even serve security purposes – for instance, rate limiting login requests to slow down brute-force attacks. But don’t lock down the entire login service just because many users happen to log in at the same time – find the right way to rate limit!
II. Types of Rate Limiting
1. Request-Based Rate Limiting
Simply counting all requests per minute up to a certain limit restricts your API as a whole – it is more of a self-defense system. It protects your resources while ignoring the user experience completely: an attacker could send a few hundred requests per minute and effectively DoS all of your legitimate users that way. Treat this as a last resort that should rarely, if ever, run in production.
2. IP-Based Rate Limiting
To avoid locking out all users, you could track IP addresses to find out who is flooding your API with requests. This is the first idea most developers have when they need to identify their callers – and let me tell you: it’s not the best one! An IP address is a good indicator, but it can be misleading. Sometimes it is all we know about our callers, yet multiple devices can easily share the same IP address – especially when several callers sit in the same office, household, or network.
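If you go the IP route, the first step is simply counting requests per address so you can see who the heavy hitters are. A minimal sketch – the function names are illustrative, not part of any library:

```python
from collections import Counter

# running tally of requests per calling IP address
requests_per_ip = Counter()

def record_request(ip: str) -> None:
    """Count one incoming request against its source IP."""
    requests_per_ip[ip] += 1

def noisiest_callers(n: int = 3):
    """Return the n IPs that have sent the most requests so far."""
    return requests_per_ip.most_common(n)
```

Remember the caveat above: a whole office behind one router shows up here as a single very noisy IP.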
3. User-Based Rate Limiting (or API-Key)
If your API is restricted by a login anyway, another option is to track each user individually after they have logged in successfully. This way you can target the very person who is spamming the API, with no collateral damage for other users. The same approach works great for API-key-restricted endpoints.
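Picking the key to limit on could then look like this hypothetical sketch – the request shape (a plain dict with `user_id`, `api_key`, and `remote_addr` fields) is an assumption for illustration only:

```python
def limiter_key(request: dict) -> str:
    """Choose the most specific identity available to key the limiter on.
    Falls back from user id, to API key, to the caller's IP address."""
    if request.get("user_id"):
        return f"user:{request['user_id']}"
    if request.get("api_key"):
        return f"key:{request['api_key']}"
    return f"ip:{request.get('remote_addr', 'unknown')}"
```

Keying on the most specific identity first means the IP fallback only kicks in for anonymous traffic.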
III. Limiting Strategies
1. Fixed Window
Let’s talk about implementations. The first idea most developers have is to set a limit of, say, 120 requests per minute: count each request up to 120 and reset the counter whenever 60 seconds have passed. This is naturally called Fixed Window limiting. It certainly is one way of handling excessive requests, but note that it allows bursts of up to twice the limit around a window boundary, and other strategies might fit your use case a little better.
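A fixed-window counter can be sketched in a few lines – the class and method names are illustrative, not production code:

```python
import time

class FixedWindowLimiter:
    """Count requests against a limit, resetting when the window elapses."""

    def __init__(self, limit: int = 120, window_seconds: float = 60.0):
        self.limit = limit
        self.window = window_seconds
        self.count = 0
        self.window_start = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        if now - self.window_start >= self.window:
            # a new window begins: reset the counter
            self.count = 0
            self.window_start = now
        if self.count >= self.limit:
            return False  # over the limit for this window
        self.count += 1
        return True
```

In practice you would keep one such counter per limiter key (IP, user, or API key) rather than a single global one.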
2. Sliding Window
For example, if you want to keep the “120 requests per minute” mindset, you could define smaller intervals at which the counter is replenished. Let’s break the minute into three 20-second segments, each replenishing however many requests were taken exactly 60 seconds earlier:
| Time (s) | Available | Taken | Recycled from expired | Carry-over |
|---|---|---|---|---|
| 0 | 120 | 40 | 0 | 80 |
| 20 | 80 | 50 | 0 | 30 |
| 40 | 30 | 30 | 0 | 0 |
| 60 | 0 | 30 | 40 | 10 |
| 80 | 10 | 10 | 50 | 50 |
| 100 | 50 | 15 | 30 | 65 |
| 120 | 65 | 50 | 30 | 45 |
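The bucketed scheme from the table could be sketched like this, assuming a 60-second window split into three 20-second segments (illustrative names, not production code):

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Bucketed sliding window: counts slide out again one segment at a time."""

    def __init__(self, limit: int = 120, window_seconds: float = 60.0, segments: int = 3):
        self.limit = limit
        self.segments = segments
        self.segment_len = window_seconds / segments
        self.buckets = deque()  # [segment_index, count] lists, oldest first

    def allow(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        current = int(now // self.segment_len)
        # drop segments that have slid out of the window (their requests expire)
        while self.buckets and self.buckets[0][0] <= current - self.segments:
            self.buckets.popleft()
        if sum(count for _, count in self.buckets) >= self.limit:
            return False
        if self.buckets and self.buckets[-1][0] == current:
            self.buckets[-1][1] += 1
        else:
            self.buckets.append([current, 1])
        return True
```

The `now` parameter only exists here to make the sliding behaviour easy to demonstrate; a real limiter would always use the clock.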
3. Token Bucket
Very similar to the sliding-window variant, a token bucket does not replenish exactly the requests you made one minute ago, but rather a fixed amount. For example, every 20 seconds you would be credited up to 20 new requests, but the balance never exceeds the initial limit of 120 requests.
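Token buckets are often implemented with a fractional balance that refills continuously rather than in discrete steps; a refill rate of one token per second matches the “20 per 20 seconds” example above. A minimal sketch with illustrative names:

```python
import time

class TokenBucket:
    """Refill tokens at a fixed rate; bursts are capped at the bucket capacity."""

    def __init__(self, capacity: int = 120, refill_per_second: float = 1.0):
        self.capacity = capacity
        self.refill_rate = refill_per_second
        self.tokens = float(capacity)       # start with a full bucket
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # credit tokens for the elapsed time, never exceeding the capacity
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Because the balance refills continuously, a client that paces itself at or below the refill rate is never rejected.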
4. Concurrency
Sometimes the sheer number of requests is not the main issue, but rather certain synchronous operations that become problematic when too many requests hit the same code within a very short time. With a concurrency limit you can ensure that no more than X requests are ever handled at the same time.
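In Python, such a limit could be sketched with a semaphore; the `handle_request` wrapper and the limit of 5 are illustrative assumptions:

```python
import threading

# allow at most 5 requests inside the protected section at once
concurrency_limit = threading.BoundedSemaphore(5)

def handle_request(do_work):
    """Run do_work() only if a concurrency slot is free; reject otherwise."""
    # blocking=False rejects immediately instead of queueing the caller
    if not concurrency_limit.acquire(blocking=False):
        return "429 Too Many Requests"
    try:
        return do_work()
    finally:
        concurrency_limit.release()
```

Rejecting instead of queueing is a deliberate choice here: queued requests would pile up behind the slow operation this limit is meant to protect.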
IV. Best Practices
1. Handling Requests
Keep your main focus on the user experience. Everybody understands that your system and infrastructure are valuable and worth protecting, but please be graceful with your users and handle rejected requests properly. There are two very important things users should know when their request is not being processed:
- Send them an HTTP response code of 429 – Too Many Requests.
- Set the HTTP header “Retry-After” to the length of the window defined for your app.
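Put together, a throttled request might be answered like this hypothetical sketch – the response shape is illustrative and not tied to any particular framework:

```python
def rate_limited_response(retry_after_seconds: int) -> dict:
    """Build a rejection response for a throttled request."""
    return {
        "status": 429,  # Too Many Requests
        "headers": {
            # tell the client how many seconds to back off before retrying
            "Retry-After": str(retry_after_seconds),
        },
        "body": "Too many requests. Please retry later.",
    }
```

A well-behaved client can read the header and schedule its retry instead of hammering the API blindly.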
2. Monitor and Adjust
Check the state of your API regularly and reassess the limits you defined:
- Is the window too small or too big?
- Is the limit still appropriate?
- Is the limit hitting the right people / routes on your API?
- Is the method of limiting useful to you?
3. Documentation
Good API documentation that covers your rate-limiting strategy is key for your users. It encourages them to build their consuming systems in a friendly way – to either stay below the limits or at least stop hammering the API once a limit kicks in. A reference to the “Retry-After” header and the unit it uses is also very helpful for building a well-behaved client.
V. Conclusion
Now you know the most important things about rate limiting. I suggest implementing it in your next project, or applying what you learned to an existing API of yours, to gather some hands-on experience. There are many frameworks that handle rate limiting for you, but sometimes they do not fit your use case 100%. In fact, I got into learning about rate limiting because the .NET library could not do what I needed, and I had to go down that rabbit hole.
Thank you for reading!