
Rate Limiting

Rate limiting is a technique that limits how many requests a client may send to an API or service within a given time window.

Rate limiting is one of the most important protections for APIs and web services. By capping the request rate per client, it guards against overload, abuse and application-layer DDoS attacks. Without rate limiting, a single client or bot could make the service unusable for everyone. Large public APIs such as those of Twitter and Stripe enforce rate limits, and the same technique applies to your own REST API.

What is Rate Limiting?

Rate limiting is a traffic control technique that caps the number of requests per client per time unit. A client that exceeds the limit receives an error (HTTP 429 Too Many Requests) and must wait before retrying. Common algorithms are fixed window (a counter per fixed time window), sliding window (the limit is evaluated over a window that moves with the current time), token bucket (tokens are refilled at a constant rate and each request consumes one) and leaky bucket (requests are processed at a steady rate). Limits can be applied per IP address, API key, user or endpoint. Besides protecting against abuse, rate limiting supports fair use and monetization (different limits per plan). Response headers such as X-RateLimit-Limit, X-RateLimit-Remaining and Retry-After tell clients where they stand.
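As an illustration of the simplest of these algorithms, the following is a minimal in-memory sketch of a fixed window counter. The class name FixedWindowLimiter, its parameters and the injectable clock are assumptions made for this example; a production setup would typically keep the counters in shared storage rather than in process memory.

```python
import time


class FixedWindowLimiter:
    """Fixed window sketch: one counter per client per time window."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the behaviour can be tested
        self.windows = {}   # client_id -> (window index, request count)

    def allow(self, client_id):
        current = int(self.clock() // self.window_seconds)
        index, count = self.windows.get(client_id, (current, 0))
        if index != current:
            # A new window has started, so the counter starts from zero.
            index, count = current, 0
        if count >= self.limit:
            return False  # the caller would answer with HTTP 429
        self.windows[client_id] = (index, count + 1)
        return True
```

Each client is tracked independently, and the counter resets as soon as the clock crosses into the next window.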

How does Rate Limiting work?

On each request the rate limiter checks whether the client has already reached its limit. A counter per client (identified by IP, API key or token) is kept in fast storage such as Redis. With token bucket, each client has a bucket of tokens: each request consumes one token, and tokens are refilled at a constant rate up to the bucket's capacity. When the bucket is empty, the request is rejected. Sliding window evaluates the limit over the current and the previous window, so a client cannot send a double burst around a window boundary. In distributed systems the counter must be stored centrally (e.g. in Redis) so that all instances enforce the same limits.
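The token bucket behaviour described above can be sketched in a few lines. This is a single-process illustration only: the class name TokenBucket, the parameters capacity and refill_rate, and the injectable clock are assumptions for this example, and a distributed deployment would keep this state in central storage instead.

```python
import time


class TokenBucket:
    """Token bucket sketch: a request is allowed if a token is available."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity        # maximum burst size in tokens
        self.refill_rate = refill_rate  # tokens added per second
        self.clock = clock              # injectable for testing
        self.tokens = float(capacity)   # the bucket starts full
        self.last_refill = clock()

    def allow(self):
        now = self.clock()
        elapsed = now - self.last_refill
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1  # each request consumes one token
            return True
        return False  # bucket empty: reject the request
```

To limit per client, one such bucket could be kept per IP, API key or token, for example in a dictionary keyed by the client identifier.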

Practical Examples

API gateway with rate limiting: Kong or AWS API Gateway limits requests per API key to 1,000 per minute and protects the backend from overload.

SaaS tiering: Free tier 100 API calls per day, Pro 10,000, Enterprise unlimited. Rate limiting enforces the tiers.

E-commerce protection: Rate limiting on checkout prevents bots from placing mass orders or blocking inventory.

Webhook throttling: Outgoing webhooks limited to 100 per second to avoid overloading the recipient.
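The SaaS tiering example could be expressed as a simple lookup of daily quotas per plan. The plan keys, the name DAILY_LIMITS and the function is_within_quota are hypothetical and chosen only for this sketch; the numbers are the ones from the example.

```python
# Hypothetical plan table using the quotas from the SaaS tiering example.
# None stands for "unlimited".
DAILY_LIMITS = {"free": 100, "pro": 10_000, "enterprise": None}


def is_within_quota(plan, calls_today):
    """Return True if one more call is allowed for this plan today."""
    limit = DAILY_LIMITS[plan]
    return limit is None or calls_today < limit
```

The count of calls made today would come from the rate limiter's own storage; this sketch only shows how the tier decides the limit.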

Typical Use Cases

DDoS mitigation: Rate limiting is a first line of defence against application-layer request floods on APIs and web services

Fair use: Resources are shared fairly so one client does not consume all capacity

API monetization: Different rate limits for different pricing tiers (Free, Pro, Enterprise)

Bot protection: Detection and limiting of automated access via unusually high request rates

Backend protection: Prevent cascade failures when a service is suddenly flooded with requests

Advantages and Disadvantages

Advantages

Availability: Rate limiting protects services from overload and keeps them available for all users
Security: Brute-force attempts, scraping and application-layer DDoS attacks are slowed down and contained
Fairness: All clients get a fair share of resources
Cost control: In pay-per-use models rate limiting protects against unexpectedly high cost from abuse
Simple to add: Common API gateways and frameworks offer rate limiting built in

Disadvantages

False positives: Overly strict limits can throttle legitimate users during peaks
Bypass: Attackers can use many IPs or API keys to spread load
Distributed complexity: Consistent limits across multiple server instances need central state (Redis)
Tuning effort: Finding the right limits requires analysis of normal usage and ongoing adjustment

Frequently Asked Questions about Rate Limiting

Which rate-limiting algorithm is best?

Token bucket offers a good compromise: it allows short bursts (up to a full bucket) while keeping the average rate limited. Sliding window is more accurate but more complex to implement. For simple cases fixed window is enough, although it can let through up to twice the limit around a window boundary. The choice depends on the required accuracy, burst tolerance and implementation effort.
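To give a sense of the extra effort sliding window involves, here is a sketch of one common variant, the sliding window counter, which weights the previous window's count by how much of it still overlaps the sliding window. The class name SlidingWindowCounter and its parameters are assumptions for this example, and it tracks a single client only.

```python
import time


class SlidingWindowCounter:
    """Sliding window sketch using current and previous window counts."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable for testing
        self.current_index = 0
        self.current_count = 0
        self.previous_count = 0

    def allow(self):
        now = self.clock()
        index = int(now // self.window_seconds)
        if index != self.current_index:
            # Roll over. The old count only matters if its window is
            # directly adjacent to the new one.
            adjacent = index == self.current_index + 1
            self.previous_count = self.current_count if adjacent else 0
            self.current_index = index
            self.current_count = 0
        # Fraction of the current window that has already elapsed.
        elapsed = (now % self.window_seconds) / self.window_seconds
        estimated = self.previous_count * (1.0 - elapsed) + self.current_count
        if estimated + 1 > self.limit:
            return False
        self.current_count += 1
        return True
```

With a limit of 10 per 60 seconds, a client that used all 10 requests in one window is still blocked at the very start of the next window and regains capacity gradually, whereas a fixed window counter would admit a fresh burst of 10 immediately.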

How do I communicate rate limits to API users?

Via response headers: X-RateLimit-Limit (maximum requests), X-RateLimit-Remaining (requests remaining) and X-RateLimit-Reset (time of the next reset). When the limit is exceeded, respond with HTTP 429 and a Retry-After header. Also document the limits clearly in the API docs.
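A framework-independent sketch of how these headers might be assembled is shown below. The function name rate_limit_response and its parameters are hypothetical. It also assumes that X-RateLimit-Reset carries a Unix timestamp in seconds, which is a convention that varies between APIs, and it sends Retry-After as a number of seconds.

```python
import math


def rate_limit_response(allowed, limit, remaining, reset_epoch, now_epoch):
    """Return (status code, headers) describing the client's rate limit state."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        # Assumption: reset time expressed as Unix epoch seconds.
        "X-RateLimit-Reset": str(int(reset_epoch)),
    }
    if allowed:
        return 200, headers
    # Rejected: tell the client how many seconds to wait before retrying.
    headers["Retry-After"] = str(max(0, math.ceil(reset_epoch - now_epoch)))
    return 429, headers
```

The returned dictionary can be merged into the response headers of whichever web framework is in use.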

Is rate limiting enough for DDoS protection?

No. Rate limiting is only one layer. For large volumetric DDoS attacks you also need CDN-based protection (Cloudflare, AWS Shield), a WAF and network-level filtering. Rate limiting is effective against application-layer (Layer 7) attacks but not against network- and transport-layer (Layer 3/4) attacks.
