Design a URL Shortener
A URL shortener is the "hello world" of system design interviews -- simple enough that you can finish in 35 minutes, deep enough that a staff-level candidate can still find interesting things to say. The trick is knowing which parts are actually hard (ID generation, redirect semantics) and which parts interviewers only care about as proof you've built real systems (database choice, caching).
The System
A URL shortener takes a long URL like https://example.com/very/long/path?with=params&and=more and produces something like https://short.ly/a8Kz3p. When anyone visits the short link, they get redirected to the original.
Bit.ly processes billions of link clicks per month across 4+ billion links created since launch. TinyURL handles similar scale. These systems power marketing campaigns, SMS messages (where character counts matter), and analytics dashboards that track how links perform across geographies and devices. The read-to-write ratio is extreme -- a link gets created once but clicked hundreds or thousands of times. That asymmetry shapes every architectural decision.
Requirements
Functional
- Shorten: Given a long URL, return a short URL (e.g.,
short.ly/a8Kz3p) - Redirect: Given a short URL, redirect the user to the original long URL
- Custom aliases: Users can optionally specify a custom short code (e.g.,
short.ly/my-promo) - Expiration: URLs can have an optional TTL after which they stop working
Non-Functional
- Read latency: Redirect completes in under 50ms at the 99th percentile (users click short links in time-sensitive contexts like ads)
- Write throughput: Handle 1M new URLs per day (peak: 120 writes/sec)
- Read throughput: Handle 100M redirects per day at 100:1 read-write ratio (peak: ~12K reads/sec)
- Durability: Once created, a URL mapping must not be lost until it expires
- Availability: 99.9% uptime -- a broken short link in a marketing campaign costs real money
Back-of-Envelope Math
Write path:
1M new URLs/day = 1,000,000 / 86,400 = ~12 writes/sec average
Peak (10x): ~120 writes/sec
A single Postgres instance handles 10,000+ writes/sec. This is trivial.
Read path:
100:1 read-write = 100M redirects/day
100,000,000 / 86,400 = ~1,160 reads/sec average
Peak (10x): ~11,600 reads/sec
Storage:
Each record: short_code (7B) + long_url (avg 200B) + created_at (8B) +
expiration (8B) + user_id (16B) + overhead = ~500 bytes
Per year: 365M URLs * 500B = ~183 GB/year
5-year total: ~915 GB
This fits on a single database server.
ID space:
7 characters of base62 (a-z, A-Z, 0-9) = 62^7 = 3.5 trillion codes
At 1M URLs/day: exhaustion in 9.6 million years
Even at 100M URLs/day: exhaustion in 96,000 years
7 characters is more than enough.
Cache:
Top 20% of URLs get 80% of traffic (Zipf distribution)
20M unique URLs accessed daily * 20% = 4M hot entries
4M * 500 bytes = ~2 GB
Fits in a single Redis instance with room to spare.
The Naive Design
┌──────────┐ ┌──────────────┐ ┌──────────────┐
│ Client │────>│ Web Server │────>│ PostgreSQL │
│ │<────│ (monolith) │<────│ │
└──────────┘ └──────────────┘ └──────────────┘
POST /shorten { url: "https://long-url.com/..." }
-> Server generates short code
-> INSERT INTO urls (short_code, long_url) VALUES (...)
-> Return { short_url: "short.ly/a8Kz3p" }
GET /a8Kz3p
-> SELECT long_url FROM urls WHERE short_code = 'a8Kz3p'
-> Return 302 Redirect to long_url
Honestly? This works fine for a small team's internal link shortener. One server, one database, done.
Where Does This Break First?
It is not the write path (12 writes/sec is nothing). It is the read path -- every single redirect hits the database. At 11K reads/sec peak, your database becomes the bottleneck, and your p99 latency climbs because redirect queries compete with everything else.
Where It Breaks
The core tension in a URL shortener is the read-heavy asymmetry. The database can handle writes all day, but reads at 100x the write rate will saturate connection pools and push latency past your 50ms target.
There is also a subtler problem: ID generation. The naive approach -- hash the URL with MD5, take the first 7 characters -- sounds elegant until you run the birthday paradox math. Seven characters of base62 gives you 3.5 trillion possibilities, which sounds like a lot. But the 50% collision probability hits at only ~2.6 million URLs. At 1M URLs/day, you are dealing with collisions within the first few days. Not theoretical. Certain.
And the third thing most candidates miss: redirect semantics. The difference between HTTP 301 and 302 determines whether you even see the clicks that flow through your system.
The Real Design
┌─────────────────────────┐
│ CDN / Edge Workers │
│ (Cloudflare Workers) │
└───────────┬─────────────┘
│ cache miss
v
┌──────────┐ ┌───────────────┐ ┌──────────────────┐
│ Client │────>│ Load Balancer │────>│ Read Service │
│ │<────│ │<────│ │
└──────────┘ └───────────────┘ └────────┬─────────┘
│ cache miss
v
┌──────────────┐
│ Redis Cache │
│ (~2 GB) │
└────────┬──────┘
│ cache miss
v
┌──────────────┐
│ PostgreSQL │
│ (primary + │
│ read replica)│
└──────────────┘
┌──────────┐ ┌───────────────┐ ┌──────────────────┐ ┌──────────┐
│ Client │────>│ Load Balancer │────>│ Write Service │────>│ Redis │
│ (create) │<────│ │<────│ │ │ Counter │
└──────────┘ └───────────────┘ └────────┬─────────┘ └──────────┘
│
v
┌──────────────┐
│ PostgreSQL │
│ (primary) │
└──────────────┘
Write Path
- Client sends
POST /api/shortenwith the long URL. - Write service calls
INCR short_url_counteron Redis to get the next unique integer. - Service base62-encodes the integer to produce the short code (e.g.,
1000000becomes"4c92"). - Service inserts
(short_code, long_url, created_at, expires_at)into Postgres with a UNIQUE constraint onshort_code. - Service writes the mapping to Redis cache (write-through) so the first click is fast.
- Returns the short URL to the client.
Why a counter instead of hashing? Counters never collide. Period. No birthday paradox, no retry logic, no Bloom filters. The counter approach is boring, and boring is exactly what you want for the core of a URL shortener.
Counter batching: Each write server requests 1,000 IDs at once from Redis (INCRBY short_url_counter 1000). It then uses IDs locally without further Redis calls. If a server crashes with 400 unused IDs in its batch, those IDs are simply lost. Acceptable -- you have 3.5 trillion of them.
Read Path (Redirect)
- Client visits
short.ly/a8Kz3p. - Read service checks Redis cache. Cache hit (~95% of requests): return the long URL.
- Cache miss: query Postgres read replica. Populate cache on miss.
- Check if the URL has expired. If yes, return
410 Gone. - Return an HTTP redirect to the long URL.
The redirect response includes a Location header with the original URL. The status code matters -- and it is the most debated decision in this entire design.
Custom Aliases
Custom aliases create a collision risk with generated codes. If a user creates the alias "abc123" and your counter later generates the same code, you have a conflict.
The cleanest fix: use separate namespaces. Generated codes live at short.ly/{code} and custom aliases live at short.ly/c/{alias}. No overlap possible, no special lookup logic needed.
Alternatively, reserve a range of the counter space (skip short codes that match custom aliases), but that adds complexity for little gain.
Deep Dives

Deep Dive 1: ID Generation -- Counters, Hashes, and the Birthday Paradox
This is the core technical decision in the design, and where most candidates either nail it or fumble.
The hash approach (and why it's worse than it looks)
The intuitive approach: hash the long URL with MD5, base62-encode it, take the first 7 characters.
import hashlib, base62
short_code = base62_encode(int(hashlib.md5(url.encode()).hexdigest(), 16))[:7]
The problem is the birthday paradox. The math is unforgiving:
| Characters (base62) | Space size | 50% collision at | Days at 1M/day |
|---|---|---|---|
| 6 chars | 56.8 billion | ~267K | < 1 day |
| 7 chars | 3.5 trillion | ~2.6M | 2.6 days |
| 8 chars | 218 trillion | ~20.9M | 21 days |
| 9 chars | 13.5 quadrillion | ~164M | 164 days |
With 7 characters, you hit 50% collision probability at roughly 2.6 million URLs. That is not a scale problem -- it is a day-three problem. You must handle collisions, which means retry logic, UNIQUE constraints, and increasing latency as collision probability rises.
The counter approach (what I'd actually use)
A simple Redis INCR gives you a monotonically increasing integer. Base62-encode it:
Counter value 1,000,000 -> base62 -> "4c92"
Counter value 3,521,614,606,208 -> base62 -> "zzzzzzz" (7 chars filled)
Zero collisions by construction. The only downside: sequential codes are predictable. An attacker can enumerate short.ly/4c90, short.ly/4c91, short.ly/4c92 to discover all URLs in sequence.
If predictability matters, apply a Feistel cipher before base62 encoding. A Feistel cipher is a reversible, fixed-width permutation -- it maps every integer in the range [0, N) to a unique other integer in the same range, but in a way that looks random. You get the unpredictability of hashing with the collision-free guarantee of a counter.
Other approaches worth mentioning
| Strategy | Collisions? | Predictable? | Coordination needed? | Best for |
|---|---|---|---|---|
| Counter + base62 | Never | Yes | Redis counter | Most systems |
| MD5 hash + truncate | Yes (~2.6M) | No | None (stateless) | When you need stateless gen |
| Pre-generated key pool | Never | No | KGS service | When unpredictability is hard requirement |
| Snowflake-style IDs | Never | Partially | Time sync only | When you also need ordering |
I'd pick the counter approach for 90% of URL shortener interviews. It is the simplest, most reliable, and easiest to reason about. If the interviewer pushes on predictability, mention Feistel ciphers. If they push on single-point-of-failure, mention counter batching and Redis Sentinel.
Redis counter failure recovery: Query the database for SELECT MAX(short_code) FROM urls, decode base62 to get the last known counter value, add a safety margin of 10,000 (to account for in-flight batches), and reinitialize the counter. The UNIQUE constraint on short_code is the ultimate safety net.
Deep Dive 2: 301 vs 302 -- What Bit.ly Actually Does
Most candidates say "use 302 for analytics" and move on. That answer is fine but incomplete. Here is the full picture.
302 Found (Temporary Redirect)
The browser does not cache anything. Every click goes through your server. You see every single redirect, so your analytics are perfectly accurate.
The downside: your server handles 100% of the traffic. No browser-side caching helps you.
301 Moved Permanently
The browser caches the redirect. Subsequent clicks go directly to the destination without hitting your server. Great for reducing load. Terrible for analytics -- you can not count clicks you never see.
301 also transfers SEO "link juice" to the destination URL. Businesses running marketing campaigns care about this: if their Bit.ly link accumulates backlinks, a 301 passes that PageRank to the destination. A 302 does not.
What Bit.ly actually does: 301 + Cache-Control
If you curl -v bit.ly/4ox4Y67, you will see:
HTTP/1.1 301 Moved Permanently
Cache-Control: private, max-age=90
Location: https://destination-url.com/
This is a clever middle ground. The 301 gives SEO benefits. Cache-Control: private prevents CDN and shared proxy caches from caching the redirect (only the user's browser caches it). max-age=90 means the browser cache expires after 90 seconds. So the same user clicking the same link twice within 90 seconds only generates one server-side event, but after 90 seconds, the next click hits Bit.ly's servers again.
My recommendation: Start with 302 in the interview (safer, full analytics control). Then mention that Bit.ly uses the 301 + Cache-Control trick as a production optimization. The interviewer will appreciate that you know the real-world implementation.
Deep Dive 3: Expiration and Cleanup
URLs with a TTL need to stop working after expiration. But how do you actually clean them up?
Layer 1: Read-time check (lazy deletion)
On every redirect, check the expiration timestamp. If expired, return 410 Gone instead of redirecting. Also delete the entry from the cache.
if url_record.expires_at and url_record.expires_at < now():
cache.delete(short_code)
return Response(status=410) # Gone
This catches all expired URLs that get clicked. Zero overhead for non-expired URLs (just a timestamp comparison). But it never cleans up URLs that nobody clicks -- they linger in the database forever.
Layer 2: Background sweep
A periodic job (hourly or daily) runs:
Batch deletes in chunks of 10,000 to avoid long-running transactions. Run during off-peak hours. Must use an index on expires_at to avoid full table scans.
Layer 3: Database-native TTL (if using DynamoDB)
DynamoDB can automatically delete expired items if you enable TTL on a column. Items get removed within ~48 hours of expiration. No application code needed. If you choose DynamoDB, this is free cleanup.
For Redis cache entries, set EXPIRE on each key to match the URL's expiration time. The cache cleans itself.
Should you reuse expired short codes? Almost certainly not. If someone bookmarked short.ly/a8Kz3p and you reassign that code to a different URL, the bookmark now goes to the wrong place. The ID space is large enough (3.5 trillion codes for 7 chars) that exhaustion is not a real concern.
Alternative Designs
Alternative 1: Serverless at the Edge
Instead of a traditional server architecture, deploy the entire read path as Cloudflare Workers or AWS Lambda@Edge. The redirect logic runs at 200+ edge locations worldwide, reading from a globally distributed KV store (Cloudflare KV or DynamoDB Global Tables).
Write path stays centralized (API Gateway + Lambda + DynamoDB). Read path is fully distributed.
Alternative 2: Hash-Based with Bloom Filter
If you genuinely need stateless ID generation (multiple regions, no shared counter), use MD5 hashing with a Bloom filter for fast collision pre-checking. The Bloom filter (~1.8 bytes per item at 0.1% false positive rate) sits in front of the database. For 1 billion URLs, the Bloom filter is ~1.67 GB -- fits in memory.
On collision, retry with a salt/nonce. UNIQUE constraint on the database is the final safety net.
| Aspect | Counter + Redis | Edge Workers + KV | Hash + Bloom Filter |
|---|---|---|---|
| Collision risk | Zero | Zero (uses counter) | Low (Bloom filter) |
| Read latency | 5-10ms (cache hit) | 1-5ms (edge) | 5-10ms (cache hit) |
| Write complexity | Low | Medium (sync to KV) | Medium (retry logic) |
| Multi-region writes | Needs coordination | Needs coordination | Fully independent |
| Operational complexity | Low | Low (managed infra) | Medium (Bloom filter) |
| Monthly cost at scale | ~$500-2K | ~$200-1K | ~$500-2K |
I would default to the counter approach for most interviews and mention edge workers as an optimization if the interviewer asks about global latency.
Scaling Math Verification
Let's verify our design handles the target load.
Write path (120 writes/sec peak):
- Redis
INCR: handles 100K+ operations/sec on a single instance. Our 120/sec is 0.1% of capacity. - Postgres INSERT: a single instance handles 10K+ writes/sec. Our 120/sec is 1.2% of capacity.
- Verdict: a single Redis counter and a single Postgres primary handle writes with enormous headroom.
Read path (11,600 reads/sec peak):
- Redis cache hit rate: ~95% (Zipf distribution + top-20% caching)
- Redis reads: 11,600 * 0.95 = 11,020 reads/sec. Redis handles 100K+ reads/sec. Fine.
- Postgres reads (cache misses): 11,600 * 0.05 = 580 reads/sec. A read replica handles this easily.
- Verdict: the read path handles 10x our peak with no issues.
Storage (5 years):
- 915 GB total. A single Postgres instance with a 1 TB SSD handles this. No sharding needed.
- If you want headroom, add a second Postgres instance, but it is not required at this scale.
Failure Analysis
What breaks at 10x (10M new URLs/day, 1B redirects/day)?
| Component | Current capacity | At 10x | Breaks? | Fix |
|---|---|---|---|---|
| Redis counter | 100K+ ops/sec | 1,200 writes/sec | No | -- |
| Redis cache | 2 GB, 100K reads/s | 20 GB, 116K r/s | No | Larger instance or Redis Cluster |
| Postgres writes | 10K+ writes/sec | 1,200 writes/sec | No | -- |
| Postgres reads | 5K reads/sec | 5,800 reads/sec | Maybe | Add read replica, improve cache hit rate |
| Storage | 1 TB | 9.15 TB (5 yr) | Yes | Shard by hash of short_code |
| Network/LB | Standard | ~116K req/sec | Maybe | Multiple LB instances, edge caching |
The first thing to break at 10x is storage (9 TB over 5 years). The fix is straightforward: shard Postgres by hash of the short code. Redirect lookups (the dominant operation) hit exactly one shard. Use consistent hashing so adding a shard does not require reshuffling all data.
The second thing to consider: at 1B redirects/day, you probably want edge caching (Cloudflare Workers, Lambda@Edge) to handle the read path closer to users. This drops latency from 10ms (cache hit at your data center) to 1-5ms (edge cache hit).
What breaks at 100x?
At 10B redirects/day (~116K reads/sec sustained), you need a distributed cache (Redis Cluster across regions), edge computing for the read path, and a multi-region database setup. At this point you are building Bit.ly, and you should probably just be running DynamoDB Global Tables with edge workers.
What's Expected at Each Level
| Aspect | Mid-Level | Senior | Staff+ |
|---|---|---|---|
| Requirements | Shorten + redirect, basic numbers | Read-write ratio, latency targets, storage est. | Full back-of-envelope with ID space math |
| ID generation | "Use a hash" or counter | Counter with base62, mention collision risk | Compare counter vs hash vs KGS, birthday paradox math, Feistel for obfuscation |
| Redirect semantics | "Use a redirect" | 301 vs 302 trade-off | 301 + Cache-Control (what Bit.ly does), SEO implications |
| Caching | "Add Redis" | Cache-aside pattern, LRU, sizing | Zipf distribution, cache warming, LFU vs LRU |
| Database | "Use Postgres" or "Use DynamoDB" | Justify choice, discuss UNIQUE constraint | Sharding strategy (hash of short_code), explain why sharding is probably unnecessary at stated scale |
| Expiration | "Add an expiry field" | Lazy deletion on read | Three-layer cleanup: lazy + sweep + native TTL |
| Rate limiting | Not mentioned | Mention token bucket per API key | Fixed-window or sliding-window, tie to write protection |
| Analytics | Not mentioned | Acknowledge as follow-up | Async via Kafka, HyperLogLog for unique visitors |
| Edge optimization | Not mentioned | CDN for redirect caching | Cloudflare Workers / Lambda@Edge for redirect logic at edge |
The single most important thing at any level: show you understand that this is a read-heavy system and that caching is the first optimization, not database scaling.
References from Our Courses
- Redis Data Structures and Use Cases — caching short-to-long URL mappings for read-heavy traffic
- Redis Persistence and Replication — durability guarantees for cached redirect entries
- HyperLogLog and Count-Min — approximate unique visitor counting for analytics
Red Team This Design
Ready to stress-test this architecture? The Attack companion tears apart every decision in this design — from hardware physics to security holes to what actually happens at 10x scale.