Why Load Balancing

TL;DR

When one server can't handle all your traffic, you add more servers. But then someone has to decide which server handles which request — that's load balancing. Think of it like a host at a busy restaurant seating guests at different tables so no single waiter gets overwhelmed.

The Restaurant That Got Too Popular

Imagine you open a small restaurant with one waiter. Things go great until word gets out and suddenly you've got a line out the door. Your one waiter is sprinting between tables, orders are taking forever, and customers are leaving.

You have two choices:

Get a superhuman waiter (vertical scaling) — Someone who can carry 10 plates at once and never gets tired. This works up to a point, but there's a limit to how super one person can be.
Hire more waiters (horizontal scaling) — Now you can serve more tables. But you need a host at the front to seat people evenly — otherwise everyone might crowd around the same waiter while the others stand around doing nothing.

That host is your load balancer.

Vertical vs. Horizontal Scaling — A Quick Detour

Vertical scaling means making your one server more powerful — more CPU, more RAM, faster disks. It's underrated! Modern machines are incredibly powerful. If upgrading your server solves the problem, that's often the simplest path.

Horizontal scaling means adding more servers. It's the pattern interviewers expect to hear about. The moment you have two or more servers, you need something to distribute traffic between them.

The Core Problem

Without a load balancer, how do your users know which server to talk to? If you tell half of them "go to Server A" and the other half "go to Server B," what happens when Server A crashes? Half your users are stuck.

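A load balancer sits between your users and your servers. Users always talk to the load balancer, and it decides which server handles each request. If Server A goes down, the load balancer detects the failure (typically through periodic health checks) and stops sending traffic there. Apart from any requests that were in flight when the server died, users don't notice a thing.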
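To make the failover idea concrete, here is a minimal sketch of a balancer that rotates through its servers and skips any that are marked down. The class name, the method names, and the server labels are hypothetical, and round-robin rotation is just one possible picking strategy chosen for illustration. A real load balancer would also run health checks itself rather than being told about failures.

```python
class RoundRobinBalancer:
    """Toy load balancer: rotates through servers and skips ones marked down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0  # index of the next server to try

    def mark_down(self, server):
        # In a real system a failed health check would trigger this.
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        # Try each server at most once, starting from the rotation pointer.
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


lb = RoundRobinBalancer(["server-a", "server-b"])
before = [lb.pick() for _ in range(4)]  # alternates between both servers
lb.mark_down("server-a")                # Server A crashes
after = [lb.pick() for _ in range(4)]   # every request now goes to server-b
print(before)
print(after)
```

Clients only ever call `pick()` on the balancer, so they never need to know that Server A disappeared.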

Two Ways to Balance the Load

There are two fundamentally different approaches, and each has its place:

1. Client-Side Load Balancing — "The Client Picks"

The client itself decides which server to talk to. It gets a list of available servers from a directory (called a service registry) and picks one. No middleman needed.

This is like a restaurant app that shows you all the open tables — you pick one yourself instead of waiting for a host.
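As a rough sketch of the client-side approach, the snippet below models the service registry as an in-memory dictionary and has the client choose a server at random. `ServiceRegistry`, `choose_server`, the service name, and the addresses are all hypothetical names made up for this example. Real registries are separate networked services, and real clients often use smarter selection than a random pick.

```python
import random


class ServiceRegistry:
    """Hypothetical in-memory directory of live server addresses per service."""

    def __init__(self):
        self._instances = {}

    def register(self, service, address):
        self._instances.setdefault(service, []).append(address)

    def deregister(self, service, address):
        self._instances.get(service, []).remove(address)

    def lookup(self, service):
        # Return a copy so callers cannot mutate the registry's own list.
        return list(self._instances.get(service, []))


def choose_server(registry, service, rng=random):
    """Client-side balancing: the caller fetches the list and picks for itself."""
    candidates = registry.lookup(service)
    if not candidates:
        raise LookupError(f"no instances registered for {service!r}")
    return rng.choice(candidates)


registry = ServiceRegistry()
registry.register("orders", "10.0.0.1:8080")
registry.register("orders", "10.0.0.2:8080")
target = choose_server(registry, "orders")
print(target)
```

Note that there is no component between the client and the chosen server: the selection logic lives in the client itself.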

2. Dedicated Load Balancer — "A Traffic Cop"

A separate component sits between clients and servers, routing every request. The client talks to the load balancer, the load balancer talks to a server.

This is your classic restaurant host — standing at the front, deciding who sits where.

We'll cover both approaches in the next two lessons, starting with client-side (it's simpler and often overlooked), then diving into dedicated load balancers.

Wait — Reverse Proxy, API Gateway, Load Balancer... Aren't Those the Same Thing?

These three terms come up in almost every system design discussion, and they overlap enough to cause confusion. Here's the clean distinction:

Component	Core Job	Example
Load Balancer	Distribute traffic across multiple servers	Nginx, AWS ALB/NLB, HAProxy
Reverse Proxy	Sit in front of servers, hide internal topology from clients	Nginx, Envoy, Caddy
API Gateway	Auth, rate limiting, request transformation, billing, analytics	Kong, AWS API Gateway, Apigee

A reverse proxy is the broadest concept: any server that sits between clients and your backend, accepting requests on behalf of your servers and forwarding them inward. (A proxy that acts on the client's behalf is a forward proxy.) Clients talk to the proxy; the proxy talks to your servers. This hides your internal network structure (clients never see individual server IPs) and lets you add caching, compression, or SSL termination in one place.

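A dedicated load balancer is, in most deployments, a reverse proxy that distributes traffic across several servers. The reverse is not true: plenty of reverse proxies don't load balance (you might have one in front of a single server just for SSL termination). The main exceptions are approaches with no proxy in the request path at all, such as the client-side load balancing described earlier.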

An API gateway is a reverse proxy that does application-level work: authenticating requests, enforcing rate limits, transforming payloads, routing by API version, collecting billing metrics. It's the "smart receptionist" that handles these cross-cutting concerns before requests reach your services, so each service doesn't have to reimplement them.
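The sketch below illustrates two of those gateway duties, authentication and rate limiting, running before a request is forwarded. Everything here is an assumption made for illustration: the `ApiGateway` class, the API key, the fixed-window counter (one of several possible rate-limiting schemes), and the `backend` function standing in for a real service. Products such as Kong or AWS API Gateway are configured rather than hand-written like this.

```python
import time


class ApiGateway:
    """Toy gateway: checks an API key and a per-key rate limit, then forwards."""

    def __init__(self, backend, api_keys, limit, window_seconds=60.0,
                 clock=time.monotonic):
        self.backend = backend        # callable standing in for a real service
        self.api_keys = set(api_keys)
        self.limit = limit            # max requests per key per window
        self.window = window_seconds
        self.clock = clock
        self._counters = {}           # api_key -> (window_start, count)

    def handle(self, api_key, request):
        # 1. Authentication: reject unknown keys before touching the backend.
        if api_key not in self.api_keys:
            return 401, "unauthorized"
        # 2. Rate limiting: fixed window, reset once the window has elapsed.
        now = self.clock()
        start, count = self._counters.get(api_key, (now, 0))
        if now - start >= self.window:
            start, count = now, 0
        if count >= self.limit:
            return 429, "rate limit exceeded"
        self._counters[api_key] = (start, count + 1)
        # 3. Forward the request, as any reverse proxy would.
        return 200, self.backend(request)


def backend(request):
    return f"handled {request}"


gateway = ApiGateway(backend, api_keys={"key-123"}, limit=2)
responses = [gateway.handle("key-123", "GET /orders") for _ in range(3)]
rejected = gateway.handle("bad-key", "GET /orders")
print(responses)
print(rejected)
```

The backend never sees the unauthorized or over-limit requests, which is the point of putting these checks in one place in front of your services.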

In practice, these roles often collapse into one box. Nginx is commonly used as a reverse proxy, a load balancer, and a lightweight API gateway all at once. But understanding the conceptual difference matters in interviews: it shows you know why each component exists, not just that you can name-drop them.

Interview Tip

When you say "I'd add a load balancer" in an interview, follow it up with why. The two main reasons are (1) distributing traffic so no single server is overwhelmed, and (2) automatic failover — if a server goes down, the load balancer routes around it. Most candidates mention the first but forget the second, and failover is often the more valuable one.