Circuit Breakers
TL;DR
Retries handle temporary hiccups. But when a service is completely down, retries make things worse — like a crowd pushing harder on a jammed door. A circuit breaker says "stop trying, this service is down" and fails requests instantly instead of waiting for timeouts. This protects both your system and the struggling service.
When Being Helpful Makes Things Worse
Retries with backoff handle temporary failures beautifully. But what happens when a dependency isn't just hiccupping — it's completely down?
Imagine a fire at a restaurant. Customers keep showing up, wait 30 minutes for a table, then leave frustrated. More customers arrive while the first ones are still waiting. The sidewalk fills up. The line blocks the street. Now the fire truck can't get through to put out the fire.
That's a cascading failure — one problem creating new problems, each making the original worse.
Here's how it plays out in software: Your database crashes and needs to restart. Every request to your API tries to hit the database, waits for a timeout, then retries. Now you've got a firehose of retries pointed at a database that's barely booting up. The first instance comes online, gets immediately hammered by the backlog, and collapses before it can stabilize. You can't get any instance running because the retries won't let it breathe.
This is the kind of thing that goes unnoticed until it wakes you up at 3 AM. And it's exactly what circuit breakers prevent.
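To put rough numbers on the retry storm described above, here is a back-of-the-envelope sketch. The traffic figure and retry count are hypothetical, and the function name is made up for illustration. It assumes every attempt fails because the dependency is fully down.

```python
def attempts_per_second(incoming_rps: float, max_retries: int) -> float:
    """Upper bound on attempts/s hitting a dependency that fails every call.

    Each incoming request makes 1 initial attempt plus `max_retries` retries,
    because every attempt fails while the dependency is down.
    """
    return incoming_rps * (1 + max_retries)


# Hypothetical numbers: 1,000 requests/s, each retried up to 3 times.
print(attempts_per_second(1000, 3))  # 4000
```

Under these assumptions, the recovering database sees four times its normal load at the moment it can least handle it.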
This Has Happened at Scale
Netflix's API team ran into exactly this failure mode: one overloaded backend service could cause upstream services to queue requests and exhaust their thread pools, so every service in the chain slowed down waiting for the broken one. The resilience engineering work the team began in 2011 grew into Hystrix, which Netflix open-sourced in 2012 and which became one of the most influential circuit breaker libraries ever created. Hystrix itself is now in maintenance mode, but the pattern it popularized is standard in microservice architectures precisely because real companies learned this lesson the hard way.
The Electrical Analogy (It's Right There in the Name)
You know the circuit breaker in your house? When too much current flows through a wire, the breaker trips and cuts the circuit. This prevents the wire from overheating and starting a fire. You fix the problem, flip the breaker back on, and power is restored.
Software circuit breakers work the same way. They wrap your calls to an external service and watch for failures. When failures pile up, the breaker trips and stops all attempts — giving the failing service room to recover.
The Three States
A circuit breaker has three states, and it cycles through them automatically:
Closed (Everything's Fine)
This is normal operation. Requests flow through like usual. Behind the scenes, the circuit breaker is quietly counting failures, typically either consecutive failures or the failure rate over a recent window.
Think of it like a green traffic light — everything moves normally, but the system is watching.
Open (Tripped — Stop Everything)
Failures have crossed a threshold. The circuit breaker immediately rejects all requests without even trying to reach the service. No waiting for timeouts, no adding load to the dying service. Requests fail in 5 milliseconds instead of 30 seconds.
This is the key insight: failing fast is better than failing slow. A quick "sorry, try again later" is way better than making users stare at a loading spinner for 30 seconds only to see an error.
Think of it like a red traffic light — full stop, no exceptions.
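Why failing fast matters so much can be shown with a little arithmetic. The sketch below assumes a fixed pool of 200 worker threads (a hypothetical number) and uses the 30-second timeout and 5-millisecond rejection mentioned above. Throughput is simply workers divided by the time each request holds a worker.

```python
def max_throughput(workers: int, seconds_per_request: float) -> float:
    """Requests/s a fixed worker pool can complete when each request
    occupies one worker for `seconds_per_request`."""
    return workers / seconds_per_request


WORKERS = 200  # hypothetical pool size

slow = max_throughput(WORKERS, 30.0)   # every call waits out a 30 s timeout
fast = max_throughput(WORKERS, 0.005)  # every call is rejected in 5 ms

print(f"failing slow: {slow:.1f} req/s")  # failing slow: 6.7 req/s
print(f"failing fast: {fast:.0f} req/s")  # failing fast: 40000 req/s
```

With slow failures, the pool saturates at a handful of requests per second and everything else queues. With fast failures, the workers stay free for requests that do not need the broken dependency.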
Half-Open (Cautiously Testing)
After a configured wait, the circuit breaker lets a single trial request through as a test (some implementations allow a small handful). Like peeking through a door to see if the coast is clear.
- If the test request succeeds → The service is probably recovering. Close the circuit and let traffic flow again.
- If it fails → Still broken. Go back to open and wait again before the next test.
This makes the system self-healing. No engineer needs to manually flip a switch at 3 AM.
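The three states can be sketched as a small state machine. This is a minimal illustration, not any particular library's API: the names `CircuitBreaker`, `CircuitOpenError`, `failure_threshold`, and `recovery_timeout` are assumptions, it counts consecutive failures, and it is single-threaded (a production version would need a lock so only one trial request runs in the half-open state).

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitOpenError(Exception):
    """Raised when a call is rejected without reaching the service."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before tripping
        self.recovery_timeout = recovery_timeout    # seconds to stay open before a trial
        self._clock = clock                         # injectable so time can be faked
        self.state = State.CLOSED
        self._failures = 0
        self._opened_at = 0.0

    def call(self, fn, *args, **kwargs):
        if self.state is State.OPEN:
            if self._clock() - self._opened_at < self.recovery_timeout:
                raise CircuitOpenError("circuit is open; failing fast")
            self.state = State.HALF_OPEN  # wait is over: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self):
        # Any success (including a half-open trial) closes the circuit.
        self._failures = 0
        self.state = State.CLOSED

    def _on_failure(self):
        self._failures += 1
        # A failed trial, or too many consecutive failures, opens the circuit.
        if self.state is State.HALF_OPEN or self._failures >= self.failure_threshold:
            self.state = State.OPEN
            self._opened_at = self._clock()


# Usage with a fake clock so the example runs instantly.
now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, recovery_timeout=10.0,
                         clock=lambda: now[0])


def flaky():
    raise ConnectionError("service down")


for _ in range(2):
    try:
        breaker.call(flaky)
    except ConnectionError:
        pass

print(breaker.state)  # State.OPEN
```

In this sketch a failed trial restarts the same `recovery_timeout`. Some implementations lengthen the wait after each failed trial instead.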

What Circuit Breakers Give You
- Fail fast — Millisecond errors instead of 30-second timeouts. Your users get a quick "something went wrong" instead of a frozen screen.
- Protect the sick service — Stop piling requests on a struggling dependency. Give it room to recover.
- Prevent cascading failures — If Service A calls Service B and B is down, A's circuit breaker trips. Now A fails fast instead of tying up all its threads waiting for B. Without the breaker, A would slow down, then Service C (which calls A) would slow down, and your entire system would freeze like falling dominoes.
- Self-healing — The half-open state automatically tests for recovery without flooding the service.
- Better user experience — A quick "we're having trouble, try again shortly" beats a hanging page.
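One common way to turn a fast failure into a better user experience is to serve a fallback when the breaker rejects a call. The sketch below is illustrative only: `CircuitOpenError`, `fetch_recommendations`, and the fallback items are hypothetical names, and the breaker itself is stood in for by a function that always rejects.

```python
class CircuitOpenError(Exception):
    """Hypothetical error a breaker raises when it rejects a call."""


def fetch_recommendations(user_id, fetch, fallback=("popular-1", "popular-2")):
    """Return personalised items, or a generic list if the breaker is open."""
    try:
        return fetch(user_id)
    except CircuitOpenError:
        # The dependency is known to be down: degrade gracefully instead of erroring.
        return list(fallback)


def rejected(user_id):
    # Stand-in for a call made through an open circuit breaker.
    raise CircuitOpenError("recommendation service unavailable")


print(fetch_recommendations(42, rejected))  # ['popular-1', 'popular-2']
```

Whether a fallback makes sense depends on the feature. A generic recommendation list is harmless, while a payment call usually has no safe substitute and should surface the quick error instead.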
Where to Use Them
Circuit breakers belong anywhere one service depends on another across a network:
- Third-party APIs — Payment processors, email services, SMS providers
- Database connections — When the DB is overloaded or restarting
- Service-to-service calls — In microservice architectures, every inter-service call is a candidate
- Any external dependency that could fail or slow down
Circuit Breaker + Retries: The Full Picture
Here's how all the failure-handling pieces fit together:
- Timeout — Don't wait forever for a response
- Retry with backoff + jitter — Try again, but give the server breathing room
- Circuit breaker — If failures keep piling up, stop trying altogether
Think of it like getting a taxi in the rain:
- Timeout: "I'll wait 5 minutes for a cab."
- Retry: "No cab? I'll try again in a few minutes."
- Circuit breaker: "Okay, I've tried 5 times. The roads must be flooded. I'll stop trying and take the subway instead."
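The layering above can be sketched in a few lines. Everything here is an assumption for illustration: `TinyBreaker` is a counter-only stand-in with no half-open recovery, `call_with_resilience` is a made-up helper, and the wrapped function is expected to enforce the `timeout` it is given.

```python
import random
import time


class CircuitOpenError(Exception):
    """Raised when the breaker refuses to attempt a call."""


class TinyBreaker:
    """Counter-only stand-in for a real breaker (no half-open recovery)."""

    def __init__(self, failure_threshold=5):
        self.failure_threshold = failure_threshold
        self.failures = 0  # consecutive failures seen so far

    @property
    def is_open(self):
        return self.failures >= self.failure_threshold

    def record(self, success):
        self.failures = 0 if success else self.failures + 1


def call_with_resilience(fn, breaker, timeout=2.0, max_attempts=3,
                         base_delay=0.1, sleep=time.sleep):
    """Breaker check wraps the retry loop, which wraps the timed call."""
    last_error = None
    for attempt in range(max_attempts):
        if breaker.is_open:
            raise CircuitOpenError("dependency marked down; not attempting")
        try:
            result = fn(timeout=timeout)  # fn is expected to honour the timeout
        except Exception as exc:
            breaker.record(success=False)
            last_error = exc
            if attempt < max_attempts - 1:
                # Exponential backoff with full jitter:
                # random delay in [0, base_delay * 2^attempt].
                sleep(random.uniform(0, base_delay * 2 ** attempt))
        else:
            breaker.record(success=True)
            return result
    raise last_error


def always_times_out(timeout):
    raise TimeoutError(f"no response within {timeout}s")


breaker = TinyBreaker(failure_threshold=5)
outcomes = []
for _ in range(3):
    try:
        # sleep is stubbed out so the example runs instantly.
        call_with_resilience(always_times_out, breaker, sleep=lambda s: None)
    except (TimeoutError, CircuitOpenError) as exc:
        outcomes.append(type(exc).__name__)

print(outcomes)  # ['TimeoutError', 'CircuitOpenError', 'CircuitOpenError']
```

The first call uses up its three attempts and surfaces the timeout. The second call hits the fifth consecutive failure and trips the breaker mid-retry, and the third is rejected without a single attempt.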
Interview Tip
Circuit breakers are a strong answer when an interviewer asks "what happens if this service goes down?" Most candidates say "we'd retry." A better answer: "We'd retry with exponential backoff, but if the service is fully down and not just hiccupping, a circuit breaker trips and fails requests instantly — protecting both our system and the failing service from a cascade." This shows you've thought beyond the happy path and understand how real systems fail.