Streaming Patterns
TL;DR
gRPC supports four communication patterns: unary (one request, one response), server streaming (one request, many responses), client streaming (many requests, one response), and bidirectional streaming (many requests, many responses simultaneously). Beyond streaming, gRPC provides built-in deadline propagation, integrates with service meshes for load balancing, and supports browser clients via gRPC-Web. The most common production pattern is REST for public APIs + gRPC for internal services.
The Four gRPC Communication Patterns
REST has one communication pattern: request in, response out. gRPC has four, and understanding when to use each one is what separates a surface-level answer from a strong one in interviews.

Pattern 1: Unary RPC (The Familiar One)
Unary is the simplest pattern — it works exactly like a REST call. One request, one response.
```protobuf
service TicketService {
  // Unary: send one request, get one response
  rpc GetEvent(GetEventRequest) returns (Event);
}
```
When to use: Any standard request-response operation. Getting a user profile, creating a booking, fetching a product. If you'd use a normal REST call for it, unary RPC is the equivalent.
Real-world example: A client asks "What are the details for event 123?" and the server responds with the event data. One question, one answer.
Pattern 2: Server Streaming (The Firehose)
The client sends a single request, and the server responds with a stream of messages. The client reads from the stream until it's done.
```protobuf
service StockService {
  // Server streaming: client subscribes, server pushes updates
  rpc WatchStockPrice(StockRequest) returns (stream PriceUpdate);
}

message StockRequest {
  string symbol = 1; // e.g., "AAPL"
}

message PriceUpdate {
  string symbol = 1;
  float price = 2;
  int64 timestamp = 3;
}
```
Think of this like subscribing to a news feed. You say "I want updates on Apple stock," and the server keeps sending you price changes as they happen. You don't have to keep asking — the server pushes to you.
When to use:
- Real-time price feeds (stocks, crypto, sports scores)
- Live progress updates (file processing, ML model training)
- Event logs (streaming log entries as they occur)
- Search results delivered incrementally (return results as they're found, not all at once)
Why not just poll with REST? With REST, you'd call GET /stocks/AAPL/price every second. That's 60 HTTP requests per minute, each carrying full headers and a JSON body to parse (plus connection setup whenever a connection isn't reused), and many of them may return a price that hasn't changed. With server streaming, you open one connection and the server pushes small binary updates only when prices change. Drastically less overhead.
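On the client side, consuming a server stream usually comes down to a receive loop that runs until the server closes the stream. The sketch below is a minimal illustration, not generated gRPC code: `priceStream` and `fakeStream` are hypothetical stand-ins for the stream a generated `WatchStockPrice` client would return. They are shaped around the `Recv()` method and the `io.EOF` end-of-stream signal that gRPC-Go streams use.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// PriceUpdate mirrors the PriceUpdate message from the .proto definition.
type PriceUpdate struct {
	Symbol    string
	Price     float32
	Timestamp int64
}

// priceStream is a hypothetical stand-in for the client-side stream
// that generated code would hand back from WatchStockPrice.
type priceStream interface {
	Recv() (*PriceUpdate, error)
}

// fakeStream replays a fixed list of updates, then reports io.EOF,
// which is how a cleanly finished stream is signalled to the reader.
type fakeStream struct {
	updates []PriceUpdate
	next    int
}

func (s *fakeStream) Recv() (*PriceUpdate, error) {
	if s.next >= len(s.updates) {
		return nil, io.EOF
	}
	u := &s.updates[s.next]
	s.next++
	return u, nil
}

// drain reads until the server ends the stream and returns every price seen.
func drain(s priceStream) ([]float32, error) {
	var prices []float32
	for {
		u, err := s.Recv()
		if errors.Is(err, io.EOF) {
			return prices, nil // server finished the stream cleanly
		}
		if err != nil {
			return prices, err // transport or server-side failure
		}
		prices = append(prices, u.Price)
	}
}

func main() {
	s := &fakeStream{updates: []PriceUpdate{
		{Symbol: "AAPL", Price: 189.5, Timestamp: 1},
		{Symbol: "AAPL", Price: 189.75, Timestamp: 2},
	}}
	prices, err := drain(s)
	fmt.Println(prices, err)
}
```

The client sends its one request up front and then only reads. There is no second request anywhere in the loop, which is the difference from polling.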
Pattern 3: Client Streaming (The Upload)
The client sends a stream of messages, and the server responds with a single message after it's received everything (or enough).
```protobuf
service UploadService {
  // Client streaming: client sends chunks, server responds when done
  rpc UploadFile(stream FileChunk) returns (UploadResult);
}

message FileChunk {
  bytes data = 1;
  int32 chunk_number = 2;
}

message UploadResult {
  string file_id = 1;
  int64 total_bytes = 2;
  bool success = 3;
}
```
The analogy here is dictating a letter to someone over the phone. You speak sentence by sentence (streaming chunks), and when you're done, they say "Got it, your letter has been filed as #456."
When to use:
- File uploads in chunks
- Sending batches of sensor/IoT data
- Aggregating data from the client before processing (e.g., collecting GPS points for a route, then calculating the total distance)
- Log shipping (client streams log entries, server acknowledges when batch is stored)
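To make the chunked-upload case concrete, here is a small sketch of both halves: the client cutting a payload into numbered chunks, and the server consuming them and replying once at the end. It is a plain-Go model that uses a channel in place of the gRPC stream. `splitChunks`, `receive`, the chunk size, and the `file-456` ID are illustrative assumptions, not part of any gRPC API.

```go
package main

import "fmt"

// FileChunk and UploadResult mirror the messages in the .proto definition.
type FileChunk struct {
	Data        []byte
	ChunkNumber int32
}

type UploadResult struct {
	FileID     string
	TotalBytes int64
	Success    bool
}

// splitChunks cuts data into pieces of at most size bytes, numbered from 0.
func splitChunks(data []byte, size int) []FileChunk {
	if size <= 0 {
		return nil
	}
	var chunks []FileChunk
	for start := 0; start < len(data); start += size {
		end := start + size
		if end > len(data) {
			end = len(data)
		}
		chunks = append(chunks, FileChunk{
			Data:        data[start:end],
			ChunkNumber: int32(len(chunks)),
		})
	}
	return chunks
}

// receive models the server side: consume every chunk, then answer once.
// Messages within one gRPC stream arrive in order, so the sequence check
// only guards against a client that numbers its chunks incorrectly.
func receive(chunks <-chan FileChunk, fileID string) UploadResult {
	var total int64
	var expected int32
	ok := true
	for c := range chunks {
		if c.ChunkNumber != expected {
			ok = false
		}
		expected++
		total += int64(len(c.Data))
	}
	return UploadResult{FileID: fileID, TotalBytes: total, Success: ok}
}

func main() {
	stream := make(chan FileChunk)
	go func() {
		defer close(stream) // closing plays the role of "client is done sending"
		for _, c := range splitChunks([]byte("hello, streaming world"), 8) {
			stream <- c
		}
	}()
	fmt.Printf("%+v\n", receive(stream, "file-456"))
}
```

The server never needs the whole file in memory at once: it processes each chunk as it arrives and keeps only running totals.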
Pattern 4: Bidirectional Streaming (The Conversation)
Both the client and server send streams of messages simultaneously. Neither side has to wait for the other to finish. This is full-duplex communication over a single connection.
```protobuf
service ChatService {
  // Bidirectional streaming: both sides send and receive concurrently
  rpc Chat(stream ChatMessage) returns (stream ChatMessage);
}

message ChatMessage {
  string user_id = 1;
  string text = 2;
  int64 timestamp = 3;
}
```
This is like a real phone conversation. Both people can talk and listen at the same time. The client sends messages as the user types, and the server relays messages from other users in real time.
When to use:
- Real-time chat applications
- Collaborative editing (Google Docs-style)
- Multiplayer game state synchronization
- Interactive voice/video processing (send audio frames, receive transcription in real time)
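In client code, full duplex typically means one goroutine reads from the stream while another writes to it. The sketch below shows that shape under stated assumptions: `chatStream` is a hypothetical interface modelled on the `Send`, `Recv`, and `CloseSend` methods of a gRPC-Go bidirectional client stream, and `echoStream` is a fake server that simply relays each message back.

```go
package main

import (
	"fmt"
	"io"
	"sync"
)

// ChatMessage mirrors the message in the .proto definition (timestamp omitted).
type ChatMessage struct {
	UserID string
	Text   string
}

// chatStream is a hypothetical stand-in for a generated bidi client stream.
type chatStream interface {
	Send(*ChatMessage) error
	Recv() (*ChatMessage, error)
	CloseSend() error
}

// echoStream is a fake server: every message sent is relayed straight back.
type echoStream struct {
	ch chan *ChatMessage
}

func newEchoStream() *echoStream {
	return &echoStream{ch: make(chan *ChatMessage, 16)}
}

func (s *echoStream) Send(m *ChatMessage) error { s.ch <- m; return nil }

func (s *echoStream) CloseSend() error { close(s.ch); return nil }

func (s *echoStream) Recv() (*ChatMessage, error) {
	m, ok := <-s.ch
	if !ok {
		return nil, io.EOF // the stream has ended
	}
	return m, nil
}

// chat sends the outgoing texts while a separate goroutine receives,
// so neither direction waits for the other to finish.
func chat(s chatStream, outgoing []string) []string {
	var received []string
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for {
			m, err := s.Recv()
			if err != nil { // io.EOF or a failure ends the receive loop
				return
			}
			received = append(received, m.Text)
		}
	}()
	for _, text := range outgoing {
		if err := s.Send(&ChatMessage{UserID: "u1", Text: text}); err != nil {
			break
		}
	}
	_ = s.CloseSend() // signal that this side is done sending
	wg.Wait()         // wait for the receiver to see the end of the stream
	return received
}

func main() {
	fmt.Println(chat(newEchoStream(), []string{"hi", "anyone there?"}))
}
```

Half-closing with `CloseSend` matters here: it tells the other side no more messages are coming while still letting the receive loop drain whatever is in flight.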
Streaming Patterns Summary
| Pattern | Client Sends | Server Sends | Real-World Example |
|---|---|---|---|
| Unary | 1 message | 1 message | Get event details |
| Server streaming | 1 message | Stream of messages | Live stock price feed |
| Client streaming | Stream of messages | 1 message | Upload file in chunks |
| Bidirectional | Stream of messages | Stream of messages | Real-time chat |
Interview Tip
You rarely need to draw out all four streaming patterns in a system design interview. But if your design involves real-time data (chat, live feeds, notifications), mentioning "we'd use gRPC server streaming here" shows depth. For chat systems or collaborative features, "bidirectional gRPC streaming" is the right callout.
Deadlines and Timeouts: gRPC's Built-In Safety Net
One of gRPC's underrated features is deadline propagation. In a microservices architecture, a single user request might trigger a chain of internal calls, for example API Gateway → Order Service → Payment Service → Fraud Detection.
With REST, nothing carries the caller's deadline downstream by default. If the user's request has a 5-second timeout, each service in the chain applies its own timeout without knowing how much of that budget is left. The Order Service might wait 4 seconds for Payment, then Payment waits 4 seconds for Fraud Detection, and the user's request times out at the gateway while services are still working.
gRPC solves this by propagating deadlines through the call chain:
```
User sets 5s timeout
  → API Gateway (4.8s remaining)
    → Order Service (4.5s remaining)
      → Payment Service (4.2s remaining)
        → Fraud Detection: "I only have 4.0s left, better be quick"
```
Each service in the chain knows exactly how much time it has left. If the deadline has already passed, the service can immediately return an error instead of wasting resources on a request nobody is waiting for.
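```go
// Client sets a 5-second deadline for the whole call chain.
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

// The deadline travels with ctx: gRPC transmits it to the server, and each
// service that passes its incoming ctx to its own downstream calls forwards
// the remaining time.
response, err := orderService.CreateOrder(ctx, request)
```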
This is particularly important for avoiding cascading failures. Without deadline propagation, a slow downstream service can tie up resources in every upstream service, causing the entire system to grind to a halt.
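The mechanics can be tried out with the standard library alone. The sketch below is a simulation rather than real gRPC: `handle` stands in for a service handler, the service names and durations are made up, and each "service" does a fixed amount of pretend work before calling the next one with the same context. It shows the two behaviours described above: every hop can see how much time is left, and a hop gives up as soon as the deadline expires.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// handle simulates one service in the chain, then calls the next one.
func handle(ctx context.Context, chain []string, work time.Duration) error {
	if len(chain) == 0 {
		return nil
	}
	name := chain[0]
	// Fail fast: nobody is waiting for the answer once the deadline has passed.
	if err := ctx.Err(); err != nil {
		return fmt.Errorf("%s: %w", name, err)
	}
	if deadline, ok := ctx.Deadline(); ok {
		left := time.Until(deadline).Round(time.Millisecond)
		fmt.Printf("%s: %v remaining\n", name, left)
	}
	// Simulate local work, but stop the moment the deadline expires.
	select {
	case <-time.After(work):
	case <-ctx.Done():
		return fmt.Errorf("%s: %w", name, ctx.Err())
	}
	// Forward the same ctx so the next hop inherits the remaining budget.
	return handle(ctx, chain[1:], work)
}

// run plays the client: it sets one overall timeout for the whole chain.
func run(timeoutMs, workMs int, chain []string) error {
	timeout := time.Duration(timeoutMs) * time.Millisecond
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return handle(ctx, chain, time.Duration(workMs)*time.Millisecond)
}

// timedOut reports whether err was caused by an expired deadline.
func timedOut(err error) bool {
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	chain := []string{"API Gateway", "Order Service", "Payment Service", "Fraud Detection"}
	err := run(100, 40, chain) // 4 hops x 40ms of work cannot fit in 100ms
	fmt.Println("error:", err, "| timed out:", timedOut(err))
}
```

The key line is the recursive call that reuses `ctx`. A handler that starts a fresh `context.Background()` for its downstream call would silently drop the deadline, and the services below it would keep working after the caller has given up.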
Load Balancing with gRPC
Load balancing gRPC is trickier than load balancing REST, and it's worth understanding why.
The problem: REST traffic over HTTP/1.1 is spread across many connections, each carrying one request at a time, and those connections are opened and closed frequently. A load balancer that distributes connections therefore ends up distributing requests fairly evenly. gRPC uses long-lived HTTP/2 connections and multiplexes many requests over each one. A traditional L4 (TCP-level) load balancer sees a single connection and sends every request on it to the same server.
```
REST + L4 Load Balancer:
  Request 1 → Server A ✓ (new connection)
  Request 2 → Server B ✓ (new connection)
  Request 3 → Server C ✓ (new connection)

gRPC + L4 Load Balancer:
  Request 1 → Server A (connection established)
  Request 2 → Server A (same connection!)
  Request 3 → Server A (same connection!)
  Request 4 → Server A (everything goes to A!)
```
Solutions:
| Approach | How It Works | Used By |
|---|---|---|
| L7 (application-level) load balancer | Understands HTTP/2 frames, distributes individual requests | Envoy, Nginx (newer versions) |
| Client-side load balancing | Client knows about all servers, picks one per request | gRPC built-in, Netflix Ribbon |
| Service mesh (sidecar proxy) | Each service has a local proxy (Envoy) that handles LB | Istio, Linkerd |
| Look-aside load balancing | Client queries a separate LB service for the best server | gRPC xDS protocol |
In practice, most production gRPC deployments use either Envoy as an L7 proxy or a service mesh like Istio that handles load balancing transparently.
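The difference between balancing per connection and balancing per request is easy to see in a toy model. The sketch below does no real networking, and the function names are illustrative: `pinned` models an L4 balancer in front of one long-lived HTTP/2 connection, while `roundRobin` models what an L7 proxy or a client-side balancer does when it picks a backend for every call.

```go
package main

import "fmt"

// pinned models per-connection balancing: the backend is chosen once,
// when the connection is opened, and every request on it goes there.
func pinned(backends []string, requests int) map[string]int {
	counts := make(map[string]int)
	chosen := backends[0] // decided at connect time
	for i := 0; i < requests; i++ {
		counts[chosen]++
	}
	return counts
}

// roundRobin models per-request balancing: each RPC may land on a
// different backend, even though the client is the same.
func roundRobin(backends []string, requests int) map[string]int {
	counts := make(map[string]int)
	for i := 0; i < requests; i++ {
		counts[backends[i%len(backends)]]++
	}
	return counts
}

func main() {
	backends := []string{"A", "B", "C"}
	fmt.Println(pinned(backends, 9))     // map[A:9]
	fmt.Println(roundRobin(backends, 9)) // map[A:3 B:3 C:3]
}
```

Real balancers use smarter policies than strict round robin (least-loaded, weighted, locality-aware), but the decision point is the same: the choice has to be made per request, not per connection.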
gRPC-Web: Bringing gRPC to the Browser
Here's an awkward truth: browsers can't natively speak gRPC. Browser JavaScript can make HTTP/2 requests, but it can't access the low-level HTTP/2 framing that gRPC requires. The browser's fetch() and XMLHttpRequest APIs don't expose the necessary HTTP/2 trailer support that gRPC depends on.
gRPC-Web is the solution. It's a modified protocol that works within browser limitations:
```
Browser (gRPC-Web client)
        |
        |  HTTP/1.1 or HTTP/2 (browser-compatible)
        |
[Envoy Proxy / gRPC-Web proxy]
        |
        |  Native gRPC (HTTP/2 + Protobuf)
        |
Backend gRPC Service
```
The proxy translates between gRPC-Web (which browsers can speak) and native gRPC (which backends speak). This adds a hop but lets browser clients benefit from Protobuf's type safety and compact format.
Limitations of gRPC-Web:
- Client streaming and bidirectional streaming are not supported by the official gRPC-Web client (unary and server streaming work)
- Requires a proxy (Envoy is the most common)
- Adds operational complexity compared to REST
This is one of the main reasons public-facing APIs stick with REST — browsers support it natively without any proxy layer.
When to Use gRPC (The Decision Framework)
After all the technical details, here's the practical decision framework:
Use gRPC When:
| Scenario | Why gRPC Wins |
|---|---|
| Internal service-to-service calls | Performance + type safety where you control both sides |
| Polyglot environments (Go + Java + Python services) | One .proto generates all client/server code, guaranteed compatibility |
| Streaming requirements | Built-in support for all four patterns, no WebSocket hacks |
| High-throughput data pipelines | Binary serialization saves bandwidth and CPU at scale |
| Strict API contracts across teams | .proto files as enforceable contracts with compile-time checks |
| Mobile clients on constrained networks | Smaller payloads = faster loads, less data usage |
Do NOT Use gRPC When:
| Scenario | Why REST/Other Is Better |
|---|---|
| Public-facing APIs for third-party developers | REST + JSON is universally understood; Protobuf adds a learning curve |
| Simple CRUD applications | gRPC's tooling overhead isn't worth it for basic apps |
| Browser-first applications without a proxy | gRPC-Web requires a proxy; REST works natively |
| Quick prototypes or MVPs | JSON is simpler to debug, test, and iterate on |
| APIs that need human readability | The wire format is binary, so you can't curl a gRPC endpoint and read the response without extra tooling such as grpcurl |
| Teams unfamiliar with Protobuf | Learning curve is real; REST has a lower barrier |
The Common Production Pattern: REST + gRPC
Many large-scale systems don't choose one or the other; they use both:
```
 [Mobile App]          [Web Browser]
       \                    /
    REST (JSON)        REST (JSON)
         \                /
          [API Gateway]
                |
         gRPC (Protobuf)
        /       |        \
   [User     [Event    [Payment
  Service]  Service]   Service]
        \       |        /
    gRPC (Protobuf) internally
                |
         [Notification
            Service]
```
Public-facing layer: REST with JSON. Browsers and mobile apps send HTTP requests with JSON bodies. Easy to debug, easy to document (OpenAPI/Swagger), easy for third-party developers.
Internal layer: gRPC with Protocol Buffers. Services communicate with binary messages over HTTP/2. Fast, type-safe, and streamable. Teams define contracts in .proto files and generate code in whatever language they prefer.
Companies such as Google, Netflix, Uber, and Lyft have described architectures along these lines. It's not theoretical; it's a widely adopted pattern among companies operating at scale.
gRPC in System Design Interviews
Here's exactly how and when to bring up gRPC in an interview:
During the API Design step: Design the user-facing REST API. This is what the interviewer expects. Don't design internal RPC interfaces here.
```
API Design:
  GET  /events               → List events
  GET  /events/:id           → Get event details
  POST /events/:id/bookings  → Create a booking
  GET  /bookings/:id         → Get booking details
```
During the High-Level Design step: When you draw boxes for internal services, mention gRPC:
"The Event Service and Booking Service communicate over gRPC for type safety and performance. Since they're both internal services that we control, the binary serialization and compile-time contracts reduce integration bugs."
When discussing real-time features: If the system requires streaming data (live feeds, notifications, chat):
"For the live price updates, the client subscribes via gRPC server streaming. This avoids the overhead of polling and gives us a persistent stream of updates."
When asked about tradeoffs: This is where you show depth:
"We use REST for the public API because it's universally accessible and easy to document. Internal services use gRPC because we control both sides, need type safety across our Go and Java services, and the binary format reduces bandwidth between our data centers."
Interview Tip
A common mistake is spending 5 minutes explaining Protocol Buffers during the API step. Don't do this. Mention gRPC in one sentence during high-level design, and only go deeper if the interviewer asks. The API step is for user-facing endpoints. Save gRPC for the architecture discussion.
Quick Reference: RPC and gRPC Cheat Sheet
| Concept | Key Point |
|---|---|
| RPC | Call remote functions as if they were local |
| gRPC | Google's modern RPC: Protobuf + HTTP/2 |
| Protocol Buffers | Binary serialization format, often cited as 5-10x smaller than JSON (the actual ratio depends on the data) |
| Why it's fast | Binary encoding (Protobuf) + HTTP/2 multiplexing. NOT "faster than HTTP": gRPC runs on top of HTTP/2 |
| .proto files | Single source of truth, generates code in any language |
| Field numbers | Used for wire encoding; never change them |
| Streaming | 4 patterns: unary, server, client, bidirectional |
| Deadlines | Propagate through the call chain when each service forwards the incoming call context |
| Load balancing | Needs L7 LB or service mesh (not simple L4) |
| gRPC-Web | Proxy-based solution for browser clients |
| Production pattern | REST (public) + gRPC (internal) |
| Interview usage | Mention during high-level design, not API step |

Interview Expectations: Junior vs. Senior
- Junior/Mid-level: Can mention streaming as a feature of gRPC but might struggle to differentiate between server, client, and bidirectional streaming.
- Senior/Staff: Proposes gRPC streaming for specific, high-throughput use cases (like real-time telemetry, continuous log shipping, or video chunking). Understands that gRPC streaming requires HTTP/2 end-to-end, which complicates load balancing (requiring L7 load balancers that understand HTTP/2).