Streaming Patterns

TL;DR

gRPC supports four communication patterns: unary (one request, one response), server streaming (one request, many responses), client streaming (many requests, one response), and bidirectional streaming (many requests, many responses simultaneously). Beyond streaming, gRPC provides built-in deadline propagation, integrates with service meshes for load balancing, and supports browser clients via gRPC-Web. The most common production pattern is REST for public APIs + gRPC for internal services.

The Four gRPC Communication Patterns

REST has one communication pattern: request in, response out. gRPC has four, and understanding when to use each one is what separates a surface-level answer from a strong one in interviews.

[Figure: the four gRPC communication patterns]

Pattern 1: Unary RPC (The Familiar One)

Unary is the simplest pattern — it works exactly like a REST call. One request, one response.

service TicketService {
  // Unary: send one request, get one response
  rpc GetEvent(GetEventRequest) returns (Event);
}

When to use: Any standard request-response operation. Getting a user profile, creating a booking, fetching a product. If you'd use a normal REST call for it, unary RPC is the equivalent.

Real-world example: A client asks "What are the details for event 123?" and the server responds with the event data. One question, one answer.

Pattern 2: Server Streaming (The Firehose)

The client sends a single request, and the server responds with a stream of messages. The client reads from the stream until it's done.

service StockService {
  // Server streaming: client subscribes, server pushes updates
  rpc WatchStockPrice(StockRequest) returns (stream PriceUpdate);
}

message StockRequest {
  string symbol = 1;   // e.g., "AAPL"
}

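message PriceUpdate {
  string symbol = 1;
  double price = 2;      // double rather than float: 32-bit floats lose precision on prices
  int64 timestamp = 3;   // e.g., Unix epoch milliseconds
}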

Think of this like subscribing to a news feed. You say "I want updates on Apple stock," and the server keeps sending you price changes as they happen. You don't have to keep asking — the server pushes to you.

When to use:

Real-time price feeds (stocks, crypto, sports scores)
Live progress updates (file processing, ML model training)
Event logs (streaming log entries as they occur)
Search results delivered incrementally (return results as they're found, not all at once)

Why not just poll with REST? With REST, you'd call GET /stocks/AAPL/price every second. That's 60 HTTP requests per minute, each carrying full headers and a JSON body to parse (plus connection setup whenever the connection isn't reused), and many of them may return a price that hasn't changed. With server streaming, you open one stream and the server pushes a small binary update only when the price changes. That means far less overhead.

Pattern 3: Client Streaming (The Upload)

The client sends a stream of messages, and the server responds with a single message after it's received everything (or enough).

service UploadService {
  // Client streaming: client sends chunks, server responds when done
  rpc UploadFile(stream FileChunk) returns (UploadResult);
}

message FileChunk {
  bytes data = 1;
  int32 chunk_number = 2;
}

message UploadResult {
  string file_id = 1;
  int64 total_bytes = 2;
  bool success = 3;
}

The analogy here is dictating a letter to someone over the phone. You speak sentence by sentence (streaming chunks), and when you're done, they say "Got it, your letter has been filed as #456."

When to use:

File uploads in chunks
Sending batches of sensor/IoT data
Aggregating data from the client before processing (e.g., collecting GPS points for a route, then calculating the total distance)
Log shipping (client streams log entries, server acknowledges when batch is stored)
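
To make the "many requests, one response" shape concrete, here is a minimal Go sketch of the server side of a client-streaming upload, using a channel in place of a real gRPC stream. The helper streamOf, the Go struct shapes, and the rule that chunks are numbered 0, 1, 2, ... are assumptions made for illustration only.

```go
package main

import (
	"errors"
	"fmt"
)

// FileChunk and UploadResult are simplified stand-ins for the messages above.
type FileChunk struct {
	Data        []byte
	ChunkNumber int32
}

type UploadResult struct {
	TotalBytes int64
	Success    bool
}

// uploadFile models the server side of a client-streaming RPC: it drains the
// incoming stream and replies exactly once, after the client closes it.
func uploadFile(chunks <-chan FileChunk) (UploadResult, error) {
	var total int64
	var next int32
	for c := range chunks {
		// Assumed rule for this sketch: chunks arrive numbered 0, 1, 2, ...
		if c.ChunkNumber != next {
			return UploadResult{}, errors.New("chunk received out of order")
		}
		total += int64(len(c.Data))
		next++
	}
	return UploadResult{TotalBytes: total, Success: true}, nil
}

// streamOf builds an already-closed stream from a fixed list of chunks
// (a test helper, not part of any gRPC API).
func streamOf(chunks ...FileChunk) <-chan FileChunk {
	ch := make(chan FileChunk, len(chunks))
	for _, c := range chunks {
		ch <- c
	}
	close(ch)
	return ch
}

func main() {
	res, err := uploadFile(streamOf(
		FileChunk{Data: []byte("hello "), ChunkNumber: 0},
		FileChunk{Data: []byte("world"), ChunkNumber: 1},
	))
	fmt.Println(res.TotalBytes, res.Success, err)
}
```

The single return value is the point: the server produces one UploadResult only after the client has finished sending.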

Pattern 4: Bidirectional Streaming (The Conversation)

Both the client and server send streams of messages simultaneously. Neither side has to wait for the other to finish. This is full-duplex communication over a single connection.

service ChatService {
  // Bidirectional streaming: both sides send and receive concurrently
  rpc Chat(stream ChatMessage) returns (stream ChatMessage);
}

message ChatMessage {
  string user_id = 1;
  string text = 2;
  int64 timestamp = 3;
}

This is like a real phone conversation. Both people can talk and listen at the same time. The client sends messages as the user types, and the server relays messages from other users in real time.

When to use:

Real-time chat applications
Collaborative editing (Google Docs-style)
Multiplayer game state synchronization
Interactive voice/video processing (send audio frames, receive transcription in real time)
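
The full-duplex behavior can be sketched in Go with two channels, one per direction. This is a simplified model, not generated gRPC code: the Msg type, the chat function, and the idea of server "announcements" are hypothetical names chosen for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// Msg is a simplified stand-in for ChatMessage.
type Msg struct {
	From string
	Text string
}

// chat models a bidirectional stream. `in` carries client messages and the
// returned channel carries server messages. Each direction runs in its own
// goroutine, so neither side has to wait for the other to finish.
func chat(in <-chan Msg, announcements []string) <-chan Msg {
	out := make(chan Msg)
	var wg sync.WaitGroup
	wg.Add(2)

	// Server-initiated traffic, independent of anything the client sends.
	go func() {
		defer wg.Done()
		for _, a := range announcements {
			out <- Msg{From: "server", Text: a}
		}
	}()

	// Relay each client message as soon as it arrives.
	go func() {
		defer wg.Done()
		for m := range in {
			out <- Msg{From: "relay", Text: m.Text}
		}
	}()

	// Close the outgoing stream once both directions are finished.
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

func main() {
	in := make(chan Msg)
	go func() { // client send loop
		defer close(in)
		for _, text := range []string{"hi", "anyone here?"} {
			in <- Msg{From: "alice", Text: text}
		}
	}()
	// The client receive loop runs at the same time as the send loop above.
	for m := range chat(in, []string{"bob joined"}) {
		fmt.Printf("%s: %s\n", m.From, m.Text)
	}
}
```

The interleaving of relayed and server-initiated messages is not deterministic, which is what you would expect from two independent directions sharing one stream.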

Streaming Patterns Summary

Pattern	Client Sends	Server Sends	Real-World Example
Unary	1 message	1 message	Get event details
Server streaming	1 message	Stream of messages	Live stock price feed
Client streaming	Stream of messages	1 message	Upload file in chunks
Bidirectional	Stream of messages	Stream of messages	Real-time chat

Interview Tip

You rarely need to draw out all four streaming patterns in a system design interview. But if your design involves real-time data (chat, live feeds, notifications), mentioning "we'd use gRPC server streaming here" shows depth. For chat systems or collaborative features, "bidirectional gRPC streaming" is the right callout.

Deadlines and Timeouts: gRPC's Built-In Safety Net

One of gRPC's underrated features is deadline propagation. In a microservices architecture, a single user request might trigger a chain of internal calls:

User → API Gateway → Order Service → Payment Service → Fraud Detection

With plain REST over HTTP, there is no standard mechanism for passing a deadline along, so if the user's request has a 5-second timeout, the services further down the chain have no idea about it. The Order Service might give Payment a fresh 4-second timeout, and Payment might give Fraud Detection another 4 seconds, so the user's request times out at the gateway while downstream services are still working.

gRPC solves this by propagating deadlines through the call chain:

User sets 5s timeout
  → API Gateway (4.8s remaining)
    → Order Service (4.5s remaining)
      → Payment Service (4.2s remaining)
        → Fraud Detection: "I only have 4.0s left, better be quick"

Each service in the chain knows exactly how much time it has left. If the deadline has already passed, the service can immediately return an error instead of wasting resources on a request nobody is waiting for.

// Client sets a 5-second deadline for the whole call chain
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

// The deadline travels with ctx: every downstream gRPC call made with this
// ctx (or a context derived from it) inherits the remaining time
response, err := orderService.CreateOrder(ctx, request)

This is particularly important for avoiding cascading failures. Without deadline propagation, a slow downstream service can tie up resources in every upstream service, causing the entire system to grind to a halt.

Load Balancing with gRPC

Load balancing gRPC is trickier than load balancing REST, and it's worth understanding why.

The problem: a traditional L4 (TCP-level) load balancer distributes connections, not requests. REST clients over HTTP/1.1 tend to open many connections, often short-lived, so spreading connections across servers also spreads requests reasonably evenly. gRPC multiplexes many requests over one long-lived HTTP/2 connection, so an L4 balancer picks a server once, when the connection is established, and every request on that connection lands on the same server.

REST + L4 Load Balancer:
  Request 1 → Server A  ✓ (new connection)
  Request 2 → Server B  ✓ (new connection)
  Request 3 → Server C  ✓ (new connection)

gRPC + L4 Load Balancer:
  Request 1 → Server A  (connection established)
  Request 2 → Server A  (same connection!)
  Request 3 → Server A  (same connection!)
  Request 4 → Server A  (everything goes to A!)

Solutions:

Approach	How It Works	Used By
L7 (application-level) load balancer	Understands HTTP/2 frames, distributes individual requests	Envoy, Nginx (1.13.10 and later)
Client-side load balancing	Client knows about all servers, picks one per request	gRPC built-in, Netflix Ribbon
Service mesh (sidecar proxy)	Each service has a local proxy (Envoy) that handles LB	Istio, Linkerd
Look-aside load balancing	Client queries a separate LB service for the best server	gRPC xDS protocol

In practice, most production gRPC deployments use either Envoy as an L7 proxy or a service mesh like Istio that handles load balancing transparently.
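
The difference between balancing connections and balancing requests can be shown with a small counting model in Go. This is a deliberately simplified sketch: l4Distribute and l7Distribute are hypothetical functions, both use plain round-robin, and no real networking is involved.

```go
package main

import "fmt"

// l4Distribute models a connection-level (L4) balancer: the backend is chosen
// once per connection, and every request multiplexed on that connection
// follows it.
func l4Distribute(backends, connections, requestsPerConn int) []int {
	counts := make([]int, backends)
	for c := 0; c < connections; c++ {
		target := c % backends // chosen once, at connection setup
		counts[target] += requestsPerConn
	}
	return counts
}

// l7Distribute models a request-level (L7) balancer: round-robin per request,
// regardless of which connection the request arrived on.
func l7Distribute(backends, connections, requestsPerConn int) []int {
	counts := make([]int, backends)
	next := 0
	for c := 0; c < connections; c++ {
		for r := 0; r < requestsPerConn; r++ {
			counts[next%backends]++
			next++
		}
	}
	return counts
}

func main() {
	// One long-lived gRPC connection carrying 9 requests to 3 backends.
	fmt.Println("L4:", l4Distribute(3, 1, 9)) // L4: [9 0 0]
	fmt.Println("L7:", l7Distribute(3, 1, 9)) // L7: [3 3 3]
}
```

With many client connections the L4 numbers even out somewhat, but a handful of busy internal callers, which is the typical gRPC situation, can still pin most of the load to a few servers.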

gRPC-Web: Bringing gRPC to the Browser

Here's an awkward truth: browsers can't natively speak gRPC. Browser JavaScript can make HTTP/2 requests, but it can't access the low-level HTTP/2 framing that gRPC requires. The browser's fetch() and XMLHttpRequest APIs don't expose the necessary HTTP/2 trailer support that gRPC depends on.

gRPC-Web is the solution. It's a modified protocol that works within browser limitations:

Browser (gRPC-Web client)
    |
    | HTTP/1.1 or HTTP/2 (browser-compatible)
    |
[Envoy Proxy / gRPC-Web proxy]
    |
    | Native gRPC (HTTP/2 + Protobuf)
    |
Backend gRPC Service

The proxy translates between gRPC-Web (which browsers can speak) and native gRPC (which backends speak). This adds a hop but lets browser clients benefit from Protobuf's type safety and compact format.
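
One concrete piece of that translation is how trailers are carried. Native gRPC prefixes each message with a 5-byte header (a 1-byte flag and a 4-byte big-endian length) and sends status in HTTP/2 trailers. gRPC-Web, as described in its protocol documentation, reuses that framing and ships the trailers inside the response body as a final frame with the high bit of the flag byte set. The Go sketch below illustrates the framing only; frame and isTrailerFrame are hypothetical helper names, and the payload bytes are placeholders.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// trailerFlag marks a gRPC-Web frame whose payload is a block of trailers
// rather than a message (most significant bit of the first byte).
const trailerFlag = 0x80

// frame builds a length-prefixed frame: 1 flag byte, 4-byte big-endian
// payload length, then the payload itself.
func frame(flag byte, payload []byte) []byte {
	buf := make([]byte, 5, 5+len(payload))
	buf[0] = flag
	binary.BigEndian.PutUint32(buf[1:5], uint32(len(payload)))
	return append(buf, payload...)
}

// isTrailerFrame reports whether a frame carries trailers.
func isTrailerFrame(f []byte) bool {
	return len(f) >= 5 && f[0]&trailerFlag != 0
}

func main() {
	// Placeholder payload; a real frame would hold serialized Protobuf bytes.
	data := frame(0x00, []byte("protobuf-bytes"))
	// Trailers travel in the body, formatted like an HTTP/1 header block.
	trailers := frame(trailerFlag, []byte("grpc-status: 0\r\n"))

	fmt.Printf("data header:    % x (trailer=%v)\n", data[:5], isTrailerFrame(data))
	fmt.Printf("trailer header: % x (trailer=%v)\n", trailers[:5], isTrailerFrame(trailers))
}
```

Because the status arrives in the body, browser code can read it with ordinary fetch() or XMLHttpRequest, and the proxy converts it back into real HTTP/2 trailers for the backend.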

Limitations of gRPC-Web:

Client streaming and bidirectional streaming are generally unavailable: at the time of writing, the official gRPC-Web client supports only unary and server-streaming calls
Requires a proxy (Envoy is the most common)
Adds operational complexity compared to REST

This is one of the main reasons public-facing APIs stick with REST — browsers support it natively without any proxy layer.

When to Use gRPC (The Decision Framework)

After all the technical details, here's the practical decision framework:

Use gRPC When:

Scenario	Why gRPC Wins
Internal service-to-service calls	Performance + type safety where you control both sides
Polyglot environments (Go + Java + Python services)	One .proto generates all client/server code, guaranteed compatibility
Streaming requirements	Built-in support for all four patterns, no WebSocket hacks
High-throughput data pipelines	Binary serialization saves bandwidth and CPU at scale
Strict API contracts across teams	.proto files as enforceable contracts with compile-time checks
Mobile clients on constrained networks	Smaller payloads = faster loads, less data usage

Do NOT Use gRPC When:

Scenario	Why REST/Other Is Better
Public-facing APIs for third-party developers	REST + JSON is universally understood; Protobuf adds a learning curve
Simple CRUD applications	gRPC's tooling overhead isn't worth it for basic apps
Browser-first applications without a proxy	gRPC-Web requires a proxy; REST works natively
Quick prototypes or MVPs	JSON is simpler to debug, test, and iterate on
APIs that need human readability	Plain curl can't decode a gRPC response; you need Protobuf-aware tools such as grpcurl
Teams unfamiliar with Protobuf	Learning curve is real; REST has a lower barrier

The Common Production Pattern: REST + gRPC

The most successful large-scale systems don't choose one or the other — they use both:

    [Mobile App]          [Web Browser]
          \                    /
       REST (JSON) over the Internet
            \                /
             [API Gateway]
                   |
            gRPC (Protobuf)
            /      |      \
    [User       [Event    [Payment
     Service]    Service]  Service]
          \        |        /
      gRPC (Protobuf) internally
                   |
            [Notification
             Service]

Public-facing layer: REST with JSON. Browsers and mobile apps send HTTP requests with JSON bodies. Easy to debug, easy to document (OpenAPI/Swagger), easy for third-party developers.

Internal layer: gRPC with Protocol Buffers. Services communicate with binary messages over HTTP/2. Fast, type-safe, and streamable. Teams define contracts in .proto files and generate code in whatever language they prefer.

This pattern is widely reported at large engineering organizations such as Google, Netflix, Uber, and Lyft. It isn't theoretical; it is a common default for systems operating at scale.

gRPC in System Design Interviews

Here's exactly how and when to bring up gRPC in an interview:

During the API Design step: Design the user-facing REST API. This is what the interviewer expects. Don't design internal RPC interfaces here.

API Design:
  GET  /events                    → List events
  GET  /events/:id                → Get event details
  POST /events/:id/bookings       → Create a booking
  GET  /bookings/:id              → Get booking details

During the High-Level Design step: When you draw boxes for internal services, mention gRPC:

"The Event Service and Booking Service communicate over gRPC for type safety and performance. Since they're both internal services that we control, the binary serialization and compile-time contracts reduce integration bugs."

When discussing real-time features: If the system requires streaming data (live feeds, notifications, chat):

"For the live price updates, the client subscribes via gRPC server streaming. This avoids the overhead of polling and gives us a persistent stream of updates."

When asked about tradeoffs: This is where you show depth:

"We use REST for the public API because it's universally accessible and easy to document. Internal services use gRPC because we control both sides, need type safety across our Go and Java services, and the binary format reduces bandwidth between our data centers."

Interview Tip

A common mistake is spending 5 minutes explaining Protocol Buffers during the API step. Don't do this. Mention gRPC in one sentence during high-level design, and only go deeper if the interviewer asks. The API step is for user-facing endpoints. Save gRPC for the architecture discussion.

Quick Reference: RPC and gRPC Cheat Sheet

Concept	Key Point
RPC	Call remote functions as if they were local
gRPC	Google's modern RPC: Protobuf + HTTP/2
Protocol Buffers	Binary serialization format; typically several times smaller than equivalent JSON, depending on the payload
Why it's fast	Binary encoding (Protobuf) + HTTP/2 multiplexing. NOT "faster than HTTP"
.proto files	Single source of truth, generates code in any language
Field numbers	Used for wire encoding; never change them
Streaming	4 patterns: unary, server, client, bidirectional
Deadlines	Propagate automatically through the call chain
Load balancing	Needs L7 LB or service mesh (not simple L4)
gRPC-Web	Proxy-based solution for browser clients
Production pattern	REST (public) + gRPC (internal)
Interview usage	Mention during high-level design, not API step

Interview Expectations: Junior vs. Senior

Junior/Mid-level: Can mention streaming as a feature of gRPC but might struggle to differentiate between server, client, and bidirectional streaming.

Senior/Staff: Proposes gRPC streaming for specific, high-throughput use cases (like real-time telemetry, continuous log shipping, or video chunking). Understands that gRPC streaming requires HTTP/2 end-to-end, which complicates load balancing (requiring L7 load balancers that understand HTTP/2).