Partitions, Consumer Groups, and Ordering Guarantees
TL;DR
Kafka guarantees message order within a partition but not across partitions -- every design decision about partition keys, consumer groups, and parallelism flows from this one fact.
What It Is

A Kafka topic is a named stream of records. But a topic is just a logical concept. The real unit of work is the partition.
A partition is an ordered, immutable sequence of messages. Each message in a partition gets a unique, monotonically increasing offset. Partitions are spread across brokers. They're the unit of parallelism, the unit of replication, and the unit of consumer assignment.
If you remember one thing from this lesson: partition = unit of everything in Kafka.
Topics and Partitions
When you create a topic, you specify the number of partitions:
This creates 12 partitions, each replicated to 3 brokers.
Topic: orders
Partition 0: Broker 1 (leader), Broker 2, Broker 3
Partition 1: Broker 2 (leader), Broker 3, Broker 1
Partition 2: Broker 3 (leader), Broker 1, Broker 2
...
Partition 11: Broker 2 (leader), Broker 1, Broker 3
Partitions are distributed across brokers so that no single broker holds all partitions for a topic. This distributes load evenly.
Partition Keys: Controlling Where Messages Land
When a producer sends a message, it can include a key:
Kafka routes the message to a partition using:
All messages with the same key go to the same partition. Always. This is how you get ordering for related messages.
No Key = Round-Robin
If you don't provide a key, Kafka distributes messages across partitions using a sticky partitioner (batch to one partition, then switch). Messages are evenly distributed but have no ordering relationship.
Key Design Is Schema Design
Choosing the right partition key is as consequential as choosing a database shard key. The key determines:
- Ordering -- messages with the same key are ordered.
- Locality -- messages with the same key are on the same partition.
- Parallelism -- one consumer per partition, so all messages for a key go to one consumer.
- Hot partitions -- if one key has 80% of the traffic, one partition (and one consumer) gets 80% of the work.
| Partition Key | Ordering Guarantee | Distribution | Best For |
|---|---|---|---|
user_id |
All events for a user are ordered | Even (if many users) | User activity streams |
order_id |
All events for an order are ordered | Even | Order state machines |
null (no key) |
No ordering | Round-robin, even | Logs, metrics (order doesn't matter) |
country |
Per-country ordering | Skewed (US dominates) | Bad choice -- hot partition |
sensor_id |
Per-sensor ordering | Even (if many sensors) | IoT data |
Spicy take: 90% of Kafka production issues trace back to a bad partition key. Either the key has low cardinality (hot partition), or the developer forgot to set a key and wonders why messages arrive out of order.
Ordering Guarantees
This is the most misunderstood aspect of Kafka.
What Kafka Guarantees
- Messages within the same partition are ordered by offset. Producer sends A, B, C. Consumer reads A, B, C. Always.
- Messages across different partitions have no ordering. Producer sends A to partition 0 and B to partition 1. Consumer might read B before A.
The Wall-Clock Fallacy
"But message A has an earlier timestamp than message B!"
Doesn't matter. Timestamps don't determine ordering. Offsets do. And offsets are per-partition. There's no global offset across partitions.
Example: Order State Machine
An order goes through states: CREATED -> PAID -> SHIPPED -> DELIVERED.
If these events go to different partitions, a consumer might process SHIPPED before PAID. The state machine breaks.
Fix: use order_id as the partition key. All state transitions for the same order go to the same partition. Order is preserved.
# All events for order-456 go to the same partition
producer.send("order-events", key=b"order-456", value=b'{"status": "CREATED"}')
producer.send("order-events", key=b"order-456", value=b'{"status": "PAID"}')
producer.send("order-events", key=b"order-456", value=b'{"status": "SHIPPED"}')
When You Need Global Ordering
If you truly need global ordering across all messages, use one partition. This limits you to one consumer (no parallelism) and one broker's throughput. That's the trade-off.
Stripe processes payment events with ordering per merchant (partition key = merchant_id). They don't need global ordering across all merchants. A payment for Merchant A and a payment for Merchant B are independent. Different partitions, different consumers, full parallelism.
Consumer Groups
A consumer group is a set of consumers that cooperate to read from a topic.
The Core Rule
Each partition is assigned to exactly one consumer within a group. But one consumer can read from multiple partitions.
Topic: orders (6 partitions)
Consumer Group: order-processing
Consumer A: reads Partition 0, 1
Consumer B: reads Partition 2, 3
Consumer C: reads Partition 4, 5
Three consumers, six partitions, each consumer handles two partitions. Add a fourth consumer:
Consumer A: reads Partition 0, 1
Consumer B: reads Partition 2, 3
Consumer C: reads Partition 4
Consumer D: reads Partition 5
Add a seventh consumer? It sits idle. Six partitions, six consumers max. The seventh consumer has nothing to read.
This is why partition count limits parallelism. If you create a topic with 3 partitions, you can never have more than 3 active consumers in a group. Plan your partition count based on your expected maximum consumer count.
Multiple Consumer Groups
Different consumer groups read the same topic independently. Each group maintains its own offsets.
Topic: orders (6 partitions)
Consumer Group "order-processing":
Consumer A: Partitions 0, 1, 2
Consumer B: Partitions 3, 4, 5
Consumer Group "analytics":
Consumer X: Partitions 0, 1, 2, 3, 4, 5
The order-processing group and the analytics group both read all messages. They don't interfere with each other. This is how Kafka enables the "write once, read many" pattern.
Rebalancing
When a consumer joins or leaves a group, Kafka rebalances -- reassigns partitions across the remaining consumers.
Triggers
- Consumer joins the group (new instance deployed).
- Consumer leaves the group (crash, shutdown, network issue).
- Consumer fails to send a heartbeat within
session.timeout.ms(default: 45 seconds). - Consumer takes too long to process a batch (exceeds
max.poll.interval.ms, default: 5 minutes). - Topic gets new partitions.
What Happens During Rebalancing
- All consumers in the group stop processing.
- The group coordinator (a Kafka broker) collects partition assignments.
- The coordinator computes new assignments using the configured strategy.
- Each consumer gets its new partition assignment.
- Consumers resume processing.
The problem: during rebalancing, the entire group stops. For a group with 20 consumers, a single consumer restart pauses processing for all 20. This can last seconds to minutes.
Rebalancing Strategies
| Strategy | How It Works | Pros | Cons |
|---|---|---|---|
| Range | Assigns consecutive partitions to each consumer | Simple | Uneven distribution with few partitions |
| RoundRobin | Distributes partitions one at a time across consumers | Even distribution | All partitions reassigned on change |
| Sticky | Minimizes partition movement during rebalance | Fewer reassignments | Slightly more complex |
| CooperativeSticky | Incremental rebalance -- only affected partitions stop | Minimal disruption | Requires all consumers to support it |
Use CooperativeSticky (Kafka 2.4+). It's strictly better than the alternatives. Only the partitions that need to move are reassigned. Other partitions continue processing uninterrupted.
Avoiding Unnecessary Rebalances
# Increase session timeout -- don't trigger rebalance for brief GC pauses
session.timeout.ms=45000
# Increase poll interval -- give slow consumers more time
max.poll.interval.ms=300000
# Set a static group member ID -- survive restarts without rebalancing
group.instance.id=consumer-host-1
Static group membership (Kafka 2.3+) is a game-changer. Assign a fixed group.instance.id to each consumer. When a consumer restarts, Kafka recognizes it as the same member and reassigns the same partitions without triggering a full rebalance.
DoorDash reduced their rebalancing incidents by 80% after switching to static group membership. Their order processing pipeline couldn't afford the 30-second pauses that full rebalances caused during peak hours.
Offsets: Tracking Progress
Each consumer tracks its position in each partition using offsets. The offset is the index of the next message to read.
Where Offsets Are Stored
Offsets are stored in a special Kafka topic: __consumer_offsets. Each consumer group writes its current offsets to this topic. When a consumer restarts or a partition is reassigned, the new consumer reads the last committed offset and resumes from there.
Commit Strategies
Auto-Commit
The consumer automatically commits offsets every 5 seconds. Simple, but dangerous.
The problem: the consumer reads messages, starts processing them, and the auto-commit fires before processing finishes. If the consumer crashes, the offset was already committed. Those messages are "consumed" but never processed. Data loss.
Manual Commit (Synchronous)
Commit after processing. If the consumer crashes before committing, the message is reprocessed on restart. At-least-once delivery. Safe.
The downside: one commit per message is slow. Commit in batches instead:
batch = consumer.poll(timeout_ms=1000)
for message in batch:
process(message)
consumer.commit() # commit once per batch
Manual Commit (Asynchronous)
Non-blocking. Higher throughput. But if the async commit fails and you don't retry, you might reprocess messages.
The Offset Reset Question
What happens when a consumer starts and there's no committed offset? The auto.offset.reset setting decides:
| Setting | Behavior | Use Case |
|---|---|---|
earliest |
Start from the beginning of the partition | Process all historical data |
latest |
Start from the latest message | Skip history, only process new data |
none |
Throw an error | Fail explicitly, don't guess |
For new consumer groups processing critical data, use earliest. For monitoring/alerting consumers that only care about new events, use latest.
Replication: Fault Tolerance
Each partition is replicated to multiple brokers.
Leader and Followers
One replica is the leader. All reads and writes go through the leader. Other replicas are followers. They replicate data from the leader by fetching from its log.
ISR (In-Sync Replicas)
The ISR is the set of replicas that are "caught up" with the leader. A replica stays in the ISR if it replicates within replica.lag.time.max.ms (default: 30 seconds).
If a follower falls behind (slow disk, network issue), it's removed from the ISR. When it catches up, it rejoins.
Write Acknowledgment
The acks producer setting interacts with ISR:
| acks | What It Means | Durability | Latency |
|---|---|---|---|
0 |
Don't wait for any ack | None | Lowest |
1 |
Leader ack | Survives follower failure | Low |
all |
All ISR ack | Survives any single failure | Highest |
With acks=all and min.insync.replicas=2 on a topic with replication.factor=3:
- All 3 replicas alive: write succeeds after 2 acks (ISR = 3, min = 2).
- 1 replica down: write succeeds after 2 acks (ISR = 2, min = 2).
- 2 replicas down: write fails (ISR = 1, min = 2). The broker returns
NotEnoughReplicas.
This is the correct production configuration for critical data. It guarantees that at least 2 copies of every message exist before acknowledging.
Unclean Leader Election
If all ISR replicas die and only an out-of-sync replica remains, should it become the leader?
unclean.leader.election.enable=true: Yes. Data loss is possible (the out-of-sync replica may be missing recent messages). But the partition is available.unclean.leader.election.enable=false(default since Kafka 0.11): No. The partition is unavailable until an ISR replica recovers. No data loss.
Interview shortcut: "We disable unclean leader election because we prefer unavailability over data loss."
Partition Count: The One-Way Door
You can increase partition count. You cannot decrease it. This is a one-way door.
Choosing the Right Partition Count
| Factor | Guidance |
|---|---|
| Consumer parallelism | At least as many partitions as your maximum consumer count |
| Throughput | More partitions = more parallel I/O. But diminishing returns past 20-30 per broker. |
| Ordering | More partitions = weaker global ordering. Use partition keys. |
| End-to-end latency | More partitions = more replication overhead. Slight latency increase. |
| Broker memory | Each partition uses ~10KB of broker memory for metadata. 10K partitions = 100MB. |
| Recovery time | More partitions = longer leader election during broker failure |
Rule of thumb: Start with max(expected_consumers * 2, expected_throughput_MB / 10). For most topics, 6-24 partitions is the sweet spot. Don't create 1000 partitions "just in case."
What Happens When You Add Partitions
Existing messages stay in their current partitions. New messages are distributed across all partitions (including new ones). But here's the gotcha: if you use partition keys, adding partitions changes the key-to-partition mapping.
hash("user-123") % 12 gives a different result than hash("user-123") % 16. Messages for user-123 that were in partition 3 now go to partition 7. Ordering is broken for in-flight data.
This is why you should over-provision partitions slightly at creation time. Adding partitions later requires careful coordination.
Patterns for System Design Interviews
Pattern 1: Event-Driven Order Processing
"Topic orders with partition key order_id. Each order's events (created, paid, shipped, delivered) land in the same partition, preserving order. Consumer group order-processor with one consumer per partition. Static group membership to avoid rebalancing during deployments."
Pattern 2: Multi-Tenant Activity Feed
"Topic user-activity with partition key tenant_id. All events for a tenant land on the same partition. Each tenant's events are processed in order. Different consumer groups: one for real-time notifications, one for analytics, one for audit logging."
Pattern 3: Fan-Out to Multiple Services
"Topic payments with 3 consumer groups: fraud-detection, accounting, notifications. Each group reads all messages independently. Fraud detection needs low latency (small fetch.max.wait.ms). Accounting needs reliability (enable.auto.commit=false, manual commit after processing). Notifications tolerate occasional duplicates."

Trade-Offs Table
| Factor | Few Partitions (1-3) | Many Partitions (50+) |
|---|---|---|
| Ordering | Stronger (fewer partition boundaries) | Weaker (must rely on keys) |
| Parallelism | Limited consumers | High parallelism |
| Throughput | Limited by single partition I/O | High aggregate throughput |
| Rebalancing | Fast (few assignments) | Slow (many assignments) |
| Broker memory | Minimal | Noticeable at 10K+ |
| Recovery time | Fast leader election | Slower leader election |
| Key distribution | More prone to hot partitions | Better key distribution |
| Operational complexity | Low | Higher (monitoring, tuning) |
Interview Gotchas
"How do you guarantee ordering in Kafka?"
"Order is guaranteed within a partition. To order related messages, use a partition key that groups them. For example, all events for order-123 use order-123 as the key. They land in the same partition and are consumed in order. We cannot guarantee ordering across partitions."
"What happens when a consumer is slow?"
If a consumer takes longer than max.poll.interval.ms to process a batch, Kafka assumes it's dead and triggers a rebalance. Its partitions get reassigned to other consumers. The slow consumer then tries to commit offsets for partitions it no longer owns -- commit fails. Fix: increase max.poll.interval.ms or reduce batch size.
"Can two consumers read the same partition?"
Not within the same consumer group. But different consumer groups can. This is how Kafka enables multiple independent consumers on the same data.
"What's the difference between Kafka partitions and database shards?"
Conceptually similar -- both split data across nodes. But Kafka partitions are append-only and immutable. Database shards support random reads, updates, and deletes. Kafka partitions are optimized for sequential access. Database shards are optimized for random access.
"How do you handle partition skew?"
If one partition key has disproportionate traffic (celebrity user, large tenant), that partition becomes a hot spot. Solutions: use a compound key (tenant_id + sub_key), hash the key for even distribution, or dedicate a separate topic for high-volume producers.
Summary
Partitions are the heart of Kafka's architecture. They determine parallelism, ordering, and fault tolerance. The partition key choice is a one-way decision that affects every consumer. Consumer groups enable parallel processing with automatic partition assignment, but rebalancing is expensive -- use CooperativeSticky and static membership. Offsets track consumer progress -- commit manually for at-least-once semantics. Replication with ISR keeps data safe. Get these fundamentals right and the rest of Kafka falls into place.