Exactly-Once Semantics and Transactions
TL;DR
Kafka's exactly-once semantics are real but narrow -- they guarantee each message is processed exactly once within Kafka, but your downstream systems still need idempotent consumers to be truly safe.
What It Is

Delivery semantics describe how many times a message gets processed. Three options exist. Most people only understand two of them, and misunderstand the third.
At-Most-Once
Commit the offset before processing the message. If the consumer crashes after committing but before finishing processing, the message is lost. When the consumer restarts, it picks up from the committed offset and skips the unprocessed message.
1. Receive message at offset 42
2. Commit offset 43 ← "I'm done with 42"
3. Start processing message
4. 💥 Consumer crashes
5. Restart → resume from offset 43
6. Message 42 is never processed
When it's acceptable: metrics collection, logging, monitoring. A missing data point every few hours doesn't break anything.
When it's not: payment processing, inventory updates, user notifications. Losing a single message means real business impact.
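The crash sequence above can be replayed in a few lines. This is only a sketch: the in-memory list standing in for a partition, the `state` dict standing in for the committed offset, and the `crash_at` parameter are all hypothetical, not part of any Kafka client.

```python
class Crash(Exception):
    """Simulates the consumer process dying mid-message."""


def consume_at_most_once(log, state, processed, crash_at=None):
    """Commit the next offset first, then process the message."""
    while state["committed"] < len(log):
        offset = state["committed"]
        state["committed"] = offset + 1      # commit BEFORE processing
        if offset == crash_at:
            raise Crash(f"died while handling offset {offset}")
        processed.append(log[offset])        # processing happens last


log = ["m0", "m1", "m2", "m3"]   # toy partition, offsets 0..3
state = {"committed": 0}         # survives the crash, like a committed offset
processed = []                   # side effects that actually happened

try:
    consume_at_most_once(log, state, processed, crash_at=2)
except Crash:
    pass                         # the consumer restarts below

consume_at_most_once(log, state, processed)  # resume from the committed offset
print(processed)                 # "m2" never shows up
```

Swapping the commit and the processing step turns the same loop into at-least-once: the crash then causes a repeat instead of a gap.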
At-Least-Once
Commit the offset after processing the message. If the consumer crashes after processing but before committing, the message is reprocessed on restart.
1. Receive message at offset 42
2. Process message (write to database)
3. 💥 Consumer crashes before commit
4. Restart → resume from offset 42
5. Process message AGAIN (duplicate write to database)
This is the default for most Kafka consumers. It's safe -- you never lose data. But you might process the same message twice.
The fix: make your consumer idempotent. If processing a message twice produces the same result as processing it once, duplicates don't matter.
Idempotency patterns:
- Database upsert: INSERT ... ON CONFLICT DO UPDATE -- same data written twice, same result.
- Deduplication key: Store a (message_id, partition, offset) in a processed-messages table. Skip if already seen.
- Conditional update: UPDATE balance SET amount = amount - 10 WHERE transaction_id = 'txn-123' AND NOT EXISTS (SELECT 1 FROM processed WHERE id = 'txn-123').
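A minimal sketch of the deduplication-key pattern, using an in-memory SQLite database so it runs without any infrastructure. The table layout and the `apply_debit` helper are assumptions made for illustration. The upsert syntax needs SQLite 3.24 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE balance (account TEXT PRIMARY KEY, amount INTEGER NOT NULL);
    CREATE TABLE processed (id TEXT PRIMARY KEY);
    INSERT INTO balance VALUES ('acct-1', 100);
""")


def apply_debit(conn, txn_id, account, amount):
    """Debit once per txn_id; a redelivered message becomes a no-op."""
    with conn:  # one DB transaction: dedup record and debit commit together
        cur = conn.execute(
            "INSERT INTO processed (id) VALUES (?) ON CONFLICT (id) DO NOTHING",
            (txn_id,),
        )
        if cur.rowcount == 0:
            return False  # already seen: skip the side effect
        conn.execute(
            "UPDATE balance SET amount = amount - ? WHERE account = ?",
            (amount, account),
        )
        return True


first = apply_debit(conn, "txn-123", "acct-1", 10)
second = apply_debit(conn, "txn-123", "acct-1", 10)  # redelivery after a crash
remaining = conn.execute(
    "SELECT amount FROM balance WHERE account = 'acct-1'"
).fetchone()[0]
print(first, second, remaining)
```

The dedup insert and the debit share one database transaction on purpose. If they committed separately, a crash between them would bring back the original problem.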
Exactly-Once
Each message is processed exactly once. No losses, no duplicates. Sounds perfect. But the details matter.
The Idempotent Producer
The first building block. Enabled with one producer setting: enable.idempotence=true.
Since Kafka 3.0, this is the default. You get it for free.
How It Works
When the producer starts, the broker assigns it a Producer ID (PID). The producer then numbers what it sends to each partition with a sequence number starting at 0, so every message carries both the PID and a per-partition sequence number.
Producer PID=7, sending to Partition 3:
Message A: sequence=0
Message B: sequence=1
Message C: sequence=2
The broker tracks the last sequence number per (PID, partition). If it receives a message with a sequence number it has already written, it treats the message as a duplicate and does not append it again. If it receives a sequence number that is too far ahead (a gap), it rejects the message with an out-of-order sequence error (OutOfOrderSequenceException on the producer side).
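The sequence check can be modeled in a few lines. This is a toy, not broker code: `PartitionLog`, its method names, and the returned strings are hypothetical, and a real broker tracks more state than a single last sequence number.

```python
class OutOfOrderSequence(Exception):
    """Raised when a message skips ahead of the expected sequence number."""


class PartitionLog:
    """Toy model of one partition's dedup state; not a real broker."""

    def __init__(self):
        self.records = []
        self.last_seq = {}          # PID -> last sequence number appended

    def append(self, pid, seq, value):
        expected = self.last_seq.get(pid, -1) + 1
        if seq < expected:
            return "duplicate"      # already written: do not append again
        if seq > expected:
            raise OutOfOrderSequence(f"expected {expected}, got {seq}")
        self.records.append(value)
        self.last_seq[pid] = seq
        return "appended"


partition = PartitionLog()
results = [
    partition.append(pid=7, seq=0, value="A"),
    partition.append(pid=7, seq=1, value="B"),
    partition.append(pid=7, seq=1, value="B"),   # retry after a lost ack
    partition.append(pid=7, seq=2, value="C"),
]
print(results, partition.records)
```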
What It Solves
Network retries. The producer sends message B, the broker writes it, the acknowledgment is lost. The producer retries. Without idempotency, the broker writes message B again -- duplicate. With idempotency, the broker sees sequence=1 already exists and returns a success without writing again.
What It Doesn't Solve
If the producer crashes and restarts, it gets a new PID. The sequence numbers reset. The broker treats it as a new producer. If the broker had written a message right before the crash but the acknowledgment never reached the producer, the restarted producer might send it again with a new PID. The broker can't detect this as a duplicate.
Idempotent producers prevent duplicates from retries. They don't prevent duplicates from producer restarts. That's what transactions are for.
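The difference between a retry and a restart shows up clearly in a toy model. Everything here (`broker_append`, the PID counter, the message value) is a hypothetical stand-in for the broker's bookkeeping on a single partition.

```python
import itertools

pid_counter = itertools.count(1)   # the broker hands out a fresh PID each time
log = []                           # records appended to one partition
last_seq = {}                      # PID -> last sequence number written


def broker_append(pid, seq, value):
    """Append unless this (PID, sequence) pair was already written."""
    if last_seq.get(pid, -1) >= seq:
        return "duplicate"
    log.append(value)
    last_seq[pid] = seq
    return "appended"


old_pid = next(pid_counter)
broker_append(old_pid, 0, "order-123")           # written, but the ack is lost
retry = broker_append(old_pid, 0, "order-123")   # same session: caught

# Crash and restart: no transactional.id, so the producer gets a new PID
new_pid = next(pid_counter)
resend = broker_append(new_pid, 0, "order-123")  # looks like a new producer

print(retry, resend, log)
```

The retry is deduplicated, the post-restart resend is not, and the partition ends up with two copies of the same order.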
Kafka Transactions
Transactions allow atomic writes across multiple partitions and topics.
The Read-Process-Write Pattern
This is the most common use case. A consumer reads from an input topic, processes the message, and writes to an output topic. The offset commit and the output write must happen atomically.
Without transactions:
1. Read from input topic (offset 42)
2. Process message
3. Write result to output topic ← could fail
4. Commit offset 43 on input topic ← could fail independently
If step 3 succeeds but step 4 fails, you reprocess message 42 and write a duplicate to the output topic. If step 4 succeeds but step 3 fails, you skip the message.
With transactions:
1. Read from input topic (offset 42)
2. Begin transaction
3. Write result to output topic
4. Commit offset 43 on input topic
5. Commit transaction ← atomic: all or nothing
Either both the output write and the offset commit succeed, or neither does. No duplicates. No lost messages.
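The all-or-nothing behavior can be sketched without a broker. `ToyTransaction` and `process_one` are hypothetical names, and the model is simplified: it buffers writes until commit, whereas Kafka appends records immediately and hides them from read_committed consumers until a commit marker is written.

```python
class ToyTransaction:
    """Stages an output record and an offset commit; applies both or neither."""

    def __init__(self, output_topic, committed_offsets):
        self.output_topic = output_topic
        self.committed_offsets = committed_offsets
        self.pending_records = []
        self.pending_offsets = {}

    def send(self, record):
        self.pending_records.append(record)

    def send_offsets(self, partition, next_offset):
        self.pending_offsets[partition] = next_offset

    def commit(self):
        self.output_topic.extend(self.pending_records)
        self.committed_offsets.update(self.pending_offsets)

    def abort(self):
        self.pending_records.clear()
        self.pending_offsets.clear()


def process_one(input_log, output_topic, committed, fail=False):
    """Read one message, transform it, and commit output + offset together."""
    offset = committed.get("orders-0", 0)
    txn = ToyTransaction(output_topic, committed)
    try:
        txn.send(input_log[offset].upper())
        if fail:
            raise RuntimeError("simulated crash mid-transaction")
        txn.send_offsets("orders-0", offset + 1)
        txn.commit()
    except RuntimeError:
        txn.abort()


input_log = ["order-a", "order-b"]
output_topic, committed = [], {}
process_one(input_log, output_topic, committed, fail=True)   # nothing applied
after_failure = (list(output_topic), dict(committed))
process_one(input_log, output_topic, committed)              # retry succeeds
print(after_failure, output_topic, committed)
```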
How Transactions Work
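from kafka import KafkaProducer
from kafka.structs import OffsetAndMetadata, TopicPartition

# Assumes a kafka-python release that supports transactional producers;
# check the installed version before relying on these calls.
producer = KafkaProducer(
    bootstrap_servers='localhost:9092',
    transactional_id='order-processor-1',  # must be unique per producer instance
    enable_idempotence=True,  # required for transactions
)
producer.init_transactions()
try:
    producer.begin_transaction()
    # Write to output topic
    producer.send("processed-orders", key=b"order-123", value=b"result")
    # Commit consumer offsets as part of the transaction.
    # OffsetAndMetadata fields assumed here: offset, metadata, leader_epoch.
    producer.send_offsets_to_transaction(
        {TopicPartition("orders", 0): OffsetAndMetadata(43, "", -1)},
        "order-processor",  # consumer group id
    )
    producer.commit_transaction()
except Exception:
    # Roll back so neither the output record nor the offset becomes visible
    producer.abort_transaction()
    raise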
The Transaction Coordinator
Each Kafka broker can act as a transaction coordinator. The coordinator manages the state of active transactions. It writes transaction markers (COMMIT or ABORT) to the relevant partitions.
Transaction lifecycle:
1. Producer registers with coordinator (via transactional.id)
2. Producer begins transaction
3. Producer sends messages to partitions (marked as "uncommitted")
4. Producer requests commit
5. Coordinator writes PREPARE marker to transaction log
6. Coordinator writes COMMIT markers to all involved partitions
7. Messages become visible to consumers with read_committed
The transactional.id persists across producer restarts. When a producer with the same transactional.id reconnects, the coordinator fences the old producer instance. This prevents zombie producers from committing transactions started by a crashed instance.
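Fencing boils down to an epoch comparison. The sketch below assumes a hypothetical `ToyCoordinator` that keeps one epoch per transactional.id. The real coordinator tracks far more state, but the rejection rule is the same idea.

```python
class ProducerFenced(Exception):
    """Raised when a stale producer epoch talks to the coordinator."""


class ToyCoordinator:
    def __init__(self):
        self.epochs = {}            # transactional.id -> current epoch

    def init_transactions(self, transactional_id):
        """Register (or re-register) a producer and bump its epoch."""
        epoch = self.epochs.get(transactional_id, -1) + 1
        self.epochs[transactional_id] = epoch
        return epoch

    def commit(self, transactional_id, epoch):
        if epoch < self.epochs[transactional_id]:
            raise ProducerFenced(f"epoch {epoch} is stale")
        return "committed"


coordinator = ToyCoordinator()
old_epoch = coordinator.init_transactions("order-processor-1")  # first instance
new_epoch = coordinator.init_transactions("order-processor-1")  # replacement

try:
    zombie_result = coordinator.commit("order-processor-1", old_epoch)
except ProducerFenced:
    zombie_result = "fenced"
live_result = coordinator.commit("order-processor-1", new_epoch)
print(old_epoch, new_epoch, zombie_result, live_result)
```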
Spicy take: Kafka transactions are a two-phase commit protocol disguised as a streaming feature. The coordinator is the transaction manager. The partitions are the participants. It has the same failure modes as any 2PC system -- coordinator failure during the prepare phase can leave transactions in limbo. Kafka handles this with timeout-based resolution, but it's not magic.
Consumer Side: read_committed
Transactions are useless if consumers read uncommitted messages. The consumer needs to opt in by setting isolation.level=read_committed.
With read_committed:
- The consumer only sees messages from committed transactions.
- Aborted transaction messages are skipped.
- Non-transactional messages are read normally.
Without read_committed (default is read_uncommitted), the consumer sees all messages, including ones from transactions that will be aborted. This defeats the purpose.
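To make the two isolation levels concrete, here is a toy partition log with data records and control markers. The dict layout and the `data` and `marker` helpers are invented for this sketch. It also assumes every transaction in the log has already finished, so it says nothing about where a consumer would have to stop and wait.

```python
def data(value, txn=None):
    return {"type": "data", "value": value, "txn": txn}


def marker(txn, result):
    return {"type": "marker", "txn": txn, "result": result}


def read_committed(log):
    """Values a read_committed consumer would see from a finished log."""
    outcome = {}                       # txn id -> "COMMIT" or "ABORT"
    for entry in log:
        if entry["type"] == "marker":
            outcome[entry["txn"]] = entry["result"]
    visible = []
    for entry in log:
        if entry["type"] != "data":
            continue                   # control markers are never delivered
        txn = entry["txn"]
        if txn is None or outcome.get(txn) == "COMMIT":
            visible.append(entry["value"])
    return visible


def read_uncommitted(log):
    """Every data record, regardless of transaction outcome."""
    return [e["value"] for e in log if e["type"] == "data"]


log = [
    data("a"),                 # non-transactional
    data("b", txn="t1"),
    data("c", txn="t2"),
    marker("t1", "COMMIT"),
    marker("t2", "ABORT"),
    data("d"),
]
print(read_committed(log), read_uncommitted(log))
```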
The LSO (Last Stable Offset)
The consumer won't read past the Last Stable Offset -- the offset of the earliest open (uncommitted) transaction. This means a long-running transaction can block consumer progress on that partition.
Partition offsets:
[0] [1] [2] [3] [4] [5] [6] [7] [8]
             ^                   ^
             Open txn            Latest message
             (uncommitted)
LSO = 3. Consumer with read_committed stops here.
Messages 3-8 are invisible until the transaction commits or aborts.
If a producer starts a transaction and crashes without committing, the LSO is stuck. The transaction eventually times out (transaction.timeout.ms, default 60 seconds), the coordinator aborts it, and the LSO advances.
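The LSO rule is small enough to write down directly. The function names and the plain list of offsets are assumptions for illustration. The sketch covers only the blocking behavior and treats the transaction as committing, so it does no filtering of aborted records.

```python
def last_stable_offset(high_watermark, open_txn_first_offsets):
    """LSO = first offset of the earliest open transaction, else the high watermark."""
    if open_txn_first_offsets:
        return min(open_txn_first_offsets)
    return high_watermark


def fetch_read_committed(records, high_watermark, open_txn_first_offsets):
    """A read_committed consumer only gets records below the LSO."""
    lso = last_stable_offset(high_watermark, open_txn_first_offsets)
    return records[:lso]


records = list(range(9))   # offsets 0..8
blocked = fetch_read_committed(records, high_watermark=9, open_txn_first_offsets=[3])
unblocked = fetch_read_committed(records, high_watermark=9, open_txn_first_offsets=[])
print(blocked, unblocked)
```

With a transaction open at offset 3, the consumer sees offsets 0-2 only. Once no transaction is open, the LSO catches up to the high watermark and everything becomes readable.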
Interview tip: "Long-running transactions block read_committed consumers. We keep transactions short -- read, process, write, commit. No multi-minute transactions."
End-to-End Exactly-Once
Putting it all together:
Exactly-Once = Idempotent Producer
+ Transactional Writes (atomic offset commit + output)
+ Consumer read_committed
+ Idempotent Processing Logic
Wait -- Idempotent Processing Too?
Yes. And this is the part most articles skip.
Kafka's exactly-once guarantees that a message is delivered and committed exactly once within Kafka. But what if your consumer writes to an external system?
1. Read message from Kafka
2. Write result to PostgreSQL
3. Commit Kafka transaction
If step 2 succeeds (PostgreSQL write committed) and step 3 fails (Kafka transaction aborted), the consumer retries. It reads the message again and writes to PostgreSQL again. Duplicate in PostgreSQL.
Kafka can't roll back a PostgreSQL write. It only controls what happens inside Kafka. External systems are outside the transaction boundary.
Fix: Make the PostgreSQL write idempotent. Use a unique constraint on the message ID. Use INSERT ... ON CONFLICT DO NOTHING. Or store the Kafka offset in the same PostgreSQL transaction as the business write, and check it before processing.
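The last option, storing the offset next to the business write, might look like the following. SQLite stands in for PostgreSQL so the sketch runs anywhere. The `kafka_offsets` table, its columns, and the `handle` function are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE payments (id TEXT PRIMARY KEY, amount INTEGER NOT NULL);
    CREATE TABLE kafka_offsets (
        topic TEXT, part INTEGER, next_offset INTEGER NOT NULL,
        PRIMARY KEY (topic, part)
    );
""")


def handle(db, topic, part, offset, payment_id, amount):
    """Apply a message only if its offset has not been recorded yet."""
    with db:   # business write and offset bookkeeping commit atomically
        row = db.execute(
            "SELECT next_offset FROM kafka_offsets WHERE topic = ? AND part = ?",
            (topic, part),
        ).fetchone()
        if row is not None and offset < row[0]:
            return False            # redelivery: already applied
        db.execute(
            "INSERT INTO payments (id, amount) VALUES (?, ?)",
            (payment_id, amount),
        )
        db.execute(
            "INSERT OR REPLACE INTO kafka_offsets VALUES (?, ?, ?)",
            (topic, part, offset + 1),
        )
        return True


applied = handle(db, "payments", 0, 42, "pay-1", 500)
replayed = handle(db, "payments", 0, 42, "pay-1", 500)   # Kafka redelivers 42
row_count = db.execute("SELECT COUNT(*) FROM payments").fetchone()[0]
print(applied, replayed, row_count)
```

Because the database is now the source of truth for progress, a restarting consumer can seek to the stored offset instead of trusting the one committed in Kafka.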
This is why some engineers call it "effectively once" rather than "exactly once." The Kafka side is exact. The end-to-end system needs application-level help.
When Exactly-Once Matters
Not always. The operational complexity and performance cost of transactions are real. Choose based on the use case.
Worth the Cost
| Use Case | Why |
|---|---|
| Financial transactions | Double-charging a credit card is unacceptable |
| Ad click counting | Advertisers pay per click. Duplicates mean overcharging. |
| Inventory management | Double-decrementing stock means overselling |
| Account balance updates | Duplicate credits/debits corrupt balances |
Uber uses exactly-once semantics for their driver payment pipeline. A duplicate payment event means paying a driver twice. The cost of a duplicate far exceeds the cost of the transaction overhead.
Not Worth the Cost
| Use Case | Why At-Least-Once Is Fine |
|---|---|
| Logging | A duplicate log line is harmless |
| Metrics aggregation | Counters are approximate anyway |
| Search indexing | Re-indexing a document is idempotent |
| Notifications | A duplicate "your order shipped" email is mildly annoying, not catastrophic |
| CDC replication | Upserts are idempotent by nature |
For these, at-least-once delivery with an idempotent consumer is simpler, faster, and cheaper. Don't pay the transaction tax when you don't need to.
Performance Impact
Transactions aren't free.
| Factor | Without Transactions | With Transactions |
|---|---|---|
| Throughput | 100% | ~70-80% |
| Latency | Baseline | +5-15ms per transaction commit |
| Broker CPU | Normal | Higher (coordinator overhead) |
| Consumer lag | Real-time | Bounded by LSO (open transactions) |
| Complexity | Low | Medium (transactional.id management, fencing) |
The throughput hit comes from the two-phase commit protocol and the synchronous nature of transaction commits. Batching multiple messages into one transaction amortizes the cost.
# Bad: one transaction per message (high overhead)
for msg in messages:
    producer.begin_transaction()
    producer.send("output", msg)
    producer.commit_transaction()

# Good: one transaction per batch (amortized overhead)
producer.begin_transaction()
for msg in batch:
    producer.send("output", msg)
producer.commit_transaction()
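A back-of-the-envelope calculation shows why batching helps. The 10 ms commit latency is an assumed value picked from the middle of the range in the table above, and the model ignores everything except the fixed per-commit cost.

```python
def per_message_overhead_ms(commit_latency_ms, batch_size):
    """Commit cost spread evenly over the messages in one transaction."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return commit_latency_ms / batch_size


COMMIT_LATENCY_MS = 10   # assumed; mid-range of the +5-15ms figure above
for size in (1, 100, 1000):
    print(size, per_message_overhead_ms(COMMIT_LATENCY_MS, size))
```

The trade-off is that larger batches keep transactions open longer, which holds back read_committed consumers and means more work is redone after an abort.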
Patterns for System Design Interviews
Pattern 1: Exactly-Once Stream Processing
"Consumer reads from raw-events, transforms the data, writes to processed-events. Using transactional producer with read_committed consumers. Offset commit and output write are atomic. If the processor crashes, it resumes from the last committed offset without duplicating output."
Pattern 2: Idempotent External Writes
"Consumer reads payment events from Kafka. Writes to PostgreSQL with INSERT INTO payments (id, amount, status) VALUES ($1, $2, $3) ON CONFLICT (id) DO NOTHING. The payment ID is derived from the Kafka message key. Duplicate processing attempts are no-ops. We use at-least-once delivery with idempotent writes instead of Kafka transactions because the external write can't participate in the Kafka transaction."
Pattern 3: Hybrid Approach
"For Kafka-to-Kafka processing (enrichment, filtering), we use exactly-once with transactions. For Kafka-to-database writes, we use at-least-once with idempotent consumers. This minimizes transaction overhead while maintaining correctness."
Anti-Pattern to Call Out
"We would NOT use exactly-once for our logging pipeline. Logging generates millions of messages per second. The 20-30% throughput hit from transactions would require adding brokers. A duplicate log line is harmless. At-least-once is the right choice here."

Trade-Offs Table
| Factor | At-Most-Once | At-Least-Once | Exactly-Once |
|---|---|---|---|
| Data loss | Possible | No | No |
| Duplicates | No | Possible | No (within Kafka) |
| Throughput | Highest | High | Lower (~70-80% of non-transactional) |
| Latency | Lowest | Low | Higher (+5-15ms) |
| Complexity | Simple | Simple + idempotent consumer | Complex (transactions + fencing) |
| External writes | N/A | Needs idempotent consumer | Still needs idempotent consumer |
| Use case | Metrics, logs | Most applications | Financial, billing, inventory |
Interview Gotchas
"Does Kafka guarantee exactly-once delivery?"
"Within Kafka, yes -- with idempotent producers, transactions, and read_committed consumers. But end-to-end exactly-once requires the consumer's external writes to be idempotent too. Kafka can't roll back a PostgreSQL INSERT. So it's exactly-once within the Kafka boundary and effectively-once end-to-end."
"What's the difference between idempotent producer and transactions?"
"Idempotent producers prevent duplicates from network retries on a single partition. Transactions provide atomic writes across multiple partitions and topics. You need idempotent producers for transactions, but you can use idempotent producers without transactions."
"What happens if the transaction coordinator crashes?"
"Another broker takes over as coordinator and rebuilds its state from the transaction log. If a transaction was still open, with no commit decision logged, it is aborted once transaction.timeout.ms expires. If the PREPARE marker for a commit had already been written to the transaction log, the new coordinator finishes writing the COMMIT markers to the partitions. No data loss."
"How do you handle zombie producers?"
"The transactional.id enables producer fencing. When a new producer instance registers with the same transactional.id, the coordinator increments the epoch. The old producer's requests are rejected with a ProducerFencedException. This prevents a crashed producer that comes back to life from committing stale transactions."
"Why not just use exactly-once everywhere?"
"Performance cost (roughly a 20-30% throughput hit), operational complexity (transactional.id management, LSO monitoring, fencing), and it doesn't help with external writes anyway. Most systems work perfectly with at-least-once delivery and idempotent consumers. Reserve exactly-once for the critical paths where duplicates have real business cost."
Summary
Kafka's exactly-once semantics are built on three layers: idempotent producers (prevent retry duplicates), transactions (atomic multi-partition writes with offset commits), and read_committed consumers (skip uncommitted data). Together they guarantee each message is processed exactly once within Kafka. But the moment your consumer writes to an external system, you need application-level idempotency. Know when exactly-once is worth the cost -- financial transactions, yes; logging, no. This nuanced understanding is what separates a strong system design answer from a textbook recitation.