
Exactly-once delivery in data pipelines

Exactly-once delivery means every record lands in the destination exactly one time — no drops, no duplicates — even when the machines in between crash, retry, or lose the network. It's the guarantee analytics quietly depends on, and the hardest one to make honest.

7 min read · Blunox

Every data pipeline makes a promise about how many times a record shows up at the other end. Most teams never read the fine print — until a revenue number is double-counted and everyone spends a Friday figuring out why. That promise has a name: the delivery guarantee. There are three of them, and exactly-once delivery is the one you actually want.

The three delivery guarantees

A pipeline reads from a source, moves records across a network, and writes them to a destination. When a step fails partway through, the system has to decide: resend the record, or move on? That single decision defines which guarantee you get.

At-most-once. Send each record and never retry. If a write fails or a process dies mid-batch, the record is simply lost. You never get duplicates — but you can silently drop data. Fine for disposable telemetry; a disaster for anything you count on.
At-least-once. Retry until the destination acknowledges the write. Nothing is ever lost — but a record can arrive twice if an acknowledgment is missed and the pipeline resends. This is the default for most systems because it's safe against loss and cheap to build. The cost is duplicates.
Exactly-once. Every record lands precisely one time. No loss, no duplication, regardless of how many times a step retries under the hood. This is what people mean when they say a pipeline "just works" — and it takes real machinery to deliver.
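To make the three policies concrete, here is a toy simulation in Python. It is not taken from any real pipeline tool: the function `deliver` and its failure scripts `fail_write` and `lose_ack` are hypothetical. "Exactly-once" is modeled as at-least-once retries against a destination that ignores record IDs it already holds, which anticipates the idempotent writes discussed later in this article.

```python
def deliver(records, policy, fail_write=frozenset(), lose_ack=frozenset()):
    """Deliver records under one policy against a scripted set of failures.

    fail_write: IDs whose first write attempt never reaches the destination.
    lose_ack:   IDs whose first write lands but whose acknowledgment is lost.
    """
    rows, seen = [], set()
    for rec in records:
        attempt = 0
        while True:
            attempt += 1
            first_try = attempt == 1
            reached = not (first_try and rec in fail_write)
            if reached and not (policy == "exactly-once" and rec in seen):
                rows.append(rec)  # an idempotent destination skips IDs it already holds
                seen.add(rec)
            acked = reached and not (first_try and rec in lose_ack)
            if acked or policy == "at-most-once":
                break  # at-most-once never retries; the others retry until acked
    return rows


records = ["a", "b", "c"]
for policy in ("at-most-once", "at-least-once", "exactly-once"):
    print(policy, deliver(records, policy, fail_write={"b"}, lose_ack={"c"}))
```

With the first write of `b` failing and the first acknowledgment for `c` lost, at-most-once returns `['a', 'c']`, at-least-once returns `['a', 'b', 'c', 'c']`, and the idempotent variant returns `['a', 'b', 'c']`.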

The trap is that at-least-once looks like exactly-once most of the time. Duplicates only appear at the seams — a crash between writing and committing, a timeout that was actually a success. Everything passes in testing, then breaks in production during the one outage that matters.

Why exactly-once is hard

The difficulty comes from a fact of distributed systems: you cannot both perform an action and record that you performed it in a single, indivisible step across two machines. Write the record to the destination, then update your position in the source — and if the process dies in the gap between those two operations, you don't know which one happened.

So on restart you face an impossible-to-answer question. Did that last batch commit before the crash, or not? If you assume it did and it didn't, you drop data (at-most-once). If you assume it didn't and it did, you resend and duplicate (at-least-once). Network partitions make it worse: a write can succeed at the destination while the acknowledgment never returns, so the sender believes it failed and tries again.
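The restart dilemma can be reproduced in a few lines. In this sketch a plain list stands in for the destination and a dict for the durable offset store (both hypothetical stand-ins), and a simulated crash lands between the two steps of the first batch. Changing only the order of the two steps decides whether the crash duplicates data or drops it.

```python
class Crash(Exception):
    """Simulated process death between the two steps of a batch."""


def run(source, dest, state, order, crash_after_batch=None, batch_size=2):
    """Process `source` from state["offset"], one batch at a time.

    order="write-then-commit": write the rows, then advance the offset.
    order="commit-then-write": advance the offset, then write the rows.
    crash_after_batch: batch index whose second step never runs.
    """
    batch_no = 0
    while state["offset"] < len(source):
        start = state["offset"]
        batch = source[start:start + batch_size]
        steps = [lambda: dest.extend(batch),
                 lambda: state.update(offset=start + len(batch))]
        if order == "commit-then-write":
            steps.reverse()
        steps[0]()
        if batch_no == crash_after_batch:
            raise Crash()
        steps[1]()
        batch_no += 1


def simulate(order):
    source = ["r1", "r2", "r3", "r4"]
    dest, state = [], {"offset": 0}  # both survive the crash, like durable storage
    try:
        run(source, dest, state, order, crash_after_batch=0)
    except Crash:
        pass  # restart: resume from whatever offset was committed
    run(source, dest, state, order)
    return dest


print(simulate("write-then-commit"))  # duplicates r1 and r2
print(simulate("commit-then-write"))  # loses r1 and r2
```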

Exactly-once doesn't come from retrying harder. It comes from making the duplicates that retries create harmless.

That reframing is the whole game. You can't stop a distributed system from occasionally doing work twice. What you can do is ensure the second attempt has no visible effect — that redelivery is safe by construction.

How systems actually achieve it

Real pipelines reach exactly-once by combining a few techniques, not by magic. Each one closes a different gap.

Idempotent writes. Give every record a stable identity and make writing it twice equivalent to writing it once — an upsert on a primary key, or a dedup key the destination rejects on collision. Now a retry is a no-op instead of a duplicate.
Checkpointing and offset tracking. The pipeline durably records how far it has processed — a log offset, a WAL position, a watermark. After a crash it resumes from the last committed checkpoint rather than guessing, so it never skips or reprocesses a committed range.
Transactional writes. Bundle the destination write and the offset advance into one atomic commit, for example by storing the offset in the destination alongside the data, so either both land or neither does. This closes the fatal gap between "wrote the data" and "recorded that we wrote it."
Deduplication. Where end-to-end transactions aren't possible, the destination keeps a window of recently seen record IDs and discards repeats. It's the safety net that turns an at-least-once transport into an exactly-once result.
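The first three techniques above fit together in a single destination transaction. The sketch below uses Python's built-in `sqlite3` module as a stand-in destination; the table names, `apply_batch`, `run_pipeline`, and the `crash_before_commit` flag are hypothetical. Rows are upserted on a primary key (idempotent writes), and the source offset is stored in the same database and advanced in the same transaction (transactional writes and checkpointing), so a restart reads its resume point from the destination itself. The `ON CONFLICT ... DO UPDATE` upsert syntax needs SQLite 3.24 or later.

```python
import sqlite3


def open_sink(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS orders (id TEXT PRIMARY KEY, amount INTEGER);
        CREATE TABLE IF NOT EXISTS checkpoint (source TEXT PRIMARY KEY, next_offset INTEGER);
    """)
    return conn


def last_offset(conn, source):
    row = conn.execute(
        "SELECT next_offset FROM checkpoint WHERE source = ?", (source,)
    ).fetchone()
    return row[0] if row else 0


def apply_batch(conn, source, start, batch, crash_before_commit=False):
    """Upsert a batch and advance the checkpoint atomically."""
    with conn:  # commits if the block succeeds, rolls back if it raises
        conn.executemany(
            "INSERT INTO orders (id, amount) VALUES (?, ?) "
            "ON CONFLICT(id) DO UPDATE SET amount = excluded.amount",
            batch,
        )
        conn.execute(
            "INSERT INTO checkpoint (source, next_offset) VALUES (?, ?) "
            "ON CONFLICT(source) DO UPDATE "
            "SET next_offset = max(next_offset, excluded.next_offset)",
            (source, start + len(batch)),
        )
        if crash_before_commit:
            raise RuntimeError("simulated crash before commit")


def run_pipeline(conn, source, log, batch_size=2):
    offset = last_offset(conn, source)  # resume from the committed checkpoint
    while offset < len(log):
        batch = log[offset:offset + batch_size]
        apply_batch(conn, source, offset, batch)
        offset += len(batch)


conn = open_sink()
log = [("o1", 10), ("o2", 25), ("o3", 40)]

try:  # a crash inside the transaction leaves neither rows nor offset behind
    apply_batch(conn, "orders-log", 0, log[0:2], crash_before_commit=True)
except RuntimeError:
    pass

run_pipeline(conn, "orders-log", log)         # resumes from offset 0
apply_batch(conn, "orders-log", 0, log[0:2])  # a replayed batch changes nothing
print(conn.execute("SELECT id, amount FROM orders ORDER BY id").fetchall())
```

The `max(...)` in the checkpoint upsert keeps a replayed older batch from rewinding the offset. This pattern relies on the destination being able to hold both the data and the offset in one transaction; a pipeline that writes to one system and tracks offsets in another needs a different coordination mechanism.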

Purists point out that true exactly-once — where a record physically crosses the wire only once — is impossible in an asynchronous network. What production systems actually deliver is effectively-once: records may be transmitted more than once, but through idempotency and checkpointing the observable outcome is identical to exactly-once. That's the honest version of the promise, and it's the one that matters, because what you care about is the final state of the data, not the packet count.
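A dedup window is a simple way to see effectively-once in action: the transport below redelivers IDs, yet the sequence that reaches the destination contains each one once. `DedupWindow` and its `capacity` parameter are hypothetical; this sketch bounds the window by count using `collections.OrderedDict`, while a time-based window is another common choice.

```python
from collections import OrderedDict


class DedupWindow:
    """Remembers the most recent `capacity` record IDs, by first sighting."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._seen = OrderedDict()

    def first_time(self, record_id):
        """Return True the first time an ID is seen within the window."""
        if record_id in self._seen:
            return False
        self._seen[record_id] = None
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)  # evict the oldest remembered ID
        return True


# An at-least-once transport that redelivered e2 and e1.
deliveries = ["e1", "e2", "e2", "e3", "e1", "e4"]
window = DedupWindow(capacity=3)
applied = [e for e in deliveries if window.first_time(e)]
print(applied)  # ['e1', 'e2', 'e3', 'e4']
```

The window is also the weak spot: once `e4` arrives, `e1` is evicted, so a third delivery of `e1` would be accepted. The window has to outlast the transport's longest plausible redelivery delay.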

Why it matters for analytics

Delivery semantics feel like plumbing until you trace where the water goes. A pipeline that duplicates just 0.1% of an event stream will over-count sign-ups, inflate revenue, and skew every conversion rate built on top of it, and nobody will notice until the numbers are challenged in a meeting.

The damage compounds downstream. A double-counted order corrupts the daily revenue rollup, which feeds the executive dashboard, which trains the forecasting model, which sets next quarter's targets. This is especially acute for change data capture streams, where a replayed change event can flip a row to a stale value or resurrect a deleted record. Under at-least-once, "the dashboard is a little off" is not a bug you can fix after the fact — the wrong numbers are already baked into everything built on them.
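One common guard against replayed change events is to remember, per key, the log position of the last change applied and skip anything at or below it. This sketch assumes each change carries a monotonically increasing position (called `lsn` here, after the log sequence numbers many databases use); `Change` and `apply_changes` are hypothetical. The position is kept even after a delete, acting as a tombstone, so a replayed insert cannot bring the row back.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Change:
    lsn: int                     # log position; assumed to increase monotonically
    key: str
    op: str                      # "upsert" or "delete"
    value: Optional[str] = None


def apply_changes(table, versions, changes):
    """Apply CDC changes, skipping events already reflected in the table.

    table:    key -> current value (deleted keys are absent)
    versions: key -> position of the last applied change, kept after deletes
    """
    for ch in changes:
        if ch.lsn <= versions.get(ch.key, -1):
            continue  # replayed or stale event
        if ch.op == "delete":
            table.pop(ch.key, None)
        else:
            table[ch.key] = ch.value
        versions[ch.key] = ch.lsn


stream = [Change(1, "u1", "upsert", "free"),
          Change(2, "u1", "upsert", "pro"),
          Change(3, "u1", "delete")]

table, versions = {}, {}
apply_changes(table, versions, stream)
apply_changes(table, versions, stream[:2])  # replay after a crash
print(table)  # {}

# Without the position check, the same replay resurrects the deleted row.
naive = {}
for ch in stream + stream[:2]:
    if ch.op == "delete":
        naive.pop(ch.key, None)
    else:
        naive[ch.key] = ch.value
print(naive)  # {'u1': 'pro'}
```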

What to check in a pipeline tool

Vendors say "exactly-once" freely. Treat it as a claim to verify, not a feature to check off. A few questions cut through the marketing:

Checkpoint and resume. Does it persist progress durably, and resume cleanly from the exact last committed position after a crash — without a full re-sync or a manual replay?
Offset tracking. Are source offsets committed atomically with the destination write, or updated separately? Separate updates are the classic source of the write-then-die gap.
Replay safety. If you reprocess a range — after a failure, a backfill, or an operator re-run — does the destination end in the same state, or do you get duplicates? Idempotent writes are what make replay safe.
Failure behavior, tested. Ask what happens on a mid-batch crash, a network partition, and a destination timeout. If the answer is vague, the guarantee is vague.
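The replay-safety and failure-behavior questions can also be put to a pipeline directly, by injecting a crash at each possible point and comparing the result with a clean run. This minimal harness is a sketch on a toy pipeline: `run`, `replay_safe`, and the two sink classes are hypothetical, and it only injects crashes between a batch's write and its offset commit. A real evaluation would also need mid-batch crashes, network partitions, and destination timeouts.

```python
class Crash(Exception):
    """Simulated process death after a batch is written but before its offset commits."""


class AppendSink:
    def __init__(self):
        self.rows = []

    def write(self, batch):
        self.rows.extend(batch)  # every write adds rows, so replays duplicate

    def state(self):
        return sorted(self.rows)


class UpsertSink:
    def __init__(self):
        self.rows = {}

    def write(self, batch):
        self.rows.update(batch)  # keyed writes, so replays overwrite in place

    def state(self):
        return sorted(self.rows.items())


def run(log, sink, store, crash_at=None, batch_size=2):
    batch_no = 0
    while store["offset"] < len(log):
        start = store["offset"]
        batch = log[start:start + batch_size]
        sink.write(batch)
        if batch_no == crash_at:
            raise Crash()
        store["offset"] = start + len(batch)
        batch_no += 1


def replay_safe(make_sink, log, batch_size=2):
    """True if every injected crash-and-resume ends in the clean run's state."""
    clean = make_sink()
    run(log, clean, {"offset": 0}, batch_size=batch_size)
    n_batches = -(-len(log) // batch_size)  # ceiling division
    for crash_at in range(n_batches):
        sink, store = make_sink(), {"offset": 0}
        try:
            run(log, sink, store, crash_at=crash_at, batch_size=batch_size)
        except Crash:
            pass
        run(log, sink, store, batch_size=batch_size)  # resume from the committed offset
        if sink.state() != clean.state():
            return False
    return True


log = [("o1", 10), ("o2", 25), ("o3", 40)]
print(replay_safe(AppendSink, log))  # False
print(replay_safe(UpsertSink, log))  # True
```

The append-only sink fails because resuming from the last committed offset rewrites the batch that landed just before the crash; the upsert sink absorbs that rewrite.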

How Blunox thinks about it

Blunox Pulse guarantees exactly-once delivery, and it does so the boring, verifiable way. Every source position is a checkpointed WAL offset committed atomically with the write, so a crash never leaves the pipeline uncertain about what landed. On restart it resumes cleanly from the last committed offset — no re-sync, no manual replay, no duplicated rows. The result you get is effectively-once in the honest sense: replay any failure and the destination arrives at exactly the same state. That's not a checkbox on a spec sheet. It's the difference between a dashboard you can defend in a meeting and one you can't.
