Exactly-once delivery in data pipelines

Exactly-once delivery means every record lands in the destination exactly one time — no drops, no duplicates — even when the machines in between crash, retry, or lose the network. It's the guarantee analytics quietly depends on, and the hardest one to make honest.

7 min read · Blunox

Every data pipeline makes a promise about how many times a record shows up at the other end. Most teams never read the fine print — until a revenue number is double-counted and everyone spends a Friday figuring out why. That promise has a name: the delivery guarantee. There are three of them, and exactly-once delivery is the one you actually want.

The three delivery guarantees

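A pipeline reads from a source, moves records across a network, and writes them to a destination. When a step fails partway through, the system has to decide: resend the record, or move on? That single decision defines which guarantee you get. Move on, and you get at-most-once: no duplicates, but records can be lost. Resend, and you get at-least-once: nothing is lost, but records can arrive twice. Exactly-once is the third option, where every record takes effect once and only once.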

The trap is that at-least-once looks like exactly-once most of the time. Duplicates only appear at the seams — a crash between writing and committing, a timeout that was actually a success. Everything passes in testing, then breaks in production during the one outage that matters.

Why exactly-once is hard

The difficulty comes from a fact of distributed systems: without extra coordination, you cannot perform an action on one machine and record that you performed it on another in a single, indivisible step. The pipeline writes the record to the destination, then updates its position in the source. If the process dies anywhere around those two operations, the saved position alone cannot tell you whether the write landed.

So on restart you face an impossible-to-answer question. Did that last batch commit before the crash, or not? If you assume it did and it didn't, you drop data (at-most-once). If you assume it didn't and it did, you resend and duplicate (at-least-once). Network partitions make it worse: a write can succeed at the destination while the acknowledgment never returns, so the sender believes it failed and tries again.
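The gap is easy to reproduce. The following is a minimal Python sketch, assuming an in-memory list stands in for the destination and a dict holds the committed source position. The names `run`, `CrashAfterWrite`, and `crash_at` are hypothetical and exist only for this illustration.

```python
class CrashAfterWrite(Exception):
    """Simulated process death between the write and the offset commit."""


def run(source, destination, state, crash_at=None):
    """Copy records from source to destination, starting at state['offset'].

    The write and the offset commit are two separate steps, so a crash
    between them leaves the offset pointing at a record already written.
    """
    for i in range(state["offset"], len(source)):
        destination.append(source[i])   # step 1: write to the destination
        if i == crash_at:
            raise CrashAfterWrite()     # die before step 2
        state["offset"] = i + 1         # step 2: commit the source position


source = ["a", "b", "c"]
destination = []
state = {"offset": 0}

try:
    run(source, destination, state, crash_at=1)
except CrashAfterWrite:
    pass  # the process "restarts" below

# On restart, resume from the last committed offset (at-least-once behavior).
run(source, destination, state)
print(destination)  # ['a', 'b', 'b', 'c']
```

Record "b" was written before the crash, but its offset was never committed, so the restart writes it again. Committing the offset before the write would flip the failure into a dropped record instead.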

Exactly-once isn't a stronger version of retrying harder. It's making the duplicates that retries create harmless.

That reframing is the whole game. You can't stop a distributed system from occasionally doing work twice. What you can do is ensure the second attempt has no visible effect — that redelivery is safe by construction.
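One common way to make a second attempt invisible is an idempotent write keyed on a stable record identifier. The sketch below assumes every record carries a unique `id` field and uses a plain dict as a stand-in for a destination table that supports upserts; both are assumptions for illustration, not a description of any particular system.

```python
def apply(table, record):
    """Upsert keyed on the record id: applying the same record twice leaves one row."""
    table[record["id"]] = record


table = {}
delivered = [
    {"id": "order-1", "amount": 40},
    {"id": "order-2", "amount": 25},
    {"id": "order-2", "amount": 25},  # redelivered after a lost acknowledgment
]

for record in delivered:
    apply(table, record)

total = sum(r["amount"] for r in table.values())
print(len(table), total)  # 2 65
```

Three records were delivered, but the table holds two rows and the total counts each order once. An append-only insert with the same input would have produced three rows and a total of 90.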

How systems actually achieve it

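Real pipelines reach exactly-once by combining a few techniques, not by magic. The two that recur are idempotent writes, where applying the same record twice leaves the destination unchanged, and checkpointing, where the pipeline durably records how far it has read so a restart resumes from a known position. Each one closes a different gap.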

Purists point out that true exactly-once, where a record physically crosses the wire only once, cannot be guaranteed in an asynchronous network, because any message or acknowledgment can be lost and must then be resent. What production systems actually deliver is effectively-once: records may be transmitted more than once, but through idempotency and checkpointing the observable outcome is identical to exactly-once. That's the honest version of the promise, and it's the one that matters, because what you care about is the final state of the data, not the packet count.
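The difference between transmissions and observable outcome can be made concrete. This Python sketch assumes records arrive in order with strictly increasing sequence numbers, and that the sink remembers the highest sequence number it has applied. The `Sink` class and its fields are hypothetical, and a real sink would have to persist `rows` and `last_seq` together.

```python
class Sink:
    """Destination that ignores any sequence number it has already applied."""

    def __init__(self):
        self.rows = []
        self.last_seq = 0   # checkpoint: highest sequence number applied so far
        self.received = 0   # how many transmissions reached the sink

    def receive(self, seq, value):
        self.received += 1
        if seq <= self.last_seq:
            return False    # retransmission of an applied record: no effect
        self.rows.append(value)
        self.last_seq = seq
        return True


clean = [(1, "a"), (2, "b"), (3, "c")]
flaky = [(1, "a"), (2, "b"), (2, "b"), (3, "c"), (3, "c")]  # two resends

clean_sink, flaky_sink = Sink(), Sink()
for seq, value in clean:
    clean_sink.receive(seq, value)
for seq, value in flaky:
    flaky_sink.receive(seq, value)

print(flaky_sink.received, flaky_sink.rows)  # 5 ['a', 'b', 'c']
print(flaky_sink.rows == clean_sink.rows)    # True
```

Five transmissions reached the flaky sink against three for the clean one, yet both end in the same state. That equality of final state is what effectively-once promises.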

Why it matters for analytics

Delivery semantics feel like plumbing until you trace where the water goes. A pipeline that duplicates just 0.1% of an event stream over-counts sign-ups, inflates revenue, and can skew the conversion rates built on top of it, and nobody will notice until the numbers are challenged in a meeting.

The damage compounds downstream. A double-counted order corrupts the daily revenue rollup, which feeds the executive dashboard, which trains the forecasting model, which sets next quarter's targets. This is especially acute for change data capture streams, where a replayed change event can flip a row to a stale value or resurrect a deleted record. Under at-least-once, "the dashboard is a little off" is not a bug you can fix after the fact — the wrong numbers are already baked into everything built on them.
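The change data capture failure mode, and one common guard against it, can be sketched in a few lines of Python. The sketch assumes each change event carries a monotonically increasing log sequence number in a hypothetical `lsn` field, and that the consumer keeps the last applied LSN per key even after a delete. The event shape and function names are illustrative only.

```python
def apply_change(table, versions, change, guarded=True):
    """Apply one change event.

    With guarded=True, events at or below the last applied LSN for the key
    are skipped, so a replay cannot overwrite newer state.
    """
    key, lsn = change["key"], change["lsn"]
    if guarded and lsn <= versions.get(key, 0):
        return False                 # stale or replayed event
    versions[key] = lsn              # kept after deletes, acting as a tombstone
    if change["op"] == "delete":
        table.pop(key, None)
    else:
        table[key] = change["value"]
    return True


def replay(stream, guarded):
    table, versions = {}, {}
    for change in stream:
        apply_change(table, versions, change, guarded)
    return table


changes = [
    {"key": "u1", "lsn": 1, "op": "upsert", "value": "free"},
    {"key": "u1", "lsn": 2, "op": "upsert", "value": "pro"},
    {"key": "u1", "lsn": 3, "op": "delete"},
]
stream = changes + changes[:2]  # the first two events are delivered again

print(replay(stream, guarded=False))  # {'u1': 'pro'}
print(replay(stream, guarded=True))   # {}
```

Without the guard, the replayed upserts resurrect a row the source had already deleted. With it, the replay is a no-op and the table matches the source.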

What to check in a pipeline tool

Vendors say "exactly-once" freely. Treat it as a claim to verify, not a feature to check off. A few questions cut through the marketing:

How Blunox thinks about it

Blunox Pulse guarantees exactly-once delivery, and it does so the boring, verifiable way. Every source position is a checkpointed write-ahead log (WAL) offset committed atomically with the write, so a crash never leaves the pipeline uncertain about what landed. On restart it resumes cleanly from the last committed offset, with no re-sync, no manual replay, and no duplicated rows. The result you get is effectively-once in the honest sense: replay any failure and the destination arrives at exactly the same state. That's not a checkbox on a spec sheet. It's the difference between a dashboard you can defend in a meeting and one you can't.