Engineering

Log-based vs query-based CDC: which method to use

All change data capture answers the same question — "what changed, and when?" — but the three methods answer it very differently. Log-based CDC reads the database's write-ahead log after commit; query-based and trigger-based CDC don't. That one difference drives almost everything that matters.

7 min read · Blunox

If you're replicating a source database into a warehouse or stream, you'll choose one of three capture methods. Log-based CDC tails the transaction log. Query-based CDC polls the table on a schedule. Trigger-based CDC installs database triggers that write to a shadow table. They differ on the things you'll actually be paged about: load on your primary, whether deletes get captured, how fresh the data is, and how much of your weekend the setup consumes.

The three CDC methods in one minute

Every method is a different answer to "how do I find out a row changed?"

Query-based (timestamp-based). Poll the table: SELECT * WHERE updated_at > :last_run. Simple, works on any database, no special privileges.
Trigger-based. Install AFTER INSERT/UPDATE/DELETE triggers that record each change into an audit table you then read. Captures deletes, but runs inside every write.
Log-based (WAL-based). Read the write-ahead log the database already keeps for durability — Postgres WAL, MySQL binlog, Oracle redo, SQL Server transaction log. The change record is a byproduct of the commit, so you capture it without touching the tables at all.

For a broader primer on what CDC is and why it beats nightly batch reloads, see change data capture, explained. Here we're comparing the three ways to do it.

Query-based CDC

Query-based CDC is the one everyone reaches for first, because it needs nothing special: a timestamp or version column, and a scheduled job that pulls rows newer than the last high-water mark. It's genuinely fine for a lot of cases — a nightly sync of a reference table, a dashboard that can tolerate an hour of lag, a source you can't get replication access to.

The problems show up at the edges. It misses deletes. A deleted row simply stops appearing in your WHERE clause — there's no timestamp on a row that no longer exists, so your copy quietly keeps the stale record forever. It misses intermediate updates: if a row changes three times between polls, you see only the final state, which breaks any downstream logic that cares about the sequence. And it adds load to the source — every poll is a real query with a real scan, so tightening the interval to reduce lag directly increases pressure on the primary you're trying to protect. You end up trading freshness against source impact, and you can't win both.

Trigger-based CDC

Trigger-based CDC fixes the correctness gap: because a trigger fires inside the transaction, it sees deletes, and it sees every individual change in order. You install AFTER triggers on each tracked table that append the old and new row into a changelog table, then a reader drains that table.

The cost is that you've now put your capture logic on the write path. Every insert, update, and delete on the source does extra work — a second write into the audit table — inside the same transaction, so you've added latency and I/O to production writes, sometimes materially under load. Triggers are also brittle operationally: they're schema-coupled, so a column change can silently break capture; they multiply with every table you track; and they're application-visible surface area that has to be maintained, tested, and reasoned about during every migration. Powerful, but it makes your source database heavier and more fragile.

Query-based CDC asks the database what changed. Trigger-based CDC makes the database do extra work per write. Log-based CDC just reads what the database already wrote down.

Log-based CDC

Log-based CDC takes a different route entirely. Every durable database already writes a write-ahead log — the ordered record of committed changes it uses for crash recovery and replication. Log-based CDC reads that log after commit, decoding each change from a stream the database produces whether you're watching or not.

That single design choice fixes the trade-offs the other two methods force on you:

Near-zero source impact. There are no polling queries and no per-write triggers. Reading the log is cheap and happens off the hot path, so freshness costs the primary almost nothing.
Completeness. The log records every committed change — inserts, updates, and the deletes query-based CDC drops on the floor — plus every intermediate state, because the log is a full history, not a snapshot.
Ordering guarantees. The log is inherently ordered by commit. You get changes in exactly the sequence they happened, which is what makes correct, consistent replication possible.
Low latency. Changes appear in the log the moment they commit, so a reader tailing it delivers near-real-time streams without hammering the source.

The catch is setup: log-based CDC needs replication-level access (a replication slot, binlog enabled, supplemental logging), and decoding each engine's log format is non-trivial. That's the work you're paying for — and increasingly the work you hand to a tool rather than build yourself.

Choosing a method

Line the three up against the dimensions you'll be judged on:

Source impact: log-based is lowest (off-path), query-based scales with poll frequency, trigger-based taxes every write.
Completeness: log-based and trigger-based capture deletes and every change; query-based misses both.
Latency: log-based is near-real-time; query-based is bounded by poll interval; trigger-based is fast but at write-path cost.
Operational cost: query-based is easiest to stand up; trigger-based is brittle and schema-coupled; log-based needs privileged setup but is stable once running.

For real-time, complete, low-impact replication, log-based CDC wins — it's why it's the default under most serious streaming pipelines. But be honest about when simpler is enough: if a source only ever appends, never deletes, and an hour of lag is fine, query-based CDC is less machinery and perfectly correct. Match the method to the requirement, not to the hype.

How Blunox does it

Blunox Pulse uses log-based CDC. It reads the write-ahead log after commit — no triggers on your tables, no polling queries, no load on your primary — and delivers changes with exactly-once semantics, so every insert, update, and delete lands downstream once and in order. You get the completeness and freshness of log-based capture without owning the log-decoding and slot-management plumbing yourself.

Book a demo Explore Blunox Pulse