Change Data Capture (CDC), explained
Change data capture is the practice of watching a source database and streaming every insert, update, and delete the moment it happens — so downstream systems stay in sync without re-scanning the whole table.
Most data starts life in a transactional database — orders, users, payments, inventory. The rest of the business needs that data somewhere else: a warehouse for analytics, a search index, a cache, another service. The question is how you keep those copies current. Change data capture is the answer that scales.
What is change data capture?
Change data capture, or CDC, is a technique for capturing row-level changes in a source database — inserts, updates, and deletes — and emitting them as a stream of events as they occur. Instead of asking "what does this table look like right now?" and copying the whole thing, CDC asks "what changed since last time?" and ships only the delta.
Each captured change is a small, ordered record: the operation type, the affected row's before and after values, and enough metadata (a transaction ID, a log position, a timestamp) to replay it reliably somewhere else. Downstream, a consumer applies those records to keep its copy of the data in lockstep with the source.
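To make that record concrete, here is a minimal Python sketch of one possible event shape and a consumer that applies it to an in-memory replica. The field names (op, key, before, after, txid, position, ts_ms) and the apply_event helper are hypothetical, not any particular tool's format. Real CDC tools define their own envelopes.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Hypothetical change-event shape; field names are illustrative only.
@dataclass(frozen=True)
class ChangeEvent:
    op: str                  # "insert", "update", or "delete"
    key: Any                 # primary key of the affected row
    before: Optional[dict]   # row values before the change (None for inserts)
    after: Optional[dict]    # row values after the change (None for deletes)
    txid: int                # source transaction ID
    position: int            # log position, used for ordering and resuming
    ts_ms: int               # commit timestamp in milliseconds

def apply_event(replica: dict, event: ChangeEvent) -> None:
    """Apply one change event to an in-memory replica keyed by primary key."""
    if event.op in ("insert", "update"):
        replica[event.key] = dict(event.after)
    elif event.op == "delete":
        replica.pop(event.key, None)
    else:
        raise ValueError(f"unknown op: {event.op}")

events = [
    ChangeEvent("insert", 1, None, {"id": 1, "status": "new"}, txid=100, position=10, ts_ms=1),
    ChangeEvent("update", 1, {"id": 1, "status": "new"}, {"id": 1, "status": "shipped"},
                txid=101, position=11, ts_ms=2),
    ChangeEvent("insert", 2, None, {"id": 2, "status": "new"}, txid=102, position=12, ts_ms=3),
    ChangeEvent("delete", 2, {"id": 2, "status": "new"}, None, txid=103, position=13, ts_ms=4),
]

replica: dict = {}
for e in sorted(events, key=lambda e: e.position):  # replay in log order
    apply_event(replica, e)

print(replica)  # {1: {'id': 1, 'status': 'shipped'}}
```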
CDC turns a database from a place you query into a stream you subscribe to.
Why CDC beats batch and polling
The old way to move data was a nightly batch job: dump the table, reload it downstream, repeat every 24 hours. Polling is the slightly newer variant: query for rows changed since the last run, every few minutes. Both work, and both have real costs that log-based CDC avoids:
- Freshness. Batch data is stale by definition. CDC delivers changes in seconds, so dashboards, indexes, and downstream services reflect reality instead of last night.
- Load on the source. Full table scans and repeated polling queries compete with production traffic. Log-based CDC reads the database's own change log, adding almost no query load.
- Completeness. Polling on a timestamp column silently misses deletes, and when a row changes several times between polls it sees only the final state. Log-based CDC captures every committed operation, including deletes, in order, so nothing falls through the cracks.
How change data capture works — the main methods
There are three broad ways to implement CDC, and they trade off differently on latency, source impact, and completeness.
Log-based (WAL) CDC reads the database's transaction log directly: the write-ahead log in Postgres, the binlog in MySQL, the redo log in Oracle. Every change is already recorded there for durability and replication, so CDC just tails the log and emits each transaction once it commits. This is the gold standard: it captures every operation in commit order, adds negligible load because it isn't running queries, and doesn't require touching the schema (although the log usually has to be configured for it, for example wal_level = logical in Postgres or binlog_format = ROW in MySQL). The cost is that it's more involved to build: you have to parse a database-specific log format and manage replication slots or offsets.
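The core of log-based decoding is that the log interleaves work from concurrent transactions, while consumers should see only committed transactions, whole and in commit order. The sketch below models that with a hypothetical in-memory log of (position, kind, xid, payload) tuples. It is not a real WAL or binlog parser, which would have to handle a binary, database-specific format.

```python
from collections import defaultdict
from collections.abc import Iterator

# Hypothetical toy log: (position, kind, xid, payload). Transactions 7 and 8 interleave.
LOG = [
    (1, "begin", 7, None),
    (2, "begin", 8, None),
    (3, "change", 7, ("insert", "orders", 1)),
    (4, "change", 8, ("insert", "orders", 2)),
    (5, "commit", 8, None),
    (6, "change", 7, ("update", "orders", 1)),
    (7, "commit", 7, None),
    (8, "begin", 9, None),
    (9, "change", 9, ("delete", "orders", 1)),
    (10, "abort", 9, None),
]

def decode(log) -> Iterator[tuple[int, int, list]]:
    """Yield (commit_position, xid, changes) per committed transaction, in commit order."""
    pending = defaultdict(list)  # xid -> changes buffered until the transaction ends
    for position, kind, xid, payload in log:
        if kind == "change":
            pending[xid].append(payload)
        elif kind == "commit":
            yield position, xid, pending.pop(xid, [])
        elif kind == "abort":
            pending.pop(xid, None)  # rolled-back work never reaches consumers
        # "begin" records need no action in this sketch

committed = list(decode(LOG))
for commit_position, xid, changes in committed:
    print(commit_position, xid, changes)
```

Transaction 7 began first but committed second, so it is emitted second, and the aborted transaction 9 never surfaces.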
Trigger-based CDC attaches database triggers to the tables you care about, writing a row into an audit table on every insert, update, or delete. It captures deletes as well as before-and-after row values, and it works on databases without an accessible log. But triggers run inside the source transaction, so they add write latency and load to production, and maintaining triggers across many tables gets brittle.
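A minimal trigger setup, sketched with SQLite through Python's standard sqlite3 module so it runs anywhere. The orders and orders_changes tables and the trigger names are illustrative. Postgres, MySQL, and Oracle each use their own trigger syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL);

-- Audit table the triggers append to: one row per captured change.
CREATE TABLE orders_changes (
    change_id  INTEGER PRIMARY KEY AUTOINCREMENT,
    op         TEXT NOT NULL,
    order_id   INTEGER NOT NULL,
    old_status TEXT,
    new_status TEXT
);

CREATE TRIGGER orders_ins AFTER INSERT ON orders BEGIN
    INSERT INTO orders_changes (op, order_id, new_status)
    VALUES ('insert', NEW.id, NEW.status);
END;

CREATE TRIGGER orders_upd AFTER UPDATE ON orders BEGIN
    INSERT INTO orders_changes (op, order_id, old_status, new_status)
    VALUES ('update', NEW.id, OLD.status, NEW.status);
END;

CREATE TRIGGER orders_del AFTER DELETE ON orders BEGIN
    INSERT INTO orders_changes (op, order_id, old_status)
    VALUES ('delete', OLD.id, OLD.status);
END;
""")

with conn:  # audit rows commit (or roll back) together with the writes that fired them
    conn.execute("INSERT INTO orders (id, status) VALUES (1, 'new')")
    conn.execute("UPDATE orders SET status = 'shipped' WHERE id = 1")
    conn.execute("DELETE FROM orders WHERE id = 1")

changes = conn.execute(
    "SELECT op, order_id, old_status, new_status FROM orders_changes ORDER BY change_id"
).fetchall()
for row in changes:
    print(row)
```

Because each audit insert runs in the same transaction as the change that fired it, a rolled-back write leaves no audit row, and every committed write pays for one extra insert.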
Query-based (timestamp) CDC polls the source on a schedule, selecting rows where an updated_at column is newer than the last checkpoint. It's the simplest to set up and needs no special database access. The downsides are the ones above: it misses deletes, can miss rapid intermediate updates, and puts scan load on the source. It's a reasonable fallback, not a foundation.
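The following sketch shows timestamp polling against SQLite and reproduces both gaps described above. The table, the integer updated_at values, and the poll helper are assumptions chosen to keep the example deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, updated_at INTEGER)")

def poll(conn, checkpoint):
    """Return rows changed after the checkpoint, plus the new checkpoint."""
    rows = conn.execute(
        "SELECT id, status, updated_at FROM orders WHERE updated_at > ? ORDER BY updated_at",
        (checkpoint,),
    ).fetchall()
    new_checkpoint = rows[-1][2] if rows else checkpoint
    return rows, new_checkpoint

conn.execute("INSERT INTO orders VALUES (1, 'new', 100)")
conn.execute("INSERT INTO orders VALUES (2, 'new', 101)")
first, checkpoint = poll(conn, 0)

# Between polls: order 1 passes through two states and order 2 is deleted.
conn.execute("UPDATE orders SET status = 'paid', updated_at = 102 WHERE id = 1")
conn.execute("UPDATE orders SET status = 'shipped', updated_at = 103 WHERE id = 1")
conn.execute("DELETE FROM orders WHERE id = 2")
second, checkpoint = poll(conn, checkpoint)

print(first)   # [(1, 'new', 100), (2, 'new', 101)]
print(second)  # [(1, 'shipped', 103)]: the 'paid' state and the delete are never seen
```

A strict updated_at > checkpoint comparison has one more edge case: a row that commits late with a timestamp at or before the checkpoint is never picked up.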
Common use cases
Once you have a reliable stream of changes, a lot of previously-painful integration problems become straightforward:
- Real-time warehouse replication. Keep Snowflake, BigQuery, or Redshift continuously in sync with production, so analytics run on fresh data instead of last night's snapshot.
- Cache invalidation. When a row changes, emit an event that evicts or refreshes the corresponding cache entry — no more guessing at TTLs.
- Event-driven systems. Treat database changes as the source of truth for events. Other services react to "order shipped" or "user upgraded" without the source service having to publish anything extra.
- Analytics and ML pipelines. Feed feature stores and streaming aggregations with low-latency change streams so models train and score on current data.
- Search and index sync. Keep an Elasticsearch or vector index current as the underlying records evolve.
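For the cache-invalidation and search-sync cases, a consumer is often just a dispatcher that fans each change event out to handlers. In this sketch plain dicts stand in for a real cache and search index, and the event fields and handler names are hypothetical.

```python
from typing import Callable

# Plain dicts standing in for a real cache and search index.
cache = {"orders:1": {"status": "new"}, "orders:2": {"status": "new"}}
search_index = {}

def invalidate_cache(event: dict) -> None:
    # Evict rather than update: the next read repopulates from the source of truth.
    cache.pop(f"{event['table']}:{event['key']}", None)

def sync_search(event: dict) -> None:
    doc_id = f"{event['table']}:{event['key']}"
    if event["op"] == "delete":
        search_index.pop(doc_id, None)
    else:
        search_index[doc_id] = event["after"]

HANDLERS: list[Callable[[dict], None]] = [invalidate_cache, sync_search]

def dispatch(event: dict) -> None:
    """Fan one change event out to every downstream handler."""
    for handler in HANDLERS:
        handler(event)

dispatch({"op": "update", "table": "orders", "key": 1, "after": {"status": "shipped"}})
dispatch({"op": "delete", "table": "orders", "key": 2, "after": None})
print(cache)         # {}
print(search_index)  # {'orders:1': {'status': 'shipped'}}
```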
The common thread is that CDC decouples the systems producing data from the systems that need it. That's exactly the discipline behind a good data product — reliable, current data that other teams can build on without re-checking the work.
What to look for in a CDC tool
Capturing changes is the easy 80%. The hard 20% is what separates a demo from something you'd trust in production:
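- Exactly-once delivery. Networks fail and processes restart. A CDC tool should guarantee that each change takes effect downstream exactly once, with no duplicates and no gaps, even across crashes. In practice that usually means at-least-once delivery combined with idempotent writes, or committing the data and the consumer's offset in the same transaction.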
- Schema-drift handling. Source schemas change: columns get added, types change, tables get renamed. The pipeline should detect and adapt to that automatically instead of breaking or silently dropping data.
- Low source impact. The whole point is to not burden production. Prefer log-based capture that reads the change log rather than querying tables.
- Resumability. When a consumer stops and restarts, it should pick up exactly where it left off using durable offsets — no re-snapshotting the world, no missed changes in the gap.
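- Ordering guarantees. Changes must be applied in the order they were committed, at least for any given row, or downstream state ends up inconsistent. An update applied after a later delete, for example, resurrects a row that no longer exists in the source.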
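Several of these properties come together in how a sink applies changes. The sketch below, again using SQLite through Python's sqlite3 module, writes each batch of changes and the last applied log position in one transaction, so a restart resumes from the stored offset and redelivered events are skipped. It assumes events arrive in log order per source and that redelivery only repeats events already seen. The table names, event fields, and helpers are hypothetical, and the upsert syntax needs SQLite 3.24 or newer.

```python
import sqlite3

def open_sink(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, status TEXT);
        CREATE TABLE IF NOT EXISTS cdc_offset (source TEXT PRIMARY KEY, position INTEGER NOT NULL);
    """)
    return conn

def last_position(conn: sqlite3.Connection, source: str) -> int:
    row = conn.execute("SELECT position FROM cdc_offset WHERE source = ?", (source,)).fetchone()
    return row[0] if row else 0

def apply_batch(conn: sqlite3.Connection, source: str, events: list[dict]) -> int:
    """Apply events in log order, skipping anything at or below the stored offset.

    Row changes and the new offset commit in one transaction, so a crash keeps
    both or neither, and a redelivered event is ignored on the next run.
    """
    applied = 0
    with conn:
        position = last_position(conn, source)
        for e in sorted(events, key=lambda e: e["position"]):
            if e["position"] <= position:
                continue  # already applied before a restart: duplicate delivery
            if e["op"] == "delete":
                conn.execute("DELETE FROM orders WHERE id = ?", (e["key"],))
            else:  # insert or update, written as an idempotent upsert
                conn.execute(
                    "INSERT INTO orders (id, status) VALUES (?, ?) "
                    "ON CONFLICT(id) DO UPDATE SET status = excluded.status",
                    (e["key"], e["status"]),
                )
            position = e["position"]
            applied += 1
        conn.execute(
            "INSERT INTO cdc_offset (source, position) VALUES (?, ?) "
            "ON CONFLICT(source) DO UPDATE SET position = excluded.position",
            (source, position),
        )
    return applied

conn = open_sink()
batch = [
    {"position": 1, "op": "insert", "key": 1, "status": "new"},
    {"position": 2, "op": "update", "key": 1, "status": "shipped"},
    {"position": 3, "op": "insert", "key": 2, "status": "new"},
]
first_run = apply_batch(conn, "orders-db", batch)
# Simulated restart: positions 2 and 3 are redelivered alongside one new event.
second_run = apply_batch(conn, "orders-db", batch[1:] + [{"position": 4, "op": "delete", "key": 2}])
rows = conn.execute("SELECT id, status FROM orders ORDER BY id").fetchall()
print(first_run, second_run, rows)  # 3 1 [(1, 'shipped')]
```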
Where Blunox fits: Blunox Pulse is a log-based CDC engine that streams changes from your source database to your warehouse in under a second, exactly-once, and schema-drift aware. It reads the write-ahead log directly, so it stays out of your production query path, and it resumes cleanly from durable offsets after any restart — the properties above, without you having to build them.