What is a data product?
Short version: a data product is a dataset built like software — owned, tested, documented, and governed — so the rest of the org can trust it and build on it without re-checking the work.
"Data product" gets used two ways, and it's worth separating them.
The older sense — popularized by DJ Patil around 2012 — is a product powered by data: a recommendation engine, a fraud score, "people you may know." The data is the fuel; the product is the feature.
The modern sense, and the one most data teams mean today, comes out of the data mesh movement (Zhamak Dehghani, 2019) and the rise of analytics engineering. Here the dataset itself is the product. Rather than a raw table that someone dumped into the warehouse and left for others to figure out, a data product is a curated, reliable unit of data that a domain team owns and serves to everyone else, with the same discipline you'd expect from shipped software.
A data product isn't "some data." It's data someone stands behind.
What makes a dataset a "product"
The shift is from data as exhaust to data as something you ship on purpose. In practice that means a data product has:
- An owner. A named team or person is responsible for it — not "whoever wrote the query two years ago."
- Tests. Freshness, volume, schema, and business-rule checks run automatically, so problems surface before consumers rely on the data.
- Documentation. What it means, where it came from, and how to use it — not tribal knowledge in someone's head.
- Lineage & governance. You can trace it back to source and forward to every consumer, with access controls and an audit trail.
- Discoverability. People can find it, trust it, and reuse it instead of rebuilding a slightly different version.
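To make the checklist above concrete, here is a minimal sketch in plain Python of a data product descriptor and the four kinds of checks from the list (freshness, volume, schema, business rule). Everything in it is an assumption made for illustration: `DataProductSpec`, `run_checks`, the `orders_daily` dataset, its owner, and its thresholds are hypothetical names, not part of any particular tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Callable


@dataclass
class DataProductSpec:
    """Hypothetical descriptor: ownership, docs, lineage, and test thresholds."""
    name: str
    owner: str                    # a named team or person
    description: str              # what the data means
    upstream: list[str]           # lineage: the sources it is built from
    schema: dict[str, type]       # expected columns and their types
    max_staleness: timedelta      # freshness threshold
    min_rows: int                 # volume threshold
    rules: dict[str, Callable[[dict], bool]] = field(default_factory=dict)


def run_checks(spec: DataProductSpec, rows: list[dict],
               loaded_at: datetime, now: datetime) -> list[str]:
    """Return a list of failure messages; an empty list means every check passed."""
    failures = []
    # Freshness: the last load must be recent enough.
    if now - loaded_at > spec.max_staleness:
        failures.append(f"freshness: last load {loaded_at.isoformat()} is too old")
    # Volume: an unexpectedly small table usually means a broken load.
    if len(rows) < spec.min_rows:
        failures.append(f"volume: {len(rows)} rows, expected at least {spec.min_rows}")
    for i, row in enumerate(rows):
        # Schema: exact column set first, then column types.
        if set(row) != set(spec.schema):
            failures.append(f"schema: row {i} has columns {sorted(row)}")
            continue
        bad_types = [c for c, t in spec.schema.items() if not isinstance(row[c], t)]
        if bad_types:
            failures.append(f"schema: row {i} has wrong types in {bad_types}")
            continue
        # Business rules only run on rows that already match the schema.
        for rule_name, rule in spec.rules.items():
            if not rule(row):
                failures.append(f"rule: row {i} violates {rule_name}")
    return failures


# Illustrative data product; every value below is made up for the example.
orders = DataProductSpec(
    name="orders_daily",
    owner="commerce-data-team",
    description="One row per completed order.",
    upstream=["raw.shop_orders"],
    schema={"order_id": str, "amount": float},
    max_staleness=timedelta(hours=24),
    min_rows=1,
    rules={"amount_non_negative": lambda r: r["amount"] >= 0},
)

now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
rows = [{"order_id": "A1", "amount": 19.99}, {"order_id": "A2", "amount": -5.0}]
print(run_checks(orders, rows, loaded_at=now - timedelta(hours=2), now=now))
```

With the sample rows, the only failure reported is the negative amount on the second row. In a real setup the same checks would more likely live in a testing framework or the warehouse itself; the point of the sketch is only that owner, documentation, lineage, and tests can be written down next to the data rather than remembered.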
Analytics engineering — the practice popularized alongside tools like dbt — is really the operational answer to this: apply version control, testing, documentation, and CI/CD to your data transformations so the output is trustworthy and reusable. That rigor is what turns a dataset into a product.
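As a rough illustration of what "testing your transformations" can look like, the sketch below keeps a transformation as a small pure function and pairs it with a test that a CI job could run on every change. It is plain Python standing in for the idea, not dbt's own syntax, and the function name, field names, and sample orders are all hypothetical.

```python
def build_daily_revenue(orders: list[dict]) -> list[dict]:
    """Aggregate completed orders into one revenue row per day."""
    totals: dict[str, float] = {}
    for order in orders:
        # Only completed orders count toward revenue.
        if order["status"] != "completed":
            continue
        totals[order["day"]] = totals.get(order["day"], 0.0) + order["amount"]
    # Sort by day so the output order is deterministic.
    return [{"day": day, "revenue": round(total, 2)}
            for day, total in sorted(totals.items())]


def test_build_daily_revenue() -> None:
    """Small fixture in, exact expected rows out."""
    orders = [
        {"day": "2024-01-01", "status": "completed", "amount": 10.0},
        {"day": "2024-01-01", "status": "refunded", "amount": 99.0},
        {"day": "2024-01-01", "status": "completed", "amount": 5.5},
        {"day": "2024-01-02", "status": "completed", "amount": 7.0},
    ]
    assert build_daily_revenue(orders) == [
        {"day": "2024-01-01", "revenue": 15.5},
        {"day": "2024-01-02", "revenue": 7.0},
    ]


if __name__ == "__main__":
    test_build_daily_revenue()
    print("all transformation tests passed")
```

Because the transformation and its test sit in the same version-controlled file, a change that silently started counting refunded orders would fail the test before the dataset reached any consumer.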
Why it matters now
Two forces made the term unavoidable. First, scale: as data teams grew, the "one big lake, everyone fends for themselves" model produced datasets nobody trusted. Treating data as a product, with named owners and service-level agreements (SLAs) for things like freshness and availability, was the fix. Second, AI: models and agents are only as good as the data underneath them. A wrong number that reaches a board deck or an AI agent erodes trust fast. Governed, tested data products are the foundation that makes AI on your data safe to ship.
How Blunox thinks about it
When we say Blunox Mira builds data products in days, we mean exactly this modern sense: its agents don't just move data around — they produce datasets that arrive tested, documented, and governed, with a human approving the plan. The output isn't a raw table you still have to vet. It's a data product your team can stand behind on day one.