Guide

What is a data product?

Short version: a data product is a dataset built like software — owned, tested, documented, and governed — so the rest of the org can trust it and build on it without re-checking the work.

Blunox

"Data product" gets used two ways, and it's worth separating them.

The older sense — popularized by DJ Patil around 2012 — is a product powered by data: a recommendation engine, a fraud score, "people you may know." The data is the fuel; the product is the feature.

The modern sense, and the one most data teams mean today, comes out of the data mesh movement (Zhamak Dehghani, 2019) and the rise of analytics engineering. Here the dataset itself is the product. Instead of a raw table someone dumped into the warehouse and hoped for the best, a data product is a curated, reliable unit of data that a domain team owns and serves to everyone else — with the same discipline you'd expect from shipped software.

A data product isn't "some data." It's data someone stands behind.

What makes a dataset a "product"

The shift is from data as exhaust to data as something you ship on purpose. In practice that means a data product has the traits named at the top of this guide: a clear owner, tests, documentation, and governance.

Analytics engineering, the practice popularized alongside tools like dbt, is the operational answer to that shift: apply version control, testing, documentation, and CI/CD to your data transformations so the output is trustworthy and reusable. That rigor is what turns a dataset into a product.
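To make "tested, documented, owned" concrete, here is a minimal sketch in plain Python of a dataset that carries its owner, description, and checks with it. It is an illustration only, not how dbt or Blunox Mira works: the `DataProduct` class, the `not_null` and `unique` helpers, and the `orders_daily` example are all hypothetical names invented for this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical types for this sketch: a row is a plain dict, and a check
# takes all rows and returns a list of human-readable failure messages.
Row = dict[str, object]
Check = Callable[[list[Row]], list[str]]


def not_null(column: str) -> Check:
    """Build a check that reports rows where `column` is missing or None."""
    def check(rows: list[Row]) -> list[str]:
        return [
            f"{column} is null in row {i}"
            for i, row in enumerate(rows)
            if row.get(column) is None
        ]
    return check


def unique(column: str) -> Check:
    """Build a check that reports values of `column` seen more than once."""
    def check(rows: list[Row]) -> list[str]:
        seen: set[object] = set()
        failures: list[str] = []
        for i, row in enumerate(rows):
            value = row.get(column)
            if value in seen:
                failures.append(f"{column} value {value!r} repeated in row {i}")
            seen.add(value)
        return failures
    return check


@dataclass
class DataProduct:
    """A dataset definition that travels with its owner, docs, and tests."""
    name: str
    owner: str
    description: str
    checks: list[Check] = field(default_factory=list)

    def validate(self, rows: list[Row]) -> list[str]:
        """Run every check and return all failures (empty list means pass)."""
        failures: list[str] = []
        for check in self.checks:
            failures.extend(check(rows))
        return failures


# Illustrative data product; the name and owner are made up.
orders = DataProduct(
    name="orders_daily",
    owner="sales-analytics",
    description="One row per order, refreshed daily.",
    checks=[not_null("order_id"), unique("order_id")],
)

good_rows = [{"order_id": 1, "amount": 20.0}, {"order_id": 2, "amount": 35.5}]
bad_rows = [
    {"order_id": 1, "amount": 20.0},
    {"order_id": 1, "amount": 9.0},     # duplicate key
    {"order_id": None, "amount": 5.0},  # missing key
]

good_failures = orders.validate(good_rows)
bad_failures = orders.validate(bad_rows)
```

In a setup like this, the definition would live in version control and `validate` would run in CI before the dataset is published, so a failing check blocks the release instead of reaching a consumer.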

Why it matters now

Two forces made the term unavoidable. First, scale: as data teams grew, the "one big lake, everyone fends for themselves" model produced datasets nobody trusted. Treating data as a product — with owners and SLAs — was the fix. Second, AI: models and agents are only as good as the data underneath them. A wrong number that reaches a board deck or an AI agent erodes trust fast. Governed, tested data products are the foundation that makes AI on your data safe to ship.

How Blunox thinks about it

When we say Blunox Mira builds data products in days, we mean exactly this modern sense: its agents don't just move data around — they produce datasets that arrive tested, documented, and governed, with a human approving the plan. The output isn't a raw table you still have to vet. It's a data product your team can stand behind on day one.