Let’s Be Clear: A Data Asset is Not a Data Product
ABSTRACT: Most definitions of a data product conflate it with a data asset. The primary thing that makes a data asset a data product is where it resides: in a data store.
If there is one thing in which our industry excels, it’s concocting new terms for existing things. The latest is “data product.” Today, many people use that term to describe data assets: datasets, SQL queries, dashboards, reports, ML models, or data components. But these types of assets have existed for decades. So, why the name change?
I suppose the term “data product” sounds more important and valuable than “data asset”. Or maybe it’s that a data product sounds like something that a sophisticated data team produces. Or maybe it’s because data is now an intrinsic part of honest-to-goodness digital products that generate revenue. Whatever the case, we shouldn’t call data assets “data products” just because it’s trendy.
What is a Data Product?
I believe there is a subtle but fundamental difference between a data asset and a data product. This might be a tad radical, but bear with me. I think a data product is a data asset that has all the characteristics of something that can be bought and sold in a store. Let me explain.
Most products in the real world are found in digital or brick-and-mortar stores. The store is a central point for customers to shop for products and for sellers to connect with customers, removing the friction between buying and selling. Until they reach a store, products are just assets or inventory. Once in a store, products possess certain characteristics that facilitate the shopping process: they are standardized, packaged, “shoppable”, deliverable, and returnable. (See Figure 1.)
Figure 1. Characteristics of a Product
I believe data products work the same way. A data asset remains just an asset until it resides in a data store. There, it acquires new characteristics: a SKU, unique metadata (such as subscription and delivery options), and terms of service or use that spell out a bidirectional, binding contract as part of a formal transaction. In addition, data producers can charge for the product if they want, give it away for free, or, for an internal transaction, assess a chargeback fee. In essence, a data product looks, smells, and acts like any product available for purchase in a local grocery, hardware, or retail store, except that it is digital and made of data.
The data store makes it easy for data producers to create, publish, and distribute data products and for data consumers to browse, evaluate, compare, and acquire them. The focus here is external, on what customers do with data assets, not internal, on how developers build data assets, stewards govern them, or IT staff monitor them. Companies ought to build data assets with the same rigor as data products. Right? A data asset is a prerequisite for a data product, but a data asset doesn’t become a data product until it lands on the digital shelves of a data store (or on a vendor’s price list). Essentially, a data product is determined by its transactional nature and its residence and use.
The Role of an Internal Data Store
Most people are familiar with public data marketplaces operated by Amazon, Snowflake, and commercial data providers, such as CoreLogic, Acxiom, and LiveRamp. These serve a purpose, mainly promoting commercial data products from data brokers. But they don’t meet the needs of most organizations that want to broadly share data assets internally, and perhaps externally as well. What our industry really needs are internal data stores that make it easy for internal data producers to create and publish data products that data consumers can find and use.
Benefits. Without an internal data store, data owners become overwhelmed with data requests that consume valuable time and resources and inject huge delays into the delivery and consumption of valuable data assets. A data store broadens access to core data assets while eliminating the manual, time-consuming process in which data owners review each data request to ensure data security. Data producers create data products once and distribute them many times without human intervention. Data consumers browse, evaluate, and acquire data products without having to request permission and wait for delivery.
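To make the "publish once, acquire many times" idea concrete, here is a minimal sketch of an internal data store as a small catalog. Everything in it is hypothetical and for illustration only (`InternalDataStore`, `Listing`, `allowed_roles`, the SKU format); it assumes the security review a data owner would otherwise do by hand can be encoded up front as a list of roles allowed to acquire the product, so access is granted or denied automatically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    """One data product on the store's shelf (hypothetical fields)."""
    sku: str
    title: str
    description: str
    allowed_roles: frozenset[str]   # stands in for the data owner's manual review


class InternalDataStore:
    """Hypothetical self-service catalog: publish once, acquire without tickets."""

    def __init__(self) -> None:
        self._listings: dict[str, Listing] = {}
        self._subscribers: dict[str, set[str]] = {}   # sku -> consumer names

    def publish(self, listing: Listing) -> None:
        self._listings[listing.sku] = listing
        self._subscribers.setdefault(listing.sku, set())

    def browse(self, keyword: str) -> list[Listing]:
        """Case-insensitive keyword search over titles and descriptions."""
        kw = keyword.lower()
        return [
            item for item in self._listings.values()
            if kw in item.title.lower() or kw in item.description.lower()
        ]

    def acquire(self, consumer: str, role: str, sku: str) -> bool:
        """Grant access immediately if the role is allowed; no owner review step."""
        listing = self._listings.get(sku)
        if listing is None or role not in listing.allowed_roles:
            return False
        self._subscribers[sku].add(consumer)
        return True

    def subscribers(self, sku: str) -> set[str]:
        return set(self._subscribers.get(sku, set()))


# Usage: one publish, many self-service acquisitions.
store = InternalDataStore()
store.publish(Listing("DP-0001", "Daily orders", "Cleaned order lines, refreshed nightly",
                      frozenset({"analyst", "data-scientist"})))
hits = store.browse("orders")
granted = store.acquire("ana", "analyst", "DP-0001")      # role allowed
denied = store.acquire("sam", "contractor", "DP-0001")    # role not allowed
```

The design choice worth noting is that the producer's effort happens once, in `publish`; every later `acquire` runs without anyone in the loop, which is the time savings the paragraph above describes. A real store would of course need authentication, auditing, and delivery, none of which this sketch attempts.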
In an ideal world, you wouldn’t have organizational boundaries, whether internal or external, that prevent the free flow of data. Although many organizations establish cultures where data is freely shared, even with external customers and trusted partners, most do not. Security, privacy, and risk concerns often create huge barriers to data sharing within and between organizations. Thus, I contend that any organization that wants to be data-driven requires an internal data store or data marketplace where it can publish and share its data products.
Data Products Versus Data Assets
By virtue of its residence in a data store, a data product contains transactional metadata that a data asset doesn’t. To publish a data product, a data product manager needs to define subscription and delivery options, terms of service, access rights, and an SKU or product number. This metadata helps data consumers understand whether the data product is going to meet their needs.
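One way to picture this distinction is as a data structure: the asset carries descriptive metadata, and the product wraps the asset in the transactional fields a data product manager must fill in before publishing. The following is a minimal sketch under that assumption; the class names, field names, and the `publish` check are hypothetical illustrations, not a standard schema or any vendor's API.

```python
from collections.abc import Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class DataAsset:
    """Descriptive metadata any data asset should carry (hypothetical fields)."""
    name: str
    schema: dict[str, str]            # column name -> type
    owners: tuple[str, ...]
    lineage: tuple[str, ...] = ()     # upstream sources


@dataclass(frozen=True)
class DataProduct:
    """An asset wrapped in the transactional metadata a data store requires."""
    asset: DataAsset
    sku: str
    subscription_options: tuple[str, ...]   # e.g. "monthly", "one-time"
    delivery_options: tuple[str, ...]       # e.g. "api", "file-drop"
    terms_of_service: str
    access_rights: tuple[str, ...]          # roles allowed to subscribe


def publish(asset: DataAsset, *, sku: str, subscription_options: Sequence[str],
            delivery_options: Sequence[str], terms_of_service: str,
            access_rights: Sequence[str]) -> DataProduct:
    """Return a DataProduct only if every transactional field is filled in."""
    required = {
        "sku": sku,
        "subscription_options": subscription_options,
        "delivery_options": delivery_options,
        "terms_of_service": terms_of_service,
        "access_rights": access_rights,
    }
    missing = [name for name, value in required.items() if not value]
    if missing:
        raise ValueError(f"cannot publish {asset.name!r}; missing: {', '.join(missing)}")
    return DataProduct(
        asset=asset,
        sku=sku,
        subscription_options=tuple(subscription_options),
        delivery_options=tuple(delivery_options),
        terms_of_service=terms_of_service,
        access_rights=tuple(access_rights),
    )


# Usage: the same asset becomes a product only once the store-facing fields exist.
orders = DataAsset("orders", {"order_id": "int", "amount": "decimal"}, ("sales-ops",))
product = publish(
    orders,
    sku="DP-0001",
    subscription_options=["monthly"],
    delivery_options=["api"],
    terms_of_service="Internal use only; no redistribution",
    access_rights=["analyst"],
)
```

Note that `DataProduct` does not copy or alter the asset's own metadata; it only adds the transactional layer, which mirrors the argument that a product is an asset plus its residence in a store.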
A data product also carries a bidirectional contract, especially if it’s being bought and sold. A data producer can cancel a data consumer’s subscription if the consumer doesn’t abide by the terms of service, and a data consumer can return the data product if it doesn’t meet the expectations defined in the product packaging. Although data assets may come with service-level agreements (SLAs), a data product requires a data contract.
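The bidirectional nature of the contract can be sketched as a subscription that either party can end, each for its own reason. This is a minimal illustration, assuming two hypothetical contract terms: a set of permitted uses (the producer's terms of service) and a maximum staleness in hours (a promise printed on the product "packaging"). The names `Subscription`, `allowed_uses`, and `max_staleness_hours` are invented for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    ACTIVE = "active"
    CANCELED = "canceled"   # ended by the producer
    RETURNED = "returned"   # ended by the consumer


@dataclass
class Subscription:
    """Hypothetical contract between one consumer and one data product."""
    consumer: str
    sku: str
    max_staleness_hours: int        # freshness promised on the product "packaging"
    allowed_uses: frozenset[str]    # uses permitted by the terms of service
    status: Status = Status.ACTIVE

    def record_use(self, use: str) -> None:
        """Producer side: a use outside the terms of service cancels the subscription."""
        if self.status is Status.ACTIVE and use not in self.allowed_uses:
            self.status = Status.CANCELED

    def check_delivery(self, staleness_hours: int) -> None:
        """Consumer side: a delivery staler than promised triggers a return."""
        if self.status is Status.ACTIVE and staleness_hours > self.max_staleness_hours:
            self.status = Status.RETURNED


# Usage: either party can end the contract, for different reasons.
misuse = Subscription("marketing", "DP-0001", 24, frozenset({"internal-reporting"}))
misuse.check_delivery(6)        # fresh enough: still active
misuse.record_use("resale")     # outside the terms: producer cancels

stale = Subscription("finance", "DP-0001", 24, frozenset({"internal-reporting"}))
stale.check_delivery(48)        # breaks the freshness promise: consumer returns
```

Contrast this with a typical SLA, which usually binds only the provider: here the same object records obligations on both sides, and whichever side breaks its terms first determines how the subscription ends.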
Project versus Program. Another differentiator is that developers build data assets in response to a service ticket, project request, or architectural need. In contrast, a data product is the result of a product management process that involves upfront product planning, followed by product development, packaging, governance, customer support and training, product enhancement, and product retirement. You need an ongoing program, not a time-boxed project, to produce and manage a data product throughout its lifecycle. This implies a longer-term commitment of people and resources, and more sustained iteration and enhancement, than organizations typically give data assets.
The table below summarizes differences between data assets and data products.
Ad hoc or predefined
Service level agreement
Similarities Between Data Assets and Products
Targeted. Despite these differences, data assets and data products have more in common than not. First, both data assets and data products are designed to serve a target audience whose needs are documented upfront. Without a strong focus on the customer, neither a data asset nor a data product will prove useful.
Metadata. Second, both carry descriptive metadata (schema, volume, owners, lineage, and so on) that helps customers judge whether an asset or product will meet their needs: Is it the right size? Does it have the right ingredients? Is it expired or outdated? Is it sourced from the right places? Was it put together by trustworthy developers overseen by reputable data owners, and is it used by leading power users to build mission-critical solutions?
Governed, monitored, secured, and tested. In addition, both must be governed and secured to engender trust among the target audience, and both must be continuously tested and monitored to ensure the deliverables meet customer expectations. Both can also serve as building blocks that business or technical people use to assemble applications or entire solutions that support business needs and goals.
Some people define a data product as a data asset with well-defined metadata, rigorous governance, systematic testing, or ongoing monitoring. These are all important characteristics of a data product, but they apply equally to data assets. The only thing that turns a data asset into a data product is its availability in a data store, which requires that the asset carry metadata about subscription and delivery options and be bound by a bidirectional contract between data consumer and producer. Let me know what you think.