Data Mesh’s Missing Ingredient: A Data Marketplace
ABSTRACT: The data mesh framework doesn’t specify a key component that completes the last mile of the architecture: a data provisioning environment.
By now, everyone knows the four components of a data mesh: 1) domain ownership, 2) data as a product, 3) self-service platform, and 4) federated governance. What people don’t know is that the framework lacks a key component: an internal data marketplace for provisioning and consuming data products.
Provisioning Data Products
Let’s say you’re an early adopter of data mesh. You define domains, implement a data mesh platform, update data governance, and establish teams to build data products. Those teams will build products to meet their needs. And ideally, they share their data products with other domains. But how do they do that?
There are many issues to work out:
Will domains have time to evaluate other domains’ needs and externalize their products?
How will other domains know where to find data products?
How will they evaluate the products and determine if they’re suitable to meet their needs?
How will domain owners decide who can access their data products and who can’t?
What if users want custom versions of the products? Who creates and maintains them?
Are domain owners obligated to deliver data products via a service-level agreement?
Is there a way to track usage of the self-service platform for planning and chargebacks?
Provisioning data products is not a simple process. A data catalog is not enough since it describes data assets but doesn’t provision them. What’s needed is an environment that provides a frictionless way for data providers and data consumers inside or outside an organization to collaborate around data products. We call this a data marketplace, and it runs on a data exchange platform. It complements a data mesh architecture. (See figure 1.)
Figure 1. Data Marketplace Framework
Integrating with Data Mesh
Closing the last mile. A data marketplace closes the last mile between data providers and data consumers. It represents a fourth-generation data architecture that perfectly complements a data mesh. A data mesh provides the platform for distributed development of data products, while the data marketplace provides the mechanism for publishing and exchanging data products.
Formalizing data products. Unlike a data catalog, a data marketplace formalizes the notion of a data product. It requires data providers to describe or “package” their product in detail, define who can use the product and terms of access (i.e., licensing), and specify distribution methods and minimum guarantees of delivery. This onboarding process forces providers to think about the “externality” of their data—how other parties might use it—which addresses one of the liabilities of a data mesh approach: data domain owners have little time or incentive to consider the needs of external groups. It also gives data consumers assurances about the quality and reliability of the data products in the data marketplace.
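To make the idea of "packaging" a data product concrete, here is a minimal sketch of what a marketplace listing might capture. The class name, field names, and the completeness check are illustrative assumptions, not the schema of any particular data exchange platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProductListing:
    """Hypothetical marketplace listing; all field names are illustrative."""
    name: str
    owner_domain: str
    description: str
    allowed_consumers: frozenset   # who can use the product
    license_terms: str             # terms of access
    distribution_methods: tuple    # e.g. ("API", "SFTP")
    delivery_guarantee: str        # minimum guarantee of delivery


def missing_fields(listing):
    """Return the names of fields the provider left empty, in declaration order."""
    return [name for name, value in vars(listing).items() if not value]
```

A marketplace could refuse to publish a listing until `missing_fields` returns an empty list, which is one simple way to enforce the onboarding discipline described above.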
Collaboration. A data marketplace also manages the end-to-end relationship between data providers and consumers and formalizes underground data sharing practices. It enables both parties to converse about data products in a structured and efficient way. They can use the platform to discuss ways to customize data products or create new products from existing ones. Together, they can launch secure spaces in the platform to manipulate, enrich, or model data to gain greater value from the assets. A data mesh is not geared to manage this kind of interaction.
Data ecosystems. Moreover, a data marketplace is bidirectional: domain owners can be both data providers and data consumers, using the same platform to exchange data. This creates a vibrant ecosystem on top of a data mesh. It fosters a culture of data sharing and collaboration that is often lacking in distributed systems and organizations. And once an organization establishes an internal data marketplace, it’s easy to extend it to incorporate external partners.
Streamlined operations. Lastly, a data marketplace increases the efficiency of data operations. It streamlines requests for new and custom data sets and automates the distribution of data. Data providers create pipelines that ingest, transform, create, and deliver data products to target platforms in the required format. Data consumers build pipelines to ingest data products and integrate them with internal databases and schemas. The data marketplace tracks consumption at a granular level to support infrastructure planning and chargeback processes.
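As a rough illustration of how granular consumption tracking could feed a chargeback process, the sketch below splits a shared platform cost across consuming domains in proportion to the data delivered to each. The event shape (consumer, product, bytes delivered) and the proportional allocation rule are assumptions chosen for simplicity; a real marketplace might meter queries, rows, or API calls instead.

```python
from collections import defaultdict


def chargeback(events, platform_cost):
    """Split platform_cost across consumers in proportion to bytes delivered.

    events: iterable of (consumer_domain, product_name, bytes_delivered) tuples.
    Returns a dict mapping each consumer domain to its share of the cost.
    """
    usage = defaultdict(int)
    for consumer, _product, bytes_delivered in events:
        usage[consumer] += bytes_delivered

    total = sum(usage.values())
    if total == 0:
        return {}  # nothing was consumed, so there is nothing to allocate
    return {
        consumer: platform_cost * used / total
        for consumer, used in usage.items()
    }
```

For example, if finance consumed 400 units and marketing 100 units of a platform costing 1,000, this rule would charge them 800 and 200 respectively.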
Data Marketplace Functionality
A data marketplace makes it easy for domain teams to publish data products for internal or external partners in a secure way that minimizes the time required to handle external requests. A data exchange platform has distinct modules for data providers (or sellers), data consumers (or buyers), and data marketplace operators. (See figure 2.)
Figure 2. Data Exchange Platform Functionality
Domain team functionality. For data mesh domain teams, the platform makes it easy to ingest data, create data products, document their attributes, and define access rules and licensing terms (i.e., who can see what for how long and for what purpose). It also enables them to define payment methods (if applicable), collaborate with data consumers about their requests, specify distribution methods (e.g., SFTP, ETL, API, desktop download), and automate the distribution of data products.
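The access rules described above (who can see what, for how long, and for what purpose) can be sketched as a simple grant check. The `AccessGrant` record and `is_allowed` function are hypothetical names used for illustration, not part of any vendor's API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AccessGrant:
    """Hypothetical access rule issued by a domain team."""
    consumer: str   # who
    product: str    # can see what
    purpose: str    # for what purpose
    expires: date   # for how long (last valid day, inclusive)


def is_allowed(grants, consumer, product, purpose, today):
    """Return True if any unexpired grant matches consumer, product, and purpose."""
    return any(
        g.consumer == consumer
        and g.product == product
        and g.purpose == purpose
        and today <= g.expires
        for g in grants
    )
```

Keeping purpose and expiry in the grant itself means a request that is valid for one use, such as reporting, does not silently extend to another, such as model training.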
Data consumer functionality. For data consumers, a data marketplace makes it easy to discover data products by browsing a catalog. They can click on a data product to examine its attributes, values, lineage, ownership, and data quality. They can tap the wisdom of the community by seeing who else has used the product for what purpose. They can also view sample data, and, in some cases, the entire data set in a secure space. They can do a one-time download or create a permanent feed that moves data to a target environment on a scheduled or real-time basis.
Organizations can build their own data marketplace or purchase the technology from a growing list of providers, including Narrative.io, Revelate, Harbr, Dawex, and Informatica. We expect data catalog vendors to muscle into the space by enhancing the data provisioning features of their products.
A data marketplace is a perfect complement to a data mesh. Without a data marketplace, a data mesh turns into a series of static parking lots for data products. Domains become isolated and focus inward on their own data needs at the expense of the enterprise. A data marketplace liberates data products within the mesh, making them easy to find and use. It gives business velocity to the data mesh and fosters a culture of data sharing.
For more information about data marketplaces and the data exchange platforms they run on, see:
Virtual Event. Wayne Eckerson and speakers, “CDO TechVent: Data Sharing and Data Marketplaces”, December 15, 2022.
Webinar. Wayne Eckerson, “Foundations for Data Marketplaces: How to Build a Robust Data Supply Chain”, December 1, 2022.
Article. Joe Hilleary, “Direct Data Commerce: The Next Wave of Data Monetization”, May 14, 2022.
Report. Joe Hilleary, “Deep Dive on Data Exchanges: Three Tools to Consider”, February 8, 2022.
Report. Wayne Eckerson, “The Rise of Data Exchanges: Frictionless Integration of Third Party Data”, September 1, 2020.