The Exciting, Unnerving Vision of Data Mesh
Data mesh is an evolutionary concept that’s gained a lot of traction in the software engineering world. As an old-school data team guy, I find data mesh both exciting and unnerving. It’s exciting because it represents a sea change in engineering attitudes toward data; unnerving because of its potential to create organizational confusion. In this article, I’ll explore how two of the four principles of data mesh—data as a product and domain ownership—are changing attitudes and expectations toward data.
Why Data Mesh is Exciting
A key principle of data mesh asserts that data is a product. Amen! It’s exciting to see this concept so readily adopted by the software engineering community, though the extent to which it’s treated as an epiphany is a bit mystifying. For those of us who’ve worked on data teams for years, this is not news. We’ve been designing and delivering data products all along.
Another key principle of data mesh is domain ownership. This means that the teams closest to the data—the owners of operational products that create data—are also the data product owners. They are responsible for storing, cleaning, cataloging, and making their data available to its consumers in the forms they need, such as native source format, or conformed and aggregated data sets for different use cases. Hallelujah!
The adoption of these concepts is very exciting because it represents a shift away from indifference toward data in software engineering. Operational product development teams are laser-focused on providing their customers with the best possible functionality and experience. So, getting them to consider data requirements and design early in their product development process is difficult at best. As Zhamak Dehghani says in her book, Data Mesh: Delivering Data-Driven Value at Scale,
“Operational teams still perceive their data as a byproduct of running the business, leaving it to someone else, e.g., the data analytics team, to pick it up and recycle it into products.”
Data mesh is driving positive changes in attitudes and expectations about data. That’s good.
Why Data Mesh is Unnerving
On the other hand, data mesh is also unnerving because of the significant organizational change it entails. Data mesh envisions an organizational structure with no centralized data team. The functions that a central data team provides are distributed to operational product development teams. My concern is not with the concept, but with implementation.
Implementing data mesh requires a full commitment to data-as-a-product and domain ownership. A half-hearted attempt at data mesh will produce a train wreck, like adopting agile without product owners. The concept won’t work without the organizational change that goes with it. History tells us that many organizations will cherry-pick the parts they like about data mesh, e.g., speed and scale, and leave the hard stuff like governance for another day.
Another implementation concern is about data skills capacity. One of the criticisms of the centralized data team is the bottleneck it creates because demand always outstrips capacity. Data mesh does not increase data product development capacity—at least not at first; it moves the people with data skills to other teams. It’s likely that data skills will propagate through operational product teams over time. However, that’s not sufficient. Cultivating data product development skills needs to be planned and executed with intention. And it starts with building on what product teams already know how to do.
Jumpstarting Data Product Ownership
Successful operational product teams know how to design their functional products. They understand their market, what customers want and need, their competitors, etc. They have well-established ways of learning about their market, validating their designs with users, developing the functionality, and evolving it over time. These are all part of the product mindset.
However, operational product teams don’t have the product mindset when it comes to their data. They don’t have the time, methods, or experience to treat their data as a product. It’s never been expected of them, so why would they?
How do operational teams learn to create and manage data products if they’ve never done so? This is not a new challenge. It’s similar to introducing data stewardship in a company that’s never done it. Most of the time, people who are assigned the role of data steward are willing, but not able at first. They need a playbook, training, and mentorship. The same is true for newly anointed data product owners.
If we want operational teams to be successful data product owners, then they need to learn data product management methods like those they use to manage operational products. Data product owners need to discover and document the value their data has to its consumers. This discovery process looks a lot like what they do regularly—iterate through research, ideation, and user validation. However, the dimensions of value for data are different from operational products. So, it’s critical to understand what makes data valuable.
Characteristics of a Good Data Product
Zhamak Dehghani proposes several necessary characteristics of data as a product. I won’t go into each one. She’s already done that. But here’s the list: discoverable, addressable, trustworthy, self-describing, interoperable, and secure.
I believe trustworthiness is the most important characteristic. If data is not trustworthy, then it’s not ready to be used and shouldn’t be discoverable, addressable, or interoperable. Trustworthiness is another way of saying data quality, the attributes of which are well known:
Completeness. This is the extent to which all required and expected data is available in a data set as defined by consumer needs.
Relevance. Data is relevant if it has meaning to consumers. The cleaned and cataloged versions of data that operational owners make available must be limited to the relevant stuff. Irrelevant data diminishes value because it produces noise that obscures what’s meaningful.
Consistency. When the same data is captured in multiple places, consistency measures the extent to which those instances vary from one another. Inconsistencies in data create inconsistencies in analytical results, which leads to more work to research and resolve them and can lead to faulty business decisions or missed opportunities.
Accuracy. Data accuracy is the degree to which data correctly describes the real-world thing or event that it represents. Inaccurate data is insidious because it’s hard to identify without some other source to verify it against. My birthdate is August 9, 1978. Accurate or inaccurate?
Validity. This refers to data’s conformity with the format, type, and range of its definition. For example, ZIP codes are valid if they contain the correct characters for the region. Validity goes hand-in-hand with accuracy. Data cannot be accurate without being valid. However, valid data can be inaccurate.
Uniqueness. This indicates the extent to which there is a single recorded instance of a thing or event in a data set. Its opposite is duplication. Duplication degrades data value because it skews analytical results and creates a host of operational and sometimes legal and regulatory problems.
Timeliness. This is the extent to which information is available when it is expected and needed for a given purpose. Data that arrives late has the same effect as data that is not complete.
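To make a few of these attributes concrete, here is a minimal Python sketch of how a team might score a data set on completeness, validity, uniqueness, and timeliness. Everything in it is an illustrative assumption rather than a prescribed method: the field names (customer_id, zip_code, updated_at), the US ZIP code pattern, the seven-day freshness window, and the sample records are all hypothetical. Accuracy is deliberately left out because, as noted above, it can't be checked without another source to verify against.

```python
import re
from datetime import datetime, timedelta

# Hypothetical field names and format rule, chosen only for illustration.
REQUIRED_FIELDS = ("customer_id", "zip_code", "updated_at")
US_ZIP = re.compile(r"^\d{5}(-\d{4})?$")  # assumes US-style ZIP codes


def completeness(records, required=REQUIRED_FIELDS):
    """Share of records in which every required field is present and non-empty."""
    if not records:
        return 0.0
    ok = sum(all(r.get(f) not in (None, "") for f in required) for r in records)
    return ok / len(records)


def validity(records, field="zip_code", pattern=US_ZIP):
    """Share of non-missing values that match the expected format."""
    values = [r[field] for r in records if r.get(field)]
    if not values:
        return 0.0
    return sum(bool(pattern.match(v)) for v in values) / len(values)


def uniqueness(records, key="customer_id"):
    """Distinct key values divided by total key values (1.0 means no duplicates)."""
    keys = [r[key] for r in records if r.get(key) is not None]
    if not keys:
        return 0.0
    return len(set(keys)) / len(keys)


def timeliness(records, as_of, max_age, field="updated_at"):
    """Share of timestamped records no older than max_age at the as_of moment."""
    stamps = [r[field] for r in records if r.get(field) is not None]
    if not stamps:
        return 0.0
    return sum(as_of - s <= max_age for s in stamps) / len(stamps)


# Made-up sample data: one malformed ZIP, one duplicate ID, one stale row,
# and one row with a missing ZIP.
records = [
    {"customer_id": 1, "zip_code": "30301", "updated_at": datetime(2024, 1, 10)},
    {"customer_id": 2, "zip_code": "3030", "updated_at": datetime(2024, 1, 9)},
    {"customer_id": 2, "zip_code": "94105-1234", "updated_at": datetime(2023, 12, 1)},
    {"customer_id": 3, "zip_code": None, "updated_at": datetime(2024, 1, 10)},
]

report = {
    "completeness": completeness(records),
    "validity": validity(records),
    "uniqueness": uniqueness(records),
    "timeliness": timeliness(records, datetime(2024, 1, 10), timedelta(days=7)),
}
print(report)
```

Scores like these are only a starting point. What counts as "required," "valid," or "on time" has to come from the data product's consumers, which is why the thresholds are parameters here and not constants baked into the checks.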
Operational product teams should start their journey up the data product learning curve by focusing on the principles of data quality. This will help them understand their data customers’ varying needs in terms of a common structure, and help them avoid developing data products as isolated point solutions.
There’s more to data mesh than I’ve covered here and much more I want to learn about it. For example, how does data mesh address data domains that span functional domains? Consider the complexities of handling master data management with customer entities in a distributed data mesh. What I do know is that implementing a data mesh involves getting people to change their assumptions and their behavior, and abandon what’s familiar. And that is no small task.