A Thoughtful Approach to Data Mesh
ABSTRACT: Data Mesh raises hard questions—game-changing or high risk? Reality lies somewhere in between. Good architecture practices help to reap the benefits with minimum risk.
Data mesh gets a lot of discussion. Proponents see it as revolutionary—the first new data architecture thinking in many years. Detractors view it as a dangerous trend and a backward slide to the chaos of data silos. The reality lies somewhere between—a big shift in architecture thinking with a level of risk if poorly applied. Jay Piscioneri emphasizes the risk when he says “A half-hearted attempt at data mesh will produce a train wreck.”
A few good architecture practices can help to realize the benefits while managing the risks. Two fundamental principles lie at the core of data mesh architecture: domain-managed data and sharing through data products. Those principles, if thoughtfully applied, can ease the pain of data management without regressing to disorderly data silos. In this article, I’ll discuss four practices for thoughtful application of Data Mesh principles.
Be Selective with Domain Data Management
Not all data is well-suited to domain data management. Some types of sensitive data require stronger protection than can be ensured by asking independent data domains to adhere to enterprise data governance policies. Many categories of sensitive data require protection and regulatory compliance, including personally identifiable information (PII) and payment card information (PCI). Other types are industry-specific: healthcare organizations, for example, manage protected health information (PHI) and individually identifiable health information (IIHI). Also consider internally regulated data. Every competitive enterprise is likely to have company-confidential information that must be protected. Each category of sensitive data should be carefully considered before positioning it as domain-managed. Data owners, data governors, and data architects should collaborate to make informed decisions about domain-managed data.
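One way to make that collaboration concrete is to flag datasets that carry sensitive categories so they get a joint review before being positioned as domain-managed. The following is a minimal sketch under stated assumptions: the category labels, the Dataset class, and both function names are hypothetical illustrations, not part of any data mesh standard or tool.

```python
from dataclasses import dataclass

# Hypothetical labels based on the categories named above: PII, PCI, PHI,
# IIHI, and company-confidential data.
SENSITIVE_CATEGORIES = frozenset({"PII", "PCI", "PHI", "IIHI", "CONFIDENTIAL"})


@dataclass(frozen=True)
class Dataset:
    name: str
    categories: frozenset = frozenset()


def review_flags(dataset):
    """Return the sensitive categories present on a dataset."""
    return dataset.categories & SENSITIVE_CATEGORIES


def needs_joint_review(dataset):
    """True when owners, governors, and architects should review placement."""
    return bool(review_flags(dataset))


# Hypothetical datasets for illustration only.
customer_contacts = Dataset("customer_contacts", frozenset({"PII", "MARKETING"}))
web_clicks = Dataset("web_clicks", frozenset({"MARKETING"}))
```

A flag here only triggers the conversation. The decision about enterprise or domain management still belongs to the people involved.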
Not all data is well-suited to enterprise data management. Managing data at the enterprise level sometimes imposes constraints that inhibit the effectiveness and efficiency of business processes at the local level. Agility may suffer in functional domains (finance, sales, marketing, R&D, etc.) when changes in data content, meaning, structure, and encoding are subject to enterprise-level consensus. Granularity, specificity, and even cultural alignment may suffer in geographic domains (AsiaPac, EMEA, Americas, etc.) when differences in name and address conventions, currencies, character sets, and units of measure must be forced into a single set of enterprise conventions. Owners, governors, and architects should consider these factors when making decisions about domain-managed data.
Data management architecture is ideally configured to optimize data management for all data within its scope. Ultimately, a well-balanced data architecture blends enterprise and domain data management.
Think Beyond Data for Analytics
Data mesh is often presented as architecture for data analytics. Domains independently and autonomously manage data, then create and publish data products for consumption by data analysts and data scientists. Data sharing is achieved through data products while avoiding the heavy lifting and the complexity of centralized data integration. That is all good news for data analytics. But don’t limit your Data Mesh thinking to analytics.
Operational processes also rely on data sharing. In recent years, many changes have occurred in the world of operational systems and data. What was once almost exclusively transactional systems has evolved to encompass transactions, workflow automation, process automation, commercial IoT, and industrial IoT. Today’s transactional systems are dominated by purchased products (ERP, SaaS, etc.), each with unique semantics, data models, and proprietary data architectures. Operational data management has become highly complex. Integration with an Operational Data Store (ODS) is not sufficient to meet operational data sharing needs. The line between operational and analytical has become blurred with embedded analytics and analytical/operational workload overlaps. Data integration, data sharing, and data management must work across that blurry line. With Data Mesh, it is practical for operational data domains to create data products and to make use of products from other domains.
Think Product Management for Data Products
Data products are a defining characteristic and a core concept of data mesh. Success depends on managing data products well—building the right products, building them right, and keeping them fresh as needs evolve and change. Kevin Petrie’s thoughts about DataOps and data mesh give some important perspective about “making it easier to deliver timely, accurate, and governed data products.”
DataOps is a good step in the right direction, but not enough to step up to the complexities of managing data products. Adopting some product management best practices from other fields is sure to help here.
Continuous discovery of needs for data products is essential to creating a robust collection of data products that meet a diverse set of data sharing requirements. Product discovery is ideally performed before product development to understand why a product should exist and who is likely to use it. This means that data domains can’t behave as if they are islands. They must actively seek to discover and understand the data needs of others.
Product interoperability is a key part of creating a mesh. A library of data products that can’t easily be interconnected is not a mesh; it is simply a collection of loose threads. Product interoperability is the characteristic that weaves the threads into a mesh, making it practical to combine data from two or more domains to create valuable cross-domain information.
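To illustrate what interoperability buys, here is a minimal sketch of combining two data products on a shared identifier. It assumes both domains publish rows as dictionaries and agree on the name and meaning of a customer_id field. The function name, field names, and sample rows are all hypothetical.

```python
def combine_products(left, right, key):
    """Inner-join two data products (lists of dicts) on a shared key."""
    # Index the right-hand product by the shared key for fast lookup.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    combined = []
    for row in left:
        for match in index.get(row[key], []):
            # On a field-name clash, the right-hand value wins.
            combined.append({**row, **match})
    return combined


# Hypothetical data products published by two different domains.
sales_orders = [
    {"customer_id": "C1", "order_total": 120.0},
    {"customer_id": "C2", "order_total": 75.5},
]
support_tickets = [
    {"customer_id": "C1", "open_tickets": 2},
    {"customer_id": "C3", "open_tickets": 1},
]

customer_view = combine_products(sales_orders, support_tickets, "customer_id")
```

The join itself is trivial. The hard part, and the real work of interoperability, is getting both domains to agree on what customer_id means and how it is encoded.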
Bill of Materials (BOM) management supports components, modularity, and reuse for data product management. In manufacturing, a BOM is a list of the raw materials, assemblies, and components needed to manufacture an end product. With data products, the parts needed to “manufacture” a data product are data elements, groups, collections, datasets, metadata, and APIs. A data product BOM encourages and supports reuse of collections, datasets, APIs, etc. Perhaps more importantly, a data product BOM describes the assembly path of a data product. Data consumers will continue to ask about data sources and trustworthiness. Data products supersede data pipelines in data mesh. Similarly, a data product BOM replaces pipeline-oriented data lineage metadata.
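A data product BOM can be pictured as a tree of parts. The sketch below is one possible shape, not an established format: the Component class, the kind labels, and the component names are hypothetical. It shows how the same structure can answer both the assembly-path question and the "where did this come from" question.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str
    kind: str  # e.g. "dataset", "collection", "api", "data product"
    parts: list = field(default_factory=list)  # nested Component objects


def assembly_path(component, depth=0):
    """Yield (depth, kind, name) for a component and all of its nested parts."""
    yield depth, component.kind, component.name
    for part in component.parts:
        yield from assembly_path(part, depth + 1)


def sources(component):
    """Return the names of leaf parts, i.e. parts with no parts of their own."""
    if not component.parts:
        return [component.name]
    found = []
    for part in component.parts:
        found.extend(sources(part))
    return found


# Hypothetical BOM: one data product assembled from reusable parts.
orders_raw = Component("orders_raw", "dataset")
customers_raw = Component("customers_raw", "dataset")
orders_enriched = Component("orders_enriched", "collection",
                            [orders_raw, customers_raw])
orders_api = Component("orders_api", "api")
customer_orders = Component("customer_orders", "data product",
                            [orders_enriched, orders_api])
```

Because parts are ordinary objects, the same dataset or API component can appear in the BOM of more than one data product, which is the reuse the BOM idea is meant to encourage.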
Think Hybrid Architecture with Data Mesh Features
Many of those responsible for data management architecture struggle with architectural uncertainty: Data mesh? Or data fabric? What about data lakes? But these are the wrong questions to be asking. The right question (yes, a complex question) is: Data mesh, data fabric, data lake, data warehouse: how do we make them all work together?
Implementing pure data mesh architecture is impractical for many reasons. Few of us have the freedom to disregard existing data architecture and start over with a greenfield approach. Even if that were possible, not all data is appropriate for domain management as stated earlier in this article. Migrating from existing data resources—data lake, data warehouse, operational data store, MDM hub, etc.—is sure to be costly, time-consuming, labor-intensive, and high-risk.
A more practical approach is to view data mesh as a new tool in the data architecture toolbox. Use it appropriately for the data, the data sharing, and the use cases where it works well. Use other tools where they are the best fit. Focus your data architecture efforts on design and evolution of a hybrid architecture that best meets your needs.
I first wrote about weaving data mesh into blended and hybrid architecture more than a year ago in an article titled Data Architecture: Complex or Complicated. Since that time, other voices have emphasized that message with balanced views on the upside and the downside of data mesh. Kevin Petrie underscores the point when he says “This paradigm is no cure-all. It can address certain organizations and certain use cases.” Wayne Eckerson makes a strong statement that “The data mesh has no place for a data warehouse, a data lake, or the data pipelines that feed them and the enterprise data engineers who create them.”
The reality is that data mesh is here to stay. It has real potential to resolve some data management challenges. It has gained attention, developed a community of advocates, and developed a community of adversaries. Data mesh will not be going away anytime soon. Accept that reality and don’t let it become a polarizing debate (good advice from Dan Everett). Be open-minded about data mesh potential. Be practical and cautious about data mesh risks. Most importantly, be thoughtful about why, where, and how you will weave data mesh into your data management architecture and practices.