Why Enterprises Should Implement the Data Mesh with DataOps

ABSTRACT: The discipline of DataOps can improve the odds of success with the data mesh by simplifying and optimizing data delivery.

The data mesh strikes a nerve in our industry because it challenges long-held assumptions about how best to manage and consume data. It pushes aspects of data ownership from central IT departments to distributed business units in the hope of creating a more efficient foundation for analytics. This blog asserts that enterprises improve the odds of success with the data mesh if they use the discipline of DataOps to simplify and optimize how it works.

The data mesh defined

Zhamak Dehghani defines the data mesh as “a sociotechnical approach to share, access, and manage analytical data in complex and large-scale environments.” Her new approach reverses the traditional role for IT and instead makes business domain experts the owners of their data, which they deliver as a “data product” to other domain owners and analytics teams across the enterprise. They do so by using a self-service data platform and what Dehghani calls “federated computational governance.”


In a best-case scenario, the data mesh makes data delivery more efficient and effective because the people who best understand the data become responsible for managing it. In a worst-case scenario, the data mesh creates a lawless Wild West in which data owners go their own way and fail to support the rest of the business. Also check out recent blogs by my colleagues Wayne Eckerson and Jay Piscioneri to get their takes on the data mesh.


Enter DataOps

DataOps can bolster the best-case scenario by making it easier to deliver timely, accurate, and governed data products. It adapts methodologies from DevOps, agile software development, and total quality management to the creation and management of data pipelines. DataOps comprises four pillars: testing, continuous integration and deployment (CI/CD), orchestration, and data observability. Also check out the insightful DataOps content by my colleagues Wayne Eckerson, Joe Hilleary, and Dave Wells.

Data mesh principles and DataOps pillars

Let’s walk through Dehghani’s four principles of the data mesh: domain ownership, data as a product, the self-service data platform, and federated computational governance. Along the way we will explore how the DataOps pillars of CI/CD, testing, observability, and orchestration contribute to the success of the data mesh. DataOps adds the most value with the principles of data as a product, self-service platform, and federated governance.

Data mesh principle #1: domain ownership

The data mesh assigns ownership of data to business domain experts and makes them responsible for delivering that data to the rest of the business. For example, a sales operations (SalesOps) team might own customer sales data, and other functional teams would own their functional data. They logically isolate this data and manage its lifecycle of inception, transformation, and delivery. 

Data mesh principle #2: data as a product

The data mesh has domain owners transform their data into standard structures and formats, check its quality, and deliver it as a refined product to consumers. These data products should be discoverable, trustworthy, and interoperable. In our example, the SalesOps team might deliver data products about customer sales to data analysts for business intelligence projects, or to data scientists for machine learning (ML) projects.

How DataOps helps. Applying the DataOps pillars of CI/CD, testing, and observability helps standardize data products across domains, making them more predictable, modular, and reusable. These pillars can increase the quality and quantity of data products in the following ways.

  • CI/CD. Domain owners, or better yet the data engineers who work for them, employ CI/CD methods to maintain quality standards. They branch pipelines or datasets into development zones, update those versions, then merge them back into the production “source of truth.”

  • Testing. These data engineers build tests into pipelines to inspect code, execute functions, and assess results. They also compare test results across versions to choose the best one. They aim to find and fix errors before sharing any data products.

  • Observability. Domain owners and data engineers monitor data quality—i.e., accuracy and timeliness—as well as the performance of production pipelines and the infrastructure supporting them. They study patterns and anomalies in indicators such as value ranges, schema, lineage, job status, and resource utilization.
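
To make the testing and observability pillars a little more concrete, here is a minimal Python sketch of the kind of checks a SalesOps data engineer might build into a pipeline. Everything specific in it is an assumption for illustration only: the field names (customer_id, amount, sold_at), the function names validate_sales_records and is_anomalous, and the three-standard-deviation threshold are all hypothetical, and a real team would more likely rely on a dedicated testing or observability tool.

```python
from statistics import mean, stdev

# Hypothetical required fields for a customer sales data product.
REQUIRED_FIELDS = {"customer_id", "amount", "sold_at"}


def validate_sales_records(records):
    """Testing: return a list of failures; an empty list means the batch passes."""
    failures = []
    for i, row in enumerate(records):
        missing = REQUIRED_FIELDS - row.keys()
        if missing:
            failures.append(f"row {i}: missing fields {sorted(missing)}")
            continue
        if row["amount"] < 0:
            failures.append(f"row {i}: negative amount {row['amount']}")
    return failures


def is_anomalous(history, latest, threshold=3.0):
    """Observability: flag a metric (for example, a daily row count) that sits
    more than `threshold` standard deviations away from its historical mean."""
    if len(history) < 2:
        return False  # not enough history to judge
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > threshold
```

A pipeline could refuse to publish a batch whenever validate_sales_records returns any failures, and raise an alert whenever is_anomalous flags the latest value of an indicator such as row count or job duration.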


Data mesh principle #3: self-service data platform

According to the data mesh, domain owners deliver their data products through the self-service data platform for consumption by other domain owners and analytics teams. These teams use platform services such as data catalogs, lineage, and knowledge graphs, as well as low-code pipeline tools, to find, share, and consume analytics data. They rely on data engineers and platform engineers within central IT to manage the platform.

How DataOps helps. The central IT team can apply the same DataOps pillars as above (CI/CD, testing, and observability) to build and manage an efficient, effective self-service data platform. They also can apply the DataOps pillar of orchestration, which automates data pipelines by grouping their tasks into workflows that transform and move data between various stores, algorithms, processors, applications, and microservices. By orchestrating these various workflows and elements, data engineers reduce the repetitive work of managing data pipelines.
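
As a rough illustration of what orchestration means in practice, the Python sketch below groups pipeline tasks into a workflow and runs each task only after the tasks it depends on. It uses the standard library's graphlib module (Python 3.9 or later) to work out the order. The function name run_workflow and the extract, transform, and publish tasks are hypothetical, and a production platform would normally use a purpose-built orchestrator with scheduling, retries, and monitoring rather than anything this simple.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(tasks, dependencies):
    """Run every task once, after all of its upstream tasks have run.

    tasks: dict mapping task name -> zero-argument callable
    dependencies: dict mapping task name -> set of upstream task names
    Returns the order in which the tasks ran.
    """
    unknown = {up for ups in dependencies.values() for up in ups} - set(tasks)
    if unknown:
        raise ValueError(f"unknown upstream tasks: {sorted(unknown)}")

    graph = {name: dependencies.get(name, set()) for name in tasks}
    # static_order() raises graphlib.CycleError if the dependencies form a loop.
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        tasks[name]()
    return order


if __name__ == "__main__":
    # Hypothetical three-step pipeline for a sales data product.
    steps = {
        "publish": lambda: print("publishing data product"),
        "extract": lambda: print("extracting raw sales data"),
        "transform": lambda: print("transforming to standard format"),
    }
    upstream = {"transform": {"extract"}, "publish": {"transform"}}
    run_workflow(steps, upstream)
```

The point of the design is that engineers declare what depends on what, and the orchestrator decides when each step runs, which is where the reduction in repetitive work comes from.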

The central IT department should bake all these DataOps capabilities into the self-service platform itself. Domain owners can use them to build data products, and analytics teams and other domain owners can use them to consume data products. A self-service data platform with built-in DataOps capabilities empowers teams to act as autonomous units, handling analytics projects with little or no assistance from central IT.


Data mesh principle #4: federated computational governance

The data mesh requires domain owners, analytics teams, and compliance experts to create and enforce policies that govern these various activities according to federated standards. They embed rules into the data products, platform, and workflows, then monitor and report on usage to reduce the risk of noncompliance.

How DataOps helps. All four DataOps pillars—CI/CD, testing, data observability, and orchestration—help streamline data governance by standardizing data products and how enterprises handle them. DataOps reduces variability, which in turn reduces the risk and effort of data governance. Cross-functional governance teams can automate many of their checks and controls, confident that most of the elements they govern fall into predefined categories that do not require special human attention. 
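
One way to picture the "computational" part of this principle is as policy checks that run automatically against a description of each data product. The Python sketch below is only that, a picture: the descriptor fields (owner, contains_pii, pii_masked, schema_version), the three example policies, and the check_policies function are hypothetical and do not come from Dehghani's definition or from any particular governance tool.

```python
# Hypothetical federated policies, each a rule that takes a data product
# descriptor (a plain dict) and returns True when the product complies.
POLICIES = {
    "has_owner": lambda p: bool(p.get("owner")),
    "pii_is_masked": lambda p: not p.get("contains_pii") or p.get("pii_masked", False),
    "has_schema_version": lambda p: "schema_version" in p,
}


def check_policies(product, policies=POLICIES):
    """Return the names of the policies that the data product violates."""
    return [name for name, rule in policies.items() if not rule(product)]


if __name__ == "__main__":
    customer_sales = {
        "name": "customer_sales",
        "owner": "salesops",
        "contains_pii": True,
        "pii_masked": False,
        "schema_version": "1.0",
    }
    print(check_policies(customer_sales))
```

Because standardized data products share the same descriptor fields, a governance team can run one set of rules across every domain and reserve human attention for the products that fail a check.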

Conclusion: making it all mesh

A successful data mesh needs reliable domain owners, usable data products, and autonomous analytics teams—all operating in a governed framework. Without these elements the Wild West will prevail. Implemented well, DataOps can serve as the town sheriff that keeps everyone in line.

That’s my take. I’m excited to moderate a panel discussion, “Data Mesh - What You Need to Know,” at the Big Data LDN conference in London on September 21. I’ll get the take of Zhamak Dehghani, now CEO of a stealth startup; along with Justin Borgman, CEO of Starburst; Omar Khawaja, Head of Analytics at Roche; Cindi Howson, Chief Data Strategy Officer at ThoughtSpot; and Jules Marshall, Director of Product Data at the BBC. Hope you can join us!

Kevin Petrie

Kevin is the VP of Research at Eckerson Group, where he manages the research agenda and writes about topics such as data integration, data observability, machine learning, and cloud data...
