What Can DataOps Do For You? Ask Roche
ABSTRACT: Biotech giant Roche has decreased cycle times by migrating from a traditional data warehouse to a DataOps-driven data mesh.
Everyone knows they need DataOps, but few know what it is or where to start. DataOps is a mindset, a practice, a process, and a set of technologies. No wonder people are confused.
At its heart, DataOps improves the efficiency and effectiveness of data delivery. It manages and coordinates the tools and processes for data ingestion, transformation, validation, development, testing, and deployment. DataOps enables data teams to eliminate bottlenecks and deliver data faster, better, and cheaper: the Holy Grail of enterprise data teams.
DataOps is about making data developers more efficient through agile and CI/CD processes and tools. It is also about automating data pipelines with continuous testing and observability, and about managing data as code, so that developers can build, test, and deploy data pipelines and data-driven applications quickly while improving data quality.
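To make "continuous testing" and "data as code" concrete, here is a minimal sketch in Python of the kind of automated data-quality check a CI/CD pipeline might run before promoting a change. The table, columns, and rules are hypothetical and not tied to any specific DataOps product.

```python
# A minimal, hypothetical data-quality check of the kind a DataOps
# pipeline might run on every commit. The table and column names are
# illustrative, not drawn from any real schema.
import pandas as pd

def validate_orders(df: pd.DataFrame) -> list[str]:
    """Return a list of data-quality failures; an empty list means the batch passes."""
    failures = []
    if df["order_id"].duplicated().any():
        failures.append("order_id contains duplicates")
    if df["amount"].lt(0).any():
        failures.append("amount contains negative values")
    if df["order_date"].isna().any():
        failures.append("order_date contains nulls")
    return failures

if __name__ == "__main__":
    batch = pd.DataFrame({
        "order_id": [1, 2, 3],
        "amount": [10.0, 25.5, 7.2],
        "order_date": pd.to_datetime(["2022-06-01", "2022-06-02", "2022-06-03"]),
    })
    problems = validate_orders(batch)
    # In CI, a nonzero exit code fails the build and blocks deployment.
    raise SystemExit("\n".join(problems) if problems else 0)
```

Run on every commit, a check like this turns data quality from a manual review step into an automated gate: a pipeline change that violates the data contract never reaches production.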
DataOps turns the practice of building data pipelines from an artisanal activity, carried out by individual developers working in isolation, into an industrial-scale activity handled by a team of developers working in concert using standard processes, tools, and approaches.
Roche Implements a Data Mesh
One company that has embraced DataOps and experienced its benefits is Roche, one of the world’s largest biotech companies. It is using DataOps practices and tools to migrate from a traditional monolithic data warehousing architecture to a distributed data mesh architecture in the cloud.
Like many companies saddled with a legacy analytics architecture, Roche experienced data bottlenecks that prevented it from leveraging data for business advantage. It took an average of three months to scale compute power; three to four months to deliver a new software release; and four days to implement a hot fix. Business satisfaction with the data team’s performance was low.
Data Mesh in the Cloud
To revolutionize its performance, the data team decided to migrate from its legacy, centralized architecture to a distributed one based on data mesh concepts, implemented with the Snowflake data cloud. Snowflake provides a common data and governance backbone for Roche’s numerous domain teams, who are now empowered to service their own data needs through the data mesh.
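The core data mesh idea, sketched below in Python purely for illustration, is that each domain team owns and publishes its data as a product against a shared contract, while a central backbone applies governance uniformly. All names here are invented and do not reflect Roche’s actual design.

```python
# A minimal, hypothetical illustration of the data mesh pattern:
# domain teams publish data products against a shared contract, while
# a central backbone enforces governance. All names are invented.
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    domain: str             # owning domain team, e.g. "diagnostics"
    name: str               # product name within the domain
    schema: dict            # column name -> type: the published contract
    owner: str              # accountable team contact

@dataclass
class MeshCatalog:
    """Central governance backbone: registration and policy checks."""
    products: dict = field(default_factory=dict)

    def register(self, product: DataProduct) -> None:
        # A governance rule applied the same way across all domains.
        if not product.owner:
            raise ValueError(f"{product.name}: every product needs an owner")
        self.products[f"{product.domain}.{product.name}"] = product

catalog = MeshCatalog()
catalog.register(DataProduct(
    domain="diagnostics",
    name="assay_results",
    schema={"sample_id": "string", "result": "float", "run_date": "date"},
    owner="diagnostics-data-team",
))
print(sorted(catalog.products))  # ['diagnostics.assay_results']
```

The design point is the split of responsibilities: domains decide what to publish and own its quality, while the catalog enforces the rules no individual team should be able to skip.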
To make the data mesh work, Roche purchased a DataOps tool from Snowflake partner DataOps.live. The product supports team-based development using CI/CD, continuous testing, orchestration, and observability on the Snowflake platform. Roche is using DataOps.live to coordinate development and execution across a dozen or more teams and tools, including Snowflake, Talend, Qlik, Snowpipe, Lucidchart, GitLab, Immuta, Monte Carlo, Collibra, RDM Portal, Tableau, ThoughtSpot, Dataiku, and SageMaker.
DataOps Benefits
Starting in early 2021 with a single department, Roche onboarded 40 data product teams that have created 50 data products. The data team has increased its delivery cadence from one release every three months to an astounding 120 releases in one month. Overall, the data mesh environment consists of 1,300 users who can access 180 TB of data. That’s an impressive amount of work, done in a year and a half.
Paul Rankin, head of Data Management and Architecture at Roche, cites the combination of Snowflake and DataOps.live as the key to his team’s extraordinary turnaround. “DataOps.live enables us to pull all this together in terms of orchestration, deployment, release management, CI/CD—and to do it at scale, making full use of Snowflake’s features as a programmable Data Cloud. It’s a complete game changer. We’re talking about ROI in terms of saving thousands of hours and dollars in processing and developer time.” Rankin presented his case study at the recent Snowflake Summit 2022 in Las Vegas along with representatives from DataOps.live.
Rankin concluded his presentation by citing several best practices. First, he said it is important to differentiate between DataOps engineers and data engineers: DataOps engineers focus on the process of building new data-driven pipelines and applications, while data engineers build the solutions themselves. He added that it’s important to assign a DataOps engineer to every data domain team to ensure developers learn new habits for building and deploying data applications.
Second, he advised rolling out products when they are “good enough,” along with documentation and regular cadence calls. Third, he suggested selecting products that “naturally work in harmony with each other” through API integration and other means.
Summary
Enterprise data teams embrace DataOps to achieve new levels of efficiency and effectiveness in delivering data-driven solutions. Roche shows what’s possible when you combine a state-of-the-art cloud data platform with a data mesh architecture and a DataOps solution.