What is a DataOps Engineer?
If we think about data as the flow of traffic through a city -- digital cars ferrying bundles of information along complex systems of roads (data pipelines) -- DataOps is like transportation planning. It covers everything from laying out new motorways and intersections to studying and maintaining old ones and ensuring construction doesn’t create gridlock. Done well, everything gets where it’s heading safely and quickly. Done poorly, the streets become congested, delivery slows to a crawl, and wrecks occur, resulting in poor data quality.
In this scenario, you wouldn’t ask the road crews or the engineer who designed the cars to lay out the city, yet too often this is what happens in the data world. Enter the DataOps engineer. Rather than off-loading responsibility for the people, processes, and technology that make up the DataOps lifecycle onto data or IT teams with other primary responsibilities, more and more organizations are hiring dedicated personnel. Akin to DevOps engineers in software development, DataOps engineers are technical professionals who focus primarily or exclusively on the development and deployment lifecycle rather than the product itself.
But who are these DataOps engineers?
How organizations define them
Increasingly, companies are actually using the term DataOps engineer in job postings. A quick search through LinkedIn and Glassdoor this month revealed more than a hundred active listings that included “DataOps” in the description. Unsurprisingly, most of these roles were at companies in data-intensive industries like tech, healthcare, finance, and consulting. Aggregating a sampling of these posts generated the following word cloud. (See figure 1.)
Figure 1. Most Common Words in Job Postings for a “DataOps Engineer”
Beyond “data” and more generic words common to most job listings, a handful of terms jump out from these job descriptions. Several make it clear that these are technical positions, but “process,” “manage,” “integrate,” and “automate” help distinguish them from other data roles.
Unlike other technical data workers, DataOps engineers don’t work with the data itself. They engineer the environment and processes through which others build the data products.
Despite the growing trend of calling these kinds of workers DataOps engineers, most currently in the workforce have other titles. Take Joe Mirizio at the Children’s Hospital of Philadelphia (CHOP), with whom I had the pleasure of speaking for this article. Mirizio was initially hired as a software developer and now works on the Data and Analytics team within CHOP’s Center for Health Care Quality and Analytics. Although his current title (Senior Data Developer) doesn’t include the term “DataOps,” he primarily serves the data engineers and analysts by adapting DevOps principles to the data product workflow to improve their development processes. This is nearly the textbook definition of DataOps.
What they do
DataOps engineers’ holistic approach to the data development environment separates them from other technical team members. At CHOP, data engineers mostly work on ETL tasks while analysts serve on subject matter teams within the hospital. Mirizio, on the other hand, works on building infrastructure for data development. Some of his major projects have included building a metric platform to standardize calculations, creating an adaptor that allows data engineers to layer tests on top of their pipelines, and crafting a GitHub-integrated metadata catalogue to track document sources. On a day-to-day basis, he provides data engineers with guidance and design support around workflows and pipelines, conducts code reviews through GitHub, and helps select the tools the team will use.
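To make the idea of a metric platform that standardizes calculations more concrete, here is a minimal Python sketch. It is not CHOP’s implementation, which the article does not describe; the registry, the `metric` decorator, the `compute` helper, and the `readmission_rate` definition are all hypothetical names chosen for illustration. The point is only that each metric has one registered definition that every pipeline calls.

```python
from typing import Callable, Dict, List

# Hypothetical registry: one shared definition per metric name.
METRICS: Dict[str, Callable[[List[dict]], float]] = {}


def metric(name: str):
    """Register a function as the single shared definition of a metric."""
    def decorator(func):
        if name in METRICS:
            # A second, competing definition is rejected outright.
            raise ValueError(f"metric {name!r} is already defined")
        METRICS[name] = func
        return func
    return decorator


@metric("readmission_rate")
def readmission_rate(rows: List[dict]) -> float:
    """Share of visits flagged as readmissions (illustrative definition only)."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r["readmitted"]) / len(rows)


def compute(name: str, rows: List[dict]) -> float:
    """Pipelines call this instead of re-implementing the calculation."""
    return METRICS[name](rows)


# Made-up sample rows, not real data.
visits = [
    {"readmitted": True},
    {"readmitted": False},
    {"readmitted": False},
    {"readmitted": True},
]
rate = compute("readmission_rate", visits)
```

Because a second registration under the same name raises an error, two teams cannot quietly ship different formulas for the same metric.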
Before his position was created, CHOP’s data team relied on people manually checking Excel spreadsheets to confirm everything looked okay. Engineers emailed proposed changes to code and metadata back and forth, and the lack of shared definitions meant different pipelines delivered conflicting data. Now, thanks to Mirizio, much of that process is automated, and tools like Jira, GitHub, and Airflow help the team maintain continuous, high-quality integration and development.
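As a rough illustration of what replacing manual spreadsheet inspection with automated checks can look like, the sketch below wraps a pipeline step so its output is validated before anything downstream sees it. This is an assumption-laden example, not the adaptor built at CHOP: `no_missing`, `unique`, `with_checks`, and `load_visits` are hypothetical names, and a real team would more likely run such checks inside a scheduler such as Airflow.

```python
from typing import Callable, List

Row = dict
# A check takes the rows and returns a list of human-readable problems.
Check = Callable[[List[Row]], List[str]]


def no_missing(column: str) -> Check:
    """Flag rows where the column is absent, None, or an empty string."""
    def check(rows: List[Row]) -> List[str]:
        return [
            f"row {i}: missing {column}"
            for i, r in enumerate(rows)
            if r.get(column) in (None, "")
        ]
    return check


def unique(column: str) -> Check:
    """Flag rows whose value in the column has already been seen."""
    def check(rows: List[Row]) -> List[str]:
        seen, problems = set(), []
        for i, r in enumerate(rows):
            value = r.get(column)
            if value in seen:
                problems.append(f"row {i}: duplicate {column}={value!r}")
            seen.add(value)
        return problems
    return check


def with_checks(step: Callable[..., List[Row]], checks: List[Check]):
    """Wrap a pipeline step so its output must pass every check."""
    def wrapped(*args, **kwargs) -> List[Row]:
        rows = step(*args, **kwargs)
        problems = [p for c in checks for p in c(rows)]
        if problems:
            raise ValueError("data quality checks failed: " + "; ".join(problems))
        return rows
    return wrapped


def load_visits() -> List[Row]:
    # Stand-in for a real extract step; the rows are made up.
    return [{"visit_id": 1, "unit": "ICU"}, {"visit_id": 2, "unit": "ER"}]


checked_load = with_checks(load_visits, [no_missing("unit"), unique("visit_id")])
rows = checked_load()
```

A failing check stops the run with a message naming the offending rows, which is the job a person scanning a spreadsheet used to do by eye.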
As data technology companies increasingly offer complete DataOps platforms, many vendors are building them with the DataOps engineer persona specifically in mind. As the primary user of these platforms, DataOps engineers build out the necessary integrations to get them off the ground and manage team production through the tracking metrics many of these tools provide.
Where they come from
Most DataOps engineers, like Mirizio, come from a software background where they learned DevOps and agile techniques. Others were formerly data engineers whose roles shifted to encompass a broader scope. Regardless of how they obtained their experience, DataOps engineers need to be intimately familiar with both the data itself and different development approaches. They should have strong people skills and take a big-picture approach to planning. Most have degrees in computer science and are fluent in multiple programming languages.
Conclusion
The DataOps methodology has the potential to transform data teams by reducing development times, increasing data quality, and regularizing the production cycle, but without a steady hand to guide it, even the finest ship founders. If your organization wants to truly adopt a DataOps approach, consider designating someone whose primary task is to oversee that process. You don’t necessarily need to hire someone new. Look around your team, find the person who is already doing the work of coordinating and standardizing development, and give them the resources to focus on that full time. The efficiency benefits of a functional DataOps strategy will far outweigh the loss of another ETL developer.