The Modernizing Data Stack: Three Ways to Balance New and Old

ABSTRACT: Traditional companies must balance new and old technologies as part of an ever-modernizing data stack.

While moderating a panel at the Modern Data Stack Conference (MDSCon) this spring, I was struck by the diversity of viewpoints on what a modern data stack really is. Our smart panelists George Fraser of Fivetran, Benn Stancil of Mode, Lior Gavish of Monte Carlo, and Sarah Catanzaro of Amplify Partners framed this concept as a specific collection of bleeding-edge tools and techniques. In contrast, I suggested the modern data stack is an ever-changing environment that balances the new and the old. In retrospect, my definition is actually more of a “modernizing data stack,” so this blog will use that term.

All types of companies need new tools to integrate, prepare, and deliver data as part of a modernizing data stack. But traditional companies—i.e., those born before the cloud boom—must balance their new tools with older technologies and processes that persist on premises. My blog in April described twelve “must-have” characteristics of a modernizing data stack with this balancing act in mind. This blog explores three big trends evident at MDSCon—economic uncertainty, AI disruption, and tool consolidation—and ways for companies to navigate them by balancing new and old elements in their stack.

Definitions

The “modern data stack” refers to a loose collection of tools and techniques, often cloud-based, that together process and store data to support modern analytics. While these stacks vary by company and use case, they include new products such as Fivetran for ingestion, dbt for transformation, Snowflake for data warehousing, Monte Carlo for data quality observability, and Google Looker for analytics. Go to an event like MDSCon and you get excited about what this new stuff can do.

Modernizing and balancing

However, traditional companies still must balance the new and old as they modernize. They might need to support new data science initiatives while maintaining established BI dashboards. They might move some workloads to the cloud but keep others on premises because of data gravity, sovereignty requirements, migration costs, resource constraints, and the complexity of integrating the various elements of the target cloud environment.

New vendors recognize the need to strike this balance and accommodate older systems. Fivetran, whose co-founder Taylor Brown popularized the term “modern data stack,” acquired HVR in 2021 to integrate its ingestion tool with legacy systems such as IBM Db2 z/OS and SAP.


Most companies need their modernizing data stack to balance the new and the old


What’s Next

Now let’s examine three trends that will shape the future of the modernizing data stack: economic uncertainty, AI disruption, and tool consolidation.

Economic uncertainty

Companies and vendors alike feel uncertain about today’s economy. While overall US employment remains strong, we all have tech friends who are looking for jobs. While indicators such as NewVantage Partners’ annual survey show continued growth in data investments, venture capital funding for data startups is way down. Add recent bank failures, persistent inflation, and geopolitical tension to the mix, and you can see why companies want to reduce data costs wherever possible.

FinOps can help. This emerging discipline enables cross-functional teams to govern cloud costs as they predict, measure, and account for the consumption of compute and other elastic resources. Data teams should adopt FinOps best practices as they implement tools and manage workloads in their modernizing data stack. FinOps enables data teams to strike the right balance between new and old elements. They might find it is cheaper to run certain ETL jobs off-hours on their own servers rather than in the cloud. Or they might simulate operational workloads on a target cloud system and then decide whether migrating the data even makes economic sense.
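To make that trade-off concrete, here is a minimal, purely illustrative sketch in Python of the kind of back-of-the-envelope comparison a FinOps-minded team might run: the cost of a nightly ETL job on elastic cloud compute versus running it off-hours on servers the company already owns. Every rate, runtime, and volume below is an assumption for illustration, not a benchmark.

```python
# Hypothetical FinOps-style comparison: cloud vs. off-hours on-prem ETL cost.
# All rates, runtimes, and volumes are illustrative assumptions.

CLOUD_RATE_PER_NODE_HOUR = 3.00   # assumed cloud compute rate (USD per node-hour)
CLOUD_EGRESS_PER_GB      = 0.09   # assumed data egress rate (USD per GB)
ONPREM_POWER_PER_HOUR    = 0.40   # assumed marginal power/cooling cost (USD per hour)

def cloud_cost(nodes: int, hours: float, egress_gb: float) -> float:
    """Elastic cost model: pay per node-hour plus egress."""
    return nodes * hours * CLOUD_RATE_PER_NODE_HOUR + egress_gb * CLOUD_EGRESS_PER_GB

def onprem_cost(hours: float) -> float:
    """Marginal cost of running off-hours on servers the company already owns."""
    return hours * ONPREM_POWER_PER_HOUR

if __name__ == "__main__":
    nightly_cloud = cloud_cost(nodes=4, hours=2.0, egress_gb=50)
    nightly_onprem = onprem_cost(hours=6.0)  # slower hardware, but idle overnight
    print(f"Cloud run:   ${nightly_cloud:,.2f} per night")
    print(f"On-prem run: ${nightly_onprem:,.2f} per night")
    print("Cheaper tonight:", "on-prem" if nightly_onprem < nightly_cloud else "cloud")
```

A real FinOps analysis would of course fold in licensing, staffing, and opportunity costs, but even a simple model like this helps a team decide which workloads belong on new cloud infrastructure and which are fine on older systems.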


FinOps enables data teams to govern cloud costs by balancing new and old elements of their modernizing data stack


AI disruption

OpenAI’s release of ChatGPT in November 2022 triggered an AI adoption wave whose implications we have yet to understand. As I blogged last month, nearly half of data engineers use large language models (LLMs) to boost productivity. Fivetran’s co-founder and CEO George Fraser wisely observed at the conference in April that “large language models are like this tsunami that we all see in the distance. We don’t know what will be left after it passes, but we’re all frantically trying to build surfboards.” We should expect open-source communities, in-house developers, and creative vendors such as Fivetran to devise new LLM assistants that further modernize each layer of the data stack. These assistants will help humans create starter pipelines, debug code, document procedures, and more.
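As a purely illustrative sketch of what such an assistant might look like, the snippet below asks a hosted LLM to draft a starter ingestion script from a plain-English description. It assumes OpenAI’s Python SDK (v1+) and an API key in the environment; the model name, prompt, and workflow are hypothetical, and any draft the model returns still needs the human review described in the next paragraph.

```python
# Illustrative sketch only: asking an LLM assistant to draft a starter pipeline.
# Assumes the OpenAI Python SDK (v1+) and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

prompt = (
    "Draft a Python starter script that reads daily CSV order files from a local "
    "directory, validates required columns, and loads the rows into a Postgres "
    "staging table. Include TODO comments where credentials and table names go."
)

response = client.chat.completions.create(
    model="gpt-4",  # placeholder model name, not a recommendation
    messages=[{"role": "user", "content": prompt}],
)

draft = response.choices[0].message.content
print(draft)  # an engineer reviews, tests, and edits this draft before any real use
```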

To manage the risks of new AI use cases, companies must employ time-tested governance and training practices. For starters, these early generations of LLMs need human vigilance, based on familiar processes for data governance. Experts must inspect and refine the quality of LLM outputs before releasing anything for use in either development or production. They must apply careful test procedures, based on established best practices, to avoid implementing faulty code. Companies should also apply the “old” practice of team training. Which LLM prompts work, and which don’t? What templates can data engineers create, share, and reuse?
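As one hedged example of what such vigilance might look like in practice, the sketch below gates an LLM-drafted function behind a set of human-curated test cases before anyone promotes it. The drafted normalizer and the test data are hypothetical stand-ins; the point is the gate, not the specific function.

```python
# Hypothetical review gate for LLM-generated transformation code.
# Nothing reaches development or production without passing curated tests
# and an explicit human sign-off.
from typing import Callable

def passes_review(candidate: Callable[[str], str],
                  test_cases: list[tuple[str, str]]) -> bool:
    """Run the candidate function against input/output pairs curated by the team."""
    for raw, expected in test_cases:
        try:
            if candidate(raw) != expected:
                return False
        except Exception:
            return False
    return True

# Imagine an LLM assistant drafted this function to normalize US phone numbers.
def llm_drafted_normalizer(raw: str) -> str:
    digits = "".join(ch for ch in raw if ch.isdigit())
    return digits[-10:] if len(digits) >= 10 else digits

CURATED_CASES = [
    ("(617) 555-0142", "6175550142"),
    ("+1 617-555-0142", "6175550142"),
    ("555-0142", "5550142"),
]

if __name__ == "__main__":
    if passes_review(llm_drafted_normalizer, CURATED_CASES):
        print("Candidate passed curated tests; route to a human reviewer for sign-off.")
    else:
        print("Candidate failed; send it back to the LLM (or an engineer) for rework.")
```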


To manage the risks of new AI use cases, companies must employ time-tested governance and training practices


Tool consolidation

The ongoing proliferation of toolsets can make life harder for data engineers and their colleagues. Benn Stancil noted during our panel that data teams struggle to integrate all the modular tools they might need for various projects. Sarah Catanzaro, meanwhile, observed that cost concerns force teams to think more carefully about which tools to couple or decouple with one another. Companies therefore seek ways to consolidate their toolsets. Vendors, even startups, help them achieve this by offering solution suites for ingestion, transformation, DataOps, and pipeline orchestration in cloud or multi-cloud environments. Microsoft Fabric, announced in preview last month, seeks to integrate elements such as Azure Data Factory, Synapse Data Engineering, Synapse Data Science, and Synapse Data Warehouse.

Once again the balancing act comes into play with product decisions like these. Data teams might select a new specialty tool for a bleeding-edge AI project but stick with an older solution suite for existing BI projects. Or they might stay with that older suite precisely because it is adding features to support AI projects. Either way, data teams should assess the ability of both new and old offerings to reduce their total number of tools.


Data teams should assess the ability of both new and old offerings to help them consolidate and simplify their toolsets


Turbulent times

Turbulent times tend to spark both innovation and a return to fundamentals. Our current challenges of economic uncertainty, AI disruption, and tool proliferation are no exception. Innovative tools and techniques will create new possibilities for extracting data value. At the same time, proven tools and processes will provide the reliable results companies need to face the future with confidence. Together, this mix of new and old will help the data stack continue to modernize.

Kevin Petrie

Kevin is the VP of Research at BARC US, where he writes and speaks about the intersection of AI, analytics, and data management.