Big Data Part IV: Hadoop Won't Kill the Data Warehouse

This is the fourth in a multi-part series on the impact of Hadoop on business intelligence

Hadoop advocates know they’ve struck gold. They’ve got new technology that promises to transform the way organizations capture, access, analyze, and act on information. (See "Part III: Hadoop 2 Changes Everything.") Market watchers estimate the potential revenue from big data software and systems to be in the tens of billions of dollars. So, it’s not surprising that Hadoop advocates are eager to discard the old to make way for the new.

But in their haste, some Hadoop advocates have spread a lot of misinformation about so-called “traditional” systems, especially the data warehouse. They seem to think that by bashing the data warehouse, they’ll accelerate the pace at which people adopt Hadoop and the “data lake.” This is a counterproductive strategy for several reasons.

First, the data warehouse will be an integral part of the analytical ecosystem for many years to come. It will take many years (decades?) for a majority of companies to convert their data and analytics architecture to a data lake powered by Hadoop, if they ever do. Organizations simply have too much time, money, and expertise tied up in existing systems and applications to throw them away and start anew. The mantra of big data is evolution, not revolution. (To learn about these countervailing strategies, see "Part I: The Battle for the Future of Hadoop.")

Second, Hadoop is at the beginning of its journey, and while things look bright and rosy now, this new architecture will inevitably encounter dark times and failures, just like all new technologies. Thus, it’s unwise for Hadoop advocates to take potshots at a mature technology like the data warehouse, which has been refined in the crucible of thousands of real-world implementations. Just because there are data warehousing failures doesn’t mean the technology is bankrupt or that a majority of organizations are eager to entrust their data processing destiny to a new, untested platform whose deficiencies have yet to emerge.

Many data warehousing deficiencies stem from the fact that the data warehouse has been asked to shoulder a bigger load than it was designed to handle. A data warehouse is best used to deliver answers to known questions: it lets users monitor performance against predefined metrics and drill down and across related dimensions to gain additional context about a situation. It isn’t optimized to support unfettered exploration and discovery or to store and provide access to non-relational data.

But, since the data warehouse has been the only analytical game in town for the past 20 years, organizations have tried to shoehorn into it many workloads that it’s not suited to handle. These failures aren’t so much a blemish on the data warehouse as evidence of a lack of imagination about how best to solve various types of data processing problems. Fortunately, we now have other ways to capture, store, access, and analyze data. So, we can finally offload some of these workloads from our overburdened data warehouses, freeing them to do what they do best: populate reports and dashboards with clean, integrated, and certified data.

A final reason that Hadoop proponents shouldn’t disparage the data warehouse is that the data warehouse is ultimately a process, not a technology. A data warehouse reunites an organization in electronic form (i.e., its data) so that it can function as a single entity, not a conglomeration of loosely coupled fiefdoms. In this sense, the data warehouse will never go away.

The truth is that companies can implement a data warehouse with a variety of technologies and tools, including a data lake. Some are better than others, and none is sufficient in and of itself. But that is not the point: a data warehouse is really an abstraction, a logical representation of clean, vetted data that executives can use to make decisions. Without a data warehouse, executives run blind, making critical decisions with inaccurate data or no data at all.

So, despite what some critics say, the data warehouse is here to stay. It will remain a prominent fixture in analytical environments for many years to come.


Read the final article in this five-part series: "Part V: Hadoop: The Next Spreadmart?"

Wayne Eckerson
