Cloud Data Warehousing: Producing the Infrastructureless Culture

The Tsunami in the Room

OK, “tsunami in the room” is a horrible mixed metaphor, but this is big. Really big. It’s not just an elephant in the room that everyone sees, nobody mentions, and everyone is uncomfortable about. This is a tsunami in the room. It’s not even a tsunami off the coast that you can see coming if you look in the right direction. This tsunami is here.

This tsunami is the “infrastructureless culture” that is being created by cloud data warehousing: the movement of the data warehouse, business intelligence and predictive analytics to the cloud. It threatens to wipe clean everything that we now think we know about data warehouses, data lakes, business intelligence and predictive analytics. It is big, it is happening now, and those who act now are going to ride this wave rather than be overwhelmed by it.

What is an Infrastructureless Culture?

The first time I heard this phrase was from Joe Caserta of Caserta Concepts at Spark Summit East. (Until someone tells me otherwise I’ll give Joe full credit for coining the phrase.) What it means is that the changes coming to our database world reach beyond business and technology. When we move our infrastructure to the cloud, it becomes an appliance for us. It becomes an infrastructure that we no longer need to think about. It just works. And that has big implications for our corporate cultures. We become ‘infrastructureless’.

Consider that Caserta Concepts used to make their living building data warehouses onsite, inside the corporate firewall. They regularly built data warehouses with one of the juggernaut RDBMS providers on the technological infrastructure that the customer provided. They haven’t built one with that RDBMS vendor in over two years. Everything has been data warehousing in the cloud. 100% cloud data warehousing.

Consider also that Microsoft’s cloud offering, Azure, has doubled in revenue again this past fiscal year. Consider that Google is now a viable player in the cloud, challenging incumbent Amazon and upstart Microsoft. (Isn’t it by itself an indicator of major change that we think of Microsoft as an upstart?)

Consider also that the greatest barriers to moving to the cloud, security and privacy, have been shown to be paper tigers. They have been easily knocked down by new technology and best practices.

In fact, the opposite is true, and this is just beginning to dawn on IT shops around the globe. No individual company can spend as much on security, or build it into as strong a core competency, as the big cloud players can. You are not only safe in the cloud, you are almost assuredly safer. You’re more likely to join the ranks of Yahoo!, TJX, and Target on the front page of the Wall Street Journal for a security breach if you roll your own infrastructure than if you rely on best of breed. And keep in mind that even companies with technology and data as core competencies are suffering security breaches (think Sony and Yahoo! here) … so no one is safe.

Knocking down the security and privacy protection challenges was huge. But the next biggest objections to the cloud are performance and politics. For details see the excellent report by the Business Application Research Center (BARC) and the Eckerson Group: “BI and Data Management in the Cloud: Issues and Trends”.

Performance is, aside from issues of data upload and data change management, a solved problem. Again, it is a paper tiger: the economies of scale that the cloud providers offer result in lower cost per terabyte and per teraflop.

The problem of ‘politics’ as reported by Eckerson Group and BARC, however, is quite real. And that is why this is as much a cultural tsunami as a technological one.

Best Practices for How to Manage Your Culture

Caserta Concepts prides itself on its ability to help its clients manage the cultural changes of moving to an infrastructureless data warehouse solution and they shared some of their best practices.

  1. Think big picture. Make sure you map out the full architecture at a high level, but don’t try to build all of it. Instead look for one end-to-end business problem that the cloud data warehouse can solve. Get a win, then move on to the next business problem. Keep improving your roadmap, but focus on one win at a time.
  2. Look for projects driven by business leaders. There is a reason that new titles like Chief Data Officer and Chief Analytics Officer now exist in large corporations. Business centers are now driving the construction of the cloud data warehouse and the IT department sometimes becomes one of the team members rather than the project leader.
  3. Provide an overarching design methodology for your cloud data warehouse. Caserta Concepts, for example, has developed a corporate data pyramid consisting of data ingestion, a data lake, a data science workbench and a big data warehouse.
  4. Research not only best-of-breed offerings but also those that work well together. One suggestion is to look at Spark running directly on top of S3 as an inexpensive, scalable, and powerful way to satisfy your data scientists. Google BigQuery and Pub/Sub are also an up-and-coming solution set.

Four Years from Now

Joe Caserta and Kevin Rasmussen offered some intriguing (and perhaps outrageous) thoughts on where we might be in four years. I mostly agree.

  1. Watch Google’s entry into cloud services. They are late to the party but have learned from Amazon and Microsoft. They provide much more of a full-service offering, and they simplify and streamline the major things you need to do.
  2. Business users want better. There has been tremendous progress in the last decade in data storage, access and compute power, but the tools to access that data are still frustrating for the business user. Business users want an Amazon-style marketplace and shopping cart, or a Google-style search interface, to get the answers to their questions. Only the most motivated will want to drag and drop measures and dimensions onto interactive reports. Look to natural language processing and generation, as well as intelligent assistants and smartbots, to provide the breakthroughs in this area.
  3. The incumbent large database providers may be in trouble in a way they have never been in trouble before. The tsunami is not located somewhere off the coast. It is in the room with us. New projects are rapidly moving to low-cost databases and infrastructure in the cloud, and it is no longer scary. It is the opposite of scary. It is safer, faster, cheaper. My personal expectation is that most of these large providers will pivot just in the nick of time. In fact, they already are. But this may be the fastest pivot in the histories of their companies.
  4. Investment in premier data catalogs is going to be a critical differentiator for success. The most important question for a cloud data warehouse isn’t the one you’d expect, such as “How do I organize my data?” It is much more fundamental: “Where is my data? What does this column really mean?” Look at Apache Atlas, Waterline Data and CKAN as current contenders, but expect acceleration and some breakthroughs in this space.
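
To make the data catalog point concrete, here is a minimal sketch of the two questions a catalog has to answer: where a dataset lives and what a column really means. It is a toy in-memory illustration, not how Apache Atlas, Waterline Data or CKAN work. Every name in it (DataCatalog, where_is, what_means, the "orders" dataset and its S3 path) is hypothetical and invented for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DatasetEntry:
    """Where one dataset lives and what each of its columns means."""
    name: str
    location: str  # hypothetical storage path, made up for this sketch
    columns: Dict[str, str] = field(default_factory=dict)  # column -> meaning


class DataCatalog:
    """Toy in-memory catalog; a real product would persist and index this."""

    def __init__(self) -> None:
        self._datasets: Dict[str, DatasetEntry] = {}

    def register(self, name: str, location: str, columns: Dict[str, str]) -> None:
        # Store the location and a plain-language description per column.
        self._datasets[name] = DatasetEntry(name, location, dict(columns))

    def where_is(self, name: str) -> str:
        """Answer: where is my data?"""
        return self._datasets[name].location

    def what_means(self, dataset: str, column: str) -> str:
        """Answer: what does this column really mean?"""
        return self._datasets[dataset].columns[column]

    def search(self, term: str) -> List[str]:
        """Return dataset names whose name, columns or descriptions mention term."""
        needle = term.lower()
        hits = []
        for entry in self._datasets.values():
            texts = [entry.name, *entry.columns.keys(), *entry.columns.values()]
            if any(needle in text.lower() for text in texts):
                hits.append(entry.name)
        return sorted(hits)


# Example usage with invented metadata.
catalog = DataCatalog()
catalog.register(
    "orders",
    "s3://example-bucket/lake/orders/",
    {
        "cust_id": "Customer who placed the order",
        "net_amt": "Order total after discounts, in USD",
    },
)
print(catalog.where_is("orders"))
print(catalog.what_means("orders", "net_amt"))
print(catalog.search("customer"))
```

The search method is the part business users would actually touch: they type a business term and get back the datasets that mention it, without needing to know table or column names in advance.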

Ride the Wave

I apologized at the beginning of this article for mixing the “elephant in the room” and the tsunami metaphors. But the point is that this is bigger than an elephant (it’s not just technological, it’s cultural) and your feet are already getting wet from the tsunami (i.e., it’s already here).

The good news is that this is awesome. This is a tsunami of lower cost, higher performance and more rapid deployment. Hop on this cloud data warehousing wave as soon as you can and we’ll all ride it into the “infrastructureless culture” of the near future!

Expert Insights. Many thanks to Joe Caserta and Kevin Rasmussen who provided expert insights on this topic. Joe is founder and CEO and Kevin is a data engineer at Caserta Concepts. www.casertaconcepts.com

Stephen J. Smith

Stephen Smith is a well-respected expert in the fields of data science, predictive analytics and their application in the education, pharmaceutical, healthcare, telecom and finance...
