Our Modern Data Platform Approach

For many years, the biggest debate I would have about data warehousing was whether to adopt the Kimball approach, the Inmon approach, or a hybrid of the two. (Maybe the hybrid advocates weren’t so crazy after all.) As technology improved, data warehousing found itself at a crossroads. Data storage was cheap, “Big Data” was just coming into the spotlight, and we were experiencing performance degradation that threatened our existence. We had users running reports all day long, slowing down other users who were trying to analyze the same sets of data. Rogue spreadsheets started to take hold because end users couldn’t get what they needed out of the system quickly enough. Even then, the challenge was obvious: we were asking a system that was set up to do one thing very well to do two things, and it just couldn’t keep up.

Finally, technology caught up with the way people wanted to use our systems, making it practical to blend different technology platforms that support different use cases (reporting versus data analysis). Our modern data platform (MDP) approach started with the idea that we need an analytic processing environment that supports three very different use cases:

  1. Reporting: lots of users logging on once or twice a month, usually at about the same time each month (such as for regulatory reporting).
  2. Ad-hoc analysis: some modest data mining. These users answer the more advanced questions that reports don’t cover.
  3. Research: deep data mining, statistical analysis, etc.

After agreeing on those three sets of use cases, we had to look at platforms. Obviously we needed the platforms to match the need, and despite incredible (sometimes mind-boggling) advancements in technology, we found through this process that we could not support reporting, analysis and research on one truly consolidated platform. After a number of discussions and pilots to address our primary use cases, we found three different technology platforms that aligned with our vision and purpose.

Reporting. For our reporting-optimized tier, we selected Oracle 12c with the in-memory and partitioning options. Each month we take a huge hit: a couple hundred reports that are run repeatedly. Supporting many concurrent users is the core purpose of our Oracle environment. We invested in some hardware to ensure that the 12c environment had what it needed to shine. It’s been over a year now, and our reporting-optimized tier with Oracle has been one of the most successful parts of our re-architecture.

Ad-hoc Analytics. Our second tier, the analytics-optimized tier, was designed to support larger data sets. In this tier we are less concerned with concurrency than with large data volumes. We needed the technology associated with this tier to scale, and after a pilot test we selected IBM PureData (Netezza). Our current environment holds only four terabytes (the 1999 version of me just dropped her jaw), but that is just the beginning. We plan to expand our data sources dramatically, anticipating a two-fold increase in data volumes during the next two years. We implemented this environment in June. The analytics-optimized tier will be the center of our re-architected data warehouse. Because we have three different technology platforms, one of them has to serve as the “hub” of the data warehouse. The analytics-optimized tier, because of its scalability, will serve as that hub, and data will move to and (in certain cases) from the other environments.
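To put the growth estimate in concrete terms, here is a small back-of-the-envelope sketch. It assumes steady compound growth between today and the two-year mark, which is my simplification and not a forecast; the function name `projected_volume_tb` is hypothetical.

```python
def projected_volume_tb(current_tb: float, growth_factor: float,
                        horizon_years: float, years: float) -> float:
    """Estimate data volume after `years`.

    Assumption (not from any vendor sizing guide): growth compounds
    smoothly so that volume is multiplied by `growth_factor` once
    `horizon_years` have passed.
    """
    return current_tb * growth_factor ** (years / horizon_years)


if __name__ == "__main__":
    current_tb = 4.0  # today's analytics-optimized tier
    for year in (0, 1, 2):
        volume = projected_volume_tb(current_tb, 2.0, 2.0, year)
        print(f"year {year}: {volume:.2f} TB")
```

Under that assumption, doubling in two years works out to roughly 41 percent growth per year, so the four terabytes become about eight by the end of the period.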

Finally, the reality of healthcare is that a veritable gold mine sits in our electronic health records (EHR), disguised as clinician or nurse notes. With an eye to the future of personalized medicine, we are starting programs that require lots of data, so into the Hadoop world we go. Our pilot Hadoop environment, or our “research-optimized tier”, will go live in the fourth quarter of this year with one very small pilot test. Assuming that’s successful, I suspect that, given the current level of interest from a number of our physician leaders, we will rapidly increase its usage.
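The three tiers and the hub role of the analytics-optimized tier can be summarized in a few lines of code. This is only an illustrative sketch, not how our environment is actually wired: the names `UseCase`, `Tier`, `tier_for`, `hub` and `data_path` are hypothetical, and routing every transfer through the hub is a simplifying assumption.

```python
from dataclasses import dataclass
from enum import Enum


class UseCase(Enum):
    REPORTING = "reporting"
    AD_HOC_ANALYSIS = "ad-hoc analysis"
    RESEARCH = "research"


@dataclass(frozen=True)
class Tier:
    name: str
    platform: str
    optimized_for: str
    is_hub: bool = False


# One platform per use case, as described in the text.
TIERS = {
    UseCase.REPORTING: Tier(
        "reporting-optimized", "Oracle 12c", "many concurrent users"),
    UseCase.AD_HOC_ANALYSIS: Tier(
        "analytics-optimized", "IBM PureData (Netezza)",
        "large data volumes", is_hub=True),
    UseCase.RESEARCH: Tier(
        "research-optimized", "Hadoop", "data-intensive research"),
}


def tier_for(use_case: UseCase) -> Tier:
    """Return the tier that serves a given use case."""
    return TIERS[use_case]


def hub() -> Tier:
    """Return the single tier flagged as the hub."""
    hubs = [tier for tier in TIERS.values() if tier.is_hub]
    if len(hubs) != 1:
        raise ValueError("exactly one tier must be the hub")
    return hubs[0]


def data_path(source: UseCase, target: UseCase) -> list[str]:
    """Platforms a data set passes through, assuming all moves go via the hub."""
    path = [tier_for(source).platform, hub().platform, tier_for(target).platform]
    deduped = [path[0]]
    for platform in path[1:]:
        if platform != deduped[-1]:  # skip the hop when source or target is the hub
            deduped.append(platform)
    return deduped


if __name__ == "__main__":
    print(data_path(UseCase.RESEARCH, UseCase.REPORTING))
```

Keeping the mapping in one table makes the design choice visible: each use case gets the platform that suits it, and only one tier carries the responsibility of moving data between the others.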

You will notice that we are doing a lot of “pilot” testing: small ideas with very specific business value. That’s an incredibly important aspect of our go-forward approach. We need the flexibility to abandon something if it doesn’t work, and the definition of “work” has to be very tangible business value. The risks are too high otherwise. Because every vendor out there provides a “reporting solution” for their product, there is a lot of pressure to make the MDP approach work quickly and efficiently.
