Single Version of the Truth - Not Optional

Read Dewayne Washington's companion article: Single Version of Truth: Attainable Goal or Fool's Gold?

It was 1995 and I was working at Pilot Software, one of the first OLAP/BI vendors. We had Ralph Kimball’s Red Brick Systems come in to pitch a strange new data model called the ‘data warehouse’. It was about that time that I was introduced to the idea of the “Single Version of the Truth” (SVOT) - the idyllic notion that you could have a centralized database from which the CEO would get consistent answers to basic questions such as “How many customers do I have?” or “How many products did I sell last quarter?”

More than twenty years later we still struggle to deliver a single version of the truth to our customers. With the advent of data lakes and big data it has, in many ways, gotten even more difficult. It seems like we might be going backward and some might argue we should abandon the quest.

This is not the right strategy. Along with new challenges comes new technology to help us solve the problem. More importantly, new regulations (specifically GDPR) make SVOT a requirement, not an option.

Is SVOT Still Worth Solving?

I was recently working with a large high-tech company that drove its business through a handful of key performance indicators (KPIs). The number one thing that kept its analytics leaders awake at night was the fear that they might deliver a report to the CEO, who might pass it along to the board, with numbers that differed from the numbers coming from finance. Multiple versions of the truth were not acceptable when answering questions from the highest levels of the organization. For KPIs and other high-level corporate metrics (many of them financial) there must be consistent and accurate reporting.

The problem can be seen when considering a metric as basic as revenue. Like the Eskimo with fifty different words for “snow”, most companies have a wide variety of words for revenue: “gross revenue”, “recognized revenue”, “net revenue”, “management revenue”, “invoiced revenue”, “commission revenue”. Which revenue is the ‘true’ revenue?

An Eskimo might be befuddled if asked “Is there snow?” as he or she would want to know what kind of snow you were asking about. Snow that is good for sledding? Snow that indicates a storm is approaching? Or snow that is conducive to seal hunting? The Eskimo probably wouldn’t consider different definitions of snow to be multiple versions of the truth. In fact, the Eskimo do have consensus on what the truth is about snow and very precise language to convey it.
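In that spirit, here is a minimal sketch of what precise ‘revenue’ vocabulary might look like in practice: a small metric catalog in Python that refuses a vague request for “revenue” until a precisely defined metric is chosen. The metric names, definitions, and formulas below are hypothetical illustrations, not any particular company’s actual definitions.

    # Hypothetical metric catalog: every flavor of "revenue" gets an explicit,
    # agreed-upon definition, so there is one truth per precisely named metric.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Metric:
        name: str        # precise business name, e.g. "net_revenue"
        definition: str  # plain-language definition the business has agreed on
        formula: str     # illustrative formula over hypothetical source fields

    METRIC_CATALOG = {
        "gross_revenue": Metric(
            "gross_revenue",
            "Total invoiced amount before discounts, returns, or allowances.",
            "sum(invoice_amount)"),
        "net_revenue": Metric(
            "net_revenue",
            "Gross revenue minus discounts, returns, and allowances.",
            "sum(invoice_amount) - sum(discounts + returns + allowances)"),
        "recognized_revenue": Metric(
            "recognized_revenue",
            "Revenue recognized in the period under the applicable accounting rules.",
            "sum(recognized_amount)"),
    }

    def answer(metric_name: str) -> Metric:
        """Refuse a vague 'revenue' question; require a precisely named metric."""
        if metric_name not in METRIC_CATALOG:
            raise ValueError("Ambiguous or unknown metric. Choose one of: "
                             + ", ".join(sorted(METRIC_CATALOG)))
        return METRIC_CATALOG[metric_name]

    print(answer("net_revenue").definition)   # works
    # answer("revenue")                       # raises: which revenue do you mean?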

Achieving SVOT will take a combination of new technology, the internal political will to change, and a cleaned-up, sharper common vocabulary for answering business questions.

Can’t We Go Back to the Good Ole Days?

It seems like SVOT was easier to achieve back in the old days. There was one database and it had the right answers. How did we get to today, where versions of the truth are peppered throughout the enterprise?

In the early days everything was on tape. Copying was difficult and expensive. A single source of truth often consisted of a physical tape that analysts loaded onto a computer. But even back then folks recognized the importance of SVOT. The very term “database” recognizes the importance of truth: it was originally coined to mean a foundational ‘base’ for all data, implying that a database was the single version of the truth.

Then Someone Made a Copy…

Once technology improved to the point that organizations could have two computers and two tape archives, someone made a copy and the difficulties started. The “invention” of ETL ushered in a world of multiple databases, none of which people now considered the true ‘base’ of all data. That first copy was probably made for performance reasons - perhaps to fix a data gravity issue by moving data closer to where it was used, or to precompute data for faster retrieval.

Data warehousing only exacerbated this problem. As the grandmother of all data copying, the data warehouse compounded the growing lack of a single version of the truth. Interestingly, SVOT is not mentioned in the indexes or tables of contents of Kimball’s or Inmon’s original books on data warehousing. Probably, in those early days, no one recognized that the data warehouse was opening a Pandora’s box of SVOT problems.

Multichannel Arrives - Now Everyone Has Their Own Data

SVOT was dealt another challenge when people started generating data throughout the enterprise. It is interesting that among all the different silos of data, finance stands out as the touchstone for the ‘correct’ answer in most organizations. BI leaders are most often called to task when finance data doesn’t agree with other reports or metrics. Inconsistencies with financial metrics cause the most sleep deprivation before presenting results to the C-level or the board.

Finance is also typically the department with the most stringent external controls on data validity. Most other channels have only internal consistency to worry about. Finance has to be squeaky-clean compliant with a wide variety of government regulations (e.g. Sarbanes-Oxley, GAAP, Dodd-Frank). Other data sources might be held to similarly high data quality standards if more external ‘parental discipline’ were required of them.

Consider also that SVOT goes hand in hand with building a 360-degree view of your organization. You can’t have a consistent truth if you can’t tell whether the customers, employees, partners, products, and vendors in one database are the same as those in another.
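As a rough illustration of what that entails, the hypothetical Python sketch below matches customer records from two different databases after a little normalization. The field names and matching rule are assumptions for illustration only; real entity resolution is far more involved.

    # Hypothetical sketch: decide whether two databases hold "the same" customer.
    def normalize(record: dict) -> tuple:
        """Reduce a customer record to a comparable key (deliberately naive)."""
        return (record["email"].strip().lower(), record["name"].strip().lower())

    def match_customers(db_a: list, db_b: list) -> list:
        """Return pairs of records that look like the same customer in both sources."""
        index = {normalize(r): r for r in db_a}
        return [(index[normalize(r)], r) for r in db_b if normalize(r) in index]

    crm     = [{"name": "Ada Lovelace", "email": "ADA@example.com"}]
    billing = [{"name": "ada lovelace", "email": "ada@example.com "}]
    print(match_customers(crm, billing))  # one match despite casing and whitespace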

Then Along Came BI and the Spreadmart

The advent of new and more powerful technologies that put ETL, EDW, and BI into the hands of novice users didn’t do much to help with the problem of delivering a SVOT; every desktop could now host its own ‘spreadmart’, a spreadsheet standing in for a personal data mart. The cloud exacerbates the fragmentation and duplication of data even further, because it makes it so convenient to spin up a database without requiring centralized permission.

The real challenge to SVOT is not just the existence of these fragmented database copies but their disconnect from the enterprise data flow and their inability to update automatically. They become dead branches, holding their out-of-date version of the truth.

How big a problem is this? I recently interviewed a senior data architect at a consulting firm who had been brought in to discover disparate data sources at a large healthcare company. Organization officials initially estimated there might be hundreds of different databases that needed mapping. Within a few months, more than 10,000 databases had been detected by sniffing the traffic going over the company’s internal networks and looking for data that looked like queries. Mind you, this was more than 10,000 databases - not columns, not rows, not tables… 10,000 databases.
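The architect didn’t describe the tooling, but the basic idea of that kind of discovery can be sketched in a few lines of Python: scan captured traffic for payloads that look like SQL and count the destination hosts. The sample payloads and the heuristic below are purely hypothetical; real discovery tools are far more sophisticated.

    import re
    from collections import Counter

    # Very rough heuristic for "data that looks like a query".
    QUERY_PATTERN = re.compile(
        r"\b(select\s.+\sfrom|insert\s+into|update\s.+\sset)\b",
        re.IGNORECASE | re.DOTALL)

    def likely_database_hosts(payloads):
        """payloads: (destination_host, payload_text) pairs from captured traffic."""
        hits = Counter()
        for host, text in payloads:
            if QUERY_PATTERN.search(text):
                hits[host] += 1
        return hits

    sample = [
        ("10.0.1.15", "SELECT patient_id, visit_date FROM visits WHERE ..."),
        ("10.0.1.15", "INSERT INTO audit_log VALUES (...)"),
        ("10.0.2.20", "GET /index.html HTTP/1.1"),
    ]
    print(likely_database_hosts(sample))  # 10.0.1.15 looks like a database endpoint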

The Data Lake Happened

The advent of the data lake was probably the biggest challenge to SVOT since the invention of the data warehouse or even ETL. It is particularly problematic because it is often not tamed by a centralized data model, contains new, wide, and big data, and extends without locality constraints to the far reaches of the data enterprise.

Data Science Pushes SVOT to the Brink

Further decentralization came about through data science, machine learning, predictive analytics, and AI. These systems not only range across the data ecosystem; their very raison d’être is to seek out interesting data in the darkest corners of some long-forgotten database. Data science also creates new data with its models and ‘enriches’ the existing data, effectively creating wholly new truths and injecting them into the data fabric.

This last onslaught against the SVOT has made many analytics leaders throw up their hands in despair, embrace multiple versions of the truth, and accept that there will be sleepless nights and last-minute data tweaking to make the numbers match those from finance.

But it doesn’t have to be this way. There is hope on the horizon from powerful new tools and the government. Yes, government is here to help.

GDPR Saves the Day

While our quest for SVOT seems to be thwarted at every turn by new technology and ever bigger and wider data, our challenge is not really one of technology. The stumbling block for most companies is generating the political will to get their balkanized internal organizations working together to agree on the importance of SVOT.

We have a clue from finance that some external parental discipline might be helpful. It is not a coincidence that the most heavily regulated department is also the one that oftentimes has the best numbers, and whose truth is most often synonymous with the accepted single version of the truth.

So what if there were an external regulation imposed on companies that provided an immense financial incentive (say, avoiding a fine as high as 4% of a company’s gross worldwide revenue) to have a good understanding of where all their data resided? And what if that regulation also stipulated that information about people (e.g. customers, employees, and partners) had to be organized into a consistent view even if it was scattered across many databases?

If such an external regulation existed, it might give internal organizations the political muscle to make headway on achieving a single version of the truth. The EU’s recently enforced General Data Protection Regulation (GDPR) might just be the external discipline that is needed.

Yes, I know, the GDPR is all about consumer privacy and data rights. But I predict it will also be a catalyst for cleaning up and organizing data in general. Its requirements for data transparency and managing consent will unintentionally force companies to keep track of their data. It will also significantly help contain the spreadmart data deluge and set consistent, controllable data pathways into and out of the data lake that keep it from becoming a data swamp.
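To make that concrete, here is a minimal, hypothetical sketch of the kind of data inventory GDPR pushes companies to maintain (its Article 30 requires records of processing activities): for each data store, what personal data it holds, why, on what legal basis, and for how long. All field names and values below are illustrative assumptions, not prescribed by the regulation.

    from dataclasses import dataclass

    @dataclass
    class DataStoreRecord:
        system: str          # e.g. "crm_postgres"
        personal_data: list  # categories of personal data held
        purpose: str         # why the data is processed
        legal_basis: str     # e.g. "consent", "contract"
        retention: str       # how long the data is kept

    inventory = [
        DataStoreRecord(
            system="crm_postgres",
            personal_data=["name", "email", "purchase_history"],
            purpose="customer relationship management",
            legal_basis="contract",
            retention="7 years after last purchase"),
    ]

    def stores_holding(category: str) -> list:
        """Answer 'where do we keep X?' - the question GDPR forces you to be able to answer."""
        return [r.system for r in inventory if category in r.personal_data]

    print(stores_holding("email"))  # ['crm_postgres']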

In Conclusion: The Eskimo and Parental Discipline

In the history of our quest to find the holy grail of SVOT, we have seen technologies contribute mightily to the problem. If we rolled back time to just one data ‘base’ we could have our single version of the truth - but what fun would that be? It certainly would not provide the agility required by companies seeking to undergo a digital transformation.

Instead of blaming technology, we must recognize that our barriers to the truth have been twofold:

  • Politically and organizationally, we have not provided a strong enough business case to motivate the changes needed to achieve SVOT.
  • We aren’t asking the right questions of our data. “What was my revenue last quarter?” is not a good question. It is ill-formed and not precise enough. Across the enterprise there shouldn’t be multiple versions of the truth, but there are very different definitions of ‘revenue’, each with subtle but important distinctions.

GDPR provides businesses with the financial incentive to better organize their data. It is also going to apply some parental discipline that will make achieving SVOT much more likely.

GDPR will also apply some discipline to senior management, so that when they ask, “What was the revenue?” analytics leaders can fairly ask, “What kind of revenue?” And like the Eskimo who have fifty different words for different types of snow, we will be more precise in our questions, metrics, and KPIs. Our vocabulary will expand, but there will be no doubt that there is a single version of the truth, and we will all speak the same language.

