One Version of the Truth According to My Cousin Vinny
ABSTRACT: Does one version of the truth still make sense? We explore this question about data and analytics, drawing lessons from the film My Cousin Vinny.
What do we want from the truth? We want it to be a factual description of reality. We want it to be simple. While it may take some work to reveal the truth, once we have it, we want there to be one truth—The Truth. However, truth is often a matter of perspective and context, which is why we so often disagree about it. This is especially the case when it comes to data. So does it make sense to expect—even demand—one version of the truth? We’ll explore this question as it relates to data and analytics, drawing on lessons from the 1992 film, My Cousin Vinny.
“One version of the truth” is the holy grail of data and analytics. It promises that with consistent data for all enterprise functions, we’ll get the same answers to critical business questions. However, the promise of one version of the truth still evades us because even with consistent data, the truth is a matter of perspective. Distributed data methodologies such as data mesh threaten to make it more elusive as multiple business domain teams develop and interpret data according to their own points of view.
To understand why the truth is a matter of perspective, consider the film My Cousin Vinny. It provides an entertaining illustration of how different perspectives affect perceptions of truth. It’s about two young men from New York who are mistakenly accused of murder while traveling through rural Alabama. Three eyewitnesses testify under oath that they saw the young men leaving the murder scene in a lime green convertible, leaving tire tracks in their haste. But under cross examination, Vinny, their defense attorney, shows that each witness’ perception was influenced by other factors such as an obstructed view, weakened eyesight, and a poor ability to estimate time.
The witnesses told the truth when they said they saw two young men run out of the store where the murder took place, jump in a lime green convertible, and speed off tires with squealing. Their testimony was a factual description of reality, although an incomplete and imprecise one. (Spoiler Alert! Don’t read the next sentence if you don’t want to know the ending.) Vinny’s research and analysis proved that they saw a different pair of young men in a different lime green convertible flee the scene.
Incomplete and imprecise descriptions of data pose a perennial challenge to most organizations. They often implement a data warehouse to overcome this challenge by transforming data from many sources according to centralized, analytics-optimized models such as a star schema. The theory is that one set of data will produce one version of the truth. This approach works to an extent, but at great cost of time and resources.
But even with conformed data sets we still see different perspectives of the same data. Consider the following sales data for an example company, eCommerce.yo.
Simple enough, right? All the company’s departments can access this common data set from their data warehouse. Yet recently, Finance and Sales came to the same meeting with different numbers.
Like the witnesses in My Cousin Vinny, both departments offer their sincere perspectives of the truth. And like Vinny, we must cross examine them to understand how their perspectives differ. In this case we would find that Finance’s perspective is based on the invoice date because they’re interested in sales revenue. However, Sales is very concerned with commissions, which they base on the sales date. Therein lies the difference between the numbers. One sale was booked in February but invoiced in March aggregating it in different months in their respective reports. Another sale was booked in March but not yet invoiced. So Finance didn’t include it.
Abandon hope for one version of the truth, all ye who enter the modern data environment
eCommerce.yo still has competing versions of the truth that they have to reconcile even after creating a common data set for sales data. Abandon hope for one version of the truth, all ye who enter the modern data environment. Its many formats and distributed sources produce data from a multitude of contexts and perspectives.
And now there’s data mesh that proposes to decentralize analytical data, shifting it toward the business domains that create and use it. According to its concepts of domain ownership and data as a product, many teams now should own and create analytical data. In Cousin Vinny terms, this is like having hundreds of witnesses, viewing hundreds of events, with thousands of influencing factors. To expect one version of the truth from this complex environment is almost absurd.
More complexity makes more work to manage it. That means channeling our inner Vinny to dig into how perspectives of data differ and understand the context that makes a data assertion true. It also means bolstering our practice of data governance to ensure that the terms we use to describe the business are precisely defined. In the example above, precise labels for each report, e.g., YTD Sales Revenue versus YTD Sales Booked, would indicate the difference of perspective between them.
Unfortunately, there are few easy answers when it comes to data and analytics, in spite of how desperately we want them. The quest for one version of the truth is really a cry for help. We want an easy answer about what data to trust. But if you put in the hard work to uncover and understand the nuances of truth like Vinny did, you’ll put your organization on the path to true insight.