Data Governance Part III: Two Worlds of Data Governance: Managing Data from the Top and Bottom
This is the third in a multi-part series on data governance.
In my last column, I described two approaches to creating a common data vocabulary: the executive fiat and the Trojan horse. (See “Part II: How to Create a Common Data Vocabulary”). In this article, I argue that there is not just one world of data governance, but two.
“How can that be?” you might wonder. Isn’t the purpose of data governance to standardize data? And create unique definitions of key data elements and eliminate duplicate data values? (See “Part I: Data’s Evil Twins: Definitions and Duplicates.”) But if you have two worlds of data governance—or two domains in which data is defined and de-duplicated—doesn’t that undermine any standard you define?
On the surface, the answer is yes. However, the two worlds of data governance aren’t conflicting rivals; they are synergistic counterparts. Organizations that want to normalize data and govern it as a corporate asset need to recognize these two worlds, build the requisite processes to support each, and stitch them together into a coherent whole.
Top-down Data Governance
In one world, the traditional or “top-down” world of data governance, an organization creates a data governance program, either formally or informally, to define key data elements and establish processes to manage, change, and update data definitions and ensure the integrity of data outputs. A data governance program consists of various governing bodies, each with different decision rights and responsibilities. These bodies are composed of data owners and subject matter experts who create, apply, and monitor the implementation of data policies, standards, and procedures.
Data dictionary. One important result of a top-down data governance program is a data dictionary that provides a useful guide for business users who want to interrogate the meaning of a data element in a dashboard or report. These dictionaries are also useful to data developers who need to build new applications using the defined data elements.
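To make the idea concrete, here is a minimal sketch of what a data dictionary entry and lookup might look like in Python. Everything in it is an assumption for illustration: the `DataElement` and `DataDictionary` names, the fields chosen, and the "Active Customer" definition are hypothetical, not a description of any particular governance tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DataElement:
    """One entry in a hypothetical data dictionary."""
    name: str
    definition: str
    owner: str           # the accountable data owner
    source_system: str   # where the element originates
    synonyms: List[str] = field(default_factory=list)


class DataDictionary:
    """Looks up governed definitions by canonical name or synonym."""

    def __init__(self) -> None:
        self._entries: Dict[str, DataElement] = {}

    def register(self, element: DataElement) -> None:
        # Index the canonical name and every synonym, case-insensitively.
        for key in [element.name, *element.synonyms]:
            self._entries[key.lower()] = element

    def lookup(self, term: str) -> Optional[DataElement]:
        return self._entries.get(term.lower())


dictionary = DataDictionary()
dictionary.register(DataElement(
    name="Active Customer",  # illustrative definition only
    definition="A customer with at least one purchase in the last 12 months.",
    owner="VP of Sales",
    source_system="CRM",
    synonyms=["Current Customer"],
))
```

A business user who sees "Current Customer" on a dashboard could call `dictionary.lookup("current customer")` and get back the governed definition, its owner, and its system of origin.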
Rework. One problem with the top-down approach is that definitions crafted in the abstract by a data governance committee can be fairly brittle. Data governance boards, and even subject matter experts, can’t possibly know every existing use case for a data element, nor imagine every future one. As a result, top-down definitions often undergo significant alteration and refinement over time, which requires lots of dialogue and negotiation and creates lots of modeling rework at significant expense.
Broken GPS. Another problem with the top-down approach is that the data dictionary doesn’t always provide a useful guide to the underlying data in the data warehouse or other applications and systems. A good dictionary can describe the origin of a data element but not how to navigate all the tables and fields in those systems.
This is a big problem for business analysts, data analysts, and data scientists (i.e., the people who find useful data for predictive modeling), who often get lost in the underlying data when trying to answer new questions that may not be modeled in the data warehouse. In essence, top-down data governance doesn’t give business analysts an effective GPS for navigating detailed data.
Bottom-up Data Governance
The second world of data governance starts where the top-down world leaves off. That is, it provides a detailed map of the underlying tables that make up a data warehouse, most of which are not designed using pristine star or snowflake schemas or annotated with ample metadata. Most data warehouses I’ve seen have hundreds of tables: some staging, some aggregate, and some whose purpose and origin no one remembers anymore!
In this data wilderness, most business analysts struggle to identify the right tables and fields for an analysis. Most rely on tribal knowledge, passed down from one generation of analysts to the next, to guide their queries and extracts. Most cross their fingers when they pull data, hoping it is the right data for the rules governing their analysis. As a result, they never fully trust their data, nor the reports created by other analysts. Intimate knowledge breeds deep distrust.
But what would happen if we could institutionalize the tribal knowledge? Rather than passing knowledge about data by word of mouth, what if business analysts and data scientists documented what they learn as they navigate databases? And what if they collaborated about those discoveries, gradually refining the meaning and usefulness of each field for themselves and future generations of analysts?
Bottom-up data governance uses self-service data profiling, data wrangling, and collaboration tools to crowdsource data definitions. Leaving breadcrumbs behind wherever they go, analysts create a virtual map of data for others to follow and refine. Although the map may be fuzzy and indeterminate at first, over time the outlines and features crystallize into distinct form, just as images downloaded over the internet gradually de-pixelize before your eyes.
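As a rough illustration of how such breadcrumbs might accumulate, the Python sketch below lets analysts attach notes to a table column and endorse each other's notes, with the most endorsed note serving as the working definition. The `FieldCatalog` class, the endorsement rule, and the table and column names are all assumptions made up for this example; real collaboration tools will differ.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Note:
    """A breadcrumb left by one analyst about one column."""
    analyst: str
    text: str
    endorsements: Set[str] = field(default_factory=set)


class FieldCatalog:
    """Hypothetical crowdsourced catalog keyed by (table, column)."""

    def __init__(self) -> None:
        self._notes: Dict[Tuple[str, str], List[Note]] = defaultdict(list)

    def annotate(self, table: str, column: str, analyst: str, text: str) -> Note:
        note = Note(analyst, text)
        self._notes[(table, column)].append(note)
        return note

    def endorse(self, note: Note, analyst: str) -> None:
        # A set means each analyst counts once per note.
        note.endorsements.add(analyst)

    def working_definition(self, table: str, column: str) -> Optional[str]:
        notes = self._notes.get((table, column))
        if not notes:
            return None
        # The note with the most endorsements wins; ties go to the earliest.
        return max(notes, key=lambda n: len(n.endorsements)).text


catalog = FieldCatalog()
first = catalog.annotate("stg_orders", "cust_id", "ana",
                         "Customer key; joins to dim_customer.")
second = catalog.annotate("stg_orders", "cust_id", "ben",
                          "Billing customer, not ship-to; joins to dim_customer.")
catalog.endorse(second, "ana")
catalog.endorse(second, "cho")
```

Here the second, more precise note becomes the working definition of `stg_orders.cust_id` once two analysts endorse it, which is the gradual sharpening of the map described above.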
Summary
Recognizing these two worlds of data governance is the first step toward creating a holistic strategy for managing data that stands the test of time. The next article in this series, “Part IV: Stitching Together the Worlds of Data Governance,” will examine how to weave top-down and bottom-up governance artifacts into a coherent data strategy.