Master Data Management: A Modern Guide for Data Governance Professionals


Master Data Management (MDM) is no shiny object. But like many traditional IT practices, MDM is being severely tested – and rendered all the more strategic – by digitalization and rising data volumes.

MDM arose in the 1990s as a set of practices and tools to create a commonly trusted, consistent, accurate and controlled “master” data set. A master data framework defines permissible values to describe business activities related to products, customers, employees, etc. With master data, enterprises have an authoritative point of reference for the data that drives decisions.

Once you have this framework, you can aggregate your data, standardize it and match values. You can link and synchronize your records to align user and application data with the master set. Data quality tools support MDM frameworks by cleansing and normalizing data and removing errors and duplicate values.
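As a rough illustration of the standardize, validate and de-duplicate steps described above, here is a minimal Python sketch. The permitted country codes, the synonym map, the record fields (`name`, `email`) and all function names are hypothetical and chosen only for illustration; a real data quality tool would apply far richer rules.

```python
import re
from typing import Optional

# Hypothetical master set of permitted country codes (assumption for this sketch).
PERMITTED_COUNTRIES = {"US", "CA", "GB"}

# Hypothetical map from source-system spellings to master values.
COUNTRY_SYNONYMS = {
    "usa": "US",
    "united states": "US",
    "canada": "CA",
    "united kingdom": "GB",
}


def standardize(value: str) -> str:
    """Trim, lowercase, and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", value.strip().lower())


def to_master_country(raw: str) -> Optional[str]:
    """Return the master country code, or None if the value is not permitted."""
    cleaned = standardize(raw)
    code = COUNTRY_SYNONYMS.get(cleaned, cleaned.upper())
    return code if code in PERMITTED_COUNTRIES else None


def deduplicate(records: list) -> list:
    """Keep the first record seen for each standardized (name, email) key."""
    seen = {}
    for rec in records:
        key = (standardize(rec["name"]), standardize(rec["email"]))
        seen.setdefault(key, rec)  # later duplicates are ignored
    return list(seen.values())
```

Values that come back as `None` are the ones a data steward would need to review, since they fall outside the permitted master set.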

This has all become extremely important – and difficult to achieve – in our data-driven age. Without MDM, large enterprises struggle to manage linkages and trends across the business. A data engineer at a large insurance firm recently described his team as “data-rich, but information poor” because each of a dozen recently acquired business units had a slightly different definition of automobile status. Definitions ranged from “new” and “almost new” to “used” and “certified pre-owned.”

So it’s no surprise that MDM is moving from “nice to have” to “must-have” in many CIO budgets. Structured data volumes keep rising; platforms and applications keep sprawling; user groups and workflows keep drifting. Data silos result from intellectual property, security, compliance, and national sovereignty requirements, as well as ongoing mergers and acquisitions. Ad hoc processes, linkages, and piecemeal solutions accumulate over time, with shadow IT looming large. Data quality falls, operational risks rise and compliance suffers. One application or field change can create a cascade of unintended consequences.  

MDM initiatives need to navigate and fix these environments. With a “golden record,” the single trusted version of each master data entity, enterprises can improve efficiency and enable advanced operational and analytics use cases that drive competitive advantage. Data analytics provider Dun & Bradstreet uses MDM to give subscribers an accurate view of commercial entities, drawing on 5 million daily updates from 30,000 sources. MDM underpins both D&B’s own environment and its consulting services to clients.

Best Practices

The following MDM best practices help handle today’s high volumes and varieties of structured data. They are based on the experiences of numerous enterprises, including a top-10 Canadian bank and the Bill and Melinda Gates Foundation, which has $50 billion in assets and 1,500 employees.

  • Treat MDM as more than just technology. While mature vendor tools abound, effective and sustainable data-mastering initiatives require executive sponsorship and organizational commitment to change. Culture, inertia and organizational politics require careful navigation from the outset.  
  • Right-size your data. Like any data-related initiative, MDM is more achievable with smaller datasets. You can consolidate records and eliminate silos with data modernization efforts such as migrations from mainframe to cloud applications. By dropping incorrect or duplicative datasets during the migration, you reduce the administrative and processing workload required for your MDM initiative on the new platform.
  • Take a modular approach. Enterprises can realize “quick wins” by applying MDM first to smaller, simpler and more stable datasets. For example, by applying MDM first to a product information system, your team can demonstrate early results and thereby win support and funding for more complex and dynamic initiatives such as CRM. Many enterprises prefer the modular approach, applying domain-centric solutions such as “Customer 360.” Most are only starting to kick the tires on more ambitious multi-domain MDM initiatives.
  • Give careful thought to record-matching methods. Deterministic matching uses rules to compare attribute values and identify an exact or approximate match once the values are standardized. Probabilistic matching, in contrast, estimates the likelihood of a match, expressed as a similarity score, based on parameters such as how frequently data values appear across many records. A pair of records whose similarity score exceeds a set threshold is deemed a match. Probabilistic approaches, when designed and configured appropriately, often accommodate a wider variety of data more flexibly. Enterprises can select deterministic, probabilistic or heuristic approaches, or a mix of them, depending on their dataset characteristics and vendor-specific capabilities.

  • Consider AI/machine learning – both the pros and cons. Techniques such as probabilistic matching, which are well suited to large and varied datasets, lend themselves to advanced algorithms such as machine learning. Many MDM vendors position their AI and ML capabilities as the only scalable way to master complex modern datasets. Without these capabilities, the vendors say, there are too many schemas to integrate, too many rules to enforce and too many records to match. This is true for some environments, but machine learning introduces complexity of its own. It requires identifying or generating training data, labeling it, fixing errors and continuously refining the ML model for accuracy.
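
To make the record-matching bullet above more concrete, here is a simplified Python sketch contrasting the two styles. It is an illustration under stated assumptions, not any vendor's algorithm: the field names (`last_name`, `city`), the logarithmic frequency weighting and the 0.5 threshold are all hypothetical choices. The deterministic function requires every field to agree after standardization; the scoring function gives more credit for agreeing on a rare value than on a common one and then compares the score to a threshold.

```python
import math
from collections import Counter

FIELDS = ("last_name", "city")  # hypothetical attributes to compare


def normalize(value: str) -> str:
    """Standardize a value: lowercase and collapse whitespace."""
    return " ".join(value.lower().split())


def deterministic_match(a: dict, b: dict, fields=FIELDS) -> bool:
    """Rule-based: every field must be equal after standardization."""
    return all(normalize(a[f]) == normalize(b[f]) for f in fields)


def build_frequencies(records: list, fields=FIELDS) -> dict:
    """Count how often each standardized value appears, per field."""
    return {f: Counter(normalize(r[f]) for r in records) for f in fields}


def value_weight(counts: Counter, value: str, total: int) -> float:
    """Rarer values get a larger weight (illustrative formula, an assumption)."""
    return math.log(total / max(counts[value], 1)) + 1.0


def similarity_score(a: dict, b: dict, freqs: dict, total: int, fields=FIELDS) -> float:
    """Frequency-weighted share of agreeing fields, between 0 and 1."""
    earned = possible = 0.0
    for f in fields:
        va, vb = normalize(a[f]), normalize(b[f])
        # Average the two weights so the score does not depend on argument order.
        weight = (value_weight(freqs[f], va, total) + value_weight(freqs[f], vb, total)) / 2
        possible += weight
        if va == vb:
            earned += weight
    return earned / possible


def probabilistic_match(a, b, freqs, total, threshold=0.5, fields=FIELDS) -> bool:
    """Declare a match when the similarity score exceeds the threshold."""
    return similarity_score(a, b, freqs, total, fields) > threshold
```

In this sketch, two records that share a rare surname but differ on city score higher than two records that share a common surname and differ on city. A production probabilistic matcher would typically estimate its weights from data rather than use a fixed formula, and would also score partial agreement such as near-identical spellings.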

MDM is just one of many old-school technologies that are in greater demand than ever because they help enterprises fight the forces of entropy that were unleashed by exploding data volumes. Designed and implemented effectively, data mastering initiatives can answer the call.

Kevin Petrie

Kevin is the VP of Research at Eckerson Group, where he manages the research agenda and writes about topics such as data integration, data observability, machine learning, and cloud data...
