Active Metadata: The Critical Factor for Mastering Modern Data Management
ABSTRACT: Active metadata is not a type of metadata; it’s a way of using metadata to power processes such as data access management and data quality management.
Active metadata is not a type of metadata; it’s a way of using metadata to power systems. Metadata is active when systems depend on it to operate, rather than when it simply provides documentation. The breadth of systems that now rely on metadata represents a sea change in its importance. Active metadata is a critical feature of modern data architectures such as data fabric and data mesh, and it drives processes such as data access management, data classification, and data quality management.
Let’s look at how active metadata works through the lens of data access management. Data access management (DAM) is the process of defining and enforcing policies that control access to application data throughout an enterprise. It balances two critical needs: protect the organization’s data, and enable greater access to capture its value. Until recently, more data access meant less data protection and more stringent protection meant less data access. However, the modern approach to DAM provides both greater data access and better data protection through fine-grained access controls that are dynamically enforced.
Fine-grained access controls evaluate data at runtime to determine what information a user can access and how it should be displayed. For example, customer email addresses are PII and therefore must be protected. Companies often classify PII by applying a PII tag to attributes in a data catalog. While a tag recorded in a catalog is metadata, it’s not active metadata unless it’s used to automate processes. In this case, a DAM system would use the PII tag on customer email addresses to enforce data access rules dynamically at runtime. This replaces the brittle legacy approach of manually writing code for each data access contingency in each application.
Active Metadata Examples
Let’s dig into this with a detailed example. At Consolidated Products, only certain roles, including regional sales directors, can see customer email addresses. For other roles, email addresses are masked with Xs. The company’s policies state that a regional sales director can see unmasked email addresses for customers within the countries they manage; email addresses for customers from other countries are masked. Figure 1 illustrates how data appears to the US sales director role with unmasked email addresses for US customers and a masked address for the customer from France.
Figure 1. Data Displayed with Fine-grained Access Controls
The metadata in this process is active: the process depends on the PII classification and the role of the user to enforce the company’s policies at the cell level. The process doesn’t just limit the rows of data to those from the US, and it neither exposes nor removes the email data altogether. It evaluates each cell (the intersection of a row and a column) to determine how to display the email address, and it does so dynamically at runtime. If the company adds another sales data source, the customer email addresses will be treated the same way without additional work, as long as they’re classified as PII.
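The cell-level evaluation described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of any particular DAM product: the catalog structure, the role name, the column names, and the sample rows are all assumptions made up for the example.

```python
from dataclasses import dataclass

# Hypothetical catalog metadata: attribute name -> classification tags.
CATALOG_TAGS = {
    "customer_name": set(),
    "country": set(),
    "email": {"PII"},
}


@dataclass(frozen=True)
class User:
    role: str
    countries: frozenset  # countries this user manages


def mask(value: str) -> str:
    """Replace every character with X, as in the article's example."""
    return "X" * len(value)


def can_see_unmasked(user: User, row: dict) -> bool:
    """Assumed policy: regional sales directors see PII only for their countries."""
    return user.role == "regional_sales_director" and row["country"] in user.countries


def render_row(user: User, row: dict) -> dict:
    """Decide cell by cell how to display a row, driven by the catalog tags."""
    rendered = {}
    for column, value in row.items():
        is_pii = "PII" in CATALOG_TAGS.get(column, set())
        if is_pii and not can_see_unmasked(user, row):
            rendered[column] = mask(value)
        else:
            rendered[column] = value
    return rendered


# Made-up sample data for illustration only.
rows = [
    {"customer_name": "Acme Corp", "country": "US", "email": "buyer@acme.example"},
    {"customer_name": "Dupont SA", "country": "FR", "email": "achat@dupont.example"},
]
us_director = User(role="regional_sales_director", countries=frozenset({"US"}))
for row in rows:
    print(render_row(us_director, row))
```

Note that render_row never mentions the email column by name. It consults only the PII tag, so a newly added source with a tagged column would be masked by the same code path.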
Active metadata also plays a part in automatically classifying new data. If Consolidated Products adds a new sales data source, its data catalog or DAM solution will scan the new source, algorithmically identify the email addresses, and automatically tag them as PII. The active metadata in this case is the algorithm’s program code that identifies an email address.
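As a rough sketch of how such a scan might work, the Python below samples the values in each column of a new source and tags the column as PII when nearly all of them look like email addresses. The regular expression is deliberately simplified, and the 90% threshold, function names, and sample source are assumptions for illustration. Real classifiers typically combine several signals.

```python
import re

# Simplified email pattern (an assumption for illustration, not a full validator).
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def looks_like_email_column(values, threshold=0.9):
    """Return True when at least `threshold` of the non-empty values match."""
    non_empty = [v.strip() for v in values if isinstance(v, str) and v.strip()]
    if not non_empty:
        return False
    hits = sum(1 for v in non_empty if EMAIL_RE.match(v))
    return hits / len(non_empty) >= threshold


def classify_source(columns, catalog_tags):
    """Scan sampled column values and add a PII tag to the catalog where needed.

    columns: dict of column name -> list of sampled values
    catalog_tags: dict of column name -> set of tags (updated in place)
    """
    for name, values in columns.items():
        if looks_like_email_column(values):
            catalog_tags.setdefault(name, set()).add("PII")
    return catalog_tags


# Made-up sample of a newly added sales source.
new_source = {
    "contact": ["ana@shop.example", "li@store.example", ""],
    "order_total": ["19.99", "250.00", "7.50"],
}
tags = classify_source(new_source, {})
print(tags)
```

The column here is named contact rather than email on purpose: the classification comes from the values, not from the column name.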
One more example of metadata in action is in data quality management (DQM). DQM solutions use metadata that describes quality specifications for a given attribute to find quality problems in production data. For example, if Consolidated requires a valid customer email address on all sales orders, its DQM tool uses that metadata to identify and act on sales orders that don’t meet the company’s quality standards.
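A minimal sketch of this idea in Python: the quality specification lives in metadata, and a generic checker applies it to production records. The spec format, attribute names, and sample orders are hypothetical, and the email pattern is intentionally simple.

```python
import re

# Hypothetical quality specifications, stored as metadata per attribute.
QUALITY_SPECS = {
    "customer_email": {
        "required": True,
        "pattern": r"^[^@\s]+@[^@\s]+\.[^@\s]+$",
    },
}


def find_violations(record, specs):
    """Return a list of quality problems for one record, driven by the specs."""
    violations = []
    for attribute, spec in specs.items():
        value = record.get(attribute)
        if value is None or value == "":
            if spec.get("required"):
                violations.append(f"{attribute}: missing required value")
            continue
        pattern = spec.get("pattern")
        if pattern and not re.match(pattern, value):
            violations.append(f"{attribute}: does not match expected format")
    return violations


# Made-up sales orders for illustration.
orders = [
    {"order_id": 1, "customer_email": "buyer@acme.example"},
    {"order_id": 2, "customer_email": "not-an-email"},
    {"order_id": 3, "customer_email": ""},
]
flagged = {}
for order in orders:
    problems = find_violations(order, QUALITY_SPECS)
    if problems:
        flagged[order["order_id"]] = problems
print(flagged)
```

Changing the quality standard then means editing the specification rather than the checking code, which is what makes the metadata active.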
The Importance of Active Metadata
In each of these examples (enforcing data access rules, classifying sensitive data, and managing data quality), an automated process using metadata replaces a manual process that could not keep up with the volume and pace of modern data environments. This is why it’s critical for organizations to recognize the importance of their metadata and manage it like data. Just as siloed, duplicated, and contradictory data hobbles a 360-degree view of customers, siloed, duplicated, and contradictory metadata frustrates an organization’s ability to manage its vast and disparate data environment.
As critical dependencies on active metadata continue to grow, so too do the forces that make metadata harder to manage. In the examples above, we referred to three different types of tools, each of which has its own metadata store with overlapping metadata attributes (see Figure 2).
Figure 2. Overlapping Metadata in Separate Tools
When organizations implement these tools separately, especially as point solutions to address the crisis of the moment, they create metadata silos and thus technical debt. However, when organizations implement the tools as part of an overall data strategy that includes metadata, they can plan for integrations that keep metadata in sync across active metadata functions. This proactive approach avoids the technical debt that comes with siloed, duplicated, and contradictory metadata.
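To make the cost of contradictory metadata concrete, here is a small Python sketch that compares the overlapping attributes held by three separate tools and reports where they disagree. The store layout, tool names, and property names are assumptions for illustration. A real integration would read each tool's metadata through whatever export or API it offers.

```python
def find_conflicts(stores):
    """Find metadata properties whose values differ between tools.

    stores: dict of tool name -> {attribute: {property: value}}
    Returns {(attribute, property): {tool: value}} for each disagreement.
    Values are assumed to be hashable (strings, booleans, numbers).
    """
    seen = {}
    for tool, attributes in stores.items():
        for attribute, props in attributes.items():
            for prop, value in props.items():
                seen.setdefault((attribute, prop), {})[tool] = value
    return {
        key: by_tool
        for key, by_tool in seen.items()
        if len(set(by_tool.values())) > 1
    }


# Made-up metadata held separately by three hypothetical tools.
stores = {
    "data_catalog": {"customer.email": {"classification": "PII", "required": True}},
    "dam": {"customer.email": {"classification": "PII"}},
    "dqm": {"customer.email": {"classification": "none", "required": True}},
}
conflicts = find_conflicts(stores)
for (attribute, prop), by_tool in conflicts.items():
    print(attribute, prop, by_tool)
```

In this made-up case the DQM tool disagrees about the classification of the customer email attribute. A planned integration would surface or prevent that kind of drift.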
The promise of active metadata is to power automated functions that are no longer viable as manual processes, such as data access management, data classification, and data quality management. To get the results we need from metadata, we have to treat it with the respect it deserves. In an upcoming blog post, we’ll look at architectures and best practices for integrating metadata across application silos to fully leverage the benefits of active metadata.