Metadata describes the properties of data. It includes attributes such as the names, locations, structures, schemas, ownership, lineage, and usage of the data.
It seems that everyone wants data management but few care about metadata. Just as you need data about finances for effective financial management, you need data about data (metadata) for effective data management. You can’t manage data without metadata. The data catalog has become the new gold standard for metadata and a cornerstone of data curation. Metadata is the core of a data catalog. Every catalog collects data about the data inventory and also about the processes, people, and platforms related to data. Metadata tools of the past collected business, process, and technical metadata, and data catalogs continue that practice.
Metadata is an essential output of data pipelines. In addition to producing data for delivery to a destination, pipelines must also produce the metadata that describes data provenance, lineage, and quality. These types of metadata are fundamental to delivering trusted data. The scope and types of metadata expand when pipeline automation is metadata driven. Execution schedules, for example, become essential metadata for orchestration. Similarly, schema metadata becomes critical when AI/ML is used to detect and adapt to schema changes.
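As a minimal sketch of a pipeline that produces metadata alongside its data, the Python below has one step return both the cleaned rows and a record of provenance, lineage, and basic quality counts, and then compares the observed schema with a previously recorded one. Everything named here (`RunMetadata`, `run_step`, `schema_changes`, the `orders.csv` source, and the sample rows) is a hypothetical illustration, not the API of any particular catalog or pipeline tool. The schema comparison is a plain rule-based diff; it stands in for the change detection step and is not an AI/ML method.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class RunMetadata:
    """Hypothetical metadata record emitted with a pipeline's data output."""
    source: str        # provenance: where the data came from
    steps: list        # lineage: transformations applied, in order
    row_count: int     # quality: number of rows delivered
    null_counts: dict  # quality: missing values per column
    schema: dict       # column name -> type name (None if never observed)
    produced_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def infer_schema(rows):
    """Map each column to the type name of its first non-null value."""
    schema = {}
    for row in rows:
        for column, value in row.items():
            if schema.get(column) is None:
                schema[column] = None if value is None else type(value).__name__
    return schema


def run_step(rows, source):
    """Drop rows without an id and return (data, metadata) together."""
    cleaned = [r for r in rows if r.get("id") is not None]
    columns = sorted({c for r in cleaned for c in r})
    null_counts = {c: sum(1 for r in cleaned if r.get(c) is None) for c in columns}
    meta = RunMetadata(
        source=source,
        steps=["drop_rows_without_id"],
        row_count=len(cleaned),
        null_counts=null_counts,
        schema=infer_schema(cleaned),
    )
    return cleaned, meta


def schema_changes(previous, current):
    """Report columns that were added, removed, or changed type."""
    shared = set(previous) & set(current)
    return {
        "added": sorted(set(current) - set(previous)),
        "removed": sorted(set(previous) - set(current)),
        "retyped": sorted(c for c in shared if previous[c] != current[c]),
    }


# Example run with made-up rows and a made-up previously recorded schema.
rows = [
    {"id": 1, "amount": 9.5, "region": "EU"},
    {"id": None, "amount": 3.0, "region": "US"},
    {"id": 2, "amount": None, "region": "US"},
]
data, meta = run_step(rows, source="orders.csv")
drift = schema_changes({"id": "int", "amount": "str"}, meta.schema)
```

Returning the metadata record from the same call that returns the data keeps the two from drifting apart; a real pipeline would more likely write the record to a catalog or metadata store, which is outside the scope of this sketch.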
Metadata is the data that is needed to manage data. Just as data about finance is needed to manage financial assets, data about data is needed to manage data assets. Metadata is used to classify and describe data, and to guide and control the use of data. Data management is complex, and metadata is essential to it. But metadata itself can be challenging and needs to be managed. Metadata exists in many forms, including data glossaries, data catalogs, data models, data profiles, semantic layers, taxonomies, and ontologies. Data governance has a role in managing metadata quality, completeness, and consistency.