A metadata repository used to find, document, and evaluate data assets such as tables, schemas, definitions, visualizations, queries, and models.
Added Perspectives
A data catalog is a collection of metadata, combined with data management and search tools, that helps analysts and other data users find the data they need. The catalog serves as an inventory of available datasets and provides the information needed to evaluate each dataset's fitness for its intended use. Data cataloging is a curation activity that includes collecting, verifying, and publishing metadata about the datasets that are available for analysis and reporting. Comprehensive metadata is collected by combining data analysis, machine learning, and crowd-sourced knowledge.
With a data catalog, data analysts can search for relevant data, profile that data, understand its lineage and quality, and see who else in the organization is using it, for what purpose, and what those users think of it. In effect, a catalog creates a data inventory (or data marketplace) of all relevant and available data for analysis. Many companies now also use catalogs to curate and govern data.
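To make the inventory idea concrete, here is a minimal sketch of a catalog as an in-memory Python data structure. Everything in it is hypothetical: the `CatalogEntry` fields and the `DataCatalog` class with its `publish`, `search`, and `lineage` methods are illustrative names, not the API of any catalog product. Real catalogs harvest metadata automatically and keep it in a database with a search index; here a dictionary and a substring match stand in for both.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CatalogEntry:
    """Metadata record for one data asset (hypothetical schema)."""
    name: str
    description: str
    tags: set[str] = field(default_factory=set)
    upstream: list[str] = field(default_factory=list)  # names of source assets
    quality_score: Optional[float] = None  # e.g. share of rows passing checks
    users: dict[str, str] = field(default_factory=dict)  # user -> purpose


class DataCatalog:
    """Minimal in-memory inventory of dataset metadata."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def publish(self, entry: CatalogEntry) -> None:
        # Curation step: verify required metadata before publishing it.
        if not entry.name or not entry.description:
            raise ValueError("name and description are required")
        self._entries[entry.name] = entry

    def search(self, keyword: str) -> list[str]:
        # Case-insensitive match against names, descriptions, and tags.
        kw = keyword.lower()
        return sorted(
            e.name
            for e in self._entries.values()
            if kw in e.name.lower()
            or kw in e.description.lower()
            or any(kw in t.lower() for t in e.tags)
        )

    def lineage(self, name: str) -> list[str]:
        # Walk upstream links depth-first to list every ancestor asset once.
        seen: list[str] = []
        stack = list(self._entries[name].upstream)
        while stack:
            parent = stack.pop()
            if parent in seen:
                continue
            seen.append(parent)
            if parent in self._entries:
                stack.extend(self._entries[parent].upstream)
        return seen
```

For example, after publishing a `raw_orders` table, an `orders_clean` table derived from it (both tagged `sales`), and a `revenue_daily` report derived from `orders_clean`, `catalog.search("sales")` would return `["orders_clean", "raw_orders"]` and `catalog.lineage("revenue_daily")` would return `["orders_clean", "raw_orders"]`.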
Data catalogs gather and store metadata about data assets. Data analysts use catalogs to find and evaluate sources for their analyses, while data stewards and curators use them to govern and manage datasets and other artifacts. Although these functions are important, they make up only two steps in the question-to-answer workflow.