A traceable map of how a given data asset is created, used, and changed, akin to a family tree for data. Data lineage helps ensure data quality, facilitates DataOps, and assists in maintaining compliance.
Added Perspectives
Data lineage is a core data governance consideration. The ability to trace data from its original source, through intermediate processing, to final analysis and reporting is a key component of trusted data. It is also valuable for change management, impact analysis, troubleshooting, and problem-solving.
The data lineage process chronicles the ‘flow’ of data through an organization from its origin points, generally a customer, patient, sensor, or outside source, through its use and consumption at various stages along the ‘data life cycle’. This life cycle illustrates all of the points along the way at which data is accessed, used, reported, changed, or used to derive new data.
Data lineage uses the same activity-logging metadata that data protection audits use. Capturing the step-by-step activities of accessing, changing, and publishing data supports bidirectional lineage tracing. Two-way tracing of data lineage is essential to sustain a dynamic, useful, and trusted analytics environment. Tracing published data back to its origins is important in building trust. Whenever questions arise about analytics outcomes and the data used to derive those outcomes, the answers are found by tracing data lineage. Tracing in the opposite direction, from source data forward to what is published, is important when data sources change and you need to perform impact analysis.
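
As a rough illustration of two-way tracing, the sketch below treats an activity log as a list of (source asset, activity, derived asset) records and walks it in either direction. The log format, the asset names, and the helper functions `build_graph` and `trace` are all hypothetical, chosen only for this example; real lineage tools capture and store this metadata in their own ways.

```python
from collections import defaultdict, deque

# Hypothetical activity log: (source asset, activity, derived asset).
# All asset names are invented for illustration.
ACTIVITY_LOG = [
    ("crm_export", "load", "customers_raw"),
    ("sensor_feed", "load", "readings_raw"),
    ("customers_raw", "clean", "customers_clean"),
    ("customers_clean", "join", "usage_by_customer"),
    ("readings_raw", "join", "usage_by_customer"),
    ("usage_by_customer", "publish", "monthly_usage_report"),
]


def build_graph(log):
    """Index the log in both directions so either trace is a simple walk."""
    upstream = defaultdict(set)    # derived asset -> assets it was built from
    downstream = defaultdict(set)  # source asset -> assets derived from it
    for src, _activity, dst in log:
        upstream[dst].add(src)
        downstream[src].add(dst)
    return upstream, downstream


def trace(asset, edges):
    """Breadth-first walk returning every asset reachable from `asset`."""
    seen = set()
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for neighbor in edges.get(current, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


upstream, downstream = build_graph(ACTIVITY_LOG)

# Backward trace: which assets fed the published report?
origins = trace("monthly_usage_report", upstream)

# Forward trace (impact analysis): what is affected if the sensor feed changes?
impacted = trace("sensor_feed", downstream)

print(sorted(origins))
print(sorted(impacted))
```

Indexing the same log twice, once per direction, keeps both questions cheap to answer: the backward walk supports trust and troubleshooting, and the forward walk supports impact analysis.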