The process of identifying duplicate records without common identifiers within or across systems (matching) and reconciling them (merging).
Give careful thought to record-matching methods. The deterministic approach to matching record values uses rules to compare attribute values and identify either an exact match or approximate match after values are standardized. Probabilistic matching, in contrast, estimates the likelihood of a match (i.e., similarity score) based on parameters such as the frequency with which data values appear across many records. Similarity scores above a certain threshold for a given pair of records are deemed to be a match. Probabilistic approaches, when designed and configured appropriately, often more flexibly accommodate higher varieties of data. Enterprises can select deterministic, probabilistic or heuristic approaches, or a mix of them depending on their dataset characteristics and vendor-specific capabilities.