Title | Distance-based and probabilistic record linkage for re-identification of records with categorical variables |
Publication Type | Conference Paper |
Year of Publication | 2002 |
Authors | Domingo-Ferrer J, Torra V |
Conference Name | Butlletí de l'ACIA |
Volume | 28 |
Publisher | Associació Catalana d'Intel·ligència Artificial |
Pagination | 243-250 |
Abstract | Record linkage methods identify the presence of the same individual in different data files (re-identification). This paper studies and compares the two main existing approaches to record linkage, probabilistic and distance-based, when the data are categorical. To that end, a distance over ordinal and nominal scales is defined. The paper shows that, for categorical data, distance-based and probabilistic record linkage lead to similar results. This parallels comparisons in the literature for numerical data, which also found that the two approaches behave similarly. As a consequence, the distance proposed for ordinal and nominal scales is implicitly validated. |
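
The abstract does not state the distance it defines, so the following is only a minimal sketch of distance-based record linkage over categorical data under assumed definitions: a 0/1 distance for nominal values and a rank difference normalized to [0, 1] for ordinal values. The function names, the summed record distance, and the toy records are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def nominal_distance(a: str, b: str) -> float:
    """Assumed nominal distance: 0 if the categories match, 1 otherwise."""
    return 0.0 if a == b else 1.0


def ordinal_distance(a: str, b: str, scale: Sequence[str]) -> float:
    """Assumed ordinal distance: rank difference scaled to [0, 1]."""
    if len(scale) < 2:
        return 0.0
    return abs(scale.index(a) - scale.index(b)) / (len(scale) - 1)


def record_distance(r: Dict[str, str], s: Dict[str, str],
                    ordinal_scales: Dict[str, Sequence[str]]) -> float:
    """Sum attribute distances; attributes without a scale count as nominal."""
    total = 0.0
    for name, value in r.items():
        if name in ordinal_scales:
            total += ordinal_distance(value, s[name], ordinal_scales[name])
        else:
            total += nominal_distance(value, s[name])
    return total


def link(file_a: List[Dict[str, str]], file_b: List[Dict[str, str]],
         ordinal_scales: Dict[str, Sequence[str]]) -> List[int]:
    """For each record in file_a, return the index of its nearest record in file_b."""
    return [
        min(range(len(file_b)),
            key=lambda j: record_distance(r, file_b[j], ordinal_scales))
        for r in file_a
    ]


# Hypothetical toy data: an original file and a masked, shuffled copy.
scales = {"education": ["primary", "secondary", "tertiary"]}
original = [
    {"education": "primary", "city": "Reus"},
    {"education": "tertiary", "city": "Tarragona"},
]
masked = [
    {"education": "secondary", "city": "Tarragona"},
    {"education": "primary", "city": "Reus"},
]
links = link(original, masked, scales)
```

Here each original record is linked to its closest masked record, so `links[i]` is the position in `masked` proposed as the re-identification of `original[i]`. A probabilistic linker would instead score agreement patterns between record pairs; that variant is not sketched here.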