Published on IIIA CSIC (http://www2.iiia.csic.es)

Distance-based and probabilistic record linkage for re-identification of records with categorical variables

Title: Distance-based and probabilistic record linkage for re-identification of records with categorical variables
Publication Type: Conference Paper
Year of Publication: 2002
Authors: Domingo-Ferrer J [1], Torra V [2]
Conference Name: Butlletí de l'ACIA
Volume: 28
Publisher: Associació Catalana d'Intel·ligència Artificial
Pagination: 243-250
Abstract

Record linkage methods identify the presence of the same individual in different data files (re-identification). This paper studies and compares the two main existing approaches to record linkage: probabilistic and distance-based. The performance of both approaches is compared on categorical data. To that end, a distance over ordinal and nominal scales is defined. The paper shows that, for categorical data, distance-based and probabilistic record linkage lead to similar results. This parallels comparisons in the literature for numerical data, which also found that the two approaches behave similarly. As a consequence, the distance proposed for ordinal and nominal scales is implicitly validated.
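
To make the idea of distance-based record linkage over categorical data concrete, the following is a minimal Python sketch. It is an illustration under stated assumptions and not the paper's method: the abstract does not give the distance definition, so the 0/1 distance for nominal attributes and the normalised rank difference for ordinal attributes are assumed forms, and the attribute names, scales, and function names are hypothetical.

```python
# Hypothetical schema, not taken from the paper: an attribute is either
# ordinal (listed here with its ordered categories) or nominal (anything else).
ORDINAL_SCALES = {
    "education": ["primary", "secondary", "bachelor", "master", "doctorate"],
}


def attribute_distance(attr, a, b):
    """Distance in [0, 1] between two categories of one attribute (assumed form)."""
    if attr in ORDINAL_SCALES:
        scale = ORDINAL_SCALES[attr]
        # Assumption: rank difference divided by the largest possible difference.
        return abs(scale.index(a) - scale.index(b)) / (len(scale) - 1)
    # Assumption: nominal categories are either identical (0) or different (1).
    return 0.0 if a == b else 1.0


def record_distance(r1, r2):
    """Sum of the per-attribute distances over the attributes of r1."""
    return sum(attribute_distance(k, r1[k], r2[k]) for k in r1)


def link(file_a, file_b):
    """For each record in file_a, return the index of its nearest record in file_b."""
    return [
        min(range(len(file_b)), key=lambda j: record_distance(ra, file_b[j]))
        for ra in file_a
    ]
```

A record in one file is linked to the closest record in the other file; a link counts as a correct re-identification when both records belong to the same individual. The probabilistic approach mentioned in the abstract is not sketched here.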


Source URL: http://www2.iiia.csic.es/en/node/55613

Links
[1] http://www2.iiia.csic.es/en/staff/josep-domingo-ferrer
[2] http://www2.iiia.csic.es/en/staff/vicen%C3%A7-torra