Title: Data Mining Methods for Linking Data Coming from Several Sources
Publication Type: Conference Paper
Year of Publication: 2003
Authors: Torra V, Domingo-Ferrer J, Torres À
Editor: Statistics O
Conference Name: The 3rd Joint UN/ECE-Eurostat Work Session on Statistical Data Confidentiality
Volume: Working Pa

Statistical offices face the problem of multiple-database data mining for at least two reasons. On the one hand, there is a trend to avoid collecting data directly from respondents and to build statistical data from administrative sources instead; such sources are typically diverse and scattered across several administrative levels. On the other hand, intruders may attempt to disclose confidential statistical data using the same approach, i.e. by linking whatever databases they can obtain. This paper discusses issues related to multiple-database data mining, with a special focus on a method for linking records across databases that do not share any variables.