WP.3
ENGLISH ONLY

UNITED NATIONS STATISTICAL COMMISSION and ECONOMIC COMMISSION FOR EUROPE
CONFERENCE OF EUROPEAN STATISTICIANS

EUROPEAN COMMISSION
STATISTICAL OFFICE OF THE EUROPEAN COMMUNITIES (EUROSTAT)

Joint UNECE/Eurostat work session on statistical data confidentiality
(Manchester, United Kingdom, 17-19 December 2007)

Topic (i): Microdata

MICRODATA SHARING VIA PSEUDONYMIZATION

Invited Paper

Prepared by David Galindo, University of Malaga, Spain and Eric R. Verheul, Radboud University Nijmegen, PricewaterhouseCoopers Advisory, Netherlands

Microdata sharing via pseudonymization

David Galindo, Eric R. Verheul

Department of Computer Science, University of Malaga, Spain. (dgalindo@lcc.uma.es)
Institute for Computing and Information Sciences, Radboud University Nijmegen, The Netherlands. (eric.verheul@cs.ru.nl)
PricewaterhouseCoopers Advisory, The Netherlands.

Abstract. Individual data records are essential for empirical research, and yet due to the very precious information they contain, their release poses a problem to the confidentiality of the individuals concerned. In this paper we give a high-level description of a privacy-preserving microdata sharing system wherein subjects' identifiers are replaced by cryptographic pseudonyms. The resulting system facilitates information sharing between organizations that typically are not allowed to exchange the microdata they own.

1 Introduction

Individual data records are essential for empirical research, and yet due to the very precious information they contain, their direct release compromises the confidentiality of the individuals concerned. The fact that research is interested in collective features rather than individual distinctiveness makes it possible to reconcile data utility and individual confidentiality: data identifiers can
be removed or encoded and data fields can be modified by means of statistical disclosure controls, while the overall collective features of the resulting de-identified data are preserved.

Microdata comes from heterogeneous sources, such as statistical offices, hospitals or insurance companies, to name a
few. A number of parties, referred to here as Researchers, are interested in getting access to this data for economic or research purposes. In the case of national statistical offices, Researchers generally face two modes of access: either access to the microdata is granted on the premises of the national statistical authorities, or the microdata is anonymized and released to Researchers under certain conditions. In both cases, the original data has been modified to preclude the direct identification of the subjects.

The aim of this paper is to describe privacy-preserving microdata sharing systems obtained by replacing subjects' identifiers with pseudonyms that have special mathematical and cryptographic properties. The pseudonymizing system is controlled by a Trusted Third Party (TTP), and no party in the scheme (except the TTP) can re-identify the individuals from the pseudonyms. Still, natural
set operations between different pseudonymized databases, like database union and intersection, are supported. These operations allow for flexible research on individuals' personal data that resides at different organizations that typically do not share information.

2 Pseudonymous data sharing

Consider a
database consisting of entries of the form (id, D(id)), where id is the identifier field (also called identity) and D(id) is the data field. A pseudonymized database is obtained by replacing the identity id in the database entries by a blinded identifier P(id,O), called a pseudonym. Ideally, the blinded identifier P(id,O) does not leak any information on the identity id. The individual with identity id is only known to the organization O by its pseudonym P(id,O), and the key property is that the organization O is not able to link together P(id,O) and id (under certain cryptographic assumptions). This property is called pseudonymity.

Pseudonymized databases with the above properties provide a virtually unexplored tool for building privacy-preserving information sharing systems. Roughly speaking, an information sharing system is called privacy-preserving if no information is leaked on individuals' identities. We stress that the latter is interpreted from a strict cryptographic point of view, that is, the qualification privacy-preserving refers to the cryptographic techniques used for pseudonymization and related operations, since from a global point of view privacy-preserving pseudonymized microdata sharing systems likely do not exist. The reason is simple: even though the data is pseudonymized, there is the risk that the characteristics of the data single out a person, e.g. by a combination of profession, age and place of residence. This risk of indirect identification, cf. [6, 3], becomes even larger when linking several pseudonymized databases, which is one of our targets. The issue of indirect identification is outside the scope of