ipums-europe: confidentiality measures for licensing and disseminating restricted-access census...


Post on 21-Dec-2015

TRANSCRIPT

Page 1: IPUMS-Europe: Confidentiality measures for licensing and disseminating restricted-access census microdata extracts

IPUMS-Europe: Confidentiality measures for licensing and disseminating restricted-access census microdata extracts
https://www.ipums.org/international

Robert McCaa, Minnesota Population Center
Albert Esteve, Centre d’Estudis Demogràfics

[email protected], [email protected]

“Inadequate use of microdata has high costs”
--Len Cook (2003)

Page 2

Outline: IPUMS-International Confidentiality Measures

1. Introduction: What is IPUMSi (5 slides)
2. Disseminating anonymized, integrated extracts (3 slides)
3. IPUMS-International confidentiality protections
    » Legal (3 slides)
    » Administrative (3 slides)
    » Technical (6 slides)
4. Conclusions (2 slides)

Page 3

1. Introduction: What is IPUMS-International? (7 slides)

Page 4

IPUMS-International is… a global collaboratory of National Statistical Institutes & Universities to:

» 1. Inventory the world’s census microdata
» 2. Preserve endangered microdata and documentation
» 3. Integrate census microdata
    » a. use standards of UNSD, Eurostat, ISCO, ISCED, etc.
    » b. facilitate comparative research in time and space
» 4. Anonymize census microdata to preserve statistical confidentiality, using highest standards
» 5. Disseminate restricted-access, custom extracts to approved researchers at no cost

Page 5

IPUMS-International, October 2005 (world map, Mollweide Projection)

dark green = disseminating
green = harmonizing
lightest green = negotiating

Page 6

Available now: (see Table 1)
https://www.ipums.org/international

Page 7

Table 1. IPUMS-International consortium members, November 2005

Region | Official Statistics Authority
Africa | Cameroon, Egypt, The Gambia, Kenya, South Africa, Uganda
IPUMS-Latin America | Argentina, Bolivia, Brazil, Canada, Chile, Colombia, Costa Rica, Dominican Republic, Ecuador, El Salvador, Guatemala, Honduras, Mexico, Nicaragua, Panama, Paraguay, Peru, Uruguay, USA, Venezuela
IPUMS-Global (Asia) | Armenia, Bangladesh, Cambodia, China, Fiji, Indonesia, Iraq, Israel, Malaysia, Mongolia, Pakistan, Palestinian Authority, Philippines, Turkmenistan, Vietnam
IPUMS-Europe | Austria, Belarus, Bulgaria, Czech Republic, France, Germany, Greece, Hungary, Netherlands, Portugal, Romania, Slovenia, Spain, the United Kingdom (pending: Ireland, Italy, Poland, Russia, Switzerland and Turkey)

Page 8

IPUMS-Europe: (Table 2). By July 2005, at the 1st workshop, 9 countries entrusted 28 datasets to the project (bolded); 2nd workshop in 2006; first release in 2007

Page 9

2. Disseminating Anonymized, Integrated Extracts (3 slides)

Page 10

IPUMSi integration principles

» 1. Respect absolute anonymity
» 2. Preserve all original data, except adjustments to assure confidentiality (top codes, blurring, masking, re-ordering, etc.)
» 3. Harmonize codes for countries
    occupation: ISCO, HISCO (detailed, general)
    education: ISCED (detailed, general)
    family: IPUMS, etc. (detailed, general)
» 4. Enhance with constructed variables

Page 11

6 steps using https://www.ipums.org/international:

1. Logon w/ password
2a. Study documentation
2b. Design extract
3. Receive email; logon with p/word
4. Download extract (SSL encrypted)
5. UnZip data (also SAS, STATA)
6. Analyze

Page 12

Data Dissemination: web-based extraction system

» Password protected: to make and retrieve extracts
» Researcher selects:
    » Countries,
    » Censuses,
    » Cases/sub-populations,
    » Variables, and
    » Sample densities
» Extract engine queues request, generates extract
» Researcher retrieves extract via web with SSL 128-bit encryption
» NO: CDs, original codes, or complete datasets
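The request-and-queue flow on this slide can be illustrated with a minimal Python sketch. It is not the IPUMS extract engine: the class, function, and field names (`ExtractRequest`, `build_extract`, `run_queue`, `sample_density`, and the record keys) are hypothetical, and sample density is approximated here by keeping every k-th matching record.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ExtractRequest:
    """One researcher's extract definition (all field names are hypothetical)."""
    user: str
    countries: list
    censuses: list                 # census years
    variables: list
    subpopulation: dict = field(default_factory=dict)  # e.g. {"sex": "female"}
    sample_density: float = 1.0    # fraction of matching records to keep (> 0)


def build_extract(request, records):
    """Select matching cases, thin them, and keep only the requested variables."""
    rows = [
        r for r in records
        if r["country"] in request.countries
        and r["year"] in request.censuses
        and all(r.get(k) == v for k, v in request.subpopulation.items())
    ]
    # Systematic subsample: keep every k-th matching record.
    step = max(1, round(1 / request.sample_density))
    return [{v: r[v] for v in request.variables} for r in rows[::step]]


def run_queue(queue, records):
    """Process queued requests in arrival order; return extracts keyed by user."""
    done = {}
    while queue:
        request = queue.popleft()
        done[request.user] = build_extract(request, records)
    return done
```

A request is appended to a `deque`, and `run_queue` drains it in first-in, first-out order. The researcher only ever receives the selected variables for the selected cases, never a complete dataset, which mirrors the "NO: complete datasets" rule on the slide.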

Page 13

3. Confidentiality Protections (15 slides)

“There has been no known attempt at identification with the 1991 SARs [microdata samples of the UK]-nor in any other countries that disseminate samples of microdata”
--Elliott and Dale, Journal of the Royal Statistical Society, 1999

Page 14

3 kinds of confidentiality protections:

1. Legal: Dissemination agreement between University of Minnesota and each National Statistical Institute
    » Uniform 11 point Memorandum of Understanding regarding: ownership, use, authorization, restrictions, confidentiality, security, publication, violations, sharing, arbitration, and order of precedence
2. Administrative: conditional use license between the University of Minnesota and each researcher
    » Permission to use restricted access microdata, 3 criteria: research need, research competence, and agreement to abide by conditions of use license
3. Technical data protection measures
    » Specific to each country …/

Page 15

Legal: OSI and U. Minnesota

Page 16

Legal: OSI and U. Minnesota (2001-4)

Page 17

Legal: OSI and U. Minnesota (2005+)

Page 18

3 kinds of confidentiality protections:

1. Legal: Dissemination agreement between University of Minnesota and each National Statistical Institute
    » Uniform 11 point Memorandum of Understanding regarding: ownership, use, authorization, restrictions, confidentiality, security, publication, violations, sharing, arbitration, and order of precedence
2. Administrative: conditional use license between the University of Minnesota and each researcher
    » Permission to use restricted access microdata, 3 criteria: research need, research competence, and agreement to abide by conditions of use license
3. Technical data protection measures
    » Specific to each country …/

Page 19

IPUMSi DISSEMINATES

Legally-binding license agreement
» protects privacy and confidentiality
» assures proper use
» forces snoopers to violate law

Access limited to:
» Bona-fide researchers (credentials)
» With a demonstrated scientific need
» who agree to abide by license restrictions
    » Confidentiality
    » No redistribution
    » Safely secured

Restricted Access web-based system

Page 20

User Conditions of Use License

Page 21

Conditions of Use License (Appendix B)

Page 22

Conditions of Use License (Appendix B)

Page 23

3 kinds of confidentiality protections:

1. Legal: Dissemination agreement between University of Minnesota and each National Statistical Institute
    » Uniform 11 point Memorandum of Understanding regarding: ownership, use, authorization, restrictions, confidentiality, security, publication, violations, sharing, arbitration, and order of precedence
2. Administrative: conditional use license between the University of Minnesota and each researcher
    » Permission to use restricted access microdata, 3 criteria: research need, research competence, and agreement to abide by conditions of use license
3. Technical data protection measures
    » Specific to each country …/

Page 24

IPUMSi ANONYMIZES

» Suppress geographical detail
» Blur/aggregate sensitive codes
» Convert dates to ages (blur key vars.)
» Swap cases between districts
» Scramble records

Technical measures are also applied, in addition to the legal & administrative protections
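Three of the measures listed above (converting dates to ages, suppressing geographical detail, and scrambling records) can be sketched in a few lines of Python. This is an illustrative sketch only, not the project's procedure: the record keys (`birth_date`, `sex`, `province`, `district`) and the function names are assumptions, and case swapping and code blurring are not shown.

```python
import random
from datetime import date


def age_at_census(birth, census_day):
    """Completed years of age on census day; the birth date itself is not kept."""
    had_birthday = (census_day.month, census_day.day) >= (birth.month, birth.day)
    return census_day.year - birth.year - (0 if had_birthday else 1)


def anonymize(records, census_day, seed=None):
    """Return new records without birth date or district, in shuffled order."""
    out = []
    for r in records:
        out.append({
            "age": age_at_census(r["birth_date"], census_day),
            "sex": r["sex"],
            "province": r["province"],  # coarse unit kept; "district" is dropped
        })
    # Scramble so that file order carries no information about enumeration order.
    random.Random(seed).shuffle(out)
    return out
```

Building new dictionaries, rather than deleting keys in place, leaves the confidential source records untouched and makes it explicit which fields are allowed into the output.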

Page 25

EUROSTAT statistical confidentiality standards (Thorogood, 1999) --all endorsed by IPUMS-International

» 1. Restrict access to samples
» 2. Limit geographical detail
» 3. Re-code unique categories--top and bottom
» 4. Sign non-disclosure agreement
» 5. Prohibit redistribution to third parties
» 6. Prohibit attempts to identify individuals or the making of any claim to that effect
» 7. Require users to provide copies of publications

Page 26

EUROSTAT statistical confidentiality standards (Thorogood, 1999) --all endorsed by IPUMS-International

• 8. Construct age from birthdate, if necessary
• 9. Do not identify date of birth
• 10. Do not identify precise place of birth
• 11. Migration: timing/place not identified in detail
• 12. Identify place of residence by major civil division (pop>20k, 60k, 100k, 1 million, i.e., national convention)
• 13. Do sensitivity analysis
• 14. Do confidentiality assessment
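Standard 3 (re-coding unique categories at the top and bottom) can be illustrated with a short Python sketch. The functions and the rule for choosing the cut-off are assumptions made for illustration; the slides do not say how cut-offs are chosen for any country.

```python
def choose_top_code(values, min_cases):
    """Largest cut-off c such that at least min_cases values are >= c."""
    ordered = sorted(values, reverse=True)
    if len(ordered) < min_cases:
        raise ValueError("fewer values than min_cases")
    return ordered[min_cases - 1]


def top_bottom_code(values, lower, upper):
    """Clamp each value into [lower, upper] so extreme values share one code."""
    return [min(max(v, lower), upper) for v in values]
```

For example, with ages `[23, 35, 35, 67, 88, 91, 102]` and `min_cases=3`, the chosen top code is 88, so the three oldest persons all appear as 88 and none of them is unique on age.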

Page 27

Anonymization example: Kenya, 1989

Kenya: Anonymization Based on Unique Characteristics Threshold (50,000 for geographic variables; 10,000 for other variables)

Type | Procedure | Variable Name
Key | Suppressed | Division, Location, Sublocation, Enumeration area, Tribe/Ethnicity
Key | Aggregated | 50,000 minimum: Province, District of Residence, Birth and Past Residence
Key | None | Sex, Marital Status, Relationship to Head, etc.
Sensitive | Aggregated | 10,000/1,000 minimum: Occupation, Employment Status
Transitory (information is considered too changeable to be used to identify individuals from microdata) | None | Age, Urban/Rural Residence, Literacy, Educational Status, Educational Level, Labor Activity, Children Everborn/Alive/Dead, Last Birth Year, Mortality variables
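The threshold rule in this table can be sketched in Python as follows. This is a simplified illustration: the function name and the catch-all label are hypothetical, and the counts here are taken from the list passed in, whereas the thresholds on the slide (50,000 and 10,000) presumably refer to population counts.

```python
from collections import Counter


def aggregate_rare(codes, threshold, other="Other"):
    """Replace every code held by fewer than `threshold` persons with `other`."""
    counts = Counter(codes)
    return [c if counts[c] >= threshold else other for c in codes]
```

With a threshold of 3, a list of five "farmer", three "teacher" and one "pilot" keeps the first two codes and folds "pilot" into "Other". The pooled category can itself fall below the threshold, so a real procedure would need a further step, such as merging it with a neighbouring category.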

Page 28

IPUMS-International samples anonymized by: Census Agency (36 countries) or IPUMS (19 countries)

Census Agency (n=36): Argentina, Armenia, Austria, Belarus, Brazil, Cambodia, Canada, China, *Czech Republic, Egypt, France, *Germany, Greece, Hungary, Indonesia, *Ireland, Israel, Malaysia, Mexico, Mongolia, Netherlands, Pakistan, Palestinian Authority, Philippines, *Poland, *Portugal, Puerto Rico, Romania, *Slovenia, South Africa, Spain, Turkmenistan, *Turkey, USA, UK, Vietnam

* Microdata for 7 countries not entrusted to project yet.

IPUMS (n=19): Bolivia, Chile, Colombia, Costa Rica, Dominican Republic, Ecuador, El Salvador, Fiji Islands, Guatemala, Honduras, Iraq, Kenya, Nicaragua, Panama, Paraguay, Peru, Uganda, Uruguay, Venezuela

Page 29

Risk assessment of 1991 SARs: the risk is very low…

» After taking into account errors in the data, coding variability and changing of personal characteristics in time
» Dale and Elliott, JRSS-A (2003): “For a user of an outside database, attempting this sort of match with no opportunity for verification would prove fruitless. In the first place, the small degree of expected overlap would be a considerable deterrent to an intruder. However, if a match between the two files was attempted the large number of apparent matches would be highly confusing as an intruder would have no way of checking correct identification.”

Page 30

4. Conclusion

Page 31

IPUMS-Europe, 2004 – 2009 (diagram)

Joint integrated European census microdata projects: CIECM (2005), DIECM (2006-2008), EIECM (2007-2009)

Diagram labels: Coordinate, Disseminate, Enhance

Page 32

IPUMS-International strengths

1. Uniform legal authorization with national statistical authorities
2. Access restricted to academics with need who agree to abide by stringent confidentiality protections
3. Experienced integration teams
4. Proven web-based distribution system
5. High user satisfaction
6. Sustainable: NSF, NIH, FP-6 funded (Europe only)

Page 33

Thank you!
https://www.ipums.org/international

additional information at:
www.hist.umn.edu/~rmccaa/ click: ipums-europe
and:
www.ced.uab.es click: IECM

Contacts: [email protected]@yahoo.es