
Page 1: IPUMS-International: High precision Population Census Samples: Balancing the Privacy-Quality Tradeoff by Means of Restricted Access Microdata Extracts

IPUMS-International: High precision Population Census Samples:

Balancing the Privacy-Quality Tradeoff by Means of Restricted Access Microdata Extracts

https://www.ipums.org/international * * *

Robert McCaa, Steven Ruggles, Michael Davern, Tami Swenson, and Krishna Mohan Palipudi

Minnesota Population Center
[email protected]

= information not in proceedings or on CD

Page 2

Outline of paper (in proceedings, except “0.”)

1. Introduction: The Trusted User Approach
2. The Case for High Precision Samples: The USA Experience
3. High Precision Samples with Implicit Stratification
4. Access Disclosure Controls
5. Technical Disclosure Controls
6. Fear, Hysteria and Paranoia
7. Conclusions and Future Work
0. What’s a historian doing at PSD2006?

Page 3

Why am I (a historian) here?

1. To learn from you to enhance IPUMS-International privacy and confidentiality techniques
2. To inform you of our existence and the challenges we face
3. To invite your contributions, as producers, users, and creators of statistical confidentiality methods
4. To advertise opportunities for post-docs, staff
5. To invite statistical agencies to entrust census microdata to the project

Page 4

Confidentializing IPUMS-International, an integrated microdatabase with:

» 150 census samples of households (50 countries)
» Containing 300 million person records with hundreds of variables
» Available to tens of thousands of licensed users regardless of country of birth, citizenship, residence or place of work
» Not a single allegation of violation of privacy or statistical confidentiality

What’s the problem?

Page 5

IPUMS-International: a restricted-access, web-based census microdata extraction system

» Password protected: to make and retrieve extracts
» Licensed researcher selects:
  » Countries,
  » Censuses,
  » Cases/sub-populations,
  » Variables, and
  » Sample densities
» Extract engine queues request, generates extract
» Researcher retrieves extract via web with SSL 128-bit encryption and analyzes using own wares (soft/hard/wet)
» NO: CDs, original codes, or complete datasets
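The request-and-queue workflow on this slide can be pictured with a minimal Python sketch. It is an illustration only, not the project's actual extract engine: the class names, field names and example values (`ExtractRequest`, `ExtractQueue`, `sample_density`, "Country A", `AGE`, `SEX`) are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class ExtractRequest:
    """One researcher's selections; every field name here is hypothetical."""
    user: str
    countries: list
    censuses: list                       # census years
    variables: list
    subpopulation: str = "all persons"   # case selection, free text in this sketch
    sample_density: float = 1.0          # fraction of the sample to deliver


class ExtractQueue:
    """First-in, first-out queue that accepts requests from licensed users only."""

    def __init__(self, licensed_users):
        self.licensed_users = set(licensed_users)
        self.pending = deque()

    def submit(self, request):
        if request.user not in self.licensed_users:
            raise PermissionError("extracts are provided only to licensed users")
        if not 0 < request.sample_density <= 1:
            raise ValueError("sample_density must be in (0, 1]")
        self.pending.append(request)
        return len(self.pending)         # position in the queue

    def next_request(self):
        """Hand the oldest pending request to the extract engine, if any."""
        return self.pending.popleft() if self.pending else None
```

A licensed user would call `submit(ExtractRequest(...))` and later retrieve the finished extract; an unlicensed user is refused before anything is queued.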

Page 6

6 steps using https://www.ipums.org/international:

1. Logon w/ password
2a. Study documentation
2b. Design extract
3. Receive email; logon with p/word
4. Download extract (SSL encrypted)
5. UnZip data (also SAS, STATA)
6. Analyze

Page 7

IPUMS-International, December 2006

dark green = disseminating (20 countries, 63 censuses, 185mpr; mpr = million person records)
green = harmonizing (37 countries, 100 censuses, 200mpr)
lightest green = negotiating

Page 8

What has happened since Geneva (xi/05)?

1. NSF-USA renewed funding for 5 years
2. Database grew: 12 countries, 35 censuses, 65mpr
3. More agreements signed, census data acquired
4. New, dynamic metadata system implemented
5. Number of users doubled
6. Publications are taking off
7. Paris Workshop (INED/CEPED): delegates from 14 European countries and 10 non-European, plus academic researchers

Page 9

IPUMS-Europe, December 2006

Dark green = Disseminating (5 countries, 15 censuses, 27mpr)

In Lisbon: Portugal and Hungary will become “dark green” with the launch of samples for 4 censuses each for Argentina and Hungary, 3 for Portugal and Israel, 2 for Egypt and Rwanda, and 1 for Gaza and the West Bank

Page 10

What will happen by Lisbon (ISI, viii/07)?

1. Confidentiality methods will be enhanced
2. Database will grow: 7 countries, 19 censuses, 25mpr
3. Dynamic metadata system will be expanded
4. Number of users will increase!!!
5. Publications!!!
6. IPUMS Workshop (Sat Aug 25 at INE-Pt) for producers and users (registration required; please email [email protected])
7. Microdata Session (Fri Aug 24)

* Special conditions apply

Page 11

1. Introduction: The “trusted-user” approach to disseminating integrated, anonymized census microdata samples

Page 12

MBNA: world’s largest independent credit card issuer
specialist in affinity marketing

» 1982: MBNA founded by Charles Cawley: instead of competing on price, compete on affinity
» 1983: Georgetown Univ Alumni Association (Cawley’s alma mater) supplied MBNA with names and addresses of its members in exchange for percentage of revenues on card usage
» Big hit! Large number of new accounts, low risk, high spenders
» 1985: new groups: American Dental Association, Aircraft Owners and Pilots Association, National Education Assoc.,
» 1994: Sierra Club, 45,000 members signed with MBNA generating $400,000 annually for Sierra Club
» The rest is history!
» 2005:

Page 13

MBNA: world’s largest independent credit card issuer
specialist in affinity marketing

» 1982: MBNA founded by Charles Cawley: instead of competing on price, compete on affinity
» 2005: MBNA, with 25,000 employees, acquired by Bank of America, US$35 billion
» How many credit cards do you have?
» How many affinity credit cards do you have?

Page 14

IPUMS-International: world’s largest provider of integrated census microdata to trusted users

» 1999: Founded by Steven Ruggles and Bob McCaa: restrict access to trusted users, and apply corresponding confidentiality techniques
» 2002: 1st release of integrated samples for 7 countries; >200 users in first year
» Big hit! 69 countries signed; 57 entrusted data to IPUMS, datasets for more than 230 censuses, >150 entire datasets
» 2006,

Page 15

IPUMS-International: world’s largest provider of integrated census microdata to trusted users

» 1999: Founded; seeks neither profits nor popularity!
» 2006, 3rd release:
  » data for 20 countries, samples for 63 censuses,
  » 185 million person records,
  » >1,000 users
» 2009, 8th release:
  » data for 50 countries, samples for ~150 censuses
  » >300 million person records
  » thousands of users
» Note: data extracts are provided only to licensed users.

Page 16

2. The Case for High Precision Samples: The USA Experience

Page 17

2. High Precision Samples: The Case of the USA

» Beginning with the 1980 census, US Census Bureau released 5% samples of households
  » Not a single allegation of misuse
» 1988: first articles using high precision samples published in Demography
  Language use and fertility in the Mexican origin population
  Household size and regional outmigration
» 1996: IPUMS-USA samples available via internet
» Available at no cost to researchers worldwide
» 81% of articles in Demography, since 1990, use high precision samples
» In 2000 & 2001, high precision census microdata used twice as often as next most common data source
» Analyze information for small population subgroups
» Very large census microdata samples are among the most powerful tools available for economic and demographic analysis
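Why sample density matters for small subgroups can be shown with a short worked calculation. This is a minimal sketch that assumes simple random sampling and ignores both the finite population correction and any gain from stratification. The function name and the example figures (a population of 1 million, a subgroup making up 1% of it, a proportion of 0.3) are assumptions chosen for illustration, not values from the presentation.

```python
import math


def proportion_standard_error(p, population, subgroup_share, sampling_fraction):
    """Approximate standard error of a proportion p estimated within a subgroup.

    Assumes simple random sampling; the expected number of sampled subgroup
    members is population * subgroup_share * sampling_fraction.
    """
    n = population * subgroup_share * sampling_fraction
    if n <= 0:
        raise ValueError("expected subgroup sample size must be positive")
    return math.sqrt(p * (1 - p) / n)


# Hypothetical example: a subgroup that is 1% of a population of 1 million.
se_1pct = proportion_standard_error(0.3, 1_000_000, 0.01, 0.01)   # 1% sample
se_5pct = proportion_standard_error(0.3, 1_000_000, 0.01, 0.05)   # 5% sample
ratio = se_1pct / se_5pct   # equals sqrt(5): the 5% sample is that much more precise
```

Because the standard error shrinks with the square root of the number of cases, moving from a 1% to a 5% sample cuts it by a factor of about 2.2 for any subgroup, which matters most when the subgroup is small.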


Page 19

3. High Precision Samples with Implicit Stratification

Note: almost all NSIs are supplying household samples drawn to IPUMS specifications (every nth household from 100% fine-grained geographically stratified microdata); see table 1

Page 20

IPUMS-International: High precision samples with implicit stratification

» Suppress all identifying information: names, id numbers, street addresses, low-level administrative geography (NUTS-5, NUTS-4?, NUTS-3?, NUTS-2?)
» Sample is stratified by lowest level geography (census tract)
» Lower standard errors than a classic random sample, to the extent that variables of interest are correlated with geography
» Implicit geographical stratification is equivalent to extremely fine geographic stratification with proportional weighting
» Many of our NSI partners have adopted the IPUMS sample design (see table 1).
» 26 countries provided 100% microdata for the MPC to draw the sample
» Europe: almost all NSIs have drawn samples to IPUMS specs. for all censuses
» High precision samples for 57 countries entrusting microdata (12/12/2006)
  » 10% samples: 43 countries
  » 5%: 10 countries
  » <5%: 4 countries
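The "every nth household" design can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's sampling code: households are plain dictionaries, and the function name, the field name passed as `geo_key`, and the random start are choices made for this example.

```python
import random


def systematic_sample(households, n, geo_key, seed=None):
    """Take every n-th household from a geographically sorted list.

    households: list of dicts; geo_key names the field that holds the
    lowest-level geography code (a hypothetical field name).
    Sorting first is what makes the stratification implicit: each small
    area contributes households roughly in proportion to its size.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    ordered = sorted(households, key=lambda h: h[geo_key])
    start = random.Random(seed).randrange(n)   # random start in 0..n-1
    return ordered[start::n]


# Hypothetical data: 1,000 households spread evenly over 5 tracts.
households = [{"hh_id": i, "tract": i % 5} for i in range(1000)]
sample = systematic_sample(households, 10, "tract", seed=1)   # a 10% sample
```

With this toy data each tract holds 200 households, so a 1-in-10 draw returns exactly 20 from every tract whatever the random start, which is the proportional allocation the slide describes.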


Page 22

4. Access Disclosure Controls

a. Memorandum with NSI
b. License with researchers

Page 23

A. NSI with U of Minnesota

Page 24

A. NSI with U. of Minnesota (2005+)

Page 25

B. License with researchers
Restricted Access web-based system

Legally-binding license agreement
» forces would-be snoopers to violate a law, for which they can be fined and jailed
» protects privacy and confidentiality
» assures proper use

Access limited to:
» Bona-fide researchers (credentials)
» With a demonstrated scientific need
» who agree to abide by license restrictions
  » Confidentiality
  » No redistribution
  » Safely secured
  » Alleging that a person has been identified is prohibited

[Graphic labels: LICENSE, IPUMSi]

Page 26

B. License with researchers
Restricted Access web-based system

Legally-binding license agreement
» forces would-be snoopers to violate law
» protects privacy and confidentiality
» assures proper use

Access limited to:
» Bona-fide researchers (credentials)
» With a demonstrated scientific need
» who agree to abide by license restrictions
  » Confidentiality
  » No redistribution, no commercial use
  » Safely secured
  » Alleging that a person can be or has been identified is illegal

[Graphic labels: LICENSE, IPUMSi]

Page 27
Page 28
Page 29

“Apply for Access”

Page 30
Page 31
Page 32
Page 33
Page 34
Page 35

End of application

Page 36

5. Technical Disclosure Controls

Page 37

[Graphic labels: CONFIDENTIALIZES, IPUMSi]

» Suppress geographical detail
» Blur/aggregate sensitive codes
» Convert dates to ages (blur key vars.)
» Swap cases between districts
» Scramble order of records

Technical measures are also applied, in addition to the legal & administrative protections
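Three of the measures listed above (suppressing detail, converting dates to ages, scrambling record order) can be sketched as follows. This is a hedged illustration, not the project's procedure: the field names (`name`, `locality`, `birth_date`) are hypothetical, records are treated as a flat list of persons rather than households, and case swapping and code aggregation are left out.

```python
import random
from datetime import date


def age_at_census(birth, census_day):
    """Completed years of age on census day."""
    had_birthday = (census_day.month, census_day.day) >= (birth.month, birth.day)
    return census_day.year - birth.year - (0 if had_birthday else 1)


def confidentialize(records, census_day, seed=None):
    """Return protected copies of person records; the inputs are not modified."""
    protected = []
    for rec in records:
        out = dict(rec)
        out.pop("name", None)        # suppress direct identifiers
        out.pop("locality", None)    # suppress fine geographical detail
        # replace the exact birth date with age in completed years
        out["age"] = age_at_census(out.pop("birth_date"), census_day)
        protected.append(out)
    random.Random(seed).shuffle(protected)   # scramble the order of records
    return protected
```

Working on copies keeps the source file intact, and shuffling last means the delivered order carries no trace of the original enumeration sequence.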

Page 38

EUROSTAT statistical confidentiality standards (Thorogood, 1999) --all endorsed by IPUMS-International

» 1. Restrict access to samples
» 2. Limit geographical detail
» 3. Re-code unique categories--top and bottom
» 4. Sign non-disclosure agreement
» 5. Prohibit redistribution to third parties
» 6. Prohibit attempts to identify individuals or the making of any claim to that effect
» 7. Require users to provide copies of publications

Page 39

EUROSTAT statistical confidentiality standards (Thorogood, 1999) --all endorsed by IPUMS-International

• 8. Construct age from birthdate, if necessary
• 9. Do not identify date of birth
• 10. Do not identify precise place of birth
• 11. Migration: timing/place not identified in detail
• 12. Identify place of residence by major civil division (pop>20k, 60k, 100k, 1 million; i.e., national convention)
• 13. Do sensitivity analysis (not yet)
• 14. Do confidentiality assessment (not yet)
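Two of these standards lend themselves to a short sketch: top and bottom recoding (standard 3) and identifying place of residence only for sufficiently populous civil divisions (standard 12). The function names, the pooled code `"other"` and the default threshold of 20,000 are assumptions for illustration; the slide notes that the actual threshold follows national convention.

```python
def top_bottom_code(value, low, high):
    """Clamp a numeric value so that extremes fall into the end categories."""
    if low > high:
        raise ValueError("low must not exceed high")
    return max(low, min(high, value))


def residence_code(unit_code, population, min_population=20_000, pooled_code="other"):
    """Identify a civil division only if it is populous enough.

    Units below the threshold are pooled under one code so that small
    places cannot be singled out.
    """
    return unit_code if population >= min_population else pooled_code
```

For example, `top_bottom_code(97, 0, 85)` reports an age of 97 as 85, and `residence_code("B", 5_000)` returns the pooled code instead of the unit's own.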

Page 40

6. Countering Fear, Hysteria and Paranoia…with reason

“There has been no known attempt at identification with the 1991 SARs [microdata samples of the UK] -- nor in any other countries that disseminate samples of microdata”
--Elliott and Dale, Journal of the Royal Statistical Society, 1999

Page 41

[Figure: ChoicePoint Data Sources and Clients. Source: Washington Post]

http://www.choicepoint.com/

Why Not?

Companies want linkable data with names, addresses, ID #s, etc.

* * *

Probabilistic linking with 90% of the population missing is not good enough
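The arithmetic behind the last point can be made explicit. The sketch below assumes that each person has the same chance of inclusion, equal to the sampling fraction (0.10 for a 10% sample); the function name and the list size of 1,000 are hypothetical.

```python
def linkage_coverage(list_size, sampling_fraction):
    """Expected split of an external list of people against a census sample."""
    if not 0 <= sampling_fraction <= 1:
        raise ValueError("sampling_fraction must be between 0 and 1")
    in_sample = list_size * sampling_fraction
    return {
        "expected_in_sample": in_sample,
        "expected_missing": list_size - in_sample,
    }


# Hypothetical example: a commercial list of 1,000 people and a 10% sample.
coverage = linkage_coverage(1000, 0.10)
```

On these assumptions about 900 of the 1,000 listed people are simply absent from the sample, so an apparent match can never be confirmed as the right person from the sample alone.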

Page 43

“…there are no known incidents of researchers using their access to microdata to deliberately identify individuals...”

--Managing Statistical Confidentiality and Microdata Access: Principles and Guidelines of Good Practice
UNECE, Conference of European Statisticians, Task Force on Census Microdata (October 2006), p. 19

http://www.unece.org/stats/documents/tfcm/1.e.pdf

Page 44: IPUMS-International: High precision Population Census Samples: Balancing the Privacy-Quality Tradeoff by Means of Restricted Access Microdata Extracts

“Statistical disclosure control methods may modify the data or the design of the statistic, or a combination of both. They will be judged sufficient when the guarantee of confidentiality can be maintained, taking account of information likely to be available to third parties, either from other sources or as previously released National Statistics outputs, against the following standard:

“It would take a disproportionate amount of time, effort and expertise for an intruder to identify a statistical unit to others, or to reveal information about that unit not already in the public domain.”

Protocols on Data Access and Confidentiality, pp. 7-8 --ONS-UK (2004)

www.statistics.gov.uk/about_ns/cop/downloads/prot_data_access_confidentiality.pdf

Page 45: IPUMS-International: High precision Population Census Samples: Balancing the Privacy-Quality Tradeoff by Means of Restricted Access Microdata Extracts

7. Conclusions and Future Work

Page 46: IPUMS-International: High precision Population Census Samples: Balancing the Privacy-Quality Tradeoff by Means of Restricted Access Microdata Extracts

IPUMS-International strengths

1. Uniform legal authorization with national statistical authorities
2. Access restricted to academics with need who agree to abide by stringent confidentiality protections
3. Experienced integration teams
4. Proven web-based distribution system
5. High user satisfaction
6. Sustainable: NSF, NIH, FP-6 (7?) funded (Europe only)

Page 47: IPUMS-International: High precision Population Census Samples: Balancing the Privacy-Quality Tradeoff by Means of Restricted Access Microdata Extracts

Significant weakness: statistical disclosure controls

…as a result of PSD2006, we will:

» Re-consider our portfolio of statistical disclosure controls
» Implement a uniform set of controls across all samples and countries
» Do sensitivity analysis
» Do confidentiality assessment
» Revise our documentation on the confidentializing of datasets for each country, describing principles, but not the “keys”
» Cite bibliography for users to confidentialize tables and graphs

Page 48: IPUMS-International: High precision Population Census Samples: Balancing the Privacy-Quality Tradeoff by Means of Restricted Access Microdata Extracts

IPUMS-International, August 2009???

dark green = disseminating (50 countries, 150 censuses, 300mpr)

green = harmonizing (?? countries, ?? censuses, ???mpr)

lightest green = negotiating

Page 49: IPUMS-International: High precision Population Census Samples: Balancing the Privacy-Quality Tradeoff by Means of Restricted Access Microdata Extracts

Thank you!

https://www.ipums.org/international

additional information at:

www.hist.umn.edu/~rmccaa/

* * * * * *

Contact: [email protected]