Record matching for census purposes in the Netherlands
Eric Schulte Nordholt
Senior researcher and project leader of the Census
Statistics Netherlands
Division Social and Spatial Statistics
Department Support and Development
Section Research and Development
Joint UNECE/Eurostat Meeting on Population and Housing Censuses in Astana
4-6 June 2007
Contents
• History of the Dutch Census
• Data sources
• Micro linkage
• Micro integration
• Social Statistical Database
• Estimation aspects
• Statistical confidentiality
• Conclusions
History of the Dutch Census
TRADITIONAL CENSUS
Ministry of Home Affairs:
1829, 1839, 1849, 1859, 1869, 1879 and 1889
Statistics Netherlands:
1899, 1909, 1920, 1930, 1947, 1960 and 1971
Unwillingness to respond (nonresponse) and the need to reduce expenses: no more Traditional Censuses
ALTERNATIVE: VIRTUAL CENSUS
1981 and 1991: Population Register and surveys
Development in the 1990s: more registers →
2001: integrated set of registers and surveys, the Social Statistical Database (SSD)
Data sources
Registers:
• Population Register (PR), 16 million records; demographic variables: sex, age, household status etc.
• Jobs file: employees, 6.5 million records, and self-employed persons, 790 thousand records; dates of job, branch of economic activity
• Fiscal administration (FIBASE): jobs, 7.2 million records, and pensions and life insurance benefits, 2.7 million records
• Social Security administrations, 2 million records; auxiliary information for the integration process
Surveys:
• Survey on Employment and Earnings (SEE), 3 million records; working hours, place of work
• Labour Force Survey (LFS), 2 years: 230,000 records; education, occupation, (economic) activity
Matching process
– Matching of registers and datasets to a self-constructed Central Matching File
– Records are identified by a surrogate identifier (RIN)
– One unique table RIN-Social Security Number
– Minimal set of identifying variables
– Every step in the process is a deterministic match
Statistics Netherlands’ backbone of persons
The Central Matching File (April 2007): 46,436,060 records, 16,334,210 unique persons
Social security number (sofi): < 0.03% unknown for 1995-2007
Date of birth: < 0.5% unknown month and/or day
Gender: always known
Postal code: < 0.05% unknown
House number: < 0.05% unknown
RIN Person: always known
RIN Address: always known
Time frame of variable validity: always known
Matching process
1. Social security number matching
Check on date of birth and gender
A valid match when no more than one of the variables year, month, day of birth and gender differs
else
2. Matching using other variables like postal code, house number, date of birth, gender
All keys must match
else
3. Match on social security number without any control on other variables
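The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the production procedure of Statistics Netherlands: the `Person` record, the `match_record` function, the in-memory `backbone` dictionary keyed by RIN, and the requirement that a step 2 match be unique are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Person:
    """Hypothetical record layout holding the identifying variables."""
    sofi: Optional[str]          # social security number, None if unknown
    birth_year: int
    birth_month: int
    birth_day: int
    gender: str
    postal_code: Optional[str]
    house_number: Optional[str]


def _check_fields(p):
    # Variables compared in step 1.
    return (p.birth_year, p.birth_month, p.birth_day, p.gender)


def _address_key(p):
    # Variables that must all agree in step 2.
    return (p.postal_code, p.house_number) + _check_fields(p)


def match_record(record, backbone):
    """Return (rin, step) for the first step that matches, else (None, None).

    backbone maps RIN -> Person (an assumed in-memory stand-in for the
    Central Matching File).
    """
    sofi_hit = None
    if record.sofi is not None:
        for rin, person in backbone.items():
            if person.sofi == record.sofi:
                sofi_hit = (rin, person)
                break

    # Step 1: sofi number matches and at most one check variable differs.
    if sofi_hit is not None:
        rin, person = sofi_hit
        differences = sum(
            a != b for a, b in zip(_check_fields(record), _check_fields(person))
        )
        if differences <= 1:
            return rin, 1

    # Step 2: all other keys must match (assumption: exactly one candidate).
    key = _address_key(record)
    if None not in key:
        hits = [rin for rin, p in backbone.items() if _address_key(p) == key]
        if len(hits) == 1:
            return hits[0], 2

    # Step 3: sofi number alone, without any control on other variables.
    if sofi_hit is not None:
        return sofi_hit[0], 3

    return None, None
```

Because every step is an exact comparison, the same input always yields the same match, which is what makes the process deterministic.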
Micro data with Surrogate Identifier
[Figure: flow diagram. Labels: Registers and Surveys (with Direct Identifier); Micro data preparation and documentation; Municipal Population Register; Selection from Municipal Population Register; RIN de-identification table; Surrogate Identifier (RIN); de-identified micro data (year and month of birth, gender, municipality, civil status, employment income, jobs, education, social security, ..); production environment; SN Micro data Services; Social Statistics Database.]
Example
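| Employment and Wages survey 2003 | Records | % |
|---|---:|---:|
| All records | 3,801,246 | 100.0 |
| Total matched | 3,747,976 | 98.6 |
| Matched in step 1: sofi number, year of birth, month, day, gender | 3,577,090 | 94.1 |
| Matched in step 2: postal code, year of birth, month, day, gender | 164,267 | 4.3 |
| Matched in step 3: sofi number | 6,619 | 0.2 |
| Not matched | 53,270 | 1.4 |
| Not matched, valid sofi number | 21,194 | 0.6 |
| – valid postal code | 5,799 | 0.2 |
| – invalid postal code | 10,294 | 0.3 |
| – non-resident | 5,101 | 0.1 |
| Not matched, unknown or invalid sofi number | 32,076 | 0.8 |
| – valid postal code | 8,718 | 0.2 |
| – invalid postal code | 20,052 | 0.5 |
| – non-resident | 3,306 | 0.1 |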
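The figures in this example add up, and the percentages can be recomputed from the record counts. The short Python check below uses only the counts shown on the slide; the variable and function names are chosen for the illustration.

```python
# Record counts taken from the example slide.
total_records = 3_801_246
matched_by_step = {1: 3_577_090, 2: 164_267, 3: 6_619}
not_matched = {
    "valid sofi number": 21_194,
    "unknown or invalid sofi number": 32_076,
}


def share(count, total=total_records):
    """Percentage of all survey records, rounded to one decimal."""
    return round(100 * count / total, 1)


total_matched = sum(matched_by_step.values())
total_not_matched = sum(not_matched.values())

# Matched and unmatched records together cover the whole survey file.
assert total_matched + total_not_matched == total_records

for step, count in matched_by_step.items():
    print(f"step {step}: {count} records, {share(count)}%")
print(f"matched: {total_matched} records, {share(total_matched)}%")
print(f"not matched: {total_not_matched} records, {share(total_not_matched)}%")
```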
Micro integration (1)
The aim of micro integration is:
– To check the linked data and modify incorrect records,
– In such a way that the results that are to be published are of higher quality than the original sources
Micro integration (2)
To fulfil this demand an integrated process of:
• data editing,
• derivation of statistical variables,
• and imputation
is executed
Micro integration (3)
Constraints and limitations:
- Only variables that are to be published are micro integrated
- Identity rules are necessary, e.g. the same variable in two sources or a relationship between two or more variables in one or more sources
- No mass imputation
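The two kinds of identity rule mentioned above can be illustrated with a small Python sketch. The rules, the field names (`gender`, `job_start`, `job_end`) and the function `violated_rules` are hypothetical examples, not rules taken from the actual micro integration process.

```python
from datetime import date


def violated_rules(register_row, survey_row):
    """Return the names of the (hypothetical) identity rules a linked pair violates."""
    violations = []

    # Rule type 1: the same variable in two sources should agree.
    if register_row["gender"] != survey_row["gender"]:
        violations.append("gender_agrees")

    # Rule type 2: a relationship between two variables in one source.
    if survey_row["job_start"] > survey_row["job_end"]:
        violations.append("job_start_before_end")

    return violations


register_row = {"gender": "F"}
survey_row = {
    "gender": "F",
    "job_start": date(2000, 1, 1),
    "job_end": date(2003, 1, 1),
}
print(violated_rules(register_row, survey_row))
```

Records that violate a rule would be candidates for editing or imputation; records without an applicable rule are left as they are, in line with the constraint of no mass imputation.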
Social Statistical Database (SSD)
Social Statistical Database (SSD): Set of integrated microdata files with coherent and detailed demographic and socio-economic data on persons, households, jobs and benefits
No remaining internally conflicting information
SSD set:
• Population Register (backbone)
• Integrated jobs file
• Integrated file of (social and other) benefits
• Surveys, e.g. LFS
Combining element: RIN-person
Core and satellites (1)
[Figure: the SSD-core in the centre, surrounded by satellites]
Core and satellites (2)
Core:
• contains only integral register information
• contains the most important demographic and socio-economic information
• contains only information that is used in at least two satellites
Core and satellites (3)
Satellites are produced in two steps:
• Copying and derivation of the relevant information from the core SSD
• Adding of the unique information on a specific theme from registers and surveys
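The two production steps can be sketched in Python. This is an illustration only: the dictionary layout keyed by RIN, the function `build_satellite` and the choice to keep every person from the core in the satellite are assumptions made for the example.

```python
def build_satellite(core, core_variables, theme_data):
    """Build a satellite keyed by RIN.

    core:           RIN -> dict of core variables (assumed layout)
    core_variables: names of the core variables relevant to this theme
    theme_data:     RIN -> dict of theme-specific variables from registers and surveys
    """
    satellite = {}
    for rin, core_row in core.items():
        # Step 1: copy the relevant information from the core.
        row = {name: core_row[name] for name in core_variables}
        # Step 2: add the unique information on the specific theme.
        row.update(theme_data.get(rin, {}))
        satellite[rin] = row
    return satellite


core = {
    1: {"gender": "F", "age": 40, "municipality": "A"},
    2: {"gender": "M", "age": 30, "municipality": "B"},
}
labour = build_satellite(core, ["gender", "age"], {1: {"occupation": "teacher"}})
print(labour)
```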
Conclusions SSD
The SSD diminishes the administrative burden
The SSD increases
– The efficiency of statistics production
– The accuracy of statistical outputs
– The relevance of social statistics
– The possibilities for social policy research
Estimation aspects
– Surveys are samples from the population
– If surveys are enriched with register information, estimates for the register part of the enriched survey will be inconsistent with the counts from the entire register
– Statistics Netherlands developed the method of consistent and repeated weighting to resolve these inconsistencies
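The inconsistency can be shown with a toy example in Python. The sketch below does not implement repeated weighting itself; it uses ordinary post-stratification (rescaling survey weights per category to a register count) as a simplified stand-in for the idea of calibrating survey estimates to known register totals. All numbers and names are invented for the illustration.

```python
def weighted_counts(sample, key):
    """Sum the survey weights per category of the variable `key`."""
    counts = {}
    for unit in sample:
        counts[unit[key]] = counts.get(unit[key], 0.0) + unit["weight"]
    return counts


def calibrate_weights(sample, register_counts, key):
    """Rescale weights so that weighted counts per category equal the register counts."""
    before = weighted_counts(sample, key)
    return [
        {**unit, "weight": unit["weight"] * register_counts[unit[key]] / before[unit[key]]}
        for unit in sample
    ]


# Toy register count of gender for a population of 100 persons.
register_counts = {"F": 60, "M": 40}

# Toy survey sample of 4 persons, each initially representing 25 persons.
sample = [
    {"gender": "F", "employed": True, "weight": 25.0},
    {"gender": "M", "employed": True, "weight": 25.0},
    {"gender": "M", "employed": False, "weight": 25.0},
    {"gender": "M", "employed": True, "weight": 25.0},
]

# The survey estimate of gender disagrees with the register count.
print(weighted_counts(sample, "gender"))

# After calibration the survey estimate reproduces the register count.
calibrated = calibrate_weights(sample, register_counts, "gender")
print(weighted_counts(calibrated, "gender"))
```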
Statistical confidentiality
[Figure labels: PERSONS BACKBONE, full range of all persons as from 1995, with Identifiers (PINs, sex, date of birth, address) and Characteristics; Administrative sources (IDs, Variables); Household surveys (IDs, Variables)]
IDs in sources are replaced by random Record Identification Numbers (RINs)
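Replacing direct identifiers by random RINs can be sketched in Python as follows. The class `RinTable`, the function `de_identify`, the field name `sofi` and the nine-digit RIN range are assumptions made for this illustration; the slides do not describe how RINs are generated.

```python
import secrets


class RinTable:
    """Lookup table from a direct identifier to a random, meaningless RIN."""

    def __init__(self):
        self._rin_by_id = {}
        self._used = set()

    def rin_for(self, direct_id):
        # The same direct identifier always receives the same RIN,
        # so sources stay linkable after de-identification.
        if direct_id not in self._rin_by_id:
            rin = secrets.randbelow(10**9)  # assumed RIN range
            while rin in self._used:
                rin = secrets.randbelow(10**9)
            self._used.add(rin)
            self._rin_by_id[direct_id] = rin
        return self._rin_by_id[direct_id]


def de_identify(records, table, id_field="sofi"):
    """Return copies of the records with the direct identifier replaced by a RIN."""
    result = []
    for record in records:
        row = {k: v for k, v in record.items() if k != id_field}
        row["rin"] = table.rin_for(record[id_field])
        result.append(row)
    return result


table = RinTable()
jobs = de_identify([{"sofi": "111", "branch": "A"}, {"sofi": "222", "branch": "B"}], table)
benefits = de_identify([{"sofi": "111", "benefit": "X"}], table)
```

In this sketch the lookup table is the only place where direct identifiers and RINs meet, so access to it would need to be restricted, while the de-identified files can still be combined on the RIN.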
Conclusions
• Matching is relatively cheap
• Matching is relatively quick (short production time)
• Micro integration remains important
• The SSD has found its place in the organisation
• Repeated weighting method guarantees consistent estimates
• Statistical confidentiality aspects have become very important
Time for questions and discussion