TRANSCRIPT
Production of grid based statistics in Statistics Estonia
Kreet Masik, Leading GIS specialist
Agenda
Grid based statistics before 2011 Census
Grid based statistics after 2011 Census
Grid based statistics before
3 different resolutions: 500x500 m, 1x1 km, 5x5 km;
Only one projection: L-EST97 (EPSG:3301);
Statistical data is joined with spatial data based on the grid ID;
The most frequently published variable is total population;
Variables were published separately;
The most common method is counting;
Grid based statistics before
The grid is based on the Estonian Base Map grid;
Every grid has a row and a column number;
Grid code = row number * 10 000 + column number;
Grids that intersect with borders are cut and the values are not re-calculated according to the real grid size;
Grids that intersect with borders are not cut.
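The grid code rule above can be illustrated with a short sketch. The row and column numbers used here are invented, and the check that a column number stays below 10 000 is an assumption needed for the code to be decoded unambiguously; the slides do not state it.

```python
ROW_FACTOR = 10_000  # from the slide: code = row number * 10 000 + column number


def grid_code(row: int, col: int) -> int:
    """Combine a row and a column number into a single grid code."""
    # Assumption: column numbers have at most four digits.
    if not 0 <= col < ROW_FACTOR:
        raise ValueError("column number must be between 0 and 9999")
    return row * ROW_FACTOR + col


def split_grid_code(code: int) -> tuple[int, int]:
    """Recover (row, column) from a grid code."""
    return divmod(code, ROW_FACTOR)


code = grid_code(row=6588, col=542)  # hypothetical row and column
print(code)                          # 65880542
print(split_grid_code(code))         # (6588, 542)
```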
Grid based statistics before - confidentiality
If the value of a grid is smaller than 3 -> the value is replaced with 99999;
The total value of a grid was not published (for example, in the age group grid map, values smaller than 3 were replaced with 99999 and the total number of persons per grid was not published);
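A minimal sketch of this replacement rule, with invented counts. It assumes that a value of 0 (an empty grid) is not treated as confidential, which the slides do not say explicitly.

```python
SUPPRESSED = 99999  # marker value used on the slides
THRESHOLD = 3       # values smaller than this are replaced


def suppress(value: int) -> int:
    """Return the value as it would be published for one grid."""
    # Assumption: zero is published as zero; only 1 and 2 are replaced.
    if 0 < value < THRESHOLD:
        return SUPPRESSED
    return value


# Hypothetical counts per grid.
counts = {"grid_a": 0, "grid_b": 1, "grid_c": 2, "grid_d": 3, "grid_e": 250}
published = {grid: suppress(n) for grid, n in counts.items()}
print(published)
```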
Grid based statistics in the future
4 different resolutions: 250x250 m (only in the biggest cities), 500x500 m, 1x1 km, 5x5 km;
At least two projections: L-EST97 (EPSG:3301) and ETRS-LAEA (EPSG:3035);
Grids that intersect with the border are cut and the values can be calculated according to the actual grid size (based on centroids);
Statistical data is joined with spatial data based on building ID -> this makes it possible to aggregate to any grid or region;
More variables;
The most common method is counting;
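Joining on building ID means the same person records can be counted on a grid of any size. The sketch below shows the idea with invented buildings and coordinates. It assumes that grids are aligned to multiples of the cell size in a projected coordinate system in metres, and it identifies a grid by its lower-left corner; neither detail comes from the slides.

```python
from collections import Counter
from math import floor

# Hypothetical building records: building ID -> (x, y) in metres.
buildings = {
    "B1": (542_120.0, 6_588_340.0),
    "B2": (542_480.0, 6_588_410.0),
    "B3": (542_730.0, 6_588_900.0),
    "B4": (543_050.0, 6_589_200.0),
}

# Hypothetical person records, each carrying only a building ID.
persons = ["B1", "B1", "B2", "B3", "B3", "B3", "B4"]


def cell_of(x, y, size):
    """Lower-left corner of the size x size metre grid containing (x, y)."""
    return (floor(x / size) * size, floor(y / size) * size)


def count_per_cell(size):
    """Count persons per grid by locating each person's building."""
    return Counter(cell_of(*buildings[b], size) for b in persons)


print(count_per_cell(500))   # three grids with 3, 3 and 1 persons
print(count_per_cell(1000))  # two grids with 6 and 1 persons
```

Because the counting step only needs a point per building, the same function could be pointed at another projection or at region polygons instead of square grids.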
How to solve confidentiality issues?
Replace all values that are 1 or 2 with 0 -> not suitable for Estonia because these grids cover about 1/5 of the whole territory. Dissemination of some variables would be pointless. Based on the 2000 Census results, in the 1x1 km grid the total population would be about 12 900 smaller.
Replace all values that are 1 or 2 with 3 -> the total number of persons will increase. Data in different tables will contradict each other, and there will be contradictions inside a single table. Based on the 2000 Census results, in the 1x1 km grid the total population would grow by 6300 people.
Replace all values that are 1 or 2 with 99999 -> if information from different datasets can be combined, the data will no longer be confidential.
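The effect of the three options on published totals can be seen on a toy example. The counts below are invented for illustration only and are unrelated to the 2000 Census figures quoted above.

```python
SUPPRESSED = 99999

# Made-up counts per grid, for illustration only.
counts = [0, 1, 1, 2, 5, 12, 2, 40]


def replace_small(values, replacement):
    """Replace every count of 1 or 2 with `replacement`."""
    return [replacement if v in (1, 2) else v for v in values]


true_total = sum(counts)
as_zero = replace_small(counts, 0)
as_three = replace_small(counts, 3)
as_marker = replace_small(counts, SUPPRESSED)

print(true_total)                    # 63
print(sum(as_zero) - true_total)     # -6: the published total shrinks
print(sum(as_three) - true_total)    # 6: the published total grows
print(as_marker.count(SUPPRESSED))   # 4 grids carry the marker
```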
How to solve confidentiality issues?
If we choose a random grid where the total number of dwellings with area 90-99 m² is 99999 -> this means that in the selected grid there are only 1 or 2 such dwellings
If we compare this information with other variables where the value is replaced with 99999 -> we will identify that:
1. The dwelling:
- has more than 4 rooms
- is a one-family house
- was built in 1991-1995
2. In the dwelling there lives a person who:
- has undefined citizenship
- is divorced
- works in the field of education
- is economically active, employed, an employee with a stable contract
- belongs to age group 34-50
How to solve confidentiality issues?
If we choose a random grid where the total number of dwellings with area 40-49 m² and with area 80-89 m² is 99999 -> this means that in the selected grid there is 1 dwelling with an area of 40-49 m² and another dwelling with an area of 80-89 m².
If we compare this information with other variables where the value is replaced with 99999 -> we will identify that:
The dwellings:
- have 3 rooms
- are in one-family houses
- one dwelling was built before 1919
- the other dwelling was built in 1946-1960
In the dwellings there live 5 persons:
- whose citizenship is Estonian
- one person is single and 4 are legally married
- 3 persons are economically active, employed, employees with stable contracts
- the persons belong to the following age groups: 16-20, 41-50, 51-64 and 65+
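The comparison described in these examples amounts to collecting, for one grid, every category that carries the 99999 marker across the published tables. A rough sketch, with invented table contents and hypothetical variable names:

```python
SUPPRESSED = 99999

# Hypothetical published tables for ONE grid: variable -> {category -> value}.
# All values are invented for illustration.
published = {
    "dwelling_area": {"40-49": 12, "90-99": SUPPRESSED, "100+": 7},
    "rooms": {"1": 9, "2": 15, "more than 4": SUPPRESSED},
    "construction_period": {"1961-1990": 30, "1991-1995": SUPPRESSED},
}


def suppressed_categories(tables):
    """List, per variable, the categories whose value was replaced."""
    return {
        variable: [cat for cat, value in cats.items() if value == SUPPRESSED]
        for variable, cats in tables.items()
    }


profile = suppressed_categories(published)
print(profile)
```

Since each marker stands for a count of 1 or 2, the categories collected this way are likely to describe the same one or two dwellings, which is why the marker alone does not protect the data.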
How to solve confidentiality issues?
A person or building can be identified only by someone with local knowledge.
Is the aforementioned information confidential?
The Estonian Statistical Law says:
“Data that will allow to directly or indirectly identify statistical unit is confidential.”
If we want to publish more variables -> we have to change the disclosure rules
How to solve confidentiality issues?
On the map we can label these grids >5 (>3) or “confidential” as in INSPIRE -> this is also suitable when only one person lives in the grid
Publish the total variables per grid (total number of persons, total number of dwellings, etc.) but do not publish more detailed variables if their values are 1 or 2 (age, marital status, family nuclei, area of dwelling, time of construction, etc.). If the total values per grid are larger than a certain threshold value (for example 9), then small values for detailed variables will not be problematic. More uninhabited grids!
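One possible reading of this proposal, as a sketch: the total is always published, and detailed values of 1 or 2 are withheld unless the grid total exceeds the threshold. The function name, the use of `None` for "not published" and the sample values are assumptions made for illustration.

```python
TOTAL_THRESHOLD = 9  # example threshold mentioned on the slide


def publish_cell(total, detailed):
    """Return (total, detailed values) as they would be published."""
    if total > TOTAL_THRESHOLD:
        # Large enough total: small detailed values are kept.
        return total, dict(detailed)
    # Small total: detailed values of 1 or 2 are withheld (None).
    withheld = {k: None if v in (1, 2) else v for k, v in detailed.items()}
    return total, withheld


# Hypothetical age-group counts for two grids.
small = publish_cell(4, {"0-5": 1, "21-30": 3})
large = publish_cell(25, {"0-5": 1, "21-30": 24})
print(small)  # (4, {'0-5': None, '21-30': 3})
print(large)  # (25, {'0-5': 1, '21-30': 24})
```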
How to solve confidentiality issues?
Population topics
Sex
All 3 variables 99999 -> 1873 grids (8.6%)
At least one of variables is 99999 -> 10046 grids (46.2%)
Variable   Total number of grids with value 99999   Percentage of all the grids that have values
Total      4303                                     19.8
Male       3088                                     14.2
Female     3088                                     14.2
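The two summary counts for this topic ("all variables" and "at least one variable" equal to 99999) could be derived from a published table along these lines. The four rows are invented and only show the counting logic.

```python
SUPPRESSED = 99999

# Invented rows: grid ID -> (total, male, female) as published.
grids = {
    101: (SUPPRESSED, SUPPRESSED, SUPPRESSED),
    102: (5, 3, SUPPRESSED),
    103: (40, 22, 18),
    104: (3, SUPPRESSED, SUPPRESSED),
}

# Grids where every variable is replaced with the marker.
all_suppressed = sum(all(v == SUPPRESSED for v in row) for row in grids.values())
# Grids where at least one variable is replaced with the marker.
any_suppressed = sum(any(v == SUPPRESSED for v in row) for row in grids.values())

share = 100 * any_suppressed / len(grids)
print(all_suppressed, any_suppressed, round(share, 1))  # 1 3 75.0
```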
How to solve confidentiality issues?
Age
At least one of variables is 99999 -> 19798 grids (91.1%)
Age group   Total number of grids with value 99999   Percentage of all the grids that have values
Total       4303                                     19.8
0-5         4908                                     22.6
6-10        5465                                     25.1
11-15       5544                                     25.5
16-20       5463                                     25.1
21-30       6442                                     29.7
31-40       7070                                     32.5
41-50       7527                                     34.6
51-64       8328                                     38.3
65+         9010                                     41.5
How to solve confidentiality issues?
Housing topics
Number of rooms
At least one of the variables is 99999 -> 18927 grids (88.1%)
Number of rooms   Total number of grids with value 99999   Percentage of all the grids that have values
Total             10066                                    46.9
One               3051                                     14.2
Two               7353                                     34.3
Three             10277                                    47.9
More than 4       8855                                     41.3
How to solve confidentiality issues?
Type of dwellings
At least one of the variables is 99999 -> 12823 grids (59.7%)
Type of dwelling              Total number of grids with value 99999   Percentage of all the grids that have values
Total                         10066                                    46.9
One family dwelling           10614                                    49.4
Part of the family dwelling   2042                                     9.5
Apartment                     824                                      3.8
Separate living room(s)       180                                      0.84
How to solve confidentiality issues?
How to deal with delicate personal data (nationality and belief): is it enough to publish general groups such as Estonians, non-Estonians, unknown? Or should we publish that in one grid, out of 300 persons, 2 are Scotsmen?
Aggregate values to larger groups (10-year instead of 5-year groups) -> not acceptable for users
Aggregate grids -> the legend will not be correct anymore and the grids will be less useful for spatial analysis
Plans for future
Analysis with different projections -> what can occur when we provide the same data in different projections?
Analyze more variables
Analyze more resolutions
Thank you!