Production of grid based statistics in Statistics Estonia Kreet Masik, Leading GIS specialist

Post on 29-Jan-2016

Page 1: Production of grid based statistics in Statistics Estonia Kreet Masik, Leading GIS specialist

Production of grid based statistics in Statistics Estonia

Kreet Masik, Leading GIS specialist

Page 2: Production of grid based statistics in Statistics Estonia Kreet Masik, Leading GIS specialist

Agenda

Grid based statistics before 2011 Census

Grid based statistics after 2011 Census

Page 3: Production of grid based statistics in Statistics Estonia Kreet Masik, Leading GIS specialist

Grid based statistics before

3 different resolutions: 500x500 m, 1x1 km, 5x5 km;

Only one projection: L-EST97 (EPSG:3301);

Statistical data is joined with spatial data based on the grid ID;

The most commonly published variable is total population;

Variables were published separately;

The most common method is counting.

Page 4: Production of grid based statistics in Statistics Estonia Kreet Masik, Leading GIS specialist

Grid based statistics before

The grid is based on the Estonian Base Map grid;

Every grid has a row and a column number;

Grid code = row number * 10 000 + column number;

Grids that intersect with borders are cut and the values are not re-calculated according to the real grid size;

Grids that intersect with borders are not cut.
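The grid code rule above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the row and column values are made up, and it assumes column numbers are non-negative and below 10 000 so that they never spill into the row digits.

```python
ROW_MULTIPLIER = 10_000  # from the slide: code = row * 10 000 + column


def grid_code(row: int, column: int) -> int:
    """Combine a row and a column number into a single grid code."""
    # Assumption: columns fit into four digits, otherwise codes would collide.
    if not 0 <= column < ROW_MULTIPLIER:
        raise ValueError("column must be in 0..9999")
    return row * ROW_MULTIPLIER + column


def split_grid_code(code: int) -> tuple[int, int]:
    """Recover (row, column) from a grid code."""
    return divmod(code, ROW_MULTIPLIER)


print(grid_code(123, 456))        # 1230456
print(split_grid_code(1230456))   # (123, 456)
```

Because the code is reversible, the row and column never need to be stored separately from the grid ID.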

Page 5: Production of grid based statistics in Statistics Estonia Kreet Masik, Leading GIS specialist

Grid based statistics before - confidentiality

If the value of a grid is smaller than 3 -> the value is replaced with 99999;

The total value of a grid was not published (for example, in the age-group grid map, values smaller than 3 were replaced with 99999 and the total number of persons per grid was not published).
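A minimal sketch of this replacement rule follows. The grid codes and counts are invented, and the treatment of empty cells is an assumption: the slide only says "smaller than 3", while later slides speak of values 1 or 2, so a count of 0 is left unchanged here.

```python
SUPPRESSED = 99999  # marker value used on the slide


def suppress_small(value: int, threshold: int = 3) -> int:
    """Replace counts of 1 .. threshold-1 with the marker value.

    Assumption: a count of 0 (an empty cell) is published as 0.
    """
    return SUPPRESSED if 0 < value < threshold else value


# Made-up grid codes and counts, for illustration only.
cells = {6001234: 1, 6001235: 2, 6001236: 3, 6001237: 48}
published = {code: suppress_small(n) for code, n in cells.items()}
print(published)
```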

Page 6: Production of grid based statistics in Statistics Estonia Kreet Masik, Leading GIS specialist

Grid based statistics before - confidentiality

Page 7: Production of grid based statistics in Statistics Estonia Kreet Masik, Leading GIS specialist

Grid based statistics before - confidentiality

Page 8: Production of grid based statistics in Statistics Estonia Kreet Masik, Leading GIS specialist

Grid based statistics before - confidentiality

Page 9: Production of grid based statistics in Statistics Estonia Kreet Masik, Leading GIS specialist

Grid based statistics in the future

4 different resolutions: 250x250 m (only in the biggest cities), 500x500 m, 1x1 km, 5x5 km;

At least two projections: L-EST97 (EPSG:3301) and ETRS-LAEA (EPSG:3035);

Grids that intersect with the border are cut, and the values can be calculated according to the actual grid size (based on centroids);

Statistical data is joined with spatial data based on building ID -> this enables aggregation to any grid or region;

More variables;

The most common method is counting.
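To show why joining on building ID makes the resolution a free choice, here is a small sketch that counts residents per cell at several cell sizes. Everything in it is hypothetical: the building records, the coordinates, and in particular the assumption that cells are aligned to multiples of the cell size from the coordinate origin, which may differ from the grid actually used.

```python
from collections import Counter
from math import floor

# Hypothetical building records: (building_id, x, y, residents).
# x and y stand for projected coordinates in metres; the values are made up.
BUILDINGS = [
    ("b1", 541_120.0, 6_589_430.0, 3),
    ("b2", 541_480.0, 6_589_470.0, 1),
    ("b3", 541_620.0, 6_589_900.0, 4),
    ("b4", 545_010.0, 6_590_250.0, 2),
]


def cell_of(x: float, y: float, size_m: int) -> tuple[int, int]:
    """Return the lower-left corner of the cell containing (x, y).

    Assumption: cells are aligned to multiples of size_m from the origin.
    """
    return floor(x / size_m) * size_m, floor(y / size_m) * size_m


def population_by_cell(buildings, size_m: int) -> Counter:
    """Count residents per cell for one resolution."""
    totals = Counter()
    for _building_id, x, y, residents in buildings:
        totals[cell_of(x, y, size_m)] += residents
    return totals


for size in (250, 500, 1000, 5000):
    print(size, dict(population_by_cell(BUILDINGS, size)))
```

The same building-level input yields every resolution, so no grid needs to be fixed before the data is geocoded.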

Page 10: Production of grid based statistics in Statistics Estonia Kreet Masik, Leading GIS specialist

How to solve confidentiality issues?

Replace all values that are 1 or 2 with 0 -> not suitable for Estonia, because these grids cover about 1/5 of the whole territory. Dissemination of some variables would be pointless. Based on the 2000 Census results, the total population in the 1x1 km grid would be about 12 900 smaller.

Replace all values that are 1 or 2 with 3 -> the total number of persons will increase. Data in different tables will be inconsistent, and there will be contradictions inside a single table. Based on the 2000 Census results, the total population in the 1x1 km grid would grow by 6 300 people.

Replace all values that are 1 or 2 with 99999 -> if information from different datasets can be combined, the data will no longer be confidential.
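The effect of the first two options on the published total can be illustrated with a toy example. The counts below are invented and are not census data; the sketch only shows the direction of the distortion.

```python
def replace_small(counts, replacement):
    """Replace every count of 1 or 2 with `replacement`."""
    return [replacement if n in (1, 2) else n for n in counts]


counts = [1, 2, 2, 1, 7, 30, 0, 2]   # made-up cell counts, not census data
true_total = sum(counts)

for replacement in (0, 3):
    change = sum(replace_small(counts, replacement)) - true_total
    print(f"replace with {replacement}: total changes by {change:+d}")
```

With these numbers the total drops by 8 when small values become 0 and rises by 7 when they become 3, which mirrors the shrinking and growing totals described above.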

Page 11: Production of grid based statistics in Statistics Estonia Kreet Masik, Leading GIS specialist

How to solve confidentiality issues?

If we choose a random grid where the total number of dwellings with an area of 90-99 m² is 99999, it means that the selected grid contains only 1 or 2 such dwellings.

If we compare this with other variables whose value is replaced with 99999 in the same grid, we can identify that:

1. The dwelling:
- has more than 4 rooms
- is a one-family house
- was built in 1991-1995

2. In the dwelling lives a person who:
- has undefined citizenship
- is divorced
- works in the field of education
- is economically active, employed, an employee with a stable contract
- belongs to age group 34-50
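The comparison described above amounts to collecting, for one grid, every category that carries the 99999 marker. A minimal sketch, assuming a hypothetical nested layout of the published tables (table -> category -> grid code -> value) and entirely invented values:

```python
SUPPRESSED = 99999

# Hypothetical published tables: table -> category -> {grid code: value}.
# All grid codes and values are invented for illustration.
PUBLISHED = {
    "dwelling area": {"90-99 m2": {6001234: SUPPRESSED, 6001235: 14}},
    "rooms": {
        "3": {6001234: 0, 6001235: 9},
        "more than 4": {6001234: SUPPRESSED, 6001235: 5},
    },
    "marital status": {
        "divorced": {6001234: SUPPRESSED, 6001235: 6},
        "single": {6001234: 0, 6001235: 11},
    },
}


def suppressed_profile(published, grid_code):
    """List the (table, category) pairs that are suppressed in one grid."""
    return [
        (table, category)
        for table, categories in published.items()
        for category, cells in categories.items()
        if cells.get(grid_code) == SUPPRESSED
    ]


print(suppressed_profile(PUBLISHED, 6001234))
```

Each marker on its own hides a value of 1 or 2, but the list of markers for one grid reads like a description of a single household.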

Page 12: Production of grid based statistics in Statistics Estonia Kreet Masik, Leading GIS specialist

How to solve confidentiality issues?

Page 13: Production of grid based statistics in Statistics Estonia Kreet Masik, Leading GIS specialist

How to solve confidentiality issues?

If we choose a random grid where the number of dwellings with an area of 40-49 m² and with an area of 80-89 m² is 99999, it means that the selected grid contains one dwelling with an area of 40-49 m² and another dwelling with an area of 80-89 m².

If we compare this with other variables whose value is replaced with 99999 in the same grid, we can identify that:

The dwellings:
- have 3 rooms
- are in one-family houses
- one dwelling was built before 1919
- the other dwelling was built in 1946-1960

In the dwellings live 5 persons:
- whose citizenship is Estonian
- one person is single and 4 are legally married
- 3 persons are economically active, employed, employees with a stable contract
- the persons belong to the following age groups: 16-20, 41-50, 51-64 and 65+

Page 14: Production of grid based statistics in Statistics Estonia Kreet Masik, Leading GIS specialist

How to solve confidentiality issues?

Page 15: Production of grid based statistics in Statistics Estonia Kreet Masik, Leading GIS specialist

How to solve confidentiality issues?

It is possible to identify the person or building only if you are local.

Is the aforementioned information confidential?

The Estonian Statistical Law states:

“Data that will allow to directly or indirectly identify statistical unit is confidential.”

If we want to publish more variables –> we have to change the disclosure rules

Page 16: Production of grid based statistics in Statistics Estonia Kreet Masik, Leading GIS specialist

How to solve confidentiality issues?

On the map we can label these grids >5 (>3) or "confidential", as in INSPIRE -> this is also suitable when only one person lives in the grid.

Publish total variables per grid (total number of persons, total number of dwellings, etc.), but do not publish more detailed variables (age, marital status, family nuclei, area of dwelling, time of construction, etc.) if their values are 1 or 2. If the total value per grid is larger than a certain threshold (for example 9), then small values of detailed variables are not problematic. More uninhabited grids!
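One possible reading of the threshold proposal is sketched below. It is an interpretation, not a confirmed rule: the total is always published, and a detailed value of 1 or 2 is suppressed unless the grid total exceeds the threshold. The function name and the example values are hypothetical.

```python
SUPPRESSED = 99999


def publish_cell(total, details, threshold=9):
    """Return (total, details) as they would be published for one grid.

    Assumed reading of the slide: the total is always published; a detailed
    value of 1 or 2 is suppressed unless the total exceeds `threshold`.
    """
    if total > threshold:
        return total, dict(details)
    masked = {k: (SUPPRESSED if v in (1, 2) else v) for k, v in details.items()}
    return total, masked


print(publish_cell(5, {"0-5": 1, "31-40": 4}))
print(publish_cell(40, {"0-5": 2, "31-40": 38}))
```

The sketch does not guard against recovering a suppressed value by subtracting the published details from the published total, which a production rule would also need to consider.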

Page 17: Production of grid based statistics in Statistics Estonia Kreet Masik, Leading GIS specialist

How to solve confidentiality issues?

Population topics

Sex

All 3 variables are 99999 -> 1873 grids (8.6%)

At least one of the variables is 99999 -> 10046 grids (46.2%)

Variable   Grids with value 99999   % of all grids that have values
Total      4303                     19.8
Male       3088                     14.2
Female     3088                     14.2
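The two summary counts ("all variables" and "at least one variable" equal to 99999) can be derived from the published table with a short tally. The rows below are invented and much smaller than the real dataset; only the counting logic is the point.

```python
SUPPRESSED = 99999

# Made-up published values per grid: (total, male, female).
GRIDS = {
    6001234: (SUPPRESSED, SUPPRESSED, SUPPRESSED),
    6001235: (3, SUPPRESSED, SUPPRESSED),
    6001236: (12, 7, 5),
    6001237: (5, 3, SUPPRESSED),
}

# Booleans sum as 0/1, so each sum counts the grids that match.
all_suppressed = sum(all(v == SUPPRESSED for v in row) for row in GRIDS.values())
any_suppressed = sum(any(v == SUPPRESSED for v in row) for row in GRIDS.values())

print(all_suppressed, round(100 * all_suppressed / len(GRIDS), 1))
print(any_suppressed, round(100 * any_suppressed / len(GRIDS), 1))
```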

Page 18: Production of grid based statistics in Statistics Estonia Kreet Masik, Leading GIS specialist

How to solve confidentiality issues?

Age

At least one of the variables is 99999 -> 19798 grids (91.1%)

Age group   Grids with value 99999   % of all grids that have values
Total       4303                     19.8
0-5         4908                     22.6
6-10        5465                     25.1
11-15       5544                     25.5
16-20       5463                     25.1
21-30       6442                     29.7
31-40       7070                     32.5
41-50       7527                     34.6
51-64       8328                     38.3
65+         9010                     41.5

Page 19: Production of grid based statistics in Statistics Estonia Kreet Masik, Leading GIS specialist

How to solve confidentiality issues?

Page 20: Production of grid based statistics in Statistics Estonia Kreet Masik, Leading GIS specialist

How to solve confidentiality issues?

Housing topics

Number of rooms

At least one of the variables is 99999 -> 18927 grids (88.1%)

Number of rooms   Grids with value 99999   % of all grids that have values
Total             10066                    46.9
One               3051                     14.2
Two               7353                     34.3
Three             10277                    47.9
More than 4       8855                     41.3

Page 21: Production of grid based statistics in Statistics Estonia Kreet Masik, Leading GIS specialist

How to solve confidentiality issues?

Type of dwellings

At least one of the variables is 99999 -> 12823 grids (59.7%)

Type of dwelling              Grids with value 99999   % of all grids that have values
Total                         10066                    46.9
One family dwelling           10614                    49.4
Part of the family dwelling   2042                     9.5
Apartment                     824                      3.8
Separate living room(s)       180                      0.84

Page 22: Production of grid based statistics in Statistics Estonia Kreet Masik, Leading GIS specialist

How to solve confidentiality issues?

How to deal with delicate personal data (nationality and belief): is it enough to publish general groups such as Estonians, non-Estonians and unknown? Or should we publish that in one grid, out of 300 persons, 2 are Scotsmen?

Aggregate values to larger groups (10-year groups instead of 5-year groups) -> not acceptable for users

Aggregate grids -> the legend will no longer be correct and the grids will be less useful for spatial analysis

Page 23: Production of grid based statistics in Statistics Estonia Kreet Masik, Leading GIS specialist

Plans for future

Analysis with different projections -> what can occur when we provide the same data in different projections?

Analyze more variables

Analyze more resolutions

Page 24: Production of grid based statistics in Statistics Estonia Kreet Masik, Leading GIS specialist

Thank you!