![Page 1: The Application for Statistical Processing at SURS Andreja Smukavec, SURS Rudi Seljak, SURS UNECE Statistical Data Confidentiality Work Session Helsinki,](https://reader036.vdocuments.mx/reader036/viewer/2022062518/5697bf8a1a28abf838c8a86f/html5/thumbnails/1.jpg)
The Application for Statistical Processing at SURS
Andreja Smukavec, SURS
Rudi Seljak, SURS
UNECE Statistical Data Confidentiality Work Session
Helsinki, 5 – 7 October 2015
Old system
• Stove-pipe-oriented production
  – Ad-hoc solutions were developed for each particular survey
• The survey methodologists’ drive for improvement was crucial
  – “Our data are not confidential”
• Process metadata were not organized
  – Difficulties arose when a survey methodologist resigned
Renovation
• An internal project started in 2012
  – Team: IT, General Methodology and subject-matter specialists
  – Goal: build a global solution appropriate for most surveys
  – The solution covers most parts of statistical production:
    • Data validation
    • Data editing and imputation
    • Aggregation and standard error estimation
    • Statistical disclosure control for tabular data
    • Tabulation
Renewed system
• Generalised metadata-driven application
  – Database of process metadata
    • Migrated from MS Access to ORACLE
    • Process metadata stored for each survey instance
  – General SAS code
  – GUI for process metadata
  – Different microdata environments allowed, with just some basic rules for the structure of microdata databases
    • Ad hoc SAS program for preparation of microdata
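The metadata-driven principle can be illustrated with a small sketch. It is written in Python rather than the general SAS code the application actually uses, and the rule records, field names and survey code are hypothetical: validation rules are stored as data (expressions attached to a survey instance), and one generic routine applies whatever rules are registered, so changing a rule means editing metadata rather than programs.

```python
from dataclasses import dataclass

# Hypothetical process-metadata record. In the application described here,
# rules are kept in the ORACLE database of process metadata and written as
# SAS expressions; Python expressions are used here only as a stand-in.
@dataclass
class ValidationRule:
    survey_instance: str
    rule_id: str
    expression: str  # must evaluate to True for a valid record

# Hypothetical metadata: adding a rule means adding a record, not new code.
RULES = [
    ValidationRule("SURVEY_2015", "R1", "turnover >= 0"),
    ValidationRule("SURVEY_2015", "R2", "employees > 0 or turnover == 0"),
]

def validate(records, survey_instance, rules=RULES):
    """Generic routine: apply every rule registered for a survey instance.

    Returns (record index, rule id) pairs for the failed checks.
    """
    active = [r for r in rules if r.survey_instance == survey_instance]
    errors = []
    for i, record in enumerate(records):
        for rule in active:
            # eval() without builtins keeps the sketch short; a production
            # system would use a proper expression parser instead.
            if not eval(rule.expression, {"__builtins__": {}}, dict(record)):
                errors.append((i, rule.rule_id))
    return errors

microdata = [
    {"turnover": 1500.0, "employees": 12},
    {"turnover": -20.0, "employees": 0},
    {"turnover": 300.0, "employees": 0},
]
print(validate(microdata, "SURVEY_2015"))  # [(1, 'R1'), (1, 'R2'), (2, 'R2')]
```

Because the routine never changes, a new survey instance only needs its own rule records; this mirrors, in simplified form, the split between the database of process metadata and the general SAS code.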
Schematic presentation of the renewed system
[Diagram components: different microdata databases; ad-hoc programs; general SAS program; database of process metadata; application for management; metadata repository (data on tables and variables); different kinds of output]
Tabular data protection
1. Calculation of primary sensitivity for seven types of statistics: number, total, share, ratio, average…
  – Rules: threshold, p%-rule, (n,k)-dominance rule
  – “Holding rule” + sampling weights
  – Zeroes can be declared unsafe
2. Secondary suppression, applied to sensitive cells of two types of statistics (number and total)
  – SAS-Tool (Excel file with metadata, Tau-Argus, SAS macros)
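A minimal sketch of how the primary sensitivity rules listed above can be evaluated for one cell. The parameter values (at least 3 contributors, (2, 85 %) dominance, p = 10) are illustrative assumptions, not SURS's settings, and the function names are hypothetical. The holding rule is modelled by summing the contributions of units in the same holding before the rules are applied; how sampling weights and zero cells are handled is not described in the slides, so both are left out.

```python
from collections import defaultdict

def holding_totals(contributions):
    """Holding rule: sum the contributions of units in the same holding.

    `contributions` holds (unit_id, holding_id, value) triples; a unit with
    holding_id None counts as its own contributor. Largest total first.
    """
    totals = defaultdict(float)
    for unit_id, holding_id, value in contributions:
        totals[holding_id or unit_id] += value
    return sorted(totals.values(), reverse=True)

def primary_flags(contributions, min_contributors=3, n=2, k=85.0, p=10.0):
    """Return the rules under which a cell is sensitive (empty list = safe)."""
    x = holding_totals(contributions)
    total = sum(x)
    x1 = x[0] if x else 0.0
    x2 = x[1] if len(x) > 1 else 0.0
    flags = []
    # Threshold rule: fewer contributors than the minimum.
    if len(x) < min_contributors:
        flags.append("threshold")
    # (n,k)-dominance rule: the n largest contributors exceed k % of the total.
    if total > 0 and sum(x[:n]) > k / 100 * total:
        flags.append("dominance")
    # p%-rule: the second largest contributor, subtracting its own value from
    # the total, could estimate the largest one to within p %.
    if total > 0 and total - x1 - x2 < p / 100 * x1:
        flags.append("p%")
    return flags

# Hypothetical cell: E1 and E2 belong to the same holding H1.
cell = [("E1", "H1", 500.0), ("E2", "H1", 300.0),
        ("E3", None, 100.0), ("E4", None, 100.0)]
print(primary_flags(cell))  # ['dominance']
# The same units treated as independent contributors would pass all rules.
print(primary_flags([(unit, None, value) for unit, _, value in cell]))  # []
# Two contributors, one of them dominant: every rule fires.
print(primary_flags([("E8", None, 900.0), ("E9", None, 50.0)]))  # ['threshold', 'dominance', 'p%']
```

The second call shows why the holding rule matters: without it, two enterprises of one holding look like separate contributors and the cell would be published.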
Tabular data protection
• Results for each survey instance are saved in the database with statistics (ORACLE)
  – Statuses for lower precision
  – Confidentiality flags for the type of primary and secondary suppression
• Tabulation codelists in 3 formats
  – Excel format (the most user-friendly)
  – plain text format (.tab, .hrc) for Tau-Argus
  – plain text format (.csv) for PX-Edit (SURS’s publication tool)
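To illustrate the Tau-Argus codelist format, a minimal sketch that turns a parent/child codelist into the lines of a .hrc hierarchy file, in which each level below the top is marked by one more leading `@` (Tau-Argus's default level indicator, to our understanding; it can be changed). The function name and the codelist are hypothetical and not taken from SURS's application.

```python
def hrc_lines(codelist, level_char="@"):
    """Return the lines of a Tau-Argus hierarchy (.hrc) file.

    `codelist` is a list of (code, parent) pairs in display order; parent is
    None for codes directly below the overall total.
    """
    children = {}
    for code, parent in codelist:
        children.setdefault(parent, []).append(code)
    lines = []

    def walk(parent, depth):
        # Depth-first: each code is followed by its own subcodes.
        for code in children.get(parent, []):
            lines.append(level_char * depth + code)
            walk(code, depth + 1)

    walk(None, 0)
    return lines

# Hypothetical two-level codelist.
codelist = [("1", None), ("11", "1"), ("12", "1"), ("2", None), ("21", "2")]
print("\n".join(hrc_lines(codelist)))
# 1
# @11
# @12
# 2
# @21
```

Generating the file from the same codelist that drives the Excel and .csv outputs keeps the three tabulation formats consistent with each other.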
Tabulation & Tabular Data Protection
[Diagram components: different microdata databases; ad-hoc program; general SAS programs for calculation of statistics, tabulation and tabular protection; database of process metadata; database with statistics; output tables]
Parameters for SDC in MetaSOP
Tabulation in MetaSOP
Processing in MetaSOP
Example of 3-dimensional table
After aggregation

| CC_SI | Dim_2 | Dim_3: TOT | Dim_3: F | Dim_3: O |
|---|---|---|---|---|
| TOT | TOT | 1209943548 | 1.09E+09 | 1.23E+08 |
| | 1 | 37700934.42 | 35625442 | 2075493 |
| | 11 | 47110694.48 | 46417660 | 693034.1 |
| | 2 | 733763444.2 | 6.62E+08 | 71456295 |
| | 21 | 517712620.1 | 4.8E+08 | 37489998 |
| | 22 | 161044502.5 | 1.1E+08 | 50837088 |
| | 23 | 37903335.85 | 37783060 | 120275.8 |
| | 24 | 343495995.1 | 2.86E+08 | 57438583 |
| 11 | TOT | 59283130.99 | 56199883 | 3083248 |
| | 1 | 64428657.15 | 62453677 | 1974980 |
| | 11 | 21989840.69 | 21609892 | 379948.2 |
| | 2 | 69502173.33 | 67377101 | 2125073 |
| | 21 | 13959568.67 | 13959569 | - |
| | 22 | 338148.7639 | 338148.8 | z |
| | 23 | 7911125.122 | 7911125 | - |
| | 24 | 27886089.54 | 26016025 | 1870064 |
| 12 | TOT | 215349659.2 | 2.04E+08 | 11792968 |
| | 1 | 5993635.356 | 5993635 | - |
| | 11 | 2035728.954 | 2035729 | - |
| | 2 | 55635358.28 | 54430511 | 1204847 |
| | 21 | 146242216.3 | 1.43E+08 | 2783876 |
| | 22 | 4164502.417 | 3872003 | 292499.2 |
| | 23 | 38774447.75 | 34931862 | 3842585 |
| | 24 | 42332750.72 | 37447112 | 4885639 |
| 21 | TOT | 176972728 | 1.76E+08 | 1323998 |
| | 1 | 2248602.352 | 2248602 | z |
| | 11 | 166013.5624 | 166013.6 | z |
| | 2 | 372993785.9 | 3.69E+08 | 4134769 |
| | 21 | 418831917.8 | 4.08E+08 | 10337323 |
| | 22 | 29411096.08 | 29411096 | z |
| | 23 | 56581.5975 | 56581.6 | z |
| | 24 | 88244091.34 | 86483431 | 1760660 |

After use of SAS-Tool
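
| CC_SI | Dim_2 | Dim_3: TOT | Dim_3: F | Dim_3: O |
|---|---|---|---|---|
| TOT | TOT | 1209943548 | 1.09E+09 | 1.23E+08 |
| | 1 | 37700934.42 | 35625442 | 2075493 |
| | 11 | 47110694.48 | 46417660 | 693034.1 |
| | 2 | 733763444.2 | 6.62E+08 | 71456295 |
| | 21 | 517712620.1 | 4.8E+08 | 37489998 |
| | 22 | 161044502.5 | 1.1E+08 | 50837088 |
| | 23 | 37903335.85 | 37783060 | 120275.8 |
| | 24 | 343495995.1 | 2.86E+08 | 57438583 |
| 11 | TOT | 59283130.99 | 56199883 | 3083248 |
| | 1 | 64428657.15 | z | z |
| | 11 | 21989840.69 | z | z |
| | 2 | 69502173.33 | z | z |
| | 21 | 13959568.67 | 13959569 | - |
| | 22 | 338148.763 | z | z |
| | 23 | 7911125.122 | 7911125 | - |
| | 24 | 27886089.54 | z | z |
| 12 | TOT | 215349659.2 | 2.04E+08 | 11792968 |
| | 1 | 5993635.356 | 5993635 | - |
| | 11 | 2035728.954 | 2035729 | - |
| | 2 | 55635358.28 | 54430511 | 1204847 |
| | 21 | 146242216.3 | 1.43E+08 | 2783876 |
| | 22 | 4164502.417 | z | z |
| | 23 | 38774447.75 | z | z |
| | 24 | 42332750.72 | z | z |
| 21 | TOT | 176972728 | 1.76E+08 | 1323998 |
| | 1 | z | z | z |
| | 11 | z | z | z |
| | 2 | z | z | z |
| | 21 | 418831917.8 | 4.08E+08 | 10337323 |
| | 22 | 29411096.08 | z | z |
| | 23 | z | z | z |
| | 24 | 88244091.34 | z | z |
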
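Comparing the two tables, the SAS-Tool step hides further cells (for example, in the CC_SI = 21 group the Dim_2 rows 1, 11, 2 and 23 are hidden entirely). The reason is that a single hidden cell in a row or column with a published total can be recovered by subtraction. The sketch below illustrates only this idea, on a hypothetical 2 × 3 table, with a naive greedy heuristic: while some additive relation has exactly one hidden cell, hide its smallest visible component as well. This is not the method Tau-Argus applies inside the SAS-Tool (Tau-Argus offers, among others, the Modular and Hypercube methods), and it does not check protection intervals, so it is not suitable for real disclosure control.

```python
def table_relations(rows, cols, total="TOT"):
    """Additive relations of a 2-D table; each list ends with its total cell."""
    relations = []
    for r in rows + [total]:  # row relations: cells across, then the row total
        relations.append([(r, c) for c in cols] + [(r, total)])
    for c in cols + [total]:  # column relations: cells down, then the column total
        relations.append([(r, c) for r in rows] + [(total, c)])
    return relations

def naive_secondary_suppression(values, relations, primary):
    """Greedy heuristic (illustration only): while a relation has exactly one
    hidden cell, also hide its smallest visible component, or its total if no
    component is visible. Returns all hidden cells (primary + secondary)."""
    hidden = set(primary)
    changed = True
    while changed:
        changed = False
        for relation in relations:
            if sum(cell in hidden for cell in relation) != 1:
                continue
            visible = [c for c in relation[:-1] if c not in hidden] or [relation[-1]]
            hidden.add(min(visible, key=lambda c: values[c]))
            changed = True
    return hidden

# Hypothetical table; marginal totals are computed from the inner cells.
rows, cols = ["A", "B"], ["x", "y", "z"]
values = {("A", "x"): 10, ("A", "y"): 20, ("A", "z"): 30,
          ("B", "x"): 5, ("B", "y"): 15, ("B", "z"): 25}
for r in rows:
    values[(r, "TOT")] = sum(values[(r, c)] for c in cols)
for c in cols + ["TOT"]:
    values[("TOT", c)] = sum(values[(r, c)] for r in rows)

# ("A", "x") is the only primary-sensitive cell.
hidden = naive_secondary_suppression(values, table_relations(rows, cols), {("A", "x")})
for r in rows + ["TOT"]:
    print(r, ["z" if (r, c) in hidden else values[(r, c)] for c in cols + ["TOT"]])
# A ['z', 'z', 30, 60]
# B ['z', 'z', 25, 45]
# TOT [15, 35, 55, 105]
```

The result is the classic rectangle of four hidden cells: no row or column total now isolates a single hidden value.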
New organization
• Old system:
  – Every survey had its own programmer and its own general methodologist
• Renewed system:
  – A general methodologist and an IT expert (the “support team”) help the subject-matter specialist to:
    • insert and edit the process metadata (except for SDC) in the application
    • run particular parts of the statistical process
Advantages
• The subject-matter personnel’s skills improve (higher data quality)
• The process metadata can be changed easily and the procedure can be repeated in a short time (flexibility)
• The rules for data processing are gathered in one place (transparency)
Drawbacks
• High risk of syntax errors when metadata expressions are entered
• Subject-matter personnel have to learn some new skills (SAS expressions)
• An error during execution can cause problems if the support team is busy or unavailable
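One way to reduce the risk of syntax errors is to check each expression when it is entered, before it is stored as process metadata. A minimal sketch of such a check, using Python's standard `ast` module and Python expression syntax as a stand-in for SAS expressions; the function name and the list of known variables are hypothetical.

```python
import ast

def check_expression(expr, known_variables):
    """Return the problems found in a rule expression (empty list = OK)."""
    try:
        tree = ast.parse(expr, mode="eval")
    except SyntaxError as err:
        return [f"syntax error: {err.msg}"]
    # Every bare name must be a variable defined for the survey instance.
    return [f"unknown variable: {node.id}"
            for node in ast.walk(tree)
            if isinstance(node, ast.Name) and node.id not in known_variables]

known = {"turnover", "employees"}  # hypothetical variables of one survey
print(check_expression("employees > 0 or turnover == 0", known))  # []
print(check_expression("turnvoer >= 0", known))  # ['unknown variable: turnvoer']
print(check_expression("turnover >= ", known))   # reports a syntax error
```

A check like this catches typing errors and misspelled variable names early, but it cannot detect rules that are syntactically valid yet logically wrong; it would also flag function names such as `abs` unless they are added to the allowed names.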
Challenges for the future
• Introducing the application successfully into production
  – Subject-matter specialists adjusting to the changes
  – Building a qualified support team
• Adding new functionalities
  – Indices
  – Secondary suppression for other types of statistics
  – GUI instead of the Excel file for the SAS-Tool
Thank you for your attention.