![Page 1: Disclosure detection & control in research environments Felix Ritchie](https://reader036.vdocuments.mx/reader036/viewer/2022082818/56649ec05503460f94bcbff2/html5/thumbnails/1.jpg)
Disclosure detection & control in research environments
Felix Ritchie
![Page 2: Disclosure detection & control in research environments Felix Ritchie](https://reader036.vdocuments.mx/reader036/viewer/2022082818/56649ec05503460f94bcbff2/html5/thumbnails/2.jpg)
Why are research environments special?
• Little disclosure control on input
• Few limits on processing
• Unpredictable, complex outputs
– an infinity of “special cases”
Manual review for disclosiveness required
![Page 3: Disclosure detection & control in research environments Felix Ritchie](https://reader036.vdocuments.mx/reader036/viewer/2022082818/56649ec05503460f94bcbff2/html5/thumbnails/3.jpg)
Problems of reviewing research outputs
• Limited application of rules
• How do we ensure
– consistency?
– transparency?
– security?
• How do we do this with few resources?
![Page 4: Disclosure detection & control in research environments Felix Ritchie](https://reader036.vdocuments.mx/reader036/viewer/2022082818/56649ec05503460f94bcbff2/html5/thumbnails/4.jpg)
Classifying the research zoo
• Some outputs inherently “safe”
• Some inherently “unsafe”
• Concentrate on the unsafe
– Focus training
– Define limits
– Discourage use
![Page 5: Disclosure detection & control in research environments Felix Ritchie](https://reader036.vdocuments.mx/reader036/viewer/2022082818/56649ec05503460f94bcbff2/html5/thumbnails/5.jpg)
Safe versus unsafe
• Safe outputs
– Will be released unless certain conditions arise
• Unsafe outputs
– Won’t be released unless demonstrated to be safe
Examples:

| Unsafe | Safe | Indeterminate |
| --- | --- | --- |
| Quantiles | Linear regression* | Herfindahl indexes |
| Graphs | Panel data estimates | |
| Aggregated tables | Estimated covariances* | |
| Cross-product matrices | | |

* = conditions for release apply
![Page 6: Disclosure detection & control in research environments Felix Ritchie](https://reader036.vdocuments.mx/reader036/viewer/2022082818/56649ec05503460f94bcbff2/html5/thumbnails/6.jpg)
Determining safety
• Key is to understand whether the underlying functional form is safe or unsafe
• Each output type assessed for risk of
– Primary disclosure
– Disclosure by differencing
![Page 7: Disclosure detection & control in research environments Felix Ritchie](https://reader036.vdocuments.mx/reader036/viewer/2022082818/56649ec05503460f94bcbff2/html5/thumbnails/7.jpg)
Example: linear aggregates of data are unsafe
• Inherent disclosiveness: $f(X) \Rightarrow X = c$
– each data point needs to be assessed for threshold/dominance limits
=> resource problem for large datasets
• Disclosure by differencing: $f(X) + f(Y) = f(X + Y)$
– Differencing is feasible
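
The differencing risk for linear aggregates can be illustrated with a minimal sketch. The turnover figures and the `total` helper below are made up for illustration; they are not from the presentation. Two cell totals that each look harmless are subtracted to recover a single firm's value.

```python
import numpy as np

# Hypothetical turnover figures for five firms in one table cell (made-up data).
turnover = np.array([120.0, 85.0, 40.0, 310.0, 55.0])


def total(values):
    """A linear aggregate: the cell total."""
    return float(np.sum(values))


# Two outputs that each look harmless on their own.
all_firms = total(turnover)          # total over all five firms
without_last = total(turnover[:-1])  # same table with one firm dropped

# Because totals are additive, subtracting the two outputs
# recovers the dropped firm's value exactly.
recovered = all_firms - without_last
print(recovered)
```

This is why every linear aggregate has to be checked against other released outputs as well as against threshold and dominance limits.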
![Page 8: Disclosure detection & control in research environments Felix Ritchie](https://reader036.vdocuments.mx/reader036/viewer/2022082818/56649ec05503460f94bcbff2/html5/thumbnails/8.jpg)
Example: linear regression coefficients are safe
• Let $f(X, y) = \hat{\beta} = (X'X)^{-1} X'y$
• But
– $f(X, y) \nRightarrow X = c$: can’t identify single data point
– $f(X, y) + f(X, z) \neq f(X, y + z)$: no risk of differencing
• Exceptions
– All right-hand variables public and an excellent fit (easily tested, can generate automatic limits on prediction)
– All observations on a single person/company
– Must be a valid regression
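
The first exception (public right-hand variables plus an excellent fit) can be sketched as follows. The data are simulated and the variable names are assumptions made for illustration; the slide does not specify a test statistic, so R-squared and the largest prediction error are used here only as plausible indicators.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: x is assumed to be publicly known, y is confidential.
n = 50
x = rng.uniform(0.0, 10.0, size=n)
y = 3.0 + 2.0 * x + rng.normal(0.0, 0.01, size=n)  # almost no noise

# Ordinary least squares: beta_hat = (X'X)^{-1} X'y
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# With public X and the released coefficients, anyone can predict y.
y_pred = X @ beta_hat
residuals = y - y_pred
r_squared = 1.0 - residuals.var() / y.var()

max_error = float(np.max(np.abs(residuals)))
print(f"R^2 = {r_squared:.6f}, largest prediction error = {max_error:.4f}")
```

When the fit is this tight, the released coefficients act as a near-exact predictor of the confidential variable, which is the situation the exception is meant to catch.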
![Page 9: Disclosure detection & control in research environments Felix Ritchie](https://reader036.vdocuments.mx/reader036/viewer/2022082818/56649ec05503460f94bcbff2/html5/thumbnails/9.jpg)
Example: cross-product/variance-covariance matrices
• Cross-product matrix $M = X'X$ is unsafe
– Frequencies/totals identified by interaction with constant
– And for any other categorical variables
• What about variance-covariance matrices?
$\hat{\beta} = (X'X)^{-1} X'y, \quad V = \hat{\sigma}^2 (X'X)^{-1}$
– V is unsafe: can be inverted to produce M
– But in the more general case $V = \hat{\sigma}^2 (X'WX)^{-1} Z'Z (X'WX)^{-1}$
– Can’t create a table for X unless Z = X and W = I
=> weighted covariance matrix is safe
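
A small sketch of why the cross-product matrix is treated as unsafe. The six observations, the dummy variable, and the value of `sigma2` are all made up for illustration. Interactions with the constant column expose frequencies and totals, and an unweighted covariance matrix can be inverted to get the cross-product matrix back.

```python
import numpy as np

# Hypothetical microdata: a constant, a 0/1 category dummy, and a value.
const = np.ones(6)
dummy = np.array([1.0, 0.0, 0.0, 1.0, 0.0, 0.0])
value = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])
X = np.column_stack([const, dummy, value])

M = X.T @ X  # cross-product matrix

n_obs = M[0, 0]           # constant x constant -> number of observations
n_in_category = M[0, 1]   # constant x dummy    -> frequency of the category
grand_total = M[0, 2]     # constant x value    -> total of the value
category_total = M[1, 2]  # dummy x value       -> total within the category

# The unweighted covariance matrix V = sigma2 * (X'X)^{-1} gives M back
# once it is inverted and rescaled by sigma2 (assumed known here).
sigma2 = 2.5
V = sigma2 * np.linalg.inv(M)
M_recovered = sigma2 * np.linalg.inv(V)

print(n_obs, n_in_category, grand_total, category_total)
```

Each of the four quantities read off `M` is a table cell that would normally need its own threshold and dominance check.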
![Page 10: Disclosure detection & control in research environments Felix Ritchie](https://reader036.vdocuments.mx/reader036/viewer/2022082818/56649ec05503460f94bcbff2/html5/thumbnails/10.jpg)
Example: Herfindahl indices
• Composite index of industrial concentration
$H = \sum_i s_i^2, \quad s_i = x_i / \sum_i x_i$
• Safe as long as at least 3 firms in the industry?
• No:
– Quadratic term exacerbates dominance
– If second-largest share is much smaller, $\sqrt{H} \approx$ share of largest firm
– Standard dominance rule of largest unit < 45% share doesn’t prevent this
• Current tests for safety not very satisfactory
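
The dominance problem with Herfindahl indices can be checked numerically. The industry below is hypothetical (one firm with 40% of turnover and sixty firms with 1% each). It is chosen so that the largest firm passes a 45% dominance rule while the index still gives its share away.

```python
import math

# Hypothetical industry: one firm with 40% of turnover, sixty firms with 1% each.
turnover = [40.0] + [1.0] * 60

total = sum(turnover)
shares = [x / total for x in turnover]

# Herfindahl index: H = sum of squared shares.
H = sum(s * s for s in shares)

largest_share = max(shares)
implied_share = math.sqrt(H)  # what a reader can back out from H alone

print(f"largest share = {largest_share:.3f}")
print(f"sqrt(H)       = {implied_share:.3f}")
```

Here the square root of H is within about one percentage point of the largest firm's true share, even though the industry has 61 firms and no unit exceeds 45%.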
![Page 11: Disclosure detection & control in research environments Felix Ritchie](https://reader036.vdocuments.mx/reader036/viewer/2022082818/56649ec05503460f94bcbff2/html5/thumbnails/11.jpg)
Questions?
Felix Ritchie
Microdata Analysis and User Support
Office for National Statistics
+44 1633 45 5846