Risk Based De-identification for Sharing Health Data
DESCRIPTION
This presentation describes a methodology, tools, and experiences for the de-identification of health information. The objective is to support data sharing for the purpose of research and public health.
TRANSCRIPT
![Page 1: Risk Based De-identification for Sharing Health Data](https://reader035.vdocuments.mx/reader035/viewer/2022062307/55495e99b4c905e94e8b5677/html5/thumbnails/1.jpg)
Risk-based De-identification
Khaled El Emam, CHEO RI & uOttawa
![Page 2: Risk Based De-identification for Sharing Health Data](https://reader035.vdocuments.mx/reader035/viewer/2022062307/55495e99b4c905e94e8b5677/html5/thumbnails/2.jpg)
Electronic Health Information Laboratory, CHEO Research Institute, 401 Smyth Road, Ottawa K1H 8L1, Ontario; www.ehealthinformation.ca
Background
• Re-identification risk assessment, re-identification attacks, de-identification:
  – Birth registry / newborn screening program
  – Tumor bank
  – Hospital data (discharge abstracts and pharmacy databases): local, provincial/state, national
  – EMR data
![Page 3: Risk Based De-identification for Sharing Health Data](https://reader035.vdocuments.mx/reader035/viewer/2022062307/55495e99b4c905e94e8b5677/html5/thumbnails/3.jpg)
Issues
• De-identification works well in practice if you adopt a risk-based approach
• Re-identification attacks are hard
• It is possible to de-identify data sets and still retain sufficient utility
• De-identification can be made simple
![Page 4: Risk Based De-identification for Sharing Health Data](https://reader035.vdocuments.mx/reader035/viewer/2022062307/55495e99b4c905e94e8b5677/html5/thumbnails/4.jpg)
Re-identification Risk Spectrum
![Page 5: Risk Based De-identification for Sharing Health Data](https://reader035.vdocuments.mx/reader035/viewer/2022062307/55495e99b4c905e94e8b5677/html5/thumbnails/5.jpg)
![Page 6: Risk Based De-identification for Sharing Health Data](https://reader035.vdocuments.mx/reader035/viewer/2022062307/55495e99b4c905e94e8b5677/html5/thumbnails/6.jpg)
Managing Re-identification Risk
[Diagram relating Risk Exposure, Amount of De-identification, Mitigating Controls, Motives & Capacity, and Invasion-of-Privacy]
![Page 7: Risk Based De-identification for Sharing Health Data](https://reader035.vdocuments.mx/reader035/viewer/2022062307/55495e99b4c905e94e8b5677/html5/thumbnails/7.jpg)
Determining the Probability of Re-identification Attempts
![Page 8: Risk Based De-identification for Sharing Health Data](https://reader035.vdocuments.mx/reader035/viewer/2022062307/55495e99b4c905e94e8b5677/html5/thumbnails/8.jpg)
Determining Risk Threshold to Use
![Page 9: Risk Based De-identification for Sharing Health Data](https://reader035.vdocuments.mx/reader035/viewer/2022062307/55495e99b4c905e94e8b5677/html5/thumbnails/9.jpg)
Tradeoffs Made
• Adjust threshold
• Adjust amount of suppression that is acceptable
• Adjust precision of variables
• Sub-sample
• Adjust variable weights
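Two of the tradeoffs listed above, variable precision and suppression, can be illustrated with a small sketch. It assumes something the slides do not state: that the risk for a record is 1 divided by the size of its equivalence class on the quasi-identifiers, a common convention in the k-anonymity literature. The records, the field layout, and the helper names (`generalize_age`, `record_risks`, `suppress`) are hypothetical and are not taken from the presentation or its tools.

```python
from collections import Counter

# Hypothetical records: (age, sex, 3-character postal prefix). Not real data.
RECORDS = [
    (31, "F", "K1H"), (33, "F", "K1H"), (38, "F", "K1H"),
    (42, "M", "K1H"), (47, "M", "K1H"),
    (55, "M", "K2P"),
]

def generalize_age(record, width):
    """Reduce the precision of age by mapping it to the start of its band."""
    age, sex, postal = record
    return (age - age % width, sex, postal)

def record_risks(records):
    """Assumed risk metric: 1 / equivalence-class size, per record."""
    sizes = Counter(records)
    return [1 / sizes[r] for r in records]

def suppress(records, threshold):
    """Drop records whose assumed risk exceeds the chosen threshold."""
    sizes = Counter(records)
    return [r for r in records if 1 / sizes[r] <= threshold]

# Every raw record is unique, so the maximum assumed risk is 1.
raw_max = max(record_risks(RECORDS))

# Adjust precision: 10-year age bands merge most records into larger classes.
banded = [generalize_age(r, 10) for r in RECORDS]

# Adjust suppression: remove whatever still exceeds an illustrative threshold.
kept = suppress(banded, threshold=0.5)
print(raw_max, len(RECORDS) - len(kept), max(record_risks(kept)))
```

In this toy data, coarsening age leaves a single record above the threshold, so only one record has to be suppressed. With full-precision ages, every record would exceed it.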
![Page 10: Risk Based De-identification for Sharing Health Data](https://reader035.vdocuments.mx/reader035/viewer/2022062307/55495e99b4c905e94e8b5677/html5/thumbnails/10.jpg)
Advantages
• Passage through research ethics is significantly faster for “secondary use” protocols that are certified as low risk
• Provides an incentive for data recipients to improve their security and privacy practices
• Provides an incentive for funders to cover the costs of infrastructure for handling data
• Amount of de-identification is proportionate to the actual risk
![Page 11: Risk Based De-identification for Sharing Health Data](https://reader035.vdocuments.mx/reader035/viewer/2022062307/55495e99b4c905e94e8b5677/html5/thumbnails/11.jpg)
Risk Assessment
![Page 12: Risk Based De-identification for Sharing Health Data](https://reader035.vdocuments.mx/reader035/viewer/2022062307/55495e99b4c905e94e8b5677/html5/thumbnails/12.jpg)
De-identification
![Page 13: Risk Based De-identification for Sharing Health Data](https://reader035.vdocuments.mx/reader035/viewer/2022062307/55495e99b4c905e94e8b5677/html5/thumbnails/13.jpg)
Risk Assessment for REB
![Page 14: Risk Based De-identification for Sharing Health Data](https://reader035.vdocuments.mx/reader035/viewer/2022062307/55495e99b4c905e94e8b5677/html5/thumbnails/14.jpg)
Risk Assessment for REB
![Page 15: Risk Based De-identification for Sharing Health Data](https://reader035.vdocuments.mx/reader035/viewer/2022062307/55495e99b4c905e94e8b5677/html5/thumbnails/15.jpg)
Risk Assessment for REB
![Page 16: Risk Based De-identification for Sharing Health Data](https://reader035.vdocuments.mx/reader035/viewer/2022062307/55495e99b4c905e94e8b5677/html5/thumbnails/16.jpg)
Example – Tumor Bank
• ‘Rogue researcher’ adversary
• Search queries considered high risk
• Combination of sub-sampling and generalization for each tumor site’s data
• Moving towards researcher self-assessments to decide the appropriate level of de-identification
![Page 17: Risk Based De-identification for Sharing Health Data](https://reader035.vdocuments.mx/reader035/viewer/2022062307/55495e99b4c905e94e8b5677/html5/thumbnails/17.jpg)
Example – Discharge Abstracts
• ‘Nosey neighbor’ adversary
• Creation of a public data file
• Diagnosis and intervention codes presented difficulties
• High level of suppression for a public file, but acceptable utility with stronger access controls (higher threshold)
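The relationship between threshold and suppression in this example can be sketched as follows. The sketch assumes, as an illustration rather than the presenter's stated method, that a record's risk is 1 divided by its equivalence-class size and that any record above the threshold is suppressed. The class sizes and the two thresholds (0.05 for a public file, 0.2 for a controlled release) are hypothetical values, not figures from the presentation.

```python
def suppression_rate(class_sizes, threshold):
    """Fraction of records in classes whose assumed risk (1/size) exceeds threshold."""
    total = sum(class_sizes)
    suppressed = sum(size for size in class_sizes if 1 / size > threshold)
    return suppressed / total

# Hypothetical equivalence-class sizes after generalization (100 records in total).
CLASS_SIZES = [1, 2, 3, 4, 10, 25, 55]

# Public file: no access controls, so a strict (low) threshold is assumed.
public_rate = suppression_rate(CLASS_SIZES, threshold=0.05)

# Controlled release: stronger access controls are assumed to justify a higher threshold.
controlled_rate = suppression_rate(CLASS_SIZES, threshold=0.2)

print(f"public: {public_rate:.0%} suppressed, controlled: {controlled_rate:.0%} suppressed")
```

Under these assumptions the public file loses twice as many records as the controlled release, which mirrors the utility difference described in the last bullet.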
![Page 18: Risk Based De-identification for Sharing Health Data](https://reader035.vdocuments.mx/reader035/viewer/2022062307/55495e99b4c905e94e8b5677/html5/thumbnails/18.jpg)
Practical Considerations
• An audit program is required to ensure compliance with ‘mitigating controls’
• What if a breach happens?
  – A risk management approach ensures that the data is highly de-identified in situations where breaches are most likely
  – Can demonstrate due diligence
![Page 19: Risk Based De-identification for Sharing Health Data](https://reader035.vdocuments.mx/reader035/viewer/2022062307/55495e99b4c905e94e8b5677/html5/thumbnails/19.jpg)
Lessons Learned
• Geospatial data and longitudinal data always represent challenges because they increase the risk of re-identification
• Thus far we have never had to decline a data request because of identifiability, nor have we been unable to provide data with sufficient utility for a study
![Page 20: Risk Based De-identification for Sharing Health Data](https://reader035.vdocuments.mx/reader035/viewer/2022062307/55495e99b4c905e94e8b5677/html5/thumbnails/20.jpg)
www.ehealthinformation.ca
www.ehealthinformation.ca/knowledgebase