251 Laurier Avenue W, Suite 200
Ottawa, ON Canada K1P 5J6
www.privacyanalytics.ca | 855.686.4781
Responsible Methods for Sharing Data
Khaled El Emam (PhD)
De-identification Symposium
21st October 2014
© 2014 Privacy Analytics, Inc.
• Legal or regulatory requirements
• Obtaining patient consent/authorization – not practical for large databases and introduces bias
• Limiting principles / minimum necessary
• Contractual obligations
• Maintain public / consumer / patient trust
• Costs of breach notification
• Rising discipline of re-identification attacks
Motivations for Anonymization
Canadian Definitions of Identifiability
Privacy Law | Definition
Ontario PHIPA | “Identifying information” means information that identifies an individual or for which it is reasonably foreseeable in the circumstances that it could be utilized, either alone or with other information, to identify an individual.
Nfld PPHI | “Identifying information” means information that identifies an individual or for which it is reasonably foreseeable in the circumstances that it could be utilized either alone or together with other information to identify an individual.
Sask HIPA | “De-identified personal health information” means personal health information from which any information that may reasonably be expected to identify an individual has been removed.
Canadian Definitions of Identifiability (continued)
Privacy Law | Definition
Alberta HIA | “Individually identifying” means that the identity of the individual who is the subject of the information can be readily ascertained from the information; “nonidentifying” means that the identity of the individual who is the subject of the information cannot be readily ascertained from the information.
NB PPIA | “Identifiable individual” means an individual can be identified by the contents of the information because the information includes the individual’s name, makes the individual’s identity obvious, or is likely in the circumstances to be combined with other information that includes the individual’s name or makes the individual’s identity obvious.
• Two-day course on risk management when sharing data will be provided by Ryerson (February 2015)
• Privacy Analytics is launching an on-line exam for anonymization professionals (November 2014)
• On-going educational and professional development opportunities in this area through Privacy Analytics
• Methodology manuals being developed for specific types of data, e.g., clinical trials and geospatial data
Anonymization Education & Credentials
• Some states release or sell their hospital discharge database for free or a small fee
• Information about medical incidents published in newspapers is matched with the White Pages and the publicly available state hospital discharge database
State Discharge Databases Attack - I
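The matching step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the field names (`age`, `sex`, `zip`, `admit_month`), the sample rows, and the function `link_records` are hypothetical and do not come from the slides, and the White Pages lookup is left out for brevity.

```python
from collections import defaultdict

# Hypothetical quasi-identifiers assumed to appear in both sources.
KEYS = ("age", "sex", "zip", "admit_month")

def link_records(news_items, discharge_rows, keys=KEYS):
    """For each news item, return the discharge rows with identical key values."""
    index = defaultdict(list)
    for row in discharge_rows:
        index[tuple(row[k] for k in keys)].append(row)
    return {item["name"]: index.get(tuple(item[k] for k in keys), [])
            for item in news_items}

# Made-up data: details gleaned from news stories, and a de-named discharge file.
news_items = [
    {"name": "Person A", "age": 61, "sex": "M", "zip": "98001", "admit_month": "2011-03"},
    {"name": "Person B", "age": 34, "sex": "F", "zip": "98002", "admit_month": "2011-04"},
]
discharge_rows = [
    {"age": 61, "sex": "M", "zip": "98001", "admit_month": "2011-03", "diagnosis": "hip fracture"},
    {"age": 34, "sex": "F", "zip": "98002", "admit_month": "2011-04", "diagnosis": "concussion"},
    {"age": 34, "sex": "F", "zip": "98002", "admit_month": "2011-04", "diagnosis": "burn"},
]

matches = link_records(news_items, discharge_rows)
# Only a single matching row ties a named person to one discharge record.
reidentified = {name: rows[0]["diagnosis"]
                for name, rows in matches.items() if len(rows) == 1}
print(reidentified)
```

In this toy data, Person A matches exactly one discharge row and is linked to a diagnosis, while Person B matches two rows and stays ambiguous.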
http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0028071
Direct & Quasi-Identifiers
Examples of direct identifiers: Name, address, telephone number, fax number, MRN, health card number, health plan beneficiary number, VID, license plate number, email address, photograph, biometrics, SSN, SIN, device number, clinical trial record number
Examples of quasi-identifiers: sex, date of birth or age, geographic locations (such as postal codes, census geography, information about proximity to known or unique landmarks), language spoken at home, ethnic origin, total years of schooling, marital status, criminal history, total income, visible minority status, profession, event dates, number of children, high level diagnoses and procedures
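One common way to reason about quasi-identifiers is to group records that share the same quasi-identifier values and look at the smallest group. The sketch below assumes that measure (maximum risk = 1 / smallest group size); the slides do not state a specific metric, and the field names, sample records, and helper functions are hypothetical.

```python
from collections import Counter

def equivalence_class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)

def max_risk(records, quasi_identifiers):
    """Assumed risk measure: 1 divided by the size of the smallest class."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return 1 / min(sizes.values())

# Made-up records with direct identifiers already removed.
records = [
    {"sex": "F", "birth_year": 1980, "postal": "K1P"},
    {"sex": "F", "birth_year": 1984, "postal": "K1P"},
    {"sex": "M", "birth_year": 1975, "postal": "K1P"},
    {"sex": "M", "birth_year": 1979, "postal": "K1P"},
]
raw_risk = max_risk(records, ("sex", "birth_year", "postal"))

# Generalize year of birth to its decade and measure again.
generalized = [{**r, "birth_decade": r["birth_year"] // 10 * 10} for r in records]
generalized_risk = max_risk(generalized, ("sex", "birth_decade", "postal"))

print(raw_risk, generalized_risk)
```

Here every raw record is unique on its quasi-identifiers, so the assumed risk is 1.0. After generalizing birth year to a decade, each record shares its values with one other record and the risk drops to 0.5.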
Spectrum of Identifiability
[Figure: a scale running from "Little De-identification" to "Significant De-identification", with points labelled 5, 20, 3, 2, 10, 811, 16 (as extracted)]
There is a range of operational precedents, based on situational context and mitigating controls.
Large EMR Vendor
Enabling Post-marketing and Public Health Surveillance
Customer Profile: EMR vendor with more than 2664 clinics and 5850 physicians using the system in family clinics and walk-in clinics. The data set spans more than five years of all clinical, prescription, laboratory, scheduling and billing data.
Challenge:
• Wants to anonymize data on 535,595 patients from general practices
• Longitudinal data needs to be used for on-going and on-demand analytics
Why Privacy Analytics: De-identified data would allow:
1. Post-marketing surveillance of adverse events
2. Public health surveillance
3. Prescription pattern analysis
4. Health services analysis
Solution:
• PARAT CORE
• PARAT integrated in ETL pipeline
• Two-arm protocol: GI events after taking NSAIDs with and without a PPI
GI Protocol
• Females aged 14-24 years (inclusive) who were tested, and tested positive, for chlamydia in the previous 12 months
Chlamydia Protocol
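The inclusion criterion above could be expressed as a simple filter over patient records. This is a sketch under assumptions that are not in the slides: the record layout (`sex`, `age`, `lab_tests`), the function `in_chlamydia_cohort`, the use of 365 days for "12 months", and the reference date are all hypothetical.

```python
from datetime import date, timedelta

def in_chlamydia_cohort(patient, as_of):
    """True if the patient is female, aged 14-24 inclusive, and has a positive
    chlamydia test dated within the 365 days up to and including as_of."""
    window_start = as_of - timedelta(days=365)  # assumed meaning of "12 months"
    has_recent_positive = any(
        t["test"] == "chlamydia"
        and t["result"] == "positive"
        and window_start <= t["date"] <= as_of
        for t in patient["lab_tests"]
    )
    return patient["sex"] == "F" and 14 <= patient["age"] <= 24 and has_recent_positive

# Made-up patients; ages are assumed to be as of the reference date.
as_of = date(2014, 10, 21)
patients = [
    {"id": "p1", "sex": "F", "age": 19,
     "lab_tests": [{"test": "chlamydia", "result": "positive", "date": date(2014, 5, 1)}]},
    {"id": "p2", "sex": "F", "age": 25,
     "lab_tests": [{"test": "chlamydia", "result": "positive", "date": date(2014, 5, 1)}]},
    {"id": "p3", "sex": "F", "age": 20,
     "lab_tests": [{"test": "chlamydia", "result": "positive", "date": date(2013, 1, 15)}]},
    {"id": "p4", "sex": "M", "age": 20,
     "lab_tests": [{"test": "chlamydia", "result": "positive", "date": date(2014, 6, 1)}]},
    {"id": "p5", "sex": "F", "age": 14,
     "lab_tests": [{"test": "chlamydia", "result": "negative", "date": date(2014, 6, 1)}]},
]

cohort = [p["id"] for p in patients if in_chlamydia_cohort(p, as_of)]
print(cohort)
```

Only p1 meets all three conditions. The others fail on age, test date, sex, or test result respectively.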