
Page 1

Clinical e-Science Framework
De-identifying the EHR: building a resource for research

All Hands Meeting - BOF session
Dr Dipak Kalra, UCL, on behalf of the CLEF Consortium

Page 2

CLEF’s Goals

• Collect clinical information from multiple sites
• Analyse, structure and integrate it
• Make it available using GRID tools (e.g. myGrid)
• To authorised clinicians and e-Health scientists
• In a secure and ethical collaborative framework

[Diagram: GRID; Ethical oversight committee]

Page 3

The CLEF records resource

• a repository of longitudinal cancer clinical records
  – that has been analysed and semantically indexed
  – to provide a summary of what happened and why at each point in a patient's evolving story of care
  – that can be queried across substantial populations of similar patients through an intuitive query workbench

Page 4

The CLEF repository has to be:

• scalable to populate
  – capable of incorporating large numbers of fine-grained personal health records
  – from many different clinical systems in primary, secondary and tertiary care
  – each longitudinally linked so that the CLEF record can grow as each actual patient's care progresses

• widely accessible to distributed research teams across the UK and ultimately internationally

• conformant to ethical and legal requirements

Page 5

All use of personal health data is regulated

• In the UK:
  – Common Law of Confidentiality
  – Data Protection Act 1998
  – Human Rights Act 1998
  – Section 60 of the Health & Social Care Act 2001
  – BMA Guidance Oct 1999
  – GMC Guidance Sept 2000

• At a European Level:
  – European Community Directive 95/46/EC (1995)
  – Council of Europe Recommendation R(97)5 (1997)

Page 6

Personal data

• The Data Protection Act defines "personal data" as: "data which relate to a living individual who can be identified (a) from those data, or (b) from those data and other information which is in the possession of, or is likely to come into the possession of, the data controller"

• This is likely to apply to any clinically useful information about living patients

• Patient consent would be required for CLEF to acquire the data into its repository, and for each new kind of research access to the data
  – This is not scalable

Page 7

Anonymised research data

• If legitimately processed for research or statistical purposes, data “can be kept indefinitely and are exempt from the subject access rights if the results of the work are not made available in a form from which data subjects can be identified”

• If CLEF can make sure the data is anonymous, consent is not required and the data may be used for any reasonable research purpose
  – This is the only scalable approach

• But no anonymisation can be perfect

Page 8

The CLEF ethics approach

1) de-identify the data
2) depersonalise the parts of the record which are most vulnerable to revealing who the patient is
3) still treat the data as having some small potential risk of re-identification
  – regulate, restrict and monitor access

Page 9

[Figure: Architecture Outline, showing a Data Acquisition Cycle and a Data Access Cycle. Components: Re-identify by Hospital; Pseudonymise in Hospital; Depersonalise; Extract Information; Integrate & Aggregate; Construct ‘Chronicle’; Chronicle; Ethical oversight committee; Pseudonymised Repository; Hazard monitoring; Knowledge enrichment; Summarise & Formulate; Queries; Individual Summaries & Queries; Privacy Enhancement Technologies]

Page 10

1) De-identification

1a) replace real patient identifiers with random keys
  – done securely by the clinical site providing the data
  – consistent longitudinally within an enterprise (across enterprises is more difficult)
  – not known to CLEF
  – but it remains possible for the original site to re-identify a patient if this is warranted
  – "one-way key encryption"

1b) exclude highly identifying data elements from the record extraction
  – e.g. demographics (except postal district, gender, year of birth)
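Step 1a) can be illustrated as keyed hashing performed at the clinical site. This is a minimal sketch, assuming HMAC-SHA256 as one possible reading of "one-way key encryption"; the class name `SitePseudonymiser`, the 16-character key length and the in-memory lookup table are hypothetical, not CLEF's actual design. The secret never leaves the site, so the same patient always receives the same key, CLEF cannot reverse it, and only the site can map a key back to a patient.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Optional


class SitePseudonymiser:
    """Hypothetical site-side pseudonymiser; a sketch, not CLEF's implementation."""

    def __init__(self, site_secret: Optional[bytes] = None) -> None:
        # Secret key held only by the clinical site (random if not supplied).
        self._secret = site_secret or secrets.token_bytes(32)
        # Private re-identification table kept at the site: key -> real identifier.
        self._reverse: Dict[str, str] = {}

    def pseudonym(self, patient_id: str) -> str:
        # HMAC-SHA256 is deterministic for a fixed secret, so the same patient
        # always gets the same key (longitudinal consistency within the site).
        digest = hmac.new(self._secret, patient_id.encode("utf-8"), hashlib.sha256)
        key = digest.hexdigest()[:16].upper()  # truncated for readability only
        self._reverse[key] = patient_id
        return key

    def reidentify(self, key: str) -> Optional[str]:
        # Only the site can do this, because only it holds the table and secret.
        return self._reverse.get(key)


if __name__ == "__main__":
    site = SitePseudonymiser()
    key = site.pseudonym("324A621F")
    print(key, site.reidentify(key))
```

Because each key depends on a site-held secret, two hospitals pseudonymising the same patient produce unrelated keys, which is one reason linkage across enterprises is harder.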

Page 11

2) Depersonalisation

Medical narratives (letters, reports, summaries) are
  – rich in useful clinical data
  – most likely to reveal something personal about the patient

2a) CLEF tools will lexically analyse all such narratives
  – to remove occurrences of personal names or references, locations, highly-specified occupations etc.
  – to extract and code the key features of the clinical story and care process
  – records will be stored within a standards-based architecture
    • incorporating formal access control measures

2b) these original depersonalised narratives will not be accessed directly by the query workbench
  – access will be limited to the extracted coded data
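The removal of names and locations in 2a) can be illustrated with a simple dictionary-based redactor. This sketch assumes the identifiers to remove are already known (for example from the record's own demographics); the function name `redact` is hypothetical. It masks each whole-word match with a run of X characters of the same length, similar to the masking in the worked example, but unlike real lexical analysis it cannot find identifiers that are missing from its list.

```python
import re
from typing import Iterable


def redact(text: str, identifiers: Iterable[str], mask: str = "X") -> str:
    """Mask whole-word occurrences of known identifiers, preserving their length.

    Hypothetical helper: a dictionary lookup standing in for CLEF's lexical
    analysis, which would also need to find identifiers not known in advance.
    """
    # Longest terms first so that "Mrs Smith" wins over "Smith".
    terms = sorted({t for t in identifiers if t}, key=len, reverse=True)
    if not terms:
        return text
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
        flags=re.IGNORECASE,
    )
    # Replace each match with the same number of mask characters.
    return pattern.sub(lambda m: mask * len(m.group(0)), text)


if __name__ == "__main__":
    note = "Mrs Smith could be seen in the pain clinic at the Marsden."
    print(redact(note, ["Smith", "Mrs Smith", "Marsden"]))
```

Preserving the length keeps the layout of the letter intact, at the cost of leaking the length of each removed name.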

Page 12

ROYAL MARSDEN NHS TRUST - PATIENT CASE NOTE 324A621F:MRS Dorothy Smith

DOB: 12/05/44
21, Park Crescent

Basingstoke B12 Q13 16 Dec 1992 Seen in General Surgical  This lady who has had a mastectomy and left open capsulotomy and removal of her prosthesis was seen by me in the clinic today on behalf of Mr Peterson. She has extensive bony lymphoedema in her left arm which does not seem to be getting any better although she is more or less reconciled to the problem. The original problem was that she complained of shooting pain in the direction of ulna nerve and although there does not seem to be any evidence of local, regional or distant recurrence the pain itself warrants management in a pain clinic. Mrs Smith could be seen in the pain clinic at the Marsden but as this would involve a lot of travelling would like to be treated nearer her home. I wonder whether it would be possible for you to investigate if there is a pain clinic available at Basingstoke as I am sure Dotty could be treated and benefit from its management. I have otherwise arranged for her to be seen in the clinic again in a year's time. There are no signs of recurrence at this time. 

Mr Thomas Partridge

Pseudonymisation at hospital:

• Overt identifying information removed in hospital & ID replaced by CLEF Entry Key: "324A621F:MRS Dorothy Smith" → "########:######### #######"; "21, Park Crescent Basingstoke B12 Q13" removed; "CLEF-RMH-Entry-Key: 52A4F6DB2B46E" added
• Obvious mentions of patient name or hospital name removed: "ROYAL MARSDEN" → "##### #######"; "Mrs Smith" → "XXXXXXXXX"; "Marsden" → "XXXXXXX"; "Basingstoke" → "XXXXXXXXXXX"
• Carer pseudonymised: "Mr Thomas Partridge" → "5213A4F612F1"
• Clinic date blurred, preserving sequence: "16 Dec 1992" → "AB 1992"
• Date of birth reduced to year: "12/05/44" → "1944"
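The generalisations annotated on this slide (date of birth reduced to year, postal district kept, clinic date blurred while preserving sequence) might be implemented along the following lines. The two-letter ordinal labels are an assumed scheme for illustration only; the slide does not say how a label such as "AB" is derived. All function names are hypothetical.

```python
from datetime import date
from string import ascii_uppercase
from typing import Dict, List


def year_of_birth(dob: date) -> int:
    # Date of birth reduced to year.
    return dob.year


def postal_district(postcode: str) -> str:
    # Keep only the outward code (postal district), e.g. "B12" from "B12 Q13".
    # Assumes a space separates the outward and inward parts of the postcode.
    return postcode.strip().split()[0].upper()


def _label(index: int) -> str:
    # Hypothetical two-letter ordinal: 0 -> "AA", 1 -> "AB", 26 -> "BA".
    return ascii_uppercase[index // 26] + ascii_uppercase[index % 26]


def blur_dates(dates: List[date]) -> Dict[date, str]:
    """Replace each distinct date in one patient's record with 'LABEL YEAR'.

    Labels are assigned in chronological order, so the sequence of events is
    preserved while the exact day and month are hidden.
    """
    ordered = sorted(set(dates))
    if len(ordered) > 26 * 26:
        raise ValueError("too many distinct dates for two-letter labels")
    return {d: f"{_label(i)} {d.year}" for i, d in enumerate(ordered)}
```

Keeping the year alongside an ordinal label lets researchers reason about intervals in years and about event order, without exposing exact dates that could be matched against other sources.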

Page 13

Depersonalisation by CLEF Language Technology…

Non-obvious identifying information removed using language technology:

##### ####### NHS TRUST - PATIENT CASE NOTE ########:######### ####### DOB: 1944 CLEF-RMH-Entry-Key: 52A4F6DB2B46E

AB 1992 Seen in General Surgical  This lady who has had a mastectomy and left open capsulotomy and removal of her prosthesis was seen by me in the clinic today on behalf of Mr Peterson. She has extensive bony lymphoedema in her left arm which does not seem to be getting any better although she is more or less reconciled to the problem. The original problem was that she complained of shooting pain in the direction of ulna nerve and although there does not seem to be any evidence of local, regional or distant recurrence the pain itself warrants management in a pain clinic. XXXXXXXXX could be seen in the pain clinic at the XXXXXXX but as this would involve a lot of travelling would like to be treated nearer her home. I wonder whether it would be possible for you to investigate if there is a pain clinic available at XXXXXXXXXXX as I am sure Dotty could be treated and benefit from its management. I have otherwise arranged for her to be seen in the clinic again in a year's time. There are no signs of recurrence at this time.

5213A4F612F1

• Nickname “Dotty” spotted by language software & removed: "Dotty" → "XXXXX"
• Carer name spotted & pseudonymised: "Mr Peterson" → "XXXXXXXXXXX"
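Catching the nickname "Dotty" requires knowing that it is a variant of "Dorothy". A minimal sketch, assuming a hand-built nickname table (the `NICKNAMES` entries and both function names are hypothetical), expands the recorded name into likely variants and then finds them in the text. A real system would rely on much larger lexicons together with named-entity recognition.

```python
import re
from typing import Dict, List, Sequence, Set

# Hypothetical nickname lexicon; a real one would be far larger.
NICKNAMES: Dict[str, List[str]] = {
    "dorothy": ["dot", "dotty", "dottie", "dolly"],
    "thomas": ["tom", "tommy"],
}


def name_variants(
    forename: str,
    surname: str,
    titles: Sequence[str] = ("Mr", "Mrs", "Ms", "Miss"),
) -> Set[str]:
    """Expand a recorded name into forms likely to appear in clinical letters."""
    variants = {forename, surname, f"{forename} {surname}"}
    variants.update(f"{title} {surname}" for title in titles)
    # Add known nicknames for the forename (e.g. "Dotty" for "Dorothy").
    variants.update(n.capitalize() for n in NICKNAMES.get(forename.lower(), []))
    return variants


def find_mentions(text: str, variants: Set[str]) -> List[str]:
    # Whole-word, case-insensitive search; longest variants are tried first.
    alternatives = sorted(variants, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(map(re.escape, alternatives)) + r")\b",
        flags=re.IGNORECASE,
    )
    return [m.group(0) for m in pattern.finditer(text)]
```

Trying longer variants first means "Mrs Smith" is reported as one mention rather than as a separate "Smith".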

Page 14

##### ####### NHS TRUST - PATIENT CASE NOTE ########:######### ####### DOB: 1944 CLEF-RMH-Entry-Key: 52A4F6DB2B46E

AB 1992 Seen in General Surgical  This lady who has had a mastectomy and left open capsulotomy and removal of her prosthesis was seen by me in the clinic today on behalf of XXXXXXXXXXX. She has extensive bony lymphoedema in her left arm which does not seem to be getting any better although she is more or less reconciled to the problem. The original problem was that she complained of shooting pain in the direction of ulna nerve and although there does not seem to be any evidence of local, regional or distant recurrence the pain itself warrants management in a pain clinic. XXXXXXXXX could be seen in the pain clinic at the XXXXXXX but as this would involve a lot of travelling would like to be treated nearer her home. I wonder whether it would be possible for you to investigate if there is a pain clinic available at XXXXXXXXXXX as I am sure XXXXX could be treated and benefit from its management. I have otherwise arranged for her to be seen in the clinic again in a year's time. There are no signs of recurrence at this time.

5213A4F612F1

Extraction of key information from text

Information Extraction identifies events and relationships between them from the text, based on templates & knowledge resources.

[Figure: information extracted from the depersonalised case note. Template categories: Problems, Problem Site, Interventions, Locations, Time. Extracted phrases: recurrence; no signs of recurrence; bony lymphoedema; shooting pain in the direction of ulna nerve; pain; left arm; local, regional or distant; a year’s time; today; at this time; pain clinic; clinic; General Surgical; mastectomy; left open capsulotomy; removal of her prosthesis; management]
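Template-based extraction can be illustrated with a toy example: a small lexicon stands in for the knowledge resources, one template is filled per sentence, and phrases found in the same sentence are treated as related. This is a deliberately naive sketch; the lexicon contents, the `Event` type and the sentence-level co-occurrence rule are all assumptions, not CLEF's information extraction system.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical knowledge resource: template slot -> known phrases.
LEXICON: Dict[str, List[str]] = {
    "Problems": ["lymphoedema", "shooting pain", "recurrence"],
    "Problem Site": ["left arm", "ulna nerve"],
    "Interventions": ["mastectomy", "capsulotomy", "removal of her prosthesis"],
    "Locations": ["pain clinic", "general surgical"],
    "Time": ["today", "a year's time", "at this time"],
}


@dataclass
class Event:
    sentence: int                 # index of the sentence the event came from
    slots: Dict[str, List[str]]   # filled template slots


def extract(text: str) -> List[Event]:
    """Fill one template per sentence with lexicon phrases found in it.

    Phrases in the same sentence are treated as related, a crude stand-in
    for real relation extraction. Negation ("no signs of") is not detected.
    """
    events = []
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    for index, sentence in enumerate(sentences):
        lowered = sentence.lower()
        slots = {
            slot: [p for p in phrases if p in lowered]
            for slot, phrases in LEXICON.items()
        }
        slots = {slot: found for slot, found in slots.items() if found}
        if slots:
            events.append(Event(index, slots))
    return events
```

Substring lookup is enough to show how templates are filled, but it cannot tell "recurrence" from "no signs of recurrence"; that distinction needs the linguistic analysis the slide refers to.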

Page 15

3a) Regulation and restriction of access

• Ethical Oversight Board will approve the kinds of organisations, teams and purposes for which the CLEF repository may be queried
  – defining the appropriate security measures to be taken
    • e.g. for authentication, authorisation and encryption

• A research-project-specific identifier will be used for data extracts to prevent cross-linking
  – the approval process will determine the extent to which longitudinal access to records is required, and the extent of drill-down permitted
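A research-project-specific identifier can be derived from the CLEF entry key with a keyed hash that also takes the project as input. The sketch assumes HMAC-SHA256 and a hypothetical repository-held secret; `project_pseudonym` is an illustrative name. Because the project identifier is part of the hash input, extracts issued to different projects carry unrelated identifiers for the same patient and cannot be cross-linked on them, while a single project still sees a stable identifier where longitudinal access is approved.

```python
import hashlib
import hmac


def project_pseudonym(repository_secret: bytes, project_id: str, entry_key: str) -> str:
    """Derive the identifier a record carries in one project's data extract.

    Hypothetical helper: the same entry key yields unrelated identifiers in
    different projects, but a stable identifier within one project.
    """
    # The unit-separator character keeps ("P1", "2X") distinct from ("P12", "X").
    message = f"{project_id}\x1f{entry_key}".encode("utf-8")
    return hmac.new(repository_secret, message, hashlib.sha256).hexdigest()[:16]
```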

Page 16

3b) Monitoring of access

• All access will be logged in an audit trail database
• Published algorithms will be used to help detect attempts to combine queries maliciously
• Selected research clients will be requested to help spot personal characteristics that slip through the net
  – the process of depersonalisation is still early R&D
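As a hedged illustration of monitoring, the hypothetical `AuditLog` below records every query and applies one simple heuristic for a differencing attack: if the same user's queries select patient sets that differ by only a few patients, comparing their aggregates may reveal information about those patients. The threshold and the heuristic are assumptions; this is not one of the published algorithms the slide refers to.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import FrozenSet, List, Tuple

LogEntry = Tuple[datetime, str, str, FrozenSet[str]]


@dataclass
class AuditLog:
    """Hypothetical audit trail with a naive differencing-attack check."""

    min_difference: int = 20  # assumed threshold, not a CLEF policy
    entries: List[LogEntry] = field(default_factory=list)

    def record(self, user: str, query: str, cohort: FrozenSet[str]) -> bool:
        """Log a query and its result cohort; return True if it looks suspicious.

        A query is flagged when the same user has already selected a cohort that
        differs from this one by fewer than `min_difference` patients (but is not
        identical), since comparing the two aggregates could expose those patients.
        """
        suspicious = any(
            prior_user == user and 0 < len(cohort ^ prior_cohort) < self.min_difference
            for _, prior_user, _, prior_cohort in self.entries
        )
        # Every query is logged, whether or not it is flagged.
        self.entries.append((datetime.now(timezone.utc), user, query, cohort))
        return suspicious
```

Checking only one user's history is a simplification; colluding users could split an attack across accounts, which is why audit review by people remains part of the approach.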

Page 17

Privacy Enhancement & authorisation

Queries logged, threats to confidentiality monitored.

[Screenshot: CLEF WYSIWYM Query Writer]

Relevant Subjects: [Male] patients with [adenocarcinoma] of [this laterality] of [this part] of [breast] AND [age] at [diagnosis] was [less than 30].

Treatment Profiles: Patients who received [radiotherapy] [daily], compared with patients who received [radiotherapy] [every other day] and those who received [no radiotherapy].

Outcome Measures: Percentage of patients [alive] after [1 year] and after [2 years] and after [5 years].

WARNING: Fewer than 20 male patients diagnosed with adenocarcinoma of the breast were found. Further subanalysis on small groups increases the risk that a patient may be identifiable. Your CLEF security authorisation does not permit your query to be processed.

Queries on small patient groups are blocked or the figures blurred.

[Female] patients with [adenocarcinoma] of [this laterality] of [this part] of [breast]

QUERY RESULT: 1792 patients diagnosed with adenocarcinoma of the breast were found. 788 had radiotherapy daily, 513 had it on alternate days and 491 had no radiotherapy.

After 5 years, 20% (n=158) of patients who had daily treatment were alive. After 5 years, 10% (n=49) of patients who had alternate-day treatment were alive. After 5 years, 5% (n=27) of the patients who had no treatment were alive.

With special authorisation, researchers may examine individual records in anonymised form.

CLEF Patient Chronicle Viewer: textual summary of CLEF Chronicle for patient #17

1974: Grade III infiltrating ductal carcinoma left breast; 7/22 sampled nodes positive; Radical Mastectomy Left Breast; Insertion Left Breast Prosthesis; MEFUP Chemotherapy
1982/3: Recurrence Left supraclavicular nodes; Excision biopsy of nodes; Radiotherapy
1992: Replacement of Left Breast Prosthesis; Removal of replacement to left breast prosthesis
1994: Recurrence inside chest (confirmed biopsy); VAC Chemotherapy aborted (toxicity); Radiotherapy completed; L5/S1 degeneration; Left phrenic nerve paralysis
1996: Multiple pulmonary emboli; Post-radiation fibrosis left upper lung; Prior rib fractures; Frontal lobe ischaemic atrophy; Teflon injection vocal cord
1997: Recurrence in chest; Pleural effusions; VAC Chemotherapy 6 cycles
1998: Recurrence in chest; Radiotherapy; Normal Left Shoulder Xray
1999: No evidence of recurrence; Congestive cardiac failure; Died June 1999

[Figure: graphical ‘time line’ view of the CLEF Chronicle, 1975 to 2000, showing the diagnosis (Grade III infiltrating ductal carcinoma left breast), recurrences (R), Tamoxifen and Arimidex, radiotherapy, chemotherapy, staging CTs (nodes, liver, spleen, kidney, bone), TNM staging and Stage IIA, Stage IIIc and Stage IV]
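The small-group protection shown in the query mock-up can be sketched as threshold-based suppression, with rounding as an alternative way to blur figures. The threshold of 20 is taken from the mock-up's warning message and is assumed here, not a stated CLEF policy; the function names are hypothetical.

```python
from typing import Optional

# Assumed from the mock-up warning; a real threshold would be set by the
# Ethical Oversight Board.
MIN_CELL = 20


def safe_count(n: int, min_cell: int = MIN_CELL) -> Optional[int]:
    """Return n, or None (query blocked) when the group is too small to release."""
    return n if n >= min_cell else None


def blurred_count(n: int, base: int = 5) -> int:
    # Alternative to blocking: round the count to the nearest multiple of `base`.
    return base * round(n / base)


def survival_percentage(alive: int, treated: int, min_cell: int = MIN_CELL) -> Optional[int]:
    """Percentage alive, released only if both counts pass the threshold."""
    if safe_count(treated, min_cell) is None or safe_count(alive, min_cell) is None:
        return None
    return round(100 * alive / treated)
```

Applied to the mock-up figures, `survival_percentage(158, 788)` returns 20, `survival_percentage(49, 513)` returns 10 and `survival_percentage(27, 491)` returns 5, matching the displayed percentages; the three treatment groups (788 + 513 + 491) also sum to the reported 1792 patients.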

Page 18

Gaining ethical approval

• This depersonalisation process has MREC approval as a valid candidate methodology, but as it has not yet been validated, CLEF cannot yet use live patient data without consent

• However, the project has been approved to use the records of deceased patients as an initial step towards developing, refining and evaluating the depersonalisation approach

• If successful, CLEF hopes to be permitted to migrate to live patients' records next year

Page 19

Intended final security results

• A validated approach
  – accepted by MREC, PIAG, and other stakeholder groups (BMA, GMS, NHS, etc.)

• Exemplar policies and procedures
  – Ethical Oversight Committee
  – employee/researcher contracts
  – safe data extraction
  – access controls

• Open source tools
  – mechanisms to support security
  – active monitoring of use, limiting risk of inferential attack