Overview of the Synthetic Derivative April 16, 2010 Melissa Basford, MBA Program Manager – Synthetic Derivative


Page 1

Overview of the Synthetic Derivative

April 16, 2010

Melissa Basford, MBA, Program Manager – Synthetic Derivative

Page 2

Synthetic Derivative resource overview

Rich, multi-source database of de-identified clinical and demographic data

Contains ~1.8 million records, ~1 million of which have detailed longitudinal data; these average ~100 KB in size and carry an average of 27 codes per record

Records are updated over time and are current through 7/31/09

Page 3

SD Establishment

[Diagram of how the SD is established. Recoverable labels: Star Server, EDW, and HEO (information collected during clinical care); data export; de-identification (one-way hash); data parsing (restructuring for research); SD Database; access through secured online application.]
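The "restructuring for research" step can be pictured as grouping exported rows by a hashed patient key into time-ordered lists. The sketch below is a minimal illustration under stated assumptions: the row layout (`mrn`, `source`, `date`, `item`) and the helper names are hypothetical, SHA-512 is taken from the later de-identification slide, and narrative scrubbing and date shifting are deliberately omitted.

```python
import hashlib
from collections import defaultdict
from datetime import date


def one_way_hash(mrn: str) -> str:
    """SHA-512 hex digest of the MRN (the algorithm named on a later slide)."""
    return hashlib.sha512(mrn.encode("utf-8")).hexdigest()


def build_sd_records(exported_rows):
    """Restructure exported rows into per-patient, time-ordered event lists.

    Each row is a hypothetical dict with keys 'mrn', 'source', 'date', 'item'.
    The MRN is replaced by its hash and is not kept in the stored events.
    Narrative scrubbing and date shifting are deliberately left out.
    """
    records = defaultdict(list)
    for row in exported_rows:
        key = one_way_hash(row["mrn"])
        event = {k: v for k, v in row.items() if k != "mrn"}
        records[key].append(event)
    for events in records.values():
        events.sort(key=lambda e: e["date"])  # longitudinal order within a patient
    return dict(records)


# Hypothetical exported rows from two of the source systems named in the diagram.
rows = [
    {"mrn": "001", "source": "EDW", "date": date(2008, 5, 2), "item": "lab value"},
    {"mrn": "001", "source": "Star Server", "date": date(2007, 1, 9), "item": "clinical note"},
    {"mrn": "002", "source": "EDW", "date": date(2009, 3, 1), "item": "ICD-9 code"},
]
sd = build_sd_records(rows)
print(len(sd))  # → 2
```

Keying on the hash rather than the MRN is what lets rows from different sources for the same patient land in one record without the MRN being stored.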

Page 4

Data Types (so far)

• Narratives, such as:
  ◦ Clinical notes
  ◦ Discharge summaries
  ◦ History & physicals
  ◦ Problem lists
  ◦ Surgical reports
  ◦ Progress notes
  ◦ Letters & clinical communications
• Diagnostic codes, procedural codes
• Forms (intake, assessment)
• Reports (pathology, ECGs, echocardiograms)
• Lab values and vital signs
• Medication orders
• TraceMaster (ECGs)
• ~100 SNPs for 7000+ samples

Page 5

Research use cases assumed in resource development (either alone, or with DNA samples)

Retrospective chart reviews

Rapid preliminary data for grant submissions

Feasibility assessment

Hypothesis generation

Page 6

Technology + policy

De-identification
• A 128-character identifier (RUI) is derived from the MRN by the Secure Hash Algorithm (SHA-512)
• The RUI is unique to its input and cannot be used to regenerate the MRN
• The RUI links data through time and across data sources
• HIPAA identifiers are removed using a combination of custom techniques and established de-identification software

Restricted access & continuous oversight
• Access restricted to VU; not a public resource
• IRB approval for study (non-human subjects)
• Data Use Agreement
• Audit logs of all searches and data exports
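A SHA-512 digest is 512 bits, which is exactly 128 characters when written in hexadecimal, matching the "128-character identifier" above. The sketch below assumes the RUI is that hex encoding. The function name is hypothetical, and the slide does not say how MRNs are normalized or whether a secret salt or key is mixed in. Without one, a small MRN space could in principle be enumerated, so treat this as an illustration of the hash properties only.

```python
import hashlib


def derive_rui(mrn: str) -> str:
    """Map an MRN to a 128-character identifier with SHA-512.

    SHA-512 yields a 512-bit digest, which is 128 hexadecimal characters.
    Sketch only: input normalization and any secret salt/key used by the
    real system are not described on the slide.
    """
    return hashlib.sha512(mrn.encode("utf-8")).hexdigest()


rui = derive_rui("12345678")  # hypothetical MRN
print(len(rui))  # → 128
print(rui == derive_rui("12345678"))  # → True: same MRN, same RUI, so data link up
```

Determinism is what lets the RUI link records over time and across sources; the one-way property is what keeps the MRN out of the SD.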

Page 7

Date shift feature

Our algorithm shifts the dates within a record by a time period that is consistent within each record but differs across records, up to 364 days backwards (e.g., if a date in a particular record is April 1, 2005 and the randomly generated shift is 45 days into the past, then the date in the SD is February 15, 2005).
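The date shift can be sketched as drawing one backward offset per record and subtracting it from every date in that record. This is a minimal sketch, assuming a uniform draw over 0 to 364 days (the slide gives only the maximum); the function names are hypothetical, and `random.Random` is used for reproducibility here, not as a claim about the generator the SD uses.

```python
import random
from datetime import date, timedelta

MAX_SHIFT_DAYS = 364  # slide: shifts of up to 364 days backwards


def draw_shift(rng: random.Random) -> timedelta:
    """Draw one backward shift per record.

    Assumption: uniform over 0..364 days; the slide only states the maximum.
    """
    return timedelta(days=rng.randint(0, MAX_SHIFT_DAYS))


def shift_record(dates, shift):
    """Move every date in one record back by the same shift."""
    return [d - shift for d in dates]


# Worked example from the slide: a 45-day backward shift.
print(shift_record([date(2005, 4, 1)], timedelta(days=45)))
# → [datetime.date(2005, 2, 15)]

# A random per-record shift keeps intervals inside the record intact.
rng = random.Random(2010)
record = [date(2005, 4, 1), date(2005, 6, 1)]
shifted = shift_record(record, draw_shift(rng))
print(shifted[1] - shifted[0] == record[1] - record[0])  # → True
```

Because the offset is shared within a record, order and spacing of events survive; because it differs across records, calendar dates cannot be compared between patients, which is why date-specific studies are out of scope.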

Page 8

What the SD can’t do
• Outbreaks and other date-specific studies (catastrophes, etc.)
• Find a specific patient (e.g., to contact)
• Replace large-scale epidemiology research (e.g., TennCare database)
• Temporal search capabilities are limited (but under development)
  ◦ “First this, then that” study designs require significant manual effort
  ◦ Expect “timeline” views and searching Q1-Q2
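A "first this, then that" check only needs ordering within one record, which the per-record date shift preserves. The sketch below assumes each record is available as a list of `(date, code)` pairs; the record layout, the codes `MED_X` and `RASH`, and the `max_days` parameter are all hypothetical, and same-day pairs are not counted as an assumption about ambiguous ordering.

```python
from datetime import date


def first_this_then_that(timeline, first, then, max_days=None):
    """True if an event coded `first` is later followed by an event coded `then`.

    timeline: (date, code) pairs for one de-identified record.
    max_days: optional upper bound on the gap in days (hypothetical parameter).
    Same-day pairs are not counted, since their order is ambiguous (an assumption).
    """
    first_dates = [d for d, code in timeline if code == first]
    then_dates = [d for d, code in timeline if code == then]
    for a in first_dates:
        for b in then_dates:
            gap = (b - a).days
            if gap > 0 and (max_days is None or gap <= max_days):
                return True
    return False


# Hypothetical timelines keyed by RUI; codes are placeholders, not real ICD-9 codes.
records = {
    "rui_a": [(date(2005, 2, 15), "MED_X"), (date(2005, 3, 1), "RASH")],
    "rui_b": [(date(2006, 7, 1), "RASH"), (date(2006, 8, 1), "MED_X")],
}
matches = [rui for rui, tl in records.items()
           if first_this_then_that(tl, "MED_X", "RASH", max_days=30)]
print(matches)  # → ['rui_a']
```

The optional window reflects the timing concern raised later (an adverse event must follow the medication within some attributable timescale); choosing that window is a study-design decision, not something the code settles.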

Page 9

Demographic Characteristics

                    SD          Davidson County   Tennessee   United States
N                   1,716,085   578,698           6,038,803   299,398,484

Gender (%)
Female              55.2        51.3              51.1        50.7
Male                44.6        48.7              48.9        48.3
Unknown             0.2         -                 -           -

Race/Ethnicity* (%)
African American    14.3        27.9              16.9        12.8
Asian / Pacific     1.2         3.0               1.4         4.6
Caucasian           80.5        60.1              77.5        66.4
Hispanic            2.6         7.1               3.2         14.8
American Indian     0.1         0.4               0.3         1.0
Others              1.4         -                 -           -
Multiple Races      0           1.5               1.0         1.6

*A significant number of SD records are of unknown race/ethnicity. Multiple efforts are underway to better classify these records including NLP on narratives.

Page 10

Examples of frequent diagnoses in total SD

[Bar chart of the most frequent diagnosis codes; axis scale 0 to 70,000]

Top diagnosis codes overall:
1. FEVER
2. CHEST PAIN
3. ABDOMINAL PAIN
4. COUGH
5. PAIN IN LIMB
6. HYPERTENSION
7. ROUTINE MEDICAL EXAM
8. ACUTE URI
9. MALAISE & FATIGUE
10. HEADACHE
11. URINARY TRACT INFECTION

Page 11

Examples of frequent diagnoses among peds in SD

[Bar chart of the most frequent pediatric diagnosis codes; axis scale 0 to 9,000]

Top pediatric diagnosis codes:
• ROUTIN CHILD HEALTH EXAM
• FEVER
• COUGH
• ACUTE PHARYNGITIS
• URIN TRACT INFECTION NOS
• VOMITING ALONE
• CARDIAC MURMURS NEC
• ABDOMINAL PAIN-SITE NOS
• OTITIS MEDIA NOS
• ACUTE URI NOS
• PAIN IN LIMB

Page 12

Examples of ICD-9 codes for rare diseases

Example Rare Disease          Frequency   Number in SD   Number in BioVU
Microcephalus                 0.00007     566            6
Pica                          0.00004     59             9
Septicemic Plague             0.00004     20             0
Pick’s Disease                0.00004     72             7
Acromegaly and Gigantism      0.00041     464            57
Ehlers-Danlos Syndrome        0.00011     154            9
Narcolepsy without Cataplexy  0.00004     166            17
Spina Bifida                  0.00022     1327           77
Stiff-Man Syndrome            0.00007     42             5
Tourette Syndrome             0.00007     366            9
Bell’s Palsy                  0.00078     1509           141
Bulimia Nervosa               0.00021     640            35
Cushing’s                     0.00116     1065           129
Peyronie’s Disease            0.00018     369            57

Page 13

Statistical considerations and limitations

Working with biostatistics (Schildcrout) on these issues. Some considerations:
• Selection bias for inclusion in population; representativeness of cohort and generalizability
• Bias in ICD-9 coding
• Confounding by indication
• Severity of disease
• Medication prescribed/ordered vs. received
• Timing
  ◦ For example, an adverse event (AE) must come after the medication (time course)
  ◦ Timescale over which one event could be attributed to another
• Dropout (death vs. discharge vs. transfer)
• Intervention based on in-hospital disease history

Page 14

Using the SD resource

Page 15

SD Access Protocol

[Flowchart] Steps shown:
• Researcher requests IRB exemption
• Researcher enters StarBRITE to complete the electronic application (IRB status is in StarBRITE)
• Researcher signs DUA
• SD staff verify; access granted
• Researcher accesses SD

Page 16

Data Use Agreement Components

Page 17

Phenotype Searching
• Definition of phenotype for cases and controls is critical
  ◦ May require consultation with experts
• A basic understanding of the data elements, and of the uses and limitations of particular data points, is important
  ◦ List of ‘watch outs’ under development
• Manually reviewing records to make case determinations (or even just to calculate the positive predictive value, PPV, of a search methodology) will be somewhat time-consuming
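The PPV of a search is the fraction of its hits that manual review confirms as true cases. The sketch below computes it from per-record review outcomes and, to limit the review burden, draws a reproducible random sample of hits to review; sampling is one common way to do this, not something the slide prescribes. Function names and the numbers in the example are hypothetical.

```python
import random


def sample_for_review(hit_ids, n, seed=0):
    """Pick a reproducible random subset of search hits for manual chart review."""
    rng = random.Random(seed)
    return rng.sample(list(hit_ids), min(n, len(hit_ids)))


def positive_predictive_value(review_results):
    """PPV = confirmed cases / reviewed search hits.

    review_results: one bool per reviewed hit, True if chart review
    confirmed the record as a case.
    """
    if not review_results:
        raise ValueError("at least one reviewed record is needed")
    return sum(review_results) / len(review_results)


# Hypothetical numbers: 50 sampled hits out of 1,000, 42 confirmed on review.
to_review = sample_for_review(range(1, 1001), 50)
results = [True] * 42 + [False] * 8
print(positive_predictive_value(results))  # → 0.84
```

PPV says nothing about cases the search missed; estimating those (sensitivity) would require reviewing records outside the hit list as well.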

Page 18

The problem with ICD-9 codes

ICD-9 codes give both false negatives and false positives.

False negatives:
• Outpatient billing is limited to 4 diagnoses per visit
• Outpatient billing is done by physicians (e.g., it takes too long to look up an unfamiliar ICD-9 code)
• Inpatient billing is done by professional coders, who:
  ◦ omit codes that don’t pay well
  ◦ can only code problems explicitly mentioned in the documentation

False positives:
• Diagnoses evolve over time: physicians may initially bill for suspected diagnoses that are later determined to be incorrect
• Billing the wrong code (perhaps because it is easier to find for a busier clinician)
• Physicians may bill for a different condition if it pays for a given treatment
  ◦ Example: anti-TNF biologics (e.g., infliximab) were originally not covered for psoriatic arthritis, so rheumatologists would code the patient as having rheumatoid arthritis

Page 19

Lessons from preliminary phenotype development (can be corrected)
• Eliminating negated and uncertain terms: “I don’t think this is MS”, “uncertain if multiple sclerosis”
• Delineating the section tag of the note: “FAMILY MEDICAL HISTORY: Mother had multiple sclerosis.”
• Adding requirements for further signs of “severity of disease”
  ◦ For MS: an MRI with T2 enhancement, myelin basic protein or oligoclonal bands on lumbar puncture, etc.
  ◦ This could potentially miss patients with outside work-ups, however
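The first two corrections can be sketched as a rule-based filter: track which section a line belongs to, skip excluded sections such as family history, and drop sentences containing negation or uncertainty cues. This is a toy, loosely in the spirit of NegEx-style rules; the cue lists, header pattern, and function name are hypothetical and far smaller than anything usable in practice.

```python
import re

# Hypothetical, deliberately tiny cue lists; a real system needs far more.
NEGATION_OR_UNCERTAINTY_CUES = [
    r"\bdon[’']?t think\b", r"\bno evidence of\b", r"\bnot\b",
    r"\buncertain if\b", r"\brule out\b", r"\bpossible\b",
]
EXCLUDED_SECTIONS = {"FAMILY MEDICAL HISTORY"}
SECTION_HEADER = re.compile(r"^([A-Z][A-Z &/]+):\s*(.*)$")  # e.g. "ASSESSMENT: ..."
MS_PATTERN = r"\bMS\b|multiple sclerosis"


def affirmed_mentions(note: str, term_pattern: str):
    """Return sentences mentioning the term, skipping negated/uncertain
    sentences and any text under an excluded section header."""
    term = re.compile(term_pattern, re.IGNORECASE)
    section = None
    hits = []
    for line in note.splitlines():
        header = SECTION_HEADER.match(line.strip())
        if header:  # a new section starts; keep only the text after the colon
            section, line = header.group(1), header.group(2)
        if section in EXCLUDED_SECTIONS:  # stays excluded until the next header
            continue
        for sentence in re.split(r"(?<=[.!?])\s+", line):
            if not term.search(sentence):
                continue
            if any(re.search(cue, sentence, re.IGNORECASE)
                   for cue in NEGATION_OR_UNCERTAINTY_CUES):
                continue
            hits.append(sentence.strip())
    return hits


example_note = (
    "HISTORY OF PRESENT ILLNESS: I don't think this is MS. "
    "Uncertain if multiple sclerosis.\n"
    "FAMILY MEDICAL HISTORY: Mother had multiple sclerosis.\n"
    "ASSESSMENT: Relapsing-remitting multiple sclerosis."
)
print(affirmed_mentions(example_note, MS_PATTERN))
# → ['Relapsing-remitting multiple sclerosis.']
```

Sentence-level cue matching is coarse: a cue anywhere in the sentence suppresses the mention, so it trades some missed cases for fewer false positives, and the severity requirements would still need to be checked separately.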

Page 20

Other lessons (more difficult to correct via algorithms)
• A number of incorrect ICD-9 codes for RA and MS were assigned to patients
• Evolving disease: “Recently diagnosed with Susac’s syndrome - prior diagnosis of MS incorrect.” (Notes also included a thorough discussion of MS, ADEM, and Susac’s syndrome.)
• Difference between two doctors:
  ◦ Presurgical admission H&P includes “rheumatoid arthritis” in the past medical history
  ◦ Rheumatology clinic visit notes say the diagnosis is “dermatomyositis” and never mention RA
• Sometimes incorrect diagnoses are propagated through the record due to cutting-and-pasting / note reuse

Page 21

Resources
• StarPanel: identified clinical data; designed for clinical use
• Record Counter: de-identified clinical data; sophisticated phenotype searching; returns a number (record counts and aggregate demographics)
• Synthetic Derivative: de-identified clinical data; sophisticated phenotype searching; returns record counts AND de-identified narratives, test values, medications, etc., for review and creation of study data sets
• BioVU: SNP data; de-identified clinical data; sophisticated phenotype searching; able to link phenotype information to biological samples

Page 22

Live Demo