Electronic Phenotyping for Genomic Research George Hripcsak, Columbia University On behalf of Phenotyping WG October 30, 2017




1. How can eMERGE improve upon the current labor-intensive phenotyping toward fully automated phenotyping methods to increase phenotyping efficiency and validity using EMRs?


Phenotype sharing

• One part of the labor is sharing
  – eMERGE adopting OHDSI OMOP Common Data Model
  – Convert current eMERGE data warehouses to same schema and vocabulary
  – But preserve source information
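
A minimal sketch of what "same schema and vocabulary, but preserve source information" can look like for one diagnosis record. The field names `condition_concept_id` and `condition_source_value` follow the OMOP `condition_occurrence` table; the lookup table, the concept IDs (1001, 1002), and the helper name are hypothetical placeholders, not real OMOP identifiers or eMERGE code.

```python
from dataclasses import dataclass

# Hypothetical (vocabulary, source code) -> standard concept lookup.
# The concept IDs are placeholders for illustration only.
SOURCE_TO_STANDARD = {
    ("ICD9CM", "714.0"): 1001,
    ("ICD9CM", "250.00"): 1002,
}


@dataclass
class ConditionOccurrence:
    person_id: int
    condition_concept_id: int    # standard concept; 0 when no mapping exists
    condition_source_value: str  # original site code, kept verbatim


def to_condition_occurrence(person_id: int, vocabulary: str, code: str) -> ConditionOccurrence:
    """Map a local code to a standard concept while retaining the source code."""
    concept_id = SOURCE_TO_STANDARD.get((vocabulary, code), 0)
    return ConditionOccurrence(person_id, concept_id, code)


mapped = to_condition_occurrence(1, "ICD9CM", "714.0")
unmapped = to_condition_occurrence(2, "ICD9CM", "999.9")
print(mapped)
print(unmapped)
```

Keeping the source value alongside the standard concept means an unmapped or questionably mapped code can still be inspected later.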


Deep Information Model: OMOP v5.2

[Diagram: OMOP CDM v5.2 tables, by group]

• Standardized clinical data: Person, Observation_period, Specimen, Death, Visit_occurrence, Procedure_occurrence, Drug_exposure, Device_exposure, Condition_occurrence, Measurement, Note, Note_NLP, Observation, Fact_relationship
• Standardized health system data: Location, Care_site, Provider
• Standardized health economics: Payer_plan_period, Cost
• Standardized derived elements: Cohort, Cohort_attribute, Drug_era, Dose_era, Condition_era
• Standardized meta-data: CDM_source
• Standardized vocabularies: Concept, Vocabulary, Domain, Concept_class, Concept_relationship, Relationship, Concept_synonym, Concept_ancestor, Source_to_concept_map, Drug_strength, Cohort_definition, Attribute_definition


Extensive vocabularies
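
One way the vocabulary tables get used in a phenotype: expanding a high-level condition concept to all of its descendants through `concept_ancestor`. The sketch below uses an in-memory SQLite database with only the columns needed here (the real tables have more), and all concept IDs and rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept_ancestor (ancestor_concept_id INTEGER, descendant_concept_id INTEGER);
CREATE TABLE condition_occurrence (person_id INTEGER, condition_concept_id INTEGER);
-- Toy hierarchy: concept 10 has children 11 and 12; each concept is its own ancestor.
INSERT INTO concept_ancestor VALUES (10, 10), (10, 11), (10, 12), (20, 20);
INSERT INTO condition_occurrence VALUES (1, 11), (2, 12), (3, 20), (4, 10);
""")


def persons_with_condition(conn, ancestor_id):
    """Return person IDs with the given concept or any of its descendants."""
    rows = conn.execute(
        """SELECT DISTINCT co.person_id
           FROM condition_occurrence AS co
           JOIN concept_ancestor AS ca
             ON ca.descendant_concept_id = co.condition_concept_id
           WHERE ca.ancestor_concept_id = ?
           ORDER BY co.person_id""",
        (ancestor_id,),
    ).fetchall()
    return [row[0] for row in rows]


print(persons_with_condition(conn, 10))
```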


eMERGE phenotype generation

• eMERGE phenotyping lessons
  – [Kho AN, Sci Transl Med 2011]
• Complexity of eMERGE phenotypes
  – [Conway M, AMIA 2011]
• Multi-modal approaches
  – [Peissig PL, JAMIA 2012]
• Use of NQF Quality Data Model
  – [Thompson WK, AMIA 2012]
• Improving validation
  – [Newton KM, JAMIA 2013]
• Design patterns
  – [Rasmussen LV, JBI 2014]
• PhEMA: Phenotype Execution and Modeling Architecture
  – [Pathak et al.]


Phenotype generation lessons

• Challenge of billing codes
• Importance of NLP
  – And multimodal in general
• Complexity of effective phenotype definitions
• Possible improvement from tools and reuse, but mostly just slogging it out
• Differing goals:
  – Knowledge discovery via GWAS needs high PPV
  – Knowledge deployment for decision support also needs sensitivity
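
To make the two goals concrete, here is a small sketch that computes PPV and sensitivity for two hypothetical phenotype definitions against chart review. The patient labels are invented for illustration; the formulas are the standard ones (PPV = TP / (TP + FP), sensitivity = TP / (TP + FN)).

```python
def ppv_and_sensitivity(predicted, actual):
    """Compute (PPV, sensitivity) from parallel lists of 0/1 flags."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    return ppv, sensitivity


# Hypothetical chart-review truth for eight patients (1 = true case).
chart_review = [1, 1, 1, 1, 0, 0, 0, 0]
# A strict definition: flags only the clearest cases.
strict = [1, 1, 0, 0, 0, 0, 0, 0]
# A broad definition: catches every case but also some non-cases.
broad = [1, 1, 1, 1, 1, 1, 0, 0]

print("strict:", ppv_and_sensitivity(strict, chart_review))
print("broad: ", ppv_and_sensitivity(broad, chart_review))
```

In this toy data the strict definition has perfect PPV but misses half the cases, which may be acceptable for GWAS case selection and less so for decision support.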


Phenotyping for the future

• High-fidelity phenotypes [Hripcsak G, JAMIA 2017]
  – Encode degree, severity of condition
    • Redo for past phenotypes?
  – Exploit time to create more accurate phenotypes
  – Encode time of condition
    • Disease course, response to treatment
  – Continuous states (topology, where not dichotomous)
  – Hidden physiologic phenotypes (data assimilation)
  – Latent abstract states (deep learning)
  – Accommodate health care process bias


High-fidelity phenotypes

• Encode degree, severity of condition

Albers, AMIA 2015


High-fidelity phenotypes

• Exploit time to create more accurate phenotypes

• Encode time of condition

Hripcsak, JAMIA 2015


High-fidelity phenotypes

• Continuous states (topology, where not dichotomous)

Nicolau, PNAS 2011


High-fidelity phenotypes

• Hidden physiologic phenotypes (data assimilation)

Albers, PLOS Comp Bio 2017


High-fidelity phenotypes

• Latent abstract states (deep learning)

Miotto, Scientific Reports 2016


High-fidelity phenotypes

• Accommodate health care process bias

Hripcsak, JAMIA 2013


2. How might machine-learning and other advanced computational tools be used to improve electronic phenotyping in the eMERGE network?


Advanced computational tools

• Natural language processing
  – Large proportion of phenotypes employ it
  – Disparate systems across the network
  – Most get by with relatively simple processing
  – Working on sharing NLP!


Advanced computational tools

• Machine learning research
  – eMERGE research: see following slides
  – Anchors, noisy sets to learn from imperfect training data (MIT, Stanford, Columbia)
  – Active learning to reduce training set labor (Marshfield, …)
  – Deep learning to characterize patients (Mt. Sinai, …)
  – Physiologic phenotypes via data assimilation (Columbia)
    • E.g., kidney & liver function, body space, insulin excretion
  – Topology for continuous phenotypes (Stanford, Columbia)
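
As an illustration of how active learning can cut chart-review labor, the sketch below uses uncertainty sampling: send reviewers the patients the current model is least sure about. This is one common strategy and is assumed here for illustration; the slides do not say which method the sites used. The patient IDs and probabilities are invented.

```python
def select_for_review(probabilities, batch_size):
    """Return the patient IDs whose predicted case probability is closest to 0.5."""
    ranked = sorted(probabilities, key=lambda pid: abs(probabilities[pid] - 0.5))
    return ranked[:batch_size]


# Hypothetical model output: patient ID -> predicted probability of being a case.
model_probabilities = {"p1": 0.97, "p2": 0.52, "p3": 0.08, "p4": 0.41, "p5": 0.70}

to_review = select_for_review(model_probabilities, batch_size=2)
print(to_review)
```

Labels for the selected patients would then be added to the training set and the model retrained, repeating until performance is adequate.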


Harvard eMERGE – Rheumatoid Arthritis Machine Learning Phenotype Algorithm

Rheumatoid Arthritis Algorithm Development Workflow

  – Create a training set using clinician chart review (N=200)
  – Train a machine learning algorithm
  – Use algorithm to identify cases (and controls)
  – Validation based on additional 100 chart review (PPV=0.92)

[Figure: Rheumatoid Arthritis Algorithm Final Feature Betas; AUC: 0.967]

• Machine learning algorithms can be effectively and efficiently applied to a large population to accurately phenotype patients

• Algorithms provide flexibility to adjust sensitivity and specificity to varied use cases compared to pre-defined rules-based algorithms
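
The flexibility point can be sketched with a logistic score and a movable cutoff. The betas, feature names, and patients below are hypothetical and are not the Harvard model; the sketch only shows how moving the threshold trades sensitivity for specificity.

```python
import math

# Hypothetical feature betas for illustration only; not the published model.
BETAS = {"ra_icd_count": 0.8, "nlp_ra_mention": 1.5, "dmard_rx": 1.2}
INTERCEPT = -2.0


def case_probability(features):
    """Logistic model: probability that a patient is a case."""
    z = INTERCEPT + sum(BETAS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def sensitivity_specificity(probs, labels, threshold):
    """Classify with p >= threshold and return (sensitivity, specificity)."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 1)
    tn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 0)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    return tp / (tp + fn), tn / (tn + fp)


# Invented patients with chart-review labels (1 = confirmed case).
patients = [
    ({"ra_icd_count": 3, "nlp_ra_mention": 1, "dmard_rx": 1}, 1),
    ({"ra_icd_count": 1, "nlp_ra_mention": 1, "dmard_rx": 0}, 1),
    ({"ra_icd_count": 1, "nlp_ra_mention": 0, "dmard_rx": 1}, 0),
    ({"ra_icd_count": 0, "nlp_ra_mention": 0, "dmard_rx": 0}, 0),
]
probs = [case_probability(features) for features, _ in patients]
labels = [label for _, label in patients]

for threshold in (0.4, 0.8):
    print(threshold, sensitivity_specificity(probs, labels, threshold))
```

A rules-based definition fixes this trade-off at authoring time, whereas a scored model lets each use case pick its own cutoff.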


• Challenges faced in using NLP for computational phenotyping
  – Poor portability caused by syntactic, semantic, and process variations
  – Semantic gaps among users, experts, and data
  – There is no “one size fits all” solution for computational phenotyping

• Solutions proposed
  – Improve syntactic interoperability by adopting common data models
  – Mitigate the semantic gaps through a combination of deep learning representation, information retrieval, informatics extraction, and late binding NLP and data normalization
  – Develop a platform for sharing NLP knowledge artifacts and mapping between data semantics and expert semantics

Mayo


PhEMA

• PhEMA: Phenotype Execution and Modeling Architecture [Pathak et al.]
  – Standards-based representation of phenotypes
  – Visual tool for authoring phenotypes (PhAT)
  – Execution against OMOP or i2b2 (PheX)
  – Developing NLP & ML extensions
  – Integrates with PheKB


NLP – ML Approach

• Apply exclusion and inclusion criteria based on ICD9 code filtering

• Acquire EMR data for the filtered patients

• Process clinical notes to discover SNOMED-CT and RxNORM concepts with their attributes (Apache cTAKES) and generate feature vectors

• Apply machine learning prediction on feature vectors based on training from expert-provided labels

• Communicate ML model to other sites to run on their data
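
The steps above can be sketched end to end in a few lines. This is a toy under stated assumptions: a keyword dictionary stands in for Apache cTAKES concept extraction, a perceptron stands in for whatever learner a site would use, and the ICD9 prefixes, notes, and labels are invented for illustration.

```python
import json

# Hypothetical term -> concept dictionary standing in for cTAKES output.
CONCEPT_TERMS = {"rheumatoid arthritis": "RA", "methotrexate": "MTX", "joint swelling": "SWELLING"}
FEATURES = ["RA", "MTX", "SWELLING"]


def passes_icd9_filter(codes, include_prefix="714", exclude_prefix="715"):
    """Inclusion/exclusion by ICD9 prefix (prefixes chosen only for illustration)."""
    included = any(code.startswith(include_prefix) for code in codes)
    excluded = any(code.startswith(exclude_prefix) for code in codes)
    return included and not excluded


def feature_vector(note):
    """Count concept mentions in a note, in the fixed FEATURES order."""
    text = note.lower()
    counts = {concept: text.count(term) for term, concept in CONCEPT_TERMS.items()}
    return [counts[name] for name in FEATURES]


def predict(model, vector):
    score = sum(w * v for w, v in zip(model["weights"], vector)) + model["bias"]
    return 1 if score > 0 else 0


def train_perceptron(vectors, labels, epochs=20):
    """Train a simple perceptron on expert-provided labels."""
    model = {"features": FEATURES, "weights": [0.0] * len(FEATURES), "bias": 0.0}
    for _ in range(epochs):
        for vector, label in zip(vectors, labels):
            error = label - predict(model, vector)
            model["weights"] = [w + error * v for w, v in zip(model["weights"], vector)]
            model["bias"] += error
    return model


patients = [
    {"codes": ["714.0"], "note": "Rheumatoid arthritis, on methotrexate. Joint swelling noted.", "label": 1},
    {"codes": ["714.0"], "note": "Rheumatoid arthritis follow-up; methotrexate continued.", "label": 1},
    {"codes": ["714.0"], "note": "Knee pain after fall. Joint swelling.", "label": 0},
    {"codes": ["714.0"], "note": "Family history of rheumatoid arthritis.", "label": 0},
    {"codes": ["401.9"], "note": "Hypertension visit.", "label": 0},  # removed by the filter
]

cohort = [p for p in patients if passes_icd9_filter(p["codes"])]
model = train_perceptron([feature_vector(p["note"]) for p in cohort], [p["label"] for p in cohort])

# "Communicate" the model: serialize it, then load it as another site would.
shared_model = json.loads(json.dumps(model))
print(predict(shared_model, feature_vector("Started methotrexate for rheumatoid arthritis.")))
```

Sharing only the serialized model, rather than notes or labels, lets another site score its own patients without moving clinical text between institutions.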


Phenotyping using Relational Machine Learning


Marshfield, Castro 2008


3. How can eMERGE assess phenotype comparability across diverse patient populations and diverse healthcare settings (e.g., academic and county hospitals, community clinics, and other national healthcare systems)?


Diverse populations and settings

• Design specific eMERGE experiments
  – Busy now with existing phenotypes

• Collaborate with All of Us Research Program
  – Getting up to speed; uses same data model

• Collaborate with OHDSI
  – Large, international set for phenotype part