Representation and Reinforcement Learning for Personalized Glycemic Control in Septic Patients
Wei-Hung Weng 1, Mingwu Gao 2, Ze He 3, Susu Yan 4, Peter Szolovits 1
1 CSAIL, MIT | 2 Philips Connected Sensing Venture | 3 Philips Research North America | 4 Massachusetts General Hospital
Background

Motivation
• Critically ill patients often have poor glucose control, including the presence of dysglycemia and high glycemic variability.
• Current clinical practice follows the guidelines suggested by the NICE-SUGAR trial to control blood sugar levels in critical care.
• However, there are overwhelming variations in clinical conditions and physiological states among patients under critical care. This limits clinicians' ability to perform appropriate glycemic control. In addition, clinicians sometimes may not be able to consider the issue of glycemic control.
• To help clinicians better address the challenge of managing patients' glucose levels, we need a personalized glycemic control strategy that takes into account the variations in patients' physiological and pathological states.
Reinforcement Learning (RL) in the Clinical Domain
• RL is a potential approach for scenarios of sequential decision making with delayed reward or outcome.
• RL also has the ability to generate optimal strategies from non-optimized training data.
• RL has been used for treatment of schizophrenia [Shortreed 2011]; the heparin dosing problem [Nemati 2016]; mechanical ventilation administration and weaning [Prasad 2016]; and sepsis treatment [Raghu 2017].
• Related to glycemic control, some studies utilize RL and inverse RL to design clinical trials and adjust clinical treatments [Bothe 2014].
• Fewer studies have utilized the RL approach to learn better target laboratory values as references for clinical decision making.
Proposed Approach and Objectives
• Learn an optimal policy to simulate personalized optimal glycemic trajectories, which are sequences of appropriate glycemic targets.
• The simulated trajectories are intended as a reference for clinicians to decide their glycemic control strategy, and to achieve better clinical outcomes.
• We hypothesized that the patient states, glycemic values, and patient outcomes can be modeled as a Markov decision process (MDP).
• 'Action' = the glycemic value that leads to the real clinical action.
• We explored an RL approach to learn the policy for personalized optimal glycemic trajectories, and compared the prognosis of the trajectories simulated by the optimal policy to the real trajectories.
• The learned policy is intended as a reference for clinicians to adapt and optimize their care strategy, and to achieve better clinical outcomes.
Abstract
Glycemic control is essential for critical care. However, it is a challenging task since there has been no study on a personalized optimal strategy for glycemic control. This work aims to learn personalized optimal glycemic trajectories for severely ill septic patients by learning data-driven policies for deciding the optimal targeted blood glucose level as a clinicians' reference. We encoded patient states using a sparse autoencoder and adopted a reinforcement learning paradigm using policy iteration to learn the optimal policy from data. We also estimated the expected return following the policy learned from the recorded glycemic trajectories, which yielded a function indicating the relationship between real blood glucose values and 90-day mortality rates. This suggests that the learned optimal policy could reduce the patients' estimated 90-day mortality rate by 6.3%, from 31% to 24.7%. The result demonstrates that reinforcement learning with appropriate patient state encoding can potentially provide optimal glycemic trajectories and allow clinicians to design a personalized strategy for glycemic control in septic patients.
Methods
Experiment
References
• Bellman. Dynamic Programming. 1957.
• Bothe et al. The use of reinforcement learning algorithms to meet the challenges of an artificial pancreas. Expert Review of Medical Devices, 2014.
• Howard. Dynamic Programming and Markov Processes. 1960.
• Nemati et al. Optimal medication dosing from suboptimal clinical examples: A deep reinforcement learning approach. In IEEE Engineering in Medicine and Biology Society, 2016.
• Ng. Sparse autoencoder. 2011. https://web.stanford.edu/class/archive/cs/cs294a/cs294a.1104/sparseAutoencoder.pdf
• Prasad et al. A reinforcement learning approach to weaning of mechanical ventilation in intensive care units. arXiv, 2017.
• Raghu et al. Continuous state-space models for optimal sepsis treatment - a deep reinforcement learning approach. arXiv, 2017.
• Shortreed et al. Informing sequential clinical decision-making through reinforcement learning: an empirical study. Machine Learning, 2010.
• Silver. Reinforcement Learning Lecture 3: Planning by Dynamic Programming. 2015.
Data Source and Study Cohort
• 5,565 septic patients in MIMIC-III version 1.4.
• Sepsis-3 criteria to identify patients with sepsis.
• Exclusion: age < 18, SOFA < 2, not first ICU admission.
• Diabetes identified by: (1) ICD-9 codes, (2) pre-admission HbA1c > 7.0%, (3) admission medication, and (4) history of diabetes in the free text.
• Data were collected at one-hour intervals.
• Missing values: linear and piecewise-constant interpolation.
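The two interpolation schemes above can be sketched as follows; this is a minimal illustration with a hypothetical hourly glucose series, not the actual preprocessing pipeline, and the example values are invented.

```python
import numpy as np

# Hypothetical hourly glucose readings with gaps (np.nan = missing value).
hours = np.arange(6)
glucose = np.array([140.0, np.nan, np.nan, 180.0, np.nan, 120.0])

observed = ~np.isnan(glucose)

# Continuous signals (e.g. vital signs): linear interpolation between observations.
linear = np.interp(hours, hours[observed], glucose[observed])

# Slowly-changing or categorical signals: piecewise-constant interpolation,
# i.e. carry the last observed value forward.
idx = np.maximum.accumulate(np.where(observed, hours, -1))
stepwise = glucose[idx]

print(linear)    # gaps at hours 1-2 filled on the 140 -> 180 line, hour 4 midway between 180 and 120
print(stepwise)  # gaps filled with the most recent observed value
```

Linear interpolation suits continuously drifting signals, while piecewise-constant fill avoids inventing intermediate values for measurements that change in steps (e.g. ventilator settings).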
RL Settings
• Reward: 90-day mortality (+100 / -100)
• Action: discretized glucose levels (11 bins) as the proxy of real actions
• State: 46 normalized variables in total (patient-level variables, blood glucose related variables, periodic vital signs)
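The action and reward definitions above can be sketched as below. The poster does not state the actual glucose cut points, so the bin edges here are hypothetical; only the 11-bin discretization and the ±100 terminal reward come from the poster.

```python
import numpy as np

# Hypothetical bin edges in mg/dL: 10 edges define 11 bins (indices 0-10).
EDGES = np.array([70, 80, 90, 100, 110, 120, 140, 160, 180, 200])

def glucose_to_action(bg_mg_dl: float) -> int:
    """Map a blood glucose reading to one of 11 discrete action bins."""
    return int(np.digitize(bg_mg_dl, EDGES))

def terminal_reward(survived_90d: bool) -> int:
    """Terminal reward: +100 if the patient survives 90 days, -100 otherwise."""
    return 100 if survived_90d else -100

print(glucose_to_action(65))   # below the first edge -> bin 0
print(glucose_to_action(115))  # between 110 and 120 -> bin 5
print(glucose_to_action(250))  # above the last edge -> bin 10
```

Discretizing the achieved glucose level into bins lets the recorded trajectories stand in for clinician actions without needing explicit insulin-dose logs.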
Patient State Encoding
• Raw features vs. sparse autoencoder-encoded features [Ng 2011].
• 500 state clusters by k-means clustering.
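The clustering step can be sketched as below. Random vectors stand in for the 32-dimensional sparse-autoencoder codes, and a small k is used for speed (the poster uses 500 clusters); the `kmeans` helper is an illustrative minimal implementation, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for encoded patient states (the poster uses 32-dim autoencoder codes).
states = rng.normal(size=(1000, 32))

def kmeans(X, k, iters=20, seed=0):
    """Minimal k-means: map each encoded state to one of k discrete clusters."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign each point to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its points (skip empty clusters).
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

labels, centers = kmeans(states, k=50)
print(labels.shape, centers.shape)
```

The resulting cluster indices serve as the discrete MDP states, which keeps the downstream policy iteration tabular.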
Policy Evaluation / Iteration
• Learn the optimal policy and evaluate it on real trajectories.
• 90-day mortality rate = f(expected return)
• Compute and compare the estimated mortality rate of real and optimal glucose trajectories obtained by the RL-learned policy.
[Figure courtesy: David Silver]
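The evaluation/improvement loop of policy iteration can be sketched on a toy MDP; the sizes and the random transition and reward tables are invented for illustration (the poster's MDP has 500 clustered states and 11 glycemic actions).

```python
import numpy as np

# Toy MDP: P[s, a] is a distribution over next states, R[s, a] a reward.
n_states, n_actions, gamma = 4, 2, 0.99
rng = np.random.default_rng(0)
P = rng.random((n_states, n_actions, n_states))
P /= P.sum(axis=-1, keepdims=True)
R = rng.normal(size=(n_states, n_actions))

policy = np.zeros(n_states, dtype=int)
while True:
    # Policy evaluation: solve V = R_pi + gamma * P_pi V exactly.
    P_pi = P[np.arange(n_states), policy]      # (S, S)
    R_pi = R[np.arange(n_states), policy]      # (S,)
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
    # Policy improvement: act greedily w.r.t. the one-step lookahead.
    Q = R + gamma * P @ V                      # (S, A)
    new_policy = Q.argmax(axis=1)
    if np.array_equal(new_policy, policy):
        break                                  # policy stable -> optimal
    policy = new_policy

print(policy, V)
```

At convergence the policy is greedy with respect to its own value function, which is the Bellman optimality condition; the expected return V of real trajectories is what the poster maps to an estimated 90-day mortality rate.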
Conclusion
We utilized the RL algorithm with representation learning to learn a personalized optimal policy for better predicting glycemic targets from retrospective data. The method may reduce the mortality rate of septic patients, potentially assist clinicians in optimizing the real-time treatment strategy at dynamic patient state levels with a more accurate treatment goal, and lead to optimal clinical decisions. Future work includes applying a continuous-state approach, different evaluation methods, and applying the method to different clinical decision-making problems.
Results
• The distribution of the learned expected return, which is the rescaled Q-value, is strongly negatively correlated with the mortality rate.
• The learned expected return reflects the real patient status well.
• The optimal policy learned by the policy iteration algorithm can potentially reduce the estimated mortality rate by around 6.3% if we choose appropriate patient state representations.
[Figure: estimated mortality rate vs. expected return, shown for the raw feature representation and the 32-dimension sparse autoencoder representation]