Mobile Network Big Data for Modeling Spread of Infectious Disease

LIRNEasia Big Data for Development Team

Post on 19-Aug-2020

Page 1: Mobile Network Big Data for Modeling Spread of Infectious ...lirneasia.net/wp-content/uploads/2016/10/Samarajiva_PandemicSym… · Dengue • WHO estimates 50-100 million infections

Mobile Network Big Data for Modeling Spread of Infectious Disease

LIRNEasia Big Data for Development Team

Page 2

Dengue

•  WHO estimates 50-100 million infections occur every year
•  Endemic in over 100 countries
•  A vector-borne disease caused by an RNA virus of family Flaviviridae; genus Flavivirus (Huhtamo, Eili et al., 2008)
•  Main vectors (Monath, 1994)
    –  Aedes aegypti
    –  Aedes albopictus
•  Four serotypes have been identified (DENV1-4)

Dengvaxia, a vaccine for dengue developed by Sanofi Pasteur, is currently undergoing clinical trials in multiple countries (Sirisena & Noordeen, 2016)

Page 3

Dengue in Sri Lanka

•  All 4 serotypes have been observed in Sri Lanka
•  42,194 cases reported in 2016 so far; likely to exceed the 44,000 reported in 2012
    –  51.19% reported from Western Province
    –  Increase of DENV-2 during June 2016 outbreak

Page 4

Role of human mobility in dengue propagation

•  The mosquito has a lifespan of 2-4 weeks and a range of around 100-800 m (Muir & Kay, 1998; Honório et al., 2003)
•  Dengue is spread beyond the natural range of the mosquito by the movement of infected hosts

Knowledge of human mobility patterns can shed light on dengue propagation and the level of disease incidence in a region

Page 5

Dengue in Jaffna

Ceasefire agreement may have contributed to dengue outbreaks in Jaffna after 2002

Propagation of dengue in Sri Lanka over the years (Image source: Epidemiology Unit, M. of H. (2009). CURRENT SITUATION & EPIDEMIOLOGY OF DENGUE IN SRI LANKA. In Media Seminar for Dengue Week. Retrieved from http://www.healthedu.gov.lk/web/images/pdf/msp/current_situation_epidemiology_of_dengue.pdf)

Page 6

Mobile Network Big Data (MNBD) used in the research

•  Multiple mobile operators in Sri Lanka have provided four different types of meta-data
    –  Call Detail Records (CDRs)
        •  Records of calls
        •  SMS
        •  Internet access
    –  Airtime recharge records
•  Datasets do not include any Personally Identifiable Information
    –  All phone numbers are pseudonymized
    –  LIRNEasia does not maintain any mappings of identifiers to original phone numbers
•  Cover 50-60% of users; very high coverage in Western (where the capital city is located) & Northern (most affected by civil conflict) provinces, based on correlation with census data

Page 7

Insights already used in urban planning in Colombo

46.9% of Colombo City’s daytime population comes from the surrounding regions

Page 8

MNBD for modeling spread of disease

•  Amy Wesolowski has pioneered use of MNBD for disease modeling
•  Study on 2009 malaria outbreak in Kenya (Wesolowski et al., 2012) uses MNBD to identify regions with high malaria outbreak risk
•  Travel network is derived by calculating the average monthly trips by a subscriber to different regions

Travel network between nodes (regions). Edges weighted by volume of traffic. Wesolowski et al., 2012
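The travel-network idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout `(subscriber_id, origin_region, destination_region)`, the function name `build_travel_network`, and the normalisation by subscribers and months are all hypothetical, and are not taken from Wesolowski et al. (2012).

```python
from collections import defaultdict


def build_travel_network(trips, n_months):
    """Return {(origin, destination): average trips per subscriber per month}.

    trips    : iterable of (subscriber_id, origin_region, destination_region)
               tuples (hypothetical layout, for illustration only)
    n_months : number of months the records span
    """
    counts = defaultdict(int)
    subscribers = set()
    for subscriber, origin, destination in trips:
        subscribers.add(subscriber)
        if origin != destination:  # movement inside one region is not an edge
            counts[(origin, destination)] += 1

    if not subscribers or n_months <= 0:
        return {}

    # Assumption: edge weight = total trips / (subscribers * months)
    scale = len(subscribers) * n_months
    return {edge: total / scale for edge, total in counts.items()}


# Toy records: two pseudonymized subscribers observed over two months
trips = [
    ("s1", "A", "B"),
    ("s1", "A", "B"),
    ("s2", "A", "B"),
    ("s2", "B", "C"),
]
network = build_travel_network(trips, n_months=2)
```

Each key of `network` is a directed edge between two regions and each value is its weight, which matches the "edges weighted by volume of traffic" description in the caption.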

Page 9

Using MNBD to predict dengue epidemics

•  Model developed by Wesolowski et al. (2015) on emergence of dengue epidemics in Pakistan corresponds to actual dengue propagation
•  Existing ento-epidemiological model was used to predict the number of dengue cases within an administrative unit known as a tehsil
    –  Model was fitted to the endemic region of Karachi starting at Day 1, to estimate the daily number of infected people over time based on reported cases
        •  Used to estimate the likelihood of dengue importation from Karachi to other regions on a given day
    –  Model was fitted (in reverse) to naïve regions to estimate the time of introduction of dengue from outside solely with reported cases
        •  Used to validate estimates of dengue introduction based on human mobility patterns
•  Mobility models developed from CDR and gravity models were compared to determine which model was more accurate
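For readers unfamiliar with gravity models, the sketch below shows the textbook form, in which flow between two regions grows with their populations and decays with distance. The parameter names (`k`, `alpha`, `beta`, `gamma`) and their default values are assumptions for illustration; the slides do not give the form or parameters used in the Pakistan study.

```python
def gravity_flow(pop_i, pop_j, distance_km, k=1.0, alpha=1.0, beta=1.0, gamma=2.0):
    """Textbook gravity model: k * P_i^alpha * P_j^beta / d_ij^gamma.

    All parameter defaults are illustrative assumptions, not fitted values.
    """
    if distance_km <= 0:
        raise ValueError("distance must be positive")
    return k * (pop_i ** alpha) * (pop_j ** beta) / (distance_km ** gamma)


def mean_absolute_error(modelled, observed):
    """Average absolute gap between modelled flows and observed (e.g. CDR) flows."""
    if len(modelled) != len(observed) or not modelled:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(m - o) for m, o in zip(modelled, observed)) / len(modelled)


# Toy comparison: one region pair, with a made-up observed flow
modelled_flow = gravity_flow(pop_i=1000, pop_j=2000, distance_km=10)
error = mean_absolute_error([modelled_flow], [18000.0])
```

A comparison like `mean_absolute_error` is one simple way to ask which mobility estimate sits closer to observed movement; the study itself may have used a different criterion.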

Page 10

Conclusions from Pakistan study

•  CDR mobility model can predict the time of introduction of dengue more accurately than the diffusion model
•  CDR predicted the first infected cases coming to Lahore some time before the actual outbreak
    –  This delay in the outbreak in Lahore is attributed to the immunology of the population in Lahore
    –  There was a previous outbreak in 2011
•  On the other hand, the outbreak occurred immediately after the introduction in Mingora
    –  Mingora population is immunologically naive

The timeline of introduction of dengue to A) Lahore, B) Mingora according to the Pakistan study (Wesolowski et al., 2015)

Page 11

Data needed to predict dengue outbreaks

•  Mobility is one among many factors that affect disease propagation
    –  Weather, seasonality
    –  Vector population dynamics
    –  Incubation periods of the disease
    –  Vector control mechanisms
    –  Asymptomatic transmission
    –  Immunity of the population to different serotypes of dengue

Page 12

How do we model these different parameters?

•  Different approaches are possible
    –  Machine learning techniques
    –  Statistical modelling techniques
•  Need to derive mathematical models with good assumptions
•  Need to understand how different parameters behave to apply machine learning models
•  Epidemiological and entomological expertise are essential

Estimating the vector-to-human (λV->h) and human-to-vector (λh->V) dengue incidence rates requires knowledge of the populations (Ih, Sh, Nh, Iv, Sv, Nv), the biting rate (a) and the transmission probabilities (ΦV->h, Φh->V) (Wesolowski et al., 2015)
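To make the quantities in this caption concrete, here is a minimal sketch of how the two incidence rates are commonly written in Ross-Macdonald-style vector models. The exact expressions are an assumption: the slide only lists the inputs, so the formulas below (both rates scaled by the human population) should be read as one conventional form, not as the equations of Wesolowski et al. (2015).

```python
def forces_of_infection(I_h, N_h, I_v, a, phi_vh, phi_hv):
    """Return (lambda_vh, lambda_hv) under an assumed Ross-Macdonald-style form.

    I_h, N_h : infected and total human population
    I_v      : infected vector population
    a        : biting rate (bites per mosquito per day)
    phi_vh   : probability a bite from an infected vector infects a human
    phi_hv   : probability a bite on an infected human infects a vector
    """
    if N_h <= 0:
        raise ValueError("human population must be positive")
    # Assumed form: rate at which a susceptible human is infected by vectors
    lambda_vh = a * phi_vh * I_v / N_h
    # Assumed form: rate at which a susceptible vector is infected by humans
    lambda_hv = a * phi_hv * I_h / N_h
    return lambda_vh, lambda_hv


# Illustrative numbers only, not estimates for any real location
lam_vh, lam_hv = forces_of_infection(
    I_h=10, N_h=1000, I_v=50, a=0.5, phi_vh=0.4, phi_hv=0.4
)
```

The sketch shows why entomological inputs matter: without an estimate of the infected vector population and the biting rate, neither rate can be computed from case counts alone.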

Page 13

Sri Lanka: Multi-disciplinary, multi-stakeholder research effort

•  Epidemiology Unit, Ministry of Health
    –  Case data, ground truth data
    –  LIRNEasia & U of Moratuwa lack expertise on epidemiology and entomology needed to understand the disease dynamics
    –  Key collaborators: Dr. Hasitha A. Tissera, Dr. Azhar Ghouse
•  University of Moratuwa
    –  Provides expertise on
        •  Statistics & machine learning
        •  Computational modeling
    –  Key collaborators: Dr. Shehan Perera

Page 14

Work done so far

•  Data on weekly new dengue cases obtained and preprocessed for 2013 and 2014
•  Data from Sri Lanka’s weather stations (total 414) being analyzed
    –  Obtained rainfall data from 112 stations for 2013
        •  70 stations had data for the entire year
        •  24 stations had missing data for less than 2 months (filled by imputation)
        •  18 stations had missing data for more than 2 months (discarded)
    –  Temperature data from 23 stations
        •  One station was missing 6 months of data (discarded)
        •  Other missing values were imputed
•  Spatial boundaries of cell towers, MOH divisions, and GN divisions differed
    –  We could not map population to MOH division or mobility to MOH division because of these different boundaries
    –  An undergraduate group from University of Moratuwa was able to solve the issue of overlapping spatial boundaries using Voronoi tessellation and developed their own algorithm to derive mobility
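The station-screening rule for weather data (impute short gaps, discard stations with long gaps) can be sketched as follows. The slides do not say how the data were sampled or which imputation method was used, so the weekly series, the 8-week cut-off standing in for "2 months", and linear interpolation are all assumptions, and `screen_and_impute` is a hypothetical name.

```python
def screen_and_impute(series, max_missing=8):
    """Return an imputed copy of `series`, or None if the station is discarded.

    series      : list of weekly readings, with None marking a missing week
    max_missing : assumed cut-off (8 weeks, roughly 2 months)
    """
    missing = sum(1 for value in series if value is None)
    if missing > max_missing or missing == len(series):
        return None  # too sparse to trust: discard the station

    known = [i for i, value in enumerate(series) if value is not None]
    filled = list(series)
    for i, value in enumerate(series):
        if value is not None:
            continue
        before = [k for k in known if k < i]
        after = [k for k in known if k > i]
        if before and after:
            # Linear interpolation between the nearest known neighbours
            lo, hi = before[-1], after[0]
            weight = (i - lo) / (hi - lo)
            filled[i] = series[lo] + weight * (series[hi] - series[lo])
        elif before:
            filled[i] = series[before[-1]]  # trailing gap: carry last value forward
        else:
            filled[i] = series[after[0]]  # leading gap: carry first value backward
    return filled


kept = screen_and_impute([1.0, None, 3.0])
discarded = screen_and_impute([None] * 9 + [1.0])
```

Interpolation is a reasonable default for temperature but can smooth away rainfall spikes, which is one reason the quality of weather data can affect model accuracy.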

Page 15

Student work in detail

•  Voronoi tessellation was employed to estimate cell tower coverage and MOH divisional boundaries
•  The fraction of coverage overlapping an MOH division was used to obtain fractional values for mobility from each cell coverage
•  Students were later able to obtain the MOH boundaries and the populations with assistance from the Epidemiology Unit
•  Multiple statistical models as well as different machine learning techniques were tried out on the data

fij - fractional coverage of a cell tower j on MOH area i
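One way to read the fractional-coverage definition is as a weight matrix that redistributes tower-to-tower movement onto MOH areas. The sketch below assumes that reading: `f[i][j]` is the share of tower j's coverage cell lying inside MOH area i (so each tower's shares sum to 1), and a trip between two towers is split in proportion to those shares. This is not the students' algorithm, which the slides do not describe.

```python
def tower_to_moh_flows(f, tower_flows):
    """Map tower-level flows onto MOH areas using fractional coverage.

    f[i][j]           : assumed share of tower j's coverage inside MOH area i
    tower_flows[j][k] : trips observed from tower j to tower k
    Returns moh[a][b] : estimated trips from MOH area a to MOH area b
    """
    n_areas = len(f)
    n_towers = len(tower_flows)
    moh = [[0.0] * n_areas for _ in range(n_areas)]
    for a in range(n_areas):
        for b in range(n_areas):
            # Each tower pair contributes in proportion to both overlaps
            moh[a][b] = sum(
                f[a][j] * f[b][k] * tower_flows[j][k]
                for j in range(n_towers)
                for k in range(n_towers)
            )
    return moh


# Toy case: tower 0 lies fully in area 0; tower 1 is split 25% / 75%
f = [
    [1.0, 0.25],
    [0.0, 0.75],
]
tower_flows = [
    [0, 100],
    [40, 0],
]
moh_flows = tower_to_moh_flows(f, tower_flows)
total_trips = sum(sum(row) for row in moh_flows)
```

Because each tower's shares sum to 1, the total number of trips is preserved by the mapping; the diagonal entries capture trips that are estimated to stay within one MOH area.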

Page 16

Meta-population model for Colombo City

•  A statistical model proposed for dengue by Sarzynska, Udiani & Zhang (2013) in a study done for Peru was adapted for Sri Lanka, but using parameter values from Peru
•  Results were not encouraging (RMSE - 70.29)
•  We are working on estimating parameters for Sri Lanka using machine learning techniques

[Figure: predicted vs. actual]

Page 17

Machine learning: From 1 to many MOH divisions

•  For a single MOH division, only 52 data points for a year
    –  Only 70% (36 data points) left for training
•  Due to small training set, models trained are overfitted and perform badly when predicting unseen data
•  Training for a single MOH division was done to identify suitability of different techniques
    –  Final model will incorporate spatial aspects and predict for the entire country
•  Since training for a single MOH was not yielding good results, 6 similar MOH areas were considered for training
    –  Provides more data points for training
    –  Provides an idea on model behaviour under different conditions
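The arithmetic behind the small-training-set problem is easy to check. The sketch below assumes a chronological 70/30 split (earliest weeks for training), which is a common choice for time series but is not stated on the slide; `chronological_split` is a hypothetical helper.

```python
def chronological_split(series, train_fraction=0.7):
    """Split a time-ordered series into (train, test) without shuffling."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


# One MOH division: 52 weekly observations (placeholder values)
weekly_cases = list(range(52))
train, test = chronological_split(weekly_cases)

# Pooling six similar MOH areas multiplies the available training rows
pooled_train_size = sum(
    len(chronological_split(weekly_cases)[0]) for _ in range(6)
)
```

With one division the model sees 36 training rows and 16 held-out rows; pooling six divisions gives 216 training rows, which is still small but less prone to overfitting.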

Page 18

Multiple approaches

•  Random Forest method used in Pakistan (Rehman et al., 2016) tried in Dehiwala MOH Division
    –  Low error, but does not predict trend yet
•  Neural networks have been tried in endemic countries such as Singapore (Aburas et al., 2010); was tried for Kotte MOH Division
    –  Sri Lanka lacks data for a long enough period; nine years of data in SE Asia cases

Page 19

XGBoost for Dehiwala MOH

•  XGBoost is short for “Extreme Gradient Boosting”. Can be considered as an ensemble decision tree algorithm
•  RMSE of 8.61 after cleaning the data
•  Temperature, weather data extrapolated for 2014 due to unavailability of real data
•  Top features
    –  temperature lag for 2 weeks
    –  cases lag for 2 weeks
    –  temperature lagged for 1 week
•  Our predictions are getting closer to the actual curve
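The three top features listed on this slide are all lagged versions of the weekly series. A minimal sketch of how such rows could be assembled is shown below; the feature names and the function `make_lagged_rows` are hypothetical, and the resulting rows could be passed to any regressor (the slide's model was XGBoost, which is not used here).

```python
def make_lagged_rows(cases, temperature):
    """Build (features, target) pairs from aligned weekly series.

    For week t the features are temperature at t-2 and t-1 and cases at t-2;
    the target is the case count at week t.
    """
    if len(cases) != len(temperature):
        raise ValueError("series must be aligned week by week")
    rows = []
    for t in range(2, len(cases)):  # the first two weeks have no 2-week lag
        features = {
            "temp_lag2": temperature[t - 2],
            "cases_lag2": cases[t - 2],
            "temp_lag1": temperature[t - 1],
        }
        rows.append((features, cases[t]))
    return rows


# Toy weekly data, not real observations
cases = [5, 7, 9, 12]
temperature = [27.0, 28.0, 29.0, 30.0]
rows = make_lagged_rows(cases, temperature)
```

Lagging costs the first two weeks of each series, which matters when a division has only 52 observations per year.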

Page 20

Does mobility improve our model?

•  Applied the naive mobility model to neural networks
•  Small improvement in RMSE value from 9.62 to 8.75
•  Improvement in the prediction curve
•  Word of caution: These are preliminary results; in some iterations we did not see significant improvements
•  However, high correlation with mobility was consistent

Observed higher correlation for mobility than rainfall during our preliminary analysis
-  Analysed with 15 variables
-  Correlation of rainfall: 0.10497
-  Correlation with mobility: 0.14514
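The two metrics quoted on this slide, RMSE and correlation, are computed as below. The slides do not say which correlation coefficient was used; the sketch assumes Pearson's r, and the numbers are toy values rather than project data.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def pearson(x, y):
    """Pearson correlation coefficient (assumed to be the measure reported)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Toy weekly case counts and model output
actual = [12, 15, 20, 18]
predicted = [10, 15, 22, 18]
error = rmse(actual, predicted)
r = pearson([1, 2, 3], [2, 4, 6])
```

RMSE summarises average error size but says nothing about whether peaks and troughs line up, so a model can score well on RMSE and still miss the trend.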

Page 21

What do the preliminary results say?

•  There is not much difference in RMSE values between the machine learning techniques attempted so far
    –  Not enough of a difference to discard any technique outright
•  Even though RMSE value is good, models without mobility do not predict the peaks, troughs and the general trend of dengue incidence
•  Mobility has a definite correlation with dengue incidence and improved RMSE as well as the prediction curve
    –  Room for improvement in two aspects when considering mobility
        •  Estimating the time spent by the travelling population in an MOH area
        •  Methodology of fusing mobility to our models
•  We haven’t figured out the finer details of incorporating spatial properties to the model, in which case, introducing multiple MOH divisions might actually increase the error
•  Quality of weather data, presence of missing values can also affect model accuracy

Page 22

Generalized disease propagation prediction model

•  If our model can incorporate the essential factors (including human mobility) that affect spatiotemporal disease propagation and accurately predict for dengue, we can generalize it for other diseases
•  Can use the same model for Zika and predict an outbreak
    –  Can execute vector control mechanisms, awareness campaigns pre-emptively
•  Mobility will play an even larger role in non-vector borne diseases
    –  If we can come up with an accurate model for vector borne diseases, modelling other types of infectious diseases will be easier

Page 23

References

1.  Huhtamo, Eili et al. “Molecular Epidemiology of Dengue Virus Strains from Finnish Travelers.” Emerging Infectious Diseases 14.1 (2008): 80–83. PMC. Web. 10 Oct. 2016.

2.  Monath, T. P. (1994). Dengue: the risk to developed and developing countries. Proceedings of the National Academy of Sciences, 91(7), 2395-2400.

3.  Centre for Dengue Research, U. of S. J. (2016). Dengue Virus Serotypes of Current Epidemic. Retrieved October 6, 2016, from https://www.facebook.com/206869486156896/photos/a.206973839479794.1073741828.206869486156896/598635470313627/?type=3&theater

4.  Brockmann, D. (2010). Human Mobility and Spatial Disease Dynamics. Reviews of Nonlinear Dynamics and Complexity, 2, 1–24. http://doi.org/10.1002/9783527628001.ch1

5.  Brockmann, D., Hufnagel, L., & Geisel, T. (2006). The scaling laws of human travel. Nature, 439(7075), 462–465. http://doi.org/10.1038/nature04292

6.  Wesolowski, A., Eagle, N., Tatem, A. J., Smith, D. L., Noor, A. M., Snow, R. W., & Buckee, C. O. (2012). Quantifying the impact of human mobility on malaria. Science (New York, N.Y.), 338(6104), 267–70. http://doi.org/10.1126/science.1223467

7.  Wesolowski, A., Buckee, C. O., Bengtsson, L., Wetter, E., Lu, X., & Tatem, A. J. (2014). Commentary: Containing the Ebola Outbreak – the Potential and Challenge of Mobile Network Data. PLoS Currents Outbreaks, 1–17. http://doi.org/10.1371/currents.outbreaks.0177e7fcf52217b8b634376e2f3efc5e

8.  Wesolowski, A., Qureshi, T., Boni, M. F., Sundsøy, P. R., Johansson, M. A., Rasheed, S. B., … Buckee, C. O. (2015). Impact of human mobility on the emergence of dengue epidemics in Pakistan. Proceedings of the National Academy of Sciences, 112(38), 11887–11892. http://doi.org/10.1073/pnas.1504964112

9.  Sarzynska, M., Udiani, O., & Zhang, N. (2013). A study of gravity-linked metapopulation models for the spatial spread of dengue fever. arXiv Preprint arXiv:1308.4589, 2008, 1–32. Retrieved from http://arxiv.org/abs/1308.4589

10.  Rehman, N. A., Kalyanaraman, S., Ahmad, T., Pervaiz, F., Saif, U., & Subramanian, L. (2016). Fine-grained dengue forecasting using telephone triage services. Science Advances, 2(7), 1–10. http://doi.org/10.1126/sciadv.1501215

Page 24

References

11.  Aburas, H. M., Cetiner, B. G., & Sari, M. (2010). Dengue confirmed-cases prediction: A neural network model. Expert Systems with Applications, 37(6), 4256–4260. http://doi.org/10.1016/j.eswa.2009.11.077

12.  Husin, N. A., Salim, N., & Ahmad, A. R. (2008). Modeling of dengue outbreak prediction in Malaysia: A comparison of neural network and nonlinear regression model. Proceedings - International Symposium on Information Technology 2008, ITSim, 4, 6–9. http://doi.org/10.1109/ITSIM.2008.4632022

13.  Rachata, N., Charoenkwan, P., Yooyativong, T., Chamnongthai, K., Lursinsap, C., & Higuchi, K. (2008). Automatic prediction system of dengue haemorrhagic-fever outbreak risk by using entropy and artificial neural network. 2008 International Symposium on Communications and Information Technologies, ISCIT 2008, (Iscit), 210–214. http://doi.org/10.1109/ISCIT.2008.4700184

14.  WHO Scientific Group on Arthropod-Borne and Rodent-Borne Viral Diseases. (1985). Arthropod-borne and rodent-borne viral diseases : report of a WHO scientific group [meeting held in Geneva from 28 February to 4 March 1983].

15.  Vitarana, T., Jayakuru, W., & Withane, N. (1997). Historical account of Dengue Haemorrhagic Fever in Sri Lanka.

16.  Yang, H. M., Macoris, M. L. G., Galvani, K. C., Andrighetti, M. T. M., & Wanderley, D. M. V. (2009). Assessing the effects of temperature on the population of Aedes aegypti, the vector of dengue. Epidemiology and Infection, 137(8), 1188–1202. http://doi.org/10.1017/S0950268809002040

17.  P. H. D. Kusumawathie, & R. R. M. L. R. Siyambalagoda. (2005). Distribution and breeding sites of potential dengue vectors in Kandy and Nuwara Eliya districts of Sri Lanka. The Ceylon Journal of Medical Science.