
    ECONOFISICA Università degli Studi di Torino   

    Epidemics Forecasting Challenge Luca Colombo July 4, 2018 

     

     

    Abstract 

    The RAPIDD Ebola forecasting challenge is an innovative work inspired by the 2014-2015 West African Ebola crisis, involving 16 international academic teams and US government agencies. The participants were invited to predict 140 epidemiological targets across five different time points of four synthetic Ebola outbreaks. Here I present the results of a more modest work, based on the idea proposed by the RAPIDD Ebola forecasting challenge paper: evaluating the performance of a simple model on a synthetic outbreak dataset. Both the dataset and the model were created in NetLogo, a useful tool for agent-based modelling and simulation. The synthetic outbreak dataset is generated by a complex metapopulation stochastic SIR/SIS scenario in which the agents diffuse on a scale-free network and spread the infection. The model is a simple deterministic SIR evaluated with different degrees of information about the space structure (the network features), the interventions of other agents (medics) and the diffusion probability. The goal of this work is to understand how this information affects the forecast and, introducing it gradually, to compare the results with the scenario’s dataset. The simpler models predict the scenario poorly, whereas the more complex ones go in the right direction, correctly predicting the peak timing and amplitude, the outbreak duration and the overall trend. However, the missing information, such as the money-making and spending and the outbreak alert, causes some parameters to differ from the scenario’s. Introducing a birth/death mechanism as well as immunization processes would be an interesting step forward. A key element to improve is the reliability of the model: removing excessive randomness in some mechanisms would avoid accidents such as leaving multiple isolated clusters, or slow or non-existent outbreak starts.

     

     


    1. Introduction 

    The development of computational and mathematical models is crucial to prevent and control emerging infectious diseases and to guide intervention strategies. For example, during the 2014-2015 West African Ebola Virus Disease (EVD) epidemic a variety of models were used to generate real-time predictions of how the outbreak would unfold and to help the authorities fight the disease.

    At the end of the West African Ebola epidemic, in spring 2015, a workshop was organized by the RAPIDD program, led by the Fogarty International Center of the National Institutes of Health (NIH). The aim of the workshop was to analyze and discuss the models used during the outbreak and to find possible improvements in forecasting accuracy. The participants decided that the best way to do so was to build a forecasting challenge relying on synthetic Ebola datasets in a controlled and systematic environment, and to evaluate the models’ predictive performance and how it scales with epidemiological complexity and data availability. The synthetic epidemiological datasets were generated using a spatially structured, stochastic, agent-based model at the level of the single household that integrates detailed data on Liberia’s demography. The model was used to generate four outbreak scenarios with an increasing level of complexity in terms of epidemiology, layered interventions, data availability and reporting noise. The goal of the participants was to predict 140 targets in total across all scenarios and time points.

    This work takes inspiration from a recently published paper (The RAPIDD ebola forecasting challenge: Synthesis and lessons learnt [1]) and proposes to do the same in a much simpler environment: we build a complex base scenario, we extract data from it (incidences, final size of the outbreak, peak size and peak timing) and we build a new simulation that tries to predict these features. In the new simulation it is possible to gradually make hypotheses about the infection rate and the recovery rate, the space structure, the travel probability and so on. However, some features of the base scenario are deliberately left unknown to the model. We will evaluate their influence and whether they change the parameters we are looking for. The goal of this work is to understand how this information affects the forecast and, introducing it gradually, to compare the results with the scenario’s dataset.

    NetLogo is used to build the ABM simulations. This environment makes it possible to see graphically the structure of the space in which the agents interact and move, as well as the incidence of the infection over time.

     

    2. Space structure 

    Scenario 

    A network structure is used instead of letting the agents move randomly in the NetLogo world. This gives control over the scenario structure through a mathematically formalized instrument while remaining flexible. In a real scenario the nodes represent the cities and the links represent the travel routes along which agents move between cities. The more links a node has, the more important it is and the more likely it is to be a hub in which agents interact, spread the infection or get cured. Conversely, more isolated nodes generally have a low population, and access to cures is difficult there. In the scenario the infected people are spawned in peripheral and isolated nodes (the first thing the model doesn’t know about).

    How can a real scenario be reproduced in a network-based structure? In the real world big cities have far more links than average-sized cities, and orders of magnitude more than small ones. New nodes are therefore more likely to link to nodes with a higher degree (already highly linked). In the Barabási-Albert paper [2] this is called preferential attachment.

    A network that grows by preferential attachment has a fundamental property: scale invariance. Such networks are called scale-free, and they are convenient in our case because their structure is not arbitrary (not decided by the author): each realization is different but has specific features, so they are both flexible and reproducible. (Figure 2.1, 2.3)

     

    2.1 Preferential attachment code. Each node is more likely to link to a node that already has more links.

    To take into account the “shortcuts” of a real-world scenario (airlines, trains and so on), a rewiring probability can be set for the built network. This feature gives the model even more flexibility, because a scale-free network can be gradually transformed into a random-like network.
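    The growth and rewiring steps described above can be sketched in a few lines. This is a minimal Python illustration, not the author’s NetLogo code: it assumes one link per new node (as in the NetLogo library model cited in [4]) and a rewiring rule that moves one end of a link to a random node; the function names are hypothetical.

```python
import random

def preferential_attachment(n_nodes, rng):
    """Grow a network node by node; each new node links to one existing
    node chosen with probability proportional to its degree."""
    edges = {(0, 1)}        # start from two linked nodes
    endpoints = [0, 1]      # each node appears once per link it has
    for new in range(2, n_nodes):
        target = rng.choice(endpoints)   # degree-proportional choice
        edges.add((target, new))         # target < new, so pairs stay ordered
        endpoints += [target, new]
    return edges

def rewire(edges, n_nodes, p, rng):
    """With probability p, move the second end of a link to a random node
    (assumed rule; duplicates and self-links are avoided)."""
    result = set()
    for a, b in sorted(edges):
        if rng.random() < p:
            candidates = [c for c in range(n_nodes)
                          if c != a
                          and (min(a, c), max(a, c)) not in result
                          and (min(a, c), max(a, c)) not in edges]
            if candidates:
                b = rng.choice(candidates)
        result.add((min(a, b), max(a, b)))
    return result

def degrees(edges, n_nodes):
    """Number of links attached to each node."""
    deg = [0] * n_nodes
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg

rng = random.Random(42)
scale_free = preferential_attachment(150, rng)      # n_nodes = 150 as in the text
shortcut_net = rewire(scale_free, 150, 0.10, rng)   # rewiring probability 0.10
```

    Keeping the list of link endpoints makes the degree-proportional choice a plain uniform draw, which is the usual trick for this kind of growth rule.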

    Model 

    Switching from the scenario to the model, the same number of nodes is kept, in identical positions. Obviously, anyone wanting to predict the parameters of a real outbreak would know the positions of the cities and their infrastructure. However, in this case the links represent the diffusion vectors, which are unknown.

    Two possible options for rebuilding the network are considered in this work: the homogeneous hypothesis and the heterogeneous hypothesis.

    The first assumes that a person on a random node sees an average number of possible destinations (nodes) to travel to: the average degree. The network is built on the idea that a person is most likely to travel to nearby nodes, so the node they are on is linked to these nodes, forming a cluster in which almost every node is connected (depending on the fixed average degree). A rewiring probability can also be set to create the shortcuts that exist in a real-world scenario. If the rewiring probability isn’t high enough, there is a risk that one or more clusters are isolated. Conversely, if it is too high, the preferred destinations would be all the nodes rather than the closer ones, which is unrealistic. The degree distribution is substantially Gaussian. (Figure 2.2, 2.4)

    The second option is that the network is heterogeneous: the number of possible destinations (nodes) that a person sees differs so much from one node to another that ignoring the degree distribution would be incorrect.

     

    2.2 Clustering code. Choices are the nodes at the minimum distance.

     

     

    2.3 Example of the degree distribution of the nodes of a scale-free network. n_nodes = 150

    2.4 Example of the degree distribution of the same network as in figure 2.3 with n_nodes = 150, average-degree = 3 and rewiring probability = 0.30. The distribution is clearly centred on 3 with low variance.

    2.5 Example of the degree distribution of the same network as in figure 2.3 with n_nodes = 150 and rewiring probability = 0.30. The distribution is similar to the scale-free one, with differences due to the rewiring probability.

     

    To build this network, the degree of each node of the scale-free network is used. In a real-world scenario we could know the number of people going into or out of a city without knowing their destinations. So the degree fixes the number of links of each node, but the links are created with the same criterion as in the homogeneous network (nearby nodes are preferred) and rewired with a rewiring probability. (Figure 2.5)
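    Both rebuilding hypotheses can be read as one procedure with a different target degree per node. The following Python sketch is an illustration under stated assumptions, not the author’s NetLogo code: the function name is hypothetical, node positions are 2D points, and a rewired link simply goes to a random node instead of the nearest one. A node can end up with more links than its target because its neighbours also link to it.

```python
import math
import random

def build_nearest_network(positions, target_degree, rewiring_prob, rng):
    """Link every node to its closest nodes until it has at least
    target_degree[i] neighbours; each link is a shortcut to a random
    node with probability rewiring_prob (assumed rule)."""
    n = len(positions)
    neighbours = [set() for _ in range(n)]
    for i in range(n):
        nearest = sorted((j for j in range(n) if j != i),
                         key=lambda j: math.dist(positions[i], positions[j]))
        for j in nearest:
            if len(neighbours[i]) >= target_degree[i]:
                break
            if rng.random() < rewiring_prob:
                j = rng.choice([k for k in range(n) if k != i])  # shortcut
            neighbours[i].add(j)
            neighbours[j].add(i)   # links are undirected
    return neighbours

rng = random.Random(7)
positions = [(rng.random(), rng.random()) for _ in range(150)]

# Homogeneous hypothesis: the same target degree on every node.
homogeneous = build_nearest_network(positions, [3] * 150, 0.30, rng)

# Heterogeneous hypothesis: pass the degree sequence of the scenario's
# scale-free network as target_degree instead of a constant list.
```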

    This work tries to point out the differences in the infection spread mechanism and in the intervention mechanism when the agents diffuse on a scale-free network, on a random network built under the homogeneous hypothesis and on a random scale-free-like network built under the heterogeneous hypothesis.

    3. Outbreak scenario 

     

    3.1 People distribution in the base scenario (Gaussian)

    3.2 People distribution on the Random Network (Gaussian)

     

    In the scenario and in every model the people are generated randomly distributed over the nodes, with a minimum of 2 people per node. At the start the big cities (nodes with a high degree) have almost the same number of people as the small cities. Their importance is due to the in/out flow: after 1 tick the number of people in them tends to have doubled (if not more), resulting in many more interactions than in the isolated nodes.

    One difference between the scenario and a model is that the former generates the infected people on isolated nodes, whereas the latter generates them uniformly distributed. This could delay the outbreak in the scenario, especially if the rewiring probability is low: after 1 or 2 ticks the infected people may not have reached the hubs yet, which delays the spread of the infection. Conversely, in the model the infected people, being evenly distributed, could spread the infection too fast if the rewiring probability is too high, creating strong discrepancies between the model and the scenario.

    The model knows only the total number of people and the initial number of infected people (initial-outbreak-size).

    Travel 

    The virus spreads following the metapopulation model setup. In each node the people interact with each other, and then every person has an average probability of diffusing to another node (travel-tendency).

    There is a feature that the model doesn’t take into consideration: the outbreak mechanism. When the overall number of infected people exceeds a so-called outbreak-threshold, an outbreak alert is triggered that changes the behaviour of all the agents: the medics and the agents start to travel, whereas the normal people (susceptible, infected, recovered and cured) travel much more easily (5 times the base travel-tendency).

    Lastly, there is another feature that the model does not take into consideration: the money mechanism. In real life everybody starts with a certain amount of money and works their way into society. In the scenario every agent is generated with a random, Poisson-distributed amount of money and makes money on every tick in which they don’t travel. Travelling has a price, so an agent who doesn’t have enough money stays in the node. This creates a micro-delay for the poorer agents that changes the overall movement speed.
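    The travel rules of the scenario (travel-tendency, the fivefold increase during an alert and the money constraint) can be combined in a single per-person step. The Python sketch below is only an illustration: the dictionary layout and the `price` and `income` values are hypothetical, and the probability is capped at 1 because 5 times 0.4 would exceed it.

```python
import random

def travel_probability(travel_tendency, alert_active):
    """Base probability, multiplied by 5 during an outbreak alert, capped at 1."""
    return min(1.0, travel_tendency * (5 if alert_active else 1))

def travel_step(person, neighbours, travel_tendency, alert_active,
                price, income, rng):
    """Move one person along a link if they want to travel and can pay;
    otherwise they stay on the node and earn money."""
    wants = rng.random() < travel_probability(travel_tendency, alert_active)
    destinations = sorted(neighbours[person["node"]])
    if wants and person["money"] >= price and destinations:
        person["money"] -= price
        person["node"] = rng.choice(destinations)
    else:
        person["money"] += income
    return person

rng = random.Random(1)
neighbours = {0: {1, 2}, 1: {0}, 2: {0}}   # toy network
rich = {"node": 0, "money": 10}
poor = {"node": 0, "money": 0}
travel_step(rich, neighbours, 0.4, True, price=5, income=1, rng=rng)
travel_step(poor, neighbours, 0.4, True, price=5, income=1, rng=rng)
```

    In this toy run the alert makes both people want to travel, but only the one who can pay the price leaves node 0; the other stays and earns, which is the micro-delay described in the text.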

    Interaction  

    In a SIR model a susceptible person becomes infected with a probability proportional to the virus-spread-chance multiplied by the probability of encountering an infected person in the node, and an infected agent becomes recovered with probability recovery-chance.
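    One tick of this interaction inside a single node can be sketched as follows. This is a simplified Python illustration, not the scenario’s NetLogo code: it assumes that the probability of encountering an infected person equals the infected fraction of the node, and the function name and state labels are hypothetical.

```python
import random

def node_step(states, virus_spread_chance, recovery_chance, rng):
    """One tick of stochastic SIR among the people on one node.
    states is a list of "S", "I" or "R"."""
    n = len(states)
    infected_fraction = states.count("I") / n if n else 0.0
    new_states = []
    for s in states:
        if s == "S" and rng.random() < virus_spread_chance * infected_fraction:
            new_states.append("I")      # infection
        elif s == "I" and rng.random() < recovery_chance:
            new_states.append("R")      # recovery
        else:
            new_states.append(s)        # no change this tick
    return new_states

rng = random.Random(3)
node = ["I"] * 2 + ["S"] * 18
after = node_step(node, virus_spread_chance=0.60, recovery_chance=0.10, rng=rng)
```

    The infected fraction is computed once before the loop, so everyone on the node reacts to the state at the start of the tick.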

     

    3.3 SIR compartment model (epidemiology).

    3.4 Number of people over time in a simple deterministic SIR model. Blue = Susceptible, Green = Infected, Red = Recovered.

    The dynamics of an epidemic are often much faster than the dynamics of birth and death; therefore, birth and death are often omitted in simple compartmental models. The SIR system without so-called vital dynamics (birth and death, sometimes called demography) described above can be expressed by the following set of ordinary differential equations:

    \[
    \frac{dS}{dt} = -\frac{\beta I S}{N}, \qquad
    \frac{dI}{dt} = \frac{\beta I S}{N} - \gamma I, \qquad
    \frac{dR}{dt} = \gamma I
    \]

    The beta and gamma coefficients are respectively the transition rates from susceptible to infected and from infected to recovered, here called virus-spread-chance and recovery-chance. Without the birth and death dynamics we can see that:

    \[
    \frac{dS}{dt} + \frac{dI}{dt} + \frac{dR}{dt} = 0
    \]

    and furthermore:

    \[
    S(t) + I(t) + R(t) = N = \text{constant}
    \]

     

    The scenario created in this work differs from a standard SIR model in many features. First of all, the virus-spread-chance is inversely proportional to the number of susceptible people. This means that the start of the outbreak is delayed, whereas at the peak of the infection (when fewer susceptible people remain) the spread is amplified compared to the start of the infection.
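    As a baseline against which these differences can be read, the standard deterministic SIR system without vital dynamics (dS/dt = -beta I S / N, dI/dt = beta I S / N - gamma I, dR/dt = gamma I) can be integrated numerically. The Python sketch below uses a simple Euler scheme; the step size `dt` and the function name are assumptions, and the parameter values are the ones quoted in this work for the scenario.

```python
def simulate_sir(n_people, initial_infected, beta, gamma, ticks, dt=0.1):
    """Euler integration of the deterministic SIR equations.
    Returns one (S, I, R) tuple per tick, starting at tick 0."""
    s = float(n_people - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(s, i, r)]
    steps_per_tick = round(1 / dt)
    for _ in range(ticks):
        for _ in range(steps_per_tick):
            new_infected = beta * s * i / n_people * dt
            new_recovered = gamma * i * dt
            s -= new_infected
            i += new_infected - new_recovered
            r += new_recovered
        history.append((s, i, r))
    return history

history = simulate_sir(3000, 4, beta=0.60, gamma=0.10, ticks=120)
peak_tick = max(range(len(history)), key=lambda t: history[t][1])
peak_fraction = history[peak_tick][1] / 3000
```

    Every update moves people between compartments without creating or removing anyone, so S + I + R stays equal to N throughout the run.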

     


    A mixed SIR-SIS model is used: recovered people can become either susceptible again or cured (the agent state in which infection is no longer possible), depending on a variable called recovery-time (tr), randomly generated when the agent recovers. The longer the agent stays in the recovered state, the more likely they are to become cured and the less likely to become infected again.

    In the scenario a number (initial-doctors) of medics M is randomly generated; each medic cures a random number of infected people on a node with a probability of 50% (pc).

    The medics’ behaviour changes dramatically when another player joins the game: the agent. At the start of the outbreak a random number (max 10) of agents is created. They travel freely from node to node and, if the number of infected people on a node is too high (more than 30% of the people on the node), they close the node. This stops the in/out flow of infected people and prevents the virus from spreading.

    A medic in a closed node is much more efficient at curing the infected. This agent-medic behaviour is something that the model deliberately doesn’t know about.

     

     

     

     

    In the code (3.5) the two-body interactions are carried out via nested commands to the agentset “people” (issued in NetLogo with ask). The 𝜏 is set to 1 via “not generated-in-this-loop?”. The medics have a 50% chance of curing an infected person, whereas inside a closed node they have a 100% chance. Every recovered person has a Gaussian-distributed recovery-time assigned at the change of status (from infected to recovered).
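    The agent and medic rules can be condensed into two small functions. This Python sketch is a simplification for illustration, not a translation of the NetLogo code in figure 3.5: the function names and state labels are hypothetical, and the medic tries to cure every infected person on the node rather than a random number of them.

```python
import random

CLOSE_THRESHOLD = 0.30   # infected fraction above which an agent closes a node

def should_close(node_states):
    """True when more than 30% of the people on the node are infected."""
    n = len(node_states)
    return n > 0 and node_states.count("I") / n > CLOSE_THRESHOLD

def medic_visit(node_states, closed, rng):
    """A medic cures each infected person with a 50% chance on an open
    node and a 100% chance on a closed one."""
    cure_chance = 1.0 if closed else 0.5
    return ["C" if s == "I" and rng.random() < cure_chance else s
            for s in node_states]

rng = random.Random(5)
node = ["I", "I", "S", "R"]
closed = should_close(node)            # 2 of 4 infected: the node is closed
treated = medic_visit(node, closed, rng)
```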


     

    3.5 Infection, recovery and part of the cured code

    Scenario  

    We generate 3000 people evenly distributed over 150 nodes, or locations. The network is scale-free with a rewiring probability of 0.10.

    The initial-outbreak-size is set to 4, the virus-spread-chance to 60%, the recovery-chance to 10%, the travel-tendency to 0.4, the outbreak threshold to 0.3 and the number of doctors to a reasonable 30 (1% of the population).

     

    3.6 People distribution at the start of the simulation.

    3.7 Degree distribution of the scale-free network.

    3.8 People distribution after 3 ticks. The largest share of the people (at least 10%) is on the node with degree 30. It is a hub on which hundreds of interactions take place.

    3.9 Network structure. Scale-free with some shortcuts. Pink people are the susceptible ones.

     

    3.10 Populations after 115 ticks. Blue = Susceptible, Red = Infected, Light-green = Recovered, Green = Cured.

     

    The susceptible curve has a logistic-like form, as suggested by the differential equations above. The infected curve follows the logistic growth, but when the number of infected reaches 30% of the total population the outbreak alert is activated. From then on the people travel much faster and the agents and the medics join in, producing a dramatic decrease in infected people as well as an increase in cured ones. Up to this point the cured count was close to the recovered one.

    The maximum infected count is close to 30% of the population (33%) and occurs on the 20th tick. From then on the infected count fluctuates due to the micro-interactions in the nodes. Slowly but surely it decreases until no infected people are left: after 514 ticks in this case. The end of the outbreak is not a reliable quantity, because here the randomness of the micro-interactions matters more than the actual macro situation.
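    The features discussed here (peak size, peak timing and outbreak duration) can be read directly from the series of infected counts per tick. The Python sketch below shows one way to do it; the function name is hypothetical and the toy series is made up for illustration, not taken from the scenario’s output.

```python
def outbreak_features(infected_counts, population):
    """Extract peak size, peak tick and end tick from a per-tick series
    of infected counts. The end tick is the first tick at or after the
    peak with no infected people left (None if the series never gets there)."""
    peak_size = max(infected_counts)
    peak_tick = infected_counts.index(peak_size)
    end_tick = next((t for t in range(peak_tick, len(infected_counts))
                     if infected_counts[t] == 0), None)
    return {
        "peak_size": peak_size,
        "peak_fraction": peak_size / population,
        "peak_tick": peak_tick,
        "end_tick": end_tick,
    }

# Made-up series: 4 initial infected, a peak of 990 at tick 3, extinction at tick 6.
toy_series = [4, 10, 50, 990, 600, 200, 0, 0]
features = outbreak_features(toy_series, population=3000)
```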

     

    3.11 Populations trend from the start to the end of the simulation. 


    4. Model 

    The model chosen to replicate the scenario is the logistic growth of the basic deterministic SIR. The initial number of infected, the total number of people and the number of nodes are known. We will subsequently make 4 hypotheses about the structure of the network, the pure SIR versus a modified SIS and the travel tendency of the agents.

    No hypothesis 

    4.1 Degree distribution of the homogeneous random network.

    Without any hypothesis the network is random and homogeneous (4.3): the agents travel much faster than in a scale-free network and they see the same number of neighbors on every node. This implies that no hub is created and that isolated nodes rarely exist (although clusters could be isolated).

    The outbreak involves more susceptible people at the same time: about 70% of them are infected at the peak of the infection, which is located around the 7th tick. Its end is located around the 90th tick. So it has a faster and more aggressive initial development but also a faster end, because of the fast growth of the recovered people, who can’t be infected again at this stage of the model.

     

    4.2 Outbreak of a deterministic SIR in a random homogeneous network. The dark red line is the trend of the infected people in the scenario. beta = 0.60, gamma = 0.10.

     

    4.3 Network view. Built with the idea that nodes (cities) are likely to be linked to nearby cities as well as to have some links with cities far away. Random network building often leaves clusters of nodes isolated if the rewiring probability is low. The network is homogeneous, as can be seen from the size of the nodes, which is proportional to their degree (4.1). Average degree = 4; rewiring probability = 0.05

    Network Structure hypothesis 

    Even if we don’t know the network in detail, it’s reasonable to think that in real-scenario forecasting one will not know the exact movements of the agents but can make a hypothesis about the main vectors and cities through which they will pass.

    In this case the degree of the scenario’s nodes is taken, as well as their position in the world.

    The rewiring probability is fixed at 0.05 in this case, but it may be necessary to set it around 0.10 because of the clustering tendency used.

    This hypothesis doesn’t substantially change the trend of the infected people count, but it generates a wider plateau at the peak of the infection (4.4), caused by the medium-low mobility of the agents.

     

    4.4 Trend of the infection in a scale-free network randomly rebuilt.

    4.5 World rebuilt with the hypothesis that the nodes don’t have the same number of neighbors. Some of them are hubs, others are isolated.

    Cured, Doctors and Travel Hypotheses 

    The outbreak still evolves too fast and too widely, so we need to introduce something that could delay and contain the infection. Three big hypotheses are made here:

    1. A travel probability lower than 1, since a probability of 1 is not realistic.

    2. A SIR/SIS mixing mechanism in which a recovered person has a chance of being infected again as well as of being cured completely.

    3. Doctors that simply seek out the infected people and cure them.

     

    4.6 Travel probability = 0.1; Pure SIR, no doctors. 

     

    4.7 Travel probability = 0.4; Pure SIR, no doctors. 

    The travel probability plays a role in the diffusion and the control of the infected people. Figures 4.6 and 4.7 show that with a low travel probability the agents change state more slowly (from susceptible to infected and from infected to recovered). Conversely, a medium travel probability is sufficient to concentrate a lot of people in one or two nodes, so that the agents are more likely to interact with each other.

    The cured/recovered model is introduced, together with doctors that travel freely on the network and cure infected people. This produces infected curves much closer to the scenario’s: the outbreak is a bit slower, its peak always occurs around the 20th tick and its end tends to be around the 500th tick. The latter is not something to rely on because of the randomness; what matters is the trend of the infection.

     

    4.8 Travel probability = 0.1, cured probability = 0.05, number of doctors = 10 

     

    4.9 Travel probability = 0.1, cured probability = 0.05, number of doctors = 10 

     

    4.10 Travel probability = 0.1, cured probability = 0.05, number of doctors = 10, delay = 11 


    In the scenario the infection is regulated by non-linear effects. These are unknown in the model and impossible to reproduce if one doesn’t know the differential equation behind them. A delay was added to reproduce the trend of the infected people in the scenario (4.10).
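    The text does not say how the delay value was found or how it enters the model. One possible reading, sketched below in Python purely as an assumption, is to hold the model’s infected curve at its initial value for a number of ticks and to pick the delay that minimises the root-mean-square error against the scenario’s curve. The function names and the toy curves are hypothetical.

```python
import math

def apply_delay(curve, delay):
    """Hold the curve at its initial value for `delay` ticks, keeping its length."""
    return ([curve[0]] * delay + curve)[:len(curve)]

def rmse(a, b):
    """Root-mean-square error between two curves of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def best_delay(model_curve, scenario_curve, max_delay):
    """Delay in 0..max_delay that brings the model closest to the scenario."""
    return min(range(max_delay + 1),
               key=lambda d: rmse(apply_delay(model_curve, d), scenario_curve))

# Toy curves: the "scenario" is the model curve starting 2 ticks later.
model_curve = [1, 5, 9, 5, 1, 0, 0, 0]
scenario_curve = [1, 1, 1, 5, 9, 5, 1, 0]
delay = best_delay(model_curve, scenario_curve, max_delay=4)
```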

    Figures 4.11, 4.12 and 4.13 show how the model depends on the structure of the network: with a low rewiring probability some clusters can be excessively isolated, so that the infection does not spread fully and another small outbreak is generated after 100 ticks.

    4.11 Tick = 31, cured probability = 0.02, number of doctors = 30, delay = 11

    4.12 Complete outbreak: cured probability = 0.02, number of doctors = 30, delay = 11

    4.13 beta = 0.70, gamma = 0.05, cured probability = 0.01, number of doctors = 30, delay = 9

    Figures 4.14, 4.15 and 4.16 show the process of tweaking the parameters. The rewiring probability went from 0.05 to 0.10, producing fewer isolated clusters.

     

    4.14 Rewiring probability = 0.10, beta = 0.60, gamma = 0.10, number of doctors = 31, cured probability = 0.01, delay = 11

    4.15 Rewiring probability = 0.10, number of doctors = 36, cured probability = 0.02

    4.16 Rewiring probability = 0.10, number of doctors = 32, cured probability = 0.01

    5. Conclusions  

    The scenario was built with many details and mechanisms that the model didn’t know about: the spawning of the infected in isolated nodes, the money-making mechanism, the cost of travel, the outbreak alert and the doctor-agent interaction.

    By contrast, the model was based on a simple logistic-growth system of differential equations (the deterministic SIR model). It managed to reproduce the scenario’s trend of infected people through the progressive addition of hypotheses. The network structure, which changes from homogeneous to heterogeneous, and the low travel probability mainly set the peak position and width, whereas the SIR-SIS mixing and the introduction of doctors set its amplitude and the outbreak duration.

    The outbreak parameters are correctly reproduced: the scenario’s virus-spread-chance and recovery-chance are 60% and 10% respectively, and the model’s parameters are beta = 0.60 and gamma = 0.10.

    The network structure is similar to the scenario’s. Nevertheless, the money-making process and the travel cost slow down the movement of the agents in the scenario. For this reason the model’s travel probability is set to 0.1, whereas the scenario’s travel-tendency was 0.4, or 1 with the panic due to the outbreak alert.

    In the scenario the people didn’t change status from recovered to cured in a linear way, but setting a cured probability of 0.01 for the recovered seems to work well. Without the agent-doctor mechanism, the number of doctors needed to reproduce the scenario’s infected trend is a bit higher: 32 in the model against 30 in the scenario.

    The non-linear infection growth of the scenario, the spawning of the infected people on isolated nodes and especially the fact that the model doesn’t have this information force a delay to be introduced (delay = 11).

    Lastly, it is clear that the many features of the model that are based on randomness make it less reliable. This problem could be addressed by repeating the simulations many times, providing an error on the simulation’s results and thereby validating them, or by introducing more complex mathematical structures.
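    Repeating the simulation and attaching an error to a result could look like the following Python sketch. It is an illustration under simplifying assumptions, not the model of this work: it uses a well-mixed stochastic SIR without network, doctors or delay, and the function names are hypothetical. The error reported is the standard error of the mean over the runs.

```python
import random
import statistics

def stochastic_peak_tick(n_people, initial_infected, beta, gamma, rng,
                         max_ticks=500):
    """Run one well-mixed stochastic SIR and return the tick of the infected peak."""
    s, i = n_people - initial_infected, initial_infected
    best_i, best_t = i, 0
    for t in range(1, max_ticks + 1):
        p_infection = beta * i / n_people
        new_infected = sum(1 for _ in range(s) if rng.random() < p_infection)
        new_recovered = sum(1 for _ in range(i) if rng.random() < gamma)
        s -= new_infected
        i += new_infected - new_recovered
        if i > best_i:
            best_i, best_t = i, t
        if i == 0:
            break
    return best_t

def summarize(values):
    """Mean and standard error of the mean of repeated results."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / len(values) ** 0.5
    return mean, sem

rng = random.Random(0)
peaks = [stochastic_peak_tick(3000, 4, 0.60, 0.10, rng) for _ in range(10)]
mean_peak, sem_peak = summarize(peaks)
```

    The same `summarize` call can be applied to any other extracted feature, such as the peak size or the end tick, to see which of them are stable across runs.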


    Bibliography  

    [1] The RAPIDD ebola forecasting challenge: Synthesis and lessons learnt. https://www.sciencedirect.com/science/article/pii/S1755436517301275

    [2] A.-L. Barabási, R. Albert. Emergence of scaling in random networks. Science, 1999. http://science.sciencemag.org/content/286/5439/509

    [3] NetLogo library: Virus on a Network. http://ccl.northwestern.edu/netlogo/models/VirusonaNetwork

    [4] NetLogo library: Preferential Attachment. http://ccl.northwestern.edu/netlogo/models/PreferentialAttachment