epidemics forecast ing challenge luca colombo july 4, 2018 · econofisica università degli studi...

ECONOFISICA Università degli Studi di Torino

Epidemics Forecasting Challenge Luca Colombo July 4, 2018

Abstract

The RAPIDD Ebola forecasting challenge is an innovative exercise inspired by the 2014-2015 West African Ebola crisis, involving 16 international academic teams and US government agencies. The participants were invited to predict 140 epidemiological targets across five different time points of four synthetic Ebola outbreaks. Here I present the results of a more modest work, based on the idea proposed by the RAPIDD Ebola forecasting challenge paper: evaluating the performance of a simple model on a synthetic outbreak dataset. Both the dataset and the model were created in NetLogo, a useful tool for agent-based modelling and simulation. The synthetic outbreak dataset is generated by a complex metapopulation stochastic SIR/SIS scenario in which the agents diffuse on a scale-free network and spread the infection. The model is a simple deterministic SIR evaluated with different degrees of information about the space structure (the network features), the interventions of other agents (medics) and the diffusion probability. The goal of this work is to understand how this information affects the forecasting and, introducing it gradually, to compare the results to the scenario’s dataset. The simpler models predict the scenario poorly, whereas the more complex ones go in the right direction, correctly predicting the peak timing and amplitude, the outbreak duration and the overall trend. However, the missing information (the money-making and spending mechanism, the outbreak alert and so on) causes some parameters to differ from those of the scenario. Introducing a birth/death mechanism as well as immunization processes would be an interesting step forward. A key element to improve is the reliability of the model: removing excessive randomness in some mechanisms would avoid accidents such as multiple isolated clusters or slow or non-existent outbreak starts.

1. Introduction

The development of computational and mathematical models is crucial to prevent and control emerging infectious diseases and to guide intervention strategies. For example, during the 2014-2015 West African Ebola Virus Disease (EVD) epidemic a variety of models were used to generate real-time predictions of how the outbreak would unfold and to help the authorities fight the disease.

At the end of the West African Ebola epidemic, in spring 2015, a workshop was organized by the RAPIDD program, led by the Fogarty International Center of the National Institutes of Health (NIH). The aim of the workshop was to analyze and discuss the models used during the outbreak and to find possible improvements in forecasting accuracy. The participants decided that the best way to do so was to build a forecasting challenge relying on synthetic Ebola datasets in a controlled and systematic environment, and to evaluate the models’ prediction performance and how it scales with epidemiological complexity and data availability.

The synthetic epidemiological datasets were generated using a spatially structured, stochastic, agent-based model at the level of the single household that integrates detailed data on the demography of Liberia. The model was used to generate four outbreak scenarios with an increasing level of complexity in terms of epidemiology, layered interventions, data availability and reporting noise. The goal of the participants was to predict 140 targets in total across all scenarios and time points.

This work takes inspiration from a recently published paper (The RAPIDD ebola forecasting challenge: Synthesis and lessons learnt [1]) and proposes to do the same in a much simpler environment: we build a complex base scenario, extract data from it (incidence, final size of the outbreak, peak size, peak timing) and then build a new simulation that tries to predict these features. In the new simulation it is possible to gradually introduce hypotheses on the infection rate and the recovery rate, on the space structure, on the travel probability and so on. However, some features of the base scenario are deliberately left unknown to the model. We will evaluate their influence and whether they change the parameters we look for. The goal of this work is to understand how this information affects the forecasting and, introducing it gradually, to compare the results to the scenario’s dataset.


NetLogo is used to build the ABM simulations. This environment is useful to graphically see

the structure of the space in which the agents interact and move and the incidence through

time of the infection.

2. Space structure

Scenario

A network structure is used instead of letting the agents move randomly in the NetLogo world. This gives control over the scenario structure through a mathematically formalized instrument while remaining flexible. In a real scenario the nodes represent the cities and the links the travel vectors along which agents move between cities. The more links a node has, the more important it is and the more likely it is to be a hub in which agents interact, spread the infection or get cured. Conversely, more isolated nodes generally have a low population and offer difficult access to cures. In the scenario the infected people are spawned in peripheral, isolated nodes (the first thing the model doesn’t know about).

How can a real scenario be reproduced in a network-based structure? In the real world big cities have far more links than average-sized cities, and orders of magnitude more than small ones. New nodes are therefore more likely to link to nodes that already have a high degree. In the Barabási-Albert paper [2] this is called preferential attachment.

Growing a network with preferential attachment leads to a fundamental property: scale invariance. Such networks are called scale-free, and they are convenient in our case because their structure is not arbitrary (that is, not decided by the author): each realization is different but has specific features, which makes them flexible and reproducible. (Figure 2.1, 2.3)

2.1 Preferential attachment code. Each node is more likely to link to a more linked node.
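The preferential attachment rule can be sketched in a few lines. The example below is written in Python rather than NetLogo and is only an illustration under stated assumptions: each new node is assumed to add a single link, and the names `preferential_attachment` and `degrees` are hypothetical, not taken from the original code.

```python
import random


def preferential_attachment(n_nodes, seed=None):
    """Grow a network node by node (assumption: one new link per node).

    Each new node links to an existing node chosen with probability
    proportional to that node's current degree.
    """
    rng = random.Random(seed)
    edges = [(0, 1)]      # start from two linked nodes
    endpoints = [0, 1]    # every node appears here once per link it has
    for new in range(2, n_nodes):
        # Drawing uniformly from the list of link endpoints is a
        # degree-proportional draw over the nodes.
        partner = rng.choice(endpoints)
        edges.append((new, partner))
        endpoints.extend([new, partner])
    return edges


def degrees(n_nodes, edges):
    """Count the links of every node."""
    deg = [0] * n_nodes
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg
```

Calling `degrees(150, preferential_attachment(150))` gives a degree list of the kind plotted in figure 2.3; most nodes are expected to have a low degree and a few to act as hubs.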


To account for the “shortcuts” of a real-world scenario (airlines, trains and so on), a rewiring probability can be set for the built network. This feature gives even more flexibility to the model because we can gradually transform a scale-free network into a random-like network.
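One possible reading of the rewiring step is sketched below in Python. It is an assumption, not the original NetLogo procedure: every link is visited once and, with probability `p`, its second endpoint is moved to a randomly chosen node, avoiding self-loops and duplicate links. The function name `rewire` is hypothetical.

```python
import random


def rewire(edges, n_nodes, p, seed=None):
    """Return a copy of `edges` in which each link is rewired with probability p."""
    rng = random.Random(seed)
    current = {frozenset(e) for e in edges}   # links that exist right now
    result = []
    for a, b in edges:
        if rng.random() < p:
            # Possible new endpoints: not the node itself, not an existing link.
            candidates = [c for c in range(n_nodes)
                          if c != a and frozenset((a, c)) not in current]
            if candidates:
                c = rng.choice(candidates)
                current.discard(frozenset((a, b)))
                current.add(frozenset((a, c)))
                b = c
        result.append((a, b))
    return result
```

With `p = 0` the network is returned unchanged; as `p` grows towards 1 the original structure is progressively lost, which is the gradual transition towards a random-like network described above.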

Model

Switching from the scenario to the model, the same number of nodes is maintained, with identical positions. Obviously, anyone trying to predict the parameters of a real outbreak would know the position of the cities and their infrastructures. In this case, however, the links represent the diffusion vectors, which are unknown.

Two possible options are considered in this work to rebuild the network: a homogeneous hypothesis and a heterogeneous hypothesis.

The first one assumes that a person on a random node sees an average number of possible destinations (nodes) to travel to: the average degree. The network is built on the idea that a person is most likely to travel to nearby nodes, so the node the person is on is linked to these nodes, forming a cluster in which almost every node is connected (depending on the fixed average degree). A rewiring probability can also be set to create the shortcuts that exist in a real-world scenario. If the rewiring probability isn’t high enough, there is a risk that one or more clusters remain isolated. Conversely, if it is too high, all nodes become equally preferred destinations rather than the closer ones, which is unrealistic. The degree distribution is substantially Gaussian. (Figure 2.2, 2.4)

The second option is that the network is heterogeneous: the number of possible destinations (nodes) a person sees differs so much from one node to another that ignoring the degree distribution would be incorrect.

2.2 Clustering code. Choices are the nodes at the minimum distance


2.3 Example of a degree distribution of the nodes of a scale-free network. n_nodes = 150

2.4 Example of the degree distribution of the same network in figure 2.3 with n_nodes = 150, average-degree = 3 and rewiring probability = 0.30. It is clear how the distribution has its center on 3 with low variance.

2.5 Example of the degree distribution of the same network in figure 2.3 with n_nodes = 150 and rewiring probability = 0.30. The distribution is obviously similar to the scale-free one with differences due to the rewiring probability.

To build the network, the degree of each node of the scale-free network is used. In a real-world scenario we could know the number of people going into or out of a city without knowing their destinations. So the degree fixes the number of links of each node, but the links are created with the same criterion as in the homogeneous network (nearby nodes are preferred) and rewired with a rewiring probability. (Figure 2.5)
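Both rebuilding hypotheses can be illustrated with one small Python sketch in which every node links to its closest nodes until it reaches a target degree. This is a simplification under stated assumptions, not the original clustering code: the function name `nearest_neighbour_network` is hypothetical, a node may end up with more links than its target when other nodes choose it, and the rewiring step is left out.

```python
import math


def nearest_neighbour_network(positions, target_degree):
    """Link each node to its nearest nodes until it has its target degree.

    positions      list of (x, y) coordinates, one per node
    target_degree  list of desired degrees, one per node
                   (homogeneous hypothesis: the same value everywhere;
                    heterogeneous hypothesis: the scale-free degrees)
    """
    n = len(positions)
    edges = set()
    deg = [0] * n
    for a in range(n):
        # Other nodes, closest first.
        by_distance = sorted(
            (b for b in range(n) if b != a),
            key=lambda b: math.dist(positions[a], positions[b]),
        )
        for b in by_distance:
            if deg[a] >= target_degree[a]:
                break
            edge = (min(a, b), max(a, b))
            if edge not in edges:
                edges.add(edge)
                deg[a] += 1
                deg[b] += 1
    return sorted(edges)
```

Passing `[3] * n` as `target_degree` corresponds to the homogeneous hypothesis with average degree 3, while passing the degree list of the scale-free scenario corresponds to the heterogeneous one.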

This work tries to point out the differences in the infection spread mechanism and in the intervention mechanism when the agents diffuse on a scale-free network, on a random network built under the homogeneous hypothesis, and on a random scale-free-like network built under the heterogeneous hypothesis.


3. Outbreak scenario

3.1 People distribution in the base scenario (Gaussian)

3.2 People distribution on the Random Network (Gaussian)

In the scenario and in every model the people are generated randomly distributed on the nodes, with a minimum of 2 people per node. At the start the big cities (nodes with high degree) have almost the same number of people as the small ones. Their importance is due to the in/out flow: after 1 tick the number of people in them has typically doubled, if not more, resulting in many more interactions than in the isolated nodes.

One difference between the scenario and a model is that the former generates the infected people on isolated nodes whereas the latter generates them uniformly distributed. This could delay the outbreak in the scenario, especially if the rewiring probability is low: after 1 or 2 ticks the infected people may not yet have reached the hubs, which delays the spread of the infection. Conversely, in the model the evenly distributed infected people could spread the infection too fast if the rewiring probability is too high, creating strong discrepancies between the model and the scenario.

The model knows only the total number of people and the initial number of infected people

(initial-outbreak-size).

Travel

The virus spreads following the metapopulation model setup. In each node the people

interact with each other and then every person has an average probability to diffuse into

another node (travel-tendency).

There is a feature that the model doesn’t take into consideration: the outbreak mechanism. When the overall number of infected people exceeds a so-called outbreak-threshold, it triggers an outbreak alert that changes the behaviour of all the agents: the medics and the agents start to travel, whereas the normal people (susceptible, infected, recovered and cured) travel much more easily (5 times the base travel-tendency).

Lastly, there is another feature that the model will not take into consideration: the money mechanism. In real life everybody starts with a certain amount of money and works their way into society. In the scenario every agent is generated with a random, Poisson-distributed amount of money and makes money on every tick in which it doesn’t travel. Travelling has a price, so an agent that doesn’t have enough money stays in its node. This creates a micro-delay for the poorer agents that, overall, changes the movement speed.
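The travel rules of the scenario (travel-tendency, outbreak alert and money) can be put together in a short Python sketch. It is an illustration under stated assumptions rather than the original NetLogo code: the names `outbreak_alert`, `travel_step`, `price` and `income` are hypothetical, the outbreak-threshold is read as a fraction of the population, and the travel chance during an alert is capped at 1.

```python
import random


def outbreak_alert(n_infected, n_people, outbreak_threshold):
    """True when the infected fraction exceeds the threshold (assumed to be a fraction)."""
    return n_infected / n_people > outbreak_threshold


def travel_step(person, neighbours, travel_tendency, alert, price, income, rng):
    """Move one person for one tick; return True if the person travelled.

    person      dict with the keys "node" and "money"
    neighbours  dict mapping each node to the list of nodes linked to it
    """
    # During an alert normal people travel 5 times more easily.
    chance = min(1.0, travel_tendency * (5 if alert else 1))
    wants_to_travel = rng.random() < chance
    destinations = neighbours[person["node"]]
    if wants_to_travel and person["money"] >= price and destinations:
        person["money"] -= price
        person["node"] = rng.choice(destinations)
        return True
    # A person who does not travel makes money instead.
    person["money"] += income
    return False
```

A person who wants to travel but cannot afford the price stays put and earns `income`, which is the micro-delay for poorer agents described above.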

Interaction

In a SIR model a susceptible person becomes infected with a probability proportional to the virus-spread-chance multiplied by the probability of encountering an infected person in the node, and an infected agent becomes recovered with probability recovery-chance.

3.3 SIR compartment model (epidemiology).

3.4 Number-of-people(time) in a simple Deterministic SIR Model. Blue = Susceptible, Green = Infected, Red = Recovered
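The interaction rule above, applied to a single well-mixed population, gives deterministic SIR curves of the kind shown in 3.4. The following Python sketch advances the three compartments one tick at a time; it is a minimal illustration that ignores the network, assumes a time step of one tick, and uses the hypothetical name `deterministic_sir`.

```python
def deterministic_sir(n_people, initial_infected, beta, gamma, ticks):
    """Step a well-mixed deterministic SIR model, one tick per step.

    beta   plays the role of virus-spread-chance
    gamma  plays the role of recovery-chance
    Returns a list of (susceptible, infected, recovered) tuples.
    """
    s = float(n_people - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(ticks):
        # Infection chance times the chance of meeting an infected person.
        new_infections = beta * s * i / n_people
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history
```

For example, `deterministic_sir(3000, 4, 0.60, 0.10, 115)` uses the population, initial outbreak size and chances quoted for the scenario; the total number of people stays constant because no births or deaths are modelled.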


The dynamics of an epidemic are often much faster than the dynamics of birth and death; therefore, birth and death are often omitted in simple compartmental models. The SIR system without so-called vital dynamics (birth and death, sometimes called demography) described above can be expressed by the following set of ordinary differential equations:

$$\frac{dS}{dt} = -\frac{\beta I S}{N}, \qquad \frac{dI}{dt} = \frac{\beta I S}{N} - \gamma I, \qquad \frac{dR}{dt} = \gamma I$$

The beta and gamma coefficients are respectively the transition rates from susceptible to infected and from infected to recovered, here called virus-spread-chance and recovery-chance. Without the birth and death dynamics we can see that:

$$\frac{dS}{dt} + \frac{dI}{dt} + \frac{dR}{dt} = 0$$

and furthermore:

$$S(t) + I(t) + R(t) = \text{constant} = N$$

The scenario created in this work differs from a standard SIR model in many features. First of all, the virus-spread-chance is inversely proportional to the number of susceptible people. This means that the start of the outbreak is delayed, whereas at the peak of the infection (when there are fewer susceptible people) the spread is amplified compared to the start.


The scenario uses a mixed SIR-SIS model: a recovered person can either become infected again or become cured, the state in which an agent can no longer be infected. Which of the two happens depends on a variable called recovery-time (tr), randomly generated when the agent recovers: the longer the agent stays in the recovered state, the more likely it is to become cured and the less likely to be infected again.

In the scenario a number (initial-doctors) of medics M are randomly generated; each cures a random number of infected people on its node with a probability of 50% (pc).

The medics’ behaviour changes dramatically when another player joins the game: the agent. At the start of the outbreak a random number (at most 10) of agents are created. They travel freely from node to node, and if the number of infected people on a node is too high, namely more than 30% of the people on the node, the agent closes the node. This stops the in/out flow of infected people and prevents the virus from spreading.

A medic in a closed node is much more efficient at curing the infected. This agent-medic behaviour is something that the model deliberately doesn’t know about.
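The agent and medic rules can be sketched as two small Python functions. This is a hedged illustration, not the original NetLogo code: the names `agent_visit` and `medic_visit` and the dictionary layout of a node are hypothetical, and a medic is assumed to attempt a cure on every infected person on its node, with a 50% chance in an open node and a 100% chance in a closed one.

```python
import random


def agent_visit(node):
    """Close the node if more than 30% of the people on it are infected."""
    people = node["people"]
    if people:
        infected_fraction = sum(p == "infected" for p in people) / len(people)
        if infected_fraction > 0.30:
            node["closed"] = True


def medic_visit(node, rng):
    """Try to cure every infected person on the node."""
    # Assumption: 50% chance in an open node, 100% in a closed one.
    cure_chance = 1.0 if node["closed"] else 0.5
    node["people"] = [
        "cured" if p == "infected" and rng.random() < cure_chance else p
        for p in node["people"]
    ]
```

Here a node is a dictionary such as `{"people": ["infected", "susceptible"], "closed": False}`, where each person is represented only by the name of its state.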

In the code (3.5) the two-body interactions are carried out via nested commands to the agentset “people” (the NetLogo command is called ask). The 𝜏 is set to 1 via “not generated-in-this-loop?”. The medics have a 50% chance of curing an infected person, whereas inside a closed node they have a 100% chance. Every recovered person is assigned a Gaussian-distributed recovery-time at the change of status (infected to recovered).


3.5 Infection, recover and part of the cured code

Scenario

We generate 3000 people evenly distributed on 150 nodes or locations. The network is

scale-free with rewiring probability of 0.10.

The initial-outbreak-size is set to 4, the virus-spread-chance to 60%, the recovery-chance to

10%, the travel-tendency to 0.4, the outbreak threshold to 0.3 and the number of doctors to

a reasonable 30 (1% of the population).

3.6 People distribution at the start of the simulation.

3.7 Degree distribution of the scale-free network.


3.8 People distribution after 3 ticks. Most of the people (at least 10%) are on the node with degree 30. It is a hub on which hundreds of interactions take place.

3.9 Network structure. Scale-free with some shortcuts. Pink people are the susceptible ones.


3.10 Populations after 115 ticks. Blue = Susceptible, Red = Infected, Light-green = Recovered, Green = Cured.

The susceptible curve has a logistic-like form, as suggested by the previous differential equations. The infected curve follows logistic growth, but when the number of infected reaches 30% of the total population the outbreak alert is activated. From then on people travel much faster and the agents and the medics join in, producing a dramatic decrease in infected people as well as an increase in cured ones. Up to this point the cured count was close to the recovered one.

The maximum count of infected is close to 30% of the population (33%), on the 20th tick. From then on the infected count fluctuates due to the micro-interactions in the nodes. Slowly but surely it decreases until a zero-infected situation is reached: 514 ticks in this case. The end time of the outbreak is not a reliable quantity, because at that stage the randomness of the micro-interactions matters more than the actual macro situation.

3.11 Populations trend from the start to the end of the simulation.
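The quantities discussed above (peak size, peak timing and end of the outbreak) can be read off an infected-count series with a few lines of Python. The sketch below is only an illustration; the function name `outbreak_summary` and the choice of defining the end as the first tick after the peak with zero infected are assumptions.

```python
def outbreak_summary(infected, total_people):
    """Summarise an outbreak from the infected count recorded at every tick."""
    peak_size = max(infected)
    peak_tick = infected.index(peak_size)   # first tick at which the peak occurs
    # Assumed definition of the end: first tick after the peak with no infected.
    end_tick = next(
        (t for t in range(peak_tick, len(infected)) if infected[t] == 0),
        None,                               # None if the outbreak never ends
    )
    return {
        "peak_size": peak_size,
        "peak_fraction": peak_size / total_people,
        "peak_tick": peak_tick,
        "end_tick": end_tick,
    }
```

Applying the same function to the scenario series and to a model series gives directly comparable targets.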


4. Model

The model chosen to replicate the scenario is the logistic growth of the basic deterministic SIR. The initial number of infected, the total number of people and the number of nodes are known. We will subsequently make 4 hypotheses, about the structure of the network, the pure SIR versus a modified SIS, and the travel tendency of the agents.

No hypothesis

4.1 Degree distribution of the homogeneous random network.

Without any hypothesis the network is random and homogeneous (4.3): the agents travel much faster than in a scale-free network and they see the same number of neighbors on every node. This implies that no hub is created and isolated nodes rarely exist (although clusters could be isolated).

The outbreak involves more susceptible people at the same time: about 70% of them are infected at the peak of the infection, which is located around the 7th tick. Its end is located around the 90th tick. So it has a faster and more aggressive initial development but also a faster end, because of the fast growth of the recovered people, which can’t be infected again at this stage of the model.

4.2 Outbreak of a Deterministic SIR in a random homogeneous network. The line in dark red is the trend

of the infected people in the scenario. beta = 0.60, gamma = 0.10.


4.3 Network view. Built with the idea that nodes (cities) are likely to be linked to nearby cities as well as to have some links with cities far away. Random network building often leaves clusters of nodes isolated if the rewiring probability is low. The network is homogeneous, as you can see from the size of the nodes, which is proportional to their degree (4.1). Average degree = 4; rewiring probability = 0.05

Network Structure hypothesis

Even if we don’t know the network in detail, it is reasonable to think that in real-scenario forecasting one will not know the exact movements of the agents but can make a hypothesis on the main vectors and cities through which they will pass.

In this case the degree of the scenario nodes as well as their position in the world is taken. The rewiring probability is fixed to 0.05 in this case, but it may need to be set around 0.10 because of the clustering tendency used.

This hypothesis doesn’t substantially change the trend of the infected people count, but it generates a wider plateau at the peak of the infection (4.4), caused by the medium-low mobility of the agents.

4.4 Trend of the infection in a scale-free network randomly rebuilt.

4.5 World rebuilt with the hypothesis that the nodes don’t have the same number of neighbors. Some of them are hubs, others are isolated.


Cured, Doctors and Travel Hypotheses

The outbreak still evolves too fast and too widely, so we need to introduce something that could delay and contain the infection. Three big hypotheses are made here:

1. A travel probability lower than 1, since a value of 1 is not realistic.

2. A SIR/SIS mixing mechanism in which a recovered person has a chance to be infected again as well as to be cured completely.

3. Doctors that simply seek out the infected people and cure them.

4.6 Travel probability = 0.1; Pure SIR, no doctors.

4.7 Travel probability = 0.4; Pure SIR, no doctors.

The travel probability plays a role in the diffusion and the control of the infected people. Figures 4.6 and 4.7 show clearly that with a low travel probability the agents change state more slowly (from susceptible to infected and from infected to recovered). Conversely, a medium travel probability is sufficient to concentrate a lot of people in one or two nodes, so that the agents are more likely to interact with each other.


The cured/recovered model and the doctors, who travel freely on the network and cure infected people, are now introduced. This produces infected curves much closer to the scenario’s: the outbreak is a bit slower, its peak always happens around the 20th tick and its end tends to be around the 500th tick. The end time isn’t something to rely on because of the randomness; what matters is the trend of the infection.

4.8 Travel probability = 0.1, cured probability = 0.05, number of doctors = 10

4.9 Travel probability = 0.1, cured probability = 0.05, number of doctors = 10

4.10 Travel probability = 0.1, cured probability = 0.05, number of doctors = 10, delay = 11


In the scenario the infection is regulated by non-linear effects. These are unknown to the model and impossible to reproduce if one doesn’t know the differential equation behind them. A delay was added to reproduce the trend of the infected people in the scenario (4.10).

Figures 4.11, 4.12 and 4.13 show how a model depends on the structure of the network: with a low rewiring probability some clusters can be excessively isolated, so that the infection does not spread fully and another small outbreak is generated after 100 ticks.

4.11 Tick = 31, cured probability = 0.02, number of doctors = 30, delay = 11

4.12 Complete outbreak: cured probability = 0.02, number of doctors = 30, delay = 11

4.13 beta = 0.70, gamma = 0.05, cured probability = 0.01, number of doctors = 30, delay = 9

Figures 4.14, 4.15 and 4.16 show the parameter-tweaking process. The rewiring probability went from 0.05 to 0.10, producing fewer isolated clusters.

4.14 Rewiring probability = 0.10, beta = 0.60, gamma = 0.10, number of doctors = 31, cured probability = 0.01, delay = 11

4.15 Rewiring probability = 0.10, number of doctors = 36, cured probability = 0.02

4.16 Rewiring probability = 0.10, number of doctors = 32, cured probability = 0.01


5. Conclusions

The scenario was built with a lot of details and mechanisms that the model didn’t know

about: the infected spawn in isolated nodes, the money-making mechanism, the costs of

the travel, the outbreak alert and the doctor-agent interaction.

In contrast, the model was based on a simple logistic-growth differential equation system (deterministic SIR model). It managed to reproduce the scenario trend of the infected people through the progressive addition of hypotheses. The network structure, changing from homogeneous to heterogeneous, and the low travel probability mainly set the peak position and width, whereas the SIR-SIS mixing and the introduction of doctors set its amplitude and the outbreak duration.

The outbreak parameters match: the scenario’s virus spread chance and recovery chance are 60% and 10% respectively, and the model’s parameters are beta = 0.60 and gamma = 0.10.

The network structure is similar to the scenario’s. Despite this, the money-making process and the travel cost slow down the movement of the agents in the scenario. For this reason the model’s travel probability is set to 0.1, whereas the scenario’s travel tendency was 0.4, or 1 with the panic due to the outbreak alert.

In the scenario the people didn’t change status from recovered to cured in a linear way, but setting a cured probability of 0.01 for the recovered seems to work well. Without the agent-doctor mechanism the number of doctors needed to achieve the scenario’s infected trend is a bit higher: 32 in the model against 30 in the scenario.

The non-linear infection growth of the scenario, the spawning of the infected people on isolated nodes and, especially, the fact that the model doesn’t have this information force a delay to be introduced (delay = 11).

Lastly, it is clear that the many features of the model based on randomness make it less reliable. This problem could be addressed by repeating the simulations many times, which would provide an error on the simulation’s results and therefore validate them, or by introducing more complex mathematical structures.
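Repeating the simulation and attaching an error to the result could look like the Python sketch below. It is a minimal illustration, assuming that every run records the infected count at the same ticks; the function name `ensemble_band` is hypothetical, and the error is taken to be the sample standard deviation across runs.

```python
import statistics


def ensemble_band(runs):
    """Per-tick mean and sample standard deviation across repeated runs.

    runs  list of infected-count series, all of the same length
    """
    means, spreads = [], []
    for values in zip(*runs):        # all runs at the same tick
        means.append(statistics.mean(values))
        # The standard deviation needs at least two runs.
        spreads.append(statistics.stdev(values) if len(values) > 1 else 0.0)
    return means, spreads
```

Plotting the mean curve with a band of one standard deviation around it would show at which ticks the model prediction is stable and where randomness dominates, for example near the end of the outbreak.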


Bibliography

[1] The RAPIDD ebola forecasting challenge: Synthesis and lessons learnt. https://www.sciencedirect.com/science/article/pii/S1755436517301275

[2] Emergence of scaling in random networks. AL Barabási, R Albert - Science, 1999. http://science.sciencemag.org/content/286/5439/509

[3] NetLogo library: Virus on a Network. http://ccl.northwestern.edu/netlogo/models/VirusonaNetwork

[4] NetLogo library: Preferential Attachment. http://ccl.northwestern.edu/netlogo/models/PreferentialAttachment
