[ieee 2012 ieee 51st annual conference on decision and control (cdc) - maui, hi, usa...



Page 1: [IEEE 2012 IEEE 51st Annual Conference on Decision and Control (CDC) - Maui, HI, USA (2012.12.10-2012.12.13)] 2012 IEEE 51st IEEE Conference on Decision and Control (CDC) - A Data-driven

A Data-driven Inference Algorithm for Epidemic Pathways Using Surveillance Reports in 2009 Outbreak of Influenza A (H1N1)

Xun Li¹, Xiang Li¹ and Yu-Ying Jin²

Abstract: In this paper, we propose an epidemiological infective-hospitalized (IH) model and adopt a heuristic algorithm to predict the transition of infective individuals, which optimizes, at the metapopulation level, the IH model’s approximation to the surveillance reports of (cumulative) laboratory confirmed cases. Applying it to the data of the 2009 outbreak of a new strain of influenza A (H1N1) in the United States, we obtain the invasion tree along which the virus spreads from the source state reporting the first confirmed case to infect other states. Overall, the surveillance-data-based inference of the invasion tree agrees with real epidemic pathways observed in outbreaks of influenza A (H1N1), which verifies the validity of our heuristic inference algorithm.

I. INTRODUCTION

Complex networks have been extensively utilized to model the world composed of entities and relations, which vary from biological and social to engineering and industrial systems [1], [2], [3]. Modern transportation and mobility infrastructure (e.g., air traffic and commuting networks), which equips a huge number of travelers with wide interconnectivity and far reachability across geographical regions all over the world, is significantly reshaping our daily life [4], [5]. However, along with it come increasingly serious risks of large-scale outbreaks of communicable diseases in modern human societies. The worldwide prevalence of pandemic disease outbreaks such as severe acute respiratory syndrome (SARS) and influenza A (H1N1) indicates that patterns of human mobility dramatically alter the spreading behavior of viral infections, and even dominate the spatiotemporal dynamics of epidemics in human societies [6], [7], [8], [9], [10], [11], [12], [13]. Over the last decade, the study of infectious diseases in metapopulation models has attracted growing attention. In these models the entire population is divided into interconnected geographic regions, with migration of individuals allowed between different subpopulations, and individual transition has been shown to play an important role in understanding the emergence of disease outbreaks [14], [15], [16], [17], [18], [19].

Yet so far, the inverse problem of how to infer epidemic pathways (i.e., the most likely chains or channels for the infection transmission due to the individual transition among different subpopulations) from asymptotic behaviors of epidemic dynamics has not received adequate attention [20], [21], [22], [23]. One main difficulty of the inverse problem lies in the stochasticity of both viral transmission and individual transition, i.e., the epidemic pathways derived from simulations of spreading processes vary from one realization to another. V. Colizza et al. [9] particularly addressed the predictability of epidemic pathways, stressing that a stable prediction should reflect the inter-similarity between spatiotemporal courses in stochastic realizations of epidemic spreading processes, which characterize such emerging pandemic outbreaks for the purpose of efficient outbreak control. For example, travel bans as an outbreak control measure may work better if imposed on predicted epidemic pathways. In the recent 2009 H1N1 outbreaks, evidence has shown that travel restrictions to and from Mexico taken by several adjacent countries decelerated the epidemic outbreak efficiently [7], [13].

*This work was partially supported by the 973 program (No. 2010CB731403), the NCET program (No. NCET-09-0317), the NSFC program (Nos. 61273223, 71173142), and the key program of Social Science Foundation (No. 12AZD051) of China.

¹ Xun Li and Xiang Li are with Adaptive Networks and Control Lab., the Department of Electronic Engineering, Fudan University, Handan Road 220, Shanghai 200433, China {10110720039,lix} at fudan.edu.cn

² Yu-Ying Jin is with the School of International Business Administration, Shanghai University of Finance and Economics, Guoding Road 777, Shanghai 200433, China jyyshang at mail.shufe.edu.cn

In this paper, we propose a novel inference algorithm to identify epidemic pathways based on surveillance reports of cumulative laboratory confirmed cases at the metapopulation level. We adopt a heuristic rule to generate the optimal inference for epidemic pathways, achieving a tradeoff between minimizing the flow of individual transition between subpopulations and minimizing the error of the model’s fit to real surveillance data.

The remainder of this paper is organized as follows. In Sec. II, we first give a schematic description of our method for predicting epidemic pathways. Sec. II-A introduces an infective-hospitalized (IH) model and gives, as a case study, the values of the model’s parameters for the 2009 outbreak of influenza A (H1N1) in the United States. Based on the IH epidemiological model, Sec. II-B proposes our heuristic inference algorithm for the calculation of individual transition matrices, and Sec. II-C gives the construction algorithm for invasion trees of the infection transmission and flow networks of the individual transition. In Sec. II-C we further analyze topological features of the epidemic pathways of the influenza A (H1N1) outbreak, in which super-spreader or hub nodes in invasion trees or flow networks are highlighted by the results of our inference method. Finally, we conclude our work in Sec. III.

51st IEEE Conference on Decision and Control, December 10-13, 2012. Maui, Hawaii, USA
978-1-4673-2064-1/12/$31.00 ©2012 IEEE
978-1-4673-2066-5/12/$31.00 ©2012 IEEE

II. THE CASE STUDY OF 2009 OUTBREAK OF INFLUENZA A (H1N1) IN THE UNITED STATES

Our method to predict epidemic pathways (i.e., the individual transition between interconnected subpopulations) consists of three steps as follows:

(a) Estimation of basic epidemiological parameters at the whole population level.

(b) Inference for temporal courses of epidemic dynamics in each subpopulation.

(c) Construction of epidemic pathways at the metapopulation level.

Next we will describe each step of the inference method in detail, taking the 2009 outbreak of influenza A (H1N1) in the United States of America as a case study.

[Figure 1: h(t) on a logarithmic axis (10^1 to 10^4) versus date, April 23 to June 19.]

Fig. 1. Time evolution of the total number of cumulative laboratory confirmed cases of influenza A (H1N1) infection in the United States in 2009, reported by the Centers for Disease Control and Prevention (CDC). The dashed line corresponds to the IH model’s fit of surveillance data with the parameters given in Tab. I.

A. Basic epidemiological model

Figure 1 plots the surveillance data of the 2009 outbreak of influenza A (H1N1) in the United States¹, which typically shows a piecewise exponential growth in the number of laboratory confirmed cases of the disease [24]. Thus we adopt the basic epidemiological model in an extraordinarily simple form to describe the growth of the disease outbreak at the collective level as follows:

$$\frac{di}{dt} = \alpha_0 i(t), \tag{1}$$

where $i(t) = i(0)e^{\alpha_0 t}$ is the number of infective individuals at time $t$, and $\alpha_0$ is the Malthusian parameter that governs the exponential growth rate of the infectious disease [25]. In traditional epidemic dynamics, the Malthusian rate can be simply approximated as $\alpha_0 = \tau^{-1}(R_0 - 1)$, where $\tau$ is the average infective period and $R_0$ is the basic reproductive number [26].

To model the surveillance data of cumulative laboratory confirmed cases of patients, we compartmentalize individuals into two groups, infective (I) and hospitalized (H), and consider the following infective-hospitalized (IH) dynamics:

$$\frac{di}{dt} = \alpha_0 i(t) - \beta_0 i(t), \qquad \frac{dh}{dt} = \beta_0 i(t), \tag{2}$$

where $\beta_0$ is the hospitalizing rate of infectives, i.e., the probability of an infective individual being hospitalized (who is then counted among the confirmed cases and thus removed from the I-compartment due to quarantined hospitalization) during every infinitesimally small time interval $d\tau$ is $\beta_0 d\tau$.

¹ The surveillance report data of cumulative laboratory confirmed cases of influenza A (H1N1) in the United States in 2009 are available at http://www.cdc.gov/h1n1flu/updates/.

TABLE I
PARAMETERS OF IH MODEL USED IN THE PAPER

$k$   Date Interval         $\alpha_k$   $\beta_k$   $\alpha_k/\beta_k$
0     April 23 ∼ April 29   1.068        0.644       1.658
1     April 29 ∼ May 9      0.771        0.452       1.706
2     May 9 ∼ May 15        0.364        0.243       1.498
3     May 15 ∼ June 19      0.120        0.076       1.579

More precisely, we fit the real surveillance data using piecewise exponential growth functions with different parameters² $\alpha_k$ and $\beta_k$ ($k = 0, 1, 2, 3$) during different time intervals (given in Tab. I). Note that there is a decay in the exponential growth rate of the H-individuals: from a high rate at the early stage of the pandemic outbreak to a stable exponential growth with a relatively smaller rate. Under the mean-field approximation we assume that every state, at the metapopulation level, has the same epidemiological parameters $\alpha_k$ and $\beta_k$, obtained by the best fit of our IH model to the surveillance reports of cumulative laboratory confirmed cases within the entire population (summing the numbers of hospitalized individuals in all subpopulations), as shown in Fig. 1. Next, we will propose a heuristic inference algorithm for predicting the metapopulation-level transition of individuals.
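As an illustration of the piecewise IH dynamics, the following minimal sketch integrates Eq. (2) interval by interval using its closed-form solution and the rates of Tab. I. The function names, the daily sampling, the interval lengths in whole days (counted from the dates in Tab. I) and the initial values passed in are assumptions of this sketch rather than details given in the paper.

```python
import math

# (interval length in days, alpha_k, beta_k) taken from Tab. I.
# Interval lengths assume the boundaries fall on whole days.
TABLE_I = [
    (6, 1.068, 0.644),   # April 23 to April 29
    (10, 0.771, 0.452),  # April 29 to May 9
    (6, 0.364, 0.243),   # May 9 to May 15
    (35, 0.120, 0.076),  # May 15 to June 19
]


def ih_step(i0, h0, alpha, beta, dt):
    """Advance Eq. (2) by dt with constant rates, using the closed form."""
    rate = alpha - beta
    growth = math.exp(rate * dt)
    i1 = i0 * growth
    # Integral of beta * i(t) over the step, added to the cumulative count.
    h1 = h0 + beta * i0 * (growth - 1.0) / rate
    return i1, h1


def ih_trajectory(i0, h0, schedule=TABLE_I):
    """Return daily samples (day, i, h) across all intervals of the schedule."""
    samples = [(0, i0, h0)]
    day, i, h = 0, i0, h0
    for length, alpha, beta in schedule:
        for _ in range(length):
            i, h = ih_step(i, h, alpha, beta, 1.0)
            day += 1
            samples.append((day, i, h))
    return samples
```

Calling `ih_trajectory(i0, h0)` with hypothetical initial values gives one sample per day from April 23 to June 19; the closed form assumes $\alpha_k \neq \beta_k$, which holds for every row of Tab. I.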

B. Heuristic inference algorithm

Given the basic epidemiological parameters and the initial conditions (corresponding to real initial situations of epidemic outbreaks) for each subpopulation, the epidemic dynamics determines an analytically predicted trajectory, i.e., the temporal course of the number of hospitalized individuals in this subpopulation. However, these theoretical trajectories usually deviate far from the real ones (the recorded data of surveillance reports). This bias can be offset by, e.g., the individual transition at the metapopulation level and/or fluctuation of real epidemiological parameters³. Here we propose a heuristic inference algorithm to predict a trajectory for each subpopulation, which, through a tunable parameter, reaches a balance between the basic epidemiological model’s prediction and the real trajectory of surveillance report data.

² The reproductive number in our IH model is defined as the ratio between the infection rate and the hospitalization rate, i.e., $R_k = \alpha_k/\beta_k$ for each date interval $[t_k, t_{k+1})$. We find that our obtained reproductive numbers $R_k$ (Tab. I) are basically consistent with the range of 1.4 to 1.6 reported by Ref. [27].

³ The trajectory predicted by the basic epidemic model only considers an isolated subpopulation without interconnectivity, and thus from this bias one can infer the individual transition between different subpopulations, i.e., epidemic pathways. Here we neglect the fluctuation of real epidemiological parameters under the mean-field approximation for simplicity.

[Figure 2: h(t) (axis from 0 to 4000) for each of the 51 states, listed alphabetically from Alabama to Wyoming.]

Fig. 2. IH model’s fit of the number of H-individuals (red circles), $h_s(t_N)$ at the metapopulation level, to the surveillance report (blue dotted line) on June 19 at the late stage of the pandemic outbreak of influenza A (H1N1) in the United States. All the 51 states are sorted in alphabetical order. The damping parameter $d = 0.8$, and the step number $D = 5$ for our heuristic algorithm.

With the above pre-determined basic epidemiological parameters, we consider the metapopulation epidemic IH model plus the diffusion term corresponding to the individual transition between different subpopulations as follows:

$$\frac{di_s}{dt} = \alpha_k i_s(t) - \beta_k i_s(t) + \sum_{s'=1}^{S} \Gamma_{s's}(t), \qquad \frac{dh_s}{dt} = \beta_k i_s(t), \qquad t_k \le t < t_{k+1}, \tag{3}$$

where $S$ is the number of subpopulations, and $i_s(t)$ and $h_s(t)$ are the numbers of I- and H-individuals in state $s$ at time $t$, respectively. $[t_k, t_{k+1})$ is the $k$-th date interval, with its own Malthusian growth rate $\alpha_k$ and hospitalizing rate $\beta_k$, for $k = 0, 1, ..., N-1$. $\Gamma_{ss'}(t)$ is the individual transition matrix in the form of

$$\Gamma_{ss'}(t) = \sum_{k=0}^{N-1} \sum_{j=0}^{N_k-1} \delta(t - t_{k,j})\, m^{k,j}_{ss'}, \tag{4}$$

where $\delta(\cdot)$ is the Dirac delta function, and hence Eq. (4) implies that at each time $t_{k,j}$ there are $m^{k,j}_{ss'}$ (if $m^{k,j}_{ss'} > 0$) infective individuals transferring from state $s$ to state $s'$. Otherwise, a negative $m^{k,j}_{ss'} < 0$ stands for the individual transition in the opposite direction (from state $s'$ to state $s$). We denote by

$$t_k = t_{k,0} < \dots < t_{k,j} < \dots < t_{k,N_k-1} < t_{k+1} \tag{5}$$

the sequence of $N_k$ dates at which the individual transition occurs during the $k$-th date interval $[t_k, t_{k+1})$. Here we combine the continuous-time IH epidemic dynamics with the metapopulation-level transition at discrete dates. Next we aim to infer the individual transition matrix $\Gamma_{s's}(t)$ that optimizes the model’s approximation, the number of H-individuals $h_s(t)$ of state $s$, to the corresponding real surveillance data.

[Figure 3: h(t) (axis from 0 to 800) versus date, April 23 to June 19, with curves for D = 1, D = 5, D = 10, D = 15 and the real data.]

Fig. 3. Time evolution of the H-individual number in Arizona under different parameters $D$. In the case of $D = 1$, the heuristic optimization strategy Eq. (6) reduces to considering only one step into the future, yielding

$$i^{opt}_s(t_{k,j}) = \max\left\{0,\ \beta_k^{-1}\left(\hat{h}_s(t_{k,j+1}) - h_s(t^-_{k,j})\right) \frac{\alpha_k - \beta_k}{e^{(\alpha_k-\beta_k)(t_{k,j+1}-t_{k,j})} - 1}\right\}.$$

Consider the following heuristic strategy to minimize the error between the model’s $h_s(t)$ and the surveillance data $\hat{h}_s(t)$: at each discrete time $t_{k,j}$, the optimal number $i^{opt}_s(t_{k,j})$ of I-individuals in state $s$ is given by

$$i^{opt}_s(t_{k,j}) = \arg\min_{i_0 > 0} \sum_{\substack{1 \le \delta \le D \\ t_{k,j+\delta} \le t_N}} d^{\delta} \left(\hat{h}_s(t_{k,j+\delta}) - \tilde{h}_s(t_{k,j+\delta})\right)^2, \tag{6}$$

where $d$ is the damping factor (we adopt $d = 0.8$ throughout the paper), $\hat{h}_s(t)$ is the real data of surveillance reports, and $\tilde{h}_s(t)$ is the H-variable of an auxiliary dynamics for every state $s$ (corresponding to the dynamics of Eqs. (3) with no individual diffusion):

$$\frac{d\tilde{i}_s}{dt} = \alpha(t)\tilde{i}_s(t) - \beta(t)\tilde{i}_s(t), \qquad \frac{d\tilde{h}_s}{dt} = \beta(t)\tilde{i}_s(t), \tag{7}$$

where $\alpha(t) = \alpha_k$, $\beta(t) = \beta_k$ for $t \in [t_k, t_{k+1})$. The initial condition is $\tilde{i}_s(t_{k,j}) = i_0$, $\tilde{h}_s(t_{k,j}) = h_s(t^-_{k,j})$, where $h_s(t^-_{k,j})$ denotes the left limit of $h_s(t)$ at time $t_{k,j}$. For notational simplicity, we provisionally assume that $t_{k,N_k} = t_{k+1,0}$, $t_{k,N_k+1} = t_{k+1,1}$, ..., and so on. Thus according to Eq. (6), the optimal selection of $i^{opt}_s(t_{k,j})$ is the initial number of infective individuals that generates the best fit to $\hat{h}_s(t)$ in the next $D$ steps of the date sequence for state $s$, if isolated from other subpopulations. Therefore we obtain

$$i_s(t_{k,j}) = \frac{i^{opt}_s(t_{k,j})}{\sum_{s'=1}^{S} i^{opt}_{s'}(t_{k,j})} \sum_{s'=1}^{S} i_{s'}(t^-_{k,j}). \tag{8}$$

Here, the continuity condition of the total number of I-individuals serves as the normalization.
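One way to read the heuristic step is as a damped least-squares fit followed by a rescaling. The sketch below rests on two assumptions that the paper does not spell out: the rates stay constant over the look-ahead window (so the auxiliary H-variable is linear in $i_0$ and the optimum has a closed form), and the optimum is clamped at zero, as in the $D = 1$ expression of Fig. 3. The function and argument names are hypothetical.

```python
import math


def optimal_infectives(h_left, observed, times, alpha, beta, d=0.8):
    """Damped least-squares choice of i0 for one state, in the spirit of Eq. (6).

    h_left   : model value of h_s just before the current date
    observed : surveillance values at the next D dates
    times    : elapsed time from the current date to each of those dates
    Assumes alpha and beta are constant over the whole look-ahead window.
    """
    rate = alpha - beta
    num = den = 0.0
    for step, (obs, dt) in enumerate(zip(observed, times), start=1):
        # With no diffusion, h(t) = h_left + i0 * gain for constant rates.
        gain = beta * (math.exp(rate * dt) - 1.0) / rate
        weight = d ** step
        num += weight * gain * (obs - h_left)
        den += weight * gain * gain
    # The objective is quadratic in i0, so clamping the minimizer is exact.
    return max(0.0, num / den)


def redistribute(i_opt, i_left):
    """Eq. (8): rescale i_opt so the total number of infectives is unchanged."""
    total_opt = sum(i_opt)   # assumed to be positive
    total_left = sum(i_left)
    return [total_left * x / total_opt for x in i_opt]
```

With a single look-ahead date the first function reduces to the $D = 1$ expression in the caption of Fig. 3; longer windows trade tracking accuracy for smoothness through the damping factor `d`.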

Figure 2 plots the prediction of our IH model, $h_s(t_N)$, the numbers of hospitalized individuals in all subpopulations on the last day, which shows a good agreement with the real surveillance report at the metapopulation level. We also observe the time course of $h_s(t)$ for Arizona, which reflects the role of the step number $D$ for optimization in our heuristic inference algorithm. A large $D$ implies smooth trajectories with less fluctuation in the growth rates of $h_s(t_{k,j})$, and thus only a small number of I-individuals immigrate or emigrate between subpopulations; whereas a small $D$ implies a myopic redistribution of $h_s(t_{k,j})$ which generates precise trajectories that track the surveillance reports, as shown in Fig. 3. Thus there is a tradeoff between the model’s approximation to the shape of the surveillance reports $\hat{h}_s(t_{k,j})$ and the smoothness of the model’s prediction $h_s(t_{k,j})$. Note that $\hat{h}_s(t_{k,j})$, noised by the presence of case reports of infections that were missed, wrong, or delayed [28], usually works to the disadvantage of the model’s inference of the real infectives’ growth and transition, and therefore $h_s(t_{k,j})$ can be properly smoothed by choosing an intermediate parameter $D$ in our model (we shall adopt $D = 5$ in numerical simulations, reaching a good tradeoff between the precision and the smoothness of the model’s fit to the surveillance data, as shown in Fig. 3).

C. Construction of transition matrices

From our heuristic inference algorithm Eq. (8), we have obtained the redistributed I-individual numbers $i_s(t_{k,j})$ in all subpopulations. In this section we consider the construction algorithm of the individual transition matrix $\Gamma_{ss'}(t)$ satisfying

$$i_s(t_{k,j}) = i_s(t^-_{k,j}) + \sum_{s'=1}^{S} m^{k,j}_{s's}, \tag{9}$$

where $m^{k,j}_{ss'}$ is the same as that of Eq. (4).

Individual transition intuitively redistributes (infective) individuals among different subpopulations and, as a result, generates the fluctuation of growth rates around their mean value under the mean-field approximation. Our basic idea of epidemic pathway construction is derived from this intuition. According to the bias between the analytically predicted trajectories and the results obtained in the previous section, we can use agent-based simulations to determine how infective individuals transfer between different subpopulations at each time step, and the most likely epidemic pathways can then be inferred from the statistics of these extensive numerical results.

Using agent-based simulations, we consider the construction algorithm of the individual transition matrix $m^{k,j}_{ss'}$ at each time $t_{k,j}$ of the sequential dates as follows:

1) Initially set $m^{k,j}_{s's} = 0$ for all states $s$ and $s'$, and introduce a group of auxiliary variables, the I-differences $\Delta_s = i_s(t_{k,j}) - i_s(t^-_{k,j})$, for all states $s$;

2) Randomly select states $s$ and $s'$ from all the possible pairs of states whose differences $\Delta_s$ and $\Delta_{s'}$ have opposite signs (without loss of generality, we assume that $\Delta_s < 0$ and $\Delta_{s'} > 0$), with probability proportional to $\sqrt{|\Delta_s \Delta_{s'}|}$;

3) Let an infective individual transfer from state $s$ to state $s'$, and accordingly, $m^{k,j}_{ss'} \to m^{k,j}_{ss'} + 1$, $m^{k,j}_{s's} \to m^{k,j}_{s's} - 1$, and $\Delta_s \to \Delta_s + 1$, $\Delta_{s'} \to \Delta_{s'} - 1$;

4) Repeat Steps 2 and 3 until all the I-differences of states reach $\Delta_s = 0$.
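The four steps above can be sketched as follows. This is only an illustrative reading: it assumes the I-differences have already been rounded to integers that sum to zero (the paper does not say how non-integer values are handled), and the function name and the list-of-lists matrix layout are choices made for this example.

```python
import math
import random


def build_transition_matrix(delta, rng=random):
    """Return m, where m[s][t] is the net number of infectives moved from s to t.

    delta[s] is the I-difference of state s at one date; the values are
    assumed to be integers that sum to zero.
    """
    n = len(delta)
    delta = list(delta)  # work on a copy
    m = [[0] * n for _ in range(n)]  # Step 1
    while any(delta):  # Step 4: stop when every difference is zero
        donors = [s for s in range(n) if delta[s] < 0]
        takers = [s for s in range(n) if delta[s] > 0]
        pairs = [(s, t) for s in donors for t in takers]
        # Step 2: pick a pair with probability proportional to sqrt(|Ds * Dt|).
        weights = [math.sqrt(abs(delta[s] * delta[t])) for s, t in pairs]
        s, t = rng.choices(pairs, weights=weights, k=1)[0]
        # Step 3: one infective individual moves from s to t.
        m[s][t] += 1
        m[t][s] -= 1
        delta[s] += 1
        delta[t] -= 1
    return m
```

Passing a seeded `random.Random` instance as `rng` makes a realization reproducible; summing column `s` of the returned matrix recovers the I-difference of state `s`, which is the constraint of Eq. (9).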

From the construction algorithm of transition matrices, we can obtain the invasion tree that shows the epidemic pathways along which the infection transmits from one subpopulation to another during the disease outbreaks. In one realization of simulations of epidemic processes, for every state $s$ (except California and Texas, which initially report confirmed cases of influenza A (H1N1) infection), we record from which state the seeded case of infection (i.e., the first infective individual that appeared in state $s$) comes.

TABLE II
SUPER-SPREADERS IN THE 2009 OUTBREAK OF INFLUENZA A (H1N1) IN THE UNITED STATES OF AMERICA*

Rank   State            $p^{hub}_s$
1      New York         97.55%
2      Texas            91.09%
3      California       89.64%
4      Illinois         57.82%
5      Arizona          23.89%
6      Delaware         22.20%
7      Massachusetts    18.22%
8      South Carolina   10.85%
9      Colorado         3.51%
10     Kansas           2.61%
11     New Jersey       1.75%
12     Michigan         0.80%
13     Indiana          0.58%
14     Wisconsin        0.52%
15     Utah             0.50%

(*Each $p^{hub}_s$ is obtained by averaging over $10^5$ realizations of epidemic processes. Those states with $p^{hub}_s < 0.5\%$ are not listed above.)

We find that there exists a set of hub nodes in the invasion trees (which act as the so-called “super-spreaders”), the I-travelers from which are capable of infecting a large number of other states. We have carried out $10^5$ independent realizations of simulations, and in every generated invasion tree a node is deemed a hub if it has out-degree larger than 3. Thus we obtain the probability $p^{hub}_s$ of each state $s$ being a super-spreader in the epidemic process, given by the fraction of realizations in which state $s$ acts as a hub node. Table II lists the top 15 states with the largest $p^{hub}_s$ in the influenza A (H1N1) infection. Note that the role of hub nodes is usually played by those states which have undergone an early outbreak of the influenza A (H1N1) infection (e.g., Texas, California and New York), or a rapid growth of the disease outbreak (for example, Illinois reported 8 cases of H1N1 infection on May 4th, and 82 cases on May 5th with 75 newly added H-individuals within a day; similar situations also occurred in Arizona and New York), which is also consistent with intuition.
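The hub statistic can be computed directly from the recorded invasion trees. The sketch below assumes each realization is stored as a mapping from an invaded state to the state that seeded it; that representation and the function name are hypothetical, while the threshold follows the out-degree rule stated above.

```python
from collections import Counter


def hub_probabilities(trees, min_out_degree=4):
    """Fraction of realizations in which each state acts as a hub.

    trees: iterable of dicts mapping each invaded state to its seeding state.
    A state is a hub in a tree when it seeded at least min_out_degree states,
    i.e. its out-degree is larger than 3.
    """
    hub_counts = Counter()
    n_trees = 0
    for parent_of in trees:
        n_trees += 1
        out_degree = Counter(parent_of.values())
        for state, degree in out_degree.items():
            if degree >= min_out_degree:
                hub_counts[state] += 1
    return {state: count / n_trees for state, count in hub_counts.items()}
```

States that never reach the threshold are simply absent from the returned dictionary, which mirrors how low-probability states are left out of Table II.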

Although our inference on the set of hub nodes (super-spreaders) of invasion trees provides information about the skeleton of epidemic pathways of the infection transmission, leaf nodes connected to these hubs may vary from one realization to another (due to the stochasticity of our construction rule of the individual transition matrices). Thus we consider the average result of different invasion trees generated by independent realizations as follows: construct the weighted epidemic invasion tree with link weight $w_{ij}$ defined as the probability that the first infective case reported by state $j$ comes from state $i$, which can be numerically calculated as the ratio between the number of realizations in which state $j$ is infected by a seeded case from state $i$ and the total number of realizations. Then we construct the maximum spanning tree from the weighted epidemic invasion tree, where the sum of the link weights $w_{ij}$ contained in the spanning tree is maximized. Averaging over $10^5$ realizations of epidemic processes, we obtain the most likely invasion tree, as shown in Fig. 4.

[Figure 4: invasion tree whose nodes are the 51 states, Alabama through Wyoming.]

Fig. 4. The most likely invasion tree of influenza A (H1N1) infection in the United States. The spanning tree is constructed from the weighted epidemic invasion tree summed over $10^5$ independent realizations. Parameters: damping parameter $d = 0.8$ and step number $D = 5$. The structure of the invasion tree is visualized by Pajek software [29].
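A rough sketch of this averaging step is given below. The link weights follow the definition of $w_{ij}$ above; for the spanning tree, the sketch uses Kruskal's algorithm and ignores link direction when checking for cycles, which is an assumption of this example since the paper does not state which spanning-tree procedure was used. The tree representation (invaded state mapped to seeding state) and the function names are likewise hypothetical.

```python
from collections import Counter


def link_weights(trees):
    """w[(i, j)]: fraction of realizations in which state j was seeded from i."""
    counts = Counter()
    n_trees = 0
    for parent_of in trees:
        n_trees += 1
        for child, parent in parent_of.items():
            counts[(parent, child)] += 1
    return {link: c / n_trees for link, c in counts.items()}


def maximum_spanning_tree(weights):
    """Kruskal's algorithm over links sorted by decreasing weight.

    Direction is ignored when testing whether a link would close a cycle.
    """
    root = {}

    def find(x):
        root.setdefault(x, x)
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    tree = []
    for (i, j), w in sorted(weights.items(), key=lambda kv: -kv[1]):
        ri, rj = find(i), find(j)
        if ri != rj:
            root[ri] = rj
            tree.append((i, j, w))
    return tree
```

If the direction of invasion must be respected strictly, a maximum spanning arborescence rooted at the source states would be the natural alternative to this undirected variant.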

We further consider the flow network for individual transi-

tion where each link weight fij denotes the total number of

individuals having transferred from state i to state j during

the history of epidemic outbreak. We find that the flow

network obtained by averaging over different 105 realizations

is a complete graph due to the nature of stochastic processes

of individual transition as well as our construction algorithm.

Therefore we further define T%-flow network as a subgraph

of the flow network where a minimal number of links with

the largest weight fij remains, and the sum of these reserved

link weights is over T% of the total amount of the entire link

weights of the flow network. Figure 5 shows different T%-

flow networks with different level T%, reflecting the hetero-

geneity of individual transition (only about 8.0%(2.5%) of

all the possible links carries 70%(50%) of transition flow).

Note that Wisconsin, which has the largest outbreak size at the late stage of the influenza A (H1N1) infection, acts as a hub node in the flow network. Counterintuitively, it has a very low super-spreader probability in the invasion tree, $p_{\mathrm{hubs}} = 0.52\%$ (see Tab. II), because the outbreak in Wisconsin occurred late relative to the other hub nodes of the epidemic invasion tree. Also, some nodes with a relatively large fluctuation in the growth rate of the disease (see the surveillance reports of Arizona in Fig. 3) are likely to be hubs in the flow networks, because our model assumes that this fluctuation is caused by the redistribution of I-individuals due to individual transition.

Fig. 5. T%-flow networks of influenza A (H1N1) infection in the United States for (a) T% = 70%, (b) T% = 50%, and (c) T% = 30%. The 70%-flow network consists of 36 nodes and 202 (directed) links, the 50%-flow network of 16 nodes and 65 links, and the 30%-flow network of 12 nodes and 23 links. The T%-flow networks are constructed from the weighted epidemic invasion tree summed over $10^5$ independent realizations. The networks are visualized with the Pajek software.

[Figure 5: three network diagrams. Nodes in panel (a): Alabama, Arizona, California, Colorado, Connecticut, Delaware, Florida, Hawaii, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maine, Massachusetts, Michigan, Minnesota, Mississippi, Nevada, New Hampshire, New Jersey, New Mexico, New York, North Carolina, Oregon, Pennsylvania, Rhode Island, Tennessee, Texas, Utah, Vermont, Washington D.C., West Virginia, Wisconsin, Wyoming. Nodes in panel (b): Arizona, California, Connecticut, Florida, Illinois, Maine, Massachusetts, Michigan, Minnesota, New Jersey, New York, Pennsylvania, Texas, Utah, Washington D.C., Wisconsin. Nodes in panel (c): Arizona, California, Connecticut, Illinois, Massachusetts, Minnesota, New Jersey, Pennsylvania, Texas, Utah, Washington D.C., Wisconsin.]
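The T%-flow network defined above can be sketched as a greedy selection: links are sorted by decreasing weight and kept until their cumulative weight exceeds T% of the total. This is a minimal sketch under that reading of the definition; the function name and the toy flows are hypothetical and are not taken from the data.

```python
from typing import Dict, Tuple

Edge = Tuple[str, str]


def t_percent_flow_network(flows: Dict[Edge, float], t_percent: float) -> Dict[Edge, float]:
    """Keep the fewest heaviest links whose summed weight exceeds T% of the total.

    `flows` maps a directed link (i, j) to its flow f_ij. If no subset can
    exceed the threshold (e.g., T% = 100), every link is kept.
    """
    threshold = sum(flows.values()) * t_percent / 100.0
    kept: Dict[Edge, float] = {}
    running = 0.0
    # Taking the heaviest links first minimizes the number of links needed.
    for edge, f in sorted(flows.items(), key=lambda kv: kv[1], reverse=True):
        kept[edge] = f
        running += f
        if running > threshold:
            break
    return kept


# Toy flows (made up for illustration); the total flow is 100.
toy_flows = {("A", "B"): 50, ("B", "C"): 30, ("C", "A"): 15, ("A", "C"): 5}
net_70 = t_percent_flow_network(toy_flows, 70)  # keeps A->B and B->C
net_30 = t_percent_flow_network(toy_flows, 30)  # keeps A->B only
```

The fraction of retained links, `len(kept) / len(flows)`, is the quantity behind statements such as "about 8.0% of all possible links carry 70% of the transition flow".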

III. CONCLUSIONS

In summary, we have considered the inference of individual diffusion in epidemic spreading processes on metapopulation networks. Driven solely by surveillance reports of cumulative laboratory-confirmed cases, i.e., the real trajectory (temporal course) of the epidemic dynamics, we have inferred the number of infectives participating in individual transition for each subpopulation at sequential times during the history of the outbreak. This differs from previous metapopulation epidemic models, which assume the transition probabilities of individuals between subpopulations [23]. Because differences in human mobility patterns, or in awareness of the outbreak at its early, middle, or late stages, affect the transition probabilities of individuals, our IH model relaxes the assumption of pre-determined and/or time-invariant transition probabilities.

We have also proposed an algorithm for constructing the epidemic pathways and invasion trees of disease outbreaks, which is intuitively based on the model’s approximation to surveillance data. We find that a key model parameter, the step number D, strongly affects the network structure of the invasion trees. A proper choice of D gives a relatively precise and smooth fit of our IH model, and hence generates an invasion tree that is closer to the real epidemic process. Applying our algorithm to the 2009 influenza A (H1N1) outbreak in the United States, we have obtained the skeleton of epidemic pathways, which is composed of a set of super-spreaders (under D = 5: New York, Texas, California, Illinois, etc.).


It should be noted that our data-driven inference algorithm ignores some factors that may, to some extent, confine the channels of virus transmission (such as air traffic and commuting flows, communication networks, geographic distance, and demographic distribution), which have been shown to be closely related to the spreading of epidemics [30], [7], [9], [31], [32], [33], [34], [35], [36], [37]. These population-mobility-related factors can readily be incorporated into extended agent-based simulations to refine the inferred invasion trees and epidemic pathways.

Moreover, the case study of influenza A (H1N1) in the United States supports the applicability of our inference method for epidemic pathways to other epidemic outbreaks, provided that surveillance reports are available. When applying the method to surveillance data of different epidemics, one may adopt other basic epidemiological models and other plausible heuristic rules for inferring individual transition, which deserves further study in the near future.

ACKNOWLEDGMENT

The authors thank Lin Wang for helpful discussions and

suggestions.

REFERENCES

[1] M. Newman, “The structure and function of complex networks,” SIAM Rev., pp. 167–256, 2003.

[2] X. Wang and G. Chen, “Complex networks: small-world, scale-free and beyond,” Circuits and Systems Magazine, IEEE, vol. 3, no. 1, pp. 6–20, 2003.

[3] A. Barabasi, “The architecture of complexity,” Control Systems Magazine, IEEE, vol. 27, no. 4, pp. 33–42, 2007.

[4] M. Bell and Y. Iida, Transportation network analysis. J. Wiley, 1997.

[5] R. Guimera, S. Mossa, A. Turtschi, and L. Amaral, “The worldwide air transportation network: Anomalous centrality, community structure, and cities’ global roles,” Proc. Natl. Acad. Sci., vol. 102, no. 22, p. 7794, 2005.

[6] J. Lau, X. Yang, H. Tsui, E. Pang, and J. Kim, “SARS preventive and risk behaviours of Hong Kong air travellers,” Epidemiology and Infection, vol. 132, no. 04, pp. 727–736, 2004.

[7] V. Colizza, A. Barrat, M. Barthelemy, and A. Vespignani, “The role of the airline transportation network in the prediction and predictability of global epidemics,” Proc. Natl. Acad. Sci., vol. 103, no. 7, p. 2015, 2006.

[8] Z. Liu, K. He, L. Yang, C. Bian, and Z. Wang, “Characterizing transmission and control of the SARS epidemic: Novel stochastic spatio-temporal models,” in Engineering in Medicine and Biology Society, 2005. IEEE-EMBS 2005. 27th Annual International Conference of the, 2006, pp. 7463–7469.

[9] V. Colizza, A. Barrat, M. Barthelemy, and A. Vespignani, “Predictability and epidemic pathways in global outbreaks of infectious diseases: the SARS case study,” BMC Medicine, vol. 5, no. 1, p. 34, 2007.

[10] S. Peng, K. Yang, Q. Xu, J. Wang, J. Xiong, and L. Liu, “A simulation study of H1N1 space-time epidemic based on agent-based modeling,” in Geoinformatics, 2010 18th International Conference on, 2010, pp. 1–4.

[11] Y. Zhang, Z. Liu, Y. Zhang, H. Yang, Y. Bo, L. Fang, and X. Xiao, “Spatially explicit epidemiological simulation system of influenza A (H1N1) in China,” in Geoinformatics, 2010 18th International Conference on, 2010, pp. 1–6.

[12] S. Merler and M. Ajelli, “The role of population heterogeneity and human mobility in the spread of pandemic influenza,” Proc. Roy. Soc. B, vol. 277, no. 1681, p. 557, 2010.

[13] P. Bajardi, C. Poletto, J. Ramasco, M. Tizzoni, V. Colizza, and A. Vespignani, “Human mobility networks, travel restrictions, and the global spread of 2009 H1N1 pandemic,” PLoS ONE, vol. 6, no. 1, p. e16591, 2011.

[14] I. Hanski and O. Ovaskainen, “The metapopulation capacity of a fragmented landscape,” Nature, vol. 404, no. 6779, pp. 755–758, 2000.

[15] P. Van den Driessche and J. Watmough, “Reproduction numbers and sub-threshold endemic equilibria for compartmental models of disease transmission,” Math. Biosci., vol. 180, no. 1, pp. 29–48, 2002.

[16] V. Colizza, R. Pastor-Satorras, and A. Vespignani, “Reaction–diffusion processes and metapopulation models in heterogeneous networks,” Nature Phys., vol. 3, no. 4, pp. 276–282, 2007.

[17] V. Colizza and A. Vespignani, “Invasion threshold in heterogeneous metapopulation networks,” Phys. Rev. Lett., vol. 99, no. 14, p. 148701, Oct 2007.

[18] ——, “Epidemic modeling in metapopulation systems with heterogeneous coupling pattern: Theory and simulations,” J. Theor. Biol., vol. 251, no. 3, pp. 450–467, 2008.

[19] L. Cao, X. Li, B. Wang, and K. Aihara, “Rendezvous effects in the diffusion process on bipartite metapopulation networks,” Phys. Rev. E, vol. 84, no. 4, p. 041936, 2011.

[20] H. Chen and D. Zeng, “AI for global disease surveillance,” Intelligent Systems, IEEE, vol. 24, no. 6, pp. 66–82, 2009.

[21] R. Colbaugh and K. Glass, “Predictive analysis for social processes I: Multi-scale hybrid system modeling, and II: Predictability and warning analysis,” in Proc. 2009 IEEE Multi-Conference on Systems and Control, 2009.

[22] L. Yulian, “Investigation of prediction and establishment of SIR model for H1N1 epidemic disease,” in Bioinformatics and Biomedical Engineering (iCBBE), 2010 4th International Conference on, 2010, pp. 1–4.

[23] Y. Maeno, “Discovering network behind infectious disease outbreak,” Physica A, vol. 389, no. 21, pp. 4755–4768, 2010.

[24] S. de Picoli Junior, J. Teixeira, H. Ribeiro, L. Malacarne, R. dos Santos, and R. dos Santos Mendes, “Spreading patterns of the influenza A (H1N1) pandemic,” PLoS ONE, vol. 6, no. 3, p. e17823, 2011.

[25] R. Anderson and R. May, Infectious diseases of humans: dynamics and control. Oxford University Press, Oxford, 1991.

[26] L. White, J. Wallinga, L. Finelli, C. Reed, S. Riley, M. Lipsitch, and M. Pagano, “Estimation of the reproductive number and the serial interval in early phase of the 2009 influenza A/H1N1 pandemic in the USA,” Influenza and Other Respiratory Viruses, vol. 3, no. 6, pp. 267–276, 2009.

[27] C. Fraser, C. Donnelly, S. Cauchemez, W. Hanage, M. Van Kerkhove, T. Hollingsworth, J. Griffin, R. Baggaley, H. Jenkins, E. Lyons et al., “Pandemic potential of a strain of influenza A (H1N1): early findings,” Science, vol. 324, no. 5934, pp. 1557–1561, 2009.

[28] L. White and M. Pagano, “Reporting errors in infectious disease outbreaks, with an application to pandemic influenza A/H1N1,” Epidemiol. Perspect. Innov., vol. 7, p. 12, 2010.

[29] V. Batagelj and A. Mrvar, “Pajek-program for large network analysis,” Connections, vol. 21, no. 2, pp. 47–57, 1998.

[30] H. Zhuge and X. Shi, “Fighting epidemics in the information and knowledge age,” Computer, vol. 36, no. 10, pp. 114–116, 2003.

[31] S. Meloni, A. Arenas, and Y. Moreno, “Traffic-driven epidemic spreading in finite-size scale-free networks,” Proc. Natl. Acad. Sci., vol. 106, no. 40, pp. 16897–16902, 2009.

[32] S. Lee and N. Wong, “Reconstruction of epidemic curves for pandemic influenza A (H1N1) 2009 at city and sub-city levels,” Virology Journal, vol. 7, no. 321, 2010.

[33] Y. Luo, D. Zeng, Z. Cao, X. Zheng, Y. Wang, Q. Wang, and H. Zhao, “Using multi-source web data for epidemic surveillance: A case study of the 2009 influenza A (H1N1) pandemic in Beijing,” in Service Operations and Logistics and Informatics (SOLI), 2010 IEEE International Conference on, 2010, pp. 76–81.

[34] H. Achrekar, A. Gandhe, R. Lazarus, S. Yu, and B. Liu, “Predicting flu trends using Twitter data,” in Computer Communications Workshops (INFOCOM WKSHPS), 2011 IEEE Conference on, 2011, pp. 702–707.

[35] P. Loganathan, C. Ho, H. Lee, and S. Lakshminarayanan, “Towards forecasting flu dynamics using a regionalized state space model,” in Advanced Control of Industrial Processes (ADCONIP), 2011 International Symposium on, 2011, pp. 175–180.

[36] Y. Hu, J. Zhang, D. Huan, and Z. Di, “Toward a general understanding of the scaling laws in human and animal mobility,” Europhys. Lett., vol. 96, p. 38006, 2011.

[37] L. Wang, X. Li, Y. Zhang, Y. Zhang, and K. Zhang, “Evolution of scaling emergence in large-scale spatial epidemic spreading,” PLoS ONE, vol. 6, no. 7, p. e21197, 2011.
