ICDM 2015 Presentation

SimNest: Social Media Nested Epidemic Simulation via Online Semi-supervised Deep Learning

Liang Zhao, Virginia Tech

Joint work with Jiangzhuo Chen (1), Feng Chen (2), Wei Wang (1), Chang-Tien Lu (1), and Naren Ramakrishnan (1)

(1) Virginia Tech, (2) SUNY-Albany



Introduction: Epidemics

• Seasonal influenza:
  – 3 to 5 million cases of severe illness per year
  – 250,000 to 500,000 deaths per year
• Pandemic flu of 1918:
  – Killed 2.5% to 5% of the global population
  – Many more were sick
• Ebola outbreak in West Africa:
  – 27,055 cases
  – 11,142 deaths

Introduction: Seasonal Epidemics

[Figure: panels for Week 45, Week 46, and Week 47]

Influenza outbreak in Week 47 (ending Nov 22, 2014) in the southern region

Sources: CDC (www.cdc.gov/flu/), Google Flu Trends (https://www.google.org/flutrends/about/)

Epidemics Modeling (Category 1): Computational Epidemiology

1. Model the following mechanisms
   a. Demographics and social contact network
   b. Disease progression
   c. Interventions: school closure, vaccination, isolation
2. Tune parameters against surveillance data
3. Run simulation model

Epidemics Modeling (Category 1): Computational Epidemiology

• Challenges
  – Challenge 1: Coarse-grained surveillance data (state-wise and week-wise)
  – Challenge 2: Dynamics of contact networks
    • Peter moves out to another city.
    • Taylor is immune to flu after getting a flu shot.
    • Jim is on vacation from Dec 23.
  – Challenge 3: Poor timeliness
    • Surveillance data arrives weekly.
    • Surveillance data is at least one week behind.

Epidemics Modeling (Category 2): Data Driven on Social Media

• Fast monitoring of real-time epidemics
  – Temporally fine-grained
  – No delay
• Individual-wise health condition mining
  1. Identify the response to flu
     – In flu season, what will Peter do? "Avoid crowds", "Get flu shot"
  2. Identify the individual's disease progression
     – "Feel I'm getting flu", "AHA, false alarm", "Maybe I indeed need see doctor"

Epidemics Modeling (Category 2): Data Driven on Social Media

No knowledge of the underlying mechanism:

• What is the real mechanism of disease progression?
• What is the infection process of flu across crowds?
• What is the consequence if someone gets vaccinated?
• Is infectivity affected if Jim goes on summer holiday?

Challenge: The real mechanism is hidden from social media

Motivations

Computational Epidemiology

• Advantages:
  – Mechanism of disease progression
  – Mechanism of disease diffusion
  – Consideration of interventions
• Drawbacks:
  – Temporally coarse-grained
  – Spatially coarse-grained
  – Poor dynamics in social contact network
  – One week delay

Social Media Mining

• Advantages:
  – Temporally fine-grained
  – Spatially fine-grained
  – Individual-level monitoring
  – Changes in the social contact network are observable in real time
  – No time delay
• Drawbacks:
  – No mechanism of disease progression
  – No mechanism of disease diffusion
  – No consideration of interventions

Combine: Computational Epidemiology + Social Media Mining

Idea


Model: Overview

Our objective: minimize the loss

$$\min \mathcal{L} = \min \left( \mathcal{L}_A + \mathcal{L}_B + \mathcal{L}_C + \mathcal{L}_D \right)$$

Model (part A): Supervised Loss


• Input (tweet content):

• Output (health stage):

• Mapping:

I: Infectious

• Supervised Loss:

$f_W(\cdot)$: one-hidden-layer perceptron
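The slide's formulas for the mapping and the supervised loss did not survive extraction, so the following is only a minimal sketch of what a one-hidden-layer perceptron with a supervised loss could look like. The tanh hidden activation, the softmax output over the four health stages, the cross-entropy loss, and all names (`forward`, `supervised_loss`, `STAGES`) are assumptions for illustration, not the authors' definitions.

```python
import math

# Assumed label set: the four SEIR health stages.
STAGES = ["S", "E", "I", "R"]

def forward(x, W1, b1, W2, b2):
    """One-hidden-layer perceptron: tweet feature vector x -> stage probabilities."""
    # Hidden layer (tanh activation is an assumption).
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    # Output layer followed by a numerically stable softmax.
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(W2, b2)]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def supervised_loss(batch, params):
    """Mean cross-entropy over labeled (features, stage) pairs (assumed loss form)."""
    total = 0.0
    for x, label in batch:
        probs = forward(x, *params)
        total -= math.log(probs[STAGES.index(label)])
    return total / len(batch)

# Hypothetical example: 2 input features, 3 hidden units, 4 output stages.
params = (
    [[0.1, -0.2], [0.0, 0.3], [-0.1, 0.1]],   # W1: hidden x input
    [0.0, 0.0, 0.0],                          # b1
    [[0.2, 0.0, -0.1], [0.0, 0.1, 0.0],
     [0.3, -0.2, 0.1], [0.0, 0.0, 0.2]],      # W2: output x hidden
    [0.0, 0.0, 0.0, 0.0],                     # b2
)
batch = [([1.0, 0.0], "I"), ([0.0, 1.0], "S")]
print(supervised_loss(batch, params))
```

In this sketch the input is a plain feature vector per tweet; how tweet content is actually encoded is not stated on the slide.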

Model (part B): Bi-space Consistency Loss

• Social Contact Network:
  – Nodes: $\mathcal{V}$
  – Edges: $\mathcal{E}$
  – Weights: $\mathcal{W}$, contact duration between connected individuals
• Disease Progression: SEIR model
  – Individual's health stage: one of Susceptible (S), Exposed (E), Infectious (I), and Recovered (R)
  – Progression: S → E → I → R
  – Incubation period: $p_E(v) \sim \mathcal{N}(\mu_E, \sigma_E)$
  – Infectious period: $p_I(v) \sim \mathcal{N}(\mu_I, \sigma_I)$
• Bi-space Loss:
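To make the simulation side of part B concrete, here is a rough sketch of a daily SEIR simulation on a weighted contact network, with incubation and infectious periods drawn from normal distributions as on the slide. The per-contact infection probability `1 - (1 - tau) ** duration`, the transmissibility parameter `tau`, the one-day time step, and the function name `simulate_seir` are assumptions for illustration; the slide does not specify them, and the bi-space loss itself is not reproduced here.

```python
import random

def simulate_seir(nodes, edges, seeds, tau, mu_E, sigma_E, mu_I, sigma_I,
                  days, seed=0):
    """Daily SEIR simulation; edges maps (u, v) to contact duration.

    tau is a hypothetical per-unit-duration transmission probability.
    Returns a list of {node: stage} snapshots, one per day (plus day 0).
    """
    rng = random.Random(seed)
    neighbors = {v: [] for v in nodes}
    for (u, v), duration in edges.items():
        neighbors[u].append((v, duration))
        neighbors[v].append((u, duration))

    def draw_days(mu, sigma):
        # Period in whole days: normal draw, rounded, at least 1.
        return max(1, round(rng.gauss(mu, sigma)))

    stage = {v: "S" for v in nodes}
    remaining = {}  # days left in the current E or I stage
    for v in seeds:
        stage[v] = "I"
        remaining[v] = draw_days(mu_I, sigma_I)

    history = [dict(stage)]
    for _ in range(days):
        # 1. Infectious individuals may expose susceptible contacts.
        exposed_today = set()
        for u in nodes:
            if stage[u] != "I":
                continue
            for v, duration in neighbors[u]:
                p = 1.0 - (1.0 - tau) ** duration
                if stage[v] == "S" and rng.random() < p:
                    exposed_today.add(v)
        # 2. Advance stage timers: E -> I and I -> R.
        for v in list(remaining):
            remaining[v] -= 1
            if remaining[v] <= 0:
                if stage[v] == "E":
                    stage[v] = "I"
                    remaining[v] = draw_days(mu_I, sigma_I)
                else:
                    stage[v] = "R"
                    del remaining[v]
        # 3. Newly exposed individuals start their incubation period.
        for v in sorted(exposed_today):
            stage[v] = "E"
            remaining[v] = draw_days(mu_E, sigma_E)
        history.append(dict(stage))
    return history

# Hypothetical four-person chain of contacts.
nodes = [0, 1, 2, 3]
edges = {(0, 1): 2.0, (1, 2): 1.0, (2, 3): 3.0}
history = simulate_seir(nodes, edges, seeds=[0], tau=0.3,
                        mu_E=2.0, sigma_E=0.5, mu_I=4.0, sigma_I=1.0, days=30)
print(history[-1])
```

Each snapshot in `history` gives every individual's stage on that day, which is the kind of per-individual trajectory that could be compared against stages inferred from tweets.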

Model (part C): Infectious Period Loss

• The infectious period observed in social media should be statistically consistent with that in the disease progression model.

• Maximize the likelihood, and re-arrange:
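The rearranged likelihood formula is missing from the transcript. As a hedged illustration only, the sketch below writes the negative log-likelihood of observed infectious periods under the normal distribution $\mathcal{N}(\mu_I, \sigma_I)$ from part B, treating $\sigma_I$ as a standard deviation. The function name `infectious_period_nll` and the sample values are hypothetical, and the authors' actual loss may differ in scaling or form.

```python
import math

def infectious_period_nll(periods, mu_I, sigma_I):
    """Negative log-likelihood of observed infectious periods under N(mu_I, sigma_I).

    periods: infectious-period lengths (in days) inferred from social media.
    sigma_I is treated as a standard deviation (an assumption).
    """
    n = len(periods)
    squared_error = sum((p - mu_I) ** 2 for p in periods)
    return (n * math.log(sigma_I)
            + 0.5 * n * math.log(2.0 * math.pi)
            + squared_error / (2.0 * sigma_I ** 2))

# Hypothetical observed periods, in days.
observed = [3.0, 4.0, 5.0, 4.0]
print(infectious_period_nll(observed, mu_I=4.0, sigma_I=1.0))
print(infectious_period_nll(observed, mu_I=6.0, sigma_I=1.0))
```

Minimizing this quantity is equivalent to maximizing the likelihood, so a model whose $\mu_I$ sits far from the periods seen in tweets is penalized more heavily.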

Model (part D): Temporal Pattern Loss

• Health stages should be consecutive.

• An individual who recovers from flu cannot get it again.
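The two constraints above can be checked mechanically on a sequence of inferred stages. The sketch below counts transitions that move backwards (for example R → I) or skip a stage (for example S → I). Using a violation count as the penalty, and the name `temporal_violations`, are assumptions for illustration; the slide's actual loss formula is not available in the transcript.

```python
# Position of each stage in the S -> E -> I -> R progression.
ORDER = {"S": 0, "E": 1, "I": 2, "R": 3}

def temporal_violations(stages):
    """Count day-to-day transitions that go backwards or skip a stage."""
    count = 0
    for prev, cur in zip(stages, stages[1:]):
        step = ORDER[cur] - ORDER[prev]
        if step < 0 or step > 1:
            count += 1
    return count

print(temporal_violations(["S", "S", "E", "I", "I", "R"]))  # valid sequence
print(temporal_violations(["S", "I", "R", "I"]))            # skip, then relapse
```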

Online Training Algorithm

• Objective function:

• Alternating optimization:
  – Solve for $W$, fixing the others: stochastic gradient descent.
  – Solve for $\Theta$, fixing the others: Nelder-Mead method.
  – Solve for $p_I$, $\lambda_1$.
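The alternating scheme can be illustrated on a toy problem. In the sketch below, a scalar weight `w` stands in for $W$ and is updated by stochastic gradient descent, while a scalar `theta` stands in for $\Theta$ and is updated by a derivative-free search. The toy squared-error objective, the simple step-halving pattern search (used here as a stand-in for Nelder-Mead, which it is not), and all function names are assumptions for illustration, not the SimNest objective or the authors' implementation.

```python
import random

def loss(w, theta, data):
    """Toy coupled objective: mean squared error of y ~ w * x + theta."""
    return sum((y - (w * x + theta)) ** 2 for x, y in data) / len(data)

def sgd_epoch(w, theta, data, lr, rng):
    """One stochastic gradient descent pass over w with theta held fixed."""
    samples = data[:]
    rng.shuffle(samples)
    for x, y in samples:
        grad = -2.0 * (y - (w * x + theta)) * x
        w -= lr * grad
    return w

def pattern_search(theta, objective, step=1.0, tol=1e-6):
    """Derivative-free 1-D search; a simple stand-in for Nelder-Mead."""
    best = objective(theta)
    while step > tol:
        moved = False
        for candidate in (theta + step, theta - step):
            value = objective(candidate)
            if value < best:
                theta, best, moved = candidate, value, True
                break
        if not moved:
            step *= 0.5  # no improvement at this scale: refine the step
    return theta

def alternate(data, rounds=100, lr=0.01, seed=0):
    """Alternate between the gradient-based and derivative-free updates."""
    rng = random.Random(seed)
    w, theta = 0.0, 0.0
    for _ in range(rounds):
        w = sgd_epoch(w, theta, data, lr, rng)                        # fix theta
        theta = pattern_search(theta, lambda t: loss(w, t, data))     # fix w
    return w, theta

# Hypothetical noise-free data generated from w = 2, theta = 1.
data = [(x, 2.0 * x + 1.0) for x in (-2.0, -1.0, 0.0, 1.0, 2.0, 3.0)]
w, theta = alternate(data)
print(w, theta, loss(w, theta, data))
```

A derivative-free method suits the simulation parameters because the simulator's output is not differentiable with respect to them, whereas the network weights admit gradients.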

Model Extensions

1. Consider dynamics of contact network
   – Dynamically adjust the transmissibility:

2. Consider heterogeneous surveillance
   – Loss (part E):
   – Scaling down time frame:

Experiments: Dataset

• Dataset:
  – Twitter: 2011 to 2015 in the US.
  – Training set: Aug 1, 2011 to Jul 31, 2012.
  – Test set: Aug 1, 2012 to Jul 31, 2014.

• Regions: Connecticut (CT), Massachusetts (MA), Maryland (MD), Virginia (VA), and the District of Columbia (DC)

Experiments: Label and Metrics

• Label:
  – Influenza statistics reported by the Centers for Disease Control and Prevention (CDC).
  – The CDC publishes weekly the percentage of physician visits related to influenza-like illness (ILI) within each major region of the United States.

• Metrics:
  – Lead time: how far ahead of the input the output is.
  – Mean squared error (MSE)
  – Pearson correlation
  – P-value
  – Peak time error: error of the predicted time of the peak value
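Three of these metrics are simple to compute from a predicted series and a ground-truth series. The sketch below assumes weekly values aligned by index and defines peak time error as the absolute difference between the weeks at which each series reaches its maximum; that definition, the function names, and the sample numbers are assumptions for illustration rather than the paper's exact evaluation code.

```python
import math

def mse(pred, truth):
    """Mean squared error between aligned series."""
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)

def pearson(pred, truth):
    """Pearson correlation coefficient between aligned series."""
    n = len(truth)
    mean_p = sum(pred) / n
    mean_t = sum(truth) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, truth))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in truth)
    return cov / math.sqrt(var_p * var_t)

def peak_time_error(pred, truth):
    """Absolute difference, in time steps, between predicted and true peak."""
    return abs(pred.index(max(pred)) - truth.index(max(truth)))

# Hypothetical weekly ILI percentages.
truth = [1.0, 2.0, 5.0, 3.0]
pred = [1.0, 3.0, 4.0, 6.0]
print(mse(pred, truth), pearson(pred, truth), peak_time_error(pred, truth))
```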

Experiments: Comparison Methods

• Social media mining methods:
  – Linear Autoregressive Exogenous model (LinARX)
  – Logistic Autoregressive Exogenous model (LogARX)
  – Simple Linear Regression model (simpleLinReg)
  – Multi-variable linear regression model (multiLinReg)

• Computational epidemiology methods:
  – SEIR
  – EpiFast

• Detailed parameter settings:
  – See http://people.cs.vt.edu/liangz8/materials/papers/SimNestAddon.pdf

State-level influenza epidemic forecasting performance

Spatial sub-region outbreaks forecasting performance

Thank you