PREDICTING PROPAGATION OF A DISEASE Members: Raghuram Krishnamachari Manish Maheshwari Maryam El Kherba Guided by: Prof. Alan Mislove

Post on 05-Jan-2016

Page 1

PREDICTING PROPAGATION OF A DISEASE

Members:

Raghuram Krishnamachari

Manish Maheshwari

Maryam El Kherba

Guided by:

Prof. Alan Mislove

Page 2

Flu Prediction / Activity

CDC Flu Activity

•Reports influenza-like illness (ILI) for each region

Google Flu Trends

•Aggregates search data to estimate flu activity

Our experiment (Twitter)

•Analyze Twitter data (tweets) to estimate flu activity

Page 3

Google Flu Trends

CDC’s ILI data vs. Google Flu Trends

Page 4

Google Flu Trends Vs Twitter

[Figure: time series by region, x-axis 1/6/2008 to 8/17/2009, y-axis 0 to 12000. Legend: HHS Region 1 (CT, ME, MA, NH, RI, VT); HHS Region 2 (NJ, NY); HHS Region 3 (DE, DC, MD, PA, VA, WV); HHS Region 4 (AL, FL, GA, KY, MS, NC, SC, TN); HHS Region 5 (IL, IN, MI, MN, OH, WI); HHS Region 6 (AR, LA, NM, OK, TX); HHS Region 7 (IA, KS, MO, NE); HHS Region 8 (CO, MT, ND, SD, UT, WY); HHS Region 9 (AZ, CA, HI, NV); HHS Region 10 (AK, ID, OR, WA); United States]

[Figure: time series by region, x-axis 1/6/2008 to 8/28/2009, y-axis 0 to 0.009. Legend: Region 1 through Region 10]

Page 5

Google Flu Trends Vs Twitter

[Figure: time series, x-axis 1/6/2008 to 8/23/2009, y-axis 0 to 7000. Legend: G-R3, T-R3]

[Figure: time series, x-axis 1/6/2008 to 8/23/2009, y-axis 0 to 8000. Legend: G-R9, T-R9]
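The slides compare the Google (G-R3, G-R9) and Twitter (T-R3, T-R9) series visually. One possible way to put a number on such a comparison is a Pearson correlation coefficient; this is a technique swapped in for illustration, not something the slides report. The sketch below assumes the two series have already been aligned into "x y" pairs, one pair per line, and the function name `pearson` is hypothetical.

```shell
# pearson: read "x y" pairs from stdin and print the Pearson correlation
# coefficient to four decimals. Hypothetical helper, not from the slides.
pearson() {
  LC_ALL=C awk '
    NF >= 2 {
      n++
      sx += $1; sy += $2
      sxx += $1 * $1; syy += $2 * $2; sxy += $1 * $2
    }
    END {
      if (n < 2) { print "nan"; exit }          # not enough points
      num = n * sxy - sx * sy
      den = sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
      if (den == 0) { print "nan"; exit }       # one series is constant
      printf "%.4f\n", num / den
    }
  '
}

# Usage (file names are placeholders):
#   paste -d' ' google_r3.txt twitter_r3.txt | pearson
```

Because the coefficient does not depend on scale, the two series can be compared without first rescaling one of them.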

Page 6

Tweets, Phrases

"having a cold" 4
"have a cold" 7
"feel feverish" "flu" 5
"headache" "flu" 8
"sick" "flu" 9
"flu" "fever" 5
"came down with the flu" 7
"chills" "flu" 7
"catching the flu" 6
"cough" "flu" 6
"fatigue" "flu" 8
"weakness" "flu" 6
"flu like symptoms" 4
"runny nose" "flu" 5
"sore throat" "flu" 7
"stomach ache" "flu" 6
"stuffy nose" "flu" 6
"tiredness" "flu" 4
"vomiting" "flu" 4
"watery eyes" "flu" 6
"body hurts" "flu" 7
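Entries in the phrase list above that carry two quoted strings presumably require both strings to appear in the same tweet. A minimal sketch under that assumption follows; the function name `match_all` is hypothetical, and case-insensitive matching is also an assumption rather than something the slides state.

```shell
# match_all TERM...: keep only the stdin lines that contain every TERM
# (case-insensitive, fixed-string match). Hypothetical helper.
match_all() {
  if [ "$#" -eq 0 ]; then
    cat                      # no terms left: pass the remaining lines through
    return
  fi
  local term=$1
  shift
  grep -iF -- "$term" | match_all "$@"
}

# Example: only the first line mentions both "sore throat" and "flu".
printf '%s\n' \
  'I have a sore throat, hope it is not the flu' \
  'sore throat from singing all night' \
  'Flu season again' |
  match_all "sore throat" "flu"
```

Chaining one `grep` per term keeps each phrase a literal string, so spaces inside a phrase such as "sore throat" need no escaping.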

Page 7

Process

Filter

•Filter flu tweets from Twitter data

•Store data for each state (FIPS)

Count

•Count flu tweets (weekly)

•Count total tweets (weekly)

Plot

•Ratio of flu-related to total tweets

•Compare against Google/CDC
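The ratio step can be sketched as follows. The slides do not show how the weekly counts are stored, so the "week-label count" line layout, the file names, and the function name `flu_ratio` are all assumptions made for illustration.

```shell
# flu_ratio FLU_FILE TOTAL_FILE
# Both files are assumed to hold one "<week-label> <count>" pair per line.
# Prints "<week-label> <flu/total>" for every week present in both files.
flu_ratio() {
  LC_ALL=C awk '
    NR == FNR { total[$1] = $2; next }      # first file read: total tweets per week
    ($1 in total) && total[$1] > 0 {        # second file read: flu tweets per week
      printf "%s %.6f\n", $1, $2 / total[$1]
    }
  ' "$2" "$1"
}

# Usage (file names are placeholders):
#   flu_ratio NJ.flu.weekly NJ.total.weekly
```

Dividing by the total tweet count normalizes for changes in overall Twitter volume across weeks and regions; weeks with a zero or missing total are skipped rather than reported.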

Page 8

Implementation

Linux bash shell script

Filtering

find fips -name "*.gz" -exec zcat {} \; | grep "$1"

Counting

find … -exec zcat {} \; | awk '{ print $3 }' | awk '{ print $3 " " $2 " " $6 }' | sort -k 3n -k 2M -k 1n | uniq -c

Plotting

pr -mft -s, dates.txt NJ.tot NY.tot > RE2.tot

Microsoft Excel
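The `pr -mft -s,` command merges the date column and the per-state totals side by side into one comma-separated file per region. The slides leave the remaining work to Microsoft Excel; as a hedged alternative, the state columns could be summed into a single regional total on the command line. The sketch assumes each merged line has the form "date,count,count,..." and the function name `region_total` is hypothetical.

```shell
# region_total MERGED_FILE
# MERGED_FILE is assumed to look like the output of
#   pr -mft -s, dates.txt NJ.tot NY.tot
# that is, "date,count_state1,count_state2,..." on each line.
region_total() {
  awk -F, '{
    sum = 0
    for (i = 2; i <= NF; i++) sum += $i    # add up every state column
    print $1 "," sum
  }' "$1"
}

# Usage:
#   region_total RE2.tot
```

Looping over all fields after the first means the same function works for regions with any number of states.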

Page 9

Challenges

Filtering

•Phrases that express flu symptoms

•Processing time

•Segregation based on location

Counting

•Processing time

•Storage format

Plotting

•Lack of consistent CDC data

•Handling of large numeric data

Page 10

Future

•Better prediction algorithm

•Live Tweet monitoring

•Flu propagation

•Facebook application