![Page 1: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/1.jpg)
What’s Strange About Recent Events (WSARE)
Weng-Keen Wong (Carnegie Mellon University)
Andrew Moore (Carnegie Mellon University)
Gregory Cooper (University of Pittsburgh)
Michael Wagner (University of Pittsburgh)
DIMACS Tutorial on Statistical and Other Analytic Health Surveillance Methods
![Page 2: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/2.jpg)
Motivation
Suppose we have access to Emergency Department data from hospitals around a city (with patient confidentiality preserved):

| Primary Key | Date | Time | Hospital | ICD9 | Prodrome | Gender | Age | Home Location | Work Location | Many more… |
|---|---|---|---|---|---|---|---|---|---|---|
| 100 | 6/1/03 | 9:12 | 1 | 781 | Fever | M | 20s | NE | ? | … |
| 101 | 6/1/03 | 10:45 | 1 | 787 | Diarrhea | F | 40s | NE | NE | … |
| 102 | 6/1/03 | 11:03 | 1 | 786 | Respiratory | F | 60s | NE | N | … |
| 103 | 6/1/03 | 11:07 | 2 | 787 | Diarrhea | M | 60s | E | ? | … |
| 104 | 6/1/03 | 12:15 | 1 | 717 | Respiratory | M | 60s | E | NE | … |
| 105 | 6/1/03 | 13:01 | 3 | 780 | Viral | F | 50s | ? | NW | … |
| 106 | 6/1/03 | 13:05 | 3 | 487 | Respiratory | F | 40s | SW | SW | … |
| 107 | 6/1/03 | 13:57 | 2 | 786 | Unmapped | M | 50s | SE | SW | … |
| 108 | 6/1/03 | 14:22 | 1 | 780 | Viral | M | 40s | ? | ? | … |
| : | : | : | : | : | : | : | : | : | : | : |
![Page 4: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/4.jpg)
The Problem
From this data, can we detect if a disease outbreak is happening?
We’re talking about non-specific disease detection
![Page 6: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/6.jpg)
The Problem
From this data, can we detect if a disease outbreak is happening? How early can we detect it?
The question we’re really asking: In the last n hours, has anything strange happened?
![Page 7: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/7.jpg)
Traditional Approaches
What about using traditional anomaly detection?
• Typically assumes data is generated by a model
• Finds individual data points that have low probability with respect to this model
• These outliers have rare attributes or combinations of attributes
• We need to identify anomalous patterns, not isolated data points
![Page 8: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/8.jpg)
Traditional Approaches
What about monitoring aggregate daily counts of certain attributes?
• We’ve now turned multivariate data into univariate data
• Lots of algorithms have been developed for monitoring univariate data:
– Time series algorithms
– Regression techniques
– Statistical Quality Control methods
• Need to know a priori which attributes to form daily aggregates for!
[Figure: Number of ED Visits per Day, plotted against Day Number (days 1 to 100, counts from 0 to 50)]
![Page 10: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/10.jpg)
Traditional Approaches
What if we don’t know what attributes to monitor?
What if we want to exploit the spatial, temporal and/or demographic characteristics of the epidemic to detect the outbreak as early as possible?
![Page 12: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/12.jpg)
Traditional Approaches
We need to build a univariate detector to monitor each interesting combination of attributes:
• Diarrhea cases among children
• Respiratory syndrome cases among females
• Viral syndrome cases involving senior citizens from eastern part of city
• Number of children from downtown hospital
• Number of cases involving people working in southern part of the city
• Number of cases involving teenage girls living in the western part of the city
• Botulinic syndrome cases
• And so on…
You’ll need hundreds of univariate detectors!
We would like to identify the groups with the strangest behavior in recent events.
![Page 13: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/13.jpg)
Our Approach
• We use Rule-Based Anomaly Pattern Detection
• Association rules are used to characterize anomalous patterns. For example, a two-component rule would be: Gender = Male AND 40 ≤ Age < 50
• Related work:
– Market basket analysis [Agrawal et al., Brin et al.]
– Contrast sets [Bay and Pazzani]
– Spatial Scan Statistic [Kulldorff]
– Association Rules and Data Mining in Hospital Infection Control and Public Health Surveillance [Brossette et al.]
![Page 14: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/14.jpg)
WSARE v2.0
• Inputs:
1. Multivariate date/time-indexed biosurveillance-relevant data stream (e.g. “Emergency Department Data”)
2. Time Window Length (e.g. “Last 24 hours”)
3. Which attributes to use? (e.g. “Ignore key”)

| Primary Key | Date | Time | Hospital | ICD9 | Prodrome | Gender | Age | Home Location | Work Location | Many more… |
|---|---|---|---|---|---|---|---|---|---|---|
| 100 | 6/1/03 | 9:12 | 1 | 781 | Fever | M | 20s | NE | ? | … |
| 101 | 6/1/03 | 10:45 | 1 | 787 | Diarrhea | F | 40s | NE | NE | … |
| 102 | 6/1/03 | 11:03 | 1 | 786 | Respiratory | F | 60s | NE | N | … |
| : | : | : | : | : | : | : | : | : | : | : |
![Page 15: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/15.jpg)
WSARE v2.0
• Outputs:
1. Here are the records that most surprise me
2. Here’s why
3. And here’s how seriously you should take it
![Page 16: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/16.jpg)
WSARE v2.0 Overview
1. Obtain Recent and Baseline datasets from All Data
2. Search for rule with best score
3. Determine p-value of best scoring rule through randomization test
4. If p-value is less than threshold, signal alert
![Page 17: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/17.jpg)
Step 1: Obtain Recent and Baseline Data
• Recent Data: data from the last 24 hours
• Baseline: assumed to capture non-outbreak behavior. We use data from 35, 42, 49 and 56 days prior to the current day
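The selection in Step 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (a dict with a `date` field), the function name `split_recent_baseline`, and the simplification of "last 24 hours" to "records dated today" are not from the slides.

```python
from datetime import date, timedelta

# Baseline offsets in days, taken from the slide. All are multiples of 7,
# so every baseline day falls on the same weekday as the current day.
BASELINE_OFFSETS = (35, 42, 49, 56)

def split_recent_baseline(records, today, offsets=BASELINE_OFFSETS):
    """Return (recent, baseline) lists of records.

    Assumption: each record is a dict whose "date" value is a datetime.date,
    and "recent" is approximated by the records dated `today`.
    """
    baseline_days = {today - timedelta(days=k) for k in offsets}
    recent = [r for r in records if r["date"] == today]
    baseline = [r for r in records if r["date"] in baseline_days]
    return recent, baseline

# Hypothetical records for illustration only.
today = date(2003, 6, 1)
records = [
    {"id": 100, "date": today, "prodrome": "Fever"},
    {"id": 90, "date": today - timedelta(days=7), "prodrome": "Viral"},
    {"id": 60, "date": today - timedelta(days=35), "prodrome": "Diarrhea"},
    {"id": 40, "date": today - timedelta(days=56), "prodrome": "Respiratory"},
]
recent, baseline = split_recent_baseline(records, today)
```

The record from 7 days ago lands in neither set, since only the four listed offsets contribute to the baseline.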
![Page 18: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/18.jpg)
Step 2. Search for Best Scoring Rule
For each rule, form a 2x2 contingency table, e.g.

| | Count Recent | Count Baseline |
|---|---|---|
| Age Decile = 3 | 48 | 45 |
| Age Decile ≠ 3 | 86 | 220 |

• Perform Fisher’s Exact Test to get a p-value for each rule => call this p-value the “score”
• Take the rule with the lowest score. Call this rule RBEST.
• This score is not the true p-value of RBEST because we are performing multiple hypothesis tests on each day to find the rule with the best score
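To make the scoring step concrete, here is a minimal standard-library sketch of Fisher's Exact Test applied to the example table. It assumes the two-sided form of the test (the slide does not say which variant is used), and the function name `fisher_exact_two_sided` is illustrative.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    p_value = sum(p for p in map(prob, range(lo, hi + 1))
                  if p <= p_observed * (1 + 1e-9))
    return min(1.0, p_value)

# Rows: Age Decile = 3 / Age Decile != 3; columns: recent / baseline counts.
score = fisher_exact_two_sided(48, 45, 86, 220)
```

For this table the score is well below 0.001, which is why the rule looks interesting before any correction for multiple testing.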
![Page 19: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/19.jpg)
The Multiple Hypothesis Testing Problem
• Suppose we reject the null hypothesis when score < α, where α = 0.05
• For a single hypothesis test, the probability of making a false discovery = α
• Suppose we do 1000 tests, one for each possible rule
• Probability(false discovery) could be as bad as: 1 − (1 − 0.05)^1000 >> 0.05
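The bound above is easy to evaluate numerically. The sketch below assumes the tests are independent, which is the case the "as bad as" figure corresponds to; the function name is illustrative.

```python
def prob_any_false_discovery(alpha, num_tests):
    """P(at least one false discovery) over independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** num_tests

# Print the probability for a growing number of tests.
for m in (1, 10, 100, 1000):
    print(m, prob_any_false_discovery(0.05, m))
```

With a single test the probability is α itself; with 1000 tests it is indistinguishable from 1.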
![Page 20: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/20.jpg)
Step 3: Randomization Test
• Take the recent cases and the baseline cases. Shuffle the date field to produce a randomized dataset called DBRand
• Find the rule with the best score on DBRand.

| Case | Original date | Shuffled date (DBRand) |
|---|---|---|
| C2 | June 4, 2002 | June 4, 2002 |
| C3 | June 5, 2002 | June 12, 2002 |
| C4 | June 12, 2002 | July 31, 2002 |
| C5 | June 19, 2002 | June 26, 2002 |
| C6 | June 26, 2002 | July 31, 2002 |
| C7 | June 26, 2002 | June 5, 2002 |
| C8 | July 2, 2002 | July 2, 2002 |
| C9 | July 3, 2002 | July 3, 2002 |
| C10 | July 10, 2002 | July 10, 2002 |
| C11 | July 17, 2002 | July 17, 2002 |
| C12 | July 24, 2002 | July 24, 2002 |
| C13 | July 30, 2002 | July 30, 2002 |
| C14 | July 31, 2002 | June 19, 2002 |
| C15 | July 31, 2002 | June 26, 2002 |
![Page 21: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/21.jpg)
Step 3: Randomization Test
Repeat the procedure on the previous slide for 1000 iterations. Determine how many scores from the 1000 iterations are better than the original score.
If the original score placed in the top 1% of the 1000 scores from the randomization test, we would be impressed and an alert should be raised.
Estimated p-value of the rule is:
# better scores / # iterations
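The randomization test can be sketched as follows. This is a simplified illustration, not the WSARE implementation: it searches only one-component rules, shuffles a recent/baseline label instead of a literal date field (equivalent for this purpose), uses a two-sided Fisher score, and all names and the toy data are assumptions.

```python
import random
from math import comb

def fisher_p(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)
    probs = [comb(row1, x) * comb(row2, col1 - x) / total
             for x in range(max(0, col1 - row2), min(row1, col1) + 1)]
    observed = comb(row1, a) * comb(row2, col1 - a) / total
    return min(1.0, sum(p for p in probs if p <= observed * (1 + 1e-9)))

def best_score(records, is_recent, attributes):
    """Lowest Fisher score over all one-component rules attribute = value."""
    best = 1.0
    for attr in attributes:
        for value in {r[attr] for r in records}:
            a = b = c = d = 0
            for r, recent in zip(records, is_recent):
                match = r[attr] == value
                if match and recent:
                    a += 1
                elif match:
                    b += 1
                elif recent:
                    c += 1
                else:
                    d += 1
            best = min(best, fisher_p(a, b, c, d))
    return best

def randomization_p_value(records, is_recent, attributes, iterations=1000, seed=0):
    """Fraction of shuffled datasets whose best score beats the original one."""
    rng = random.Random(seed)
    observed = best_score(records, is_recent, attributes)
    labels = list(is_recent)
    better = 0
    for _ in range(iterations):
        rng.shuffle(labels)  # plays the role of shuffling the date field
        if best_score(records, labels, attributes) < observed:
            better += 1
    return better / iterations

# Hypothetical data: 200 balanced baseline records, 40 recent respiratory cases.
baseline = [{"prodrome": p, "gender": g}
            for p in ("Fever", "Diarrhea", "Viral", "Respiratory")
            for g in ("M", "F") for _ in range(25)]
recent = [{"prodrome": "Respiratory", "gender": "MF"[i % 2]} for i in range(40)]
records = baseline + recent
is_recent = [False] * len(baseline) + [True] * len(recent)
p_value = randomization_p_value(records, is_recent, ["prodrome", "gender"],
                                iterations=200)
```

Because the whole rule search is repeated on every shuffled dataset, the resulting p-value already accounts for the many rules tested on a single day.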
![Page 22: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/22.jpg)
Two Kinds of Analysis
Day by Day
• If we want to run WSARE just for the current day, then we end here.
Historical Analysis
• If we want to review all previous days and their p-values for several years and control for some percentage of false positives, then we’ll once again run into overfitting problems. We need to compensate for multiple hypothesis testing because we perform a hypothesis test on each day in the history.
![Page 23: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/23.jpg)
False Discovery Rate [Benjamini and Hochberg]
We only need to do this for historical analysis!
• Can determine which of these p-values are significant
• Specifically, given an α_FDR, FDR guarantees that (in expectation)

$$\frac{\#\,\text{false positives}}{\#\,\text{tests in which null hyp. was rejected}} \le \alpha_{FDR}$$

• Given an α_FDR, FDR produces a threshold below which any p-values in the history are considered significant
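The thresholding step can be sketched with the standard Benjamini-Hochberg step-up procedure. The slide does not spell out the procedure, so treat this as an assumption about what "FDR" refers to here; the function name and the example p-values are hypothetical.

```python
def fdr_threshold(p_values, alpha_fdr):
    """Benjamini-Hochberg: the largest sorted p-value p_(k) with
    p_(k) <= (k / m) * alpha_fdr, or None when nothing is significant."""
    m = len(p_values)
    threshold = None
    for k, p in enumerate(sorted(p_values), start=1):
        if p <= k / m * alpha_fdr:
            threshold = p
    return threshold

daily_p_values = [0.01, 0.04, 0.03, 0.5]  # hypothetical p-values, one per day
threshold = fdr_threshold(daily_p_values, alpha_fdr=0.05)
# A day raises an alert when its p-value is at or below the threshold.
alerts = [threshold is not None and p <= threshold for p in daily_p_values]
```

In this example only the first day is flagged, even though three of the four p-values are below 0.05 on their own.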
![Page 24: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/24.jpg)
WSARE v3.0
![Page 25: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/25.jpg)
WSARE v2.0 Review
1. Obtain Recent and Baseline datasets from All Data
2. Search for rule with best score
3. Determine p-value of best scoring rule through randomization test
4. If p-value is less than threshold, signal alert
![Page 27: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/27.jpg)
Obtaining the Baseline
Recall that the baseline was assumed to be captured by data that was from 35, 42, 49, and 56 days prior to the current day.
What if this assumption isn’t true? What if data from 7, 14, 21 and 28 days prior is better?
We would like to determine the baseline automatically!
![Page 28: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/28.jpg)
Temporal Trends
• But health care data has many different trends due to
– Seasonal effects in temperature and weather
– Day of Week effects
– Holidays
– Etc.
• Allowing the baseline to be affected by these trends may dramatically alter the detection time and false positives of the detection algorithm
![Page 29: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/29.jpg)
Temporal Trends
From: Goldenberg, A., Shmueli, G., Caruana, R. A., and Fienberg, S. E. (2002). Early statistical detection of anthrax outbreaks by tracking over-the-counter medication sales. Proceedings of the National Academy of Sciences (pp. 5237-5249)
![Page 30: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/30.jpg)
WSARE v3.0
Generate the baseline…
• “Taking into account recent flu levels…”
• “Taking into account that today is a public holiday…”
• “Taking into account that this is Spring…”
• “Taking into account recent heatwave…”
• “Taking into account that there’s a known natural food-borne outbreak in progress…”
Bonus: More efficient use of historical data
![Page 31: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/31.jpg)
Conditioning on observed environment: Well understood for Univariate Time Series
[Figure: Signal plotted against Time]
Example Signals:
• Number of ED visits today
• Number of ED visits this hour
• Number of Respiratory Cases Today
• School absenteeism today
• Nyquil Sales today
![Page 32: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/32.jpg)
An easy case
[Figure: Signal plotted against Time, with the Mean and the Upper Safe Range marked]
Dealt with by Statistical Quality Control
Record the mean and standard deviation up to the current time.
Signal an alarm if we go outside 3 sigmas
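A minimal sketch of this control-chart rule is shown below. It is two-sided and compares each value with the mean and standard deviation of the strictly earlier values; both choices, the function name and the daily counts are assumptions made for illustration.

```python
from statistics import mean, stdev

def sqc_alarms(signal, num_sigmas=3.0, min_history=2):
    """Return the indices t at which signal[t] lies outside
    mean +/- num_sigmas * sd of the values observed before t."""
    alarms = []
    for t in range(min_history, len(signal)):
        history = signal[:t]
        mu, sigma = mean(history), stdev(history)
        if abs(signal[t] - mu) > num_sigmas * sigma:
            alarms.append(t)
    return alarms

daily_visits = [30, 32, 29, 31, 30, 33, 28, 31, 30, 55]  # hypothetical counts
alarms = sqc_alarms(daily_visits)
```

Only the final value (55) falls outside the 3-sigma band built from the earlier days.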
![Page 34: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/34.jpg)
Conditioning on Seasonal Effects
[Figure: Signal plotted against Time]
Fit a periodic function (e.g. sine wave) to previous data. Predict today’s signal and 3-sigma confidence intervals. Signal an alarm if we’re off.
Reduces false alarms from natural outbreaks.
Different times of year deserve different thresholds.
![Page 35: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/35.jpg)
Example [Tsui et al.]
Weekly counts of P&I from week 1/98 to 48/00
From: “Value of ICD‑9–Coded Chief Complaints for Detection of Epidemics”, Fu-Chiang Tsui, Michael M. Wagner, Virginia Dato, Chung-Chou Ho Chang, AMIA 2000
![Page 37: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/37.jpg)
Seasonal Effects with Long-Term Trend
Fit a periodic function (e.g. sine wave) plus a linear trend:

$$E[\text{Signal}] = a + bt + c \sin(d + t/365)$$

Good if there’s a long term trend in the disease or the population.
Called the Serfling Method [Serfling, 1963]
Weekly counts of IS from week 1/98 to 48/00.
From: “Value of ICD‑9–Coded Chief Complaints for Detection of Epidemics”, Fu-Chiang Tsui, Michael M. Wagner, Virginia Dato, Chung-Chou Ho Chang, AMIA 2000
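One way to fit a trend-plus-sine model of this kind is ordinary least squares, using the identity c·sin(d + ωt) = c₁·sin(ωt) + c₂·cos(ωt) so that the model becomes linear in its parameters. The sketch below makes two assumptions that are not on the slide: t is measured in days, and the angular frequency is ω = 2π/365 so that one cycle spans a year (the slide writes t/365). All function names are illustrative.

```python
import math

def solve_linear_system(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def fit_trend_plus_sine(times, signal, period=365.0):
    """Least-squares fit of a + b*t + c1*sin(w*t) + c2*cos(w*t), w = 2*pi/period."""
    w = 2.0 * math.pi / period
    rows = [[1.0, t, math.sin(w * t), math.cos(w * t)] for t in times]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    xty = [sum(r[i] * s for r, s in zip(rows, signal)) for i in range(4)]
    return solve_linear_system(xtx, xty)

# Synthetic two-year signal with known parameters a=20, b=0.01, c=5, d=0.5.
w = 2.0 * math.pi / 365.0
times = list(range(730))
signal = [20.0 + 0.01 * t + 5.0 * math.sin(0.5 + w * t) for t in times]
a, b, c1, c2 = fit_trend_plus_sine(times, signal)
c, d = math.hypot(c1, c2), math.atan2(c2, c1)  # amplitude and phase
```

On noise-free data the fit recovers the generating parameters; with real counts, the residual spread around the fitted curve would supply the sigma used for alarm thresholds.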
![Page 39: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/39.jpg)
Day-of-week effects
Fit a day-of-week component:

$$E[\text{Signal}] = a + \delta_{day}$$

E.g. δ_mon = +5.42, δ_tue = +2.20, δ_wed = +3.33, δ_thu = +3.10, δ_fri = +4.02, δ_sat = −12.2, δ_sun = −23.42
Another simple form of ANOVA
From: Goldenberg, A., Shmueli, G., Caruana, R. A., and Fienberg, S. E. (2002). Early statistical detection of anthrax outbreaks by tracking over-the-counter medication sales. Proceedings of the National Academy of Sciences (pp. 5237-5249)
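A day-of-week component can be estimated by simple averaging. In this sketch a is taken to be the overall mean and each delta is the day's mean minus a; that parameterization, the function name and the synthetic offsets are assumptions, not values from the slide.

```python
from collections import defaultdict
from statistics import mean

def fit_day_of_week(days, signal):
    """Fit E[Signal] = a + delta[day], with a as the overall mean."""
    a = mean(signal)
    by_day = defaultdict(list)
    for day, value in zip(days, signal):
        by_day[day].append(value)
    delta = {day: mean(values) - a for day, values in by_day.items()}
    return a, delta

# Four synthetic weeks with hypothetical offsets that sum to zero.
offsets = {"mon": 5.0, "tue": 2.0, "wed": 3.0, "thu": 3.0,
           "fri": 4.0, "sat": -7.0, "sun": -10.0}
days = list(offsets) * 4
signal = [100.0 + offsets[day] for day in days]
a, delta = fit_day_of_week(days, signal)
```

Today's expected signal is then `a + delta[today]`, so a quiet Sunday is not mistaken for an anomaly.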
![Page 40: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/40.jpg)
Analysis of variance (ANOVA)
• Good news: If you’re tracking a daily aggregate (univariate data), then ANOVA can take care of many of these effects.
• But… What if you’re tracking a whole joint distribution of events?
![Page 41: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/41.jpg)
Idea: Bayesian Networks
Bayesian Network: A graphical model representing the joint probability distribution of a set of random variables
“On Cold Tuesday Mornings the folks coming in from the North part of the city are more likely to have respiratory problems”
“Patients from West Park Hospital are less likely to be young”
“On the day after a major holiday, expect a boost in the morning followed by a lull in the afternoon”
“The Viral prodrome is more likely to co-occur with a Rash prodrome than Botulinic”
![Page 42: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/42.jpg)
WSARE Overview
1. Obtain Recent and Baseline datasets from All Data
2. Search for rule with best score
3. Determine p-value of best scoring rule through randomization test
4. If p-value is less than threshold, signal alert
![Page 44: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/44.jpg)
Obtaining Baseline Data
1. Learn Bayesian Network from All Historical Data
2. Generate baseline given Today’s Environment
The baseline represents what should be happening today given today’s environment
![Page 46: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/46.jpg)
Step 1: Learning the Bayes Net Structure
Involves searching over DAGs for the structure that maximizes a scoring function. The most common algorithm is hillclimbing, starting from an initial structure.
3 possible operations:
• Add an arc
• Delete an arc
• Reverse an arc
But hillclimbing is too slow and single link modifications may not find the correct structure (Xiang, Wong and Cercone 1997). We use Optimal Reinsertion (Moore and Wong 2002).
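For reference, the conventional hill-climbing search that Optimal Reinsertion replaces can be sketched as below. This is not Optimal Reinsertion itself; the toy score function stands in for a real scoring function such as one computed from data, and all names are illustrative.

```python
from itertools import permutations

def is_acyclic(nodes, arcs):
    """Kahn-style check: repeatedly remove nodes that have no incoming arcs."""
    remaining, arcs = set(nodes), set(arcs)
    while remaining:
        roots = {n for n in remaining if not any(dst == n for _, dst in arcs)}
        if not roots:
            return False  # every remaining node has a parent, so there is a cycle
        remaining -= roots
        arcs = {(s, d) for s, d in arcs if s not in roots}
    return True

def neighbors(nodes, arcs):
    """All DAGs reachable from `arcs` by adding, deleting or reversing one arc."""
    arcs = frozenset(arcs)
    result = []
    for arc in permutations(nodes, 2):  # add an arc
        if arc not in arcs and is_acyclic(nodes, arcs | {arc}):
            result.append(("add", arc, arcs | {arc}))
    for arc in arcs:  # delete an arc
        result.append(("delete", arc, arcs - {arc}))
    for src, dst in arcs:  # reverse an arc
        reversed_arcs = (arcs - {(src, dst)}) | {(dst, src)}
        if is_acyclic(nodes, reversed_arcs):
            result.append(("reverse", (src, dst), reversed_arcs))
    return result

def hill_climb(nodes, arcs, score):
    """Greedy search: move to the best-scoring neighbor until none improves."""
    current = frozenset(arcs)
    while True:
        candidates = [g for _, _, g in neighbors(nodes, current)]
        best = max(candidates, key=score, default=current)
        if score(best) <= score(current):
            return current
        current = best

# Toy score: negative number of arcs that differ from a hypothetical target DAG.
nodes = ("A", "B", "C")
target = frozenset({("A", "B"), ("B", "C")})
toy_score = lambda g: -len(frozenset(g) ^ target)
learned = hill_climb(nodes, [], toy_score)
```

Each step changes a single arc, which is the limitation the slide points to: reaching some structures requires several coordinated changes that no single-arc move improves on.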
![Page 47: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/47.jpg)
Optimal Reinsertion
1. Select target node T in current graph
2. Remove all arcs connected to T
3. Efficiently find new in/out arcs
4. Choose best new way to connect T
![Page 51: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/51.jpg)
The Outer Loop
For NumJolts:
• Begin with randomly corrupted version of best DAG so far
• Until no change in current DAG:
– Generate random ordering of nodes
– For each node in the ordering, do Optimal Reinsertion
• Conventional hill-climbing without maxParams restriction
![Page 52: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/52.jpg)
How is Optimal Reinsertion done efficiently?
Efficiency Tricks
Scoring functions can be decomposed:

$$\mathrm{DagScore}(D) = \sum_{i=1}^{m} \mathrm{NodeScore}(PS_i \to i)$$

where PS_i is the parent set of node i (for example, parents P1, P2, P3 of a target node T).
1. Create an efficient cache of NodeScore(PS->T) values using ADSearch [Moore and Schneider 2002]
2. Restrict PS->T combinations to those with CPTs with maxParams or fewer parameters
3. Additional Branch and Bound is used to restrict the search space by an additional order of magnitude
![Page 53: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/53.jpg)
Environmental Attributes
Divide the data into two types of attributes:
• Environmental attributes: attributes that cause trends in the data, e.g. day of week, season, weather, flu levels
• Response attributes: all other non-environmental attributes
![Page 54: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/54.jpg)
Environmental Attributes
When learning the Bayesian network structure, do not allow environmental attributes to have parents.
Why?
• We are not interested in predicting their distributions
• Instead, we use them to predict the distributions of the response attributes
Side Benefit: We can speed up the structure search by avoiding DAGs that assign parents to the environmental attributes
[Diagram nodes: Season, Day of Week, Weather, Flu Level]
![Page 56: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/56.jpg)
Step 2: Generate Baseline Given Today’s Environment
Sampling is easy because environmental attributes are at the top of the Bayes Net.
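Because the environmental attributes have no parents, baseline generation reduces to forward sampling: fix today's environment, then draw each response attribute given its parents. The sketch below assumes a toy network; the structure, the conditional probability tables, and the function names are hypothetical and only illustrate the mechanics.

```python
import random

# Today's environment, as given on the slide.
TODAY = {"Season": "Winter", "Day of Week": "Monday",
         "Weather": "Snow", "Flu Level": "High"}

# Hypothetical toy network: attribute -> (parents, CPT).
# A CPT maps a tuple of parent values to {value: probability}.
NETWORK = {
    "Prodrome": (("Flu Level",), {
        ("High",): {"Respiratory": 0.7, "Diarrhea": 0.3},
        ("Low",): {"Respiratory": 0.2, "Diarrhea": 0.8},
    }),
    "Age": (("Prodrome",), {
        ("Respiratory",): {"under 50": 0.4, "50 and over": 0.6},
        ("Diarrhea",): {"under 50": 0.7, "50 and over": 0.3},
    }),
}
# Response attributes in an order where parents come before children.
ORDER = ["Prodrome", "Age"]


def sample_record(network, order, environment, rng):
    """Draw one record: start from the fixed environment, then sample downward."""
    record = dict(environment)
    for attr in order:
        parents, cpt = network[attr]
        dist = cpt[tuple(record[p] for p in parents)]
        values = list(dist)
        record[attr] = rng.choices(values, weights=[dist[v] for v in values])[0]
    return record


def generate_baseline(network, order, environment, n=10000, seed=0):
    """Sample n records conditioned on today's environment."""
    rng = random.Random(seed)
    return [sample_record(network, order, environment, rng) for _ in range(n)]


baseline = generate_baseline(NETWORK, ORDER, TODAY)
share = sum(r["Prodrome"] == "Respiratory" for r in baseline) / len(baseline)
print(f"Respiratory share in baseline: {share:.3f}")
```

No inference step is needed here, since the evidence sits only on root nodes and every conditional table can be read off directly.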
![Page 57: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/57.jpg)
Why not use inference?
• With sampling, we create the baseline data and then use it to obtain the p-value of the rule for the randomization test
• If we used inference, we would not be able to perform the same randomization test, and we would need to find some other way to correct for multiple hypothesis testing
• Sampling was chosen for its simplicity
![Page 58: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/58.jpg)
Why not use inference?
But there may be clever things to do with inference that could help us. File this under future work.
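To show why a sampled baseline is convenient, here is a rough sketch of a randomization test over recent and baseline records. It assumes single-attribute rules scored with a one-sided Fisher exact test, and a compensated p-value equal to the fraction of label shuffles whose best score is at least as extreme as the observed one. The function names and the toy data are hypothetical.

```python
import random
from math import comb


def fisher_one_sided(a, b, c, d):
    """P(X >= a) under the hypergeometric null for the 2x2 table [[a, b], [c, d]].

    a, b: recent records matching / not matching the rule
    c, d: baseline records matching / not matching the rule
    """
    n_recent, n_match, n = a + b, a + c, a + b + c + d
    top = min(n_recent, n_match)
    tail = sum(comb(n_match, k) * comb(n - n_match, n_recent - k)
               for k in range(a, top + 1))
    return tail / comb(n, n_recent)


def best_score(records, is_recent, rules):
    """Smallest rule p-value for a given recent/baseline labelling."""
    n_recent = sum(is_recent)
    n_base = len(records) - n_recent
    best = 1.0
    for attr, value in rules:
        a = sum(1 for r, rec in zip(records, is_recent) if rec and r[attr] == value)
        c = sum(1 for r, rec in zip(records, is_recent) if not rec and r[attr] == value)
        best = min(best, fisher_one_sided(a, n_recent - a, c, n_base - c))
    return best


def compensated_p_value(recent, baseline, n_shuffles=100, seed=0):
    """Return (best rule score, fraction of shuffles scoring at least as well)."""
    rng = random.Random(seed)
    records = recent + baseline
    labels = [True] * len(recent) + [False] * len(baseline)
    rules = sorted({(k, v) for r in records for k, v in r.items()})
    observed = best_score(records, labels, rules)
    hits = 0
    for _ in range(n_shuffles):
        shuffled = labels[:]
        rng.shuffle(shuffled)  # break any link between record and day
        if best_score(records, shuffled, rules) <= observed:
            hits += 1
    return observed, hits / n_shuffles


recent = [{"Prodrome": "Respiratory"}] * 15 + [{"Prodrome": "Diarrhea"}] * 5
baseline = [{"Prodrome": "Respiratory"}] * 20 + [{"Prodrome": "Diarrhea"}] * 180
score, p = compensated_p_value(recent, baseline)
print(score, p)
```

The shuffle needs concrete baseline records to relabel, which is what sampling provides and what a purely inference-based baseline would not.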
![Page 59: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/59.jpg)
Simulation
City with 9 regions and a different population in each region:
NW: 100, N: 400, NE: 500
W: 100, C: 200, E: 300
SW: 200, S: 200, SE: 600
For each day, sample the city's environment from the following Bayesian network.
Network nodes: Date, Day of Week, Season, Previous Weather, Weather, Previous Flu Level, Flu Level, Previous Region Food Condition, Region Food Condition, Previous Region Anthrax Concentration, Region Anthrax Concentration
![Page 60: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/60.jpg)
Simulation
For each person in a region, sample their profile
Network nodes: DATE, DAY OF WEEK, SEASON, FLU LEVEL, WEATHER, REGION, AGE, GENDER, Region Grassiness, Region Anthrax Concentration, Region Food Condition, Immune System, Outside Activity, Has Anthrax, Has Flu, Has Allergy, Has Heart Attack, Has Sunburn, Has Cold, Heart Health, Has Food Poisoning, Disease, ACTION, Actual Symptom, REPORTED SYMPTOM, DRUG
![Page 61: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/61.jpg)
Visible Environmental Attributes
Figure: the same person-profile Bayesian network as on the preceding slide.
![Page 62: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/62.jpg)
Simulation
Figure: the same person-profile Bayesian network as on the preceding slide.
Diseases: Allergy, cold, sunburn, flu, food poisoning, heart problems, anthrax (in order of precedence)
![Page 63: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/63.jpg)
Simulation
Figure: the same person-profile Bayesian network as on the preceding slide.
Actions: None, Purchase Medication, ED visit, Absent. If Action is not None, output record to dataset.
![Page 64: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/64.jpg)
Simulation Plot
![Page 65: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/65.jpg)
Simulation Plot
Anthrax release (not highest peak)
![Page 66: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/66.jpg)
Simulation
• 100 different data sets
• Each data set consisted of a two-year period
• Anthrax release occurred at a random point during the second year
• Algorithms allowed to train on data from the current day back to the first day in the simulation
• Any alerts before the actual anthrax release are considered false positives
• Detection time calculated as first alert after anthrax release. If no alerts raised, cap detection time at 14 days
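The scoring rules above can be written down in a few lines. This is a sketch under two assumptions not stated on the slide: an alert on the release day itself counts as a detection, and a first alert arriving later than 14 days is also capped at 14. The function name `evaluate_alerts` is hypothetical.

```python
MAX_DETECTION_DAYS = 14  # cap stated on the slide


def evaluate_alerts(alert_days, release_day, cap=MAX_DETECTION_DAYS):
    """Return (false_positives, detection_time_in_days) for one data set.

    alert_days:  day indices on which the algorithm raised an alert
    release_day: day index of the simulated anthrax release
    """
    # Every alert before the release counts as a false positive.
    false_positives = sum(1 for day in alert_days if day < release_day)
    # Assumption: an alert on the release day itself counts as a detection.
    after = [day for day in alert_days if day >= release_day]
    detection_time = min(after) - release_day if after else cap
    # Assumption: late detections are capped the same way as missed ones.
    return false_positives, min(detection_time, cap)


print(evaluate_alerts([100, 380, 500, 503], release_day=498))
print(evaluate_alerts([10], release_day=400))
```

Sweeping the alert threshold and averaging both numbers over the 100 data sets would trace out a curve of detection time against false positive rate.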
![Page 67: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/67.jpg)
Other Algorithms used in Simulation
1. Standard algorithm (figure: signal plotted against time, with the mean and the upper safe range marked)
2. WSARE 2.0
3. WSARE 2.5
• Use all past data but condition on environmental attributes
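The slides do not spell out the standard algorithm beyond a plot of a signal with its mean and upper safe range. One common reading, sketched here purely as an assumption, is a univariate threshold detector that alerts when today's count exceeds the historical mean by `k` standard deviations. The value `k=2.0` and the function names are hypothetical.

```python
from statistics import mean, stdev


def upper_safe_range(history, k=2.0):
    """Upper limit of the safe range: historical mean plus k sample standard deviations."""
    return mean(history) + k * stdev(history)


def is_alert(history, today, k=2.0):
    """Raise an alert when today's count exceeds the upper safe range."""
    return today > upper_safe_range(history, k)


daily_counts = [10, 12, 11, 9, 13, 10, 12]  # hypothetical daily ED visit counts
print(upper_safe_range(daily_counts))
print(is_alert(daily_counts, today=14), is_alert(daily_counts, today=13))
```

A detector of this kind watches a single aggregate signal, which is the contrast the comparison draws with WSARE's search over multivariate rules.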
![Page 68: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/68.jpg)
Results on Simulation
![Page 69: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/69.jpg)
Conclusion
• One approach to biosurveillance: one algorithm monitoring millions of signals derived from multivariate data instead of hundreds of univariate detectors
• WSARE is best used as a general purpose safety net in combination with other detectors
• Modeling historical data with Bayesian Networks to allow conditioning on unique features of today
• Computationally intense unless we use clever algorithms
![Page 70: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/70.jpg)
Conclusion
• WSARE 2.0 deployed during the past year
• WSARE 3.0 about to go online
• WSARE now being extended to additionally exploit over-the-counter medicine sales
![Page 71: Whats Strange About Recent Events (WSARE) Weng-Keen Wong (Carnegie Mellon University) Andrew Moore (Carnegie Mellon University) Gregory Cooper (University](https://reader033.vdocuments.mx/reader033/viewer/2022061613/55162e6355034694308b5f04/html5/thumbnails/71.jpg)
For more information
References:
• Wong, W. K., Moore, A. W., Cooper, G., and Wagner, M. (2002). Rule-based Anomaly Pattern Detection for Detecting Disease Outbreaks. Proceedings of AAAI-02 (pp. 217-223). MIT Press.
• Wong, W. K., Moore, A. W., Cooper, G., and Wagner, M. (2003). Bayesian Network Anomaly Pattern Detection for Disease Outbreaks. Proceedings of ICML 2003.
• Moore, A., and Wong, W. K. (2003). Optimal Reinsertion: A New Search Operator for Accelerated and More Accurate Bayesian Network Structure Learning. Proceedings of ICML 2003.
AUTON lab website: http://www.autonlab.org/wsare