EDS Challenge: Overview and Some Preliminary
Results
Preliminary Results – For Official Use Only
Katie Umberg, USEPA
Victoria Berry, CH2M HILL
Steve Allgeier, USEPA
AWWA Water Security Conference
Nashville, TN; September 13, 2011
1. Overview
2. Some Preliminary Results
3. Path Forward
Overview
Event Detection Systems (EDSs)
• Water quality monitoring event detection system
(EDS): software that monitors water quality data in
real time and produces an alarm if water quality is
abnormal. An EDS allows the utility to efficiently
monitor the large quantity of data that can be
produced by an online water quality monitoring
system.
– May use supporting data such as sensor alarms and
operations data
– Can be implemented at the actual sensor site or at a central
location
• Goal: maximize detection of abnormal water quality
events, while minimizing false alarms
EDS Challenge
• Goals:
– To provide an objective demonstration of the
performance of available EDSs, measuring both true
and false alarms.
– To challenge EDS developers to incorporate
innovative approaches for analyzing complex water
quality data.
• Factors such as cost, ease of use, and support were
not considered, nor was the level of effort (LOE) for
installation and configuration: each developer trained
their own tool.
Testing Data
• One year of data was obtained for a total of 6 monitoring
stations from 4 US water utilities.
– 3 months were provided to participants for training. 9
months were used for evaluation.
– Data from sites with variable / complex water quality
was requested.
– Corresponding operational data was also provided,
where available.
• Polling intervals of 2, 5, 10, and 20 minutes were used. The
longer intervals are not ideal, but some utilities could not
provide data at a shorter interval.
Testing Data
• For each station, each trained EDS was challenged
with:
– Baseline utility data to primarily measure false
alarms
– 96 simulated contamination events per monitoring
station to calculate detection ability
>For each station, an event was created with
every combination of 6 contaminants, 2
contaminant concentrations, 4 event start times,
and 2 event profiles
>The events were designed to be varied and
realistic
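The count of 96 follows from the full factorial design: 6 x 2 x 4 x 2 = 96. A minimal sketch of that enumeration is below. The labels are placeholders, since the slides do not name the contaminants, concentrations, start times, or profiles.

```python
from itertools import product

# Placeholder labels (assumptions): the slides give only the counts,
# not the actual contaminants, concentrations, start times, or profiles.
contaminants = [f"contaminant_{i}" for i in range(1, 7)]      # 6
concentrations = [f"concentration_{i}" for i in range(1, 3)]  # 2
start_times = [f"start_time_{i}" for i in range(1, 5)]        # 4
profiles = [f"profile_{i}" for i in range(1, 3)]              # 2

# One simulated event per combination, for a single monitoring station.
events = list(product(contaminants, concentrations, start_times, profiles))
print(len(events))  # 96
```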
Simulated Events
• Water quality changes consistent with contamination
scenarios were superimposed on the baseline data
– Empirically modeled from contaminant reaction
studies
[Figure: chlorine (mg/L) vs. time of day, 4:00 to 16:00. Series: Baseline Chlorine Data; Chlorine Data with Event Superimposed]
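The idea of superimposing an event on baseline data can be illustrated with a short sketch. This is not the Challenge's method: the actual changes were empirically modeled from contaminant reaction studies, whereas the function name, the multiplicative profile, and the sample values below are all made up for illustration.

```python
def superimpose_event(baseline, start, profile):
    """Return a copy of baseline with an event applied from index start.

    profile[i] is an assumed fractional change at step i of the event
    (e.g. -0.5 means a 50% drop). Values outside the event are unchanged.
    """
    modified = list(baseline)
    for offset, change in enumerate(profile):
        idx = start + offset
        if idx >= len(modified):
            break  # event runs past the end of the data
        # Clamp at zero: a concentration cannot be negative.
        modified[idx] = max(0.0, baseline[idx] * (1.0 + change))
    return modified


# Made-up baseline chlorine readings (mg/L), one per polling interval.
baseline_chlorine = [1.0, 1.0, 0.9, 1.0, 1.1, 1.0, 1.0, 0.9]
# Hypothetical event profile: ramp down, hold, ramp back up.
profile = [-0.25, -0.5, -0.5, -0.25]
with_event = superimpose_event(baseline_chlorine, start=2, profile=profile)
```

Because the event is layered on real baseline data, the normal variability of the station is preserved before, during, and after the event.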
Baseline Events
• All testing data was methodically analyzed to identify
periods in the baseline data where water quality was abnormal.
For analysis, alarms during these periods were considered
true detections, not false alarms
– 16 total “baseline events” were identified in the testing data
[Figure: TOC and turbidity, 11/10 to 11/15]
Bad Quality Alarms
• The testing data was also processed to identify
periods of bad quality data.
• Alarms during these periods were considered false
alarms, but were listed separately because a utility
could easily identify the cause of these alarms
[Figure: TOC, 11/10 1:40 to 11/13 9:40]
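The alarm categories described on the Baseline Events and Bad Quality Alarms slides can be sketched as a simple per-timestep tally. The slides do not say whether alarms were counted per timestep or per event, nor how overlapping periods were handled. The function name, the per-timestep counting, and the precedence of event periods over bad-quality periods are therefore assumptions.

```python
def classify_alarms(alarms, event_periods, bad_quality_periods):
    """Count alarm timesteps by category.

    alarms: list of bool, one per timestep (True = abnormal).
    event_periods: (start, end) index pairs, end exclusive, covering
        simulated contamination events and baseline events.
    bad_quality_periods: (start, end) index pairs, end exclusive.
    """
    def in_any(t, periods):
        return any(start <= t < end for start, end in periods)

    counts = {"true_detection": 0, "bad_quality_alarm": 0, "other_false_alarm": 0}
    for t, alarm in enumerate(alarms):
        if not alarm:
            continue
        if in_any(t, event_periods):
            counts["true_detection"] += 1
        elif in_any(t, bad_quality_periods):
            # Still a false alarm, but listed separately.
            counts["bad_quality_alarm"] += 1
        else:
            counts["other_false_alarm"] += 1
    return counts


# Made-up example: one event at timesteps 3-4, bad quality data at timestep 6.
alarms = [False, True, False, True, True, False, True, False]
counts = classify_alarms(alarms, event_periods=[(3, 5)], bad_quality_periods=[(6, 7)])
print(counts)
```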
EDS Challenge Participants
• Open to any team with software capable of analyzing
time series data and producing normal / abnormal
output for each timestep
• Participating EDS developers had to submit their
“trained” software tool to EPA for testing
– Originally 16 teams registered
– 8 withdrew due to limited resources and / or
unwillingness to adhere to requirements
– 3 withdrew due to poor performance
EDS Challenge Participants
• 5 EDS tools participated (in alphabetical order by
EDS name):
– BlueBox, WhiteWater Security*
– CANARY, Sandia National Labs / USEPA
– Event Monitor, the Hach Company*
– Moni::tool, S::CAN
– OptiEDS, OptiWater (Elad Salomons)
* Due to issues with running the software in off-line mode, BlueBox was
only run on the 3 stations with longer polling intervals
* Hach chose to analyze only the 3 sites with the 2-minute polling
intervals
Preliminary Results
Challenge Results
• This presentation gives a “snapshot” of the results
obtained in the EDS Challenge. Formal, more
in-depth results will be published in a report and a
journal article.
• As these results are preliminary, EDS names will not
be used in this presentation of results.
Performance Notes
• This truly was a Challenge, and utilities would likely
experience better performance in their implementation.
– Stations with complex WQ were intentionally chosen to challenge
the tools (these are “worst case” stations).
– Some of the simulated events were intentionally hard to detect.
• Participants had to significantly modify their tool to run in
off-line mode, as required by the Challenge.
– BlueBox, Event Monitor, and Moni::tool use real-time user feedback
to determine future alarming.
• Also, all participants have updated and enhanced their
software since these results were generated.
Alarms Summary, Station A
• Polling interval: 5 minutes
• At a distribution system point of entry. Entirely
different water source and quality depending on the
status of three co-located pumps. The on/off status
for each of these pumps was provided in data.
Alarms Summary, Station B
• Polling interval: 20 minutes
• At the connection where a ground storage tank is
filled, which provides water to the utility’s large
customers. Flow through the station is intermittent
based on demand, and there are clear daily patterns.
Alarms Summary, Station D
• Polling interval: 2 minutes
• At a large reservoir. WQ is affected by operations at
co-located pump station and by two upstream pump
stations. A variety of operations data was provided,
but there is no exact correlation between these and
the station’s WQ changes
Alarms Summary, Station E
• Polling interval: 10 minutes
• At a reservoir. Water sources to the station include
two different mains and the reservoir. Many
operational data tags were provided including WQ
from the main, reservoir level and flow, and pump
status for 3 non-co-located pumps.
Alarms Summary, Station F
• Polling interval: 2 minutes
• Located at a large elevated tank. Tank levels and co-
located pump statuses were provided.
Alarms Summary, Station G
• Polling interval: 2 minutes
• At a major pumping station, connected to a bi-
directional line that runs between the reservoir and
pump station. Pumping operations cause “blips” in
the data that look like sensor errors. Reservoir flow
and pump statuses provided in data.
Path Forward
Additional Analyses
• Further analysis to be done by EPA, including:
– Performance analysis using multiple alarm
threshold settings
>Production of ROC curves
– Evaluation of how setpoint alarms would perform
– Consideration of additional metrics such as the
accuracy of outputted trigger parameters, detection
time, alarm length, output variability…
– Analysis of the impact of each event characteristic
on detection: breakdown of results by monitoring
location, contaminant, contaminant concentration,
start time, and event profile
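As an illustration of the planned multi-threshold analysis, the sketch below computes ROC points from a per-timestep abnormality score. The scores, the threshold values, the function name, and the per-timestep definition of detection and false alarm rates are all assumptions, and the code assumes the data contains at least one event timestep and one normal timestep.

```python
def roc_points(scores, is_event, thresholds):
    """Return a (false_alarm_rate, detection_rate) pair per threshold.

    scores: per-timestep abnormality score from a hypothetical EDS.
    is_event: per-timestep bool, True where an event is present.
    An alarm is raised when score >= threshold.
    """
    n_event = sum(is_event)
    n_normal = len(is_event) - n_event
    points = []
    for thr in thresholds:
        detected = sum(1 for s, e in zip(scores, is_event) if e and s >= thr)
        false_alarms = sum(1 for s, e in zip(scores, is_event) if not e and s >= thr)
        points.append((false_alarms / n_normal, detected / n_event))
    return points


# Made-up scores and event labels for eight timesteps.
scores = [0.1, 0.2, 0.8, 0.9, 0.3, 0.7, 0.1, 0.4]
is_event = [False, False, True, True, False, True, False, False]
points = roc_points(scores, is_event, thresholds=[0.25, 0.5, 0.75])
```

Plotting detection rate against false alarm rate for each threshold gives the ROC curve. A lower threshold moves toward more detections and more false alarms.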
Acknowledgements
• The patience and cooperation of the Challenge
Participants: the Hach Company, OptiWater (Elad
Salomons), Sandia National Labs, S::CAN, and
WhiteWater Security
• Support from Erin Cummings, Zheng Jie, Reese
Johnson, Raja Kadiyala, and Adam Pollak, CH2M
HILL