![Page 1: HawkEye: A Real-Time Anomaly Detection System](https://reader035.vdocuments.mx/reader035/viewer/2022062406/55b76d79bb61eb20248b4696/html5/thumbnails/1.jpg)
HawkEye: A Real-Time Anomaly Detection System
Satnam Singh, PhD
![Page 2: HawkEye: A Real-Time Anomaly Detection System](https://reader035.vdocuments.mx/reader035/viewer/2022062406/55b76d79bb61eb20248b4696/html5/thumbnails/2.jpg)
Anomaly Types: Point Anomalies
• Data points that deviate significantly from the baseline are considered outliers (point anomalies)
• Detection Strategy: Use a classification model or parametric model to learn the baseline, then flag deviations from the baseline as anomalous
• Anomaly Detectors: Parametric models (LogNormal, Poisson), MPCA, One-Class SVM
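As an illustration of the parametric approach, the sketch below fits a Poisson baseline to historical counts and flags a new count whose upper-tail probability falls below a cutoff. It is a minimal example under stated assumptions, not the detector used in HawkEye: the function names, the one-sided test, and the cutoff `alpha = 0.001` are all hypothetical.

```python
import math


def fit_poisson_rate(baseline_counts):
    """Estimate the Poisson rate as the mean of the baseline counts."""
    return sum(baseline_counts) / len(baseline_counts)


def poisson_upper_tail(k, rate):
    """Return P(X >= k) for X ~ Poisson(rate)."""
    cdf_below_k = sum(
        math.exp(-rate) * rate ** i / math.factorial(i) for i in range(k)
    )
    return max(0.0, 1.0 - cdf_below_k)


def is_point_anomaly(count, rate, alpha=0.001):
    """Flag a count that is improbably high under the baseline (assumed cutoff)."""
    return poisson_upper_tail(count, rate) < alpha


# Hypothetical baseline of per-interval event counts.
rate = fit_poisson_rate([4, 5, 6, 5, 4, 6, 5, 5])
print(rate, is_point_anomaly(6, rate), is_point_anomaly(20, rate))
```

Only unusually high counts are flagged here; a two-sided variant would also test the lower tail.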
![Page 3: HawkEye: A Real-Time Anomaly Detection System](https://reader035.vdocuments.mx/reader035/viewer/2022062406/55b76d79bb61eb20248b4696/html5/thumbnails/3.jpg)
Collective Anomalies
• Anomalies are sequences of data points, typically measured at successive times spaced at (often uniform) intervals
• Detection Strategy: Compute an anomaly score sequentially (e.g. the likelihood ratio of the anomalous to the baseline probability distribution) and declare an anomaly whenever it crosses a threshold
• Anomaly Detectors: Statistical change detection techniques such as CUSUM, Page’s Test, GLRT
![Page 4: HawkEye: A Real-Time Anomaly Detection System](https://reader035.vdocuments.mx/reader035/viewer/2022062406/55b76d79bb61eb20248b4696/html5/thumbnails/4.jpg)
Contextual Anomalies
• Data items are considered anomalous in a specific context but not in other situations
• Use the context to either raise or suppress anomalies
• Anomaly Detectors: Seasonality detection using multiple models, Time series modeling
[Figure: number of requests made on a retail website over successive Tuesdays, with Normal and Anomaly points marked]
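One simple way to make a detector context-aware is to keep a separate baseline per context (for example, per day of week) and score each new value only against the baseline of its own context. The sketch below assumes a mean and standard deviation baseline and a z-score threshold of 3; these choices and all names are illustrative and are not taken from the slides.

```python
from collections import defaultdict
from statistics import mean, stdev


def fit_context_baselines(observations):
    """Build a (mean, stdev) baseline per context from (context, value) pairs."""
    grouped = defaultdict(list)
    for context, value in observations:
        grouped[context].append(value)
    # stdev needs at least two samples, so sparse contexts are skipped.
    return {c: (mean(v), stdev(v)) for c, v in grouped.items() if len(v) >= 2}


def is_contextual_anomaly(context, value, baselines, z_threshold=3.0):
    """Score a value against the baseline of its own context only."""
    mu, sigma = baselines[context]
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > z_threshold


# Hypothetical request counts: busy Tuesdays, quiet Sundays.
history = [("Tue", v) for v in (100, 110, 90, 105, 95)]
history += [("Sun", v) for v in (10, 12, 8, 11, 9)]
baselines = fit_context_baselines(history)
print(is_contextual_anomaly("Tue", 100, baselines))  # normal for a Tuesday
print(is_contextual_anomaly("Sun", 100, baselines))  # unusual for a Sunday
```

The same value of 100 requests is raised on Sunday and suppressed on Tuesday, which is the behaviour the slide describes.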
![Page 5: HawkEye: A Real-Time Anomaly Detection System](https://reader035.vdocuments.mx/reader035/viewer/2022062406/55b76d79bb61eb20248b4696/html5/thumbnails/5.jpg)
Data Stream Complexity
• Data stream complexity varies from simple to complex
• Simple data streams do not need complex algorithms
• Use algorithms to estimate the complexity of each data stream
![Page 6: HawkEye: A Real-Time Anomaly Detection System](https://reader035.vdocuments.mx/reader035/viewer/2022062406/55b76d79bb61eb20248b4696/html5/thumbnails/6.jpg)
Data Stream Complexity Estimator
• (a) Compute data stream summary statistics (e.g. percentage of zero values, max value, non-zero values, entropy, etc.) of each job over the entire training data
• (b) Perform anomaly detection using the most complex anomaly detection pipeline on all the job streams of the training data
• (c) Use an anomaly-count-based rule to estimate data complexity. The following rule was used for complexity estimation:
  - Constant-valued data stream: anomaly count of 0%
  - Simple: anomaly count less than 0.5%
  - Medium Complex: anomaly count between 0.5% and 2%
  - Highly Complex: anomaly count more than 2%
• (d) Use the summary statistics computed in step (a) as features and the complexity computed in step (c) as class labels. Feed these features and class labels to a decision tree.
• (e) Using the decision tree (information gain heuristic), identify the features that are informative for classification. We found that the decision tree achieves nearly 84% accuracy. From the decision tree we derive the following rule to automatically classify any job stream:

    if entropy == 0: Level-1 “Constant-valued”
    elif entropy <= 0.42: Level-2 “Simple”
    elif entropy > 0.42 and entropy < 0.75:
        if zero percentage <= 97: Level-3 “Medium Complex”
        elif zero percentage > 97: Level-2 “Simple”
    elif entropy > 0.75:
        if zero percentage <= 97: Level-4 “Highly Complex”
        elif zero percentage > 97: Level-2 “Simple”
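The labeling rule and the derived decision-tree rule above can be sketched in Python as follows. The function names are hypothetical. The slide does not say how the boundary values (an anomaly count of exactly 0.5% or 2%, an entropy of exactly 0.75) are handled or how entropy is computed, so the boundary handling below is an assumption and entropy is taken as a precomputed input.

```python
def label_from_anomaly_rate(anomaly_pct):
    """Map the anomaly count (as a percentage of datums) to a complexity label."""
    if anomaly_pct == 0:
        return "Constant-valued"
    if anomaly_pct < 0.5:
        return "Simple"
    if anomaly_pct <= 2:  # assumption: both boundaries count as Medium Complex
        return "Medium Complex"
    return "Highly Complex"


def classify_stream(entropy, zero_pct):
    """Apply the decision-tree rule; returns (level, label)."""
    if entropy == 0:
        return 1, "Constant-valued"
    if entropy <= 0.42:
        return 2, "Simple"
    if zero_pct > 97:
        # Mostly-zero streams are treated as Simple in both higher entropy bands.
        return 2, "Simple"
    if entropy < 0.75:
        return 3, "Medium Complex"
    # Assumption: entropy == 0.75 falls in the upper band (the slide omits it).
    return 4, "Highly Complex"


print(classify_stream(0.5, 50))   # (3, 'Medium Complex')
print(classify_stream(0.9, 98))   # (2, 'Simple')
```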
![Page 7: HawkEye: A Real-Time Anomaly Detection System](https://reader035.vdocuments.mx/reader035/viewer/2022062406/55b76d79bb61eb20248b4696/html5/thumbnails/7.jpg)
HawkEye: Anomaly Detection Framework
1. Data Stream Complexity Estimator
   • Summary Statistics
   • Entropy
2. Automated Baselining & Anomaly Detector Selection
   • Parametric Models
   • Page’s Test
3. Seasonality Detection and Prediction
4. Anomaly Suppression and Fusion
Other diagram labels: Metrics data, Alerts DB, User Dashboard
![Page 8: HawkEye: A Real-Time Anomaly Detection System](https://reader035.vdocuments.mx/reader035/viewer/2022062406/55b76d79bb61eb20248b4696/html5/thumbnails/8.jpg)
Sliding Window Size Selection
[Figure: sliding window size selection, with panels for Level 2, Level 3, and Level 4]
![Page 9: HawkEye: A Real-Time Anomaly Detection System](https://reader035.vdocuments.mx/reader035/viewer/2022062406/55b76d79bb61eb20248b4696/html5/thumbnails/9.jpg)
Anomaly Score: Statistics-based Detector
![Page 10: HawkEye: A Real-Time Anomaly Detection System](https://reader035.vdocuments.mx/reader035/viewer/2022062406/55b76d79bb61eb20248b4696/html5/thumbnails/10.jpg)
Page’s Test: Detect Collective Anomalies
An efficient change detection scheme
Use Page’s test to detect a switch from ordinary noise-only observations to those which look similar to the models
A change detection problem is one in which the distribution of the observations differs before and after an unknown time $n_0$; we want to detect the change, if it exists, as soon as possible.
Find the stopping time:
$$N = \arg\min_n \{ S_n \ge h \}$$
Test statistic:
$$S_n = \max\{0,\; S_{n-1} + g(x_n)\}$$
The test statistic $S_n$ is “clamped” at zero.
Log likelihood ratio:
$$g(x_n) = \ln \frac{f_K(x_n)}{f_H(x_n)}$$
[Figure: process begins at t = 75; detection declared at t = 80; h = 30]
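The recursion on this slide can be sketched in a few lines of Python. The example assumes that the pre-change density (H) and the post-change density (K) are Gaussians with a common standard deviation, which the slides do not state; the function names and the synthetic data are likewise illustrative and do not reproduce the plotted example.

```python
def gaussian_llr(x, mu_h, mu_k, sigma):
    """Log likelihood ratio ln(f_K(x) / f_H(x)) for two Gaussians with equal sigma."""
    return (mu_k - mu_h) / sigma ** 2 * (x - (mu_h + mu_k) / 2.0)


def pages_test(llr_values, h):
    """Run Page's test; return (first index n with S_n >= h or None, all S_n)."""
    s = 0.0
    scores = []
    stop = None
    for n, g in enumerate(llr_values):
        s = max(0.0, s + g)  # the statistic is clamped at zero
        scores.append(s)
        if stop is None and s >= h:
            stop = n
    return stop, scores


# Synthetic stream: 75 baseline samples at 0.0, then a shift to 4.0.
stream = [0.0] * 75 + [4.0] * 10
llr = [gaussian_llr(x, mu_h=0.0, mu_k=4.0, sigma=1.0) for x in stream]
stop, scores = pages_test(llr, h=30)
print(stop)  # 0-based index of the first threshold crossing
```

Because the statistic is clamped at zero, a long run of baseline data does not build up a negative "debt" that would delay detection once the change occurs.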
![Page 11: HawkEye: A Real-Time Anomaly Detection System](https://reader035.vdocuments.mx/reader035/viewer/2022062406/55b76d79bb61eb20248b4696/html5/thumbnails/11.jpg)
Anomaly Score: Page’s Test Detector
![Page 12: HawkEye: A Real-Time Anomaly Detection System](https://reader035.vdocuments.mx/reader035/viewer/2022062406/55b76d79bb61eb20248b4696/html5/thumbnails/12.jpg)
Seasonality Detection and Prediction
![Page 13: HawkEye: A Real-Time Anomaly Detection System](https://reader035.vdocuments.mx/reader035/viewer/2022062406/55b76d79bb61eb20248b4696/html5/thumbnails/13.jpg)
Anomaly Detection Results: Historical Statistics-Detector
Anomalies in red. Anomaly count: 170
Anomaly Score Distribution
![Page 14: HawkEye: A Real-Time Anomaly Detection System](https://reader035.vdocuments.mx/reader035/viewer/2022062406/55b76d79bb61eb20248b4696/html5/thumbnails/14.jpg)
Anomaly Detection Results: Page’s Test
Anomalies in red. Anomaly count: 6
Anomaly Score Distribution
![Page 15: HawkEye: A Real-Time Anomaly Detection System](https://reader035.vdocuments.mx/reader035/viewer/2022062406/55b76d79bb61eb20248b4696/html5/thumbnails/15.jpg)
Anomaly Detection Results: Historical Statistics-Detector
Anomalies in red. Anomaly count: 159
Anomaly Score Distribution
![Page 16: HawkEye: A Real-Time Anomaly Detection System](https://reader035.vdocuments.mx/reader035/viewer/2022062406/55b76d79bb61eb20248b4696/html5/thumbnails/16.jpg)
Anomaly Detection Results: Page’s Test
Anomalies in red. Anomaly count: 16
Anomaly Score Distribution
![Page 17: HawkEye: A Real-Time Anomaly Detection System](https://reader035.vdocuments.mx/reader035/viewer/2022062406/55b76d79bb61eb20248b4696/html5/thumbnails/17.jpg)
Anomaly Detection Results
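| | System 1: Historical statistics-based detector | System 1: Page’s Test | System 2: Historical statistics-based detector | System 2: Page’s Test |
|---|---|---|---|---|
| Total no. of datums (no. of jobs × datums per job) | 155 × 14050 = 2177750 | 155 × 14050 = 2177750 | 151 × 14050 = 2121550 | 151 × 14050 = 2121550 |
| No. of missing datums (no. of jobs × missing datums) | 155 × 133 = 20615 | 155 × 133 = 20615 | 151 × 18 = 2718 | 151 × 18 = 2718 |
| No. of valid datums | 2157135 | 2157135 | 2118832 | 2118832 |
| No. of anomalies | 28832 | 9793 | 32054 | 13197 |
| % Anomalies | 1.33% | 0.454% | 1.512% | 0.623% |
| Computation time taken | 2 mins | 4 mins | 2 mins | 5 mins |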
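The derived rows of the results table (valid datums and % anomalies) follow from the job counts and anomaly counts by simple arithmetic, which the short script below recomputes. The dictionary layout and names are illustrative; the numbers are the ones reported on the slide. The recomputed percentages agree with the reported ones to within about 0.01 percentage points.

```python
SYSTEMS = {
    "System 1": {
        "jobs": 155, "datums_per_job": 14050, "missing_per_job": 133,
        "anomalies": {"Historical statistics-based": 28832, "Page's Test": 9793},
    },
    "System 2": {
        "jobs": 151, "datums_per_job": 14050, "missing_per_job": 18,
        "anomalies": {"Historical statistics-based": 32054, "Page's Test": 13197},
    },
}


def summarize(cfg):
    """Return (total, missing, valid, {detector: % anomalies among valid datums})."""
    total = cfg["jobs"] * cfg["datums_per_job"]
    missing = cfg["jobs"] * cfg["missing_per_job"]
    valid = total - missing
    pct = {name: 100.0 * count / valid for name, count in cfg["anomalies"].items()}
    return total, missing, valid, pct


for system, cfg in SYSTEMS.items():
    total, missing, valid, pct = summarize(cfg)
    print(system, total, missing, valid)
    for detector, value in pct.items():
        print(f"  {detector}: {value:.3f}%")
```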