Detecting Hacks: Anomaly Detection on Networking Data
Post on 28-Jul-2015
1 © 2010 Cisco and/or its affiliates. All rights reserved.
Detecting Hacks: Anomaly Detection on Networking Data
James Sirota (@JamesSirota), Lead Data Scientist – Managed Threat Defense
Chester Parrott (@ParrottSquawk), Data Scientist – Managed Threat Defense
June 2015
© 2015 Cisco and/or its affiliates. All rights reserved. 2
In the next few minutes…
• Defense in Depth for Big Data
• Network Anomaly Detection Overview
• Volume Anomaly Detection
• Feature Anomaly Detection
• Model Architecture
• Deployment on OpenSOC Platform
• Questions
Who are we?
Big Data Security Analytics
Open Source
Managed Service
The New Defense-In-Depth
Defense Strategy
Static Sandboxing
Threat Intel Feeds
Rules Engines
Volume-Based
Feature-Based
NLP-Based
Token Clustering
User Profiling
Asset Profiling
Interaction Profiling
Dynamic Sandboxing
Malware Classifiers
Script Classifiers
Perimeter Monitoring
Web Scraping
Soc. Media Analytics
Model Validators
Training Set Generation
Signature Matching
Rules-Based Matching
Network Anomaly Detection
Log Anomaly Detection
Behavioral Anomaly Detection
Malware Family
Script Family
Scraping
Honeypots
Misuse Detection
Intrusion Detection
Supervised Class.
Look-Ahead Analytics
Legacy Mindset Generic Threats Targeted Threats Future Threats
Network Anomaly Detection
• Volume-Based: Anomalous Traffic Patterns
  • Statistical Process Control: 3-sigma algorithms
  • Time Series Forecasting: Exponential Smoothing, ARIMA
  • Frequency Domain: Fast Fourier Transform, Wavelets
• Feature-Based: Interrelationships between Features
  • Information Theory: Entropy
  • Principal Component Analysis: Subspace
  • Sketch-Based: Heavy Hitters, Set Cardinality
  • Probability Models: Markov Models, Bayes Nets
  • Unsupervised ML: Clustering, Density, Proximity
Volume-Based vs. Feature-Based
Telemetry | Volume-Based | Feature-Based
Encrypted Traffic (Raw Packet) | YES | NO
Raw Packet + Header Metadata | YES | YES
Machine Exhaust Data | YES (online) | NO
DPI Metadata | NO | YES
Netflow | YES | YES
Enrichment Metadata | YES | YES
Application Logs | YES | YES
Other Alerts | NO* | YES
Anomaly Detection: 3-Phase Process
Unstructured Data
Identify
Anomaly
Classify
Alert
Examine + Reinforce
Training Set Historical Context
Phase 1: Identify
Unstructured Data
Understanding of Normal
Anomaly A
Anomaly B
Anomaly C
Anomaly (N)
Phase 2: Classify
Full Packet Telemetry DPI Telemetry Telemetry (N) Outcome
Volume Anomaly
Entropy Anomaly
Feature (x)
Heavy Hitters Anomaly
Volume Anomaly
Cardinality Anomaly
Feature (x)
Protocol Anomaly
Feature (x)
Anomaly (A) Anomaly (B) Anomaly (N) Class Label
x x x x x x x Port Scan
x x x x x False Positive
x x x x Network Scan
x x x x Port Scan
x x x x False Positive
x x x x x x DDoS
Phase 3: Examine + Reinforce
Full Packet Telemetry DPI Telemetry Telemetry (N) Outcome
Volume Anomaly
Entropy Anomaly
Feature (x)
Heavy Hitters Anomaly
Volume Anomaly
Cardinality Anomaly
Feature (x)
Protocol Anomaly
Feature (x)
Anomaly (A) Anomaly (B) Anomaly (N) Class Label
x x x x x x x Port Scan
x x x x x False Positive
x x x x Network Scan
x x x x False Positive
x x x x x x DDoS
x x x x x x False Positive
x x x x x x False Positive
x x x x False Positive
x x x x x x DDoS
Basic Anomalies
Anomaly | Definition
Alpha Flows | Large volume point-to-point flows
DoS | Denial of service (distributed or single source)
Flash Crowd | Large volume of traffic to a single destination from a large number of sources
Port Scan | Probe to many destination ports on a small number of destination addresses
Network Scan | Probe to many destination addresses on a small number of destination ports
Outage Events | Traffic shifts because of equipment failures or maintenance
Plateau Behavior | Behavior caused by traffic reaching environmental limits
Point-to-Multipoint | Traffic from a single source to many destinations, e.g., content distribution
Worms | Scanning by worms for vulnerable hosts, which is a special case of network scan
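The Port Scan and Network Scan definitions above differ only in which dimension fans out, and that can be sketched in a few lines. This is an illustration rather than the presenters' implementation: the flow record shape `(src_ip, dst_ip, dst_port)`, the function name `classify_scans`, and the `fanout` threshold are all assumptions.

```python
from collections import defaultdict

def classify_scans(flows, fanout=10):
    """Label each source by the fan-out of its destination addresses and ports.

    flows: iterable of (src_ip, dst_ip, dst_port) tuples (hypothetical record shape).
    fanout: illustrative threshold for "many"; not a value from the slides.
    """
    dst_ips = defaultdict(set)
    dst_ports = defaultdict(set)
    for src, dst_ip, dst_port in flows:
        dst_ips[src].add(dst_ip)
        dst_ports[src].add(dst_port)

    labels = {}
    for src in dst_ips:
        many_ips = len(dst_ips[src]) >= fanout
        many_ports = len(dst_ports[src]) >= fanout
        if many_ports and not many_ips:
            labels[src] = "Port Scan"      # many ports, few addresses
        elif many_ips and not many_ports:
            labels[src] = "Network Scan"   # many addresses, few ports
        else:
            labels[src] = "Unclassified"
    return labels

# Synthetic traffic: one port scanner, one network scanner, one ordinary client.
port_scan = [("10.0.0.5", "10.0.1.1", p) for p in range(1, 101)]
net_scan = [("10.0.0.6", f"10.0.2.{i}", 22) for i in range(1, 101)]
normal = [("10.0.0.7", "10.0.1.1", 443)] * 50
labels = classify_scans(port_scan + net_scan + normal)
```

A source that fans out on both dimensions is left unclassified here; separating worms or DDoS from that bucket would need more features than this sketch uses.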
Batch Analytics Normalcy Models
Implementation
MAP MAP MAP
Time Series DB
Key: assetID-metricID-Bin
RED RED RED
Asset Bin Value
Server 1 15 5pt *
Server 2 15 5pt *
Server (N) 15 5pt *
assetID-metricID-Bin : 5pt
Telemetry
Anomaly?
* 5-point summary (5pt):
1. the sample minimum (smallest observation)
2. the lower quartile or first quartile
3. the median (middle value)
4. the upper quartile or third quartile
5. the sample maximum (largest observation)
Table Name: Metric ID (Cumulative Volume)
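A minimal sketch of the normalcy model described above: a 5-point summary is computed per `assetID-metricID-Bin` key, and incoming telemetry is compared against it. The slides do not say how the comparison is made, so the Tukey-fence rule with `k=1.5`, the key string, and the sample history are assumptions.

```python
import statistics

def five_point_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of observations."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))

def is_anomalous(value, summary, k=1.5):
    """Flag values outside the fences Q1 - k*IQR and Q3 + k*IQR.

    The fence rule and k=1.5 are assumptions; the slides only show that the
    summary is stored per key and checked against telemetry.
    """
    _, q1, _, q3, _ = summary
    iqr = q3 - q1
    return value < q1 - k * iqr or value > q3 + k * iqr

# Hypothetical history for one key of the form assetID-metricID-Bin.
history = {"server1-volume-15": [100, 110, 120, 130, 140, 150, 160, 170, 180]}
store = {key: five_point_summary(vals) for key, vals in history.items()}

summary = store["server1-volume-15"]
flag_normal = is_anomalous(150, summary)
flag_spike = is_anomalous(500, summary)
```

In the batch design on the slide, the dictionary comprehension stands in for the map and reduce stages, and `store` stands in for the time series DB table.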
Batch Analytics Forecasting Models
Forecast
Forecasting Algorithm (ARIMA/Holt-Winters, …)
Implementation
MAP MAP MAP
Time Series DB
Key: assetID-metricID-Bin
RED RED RED
Key: assetID-metricID-Bin: [Expected | STD]
Telemetry
Anomaly?
Asset Bin Value
Server 1 15 EX | STD
Server 2 15 EX | STD
Server (N) 15 EX | STD
Table Name: Metric ID (Cumulative Volume)
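The `[Expected | STD]` value stored per key can be illustrated as follows. The slides name ARIMA and Holt-Winters as forecasting algorithms; this sketch swaps in simple exponential smoothing, a much simpler relative, purely to keep the example short. The smoothing factor `alpha`, the multiplier `k`, the key string, and the history values are assumptions.

```python
import statistics

def expected_and_std(series, alpha=0.5):
    """One-step-ahead forecast by simple exponential smoothing.

    Returns (expected, std), where std is the population standard deviation
    of the in-sample one-step-ahead forecast errors.
    """
    level = series[0]
    errors = []
    for x in series[1:]:
        errors.append(x - level)                 # error of the forecast made before seeing x
        level = alpha * x + (1 - alpha) * level  # update the smoothed level
    std = statistics.pstdev(errors) if errors else 0.0
    return level, std

def is_anomaly(observed, expected, std, k=3.0):
    """True when the observation deviates from the forecast by more than k * std."""
    return abs(observed - expected) > k * std

# Hypothetical batch step: pre-compute (expected, std) per assetID-metricID-Bin key.
history = {"server1-volume-15": [100.0, 104.0, 98.0, 102.0, 100.0, 96.0]}
cache = {key: expected_and_std(vals) for key, vals in history.items()}

# Hypothetical online step: look up the key and score incoming telemetry.
expected, std = cache["server1-volume-15"]
quiet = is_anomaly(100.0, expected, std)
burst = is_anomaly(150.0, expected, std)
```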
Batch Model Deployment
Step 1: Bootstrap: Stream Data
Time Series DB
Unstructured Data
OpenSOC OpenSOC JSON
Step 2: Pre-Compute Expected Values (Batch)
Timestamp
HIVE
Time Series DB MR/Spark MR/Spark MR/Spark
Step 3: Generate Alerts (Online)
Unstructured Data
OpenSOC
Expected Values Reference Cache
Time Series DB
OpenSOC JSON
Timestamp
HIVE
Alert ES
Expected Values Reference Cache
Online Analytics Data Preparation
Deseasonalizer AV CMA RAT UF RF DV
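The column labels on this slide are not expanded in the transcript. One plausible reading, and it is only an assumption, is the classical ratio-to-moving-average method: actual value (AV), centered moving average (CMA), ratio of the two (RAT), per-position seasonal factors (UF, RF), and deseasonalized value (DV). A minimal sketch under that reading, for an even seasonal period:

```python
def deseasonalize(values, period):
    """Ratio-to-moving-average deseasonalization for an even seasonal period.

    Returns (factors, deseasonalized): one multiplicative seasonal factor per
    position in the period, and the input divided by its factor.
    Requires enough data that every seasonal position has at least one ratio.
    """
    n = len(values)
    half = period // 2
    # Centered moving average: half weight on the two end points of the window.
    cma = [None] * n
    for t in range(half, n - half):
        window = values[t - half:t + half + 1]
        cma[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period
    # Ratio of actual value to CMA, grouped by position within the period.
    ratios = [[] for _ in range(period)]
    for t in range(n):
        if cma[t] is not None:
            ratios[t % period].append(values[t] / cma[t])
    # Mean ratio per position, rescaled so the factors average to 1.
    unadjusted = [sum(r) / len(r) for r in ratios]
    scale = period / sum(unadjusted)
    factors = [u * scale for u in unadjusted]
    deseasonalized = [values[t] / factors[t % period] for t in range(n)]
    return factors, deseasonalized

# Synthetic series: a flat level of 100 with a repeating 4-step seasonal pattern.
series = [50.0, 100.0, 150.0, 100.0] * 3
factors, flat = deseasonalize(series, period=4)
```

On this synthetic input the seasonal pattern is removed entirely, leaving a flat series that a volume detector can threshold directly.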
Online Analytics Other things to check for
Trend:
Seasonal Variability:
Evolution of Regularities:
Online Processing
3-Sigma Algorithms
Micro Forecasting
Histogram Bins
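A 3-sigma check can run online if the mean and variance are maintained incrementally. The sketch below uses Welford's update for that; the class name, the `warmup` count, and the sample stream are assumptions, and the slides do not specify which running-statistics method is used.

```python
import math

class ThreeSigmaDetector:
    """Online k-sigma detector backed by Welford's running mean and variance."""

    def __init__(self, k=3.0, warmup=5):
        self.k = k
        self.warmup = warmup   # observations required before scoring starts
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0          # running sum of squared deviations from the mean

    def update(self, x):
        """Score x against the statistics seen so far, then fold x in."""
        anomalous = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            anomalous = abs(x - self.mean) > self.k * std
        # Welford update of count, mean, and squared deviations.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return anomalous

detector = ThreeSigmaDetector()
flags = [detector.update(x) for x in [10, 11, 9, 10, 11, 9, 10, 50]]
```

Flagged values are still folded into the statistics here; a deployment might prefer to exclude them so that one spike does not widen the band for later observations.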
Frequency Domain
High
• Trendless
• Noise
• Spikes represent Anomalies
Medium
• Flatter
• Finer-grained Trends
Low
• Seasonal & ‘Peaky’
• Weekly/Daily Trends
Frequency Domain – Wavelet Separation
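The original slide here is a figure that did not survive extraction. As a hedged illustration of the idea, the sketch below separates a signal into frequency bands with an unnormalized Haar wavelet (pairwise averages and half-differences). The choice of the Haar wavelet and the sample signal are assumptions; the slides do not say which wavelet was used.

```python
def haar_step(signal):
    """One Haar level: pairwise averages (approximation) and half-differences (detail)."""
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return approx, detail

def haar_decompose(signal, levels):
    """Return (approximation, [detail_level_1, ..., detail_level_n]).

    len(signal) must be divisible by 2**levels. Level 1 holds the highest
    frequencies; the final approximation holds the lowest.
    """
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details

# Flat traffic with a single spike at index 5.
signal = [4, 4, 4, 4, 4, 12, 4, 4]
approx, details = haar_decompose(signal, levels=2)
```

The spike shows up as the one non-zero coefficient in the level-1 detail band, which matches the high-frequency band described on the previous slide as the place where spikes represent anomalies.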
Online Model Deployment
Time Series DB
Step 1: Bootstrap: Stream Data
Unstructured Data
OpenSOC OpenSOC JSON
Step 2: Generate Adjuster
Timestamp
HIVE
Time Series DB MR/Spark
Adjuster / Decomposer
Step 3: Generate Alerts (Online)
Unstructured Data OpenSOC
Time Series DB
OpenSOC JSON
Timestamp
HIVE
Alert ES
Adjuster Decomposer
MR/Spark MR/Spark
Feature-Based Anomaly Detection
Continuous Numeric Features*
• Continuous Numeric Feature - can take on any value between its minimum value and its maximum value
• Normalization - adjusting values measured on different scales to a notionally common scale
1. Proximity-Based Techniques. Example: K-Nearest Neighbors (KNN)
2. Clustering. Example: K-Means
3. Density-Based
Sample Anomalies Detected
MPS Anomaly | KBps Anomaly | Possible Explanation
TOO HIGH | TOO LOW | Port Scan, Network Scan
TOO HIGH | TOO HIGH | DDoS
TOO LOW | TOO HIGH | Control Traffic Anomaly
OK | OK | No Anomaly
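A minimal sketch of the proximity-based approach: normalize the two features to a common scale, then score each sample by its distance to its k-th nearest neighbor. The sample values, the min-max normalization, and `k=2` are assumptions chosen for illustration.

```python
import math

def min_max_normalize(rows):
    """Scale each column to [0, 1]; constant columns map to 0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        tuple((v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi))
        for row in rows
    ]

def knn_scores(points, k=2):
    """Anomaly score of each point = distance to its k-th nearest neighbor."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

# Hypothetical (MPS, KBps) samples; the last one has high MPS and low KBps.
samples = [(100, 500), (110, 520), (95, 480), (105, 510), (900, 20)]
scores = knn_scores(min_max_normalize(samples))
most_anomalous = scores.index(max(scores))
```

Without the normalization step the feature with the larger numeric range would dominate the distance, which is why the slide lists normalization alongside the techniques.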
Feature-Based Anomaly Detection
Categorical Features*
• Categorical Features - can take on one of a limited, and usually fixed, number of possible values
• Stream Sketch - algorithm produces an approximate answer based on a summary of the data stream in memory
Example: Protocol {UDP|FTP|HTTP|…}, GEO-MET {PHOENIX | DALLAS | LONDON| …}, …
Count-Min (CM) Sketch: number of occurrences of the element in a stream (Heavy Hitters)
Why not count? Protocol: 42k elements per asset. GeoMet: 246k per asset
Time Series DB
Categorical Data
MR
CM Sketch
Heavy Hitters
Table Name: Protocol
Asset Bin Value
Server 1 15 HH
Server 2 15 HH
Server (N) 15 HH
Unstructured Data
CM Sketch
Alert
Expected: {HTTP, UDP, FTP, DNS}
ACTUAL: {DNS, ICMP, HTP, FTP}
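A Count-Min sketch keeps a small grid of counters and answers frequency queries with the minimum over the rows, so it can overestimate but never underestimate. The sketch below is a minimal illustration, assuming SHA-256 based row hashes, a 256 x 4 table, and a synthetic protocol stream; none of these parameters come from the slides.

```python
import hashlib

class CountMinSketch:
    """Approximate frequency counter using depth x width counters."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One hash per row, derived by salting the item with the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # The minimum across rows is the estimate least inflated by collisions.
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))

def heavy_hitters(sketch, candidates, top=4):
    """Return the `top` candidates with the largest estimated counts."""
    return set(sorted(candidates, key=sketch.estimate, reverse=True)[:top])

# Synthetic protocol stream for one asset and time bin.
stream = ["HTTP"] * 500 + ["DNS"] * 300 + ["FTP"] * 200 + ["UDP"] * 100 + ["ICMP"] * 5
sketch = CountMinSketch()
for proto in stream:
    sketch.add(proto)

expected = {"HTTP", "UDP", "FTP", "DNS"}
actual = heavy_hitters(sketch, set(stream), top=4)
unexpected = actual - expected

# The same set comparison applied to the Expected and ACTUAL sets on the slide.
slide_unexpected = {"DNS", "ICMP", "HTP", "FTP"} - expected
```

The alert condition in this sketch is simply a non-empty difference between the actual and expected heavy-hitter sets.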
Feature-Based Anomaly Detection
Feature Ratios
• HyperLogLog: approximating the number of distinct elements in a multiset
• Useful Ratio: # distinct elements / total elements [0-1]
• Digest - structure for accurate on-line accumulation of rank-based statistics such as quantiles and trimmed means
Unstructured Data
HyperLogLog
Distinct
Src_port Dst_port Src_ip Dst_ip
Storm Bolt
Src_port Dst_port Src_ip Dst_ip
Ack Total
Ratios
Digest *
Alert
FEATURE | DT RATIO Anomaly | Possible Reason
SRC_IP | ~1/~0 | Flash Crowd/DDoS
SRC_PORT | ~1/~0 | Failure Probing/App Hijack
DST_IP | ~1/~0 | Network Scan/DDoS
DST_PORT | ~1/~0 | Port Scan/Footprinting
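The distinct-to-total ratio can be sketched with a small HyperLogLog. This is a teaching-sized version, assuming SHA-256 as the hash, `p=10` (1024 registers, roughly 3% standard error), and no large-range correction; a production system would more likely use an existing library. The two DST_PORT streams are synthetic.

```python
import hashlib
import math

class HyperLogLog:
    """Minimal HyperLogLog with 2**p registers (no large-range correction)."""

    def __init__(self, p=10):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        h = int.from_bytes(hashlib.sha256(str(item).encode()).digest()[:8], "big")
        width = 64 - self.p
        idx = h >> width                      # top p bits select a register
        rest = h & ((1 << width) - 1)         # remaining bits
        rank = width - rest.bit_length() + 1  # 1-based position of the leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)  # bias constant, valid for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)  # linear counting for small sets
        return raw

def distinct_ratio(values, p=10):
    """Estimated number of distinct elements divided by total elements."""
    hll = HyperLogLog(p)
    total = 0
    for v in values:
        hll.add(v)
        total += 1
    return hll.count() / total

# Hypothetical DST_PORT streams for one source.
scan_ratio = distinct_ratio(range(1, 5001))   # every packet hits a new port
web_ratio = distinct_ratio([443] * 5000)      # every packet hits the same port
```

A ratio near 1 for DST_PORT corresponds to the Port Scan/Footprinting row in the table above, while a ratio near 0 means almost all traffic is concentrated on a single value.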
Feature-Based Anomaly Detection
Correlation - Information Theory
• Information Theory - study of fundamental limits on signal processing, compression, and storage
• Entropy - a measure of unpredictability of information content
Unstructured Data
Anomaly-Free Training Set
Entropy Summarizer
Entropy
Src_port Dst_port Src_ip Dst_ip Time Bin (n)
Feature | SRC_IP | SRC_PORT | DST_IP | DST_PORT
SRC_IP | - | .95 | .85 | .75
SRC_PORT | - | - | .97 | .76
DST_IP | - | - | - | .98
DST_PORT | - | - | - | -
MR
Alert
Time Bin (n)
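The entropy summarizer can be sketched as the Shannon entropy of each feature's empirical distribution within a time bin. Dividing by the log of the number of distinct values, so the result lies in [0, 1], is one common convention and an assumption here, as are the sample values.

```python
import math
from collections import Counter

def normalized_entropy(values):
    """Shannon entropy of the empirical distribution, scaled to [0, 1].

    0 means every observation has the same value; 1 means the observed
    values are spread uniformly.
    """
    counts = Counter(values)
    total = sum(counts.values())
    if len(counts) <= 1:
        return 0.0
    h = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return h / math.log2(len(counts))

# Hypothetical time bin dominated by a port scan: one target, many ports.
dst_ips = ["10.0.1.1"] * 100
dst_ports = list(range(1, 101))

dst_ip_entropy = normalized_entropy(dst_ips)      # concentrated on one address
dst_port_entropy = normalized_entropy(dst_ports)  # spread evenly over ports
```

Comparing the per-feature entropies of a live time bin with those of the anomaly-free training set is what raises the alert in the diagram above; the comparison threshold is left out of this sketch.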
Principal Component Analysis (PCA)
• Feature Selection Algorithm
• Dimensionality Reduction
• E.g. 4 features
• ServerA (A)
• ServerB (B)
• ServerC (C)
• Cumulative = A + B + C
PCA – Component Construction
Each component is the sum of the four features, each multiplied (X) by its loading:
Component | ServerA Traffic | ServerB Traffic | ServerC Traffic | Cumulative | σ
PC1 | -0.5052803 | -0.4990556 | -0.4816276 | -0.5134882 | 0.0135
PC2 | 0.2801275 | 0.4611079 | -0.8395562 | 0.0636666 | 0.5773
PC3 | 0.6867089 | -0.6988557 | -0.1441834 | 0.138718 | 0.5773
PC4 | -0.4411929 | -0.2234362 | -0.2058916 | 0.8444132 | 0.5773
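The loadings above can be used directly to project a traffic sample onto the components. The sketch below copies the loadings from the slide and assumes the feature order (ServerA, ServerB, ServerC, Cumulative) and an already centered and scaled input sample. The `residual` function follows the subspace idea of scoring what a chosen set of components fails to explain; the sample vector and the choice of which components to keep are hypothetical.

```python
import math

# Loadings from the slide, in the order (ServerA, ServerB, ServerC, Cumulative).
LOADINGS = {
    "PC1": (-0.5052803, -0.4990556, -0.4816276, -0.5134882),
    "PC2": (0.2801275, 0.4611079, -0.8395562, 0.0636666),
    "PC3": (0.6867089, -0.6988557, -0.1441834, 0.138718),
    "PC4": (-0.4411929, -0.2234362, -0.2058916, 0.8444132),
}

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project(sample):
    """Score of the sample on each component (feature x loading, summed)."""
    return {name: dot(sample, w) for name, w in LOADINGS.items()}

def residual(sample, keep):
    """Norm of what remains after reconstructing the sample from `keep` components."""
    recon = [0.0] * len(sample)
    for name in keep:
        score = dot(sample, LOADINGS[name])
        recon = [r + score * w for r, w in zip(recon, LOADINGS[name])]
    return math.sqrt(sum((s - r) ** 2 for s, r in zip(sample, recon)))

# Hypothetical standardized sample.
sample = (1.0, -0.5, 0.3, 0.8)
scores = project(sample)
full_residual = residual(sample, ["PC1", "PC2", "PC3", "PC4"])
partial_residual = residual(sample, ["PC1", "PC2"])
```

The four loading vectors are orthonormal to within rounding, so keeping all of them reconstructs the sample almost exactly; keeping fewer leaves a residual that can serve as an anomaly score.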
PCA – Component Separation
Putting it All Together: OpenSOC
RAW Transform Enrich Alert (Rules-Based)
Enriched
Filter Aggregators
Router Model 1 Scorer
HIVE + Hbase Long-Term Data Store
Flume Kafka Storm
Model 2 Model n
OpenSOC-Streaming
OpenSOC-Aggregation
OpenSOC-ML
SOC Alert Consumers
UI UI UI UI UI Web Services
Secure Gateway Services
External Alert Consumers
Big Data Stores
Elastic Search Real-Time Index and Search
Hbase OpenTSDB Titan Graph
Alerts
ES/HIVE Alerts Store
Remedy Ticketing System
We are hiring…
• Data Scientists (Security)
• Aspiring Data Scientists
• Security/Networking Experience Required
• Software Engineering Experience Required
• PhD not required
• Background in stats or ML not required
• Security Researchers
*Please contact us via LinkedIn with your profile
Book idea…
Security Analytics on Hadoop
• Anomaly Detection
• Targeted Models
• Deployment Best Practices
• Alerts
• Visualization Techniques
• Etc…
If interested in contributing please contact James Sirota on LinkedIn
OpenSOC Resources (@ProjectOpenSOC)
GitHub Repo
• https://github.com/OpenSOC/opensoc
Slides
• http://www.slideshare.net/JamesSirota
• https://speakerdeck.com/jsirota
Corporate Blogs
• http://blogs.cisco.com/author/jamessirota
• http://blogs.cisco.com/security/opensoc-an-open-commitment-to-security
Contributor Blogs
• https://medium.com/@jamessirota
• parrottsquawk.com
Thank you.