Anomaly Detection Studies in the IP Backbone
Tao Ye
Sprint
Burlingame, CA
2007-09-19
24: “Stop that packet at the router!”
• Detected an anomaly
• Specify and activate a new ACL
• At OC-192 routers? @500Kpps? In 26μs?
Anomaly Detection at the IP Backbone
Outline
• Tier-1 backbone: an overview
• TAPS: connectionless port scan detection and tracking on the backbone
• Scaling up: sampling and anomaly detection
Today’s Tier-1 Backbones
• Topology: high-speed routers in points-of-presence (POPs) connected by long-haul fiber
  > numerous small POPs (e.g., UUNet)
  > relatively few large POPs (e.g., Sprint)
• Technologies
  > IP over SONET (POS)
  > IP over ATM (phasing out)
  > MPLS, VPN tunnels
• Common engineering practice
  > failure protection implemented at the IP layer
  > “over-provisioned” core
What We (the Research Group @ Sprint) Do
• Measurement: collect a lot of data from the Internet backbone and understand its current state
• Monitoring: use measurement to detect events of (operational) interest
• Hardware
  > CMON monitoring boxes in the field, at 5 POPs
  > Storage (30T) and analysis platform at the lab
  > Website for sharing results
• Algorithms and software tools
  > Continuous monitoring
  > Anomaly detection
  > Active measurement
• Other
  > Wireless
    • Paging attacks
    • Fairness implementations
    • TCP over wireless
Outline
• Measurement and Monitoring at a tier-1 backbone: an overview from the industry perspective
• TAPS: Connectionless port scan detection and tracking on the backbone
• Scaling up: sampling and anomaly detection
Motivation and Challenges
• Our goals
  > Detect and track scanners
  > Understand the long-term behavior of scanners
  > On the backbone network
• Why the backbone?
  > Detection: existing work is mostly at stub networks, which have limited visibility
  > Tracking: honeypots can be evaded
  > More scanning activity is visible at the core
  > Peering points are a unique vantage point
• Challenges
  > Backbone traffic is unidirectional and asymmetric
  > High-speed (OC-48, OC-192) links need fast algorithms
  > A diverse traffic mix needs efficient data structures
Intuition: Access Patterns
TAPS: Time-based Access Pattern Sequential hypothesis testing
• Based on 5-tuple flow summaries of a unidirectional link
• Scanner suspects: source IPs whose ratio of distinct destination IPs to distinct ports accessed (or of ports to IPs) exceeds k within a time bin
• Sequential Hypothesis Testing
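$$Y_i = \begin{cases} 1 & \text{if } \#\text{IP}/\#\text{Port} > k \ \text{(or } \#\text{Port}/\#\text{IP} > k\text{)} \\ 0 & \text{otherwise} \end{cases}$$
$$\Lambda(Y) = \prod_{i=1}^{n} \frac{P[Y_i \mid H_1]}{P[Y_i \mid H_0]}$$
where $Y_i$ is the observation for the source in time bin $i$, $H_1$: the source is a scanner, and $H_0$: the source is benign.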
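As a rough illustration, the per-bin observation $Y_i$ (1 if a source's distinct-IP/distinct-port ratio, or its inverse, exceeds k in bin i, else 0) could be computed from the 5-tuple flow records of a single time bin as follows. The `Flow` record type, its field names, and the value of `k` are hypothetical choices for this sketch, not the TAPS implementation.

```python
from collections import defaultdict
from typing import NamedTuple


class Flow(NamedTuple):
    # Hypothetical 5-tuple flow record observed on a unidirectional link
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: int


def bin_observations(flows, k):
    """Return {src_ip: Y_i} for the flows of one time bin.

    Y_i = 1 if the source's distinct-destination-IP / distinct-port ratio,
    or its inverse, exceeds k; otherwise Y_i = 0.
    """
    dst_ips = defaultdict(set)
    dst_ports = defaultdict(set)
    for f in flows:
        dst_ips[f.src_ip].add(f.dst_ip)
        dst_ports[f.src_ip].add(f.dst_port)
    obs = {}
    for src in dst_ips:
        n_ip, n_port = len(dst_ips[src]), len(dst_ports[src])
        ratio = max(n_ip / n_port, n_port / n_ip)  # both counts are >= 1
        obs[src] = 1 if ratio > k else 0
    return obs
```

With k = 5, a source that touches 20 hosts on one port (horizontal scan) or 30 ports on one host (vertical scan) yields $Y_i = 1$, while a source that touches 2 hosts on 2 ports yields $Y_i = 0$.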
TAPS
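For each source IP (SrcIP), $\Lambda(Y)$ performs a random walk between two thresholds:
• Increment when IP/port > k; decrement when IP/port < k
• Scanner if $\Lambda(Y) > \eta_1$ (threshold for tagging the source as a scanner)
• Benign if $\Lambda(Y) < \eta_0$ (threshold for tagging the source as benign)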
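This random walk is an instance of Wald's sequential probability ratio test. A minimal sketch in log space, assuming a TRW-style parameterization $\theta_0 = P[Y_i = 0 \mid H_0]$ and $\theta_1 = P[Y_i = 0 \mid H_1]$ and Wald's thresholds $\eta_1 = P_D/P_F$, $\eta_0 = (1-P_D)/(1-P_F)$; the default values and the class name `TapsSprt` are hypothetical.

```python
import math


class TapsSprt:
    """Per-source sequential probability ratio test over observations Y_i.

    Hypothetical parameters: theta0 = P[Y=0 | benign], theta1 = P[Y=0 | scanner],
    p_d / p_f = target detection and false-positive probabilities.
    """

    def __init__(self, theta0=0.8, theta1=0.2, p_d=0.99, p_f=0.01):
        self.log_up = math.log((1 - theta1) / (1 - theta0))  # step when Y_i = 1 (ratio > k)
        self.log_down = math.log(theta1 / theta0)            # step when Y_i = 0
        self.log_eta1 = math.log(p_d / p_f)                  # scanner threshold
        self.log_eta0 = math.log((1 - p_d) / (1 - p_f))      # benign threshold
        self.log_lambda = {}                                 # source IP -> log Lambda(Y)

    def update(self, src, y):
        """Fold one observation into src's walk; return 'scanner', 'benign', or None."""
        ll = self.log_lambda.get(src, 0.0) + (self.log_up if y else self.log_down)
        if ll > self.log_eta1:
            self.log_lambda.pop(src, None)  # decision reached: stop tracking this walk
            return "scanner"
        if ll < self.log_eta0:
            self.log_lambda.pop(src, None)
            return "benign"
        self.log_lambda[src] = ll           # still undecided
        return None
```

Working with $\log \Lambda$ turns the product into a sum, so each source needs only one float of state; with the defaults above, four consecutive suspicious bins tag a source as a scanner. Forgetting a source once it is classified is a choice of this sketch.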
Performance: TCP
[Figure: Detection algorithm comparison (TCP): success, false positive, and false negative percentages for TAPS, TRWSYN, and SNORT]
Online Implementation Architecture
• Use CMON to produce flows in NetFlow v5 format
• Flow Daemon distributes flows to applications
• Keep flows in circular buffer
[Figure: CMON flow collector feeding the Flow Daemon (core, app handler, disk writer, disk reader, circular buffer, disk), which serves TAPS and other applications]
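To make the buffering idea concrete, here is a single-process sketch of a circular buffer that one writer (the flow daemon) fills and several applications (TAPS, others) read at their own pace. The class name, the per-reader cursors, and the overwrite-oldest policy are assumptions; the slides do not describe CMON's actual buffer semantics, and locking between threads is omitted.

```python
from typing import Any, Dict, List, Optional


class FlowRing:
    """Fixed-capacity circular buffer of flow records shared by several readers (sketch)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.slots: List[Optional[Any]] = [None] * capacity
        self.head = 0                      # total number of records ever written
        self.cursors: Dict[str, int] = {}  # reader -> next sequence number to read
        self.dropped: Dict[str, int] = {}  # reader -> records lost because it lagged

    def register(self, reader: str) -> None:
        self.cursors[reader] = self.head   # a new reader sees only future flows
        self.dropped[reader] = 0

    def write(self, record: Any) -> None:
        self.slots[self.head % self.capacity] = record  # overwrite the oldest slot
        self.head += 1

    def read(self, reader: str, max_items: int = 64) -> List[Any]:
        cur = self.cursors[reader]
        oldest = self.head - self.capacity
        if cur < oldest:                   # reader fell behind: skip overwritten records
            self.dropped[reader] += oldest - cur
            cur = oldest
        end = min(self.head, cur + max_items)
        batch = [self.slots[i % self.capacity] for i in range(cur, end)]
        self.cursors[reader] = end
        return batch
```

The fixed capacity bounds memory regardless of traffic bursts; the cost is that a slow application loses the oldest flows, which this sketch counts per reader instead of blocking the writer.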
Detector and Tracker Architecture
Design choices: Approximation Counters
• Issues
  > Need to keep the fan-out count for each IP
  > A heap implementation has prohibitively high memory requirements
• Probabilistic counters
  > Many recently proposed counters:
    • Small SRAM implementations: multi-resolution bitmap, trigger bitmap
  > Simple Flajolet-Martin (FM) counter
• FM counter performance
  > 8 hash functions are accurate enough for the ratio > k / < k test
  > Compared: 256, 32 and 8 hash functions
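A sketch of a Flajolet-Martin distinct counter that could hold the per-source fan-out. It uses 8 bitmaps (one per hash function, matching the setting above) and averages the lowest-unset-bit position $R$ across them, estimating the count as $2^{\bar R}/\varphi$ with $\varphi \approx 0.77351$. Deriving the hash family by salting BLAKE2b and using 32-bit bitmaps are assumptions of this sketch.

```python
import hashlib

PHI = 0.77351  # Flajolet-Martin correction constant


class FMCounter:
    """Approximate distinct count (e.g., one source IP's fan-out) using m FM bitmaps."""

    def __init__(self, num_hashes: int = 8, bits: int = 32):
        self.bits = bits
        self.bitmaps = [0] * num_hashes

    def _hash(self, i: int, item: str) -> int:
        # Hash function i: BLAKE2b salted with i (an assumed hash family)
        digest = hashlib.blake2b(item.encode(), digest_size=8,
                                 salt=i.to_bytes(16, "little")).digest()
        return int.from_bytes(digest, "little")

    def add(self, item: str) -> None:
        for i in range(len(self.bitmaps)):
            h = self._hash(i, item) & ((1 << self.bits) - 1)
            # rho = index of the lowest set bit of h (geometric with p = 1/2)
            rho = (h & -h).bit_length() - 1 if h else self.bits - 1
            self.bitmaps[i] |= 1 << rho

    def estimate(self) -> float:
        # R = index of the lowest zero bit in each bitmap, averaged over bitmaps
        total = 0
        for bm in self.bitmaps:
            r = 0
            while bm & (1 << r):
                r += 1
            total += r
        return 2 ** (total / len(self.bitmaps)) / PHI
```

Each source IP would own one `FMCounter` (8 small integers) instead of an exact set of destinations, and adding the same destination twice does not change the estimate. That fits a test that only asks whether a ratio is above or below k.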
Results
• Data set
  > OC-48 peering link, incoming, ~320 Mbps, 22 days
  > OC-48 peering link, outgoing, ~560 Mbps, 3 days
Scanner Duration
[Figure panels: 22-day trace, 3-day trace]
Scanner Rate
Number of Scanners Detected (1)
• Time series of the number of scanners detected (3 days)
Scanning Ports
• Ports accessed
Conclusion
• Online scan detection and tracking
  > Targets unidirectional backbone links
  > Detector: Time-based Access Pattern Sequential Hypothesis testing (TAPS)
    • Combines rate limiting with statistical tests on destination IP and port access patterns
  > Implementation design: queue model and FM counter
• Scanner behavior
  > 90-10 split in scanning-rate and scanning-duration behavior
  > Spike in the number of scanners detected
Outline
• Tier-1 backbone: an overview
• TAPS: connectionless port scan detection on the backbone
• Scaling up: sampling and anomaly detection
Motivation
• Sampling is used to reduce processing overhead in traffic monitoring
• Sampled data is used in:
  > Traffic engineering: computing traffic matrices
  > Inferring flow statistics from sampled data (Duffield03, Hohn03)
• Anomaly detection (DDoS attacks, worm scans): does sampled data contain sufficient information for effective anomaly detection?
• The brief answer: it depends
  > on the sampling method
  > on the sampling rate
• The impact of sampling
  > Number of anomalies detected: decreased
  > False positives: increased
Methodology
[Figure: traffic traces are fed to the Anomaly Detection Module both directly and through the Sampling Module; the two sets of results are compared]
Anomalies and Detection Algorithms
• Volume anomaly: DoS attacks, flash crowds
  > 1. Wavelet-based change detection [Barford02]
• Port scanning: worm/virus propagation
  > 2. Threshold Random Walk [Jung04]
  > 3. Access Pattern: TAPS [Sridharan06]
Sampling Methods
• Random packet sampling: each packet is sampled with probability r < 1
  > Simple implementation (good for busy routers)
  > Widely deployed (Cisco NetFlow)
  > Flow statistics are hard to recover
• Random flow sampling: packets are classified into flows, and each flow is sampled with probability p < 1
  > High resource requirements
  > Accurate estimation of flow statistics
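A toy contrast of the two methods, assuming packets are represented as `(flow_id, seq)` tuples and flows are keyed by the first field; the function names are hypothetical. Flow sampling keeps whole flows intact but needs per-flow state to classify packets, whereas packet sampling is stateless but truncates flows.

```python
import random
from collections import defaultdict


def packet_sample(packets, r, rng):
    """Random packet sampling: keep each packet independently with probability r."""
    return [pkt for pkt in packets if rng.random() < r]


def flow_sample(packets, p, rng, key=lambda pkt: pkt[0]):
    """Random flow sampling: group packets into flows, keep each whole flow with probability p."""
    flows = defaultdict(list)
    for pkt in packets:
        flows[key(pkt)].append(pkt)  # flow classification requires per-flow state
    return {fid: pkts for fid, pkts in flows.items() if rng.random() < p}
```

With 200 ten-packet flows and r = p = 0.1, flow sampling returns about 20 complete flows, while packet sampling returns about 200 packets scattered across many flows, almost none of which keep all ten packets.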
Sampling (continue)
• Designer flow sampling, aimed at catching heavy hitters
  > Smart Sampling [Duffield02]: flow records are selected with a probability that grows with flow size
  > Sample-and-Hold [Estan02]: each byte of a packet is sampled with a small probability h, so a packet of s bytes is sampled with probability $1-(1-h)^s$. Once a packet of a flow has been sampled, all subsequent packets of that flow are sampled.
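Both designer methods can be sketched in a few lines. Smart sampling is written here in its threshold form, where a flow of x bytes is kept with probability min(1, x/z) and reported as max(x, z), which keeps byte totals unbiased; sample-and-hold follows the per-byte rule above. The threshold `z`, the input formats, and the function names are assumptions for illustration.

```python
import random


def smart_sample(flow_bytes, z, rng):
    """Threshold (size-dependent) sampling of flow records.

    flow_bytes: {flow_id: bytes}. A flow of x bytes is kept with probability
    min(1, x / z) and reported as max(x, z), an unbiased estimate of its bytes.
    """
    kept = {}
    for fid, x in flow_bytes.items():
        if rng.random() < min(1.0, x / z):
            kept[fid] = max(x, z)
    return kept


def sample_and_hold(packets, h, rng):
    """packets: iterable of (flow_id, size_bytes) in arrival order.

    An untracked flow's packet of s bytes is sampled with probability 1 - (1 - h) ** s;
    once a flow is in the table, every later packet of it is counted.
    """
    table = {}
    for fid, s in packets:
        if fid in table:
            table[fid] += s
        elif rng.random() < 1 - (1 - h) ** s:
            table[fid] = s
    return table
```

Both methods favor large flows: a 40-byte flow is rarely kept, which makes them good at measuring heavy hitters and poor at preserving the many small flows a scanner generates.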
Comparing Sampling Algorithms
• How to compare: normalize by CPU load or by memory consumption
• Our choice: the percentage of flows sampled
  > The input to the anomaly detection algorithms is flow-based
  > The number of flows translates to memory consumption
• Example of sampling parameter settings:
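Normalizing by the percentage of flows sampled can be sketched analytically. A flow of n packets survives random packet sampling at rate r with probability $1-(1-r)^n$, and a flow of b bytes enters the sample-and-hold table with probability $1-(1-h)^b$, so one can search for the r (or h) that retains the same expected fraction of flows as flow sampling at rate p. The flow-size lists and helper names below are hypothetical.

```python
def expected_flow_fraction(flow_sizes, keep_prob):
    """Average probability that a flow appears at all in the sampled data."""
    return sum(keep_prob(x) for x in flow_sizes) / len(flow_sizes)


def packet_keep(r):
    # Random packet sampling: an n-packet flow survives if any packet is sampled
    return lambda n: 1 - (1 - r) ** n


def sah_keep(h):
    # Sample-and-hold: a b-byte flow is held if any of its bytes is sampled
    return lambda b: 1 - (1 - h) ** b


def match_rate(flow_sizes, target, make_keep_prob, iters=60):
    """Bisection on [0, 1] for the rate whose expected flow fraction equals target.

    Assumes the fraction increases monotonically with the rate, as it does for
    both packet_keep and sah_keep.
    """
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if expected_flow_fraction(flow_sizes, make_keep_prob(mid)) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For example, to keep 10% of flows when every flow has one packet, r must equal 0.1; when half of the flows are long, a much smaller r already reaches the same flow percentage, which is exactly why the comparison is normalized by flows rather than by the raw sampling rate.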
Impact of Sampling on Volume Anomaly Detection (1)
• Wavelet-based change detection on flow rate
  > Decomposition
  > Re-synthesize into three bands
    • High: ~1 sec
    • Mid: ~1 min
    • Low: ~15 min
  > Detection on the high/mid bands
    • Sliding window
    • Deviation score
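A simplified sketch of this pipeline, assuming a Haar multiresolution (block averages over $2^j$ samples) rather than the wavelet filters of [Barford02]: the signal is split into low, mid and high bands that sum back to the original, the normalized local variance of the high and mid bands is measured over a trailing sliding window, and a weighted sum gives the deviation score. The band split, window length, and weights are hypothetical.

```python
import numpy as np


def block_mean(x, size):
    """Haar projection: replace every block of `size` samples by its mean."""
    return np.repeat(x.reshape(-1, size).mean(axis=1), size)


def three_bands(x, high_levels=2, mid_levels=4):
    """Split x into (low, mid, high) bands with low + mid + high == x.

    len(x) must be a multiple of 2 ** (high_levels + mid_levels).
    """
    x = np.asarray(x, dtype=float)
    smooth = block_mean(x, 2 ** high_levels)               # removes the finest details
    low = block_mean(x, 2 ** (high_levels + mid_levels))   # coarsest approximation
    return low, smooth - low, x - smooth


def local_variance(band, window):
    """Trailing sliding-window variance, normalized by the band's overall variance."""
    total = band.var() or 1.0
    out = np.empty(len(band))
    for t in range(len(band)):
        out[t] = band[max(0, t - window + 1): t + 1].var() / total
    return out


def deviation_score(x, window=8, w_high=0.5, w_mid=0.5, **band_args):
    """Weighted sum of the normalized local variances of the high and mid bands."""
    _, mid, high = three_bands(x, **band_args)
    return w_high * local_variance(high, window) + w_mid * local_variance(mid, window)
```

Points whose score exceeds a chosen threshold would be flagged. With one sample per second, the defaults put variations finer than 4 samples in the high band and those between 4 and 64 samples in the mid band; matching the ~1 sec / ~1 min / ~15 min bands above would need different settings.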
Impact of Sampling on Volume Anomaly Detection (2)
• Original detection (unsampled traffic): 21
• False negatives
  > Random flow sampling introduces more local variance
  > Random packet sampling introduces even more variance
  > Smart sampling and sample-and-hold flatten the time series
Impact of Sampling on Port Scan Detection
• Performance metric definitions
  > Success ratio: $R_s = \frac{\text{number of true scanners detected}}{\text{number of true scanners}}$
  > False positive ratio: $R_{f+} = \frac{\text{number of false scanners detected}}{\text{number of true scanners}}$
• $R_s$ measures effectiveness; $R_{f+}$ measures errors
• Ground truth: the true scanner set was examined by hand
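The two ratios translate directly into code. A sketch, assuming detections and ground truth are given as collections of source IPs (the function name is hypothetical); both ratios are normalized by the number of true scanners, so $R_{f+}$ can exceed 1.

```python
def scan_metrics(detected, true_scanners):
    """Return (R_s, R_f+) for a set of detected sources against hand-checked ground truth."""
    detected, truth = set(detected), set(true_scanners)
    if not truth:
        raise ValueError("ground-truth scanner set is empty")
    r_s = len(detected & truth) / len(truth)   # true scanners that were detected
    r_fp = len(detected - truth) / len(truth)  # detections that are not true scanners
    return r_s, r_fp
```

For instance, detecting {a, b, x} against ground truth {a, b, c, d} gives $R_s = 0.5$ and $R_{f+} = 0.25$.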
TRWSYN results
TAPS results
• Flow count reduction: leads to false negatives
• Flow shortening: false positives shoot up under random packet sampling
  > A multi-packet TCP flow shrinks to a single SYN-packet flow
  > The result: scanners and benign hosts become statistically indistinguishable
Conclusion
• Implications of our results
  > Random flow sampling is generally robust for both volume anomaly and port scan detection.
  > Random packet sampling is oblivious to any underlying traffic features, and causes information loss and distortion that degrade the performance of anomaly detection algorithms.
• Smart sampling and sample-and-hold target heavy hitters, and are thus not well suited to anomaly detection.
• Ongoing work
  > Design anomaly detection algorithms that are robust to sampling
  > Design new anomaly-detection-friendly sampling methods
The End!
• Tier-1 backbone: an overview
• TAPS: Connectionless port scan detection on the backbone and scanner profiling
• Sampled data is NOT sufficient for anomaly detection purposes
http://research.sprintlabs.com
A Backbone POP
[Figure: backbone POP showing core routers, edge routers, peers, and links to other POPs]