Wenke Lee and Nick Feamster, Georgia Tech
Botnet and Spam Detection in High-Speed Networks
Overview
• Problem: Botnet and Spam Detection in high-speed networks
• Common theme: Examine network-level properties and build classifier
• Two systems: BotMiner and SNARE
– Overview
– Integration with SMITE architecture
• Current integration status and plan
BotMiner: Structure and Protocol Independent
• Botnets can change their C&C content (encryption, etc.), protocols (IRC, HTTP, etc.), structures (P2P, etc.), C&C servers, infection models …
[Figure: two example botnet structures, panels (a) and (b), with nodes labeled "bot" and "C&C"]
Definition of a Botnet
• “A coordinated group of malware instances that are controlled by a botmaster via some C&C channel”
– Hosts that have similar C&C-like traffic and similar malicious activities
• We need to monitor two planes
– C-plane (C&C communication plane): “who is talking to whom”
– A-plane (malicious activity plane): “who is doing what”
BotMiner Architecture
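[Figure: BotMiner architecture. Sensors: network traffic feeds the A-Plane Monitor (scan, spam, binary downloading, exploit, ...) and the C-Plane Monitor, producing an activity log and a flow log. Algorithms: A-Plane Clustering and C-Plane Clustering. Correlation: Cross-Plane Correlation, which produces reports.]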
Cross-plane Correlation
• Botnet score s(h) for every host h
– A host has a higher score if it is in more activity clusters and in both activity and communication clusters
– A host with a high score is a bot
• Similarity score between bot hosts h_i and h_j
– Two hosts in the same A-clusters and in at least one common C-cluster are clustered together
– Each cluster is a botnet
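The scoring idea above can be sketched in a few lines of Python. This is an illustrative, unweighted score and not necessarily the formula BotMiner uses: the function names, the use of Jaccard overlap, and the representation of clusters as sets of host identifiers are all assumptions made for the example.

```python
from itertools import combinations

def jaccard(x, y):
    """Overlap between two clusters (sets of host identifiers)."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0

def botnet_score(host, a_clusters, c_clusters):
    """Hypothetical score s(h): grows when the host appears in more
    activity clusters and in both activity and communication clusters."""
    mine_a = [a for a in a_clusters if host in a]
    mine_c = [c for c in c_clusters if host in c]
    # Each pair of activity clusters containing the host adds their overlap.
    score = sum(jaccard(a1, a2) for a1, a2 in combinations(mine_a, 2))
    # Each (activity, communication) pair containing the host adds their overlap.
    score += sum(jaccard(a, c) for a in mine_a for c in mine_c)
    return score

a_clusters = [{"h1", "h2"}, {"h1", "h2", "h3"}]
c_clusters = [{"h1", "h2"}, {"h4"}]
for host in ("h1", "h3", "h4"):
    print(host, round(botnet_score(host, a_clusters, c_clusters), 3))
```

With this toy input, h1 sits in two activity clusters and one communication cluster and so scores higher than h3, which appears in a single activity cluster only. A real deployment would also need a threshold on s(h), which the slides do not specify.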
SMITE Integration: BotMiner
Integrating BotMiner and SMITE
• Sensors
– Feature extraction for C-Plane and A-Plane clustering
– C-Flow temporal and statistical features (SMITE flow analysis sensors)
• Counting packets and connections between each pair of endpoints: bytes per second, flows per hour, bytes per packet, packets per flow
– A-Plane header and payload features (SMITE flow sensors, AVIES)
• Destination IP addresses and ports, payload bytes/strings
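As a rough illustration of the four C-flow statistics named on this slide (bytes per second, flows per hour, bytes per packet, packets per flow), the sketch below computes simple averages per endpoint pair. The `Flow` record layout and the `cflow_features` helper are hypothetical and do not reflect the SMITE sensor output format; averaging is also an assumption, since a clustering stage might use full distributions instead.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Flow:
    """Hypothetical flow record; field names are assumptions."""
    src: str
    dst: str
    start: float    # seconds since some epoch
    end: float
    packets: int
    n_bytes: int

def cflow_features(flows):
    """Average C-flow statistics for each (src, dst) endpoint pair."""
    groups = defaultdict(list)
    for f in flows:
        groups[(f.src, f.dst)].append(f)

    features = {}
    for pair, fs in groups.items():
        total_bytes = sum(f.n_bytes for f in fs)
        total_packets = sum(f.packets for f in fs)
        # Floor each duration at one second to avoid dividing by zero.
        active_seconds = sum(max(f.end - f.start, 1.0) for f in fs)
        # Observation window for this pair, floored at one hour.
        span = max(f.end for f in fs) - min(f.start for f in fs)
        span_hours = max(span / 3600.0, 1.0)
        features[pair] = {
            "bytes_per_second": total_bytes / active_seconds,
            "flows_per_hour": len(fs) / span_hours,
            "bytes_per_packet": total_bytes / total_packets if total_packets else 0.0,
            "packets_per_flow": total_packets / len(fs),
        }
    return features

flows = [Flow("10.0.0.1", "10.0.0.9", 0, 10, 10, 1000),
         Flow("10.0.0.1", "10.0.0.9", 3600, 3610, 10, 1000)]
print(cflow_features(flows)[("10.0.0.1", "10.0.0.9")])
```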
Integrating BotMiner and SMITE
• Algorithms
– C-plane clustering
• Multi-step clustering based on statistical and temporal C-flow features
– A-plane clustering
• Based on activity-specific similarity measures: e.g., spread of destination IP addresses and ports, and payload similarity
• Analyze additional alerts from other detection algorithms
– Bot scoring and botnet clustering methods
• Scoring based on participation in C-plane and A-plane clusters
• Clustering based on common memberships in the C-plane and A-plane clusters
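The "clustering based on common memberships" step could look roughly like the following sketch. It assumes that two bots belong together when they share at least one A-plane cluster and at least one C-plane cluster, which is one reading of the rule stated on the cross-plane correlation slide. The `group_bots` function and its union-find grouping are illustrative choices, not the method confirmed by the slides.

```python
from itertools import combinations

def group_bots(bots, a_clusters, c_clusters):
    """Group hosts that share an A-plane cluster and a C-plane cluster."""
    parent = {h: h for h in bots}

    def find(h):
        # Union-find lookup with path halving.
        while parent[h] != h:
            parent[h] = parent[parent[h]]
            h = parent[h]
        return h

    def share(h1, h2, clusters):
        return any(h1 in c and h2 in c for c in clusters)

    for h1, h2 in combinations(sorted(bots), 2):
        if share(h1, h2, a_clusters) and share(h1, h2, c_clusters):
            parent[find(h1)] = find(h2)

    groups = {}
    for h in bots:
        groups.setdefault(find(h), set()).add(h)
    return sorted(sorted(g) for g in groups.values())

bots = {"h1", "h2", "h3", "h4"}
a_clusters = [{"h1", "h2"}, {"h3", "h4"}]
c_clusters = [{"h1", "h2", "h3"}, {"h4"}]
print(group_bots(bots, a_clusters, c_clusters))
```

In this toy input only h1 and h2 share both an activity cluster and a communication cluster, so they form one candidate botnet while h3 and h4 remain singletons.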
Integrating BotMiner and SMITE
• Cross-plane correlation
– Botnet detection involves both vertical and horizontal analysis/clustering:
• Vertical: what activities a host has been involved in
– Bot detection
• Horizontal: what other hosts have similar (vertical) behavior patterns
– Botnet detection
Network-Based Spam Detection
• Filter email based on how it is sent, in addition to simply what is sent.
• Network-level properties are less malleable
– Hosting or upstream ISP (AS number)
– Membership in a botnet (spammer, hosting infrastructure)
– Network location of sender and receiver
– Set of target recipients
Finding the Right Features
• Goal: Sender reputation from a single packet header?
– Low overhead
– Fast classification
– In-network
– Perhaps more evasion resistant
• Key challenge
– What features satisfy these properties and can distinguish spammers from legitimate senders?
Network-Level Features
• Single-Packet
– AS of sender’s IP
– Distance to k nearest senders
– Status of email service ports
– Geodesic distance
– Time of day
• Single-Message
– Number of recipients
– Length of message
• Aggregate (Multiple Message/Recipient)
Sender-Receiver Geodesic Distance
90% of legitimate messages travel 2,200 miles or less
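One way to compute a sender-receiver distance feature of this kind is the haversine (great-circle) formula on a spherical Earth, sketched below. This is a stand-in approximation: the slides do not say how the distance is computed, and the coordinates are assumed to come from some IP geolocation source that is not shown here.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, spherical approximation

def geodesic_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

# Hypothetical sender and receiver coordinates (roughly Atlanta and Seattle).
distance = geodesic_miles(33.75, -84.39, 47.61, -122.33)
print(f"{distance:.0f} miles")
```

The resulting number can be used directly as a numeric feature, or compared against a cut-off such as the 2,200-mile figure quoted above.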
Density of Senders in IP Space
For spammers, k nearest senders are much closer in IP space
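A density feature like this can be approximated by treating IPv4 addresses as integers and averaging the numeric distance to the k closest other senders. The sketch below is a brute-force illustration under that assumption. The function name, the default k, and the choice of the mean as the summary statistic are all hypothetical.

```python
import ipaddress

def mean_knn_ip_distance(sender, other_senders, k=3):
    """Mean numeric distance in IPv4 space from `sender` to its k nearest senders."""
    target = int(ipaddress.IPv4Address(sender))
    distances = sorted(
        abs(target - int(ipaddress.IPv4Address(other)))
        for other in other_senders
        if other != sender
    )
    nearest = distances[:k]
    # No other senders observed: report an infinite distance.
    return sum(nearest) / len(nearest) if nearest else float("inf")

recent = ["10.0.0.1", "10.0.0.7", "10.0.1.5", "192.168.0.1"]
print(mean_knn_ip_distance("10.0.0.5", recent, k=2))
```

A small value means the sender sits in a densely populated region of recently seen senders, which is the pattern the slide attributes to spammers.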
Local Time of Day at Sender
Spammers “peak” at different local times of day
Combining Features: RuleFit
• Put features into the RuleFit classifier
• 10-fold cross validation on one day of query logs from a large spam filtering appliance provider
• Comparable performance to Spamhaus
– Incorporating into the system can further reduce FPs
• Using only network-level features
• Completely automated
Sample Results
False positives reduced to 0.14%
Integrating SNARE and SMITE
[Figure: SNARE integration diagram with columns labeled Sensors and Algorithms]
SMITE Integration Challenges
• Sources of labeled data
– SNARE requires clean sources of labeled data for training
• Data collection
– SNARE’s performance improves when behavior can be observed across multiple domains
• Availability of external data in RTEN testbed
SMITE Integration: Current Work
• Study pipeline architecture and code
• Modify flow-analyzer to dump 5-tuple flow information
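For readers unfamiliar with the term, a 5-tuple flow record groups packets by source IP, destination IP, source port, destination port, and protocol. The sketch below shows one minimal way to aggregate packets into such records. The `FiveTuple` type, the record fields, and the input format are hypothetical and say nothing about the actual flow-analyzer output, and flow timeouts are deliberately ignored.

```python
from typing import NamedTuple

class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str

def aggregate_flows(packets):
    """Aggregate (timestamp, FiveTuple, size_bytes) packets into per-flow records."""
    flows = {}
    for ts, key, size in packets:
        rec = flows.setdefault(
            key, {"first_seen": ts, "last_seen": ts, "packets": 0, "bytes": 0}
        )
        rec["first_seen"] = min(rec["first_seen"], ts)
        rec["last_seen"] = max(rec["last_seen"], ts)
        rec["packets"] += 1
        rec["bytes"] += size
    return flows

key = FiveTuple("10.0.0.1", "10.0.0.9", 40000, 25, "tcp")
packets = [(0.0, key, 60), (0.5, key, 1500), (1.2, key, 40)]
for flow_key, rec in aggregate_flows(packets).items():
    print(flow_key, rec)
```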
SMITE Integration: Step 1
• Modify flow-analyzer with SMITE team to generate 5-tuple flow information (mid-March)
• Spam/scan detection, flow aggregation in BotMiner; Spam feature extraction in SNARE (end of March)
• Clustering and correlation in BotMiner; Classifier in SNARE (end of April)
SMITE Integration: Step 2
• Evaluate performance of BotMiner and SNARE
– How many hours to process one day of traffic, or what is the “lag” time between event and detection?
• Design real-time detection algorithms
– A two-tier system: off-line module outputs lists of suspicious hosts, and real-time module inspects all packets of these hosts; or, off-line module outputs clusters
• Design algorithms to handle asymmetric traffic
– Cluster on each direction of traffic and cross-correlate
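The two-tier idea can be pictured as a watchlist handed from the off-line module to the real-time module. The sketch below is only a schematic of that hand-off: the `TwoTierDetector` class and its method names are invented for illustration, and the slides describe this design as future work rather than an existing implementation.

```python
class TwoTierDetector:
    """Schematic two-tier detector: off-line results drive real-time inspection."""

    def __init__(self):
        self.watchlist = set()

    def load_offline_results(self, suspicious_hosts):
        # Tier 1: the off-line module periodically publishes suspicious hosts.
        self.watchlist = set(suspicious_hosts)

    def should_inspect(self, src_ip, dst_ip):
        # Tier 2: the real-time module inspects only packets that involve
        # a host on the current watchlist.
        return src_ip in self.watchlist or dst_ip in self.watchlist

detector = TwoTierDetector()
detector.load_offline_results(["10.0.0.5", "10.0.0.8"])
print(detector.should_inspect("10.0.0.5", "192.0.2.1"))
print(detector.should_inspect("10.0.0.1", "192.0.2.1"))
```

Restricting deep inspection to watchlisted hosts is what keeps the real-time tier cheap on a high-speed link. The cost is that a newly infected host goes uninspected until the next off-line run, which is the "lag" the evaluation step is meant to measure.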