Machine-learning Approaches for P2P Botnet Detection using Signal-processing Techniques


Upload: pratik-narang

Post on 09-Jun-2015


DESCRIPTION

The Poster of my paper published in ACM DEBS 2014

TRANSCRIPT


Machine-learning Approaches for P2P Botnet Detection using Signal-processing Techniques

Pratik Narang, Vansh Khurana, Chittaranjan Hota

Birla Institute of Technology & Science - Pilani, Hyderabad Campus

Flowchart

[Figure: Packet Validation & Filtering Module (valid packets / discarded packets) → Conversation Creation Module → Feature Set Extraction Module (network-behavior based features, signal-processing based features) → extracted features → classifiers (K-nn, REP trees, ANNs, SVMs) → malicious conversation / benign conversation → P2P botnets identified]

[Figure: accuracy of the K-nn, REP trees, ANN and SVM classifiers; the accuracy axis runs from 0.82 to 1]

This work was supported by grants from the Department of Information Technology, Govt. of India


Dataset used

Name                         No. of conversations
Storm                        10,000
Waledac                      10,000
Zeus                         2,657
Clean (multiple P2P apps)    78,000

Results

Abstract

Motivation

• Bots tend to have certain regularity and periodicity in their C & C communication with other bot-peers

• We attempt to uncover these hidden patterns by the use of signal-processing techniques and thus detect P2P botnets

Approach

• Apart from regular 'network behavior' based features, we extract several 'signal-processing' based features

• The features are used to build detection models for P2P botnets

• We validate our approach using several supervised machine learning algorithms.
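As an illustration of the validation step, the sketch below implements a plain k-nearest-neighbour vote, one of the classifier families named on the poster. It is not the authors' implementation: the function name `knn_predict`, the two-dimensional feature vectors and every numeric value are made up for the example.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points.

    train: list of (feature_vector, label) pairs.
    Distance is plain Euclidean distance (an assumption; the poster does
    not state which metric was used).
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical feature vectors, e.g. (payload compression, scaled DFT magnitude).
# These numbers are illustrative only and do not come from the dataset.
training_set = [
    ((0.90, 0.10), "bot"),
    ((0.80, 0.20), "bot"),
    ((0.85, 0.15), "bot"),
    ((0.20, 0.90), "benign"),
    ((0.10, 0.80), "benign"),
    ((0.30, 0.70), "benign"),
]

prediction = knn_predict(training_set, (0.88, 0.12), k=3)
```

In practice the feature vectors would hold all extracted features, and features on different scales would usually be normalized before distances are computed.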

Use of Entropy

• We quantify the entropy or randomness present in a conversation’s payload sizes

• For the payload values in each conversation, calculate the expected compression using Shannon's entropy theory

• Bot C & C communication is more uniform than benign Internet traffic

• Hence higher compression should be achieved for conversations of bots.
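The entropy steps above can be sketched as follows. The poster does not give the exact formula for expected compression, so this sketch assumes one plausible definition: one minus the ratio of the empirical Shannon entropy of the payload sizes to its maximum possible value for a conversation of that length. The function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Empirical Shannon entropy of a sequence, in bits per symbol."""
    total = len(values)
    counts = Counter(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def expected_compression(payload_sizes):
    """Estimated compressibility in [0, 1]; higher means more uniform traffic.

    ASSUMPTION: defined as 1 - H / H_max, where H_max = log2(n) is the
    entropy of n packets that all have different payload sizes.
    """
    n = len(payload_sizes)
    if n < 2:
        return 0.0  # too few packets to estimate anything
    return 1.0 - shannon_entropy(payload_sizes) / math.log2(n)


uniform = expected_compression([100] * 8)             # identical payload sizes
mixed = expected_compression([100, 100, 200, 200])    # two sizes, equally likely
varied = expected_compression([1, 2, 3, 4, 5, 6, 7, 8])  # all sizes different
```

Under this definition a conversation with identical payload sizes scores 1.0 and one with all-distinct sizes scores 0.0, which matches the expectation that bot conversations should compress better than benign ones.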

Use of DFT

• C & C communications of bots follow certain timing patterns

• Model each conversation as a signal

• Calculate the DFT of the inter-arrival times and payload lengths of the packets in a conversation

• Sort the DFT values by magnitude

• For many signals, a few DFT coefficients carry most of the energy, so we select the top-ranked coefficients
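A minimal sketch of the DFT steps above, using only the Python standard library. The helper names and the choice of k = 2 are assumptions made for the example. The DC term (index 0) is kept in the ranking; the poster does not say whether it was excluded.

```python
import cmath
import math


def inter_arrival_times(timestamps):
    """Gaps between consecutive packet timestamps of one conversation."""
    return [b - a for a, b in zip(timestamps, timestamps[1:])]


def dft(signal):
    """Naive O(n^2) discrete Fourier transform of a real-valued sequence."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


def top_dft_features(signal, k=2):
    """(magnitude, phase) pairs of the k largest-magnitude DFT coefficients."""
    ranked = sorted(dft(signal), key=abs, reverse=True)[:k]
    return [(abs(c), cmath.phase(c)) for c in ranked]


# A strictly periodic toy signal: its energy sits in two coefficients.
periodic_features = top_dft_features([1, 0, 1, 0], k=2)
constant_features = top_dft_features([5, 5, 5, 5], k=1)
gaps = inter_arrival_times([0.0, 1.0, 3.0])
```

The same function would be applied once to the payload lengths and once to the inter-arrival times of a conversation, giving the magnitude and phase features. For long conversations an FFT routine would replace the naive transform.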

Extracted Features

Network-behavior based           Signal-processing based
Avg. payload (forward)           Compression for payload
Avg. payload (backward)          DFT payload – magnitude 1
Avg. packets sent (forward)      DFT payload – phase 1
Avg. packets sent (backward)     DFT payload – magnitude 2
Median inter-arrival time        DFT payload – phase 2
Variance in Packet size          DFT Inter-arrival time – magnitude 1
Duration of the conversation     DFT Inter-arrival time – phase 1
…                                …
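Some of the network-behavior features listed above could be computed as in the sketch below. The packet layout `(timestamp, src, dst, payload_len)`, the function name, and the convention that "forward" means the direction of the first packet are all assumptions. Payload length stands in for packet size, and the per-direction packet-count features are omitted because the poster does not define their averaging window.

```python
import statistics


def network_features(conversation):
    """Features for one conversation.

    conversation: time-ordered, non-empty list of
    (timestamp, src, dst, payload_len) tuples (hypothetical layout).
    """
    first_src = conversation[0][1]  # ASSUMPTION: first sender defines "forward"
    forward = [p[3] for p in conversation if p[1] == first_src]
    backward = [p[3] for p in conversation if p[1] != first_src]
    times = [p[0] for p in conversation]
    gaps = [b - a for a, b in zip(times, times[1:])]
    sizes = [p[3] for p in conversation]
    return {
        "avg_payload_forward": statistics.fmean(forward),
        "avg_payload_backward": statistics.fmean(backward) if backward else 0.0,
        "median_inter_arrival": statistics.median(gaps) if gaps else 0.0,
        "payload_variance": statistics.pvariance(sizes),
        "duration": times[-1] - times[0],
    }


example = network_features([
    (0.0, "A", "B", 100),
    (1.0, "B", "A", 50),
    (3.0, "A", "B", 200),
])
```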

The distributed and decentralized nature of P2P botnets makes their detection a challenging task. Further, bot-masters continuously improve their botnets in order to evade existing detection mechanisms. Thus, although this field has seen a lot of research, botnet detection continues to be an important research area. We propose a novel approach for the detection of Command & Control (C & C) communication of P2P botnets by converting the 'time-domain' network communications of nodes to the 'frequency-domain'. We adopt a signal-processing based approach by treating the communication of each pair of nodes seen in the network traffic as a 'signal'. Apart from the regular 'network behavior' based features, we extract features based on Discrete Fourier Transforms and Shannon's Entropy to build supervised machine learning models for the detection of P2P botnets.
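Treating the communication of each pair of nodes as one signal first requires grouping packets into conversations. A minimal sketch, assuming packets arrive as `(timestamp, src, dst, payload_len)` tuples (a hypothetical layout) and that a conversation is keyed by the unordered pair of endpoints:

```python
from collections import defaultdict


def build_conversations(packets):
    """Group packets by unordered (node, node) pair, in time order.

    ASSUMPTION: both directions between two nodes belong to the same
    conversation, and no timeout splits a long-lived pair into several
    conversations.
    """
    conversations = defaultdict(list)
    for packet in sorted(packets, key=lambda p: p[0]):
        _, src, dst, _ = packet
        conversations[tuple(sorted((src, dst)))].append(packet)
    return dict(conversations)


conversations = build_conversations([
    (2.0, "B", "A", 60),
    (0.0, "A", "B", 120),
    (1.0, "A", "C", 300),
])
```

Each resulting list yields two sequences, payload lengths and inter-arrival times, which play the role of the signal in the feature extraction described above.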