HTTP-Based Botnet Detection


botAnalytics: Improving HTTP-Based Botnet Detection

by Using Network Behavior Analysis System

Meisam Eslahi

DISSERTATION SUBMITTED IN FULFILMENT OF THE

REQUIREMENTS FOR THE DEGREE OF MASTER OF COMPUTER

SCIENCE

Faculty of Computer Science and Information Technology

University of Malaya

2010



UNIVERSITI MALAYA

ORIGINAL LITERARY WORK DECLARATION

Name of Candidate: Meisam Eslahi (I.C/Passport No: I2140114)

Registration/Matric No: WGA070104

Name of Degree: Master of Computer Science

Title of Project Paper/Research Report/Dissertation/Thesis (this Work):

botAnalytics: Improving HTTP-Based Botnet Detection by Using Network Behavior

Analysis System

Field of Study: Network Security

I do solemnly and sincerely declare that:

(1) I am the sole author/writer of this Work;

(2) This Work is original;

(3) Any use of any work in which copyright exists was done by way of fair dealing and for permitted purposes and any excerpt or extract from, or reference to or reproduction of any copyright work has been disclosed expressly and sufficiently and the title of the Work and its authorship have been acknowledged in this Work;

(4) I do not have any actual knowledge nor do I ought reasonably to know that the making of this work constitutes an infringement of any copyright work;

(5) I hereby assign all and every rights in the copyright to this Work to the University of Malaya (UM), who henceforth shall be owner of the copyright in this Work and that any reproduction or use in any form or by any means whatsoever is prohibited without the written consent of UM having been first had and obtained;

(6) I am fully aware that if in the course of making this Work I have infringed any copyright whether intentionally or otherwise, I may be subject to legal action or any other action as may be determined by UM.

Candidate's Signature                Date

Subscribed and solemnly declared before,

Witness's Signature                Date

Name: Dr Rosli Salleh

Designation: Supervisor



Abstract

This thesis reports on research conducted to develop a method for detecting HTTP-based Botnets using a Network Behaviour Analysis system. Bots are small pieces of malware that infect computers and join with other bots via the Internet to form a network of bots called a Botnet.

Botnets and their bots have a dynamic and flexible nature. The Botmasters, who control the Botnets, update the bots and change their code frequently to evade traditional detection methods such as signature-based anti-virus software. In addition, Botmasters employ many techniques to keep their Botnets undetected for as long as possible. The latest generation of Botnets is HTTP-based and uses the standard HTTP protocol to communicate with its bots. By using normal HTTP traffic, the bots pass themselves off as normal users of the network and can easily bypass current network security systems.

To solve this problem, a method based on a network behaviour analysis system was developed to improve the existing methods of detecting HTTP-based Botnets and their bots. The system, botAnalytics, was developed by modifying existing network behaviour analysis methods and adding new features to them. The Delphi programming language was used to develop the botAnalytics system, and Microsoft SQL Server 2008 was selected as its database management system. New filters and algorithms were designed and developed to analyse the collected network packets for evidence of suspicious HTTP-based Botnet activity.



In addition to HTTP-based Botnet detection, one of the HTTP header fields, the User-Agent, was used by botAnalytics to analyse the level of danger of detected suspicious activities. To the author's knowledge, this is the first reported use of the User-Agent to aid Botnet detection. Based on the results of its testing and evaluation, botAnalytics was found to be very efficient in detecting HTTP-based Botnets, including small-scale Botnets.
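
As an illustration of the User-Agent idea described above, the following minimal Python sketch extracts the User-Agent header from a raw HTTP request and assigns a coarse danger level. It is not the botAnalytics implementation (which was written in Delphi): the function names, the prefix list `KNOWN_AGENT_PREFIXES`, and the three levels are assumptions made only for this example. A bot can also forge a browser-like User-Agent, so a "low" result is weak evidence at best.

```python
from typing import Optional

# Hypothetical allow-list of User-Agent prefixes treated as ordinary browsers.
# A real deployment would maintain a much larger, curated list.
KNOWN_AGENT_PREFIXES = ("Mozilla/", "Opera/")


def extract_user_agent(raw_request: str) -> Optional[str]:
    """Return the User-Agent header value of a raw HTTP request, or None."""
    head = raw_request.split("\r\n\r\n", 1)[0]
    for line in head.split("\r\n")[1:]:  # skip the request line
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "user-agent":
            return value.strip()
    return None


def danger_level(user_agent: Optional[str]) -> str:
    """Map a User-Agent value to an illustrative danger level."""
    if not user_agent:
        return "high"    # header missing or empty
    if user_agent.startswith(KNOWN_AGENT_PREFIXES):
        return "low"     # looks like a known browser (can still be forged)
    return "medium"      # present but unrecognised


request = (
    "GET /index.html HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "User-Agent: Mozilla/5.0 (X11; Linux x86_64)\r\n"
    "\r\n"
)
print(danger_level(extract_user_agent(request)))
```

Keeping the header parsing separate from the scoring makes it straightforward to replace the prefix check with a lookup against a stored list of known agents.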



Acknowledgements

Thank God, the most Gracious and Merciful, for all the blessings bestowed on me.

The submission of this dissertation marks the end of a somewhat long journey in my pursuit of a Master's degree at the University of Malaya, Kuala Lumpur. The journey would have been far more difficult without the help, understanding and kindness of many people.

First, I would like to express my sincere gratitude to my supervisors, Dr Omar Zakaria and Dr Rosli Salleh, for their kindness in taking me under their charge to conduct this research. Their patience and encouragement gave me the motivation to work on this research until its successful completion. Their guidance and readiness to share their knowledge greatly shaped the direction I took and what I did to achieve my goal. I cannot thank them enough, and I hope the Malay way of expressing how I feel says it all: "Ribuan terima kasih" (a thousand thanks).

While studying and doing research at FCSIT, I was never working alone. I had the friendship, goodwill and support of my course-mates and friends, who never hesitated to offer their advice and moral support when it was needed. To my good friend Mohsen Saghafi, in particular, thank you for being there whenever I needed someone to go to for advice. To all of them, especially Saiful Khan, Teh Kang Hai, Paul Nelson, and Ali Keshavarz: a big thank you.

I would like to express my gratitude and love to my family for their care and understanding while I was doing my research. To the two special women in my life, my mother K. Abdullahi and my wife Maryam Var Naseri: with your boundless love and your confidence in me, you have been my pillars of strength and determination, helping me to carry on. If I have succeeded, then you have been a big part of my success, and I dedicate it to both of you, together with my love.



Table of Contents

Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
Abbreviations

Chapter 1: Introduction
1.1 Background
1.2 Motivation
1.3 Statement of Problem
1.4 Statement of Objectives
1.5 Proposed Solution
1.6 Thesis Scope
1.7 Thesis Organisation

Chapter 2: Bot and Botnets
2.1 Introduction
2.2 Characteristics of Botnet
2.2.1 Botnet life cycle and Botmaster activities
2.2.2 Botmasters' Prime Targets
2.2.3 Botnet Command and Control (C&C) Mechanism
2.2.4 Centralised Command and Control Mechanism
2.2.4.1 IRC-based Botnets
2.2.4.2 HTTP-based Botnets
2.2.5 Decentralised or P2P Command and Control Mechanism
2.3 Why Choose HTTP Botnets?
2.4 Existing Botnet Detection Methods
2.4.1 Honeypot and Honeynet
2.4.2 Detection by Signature
2.4.3 Detection by DNS Monitoring
2.4.4 Detection using Attack Behaviour Analysis
2.5 Detection Based on Network Behaviour Analysis
2.5.1 Why Choose Network Behaviour Analysis?
2.5.2 Existing Detection Methods Based on NBA
2.5.3 Evaluation and Comparison of Existing NBA Methods for Botnet Detection
2.6 Conclusion

Chapter 3: Modeling of Detection System
3.1 Introduction
3.2 Proposed Method Architecture
3.3 Data Reduction Filters
3.4 VOU Mechanism
3.5 Analysing the Collected Traffic
3.6 LODA Mechanism
3.7 Proposed Method Flowchart
3.8 Conclusion

Chapter 4: Implementation of Proposed Model
4.1 Introduction
4.2 DELPHI programming language
4.3 Client Side Implementation
4.3.1 Settings
4.3.2 Sniffing the Traffic
4.3.3 H.T.S. filter
4.3.4 G.P.S. filter
4.3.5 VOU Mechanism
4.4 Database Implementation
4.4.1 Microsoft SQL Server 2008
4.4.2 Tables structure
4.4.3 Tables relationship
4.5 Server Side Implementation
4.5.1 General info
4.5.2 Analyse
4.5.3 Notifications
4.5.4 Report
4.5.5 User Agent list
4.5.6 White list
4.5.7 Black list
4.5.8 Sensor status
4.5.9 User account
4.6 Conclusion

Chapter 5: Testing the Proposed Model
5.1 Introduction
5.2 Hardware Requirements
5.3 Testing bots
5.4 Testing Command and Control servers
5.5 Testing clients
5.6 Testing analyser
5.7 Testing results
5.8 Conclusion

Chapter 6: Data Analysis and Discussion
6.1 Introduction
6.2 Evaluation of botAnalytics
6.2.1 Filtering evaluation
6.2.2 VOU algorithm evaluation
6.2.3 LODA algorithm evaluation
6.3 Comparison of botAnalytics with Other Systems
6.3.1 False-Positive rate
6.3.2 Efficiency in small-scale Botnets
6.4 Conclusion

Chapter 7: Conclusion and Future Work
7.1 Introduction
7.2 Achievement of Objectives
7.3 Contributions
7.3.1 HTTP-based Botnet Detection
7.3.2 Establishment of User-Agent
7.3.3 New Filters and Algorithms
7.3.4 Evaluate the Level of Danger
7.4 Limitations and Future Work
7.4.1 Real Time Detection
7.4.2 Linux Platform
7.4.3 Other Type of Bots and Botnets
7.4.4 Prevention Methods
7.4.5 Advanced the User-Agent for Botnet Detection
7.5 Conclusion

References



List of Figures

Figure 2-1: Botnet life cycle (Schiller & Binkley, 2007)
Figure 2-2: General schema of Botnets C&C mechanism
Figure 2-3: Centralised Botnet (Ping, et al., 2007)
Figure 2-4: IRC-based C&C Botnet (Gu, et al., 2008)
Figure 2-5: HTTP-based C&C Botnet (Gu, et al., 2008)
Figure 2-6: Decentralized or P2P Botnet (Ping, et al., 2007)
Figure 3-1: botAnalytics System Architecture
Figure 3-2: The flowchart of H.T.S. filter
Figure 3-3: The flowchart of G.P.S. filter
Figure 3-4: The VOU Module Flowchart
Figure 3-5: Flowchart of H.A.R. Filter
Figure 3-6: Flowchart of L.A.R. Filter
Figure 3-7: P.A.R. Filter Flowchart
Figure 3-8: LODA Module Flowchart
Figure 3-9: The Proposed Method Flowchart
Figure 4-1: botAnalytics Client Side GUI
Figure 4-2: Setting GUI
Figure 4-3: Traffic Sniffer GUI
Figure 4-4: H.T.S. Filter GUI
Figure 4-5: G.P.S. Filter GUI
Figure 4-6: VOU Mechanism GUI
Figure 4-7: VOU Pseudo Code
Figure 4-8: botAnalytics Database: Relationship between the Tables
Figure 4-9: botAnalytics Server Side GUI
Figure 4-10: General Info GUI
Figure 4-11: GET and POST Percentage Query Pseudo Code
Figure 4-12: Collected Traffic Statistics Query Pseudo Code
Figure 4-13: Primary Data Tab of the Analyse Section
Figure 4-14: Black/White listing Tab of the Analyse Section
Figure 4-15: H.A.R. Result Tab of the Analyse Section
Figure 4-16: H.A.R. Filter Pseudo Code
Figure 4-17: L.A.R. Result Tab of the Analyse Section
Figure 4-18: L.A.R. Filter Pseudo Code
Figure 4-19: P.A.R. Result Tab of the Analyse Section
Figure 4-20: P.A.R. Filter Pseudo Code
Figure 4-21: LODA Module Pseudo Code
Figure 4-22: LODA Module Result GUI
Figure 4-23: Notifications GUI
Figure 4-24: Report GUI
Figure 4-25: User Agent List GUI
Figure 4-26: White List GUI
Figure 4-27: Black List GUI
Figure 4-28: Sensor Status GUI
Figure 4-29: Sensor Info Pseudo Code
Figure 4-30: Top 10 Active Sensors Pseudo Code
Figure 4-31: Edit Profile Tab
Figure 4-32: Create New User Tab
Figure 4-33: Manage Existing Users Tab
Figure 5-1: General Schema for the Testing Phase
Figure 5-2: The Black Energy User Agent (Nazario, 2007)
Figure 5-3: The Firefox User Agent
Figure 5-4: The Bobax User Agent
Figure 6-1: The H.T.S. Filter Results Chart (See also Table 6-1)
Figure 6-2: The G.P.S. Filter Results Chart (See also Table 6-1)
Figure 6-3: The H.A.R. Filter Results Chart (See also Table 6-1)
Figure 6-4: The L.A.R. Filter Results Chart (See also Table 6-1)
Figure 6-5: The P.A.R. Filter Results Chart (See also Table 6-1)



List of Tables

Table 2-1: Comparison of Methods from Past Researches with botAnalytics
Table 4-1: Comparison of Managed-code with Native-code Languages
Table 4-2: Comparison of DBMSs
Table 4-3: Microsoft SQL Server 2008 Extra New Features
Table 4-4: tblUser Structure
Table 4-5: tblSquestion Structure
Table 4-6: tblRole Structure
Table 4-7: tblUserAgent Structure
Table 4-8: tblWhiteList Structure
Table 4-9: tblBlackList Structure
Table 4-10: tblClientsInfo Structure
Table 4-11: tblVOU Structure
Table 4-12: Structure of tblHMType table
Table 4-13: Structure of tblVouValue Table
Table 4-14: Structure of tblResult Table
Table 4-15: Structure of tblLODA Table
Table 4-16: Structure of the tblNotification Table
Table 5-1: botAnalytics Filtering Result
Table 5-2: botAnalytics Botnet Detection Results
Table 6-1: botAnalytics: Results of Filtering
Table 6-2: The VOU Algorithm Result
Table 6-3: The LODA Algorithm Results
Table 6-4: Comparison of the botAnalytics with existing HTTP-based Botnet detection researches
Table 6-5: The botAnalytics False-Positive



Abbreviations

C&C Command and Control

DBMS Database Management System

DDOS Distributed Denial of Service

DNS Domain Name System

DNSBL DNS-based Black Hole List

ERD Entity Relationship Diagram

G.A.S. Grouping and Sorting

G.P.S. GET and POST Separator

GUI Graphical User Interface

H.A.R. High Access Rate

H.T.S. HTTP Traffic Separator

HTTP Hyper Text Transfer Protocol

IID Iterative and Incremental Development

IRC Internet Relay Chat

L.A.R. Low Access Rate

LODA Level of Danger Analysing

NBA Network Behaviour Analysis

P2P Peer-to-Peer

P.A.R. Periodical Access Rate

RAD Rapid Application Development

SDLC System Development Life Cycle

VOU Validation of User-Agent



Chapter 1: Introduction

1.1 Background

The development of computer networking, followed by the Internet in the second half of the last century, is one of the key technological developments that have revolutionised daily life. The convenience and speed of digital communication have become an integral part of home computer use, as well as of every other aspect of human activity, from education to business and research. While high-speed computer networking and the Internet have brought great convenience, a number of security challenges have also emerged with these technologies (O'Connor, 2004; Tanenbaum, 2002).

With the increasing use of computer networks and the Internet on a global scale, network security has become an important issue. Without adequate network security, the benefits brought by these technological developments would be lost, because networks and the Internet are vulnerable to malicious attacks. These attacks or threats come in different forms and can generally be categorised as viruses and worms, Trojans, backdoors, spyware, phishing, and Botnets. Among these threats, the Botnet is considered the most dangerous (Barroso, 2007; Jae-Seo, HyunCheol, Jun-Hyung, Minsoo, & Bong-Nam, 2008; Star, 2008).

A Botnet is a linked group of infected computers (termed bots or zombies) that communicate with each other and receive their commands from a controller, called the Botmaster. A Botmaster controls a Botnet by sending commands to the bots and receiving responses from them. Botmasters use different command and control mechanisms (e.g. IRC, HTTP, and P2P) to achieve this (Govil & Jivika, 2007; Naseem, Shafqat, Sabir, & Shahzad, 2010).
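
From a defender's point of view, one consequence of this command-and-control relationship can be sketched in code. The example below assumes, purely for illustration, that a bot contacts its controller over HTTP at nearly fixed intervals, and flags a series of request timestamps as periodic when the gaps between them vary very little. The function name `is_periodic` and the thresholds `max_cv` and `min_requests` are hypothetical and are not taken from the thesis.

```python
from statistics import mean, pstdev
from typing import Sequence


def is_periodic(timestamps: Sequence[float],
                max_cv: float = 0.1,
                min_requests: int = 4) -> bool:
    """Return True when requests arrive at nearly constant intervals.

    timestamps:   request times in seconds for one client/server pair
    max_cv:       largest coefficient of variation (std / mean) of the gaps
                  still treated as periodic (illustrative threshold)
    min_requests: minimum number of requests needed before judging
    """
    if len(timestamps) < min_requests:
        return False
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    average_gap = mean(gaps)
    if average_gap <= 0:
        return False  # all requests share one timestamp; nothing to measure
    return pstdev(gaps) / average_gap <= max_cv


# A client contacting the same server roughly every 60 seconds.
print(is_periodic([0, 60, 120, 181, 240]))
# Bursty, human-like browsing.
print(is_periodic([0, 5, 90, 95, 400]))
```

The coefficient of variation is used rather than a fixed tolerance in seconds so that the same threshold applies whether the interval is ten seconds or ten minutes.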

The main aim of Botnets is to carry out malicious activities or to gain illegal profits. Some of these activities, such as Distributed Denial of Service (DDoS), spamming, theft of personal information, illegal hosting, click fraud, and adware, are described below:

a) DDOS: this is the distributed form of Denial of Service or DOS attack that iscarried out by sending of a large number of UDP packets, ICMP requests, or TCP

sync floods, aimed at using the resources of particular servers and forcing them to

shut down. Because the Botmasters control the Botnets, they can carry out this type

of attack from thousands of different places by sending a particular command to the

bots in the infected computers in the same Botnet (Govil & Jivika, 2007; Puri,

2003; Srikanth, Dina, Matthias, & Arthur, 2005).

b) Spamming: spamming refers to emails, which have the same content but are sentin high volume. Botnets can be considered as a perfect platform to collect different

email addresses from infected computers, and generate and send spam or phishing

emails (Yinglian et al., 2008).

c) Thieving Personal Information: Botmasters use the Botnets to steal information and
use it for their own benefit. They can set a trigger in the bots to make them scan the
websites where important information is entered. In addition, other applications such as
key-loggers are spread by the bots to obtain important information such as personal
passwords, and financial data such as online banking passwords and credit card
information. Depending on the size of the Botnet, a Botmaster can collect the required
data from thousands to millions of computers (Al-Hammadi & Aickelin, 2008; Govil & Jivika, 2007).

d) Illegal Hosting: a computer or server with large storage and a high-bandwidth
connection to the Internet can become a target for a Botmaster, who gains control of it
and uses it for illegal file sharing (AUSCERT, 2002; Puri, 2003).

e) Click Fraud and Adware: one of the main differences between Botnets and other
Internet threats is that a Botnet can be used to make money through click fraud.
Botmasters can amass a lot of money by using their bots to visit websites that pay a
small sum for each visit or for each click on an advertisement. Pop-up advertisements
can also be downloaded, installed, or displayed by bots to force a user to visit
particular websites (Barroso, 2007).

In addition, the Botnets can be used to spread different types of computer threats in the

form of viruses, Trojans, Backdoors, worms, etc. This means that Botnets are not only a

threat, but also a platform for the distribution of other threats (Star, 2008).

1.2 Motivation

In recent years, Botnets have become the biggest threat to cyber security, and have been
used as an infrastructure to carry out nearly every type of cyber attack. A review of the
different types of malicious activities perpetrated by Botnets shows that they are not
only a dangerous threat to computer networks and the Internet, but are also involved in
other types of threats and attacks (Jae-Seo, et al., 2008; Lee, Wang, & Dagon, 2007).

According to a Network World report in 2009, more than 11.1 million computers in the
US had been infected by the 10 most damaging Botnets. The theft of personal information
has always been considered one of the most disturbing Internet threats, and the Zeus
Botnet alone had infected nearly 3.5 million computers and attempted to steal sensitive
information. Each bot can send an average of three spam emails or fake messages per
second; thus, the Koobface Botnet, with 2.9 million infected computers, can generate
more than 8 million fake messages per second (2.9 million bots at three messages each
is about 8.7 million) (Messmer, 2009).

In addition, the detection of Botnets and their associated bots is difficult for the
reasons described below:

a) Skilful Developers: Botnet developers have higher technical capabilities than other
online attackers. Unlike other types of network threats, Botnets and their bots are
designed and developed for long-term goals, including illegal monetary gain. Botmasters
have various strategies to keep the bots safe and hidden for as long as possible
(Lee et al., 2007).

b) Dynamic Nature and Flexibility: Botnets and bots have a dynamic and flexible nature.
They are continuously updated, and their code is changed by their developers and owners
to elude traditional detection methods such as signature-based anti-virus software. The
McAfee Research Lab reported that any success in Botnet detection is only temporary, as
the Botmasters frequently change their strategies and design new methods to recover and
restore their detected bots within a short time (McAfee, 2010).

c) Using Standard Protocols: Botnets use standard protocols to establish their
communication infrastructure. The latest generation of Botnets, called HTTP-based
Botnets, uses the HTTP protocol as its communication method. By using normal HTTP
traffic, the bots disguise themselves as normal network users and easily avoid detection
by current network security systems (Jae-Seo et al., 2008).

d) Silent Threats: Barroso (2007) termed the Botnets "Silent Threats", as they try to
control the infected computers without the knowledge of the computer users. The bots on
infected computers avoid any unusual or suspicious use of the CPU, memory, or other
computer resources, which would otherwise expose their presence.

The reasons above show that Botnet detection is a major challenge in network security
management.

1.3 Statement of Problem

Company computers with high-bandwidth connectivity to the Internet, university servers,
and home computers are the main targets for Botnets. The Botmasters try to gain control
of these targets and use them to carry out their malicious activities.



Today, the detection of Botnets has become a major issue in the field of computer network
security. Botnets have several characteristics that make them difficult to detect. They
spread very fast, and the Botmasters are always trying different techniques to protect
their bots from existing anti-virus software and detection systems (Lee, et al., 2007).
Currently, there is no effective technique to stop Botnets, and existing detection
techniques are unable to detect and prevent them sufficiently.

The McAfee Research Labs predicted that the cyber community will face more

widely-distributed and more resilient Botnets, which are difficult to detect and destroy.

Undoubtedly, network security researchers will continuously face big challenges on

this problem (McAfee, 2010).

1.4 Statement of Objectives

The aim of this research is to develop an improved method for the detection of HTTP-based
Botnets. In this context, the objectives of this research are as follows:

To study in detail the HTTP-based and other types of Botnets.

To evaluate the existing methods of Botnet detection.

To study the characteristics and architecture of the Network Behaviour Analysis (NBA) System.

To model a system to improve detection of the HTTP-based Botnets based on Network Behaviour Analysis (NBA).

To develop an HTTP-based Botnet detection system by using the NBA system architecture.

To test and evaluate the proposed and developed system and to compare it to existing NBA methods.



1.5 Proposed Solution

In this thesis, a Network Behaviour Analysis system, called botAnalytics, is developed.
botAnalytics uses software sensors, which are installed on network clients, to collect
information on the network flows. The information from the entire network is stored in
the server database and examined by another part of the botAnalytics system, known as
the analyser, which looks for any evidence of HTTP-based Botnet activities.

botAnalytics aims to detect HTTP-based Botnets regardless of their size and with a very
low false-positive ratio. Various types of data filtering were introduced for the first
time, or modified, by botAnalytics to improve the detection process. In addition, one of
the HTTP header fields, User-Agent (Fielding et al., 1999), was used to design a new
algorithm to evaluate the danger level of detected suspicious activities.

1.6 Thesis Scope

In this research, the Network Behaviour Analysis technique (Scarfone & Mell, 2007) was
selected because it can be modified and used to detect HTTP-based Botnets. Existing
HTTP-based Botnet detection capabilities will be improved by adding new features. The
Network Behaviour Analysis technique was chosen because of its ability to detect
encrypted and new (zero-day) bots, despite its drawback that it works passively and is
not suitable for real-time detection (Derek, 2009).

It is difficult to find the source code of HTTP-based bots to establish a real Botnet;
hence, the Botnet has to be simulated and implemented using appropriate programming
approaches. The implementation of the bots in this research is based on two existing
HTTP-based bots - Black Energy (Nazario, 2007) and Bobax (Joe, 2004). Black Energy and
Bobax were selected because other researchers, such as Jae-Seo et al. (2008) and Gu,
Zhang, & Lee (2008), used these bots to evaluate their methods. Thus, the same bot
structure can also be used to evaluate the new method developed in this research, and to
compare it with other methods.

1.7 Thesis Organisation

Chapter 1 (Introduction): This chapter presents an overview of Botnets and their
malicious activities, the motivation of this research, problem statements, the objectives,
and the scope of this research.

Chapter 2 (Literature Review): This chapter presents information from the

literature on Botnets characteristics, lifecycle, and architecture. It also gives an

overview of current Botnet detection methods.

Chapter 3 (Modeling of Detection System): This chapter presents the steps involved in
modeling the HTTP-based Botnet detection system to achieve the objectives.

Chapter 4 (Implementation of Proposed Model): This chapter discusses the steps

involved in developing the proposed system.

Chapter 5 (Testing the Proposed Method): This chapter discusses the steps

involved in testing the proposed system, and the testing process.



Chapter 6 (Result Analysis and Discussion): This chapter presents the research findings,
and discusses the effects of the new filters and algorithms developed.

Chapter 7 (Conclusion and future work): This chapter provides a summary of the whole
research and the significance of its findings. It also gives recommendations for related
work to be undertaken in the future.



Chapter 2: Bot and Botnets

This chapter presents a review of the literature on Botnets and the methods for Botnet
detection. Section one gives an overview of bots and Botnets. Section two discusses the
characteristics of Botnets, including the life cycle, the Botmasters' functions, and
their prime targets, as well as their command and control mechanisms. Existing Botnet
detection methods are reviewed in section three. The last section presents the network
behaviour analysis technique, and background information on its use in Botnet detection.

2.1 Introduction

A bot (from the term "robot") is an application that can perform and repeat a particular
task faster than a human. When a large number of bots spread to different computers and
connect to each other through the Internet, they form a group called a Botnet, which is a
network of bots (Mitsuaki et al., 2007). Botnets range in size from small Botnets with
only thousands of bots to large Botnets with millions of bots. Regardless of their size,
which has a direct link to their complexity and purpose, Botnets are mainly created to
carry out malicious activities in computer networks (Govil & Jivika, 2007; Lee, et al.,
2007; Zhaosheng et al., 2008).

A bot is designed to infect computers. The infected computers become part of a Botnet
without their owners' knowledge, and come under the control of a person known as the
Botmaster. The Botmaster sends orders to all the bots and controls the entire Botnet
through the Internet and through servers known as command and control (C&C) servers
(Govil & Jivika, 2007; Zhaosheng, et al., 2008).



2.2 Characteristics of Botnet

2.2.1 Botnet life cycle and Botmaster activities

Botnets can be of different sizes or structures but, in general, they go through the
same stages in their life cycle (Govil & Jivika, 2007; Schiller & Binkley, 2007).
Figure 2-1 shows the life cycle of Botnets.

a) Infection

The life cycle of a Botnet begins with the infection of different computers by its bots.
An infected computer is known as a zombie (Lee, et al., 2007).

Figure 2-1: Botnet life cycle (Schiller & Binkley, 2007)



b) Rallying

After infecting the computer, the bot must connect to its Command and Control (C&C)
server to let the Botmaster know that it has successfully established a zombie. In
addition, it updates itself with essential information, such as the list of relevant C&C
server IP addresses. Rallying therefore refers to the process in which the bots connect
to the C&C server for the first time (Schiller & Binkley, 2007).

c) Get Commands and Send Reports

During this stage, the bots on the infected computers, or zombies, listen to the Command
and Control server or connect to it periodically to get new commands from the Botmaster.
A new command, when detected by the bots, is treated as an order: the bots execute the
order, report the results to the Command and Control server, and then wait for new
commands (Govil & Jivika, 2007; Schiller & Binkley, 2007).

d) Abandon

When a bot is no longer usable (e.g. too slow) or the Botmaster decides that the
particular bot is no longer suitable, it may be abandoned by the Botmaster. If this
happens, the rest of the Botnet remains operational. A whole Botnet is destroyed only
when all its bots are detected or abandoned, or when the Command and Control servers are
detected and blocked (Schiller & Binkley, 2007).



e) Securing the Botnet

One of the important issues throughout the Botnet life cycle is the constant effort to
keep the whole Botnet secure. The Botmasters do this by encrypting the messages that are
delivered between the bots, and between the bots and the Command and Control servers. In
addition, Botmasters may update the bots with new code and new techniques to evade
anti-virus software (Schiller & Binkley, 2007).

2.2.2 Botmasters' Prime Targets

The Botmasters may infect different types of computers or servers, but the most common
targets are less-monitored computers, computers with high-bandwidth connectivity,
university servers, and home computers. Computers that are connected to the Internet
using a broadband connection give attackers an opportunity to use the same bandwidth.
Home users who are not computer-savvy are also prime targets of the Botmasters. These
users usually have low awareness or lack knowledge of network security, and Botmasters
take advantage of this to gain unauthorised access to their computers and keep their
bots there for a long time without being detected (Govil & Jivika, 2007; Puri, 2003).

2.2.3 Botnet Command and Control (C&C) Mechanism

As discussed in the previous sections, a Botnet threat comes from three main elements -
the bots, the Command and Control (C&C) servers, and the Botmasters. The bots infect the
computers, and the Command and Control servers distribute the Botmaster's orders to the
bots on the infected computers. These three elements depend on close communication with
one another; thus, they would be useless without some form of Command and Control
mechanism for this to take place (Gu, Zhang, & Lee, 2008).

The Command and Control mechanism creates an interface between the bots, the C&C
servers, and the Botmasters to transmit data between them. It is crucial for Botmasters
to establish a reliable connection between themselves, the infected computers, and the
C&C servers (Govil & Jivika, 2007). Figure 2-2 shows the logical relationship between
these three elements.

Figure 2-2: General schema of Botnets C&C mechanism

There are two types of Botnet command and control architectures - centralised and

decentralised - based on the way communication is implemented (Chao, Wei, & Xin,

2009; Zeidanloo & Manaf, 2009).

2.2.4 Centralised Command and Control Mechanism

In the centralised command and control approach, all the zombies or bots are connected to
a central C&C server, which is constantly waiting for new bots to connect. Depending on
the Botmaster's settings, a C&C server may provide services to register the available
bots, which makes it possible to track their activities. The Botmaster must be connected
to the C&C server to control the Botnet and distribute commands and tasks (Gu, et al.,
2008; Jing, Yang, Kaveh, Hongmei, & Jingyuan, 2009; Lee, et al., 2007; Ping, Sherri, &
Cliff, 2007). Figure 2-3 shows the structure of a Centralised Command and Control Botnet.

Figure 2-3: Centralised Botnet (Ping, et al., 2007)

Centralised Botnets are the most common type of Botnets as they use simple steps

to create and manage the bots, and response is fast (Gu, et al., 2008; Jing, et al., 2009;

Ping, et al., 2007). The centralised C&C mechanism is divided into two main types -

IRC-based or HTTP-based - based on the communication protocols they use to

establish their connection (Naseem, et al., 2010; Zeidanloo & Manaf, 2009; Zhaosheng,

et al., 2008).



2.2.4.1 IRC-based Botnets

IRC or Internet Relay Chat is a system that is used by computer users to communicate
online or chat in real time (Kalt, 2000). This method was used in the first generation of
bots, in which the Botmaster used an IRC server and the relevant channels to distribute
commands (Jae-Seo, et al., 2008). Each bot connects to the IRC server and channel that
has been selected by the Botmaster, and waits for commands. In this setup, the Botmaster
establishes real-time communication with all the connected bots, and controls them. The
IRC bots follow the PUSH approach, which means that when an IRC bot connects to a
selected channel, it does not disconnect, and remains connected (Gu, et al., 2008;
Naseem, et al., 2010; Ping, Lei, Baber, & Cliff, 2009). Figure 2-4 shows the IRC-based
Command and Control Botnets.

Figure 2-4: IRC-based C&C Botnet (Gu, et al., 2008)



2.2.4.2 HTTP-based Botnets

HTTP-based Command and Control is a new technique that allows the Botmasters to control
their bots by using the HTTP protocol (Jae-Seo, et al., 2008). In this technique, the
bots use a specific URL or IP address, defined by the Botmaster, to connect to a specific
web server, which plays the role of the Command and Control server (Naseem, et al., 2010).

HTTP bots adopt the PULL approach, unlike the PUSH approach used by the IRC-based bots.
In the PULL approach, an HTTP-based bot does not remain connected after it has
established its first connection to the Command and Control server. Instead, the
Botmasters publish the commands on certain web servers, and the bots periodically visit
those web servers to update themselves or get new commands. This process continues at a
regular interval that is defined by the Botmaster (Gu, et al., 2008; Jae-Seo, et al.,
2008; Naseem, et al., 2010; Ping, et al., 2009). Figure 2-5 shows the HTTP-based Command
and Control Botnets.

Figure 2-5: HTTP-based C&C Botnet (Gu, et al., 2008)
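
The regular polling interval of the PULL approach suggests one simple behavioural check:
for each client and server pair, measure how evenly spaced the HTTP requests are. The
sketch below is illustrative only and is not the botAnalytics algorithm. The function
name, the coefficient-of-variation threshold of 0.1, and the minimum of four requests are
assumptions chosen for the example.

```python
from statistics import mean, pstdev

def looks_periodic(timestamps, max_cv=0.1, min_requests=4):
    """Return True if request times are almost evenly spaced.

    timestamps: request times in seconds for one client/server pair.
    max_cv and min_requests are illustrative thresholds, not values
    taken from the thesis.
    """
    if len(timestamps) < min_requests:
        return False
    ordered = sorted(timestamps)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    avg = mean(gaps)
    if avg <= 0:
        return False
    # Coefficient of variation: small values mean regular intervals.
    return pstdev(gaps) / avg <= max_cv

bot_like = [0, 60, 120, 181, 240, 300]   # polls roughly every 60 s
human_like = [0, 5, 7, 95, 110, 400]     # irregular browsing
```

A check of this kind would flag `bot_like` and ignore `human_like`. A bot that adds
random jitter to its interval would need a more tolerant measure.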



2.2.5 Decentralised or P2P Command and Control Mechanism

The decentralised Command and Control architecture is based on the peer-to-peer network
model. In this model, the infected computers or zombies can act as a bot and as a C&C
server at the same time (Ianelli & Hackworth, 2005; Jing, et al., 2009; Naseem, et al.,
2010). In P2P Botnets, instead of having a central C&C server, each bot acts as a server
that transmits the commands to its neighbouring bots. The Botmaster sends commands to one
or more bots, and the bots that receive the commands then deliver them to other bots.
This process is repeated by each bot that receives a new command.

Unlike centralised Botnets, creating and managing P2P Botnets involves complex procedures
and requires a high level of expertise (Gu, et al., 2008; Jing, et al., 2009; Ping, et
al., 2007). Figure 2-6 shows the structure of a decentralised Command and Control Botnet.

Figure 2-6: Decentralized or P2P Botnet (Ping, et al., 2007)



2.3 Why Choose HTTP Botnets?

As discussed in sections 2.2.4 and 2.2.5, there are three different types of Botnets -

IRC, HTTP, and P2P. The reasons for choosing the HTTP-based Botnets for this

research, are as follows:

In the first generation of Botnets, IRC technology was used by Botmasters to control the
bots because the IRC system has several advantages, such as ease of use, ease of control,
and ease of management (Ianelli & Hackworth, 2005; Jae-Seo, et al., 2008). However, the
main weakness of the IRC Botnet is its central control mechanism. A whole Botnet can be
destroyed by blocking the IRC server or blocking the IRC ports. Hence, the P2P Botnets
were designed to overcome this problem (Wei, Tavallaee, Goaletsa, & Ghorbani, 2009;
Zhaosheng, et al., 2008).

In the decentralised Botnets or P2P Botnets, there is no central Command and Control
server; rather, there are multiple distributed servers. Commands are delivered bot by bot
to the entire Botnet. In addition, encryption methods are used to make the communication
secure (Gu, et al., 2008; Ianelli & Hackworth, 2005; Ping, et al., 2007). These
techniques make it more difficult to detect P2P Botnets than IRC Botnets. However, P2P
Botnets are not as widely used as IRC Botnets because the implementation and control of
P2P bots can be quite difficult and complex. In addition, there are no guarantees on
message delivery or latency in P2P Botnets, and the Botmasters are not able to know the
delivery status of the commands (Bailey, Cooke, Jahanian, Yunjing, & Karir, 2009).

Recently, Botmasters have begun to use the centralised Command and Control structure
again. However, the HTTP protocol is used in place of the IRC protocol (Jae-Seo, et al.,
2008; Naseem, et al., 2010), and the traffic runs over port 80. Because this port carries
a wide range of legitimate services, it is not easy to block the central Command and
Control server (Sandvine, 2006). In addition, by using the HTTP protocol, bots hide their
communication flows among the normal HTTP flows, and avoid detection by network defences
such as firewalls (Chao, et al., 2009; Govil & Jivika, 2007; Zeidanloo & Manaf, 2009).

From the review of the characteristics of IRC, P2P, and HTTP-based Botnets, it is clear
that the HTTP command and control mechanism is a new technology that is preferred by
Botmasters. Compared to the IRC and P2P Botnets, HTTP-based Botnets have a set of
attributes that make them difficult to detect. Surprisingly, the number of studies
focusing on the detection of HTTP-based Botnets is relatively low compared to the number
of studies on detection methods for IRC-based and P2P Botnets.

The following sections discuss past and current research on Botnet detection methods.



2.4 Existing Botnet Detection Methods

This section discusses the current methods and research on Botnet detection.

2.4.1 Honeypot and Honeynet

Honeypots are tools that are used as traps for bots, as they can detect bots or collect
information on their activities. The information can be used to understand more about the
bots' behaviour or the intentions of the Botmasters. Nepenthes is a good example of a
Honeypot that is used to collect the bots' binary code and other information about them
(Niels & Thorsten, 2007; Rajab, Zarfoss, Monrose, & Terzis, 2006).

Freiling, Holz, and Wicherski (2005) used Honeypots to collect information about DDoS
attacks. This information includes DDoS signs and characteristics, cases, and the
attackers' intentions and behaviour. This information is useful for the development of
methods to prevent DDoS attacks.

Similarly, Rajab et al. (2006) combined several Honeypots in a multifaceted approach to
collect a large amount of information about IRC bots. By analysing the data, as well as
tracking the activities of the bots, they learned more about the bots' characteristics
and behaviour.

Like any other tools and techniques, Honeypots have their weaknesses. There are

two types of Honeypots - low-interaction honeypots, and high-interaction honeypots.

The main difference between them is the level of access rights to system resources,

services, and functions.



Low-interaction honeypots like Nepenthes, are installed on computers to emulate

limited services of their operating system, thus, they provide Botmasters limited

interaction with the computers. Therefore, these computers may not be completely

compromised, and the information collected on them may not be sufficient for analysis

to detect Botnets (Niels & Thorsten, 2007).

On the other hand, high-interaction honeypots do not emulate any services of the
operating system but provide the real system and services. The Botmaster can use these
real services to gain full control of the computer on which the high-interaction honeypot
is installed (Niels & Thorsten, 2007).

Today, it is not surprising that Botmasters use many techniques to avoid honeypots
(Zou & Cunningham, 2006).

2.4.2 Detection by Signature

Signature refers to the known patterns or characteristics of threats from intruders

into computer systems. By analysing and comparing these patterns or characteristics, it

is possible to distinguish the threat activities from the normal activities (Scarfone &

Mell, 2007).

Goebel and Holz (2007) used IRC nicknames as signatures. In their method, known as Rishi,
a reasonable amount of IRC traffic is collected. Subsequently, all the IRC nicknames are
extracted from the collected data and checked against known bots' nicknames by using
regular expressions. To reduce the number of comparisons and the time taken, Goebel and
Holz used a white list and a black list.
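
The nickname check described above can be sketched as follows. This is a simplified
illustration in the spirit of Rishi, not its actual implementation: the two regular
expressions, the function name `classify_nick`, and the list contents are hypothetical
and were invented for this example.

```python
import re

# Hypothetical patterns for bot-like nicknames (e.g. a country code
# followed by a long run of digits); these are NOT the Rishi signatures.
SUSPICIOUS_PATTERNS = [
    re.compile(r"^\[?[A-Z]{2,3}\]?[|_-]?\d{4,}$"),
    re.compile(r"^(bot|zombie)[-_]?\d+$", re.IGNORECASE),
]

def classify_nick(nick, whitelist, blacklist):
    """Classify one IRC nickname as 'benign', 'bot', or 'unknown'."""
    # The lists are checked first so that known nicknames skip the
    # more expensive regular-expression comparison.
    if nick in whitelist:
        return "benign"
    if nick in blacklist:
        return "bot"
    if any(p.search(nick) for p in SUSPICIOUS_PATTERNS):
        blacklist.add(nick)  # remember the match for later lookups
        return "bot"
    return "unknown"

whitelist = {"alice", "USA|12345"}  # example of explicitly trusted nicknames
blacklist = set()
```

Checking the lists before the patterns reflects the stated purpose of the white list and
black list, which is to cut down the number of comparisons.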



The signature-based detection method is not very effective because this method

cannot identify new behaviour patterns or certain characteristics. This method is based

on a simple comparison of the collected information with the predefined characteristics

of well-known bots. Thus, this method is good for detecting well-known bots, but quite

useless for detecting new and zero-day bots (Chao, et al., 2009; Scarfone & Mell,

2007).

2.4.3 Detection by DNS Monitoring

Monitoring and analysing the DNS traffic generated by bots has been used as a technique
to detect Botnets. Choi, Lee, Lee, and Kim (2007) found that bots generate DNS traffic in
some situations, for example, when locating the Command and Control server or arranging
attacks such as a DDoS attack. The researchers used three main differences between
bot-generated DNS flows and normal DNS flows as ways to detect Botnets.

The first difference they noticed is the number of source IP addresses that send DNS
queries to specific domain names. The Botnet DNS queries are generated by a fixed number
of IP addresses that belong to the bots in the same Botnet. On the other hand, the number
of IP addresses behind legitimate DNS queries, which are generated by anonymous users to
a particular domain name, is random.

The second difference lies in the format and frequency of the DNS queries generated by
bots and by normal users. Bots have similar group activities; thus, DNS queries of the
same format are generated by bots from the same Botnet intermittently, and only in
special situations, whereas the DNS queries of normal users are generated continuously
and in a random format.

The third difference is that normal users hardly use Dynamic DNS (DDNS), whereas it is
used by bots.

Salomon and Brustoloni (2008) also used DDNS as base parameters to suggest two

approaches for Botnet detection. They found that Botmasters do not use certain

Command and Control servers for a long time, and periodically change the servers. In

this situation, bots will try to find the address of a new Command and Control server.

When this happens, there will be a higher number of DDNS queries to specific domain

names. These are signs of unusual activities of Botnets.

NXDOMAIN has been evaluated as another parameter. The term NXDOMAIN, or Non-Existent
Domain, describes the special state that occurs when the DNS resolvers are unable to
resolve a certain domain name for any reason, such as a change of domain name, an
unregistered domain name, or server problems. Salomon and Brustoloni (2008) suggested
that a high number of DDNS queries returning the NXDOMAIN code could have been generated
by bots that are searching for Command and Control servers that might have been blocked
or moved.
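
As a minimal sketch of this idea, the function below counts NXDOMAIN responses per client
and reports the clients that reach a threshold. It is not the algorithm of the cited
work: the tuple layout, the function name, and the threshold of three are assumptions
made for illustration. The response code value 3 for a non-existent name is defined in
RFC 1035.

```python
from collections import Counter

NXDOMAIN = 3  # DNS response code for a non-existent domain (RFC 1035)

def hosts_with_many_nxdomain(responses, threshold=3):
    """Return {client_ip: count} for clients with many NXDOMAIN replies.

    responses: iterable of (client_ip, queried_name, rcode) tuples.
    threshold is an illustrative value, not one from the cited work.
    """
    counts = Counter(ip for ip, _name, rcode in responses if rcode == NXDOMAIN)
    return {ip: n for ip, n in counts.items() if n >= threshold}

# Made-up observations using documentation-style names and addresses.
responses = [
    ("10.0.0.5", "cc1.example.net", 3),
    ("10.0.0.5", "cc2.example.net", 3),
    ("10.0.0.5", "cc3.example.net", 3),
    ("10.0.0.9", "www.example.org", 0),
    ("10.0.0.9", "typo.example.org", 3),
]
suspects = hosts_with_many_nxdomain(responses)
```

Here only the first client is reported; a single mistyped name from the second client
stays below the threshold.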

DNSBL or DNS Block List is a list of spamming computers and network IP addresses.
Ramachandran, Feamster, and Dagon (2006) stated that DNSBLs may be checked by Botmasters
to keep themselves aware of their bots' status - to find out whether a particular bot is
being blocked. Thus, their algorithms are designed to distinguish normal DNSBL queries
(generated by a normal service such as a mail server) from the queries generated by
Botmasters.



The detection methods discussed above were designed to analyse DNS (domain name system)
queries generated by bots and Botnets. These methods are no longer effective, as the new
generation of bots and Botnets has been designed to generate a minimum number of DNS
queries. Moreover, the process of analysing DNS traffic is very complex (Jae-Seo, et al.,
2008).

2.4.4 Detection using Attack Behaviour Analysis

In this approach, researchers focus on the characteristics and behaviour of the attacks
themselves, rather than on other issues such as the bots, the Command and Control
servers, the Botmasters' behaviour, or the communication methods used.

Hu, Knyz, and Shin (2009) proposed a system, called RB-Seeker, which has three different
sub-systems to detect bots that carry out URL redirection attacks. The first two
sub-systems attempt to identify all domains that are related to redirection activities,
based on the characteristics and behaviour of the URL redirection attack. At this stage,
the system does not make any decision about the domain status, which can be either normal
or malicious. In the next stage, the third sub-system examines the DNS queries to
distinguish the malicious domains from the normal domains.

Although this method uses DNS-based techniques, its main focus is on URL redirection
activities, and DNS probing is used only as a sub-system. Therefore, this method does not
belong to the DNS-based category.

Brodsky and Brodsky (2007) found that bots send a higher number of spam emails within a
short period than humans do. Based on this observation, the sources of spam emails were
identified and recorded. Subsequently, the number of spam emails generated by the same
recorded sources within a short period was used as a parameter for decision-making.
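
The rate-based decision described above can be illustrated with a sliding-window count
per source. The sketch below is an assumption-laden example, not the method of the cited
paper: the function name, the 60-second window, and the limit of five emails per window
are hypothetical.

```python
from collections import defaultdict

def high_rate_senders(events, window=60.0, max_per_window=5):
    """Return sources that send more than max_per_window emails within
    any span of `window` seconds.

    events: iterable of (timestamp_seconds, source_ip) pairs.
    Both thresholds are illustrative values.
    """
    by_source = defaultdict(list)
    for ts, src in events:
        by_source[src].append(ts)

    flagged = set()
    for src, times in by_source.items():
        times.sort()
        start = 0
        for end, t in enumerate(times):
            # Advance the left edge until the span fits in the window.
            while t - times[start] > window:
                start += 1
            if end - start + 1 > max_per_window:
                flagged.add(src)
                break
    return flagged

# One source sends every 2 seconds, the other every 10 minutes.
events = [(i * 2.0, "192.0.2.10") for i in range(10)]
events += [(i * 600.0, "192.0.2.20") for i in range(10)]
flagged = high_rate_senders(events)
```

Only the source that sends every two seconds exceeds the limit in this example.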

Likewise, Yinglian et al. (2008) designed a system to collect all the URLs that were sent
in spam emails, and divided them into different groups based on their Web domains. In the
next step, all the URL groups were passed to a regular expression generator to create
signatures for malicious URLs.
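
The grouping step can be sketched with the Python standard library. This example covers
only the division of URLs into groups and treats the host name as the "Web domain", which
is an assumption; the regular expression generator of the cited system is not reproduced,
and the function name is hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit

def group_urls_by_domain(urls):
    """Group URLs by host name (used here as a stand-in for Web domain)."""
    groups = defaultdict(list)
    for url in urls:
        host = urlsplit(url).hostname  # None when the URL has no host part
        if host:
            groups[host].append(url)
    return dict(groups)

# Made-up URLs on reserved example domains.
urls = [
    "http://shop.example.com/a?id=1",
    "http://shop.example.com/b?id=2",
    "http://deals.example.net/x",
    "not a url",
]
groups = group_urls_by_domain(urls)
```

Each resulting group would then be the input to the signature generation step.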

These methods can identify bots based on the similarity of their group activities.

The methods are effective when countering attacks from a large number of attackers.

2.5 Detection Based on Network Behaviour Analysis

Network Behaviour Analysis or NBA is a method that can be used to collect a wide range of
information and statistics about network traffic. The information is analysed to detect
any signs of threats or malicious activities. The NBA method consists of several
components, including sensors and management servers (analysers) (Scarfone & Mell, 2007;
Timofte & Romania, 2007).

The NBA system collects information such as IP addresses, operating systems, available
services, and logging data such as timestamps, event types, network protocols, host
ports, and additional packet header fields for each client (Scarfone & Mell, 2007).
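
One way to picture the logged data is as a record per observed flow. The data structure
below is a hypothetical sketch: the class name, field names, and sample values are
assumptions for illustration and do not come from the cited guide or from botAnalytics.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class FlowRecord:
    """One logged network event, mirroring the fields listed above."""
    timestamp: datetime
    event_type: str
    protocol: str
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    # Additional packet header fields, e.g. selected HTTP headers.
    header_fields: dict = field(default_factory=dict)

# Example record with made-up values.
record = FlowRecord(
    timestamp=datetime(2010, 1, 1, 12, 0, 0),
    event_type="http_request",
    protocol="TCP",
    src_ip="10.0.0.5",
    src_port=49152,
    dst_ip="192.0.2.80",
    dst_port=80,
    header_fields={"User-Agent": "ExampleAgent/1.0"},
)
```

Keeping the extra header fields in a dictionary lets a sensor add fields without changing
the record layout.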



2.5.1 Why Choose Network Behaviour Analysis?

The Network Behaviour Analysis system has been chosen for this research for two

main reasons:

a) Ability to Detect Unknown Threats:

Botmasters update their techniques day by day to hide their activities from existing
detection methods (Lee, et al., 2007). The NBA system can thwart the Botmasters' strategy
as it can detect unknown (zero-day) threats. This feature of the NBA system can further
improve Botnet detection (Derek, 2009; Scarfone & Mell, 2007).

b) Ability to Detect Encrypted Threats:

Botnets try to hide their communication flow among normal web traffic (e.g.

HTTP C&C) (Zeidanloo & Manaf, 2009) or use encryption methods (e.g. P2P

C&C) (Ping, et al., 2007). NBA looks out for abnormal flow patterns in network

traffic, and not at the content of the information being transmitted (Rehak et al.,

2009).

In addition, the benchmark report from Aberdeen Group (Derek, 2009) pointed out

that the NBA methods produce good results when combined with other methods.

2.5.2 Existing Detection Methods Based on NBA

The Network Behaviour Analysis technique has been widely used by researchers for

Botnet detection for many years.



Strayer, Walsh, Livadas, and Lapsley (2006) designed a system to detect IRC bots

using five filters. Initially, the IRC chat traffic is separated from the other types of

traffic. The IRC traffic is then examined using five different filters to reduce the amount

of useless traffic flows. The first filter is applied to reduce the amount of IRC traffic

based on the assumption that bots use only TCP-based IRC flows.

The other four filters, respectively, further reduce the tracked IRC traffic based on

the following criteria: flows that have only SYN and RST flags; high bit rate flows;

flows whose average packet size is bigger than expected; and short duration flows. In the last stage

of filtering of the IRC traffic flow, the machine-learning technique proposed by

Livadas, Walsh, Lapsley, and Strayer (2006) is applied. Finally, a five-dimensional

correlation algorithm is used to make a final decision to detect IRC bots (Strayer,

Walsh, Livadas, & Lapsley, 2006).

Gianvecchio, Xie, Wu, and Wang (2008) studied the results from different

measurements, which show the difference between the bot behaviour and human

behaviour in the IRC chat. They noticed a difference between bots and humans with

respect to the inter-message delay and message size in the Internet chat rooms. After

analysing these two parameters, they proposed a system that uses entropy and machine-

learning-based classifiers to detect chat bots.
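As a rough illustration of the entropy idea, the sketch below estimates the Shannon entropy of inter-message delays after grouping them into fixed-width bins. The function name, the bin width, and the sample delays are assumptions made for this example and are not taken from Gianvecchio et al. (2008). Very regular delays fall into few bins and give low entropy, while irregular delays spread over many bins and give higher entropy.

```python
import math
from collections import Counter

def delay_entropy(delays, bin_width=1.0):
    """Shannon entropy (in bits) of inter-message delays grouped into bins."""
    if not delays:
        return 0.0
    # Assumption: fixed-width bins are a good enough discretisation here.
    bins = Counter(int(d // bin_width) for d in delays)
    total = len(delays)
    probs = [count / total for count in bins.values()]
    return sum(p * math.log2(1.0 / p) for p in probs)

# Hypothetical sample data (seconds between consecutive chat messages).
bot_like = [5.0, 5.1, 5.2, 5.0, 5.3, 5.1]        # almost constant delays
human_like = [1.2, 14.8, 3.5, 40.1, 7.9, 22.4]   # irregular delays

bot_entropy = delay_entropy(bot_like)
human_entropy = delay_entropy(human_like)
```

In this sample the bot-like delays all land in one bin, so their entropy is zero, while the six human-like delays land in six different bins.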

Mitsuaki et al. (2007) introduced three metrics - relationship style, response time,

and synchronisation activities - for detecting bots. Because Botmasters are

connected to the bots via Command and Control servers, the authors assume that there is a

1-to-N relationship between the Botmaster and the bots in a Botnet. Mitsuaki et al. use the

structure of this relationship as a metric to detect Botnets.



They also observed that IRC chat bots respond faster than humans; hence, the

response time is used as the second metric. Finally, they observed that the bots get their

commands from the Botmaster. This means that the bots may perform abnormal

activities in synchronisation with other bots in the same Botnet. This

synchronisation activity is used as the third metric.

Wei et al. (2009) categorised the services or applications using signature-based and

decision tree classifiers. They categorised the network applications into IRC chat, P2P,

and web applications. Then, focusing on each category, they used the response time and

synchronisation activity metrics introduced by Mitsuaki et al. (2007) to differentiate

the bot activities from the normal activities.

Guofei, Phillip, Vinod, Martin, and Wenke (2007) proposed BotHunter, which

models five subsets of activity that may occur during the infection process by bots. They

assigned these subsets to different correlation engines that examine the traffic flows for

any evidence of Botnet activities.

BotSniffer (Gu, et al., 2008) and its extension BotMiner (Guofei, Roberto, Junjie, &

Wenke, 2008) are Botnet detection systems that carry out their tasks by analysing the

similarity in the abnormal or malicious activities generated by the bots of the same

Botnet.

Jae-Seo et al. (2008) used a parameter based on one of the pre-defined

characteristics of HTTP-based Botnets. As discussed earlier, the HTTP bots

periodically connect to a particular Command and Control server to get updates. The

researchers proposed a degree of periodic repeatability (DPR) to show the



rate of periodic connections to certain servers. The value of DPR is used as a parameter

to detect HTTP-based bots.

The next section will evaluate some of the methods used in past research and

compare them to the system developed in this research.

2.5.3 Evaluation and Comparison of Existing NBA Methods for Botnet Detection

A Botnet detection system, called botAnalytics, was developed in this research to

detect HTTP-based Botnets. The reasons for choosing HTTP-based bots, and the

Network Behaviour Analysis approach for the design of botAnalytics, have already been

discussed. In this section, botAnalytics is compared with other methods from past

research that also used the NBA technique. Table 2-1 shows the comparison in brief.

Table 2-1: Comparison of Methods from Past Researches with botAnalytics

As shown in Table 2-1, all the methods are able to detect unknown (zero-day) bots.

This ability is one of the main advantages of using the Network Behaviour Analysis

system, as discussed earlier. botAnalytics was designed to detect HTTP-based Botnets,



hence, it cannot be compared with the first five methods, which were designed to detect

IRC-based Botnets.

Jae-Seo et al. (2008) proposed a system to detect only HTTP-based bots. In this

method, normal applications can incorrectly be detected as bots, and this can produce

a very high false-positive rate.

The methods proposed by Guofei et al. (2008) and Gu et al. (2008) were designed to

detect all three types of bots: IRC-based, P2P, and HTTP-based bots. In general, their

methods produce low false-positive results, but their sub-systems, which are involved in

detecting HTTP-based bots, produce high false-positive results. This is because the

proposed HTTP-based Botnet detection sub-systems have the same design as that

proposed by Jae-Seo et al.

As discussed earlier, the technique proposed by Guofei et al. and its extension by

Gu et al., are based on the similarity of the bots' group activities, and use data mining

approaches. These techniques need a Botnet that has a large number of bots to

produce results that support better decisions. For this reason, these methods are not effective

against small-scale Botnets.

Gu et al. (2008) proposed a method to detect small-scale Botnets, but its effectiveness

is directly tied to the false-positive rate, which means that as its

effectiveness against small-scale Botnets increases, the false-positive rate also increases.

The botAnalytics system developed in this research was aimed at overcoming the

weaknesses of BotSniffer (Gu, et al., 2008), and BotHunter (Guofei, Phillip, Vinod,

Martin, & Wenke, 2007). It can detect even a very small-scale Botnet that has only one



bot. In addition, botAnalytics produces a very low false-positive rate, unlike the method

developed by Jae-Seo et al. (2008).

2.6 Conclusion

There are three types of Botnet based on the way their bots communicate with each

other. IRC-based and HTTP-based Botnets are called centralised and P2P is called

decentralised. HTTP-based Botnets are the latest generation of Botnets and hide their

activity by using normal HTTP traffic.

HTTP-based Botnets have a set of characteristics that make their detection difficult

compared to the IRC and P2P Botnets. There are several methods and techniques that

have been used by researchers to track the Botnet activities and detect them, but the

number of studies on HTTP-based Botnet detection is low compared to the

number of studies on the detection methods for IRC-based and P2P Botnets.

The ability of the NBA system to detect unknown and encrypted threats made it the

preferred basis for modeling botAnalytics. The next chapter discusses the process of

modeling a detection system based on the NBA architecture.



Chapter 3: Modeling of Detection System

3.1 Introduction

This chapter describes the method adopted to carry out the research on modeling a

new system for detecting HTTP-based Botnets. As described in the literature review, this

research proposes a detection method that uses the network behaviour analysis

(NBA) architecture (Derek, 2009; Scarfone & Mell, 2007). The proposed method uses the NBA

architecture to collect a wide range of information and statistics about particular network

traffic. The collected information is then analysed to search for any signs of bots and Botnet

activities.

3.2 Proposed Method Architecture

There are three layers in the proposed method architecture: the data collecting platform,

the data storing platform, and the data analysing platform. Based on the NBA structure, the

proposed method consists of several components that include the software sensors and

management server (Analyser) (Scarfone & Mell, 2007; Timofte & Romania, 2007). Figure

3-1 shows the schema of the proposed method architecture.

Figure 3-1: botAnalytics System Architecture



3.2.1 Data Collecting Platform

The data collecting platform consists of a set of software sensors, which are

installed on each client in a particular network. The main task of the data

collecting platform is to collect data on the HTTP traffic of each client and to store

the data in the database. This platform also uses a set of filters and other techniques

to separate out data on unwanted traffic.

3.2.2 Data Analysing Platform

The data collected by the data collecting platform are analysed by the data

analysing platform to detect suspicious activities associated with a bot or Botnet. A

set of filters and techniques is used by this platform to make the analysis process

fool-proof.

3.2.3 Data Storing Platform

The data storing platform is the place where the collected data are kept

before and after the analysis process. All the results are saved in the database to

maintain the history of the system performance.

3.3 Data Reduction Filters

In addition to sniffing network traffic, the proposed data collecting platform applies two

filters to the collected data to discard useless data and reduce the

amount of unwanted data.



3.3.1 HTTP Traffic Separator Filter

The HTTP Traffic Separator (H.T.S.) filter was designed to separate the HTTP

traffic from other types of traffic in the network. botAnalytics was designed to

detect HTTP-based Botnets. As mentioned in section 2.2.4, HTTP-based Botnets

use the HTTP traffic; hence, the data on other types of network traffic are not

collected. Figure 3-2 shows the flowchart of this filter.

Figure 3-2: The flowchart of H.T.S. filter
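The flowchart itself is not reproduced in this text, so the following is only a minimal sketch of how such a separator could work. It assumes that a captured TCP payload is treated as HTTP when it begins with an HTTP request line; the function name hts_filter, the regular expression, and the sample payloads are hypothetical and are not part of botAnalytics.

```python
import re

# Assumption: a payload counts as HTTP when it starts with a request line
# of the form "<METHOD> <target> HTTP/<major>.<minor>".
REQUEST_LINE = re.compile(rb"^[A-Z]+ \S+ HTTP/\d\.\d\r?\n")

def hts_filter(payloads):
    """Keep only payloads that look like HTTP requests; drop the rest."""
    return [p for p in payloads if REQUEST_LINE.match(p)]

# Hypothetical captured payloads.
captured = [
    b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n",
    b"\x16\x03\x01\x02\x00\x01",               # TLS handshake bytes
    b"NICK bot42\r\nUSER bot 0 * :bot\r\n",    # IRC traffic
    b"OPTIONS * HTTP/1.1\r\nHost: example.com\r\n\r\n",
]
http_only = hts_filter(captured)
```

A real sensor might also rely on port numbers or a protocol dissector; matching on the request line is just one simple option.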

3.3.2 Get and Post Separator Filter

The Get and Post Separator (G.P.S.) filter was designed to select only the HTTP

traffic with GET or POST methods. The HTTP-based bots use the GET or POST

methods to contact their Command and Control server; thus, the other methods

provide no information about bot activities (Joe, 2004; Naseem, et al., 2010;

Nazario, 2007). Therefore, the G.P.S. filter focuses on the HTTP methods, and

only selects the HTTP traffic with the GET and POST methods. Figure 3-3 shows

the flowchart of this filter.



Figure 3-3: The flowchart of G.P.S. filter
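The selection rule of the G.P.S. filter can be sketched as follows. This is an illustrative sketch, not the botAnalytics code: the function name gps_filter and the sample requests are hypothetical, and the input is assumed to be raw HTTP requests that start with the method token.

```python
def gps_filter(requests):
    """Keep only HTTP requests whose method is GET or POST."""
    kept = []
    for raw in requests:
        # The method is the first space-delimited token of the request line.
        method = raw.split(b" ", 1)[0]
        if method in (b"GET", b"POST"):
            kept.append(raw)
    return kept

# Hypothetical requests that already passed the H.T.S. filter.
http_requests = [
    b"GET /update?id=7 HTTP/1.1\r\nHost: example.com\r\n\r\n",
    b"POST /report HTTP/1.1\r\nHost: example.com\r\n\r\n",
    b"HEAD /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n",
    b"OPTIONS * HTTP/1.1\r\nHost: example.com\r\n\r\n",
]
get_post_only = gps_filter(http_requests)
```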

3.4 VOU Mechanism

The VOU or Validation of User-Agents mechanism was designed based on a unique

algorithm. It is used for the first time in this research, in the data collecting platform. This

mechanism sets the VOU field of each collected HTTP traffic packet to an

appropriate value.

The VOU mechanism acts on each collected packet of HTTP traffic with the GET

or POST methods, and obtains the User-Agent from the collected traffic header. In the next

step, the VOU tries to identify the User-Agent string and its corresponding application from

the installed application list. The installed application list contains the applications and

services that are available on each client within a network, together with their

corresponding User-Agents. This list can be updated by users or automatically from

websites such as www.user-agents.org. Figure 3-4 shows the flowchart of the VOU

mechanism.



Figure 3-4: The VOU Module Flowchart

For each collected HTTP packet, the VOU field is updated with one of

three values, based on different conditions, as explained below:



1) UNKNOWN value

If the VOU mechanism is not able to determine the User-Agent for

any reason, for example, due to encryption or use of fake User-Agents, the

VOU field of the collected traffic will be given the UNKNOWN value. If

the VOU mechanism is able to determine the User-Agent but is not able to

identify the corresponding application, the VOU field will also be given the

UNKNOWN value.

2) VALID value

The VOU field will be set to the VALID value if the User-Agent and

its corresponding application have been identified, and the corresponding

application has been installed on the client and is available at the same time.

3) NOTVALID value

If the User-Agent and its corresponding application have been

identified but the corresponding application is not available on the client, the

VOU field will be given the NOTVALID value.
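The three cases above can be summarised in a short sketch. It is a simplified illustration under stated assumptions, not the botAnalytics algorithm: the function name vou_value is hypothetical, the User-Agent is matched by exact string lookup, and "installed and available" is reduced to membership in a set of application names.

```python
def vou_value(user_agent, known_agents, installed_apps):
    """Return the VOU field value for one collected request.

    known_agents:   maps a User-Agent string to an application name
    installed_apps: set of application names available on the client
    """
    if not user_agent:
        # The User-Agent could not be determined (missing or unreadable).
        return "UNKNOWN"
    app = known_agents.get(user_agent)
    if app is None:
        # The User-Agent was read but no application could be identified.
        return "UNKNOWN"
    if app in installed_apps:
        return "VALID"
    return "NOTVALID"

# Hypothetical User-Agent list and client inventory.
known_agents = {
    "ExampleBrowser/3.0": "ExampleBrowser",
    "ExampleUpdater/1.0": "ExampleUpdater",
}
installed_apps = {"ExampleBrowser"}

results = [
    vou_value(None, known_agents, installed_apps),
    vou_value("Strange/0.1", known_agents, installed_apps),
    vou_value("ExampleBrowser/3.0", known_agents, installed_apps),
    vou_value("ExampleUpdater/1.0", known_agents, installed_apps),
]
```

Here the last request claims to come from an application that is not present on the client, which is the case the NOTVALID value is meant to flag.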

3.5 Analysing the Collected Traffic

The data collecting platform periodically sniffs the network traffic and applies the

H.T.S. and G.P.S. filters to select only HTTP-type traffic using the GET or POST method.

In addition to these filters, the VOU mechanism is applied to the collected data as described in

section 3.4. When a reasonable number of packets have been collected and stored in the

data storing platform, the Analyser begins its work in the data analysing platform as follows:



3.5.1 Grouping and Sorting

The Grouping and Sorting (G.A.S.) process sorts data on the collected traffic

and divides them into different groups based on the source IP address (SIP),

destination IP address (DIP), URL, and the User-Agent string (UA).

While other studies mostly use the source IP, destination IP and domain

names to divide the collected traffic packets into different groups, the proposed

method also uses one of the HTTP header fields, known as the User-Agent, as

an additional parameter to make the classification of the collected network packets

more accurate. The G.A.S. process categorises the traffic packets into

different groups; three different filters are then applied to each group of packets

to search for signs of suspicious activities and the presence of HTTP bots.
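The grouping step can be illustrated with a small sketch. The record layout (dictionary keys sip, dip, url, ua, and ts for the timestamp), the function name, and the sample addresses are assumptions made for this example; only the four-part grouping key comes from the description above.

```python
from collections import defaultdict

def group_and_sort(records):
    """Group request records by (SIP, DIP, URL, UA), sorted by time per group."""
    groups = defaultdict(list)
    for rec in records:
        key = (rec["sip"], rec["dip"], rec["url"], rec["ua"])
        groups[key].append(rec)
    for recs in groups.values():
        recs.sort(key=lambda r: r["ts"])
    return dict(groups)

# Hypothetical collected records (addresses from documentation ranges).
records = [
    {"sip": "192.0.2.10", "dip": "198.51.100.5", "url": "/cmd", "ua": "AgentA", "ts": 600.0},
    {"sip": "192.0.2.10", "dip": "198.51.100.5", "url": "/cmd", "ua": "AgentA", "ts": 300.0},
    {"sip": "192.0.2.10", "dip": "198.51.100.5", "url": "/cmd", "ua": "AgentB", "ts": 310.0},
    {"sip": "192.0.2.11", "dip": "198.51.100.5", "url": "/cmd", "ua": "AgentA", "ts": 320.0},
]
groups = group_and_sort(records)
```

Note that the third record differs from the first two only in its User-Agent, so it ends up in its own group; with a key of source IP, destination IP and URL alone it would have been merged with them.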

3.5.2 High Access Rate Filter

The H.A.R. filter or High Access Rate filter eliminates groups of similar

HTTP connections or requests that have been generated within a very short time, for

example, more than one request per second. Figure 3-5 shows the H.A.R. filter

flowchart.



Figure 3-5: Flowchart of H.A.R. Filter
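A minimal sketch of this kind of rate check is given below. It assumes that each group is represented by a sorted, non-empty list of request timestamps in seconds and that the rate is averaged over the time span of the group; the function names and the default threshold of one request per second (taken from the example in the text) are illustrative choices, not the botAnalytics implementation.

```python
def request_rate(stamps):
    """Average requests per second over the time span covered by a group."""
    span = stamps[-1] - stamps[0]
    if span > 0:
        return len(stamps) / span
    # No measurable span: one request has no rate, several form a burst.
    return float("inf") if len(stamps) > 1 else 0.0

def har_filter(groups, max_rate=1.0):
    """Drop groups whose request rate exceeds max_rate; keep the rest."""
    return {key: s for key, s in groups.items() if request_rate(s) <= max_rate}

# Hypothetical groups: key -> sorted request timestamps (seconds).
groups = {
    "burst": [0.0, 0.2, 0.4, 0.6, 0.8],   # 5 requests in 0.8 s
    "slow": [0.0, 60.0, 120.0, 180.0],    # 4 requests in 180 s
}
remaining = har_filter(groups)
```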

3.5.3 Low Access Rate Filter

The L.A.R. filter or Low Access Rate filter removes the HTTP traffic with

fewer than two request packets in the whole data collecting period. For example, if a

group of HTTP traffic is generated within a very short time in the data collecting

period, it will be removed by this filter. Figure 3-6 shows the L.A.R. filter

flowchart.



Figure 3-6: Flowchart of L.A.R. Filter
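The count-based rule stated for this filter can be sketched in a few lines. The function name lar_filter and the group representation (a mapping from a group key to its list of request timestamps) are assumptions for illustration; the threshold of two requests follows the text.

```python
def lar_filter(groups, min_requests=2):
    """Remove groups with fewer than min_requests requests in the period."""
    return {key: s for key, s in groups.items() if len(s) >= min_requests}

# Hypothetical groups: key -> request timestamps (seconds).
groups = {
    "single": [42.0],                     # only one request: removed
    "repeated": [0.0, 300.0, 600.0],      # kept for further analysis
}
remaining = lar_filter(groups)
```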

3.5.4 Periodic Access Rate Filter

The P.A.R. filter or Periodic Access Rate filter selects the HTTP

connections or requests that were generated at periodic intervals. This filter was

designed based on the nature of HTTP-based Botnets. As noted in the literature

review, the HTTP bots connect to their command and control server periodically to

get the commands or updates. Figure 3-7 shows the P.A.R. filter flowchart.



Figure 3-7: P.A.R. Filter Flowchart
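The text does not state how periodicity is measured, so the sketch below uses one possible test as an assumption: a group is treated as periodic when the gaps between consecutive requests are nearly constant, judged by the ratio of their standard deviation to their mean. The function names, the tolerance value, and the sample timestamps are hypothetical and do not describe the actual P.A.R. algorithm.

```python
from statistics import mean, pstdev

def is_periodic(stamps, tolerance=0.1):
    """True when gaps between consecutive requests are nearly constant."""
    gaps = [b - a for a, b in zip(stamps, stamps[1:])]
    if len(gaps) < 2:
        return False          # need at least two gaps to compare
    avg = mean(gaps)
    if avg <= 0:
        return False
    # Assumption: a low spread relative to the mean gap indicates periodicity.
    return pstdev(gaps) / avg <= tolerance

def par_filter(groups, tolerance=0.1):
    """Select only the groups whose requests arrive at periodic intervals."""
    return {k: s for k, s in groups.items() if is_periodic(s, tolerance)}

# Hypothetical groups: key -> sorted request timestamps (seconds).
groups = {
    "beacon": [0.0, 300.0, 601.0, 899.0, 1200.0],   # roughly every 300 s
    "browsing": [0.0, 4.0, 95.0, 110.0, 700.0],     # irregular gaps
}
periodic = par_filter(groups)
```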

3.6 LODA Mechanism

The LODA or Level of Danger Analysing mechanism is designed to analyse the

detected suspicious traffic to define its level of danger. Figure 3-8 shows the flow chart of

the analysing algorithm of LODA.

For every suspicious activity detected, the analysis process starts by examining the

VOU field value, which has been set by the VOU mechanism. If the value of the VOU field

of a particular group of suspicious traffic is VALID, the level of danger field for that group

will be set to LOW. If the VOU value is NOTVALID, the level of danger will be set to

HIGH, and if the VOU value is UNKNOWN, the next step of analysing will start.
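The first decision described above can be sketched as a simple mapping. The function name loda_first_step is hypothetical, and the handling of the UNKNOWN case is deliberately left out: the sketch only signals that a further analysis step is needed.

```python
def loda_first_step(vou):
    """Map a group's VOU value to a level of danger, or None when undecided."""
    if vou == "VALID":
        return "LOW"
    if vou == "NOTVALID":
        return "HIGH"
    if vou == "UNKNOWN":
        # Deferred to the next analysis step, which is not sketched here.
        return None
    raise ValueError(f"unexpected VOU value: {vou!r}")

levels = [loda_first_step(v) for v in ("VALID", "NOTVALID", "UNKNOWN")]
```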



Figure 3-8: LODA Module Flowchart

If the value of the VOU field is UNKNOWN, a query is sent to the database

to retrieve the count of similar traffic groups generated by other clients in the



network. The answer is compared with the limit value set by the system Administrators. If

the count is greater than the limit value, the level of da
