HTTP-Based Botnet Detection

Upload: oldyaar-babak

Post on 14-Apr-2018

  • 7/30/2019 HTTP-Based Botnet Detection

    1/124

    botAnalytics: Improving HTTP-Based Botnet Detection

    by Using Network Behavior Analysis System

    Meisam Eslahi

    DISSERTATION SUBMITTED IN FULFILMENT OF THE

    REQUIREMENTS FOR THE DEGREE OF MASTER OF COMPUTER

    SCIENCE

    Faculty of Computer Science and Information Technology

    University of Malaya

    2010

    UNIVERSITI MALAYA

    ORIGINAL LITERARY WORK DECLARATION

    Name of Candidate: Meisam Eslahi (I.C/Passport No: I2140114)

    Registration/Matric No: WGA070104

    Name of Degree: Master of Computer Science

    Title of Project Paper/Research Report/Dissertation/Thesis (this Work):

    botAnalytics: Improving HTTP-Based Botnet Detection by Using Network Behavior

    Analysis System

    Field of Study: Network Security

    I do solemnly and sincerely declare that:

    (1) I am the sole author/writer of this Work;

    (2) This Work is original;

    (3) Any use of any work in which copyright exists was done by way of fair dealing and for permitted purposes and any excerpt or extract from, or reference to or reproduction of any copyright work has been disclosed expressly and sufficiently and the title of the Work and its authorship have been acknowledged in this Work;

    (4) I do not have any actual knowledge nor do I ought reasonably to know that the making of this work constitutes an infringement of any copyright work;

    (5) I hereby assign all and every rights in the copyright to this Work to the University of Malaya (UM), who henceforth shall be owner of the copyright in this Work and that any reproduction or use in any form or by any means whatsoever is prohibited without the written consent of UM having been first had and obtained;

    (6) I am fully aware that if in the course of making this Work I have infringed any copyright whether intentionally or otherwise, I may be subject to legal action or any other action as may be determined by UM.

    Candidate's Signature    Date

    Subscribed and solemnly declared before,

    Witness's Signature    Date

    Name: Dr Rosli Salleh

    Designation: Supervisor

    Abstract

    This thesis reports on research conducted to develop a method for detecting HTTP-based Botnets based on a Network Behaviour Analysis system. Bots are small pieces of malware that infect computers and join with other bots via the Internet to form a network of bots called a Botnet.

    Botnets and their bots have a dynamic and flexible nature. The Botmasters, who control the Botnets, update the bots and change their code day by day to evade traditional detection methods such as signature-based anti-virus software. In addition, Botmasters employ many techniques to keep their Botnets undetected for as long as possible. The latest generation of Botnets is HTTP-based and uses the standard HTTP protocol to communicate with its bots. By using normal HTTP traffic, the bots pass themselves off as normal users of the network and can easily bypass current network security systems.

    To solve this problem, a method based on a network behaviour analysis system was developed to improve the existing methods of detecting HTTP-based Botnets and their bots. The system, botAnalytics, was developed by modifying existing network behaviour analysis methods and adding new features to them. The Delphi programming language was used to develop the botAnalytics system, while Microsoft SQL Server 2008 was selected as its database management system. New filters and algorithms were designed and developed to analyse the collected network packets for any evidence of suspicious HTTP-based Botnet activities.

    In addition to HTTP-based Botnet detection, one of the HTTP header fields, the User-Agent, was used by botAnalytics to analyse the level of danger of detected suspicious activities. This is the first reported use of the User-Agent to aid Botnet detection. Based on the results of the testing and evaluation of botAnalytics, the system was found to be very efficient in detecting HTTP-based Botnets, including small-scale Botnets.
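
    As a rough illustration of the User-Agent idea mentioned above, the following Python sketch pulls the User-Agent field out of a raw HTTP request and flags values that are missing or match none of a few assumed browser patterns. It is not the botAnalytics implementation (which was written in Delphi) and does not reproduce its algorithms; the function names, the pattern list, and the sample requests are all hypothetical.

```python
import re

# Hypothetical patterns for common browsers; an assumption for illustration only.
KNOWN_BROWSER_PATTERNS = [
    re.compile(r"^Mozilla/\d\.\d \(.+\)"),
    re.compile(r"^Opera/\d"),
]


def extract_user_agent(raw_request):
    """Return the User-Agent header value of a raw HTTP request, or None."""
    # Headers end at the first blank line; the first line is the request line.
    head = raw_request.split("\r\n\r\n", 1)[0]
    for line in head.split("\r\n")[1:]:
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "user-agent":
            return value.strip()
    return None


def looks_suspicious(raw_request):
    """Flag a request whose User-Agent is missing or matches no known pattern."""
    user_agent = extract_user_agent(raw_request)
    if not user_agent:
        return True
    return not any(p.match(user_agent) for p in KNOWN_BROWSER_PATTERNS)


# Made-up sample requests: one browser-like, one with an unrecognised client string.
browser_request = (
    "GET /index.html HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:2.0) Gecko/20100101 Firefox/4.0\r\n"
    "\r\n"
)
odd_request = (
    "POST /gate.php HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "User-Agent: xyz-client\r\n"
    "\r\n"
)
```

    Calling looks_suspicious(browser_request) and looks_suspicious(odd_request) separates the two samples. Because a bot can copy a browser's User-Agent string, a check like this can only be one signal among several.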

    Acknowledgements

    Thank God, the most Gracious and Merciful, for all the blessings bestowed on me. The submission of this dissertation marks the end of a somewhat long journey in my pursuit of a Master's degree at the University of Malaya, Kuala Lumpur. The journey would have been difficult if not for the help, understanding and kindness of many people.

    I would like to express my sincere gratitude to my supervisors, Dr Omar Zakaria and Dr Rosli Salleh, for their kindness in taking me under their charge to conduct this research. Their patience and encouragement gave me the motivation to work on this research until its successful completion. Their guidance and readiness to share their knowledge have greatly contributed to the direction I should take and what I should do to achieve my goal. I cannot thank them enough, and I hope the Malay way of expressing how I feel says it all: Ribuan terima kasih.

    While doing my studies and research in FCSIT, I was never working alone. I had the friendship, goodwill and support of my course-mates and friends, who never hesitated to offer their advice and moral support when it was needed. To my good friend, Mohsen Saghafi, in particular, thank you for being there whenever I needed someone to go to for advice. To all of them, especially Saiful Khan, Teh Kang Hai, Paul Nelson, and Ali Keshavarz, a big thank you.

    I would like to express my gratitude and love to my family for their care and understanding while I was doing my research. To the two special women in my life, my mother K. Abdullahi and my wife Maryam Var Naseri: with your boundless love and your confidence in me, you have been my pillars of strength and determination to help me carry on. If I have succeeded, then you have been a big part of my success, and I dedicate it to both of you together with my love.

    Table of Contents

    Abstract
    Acknowledgements
    Table of Contents
    List of Figures
    List of Tables
    Abbreviations

    Chapter 1: Introduction
        1.1 Background
        1.2 Motivation
        1.3 Statement of Problem
        1.4 Statement of Objectives
        1.5 Proposed Solution
        1.6 Thesis Scope
        1.7 Thesis Organisation

    Chapter 2: Bot and Botnets
        2.1 Introduction
        2.2 Characteristics of Botnet
            2.2.1 Botnet life cycle and Botmaster activities
            2.2.2 Botmasters' Prime Targets
            2.2.3 Botnet Command and Control (C&C) Mechanism
            2.2.4 Centralised Command and Control Mechanism
                2.2.4.1 IRC-based Botnets
                2.2.4.2 HTTP-based Botnets
            2.2.5 Decentralised or P2P Command and Control Mechanism
        2.3 Why Choose HTTP Botnets?
        2.4 Existing Botnet Detection Methods
            2.4.1 Honeypot and Honeynet
            2.4.2 Detection by Signature
            2.4.3 Detection by DNS Monitoring
            2.4.4 Detection using Attack Behaviour Analysis
        2.5 Detection Based on Network Behaviour Analysis
            2.5.1 Why Choose Network Behaviour Analysis?
            2.5.2 Existing Detection Methods Based on NBA
            2.5.3 Evaluation and Comparison of Existing NBA Methods for Botnet Detection
        2.6 Conclusion

    Chapter 3: Modeling of Detection System
        3.1 Introduction
        3.2 Proposed Method Architecture
        3.3 Data Reduction Filters
        3.4 VOU Mechanism
        3.5 Analysing the Collected Traffic
        3.6 LODA Mechanism
        3.7 Proposed Method Flowchart
        3.8 Conclusion

    Chapter 4: Implementation of Proposed Model
        4.1 Introduction
        4.2 DELPHI programming language
        4.3 Client Side Implementation
            4.3.1 Settings
            4.3.2 Sniffing the Traffic
            4.3.3 H.T.S. filter
            4.3.4 G.P.S. filter
            4.3.5 VOU Mechanism
        4.4 Database Implementation
            4.4.1 Microsoft SQL Server 2008
            4.4.2 Tables structure
            4.4.3 Tables relationship
        4.5 Server Side Implementation
            4.5.1 General info
            4.5.2 Analyse
            4.5.3 Notifications
            4.5.4 Report
            4.5.5 User Agent list
            4.5.6 White list
            4.5.7 Black list
            4.5.8 Sensor status
            4.5.9 User account
        4.6 Conclusion

    Chapter 5: Testing the Proposed Model
        5.1 Introduction
        5.2 Hardware Requirements
        5.3 Testing bots
        5.4 Testing Command and Control servers
        5.5 Testing clients
        5.6 Testing analyser
        5.7 Testing results
        5.8 Conclusion

    Chapter 6: Data Analysis and Discussion
        6.1 Introduction
        6.2 Evaluation of botAnalytics
            6.2.1 Filtering evaluation
            6.2.2 VOU algorithm evaluation
            6.2.3 LODA algorithm evaluation
        6.3 Comparison of botAnalytics with Other Systems
            6.3.1 False-Positive rate
            6.3.2 Efficiency in small-scale Botnets
        6.4 Conclusion

    Chapter 7: Conclusion and Future Work
        7.1 Introduction
        7.2 Achievement of Objectives
        7.3 Contributions
            7.3.1 HTTP-based Botnet Detection
            7.3.2 Establishment of User-Agent
            7.3.3 New Filters and Algorithms
            7.3.4 Evaluate the Level of Danger
        7.4 Limitations and Future Work
            7.4.1 Real Time Detection
            7.4.2 Linux Platform
            7.4.3 Other Type of Bots and Botnets
            7.4.4 Prevention Methods
            7.4.5 Advanced the User-Agent for Botnet Detection
        7.5 Conclusion

    References

    List of Figures

    Figure 2-1: Botnet life cycle (Schiller & Binkley, 2007)
    Figure 2-2: General schema of Botnets C&C mechanism
    Figure 2-3: Centralised Botnet (Ping, et al., 2007)
    Figure 2-4: IRC-based C&C Botnet (Gu, et al., 2008)
    Figure 2-5: HTTP-based C&C Botnet (Gu, et al., 2008)
    Figure 2-6: Decentralized or P2P Botnet (Ping, et al., 2007)
    Figure 3-1: botAnalytics System Architecture
    Figure 3-2: The flowchart of H.T.S. filter
    Figure 3-3: The flowchart of G.P.S. filter
    Figure 3-4: The VOU Module Flowchart
    Figure 3-5: Flowchart of H.A.R. Filter
    Figure 3-6: Flowchart of L.A.R. Filter
    Figure 3-7: P.A.R. Filter Flowchart
    Figure 3-8: LODA Module Flowchart
    Figure 3-9: The Proposed Method Flowchart
    Figure 4-1: botAnalytics Client Side GUI
    Figure 4-2: Setting GUI
    Figure 4-3: Traffic Sniffer GUI
    Figure 4-4: H.T.S. Filter GUI
    Figure 4-5: G.P.S. Filter GUI
    Figure 4-6: VOU Mechanism GUI
    Figure 4-7: VOU Pseudo Code
    Figure 4-8: botAnalytics Database: Relationship between the Tables
    Figure 4-9: botAnalytics Server Side GUI
    Figure 4-10: General Info GUI
    Figure 4-11: GET and POST Percentage Query Pseudo Code
    Figure 4-12: Collected Traffic Statistics Query Pseudo Code
    Figure 4-13: Primary Data Tab of the Analyse Section
    Figure 4-14: Black/White listing Tab of the Analyse Section
    Figure 4-15: H.A.R. Result Tab of the Analyse Section
    Figure 4-16: H.A.R. Filter Pseudo Code
    Figure 4-17: L.A.R. Result Tab of the Analyse Section
    Figure 4-18: L.A.R. Filter Pseudo Code
    Figure 4-19: P.A.R. Result Tab of the Analyse Section
    Figure 4-20: P.A.R. Filter Pseudo Code
    Figure 4-21: LODA Module Pseudo Code
    Figure 4-22: LODA Module Result GUI
    Figure 4-23: Notifications GUI
    Figure 4-24: Report GUI
    Figure 4-25: User Agent List GUI
    Figure 4-26: White List GUI
    Figure 4-27: Black List GUI
    Figure 4-28: Sensor Status GUI
    Figure 4-29: Sensor Info Pseudo Code
    Figure 4-30: Top 10 Active Sensors Pseudo Code
    Figure 4-31: Edit Profile Tab
    Figure 4-32: Create New User Tab
    Figure 4-33: Manage Existing Users Tab
    Figure 5-1: General Schema for the Testing Phase
    Figure 5-2: The Black Energy User Agent (Nazario, 2007)
    Figure 5-3: The Firefox User Agent
    Figure 5-4: The Bobax User Agent
    Figure 6-1: The H.T.S. Filter Results Chart (See also Table 6-1)
    Figure 6-2: The G.P.S. Filter Results Chart (See also Table 6-1)
    Figure 6-3: The H.A.R. Filter Results Chart (See also Table 6-1)
    Figure 6-4: The L.A.R. Filter Results Chart (See also Table 6-1)
    Figure 6-5: The P.A.R. Filter Results Chart (See also Table 6-1)

    List of Tables

    Table 2-1: Comparison of Methods from Past Researches with botAnalytics
    Table 4-1: Comparison of Managed-code with Native-code Languages
    Table 4-2: Comparison of DBMSs
    Table 4-3: Microsoft SQL Server 2008 Extra New Features
    Table 4-4: tblUser Structure
    Table 4-5: tblSquestion Structure
    Table 4-6: tblRole Structure
    Table 4-7: tblUserAgent Structure
    Table 4-8: tblWhiteList Structure
    Table 4-9: tblBlackList Structure
    Table 4-10: tblClientsInfo Structure
    Table 4-11: tblVOU Structure
    Table 4-12: Structure of tblHMType table
    Table 4-13: Structure of tblVouValue Table
    Table 4-14: Structure of tblResult Table
    Table 4-15: Structure of tblLODA Table
    Table 4-16: Structure of the tblNotification Table
    Table 5-1: botAnalytics Filtering Result
    Table 5-2: botAnalytics Botnet Detection Results
    Table 6-1: botAnalytics: Results of Filtering
    Table 6-2: The VOU Algorithm Result
    Table 6-3: The LODA Algorithm Results
    Table 6-4: Comparison of the botAnalytics with existing HTTP-based Botnet detection researches
    Table 6-5: The botAnalytics False-Positive

    Abbreviations

    C&C      Command and Control
    DBMS     Database Management System
    DDOS     Distributed Denial of Service
    DNS      Domain Name System
    DNSBL    DNS-based Black Hole List
    ERD      Entity Relationship Diagram
    G.A.S.   Grouping and Sorting
    G.P.S.   GET and POST Separator
    GUI      Graphical User Interface
    H.A.R.   High Access Rate
    H.T.S.   HTTP Traffic Separator
    HTTP     Hyper Text Transfer Protocol
    IID      Iterative and Incremental Development
    IRC      Internet Relay Chat
    L.A.R.   Low Access Rate
    LODA     Level of Danger Analysing
    NBA      Network Behaviour Analysis
    P2P      Peer-to-Peer
    P.A.R.   Periodical Access Rate
    RAD      Rapid Application Development
    SDLC     System Development Life Cycle
    VOU      Validation of User-Agent

    Chapter 1: Introduction

    1.1 Background

    The development of computer networking, followed by the Internet in the second half

    of the last century, can be said to be one of the key technological developments that has

    revolutionised our daily life. The convenience and speed of digital communication has

    become an integral part of home computer use, as well in every other aspects of human

    activities, today, from education to business and research. While high-speed computer

    networking and the Internet have brought great convenience, a number of security

    challenges have also emerged with these technologies (O'Connor, 2004; Tanenbaum,

    2002).

    With the increasing use of computer networks and Internet on a global scale, network

    security becomes an important issue. In fact, without having adequate network security all

    the benefits brought by these technological developments would be lost as the networks and

    Internet are vulnerable to malicious attacks. These attacks or threats can come in different

    forms and can generally be categorised as: Viruses and Worms; Trojans; Backdoors;

    Spyware; Phishing; and Botnets. Among all these threats, the Botnet is considered the most

    dangerous (Barroso, 2007; Jae-Seo, HyunCheol, Jun-Hyung, Minsoo, & Bong-Nam, 2008;

    Star, 2008).

    A Botnet is a linked group of infected computers (termed bots or zombies), which

    communicate with each other and get their commands from a controller, called the Botmaster.

    A Botmaster has a mechanism to control its Botnets by sending commands to the bots

    and receiving responses from them. Different command and control mechanisms (e.g. IRC,

    HTTP, and P2P) are used by Botmasters to achieve this goal (Govil & Jivika, 2007;

    Naseem, shafqat, Sabir, & Shahzad, 2010).

    The main aim of Botnets is to carry out different types of malicious activities or to gain

    illegal profits. Some of these activities such as Distributed Denial of Service (DDoS),

    Spamming, Thieving Personal Information, Illegal Hosting, Click Fraud, and Adware are

    described below:

    a) DDOS: this is the distributed form of the Denial of Service (DOS) attack. It is carried out by sending a large number of UDP packets, ICMP requests, or TCP

    SYN floods, aimed at exhausting the resources of particular servers and forcing them to

    shut down. Because the Botmasters control the Botnets, they can carry out this type

    of attack from thousands of different places by sending a particular command to the

    bots in the infected computers in the same Botnet (Govil & Jivika, 2007; Puri,

    2003; Srikanth, Dina, Matthias, & Arthur, 2005).

    b) Spamming: spamming refers to sending emails that have the same content in high volume. Botnets can be considered as a perfect platform to collect different

    email addresses from infected computers, and generate and send spam or phishing

    emails (Yinglian et al., 2008).

    c) Thieving Personal Information: Botmasters use the Botnets to steal information and use it for their own benefit. They can set a trigger in the bots and make

    them scan the websites where the important information is entered. In addition,

    other applications such as key-loggers are spread by the bots to obtain important

    information like personal passwords, and financial data like online banking

    passwords, and credit card information. Depending on the size of the Botnet, a

    Botmaster can collect the required data or information from thousands to millions

    of computers (Al-Hammadi & Aickelin, 2008; Govil & Jivika, 2007).

    d) Illegal hosting: A computer or server with large storage and a high-bandwidth connection to the Internet can become a target for a Botmaster to gain control of and

    use illegally for file sharing (AUSCERT, 2002; Puri, 2003).

    e) Click Fraud and Adware: One of the main differences between Botnets and other Internet threats is that a Botnet can be used to make money through click fraud.

    Botmasters can amass a lot of money by using their bots to click on open websites

    that pay a small sum of money for each visit to the website or for each click on the

    advertisement. Pop-up advertisements can also be downloaded, installed, or

    displayed by bots to force a user to visit particular websites (Barroso, 2007).

    In addition, the Botnets can be used to spread different types of computer threats in the

    form of viruses, Trojans, Backdoors, worms, etc. This means that Botnets are not only a

    threat, but also a platform for the distribution of other threats (Star, 2008).

    1.2 Motivation

    In recent years, Botnets have become the biggest threat to cyber security, and

    have been used as an infrastructure to carry out nearly every type of cyber attack. In a

    review of the different types of malicious activities perpetrated by Botnets, it is found

    that they are not only a dangerous threat to computer networks and the Internet, but are

    also involved in other types of threats and attacks (Jae-Seo, et al., 2008; Lee, Wang, &

    Dagon, 2007).

    According to a Network World report in 2009, more than 11.1 million computers in the

    US had been infected by the 10 most damaging Botnets. While the theft of personal

    information has always been considered as one of the most disturbing Internet threats,

    the Zeus Botnet alone had infected nearly 3.5 million computers and attempted to steal

    sensitive information. Each bot can send an average of three spam emails or fake

    messages per second, thus, the Koobface Botnet with 2.9 million infected computers

    can generate more than 8 million fake messages per second (Messmer, 2009).

    In addition, the detection of Botnets and their associated bots is difficult for the

    reasons described below:

    a) Skilful Developers: Botnet developers have higher technical capabilities than any other online attackers. Unlike other types of network threats, Botnets and

    their bots are designed and developed for long-term goals, or even, for illegal

    monetary gains. Botmasters have various strategies to keep the bots safe and

    hidden, as long as possible (Lee et al., 2007).

    b) Dynamic Nature and Flexibility: Botnets and bots have a dynamic and flexible nature. They are continuously being updated and their code changed by the

    developers and owners to elude the traditional detection methods such as

    signature-based anti-viruses. The McAfee Research Lab reported that any

    success in Botnet detection is only temporary as the Botmasters frequently

    change their strategies, and design new methods to recover and restore their

    detected bots, within a short time (McAfee, 2010).

    c) Using Standard Protocols: The Botnets use standard protocols to establish their communication infrastructure. The latest generations of Botnets, called

    HTTP-based Botnets, use the HTTP protocol as their communication method.

    By using the normal HTTP traffic, they disguise themselves as normal network

    users and easily avoid detection by the current network security systems (Jae-

    Seo et al., 2008).

    d) Silent Threats: Barroso (2007) termed the Botnets as Silent Threats, as they try to control the infected computers without the knowledge of the computer

    users. The bots on infected computers will not make any unusual or suspicious

    use of the CPU, memory, or other computer resources, which will, otherwise,

    cause their presence to be exposed.

    The examples above show that Botnet detection is a big challenge in the network

    security management.

    1.3 Statement of Problem

    Company computers with high-bandwidth connectivity to the Internet, university

    servers, and home computers are the main targets for Botnets. The Botmasters try to gain

    control of these targets and use them to carry out their malicious activities.

    Today, the detection of Botnets has become a main issue in the field of computer

    network security. Botnets have several characteristics that make them difficult to

    detect. They spread very quickly, and the Botmasters are always trying different

    techniques to protect their bots from existing anti-virus software and detection systems

    (Lee, et al., 2007). Currently, there is no effective technique to stop Botnets, and

    existing detection techniques are unable to detect and prevent Botnets sufficiently.

    The McAfee Research Labs predicted that the cyber community will face more

    widely-distributed and more resilient Botnets, which are difficult to detect and destroy.

    Undoubtedly, network security researchers will continuously face big challenges on

    this problem (McAfee, 2010).

    1.4 Statement of Objectives

    The aim of this research is to develop an improved method for the detection of

    HTTP-based Botnets. In this context, the objectives of this research are as follows:

    To study the HTTP-based and other types of Botnets in detail.

    To evaluate the existing methods of Botnet detection.

    To review the characteristics and architecture of the Network

    Behaviour Analysis (NBA) System.

    To model a system to improve detection of the HTTP-based Botnets based

    on the Network Behaviour Analysis (NBA).

    To develop an HTTP-based Botnet detection system by using the NBA system

    architecture.

    To test and evaluate the proposed and developed system and to compare it to

    existing NBA methods.

    1.5 Proposed Solution

    In this thesis, a Network Behaviour Analysis system, called botAnalytics, is

    developed. The botAnalytics uses software sensors which are installed on network

    clients to collect information on the network flows. The information from an entire

    network will be stored in the server database and will be examined by another part of

    botAnalytics system, known as the analyser, to look for any evidence of HTTP-based

    Botnets activities.

    botAnalytics aims to be able to detect HTTP-based Botnets regardless of their size

    and with a very low false-positive ratio. Various types of data filtering were introduced

    for the first time or modified by botAnalytics to improve the detection process. In

    addition, one of the HTTP header fields, User-Agent (Fielding et al., 1999), was used to

    design a new algorithm to evaluate the danger level of detected suspicious activities.

    1.6 Thesis Scope

    In this research, the Network Behaviour Analysis technique (Scarfone & Mell,

    2007) was selected as it can be modified and used to detect the HTTP-based Botnets.

    Improvement will be made to existing HTTP-based Botnet detection capabilities by

    adding new features. The Network Behaviour Analysis technique was chosen because of

    its ability to detect encrypted and new (Zero-Day) bots, despite its drawback that it

    works passively, and is not suitable for real-time detection (Derek, 2009).

    It is difficult to find the source code of HTTP-based bots to establish a real Botnet,

    hence, it has to be simulated and implemented using appropriate programming

    approaches. The implementation of the bots in this research is based on two existing

    HTTP-based bots - the Black Energy (Nazario, 2007) and Bobax (Joe, 2004). Black

    Energy and Bobax were selected because other researchers,

    such as Jae-Seo et al. (2008) and Gu, Zhang, & Lee (2008), used these bots to evaluate

    their methods. Thus, the same bot structure can also be used to evaluate the new

    proposed method developed in this research, and compare it with other methods.

    1.7 Thesis Organisation

    Chapter 1 (Introduction): This chapter presents an overview of Botnets and their

    malicious activities, the motivation of this research, problem statements, the objectives,

    and the scope of this research.

    Chapter 2 (Literature Review): This chapter presents information from the

    literature on Botnet characteristics, lifecycle, and architecture. It also gives an

    overview of current Botnet detection methods.

    Chapter 3 (Modeling of Detection System): This chapter presents the steps

    involved in modeling the HTTP-based Botnet detection system to achieve the

    objectives.

    Chapter 4 (Implementation of Proposed Model): This chapter discusses the steps

    involved in developing the proposed system.

    Chapter 5 (Testing the Proposed Method): This chapter discusses the steps

    involved in testing the proposed system, and the testing process.

    Chapter 6 (Result Analysis and Discussion): This chapter presents the research

    findings, and discusses the effects of the new filters and algorithms developed.

    Chapter 7 (Conclusion and future work): This chapter provides a summary of the

    whole research and the significance of its findings. It also gives recommendations for

    related work to be undertaken in the future.

    Chapter 2: Bot and Botnets

    This chapter presents a review of the literature on existing research on Botnets, and

    the methods for Botnet detection. Section one gives an overview of bots and Botnets.

    Section two discusses the characteristics of Botnets, including the life cycle, the

    Botmasters' functions, and their prime targets, as well as their command and control

    mechanisms. Existing Botnet detection methods are reviewed in section three. The last

    section presents the network behaviour analysis technique, and background information on

    its use in Botnet detection.

    2.1 Introduction

    A bot (the term originates from "robot") is an application that can perform and

    repeat a particular task faster than a human. When a large number of bots

    spread to different computers and connect to each other through the Internet, they form a

    group called a Botnet, which is a network of bots (Mitsuaki et al., 2007). Botnets range in

    size from small Botnets with only thousands of bots to large Botnets with millions of

    bots. Regardless of their size, which is directly linked to their complexity and purpose,

    Botnets are mainly created to carry out malicious activities in computer networks (Govil &

    Jivika, 2007; Lee, et al., 2007; Zhaosheng et al., 2008).

    A bot is designed to infect computers, and the infected computers become a part of

    a Botnet without their owners' knowledge, and come under the control of a person, known

    as the Botmaster. The Botmaster sends orders to all the bots and controls the entire Botnet

    through the Internet and the servers, known as the command and control (C&C) servers

    (Govil & Jivika, 2007; Zhaosheng, et al., 2008).

    2.2 Characteristics of Botnet

    2.2.1 Botnet life cycle and Botmaster activities

    Botnets can be of different sizes or structures but, in general, they go through the

    same stages in their life cycle (Govil & Jivika, 2007; Schiller & Binkley, 2007). Figure 2-

    1 shows the life cycle of Botnets.

    a) Infection

    The life cycle of a Botnet begins with the infection of different

    computers by its bots. An infected computer is known as a zombie (Lee, et al.,

    2007).

    Figure 2-1: Botnet life cycle (Schiller & Binkley, 2007)

    b) Rallying

    After infecting the computer, the bot must connect to its Command and

    Control (C&C) server and let the Botmaster know that it has

    established a zombie successfully. In addition, it updates itself with essential

    information such as the list of relevant C&C server IP addresses.

    Therefore, rallying refers to the process when the bots connect to the C&C

    server for the first time (Schiller & Binkley, 2007).

    c) Get Commands and Send Reports

    During this stage, the bots on the infected computers, or zombies, listen to

    the Command and Control server or connect to it periodically to get new

    commands from the Botmaster. A new command, when detected by the bots, is

    treated as an order; they execute the order and the results are reported to the

    Command and Control server; the bots then wait for new commands (Govil &

    Jivika, 2007; Schiller & Binkley, 2007).

    d) Abandon

    When a bot is no longer usable (e.g. too slow) or the Botmaster decides that

    the particular bot is no longer suitable, it may be abandoned by the Botmaster.

    If this happens, the rest of the Botnet remains available. A whole Botnet is destroyed when

    all its bots are detected or abandoned or when the Command and Control

    Servers are detected and blocked (Schiller & Binkley, 2007).

    e) Securing the Botnet

    One of the important issues in each Botnet life cycle is the constant effort to

    keep the whole Botnet secure. The Botmasters do this by encrypting the

    messages that are delivered between the bots, and between the bots and the

    Command and Control servers. In addition, Botmasters may update the bots

    with new codes and new techniques to evade the anti-virus software (Schiller &

    Binkley, 2007).

    2.2.2 Botmasters' Prime Targets

    The Botmasters may infect different types of computers or servers but the most

    common targets are less-monitored computers, computers with high-bandwidth connectivity,

    university servers, and home computers. Computers that are connected to the Internet

    through a broadband connection give attackers an opportunity to use the same bandwidth.

    The not so computer-savvy home users are also prime targets of the Botmasters. These

    users usually have low awareness or lack knowledge of network security, and

    Botmasters take advantage of this to gain unauthorised access into the computers and

    keep their bots there for a long time without being detected (Govil & Jivika, 2007; Puri,

    2003).

    2.2.3 Botnet Command and Control (C&C) Mechanism

    As discussed in the previous sections, a Botnet threat comes from three main

    elements - the bots, the Command and Control (C&C) servers, and the Botmasters. The

    bots infect the computers, and the Command and Control servers distribute the

    Botmasters' orders to the bots in infected computers. These three elements have close

    communication with one another, thus, they will be useless without some form of

    Command and Control mechanism for this to take place (Gu, Zhang, & Lee, 2008).

    The Command and Control mechanism creates an interface between the bots, C&C

    servers and the Botmasters, to transmit data between them. It is crucial for

    Botmasters to establish a reliable connection between themselves, the infected

    computers, and C&C servers (Govil & Jivika, 2007). Figure 2-2 shows the logical

    relationship between these three elements.

    Figure 2-2: General schema of Botnet C&C mechanism

    There are two types of Botnet command and control architectures - centralised and

    decentralised - based on the way communication is implemented (Chao, Wei, & Xin,

    2009; Zeidanloo & Manaf, 2009).

    2.2.4 Centralised Command and Control Mechanism

    In the centralised command and control approach, all the zombies or bots are

    connected to the central C&C server, which is constantly waiting for new bots to be

    connected. Depending on the Botmaster's settings, a C&C server may provide some

    services to register the available bots, and this will make it possible to track their

    activities. Undoubtedly, the Botmaster must be connected to the C&C server to have

    control of the Botnets and distribute its commands and tasks (Gu, et al., 2008; Jing,

    Yang, Kaveh, Hongmei, & Jingyuan, 2009; Lee, et al., 2007; Ping, Sherri, & Cliff,

    2007). Figure 2-3 shows the structure of a Centralised Command and Control Botnet.

    Figure 2-3: Centralised Botnet (Ping, et al., 2007)

    Centralised Botnets are the most common type of Botnets as they use simple steps

    to create and manage the bots, and response is fast (Gu, et al., 2008; Jing, et al., 2009;

    Ping, et al., 2007). The centralised C&C mechanism is divided into two main types -

    IRC-based and HTTP-based - according to the communication protocol they use to

    establish their connection (Naseem, et al., 2010; Zeidanloo & Manaf, 2009; Zhaosheng,

    et al., 2008).

    2.2.4.1 IRC-based Botnets

    IRC or Internet Relay Chat is a system that is used by computer users to

    communicate online or chat in real-time mode (Kalt, 2000). This method was used in

    the first generation of bots, in which the Botmaster used the IRC server and the relevant

    channels to distribute commands (Jae-Seo, et al., 2008). Each bot connects to the

    IRC server and channel that has been selected by a Botmaster, and waits for commands.

    In this setup, the Botmaster establishes real-time communication with all the connected

    bots, and controls them. The IRC bots follow the PUSH approach, which means that

    when an IRC bot connects to a selected channel, it does not disconnect but

    remains connected (Gu, et al., 2008; Naseem, et al., 2010; Ping, Lei, Baber, &

    Cliff, 2009). Figure 2-4 shows the IRC-based Command and Control Botnets.

    Figure 2-4: IRC-based C&C Botnet (Gu, et al., 2008)

    2.2.4.2 HTTP-based Botnets

    HTTP-based Command and Control is a new technique that allows the Botmasters

    to control their bots by using the HTTP protocol (Jae-Seo, et al., 2008). In this

    technique, the bots use a specific URL or IP address defined by the Botmaster, to

    connect to a specific web server, which plays a Command and Control Server role

    (Naseem, et al., 2010).

    HTTP bots adopt the PULL approach, unlike the PUSH approach used by the

    IRC-based bots. In the PULL approach, the HTTP-based bots do not remain connected

    after they have first established a connection to the Command and Control server.

    Instead, the Botmasters publish the commands on certain web

    servers, and the bots periodically visit those web servers to update themselves or get

    new commands. This process continues at a regular interval that is defined by the

    Botmaster (Gu, et al., 2008; Jae-Seo, et al., 2008; Naseem, et al., 2010; Ping, et al.,

    2009). Figure 2-5 shows the HTTP-based Command and Control Botnets.

    Figure 2-5: HTTP-based C&C Botnet (Gu, et al., 2008)
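
Because PULL-style bots contact their server at a regular interval, the gaps between successive requests from one client to one server tend to be nearly constant. The sketch below illustrates this idea only. It is not the botAnalytics algorithm, and the function name, the variation threshold, and the minimum request count are assumptions chosen for illustration.

```python
from statistics import mean, pstdev

def looks_periodic(timestamps, max_cv=0.1, min_requests=5):
    """Return True if request times (in seconds) are spaced at a near-constant interval.

    max_cv and min_requests are hypothetical tuning values, not taken from the thesis.
    """
    if len(timestamps) < min_requests:
        return False  # too few requests to judge regularity
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    average_gap = mean(gaps)
    if average_gap <= 0:
        return False  # all requests at the same instant; no interval to measure
    # Coefficient of variation: spread of the gaps relative to their average.
    return pstdev(gaps) / average_gap <= max_cv

# A client polling roughly every 60 seconds versus a user browsing at irregular times.
bot_like = [0, 60, 120, 181, 240, 300]
human_like = [0, 5, 7, 200, 210, 900]
print(looks_periodic(bot_like), looks_periodic(human_like))
```

A regular gap on its own is not proof of a bot, since legitimate software such as update checkers also polls on a timer, so a check like this can only mark traffic as suspicious for further filtering.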

    2.2.5 Decentralised or P2P Command and Control Mechanism

    The decentralised Command and Control architecture is based on the peer-to-peer

    network model. In this model, the infected computers or zombies can act as a bot and as

    a C&C server at the same time (Ianelli & Hackworth, 2005; Jing, et al., 2009; Naseem,

    et al., 2010). In fact, in P2P Botnets, instead of having a central C&C server, each bot

    acts as a server to transmit the commands to its neighbouring bots. The Botmaster sends

    commands to one or more bots, and the bots that receive the commands then deliver

    them to other bots, and this process is repeated by each bot that receives a new

    command.
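
The relay process described above behaves like a flood over a graph of neighbouring peers. The sketch below models it as a breadth-first traversal and counts how many relay hops separate each peer from the point where a message enters the network. The peer names and the adjacency map are made up for illustration.

```python
from collections import deque

def relay_hops(neighbours, start):
    """Return the number of relay hops needed for a message to reach each peer.

    neighbours maps a peer to the peers it forwards messages to (hypothetical data).
    """
    hops = {start: 0}
    queue = deque([start])
    while queue:
        peer = queue.popleft()
        for nxt in neighbours.get(peer, []):
            if nxt not in hops:  # each peer forwards a given message only once
                hops[nxt] = hops[peer] + 1
                queue.append(nxt)
    return hops

overlay = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["F"],
    "E": [],
    "F": [],
}
print(relay_hops(overlay, "A"))
```

In a centralised structure every bot is one step from the server, whereas here the last peer is reached only after several relay steps.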

    Unlike the centralised Botnet, creating and managing the P2P Botnets involve

    complex procedures and require a high level of expertise (Gu, et al., 2008; Jing, et

    al., 2009; Ping, et al., 2007). Figure 2-6 shows the structure of a decentralised

    Command and Control Botnet.

    Figure 2-6: Decentralized or P2P Botnet (Ping, et al., 2007)

    2.3 Why Choose HTTP Botnets?

    As discussed in sections 2.2.4 and 2.2.5, there are three different types of Botnets -

    IRC, HTTP, and P2P. The reasons for choosing the HTTP-based Botnets for this

    research, are as follows:

    In the first generation of Botnets, the IRC technology was used by Botmasters to

    control the bots because the IRC system has several advantages such as ease of use,

    ease of control, and ease of management (Ianelli & Hackworth, 2005; Jae-Seo, et al.,

    2008). However, the main weakness of IRC Botnet is the central control mechanism. A

    whole Botnet can be destroyed by blocking the IRC server or blocking the IRC ports.

    Hence, the P2P Botnets were designed to overcome this problem (Wei, Tavallaee,

    Goaletsa, & A. Ghorbani, 2009; Zhaosheng, et al., 2008).

    In the decentralised Botnets or P2P Botnets, there is no central Command and

    Control server, rather, there are multiple distributed servers. Commands are delivered

    bot by bot to the entire Botnet. In addition, some encryption methods are used to make

    the communication secure (Gu, et al., 2008; Ianelli & Hackworth, 2005; Ping, et al.,

    2007). These techniques make it more difficult to detect P2P Botnets as compared to

    the IRC Botnets. However, P2P Botnets are not as widely used as IRC Botnets because

    the implementation and control of P2P bots can be quite difficult and complex. In

    addition, message delivery in P2P Botnets suffers from latency, and the

    Botmasters are not able to know the delivery status of the commands (Bailey,

    Cooke, Jahanian, Yunjing, & Karir, 2009).

    Recently, Botmasters have begun to use the centralised Command and Control

    structure, again. However, the HTTP protocol is used in place of the IRC protocol

    (Jae-Seo, et al., 2008; Naseem, et al., 2010), and port 80 is used. Because a

    wide range of legitimate services use this port, it is not easy to block the central Command and Control

    server (Sandvine, 2006). In addition, by using the HTTP protocol, bots hide their

    communication flows among the normal HTTP flows, and avoid detection by the

    network defenders such as the firewalls (Chao, et al., 2009; Govil & Jivika, 2007;

    Zeidanloo & Manaf, 2009).

    From the review of the characteristics of IRC, P2P, and HTTP-based Botnets, it is

    clear that the HTTP command and control mechanism is a new technology that is

    preferred by Botmasters. Compared to the IRC and P2P Botnets, HTTP-based Botnets

    have a set of attributes that make them difficult to detect. Surprisingly, the

    number of studies focusing on the detection of HTTP-based Botnets is relatively low

    compared to the number of studies on detection methods for IRC-based and

    P2P Botnets.

    The following sections discuss past and current research on Botnet detection

    methods.

    2.4 Existing Botnet Detection Methods

    This section discusses the current methods and research on Botnet detection.

    2.4.1 Honeypot and Honeynet

    Honeypots are tools that are used as traps for bots as they can detect bots or collect

    information on their activities. The information can be used to understand more about

    bots' behaviour or the intentions of the Botmasters. Nepenthes is a good example of a

    Honeypot that is used to collect the bots' binary code and other information about them

    (Niels & Thorsten, 2007; Rajab, Zarfoss, Monrose, & Terzis, 2006).

    Freiling, Holz, and Wicherski (2005) used Honeypots to collect information about

    DDOS attacks. This information includes DDOS signs and characteristics, cases, and

    the attackers' intentions and behaviour. This information is useful for the development

    of methods to prevent DDOS attacks.

    Similarly, Rajab et al. (2006) combined several Honeypots as a multifaceted

    approach to collect a large amount of information about IRC bots. By analysing the

    data, as well as tracking the activities of the bots, they learned more about the bots'

    characteristics and behaviour.

    Like any other tools and techniques, Honeypots have their weaknesses. There are

    two types of Honeypots - low-interaction honeypots, and high-interaction honeypots.

    The main difference between them is the level of access rights to system resources,

    services, and functions.

    Low-interaction honeypots like Nepenthes, are installed on computers to emulate

    limited services of their operating system, thus, they provide Botmasters limited

    interaction with the computers. Therefore, these computers may not be completely

    compromised, and the information collected on them may not be sufficient for analysis

    to detect Botnets (Niels & Thorsten, 2007).

    On the other hand, the high-interaction honeypots do not emulate any services of

    the operating system but provide the real system and services. The Botmaster can use these

    real services to gain full control of the computer in which the high-interaction honeypot

    is installed (Niels & Thorsten, 2007).

    Today, it is not surprising that Botmasters use many techniques to avoid the

    honeypots (C. Zou & Cunningham, 2006).

    2.4.2 Detection by Signature

    Signature refers to the known patterns or characteristics of threats from intruders

    into computer systems. By analysing and comparing these patterns or characteristics, it

    is possible to distinguish the threat activities from the normal activities (Scarfone &

    Mell, 2007).

    Goebel and Holz (2007) used IRC nicknames as signatures. In this method,

    known as Rishi, a reasonable amount of information on IRC traffic is collected.

    Subsequently, all the IRC nicknames are extracted from the collected data and checked

    against known bot nicknames by using regular expressions. To reduce the number

    of comparisons and the time taken, Goebel and Holz used a white list and a black list.
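
The nickname check described above can be pictured as follows. This is a minimal sketch, not Rishi itself: the regular expressions, the list contents, and the function name are assumptions made up for illustration.

```python
import re

# Hypothetical patterns: a short tag, a separator, and a long run of digits.
SUSPICIOUS_PATTERNS = [
    re.compile(r"^[A-Za-z]{2,4}[|_\-]\d{4,}$"),
    re.compile(r"^\[[A-Z]{2,3}\]\d{4,}$"),
]

def classify_nickname(nick, white_list, black_list):
    """Classify an IRC nickname as 'benign', 'known-bot', 'suspicious', or 'unknown'."""
    if nick in white_list:
        return "benign"       # skip the regular expressions entirely
    if nick in black_list:
        return "known-bot"    # already flagged earlier
    if any(pattern.match(nick) for pattern in SUSPICIOUS_PATTERNS):
        black_list.add(nick)  # remember it so the next lookup is a set check
        return "suspicious"
    return "unknown"

white = {"alice", "bob_work"}
black = set()
for nick in ["alice", "USA|0123456", "USA|0123456", "charlie"]:
    print(nick, classify_nickname(nick, white, black))
```

The two lists act as a cache in front of the regular expressions: a nickname that has already been judged is answered by a set lookup, so the patterns are only evaluated for nicknames seen for the first time.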

    The signature-based detection method is not very effective because this method

    cannot identify new behaviour patterns or certain characteristics. This method is based

    on a simple comparison of the collected information with the predefined characteristics

    of well-known bots. Thus, this method is good for detecting well-known bots, but quite

    useless for detecting new and zero-day bots (Chao, et al., 2009; Scarfone & Mell,

    2007).

    2.4.3 Detection by DNS Monitoring

    Monitoring and analysing the DNS traffic generated by bots has been used as a

    technique to detect Botnets. Choi, Lee, Lee, and Kim (2007) found that bots generate

    DNS traffic in some situations, for example, when identifying the Command and

    Control server or arranging attacks such as DDOS attack. The researchers used three

    main differences between the bot-generated DNS flows and the normal DNS flows, as

    ways to detect Botnets.

    The first difference they noticed is the number of source IP addresses that send

    DNS queries to a specific domain name. Botnet DNS queries are generated by a

    fixed number of IP addresses that belong to the bots in the same Botnet. In contrast,

    the number of IP addresses sending legitimate DNS queries to a particular domain name,

    which are generated by anonymous users, is random.

    The second difference lies in the format and frequency of DNS queries

    generated by bots and by normal users. Bots have similar group activities, thus, DNS

    queries of the same format are generated by bots from the same Botnets, intermittently,

    and only in special situations, but the DNS queries of normal users are generated

    continuously, and in a random format.

    The third difference is that normal users hardly ever use dynamic DNS

    (DDNS), whereas it is used by bots.
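
The first of the three differences above can be illustrated by counting, for one domain name, how many distinct source addresses query it in each time window and checking whether that count stays roughly constant. This is only a sketch of the idea, not the algorithm of Choi et al. The record layout, the window length, the tolerance, and the example domain names are assumptions.

```python
from collections import defaultdict

def sources_per_window(queries, domain, window_seconds=3600):
    """Count distinct source IPs querying `domain` in each time window.

    queries is an iterable of (timestamp_seconds, source_ip, queried_domain) tuples;
    this record layout is a hypothetical one chosen for the example.
    """
    windows = defaultdict(set)
    for timestamp, source_ip, queried in queries:
        if queried == domain:
            windows[int(timestamp // window_seconds)].add(source_ip)
    return [len(windows[w]) for w in sorted(windows)]

def count_is_stable(counts, tolerance=0.2, min_windows=3):
    """True if every per-window count stays within `tolerance` of the average."""
    if len(counts) < min_windows:
        return False  # not enough windows to compare
    average = sum(counts) / len(counts)
    return all(abs(c - average) <= tolerance * average for c in counts)

log = [
    (10, "10.0.0.1", "cc.example"), (20, "10.0.0.2", "cc.example"), (30, "10.0.0.3", "cc.example"),
    (3700, "10.0.0.1", "cc.example"), (3800, "10.0.0.2", "cc.example"), (3900, "10.0.0.3", "cc.example"),
    (7300, "10.0.0.1", "cc.example"), (7400, "10.0.0.2", "cc.example"), (7500, "10.0.0.3", "cc.example"),
    (40, "10.0.0.9", "news.example"), (3650, "10.0.0.8", "news.example"),
]
counts = sources_per_window(log, "cc.example")
print(counts, count_is_stable(counts))
```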

    Salomon and Brustoloni (2008) also used DDNS as the base parameter for two

    approaches to Botnet detection. They found that Botmasters do not keep the same

    Command and Control servers for a long time, and change them periodically. In

    this situation, bots will try to find the address of a new Command and Control server.

    When this happens, there will be a higher number of DDNS queries to specific domain

    names, which is a sign of unusual Botnet activity.

    NXDOMAIN has been evaluated as another parameter. The term NXDOMAIN, or

    Non-Existent Domain, describes the state that occurs when DNS resolvers

    are unable to resolve a certain domain name for any reason, such as a change of domain

    name, an unregistered domain name, or server problems. Salomon and Brustoloni

    (2008) suggested that a high number of DDNS queries that return the NXDOMAIN

    code could have been generated by bots that are searching for Command

    and Control servers that might have been blocked or moved.
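
    The NXDOMAIN parameter can be sketched as a simple per-client counter. This is

    only an illustration, not the method of Salomon and Brustoloni: the record layout, the

    threshold, and the sample log are assumptions. The only fixed fact used is that DNS

    response code 3 denotes NXDOMAIN.

```python
from collections import Counter

NXDOMAIN = 3  # DNS response code for "Non-Existent Domain"


def nxdomain_counts(responses):
    """Count NXDOMAIN responses per client.

    `responses` is an iterable of (client_ip, domain, rcode) tuples
    (assumed layout).
    """
    counts = Counter()
    for client_ip, _domain, rcode in responses:
        if rcode == NXDOMAIN:
            counts[client_ip] += 1
    return counts


def suspicious_clients(responses, threshold):
    """Return clients whose NXDOMAIN count exceeds `threshold`, sorted."""
    counts = nxdomain_counts(responses)
    return sorted(ip for ip, n in counts.items() if n > threshold)


# Hypothetical log: one client keeps asking for names that do not resolve.
log = [
    ("10.0.0.5", "a1.example", 3),
    ("10.0.0.5", "a2.example", 3),
    ("10.0.0.5", "a3.example", 3),
    ("10.0.0.8", "www.example", 0),
    ("10.0.0.8", "typo.example", 3),
]
print(suspicious_clients(log, threshold=2))
```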

    DNSBL or DNS Block List is a list of spamming computers and network IP

    addresses. Ramachandran, Feamster, and Dagon (2006) stated that DNSBLs may be

    checked by Botmasters to keep themselves aware of their bots' status, that is, to find out

    whether a particular bot is being blocked. Thus, their algorithms are designed to

    distinguish the normal DNSBL queries (generated in a normal service such as mail

    servers) from the queries generated by Botmasters.

    The detection methods discussed above were designed to analyse the DNS (Domain

    Name System) queries generated by bots and Botnets. These methods are no longer

    effective, as the new generation of bots and Botnets has been designed to generate a

    minimal number of DNS queries. Moreover, the process of analysing DNS traffic is very

    complex (Jae-Seo, et al., 2008).

    2.4.4 Detection using Attack Behaviour Analysis

    In this approach, researchers study the characteristics and behaviour of the attacks

    themselves, rather than the bots, the Command and Control servers, the

    Botmaster behaviour, or the communication methods used.

    Hu, Knyz, and Shin (2009) proposed a system, called RB-Seeker, which has three

    different sub-systems to detect bots that carry out URL redirection attacks. The first two

    sub-systems of the method attempt to identify all domains, which are related to

    redirection activities, based on the characteristics and behaviour of the URL redirection

    attack. At this stage, the system does not make any decision about the domain status,

    which can either be normal or malicious. In the next stage, the third sub-system

    examines the DNS queries to distinguish the malicious domains from the normal

    domains.

    Although this method uses DNS-based techniques, its main focus is

    on URL redirection activities, and DNS probing is used only as a sub-system.

    Therefore, this method does not belong to the DNS-based category.

    Brodsky and Brodsky (2007) found that bots send a higher number of spam emails

    within a short period than humans do. Based on this observation, the

    sources of spam emails were identified and recorded. Subsequently, the number of spam

    emails generated by the same recorded sources, within a short period, was used as a

    parameter for decision-making.

    Likewise, Yinglian et al. (2008) designed a system to collect all the URLs that were

    sent in spam emails, and divided them into different groups based on their Web

    domains. In the next step, all the URL groups were passed to a regular expression

    generator to create signatures for malicious URLs.

    These methods can identify bots based on the similarity of their group activities.

    The methods are effective when countering attacks from a large number of attackers.

    2.5 Detection Based on Network Behaviour Analysis

    Network Behaviour Analysis or NBA is a method that can be used to collect a wide

    range of information and statistics about network traffic. The information is analysed to

    detect any signs of threats or malicious activities. The NBA method consists of

    several components that include the sensors and management servers (Analyser)

    (Scarfone & Mell, 2007; Timofte & Romania, 2007).

    The NBA system collects information such as IP addresses, operating system,

    available services, and logging data such as timestamp, event type, network protocols,

    host ports, and additional packet header fields for each client (Scarfone & Mell, 2007).

    2.5.1 Why Choose Network Behaviour Analysis?

    The Network Behaviour Analysis system has been chosen for this research for two

    main reasons:

    a) Ability to Detect Unknown Threats:

    Botmasters update their techniques day-by-day to hide their activities from

    existing detection methods (Lee, et al., 2007). The NBA system can thwart the

    Botmasters' strategy as it can detect unknown (zero-day) threats. This feature of

    the NBA system can further improve Botnet detection (Derek, 2009; Scarfone &

    Mell, 2007).

    b) Ability to Detect Encrypted Threats:

    Botnets try to hide their communication flow among normal web traffic (e.g.

    HTTP C&C) (Zeidanloo & Manaf, 2009) or use encryption methods (e.g. P2P

    C&C) (Ping, et al., 2007). NBA looks out for abnormal flow patterns in network

    traffic, and not at the content of the information being transmitted (Rehak et al.,

    2009).

    In addition, the benchmark report from Aberdeen Group (Derek, 2009) pointed out

    that the NBA methods produce good results when combined with other methods.

    2.5.2 Existing Detection Methods Based on NBA

    The Network Behaviour Analysis technique has been widely used by researchers for

    Botnet detection for many years.

    Strayer, Walsh, Livadas, and Lapsley (2006) designed a system to detect IRC bots

    using five filters. Initially, the IRC chat traffic is separated from the other types of

    traffic. The IRC traffic is then examined using five different filters to reduce the amount

    of useless traffic flows. The first filter is applied to reduce the amount of IRC traffic

    based on the assumption that bots use only TCP-based IRC flows.

    The other four filters, respectively, further reduce the IRC traffic based on

    the following criteria: flows that have only SYN and RST flags; high-bit-rate flows;

    flows whose average packet size is larger than expected; and short-duration flows. In the last stage

    of filtering of the IRC traffic flow, the machine-learning technique proposed by

    Livadas, Walsh, Lapsley, and Strayer (2006) is applied. Finally, a five-dimensional

    correlation algorithm is used to make a final decision to detect IRC bots (Strayer,

    Walsh, Livadas, & Lapsley, 2006).

    Gianvecchio, Xie, Wu, and Wang (2008) studied the results from different

    measurements, which show the difference between the bot behaviour and human

    behaviour in IRC chat. They noticed a difference between bots and humans with

    respect to the inter-message delay and message size in the Internet chat rooms. After

    analysing these two parameters, they proposed a system that uses entropy and machine-

    learning-based classifiers to detect chat bots.
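
    The entropy idea can be illustrated with a minimal sketch. It is not the classifier

    of Gianvecchio et al.; the bin width and the sample delays are assumptions chosen

    for illustration. The intuition is that inter-message delays produced by a timer

    fall into few bins (low entropy), while human delays spread over many bins.

```python
import math
from collections import Counter


def shannon_entropy(values, bin_width):
    """Shannon entropy (in bits) of `values` after binning by `bin_width`."""
    if not values:
        return 0.0
    bins = Counter(int(v // bin_width) for v in values)
    total = len(values)
    # p * log2(1 / p) summed over the occupied bins
    return sum((n / total) * math.log2(total / n) for n in bins.values())


# Hypothetical inter-message delays in seconds.
bot_delays = [5.0, 5.1, 5.0, 5.2, 5.1, 5.0]
human_delays = [1.3, 12.8, 4.4, 31.0, 7.9, 2.2]

print(shannon_entropy(bot_delays, bin_width=1.0))
print(shannon_entropy(human_delays, bin_width=1.0))
```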

    Mitsuaki et al. (2007) introduced three metrics - relationship style, response time,

    and synchronisation activities - for detecting bots. Because the Botmasters are

    connected to the bots via Command and Control servers, they assumed that there is a

    1-to-N relationship between the Botmaster and the bots in a Botnet. Mitsuaki et al. used the

    structure of this relationship as a metric to detect Botnets.

    They also observed that IRC chat bots respond faster than humans; hence, the

    response time is used as the second metric. Finally, they observed that the bots get their

    commands from the Botmaster. This means that the bots may perform abnormal

    activities to be in synchronisation with other bots in the same Botnet. This

    synchronisation activity is used as another metric.

    Wei et al. (2009) categorised the services or applications using signature-based and

    decision tree classifiers. They categorised the network applications into IRC chat, P2P,

    and web applications. Then, focusing on each category, they use the response time, and

    synchronisation activities metrics introduced by Mitsuaki et al. (2007) to differentiate

    the bot activities from the normal activities.

    Guofei, Phillip, Vinod, Martin, and Wenke (2007) proposed BotHunter, which

    models five stages that may occur during the bot infection process. They

    assign these stages to different correlation engines, which examine the traffic flows for

    any evidence of Botnet activities.

    BotSniffer (Gu, et al., 2008) and its extension BotMiner (Guofei, Roberto, Junjie, &

    Wenke, 2008), are Botnet detection systems that carry out their tasks by analysing the

    similarity in the abnormal or malicious activities generated by the bots of the same

    Botnet.

    Jae-Seo et al. (2008) used a parameter based on one of the pre-defined

    characteristics of HTTP-based Botnets. As discussed earlier, the HTTP bots

    periodically connect to a particular Command and Control server to get updates. The

    researchers suggested that there is a degree of periodic repeatability or DPR to show the

    rate of periodic connections to certain servers. The value of DPR is used as a parameter

    to detect HTTP-based bots.

    The next section evaluates some of the methods used in past research and

    compares them with the system developed in this research.

    2.5.3 Evaluation and Comparison of Existing NBA Methods for Botnet Detection

    A Botnet detection system, called botAnalytics, was developed in this research to

    detect HTTP-based Botnets. The reasons for choosing HTTP-based bots, and the

    Network Behaviour Analysis approach for the design of botAnalytics, have already been

    discussed. In this section, botAnalytics is compared with other methods from past

    research that also used the NBA technique. Table 2-1 summarises the comparison.

    Table 2-1: Comparison of Methods from Past Researches with botAnalytics

    As shown in Table 2-1, all the methods are able to detect unknown (zero-day) bots.

    This ability is one of the main advantages of using the Network Behaviour Analysis

    system, as discussed earlier. botAnalytics was designed to detect HTTP-based Botnets,

    hence, it cannot be compared with the first five methods, which were designed to detect

    IRC-based Botnets.

    Jae-Seo et al. (2008) proposed a system to detect only HTTP-based bots. In this

    method, normal applications can incorrectly be detected as bots, which can produce

    a very high false-positive rate.

    The methods proposed by Guofei et al. (2008) and Gu et al. (2008) were designed to

    detect all three types of bots: IRC-based, P2P, and HTTP-based bots. In general, their

    methods produce low false-positive rates, but their sub-systems, which are involved in

    detecting HTTP-based bots, produce high false-positive rates. This is because the

    proposed HTTP-based Botnet detection sub-systems have the same design as that

    proposed by Jae-Seo et al.

    As discussed earlier, the technique proposed by Gu et al. and its extension by

    Guofei et al. are based on the similarity of the bots' group activities, and use data mining

    approaches. These techniques need a Botnet that has a large number of bots to

    produce results that support reliable decisions. For this reason, these methods are not effective

    in small-scale Botnets.

    Gu et al. (2008) proposed a method to detect small-scale Botnets, but its

    effectiveness is directly tied to the false-positive rate: as its

    effectiveness in small-scale Botnets increases, the false-positive rate also increases.

    The botAnalytics system developed in this research was aimed at overcoming the

    weaknesses of BotSniffer (Gu, et al., 2008), and BotHunter (Guofei, Phillip, Vinod,

    Martin, & Wenke, 2007). It can detect even a very small-scale Botnet that has only one

    bot. In addition, botAnalytics produces a very low false-positive rate, unlike the method

    developed by Jae-Seo et al. (2008).

    2.6 Conclusion

    There are three types of Botnet based on the way their bots communicate with each

    other. IRC-based and HTTP-based Botnets are called centralised and P2P is called

    decentralised. HTTP-based Botnets are the latest generation of Botnets and hide their

    activity within normal HTTP traffic.

    HTTP-based Botnets have a set of characteristics that make their detection difficult

    compared to the IRC and P2P Botnets. Several methods and techniques

    have been used by researchers to track Botnet activities and detect them, but the

    number of studies on HTTP-based Botnet detection is low compared to the

    number of studies on detection methods for IRC-based and P2P Botnets.

    The ability of the NBA system to detect unknown and encrypted threats made it the

    preferred basis for modeling botAnalytics. The next chapter discusses the process of

    modeling a detection system based on the NBA architecture.

    Chapter 3: Modeling of Detection System

    3.1 Introduction

    This chapter describes the method adopted to carry out the research on modeling a

    new system for detecting HTTP-based Botnets. As described in the literature review, this

    research proposes a detection method that uses the network behaviour analysis

    (NBA) architecture (Derek, 2009; Scarfone & Mell, 2007). The proposed method uses the NBA

    architecture to collect a wide range of information and statistics about particular network

    traffic. The collected information is then analysed to search for any signs of bot and Botnet

    activities.

    3.2 Proposed Method Architecture

    There are three layers in the proposed method architecture - data collecting platform,

    data storing platform, and data analysing platform. Based on the NBA structure, the

    proposed method consists of several components that include the software sensors and

    management server (Analyser) (Scarfone & Mell, 2007; Timofte & Romania, 2007). Figure

    3-1 shows the schema of the proposed method architecture.

    Figure 3-1: botAnalytics System Architecture

    3.2.1 Data Collecting Platform

    The data collecting platform consists of a set of software sensors, which are

    installed on each client in a particular network. The main task of the data

    collecting platform is to collect data on the HTTP traffic of each client and to store

    the data in the database. This platform also uses a set of filters and other techniques

    to separate out data on unwanted traffic.

    3.2.2 Data Analysing Platform

    The data collected by the data collecting platform are analysed by the data

    analysing platform to detect suspicious activities associated with a bot or Botnet. A

    set of filters and techniques is used by this platform to make the analysis process

    more reliable.

    3.2.3 Data Storing Platform

    The data storing platform is the place where the collected data are kept

    before and after the analysis process. All the results are saved in the database to

    maintain the history of the system performance.

    3.3 Data Reduction Filters

    In addition to sniffing network traffic, the proposed data collecting platform applies two

    filters to the collected data to discard useless data and reduce the

    amount of unwanted data that is stored.

    3.3.1 HTTP Traffic Separator Filter

    The HTTP Traffic Separator (H.T.S.) filter was designed to separate the HTTP

    traffic from other types of traffic in the network. botAnalytics was designed to

    detect HTTP-based Botnets. As mentioned in section 2.2.4, HTTP-based Botnets

    use the HTTP traffic; hence, the data on other types of network traffic are not

    collected. Figure 3-2 shows the flowchart of this filter.

    Figure 3-2: The flowchart of H.T.S. filter
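
    One way to separate HTTP traffic from other traffic can be sketched as follows.

    The text does not state how the H.T.S. filter recognises HTTP, so matching the

    request line at the start of a TCP payload is an assumption, as are the function

    names and the sample payloads.

```python
import re

# Request line of an HTTP/1.x message: METHOD SP request-target SP HTTP/1.x
REQUEST_LINE = re.compile(rb"^[A-Z]+ \S+ HTTP/1\.[01]\r\n")


def is_http_request(payload: bytes) -> bool:
    """Return True if a TCP payload starts with an HTTP/1.x request line."""
    return REQUEST_LINE.match(payload) is not None


def hts_filter(payloads):
    """Keep only payloads that look like HTTP requests (hypothetical helper)."""
    return [p for p in payloads if is_http_request(p)]


# Hypothetical captured payloads.
captured = [
    b"GET /index.html HTTP/1.1\r\nHost: example.org\r\n\r\n",
    b"\x16\x03\x01\x02\x00\x01",             # TLS handshake bytes, not HTTP
    b"NICK bot42\r\nUSER bot 0 * :bot\r\n",  # IRC, not HTTP
    b"OPTIONS * HTTP/1.1\r\nHost: example.org\r\n\r\n",
]
print(len(hts_filter(captured)))
```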

    3.3.2 Get and Post Separator Filter

    The Get and Post Separator (G.P.S.) filter was designed to select only the HTTP

    traffic with GET or POST methods. The HTTP-based bots use the GET or POST

    methods to contact their Command and Control server; thus, the other methods

    provide no information about bot activities (Joe, 2004; Naseem, et al., 2010;

    Nazario, 2007). Therefore, the G.P.S. filter focuses on the HTTP methods, and

    only selects the HTTP traffic with the GET and POST methods. Figure 3-3 shows

    the flowchart of this filter.

    Figure 3-3: The flowchart of G.P.S. filter
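
    The selection step of the G.P.S. filter can be sketched as below. This is a minimal

    illustration under the assumption that each item is the raw payload of an HTTP

    request; the function names and sample requests are hypothetical.

```python
ALLOWED_METHODS = {"GET", "POST"}


def request_method(payload: bytes) -> str:
    """Return the method token from the first line of an HTTP request."""
    first_line = payload.split(b"\r\n", 1)[0]
    return first_line.split(b" ", 1)[0].decode("ascii", errors="replace")


def gps_filter(payloads):
    """Keep only HTTP requests whose method is GET or POST."""
    return [p for p in payloads if request_method(p) in ALLOWED_METHODS]


# Hypothetical requests that already passed the HTTP separation step.
requests = [
    b"GET /update?id=7 HTTP/1.1\r\nHost: example.org\r\n\r\n",
    b"POST /report HTTP/1.1\r\nHost: example.org\r\n\r\n",
    b"HEAD / HTTP/1.1\r\nHost: example.org\r\n\r\n",
]
print([request_method(p) for p in gps_filter(requests)])
```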

    3.4 VOU Mechanism

    The VOU or Validation of User-Agents mechanism was designed based on a unique

    algorithm. It is used for the first time in this research, in the data collecting platform. This

    mechanism sets the VOU field of each collected HTTP traffic packet to an

    appropriate value.

    The VOU mechanism acts on each collected packet of HTTP traffic with the GET

    or POST methods, and obtains the User-Agent from the collected traffic header. In the next

    step, the VOU tries to match the User-Agent string to its corresponding application in

    the installed application list. The installed application list contains the applications and

    services that are available on each client within a network, together with their

    corresponding User-Agent. This list can be updated by users or automatically from

    websites such as www.user-agents.org. Figure 3-4 shows the flowchart of the VOU

    mechanism.

    Figure 3-4: The VOU Module Flowchart

    For each collected HTTP packet, the VOU field is set to one of

    three different values, based on different conditions, as explained below:

    1) UNKNOWN value

    If the VOU mechanism is not able to determine the User-Agent for

    any reason, for example, due to encryption or use of fake User-Agents, the

    VOU field of the collected traffic will be given the UNKNOWN value. If

    the VOU mechanism is able to determine the User-Agent but is not able to

    identify the corresponding application, the VOU field will also be given the

    UNKNOWN value.

    2) VALID value

    The VOU field will be set to the VALID value if the User-Agent and

    its corresponding application have been identified, and the corresponding

    application has been installed on the client and is available at the same time.

    3) NOTVALID value

    If the User-Agent and its corresponding application have been

    identified but the corresponding application is not available on the client, the

    VOU field will be given the NOTVALID value.
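
    The three cases above can be summarised in a minimal sketch. The dictionary that

    maps User-Agent strings to applications and the set of installed applications are

    assumptions standing in for the installed application list, and all names and

    sample values are hypothetical. The sketch only checks whether an application is

    present in the set; it does not check whether the application is running.

```python
from typing import Optional

UNKNOWN, VALID, NOTVALID = "UNKNOWN", "VALID", "NOTVALID"


def vou_value(user_agent: Optional[str],
              known_agents: dict,
              installed_apps: set) -> str:
    """Classify a request by its User-Agent.

    `known_agents` maps User-Agent strings to application names and
    `installed_apps` holds the applications present on the client (both
    are assumed structures, not taken from the original system).
    """
    if not user_agent:                      # missing or unreadable header
        return UNKNOWN
    application = known_agents.get(user_agent)
    if application is None:                 # no corresponding application
        return UNKNOWN
    if application in installed_apps:       # application present on client
        return VALID
    return NOTVALID                         # known application, not installed


# Hypothetical data for one client.
known = {"Mozilla/5.0 (ExampleBrowser)": "ExampleBrowser",
         "ExampleUpdater/1.0": "ExampleUpdater"}
installed = {"ExampleBrowser"}

print(vou_value("Mozilla/5.0 (ExampleBrowser)", known, installed))
print(vou_value("ExampleUpdater/1.0", known, installed))
print(vou_value("xyz", known, installed))
```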

    3.5 Analysing the Collected Traffic

    The data collecting platform periodically sniffs the network traffic and applies the

    H.T.S. and G.P.S. filters to select only HTTP-type traffic using the GET or POST method.

    In addition to these filters, the VOU mechanism is applied to the collected data as described in

    section 3.4. When a reasonable number of packets have been collected and stored in the

    data storing platform, the Analyser begins its work in the data analysing platform as follows:

    3.5.1 Grouping and Sorting

    The Grouping and Sorting (G.A.S.) process sorts data on the collected traffic

    and divides them into different groups based on the source IP address (SIP),

    destination IP address (DIP), URL, and the User-Agent string (UA).

    While other studies mostly use the source IP, destination IP, and domain

    names to divide the collected traffic packets into different groups, the proposed

    method also uses one of the HTTP header fields, known as the User-Agent, as

    an additional parameter to make the classification of the collected network

    packets more accurate. The G.A.S. process categorises the traffic packets into

    different groups, then three different filters are applied to each group of packets

    to search for signs of suspicious activities and the presence of HTTP bots.
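
    The grouping step can be sketched as follows. The record layout (a dictionary

    with the keys 'sip', 'dip', 'url', 'ua', and 'time') and the sample records are

    assumptions made for illustration; only the four grouping fields come from the

    description above.

```python
from collections import defaultdict


def group_and_sort(records):
    """Group request records by (SIP, DIP, URL, UA), each sorted by time.

    Each record is a dict with keys 'sip', 'dip', 'url', 'ua', 'time'
    (assumed layout; 'time' is in seconds).
    """
    groups = defaultdict(list)
    for record in records:
        key = (record["sip"], record["dip"], record["url"], record["ua"])
        groups[key].append(record["time"])
    return {key: sorted(times) for key, times in groups.items()}


# Hypothetical records: same endpoints and URL, two different User-Agents.
records = [
    {"sip": "10.0.0.5", "dip": "203.0.113.9", "url": "/cmd", "ua": "A", "time": 60.0},
    {"sip": "10.0.0.5", "dip": "203.0.113.9", "url": "/cmd", "ua": "A", "time": 0.0},
    {"sip": "10.0.0.5", "dip": "203.0.113.9", "url": "/cmd", "ua": "B", "time": 5.0},
]
groups = group_and_sort(records)
print(len(groups))
```

    Including the User-Agent in the key keeps the two applications apart even though

    they contact the same URL on the same server.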

    3.5.2 High Access Rate Filter

    The H.A.R. filter or High Access Rate filter eliminates groups of similar

    HTTP connections or requests that have been generated within a very short time, for

    example, more than one request per second. Figure 3-5 shows the H.A.R. filter

    flowchart.

    Figure 3-5: Flowchart of H.A.R. Filter
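
    A minimal sketch of the H.A.R. filter is given below. The text gives one request

    per second as an example threshold; measuring the rate as intervals per elapsed

    second, as well as the function names and sample groups, are assumptions.

```python
def is_high_access_rate(times, max_rate=1.0):
    """True if requests in a group arrive faster than `max_rate` per second.

    `times` is a sorted list of request timestamps in seconds. The rate is
    taken as intervals per elapsed second, which is one possible reading.
    """
    if len(times) < 2:
        return False
    elapsed = times[-1] - times[0]
    if elapsed == 0:
        return True
    return (len(times) - 1) / elapsed > max_rate


def har_filter(groups, max_rate=1.0):
    """Drop groups whose access rate is too high."""
    return {k: t for k, t in groups.items()
            if not is_high_access_rate(t, max_rate)}


# Hypothetical groups of request timestamps.
groups = {
    "page-load burst": [0.0, 0.1, 0.2, 0.3],  # 3 intervals in 0.3 s
    "slow poller": [0.0, 60.0, 120.0],         # 2 intervals in 120 s
}
print(list(har_filter(groups)))
```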

    3.5.3 Low Access Rate Filter

    The L.A.R. filter or Low Access Rate filter removes the HTTP traffic with

    fewer than two request packets in the whole data collecting period. For example, if a

    group of HTTP traffic contains only a single request in the data collecting

    period, it will be removed by this filter. Figure 3-6 shows the L.A.R. filter

    flowchart.

    Figure 3-6: Flowchart of L.A.R. Filter
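
    The L.A.R. filter reduces to a simple count check, sketched below. The mapping

    from a group name to its list of request timestamps is an assumed data layout,

    and the sample groups are hypothetical.

```python
def lar_filter(groups, min_requests=2):
    """Drop groups with fewer than `min_requests` requests in the period."""
    return {k: t for k, t in groups.items() if len(t) >= min_requests}


# Hypothetical groups of request timestamps (seconds).
groups = {"single visit": [12.0], "repeated": [0.0, 60.0, 120.0]}
print(list(lar_filter(groups)))
```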

    3.5.4 Periodic Access Rate Filter

    The P.A.R. filter or Periodic Access Rate filter selects the HTTP

    connections or requests that were generated at periodic intervals. This filter was

    designed based on the nature of HTTP-based Botnets. As noted in the literature

    review, the HTTP bots connect to their Command and Control server periodically to

    get commands or updates. Figure 3-7 shows the P.A.R. filter flowchart.

    Figure 3-7: P.A.R. Filter Flowchart
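
    The text does not state how the P.A.R. filter decides that requests are periodic.

    The sketch below shows one possible test, assumed here for illustration only: the

    gaps between consecutive requests are treated as periodic when their coefficient

    of variation is small. The tolerance value and sample groups are hypothetical.

```python
from statistics import mean, pstdev


def is_periodic(times, max_variation=0.1):
    """True if gaps between consecutive requests are nearly constant.

    Uses the coefficient of variation (standard deviation / mean) of the
    gaps; `max_variation` is an assumed tolerance, not a value from the text.
    """
    if len(times) < 3:
        return False            # need at least two gaps to compare
    gaps = [b - a for a, b in zip(times, times[1:])]
    average = mean(gaps)
    if average <= 0:
        return False
    return pstdev(gaps) / average <= max_variation


def par_filter(groups, max_variation=0.1):
    """Keep only groups whose requests recur at periodic intervals."""
    return {k: t for k, t in groups.items() if is_periodic(t, max_variation)}


# Hypothetical groups of sorted request timestamps (seconds).
groups = {
    "poller": [0.0, 60.0, 121.0, 180.0, 240.5],
    "browsing": [0.0, 4.0, 95.0, 101.0, 400.0],
}
print(list(par_filter(groups)))
```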

    3.6 LODA Mechanism

    LODA or Level of Danger Analysing mechanism is designed to analyse the

    detected suspicious traffic to define its level of danger. Figure 3-8 shows the flow chart of

    the analysing algorithm of LODA.

    For every suspicious activity detected, the analysis process starts by examining the

    VOU field value, which has been set by the VOU mechanism. If the value of the VOU field

    of a particular group of suspicious traffic is VALID, the level of danger field for that group

    will be set to LOW. If the VOU value is NOTVALID, the level of danger will be set to

    HIGH, and if the VOU value is UNKNOWN, the next step of analysing will start.
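
    The first step of this analysis can be written as a small mapping. The sketch

    below is illustrative only: the function name is hypothetical, and returning None

    for the UNKNOWN case is an assumption used to signal that the group must go to

    the next analysis step, which is not modelled here.

```python
from typing import Optional


def loda_first_stage(vou: str) -> Optional[str]:
    """Map a VOU value to a level of danger, or None if undecided.

    None means the group is UNKNOWN and must go to the next analysis step.
    """
    if vou == "VALID":
        return "LOW"
    if vou == "NOTVALID":
        return "HIGH"
    if vou == "UNKNOWN":
        return None
    raise ValueError(f"unexpected VOU value: {vou!r}")


for value in ("VALID", "NOTVALID", "UNKNOWN"):
    print(value, loda_first_stage(value))
```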

    Figure 3-8: LODA Module Flowchart

    If the value of the VOU field is UNKNOWN, the database is queried

    to retrieve the count of similar traffic groups generated by other clients in the

    network. The count is compared with the limit value set by the system administrators. If

    the count is greater than the limit value, the level of da