HTTP-Based Botnet Detection
7/30/2019 HTTP-Based Botnet Detection
botAnalytics: Improving HTTP-Based Botnet Detection
by Using Network Behavior Analysis System
Meisam Eslahi
DISSERTATION SUBMITTED IN FULFILMENT OF THE
REQUIREMENTS FOR THE DEGREE OF MASTER OF COMPUTER
SCIENCE
Faculty of Computer Science and Information Technology
University of Malaya
2010
UNIVERSITI MALAYA
ORIGINAL LITERARY WORK DECLARATION
Name of Candidate: Meisam Eslahi (I.C/Passport No: I2140114)
Registration/Matric No: WGA070104
Name of Degree: Master of Computer Science
Title of Project Paper/Research Report/Dissertation/Thesis (this Work):
botAnalytics: Improving HTTP-Based Botnet Detection by Using Network Behavior
Analysis System
Field of Study: Network Security
I do solemnly and sincerely declare that:
(1) I am the sole author/writer of this Work;
(2) This Work is original;
(3) Any use of any work in which copyright exists was done by way of fair dealing
and for permitted purposes and any excerpt or extract from, or reference to or
reproduction of any copyright work has been disclosed expressly and
sufficiently and the title of the Work and its authorship have been acknowledged
in this Work;
(4) I do not have any actual knowledge nor do I ought reasonably to know that the
making of this work constitutes an infringement of any copyright work;
(5) I hereby assign all and every rights in the copyright to this Work to the
University of Malaya (UM), who henceforth shall be owner of the copyright
in this Work and that any reproduction or use in any form or by any means
whatsoever is prohibited without the written consent of UM having been first
had and obtained;
(6) I am fully aware that if in the course of making this Work I have infringed any
copyright whether intentionally or otherwise, I may be subject to legal action or
any other action as may be determined by UM.
Candidate's Signature Date
Subscribed and solemnly declared before,
Witness's Signature Date
Name: Dr Rosli Salleh
Designation: Supervisor
Abstract
This thesis reports on research conducted to develop a method for detecting
HTTP-based Botnets using a Network Behaviour Analysis system. Bots are small
malware programs that infect computers and join with other bots via the Internet to form a
network of bots called a Botnet.
Botnets and their bots have a dynamic and flexible nature. The Botmasters, who
control the Botnets, update the bots and change their code day by day to evade
traditional detection methods such as signature-based anti-virus software. In addition,
Botmasters employ many techniques to keep their Botnets undetected for as long as
possible. The latest generation of Botnets is HTTP-based and uses the standard HTTP
protocol to communicate with its bots. By using normal HTTP traffic, the bots pass
as normal users of the network and can easily bypass current network security
systems.
To solve this problem, a method based on a network behaviour analysis system was
developed to improve on the existing methods of detecting HTTP-based Botnets and their
bots. The system, botAnalytics, was developed by modifying existing network behaviour
analysis methods and adding new features to them. The Delphi programming language was
used to develop the botAnalytics system, while Microsoft SQL Server 2008 was selected as
its database management system. New filters and algorithms were designed and developed
to analyse the collected network packets for evidence of suspicious HTTP-based
Botnet activity.
In addition to HTTP-based Botnet detection, one of the HTTP header fields, called
the User-Agent, was used by botAnalytics to analyse the level of danger of detected
suspicious activities. This is the first reported use of the User-Agent to aid Botnet detection.
Based on the results of the testing and evaluation of botAnalytics, the system was found
to be very efficient in detecting HTTP-based Botnets, including small-scale Botnets.
Acknowledgements
Thank God, the most Gracious and Merciful, for all the blessings bestowed on me.
The submission of this dissertation marks the end of a somewhat long journey in my pursuit
of a Master's degree at the University of Malaya, Kuala Lumpur. The journey would have
been much more difficult without the help, understanding and kindness of many people.
Without doubt, I would like to express my sincere gratitude to my supervisors, Dr
Omar Zakaria and Dr Rosli Salleh, for their kindness in taking me under their charge to
conduct this research. Their patience and encouragement gave me the motivation to work
on this research until its successful completion. Their guidance and readiness to share their
knowledge greatly contributed to the direction I took and what I did to
achieve my goal. I cannot thank them enough, and I hope the Malay way of expressing
how I feel says it all: "Ribuan terima kasih" (thousands of thanks).
While doing my studies and research in FCSIT, I was never
working alone. I had the friendship, goodwill and support of my course-mates and friends,
who never hesitated to offer their advice and moral support when it was needed. To my
good friend, Mohsen Saghafi, in particular, thank you for being there whenever I needed
someone to go to for advice. To all of them, especially Saiful Khan, Teh Kang Hai, Paul
Nelson, and Ali Keshavarz, a big thank you.
I would like to express my gratitude and love to my family for their care and
understanding while I was doing my research. To the two special women in my life, my
mother K. Abdullahi and my wife Maryam Var Naseri: with your boundless love and your
confidence in me, you have been my pillars of strength and determination, helping me to
carry on. If I have succeeded, then you have been a big part of my success, and I
dedicate it to both of you, together with my love.
Table of Contents
Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
Abbreviations
Chapter 1: Introduction
  1.1 Background
  1.2 Motivation
  1.3 Statement of Problem
  1.4 Statement of Objectives
  1.5 Proposed Solution
  1.6 Thesis Scope
  1.7 Thesis Organisation
Chapter 2: Bot and Botnets
  2.1 Introduction
  2.2 Characteristics of Botnet
    2.2.1 Botnet life cycle and Botmaster activities
    2.2.2 Botmaster's Prime Targets
    2.2.3 Botnet Command and Control (C&C) Mechanism
    2.2.4 Centralised Command and Control Mechanism
      2.2.4.1 IRC-based Botnets
      2.2.4.2 HTTP-based Botnets
    2.2.5 Decentralised or P2P Command and Control Mechanism
  2.3 Why Choose HTTP Botnets?
  2.4 Existing Botnet Detection Methods
    2.4.1 Honeypot and Honeynet
    2.4.2 Detection by Signature
    2.4.3 Detection by DNS Monitoring
    2.4.4 Detection using Attack Behaviour Analysis
  2.5 Detection Based on Network Behaviour Analysis
    2.5.1 Why Choose Network Behaviour Analysis?
    2.5.2 Existing Detection Methods Based on NBA
    2.5.3 Evaluation and Comparison of Existing NBA Methods for Botnet Detection
  2.6 Conclusion
Chapter 3: Modeling of Detection System
  3.1 Introduction
  3.2 Proposed Method Architecture
  3.3 Data Reduction Filters
  3.4 VOU Mechanism
  3.5 Analysing the Collected Traffic
  3.6 LODA Mechanism
  3.7 Proposed Method Flowchart
  3.8 Conclusion
Chapter 4: Implementation of Proposed Model
  4.1 Introduction
  4.2 Delphi programming language
  4.3 Client Side Implementation
    4.3.1 Settings
    4.3.2 Sniffing the Traffic
    4.3.3 H.T.S. filter
    4.3.4 G.P.S. filter
    4.3.5 VOU Mechanism
  4.4 Database Implementation
    4.4.1 Microsoft SQL Server 2008
    4.4.2 Tables structure
    4.4.3 Tables relationship
  4.5 Server Side Implementation
    4.5.1 General info
    4.5.2 Analyse
    4.5.3 Notifications
    4.5.4 Report
    4.5.5 User Agent list
    4.5.6 White list
    4.5.7 Black list
    4.5.8 Sensor status
    4.5.9 User account
  4.6 Conclusion
Chapter 5: Testing the Proposed Model
  5.1 Introduction
  5.2 Hardware Requirements
  5.3 Testing bots
  5.4 Testing Command and Control servers
  5.5 Testing clients
  5.6 Testing analyser
  5.7 Testing results
  5.8 Conclusion
Chapter 6: Data Analysis and Discussion
  6.1 Introduction
  6.2 Evaluation of botAnalytics
    6.2.1 Filtering evaluation
    6.2.2 VOU algorithm evaluation
    6.2.3 LODA algorithm evaluation
  6.3 Comparison of botAnalytics with Other Systems
    6.3.1 False-Positive rate
    6.3.2 Efficiency in small-scale Botnets
  6.4 Conclusion
Chapter 7: Conclusion and Future Work
  7.1 Introduction
  7.2 Achievement of Objectives
  7.3 Contributions
    7.3.1 HTTP-based Botnet Detection
    7.3.2 Establishment of User-Agent
    7.3.3 New Filters and Algorithms
    7.3.4 Evaluate the Level of Danger
  7.4 Limitations and Future Work
    7.4.1 Real Time Detection
    7.4.2 Linux Platform
    7.4.3 Other Type of Bots and Botnets
    7.4.4 Prevention Methods
    7.4.5 Advanced the User-Agent for Botnet Detection
  7.5 Conclusion
References
List of Figures
Figure 2-1: Botnet life cycle (Schiller & Binkley, 2007)
Figure 2-2: General schema of Botnets C&C mechanism
Figure 2-3: Centralised Botnet (Ping, et al., 2007)
Figure 2-4: IRC-based C&C Botnet (Gu, et al., 2008)
Figure 2-5: HTTP-based C&C Botnet (Gu, et al., 2008)
Figure 2-6: Decentralized or P2P Botnet (Ping, et al., 2007)
Figure 3-1: botAnalytics System Architecture
Figure 3-2: The flowchart of H.T.S. filter
Figure 3-3: The flowchart of G.P.S. filter
Figure 3-4: The VOU Module Flowchart
Figure 3-5: Flowchart of H.A.R. Filter
Figure 3-6: Flowchart of L.A.R. Filter
Figure 3-7: P.A.R. Filter Flowchart
Figure 3-8: LODA Module Flowchart
Figure 3-9: The Proposed Method Flowchart
Figure 4-1: botAnalytics Client Side GUI
Figure 4-2: Setting GUI
Figure 4-3: Traffic Sniffer GUI
Figure 4-4: H.T.S. Filter GUI
Figure 4-5: G.P.S. Filter GUI
Figure 4-6: VOU Mechanism GUI
Figure 4-7: VOU Pseudo Code
Figure 4-8: botAnalytics Database: Relationship between the Tables
Figure 4-9: botAnalytics Server Side GUI
Figure 4-10: General Info GUI
Figure 4-11: GET and POST Percentage Query Pseudo Code
Figure 4-12: Collected Traffic Statistics Query Pseudo Code
Figure 4-13: Primary Data Tab of the Analyse Section
Figure 4-14: Black/White listing Tab of the Analyse Section
Figure 4-15: H.A.R. Result Tab of the Analyse Section
Figure 4-16: H.A.R. Filter Pseudo Code
Figure 4-17: L.A.R. Result Tab of the Analyse Section
Figure 4-18: L.A.R. Filter Pseudo Code
Figure 4-19: P.A.R. Result Tab of the Analyse Section
Figure 4-20: P.A.R. Filter Pseudo Code
Figure 4-21: LODA Module Pseudo Code
Figure 4-22: LODA Module Result GUI
Figure 4-23: Notifications GUI
Figure 4-24: Report GUI
Figure 4-25: User Agent List GUI
Figure 4-26: White List GUI
Figure 4-27: Black List GUI
Figure 4-28: Sensor Status GUI
Figure 4-29: Sensor Info Pseudo Code
Figure 4-30: Top 10 Active Sensors Pseudo Code
Figure 4-31: Edit Profile Tab
Figure 4-32: Create New User Tab
Figure 4-33: Manage Existing Users Tab
Figure 5-1: General Schema for the Testing Phase
Figure 5-2: The Black Energy User Agent (Nazario, 2007)
Figure 5-3: The Firefox User Agent
Figure 5-4: The Bobax User Agent
Figure 6-1: The H.T.S. Filter Results Chart (See also Table 6-1)
Figure 6-2: The G.P.S. Filter Results Chart (See also Table 6-1)
Figure 6-3: The H.A.R. Filter Results Chart (See also Table 6-1)
Figure 6-4: The L.A.R. Filter Results Chart (See also Table 6-1)
Figure 6-5: The P.A.R. Filter Results Chart (See also Table 6-1)
List of Tables
Table 2-1: Comparison of Methods from Past Researches with botAnalytics
Table 4-1: Comparison of Managed-code with Native-code Languages
Table 4-2: Comparison of DBMSs
Table 4-3: Microsoft SQL Server 2008 Extra New Features
Table 4-4: tblUser Structure
Table 4-5: tblSquestion Structure
Table 4-6: tblRole Structure
Table 4-7: tblUserAgent Structure
Table 4-8: tblWhiteList Structure
Table 4-9: tblBlackList Structure
Table 4-10: tblClientsInfo Structure
Table 4-11: tblVOU Structure
Table 4-12: Structure of tblHMType table
Table 4-13: Structure of tblVouValue Table
Table 4-14: Structure of tblResult Table
Table 4-15: Structure of tblLODA Table
Table 4-16: Structure of the tblNotification Table
Table 5-1: botAnalytics Filtering Result
Table 5-2: botAnalytics Botnet Detection Results
Table 6-1: botAnalytics: Results of Filtering
Table 6-2: The VOU Algorithm Result
Table 6-3: The LODA Algorithm Results
Table 6-4: Comparison of the botAnalytics with existing HTTP-based Botnet detection researches
Table 6-5: The botAnalytics False-Positive
Abbreviations
C&C Command and Control
DBMS Database Management System
DDOS Distributed Denial of Service
DNS Domain Name System
DNSBL DNS-based Black Hole List
ERD Entity Relationship Diagram
G.A.S. Grouping and Sorting
G.P.S. GET and POST Separator
GUI Graphical User Interface
H.A.R. High Access Rate
H.T.S. HTTP Traffic Separator
HTTP Hypertext Transfer Protocol
IID Iterative and Incremental Development
IRC Internet Relay Chat
L.A.R. Low Access Rate
LODA Level of Danger Analysing
NBA Network Behaviour Analysis
P2P Peer-to-Peer
P.A.R. Periodical Access Rate
RAD Rapid Application Development
SDLC System Development Life Cycle
VOU Validation of User-Agent
Chapter 1: Introduction
1.1 Background
The development of computer networking, followed by the Internet in the second half
of the last century, is one of the key technological developments that have
revolutionised our daily lives. The convenience and speed of digital communication have
become an integral part of home computer use, as well as of every other aspect of human
activity today, from education to business and research. While high-speed computer
networking and the Internet have brought great convenience, a number of security
challenges have also emerged with these technologies (O'Connor, 2004; Tanenbaum,
2002).
With the increasing use of computer networks and the Internet on a global scale, network
security has become an important issue. In fact, without adequate network security, all
the benefits brought by these technological developments would be lost, as the networks and
the Internet are vulnerable to malicious attacks. These attacks or threats come in different
forms and can generally be categorised as: Viruses and Worms; Trojans; Backdoors;
Spyware; Phishing; and Botnets. Among all these threats, the Botnet is considered the most
dangerous (Barroso, 2007; Jae-Seo, HyunCheol, Jun-Hyung, Minsoo, & Bong-Nam, 2008;
Star, 2008).
A Botnet is a linked group of infected computers (termed bots or zombies), which
communicate with each other and get their commands from a controller, called the Botmaster.
A Botmaster controls a Botnet by sending commands to the bots
and receiving responses from them. Different command and control mechanisms (e.g. IRC,
HTTP, and P2P) are used by Botmasters to achieve this goal (Govil & Jivika, 2007;
Naseem, Shafqat, Sabir, & Shahzad, 2010).
The main aim of Botnets is to carry out different types of malicious activities or to gain
illegal profits. Some of these activities such as Distributed Denial of Service (DDoS),
Spamming, Thieving Personal Information, Illegal Hosting, Click Fraud, and Adware are
described below:
a) DDoS: this is the distributed form of the Denial of Service (DoS) attack, carried out
by sending a large number of UDP packets, ICMP requests, or TCP
SYN floods aimed at exhausting the resources of particular servers and forcing them to
shut down. Because the Botmasters control the Botnets, they can launch this type
of attack from thousands of different places by sending a particular command to the
bots on the infected computers in the same Botnet (Govil & Jivika, 2007; Puri,
2003; Srikanth, Dina, Matthias, & Arthur, 2005).
b) Spamming: spam refers to emails that have the same content but are sent
in high volume. Botnets are an ideal platform for collecting
email addresses from infected computers, and for generating and sending spam or phishing
emails (Yinglian et al., 2008).
c) Thieving Personal Information: Botmasters use Botnets to steal information
and use it for their own benefit. They can set a trigger in the bots to make
them scan the websites where important information is entered. In addition,
other applications such as key-loggers are spread by the bots to obtain important
information such as personal passwords, and financial data such as online banking
passwords, and credit card information. Depending on the size of the Botnet, a
Botmaster can collect the required data or information from thousands to millions
of computers (Al-Hammadi & Aickelin, 2008; Govil & Jivika, 2007).
d) Illegal Hosting: A computer or server with large storage and a high-bandwidth
connection to the Internet can become a target for a Botmaster to gain control of and
use for illegal file sharing (AUSCERT, 2002; Puri, 2003).
e) Click Fraud and Adware: One of the main differences between Botnets and other
Internet threats is that a Botnet can be used to make money through click fraud.
Botmasters can amass a lot of money by using their bots to visit websites
that pay a small sum of money for each visit to the website or for each click on an
advertisement. Pop-up advertisements can also be downloaded, installed, or
displayed by bots to force a user to visit particular websites (Barroso, 2007).
In addition, the Botnets can be used to spread different types of computer threats in the
form of viruses, Trojans, Backdoors, worms, etc. This means that Botnets are not only a
threat, but also a platform for the distribution of other threats (Star, 2008).
1.2 Motivation
In recent years, Botnets have become the biggest threat to cyber security, and
have been used as an infrastructure to carry out nearly every type of cyber attack. In a
review of the different types of malicious activities perpetrated by Botnets, it is found
that they are not only a dangerous threat to computer networks and the Internet, but are
also involved in other types of threats and attacks (Jae-Seo, et al., 2008; Lee, Wang, &
Dagon, 2007).
According to a Network World report in 2009, more than 11.1 million computers in the
US had been infected by the 10 most damaging Botnets. While the theft of personal
information has always been considered as one of the most disturbing Internet threats,
the Zeus Botnet alone had infected nearly 3.5 million computers and attempted to steal
sensitive information. Each bot can send an average of three spam emails or fake
messages per second, thus, the Koobface Botnet with 2.9 million infected computers
can generate more than 8 million fake messages per second (Messmer, 2009).
In addition, Botnets and their associated bots are difficult to detect, for the
reasons described below:
a) Skilful Developers: Botnet developers have higher technical capabilities than
other online attackers. Unlike other types of network threats, Botnets and
their bots are designed and developed for long-term goals, or even for illegal
monetary gain. Botmasters have various strategies to keep the bots safe and
hidden for as long as possible (Lee et al., 2007).
b) Dynamic Nature and Flexibility: Botnets and bots have a dynamic and flexible
nature. They are continuously updated and their code changed by the
developers and owners to elude traditional detection methods such as
signature-based anti-virus software. The McAfee Research Lab reported that any
success in Botnet detection is only temporary as the Botmasters frequently
change their strategies, and design new methods to recover and restore their
detected bots, within a short time (McAfee, 2010).
c) Using Standard Protocols: Botnets use standard protocols to establish
their communication infrastructure. The latest generation of Botnets, called
HTTP-based Botnets, uses the HTTP protocol as its communication method.
By using normal HTTP traffic, they disguise themselves as normal network
users and easily avoid detection by current network security systems (Jae-Seo
et al., 2008).
d) Silent Threats: Barroso (2007) termed the Botnets "Silent Threats", as they
try to control the infected computers without the knowledge of the computer
users. The bots on infected computers avoid any unusual or suspicious
use of the CPU, memory, or other computer resources, which would otherwise
cause their presence to be exposed.
The examples above show that Botnet detection is a big challenge in the network
security management.
1.3 Statement of Problem
Company computers with high-bandwidth connectivity to the Internet, university
servers, and home computers are the main targets for Botnets. The Botmasters try to
gain control of these targets and carry out their malicious activities.
Today, the detection of Botnets has become a major issue in the field of computer
network security. Botnets have several characteristics that make them difficult to
detect. They spread very quickly, and the Botmasters are always trying different
techniques to protect their bots from existing anti-virus software and detection systems
(Lee, et al., 2007). Currently, there is no effective technique to stop Botnets, and
existing detection techniques are unable to detect and prevent Botnets sufficiently.
The McAfee Research Labs predicted that the cyber community will face more
widely-distributed and more resilient Botnets, which are difficult to detect and destroy.
Undoubtedly, network security researchers will continuously face big challenges on
this problem (McAfee, 2010).
1.4 Statement of Objectives
The aim of this research is to develop an improved method for the detection of
HTTP-based Botnets. In this context, the objectives of this research are as follows:
a) To study HTTP-based and other types of Botnets in detail.
b) To evaluate the existing methods of Botnet detection.
c) To study the characteristics and architecture of the Network
Behaviour Analysis (NBA) system.
d) To model a system that improves detection of HTTP-based Botnets based
on Network Behaviour Analysis (NBA).
e) To develop an HTTP-based Botnet detection system using the NBA system
architecture.
f) To test and evaluate the proposed system and to compare it to
existing NBA methods.
1.5 Proposed Solution
In this thesis, a Network Behaviour Analysis system, called botAnalytics, is
developed. botAnalytics uses software sensors, which are installed on network
clients, to collect information on the network flows. The information from the entire
network is stored in the server database and examined by another part of the
botAnalytics system, known as the analyser, to look for any evidence of HTTP-based
Botnet activity.
botAnalytics aims to detect HTTP-based Botnets regardless of their size
and with a very low false-positive ratio. Various types of data filtering were introduced
for the first time or modified in botAnalytics to improve the detection process. In
addition, one of the HTTP header fields, User-Agent (Fielding et al., 1999), was used to
design a new algorithm to evaluate the danger level of detected suspicious activities.
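The sensor-to-analyser flow described above can be illustrated with a minimal sketch. It is not the botAnalytics implementation: the table layout, the function names (`create_store`, `sensor_report`, `analyser_candidates`) and the rule of shortlisting destinations contacted by several clients are assumptions made purely for illustration, and an in-memory SQLite database stands in for the server database.

```python
import sqlite3


def create_store():
    """Create an in-memory stand-in for the central server database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE flows (client TEXT, dest TEXT, ts REAL, user_agent TEXT)"
    )
    return conn


def sensor_report(conn, client, dest, ts, user_agent):
    """Store one HTTP flow as a client-side sensor might report it (hypothetical schema)."""
    conn.execute(
        "INSERT INTO flows VALUES (?, ?, ?, ?)", (client, dest, ts, user_agent)
    )


def analyser_candidates(conn, min_clients=2):
    """Return (destination, client count) for destinations shared by several clients.

    The threshold is an illustrative assumption, not a rule taken from the thesis.
    """
    rows = conn.execute(
        "SELECT dest, COUNT(DISTINCT client) FROM flows "
        "GROUP BY dest HAVING COUNT(DISTINCT client) >= ? ORDER BY dest",
        (min_clients,),
    )
    return [(dest, count) for dest, count in rows]
```

Keeping the User-Agent value with every flow means a later analysis step can inspect it without going back to the clients.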
1.6 Thesis Scope
In this research, the Network Behaviour Analysis technique (Scarfone & Mell,
2007) was selected as it can be modified and used to detect HTTP-based Botnets.
Existing HTTP-based Botnet detection capabilities will be improved by
adding new features. The Network Behaviour Analysis technique was chosen because of
its ability to detect encrypted and new (zero-day) bots, despite the drawback that it
works passively and is not suitable for real-time detection (Derek, 2009).
It is difficult to obtain the source code of HTTP-based bots to establish a real Botnet;
hence, the Botnet has to be simulated and implemented using appropriate programming
approaches. The implementation of the bots in this research is based on two existing
HTTP-based bots - Black Energy (Nazario, 2007) and Bobax (Joe, 2004). Black
Energy and Bobax were selected because other researchers,
such as Jae-Seo et al. (2008) and Gu, Zhang, & Lee (2008), used these bots to evaluate
their methods. Thus, the same bot structure can also be used to evaluate the
method developed in this research, and to compare it with other methods.
1.7 Thesis Organisation
Chapter 1 (Introduction): This chapter presents an overview of Botnets and their
malicious activities, the motivation of this research, the problem statement, the objectives,
and the scope of this research.
Chapter 2 (Literature Review): This chapter presents information from the
literature on Botnet characteristics, life cycle, and architecture. It also gives an
overview of current Botnet detection methods.
Chapter 3 (Modeling of Detection System): This chapter presents the steps
involved in modeling the HTTP-based Botnet detection system to achieve the
objectives.
Chapter 4 (Implementation of Proposed Model): This chapter discusses the steps
involved in developing the proposed system.
Chapter 5 (Testing the Proposed Method): This chapter discusses the steps
involved in testing the proposed system, and the testing process.
Chapter 6 (Result Analysis and Discussion): This chapter presents the research
findings, and discusses the effects of the new filters and algorithms developed.
Chapter 7 (Conclusion and Future Work): This chapter provides a summary of the
whole research and the significance of its findings. It also gives recommendations for
related work to be undertaken in future.
Chapter 2: Bot and Botnets
This chapter presents a review of the literature on Botnets and
the methods for Botnet detection. Section one gives an overview of bots and Botnets.
Section two discusses the characteristics of Botnets, including the life cycle, the
Botmasters' functions, and their prime targets, as well as their command and control
mechanisms. Existing Botnet detection methods are reviewed in section three. The last
section presents the network behaviour analysis technique, and background information on
its use in Botnet detection.
2.1 Introduction
A bot (the term originates from "robot") is an application that can perform and
repeat a particular task faster than a human. When a large number of bots
spread to different computers and connect to each other through the Internet, they form a
group called a Botnet, which is a network of bots (Mitsuaki et al., 2007). Botnets range in
size from small Botnets with only thousands of bots to large Botnets with millions of
bots. Regardless of their size, which is directly linked to their complexity and purpose,
Botnets are mainly created to carry out malicious activities in computer networks (Govil &
Jivika, 2007; Lee, et al., 2007; Zhaosheng et al., 2008).
A bot is designed to infect computers, and the infected computers become a part of
a Botnet without their owners' knowledge, and come under the control of a person known
as the Botmaster. The Botmaster sends orders to all the bots and controls the entire Botnet
through the Internet and servers known as command and control (C&C) servers
(Govil & Jivika, 2007; Zhaosheng, et al., 2008).
2.2 Characteristics of Botnet
2.2.1 Botnet life cycle and Botmaster activities
Botnets can be of different sizes or structures but, in general, they go through the
same stages in their life cycle (Govil & Jivika, 2007; Schiller & Binkley, 2007). Figure 2-1
shows the life cycle of Botnets.
a) Infection
The life cycle of a Botnet begins with the infection of different
computers by its bots. An infected computer is known as a zombie (Lee, et al.,
2007).
Figure 2-1: Botnet life cycle (Schiller & Binkley, 2007)
b) Rallying
After infecting the computer, the bot must connect to its Command and
Control (C&C) server and let the Botmaster know that it has successfully
established a zombie. In addition, it updates itself with essential
information, such as the list of relevant C&C server IP addresses.
Therefore, rallying refers to the process in which the bots connect to the C&C
server for the first time (Schiller & Binkley, 2007).
c) Get Commands and Send Reports
During this stage, the bots on the infected computers or zombies listen to
the Command and Control servers or connect to them periodically to get new
commands from the Botmaster. A new command, when detected by the bots, is
treated as an order; they execute the order and report the results to the
Command and Control server; the bots then wait for new commands (Govil &
Jivika, 2007; Schiller & Binkley, 2007).
d) Abandon
When a bot is no longer usable (e.g. too slow) or the Botmaster decides that
a particular bot is no longer suitable, it may be abandoned by the Botmaster.
When this happens, the rest of the Botnet remains available. A whole Botnet is destroyed only when
all its bots are detected or abandoned, or when the Command and Control
servers are detected and blocked (Schiller & Binkley, 2007).
e) Securing the Botnet
One of the important issues in each Botnet life cycle is the constant effort to
keep the whole Botnet secure. The Botmasters do this by encrypting the
messages that are delivered between the bots, and between the bots and the
Command and Control servers. In addition, Botmasters may update the bots
with new code and new techniques to evade anti-virus software (Schiller &
Binkley, 2007).
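For analysis purposes, the life cycle stages listed above can be modelled as a small state machine. The sketch below is only an illustration: the state names, the event labels (`rally`, `get_command`, `send_report`, `abandon`) and the function names are hypothetical, and the "securing" stage is left out because it is a continuous activity rather than a distinct state.

```python
from enum import Enum


class BotState(Enum):
    INFECTED = "infected"
    RALLIED = "rallied"
    ACTIVE = "getting commands and sending reports"
    ABANDONED = "abandoned"


# (current state, event) -> next state. Event names are hypothetical labels.
TRANSITIONS = {
    (BotState.INFECTED, "rally"): BotState.RALLIED,
    (BotState.RALLIED, "get_command"): BotState.ACTIVE,
    (BotState.ACTIVE, "get_command"): BotState.ACTIVE,
    (BotState.ACTIVE, "send_report"): BotState.ACTIVE,
}


def step(state, event):
    """Advance one life cycle step; a bot can be abandoned from any live state."""
    if state is BotState.ABANDONED:
        raise ValueError("an abandoned bot takes no further part in the Botnet")
    if event == "abandon":
        return BotState.ABANDONED
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not valid in state {state.name}") from None


def run(events, state=BotState.INFECTED):
    """Replay a sequence of events starting from the infection stage."""
    for event in events:
        state = step(state, event)
    return state
```

A model of this kind makes explicit that rallying happens once, whereas the command and report stage repeats until the bot is abandoned.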
2.2.2 Botmasters' Prime Targets
The Botmasters may infect different types of computers or servers, but the most
common targets are less-monitored computers, computers with high-bandwidth connectivity,
university servers, and home computers. Computers that are connected to the Internet
using a broadband connection give attackers an opportunity to use that bandwidth.
Home users who are not computer-savvy are also prime targets of the Botmasters. These
users usually have low awareness or lack knowledge of network security, and
Botmasters take advantage of this to gain unauthorised access to the computers and
keep their bots there for a long time without being detected (Govil & Jivika, 2007; Puri,
2003).
2.2.3 Botnet Command and Control (C&C) Mechanism
As discussed in the previous sections, a Botnet threat comes from three main
elements - the bots, the Command and Control (C&C) servers, and the Botmasters. The
bots infect the computers, and the Command and Control servers distribute the
Botmaster's orders to the bots on infected computers. These three elements communicate closely
with one another; thus, they would be useless without some form of
Command and Control mechanism through which this communication takes place (Gu, Zhang, & Lee, 2008).
The Command and Control mechanism creates an interface between the bots, C&C
servers and the Botmasters, to transmit data between them. It is crucial for
Botmasters to establish a reliable connection between themselves, the infected
computers, and C&C servers (Govil & Jivika, 2007). Figure 2-2 shows the logical
relationship between these three elements.
Figure 2-2: General schema of the Botnet C&C mechanism
There are two types of Botnet command and control architectures - centralised and
decentralised - based on the way communication is implemented (Chao, Wei, & Xin,
2009; Zeidanloo & Manaf, 2009).
2.2.4 Centralised Command and Control Mechanism
In the centralised command and control approach, all the zombies or bots are
connected to the central C&C server, which is constantly waiting for new bots to be
connected. Depending on the Botmaster's settings, a C&C server may provide some
services to register the available bots, and this will make it possible to track their
activities. Undoubtedly, the Botmaster must be connected to the C&C server to have
control of the Botnets and distribute its commands and tasks (Gu, et al., 2008; Jing,
Yang, Kaveh, Hongmei, & Jingyuan, 2009; Lee, et al., 2007; Ping, Sherri, & Cliff,
2007). Figure 2-3 shows the structure of a Centralised Command and Control Botnet.
Figure 2-3: Centralised Botnet (Ping, et al., 2007)
Centralised Botnets are the most common type of Botnet because they are simple
to create and manage, and they respond quickly (Gu, et al., 2008; Jing, et al., 2009;
Ping, et al., 2007). The centralised C&C mechanism is divided into two main types -
IRC-based and HTTP-based - according to the communication protocol used to
establish the connection (Naseem, et al., 2010; Zeidanloo & Manaf, 2009; Zhaosheng,
et al., 2008).
2.2.4.1 IRC-based Botnets
IRC or Internet Relay Chat is a system that is used by computer users to
communicate online or chat in real time (Kalt, 2000). This method was used in
the first generation of bots, in which the Botmaster used an IRC server and the relevant
channels to distribute commands (Jae-Seo, et al., 2008). Each bot connects to the
IRC server and channel that has been selected by the Botmaster, and waits for commands.
In this setup, the Botmaster establishes real-time communication with all the connected
bots, and controls them. The IRC bots follow the PUSH approach, which means that
when an IRC bot connects to a selected channel, it does not disconnect, and
remains connected (Gu, et al., 2008; Naseem, et al., 2010; Ping, Lei, Baber, &
Cliff, 2009). Figure 2-4 shows the IRC-based Command and Control Botnets.
Figure 2-4: IRC-based C&C Botnet (Gu, et al., 2008)
2.2.4.2 HTTP-based Botnets
HTTP-based Command and Control is a newer technique that allows the Botmasters
to control their bots by using the HTTP protocol (Jae-Seo, et al., 2008). In this
technique, the bots use a specific URL or IP address defined by the Botmaster to
connect to a specific web server, which plays the role of a Command and Control server
(Naseem, et al., 2010).
HTTP bots adopt the PULL approach, unlike the PUSH approach used by the IRC-based
bots. In the PULL approach, an HTTP-based bot does not remain connected
after it has first established a connection to the Command and Control server.
Instead, the Botmasters publish the commands on certain web
servers, and the bots periodically visit those web servers to update themselves or get
new commands. This process continues at a regular interval that is defined by the
Botmaster (Gu, et al., 2008; Jae-Seo, et al., 2008; Naseem, et al., 2010; Ping, et al.,
2009). Figure 2-5 shows the HTTP-based Command and Control Botnets.
Figure 2-5: HTTP-based C&C Botnet (Gu, et al., 2008)
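Because PULL-style bots revisit their server at a regular interval, the regularity of request times is one property a detector could examine. The following sketch is one simple way to measure it, assuming request timestamps in seconds for a single client and destination; the function name `is_periodic` and both thresholds are illustrative assumptions, not values from this thesis.

```python
from statistics import mean, pstdev


def is_periodic(timestamps, max_cv=0.1, min_requests=4):
    """Return True when the gaps between requests are nearly constant.

    The coefficient of variation (standard deviation / mean) of the
    inter-arrival gaps is compared with `max_cv`; a small value means the
    requests arrive at a regular interval.
    """
    if len(timestamps) < min_requests:
        return False  # too few requests to judge regularity
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    average_gap = mean(gaps)
    if average_gap <= 0:
        return False  # all requests share one timestamp
    return pstdev(gaps) / average_gap <= max_cv
```

Legitimate software that polls on a timer (for example, update checkers) would also pass this test, so a check of this kind can only shortlist flows for further filtering.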
2.2.5 Decentralised or P2P Command and Control Mechanism
The decentralised Command and Control architecture is based on the peer-to-peer
network model. In this model, the infected computers or zombies can act as a bot and as
a C&C server at the same time (Ianelli & Hackworth, 2005; Jing, et al., 2009; Naseem,
et al., 2010). In fact, in P2P Botnets, instead of having a central C&C server, each bot
acts as a server to transmit the commands to its neighbouring bots. The Botmaster sends
commands to one or more bots, and the bots that receive the commands then deliver
them to other bots, and this process is repeated by each bot that receives a new
command.
Unlike the centralised Botnet, creating and managing P2P Botnets involves
complex procedures and requires a high level of expertise (Gu, et al., 2008; Jing, et
al., 2009; Ping, et al., 2007). Figure 2-6 shows the structure of a decentralised
Command and Control Botnet.
Figure 2-6: Decentralized or P2P Botnet (Ping, et al., 2007)
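The bot-by-bot relay of commands in a decentralised Botnet can be modelled as a breadth-first traversal of the peer graph. The sketch below is a simplified model for reasoning about propagation, not code from any real bot; the `neighbours` mapping and the function name `propagate` are hypothetical.

```python
from collections import deque


def propagate(neighbours, seeds):
    """Return the number of relay hops after which each peer receives a command.

    `neighbours` maps a peer to the peers it forwards to; `seeds` are the
    peers that the controller contacts directly (hop 0).
    """
    hops = {peer: 0 for peer in seeds}
    queue = deque(seeds)
    while queue:
        peer = queue.popleft()
        for nxt in neighbours.get(peer, ()):
            if nxt not in hops:  # each peer relays a new command only once
                hops[nxt] = hops[peer] + 1
                queue.append(nxt)
    return hops
```

In this model the hop count grows with the depth of the overlay, and a peer with no inbound link never receives the command.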
2.3 Why Choose HTTP Botnets?
As discussed in sections 2.2.4 and 2.2.5, there are three different types of Botnets -
IRC, HTTP, and P2P. The reasons for choosing HTTP-based Botnets for this
research are as follows:
In the first generation of Botnets, the IRC technology was used by Botmasters to
control the bots because the IRC system has several advantages, such as ease of use,
ease of control, and ease of management (Ianelli & Hackworth, 2005; Jae-Seo, et al.,
2008). However, the main weakness of the IRC Botnet is its central control mechanism. A
whole Botnet can be destroyed by blocking the IRC server or blocking the IRC ports.
Hence, P2P Botnets were designed to overcome this problem (Wei, Tavallaee,
Goaletsa, & A. Ghorbani, 2009; Zhaosheng, et al., 2008).
In the decentralised Botnets or P2P Botnets, there is no central Command and
Control server; rather, there are multiple distributed servers. Commands are delivered
bot by bot to the entire Botnet. In addition, some encryption methods are used to make
the communication secure (Gu, et al., 2008; Ianelli & Hackworth, 2005; Ping, et al.,
2007). These techniques make it more difficult to detect P2P Botnets as compared to
the IRC Botnets. However, P2P Botnets are not as widely used as IRC Botnets because
the implementation and control of P2P bots can be quite difficult and complex. In
addition, there is no guarantee on message delivery or latency in P2P Botnets, and the
Botmasters are not able to know the delivery status of the commands (Bailey,
Cooke, Jahanian, Yunjing, & Karir, 2009).
Recently, Botmasters have begun to use the centralised Command and Control
structure again. However, the HTTP protocol is used in place of the IRC protocol
(Jae-Seo, et al., 2008; Naseem, et al., 2010), and port 80 is used. Because a
wide range of legitimate services also use HTTP on port 80, it is not easy to block the central Command and Control
server (Sandvine, 2006). In addition, by using the HTTP protocol, bots hide their
communication flows among the normal HTTP flows, and avoid detection by
network defences such as firewalls (Chao, et al., 2009; Govil & Jivika, 2007;
Zeidanloo & Manaf, 2009).
From the review of the characteristics of IRC, P2P, and HTTP-based Botnets, it is
clear that the HTTP command and control mechanism is a newer technology that is
preferred by Botmasters. Compared to the IRC and P2P Botnets, HTTP-based Botnets
have a set of attributes that make them difficult to detect. Surprisingly, the
number of studies focusing on the detection of HTTP-based Botnets is relatively low
compared to the number of studies on detection methods for IRC-based and
P2P Botnets.
The following sections discuss past and current research on Botnet detection
methods.
2.4 Existing Botnet Detection Methods
This section discusses the current methods and research on Botnet detection.
2.4.1 Honeypot and Honeynet
Honeypots are tools that are used as traps for bots, as they can detect bots or collect
information on their activities. The information can be used to understand more about
the bots' behaviour or the intentions of the Botmasters. Nepenthes is a good example of a
Honeypot that is used to collect the bots' binary code and other information about them
(Niels & Thorsten, 2007; Rajab, Zarfoss, Monrose, & Terzis, 2006).
Freiling, Holz, and Wicherski (2005) used Honeypots to collect information about
DDoS attacks. This information includes DDoS signs and characteristics, cases, and
the attackers' intentions and behaviour. This information is useful for the development
of methods to prevent DDoS attacks.
Similarly, Rajab et al. (2006) combined several Honeypots in a multifaceted
approach to collect a large amount of information about IRC bots. By analysing the
data, as well as tracking the activities of the bots, they learned more about the bots'
characteristics and behaviour.
Like any other tools and techniques, Honeypots have their weaknesses. There are
two types of Honeypots - low-interaction honeypots, and high-interaction honeypots.
The main difference between them is the level of access rights to system resources,
services, and functions.
Low-interaction honeypots like Nepenthes, are installed on computers to emulate
limited services of their operating system, thus, they provide Botmasters limited
interaction with the computers. Therefore, these computers may not be completely
compromised, and the information collected on them may not be sufficient for analysis
to detect Botnets (Niels & Thorsten, 2007).
On the other hand, high-interaction honeypots do not emulate any operating system
services but provide the real system and services. The Botmaster can use these
real services to gain full control of the computer on which the high-interaction honeypot
is installed (Niels & Thorsten, 2007).
Today, it is not surprising that Botmasters use many techniques to avoid
honeypots (Zou & Cunningham, 2006).
2.4.2 Detection by Signature
Signature refers to the known patterns or characteristics of threats from intruders
into computer systems. By analysing and comparing these patterns or characteristics, it
is possible to distinguish the threat activities from the normal activities (Scarfone &
Mell, 2007).
Goebel and Holz (2007) used IRC nicknames as signatures. In their method,
known as Rishi, a reasonable amount of information on IRC traffic is collected.
Subsequently, all the IRC nicknames are extracted from the collected data and checked
against known bot nicknames using regular expressions. To reduce the number
of comparisons and the time taken, Goebel and Holz used a white list and a black list.
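The nickname check with a white list and a black list can be sketched as follows. This is not Rishi's actual rule set: the two regular expressions, the labels returned, and the function name `classify_nick` are hypothetical and serve only to show the order of the checks.

```python
import re

# Hypothetical nickname patterns; these are NOT the expressions used by Rishi.
SUSPICIOUS_PATTERNS = [
    re.compile(r"[A-Z]{2,3}\|\d{4,}"),          # e.g. "DEU|40321"
    re.compile(r"bot[-_]?\d+", re.IGNORECASE),  # e.g. "bot_17"
]


def classify_nick(nick, whitelist, blacklist, patterns=SUSPICIOUS_PATTERNS):
    """Classify a nickname, consulting the lists before any regular expression."""
    if nick in whitelist:
        return "benign"      # known-good nickname: skip pattern matching
    if nick in blacklist:
        return "known bot"   # previously confirmed: skip pattern matching
    if any(pattern.fullmatch(nick) for pattern in patterns):
        return "suspicious"
    return "unknown"
```

Consulting the two lists first means that nicknames already judged are never matched against the regular expressions again, which is where the saving in comparisons comes from.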
The signature-based detection method has limited effectiveness because it
cannot identify new behaviour patterns or characteristics. It is based
on a simple comparison of the collected information with the predefined characteristics
of well-known bots. Thus, this method is good for detecting well-known bots, but
ineffective for detecting new and zero-day bots (Chao, et al., 2009; Scarfone & Mell,
2007).
2.4.3 Detection by DNS Monitoring
Monitoring and analysing the DNS traffic generated by bots has been used as a
technique to detect Botnets. Choi, Lee, Lee, and Kim (2007) found that bots generate
DNS traffic in some situations, for example, when locating the Command and
Control server or arranging attacks such as DDoS attacks. The researchers used three
main differences between bot-generated DNS flows and normal DNS flows
to detect Botnets.
The first difference is the number of source IP addresses that send
DNS queries to specific domain names. Botnet DNS queries are generated by a
fixed set of IP addresses that belong to the bots in the same Botnet. In contrast,
the IP addresses behind legitimate DNS queries, generated by anonymous
users for a particular domain name, are random.
The second difference lies in the format and frequency of DNS queries
generated by bots and by normal users. Bots have similar group activities; thus, DNS
queries of the same format are generated by bots from the same Botnet, intermittently,
and only in special situations, but the DNS queries of normal users are generated
continuously, and in a random format.
The third difference is that normal users hardly use dynamic DNS
(DDNS), whereas it is used by bots.
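The first of these differences, a fixed set of source addresses versus a changing one, can be quantified by comparing the sets of sources seen in consecutive time windows. The sketch below uses Jaccard similarity for this purpose; it is an illustrative measure rather than the algorithm of Choi et al., and the function names, the `(timestamp, source IP, queried name)` tuple layout and the one-hour window are assumptions.

```python
from collections import defaultdict


def sources_per_window(queries, domain, window_seconds):
    """Group the source IPs that queried `domain` into consecutive time windows."""
    windows = defaultdict(set)
    for timestamp, source_ip, queried_name in queries:
        if queried_name == domain:
            windows[int(timestamp // window_seconds)].add(source_ip)
    return [windows[index] for index in sorted(windows)]


def jaccard(first, second):
    """Size of the intersection divided by size of the union (0.0 if both are empty)."""
    union = first | second
    return len(first & second) / len(union) if union else 0.0


def mean_consecutive_similarity(queries, domain, window_seconds=3600):
    """Average Jaccard similarity of source sets in neighbouring windows.

    Values near 1.0 indicate a fixed group of sources; values near 0.0
    indicate that the sources change from window to window.
    """
    sets = sources_per_window(queries, domain, window_seconds)
    if len(sets) < 2:
        return 0.0
    scores = [jaccard(a, b) for a, b in zip(sets, sets[1:])]
    return sum(scores) / len(scores)
```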
Salomon and Brustoloni (2008) also used DDNS as base parameters to suggest two
approaches for Botnet detection. They found that Botmasters do not use certain
Command and Control servers for a long time, and periodically change the servers. In
this situation, bots will try to find the address of a new Command and Control server.
When this happens, there will be a higher number of DDNS queries to specific domain
names. These are signs of unusual activities of Botnets.
NXDOMAIN has been evaluated as another parameter. The term NXDOMAIN, or
Non-Existent Domain, describes the response returned when DNS resolvers
are unable to resolve a certain domain name for reasons such as changed domain
names, unregistered domain names, or server problems. Salomon and Brustoloni
(2008) suggested that a high number of DDNS queries returning the NXDOMAIN
code could have been generated by bots that are searching for Command
and Control servers that might have been blocked or moved.
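A simple way to act on this observation is to count, per client, how many DNS responses carry the NXDOMAIN code. The sketch below assumes responses are available as `(client, queried name, response code)` tuples; the function name `flag_clients` and both thresholds are illustrative assumptions rather than values from the cited work.

```python
from collections import Counter

NXDOMAIN = 3  # DNS response code 3, "Name Error", as defined in RFC 1035


def flag_clients(responses, min_queries=4, min_ratio=0.5):
    """Return clients whose share of NXDOMAIN responses reaches `min_ratio`.

    Clients with fewer than `min_queries` responses are ignored so that a
    single mistyped name does not flag a host.
    """
    total = Counter()
    failed = Counter()
    for client, _queried_name, rcode in responses:
        total[client] += 1
        if rcode == NXDOMAIN:
            failed[client] += 1
    return sorted(
        client
        for client, count in total.items()
        if count >= min_queries and failed[client] / count >= min_ratio
    )
```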
DNSBL or DNS Block List is a list of spamming computers and network IP
addresses. Ramachandran, Feamster, and Dagon (2006) stated that DNSBLs may be
checked by Botmasters to keep themselves aware of their bots' status - to find out
whether a particular bot is being blocked. Thus, their algorithms are designed to
distinguish normal DNSBL queries (generated by normal services such as mail
servers) from the queries generated by Botmasters.
The detection methods discussed above were designed to analyse DNS (Domain Name System)
queries generated by bots and Botnets. These methods are no longer effective,
as the new generation of bots and Botnets have been designed to generate a minimal
number of DNS queries. Moreover, the process of analysing DNS is very complex
(Jae-Seo, et al., 2008).
2.4.4 Detection using Attack Behaviour Analysis
In this method, the characteristics and behaviour of attacks have been studied by
researchers more than other issues such as the bots, Command and Control servers, or
Botmaster behaviour, or the communication methods used.
Hu, Knyz, and Shin (2009) proposed a system, called RB-Seeker, which has three
different sub-systems to detect bots that carry out URL redirection attacks. The first two
sub-systems of the method attempt to identify all domains, which are related to
redirection activities, based on the characteristics and behaviour of the URL redirection
attack. At this stage, the system does not make any decision about the domain status,
which can either be normal or malicious. In the next stage, the third sub-system
examines the DNS queries to distinguish the malicious domains from the normal
domains.
Although this method uses DNS-based techniques, its main focus is
on URL redirection activities, and DNS probing is used only as a sub-system.
Therefore, this method does not belong to the DNS-based category.
Brodsky and Brodsky (2007) found that a higher number of spam emails are sent by
bots, within a short period than those sent by humans. Based on this observation, the
sources of spam emails were identified and recorded. Subsequently, the number of spam
emails generated by the same recorded sources, within a short period, was used as a
parameter for decision-making.
Likewise, Yinglian et al. (2008) designed a system to collect all the URLs that were
sent in spam emails, and divided them into different groups based on their Web
domains. In the next step, each URL group was passed to a regular expression
generator to create a signature for malicious URLs.
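The two steps, grouping URLs by domain and deriving a pattern per group, can be sketched as below. The signature step here is a deliberately crude stand-in (the longest common prefix followed by a wildcard) and not the regular expression generator of Yinglian et al.; the function names are hypothetical.

```python
import os
import re
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_domain(urls):
    """Group URLs by their host name."""
    groups = defaultdict(list)
    for url in urls:
        host = urlsplit(url).hostname
        if host:  # skip strings that carry no host part
            groups[host].append(url)
    return dict(groups)


def prefix_signature(urls):
    """Build a simple pattern: the group's longest common prefix, then anything.

    This is an illustrative substitute for a real signature generator.
    """
    urls = list(urls)
    if not urls:
        raise ValueError("at least one URL is required")
    prefix = os.path.commonprefix(urls)  # character-wise common prefix
    return re.compile(re.escape(prefix) + r".*")
```

A prefix-based pattern of this kind over-generalises on small groups, which is one reason a dedicated generator is used in practice.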
These methods can identify bots based on the similarity of their group activities.
The methods are effective when countering attacks from a large number of attackers.
2.5 Detection Based on Network Behaviour Analysis
Network Behaviour Analysis or NBA is a method that can be used to collect a wide
range of information and statistics about network traffic. The information is analysed to
detect any signs of threats or malicious activities. The NBA method consists of
several components, including sensors and management servers (analysers)
(Scarfone & Mell, 2007; Timofte & Romania, 2007).
The NBA system collects information such as IP addresses, operating systems,
available services, and logging data such as timestamps, event types, network protocols,
host ports, and additional packet header fields for each client (Scarfone & Mell, 2007).
2.5.1 Why Choose Network Behaviour Analysis?
The Network Behaviour Analysis system has been chosen for this research for two
main reasons:
a) Ability to Detect Unknown Threats:
Botmasters update their techniques day-by-day to hide their activities from
existing detection methods (Lee, et al., 2007). The NBA system can thwart the
Botmasters' strategy as it can detect unknown (zero-day) threats. This feature of
the NBA system can further improve Botnet detection (Derek, 2009; Scarfone &
Mell, 2007).
b) Ability to Detect Encrypted Threats:
Botnets try to hide their communication flow among normal web traffic (e.g.
HTTP C&C) (Zeidanloo & Manaf, 2009) or use encryption methods (e.g. P2P
C&C) (Ping, et al., 2007). NBA looks out for abnormal flow patterns in network
traffic, and not at the content of the information being transmitted (Rehak et al.,
2009).
In addition, the benchmark report from Aberdeen Group (Derek, 2009) pointed out
that the NBA methods produce good results when combined with other methods.
2.5.2 Existing Detection Methods Based on NBA
The Network Behaviour Analysis technique has been widely used by researchers for
Botnet detection for many years.
Strayer, Walsh, Livadas, and Lapsley (2006) designed a system to detect IRC bots
using five filters. Initially, the IRC chat traffic is separated from the other types of
traffic. The IRC traffic is then examined using five different filters to reduce the amount
of useless traffic flows. The first filter is applied to reduce the amount of IRC traffic
based on the assumption that bots use only TCP-based IRC flows.
The other four filters, respectively, further reduce the IRC traffic tracked based on
the following criteria: flows that have only SYN and RST flags; high bit rate flows;
flows whose average packet size is larger than expected; and short duration flows. In the last stage
of filtering of the IRC traffic flow, the machine-learning technique proposed by
Livadas, Walsh, Lapsley, and Strayer (2006) is applied. Finally, a five-dimensional
correlation algorithm is used to make a final decision to detect IRC bots (Strayer,
Walsh, Livadas, & Lapsley, 2006).
Gianvecchio, Xie, Wu, and Wang (2008) studied the results of different
measurements that show the difference between bot behaviour and human
behaviour in IRC chat. They noticed a difference between bots and humans with
respect to the inter-message delay and message size in Internet chat rooms. After
analysing these two parameters, they proposed a system that uses entropy and machine-
learning-based classifiers to detect chat bots.
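The intuition behind the inter-message delay parameter can be illustrated with a minimal sketch. It is not the estimator used by Gianvecchio et al.; it assumes a plain Shannon entropy over binned delays, and the function name `delay_entropy` and the `bin_width` parameter are hypothetical. A sender with very regular timing yields low entropy, while irregular timing yields higher entropy.

```python
import math
from collections import Counter


def delay_entropy(timestamps, bin_width=1.0):
    """Shannon entropy (in bits) of binned inter-message delays.

    timestamps: message send times in seconds, in ascending order.
    bin_width:  hypothetical bin size in seconds (an assumption of this sketch).
    """
    # Delay between each pair of consecutive messages.
    delays = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if not delays:
        return 0.0
    # Count how many delays fall into each bin.
    bins = Counter(int(d // bin_width) for d in delays)
    total = len(delays)
    return -sum((c / total) * math.log2(c / total) for c in bins.values())
```

A classifier could then use this value, together with message size, as one of its input features.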
Mitsuaki et al. (2007) introduced three metrics - relationship style, response time,
and synchronization activities - for detecting bots. Because the Botmasters are
connected to the bots via Command and Control servers, they assume that there is a 1 to
N relationship between the Botmaster and the bots in a Botnet. Mitsuaki et al. use the
structure of this relationship as a metric to detect Botnets.
They also observed that the IRC chat bots respond faster than humans; hence, the
response time is used as the second metric. Finally, they observed that the bots get their
commands from the Botmaster. This means that the bots may perform abnormal
activities to be in synchronisation with other bots in the same Botnet. This
synchronisation activity is used as another metric.
Wei et al. (2009) categorised the services or applications using signature-based and
decision tree classifiers. They categorised the network applications into IRC chat, P2P,
and web applications. Then, focusing on each category, they use the response time and
synchronisation activities metrics introduced by Mitsuaki et al. (2007) to differentiate
the bot activities from the normal activities.
Guofei, Phillip, Vinod, Martin, and Wenke (2007) proposed BotHunter, which
models five subsets of events that may occur during the infection process by bots. They
set these subsets in different correlation engines that examine the traffic flows for
any evidence of Botnet activities.
BotSniffer (Gu, et al., 2008) and its extension BotMiner (Guofei, Roberto, Junjie, &
Wenke, 2008) are Botnet detection systems that carry out their tasks by analysing the
similarity in the abnormal or malicious activities generated by the bots of the same
Botnet.
Jae-Seo et al. (2008) used a parameter based on one of the pre-defined
characteristics of HTTP-based Botnets. As discussed earlier, the HTTP bots
periodically connect to a particular Command and Control server to get updates. The
researchers proposed a degree of periodic repeatability (DPR) to show the
rate of periodic connections to certain servers. The value of DPR is used as a parameter
to detect HTTP-based bots.
The next section evaluates some of the methods used in past research and
compares them to the system developed in this research.
2.5.3 Evaluation and Comparison of Existing NBA Methods for Botnet Detection
A Botnet detection system, called botAnalytics, was developed in this research to
detect HTTP-based Botnets. The reasons for choosing HTTP-based bots, and the
Network Behaviour Analysis approach for the design of botAnalytics, have already been
discussed. In this section, botAnalytics is compared with other methods from past
research that also used the NBA technique. Table 2-1 shows the comparison in brief.
Table 2-1: Comparison of Methods from Past Researches with botAnalytics
As shown in Table 2-1, all the methods are able to detect unknown (zero-day) bots.
This ability is one of the main advantages of using the Network Behaviour Analysis
system, as discussed earlier. botAnalytics was designed to detect HTTP-based Botnets,
hence, it cannot be compared with the first five methods, which were designed to detect
IRC-based Botnets.
Jae-Seo et al. (2008) proposed a system to detect only HTTP-based bots. In this
method, normal applications can incorrectly be detected as bots, which can produce
a very high false-positive rate.
The methods proposed by Guofei et al. (2008) and Gu et al. (2008) were designed to
detect all three types of bots: IRC-based, P2P, and HTTP-based bots. In general, their
methods produce low false-positive results, but their sub-systems, which are involved in
detecting HTTP-based bots, produce high false-positive results. This is because the
proposed HTTP-based Botnet detection sub-systems have the same design as that
proposed by Jae-Seo et al.
As discussed earlier, the technique proposed by Gu et al. and its extension by
Guofei et al. are based on the similarity of the bots' group activities, and use data mining
approaches. These techniques work best with a Botnet that has a large number of bots,
which yields results that support better decisions. For this reason, these methods are not effective
in small-scale Botnets.
Gu et al. (2008) proposed a method to detect small-scale Botnets, but its
effectiveness is directly tied to the false-positive rate: as its
effectiveness in small-scale Botnets increases, the false-positive ratio also increases.
The botAnalytics system developed in this research was aimed at overcoming the
weaknesses of BotSniffer (Gu, et al., 2008), and BotHunter (Guofei, Phillip, Vinod,
Martin, & Wenke, 2007). It can detect even a very small-scale Botnet that has only one
bot. In addition, botAnalytics produces a very low false-positive rate, unlike the method
developed by Jae-Seo et al. (2008).
2.6 Conclusion
There are three types of Botnet based on the way their bots communicate with each
other. IRC-based and HTTP-based Botnets are called centralised and P2P is called
decentralised. HTTP-based Botnets are the latest generation of Botnets, which hide their
activity by using normal HTTP traffic.
HTTP-based Botnets have a set of characteristics that make their detection difficult
compared to the IRC and P2P Botnets. There are several methods and techniques that
have been used by researchers to track Botnet activities and detect them, but the
amount of research on HTTP-based Botnet detection is low compared to the
amount of research on detection methods for IRC-based and P2P Botnets.
The ability of the NBA system to detect unknown and encrypted threats made it the
preferred system for modeling botAnalytics. The next chapter discusses the process of
modeling a detection system based on the NBA architecture.
Chapter 3: Modeling of Detection System
3.1 Introduction
This chapter describes the method adopted to carry out the research on modeling a
new system for detecting HTTP-based Botnets. As described in the literature review, this
research proposes a detection method that uses the network behaviour analysis
(NBA) architecture (Derek, 2009; Scarfone & Mell, 2007). The proposed method uses the NBA
architecture to collect a wide range of information and statistics about particular network
traffic. The collected information is then analysed to search for any signs of bots and Botnet
activities.
3.2 Proposed Method Architecture
There are three layers in the proposed method architecture - data collecting platform,
data storing platform, and data analysing platform. Based on the NBA structure, the
proposed method consists of several components that include the software sensors and
management server (Analyser) (Scarfone & Mell, 2007; Timofte & Romania, 2007). Figure
3-1 shows the schema of the proposed method architecture.
Figure 3-1: botAnalytics System Architecture
3.2.1 Data Collecting Platform
The data collecting platform consists of a set of software sensors, which are
installed on each client in a particular network. The main task of the data
collecting platform is to collect data of the HTTP traffic in each client and to store
the data in the database. This platform also uses a set of filters and other techniques
to separate out data on unwanted traffic.
3.2.2 Data Analysing Platform
The data collected by the data collecting platform are analysed by the data
analysing platform to detect suspicious activities associated with a bot or Botnet. A
set of filters and techniques are used by this platform to make the analysis process
fool-proof.
3.2.3 Data Storing Platform
The data storing platform is the place where the collected data are kept
before and after the analysis process. All the results are saved in the database to
maintain the history of the system performance.
3.3 Data Reduction Filters
In addition to sniffing network traffic, the proposed data collecting platform applies two
filters to prevent useless data from being collected and to reduce the
amount of unwanted data.
3.3.1 HTTP Traffic Separator Filter
The HTTP Traffic Separator (H.T.S.) filter was designed to separate the HTTP
traffic from other types of traffic in the network. botAnalytics was designed to
detect HTTP-based Botnets. As mentioned in section 2.2.4, HTTP-based Botnets
use the HTTP traffic; hence, the data on other types of network traffic are not
collected. Figure 3-2 shows the flowchart of this filter.
Figure 3-2: The flowchart of H.T.S. filter
3.3.2 Get and Post Separator Filter
The Get and Post Separator (G.P.S.) filter was designed to select only the HTTP
traffic with GET or POST methods. The HTTP-based bots use the GET or POST
methods to contact their Command and Control server; thus, the other methods
provide no information about bot activities (Joe, 2004; Naseem, et al., 2010;
Nazario, 2007). Therefore, the G.P.S. filter focuses on the HTTP methods, and
only selects the HTTP traffic with the GET and POST methods. Figure 3-3 shows
the flowchart of this filter.
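The two data reduction filters can be sketched as follows. This is a minimal illustration, not the botAnalytics implementation: it assumes each captured packet has already been parsed into a dictionary, and the field names `protocol` and `method` as well as the function names are hypothetical.

```python
def hts_filter(packets):
    """H.T.S. sketch: keep only packets that carry HTTP traffic.

    Assumes each packet is a dict with a hypothetical 'protocol' field.
    """
    return [p for p in packets if p.get("protocol") == "HTTP"]


def gps_filter(packets):
    """G.P.S. sketch: keep only HTTP requests that use GET or POST.

    Assumes each packet is a dict with a hypothetical 'method' field.
    """
    return [p for p in packets if p.get("method") in {"GET", "POST"}]


def reduce_traffic(packets):
    """Apply the H.T.S. filter first, then the G.P.S. filter."""
    return gps_filter(hts_filter(packets))
```

Applying the H.T.S. filter first means the G.P.S. filter only ever inspects HTTP packets, which matches the order in which the two filters are described.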
Figure 3-3: The flowchart of G.P.S. filter
3.4 VOU Mechanism
The VOU or Validation of User-Agents mechanism was designed based on a unique
algorithm. It is used for the first time in this research, in the data collecting platform. This
mechanism defines the VOU field for each collected HTTP traffic packet with an
appropriate value.
The VOU mechanism acts on each collected packet of HTTP traffic with the GET
or POST methods, and obtains the User-Agent from the collected traffic header. In the next
step, the VOU tries to identify the User-Agent string and its corresponding application from
the installed application list. The installed application list contains the list of applications and
services, which are available on each client within a network, together with their
corresponding User-Agent. This list can be updated by users or automatically from
websites such as www.user-agents.org. Figure 3-4 shows the flowchart of the VOU
mechanism.
Figure 3-4: The VOU Module Flowchart
For each collected HTTP packet, the VOU field is updated with one of
three different values, based on different conditions, as explained below:
1) UNKNOWN value
If the VOU mechanism is not able to determine the User-Agent for
any reason, for example, due to encryption or use of fake User-Agents, the
VOU field of the collected traffic will be given the UNKNOWN value. If
the VOU mechanism is able to determine the User-Agent but is not able to
identify the corresponding application, the VOU field will also be given the
UNKNOWN value.
2) VALID value
The VOU field will be set to the VALID value if the User-Agent and
its corresponding application have been identified, and the corresponding
application has been installed on the client and is available at the same time.
3) NOTVALID value
If the User-Agent and its corresponding application have been
identified but the corresponding application is not available on the client, the
VOU field will be given the NOTVALID value.
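The three conditions above can be summarised in a short sketch. It is an illustration under stated assumptions rather than the actual VOU module: the function name, the `known_agents` mapping (User-Agent string to application name) and the `installed_apps` set (applications available on the client) are hypothetical data structures chosen for this example.

```python
def validate_user_agent(user_agent, known_agents, installed_apps):
    """Return the VOU value for one collected HTTP packet.

    user_agent:     User-Agent string from the header, or None if it
                    could not be determined.
    known_agents:   hypothetical dict mapping User-Agent strings to
                    application names.
    installed_apps: hypothetical set of application names available
                    on the client.
    """
    # The User-Agent could not be determined at all.
    if not user_agent:
        return "UNKNOWN"
    # The User-Agent is readable but maps to no known application.
    application = known_agents.get(user_agent)
    if application is None:
        return "UNKNOWN"
    # The application is known; check whether the client really has it.
    return "VALID" if application in installed_apps else "NOTVALID"
```

For example, a packet whose User-Agent maps to a browser that is not installed on the client would be marked NOTVALID.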
3.5 Analysing the Collected Traffic
The data collecting platform periodically sniffs the network traffic and applies the
H.T.S. and G.P.S. filters to select only HTTP-type traffic using the GET or POST method.
In addition to these filters, the VOU mechanism is applied to the collected data as described in
section 3.4. When a reasonable number of packets have been collected and stored in the
data storing platform, the Analyser begins its work in the data analysing platform as follows:
3.5.1 Grouping and Sorting
The Grouping and Sorting (G.A.S.) process sorts data on the collected traffic
and divides them into different groups based on the source IP address (SIP),
destination IP address (DIP), URL, and the User-Agent string (UA).
While other research mostly uses the source IP, destination IP and domain
names to divide the collected traffic packets into different groups, the proposed
method uses one of the HTTP header fields, known as the User-Agent, as
an additional parameter to make the classification of the collected network packets
more accurate. The G.A.S. process categorises the traffic packets into
different groups, then three different filters are applied to each group of packets
to search for signs of suspicious activities and the presence of HTTP bots.
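The grouping step can be sketched as follows. This is a minimal illustration, assuming each stored packet is a dictionary; the field names `sip`, `dip`, `url`, `ua` and `timestamp`, and the choice to sort each group by time, are assumptions of this sketch rather than details given in the text.

```python
from collections import defaultdict


def group_and_sort(packets):
    """G.A.S. sketch: group packets by (SIP, DIP, URL, UA).

    Each packet is assumed to be a dict with the hypothetical keys
    'sip', 'dip', 'url', 'ua' and 'timestamp'.
    """
    groups = defaultdict(list)
    for packet in packets:
        key = (packet["sip"], packet["dip"], packet["url"], packet["ua"])
        groups[key].append(packet)
    # Sort each group by capture time so later filters can look at intervals.
    for members in groups.values():
        members.sort(key=lambda p: p["timestamp"])
    return dict(groups)
```

Including the User-Agent in the key means that two applications on the same client contacting the same URL fall into separate groups.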
3.5.2 High Access Rate Filter
The H.A.R. filter or High Access Rate filter eliminates the group of similar
HTTP connections or requests that have been generated within a very short time, for
example, more than one request per second. Figure 3-5 shows the H.A.R. filter
flowchart.
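A minimal sketch of the H.A.R. check is given below. It assumes the request times of one traffic group are available as a list of seconds, and it uses the average request rate over the group as the criterion; the function name, the `max_rate` parameter and the averaging approach are assumptions of this sketch, with the default threshold taken from the "more than one request per second" example.

```python
def is_high_access_rate(timestamps, max_rate=1.0):
    """Return True if a group's requests arrive faster than max_rate.

    timestamps: request times (seconds) of one traffic group.
    max_rate:   hypothetical threshold in requests per second.
    """
    if len(timestamps) < 2:
        return False
    span = max(timestamps) - min(timestamps)
    if span == 0:
        # Several requests at the same instant: treat as a high rate.
        return True
    # Average rate measured over the intervals between requests.
    return (len(timestamps) - 1) / span > max_rate
```

Groups for which this check returns True would be eliminated before the later filters run.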
Figure 3-5: Flowchart of H.A.R. Filter
3.5.3 Low Access Rate Filter
The L.A.R. filter or Low Access Rate filter removes the HTTP traffic with
fewer than 2 request packets in the whole data collecting period. For example, if a
group of HTTP traffic is generated within a very short time in the data collecting
period, it will be removed by this filter. Figure 3-6 shows the L.A.R. filter
flowchart.
Figure 3-6: Flowchart of L.A.R. Filter
3.5.4 Periodic Access Rate Filter
The P.A.R. filter or Periodic Access Rate filter selects the HTTP
connections or requests that were generated at periodic intervals. This filter was
designed based on the nature of HTTP-based Botnets. As noted in the literature
review, the HTTP bots connect to their command and control server periodically to
get the commands or updates. Figure 3-7 shows the P.A.R. filter flowchart.
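One possible way to test for periodic intervals is sketched below. The text does not state how periodicity is measured, so this sketch assumes a simple criterion: the intervals between consecutive requests are considered periodic when their standard deviation is small relative to their mean. The function name and the `tolerance` parameter are hypothetical.

```python
from statistics import mean, pstdev


def is_periodic(timestamps, tolerance=0.1):
    """Return True if requests in a group are roughly evenly spaced.

    timestamps: request times (seconds) of one traffic group.
    tolerance:  hypothetical upper bound on the ratio of the standard
                deviation of the intervals to their mean.
    """
    # At least three requests are needed to obtain two intervals.
    if len(timestamps) < 3:
        return False
    ordered = sorted(timestamps)
    intervals = [b - a for a, b in zip(ordered, ordered[1:])]
    average = mean(intervals)
    if average <= 0:
        return False
    return pstdev(intervals) / average <= tolerance
```

A bot polling its server every 60 seconds would pass this check, while ordinary browsing with irregular gaps would not.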
Figure 3-7: P.A.R. Filter Flowchart
3.6 LODA Mechanism
LODA or Level of Danger Analysing mechanism is designed to analyse the
detected suspicious traffic to define its level of danger. Figure 3-8 shows the flow chart of
the analysing algorithm of LODA.
For every suspicious activity detected, the analysis process starts by examining the
VOU field value, which has been set by the VOU mechanism. If the value of the VOU field
of a particular group of suspicious traffic is VALID, the level of danger field for that group
will be set to LOW. If the VOU value is NOTVALID, the level of danger will be set to
HIGH, and if the VOU value is UNKNOWN, the next step of analysing will start.
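The first step of this analysis can be sketched as a simple mapping. This is only an illustration of the rule stated above; the function name is hypothetical, and the UNKNOWN case returns `None` here to signal that the next analysis step is required, which this sketch does not model.

```python
def first_level_of_danger(vou_value):
    """Map a group's VOU value to a level of danger (first LODA step).

    Returns 'LOW' or 'HIGH' when the VOU value decides the outcome,
    or None when the value is UNKNOWN and further analysis is needed.
    """
    if vou_value == "VALID":
        return "LOW"
    if vou_value == "NOTVALID":
        return "HIGH"
    if vou_value == "UNKNOWN":
        # Deferred to the next analysis step (not modelled in this sketch).
        return None
    raise ValueError(f"unexpected VOU value: {vou_value!r}")
```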
Figure 3-8: LODA Module Flowchart
If the value of the VOU field is UNKNOWN, the query is referred to the database
to retrieve the count of similar traffic group, which is generated by other clients in the
network. The answer is compared with the limit value set by the system Administrators. If
the count is greater than the limit value, the level of da