Intrusion Detection System
Post on 14-Apr-2015
Chapter 1 INTRODUCTION
1.1 Introduction

Security of computers and network systems has become more and more important as more computers are connected to each other and more applications are deployed in this "virtual" world. Applications such as electronic commerce and online banking require a strict sense of security. While security measures may not be able to stop every kind of threat and attack, there is a need for a way to record the events of any attack that happens. This record of attacks can be used as a tool to strengthen security and also as a forensic source of evidence of crime. The tool for this purpose is known as an Intrusion Detection System (IDS). As the name suggests, an IDS is built to detect "intruders" in the system. Knowing which attacks are happening gives a better way of countering them.

Intrusion prevention techniques, such as user authentication (e.g., using passwords or biometrics), avoiding programming errors, and information protection (e.g., encryption), have been used to protect computer systems as a first line of defense. Intrusion prevention alone is not sufficient because, as systems become ever more complex, there are always exploitable weaknesses due to design and programming errors or various socially engineered penetration techniques. Intrusion detection is therefore needed as another wall to protect computer systems.

An intrusion into a computer system can be compared to a physical intrusion into a building by a thief: an entity gains unauthorized access to resources. The unauthorized access is intended to steal or change information or to disrupt the valid use of a resource by an authorized user. Intrusion detection is the ability to determine that an intruder has gained, or is attempting to gain, unauthorized access. An intrusion detection system is a tool used to make this determination.
The goal of any intrusion detection system is to alert an authority to unauthorized access before the intruders can cause any damage or take any information, much like a burglar alarm in a building. However, a digital computer system is far more vulnerable than a building and much harder to protect. The intruder can be hundreds of miles away when the attack is initiated, leaving behind very little evidence.

Some basic definitions of the terms used in this field:

Security: Security consists of mechanisms for providing confidentiality, integrity, and availability. Confidentiality means that only the individuals allowed access to particular information should be able to access it. Integrity refers to the controls that prevent information from being altered in any unauthorized manner. Availability controls are those that prevent the proper functioning of computer systems from being interfered with.

Threat: A threat is any situation or event that has the potential to harm a system. Threats may be external or internal. Threats from users include masqueraders (those who use the credentials of others) and clandestine users (those who evade auditing and detection). Misfeasors are legitimate users who exceed their privileges.

Attack: An intentional attempt to bypass computer security measures in some fashion.

Intrusion: A successful attack. An intrusion can be defined as any set of actions that attempt to compromise the confidentiality, integrity, or availability of a resource.

Signature: A pattern that can be matched to identify a particular type of activity.

Detection rule: A rule typically consists of a signature and associated contextual and response information.
1.2 Motivation

Our objective is to eliminate, as much as possible, the manual and ad hoc elements from the process of building an intrusion detection system. We take a data-centric point of view and consider intrusion detection as a data analysis process. Anomaly detection is about finding the normal usage patterns in the audit data, whereas misuse detection is about encoding and matching intrusion patterns using the audit data. The central theme of our approach is to apply data mining techniques to intrusion detection. Data mining generally refers to the process of (automatically) extracting models from large stores of data. The recent rapid development in data mining has made available a wide variety of algorithms, drawn from the fields of statistics, pattern recognition, machine learning, and databases. Several types of algorithms are particularly relevant to our research.
1.3 Problem Definition

In this research work, we describe a data mining framework for adaptively building Intrusion Detection (ID) models. Data mining is also referred to as Knowledge Discovery from Data (KDD): mining knowledge from data, that is, knowledge extraction, data analysis, or pattern analysis. Data mining can be applied to any kind of data repository. It makes it possible to find patterns for further reference, perform aggregation operations, and discover interesting measures. Through data mining, large "data tombs" turn into golden nuggets of knowledge.

The central idea is to utilize auditing programs to extract an extensive set of features that describe each network connection or host session, and to apply data mining programs to learn rules that accurately capture the behavior of intrusions and normal activities. These nuggets, or rules, are then used for misuse detection and anomaly detection. The network-based approach relies on tcpdump data as input, which gives per-packet information. This data is pre-processed by grouping records according to their protocols and extracting features that are useful for training the intrusion detection system. After features are extracted from the tcpdump data, a data mining algorithm is run on the data. Here the task is to derive association rules from the data: the algorithm takes the data containing the extensive set of features and produces rules. The Apriori association algorithm is chosen for generating the rules. The rules generated by this algorithm are then used to detect intruders. To perform this task, the ID3 algorithm is used; it takes network data as input and compares it with the association rules, labeling events as either normal or attacks.
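As a concrete sketch of the rule-mining step, the following is a minimal Apriori frequent-itemset pass over a handful of hypothetical connection records. The feature names (`service`, `flag`, `src_bytes`) and sample values are illustrative assumptions, not the actual dataset or the exact implementation used in this work:

```python
from itertools import combinations

# Hypothetical connection records: each record is a set of
# (feature, value) pairs derived from preprocessed tcpdump data.
records = [
    {("service", "http"), ("flag", "SF"), ("src_bytes", "low")},
    {("service", "http"), ("flag", "SF"), ("src_bytes", "low")},
    {("service", "telnet"), ("flag", "REJ"), ("src_bytes", "low")},
    {("service", "http"), ("flag", "SF"), ("src_bytes", "high")},
]

def apriori(transactions, min_support):
    """Return all itemsets whose support (fraction of records
    containing them) is at least min_support."""
    n = len(transactions)
    current = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    while current:
        # Count support for each candidate at this level.
        counts = {}
        for cand in current:
            sup = sum(1 for t in transactions if cand <= t) / n
            if sup >= min_support:
                counts[cand] = sup
        frequent.update(counts)
        # Join frequent k-itemsets into candidate (k+1)-itemsets.
        keys = list(counts)
        current = {a | b for a, b in combinations(keys, 2)
                   if len(a | b) == len(a) + 1}
    return frequent

freq = apriori(records, min_support=0.5)
```

Frequent itemsets such as {service=http, flag=SF} would then be turned into association rules and, in the framework described above, matched against incoming traffic.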
1.4 The Challenges

Formulating the classification tasks, i.e., determining the class labels and the set of features, from audit data is a very difficult and time-consuming task. Since security is usually an afterthought of computer system design, there are no standard auditing mechanisms or data formats specifically for intrusion analysis purposes. A considerable amount of data pre-processing, which involves domain knowledge, is required to turn raw "action"-level audit data into higher-level "session/event" records with the set of intrinsic system features. Figure 1.1 shows an example of audit data preprocessing. Here, binary tcpdump data is first converted into ASCII packet-level data, where each line contains the information of one network packet. The data is ordered by the timestamps of the packets; therefore, packets belonging to different connections may be interleaved. For example, the three packets shown in Figure 1.1 are from different connections. The packet data is then processed into connection records with a number of features (i.e., attributes), e.g., time (the starting time of the connection, i.e., the timestamp of its first packet), dur (the duration of the connection), src and dst (the source and destination hosts), bytes (the number of data bytes from source to destination), srv (the service, i.e., port, on the destination), and flag (how the connection conforms to the network protocols, e.g., SF is normal, REJ is "rejected"), etc.
Figure 1.1: Generation of audit data
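To make the preprocessing of Figure 1.1 concrete, the sketch below groups interleaved packet lines into connection records carrying the intrinsic features named above (time, dur, src, dst, srv, bytes, flag). The packet tuple layout and the simplified flag logic are assumptions for illustration, not the real tcpdump output format:

```python
from collections import defaultdict

# Hypothetical ASCII packet lines after tcpdump conversion:
# (timestamp, src, dst, service, data_bytes, tcp_flag)
packets = [
    (0.10, "A", "B", "http", 300, "SYN"),
    (0.12, "C", "B", "telnet", 0, "SYN"),
    (0.35, "A", "B", "http", 200, "FIN"),
    (0.40, "C", "B", "telnet", 0, "RST"),
]

def to_connections(pkts):
    """Group interleaved, timestamp-ordered packets by
    (src, dst, service) into connection records."""
    groups = defaultdict(list)
    for p in sorted(pkts):  # order by timestamp
        groups[(p[1], p[2], p[3])].append(p)
    records = []
    for (src, dst, srv), ps in groups.items():
        # Simplified status: clean FIN close -> "SF", else "REJ".
        flag = "SF" if ps[-1][5] == "FIN" else "REJ"
        records.append({
            "time": ps[0][0],                # timestamp of first packet
            "dur": ps[-1][0] - ps[0][0],     # duration of the connection
            "src": src, "dst": dst, "srv": srv,
            "bytes": sum(p[4] for p in ps),  # data bytes src -> dst
            "flag": flag,
        })
    return records

conns = to_connections(packets)
```

Note how the two interleaved packet streams above collapse into one http record (flag SF) and one telnet record (flag REJ), matching the interleaving problem described in the text.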
These intrinsic features essentially summarize the packet-level information within a connection. There are commonly available programs that can process packet-level data into such connection records for network traffic analysis tasks. However, for intrusion detection, the temporal and statistical characteristics of connections also need to be considered, because of the temporal nature of event sequences in network-based computer systems. For example, a large number of "rejected" connections, i.e., flag = REJ, within a short time frame can be a strong indication of intrusion, because normal connections are rarely rejected.

A critical requirement for using classification rules as an anomaly detector is that we need "sufficient" training data that covers as much variation of the normal behavior as possible, so that the false positive rate is kept low (i.e., we wish to minimize "normal" behavior being detected as abnormal). It is not always possible to formulate a classification model that first learns the anomaly detector from limited ("insufficient") training data and then incrementally updates the classifier using on-line learning algorithms, because the limited training data may not have covered all the class labels, and on-line algorithms generally cannot introduce new class labels. For example, in modeling daily network traffic, we use the services of the connections, e.g., http, telnet, etc., as the class labels in training models. With, say, only one week's traffic data, we may not have connection records of the infrequently used services. A formal audit data gathering process therefore needs to take place first. As we collect audit data, we need an indicator that can tell us whether the new audit data exhibits any "new" normal behavior, so that we can stop the process when there is no more variation. This indicator should be simple to compute and must be incrementally updatable.
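The temporal characteristic mentioned above, a burst of flag = REJ connections within a short time frame, can be computed as a sliding-window count over time-ordered connection records. The record layout and the two-second window below are illustrative assumptions:

```python
import bisect

# Hypothetical time-ordered connection records: (start_time, flag).
conns = [(0.0, "SF"), (1.0, "REJ"), (1.2, "REJ"), (1.4, "REJ"), (9.0, "SF")]

def rej_count_in_window(conns, t, window=2.0):
    """Temporal feature: number of rejected connections in the
    `window` seconds ending at time t (conns must be time-sorted)."""
    times = [c[0] for c in conns]
    lo = bisect.bisect_left(times, t - window)
    hi = bisect.bisect_right(times, t)
    return sum(1 for _, flag in conns[lo:hi] if flag == "REJ")
```

A high value of this feature relative to the normal baseline would flag the burst of rejections that the text identifies as a strong sign of intrusion.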
1.5 Conclusion

This chapter described the background of and motivation for the intrusion detection system. It also introduced data mining and the need for it, and stated the problem addressed by this research.
Chapter 2 LITERATURE SURVEY