Adaptive Intrusion Detection Using Learning Classifiers

Patrick Nicolas

June 21, 2013

patricknicolas.blogspot.com

www.slideshare.net/pnicolas

github.com/prnicolas

The objective of this presentation is to review the different methods for implementing an adaptive intrusion detection system (IDS). The second part of the presentation dives into learning classifier systems, a class of algorithms that can detect, evaluate and act upon a security breach or cyber attack.

Introduction

Data Mining Techniques

Learning Classifier Systems

Context

The effectiveness of an intrusion detection system depends on its adaptability to:

● Ever-changing IT environment

● Evolving internal policies & regulations

● Agile organization & mobile workforce

Data mining is becoming a popular method to extract knowledge from historical data. However, traditional data mining techniques fail to capture the evolutionary nature of an organization, its processes, rules and IT infrastructure.

Data Mining: Overview

Data Mining: Clustering

Unsupervised learning methods such as clustering or spectral analysis have drawbacks:

● Poor classification of mixed variable types

● No descriptive representation

● Limited leverage of domain expertise

● High computational cost to update models

Supervised learning methods can be effective on a large set of historical data but have the following limitations:

● Need for a large training set to alleviate over-fitting

● No descriptive representation

● Limited role for the domain expert

Data Mining: Supervised Learning

Learning Classifier Systems

An evolutionary approach

1. An intrusion detection solution should learn from its suggestions through a process borrowed from human behavior: reward-based learning

2. It should evolve with the system it monitors: Darwinian process

A class of algorithms known as learning classifier systems (LCS), including the extended classifier system (XCS), combines genetic algorithms and reinforcement learning to discover and evolve security policies and rules from real-time data.

Rule-based Learners

● Rule-based representation allows security experts to monitor the evolving knowledge

● Learn from each security event, which makes them well suited to streaming data

● Support various seeding schemes, such as an initial rule set, a training set or clustering

LCS/XCS Benefits

Security rules are used to represent the knowledge of a security expert. For example:

IF num. outbound ftp sessions > 5 THEN cost + 2 (source: KDD Cup 1999 dataset)

These rules are chained to support reasoning about a sequence of events in a data center.

Security rules
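As a minimal sketch (not the author's code), the example rule and a chained variant can be held as plain Python data. The feature names `num_outbound_ftp_sessions` and `num_failed_logins`, the second rule and its cost are hypothetical; chaining is modelled here simply as an AND over conditions checked against one event.

```python
import operator
from dataclasses import dataclass

# Comparison operators a condition may use (an illustrative subset)
OPS = {">": operator.gt, "<": operator.lt, "==": operator.eq}

@dataclass
class Condition:
    op: str        # comparison symbol, a key of OPS
    feature: str   # name of the event attribute (hypothetical naming)
    value: float   # threshold

    def matches(self, event: dict) -> bool:
        # A feature missing from the event is treated as 0
        return OPS[self.op](event.get(self.feature, 0), self.value)

@dataclass
class Rule:
    conditions: list   # chained conditions, combined with AND
    cost: float        # amount added to the threat score when the rule fires

    def fire(self, event: dict, score: float) -> float:
        """Return the updated score: add cost only if every condition holds."""
        if all(c.matches(event) for c in self.conditions):
            return score + self.cost
        return score

# IF num. outbound ftp sessions > 5 THEN cost + 2
ftp_rule = Rule([Condition(">", "num_outbound_ftp_sessions", 5)], cost=2)
# Hypothetical chained rule: IF ftp sessions > 5 AND failed logins > 3 THEN cost + 4
chained = Rule([Condition(">", "num_outbound_ftp_sessions", 5),
                Condition(">", "num_failed_logins", 3)], cost=4)

event = {"num_outbound_ftp_sessions": 7, "num_failed_logins": 1}
score = ftp_rule.fire(event, 0.0)
score = chained.fire(event, score)
print(score)  # 2.0
```

Only the first rule fires here, because the chained rule's second condition fails.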

The rule set needs to adapt constantly to the ever-changing environment and objectives.

Rules Set Evolution

In order to evolve, rules are represented as genes in a genetic algorithm. A gene is implemented as a binary vector in which the state or condition of the rule is expressed as op(x, value) (e.g. x > value). The rule IF op(x, value) THEN f(cost) is translated into the following bit fields:

Rule Encoding

010 1000101 0101101110 01101110100101010

op x values cost or action
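The encoding can be sketched in Python under one assumption: the field widths (3, 7, 10 and 17 bits) are read off the example gene above and are illustrative only. The slide does not say how integer codes map to operators or features, so treating `op=2` as `>` would be a hypothetical convention.

```python
# Field widths taken from the example gene: op | x | value | cost or action
FIELDS = [("op", 3), ("x", 7), ("value", 10), ("cost", 17)]

def encode_rule(op: int, x: int, value: int, cost: int) -> str:
    """Pack the integer fields of IF op(x, value) THEN f(cost) into one bit string."""
    fields = {"op": op, "x": x, "value": value, "cost": cost}
    bits = ""
    for name, width in FIELDS:
        v = fields[name]
        if not 0 <= v < 2 ** width:
            raise ValueError(f"{name}={v} does not fit in {width} bits")
        bits += format(v, f"0{width}b")   # zero-padded binary of fixed width
    return bits

def decode_rule(bits: str) -> dict:
    """Split a gene back into its integer fields."""
    out, pos = {}, 0
    for name, width in FIELDS:
        out[name] = int(bits[pos:pos + width], 2)
        pos += width
    return out

gene = "010" + "1000101" + "0101101110" + "01101110100101010"
fields = decode_rule(gene)
print(fields)  # {'op': 2, 'x': 69, 'value': 366, 'cost': 56618}
assert encode_rule(**fields) == gene   # encoding and decoding round-trip
```

Fixed-width fields keep every gene the same length, which is what lets the genetic operators work position by position.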

As with any rule-based inference engine, encoded rules can be chained by aggregating their binary representations: IF op1(x1, v1) AND op2(x2, v2) THEN f(cost)

Rules Chains & Chromosomes

001 010 1000101 01011110 010 100101 0101101110 01101110100101010

&& op1 x1 v1 op2 x2 v2 cost or action

In terms of evolutionary algorithms, the firing of multiple rules is represented as a sequence of genes, that is, a chromosome
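A possible sketch of chaining, assuming every condition uses the same fixed widths as the single-rule gene (the example above mixes widths) and that `001` is the code for `&&`, placed before the conditions as on the slide. `encode_chain` and `decode_chain` are hypothetical helpers.

```python
# Assumed fixed widths, reusing the single-rule layout: op | x | value | cost
OP_W, X_W, V_W, COST_W = 3, 7, 10, 17
AND_CODE = "001"   # 3-bit code for the && connective

def to_bits(v: int, width: int) -> str:
    """Fixed-width binary string for a non-negative integer."""
    if not 0 <= v < 2 ** width:
        raise ValueError(f"{v} does not fit in {width} bits")
    return format(v, f"0{width}b")

def encode_chain(conditions, cost):
    """Chromosome for IF c1 AND c2 AND ... THEN f(cost).
    conditions is a list of (op, x, value) integer triples."""
    connectives = AND_CODE * (len(conditions) - 1)
    genes = "".join(to_bits(op, OP_W) + to_bits(x, X_W) + to_bits(v, V_W)
                    for op, x, v in conditions)
    return connectives + genes + to_bits(cost, COST_W)

def decode_chain(chromosome):
    """Inverse of encode_chain; the number of conditions follows from the length."""
    cond_w = OP_W + X_W + V_W
    n = (len(chromosome) - COST_W + len(AND_CODE)) // (len(AND_CODE) + cond_w)
    pos = len(AND_CODE) * (n - 1)   # skip the && connectives
    conditions = []
    for _ in range(n):
        op = int(chromosome[pos:pos + OP_W], 2)
        x = int(chromosome[pos + OP_W:pos + OP_W + X_W], 2)
        v = int(chromosome[pos + OP_W + X_W:pos + cond_w], 2)
        conditions.append((op, x, v))
        pos += cond_w
    return conditions, int(chromosome[pos:pos + COST_W], 2)

chromosome = encode_chain([(2, 69, 366), (2, 12, 3)], cost=2)
print(len(chromosome))  # 60
print(decode_chain(chromosome))
```

Deriving the number of conditions from the chromosome length avoids ambiguity when an operator code happens to equal the `&&` code.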

The rule set evolves through the genetic recombination of rules using cross-over, mutation and transposition operations.

Rules Evolutionary Process

[Figure: cross-over operation (parent rules → offspring rules), mutation operation and transposition operation applied to binary-encoded rules]
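The three operators can be illustrated on bit strings. This is a minimal Python sketch: single-point cross-over and per-bit mutation are the textbook forms, while transposition is shown as one common variant (moving a segment elsewhere in the same chromosome), since the slides do not specify which variant is meant.

```python
import random

def crossover(p1: str, p2: str, rng: random.Random):
    """Single-point cross-over: swap the tails of two parent rules after a random cut."""
    if len(p1) != len(p2):
        raise ValueError("parents must have the same length")
    cut = rng.randint(1, len(p1) - 1)
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def mutate(rule: str, rate: float, rng: random.Random) -> str:
    """Flip each bit independently with probability `rate`."""
    return "".join(("1" if b == "0" else "0") if rng.random() < rate else b
                   for b in rule)

def transpose(rule: str, rng: random.Random) -> str:
    """Cut a random non-empty segment and reinsert it at another position."""
    i, j = sorted(rng.sample(range(len(rule) + 1), 2))
    segment, rest = rule[i:j], rule[:i] + rule[j:]
    k = rng.randint(0, len(rest))
    return rest[:k] + segment + rest[k:]

rng = random.Random(42)   # seeded so the run is reproducible
p1 = "0101101011101110101010111010100111"
p2 = "1101010101110101001101010110101110"
c1, c2 = crossover(p1, p2, rng)
m = mutate(c1, rate=0.05, rng=rng)
t = transpose(m, rng)
print(c1, c2, m, t, sep="\n")
```

Cross-over and transposition only rearrange existing bits, so the total number of 1s is preserved; mutation is the only operator that introduces new bit values.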

Rules are selected according to their fitness before being ‘mated’ and mutated. The fitness of a rule represents its contribution to a detection or prevention of an intrusion.

Rules that are repeatedly invoked have the highest fitness values and thrive over time; other rules slowly become irrelevant.

Rules Fitness
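Selecting rules "according to their fitness" is often done with fitness-proportionate (roulette-wheel) selection. The presentation does not name a scheme, so this choice is an assumption. In the sketch below, the rule names and fitness values are made up.

```python
import bisect
import itertools
import random
from collections import Counter

def roulette_select(rules, fitness, rng):
    """Fitness-proportionate selection: a rule with twice the fitness
    is twice as likely to be picked as a parent."""
    total = sum(fitness)
    if total <= 0:
        return rng.choice(rules)          # no signal yet: fall back to a uniform choice
    cumulative = list(itertools.accumulate(fitness))
    pick = rng.random() * total
    index = bisect.bisect_right(cumulative, pick)
    return rules[min(index, len(rules) - 1)]   # guard against float round-off at the top

rng = random.Random(0)
rules = ["ftp_rule", "login_rule", "scan_rule"]   # hypothetical rule names
fitness = [1.0, 3.0, 6.0]                          # made-up fitness values
counts = Counter(roulette_select(rules, fitness, rng) for _ in range(10_000))
print(counts)   # roughly 10% / 30% / 60% of the draws
```

Rules with zero fitness are never picked, which matches the idea that rules that stop contributing slowly drop out of the population.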

Overview Genetic Algorithm

Initial rule set → Encoding → Initial chromosomes → Selection (by fitness) → Cross-over → Mutation → New chromosomes → Decoding → New rule set
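The cycle above can be sketched as a small generational genetic algorithm over bit-string chromosomes. Encoding and decoding are left out, and the fitness function is a toy stand-in (agreement with a hypothetical target bit pattern), since a real IDS would score rules by their contribution to detections. Elitism is an added assumption that keeps the best chromosome of each generation.

```python
import random

def flip(bit: str) -> str:
    return "1" if bit == "0" else "0"

def evolve(population, fitness_fn, generations, rng, p_cross=0.8, p_mut=0.01):
    """Repeat selection -> cross-over -> mutation. The best chromosome of each
    generation is copied unchanged (elitism), so the best fitness never decreases."""
    for _ in range(generations):
        elite = max(population, key=fitness_fn)
        # Selection: fitness-proportionate; the tiny offset keeps weights positive
        weights = [fitness_fn(c) + 1e-9 for c in population]
        parents = rng.choices(population, weights=weights, k=len(population))
        offspring = []
        for a, b in zip(parents[::2], parents[1::2]):
            if rng.random() < p_cross:               # single-point cross-over
                cut = rng.randint(1, len(a) - 1)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            offspring += [a, b]
        if len(parents) % 2:                          # odd population: carry the last parent over
            offspring.append(parents[-1])
        # Mutation: flip each bit with probability p_mut
        population = ["".join(flip(g) if rng.random() < p_mut else g for g in c)
                      for c in offspring]
        population[0] = elite
    return max(population, key=fitness_fn)

rng = random.Random(7)
target = "0101101011101110101010111010100111"   # hypothetical pattern of a known-good rule

def toy_fitness(chromosome):
    return sum(a == b for a, b in zip(chromosome, target))

population = ["".join(rng.choice("01") for _ in range(len(target))) for _ in range(20)]
initial_best = max(toy_fitness(c) for c in population)
best = evolve(population, toy_fitness, generations=50, rng=rng)
print(initial_best, "->", toy_fitness(best))
```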

The rule set is constantly updated by the genetic algorithm so that it keeps identifying intrusions correctly.

The fitness of one or more rules has to be updated according to the state of the infrastructure, organization and policies. The fitness function is updated to give the best possible reward (or credit) to the rules that contribute to the detection of an intrusion.

Rule Fitness & Reward

Reinforcement learning techniques are widely used in robotics. In the context of an IDS, reinforcement learning rewards (or punishes) rules for their contribution (or lack thereof) to identifying threats, taking into account changes in the organization, external accesses and IT infrastructure.

Reinforcement Learning
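One simple way to reward or punish rules is a Widrow-Hoff (delta rule) update toward the observed reward. XCS applies this kind of update to each rule's reward prediction and prediction error and then derives fitness from accuracy. The sketch below keeps only the first two quantities; `RuleStats`, the initial values and the learning rate `beta` are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class RuleStats:
    """Per-rule statistics (hypothetical field names)."""
    prediction: float = 10.0   # estimated reward when the rule fires
    error: float = 0.0         # running estimate of |reward - prediction|

def reinforce(action_set, reward, beta=0.2):
    """Delta-rule update toward the observed reward. A positive reward credits
    rules that helped flag a threat; a negative one punishes false alarms."""
    for r in action_set:
        r.prediction += beta * (reward - r.prediction)
        r.error += beta * (abs(reward - r.prediction) - r.error)

detector, noisy = RuleStats(), RuleStats()
for _ in range(20):
    reinforce([detector], reward=100.0)   # repeatedly helps detect intrusions
    reinforce([noisy], reward=-50.0)      # repeatedly raises false alarms
print(round(detector.prediction, 1), round(noisy.prediction, 1))
```

The learning rate trades responsiveness for stability: a larger `beta` tracks changes in the organization faster but reacts more to noisy rewards.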

Evolutionary Security Rules

1. Process new data or a new event from the system
2. Find the security-related rule(s) whose condition matches the event
3. Create a new rule if none match (covering)
4. Fire the fittest rules with the highest predicted outcome.
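Steps 1 to 4 can be sketched in Python for a single event. The rule layout, the actions `alert` and `allow`, and the default prediction and fitness values are hypothetical. Step 4 is modelled on XCS's prediction array (a fitness-weighted average prediction per action), which is one reading of "fire the fittest rules with the highest predicted outcome". Because the features are integer counts, the slide's `> 5` condition is written as `>= 6`.

```python
import operator
import random
from dataclasses import dataclass

OPS = {">=": operator.ge, "<=": operator.le}
ACTIONS = ["alert", "allow"]          # hypothetical actions a rule can trigger

@dataclass
class Rule:
    feature: str
    op: str
    value: float
    action: str
    prediction: float = 10.0          # predicted outcome (reward) if the rule fires
    fitness: float = 0.1

    def matches(self, event: dict) -> bool:
        return self.feature in event and OPS[self.op](event[self.feature], self.value)

def cover(event: dict, rng: random.Random) -> Rule:
    """Step 3 (covering): a new rule whose condition matches this event by construction."""
    feature = rng.choice(sorted(event))
    return Rule(feature, rng.choice(sorted(OPS)), event[feature], rng.choice(ACTIONS))

def process_event(rules: list, event: dict, rng: random.Random):
    """Steps 1-4 for one event; returns the chosen action and the rules that fired."""
    match_set = [r for r in rules if r.matches(event)]      # step 2: matching
    if not match_set:                                       # step 3: covering
        match_set = [cover(event, rng)]
        rules.append(match_set[0])
    # Step 4: fitness-weighted predicted outcome per action; fire the best one
    predicted = {}
    for action in sorted({r.action for r in match_set}):
        advocates = [r for r in match_set if r.action == action]
        weight = sum(r.fitness for r in advocates) or 1e-12
        predicted[action] = sum(r.prediction * r.fitness for r in advocates) / weight
    best = max(predicted, key=predicted.get)
    return best, [r for r in match_set if r.action == best]

rng = random.Random(1)
rules = [Rule("num_outbound_ftp_sessions", ">=", 6, "alert", prediction=80.0, fitness=0.9),
         Rule("num_outbound_ftp_sessions", ">=", 6, "allow", prediction=20.0, fitness=0.5)]
action1, fired1 = process_event(rules, {"num_outbound_ftp_sessions": 7}, rng)
print(action1)      # alert
action2, fired2 = process_event(rules, {"duration": 3}, rng)   # no match, so covering
print(len(rules))   # 3
```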

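[Diagram: real-time data from the data center and cloud flows into the IDS through matching and the threat predictor; fitness update (reward, step 6) and the genetic algorithm (evolution of rules, step 7) produce new rules; a threats monitor reports the threat level and state]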


5. Process the new state of the system
6. Reward the contributing (matching) rules by updating their fitness
7. The genetic algorithm updates the existing population of security rules through reproduction and mutation.
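Steps 6 and 7 can be sketched as follows, again as an illustration under stated assumptions: the reward is scaled to [0, 1], the genetic algorithm runs in steady-state mode on the rules that fired (the action set), and offspring start with their parents' mean fitness. Step 5, observing the new state of the system, is assumed to have produced the reward.

```python
import random
from dataclasses import dataclass

@dataclass
class Rule:
    genes: str            # binary-encoded rule, as on the encoding slides
    fitness: float = 0.1

def reward_rules(action_set, reward, beta=0.2):
    """Step 6: move the fitness of every contributing rule toward the reward.
    reward is assumed to lie in [0, 1] (1 = correct detection, 0 = false alarm)."""
    for r in action_set:
        r.fitness += beta * (reward - r.fitness)

def flip(bit):
    return "1" if bit == "0" else "0"

def evolve_step(population, action_set, rng, p_mut=0.02):
    """Step 7: steady-state GA. Two parents drawn from the action set in proportion
    to fitness produce two offspring, which replace the two weakest rules."""
    a, b = rng.choices(action_set, weights=[r.fitness + 1e-9 for r in action_set], k=2)
    cut = rng.randint(1, len(a.genes) - 1)
    children = [a.genes[:cut] + b.genes[cut:], b.genes[:cut] + a.genes[cut:]]
    children = ["".join(flip(g) if rng.random() < p_mut else g for g in c) for c in children]
    inherited = (a.fitness + b.fitness) / 2     # simplifying assumption for offspring fitness
    population.sort(key=lambda r: r.fitness)    # weakest rules first (stable sort)
    population[:2] = [Rule(c, inherited) for c in children]

rng = random.Random(3)
population = [Rule("".join(rng.choice("01") for _ in range(12))) for _ in range(6)]
action_set = population[:2]            # the rules that fired on the last event
reward_rules(action_set, reward=1.0)   # they contributed to a correct detection
evolve_step(population, action_set, rng)
print([round(r.fitness, 2) for r in population])  # [0.28, 0.28, 0.1, 0.1, 0.28, 0.28]
```

Replacing only the weakest rules keeps the population size fixed, so rules that never contribute are gradually pushed out.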


By combining evolutionary algorithms with reinforcement learning, rule-based learners such as learning classifier systems allow security policies and constraints to adapt to any change in the environment or data center, and therefore to stay a step ahead of ever-changing threats.

Conclusion

