2005-12-13 Y. Tan---Artificial Immune Sys. 1
Artificial Immune System and Its Applications
Prof. Ying TAN
National Laboratory on Machine Perception, Department of Intelligence Science
Peking University, Beijing 100871, P.R.China
Contents
• Biological Immune System
• Artificial Immune System
• Basic Algorithms of AIS
• AIS Design Procedure
• Case Studies
  – Malicious Executable Detection
  – Film Recommender
• New: Immunocomputing (IC); Danger Theory
• Future
The Immune System is…
Immunity: state or quality of being resistant (immune), either by virtue of previous exposure (adaptive immunity) or as an inherited trait (innate immunity)
Immune system: a system that protects the body from foreign substances and pathogenic organisms by producing the immune response
Why the Immune System?
The immune system has the following appealing features:
• Recognition
  – Anomaly detection
  – Noise tolerance
• Robustness
• Feature extraction
• Diversity
• Reinforcement learning
• Memory
• Dynamically changing coverage
• Distributed
• Multi-layered
• Adaptive
Role of Biological Immune System
• Protect our bodies from pathogens, including viruses
• Primary immune response
  – Launch a response to invading pathogens
• Secondary immune response
  – Remember past encounters
  – Respond faster the second time around
Immune cells
• There are two primary types of lymphocytes:
  – B-lymphocytes (B cells)
  – T-lymphocytes (T cells)
• Other components include macrophages and other phagocytic cells, as well as signaling molecules such as cytokines.
Where is it?
[Figure: the lymphoid organs (lymphatic vessels, lymph nodes, thymus, spleen, tonsils and adenoids, bone marrow, appendix, Peyer’s patches), grouped into primary and secondary lymphoid organs]
Multiple layers of the immune system
[Figure: the layers a pathogen must pass: skin, biochemical barriers, the innate immune response (phagocytes), and the adaptive immune response (lymphocytes)]
Antigen
• Substances capable of triggering a specific immune response are commonly referred to as antigens.
• These include some pathogens, such as viruses, bacteria, and fungi.
Biological Immune System
[Figure: innate vs. acquired immunity; cell-mediated vs. humoral responses; helper T cells, killer T cells, and B cells, which secrete antibodies]
How does the IS work? A simplistic view
[Figure: steps (I) to (VII): an antigen-presenting cell (APC) displays antigen peptides on MHC proteins; a T cell recognizes them and becomes activated; the activated T cell releases lymphokines that activate a B cell, which becomes a plasma cell]
Self/Non-Self Recognition
• The immune system needs to be able to differentiate between self and non-self cells.
• Antigenic encounters may result in cell death; therefore there is
  – some kind of positive selection, and
  – some element of negative selection.
Immune Pattern Recognition
• Immune recognition is based on the complementarity between the binding region of the receptor and a portion of the antigen called the epitope.
• Antibodies present a single type of receptor, whereas antigens may present several epitopes.
  – This means that each antibody recognizes a single antigen (epitope), while an antigen with several epitopes can be recognized by several different antibodies.
[Figure: a B cell whose receptors (BCR, or antibodies) bind the epitopes of an antigen]
Clonal Selection
[Figure: clonal selection. Lymphocytes that recognize self-antigens are removed by clonal deletion (negative selection); a lymphocyte selected by a foreign antigen proliferates (cloning) and differentiates into antibody-secreting plasma cells and memory cells (M)]
Main Properties of Clonal Selection (Burnet, 1978)
• Elimination of lymphocytes that react to self-antigens
• Proliferation and differentiation of mature lymphocytes on contact with antigen
• Restriction of one pattern to one differentiated cell, and retention of that pattern by clonal descendants
• Generation of new random genetic changes, subsequently expressed as diverse antibody patterns by a form of accelerated somatic mutation
Immune Network Theory
• Idiotypic network (Jerne, 1974)
• B cells co-stimulate each other
  – They treat each other a bit like antigens
• Creates an immunological memory
[Figure: antibodies 1, 2, and 3 and an antigen (Ag) interacting through paratopes and idiotopes; recognition leads either to activation (positive response) or to suppression (negative response)]
Reinforcement Learning and Immune Memory
• Repeated exposure to an antigen throughout a lifetime
• Primary and secondary immune responses
• Remembers encounters
  – No need to start from scratch
  – Memory cells
• Continuous learning
Learning (2)
[Figure: antibody concentration over time. The primary response to Ag1 follows a lag. On a second exposure (Ag1, Ag2), the secondary response to Ag1 is faster, while the new antigen Ag2 triggers a primary response with its own lag. A cross-reactive response to Ag1′ = Ag1 + Ag3 benefits from the memory of Ag1]
Immune System: Summary
• Distinguish the host (body cells) from external entities.
• When an entity is recognized as foreign (or dangerous), activate several defense mechanisms leading to its destruction (or neutralization).
• Subsequent exposure to a similar entity results in a rapid immune response.
• The overall behavior of the immune system is an emergent property of many local interactions.
Immune metaphors
[Figure: ideas taken from the immune system are carried over into artificial immune systems and into other areas]
What is an Artificial Immune System?
Dasgupta’99: “Artificial immune systems (AIS) are intelligent and adaptive systems inspired by the immune system toward real-world problem solving”
de Castro and Timmis: “Artificial Immune Systems (AIS) are adaptive systems, inspired by theoretical immunology and observed immune functions, principles and models, which are applied to problem solving” http://www.cs.kent.ac.uk/people/staff/jt6/aisbook/
Definition
• Using the natural immune system as a metaphor for solving complex computational problems
• Not modelling the immune system
AI models and their corresponding natural prototypes
AI model | Biological level | Natural prototype
Genetic Algorithms (GA) | Molecular | Genetic code
Artificial immune systems (AIS) | Molecular | Molecules of proteins
Cellular automata (CA) | Cells | Biological cells
Neural computing (NC), Neural networks (NN) | Cells | Brain nervous net
Formal logic, Formal linguistics | Left hemisphere of brain | Natural language
Some History
• Developed from the field of theoretical immunology in the mid-1980s
  – Suggested we ‘might look’ at the IS
• 1990: Bersini’s first use of immune algorithms to solve problems
• Forrest et al.: computer security, mid-1990s
• Hunt et al.: machine learning, mid-1990s
• More…
AIS’ Scope
• Pattern recognition
• Fault and anomaly detection
• Data analysis
• Data mining (classification/clustering)
• Agent-based systems
• Scheduling
• Machine learning
• Autonomous navigation and control
• Search and optimization methods
• Artificial life
• Security of information systems
• Optimization
• Just to name a few.
Typical Applications of AIS
• Computer Security (Forrest’94’96’98, Kephart’94, Lamont’98’01’02, Dasgupta’99’01, Bentley’00’01’02)
• Anomaly Detection (Dasgupta’96’01’02)
• Fault Diagnosis (Ishida’92’93, Ishiguro’94)
• Data Mining & Retrieval (Hunt’95’96, Timmis’99’01’02)
• Pattern Recognition (Forrest’93, Gibert’94, de Castro’02)
• Adaptive Control (Bersini’91)
• Job Shop Scheduling (Hart’98’01’02)
• Chemical Pattern Recognition (Dasgupta’99)
• Robotics (Ishiguro’96’97, Singh’01)
• Optimization (de Castro’99’02, Endo’98)
• Web Mining (Nasraoui’02, Secker’05)
• Fault Tolerance (Tyrrell’01’02, Timmis’02)
• Autonomous Systems (Varela’92, Ishiguro’96)
• Engineering Design Optimization (Hajela’96’98, Nunes’00)
Basic Immune Models and Algorithms
• Bone Marrow Models
• Negative Selection Algorithms
• Clonal Selection Algorithm
• Immune Network Models
• Somatic Hypermutation
Bone Marrow Models
• Gene libraries are used to create antibodies from the bone marrow.
• Antibodies are produced through a random concatenation of segments from the gene libraries.
• Libraries can be simple or complex.
[Figure: an individual genome corresponds to four libraries (Library 1 to Library 4) holding segments A1 to A8, B1 to B8, C1 to C8, and D1 to D8. Picking one segment per library (e.g. A3, B2, C8, D5) gives four 16-bit segments, which are concatenated into a 64-bit chain, the expressed Ab molecule]
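The library-based construction above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions. There are four libraries of eight segments each, and every segment is a random 16-bit string. The segment contents and the helper names `make_libraries` and `express_antibody` are hypothetical. The sketch picks one segment per library and concatenates the four into a 64-bit antibody chain.

```python
import random

N_LIBRARIES = 4        # four gene libraries, as on the slide
SEGMENTS_PER_LIB = 8   # e.g. A1..A8, B1..B8, C1..C8, D1..D8
SEGMENT_BITS = 16      # each segment is a 16-bit string


def make_libraries(rng):
    """Hypothetical libraries: every segment is a random 16-bit string."""
    return [
        [format(rng.getrandbits(SEGMENT_BITS), f"0{SEGMENT_BITS}b")
         for _ in range(SEGMENTS_PER_LIB)]
        for _ in range(N_LIBRARIES)
    ]


def express_antibody(libraries, rng):
    """Pick one segment at random from each library and concatenate them."""
    chosen = [rng.randrange(len(lib)) for lib in libraries]
    chain = "".join(lib[idx] for lib, idx in zip(libraries, chosen))
    return chosen, chain


rng = random.Random(42)
libs = make_libraries(rng)
indices, antibody = express_antibody(libs, rng)
print(indices, len(antibody))   # four segment indices and a 64-bit chain
```

Richer libraries (more segments, or several libraries per locus) only change the two constants. The random concatenation step stays the same.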
Negative Selection (NS) Algorithms
• Forrest 1994: idea taken from the negative selection of T cells in the thymus
• Applied initially to computer security
• Split into two parts:
  – Censoring
  – Monitoring
[Figure: Censoring: random strings (R0) are generated and matched against the self strings (S); those that match are rejected, the rest form the detector set (R). Monitoring: the protected strings (S) are matched against the detector set (R); a match means non-self has been detected]
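The two phases can be sketched in Python as follows. As an assumption, the sketch uses the r-contiguous-bits matching rule, a common choice in Forrest-style negative selection: two strings match if they agree in at least r consecutive positions. The string length, r, and the function names are illustrative only.

```python
import random


def r_contiguous_match(a, b, r):
    """True if equal-length strings a and b agree in at least r contiguous positions."""
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        if run >= r:
            return True
    return False


def censor(self_set, n_detectors, length, r, rng, max_tries=100_000):
    """Censoring: keep random candidates (R0) that match no self string (S)."""
    detectors = []
    for _ in range(max_tries):
        if len(detectors) == n_detectors:
            break
        candidate = "".join(rng.choice("01") for _ in range(length))
        if not any(r_contiguous_match(candidate, s, r) for s in self_set):
            detectors.append(candidate)   # survives censoring, joins R
    return detectors


def monitor(protected, detectors, r):
    """Monitoring: return protected strings matched by any detector (non-self)."""
    return [s for s in protected
            if any(r_contiguous_match(s, d, r) for d in detectors)]


rng = random.Random(0)
self_set = ["00000000", "00001111"]
detectors = censor(self_set, n_detectors=5, length=8, r=4, rng=rng)
alarms = monitor(["00000000", "11110000"], detectors, r=4)
```

Self strings are never flagged, because every detector was censored against them. Whether `11110000` raises an alarm depends on which random detectors survived censoring.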
Clonal Selection Algorithm (de Castro & von Zuben, 2001)
1. Initialisation: randomly initialise a population (P).
2. Antigenic presentation: for each pattern in Ag, do:
  2.1 Antigenic binding: determine the affinity of each member of P to the pattern.
  2.2 Affinity maturation: select the n highest-affinity members of P, clone them in proportion to their affinity with Ag, mutate the clones inversely to that affinity, and add the new mutants to P.
3. Metadynamics:
  3.1 Select the highest-affinity member of P to form part of the memory set M.
  3.2 Replace n of the lowest-affinity members of P with randomly generated new ones.
4. Cycle: repeat steps 2 and 3 until a stopping criterion is met (e.g. a maximum number of generations).
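A compact Python sketch of the steps above, applied to binary pattern recognition. It rests on several assumptions:
• Antibodies are bit strings, and affinity is the number of matching bits.
• A cell at rank i gets round(β·N/i) clones.
• The mutation rate is exp(−ρ·f*), where f* is the normalized affinity. This follows de Castro & Von Zuben's formulation as we understand it.

The parameter values are arbitrary. To keep P at a fixed size, the sketch truncates the population before step 3.2, which is a simplification.

```python
import math
import random

L = 16  # bit-string length (an assumption for this sketch)


def affinity(ab, ag):
    """Affinity = number of matching bits, i.e. L minus the Hamming distance."""
    return sum(a == g for a, g in zip(ab, ag))


def random_string(rng):
    return [rng.randint(0, 1) for _ in range(L)]


def mutate(ab, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in ab]


def clonalg(antigens, pop_size=20, n_select=5, beta=1.0, rho=5.0,
            n_replace=2, generations=50, seed=0):
    rng = random.Random(seed)
    population = [random_string(rng) for _ in range(pop_size)]   # 1. Initialisation
    memory = [None] * len(antigens)                               # memory set M
    for _ in range(generations):                                  # 4. Cycle
        for j, ag in enumerate(antigens):                         # 2. Antigenic presentation
            # 2.1 Antigenic binding: rank P by affinity to this antigen
            ranked = sorted(population, key=lambda ab: affinity(ab, ag), reverse=True)
            # 2.2 Affinity maturation: more clones for better ranks,
            #     lower mutation rate for higher normalized affinity
            clones = []
            for rank, ab in enumerate(ranked[:n_select], start=1):
                n_clones = max(1, round(beta * pop_size / rank))
                rate = math.exp(-rho * affinity(ab, ag) / L)
                clones += [mutate(ab, rate, rng) for _ in range(n_clones)]
            population += clones
            # 3.1 The best cell for this antigen becomes (or replaces) its memory cell
            best = max(population, key=lambda ab: affinity(ab, ag))
            if memory[j] is None or affinity(best, ag) > affinity(memory[j], ag):
                memory[j] = best[:]
            # Keep P at a fixed size, then 3.2 replace the worst cells with random ones
            population.sort(key=lambda ab: affinity(ab, ag), reverse=True)
            population = population[:pop_size - n_replace]
            population += [random_string(rng) for _ in range(n_replace)]
    return memory


target = [1, 0] * 8
memory = clonalg([target])
print(memory[0], affinity(memory[0], target))
```

Because the best cells survive each truncation, the memory affinity for each antigen can only stay the same or increase from one generation to the next.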
CLONALG for PR, Learning, Optimization
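[Figure: CLONALG block diagram. The repertoire (memory Ab{m} plus remaining Ab{r}) is presented with antigen Agj, giving affinities fj. The n best, Ab{n}, are selected and cloned (Cj), then matured (Cj*, with affinities fj*). The best Abj* is selected into memory, and d new antibodies Ab{d} replace low-affinity ones]
L.N. de Castro et al., Learning and optimization using the clonal selection principle, IEEE Trans. Evolutionary Computation, vol. 6, no. 3, June 2002, pp. 239-251.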
Discrete Immune Network Models (Timmis & Neal, 2001)
1. Initialisation: create an initial network from a sub-section of the antigens.
2. Antigenic presentation: for each antigenic pattern, do:
  2.1 Clonal selection and network interactions: for each network cell, determine its stimulation level (based on antigenic and network interactions).
  2.2 Metadynamics: eliminate network cells with low stimulation.
  2.3 Clonal expansion: select the most stimulated network cells and reproduce them in proportion to their stimulation.
  2.4 Somatic hypermutation: mutate each clone.
  2.5 Network construction: select mutated clones and integrate them into the network.
3. Cycle: repeat step 2 until the termination condition is met.
Immune Network Models
• Timmis & Neal, 2000
• Used immune network theory as a basis and proposed the AINE algorithm, in which each ARB (artificial recognition ball) represents a group of identical B cells:
  Initialize AIN
  For each antigen
    Present antigen to each ARB in the AIN
    Calculate ARB stimulation level
    Allocate B cells to ARBs, based on stimulation level
    Remove weakest ARBs (ones that do not hold any B cells)
  If termination condition met
    exit
  else
    Clone and mutate remaining ARBs
    Integrate new ARBs into AIN
Immune Network Models
• de Castro & Von Zuben (2000c)
• aiNet, based on similar principles:
  At each iteration step do
    For each antigen do
      Determine affinity to all network cells
      Select n highest-affinity network cells
      Clone these n selected cells
      Increase the affinity of the clones to the antigen by reducing the distance between them (greedy search)
      Calculate the improved affinity of these n cells
      Re-select a number of improved cells and place them into matrix M
      Remove cells from M whose affinity is below a set threshold
      Calculate cell-cell affinity within the network
      Remove cells from the network whose affinity is below a certain threshold
      Concatenate the original network and M to form a new network
    Determine whole-network inter-cell affinities and remove all those below the set threshold
    Replace r% of the worst individuals by novel randomly generated ones
    Test stopping criterion
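To make the selection and suppression steps concrete, here is a simplified single-antigen step in Python. It works on real-valued 2-D cells with Euclidean distance. It is a sketch under stated assumptions, not de Castro & Von Zuben's implementation:
• Affinity is taken as inverse distance.
• Each clone moves toward the antigen by a random fraction of the remaining distance (the greedy search).
• Parameter names and values (n, n_clones, keep_frac, sigma_d, sigma_s) are hypothetical.
• The r% replacement step is omitted.

```python
import math
import random


def suppress(cells, sigma_s):
    """Keep a cell only if no already-kept cell lies within sigma_s of it."""
    kept = []
    for c in cells:
        if all(math.dist(c, k) >= sigma_s for k in kept):
            kept.append(c)
    return kept


def ainet_step(network, ag, rng, n=3, n_clones=5, keep_frac=0.4,
               sigma_d=1.0, sigma_s=0.1):
    """Present one antigen `ag` to the network (a list of coordinate tuples)."""
    # Select the n cells closest to (i.e. with highest affinity for) the antigen
    best = sorted(network, key=lambda c: math.dist(c, ag))[:n]
    # Clone them and move each clone a random fraction of the way toward ag
    clones = []
    for c in best:
        for _ in range(n_clones):
            step = rng.random()
            clones.append(tuple(ci + step * (ai - ci) for ci, ai in zip(c, ag)))
    # Re-select the best fraction of the improved clones into M
    clones.sort(key=lambda c: math.dist(c, ag))
    m = clones[:max(1, int(keep_frac * len(clones)))]
    # Remove cells of M that are still farther than sigma_d from the antigen
    m = [c for c in m if math.dist(c, ag) <= sigma_d]
    # Suppression within M, then concatenate and apply network suppression
    return suppress(network + suppress(m, sigma_s), sigma_s)


rng = random.Random(1)
network = [(0.0, 0.0), (0.05, 0.0), (2.0, 2.0)]
new_network = ainet_step(network, ag=(0.5, 0.5), rng=rng)
```

After the step, no two cells in the network lie closer than `sigma_s`. For example, the original cell `(0.05, 0.0)` is suppressed as redundant with `(0.0, 0.0)`.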
Somatic Hypermutation
• Mutation rate decreases as affinity increases
• Very controlled mutation in the natural immune system
• Trade-off between the normalized antibody affinity D* and its mutation rate α:
[Figure: mutation rate α (from 0 to 1) versus normalized affinity D* (from 0 to 1), with one curve each for ρ = 5, 10, and 20]
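The plotted curves are consistent with the exponential rule α = exp(−ρ·D*) used in de Castro & Von Zuben's CLONALG, where D* ∈ [0, 1] is the normalized affinity. We assume that rule here. A quick Python check shows how a larger ρ makes the mutation rate fall off faster:

```python
import math


def mutation_rate(d_star, rho):
    """Mutation rate alpha = exp(-rho * d_star) for normalized affinity d_star."""
    if not 0.0 <= d_star <= 1.0:
        raise ValueError("d_star must be normalized to [0, 1]")
    return math.exp(-rho * d_star)


# Higher rho -> the mutation rate falls off faster as affinity grows
for rho in (5, 10, 20):
    print(rho, [round(mutation_rate(d, rho), 3) for d in (0.0, 0.1, 0.5, 1.0)])
```

For ρ = 5 the first printed line is `5 [1.0, 0.607, 0.082, 0.007]`. Low-affinity antibodies mutate heavily, and near-perfect ones are left almost unchanged.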
General Framework of AIS
[Figure: layered framework: Problem Application Domain → Representation → Affinity Measures → Immune Algorithms → Solution]
Representation – Shape Space
• Describe the general shape of a molecule
[Figure: complementary shapes of an antibody and an antigen]
• Describe interactions between molecules
• Degree of binding between molecules
Representation
• Vectors: Ab = ⟨Ab_1, Ab_2, ..., Ab_L⟩, Ag = ⟨Ag_1, Ag_2, ..., Ag_L⟩
• Real-valued shape-space
• Integer shape-space
• Binary shape-space
• Symbolic shape-space
Define their Interaction
• Define the term affinity
• Affinity is related to distance
  – Euclidean: $D = \sqrt{\sum_{i=1}^{L} (Ab_i - Ag_i)^2}$
• Other distance measures, such as Hamming and Manhattan distances
• Affinity threshold
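The distance measures above translate directly into Python. In this sketch the recognition rule, where an antibody recognizes an antigen if their distance is within the affinity threshold, is an assumption. In a complementary shape space one would instead treat a large distance as high affinity.

```python
import math


def _check(ab, ag):
    if len(ab) != len(ag):
        raise ValueError("Ab and Ag must have the same length L")


def euclidean(ab, ag):
    _check(ab, ag)
    return math.sqrt(sum((a - g) ** 2 for a, g in zip(ab, ag)))


def manhattan(ab, ag):
    _check(ab, ag)
    return sum(abs(a - g) for a, g in zip(ab, ag))


def hamming(ab, ag):
    """Number of positions where the two strings differ (binary/symbolic spaces)."""
    _check(ab, ag)
    return sum(a != g for a, g in zip(ab, ag))


def recognizes(ab, ag, distance, threshold):
    """Hypothetical rule: Ab recognizes Ag if their distance is within the threshold."""
    return distance(ab, ag) <= threshold


print(euclidean((0, 0), (3, 4)), manhattan((0, 0), (3, 4)), hamming("1010", "1001"))
```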
Shape Space Formalism
• The repertoire of the immune system is complete (Perelson, 1989)
• Extensive regions of complementarity
• Some threshold of recognition
[Figure: shape space V; each antibody recognizes the antigens (×) that fall inside its recognition region Vε of radius ε]
AIS Design
• Problem description
• Deciding the immune principles used for problem solving
• Engineering the AIS
  – Defining the types of immune components used
  – Defining the representation for the elements of the AIS
  – Applying immune principles to problem solving
  – The meta-dynamics of an AIS
• Reverse mapping from the AIS to the real problem
Case Studies of AIS
• Malicious Executables Detection, from Z.H. Guo, Z.K. Liu, and Y. Tan, An NN-based Malicious Executables Detection Algorithm based on Immune Principles, F. Yin, J. Wang, C. Guo (Eds.): ISNN 2004, Springer, Lecture Notes in Computer Science 3174, pp. 675-680, 2004. (http://dblp.uni-trier.de)
• Film Recommender, from Dr. Uwe Aickelin (http://www.aickelin.com), University of Nottingham, U.K., 2004
Immunocomputing (IC), by Tarakanov, A., 2001
Aims:
• A proper mathematical framework;
• A new kind of computing;
• A new kind of hardware.
New concepts:
• Formal protein (FP), the counterpart of the neuron
• Formal immune network (FIN), the counterpart of the NN
Refer to: A.O. Tarakanov, V.A. Skormin, and S.P. Sokolova, Immunocomputing: Principles and Applications, Springer, 2003.
Problems of Traditional Self/Non-self View
• No reaction to foreign bacteria in gut (friendly bacteria…).
• No reaction to food / air / etc.• The human body changes over its life.• Auto-immune diseases.• How do we produce antibodies that react against
antigens and yet avoid self?• Is it necessary to attack all non-self or a specific self?
The Danger Theory
• In the danger model, the idea is to recognise ‘danger’ rather than non-self.
• The screening is accomplished post production through an external ‘danger’ signal. Thus the production of autoreactive antibodies (which react to self) is allowed.
• If an (e.g. autoreactive) antibody matches a stimulus in the absence of danger, it is removed. Thus harmless antigens are tolerated, and changing self accommodated.
Matzinger (2002). The Danger Model: A renewed sense of self , Science 296: 301-304.
Danger Theory (cont’d)
• Danger Theory
  – Not self/non-self but danger/non-danger
  – The immune response is initiated in the tissues: the danger zone
  – This makes it context dependent
• Matzinger (2002), The Danger Model: A renewed sense of self, Science 296: 301-304
• Aickelin & Cayzer (2002), The Danger Theory and Its Application to Artificial Immune Systems, Proc. International Conference on AIS (ICARIS 2002)
Danger Zone
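[Figure: the danger zone. A damaged cell emits a danger signal that defines a danger zone around it. An antibody that matches an antigen inside the zone is stimulated; a match too far away, or no match at all, produces no stimulation]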
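A toy Python sketch of the danger-zone idea. An antibody that matches an antigen is stimulated only if the antigen lies within a given radius of a danger signal, such as one emitted by a damaged cell. A match outside every zone is tolerated. Several choices here are assumptions made for illustration: exact pattern matching, one-dimensional positions standing for space or time, and the names `Antigen` and `respond`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Antigen:
    pattern: str
    position: float  # location in space or time (hypothetical 1-D coordinate)


def respond(antibodies, antigens, danger_positions, zone_radius):
    """Split matched antigens into stimulated (inside a danger zone) and tolerated."""
    stimulated, tolerated = [], []
    for ag in antigens:
        if ag.pattern not in antibodies:
            continue  # no match: the antigen is ignored
        in_zone = any(abs(ag.position - d) <= zone_radius for d in danger_positions)
        (stimulated if in_zone else tolerated).append(ag)
    return stimulated, tolerated


antigens = [Antigen("abc", 1.0), Antigen("abc", 10.0), Antigen("xyz", 1.0)]
stim, tol = respond({"abc"}, antigens, danger_positions=[0.5], zone_radius=1.0)
print(stim, tol)
```

The same matched pattern `abc` is stimulated near the danger signal and tolerated far from it. This is how the danger model accommodates harmless antigens and a changing self.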
Towards a ‘dangerous’ IDS
Aickelin U, Bentley P, Cayzer S, Kim J and McLeod J (2003): 'Danger Theory: The Link between AIS and IDS?', Proceedings ICARIS-2003, 2nd International Conference on Artificial Immune Systems, LNCS 2787, pp 147-155
“The danger theory suggests that the immune system reacts to threats based on the correlation of various (danger) signals, providing a method of ‘grounding’ the immune response, i.e. linking it directly to the attacker.”
Other ways of using danger
• Danger = crime, antigen = suspect
• or... danger = context?
• It could also be useful for data mining, where the ‘danger’ signal is a proxy measure of interest.
• The ‘danger zone’ can be spatial or temporal.
Andrew Secker, Alex Freitas, and Jon Timmis (2005), “Towards a danger theory inspired artificial immune system for web mining”, in A. Scime (ed.), Web Mining: Applications and Techniques, pp. 145-168 (Idea Group).
Some Recent Applications of Danger Theory
• Anjum Iqbal and Mohd Aizaini Maarof, “Danger Theory and Intelligent Data Processing,” International Journal of Information Technology, Vol. 1, No. 1, 2004.
• Andrew Secker, Alex A. Freitas, and Jon Timmis, “A Danger Theory Inspired Approach to Web Mining,” Computing Laboratory, University of Kent, Canterbury, Kent, UK, 2005.
• And so on.
The Future
• More formal approach required?
• Wide possible application domains.
• What makes the immune system unique?
• More work with immunologists:
  – Danger theory
  – Idiotypic networks
  – Self-assertion
Reference for further reading
Books
• D. Dasgupta (Ed.), Artificial Immune Systems and Their Applications, Springer Verlag, January 1999.
• L.N. de Castro and J. Timmis, Artificial Immune Systems: A New Computational Intelligence Approach, Springer, 2002.
• A.O. Tarakanov, V.A. Skormin, and S.P. Sokolova, Immunocomputing: Principles and Applications, Springer, 2003.
Related academic papers
• J. Timmis, P. Bentley, and E. Hart (Eds.): Artificial Immune Systems, Proceedings of the Second International Conference, ICARIS 2003, Edinburgh, UK, September 2003. LNCS 2787, Springer.
New Events:
• Special Session on Artificial Immune Systems at the Congress on Evolutionary Computation (CEC), December 8-12, 2003, Canberra, Australia.
• Special Session on Immunity-Based Systems at Seventh International Conference on Knowledge-Based Intelligent Information & Engineering Systems (KES), September 3-5, 2003, University of Oxford, UK.
• Second International Conference on Artificial Immune Systems (ICARIS), September 1-3, 2003, Napier University, Edinburgh, UK.
• Tutorial on Artificial Immune Systems at 1st Multidisciplinary International Conference on Scheduling: Theory and Applications (MISTA), 12 August 2003, The University of Nottingham, UK.
• Tutorial on Immunological Computation at International Joint Conference on Artificial Intelligence (IJCAI), August 10, 2003, Acapulco, Mexico.
• Special Track on Artificial Immune Systems at Genetic and Evolutionary Computation Conference (GECCO), Chicago, USA, July 12-16, 2003
AIS Resources
• Artificial Immune Systems and Their Applications by D. Dasgupta (Editor), Springer Verlag, 1999.
• Artificial Immune Systems: A New Computational Intelligence Approach by L. de Castro and J. Timmis, Springer Verlag, 2002.
• Immunocomputing: Principles and Applications by A. Tarakanov et al., Springer Verlag, 2003.
• Third International Conference on Artificial Immune Systems (ICARIS), September 13-16, 2004, University of Catania, Italy.
• 4th International Conference on Artificial Immune Systems (ICARIS), 14th-17th August 2005, Banff, Alberta, Canada.
That’s all
Case Study 1:
Malicious Executables Detection Based on Artificial Immune Principles*
* This work was supported by the Natural Science Foundation of China under Grant No. 60273100.
From Z.H. Guo, Z.K. Liu, and Y. Tan, An NN-based Malicious Executables Detection Algorithm based on Immune Principles, F. Yin, J. Wang, C. Guo (Eds.): ISNN 2004, Springer, Lecture Notes in Computer Science 3174, pp. 675-680, 2004. (http://dblp.uni-trier.de)
Outline
• Definition of Terms
• Goal and Motivation
• Previous Research Works
• Immune Principles for Malicious Executable Detection
• Malicious Executable Detection Algorithm
• Experiments and Discussion
• Concluding Remarks
Definition of Terms
• Malicious executable: generally defined as a program with some malicious function, such as compromising a system’s security, damaging a system, or obtaining sensitive information without the user’s permission. It includes viruses, Trojan horses, worms, etc.
• Benign executable: a normal program without any malicious function.
[Figure: malicious executables (DOS/Win32 viruses, Trojan horses, worms, e-mail-attached viruses) attack computers and information systems; tens of thousands of new viruses appear every year]
But current antivirus systems try to detect these new malicious programs with hand-crafted heuristics, which is costly and ineffective.
Current task: devise new methods for detecting new malicious executables (ME).
Definition of Symbols and Structures
• B: binary code alphabet, B = {0, 1}.
• Seq(s, k, l): short-sequence cutting operation. If s is a binary sequence s = b(0)b(1)…b(n−1) with b(i) ∈ B, then Seq(s, k, l) = b(k)b(k+1)…b(k+l−1).
• E(k): executable set, k ∈ {m, b}, where m denotes malicious executables and b benign executables.
• E: whole set of executables, i.e., E = E(m) ∪ E(b).
• e(fj, n): an executable viewed as a binary sequence of length n, where fj is the executable identifier.
• ld: detector code length.
• lstep: step size of detector generation.
• dl: a detector, dl = Seq(s, k, l).
• Dl: set of detectors with code length l, i.e., Dl = {dl(0), dl(1), …, dl(nd−1)}, |Dl| = nd.
Goal and Motivation
• To develop an automatic approach for detecting new malicious executables.
• To combine artificial immune systems (AIS) and artificial neural networks (ANN) to detect malicious executables with a higher Detection Rate (DR) and a lower False Positive Rate (FPR) than other methods.
Previous Related Works
• Signature-based Methods
• Expert Knowledge-based Methods
• Machine Learning Methods
Signature-based Methods
These methods create a unique tag (signature) for each malicious program so that future examples of it can be correctly classified with a small error rate. Detection models are generated from the signatures of known malicious executables.
Drawbacks:
• They cannot detect unknown and mutated viruses.
• As the number and variety of viruses increase, detection slows down dramatically. At the same time, analyzing virus signatures becomes very difficult, particularly when the signatures are encrypted.
(Refer to IBM Anti-virus Group’s report: R.W. Lo, K.N. Levitt, and R.A. Olsson. MCF: a Malicious Code Filter. Computers & Security, 14(6):541–566, 1995.)
Expert Knowledge-based Methods
These methods use the knowledge of a group of virus experts to construct heuristic classifiers for detecting unknown viruses.
Drawbacks:
• Time-consuming analysis.
• Only some unknown viruses are discovered, and the false detection rate is very high. The IBM Anti-virus Group has also proposed an ANN-based method for detecting unknown viruses, but it detects boot sector viruses only.
(Refer to W. Arnold and G. Tesauro. Automatically Generated Win32 Heuristic Virus Detection. Proceedings of the 2000 International Virus Bulletin Conference, 2000.)
Machine Learning Methods
• M.G. Schultz developed a framework that used data mining algorithms, i.e., the Multi-Naïve Bayes method, to train multiple classifiers on a set of malicious and benign executables in order to detect new examples (unknown ME).
(Refer to M.G. Schultz, E. Eskin, and E. Zadok. Data Mining Methods for Detection of New Malicious Executables. IEEE Symposium on Security and Privacy, May 2001.)
Biologically-motivated Information Processing Systems
• Brain-nervous systems: Neural Networks (NN)
• Genetic systems: Genetic Algorithms (GA)
• Immune systems: Artificial Immune Systems (AIS), or immunological computation
NN and GA have been studied extensively and applied widely, whereas AIS has relatively few applications.
Natural prototypes vs. their models
Computing model | Biological level | Natural prototype
Genetic Algorithms (GA) | Molecular | Genetic code
Artificial immune systems (AIS) | Molecular | Molecules of proteins
Cellular automata (CA) | Cells | Biological cells
Artificial Neural networks (ANN) | Cells | Brain nervous net
Formal logic, Formal linguistics | Left hemisphere of brain | Natural language
Comparison of Three Algorithms
Aspect | GA (Optimisation) | NN (Classification) | AIS
Components | Chromosome Strings | Artificial Neurons | Attribute Strings
Location of Components | Dynamic | Pre-Defined | Dynamic
Structure | Discrete Components | Networked Components | Discrete Components / Networked Components
Knowledge Storage | Chromosome Strings | Connection Strengths | Component Concentration / Network Connections
Dynamics | Evolution | Learning | Evolution / Learning
Meta-Dynamics | Recruitment / Elimination of Components | Construction / Pruning of Connections | Recruitment / Elimination of Components
Interaction between Components | Crossover | Network Connections | Recognition / Network Connections
Interaction with Environment | Fitness Function | External Stimuli | Recognition / Objective Function
Immune Principles for Malicious Executable Detection
• Non-self Detection Principle
• Anomaly Detection Based on Thickness
• The Diversity of Detector Representation vs. Anomaly Detection Hole
Non-self Detection Principle
• In the natural immune system, all cells of the body are categorized as either self or non-self, and the immune process detects the non-self cells.
• To realize non-self detection, T-cell (lymphocyte) maturation goes through two selection stages, Positive Selection and Negative Selection, since antigenic encounters may result in cell death. Inspired by these two stages, computer scientists have proposed several algorithms for detecting anomalous information. Here, we use the Positive Selection Algorithm (PSA) to perform non-self detection for recognizing malicious executables.
Non-self Detection by PSA
[Figure: process of anomaly detection with PSA. A short sequence of length l is matched against the detector set Dl; a match (Y) classifies it as self, no match (N) as non-self]
Anomaly Detection Based on Thickness
• Anomaly recognition process is one process that immune cells detect antigens and are activated.
• The activated threshold of immune cells is decided by the thickness of immune cells matching antigens.
The Diversity of Detector Representation vs. Anomaly Detection Hole
• The main difficulty in anomaly detection is minimizing the anomaly detection holes. The natural immune system solves this problem well through the diversity of MHC (Major Histocompatibility Complex) representations, which determines the diversity of the antigen fragments presented to T cells. This property is very useful for detecting mutated antigens and for reducing anomaly detection holes.
• Following this principle, we can use diverse detector representations to reduce anomaly detection holes, as illustrated in the schematic drawings that follow.
Schematic diagram of abnormal detection holes
[Figure: detectors scattered over the non-self space next to the self space; the parts of the non-self space covered by no detector are abnormal detection holes]
Reduction of abnormal detection holes by using diverse detector representations
[Figure: detector representations 1, 2, and 3 each leave different holes; the combination of detectors covers more of the non-self space]
Malicious Executable Detection Algorithm (MEDA)
MEDA, based on AIS, consists of three parts:
• Detector generation
• Anomaly information extraction
• Classification
Flow Chart of Malicious Executable Detection Algorithm (MEDA)
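[Figure: flow chart of MEDA. Detector sets are generated from a gene (…01101001…), which can be updated (…10101101…). Anomaly properties are extracted from the executable to be detected (…00111101…) and passed to the classifier, which produces the output]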
Generation of Detector Set
Detector generation algorithm:
Begin
  initialize lstep, ld, k = 0
  Do
    cut e(fk, n) from Eg(b)
    i = 0
    While i <= n - ld do
    Begin
      d = Seq(e(fk, n), i, ld)
      if d ∉ Dld then Dld ← Dld ∪ {d}
      i = i + lstep
    End
    k = k + 1
  Until Eg(b) is empty
  Return Dld
End
Illustration of Detector Generating Process
File hex sequence: 56 32 12 0A 34 ED FF 00 2D … 00 0A 34 ED FF FA 11 00
Extracted detectors: 56 32 12 | 32 12 0A | 12 0A 34 | … | FF FA 11 | FA 11 00
Generating process of 24-bit detectors with an 8-bit step size (ld = 24, lstep = 8)
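A Python sketch of the detector generation algorithm. It assumes that ld and lstep are multiples of 8 bits, so every window aligns to bytes; the original works at bit level. A Python `set` stands in for the bitmap index or tree used to store Dld. The window loop runs while i ≤ n − ld, so the final window of each file is included, as in the illustration. The function name is hypothetical. The usage lines treat the two visible fragments of the illustrated file as if they were two separate files.

```python
def generate_detectors(benign_files, ld_bits=24, lstep_bits=8):
    """Build the detector set Dld from the benign gene files Eg(b).

    Each file is a bytes object; ld_bits and lstep_bits must be multiples of 8.
    """
    if ld_bits % 8 or lstep_bits % 8:
        raise ValueError("this sketch only supports byte-aligned ld and lstep")
    ld, step = ld_bits // 8, lstep_bits // 8
    detectors = set()                        # Dld; duplicates are ignored
    for data in benign_files:                # take each file e(fk, n) in turn
        for i in range(0, len(data) - ld + 1, step):
            detectors.add(data[i:i + ld])    # d = Seq(e(fk, n), i, ld)
    return detectors


files = [bytes.fromhex("56 32 12 0A 34 ED FF 00 2D"),
         bytes.fromhex("00 0A 34 ED FF FA 11 00")]
d24 = generate_detectors(files, ld_bits=24, lstep_bits=8)
print(len(d24))  # 11: 7 + 6 windows, 2 of which (0A 34 ED, 34 ED FF) repeat
```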
Extraction of Anomaly Characteristics --Non-self Thickness (NST)
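• Non-self detection
• NST, used as the anomaly property, is defined as the ratio of the number of non-self units to the total number of units cut from the file’s binary sequence: pl = nn/(nn + ns), where nn and ns are the numbers of non-self and self units.
• If there are m kinds of detectors, the file has an NST vector P = (pl1, pl2, …, plm)^T.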
NST Extraction Diagram
[Figure: NST extraction flow chart. Initialize (choose lstep, ld, Dl). Test each short sequence of the file to be detected (…00111101) for non-self: if yes, add 1 to nn, otherwise add 1 to ns. When detection is complete, compute pl = nn/(nn + ns) and end]
NST Extraction Algorithm
Begin
  open e(fk, n)
  select lstep, ld
  set ns = 0, nn = 0, i = 0
  While i <= n - ld do
  Begin
    s = Seq(e(fk, n), i, ld)
    if s ∉ Dld then nn = nn + 1
    else ns = ns + 1
    i = i + lstep
  End
  pld = nn / (ns + nn)
  Return pld
End
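The NST computation and the NST vector P can be sketched in Python under the same byte-alignment assumption. A detector set is a Python `set` of byte strings. A file shorter than one detector yields 0.0, which is our own convention. The names `nst` and `nst_vector` are hypothetical.

```python
def nst(data, detectors, ld_bits=24, lstep_bits=8):
    """Non-self thickness pl = nn / (nn + ns) of one file (a bytes object)."""
    ld, step = ld_bits // 8, lstep_bits // 8
    n_self = n_nonself = 0
    for i in range(0, len(data) - ld + 1, step):
        if data[i:i + ld] in detectors:   # PSA: matching a self detector means self
            n_self += 1
        else:
            n_nonself += 1
    total = n_self + n_nonself
    return n_nonself / total if total else 0.0


def nst_vector(data, detector_sets, lstep_bits=8):
    """NST vector P = (pl1, ..., plm) over detector sets keyed by ld in bits."""
    return [nst(data, detector_sets[ld], ld, lstep_bits)
            for ld in sorted(detector_sets)]


d = {bytes([1, 2, 3])}
print(nst(bytes([1, 2, 3, 4]), d))   # one self window, one non-self window -> 0.5
```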
BP Network Classifier
• We use the Anomaly Property Vector (APV), i.e., the NST vector P, as the input of a two-layer BP network classifier. The number of input-layer nodes equals the dimension of the APV.
• The sigmoid transfer function is chosen for the hidden layer and a linear function for the output layer.
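A dependency-free Python sketch of such a classifier. It has one sigmoid hidden layer and one linear output, and it is trained by online back-propagation on squared error. Several values are assumptions, not values from the paper: the hidden-layer size, the learning rate, the epoch count, and the 0.5 decision threshold. The toy NST vectors in the usage lines are made up for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class BPClassifier:
    """NST vector (m inputs) -> sigmoid hidden layer -> single linear output."""

    def __init__(self, n_inputs, n_hidden=4, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, p):
        h = [sigmoid(sum(w * x for w, x in zip(row, p)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2   # linear output
        return h, y

    def predict(self, p):
        return self._forward(p)[1]

    def train(self, samples, targets, lr=0.1, epochs=3000):
        """Online back-propagation on the squared error 0.5 * (y - t) ** 2."""
        for _ in range(epochs):
            for p, t in zip(samples, targets):
                h, y = self._forward(p)
                err = y - t                                   # dE/dy
                for j, hj in enumerate(h):
                    # error at hidden unit j, using its weight before the update
                    delta = err * self.w2[j] * hj * (1.0 - hj)
                    self.w2[j] -= lr * err * hj
                    self.b1[j] -= lr * delta
                    self.w1[j] = [w - lr * delta * x for w, x in zip(self.w1[j], p)]
                self.b2 -= lr * err


P = [[0.05, 0.10], [0.10, 0.05], [0.90, 0.95], [0.95, 0.90]]  # toy NST vectors
T = [0, 0, 1, 1]                                               # 0 = BE, 1 = ME
clf = BPClassifier(n_inputs=2)
clf.train(P, T)
labels = [1 if clf.predict(p) > 0.5 else 0 for p in P]
```

Sweeping the 0.5 threshold instead of fixing it is one way to trace a DR versus FPR (ROC) curve from the linear output.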
BP Network Classifier Structure
[Figure: BP network structure. The Non-Self Thickness (NST) vector P = (pl1, pl2, …, plm) feeds the input layer; the output is 1 for ME and 0 for BE]
Experiments and Discussion
• Experimental Data Set
• Generation of Detector Set
• Experimental Result Using Single Detector Set
• Experimental Result Using Multi-Detector Set
Experimental Data Set
• BE: benign executable
• ME: malicious executable
All files were verified with antivirus cleaning tools.

Type   | Files | Remarks
BE     | 915   | Win 2K OS and some application programs
ME     | 3566  | DOS viruses, Win32 viruses, Trojans, worms, etc., from the Internet
Total  | 4481  |
Generation of Detector Set
• Eg(b) is the gene set used to generate detectors, with ld ∈ {16, 24, 32, 64, 96} and lstep = 8 bits. Running the detector generating algorithm on it yields D16, D24, D32, D64 and D96, respectively.
Table 1: Detector generation result
Code length ld (bits) | 16           | 24           | 32        | 64         | 96
Number of detectors   | 65,536       | 10,931,627   | 8,938,352 | 12,768,361 | 21,294,857
Store structure       | Bitmap index | Bitmap index | Tree      | Tree       | Tree
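Table 1 stores the 16- and 24-bit sets as a bitmap index. One plausible layout, assumed here rather than taken from the slides, reserves one bit per possible detector value, so D24 needs 2^24 bits = 2 MiB regardless of how many detectors it holds. The same arithmetic explains the 16-bit entry: |D16| = 65,536 = 2^16 means every possible 16-bit pattern already occurs in Eg(b), so no 16-bit window can ever be non-self. The class name `Bitmap24` is hypothetical.

```python
class Bitmap24:
    """Hypothetical bitmap index: one bit per possible 24-bit detector (2**24 bits)."""

    def __init__(self):
        self.bits = bytearray(2 ** 24 // 8)     # 2,097,152 bytes, all zero

    @staticmethod
    def _index(window):
        if len(window) != 3:
            raise ValueError("expected a 3-byte (24-bit) window")
        return int.from_bytes(window, "big")    # 0 .. 2**24 - 1

    def add(self, window):
        v = self._index(window)
        self.bits[v >> 3] |= 1 << (v & 7)       # set bit v

    def __contains__(self, window):
        v = self._index(window)
        return bool(self.bits[v >> 3] & (1 << (v & 7)))


index = Bitmap24()
index.add(bytes.fromhex("56 32 12"))
print(bytes.fromhex("56 32 12") in index, bytes.fromhex("32 12 0A") in index)   # True False
```

Lookup is a single array access, which is why a bitmap suits short code lengths; for 32 bits and longer the table is far too sparse, which fits the tree structure listed for D32, D64 and D96.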
Detection Result of Malicious Executables by D24
(a) NST p24 of each file (x-axis: file no.; y-axis: non-self thickness p24), where ‘x’ represents a benign program (red) and ‘□’ a malicious program (blue).
(b) ROC curve (x-axis: false positive rate (%); y-axis: detection rate (%)).
Detection Result of Malicious Executables by D32
(a) NST p32 of each file (x-axis: file no.; y-axis: non-self thickness p32), where ‘x’ represents a benign program and ‘□’ a malicious program.
(b) ROC curve (x-axis: false positive rate (%); y-axis: detection rate (%)).
Detection Result of Malicious Executables by D64
(a) NST p64 of each file (x-axis: file no.; y-axis: non-self thickness p64), where ‘x’ represents a benign program (red) and ‘□’ a malicious program (blue).
(b) ROC curve (x-axis: false positive rate (%); y-axis: detection rate (%)).
Experimental Result Using Single Detector Set
(Figure: ROC curves for the 16-, 24-, 32-, 64- and 96-bit detector sets; x-axis: false positive rate (%), 0 to 100; y-axis: detection rate (%), 0 to 100.)
When FPR is fixed, relationship curves of DR versus code length ld
(Figure: detection rate (%) versus code length ld (bits).)
Note: from bottom to top, the curves correspond to FPR = 0%, 0.5%, 1%, 2%, 4%, 8% and 16%.
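Reading a detection rate off at a fixed false positive rate can be sketched as follows. The slides do not say how these points were computed; this sketch assumes a single threshold on a scalar score such as pld, with files whose score exceeds the threshold flagged as malicious. The function name `dr_at_fpr` and the example scores are hypothetical.

```python
def dr_at_fpr(benign_scores, malicious_scores, max_fpr):
    """Assumption: flag score > t; best detection rate over thresholds with FPR <= max_fpr."""
    best_dr = 0.0
    # Candidate thresholds: every observed score, plus -inf (flag every file).
    for t in sorted(set(benign_scores) | set(malicious_scores) | {float("-inf")}):
        fpr = sum(s > t for s in benign_scores) / len(benign_scores)
        if fpr <= max_fpr:
            dr = sum(s > t for s in malicious_scores) / len(malicious_scores)
            best_dr = max(best_dr, dr)
    return best_dr


benign = [0.1, 0.2, 0.3, 0.9]        # hypothetical NST scores of benign files
malicious = [0.05, 0.5, 0.6, 0.95]   # hypothetical NST scores of malicious files
for f in (0.0, 0.25, 1.0):
    print(f, dr_at_fpr(benign, malicious, f))
```

Raising the allowed FPR can only lower the usable threshold, so each curve in the figure is non-decreasing as the FPR level increases.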
Experimental Result Using Multi-Detector Set
• This experiment uses multiple detector sets together to detect benign and malicious executables.
• D16 is not used because its DR is zero, and D96 is taken as the upper limit because the DR is almost unchanged for ld ≥ 96.
• We select the four detector sets D24, D32, D64 and D96 as the anomaly detection data set, use them to extract the non-self thickness (NST) vector, and finally use a BP network as the classifier.
• For classification, we randomly select 30% of the files of E(b) as Eg(b) to train the BP network and use the remaining data to evaluate anomaly detection performance.
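Building the multi-detector NST vector and the random 30%/70% split can be sketched as follows. It assumes byte-aligned windows with an 8-bit step and detector sets stored as Python sets keyed by ld in bits; `windows`, `nst_vector` and `split_train_test` are hypothetical names, and the fixed seed is only for reproducibility.

```python
import random


def windows(data, ld_bytes, step_bytes=1):
    """All full windows of ld_bytes bytes, starting every step_bytes bytes."""
    return [data[i:i + ld_bytes] for i in range(0, len(data) - ld_bytes + 1, step_bytes)]


def nst_vector(data, detector_sets):
    """Hypothetical sketch: P = (p_l1, ..., p_lm), one NST per detector set, ld ascending."""
    vec = []
    for ld_bits in sorted(detector_sets):
        ws = windows(data, ld_bits // 8)          # lstep = 8 bits = 1 byte
        non_self = sum(w not in detector_sets[ld_bits] for w in ws)
        vec.append(non_self / len(ws) if ws else 0.0)
    return vec


def split_train_test(items, train_fraction=0.3, seed=0):
    """Random split: train_fraction of the files for training, the rest for testing."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


sets = {
    16: {bytes.fromhex("5632"), bytes.fromhex("3212"), bytes.fromhex("120A")},
    24: {bytes.fromhex("563212")},
}
print(nst_vector(bytes.fromhex("56 32 12 0A"), sets))   # [0.0, 0.5]
train, test = split_train_test(range(10))
print(len(train), len(test))                            # 3 7
```

The example also shows why a complete 16-bit set is useless: when every 16-bit window is self, p16 is always 0.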
NST Distribution and ROC Curve of Multi-Detector Set Method
(a) NST of files for the mixture of D24, D32 and D64 (axes: non-self thickness for 24, 32 and 64 bits), where ‘x’ represents a benign program (red) and ‘□’ a malicious program (blue).
(b) ROC curve (detection rate (%) versus false positive rate (%)) of the mixed detector set of D24, D32, D64 and D96.
Comparison with Bayes Methods and the Signature-based Method
(Figure: ROC curves for MEDA with BP network, Naive Bayes with strings, Multi-Naive Bayes with bytes, and the signature method; x-axis: false positive rate (%), 0 to 12; y-axis: detection rate (%), 0 to 100.)
Algorithm Complexities
Algorithm | Store space             | Operation type 1                         | Operation type 2                                 | Operation type 3
MEDA      | detectors: ltrain       | detector matching: ≤ 80 × ltest          | computing NST: 4 × lf additions                  | 0.4 Gb
Bayes     | Prob. info.: >> ltrain  | searching P(Fi/C): depends on P(Fi/C)    | computing joint probs.: lf float multiplications | 1 Gb
Joint probability computed by the Bayes method: $P(C) = \prod_{i=1}^{n} P(F_i/C)$
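The product form is why the Bayes row is costed in float multiplications: one per feature. The short sketch below (function names hypothetical) contrasts the direct product with the equivalent sum of logarithms, a standard way to keep such a product from underflowing to zero when n is large. A full naive Bayes classifier would also include the class prior; that term is left out here to match the formula as written.

```python
import math


def joint_prob(cond_probs):
    """Direct product prod_i P(F_i/C): one float multiplication per feature."""
    result = 1.0
    for p in cond_probs:
        result *= p
    return result


def log_joint_prob(cond_probs):
    """Same quantity as a sum of logs; stays finite where the product underflows."""
    return sum(math.log(p) for p in cond_probs)


probs = [1e-5] * 100                 # hypothetical: 100 features, each P(F_i/C) = 1e-5
print(joint_prob(probs))             # 0.0 (underflow)
print(log_joint_prob(probs))
```

Classes can then be compared by their log scores, since the logarithm preserves ordering.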
Remarks
• For short binary sequences and a single detector set, D24 performs best for detecting malicious executables, giving a DR of 80.6% at an FPR of 3%.
• With longer detector code lengths and the multi-detector set, our method achieves the best performance among current methods: a DR of 97.46% at an FPR of 2%.
• These results verify:
– that diversity of detector representation can reduce anomaly detection holes;
– the effectiveness of "non-self" thickness detection.
Case Study 2: Film Recommender
From Dr. Uwe Aickelin (http://www.aickelin.com), University of Nottingham, U.K.
Prediction:
– What rating would I give a specific film?
Recommendation:
– Give me a ‘top 10’ list of films I might like.
Film Recommender (cont’d 1)
• EachMovie database (70k users).
• User profile: set of tuples {movie, rating}.
• Me: my user profile.
• Neighbour: user profile of others.
• Similarity metric: correlation score.
• Neighbourhood: group of similar users.
• Recommendations: from the neighbourhood.
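The slides use a "correlation score" as the similarity metric without giving its formula. A common choice, assumed here, is the Pearson correlation computed over the movies both users have rated. The name `correlation` is hypothetical, and returning 0 when there are fewer than two co-rated movies or the ratings are constant is a design choice of this sketch.

```python
import math


def correlation(profile_a, profile_b):
    """Assumed metric: Pearson correlation of two {movie: rating} profiles over co-rated movies."""
    common = sorted(profile_a.keys() & profile_b.keys())
    if len(common) < 2:
        return 0.0                      # too little overlap to correlate
    a = [profile_a[m] for m in common]
    b = [profile_b[m] for m in common]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0                      # constant ratings: correlation undefined
    return cov / math.sqrt(var_a * var_b)


me = {"m1": 4, "m2": 2, "m4": 5}                 # hypothetical ratings
neighbour = {"m1": 5, "m2": 1, "m3": 3, "m4": 4}
print(correlation(me, neighbour))
```

The score lies in [-1, 1]; m3 is ignored here because only the neighbour has rated it.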
Film Recommender (cont’d 2)
• User profile: set of tuples {movie, rating}.
• Me: my user profile → Antigen.
• Neighbour: user profile of others → Antibody.
• Affinity metric: correlation score → Antibody–antigen binding (stimulation) and antibody–antibody binding (suppression).
• Neighbourhood: group of similar users → Group of antibodies similar to the antigen and dissimilar to other antibodies.
• Recommendations: from the neighbourhood → Weighted score based on similarities.
Film Recommender (cont’d 3)
• Start with an empty AIS.
• Encode the target user as an antigen Ag.
• WHILE (AIS not full) && (more users):
– Add the next user as an antibody Ab.
– IF (AIS at full size) iterate the AIS.
• Generate recommendations from the AIS.
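The control flow of this loop can be sketched as follows. `run_ais` and the `iterate` callback are hypothetical; the sketch assumes that iterating the AIS may remove antibodies (for example, those whose concentration falls too low), which is what lets the loop keep admitting new users after the AIS first fills up.

```python
def run_ais(target, users, capacity, iterate):
    """Hypothetical skeleton of the AIS loop; `iterate` may drop weak antibodies."""
    antigen = target                      # encode the target user as the antigen Ag
    antibodies = []                       # start with an empty AIS
    pool = iter(users)
    while len(antibodies) < capacity:     # AIS not full ...
        nxt = next(pool, None)
        if nxt is None:                   # ... and more users remain
            break
        antibodies.append(nxt)            # add the next user as an antibody Ab
        if len(antibodies) == capacity:   # AIS at full size: iterate it
            antibodies = iterate(antigen, antibodies)
    return antigen, antibodies


def keep_odd(antigen, antibodies):
    """Toy stand-in for an AIS iteration: drop antibodies with even ids."""
    return [ab for ab in antibodies if ab % 2]


print(run_ais(0, [1, 2, 3, 4, 5, 6], capacity=3, iterate=keep_odd))   # (0, [1, 3, 5])
```

In a real system `iterate` would update concentrations and remove antibodies below some threshold; recommendations are generated from whatever remains.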
Film Recommender (cont’d 4)
Suppose we have 5 users and 4 movies:
– u1 = {(m1, v11), (m2, v12), (m3, v13)}
– u2 = {(m1, v21), (m2, v22), (m3, v23), (m4, v24)}
– u3 = {(m1, v31), (m2, v32), (m4, v34)}
– u4 = {(m1, v41), (m4, v44)}
– u5 = {(m1, v51), (m2, v52), (m3, v53), (m4, v54)}
• We do not have every user's vote for every film.
• We want to predict user u4's vote on movie m3.
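The toy data can be encoded directly. Because the slide leaves the votes symbolic (v11, v12, …), this sketch records only which movies each user has voted on; the variable names are hypothetical.

```python
profiles = {                      # movies each user has voted on (votes left symbolic)
    "u1": {"m1", "m2", "m3"},
    "u2": {"m1", "m2", "m3", "m4"},
    "u3": {"m1", "m2", "m4"},
    "u4": {"m1", "m4"},
    "u5": {"m1", "m2", "m3", "m4"},
}
target, movie = "u4", "m3"

# Users who can supply a vote on the target movie.
raters = sorted(u for u, movies in profiles.items() if movie in movies)
# Movies each other user shares with the target (the basis for a correlation).
overlap = {u: sorted(movies & profiles[target])
           for u, movies in profiles.items() if u != target}
print(raters)    # ['u1', 'u2', 'u5']
print(overlap)   # {'u1': ['m1'], 'u2': ['m1', 'm4'], 'u3': ['m1', 'm4'], 'u5': ['m1', 'm4']}
```

Only u1, u2 and u5 have voted on m3, and u1 shares just m1 with u4, so a correlation over co-rated movies would rest on a single film for that pair.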
Algorithm walkthrough (1)
• Start with an empty AIS: DATABASE = u1, u2, u3, u4, u5; AIS is empty.
• The user for whom to predict becomes the antigen: DATABASE = u1, u2, u3, u5; AIS = Ag (u4).
Algorithm walkthrough (2)
Add antibodies until the AIS is full…
• u1 moves from the database into the AIS as Ab1 (DATABASE = u2, u3, u5; AIS = Ag, Ab1).
• u2 and u3 are then added, giving AIS = Ag, Ab1, Ab2, Ab3.
Algorithm walkthrough (3)
• Table of correlations between each Ab and the Ag:
– MS14, MS24, MS34.
• Table of correlations between antibodies:
– MS12 = CorrelCoef(Ab1, Ab2)
– MS13 = CorrelCoef(Ab1, Ab3)
– MS23 = CorrelCoef(Ab2, Ab3)
Algorithm walkthrough (4)
• Calculate the concentration of each Ab:
– Interaction with the Ag (stimulation).
– Interaction with other Abs (suppression).
(Diagram: the AIS before and after iteration; afterwards Ab2 has the highest concentration.)
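The slide names the two forces but gives no equation. A minimal sketch, assuming an idiotypic-network-style update dx_i/dt = k1·m_i·x_i − (k2/n)·Σ_j m_ij·x_i·x_j − k3·x_i integrated with Euler steps, where m_i is the Ab–Ag correlation (MS_i4), m_ij the Ab–Ab correlation and x_i the concentration. The constants k1, k2, k3, the step size, the clipping at zero, the function name `iterate_concentrations` and all numbers in the example are assumptions.

```python
def iterate_concentrations(ag_aff, ab_aff, conc, k1=0.2, k2=0.1, k3=0.02,
                           dt=1.0, steps=10):
    """Assumed update: dx_i/dt = k1*m_i*x_i - (k2/n)*sum_j m_ij*x_i*x_j - k3*x_i."""
    n = len(conc)
    x = list(conc)
    for _ in range(steps):
        new_x = []
        for i in range(n):
            stimulation = k1 * ag_aff[i] * x[i]                  # Ab-Ag binding
            suppression = (k2 / n) * sum(ab_aff[i][j] * x[i] * x[j]
                                         for j in range(n) if j != i)   # Ab-Ab binding
            death = k3 * x[i]                                    # natural decay
            new_x.append(max(0.0, x[i] + dt * (stimulation - suppression - death)))
        x = new_x                                                # synchronous Euler step
    return x


ms_ag = [0.2, 0.9, -0.3]            # hypothetical MS14, MS24, MS34
ms_ab = [[0.0, 0.5, 0.1],           # hypothetical MS12, MS13, MS23 (symmetric)
         [0.5, 0.0, 0.1],
         [0.1, 0.1, 0.0]]
x = iterate_concentrations(ms_ag, ms_ab, [1.0, 1.0, 1.0])
print([round(v, 3) for v in x])
```

With these numbers Ab2, the antibody most correlated with the antigen, grows while Ab3, which is negatively correlated, decays, mirroring the walkthrough's outcome.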
Algorithm walkthrough (5)
• Generate a recommendation based on antibody concentration.
The recommendation for user u4 on movie m3 will be based mainly on user u2's vote on m3.
(Diagram: the AIS with the Ag, Ab1 and a high concentration of Ab2.)
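One way to turn concentrations into a prediction, assumed here rather than taken from the slides, is a mean-offset formula in the style of Pearson k-nearest-neighbour predictors, with each antibody weighted by concentration × correlation: prediction = mean(u4) + Σ x_i·m_i·(v_i − mean_i) / Σ x_i·|m_i|, summed over antibodies that have voted on the film. `predict_vote` and the example numbers are hypothetical.

```python
def predict_vote(target_mean, neighbours):
    """Assumed weighting. neighbours: (concentration, affinity, vote, neighbour_mean) tuples."""
    num = sum(c * m * (v - mean) for c, m, v, mean in neighbours)
    den = sum(c * abs(m) for c, m, _, _ in neighbours)
    return target_mean if den == 0 else target_mean + num / den


# Hypothetical numbers: a high-concentration, well-correlated "Ab2" and a weak "Ab1".
p = predict_vote(3.0, [(4.0, 0.9, 5.0, 4.0), (0.5, 0.2, 1.0, 3.0)])
print(round(p, 3))   # 3.919
```

Because the high-concentration antibody dominates both sums, the prediction follows its above-average vote, which is the effect the walkthrough describes for u2.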
Film Recommender Results
• Tested against a standard method (Pearson k-nearest neighbours).
• Prediction:
– Results of the same quality.
• Recommendation:
– AIS: 4 out of 5 films correct.
– Pearson: 3 out of 5 films correct.