

A Model of Gene Library Evolution in the Dynamic Clonal Selection Algorithm

J. Kim

Department of Computer Science King’s College London

Strand London WC2R 2LS

[email protected]

P. J. Bentley

Department of Computer Science University College London

Gower Street London WC1E 6BT

[email protected]

Abstract

The dynamic clonal selection algorithm (DynamiCS) was created to tackle the difficulties of anomaly detection in continuously changing environments (Kim and Bentley, 2002a). This algorithm was extended in a sister paper (Kim and Bentley, 2002b), so that memory detectors that are no longer valid are automatically deleted. Here we describe a further extension to the system: the use of hypermutation on deleted memory detectors to produce, in effect, a “virtual gene library” which seeds the immature detector population.

1 INTRODUCTION

When using an Artificial Immune System (AIS) in a real environment (e.g., monitoring network traffic), normal or self behaviours can change after a certain period. In addition, the system may only see a small subset of self antigens at any one time. In order for our AIS to be able to deal with such an environment, a dynamic clonal selection algorithm (DynamiCS) was introduced in previous work (Kim and Bentley, 2002a). The results described there showed that DynamiCS could incrementally learn the globally converged distributions even though only one subset distribution was given at each generation. This feature was achieved by employing three important parameters: the tolerisation period of an immature detector (T), the activation threshold of a mature detector (A) and the life span of a mature detector (L). However, the original DynamiCS could not learn new self-antigens when learned self and non-self behaviours suddenly altered due to legal self change. This resulted in high false positive (FP) rates when new antigens were monitored by DynamiCS, although it produced high true positive (TP) rates.

A sister paper to this describes a further extension of DynamiCS, which reduces the FP rates increased by memory detectors (Kim and Bentley 2002b). The extended DynamiCS handles generated memory detectors based on their detection results. The original DynamiCS preserved memory detectors for an infinite lifespan. In contrast, the extended DynamiCS kills memory detectors if they show poor self-tolerance to new antigens (Kim and Bentley 2002b). This extended system was tested to determine whether surviving memory detectors still cause seriously high FP error rates. The test results were also analysed to see whether any other problems occur as a consequence of killing memory detectors. The analysis showed that the extended DynamiCS requires a larger amount of co-stimulation in order to yield high TP rates.

This analysis led to the work described in this paper: the addition of hypermutation to the extended DynamiCS to, in effect, evolve a gene library of the AIS. This additional extension is designed to fine-tune generated memory detectors so that the system obtains higher TP rates without increasing the amount of co-stimulation. Here, the new extension is tested to determine whether it gains high TP rates without increasing the amount of co-stimulation as the result of gene library evolution. The test results are then analysed to see how hypermutation leads to such a gene library evolution effect, and thus whether it improves the overall system performance. Finally, the novel features of DynamiCS studied in this work are discussed in comparison with the most similar AIS, developed by Hofmeyr (Hofmeyr, 1999; Hofmeyr and Forrest, 2000).

2 DYNAMIC CLONAL SELECTION (DynamiCS) ALGORITHM

The new AIS introduced in previous work (Kim and Bentley, 2002a) follows the basic concept of the AIS proposed by Hofmeyr (1999). The adaptability of Hofmeyr’s AIS was achieved via co-ordinated dynamics of three different detector populations: immature, mature, and memory detector populations. In order to fully comprehend the co-ordinated dynamics of these three detector populations in terms of AIS adaptability, we introduced an artificial immune algorithm, called the dynamic clonal selection algorithm (DynamiCS). Although Hofmeyr proposed various new features in order to effect great adaptability and distributed detection, DynamiCS attempts to distill only the crucial components that yield adaptability to the system (and reduce the number of system parameters to ensure the algorithm is usable). The following pseudo code provides an overview of the extended DynamiCS.


Initialise Dynamic Clonal Selection Algorithm
Create an initial immature detector population with random detectors;
Generation_Number = 1;
Do {
    If (Generation_Number = N) then Select a new antigen cluster.
    Select 80% of self and non-self antigens from chosen antigen cluster;
    Reset Parameters
    Generation_Number++;
    Memory Detector Age++; Mature Detector Age++; Immature Detector Age++;
    Monitor Antigens {
        Monitor Antigens by Memory Detectors
            Co-stimulation: does the memory detector detect a non-self antigen or does it detect a self antigen?
            Kill memory detectors that detect self antigens.
        Monitor Antigens by Mature Detectors
            Check whether any mature detector detects any non-self antigen;
            Check whether any mature detector detects any self antigen;
            Create new memory detectors;
            Old mature detectors are killed;
        Monitor Antigens by Immature Detectors
            Check whether any immature detector detects any self antigen;
            Delete any immature detector matching any self antigen;
            Create new mature detectors;
    }
    If (immature detector population size + mature detector population size < non-memory detector pop size) {
        Do {
            Generate a random detector;
            Add a random detector to an immature detector population;
        } Until (immature detector population size + mature detector population size = non-memory detector pop size);
    }
} While (Generation_Number < max Generation)

Full details of this algorithm are given in (Kim and Bentley, 2002a; 2002b).
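The "Monitor Antigens" step of the pseudo code can be sketched in Python as below. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the `Detector` class, the `step_generation` function, the caller-supplied `matches` predicate, the exact comparisons against T, A and L, the resetting of age on promotion, and counting one match per detected non-self antigen are all assumptions introduced here. The pseudo code does not say what happens to a mature detector that detects a self antigen, so the sketch takes no action in that case.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Antigen = Tuple[int, ...]


@dataclass
class Detector:
    genes: Tuple[int, ...]
    age: int = 0
    match_count: int = 0  # non-self matches accumulated while mature


# Hypothetical matching rule supplied by the caller (not specified in this text).
Matcher = Callable[[Detector, Antigen], bool]


def step_generation(
    immature: List[Detector],
    mature: List[Detector],
    memory: List[Detector],
    self_antigens: Sequence[Antigen],
    nonself_antigens: Sequence[Antigen],
    matches: Matcher,
    T: int,  # tolerisation period of an immature detector
    A: int,  # activation threshold of a mature detector
    L: int,  # life span of a mature detector
):
    """Run one monitoring pass; returns (immature, mature, memory, deleted_memory)."""
    for d in [*immature, *mature, *memory]:
        d.age += 1

    def hits_self(d: Detector) -> bool:
        return any(matches(d, s) for s in self_antigens)

    # Memory detectors that detect a self antigen are killed.
    kept_memory, deleted_memory = [], []
    for d in memory:
        (deleted_memory if hits_self(d) else kept_memory).append(d)

    # Mature detectors: the match count grows only on non-self detections.
    next_mature = []
    for d in mature:
        d.match_count += sum(1 for n in nonself_antigens if matches(d, n))
        if d.match_count >= A:      # assumption: activation at >= A
            d.age = 0
            kept_memory.append(d)   # becomes a new memory detector
        elif d.age <= L:            # assumption: killed once age exceeds L
            next_mature.append(d)

    # Immature detectors: negative selection, then promotion after T generations.
    next_immature = []
    for d in immature:
        if hits_self(d):
            continue                # deleted for matching a self antigen
        if d.age >= T:              # assumption: promotion at age >= T
            d.age = 0
            next_mature.append(d)
        else:
            next_immature.append(d)

    return next_immature, next_mature, kept_memory, deleted_memory
```

The fourth return value, the list of deleted memory detectors, is kept separate because the extension described in section 4 reuses exactly those detectors.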

All experiments used the Wisconsin breast cancer data set. The cancer data has two classes, ‘Malignant’ and ‘Benign’. The system treated ‘Malignant’ as non-self and ‘Benign’ as self. In order to be sure of providing antigens of novel distributions, self and non-self antigen data was clustered into several groups: the 240 ‘Malignant’ examples were divided into three clusters of 45, 117 and 78 examples, and the 460 ‘Benign’ examples were grouped into three clusters of 42, 355 and 63 examples. The Expectation Maximization (EM) clustering algorithm was applied to cluster the antigen data. The EM algorithm is widely used as the basis for various unsupervised learning algorithms (Mitchell, 1997). 80% of the self and non-self antigen data belonging to each cluster were randomly selected for N generations. Therefore, DynamiCS was provided with different antigen data at each generation, and the distributions of these data changed every N generations. The co-stimulation mechanism involving a security officer was implemented by simply increasing the match count only when a detector detects non-self antigens.
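The antigen presentation schedule described above can be illustrated with a short Python sketch. The function names `cluster_index` and `sample_antigens` are hypothetical, and two details are assumptions not stated in the text: clusters are visited in cyclic order, and the cluster changes after generations N, 2N, and so on (with generations counted from 1).

```python
import random
from typing import List, Optional, Sequence, TypeVar

Item = TypeVar("Item")


def cluster_index(generation: int, n: int, num_clusters: int) -> int:
    """Cluster in use at a 1-based generation (assumption: cyclic order)."""
    return ((generation - 1) // n) % num_clusters


def sample_antigens(
    cluster: Sequence[Item],
    fraction: float = 0.8,
    rng: Optional[random.Random] = None,
) -> List[Item]:
    """Randomly select a fraction of a cluster without replacement."""
    rng = rng or random.Random()
    k = round(fraction * len(cluster))
    return rng.sample(list(cluster), k)
```

With N = 30 and three clusters, generations 1 to 30 draw from the first cluster and generations 31 to 60 from the second; a fresh 80% sample is drawn at every generation, so the detectors see different antigens even within one cluster.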

3 BACKGROUND: GENE LIBRARY EVOLUTION AND HYPERMUTATION

A problem found in previous experimental results is that the extended DynamiCS required a large number of memory detector co-stimulations in order to obtain satisfactory TP rates (Kim and Bentley 2002b). This problem could originate from the simplification of the developed AIS, which did not adopt all the evolution processes engaged in the human immune system. So far, negative selection and clonal selection have already been employed in DynamiCS and their effects were analysed. However, gene library evolution has not yet been adopted in DynamiCS.

The analyses of previous experimental results explained that the extended DynamiCS with a high activation threshold of a mature detector (A) provided a smaller number of memory detectors and thus required less involvement from human security officers. However, it missed a larger number of non-self antigens. In addition, the analyses showed that generating more memory detectors by decreasing A can increase TP rates. This was mainly because all the new detectors were generated randomly and thus were randomly scattered in the non-self antigen space. In other words, although existing memory detectors detected a sufficient number of non-self antigens to activate, they can be further fine-tuned to match more non-self antigens.

If new detectors are generated by taking some feedback from previous detection results into account, then a new detector can be improved to match a larger number of non-self antigens. This idea can be implemented by a model of gene library evolution using hypermutation, as will be described later. Bearing in mind the effect of gene library evolution, this section addresses how the human immune system evolves over generations, and how existing AIS’s adopt these mechanisms.

3.1 GENE LIBRARY EVOLUTION BY HUMAN IMMUNE SYSTEMS

The human immune system learns dynamically changing antigens via clonal selection. To be more precise, activating antibodies clone themselves and proliferate across different parts of the body. Cloning antibodies triggers a somatic hypermutation process. Somatic hypermutation mutates a random portion of the genes in antibody clones. Mutated offspring of activating antibodies are expected to have wider variations in their antigen-matching genes. Mutants are quickly disseminated across the body and start detecting other types of antigens. During this process, mutants and existing antibodies compete to detect more antigens, and their antigen detection results determine their affinities. The antibodies with higher affinities survive longer and clone themselves more. It is known that clonal selection with hypermutation is essential for the human immune system to permanently learn newly appearing antigens (Paul, 1993; Sompayrac, 1999).


The somatic hypermutation mechanism is distinguished from mutation taking place at the germ line level (see footnote 1). While germ line mutation typically occurs at a low rate, mutation applied to activating antibody clones operates at a much higher rate. Another distinguishing feature of somatic hypermutation is that it is applied only at the somatic level. It is known that the mutated genes of antibody clones cannot be directly written back to the DNA (or a gene library) of an egg or sperm cell. As a result, the genes of surviving antibody mutants are not passed on to the next generation of the immune system (Paul, 1993; Sompayrac, 1999).

However, it is also known that the learning results obtained via clonal selection with hypermutation during a lifetime indirectly lead the evolution of a gene library in the human immune system over generations. Although the genes of useful antibody mutants are not directly inherited, individuals capable of generating more useful mutants are more likely to survive against various types of pathogens. Thus, the gene libraries of these individuals are passed over generations, and offspring having these inherited gene libraries are more likely to have an immune system with a good capability of producing useful mutants. This effect was first proposed by James Mark Baldwin in 1896 and is named the Baldwin effect (Baldwin, 1896).

While it has been reported that the learning of the human immune system during a lifetime indirectly determines the direction of gene library evolution (Hightower et al., 1996; Perelson et al., 1996), other work by Hightower et al. (1995) investigated what determines the direction of gene library evolution (i.e. where the selection pressure of gene library evolution is aimed). This question is about what the evolution strategy of the human immune system is when the goal is that a dynamically changing, vast number of antigens should be covered by a much smaller number of antibodies. This work showed that the binary antibodies of an AIS evolve toward a balancing point between maximum coverage of the antigen space and the least overlapping coverage of antibody space.

Oprea and Forrest (1998; 1999) advanced the work by Hightower et al. (1995) further and studied the diversity required of a gene library in the human immune system, and the role of gene library evolution. This work verified that antibody evolution gets slower and evolves to cover more random antigen niches when the pathogen size (exposed to antibodies) gets smaller. In this case, the immune system does not let the gene library evolve toward existing antigen-specific niches. Instead, it evolves toward covering a coarse-grained antigen space. This understanding was drawn from the observation that the survival probability of the organism (the average fitness of immune systems) increased logarithmically with the size of its germ line-encoded antibody repertoire (the number of antibody genes in the library). This result clearly illustrated that the gene library diversity is not maintained for specific recognition of individual pathogens, but rather that it evolved to cover a coarse-grain encoding of the regions of the pathogen universe that the species has encountered. A later study by the same author (Oprea, 1999b) investigated the role of hypermutation by investigating its mutating targets. Her experiments showed that hypermutation usually targets the antigen-binding regions of a gene library and that the mutation results often led to fine-tuning of the antigen-binding parts.

Footnote 1: Germ line manipulation requires the altering of the DNA in the reproductive cells which make the fertilized egg, so that the genetic changes will be copied into every cell of the future adult, including his or her reproductive cells.

In summary, the gene library evolves by getting indirect feedback from what the human immune system has learned during its lifetime. Germ line diversity that is obtained through gene library evolution is somewhat directed toward covering a coarse-grain antigen space, and learning through hypermutation leads the immune system to fine-tune its detection of the existing antigens.

3.2 GENE LIBRARY EVOLUTION BY ARTIFICIAL IMMUNE SYSTEMS

There are two methods employed by the currently available AIS’s in order to evolve their gene libraries. The first approach directs gene library evolution through the Baldwin effect and the second approach allows direct feedback from learning results to a gene library. The first approach initially builds a gene library that is a collection of previously known antibody genes. This initial gene library provides a certain degree of antigen diversity, but a satisfactory level of antigen diversity is obtained only through gene expression and learning using hypermutation. Although this approach does not directly alter the genes in the gene library, it still allows the gene library to evolve via the Baldwin effect. The second approach often does not distinguish a gene library from an antibody population. It treats a currently existing antibody population as a gene library and thus concentrates on antibody population evolution. As a result, this approach ignores the difference between lifetime learning and evolution over generations, and instead emphasises the study of whether hypermutation accelerates antibody population evolution and controls the direction of evolution. These two different approaches have been implemented in various ways depending on the adopted AIS model.

One popular group of AIS is the extension of a conventional genetic algorithm (GA). Researchers added several immune features to the GA in order to complement some weaknesses found in a conventional GA (Dasgupta et al., 1999a; Hart and Ross, 1999; Gaspar and Collard, 1999; Hajela and Yoo, 1999; Potter and De Jong, 1998; Nikolaev et al., 1999; Michaud et al., 2001). The static clonal selection algorithm introduced in previous work (Kim and Bentley, 2001) belongs to this group. Among these systems, Hart and Ross (1999) and Michaud et al. (2001) used a gene library that is separate from the antibody population. The gene libraries used in these works are collections of partial solutions, and new antibody solutions were generated by concatenating these partial solutions. While Hart and Ross (1999) generated new antibodies using this method exclusively, Michaud et al. (2001) generated only the initial antibody population using a gene library, and the antibody population was then evolved using a conventional GA. However, neither investigated whether these approaches have additional benefits compared with others that did not differentiate the antibody population from the gene library. These other methods typically generated new antibodies using the crossover and mutation operators of a GA, and antibodies in the population were continuously replaced with evolved new ones (Dasgupta et al., 1999a; Gaspar and Collard, 1999; Hajela and Yoo, 1999; Potter and De Jong, 1998; Nikolaev et al., 1999). Among these latter approaches, only Gaspar and Collard (1999) employed hypermutation, which might provide fine-tuned diversity of the antibody population that can cover currently existing antigens. The AIS developed in (Gaspar and Collard, 1999) cloned the best n% of antibodies and mutated them with a high rate. Of these mutated antibodies, only the ones having improved fitness values were entered into the antibody population for selection. They did not study the effect of hypermutation in terms of antibody evolution.

Another popular type of AIS, which uses network theory, usually applies a mutation operator to the best n% of antibodies in an antibody network, and mutated antibodies are tested to decide whether they are added to an existing immune network (Timmis, 2001; Fukuda et al., 1998; Watanabe et al., 1998; Ishida, 1996; Lee et al., 1999). Of these AIS’s, Timmis (2001) and Fukuda et al. (1998) did not use a gene library to create initial antibody nodes while others (Fukuda et al., 1998; Watanabe et al., 1998; Lee et al., 1999; Ishida, 1996) initialised antibody nodes with already known local solutions, which can be regarded as a gene library. The systems using a gene library typically developed an artificial immune network in order to find a global solution under a dynamically changing environment by finding an optimal combination of existing local solutions as a global solution. Among these systems, Timmis (2001), Fukuda et al. (1998), and Lee et al. (1999) applied a high rate of mutation when cloning new antibodies, and only Timmis (2001) investigated the different effects of different mutation rates. He showed that the network connectivity declined as the mutation rate got higher, and that a higher mutation rate thus contributes to increasing the diversity of the antibody network.

Other work by De Castro and Von Zuben (2000; 2001) developed an AIS by mimicking exactly the clonal selection process without differentiating the gene library and the antibody population. When this system cloned new antibodies, it applied various mutation rates to each antibody depending on its affinity. It assigned smaller mutation rates when affinity is higher, with the intention of increasing the diversity by correcting poorly performing antibodies. However, this work investigated neither the effect of mutation on the antibody population evolution, nor the need to have a separate gene library to accelerate antibody evolution.

4 EXTENDED DynamiCS: SIMULATING GENE LIBRARY EVOLUTION USING HYPERMUTATION

4.1 ALGORITHM DESCRIPTION

The problem found in previous experimental results was that the extended DynamiCS obtained high TP rates only when it produced a large amount of memory detector co-stimulation. In contrast, with a smaller amount of memory detector co-stimulation, the extended DynamiCS struggled to show high TP rates. However, the related work introduced in the previous section suggests that applying hypermutation to immune cells during cloning is a necessary mechanism to fine-tune current immune cells to target non-self antigen binding regions. As a way of resolving the problem of excessive co-stimulation, the extended DynamiCS applies this mechanism.

If (immature detector population size + mature detector population size < non-memory detector pop size) {
    Do {
        if (number of deleted memory detectors > 0 && mutation rate != 0) {
            Select a deleted memory detector randomly and create its mutant
            Add this mutant to immature detector population.
        }
        else
            Generate a random detector and add it to an immature detector population
    } Until (immature detector population size + mature detector population size = non-memory detector pop size)
}

Figure 1. Modified Pseudo Code for Extended DynamiCS
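The refill step in figure 1 can be sketched in Python as follows. This is a minimal illustration under assumptions that are not taken from the paper: detectors are represented as lists of bits, the hypermutation operator flips each bit independently with probability `mutation_rate`, and the names `hypermutate` and `refill_immature` are hypothetical.

```python
import random
from typing import List, Optional

Genes = List[int]  # assumption: a detector is a fixed-length list of 0/1 genes


def hypermutate(genes: Genes, mutation_rate: float, rng: random.Random) -> Genes:
    """Return a mutant copy; each bit flips with probability mutation_rate."""
    return [1 - g if rng.random() < mutation_rate else g for g in genes]


def refill_immature(
    immature: List[Genes],
    mature_count: int,
    deleted_memory: List[Genes],
    target_size: int,        # the "non-memory detector pop size"
    mutation_rate: float,
    gene_length: int,
    rng: Optional[random.Random] = None,
) -> List[Genes]:
    """Top up the immature population as in figure 1."""
    rng = rng or random.Random()
    while len(immature) + mature_count < target_size:
        if deleted_memory and mutation_rate != 0:
            # Seed from the "virtual gene library" of deleted memory detectors.
            parent = rng.choice(deleted_memory)
            immature.append(hypermutate(parent, mutation_rate, rng))
        else:
            # Fall back to a purely random detector.
            immature.append([rng.randint(0, 1) for _ in range(gene_length)])
    return immature
```

The `while` loop checks the population size before each addition, which covers both the outer `If` and the inner `Do ... Until` of the pseudo code. The mutants are only added to the immature population, so they still have to pass negative selection before they can become mature detectors.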

The low TP rates obtained by the extended DynamiCS can be interpreted as originating from the coarse-grained non-self antigen niche coverage of activating detectors. Thus, if these detectors were more fine-tuned to cover existing non-self antigens, the extended DynamiCS could have higher TP rates without necessarily having a large number of activating detectors. In order to investigate the effect of hypermutation only, the extended DynamiCS does not create a separate gene library (i.e., a collection of useful detector genes). Instead, it continues to maintain three detector populations (immature, mature and memory detector populations) and treats a portion of the memory detector population as a gene library. In order to let memory detectors evolve towards existing non-self antigens without binding self antigens, the extended DynamiCS clones memory detectors by applying a hypermutation operator to deleted memory detectors. These mutants of deleted memory detectors are added to an immature detector population for the negative selection test. Immature detectors in DynamiCS have previously always been randomly generated for negative selection. Now the extended DynamiCS produces immature detectors from mutants of deleted memory detectors if there are deleted memory detectors available, or randomly otherwise. Hence, this further extension of DynamiCS employs a “virtual gene library” dynamically made from mutations of deleted memory detectors. Through the various selection mechanisms and the hypermutation operator, the seed immature detectors produced by the virtual gene library evolve over time, just as the immature, mature and memory detectors evolve in their separate populations. This modification is summarised in the pseudo code shown in figure 1.

While the mutation rate used in GAs is very low (around 0.01~0.05%), the extended DynamiCS employs much higher rates (0.1% and 0.2%) for hypermutation. This also follows the mutation strategy of the human immune system. The human immune system deliberately uses a higher mutation rate in order to maintain its diversity (Paul, 1993). Similarly, adopting a higher rate of mutation is expected to lead detectors to explore new non-self antigen niches and thus escape from existing self antigen niches. The following sections will study how an unusually large mutation rate affects the performance of the extended DynamiCS.

It should also be noted that hypermutation is applied to deleted memory detectors, not to existing memory detectors. This part is a slight variation on the human immune system. The human immune system clones successful memory detectors and spreads them to other lymph nodes distributed in the body. These new cloned detectors are expected to detect associative non-self antigens that share some non-self antigen patterns detected by previous detectors but do not necessarily have the same non-self antigen patterns as the previous detectors. In other words, cloned detectors are expected to detect new antigens belonging to a new antigen cluster as soon as possible. During this process, the self-tolerance of new mutants is maintained by the helper T-cells. However, the extended DynamiCS does not have a separate helper T-detector population to confirm the self-tolerance of newly cloned detectors. Therefore, the extended DynamiCS uses hypermutation in a way that generates new detectors more tuned to target non-self antigen detection, while at the same time still effectively avoiding self antigen detection. Memory detectors are deleted when they match self antigens of the current antigen cluster, but the fact that they managed to become memory detectors at all implies that they hold valid information about non-self antigens in previous clusters. By mutating these and reusing them in the form of a virtual gene library to seed new immature detectors, this evolved information is being retained and fine-tuned by the system.

4.2 EXPERIMENT RESULTS

Two series of experiments were performed in order to investigate the effects of hypermutation on the true positive (TP) and false positive (FP) rates of the extended DynamiCS introduced here. These experiments used the same parameter values as the experiments of previous work (Kim and Bentley 2002b), which are summarised in table 1.

Table 1. Parameter values used for DynamiCS experim ents

The first series of experiments was performed by va rying A values with mutation rate = 0.1 and the second ser ies was performed with mutation rate = 0.2. Figure 2 an d 3 show the average TP and FP rates of each series of experiments after running them five times. The X-a xes of these graphs represent the number of generations an d the Y-axes indicate detection rates. Each graph has tw o lines, one displaying a True Positive (TP) rate and anothe r showing a False Positive (FP) rate. The grid lines on the X-axis were placed at every N generations for N = 30. Each experiment was also run for maximum 2000 generations.

Table 2. Average numbers of surviving, generated and deleted memory detectors during 2000 generations, and average number of memory detector co-stimulations per generation, for the extended DynamiCS with mutation rate = 0.1. The mean values are followed by the variances in parentheses.

Extended DynamiCS with Mutation Rate = 0.1

| A | Surviving Memory Detectors | Generated Memory Detectors | Deleted Memory Detectors | Memory Detector Co-stimulations per Generation |
|---|---|---|---|---|
| 5 | 45.5 (21.67) | 535.5 (8869.67) | 490 (8448.67) | 40.48 (14.35) |
| 10 | 37 (4) | 376 (1444.67) | 339 (1456.67) | 31.39 (1.43) |
| 20 | 32.5 (7) | 259.5 (176.33) | 227 (172) | 28.08 (2.99) |
| 40 | 27.5 (24.5) | 203.5 (14964.5) | 176 (13778) | 22.56 (6.66) |

Table 3. Average numbers of surviving, generated and deleted memory detectors during 2000 generations, and average number of memory detector co-stimulations per generation, for the extended DynamiCS with mutation rate = 0.2. The values in parentheses are variances.

Extended DynamiCS with Mutation Rate = 0.2

| A | Surviving Memory Detectors | Generated Memory Detectors | Deleted Memory Detectors | Memory Detector Co-stimulations per Generation |
|---|---|---|---|---|
| 5 | 44.75 (8.25) | 264.5 (94.33) | 219.75 (88.25) | 39.15 (10.52) |
| 10 | 32.75 (24.92) | 193.5 (539) | 160.75 (393.58) | 27.52 (14.62) |
| 20 | 29 (8.67) | 126.5 (53.67) | 97.5 (67) | 24.48 (6.94) |
| 40 | 19.5 (0.33) | 98 (1078) | 78.5 (1013) | 16.75 (1.12) |

| Parameter | Value |
|---|---|
| Tolerisation Period (T) | 30 |
| Life Span of Mature Detectors (L) | 10 |
| Activation Threshold of Mature Detectors (A) | {5, 10, 20, 40} |
| Number of Generations that Antigens are Selected from the Same Cluster (N) | 30 |


The effects of hypermutation are clearly revealed when these results are compared with the results obtained in previous work (Kim and Bentley, 2002b). In figures 2 and 3, FP rates are consistently low except in one case, where A = 5 and mutation rate = 0.2. The differences in TP rates between the two mutation rates are clearly noticeable when A has a larger value. For instance, when A is 40 without mutants of memory detectors, TP rates ranged between 0.5 and 0.9. On the other hand, when A is 40 with mutation rate = 0.2, TP rates increase so that they range between 0.85 and 0.95 (see figure 3). More importantly, this improvement in TP rates was obtained without an increase in FP rates. The increase in TP rates is much more noticeable when the mutation rate is 0.2, although an increase can also be seen when A is 40 with mutation rate = 0.1 (see figure 2). Thus, the results verify that hypermutation affects the behaviour of extended DynamiCS in a positive way: TP rates increase while FP rates remain low.

In addition, when A = 40 with mutation rate = 0.2, which shows high TP rates and low FP rates, extended DynamiCS still kept the average number of memory detector co-stimulations per generation as small as that seen previously, when mutants of memory detectors were absent (Kim and Bentley, 2002b). This result can be seen in tables 2 and 3, which show the numbers of surviving, generated and deleted memory detectors over 2000 generations when the mutation rate is 0.1 and 0.2 respectively. These numbers are averages over five runs. In both cases, extended DynamiCS required the smallest number of memory detector co-stimulations when A = 40. Furthermore, when extended DynamiCS had the larger mutation rate, 0.2, it performed fewer memory detector co-stimulations than when it had a mutation rate of 0.1.
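The claims in this paragraph can be checked directly against the mean values reported in tables 2 and 3. The short sketch below transcribes those means (variances omitted) and checks three things: that the surviving count equals generated minus deleted in every row, that co-stimulation is smallest at A = 40, and that it is lower at mutation rate 0.2 for every A. The dictionary layout and helper names are illustrative choices only.

```python
# Mean values keyed by activation threshold A:
# (surviving, generated, deleted, co-stimulations per generation)
TABLE_2 = {  # mutation rate = 0.1
    5: (45.5, 535.5, 490.0, 40.48),
    10: (37.0, 376.0, 339.0, 31.39),
    20: (32.5, 259.5, 227.0, 28.08),
    40: (27.5, 203.5, 176.0, 22.56),
}
TABLE_3 = {  # mutation rate = 0.2
    5: (44.75, 264.5, 219.75, 39.15),
    10: (32.75, 193.5, 160.75, 27.52),
    20: (29.0, 126.5, 97.5, 24.48),
    40: (19.5, 98.0, 78.5, 16.75),
}


def surviving_is_generated_minus_deleted(table: dict) -> bool:
    """Check surviving == generated - deleted for every row."""
    return all(abs(generated - deleted - surviving) < 1e-9
               for surviving, generated, deleted, _ in table.values())


def lowest_costimulation_threshold(table: dict) -> int:
    """Return the A value with the fewest co-stimulations per generation."""
    return min(table, key=lambda a: table[a][3])


def costimulation_drop(a: int) -> float:
    """Co-stimulations saved per generation by moving from rate 0.1 to 0.2."""
    return TABLE_2[a][3] - TABLE_3[a][3]
```

With these values the drop in co-stimulations per generation is largest at A = 40 (22.56 versus 16.75).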

Figure 2. TP and FP rates when A varies and T = 30, L = 10, N = 30 with mutation rate = 0.1. (Four panels: A = 5, A = 10, A = 20 and A = 40. X-axis: generations, 0 to 1900; Y-axis: detection rate, 0 to 1.)

Figure 3. TP and FP rates when A varies and T = 30, L = 10, N = 30 with mutation rate = 0.2. (Four panels: A = 5, A = 10, A = 20 and A = 40. X-axis: generations, 0 to 1900; Y-axis: detection rate, 0 to 1.)


To summarise, the two series of experimental results show that TP rates increased when immature detectors were generated by applying a hypermutation operator to deleted memory detectors. Furthermore, the system maintained low FP rates and a small number of memory detector co-stimulations. These positive effects were more clearly visible when the larger mutation rate was applied.

5 DISCUSSION OF DYNAMICS

DynamiCS was introduced so that our AIS fulfils two properties required by an effective intrusion detection system (IDS): learning stabilised self behaviours when presented with only a small subset of self antigens at one time, and learning sudden changes in converged self behaviours. To provide these features, DynamiCS employs several novel components: immature, mature and memory detector populations; a tolerisation period; an activation threshold; a mature detector life span; mature and memory detector co-stimulation; and the application of hypermutation to generate immature detectors. All of these components were designed by following mechanisms that exist in the human immune system, and together they led the AIS to exhibit the two desired properties.
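To make the interaction of these components concrete, the following is a heavily simplified sketch of a single detector's life cycle, assuming the parameter values of table 1 as defaults. It is not the DynamiCS implementation: the `Stage` enum, the `step` function, and the use of a single `matched_self` flag to stand in for the outcome of negative selection and co-stimulation are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    IMMATURE = "immature"
    MATURE = "mature"
    MEMORY = "memory"
    DELETED = "deleted"


@dataclass
class Detector:
    stage: Stage = Stage.IMMATURE
    age: int = 0          # generations spent in the current stage
    match_count: int = 0  # antigen matches accumulated while mature


def step(det: Detector, matched_self: bool, n_matches: int,
         T: int = 30, A: int = 5, L: int = 10) -> None:
    """Advance one detector by one generation (simplified sketch).

    matched_self: the detector matched a self antigen this generation
                  (only consulted for immature and memory detectors here).
    n_matches:    antigen matches this generation (used while mature).
    """
    if det.stage is Stage.IMMATURE:
        if matched_self:                  # fails tolerisation
            det.stage = Stage.DELETED
            return
        det.age += 1
        if det.age >= T:                  # survived tolerisation period T
            det.stage, det.age = Stage.MATURE, 0
    elif det.stage is Stage.MATURE:
        det.age += 1
        det.match_count += n_matches
        if det.match_count >= A:          # reached activation threshold A
            det.stage, det.age = Stage.MEMORY, 0
        elif det.age >= L:                # life span L expired
            det.stage = Stage.DELETED
    elif det.stage is Stage.MEMORY:
        if matched_self:                  # no longer valid: delete
            det.stage = Stage.DELETED
```

In the extended system discussed in this paper, detectors that reach `DELETED` from the `MEMORY` stage would be the ones passed to the hypermutation operator to seed new immature detectors.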

Many of these components are based on a different AIS, called LYSIS, proposed by Hofmeyr (1999) and Hofmeyr and Forrest (2000). LYSIS is also equipped with three detector populations (immature, mature and memory), a tolerisation period, an activation threshold, co-stimulation and a mature detector life span. Hofmeyr (1999) and Hofmeyr and Forrest (2000) tested LYSIS against network traffic headers collected over 50 days, consisting of 3900 unique self strings. To scale to this number of self strings, they developed LYSIS in a distributed environment, in which fifty different hosts generated a total of 5000 immature detectors per day. Like DynamiCS, LYSIS dynamically generated immature detectors and started to monitor new antigens after the first tolerisation period. Although this system was tested against real network headers, the scenario used in these tests was limited to the first real-environment scenario studied in this work: learning stabilised self behaviours from only a small subset of self antigens at one time. Thus, DynamiCS is the only AIS that employs the components introduced in this work and has been tested on another important real IDS scenario: quickly learning any sudden changes in converged self behaviours. Under this scenario, DynamiCS was capable of detecting non-self antigens at a satisfactory level without losing its self-tolerance, and this was achieved by applying hypermutation, which is not adopted by LYSIS.

Hofmeyr (1999) and Hofmeyr and Forrest (2000) investigated a way to tune LYSIS behaviour to obtain the desired TP and FP rates. This study focused on choosing an appropriate tolerisation period, activation threshold and decay rate. It should be noted that the decay rate used in LYSIS was not adopted by DynamiCS, because the number of parameters used in DynamiCS already seemed large enough to make controlling system behaviour difficult. Although the decay rate was introduced in LYSIS in order to replace detectors in a more dynamic way, DynamiCS managed to provide a similar effect without this parameter by using a gene library evolution model with hypermutation.

6 CONCLUSION

As one way to reduce the high FP rates caused by memory detectors, DynamiCS was extended by eliminating memory detectors when they showed a poor degree of self-tolerance to new antigens (Kim and Bentley, 2002b). This extended system was tested to determine whether surviving memory detectors still caused seriously high FP rates. The test results showed that deleting memory detectors on the basis of their self-antigen detection dramatically decreased the FP rates. However, this method required a larger amount of co-stimulation in order to gain these benefits. A large amount of co-stimulation can render the system weak for intrusion detection. This disadvantage demanded a further extension of DynamiCS.

In order to resolve this problem, this paper explored the use of hypermutation in DynamiCS to produce the effect of gene library evolution. This additional extension was designed to fine-tune generated memory detectors so that the system obtained higher TP rates without increasing the amount of co-stimulation. The gene library evolution was modelled by producing immature detectors via hypermutation on deleted memory detectors. Thus a “virtual gene library”, made from mutations of deleted memory detectors, was maintained. The new extension was tested to determine whether it achieved high TP rates without increasing the amount of co-stimulation. The test results confirmed that hypermutation enabled the evolution of the virtual gene library and thus produced immature detectors that were better tuned to cover existing non-self antigens.

References

Kim, J. and Bentley, P. J. (2002a) Towards an Artificial Immune System for Network Intrusion Detection: An Investigation of Dynamic Clonal Selection. Proceedings of Congress on Evolutionary Computation, pp. 1015-1020, 2002.

Kim, J. and Bentley, P. J. (2002b) Immune Memory in the Dynamic Clonal Selection Algorithm. Submitted to the first International Conference on Artificial Immune Systems (ICARIS).

Hofmeyr, S., (1999) An Immunological Model of Distributed Detection and Its Application to Computer Security, PhD Thesis, Dept of Computer Science, University of New Mexico, 1999.

Hofmeyr, S. A., and Forrest, S., (2000) “Architecture for an Artificial Immune System”, Evolutionary Computation, Vol. 7, No. 1, Morgan-Kaufmann, San Francisco, CA, pp. 1289-1296, 2000.

Paul, W. E., (1993) “The Immune System: An Introduction”, Fundamental Immunology 3rd Ed., W. E. Paul (Ed), Raven Press Ltd, 1993.

Sompayrac, L., (1999) How the Immune System Works, Blackwell Science, 1999.

Baldwin, J. M., (1896) “A New Factor in Evolution”, American Naturalist, Vol. 30, pp. 441-451.

Hightower, R., Forrest, S., and Perelson, A. S., (1996) “The Baldwin Effect in the Immune System: Learning by Somatic Hypermutation”, in R. K. Belew and M. Mitchell, (eds.), Adaptive Individuals in Evolving Populations, Addison-Wesley, Reading, MA, pp. 159-167, 1996.

Perelson, A. S., Hightower, R., and Forrest, S., (1996) “Evolution and Somatic Learning in V-Region Genes”, Research in Immunology, Vol. 147, pp. 202-208.

Hightower, R., Forrest, S., and Perelson, A. S., (1995) “The Evolution of Emergent Organization in Immune System Gene Libraries”, Proceeding of the Sixth International Conference on Genetic Algorithms, L. J. Eshelman (Ed.), Morgan Kaufmann, San Francisco, CA, pp. 344-350, 1995.

Oprea, M., and Forrest, S., (1998) “Simulated Evolution of Antibody Libraries Under Pathogen Selection”, Proceeding of IEEE International Conference on Systems, Man and Cybernetics, 1998.

Oprea, M. and Forrest, S., (1999) "How the Immune System Generates Diversity: Pathogen Space Coverage with Random and Evolved Antibody Libraries.", Proceeding of Genetic and Evolutionary Computation Conference (GECCO), July,1999.

Dasgupta, D., Cao, Y., and Yang, C., (1999) “An Immunogenetic Approach to Spectra Recognition”, Proceeding of Genetic and Evolutionary Computation Conference (GECCO’ 99), July 13-17, pp149-155, 1999.

Hart, E. and Ross, P., (1999) “An Immune System Approach to Scheduling in Changing Environments”, Proc. of Genetic and Evolutionary Computation Conference (GECCO’99), pp.1559-1566.

Gaspar, A., and Collard, P., (1999) “From GAs to Artificial Immune Systems: Improving Adaptation in Time Dependent Optimisation”, Proceeding of CEC99, 1999.

Hajela, P., and Yoo, J. S., (1999) “Immune Network Modelling in Design Optimization”, in New Ideas in Optimization, (Eds.) D. Corne, M. Dorigo, & F. Glover, McGraw Hill, London, pp.203-215, 1999.

Potter, M. A. and De Jong, K. A., (1998) “The Coevolution of Antibodies for Concept Learning”, Proceeding of the fifth Intl. Conference on Parallel Problem Solving From Nature, pp. 530-539, 1998.

Mitchell, T., (1997) Machine Learning, McGraw-Hill, 1997.

Nikolaev, N., Iba, H., and Slavov, V., (1999) “Inductive Genetic Programming with Immune Network Dynamics”, Advances in Genetic Programming 3, MIT Press, Chapter 15, pp. 335-376, 1999.

Michaud, S. R., et al., (2001) “Protein Structure Prediction with EA Immunological Computation”, Proceeding of Genetic and Evolutionary Computation Conference (GECCO’2001), July 7-11, pp. 1367-1874, 2001.

Kim, J. and Bentley, P. J. (2001). Towards an Artificial Immune System for Network Intrusion Detection: An Investigation of Clonal Selection with a Negative Selection Operator. In Proc. of CEC2001, the Congress on Evolutionary Computation, Seoul, Korea, May 27-30, 2001. pp. 1244-1252.

Timmis, J., (2001) Artificial Immune Systems: a Novel Data Analysis Technique Inspired by the Immune Network Theory , PhD Thesis, Dept. of Computer Science, University of Wales, Aberystwyth, 2001.

Fukuda, T., Mori, K., and Tsukiyama, M., (1998) “Parallel Search for Multi-Modal Function Optimization with Diversity and Learning of Immune Algorithm”, Artificial Immune Systems and Their Applications, (Ed) Dasgupta, D., Springer-Verlag, Berlin, pp. 210-220, 1998.

Watanabe, Y., Ishiguro, A., Shirai, Y., and Uchikawa, Y., (1998) "Emergent Construction of Behavior Arbitration Mechanism Based on the Immune System", Proceeding of ICEC'98, pp. 481-486, 1998.

Ishida, Y., (1996) “An Immune Network Approach to Sensor-Based Diagnosis by Self-Organization”, Complex Systems, Vol. 10:1, pp. 73-90.

Lee, W., Park, C., and Stolfo, S. J., (1999) “Towards Automatic Intrusion Detection Using NFR”, to appear in the Proceeding of 1st USENIX Workshop on Intrusion Detection and Network Monitoring, 1999.

De Castro, L. N., and Von Zuben, F. J., (2000) “The Clonal Selection Algorithm with Engineering Applications”, Proceeding of Artificial Immune System Workshop, Genetic and Evolutionary Computation Conference (GECCO’ 2000), pp. 36-37.

De Castro, L. N., and Von Zuben, F. J., (2001) “AiNet: an Artificial Immune Network for Data Analysis”, (Book chapter in) Data Mining: A Heuristic Approach, (Eds) Abbass, H. A., Sarker R. A., Newton, C. S., Idea Group Publishing, 2001.