09/20/04 Introducing Proteins into Genetic Algorithms – CSIMTA'04

Introducing “Proteins” into Genetic Algorithms

Virginie LEFORT, Carole KNIBBE, Guillaume BESLON, Joël FAVREL

INSA-IF/PRISMa, FRANCE

Artificial Life and Behaviour Team (ALAB)

Introduction: Origin of species

• Natural (Darwinian) evolution
  • Variation of the genotype (→ variation of the phenotype)
  • Extinction of the less fit individuals
  • Preservation (and diffusion) of favourable variations
  • Rejection of unfavourable variations
• Information support (genotype)
  • DNA
  • Genes (DNA coding sequences)
• Genotype to phenotype mapping (simplified!)
  • Transcription-translation (genes → proteins)
  • Biochemistry (proteins → cells)

Principle of genetic algorithms

• Mimic Darwinian evolution in the context of parametric optimization
  • All parameters are aligned to build a (genetic) sequence
  • An artificial population is randomly generated
  • Individuals reproduce themselves (generation loop)
  • Selection mechanism based on a fitness function
  • The genetic sequence can be modified during the reproduction process (mutations, crossover)
• Genetic algorithms are very efficient
  • They can be applied to a wide range of problems, even when no a priori knowledge is available
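The loop described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: real-valued genes in [0, 1], truncation selection, elitism, one-point crossover and Gaussian mutation are all choices made here, not details given in the slides, and `run_ga` and `sphere` are hypothetical names.

```python
import random

def run_ga(fitness, n_params, pop_size=50, generations=100,
           mutation_rate=0.05, crossover_rate=0.6, seed=0):
    """Minimise `fitness` over genomes made of n_params values in [0, 1]."""
    rng = random.Random(seed)
    # Random initial population: each genome is the aligned parameter list.
    pop = [[rng.random() for _ in range(n_params)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness)
        # Assumption: truncation selection, the better half may reproduce.
        parents = scored[:pop_size // 2]
        children = [scored[0][:]]  # assumption: elitism keeps the best genome
        while len(children) < pop_size:
            a, b = rng.choice(parents), rng.choice(parents)
            child = a[:]
            if n_params > 1 and rng.random() < crossover_rate:
                cut = rng.randrange(1, n_params)  # one-point crossover
                child = a[:cut] + b[cut:]
            # Gaussian mutation of individual genes, clipped to [0, 1].
            child = [min(1.0, max(0.0, g + rng.gauss(0.0, 0.1)))
                     if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = children
    return min(pop, key=fitness)

def sphere(genome):
    """Toy fitness to minimise: squared distance to the point (0.5, ..., 0.5)."""
    return sum((g - 0.5) ** 2 for g in genome)

best = run_ga(sphere, n_params=5)
```

Note that the genome length and the meaning of each position are fixed before the run starts, which is the property the following slides question.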

Principles of genetic algorithms

• The reproduction loop:

[Diagram: cycle linking Selection, Reproduction and Fitness Evaluation]

But ...

• The genotype structure is chosen initially (and arbitrarily)
• The genotype structure constrains the evolutionary process
  • Close genes evolve together even though the corresponding parameters are independent
  • Distant genes evolve separately even though the corresponding parameters are dependent
  • Building blocks hypothesis (J.H. Holland)
• The algorithm precision is also chosen initially
  • Precision depends on the parameter encoding
  • It stays fixed throughout the evolutionary process
  • Precision is generally the same for all parameters
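Both constraints can be put into numbers with a small sketch. It assumes a classical fixed-length binary encoding and a one-point crossover whose cut position is drawn uniformly; the function names are hypothetical and are not taken from the slides.

```python
def resolution(lo, hi, n_bits):
    """Smallest step between two values representable by an n_bits gene on [lo, hi]."""
    return (hi - lo) / (2 ** n_bits - 1)

def split_probability(locus_a, locus_b, n_genes):
    """Chance that a uniformly drawn one-point cut falls between two gene positions."""
    possible_cuts = n_genes - 1          # one cut position between each pair of neighbours
    return abs(locus_a - locus_b) / possible_cuts

# Precision is set by the encoding, before evolution starts:
step_8_bits = resolution(0.0, 1.0, 8)       # 1/255
step_16_bits = resolution(0.0, 1.0, 16)     # 1/65535

# Linkage depends only on position in the genome, not on parameter dependence:
neighbours = split_probability(0, 1, 10)    # 1/9, neighbours are rarely separated
far_apart = split_probability(0, 9, 10)     # 1.0, the two ends are always separated
```

Under these assumptions, two dependent parameters placed at opposite ends of the genome are separated by every crossover, whatever the problem structure.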

Why?

• The genotype to phenotype mapping is too simple
  • One gene → one parameter
  • “Linear” transformation
• The algorithm depends on the genetic structure
• The genetic structure cannot evolve

Gene 1    Gene 2    Gene 3    Gene 4    Gene 5
   ↓         ↓         ↓         ↓         ↓
Param 1   Param 2   Param 3   Param 4   Param 5

Genetic structure constraints

• In genetic algorithms the genome is directly mapped into a phenotype
• The genome structure cannot be modified
  • Under-specified parameters

Gene 1    Gene 2    Gene 3              Gene 5
Param 1   Param 2   Param 3   ???       Param 5

Genetic structure constraints

• In genetic algorithms the genome is directly mapped into a phenotype
• The genome structure cannot be modified
  • Over-specified parameters

Gene 1    Gene 2    Gene 3    Gene 4 / Gene 4’    Gene 5
Param 1   Param 2   Param 3   ???                 Param 5

Genetic structure constraints

• In genetic algorithms the genome is directly mapped into a phenotype
• The genome structure cannot be modified
  • Incoherent crossing-over

Parent:      Gene 1    Gene 2    Gene 3    Gene 4    Gene 5
Parent:      Gene 3’   Gene 5’   Gene 1’   Gene 4’   Gene 2’
Offspring:   Gene 1    Gene 2    Gene 1’   Gene 4’   Gene 2’
Parameters:  ???       ???       ???       Param 4’  ???
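The incoherent crossing-over case can be reproduced with a short sketch. The representation of a gene as an `(identifier, allele)` pair and the helper names are assumptions made for illustration; the two parent orders and the resulting offspring are the ones shown on the slide.

```python
def one_point_crossover(parent_a, parent_b, cut):
    """Head of parent_a followed by the tail of parent_b."""
    return parent_a[:cut] + parent_b[cut:]

def genes_per_parameter(genome, n_params=5):
    """List, for each parameter number, the alleles that claim to encode it."""
    found = {p: [] for p in range(1, n_params + 1)}
    for gene_id, allele in genome:
        found[gene_id].append(allele)
    return found

def status(alleles):
    """A parameter is well defined only if exactly one gene encodes it."""
    if not alleles:
        return "missing"
    return "ok" if len(alleles) == 1 else "ambiguous"

parent_a = [(1, "g1"), (2, "g2"), (3, "g3"), (4, "g4"), (5, "g5")]
parent_b = [(3, "g3'"), (5, "g5'"), (1, "g1'"), (4, "g4'"), (2, "g2'")]

child = one_point_crossover(parent_a, parent_b, cut=2)
report = {p: status(a) for p, a in genes_per_parameter(child).items()}
```

With a cut after the second gene, the child carries two copies of genes 1 and 2, none of genes 3 and 5, and a single gene 4’, which matches the row of question marks on the slide.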

… in biology?

• In living beings, different genetic structures give rise to different organisms on the basis of the same translation mechanism …
  • The genetic principles of the C. elegans worm are (nearly) the same as for bacteria or humans …
  • The rules are the same in (nearly) the whole living kingdom …
• The gene number, size, position (locus), order … are free to evolve
• The information sources are (only) the coding sequences
• Why do we lose this property in GAs?

The proteome

• In biology there is an intermediate level between the genotype and the phenotype:

Gene 1   Gene 2   Gene 3   Gene 4   Gene 5   →   Proteome   →   Phenotype

• The genotype structure is lost …
• Genotype and phenotype structures can evolve separately ...

The RBF-Gene algorithm: Basic ideas

• Back to the “biological” gene definition
  • The genome is a succession of coding and non-coding sequences
  • Coding sequences (genes) are identified by their local context
  • Each gene expresses a protein whose function is “only” determined by the local sequence
  • The local sequence is translated thanks to a “genetic code”
  • Proteins interact to produce the phenotype
• The RBF-Gene model is based on:
  • A “protein layer” between genotype and phenotype
  • A “genetic” code to find the genes and the associated “protein” functions
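Identifying genes by their local context can be sketched as a scan for start and stop markers. The single-letter markers 'A' (Start) and 'B' (Stop) are the ones the deck gives with its genetic code; the handling of edge cases (a start base inside an open gene is ignored, a gene without a stop is dropped) and the name `find_genes` are assumptions made here.

```python
def find_genes(genome, start="A", stop="B"):
    """Return the coding sequences lying between a start base and the next stop base."""
    genes, current = [], None
    for base in genome:
        if current is None:
            if base == start:
                current = []                  # a start base opens a gene
        elif base == stop:
            genes.append("".join(current))    # a stop base closes it
            current = None
        elif base != start:                   # assumption: nested starts are ignored
            current.append(base)
    return genes                              # an unterminated gene is dropped

# Hypothetical genome: two genes separated by non-coding bases.
genes = find_genes("FEACDBGGAEFFEBH")
```

Because genes are located by content and not by position, inserting or deleting non-coding bases, or reordering genes, leaves the set of extracted genes unchanged.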

Our “protein” layer

• The phenotype is an $R^n \to R^m$ function (regression function)
• The RBF-Gene model introduces an intermediate layer between the parameters and the regression function
  • The function is a linear combination of elementary kernel functions
  • The kernel shape is predefined (e.g. Gaussian functions, sine, …)
  • One coding sequence (one gene) → one kernel (possibly not effective)
  • The genetic code is used to translate the gene sequence into kernel parameters
• Example: $R \to R$ Gaussian kernels
  • Three parameters per kernel: $\mu_i$, $\sigma_i$ and $w_i$
  • The final phenotype is given by:

$$f(x) = \sum_{i=1}^{n} w_i K_i(x)$$

[Figure: kernel $K_i$, a Gaussian labelled with its parameters μ and σ]
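A minimal sketch of the weighted kernel sum for the one-dimensional example follows. The exact Gaussian form used for the kernel is an assumption (the slides only name the parameters μ, σ and w), and the function names are hypothetical.

```python
import math

def gaussian_kernel(x, mu, sigma):
    """Assumed kernel shape: an unnormalised Gaussian centred on mu with width sigma."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def phenotype(x, kernels):
    """Weighted sum of kernels; `kernels` is a list of (w, mu, sigma) triples."""
    return sum(w * gaussian_kernel(x, mu, sigma) for w, mu, sigma in kernels)

# Hypothetical proteome with two kernels.
proteome = [(0.75, 0.25, 0.1), (0.5, 0.8, 0.05)]
y = phenotype(0.3, proteome)
```

The sum is defined for any number of kernels in any order, including an empty list, which is what makes the regression function computable whatever the genome structure.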

The genetic code

• Biological genetic code
  • 4 bases (A, C, G, T)
  • 64 codons (3 bases)
  • 4 specific codons: Start (‘ATG’) and Stop (‘TAA’, ‘TAG’ and ‘TGA’)
  • 20 amino acids
• RBF-Gene genetic code
  • Simplification: direct use of the “DNA” bases (n bases)
  • 2 specific bases: Start (‘A’) and Stop (‘B’)
  • 2 bases for each kernel parameter (e.g. ‘C’ and ‘D’ for parameter w)
  • The number of bases depends on the number of parameters (i.e. on the function dimension)
  • Binary, variable length Gray code ...
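As a reminder of what a Gray code provides, the sketch below uses the standard reflected binary Gray code; whether the authors' variable-length code is exactly this one is an assumption. Consecutive integers differ in exactly one bit of their Gray representation.

```python
def to_gray(n):
    """Reflected binary Gray code of a non-negative integer."""
    return n ^ (n >> 1)

def from_gray(g):
    """Inverse transform: recover the integer from its Gray code."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

# Consecutive values always differ by a single bit in Gray code.
one_bit_steps = all(hamming(to_gray(n), to_gray(n + 1)) == 1 for n in range(31))
```

In plain binary the step from 3 (011) to 4 (100) changes three bits, whereas in Gray code it changes one; this is the usual reason for preferring Gray codes when single-base mutations should be able to produce small parameter changes.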

The genotype to phenotype mapping

Genome (genes G1, G2, G3 and G4 are marked along the sequence):

FE…BEFDGGCFDGHEGA…D

Genetic code:

Base   Parameter   Value
A      Start
B      Stop
C      w           0
D      w           1
E      μ           0
F      μ           1
G      σ           0
H      σ           1

Kernel K1:
• w: 101 (Gray) → 110 (bin) → 0.75
• μ: 0110 (Gray) → 0100 (bin) → 0.25
• σ: 00010 (Gray) → 00010 (bin) → 0.0625

Phenotype:

$$f(x) = \sum_{i=1}^{n} w_i K_i(x)$$
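The translation of one gene into kernel parameters can be sketched as follows. Several points are assumptions: the gene is taken to be the twelve bases lying between the ‘B’ and ‘A’ markers in the transcribed genome, read in the order shown; the Gray code is the standard reflected one; and the decoded integer is divided by 2^n for an n-bit string, which is the scaling that reproduces the w and μ values of the worked example.

```python
# Genetic code from the slide: base -> (parameter, bit value).
CODE = {"C": ("w", 0), "D": ("w", 1),
        "E": ("mu", 0), "F": ("mu", 1),
        "G": ("sigma", 0), "H": ("sigma", 1)}

def gray_to_fraction(gray_bits):
    """Decode reflected Gray bits (MSB first) and scale the result into [0, 1)."""
    acc, value = 0, 0
    for g in gray_bits:
        acc ^= g                      # running XOR gives the next binary bit
        value = (value << 1) | acc
    return value / 2 ** len(gray_bits) if gray_bits else 0.0

def decode_gene(gene):
    """Sort the bases of a gene by parameter, then decode each bit string."""
    bits = {"w": [], "mu": [], "sigma": []}
    for base in gene:
        if base in CODE:
            name, bit = CODE[base]
            bits[name].append(bit)
    return {name: gray_to_fraction(b) for name, b in bits.items()}

kernel = decode_gene("EFDGGCFDGHEG")
```

Under these assumptions the sketch returns w = 0.75 and μ = 0.25, as in the worked example. For σ it returns 3/32 = 0.09375, because the standard code decodes 00010 to 00011; the σ line of the example lists 00010 for both the Gray and the binary form, so either that line contains a slip or the authors' code differs from the one assumed here.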

The reproduction loop

• General principle: same as GAs
• Biologically inspired operators (local, global, …)

[Diagram: cycle linking Fitness Evaluation, Selection and Reproduction]

Advantages of the RBF-Gene model

• The regression function is computable whatever the genome structure (size, gene number, gene order, …)
  • The algorithm is (partly) problem-independent
• The algorithm adapts the gene number
  • The algorithm can adapt the phenotype complexity
• The algorithm adapts the gene length
  • The algorithm can adapt the phenotype precision
  • The algorithm can enhance the precision during the evolutionary process
• The “protein” layer enables us to analyse the phenotype
  • E.g. one kernel → one fuzzy rule

Example: regression on a “toy-problem”

• Composition of 5 Gaussian functions
• Gaussian noise: σ = 0.05
• Two example sets:
  • Learning set (50 points)
  • Validation set (50 points)
• Parameters:
  • Population size: 100
  • Initial genome size: 200
  • Number of codons: 8
  • Mutation rate: 5 × 10^-4 per base
  • Indel rate: 2 × 5 × 10^-4 per base
  • Rearrangement rate: 3 × 0.02 per individual
  • Crossing-over rate: 0.6 per individual
• Fitness criterion: mean square error

Results (1): Evolution of the fitness

Results (2): Genome, “proteome” and phenotype

Generation: 0
Initial population:
• Genome size: 200
• Number of kernels: 16 (4 coding)
• Learning fitness: 1.3612
• Validation fitness: 1.0056

Generation: 2000
Final results:
• Genome size: 472
• Number of kernels: 15 (10 coding)
• Learning fitness: 0.0206
• Validation fitness: 0.0497

Results (2): “proteome” and phenotype

Generation: 2000

Results (3): Overfitting

Results (3): Overfitting

Results (4): Genome size

Results (5): Number of genes

Results (6): Gene size (i.e. precision)

Results (7): Coding proportion

Conclusion

• Reorganization of the genome DURING and BY the evolutionary process
  • The algorithm adapts the gene number
  • The algorithm adapts the gene size
• Tested on the abalone dataset ($R^8 \to R$ regression)
  • Very good results (but slow computations)
• Perspectives:
  • Evolution of neural networks
    • The final structure is an RBF network …
    • Other architectures are possible (MLP, recurrent networks, …)
    • The algorithm adapts the synaptic weights and the network structure (e.g. number of neurons)
  • Rule extraction from the proteome

Questions?