GAs and Premature Convergence
Premature convergence: the GA converges too early to a suboptimal solution
o as the population evolves, less and less new genetic material can be produced
Reasons for premature convergence:
o improper selection pressure
o insufficient population size
o deception
o improper representation and genetic operators
Motivation and Implementation
Motivation: maintain the diversity of the evolved population and extend the explorative power of the algorithm
Implementation
o Convergence of the population is allowed only up to a specified extent
o Convergence is controlled at individual positions of the representation
o Convergence rate C: specifies the maximal difference in the frequency of ones and zeroes in every column of the population; it ranges from 0 to PopSize/2
o Principal condition: at no position of the representation may the frequency of ones or zeroes exceed the constraint
o Requires a specific way of modifying the population genotype
Algorithm of GALCO
1. Generate initial population
2. Choose parents
3. Create offspring
4. if (offspring > parents)
     then replace parents with offspring
     else {
       find(replacement); replace_with_mask(child1, replacement)
       find(replacement); replace_with_mask(child2, replacement)
     }
5. if (not finished) then go to step 2
Operator replace_with_mask
Mask – vector of integer counters; stores the number of 1s in the population at each bit position of the representation
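The GALCO loop and the replace_with_mask operator can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: we read the convergence constraint as "the number of ones in every column must stay within PopSize/2 ± C", we read "offspring > parents" as "both children are fitter than both parents", and we take the worst individual as the replacement. The helper names (ones_mask, galco_step) and the bit-by-bit rule inside replace_with_mask (keep the replaced individual's bit wherever copying the child's bit would break the limit) are hypothetical. The sketch also shows why step 4 may copy the children in directly: one-point crossover only swaps bits between the two parents, so no column count changes.

```python
import random

def ones_mask(pop):
    """Mask: number of 1s in each column (bit position) of the population."""
    return [sum(ind[i] for ind in pop) for i in range(len(pop[0]))]

def replace_with_mask(pop, mask, child, victim, c):
    """Overwrite pop[victim] with child without breaking the column limits.

    Assumed rule: a child's bit is copied only if the new count of 1s in that
    column stays within PopSize/2 +/- c; otherwise the victim's bit is kept.
    """
    lo, hi = len(pop) / 2 - c, len(pop) / 2 + c
    new = pop[victim][:]
    for i, bit in enumerate(child):
        if bit != new[i]:
            count = mask[i] + (1 if bit == 1 else -1)
            if lo <= count <= hi:
                new[i] = bit
                mask[i] = count
    pop[victim] = new

def galco_step(pop, mask, fitness, c, rng):
    """One pass through steps 2-4 of the GALCO loop (one-point crossover)."""
    i, j = rng.sample(range(len(pop)), 2)
    p1, p2 = pop[i], pop[j]
    cut = rng.randrange(1, len(p1))
    ch1, ch2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
    if min(fitness(ch1), fitness(ch2)) > max(fitness(p1), fitness(p2)):
        pop[i], pop[j] = ch1, ch2  # bits are only swapped: mask unchanged
    else:
        for child in (ch1, ch2):
            # find(replacement): hypothetical choice of the worst individual
            victim = min(range(len(pop)), key=lambda k: fitness(pop[k]))
            replace_with_mask(pop, mask, child, victim, c)

# Usage: OneMax, 10 individuals of 8 bits, C = 2 (each column keeps 3..7 ones).
rng = random.Random(0)
pop = [[(r + k) % 2 for k in range(8)] for r in range(10)]  # 5 ones per column
mask = ones_mask(pop)
for _ in range(200):
    galco_step(pop, mask, sum, 2, rng)
```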
Test Problems - Static
F101(x, y) Deceptive function
Hierarchical function, Royal Road Problem
GALCO – influence of parameter C
GALCO vs. SGA
GALCO – influence of parameter C
GALCO vs. SGA
Multimodal Optimization
[Figure panels: initial population; SIGA with / without]
Multimodal Optimization (cont.)
[Figure panels: initial population; GALCO; SIGA]
GA with Real-Coded Binary Representation (GARB)
Pseudo-binary representation: bits are encoded by real numbers r ∈ [0.0, 1.0]
o interpretation(r) = 1 for r > 0.5, and 0 for r < 0.5
o redundancy of the code
o Example: ch1 = [0.92 0.07 0.23 0.62], ch2 = [0.65 0.19 0.41 0.86]
  interpretation(ch1) = interpretation(ch2) = [1 0 0 1]
Gene strength: expresses the degree of stability of a gene
o The closer to 0.5, the weaker (less stable) the gene
o "One" genes, from strongest to weakest: 0.92 > 0.86 > 0.65 > 0.62
o "Zero" genes, from strongest to weakest: 0.07 > 0.19 > 0.23 > 0.41
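A small Python sketch of the interpretation and of gene strength, using the two chromosomes from the example. Measuring strength as the distance |r − 0.5| is our assumption, read off the rule that genes closer to 0.5 are weaker; mapping r = 0.5 to 0 is also an assumption, because the slide leaves that value undefined.

```python
def interpret(chrom):
    """Pseudo-binary interpretation: r > 0.5 -> 1, otherwise 0 (r == 0.5 -> 0 is assumed)."""
    return [1 if r > 0.5 else 0 for r in chrom]

def strength(r):
    """Assumed strength measure: distance from the unstable midpoint 0.5."""
    return abs(r - 0.5)

ch1 = [0.92, 0.07, 0.23, 0.62]
ch2 = [0.65, 0.19, 0.41, 0.86]
one_genes = sorted((g for g in ch1 + ch2 if g > 0.5), key=strength, reverse=True)
zero_genes = sorted((g for g in ch1 + ch2 if g < 0.5), key=strength, reverse=True)
print(interpret(ch1), interpret(ch2))  # [1, 0, 0, 1] [1, 0, 0, 1]
print(one_genes)   # [0.92, 0.86, 0.65, 0.62]
print(zero_genes)  # [0.07, 0.19, 0.23, 0.41]
```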
Gene-strength adjustment mechanism
Genes of chromosomes created by crossover are adjusted
o depending on their interpretation
o and on the relative frequency of ones (zeroes) at the given position in the population, P[]
Example: for P[] = [0.82 0.17 0.35 0.68], the population contains 82% ones at position 1, 17% ones at position 2, 35% ones at position 3 and 68% ones at position 4.
Genes that prevail in the population are weakened; the others are strengthened.
Strengthening and Weakening of Genes
Weakening
o $g' = g + c\,(1.0 - P[i])$ if $g < 0.5$ and $P[i] < 0.5$ (the gene is interpreted as zero and zeroes prevail at the i-th position in the population), and
o $g' = g - c\,P[i]$ if $g > 0.5$ and $P[i] > 0.5$
Strengthening
o $g' = g - c\,P[i]$ if $g < 0.5$ and $P[i] > 0.5$ (the gene is interpreted as zero and ones prevail at the i-th position in the population), and
o $g' = g + c\,(1.0 - P[i])$ if $g > 0.5$ and $P[i] < 0.5$
The constant c determines the adaptation rate of the genes: $c \in (0.0, 0.2]$
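The four rules can be sketched in Python as below. Read side by side, they collapse to two cases: when ones prevail at position i (P[i] > 0.5) the gene moves down by c·P[i], and when zeroes prevail (P[i] < 0.5) it moves up by c·(1 − P[i]). Every gene is therefore pushed toward the value currently in the minority, which weakens majority genes and strengthens minority ones. Clamping the result to [0, 1] is our assumption (the slides do not say what happens at the bounds), and the function names are hypothetical.

```python
def frequencies(pop):
    """P[i]: relative frequency of ones (genes > 0.5) at position i."""
    n = len(pop)
    return [sum(1 for ch in pop if ch[i] > 0.5) / n for i in range(len(pop[0]))]

def adjust(chrom, P, c):
    """Weaken genes that prevail in the population, strengthen the others."""
    out = []
    for g, p in zip(chrom, P):
        if g != 0.5 and p > 0.5:
            g -= c * p          # ones prevail: weakens a 1-gene, strengthens a 0-gene
        elif g != 0.5 and p < 0.5:
            g += c * (1.0 - p)  # zeroes prevail: weakens a 0-gene, strengthens a 1-gene
        out.append(min(1.0, max(0.0, g)))  # assumed clamping to [0, 1]
    return out

P = [0.82, 0.17, 0.35, 0.68]
print(adjust([0.92, 0.07, 0.23, 0.62], P, 0.1))  # all four genes weakened
print(adjust([0.30, 0.70, 0.60, 0.40], P, 0.1))  # all four genes strengthened
```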
Stabilization of Promising Individuals
Offspring that are better than their parents should be more stable than the other, low-quality solutions generated
o Chromosomes of promising individuals are generated with strong genes: ch = (0.71, 0.45, 0.18, 0.57) becomes ch' = (0.97, 0.03, 0.02, 0.99)
o Genes of promising individuals survive more generations without being changed by weakening
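One way to rescale a promising child, sketched in Python: keep its interpretation but redraw every gene close to 0 or 1, as in the ch to ch' example. Drawing the new values uniformly from a band of width 0.05 at each end is an assumption chosen to match the example values; the slides do not specify how the strong genes are generated.

```python
import random

def rescale(chrom, band=0.05, rng=random):
    """Regenerate a promising chromosome with strong genes (same interpretation).

    band is a hypothetical parameter: new 1-genes fall in [1 - band, 1],
    new 0-genes in [0, band].
    """
    return [rng.uniform(1.0 - band, 1.0) if g > 0.5 else rng.uniform(0.0, band)
            for g in chrom]

child = rescale([0.71, 0.45, 0.18, 0.57], rng=random.Random(1))
print([round(g, 2) for g in child])  # interpretation stays [1, 0, 0, 1]
```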
Pseudocode for GARB
begin
  initialize(OldPop)
  repeat
    calculate P[] from OldPop
    repeat
      select Parents from OldPop
      generate Children
      adjust Children genes
      evaluate Children
      if Child is better than Parents
        then rescale Child
      insert Children to NewPop
    until NewPop is completed
    switch OldPop and NewPop
  until termination condition
end
Test Problems - Dynamic
Ošmera's dynamic problem: $g(x,t) = 1 - \exp(-200\,(x - c(t))^2)$, $c(t) = 0.04\,\lfloor t/20 \rfloor$
The minimum g(x,t) = 0.0 moves every 20 generations
Oscillating Knapsack Problem: 14 objects, $w_i = 2^i$, $i = 0, \dots, 13$
$f(x) = 1 / (1 + |target - \sum_i w_i x_i|)$; the target oscillates between the values
12643 and 2837, whose binary representations differ in 9 bits
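Both dynamic test functions are short enough to sketch in Python. The switching period of the knapsack target is not given on the slide, so `period` in target_at is a hypothetical parameter; the function names are illustrative as well.

```python
import math

def osmera(x, t):
    """Ošmera's dynamic function; the minimum g = 0 jumps by 0.04 every 20 generations."""
    c = 0.04 * (t // 20)
    return 1.0 - math.exp(-200.0 * (x - c) ** 2)

WEIGHTS = [2 ** i for i in range(14)]  # w_i = 2^i, i = 0..13

def knapsack_fitness(x, target):
    """Oscillating knapsack: f(x) = 1 / (1 + |target - sum w_i x_i|)."""
    return 1.0 / (1.0 + abs(target - sum(w * b for w, b in zip(WEIGHTS, x))))

def target_at(t, period):
    """Hypothetical schedule: the target switches every `period` generations."""
    return 12643 if (t // period) % 2 == 0 else 2837

best_for_first_target = [(12643 >> i) & 1 for i in range(14)]
print(knapsack_fitness(best_for_first_target, target_at(0, 15)))  # 1.0
print(bin(12643 ^ 2837).count("1"))  # 9 differing bits
```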
Results on Static Problems
[Figure: fitness vs. fitness evaluations (x1000), GARB vs. SGA; panels: DF3, H-IFF, F101]
Results on Static Problems (cont.)
[Figure: three panel pairs showing the frequency of ones at given positions (genes 60 and 62; genes 51 and 59; genes 80 and 200) and the best and average fitness vs. fitness evaluations (x1000)]
Results on Dynamic Problems
[Figure: oscillating knapsack problem]
Results on Dynamic Problems (cont.)
• Ošmera's dynamic problem

| Algorithm | MTE, immediately after a change of optimum | StDev | MTE, overall | StDev |
|---|---|---|---|---|
| GARB c = 0.025 | 83.3 | 30.6 | 50.4 | 25.2 |
| GARB c = 0.075 | 25.6 | 34.6 | 2.4 | 7.4 |
| GARB c = 0.125 | 12.8 | 22.4 | 1.0 | 3.9 |
| GARB c = 0.175 | 10.2 | 19.7 | 0.7 | 3.0 |
| GARB c = 0.225 | 9.2 | 19.3 | 0.6 | 2.7 |
| SGA binary | N/A | N/A | 57.34 | 3.61 |
| SGA Gray | N/A | N/A | 47.66 | 42.94 |
| CBM-B | N/A | N/A | 19.39 | 33.13 |

MTE (Mean Tracking Error) [%]: the mean deviation of the best individual in the population from the optimal solution, computed over all generations.
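Written as a formula, with T the number of generations and e_t the deviation of the best individual from the optimum in generation t (we assume e_t is normalized to [0, 1] so that the result is a percentage; the slides do not state the scaling):

$$\mathrm{MTE} = \frac{100\,\%}{T} \sum_{t=1}^{T} e_t$$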
Recovery from a Homogeneous Population
[Figure: DF3 problem and knapsack problem; diversity measure, average fitness and best fitness vs. generations (0 to 100)]
Weaknesses of Simple Selectorecombinative GAs
They scale poorly on hard problems, largely as a result of their mixing behaviour
o inability of the SGA to correctly identify and adequately mix the appropriate BBs in subsequent generations
o exponential computational complexity of the SGA
Crossover operators or other exchange mechanisms that adapt to the problem at hand are needed
o linkage adaptation
Naive approaches: the inversion operator reverses the order of genes in a randomly chosen substring of the chromosome
o 10011 = (1,1) | (2,0)(3,0)(4,1) | (5,1); after inversion: (1,1) | (4,1)(3,0)(2,0) | (5,1)
Unusable because the signal for improving linkage is weak compared with the signal for learning alleles
o $t_\alpha < t_\lambda$: alleles undergo more direct selection than linkage, so the GA settles on the optimal allele settings before it discovers which combinations of genes to form together and mix
o Solution: reverse the inequality to $t_\alpha > t_\lambda$ (BUT HOW?)
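The inversion operator on tagged genes, sketched in Python with the 10011 example. Because each gene carries its name, inversion changes only which genes are adjacent (the linkage), never the decoded string, so selection sees no direct fitness difference from it. The function names are illustrative.

```python
import random

def invert(chrom, i, j):
    """Reverse the order of genes chrom[i:j]; each gene keeps its (name, allele) tag."""
    return chrom[:i] + chrom[i:j][::-1] + chrom[j:]

def random_inversion(chrom, rng=random):
    """Invert a randomly chosen substring (i < j)."""
    i, j = sorted(rng.sample(range(len(chrom) + 1), 2))
    return invert(chrom, i, j)

def decode(chrom):
    """Tags make the phenotype independent of gene order."""
    return "".join(str(a) for _, a in sorted(chrom))

x = [(1, 1), (2, 0), (3, 0), (4, 1), (5, 1)]   # 10011
y = invert(x, 1, 4)
print(y)  # [(1, 1), (4, 1), (3, 0), (2, 0), (5, 1)]
print(decode(x), decode(y))  # 10011 10011
```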
Competent GAs
Can solve
o hard problems (multimodal, deceptive, high degree of subsolution interaction, noise, ...),
o quickly,
o accurately,
o reliably.
Messy GAs: mGA, fmGA, gemGA
Linkage learning GA: LLGA
Compact GAs: cGA, ECGA
Bayesian optimization algorithm: BOA
Messy Genetic Algorithms - mGAs
Inspiration from nature: evolution starts from the simplest forms of life
The mGA departed from the SGA in four ways:
o messy codings
o messy operators
o separation of processing into three heterogeneous phases
o epoch-wise iteration to improve the complexity of the solution
mGA Codings
Tagged alleles:
o variable-length strings: (name1, allele1) … (nameN, alleleN), e.g. ((4,0) (1,1) (2,0) (4,1) (4,1) (5,1))
Over-specification: multiple gene instances (gene 4)
o majority voting would express deceptive genes too readily
o first-come-first-served (left-to-right expression): positional priority
Under-specification: missing gene instances (gene 3)
o average schema value: the variance is too high
o competitive template: a solution that is locally optimal with respect to k-bit perturbations
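Expression of a messy string, sketched in Python on the example ((4,0) (1,1) (2,0) (4,1) (4,1) (5,1)): over-specified genes are resolved first-come-first-served, and under-specified genes are filled from a competitive template. The two templates below are arbitrary stand-ins; in the mGA the template would be a locally optimal solution. The function name is illustrative.

```python
def express(chrom, template):
    """Decode a messy string of (gene, allele) pairs; genes are numbered from 1."""
    bits = list(template)           # missing genes (under-specification) come from the template
    seen = set()
    for gene, allele in chrom:      # left-to-right scan
        if gene not in seen:        # first come, first served: later copies are ignored
            seen.add(gene)
            bits[gene - 1] = allele
    return bits

s = [(4, 0), (1, 1), (2, 0), (4, 1), (4, 1), (5, 1)]
print(express(s, [0, 0, 0, 0, 0]))  # [1, 0, 0, 0, 1]
print(express(s, [1, 1, 1, 1, 1]))  # [1, 0, 1, 0, 1]
```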
Messy operators: cut & splice
Cut: divides a single string into two parts
Splice: joins the head of one string with the tail of the other
o When short strings are mated, the probability of a cut is small, so mostly the strings are just spliced and the string length doubles
o When long strings are mated, the probability of a cut is large, so cut & splice acts like one-point crossover
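A cut-and-splice sketch in Python. Letting the cut probability grow with string length, p_kappa·(length − 1), reproduces the behaviour described above: short strings are rarely cut and mostly just spliced, long strings are almost always cut. The per-bit probability p_kappa, the splice probability p_splice and the order in which the four pieces are spliced are assumptions of this sketch. When both strings are cut and only the first and last joins happen, the result behaves like one-point crossover with a separate cut point in each parent.

```python
import random

def cut(s, p_kappa, rng):
    """Cut with probability p_kappa * (len(s) - 1), at a random interior point."""
    if len(s) > 1 and rng.random() < p_kappa * (len(s) - 1):
        pos = rng.randrange(1, len(s))
        return [s[:pos], s[pos:]]
    return [s]

def splice(pieces, p_splice, rng):
    """Join neighbouring pieces left to right, each join with probability p_splice."""
    out = [pieces[0]]
    for piece in pieces[1:]:
        if rng.random() < p_splice:
            out[-1] = out[-1] + piece
        else:
            out.append(piece)
    return out

def cut_and_splice(a, b, p_kappa, p_splice, rng=random):
    pa, pb = cut(a, p_kappa, rng), cut(b, p_kappa, rng)
    if len(pa) == 2 and len(pb) == 2:
        pieces = [pa[0], pb[1], pb[0], pa[1]]  # head of a + tail of b, head of b + tail of a
    else:
        pieces = pa + pb
    return splice(pieces, p_splice, rng)

a = [(1, 1), (2, 0)]
b = [(3, 1), (4, 0), (5, 1)]
print(cut_and_splice(a, b, 0.0, 1.0))  # no cut: one spliced string of length 5
```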
mGAs: three heterogeneous phases
Initialization
o enumerative initialization of the population with all substrings of a certain length $k \ll l$: $\binom{l}{k} 2^k = O(l^k)$ computations
o guarantees that all BBs of a given size are present in the population
Primordial phase
o only selection is used, to dope the population with good BBs
o good linkage groups are selected before their alleles are allowed to be mixed
Juxtapositional phase
o selection + cut & splice
o mixing of the BBs
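The size of the enumerative initial population is easy to check in Python. The illustrative helper below lists every choice of k gene positions combined with every allele pattern, so its length is exactly binom(l, k)·2^k; already for l = 30 and k = 3 that is 32,480 strings.

```python
from itertools import combinations, product
from math import comb

def enumerative_init(l, k):
    """All tagged substrings over every choice of k gene positions and every allele pattern."""
    return [list(zip(genes, alleles))
            for genes in combinations(range(1, l + 1), k)
            for alleles in product((0, 1), repeat=k)]

pop = enumerative_init(5, 2)
print(len(pop), comb(5, 2) * 2 ** 2)  # 40 40
print(comb(30, 3) * 2 ** 3)           # 32480
```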
Fast messy genetic algorithms (fmGAs)
Probabilistically complete enumeration
o a population of strings of length l' close to l is generated
o assumption: each string contains many different BBs of length k ≪ l
Building-block filtering: extracts highly fit and effectively linked BBs
o repeated (1) selection and (2) gene deletion
o only O(l) computations to converge
Extended thresholding: tournaments are held only between strings that have a threshold number of genes in common
fmGA vs. mGA on a 150-bit problem (30 5-bit deceptive functions)
o $1.9 \cdot 10^5$ vs. $5.9 \cdot 10^8$ evaluations
Gene expression messy GA - gemGA
Messy in what sense?
o no variable-length strings
o no under- or over-specification
o no left-to-right expression
What is messy in gemGA is its use of heterogeneous processing phases
o linkage learning phase: first identifies the linkage groups
o mixing phase: selection + recombination, which exchanges good allele combinations within those groups to find the optimal solution
gemGA: The Idea
Linkage learning phase
o Transcription I (antimutation)
  – each string undergoes l one-bit perturbations
  – improvements are ignored (?!): the bit does not belong to an optimal BB
  – changes that degrade the structure mark the bit as a linkage group candidate
  – Ex.: two 3-bit deceptive BBs, 111 and 101: the bits of 111 are marked (flips degrade fitness), the bits of 101 are not marked (flips improve fitness)
o Transcription II
  – identifies the exact relations among the genes by checking nonlinearities, where $X'_i$ is X with bit i flipped:
    IF $f(X'_{ij}) - f(X) \neq [f(X'_i) - f(X)] + [f(X'_j) - f(X)]$ THEN link(i, j)
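Both transcription steps, sketched in Python on the slide's example of two concatenated 3-bit deceptive blocks, 111 and 101. The trap function (3 for 111, otherwise 2 minus the number of ones) is a standard choice we assume here; the pairwise test compares fitness changes relative to the unperturbed string X, and the function names are hypothetical.

```python
from itertools import combinations

def trap3(bits):
    """3-bit deceptive trap: 111 -> 3, otherwise 2 - (number of ones)."""
    u = sum(bits)
    return 3 if u == 3 else 2 - u

def f(x):
    return trap3(x[0:3]) + trap3(x[3:6])

def flip(x, *idx):
    y = list(x)
    for i in idx:
        y[i] ^= 1
    return y

def transcription_one(f, x):
    """Antimutation: mark bits whose one-bit flip degrades fitness; ignore improvements."""
    base = f(x)
    return [i for i in range(len(x)) if f(flip(x, i)) < base]

def transcription_two(f, x, candidates):
    """Link i and j when the effect of flipping both is not the sum of the single flips."""
    base = f(x)
    links = set()
    for i, j in combinations(candidates, 2):
        d_i, d_j = f(flip(x, i)) - base, f(flip(x, j)) - base
        if f(flip(x, i, j)) - base != d_i + d_j:
            links.add((i, j))
    return links

x = [1, 1, 1, 1, 0, 1]
print(transcription_one(f, x))  # [0, 1, 2]: only the optimal block 111 is marked
print(transcription_two(f, x, [0, 1, 2, 3]))
```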
Linkage Learning GA - LLGA
More “messy” than gemGA
o variable-length strings
o left-to-right expression
o always over-specification
No primordial or juxtapositional phase: more SGA-like
Idea:
o probabilistic expression that slows down the convergence of alleles
o crossover that adapts linkage at the same time as alleles are exchanged
LLGA – Probabilistic expression
Clockwise interpretation: reading the circular chromosome (3,1)(2,0)(5,1)(1,1)(4,0) clockwise gives, ordered by gene number, 1 0 1 0 1
LLGA – probabilistic expression cont.
Allele 1 is expressed with probability $\delta/l$ and $1/l$, respectively
Allele 0 is expressed with probability $(l-\delta)/l$ and $(l-1)/l$, respectively
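Probabilistic expression can be checked by brute force: the Python sketch below reads a circular chromosome clockwise from every possible starting point and counts how often each allele wins. Treating l as the number of positions on the circle and δ as the clockwise distance from the 0-copy to the 1-copy of a gene is our reading of the symbols above; under it, allele 1 is expressed with probability δ/l, and 1/l when the two copies are adjacent. Names are illustrative.

```python
from fractions import Fraction

def express(chrom, start):
    """Read a circular chromosome clockwise from `start`; the first copy of each gene wins."""
    seen = {}
    for k in range(len(chrom)):
        gene, allele = chrom[(start + k) % len(chrom)]
        seen.setdefault(gene, allele)
    return seen

def prob_expressed(chrom, gene, allele):
    """Fraction of (uniformly chosen) starting points at which `allele` of `gene` is expressed."""
    hits = sum(express(chrom, s)[gene] == allele for s in range(len(chrom)))
    return Fraction(hits, len(chrom))

# Slide example: clockwise interpretation yields 1 0 1 0 1 (ordered by gene number).
ring = [(3, 1), (2, 0), (5, 1), (1, 1), (4, 0)]
print([a for _, a in sorted(express(ring, 0).items())])  # [1, 0, 1, 0, 1]

# Gene 1 present with both alleles; the 1-copy lies delta = 2 positions clockwise of the 0-copy.
ring2 = [(1, 0), (2, 1), (1, 1), (2, 0)]
print(prob_expressed(ring2, 1, 1))  # 1/2, i.e. delta / l with delta = 2, l = 4
```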
LLGA: Effect of PE on BBs
Assume a 6-bit problem in which a BB requires genes 4, 5 and 6 to take the value 1 in a trap function.
o Initially, the block 111 will be expressed roughly 1/8 of the time
o After the linkage has evolved properly, the BB success rate increases: in (6,1) (4,1) (5,1) (4,0) (5,0) (6,0), the alleles (6,1) (4,1) (5,1) are expressed most of the time and (4,0) (5,0) (6,0) almost never
Extended probabilistic expression EPE-q
o q is the number of copies of the unexpressed allele (here q = 2)
LLGA – introns
• Introns: non-coding genes (97% of DNA is non-coding)
o the number of introns required for proper functioning grows exponentially, hence compressed introns
Probabilistic Model-Building GAs
1. Initialize population at random
2. Select promising solutions
3. Build probabilistic model of selected solutions
4. Sample built model to generate new solutions
5. Incorporate new solutions into original population
6. Go to 2 (if not finished)
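The six steps above, sketched in Python with the simplest possible model: a vector of independent bit frequencies, as used by UMDA. Truncation selection of the better half (step 2) and replacing the worse half with the sampled solutions (step 5) are our choices for this sketch; richer PMBGAs such as ECGA and BOA differ mainly in the model built in step 3 and sampled in step 4.

```python
import random

def build_model(selected):
    """Univariate model: frequency of ones at each position (UMDA-style)."""
    n = len(selected)
    return [sum(ind[i] for ind in selected) / n for i in range(len(selected[0]))]

def sample_model(p, count, rng):
    """Draw `count` new strings, each bit independently with probability p[i]."""
    return [[1 if rng.random() < pi else 0 for pi in p] for _ in range(count)]

def pmbga(fitness, l, n=100, generations=50, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(l)] for _ in range(n)]  # 1. random init
    for _ in range(generations):                                     # 6. loop
        pop.sort(key=fitness, reverse=True)
        selected = pop[: n // 2]                                     # 2. truncation selection
        model = build_model(selected)                                # 3. build model
        offspring = sample_model(model, n - len(selected), rng)      # 4. sample model
        pop = selected + offspring                                   # 5. replace the worse half
    return max(pop, key=fitness)

best = pmbga(sum, l=20)  # OneMax
print(sum(best))
```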
Compact GA - cGA
5-bit trap problem
UMDA performance
UMDA with “good” statistics
Extended compact GA - ECGA
Marginal product model (MPM)
o Groups of bits (partitions) treated as chunks
o Partitions represent subproblems
o Onemax: [1] [2] [3] [4] [5] [6] [7] [8] [9] [10]
o Traps: [1 2 3 4 5] [6 7 8 9 10]
Learning structure in ECGA
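Two components
o Scoring metric: minimum description length (MDL)
  – number of bits for storing the probabilities: $C_m = \log_2 N \sum_i 2^{S_i}$
  – number of bits for storing the population using the model: $C_p = N \sum_i E(M_i)$
  – minimize $C = C_m + C_p$
  – here N is the population size, $S_i$ the size of the i-th partition and $E(M_i)$ the entropy of its marginal distribution
o Search procedure: a greedy algorithm
  – start with one-bit groups
  – merge the two groups that give the largest improvement
  – finish when no further improvement is possible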
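The MDL score and the greedy merge search, sketched in Python with the formulas exactly as written above; some ECGA formulations use slightly different constants in C_m. The toy population, in which bits 0 and 1 are always equal while bit 2 varies independently, is invented for illustration, and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def entropy(pop, part):
    """E(M_i): entropy (bits) of the joint distribution of the genes in one partition."""
    n = len(pop)
    counts = Counter(tuple(ind[i] for i in part) for ind in pop)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def mdl(pop, parts):
    n = len(pop)
    c_m = math.log2(n) * sum(2 ** len(p) for p in parts)  # model complexity
    c_p = n * sum(entropy(pop, p) for p in parts)        # compressed population
    return c_m + c_p

def greedy_mpm(pop):
    parts = [[i] for i in range(len(pop[0]))]            # start with one-bit groups
    best = mdl(pop, parts)
    while True:
        winner = None
        for a, b in combinations(range(len(parts)), 2):  # try every pairwise merge
            trial = [p for k, p in enumerate(parts) if k not in (a, b)]
            trial.append(sorted(parts[a] + parts[b]))
            score = mdl(pop, trial)
            if score < best:
                best, winner = score, trial
        if winner is None:                               # no improvement: finish
            return sorted(parts)
        parts = winner

pop = [[a, a, b] for a in (0, 1) for b in (0, 1)] * 2    # bits 0 and 1 are linked
print(greedy_mpm(pop))  # [[0, 1], [2]]
```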
ECGA model
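| Partition [0,2,5] | P | Partition [1,4] | P | Partition [3] | P |
|---|---|---|---|---|---|
| 000 | 0.5 | 00 | 0.5 | 0 | 0.7 |
| 111 | 0.5 | 01 | 0.0 | 1 | 0.3 |
| 001, 010, 100 | 0.0 | 10 | 0.0 | | |
| 011, 101, 110 | 0.0 | 11 | 0.5 | | |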
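Sampling new strings from this model is straightforward; below is a Python sketch with the table's probabilities hard-coded (the dictionary layout and function name are illustrative). Because each partition is drawn as a whole, genes 0, 2 and 5 always come out equal, as do genes 1 and 4, while gene 3 is 0 about 70% of the time.

```python
import random

# The model from the table above: partition -> {configuration: probability}
MODEL = {
    (0, 2, 5): {(0, 0, 0): 0.5, (1, 1, 1): 0.5},  # all other configurations have p = 0.0
    (1, 4): {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5},
    (3,): {(0,): 0.7, (1,): 0.3},
}

def sample_mpm(model, l, rng=random):
    """Draw one string: each partition is sampled independently from its own table."""
    x = [None] * l
    for part, table in model.items():
        configs = list(table)
        config = rng.choices(configs, weights=[table[c] for c in configs])[0]
        for pos, bit in zip(part, config):
            x[pos] = bit
    return x

rng = random.Random(42)
samples = [sample_mpm(MODEL, 6, rng) for _ in range(1000)]
print(samples[:3])
```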
ECGA example