EDA Tutorial
Post on 12-Nov-2014
DESCRIPTION
Probabilistic model-building genetic algorithms (PMBGAs), also known as estimation of distribution algorithms (EDAs) and iterated density estimation algorithms (IDEAs), replace the traditional variation operators of genetic and evolutionary algorithms by (1) building a probabilistic model of promising solutions and (2) sampling the built model to generate new candidate solutions. Replacing traditional crossover and mutation operators with building and sampling a probabilistic model of promising solutions enables the use of machine learning techniques for automatic discovery of problem regularities and the exploitation of these regularities for effective exploration of the search space. Using machine learning in optimization enables the design of optimization techniques that can automatically adapt to the given problem. There are many successful applications of PMBGAs, for example, Ising spin glasses in 2D and 3D, graph partitioning, MAXSAT, feature subset selection, forest management, groundwater remediation design, telecommunication network design, antenna design, and scheduling. This tutorial provides a gentle introduction to PMBGAs with an overview of major research directions in this area. Strengths and weaknesses of different PMBGAs will be discussed, and suggestions will be provided to help practitioners choose the best PMBGA for their problem. The video of the presentation at the GECCO-2008 conference can be found at http://medal.cs.umsl.edu/blog/?p=293

TRANSCRIPT
Probabilistic Model-Building Genetic Algorithms
a.k.a. Estimation of Distribution Algorithms
a.k.a. Iterated Density Estimation Algorithms
Martin Pelikan
Missouri Estimation of Distribution Algorithms Laboratory (MEDAL)
Dept. of Math. and Computer Science
University of Missouri at St. Louis
pelikan@cs.umsl.edu
Copyright is held by the author/owner(s). http://medal.cs.umsl.edu/
[ last update: April 2008 ]
GECCO'08, July 12-16, 2008, Atlanta, Georgia, USA. ACM 978-1-60558-130-9/08/07.
Foreword

Motivation
Genetic and evolutionary computation (GEC) is popular. It works great on toy problems, but there are difficulties in practice: one must design new representations and operators, and tune parameters.
This talk
Discuss a promising direction in GEC. Combine machine learning and GEC. Create practical and powerful optimizers.
Martin Pelikan, Probabilistic Model-Building GAs
Overview

Introduction
Black-box optimization via probabilistic modeling.

Probabilistic Model-Building GAs
Discrete representation. Continuous representation. Computer programs (PMBGP). Permutations.

Conclusions
Problem Formulation

Input
What do potential solutions look like? How do we evaluate the quality of potential solutions?

Output
The best solution (the optimum).

Important
No additional knowledge about the problem is assumed.
Why View the Problem as a Black Box?

Advantages
Separates the problem definition from the optimizer. Easy to solve new problems. Economy argument.

Difficulties
Almost no prior problem knowledge. Problem specifics must be learned automatically. Noise, multiple objectives, interactive evaluation.
Representations Considered Here

Start with
Solutions are n-bit binary strings.

Later
Real-valued vectors. Program trees. Permutations.
Typical Situation

Previously visited solutions and their evaluations:

  #   Solution   Evaluation
  1   00100      1
  2   11011      4
  3   01101      0
  4   10111      3
Question: What solution should we generate next?
Many Answers

Hill climber
Start with a random solution. Flip the bit that improves the solution most. Finish when no more improvement is possible.

Simulated annealing
Introduce the Metropolis criterion.

Probabilistic model-building GAs
Inspiration from GAs and machine learning (ML).
Probabilistic Model-Building GAs

Current population:   11001 11101 01011 11000
Selected population:  11001 10101 01011 11000
Probabilistic model:  built from the selected population
New population:       01111 11001 11011 00111
Replace crossover and mutation with learning and sampling of a probabilistic model.
Other Names for PMBGAs

Estimation of distribution algorithms (EDAs) (Mühlenbein & Paass, 1996)
Iterated density estimation algorithms (IDEAs) (Bosman & Thierens, 2000)
What Models to Use?

Start with a simple example
Probability vector for binary strings.

Later
Dependency-tree models (COMIT). Bayesian networks (BOA). Bayesian networks with local structures (hBOA).
Probability Vector

Assume n-bit binary strings.
Model: probability vector p = (p1, ..., pn).
pi = probability of a 1 in position i.
Learn p: compute the proportion of 1s in each position.
Sample p: generate a 1 in position i with probability pi.
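The learn/sample loop above can be sketched in a few lines of Python (the population shown is illustrative, not one of the slides' examples):

```python
import random

def learn_prob_vector(selected):
    """Learn p: proportion of 1s in each position of the selected population."""
    n = len(selected[0])
    return [sum(s[i] for s in selected) / len(selected) for i in range(n)]

def sample(p):
    """Sample p: generate one candidate, bit i is 1 with probability p[i]."""
    return [1 if random.random() < p[i] else 0 for i in range(len(p))]

selected = [[1, 1, 0, 0, 1],
            [1, 0, 1, 0, 1],
            [0, 1, 0, 1, 1],
            [1, 1, 0, 0, 0]]
p = learn_prob_vector(selected)
print(p)  # [0.75, 0.75, 0.25, 0.25, 0.75]
```

Each new candidate is drawn independently, so a whole new population is just repeated calls to `sample(p)`.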
Example: Probability Vector
(Mühlenbein, Paass, 1996), (Baluja, 1994)

Current population:   11001 10101 01011 11000
Selected population:  11001 10101 01011 11000
Probability vector:   1.0 0.5 0.5 0.0 1.0
New population:       10101 10001 11101 11001
Probability Vector PMBGAs

PBIL (Baluja, 1995)
Incremental updates to the probability vector.

Compact GA (Harik, Lobo, Goldberg, 1998)
Also incremental updates, but a better analogy with populations.

UMDA (Mühlenbein, Paass, 1996)
What we showed here.

DEUM (Shakya et al., 2004)

All variants perform similarly.
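PBIL's incremental update can be sketched as follows; the learning rate of 0.1 is an illustrative value, not a setting from the slides:

```python
def pbil_update(p, best, rate=0.1):
    """PBIL-style update: shift each entry of the probability vector a
    fraction `rate` of the way toward the corresponding bit of the best
    solution found in the current generation."""
    return [(1 - rate) * pi + rate * bi for pi, bi in zip(p, best)]

p = [0.5, 0.5, 0.5]
p = pbil_update(p, [1, 0, 1])
# each entry moves 10% of the way toward the best solution's bit
```

Instead of re-estimating p from a selected population as in UMDA, PBIL nudges the same vector a little every generation.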
Probability Vector Dynamics

Bits that perform better get more copies, and are combined in new ways. But the context of each bit is ignored.

Example problem 1: Onemax

f(X1, X2, ..., Xn) = X1 + X2 + ... + Xn
Probability Vector on Onemax

[Figure: probability-vector entries (0 to 1) plotted against generation (0 to 50); all entries rise toward 1.]
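These dynamics can be reproduced with a minimal UMDA sketch; the population size, generation count, and truncation selection below are illustrative choices, not the exact settings behind the plot:

```python
import random

def onemax(x):
    """Onemax fitness: the number of 1s in the string."""
    return sum(x)

def umda(n=20, pop_size=100, gens=50, seed=1):
    """Minimal UMDA sketch: select the top half, estimate the probability
    vector, and resample the whole population from it."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=onemax, reverse=True)
        selected = pop[: pop_size // 2]  # truncation selection
        p = [sum(s[i] for s in selected) / len(selected) for i in range(n)]
        pop = [[1 if rng.random() < p[i] else 0 for i in range(n)]
               for _ in range(pop_size)]
    return max(pop, key=onemax)

best = umda()
print(onemax(best))  # typically reaches the optimum, n = 20
```

On onemax every bit is independently informative, so the probability-vector entries drift to 1 and the run converges quickly.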
Probability Vector: Ideal Scale-up

O(n log n) evaluations until convergence (Harik, Cantú-Paz, Goldberg, & Miller, 1997), (Mühlenbein, Schlierkamp-Voosen, 1993).

Other algorithms
Hill climber: O(n log n) (Mühlenbein, 1992). GA with uniform crossover: approx. O(n log n). GA with one-point crossover: slightly slower.
When Does the Prob. Vector Fail?

Example problem 2: Concatenated traps
Partition the input string into disjoint groups of 5 bits. Each group contributes via a trap function (ones = number of ones in the group):

trap(ones) = 5           if ones = 5
             4 - ones    otherwise

Concatenated trap = sum of the single traps.
Optimum: the string of all 1s.
Trap-5

[Figure: trap(u) as a function of the number of ones u, for u = 0 to 5; the value falls linearly from 4 at u = 0 to 0 at u = 4, then jumps to the optimum 5 at u = 5.]
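The trap function above translates directly into code; this sketch follows the definition from the previous slide:

```python
def trap5(block):
    """Deceptive trap on a 5-bit block: the optimum is 11111, but the
    fitness gradient points toward 00000 everywhere else."""
    ones = sum(block)
    return 5 if ones == 5 else 4 - ones

def concatenated_trap(x):
    """Sum of trap-5 over disjoint 5-bit groups of the string."""
    return sum(trap5(x[i:i + 5]) for i in range(0, len(x), 5))

print(concatenated_trap([1] * 10))  # 10 (both blocks at the optimum)
print(concatenated_trap([0] * 10))  # 8  (the deceptive attractor)
```

Note how a block with four 1s scores 0 while a block with no 1s scores 4: everything short of the optimum rewards flipping bits to 0.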
Probability Vector on Traps

[Figure: probability-vector entries (0 to 1) plotted against generation (0 to 50); the entries drift toward 0, away from the optimum.]
Why Failure?

Onemax: the optimum is the all-1s string, and 1 outperforms 0 on average.

Traps: the optimum is 11111, but
f(0****) = 2
f(1****) = 1.375

So single-bit statistics are misleading.
How to Fix It?

Consider 5-bit statistics instead of 1-bit ones. Then 11111 would outperform 00000.

Learn model
Compute p(00000), p(00001), ..., p(11111).

Sample model
Sample 5 bits at a time. Generate 00000 with probability p(00000), 00001 with p(00001), ...
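A sketch of this marginal-product fix, assuming we are told the 5-bit group boundaries (the hard part, learning the groups, comes later):

```python
import random
from collections import Counter

def learn_block_marginals(selected, k=5):
    """Estimate p(block) for each disjoint k-bit group of positions."""
    n = len(selected[0])
    models = []
    for start in range(0, n, k):
        counts = Counter(tuple(s[start:start + k]) for s in selected)
        total = len(selected)
        models.append({block: c / total for block, c in counts.items()})
    return models

def sample_blocks(models, rng=random):
    """Draw each k-bit group from its estimated marginal, then concatenate."""
    x = []
    for model in models:
        blocks, probs = zip(*model.items())
        x.extend(rng.choices(blocks, weights=probs)[0])
    return x

selected = [[1, 1, 1, 1, 1, 0, 0, 0, 0, 0]] * 4
models = learn_block_marginals(selected)
print(sample_blocks(models))  # [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
```

Because whole 5-bit blocks are sampled together, 11111 blocks survive selection intact instead of being broken up bit by bit.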
Correct Model on Traps: Dynamics

[Figure: probability of 11111 (0 to 1) plotted against generation (0 to 50); the probability rises to 1.]
Good News: Good Stats Work Great!

Optimum in O(n log n) evaluations. Same performance as on onemax!

Others
Hill climber: O(n^5 log n) = much worse. GA with uniform crossover: O(2^n) = intractable. GA with k-point crossover: O(2^n) (without tight linkage).
Challenge

If we could learn and use the relevant context for each position:
Find non-misleading statistics. Use those statistics as in the probability vector.

Then we could solve problems decomposable into statistics of order at most k with at most O(n^2) evaluations! And there are many such problems (Simon, 1968).
What's Next?

COMIT
Use tree models.

Extended compact GA
Cluster bits into groups.

Bayesian optimization algorithm (BOA)
Use Bayesian networks (more general).
Beyond Single Bits: COMIT
(Baluja, Davies, 1997)

Model:
  P(X=1) = 75%
  P(Y=1|X):  X=0: 30%,  X=1: 25%
  P(Z=1|X):  X=0: 86%,  X=1: 75%
How to Learn a Tree Model?

Mutual information:

I(Xi, Xj) = sum over a, b of P(Xi=a, Xj=b) log [ P(Xi=a, Xj=b) / (P(Xi=a) P(Xj=b)) ]

Goal
Find the tree that maximizes mutual information between connected nodes. This minimizes the Kullback-Leibler divergence.

Algorithm
Prim's algorithm for maximum spanning trees.
Prim's Algorithm

Start with a graph with no edges. Add an arbitrary node to the tree. Iterate: attach a new node to the current tree, preferring edges with large mutual information (greedy approach).

Complexity: O(n^2)
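A compact sketch of both steps, estimating pairwise mutual information from binary samples and growing the tree Prim-style (this is a didactic O(n^2 N) version, not an optimized implementation):

```python
import math

def mutual_information(data, i, j):
    """Estimate I(Xi; Xj) from binary samples (data is a list of bit lists)."""
    n = len(data)
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = sum(1 for s in data if s[i] == a and s[j] == b) / n
            p_a = sum(1 for s in data if s[i] == a) / n
            p_b = sum(1 for s in data if s[j] == b) / n
            if p_ab > 0:
                mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi

def max_spanning_tree(data):
    """Prim-style greedy tree over string positions, maximizing the mutual
    information of each added edge."""
    n = len(data[0])
    in_tree, edges = {0}, []
    while len(in_tree) < n:
        best = max(((u, v) for u in in_tree
                    for v in range(n) if v not in in_tree),
                   key=lambda e: mutual_information(data, e[0], e[1]))
        edges.append(best)
        in_tree.add(best[1])
    return edges

data = [[0, 0, 0], [0, 0, 1], [1, 1, 0], [1, 1, 1]]
edges = max_spanning_tree(data)
print(edges[0])  # (0, 1): the two perfectly correlated positions
```

In the example, positions 0 and 1 always agree while position 2 is independent, so the first edge chosen links 0 and 1.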
Variants of PMBGAs with Tree Models

COMIT (Baluja, Davies, 1997)
Tree models.

MIMIC (De Bonet, 1996)
Chain distributions.

BMDA (Pelikan, Mühlenbein, 1998)
Forest distribution (a set of independent trees).
Beyond Pairwise Dependencies: ECGA

Extended Compact GA (ECGA) (Harik, 1999). Consider groups of string positions. The model is a partition of the positions into groups, each with its own frequency table, e.g.:

  2-bit group:  00: 16%,  01: 45%,  10: 35%,  11: 4%
  1-bit group:  0: 86%,   1: 14%
  3-bit group:  000: 17%, 001: 2%, ..., 111: 24%
Learning the Model in ECGA

Start with each bit in a separate group. In each iteration, merge the two groups that yield the best improvement.
How to Compute Model Quality?

ECGA uses minimum description length (MDL). Minimize the number of bits to store model + data:

MDL(M, D) = D_Model + D_Data

Each frequency needs (0.5 log N) bits:

D_Model = sum over groups g in G of 2^(|g|-1) log N

Each solution X needs -log p(X) bits:

D_Data = -N sum over X of p(X) log p(X)
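The MDL score above can be computed directly from a population and a candidate grouping; this sketch uses the slides' formulas (base-2 logs, with p(X) factored over the groups):

```python
import math
from collections import Counter

def mdl_score(population, groups):
    """ECGA-style MDL score for a partition of positions into groups:
    model bits (each group stores 2^|g| frequencies at 0.5*log N bits
    each) plus data bits (-N * sum of p log p, per group)."""
    N = len(population)
    model_bits = sum(2 ** (len(g) - 1) * math.log2(N) for g in groups)
    data_bits = 0.0
    for g in groups:
        counts = Counter(tuple(s[i] for i in g) for s in population)
        for c in counts.values():
            p = c / N
            data_bits += -N * p * math.log2(p)
    return model_bits + data_bits

# With perfectly correlated bits, merging the positions into one group
# lowers the total description length.
pop = [[0, 0], [1, 1]] * 4
print(mdl_score(pop, [(0, 1)]), mdl_score(pop, [(0,), (1,)]))  # 14.0 22.0
```

The greedy merge step in ECGA simply tries all pairwise group merges and keeps the one that decreases this score the most.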
Sampling the Model in ECGA

Sample groups of bits at a time. Based on observed pr