# EDA tutorial

Post on 12-Nov-2014


## Description

Probabilistic model-building genetic algorithms (PMBGAs), also called estimation of distribution algorithms (EDAs) and iterated density estimation algorithms (IDEAs), replace the traditional variation operators of genetic and evolutionary algorithms by (1) building a probabilistic model of promising solutions and (2) sampling the built model to generate new candidate solutions. Replacing traditional crossover and mutation with model building and sampling enables the use of machine learning techniques for automatic discovery of problem regularities, and the exploitation of these regularities for effective exploration of the search space. Using machine learning in optimization enables the design of optimization techniques that automatically adapt to the given problem. There are many successful applications of PMBGAs, for example Ising spin glasses in 2D and 3D, graph partitioning, MAXSAT, feature subset selection, forest management, groundwater remediation design, telecommunication network design, antenna design, and scheduling.

This tutorial provides a gentle introduction to PMBGAs with an overview of major research directions in this area. Strengths and weaknesses of different PMBGAs are discussed, and suggestions are provided to help practitioners choose the best PMBGA for their problem. The video of the presentation at the GECCO-2008 conference can be found at http://medal.cs.umsl.edu/blog/?p=293

## Transcript

Probabilistic Model-Building Genetic Algorithms, a.k.a. Estimation of Distribution Algorithms, a.k.a. Iterated Density Estimation Algorithms

Martin Pelikan, Missouri Estimation of Distribution Algorithms Laboratory (MEDAL), Dept. of Math. and Computer Science, University of Missouri at St. Louis, pelikan@cs.umsl.edu. Copyright is held by the author/owner(s). http://medal.cs.umsl.edu/

[ last update: April 2008 ]

GECCO'08, July 12-16, 2008, Atlanta, Georgia, USA. ACM 978-1-60558-130-9/08/07.

Foreword

Motivation: Genetic and evolutionary computation (GEC) is popular. It works great on toy problems, but there are difficulties in practice: one must design new representations and operators, tune parameters, and so on.

This talk: discusses a promising direction in GEC, combining machine learning and GEC to create practical and powerful optimizers.

Martin Pelikan, Probabilistic Model-Building GAs


Overview

- Introduction: black-box optimization via probabilistic modeling.

- Probabilistic Model-Building GAs: discrete representation, continuous representation, computer programs (PMBGP), permutations.

- Conclusions


Problem Formulation

Input: What do potential solutions look like? How do we evaluate the quality of potential solutions?

Output: Best solution (the optimum).

Important: No additional knowledge about the problem is assumed.


Why View the Problem as a Black Box?

Advantages: separates the problem definition from the optimizer; easy to solve new problems; the economy argument.

Difficulties: almost no prior problem knowledge; problem specifics must be learned automatically; noise, multiple objectives, interactive evaluation.


Representations Considered Here

Start with: solutions are n-bit binary strings.

Later: real-valued vectors, program trees, permutations.


Typical Situation

Previously visited solutions and their evaluations:

| # | Solution | Evaluation |
|---|----------|------------|
| 1 | 00100 | 1 |
| 2 | 11011 | 4 |
| 3 | 01101 | 0 |
| 4 | 10111 | 3 |

Question: What solution should we generate next?


Many Answers

Hill climber: start with a random solution, flip the bit that improves the solution most, and finish when no more improvement is possible.
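The greedy bit-flip hill climber just described can be sketched as follows (a minimal illustration; the function name and the `evaluate` callback are placeholders, not from the tutorial):

```python
import random

def hill_climb(evaluate, n, rng=random):
    """Start with a random n-bit solution, repeatedly flip the single bit
    that improves the evaluation most, and stop at a local optimum."""
    sol = [rng.randint(0, 1) for _ in range(n)]
    best = evaluate(sol)
    while True:
        best_flip, best_val = None, best
        for i in range(n):
            sol[i] ^= 1                 # tentatively flip bit i
            val = evaluate(sol)
            sol[i] ^= 1                 # undo the flip
            if val > best_val:
                best_flip, best_val = i, val
        if best_flip is None:
            return sol, best            # no improving flip: local optimum
        sol[best_flip] ^= 1
        best = best_val
```

On onemax (`evaluate=sum`), every zero bit offers an improving flip, so this climber always reaches the all-ones string.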

Simulated annealing: introduce the Metropolis acceptance criterion.

Probabilistic model-building GAs: inspiration from GAs and machine learning (ML).


Probabilistic Model-Building GAs

- Current population: 11001, 11101, 01011, 11000
- Selected population: 11001, 10101, 01011, 11000
- Probabilistic model: learned from the selected population, then sampled.
- New population: 01111, 11001, 11011, 00111

Replace crossover + mutation with learning and sampling a probabilistic model.


Other Names for PMBGAs

- Estimation of distribution algorithms (EDAs) (Mühlenbein & Paass, 1996)
- Iterated density estimation algorithms (IDEA) (Bosman & Thierens, 2000)


What Models to Use?

Start with a simple example: a probability vector for binary strings.

Later: dependency tree models (COMIT), Bayesian networks (BOA), Bayesian networks with local structures (hBOA).


Probability Vector

Assume n-bit binary strings. Model: probability vector $p = (p_1, \ldots, p_n)$, where $p_i$ is the probability of a 1 in position $i$.

- Learn p: compute the proportion of 1s in each position of the selected population.
- Sample p: sample a 1 in position $i$ with probability $p_i$.
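The learn and sample steps just described can be sketched in a few lines (function names are illustrative; solutions are lists of 0/1 ints):

```python
import random

def learn(selected):
    """Proportion of 1s in each position of the selected strings."""
    n = len(selected[0])
    return [sum(s[i] for s in selected) / len(selected) for i in range(n)]

def sample(p, rng=random):
    """Sample one string: bit i is 1 with probability p[i]."""
    return [1 if rng.random() < pi else 0 for pi in p]
```

One generation of a probability-vector EDA is then: select, `learn`, and call `sample` once per new individual.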


Example: Probability Vector (Mühlenbein, Paass, 1996), (Baluja, 1994)

- Current population: 11001, 10101, 01011, 11000
- Selected population: 11001, 10101, 01011, 11000
- Probability vector: 1.0, 0.5, 0.5, 0.0, 1.0
- New population: 10101, 10001, 11101, 11001


Probability Vector PMBGAs

- PBIL (Baluja, 1995): incremental updates to the probability vector.
- Compact GA (cGA) (Harik, Lobo, Goldberg, 1998): also incremental updates, but with a better analogy to populations.
- UMDA (Mühlenbein, Paass, 1996): what we showed here.
- DEUM (Shakya et al., 2004).

All variants perform similarly.
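PBIL's incremental update can be sketched as a small step of every probability-vector entry toward the corresponding bit of the best sampled solution (the learning-rate value here is an illustrative choice, not one from the tutorial):

```python
def pbil_update(p, best, rate=0.1):
    """Shift each entry of the probability vector p a small step
    toward the matching bit of the best solution found this cycle."""
    return [(1 - rate) * pi + rate * bi for pi, bi in zip(p, best)]
```

With `rate=1.0` this degenerates to copying the best solution; small rates keep a memory of earlier generations.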


Probability Vector Dynamics

Bits that perform better get more copies and are combined in new ways, but the context of each bit is ignored.

Example problem 1: Onemax

$$f(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} X_i$$

Probability Vector on Onemax

[Figure: probability vector entries vs. generation (0 to 50); on onemax, all entries rise toward 1.]

Probability Vector: Ideal Scale-up

O(n log n) evaluations until convergence (Harik, Cantú-Paz, Goldberg, & Miller, 1997) (Mühlenbein, Schlierkamp-Voosen, 1993).

Other algorithms:

- Hill climber: O(n log n) (Mühlenbein, 1992)
- GA with uniform crossover: approx. O(n log n)
- GA with one-point crossover: slightly slower


When Does the Prob. Vector Fail?

Example problem 2: Concatenated traps. Partition the input string into disjoint groups of 5 bits. Each group contributes via trap (ones = number of ones in the group):

$$\mathrm{trap}(\mathrm{ones}) = \begin{cases} 5 & \text{if } \mathrm{ones} = 5 \\ 4 - \mathrm{ones} & \text{otherwise} \end{cases}$$

Concatenated trap = sum of the single traps. Optimum: the string 111…1.
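The concatenated trap function above, written out directly (a straightforward transcription of the slide's definition; function names are illustrative):

```python
def trap5(bits):
    """Trap of order 5: deceptive everywhere except the optimum 11111."""
    ones = sum(bits)
    return 5 if ones == 5 else 4 - ones

def concat_traps(x):
    """Concatenated traps: partition x into disjoint 5-bit groups and sum
    the trap contribution of each group (len(x) assumed a multiple of 5)."""
    return sum(trap5(x[i:i + 5]) for i in range(0, len(x), 5))
```

Note the deception: 11110 scores 0 while 00000 scores 4, so the gradient within a group points away from the optimum.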

Trap-5

[Figure: trap(u) as a function of the number of ones u; the value decreases from 4 at u = 0 to 0 at u = 4, then jumps to 5 at u = 5.]

Probability Vector on Traps

[Figure: probability vector entries vs. generation (0 to 50); on concatenated traps, the entries are driven toward 0, away from the optimum.]

Why Failure?

Onemax: the optimum is 111…1, and a 1 outperforms a 0 on average in every position.

Traps: the optimum is 11111, but on average f(0****) = 2 while f(1****) = 1.375.

So single-bit statistics are misleading.


How to Fix It?

Consider 5-bit statistics instead of 1-bit ones. Then 11111 would outperform 00000.

- Learn the model: compute p(00000), p(00001), …, p(11111).
- Sample the model: sample 5 bits at a time; generate 00000 with probability p(00000), 00001 with p(00001), and so on.
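Learning and sampling k-bit block statistics, as described above, can be sketched like this (an illustration assuming the string length is a multiple of k and the group boundaries are known; names are not from the tutorial):

```python
import random
from collections import Counter

def learn_block_model(selected, k=5):
    """For each disjoint k-bit group, estimate p(block) as the observed
    frequency of that block in the selected population."""
    n = len(selected[0])
    model = []
    for start in range(0, n, k):
        counts = Counter(tuple(s[start:start + k]) for s in selected)
        total = len(selected)
        model.append({block: c / total for block, c in counts.items()})
    return model

def sample_block_model(model, rng=random):
    """Sample k bits at a time: draw each group's block with its probability."""
    out = []
    for dist in model:
        blocks, probs = zip(*dist.items())
        out.extend(rng.choices(blocks, weights=probs)[0])
    return out
```

Unlike the probability vector, this model can only ever regenerate blocks that were actually observed, which is exactly what preserves 11111 on traps.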


Correct Model on Traps: Dynamics

[Figure: probability of 11111 in each group vs. generation (0 to 50); the probabilities converge to 1.]

Good News: Good Stats Work Great!

The optimum is found in O(n log n) evaluations, the same performance as on onemax! Others:

- Hill climber: O(n^5 log n), much worse.
- GA with uniform crossover: O(2^n), intractable.
- GA with k-point crossover: O(2^n) (without tight linkage).


Challenge

If we could learn and use the relevant context for each position, that is, find non-misleading statistics and use them as in the probability vector, then we could solve problems decomposable into statistics of order at most k with at most O(n^2) evaluations! And there are many such problems (Simon, 1968).


What's Next?

- COMIT: use tree models.
- Extended compact GA: cluster bits into groups.
- Bayesian optimization algorithm (BOA): use Bayesian networks (more general).


Beyond Single Bits: COMIT (Baluja, Davies, 1997)

Model (a tree over string positions), for example with P(X=1) = 75% and:

| X | P(Y=1\|X) | P(Z=1\|X) |
|---|-----------|-----------|
| 0 | 30% | 86% |
| 1 | 25% | 75% |

How to Learn a Tree Model?

Mutual information:

$$I(X_i, X_j) = \sum_{a,b} P(X_i = a, X_j = b) \log \frac{P(X_i = a, X_j = b)}{P(X_i = a)\, P(X_j = b)}$$

Goal: find the tree that maximizes mutual information between connected nodes. This minimizes the Kullback-Leibler divergence.

Algorithm: Prim's algorithm for maximum spanning trees.
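The mutual-information estimate can be sketched from empirical frequencies (a simple illustration; zero-count joint terms are skipped since they contribute nothing to the sum):

```python
import math
from collections import Counter

def mutual_information(strings, i, j):
    """Estimate I(X_i, X_j) from the empirical joint and marginal
    frequencies of positions i and j in a population of strings."""
    n = len(strings)
    joint = Counter((s[i], s[j]) for s in strings)
    pi = Counter(s[i] for s in strings)
    pj = Counter(s[j] for s in strings)
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        mi += p_ab * math.log(p_ab / ((pi[a] / n) * (pj[b] / n)))
    return mi
```

Independent positions give a value near 0; perfectly correlated binary positions give log 2.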


Prim's Algorithm

Start with a graph with no edges and add an arbitrary node to the tree. Then iterate: attach a new node to the current tree, preferring edges with large mutual information (a greedy approach).

Complexity: O(n^2).
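A sketch of Prim's algorithm for a maximum spanning tree over a complete graph, with the pairwise mutual informations supplied as a symmetric weight matrix (an O(n^2) illustration, not the tutorial's own code):

```python
def prim_max_tree(weights):
    """Grow a maximum spanning tree one node at a time, always attaching
    the node connected to the tree by the heaviest edge. `weights` is a
    symmetric n-by-n matrix; returns a list of (parent, child) edges."""
    n = len(weights)
    edges = []
    # best[v] = (weight, tree node) of the heaviest edge from v into the tree
    best = {v: (weights[0][v], 0) for v in range(1, n)}
    while best:
        v = max(best, key=lambda u: best[u][0])
        w, parent = best.pop(v)
        edges.append((parent, v))
        for u in best:                      # relax edges from the new node
            if weights[v][u] > best[u][0]:
                best[u] = (weights[v][u], v)
    return edges
```

Running it on mutual-information weights yields the Chow-Liu-style dependency tree used by COMIT.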


Variants of PMBGAs with Tree Models

- COMIT (Baluja, Davies, 1997): tree models.
- MIMIC (De Bonet, 1996): chain distributions.
- BMDA (Pelikan, Mühlenbein, 1998): forest distributions (independent trees or a single tree).


Beyond Pairwise Dependencies: ECGA

Extended Compact GA (ECGA) (Harik, 1999). Consider groups of string positions; the model stores a frequency table for each group. For example, a model with a 2-bit group, a 1-bit group, and a 3-bit group:

| Block | Freq. |
|-------|-------|
| 00 | 16% |
| 01 | 45% |
| 10 | 35% |
| 11 | 4% |

| Block | Freq. |
|-------|-------|
| 0 | 86% |
| 1 | 14% |

| Block | Freq. |
|-------|-------|
| 000 | 17% |
| 001 | 2% |
| … | … |
| 111 | 24% |

Learning the Model in ECGA

Start with each bit in a separate group. Each iteration merges the two groups whose merge best improves the model-quality score.
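The greedy merging loop can be sketched generically, with the model-quality criterion (e.g. an MDL score) passed in as a callback (`score` and the overall structure are illustrative assumptions, lower score taken as better):

```python
def greedy_group_merge(n, score):
    """ECGA-style model search sketch: start with every bit in its own
    group and repeatedly merge the pair of groups that most improves
    `score` (a function mapping a list of groups to a float, lower is
    better); stop when no merge helps."""
    groups = [[i] for i in range(n)]
    current = score(groups)
    while True:
        best_pair, best_val = None, current
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                candidate = [g for k, g in enumerate(groups) if k not in (a, b)]
                candidate.append(groups[a] + groups[b])
                val = score(candidate)
                if val < best_val:
                    best_pair, best_val = (a, b), val
        if best_pair is None:
            return groups           # no merge improves the score
        a, b = best_pair
        merged = groups[a] + groups[b]
        groups = [g for k, g in enumerate(groups) if k not in (a, b)]
        groups.append(merged)
        current = best_val
```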


How to Compute Model Quality?

ECGA uses minimum description length (MDL). Minimize the number of bits needed to store the model plus the data:

$$\mathrm{MDL}(M, D) = D_{\mathrm{Model}} + D_{\mathrm{Data}}$$

Each frequency needs $0.5 \log N$ bits:

$$D_{\mathrm{Model}} = \sum_{g \in G} 2^{|g|-1} \log N$$

Each solution $X$ needs $-\log p(X)$ bits:

$$D_{\mathrm{Data}} = -N \sum_{X} p(X) \log p(X)$$
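The two description-length terms can be computed directly from the slide's formulas (a sketch using base-2 logarithms, which the slide leaves unspecified; the data term uses the fact that p(X) factorizes over the group marginals):

```python
import math
from collections import Counter

def mdl(groups, population):
    """MDL score of a marginal-product model:
    model cost  = sum over groups g of 2^(|g|-1) * log2(N) bits
                  (2^|g| frequencies at 0.5*log2(N) bits each);
    data cost   = N times the summed entropy of the group marginals."""
    N = len(population)
    d_model = sum(2 ** (len(g) - 1) * math.log2(N) for g in groups)
    d_data = 0.0
    for g in groups:
        counts = Counter(tuple(x[i] for i in g) for x in population)
        d_data -= N * sum((c / N) * math.log2(c / N) for c in counts.values())
    return d_model + d_data
```

For a population where two bits always agree, grouping them together yields a lower MDL score than modeling them independently, which is what drives the greedy merging.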


Sampling the Model in ECGA

Sample groups of bits at a time, based on the observed probabilities of each group's blocks.