from mating pool distributions to model overfitting

IntroductionSelection as a Source of Overfitting

Improving Model Accuracy

From Mating Pool Distributionsto Model Overfitting

Claudio F. Lima1 Fernando G. Lobo1 Martin Pelikan2

1Department of Electronics and Computer Science EngineeringUniversity of Algarve, Portugal

2Missouri Estimation of Distribution Algorithm LaboratoryUniversity of Missouri at St. Louis, USA

GECCO 2008, Atlanta, USA

Lima, Lobo, & Pelikan From Mating Pool Distributions to Model Overfitting



MotivationBOA: The Bayesian Optimization AlgorithmExperimental Setup

Probabilistic Modeling in EDAs

The coreEDA is defined by its model complexity (expressiveness).

The spectrumSimpler EDAs only learn the parameters of the model.More powerful EDAs learn both structure and parameters.

Bayesian EDAsSolve a broad class of problems efficiently.But oftentimes models do not exactly reflect the underlyingproblem structure.





Motivation

Importance of model accuracyModel-based efficiency enhancement techniques.

Substructural local search.Fitness estimation (surrogate fitness function).

Offline interpretation of probabilistic models.Bias structural learning in EDAs (Mark Hauschild’s talk).Design structure-aware variation operators.

Empirical findings with BOA: Tournament vs. truncationPopulation size requirements.Model quality difference.





Bayesian Optimization Algorithm (BOA)

BOAEDA which uses Bayesian networks (BNs).

Model selected solutions to generate new ones.Similar algorithms:

Estimation of Bayesian networks algorithm (EBNA).Learning factorized distribution algorithm (LFDA).

Learning BNsDecision trees express conditional probabilities.Greedy algorithm for learning.Scoring metrics considered: K2 and BIC.





Experimental Setup

Test problemTo investigate MSA we use a problem of known structure.Clear identification of:

Necessary dependencies→ successful tractability.Unnecessary dependencies→ poor model interpretability.

m − k trap problemConsists of m concatenated k -bit trap functions.For k ≥ 3, the model should be able to maintain k−orderstatistics, otherwise→ exponential scalability.k = 5 used in experiments.





Ideal Model for the m − k trap problem

Joint distribution in BNsp(Xa,Xb,Xc ,Xd ) = p(Xa)p(Xb|Xa)p(Xc |Xa,Xb)p(Xd |Xa,Xb,Xc)

Clique between variables of the same trap function.While between different traps there are no dependencies.





Measuring Model Structural Accuracy

Definition 1The model structural accuracy (MSA) is defined as the ratio ofcorrect edges over the total number of edges in the BN.

Definition 2An edge is correct if it connects two variables that are linkedaccording to the objective function definition.

Definition 3Model overfitting is defined as the inclusion of incorrect (orunnecessary) edges to the BN.




Influence of Selection in Model AccuracySelection: The Mating-Pool Distribution Generator

Selection Methods

Selection methods consideredTournament selection. Randomly pick s individuals andchoose the best. Repeat n times.Truncation selection. Choose the best τ proportion of thepopulation.

Equivalent selection intensitySelection intensity, I 0.56 0.84 1.03 1.16 1.35 1.54 1.87Tournament size, s 2 3 4 5 7 10 20

Truncation threshold, τ(%) 66 47 36 30 22 15 8





Selection-Metric Comparison

0.5 1 1.5 20

0.2

0.4

0.6

0.8

1

Selection intensity, I

Mod

el S

truc

tura

l Acc

urac

y, M

SA

Tournament w/ K2Tournament w/ BICTruncation w/ K2Truncation w/ BIC

0.5 1 1.5 20

1

2

3

4

5

6

7

x 105

Selection intensity, I

Num

ber

of fu

nctio

n ev

alua

tions

, nfe

Tournament w/ K2Tournament w/ BICTruncation w/ K2Truncation w/ BIC

Observations1 Truncation performs better than tournament selection.2 K2 metric performs better than BIC metric.





A Simple Analysis of Tournament Selection

Probability of an individual with rank i to win a tournament

pi =s−1∏j=1

i − jn − j

, for s ≥ 2. (1)

Number of copies in the mating pool

ci = s pi . (2)

Which can be approximated by a power distribution with p.d.f.

f (x) = α xα−1, 0 < x ≤ 1, α = s, x = i/n. (3)





A Simple Analysis of Truncation Selection

Number of copies in the mating pool

ci =

{0, if i < n (1− (τ/100))1, otherwise.

(4)





Distribution of the Expected Number of Copies

0 0.2 0.4 0.6 0.8 10

5

10

15

20

Rank, i (in percentile)

Exp

ecte

d nu

mbe

r of

cop

ies,

ci

s = 2s = 3s = 4s = 5s = 7s = 10s = 20

0 0.2 0.4 0.6 0.8 10

1

Rank, i (in percentile)

Exp

ecte

d nu

mbe

r of

cop

ies,

ci

τ = 66%

τ = 47%

τ = 36%

τ = 30%

τ = 22%

τ = 15%

τ = 8%

Tournament and truncation selection differ in:1 Window size.2 Distribution shape.





Mating Pool Distribution: An Example

Functionfmk−trap = f1(X1X2X3) + f2(X4X5X6) + f3(X7X8X9)

ExampleRank Individual Copies

8 111 100 000 1 4 87 111 111 101 1 3 46 111 000 110 1 2 25 000 001 000 1 1 14 111 010 011 0 0 03 000 101 001 0 0 02 000 110 101 0 0 01 101 000 101 0 0 0

Mating pool distributions:Uniform Linear Exponential

X6X7 freq. X6X7 freq. X6X7 freq.00 0.25 00 0.4 00 0.5301 0.25 01 0.2 01 0.1310 0.25 10 0.1 10 0.0711 0.25 11 0.3 11 0.27





Mating Pool w/ Non-Uniform Distribution

What’s good about it?

Detection of existent patterns is achieved with less data.Tournament selection requires smaller population sizesthan truncation.

What’s bad about it?Irrelevant features can also be detected as patterns.

Excessive complexity→ model overfitting.

So what?Recognize that we are learning from imbalanced data sets.Change scoring metric accordingly!




Modeling Metric Gain When OverfittingAdaptive Scoring Metric

Improving Model Accuracy for Tournament

Agenda1 Compute how the power-law-based frequencies differ from

uniform ones in the mating pool.2 Use computed frequency deviation to estimate scoring

metric gain due to overfitting.3 Change the complexity penalty to counterbalance the

misleading metric gain.





Deviation of power-law-based frequencies

Deviation is linear w.r.t. tournament sizeExpected market share of top-ranked individuals afterselection is given by the (right-side area of the) c.d.f. of thepower distribution.

F (x) = xs, 0 < x < 1, s ≥ 2. (5)

In the worst case, market share grows linearly withtournament size.Thus, misleading frequencies deviate linearly from the“true” ones (uniform distribution).More details in the paper.





Computing Metric Gain

Consider the decision of adding an edge from X2 to X1

Metric gain must be calculated:Gmetric = ScoreAfter − ScoreBefore − ComplexityPenaltyUniform mating pool revealsp00 = p01 = p10 = p11 = 0.25→ Gmetric < 0.Simulate tournament mating pool by assigning

p00 ≈ 0.25 + ∆(s − 1),p01 ≈ 0.25−∆(s − 1),p10 ≈ 0.25−∆(s − 1),p11 ≈ 0.25 + ∆(s − 1).

Where ∆ is related to the proportion of top-individualswhich lead to overfitting (more info in the paper).





Computing Metric Gain (2)

Approximated metric gain

G′ ≈ 2 (0.25−∆(s − 1)) log2 (0.25−∆(s − 1)) + 1. (6)

100

101

102

10−4

10−3

10−2

10−1

100

101

102

Tournament size, s

Met

ric g

ain,

G’ B

IC

ApproximationO(s)

O(s2)





Comparison of Different Penalty Corrections

Complexity penalty for the K2 metric

p(B) = 2−0.5cs log2(n)P`

i=1 |Li | → cs = {1,√

s, s, and s log2(s)}

5 10 15 200

0.2

0.4

0.6

0.8

1

Tournament size, s

Mod

el S

truc

tura

l Acc

urac

y, M

SA

cs=1

cs=sqrt(s)

cs=s

cs=s*log

2(s)

5 10 15 200

1

2

3

4

5x 10

5

Tournament size, s

Num

ber

of fu

nctio

n ev

alua

tions

, nfe

c

s=1

cs=sqrt(s)

cs=s

cs=s*log

2(s)





Results for the s−penalty (cs = s)

5 10 15 200.9

0.92

0.94

0.96

0.98

1

Tournament size, s

Mod

el S

truc

tura

l Acc

urac

y, M

SA

l = 40l = 60l = 80l = 100

5 100.992

0.994

0.996

0.998

1

5 10 15 200

2

4

6

8

10

12

14x 10

5

Tournament size, s

Num

ber

of fu

nctio

n ev

alua

tions

, nfe

l = 40l = 60l = 80l = 100

RemarkModel structural accuracy is superior to 99%!





Results for Truncation with the Standard Penalty

102030405060700.9

0.92

0.94

0.96

0.98

1

Truncation threshold, τ

Mod

el S

truc

tura

l Acc

urac

y, M

SA

l = 40l = 60l = 80l = 100

102030405060700

0.5

1

1.5

2x 10

6

Truncation threshold, τ

Num

ber

of fu

nctio

n ev

alua

tions

, nfe

l = 40l = 60l = 80l = 100

RemarkGood results, but not as good as for tournament selectionwith the s−penalty.





Summary & Conclusions

Selection addressed as a source of overfitting.Influence of selection method in BN learning:

Truncation selection→ uniform mating pool is moreappropriate for learning (standard scoring metrics).Tournament selection→ non-uniform mating pool leads tomodel overfitting.

Scoring metric adapted to the power distribution of themating pool generated by tournament selection.

Model accuracy has been considerably improved!

Improved model interpretability to assist efficiencyenhancement techiniques and human researchers.





From Mating Pool Distributionsto Model Overfitting

Claudio F. Lima1 Fernando G. Lobo1 Martin Pelikan2

1Department of Electronics and Computer Science EngineeringUniversity of Algarve, Portugal

2Missouri Estimation of Distribution Algorithm LaboratoryUniversity of Missouri at St. Louis, USA

GECCO 2008, Atlanta, USA





Scoring Metrics

(Bayesian-Dirichlet) K2 metric

K 2(B) = p(B)∏̀i=1

∏l∈Li

1Γ(mi(l) + 2)

∏xi

Γ(mi(xi , l) + 1), (7)

where p(B) = 2−0.5 log2(n)P`

i=1 |Li |.

Bayesian information criteria (BIC) metric

BIC(B) =∑̀i=1

∑l∈Li

∑xi

(mi(xi , l) log2

mi(xi , l)mi(l)

)− |Li |

log2(n)

2

.

(8)


from mating pool distributions to model overfitting

Technology

model accuracy

source of overtting