Towards Billion Bit Optimization via Parallel Estimation of Distribution Algorithm


DESCRIPTION

This paper presents a highly efficient, fully parallelized implementation of the compact genetic algorithm to solve very large-scale problems with millions to billions of variables. It presents principled results demonstrating the scalable solution of a difficult test function on instances of over a billion variables using a parallel implementation of the compact genetic algorithm (cGA). The problem addressed is a noisy, blind problem over a vector of binary decision variables. Noise equal to up to a tenth of the variance of the deterministic objective function is added, making it difficult for simple hillclimbers to find the optimal solution. The compact GA, on the other hand, is able to find the optimum in the presence of noise quickly, reliably, and accurately, and the solution scalability follows known convergence theories. These results on the noisy problem, together with other results on problems involving varying modularity, hierarchy, and overlap, foreshadow the routine solution of billion-variable problems across the landscape of search problems.

TRANSCRIPT

Page 1: Towards billion bit optimization via parallel estimation of distribution algorithm

Towards Billion Bit Optimization via Efficient Estimation of Distribution Algorithms

Kumara Sastry 1,2, David E. Goldberg 1, Xavier Llorà 1,3

1 Illinois Genetic Algorithms Laboratory (IlliGAL)
2 Materials Computation Center (MCC)
3 National Center for Supercomputing Applications (NCSA)
University of Illinois at Urbana-Champaign, Urbana, IL 61801

[email protected], [email protected], [email protected]
http://www.illigal.uiuc.edu

Supported by AFOSR FA9550-06-1-0096 and NSF DMR 03-25939. Computational results were obtained using CSE’s Turing cluster.

Page 2: Towards billion bit optimization via parallel estimation of distribution algorithm

2

Billion-Bit Optimization?

Strides w/ genetic algorithm (GA) theory/practice:
Solving large, hard problems in principled way.
Moving to practice in important problem domains.

Still, GA boobirds claim: (1) no theory, (2) too slow, and (3) just voodoo.

How can we demonstrate the results achieved so far in a dramatic way?

DEG lunch questions: A million? Sure. A billion? Maybe.

A naïve GA approach/implementation goes nowhere:
~100 terabytes of memory for population storage.
~2^72 random number calls.
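As a rough order-of-magnitude check on the memory figure (a sketch only; it uses the population-sizing scaling n ~ l^0.5 log l discussed later, and the constants are assumptions):

```latex
% For l = 10^9 bits, population sizing of order \sqrt{l}\,\ln l gives roughly
n \approx \sqrt{10^{9}} \cdot \ln\!\left(10^{9}\right) \approx 3.2\times10^{4} \times 21 \approx 6.5\times10^{5} \text{ individuals}
% Storing that population explicitly requires about
n \cdot l \approx 6.5\times10^{5} \times 10^{9} \text{ bits} \approx 6.5\times10^{14} \text{ bits} \approx 80 \text{ terabytes}
```

That is the same order as the ~100 terabytes quoted above.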

Page 3: Towards billion bit optimization via parallel estimation of distribution algorithm

3

Roadmap

Motivation

Robust, scalable, and efficient GA designs

Toward billion-variable optimization:
Theory Keys
Implementation Keys
Efficiency Keys
Results

Why does this matter in practice?

Challenges to using this in the real world.

Summary and Conclusions

Page 4: Towards billion bit optimization via parallel estimation of distribution algorithm

4

Three Os and Million/Billion Decisions

The Os all have many decisions to make:

Nano, Bio, and Info

Modern systems are increasingly complex:
~10^5 parts in a modern automobile.
~10^7 parts in a commercial jetliner.

Increased complexity increases appetite for large optimization.

Will be driven toward routine million/billion variable problems.

“We get the warhead and then hold the world ransom for... 1 MILLION dollars!”

Page 5: Towards billion bit optimization via parallel estimation of distribution algorithm

5

Competent and Efficient GAs

Robust, scalable, and efficient GA designs are available.

Competence: Solve hard problems quickly, reliably, and accurately (Intractable to tractable).

Efficiency: Develop speedup procedures (tractability to practicality).

Principled design: [Goldberg, 2002]

Relax rigor, emphasize scalability/quality.
Use problem decomposition.
Use facetwise models and patchquilt integration using dimensional analysis.
Test algorithms on adversarial problems.

Page 6: Towards billion bit optimization via parallel estimation of distribution algorithm

6

Aiming for a Billion

Theory & algorithms are in place.
Focus on key theory, implementation, & efficiency enhancements.

Theory keys:
Problem difficulty.
Parallelism.

Implementation key: compact GA.

Efficiency keys:
Various speedups.
Memory savings.

Results on a billion-variable noisy OneMax.

Page 7: Towards billion bit optimization via parallel estimation of distribution algorithm

7

Theory Key 1: Master-Slave Linear Speedup

Speed-up with np processors (population size n, fitness-evaluation time Tf, per-slave communication time Tc):
S(np) = n·Tf / (n·Tf/np + np·Tc)

Maximum speed-up at np* = sqrt(n·Tf / Tc)

[Cantú-Paz & Goldberg, 1997; Cantú-Paz, 2000]

Near-linear speed-up until np approaches np*
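As a quick sanity check on those two expressions (a sketch using the symbols just defined, not material from the slide), the optimum follows from minimizing the parallel time per generation:

```latex
T_p(n_p) = \frac{n\,T_f}{n_p} + n_p\,T_c, \qquad
\frac{dT_p}{dn_p} = -\frac{n\,T_f}{n_p^{2}} + T_c = 0
\;\Longrightarrow\;
n_p^{*} = \sqrt{\frac{n\,T_f}{T_c}}, \qquad
S(n_p^{*}) = \frac{n\,T_f}{2\sqrt{n\,T_f\,T_c}} = \frac{n_p^{*}}{2}
```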

Page 8: Towards billion bit optimization via parallel estimation of distribution algorithm

8

Theory Key 2: Noise Covers Most Problems

Adversarial problem design [Goldberg, 2002]

Blind noisy OneMax

[Diagram: facets of adversarial problem difficulty — deception, scaling, noise, fluctuation]
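Written out, the test function is OneMax with additive Gaussian noise; the inequality reflects the description above (noise variance up to a tenth of the deterministic fitness variance), and the notation here is mine rather than the slide's:

```latex
f(x) = \sum_{i=1}^{\ell} x_i + \varepsilon, \qquad
\varepsilon \sim \mathcal{N}\!\left(0, \sigma_N^2\right), \qquad
\sigma_N^2 \le 0.1\,\sigma_f^2, \qquad x \in \{0,1\}^{\ell}
```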

Page 9: Towards billion bit optimization via parallel estimation of distribution algorithm

9

Implementation Key: Compact GA

Simplest probabilistic model building GA [Harik, Lobo & Goldberg, 1997; Baluja, 1994; Mühlenbein & Paaß, 1996]

Represent the population by a probability vector (the probability that the ith bit is 1).

Replace recombination with probabilistic sampling

Selectionist scheme

New population evolution through probability updates

Equivalent to GA with steady-state tournament selection and uniform crossover

Page 10: Towards billion bit optimization via parallel estimation of distribution algorithm

10

Compact Genetic Algorithm (cGA)

Random initialization: Set probabilities to 0.5

Model Sampling: Generate two candidate solutions by sampling the probability vector

Evaluation: Evaluate the fitness of two sampled solutions

Selection: Select the best among the sampled solutions

Probabilistic model update: Increase the probability of each winning allele by 1/n, where n is the virtual population size.
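A minimal single-processor sketch of the five steps above, in C; the problem size, virtual population size, iteration budget, and the plain (noise-free) OneMax fitness are illustrative assumptions, not the paper's settings.

```c
/* Minimal compact GA (cGA) sketch of the five steps above.
 * Illustrative only: the experiments in the paper add Gaussian noise
 * to this fitness and use far larger problem sizes. */
#include <stdio.h>
#include <stdlib.h>

#define L 1000          /* number of binary variables (assumed) */
#define N 1000.0        /* virtual population size (assumed)    */
#define ITERS 200000    /* iteration budget (assumed)           */

static double frand(void) { return rand() / (RAND_MAX + 1.0); }

static int fitness(const int *x) {       /* OneMax: count the ones */
    int i, sum = 0;
    for (i = 0; i < L; i++) sum += x[i];
    return sum;
}

int main(void) {
    static double p[L];                  /* probability vector     */
    static int a[L], b[L];               /* two sampled candidates */
    int i, t;

    for (i = 0; i < L; i++) p[i] = 0.5;  /* 1. random initialization */

    for (t = 0; t < ITERS; t++) {
        for (i = 0; i < L; i++) {        /* 2. model sampling        */
            a[i] = frand() < p[i];
            b[i] = frand() < p[i];
        }
        int fa = fitness(a), fb = fitness(b);   /* 3. evaluation     */
        int *win  = (fa >= fb) ? a : b;         /* 4. selection      */
        int *lose = (fa >= fb) ? b : a;
        for (i = 0; i < L; i++)          /* 5. update model by 1/n   */
            if (win[i] != lose[i])
                p[i] += (win[i] ? 1.0 : -1.0) / N;
    }
    for (i = 0; i < L; i++) putchar(p[i] > 0.5 ? '1' : '0');
    putchar('\n');
    return 0;
}
```

The fixed iteration count stands in for the usual stopping rule of iterating until every entry of the probability vector has converged to 0 or 1.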

Page 11: Towards billion bit optimization via parallel estimation of distribution algorithm

11

Parallel cGA Architecture

Processor #1: sample bits 1 – l/np
Processor #2: sample bits l/np+1 – 2l/np
…
Processor #np: sample bits (np-1)·l/np+1 – l

Collect partial sampled solutions and combine

Parallel fitness evaluation of sampled solutions

Broadcast fitness values of sampled solutions

Each processor then selects the best individual and updates its own slice of the probability vector
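A rough MPI sketch of the structure above (illustrative only; names, sizes, and the iteration budget are assumptions, not the authors' code). For a OneMax-style fitness, each processor's block of sampled bits contributes a partial sum, so a single MPI_Allreduce plays the role of the "collect and combine" plus "broadcast fitness" steps in the diagram.

```c
#include <mpi.h>
#include <stdlib.h>

#define SEG 1000000      /* bits owned by this processor, l/np (assumed) */
#define N   100000.0     /* virtual population size (assumed)            */

static double frand(void) { return rand() / (RAND_MAX + 1.0); }

int main(int argc, char **argv) {
    static double p[SEG];
    static int a[SEG], b[SEG];
    int rank, np, i, t;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &np);
    (void)np;                             /* np fixes SEG = l/np in a real run */
    srand(1234u + (unsigned)rank);        /* independent stream per rank */

    for (i = 0; i < SEG; i++) p[i] = 0.5;

    for (t = 0; t < 100000; t++) {
        long part[2] = {0, 0}, tot[2];
        for (i = 0; i < SEG; i++) {       /* sample this rank's bit block */
            a[i] = frand() < p[i];
            b[i] = frand() < p[i];
            part[0] += a[i];              /* partial fitness of candidate a */
            part[1] += b[i];              /* partial fitness of candidate b */
        }
        /* combine partial fitnesses; every rank sees both totals */
        MPI_Allreduce(part, tot, 2, MPI_LONG, MPI_SUM, MPI_COMM_WORLD);

        int a_wins = (tot[0] >= tot[1]);  /* same decision on every rank */
        for (i = 0; i < SEG; i++) {       /* update local probabilities  */
            int wbit = a_wins ? a[i] : b[i];
            int lbit = a_wins ? b[i] : a[i];
            if (wbit != lbit) p[i] += (wbit ? 1.0 : -1.0) / N;
        }
    }
    MPI_Finalize();
    return 0;
}
```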

Page 12: Towards billion bit optimization via parallel estimation of distribution algorithm

12

cGA is Memory Efficient: O(l) vs. O(l^1.5)

Orders-of-magnitude memory savings via the efficient GA.
Example: ~32 MB per processor on a modest 128 processors for billion-bit optimization.

Simple GA: a population of O(l^0.5 log l) individuals of l bits each, i.e., O(l^1.5) bits of storage.

Compact GA: l frequencies (integer counts rather than floating-point probabilities), 4 bytes each, i.e., O(l) bytes.
Parallelization reduces memory per processor by a factor of np.
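The 32 MB figure is just this per-variable cost spread over the processors (a quick check of the arithmetic, assuming 4-byte frequencies and np = 128):

```latex
\frac{4 \text{ bytes} \times 2^{30} \text{ variables}}{128 \text{ processors}}
= \frac{4 \text{ GB}}{128}
= 32 \text{ MB per processor}
```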

Page 13: Towards billion bit optimization via parallel estimation of distribution algorithm

13

Vectorization Yields Speedup of 4

SIMD instruction sets allow vector operations on 128-bit registers.
Equivalent to 4 processors per processor.

Vectorize costly code segments with AltiVec/SSE2

Generate 4 random numbers at a time.
Sample 4 bits at a time.
Update 4 probabilities at a time.
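A small SSE2-flavored sketch of the last two items (sampling 4 bits and updating 4 probabilities per instruction); the intrinsics are standard SSE/SSE2, but the data layout and function names are illustrative assumptions, not the authors' AltiVec/SSE2 code.

```c
/* Illustrative SSE2 sketch: process 4 single-precision probabilities
 * per instruction. Names and layout are assumptions for illustration. */
#include <emmintrin.h>   /* SSE2 (pulls in the SSE float operations too) */

/* Sample 4 bits at once: returns a 4-bit mask, bit i set iff r[i] < p[i],
 * where r holds 4 uniform random numbers in [0,1). */
static int sample4(const float *p, const float *r) {
    __m128 pv = _mm_loadu_ps(p);
    __m128 rv = _mm_loadu_ps(r);
    return _mm_movemask_ps(_mm_cmplt_ps(rv, pv));
}

/* Update 4 probabilities at once: p[i] += step[i] (step is +1/n, -1/n, or 0). */
static void update4(float *p, const float *step) {
    __m128 pv = _mm_loadu_ps(p);
    __m128 sv = _mm_loadu_ps(step);
    _mm_storeu_ps(p, _mm_add_ps(pv, sv));
}
```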

Page 14: Towards billion bit optimization via parallel estimation of distribution algorithm

14

Other Efficiencies Yield Speedup of 15

Bitwise operations

Limited floating-point operations

Inline functions

Avoid using mod and division operations

Precomputing bit sums and indexing (see the sketch at the end of this page)

Parallel, vectorized, and efficient GA:
Memory scales as Θ(l/np); speedup scales as ~60·np (the factor of ~15 above times the factor of 4 from vectorization).

~32 MB memory and ~10^4 speedup with 128 processors.

Solves 65,536-bit noisy OneMax problem in ~45 minutes on a 3GHz PC.
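As an example of the "precomputing bit sums" item above, a small lookup table (a sketch; names and sizes are illustrative, not the authors' code) lets the bit sum of a packed candidate be accumulated word by word instead of bit by bit:

```c
/* Illustrative sketch of precomputed bit sums: a 16-bit lookup table
 * turns the OneMax bit count of a packed word into two table reads. */
#include <stdint.h>

static uint8_t ones16[1 << 16];          /* popcount of every 16-bit value */

static void init_ones16(void) {
    for (int v = 1; v < (1 << 16); v++)
        ones16[v] = (uint8_t)((v & 1) + ones16[v >> 1]);
}

/* Bit sum of one packed 32-bit word of the candidate solution. */
static int bitsum32(uint32_t w) {
    return ones16[w & 0xFFFF] + ones16[w >> 16];
}
```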

Page 15: Towards billion bit optimization via parallel estimation of distribution algorithm

15

Experimental Procedure

128–256-processor partitions of a 1280-processor Apple G5 Xserve cluster.

The population size was doubled until the cGA converged with at least l−1 out of l bits set to their optimal values.

For l > 2^23, the population size was fixed according to theory.

Number of independent runs:
l ≤ 2^18 (262,144): 50
2^18 < l ≤ 2^25 (33,554,432): 10
l > 2^25: 1

Compare cGA performance with:
Sequential hillclimber (sHC)
Random hillclimber (rHC)

Page 16: Towards billion bit optimization via parallel estimation of distribution algorithm

16

Compact GA Population Sizing

Additive Gaussian noise with variance σ_N^2.

Gambler's-ruin population sizing [Harik et al., 1997]:

n = −(√π/2) · 2^k · ln(α) · (σ_BB/d) · √m · √(1 + σ_N^2/σ_f^2)

where α is the error tolerance, d/σ_BB the signal-to-noise ratio, 2^k the number of competing sub-components, m the number of components (# BBs), and σ_N^2/σ_f^2 the noise-to-fitness variance ratio.

Population sizing scales as O(l^0.5 log l).

Page 17: Towards billion bit optimization via parallel estimation of distribution algorithm

17

Compact GA Convergence Time

Convergence-time model [Miller & Goldberg, 1995; Goldberg, 2002; Sastry & Goldberg, 2002]:

t_c = (π/2) · √(m·k) / I

where I is the selection intensity and m·k the problem size.

Convergence time scales as O(m^0.5).

Together with the population sizing, the GA scales as O(m log m) function evaluations.

Page 18: Towards billion bit optimization via parallel estimation of distribution algorithm

18

Scalability on OneMax

Page 19: Towards billion bit optimization via parallel estimation of distribution algorithm

19

EDA Solves Billion-Bit Noisy OneMax

GA scales as Θ(l · log l · (1 + σ_N^2/σ_f^2))

Solved a 33-million (2^25) bit problem to optimality.
Solved a 1.1-billion (2^30) bit problem with relaxed, but guaranteed, convergence.

Page 20: Towards billion bit optimization via parallel estimation of distribution algorithm

20

Do Problems Like This Matter?

Yes, for three reasons:

Many GAs are no more sophisticated than the cGA.
The inclusion of noise was important because it covers an important facet of problem difficulty.
We know how to handle deception and other difficulties through EDAs like hBOA.

Compact GA-like algorithms can solve tough problems:
Materials science [Sastry et al., 2004; Sastry et al., 2005]*

Chemistry [Sastry et al., 2006]**

Complex versions of these kinds of problems need million/billion-bit optimization.

* Chosen by the AIP editors as a focused article of frontier research in the Virtual Journal of Nanoscale Science & Technology, 12(9), 2005. ** Best paper and Silver “Humies” award, GECCO 2006.

Page 21: Towards billion bit optimization via parallel estimation of distribution algorithm

21

Challenges to Routine Billion-Bit Optimization

What if you have a large nonlinear solver (PDE, ODE, FEM, KMC, MD, whatever)?

Need efficiency enhancement:
Parallelization: effective use of computational “space”
Time continuation: effective use of computational “time”
Hybridization: effective use of global and local searchers
Evaluation relaxation: effective use of expensive-but-accurate and cheap-but-inaccurate evaluations

Need more powerful solvers

Need highly efficient implementations

Page 22: Towards billion bit optimization via parallel estimation of distribution algorithm

22

Summary and Conclusions

Parallel and efficient implementation of the compact GA.
Memory and computational efficiency enhancements.
Solved a 33-million-bit noisy OneMax problem to optimality.
Solved a 1.1-billion-bit noisy OneMax problem to relaxed, but guaranteed, convergence.

Big optimization is a frontier today:
Take extant cluster computing.
Mix in robustness, scalability, and efficiency lessons.
Integrate into problems.

Nano, bio, and info systems are increasingly complex:
They call for routine mega/giga-variable optimization.
They need robust, scalable, and efficient methods.