Why Advanced Population Initialization Techniques Perform Poorly in High Dimension?

Borhan Kazimipour, Xiaodong Li, A. K. Qin

Outline

1. Introduction

2. Background

3. Questions

4. Experiments

5. Results

6. Conclusion

SEAL 2014, Dunedin, NZ | Why Advanced PITs Perform Poorly in HD?

1. Introduction


Definition of Population Initialization

• Definition:

– Initialization is the task of generating a set of initial points as potential solutions to an optimization problem. These values are regarded as the initial positions (or distribution) of the individuals in the first generation.

• Common Parameters:

– Population size

– Number of variables or dimensionality (given)

– Variable ranges (given)

• Note: In this study, our main focus is on continuous techniques capable of generating real-valued numbers in continuous spaces.
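The simplest initializer, which later serves as the RNG baseline, can be sketched as follows. This is a minimal sketch. It assumes box-constrained variables given as per-variable lower and upper bounds. `init_uniform` is a hypothetical helper name, and Python's `random.Random` (a Mersenne Twister PRNG) stands in for whatever RNG the study actually used.

```python
import random


def init_uniform(n, lower, upper, seed=None):
    """Draw n individuals uniformly at random inside the given variable ranges.

    n      -- population size
    lower  -- per-variable lower bounds (length D, the dimensionality)
    upper  -- per-variable upper bounds (length D)
    seed   -- initial seed; the same seed reproduces the same population
    """
    if len(lower) != len(upper):
        raise ValueError("lower and upper must both have length D")
    rng = random.Random(seed)  # Mersenne Twister PRNG from the standard library
    return [[rng.uniform(lo, hi) for lo, hi in zip(lower, upper)] for _ in range(n)]


# Five individuals in a 3-dimensional space with heterogeneous ranges.
pop = init_uniform(5, lower=[-5.0, 0.0, 10.0], upper=[5.0, 1.0, 20.0], seed=42)
for individual in pop:
    print([round(v, 3) for v in individual])
```

The three "common parameters" map directly onto the arguments: `n` is the population size, and `lower`/`upper` carry both the dimensionality (their length) and the variable ranges.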


Importance of Population Initialization

• Why is studying population initialization important?

– Popularity: All population-based algorithms, including EAs, need a population initialization module.

– “initialize population randomly” is the most widely used expression in the EA community!

– Variety: Many different population initialization techniques have been proposed.

– About 100 population initialization techniques have been proposed so far*.

– Effectiveness: Clearly, starting from a good position makes it easier and faster to reach the goal than starting from a bad one.

– “Advanced initialization techniques can increase the probability of finding global optima, reduce the variation of the final results, decrease the computational costs and improve the solution(s) quality.” *

– Inconsistency(!): Some contradictory findings have been reported.

– “For example, one claimed that the desirable effect of uniformity of initial population is more significant in high dimensions (up to 50 dimensions) while another study, in contrast, claimed that uniform initialization techniques lose their effectiveness in problems of 12 or more dimensions.” *

* B. Kazimipour, X. Li, and A. K. Qin. "A review of population initialization techniques for evolutionary algorithms." In Evolutionary Computation (CEC), 2014 IEEE Congress on, pp. 2585-2592. IEEE, 2014.


2. Background


Definitions of Randomness

• True Random:

– A true random sequence is usually described as a sequence having strong properties such as complete unpredictability, incompressibility and irregularity.

– Some believe true random sequences do not exist (theoretical drawback).

– There is no tool to prove that a given sequence is truly random (empirical drawback).

• Computational Random:

– A sequence is computationally random if it passes tests of the properties of true randomness, e.g., unpredictability and incompressibility.

• Statistical Random:

– A sequence is statistically random if it passes tests of the statistical (distributional) properties of true random sequences, e.g., uniformity.


Continuum of Randomness

(Figure: randomness measured along a continuum from Completely Deterministic to Truly Random.)

• In this work, we follow the technique proposed in [1] to categorise population initialization techniques (PITs) based on randomness:

– Does the output depend on the initial seed? YES: stochastic. NO: deterministic.
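The seed question above can be turned into a simple empirical check. This is a minimal sketch. It assumes that an initializer is any function `(n, d, seed) -> population`. `prng_init`, `grid_init` and `is_stochastic` are hypothetical names. `grid_init` is a deliberately trivial deterministic design whose only purpose is to exercise the check. Comparing outputs over a handful of seeds is evidence, not proof, that a technique is stochastic.

```python
import random


def prng_init(n, d, seed):
    """Stochastic initializer: the seed drives every generated value."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(d)] for _ in range(n)]


def grid_init(n, d, seed):
    """Deterministic toy initializer: the seed argument is accepted but ignored.

    Each individual repeats the k-th point of a centred 1-D grid in every
    dimension (illustrative only; not a uniform design).
    """
    return [[(k + 0.5) / n for _ in range(d)] for k in range(n)]


def is_stochastic(initializer, n=8, d=3, seeds=(1, 2, 3)):
    """Answer 'does the output depend on the initial seed?' empirically."""
    outputs = [initializer(n, d, s) for s in seeds]
    return any(out != outputs[0] for out in outputs[1:])


print(is_stochastic(prng_init))  # True  -> stochastic
print(is_stochastic(grid_init))  # False -> deterministic
```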


Categorization based on Randomness

Population Initialization Techniques

• Stochastic

– Pseudo-Random Number Generator

– Chaotic Number Generator

• Deterministic

– Quasi-Random Sequence

– Uniform Experimental Design

Stochastic vs. Deterministic

Stochastic

• Definition:

– Their results depend on initial seeds.

• Properties:

– Unpredictable (computationally)

– Irregular

• Examples:

– Pseudo-Random Number Generator (PRNG)

– e.g. WELL, KISS, and Mersenne Twister

– Chaotic Number Generator (CNG)

– e.g. Tent, Logistic and Sine
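A chaotic number generator can be sketched with the logistic map, one of the maps named above. This is a minimal sketch under several assumptions: the common fully chaotic setting mu = 4, a seed that only chooses the starting point, and a short burn-in. `logistic_init` is a hypothetical name. Its output lies in (0, 1), so it still has to be scaled to the variable ranges.

```python
import random


def _logistic_step(x, mu, rng):
    """One iteration of the logistic map x <- mu * x * (1 - x)."""
    x = mu * x * (1.0 - x)
    # In floating point the orbit can hit exactly 1.0 (from x == 0.5) and then
    # stick at 0; restart from a fresh interior point if that happens.
    if x <= 0.0 or x >= 1.0:
        x = rng.uniform(0.01, 0.99)
    return x


def logistic_init(n, d, seed, mu=4.0, burn_in=100):
    """Fill an n-by-d population with successive logistic-map iterates."""
    rng = random.Random(seed)
    x = rng.uniform(0.01, 0.99)  # the seed only selects the starting point
    for _ in range(burn_in):     # discard the initial transient
        x = _logistic_step(x, mu, rng)
    population = []
    for _ in range(n):
        row = []
        for _ in range(d):
            x = _logistic_step(x, mu, rng)
            row.append(x)
        population.append(row)
    return population


pop = logistic_init(4, 3, seed=1)
print(pop)
```

Because the whole population is one deterministic orbit of the map, the seed still determines the output. That is why chaotic generators fall on the stochastic side of the classification.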

Deterministic

• Definition:

– They always generate the same population regardless of any initial seed.

• Properties:

– Population uniformity is more important than randomness or unpredictability.

• Examples:

– Quasi-random Sequence

– e.g. Sobol, Halton

– Uniform Experimental Design

– e.g. Latin hypercube, good lattice points and orthogonal design
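Two of the deterministic examples above can be sketched in a few lines. The first is the Halton quasi-random sequence, built from radical inverses in the first D prime bases. The second is a good-lattice-point set built from a Korobov-type generating vector (1, a, a^2, ...) mod N. This is a minimal sketch. The centred lattice form ((k·h_j mod N) + 0.5)/N and the choice of multiplier `a` are assumptions; practical GLP constructions search for the `a` that minimises discrepancy. Neither function takes a seed, so repeated calls return the same population.

```python
import math


def first_primes(k):
    """Return the first k prime numbers (used as Halton bases)."""
    primes, candidate = [], 2
    while len(primes) < k:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def radical_inverse(index, base):
    """Van der Corput radical inverse: mirror the base-b digits of index about the radix point."""
    result, scale = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * scale
        scale /= base
    return result


def halton(n, d, skip=0):
    """n points of the d-dimensional Halton sequence, starting after `skip` points."""
    bases = first_primes(d)
    return [[radical_inverse(k, b) for b in bases] for k in range(skip + 1, skip + n + 1)]


def glp_korobov(n, d, a):
    """Centred rank-1 lattice with generating vector h_j = a**j mod n (j = 0..d-1)."""
    if math.gcd(a, n) != 1:
        raise ValueError("a must be coprime with n")
    h = [pow(a, j, n) for j in range(d)]
    return [[((k * hj) % n + 0.5) / n for hj in h] for k in range(n)]


print(halton(3, 2))
print(glp_korobov(5, 2, a=2))
```

When `a` is coprime with `n`, every coordinate of the lattice is a permutation of the centred grid (i + 0.5)/n. Each one-dimensional projection is therefore perfectly stratified, which is the Latin-hypercube property.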


3. Questions


Questions: Goal

• Research Question:

– Why do EAs not benefit greatly from advanced population initialization techniques when the dimensionality of the problem is very high?

• Hypothesis:

– As dimensionality grows, the uniformity of populations generated by both simple and advanced techniques drops to the same level.


Questions: Two experiments

Part A (baseline technique)

• Goal: Study the trend of population uniformity when generated by popular but simple techniques*.

• Research Questions:

1. How much can dimensionality affect the uniformity of a population?

2. Is it possible to enhance the uniformity of initial populations in high-dimensional spaces by increasing the population size?

Part B (advanced techniques)

• Goal: Compare the performance of advanced initialization techniques with a commonly used technique*.

• Research Questions:

1. Can adopting advanced initialization techniques significantly improve population uniformity?

2. How does population size affect the performance of advanced initializers?


*Random number generators (RNG) are the most widely used initializers in the field of EA.

Questions: Quality measures

In both parts, we use discrepancy values to measure the quality of populations.

• Definition of discrepancy:

– Literally, discrepancy means non-uniformity.

– Technically, discrepancy measures are tools for determining the non-uniformity level of a given point set.

– Point sets with low discrepancy are those with a high level of uniformity.

• Variations of discrepancy:

– Star L2-discrepancy

– Centred L2-discrepancy*

– Modified L2-discrepancy

– Symmetric L2-discrepancy

– Wrap-around L2-discrepancy

* Centred L2-discrepancy (CD) is used in this study.
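Because the centred L2-discrepancy (CD) has a closed form, it can be computed directly. This sketch assumes Hickernell's standard closed form for CD and points already scaled to the unit hypercube [0, 1]^D. `centred_l2_discrepancy` is a hypothetical helper, written in plain Python for clarity rather than speed. Its cost is O(N^2·D).

```python
import math


def centred_l2_discrepancy(points):
    """Centred L2-discrepancy (Hickernell's closed form) of points in [0, 1]^D.

    points: list of N individuals, each a list of D coordinates in [0, 1].
    Lower values mean a more uniform point set.
    """
    n = len(points)
    d = len(points[0])
    # Distance of every coordinate from the centre of the unit interval.
    centred = [[abs(x - 0.5) for x in p] for p in points]

    term1 = (13.0 / 12.0) ** d

    term2 = 0.0
    for c in centred:
        prod = 1.0
        for z in c:
            prod *= 1.0 + 0.5 * z - 0.5 * z * z
        term2 += prod
    term2 *= 2.0 / n

    term3 = 0.0
    for j in range(n):
        for k in range(n):
            prod = 1.0
            for i in range(d):
                prod *= (1.0 + 0.5 * centred[j][i] + 0.5 * centred[k][i]
                         - 0.5 * abs(points[j][i] - points[k][i]))
            term3 += prod
    term3 /= n * n

    # Guard against tiny negative values caused by floating-point cancellation.
    return math.sqrt(max(term1 - term2 + term3, 0.0))


print(centred_l2_discrepancy([[0.5]]))  # single point at the centre of [0, 1]
print(centred_l2_discrepancy([[(k + 0.5) / 10] for k in range(10)]))  # 1-D centred grid
```

A population drawn from arbitrary variable ranges has to be mapped to [0, 1] first, using (x - lower) / (upper - lower). The first term, (13/12)^D, grows exponentially with dimensionality.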


Questions: Analytic formulas

• L2-discrepancy (D: dimensionality, N: population size, P: population, x_{i,j}: ith value of jth individual)

• Centred L2-discrepancy (D: dimensionality, N: population size, P: population, x_{i,j}: ith value of jth individual)
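The formulas themselves did not survive extraction. For reference, the standard closed forms are given below in the slide's notation, with every x_{i,j} scaled to [0, 1]. The first is Warnock's formula for the star L2-discrepancy; the second is Hickernell's formula for the centred L2-discrepancy (CD). We assume, but cannot confirm from the text, that these are the exact variants used in the study.

```latex
$$
\left[D_2^{*}(P)\right]^2 = 3^{-D}
  - \frac{2^{1-D}}{N}\sum_{j=1}^{N}\prod_{i=1}^{D}\left(1 - x_{i,j}^{2}\right)
  + \frac{1}{N^{2}}\sum_{j=1}^{N}\sum_{k=1}^{N}\prod_{i=1}^{D}\left(1 - \max\left(x_{i,j}, x_{i,k}\right)\right)
$$

$$
\left[CD(P)\right]^2 = \left(\frac{13}{12}\right)^{D}
  - \frac{2}{N}\sum_{j=1}^{N}\prod_{i=1}^{D}\left(1 + \tfrac{1}{2}\left|x_{i,j} - \tfrac{1}{2}\right| - \tfrac{1}{2}\left|x_{i,j} - \tfrac{1}{2}\right|^{2}\right)
  + \frac{1}{N^{2}}\sum_{j=1}^{N}\sum_{k=1}^{N}\prod_{i=1}^{D}\left(1 + \tfrac{1}{2}\left|x_{i,j} - \tfrac{1}{2}\right| + \tfrac{1}{2}\left|x_{i,k} - \tfrac{1}{2}\right| - \tfrac{1}{2}\left|x_{i,j} - x_{i,k}\right|\right)
$$
```

Both expressions need only O(N^2·D) arithmetic operations. This is the practical advantage that analytic formulas have over discrepancy variants that require iterative or recursive evaluation.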


Questions: Why did we choose discrepancy?

Discrepancy measures with analytic formulas are used in this study because:

• Discrepancy values are not affected by the features of the benchmark problems, the employed EAs or their parameters.

– Unlike final fitness value and success rate.

• Discrepancy measures can easily be applied to all kinds of real-valued populations.

– Unlike DieHard and TestU01, which can only be applied to stochastic populations.

• Discrepancy measures with analytic formulas are faster to compute than similar measures that rely on iterative/recursive algorithms (ideal for large and high-dimensional populations).

– Unlike early variants of Lp-discrepancy.


4. Experiments


Experiments: Setup

• Six population initialization techniques were selected for study.

• Three stochastic and three deterministic techniques are included in the experiments.

• RNG, the most common and simplest initializer, is chosen as the control method.


Experiments: Setup (continued)

• In both parts:

– 20 different dimension sizes are examined (2 ≤ D ≤ 1,000).

– 20 different population sizes are examined (10 ≤ N ≤ 10,000).

– Each experiment is run 25 times:

– 25 unique initial seeds are used for stochastic techniques

– 25 unique sequences are used for deterministic techniques (skip schema)

• Part A (baseline technique)

– Only the performance of RNG is examined, under different settings.

• Part B (advanced techniques)

– The performance of advanced techniques is compared with the baseline (RNG).
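The run protocol can be sketched as follows. This is a minimal sketch under stated assumptions. Runs are indexed r = 0..24, and a stochastic technique uses r as its seed. A deterministic technique (Halton here) realises the "skip schema" by skipping the first r * N points of its sequence, so that each run receives a different block of points. The exact skip rule used in the study is not given in the text, and all function names are hypothetical.

```python
import random

RUNS = 25  # number of independent runs per (technique, D, N) configuration


def rng_population(n, d, run):
    """Stochastic technique: run r is generated from its own seed (r)."""
    rng = random.Random(run)
    return [[rng.random() for _ in range(d)] for _ in range(n)]


def _primes(k):
    primes, c = [], 2
    while len(primes) < k:
        if all(c % p for p in primes if p * p <= c):
            primes.append(c)
        c += 1
    return primes


def _radical_inverse(index, base):
    result, scale = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * scale
        scale /= base
    return result


def halton_population(n, d, run):
    """Deterministic technique: run r takes points r*n+1 .. r*n+n of the sequence."""
    bases = _primes(d)
    start = run * n + 1  # assumed skip rule: skip the first run*n points
    return [[_radical_inverse(k, b) for b in bases] for k in range(start, start + n)]


def populations_for_runs(technique, n, d, runs=RUNS):
    """Generate one population per run for a single (technique, D, N) setting."""
    return [technique(n, d, r) for r in range(runs)]


rng_runs = populations_for_runs(rng_population, n=10, d=5)
halton_runs = populations_for_runs(halton_population, n=10, d=5)
print(len(rng_runs), len(halton_runs))  # 25 25
```

Both kinds of technique then yield 25 distinct yet reproducible populations per configuration. A discrepancy value can be computed for each population and averaged.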


5. Results – Part A


Results: Part A – Dimensionality effect

• Discrepancy grows (i.e., uniformity drops) exponentially as dimensionality increases.

– The discrepancy of 10,000 points in 50 dimensions is comparable with the discrepancy of 10 points in 30 dimensions!

– A 66% growth in dimensionality (30 → 50) demands a 1,000-fold increase in population size (10 → 10,000) to recover the same uniformity!

• For D ≤ 50, a large population size may lessen the undesirable effect of dimensionality (see the zoomed-in region of the graph).
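The qualitative trend can be reproduced with a small simulation. This is a minimal sketch. It assumes the RNG baseline, points in [0, 1]^D, Hickernell's closed form for CD, and averaging over a few seeds. The (D, N) grid is illustrative and much smaller than the study's 20 × 20 grid, so the printed numbers are not the study's results. `mean_cd` is a hypothetical helper.

```python
import random


def centred_l2_discrepancy(points):
    """Centred L2-discrepancy (Hickernell's closed form) of points in [0, 1]^D."""
    n, d = len(points), len(points[0])
    centred = [[abs(x - 0.5) for x in p] for p in points]
    term1 = (13.0 / 12.0) ** d
    term2 = 0.0
    for c in centred:
        prod = 1.0
        for z in c:
            prod *= 1.0 + 0.5 * z - 0.5 * z * z
        term2 += prod
    term3 = 0.0
    for j in range(n):
        for k in range(n):
            prod = 1.0
            for i in range(d):
                prod *= (1.0 + 0.5 * centred[j][i] + 0.5 * centred[k][i]
                         - 0.5 * abs(points[j][i] - points[k][i]))
            term3 += prod
    return max(term1 - 2.0 * term2 / n + term3 / (n * n), 0.0) ** 0.5


def mean_cd(n, d, seeds=range(5)):
    """Average CD of RNG populations of size n in d dimensions over several seeds."""
    total = 0.0
    for s in seeds:
        rng = random.Random(s)
        pop = [[rng.random() for _ in range(d)] for _ in range(n)]
        total += centred_l2_discrepancy(pop)
    return total / len(seeds)


# CD versus dimensionality for a fixed population size (N = 50).
for d in (2, 5, 10, 20, 50):
    print(f"D={d:>2}  N=50  mean CD={mean_cd(50, d):.3f}")

# CD versus population size for a fixed dimensionality (D = 10).
for n in (10, 50, 200):
    print(f"D=10  N={n:>3}  mean CD={mean_cd(n, 10):.3f}")
```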

Results: Part A – Low dimensions

• Population size has no considerable effect on the uniformity of very low-dimensional problems (D ≤ 10).

• For 30 ≤ D ≤ 50, population size has a significant effect on uniformity: a larger population can improve it by a factor of 10 to 20 on the CD scale.

• The magnitude of the improvement falls rapidly, so that increasing the population size beyond 1,000 points yields only minimal improvement.

Results: Part A – Medium dimensions

• Increasing the population size significantly lessens the effect of dimensionality (especially for N ≤ 200).

• The magnitude of the improvement falls as the population grows.

Results: Part A – High dimensions

• The uniformity of populations in spaces of more than 100 dimensions is so poor that increasing the population size from 1,000 to 10,000 cannot recover it.

• Surprisingly, the feasible and reasonable population size for very large-scale problems (D ≥ 100) is less than 1,000 points.

• This does not imply that N has no effect when D > 100. Rather, N must be astronomically large to achieve a significant enhancement. Since evaluating high-dimensional populations of that magnitude is currently computationally infeasible, keeping N around 1,000 points is more practical and reasonable.

5. Results – Part B


Results: Part B – Improvement

Improvement over the common technique:

• To compare advanced initialization techniques with a common RNG, we propose a simple formula reflecting the relative improvement achieved by each advanced technique:

where Pc is the population generated by the control technique (RNG), Pi is the population produced by the ith advanced initialization technique, and CD is the centred L2-discrepancy.
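The formula itself did not survive extraction. One plausible form is the relative change in CD with respect to the control: Imp(Pi) = 100 · (CD(Pc) − CD(Pi)) / CD(Pc). This form agrees with the signs of the values reported in the results: positive when an advanced technique is more uniform than RNG, negative when it is less uniform. It remains an assumption, not the authors' confirmed definition. `relative_improvement` is a hypothetical helper.

```python
def relative_improvement(cd_control, cd_advanced):
    """Relative improvement (%) of an advanced initializer over the RNG control.

    Assumed form: 100 * (CD(Pc) - CD(Pi)) / CD(Pc). Positive values mean the
    advanced technique produced a lower-discrepancy (more uniform) population;
    negative values mean it was less uniform than the RNG.
    """
    if cd_control <= 0:
        raise ValueError("control discrepancy must be positive")
    return 100.0 * (cd_control - cd_advanced) / cd_control


print(relative_improvement(1.0, 0.75))  # 25.0: advanced technique is more uniform
print(relative_improvement(1.0, 1.8))   # -80.0: advanced technique is less uniform
```

Under this assumed form, a value of -80% means that the advanced technique's CD is 1.8 times that of the RNG with the same population size.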


Results: Part B – Low dimensions

• Some techniques (TNT and SBL) succeed in improving on the common initializer (RNG), although the largest improvement for 2 ≤ D ≤ 50 is less than 20%.

• Some techniques (e.g., GLP) are very sensitive to population size, while others (e.g., SBL) are more stable.

• For D ≤ 50, without exception, all techniques perform relatively better as the population size increases.

• Mixed good and bad results can be expected from both categories (stochastic and deterministic) of initialization techniques*.

*B. Kazimipour, X. Li, and A. K. Qin. "Initialization methods for large scale global optimization." In Evolutionary Computation (CEC), 2013 IEEE Congress on, pp. 2750-2757. IEEE, 2013.

Results: Part B – Medium and High dimensions

• All trends converge to one of three values: 0%, -25% and -80%.

• This clearly shows that employing advanced initialization techniques provides no significant improvement in high dimensions, at least in terms of uniformity.

• Even increasing the population size from 10 to 10,000 does not yield any relative improvement.

• SBL with N = 10 and TNT with all population sizes perform almost the same as RNG.

• The others, however, perform poorly in comparison with an RNG of the same population size*.

* B. Kazimipour, X. Li, and A. K. Qin. "Effects of population initialization on differential evolution for large scale optimization." In Evolutionary Computation (CEC), 2014 IEEE Congress on, pp. 2404-2411. IEEE, 2014.

6. Conclusion


Conclusion: What we did

• We investigated why advanced population initialization techniques perform as poorly as a simple RNG in high-dimensional spaces.

• We also studied the effect of population size on the quality (uniformity) of the resulting populations.

• We studied:

– 6 techniques (3 deterministic and 3 stochastic),

– 20 dimension sizes (up to 1,000),

– 20 population sizes (up to 10,000),

– 25 runs of each configuration.


Conclusion: What we observed

• The uniformity of the initial population drops exponentially as dimensionality rises linearly.

• Increasing the population size up to a computationally feasible bound cannot maintain uniformity (except in some small and medium-sized spaces).

• The advanced initializers are as vulnerable to the curse of dimensionality as a simple RNG.

• Adopting advanced initializers in medium- and large-scale spaces does not result in any significant improvement.

• Some advanced techniques are even more sensitive to the adverse effect of dimensionality than the simple RNG.


We recommend advanced techniques only when both the population size and the dimensionality are small. In higher-dimensional spaces, or when the population size is relatively large, no significant improvement should be expected from advanced techniques.

Thank you☺☺☺☺

Any question or comment?

