“Conservation of Information in Evolutionary Search Algorithms: Measuring the Cost of Success”
Robert J. Marks II
Abstract
Conservation of information theorems indicate that any search algorithm performs on average as well as random search without replacement unless it takes advantage of problem-specific information about the search target or the search-space structure.
Combinatorics shows that even a moderately sized search requires problem-specific information to be successful. Three measures characterize the information required for successful search: (1) endogenous information, which measures the difficulty of finding a target using random search; (2) exogenous information, which measures the difficulty that remains in finding a target once a search takes advantage of problem-specific information; and (3) active information, which, as the difference between endogenous and exogenous information, measures the contribution of problem-specific information to successfully finding a target. A methodology is developed based on these information measures to gauge the effectiveness with which problem-specific information facilitates successful search. The methodology is then applied to various
search tools widely used in evolutionary search.
Information
“The [computing] machine does not create any new information, but it performs a very valuable transformation of known information.”
--Leon Brillouin, Science and Information Theory (Academic Press, New York, 1956).
What is Evolutionary Computation?
Simulation of Evolution on a Computer
[Flowchart: a set of possible solutions → computer model ("How good is each solution?") → survival of the fittest (keep a set of the best solutions) → duplicate, mutate & crossover → next generation → repeat]
Search in Engineering Design
Yagi-Uda antenna (1954). Can we do better? Engineers:
1. Create a parameterized model.
2. Establish a measure of the design's fitness.
3. Search the N-D parameter space.
[Figure: the space of all parameters, with target region T = parameters that give results better than the Yagi-Uda.]
Designed by Evolutionary Search at NASA
http://ic.arc.nasa.gov/projects/esg/research/antenna.htm
Random vs. Assisted Search: Information is given to you...
Target Info:
• Warmer!
• Interval Halving
Search Space Info:
• Steepest Descent
• Conjugate Gradient Descent
Blind Search
From the movie UHF
Search Space Assumption... Monkeys at a typewriter…
27 keys
Apply Bernoulli's principle of insufficient reason
“In the absence of any prior knowledge, we must assume that the events have equal probability.”
Jakob Bernoulli, “Ars Conjectandi” (“The Art of Conjecturing”), 1713.
Information Theoretic Equivalent: Maximum Entropy
(A Good Optimization Assumption)
How Does Moore’s Law Help?
A computer today searches for a target of B = 10,000 bits in a year.
Double the speed:
the faster computer searches for a target of B + 1 = 10,001 bits in a year.
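The arithmetic behind this slide can be checked directly: a budget of Q queries exhausts at most log2(Q) bits of target space, so doubling Q adds exactly one bit. A minimal sketch:

```python
import math

def searchable_bits(queries_per_year):
    # A blind search issuing Q queries can exhaust a target space of log2(Q) bits.
    return math.log2(queries_per_year)

q = 2 ** 10000                                # queries needed for a 10,000-bit target
assert searchable_bits(q) == 10000.0
assert searchable_bits(2 * q) == 10001.0      # doubling the speed buys exactly one bit
```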
Defining “Impossible”
Converting Mass to Computing Power
Minimum energy for an irreversible bit (von Neumann-Landauer limit):
ln(2) k T = 1.15 x 10^-23 joules.¹
Mass of the universe ~ 10^53 kg. Converting all the mass in the universe to energy (E = mc²), we could generate 7.83 x 10^92 bits.
1. Assuming background radiation of 2.76 kelvin.
Expected number of queries = N^L
How Long a Phrase?
Target (length L):
IN THE BEGINNING ... EARTH
JFD SDKA ASS SA ... KSLLS KASFSDA SASSF A ... JDASF J ASDFASD ASDFD ... ASFDG JASKLF SADFAS D ... ASSDF
.
.
.
IN THE BEGINNING ... EARTH
N = 27 characters
How Long a Phrase from the Universe?
p = N^-L
Number of bits expected for a random search:
B = (1/p) log2(1/p) = N^L log2 N^L
Setting N^L log2 N^L = 7.83 x 10^92 bits with N = 27 gives
L = 63 characters
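The 63-character figure can be checked numerically by finding the longest phrase whose expected random-search cost fits the bit budget (a sketch, assuming the slide's cost model of (1/p) · log2(1/p) = N^L · log2(N^L) bits):

```python
import math

BITS_AVAILABLE = 7.83e92   # bits obtainable by converting the universe's mass to energy
N = 27                     # alphabet: 26 letters plus space

def cost_bits(L):
    # Expected random-search cost for a length-L phrase: about 1/p = N**L queries,
    # each accounting for log2(N**L) bits.
    return N ** L * L * math.log2(N)

# Longest phrase whose random search stays within the budget
L = 1
while cost_bits(L + 1) <= BITS_AVAILABLE:
    L += 1
print(L)   # 63 characters
```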
How Long a Phrase from the Multiverse?
Does Quantum Computing Help?
Quantum computing reduces search time by a square root.
L. K. Grover, "A fast quantum mechanical algorithm for database search,"
Proc. ACM Symp. Theory of Computing, 1996, pp. 212-219.
Active Information in Search
Probability search space Ω: Pr(Ω) = 1
Target T, element t
p = Pr[t ∈ T] = |T| / |Ω|
Fitness
Each point in the parameter space has a fitness. The problem of the search is to find a point of good enough fitness.
[Figure: acceptable solutions form the target region T.]
Search Algorithms
Steepest Ascent
Exhaustive
Newton-Raphson
Levenberg-Marquardt
Tabu Search
Simulated Annealing
Particle Swarm Search
Evolutionary Approaches
Problem: In order to work better than average, each algorithm implicitly assumes something about the search space and/or the location of the target.
No Free Lunch Theorem
With no knowledge of where the target is and no knowledge about the fitness surface, one search performs, on average, as well as any other.
NFLT is obvious...
The chance of opening a lock in 5 tries is independent of the algorithm used.
Quotes on the need for added information for targeted search …
1. “…unless you can make prior assumptions about the ... [problems] you are working on, then no search strategy, no matter how sophisticated, can be expected to perform better than any other” Yu-Chi Ho and D.L. Pepyne, (2001).
2. No free lunch theorems “indicate the importance of incorporating problem-specific knowledge into the behavior of the [optimization or search] algorithm.” David Wolpert & William G. Macready (1997).
1. Yu-Chi Ho and D. L. Pepyne, "Simple Explanation of the No-Free-Lunch Theorem," Proceedings of the 40th IEEE Conference on Decision and Control, Orlando, Florida, 2001. 2. D. H. Wolpert and W. G. Macready, "No Free Lunch Theorems for Optimization," IEEE Trans. Evolutionary Computation 1(1): 67-82 (1997).
Therefore...
Nothing works better, on the average, than random search.
For a search algorithm like evolutionary search to work, we require active information.
Can a computer program generate more information than it is given?
If a search algorithm does not obey the NFL theorem, it "is like a perpetual motion machine - conservation of generalization performance precludes it." Cullen Schaffer (1994), anticipating the NFLT.
3. Cullen Schaffer, "A conservation law for generalization performance," in Proc. Eleventh International Conference on Machine Learning, W. W. Cohen and H. Hirsh, Eds., San Francisco: Morgan Kaufmann, 1994, pp. 259-265.
Targeted Search
p = Pr[t ∈ T] = |T| / |Ω|
Probability search space Ω
Target T
Bernoulli's Principle of Insufficient Reason = Maximum Entropy Assumption
Endogenous Information
Target T
I_Ω = -log2 p = log2 (|Ω| / |T|)
This is all of the information we can get from the search. We can get no more.
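The endogenous information drops straight out of the definition above (a quick sketch; the 28-character example anticipates the WEASEL phrase used later in the talk):

```python
import math

def endogenous_information(space_size, target_size):
    # I_Omega = -log2 p = log2(|Omega| / |T|): the difficulty of a blind search.
    return -math.log2(target_size / space_size)

# A single 28-character phrase over a 27-symbol alphabet:
I = endogenous_information(27 ** 28, 1)
print(round(I, 2))   # 133.14 bits
```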
Probability of Success. Choose a search algorithm...
Let p_S be the probability of success of an evolutionary search.
If there is no added information: p_S = p.
If p_S > p, information has been added.
Active Information
I_+ = log2 (p_S / p)
Checks:
1. For a "perfect search", p_S = 1:
I_+ = log2 (1/p) = I_Ω = all of the available information.
Active Information
I_+ = log2 (p_S / p)
Checks:
2. For a "blind query", p_S = p:
I_+ = log2 (p/p) = 0: no active information.
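Both checks fall out of a one-line implementation of the definition I_+ = log2(p_S / p) (a minimal sketch):

```python
import math

def active_information(p, p_s):
    # I_+ = log2(p_S / p): information added by the search procedure over a blind query.
    return math.log2(p_s / p)

p = 2 ** -20                                   # endogenous difficulty: 20 bits
assert active_information(p, 1.0) == 20.0      # perfect search: all available information
assert active_information(p, p) == 0.0         # blind query: no active information
```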
EXAMPLES of ACTIVE INFORMATION
Random Search
Partitioned Search
FOO Search in Alphabet & Nucleotides
Structured Information (ev)
Stepping Stone Search (Avida)
1. Active Information in Random Searches...
For random search, with Q queries:
p_S = 1 - (1 - p)^Q ≈ Qp for very small p
Q = number of queries (trials)
p = probability of success of a single trial
p_S = chance of one or more successes
1. Active Information in Random Searches...
I_+ = log2 (p_S / p) ≈ log2 (Qp / p) = log2 Q
1. Active information is not a function of the size of the space or the probability of success, but only of the number of queries.
2. There is a diminishing return. Two queries give one bit of added information. Four queries give two bits. Sixteen queries give four bits, 256 give 8 bits, etc.
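The diminishing return can be verified numerically from p_S = 1 - (1 - p)^Q (a sketch, using an arbitrary small p):

```python
import math

def active_info_random(p, Q):
    # p_S = 1 - (1 - p)**Q, so I_+ = log2(p_S / p) ~ log2(Q) for small p.
    p_s = 1 - (1 - p) ** Q
    return math.log2(p_s / p)

p = 1e-9
for Q in (2, 4, 16, 256):
    print(Q, round(active_info_random(p, Q), 3))   # ~1, ~2, ~4, ~8 bits
```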
2. Active Information in Partitioned Search...
METHINKS*IT*IS*LIKE*A*WEASEL
XEHDASDSDDTTWSW*QITE*RIPOCFL
XERXPLEE*ETSXSR*IZAW**LPAEWL
MEQWASKL*RTPLSWKIRDOU*VPASRL
METHINKS*IT*IS*LIKE*A*WEASEL
yada yada yada
2. Active Information in Partitioned Search...
METHINKS*IT*IS*LIKE*A*WEASEL
For random search: I_+ ≈ log2 Q
For partitioned search: I_+ ≈ L log2 Q
Hints amplify the added information by a factor of L.
Comparison
METHINKS*IT*IS*LIKE*A*WEASEL
L = 28 characters, 27 in the alphabet
Reality, for Partitioned Search: Q ≈ 43 iterations
For Random Search: Q ≈ 1.1973 x 10^40 iterations
There is a lot of active information!
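The gap can be reproduced by simulating partitioned search directly: each query redraws only the letters that are still wrong, and matched letters are frozen. A sketch (the average query count varies with the random seed but stays on the order of 10^2, against roughly 1.2 x 10^40 expected queries for blind random search):

```python
import random
import string

TARGET = "METHINKS*IT*IS*LIKE*A*WEASEL"
ALPHABET = string.ascii_uppercase + "*"     # 27 symbols

def partitioned_search(target, rng):
    # Each query redraws only the still-incorrect positions; matches are frozen.
    phrase = [rng.choice(ALPHABET) for _ in target]
    queries = 0
    while phrase != list(target):
        queries += 1
        for i, ch in enumerate(target):
            if phrase[i] != ch:
                phrase[i] = rng.choice(ALPHABET)
    return queries

rng = random.Random(1)
runs = [partitioned_search(TARGET, rng) for _ in range(200)]
print(sum(runs) / len(runs))   # on the order of a hundred queries
```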
2. Active Information in Partitioned Search...
2. Domain knowledge can be applied differently, resulting in varying degrees of active information.
The knowledge used in partitioned search can be used to find all the letters and spaces in an arbitrarily large library using only 26 queries.
3. Single Agent Mutation (MacKay)
1. Specify a target of bits of length L.
2. Initiate a string of random bits.
3. Form two children with mutation (bit-flip) probability μ.
4. Find the better fit of the two children. Kill the parent and the weak child. If there is a tie between the kids, flip a coin.
5. Go to Step 3 and repeat.
(WLOG, assume the target is all ones.)
If μ << 1, this is a Markov birth process.
[Diagram: Markov birth process on k, the number of ones; step up with probability 2μ(L - k), stay with probability 1 - 2μ(L - k).]
μ = 0.00005, L = 128 bits
128 bits = perfect search information
0.0022 bits per query
I_+(Q) = 126.7516 bits
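The birth process above is easy to simulate. A sketch, using a smaller string and a larger mutation rate than the slide's (L = 32, mu = 0.002, chosen only so the run finishes quickly; the dynamics are the same):

```python
import random

def mackay_search(L, mu, max_queries, seed=0):
    # Single-agent mutation: two children per generation, each with independent
    # bit-flip probability mu; the fitter child survives (ties broken by coin
    # flip), the parent and weaker child are killed. Fitness = number of ones.
    rng = random.Random(seed)
    parent = [rng.random() < 0.5 for _ in range(L)]
    queries = 0
    while queries < max_queries and sum(parent) < L:
        kids = [[b ^ (rng.random() < mu) for b in parent] for _ in range(2)]
        queries += 2
        a, b = sorted(kids, key=sum)
        parent = b if sum(b) != sum(a) else (a if rng.random() < 0.5 else b)
    return sum(parent)

ones = mackay_search(L=32, mu=0.002, max_queries=20000)
print(ones)   # climbs toward the 32-one target, gaining information slowly
```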
4. Active FOO Information
FOO = frequency of occurrence:
E 11.1607% M 3.0129%
A 8.4966% H 3.0034%
R 7.5809% G 2.4705%
I 7.5448% B 2.0720%
O 7.1635% F 1.8121%
T 6.9509% Y 1.7779%
N 6.6544% W 1.2899%
S 5.7351% K 1.1016%
L 5.4893% V 1.0074%
C 4.5388% X 0.2902%
U 3.6308% Z 0.2722%
D 3.3844% J 0.1965%
P 3.1671% Q 0.1962%
Information of the nth Letter
I_n = -log2(p_n)
Average information = entropy:
H = -Σ_n p_n log2(p_n)
Concise Oxford Dictionary (9th edition, 1995)
English Alphabet Entropy
Uniform: H = log2(27) = 4.76 bits per character
FOO: H = -Σ_n p_n log2(p_n) = 3.36 bits per character
Active information: I_+ = 4.76 - 3.36 = 1.40 bits per character
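The shape of this calculation can be reproduced from the FOO table above. Note the table lists only the 26 letters (the space character, included in the slide's 27-symbol figures, is absent), so the entropy computed here differs from the slide's numbers; the structure of the computation is the same:

```python
import math

# Percent frequencies from the FOO table (Concise Oxford Dictionary, 9th ed.)
foo_percent = {
    'E': 11.1607, 'A': 8.4966, 'R': 7.5809, 'I': 7.5448, 'O': 7.1635,
    'T': 6.9509,  'N': 6.6544, 'S': 5.7351, 'L': 5.4893, 'C': 4.5388,
    'U': 3.6308,  'D': 3.3844, 'P': 3.1671, 'M': 3.0129, 'H': 3.0034,
    'G': 2.4705,  'B': 2.0720, 'F': 1.8121, 'Y': 1.7779, 'W': 1.2899,
    'K': 1.1016,  'V': 1.0074, 'X': 0.2902, 'Z': 0.2722, 'J': 0.1965,
    'Q': 0.1962,
}

probs = [v / 100 for v in foo_percent.values()]
H_uniform = math.log2(len(probs))                 # maximum-entropy baseline
H_foo = -sum(p * math.log2(p) for p in probs)     # FOO entropy
print(round(H_uniform, 2), round(H_foo, 2), round(H_uniform - H_foo, 2))
```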
Kullback-Leibler Distance between FOO and Maximum Entropy
Asymptotic Equipartition Theorem
Target T
A FOO structuring of a long message restricts search to a subspace (the FOO subspace).
For a message with L characters from an alphabet of N letters:
full space: N^L elements; FOO subspace: 2^(LH) elements
I_+ = L log2 N - LH
Asymptotic Equipartition Theorem
For the King James Bible using FOO, the active information is I_+ = 6.169 MB.
Endogenous information: I_Ω = 16.717 MB.
Can we add MORE information? Digraphs, trigraphs...
5. Stepping Stone Search
1. STONE_Establish Sub Alphabet
2. TEN_TOES_Establish FOO
3. _TENSE_TEEN_TOOTS_ONE_TONE_TEST_SET_
Phrase Endogenous Information:
I_Ω = 36 log2 27 ≈ 171 bits
SSS Active Information:
I_+ = 29 bits
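The 171-bit figure is just the phrase length times the per-character information of a 27-symbol alphabet (a quick check):

```python
import math

phrase = "_TENSE_TEEN_TOOTS_ONE_TONE_TEST_SET_"
N = 27                                    # alphabet size
I_endogenous = len(phrase) * math.log2(N)
print(len(phrase), round(I_endogenous))   # 36 characters, about 171 bits
```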
Examples of Active Information
The NFL theorem has been useful to address the "sometimes outrageous claims that had been made of specific optimization algorithms“
S. Christensen and F. Oppacher, "What can we learn from No Free Lunch? A First Attempt to Characterize the Concept of a Searchable Function," Proceedings of the Genetic and Evolutionary Computation Conference (2001).
"Torture numbers, and they'll confess to anything." - Gregg Easterbrook
Example of Active Information
Schneider’s EV
Equivalent to inverting a perceptron:
[Diagram: 24 weights on [-511, 512], a bias on [-511, 512], and an error output.]
String of 131 nucleotides.
131 fixed binding site locations; 16 binding sites.
The Function...
The Results...
The Illusion...
Finding a specific stream of 131 bits has a probability of
p = 2^-131 = 3.67 x 10^-40
Endogenous Information: I_Ω = 131 bits
N = 704 generations x 64 genomes per generation = 45,056 queries
for a perfect search
Active information rate ≈ 3 millibits per query
Source of active information: the perceptron structure and the error measure.
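The millibits-per-query rate follows from dividing the endogenous information by the query count (a quick check of the slide's arithmetic):

```python
I_endogenous = 131          # bits needed to specify the target
queries = 704 * 64          # generations x genomes per generation
rate_millibits = 1000 * I_endogenous / queries
print(queries, round(rate_millibits, 1))   # 45056 queries, about 2.9 millibits per query
```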
AVIDA, NAND Logic & SSS
AVIDA alphabet: 26 instructions
GOAL: Evolve an XNOR
AVIDA Stepping Stone Targets
AVIDA, NAND Logic & SSS
Yada yada yada...
Summary of Points...
Evolutionary computing: Schaffer's perpetual motion machine for information.
Active information can be measured analytically or through simulation.
Active & endogenous information should be reported in all published models of simulated targeted evolution.
What is the source of active information?
Simon Conway Morris
Simon Conway Morris explores the evidence demonstrating life’s almost eerie ability to navigate to a single solution, repeatedly.
EvoInfo.org
Finis