“Conservation of Information in Evolutionary Search Algorithms: Measuring the Cost of Success”
Robert J. Marks II
Abstract
Conservation of information theorems indicate that any search algorithm performs on average as well as random search without replacement unless it takes advantage of problem-specific information about the search target or the search-space structure.
Combinatorics shows that even a moderately sized search requires problem-specific information to be successful. Three measures characterize the information required for successful search: (1) endogenous information, which measures the difficulty of finding a target using random search; (2) exogenous information, which measures the difficulty that remains in finding a target once a search takes advantage of problem-specific information; and (3) active information, which, as the difference between endogenous and exogenous information, measures the contribution of problem-specific information to successfully finding a target. A methodology is developed based on these information measures to gauge the effectiveness with which problem-specific information facilitates successful search. The methodology is then applied to various
search tools widely used in evolutionary search.
Information
“The [computing] machine does not create any new information, but it performs a very valuable transformation of known information.”
--Leon Brillouin, Science and Information Theory (Academic Press, New York, 1956).
What is Evolutionary Computation?
Simulation of Evolution on a Computer
[Flowchart: a set of possible solutions → computer model ("How good is each solution?") → survival of the fittest (keep a set of the best solutions) → duplicate, mutate & crossover → next generation → repeat]
Search in Engineering Design
Yagi-Uda antenna (1954). Can we do better? Engineers:
1. Create a parameterized model.
2. Establish a measure of the design's fitness.
3. Search the N-D parameter space.
[Figure: the space of all parameters, with target region T = parameters that give results better than the Yagi-Uda.]
Designed by Evolutionary Search at NASA
http://ic.arc.nasa.gov/projects/esg/research/antenna.htm
Random vs. Assisted Search: Information is given to you...
Target Info:
• Warmer!
• Interval Halving
Search Space Info:
• Steepest Descent
• Conjugate Gradient Descent
Blind Search
From the movie UHF
Search Space Assumption... Monkeys at a typewriter…
27 keys
Apply Bernoulli's principle of insufficient reason
“In the absence of any prior knowledge, we must assume that the events have equal probability.”
Jakob Bernoulli, “Ars Conjectandi” (“The Art of Conjecturing”), 1713.
Information Theoretic Equivalent: Maximum Entropy
(A Good Optimization Assumption)
How Does Moore’s Law Help?
A computer today searches for a target of B = 10,000 bits in a year.
Double the speed:
the faster computer searches for a target of B + 1 = 10,001 bits in a year.
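The arithmetic behind this slide can be checked directly: a budget of Q queries exhausts at most log2(Q) bits of target space, so doubling Q adds exactly one bit. A minimal sketch:

```python
import math

def searchable_bits(queries_per_year):
    # A blind search issuing Q queries can exhaust a target space of log2(Q) bits.
    return math.log2(queries_per_year)

q = 2 ** 10000                                # queries needed for a 10,000-bit target
assert searchable_bits(q) == 10000.0
assert searchable_bits(2 * q) == 10001.0      # doubling the speed buys exactly one bit
```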
Defining “Impossible”
Converting Mass to Computing Power
Minimum energy for an irreversible bit (von Neumann-Landauer limit):
ln(2) k T = 1.15 x 10^-23 joules.¹
Mass of the universe ~ 10^53 kg. Converting all the mass in the universe to energy (E = mc²), we could generate 7.83 x 10^92 bits.
1. Assuming background radiation of 2.76 kelvin.
Expected number of queries = N^L
How Long a Phrase?
Target (length L):
IN THE BEGINNING ... EARTH
JFD SDKA ASS SA ... KSLLS KASFSDA SASSF A ... JDASF J ASDFASD ASDFD ... ASFDG JASKLF SADFAS D ... ASSDF
.
.
.
IN THE BEGINNING ... EARTH
N = 27 characters
How Long a Phrase from the Universe?
p = N^-L
Number of bits expected for a random search:
B = (1/p) log2(1/p) = N^L log2 N^L
Setting N^L log2 N^L = 7.83 x 10^92 bits with N = 27 gives
L = 63 characters
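The 63-character figure can be checked numerically by finding the longest phrase whose expected random-search cost fits the bit budget (a sketch, assuming the slide's cost model of (1/p) · log2(1/p) = N^L · log2(N^L) bits):

```python
import math

BITS_AVAILABLE = 7.83e92   # bits obtainable by converting the universe's mass to energy
N = 27                     # alphabet: 26 letters plus space

def cost_bits(L):
    # Expected random-search cost for a length-L phrase: about 1/p = N**L queries,
    # each accounting for log2(N**L) bits.
    return N ** L * L * math.log2(N)

# Longest phrase whose random search stays within the budget
L = 1
while cost_bits(L + 1) <= BITS_AVAILABLE:
    L += 1
print(L)   # 63 characters
```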
How Long a Phrase from the Multiverse?
Does Quantum Computing Help?
Quantum computing reduces search time by a square root.
L. K. Grover, "A fast quantum mechanical algorithm for database search,"
Proc. ACM Symp. Theory of Computing, 1996, pp. 212-219.
Active Information in Search
Probability search space Ω: Pr(Ω) = 1
Target T, element t
p = Pr[t ∈ T] = |T| / |Ω|
Fitness
Each point in the parameter space has a fitness. The problem of the search is to find a point of good enough fitness.
[Figure: acceptable solutions form the target region T.]
Search Algorithms
Steepest Ascent
Exhaustive
Newton-Raphson
Levenberg-Marquardt
Tabu Search
Simulated Annealing
Particle Swarm Search
Evolutionary Approaches
Problem: In order to work better than average, each algorithm implicitly assumes something about the search space and/or the location of the target.
No Free Lunch Theorem
With no knowledge of where the target is and no knowledge about the fitness surface, one search performs, on average, as well as any other.
NFLT is obvious...
The chance of opening a lock in 5 tries is independent of the algorithm used.
Quotes on the need for added information for targeted search …
1. “…unless you can make prior assumptions about the ... [problems] you are working on, then no search strategy, no matter how sophisticated, can be expected to perform better than any other” Yu-Chi Ho and D.L. Pepyne, (2001).
2. No free lunch theorems “indicate the importance of incorporating problem-specific knowledge into the behavior of the [optimization or search] algorithm.” David Wolpert & William G. Macready (1997).
1. Yu-Chi Ho and D. L. Pepyne, "Simple Explanation of the No-Free-Lunch Theorem," Proceedings of the 40th IEEE Conference on Decision and Control, Orlando, Florida, 2001. 2. D. H. Wolpert and W. G. Macready, "No Free Lunch Theorems for Optimization," IEEE Trans. Evolutionary Computation 1(1): 67-82 (1997).
Therefore...
Nothing works better, on the average, than random search.
For a search algorithm like evolutionary search to work, we require active information.
Can a computer program generate more information than it is given?
If a search algorithm does not obey the NFL theorem, it "is like a perpetual motion machine - conservation of generalization performance precludes it." Cullen Schaffer (1994), anticipating the NFLT.
3. Cullen Schaffer, "A conservation law for generalization performance," in Proc. Eleventh International Conference on Machine Learning, W. W. Cohen and H. Hirsh, Eds., San Francisco: Morgan Kaufmann, 1994, pp. 259-265.
Targeted Search
p = Pr[t ∈ T] = |T| / |Ω|
Probability search space Ω
Target T
Bernoulli's Principle of Insufficient Reason = Maximum Entropy Assumption
Endogenous Information
Target T
I_Ω = -log2 p = log2 (|Ω| / |T|)
This is all of the information we can get from the search. We can get no more.
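The endogenous information drops straight out of the definition above (a quick sketch; the 28-character example anticipates the WEASEL phrase used later in the talk):

```python
import math

def endogenous_information(space_size, target_size):
    # I_Omega = -log2 p = log2(|Omega| / |T|): the difficulty of a blind search.
    return -math.log2(target_size / space_size)

# A single 28-character phrase over a 27-symbol alphabet:
I = endogenous_information(27 ** 28, 1)
print(round(I, 2))   # 133.14 bits
```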
Probability of Success. Choose a search algorithm...
Let p_S be the probability of success of an evolutionary search.
If there is no added information: p_S = p.
If p_S > p, information has been added.
Active Information
I_+ = log2 (p_S / p)
Checks:
1. For a "perfect search", p_S = 1:
I_+ = log2 (1/p) = I_Ω = all of the available information.
Active Information
I_+ = log2 (p_S / p)
Checks:
2. For a "blind query", p_S = p:
I_+ = log2 (p/p) = 0: no active information.
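Both checks fall out of a one-line implementation of the definition I_+ = log2(p_S / p) (a minimal sketch):

```python
import math

def active_information(p, p_s):
    # I_+ = log2(p_S / p): information added by the search procedure over a blind query.
    return math.log2(p_s / p)

p = 2 ** -20                                   # endogenous difficulty: 20 bits
assert active_information(p, 1.0) == 20.0      # perfect search: all available information
assert active_information(p, p) == 0.0         # blind query: no active information
```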
EXAMPLES of ACTIVE INFORMATION
Random Search
Partitioned Search
FOO Search in Alphabet & Nucleotides
Structured Information (ev)
Stepping Stone Search (Avida)
1. Active Information in Random Searches...
For random search, with Q queries:
p_S = 1 - (1 - p)^Q ≈ Qp for very small p
Q = number of queries (trials)
p = probability of success of a single trial
p_S = chance of one or more successes
1. Active Information in Random Searches...
I_+ = log2 (p_S / p) ≈ log2 (Qp / p) = log2 Q
1. Active information is not a function of the size of the space or the probability of success, but only of the number of queries.
2. There is a diminishing return. Two queries give one bit of added information. Four queries give two bits. Sixteen queries give four bits, 256 give 8 bits, etc.
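The diminishing return can be verified numerically from p_S = 1 - (1 - p)^Q (a sketch, using an arbitrary small p):

```python
import math

def active_info_random(p, Q):
    # p_S = 1 - (1 - p)**Q, so I_+ = log2(p_S / p) ~ log2(Q) for small p.
    p_s = 1 - (1 - p) ** Q
    return math.log2(p_s / p)

p = 1e-9
for Q in (2, 4, 16, 256):
    print(Q, round(active_info_random(p, Q), 3))   # ~1, ~2, ~4, ~8 bits
```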
2. Active Information in Partitioned Search...
METHINKS*IT*IS*LIKE*A*WEASEL
XEHDASDSDDTTWSW*QITE*RIPOCFL
XERXPLEE*ETSXSR*IZAW**LPAEWL
MEQWASKL*RTPLSWKIRDOU*VPASRL
METHINKS*IT*IS*LIKE*A*WEASEL
yada yada yada
2. Active Information in Partitioned Search...
METHINKS*IT*IS*LIKE*A*WEASEL
For random search: I_+ ≈ log2 Q
For partitioned search: I_+ ≈ L log2 Q
Hints amplify the added information by a factor of L.
Comparison
METHINKS*IT*IS*LIKE*A*WEASEL
L = 28 characters, 27 in the alphabet
Reality, for Partitioned Search: Q ≈ 43 iterations
For Random Search: Q ≈ 1.1973 x 10^40 iterations
There is a lot of active information!
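The gap can be reproduced by simulating partitioned search directly: each query redraws only the letters that are still wrong, and matched letters are frozen. A sketch (the average query count varies with the random seed but stays on the order of 10^2, against roughly 1.2 x 10^40 expected queries for blind random search):

```python
import random
import string

TARGET = "METHINKS*IT*IS*LIKE*A*WEASEL"
ALPHABET = string.ascii_uppercase + "*"     # 27 symbols

def partitioned_search(target, rng):
    # Each query redraws only the still-incorrect positions; matches are frozen.
    phrase = [rng.choice(ALPHABET) for _ in target]
    queries = 0
    while phrase != list(target):
        queries += 1
        for i, ch in enumerate(target):
            if phrase[i] != ch:
                phrase[i] = rng.choice(ALPHABET)
    return queries

rng = random.Random(1)
runs = [partitioned_search(TARGET, rng) for _ in range(200)]
print(sum(runs) / len(runs))   # on the order of a hundred queries
```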
2. Active Information in Partitioned Search...
2. Domain knowledge can be applied differently, resulting in varying degrees of active information.
The knowledge used in partitioned search can be used to find all the letters and spaces in an arbitrarily large library using only 26 queries.
3. Single Agent Mutation (MacKay)
1. Specify a target of bits of length L.
2. Initiate a string of random bits.
3. Form two children with mutation (bit-flip) probability μ.
4. Find the better fit of the two children. Kill the parent and the weak child. If there is a tie between the kids, flip a coin.
5. Go to Step 3 and repeat.
(WLOG, assume the target is all ones.)
If μ << 1, this is a Markov birth process.
[Diagram: Markov birth process on k, the number of ones; step up with probability 2μ(L - k), stay with probability 1 - 2μ(L - k).]
μ = 0.00005, L = 128 bits
128 bits = perfect search information
0.0022 bits per query
I_+(Q) = 126.7516 bits
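The birth process above is easy to simulate. A sketch, using a smaller string and a larger mutation rate than the slide's (L = 32, mu = 0.002, chosen only so the run finishes quickly; the dynamics are the same):

```python
import random

def mackay_search(L, mu, max_queries, seed=0):
    # Single-agent mutation: two children per generation, each with independent
    # bit-flip probability mu; the fitter child survives (ties broken by coin
    # flip), the parent and weaker child are killed. Fitness = number of ones.
    rng = random.Random(seed)
    parent = [rng.random() < 0.5 for _ in range(L)]
    queries = 0
    while queries < max_queries and sum(parent) < L:
        kids = [[b ^ (rng.random() < mu) for b in parent] for _ in range(2)]
        queries += 2
        a, b = sorted(kids, key=sum)
        parent = b if sum(b) != sum(a) else (a if rng.random() < 0.5 else b)
    return sum(parent)

ones = mackay_search(L=32, mu=0.002, max_queries=20000)
print(ones)   # climbs toward the 32-one target, gaining information slowly
```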
4. Active FOO Information
FOO = frequency of occurrence:
E 11.1607% M 3.0129%
A 8.4966% H 3.0034%
R 7.5809% G 2.4705%
I 7.5448% B 2.0720%
O 7.1635% F 1.8121%
T 6.9509% Y 1.7779%
N 6.6544% W 1.2899%
S 5.7351% K 1.1016%
L 5.4893% V 1.0074%
C 4.5388% X 0.2902%
U 3.6308% Z 0.2722%
D 3.3844% J 0.1965%
P 3.1671% Q 0.1962%
Information of the nth Letter
I_n = -log2(p_n)
Average information = entropy:
H = -Σ_n p_n log2(p_n)
Concise Oxford Dictionary (9th edition, 1995)
English Alphabet Entropy
Uniform: H = log2(27) = 4.76 bits per character
FOO: H = -Σ_n p_n log2(p_n) = 3.36 bits per character
Active information: I_+ = 4.76 - 3.36 = 1.40 bits per character
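The shape of this calculation can be reproduced from the FOO table above. Note the table lists only the 26 letters (the space character, included in the slide's 27-symbol figures, is absent), so the entropy computed here differs from the slide's numbers; the structure of the computation is the same:

```python
import math

# Percent frequencies from the FOO table (Concise Oxford Dictionary, 9th ed.)
foo_percent = {
    'E': 11.1607, 'A': 8.4966, 'R': 7.5809, 'I': 7.5448, 'O': 7.1635,
    'T': 6.9509,  'N': 6.6544, 'S': 5.7351, 'L': 5.4893, 'C': 4.5388,
    'U': 3.6308,  'D': 3.3844, 'P': 3.1671, 'M': 3.0129, 'H': 3.0034,
    'G': 2.4705,  'B': 2.0720, 'F': 1.8121, 'Y': 1.7779, 'W': 1.2899,
    'K': 1.1016,  'V': 1.0074, 'X': 0.2902, 'Z': 0.2722, 'J': 0.1965,
    'Q': 0.1962,
}

probs = [v / 100 for v in foo_percent.values()]
H_uniform = math.log2(len(probs))                 # maximum-entropy baseline
H_foo = -sum(p * math.log2(p) for p in probs)     # FOO entropy
print(round(H_uniform, 2), round(H_foo, 2), round(H_uniform - H_foo, 2))
```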
Kullback-Leibler Distance between FOO and Maximum Entropy
Asymptotic Equipartition Theorem
Target T
A FOO structuring of a long message restricts search to a subspace (the FOO subspace).
For a message with L characters from an alphabet of N letters:
full space: N^L elements; FOO subspace: 2^(LH) elements
I_+ = L log2 N - LH
Asymptotic Equipartition Theorem
For the King James Bible using FOO, the active information is I_+ = 6.169 MB.
Endogenous information: I_Ω = 16.717 MB.
Can we add MORE information? Digraphs, trigraphs...
5. Stepping Stone Search
1. STONE_Establish Sub Alphabet
2. TEN_TOES_Establish FOO
3. _TENSE_TEEN_TOOTS_ONE_TONE_TEST_SET_
Phrase Endogenous Information:
I_Ω = 36 log2 27 ≈ 171 bits
SSS Active Information:
I_+ = 29 bits
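The 171-bit figure is just the phrase length times the per-character information of a 27-symbol alphabet (a quick check):

```python
import math

phrase = "_TENSE_TEEN_TOOTS_ONE_TONE_TEST_SET_"
N = 27                                    # alphabet size
I_endogenous = len(phrase) * math.log2(N)
print(len(phrase), round(I_endogenous))   # 36 characters, about 171 bits
```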
Examples of Active Information
The NFL theorem has been useful to address the "sometimes outrageous claims that had been made of specific optimization algorithms“
S. Christensen and F. Oppacher, "What can we learn from No Free Lunch? A First Attempt to Characterize the Concept of a Searchable Function," Proceedings of the Genetic and Evolutionary Computation Conference (2001).
"Torture numbers, and they'll confess to anything." - Gregg Easterbrook
Example of Active Information
Schneider’s EV
Equivalent to inverting a perceptron:
[Diagram: 24 weights on [-511, 512], a bias on [-511, 512], and an error output.]
String of 131 nucleotides.
131 fixed binding site locations; 16 binding sites.
The Function...
The Results...
The Illusion...
Finding a specific stream of 131 bits has a probability of
p = 2^-131 = 3.67 x 10^-40
Endogenous Information: I_Ω = 131 bits
N = 704 generations x 64 genomes per generation = 45,056 queries
for a perfect search
Active information rate ≈ 3 millibits per query
Source of active information: the perceptron structure and the error measure.
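The millibits-per-query rate follows from dividing the endogenous information by the query count (a quick check of the slide's arithmetic):

```python
I_endogenous = 131          # bits needed to specify the target
queries = 704 * 64          # generations x genomes per generation
rate_millibits = 1000 * I_endogenous / queries
print(queries, round(rate_millibits, 1))   # 45056 queries, about 2.9 millibits per query
```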
AVIDA, NAND Logic & SSS
AVIDA alphabet: 26 instructions
GOAL: Evolve an XNOR
AVIDA Stepping Stone Targets
AVIDA, NAND Logic & SSS
Yada yada yada...
Summary of Points...
Evolutionary computing: Schaffer's perpetual motion machine for information.
Active information can be measured analytically or through simulation.
Active & endogenous information should be reported in all published models of simulated targeted evolution.
What is the source of active information?
Simon Conway Morris
Simon Conway Morris explores the evidence demonstrating life’s almost eerie ability to navigate to a single solution, repeatedly.
EvoInfo.org
Finis