slightly beyond turing’s computability for studying genetic programming olivier teytaud, tao,...
Post on 20-Dec-2015
Slightly beyond Turing’s computability for studying Genetic Programming
Olivier Teytaud, Tao, Inria, Lri, UMR CNRS 8623, Univ. Paris-Sud, Pascal, Digiteo
Outline
- What is genetic programming
- Formal analysis of Genetic Programming
- Why is there nothing other than Genetic Programming?
  - Computability point of view
  - Complexity point of view
What is Genetic Programming (GP)
GP = mining Turing-equivalent spaces of functions
Typical example: symbolic regression.
Inputs:
- $x_1, x_2, x_3, \dots, x_N \in \{0,1\}^*$
- $y_1, y_2, y_3, \dots, y_N \in \{0,1\}$, with $y_i = f(x_i)$
- the pairs $(x_i, y_i)$ are assumed independently identically distributed (unknown probability distribution)
Goal:
Finding g such that
$\mathbb{E}|g(x)-y| + C\,\mathbb{E}\,\mathrm{Time}(g,x)$
is as small as possible
How does GP work?
GP = evolutionary algorithm.
Evolutionary algorithm:
P = initial population
While (my favorite criterion)
- Selection = best functions in P according to some score
- Mutations = random perturbations of programs in the Selection
- Cross-over = merging of programs in the Selection
- P ≈ Selection + Mutations + Cross-over
Does it work?
Definitely yes, for robust and multimodal optimization in complex domains (trees, bitstrings, …).
Which score? A nice question for mathematicians.
Why study GP?
- GP is studied by many people: 5440 articles in the GP bibliography [5], more than 880 authors
- GP seemingly works: human-competitive results, http://www.genetic-programming.com/humancompetitive.html
- Nothing else exists for mining Turing-equivalent spaces of programs
- Probably better than random search
- Not so many mathematical foundations in GP
- Not so many open problems in computability, in particular with applications
Formalization of GP
What is typical of GP?
- No halting criterion: we stop when time is exhausted.
- No use of prior knowledge: no use of f, even when you know it.
People (often) do not like GP because:
- it is slow and has no halting criterion;
- it uses the $y_i = f(x_i)$ and not f (unlike automatic code generation).
Are these two elements necessary?
Iterative algorithms
Black-box?
Formalization of GP
Summary:
- GP uses only the $f(x_i)$ and the $\mathrm{Time}(f, x_i)$.
- GP never halts: $O_1, O_2, O_3, \dots$
Can we do better?
Known results
Whenever f is available (and not only the $f(x_i)$), computing O such that
- $O \equiv f$
- O is optimal for size (or speed, or space, …)
is not possible
(i.e. there is no Turing machine performing that task for all f).
A first (easy) good reason for GP.
Whenever f is not available (only the $f(x_i)$ are), computing $O_1, O_2, \dots$ such that
- $O_p \equiv f$ for p sufficiently large
- $\lim \mathrm{size}(O_p)$ is optimal
is possible, with proved convergence rates, e.g. by bloat penalization:
- consider a population of programs; set n = 1
- while (true)
  - select the best program P for a compromise between relevance on the n first examples and penalization of size, e.g. $\sum_{i<n} |P(x_i)-y_i| + C(|P|, n)$
  - n = n + 1
(see details of the proof and of the algorithm in the paper)
A first (easy) good reason for GP.
Asymptotically (only!), finding an optimal function O ≡ f is possible.
No halting criterion is possible
(avoids the use of an oracle in 0’)
Outline
- What is genetic programming
- Formal analysis of Genetic Programming
- Why is there nothing other than Genetic Programming?
  - Computability point of view
  - Complexity point of view:
    - Kolmogorov’s complexity with bounded time
    - Application to genetic programming
Kolmogorov’s complexity
- Kolmogorov’s complexity of x: minimum size of a program generating x.
- Kolmogorov’s complexity of x with time at most T: minimum size of a program generating x in time at most T.
Kolmogorov’s complexity in bounded time = computable.
Kolmogorov’s complexity and genetic programming
GP uses expensive simulations of programs.
Can we get rid of the simulation time, e.g. by using f not only as a black box?
Essentially, no:
- Example of GP problem: finding O as small as possible with $\mathbb{E}\,\mathrm{Time}(O,x) < T_n$, $|O| < S_n$ and $O(x) = y$.
- If $T_n = \Omega(2^n)$ and some $S_n = O(\log n)$, this requires time at least $T_n/\mathrm{polynomial}(n)$.
- Just simulating all programs shorter than $S_n$ and "faster" than $T_n$ is possible in time $\mathrm{polynomial}(n)\,T_n$.
Conclusion
Summary
- GP typically solves problems in 0’ approximately.
- There is a lot of work on approximating NP-complete problems, but not much on 0’.
- We provide a theoretical (mathematical) analysis of GP.
Conclusions:
- GP uses expensive simulations, but the simulation cost cannot be removed anyway.
- GP has no halting criterion, but no halting criterion can be found.
- Also, "bloat" penalization ensures consistency; this suggests a parametrization of the usual algorithms.