Information Theory in Intelligent Decision Making
Daniel Polani
Adaptive Systems and Algorithms Research Groups, School of Computer Science
University of Hertfordshire, United Kingdom
March 5, 2015
Daniel Polani Information Theory in Intelligent Decision Making
The Theory
Motivation

Artificial Intelligence
- modelling cognition in humans
- realizing human-level "intelligent" behaviour in machines (just performance: not necessarily imitating the biological substrate)
- a jumble of various ideas to get the above points working

Question
Is there a joint way of understanding cognition?

Probability
- we have probability theory as a theory of uncertainty
- we have information theory for endowing probability with a sense of "metrics"
Random Variables

Def.: Event Space
Consider an event space Ω = {ω₁, ω₂, ...}, finite or countably infinite, with a (probability) measure P_Ω : Ω → [0, 1] s.t. ∑_ω P_Ω(ω) = 1. The ω are called events.

Def.: Random Variable
A random variable X is a map X : Ω → 𝒳 with some outcome space 𝒳 = {x₁, x₂, ...} and induced probability measure P_X(x) = P_Ω(X⁻¹(x)). We also write

    P_X(x) ≡ P(X = x) ≡ p(x).
Neyman-Pearson Lemma I

Lemma
Consider observations x₁, x₂, ..., x_n of a random variable X and two potential hypotheses (distributions) p₁ and p₂ they could have been based upon.

Consider the test for hypothesis p₁ to be given as (x₁, x₂, ..., x_n) ∈ A, where

    A = { x = (x'₁, x'₂, ..., x'_n) | p₁(x'₁, x'₂, ..., x'_n) / p₂(x'₁, x'₂, ..., x'_n) ≥ C }

for some C ∈ ℝ⁺.

Assume the rate α of false negatives, α = p₁(Aᶜ), to be given (sequences generated by p₁, but not in A), and let β be the rate of false positives, β = p₂(A).

Then: any test with false negative rate α' ≤ α has false positive rate β' ≥ β.

(Cover and Thomas, 2006)
Neyman-Pearson Lemma II

Proof (Cover and Thomas, 2006)
Let A be as above and B some other acceptance region; let χ_A and χ_B be the indicator functions. Then for all x:

    [χ_A(x) − χ_B(x)] [p₁(x) − C p₂(x)] ≥ 0.

Multiplying out and summing over x:

    0 ≤ ∑_A (p₁ − C p₂) − ∑_B (p₁ − C p₂)
      = (1 − α) − Cβ − (1 − α') + Cβ'
      = C(β' − β) − (α − α')

so α' ≤ α implies β' ≥ β.
Neyman-Pearson Lemma V

Consideration
- assume the events x_i are i.i.d.
- the test becomes:

    ∏_i p₁(x_i) / p₂(x_i) ≥ C

- logarithmize:

    ∑_i log [p₁(x_i) / p₂(x_i)] ≥ κ  (κ := log C)

Note: Kullback-Leibler Divergence
Average "evidence" growth per sample:

    D_KL(p₁‖p₂) = E_{p₁}[ log p₁(X)/p₂(X) ] = ∑_{x∈𝒳} p₁(x) log [p₁(x)/p₂(x)]
Neyman-Pearson Lemma VI

[Figure: cumulative log-likelihood ratio ("log sum") against number of samples (0 to 10000) for Bernoulli hypothesis pairs 0.40 vs 0.60, 0.50 vs 0.60 and 0.55 vs 0.60; each curve grows linearly and matches the predicted slope lines D_KL · n.]
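The linear growth in the figure is easy to reproduce. The sketch below (an illustration added here, not code from the talk) draws n samples from p₁ and checks that the per-sample slope of the cumulative log-likelihood ratio approaches D_KL(p₁‖p₂), computed in nats:

```python
import math
import random

def dkl_bernoulli(p1: float, p2: float) -> float:
    """D_KL(Bernoulli(p1) || Bernoulli(p2)) in nats."""
    return p1 * math.log(p1 / p2) + (1 - p1) * math.log((1 - p1) / (1 - p2))

def log_likelihood_ratio_sum(p1: float, p2: float, n: int, seed: int = 0) -> float:
    """Cumulative sum_i log[p1(x_i)/p2(x_i)] for n samples drawn from p1."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = 1 if rng.random() < p1 else 0
        # likelihood of the observed symbol under each hypothesis
        q1 = p1 if x else (1 - p1)
        q2 = p2 if x else (1 - p2)
        total += math.log(q1 / q2)
    return total

n = 10_000
slope = log_likelihood_ratio_sum(0.40, 0.60, n) / n
print(slope, dkl_bernoulli(0.40, 0.60))  # empirical slope approaches D_KL
```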
Part I
Information Theory — Motivation
Structural Motivation: Intrinsic Pathways to Information Theory

Pathways leading to Information Theory:
- optimal communication
- Shannon axioms
- physical entropy
- Laplace's principle
- typicality theory
- optimal Bayes
- rate distortion
- information geometry
- AI
Optimal Communication

Codes
- task: send messages (disambiguate states) from sender to receiver
- consider self-delimiting codes (without an extra delimiting character)
- simple example: prefix codes

Def.: Prefix Codes
A code where no codeword is a prefix of another codeword.
Prefix Codes

[Figure: binary code tree with edges labelled 0 and 1; codewords sit at leaves, so no codeword is the prefix of another.]
Kraft Inequality

Theorem
Assume events x ∈ 𝒳 = {x₁, x₂, ..., x_k} are coded using prefix codewords over an alphabet of size b = |B|, with lengths l₁, l₂, ..., l_k for the respective events. Then one has

    ∑_{i=1}^k b^{−l_i} ≤ 1.

Proof Sketch (Cover and Thomas, 2006)
Let l_max be the length of the longest codeword. Expand the tree fully to level l_max. Fully expanded leaves are either: 1. codewords; 2. descendants of codewords; 3. neither. A codeword of length l_i has b^{l_max − l_i} full-tree descendants, which must be different for the different codewords, and there cannot be more than b^{l_max} in total. Hence

    ∑_i b^{l_max − l_i} ≤ b^{l_max}.

Remark
The converse also holds.
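As a quick numerical sanity check (added here, not from the slides), one can verify the Kraft inequality for a concrete binary prefix code; for a complete code the sum reaches exactly 1:

```python
def kraft_sum(codewords: list[str], b: int = 2) -> float:
    """Sum of b^(-l_i) over the codeword lengths l_i."""
    return sum(b ** -len(w) for w in codewords)

def is_prefix_free(codewords: list[str]) -> bool:
    """True iff no codeword is a proper prefix of another."""
    return not any(u != v and v.startswith(u) for u in codewords for v in codewords)

# A complete binary prefix code: the Kraft sum equals 1 exactly.
code = ["0", "10", "110", "111"]
print(is_prefix_free(code), kraft_sum(code))  # True 1.0
```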
Considerations — Most Compact Code

Assume
We want to code a stream of events x ∈ 𝒳 appearing with probability p(x).

Note
1. try to make the l_i as small as possible
2. i.e. make the b^{−l_i} as large as possible
3. limited by the Kraft inequality; ideally it becomes an equality:

    ∑_i b^{−l_i} = 1

As the l_i are integers, that is typically not achieved exactly.

Minimize
Average code length E[L] = ∑_i p(x_i) l_i under the constraint ∑_i b^{−l_i} = 1.

Result
Differentiating the Lagrangian

    ∑_i p(x_i) l_i + λ ∑_i b^{−l_i}

w.r.t. the l_i gives the codeword lengths for the "shortest" code:

    l_i = − log_b p(x_i)

Average Codeword Length

    ∑_i p(x_i) · l_i = − ∑_x p(x) log p(x)

In the following, assume the binary logarithm.
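Since the ideal lengths −log_b p(x_i) are generally not integers, a practical scheme rounds them up (Shannon coding). A small sketch added here as an illustration, using a dyadic distribution where the rounding is exact, so the average length meets the entropy bound:

```python
import math

def shannon_code_lengths(p: dict[str, float]) -> dict[str, int]:
    """Integer codeword lengths l_i = ceil(-log2 p(x_i)); these satisfy Kraft."""
    return {x: math.ceil(-math.log2(px)) for x, px in p.items()}

# Dyadic probabilities: the ideal lengths are already integers.
p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
lengths = shannon_code_lengths(p)
avg_len = sum(p[x] * l for x, l in lengths.items())
entropy = -sum(px * math.log2(px) for px in p.values())
print(lengths, avg_len, entropy)  # average length equals H(X) = 1.75 bits
```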
Entropy

Def.: Entropy
Consider the random variable X. Then the entropy H(X) of X is defined as

    H(X) [≡ H(p)] := − ∑_x p(x) log p(x)

with the convention 0 log 0 ≡ 0.

Interpretations
- average optimal codeword length
- uncertainty (about the next sample of X)
- physical entropy
- much more ...

Quote
"Why don't you call it entropy. In the first place, a mathematical development very much like yours already exists in Boltzmann's statistical mechanics, and in the second place, no one understands entropy very well, so in any discussion you will be in a position of advantage."
— John von Neumann
Meditation

Probability/Code Mismatch
Consider events x following a probability p(x), but a modeler mistakenly assuming probability q(x), with "optimal" code lengths − log q(x). Then the "code length waste per symbol" is given by

    − ∑_x p(x) log q(x) + ∑_x p(x) log p(x) = ∑_x p(x) log [p(x)/q(x)] = D_KL(p‖q)
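The waste formula can be checked directly. The sketch below (an added illustration; the distributions p and q are arbitrary examples) computes the cross-entropy, the entropy, and their difference D_KL(p‖q) in bits:

```python
import math

def entropy(p: dict[str, float]) -> float:
    """H(p) in bits, with the convention 0 log 0 = 0."""
    return -sum(px * math.log2(px) for px in p.values() if px > 0)

def cross_entropy(p: dict[str, float], q: dict[str, float]) -> float:
    """Average code length when coding p-distributed symbols with lengths -log2 q(x)."""
    return -sum(px * math.log2(q[x]) for x, px in p.items() if px > 0)

def dkl(p: dict[str, float], q: dict[str, float]) -> float:
    """Code-length waste per symbol: cross-entropy minus entropy."""
    return cross_entropy(p, q) - entropy(p)

p = {"a": 0.5, "b": 0.5}
q = {"a": 0.75, "b": 0.25}
print(dkl(p, q))  # 0.5*log2(0.5/0.75) + 0.5*log2(0.5/0.25) ≈ 0.2075 bits
```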
Part II
Types
A Tip of Types (Cover and Thomas, 2006)

Method of Types: Motivation
- consider sequences with the same empirical distribution
- how many of these have a particular distribution?
- what is the probability of such a sequence?

Sketch of the Method
- consider, w.l.o.g., the binary event set 𝒳 = {0, 1}
- consider a sample x⁽ⁿ⁾ = (x₁, ..., x_n) ∈ 𝒳ⁿ
- the type p_{x⁽ⁿ⁾} is the empirical distribution of symbols y ∈ 𝒳 in the sample x⁽ⁿ⁾, i.e. p_{x⁽ⁿ⁾}(y) counts how often symbol y appears in x⁽ⁿ⁾ (normalized by n). Let 𝒫_n be the set of types with denominator n (or dividing n).
- for p ∈ 𝒫_n, call the set of all sequences x⁽ⁿ⁾ ∈ 𝒳ⁿ with type p the type class C(p) = {x⁽ⁿ⁾ | p_{x⁽ⁿ⁾} = p}.
Type Theorem

Type Count
If |𝒳| = 2, one has |𝒫_n| = n + 1 different types for sequences of length n (easy to generalize).

Important
𝒫_n grows only polynomially, but 𝒳ⁿ grows exponentially with n. It follows that at least one type must contain exponentially many sequences. This corresponds to the "macrostate" in physics.

Theorem (Cover and Thomas, 2006)
If x₁, x₂, ..., x_n is an i.i.d. sample sequence drawn from q, then the probability of x⁽ⁿ⁾ depends only on its type and is given by

    2^{−n [H(p_{x⁽ⁿ⁾}) + D_KL(p_{x⁽ⁿ⁾}‖q)]}

Corollary
If x⁽ⁿ⁾ has type q (here, we interpret the probability q as a type), then its probability is given by

    2^{−n H(q)}

A large value of H(q) indicates many possible candidates x⁽ⁿ⁾ and high uncertainty; a small value, few candidates and low uncertainty.
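The theorem can be verified numerically for a Bernoulli source: the direct product probability of a sequence agrees with the 2^{−n[H + D_KL]} expression computed from its type. A sketch (own illustration; the values n, k, q are arbitrary):

```python
import math

def h2(p: float) -> float:
    """Binary entropy in bits (0 log 0 = 0)."""
    return 0.0 if p in (0.0, 1.0) else -p*math.log2(p) - (1-p)*math.log2(1-p)

def dkl2(p: float, q: float) -> float:
    """D_KL(Bernoulli(p) || Bernoulli(q)) in bits."""
    terms = []
    if p > 0: terms.append(p * math.log2(p / q))
    if p < 1: terms.append((1-p) * math.log2((1-p) / (1-q)))
    return sum(terms)

n, k, q = 20, 5, 0.3            # a length-20 sequence with five 1s, source Bernoulli(q)
p_type = k / n                  # the sequence's type
direct = q**k * (1-q)**(n-k)    # probability of any particular such sequence
via_types = 2 ** (-n * (h2(p_type) + dkl2(p_type, q)))
print(direct, via_types)        # the two expressions agree
```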
Part III
Laplace’s Principle and Friends
Laplace's Principle of Insufficient Reason I

Scenario
Consider 𝒳. A probability distribution is assumed on 𝒳, but it is unknown. Laplace's principle of insufficient reason states that, in the absence of any reason to assume that the outcomes are inequivalent, the probability distribution on 𝒳 is taken to be the equidistribution.

Question
How to generalize when something is known?
Answer: Types

Dominant Sample Sequence
Remember: the probability of a sequence in type class C(q) is

    2^{−n H(q)}

A priori, a probability q maximizing H(q) will generate sequence types dominating all others.

Maximum Entropy Principle
Maximize H(q) with respect to q. Result: the equidistribution q(x) = 1/|𝒳|.
Sanov's Theorem I

Theorem
Consider an i.i.d. sequence X₁, X₂, ..., X_n of random variables, distributed according to q(X). Let further E be a set of probability distributions.

Then (amongst other results), if E is closed and with p* = arg min_{p∈E} D(p‖q), one has

    (1/n) log q⁽ⁿ⁾(E) → − D(p*‖q)

[Figure: the set E inside the probability simplex, the prior q outside it, and p* the point of E closest to q in D(·‖q).]
Sanov's Theorem II

Interpretation
p is unknown, but one knows constraints for p (e.g. some condition, such as some mean value Ū = ∑_x p(x) U(x), must be attained, i.e. the set E is given); then the dominating types are those close to p*.

Special Case
If the prior q is the equidistribution (indifference), then minimizing D(p‖q) under the constraints E is equivalent to maximizing H(p) under these constraints:

Jaynes' Maximum Entropy Principle
Sanov's Theorem III

Jaynes' Principle
- generalization of Laplace's Principle
- the maximally uncommitted distribution
Maximum Entropy Distributions I: No Constraints

We are interested in maximizing

    H(X) = − ∑_x p(x) log p(x)

over all probabilities p. The probability p lives in the simplex

    Δ = { q ∈ ℝ^{|𝒳|} | ∑_i q_i = 1, q_i ≥ 0 }

The maximization requires respecting constraints, of which we now consider only the normalization ∑_x p(x) = 1. The edge constraints (q_i ≥ 0) happen not to be invoked here.
Maximum Entropy Distributions II: No Constraints

Maximization via Lagrange (with only the normalization constraint):

    max_p [ − ∑_x p(x) log p(x) + λ ∑_x p(x) ]

Taking the derivative ∇_{p(x)} gives

    − log p(x) − 1 + λ = 0.

Thus p(x) = e^{λ−1} ≡ 1/|𝒳| — the equidistribution.
Maximum Entropy Distributions: Linear Constraints

The constraints are now

    ∑_x p(x) = 1
    ∑_x p(x) f(x) = f̄.

Setting the gradient of the Lagrangian to zero:

    0 = ∇_{p(x)} [ − ∑_x p(x) log p(x) + λ ∑_x p(x) + µ ∑_x p(x) f(x) ]

    − log p(x) − 1 + λ + µ f(x) = 0

so that one has the

Boltzmann/Gibbs Distribution

    p(x) = e^{λ−1+µ f(x)} = (1/Z) e^{µ f(x)}
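To make the Boltzmann/Gibbs result concrete, the sketch below (an added illustration; the values f(x) and the target mean are arbitrary choices) solves for the multiplier µ by bisection, exploiting that the mean E[f] under the Gibbs distribution is monotone in µ:

```python
import math

def gibbs(f: list[float], mu: float) -> list[float]:
    """p(x) proportional to exp(mu * f(x)): the maximum-entropy form."""
    w = [math.exp(mu * fx) for fx in f]
    z = sum(w)  # the partition function Z
    return [wi / z for wi in w]

def solve_mu(f: list[float], target_mean: float, lo=-50.0, hi=50.0) -> float:
    """Bisect on mu so that the Gibbs distribution matches the prescribed mean of f."""
    for _ in range(200):
        mid = (lo + hi) / 2
        mean = sum(pi * fi for pi, fi in zip(gibbs(f, mid), f))
        lo, hi = (mid, hi) if mean < target_mean else (lo, mid)
    return (lo + hi) / 2

f = [1.0, 2.0, 3.0]        # values of f(x) on a three-element outcome space
mu = solve_mu(f, 2.5)      # constrain E[f] = 2.5
p = gibbs(f, mu)
print(p, sum(pi * fi for pi, fi in zip(p, f)))  # a Gibbs distribution with mean 2.5
```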
Part IV
Kullback-Leibler and Friends
Conditional Kullback-Leibler

D_KL can be conditional:

    D_KL[p(Y|x) ‖ q(Y|x)]
    D_KL[p(Y|X) ‖ q(Y|X)] = ∑_x p(x) D_KL[p(Y|x) ‖ q(Y|x)]
Kullback-Leibler and Bayes (Biehl, 2013)

We want to estimate p(x|θ), where θ is the parameter. We observe y. Seek the "best" q(x|y) for this y in the following sense:
1. minimize the D_KL of the true distribution to the model distribution q,
2. averaged over possible observations y,
3. averaged over θ:

    min_q ∫ dθ p(θ) ∑_y p(y|θ) D_KL[p(x|θ) ‖ q(x|y)]

Result
q(x|y) is the Bayesian inference obtained from p(y|x) and p(x).
Conditional Entropies

Special Case: Conditional Entropy

    H(Y|X = x) := − ∑_y p(y|x) log p(y|x)
    H(Y|X)     := − ∑_x p(x) ∑_y p(y|x) log p(y|x)

Information
Reduction of entropy (uncertainty) by knowing another variable:

    I(X; Y) := H(Y) − H(Y|X)
             = H(X) − H(X|Y)
             = H(X) + H(Y) − H(X, Y)
             = D_KL[p(x, y) ‖ p(x)p(y)]
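These identities translate directly into code. The following sketch (added here as an illustration) computes I(X;Y) as D_KL[p(x,y)‖p(x)p(y)] for a binary symmetric channel with uniform input, an example not taken from the slides:

```python
import math
from collections import defaultdict

def mutual_information(pxy: dict[tuple, float]) -> float:
    """I(X;Y) = D_KL[p(x,y) || p(x)p(y)] in bits, from a joint distribution."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in pxy.items():
        px[x] += p
        py[y] += p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in pxy.items() if p > 0)

# Binary symmetric channel with flip probability 0.1 and uniform input:
e = 0.1
pxy = {(0, 0): 0.5*(1-e), (0, 1): 0.5*e, (1, 0): 0.5*e, (1, 1): 0.5*(1-e)}
print(mutual_information(pxy))  # 1 - H2(0.1) ≈ 0.531 bits
```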
Part V
Towards Reality
Rate/Distortion Theory: Code Below Specifications

Reminder
Information is about sending messages. We considered the most compact codes over a given noiseless channel. Now consider the situation where either:
1. the channel is not noiseless, but has noisy characteristics p(x̂|x), or
2. we cannot afford to spend an average of H(X) bits per symbol to transmit.

Question
What happens? Total collapse of transmission?
Rate/Distortion Theory I: Distortion

"Compromise"
- no longer insist on perfect transmission
- accept a compromise: measure the distortion d(x, x̂) between the original x and the transmitted x̂
- small distortion good, large distortion "baaad"

Theorem: Rate Distortion Function
Given p(x) for the generation of symbols X,

    R(D) := min_{p(x̂|x): E[d(X,X̂)]=D} I(X; X̂)

where the mean is over p(x, x̂) = p(x̂|x) p(x).
Rate/Distortion Theory II: Distortion

[Figure: rate-distortion trade-off curve; vertical axis (rate) from 0 to about 1.8, horizontal axis (distortion) from 0 to 1; the curve decreases convexly towards zero rate.]
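R(D) can be computed numerically with the classical Blahut-Arimoto iteration. The sketch below is an illustration added here (the function name and the trade-off parameter β, used in nats, are my own choices): it traces one point of the curve for a binary source with Hamming distortion, where R(D) = 1 − H₂(D) is known in closed form:

```python
import math

def rate_distortion_point(p, d, beta, iters=500):
    """One point on the rate-distortion curve via Blahut-Arimoto.

    p: source distribution p(x); d[x][y]: distortion matrix; beta: trade-off
    parameter. Returns (rate in bits, expected distortion)."""
    n, m = len(p), len(d[0])
    q = [1.0 / m] * m                       # output marginal q(y)
    for _ in range(iters):
        cond = []                           # conditional p(y|x) ~ q(y) exp(-beta d(x,y))
        for x in range(n):
            w = [q[y] * math.exp(-beta * d[x][y]) for y in range(m)]
            z = sum(w)
            cond.append([wi / z for wi in w])
        q = [sum(p[x] * cond[x][y] for x in range(n)) for y in range(m)]
    rate = sum(p[x] * cond[x][y] * math.log2(cond[x][y] / q[y])
               for x in range(n) for y in range(m) if cond[x][y] > 0)
    dist = sum(p[x] * cond[x][y] * d[x][y] for x in range(n) for y in range(m))
    return rate, dist

# Binary source, Hamming distortion: R(D) = 1 - H2(D) for D <= 1/2.
p = [0.5, 0.5]
d = [[0.0, 1.0], [1.0, 0.0]]
rate, dist = rate_distortion_point(p, d, beta=3.0)
print(rate, dist)
```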
First Example: Infotaxis(Vergassola et al., 2007)
Applications
Thank You

- Information-theoretic PA-Loop Invariants, Empowerment: Alexander Klyubin
- Relevant Information: Chrystopher Nehaniv, Naftali Tishby, Thomas Martinetz, Jan Kim
- Digested Information: Christoph Salge
- Continuous Empowerment: Tobias Jung, Peter Stone, Christoph Salge, Cornelius Glackin
- Collective Empowerment: Philippe Capdepuy
- Collective Systems: Malte Harder
- World Structure, Graphs, Empowerment in Games: Tom Anthony
- Sensor Evolution, Information Distribution over the PA-Loop: Sander van Dijk, Alexandra Mark, Achim Liese
- Information Flow, PA-Loop Models: Nihat Ay
- Further Contributions: Mikhail Prokopenko, Lars Olsson, Philippe Capdepuy, Malte Harder, Simon McGregor
- Funding: EC (FEELIX GROWING, FP6), NSF, ONR, DARPA, FHA

This work was partially supported by FP7 ICT-270219.
Part VI
Crash Introduction
Modelling Cognition: Motivation from Biology

Question
Why/how did cognition evolve in biology?

Observations in biology
- sensors are often highly optimized:
  - detection of a few molecules (moths) (Dusenbery, 1992)
  - detection of few or individual photons (humans/toads) (Hecht et al., 1942; Baylor et al., 1979)
  - the auditory sense operates close to the thermal noise level (Denk and Webb, 1989)
- cognitive processing is very expensive (Laughlin et al., 1998; Laughlin, 2001)
Conclusions

Evidence
- sensors often operate at physical limits
- evolutionary pressure for high cognitive functions

But What For?
- close the cycle: actions matter
- "Entscheidend ist, was hinten rauskommt." ("What counts is what comes out at the end.")

Trade-Offs
- sharpening sensors, improving processing, boosting actuators
- "Was man nicht im Kopf hat, muss man in den Beinen haben." ("What you don't have in your head, you must have in your legs.")
Part VII
Information
Decisions, Decisions

Challenge
Linking sensors, processing and actuators.

The Physical and the Biological
- Physics: dynamical equations etc. are given and known (in principle)
- Biological Cognition: no established unique model; complex, difficult to untangle
- Robotic Cognition: many near-equivalent, incompatible solutions and architectures; often specific and hand-designed

Problem
Considerable arbitrariness in the treatment of cognition.
Idea

Issues
- uniform treatment of cognition
- distinguish essential from incidental aspects of computation

Proposal: "Covariant" Modelling of Computation
- Physics: observations may depend on the "coordinate system" for the same underlying phenomenon
- Cognition: computation may depend on the architecture, but essentially computes "the same concepts"

Bottom Line
A "coordinate-" (mechanism-)free view of cognition?
Landauer's Principle

Fundamental Limits for Information Processing
- on the lowest level, one cannot fully separate physics and information processing
- consequence: erasure of information from a "memory" creates heat
- this connects energy and information

[Diagram: the joint state (W_t, M_t) of world W_t and memory M_t evolving to (W_{t+1}, M_{t+1}).]
Informational Invariants: Beyond Physics

Law of Requisite Variety (Ashby, 1956; Touchette and Lloyd, 2000, 2004)
- Ashby: "only variety can destroy variety"
- extension by Touchette/Lloyd:
  - Open-Loop Controller: max. entropy reduction ΔH*_open
  - Closed-Loop Controller: max. entropy reduction ΔH_closed ≤ ΔH*_open + I(W_t; A_t)

[Diagram: the perception-action loop unrolled in time: ... W_{t−1} → W_t → W_{t+1} ..., with actions A_t acting on the world states.]
Informational Invariants: Scenario

Core Statement
- Task: consider e.g. a navigational task
- Informationally: reduction of the entropy of the initial (arbitrary) state

[Figure: example trajectories in the plane (x and y from −10 to 10) converging towards the center.]
Information Bookkeeping

Bayesian Network
[Diagram: the perception-action loop as a Bayesian network: world states W_t, sensor states S_t, memory states M_t and actions A_t, unrolled over time.]

Informational "Conservation Laws"
- Total sensor history: S⁽ᵗ⁾ = (S₀, S₁, ..., S_{t−1})
- Result:

    lim_{t→∞} I(S⁽ᵗ⁾; W₀) = H(W₀)

(Klyubin et al., 2007); see also (Ashby, 1956; Touchette and Lloyd, 2000, 2004)
Observations

Key Motto
There is no perpetuum mobile of the 3rd kind. (Actually, rather: there may be no free lunch, but sometimes there is free beer.)

Information Balance Sheet
- Task Invariant: H(W₀) determines the minimum information required to get to the center
- Task Variant: but it can be spread/concentrated differently over
  - time
  - environment and agents ("stigmergy")
  - sensors and memory
  (Klyubin et al., 2004a,b, 2007; van Dijk et al., 2010)
- Note: the invariance is purely entropic: indifferent to the task

Next Step
Refine towards specific tasks.
Information for Decision Making

Replace the gradient follower by a general policy π.

Dynamics
[Diagram: ... S_{t−1} → A_{t−1} → S_t → A_t → S_{t+1} ..., each action chosen by the policy π.]

Utility

    V^π(s) := E_π[R_t + R_{t+1} + ··· | s]
            = ∑_a π(a|s) ∑_{s'} P^a_{ss'} [R^a_{ss'} + V^π(s')]
A Parsimony Principle

Traditional MDP
- Task: find the best policy π*
- Traditional RL: does not consider decision costs
- Credo: information processing is expensive in biology! (Laughlin et al., 1998; Laughlin, 2001; Polani, 2009)
- Hypothesis: organisms trade off information-processing costs against task payoff (Tishby and Polani, 2011; Polani, 2009; Laughlin, 2001)
- Therefore: include the information cost and expand the MDP to an I-MDP (Polani et al., 2006; Tishby and Polani, 2011)

Principle of Information Parsimony
Minimize I(S; A) (the relevant information) at a fixed utility level.
Motto
It is a very sad thing that nowadays there is so little uselessinformation.
Oscar Wilde
Relevant Information and its Policies

Computation
Via the Lagrangian formalism (Stratonovich, 1965; Polani et al., 2006; Belavkin, 2008, 2009; Still and Precup, 2012; Saerens et al., 2009; Tishby and Polani, 2011), find:

    min_π ( I(S; A) − β E[V^π(S)] )

- β → ∞: the policy is optimal while informationally parsimonious!
- β finite: the policy is suboptimal at a fixed level of E[V^π(S)] while informationally parsimonious
- note: I(S; A) as well as V^π depend on π

Expectation
For higher utility, more relevant information is required, and vice versa.
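For a one-step decision problem, this trade-off can be computed with a Blahut-Arimoto-style iteration, alternating a Boltzmann-like policy with a re-estimated action marginal. The sketch below is a toy illustration added here (the two-state utility table is an arbitrary example, not the grid world of the talk):

```python
import math

def parsimonious_policy(ps, U, beta, iters=300):
    """Minimize I(S;A) - beta * E[U(S,A)] for a one-step decision problem.

    ps[s]: state distribution; U[s][a]: utility table.
    Returns (policy pi[s][a], I(S;A) in bits, E[U])."""
    nS, nA = len(ps), len(U[0])
    pa = [1.0 / nA] * nA                      # action marginal p(a)
    for _ in range(iters):
        pi = []                               # pi(a|s) ~ p(a) exp(beta U(s,a))
        for s in range(nS):
            w = [pa[a] * math.exp(beta * U[s][a]) for a in range(nA)]
            z = sum(w)
            pi.append([wi / z for wi in w])
        pa = [sum(ps[s] * pi[s][a] for s in range(nS)) for a in range(nA)]
    info = sum(ps[s] * pi[s][a] * math.log2(pi[s][a] / pa[a])
               for s in range(nS) for a in range(nA) if pi[s][a] > 0)
    eu = sum(ps[s] * pi[s][a] * U[s][a] for s in range(nS) for a in range(nA))
    return pi, info, eu

# Two states, two actions; each state has its own "right" action.
ps = [0.5, 0.5]
U = [[1.0, 0.0], [0.0, 1.0]]
for beta in (0.1, 2.0, 20.0):
    _, info, eu = parsimonious_policy(ps, U, beta)
    print(f"beta={beta}: I(S;A)={info:.3f} bits, E[U]={eu:.3f}")
```

As β grows, both the relevant information and the achieved utility increase, tracing out the expected trade-off curve.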
Experiments

Scenario
Define (P^a_{ss'}, R^a_{ss'}) by:
- States: grid world
- Actions: north, east, south, west
- Reward: each action produces a "reward" of −1 until the goal is reached

Experiment
Trade off utility against relevant information.

Question
What form does the expected trade-off take?

[Figure: grid world with two goal locations, A and B.]
Experiment — Find the Corner

[Figure: trade-off curves of E[Q(S,A)] (from −50 to 0) against I(S;A) (0 to 1.2 bits) for goals A and B.]

Optimal Case
- goal B has higher utility than A
- but needs a lot more information per step

Suboptimal Case
- goal B is much worse than goal A
- for the same information cost
Experiment — With a Twist I

Experiment Revisited
- grid world again
- consider only goal A
- cost as before

The "Twist" (Polani, 2011)
- permute the directions north, east, south, west!
- a random, fixed permutation σ_s of the directions for each state s
- replace (P^a_{ss'}, R^a_{ss'}) by (P̃^a_{ss'}, R̃^a_{ss'}), where

    P̃^a_{ss'} := P^{σ_s(a)}_{ss'}
    R̃^a_{ss'} := R^{σ_s(a)}_{ss'}
Experiment — With a Twist II
Expectation
as a traditional MDP, the “twisted” MDP (P̃^a_{ss′}, R̃^a_{ss′}) remains exactly equivalent:

same optimal values: Ṽ*(s) = V*(s), s ∈ S
same optimal policy after undoing the twist; pre-/post-twist policies equivalent via

Q̃_π̃(s, a) = Q_π(s, σ_s(a))
π̃(s, a) = π(s, σ_s(a))
And as I-MDP?
Experiment With a Twist: Uh-Oh!
[plot: E[V(S)] (−50 to 0) versus I(S;A) (0 to 1.2), original vs. twisted MDP]
Optimal Case

sanity check: utility same for original and twisted
but the latter needs a lot more information per step

Suboptimal Case

twisted MDP becomes much worse than original
at the same information cost
Intermediate Conclusions
Insights
as traditional MDP both experiments fully equivalent
as I-MDP, however . . .
significant difference between
  agent “taking actions with it” and
  having “realigned” set of actions at each step

embodiment allows offloading informational effort (e.g. Paul, 2006; Pfeifer and Bongard, 2007)
Part VIII
Goal-Relevant Information
Towards Multiple Goals
Extension
assume family of tasks (e.g. multiple goals)
action now depends on both state and goals
[diagram: Bayesian network over S_{t−1}, A_{t−1}, S_t, A_t, S_{t+1}, with goal variable G influencing the actions]
Goal-Relevant Information
I(G; A_t | s_t) = H(A_t | s_t) − H(A_t | G, s_t)
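For a single state s this is an ordinary conditional mutual information and can be computed directly from a joint table p(g, a | s); a minimal sketch (the function name and toy table are illustrative):

```python
import numpy as np

def goal_relevant_information(p_ga):
    """I(G; A | s) = H(A | s) - H(A | G, s) in bits, for one fixed state s.

    p_ga: (n_goals, n_actions) array, the joint distribution p(g, a | s).
    """
    p_ga = np.asarray(p_ga, dtype=float)
    p_a = p_ga.sum(axis=0)                        # p(a | s)
    h_a = -np.sum(p_a[p_a > 0] * np.log2(p_a[p_a > 0]))
    p_g = p_ga.sum(axis=1)                        # p(g | s)
    p_a_given_g = np.divide(p_ga, p_g[:, None],
                            out=np.zeros_like(p_ga),
                            where=p_g[:, None] > 0)
    mask = p_ga > 0
    h_a_given_g = -np.sum(p_ga[mask] * np.log2(p_a_given_g[mask]))
    return h_a - h_a_given_g

# two goals, two actions: the chosen action fully reveals the goal
p = [[0.5, 0.0],
     [0.0, 0.5]]
```

For the table above the result is exactly 1 bit: in this state, acting requires reading off the full goal identity; a goal-independent policy would need 0 bits.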
Towards Multiple Goals

Goal-Relevant Information (Regularized)

min_{π(a_t | s_t, g)} ( I(G; A_t | S_t) − β E[V_π(S_t, G, A_t)] )
Goal-Relevant Information
I(G; A_t | s_t)
Goal-Relevant and Sensor Information Trade-Offs

[plot: I(S_t; A_t | G) (0 to 0.6) versus I(G; A_t | S_t) (0 to 1.6), trade-off curves from α = 0 to α = 1]

Lagrangian

min_{π(a_t | s_t, g)} [ (1 − α) I(G; A_t | S_t) + α I(S_t; A_t | G) − β E[V_π(S_t, G, A_t)] ]
Information “Caching”
Note
not only the how much of goal-relevant information matters
but also the which
Consider
Accessible History (Context): e.g. A^{t−1} = (A_0, A_1, . . . , A_{t−1})

“Cache Fetch”: new goal-relevant information not already used

I(A_t; G | A^{t−1}) = H(A_t | A^{t−1}) − H(A_t | G, A^{t−1})
Subgoals
I(A_t; G | A^{t−1}, s): new goal information
I(A^{t−1}; G | A_t, s): discarded goal information

(van Dijk and Polani, 2011; van Dijk and Polani, 2013)

Psychological Connections?

Crossing doors causes forgetting (see also Radvansky et al., 2011)
Efficient Relevant Goal Information (van Dijk and Polani, 2013)
“Most Efficient” Goal
G −→ G1 −→ A ←− S

min_{p(g1 | g)} I(G; G1) subject to I(G1; A | S) ≥ C
[Fig. 8 (van Dijk and Polani, 2013): goal clusters induced by the bottleneck G1 on the primary goal-information pathway in a 6-room grid-world navigation task, shown for increasing cardinality |G1| = 3, 4, 5, 6. With the lower bound fixed as high as possible, a hard clustering emerges: each goal is deterministically mapped to a single element of G1, goal states in the same cluster are directly connected in the transition graph of the MDP, and the clustering largely adheres to the physical room boundaries.]
Making State Predictive for Actions
“Most Enhancive” Goal
G −→ G2 −→ A ←− S

min_{p(g2 | g)} I(G; G2) subject to I(S; A | G2) ≥ C
[Fig. 10 (van Dijk and Polani, 2013): goal clusters induced by the bottleneck G2 on the secondary, state-information-modulating goal-information pathway in a 9-room grid-world navigation task, shown for increasing cardinality of the bottleneck variable. The global relations between states and goals are strongly determined by the set of available actions: knowing, e.g., that the goal lies in the ‘north-eastern’ subset lets the agent use state knowledge to distinguish whether the goal is likely to the north or to the west, a distinction that is only relevant because the agent has distinct actions defining these directions. Differently defined actions would induce other relations with goals, and likely a different factorization as goal-based frame of reference.]
Making State Predictive for Actions

“Most Enhancive” Goal

G −→ G2 −→ A ←− S

min_{p(g2 | g)} I(G; G2) subject to I(S; A | G2) ≥ C

Insights

“spillover” ignoring local boundaries
action information induces global “frame of reference”
depends on action consistency

[Fig. 10 (van Dijk and Polani, 2013), as on the previous slide]
Part IX
Empowerment: Motivation
Universal Utilities
Problems

in biology, the success criterion is survival
the concepts of a “task” and a “reward” are not sharp
the “search space” is too large for full-fledged success feedback
pure Darwinism: feedback by death, and this is very sparse

Notes

Homeostasis: provides dense networks to guide living beings
Problem:
  specific to particular organisms
  designed on a case-by-case basis for artificial agents
  is there a more generalizable perspective, in view of the success of evolution?
Idea
Universal Drives and Utilities
Core Idea: adaptational feedback should be dense and rich
artificial curiosity, learning progress, autotelic principle, intrinsic reward (Schmidhuber, 1991; Kaplan and Oudeyer, 2004; Steels, 2004; Singh et al., 2005)

homeokinesis, and predictive information (Der, 2001; Ay et al., 2008)

Physical Principle:

causal entropic forcing (Wissner-Gross and Freer, 2013)
Present Ansatz
Use Embodiment
optimize informational fit into the sensorimotor niche
maximization of the potential
  to inject information into the environment (via actuators)
  and recapture it from the environment (via sensors)
(Klyubin et al., 2005a,b; Nehaniv et al., 2007; Klyubin et al., 2008)
Here: Empowerment
Motto
“Being in control of one’s destiny
and knowing it
is good.” (Jung et al., 2011)
More Precisely
information-theoretic version of
controllability (being in control of destiny)
observability (knowing about it)
combined
Formalism

Bayesian Network

[diagram: perception–action loop unrolled in time, with world states W_t, sensor states S_t, memory states M_t and actions A_t, for t−3 … t+2]

(Klyubin et al., 2005a,b; Nehaniv et al., 2007; Klyubin et al., 2008)
Formalism

Bayesian Network

[diagram: memoryless perception–action loop, with world states W_t, sensor states S_t and actions A_t, for t−3 … t+2]

(Klyubin et al., 2005a,b; Nehaniv et al., 2007; Klyubin et al., 2008)
Formalism

Bayesian Network

[diagram as above, with the k actions A_{t−k}, …, A_{t−1} treated as free choices]

“Free Will” Actions

Empowerment: Formal Definition

E^(k) := max_{p(a_{t−k}, a_{t−k+1}, …, a_{t−1})} I(A_{t−k}, A_{t−k+1}, …, A_{t−1}; S_t)

(Klyubin et al., 2005a,b; Nehaniv et al., 2007; Klyubin et al., 2008)
Formalism

Bayesian Network

[diagram as above]

“Free Will” Actions

Empowerment: Formal Definition

E^(k)(w_{t−k}) := max_{p(a_{t−k}, a_{t−k+1}, …, a_{t−1} | w_{t−k})} I(A_{t−k}, A_{t−k+1}, …, A_{t−1}; S_t | w_{t−k})

(Klyubin et al., 2005a,b; Nehaniv et al., 2007; Klyubin et al., 2008)
Formalism

Bayesian Network

[diagram as above]

“Free Will” Actions

Empowerment: Formal Definition

E^(k)(w_{t−k}) := max_{p(a^(k)_{t−k} | w_{t−k})} I(A^(k)_{t−k}; S_t | w_{t−k})

(Klyubin et al., 2005a,b; Nehaniv et al., 2007; Klyubin et al., 2008)
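This maximization is a channel capacity over the k-step action channel p(s_t | a^(k), w). For finite alphabets it can therefore be computed with the standard Blahut-Arimoto iteration; a minimal sketch under that assumption (function name illustrative):

```python
import numpy as np

def empowerment_bits(p_s_given_a, iters=300):
    """Channel capacity max_{p(a)} I(A; S) in bits, via Blahut-Arimoto.

    p_s_given_a: (n_action_sequences, n_states) array whose rows are the
    channel p(s_t | a^(k), w). The result is E^(k)(w) for that state.
    """
    p = np.asarray(p_s_given_a, dtype=float)
    n_a = p.shape[0]
    q = np.full(n_a, 1.0 / n_a)              # distribution over action sequences
    for _ in range(iters):
        r = q @ p                            # output marginal p(s)
        logratio = np.where(p > 0, np.log(p / r), 0.0)
        d = np.exp((p * logratio).sum(axis=1))   # exp(KL(p(.|a) || r))
        q = q * d
        q /= q.sum()
    r = q @ p
    logratio2 = np.where(p > 0, np.log2(p / r), 0.0)
    return float((q[:, None] * p * logratio2).sum())

# noiseless channel: 4 action sequences, 4 distinguishable end states -> 2 bits
p = np.eye(4)
```

Two action sequences that always lead to the same state are redundant and do not add empowerment: a 4-input channel with only 3 distinguishable outcomes yields log2(3) ≈ 1.58 bits.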
Empowerment — a Universal Utility
Notes
Empowerment E(k)(w) defined
given horizon k, i.e. local
given starting state w (or context, for POMDPs)
i.e. empowerment is a function of state, a “utility”
However
only defined by world dynamics
no reward function assumed
Empowerment — Notes
Properties of Empowerment
want to maximize potential information flow that
  could be injected through the actuators into the environment
  and recaptured by the sensors in the future

potential influence on the environment which is detectable through the agent’s sensors
determined by the embodiment P^a_{ss′} only
no external reward R^a_{ss′}

Bottom Line

information-theoretic controllability/observability
informational efficiency of the sensorimotor niche
Other Interpretations
Related Concepts
mobility
money
affordance
graph centrality (Anthony et al., 2008)

antithesis to “helplessness” (Seligman and Maier, 1967; Overmier and Seligman, 1967)
Think Strategic
Tactics is what you do when you have a plan
Strategy is what you do when you haven’t
Part X
First Examples
Maze Empowerment
[figure: four maze configurations (maze vs. average-distance view), with empowerment ranges E ∈ [1, 2.32], E ∈ [1.58, 3.70], E ∈ [3.46, 5.52], E ∈ [4.50, 6.41]]
Empowerment vs. Average Distance
[scatter plot of empowerment E (≈ 4.5 to 6.0) against average distance d (≈ 6 to 16)]
Box Pushing
                         stationary box      pushable box
box invisible to agent   E ∈ [5.86, 5.93]    E = log2 61 ≈ 5.93 bit
box visible to agent     E ∈ [5.86, 5.93]    E ∈ [5.93, 7.79]
In the Continuum: Pendulum Swing-up Task w/o Reward (Jung et al., 2011)
Dynamics
pendulum (length l = 1, mass m = 1, grav g = 9.81, friction µ = 0.05)
φ̈(t) = ( −µ φ̇(t) + m g l sin(φ(t)) + u(t) ) / (m l²)

with state s_t = (φ(t), φ̇(t)) and continuous control u ∈ [−5, 5]
system time discretized to ∆ = 0.05 sec
actions discretized to u ∈ {−5, −2.5, 0, +2.5, +5}
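A sketch of this dynamics with a plain Euler step; this is only a rough stand-in, the integrator used in the paper may well differ:

```python
import math

# pendulum parameters from the slide
M, L, G_GRAV, MU = 1.0, 1.0, 9.81, 0.05
DT = 0.05                                   # time discretization (sec)
ACTIONS = (-5.0, -2.5, 0.0, 2.5, 5.0)       # discretized torques

def step(phi, phi_dot, u, dt=DT):
    """One Euler step of phi'' = (-mu*phi' + m*g*l*sin(phi) + u) / (m*l^2)."""
    phi_ddot = (-MU * phi_dot + M * G_GRAV * L * math.sin(phi) + u) / (M * L ** 2)
    return phi + dt * phi_dot, phi_dot + dt * phi_ddot
```

With this sign convention φ = 0 is an unstable equilibrium: a small perturbation grows under u = 0, which together with the weak available torques is what makes the swing-up task non-trivial.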
Goal

To provide this system with some matching purpose, consider the pendulum swing-up task

Comparison

empowerment-based control
traditional optimal control

The optimal control problem is solved by approximate dynamic programming on a high-resolution grid.
Results: Performance
[two time-series plots over 0–10 sec, curves for phi and phidot: performance of the optimal policy (FVI+KNN on a 1000×1000 grid) vs. performance of the maximally empowered policy (3-step)]

Phase plot of φ and φ̇ when following the respective greedy policy from the last slide. Note that for φ, the y-axis shows the height of the pendulum (+1 means upright, the goal state).
Results: “Explored” Space
[phase-space plot, φ ∈ [−π, π] rad, φ′ ∈ [−6, 6] rad/s: states visited under empowerment-based exploration, coloured by the chosen action (Action 0 … Action 4)]
Empowerment: Acrobot (Jung et al., 2011)
Setting
two-link pendulum
actuation in the hip only

Idea

Add LQR control to bang-bang control
Acrobot: Demo
Block’s World (Salge, 2013)
Properties
scenario with modifiable world
deterministic (i.e. empowerment is log₂ n, where n is the number of reachable states in horizon k)
agent can incorporate, place, destroy blocks and move
estimated via (highly incomplete) sampling
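In the deterministic case this reduces to counting reachable states, since distinct reachable end states are perfectly distinguishable outcomes of the k-step action channel. A small sketch (the `transitions` signature and the toy 1-D world are illustrative):

```python
import math

def deterministic_empowerment(start, transitions, actions, k):
    """E^(k) = log2(n) bits, where n is the number of states reachable
    from `start` with k-step action sequences in a deterministic world."""
    frontier = {start}
    for _ in range(k):
        frontier = {transitions(s, a) for s in frontier for a in actions}
    return math.log2(len(frontier))

# toy 1-D world: move left/right or stay on an unbounded line of cells
trans = lambda s, a: s + a
e = deterministic_empowerment(0, trans, actions=(-1, 0, 1), k=2)
# 5 reachable cells {-2, ..., 2} -> log2(5) ≈ 2.32 bits
```

For a large modifiable world the full reachable set is intractable, hence the (highly incomplete) sampling estimate mentioned above.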
Empowered “Minecrafter”
(Salge, 2013)
Explorer Accompanying Robot (Glackin et al., 2015)
Consortium
Demonstrator II
Part XI
References
Anthony, T., Polani, D., and Nehaniv, C. L. (2008). On preferred states of agents: how global structure is reflected in local structure. In Bullock, S., Noble, J., Watson, R., and Bedau, M. A., editors, Artificial Life XI: Proceedings of the Eleventh International Conference on the Simulation and Synthesis of Living Systems, Winchester, 5–8 Aug., pages 25–32. MIT Press, Cambridge, MA.

Ashby, W. R. (1956). An Introduction to Cybernetics. Chapman & Hall Ltd.

Ay, N., Bertschinger, N., Der, R., Güttler, F., and Olbrich, E. (2008). Predictive information and explorative behavior of autonomous robots. European Journal of Physics B, 63:329–339.

Baylor, D., Lamb, T., and Yau, K. (1979). Response of retinal rods to single photons. Journal of Physiology, London, 288:613–634.

Belavkin, R. (2008). The duality of utility and information in optimally learning systems. In Proc. 7th IEEE International Conference on ‘Cybernetic Intelligent Systems’. IEEE Press.
Belavkin, R. V. (2009). Bounds of optimal learning. In Adaptive Dynamic Programming and Reinforcement Learning, 2009. ADPRL ’09. IEEE Symposium on, pages 199–204. IEEE.

Biehl, M. (2013). Kullback-Leibler and Bayes. Internal memo.

Cover, T. M. and Thomas, J. A. (2006). Elements of Information Theory. Wiley, 2nd edition.

Denk, W. and Webb, W. W. (1989). Thermal-noise-limited transduction observed in mechanosensory receptors of the inner ear. Phys. Rev. Lett., 63(2):207–210.

Der, R. (2001). Self-organized acquisition of situated behavior. Theory Biosci., 120:1–9.

Dusenbery, D. B. (1992). Sensory Ecology. W. H. Freeman and Company, New York.

Glackin, C., Salge, C., Trendafilov, D., Greaves, M., Polani, D., Leu, A., Haque, S. J. U., Slavnic, S., and Ristic-Durrant, D. (2015). An information-theoretic intrinsic motivation model for robot navigation and path planning.
Hecht, S., Schlaer, S., and Pirenne, M. (1942). Energy, quanta and vision. Journal of the Optical Society of America, 38:196–208.

Jung, T., Polani, D., and Stone, P. (2011). Empowerment for continuous agent–environment systems. Adaptive Behaviour, 19(1):16–39. Published online 13 January 2011.

Kaplan, F. and Oudeyer, P.-Y. (2004). Maximizing learning progress: an internal reward system for development. In Iida, F., Pfeifer, R., Steels, L., and Kuniyoshi, Y., editors, Embodied Artificial Intelligence, volume 3139 of LNAI, pages 259–270. Springer.

Klyubin, A., Polani, D., and Nehaniv, C. (2007). Representations of space and time in the maximization of information flow in the perception-action loop. Neural Computation, 19(9):2387–2432.

Klyubin, A. S., Polani, D., and Nehaniv, C. L. (2004a). Organization of the information flow in the perception-action loop of evolved agents. In Proceedings of 2004 NASA/DoD Conference on Evolvable Hardware, pages 177–180. IEEE Computer Society.
Klyubin, A. S., Polani, D., and Nehaniv, C. L. (2004b). Tracking information flow through the environment: Simple cases of stigmergy. In Pollack, J., Bedau, M., Husbands, P., Ikegami, T., and Watson, R. A., editors, Artificial Life IX: Proceedings of the Ninth International Conference on Artificial Life, pages 563–568. MIT Press.

Klyubin, A. S., Polani, D., and Nehaniv, C. L. (2005a). All else being equal be empowered. In Advances in Artificial Life, European Conference on Artificial Life (ECAL 2005), volume 3630 of LNAI, pages 744–753. Springer.

Klyubin, A. S., Polani, D., and Nehaniv, C. L. (2005b). Empowerment: A universal agent-centric measure of control. In Proc. IEEE Congress on Evolutionary Computation, 2–5 September 2005, Edinburgh, Scotland (CEC 2005), pages 128–135. IEEE.

Klyubin, A. S., Polani, D., and Nehaniv, C. L. (2008). Keep your options open: An information-based driving principle for sensorimotor systems. PLoS ONE, 3(12):e4018.
Laughlin, S. B. (2001). Energy as a constraint on the coding and processing of sensory information. Current Opinion in Neurobiology, 11:475–480.

Laughlin, S. B., de Ruyter van Steveninck, R. R., and Anderson, J. C. (1998). The metabolic cost of neural information. Nature Neuroscience, 1(1):36–41.

Nehaniv, C. L., Polani, D., Olsson, L. A., and Klyubin, A. S. (2007). Information-theoretic modeling of sensory ecology: Channels of organism-specific meaningful information. In Laubichler, M. D. and Müller, G. B., editors, Modeling Biology: Structures, Behaviour, Evolution, The Vienna Series in Theoretical Biology, pages 241–281. MIT Press.

Overmier, J. B. and Seligman, M. E. P. (1967). Effects of inescapable shock upon subsequent escape and avoidance responding. Journal of Comparative and Physiological Psychology, 63:28–33.

Paul, C. (2006). Morphological computation: A basis for the analysis of morphology and control requirements. Robotics and Autonomous Systems, 54(8):619–630.
Pfeifer, R. and Bongard, J. (2007). How the Body Shapes the Way We Think: A New View of Intelligence. Bradford Books.

Polani, D. (2009). Information: Currency of life? HFSP Journal, 3(5):307–316.

Polani, D. (2011). An informational perspective on how the embodiment can relieve cognitive burden. In Proc. IEEE Symposium Series in Computational Intelligence 2011 – Symposium on Artificial Life, pages 78–85. IEEE.

Polani, D., Nehaniv, C., Martinetz, T., and Kim, J. T. (2006). Relevant information in optimized persistence vs. progeny strategies. In Rocha, L. M., Bedau, M., Floreano, D., Goldstone, R., Vespignani, A., and Yaeger, L., editors, Proc. Artificial Life X, pages 337–343.

Radvansky, G. A., Krawietz, S. A., and Tamplin, A. K. (2011). Walking through doorways causes forgetting: Further explorations. The Quarterly Journal of Experimental Psychology, 64(8):1632–1645.
Saerens, M., Achbany, Y., Fouss, F., and Yen, L. (2009). Randomized shortest-path problems: Two related models. Neural Computation, 21:2363–2404.

Salge, C. (2013). Block’s world. Presented at GSO 2013.

Schmidhuber, J. (1991). A possibility for implementing curiosity and boredom in model-building neural controllers. In Meyer, J. A. and Wilson, S. W., editors, Proc. of the International Conference on Simulation of Adaptive Behavior: From Animals to Animats, pages 222–227. MIT Press/Bradford Books.

Seligman, M. E. P. and Maier, S. F. (1967). Failure to escape traumatic shock. Journal of Experimental Psychology, 74:1–9.

Singh, S., Barto, A. G., and Chentanez, N. (2005). Intrinsically motivated reinforcement learning. In Proceedings of the 18th Annual Conference on Neural Information Processing Systems (NIPS), Vancouver, B.C., Canada.

Steels, L. (2004). The autotelic principle. In Iida, F., Pfeifer, R., Steels, L., and Kuniyoshi, Y., editors, Embodied Artificial Intelligence: Dagstuhl Castle, Germany, July 7–11, 2003, volume 3139 of Lecture Notes in AI, pages 231–242. Springer Verlag, Berlin.
Still, S. and Precup, D. (2012). An information-theoretic approach to curiosity-driven reinforcement learning. Theory in Biosciences, 131(3):139–148.

Stratonovich, R. (1965). On value of information. Izvestiya of USSR Academy of Sciences, Technical Cybernetics, 5:3–12.

Tishby, N. and Polani, D. (2011). Information theory of decisions and actions. In Cutsuridis, V., Hussain, A., and Taylor, J., editors, Perception-Action Cycle: Models, Architecture and Hardware, pages 601–636. Springer.

Touchette, H. and Lloyd, S. (2000). Information-theoretic limits of control. Phys. Rev. Lett., 84:1156.

Touchette, H. and Lloyd, S. (2004). Information-theoretic approach to the study of control systems. Physica A, 331:140–172.

van Dijk, S. and Polani, D. (2011). Grounding subgoals in information transitions. In Proc. IEEE Symposium Series in Computational Intelligence 2011 – Symposium on Adaptive Dynamic Programming and Reinforcement Learning, pages 105–111. IEEE.
van Dijk, S. and Polani, D. (2013). Informational constraints-driven organization in goal-directed behavior. Advances in Complex Systems, 16(2–3). Published online 30 April 2013, DOI: 10.1142/S0219525913500161.

van Dijk, S. G., Polani, D., and Nehaniv, C. L. (2010). What do you want to do today? Relevant-information bookkeeping in goal-oriented behaviour. In Proc. Artificial Life, Odense, Denmark, pages 176–183.

Vergassola, M., Villermaux, E., and Shraiman, B. I. (2007). ‘Infotaxis’ as a strategy for searching without gradients. Nature, 445:406–409.

Wissner-Gross, A. D. and Freer, C. E. (2013). Causal entropic forcing. Physical Review Letters, 110:168702.