
Page 1

Information Theory in Intelligent Decision Making

Daniel Polani

Adaptive Systems and Algorithms Research Groups
School of Computer Science
University of Hertfordshire, United Kingdom

March 5, 2015

Daniel Polani Information Theory in Intelligent Decision Making

Page 2

Information Theory in Intelligent Decision Making

The Theory

Daniel Polani

Adaptive Systems and Algorithms Research Groups
School of Computer Science
University of Hertfordshire, United Kingdom

March 5, 2015

Page 4

Motivation

Artificial Intelligence

- modelling cognition in humans
- realizing human-level “intelligent” behaviour in machines (just performance: not necessarily imitating the biological substrate)
- a jumble of various ideas to get the above points working

Question

Is there a joint way of understanding cognition?

Probability

- we have probability theory as a theory of uncertainty
- we have information theory for endowing probability with a sense of “metrics”

Page 5

Random Variables

Def.: Event Space

Consider an event space Ω = {ω1, ω2, …}, finite or countably infinite, with a (probability) measure PΩ : Ω → [0, 1] s.t. ∑_ω PΩ(ω) = 1. The ω are called events.

Random Variable

A random variable X is a map X : Ω → 𝒳 with some outcome space 𝒳 = {x1, x2, …} and induced probability measure PX(x) = PΩ(X⁻¹(x)). We also write instead

    PX(x) ≡ P(X = x) ≡ p(x) .
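As a minimal sketch (the event space, measure and map below are illustrative, not from the slides), the induced measure PX can be computed by summing PΩ over the preimage X⁻¹(x):

```python
# Sketch: induced probability measure P_X(x) = P_Omega(X^{-1}(x)).
# Event space and measure are illustrative examples.
from collections import defaultdict

P_omega = {"w1": 0.25, "w2": 0.25, "w3": 0.5}   # P_Omega on Omega
X_map   = {"w1": "a", "w2": "a", "w3": "b"}      # random variable X: Omega -> {a, b}

def induced_measure(P_omega, X_map):
    """Push P_Omega forward through X: sum over the preimage of each outcome."""
    P_X = defaultdict(float)
    for omega, p in P_omega.items():
        P_X[X_map[omega]] += p
    return dict(P_X)

P_X = induced_measure(P_omega, X_map)
# P_X == {"a": 0.5, "b": 0.5}
```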

Page 6

Neyman-Pearson Lemma I

Lemma

Consider observations x1, x2, …, xn of a random variable X and two potential hypotheses (distributions) p1 and p2 they could have been based upon.

Consider the test for hypothesis p1 to be given as (x1, x2, …, xn) ∈ A where

    A = { x = (x′1, x′2, …, x′n) | p1(x′1, x′2, …, x′n) / p2(x′1, x′2, …, x′n) ≥ C }

with some C ∈ ℝ₊.

Assume the rate α of false negatives p1(Aᶜ) to be given (generated by p1, but not in A). If β is the rate of false positives p2(A), then: any test with false negative rate α′ ≤ α has false positive rate β′ ≥ β.

(Cover and Thomas, 2006)

Page 7

Neyman-Pearson Lemma II

Proof (Cover and Thomas, 2006)

Let A be as above and B some other acceptance region; let χA and χB be the indicator functions. Then for all x:

    [χA(x) − χB(x)] · [p1(x) − C p2(x)] ≥ 0 .

Multiplying out and integrating:

    0 ≤ ∑_A (p1 − C p2) − ∑_B (p1 − C p2)
      = (1 − α) − Cβ − (1 − α′) + Cβ′
      = C(β′ − β) − (α − α′)

Page 10

Neyman-Pearson Lemma V

Consideration

- assume events x i.i.d.
- test becomes:

      ∏_i p1(xi) / p2(xi) ≥ C

- logarithmize:

      ∑_i log [p1(xi) / p2(xi)] ≥ κ  (κ := log C)

Note: Kullback-Leibler Divergence

Average “evidence” growth per sample:

    DKL(p1||p2) = E_{p1}[ log (p1(X)/p2(X)) ] = ∑_{x∈𝒳} p1(x) log [p1(x)/p2(x)]
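The plots on the following slides show the summed log-likelihood ratio growing linearly with slope DKL. A small simulation sketches this (Bernoulli hypotheses with illustrative parameters matching the plotted pairs, e.g. Bernoulli(0.4) vs Bernoulli(0.6); seeded for reproducibility):

```python
# Sketch: the summed log-likelihood ratio grows ~ n * DKL(p1 || p2)
# when the data really come from p1 (Bernoulli hypotheses assumed here).
import math
import random

def dkl_bernoulli(a, b):
    """D_KL(Bern(a) || Bern(b)) in bits."""
    return a * math.log2(a / b) + (1 - a) * math.log2((1 - a) / (1 - b))

def llr_sum(n, p1, p2, rng):
    """Sum of per-sample log-likelihood ratios for n draws from Bern(p1)."""
    total = 0.0
    for _ in range(n):
        x = 1 if rng.random() < p1 else 0
        q1 = p1 if x else 1 - p1
        q2 = p2 if x else 1 - p2
        total += math.log2(q1 / q2)
    return total

rng = random.Random(0)
n = 10_000
slope = llr_sum(n, 0.4, 0.6, rng) / n
# empirical slope should be close to DKL(Bern(0.4) || Bern(0.6))
print(slope, dkl_bernoulli(0.4, 0.6))
```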

Page 11

Neyman-Pearson Lemma VI

[Figure: summed log-likelihood ratio (“log sum”) vs. number of samples (0–10000) for the hypothesis pairs “0.40_vs_0.60.dat”, “0.50_vs_0.60.dat”, “0.55_vs_0.60.dat”.]

Page 12

Neyman-Pearson Lemma VII

[Figure: same plot as before, with the lines dkl_04·x, dkl_05·x and dkl_055·x overlaid: the empirical “log sum” curves grow with slope DKL.]

Page 13

Part I

Information Theory — Motivation

Page 23

Structural Motivation
Intrinsic Pathways to Information Theory

[Diagram: “Information Theory” at the centre, linked to AI and to the intrinsic pathways: optimal communication, Shannon axioms, physical entropy, Laplace’s principle, typicality theory, optimal Bayes, Rate Distortion, information geometry.]

Page 24

Optimal Communication

Codes

- task: send messages (disambiguate states) from sender to receiver
- consider self-delimiting codes (without an extra delimiting character)
- simple example: prefix codes

Def.: Prefix Codes

codes in which no codeword is a prefix of another codeword

Page 25

Prefix Codes

[Figure: binary code tree; branching corresponds to appending 0 or 1, and codewords sit at the leaves, so no codeword is a prefix of another.]

Page 26

Kraft Inequality

Theorem

Assume events x ∈ 𝒳 = {x1, x2, … xk} are coded using prefix codewords over an alphabet of size b = |B|, with lengths l1, l2, …, lk for the respective events. Then one has

    ∑_{i=1}^{k} b^{−li} ≤ 1 .

Proof Sketch (Cover and Thomas, 2006)

Let lmax be the length of the longest codeword. Expand the tree fully to level lmax. Fully expanded leaves are either: 1. codewords; 2. descendants of codewords; 3. neither. A codeword of length li has b^{lmax−li} full-tree descendants, which must be different for the different codewords, and there cannot be more than b^{lmax} in total. Hence

    ∑ b^{lmax−li} ≤ b^{lmax}

and dividing by b^{lmax} gives the claim.

Remark

The converse also holds.
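A quick numeric sketch (illustrative codewords, plain Python) of both sides of the statement: checking the prefix property and evaluating the Kraft sum:

```python
# Sketch: Kraft inequality for a binary (b = 2) prefix code.
def is_prefix_code(words):
    """True iff no codeword is a prefix of another codeword."""
    return not any(a != b and b.startswith(a) for a in words for b in words)

def kraft_sum(words, b=2):
    """Sum of b^{-l_i} over the codeword lengths."""
    return sum(b ** -len(w) for w in words)

code = ["0", "10", "110", "111"]          # illustrative prefix code
assert is_prefix_code(code)
print(kraft_sum(code))                     # 1/2 + 1/4 + 1/8 + 1/8 = 1.0

bad = ["0", "01"]                          # "0" is a prefix of "01"
assert not is_prefix_code(bad)
```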

Page 27

Considerations — Most Compact Code

Assume

We want to code a stream of events x ∈ 𝒳 appearing with probability p(x).

Note

1. try to make the li as small as possible
2. i.e. make b^{−li} as large as possible
3. limited by the Kraft inequality; ideally it becomes an equality

       ∑_i b^{−li} = 1

   (as the li are integers, that is typically not exact)

Minimize

Average code length E[L] = ∑_i p(xi) li under the constraint ∑_i b^{−li} != 1.

Result

Differentiating the Lagrangian

    ∑_i p(xi) li + λ ∑_i b^{−li}

w.r.t. li gives the codeword lengths for the “shortest” code:

    li = − log_b p(xi)

Average Codeword Length

    ∑_i p(xi) · li = − ∑_x p(x) log p(x)

In the following, assume binary log.
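As a sketch (assuming binary log and the usual integer rounding to ceil(−log2 p), as in Shannon coding; the distribution is illustrative), the resulting lengths satisfy the Kraft inequality and the average length matches the entropy when all probabilities are powers of two:

```python
# Sketch: integer codeword lengths l_i = ceil(-log2 p_i) satisfy the Kraft
# inequality, and the average length lies within 1 bit of the entropy H(p).
import math

p = [0.5, 0.25, 0.125, 0.0625, 0.0625]     # illustrative distribution

entropy = -sum(q * math.log2(q) for q in p)
lengths = [math.ceil(-math.log2(q)) for q in p]

kraft = sum(2 ** -l for l in lengths)
avg_len = sum(q * l for q, l in zip(p, lengths))

print(entropy, avg_len, kraft)
# Here every p_i is a power of 2, so avg_len == entropy and kraft == 1.
```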

Page 31

Entropy

Def.: Entropy

Consider the random variable X. Then the entropy H(X) of X is defined as

    H(X) [≡ H(p)] := − ∑_x p(x) log p(x)

with the convention 0 log 0 ≡ 0.

Interpretations

- average optimal codeword length
- uncertainty (about the next sample of X)
- physical entropy
- much more …

Quote

“Why don’t you call it entropy. In the first place, a mathematical development very much like yours already exists in Boltzmann’s statistical mechanics, and in the second place, no one understands entropy very well, so in any discussion you will be in a position of advantage.”

John von Neumann

Page 32

Meditation

Probability/Code Mismatch

Consider events x following a probability p(x), with a modeler mistakenly assuming probability q(x) and hence optimal code lengths − log q(x). Then the “code length waste per symbol” is given by

    − ∑_x p(x) log q(x) + ∑_x p(x) log p(x) = ∑_x p(x) log [p(x)/q(x)] = DKL(p||q)
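A numeric sketch of this identity (illustrative distributions), computing the waste directly as cross-entropy minus entropy and comparing it with DKL(p||q):

```python
# Sketch: expected code-length waste when coding p-distributed symbols
# with lengths -log2 q(x) equals the Kullback-Leibler divergence D_KL(p||q).
import math

p = [0.5, 0.3, 0.2]    # true distribution (illustrative)
q = [0.4, 0.4, 0.2]    # mistakenly assumed distribution

cross_entropy = -sum(pi * math.log2(qi) for pi, qi in zip(p, q))
entropy       = -sum(pi * math.log2(pi) for pi in p)
dkl           =  sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q))

waste = cross_entropy - entropy
print(waste, dkl)      # the two agree
```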

Page 33

Part II

Types

Page 34

A Tip of Types (Cover and Thomas, 2006)

Method of Types: Motivation

- consider sequences with the same empirical distribution
- how many of these have a particular distribution?
- what is the probability of such a sequence?

Sketch of the Method

- consider, w.l.o.g., the binary event set 𝒳 = {0, 1}
- consider a sample x⁽ⁿ⁾ = (x1, …, xn) ∈ 𝒳ⁿ
- the type p_{x⁽ⁿ⁾} is the empirical distribution of symbols y ∈ 𝒳 in the sample x⁽ⁿ⁾, i.e. p_{x⁽ⁿ⁾}(y) counts how often symbol y appears in x⁽ⁿ⁾. Let Pn be the set of types with denominator n (or dividing n).
- for p ∈ Pn, call the set of all sequences x⁽ⁿ⁾ ∈ 𝒳ⁿ with type p the type class C(p) = {x⁽ⁿ⁾ | p_{x⁽ⁿ⁾} = p}.

Page 35

Type Theorem

Type Count

If |𝒳| = 2, one has |Pn| = n + 1 different types for sequences of length n (easy to generalize).

Important

Pn grows only polynomially, but 𝒳ⁿ grows exponentially with n. It follows that (at least one) type must contain exponentially many sequences. This corresponds to the “macrostate” in physics.

Theorem (Cover and Thomas, 2006)

If x1, x2, …, xn is an i.i.d. sample sequence drawn from q, then the probability of x⁽ⁿ⁾ depends only on its type and is given by

    2^{−n [H(p_{x⁽ⁿ⁾}) + DKL(p_{x⁽ⁿ⁾}||q)]}

Corollary

If x⁽ⁿ⁾ has type q (here, we interpret the probability q as a type), then its probability is given by

    2^{−n H(q)}

A large value of H(q) indicates many possible candidates x⁽ⁿ⁾ and high uncertainty, a small value few candidates and low uncertainty.
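A quick sketch checking the theorem (binary alphabet, illustrative q and sequence): the probability of a particular sequence, computed directly, matches 2^{−n[H(p̂)+DKL(p̂||q)]} for its type p̂:

```python
# Sketch: for an i.i.d. source q, P(x^(n)) = 2^{-n [H(p_hat) + DKL(p_hat || q)]},
# where p_hat is the empirical distribution (type) of the sequence.
import math

q = {0: 0.3, 1: 0.7}                       # illustrative source distribution
seq = [1, 1, 0, 1, 0, 1, 1, 1]             # a particular sequence, n = 8
n = len(seq)

# direct probability of the sequence
direct = math.prod(q[x] for x in seq)

# type (empirical distribution) of the sequence
p_hat = {y: seq.count(y) / n for y in q}

H   = -sum(p * math.log2(p) for p in p_hat.values() if p > 0)
DKL =  sum(p * math.log2(p / q[y]) for y, p in p_hat.items() if p > 0)

via_type = 2 ** (-n * (H + DKL))
print(direct, via_type)                    # identical up to rounding
```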

Page 36

Part III

Laplace’s Principle and Friends

Page 37

Laplace’s Principle of Insufficient Reason I

Scenario

Consider 𝒳. A probability distribution is assumed on 𝒳, but it is unknown. Laplace’s principle of insufficient reason states that, in the absence of any reason to assume that the outcomes are inequivalent, the probability distribution on 𝒳 is assumed to be the equidistribution.

Question

How to generalize when something is known?

Page 38

Answer: Types

Dominant Sample Sequence

Remember: the probability of a sequence in type class C(q) is

    2^{−n H(q)}

A priori, a probability q maximizing H(q) will generate sequence types dominating all others.

Maximum Entropy Principle

Maximize H(q) with respect to q. Result: the equidistribution q(x) = 1/|𝒳|.

Page 39

Sanov’s Theorem I

Theorem

Consider an i.i.d. sequence X1, X2, …, Xn of random variables, distributed according to q(X). Let further E be a set of probability distributions. Then (amongst other things), if E is closed and with p∗ = arg min_{p∈E} D(p||q), one has

    (1/n) log q⁽ⁿ⁾(E) −→ −D(p∗||q)

[Figure: the probability simplex with q outside the set E, and p∗ the point of E closest to q in the divergence sense.]

Page 40

Sanov’s Theorem II

Interpretation

p is unknown, but one knows constraints for p (e.g. some condition, such as some mean value U != ∑_x p(x)U(x), must be attained, i.e. the set E is given); then the dominating types are those close to p∗.

Special Case

If the prior q is the equidistribution (indifference), then minimizing D(p||q) under the constraints E is equivalent to maximizing H(p) under these constraints.

Jaynes’ Maximum Entropy Principle

Page 41

Sanov’s Theorem III

Jaynes’ Principle

- generalization of Laplace’s Principle
- maximally uncommitted distribution

Page 42

Maximum Entropy Distributions I
No constraints

We are interested in maximizing

    H(X) = − ∑_x p(x) log p(x)

over all probabilities p. The probability p lives in the simplex

    ∆ = { q ∈ ℝ^{|𝒳|} | ∑_i qi = 1, qi ≥ 0 }

The maximization requires us to respect constraints, of which we now consider only ∑_x p(x) != 1. The edge constraints happen not to be invoked here.

Page 43

Maximum Entropy Distributions II
No constraints

Unconstrained maximization via Lagrange:

    max_p [ − ∑_x p(x) log p(x) + λ ∑_x p(x) ]

Taking the derivative ∇_{p(x)} gives

    − log p(x) − 1 + λ != 0 .

Thus p(x) = e^{λ−1} ≡ 1/|𝒳| — equidistribution

Page 44

Maximum Entropy Distributions
Linear Constraints

The constraints are now

    ∑_x p(x) != 1
    ∑_x p(x) f(x) != f̄ .

Derive the Lagrangian:

    0 = ∇_p [ − ∑_x p(x) log p(x) + λ ∑_x p(x) + µ ∑_x p(x) f(x) ]

    − log p(x) − 1 + λ + µ f(x) = 0

so that one has the

Boltzmann/Gibbs Distribution

    p(x) = e^{λ−1+µ f(x)} = (1/Z) e^{µ f(x)}
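A numeric sketch of fitting the Gibbs distribution to a linear constraint (illustrative feature values and target mean; solving for µ by bisection is an implementation choice, not part of the slides):

```python
# Sketch: maximum-entropy distribution under a mean constraint is a
# Gibbs distribution p(x) = exp(mu * f(x)) / Z; we solve for mu by bisection.
import math

f = [0.0, 1.0, 2.0, 3.0]     # illustrative feature values f(x)
target_mean = 1.2            # required <f> (illustrative)

def gibbs(mu):
    w = [math.exp(mu * fx) for fx in f]
    Z = sum(w)
    return [wi / Z for wi in w]

def mean(p):
    return sum(pi * fx for pi, fx in zip(p, f))

lo, hi = -50.0, 50.0         # mean(gibbs(mu)) is increasing in mu
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if mean(gibbs(mid)) < target_mean:
        lo = mid
    else:
        hi = mid

p = gibbs(0.5 * (lo + hi))
print(p, mean(p))            # mean(p) is (numerically) the target
```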

Page 46

Part IV

Kullback-Leibler and Friends

Page 47

Conditional Kullback-Leibler

DKL can be conditioned:

    DKL[p(Y|x)||q(Y|x)]
    DKL[p(Y|X)||q(Y|X)] = ∑_x p(x) DKL[p(Y|x)||q(Y|x)]

Page 51

Kullback-Leibler and Bayes (Biehl, 2013)

We want to estimate p(x|θ), where θ is the parameter. We observe y. Seek the “best” q(x|y) for this y in the following sense:

1. minimize the DKL of the true distribution to the model distribution q
2. averaged over the possible observations y
3. averaged over θ

    min_q ∫ dθ p(θ) ∑_y p(y|θ) DKL[p(x|θ)||q(x|y)]

Result

q(x|y) is the Bayesian inference obtained from p(y|x) and p(x)
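A discrete sketch of this result (small illustrative tables; the closed-form minimizer q*(x|y) = ∑_θ p(θ|y) p(x|θ), the Bayesian posterior predictive, is assumed here): numerically, q* achieves a lower objective than an alternative choice of q:

```python
# Sketch: the q(x|y) minimizing E_{theta,y} DKL[p(x|theta) || q(x|y)]
# is the Bayesian mix q*(x|y) = sum_theta p(theta|y) p(x|theta).
import math

p_theta = [0.5, 0.5]                              # prior over two parameters
p_y_t   = [[0.8, 0.2], [0.3, 0.7]]                # p(y|theta)
p_x_t   = [[0.9, 0.1], [0.2, 0.8]]                # p(x|theta)

def objective(q):                                 # q[y][x]
    J = 0.0
    for t, pt in enumerate(p_theta):
        for y in range(2):
            for x in range(2):
                J += pt * p_y_t[t][y] * p_x_t[t][x] * math.log(p_x_t[t][x] / q[y][x])
    return J

# Bayesian answer: posterior p(theta|y), then mix the models p(x|theta)
q_star = []
for y in range(2):
    post = [p_theta[t] * p_y_t[t][y] for t in range(2)]
    Z = sum(post)
    q_star.append([sum(post[t] / Z * p_x_t[t][x] for t in range(2)) for x in range(2)])

# a non-Bayesian alternative does worse
q_alt = [[0.5, 0.5], [0.5, 0.5]]
print(objective(q_star), objective(q_alt))
```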

Page 52

Conditional Entropies

Special Case: Conditional Entropy

    H(Y|X = x) := − ∑_y p(y|x) log p(y|x)

    H(Y|X) := − ∑_x p(x) ∑_y p(y|x) log p(y|x)

Information

Reduction of entropy (uncertainty) by knowing another variable:

    I(X; Y) := H(Y) − H(Y|X)
             = H(X) − H(X|Y)
             = H(X) + H(Y) − H(X, Y)
             = DKL[p(x, y)||p(x)p(y)]
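A sketch checking two of these identities on a small joint table (the 2×2 joint distribution is illustrative):

```python
# Sketch: checking the mutual-information identities on a small joint table.
import math

p_xy = [[0.3, 0.1],      # p(x, y), illustrative 2x2 joint distribution
        [0.2, 0.4]]

px = [sum(row) for row in p_xy]
py = [sum(p_xy[x][y] for x in range(2)) for y in range(2)]

def H(dist):
    return -sum(p * math.log2(p) for p in dist if p > 0)

H_xy = H([p_xy[x][y] for x in range(2) for y in range(2)])
I_via_entropies = H(px) + H(py) - H_xy
I_via_dkl = sum(p_xy[x][y] * math.log2(p_xy[x][y] / (px[x] * py[y]))
                for x in range(2) for y in range(2))

print(I_via_entropies, I_via_dkl)   # the two expressions agree
```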

Page 53

Part V

Towards Reality

Page 54

Rate/Distortion Theory
Code below specifications

Reminder

Information is about sending messages. We considered the most compact codes over a given noiseless channel. Now consider the situation where either:

1. the channel is not noiseless but has noisy characteristics p(x̂|x), or
2. we cannot afford to spend an average of H(X) bits per symbol to transmit.

Question

What happens? Total collapse of transmission?

Page 55: Information Theory in Intelligent Decision Making · 2015. 3. 5. · Daniel Polani Adaptive Systems and Algorithms Research Groups School of Computer Science University of Hertfordshire,

Rate/Distortion Theory IDistortion

“Compromise”

don’t longer insist on perfect transmission

accept compromise, measure distortion d(x, x) betweenoriginal x and transmitted xsmall distortion good, large distortion “baaad”

Theorem: Rate Distortion Function

Given p(x) for the generation of symbols X,

R(D) := min_{p(x̂|x): E[d(X, X̂)] = D} I(X; X̂)

where the mean is over p(x, x̂) = p(x̂|x)p(x).
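One point on the R(D) curve can be computed by the classic Blahut–Arimoto alternation; a minimal sketch for a binary source with Hamming distortion (the trade-off parameter beta and the iteration count are my choices, not from the slides):

```python
import math

def rate_distortion_point(p_x, d, beta, iters=200):
    """One point on the R(D) curve via Blahut-Arimoto:
    q(xh|x) ∝ q(xh) exp(-beta * d[x][xh]), alternated with q(xh) = sum_x p(x) q(xh|x)."""
    n = len(p_x)
    q_xh = [1.0 / n] * n  # marginal over reproduction symbols
    for _ in range(iters):
        # optimal test channel for the current reproduction marginal
        q = [[q_xh[j] * math.exp(-beta * d[i][j]) for j in range(n)] for i in range(n)]
        for i in range(n):
            z = sum(q[i])
            q[i] = [v / z for v in q[i]]
        q_xh = [sum(p_x[i] * q[i][j] for i in range(n)) for j in range(n)]
    D = sum(p_x[i] * q[i][j] * d[i][j] for i in range(n) for j in range(n))
    R = sum(p_x[i] * q[i][j] * math.log2(q[i][j] / q_xh[j])
            for i in range(n) for j in range(n) if q[i][j] > 0)
    return R, D

# Uniform binary source, Hamming distortion; the classic result is R(D) = 1 - h(D)
R, D = rate_distortion_point([0.5, 0.5], [[0, 1], [1, 0]], beta=2.0)
```

Sweeping beta traces out the whole R(D) curve.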

Page 56

Rate/Distortion Theory II
Distortion

[Figure: rate-distortion curve r(x)]

Page 57

First Example: Infotaxis (Vergassola et al., 2007)

Page 58

Information Theory in Intelligent Decision Making

Applications

Daniel Polani

Adaptive Systems and Algorithms Research Groups
School of Computer Science

University of Hertfordshire, United Kingdom

March 5, 2015

Page 59

Thank You

Information-theoretic PA-Loop Invariants, Empowerment: Alexander Klyubin
Relevant Information: Chrystopher Nehaniv, Naftali Tishby, Thomas Martinetz, Jan Kim
Digested Information: Christoph Salge
Continuous Empowerment: Tobias Jung, Peter Stone
Collective Empowerment: Philippe Capdepuy
Collective Systems: Malte Harder
World Structure, Graphs, Empowerment in Games: Tom Anthony
Sensor Evolution, Information distribution over the PA-Loop: Sander van Dijk, Alexandra Mark, Achim Liese
Information Flow, PA-Loop Models: Nihat Ay
Further Contributions: Mikhail Prokopenko, Lars Olsson, Philippe Capdepuy, Malte Harder, Simon McGregor

Christoph Salge, Cornelius Glackin; EC (FEELIX GROWING, FP6), NSF, ONR, DARPA, FHA

This work was partially supported by FP7 ICT-270219

Page 60

Part VI

Crash Introduction

Page 61

Modelling Cognition: Motivation from Biology

Question

Why/how did cognition evolve in biology?

Observations in biology

sensors often highly optimized:

detection of few molecules (moths) (Dusenbery, 1992)

detection of few or individual photons (humans/toads) (Hecht et al., 1942; Baylor et al., 1979)

auditory sense operates close to thermal noise (Denk and Webb, 1989)

cognitive processing very expensive (Laughlin et al., 1998; Laughlin, 2001)

Page 62

Conclusions

Evidence

sensors often operate at physical limits
evolutionary pressure for high cognitive functions

But What For?

close the cycle
actions matter

What counts is what comes out at the end.

Trade-Offs

sharpening sensors
improving processing
boosting actuators

What you do not have in your head, you must have in your legs.

Page 65

Part VII

Information

Page 66

Decisions, Decisions

Challenge

Linking sensors, processing and actuators

The Physical and the Biological

Physics: dynamical equations etc. given
known (in principle)

Biological Cognition: no established unique model
complex, difficult to untangle

Robotic Cognition: many near-equivalent incompatible solutions and architectures
often specific and hand-designed

Problem

Considerable arbitrariness in treatment of cognition

Page 68

Idea

Issues

uniform treatment of cognition

distinguish:

essential
incidental

aspects of computation

Proposal: “Covariant” Modeling of Computation

Physics: observations may depend on the “coordinate system” for the same underlying phenomenon

Cognition: computation may depend on the architecture but essentially computes “the same concepts”

Bottom Line

“coordinate-” (mechanism-)free view of cognition?

Page 69

Landauer’s Principle

Fundamental Limits for Information Processing

On the lowest level: one cannot fully separate physics and information processing

Consequence: erasure of information from a “memory” creates heat

Connection: of energy and information

[Diagram: joint world-memory state (W_t, M_t) evolving to (W_{t+1}, M_{t+1})]
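Landauer's principle can be stated as a lower bound on the dissipated energy; in the standard form (not spelled out on the slide), with Boltzmann constant k_B and temperature T:

```latex
E_{\text{erase}} \;\ge\; k_B \, T \ln 2 \quad \text{per bit of information erased}
```

This is the quantitative sense in which erasing memory “creates heat”.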

Page 70

Informational Invariants: Beyond Physics

Law of Requisite Variety(Ashby, 1956; Touchette and Lloyd, 2000, 2004)

Ashby: “only variety can destroy variety”

extension by Touchette/Lloyd

Open-Loop Controller: max. entropy reduction ∆H*_open

Closed-Loop Controller: max. entropy reduction ∆H_closed ≤ ∆H*_open + I(W_t; A_t)

[Diagram: perception-action loop . . . W_{t−3} → A_{t−3} → W_{t−2} → A_{t−2} → . . . → W_{t+2} . . .]

Page 72

Informational Invariants: Scenario

Core Statement

Task: consider e.g. navigational task

Informationally: reduction of entropy of initial (arbitrary) state

Example: [Figure: sample trajectories in a 2D arena (x, y ∈ [−10, 10]) contracting toward the center]

Page 73

Information Bookkeeping

Bayesian Network

[Diagram: rolled-out Bayesian network of the perception-action loop: world W_t → sensor S_t → memory M_t → action A_t → world W_{t+1}, for successive t]

Informational “Conservation Laws”

Total Sensor History: S^(t) = (S_0, S_1, . . . , S_{t−1})

Result: lim_{t→∞} I(S^(t); W_0) = H(W_0)

(Klyubin et al., 2007), and see also (Ashby, 1956; Touchette and Lloyd, 2000, 2004)
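As a toy illustration of this conservation law (my construction, not from the slides): a deterministic sensor that reveals one more bit of the initial world state per step accumulates exactly H(W_0) bits in the limit.

```python
import math

# World state W0 uniform over 8 cells; each step the sensor reads one more bit
# of the (binary-coded) position, so the sensor history partitions the world ever finer.
N = 8
H_W0 = math.log2(N)  # 3 bits of initial uncertainty

def I_history_W0(t):
    """I(S^(t); W0) when S^(t) reveals the first t bits of W0 (uniform prior).
    For a deterministic sensor, I = H(S^(t)) = log2(number of distinguishable cells)."""
    cells = min(2 ** t, N)
    return math.log2(cells)

values = [I_history_W0(t) for t in range(5)]  # grows by 1 bit per step, then saturates
```

The mutual information grows monotonically and saturates exactly at H(W_0).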

Page 76

Observations

Key Motto

There is no perpetuum mobile of 3rd kind.

Actually, rather, there may be no free lunch, but sometimes there is free beer.

Information Balance Sheet

Task Invariant: H(W_0) determines the minimum information required to get to the center

Task Variant: but it can be spread/concentrated differently over

time
environment and agents (“stigmergy”)
sensors and memory

(Klyubin et al., 2004a,b, 2007; van Dijk et al., 2010)

Note: invariance is purely entropic: indifferent to task

Next Step

refine towards specific tasks

Page 78

Information for Decision Making

Replace gradient follower by general policy π

Dynamics

[Diagram: . . . S_{t−1} → A_{t−1} → S_t → A_t → S_{t+1} . . . , each action chosen by the policy π]

Utility

V^π(s) := E_π[R_t + R_{t+1} + · · · | s]

= ∑_a π(a|s) ∑_{s′} P^a_{ss′} [R^a_{ss′} + V^π(s′)]
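The fixed-point form of V^π lends itself to iterative policy evaluation; a minimal sketch on a hypothetical 3-state chain with the deck's “−1 per step until the goal” reward convention:

```python
# Iterative policy evaluation of
#   V^pi(s) = sum_a pi(a|s) sum_s' P[s][a][s'] * (R[s][a][s'] + V(s'))
# on a hypothetical 3-state chain: state 2 is the absorbing goal, reward -1 per move.
S, A = 3, 2  # states 0,1,2; actions 0 = left, 1 = right

P = [[[0.0] * S for _ in range(A)] for _ in range(S)]
R = [[[0.0] * S for _ in range(A)] for _ in range(S)]
for s in range(S - 1):
    P[s][1][s + 1] = 1.0; R[s][1][s + 1] = -1.0            # right
    P[s][0][max(s - 1, 0)] = 1.0; R[s][0][max(s - 1, 0)] = -1.0  # left
P[2][0][2] = P[2][1][2] = 1.0  # goal is absorbing, zero reward

pi = [[0.0, 1.0]] * S  # deterministic "always right" policy

V = [0.0] * S
for _ in range(100):  # fixed-point iteration of the evaluation equation
    V = [sum(pi[s][a] * sum(P[s][a][s2] * (R[s][a][s2] + V[s2]) for s2 in range(S))
             for a in range(A)) for s in range(S)]
```

For this policy the values are simply minus the number of steps to the goal.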

Page 79

A Parsimony Principle

Traditional MDP

Task: find best policy π∗

Traditional RL: does not consider decision costs

Credo: information processing is expensive in biology! (Laughlin et al., 1998; Laughlin, 2001; Polani, 2009)

Hypothesis: organisms trade off information-processing costs with task payoff (Tishby and Polani, 2011; Polani, 2009; Laughlin, 2001)

Therefore: include the information cost and expand to an I-MDP (Polani et al., 2006; Tishby and Polani, 2011)

Principle of Information Parsimony

minimize I(S; A) (relevant information) at fixed utility level

Page 80

Motto

It is a very sad thing that nowadays there is so little uselessinformation.

Oscar Wilde

Page 81

Relevant Information and its Policies

Computation

Via Lagrangian formalism: (Stratonovich, 1965; Polani et al., 2006; Belavkin, 2008, 2009; Still and Precup, 2012; Saerens et al., 2009; Tishby and Polani, 2011)

find: min_π ( I(S; A) − β E[V^π(S)] )

β → ∞: policy is optimal while informationally parsimonious!

β finite: policy suboptimal at the fixed level E[V^π(S)] while informationally parsimonious

I(S; A) as well as V^π depend on π

Expectation

for higher utility, more relevant information required

and vice versa
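The stationarity condition of such a Lagrangian yields a Blahut–Arimoto-style self-consistent iteration, π(a|s) ∝ p(a) exp(β Q(s, a)) with p(a) the marginal policy. A minimal sketch, assuming a fixed, hypothetical Q-table rather than the full I-MDP loop:

```python
import math

# Self-consistent iteration for min_pi I(S;A) - beta * E[Q], with a FIXED toy Q-table:
#   pi(a|s) ∝ p(a) exp(beta * Q(s,a)),   p(a) = sum_s p(s) pi(a|s)
Q = [[0.0, -1.0], [-1.0, 0.0]]   # 2 states, 2 actions; each state prefers a different action
p_s = [0.5, 0.5]

def relevant_information(beta, iters=300):
    """Iterate the stationarity equations and return I(S;A) in bits at convergence."""
    p_a = [0.5, 0.5]
    for _ in range(iters):
        pi = []
        for s in range(2):
            w = [p_a[a] * math.exp(beta * Q[s][a]) for a in range(2)]
            z = sum(w)
            pi.append([v / z for v in w])
        p_a = [sum(p_s[s] * pi[s][a] for s in range(2)) for a in range(2)]
    return sum(p_s[s] * pi[s][a] * math.log2(pi[s][a] / p_a[a])
               for s in range(2) for a in range(2) if pi[s][a] > 0)

low, high = relevant_information(0.1), relevant_information(5.0)
```

As expected, a small β yields an almost state-blind policy (little relevant information), a large β an almost deterministic one (close to 1 bit here).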

Page 82

Experiments

Scenario

Define (P^a_{ss′}, R^a_{ss′}) by:

States: grid world
Actions: north, east, south, west
Reward: each action produces a “reward” of −1 until the goal is reached

Experiment

Trade off utility and relevantinformation

Question

Form of expected trade-off?

[Figure: grid world with goal cells A and B]

Page 83

Experiment — Find the Corner

[Plot: trade-off of E[Q(S, A)] (−50 to 0) against I(S; A) (0 to 1.2 bits) for goals A and B]

Optimal Case

goal B has higher utility than A

but needs a lot more information per step

Suboptimal Case

goal B much worse than goal A

for the same information cost

Page 85

Experiment — With a Twist I

Experiment Revisited

grid-world again

consider only goal A

cost as before

The “Twist” (Polani, 2011)

permute directions north, east, south, west!

random fixed permutation of directions for each state

replace (P^a_{ss′}, R^a_{ss′}) by (P̃^a_{ss′}, R̃^a_{ss′}) where

P̃^a_{ss′} := P^{σ_s(a)}_{ss′}

R̃^a_{ss′} := R^{σ_s(a)}_{ss′}
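The claimed equivalence of the twisted MDP can be checked numerically; a sketch on a hypothetical 5-state chain MDP (my example), verifying that a per-state permutation of the action labels leaves the optimal values unchanged:

```python
import random

# A hypothetical 5-state chain MDP: actions 0/1 move left/right, reward -1 per step
# until the right end (the absorbing goal). The "twist" applies a fixed random
# permutation sigma_s to the action labels in each state; optimal VALUES must match.
N = 5

def step(s, a):
    if s == N - 1:
        return s, 0.0               # absorbing goal, zero reward
    s2 = max(0, s - 1) if a == 0 else s + 1
    return s2, -1.0

random.seed(0)
sigma = [random.sample([0, 1], 2) for _ in range(N)]  # per-state action permutation

def step_twisted(s, a):
    return step(s, sigma[s][a])     # P~^a := P^{sigma_s(a)}, R~^a := R^{sigma_s(a)}

def value_iteration(dynamics, iters=100):
    V = [0.0] * N
    for _ in range(iters):
        V = [max(dynamics(s, a)[1] + V[dynamics(s, a)[0]] for a in (0, 1))
             for s in range(N)]
    return V

V_orig = value_iteration(step)
V_twist = value_iteration(step_twisted)
```

Both runs converge to the same optimal values, exactly as the slide states; the difference only appears once the information cost of the policy is accounted for.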

Page 86

Experiment — With a Twist II

Expectation

as a traditional MDP, the “twisted” MDP (P̃^a_{ss′}, R̃^a_{ss′}) remains exactly equivalent:

same optimal values

Ṽ*(s) = V*(s), s ∈ S

same optimal policy after undoing the twist
pre-/post-twist policies equivalent via

Q̃^π̃(s, a) = Q^π(s, σ_s(a))
π̃(s, a) = π(s, σ_s(a))

And as I-MDP?

Page 87

Experiment With a Twist: Uh-Oh!

[Plot: E[V(S)] (−50 to 0) against I(S; A) (0 to 1.2 bits) for the original and the twisted MDP]

Optimal Case

sanity check: utility is the same for original and twisted

but the latter needs a lot more information per step

Suboptimal Case

the twisted MDP becomes much worse than the original

at the same information cost

Page 89

Intermediate Conclusions

Insights

as traditional MDP both experiments fully equivalent

as I-MDP, however . . .

significant difference between

the agent “taking its actions with it” and
having a “realigned” set of actions at each step

embodiment allows offloading informational effort (e.g. Paul, 2006; Pfeifer and Bongard, 2007)

Page 90

Part VIII

Goal-Relevant Information

Page 91

Towards Multiple Goals

Extension

assume family of tasks (e.g. multiple goals)

action now depends on both state and goals

[Diagram: S_{t−1} → A_{t−1} → S_t → A_t → S_{t+1}, with the goal variable G feeding into each action]

Goal-Relevant Information

I(G; A_t|s_t) = H(A_t|s_t) − H(A_t|G, s_t)
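Numerically, goal-relevant information at a fixed state is a small computation over a goal-conditioned policy; an illustrative sketch (the policy values below are invented for the example):

```python
import math

# Goal-relevant information at one fixed state s: I(G; A|s) = H(A|s) - H(A|G, s).
# pi[g][a] is a hypothetical goal-conditioned policy at that state.
p_g = [0.5, 0.5]                      # two equiprobable goals
pi = [[0.9, 0.1],                     # goal 0: mostly action 0
      [0.1, 0.9]]                     # goal 1: mostly action 1

def H(dist):
    """Shannon entropy in bits of a probability vector."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

p_a = [sum(p_g[g] * pi[g][a] for g in range(2)) for a in range(2)]  # marginal policy at s
I_G_A = H(p_a) - sum(p_g[g] * H(pi[g]) for g in range(2))
```

Here the action reveals a bit over half a bit about the goal; a goal-blind policy would give I(G; A|s) = 0.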

Page 92

Towards Multiple Goals

Extension

assume family of tasks (e.g. multiple goals)

action now depends on both state and goals

[Diagram: S_{t−1} → A_{t−1} → S_t → A_t → S_{t+1}, with the goal variable G feeding into each action]

Goal-Relevant Information (Regularized)

min_{π(a_t|s_t, g)} ( I(G; A_t|S_t) − β E[V^π(S_t, G, A_t)] )

Page 93

Goal-Relevant Information

I(G; A_t|s_t)

Page 94

Goal-Relevant and Sensor Information Trade-Offs

[Plot: I(S_t; A_t|G) against I(G; A_t|S_t), the trade-off curve traced as α runs from 0 to 1]

Lagrangian

min_{π(a_t|s_t, g)} [ (1 − α) I(G; A_t|S_t) + α I(S_t; A_t|G) − β E[V^π(S_t, G, A_t)] ]

Page 95

Information “Caching”

Note

not only the how much of goal-relevant information matters

but also the which

Consider

Accessible History (Context): e.g. A^(t−1) = (A_0, A_1, . . . , A_{t−1})

“Cache Fetch”: new goal-relevant information not already used

I(A_t; G | A^(t−1)) = H(A_t | A^(t−1)) − H(A_t | G, A^(t−1))

Page 96

Subgoals

new goal information: I(A_t; G | A^(t−1), s)
discarded goal information: I(A^(t−1); G | A_t, s)

(van Dijk and Polani, 2011; van Dijk and Polani, 2013)

Psychological Connections?

Crossing doors causes forgetting (see also Radvansky et al., 2011)

Page 98

Efficient Relevant Goal Information (van Dijk and Polani, 2013)

“Most Efficient” Goal

G −→ G1 −→ A ←− S

min_{I(G1; A|S) ≥ C} I(G; G1)


(a) |G1| = 3  (b) |G1| = 4  (c) |G1| = 5  (d) |G1| = 6

Fig. 8: Goal clusters induced by the bottleneck G1 on the primary goal-information pathway in a 6-room grid world navigation task. Figures (a) to (d) show the mappings for increasing cardinality of the bottleneck variable.

distribution for this pathway:

min_{p(g2|g)} I(G; G2)  subj. to  I(S_t; A_t|G2) ≥ C_I2    (7)

5.1. Observation 4: Natural Abstraction

Firstly we will study the primary pathway, constrained with bottleneck G1, and solve (6) to find the goal mapping induced by this bottleneck on the pathway. Figure 8 shows such mappings found for different capacities of the bottleneck variable in a 6-room grid navigation scenario, with the lower bound C_I1 fixed as high as possible (see Appendix C for more details), such that the clustering becomes most informative.

One result to note is that the stringent lower bound results in a hard clustering: each goal is deterministically mapped to a single element in G1. Secondly, the mapping adheres to the local connectivity of goals: goal states in the same cluster are connected directly in the transition graph of the MDP.

Moreover, the clustering also attempts to adhere to the physical boundaries of

Page 100

Making State Predictive for Actions

“Most Enhancive” Goal

G −→ G2 −→ A ←− S

min_{I(S; A|G2) ≥ C} I(G; G2)


(a) |G| = 4 (b) |G| = 5

(c) |G| = 6 (d) |G| = 7

Fig. 10: Goal clusters induced by the bottleneck G2 on the secondary, state-

information modulating goal-information pathway in a 9-room grid world navi-

gation task. Figures (a) to (d) show the mappings for increasing cardinality of the

bottleneck variable.

It is important to note that the global relations between states and goals are strongly determined by the set of available actions. Consider for instance the subset of ‘north-eastern goals’, i.e. those shaded darkest in Fig. 10a. Knowing that the goal is in this subset allows the agent to use state knowledge to make the informative distinction of whether the goal is likely to the north or to the west. But this distinction is only relevant because the agent has access to distinct actions that define these directions. Differently defined actions would induce other relations with goals, and likely a different factorization would appear as goal-based frame of reference. In the extreme, a much less structured set of actions can have a strong adverse effect on informational requirements [16], most probably making it difficult to construct a useful abstraction in constrained pathways.

Page 102

Making State Predictive for Actions

“Most Enhancive” Goal

G −→ G2 −→ A ←− S

min_{I(S; A|G2) ≥ C} I(G; G2)

Insights

“spillover” ignoring local boundaries

action information induces a global “frame of reference”

depends on action consistency



Part IX

Empowerment: Motivation


Universal Utilities

Problems

in biology, the success criterion is survival

the concept of a “task” and “reward” is not sharp

the “search space” is too large for full-fledged success feedback

pure Darwinism: feedback by death; this is very sparse

Notes

Homeostasis: provides dense networks to guide living beings

Problem: specific to particular organisms; designed on a case-to-case basis for artificial agents

more generalizable perspective in view of the success of evolution?


Idea

Universal Drives and Utilities

Core Idea: adaptational feedback should be dense and rich

artificial curiosity, learning progress, autotelic principle, intrinsic reward (Schmidhuber, 1991; Kaplan and Oudeyer, 2004; Steels, 2004; Singh et al., 2005)

homeokinesis and predictive information (Der, 2001; Ay et al., 2008)

Physical Principle:

causal entropic forcing (Wissner-Gross and Freer, 2013)


Present Ansatz

Use Embodiment

optimize the informational fit into the sensorimotor niche

maximization of the potential to inject information into the environment (via actuators) and recapture it from the environment (via sensors)

(Klyubin et al., 2005a,b; Nehaniv et al., 2007; Klyubin et al., 2008)


Here: Empowerment

Motto

“Being in control of one’s destiny and knowing it is good.” (Jung et al., 2011)

More Precisely

information-theoretic version of

controllability (being in control of destiny)

observability (knowing about it)

combined


Formalism

Bayesian Network

[Perception-action loop with memory, unrolled over t−3, …, t+2: for each time step, Wt → St (sensing), (St, Mt−1) → Mt (memory update), Mt → At (action selection), and (Wt, At) → Wt+1 (world dynamics).]

(Klyubin et al., 2005a,b; Nehaniv et al., 2007; Klyubin et al., 2008)


Formalism

Bayesian Network

[Memoryless perception-action loop, unrolled over t−3, …, t+2: for each time step, Wt → St (sensing), St → At (action selection), and (Wt, At) → Wt+1 (world dynamics).]

(Klyubin et al., 2005a,b; Nehaniv et al., 2007; Klyubin et al., 2008)


Formalism

“Free Will” Actions

Empowerment: Formal Definition

E^(k) := max_{p(a_{t−k}, a_{t−k+1}, …, a_{t−1})} I(A_{t−k}, A_{t−k+1}, …, A_{t−1}; S_t)

(Klyubin et al., 2005a,b; Nehaniv et al., 2007; Klyubin et al., 2008)


Formalism

Empowerment: Formal Definition (conditioned on the initial state)

E^(k)(w_{t−k}) := max_{p(a_{t−k}, …, a_{t−1} | w_{t−k})} I(A_{t−k}, A_{t−k+1}, …, A_{t−1}; S_t | w_{t−k})

(Klyubin et al., 2005a,b; Nehaniv et al., 2007; Klyubin et al., 2008)


Formalism

Empowerment: Formal Definition (compact notation, writing A^(k)_{t−k} for the sequence A_{t−k}, …, A_{t−1})

E^(k)(w_{t−k}) := max_{p(a^(k)_{t−k} | w_{t−k})} I(A^(k)_{t−k}; S_t | w_{t−k})

(Klyubin et al., 2005a,b; Nehaniv et al., 2007; Klyubin et al., 2008)
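Since E^(k)(w) is exactly the capacity of the channel from k-step action sequences to the final state S_t, it can be computed with the standard Blahut-Arimoto algorithm. A minimal sketch; the function name, tolerance, and iteration cap are illustrative choices, not from the slides:

```python
import numpy as np

def empowerment(p_s_given_a, tol=1e-8, max_iter=1000):
    """Channel capacity C = max_{p(a)} I(A; S_t) via Blahut-Arimoto.

    p_s_given_a: (n_actions, n_states) array; row a is p(s_t | a), the
    channel from (compound) action sequences to the final state.
    Returns the capacity, i.e. the empowerment, in bits.
    """
    n_a, _ = p_s_given_a.shape
    p_a = np.full(n_a, 1.0 / n_a)      # start from the uniform action prior

    def gain(p_a):
        # per-action KL divergence D(p(s|a) || p(s)) under the current prior
        p_s = p_a @ p_s_given_a        # marginal over final states
        with np.errstate(divide="ignore", invalid="ignore"):
            log_ratio = np.where(p_s_given_a > 0.0,
                                 np.log2(p_s_given_a / p_s), 0.0)
        return (p_s_given_a * log_ratio).sum(axis=1)

    for _ in range(max_iter):
        d = gain(p_a)
        new_p_a = p_a * np.exp2(d)     # Blahut-Arimoto reweighting step
        new_p_a /= new_p_a.sum()
        if np.abs(new_p_a - p_a).max() < tol:
            p_a = new_p_a
            break
        p_a = new_p_a
    return float((p_a * gain(p_a)).sum())

# Noiseless channel with two distinguishable outcomes: capacity is 1 bit
print(empowerment(np.eye(2)))          # 1.0
```

The same routine covers all three definition variants above: conditioning on the initial state w simply means building one channel matrix per start state and computing its capacity separately.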


Empowerment — a Universal Utility

Notes

Empowerment E(k)(w) defined

given horizon k, i.e. local

given starting state w (or context, for POMDPs)

i.e. empowerment is a function of state, a “utility”

However

only defined by world dynamics

no reward function assumed


Empowerment — Notes

Properties of Empowerment

want to maximize the potential information flow that could be injected through the actuators into the environment and recaptured by the sensors in the future

potential influence on the environment which is detectable through the agent’s sensors

determined by the embodiment P^a_{ss′} only; no external reward R^a_{ss′}

Bottom Line

information-theoretic controllability/observability

informational efficiency of the sensorimotor niche
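The point that empowerment is determined by the embodiment P^a_{ss′} alone can be made concrete: the k-step channel p(s_t | a_{t−k}, …, a_{t−1}) is obtained purely by chaining the per-action transition matrices. A sketch under those assumptions; the helper name and the toy two-state world are illustrative:

```python
import numpy as np
from itertools import product

def k_step_channel(P, s0, k):
    """Rows of the returned array are p(s_t | a_1, ..., a_k) for each of
    the |A|^k compound action sequences, built by chaining the per-action
    transition matrices P[a][s, s'] from the start state s0."""
    n_states = P[0].shape[0]
    rows = []
    for seq in product(range(len(P)), repeat=k):
        p = np.zeros(n_states)
        p[s0] = 1.0
        for a in seq:
            p = p @ P[a]              # one step of the world dynamics
        rows.append(p)
    return np.array(rows)

# Toy two-state world with actions "stay" and "flip", horizon k = 2
stay = np.eye(2)
flip = np.array([[0.0, 1.0],
                 [1.0, 0.0]])
ch = k_step_channel([stay, flip], s0=0, k=2)
# In a deterministic world the capacity of this channel is simply
# log2 of the number of distinct reachable final states:
n_reachable = len({tuple(row) for row in ch})
print(np.log2(n_reachable))           # 1.0
```

No reward function enters anywhere: the channel, and hence the empowerment, depends only on the transition dynamics.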


Other Interpretations

Related Concepts

mobility

money

affordance

graph centrality (Anthony et al., 2008)

antithesis to “helplessness” (Seligman and Maier, 1967; Overmier and Seligman, 1967)

Think Strategic

Tactics is what you do when you have a plan

Strategy is what you do when you haven’t


Part X

First Examples


Maze Empowerment

[Four mazes shown alongside their average distances; per-state empowerment ranges:]

E ∈ [1, 2.32]   E ∈ [1.58, 3.70]   E ∈ [3.46, 5.52]   E ∈ [4.50, 6.41]


Empowerment vs. Average Distance

[Scatter plot: per-state empowerment E (vertical axis, 4.5 to 6.0 bits) against average distance d (horizontal axis, 6 to 16).]


Box Pushing

                          stationary box       pushable box
box invisible to agent    E ∈ [5.86, 5.93]     E = log2 61 ≈ 5.93 bit
box visible to agent      E ∈ [5.86, 5.93]     E ∈ [5.93, 7.79]


In the Continuum: Pendulum Swing-up Task w/o Reward (Jung et al., 2011)

Dynamics

pendulum (length l = 1, mass m = 1, gravity g = 9.81, friction µ = 0.05)

ϕ̈(t) = ( −µ ϕ̇(t) + m g l sin(ϕ(t)) + u(t) ) / (m l²)

with state s_t = (ϕ(t), ϕ̇(t)) and continuous control u ∈ [−5, 5]

system time discretized to ∆ = 0.05 sec

actions discretized to u ∈ {−5, −2.5, 0, +2.5, +5}

Goal

To provide this system with some matching purpose, consider the pendulum swing-up task.

Comparison

empowerment-based control

traditional optimal control

The optimal control problem is solved by approximate dynamic programming on a high-resolution grid.
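The dynamics above can be simulated directly. The slide fixes the discretization ∆ = 0.05 s but not the integrator, so the explicit Euler step below is an assumption:

```python
import math

# Parameters from the slide: length l = 1, mass m = 1, g = 9.81, friction mu = 0.05
l, m, g, mu = 1.0, 1.0, 9.81, 0.05
dt = 0.05                                # time discretization Delta = 0.05 s
ACTIONS = (-5.0, -2.5, 0.0, 2.5, 5.0)    # discretized torques

def step(phi, phi_dot, u):
    """One explicit-Euler step of
    phi'' = (-mu*phi' + m*g*l*sin(phi) + u) / (m*l**2)."""
    phi_ddot = (-mu * phi_dot + m * g * l * math.sin(phi) + u) / (m * l ** 2)
    return phi + dt * phi_dot, phi_dot + dt * phi_ddot

# With zero torque, phi = 0 is an equilibrium of these dynamics
# (unstable under this sign convention): no motion results.
print(step(0.0, 0.0, 0.0))               # (0.0, 0.0)
```

Iterating `step` over sequences drawn from `ACTIONS` yields the k-step reachable distributions from which the empowerment of each state is estimated.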


Results: Performance

[Two trajectory plots over 0 to 10 sec, each showing ϕ and ϕ̇: “Performance of optimal policy (FVI+KNN on 1000×1000 grid)” and “Performance of maximally empowered policy (3-step)”.]

Phase plot of ϕ and ϕ̇ when following the respective greedy policy from the last slide. Note that for ϕ, the y-axis shows the height of the pendulum (+1 means upright, the goal state).


Results: “Explored” Space

[Phase-space plot “Empowerment-based Exploration”: ϕ [rad] from −π to π (horizontal) vs. ϕ’ [rad/s] from −6 to 6 (vertical), visited states marked by the greedily chosen action (Action 0 to Action 4).]


Empowerment: Acrobot (Jung et al., 2011)

Setting

two-link pendulum; actuation in the hip only

Idea

Add LQR control to bang-bang control


Acrobot: Demo


Block’s World (Salge, 2013)

Properties

scenario with a modifiable world

deterministic (i.e. empowerment is log2 n, where n is the number of states reachable within horizon k)

agent can incorporate, place, destroy blocks, and move

estimated via (highly incomplete) sampling

Empowered “Minecrafter”

(Salge, 2013)
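In a deterministic world the channel from action sequences to final states is noise-free, so k-step empowerment reduces to log2 of the number of distinct reachable states, as noted above. A sketch; the transition function and the one-dimensional example world are illustrative:

```python
import math

def deterministic_empowerment(state, transition, actions, k):
    """E^(k) = log2 |{final states reachable by some k-step action sequence}|.
    `transition(state, action)` must be a deterministic next-state function."""
    reachable = {state}
    for _ in range(k):
        # image of the current reachable set under every available action
        reachable = {transition(s, a) for s in reachable for a in actions}
    return math.log2(len(reachable))

# Agent on an unbounded line, actions move it by -1, 0, or +1:
# after k = 2 steps the positions {-2, ..., 2} are reachable,
# so the empowerment is log2(5), approximately 2.32 bits.
print(deterministic_empowerment(0, lambda s, a: s + a, (-1, 0, 1), 2))
```

Sampling, as used in the Block’s World estimate, simply replaces the exhaustive image computation by the image under a random subset of action sequences, giving a lower bound on the reachable-set size.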


Explorer Accompanying Robot (Glackin et al., 2015)

Consortium

Demonstrator II


Part XI

References


Anthony, T., Polani, D., and Nehaniv, C. L. (2008). On preferred states of agents: how global structure is reflected in local structure. In Bullock, S., Noble, J., Watson, R., and Bedau, M. A., editors, Artificial Life XI: Proceedings of the Eleventh International Conference on the Simulation and Synthesis of Living Systems, Winchester, 5–8 Aug., pages 25–32. MIT Press, Cambridge, MA.

Ashby, W. R. (1956). An Introduction to Cybernetics. Chapman & Hall Ltd.

Ay, N., Bertschinger, N., Der, R., Guttler, F., and Olbrich, E. (2008). Predictive information and explorative behavior of autonomous robots. European Physical Journal B, 63:329–339.

Baylor, D., Lamb, T., and Yau, K. (1979). Response of retinal rods to single photons. Journal of Physiology, London, 288:613–634.

Belavkin, R. (2008). The duality of utility and information in optimally learning systems. In Proc. 7th IEEE International Conference on ‘Cybernetic Intelligent Systems’. IEEE Press.


Belavkin, R. V. (2009). Bounds of optimal learning. In Adaptive Dynamic Programming and Reinforcement Learning, 2009. ADPRL ’09. IEEE Symposium on, pages 199–204. IEEE.

Biehl, M. (2013). Kullback-Leibler and Bayes. Internal memo.

Cover, T. M. and Thomas, J. A. (2006). Elements of Information Theory. Wiley, 2nd edition.

Denk, W. and Webb, W. W. (1989). Thermal-noise-limited transduction observed in mechanosensory receptors of the inner ear. Phys. Rev. Lett., 63(2):207–210.

Der, R. (2001). Self-organized acquisition of situated behavior. Theory Biosci., 120:1–9.

Dusenbery, D. B. (1992). Sensory Ecology. W. H. Freeman and Company, New York.

Glackin, C., Salge, C., Trendafilov, D., Greaves, M., Polani, D., Leu, A., Haque, S. J. U., Slavnic, S., and Ristic-Durrant, D. (2015). An information-theoretic intrinsic motivation model for robot navigation and path planning.


Hecht, S., Schlaer, S., and Pirenne, M. (1942). Energy, quanta and vision. Journal of the Optical Society of America, 38:196–208.

Jung, T., Polani, D., and Stone, P. (2011). Empowerment for continuous agent-environment systems. Adaptive Behaviour, 19(1):16–39. Published online 13 January 2011.

Kaplan, F. and Oudeyer, P.-Y. (2004). Maximizing learning progress: an internal reward system for development. In Iida, F., Pfeifer, R., Steels, L., and Kuniyoshi, Y., editors, Embodied Artificial Intelligence, volume 3139 of LNAI, pages 259–270. Springer.

Klyubin, A., Polani, D., and Nehaniv, C. (2007). Representations of space and time in the maximization of information flow in the perception-action loop. Neural Computation, 19(9):2387–2432.

Klyubin, A. S., Polani, D., and Nehaniv, C. L. (2004a). Organization of the information flow in the perception-action loop of evolved agents. In Proceedings of 2004 NASA/DoD Conference on Evolvable Hardware, pages 177–180. IEEE Computer Society.


Klyubin, A. S., Polani, D., and Nehaniv, C. L. (2004b). Tracking information flow through the environment: Simple cases of stigmergy. In Pollack, J., Bedau, M., Husbands, P., Ikegami, T., and Watson, R. A., editors, Artificial Life IX: Proceedings of the Ninth International Conference on Artificial Life, pages 563–568. MIT Press.

Klyubin, A. S., Polani, D., and Nehaniv, C. L. (2005a). All else being equal be empowered. In Advances in Artificial Life, European Conference on Artificial Life (ECAL 2005), volume 3630 of LNAI, pages 744–753. Springer.

Klyubin, A. S., Polani, D., and Nehaniv, C. L. (2005b). Empowerment: A universal agent-centric measure of control. In Proc. IEEE Congress on Evolutionary Computation, 2–5 September 2005, Edinburgh, Scotland (CEC 2005), pages 128–135. IEEE.

Klyubin, A. S., Polani, D., and Nehaniv, C. L. (2008). Keep your options open: An information-based driving principle for sensorimotor systems. PLoS ONE, 3(12):e4018.


Laughlin, S. B. (2001). Energy as a constraint on the coding and processing of sensory information. Current Opinion in Neurobiology, 11:475–480.

Laughlin, S. B., de Ruyter van Steveninck, R. R., and Anderson, J. C. (1998). The metabolic cost of neural information. Nature Neuroscience, 1(1):36–41.

Nehaniv, C. L., Polani, D., Olsson, L. A., and Klyubin, A. S. (2007). Information-theoretic modeling of sensory ecology: Channels of organism-specific meaningful information. In Laubichler, M. D. and Muller, G. B., editors, Modeling Biology: Structures, Behaviour, Evolution, The Vienna Series in Theoretical Biology, pages 241–281. MIT Press.

Overmier, J. B. and Seligman, M. E. P. (1967). Effects of inescapable shock upon subsequent escape and avoidance responding. Journal of Comparative and Physiological Psychology, 63:28–33.

Paul, C. (2006). Morphological computation: A basis for the analysis of morphology and control requirements. Robotics and Autonomous Systems, 54(8):619–630.


Pfeifer, R. and Bongard, J. (2007). How the Body Shapes the Way We Think: A New View of Intelligence. Bradford Books.

Polani, D. (2009). Information: Currency of life? HFSP Journal, 3(5):307–316.

Polani, D. (2011). An informational perspective on how the embodiment can relieve cognitive burden. In Proc. IEEE Symposium Series in Computational Intelligence 2011 — Symposium on Artificial Life, pages 78–85. IEEE.

Polani, D., Nehaniv, C., Martinetz, T., and Kim, J. T. (2006). Relevant information in optimized persistence vs. progeny strategies. In Rocha, L. M., Bedau, M., Floreano, D., Goldstone, R., Vespignani, A., and Yaeger, L., editors, Proc. Artificial Life X, pages 337–343.

Radvansky, G. A., Krawietz, S. A., and Tamplin, A. K. (2011). Walking through doorways causes forgetting: Further explorations. The Quarterly Journal of Experimental Psychology, 64(8):1632–1645.


Saerens, M., Achbany, Y., Fuss, F., and Yen, L. (2009). Randomized shortest-path problems: Two related models. Neural Computation, 21:2363–2404.

Salge, C. (2013). Block’s world. Presented at GSO 2013.

Schmidhuber, J. (1991). A possibility for implementing curiosity and boredom in model-building neural controllers. In Meyer, J. A. and Wilson, S. W., editors, Proc. of the International Conference on Simulation of Adaptive Behavior: From Animals to Animats, pages 222–227. MIT Press/Bradford Books.

Seligman, M. E. P. and Maier, S. F. (1967). Failure to escape traumatic shock. Journal of Experimental Psychology, 74:1–9.

Singh, S., Barto, A. G., and Chentanez, N. (2005). Intrinsically motivated reinforcement learning. In Proceedings of the 18th Annual Conference on Neural Information Processing Systems (NIPS), Vancouver, B.C., Canada.

Steels, L. (2004). The autotelic principle. In Iida, F., Pfeifer, R., Steels, L., and Kuniyoshi, Y., editors, Embodied Artificial Intelligence: Dagstuhl Castle, Germany, July 7–11, 2003, volume 3139 of Lecture Notes in AI, pages 231–242. Springer Verlag, Berlin.

Still, S. and Precup, D. (2012). An information-theoretic approach to curiosity-driven reinforcement learning. Theory in Biosciences, 131(3):139–148.

Stratonovich, R. (1965). On value of information. Izvestiya of USSR Academy of Sciences, Technical Cybernetics, 5:3–12.

Tishby, N. and Polani, D. (2011). Information theory of decisions and actions. In Cutsuridis, V., Hussain, A., and Taylor, J., editors, Perception-Action Cycle: Models, Architecture and Hardware, pages 601–636. Springer.

Touchette, H. and Lloyd, S. (2000). Information-theoretic limits of control. Phys. Rev. Lett., 84:1156.

Touchette, H. and Lloyd, S. (2004). Information-theoretic approach to the study of control systems. Physica A, 331:140–172.

van Dijk, S. and Polani, D. (2011). Grounding subgoals in information transitions. In Proc. IEEE Symposium Series in Computational Intelligence 2011 — Symposium on Adaptive Dynamic Programming and Reinforcement Learning, pages 105–111. IEEE.

van Dijk, S. and Polani, D. (2013). Informational constraints-driven organization in goal-directed behavior. Advances in Complex Systems, 16(2-3). Published online 30 April 2013, DOI: 10.1142/S0219525913500161.

van Dijk, S. G., Polani, D., and Nehaniv, C. L. (2010). What do you want to do today? Relevant-information bookkeeping in goal-oriented behaviour. In Proc. Artificial Life, Odense, Denmark, pages 176–183.

Vergassola, M., Villermaux, E., and Shraiman, B. I. (2007). ‘Infotaxis’ as a strategy for searching without gradients. Nature, 445:406–409.

Wissner-Gross, A. D. and Freer, C. E. (2013). Causal entropic forcing. Physical Review Letters, 110(16):168702.
