PROBABILISTIC MODELS OF LEARNING AND MEMORY
Uncertainty and Bayesian inference

MÁTÉ LENGYEL

Computational and Biological Learning Lab
Department of Engineering
University of Cambridge

Probabilistic models of learning and memory: Uncertainty and Bayesian inference
http://www.eng.cam.ac.uk/~m.lengyel
CEU, Budapest, 22-26 June 2009


UNCONSCIOUS INFERENCES

Hermann von Helmholtz, 1867

[Figure: Adelson, unpubl.]

[Diagram labels: stimulus, percept, prior knowledge]

listen to the words


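UNCONSCIOUS INFERENCES, CONT'D

Words (Roediger & McDermott, 1995): seat, radio, table, rocking, bench, boat, chair

[Marks shown next to the words; their alignment with individual words is not recoverable: ✗✓ ✓✗✗✗✗]

[Diagram labels: experience, memories, stimulus, percept, prior knowledge, learning]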


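FORMALISING INFERENCES

[Figure: belief axis labelled "possible" and "impossible"]

quantity inferred: shade of square A (the same build is then repeated for the shade of square B)

sources of information: physical luminance; knowledge of checkerboards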


REPRESENTING DEGREES OF BELIEFS AS PROBABILITIES

[Figure: P(shade of square B) as a function of shade of square B; inputs labelled physical luminance and knowledge of checkerboards]

Dutch Book Theorem: If you are willing to bet on your beliefs, then unless they satisfy the axioms of probability, there will always be a set of bets (a "Dutch book") that you would accept which is guaranteed to lose you money, no matter what the outcome is!

\[ \mathrm{odds}(\text{shade of square B} = x) = \frac{-\$ \ \text{if shade of square B} \neq x}{+\$ \ \text{if shade of square B} = x} = \frac{P(\text{shade of square B} = x)}{P(\text{shade of square B} \neq x)} \]
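The Dutch Book Theorem can be illustrated with a small numerical sketch. The function name, the ticket-style bets, and the belief values below are illustrative assumptions, not taken from the slides; the example only covers the case where the beliefs over two exclusive outcomes sum to more than 1, so that a bookmaker can sell the agent one ticket per outcome.

```python
def dutch_book_payoffs(beliefs, stake=1.0):
    """Net payoff to the agent in each outcome after buying one ticket per outcome.

    The ticket on an outcome pays `stake` if that outcome occurs and is priced
    at belief * stake, which the agent regards as a fair price.
    (Hypothetical helper, written for this illustration only.)
    """
    total_price = sum(belief * stake for belief in beliefs.values())
    # Exactly one outcome occurs, so exactly one ticket pays out.
    return {outcome: stake - total_price for outcome in beliefs}


# Made-up beliefs about two mutually exclusive, exhaustive outcomes.
incoherent = {"shade = x": 0.6, "shade != x": 0.6}  # sums to 1.2
coherent = {"shade = x": 0.6, "shade != x": 0.4}    # sums to 1.0

print(dutch_book_payoffs(incoherent))  # negative in every outcome
print(dutch_book_payoffs(coherent))    # approximately zero in every outcome
```

If the beliefs summed to less than 1, the same argument would apply with the roles reversed: the bookmaker would buy the tickets from the agent instead of selling them.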


THE AXIOMS OF PROBABILITY (HOW TO REPRESENT YOUR BELIEFS)

properties:

  \( 0 \le P(x) \le 1 \)
  \( P(x) = 1 \): x is certainly true
  \( P(x) = 0 \): x is certainly not true
  \( P(x \mid y) \): belief in x if we know y is true

axioms:

  probabilities are non-negative: \( P(x) \ge 0 \)
  probabilities are normalised: \( \sum_x P(x) = 1 \)
  joint probability \( P(x, y) \) and marginal probability: \( P(x) = \sum_y P(x, y) \)
  \( P(x, y) = P(x) \cdot P(y) \iff \) x and y are independent
  conditional probability by Bayes' rule: \( P(x, y) = P(x \mid y) \cdot P(y) = P(y \mid x) \cdot P(x) \), so
  \[ P(x \mid y) = \frac{P(y \mid x)\, P(x)}{P(y)} \]
  posterior ∝ likelihood × prior
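The marginalisation, product, and Bayes' rule identities above can be checked numerically on a small joint table. The 2×2 table below is made up for illustration, and the variable names are assumptions of this sketch rather than anything defined in the slides.

```python
import numpy as np

# Hypothetical joint table P(x, y): rows index x, columns index y.
joint = np.array([[0.10, 0.30],
                  [0.40, 0.20]])

# Marginal probabilities: P(x) = sum_y P(x, y) and P(y) = sum_x P(x, y).
p_x = joint.sum(axis=1)
p_y = joint.sum(axis=0)

# Conditional probabilities from the product rule P(x, y) = P(x|y) P(y) = P(y|x) P(x).
p_x_given_y = joint / p_y            # each column is divided by its P(y)
p_y_given_x = joint / p_x[:, None]   # each row is divided by its P(x)

# Bayes' rule: P(x|y) = P(y|x) P(x) / P(y).
p_x_given_y_bayes = p_y_given_x * p_x[:, None] / p_y

# x and y are independent only if the joint factorises into its marginals.
independent = np.allclose(joint, np.outer(p_x, p_y))

print(np.allclose(p_x_given_y, p_x_given_y_bayes))
print(independent)
```

Both routes to P(x|y) agree because they are the same identity rearranged; the independence check fails for this table because the joint does not factorise.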


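REPRESENTING DEGREES OF BELIEFS AS PROBABILITIES

[Figure: P(shade of square B) as a function of shade of square B; inputs labelled physical luminance and knowledge of checkerboards]

\[ P(\text{shade of square B} \mid \text{luminance, checkerboard, shadows}) \propto P(\text{luminance of square B} \mid \text{shade of square B}) \times P(\text{shade of square B} \mid \text{checkerboard}) \]

posterior ∝ likelihood × prior (the left-hand side is the posterior, the first factor the likelihood, the second factor the prior)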


BAYESIAN DECISION THEORY (HOW TO MAKE POINT ESTIMATES)

loss function:

             state of the world
  action     x1          x2          x3          ...
  a1         L(a1,x1)    L(a1,x2)    L(a1,x3)
  a2         L(a2,x1)    L(a2,x2)    L(a2,x3)
  a3         L(a3,x1)    L(a3,x2)    L(a3,x3)
  ...

action to choose:

\[ a^* = \operatorname*{argmin}_a \sum_x L(a, x)\, P(x) \]

note: a and x need not live in the same space

special cases when a and x do live in the same space, \( a = \hat{x} \):

  \( L(\hat{x}, x) = (\hat{x} - x)^2 \) gives \( \hat{x} = \sum_x x\, P(x) \) (posterior mean)
  \( L(\hat{x}, x) = |\hat{x} - x| \) gives \( \sum_{x=-\infty}^{\hat{x}} P(x) = \frac{1}{2} \) (posterior median)
  \( L(\hat{x}, x) = \begin{cases} 0 & \text{if } x = \hat{x} \\ 1 & \text{otherwise} \end{cases} \) gives \( \hat{x} = \operatorname*{argmax}_x P(x) \) (maximum a posteriori, MAP)
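The correspondence between the three loss functions and the three point estimates can be checked by brute force: tabulate the expected loss for every candidate action and take the minimum. The sketch below does this for a made-up discrete posterior; `bayes_action`, the grid of candidate estimates, and the probability values are assumptions of this example.

```python
import numpy as np

def bayes_action(loss, actions, states, posterior):
    """Return the action minimising the expected loss sum_x L(a, x) P(x)."""
    # loss_table[i, j] = L(actions[i], states[j]), i.e. the loss table on the slide.
    loss_table = loss(actions[:, None], states[None, :])
    expected_loss = loss_table @ posterior
    return actions[np.argmin(expected_loss)]

states = np.arange(6.0)                                      # x = 0, 1, ..., 5
posterior = np.array([0.30, 0.05, 0.25, 0.10, 0.10, 0.20])   # made-up P(x)
fine_grid = np.linspace(0.0, 5.0, 501)                       # candidate estimates

squared = lambda a, x: (a - x) ** 2
absolute = lambda a, x: np.abs(a - x)
zero_one = lambda a, x: (a != x).astype(float)

x_squared = bayes_action(squared, fine_grid, states, posterior)
x_absolute = bayes_action(absolute, fine_grid, states, posterior)
# For the 0-1 loss only exact matches count, so the actions are the states themselves.
x_zero_one = bayes_action(zero_one, states, states, posterior)

# Closed-form estimates for comparison.
posterior_mean = float(states @ posterior)
posterior_median = states[np.searchsorted(np.cumsum(posterior), 0.5)]
posterior_mode = states[np.argmax(posterior)]

print(x_squared, posterior_mean)
print(x_absolute, posterior_median)
print(x_zero_one, posterior_mode)
```

The posterior was chosen to be skewed so that the mean, median, and mode differ, which makes it visible that the choice of loss function changes the reported estimate.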


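EXAMPLE: PREDICTING LIFE SPAN

You meet someone who is t years old. What will be his total life span \( t_{\text{total}} \)?

\[ P(t_{\text{total}} \mid t) \propto P(t \mid t_{\text{total}})\, P(t_{\text{total}}) \propto \begin{cases} \frac{1}{t_{\text{total}}}\, P(t_{\text{total}}) & \text{if } t < t_{\text{total}} \\ 0 & \text{otherwise} \end{cases} \]

\( P(t \mid t_{\text{total}}) \): the probability that you meet someone at the age of t when s/he will have a total life span of \( t_{\text{total}} \)

\( P(t_{\text{total}}) \): prior on life span distribution of people

+ decision theory: e.g. report the median of the posterior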
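A minimal numerical sketch of this example follows. It assumes a Gaussian prior with mean 75 and standard deviation 16 years (the values quoted in the paper excerpt on this slide), a likelihood of 1/t_total for t < t_total, and a grid over candidate life spans capped at a hypothetical maximum age; the function name and grid settings are illustrative choices, not part of the lecture.

```python
import numpy as np

def predict_life_span(t, mean=75.0, sd=16.0, max_age=150.0, step=0.01):
    """Posterior median of t_total given current age t (requires t < max_age)."""
    grid = np.arange(step, max_age + step, step)       # candidate values of t_total
    prior = np.exp(-0.5 * ((grid - mean) / sd) ** 2)   # unnormalised Gaussian prior
    likelihood = np.where(grid > t, 1.0 / grid, 0.0)   # 1 / t_total if t < t_total, else 0
    posterior = prior * likelihood
    posterior /= posterior.sum()                       # normalise on the grid
    cdf = np.cumsum(posterior)
    return grid[np.searchsorted(cdf, 0.5)]             # first grid point where the CDF reaches 1/2

for age in (18, 39, 51, 75, 96):
    print(age, round(float(predict_life_span(age)), 1))
```

The prediction stays close to the prior mean for young ages and moves only slightly above t for very old ages, which is the qualitative behaviour the excerpt describes for Gaussian priors.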

gous to the Copernican anthropic principle in Bayesian cos-

mology (Buch, 1994; Caves, 2000; Garrett & Coles, 1993; Gott,1993, 1994; Ledford, Marriott, & Crowder, 2001) and the ge-

neric-view principle in Bayesian models of visual perception(Freeman, 1994; Knill & Richards, 1996). The prior probability

p(ttotal) reflects our general expectations about the relevant classof events—in this case, about how likely it is that a man’s lifespan will be ttotal. Analysis of actuarial data shows that the

distribution of life spans in our society is (ignoring infant mor-tality) approximately Gaussian—normally distributed—with a

mean, m, of about 75 years and a standard deviation, s, of about16 years.

Combining the prior with the likelihood according to Equation1 yields a probability distribution p(ttotal|t) over all possible totallife spans ttotal for a man encountered at age t. A good guess for

ttotal is the median of this distribution—that is, the point at whichit is equally likely that the true life span is longer or shorter.

Taking the median of p(ttotal|t) defines a Bayesian predictionfunction, specifying a predicted value of ttotal for each observedvalue of t. Prediction functions for events with Gaussian priors

are nonlinear: For values of t much less than the mean of theprior, the predicted value of ttotal is approximately the mean;

once t approaches the mean, the predicted value of ttotal in-creases slowly, converging to t as t increases but always re-

maining slightly higher, as shown in Figure 1. Although itsmathematical form is complex, this prediction function makesintuitive sense for human life spans: A predicted life span of

about 75 years would be reasonable for aman encountered at age18, 39, or 51; if we met a man at age 75, we might be inclined to

give him several more years at least; but if wemet someone at age96, we probably would not expect him to live much longer.This approach to prediction is quite general, applicable to any

problem that requires estimating the upper limit of a duration, extent, or other numerical quantity given a sample drawn from that interval (Buch, 1994; Caves, 2000; Garrett & Coles, 1993; Gott, 1993, 1994; Jaynes, 2003; Jeffreys, 1961; Ledford et al., 2001; Leslie, 1996; Maddox, 1994; Shepard, 1987; Tenenbaum & Griffiths, 2001). However, different priors will be appropriate for different kinds of phenomena, and the prediction function will vary substantially as a result. For example, imagine trying to predict the total box-office gross of a movie given its take so far. The total gross of movies follows a power-law distribution, with p(t_total) ∝ t_total^(−γ) for some γ > 0.¹ This distribution has a highly non-Gaussian shape (see Fig. 1), with most movies taking in only modest amounts, but occasional blockbusters making huge amounts of money. In the appendix, we show that for power-law priors, the Bayesian prediction function picks a value for t_total that is a multiple of the observed sample t. The exact multiple depends on the parameter γ. For the particular power law that best fits the actual distribution of movie grosses, an optimal Bayesian observer would estimate the total gross to be approximately 50% greater than the current gross: Thus, if we observe a movie has made $40 million to date, we should guess a total gross of around $60 million; if we observe a current gross of only $6 million, we should guess about $9 million for the total.
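The constant-multiple rule can be sketched in a few lines. This is only an illustrative derivation, not the appendix itself: it assumes the likelihood takes the form p(t | t_total) = 1/t_total for 0 < t < t_total (the observation point is sampled uniformly within the total), a form that this passage refers to as Equation 1 but does not restate. The symbol t* for the posterior median is introduced here for convenience.

```latex
\begin{align*}
p(t_{\text{total}} \mid t)
  &\propto p(t \mid t_{\text{total}})\, p(t_{\text{total}})
   \propto t_{\text{total}}^{-(\gamma+1)},
   \qquad t_{\text{total}} \ge t, \\
P(t_{\text{total}} > x \mid t)
  &= \frac{\int_x^{\infty} u^{-(\gamma+1)}\, du}{\int_t^{\infty} u^{-(\gamma+1)}\, du}
   = \left(\frac{t}{x}\right)^{\gamma},
   \qquad x \ge t, \\
\left(\frac{t}{t^{*}}\right)^{\gamma} = \frac{1}{2}
  \;&\Longrightarrow\; t^{*} = 2^{1/\gamma}\, t .
\end{align*}
```

Under these assumptions, a multiple of exactly 1.5 would correspond to γ = ln 2 / ln 1.5 ≈ 1.71. The text says only "approximately 50%", so this value is illustrative and should not be read as the fitted parameter.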

Although such constant-multiple prediction rules are optimal for event classes that follow power-law priors, they are clearly inappropriate for predicting life spans or other kinds of events with Gaussian priors. For instance, upon meeting a 10-year-old girl and her 75-year-old grandfather, we would never predict that the girl will live a total of 15 years (1.5 × 10) and the grandfather will live to be 112 (1.5 × 75). Other classes of priors, such as the exponential-tailed Erlang distribution, p(t_total) ∝ t_total exp(−t_total/β) for β > 0,² are also associated with distinctive optimal prediction functions. For the Erlang distribution, the

Fig. 1. Bayesian prediction functions and their associated prior distributions. The three columns represent qualitatively different statistical models appropriate for different kinds of events. The top row of plots shows three parametric families of prior distributions for the total duration or extent, t_total, that could describe events in a particular class. Lines of different styles represent different parameter values (e.g., different mean durations) within each family. The bottom row of plots shows the optimal predictions for t_total as a function of t, the observed duration or extent of an event so far, assuming the prior distributions shown in the top panel. For Gaussian priors (left column), the prediction function always has a slope less than 1 and an intercept near the mean μ: Predictions are never much smaller than the mean of the prior distribution, nor much larger than the observed duration. Power-law priors (middle column) result in linear prediction functions with variable slope and a zero intercept. Erlang priors (right column) yield a linear prediction function that always has a slope equal to 1 and a nonzero intercept.

¹ When γ > 1, a power-law distribution is often referred to in statistics and economics as a Pareto distribution.

² The Erlang distribution is a special case of the gamma distribution. The gamma distribution is p(t_total) ∝ t_total^(k−1) exp(−t_total/β), where k > 0 and β > 0 are real numbers. The Erlang distribution assumes that k is an integer. Following Shepard (1987), we use a one-parameter Erlang distribution, fixing k at 2.


Griffiths & Tenenbaum, 2006


EVERYDAY PREDICTIONS


best guess of t_total is simply t plus a constant determined by the parameter β, as shown in the appendix and illustrated in Figure 1.

Our experiment compared these ideal Bayesian analyses with the judgments of a large sample of human participants, examining whether people’s predictions were sensitive to the distributions of different quantities that arise in everyday contexts. We used publicly available data to identify the true prior distributions for several classes of events (the sources of these data are given in Table 1). For example, as shown in Figure 2, human life spans and the run time of movies are approximately Gaussian, the gross of movies and the length of poems are approximately power-law distributed, and the distributions of the number of years in office for members of the U.S. House of Representatives and of the length of the reigns of pharaohs are approximately Erlang. The experiment examined how well people’s predictions corresponded to optimal statistical inference in these different settings.
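The three prediction functions described above (posterior median under Gaussian, power-law, and Erlang priors) can be approximated numerically. The sketch below is not the authors' code. It assumes the likelihood p(t | t_total) = 1/t_total for t_total ≥ t, which this passage cites as Equation 1 without restating. The function names, the grid truncation t_max, the default γ (chosen so the multiple is 1.5), and β = 10 are all hypothetical choices made for illustration.

```python
import numpy as np


def posterior_median(t, prior, t_max, n=200_001):
    """Median of p(t_total | t), proportional to prior(t_total) / t_total on [t, t_max].

    The 1 / t_total likelihood is an assumption of this sketch. t_max truncates
    the grid and must be large enough that the remaining mass is negligible.
    """
    grid = np.linspace(t, t_max, n)
    density = prior(grid) / grid  # unnormalised posterior on the grid
    # Cumulative trapezoidal integral, normalised to give a CDF on the grid.
    steps = 0.5 * (density[1:] + density[:-1]) * np.diff(grid)
    cdf = np.concatenate(([0.0], np.cumsum(steps)))
    cdf /= cdf[-1]
    # Invert the CDF at 0.5 by linear interpolation.
    return float(np.interp(0.5, cdf, grid))


def predict_gaussian(t, mu=75.0, sigma=16.0):
    """Gaussian prior with the life-span mean and standard deviation from the text."""
    prior = lambda x: np.exp(-0.5 * ((x - mu) / sigma) ** 2)
    return posterior_median(t, prior, t_max=max(t, mu) + 10.0 * sigma)


def predict_power_law(t, gamma=np.log(2) / np.log(1.5)):
    """Power-law prior; the default gamma is illustrative, not a fitted value."""
    prior = lambda x: x ** (-gamma)
    return posterior_median(t, prior, t_max=1000.0 * t)


def predict_erlang(t, beta=10.0):
    """Erlang prior with k = 2; beta = 10 is a hypothetical scale."""
    prior = lambda x: x * np.exp(-x / beta)
    return posterior_median(t, prior, t_max=t + 50.0 * beta)


if __name__ == "__main__":
    for age in (18, 39, 51, 75, 96):
        print("Gaussian prior, t =", age, "->", round(predict_gaussian(age), 1))
    # With the default gamma the analytic multiple is 2 ** (1 / gamma) = 1.5.
    print("Power-law prior, t = 40 ->", round(predict_power_law(40.0), 2))
    # Analytically the Erlang prediction is t + beta * ln 2.
    print("Erlang prior, t = 20 ->", round(predict_erlang(20.0), 2))
```

A grid-based median keeps one code path for all three priors. Under the stated likelihood assumption, the power-law and Erlang cases also have closed forms (2^(1/γ) · t and t + β ln 2), which make convenient checks on the numerical result.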

METHOD

Participants and Procedure

Participants were tested in two groups, with each group making predictions about five different phenomena. One group of 208 undergraduates made predictions about movie grosses, poem lengths, life spans, reigns of pharaohs, and lengths of marriages. A second group of 142 undergraduates made predictions about movie run times, terms of U.S. representatives, baking times for cakes, waiting times, and lengths of marriages. The surveys were

TABLE 1
Sources of Data for Estimating Prior Distributions

Data set                      Source (number of data points)
Movie grosses                 http://www.worldwideboxoffice.com/ (5,302)
Poem lengths                  http://www.emule.com/ (1,000)
Life spans                    http://www.demog.berkeley.edu/wilmoth/mortality/states.html (complete life table)
Movie run times               http://www.imdb.com/charts/usboxarchive/ (233 top-10 movies from 1998 through 2003)
U.S. representatives’ terms   http://www.bioguide.congress.gov/ (2,150 members since 1945)
Cake baking times             http://www.allrecipes.com/ (619)
Pharaohs’ reigns              http://www.touregypt.com/ (126)

Note. Data were collected from these Web sites between July and December 2003.

Fig. 2. People’s predictions for various everyday phenomena. The top row of plots shows the empirical distributions of the total duration or extent, t_total, for each of these phenomena. The first two distributions are approximately Gaussian, the third and fourth are approximately power-law, and the fifth and sixth are approximately Erlang. The bottom row shows participants’ predicted values of t_total for a single observed sample t of a duration or extent for each phenomenon. Black dots show the participants’ median predictions of t_total. Error bars indicate 68% confidence intervals (estimated by a 1,000-sample bootstrap). Solid lines show the optimal Bayesian predictions based on the empirical prior distributions shown above. Dashed lines show predictions made by estimating a subjective prior, for the pharaohs and waiting-times stimuli, as explained in the main text. Dotted lines show predictions based on a fixed uninformative prior (Gott, 1993).
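The caption describes the error bars as 68% confidence intervals from a 1,000-sample bootstrap of the median prediction. One common way to compute such an interval is the percentile bootstrap. The sketch below assumes that variant, since the caption does not say which bootstrap interval was used. The function name is hypothetical and the response values in the usage example are made up for illustration.

```python
import numpy as np


def bootstrap_median_ci(samples, n_boot=1000, level=0.68, seed=0):
    """Percentile-bootstrap confidence interval for the median.

    Returns (sample median, lower bound, upper bound). The percentile method
    is an assumption of this sketch, not a detail taken from the source.
    """
    rng = np.random.default_rng(seed)
    samples = np.asarray(samples, dtype=float)
    # Each row holds the indices of one resample drawn with replacement.
    idx = rng.integers(0, samples.size, size=(n_boot, samples.size))
    medians = np.median(samples[idx], axis=1)
    lower, upper = np.percentile(medians, [50.0 * (1.0 - level), 50.0 * (1.0 + level)])
    return float(np.median(samples)), float(lower), float(upper)


if __name__ == "__main__":
    # Made-up predictions of t_total from ten hypothetical participants.
    responses = [50, 55, 60, 60, 65, 70, 75, 80, 90, 120]
    print(bootstrap_median_ci(responses))
```

Resampling the median directly, rather than assuming a normal sampling distribution, is a natural fit when predictions are skewed, as they may be for power-law quantities.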


Griffiths & Tenenbaum, 2006



RATIONAL VS IRRATIONAL


Bernoulli (1713)

Kahneman & Tversky (2002 Nobel Prize in Economics)

• computational cost
• ecology vs economy
• certainty vs uncertainty
• implicit vs explicit (esp. verbal) computations

for more, see Anderson (1990)

John Anderson: rational analysis


REFERENCES

Adelson (unpubl) http://web.mit.edu/persci/people/adelson/checkershadow_illusion.html

Anderson JR (1990) The adaptive character of thought. Lawrence Erlbaum Associates, Hillsdale, NJ.

Bernoulli J (1713) Ars conjectandi. Thurnisiorum, Basel.

Griffiths TL, Tenenbaum JB (2006) Optimal predictions in everyday cognition. Psychol Sci 17:767-773.

Helmholtz H (1867) Handbuch der physiologischen Optik. L. Voss, Leipzig. (translated into English by JPC Southall as “Treatise on Physiological Optics”)

Kahneman D, Tversky A (1973) On the psychology of prediction. Psychol Rev 80:237-251.

Roediger HL III, McDermott KB (1995) Creating false memories: Remembering words not presented in lists. J Exp Psychol Learn Mem Cogn 21:803-814.