everyday inductive leaps making predictions and detecting coincidences tom griffiths department of...

Everyday inductive leaps
Making predictions and detecting coincidences

Tom Griffiths
Department of Psychology
Program in Cognitive Science
University of California, Berkeley

(joint work with Josh Tenenbaum, MIT)

Inductive problems

• Inferring structure from data
• Perception
  – e.g. structure of 3D world from 2D visual data

[Figure: data and hypotheses (cube, shaded hexagon)]

Inductive problems

• Inferring structure from data
• Perception
  – e.g. structure of 3D world from 2D visual data
• Cognition
  – e.g. whether a process is random

[Figure: data HHHHH; hypotheses: fair coin, two heads]






Perception is optimal




Cognition is not

Everyday inductive leaps

• Inferences we make effortlessly every day
  – making predictions
  – detecting coincidences
  – evaluating randomness
  – learning causal relationships
  – identifying categories
  – picking out regularities in language
• A chance to study induction in microcosm, and compare cognition to optimal solutions

Two everyday inductive leaps

Predicting the future

Detecting coincidences




How often is Google News updated?

t = time since last update
t_total = time between updates

What should we guess for t_total, given t?



Reverend Thomas Bayes

Bayes’ theorem

$$p(h \mid d) = \frac{p(d \mid h)\, p(h)}{\sum_{h' \in H} p(d \mid h')\, p(h')}$$

Posterior probability p(h | d); likelihood p(d | h); prior probability p(h); the denominator sums over the space of hypotheses H.

h: hypothesis
d: data
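As a minimal sketch of the formula, here it is applied to the coin example from the inductive problems slide (data HHHHH; hypotheses "fair coin" and "two heads"). The prior that makes two-headed coins rare (1 in 100) is a hypothetical value, not from the slides; exact fractions avoid rounding.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Bayes' theorem over a finite hypothesis space H.

    priors: dict mapping each hypothesis h to p(h)
    likelihoods: dict mapping each hypothesis h to p(d | h)
    Returns a dict mapping each hypothesis h to p(h | d).
    """
    joint = {h: likelihoods[h] * priors[h] for h in priors}
    evidence = sum(joint.values())  # sum over h' in H of p(d | h') p(h')
    return {h: joint[h] / evidence for h in joint}

def coin_likelihood(flips, p_heads):
    """p(d | h) for independent flips, where h fixes P(heads) = p_heads."""
    prob = Fraction(1)
    for flip in flips:
        prob *= p_heads if flip == "H" else 1 - p_heads
    return prob

data = "HHHHH"
# Hypothetical prior: 1 coin in 100 is two-headed (not taken from the slides).
priors = {"fair coin": Fraction(99, 100), "two heads": Fraction(1, 100)}
likelihoods = {
    "fair coin": coin_likelihood(data, Fraction(1, 2)),
    "two heads": coin_likelihood(data, Fraction(1)),
}
post = posterior(priors, likelihoods)
print(post)
```

With these assumed priors the posterior probability of a two-headed coin is 32/131 (about 0.24): five heads in a row multiply its prior odds of 1/99 by 2^5 = 32.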

Bayes’ theorem

$$p(h \mid d) \propto p(d \mid h)\, p(h)$$

h: hypothesis
d: data

Bayesian inference

$$p(t_{\text{total}} \mid t) \propto p(t \mid t_{\text{total}})\, p(t_{\text{total}})$$

posterior probability ∝ likelihood × prior

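Bayesian inference

$$p(t_{\text{total}} \mid t) \propto p(t \mid t_{\text{total}})\, p(t_{\text{total}})$$

$$p(t_{\text{total}} \mid t) \propto \frac{1}{t_{\text{total}}}\, p(t_{\text{total}})$$

(assume t is a random sample, 0 < t < t_total)

posterior probability ∝ likelihood × prior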
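A minimal numerical sketch of this posterior on a grid of candidate t_total values. It takes the posterior median as the point prediction; that choice, the Gaussian prior on movie running times (mean 100, s.d. 15 minutes), and the helper names are assumptions for illustration, not values from the slides.

```python
import numpy as np

def predict_t_total(t, prior_pdf, grid):
    """Posterior median of t_total given the observed t.

    Likelihood: t is a uniform random sample from (0, t_total), so
    p(t | t_total) = 1 / t_total when t_total > t and 0 otherwise.
    prior_pdf: vectorized function returning an unnormalized prior on the grid.
    grid: evenly spaced candidate values of t_total.
    """
    likelihood = np.where(grid > t, 1.0 / grid, 0.0)
    posterior = likelihood * prior_pdf(grid)
    cdf = np.cumsum(posterior)
    cdf /= cdf[-1]  # normalize; the even grid spacing cancels out
    return grid[np.searchsorted(cdf, 0.5)]

def gaussian_prior(x, mean=100.0, sd=15.0):
    """Hypothetical prior on movie running times in minutes (not real data)."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2)

grid = np.linspace(1.0, 300.0, 30_000)
for t in (30, 55, 120):
    print(t, round(float(predict_t_total(t, gaussian_prior, grid)), 1))
```

With a Gaussian prior, small values of t give predictions close to the prior mean, while a t beyond the mean pushes the prediction just past t.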

The effects of priors

Different kinds of priors p(t_total) are appropriate in different domains (e.g. wealth, height).

The effects of priors

Evaluating human predictions

• Different domains with different priors:
  – a movie has made $60 million [power-law]
  – your friend quotes from line 17 of a poem [power-law]
  – you meet a 78 year old man [Gaussian]
  – a movie has been running for 55 minutes [Gaussian]
  – a U.S. congressman has served 11 years [Erlang]
• Prior distributions derived from actual data
• Use 5 values of t for each
• People predict t_total

[Figure: people's predictions compared with Bayesian predictions under a parametric prior, an empirical prior, and Gott’s rule]
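Gott's rule (guess t_total = 2t) can be derived from the same posterior. Assuming a power-law prior with exponent γ (a hypothetical parameterization) and taking the posterior median t* as the prediction (also an assumption), the random-sampling likelihood 1/t_total gives:

```latex
\begin{align}
p(t_{\text{total}}) &\propto t_{\text{total}}^{-\gamma}, \qquad \gamma > 0 \\
p(t_{\text{total}} \mid t) &= \gamma\, t^{\gamma}\, t_{\text{total}}^{-(\gamma+1)}, \qquad t_{\text{total}} > t \\
P(t_{\text{total}} > t^{*} \mid t) &= \left(\frac{t}{t^{*}}\right)^{\gamma} = \frac{1}{2}
  \;\Longrightarrow\; t^{*} = 2^{1/\gamma}\, t
\end{align}
```

Setting γ = 1 (a prior proportional to 1/t_total) recovers t* = 2t. More generally, a power-law prior yields a multiplicative rule: the prediction is a constant multiple of t.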

Probability matching

[Figure: proportion of judgments below predicted value vs. quantile of Bayesian posterior distribution p(t_total | t_past)]

Probability matching

Average over all prediction tasks:
• movie run times
• movie grosses
• poem lengths
• life spans
• terms in congress
• cake baking times

[Figure: proportion of judgments below predicted value vs. quantile of Bayesian posterior distribution p(t_total | t_past)]
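The probability-matching plots compare each quantile of the Bayesian posterior with the proportion of people's judgments that fall below the corresponding predicted value. One reading, assumed here rather than stated on the slides, is that each judgment behaves like a single draw from the posterior; the points then lie on the diagonal. This sketch checks that with a hypothetical log-normal posterior (parameters chosen arbitrarily), using the standard library's `statistics.NormalDist` to get exact quantiles.

```python
import numpy as np
from statistics import NormalDist

MU, SIGMA = 4.0, 0.5  # hypothetical log-normal posterior over t_total
rng = np.random.default_rng(0)

# Assumption: each participant's judgment is one independent draw from the posterior.
judgments = rng.lognormal(mean=MU, sigma=SIGMA, size=10_000)

quantiles = np.array([0.1, 0.25, 0.5, 0.75, 0.9])
# Predicted value at each posterior quantile (exact log-normal quantiles).
predicted = np.exp(MU + SIGMA * np.array([NormalDist().inv_cdf(q) for q in quantiles]))
# Proportion of judgments falling below each predicted value.
proportion_below = np.array([(judgments < v).mean() for v in predicted])
print(np.round(proportion_below, 3))
```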


• People produce accurate predictions for the duration and extent of everyday events
• Strong prior knowledge
  – form of the prior (power-law or exponential)
  – distribution given that form (parameters)
• Contrast with “base rate neglect” (Kahneman & Tversky, 1973)

November 12, 2001: New Jersey lottery results were 5-8-7, the same day that American Airlines flight 587 crashed

"It could be that, collectively, the people in New York caused those lottery numbers to come up 911," says Henry Reed. A psychologist who specializes in intuition, he teaches seminars at the Edgar Cayce Association for Research and Enlightenment in Virginia Beach, VA.

"If enough people all are thinking the same thing, at the same time, they can cause events to happen," he says. "It's called psychokinesis."





The bombing of London

(Gilovich, 1991)

John Snow and cholera

(Snow, 1855)



[Figure: intervals of 76 years and 75 years between comet appearances]

(Halley, 1752)

The paradox of coincidences

How can coincidences simultaneously lead us to irrational conclusions and significant discoveries?

A common definition: Coincidences are unlikely events

“an event which seems so unlikely that it is worth telling a story about”

“we sense that it is too unlikely to have been the result of luck or mere chance”

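Coincidences are not just unlikely...

HHHHHHHHHH vs. HHTHTHTTHT

Under a fair coin both sequences have the same probability, (1/2)^10 = 1/1024, yet only the first feels like a coincidence.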


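$$\frac{p(\text{cause} \mid d)}{p(\text{chance} \mid d)} = \frac{p(d \mid \text{cause})}{p(d \mid \text{chance})} \times \frac{p(\text{cause})}{p(\text{chance})}$$

posterior odds = likelihood ratio (evidence) × prior odds

                           Prior odds high    Prior odds low
Likelihood ratio high      cause              ?
Likelihood ratio low       ?                  chance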

                           Prior odds high    Prior odds low
Likelihood ratio high      cause              coincidence
Likelihood ratio low       ?                  chance

What makes a coincidence?

A coincidence is an event that provides evidence for causal structure, but not enough evidence to make us believe that structure exists:
• likelihood ratio is high
• prior odds are low
• posterior odds are middling



HHHHHHHHHH: likelihood ratio is high, prior odds are low, posterior odds are middling
HHTHTHTTHT

Bayesian causal induction

Hypotheses: cause (C → E) vs. chance (no link between C and E)
Likelihoods: cause: 0 < p(E) < 1; chance: p(E) = 0.5
Priors: cause: p(cause); chance: 1 − p(cause)
Data: frequency of effect in presence of cause
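A minimal sketch of this model on the two coin sequences. Under chance p(E) = 0.5; under cause p(E) is only said to lie in (0, 1), so the sketch assumes it is uniform on (0, 1) and integrates it out exactly, which gives k!(n − k)!/(n + 1)! for k heads in n flips. The prior odds value used below is hypothetical.

```python
from fractions import Fraction
from math import factorial

def p_data_given_cause(flips):
    """p(d | cause): p(E) unknown, assumed uniform on (0, 1) and integrated out.

    For k heads in n flips the integral of p^k (1 - p)^(n - k) is k!(n - k)!/(n + 1)!.
    """
    n, k = len(flips), flips.count("H")
    return Fraction(factorial(k) * factorial(n - k), factorial(n + 1))

def p_data_given_chance(flips):
    """p(d | chance): p(E) = 0.5 for every flip."""
    return Fraction(1, 2) ** len(flips)

def likelihood_ratio(flips):
    return p_data_given_cause(flips) / p_data_given_chance(flips)

def posterior_odds(flips, prior_odds):
    """Posterior odds = likelihood ratio x prior odds."""
    return likelihood_ratio(flips) * prior_odds

for seq in ("HHHHHHHHHH", "HHTHTHTTHT"):
    print(seq, float(likelihood_ratio(seq)))
```

Ten heads gives a likelihood ratio of 1024/11 (about 93), while HHTHTHTTHT gives 256/693 (about 0.37), matching the "high" and "low" labels. With a hypothetical prior odds of 1/1000, the ten-heads posterior odds come out at about 0.09.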

HHHHHHHHHH: likelihood ratio is high, prior odds are low, posterior odds are middling → coincidence
HHTHTHTTHT: likelihood ratio is low, prior odds are low, posterior odds are low → chance

Empirical tests

• Is this definition correct?
  – from coincidence to evidence
• How do people assess complex coincidences?
  – the bombing of London
  – coincidences in date

HHHHHHHHHH: likelihood ratio is high, prior odds are low, posterior odds are middling → coincidence
HHHHHHHHHHHHHHHHHHHHHH: likelihood ratio is very high, prior odds are low, posterior odds are high → cause

From coincidence to evidence

coincidence → evidence for a causal relation

• Transition produced by
  – increase in likelihood ratio (e.g., coin flipping)
  – increase in prior odds (e.g., genetics vs. ESP)

Testing the definition

• Provide participants with data from experiments
• Manipulate:
  – cover story: genetics vs. ESP (prior)
  – data: number of heads/males (likelihood)
  – task: “coincidence or evidence?” vs. “how likely?”
• Predictions:
  – coincidences affected by prior and likelihood
  – relationship between coincidence and posterior
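To see how the two manipulations combine, this sketch computes p(cause | d) for each number of heads/males, using the uniform-p(E) likelihood for cause and p(E) = 0.5 for chance as in the causal induction model. The number of observations per data set (n = 100) and the prior odds for the genetics and ESP cover stories are hypothetical values, not taken from the slides.

```python
from fractions import Fraction
from math import factorial

N = 100  # hypothetical number of observations per data set

def likelihood_ratio(k, n=N):
    """p(d | cause) / p(d | chance) for k heads/males out of n.

    cause: p(E) uniform on (0, 1), integrated out -> k!(n-k)!/(n+1)!
    chance: p(E) = 0.5 -> 1/2**n
    """
    return Fraction(factorial(k) * factorial(n - k) * 2**n, factorial(n + 1))

def posterior_cause(k, prior_odds):
    """p(cause | d), from posterior odds = likelihood ratio x prior odds."""
    odds = likelihood_ratio(k) * prior_odds
    return odds / (1 + odds)

# Hypothetical prior odds p(cause)/p(chance) for the two cover stories
PRIOR_ODDS = {"genetics": Fraction(1, 100), "ESP": Fraction(1, 10**6)}

counts = [47, 51, 55, 59, 63, 70, 87, 99]
table = {story: [posterior_cause(k, o) for k in counts] for story, o in PRIOR_ODDS.items()}
for story, row in table.items():
    print(story, [round(float(p), 3) for p in row])
```

Above 50 the posterior rises with the count, and for every count it stays lower under the ESP cover story, which is the pattern the predictions describe.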

[Figure: proportion of “coincidence” judgments and posterior probability vs. number of heads/males (47, 51, 55, 59, 63, 70, 87, 99); r = -0.98]


Complex coincidences

• Many coincidences involve structure hidden in a sea of noise (e.g., bombing of London)
• How well do people detect such structure?
• Strategy: examine correspondence between strength of coincidence and likelihood ratio

[Figure: people's ratings for bomb-location stimuli, by change in spread, location, ratio, and number (relative to uniform)]

Likelihoods: chance: uniform; cause: uniform + regularity
Priors: cause: p(cause); chance: 1 − p(cause)
Data: bomb locations

[Figure: people vs. Bayes, by change in spread, location, ratio, and number (relative to uniform); r = 0.98]
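A minimal sketch of a "uniform vs. uniform + regularity" likelihood ratio for point locations. The form of the regularity is an assumption: a single Gaussian cluster (s.d. `sigma`) mixed with the uniform background, with the cluster centre and mixing weight averaged over by Monte Carlo sampling. The point sets are made up for illustration.

```python
import numpy as np

def log_likelihood_ratio(points, sigma=0.05, n_samples=20_000, seed=0):
    """Monte Carlo estimate of log p(X | cause) - log p(X | chance).

    chance: every point is uniform on the unit square (density 1).
    cause (hypothetical form): each point is drawn from a Gaussian cluster
    (centre mu, s.d. sigma) with probability w, otherwise uniformly; mu is
    uniform on the square and w uniform on (0, 1), both averaged over by
    sampling. The Gaussian is not truncated to the square (an approximation).
    """
    rng = np.random.default_rng(seed)
    mu = rng.uniform(0.0, 1.0, size=(n_samples, 1, 2))
    w = rng.uniform(0.0, 1.0, size=(n_samples, 1))
    sq_dist = ((points[None, :, :] - mu) ** 2).sum(axis=-1)  # (n_samples, n_points)
    cluster = np.exp(-sq_dist / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    log_joint = np.log(w * cluster + (1.0 - w)).sum(axis=1)  # log p(X | mu, w)
    # Log of the average of p(X | mu, w) over samples, computed stably.
    peak = log_joint.max()
    log_p_cause = peak + np.log(np.mean(np.exp(log_joint - peak)))
    log_p_chance = 0.0  # product of uniform densities equal to 1
    return float(log_p_cause - log_p_chance)

rng = np.random.default_rng(1)
clustered = np.clip(0.3 + 0.03 * rng.standard_normal((20, 2)), 0.0, 1.0)
xs, ys = np.meshgrid((np.arange(5) + 0.5) / 5, (np.arange(4) + 0.5) / 4)
spread_out = np.column_stack([xs.ravel(), ys.ravel()])
print(log_likelihood_ratio(clustered), log_likelihood_ratio(spread_out))
```

A tight cluster yields a large positive log likelihood ratio. Evenly spread points yield a negative one, because the cause hypothesis spreads its probability over many possible clusters that the data do not use.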


Coincidences in date

May 14, July 8, August 21, December 25
vs.
August 3, August 3, August 3, August 3

[Figure: people's ratings of coincidence strength]

Likelihoods: chance: uniform; cause: uniform + regularity
Priors: cause: p(cause); chance: 1 − p(cause)
Data: birthdays of those present

[Figure: people vs. Bayes]

Regularities:
• Proximity in date
• Same day of month
• Same month
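A minimal sketch of the birthday model for two of the listed regularities, "same month" and "same day of month" ("proximity in date" is left out). The mixture form, the uniform average over the 43 candidate sets, and the uniform weight on the regularity are assumptions for illustration, not the model's published parameters.

```python
import calendar
from datetime import date

YEAR = 2001  # any non-leap year: 365 equally likely birthdays under chance
N_DAYS = 365

def candidate_regularities():
    """Hypothetical regularity sets: 'same month' (12 sets) and
    'same day of month' (31 sets). 'Proximity in date' is omitted here."""
    all_days = [date(YEAR, m, d) for m in range(1, 13)
                for d in range(1, calendar.monthrange(YEAR, m)[1] + 1)]
    same_month = [{x for x in all_days if x.month == m} for m in range(1, 13)]
    same_day = [{x for x in all_days if x.day == d} for d in range(1, 32)]
    return same_month + same_day

def likelihood_ratio(birthdays, n_grid=1000):
    """p(birthdays | cause) / p(birthdays | chance).

    chance: each birthday is uniform over the 365 days of YEAR.
    cause: each birthday falls uniformly in a regularity set S with
    probability w, otherwise uniformly over the year; S (uniform over the
    candidates) and w (uniform on (0, 1), midpoint grid) are averaged over.
    """
    sets = candidate_regularities()
    weights = [(i + 0.5) / n_grid for i in range(n_grid)]
    total = 0.0
    for s in sets:
        boost = N_DAYS / len(s)  # density inside S relative to 1/365
        for w in weights:
            ratio = 1.0
            for b in birthdays:
                ratio *= w * (boost if b in s else 0.0) + (1.0 - w)
            total += ratio
    return total / (len(sets) * len(weights))

spread = [date(YEAR, 5, 14), date(YEAR, 7, 8), date(YEAR, 8, 21), date(YEAR, 12, 25)]
same = [date(YEAR, 8, 3)] * 4
print(likelihood_ratio(spread), likelihood_ratio(same))
```

Four shared birthdays give a likelihood ratio in the thousands. The spread-out dates give a ratio below 1, so under this sketch they count as evidence for chance.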

Coincidences

• Provide evidence for causal structure, but not enough to make us believe that structure exists
• Intimately related to causal induction
  – an opportunity to revise a theory
  – a window on the process of discovery
• Guided by a well-calibrated sense of when an event provides evidence of causal structure

The paradox of coincidences

Status of current theory    Consequence
false                       significant discovery
true                        false conclusion

The utility of attending to coincidences depends upon how much you know already.

Subjective randomness

• View randomness as an inference about generating processes behind data
• Analysis similar (but inverse) to coincidences
  – randomness is evidence against a regular generating process

(Griffiths & Tenenbaum, 2003)


Other cases of causal induction

(Griffiths, Baraff, & Tenenbaum, 2004)

Aspects of language acquisition

(Goldwater, Griffiths, & Johnson, 2006)

Categorization

$$P(c \mid x) = \frac{P(x \mid c)\, P(c)}{\sum_{c'} P(x \mid c')\, P(c')}$$

[Figure: probability as a function of x]

(Sanborn, Griffiths, & Navarro, 2006)
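The categorization formula is Bayes' theorem with categories c as hypotheses and a stimulus x as data. A minimal sketch, assuming two hypothetical categories with Gaussian distributions on a single feature dimension (means, standard deviations, and priors chosen for illustration):

```python
import math

def gaussian_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def category_posterior(x, categories):
    """P(c | x) = P(x | c) P(c) / sum over c' of P(x | c') P(c')."""
    joint = {c: gaussian_pdf(x, m, s) * prior for c, (m, s, prior) in categories.items()}
    total = sum(joint.values())
    return {c: v / total for c, v in joint.items()}

# Hypothetical categories on a one-dimensional feature x: (mean, sd, prior)
categories = {"A": (0.0, 1.0, 0.5), "B": (3.0, 1.0, 0.5)}
for x in (0.0, 1.5, 3.0):
    print(x, category_posterior(x, categories))
```

With equal priors and variances, the posterior crosses 0.5 at x = 1.5, midway between the category means.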

Conclusions

• We can learn about cognition (and not just perception) by thinking about optimal solutions to computational problems
• We can study induction using the inferences that people make every day
• Bayesian inference offers a way to understand these inductive inferences

Magic tricks

Magic tricks are regularly used to identify infants’ ontological commitments

Can we use a similar method with adults?

(Wynn, 1992)

Ontological commitments

(Keil, 1981)

What’s a better magic trick?










What’s a better magic trick?

• Participants rate the quality of 45 transformations, 10 appearances, and 10 disappearances
  – direction of transformation is randomized between subjects
• A second group rates similarity
• Objects are chosen to lie at different points in a hierarchy

Objects: milk, water, a brick, a vase, a rose, a daffodil, a dove, a blackbird, a man, a girl

[Figure: applicable predicates for each object, and object-by-object matrices over the same ten objects]

What’s a better magic trick?

Ontological asymmetries

[Figure: object-by-object matrices over milk, water, a brick, a vase, a rose, a daffodil, a dove, a blackbird, a man, a girl]

Analyzing asymmetry

• Build a regression model:
  – similarity
  – appearing object
  – disappearing object
  – contains people
  – direction in hierarchy (−1, 0, 1)
• All factors significant
• Explains 90.9% of variance

[Figure: applicable predicates for milk, water, a brick, a vase, a rose, a daffodil, a dove, a blackbird, a man, a girl]

Summary: magic tricks

• Certain factors reliably influence the estimated quality of a magic trick
• Magic tricks might be a way to investigate our ontological assumptions
  – inviolable laws that are otherwise hard to assess
• A Bayesian theory of magic tricks?
  – strong evidence for a novel causal force
  – causal force is given low prior probability

A reformulation: unlikely kinds

• Coincidences are events of an unlikely kind
  – e.g. a sequence with that number of heads
• Deals with the obvious problem...

p(10 heads) < p(5 heads, 5 tails)

Problems with unlikely kinds

• Defining kinds

August 3, August 3, August 3, August 3
January 12, March 22, March 22, July 19, October 1, December 8

Problems with unlikely kinds

• Defining kinds
• Counterexamples

P(4 heads) < P(2 heads, 2 tails)
P(4 heads) > P(15 heads, 8 tails)

HHHH > HHHHTHTTHHHTHTHHTHTTHHH
HHHH > HHTT
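The "unlikely kinds" comparisons are binomial arithmetic. Taking a kind to be "all sequences with the same numbers of heads and tails" (the slides' example; other definitions of kinds are possible), the probabilities can be checked directly under a fair coin:

```python
from fractions import Fraction
from math import comb

def p_kind(heads, tails):
    """Probability, under a fair coin, of the kind made of all sequences
    with this many heads and tails: C(n, heads) / 2**n."""
    n = heads + tails
    return Fraction(comb(n, heads), 2**n)

def p_kind_of(sequence):
    """Kind probability for a concrete H/T sequence (kind = its head count)."""
    return p_kind(sequence.count("H"), sequence.count("T"))

print(p_kind(10, 0), p_kind(5, 5))
print(p_kind_of("HHHH"), p_kind_of("HHTT"), float(p_kind_of("HHHHTHTTHHHTHTHHTHTTHHH")))
```

The kind account gets the first comparison right (1/1024 < 63/256). However, it rates the 23-flip sequence with 15 heads and 8 tails (kind probability about 0.058) as less probable, and hence more coincidental, than HHHH (1/16 = 0.0625). That conflicts with HHHH > HHHHTHTTHHHTHTHHTHTTHHH.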

Sampling from categories

[Figure: frog distribution P(x|c)]

Markov chain Monte Carlo

• Sample from a target distribution P(x) by constructing a Markov chain for which P(x) is the stationary distribution
• The Markov chain converges to its stationary distribution, providing outcomes that can be used similarly to samples

Metropolis-Hastings algorithm

[Figure: a target distribution p(x) with two proposed moves, accepted with probability A(x(t), x(t+1)) = 0.5 and A(x(t), x(t+1)) = 1 respectively]
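The acceptance probabilities of 0.5 and 1 are what the standard Metropolis rule A(x(t), x(t+1)) = min(1, p(x(t+1))/p(x(t))) gives when the proposal distribution is symmetric, so the Hastings correction for asymmetric proposals drops out. A minimal sketch with a Gaussian random-walk proposal and a hypothetical standard-normal target. It works in log space, so p(x) need only be known up to a constant.

```python
import math
import random

def metropolis(log_p, x0, n_steps, step_size=1.0, seed=0):
    """Metropolis sampler with a symmetric Gaussian random-walk proposal.

    Acceptance probability A(x_t, x_{t+1}) = min(1, p(x_{t+1}) / p(x_t));
    log_p only needs to be known up to an additive constant.
    """
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step_size)
        log_accept = min(0.0, log_p(proposal) - log_p(x))
        if rng.random() < math.exp(log_accept):
            x = proposal
        samples.append(x)
    return samples

# Hypothetical target: standard normal, unnormalized log density.
samples = metropolis(lambda x: -0.5 * x * x, x0=5.0, n_steps=50_000)
burned = samples[5_000:]  # discard early samples taken before convergence
mean = sum(burned) / len(burned)
var = sum((x - mean) ** 2 for x in burned) / len(burned)
print(round(mean, 3), round(var, 3))
```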

A task

Ask subjects which of two alternatives comes from a target category

Which animal is a frog?

Collecting the samples

[Figure: Trial 1, Trial 2, Trial 3, each asking “Which is the frog?”]
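In the two-alternative task, the current state and a proposal are shown together, and the chosen stimulus becomes the next state. Suppose a participant picks each option with probability proportional to its category probability (a Luce choice rule, assumed here rather than stated on the slides). Each choice then implements Barker's acceptance rule A = p(x(t+1)) / (p(x(t)) + p(x(t+1))), which, with a symmetric proposal, also leaves p(x) as the stationary distribution. The sketch simulates such an idealized participant on a hypothetical one-dimensional "frog" distribution.

```python
import math
import random

def simulated_choice(p, current, proposal, rng):
    """Hypothetical participant: picks the proposal with probability
    p(proposal) / (p(current) + p(proposal)) (a Luce choice rule)."""
    a, b = p(current), p(proposal)
    return proposal if rng.random() < b / (a + b) else current

def mcmc_with_people(p, x0, n_trials, step_size=0.5, seed=0):
    """Chain whose transitions are the simulated participant's choices."""
    rng = random.Random(seed)
    x, chain = x0, []
    for _ in range(n_trials):
        proposal = x + rng.gauss(0.0, step_size)  # symmetric proposal
        x = simulated_choice(p, x, proposal, rng)
        chain.append(x)
    return chain

def frog(x):
    """Hypothetical unnormalized P(x | frog): normal with mean 2, s.d. 0.5."""
    return math.exp(-0.5 * ((x - 2.0) / 0.5) ** 2)

chain = mcmc_with_people(frog, x0=0.0, n_trials=40_000)
kept = chain[2_000:]  # discard early trials
mean = sum(kept) / len(kept)
var = sum((x - mean) ** 2 for x in kept) / len(kept)
print(round(mean, 3), round(var, 3))
```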

Sampling from natural categories

Examined distributions for four natural categories: giraffes, horses, cats, and dogs

Stimuli were nine-parameter stick figures (Olman & Kersten, 2004)

[Figure: choice task]

[Figure: samples from Subject 3 (projected onto a plane)]

[Figure: mean animals by subject (S1 to S8) for giraffe, horse, cat, and dog]

Markov chain Monte Carlo with people

• Rational models can guide the design of psychological experiments
• Markov chain Monte Carlo (and other methods) can be used to sample from subjective probability distributions
  – category distributions
  – prior distributions
