Everyday inductive leaps
Making predictions and detecting coincidences
Tom Griffiths
Department of Psychology
Program in Cognitive Science
University of California, Berkeley
(joint work with Josh Tenenbaum, MIT)
Inductive problems
• Inferring structure from data
• Perception
– e.g. structure of 3D world from 2D visual data
– hypotheses: cube, shaded hexagon
• Cognition
– e.g. whether a process is random
– hypotheses: fair coin, two heads
– data: HHHHH
Perception is optimal
Cognition is not
Everyday inductive leaps
• Inferences we make effortlessly every day
– making predictions
– detecting coincidences
– evaluating randomness
– learning causal relationships
– identifying categories
– picking out regularities in language
• A chance to study induction in microcosm, and compare cognition to optimal solutions
Two everyday inductive leaps
Predicting the future
Detecting coincidences
Predicting the future
How often is Google News updated?
t = time since last update
t_total = time between updates
What should we guess for t_total given t?
Reverend Thomas Bayes
Bayes' theorem
\[ p(h \mid d) = \frac{p(d \mid h)\, p(h)}{\sum_{h' \in H} p(d \mid h')\, p(h')} \]
p(h | d): posterior probability
p(d | h): likelihood
p(h): prior probability
Denominator: sum over space of hypotheses
h: hypothesis
d: data
Bayes' theorem
\[ p(h \mid d) \propto p(d \mid h)\, p(h) \]
h: hypothesis
d: data
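As a small worked instance of Bayes' theorem, the sketch below applies it to the coin example from the opening slides (data HHHHH; hypotheses: fair coin vs. two heads). The prior values are illustrative assumptions, not figures from the talk, and the function name `posterior` is hypothetical.

```python
def posterior(priors, likelihoods):
    """Return p(h | d) for each hypothesis h via Bayes' theorem."""
    # Numerator p(d | h) p(h) for every hypothesis
    joint = {h: likelihoods[h] * priors[h] for h in priors}
    # Denominator: sum over the space of hypotheses
    evidence = sum(joint.values())
    return {h: joint[h] / evidence for h in joint}

# Assumed prior: two-headed coins are rare (hypothetical value).
priors = {"fair coin": 0.999, "two heads": 0.001}
# Likelihood of observing HHHHH under each hypothesis.
likelihoods = {"fair coin": 0.5 ** 5, "two heads": 1.0}

post = posterior(priors, likelihoods)
print(post)
```

Under these assumed priors, five heads in a row raise the probability of the two-headed coin from 0.1% to roughly 3%.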
Bayesian inference
\[ p(t_{\text{total}} \mid t) \propto p(t \mid t_{\text{total}})\, p(t_{\text{total}}) \]
posterior probability ∝ likelihood × prior
Assume t is a random sample (0 < t < t_total), so the likelihood is 1/t_total:
\[ p(t_{\text{total}} \mid t) \propto \frac{1}{t_{\text{total}}}\, p(t_{\text{total}}) \]
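The prediction rule on this slide can be sketched numerically. The code below is a minimal illustration, assuming a uniform grid of candidate t_total values and taking the posterior median as the prediction. The function name `predict_t_total`, the grid bounds, and both priors are assumptions introduced for the example. With the scale-free prior 1/t_total the exact posterior median is 2t, which is the form Gott's rule is usually stated in.

```python
import numpy as np

def predict_t_total(t, prior_pdf, grid):
    """Posterior median of t_total given t, on a uniform grid of candidate values."""
    grid = np.asarray(grid, dtype=float)
    # Likelihood of a random sample t in (0, t_total): 1/t_total if t_total >= t, else 0
    likelihood = np.where(grid >= t, 1.0 / grid, 0.0)
    unnormalized = likelihood * prior_pdf(grid)
    cdf = np.cumsum(unnormalized)
    cdf /= cdf[-1]
    # First grid point at which the cumulative posterior mass reaches one half
    return grid[np.searchsorted(cdf, 0.5)]

grid = np.linspace(1.0, 5000.0, 500_000)

# Assumed uninformative prior p(t_total) proportional to 1/t_total
gott = predict_t_total(10.0, lambda x: 1.0 / x, grid)

# Assumed Gaussian prior (mean 75 and sd 15 are hypothetical values)
gaussian = predict_t_total(
    78.0, lambda x: np.exp(-0.5 * ((x - 75.0) / 15.0) ** 2), grid
)
print(gott, gaussian)
```

The first prediction comes out close to 2t (slightly below, because the grid truncates the heavy tail), while the Gaussian prior yields a prediction only modestly above the observed t.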
The effects of priors
Different kinds of priors p(t_total) are appropriate in different domains
[Figure: two example priors, e.g. wealth and e.g. height]
Evaluating human predictions
• Different domains with different priors:
– a movie has made $60 million [power-law]
– your friend quotes from line 17 of a poem [power-law]
– you meet a 78-year-old man [Gaussian]
– a movie has been running for 55 minutes [Gaussian]
– a U.S. congressman has served 11 years [Erlang]
• Prior distributions derived from actual data
• Use 5 values of t for each
• People predict t_total
[Figure legend: people, parametric prior, empirical prior, Gott's rule]
Probability matching
Average over all prediction tasks:
• movie run times
• movie grosses
• poem lengths
• life spans
• terms in congress
• cake baking times
[Figure: p(t_total | t_past) as a function of t_total; proportion of judgments below predicted value against quantile of Bayesian posterior distribution]
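One reading of the probability-matching plot is that individual judgments behave like samples from the posterior, in which case the proportion of judgments below the q-th posterior quantile should be close to q. The simulation below is a sketch of that idea only. It assumes the posterior that follows from a 1/t_total prior (CDF F(x) = 1 - t/x for x > t), and the simulated "judgments" are synthetic, not data from the study.

```python
import numpy as np

rng = np.random.default_rng(0)
t = 10.0

# Assumed posterior: p(t_total | t) proportional to t / t_total^2 for t_total > t,
# with CDF F(x) = 1 - t/x, so the q-th quantile is t / (1 - q).
def posterior_quantile(q):
    return t / (1.0 - q)

# Synthetic "judgments": each is one draw from the posterior (inverse-CDF sampling).
judgments = posterior_quantile(rng.uniform(size=10_000))

for q in (0.25, 0.5, 0.75):
    below = np.mean(judgments < posterior_quantile(q))
    print(f"quantile {q:.2f}: proportion of judgments below = {below:.3f}")
```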
Predicting the future
• People produce accurate predictions for the duration and extent of everyday events
• Strong prior knowledge
– form of the prior (power-law or exponential)
– distribution given that form (parameters)
• Contrast with “base rate neglect” (Kahneman & Tversky, 1973)
Two everyday inductive leaps
Predicting the future
Detecting coincidences
November 12, 2001: New Jersey lottery results were 5-8-7, the same day that American Airlines flight 587 crashed
"It could be that, collectively, the people in New York caused those lottery numbers to come up 911," says Henry Reed. A psychologist who specializes in intuition, he teaches seminars at the Edgar Cayce Association for Research and Enlightenment in Virginia Beach, VA.
"If enough people all are thinking the same thing, at the same time, they can cause events to happen," he says. "It's called psychokinesis."
The bombing of London
(Gilovich, 1991)
John Snow and cholera
(Snow, 1855)
[Figure: intervals of 76 years and 75 years]
(Halley, 1752)
The paradox of coincidences
How can coincidences simultaneously lead us to irrational conclusions and significant discoveries?
A common definition: Coincidences are unlikely events
“an event which seems so unlikely that it is worth telling a story about”
“we sense that it is too unlikely to have been the result of luck or mere chance”
Coincidences are not just unlikely...
HHHHHHHHHH
vs.
HHTHTHTTHT
Bayesian causal induction
Hypotheses: cause (a novel causal relationship exists) vs. chance (no such relationship exists)
Priors: p(cause), p(chance)
Data: d
Likelihoods: p(d | cause), p(d | chance)
\[ \frac{p(\text{cause} \mid d)}{p(\text{chance} \mid d)} = \frac{p(d \mid \text{cause})}{p(d \mid \text{chance})} \cdot \frac{p(\text{cause})}{p(\text{chance})} \]
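Bayesian causal induction
\[ \underbrace{\frac{p(\text{cause} \mid d)}{p(\text{chance} \mid d)}}_{\text{posterior odds}} = \underbrace{\frac{p(d \mid \text{cause})}{p(d \mid \text{chance})}}_{\text{likelihood ratio (evidence)}} \cdot \underbrace{\frac{p(\text{cause})}{p(\text{chance})}}_{\text{prior odds}} \]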
[Figure: grid of likelihood ratio (evidence), high or low, against prior odds, high or low, with cells labeled cause, chance, coincidence, and ?]
What makes a coincidence?
A coincidence is an event that provides evidence for causal structure, but not enough evidence to make us believe that structure exists
\[ \frac{p(\text{cause} \mid d)}{p(\text{chance} \mid d)} = \frac{p(d \mid \text{cause})}{p(d \mid \text{chance})} \cdot \frac{p(\text{cause})}{p(\text{chance})} \]
• likelihood ratio is high
• prior odds are low
• posterior odds are middling
What makes a coincidence?
HHHHHHHHHH: likelihood ratio is high, prior odds are low, posterior odds are middling
HHTHTHTTHT
Bayesian causal induction
Data: frequency of effect in presence of cause
Hypotheses: cause, chance
[Graphs over nodes C and E for each hypothesis]
Likelihoods: 0 < p(E) < 1 (cause), p(E) = 0.5 (chance)
Priors: 1 -
HHHHHHHHHH: likelihood ratio is high, prior odds are low, posterior odds are middling → coincidence
HHTHTHTTHT: likelihood ratio is low, prior odds are low, posterior odds are low → chance
\[ \frac{p(\text{cause} \mid d)}{p(\text{chance} \mid d)} = \frac{p(d \mid \text{cause})}{p(d \mid \text{chance})} \cdot \frac{p(\text{cause})}{p(\text{chance})} \]
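The contrast between the two sequences can be checked with exact arithmetic. The sketch below assumes a uniform prior over p(E) under the cause hypothesis (the slides state only 0 < p(E) < 1) and p(E) = 0.5 under chance. The function names and the prior odds of 1/100 are hypothetical.

```python
from fractions import Fraction
from math import factorial

def likelihood_ratio(sequence):
    """p(d | cause) / p(d | chance) for a string of 'H' and 'T'."""
    n = len(sequence)
    k = sequence.count("H")
    # p(d | cause): integral of p^k (1 - p)^(n - k) over a uniform prior on p,
    # which equals k! (n - k)! / (n + 1)!
    p_cause = Fraction(factorial(k) * factorial(n - k), factorial(n + 1))
    # p(d | chance): every specific sequence has probability 0.5^n
    p_chance = Fraction(1, 2) ** n
    return p_cause / p_chance

def posterior_odds(sequence, prior_odds):
    return likelihood_ratio(sequence) * prior_odds

for seq in ("HHHHHHHHHH", "HHTHTHTTHT"):
    lr = likelihood_ratio(seq)
    # Prior odds of 1/100 are an illustrative assumption.
    print(seq, float(lr), float(posterior_odds(seq, Fraction(1, 100))))
```

Under these assumptions the all-heads sequence has a likelihood ratio of 1024/11 (about 93), while the mixed sequence has a ratio below 1, so only the first counts as evidence for a cause.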
Empirical tests
• Is this definition correct?
– from coincidence to evidence
• How do people assess complex coincidences?
– the bombing of London
– coincidences in date
HHHHHHHHHH: likelihood ratio is high, prior odds are low, posterior odds are middling → coincidence
HHHHHHHHHHHHHHHHHHHHHH: likelihood ratio is very high, prior odds are low, posterior odds are high → cause
From coincidence to evidence
coincidence → evidence for a causal relation
Posterior odds: \[ \frac{p(\text{cause} \mid d)}{p(\text{chance} \mid d)} \]
• Transition produced by
– increase in likelihood ratio (e.g., coin flipping)
– increase in prior odds (e.g., genetics vs. ESP)
Testing the definition
• Provide participants with data from experiments
• Manipulate:
– cover story: genetics vs. ESP (prior)
– data: number of heads/males (likelihood)
– task: “coincidence or evidence?” vs. “how likely?”
• Predictions:
– coincidences affected by prior and likelihood
– relationship between coincidence and posterior
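To illustrate how the two manipulations could interact, the sketch below computes the posterior probability of a cause for several counts under two different priors. Everything numeric here is an assumption made for the example: 100 trials per experiment, a uniform prior over p(E) under the cause hypothesis, and the two prior probabilities standing in for the genetics and ESP cover stories.

```python
from math import comb

def posterior_cause(k, n, prior_cause):
    """p(cause | d) for k heads (or males) in n trials."""
    # Likelihood ratio with a uniform prior on p(E) under 'cause' and
    # p(E) = 0.5 under 'chance': [1 / ((n + 1) * C(n, k))] / [0.5 ** n]
    likelihood_ratio = 2 ** n / ((n + 1) * comb(n, k))
    prior_odds = prior_cause / (1.0 - prior_cause)
    odds = likelihood_ratio * prior_odds
    return odds / (1.0 + odds)

N_TRIALS = 100  # assumed; not stated on the slide
PRIORS = {"genetics": 0.2, "ESP": 0.001}  # hypothetical prior probabilities

for k in (47, 51, 55, 59, 63, 70, 87, 99):
    row = {story: round(posterior_cause(k, N_TRIALS, p), 4) for story, p in PRIORS.items()}
    print(k, row)
```

With a higher prior (the genetics story in this sketch), fewer heads or males are needed before the posterior moves from "coincidence" territory toward "evidence".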
[Figure: proportion “coincidence” and posterior probability against number of heads/males (47, 51, 55, 59, 63, 70, 87, 99); r = -0.98]
Empirical tests
• Is this definition correct?
– from coincidence to evidence
• How do people assess complex coincidences?
– the bombing of London
– coincidences in date
Complex coincidences
• Many coincidences involve structure hidden in a sea of noise (e.g., bombing of London)
• How well do people detect such structure?
• Strategy: examine correspondence between strength of coincidence and likelihood ratio
\[ \frac{p(\text{cause} \mid d)}{p(\text{chance} \mid d)} = \frac{p(d \mid \text{cause})}{p(d \mid \text{chance})} \cdot \frac{p(\text{cause})}{p(\text{chance})} \]
The bombing of London
[Figure: People, by condition: (uniform), and change in spread, location, ratio, number]
Bayesian causal induction
Data: bomb locations
Hypotheses: cause, chance
[Graphs over nodes T and X for each hypothesis]
Likelihoods: uniform + regularity (cause), uniform (chance)
Priors: 1 -
[Figure: People and Bayes, by condition: (uniform), and change in spread, location, ratio, number; r = 0.98]
Coincidences in date
May 14, July 8, August 21, December 25
vs.
August 3, August 3, August 3, August 3
[Figure: People]
Bayesian causal induction
Data: birthdays of those present
Hypotheses: cause, chance
[Graphs over nodes P and B for each hypothesis; example regularity: August]
Likelihoods: uniform + regularity (cause), uniform (chance)
Priors: 1 -
[Figure: People and Bayes]
Regularities:
• Proximity in date
• Same day of month
• Same month
Coincidences
• Provide evidence for causal structure, but not enough to make us believe that structure exists
• Intimately related to causal induction
– an opportunity to revise a theory
– a window on the process of discovery
• Guided by a well-calibrated sense of when an event provides evidence of causal structure
The paradox of coincidences
Status of current theory    Consequence
false                       significant discovery
true                        false conclusion
The utility of attending to coincidences depends upon how much you know already
Two everyday inductive leaps
Predicting the future
Detecting coincidences
Subjective randomness
• View randomness as an inference about generating processes behind data
• Analysis similar (but inverse) to coincidences
– randomness is evidence against a regular generating process
(Griffiths & Tenenbaum, 2003)
Other cases of causal induction
(Griffiths, Baraff, & Tenenbaum, 2004)
Aspects of language acquisition
(Goldwater, Griffiths, & Johnson, 2006)
Categorization
\[ P(c \mid x) = \frac{P(x \mid c)\, P(c)}{\sum_{c} P(x \mid c)\, P(c)} \]
[Figure: probability as a function of x]
(Sanborn, Griffiths, & Navarro, 2006)
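The categorization rule can be illustrated with a toy case. The sketch below assumes two one-dimensional Gaussian categories with made-up priors, means, and standard deviations. None of these values, nor the function names, come from the cited work.

```python
import math

def gaussian_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def categorize(x, categories):
    """Return P(c | x) for each category c, given (prior, mean, sd) per category."""
    # Numerator P(x | c) P(c) for each category
    joint = {c: prior * gaussian_pdf(x, mean, sd)
             for c, (prior, mean, sd) in categories.items()}
    # Denominator: sum of the numerators over all categories
    total = sum(joint.values())
    return {c: joint[c] / total for c in joint}

# Hypothetical categories: (prior P(c), mean, sd)
categories = {"A": (0.5, 0.0, 1.0), "B": (0.5, 2.0, 1.0)}

midpoint = categorize(1.0, categories)
near_a = categorize(0.0, categories)
print(midpoint, near_a)
```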
Conclusions
• We can learn about cognition (and not just perception) by thinking about optimal solutions to computational problems
• We can study induction using the inferences that people make every day
• Bayesian inference offers a way to understand these inductive inferences
Magic tricks
Magic tricks are regularly used to identify infants’ ontological commitments (Wynn, 1992)
Can we use a similar method with adults?
Ontological commitments
(Keil, 1981)
What’s a better magic trick?
What’s a better magic trick?
• Participants rate the quality of 45 transformations, 10 appearances, and 10 disappearances
– direction of transformation is randomized between subjects
• A second group rates similarity
• Objects are chosen to lie at different points in a hierarchy
Objects: milk, water, a brick, a vase, a rose, a daffodil, a dove, a blackbird, a man, a girl
[Figures: the objects ordered by applicable predicates; grids with the same ten objects on both axes]
What’s a better magic trick?
Ontological asymmetries
[Figure: grids with the same ten objects (milk, water, a brick, a vase, a rose, a daffodil, a dove, a blackbird, a man, a girl) on both axes]
Analyzing asymmetry
• Build a regression model:
– similarity
– appearing object
– disappearing object
– contains people
– direction in hierarchy (-1, 0, 1)
• All factors significant
• Explains 90.9% of variance
Summary: magic tricks
• Certain factors reliably influence the estimated quality of a magic trick
• Magic tricks might be a way to investigate our ontological assumptions
– inviolable laws that are otherwise hard to assess
• A Bayesian theory of magic tricks?
– strong evidence for a novel causal force
– causal force is given low prior probability
A reformulation: unlikely kinds
• Coincidences are events of an unlikely kind
– e.g. a sequence with that number of heads
• Deals with the obvious problem...
p(10 heads) < p(5 heads, 5 tails)
Problems with unlikely kinds
• Defining kinds
– August 3, August 3, August 3, August 3
– January 12, March 22, March 22, July 19, October 1, December 8
• Counterexamples
– P(4 heads) < P(2 heads, 2 tails)
– P(4 heads) > P(15 heads, 8 tails)
– yet as coincidences: HHHH > HHHHTHTTHHHTHTHHTHTTHHH
– and HHHH > HHTT
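The probabilities in these counterexamples can be verified directly, assuming a fair coin and treating a "kind" as the set of sequences with a given number of heads and tails. The helper name `p_kind` is introduced here for illustration.

```python
from math import comb

def p_kind(heads, tails):
    """Probability that fair-coin flips give exactly this many heads and tails."""
    n = heads + tails
    return comb(n, heads) / 2 ** n

print(p_kind(4, 0), p_kind(2, 2))    # 4 heads is a rarer kind than 2 heads, 2 tails
print(p_kind(4, 0), p_kind(15, 8))   # but a more common kind than 15 heads, 8 tails
print(p_kind(10, 0), p_kind(5, 5))   # the earlier comparison with 10 flips
```

So "unlikely kind" orders HHHH below the 23-flip sequence, even though HHHH is the stronger coincidence.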
Sampling from categories
Frog distribution
P(x|c)
Markov chain Monte Carlo
• Sample from a target distribution P(x) by constructing a Markov chain for which P(x) is the stationary distribution
• Markov chain converges to its stationary distribution, providing outcomes that can be used similarly to samples
Metropolis-Hastings algorithm
[Animation: successive proposals shown against the target density p(x)]
Example acceptance probabilities: A(x^(t), x^(t+1)) = 0.5 and A(x^(t), x^(t+1)) = 1
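A minimal sketch of the algorithm, assuming a symmetric Gaussian random-walk proposal, in which case the Metropolis-Hastings acceptance probability reduces to A = min(1, p(x') / p(x)). The target density, step size, and function name are choices made for this example rather than details from the slides.

```python
import math
import random

def metropolis(log_p, x0, n_samples, step=1.0, seed=0):
    """Sample from a density proportional to exp(log_p(x)) with random-walk proposals."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        # Acceptance probability A(x, proposal) = min(1, p(proposal) / p(x))
        accept_prob = math.exp(min(0.0, log_p(proposal) - log_p(x)))
        if rng.random() < accept_prob:
            x = proposal
        samples.append(x)
    return samples

# Assumed target: a standard normal, known only up to its normalizing constant
samples = metropolis(lambda x: -0.5 * x * x, x0=0.0, n_samples=20_000)
kept = samples[2_000:]  # discard early samples as burn-in
mean = sum(kept) / len(kept)
var = sum((s - mean) ** 2 for s in kept) / len(kept)
print(mean, var)
```

Only ratios of p(x) are needed, which is why the normalizing constant can be left out of `log_p`.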
A task
Ask subjects which of two alternatives comes from a target category
Which animal is a frog?
Collecting the samples
Trial 1: Which is the frog?
Trial 2: Which is the frog?
Trial 3: Which is the frog?
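One way a sequence of two-alternative choices could act as a Markov chain sampler is sketched below. It rests on an assumption the slides do not state: that the simulated subject picks an alternative with probability proportional to its density under the category, p(x' | c) / (p(x' | c) + p(x | c)), which is a valid MCMC acceptance rule for symmetric proposals. The category density and all parameter values are hypothetical.

```python
import math
import random

def category_density(x, mean=3.0, sd=1.0):
    """Hypothetical subjective density p(x | c), known up to a constant."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2)

def run_choice_trials(n_trials, x0=0.0, step=1.0, seed=0):
    """Simulate trials in which the chosen stimulus carries over to the next trial."""
    rng = random.Random(seed)
    current = x0
    chosen = []
    for _ in range(n_trials):
        alternative = current + rng.gauss(0.0, step)
        p_cur = category_density(current)
        p_alt = category_density(alternative)
        # Assumed choice rule: pick the alternative with probability
        # p(alternative | c) / (p(alternative | c) + p(current | c))
        if rng.random() < p_alt / (p_alt + p_cur):
            current = alternative
        chosen.append(current)
    return chosen

chosen = run_choice_trials(20_000)
kept = chosen[2_000:]  # discard early trials as burn-in
sample_mean = sum(kept) / len(kept)
print(sample_mean)
```

Under this assumed choice rule, the chosen stimuli settle around the mean of the simulated category distribution, so the choices can be treated like samples from it.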
Sampling from natural categories
Examined distributions for four natural categories: giraffes, horses, cats, and dogs
Presented stimuli with nine-parameter stick figures (Olman & Kersten, 2004)
Choice task
Samples from Subject 3 (projected onto a plane)
Mean animals by subject
[Figure: mean giraffe, horse, cat, and dog for subjects S1 to S8]
Markov chain Monte Carlo with people
• Rational models can guide the design of psychological experiments
• Markov chain Monte Carlo (and other methods) can be used to sample from subjective probability distributions
– category distributions
– prior distributions