TRANSCRIPT

Causes and coincidences
Tom Griffiths
Cognitive and Linguistic Sciences
Brown University
“It could be that, collectively, the people in New York caused those lottery numbers to come up 9-1-1… If enough people all are thinking the same thing, at the same time, they can cause events to happen… It's called psychokinesis.”
[Figure: Halley's (1752) comet records, with appearances separated by 76 years and 75 years]
The paradox of coincidences
How can coincidences simultaneously lead us to irrational conclusions and significant discoveries?
Outline
1. A Bayesian approach to causal induction
2. Coincidences
   i. what makes a coincidence?
   ii. rationality and irrationality
   iii. the paradox of coincidences
3. Explaining inductive leaps
Causal induction
• Inferring causal structure from data
• A task we perform every day…
  – does caffeine increase productivity?
• …and throughout science
  – three comets or one?
Reverend Thomas Bayes
Bayes’ theorem

p(h | d) = p(d | h) p(h) / Σ_{h′ ∈ H} p(d | h′) p(h′)

posterior probability = likelihood × prior probability, normalized by a sum over the space of hypotheses
h: hypothesis
d: data
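As a minimal sketch of the theorem, this can be computed directly; the fair vs. two-headed coin setup and its numbers are purely illustrative, not from the talk:

```python
def posterior(prior, likelihood, d):
    """p(h | d) = p(d | h) p(h) / sum over h' of p(d | h') p(h')."""
    unnorm = {h: likelihood(d, h) * p for h, p in prior.items()}
    z = sum(unnorm.values())  # sum over the space of hypotheses
    return {h: v / z for h, v in unnorm.items()}

# Illustrative hypotheses: a coin is fair or two-headed (assumed numbers)
prior = {"fair": 0.9, "two-headed": 0.1}

def likelihood(d, h):
    if h == "fair":
        return 0.5 ** len(d)
    return 1.0 if all(flip == "H" for flip in d) else 0.0

post = posterior(prior, likelihood, "HHH")
print(post["fair"])  # 0.9 * (1/8) / (0.9/8 + 0.1), about 0.53
```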
Bayesian causal induction
Hypotheses: causal structures
Likelihoods:
Priors:
Data:
Causal graphical models (Pearl, 2000; Spirtes et al., 1993)
• Variables: X, Y, Z
• Structure: X → Z ← Y
• Conditional probabilities: p(x), p(y), p(z | x, y)
Defines a probability distribution over variables (for both observation and intervention)
Bayesian causal induction
Hypotheses: causal structures
Likelihoods: probability distribution over variables
Priors: a priori plausibility of structures
Data: observations of variables
Causal induction from contingencies

“Does C cause E?” (rate on a scale from 0 to 100)

                  E present (e+)   E absent (e-)
C present (c+)         a                b
C absent (c-)          c                d
Buehner & Cheng (1997)

“Does the chemical cause gene expression?” (rate on a scale from 0 to 100)

                         Gene expressed (e+)   Gene not expressed (e-)
Chemical present (c+)           6                       2
Chemical absent (c-)            4                       4
Examined human judgments for all values of P(e+|c+) and P(e+|c-) in increments of 0.25
How can we explain these judgments?
[Figure: people’s causal ratings (Buehner & Cheng, 1997)]
Bayesian causal induction
Hypotheses: “cause” (B → E ← C) vs. “chance” (B → E)
Likelihoods: each cause has an independent opportunity to produce the effect
Priors: p (cause), 1 − p (chance)
Data: frequency of cause-effect co-occurrence
Bayesian causal induction

p(cause | d) = p(d | cause) p(cause) / [p(d | cause) p(cause) + p(d | chance) p(chance)]
Bayesian causal induction

p(cause | d) / p(chance | d) = [p(d | cause) / p(d | chance)] × [p(cause) / p(chance)]

The likelihood ratio p(d | cause) / p(d | chance) measures the evidence for a causal relationship.
[Figure: fits to Buehner & Cheng (1997): People vs. Bayes (r = 0.97), ΔP (r = 0.89), Power (r = 0.88)]
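The two comparison models can be computed directly from the four cell counts of a contingency table; a sketch, using the cell values from the gene-expression example (ΔP is the difference in conditional probabilities; causal power is Cheng's 1997 measure for a generative cause):

```python
def delta_p(a, b, c, d):
    # ΔP = P(e+ | c+) − P(e+ | c−)
    return a / (a + b) - c / (c + d)

def causal_power(a, b, c, d):
    # Cheng's causal power for a generative cause: ΔP / (1 − P(e+ | c−))
    return delta_p(a, b, c, d) / (1 - c / (c + d))

# Gene-expression table: a=6, b=2, c=4, d=4
print(delta_p(6, 2, 4, 4))       # 0.25
print(causal_power(6, 2, 4, 4))  # 0.5
```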
Other predictions
• Causal induction from contingency data
  – sample size effects
  – judgments for incomplete contingency tables (Griffiths & Tenenbaum, in press)
• More complex cases
  – detectors (Tenenbaum & Griffiths, 2003)
  – explosions (Griffiths, Baraff, & Tenenbaum, 2004)
  – simple mechanical devices

[Figure: the stick-ball machine, with balls A and B (Kushnir, Schulz, Gopnik, & Danks, 2003)]
Outline
1. A Bayesian approach to causal induction
2. Coincidences
   i. what makes a coincidence?
   ii. rationality and irrationality
   iii. the paradox of coincidences
3. Explaining inductive leaps
What makes a coincidence?
A common definition: Coincidences are unlikely events
“an event which seems so unlikely that it is worth telling a story about”
“we sense that it is too unlikely to have been the result of luck or mere chance”
Coincidences are not just unlikely...
HHHHHHHHHH vs. HHTHTHTTHT
Bayesian causal induction

p(cause | d) / p(chance | d) = [p(d | cause) / p(d | chance)] × [p(cause) / p(chance)]
                               likelihood ratio (evidence)      prior odds

                               Prior odds
                               low          high
Likelihood ratio    high       ?            cause
(evidence)          low        ?            chance
Bayesian causal induction

p(cause | d) / p(chance | d) = [p(d | cause) / p(d | chance)] × [p(cause) / p(chance)]
                               likelihood ratio (evidence)      prior odds

                               Prior odds
                               low            high
Likelihood ratio    high       coincidence    cause
(evidence)          low        ?              chance
What makes a coincidence?

A coincidence is an event that provides evidence for causal structure, but not enough evidence to make us believe that structure exists.

p(cause | d) / p(chance | d) = [p(d | cause) / p(d | chance)] × [p(cause) / p(chance)]

The likelihood ratio is high, but the prior odds are low, so the posterior odds are middling.
HHHHHHHHHH vs. HHTHTHTTHT

p(cause | d) / p(chance | d) = [p(d | cause) / p(d | chance)] × [p(cause) / p(chance)]

For HHHHHHHHHH: the likelihood ratio is high, the prior odds are low, and the posterior odds are middling.
Bayesian causal induction
Hypotheses: “cause” (C → E) vs. “chance” (E alone)
Likelihoods: 0 < p(E) < 1 (cause) vs. p(E) = 0.5 (chance)
Priors: p (small) vs. 1 − p
Data: frequency of effect in presence of cause
p(cause | d) / p(chance | d) = [p(d | cause) / p(d | chance)] × [p(cause) / p(chance)]

HHHHHHHHHH: likelihood ratio is high, prior odds are low, posterior odds are middling → coincidence
HHTHTHTTHT: likelihood ratio is low, prior odds are low, posterior odds are low → chance
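The likelihood ratio for the two sequences can be sketched as follows, under the assumption that the “cause” hypothesis puts a uniform prior over the unknown p(E) (so the Beta integral gives p(d | cause) = h! t! / (h + t + 1)!):

```python
from math import comb

def marginal_like_cause(seq):
    # p(d | cause): unknown p(E), uniform prior over [0, 1];
    # the Beta integral gives h! t! / (h + t + 1)! = 1 / ((n + 1) C(n, h))
    h, t = seq.count("H"), seq.count("T")
    return 1 / ((h + t + 1) * comb(h + t, h))

def like_chance(seq):
    # p(d | chance): fair coin, p(E) = 0.5
    return 0.5 ** len(seq)

for seq in ["HHHHHHHHHH", "HHTHTHTTHT"]:
    lr = marginal_like_cause(seq) / like_chance(seq)
    print(seq, round(lr, 2))  # 93.09 for all heads, 0.37 for the mixed sequence
```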
HHHH: likelihood ratio is middling, prior odds are low, posterior odds are low → mere coincidence
HHHHHHHHHH: likelihood ratio is high, prior odds are low, posterior odds are middling → suspicious coincidence
HHHHHHHHHHHHHHHHHH: likelihood ratio is very high, prior odds are low, posterior odds are high → cause
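A sketch of this mere → suspicious → cause transition for runs of all heads, again assuming a uniform prior over the unknown p(E) and an illustrative prior odds of 1 in 1000 (a made-up number, not from the talk):

```python
# Assumed illustrative prior odds for "this coin is biased": 1 in 1000
prior_odds = 1e-3

for n in (4, 10, 18):
    # All-heads sequences: p(d | cause) = 1 / (n + 1) under a uniform prior
    # over the unknown p(E); p(d | chance) = (1/2)^n for a fair coin
    likelihood_ratio = (1 / (n + 1)) / (0.5 ** n)
    posterior_odds = likelihood_ratio * prior_odds
    print(n, likelihood_ratio, posterior_odds)
# n = 4: posterior odds stay low (mere coincidence)
# n = 10: posterior odds become middling (suspicious coincidence)
# n = 18: posterior odds exceed 1 (cause)
```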
Mere and suspicious coincidences

As the evidence for a causal relation, p(cause | d) / p(chance | d), increases, a mere coincidence becomes a suspicious coincidence.

• Transition produced by
  – increase in likelihood ratio (e.g., coin flipping)
  – increase in prior odds (e.g., genetics vs. ESP)
Testing the definition
• Provide participants with data from experiments
• Manipulate:
  – cover story: genetic engineering vs. ESP (prior)
  – data: number of males/heads (likelihood)
  – task: “coincidence or evidence?” vs. “how likely?”
• Predictions:
  – coincidences affected by prior and likelihood
  – relationship between coincidence and posterior
[Figure: proportion of “coincidence” responses and posterior probability as a function of number of heads/males (47 to 99); r = −0.98]
p(cause | d) / p(chance | d) = [p(d | cause) / p(d | chance)] × [p(cause) / p(chance)]
                               likelihood ratio (evidence)      prior odds

                               Prior odds
                               low            high
Likelihood ratio    high       coincidence    cause
(evidence)          low        ?              chance
Rationality and irrationality
(Gilovich, 1991)
The bombing of London

[Figure: people’s judgments as a function of changes in the spread, location, ratio, and number of bomb sites, relative to uniform]
Bayesian causal induction
Hypotheses: “cause” vs. “chance”
Likelihoods: uniform + regularity (cause) vs. uniform (chance)
Priors: p (cause), 1 − p (chance)
Data: bomb locations
[Figure: People vs. Bayes for changes in spread, location, ratio, and number; r = 0.98]
Coincidences in date

May 14, July 8, August 21, December 25
vs.
August 3, August 3, August 3, August 3

[Figure: people’s judgments of coincidence strength for sets of dates]
Bayesian causal induction
Hypotheses: “cause” vs. “chance”
Likelihoods: uniform + regularity (cause) vs. uniform (chance)
Priors: p (cause), 1 − p (chance)
Data: birthdays of those present

[Figure: People vs. Bayes for coincidences in birthdays]
Rationality and irrationality

p(cause | d) / p(chance | d) = [p(d | cause) / p(d | chance)] × [p(cause) / p(chance)]

• People’s sense of the strength of coincidences gives a close match to the likelihood ratio
  – bombing and birthdays
• Suggests that we accept false conclusions when our prior odds are insufficiently low
Rationality and irrationality

                               Prior odds
                               low            high
Likelihood ratio    high       coincidence    cause
(evidence)          low        ?              chance
The paradox of coincidences

Prior odds can be low for two reasons:

Reason                       Consequence
Incorrect current theory     Significant discovery
Correct current theory       False conclusion

Attending to coincidences makes more sense the less you know.
Coincidences
• Provide evidence for causal structure, but not enough to make us believe that structure exists
• Intimately related to causal induction– an opportunity to discover a theory is wrong
• Guided by a well-calibrated sense of when an event provides evidence of causal structure
Outline
1. A Bayesian approach to causal induction
2. Coincidences
   i. what makes a coincidence?
   ii. rationality and irrationality
   iii. the paradox of coincidences
3. Explaining inductive leaps
Explaining inductive leaps
• How do people
  – infer causal relationships
  – identify the work of chance
  – predict the future
  – assess similarity and make generalizations
  – learn functions, languages, and concepts
  …from such limited data?
• What knowledge guides human inferences?
Which sequence seems more random?
HHHHHHHHHH vs. HHTHTHTTHT
Subjective randomness
• Typically evaluated in terms of p(d | chance)
• Assessing randomness is part of causal induction:

p(chance | d) / p(cause | d) = [p(d | chance) / p(d | cause)] × [p(chance) / p(cause)]

The likelihood ratio p(d | chance) / p(d | cause) measures the evidence for a random generating process.
Randomness and coincidences

p(chance | d) / p(cause | d) = [p(d | chance) / p(d | cause)] × [p(chance) / p(cause)]
(likelihood ratio: evidence for a random generating process)

p(cause | d) / p(chance | d) = [p(d | cause) / p(d | chance)] × [p(cause) / p(chance)]
(likelihood ratio: strength of coincidence)
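On a log scale the two likelihood ratios are the same quantity with opposite signs: a sequence's strength as a coincidence is the negative of its evidence for a random generating process. A sketch, under the same assumed uniform-prior “cause” model as before:

```python
from math import comb, log

def log_evidence_for_cause(seq):
    # log [p(d | cause) / p(d | chance)]; the cause model is assumed to put a
    # uniform prior over the unknown p(E), so p(d | cause) = h! t! / (h + t + 1)!
    h, t = seq.count("H"), seq.count("T")
    p_cause = 1 / ((h + t + 1) * comb(h + t, h))
    p_chance = 0.5 ** (h + t)
    return log(p_cause / p_chance)

for seq in ["HHHHHHHHHH", "HHTHTHTTHT"]:
    coincidence = log_evidence_for_cause(seq)   # strength of coincidence
    randomness = -coincidence                   # evidence for a random process
    print(seq, coincidence, randomness)
```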
Randomness and coincidences

[Figure: “How random?” vs. “How big a coincidence?” for the bombing (r = −0.96) and birthday (r = −0.94) stimuli]
Pick a random number…

[Figure: People vs. Bayes, distribution of responses over the digits 0 to 9]
Bayes’ theorem

p(h | d) = p(d | h) p(h) / Σ_{h′ ∈ H} p(d | h′) p(h′)

inference = f(data, knowledge)
Predicting the future
Human predictions match optimal predictions from empirical prior
Iterated learning (Briscoe, 1998; Kirby, 2001)

Each learner infers a hypothesis from data (learning), then produces the data seen by the next learner (production):

d0 → h1 → d1 → h2 → …
inference: p(h | d); sampling: p(d | h)

(Griffiths & Kalish, submitted)

[Figure: results over iterations 1 to 9]
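A minimal simulation of this chain with Bayesian learners who sample hypotheses from the posterior; the two-hypothesis coin setup and all numbers are illustrative, not from the talk. The frequency of hypotheses along the chain settles to the prior, consistent with the Griffiths & Kalish result:

```python
import random

random.seed(0)

# Two hypothetical coin biases, with a prior favoring h0 (assumed numbers)
hypotheses = {"h0": 0.2, "h1": 0.8}   # p(heads | h)
prior = {"h0": 0.7, "h1": 0.3}

def posterior(data):
    # p(h | d) proportional to p(d | h) p(h), with Bernoulli likelihoods
    scores = {}
    for h, p_heads in hypotheses.items():
        like = 1.0
        for flip in data:
            like *= p_heads if flip == "H" else 1 - p_heads
        scores[h] = like * prior[h]
    z = sum(scores.values())
    return {h: s / z for h, s in scores.items()}

def sample(dist):
    r, total = random.random(), 0.0
    for h, p in dist.items():
        total += p
        if r < total:
            return h
    return h

# Each generation: the current hypothesis produces data (sampling),
# and the next learner samples a hypothesis from the posterior (inference)
h = "h0"
counts = {"h0": 0, "h1": 0}
for _ in range(5000):
    data = ["H" if random.random() < hypotheses[h] else "T"]
    h = sample(posterior(data))
    counts[h] += 1

freq_h0 = counts["h0"] / 5000
print(freq_h0)  # close to the prior p(h0) = 0.7
```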
Conclusion
• Many cognitive judgments are the result of challenging problems of induction
• Bayesian statistics provides a formal framework for exploring how people solve these problems
• Makes it possible to ask…
  – how do we make surprising discoveries?
  – how do we learn so much from so little?
  – what knowledge guides our judgments?
Collaborators
• Causal induction
  – Josh Tenenbaum (MIT)
  – Liz Baraff (MIT)
• Iterated learning
  – Mike Kalish (University of Louisiana)
Causes and coincidences
“coincidence” appears in 13/60 cases
p(“cause”) = 0.01
p(“cause”|“coincidence”) = 0.26
A reformulation: unlikely kinds
• Coincidences are events of an unlikely kind
  – e.g., a sequence with that number of heads
• Deals with the obvious problem…

p(10 heads) < p(5 heads, 5 tails)
Problems with unlikely kinds
• Defining kinds
August 3, August 3, August 3, August 3
January 12, March 22, March 22, July 19, October 1, December 8
Problems with unlikely kinds
• Defining kinds
• Counterexamples

P(4 heads) < P(2 heads, 2 tails), yet HHHH > HHTT as a coincidence
P(4 heads) > P(15 heads, 8 tails), yet HHHH > HHHHTHTTHHHTHTHHTHTTHHH as a coincidence
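The probability comparisons above are easy to check for a fair coin; a quick sketch:

```python
from math import comb

def p_counts(heads, tails):
    # probability of exactly this many heads and tails in fair-coin flips
    n = heads + tails
    return comb(n, heads) * 0.5 ** n

print(p_counts(10, 0), p_counts(5, 5))   # 10 heads is far less likely than 5 and 5
print(p_counts(4, 0), p_counts(2, 2))    # 4 heads is less likely than 2 and 2
print(p_counts(4, 0), p_counts(15, 8))   # but 4 heads is MORE likely than 15 and 8
```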