Computational models of cognitive development: the grammar analogy
Josh Tenenbaum, MIT

TRANSCRIPT
Top 10 reasons to be Bayesian

1. Bayesian inference over hierarchies of structured representations provides integrated answers to these key questions of cognition:
   – What is the content and form of human knowledge, at multiple levels of abstraction?
   – How is abstract knowledge used to guide learning and inference of more specific representations?
   – How are abstract systems of knowledge themselves learned?
The plan
• This morning…
• … this afternoon: higher-level cognition
  – Causal theories
  – Theories for object categories and properties

[Diagram: two parallel hierarchies, each with a prior P(grammar) at the top.
 Language: linguistic grammar → phrase structures → utterances, via P(phrase structures | grammar) and P(utterances | phrase structures).
 Vision: scene grammar → object layout (scenes) → image features, via P(objects | grammar) and P(features | objects).]
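The probability expressions above factor a hierarchical generative model: a prior P(grammar), then P(structures | grammar), then P(data | structures). A minimal sketch of how such a factorization supports inference, by brute-force enumeration over a tiny discrete hypothesis space (all names and numbers here are invented for illustration):

```python
from itertools import product

# Toy hierarchy: P(data, structure, grammar) =
#   P(data | structure) * P(structure | grammar) * P(grammar)
p_grammar = {"g1": 0.5, "g2": 0.5}                # P(grammar)
p_structure = {                                   # P(structure | grammar)
    "g1": {"s1": 0.9, "s2": 0.1},
    "g2": {"s1": 0.2, "s2": 0.8},
}
p_data = {"s1": 0.7, "s2": 0.3}                   # P(data | structure)

# Marginal likelihood of the data: sum out structure and grammar.
evidence = sum(p_grammar[g] * p_structure[g][s] * p_data[s]
               for g, s in product(p_grammar, p_data))

# Posterior over grammars given the data (Bayes' rule): inference at the
# abstract level falls out of the same joint model.
posterior = {g: sum(p_grammar[g] * p_structure[g][s] * p_data[s]
                    for s in p_data) / evidence
             for g in p_grammar}
```

The point of the sketch is only the shape of the computation: the same joint distribution answers questions at every level of the hierarchy.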
(Fragments of) Theories
• Physics
  – The natural state of matter is at rest. Every motion has a causal agent.
  – F = ma.
• Medicine
  – People have different amounts of four humors in the body, and illnesses are caused by them being out of balance.
  – Diseases result from pathogens transmitted through the environment, interacting with genetically controlled cellular mechanisms.
• Psychology
  – Rational agents choose efficient plans of action in order to satisfy their desires given their beliefs.
  – Action results from neuronal activity in the prefrontal cortex, which activates units in the motor cortex that control motor neurons projecting to the body’s muscles.
The structure of intuitive theories

“A theory consists of three interrelated components: a set of phenomena that are in its domain, the causal laws and other explanatory mechanisms in terms of which the phenomena are accounted for, and the concepts in terms of which the phenomena and explanatory apparatus are expressed.”

Carey (1985), “Constraints on semantic development”
The function of intuitive theories
• “Set the frame” of cognition by defining different domains of thought.
• Describe what things are found in a domain, as well as what behaviors and relations are possible.
• Provide causal explanations appropriate to that domain.
• Guide inference and learning:
  – Highlight variables to attend to.
  – Generate hypotheses and explanations to consider.
  – Support generalization from very limited data.
  – Support prediction and action planning.
Theories in cognitive development

The big question: what develops from childhood to adulthood?
– One extreme: basically everything
  • Totally new ways of thinking, e.g. logical thinking.
– The other extreme: basically nothing
  • Just new facts (specific beliefs), e.g., trees can die.
– Intermediate view: something important
  • New theories, new ways of organizing beliefs.
Hierarchical Bayesian models for learning abstract knowledge

[Diagram: three parallel hierarchies.
 Causal theory → causal network structures → observed events.
 Structural form → structure over objects → observed properties of objects.
 Linguistic grammar → phrase structures → observed utterances in the language.]
Hierarchical Bayesian framework for causal induction
(Griffiths, Tenenbaum, Kemp et al.)

[Diagram: a causal theory at the top generates candidate causal network structures over variables A, B, and E, which in turn generate observed events.]
Hierarchical Bayesian framework for property induction
(Kemp & Tenenbaum)

[Diagram: a structural form generates a structure over objects — a tree with species at the leaf nodes (mouse, squirrel, chimp, gorilla) — which generates observed properties of objects (features F1–F4, with the query “Has T9 hormones?” to be filled in).]
Questions
• How can the abstract knowledge be learned?
• How do we represent abstract knowledge?
A tradeoff in representational expressiveness versus learnability...
Types: Block, Detector, Trial
Predicates:
  Contact(block, detector, trial)   non-random
  Activates(block, detector)        random
  Active(detector, trial)           random
Dependency statements:
  Activates(b, d) ~ TabularCPD[[q (1-q)]];
  Active(d, t) ~ NoisyORAggCPD[1 0]
      ({Contact(b, d, t) for block b : Activates(b, d)});

[Diagram: causal theory → candidate causal structures over A, B, and E → observed events.]

Relational skeleton (unknown): Activates(A, detector)? Activates(B, detector)?

Observed events:
  Contact(A, detector, t1), Contact(B, detector, t1), Active(detector, t1),
  Contact(A, detector, t2), ~Contact(B, detector, t2), Active(detector, t2)
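The NoisyORAggCPD dependency above can be read as a standard noisy-OR: the detector stays off only if every activating block in contact fails to trigger it. A minimal sketch of that aggregation (the function, its weights, and the leak parameter are illustrative assumptions, not the formal semantics of the dependency statement):

```python
def p_detector_active(contacts, p_activates, leak=0.0):
    """P(Active(d, t)): noisy-OR over the blocks in contact with d on trial t.

    contacts:    blocks touching the detector on this trial
    p_activates: block -> P(that block activates the detector)
    leak:        probability the detector fires with no cause
    """
    p_off = 1.0 - leak
    for b in contacts:
        p_off *= 1.0 - p_activates.get(b, 0.0)
    return 1.0 - p_off

# Trial 1: both blocks on the detector; only A is hypothesized to activate it.
p1 = p_detector_active({"A", "B"}, {"A": 0.9, "B": 0.0})
# Trial 2: A alone on the detector.
p2 = p_detector_active({"A"}, {"A": 0.9, "B": 0.0})
```

With B's activation probability at zero, both trials give the same activation probability, which is why the two observations above are jointly informative about Activates(A, detector) but not Activates(B, detector).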
Questions
• How can the abstract knowledge be learned?
• How do we represent abstract knowledge?
A tradeoff in representational expressiveness versus learnability...
… consider a representation for theories which is less expressive but more learnable.
Learning causal networks

[Diagram: a causal structure — a network over conditions — generates observed events: a patients × conditions matrix of has(patient, condition) values.]
Different causal networks, same domain theory.
Different domain theories.
The grammar analogy

[Diagram: bipartite semantic networks linking nouns (people, cats, bees, trees) to verbs (talk, sleep, fly, grow — or imagine, sleep, fly, grow). Different semantic networks can share the same linguistic grammar; different grammars generate different networks.]
The grammar analogy

Natural language grammar:
  Abstract classes: N, V, ….
  Production rules: S → [N V], …. (“Nouns precede verbs”)
  N → {people, cats, bees, trees, ….}
  V → {talk, sleep, fly, grow, ….}
  Hierarchy: linguistic grammar → syntactic structures → observed sentences in the language.

Causal theory:
  Abstract classes: D, S, ….
  Causal laws: D → S, …. (“Diseases cause symptoms”)
  D = {flu, bronchitis, lung cancer, ….}
  S = {fever, cough, chest pain, ….}
  Hierarchy: causal theory → causal network structures → observed data in the domain.
Theories as graph grammars

Classes = {B, D, S};  Laws = {B → D, D → S}   (→ : possible causal link)
Classes = {B, D, S};  Laws = {S → D}
Classes = {C};        Laws = {C → C}
History of the grammar analogy

“The grammar of a language can be viewed as a theory of the structure of this language. Any scientific theory is based on a certain finite set of observations and, by establishing general laws stated in terms of certain hypothetical constructs, it attempts to account for these observations, to show how they are interrelated, and to predict an indefinite number of new phenomena…. Similarly, a grammar is based on a finite number of observed sentences… and it ‘projects’ this set to an infinite set of grammatical sentences by establishing general ‘laws’… [framed in terms of] phonemes, words, phrases, and so on.…”
Chomsky (1956), “Three models for the description of language”
Learning causal networks

Causal theory:
  Classes = {B, D, S}
  Laws = {B → D, D → S}
  B: working in factory, smoking, stress, high fat diet, …
  D: flu, bronchitis, lung cancer, heart disease, …
  S: headache, fever, coughing, chest pain, …

[Diagram: causal theory → causal structure → observed events (a patients × conditions matrix of has(patient, condition) values).]
Using a causal theory
Given current causal network beliefs . . .
. . . and some new observed data: Correlation between “working in factory” and “chest pain”.
The theory constrains possible hypotheses:
And rules out others:
• Allows strong inferences about causal structure from very limited data.
• Very different from conventional Bayes net learning.
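One minimal way to picture how the theory rules hypotheses in or out (the dict encoding below is an illustrative assumption, not the model's actual representation): a candidate causal edge is admissible only if a class-level law licenses it.

```python
# Theory = class assignment for variables + set of allowed class-level laws.
classes = {"working in factory": "B", "lung cancer": "D", "chest pain": "S"}
laws = {("B", "D"), ("D", "S")}      # B -> D and D -> S are possible links

def admissible(cause, effect):
    """An edge is consistent with the theory iff its class pair is a law."""
    return (classes[cause], classes[effect]) in laws

candidates = [
    ("working in factory", "lung cancer"),   # B -> D: licensed
    ("working in factory", "chest pain"),    # B -> S: ruled out
    ("chest pain", "lung cancer"),           # S -> D: ruled out
]
allowed = [e for e in candidates if admissible(*e)]
```

A new correlation between “working in factory” and “chest pain” thus cannot be explained by a direct B → S edge; the theory forces an explanation routed through a disease, which is exactly the strong inference from limited data described above.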
Learning causal theories

Causal theory:
  Classes = {B, D, S}
  Laws = {B → D, D → S}
  B: working in factory, smoking, stress, high fat diet, …
  D: flu, bronchitis, lung cancer, heart disease, …
  S: headache, fever, coughing, chest pain, …

[Diagram: causal theory → causal structure → observed events (a patients × conditions matrix of has(patient, condition) values).]

How do we define the theory as a probabilistic generative model?
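One hedged answer, in the spirit of the slides that follow: treat the theory as a two-stage generative model that first assigns variables to classes and then samples each directed edge with a probability determined by the class pair. The numbers and names below are illustrative:

```python
import random

def sample_network(z, eta, rng):
    """Sample a causal network from a class-based theory.

    z:   class index for each variable
    eta: eta[ci][cj] = P(edge i -> j | class(i) = ci, class(j) = cj)
    """
    n = len(z)
    return {(i, j) for i in range(n) for j in range(n)
            if i != j and rng.random() < eta[z[i]][z[j]]}

# Classes 0 = B, 1 = D, 2 = S; only B -> D and D -> S links are likely.
eta = [[0.0, 0.8, 0.01],
       [0.0, 0.0, 0.75],
       [0.0, 0.0, 0.0]]
z = [0, 0, 1, 1, 2, 2]       # two behaviors, two diseases, two symptoms
net = sample_network(z, eta, random.Random(0))
```

Because the law probabilities live at the class level, the same small matrix generates priors over networks of any size, which is what lets the theory constrain structure learning.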
Learning a relational theory
(c.f. Charles Kemp, Monday)

[Diagram: the dominates(person, person) relation among nine people is sorted into three classes — Profs, Grads, Ugrads — with class-level link probabilities:]

           Profs  Grads  Ugrads
  Profs     0.1    0.9    0.9
  Grads     0.1    0.1    0.9
  Ugrads    0.1    0.1    0.1
The Infinite Relational Model (IRM)

[Diagram: the same people-by-people relation and class-probability matrix (0.1 and 0.9 entries), with the number of classes now inferred from the data rather than fixed.]
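The “infinite” part comes from a nonparametric prior over class assignments. A minimal sketch of the Chinese restaurant process that the IRM builds on (illustrative code, not the authors' implementation):

```python
import random

def crp(n, alpha, rng):
    """Sample a class assignment for n items under CRP(alpha).

    Item i joins existing class k with probability counts[k] / (i + alpha),
    or starts a brand-new class with probability alpha / (i + alpha).
    """
    counts, assignment = [], []
    for i in range(n):
        r = rng.random() * (i + alpha)
        acc = 0.0
        for k, c in enumerate(counts):
            acc += c
            if r < acc:                  # join existing class k
                assignment.append(k)
                counts[k] += 1
                break
        else:                            # open a new class
            assignment.append(len(counts))
            counts.append(1)
    return assignment

z = crp(12, 1.0, random.Random(0))       # e.g. classes for 12 variables
```

Because the number of classes is unbounded but penalized, the prior lets the data decide how many classes to use — the flexibility emphasized later in the talk.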
[Diagram: the hierarchy causal theory → causal structure → observed events, instantiated with the IRM at the theory level and a causal graphical model at the structure level. Variables 1–12 are grouped by the class assignment z into ‘B’ (1–4), ‘D’ (5–8), and ‘S’ (9–12), with class-level law probabilities of roughly 0.8 for B → D, 0.01 for B → S, 0.75 for D → S, and 0.0 elsewhere.]
Learning with a uniform prior on network structures:

[Figure: the true network over 12 attributes, 75 sampled patient observations, and the structures recovered under a uniform prior.]
Learning with a block-structured prior on network structures (Mansinghka et al., UAI 06):

[Figure: the same true network and 75 sampled observations, now with the class assignment z and class-level edge probabilities inferred alongside the structure.]
The “blessing of abstraction”
(Mansinghka, Kemp, Tenenbaum, Griffiths, UAI 06)

[Figure: the true Bayesian network N over 16 variables, with two classes c1 (variables 1–6) and c2 (variables 7–16). As the number of samples grows (20, 80, 1000), edge recovery for the network N is compared with class recovery for the abstract theory (classes Z and class-level edge probabilities): the abstract theory is learned from less data than the specific network.]
The flexibility of a nonparametric prior

[Figure: a true Bayesian network N over 12 variables with no class structure (a single class c1). As the number of samples grows (40, 100, 1000), the model recovers the network while collapsing to a single class rather than imposing spurious ones.]
Insights
• Abstract theories, which have traditionally either been neglected or presumed to be innate in cognitive development, could in fact be learned from data by rational inferential means, together with specific causal relations.
• Abstract theories can in some cases be learned more quickly and easily than specific concrete causal relations (the “blessing of abstraction”).
• Key to progress on learning abstract knowledge: finding sweet spots in the expressiveness–learnability tradeoff….
Hierarchical Bayesian models for learning abstract knowledge

[Diagram, repeated from earlier: causal theory → causal network structures → observed events; structural form → structure over objects → observed properties of objects; linguistic grammar → phrase structures → observed utterances in the language.]
A framework for inductive learning

1. How does knowledge guide learning from sparsely observed data? Bayesian inference, with priors based on background knowledge.
2. What form does knowledge take, across different domains and tasks? Probabilities defined over structured representations: graphs, grammars, rules, logic, relational schemas, theories.
3. How is more abstract knowledge itself learned? Hierarchical Bayesian models, with inference at multiple levels of abstraction.
4. How can abstract knowledge constrain learning yet maintain flexibility, balancing assimilation and accommodation? Nonparametric models, growing in complexity as the data require.