
Page 1

Bayesian models of inductive generalization in language acquisition

Josh Tenenbaum

MIT

Joint work with Fei Xu, Amy Perfors, Terry Regier, Charles Kemp

Page 2

The problem of generalization

How can people learn so much from such limited evidence?

– Kinds of objects and their properties

– Meanings and forms of words, phrases, and sentences

– Causal relations

– Intuitive theories of physics, psychology, …

– Social structures, conventions, and rules

The goal: A general-purpose computational framework for understanding how people make these inductive leaps, and how they can be successful.

Page 3

The problem of generalization

How can people learn so much from such limited evidence?

– Learning word meanings from examples

[Figure: three example images, each labeled "horse".]

Page 4

How can people learn so much from such limited evidence?

The answer: human learners have abstract knowledge that provides inductive constraints – restrictions or biases on the hypotheses to be considered.

• Word learning: whole-object principle, taxonomic principle, basic-level bias, shape bias, mutual exclusivity, …

• Syntax: syntactic rules are defined over hierarchical phrase structures rather than linear order of words.

The problem of generalization

Poverty of the stimulus as a scientific tool…

Page 5

1. How does abstract knowledge guide generalization from sparsely observed data?

2. What form does abstract knowledge take, across different domains and tasks?

3. What are the origins of abstract knowledge?

The big questions

Page 6

1. How does abstract knowledge guide generalization from sparsely observed data?

Priors for Bayesian inference:

2. What form does abstract knowledge take, across different domains and tasks?

Probabilities defined over structured representations: graphs, grammars, predicate logic, schemas.

3. What are the origins of abstract knowledge?

Hierarchical probabilistic models, with inference at multiple levels of abstraction and multiple timescales.

The approach

$$P(h \mid d) = \frac{P(d \mid h)\,P(h)}{\sum_{h_i \in H} P(d \mid h_i)\,P(h_i)}$$

Page 7

Three case studies of generalization

• Learning words for object categories
• Learning abstract word-learning principles ("learning to learn words")
  – Taxonomic principle
  – Shape bias
• Learning in a communicative context
  – Mike Frank

Page 8

Word learning as Bayesian inference (Xu & Tenenbaum, Psych Review 2007)

A Bayesian model can explain several core aspects of generalization in word learning…
– learning from very few examples
– learning from only positive examples
– simultaneous learning of overlapping extensions
– graded degrees of confidence
– dependence on pragmatic and social context

… arguably, better than previous computational accounts based on hypothesis elimination (e.g., Siskind) or associative learning (e.g., Regier).

Page 9

Basics of Bayesian inference

• Bayes’ rule:

• An example
  – Data: John is coughing
  – Some hypotheses:
    1. John has a cold
    2. John has lung cancer
    3. John has a stomach flu

– Likelihood P(d|h) favors 1 and 2 over 3

– Prior probability P(h) favors 1 and 3 over 2

– Posterior probability P(h|d) favors 1 over 2 and 3

$$P(h \mid d) = \frac{P(d \mid h)\,P(h)}{\sum_{h_i \in H} P(d \mid h_i)\,P(h_i)}$$
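To make the example concrete, here is a minimal Python sketch of that computation. The prior and likelihood numbers are hypothetical, chosen only to respect the qualitative orderings stated on the slide.

```python
# A minimal numerical illustration of Bayes' rule for the coughing example.
# All probabilities here are hypothetical, chosen only to respect the
# qualitative orderings on the slide.

hypotheses = ["cold", "lung cancer", "stomach flu"]

# Prior P(h): colds and stomach flu are common; lung cancer is rare.
prior = {"cold": 0.50, "lung cancer": 0.01, "stomach flu": 0.49}

# Likelihood P(d | h): coughing is typical of colds and lung cancer,
# atypical of stomach flu.
likelihood = {"cold": 0.8, "lung cancer": 0.8, "stomach flu": 0.1}

# Posterior P(h | d) = P(d | h) P(h) / sum_i P(d | h_i) P(h_i)
evidence = sum(likelihood[h] * prior[h] for h in hypotheses)
posterior = {h: likelihood[h] * prior[h] / evidence for h in hypotheses}

for h in hypotheses:
    print(f"P({h} | coughing) = {posterior[h]:.3f}")
# Resulting order: cold >> stomach flu > lung cancer, because only "cold"
# scores high on both the prior and the likelihood.
```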

Page 10

Bayesian generalization

[Figure: observed examples X labeled "horse", and novel test objects marked "?": to which should the word generalize?]

Page 11

Bayesian generalization

Hypothesis space H of possible word meanings (extensions): e.g., rectangular regions

$p(h) \sim \text{uniform}$

Page 12

Bayesian generalization

Hypothesis space H of possible word meanings (extensions): e.g., rectangular regions

$p(h) \sim \text{uniform}$

Likelihood $p(X \mid h)$: assume examples are sampled randomly from the word's extension.

Page 13

Bayesian generalization

Hypothesis space H of possible word meanings (extensions): e.g., rectangular regions

$p(h) \sim \text{uniform}$

$$p(X \mid h) = \begin{cases} 1/\text{size}(h)^n & \text{if } x_1, \ldots, x_n \in h \\ 0 & \text{if any } x_i \notin h \end{cases}$$


Page 15

Bayesian generalization

Hypothesis space H of possible word meanings (extensions): e.g., rectangular regions

$p(h) \sim \text{uniform}$

$$p(X \mid h) = \begin{cases} 1/\text{size}(h)^n & \text{if } x_1, \ldots, x_n \in h \\ 0 & \text{if any } x_i \notin h \end{cases}$$

"Size principle": Smaller hypotheses receive greater likelihood, and exponentially more so as n increases.
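A small sketch of how the size principle plays out numerically, with three nested hypotheses. The sizes are hypothetical; only their ordering matters.

```python
# A sketch of the size principle with three nested hypotheses (think
# subordinate ⊂ basic ⊂ superordinate regions). Sizes are hypothetical.

sizes = {"small": 2.0, "medium": 6.0, "large": 20.0}

for n in [1, 2, 3, 4]:
    # p(X | h) = 1 / size(h)^n when all n examples fall inside h.
    lik = {name: s ** (-n) for name, s in sizes.items()}
    z = sum(lik.values())                  # uniform prior: posterior ∝ likelihood
    post = {name: lik[name] / z for name in sizes}
    print(n, {name: round(p, 3) for name, p in post.items()})
# With n = 1 the small hypothesis is only mildly preferred; by n = 4 it
# dominates. If the true extension were large, it would be a suspicious
# coincidence that every example landed in the same small region.
```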


Page 17

Bayesian generalization

(Same hypothesis space, uniform prior, and size-principle likelihood as above.)

cf. Subset principle: a maximum-likelihood learner would always adopt the smallest hypothesis consistent with the examples.

Page 18

Generalization gradients

[Figure: generalization gradients for Bayes (graded) vs. maximum likelihood or "subset principle" (all-or-none).]

Hypothesis averaging:

$$p(y \text{ is a ``W''} \mid X) = \sum_{h \,\ni\, y,\,X} p(h \mid X)$$
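The contrast can be sketched in a few lines of Python over 1-D interval hypotheses; the grid, example values, and interval family are illustrative assumptions, not the model's actual hypothesis space.

```python
import numpy as np

# Observed examples of the novel word, on a 1-D feature scale (hypothetical).
x_examples = [0.45, 0.5, 0.55]

# Hypotheses: intervals [a, b] on a coarse grid that contain all examples.
grid = np.linspace(0.0, 1.0, 21)
hyps = [(a, b) for a in grid for b in grid
        if a < min(x_examples) and b > max(x_examples)]

# Size principle: p(X | h) = 1 / size(h)^n for consistent hypotheses.
post = np.array([(b - a) ** (-len(x_examples)) for a, b in hyps])
post /= post.sum()                     # uniform prior, so posterior ∝ likelihood

# Subset principle: commit to the single smallest consistent hypothesis.
h_min = min(hyps, key=lambda h: h[1] - h[0])

for y in np.linspace(0.0, 1.0, 11):
    # Hypothesis averaging: sum p(h | X) over every hypothesis containing y.
    p_bayes = sum(p for (a, b), p in zip(hyps, post) if a <= y <= b)
    p_subset = float(h_min[0] <= y <= h_min[1])
    print(f"y={y:.1f}  Bayes={p_bayes:.2f}  subset={p_subset:.0f}")
# Averaging yields a graded gradient that falls off away from the examples;
# committing to the minimal hypothesis yields an all-or-none boundary.
```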

Page 19

Word learning as Bayesian inference (Xu & Tenenbaum, Psych Review 2007)

[Figure: training objects organized at subordinate, basic-level, and superordinate levels.]

Page 20

• Prior p(h): Choice of hypothesis space embodies traditional constraints: whole object principle, shape bias, taxonomic principle…

– More fine-grained prior favors more distinctive clusters.

• Likelihood p(X | h): Random sampling assumption.

– Size principle: Smaller hypotheses receive greater likelihood, and exponentially more so as n increases.

Word learning as Bayesian inference (Xu & Tenenbaum, Psych Review 2007)

$$p(X \mid h) = \begin{cases} 1/\text{size}(h)^n & \text{if } x_1, \ldots, x_n \in h \\ 0 & \text{if any } x_i \notin h \end{cases}$$
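Putting prior and likelihood together, here is a sketch of the qualitative Xu & Tenenbaum pattern over a three-level nested taxonomy. The extension sizes are hypothetical stand-ins, and the basic-level bias in the prior is omitted for simplicity.

```python
# A sketch of the qualitative Xu & Tenenbaum pattern over a three-level
# nested taxonomy (e.g. Dalmatians ⊂ dogs ⊂ animals). Sizes are hypothetical.

sizes = {"subordinate": 2.0, "basic": 6.0, "superordinate": 20.0}
levels = ["subordinate", "basic", "superordinate"]

def generalization(n):
    # All n examples come from the subordinate category, so every level's
    # hypothesis is consistent; the size principle does the work.
    lik = {h: s ** (-n) for h, s in sizes.items()}
    z = sum(lik.values())
    post = {h: lik[h] / z for h in levels}
    # A test item at level L falls inside the hypotheses at level L and above.
    return {test: round(sum(post[h] for h in levels[levels.index(test):]), 3)
            for test in levels}

print("1 example: ", generalization(1))
print("3 examples:", generalization(3))
# One example supports noticeable generalization to basic-level matches;
# three subordinate examples sharpen generalization to the subordinate
# level, the pattern tested in the experiments on the following slides.
```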

Page 21

Page 22

Page 23

Generalization experiments

[Figure: children's generalizations alongside the Bayesian model's predictions, across example conditions.]

Not easily explained by hypothesis elimination or associative models.

Page 24

Further questions

• Bayesian learning for other kinds of words?
  – Verbs (Niyogi; Alishahi & Stevenson; Perfors, Wonnacott, Tenenbaum)
  – Adjectives (Dowman; Schmidt, Goodman, Barner, Tenenbaum)
• How fundamental and general is learning by "suspicious coincidence" (the size principle)?
  – Other domains of inductive generalization in adults and children (Tenenbaum et al.; Xu et al.)
  – Generalization in < 1-year-old infants (Gerken; Xu et al.)
• Bayesian word learning in more natural communicative contexts?
  – Cross-situational mapping with real-world scenes and utterances (Frank, Goodman & Tenenbaum; cf. Yu)

Page 25

Further questions

• Where do the hypothesis space and priors come from?
• How does word learning interact with conceptual development?

Page 26

A hierarchical Bayesian view

Principles T: whole-object principle, shape bias, taxonomic principle, …

Structure S: a taxonomy of categories ("thing" > "animal" > "dog" > "Basset hound", alongside "cat", "tree", "daisy", …)

Data D: labeled examples of novel words ("fep", "ziv", "gip") whose extensions must be inferred


Page 28

Different forms of structure

[Figure: example structures of different forms: dominance, order, line, ring, flat, hierarchy, taxonomy, grid, cylinder.]

Page 29

Discovery of structural form (Kemp and Tenenbaum)

A hierarchical generative model over three levels:

F: form, with prior P(F)
S: structure, with P(S | F) favoring simplicity
D: data, with P(D | S) measuring fit to data

[Figure: a matrix of features over entities X1–X7, organized as a tree-structured taxonomy, as disjoint clusters, and as a linear order.]
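In the same spirit, here is a heavily simplified scoring sketch. It replaces the full model's likelihood with a plain smoothness penalty over the graph Laplacian and a per-edge simplicity cost; `log_score`, `sigma2`, and `edge_cost` are all illustrative assumptions, not the published model.

```python
import numpy as np

def log_score(adjacency, features, sigma2=0.1, edge_cost=2.0):
    """Illustrative structure score: smoothness-based fit plus a simplicity
    prior that charges a fixed cost per edge."""
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A           # graph Laplacian
    # Fit term: f @ L @ f = sum over edges of (f_i - f_j)^2, so features
    # that change along edges are penalized.
    fit = -sum(f @ L @ f for f in features.T) / (2 * sigma2)
    simplicity = -edge_cost * A.sum() / 2    # fewer edges = simpler structure
    return fit + simplicity

# Four entities whose single feature increases along a chain: 0-1-2-3.
features = np.array([[0.0], [1.0], [2.0], [3.0]])
chain = [[0,1,0,0],[1,0,1,0],[0,1,0,1],[0,0,1,0]]
star  = [[0,1,1,1],[1,0,0,0],[1,0,0,0],[1,0,0,0]]
print("chain:", log_score(chain, features))
print("star: ", log_score(star, features))
# The chain wins: with the same number of edges, it explains the smooth
# gradient in the data far better than the star does.
```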


Page 31

The shape bias in word learning (Landau, Smith & Jones, 1988)

This is a dax. Show me the dax…

• A useful inductive constraint: many early words are labels for object categories, and shape may be the best cue to object category membership.

• English-speaking children typically show the shape bias at 24 months, but not at 20 months.

Page 32

Is the shape bias learned?

• Smith et al. (2002) trained 17-month-olds on labels for 4 artificial categories:

[Figure: the four training categories, labeled "wib", "lug", "zup", and "div".]

• After 8 weeks of training (20 min/week), 19-month-olds show the shape bias:

"This is a dax. Show me the dax…"

Page 33

Transfer to real-world vocabulary

The puzzle: The shape bias is a powerful inductive constraint, yet can be learned from very little data.

Page 34

Learning abstract knowledge about feature variability

[Figure: the four training categories, labeled "wib", "lug", "zup", and "div".]

The intuition:
- Shape varies across categories but is relatively constant within nameable categories.
- Other features (size, color, texture) vary both within and across nameable object categories.

Page 35

Learning a Bayesian prior

Hypothesis space H of possible word meanings (extensions): e.g., rectangular regions in a shape × color feature space

$p(h) \sim \text{uniform}$

[Figure: examples labeled "horse" in the shape × color space, with novel test objects marked "?".]

Page 36

Learning a Bayesian prior

Hypothesis space H of possible word meanings (extensions): e.g., rectangular regions in the shape × color space

$p(h) \sim \text{uniform}$

[Figure: labeled examples of "horse", "cat", "cup", "ball", and "chair" in the shape × color space, with test objects marked "?".]

Page 37

Learning a Bayesian prior

Hypothesis space H of possible word meanings (extensions): e.g., rectangular regions in the shape × color space

p(h) ~ long & narrow: high; others: low

[Figure: labeled examples of "cat", "cup", "ball", and "chair", plus a new "horse" example with test objects marked "?". The learned prior favors hypotheses that are narrow in shape and extended in color.]


Page 39

Hierarchical Bayesian model

Nameable object categories tend to be homogeneous in shape, but heterogeneous in color, material, …

Level 2: nameable object categories in general
Level 1: specific categories ("cat", "cup", "ball", "chair")
Data: observed exemplars

[Figure: shape and color distributions for each category at Level 1, with the Level 2 distributions abstracted across categories.]


Page 41

Hierarchical Bayesian model

Level 2: nameable object categories in general
Level 1: specific categories ("cat", "cup", "ball", "chair"), each with its own distributions over shape and color
Data: observed exemplars $\{y_{\text{shape}}, y_{\text{color}}\}$

$$p(\alpha_i) \sim \text{Exponential}(\lambda)$$
$$p(\theta_i \mid \alpha_i) \sim \text{Dirichlet}(\alpha_i)$$
$$p(y_i \mid \theta_i) \sim \text{Multinomial}(\theta_i)$$

$\alpha_i$: within-category variability for feature $i$, ranging from low to high
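A sketch of how the variability parameters can be learned: grid inference over alpha for each feature, scoring with the Dirichlet-multinomial marginal likelihood. The training counts, the number of feature values K, and the Exponential(1) rate are all hypothetical stand-ins for the four Smith et al. categories.

```python
import numpy as np
from scipy.special import gammaln

def log_marginal(counts, alpha, K):
    """Dirichlet-multinomial marginal likelihood of one category's counts,
    under a symmetric Dirichlet(alpha/K, ..., alpha/K) over K feature values."""
    a = alpha / K
    return (gammaln(alpha) - gammaln(counts.sum() + alpha)
            + np.sum(gammaln(counts + a) - gammaln(a)))

K = 8  # hypothetical number of distinct shape / color values
# Hypothetical training data: four categories, three exemplars each.
shape_counts = np.array([[3,0,0,0,0,0,0,0]] * 4)  # shape constant within a category
color_counts = np.array([[1,1,1,0,0,0,0,0]] * 4)  # color varies within a category

alphas = np.logspace(-2, 2, 200)
for name, data in [("shape", shape_counts), ("color", color_counts)]:
    # Log posterior over alpha: Exponential(1) prior (log p = -alpha, up to a
    # constant) plus the likelihood of all training categories.
    lp = [-al + sum(log_marginal(c, al, K) for c in data) for al in alphas]
    print(f"{name}: preferred alpha ~ {alphas[int(np.argmax(lp))]:.2f}")
# Shape ends up with a small alpha (low within-category variability), color
# with a larger one: the abstract knowledge that then acts as a prior (the
# shape bias) when the next novel word is heard.
```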

Page 42

Learning the shape bias

Training

[Figure: the four training categories, labeled "wib", "lug", "zup", and "div".]

Page 43

Second-order generalization test

Training → Test

"This is a dax. Show me the dax…"

Page 44

Abstract knowledge in cognitive development

– Word learning: whole object bias, taxonomic principle (Markman); shape bias (Smith)
– Causal reasoning: causal schemata (Kelley)
– Folk physics: objects are unified, persistent (Spelke)
– Number: counting principles (Gelman)
– Folk biology: principles of taxonomic rank (Atran)
– Folk psychology: principle of rationality (Gergely)
– Ontology: M-constraint on predicability (Keil)
– Syntax: UG (Chomsky)
– Phonology: faithfulness, markedness constraints (Prince, Smolensky)

Page 45

Conclusions

• Bayesian inference over hierarchies of structured representations provides a way to study core questions of human cognition, in language and other domains.
  – What is the content and form of abstract knowledge?
  – How can abstract knowledge guide generalization from sparse data?
  – How can abstract knowledge itself be acquired? What is built in?

• Going beyond traditional dichotomies.
  – How can structured knowledge be acquired by statistical learning?
  – How can domain-general learning mechanisms acquire domain-specific inductive constraints?

• A different way to think about cognitive development.
  – Powerful abstractions (taxonomic structure, shape bias, hierarchical organization of syntax) can be inferred "top down", from surprisingly little data, together with learning more concrete knowledge.
  – Very different from the traditional empiricist or nativist views of abstraction. Worth pursuing more generally…