
Page 1: A Bayesian Approach to the Poverty of the Stimulus

A Bayesian Approach to the Poverty of the Stimulus

Amy Perfors (MIT)

With Josh Tenenbaum (MIT) and Terry Regier (University of Chicago)

Page 2: A Bayesian Approach to the Poverty of the Stimulus

[Diagram: a space of positions, Innate vs. Learned]

Page 3: A Bayesian Approach to the Poverty of the Stimulus

[Diagram: 2x2 grid, Innate vs. Learned crossed with Explicit Structure vs. No Explicit Structure]

Page 4: A Bayesian Approach to the Poverty of the Stimulus

[Diagram: Does language have hierarchical phrase structure? No vs. Yes]

Page 5: A Bayesian Approach to the Poverty of the Stimulus

Why believe that language has hierarchical phrase structure?

Formal properties plus an information-theoretic, simplicity-based argument (Chomsky, 1956), turning on the dependency structure of language:

A finite-state grammar cannot capture the infinite set of English sentences containing such (nested) dependencies.

If we restrict ourselves to only a finite set of sentences, then in theory a finite-state grammar could account for them, "but this grammar will be so complex as to be of little use or interest."

Page 6: A Bayesian Approach to the Poverty of the Stimulus

Why believe that structure dependence is innate? The Argument from the Poverty of the Stimulus (PoS):

Data:
Simple declarative: The girl is happy. They are eating.
Simple interrogative: Is the girl happy? Are they eating?

Hypotheses:
1. Linear: move the first "is" (auxiliary) in the sentence to the beginning
2. Hierarchical: move the auxiliary in the main clause to the beginning

Test:
Complex declarative: The girl who is sleeping is happy.

Result:
Children say: Is the girl who is sleeping happy?
NOT: *Is the girl who sleeping is happy?

(Chomsky, 1965, 1980; Crain & Nakayama, 1987)

Page 7: A Bayesian Approach to the Poverty of the Stimulus

Why believe it’s not innate?

There are actually enough complex interrogatives in the input (Pullum & Scholz, 2002)

Children's behavior can be explained via statistical learning of natural language data (Lewis & Elman, 2001; Reali & Christiansen, 2005)

It is not necessary to assume a grammar with explicit structure

Page 8: A Bayesian Approach to the Poverty of the Stimulus

[Diagram: 2x2 grid, Innate vs. Learned crossed with Explicit Structure vs. No Explicit Structure]

Page 9: A Bayesian Approach to the Poverty of the Stimulus

[Diagram: 2x2 grid, Innate vs. Learned crossed with Explicit Structure vs. No Explicit Structure]

Page 10: A Bayesian Approach to the Poverty of the Stimulus

Our argument

Page 11: A Bayesian Approach to the Poverty of the Stimulus

Our argument

We suggest that, contra the PoS claim, it is possible, given the nature of the input and certain domain-general assumptions about the learning mechanism, for an ideal, unbiased learner to realize that language has a hierarchical phrase structure; therefore this knowledge need not be innate.

The reason: grammars with hierarchical phrase structure offer an optimal tradeoff between simplicity and fit to natural language data.

Page 12: A Bayesian Approach to the Poverty of the Stimulus

Plan

Model
- Data: corpus of child-directed speech (CHILDES)
- Grammars: linear & hierarchical
  - Both: hand-designed & result of local search
  - Linear: automatic, unsupervised ML
- Evaluation: complexity vs. fit

Results

Implications

Page 13: A Bayesian Approach to the Poverty of the Stimulus

The model: Data

Corpus from the CHILDES database (Adam; Brown corpus)

55 files, age range 2;3 to 5;2. Sentences spoken by adults to children. Each word was replaced by its syntactic category:
det, n, adj, prep, pro, prop, to, part, vi, v, aux, comp, wh, c

Ungrammatical sentences and the most grammatically complex sentence types were removed, keeping 21792 of 25876 utterances: topicalized sentences (66), sentences with serial verb constructions (459), subordinate phrases (845), sentential complements (1636), conjunctions (634), and ungrammatical sentences (444).

Page 14: A Bayesian Approach to the Poverty of the Stimulus

Data

Final corpus contained 2336 individual sentence types corresponding to 21792 sentence tokens
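As a concrete illustration of the type/token distinction (a toy sketch, not the authors' pipeline; the data below are made up), identical category sequences collapse into one sentence type:

```python
from collections import Counter

# Each utterance is a sequence of syntactic categories; identical sequences
# are tokens of a single sentence type. Toy data; the real corpus has
# 2336 types over 21792 tokens.
corpus = [
    ("pro", "aux", "det", "n"),
    ("pro", "aux", "det", "n"),   # a second token of the same type
    ("aux", "pro", "vi"),
]

type_counts = Counter(corpus)
print("sentence tokens:", sum(type_counts.values()))   # 3
print("sentence types: ", len(type_counts))            # 2
```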

Page 15: A Bayesian Approach to the Poverty of the Stimulus

Data: variation

Amount of evidence available at different points in development

Page 16: A Bayesian Approach to the Poverty of the Stimulus

Data: variation

Amount of evidence available at different points in development

Amount comprehended at different points in development

Page 17: A Bayesian Approach to the Poverty of the Stimulus

Data: amount available

Rough estimate – split by age

Epoch   # Files   Age          # types   % types
0       1         2;3          173       7.4%
1       11        2;3 to 2;8   879       38%
2       22        2;3 to 3;1   1295      55%
3       33        2;3 to 3;5   1735      74%
4       44        2;3 to 4;2   2090      89%
5       55        2;3 to 5;2   2336      100%

Page 18: A Bayesian Approach to the Poverty of the Stimulus

Data: amount comprehended

Rough estimate – split by frequency

Level   Frequency   # types   % types   % tokens
1       500+        8         0.3%      28%
2       100+        37        1.6%      55%
3       50+         67        2.9%      64%
4       25+         115       4.9%      71%
5       10+         268       12%       82%
6       1+ (all)    2336      100%      100%

Page 19: A Bayesian Approach to the Poverty of the Stimulus

The model

Data: child-directed speech (CHILDES)

Grammars: linear & hierarchical
- Both: hand-designed & result of local search
- Linear: automatic, unsupervised ML

Evaluation: complexity vs. fit

Page 20: A Bayesian Approach to the Poverty of the Stimulus

Grammar types

Hierarchical:
- Context-free grammar: rules of the form NT → α, where α is any string of terminals and non-terminals (e.g., NT → t NT NT, NT → NT NT, NT → t NT, NT → t)

Linear:
- Regular grammar: rules of the form NT → t NT or NT → t
- "Flat" grammar: a list of each sentence
- 1-state grammar: anything accepted
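To make the distinction concrete, here is a minimal sketch (not the authors' code; the production format and the helper names are my own) that classifies a grammar as flat, regular, or general context-free:

```python
# Productions are (lhs, rhs) pairs where rhs is a tuple of symbols;
# nonterminals are identified by membership in a given set.

def is_regular_production(rhs, nonterms):
    """Matches the slide's regular schema: NT -> t NT or NT -> t."""
    if len(rhs) == 1:
        return rhs[0] not in nonterms                         # NT -> t
    if len(rhs) == 2:
        return rhs[0] not in nonterms and rhs[1] in nonterms  # NT -> t NT
    return False

def grammar_type(productions, nonterms):
    """FLAT: every rule spells out a whole sentence (no nonterminals on any
    right-hand side); REGULAR: every rule is right-linear; otherwise treat
    the grammar as a general (hierarchical) CFG."""
    rhss = [rhs for _, rhs in productions]
    if all(sym not in nonterms for rhs in rhss for sym in rhs):
        return "flat"
    if all(is_regular_production(rhs, nonterms) for rhs in rhss):
        return "regular"
    return "context-free"

# Hypothetical examples over syntactic-category terminals:
nt = {"S", "NP", "VP"}
print(grammar_type([("S", ("pro", "aux", "det", "n"))], nt))          # flat
print(grammar_type([("S", ("pro", "S")), ("S", ("n",))], nt))         # regular
print(grammar_type([("S", ("NP", "VP")), ("NP", ("det", "n"))], nt))  # context-free
```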

Page 21: A Bayesian Approach to the Poverty of the Stimulus

Specific hierarchical grammars: Hand-designed

CFG-S (standard CFG): designed to be as linguistically plausible as possible. 77 rules, 15 non-terminals.

CFG-L (larger CFG): derived from CFG-S; contains additional productions corresponding to different expansions of the same NT (puts less probability mass on recursive productions). 133 rules, 15 non-terminals.

Page 22: A Bayesian Approach to the Poverty of the Stimulus

Specific linear grammars: Hand-designed

FLAT: a list of each sentence. 2336 rules, 0 non-terminals. (Exact fit, no compression.)

1-STATE: anything accepted. 26 rules, 0 non-terminals. (Poor fit, high compression.)

Page 23: A Bayesian Approach to the Poverty of the Stimulus

Specific linear grammars: Hand-designed

Adding REG-N, the narrowest regular grammar derived from the CFG: 289 rules, 85 non-terminals. It sits between FLAT (2336 rules, 0 non-terminals; exact fit, no compression) and 1-STATE (26 rules, 0 non-terminals; poor fit, high compression).

Page 24: A Bayesian Approach to the Poverty of the Stimulus

Specific linear grammars: Hand-designed

Adding REG-M, a mid-level regular grammar derived from the CFG: 169 rules, 14 non-terminals; between REG-N and 1-STATE on the fit/compression spectrum.

Page 25: A Bayesian Approach to the Poverty of the Stimulus

Specific linear grammars: Hand-designed

Adding REG-B, the broadest regular grammar derived from the CFG: 117 rules, 10 non-terminals.

The full spectrum, from exact fit / no compression to poor fit / high compression:
FLAT (2336 rules, 0 non-terminals) → REG-N (289 rules, 85 non-terminals) → REG-M (169 rules, 14 non-terminals) → REG-B (117 rules, 10 non-terminals) → 1-STATE (26 rules, 0 non-terminals)

Page 26: A Bayesian Approach to the Poverty of the Stimulus

Local search around hand-designed grammars

Automated search

Linear: unsupervised, automatic HMM learning (Goldwater & Griffiths, 2007): a Bayesian model for acquisition of a trigram HMM (designed for POS tagging, but given a corpus of syntactic categories it learns a regular grammar)
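In outline (a sketch of a Bayesian trigram HMM in the spirit of Goldwater & Griffiths, 2007, not necessarily their exact parameterization): hidden states t_i generate the observed category sequence c_1 … c_n, with symmetric Dirichlet priors on the transition and emission multinomials:

```latex
t_i \mid t_{i-1}, t_{i-2} \sim \mathrm{Mult}\!\left(\tau^{(t_{i-1},\,t_{i-2})}\right),
\qquad \tau^{(t_{i-1},\,t_{i-2})} \sim \mathrm{Dirichlet}(\alpha)

c_i \mid t_i \sim \mathrm{Mult}\!\left(\omega^{(t_i)}\right),
\qquad \omega^{(t_i)} \sim \mathrm{Dirichlet}(\beta)
```

The transition structure over a finite set of states is exactly a probabilistic regular grammar, which is why the learned HMM slots into the "linear" side of the comparison.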

Page 27: A Bayesian Approach to the Poverty of the Stimulus

The model

Data: child-directed speech (CHILDES)

Grammars: linear & hierarchical; hand-designed & result of local search; linear: automatic, unsupervised ML

Evaluation: complexity vs. fit

Page 28: A Bayesian Approach to the Poverty of the Stimulus

Grammars

A hierarchy: T → G → D

T: type of grammar (context-free, regular, flat, 1-state), with an unbiased (uniform) prior over types
G: specific grammar
D: data

Page 29: A Bayesian Approach to the Poverty of the Stimulus

Grammars

T: type of grammar (context-free, regular, flat, 1-state)
G: specific grammar; P(G | T) measures complexity (prior)
D: data; P(D | G) measures data fit (likelihood)
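Putting the pieces together (a notational sketch consistent with the T → G → D hierarchy above, not necessarily the slide's exact equation), the learner scores a grammar by its posterior probability:

```latex
P(G, T \mid D) \;\propto\;
\underbrace{P(D \mid G)}_{\text{data fit}}\;
\underbrace{P(G \mid T)}_{\text{complexity}}\;
\underbrace{P(T)}_{\text{uniform}}
```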

Page 30: A Bayesian Approach to the Poverty of the Stimulus

Tradeoff: Complexity vs. Fit

Low prior probability = more complex. Low likelihood = poor fit to the data.

[Figure: three example hypotheses along the tradeoff. One has low fit but high simplicity; one has moderate fit and moderate simplicity; one has high fit but low simplicity]

Page 31: A Bayesian Approach to the Poverty of the Stimulus

Measuring complexity: prior

Designing a grammar (God's-eye view): grammars with more rules and non-terminals will have lower prior probability.

n = # of nonterminals
N_i = # of items in production i
P_k = # of productions of nonterminal k
V = vocabulary size
Θ_k = production probability parameters for nonterminal k
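Given these variables, one plausible form for the prior (a reconstruction from the variable list above, assuming simple distributions such as geometrics over the counts; the exact equation on the slide is not recoverable from the transcript):

```latex
P(G \mid T) \;=\; P(n)\,\prod_{k=1}^{n} P(P_k)\, P(\Theta_k)
\prod_{i=1}^{P_k} P(N_i)\left(\frac{1}{V}\right)^{N_i}
```

Each additional nonterminal, production, or right-hand-side item multiplies in another factor less than one, which is why larger grammars receive lower prior probability.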

Page 32: A Bayesian Approach to the Poverty of the Stimulus

Measuring fit: likelihood

The probability of the grammar generating the data: the product of the probability of each parse.

Example: the parse of "pro aux det n" has probability 0.5 × 0.25 × 1.0 × 0.25 × 0.5 ≈ 0.016 (the product of the probabilities of the productions used in the parse).
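The arithmetic is just a product of production probabilities (the values here are the slide's illustrative numbers, not taken from an actual grammar):

```python
import math

# A parse's probability is the product of the probabilities of the
# productions it uses; the slide's parse of "pro aux det n":
production_probs = [0.5, 0.25, 1.0, 0.25, 0.5]
p = math.prod(production_probs)
print(p)   # 0.015625, i.e. ~0.016 as on the slide

# Corpus likelihood multiplies over all sentence tokens; in log space:
log_likelihood = sum(math.log(q) for q in production_probs)
```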

Page 33: A Bayesian Approach to the Poverty of the Stimulus

Plan

Model
- Data: corpus of child-directed speech (CHILDES)
- Grammars: linear & hierarchical; hand-designed & result of local search; linear: automated, unsupervised ML
- Evaluation: complexity vs. fit

Results

Implications

Page 34: A Bayesian Approach to the Poverty of the Stimulus

Results: data split by frequency levels (estimate of comprehension)

Log posterior probability (lower magnitude = better):

Corpus level   FLAT     REG-N    REG-M    REG-B    REG-AUTO   1-ST     CFG-S    CFG-L
1              -116     -119     -119     -119     -125       -135     -161     -176
2              -764     -537     -581     -538     -501       -476     -545     -586
3              -1480    -971     -905     -875     -841       -765     -835     -902
4              -7337    -3284    -2963    -2787    -3011      -3339    -2653    -2784
5              -13466   -5256    -4896    -4772    -5083      -6034    -4545    -4587
6              -85730   -29441   -27300   -27561   -28713     -40360   -27883   -26967

Page 35: A Bayesian Approach to the Poverty of the Stimulus

Results: data split by age (estimate of availability)

Page 36: A Bayesian Approach to the Poverty of the Stimulus

Results: data split by age (estimate of availability)

Log posterior probability (lower magnitude = better):

Corpus epoch   FLAT     REG-N    REG-M    REG-B    REG-AUTO   1-ST     CFG-S    CFG-L
0              -4849    -3181    -2671    -2488    -2422      -2443    -2187    -2312
1              -28778   -11608   -10209   -9891    -11127     -13379   -9673    -9522
2              -44158   -16346   -14972   -14557   -15643     -20594   -14541   -14194
3              -61365   -21757   -20182   -19775   -20332     -28765   -20109   -19527
4              -75570   -26201   -24507   -24193   -24786     -35547   -24706   -23904
5              -85730   -29441   -27300   -27561   -28713     -40360   -27883   -26967

Page 37: A Bayesian Approach to the Poverty of the Stimulus

Generalization: How well does each grammar predict sentences it hasn’t seen?

Page 38: A Bayesian Approach to the Poverty of the Stimulus

Generalization: How well does each grammar predict sentences it hasn’t seen?

Test sentences (the slide's table compared REG-N, REG-M, REG-B, REG-AUTO, 1-ST, CFG-S, and CFG-L, and marked whether each type occurred in the corpus; the per-grammar marks are not recoverable from the transcript):

Simple declarative:     Eagles do fly. (n aux vi)
Simple interrogative:   Do eagles fly? (aux n vi)
Complex declarative:    Eagles that are alive do fly. (n comp aux adj aux vi)
Complex interrogative:  Do eagles that are alive fly? (aux n comp aux adj vi)
Complex interrogative:  *Are eagles that alive do fly? (aux n comp adj aux vi)

The critical comparison is on the complex interrogatives.
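The generalization test asks whether a grammar can parse sentence types absent from its training data. As a minimal sketch of such a membership check (a standard CYK recognizer over a hypothetical toy grammar in Chomsky normal form, not the paper's actual grammars):

```python
def cyk_parses(words, grammar, start="S"):
    """CYK recognition for a grammar in Chomsky normal form.
    grammar maps each LHS nonterminal to a list of RHS tuples:
    either (terminal,) or (NT, NT)."""
    n = len(words)
    # chart[i][length] = set of nonterminals deriving words[i:i+length]
    chart = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, w in enumerate(words):
        for lhs, rhss in grammar.items():
            if (w,) in rhss:
                chart[i][1].add(lhs)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                for lhs, rhss in grammar.items():
                    for rhs in rhss:
                        if (len(rhs) == 2
                                and rhs[0] in chart[i][split]
                                and rhs[1] in chart[i + split][length - split]):
                            chart[i][length].add(lhs)
    return start in chart[0][n]

# Hypothetical toy grammar over syntactic categories (CNF):
toy = {
    "S":   [("NP", "VP")],
    "NP":  [("n",)],
    "VP":  [("AUX", "VI")],
    "AUX": [("aux",)],
    "VI":  [("vi",)],
}
print(cyk_parses(["n", "aux", "vi"], toy))   # True  ("Eagles do fly.")
print(cyk_parses(["aux", "n", "vi"], toy))   # False (not generated by this toy CFG)
```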

Page 39: A Bayesian Approach to the Poverty of the Stimulus

Take-home messages

We have shown that, given reasonable domain-general assumptions, an unbiased rational learner could realize that language has hierarchical structure on the basis of typical child-directed input.

This paradigm is valuable: it makes all assumptions explicit and enables us to rigorously evaluate how different representations capture the tradeoff between simplicity and fit to data.

In some ways, "higher-order" knowledge may be easier to learn than specific details (the "blessing of abstraction").

Page 40: A Bayesian Approach to the Poverty of the Stimulus

Implications for innateness?

This is an ideal-learner analysis. Strong(er) assumptions:
- The learner can find the best grammar in the space of possibilities

Weak(er) assumptions:
- The learner can parse the corpus into syntactic categories
- The learner can represent both linear and hierarchical grammars

Caveats: we assume a particular way of calculating complexity and data fit; and have we actually found representative grammars?

Page 41: A Bayesian Approach to the Poverty of the Stimulus

The End

Thanks also to the following for many helpful discussions: Virginia Savova, Jeff Elman, Danny Fox, Adam Albright, Fei Xu, Mark Johnson, Ken Wexler, Ted Gibson, Sharon Goldwater, Michael Frank, Charles Kemp, Vikash Mansinghka, Noah Goodman

Page 42: A Bayesian Approach to the Poverty of the Stimulus
Page 43: A Bayesian Approach to the Poverty of the Stimulus

Specific linear grammars: Hand-designed (summary)

From exact fit / no compression to poor fit / high compression:
FLAT, a list of each sentence (2336 rules, 0 non-terminals) → REG-N, narrowest regular derived from the CFG (289 rules, 85 non-terminals) → REG-M, mid-level regular derived from the CFG (169 rules, 14 non-terminals) → REG-B, broadest regular derived from the CFG (117 rules, 10 non-terminals) → 1-STATE, anything accepted (26 rules, 0 non-terminals)

Page 44: A Bayesian Approach to the Poverty of the Stimulus

Why these results?

Natural language actually is generated from a grammar that looks more like a CFG. The other grammars (e.g., FLAT) overfit, and therefore do not capture important language-specific generalizations.

Page 45: A Bayesian Approach to the Poverty of the Stimulus
Page 46: A Bayesian Approach to the Poverty of the Stimulus

Computing the prior…

CFG (context-free grammar): productions NT → α, where α is any string of terminals and non-terminals (e.g., NT → t NT, NT → NT NT, NT → t NT NT, NT → NT, NT → t)

REG (regular grammar): productions NT → t NT or NT → t

Page 47: A Bayesian Approach to the Poverty of the Stimulus
Page 48: A Bayesian Approach to the Poverty of the Stimulus

Likelihood, intuitively

Z: rule out because it does not explain some of the data points

X and Y both “explain” the data points, but X is the more likely source

Page 49: A Bayesian Approach to the Poverty of the Stimulus
Page 50: A Bayesian Approach to the Poverty of the Stimulus

Possible empirical tests

Present people with data the model learns FLAT, REG, and CFGs from; see which novel productions they generalize to Non-linguistic? To small children?

Examples of learning regular grammars in real life: does the model do the same?

Page 51: A Bayesian Approach to the Poverty of the Stimulus

Do people learn regular grammars?

Children's songs: line-level grammar

(s1 s2 s3 w1 w1 w1)
Miss Mary Mack, Mack, Mack
All dressed in black, black, black
With silver buttons, buttons, buttons
All down her back, back, back
She asked her mother, mother, mother, …

(X s1 s2 s3)
Spanish dancer, do the splits.
Spanish dancer, give a kick.
Spanish dancer, turn around.

Page 52: A Bayesian Approach to the Poverty of the Stimulus

Do people learn regular grammars?

Children's songs: song-level grammar (X X s1 s2 s3)

Teddy bear, teddy bear, turn around.
Teddy bear, teddy bear, touch the ground.
Teddy bear, teddy bear, show your shoe.
Teddy bear, teddy bear, that will do.
Teddy bear, teddy bear, go upstairs.
…

Bubble gum, bubble gum, chew and blow,
Bubble gum, bubble gum, scrape your toe,
Bubble gum, bubble gum, tastes so sweet,

Dolly Dimple walks like this,
Dolly Dimple talks like this,
Dolly Dimple smiles like this,
Dolly Dimple throws a kiss.

Page 53: A Bayesian Approach to the Poverty of the Stimulus

Do people learn regular grammars?

Songs containing items represented as lists (where order matters):

A my name is Alice
And my husband's name is Arthur,
We come from Alabama,
Where we sell artichokes.
B my name is Barney
And my wife's name is Bridget,
We come from Brooklyn,
Where we sell bicycles.
…

Dough, a thing I buy beer with
Ray, a guy who buys me beer
Me, the one who wants a beer
Fa, a long way to the beer
So, I think I'll have a beer
La, -gers great but so is beer!
Tea, no thanks I'll have a beer
…

Cinderella, dressed in yella,
Went upstairs to kiss a fella,
Made a mistake and kissed a snake,
How many doctors did it take?
1, 2, 3, …

Page 54: A Bayesian Approach to the Poverty of the Stimulus

Do people learn regular grammars?

You put your [body part] inYou put your [body part] outYou put your [body part] inand you shake it all about

You do the hokey pokeyAnd you turn yourself aroundAnd that's what it's all about!

Most of the song is a template, with repeated (varying) element

If I were the marrying kindI thank the lord I'm not sirThe kind of rugger I would beWould be a rugby [position/item] sirCos I'd [verb phrase]And you'd [verb phrase]We'd all [verb phrase] together…

If you’re happy and you know it[verb] your [body part]If you’re happy and you know it then your face will surely show itIf you’re happy and you know it[verb] your [body part]

Page 55: A Bayesian Approach to the Poverty of the Stimulus

Do people learn regular grammars?

Other interesting structures…

There was a farmer had a dog,
And Bingo was his name-O.
B-I-N-G-O!
B-I-N-G-O!
B-I-N-G-O!
And Bingo was his name-O!
(each subsequent verse, replace a letter with a clap)

I know a song that never ends,
It goes on and on my friends,
I know a song that never ends,
And this is how it goes:
(repeat)

Oh, Sir Richard, do not touch me
(each subsequent verse, remove the last word of the sentence)

Page 56: A Bayesian Approach to the Poverty of the Stimulus
Page 57: A Bayesian Approach to the Poverty of the Stimulus

New PRG: 1-state

A single state S with a transition to End; from S, any of det, n, pro, prop, prep, adj, aux, wh, comp, to, v, vi, part can be emitted, and the state loops back to itself.

Log(prior) = 0; no free parameters.

Page 58: A Bayesian Approach to the Poverty of the Stimulus

Another PRG: standard + noise

For instance, a level-1 PRG + noise would be the best regular grammar for the corpus at level 1, plus the 1-state model. This could parse all levels of evidence. Perhaps this would be better than a more complicated PRG at later levels of evidence.

Page 59: A Bayesian Approach to the Poverty of the Stimulus
Page 60: A Bayesian Approach to the Poverty of the Stimulus

Results: frequency levels (comprehension estimates)

Log prior (P) and log likelihood (L), absolute values:

Corpus level   Flat            RG-L           RG-S           CFG-S          CFG-L
               P       L       P      L       P      L       P     L        P      L
1              68      17      116    19      101    18      133   24       164    26
2              405     134     394    146     357    156     313   185      446    176
3              783     281     560    322     475    333     384   401      436    373
4              1509    548     783    627     607    653     491   490      596    709
5              4087    1499    1343   1758    858    1863    541   2078     778    1941
6              51489   18119   5084   24326   1559   25625   681   27289    1330   25754

Log posterior (smaller is better):

Corpus level   Flat     RG-L     RG-S     CFG-S    CFG-L
1              85       135      119      157      188
2              539      540      513      498      622
3              1064     882      808      785      809
4              2055     1410     1260     1260     1305
5              5586     3101     2721     2619     2719
6              69607    29410    27184    27970    27084

Page 61: A Bayesian Approach to the Poverty of the Stimulus

Results: availability by age

Log prior (P) and log likelihood (L), absolute values:

Period   Flat            RG-L           RG-S           CFG-S         CFG-L
         P       L       P      L       P      L       P     L       P      L
0        2839    891     1457   1260    843    1342    552   1498    808    1425
1        16831   5959    3360   7804    1349   8291    667   8879    1175   8373
2        26063   9272    3748   12168   1464   12891   674   13785   1273   13006
3        36575   12932   4313   17185   1493   18123   681   19406   1296   18280
4        45292   15969   4681   21376   1521   22536   681   24059   1296   22674
5        51489   18119   5084   24326   1559   25625   681   27289   1330   25754

Log posterior (smaller is better):

Period   Flat     RG-L     RG-S     CFG-S    CFG-L
0        3730     2717     2185     2050     2233
1        22790    11164    9640     9546     9548
2        35335    15916    14335    14459    14289
3        49507    21498    19616    20087    19576
4        61261    26057    24057    24740    23970
5        69607    29410    27184    27970    27084

Page 62: A Bayesian Approach to the Poverty of the Stimulus
Page 63: A Bayesian Approach to the Poverty of the Stimulus

Specific grammars of each type

One type of hand-designed grammar: 69 productions, 14 nonterminals; 390 productions, 85 nonterminals

Page 64: A Bayesian Approach to the Poverty of the Stimulus

Specific grammars of each type

The other type of hand-designed grammar: 126 productions, 14 nonterminals; 170 productions, 14 nonterminals

Page 65: A Bayesian Approach to the Poverty of the Stimulus
Page 66: A Bayesian Approach to the Poverty of the Stimulus

The Argument from the Poverty of the Stimulus (PoS)

P1. It is impossible to have made some generalization G simply on the basis of data D.
P2. Children show behavior B.
P3. Behavior B is not possible without having made G.
C1. Some constraints T, which limit what type of generalizations G are possible, must be innate.

Here: G = a specific grammar; D = typical child-directed speech input; B = children don't make certain mistakes (they don't seem to entertain structure-independent hypotheses); T = language has hierarchical phrase structure.

Page 67: A Bayesian Approach to the Poverty of the Stimulus

#1: Children hear complex interrogatives

Well, a few, but not many (Legate & Yang, 2002):
- Adam (CHILDES) – 0.048%: no yes/no questions; four wh-questions (e.g., "What is the music it's playing?")
- Nina (CHILDES) – 0.068%: no yes/no questions; 14 wh-questions

In all, most estimates are << 1% of input.

Page 68: A Bayesian Approach to the Poverty of the Stimulus

Well, a few, but not many (Legate & Yang, 2002): Adam (CHILDES) – 0.048%; Nina (CHILDES) – 0.068%; most estimates are << 1% of input.

How much is "enough"?

#1: Children hear complex interrogatives

Page 69: A Bayesian Approach to the Poverty of the Stimulus

#2: Can get the behavior without structure

There is enough statistical information in the input to conclude which type of complex interrogative is ungrammatical (Reali & Christiansen, 2004; Lewis & Elman, 2001).

Rare: comp adj aux. Common: comp aux adj.
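In the spirit of that statistical-learning reply (a sketch with made-up counts, not Reali & Christiansen's corpus or model), simple n-gram statistics over category sequences already separate the two forms:

```python
from collections import Counter

# Toy category-sequence corpus (hypothetical); the real studies use CHILDES.
corpus = [
    ["n", "comp", "aux", "adj", "aux", "vi"],   # "Eagles that are alive do fly."
    ["pro", "aux", "adj"],
    ["aux", "n", "vi"],
]

bigrams = Counter(
    (a, b) for sent in corpus for a, b in zip(sent, sent[1:])
)
# The grammatical complex interrogative contains the attested bigram
# ("comp", "aux"); the ungrammatical one requires the unattested ("comp", "adj").
print(bigrams[("comp", "aux")])   # 1
print(bigrams[("comp", "adj")])   # 0
```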

Page 70: A Bayesian Approach to the Poverty of the Stimulus

Response: there is enough statistical information in the input to conclude that "Are eagles that alive can fly?" is ungrammatical (Reali & Christiansen, 2004; Lewis & Elman, 2001). Rare: comp adj aux. Common: comp aux adj.

But this sidesteps the question: it does not address the innateness of structure (knowledge X), and it is explanatorily opaque.

#2: Can get the behavior without structure

Page 71: A Bayesian Approach to the Poverty of the Stimulus

Why do linguists believe that language has hierarchical phrase structure?

Formal properties plus an information-theoretic, simplicity-based argument (Chomsky, 1956):

- A sentence S has an (i, j) dependency if replacement of the ith symbol a_i of S by b_i requires a corresponding replacement of the jth symbol a_j of S by b_j.
- If S has an m-termed dependency set in L, at least 2^m states are necessary in the finite-state grammar that generates L. Therefore, if L is a finite-state language, there is an m such that no sentence S of L has a dependency set of more than m terms in L.
- The "mirror language", made up of sentences consisting of a string X followed by X in reverse (e.g., aa, abba, babbab, aabbaa, etc.), has the property that for any m we can find a dependency set D = {(1, 2m), (2, 2m-1), …, (m, m+1)}. Therefore it cannot be captured by any finite-state grammar.
- English has infinite sets of sentences with dependency sets of more than any fixed number of terms. E.g., in "the man who said that S5 is arriving today," there is a dependency between "man" and "is." Therefore English cannot be finite-state.
- A possible counterargument: since any finite corpus could be captured by a finite-state grammar, English is only non-finite-state in the limit; in practice, it could be finite-state. The easy reply is simplicity considerations. Chomsky: "If the processes have a limit, then the construction of a finite-state grammar will not be literally impossible (since a list is a trivial finite-state grammar), but this grammar will be so complex as to be of little use or interest."

Page 72: A Bayesian Approach to the Poverty of the Stimulus

The big picture

[Diagram: Innate vs. Learned]

Page 73: A Bayesian Approach to the Poverty of the Stimulus

Grammar Acquisition (Chomsky)

[Diagram: Innate vs. Learned]

Page 74: A Bayesian Approach to the Poverty of the Stimulus

The Argument from the Poverty of the Stimulus (PoS)

P1. Children show behavior B.

Page 75: A Bayesian Approach to the Poverty of the Stimulus

The Argument from the Poverty of the Stimulus (PoS)

P1. Children show behavior B.
P2. Behavior B is not possible without having some specific grammar or rule G.

Page 76: A Bayesian Approach to the Poverty of the Stimulus

The Argument from the Poverty of the Stimulus (PoS)

P1. Children show behavior B.
P2. Behavior B is not possible without having some specific grammar or rule G.
P3. It is impossible to have learned G simply on the basis of data D.

Page 77: A Bayesian Approach to the Poverty of the Stimulus

The Argument from the Poverty of the Stimulus (PoS)

P1. Children show behavior B.
P2. Behavior B is not possible without having some specific grammar or rule G.
P3. It is impossible to have learned G simply on the basis of data D.
C1. Some constraints T, which limit what type of grammars are possible, must be innate.

Page 78: A Bayesian Approach to the Poverty of the Stimulus

Replies to the PoS argument

P1. It is impossible to have made some generalization G simply on the basis of data D.
P2. Children show behavior B.
P3. Behavior B is not possible without having made G.
C1. Some constraints T, which limit what type of generalizations G are possible, must be innate.

Reply to P1: there are enough complex interrogatives in D (e.g., Pullum & Scholz, 2002).

Page 79: A Bayesian Approach to the Poverty of the Stimulus

Replies to the PoS argument

Reply to P1: there are enough complex interrogatives in D (Pullum & Scholz, 2002).

Reply to P3: there is a route to B other than G, namely statistical learning (e.g., Lewis & Elman, 2001; Reali & Christiansen, 2005).

Page 80: A Bayesian Approach to the Poverty of the Stimulus

[Diagram: Innate vs. Learned]

Page 81: A Bayesian Approach to the Poverty of the Stimulus

[Diagram: 2x2 grid, Innate vs. Learned crossed with Explicit structure vs. No explicit structure]

Page 82: A Bayesian Approach to the Poverty of the Stimulus

[Diagram: 2x2 grid, Innate vs. Learned crossed with Explicit structure vs. No explicit structure]
