macwhinney rethinking logical problem


Rethinking the Logical Problem of Language Acquisition

Brian MacWhinney

Carnegie Mellon University

[The child’s acquisition of grammar] is hopelessly underdetermined by the fragmentary evidence available.

-- Chomsky (1968), Language and Mind

Abstract

The study of child language acquisition is dominated by three major competing visions: socialization theory, learning theory, and nativist theory. Each takes a different approach to a core issue in developmental psycholinguistics known as the logical problem of language acquisition (LPLA). This paper argues that the LPLA is composed of two only partially related sub-problems. The first form of the LPLA emphasizes recovery from overgeneralization. Nativists claim that, contrary to the claims of socialization theory, recovery occurs without corrective feedback, under the guidance of innate constraints. Learning theory presents five plausible and interesting alternatives to constraints: conservatism, indirect negative evidence, competition, cue construction, and monitoring. The second form of the LPLA focuses on error-free processes in acquisition. The nativist claim is that error-free performance shows that the child understands the possible shape of human language. However, error-free performance can also arise from these same five learning mechanisms. The availability of so many mechanisms for addressing the logical problem indicates that it is time to view recovery from overgeneralization and error-free learning not as logical problems, but as evidence for the collaboration of acquisitional supports. We now need to specify the interactions of mechanisms derived from each of the three major competing visions. Emergentist models (MacWhinney, 1999c) present a particularly promising framework for specifying this integration, if they can develop richer linguistic representations and make fuller use of data on spontaneous conversational interactions.

Three Approaches to Child Language Learning

The study of child language acquisition is dominated by three major competing visions: socialization theory, learning theory, and nativist theory. Socialization theory holds that language is acquired from social interactions. Nativist theory holds that language is innately derived from a series of genetically programmed modules. Learning theory holds that language is acquired from the detection of patterns in the input. Each of these theories is committed to providing fundamental accounts for all of the core phenomena of language acquisition. Among these core phenomena, one that has been a particular focus of theoretical attention is the capacity for recovery from overgeneralization.

Overgeneralization and the subsequent recovery from overgeneralization are common processes in the normal course of language acquisition. Sometime during the first few years, every normally developing English-speaking child will produce an overgeneralization like “goed” or “ated.” We can be sure that children will also learn to stop making these errors. Each of the three major competing visions tells a very different story about how and why this recovery occurs. This paper will examine the assumptions underlying these stories, calling into question each of the currently accepted approaches to this issue and suggesting an alternative approach grounded in the notion of multiple supports for language learning.


1. Socialization Theory

The oldest and most widely held approach to language acquisition is socialization theory. This approach focuses on the role of caregivers as sources of social wisdom. Children are viewed as novices who are learning to act like others so that they can communicate their desires. The earliest articulation of this point of view was provided by St. Augustine in his Confessions, where he described the ways in which he crudely negotiated the meanings of words with his elders in order to express his wills and desires.

This I remember; and have since observed how I learned to speak. It was not that my elders taught me words (as, soon after, other learning) in any set method; but I, longing by cries and broken accents and various motions of my limbs to express my thoughts, that so I might have my will, and yet unable to express all I willed or to whom I willed, did myself, by the understanding which Thou, my God, gavest me, practise the sounds in my memory. When they named anything, and as they spoke turned towards it, I saw and remembered that they called what they would point out by the name they uttered. And that they meant this thing, and no other, was plain from the motion of their body, the natural language, as it were, of all nations, expressed by the countenance, glances of the eye, gestures of the limbs, and tones of the voice, indicating the affections of the mind as it pursues, possesses, rejects, or shuns. And, thus, by constantly hearing words, as they occurred in various sentences, I collected gradually for what they stood; and, having broken in my mouth to these signs, I thereby gave utterance to my will. Thus, I exchanged with those about me these current signs of our wills, and so launched deeper into the stormy intercourse of human life, yet depending on parental authority and the beck of elders.

This view of language as a negotiated expression of will fits in well with the views of many developmental psychologists (Bruner, 1978; Ervin-Tripp, 1981; Moerk, 1983; Snow, 1995; Tomasello, 1999), social anthropologists (Heath, 1983; Hymes, 1964; Ochs, 1985; Scollon, 1976), and functional linguists (Chafe, 1987; Givón, 1979). Perhaps the strongest version of socialization theory is the position advocated by Hopper (1987), who suggests that grammar emerges directly from social interaction. In child language, we see that the first uses of grammatical forms are often tightly linked to discourse contexts. For example, Schieffelin (1985) showed that the emergence of the Kaluli ergative is confined to highly transitive uses in particular conversational contexts. Similarly, Idiazabal (in press) shows that the first uses of the perfective in Basque appear within narrative structures.

Socialization theory emphasizes the developmental importance of corrective feedback. In many middle-class families, those “magic moments” in which the parent provides corrective feedback occur hundreds of times each day. Let us look at one of the most often cited of these moments, as reported by McNeill (1966):

Child: Nobody don’t like me.

Mother: No, say “Nobody likes me.”

Child: Nobody don’t like me.

(dialogue repeated eight times)

Mother: Now listen carefully, say “Nobody likes me.”

Child: Oh! Nobody don’t likeS me.

Brown and Hanlon (1970) examined data from Adam, Eve, and Sarah (Brown, 1973) for evidence of learning during these magic moments. They found that correction was more often for meaning than for form and that, when the parent did provide formal correction, the child did not immediately echo it.

Further research has modified this initial assessment. Typically, corrective feedback is not so overt as in the example from McNeill. Instead, parents rely on more subtle forms of recasting. Parents tend to provide corrections for form most often when the child’s utterance is very close to the adult standard, often containing only one error (Bohannon, MacWhinney, & Snow, 1990; Bohannon & Stanowicz, 1988). Children who receive corrective feedback in the form of recasts tend to learn the corrected structures more quickly (Farrar, 1992; Nelson, 1982; Nelson, Denninger, Bonvillian, Kaplan, & Baker, 1984). A very general finding of this research is that the type of feedback that parents provide to their children is finely tuned to the developmental stage of the child’s grammar (Demetras, Post, & Snow, 1986; Hirsh-Pasek, Treiman, & Schneiderman, 1984; Morgan, Bonamo, & Travis, 1995; Penner, 1987; Post, 1994; Snow, 1995; Sokolov, 1993; Sokolov & MacWhinney, 1990).

It makes sense for a parent to provide some form of corrective feedback. However, unless the feedback is extremely stereotypic, the child may have trouble interpreting it as an overt correction (Marcus, 1993; Saxton, 1997). Consider the most extreme and clearest form of corrective feedback: every time the child makes a grammatical mistake, the parent claps his hands and says “ungrammatical.” If a parent were to provide absolutely obvious and uniform negative evidence in this way, interactions would look like this:

Child: Me want more.

Father: Ungrammatical.

Child: Want more milk.

Father: Ungrammatical.

Child: More milk!

Father: Ungrammatical.

Child: (cries)

Father: Ungrammatical.

However, parents cannot interact with their children in this unresponsive way. If they are to provide any form of feedback, it needs to be through recasting and expansion, rather than overt correction. Here is a more plausible interaction:

Child: Me want more.

Father: You want more? More what?

Child: Want more milk.

Father: You want more milk?

Child: More milk!

Father: Sure, honey, I’ll get you some more.

Child: (cries)

Father: Now don’t cry. Daddy is getting you some.

The parent’s main goal in providing feedback to the child is not the provision of negative evidence, but the extraction of the child’s meaning and the maintenance of a successful interaction. When one thinks a bit about the language learning process, this makes sense. If the parent started from the beginning by providing uniform negative feedback to all ungrammatical sentences, virtually all of the child’s first 1000 utterances would be marked with the word “ungrammatical” and an eyebrow raise or a clap of the hands. The child would learn little from this process except perhaps to avoid communicating with a person who provides nothing but raised eyebrows.

Proponents of socialization theory can argue that feedback need not be provided in this absolute fashion. Rather, both parents and children could obey the principles of signal detection theory by maximizing “hits” and “correct rejections,” while minimizing “misses” and “false alarms.” However, Marcus (1993) has shown that the actual distribution of individual types of feedback, such as recasts or expansions, across specific syntactic constructions is so noisy that a huge amount of feedback would be required before the child could establish a sufficient level of confidence to know that a given construction is either correct or incorrect.
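The force of Marcus’s point can be illustrated with a back-of-the-envelope calculation. This sketch is not Marcus’s own analysis: it assumes, purely hypothetically, that a recast follows an ungrammatical use of a construction with probability 0.20 and a grammatical use with probability 0.12, and asks how many uses of that construction the child would need, on average, before the accumulated evidence favors “ungrammatical” at 19:1 odds. The expected evidence contributed by each use is the Kullback-Leibler divergence between the two feedback rates.

```python
import math


def bernoulli_kl(p: float, q: float) -> float:
    """KL divergence (in nats) between Bernoulli(p) and Bernoulli(q)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def exposures_needed(p_after_error: float, p_after_correct: float,
                     odds_threshold: float = 19.0) -> int:
    """Approximate uses of one construction needed before the expected
    log-likelihood ratio reaches log(odds_threshold).

    Assumes the construction really is ungrammatical, so each use
    independently triggers a recast with probability p_after_error.
    """
    evidence_per_use = bernoulli_kl(p_after_error, p_after_correct)
    return math.ceil(math.log(odds_threshold) / evidence_per_use)


# Hypothetical rates: recasts follow 20% of errors but also 12% of correct uses.
print(exposures_needed(0.20, 0.12))
```

With these made-up rates, the answer is over a hundred uses of the same construction, and it grows quickly as the two feedback rates approach each other.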

Unfortunately, much of the discussion of the use of negative evidence has tended to underestimate the information-processing abilities of the child. Most models assume that the child relies on a single cue, rather than a combination of cues. By integrating a variety of cues with differential cue validities (Anderson, 1982; MacWhinney & Bates, 1989; Massaro, 1987), the child could establish an overall “negative feedback index” for each utterance. Some of the cues that could be integrated include overt correction, recasting, expansions, clarification questions, topic continuation, proxemics, gesture, and intonation. If the child could put together all of this information, there might be enough parental feedback to tag sentences as grammatical or ungrammatical.
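One way to picture such a “negative feedback index” is sketched below. The combination rule is an assumption: it borrows the two-alternative form of Massaro’s (1987) fuzzy logical model, in which support values are multiplied and then normalized. The cue names and support values are hypothetical, not estimates from any corpus.

```python
from math import prod


def negative_feedback_index(cue_support: dict[str, float]) -> float:
    """Combine per-cue support for 'this utterance was ungrammatical'.

    Each value is a hypothetical degree of support in (0, 1); 0.5 is neutral.
    Support for 'ungrammatical' is the product of the values, support for
    'grammatical' is the product of their complements, and the index is the
    relative goodness of 'ungrammatical'.
    """
    ungrammatical = prod(cue_support.values())
    grammatical = prod(1.0 - s for s in cue_support.values())
    return ungrammatical / (ungrammatical + grammatical)


# Hypothetical cues observed after one child utterance.
observed = {"recast": 0.70, "clarification_question": 0.65}
print(round(negative_feedback_index(observed), 4))  # 0.8125

# A cue pointing the other way (the parent simply continues the topic) lowers the index.
observed["topic_continuation"] = 0.30
print(round(negative_feedback_index(observed), 4))  # 0.65
```

Neither cue alone is decisive, but their combination pushes the index well above the neutral value of 0.5, which is the intuition behind integrating many weak cues.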

Socialization theory emphasizes the importance of tutoring, scaffolding, and corrective feedback as cues that guide the child through every step of linguistic socialization. This view tends to minimize the importance of a priori hypotheses while maximizing the impact of the structure of the sociolinguistic environment. Socialization theory places a great emphasis on the “here and now” as the wellspring of grammatical learning. Because it assigns no particular role to memory or off-line hypothesis checking, socialization theory views linguistic input as having a direct and immediate effect on language learning. For the issue of recovery from overgeneralization, the finding that would provide the strongest support for socialization theory is one that shows direct links between parental feedback and recovery from overgeneralization. Evidence that parental feedback plays no direct role in language acquisition would strike at the heart of socialization theory by weakening its conceptual underpinnings and markedly limiting its scope.

2. Learning Theory

The second major vision of the process of language learning is that espoused by the empiricists and associationists. The roots of empiricism go back to Aristotle and the Skeptics in ancient Greece. In the modern period, philosophers such as Locke, Hume, and Berkeley outlined the general shape of associationist psychology. In the period between the two world wars, associationist thinking was a dominant theme in American psychology. During that period, associationism became closely linked to behaviorism, particularly in the work of Skinner, Hull, and Thorndike.

In modern times, associationist thinking has reemerged, without its earlier behaviorist linkages, in the context of connectionism or neural network modeling. For our current concerns, one of the most interesting connectionist models is the account of past tense learning explored by Rumelhart and McClelland (1986), Plunkett and Marchman (1991), and MacWhinney and Leinbach (1991). Connectionism tends to be rather agnostic on the issue of the Poverty of the Stimulus. Some connectionist models, such as those trained by back propagation, rely heavily on corrective feedback. Others, such as Competitive Learning and Adaptive Resonance Theory (ART), learn from positive data alone. Despite this apparent eclecticism and agnosticism, the Logical Problem of Language Acquisition is just as much a problem for connectionists as it is for nativists and social environmentalists. Not all learning mechanisms must be expressed in neural networks. However, all of the mechanisms we will consider here could be expressed in neural network terms.


3. Generativist Theory

The third major approach to language acquisition is Chomsky’s generativist theory. At the core of this view of language learning is the rejection of the behaviorist premises of socialization theory. Chomsky (1965) has argued that natural language input is full of retracings, errors, and slips of the tongue. Because the input is degenerate in these various ways, the child should find it difficult to acquire grammatical rules by simple induction across the input. This analysis of the learning challenge (Chomsky, 1980) has been called the “Argument from Poverty of the Stimulus.” According to this argument, no child can build an adult language out of such degenerate input without being guided by a rich set of species-specific innate hypotheses. These hypotheses are encoded genetically as specifications for the shape of the language organ. Some refer to this issue as the “Logical Problem of Language Acquisition” (LPLA) (Baker, 1979), while others have called it “Plato’s Problem,” “Chomsky’s Problem,” “Gold’s Problem,” or “Baker’s Paradox.”

The LPLA has served as the fundamental motivation for an enormous body of research on both first and second language acquisition in the generativist framework. The argumentation in this work takes one of two forms. The first form focuses on the process of recovery from overgeneralization.

1. A linguistic structure is presented.

2. It is shown that children sometimes overgeneralize the use of this structure. This demonstrates that the structure is not being learned by rote.

3. It is argued that there is not enough evidence in the “stimulus” to force recovery from overgeneralization.

4. Therefore, the observed recovery from overgeneralization must be due to some innate mechanisms.

We can think of this argument as the argument from recovery. We will refer to it as the LPLA #1.


The second form of the nativist argument focuses on certain grammatical features that the child putatively produces without any errors at all. For this second type of phenomenon, the argument is as follows:

1. A linguistic structure is presented.

2. It is shown that children use this structure without ever making mistakes.

3. This structure is shown to be so rare that children never encounter it in the input.

4. Therefore, the observed correct performance and avoidance of alternative incorrect performances must be due to innate mechanisms.

We can think of this analysis as the argument from lack of error. We will refer to it as the LPLA #2.

Both of these forms of the LPLA are arguments from Poverty of the Stimulus. However, they differ in terms of the nature of the child’s performances. In the first case, the child produces errors and then recovers. In the second case, the child never makes an error in the first place. LPLA #1 and LPLA #2 have played very different roles in the literature. In his articulations of the theory of Principles and Parameters (P&P), Chomsky has tended to emphasize the importance of lack of error and LPLA #2. However, empirical studies of child language acquisition have tended more to emphasize recovery from overgeneralization and LPLA #1. A failure to clearly distinguish these two very different lines of argumentation has led to some confusion in this literature. Therefore, one of our goals here will be to clarify this distinction and to analyze each of the arguments separately.

The LPLA #1: Recovery from overgeneralization

Although these two forms of the LPLA look at very different language phenomena, both are grounded conceptually in a formal analysis presented by Gold (1967). Gold contrasted two different language-learning situations: text presentation and informant presentation. With informant presentation, the language learner can receive feedback from an infallible informant regarding the grammaticality of every candidate sentence. This corrective feedback is called “negative evidence,” and it only requires that ungrammatical strings be clearly identified as unacceptable. Whenever the learner formulates an overly general guess about some particular linguistic structure, the informant will label the resulting structure as ungrammatical and the learner will use this information to restrict the developing grammar. In the case of text presentation, the learner only receives information on acceptable sentences, and no information regarding ungrammaticality is available. Gold showed that, with only text presentation, languages with reasonably complex grammars, such as those that have phrase structure rules, are not learnable. Nativists have gone on to argue that, since language is not learnable from input in this way, it must be innate, in the sense that the child must already have identified the basic shape of possible grammars before any learning begins.

Gold’s proof is formulated in terms of the abstract objects of recursive function theory. However, it only takes a little rephrasing to see how the proof can be applied directly to the actual process of language learning. The child can be viewed as the learner and the adult can be viewed as the informant. It does not matter for Gold’s argument whether the child or the adult is the source of a given string. What is important is only the shape of the feedback associated with that string. In text presentation, no feedback can occur, so the following interaction types are possible:

Utterance                   Feedback   Result
1. Child says, “went.”      none       none
2. Child says, “*goed.”     none       none
3. Adult says, “went.”      none       positive data

The only information that the child receives in these sequences is positive data, since there is no feedback regarding the child’s own productions. In sequence #1, there is no information presented regarding the acceptability of “went.” However, sequence #3 does provide this positive evidence by allowing the “text” to include acceptable sequences. In sequence #2, there is no information presented regarding the unacceptability of “goed.” Moreover, the adult “text presentation” will never produce the form “goed.” Therefore, the child has no direct way of knowing that “goed” is ungrammatical.

Unlike text presentation, informant presentation provides feedback. The strongest form of feedback provides positive feedback for grammatical utterances and negative feedback for ungrammatical utterances. If the child makes an error, it will be marked by a signal from the adult. The adult can also produce the error directly, along with information signaling that the error is ungrammatical. The provision of these signals is the responsibility of the adult. In the informant presentation scenario, there are four types of possible sequences:

Utterance                   Adult Feedback   Result
1. Child says, “went.”      Good             Positive data
2. Child says, “*goed.”     Bad              Corrective feedback
3. Adult says, “went.”      Good             Positive data
4. Adult says, “*goed.”     Bad              Corrective feedback

There is no attested example in the literature of a sequence like #4, in which a parent spontaneously produces a random error just to have an opportunity to mark it as ungrammatical. Although such sequences never actually occur, they would fit in well with the Gold framework if they did. However, in Gold’s framework, a sequence like #2 is functionally equivalent to a sequence like #4, so the absence of #4 does not affect the analysis. In cases like sequence #3, the provision of positive feedback is not necessary, since the child can reasonably assume that most forms produced by the adult are grammatical. Of course, adults will occasionally make errors. However, at the level of the lexical item and the construction, the notion that adult input is correct is a good working assumption. To implement this fully, the child may need to filter out false starts and retracings, and just store away words, constructions, and sentences that are clear, unretraced, and fully comprehended. Once this is done, the child can then treat all remaining adult forms as positive evidence.


With text presentation, if the learner formulates an overly general hypothesis, there is no way to exclude that general hypothesis. Consider a very simple example in which the learner is given a corpus of regular present and past tense verbs, along with a few verbs that have irregular past tense forms. Using the regular past tense examples, the learner will induce a grammar that adds “-ed” to the end of the present tense. This rule will then produce the overgeneralized form “goed.” Without information regarding the ungrammaticality of “goed,” the learner will never be able to recover from this overgeneralization and will never learn to restrict the language to the smaller grammar that produces just “went.” Thus, the grammar induced by this process will forever remain too big, since it will include both “goed” and “went.”

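[Figure: The correct grammar, containing forms such as “went” and “jumped,” is a subset of the overly general grammar, which also contains forms such as “goed,” “runned,” “falled,” and “wented.”]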
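The contrast between the two presentation modes can be illustrated with a toy learner. This is a minimal sketch under simplifying assumptions: a “grammar” is just the set of past-tense forms the learner accepts, the learner starts from the overgeneral “-ed” rule, and the three-verb lexicon is hypothetical.

```python
STEMS = ["jump", "walk", "go"]  # hypothetical toy lexicon


class PastTenseLearner:
    """Toy learner whose grammar is simply the set of past forms it accepts."""

    def __init__(self, stems):
        # Overly general starting hypothesis: add "-ed" to every stem, so "goed" is in.
        self.accepted = {stem + "ed" for stem in stems}

    def hear(self, form):
        # Positive data (text presentation) can only enlarge the grammar.
        self.accepted.add(form)

    def corrected(self, form):
        # Negative evidence (informant presentation) can prune the grammar.
        self.accepted.discard(form)


adult_input = ["jumped", "walked", "went"]

text_learner = PastTenseLearner(STEMS)
for form in adult_input:
    text_learner.hear(form)

informant_learner = PastTenseLearner(STEMS)
for form in adult_input:
    informant_learner.hear(form)
informant_learner.corrected("goed")  # the informant marks "*goed" as bad

print(sorted(text_learner.accepted))       # ['goed', 'jumped', 'walked', 'went']
print(sorted(informant_learner.accepted))  # ['jumped', 'walked', 'went']
```

However much positive input the text learner receives, nothing in it removes “goed”; only the explicit correction available under informant presentation does.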


Gold showed that this problem occurs inevitably for the learning of all but the simplest forms of language. If the set of languages being explored includes only finite languages generated by finite-state machines (i.e., languages generated by regular, Markov processes), text presentation is adequate. To see why this is true, consider a simple finite-state grammar such as Grammar (1):

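(1) [Figure: finite-state network with nodes start, A, B, C, D, and end, and transitions start → A → B → D → end and A → C → end.]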

This grammar will generate the strings ABD and AC. If we add the string ACD to the positive evidence in the input, the learner will add a new connection to the grammar to permit the additional string. The result will be Grammar (2):

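(2) [Figure: the network in (1) with one added transition, C → D, so that the path start → A → C → D → end is also possible.]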

Learning involves the addition of new connections or transitions between nodes and no cutting or rewiring of old transitions. New positive strings always lead to the addition of new transitions. There is no way for a finite grammar of this type to overgeneralize or overgenerate, since it is simply an organized summary of the information in the input strings. Basically, the learning of a finite-state grammar is a very conservative, data-based process.
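The idea of a finite-state grammar as an “organized summary” of the input can be made concrete with a prefix-tree acceptor, a standard starting point in grammatical inference. This is an assumption on our part: unlike the networks in (1) and (2), which reuse node D, a prefix tree gives each observed prefix its own node, which guarantees that the network accepts exactly the strings it was given.

```python
from __future__ import annotations


class PrefixTreeAcceptor:
    """Conservative finite-state learner: one path per observed string."""

    def __init__(self) -> None:
        self.children: dict[str, PrefixTreeAcceptor] = {}
        self.accepting = False

    def add(self, string: str) -> None:
        # Learning only adds transitions and marks a final state; nothing is rewired.
        node = self
        for symbol in string:
            if symbol not in node.children:
                node.children[symbol] = PrefixTreeAcceptor()
            node = node.children[symbol]
        node.accepting = True

    def language(self, prefix: str = "") -> set[str]:
        # Enumerate every string the network accepts.
        strings = {prefix} if self.accepting else set()
        for symbol, child in self.children.items():
            strings |= child.language(prefix + symbol)
        return strings


grammar = PrefixTreeAcceptor()
for s in ["ABD", "AC"]:
    grammar.add(s)
print(sorted(grammar.language()))  # ['ABD', 'AC']

grammar.add("ACD")  # new positive evidence adds the transition needed for ACD
print(sorted(grammar.language()))  # ['ABD', 'AC', 'ACD']
```

Because each new string only extends the tree, the accepted language grows by exactly the strings heard and never overgenerates.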

However, if the set of possible grammars that may be confronting the child includes all possible finite grammars of this type as well as at least one non-finite grammar, Gold shows that the correct grammar cannot be induced from text presentation. For example, one non-finite grammar that is consistent with the strings ABD, AC, and ACD is Grammar (3):

(3) S -> AP + (BP)

AP -> A + (C)

BP -> (B) + D

The problem with this grammar is that it will also generate the ungrammatical string ACBD. Since the learner will never be told that ACBD is ungrammatical, there will be no way to reject the nonfinite grammar and no way to settle on the correct grammar indicated in (2).

Gold’s proof relates to the case in which a child is willing to consider all possible finite-state grammars along with just one nonfinite grammar. One might object that this is a rather bizarre limitation. However, Gold selected this configuration only to illustrate the problem in its simplest form. One could equally well imagine that the child is examining the utility of many alternative non-finite grammars, along with the basic finite-state summaries of the input. If one allows the child to hypothesize multiple possible nonfinite grammars, the problem only gets worse. In this second scenario, the child could induce Grammar (4):

(4) S -> AP + (BP) + (C) + (D)

AP -> A

BP -> BD

This second nonfinite grammar would generate illegal strings such as ABDCD or ABDD. If the child goes down the road of formulating all manner of non-finite grammars, it is difficult to constrain this process to just a particular grammar. In fact, the child might well formulate both (3) and (4) as alternatives. Given this, and given the a priori commitment to view language identification as deterministic, many linguists and psycholinguists have accepted Gold’s analysis and used it as the foundation stone upon which to build further analyses.
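Because Grammars (3) and (4) contain no recursive rules, their languages can be enumerated by brute force. The sketch below assumes that parenthesized items in the rules are optional and that symbols without a rule are terminals. It checks that each grammar contains every observed string and lists what each one generates beyond them.

```python
from itertools import product

# Right-hand sides as (symbol, optional) pairs; parentheses in (3) and (4) mark optional items.
GRAMMAR_3 = {
    "S": [("AP", False), ("BP", True)],
    "AP": [("A", False), ("C", True)],
    "BP": [("B", True), ("D", False)],
}
GRAMMAR_4 = {
    "S": [("AP", False), ("BP", True), ("C", True), ("D", True)],
    "AP": [("A", False)],
    "BP": [("B", False), ("D", False)],
}


def language(symbol, grammar):
    """All strings derivable from symbol (valid only for non-recursive grammars)."""
    if symbol not in grammar:
        return {symbol}  # terminal symbol
    slots = []
    for child, optional in grammar[symbol]:
        choices = set(language(child, grammar))
        if optional:
            choices.add("")  # an optional item may be left out
        slots.append(choices)
    return {"".join(parts) for parts in product(*slots)}


observed = {"ABD", "AC", "ACD"}  # the positive data, i.e., the strings of Grammar (2)
for name, grammar in [("(3)", GRAMMAR_3), ("(4)", GRAMMAR_4)]:
    generated = language("S", grammar)
    print(name, observed <= generated, sorted(generated - observed))
# (3) True ['A', 'ACBD', 'AD']
# (4) True ['A', 'ABDC', 'ABDCD', 'ABDD', 'AD']
```

Both grammars are consistent with the same positive data, so text presentation alone can neither decide between them nor rule either one out; the extra strings include ACBD for (3) and ABDCD and ABDD for (4).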

When coupled with certain additional forms of argumentation, this logical problem of language acquisition (LPLA) has functioned as a major conceptual pillar supporting current work in generative linguistics, language acquisition theory, and second language acquisition theory.

Solving the LPLA #1 through conservatism

The most direct way for a language learner to solve Gold’s problem is to avoid formulating overly general grammars in the first place. If the child never overgeneralizes, there is no problem of recovery from overgeneralization and no need for negative evidence or corrective feedback. In the examples presented above, the conservative child would avoid formulating Grammars (3) and (4) and never go beyond a finite-state grammar. To ensure that this happens, the child simply has to avoid constructing a grammar with greater than finite-state complexity.

This first solution to the LPLA #1 emphasizes the child’s obedience to the Subset Principle of Angluin (1980) or Fodor and Crain (1987). The Subset Principle requires the child to avoid overgeneralization by always sticking with the most conservative grammar. It stipulates that grammars are ordered in a subset relation such that the child explores the more restrictive grammar first before even considering the less restrictive one. In essence, the Subset Principle says that the child is conservative.
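The Subset Principle can be stated as a selection rule over candidate grammars. In this sketch, each candidate is assumed to be represented by the finite set of strings it generates (the sets for Grammars (3) and (4) follow from their rules above). The learner keeps only grammars consistent with the data and then picks one that has no consistent proper subset.

```python
# Candidate grammars represented by the strings they generate.
CANDIDATES = {
    "Grammar (2)": {"ABD", "AC", "ACD"},
    "Grammar (3)": {"A", "AD", "AC", "ACD", "ABD", "ACBD"},
    "Grammar (4)": {"A", "AD", "AC", "ACD", "ABD", "ABDD", "ABDC", "ABDCD"},
}


def subset_choice(candidates: dict[str, set[str]], data: set[str]) -> list[str]:
    """Return the consistent grammars that have no consistent proper subset."""
    # Keep only grammars that generate every string in the positive data.
    consistent = {name: lang for name, lang in candidates.items() if data <= lang}
    # Discard any grammar that strictly contains another consistent grammar.
    return [name for name, lang in consistent.items()
            if not any(other < lang for other in consistent.values())]


print(subset_choice(CANDIDATES, {"ABD", "AC", "ACD"}))          # ['Grammar (2)']
print(subset_choice(CANDIDATES, {"ABD", "AC", "ACD", "ACBD"}))  # ['Grammar (3)']
```

On the strings ABD, AC, and ACD, the rule picks Grammar (2); the learner moves to the larger Grammar (3) only if ACBD is actually heard, so no retreat from an overly general grammar is ever required.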

Virtually all accounts of language learning assume some degree of conservatism in the child’s approach to rule induction. Many children are able to avoid falling into the trap of overgeneralization by using linguistic forms cautiously and conservatively. For example, if a child avoids using a verb with dative movement until that verb is detected in a sentence with dative movement, dative movement overgeneralization will never occur. Conservative learners can learn without negative evidence, because they never make errors. This means that they never actually go beyond the data given. Baker (1981), Fodor and Crain (1987), Maratsos, Kuczaj, Fox, and Chalkley (1979), and others have emphasized the extent to which syntactic learning can proceed conservatively, often avoiding the need for negative evidence. Wolfe-Quintero (1992) has shown that conservatism can account for learners’ acquisition of the sentence patterns that have been used to motivate the subjacency constraint and its related parameter. For example, she notes that second language learners acquire these positive contexts for wh-movement in this order:

What did the little girl hit __ with the block today?

What did the boy play with __ behind his mother?

What did the boy read a story about __ this morning?

Because they are proceeding conservatively, learners never produce forms such as:

*What did the boy with ___ read a story this morning?

They never hear this structure in the input and never hypothesize a grammar that includes it. As a result, they never make overgeneralizations and never attempt wh-movement in this particular context. Data from Maratsos, Kuczaj, Fox, and Chalkley (1979) suggest that this same analysis may also apply to first language learners.

Many child language researchers have emphasized the importance of item-based constructions in acquisition (Braine, 1976; Lieven, Pine, & Baldwin, 1997; MacWhinney, 1975, 1982; Tomasello, 1992). If the child formulates and applies these patterns conservatively, overgeneralization will be minimized. For example, a common overgeneralization at age 3 involves the frequent verb “say.” Children will ask parents to “say me that story” instead of “tell me that story.” However, conservative children will not make this error, since they will only use the verb “say” in exactly the way it was used in the input. In the terms of MacWhinney (1982, 1988), conservative children will learn a finite-state transition network centered on the lexical item “tell.” This network accepts (or generates) an NP in the role of “speaker” in preverbal position, an NP in the role of “listener” in postverbal position, and an NP in the role of “story” in the post-postverbal slot. A second network is used to produce the periphrastic dative, as in “tell that story to me.” These two networks can then be joined into a single item-based finite-state grammar that operates on narrowly defined lexical categories. Children can learn this item-based grammar using positive data only. They can also learn a similar network for the verb “say.” However, for that network, there is only the periphrastic dative. Moreover, for the verb “say,” the category of the NP in postverbal position is defined semantically as a short verbalization, rather than a longer story. This means that to minimize the possibility of error here, the child has to be conservative in three ways:

1. The child needs to formulate each syntactic combination as an item-based pattern.

2. Each item-based pattern needs to record the exact semantic status of each positive instance of an argument in a particular grammatical configuration (MacWhinney, 1988).

3. Attempts to use the item-based pattern with new arguments must be closely guided by the semantics of previously encountered positive instances.

If the child has a good memory and applies this method cautiously, overgeneralization will be minimized.
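The three kinds of conservatism listed above can be sketched as a learner that stores item-based patterns. The frame labels (such as "V NP NP"), the semantic types (such as "listener" and "short_verbalization"), and the example sentences in the comments are hypothetical simplifications, and the preverbal speaker slot is left out. This is an illustration, not MacWhinney’s (1982, 1988) formalism.

```python
from collections import defaultdict


class ItemBasedLearner:
    """Records, for each verb and frame, the semantic types heard in each slot."""

    def __init__(self) -> None:
        # verb -> frame -> slot position -> set of semantic types attested there
        self.patterns = defaultdict(lambda: defaultdict(lambda: defaultdict(set)))

    def learn(self, verb: str, frame: str, arg_types: list[str]) -> None:
        # Positive instance: store the exact semantic type of each argument.
        for position, sem_type in enumerate(arg_types):
            self.patterns[verb][frame][position].add(sem_type)

    def licensed(self, verb: str, frame: str, arg_types: list[str]) -> bool:
        # Conservative use: the frame must be attested for this verb, and each
        # argument must match a semantic type already heard in that slot.
        frames = self.patterns.get(verb, {})
        if frame not in frames:
            return False
        slots = frames[frame]
        return all(sem_type in slots.get(position, set())
                   for position, sem_type in enumerate(arg_types))


child = ItemBasedLearner()
child.learn("tell", "V NP NP", ["listener", "story"])                  # tell me that story
child.learn("tell", "V NP to NP", ["story", "listener"])               # tell that story to me
child.learn("say", "V NP to NP", ["short_verbalization", "listener"])  # say "bye" to me

print(child.licensed("tell", "V NP NP", ["listener", "story"]))    # True
print(child.licensed("say", "V NP NP", ["listener", "story"]))     # False: *say me that story
print(child.licensed("say", "V NP to NP", ["story", "listener"]))  # False: a story is not a short verbalization
```

The double-object frame is never recorded for “say,” so “say me that story” is blocked without any negative evidence; the semantic check in the second slot plays the role of the third kind of conservatism.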

Conservatism can be viewed as a powerful mechanism for addressing the LPLA. However, it is better understood as one of several crucial supports for successful acquisition. Children will eventually go “beyond the information given” and produce the occasional error (Jespersen, 1922). However, by blending a certain level of conservatism with other supports for successful acquisition, the child can make optimal progress in language learning.

Solving the LPLA #1 by recovering from overgeneralization

Even if the child minimizes error through conservatism, successful learning will require

some form of negative evidence. The logic of Gold’s proof cannot be avoided. When the

child overgeneralizes, some force must prune back that overgeneralization. However,

researchers (Marcus, 1993) have often mistakenly assumed that negative evidence is

equivalent to overt parental correction. This is only true if the learner has no ability to

construct secondary comparisons across the positive input. If we modify the Gold scenario

by providing the learner with the ability to construct searches across the input, there are at

least four ways to compute negative evidence from positive instances. These four processes

are: competition, cue construction, monitoring, and indirect negative evidence.

1. Competition

Psychological theories have often referred to the notion of competition (Freud, 1958;

Herbart, 1891). In the area of language acquisition, MacWhinney (1978) used competition

to account for the interplay between “rote” and “analogy” in learning morphophonology.

This mechanism was later generalized to all levels of linguistic processing in the form of the

Competition Model (MacWhinney, 1988; MacWhinney & Bates, 1989). In the 1990s, the

Competition Model was further elaborated in terms of neural network theory.

The Competition Model views overgeneralizations as arising from three types of

pressures. The first is the underlying analogic pressure that produces the overgeneralization.

The second pressure is the growth in the rote episodic auditory representation of a correct

form. This representation grows slowly in strength over time, as the form is repeatedly

encountered in the input data. The third pressure is the competition

of analogy with rote. Consider the case of “*goed” and “went” viewed diagrammatically.

The overgeneralization “*goed” is supported by analogy. It competes against the weak rote

form “went,” which is supported by auditory memory:

[Diagram: go + PAST is linked to two competing outputs, “went” (episodic/rote support) and “go + ed” (analogic pressure).]

As the strength of the rote auditory form for “went” grows, it begins to win out in the

competition against the analogic form “*goed”. Finally, the error is eliminated.i
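
The interplay of the three pressures can be illustrated numerically. This is a toy sketch with assumed parameters and is not the MacWhinney and Leinbach (1991) network. Analogic pressure toward “*goed” is held at a fixed value, and the rote strength of “went” grows with each hearing along an assumed saturating curve. Whichever source of support is stronger determines the child’s output.

```python
def rote_strength(exposures, rate=0.05):
    """Episodic/rote strength of "went" after n hearings.
    Each hearing closes a fixed fraction of the gap to 1.0 (assumed curve)."""
    strength = 0.0
    for _ in range(exposures):
        strength += rate * (1.0 - strength)
    return strength


def past_of_go(exposures, analogic_pressure=0.6):
    """Winning past-tense form for "go": the rote form wins only once its
    strength exceeds the (constant, assumed) analogic pressure toward "-ed"."""
    if rote_strength(exposures) > analogic_pressure:
        return "went"
    return "*goed"


for n in (0, 10, 17, 18, 40):
    print(n, round(rote_strength(n), 3), past_of_go(n))
```

With these particular values, the switch from “*goed” to “went” happens at the eighteenth hearing. The exact number means nothing; the point is only that recovery follows from the strengthening of the correct form, with no corrective feedback.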

Saxton (1997) has emphasized the ways in which competition operates directly during

conversation. He argues that, “When the child produces an utterance containing an

erroneous form, which is responded to immediately with an utterance containing the correct

adult alternative to the erroneous form (i.e. when negative evidence is supplied), then the

child may perceive the adult form as being in contrast with the equivalent child form.

Cognizance of a relevant contrast can then form the basis for perceiving the adult form as a

correct alternative to the child form.” (p. 155). Saxton refers to this juxtaposition as the

Direct Contrast hypothesis. A paradigmatic example of a Direct Contrast exchange for

Saxton would be:

Child: Well, I feeled it.

Adult: I felt it.

Child: I felt it.

As Saxton notes, the child is aware of the existence of both “felt” and “feeled” and uses

the parental data to reinforce the strength of the former. Thus, Saxton’s Direct Contrast

account is equivalent to the Competition Model account (MacWhinney, 1993). To test the

idea experimentally, Saxton (1997; 1998) has conducted training experiments with

novel irregular past tense forms. His studies clearly demonstrate the efficacy of providing

correct models that are closely tuned to the child’s own productions (Bohannon et al., 1990;

Bohannon & Stanowicz, 1988).

If the learner is sufficiently conservative, learning will be close to error free. In this

account, conservatism works by placing relatively more reliance on episodic/rote support

and discounting the influences of analogic pressure. Errors will only occur in cases where

analogy is strongly in competition with rote. Generalizing away from the particular example

given above, the general schema for competition looks like this:

[Diagram: a meaning is linked to two competing words, one backed by episodic support and the other by analogic pressure.]

The competition between two candidate forms is governed by the strength of their episodic

auditory representations. In the case of the competition between “*goed” and “went”, the

overgeneralized form has little episodic auditory strength, since it is heard seldom if at all in

the input. Although “*goed” lacks auditory support, it has strong analogic support from

the general pattern for past tense formation (MacWhinney & Leinbach, 1991). In the

Competition Model, analogic pressure stimulates overgeneralization and episodic auditory

encoding reins it in. The analogic pressure hypothesized in this account has been described

in detail in several connectionist models of morphophonological learning. The models that

most closely implement the type of competition being described here are the models of

MacWhinney and Leinbach (1991) for English and MacWhinney, Leinbach, Taraban, and

McDonald (1989) for German. In these models, there is a pressure for regularization

according to the general pattern that produces forms such as “*goed” and “*ranned”. In

addition, there are weaker gang effects that lead to overgeneralizations such as “*stang” for

the past tense of “sting”.

Morphological Competition

Bowerman (1987) has suggested that recovery from overgeneralizations such as

“*unsqueeze” is particularly problematic for a Competition Model account. To make this

example concrete, let us imagine that “*unsqueeze” is being used to refer to the voluntary

opening of a clenched fist. In this case, likely competitors include “release” or “let go.”

Because there is no rote auditory support for “*unsqueeze,” forms like “release” or “let

go” will eventually compete against and eliminate this particular error.

Several semantic cues support this process of recovery. In particular, inanimate objects

such as rubber balls and sponges cannot be “*unsqueezed” in the same way that they can

be “squeezed.” Squeezing is only reversible if we focus on the action of the body part

doing the squeezing, not the object being squeezed. Or consider the competition between

“*unapprove” and “disapprove”. We might imagine that a mortgage loan application that

was initially approved could later be “unapproved.” At that point, we

would still not have heard “unapproved” actually supported by input data, but there would

be less direct competition with “disapprove.” Forces that minimize the competition between

meanings can help an overgeneralization survive long enough for it to begin to carve out its

own “ecological niche” (MacWhinney, 1989).

Lexical Competition

The same logic that can be used to account for recovery from morphological

overgeneralizations can be used to account for recovery from lexical overgeneralizations.

For example, a child may overgeneralize the word “kitty” to refer to tigers and lions. The

child will eventually learn the correct names for these animals and restrict the

overgeneralized form. The same three forces are at work here: analogic pressure,

competition, and episodic encoding. Although the child has never actually seen a “kitty”

that looks like a tiger, there are enough shared features to license the generalization. If the

parent supplies the name “tiger,” there is a new episodic encoding which then begins to

compete with the analogic pressure. If no new name is supplied, the child may still begin to

accumulate some negative evidence, noting that this particular use of “kitty” is not yet

confirmed in the input.

Merriman (1999) has shown how the linking of competition to a theory of attentional

focusing can account for the major empirical findings in the literature on Mutual Exclusivity

(Markman, 1989), or the tendency to treat each object as having only one name. By treating

this constraint as an emergent bias, we avoid a variety of empirical problems (MacWhinney,

1991). Since competition is implemented probabilistically through fuzzy logic (Massaro,

1987) or connectionist nets, it only imposes a bias, rather than a fixed constraint. The

probabilistic basis for competition allows the child to deal with hierarchical category

structure without having to enforce major conceptual reorganization (Carey, 1985).

Competition may initially lead a child to avoid referring to a “robin” as a “bird,” since the

form “robin” would be a direct match. However, in some contexts “bird” does not compete

directly with “robin.” These include reference to a collection of different types of birds that

may include robins, reference to an object that cannot be clearly identified as a robin, or

anaphoric reference to an item that was earlier mentioned as a “robin.”

Syntactic Frame Competition

Overgeneralizations in syntax arise when a valency pattern common to a large group of

verbs is incorrectly overextended to a new verb. This type of overextension has been

analyzed in both distributed networks (Miikkulainen & Mayberry, 1999) and interactive

activation networks (MacDonald, Pearlmutter, & Seidenberg, 1994; MacWhinney, 1987).

These networks demonstrate the same gang effects and generalizations found in networks

for morphological forms (Plunkett & Marchman, 1993) and spelling correspondences

(Plaut, McClelland, Seidenberg, & Patterson, 1996). If a word shares a variety of semantic

features with a group of other words, it will be treated syntactically as a member of the

group.

Consider the example of overgeneralizations of dative movement. Verbs like “give”,

“send”, and “ship” all share a set of semantic features involving the transfer of an object

through some physical medium. In this regard, they are quite close to a verb like “deliver”

and the three-argument group exerts strong analogic pressure on the verb “deliver”.

However, dative movement only applies to certain frequent, monosyllabic transfer verbs and

not to multisyllabic, Latinate forms with a less transitive semantics such as “deliver” or

“recommend.” When children overgeneralize and say, “Tom delivered the library the

book,” they are being influenced by the underlying analogic pressure of the group of

transfer verbs that permit dative movement. In effect, the child has created a new argument

frame for the verb “deliver.” The first argument frame only specifies two arguments – a

subject or “giver” and an object or “thing transferred.” The new lexical entry specifies

three arguments. These two homophonous entries for “deliver” are now in competition,

just as “*goed” and “went” were in competition. Like the entry for “*goed”, the three-

place entry for “deliver” has good analogic support, but no support from episodic

encoding derived from the input. Over time, it loses its competition with the two-argument

form of “deliver”; its progressive weakening, together with the strengthening of the competing

form, leads to recovery from overgeneralization. Thus, the analysis of recovery from “Tom

delivered the library the book” is identical to the analysis of recovery from “*goed”.

2. Cue construction

Most recovery from overgeneralization relies on competition. However, competition will

eventually encounter limits in its ability to deal with the fine details of grammatical patterns.

To illustrate these limits, consider the case of recovery from causative overgeneralizations

such as “*I untied my shoes loose”. This particular extension receives analogic support

from verbs like “shake” or “kick” which permit “I shook my shoes loose” or “I kicked

my shoes loose.” It appears that the child is not initially tuned in to the fine details of these

semantic classifications. Bowerman (1988) has suggested that the process of recovery from

overgeneralization may lead the child to construct new features to block overgeneralization.

We can refer to this process as “cue construction.”

Recovering from other causative overgeneralizations may also require cue construction.

For example, an error such as “*The gardener watered the tulips flat” can be attributed to a

derivational pattern which yields three-argument verbs from “hammer” or “rake”, as in

“The gardener raked the grass flat.” Source-goal overgeneralization can also fit into this

framework. Consider “*The maid poured the tub with water” instead of “The maid

poured water into the tub” and “*The maid filled water into the tub” instead of “The maid

filled the tub with water”. In each case, the analogic pressure from one group of words

leads to the establishment of a case frame that is incorrect for a particular verb. Although

this competition could be handled just by the strengthening of the correct patterns, it seems

likely that the child also needs to clarify the shape of the semantic features that unify the

“pour” verbs and the “fill” verbs.

Bowerman (personal communication) provides an even more challenging example. One

can say “The customers drove the taxi driver crazy,” but not “*The customers drove the

taxi driver sad.” The error involves an overgeneralization of the exact shape of the

resultative adjective. A connectionist model of the three-argument case frame for “drive”

would determine not only that certain verbs license a third possible argument, but also what

the exact semantic shape of that argument can be. In the case of the standard pattern for

verbs like “drive”, the resultant state must be terminative, rather than transient. To express

this within the Competition Model context, we would need to have a competition between a

confirmed three-argument form for “drive” and a looser overgeneral form based only on

analogic pressure. A similar competition account can be used to account for recovery from

an error such as, “*The workers unloaded the truck empty” which contrasts with “The

workers loaded the truck full”. In both of these cases, analogic pressure seems weak, since

examples of such errors are extremely rare in the language learning literature.

The actual modeling of these competitions in a neural network will require detailed

lexical work and extensive corpus analysis. A sketch of the types of models that will be

required is given in MacWhinney (1999a).

3. Monitoring

The Competition Model holds that, over time, correct forms gain strength from

encounters with positive exemplars and that this increasing strength leads them to drive out

incorrect forms. In the terms of Gold’s analysis, this strengthening of correct forms can

guarantee the learnability of language. However, by itself, competition does not fully

account for the dynamics of language processing in real social interactions. Consider a

standard self-correction such as “I gived, uh, gave my friend a peach.” Here the correct

form “gave” is activated in real time just after the production of the overgeneralization.

MacWhinney (1978) and Elbers (1993) have treated this type of self-correction as involving

“expressive monitoring” in which the child listens to her own output, compares the correct

weak rote form with the incorrect overgeneralization, and attempts to block the output of the

incorrect form. One possible outcome of expressive monitoring is the strengthening of the

weak rote form and weakening of the analogic forms. Exactly how this is implemented will

vary from model to model.

In general, retraced false starts move from incorrect forms to correct forms, indicating

that the incorrect forms are produced quickly, whereas the correct rote forms take time to

activate. Kawamoto (1994) has shown how a recurrent connectionist network can simulate

exactly these timing asymmetries between analogic and rote retrieval. For example,

Kawamoto’s model captures the experimental finding that incorrect regularized

pronunciations of “pint” to rhyme with “hint” are produced faster than correct irregular

pronunciations.

An even more powerful learning mechanism is what MacWhinney (1978) called

“receptive monitoring.” If the child shadows input structures closely, he will be able to pick

up many discrepancies between his own productive system and the forms he hears. Berwick

(1987) found that a great deal of syntactic learning can be driven by the attempt to extract

meaning during comprehension. Whenever the child cannot parse an input sentence, the

failure to parse can be used as a means of expanding the grammar. The kind of analysis

through synthesis that occurs in some parsing systems can make powerful use of positive

instances to establish new syntactic frames. Receptive monitoring can also be used to

recover from overgeneralization. The child may monitor the form “went” in the input and

attempt to use his own grammar to match that input. If the result of the receptive monitoring

is “*goed”, the child can use the mismatch to reset the weights in the analogic system to

avoid future overgeneralizations.
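
Receptive monitoring turns a positive instance into an internal error signal. The sketch below is a hypothetical illustration. The names produce and monitor are invented, and the per-verb analogic weight, the 0.5 threshold, and the multiplicative reduction are all assumptions. The regular rule is also simplified to plain “-ed” suffixation.

```python
def produce(verb, weights, rote, threshold=0.5):
    """Child's own past tense: the analogic "-ed" rule applies unless its
    weight for this verb has fallen below threshold and a rote form exists."""
    if weights.get(verb, 1.0) >= threshold or verb not in rote:
        return verb + "ed"
    return rote[verb]


def monitor(verb, heard_past, weights, rote, rate=0.3):
    """Shadow a heard past-tense form: store it, try to match it with the
    child's own grammar, and weaken the analogic weight on a mismatch."""
    rote[verb] = heard_past
    own = produce(verb, weights, rote)
    if own != heard_past:
        weights[verb] = weights.get(verb, 1.0) * (1.0 - rate)
    return own


weights, rote = {}, {}
for _ in range(3):
    print(monitor("go", "went", weights, rote))  # own output on each hearing
print(monitor("walk", "walked", weights, rote))  # regular verb: no mismatch
```

Here the child’s own output is “goed” on the first two hearings of “went” and “went” from the third hearing on, while the regular verb “walk” never produces a mismatch. No one corrects the child; the mismatch is computed from positive input alone.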

Neural network models that rely on back-propagation assume that negative evidence is

continually available for every learning trial. This assumption is clearly much too strong.

However, not all connectionist models rely on the availability of negative evidence. For

example, Kohonen’s self-organizing feature map model (Miikkulainen, 1993) learns

linguistic patterns simply from co-occurrences in the data with no reliance on negative

evidence.

4. Indirect Negative Evidence

Another interesting approach to the LPLA involves the examination of the input corpus

to compute indirect negative evidence. This computation can be illustrated with the error

“*goed.” To construct indirect negative evidence in this case, children need to track:

1. The frequency of all verbs.

2. The frequency of the past tense as marked by the regular “-ed.”

3. The ratio of (2) over (1).

4. The frequency of the verb “go.”

5. The predicted frequency of the form “*goed” as the product of (3) times (4).

6. The actual frequency of “*goed” in the input.

If (5) exceeds (6) by some specified threshold, then children can conclude that the form

“*goed” is excluded by the grammar. They can do this without ever receiving overt

correction from the informant.
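
The six steps can be written out directly. This is a minimal sketch that assumes the input has already been reduced to (lemma, surface form) pairs for verb tokens. It also assumes that any form equal to the lemma plus “ed” is a regular past and uses an arbitrary difference threshold. The corpus counts are invented for illustration.

```python
def indirect_negative_evidence(corpus, lemma, candidate, threshold=2.0):
    """corpus: list of (lemma, surface_form) verb tokens (assumed pre-tagged).
    Returns quantities (3), (5), (6) and whether `candidate` is judged excluded."""
    all_verbs = len(corpus)                                              # (1)
    regular_past = sum(1 for lem, form in corpus if form == lem + "ed")  # (2)
    ratio = regular_past / all_verbs                                     # (3)
    lemma_freq = sum(1 for lem, _ in corpus if lem == lemma)             # (4)
    predicted = ratio * lemma_freq                                       # (5)
    actual = sum(1 for _, form in corpus if form == candidate)           # (6)
    return {"ratio": ratio, "predicted": predicted, "actual": actual,
            "excluded": predicted - actual > threshold}


# Invented toy input: 40 verb tokens, 9 of them regular "-ed" pasts.
corpus = ([("walk", "walk")] * 6 + [("walk", "walked")] * 4
          + [("play", "plays")] * 5 + [("play", "played")] * 5
          + [("go", "go")] * 8 + [("go", "went")] * 12)
print(indirect_negative_evidence(corpus, "go", "goed"))
```

Here 9 of 40 verb tokens are regular pasts and “go” occurs 20 times, so about 4.5 tokens of “goed” would be expected. None occur, so the form is rejected. The counts in (1) and (2) range over the whole verb class, and this is exactly the bookkeeping whose psychological reality is at issue.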

Arguments based on this analysis have been offered by Chomsky (1981), Lasnik

(1989), Braine (1989) and others. In logical terms, indirect negative evidence is an

interesting solution to the LPLA. However, there is little actual evidence that children keep

track of the facts they would need to perform this computation. For elements (1) and (2)

above, it might be sufficient to only track the relative frequency of the present and the past

for a few core verbs. However, some frequency tracking of the general class must be done.

A neural network model or some other generalization mechanism could compute (3) and

(5). Moreover, the frequency tracking in (4) and (6) is something that most learning models

will have to assume in any case. The real question for this approach is whether children

actually compute anything like (1) and (2). Recent evidence for a slow rise in generalization

abilities before age 3 (Pine, Lieven, & Rowland, 1998; Tomasello, 2000) suggests that

indirect negative evidence might well be available to older children, but probably not to

younger children.

Interestingly, the structures for which indirect negative evidence provides the most

useful accounts are ones that are learned rather late. These typically involve the LPLA #2,

rather than the LPLA #1. For example, the learner could compute indirect negative evidence

that would block wh-raising from object-modifying relatives in sentences such as:

The police arrested the thieves who were carrying the loot.

*What did the police arrest the thieves who were carrying?

To do this, they would need to track the frequency of sentences such as:

Bill thought the thieves were carrying the loot.

What did Bill think the thieves were carrying?

Noting that raising from predicate complements occurs fairly frequently, children can

reasonably conclude that the absence of raising from object modification position means

that it is ungrammatical. Coupled with conservatism, indirect negative evidence could be a

powerful mechanism for avoiding overgeneralization of complex syntactic

structures. Unfortunately, we have little direct evidence demonstrating that either children or

adults compute indirect negative evidence in the way suggested above. One problem faced

by the indirect negative evidence account is that the child would need to know beforehand

which structures to include in the ratio. For example, the child would need to know that the

frequency of raising in relatives needs to be compared with the frequency of raising in

complements. However, if learning is item-based, as suggested earlier, this comparison

could be restricted to structures potentially involving a particular lexical item such as

“what” or “where.” This suggests that the computation of indirect negative evidence may

be partially linked to the same item-based mechanisms that support conservatism.

The Competition Model account can also be extended to compute indirect negative

evidence. The indirect negative evidence tracker could note that, although “squeeze” occurs

frequently in the input, “*unsqueeze” does not. Diagrammatically, this mechanism works

through the juxtaposition of a form receiving episodic support (“squeeze”) with a predicted

derived form (“*unsqueeze”).

[Diagram: “squeeze” (episodic/rote support) yields the analogic prediction “(unsqueeze)”; gap tracking compares the two and leaves the prediction unconfirmed.]

This mechanism uses analogic pressure to predict the form “*unsqueeze.” This is the

same mechanism as used in the generation of “*goed.” However, the child does not need

to actually produce “*unsqueeze,” only to hypothesize its existence. This form is then

tracked in the input. If it is not found, the comparison of the near-zero strength of the

unconfirmed form “unsqueeze” with the confirmed form “squeeze” leads to the

strengthening of competitors such as “release” and blocking of any attempts to use

“unsqueeze.” Although this mechanism is plausible, it is more complicated than the basic

competition mechanism and places a greater requirement on memory for tracking of non-

occurrences. Since the end result of this tracking of indirect negative evidence is the same as

that of the basic competition mechanism, it is reasonable to imagine that learners use this

mechanism only as a fallback strategy, relying on simple competition for most problems

with overgeneralization.
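
The gap-tracking mechanism can be sketched as a per-item counter. This is a hypothetical illustration. The GapTracker class, the reversive “un-” prediction rule, and the minimum of 20 hearings of the base before a gap is trusted are all assumptions. Unlike simple competition, the tracker must store the predicted form in order to count its non-occurrence.

```python
from collections import Counter


class GapTracker:
    """Hypothetical tracker for analogically predicted but unattested forms."""

    def __init__(self, min_base_count=20):
        self.counts = Counter()
        self.min_base_count = min_base_count

    def hear(self, word):
        self.counts[word] += 1

    @staticmethod
    def predict(base):
        # Analogic prediction from the reversive pattern (tie -> untie).
        return "un" + base

    def blocked(self, base):
        """Block the prediction once the base is well attested (episodic
        support) but the predicted form has never been heard (the gap)."""
        return (self.counts[base] >= self.min_base_count
                and self.counts[self.predict(base)] == 0)

    def choose(self, base, competitor):
        """Use the competitor (e.g. "release") if the prediction is blocked."""
        return competitor if self.blocked(base) else self.predict(base)


tracker = GapTracker()
for word, n in [("squeeze", 25), ("tie", 30), ("untie", 3)]:
    for _ in range(n):
        tracker.hear(word)
print(tracker.choose("squeeze", "release"))  # gap confirmed: competitor wins
print(tracker.choose("tie", "loosen"))       # "untie" is attested: no block
```

The extra state, a count for every predicted form, is the added memory cost that makes this mechanism more demanding than plain competition.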

Solving the LPLA #1 by recharacterizing the target

A less direct, but equally effective, method of solving the LPLA #1 involves a

recharacterization of the shape of the target grammar. Gold’s analysis shows that, if the

child hypothesizes a language with more than finite state complexity, negative evidence will

be needed to recover from overgeneralization. However, if we provide a characterization of

language that stays within the bounds set by this proof, then we can assume that children are

capable of learning language through simple positive data. In that case, the LPLA #1

essentially vanishes. There are five ways we can achieve this type of recharacterization. The

first involves the postulation of a set of innate constraints, as in Principles and Parameters

(P&P) Theory. A second involves the imposition of a strict ordering on the set of

constraints, as in Optimality Theory (OT). A third approach views constraints not as innate,

but as emergent. A fourth recharacterization involves providing alternative characterizations

of the formal shape of the target grammar. The fifth involves a recharacterization of the end-

state of language learning as probabilistic, rather than deterministic. Let us examine each of

these five recharacterizations.

1. Innate constraints

Generativists argue that children solve the LPLA by obeying innate constraints on the

shape of possible grammars that they consider. Viewed historically, the constraints imposed

by the child have played a large role in the development of generative theory. For example,

early on, generativists realized that, even with informant presentation, the child could not

learn a full transformational grammar of the type proposed in Chomsky (1957). The

problem at that time was a technical one, since the transformational component of the

grammar could be characterized and ordered in so many alternative ways that it was

essentially impossible to know which form was uniquely correct, even with negative

evidence. The solution was to constrain the shape and ordering of transformations

(Chomsky, 1973). For example, permutations were eliminated, since they could be

formulated as combinations of additions and deletions.

Pursuing this line of thinking, Wexler and Culicover (1980) showed that constraints

such as subjacency could allow children to acquire a transformational grammar, as long as

some types of negative evidence were provided. Their demonstration depended on the fact

that subjacency limited the depth to which the child would have to track interrelations

between syntactic roles across clauses. Lightfoot (1989) then showed that the child could

acquire nearly all of the important rules of the language from non-embedded structures. He

called this degree-0 learnability.

Over the last four decades, each new version of generative grammar has brought with it a

new vision of the innate constraints that provide the child with prior guidance about the

shape of human language. In the 1980s, these constraints involved parameterized principles

contained in a series of modules. Children were thought to begin learning with the

parameters set for some default value and would only change this default setting if they

encountered some triggering linguistic structure (Jespersen, 1922; Matthews &

Demopoulos, 1989).

The learning of marked parameters in the theory of Principles and Parameters (P&P)

can avoid the LPLA #1 if three conditions are met. First, there must be a small set of

possible parameters constituting the set of possible human languages. Second, there must

be a clear specification of the unmarked settings of these parameters. Third, there must be a

clear specification of the surface structure triggers that would lead the child to move from an

unmarked parameter setting to a marked parameter setting for each of the hypothesized

parameters. Despite two decades of work within the framework of P&P, none of these three

conditions has yet been met. Nonetheless, researchers in the P&P tradition remain

optimistic about the program, as well as its newer articulation in the minimalist framework.
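
The architecture that these three conditions presuppose can be sketched as a trigger-driven learner. This is a schematic illustration that assumes all three conditions have been met. The parameter names, the choice of which value counts as unmarked, and the surface triggers are hypothetical placeholders, not proposals from the P&P literature.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Parameter:
    name: str
    unmarked: str                     # default setting (condition 2)
    marked: str
    trigger: Callable[[dict], bool]   # surface-structure trigger (condition 3)


def set_parameters(parameters, sentences):
    """Start at the unmarked values; switch a parameter to its marked value
    only when a triggering sentence is encountered. Positive data only."""
    setting = {p.name: p.unmarked for p in parameters}
    for s in sentences:
        for p in parameters:
            if setting[p.name] == p.unmarked and p.trigger(s):
                setting[p.name] = p.marked
    return setting


# Hypothetical parameters (condition 1); the unmarked values are illustrative only.
PARAMS = [
    Parameter("subject", "obligatory", "optional",
              lambda s: s["finite"] and not s["overt_subject"]),
    Parameter("order", "VO", "OV", lambda s: s["object_before_verb"]),
]

input_a = [{"finite": True, "overt_subject": True, "object_before_verb": False}] * 5
input_b = input_a + [{"finite": True, "overt_subject": False, "object_before_verb": False}]
print(set_parameters(PARAMS, input_a))
print(set_parameters(PARAMS, input_b))
```

The learner never retreats from a marked setting, so its success depends entirely on the parameter list, the defaults, and the triggers. These are precisely the pieces that, as noted above, have not yet been specified.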

Chomsky (1981) has noted that the P&P view of language acquisition leads directly to a

trivial solution to the LPLA. However, there has not yet been any general acceptance of this

view among generative linguists (Osherson, Stob, & Weinstein, 1989) or child language

researchers (Pinker, 1984).

2. Strict constraint ordering

Like P&P, Optimality Theory (OT) views language structure as arising from the

application of a universal set of constraints. Learning a particular language is basically just

the learning of the correct ordering of the constraints in this universal set. The fullest

articulation of OT has been in the area of phonology, where Tesar and Smolensky (2000)

have offered a formal proof of the learnability of OT phonology without negative evidence.

Initially, one might think that this demonstration has little bearing on the learnability

of grammar. However, OT has now also been applied to

syntax (Barbosa, Fox, Hagstrom, McGinnis, & Pesetsky, 1997). Moreover, as Pulleybank

and Turkel (1997) observe, OT faces the same learnability problems in phonology and

syntax.

Although both P&P and OT emphasize the role of constraints in typology and learning,

they are still generative grammars deep down. In P&P, it is assumed that the basic rules of

X-bar syntax and move-α operate to produce all possible structures. The constraints then

filter out the millions of impossible structures, leaving the few that are actually

grammatical. In OT phonology, the same strategy applies. Each word begins in its

underlying form. Then all possible derivations through the phonological processes that

implement the constraints are applied. All those that violate highly ranked constraints are

thrown out. The single remaining form is the one that violates either no constraint or only

some very weak constraint.

In OT, learning the phonology of a language involves learning a specific ordering of the

universal constraints. Tesar and Smolensky (2000) show that, if one assumes no interaction

between constraints and a strict dominance ordering within each possible language, it is

possible to use a certain form of indirect negative evidence to learn which constraints should

be demoted based on particular data for a language. If a child learns a form from the input

in which constraint B takes precedence over constraint A, and if constraint A is ranked

above constraint B in the child’s current grammar, then the child will simply demote

constraint A on the basis of this positive evidence. This method works equally well for

learning either OT phonology or OT syntax.
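
The demotion step can be sketched for the case of a single total ranking. This is a simplified variant of error-driven constraint demotion and not Tesar and Smolensky’s (2000) algorithm in full, which operates on stratified hierarchies. Constraints that prefer the child’s incorrect output and outrank the highest constraint preferring the attested form are moved to a position just below that constraint. Constraint names and violation counts are placeholders.

```python
def demote(ranking, winner, loser):
    """ranking: constraint names, highest-ranked first.
    winner / loser: violation counts per constraint for the attested form
    and for the child's current (incorrect) output."""
    # Constraints favoring the attested form, in ranking order.
    winner_pref = [c for c in ranking if winner.get(c, 0) < loser.get(c, 0)]
    # Constraints favoring the child's incorrect output.
    loser_pref = [c for c in ranking if winner.get(c, 0) > loser.get(c, 0)]
    if not winner_pref:
        raise ValueError("no constraint favors the attested form")
    pivot = winner_pref[0]  # highest-ranked winner-preferring constraint
    pivot_index = ranking.index(pivot)
    to_demote = [c for c in loser_pref if ranking.index(c) < pivot_index]
    rest = [c for c in ranking if c not in to_demote]
    cut = rest.index(pivot) + 1
    return rest[:cut] + to_demote + rest[cut:]


# Child ranks A over B; the input form violates A but satisfies B.
print(demote(["A", "B"], winner={"A": 1}, loser={"B": 1}))
print(demote(["A", "C", "B"], winner={"A": 1}, loser={"B": 1}))
```

In the second case, C does not distinguish the two forms and keeps its position; only A moves. The loser is the child’s own output, so the comparison uses a positive instance and requires no correction.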

Both OT and P&P achieve their ability to solve the LPLA at the expense of making

extremely strong claims about the shape of human language. Attempts to test simple

versions of P&P (Hyams, 1986) have not produced clear empirical (Liceras, 1989; Pizzuto

& Caselli, 1993; Valian, 1991) or conceptual (Truscott & Wexler, 1989) support. Direct

application of OT to child language leads to complex derivations (Bernhardt & Stemberger,

1998) and unclear predictive power. Moreover, the rigid ordering assumptions made in OT

seem to undercut its utility as a psycholinguistic theory.

3. Emergent constraints

Evidence that the child follows some general guidelines in recovering from

overgeneralization and avoiding errors can be interpreted as evidence for innate constraints.

However, it can equally well be explained through the operation of emergent constraints that

solidify during the process of language learning itself. In other words, the child can use

language learning to learn about the shape of language learning. In the next major section,

we will examine this possibility in detail.

4. Alternative formal analysis

Gold’s formulation of the LPLA rests on Chomsky’s formulation of relations between

types of grammars known as the Chomsky Hierarchy (Chomsky, 1963). Other formal work

has often presented alternative ways of understanding the shape of human language. By

refining or modifying the formal characterization of human language, these alternative

analyses can lead to markedly different consequences in the context of Gold’s analysis. We

can mention at least two analyses of this type, each of which presents an interesting solution

to the LPLA.

One solution to the LPLA strikes directly at the notion (Reich, 1969) that language

cannot be described by finite-state grammars. Hausser (1999) has developed a powerful

parser based on the use of left-associative grammar. He has shown that left-associative

grammar can be expressed as a finite-state grammar that orders words in terms of part-of-

speech categories. Because we know that finite-state grammars can be acquired from

positive evidence (Hopcroft & Ullman, 1979), this means that children should be able to

learn left-associative grammars directly without encountering the LPLA. Given the fact that

these grammars can parse sentences in a time-linear and psycholinguistically plausible

fashion, they would seem to be excellent candidates for further exploration by child

language researchers.

A second formal solution to the LPLA arises in the context of the theory of categorial

grammar. Kanazawa (1998) shows that a particular class of categorial grammars known as

the k-valued grammars can be learned from positive data within the Gold framework.

Moreover, he shows that most of the customary versions of categorial grammar discussed in

the linguistic literature can be included in this k-valued class. These attempts to

recharacterize the nature of human language by revised formal analysis all stand as useful

approaches to the LPLA. By characterizing the target language in a way that makes it

learnable by children, linguists help bridge the gap between linguistic theory and child

language studies.

5. Revised end-state criterion

A particularly powerful solution to the LPLA was proposed by Horning (1969), just

after the publication of the original Gold analysis. Horning showed that, if the notion of

language identification is treated in terms of a certain probability of identification, rather

than an absolute guarantee of no further error ever, then language may be identified on the

basis of positive evidence alone. It is surprising that this solution has not received more

attention. This crucial early demonstration undercuts the core logic of the LPLA, as it

applies to the learning of all rule systems up to the level of context-sensitive grammars. If

learning had to meet Gold's deterministic criterion, children would go through a series of

attempts to hypothesize the “correct” grammar for the language. Once they hit on the correct

grammar, they would have to keep this final guess forever. The fact that adults make speech

errors and differ in their judgments regarding at least some syntactic structures suggests

that this criterion is too strong and that the analysis provided by Horning is more realistic.
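
A toy calculation shows how a probabilistic criterion lets positive evidence alone favor a narrower grammar over an overgeneral one. This is a minimal sketch under stated assumptions, not Horning's actual procedure: the two "grammars" are hypothetical finite string sets, one a proper subset of the other, and each sentence is assumed to be drawn uniformly from the language. Under that assumption a sentence has likelihood 1/|L| if it belongs to L and 0 otherwise, so every consistent observation shifts probability toward the smaller language without ever making it certain.

```python
from fractions import Fraction

# Hypothetical "grammars" as finite string sets; SMALL is a proper subset of LARGE.
SMALL = frozenset({"a", "ab", "abb"})
LARGE = SMALL | {"b", "ba", "bab"}


def posterior(hypotheses, priors, data):
    """Exact posterior over hypotheses given positive examples only.

    ASSUMPTION: sentences are sampled uniformly from the language, so the
    likelihood of one sentence is 1/|L| if it is in L and 0 otherwise.
    """
    weights = {}
    for name, language in hypotheses.items():
        weight = Fraction(priors[name])
        for sentence in data:
            weight *= Fraction(1, len(language)) if sentence in language else 0
        weights[name] = weight
    total = sum(weights.values())
    if total == 0:
        raise ValueError("no hypothesis is consistent with the data")
    return {name: weight / total for name, weight in weights.items()}


hypotheses = {"small": SMALL, "large": LARGE}
priors = {"small": Fraction(1, 2), "large": Fraction(1, 2)}
data = ["a", "ab", "abb", "ab", "a"]  # positive examples only

print(posterior(hypotheses, priors, data))
# {'small': Fraction(32, 33), 'large': Fraction(1, 33)}
```

With n consistent sentences the posterior of the smaller language here is 1/(1 + 2^-n): it approaches 1 but never reaches it, which is the sense in which identification is probabilistic rather than absolute.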

The LPLA #2: Errors children never make

Beginning in the early 1980s, workers in the generative tradition began to shift their

attention from the LPLA #1 to the LPLA #2. Realizing that there are many mechanisms

capable of achieving recovery from overgeneralization, this alternative shape of the LPLA

seemed to provide clearer and less ambiguous guidance for the discovery of the contents of

Universal Grammar. Argumentation in this area has centered on characterizing a set of

grammatical errors that English-speaking children never make. Failure to produce possible

errors is then used as evidence for the innateness of structural dependency, c-command and

the three binding conditions, subjacency, and the empty category principle. The basic form

of the argument has remained constant throughout various versions of the theories of

Government and Binding, Principles and Parameters, and Minimalism.

The analysis of non-occurring errors is not linked to the search for a set of parameters

within P&P. Because the erroneous setting of a parameter can lead to overgeneralization,

parameter setting data is relevant to the LPLA #1, not the LPLA #2. Data that are relevant to

LPLA #2 are those that show evidence of non-parameterized universals. The paradigm case

of argumentation based on the LPLA #2 is, instead, the child’s obedience to the Structural

Dependency condition, as presented by Chomsky in his formal discussion with Jean Piaget

(Piatelli-Palmarini, 1980, p. 40). Chomsky notes that children learn early on to move the

auxiliary to initial position in questions like “Is the man coming?” One possible

formulation of this movement rule looks only at the surface structure of a sentence like

“The man is coming” and formulates the question as moving the first auxiliary to initial

position. However, if children want to question the proposition given in (1), they will never

produce a movement such as (2). Instead, they will always produce (3).

1. The man who is first in line is coming.

2. Is the man who __ first in line is coming?

3. Is the man who is first in line __ coming?

The movement of the auxiliary involves a movement of INFL to COMP that is subject to the

head movement constraint. In (2) the auxiliary would have to move around the N’ of

“man” and the CP and Comp of the relative clause, but this would be blocked by the head

movement constraint (HMC). No such barriers exist in the main clause. In addition, if the

auxiliary moves as in (2), it leaves a gap that will violate the empty category principle (ECP).

However, Chomsky’s analysis of this pattern does not rely on the details of the operation of

the ECP and the HMC. Chomsky simply argues that the child has to realize that phrasal

structure is somehow involved in this process and that one cannot formulate the rule of

auxiliary movement as “move the first auxiliary to the front.”
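
The contrast between the two candidate rules can be made concrete on example (1). In this minimal sketch the learner is simply handed the end of the subject NP as a token index; that hypothetical representation stands in for exactly the phrasal knowledge under discussion. The structure-independent rule fronts the first auxiliary in the word string and yields (2); the structure-dependent rule fronts the first auxiliary after the subject NP and yields (3).

```python
# Two candidate rules for yes-no question formation applied to example (1).
# ASSUMPTION: the subject NP boundary is supplied as a token index (hypothetical).

AUXILIARIES = {"is", "can", "will"}


def front(tokens, index):
    """Move the token at `index` to the front, leaving a gap where it was."""
    return [tokens[index]] + tokens[:index] + tokens[index + 1:]


def linear_rule(tokens):
    """Structure-independent: front the first auxiliary in the word string."""
    index = next(i for i, t in enumerate(tokens) if t in AUXILIARIES)
    return front(tokens, index)


def structural_rule(tokens, subject_end):
    """Structure-dependent: front the first auxiliary after the subject NP."""
    index = next(i for i in range(subject_end, len(tokens)) if tokens[i] in AUXILIARIES)
    return front(tokens, index)


sentence = "the man who is first in line is coming".split()
subject_end = 7  # subject NP = tokens 0-6, "the man who is first in line"

print(" ".join(linear_rule(sentence)))                   # is the man who first in line is coming
print(" ".join(structural_rule(sentence, subject_end)))  # is the man who is first in line coming
```

For a simple sentence such as “The man is coming” (subject_end = 2) the two rules give the same output, which is why only complex subjects like the one in (1) can distinguish them.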

This restriction on auxiliary movement is called “structural dependency.” Chomsky

claims that, “A person might go through much or all of his life without ever having been

exposed to relevant evidence, but he will nevertheless unerringly employ the structure-

dependent generalization, on the first relevant occasion.” A more general statement of this

type is provided by Hornstein and Lightfoot (1981), who claim that, “People attain knowledge

of the structure of their language for which no evidence is available in the data to which they

are exposed as children.” As Pullum (1996) has noted, a major problem with Chomsky’s

analysis in this case is the fact that children do indeed hear sentences such as “The child

who is first in line is getting the prize” or “The child who is first in line will get the prize.”

A conservative child can easily hold off on producing auxiliary movement in complex

sentences until hearing one or two sentences with the needed positive evidence.

Pullum’s analysis, although technically accurate, seems to miss the essence of

Chomsky’s point. First, it is certainly true that sentences such as (1) are extremely rare in

the input to children. In a search of the input to the three children studied by Brown (1973),

I found no such sentences. Sentences of this type may well appear in the Wall Street

Journal corpus studied by Pullum, but they are rare in the input to children. Second, it

would seem counter-intuitive to argue against Chomsky’s basic point. The structural

dependency condition only requires that the child pay attention to the relations between

words, rather than just their serial order. Behaghel (1923) pointed out that words that are

meaningfully related typically appear next to each other. Some appreciation of this principle

must certainly be basic to both auditory and visual processing across species and is not in

disagreement with any of the fundamental tenets of an emergentist view of learning.

Although Chomsky may have overstated this argument a bit, it is difficult to imagine a

language learner who does not pay some attention to conceptual structure. Given this

general ability to represent conceptual structure, it seems fair enough to wonder what kind

of child would even consider producing a sentence such as “Is the man who first in line is

coming?”

The theory of item-based learning (MacWhinney, 1975, 1982, 1988) supports

Chomsky’s analysis. In that theory, the syntactic positions of arguments are specified in

relation to the predicates with which they cluster. Children learn the positioning of the

auxiliary marking a yes-no question on an item-by-item basis. For each yes-no auxiliary,

children learn that it must appear in preinitial position (before the subject NP). As several of

these yes-no auxiliary item-based patterns accumulate, they form a gang, which then

constitutes an emergent construction (Goldberg, 1999). This learning is driven by positive

evidence. When the child first needs to form a question on the basis of (1), the available

device is therefore one that is formulated in terms of relations, not positions, and so (3) is

produced, instead of (2). Thus, both an item-based account and a Chomskyan account agree

on the importance of structural dependency. However, the item-based account views the

particular implementation of structural dependency in this case as emergent from earlier

item-based learning.
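
The item-based account can be sketched as a small learner. This is a minimal illustration under hypothetical assumptions, not an implementation of any published model: each auxiliary is stored as its own frame “AUX + subject NP + rest,” and once a hypothetical number of distinct auxiliaries (GANG_SIZE) has been attested, the frames are treated as a gang that licenses a general construction. Because the frame places the auxiliary before the subject NP as a unit, the auxiliary inside the relative clause is never touched, and the output corresponds to (3).

```python
from collections import Counter

# ASSUMPTIONS (hypothetical): one frame per auxiliary; GANG_SIZE distinct
# attested auxiliaries are enough for the frames to form a general construction.
GANG_SIZE = 3


class ItemBasedQuestions:
    def __init__(self, gang_size=GANG_SIZE):
        self.frames = Counter()  # auxiliary -> number of attested yes-no questions
        self.gang_size = gang_size

    def hear_question(self, auxiliary):
        """Positive evidence: an adult yes-no question beginning with `auxiliary`."""
        self.frames[auxiliary] += 1

    def general_construction(self):
        """True once enough item-based frames have accumulated to form a gang."""
        return len(self.frames) >= self.gang_size

    def ask(self, auxiliary, subject_np, rest):
        """Place the auxiliary before the subject NP (a relation, not a position).

        Returns None, a conservative refusal, if no item frame or gang licenses it.
        """
        if auxiliary not in self.frames and not self.general_construction():
            return None
        return [auxiliary] + subject_np + rest


learner = ItemBasedQuestions()
learner.hear_question("is")
subject = "the man who is first in line".split()
print(" ".join(learner.ask("is", subject, ["coming"])))  # is the man who is first in line coming
print(learner.ask("will", subject, ["come"]))            # None
for aux in ("can", "will"):
    learner.hear_question(aux)
print(" ".join(learner.ask("should", subject, ["come"])))  # should the man who is first in line come
```

The refusal in the second call models conservatism: before the gang forms, the learner produces questions only with auxiliaries for which it has positive evidence.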

This analysis of a solution to a particular instance of the LPLA #2 relied on positive

evidence, conservative item-based learning, and competition. The mechanisms of monitoring

and indirect negative evidence can provide additional support for (3) over (2). In general, all

of the mechanisms that we discussed in terms of our solution of the LPLA #1 apply with

equal strength to the LPLA #2. Let us consider how these processes apply to some of the

other standard arguments based on the LPLA #2.

One constraint that has a clear impact on adult English is the complex-NP constraint

(Ross, 1974) or head movement constraint that blocks movement of a noun from a relative

clause as in (4) and (5).

4. * Who did John believe the man that kissed __ arrived?

5. Who did John believe __ kissed his buddy?

The problems that we have with sentences like (4) can be viewed in processing terms

(O’Grady, in press). Verbs like “believe” encourage the initial wh-word to continue its

search for a gap for as long as they are expecting complements, as in (5). However, when the

expectation for a complement is blocked by the presence of a complex NP as direct object,

the usual complement-based filler strategy is thrown for a loop. It is important to realize that

what causes the problem is the ambiguity after the verb, not the time taken to find a gap. For

example, we can compare (6) in which a gap is found right away with (7) in which it is

found later.

6. Who could my friends have asked __ to take the biscuits to Tom last week?

7. Who could my friends have asked us to take the biscuits to Tom for __ last week?

Neither of these causes problems, because the cues for continuing the search are clear. The

complex-NP constraint also blocks movement from prepositional phrases and other

complex NPs, as in

8. * Who did pictures of ___ surprise you?

9. * What did you see a happy ___ ?

10. * What did you stand between the wall and ___ ?

The constraint in (10) has also been treated as the coordinated-NP constraint in some

accounts. Although it appears that most children obey these constraints, there are some

exceptions. Wilson and Peters (1988) present these violations of the complex NP constraint

from Wilson’s son Seth:

what am I cooking on a hot __ ? (-- stove)

what did I get lost at the __ , Dad?

what are we gonna look for some __ ? (houses)

what is this a funny __ , Dad?

what are we gonna push number __ ? (9)

where did you pin this on my __ ? (robe)

what are you shaking all the __ ? (batter and milk)

what is this medicine for my __ ? (cold)

what are we gonna go at Auntie and __ ? (priya - name of babysitter)

Nearly all of these violations involve moving a noun out of its noun phrase and leaving behind a modifier such as an adjective, determiner, or possessive. It

appears that Seth had in fact learned to produce these violations almost as a game.

Nonetheless, it is interesting to see that this putatively universal principle could be so easily

violated by a young child.

In my own recordings of my sons Ross and Mark, I observed only a very few violations.

One occurred when my son Mark was 5;4.4. He said (out of the blue as it were): “Dad,

next time when it's Indian Guides and my birthday, what do you think a picture of ___

should be on my cake?” Catherine Snow reports that at age 10;10, her son Nathaniel said,

“I have a fever, but I don't want to be taken a temperature of.” Most researchers would

agree that violations are rare. However, the structures that might trigger violations are also

rare.

The binding theory (Chomsky, 1981) focused quite heavily on a set of three proposed

universal conditions on the binding of pronouns and reflexives to referents. Sentence (11)

illustrates two of the constraints. In (11), “he” cannot be coreferential with “Bill” because

the pronoun c-commands “Bill,” and a full noun phrase such as “Bill” may not be bound by a

c-commanding expression. At the same time, “himself” must be coreferential with “Bill”

because “Bill” is its clausemate and c-commands it.

11. He said that Bill hurt himself.
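
The c-command relations in (11) can be checked mechanically. This minimal sketch uses a standard textbook definition (neither node dominates the other, and the first branching node above the first node dominates the second) and a simplified, hypothetical bracketing of (11) that omits INFL and other functional projections. Nodes are identified by their paths of child indices from the root.

```python
# Simplified, hypothetical bracketing of (11) "He said that Bill hurt himself."
# Nodes are tuples (label, child, ...); words are plain strings.
SENTENCE_11 = (
    "S",
    ("NP", "He"),
    ("VP",
     ("V", "said"),
     ("CP",
      ("C", "that"),
      ("S",
       ("NP", "Bill"),
       ("VP", ("V", "hurt"), ("NP", "himself"))))),
)


def node_at(tree, path):
    """Follow a path of child indices from the root."""
    for i in path:
        tree = tree[1 + i]
    return tree


def path_to(tree, word, path=()):
    """Path to the lowest node directly dominating `word`, or None."""
    for i, child in enumerate(tree[1:]):
        if isinstance(child, str):
            if child == word:
                return path
        else:
            found = path_to(child, word, path + (i,))
            if found is not None:
                return found
    return None


def dominates(a, b):
    """Node a properly dominates node b if a's path is a proper prefix of b's."""
    return len(a) < len(b) and b[:len(a)] == a


def c_commands(tree, a, b):
    """Neither node dominates the other, and the first branching node above a dominates b."""
    if a == b or dominates(a, b) or dominates(b, a):
        return False
    ancestor = a[:-1]
    while ancestor and len(node_at(tree, ancestor)) - 1 < 2:  # skip non-branching nodes
        ancestor = ancestor[:-1]
    return dominates(ancestor, b)


he, bill, himself = (path_to(SENTENCE_11, w) for w in ("He", "Bill", "himself"))
print(c_commands(SENTENCE_11, he, bill))       # True
print(c_commands(SENTENCE_11, bill, himself))  # True
print(c_commands(SENTENCE_11, himself, bill))  # False
```

Because “He” c-commands “Bill,” coreference would leave “Bill” bound; because “Bill” c-commands “himself” within the same clause, the reflexive can be bound locally, while the reverse relation does not hold.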

When attempting to apply the LPLA to the study of the binding constraints, it is important

to remember that the sentences produced or interpreted are fully grammatical. However, one

of the possible interpretations is disallowed by the universal constraints. This means that, to

study the imposition of the constraints, researchers must rely on comprehension studies.

As an illustration of the studies conducted during this period, consider this example from a

study of long-distance movement of adjuncts by de Villiers, Roeper, and Vainikka (1990).

Children were divided into two age groups: 3;7 to 5;0 and 5;1 to 6;11. They were given

sentences such as:

12. When did the boy say he hurt himself?

13. When did the boy say how he hurt himself?

14. Who did the boy ask what to throw?

For (12), 44% gave long distance interpretations, associating “when” with “hurt himself”.

For (13), with a medial wh-phrase blocking a long-distance interpretation, only 6% were

long-distance responses. So children were sensitive to the conditions on traces, in accord

with P&P theory. However, it appears that this sensitivity develops over time. In the

youngest group, children had trouble even understanding sentences with medial arguments

like (14). The fact that this ability improves over time suggests that there may well be

learning occurring for the easier patterns such as (12) at an earlier age.

The argument in this particular case is very different from Chomsky’s argument

regarding the structure dependency constraint. In this case, we know that children

themselves actually produce sentences with these structures. De Villiers et al. report these

instances from Brown’s subject Adam:

What chu like to have? – 30 months

What you think this look like? – 30 months

What he went to play with? – 31 months

What do you think the grain is going to taste like? – 55 months

The question is when children are able to construct the two interpretations for (12) and

when they realize that only one of these interpretations is available for (13). The P&P

answer is that this depends on parameter-setting. First, children must realize that their

language allows wh-movement, unlike Chinese. Next they must decide whether the movement

can be local, as in German, or both local and distant as in English. Finally, they must decide

whether the movement is indexed by pronouns, traces, or both. However, once a parameter-

setting account is detailed in this way, it can be difficult to distinguish it from a learning

account. Using positive evidence, children can first learn that some movement can occur.

Next, they can learn to move locally and finally they can acquire the cues to linking the

moved argument to its original argument position, one by one. In learning these structures,

children must be sensitive to complex syntactic configurations. This means that any learning

account must provide a large role for syntactic structure and provide mechanisms that are

capable of acquiring complex patterns.
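
The learning account just described can be sketched as a conservative learner that adds options only when positive evidence attests them. This is a minimal sketch under a strong hypothetical assumption: each input question arrives already analyzed into three features (whether a wh-phrase moved, whether it crossed a clause boundary, and what links it to its original position), whereas real input offers only cues to these features.

```python
from dataclasses import dataclass, field

# ASSUMPTION: inputs are pre-analyzed into three hypothetical features.


@dataclass
class MovementSettings:
    movement: bool = False          # any wh-movement attested
    long_distance: bool = False     # movement out of an embedded clause attested
    linkers: set = field(default_factory=set)  # e.g. {"trace"} or {"trace", "pronoun"}

    def observe(self, moved, crosses_clause=False, linker="trace"):
        """Add an option only when positive evidence attests it; never retract one."""
        if not moved:
            return
        self.movement = True
        if crosses_clause:
            self.long_distance = True
        self.linkers.add(linker)


english = MovementSettings()
english.observe(moved=True)                       # "What did you see __?"
english.observe(moved=True, crosses_clause=True)  # long-distance reading of (12)
print(english)

in_situ = MovementSettings()
in_situ.observe(moved=False)                      # wh-phrase left in place, as in Chinese
print(in_situ)
```

The three fields mirror the three parameter choices, but each is set by accumulating positive evidence, which illustrates why a detailed parameter-setting account can be hard to distinguish from a learning account.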

Implications

The study of the LPLA provided a useful focus for child language research in the 1970s

and 1980s. However, the use of the LPLA #1 as a way of guiding research has not kept

pace with advances in theory, experimentation, and observation. We now know that recovery

from overgeneralization is supported by a set of five powerful processes that effectively

solve the LPLA #1. The process of recovery from overgeneralization continues to be an

important research topic, but it is no longer appropriate to conduct this investigation within

the narrow conceptual focus of the LPLA #1.

The LPLA #2 has more life in it. Human language is the result of a long, gradual

process of evolution (MacWhinney, in press). This process has provided us with some clear

ideas about the possible shapes of sounds, words, and sentences in language. These ideas

are grounded primarily in facts about our bodies (MacWhinney, 1999b) and general

processes in cognition, perception, and action. By pursuing the study of error-free

acquisition in the context of the LPLA #2, we can hope to shed light on these universals.

However, we need to conduct this study in the context of an integrated account that derives

insights from each of the major competing visions.

How can we unite the insights of the three major competing views of language

development to derive a fuller, more satisfying account? One framework for producing this

integration is provided by the concept of emergentism (MacWhinney, 2001). Emergentism

views language structure as emerging from processes operating on six different time scales,

including phylogeny, embryology, development, online processing, and diachronics.

Emergentism in the area of language acquisition commits itself to providing a neurologically

and socially grounded mechanistic account of the interaction of these forces. This means

that any integration of the three competing visions must occur on the level of neural

mechanism and the body. Constructing this account is currently a goal, rather than an

achieved reality (Elman, Bates, Plunkett, Johnson, & Karmiloff-Smith, 1996).

One way to begin building this integration is to look at how socialization processes

interact with specific learning mechanisms. In the Competition Model, children rely on

stored auditory representations to recover from overgeneralization. These stored

representations are in fact delayed traces of interactions with adults. This means that an

integrated emergentist theory needs to understand the ways in which adults can assist the

child in acquiring accurate stored auditory forms. One way in which a parent can do this is

through recasting. Marcus (1993) has suggested that parents are inconsistent in their

provision of negative evidence to the child. However, there is abundant evidence that parents

can provide finely tuned, sensitive input (Snow, 1995). This suggests that what is important

to the child is not the provision of negative evidence, but the sensitive provision of finely

tuned positive evidence in accord with the Competition Model analysis. As Merriman (1999)

has argued, successful learning depends on the child being able to attend to the objects and

actions being discussed. Tomasello (1999) has also emphasized the role of joint attention

and mutual understanding in language learning. Careful examination of the impact of these

social frameworks on language learning can further clarify the processes of recovery from

overgeneralization.
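
Recovery through stored forms can be illustrated with a toy competition between a rote form and a rule-generated form. This is a minimal sketch, not an implementation of the Competition Model: the fixed strength of the rule-based form (rule_strength) and the amount each heard adult token adds to the stored form (gain) are hypothetical parameters. No negative evidence is used; the overgeneralized form loses only because positive evidence strengthens its competitor.

```python
# Toy competition between a stored past-tense form and a rule-generated one.
# ASSUMPTIONS (hypothetical): "goed" has a fixed strength, and each heard
# token of "went" adds `gain` to the stored form; the stronger form is produced.

def simulate(heard_tokens, rule_strength=5.0, gain=1.0):
    """Return the child's produced past tense of 'go' after each input token."""
    stored_strength = 0.0
    produced = []
    for token in heard_tokens:
        if token == "went":
            stored_strength += gain  # positive evidence strengthens the stored form
        produced.append("went" if stored_strength > rule_strength else "goed")
    return produced


print(simulate(["went"] * 7))
# ['goed', 'goed', 'goed', 'goed', 'goed', 'went', 'went']
```

The sketch leaves out the early period in which children produce rote “went” before the rule appears; it shows only the final phase, in which accumulating adult input lets the stored form win the competition.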

One promising avenue for developing an emergentist account would integrate analyses

and findings from generative theory with the theory of item-based learning. The clearer

separation of phrasal structure, lexicon, and processing through unification that Chomsky

has articulated in the current Minimalist Program matches up in some ways with the claims

of item-based learning and Construction Grammar. However, there is not yet a fully

powerful way of simulating item-based learning in neural networks (MacWhinney, 1999a).

This means that major advances must be achieved in learning theory models to properly

model the actions of an item-based processor. In summary, the successful construction of

an integrated emergentist account of error-free learning will require major conceptual

advances in each of the three major competing visions of human language learning.

References

Anderson, N. (1982). Methods of information integration theory. New York: Academic

Press.

Angluin, D. (1980). Inductive inference of formal languages from positive data. Information

and Control, 45, 117-135.

Baker, C. L. (1979). Syntactic theory and the projection problem. Linguistic Inquiry, 10,

533-581.

Baker, C. L., & McCarthy, J. J. (Eds.). (1981). The logical problem of language

acquisition. Cambridge: MIT Press.

Barbosa, P., Fox, D., Hagstrom, P., McGinnis, M., & Pesetsky, D. (Eds.). (1997). Is the

best good enough: Optimality and competition in syntax. Cambridge, MA: MIT

Press.

Behaghel, O. (1923). Deutsche Syntax. Heidelberg: Winter.

Bernhardt, B., & Stemberger, J. (1998). Handbook of phonological development. San

Diego, CA: Academic.

Berwick, R. (1987). Parsability and learnability. In B. MacWhinney (Ed.), Mechanisms of

Language Acquisition. Hillsdale, NJ: Lawrence Erlbaum Associates.

Bohannon, N., MacWhinney, B., & Snow, C. (1990). No negative evidence revisited:

Beyond learnability or who has to prove what to whom. Developmental Psychology,

26, 221-226.

Bohannon, N., & Stanowicz, L. (1988). The issue of negative evidence: Adult responses to

children's language errors. Developmental Psychology, 24, 684-689.

Bowerman, M. (1987). Commentary. In B. MacWhinney (Ed.), Mechanisms of language

acquisition. Hillsdale, N.J.: Lawrence Erlbaum Associates.

Bowerman, M. (1988). The "no negative evidence" problem. In J. Hawkins (Ed.),

Explaining language universals (pp. 73-104). London: Blackwell.

Braine, M. D. S. (1976). Children's first word combinations. Monographs of the Society for

Research in Child Development, 41, (Whole No. 1).

Braine, M. D. S. (1989). Modeling the acquisition of linguistic structure. In Y. Levy & I.

Schlesinger & M. Braine (Eds.), Categories and processes in language acquisition

(pp. 217-259). Hillsdale, NJ: Lawrence Erlbaum Associates.

Brown, R. (1973). A first language: The early stages. Cambridge, MA: Harvard.

Brown, R., & Hanlon, C. (1970). Derivational complexity and order of acquisition in child

speech. In J. R. Hayes (Ed.), Cognition and the development of language (pp. 11-

54). New York: Wiley.

Bruner, J. (1978). On prelinguistic prerequisites of speech. In R. N. Campbell & P. T.

Smith (Eds.), Recent Advances in the Psychology of Language. New York: Plenum

Press.

Carey, S. (1985). Conceptual change in childhood. Cambridge, MA: MIT Press.

Chafe, W. (1987). Cognitive constraints on information flow. In R. Tomlin (Ed.),

Coherence and grounding in discourse. Philadelphia, PA: Benjamins.

Chomsky, N. (1957). Syntactic Structures. The Hague: Mouton.

Chomsky, N. (1963). Formal properties of grammars. In R. B. R. Luce & E. Galanter

(Eds.), Handbook of mathematical psychology (Vol. 2). New York: Wiley.

Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge, MA: MIT Press.

Chomsky, N. (1980). Rules and Representations. New York: Columbia University Press.

Chomsky, N. (1981). Lectures on government and binding. Cinnaminson, NJ: Foris.

de Villiers, J., Roeper, T., & Vainikka, A. (1990). The acquisition of long distance rules. In

L. Frazier & J. de Villiers (Eds.), Language processing and language acquisition.

Amsterdam: Kluwer.

Demetras, M., Post, K., & Snow, C. (1986). Feedback to first-language learners. Journal of

Child Language, 13, 275-292.

Elbers, L., & Wijnen, F. (1993). Effort, production skill, and language learning. In C.

Ferguson & L. Menn & C. Stoel-Gammon (Eds.), Phonological development (pp.

337-368). Timonium, MD: York.

Elman, J., Bates, E., Plunkett, K., Johnson, M., & Karmiloff-Smith, A. (1996). Rethinking

innateness. Cambridge, MA: MIT Press.

Ervin-Tripp, S. (1981). Social process in first and second language learning. In H. Winitz

(Ed.), Native language and foreign language acquisition. New York, N. Y.: The

New York Academy of Sciences.

Farrar, J. (1992). Negative evidence and grammatical morpheme acquisition. Developmental

Psychology, 28, 90-98.

Fodor, J., & Crain, S. (1987). Simplicity and generality of rules in language acquisition. In

B. MacWhinney (Ed.), Mechanisms of Language Acquisition. Hillsdale, N.J.:

Lawrence Erlbaum.

Freud, S. (1958). Psychopathology of everyday life. New York: New American Library,

Mentor.

Givón, T. (1979). On understanding grammar. New York: Academic Press.

Gold, E. (1967). Language identification in the limit. Information and Control, 10, 447-

474.

Goldberg, A. E. (1999). The emergence of the semantics of argument structure

constructions. In B. MacWhinney (Ed.), The emergence of language (pp. 197-213).

Mahwah, NJ: Lawrence Erlbaum Associates.

Hausser, R. (1999). Foundations of computational linguistics: Man-machine

communication in natural language. Berlin: Springer.

Heath, S. (1983). Ways with words: Language, life and work in communities and

classrooms. Cambridge: Cambridge University Press.

Herbart, J. F. (1891). A text-book in psychology. New York: Appleton and Co.

Hirsh-Pasek, K., Treiman, R., & Schneiderman, M. (1984). Brown and Hanlon revisited:

mother sensitivity to grammatical form. Journal of Child Language, 11, 81-88.

Hopcroft, J., & Ullman, J. (1979). Introduction to automata theory, languages, and

computation. Reading, Mass.: Addison-Wesley.

Hopper, P. (1987). Emergent grammar. In J. Aske & N. Beery & L. Michaelis & H. Filip

(Eds.), Berkeley Linguistic Society. Vol 13. Berkeley: University of California Press.

Horning, J. J. (1969). A study of grammatical inference. Stanford, CA: Stanford University, Computer

Science Department.

Hornstein, N., & Lightfoot, D. (1981). Explanation in linguistics: the logical problem of

language acquisition. London: Longmans.

Hyams, N. (1986). Language acquisition and the theory of parameters. Dordrecht: D.

Reidel.

Hymes, D. (1964). Language in culture and society: A reader in linguistics and

anthropology. New York: Harper and Row.

Jespersen, O. (1922). Language: Its nature, development, and origin. London: George

Allen and Unwin.

Kanazawa, M. (1998). Learnable classes of categorial grammars. Stanford, CA: CSLI

Publications.

Kawamoto, A. (1994). One system or two to handle regulars and exceptions: How time-

course of processing can inform this debate. In S. D. Lima & R. L. Corrigan & G.

K. Iverson (Eds.), The reality of linguistic rules (pp. 389-416). Amsterdam: John

Benjamins.

Lasnik, H. (1989). On certain substitutes for negative data. In R. Matthews & W.

Demopoulos (Eds.), Learnability and linguistic theory. Dordrecht: Kluwer.

Liceras, J. (1989). On some properties of the "pro-drop" parameter: looking for missing

subjects in non-native Spanish. In S. Gass & J. Schachter (Eds.), Linguistic

perspectives on second language acquisition (pp. 109-133). Cambridge: Cambridge

University Press.

Lieven, E. V. M., Pine, J. M., & Baldwin, G. (1997). Positional learning and early

grammatical development. Journal of Child Language, 24, 187-219.

Lightfoot, D. (1989). The child's trigger experience: Degree-0 learnability. Behavioral and

Brain Sciences, 12, 321-375.

MacDonald, M. C., Pearlmutter, N. J., & Seidenberg, M. S. (1994). Lexical nature of

syntactic ambiguity resolution. Psychological Review, 101(4), 676-703.

MacWhinney, B. (1975). Pragmatic patterns in child syntax. Stanford Papers And Reports

on Child Language Development, 10, 153-165.

MacWhinney, B. (1978). The acquisition of morphophonology. Monographs of the Society

for Research in Child Development, 43, Whole no. 1, pp. 1-123.

MacWhinney, B. (1982). Basic syntactic processes. In S. Kuczaj (Ed.), Language

acquisition: Vol. 1. Syntax and semantics (pp. 73-136). Hillsdale, NJ: Lawrence

Erlbaum.

MacWhinney, B. (1987). Toward a psycholinguistically plausible parser. In S. Thomason

(Ed.), Proceedings of the Eastern States Conference on Linguistics. Columbus,

Ohio: Ohio State University.

MacWhinney, B. (1988). Competition and teachability. In R. Schiefelbusch & M. Rice

(Eds.), The teachability of language (pp. 63-104). New York: Cambridge University

Press.

MacWhinney, B. (1989). Competition and lexical categorization. In R. Corrigan & F.

Eckman & M. Noonan (Eds.), Linguistic categorization (pp. 195-242).

Philadelphia: Benjamins.

MacWhinney, B. (1991). Reply to Woodward and Markman. Developmental Review, 11,

192-194.

MacWhinney, B. (1993). The (il)logical problem of language acquisition, Proceedings of

the Fifteenth Annual Conference of the Cognitive Science Society (pp. 61-70).

Hillsdale, NJ: Lawrence Erlbaum Associates.

MacWhinney, B. (1999a). Connectionism and language learning. In S. Kemmer (Ed.),

Data-driven models of language learning. Stanford: CSLI Press.

MacWhinney, B. (1999b). The emergence of language from embodiment. In B.

MacWhinney (Ed.), The emergence of language (pp. 213-256). Mahwah, NJ:

Lawrence Erlbaum.

MacWhinney, B. (2001). Emergence from what? Journal of Child Language, 28, 726-736.

MacWhinney, B. (in press). The gradual evolution of language. In T. Givón & B. Malle

(Eds.), The evolutionary emergence of language. Amsterdam: Benjamins.

MacWhinney, B. (Ed.). (1999c). The emergence of language. Mahwah, NJ: Lawrence

Erlbaum Associates.

MacWhinney, B., & Bates, E. (Eds.). (1989). The crosslinguistic study of sentence

processing. New York: Cambridge University Press.

MacWhinney, B., & Leinbach, J. (1991). Implementations are not conceptualizations:

Revising the verb learning model. Cognition, 40, 121-157.

MacWhinney, B. J., Leinbach, J., Taraban, R., & McDonald, J. L. (1989). Language

learning: Cues or rules? Journal of Memory and Language, 28, 255-277.

Maratsos, M., Kuczaj, S. A., Fox, D. E., & Chalkley, M. A. (1979). Some empirical studies

in the acquisition of transformational relations: Passives, negatives, and the past

tense. In W. A. Collins (Ed.), Children's language and communication. Hillsdale,

N.J.: Lawrence Erlbaum.

Marcus, G. (1993). Negative evidence in language acquisition. Cognition, 46, 53-85.

Markman, E. (1989). Categorization and naming in children: Problems of induction.

Cambridge, MA: MIT Press.

Massaro, D. (1987). Speech perception by ear and eye. Hillsdale, NJ: Lawrence Erlbaum.

Matthews, R., & Demopoulos, W. (1989). Learnability and linguistic theory. Dordrecht:

Kluwer.

McNeill, D. (1966). The creation of language by children. In J. Lyons & R. Wales (Eds.),

Psycholinguistics papers. Edinburgh: University of Edinburgh Press.

Merriman, W. (1999). Competition, attention, and young children's lexical processing. In B.

MacWhinney (Ed.), The emergence of language (pp. 331-358). Mahwah, NJ:

Lawrence Erlbaum.

Miikkulainen, R. (1993). Subsymbolic natural language processing. Cambridge, MA: MIT

Press.

Miikkulainen, R., & Mayberry, M. R. (1999). Disambiguation and grammar as emergent

soft constraints. In B. MacWhinney (Ed.), The emergence of language (pp. 153-

176). Mahwah, NJ: Lawrence Erlbaum Associates.

Moerk, E. (1983). The mother of Eve as a first language teacher. Norwood, N.J.: ABLEX.

Morgan, J. L., Bonamo, K. M., & Travis, L. L. (1995). Negative evidence on negative

evidence. Developmental Psychology, 31, 180-197.

Nelson, K. (1982). Experimental gambits in the service of language acquisition theory. In S.

Kuczaj (Ed.), Language development: Syntax and Semantics. Hillsdale, N.J.:

Lawrence Erlbaum.

Nelson, K. E., Denninger, M. S., Bonvilian, J. D., Kaplan, B. J., & Baker, N. D. (1984).

Maternal input adjustments and non-adjustments as related to children's linguistic

advances and to language acquisition theories. In A. D. Pellegrini & T. D. Yawkey

(Eds.), The development of oral and written language in social contexts. Norwood,

N.J.: Ablex Publishing Corporation.

Ochs, E. (1985). The acquisition of Samoan. In D. I. Slobin (Ed.), The crosslinguistic study

of language acquisition. Volume 1: The data. Hillsdale, NJ: Lawrence Erlbaum

Associates.

Osherson, D., Stob, M., & Weinstein, S. (1989). Learning theory and natural language. In

R. Matthews & W. Demopoulos (Eds.), Learnability and linguistic theory.

Dordrecht: Kluwer.

Penner, S. G. (1987). Parental responses to grammatical and ungrammatical child

utterances. Child Development, 58, 376-384.

Piatelli-Palmarini, M. (1980). Language and learning: the debate between Jean Piaget and

Noam Chomsky. Cambridge MA: Harvard University Press.

Pine, J. M., Lieven, E. V. M., & Rowland, C. F. (1998). Comparing different models of the

development of the English verb category. Linguistics, 36, 4-40.

Pinker, S. (1984). Language learnability and language development. Cambridge, MA: Harvard University Press.

Pizzuto, E., & Caselli, M. (1993). The acquisition of Italian morphology: A reply to Hyams. Journal of Child Language, 20, 707-712.

Plaut, D. C., McClelland, J. L., Seidenberg, M. S., & Patterson, K. (1996). Understanding normal and impaired word reading: Computational principles in quasi-regular domains. Psychological Review, 103, 56-115.

Plunkett, K., & Marchman, V. (1991). U-shaped learning and frequency effects in a multi-layered perceptron: Implications for child language acquisition. Cognition, 38, 43-102.

Plunkett, K., & Marchman, V. (1993). From rote learning to system building. Cognition, 49, xx-xx.

Post, K. (1994). Negative evidence. In J. Sokolov & C. Snow (Eds.), Handbook of research in language development using CHILDES (pp. 132-173). Hillsdale, NJ: Lawrence Erlbaum Associates.

Pulleyblank, D., & Turkel, W. (1997). The logical problem of language acquisition in Optimality Theory. In P. Barbosa, D. Fox, P. Hagstrom, M. McGinnis, & D. Pesetsky (Eds.), Is the best good enough? Optimality and competition in syntax (pp. 399-420). Cambridge, MA: MIT Press.

Pullum, G. (1996). Learnability, hyperlearning, and the poverty of the stimulus. In J. Johnson, M. Juge, & J. Moxley (Eds.), Proceedings of the 22nd Annual Meeting: General Session and Parasession on the Role of Learnability in Grammatical Theory (pp. 498-513). Berkeley, CA: Berkeley Linguistics Society.

Reich, P. (1969). The finiteness of natural language. Language, 45, 831-843.

Ross, J. (1974). Three batons for cognitive psychology. In W. B. Weimer & D. S. Palermo (Eds.), Cognition and the symbolic processes. New York: Wiley.

Rumelhart, D. E., & McClelland, J. L. (1986). On learning the past tense of English verbs. In J. L. McClelland & D. E. Rumelhart (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition (pp. 216-271). Cambridge, MA: MIT Press.

Saxton, M. (1997). The Contrast Theory of negative input. Journal of Child Language, 24, 139-161.

Saxton, M., Kulcsar, B., Greer, M., & Rupra, M. (1998). Longer term effects of corrective input: An experimental approach. Journal of Child Language, 25, 701-721.

Schieffelin, B. (1985). The acquisition of Kaluli. In D. I. Slobin (Ed.), The crosslinguistic study of language acquisition. Volume 1: The data. Hillsdale, NJ: Lawrence Erlbaum Associates.

Scollon, R. (1976). Conversations with a one year old: A case study of the developmental foundation of syntax. Honolulu: University Press of Hawaii.

Snow, C. (1995). Issues in the study of input: Finetuning, universality, individual and developmental differences, and necessary causes. In P. Fletcher & B. MacWhinney (Eds.), The handbook of child language (pp. 180-193). Oxford: Blackwell.

Sokolov, J. L. (1993). A local contingency analysis of the fine-tuning hypothesis. Developmental Psychology, 29, 1008-1023.

Sokolov, J. L., & MacWhinney, B. (1990). The CHIP framework: Automatic coding and analysis of parent-child conversational interaction. Behavior Research Methods, Instruments, and Computers, 22, 151-161.

Tesar, B., & Smolensky, P. (2000). Learnability in optimality theory. Cambridge, MA: MIT Press.

Tomasello, M. (1992). First verbs: A case study of early grammatical development. Cambridge: Cambridge University Press.

Tomasello, M. (1999). The cultural origins of human cognition. Cambridge, MA: Harvard University Press.

Tomasello, M. (2000). Do young children have adult syntactic competence? Cognition, 74, 209-253.

Truscott, J., & Wexler, K. (1989). Some problems in the parametric analysis of learnability. In R. Matthews & W. Demopoulos (Eds.), Learnability and linguistic theory. Dordrecht: Kluwer.

Valian, V. (1991). Syntactic subjects in the early speech of American and Italian children. Cognition, 40, 21-81.

Wexler, K., & Culicover, P. (1980). Formal principles of language acquisition. Cambridge, MA: MIT Press.

Wilson, B., & Peters, A. M. (1988). What are you cookin' on a hot?: Movement constraints in the speech of a three-year-old blind child. Language, 64, 249-273.

Wolfe-Quintero, K. (1992). Learnability and the acquisition of extraction in relative clauses and wh-questions. Studies in Second Language Acquisition, 14, 39-70.

i The competition between “went” and “*goed” has also been treated as an instance of “blocking” (Baker & McCarthy, 1981; Pinker, 1984). In the blocking account, “went” is said to block “*goed” because lexically-based rules are ordered before general rules in the rule cycle of the morphological component. This account involves an unnecessary commitment to strict rule ordering and an unnecessary appeal to an ability to order rules according to some innate criteria. Because the mechanism of competition captures the full explanatory power of blocking, we rely on competition here rather than on blocking.