Rethinking the Logical Problem of Language Acquisition
Brian MacWhinney
Carnegie Mellon University
[The child’s acquisition of grammar] is hopelessly
underdetermined by the fragmentary evidence available.
-- Chomsky (1968), Language and Mind
Abstract
The study of child language acquisition is dominated by three major competing visions:
socialization theory, learning theory, and nativist theory. Each takes a different approach to a
core issue in developmental psycholinguistics known as the logical problem of language
acquisition (LPLA). This paper argues that the LPLA is composed of two, only partially
related, sub-problems. The first form of the LPLA emphasizes recovery from
overgeneralization. Nativists claim that, contrary to the claims of socialization theory,
recovery occurs without corrective feedback under the guidance of innate constraints.
Learning theory presents five plausible and interesting alternatives to constraints, including:
conservatism, indirect negative evidence, competition, cue construction, and monitoring. The
second form of the LPLA focuses on error-free processes in acquisition. The nativist claim
is that error-free performance shows that the child understands the possible shape of human
language. However, error-free performance can also arise from these same five learning
mechanisms. The availability of so many mechanisms for addressing the logical problem
indicates that it is time to view recovery from overgeneralization and error-free learning not
as logical problems, but as evidence for the collaboration of acquisitional supports. We now
need to specify the interactions of mechanisms derived from each of the three major
competing visions. Emergentist models (MacWhinney, 1999c) present a particularly
promising framework for specifying this integration, if they can develop richer linguistic
representations and make fuller use of data on spontaneous conversational interactions.
Three Approaches to Child Language Learning
The study of child language acquisition is dominated by three major competing visions:
socialization theory, learning theory, and nativist theory. Socialization theory holds that
language is acquired from social interactions. Nativist theory holds that language is innately
derived from a series of genetically programmed modules. Learning theory holds that
language is acquired from the detection of patterns in the input. Each of these theories is
committed to providing fundamental accounts for all of the core phenomena of language
acquisition. Among these core phenomena, one that has been the particular focus of
theoretical attention is the capacity for recovery from overgeneralization.
Overgeneralization and the subsequent recovery from overgeneralization are common
processes in the normal course of language acquisition. Sometime during the first years,
every normally developing English-speaking child will produce an overgeneralization like
“goed” or “ated.” We can be sure that children will also learn to stop making these errors.
Each of the three major competing visions gives a very different story about how and why
this recovery occurs. This paper will examine those assumptions, calling into question each
of the currently accepted approaches to this issue and suggesting an alternative approach
grounded on the notion of multiple supports for language learning.
1. Socialization Theory
The oldest and most widely held approach to language acquisition is socialization
theory. This approach focuses on the role of caregivers as sources of social wisdom.
Children are viewed as novices who are learning to act like others so that they can
communicate their desires. The earliest articulation of this point of view was provided by St.
Augustine in his Confessions when he described the ways in which he crudely negotiated
the meanings of words with his elders in order to express his wills and desires.
This I remember; and have since observed how I learned to speak. It was
not that my elders taught me words (as, soon after, other learning) in any
set method; but I, longing by cries and broken accents and various
motions of my limbs to express my thoughts, that so I might have my
will, and yet unable to express all I willed or to whom I willed, did myself,
by the understanding which Thou, my God, gavest me, practise the
sounds in my memory. When they named anything, and as they spoke
turned towards it, I saw and remembered that they called what they would
point out by the name they uttered. And that they meant this thing, and no
other, was plain from the motion of their body, the natural language, as it
were, of all nations, expressed by the countenance, glances of the eye,
gestures of the limbs, and tones of the voice, indicating the affections of
the mind as it pursues, possesses, rejects, or shuns. And, thus, by
constantly hearing words, as they occurred in various sentences, I
collected gradually for what they stood; and, having broken in my mouth
to these signs, I thereby gave utterance to my will. Thus, I exchanged with
those about me these current signs of our wills, and so launched deeper
into the stormy intercourse of human life, yet depending on parental
authority and the beck of elders.
This view of language as a negotiated expression of will fits in well with the views of many
developmental psychologists (Bruner, 1978; Ervin-Tripp, 1981; Moerk, 1983; Snow, 1995;
Tomasello, 1999); social anthropologists (Heath, 1983; Hymes, 1964; Ochs, 1985; Scollon,
1976); and functional linguists (Chafe, 1987; Givón, 1979). Perhaps the strongest version
of socialization theory is the position advocated by Hopper (1987) who suggests that
grammar emerges directly from social interaction. In child language, we see that the first
uses of grammatical forms are often tightly linked to discourse contexts. For example,
Schieffelin (1985) showed that the emergence of the Kaluli ergative is confined to highly
transitive uses in particular conversational contexts. Similarly, Idiazabal (in press) shows
that the first uses of the perfective in Basque appear within narrative structures.
Socialization theory emphasizes the developmental importance of corrective feedback. In
many middle class families, those “magic moments” in which the parent provides
corrective feedback occur hundreds of times each day. Let us take a look at one of the most
often cited of these moments, as reported by McNeill (1966):
Child: Nobody don’t like me.
Mother: No, say “Nobody likes me.”
Child: Nobody don’t like me.
(dialogue repeated eight times)
Mother: Now listen carefully, say “Nobody likes me.”
Child: Oh! Nobody don’t likeS me.
Examining data from Adam, Eve, and Sarah (Brown, 1973) for evidence of learning during
these magic moments, Brown and Hanlon (1970) found that correction was more often for
meaning than for form and that, when formal correction was provided by the parent, it was
not immediately echoed by the child.
Further research has modified this initial assessment. Typically, the form of corrective
feedback is not so overt as in the example from McNeill. Instead, parents rely on more
subtle forms of recasting. Parents tend to provide corrections for form most often when the
child’s utterance is very close to the adult standard, often containing only one error
(Bohannon, MacWhinney, & Snow, 1990; Bohannon & Stanowicz, 1988). Children who
receive corrective feedback in the form of recasts tend to learn the corrected structures more
quickly (Farrar, 1992; Nelson, 1982; Nelson, Denninger, Bonvilian, Kaplan, & Baker,
1984). A very general finding of this research is that the type of feedback that parents
provide to their children is finely tuned to the developmental stage of the child’s grammar
(Demetras, Post, & Snow, 1986; Hirsh-Pasek, Treiman, & Schneiderman, 1984; Morgan,
Bonamo, & Travis, 1995; Penner, 1987; Post, 1994; Snow, 1995; Sokolov, 1993; Sokolov
& MacWhinney, 1990).
It makes sense for a parent to provide some form of corrective feedback. However,
unless the feedback is extremely stereotypic, the child may have trouble interpreting it as an
overt correction (Marcus, 1993; Saxton, 1997). Consider the most extreme and clear form
of corrective feedback. Every time the child makes a grammatical mistake, the parent would
clap his hands and say “ungrammatical.” If a parent were to provide absolutely obvious
and uniform negative evidence in this way, interactions would look like this:
Child: Me want more.
Father: Ungrammatical.
Child: Want more milk.
Father: Ungrammatical.
Child: More milk!
Father: Ungrammatical.
Child: (cries)
Father: Ungrammatical.
However, parents cannot interact with their children in this unresponsive way. If they are to
provide any form of feedback, it needs to be through recasting and expansion, rather than
overt correction. Here is a more plausible interaction:
Child: Me want more.
Father: You want more? More what?
Child: Want more milk.
Father: You want more milk?
Child: More milk!
Father: Sure, honey, I’ll get you some more.
Child: (cries)
Father: Now don’t cry. Daddy is getting you some.
The parent’s main goal in providing feedback to the child is not the provision of
negative evidence, but the extraction of the child’s meaning and the maintenance of a
successful interaction. When one thinks a bit about the language learning process, this
makes sense. If the parent started from the beginning by providing uniform negative
feedback to all ungrammatical sentences, virtually all of the child’s first 1000 utterances
would be marked with the word “ungrammatical” and an eyebrow raise or a clap of the
hands. The child would learn little from this process except perhaps to avoid communicating
with a person who provides nothing but raised eyebrows.
Proponents of socialization theory can argue that feedback need not be provided in this
absolute fashion. Rather, both parents and children could obey the principles of signal
detection theory by maximizing “hits” and “correct rejections,” while minimizing
“misses” and “false alarms.” However, Marcus (1993) has shown that the actual
distribution of individual types of feedback such as recasts or expansions to specific
syntactic constructions is so noisy that a huge amount of feedback would be required before
the child could establish a sufficient level of confidence to know that a given construction is
either correct or incorrect.
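Marcus's point about noise can be made concrete with a small Bayesian sketch. The two recast rates below are hypothetical values chosen for illustration, not figures reported by Marcus (1993), and the function names are invented. The child is assumed to start at even odds and to register each use of a construction simply as recast or not recast.

```python
import math

# Hypothetical recast rates (illustrative assumptions, not measured values).
P_RECAST_IF_BAD = 0.20   # P(recast | utterance is ungrammatical)
P_RECAST_IF_GOOD = 0.12  # P(recast | utterance is grammatical)


def log_likelihood_ratio(recast: bool) -> float:
    """Evidence from one use, in favor of 'ungrammatical' over 'grammatical'."""
    if recast:
        return math.log(P_RECAST_IF_BAD / P_RECAST_IF_GOOD)
    return math.log((1 - P_RECAST_IF_BAD) / (1 - P_RECAST_IF_GOOD))


def expected_uses_needed(confidence: float = 0.95) -> int:
    """Rough (Wald-style) expected number of uses of an ungrammatical
    construction before the posterior, starting from even odds, reaches
    the given confidence. Ignores overshoot past the threshold."""
    target = math.log(confidence / (1 - confidence))
    # Average evidence gained per use when the construction really is bad.
    drift = (P_RECAST_IF_BAD * log_likelihood_ratio(True)
             + (1 - P_RECAST_IF_BAD) * log_likelihood_ratio(False))
    return math.ceil(target / drift)


print(expected_uses_needed())
```

Under these made-up rates, a single ungrammatical construction would have to be used more than a hundred times, on average, before the child could be 95% sure of its status, which is the kind of demand Marcus has in mind.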
Unfortunately, much of the discussion of the use of negative evidence has tended to
underestimate the information-processing abilities of the child. In most models, it is
assumed that the child focuses on the use of a single cue, rather than a combination of cues.
By integrating a variety of cues with differential cue validities (Anderson, 1982;
MacWhinney & Bates, 1989; Massaro, 1987), the child could establish an overall “negative
feedback index” for each utterance. Some of the cues that could be integrated include overt
correction, recasting, expansions, clarification questions, topic continuation, proxemics,
gesture, and intonation. If the child could put together all of this information, there might be
enough parental feedback to tag sentences as grammatical or ungrammatical.
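The “negative feedback index” can be sketched as a sum of log-odds contributions, one per cue, passed through a logistic function. The combination rule, the weights, and the cue names in the code are assumptions made for illustration; they are not parameter estimates from MacWhinney and Bates (1989) or Massaro (1987), and only a subset of the cues listed above is included.

```python
import math

# Hypothetical log-odds weights: how strongly each parental cue suggests
# that the child's previous utterance was ungrammatical.
CUE_WEIGHTS = {
    "overt_correction": 2.5,
    "recast": 1.0,
    "expansion": 0.6,
    "clarification_question": 0.8,
    "topic_continuation": -0.7,  # parent moves on: weak sign of acceptance
    "gesture": 0.4,
    "intonation": 0.3,
}
BASELINE_LOG_ODDS = -1.5  # hypothetical prior: most utterances go unflagged


def negative_feedback_index(observed_cues):
    """Combine the cues that followed one utterance into a 0-1 index."""
    total = BASELINE_LOG_ODDS + sum(CUE_WEIGHTS[c] for c in observed_cues)
    return 1 / (1 + math.exp(-total))


print(round(negative_feedback_index(["recast", "clarification_question"]), 2))
print(round(negative_feedback_index(["topic_continuation"]), 2))
```

With these invented weights, a recast combined with a clarification question yields an index of about 0.57, while topic continuation alone yields about 0.10.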
Socialization theory emphasizes the importance of tutoring, scaffolding, and corrective
feedback as cues that guide the child through every step of linguistic socialization. This view
tends to minimize the importance of a priori hypotheses while maximizing the impact of the
structure of the sociolinguistic environment. Socialization theory places a great emphasis on
the “here and now” as the wellspring of grammatical learning. Because it assigns no
particular role to memory or off-line hypothesis checking, socialization theory views
linguistic input as having a direct and immediate effect on language learning. For the issue
of recovery from overgeneralization, the finding that would provide the strongest support for
socialization theory is one that shows direct links between parental feedback and recovery
from overgeneralization. Evidence that parental feedback plays no direct role in language
acquisition would strike at the heart of socialization theory by weakening its conceptual
underpinnings and markedly limiting its scope.
2. Learning Theory
The second major vision of the process of language learning is that espoused by the
empiricists and associationists. The roots of empiricism go back to Aristotle and the
Skeptics in ancient Greece. In the modern period, philosophers such as Locke, Hume, and
Berkeley outlined the general shape of associationist psychology. In the period between the
two world wars, associationist thinking was a dominant theme in American psychology.
During that period, associationism became closely linked to behaviorism, particularly in the
work of Skinner, Hull, and Thorndike.
In modern times, associationist thinking has reemerged without its earlier behaviorist
linkages in the context of connectionism or neural network modeling. For our current
concerns, one of the most interesting connectionist models is the account of past tense
learning explored by Rumelhart and McClelland (1986), Plunkett and Marchman (1991),
and MacWhinney and Leinbach (1991). Connectionism tends to be rather agnostic on the
issue of the Poverty of Stimulus. Some connectionist models such as Back Propagation rely
heavily on corrective feedback. Others, such as Competitive Learning and Adaptive
Resonance Theory (ART), learn from positive data alone. Despite this apparent eclecticism
and agnosticism, the issue of the Logical Problem of Language Acquisition is just as much
a problem for connectionists as it is for nativists and social environmentalists. Not all
learning mechanisms must be expressed in neural networks. However, all of the
mechanisms we will consider here could be expressed in neural network terms.
3. Generativist Theory
The third major approach to language acquisition is Chomsky’s generativist theory. At
the core of this view of language learning is the rejection of the behaviorist premises of
learning theory. Chomsky (1965) has argued that natural language input is full of
retracings, errors, and slips of the tongue. Because language is degenerate in these various
ways, the child should find it difficult to acquire grammatical rules by simple induction
across the input. This analysis of the learning challenge (Chomsky, 1980) has been called the
“Argument from Poverty of the Stimulus.” According to this argument, no child can build
an adult language out of such degenerate input without being guided by a rich set of
species-specific innate hypotheses. These hypotheses are encoded genetically as specifications for
the shape of the language organ. Some refer to this issue as the “Logical Problem of
Language Acquisition” (LPLA) (Baker, 1979), while others have called it “Plato’s
Problem”, “Chomsky’s Problem”, “Gold’s Problem”, or “Baker’s Paradox”.
The LPLA has served as the fundamental motivation for an enormous body of research
on both first and second language acquisition in the generativist framework. The
argumentation in this work takes one of two forms. The first form focuses on the process of
recovery from overgeneralization.
1. A linguistic structure is presented.
2. It is shown that children sometimes overgeneralize the use of this structure. This
demonstrates that the structure is not being learned by rote.
3. It is argued that there is not enough evidence in the “stimulus” to force recovery
from overgeneralization.
4. Therefore, the observed recovery from overgeneralization must be due to some
innate mechanisms.
We can think of this argument as argument from recovery. We will refer to it as the LPLA
#1.
The second form of the nativist argument focuses on certain grammatical features that
the child putatively produces without any errors at all. For this second type of phenomenon
the argument is as follows:
1. A linguistic structure is presented.
2. It is shown that children use this structure without ever making mistakes.
3. This structure is shown to be so rare that children never encounter it in the input.
4. Therefore, the observed correct performance and avoidance of alternative incorrect
performances must be due to innate mechanisms.
We can think of this analysis as argument from lack of error. We will refer to it as the
LPLA #2.
Both of these forms of the LPLA are arguments from Poverty of the Stimulus.
However, they differ in terms of the nature of the child’s performances. In the first case, the
child produces errors and then recovers. In the second case, the child never makes an error
in the first place. LPLA #1 and LPLA #2 have played very different roles in the literature.
In his articulations of the theory of Principles and Parameters (P&P), Chomsky has tended
to emphasize the importance of lack of error and LPLA #2. However, empirical studies of
child language acquisition have tended more to emphasize recovery from overgeneralization
and LPLA #1. A failure to clearly distinguish these two very different lines of
argumentation has led to some confusion in this literature. Therefore, one of our goals here
will be to clarify this distinction and to analyze each of the arguments separately.
The LPLA #1: Recovery from overgeneralization
Although these two forms of the LPLA look at very different language phenomena, both
are grounded conceptually in a formal analysis presented by Gold (1967). Gold contrasted
two different language-learning situations: text presentation and informant presentation.
With informant presentation, the language learner can receive feedback from an infallible
informant regarding the grammaticality of every candidate sentence. This corrective
feedback is called “negative evidence” and it only requires that ungrammatical strings be
clearly identified as unacceptable. Whenever the learner formulates an overly general guess
about some particular linguistic structure, the informant will label the resulting structure as
ungrammatical and the learner will use this information to restrict the developing grammar.
In the case of text presentation, the learner only receives information on acceptable
sentences and no information regarding ungrammaticality is available. Gold showed that,
with only text presentation, languages with reasonably complex grammars, such as those
that have phrase structure rules, are not learnable. Nativists have then argued that, since
language is not learnable from input in this way, it must be innate in the sense that the child
must already have identified the basic shape of possible grammars before any learning
begins.
Gold’s proof is formulated in terms of the abstract objects of recursive function
theory. However, it only takes a little rephrasing to see how the proof can be applied directly
to the actual process of language learning. The child can be viewed as the learner and the
adult can be viewed as the informant. It does not matter for Gold’s argument whether the
child or the adult is the source of a given string. What is important is only the shape of the
feedback associated with that string. In text presentation, no feedback can occur, so the
following interaction types are possible:
Utterance                   Feedback   Result
1. Child says, “went.”      none       none
2. Child says, “*goed.”     none       none
3. Adult says, “went.”      none       positive data
The only information that the child receives in these sequences is positive data, since there is
no feedback regarding the child’s own productions. In sequence #1, there is no information
presented regarding the acceptability of “went.” However, sequence #3 does provide this
positive evidence by allowing the “text” to include acceptable sequences. In sequence #2,
there is no information presented regarding the unacceptability of “goed.” Moreover, the
adult “text presentation” will never produce the form “goed.” Therefore, the child has no
direct way of knowing that “goed” is ungrammatical.
Unlike text presentation, informant presentation provides feedback. The strongest form
of feedback is that which presents positive feedback for grammatical utterances and negative
feedback for ungrammatical utterances. If the child makes an error, it will be marked by a
signal from the adult. The adult can produce the error directly, along with information
signaling the fact that the error is ungrammatical. The provision of these signals is the
responsibility of the adult. In the informant presentation scenario, there are four types of
possible sequences:
Utterance                   Adult Feedback   Result
1. Child says, “went.”      Good             Positive data
2. Child says, “*goed.”     Bad              Corrective feedback
3. Adult says, “went.”      Good             Positive data
4. Adult says, “*goed.”     Bad              Corrective feedback
There is no attested example in the literature of a sequence like #4 in which a parent
spontaneously produced a random error just to have an opportunity to mark it as
ungrammatical. Although such sequences never actually occur, they would fit in well with
the Gold framework, if they did. However, in Gold’s framework, a sequence like #2 is
functionally equivalent to a sequence like #4, so the absence of #4 does not affect the
analysis. In cases like sequence #3, the provision of positive feedback is not necessary,
since the child can reasonably assume that most forms produced by the adult are
grammatical. Of course, adults will occasionally make errors. However, on the level of the
lexical item and the construction, the notion that adult input is correct is a good working
assumption. To implement this fully, the child may need to filter out false starts and
retracings, and just store away words, constructions, and sentences that are clear, unretraced,
and fully comprehended. Once this is done, the child can then treat all remaining adult
forms as positive evidence.
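The filtering step described above can be sketched as a single pass over the adult input. The sketch assumes that utterances carry CHAT-style markers such as [/] for repetition, [//] for retracing, and +... for trailing off; a real child would have to detect these disfluencies from the speech itself, and the function name is hypothetical.

```python
# Assumed CHAT-style markers: [/] repetition, [//] retracing,
# +... trailing off, xxx unintelligible speech.
DISFLUENCY_MARKERS = ("[/]", "[//]", "+...", "xxx")


def usable_positive_evidence(adult_utterances):
    """Keep only clear, unretraced adult utterances as positive evidence."""
    return [u for u in adult_utterances
            if not any(marker in u for marker in DISFLUENCY_MARKERS)]


sample = [
    "we went to the park .",
    "we <go> [//] went home .",
    "I think +...",
]
print(usable_positive_evidence(sample))  # ['we went to the park .']
```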
With text presentation, if the learner formulates an overly general hypothesis, there is no
way to exclude that general hypothesis. Consider a very simple example in which the learner
is given a corpus of regular present and past tense verbs, along with a few verbs that have
irregular past tense forms. Using the regular past tense examples, the learner will induce a
grammar that adds “-ed” to the end of the present tense. This rule will then produce the
overgeneralized form “goed.” Without information regarding the ungrammaticality of
“goed,” the learner will never be able to recover from this overgeneralization and will never
learn to restrict the language to the smaller grammar that produces just “went.” Thus, the
grammar induced by this process will forever remain too big, since it will include both
“goed” and “went.”
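[Figure: the correct grammar, containing forms such as “went” and “jumped,” lies inside the overly general grammar, which also contains “*goed,” “*runned,” “*falled,” and “*wented.”]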
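The asymmetry between the two presentation modes can be sketched with toy grammars represented simply as sets of licensed past-tense forms. The sets and the function name are hypothetical illustrations: under text presentation both grammars remain consistent with the data, while a single piece of informant feedback on “goed” eliminates the overly general grammar.

```python
# Toy grammars, each represented by the set of past-tense forms it licenses.
CORRECT = {"jumped", "went", "ran", "fell"}
OVERGENERAL = CORRECT | {"goed", "runned", "falled", "wented"}


def surviving_hypotheses(positive, negative=()):
    """Names of the grammars that the evidence has not refuted.

    A grammar is refuted if it fails to license a positive example or
    licenses an example that an informant marked as ungrammatical.
    """
    candidates = {"correct": CORRECT, "overgeneral": OVERGENERAL}
    return sorted(
        name for name, forms in candidates.items()
        if set(positive) <= forms and not (set(negative) & forms)
    )


# Text presentation: positive data only, so both grammars survive.
print(surviving_hypotheses(["went", "jumped"]))
# Informant presentation: "*goed" is marked bad, refuting the larger grammar.
print(surviving_hypotheses(["went", "jumped"], negative=["goed"]))
```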
Gold showed that this problem occurs inevitably for the learning of all but the simplest
forms of language. If the set of languages being explored includes only finite languages
(languages containing a finite number of strings, such as those generated by loop-free finite-state networks),
text presentation is adequate. To see why this is true, consider a simple finite-state grammar
such as Grammar (1):
(1) start -> A; A -> B; A -> C; B -> D; C -> end; D -> end
This grammar will generate the strings ABD and AC. If we add the string ACD to the
positive evidence in the input, the grammar will add a new connection to permit the
additional string. The result will be Grammar (2):
(2) start -> A; A -> B; A -> C; B -> D; C -> D; C -> end; D -> end
Learning involves the addition of new connections or transitions between nodes and no
cutting or rewiring of old transitions. New positive strings always lead to the addition of
new transitions. There is no way for a finite grammar of this type to overgeneralize or
overgenerate, since it is simply an organized summary of the information in the input
strings. Basically, the learning of a finite-state grammar is a very conservative, data-based
process.
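This conservative, additive learning process can be sketched by treating each symbol as a node and each adjacent pair in a positive string as a transition. The one-node-per-symbol representation is an assumption chosen to match Grammars (1) and (2); with strings that repeat symbols, such a network could form loops and generalize, which the strings used here do not do. Function names are hypothetical.

```python
from collections import defaultdict


def learn_transitions(strings):
    """Build a transition network from positive strings only.

    Each symbol is a node. Learning only ever adds transitions
    (start -> first symbol, symbol -> next symbol, last symbol -> end);
    nothing is cut or rewired.
    """
    edges = defaultdict(set)
    for s in strings:
        path = ["start", *s, "end"]
        for a, b in zip(path, path[1:]):
            edges[a].add(b)
    return edges


def accepts(edges, string):
    """True if every step of the string follows a learned transition."""
    path = ["start", *string, "end"]
    return all(b in edges.get(a, set()) for a, b in zip(path, path[1:]))


grammar_1 = learn_transitions(["ABD", "AC"])
grammar_2 = learn_transitions(["ABD", "AC", "ACD"])
print(accepts(grammar_1, "ACD"), accepts(grammar_2, "ACD"))  # False True
print(sorted(grammar_2["C"]))  # ['D', 'end']
```

Adding ACD contributes exactly one new transition, C -> D, and the resulting network still rejects strings such as ACBD.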
However, if the set of possible grammars that may be confronting the child includes all
possible finite grammars of this type as well as potentially at least one non-finite grammar,
Gold shows that the correct grammar cannot be induced from text presentation. For
example, one non-finite grammar that is consistent with the strings ABD, AC, and ACD is
Grammar (3):
(3) S -> AP + (BP)
AP -> A + (C)
BP -> (B) + D
The problem with this grammar is that it will also generate the ungrammatical string ACBD.
Since the learner will never be told that ACBD is ungrammatical, there will be no way to
reject the nonfinite grammar and no way to settle on the correct grammar indicated in (2).
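Enumerating the strings licensed by Grammar (3) shows its overgeneration directly. The encoding is an assumption: each right-hand side is written as a sequence of symbols, parentheses are read as marking optional elements, and the expander handles only non-recursive grammars like this one. On this reading, the grammar also licenses A and AD, which were not in the input either.

```python
from itertools import product

# Grammar (3): each right-hand side is a list of (symbol, optional) pairs.
GRAMMAR_3 = {
    "S": [("AP", False), ("BP", True)],
    "AP": [("A", False), ("C", True)],
    "BP": [("B", True), ("D", False)],
}


def expand(grammar, symbol):
    """Return every terminal string a symbol can generate.

    Symbols without rules are terminals. Assumes the grammar is not
    recursive, so the recursion always bottoms out.
    """
    if symbol not in grammar:
        return {symbol}
    choices = []
    for sym, optional in grammar[symbol]:
        strings = expand(grammar, sym)
        # An optional element may also be left out (the empty string).
        choices.append(strings | {""} if optional else strings)
    return {"".join(parts) for parts in product(*choices)}


print(sorted(expand(GRAMMAR_3, "S")))
# ['A', 'ABD', 'AC', 'ACBD', 'ACD', 'AD']
```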
Gold’s proof relates to the case in which a child is willing to consider all possible finite-
state grammars along with just one nonfinite grammar. One might object that this is a rather
bizarre limitation. However, Gold selected this configuration only to illustrate the problem in
its simplest form. One could equally well imagine that the child is examining the utility of
many alternative non-finite grammars, along with the basic finite-state summaries of the
input. If one allows the child to hypothesize multiple possible nonfinite grammars, the
problem only gets worse. In this second scenario, the child could induce Grammar (4):
(4) S -> AP + (BP) + (C) + (D)
AP -> A
BP -> BD
This second nonfinite grammar would generate illegal strings such as ABDCD or ABDD. If
the child goes down the road of formulating all manner of non-finite grammars, it is difficult
to constrain this process to just a particular grammar. In fact, the child might well formulate
both (3) and (4) as alternatives. Given this, and given the a priori commitment to view
language identification as deterministic, many linguists and psycholinguists have accepted
Gold’s analysis and used it as the foundation stone upon which to build further analyses.
When coupled with certain additional forms of argumentation, this logical problem of
language acquisition (LPLA) has functioned as a major conceptual pillar supporting current
work in generative linguistics, language acquisition theory, and second language acquisition
theory.
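The same kind of enumeration can be applied to Grammar (4). Because AP rewrites only as A and BP only as BD, the sketch below (an illustration, not part of the original analysis) flattens the grammar into a single product of optional parts and lists every licensed string that is absent from the observed input ABD, AC, and ACD.

```python
from itertools import product

# Grammar (4) flattened: substituting AP -> A and BP -> BD into
# S -> AP + (BP) + (C) + (D) gives A + (BD) + (C) + (D).
STRINGS_4 = {"A" + bp + c + d
             for bp, c, d in product(["", "BD"], ["", "C"], ["", "D"])}

OBSERVED = {"ABD", "AC", "ACD"}

# Strings licensed by Grammar (4) that never appeared in the input.
OVERGENERATED = sorted(STRINGS_4 - OBSERVED)
print(OVERGENERATED)  # ['A', 'ABDC', 'ABDCD', 'ABDD', 'AD']
```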
Solving the LPLA #1 through conservatism
The most direct way for a language learner to solve Gold’s problem is to avoid
formulating overly general grammars in the first place. If the child never overgeneralizes,
there is no problem of recovery from overgeneralization and no need for negative evidence
or corrective feedback. In the examples presented above, the conservative child would avoid
formulating Grammars (3) and (4) and never go beyond a finite-state grammar. To ensure that this
happens, the child simply has to avoid constructing a grammar with greater than finite-state
complexity.
This first solution to the LPLA #1 emphasizes the child’s obedience to the Subset
Principle of Angluin (1980) or Fodor and Crain (1987). The Subset Principle requires the
child to avoid overgeneralization by always sticking with the most conservative grammar. It
stipulates that grammars are ordered in a subset relation such that the child explores the
more restrictive grammar first before even considering the less restrictive one. In essence,
the Subset Principle says that the child is conservative.
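The Subset Principle can be sketched as a learner that walks through candidate languages ordered from most to least restrictive and stops at the first one consistent with the positive data. The three candidate languages are borrowed from the strings of Grammars (1), (2), and (3); the names and the explicit ordering are assumptions for illustration.

```python
# Candidate languages (sets of strings), ordered so that each one is a
# subset of the next: most restrictive first.
ORDERED_GRAMMARS = [
    ("G_small", {"ABD", "AC"}),                            # Grammar (1)
    ("G_medium", {"ABD", "AC", "ACD"}),                    # Grammar (2)
    ("G_large", {"A", "AC", "AD", "ABD", "ACD", "ACBD"}),  # Grammar (3)
]


def subset_learner(observed):
    """Return the most restrictive candidate consistent with the positive data."""
    data = set(observed)
    for name, language in ORDERED_GRAMMARS:
        if data <= language:
            return name
    return None  # no candidate covers the data


print(subset_learner(["ABD", "AC"]))
print(subset_learner(["ABD", "AC", "ACD"]))
```

With ABD and AC, the learner stays with G_small; hearing ACD moves it to G_medium; only a positive instance such as ACBD would justify G_large.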
Virtually all accounts of language learning assume some degree of conservatism in the
child’s approach to rule induction. Many children are able to avoid falling into the trap of
overgeneralization by using linguistic forms cautiously and conservatively. For example, if a
child avoids using a verb with dative movement until that verb is detected in a sentence with
dative movement, dative movement overgeneralization will never occur. Conservative learners
can learn without negative evidence, because they never make errors. This means that they
never actually go beyond the data given. Baker (1981), Fodor and Crain (1987), Maratsos,
Kuczaj, Fox, and Chalkley (1979) and others have emphasized the extent to which syntactic
learning can proceed conservatively, often avoiding the need for negative evidence.
Wolfe-Quintero (1992) has shown that conservatism can account for learners’ acquisition
of the sentence patterns that have been used to motivate the subjacency constraint and its
related parameter. For example, she notes that second language learners acquire these
positive contexts for wh-movement in this order:
What did the little girl hit __ with the block today?
What did the boy play with __ behind his mother?
What did the boy read a story about __ this morning?
Because they are proceeding conservatively, learners never produce forms such as:
*What did the boy with ___ read a story this morning?
They never hear this structure in the input and never hypothesize a grammar that includes it.
As a result, they never make overgeneralizations and never attempt wh-movement in this
particular context. Data from Maratsos, Kuczaj, Fox, and Chalkley (1979) suggest that this
same analysis may also apply to first language learners.
Many child language researchers have emphasized the importance of item-based
constructions (Braine, 1976; Lieven, Pine, & Baldwin, 1997; MacWhinney, 1975, 1982;
Tomasello, 1992) in acquisition. If the child formulates and applies these patterns
conservatively, overgeneralization will be minimized. For example, a common
overgeneralization at age 3 involves the frequent verb “say.” Children will ask parents to
“say me that story” instead of “tell me that story.” However, conservative children will not
make this error, since they will only use the verb “say” in exactly the way it was used in the
input. In the terms of MacWhinney (1982; 1988), conservative children will learn a finite-
state transition network centered on the lexical item “tell.” This network accepts (or
generates) an NP in the role of “speaker” in preverbal position, an NP in the role of
“listener” in postverbal position, and an NP in the role of “story” in the post-postverbal
slot. A second network is used to produce the periphrastic dative, as in “tell that story to
me.” These two networks can then be joined into a single item-based finite-state grammar
that operates on narrowly defined lexical categories. Children can learn this item-based
grammar using positive data only. They can also learn a similar network for the verb “say.”
However, for that network, there is only the periphrastic dative. Moreover, for the verb
“say,” the category of the NP in postverbal position is defined semantically as a short
verbalization, rather than a longer story. This means that to minimize the possibility of error
here, the child has to be conservative in three ways:
1. The child needs to formulate each syntactic combination as an item-based pattern.
2. Each item-based pattern needs to record the exact semantic status of each positive
instance of an argument in a particular grammatical configuration (MacWhinney,
1988).
3. Attempts to use the item-based pattern with new arguments must be closely guided
by the semantics of previously encountered positive instances.
If the child has a good memory and applies this method cautiously, overgeneralization will
be minimized.
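The conservative item-based strategy can be sketched as a learner that stores argument frames separately for each verb and produces only frames it has actually heard with that verb. The class, the frame notation, and role labels such as “story” and “utterance” are hypothetical simplifications of the semantic categories described above.

```python
from collections import defaultdict


class ItemBasedLearner:
    """Conservative learner: argument frames are stored per verb, not per class."""

    def __init__(self):
        # verb -> set of frames heard with that verb
        self.frames = defaultdict(set)

    def hear(self, verb, frame):
        """Record a frame from a positive instance in the input."""
        self.frames[verb].add(tuple(frame))

    def can_produce(self, verb, frame):
        """Allow a frame only if it was attested with this very verb."""
        return tuple(frame) in self.frames.get(verb, set())


learner = ItemBasedLearner()
# "Tell me that story" and "tell that story to me"
learner.hear("tell", ["speaker", "VERB", "listener", "story"])
learner.hear("tell", ["speaker", "VERB", "story", "to", "listener"])
# "Say that to me": only the periphrastic frame, with a short utterance
learner.hear("say", ["speaker", "VERB", "utterance", "to", "listener"])

# The double-object frame was never heard with "say", so the learner
# does not produce "*say me that story".
print(learner.can_produce("say", ["speaker", "VERB", "listener", "story"]))  # False
```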
Conservatism can be viewed as a powerful mechanism for addressing the LPLA.
However, it is better understood as one of several crucial supports for successful
acquisition. Children will eventually go “beyond the information given” and produce the
occasional error (Jespersen, 1922). However, by blending a certain level of conservatism
with other supports for successful acquisition, the child can make optimal progress in
language learning.
Solving the LPLA #1 by recovering from overgeneralization
Even if the child minimizes error through conservatism, successful learning will require
some form of negative evidence. The logic of Gold’s proof cannot be avoided. When the
child overgeneralizes, some force must prune back that overgeneralization. However,
researchers (Marcus, 1993) have often mistakenly assumed that negative evidence is
equivalent to overt parental correction. This is only true if the learner has no ability to
construct secondary comparisons across the positive input. If we modify the Gold scenario
by providing the learner with the ability to construct searches across the input, there are at
least four ways to compute negative evidence from positive instances. These four processes
are: competition, cue construction, monitoring, and indirect negative evidence.
1. Competition
Psychological theories have often referred to the notion of competition (Freud, 1958;
Herbart, 1891). In the area of language acquisition, MacWhinney (1978) used competition
to account for the interplay between “rote” and “analogy” in learning morphophonology.
This mechanism was later generalized to all levels of linguistic processing in the form of the
Competition Model (MacWhinney, 1988; MacWhinney & Bates, 1989). In the 1990s, the
Competition Model was further elaborated in terms of neural network theory.
The Competition Model views overgeneralizations as arising from three types of
pressures. The first is the underlying analogic pressure that produces the overgeneralization.
The second pressure is the growth in the rote episodic auditory representation of a correct
form. This representation slowly grows in strength over time, as it is repeatedly
strengthened through encounters with the input data. The third pressure is the competition
of analogy with rote. Consider the case of “*goed” and “went” viewed diagrammatically.
The overgeneralization “goed” is supported by analogy. It competes against the weak rote
form “went” which is supported by auditory memory:
[Diagram: the meaning “go + PAST” maps onto two competing forms: “went”, which receives episodic/rote support, and “go + ed”, which receives analogic pressure.]
As the strength of the rote auditory form for “went” grows, it begins to win out in the
competition against the analogic form “*goed”. Finally, the error is eliminated.i
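The dynamics just described can be sketched numerically. This is a toy illustration under stated assumptions: analogic pressure is held at a constant value, each hearing of “went” adds a fixed increment to its rote strength, and the outcome of the competition is expressed with a Luce-style choice ratio. The parameter values and the choice rule are assumptions for illustration; they are not taken from the Competition Model itself.

```python
def p_rote_wins(rote, analogic):
    """Luce-style choice ratio: probability that the rote form is produced."""
    return rote / (rote + analogic)


def recovery_curve(hearings, analogic=5.0, gain=0.5):
    """Probability of producing "went" after each successive hearing of it."""
    rote = 0.0  # episodic/rote strength of "went" before any input
    curve = []
    for _ in range(hearings):
        rote += gain  # each hearing strengthens the auditory trace of "went"
        curve.append(p_rote_wins(rote, analogic))
    return curve


curve = recovery_curve(40)
print(round(curve[0], 3), curve[9], curve[-1])  # 0.091 0.5 0.8
```

Lowering `analogic` relative to `gain` mimics a more conservative learner, since the rote form then wins after fewer hearings.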
Saxton (1997) has emphasized the ways in which competition operates directly during
conversation. He argues that, “When the child produces an utterance containing an
erroneous form, which is responded to immediately with an utterance containing the correct
adult alternative to the erroneous form (i.e. when negative evidence is supplied), then the
child may perceive the adult form as being in contrast with the equivalent child form.
Cognizance of a relevant contrast can then form the basis for perceiving the adult form as a
correct alternative to the child form.” (p. 155). Saxton refers to this juxtaposition as the
Direct Contrast hypothesis. A paradigmatic example of a Direct Contrast exchange for
Saxton would be:
Child: Well, I feeled it.
Adult: I felt it.
Child: I felt it.
As Saxton notes, the child is aware of the existence of both “felt” and “feeled” and uses
the parental data to reinforce the strength of the former. Thus, Saxton’s Direct Contrast
account is equivalent to the Competition Model account (MacWhinney, 1993). Further
implementing this concept, Saxton (1997; 1998) has conducted training experiments with
novel irregular past tense forms. His studies clearly demonstrate the efficacy of providing
correct models that are closely tuned to the child’s own productions (Bohannon et al., 1990;
Bohannon & Stanowicz, 1988).
If the learner is sufficiently conservative, learning will be close to error free. In this
account, conservatism works by placing relatively more reliance on episodic/rote support
and discounting the influences of analogic pressure. Errors will only occur in cases where
analogy is strongly in competition with rote. Generalizing away from the particular example
given above, the general schema for competition looks like this:
[Diagram: a single meaning maps onto two competing word forms; one receives analogic pressure and the other receives episodic support.]
The competition between two candidate forms is governed by the strength of their episodic
auditory representations. In the case of the competition between “*goed” and “went”, the
overgeneralized form has little episodic auditory strength, since it is heard seldom if at all in
the input. Although “*goed” lacks auditory support, it has strong analogic support from
the general pattern for past tense formation (MacWhinney & Leinbach, 1991). In the
Competition Model, analogic pressure stimulates overgeneralization and episodic auditory
encoding reins it in. The analogic pressure hypothesized in this account has been described
in detail in several connectionist models of morphophonological learning. The models that
most closely implement the type of competition being described here are the models of
MacWhinney and Leinbach (1991) for English and MacWhinney, Leinbach, Taraban, and
McDonald (1989) for German. In these models, there is a pressure for regularization
according to the general pattern that produces forms such as “*goed” and “*ranned”. In
addition, there are weaker gang effects that lead to overgeneralizations such as “*stang” for
the past tense of “sting”.
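The gang effect behind “*stang” can be illustrated with a similarity-weighted analogy over a toy lexicon. This is not the MacWhinney and Leinbach (1991) network; it is a deliberately simple stand-in. It assumes (a) that the learner knows only the hypothetical nine-verb lexicon below, which happens to contain more i-to-a than i-to-u neighbors of “sting”, and (b) that a stored verb's vote grows as 2 to the power of the number of final letters it shares with the target.

```python
from collections import defaultdict

# Hypothetical toy lexicon of (present, past) pairs already known to the learner.
LEXICON = {
    "sing": "sang", "ring": "rang", "spring": "sprang",
    "cling": "clung", "swing": "swung",
    "walk": "walked", "jump": "jumped", "play": "played", "kiss": "kissed",
}


def shared_suffix(a, b):
    """Number of final letters two words share (a crude rhyme similarity)."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def extract_pattern(present, past):
    """Describe a past tense as suffixation or as a single vowel swap."""
    if past.startswith(present):
        return ("add", past[len(present):])
    if len(present) == len(past):
        diffs = [(x, y) for x, y in zip(present, past) if x != y]
        if len(diffs) == 1:
            return ("swap",) + diffs[0]
    return None  # pattern not captured by this toy scheme


def apply_pattern(verb, pattern):
    """Apply a pattern to a new verb, or return None if it does not fit."""
    if pattern[0] == "add":
        return verb + pattern[1]
    _, old, new = pattern
    if old not in verb:
        return None
    i = verb.rindex(old)
    return verb[:i] + new + verb[i + 1:]


def analogic_prediction(verb, lexicon):
    """Pool similarity-weighted votes from every stored verb for each candidate."""
    support = defaultdict(float)
    for present, past in lexicon.items():
        pattern = extract_pattern(present, past)
        candidate = apply_pattern(verb, pattern) if pattern else None
        if candidate is not None:
            support[candidate] += 2 ** shared_suffix(verb, present)
    return max(support, key=support.get), dict(support)


print(analogic_prediction("sting", LEXICON))  # close neighbors form a gang
print(analogic_prediction("go", LEXICON))     # no close neighbors: regular "-ed"
```

In this sketch the rhyming neighbors outvote the regular verbs for “sting”, while “go” has no close neighbors and falls back on the general “-ed” pattern. Episodic support for the correct “stung” would then have to compete with this analogic prediction, as in the case of “went” and “*goed”.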
Morphological Competition
Bowerman (1987) has suggested that recovery from overgeneralizations such as
“*unsqueeze” is particularly problematic for a Competition Model account. To make this
example concrete, let us imagine that “*unsqueeze” is being used to refer to the voluntary
opening of a clenched fist. In this case, likely competitors include “release” or “let go.”
Because there is no rote auditory support for “*unsqueeze,” forms like “release” or “let
go” will eventually compete against and eliminate this particular error.
Several semantic cues support this process of recovery. In particular, inanimate objects
such as rubber balls and sponges cannot be “*unsqueezed” in the same way that they can
be “squeezed.” Squeezing is only reversible if we focus on the action of the body part
doing the squeezing, not the object being squeezed. Or consider the competition between
“*unapprove” and “disapprove”. We might imagine that a mortgage loan application that
has been initially approved can then be subsequently “unapproved.” At that point, we
would still not have heard “unapproved” actually supported by input data, but there would
be less direct competition with “disapprove.” Forces that minimize the competition between
meanings can help an overgeneralization survive long enough for it to begin to carve out its
own “ecological niche” (MacWhinney, 1989).
Lexical Competition
The same logic that can be used to account for recovery from morphological
overgeneralizations can be used to account for recovery from lexical overgeneralizations.
For example, a child may overgeneralize the word “kitty” to refer to tigers and lions. The
child will eventually learn the correct names for these animals and restrict the
overgeneralized form. The same three forces are at work here: analogic pressure,
competition, and episodic encoding. Although the child has never actually seen a “kitty”
that looks like a tiger, there are enough shared features to license the generalization. If the
parent supplies the name “tiger,” there is a new episodic encoding which then begins to
compete with the analogic pressure. If no new name is supplied, the child may still begin to
accumulate some negative evidence, noting that this particular use of “kitty” is not yet
confirmed in the input.
Merriman (1999) has shown how the linking of competition to a theory of attentional
focusing can account for the major empirical findings in the literature on Mutual Exclusivity
(Markman, 1989), or the tendency to treat each object as having only one name. By treating
this constraint as an emergent bias, we avoid a variety of empirical problems (MacWhinney,
1991). Since competition is implemented probabilistically through fuzzy logic (Massaro,
1987) or connectionist nets, it only imposes a bias, rather than a fixed constraint. The
probabilistic basis for competition allows the child to deal with hierarchical category
structure without having to enforce major conceptual reorganization (Carey, 1985).
Competition may initially lead a child to avoid referring to a “robin” as a “bird,” since the
form “robin” would be a direct match. However, in some contexts “bird” does not compete
directly with “robin.” These include reference to a collection of different types of birds that
may include robins, reference to an object that cannot be clearly identified as a robin, or
anaphoric reference to an item that was earlier mentioned as a “robin.”
Syntactic Frame Competition
Overgeneralizations in syntax arise when a valency pattern common to a large group of
verbs is incorrectly overextended to a new verb. This type of overextension has been
analyzed in both distributed networks (Miikkulainen & Mayberry, 1999) and interactive
activation networks (MacDonald, Pearlmutter, & Seidenberg, 1994; MacWhinney, 1987).
These networks demonstrate the same gang effects and generalizations found in networks
for morphological forms (Plunkett & Marchman, 1993) and spelling correspondences
(Plaut, McClelland, Seidenberg, & Patterson, 1996). If a word shares a variety of semantic
features with a group of other words, it will be treated syntactically as a member of the
group.
Consider the example of overgeneralizations of dative movement. Verbs like “give”,
“send”, and “ship” all share a set of semantic features involving the transfer of an object
through some physical medium. In this regard, they are quite close to a verb like “deliver”
and the three-argument group exerts strong analogic pressure on the verb “deliver”.
However, dative movement only applies to certain frequent, monosyllabic transfer verbs and
not to multisyllabic, Latinate forms with a less transitive semantics such as “deliver” or
“recommend.” When children overgeneralize and say, “Tom delivered the library the
book,” they are being influenced by the underlying analogic pressure of the group of
transfer verbs that permit dative movement. In effect, the child has created a new argument
frame for the verb “deliver.” The first argument frame only specifies two arguments – a
subject or “giver” and an object or “thing transferred.” The new lexical entry specifies
three arguments. These two homophonous entries for “deliver” are now in competition,
just as “*goed” and “went” were in competition. Like the entry for “*goed”, the three-
place entry for “deliver” has good analogic support, but no support from episodic
encoding derived from the input. Over time, it loses in its competition with the two-argument
form of “deliver” and its progressive weakening along with strengthening of the competing
form leads to recovery from overgeneralization. Thus, the analysis of recovery from “Tom
delivered the library the book” is identical to the analysis of recovery from “*goed”.
2. Cue construction
Most recovery from overgeneralization relies on competition. However, competition will
eventually encounter limits in its ability to deal with the fine details of grammatical patterns.
To illustrate these limits, consider the case of recovery from causative overgeneralizations
such as “*I untied my shoes loose”. This particular extension receives analogic support
from verbs like “shake” or “kick” which permit “I shook my shoes loose” or “I kicked
my shoes loose.” It appears that the child is not initially tuned in to the fine details of these
semantic classifications. Bowerman (1988) has suggested that the process of recovery from
overgeneralization may lead the child to construct new features to block overgeneralization.
We can refer to this process as “cue construction.”
Recovering from other causative overgeneralizations may also require cue construction.
For example, an error such as “*The gardener watered the tulips flat” can be attributed to a
derivational pattern which yields three-argument verbs from “hammer” or “rake”, as in
“The gardener raked the grass flat.” Source-goal overgeneralization can also fit into this
framework. Consider, “*The maid poured the tub with water” instead of “The maid
poured water into the tub” and “*The maid filled water into the tub” instead of “The maid
filled the tub with water”. In each case, the analogic pressure from one group of words
leads to the establishment of a case frame that is incorrect for a particular verb. Although
this competition could be handled just by the strengthening of the correct patterns, it seems
likely that the child also needs to clarify the shape of the semantic features that unify the
“pour” verbs and the “fill” verbs.
Bowerman (personal communication) provides an even more challenging example. One
can say “The customers drove the taxi driver crazy,” but not “*The customers drove the
taxi driver sad.” The error involves an overgeneralization of the exact shape of the
resultative adjective. A connectionist model of the three-argument case frame for “drive”
would determine not only that certain verbs license a third possible argument, but also what
the exact semantic shape of that argument can be. In the case of the standard pattern for
verbs like “drive”, the resultant state must be terminative, rather than transient. To express
this within the Competition Model context, we would need to have a competition between a
confirmed three-argument form for “drive” and a looser overgeneral form based only on
analogic pressure. A similar competition account can be used to account for recovery from
an error such as, “*The workers unloaded the truck empty” which contrasts with “The
workers loaded the truck full”. In both of these cases, analogic pressure seems weak, since
examples of such errors are extremely rare in the language learning literature.
The actual modeling of these competitions in a neural network will require detailed
lexical work and extensive corpus analysis. A sketch of the types of models that will be
required is given in MacWhinney (1999a).
3. Monitoring
The Competition Model holds that, over time, correct forms gain strength from
encounters with positive exemplars and that this increasing strength leads them to drive out
incorrect forms. In the terms of Gold’s analysis, this strengthening of correct forms can
guarantee the learnability of language. However, by itself, competition does not fully
account for the dynamics of language processing in real social interactions. Consider a
standard self-correction such as “I gived, uh, gave my friend a peach.” Here the correct
form “gave” is activated in real time just after the production of the overgeneralization.
MacWhinney (1978) and Elbers (1993) have treated this type of self-correction as involving
“expressive monitoring” in which the child listens to her own output, compares the correct
weak rote form with the incorrect overgeneralization, and attempts to block the output of the
incorrect form. One possible outcome of expressive monitoring is the strengthening of the
weak rote form and weakening of the analogic forms. Exactly how this is implemented will
vary from model to model.
In general, retraced false starts move from incorrect forms to correct forms, indicating
that the incorrect forms are produced quickly, whereas the correct rote forms take time to
activate. Kawamoto (1994) has shown how a recurrent connectionist network can simulate
exactly these timing asymmetries between analogic and rote retrieval. For example,
Kawamoto’s model captures the experimental finding that incorrect regularized
pronunciations of “pint” to rhyme with “hint” are produced faster than correct irregular
pronunciations.
An even more powerful learning mechanism is what MacWhinney (1978) called
“receptive monitoring.” If the child shadows input structures closely, he will be able to pick
up many discrepancies between his own productive system and the forms he hears. Berwick
(1987) found that a great deal of syntactic learning can be driven by the attempt to extract
meaning during comprehension. Whenever the child cannot parse an input sentence, the
failure to parse can be used as a means of expanding the grammar. The kind of analysis
by synthesis that occurs in some parsing systems can make powerful use of positive
instances to establish new syntactic frames. Receptive monitoring can also be used to
recover from overgeneralization. The child may monitor the form “went” in the input and
attempt to use his own grammar to match that input. If the result of the receptive monitoring
is “*goed”, the child can use the mismatch to reset the weights in the analogic system to
avoid future overgeneralizations.
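Receptive monitoring can be sketched as an error-driven update that fires only when the child's own output for a heard verb differs from the form in the input. The class below is a hypothetical toy, not a published model. It assumes one analogic weight per verb pushing toward “-ed”, one rote strength per stored irregular form, a fixed learning rate, and that the rote form is produced once its strength exceeds that verb's analogic weight.

```python
from collections import defaultdict


class ReceptiveMonitor:
    """Hypothetical two-route learner that updates only on input/output mismatch."""

    def __init__(self, rate=0.3):
        self.rate = rate
        self.analogic = defaultdict(lambda: 1.0)  # per-verb pull toward "-ed"
        self.rote = {}                            # verb -> (stored past form, strength)

    def produce(self, verb):
        """The child's own past tense for `verb`."""
        form, strength = self.rote.get(verb, (None, 0.0))
        if form is not None and strength > self.analogic[verb]:
            return form
        return verb + "ed"

    def monitor(self, verb, heard_past):
        """Compare a heard past form with the child's output; learn only on mismatch."""
        if self.produce(verb) == heard_past:
            return False
        _, strength = self.rote.get(verb, (heard_past, 0.0))
        self.rote[verb] = (heard_past, strength + self.rate)             # strengthen rote form
        self.analogic[verb] = max(0.0, self.analogic[verb] - self.rate)  # weaken analogy
        return True


child = ReceptiveMonitor()
print(child.produce("go"))                                 # goed
print(sum(child.monitor("go", "went") for _ in range(5)))  # 2 mismatches, then none
print(child.produce("go"), child.produce("walk"))          # went walked
```

Keeping the analogic weight per verb means that repairing “*goed” leaves regular verbs such as “walk” untouched; a single global weight would instead weaken the regular pattern everywhere.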
Neural network models that rely on back-propagation assume that negative evidence is
continually available for every learning trial. This assumption is clearly much too strong.
However, not all connectionist models rely on the availability of negative evidence. For
example, Kohonen’s self-organizing feature map model (Miikkulainen, 1993) learns
linguistic patterns simply by using co-occurrences in the data with no reliance on negative
evidence.
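The claim that a self-organizing map needs no negative evidence can be seen in a minimal one-dimensional map. The sketch below is not Miikkulainen's (1993) model. The two-dimensional input vectors are hypothetical co-occurrence profiles (for instance, how often a word follows “the” versus “will”), and the map size, learning rate, and neighborhood schedule are arbitrary choices. Every update moves units toward an observed input; nothing in the loop ever signals that an input is wrong.

```python
import math
import random


def dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_unit(weights, x):
    """Index of the map unit whose weight vector is closest to input x."""
    return min(range(len(weights)), key=lambda i: dist2(weights[i], x))


def train_som(data, n_units=4, epochs=50, lr0=0.5, radius0=1.5, seed=0):
    """Train a 1-D Kohonen map on positive (unlabeled) inputs only."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    total, t = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the input
            frac = t / total
            lr = lr0 * (1 - frac)                     # learning rate decays
            radius = max(radius0 * (1 - frac), 0.01)  # neighborhood shrinks
            bmu = best_unit(weights, x)
            for i in range(n_units):
                h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    weights[i][d] += lr * h * (x[d] - weights[i][d])
            t += 1
    return weights


# Hypothetical co-occurrence profiles: (after "the", after "will").
nouns = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.05]]
verbs = [[0.1, 0.9], [0.2, 0.8], [0.05, 0.85]]
som = train_som(nouns + verbs)
print([best_unit(som, v) for v in nouns], [best_unit(som, v) for v in verbs])
```

After training, noun-like and verb-like inputs are captured by different regions of the map, even though the learner only ever saw positive examples.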
4. Indirect Negative Evidence
Another interesting approach to the LPLA involves the examination of the input corpus
to compute indirect negative evidence. This computation can be illustrated with the error
“*goed.” To construct indirect negative evidence in this case, children need to track:
1. The frequency of all verbs.
2. The frequency of the past tense as marked by the regular “-ed.”
3. The ratio of (2) over (1).
4. The frequency of the verb “go.”
5. The predicted frequency of the form “*goed” as the product of (3) times (4).
6. The actual frequency of “*goed” in the input.
If (5) exceeds (6) by some specified threshold, then children can conclude that the form
“*goed” is excluded by the grammar. They can do this without ever receiving overt
correction from the informant.
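Steps (1) through (6) can be written out directly. The sketch below assumes a hypothetical toy input of (lemma, form) verb tokens. It treats a form as a regular past only when it is the lemma plus “ed” (so “liked” would be missed), and it reads “exceeds by some specified threshold” as a simple difference with an arbitrary threshold value.

```python
def indirect_negative_evidence(tokens, lemma, threshold=5.0):
    """Return (predicted, actual, excluded) for the regularized past of `lemma`."""
    all_verbs = len(tokens)                                               # (1)
    regular_past = sum(1 for lem, form in tokens if form == lem + "ed")   # (2)
    ratio = regular_past / all_verbs                                      # (3)
    lemma_freq = sum(1 for lem, _ in tokens if lem == lemma)              # (4)
    predicted = ratio * lemma_freq                                        # (5)
    actual = sum(1 for lem, form in tokens
                 if lem == lemma and form == lemma + "ed")                # (6)
    return predicted, actual, predicted - actual > threshold


# Hypothetical input: 100 verb tokens, 20 of them regular pasts.
tokens = ([("walk", "walk")] * 30 + [("walk", "walked")] * 10
          + [("jump", "jumped")] * 10
          + [("go", "go")] * 40 + [("go", "went")] * 10)
print(indirect_negative_evidence(tokens, "go"))    # "*goed" predicted about 10 times, heard 0
print(indirect_negative_evidence(tokens, "walk"))  # "walked" heard more often than predicted
```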
Arguments based on this analysis have been offered by Chomsky (1981), Lasnik
(1989), Braine (1989) and others. In logical terms, indirect negative evidence is an
interesting solution to the LPLA. However, there is little actual evidence that children keep
track of the facts they would need to perform this computation. For elements (1) and (2)
above, it might be sufficient to only track the relative frequency of the present and the past
for a few core verbs. However, some frequency tracking of the general class must be done.
A neural network model or some other generalization mechanism could compute (3) and
(5). Moreover, the frequency tracking in (4) and (6) is something that most learning models
will have to assume in any case. The real question for this approach is whether children
actually compute anything like (1) and (2). Recent evidence for a slow rise in generalization
abilities before age 3 (Pine, Lieven, & Rowland, 1998; Tomasello, 2000) suggests that
indirect negative evidence might well be available to older children, but probably not to
younger children.
Interestingly, the structures for which indirect negative evidence provides the most
useful accounts are ones that are learned rather late. These typically involve the LPLA #2,
rather than the LPLA #1. For example, the learner could compute indirect negative evidence
that would block wh-raising from object-modifying relatives in sentences such as:
The police arrested the thieves who were carrying the loot.
*What did the police arrest the thieves who were carrying?
To do this, they would need to track the frequency of sentences such as:
Bill thought the thieves were carrying the loot.
What did Bill think the thieves were carrying?
Noting that raising from predicate complements occurs fairly frequently, children can
reasonably conclude that the absence of raising from object modification position means
that it is ungrammatical. Coupled with conservatism, indirect negative evidence could be a
powerful mechanism for avoiding overgeneralization of complex syntactic
structures. Unfortunately, we have little direct evidence demonstrating that either children or
adults compute indirect negative evidence in the way suggested above. One problem faced
by the indirect negative evidence account is that the child would need to know beforehand
which structures to include in the ratio. For example, the child would need to know that the
frequency of raising in relatives needs to be compared with the frequency of raising in
complements. However, if learning is item-based, as suggested earlier, this comparison
could be restricted to structures potentially involving a particular lexical item such as
“what” or “where.” This suggests that the computation of indirect negative evidence may
be partially linked to the same item-based mechanisms that support conservatism.
The Competition Model account can also be extended to compute indirect negative
evidence. The indirect negative evidence tracker could note that, although “squeeze” occurs
frequently in the input, “*unsqueeze” does not. Diagrammatically, this mechanism works
through the juxtaposition of a form receiving episodic support (“squeeze”) with a predicted
derived form (“unsqueeze”).
[Diagram: “squeeze” receives episodic/rote support, while “(unsqueeze)” is an analogic prediction; gap tracking compares the two, and the gap prediction remains unconfirmed.]
This mechanism uses analogic pressure to predict the form “*unsqueeze.” This is the
same mechanism as used in the generation of “*goed.” However, the child does not need
to actually produce “*unsqueeze,” only to hypothesize its existence. This form is then
tracked in the input. If it is not found, the comparison of the near-zero strength of the
unconfirmed form “unsqueeze” with the confirmed form “squeeze” leads to the
strengthening of competitors such as “release” and blocking of any attempts to use
“unsqueeze.” Although this mechanism is plausible, it is more complicated than the basic
competition mechanism and places a greater requirement on memory for tracking of non-
occurrences. Since the end result of this tracking of indirect negative evidence is the same as
that of the basic competition mechanism, it is reasonable to imagine that learners use this
mechanism only as a fallback strategy, relying on simple competition for most problems
with overgeneralization.
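The gap-tracking variant can be sketched as follows. This is a hypothetical illustration, not an implementation of the Competition Model. It assumes that the predicted form is built simply by prefixing “un”, that a base must be heard some minimum number of times before its gap counts as evidence, and that blocking consists of zeroing the predicted form's strength while boosting listed competitors by a fixed amount.

```python
from collections import Counter


def track_gap(base, prefix, tokens, competitors, strengths, min_support=10, boost=1.0):
    """Return True (and update strengths) if the predicted derived form is a gap."""
    counts = Counter(tokens)
    predicted = prefix + base    # analogic prediction: hypothesized, never produced
    confirmed = counts[base]     # episodic support for the attested base
    unconfirmed = counts[predicted]
    if confirmed < min_support or unconfirmed > 0:
        return False             # too little evidence yet, or the form is attested
    strengths[predicted] = 0.0   # block future use of the predicted form
    for form in competitors:
        strengths[form] = strengths.get(form, 0.0) + boost
    return True


strengths = {}
tokens = ["squeeze"] * 12 + ["release"] * 3 + ["tie"] * 12 + ["untie"] * 4
print(track_gap("squeeze", "un", tokens, ["release", "let go"], strengths))  # True
print(track_gap("tie", "un", tokens, [], strengths))  # False: "untie" is attested
print(strengths)
```

The extra bookkeeping for non-occurrences (the `min_support` check and the count of the unheard form) is what makes this route costlier than plain competition.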
Solving the LPLA #1 by recharacterizing the target
A less direct, but equally effective, method of solving the LPLA #1 involves a
recharacterization of the shape of the target grammar. Gold’s analysis shows that, if the
child hypothesizes a language with more than finite state complexity, negative evidence will
be needed to recover from overgeneralization. However, if we provide a characterization of
language that stays within the bounds set by this proof, then we can assume that children are
capable of learning language through simple positive data. In that case, the LPLA #1
essentially vanishes. There are five ways we can achieve this type of recharacterization. The
first involves the postulation of a set of innate constraints, as in Principles and Parameters
(P&P) Theory. A second involves the imposition of a strict ordering on the set of
constraints, as in Optimality Theory (OT). A third approach views constraints not as innate,
but as emergent. A fourth recharacterization involves providing alternative characterizations
of the formal shape of the target grammar. The fifth involves a recharacterization of the end-
state of language learning as probabilistic, rather than deterministic. Let us examine each of
these five recharacterizations.
1. Innate constraints
Generativists argue that children solve the LPLA by obeying innate constraints on the
shape of possible grammars that they consider. Viewed historically, the constraints imposed
by the child have played a large role in the development of generative theory. For example,
early on, generativists realized that, even with informant presentation, the child could not
learn a full transformational grammar of the type proposed in Chomsky (1957). The
problem at that time was a technical one, since the transformational component of the
grammar could be characterized and ordered in so many alternative ways that it was
essentially impossible to know which form was uniquely correct, even with negative
evidence. The solution was to constrain the shape and ordering of transformations
(Chomsky, 1973). For example, permutations were eliminated, since they could be
formulated as combinations of additions and deletions.
Pursuing this line of thinking, Wexler and Culicover (1980) showed that constraints
such as subjacency could allow children to acquire a transformational grammar, as long as
some types of negative evidence were provided. Their demonstration depended on the fact
that subjacency limited the depth to which the child would have to track interrelations
between syntactic roles across clauses. Lightfoot (1989) then showed that the child could
acquire nearly all of the important rules of the language from non-embedded structures. He
called this degree-0 learnability.
Over the last four decades, each new version of generative grammar has brought with it a
new vision of the innate constraints that provide the child with prior guidance about the
shape of human language. In the 1980s, these constraints involved parameterized principles
contained in a series of modules. Children were thought to begin learning with the
parameters set for some default value and would only change this default setting if they
encountered some triggering linguistic structure (Jespersen, 1922; Matthews &
Demopoulos, 1989).
The learning of marked parameters in the theory of Principles and Parameters (P&P)
can avoid the LPLA #1 if three conditions are met. First, there must be a small set of
possible parameters constituting the set of possible human languages. Second, there must
be a clear specification of the unmarked settings of these parameters. Third, there must be a
clear specification of the surface structure triggers that would lead the child to move from an
unmarked parameter setting to a marked parameter setting for each of the hypothesized
parameters. Despite two decades of work within the framework of P&P, none of these three
conditions has yet been met. Nonetheless, researchers in the P&P tradition remain
optimistic about the program, as well as its newer articulation in the minimalist framework.
Chomsky (1981) has noted that the P&P view of language acquisition leads directly to a
trivial solution to the LPLA. However, there has not yet been any general acceptance of this
view among generative linguists (Osherson, Stob, & Weinstein, 1989) or child language
researchers (Pinker, 1984).
2. Strict constraint ordering
Like P&P, Optimality Theory (OT) views language structure as arising from the
application of a universal set of constraints. Learning a particular language is basically just
the learning of the correct ordering of the constraints in this universal set. The fullest
articulation of OT has been in the area of phonology, where Tesar and Smolensky (2000)
have offered a formal proof of the learnability of OT phonology without negative evidence.
Initially, one might think that this demonstration has little to say to the main line of
discussion of language learnability for grammar. However, OT has now also been applied to
syntax (Barbosa, Fox, Hagstrom, McGinnis, & Pesetsky, 1997). Moreover, as Pulleybank
and Turkel (1997) observe, OT faces the same learnability problems in phonology and
syntax.
Although both P&P and OT emphasize the role of constraints in typology and learning,
they are still generative grammars deep down. In P&P, it is assumed that the basic rules of
X-bar syntax and move-α operate to produce all possible structures. The constraints then
apply to filter out the millions of impossible structures, leaving the few that are actually
grammatical. In OT phonology, the same strategy applies. Each word begins in its
underlying form. Then all possible derivations through the phonological processes that
implement the constraints are applied. All those that violate highly ranked constraints are
thrown out. The single remaining form is the one that violates either no constraint or only
some very weak constraint.
In OT, learning the phonology of a language involves learning a specific ordering of the
universal constraints. Tesar and Smolensky (2000) show that, if one assumes no interaction
between constraints and a strict dominance ordering within each possible language, it is
possible to use a certain form of indirect negative evidence to learn which constraints should
be demoted based on particular data for a language. If a child learns a form from the input
in which constraint B takes precedence over constraint A, and if constraint A is ranked
above constraint B in the child’s current grammar, then the child will simply demote
constraint A on the basis of this positive evidence. This method works equally well for
learning either OT phonology or OT syntax.
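The demotion step can be sketched in code. This follows the general idea of Tesar and Smolensky's constraint demotion, but it is a simplified illustration: the ranking is a list of strata (sets of mutually unranked constraints, highest first), a winner-loser pair is given as hypothetical violation counts, and the constraint names NoCoda and Max are textbook examples chosen for illustration rather than drawn from the text above.

```python
def demote(strata, winner, loser):
    """One constraint-demotion step on a stratified ranking (highest stratum first).

    `winner` and `loser` map each constraint name to its violation count for the
    attested form and for a rival candidate that the current grammar prefers.
    """
    favors_winner = {c for c in winner if winner[c] < loser[c]}
    favors_loser = {c for c in winner if loser[c] < winner[c]}
    if not favors_loser:
        return [set(s) for s in strata]  # nothing to demote
    if not favors_winner:
        raise ValueError("no constraint prefers the attested form")
    # Highest stratum containing a constraint that prefers the attested form.
    h = min(i for i, s in enumerate(strata) if s & favors_winner)
    new = [set(s) for s in strata] + [set()]
    for i in range(h + 1):
        moving = new[i] & favors_loser
        new[i] -= moving
        new[h + 1] |= moving  # demote just below stratum h
    return [s for s in new if s]


# The learner starts with NoCoda (constraint A) ranked above Max (constraint B).
ranking = [{"NoCoda"}, {"Max"}]
# Attested [pat] for /pat/ violates NoCoda once; the rival [pa] deletes /t/, violating Max.
winner = {"NoCoda": 1, "Max": 0}
loser = {"NoCoda": 0, "Max": 1}
print(demote(ranking, winner, loser))  # [{'Max'}, {'NoCoda'}]
```

Only the positive example [pat] and a rival candidate generated by the learner's own grammar are needed; no correction from the environment enters the update.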
Both OT and P&P achieve their ability to solve the LPLA at the expense of making
extremely strong claims about the shape of human language. Attempts to test simple
versions of P&P (Hyams, 1986) have not produced clear empirical (Liceras, 1989; Pizzuto
& Caselli, 1993; Valian, 1991) or conceptual (Truscott & Wexler, 1989) support. Direct
application of OT to child language leads to complex derivations (Bernhardt & Stemberger,
1998) and unclear predictive power. Moreover, the rigid ordering assumptions made in OT
seem to undercut its utility as a psycholinguistic theory.
3. Emergent constraints
Evidence that the child follows some general guidelines in recovering from
overgeneralization and avoiding errors can be interpreted as evidence for innate constraints.
However, it can equally well be explained through the operation of emergent constraints that
solidify during the process of language learning itself. In other words, the child can use
language learning to learn about the shape of language learning. In the next major section,
we will examine this possibility in detail.
4. Alternative formal analysis
Gold’s formulation of the LPLA rests on Chomsky’s formulation of relations between
types of grammars known as the Chomsky Hierarchy (Chomsky, 1963). Other formal work
has often presented alternative ways of understanding the shape of human language. By
refining or modifying the formal characterization of human language, these alternative
analyses can lead to markedly different consequences in the context of Gold’s analysis. We
can mention at least two analyses of this type, each of which presents an interesting solution
to the LPLA.
One solution to the LPLA strikes directly at the notion (Reich, 1969) that language
cannot be described by finite-state grammars. Hausser (1999) has developed a powerful
parser based on the use of left-associative grammar. He has shown that left-associative
grammar can be expressed as a finite-state grammar that orders words in terms of part-of-
speech categories. Because we know that finite-state grammars can be acquired from
positive evidence (Hopcroft & Ullman, 1979), this means that children should be able to
learn left-associative grammars directly without encountering the LPLA. Given the fact that
these grammars can parse sentences in a time-linear and psycholinguistically plausible
fashion, they would seem to be excellent candidates for further exploration by child
language researchers.
A second formal solution to the LPLA arises in the context of the theory of categorial
grammar. Kanazawa (1998) shows that a particular class of categorial grammars known as
the k-valued grammars can be learned from positive data within the Gold framework.
Moreover, he shows that most of the customary versions of categorial grammar discussed in
the linguistic literature can be included in this k-valued class. These attempts to
recharacterize the nature of human language by revised formal analysis all stand as useful
approaches to the LPLA. By characterizing the target language in a way that makes it
learnable by children, linguists help bridge the gap between linguistic theory and child
language studies.
5. Revised end-state criterion
A particularly powerful solution to the LPLA was proposed by Horning (1969), just
after the publication of the original Gold analysis. Horning showed that, if the notion of
language identification is treated in terms of a certain probability of identification, rather
than an absolute guarantee of no further error ever, then language may be identified on the
basis of positive evidence alone. It is surprising that this solution has not received more
attention. This crucial early demonstration undercuts the core logic of the LPLA, as it
applies to the learning of all rule systems up to the level of context-sensitive grammars. If
learning were deterministic, children would go through a series of attempts to hypothesize
the “correct” grammar for the language. Once they hit on the correct identification, they
would then stay with this correct guess forever. The fact that adults make speech
errors and differ in their judgments regarding at least some syntactic structures suggests
that this criterion is too strong and that the analysis provided by Horning is more realistic.
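The effect of replacing certainty with probability can be illustrated with a toy Bayesian comparison between a subset language and a superset language. The sketch assumes that each hypothesis generates its sentences uniformly at random from a finite set (4 versus 8 sentences here) and that every observed sentence belongs to both languages; the function name `posterior_superset` and its parameters are hypothetical, and Horning's own procedure, which worked with probabilistic grammars, is not reproduced here. No negative evidence is used, yet the probability of the overgeneral superset falls toward zero as positive examples accumulate.

```python
from fractions import Fraction


def posterior_superset(n_examples, small_size=4, large_size=8, prior_large=Fraction(1, 2)):
    """Posterior probability of the superset grammar after n positive examples.

    Toy assumption: each grammar emits its sentences uniformly at random, and every
    observed sentence is one that both grammars can generate.
    """
    # Likelihood of the data under each grammar: (1 / language size) per example.
    like_small = Fraction(1, small_size) ** n_examples
    like_large = Fraction(1, large_size) ** n_examples
    prior_small = 1 - prior_large
    evidence = prior_large * like_large + prior_small * like_small
    return prior_large * like_large / evidence


for n in (0, 1, 5, 10):
    print(n, posterior_superset(n))
```

With these sizes and equal priors, the posterior for the superset after n examples is 1/(1 + 2^n): the learner becomes very probably, though never absolutely, right to reject the overgeneral grammar, which is the weaker criterion that Horning's analysis substitutes for Gold's.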
The LPLA #2: Errors children never make
Beginning in the early 1980s, workers in the generative tradition began to shift their
attention from the LPLA #1 to the LPLA #2. Because these researchers realized that many
mechanisms are capable of achieving recovery from overgeneralization, this alternative form of
the LPLA seemed to them to provide clearer and less ambiguous guidance for the discovery of the
contents of Universal Grammar. Argumentation in this area has centered on characterizing a set of
grammatical errors that English-speaking children never make. Failure to produce possible
errors is then used as evidence for the innateness of structural dependency, c-command and
the three binding conditions, subjacency, and the empty category principle. The basic form
of the argument has remained constant throughout various versions of the theories of
Government and Binding, Principles and Parameters, and Minimalism.
The analysis of non-occurring errors is not linked to the search for a set of parameters
within P&P. Because the erroneous setting of a parameter can lead to overgeneralization,
parameter-setting data are relevant to the LPLA #1, not the LPLA #2. Data that are relevant to
the LPLA #2 are those that show evidence of non-parameterized universals. The paradigm case
of argumentation based on the LPLA #2 is, instead, the child’s obedience to the Structural
Dependency condition, as presented by Chomsky in his formal discussion with Jean Piaget
(Piattelli-Palmarini, 1980, p. 40). Chomsky notes that children learn early on to move the
auxiliary to initial position in questions like “Is the man coming?” One possible
formulation of this movement rule looks only at the surface structure of a sentence like
“The man is coming” and formulates the question as moving the first auxiliary to initial
position. However, if children want to question the proposition given in (1), they will never
produce a movement such as (2). Instead, they will always produce (3).
1. The man who is first in line is coming.
2. Is the man who __ first in line is coming?
3. Is the man who is first in line __ coming?
The movement of the auxiliary involves a movement of INFL to COMP that is subject to the
head movement constraint (HMC). In (2) the auxiliary would have to move around the N’ of
“man” and the CP and Comp of the relative clause, but this movement would be blocked by the
HMC. No such barriers exist in the main clause. In addition, if the
auxiliary moves as in (2), it leaves a gap that will violate the empty category principle (ECP).
However, Chomsky’s analysis of this pattern does not rely on the details of the operation of
the ECP and the HMC. Chomsky simply argues that the child has to realize that phrasal
structure is somehow involved in this process and that one cannot formulate the rule of
auxiliary movement as “move the first auxiliary to the front.”
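The contrast between the two candidate rules can be sketched in a few lines of Python. The sketch assumes that the clause structure is already available as a set of token positions belonging to the embedded relative clause; the names `AUXILIARIES`, `front_aux`, `linear_rule`, and `structural_rule` are hypothetical. Only the structure-dependent rule, which skips auxiliaries inside the embedded clause, yields (3) rather than (2).

```python
AUXILIARIES = {"is", "are", "was", "were", "can", "will"}


def front_aux(tokens, index):
    """Move the auxiliary at `index` to the front and form a yes-no question."""
    rest = tokens[:index] + tokens[index + 1:]
    rest[0] = rest[0].lower()  # "The" -> "the" once it is no longer sentence-initial
    return " ".join([tokens[index].capitalize()] + rest) + "?"


def linear_rule(tokens, embedded):
    """Structure-independent rule: front the first auxiliary in the string (ignores `embedded`)."""
    index = next(i for i, w in enumerate(tokens) if w in AUXILIARIES)
    return front_aux(tokens, index)


def structural_rule(tokens, embedded):
    """Structure-dependent rule: front the first auxiliary outside any embedded clause."""
    index = next(i for i, w in enumerate(tokens) if w in AUXILIARIES and i not in embedded)
    return front_aux(tokens, index)


sentence = "The man who is first in line is coming".split()
relative_clause = set(range(2, 7))  # positions of "who is first in line" (assumed parse)
print(linear_rule(sentence, relative_clause))      # the unattested pattern of (2)
print(structural_rule(sentence, relative_clause))  # the attested pattern of (3)
```

For a simple sentence such as "The man is coming," with no embedded clause, the two rules give the same result, which is why the simple cases alone cannot tell the child which rule is correct.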
This restriction on auxiliary movement is called “structural dependency.” Chomsky
claims that, “A person might go through much or all of his life without ever having been
exposed to relevant evidence, but he will nevertheless unerringly employ the
structure-dependent generalization, on the first relevant occasion.” A more general statement
of this type is provided by Hornstein and Lightfoot (1981), who claim that, “People attain knowledge
of the structure of their language for which no evidence is available in the data to which they
are exposed as children.” As Pullum (1996) has noted, a major problem with Chomsky’s
analysis in this case is the fact that children do indeed hear sentences such as “The child
who is first in line is getting the prize” or “The child who is first in line will get the prize.”
A conservative child can easily hold off on producing auxiliary movement in complex
sentences until hearing one or two sentences with the needed positive evidence.
Pullum’s analysis, although technically accurate, seems to miss the essence of
Chomsky’s point. First, it is certainly true that sentences such as (1) are extremely rare in
the input to children. In a search of the input to the three children studied by Brown (1973),
I found no such sentences. Sentences of this type may well appear in the Wall Street
Journal corpus studied by Pullum, but they are rare in the input to children. Second, it
would seem counter-intuitive to argue against Chomsky’s basic point. The structural
dependency condition only requires that the child pay attention to the relations between
words, rather than just their serial order. Behaghel (1923) pointed out that words that are
meaningfully related typically appear next to each other. Some appreciation of this principle
must certainly be basic to both auditory and visual processing across species and is not in
disagreement with any of the fundamental tenets of an emergentist view of learning.
Although Chomsky may have overstated this argument a bit, it is difficult to imagine a
language learner who does not pay some attention to conceptual structure. Given this
general ability to represent conceptual structure, it seems fair enough to wonder what kind
of child would even consider producing a sentence such as “Is the man who first in line is
coming?”
The theory of item-based learning (MacWhinney, 1975, 1982, 1988) supports
Chomsky’s analysis. In that theory, the syntactic positions of arguments are specified in
relation to the predicates with which they cluster. Children learn the positioning of the
auxiliary marking a yes-no question on an item-by-item basis. For each yes-no auxiliary,
children learn that it must appear in preinitial position (before the subject NP). As several of
these yes-no auxiliary item-based patterns accumulate, they form a gang, which then
constitutes an emergent construction (Goldberg, 1999). This learning is driven by positive
evidence. When the child first needs to form a question on the basis of (1), the available
device is therefore one that is formulated in terms of relations, not positions, and (3) is
produced, instead of (2). Thus, both an item-based account and a Chomskyan account agree
on the importance of structural dependency. However, the item-based account views the
particular implementation of structural dependency in this case as emergent from earlier
item-based learning.
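A toy rendering of this item-based account follows. It is a hypothetical symbolic sketch, not MacWhinney's implementation: each auxiliary is recorded as its own pattern placing it before a subject NP, a conservative learner fronts only auxiliaries it has already heard in questions, and an assumed `gang_threshold` stands in for the point at which enough item-based patterns pool into a general construction. Because the pattern is stated relative to the subject NP as a unit, the auxiliary inside the relative clause of (1) is never a candidate for fronting.

```python
from collections import Counter


class ItemBasedQuestionLearner:
    """Hypothetical toy learner: each auxiliary is its own pattern 'AUX + subject NP + rest'."""

    def __init__(self, gang_threshold=3):
        self.patterns = Counter()  # auxiliary -> number of yes-no questions heard with it
        self.gang_threshold = gang_threshold  # assumed size at which patterns form a gang

    def hear_question(self, aux):
        """Positive evidence only: note that this auxiliary appeared before a subject NP."""
        self.patterns[aux.lower()] += 1

    def can_front(self, aux):
        # Conservative: front an auxiliary only if its own pattern has been heard,
        # or once enough item-based patterns have ganged into a general construction.
        return aux.lower() in self.patterns or len(self.patterns) >= self.gang_threshold

    def ask(self, subject, aux, rest):
        """Form a yes-no question from a clause given as (subject NP, auxiliary, rest)."""
        if not self.can_front(aux):
            return None  # no pattern available yet, so the learner does not attempt one
        # The subject NP is placed as a unit, so any auxiliary inside it stays in place.
        subject = subject[0].lower() + subject[1:]
        return f"{aux.capitalize()} {subject} {rest}?"


learner = ItemBasedQuestionLearner()
learner.hear_question("is")
print(learner.ask("The man who is first in line", "is", "coming"))
```

The threshold value is arbitrary; its only role here is to show conservative item-by-item use giving way to a general construction as patterns accumulate from positive evidence.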
This analysis of a solution to a particular instance of the LPLA #2 relied on positive
evidence, conservative item-based learning, and competition. The mechanisms of monitoring
and indirect negative evidence can provide additional support for (3) over (2). In general, all
of the mechanisms that we discussed in terms of our solution of the LPLA #1 apply with
equal strength to the LPLA #2. Let us consider how these processes apply to some of the
other standard arguments based on the LPLA #2.
One constraint that has a clear impact on adult English is the complex-NP constraint
(Ross, 1974) or head movement constraint that blocks movement of a noun from a relative
clause, as in (4). No such blocking occurs in (5), where no complex NP intervenes.
4. * Who did John believe the man that kissed __ arrived?
5. Who did John believe __ kissed his buddy?
The problems that we have with sentences such as (4) can be viewed in processing terms
(O’Grady, in press). Verbs like “believe” encourage the initial wh-word to continue its
search for a gap for as long as they are expecting complements, as in (5). However, when the
expectation for a complement is blocked by the presence of a complex NP as direct object,
the usual complement-based filler strategy is thrown for a loop. It is important to realize that
what causes the problem is the ambiguity after the verb, not the time taken to find a gap. For
example, we can compare (6) in which a gap is found right away with (7) in which it is
found later.
6. Who could my friends have asked __ to take the biscuits to Tom last week?
7. Who could my friends have asked us to take the biscuits to Tom for __ last week?
Neither of these causes problems, because the cues for continuing the search are clear. The
complex-NP constraint also blocks movement from prepositional phrases and other
complex NPs, as in
8. * Who did pictures of ___ surprise you?
9. * What did you see a happy ___ ?
10. * What did you stand between the wall and ___ ?
The constraint in (10) has also been treated as the coordinated-NP constraint in some
accounts. Although it appears that most children obey these constraints, there are some
exceptions. Wilson and Peters (1988) present these violations of the complex NP constraint
from Wilson’s son Seth:
what am I cooking on a hot __ ? (-- stove)
what did I get lost at the __ , Dad?
what are we gonna look for some __ ? (houses)
what is this a funny __ , Dad?
what are we gonna push number __ ? (9)
where did you pin this on my __ ? (robe)
what are you shaking all the __ ? (batter and milk)
what is this medicine for my __ ? (cold)
what are we gonna go at Auntie and __ ? (priya - name of babysitter)
Nearly all of these violations involve moving a noun out of a noun phrase while leaving its determiner or modifier behind. It
appears that Seth had in fact learned to produce these violations almost as a game.
Nonetheless, it is interesting to see that this putatively universal principle could be so easily
violated by a young child.
In my own recordings of my sons Ross and Mark, I observed only a very few violations.
One occurred when my son Mark was 5;4.4. He said (out of the blue as it were): “Dad,
next time when it's Indian Guides and my birthday, what do you think a picture of ___
should be on my cake?” Catherine Snow reports that at age 10;10, her son Nathaniel said,
“I have a fever, but I don't want to be taken a temperature of.” Most researchers would
agree that violations are rare. However, the structures that might trigger violations are also
rare.
The binding theory (Chomsky, 1981) focused quite heavily on a set of three proposed
universal conditions on the binding of pronouns and reflexives to referents. Sentence (11)
illustrates two of the constraints. In (11), “he” cannot be coreferential with “Bill” because
“he” c-commands “Bill,” and a full noun phrase such as “Bill” cannot be bound by a c-commanding
pronoun. At the same time, “himself” must be coreferential with “Bill” because “Bill” is a
clausemate that c-commands “himself.”
11. He said that Bill hurt himself.
When attempting to apply the LPLA to the study of the binding constraints, it is important
to remember that the sentences produced or interpreted are fully grammatical. However, one
of the possible interpretations is disallowed by the universal constraints. This means that, to
study the imposition of the constraints, researchers must rely on comprehension studies.
To illustrate the studies conducted during this period, consider a study of long-distance
movement of adjuncts by de Villiers, Roeper, and Vainikka (1990).
Children were divided into two age groups: 3;7 to 5;0 and 5;1 to 6;11. They were given
sentences such as:
12. When did the boy say he hurt himself?
13. When did the boy say how he hurt himself?
14. Who did the boy ask what to throw?
For (12), 44% gave long-distance interpretations, associating “when” with “hurt himself”.
For (13), with a medial wh-phrase blocking a long-distance interpretation, only 6% were
long-distance responses. So children were sensitive to the conditions on traces, in accord
with P&P theory. However, it appears that this sensitivity develops over time. In the
youngest group, children had trouble even understanding sentences with medial arguments
like (14). The fact that this ability improves over time suggests that there may well be
learning occurring for the easier patterns such as (12) at an earlier age.
The argument in this particular case is very different from Chomsky’s argument
regarding the structural dependency constraint. In this case, we know that children
themselves actually produce sentences with these structures. De Villiers et al. report these
instances from Brown’s subject Adam:
What chu like to have? – 30 months
What you think this look like? – 30 months
What he went to play with? – 31 months
What do you think the grain is going to taste like? – 55 months
The question is when children are able to construct the two interpretations for (12) and
when they realize that only one of these interpretations is available for (13). The P&P
answer is that this depends on parameter-setting. First, children must realize that their
language, unlike Chinese, allows movement. Next they must decide whether the movement
can be local, as in German, or both local and distant as in English. Finally, they must decide
whether the movement is indexed by pronouns, traces, or both. However, once a parameter-
setting account is detailed in this way, it can be difficult to distinguish it from a learning
account. Using positive evidence, children can first learn that some movement can occur.
Next, they can learn to move locally and finally they can acquire the cues to linking the
moved argument to its original argument position, one by one. In learning these structures,
children must be sensitive to complex syntactic configurations. This means that any learning
account must provide a large role for syntactic structure and provide mechanisms that are
capable of acquiring complex patterns.
Implications
The study of the LPLA provided a useful focus for child language research in the 1970s
and 1980s. However, the use of the LPLA #1 as a way of guiding research has not kept
pace with advances in theory, experimentation, and observation. We now know that recovery
from overgeneralization is supported by a set of five powerful processes that effectively
solve the LPLA #1. The process of recovery from overgeneralization continues to be an
important research topic, but it is no longer appropriate to conduct this investigation within
the narrow conceptual focus of the LPLA #1.
The LPLA #2 has more life in it. Human language is the result of a long, gradual
process of evolution (MacWhinney, in press). This process has provided us with some clear
ideas about the possible shapes of sounds, words, and sentences in language. These ideas
are grounded primarily in facts about our bodies (MacWhinney, 1999b) and general
processes in cognition, perception, and action. By pursuing the study of error-free
acquisition in the context of the LPLA #2, we can hope to shed light on these universals.
However, we need to conduct this study in the context of an integrated account that derives
insights from each of the major competing visions.
How can we unite the insights of the three major competing views of language
development to derive a fuller, more satisfying account? One framework for producing this
integration is provided by the concept of emergentism (MacWhinney, 2001). Emergentism
views language structure as emerging from processes operating on six different time scales,
including phylogeny, embryology, development, online processing, and diachronics.
Emergentism in the area of language acquisition commits itself to providing a neurologically
and socially grounded mechanistic account of the interaction of these forces. This means
that any integration of the three competing visions must occur on the level of neural
mechanism and the body. Constructing this account is currently a goal, rather than an
achieved reality (Elman, Bates, Plunkett, Johnson, & Karmiloff-Smith, 1996).
One way to begin building this integration is to look at how socialization processes
interact with specific learning mechanisms. In the Competition Model, children rely on
stored auditory representations to recover from overgeneralization. These stored
representations are in fact delayed traces of interactions with adults. This means that an
integrated emergentist theory needs to understand the ways in which adults can assist the
child in acquiring accurate stored auditory forms. One way in which a parent can do this is
through recasting. Marcus (1993) has suggested that parents are inconsistent in their
provision of negative evidence to the child. However, there is abundant evidence that parents
can provide finely tuned, sensitive input (Snow, 1995). This suggests that what is important
to the child is not the provision of negative evidence, but the sensitive provision of finely
tuned positive evidence in accord with the Competition Model analysis. As Merriman (1999)
has argued, successful learning depends on the child being able to attend to the objects and
actions being discussed. Tomasello (1999) has also emphasized the role of joint attention
and mutual understanding in language learning. Careful examination of the impact of these
social frameworks on language learning can further clarify the processes of recovery from
overgeneralization.
One promising avenue for developing an emergentist account would integrate analyses
and findings from generative theory with the theory of item-based learning. The clearer
separation of phrasal structure, lexicon, and processing through unification that Chomsky
has articulated in the current Minimalist Program matches up in some ways with the claims
of item-based learning and Construction Grammar. However, there is not yet a fully
powerful way of simulating item-based learning in neural networks (MacWhinney, 1999a).
This means that major advances must be achieved in learning theory models to properly
model the actions of an item-based processor. In summary, the successful construction of
an integrated emergentist account of error-free learning will require major conceptual
advances in each of the three major competing visions of human language learning.
References
Anderson, N. (1982). Methods of information integration theory. New York: Academic
Press.
Angluin, D. (1980). Inductive inference of formal languages from positive data. Information
and Control, 45, 117-135.
Baker, C. L. (1979). Syntactic theory and the projection problem. Linguistic Inquiry, 10,
533-581.
Baker, C. L., & McCarthy, J. J. (Eds.). (1981). The logical problem of language
acquisition. Cambridge, MA: MIT Press.
Barbosa, P., Fox, D., Hagstrom, P., McGinnis, M., & Pesetsky, D. (Eds.). (1997). Is the
best good enough? Optimality and competition in syntax. Cambridge, MA: MIT
Press.
Behaghel, O. (1923). Deutsche Syntax. Heidelberg: Winter.
Bernhardt, B., & Stemberger, J. (1998). Handbook of phonological development. San
Diego, CA: Academic.
Berwick, R. (1987). Parsability and learnability. In B. MacWhinney (Ed.), Mechanisms of
Language Acquisition. Hillsdale, NJ: Lawrence Erlbaum Associates.
Bohannon, N., MacWhinney, B., & Snow, C. (1990). No negative evidence revisited:
Beyond learnability or who has to prove what to whom. Developmental Psychology,
26, 221-226.
Bohannon, N., & Stanowicz, L. (1988). The issue of negative evidence: Adult responses to
children's language errors. Developmental Psychology, 24, 684-689.
Bowerman, M. (1987). Commentary. In B. MacWhinney (Ed.), Mechanisms of language
acquisition. Hillsdale, NJ: Lawrence Erlbaum Associates.
Bowerman, M. (1988). The "no negative evidence" problem. In J. Hawkins (Ed.),
Explaining language universals (pp. 73-104). London: Blackwell.
Braine, M. D. S. (1976). Children's first word combinations. Monographs of the Society for
Research in Child Development, 41, (Whole No. 1).
Braine, M. D. S. (1989). Modeling the acquisition of linguistic structure. In Y. Levy & I.
Schlesinger & M. Braine (Eds.), Categories and processes in language acquisition
(pp. 217-259). Hillsdale, NJ: Lawrence Erlbaum Associates.
Brown, R. (1973). A first language: The early stages. Cambridge, MA: Harvard University Press.
Brown, R., & Hanlon, C. (1970). Derivational complexity and order of acquisition in child
speech. In J. R. Hayes (Ed.), Cognition and the development of language (pp. 11-
54). New York: Wiley.
Bruner, J. (1978). On prelinguistic prerequisites of speech. In R. N. Campbell & P. T.
Smith (Eds.), Recent Advances in the Psychology of Language. New York: Plenum
Press.
Carey, S. (1985). Conceptual change in childhood. Cambridge, MA: MIT Press.
Chafe, W. (1987). Cognitive constraints on information flow. In R. Tomlin (Ed.),
Coherence and grounding in discourse. Philadelphia, PA: Benjamins.
Chomsky, N. (1957). Syntactic Structures. The Hague: Mouton.
Chomsky, N. (1963). Formal properties of grammars. In R. D. Luce, R. R. Bush, & E. Galanter
(Eds.), Handbook of mathematical psychology (Vol. 2). New York: Wiley.
Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge, MA: MIT Press.
Chomsky, N. (1980). Rules and Representations. New York: Columbia University Press.
Chomsky, N. (1981). Lectures on government and binding. Cinnaminson, NJ: Foris.
de Villiers, J., Roeper, T., & Vainikka, A. (1990). The acquisition of long distance rules. In
L. Frazier & J. de Villiers (Eds.), Language processing and language acquisition.
Amsterdam: Kluwer.
Demetras, M., Post, K., & Snow, C. (1986). Feedback to first-language learners. Journal of
Child Language, 13, 275-292.
Elbers, L., & Wijnen, F. (1993). Effort, production skill, and language learning. In C.
Ferguson & L. Menn & C. Stoel-Gammon (Eds.), Phonological development (pp.
337-368). Timonium, MD: York.
Elman, J., Bates, E., Plunkett, K., Johnson, M., & Karmiloff-Smith, A. (1996). Rethinking
innateness. Cambridge, MA: MIT Press.
Ervin-Tripp, S. (1981). Social process in first and second language learning. In H. Winitz
(Ed.), Native language and foreign language acquisition. New York, NY: The
New York Academy of Sciences.
Farrar, J. (1992). Negative evidence and grammatical morpheme acquisition. Developmental
Psychology, 28, 90-98.
Fodor, J., & Crain, S. (1987). Simplicity and generality of rules in language acquisition. In
B. MacWhinney (Ed.), Mechanisms of Language Acquisition. Hillsdale, NJ:
Lawrence Erlbaum.
Freud, S. (1958). Psychopathology of everyday life. New York: New American Library,
Mentor.
Givón, T. (1979). On understanding grammar. New York: Academic Press.
Gold, E. (1967). Language identification in the limit. Information and Control, 10, 447-
474.
Goldberg, A. E. (1999). The emergence of the semantics of argument structure
constructions. In B. MacWhinney (Ed.), The emergence of language (pp. 197-213).
Mahwah, NJ: Lawrence Erlbaum Associates.
Hausser, R. (1999). Foundations of computational linguistics: Man-machine
communication in natural language. Berlin: Springer.
Heath, S. (1983). Ways with words: Language, life and work in communities and
classrooms. Cambridge: Cambridge University Press.
Herbart, J. F. (1891). A text-book in psychology. New York: Appleton and Co.
Hirsh-Pasek, K., Treiman, R., & Schneiderman, M. (1984). Brown and Hanlon revisited:
mother sensitivity to grammatical form. Journal of Child Language, 11, 81-88.
Hopcroft, J., & Ullman, J. (1979). Introduction to automata theory, languages, and
computation. Reading, MA: Addison-Wesley.
Hopper, P. (1987). Emergent grammar. In J. Aske & N. Beery & L. Michaelis & H. Filip
(Eds.), Berkeley Linguistic Society. Vol 13. Berkeley: University of California Press.
Horning, J. J. (1969). A study of grammatical inference. Stanford University, Computer
Science Department.
Hornstein, N., & Lightfoot, D. (1981). Explanation in linguistics: the logical problem of
language acquisition. London: Longmans.
Hyams, N. (1986). Language acquisition and the theory of parameters. Dordrecht: D.
Reidel.
Hymes, D. (1964). Language in culture and society: A reader in linguistics and
anthropology. New York: Harper and Row.
Jespersen, O. (1922). Language: Its nature, development, and origin. London: George
Allen and Unwin.
Kanazawa, M. (1998). Learnable classes of categorial grammars. Stanford, CA: CSLI
Publications.
Kawamoto, A. (1994). One system or two to handle regulars and exceptions: How time-
course of processing can inform this debate. In S. D. Lima & R. L. Corrigan & G.
K. Iverson (Eds.), The reality of linguistic rules (pp. 389-416). Amsterdam: John
Benjamins.
Lasnik, H. (1989). On certain substitutes for negative data. In R. Matthews & W.
Demopoulos (Eds.), Learnability and linguistic theory. Dordrecht: Kluwer.
Liceras, J. (1989). On some properties of the "pro-drop" parameter: looking for missing
subjects in non-native Spanish. In S. Gass & J. Schachter (Eds.), Linguistic
perspectives on second language acquisition (pp. 109-133). Cambridge: Cambridge
University Press.
Lieven, E. V. M., Pine, J. M., & Baldwin, G. (1997). Positional learning and early
grammatical development. Journal of Child Language, 24, 187-219.
Lightfoot, D. (1989). The child's trigger experience: Degree-0 learnability. Behavioral and
Brain Sciences, 12, 321-375.
MacDonald, M. C., Pearlmutter, N. J., & Seidenberg, M. S. (1994). Lexical nature of
syntactic ambiguity resolution. Psychological Review, 101(4), 676-703.
MacWhinney, B. (1975). Pragmatic patterns in child syntax. Stanford Papers And Reports
on Child Language Development, 10, 153-165.
MacWhinney, B. (1978). The acquisition of morphophonology. Monographs of the Society
for Research in Child Development, 43, Whole no. 1, pp. 1-123.
MacWhinney, B. (1982). Basic syntactic processes. In S. Kuczaj (Ed.), Language
acquisition: Vol. 1. Syntax and semantics (pp. 73-136). Hillsdale, NJ: Lawrence
Erlbaum.
MacWhinney, B. (1987). Toward a psycholinguistically plausible parser. In S. Thomason
(Ed.), Proceedings of the Eastern States Conference on Linguistics. Columbus,
Ohio: Ohio State University.
MacWhinney, B. (1988). Competition and teachability. In R. Schiefelbusch & M. Rice
(Eds.), The teachability of language (pp. 63-104). New York: Cambridge University
Press.
MacWhinney, B. (1989). Competition and lexical categorization. In R. Corrigan & F.
Eckman & M. Noonan (Eds.), Linguistic categorization (pp. 195-242).
Philadelphia: Benjamins.
MacWhinney, B. (1991). Reply to Woodward and Markman. Developmental Review, 11,
192-194.
MacWhinney, B. (1993). The (il)logical problem of language acquisition, Proceedings of
the Fifteenth Annual Conference of the Cognitive Science Society (pp. 61-70).
Hillsdale, NJ: Lawrence Erlbaum Associates.
MacWhinney, B. (1999a). Connectionism and language learning. In S. Kemmer (Ed.),
Data-driven models of language learning. Stanford: CSLI Press.
MacWhinney, B. (1999b). The emergence of language from embodiment. In B.
MacWhinney (Ed.), The emergence of language (pp. 213-256). Mahwah, NJ:
Lawrence Erlbaum.
MacWhinney, B. (2001). Emergence from what? Journal of Child Language, 28, 726-736.
MacWhinney, B. (in press). The gradual evolution of language. In T. Givón & B. Malle
(Eds.), The evolutionary emergence of language. Amsterdam: Benjamins.
MacWhinney, B. (Ed.). (1999c). The emergence of language. Mahwah, NJ: Lawrence
Erlbaum Associates.
MacWhinney, B., & Bates, E. (Eds.). (1989). The crosslinguistic study of sentence
processing. New York: Cambridge University Press.
MacWhinney, B., & Leinbach, J. (1991). Implementations are not conceptualizations:
Revising the verb learning model. Cognition, 40, 121-157.
MacWhinney, B. J., Leinbach, J., Taraban, R., & McDonald, J. L. (1989). Language
learning: Cues or rules? Journal of Memory and Language, 28, 255-277.
Maratsos, M., Kuczaj, S. A., Fox, D. E., & Chalkley, M. A. (1979). Some empirical studies
in the acquisition of transformational relations: Passives, negatives, and the past
tense. In W. A. Collins (Ed.), Children's language and communication. Hillsdale,
NJ: Lawrence Erlbaum.
Marcus, G. (1993). Negative evidence in language acquisition. Cognition, 46, 53-85.
Markman, E. (1989). Categorization and naming in children: Problems of induction.
Cambridge, MA: MIT Press.
Massaro, D. (1987). Speech perception by ear and eye. Hillsdale, NJ: Lawrence Erlbaum.
Matthews, R., & Demopoulos, W. (1989). Learnability and linguistic theory. Dordrecht:
Kluwer.
McNeill, D. (1966). The creation of language by children. In J. Lyons & R. Wales (Eds.),
Psycholinguistics papers. Edinburgh: University of Edinburgh Press.
Merriman, W. (1999). Competition, attention, and young children's lexical processing. In B.
MacWhinney (Ed.), The emergence of language (pp. 331-358). Mahwah, NJ:
Lawrence Erlbaum.
Miikkulainen, R. (1993). Subsymbolic natural language processing. Cambridge, MA: MIT
Press.
Miikkulainen, R., & Mayberry, M. R. (1999). Disambiguation and grammar as emergent
soft constraints. In B. MacWhinney (Ed.), The emergence of language (pp. 153-
176). Mahwah, NJ: Lawrence Erlbaum Associates.
Moerk, E. (1983). The mother of Eve as a first language teacher. Norwood, NJ: Ablex.
Morgan, J. L., Bonamo, K. M., & Travis, L. L. (1995). Negative evidence on negative
evidence. Developmental Psychology, 31, 180-197.
Nelson, K. (1982). Experimental gambits in the service of language acquisition theory. In S.
Kuczaj (Ed.), Language development: Syntax and Semantics. Hillsdale, NJ:
Lawrence Erlbaum.
Nelson, K. E., Denninger, M. S., Bonvilian, J. D., Kaplan, B. J., & Baker, N. D. (1984).
Maternal input adjustments and non-adjustments as related to children's linguistic
advances and to language acquisition theories. In A. D. Pellegrini & T. D. Yawkey
(Eds.), The development of oral and written language in social contexts. Norwood,
NJ: Ablex Publishing Corporation.
Ochs, E. (1985). The acquisition of Samoan. In D. I. Slobin (Ed.), The crosslinguistic study
of language acquisition. Volume 1: The data. Hillsdale, NJ: Lawrence Erlbaum
Associates.
Osherson, D., Stob, M., & Weinstein, S. (1989). Learning theory and natural language. In
R. Matthews & W. Demopoulos (Eds.), Learnability and linguistic theory.
Dordrecht: Kluwer.
Penner, S. G. (1987). Parental responses to grammatical and ungrammatical child
utterances. Child Development, 58, 376-384.
Piattelli-Palmarini, M. (1980). Language and learning: the debate between Jean Piaget and
Noam Chomsky. Cambridge, MA: Harvard University Press.
Pine, J. M., Lieven, E. V. M., & Rowland, C. F. (1998). Comparing different models of the
development of the English verb category. Linguistics, 36, 4-40.
Pinker, S. (1984). Language learnability and language development. Cambridge, MA:
Harvard University Press.
Pizzuto, E., & Caselli, M. (1993). The acquisition of Italian morphology: A reply to Hyams.
Journal of Child Language, 20, 707-712.
Plaut, D. C., McClelland, J. L., Seidenberg, M. S., & Patterson, K. (1996). Understanding
normal and impaired word reading: Computational principles in quasi-regular
domains. Psychological Review, 103, 56-115.
Plunkett, K., & Marchman, V. (1991). U-shaped learning and frequency effects in a multi-
layered perceptron: Implications for child language acquisition. Cognition, 38, 43-
102.
Plunkett, K., & Marchman, V. (1993). From rote learning to system building. Cognition,
49, xx-xx.
Post, K. (1994). Negative evidence. In J. Sokolov & C. Snow (Eds.), Handbook of
Research in Language Development Using CHILDES (pp. 132-173). Hillsdale, NJ:
Lawrence Erlbaum Associates.
Pulleyblank, D., & Turkel, W. (1997). The logical problem of language acquisition in
Optimality Theory. In P. Barbosa & D. Fox & P. Hagstrom & M. McGinnis & D.
Pesetsky (Eds.), Is the best good enough? Optimality and competition in syntax
(pp. 399-420). Cambridge, MA: MIT Press.
Pullum, G. (1996). Learnability, hyperlearning, and the poverty of the stimulus. In J.
Johnson, M. Juge, & J. Moxley (Eds.), Proceedings of the 22nd Annual Meeting:
General Session and Parasession on the Role of Learnability in Grammatical
Theory (pp. 498-513). Berkeley, CA: Berkeley Linguistics Society.
Reich, P. (1969). The finiteness of natural language. Language, 45, 831-843.
Ross, J. (1974). Three batons for cognitive psychology. In W. B. Weimer & D. S. Palermo
(Eds.), Cognition and the symbolic processes. New York: Wiley.
Rumelhart, D. E., & McClelland, J. L. (1986). On learning the past tense of English verbs.
In J. L. McClelland & D. E. Rumelhart (Eds.), Parallel distributed processing:
Explorations in the microstructure of cognition (pp. 216-271). Cambridge, MA: MIT
Press.
Saxton, M. (1997). The Contrast Theory of negative input. Journal of Child Language, 24,
139-161.
Saxton, M., Kulcsar, B., Greer, M., & Rupra, M. (1998). Longer term effects of corrective
input: An experimental approach. Journal of Child Language, 25, 701-721.
Schieffelin, B. (1985). The acquisition of Kaluli. In D. Slobin (Ed.), The crosslinguistic
study of language acquisition. Volume 1: The data. Hillsdale, NJ: Lawrence
Erlbaum Associates.
Scollon, R. (1976). Conversations with a one year old: A case study of the developmental
foundation of syntax. Honolulu: University Press of Hawaii.
Snow, C. (1995). Issues in the study of input: Finetuning, universality, individual and
developmental differences, and necessary causes. In P. Fletcher & B. MacWhinney
(Eds.), The handbook of child language (pp. 180-193). Oxford: Blackwell.
Sokolov, J. L. (1993). A local contingency analysis of the fine-tuning hypothesis.
Developmental Psychology, 29, 1008-1023.
Sokolov, J. L., & MacWhinney, B. (1990). The CHIP framework: Automatic coding and
analysis of parent-child conversational interaction. Behavior Research Methods,
Instruments, and Computers, 22, 151-161.
Tesar, B., & Smolensky, P. (2000). Learnability in optimality theory. Cambridge, MA: MIT
Press.
Tomasello, M. (1992). First verbs: A case study of early grammatical development.
Cambridge: Cambridge University Press.
Tomasello, M. (1999). The cultural origins of human cognition. Cambridge, MA:
Harvard University Press.
Tomasello, M. (2000). Do young children have adult syntactic competence? Cognition, 74,
209-253.
Truscott, J., & Wexler, K. (1989). Some problems in the parametric analysis of learnability.
In R. Matthews & W. Demopoulos (Eds.), Learnability and linguistic theory.
Dordrecht: Kluwer.
Valian, V. (1991). Syntactic subjects in the early speech of American and Italian children.
Cognition, 40, 21-81.
Wexler, K., & Culicover, P. (1980). Formal principles of language acquisition. Cambridge,
MA: MIT Press.
Wilson, B., & Peters, A. M. (1988). What are you cookin' on a hot?: Movement constraints
in the speech of a three-year-old blind child. Language, 64, 249-273.
Wolfe Quintero, K. (1992). Learnability and the acquisition of extraction in relative clauses
and wh-questions. Studies in Second Language Acquisition, 14, 39-70.
i The competition between “went” and “*goed” has also been treated as an instance of
“blocking” (Baker & McCarthy, 1981; Pinker, 1984). In the blocking account, “went” is
said to block “*goed” because lexically based rules are ordered before general rules in the
rule cycle of the morphological component. This account involves an unnecessary
commitment to strict rule ordering and an unnecessary appeal to innate criteria for ordering
rules. Because competition captures the full explanatory power of blocking, we rely on
competition here rather than blocking.