how phonological structures can be culturally selected for

12
269 How Phonological Structures Can Be Culturally Selected for Learnability Pierre-Yves Oudeyer Sony CSL Paris, France This paper shows how phonological structures can be culturally selected so as to become learnable and adapted to the ecological niche formed by the brains and bodies of speakers. A computational model of the cultural formation of syllable systems illustrates how general learning and physical biases can influence the evolution of the structure of vocalization systems. We use the artificial life methodology of building a society of artificial agents, equipped with motor, perceptual and cognitive systems that are generic and have a realistic complexity. We demonstrate that agents, playing the “imitation game,” build shared syllable systems and show how these syllable systems relate to exist- ing human syllable systems. Detailed experiments study the learnability of the self-organized syllable systems. In particular, we reproduce the critical period effect and the artificial language learning effect without the need for innate biases which specify explicitly in advance the form of possible phonologi- cal structures. The ability of children agents to learn syllable systems is explained by the cultural evo- lutionary history of these syllable systems, which were selected for learnability. Keywords origins of speech · evolution · acquisition · constraint · phonetics · phonology · learnability 1 The Principle of Cultural Selection for Learnability Children learn language, and in particular sound systems, incredibly easily and quickly, in spite of its apparent idiosyncratic complexity and noisy learning conditions. Many researchers (Pinker & Bloom, 1990) believe this can not be possible without a substantial genetic pre- programming of specific neural circuits that encode explicitly at birth the main structures of language. In fact the role of learning in language development is even sometimes thought to be very minor (Piattelli-Palmarini, 1989) and reduced to the setting of a few parameters as in the Principles and Parameters theory (Chomsky & Halle, 1968) or in Optimality Theory (Archangeli & Langendoen, 1997). Yet, a growing number of research- ers have challenged this view, and have argued that no linguistically specific innate neural device is necessary to account for the oddities of language learning (and structure): Rather, they propose that they result from the complex interactions between a number of general motor, perceptual, cognitive, social and functional con- straints during the course of cultural interactions (Steels, 1997; Kirby, 2001; Tomasello, 2003). According to this view, language emerged and evolved so as to fit the ecological niche of initially non-speaking human brains and bodies: Languages were (and still are) culturally selected so as to be learnable. As a consequence, if one wants to understand the principles of language acquisition and the structure of Copyright © 2005 International Society for Adaptive Behavior (2005), Vol 13(4): 269–280. [1059–7123(200512) 13:4; 269–280; 059568] Figures 1, 2 appear in color online: http://adb.sagepub.com Correspondence to: Pierre-Yves Oudeyer, Sony Computer Science Laboratory Paris, 6, rue Amyot, 75005 Paris, France. E-mail: [email protected] Web: http://www.csl.sony.fr/~py Tel.: +33144080503, Fax: +33145878750.

Upload: others

Post on 03-Feb-2022

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: How Phonological Structures Can Be Culturally Selected for

269

How Phonological Structures Can Be Culturally

Selected for Learnability

Pierre-Yves OudeyerSony CSL Paris, France

This paper shows how phonological structures can be culturally selected so as to become learnable

and adapted to the ecological niche formed by the brains and bodies of speakers. A computationalmodel of the cultural formation of syllable systems illustrates how general learning and physical

biases can influence the evolution of the structure of vocalization systems. We use the artificial life

methodology of building a society of artificial agents, equipped with motor, perceptual and cognitivesystems that are generic and have a realistic complexity. We demonstrate that agents, playing the

“imitation game,” build shared syllable systems and show how these syllable systems relate to exist-

ing human syllable systems. Detailed experiments study the learnability of the self-organized syllablesystems. In particular, we reproduce the critical period effect and the artificial language learning effect

without the need for innate biases which specify explicitly in advance the form of possible phonologi-

cal structures. The ability of children agents to learn syllable systems is explained by the cultural evo-lutionary history of these syllable systems, which were selected for learnability.

Keywords origins of speech · evolution · acquisition · constraint · phonetics · phonology · learnability

1 The Principle of Cultural Selection for Learnability

Children learn language, and in particular sound systems,incredibly easily and quickly, in spite of its apparentidiosyncratic complexity and noisy learning conditions.Many researchers (Pinker & Bloom, 1990) believe thiscan not be possible without a substantial genetic pre-programming of specific neural circuits that encodeexplicitly at birth the main structures of language. In factthe role of learning in language development is evensometimes thought to be very minor (Piattelli-Palmarini,1989) and reduced to the setting of a few parametersas in the Principles and Parameters theory (Chomsky& Halle, 1968) or in Optimality Theory (Archangeli &

Langendoen, 1997). Yet, a growing number of research-ers have challenged this view, and have argued that nolinguistically specific innate neural device is necessaryto account for the oddities of language learning (andstructure): Rather, they propose that they result fromthe complex interactions between a number of generalmotor, perceptual, cognitive, social and functional con-straints during the course of cultural interactions (Steels,1997; Kirby, 2001; Tomasello, 2003). According to thisview, language emerged and evolved so as to fit theecological niche of initially non-speaking human brainsand bodies: Languages were (and still are) culturallyselected so as to be learnable.

As a consequence, if one wants to understand theprinciples of language acquisition and the structure of

Copyright © 2005 International Society for Adaptive Behavior(2005), Vol 13(4): 269–280.[1059–7123(200512) 13:4; 269–280; 059568]Figures 1, 2 appear in color online: http://adb.sagepub.com

Correspondence to: Pierre-Yves Oudeyer, Sony Computer Science Laboratory Paris, 6, rue Amyot, 75005 Paris, France.E-mail: [email protected] Web: http://www.csl.sony.fr/~pyTel.: +33144080503, Fax: +33145878750.

Oudeyer, P-Y (2005) How phonological structures can be culturally selected for learnability, Adaptive Behavior, 13(4), pp. 269--280.

Page 2: How Phonological Structures Can Be Culturally Selected for

270 Adaptive Behavior 13(4)

language, one has to understand the principles of lan-guage emergence and evolution. Conversely, if onewants to understand the principles of language emer-gence and evolution, it is a necessity to understand theprinciples of language acquisition, since the biases dueto the learning mechanism and the learning situationsinfluence crucially the shaping of linguistic structures.There exists already a vast amount of computationalmodels of the emergence and evolution of language(Steels, 1997; Kaplan, 2001; Kirby, 2001; Cangelosi& Parisi, 2002; de Boer, 2001; Oudeyer, 2005b; Vogt,2003). Yet, those who have closely studied the impactof learning biases upon the evolution of linguistic struc-tures, and in particular the cultural selection for betterlearnability, are still quite rare. Zuidema (2003) pre-sented abstract simulations of the formation of syntac-tic structures and detailed the influence of cognitiveconstraints upon the generated syntax. Brighton et al.(2005) presented a thorough study of several simula-tions of the origins of syntax (Kirby, 2001) which werere-described in the light of this paradigm of culturalselection for learnability. The objective of this paper isto show an example of cultural selection for learnabil-ity in the field of phonology. We will present a compu-tational system which illustrates how phonologicalstructures can be culturally selected so as to becomelearnable.

More particularly, our system models the culturalformation of syllable systems, which are thought to bea fundamental unit of the complex phonological sys-tems of human languages (MacNeilage, 1998). It relieson the interactional framework developed by de Boer(de Boer, 2001) and called the “imitation game.” de Boer(2001) developed this framework in the context of sim-ulations of the origins and evolution of vowel systems.He showed how self-organization may have allowedagents to form efficient vowel systems in terms ofacoustic distinctiveness, and with neither explicit opti-mization nor centralized control. Lindblom (1992) hadshown that one could predict the most frequent vowelsystems in human languages by searching those thatwere acoustically maximally distinctive. The work ofde Boer allows us to understand how this optimizationcan be actually realized implicitly and through culturalevolution in a society of agents.

We extend his simulations by providing the agentwith the capacity to produce complex dynamic utter-ances, built by the sequencing of articulatory targets.This ability to sequence targets (or phonemes) allows

us to study the evolution of phonological structures,i.e., the rules of phoneme sequencing. The learning ofsyllable repertoires is controlled by a mechanism thatwe will detail in order to understand its biases. Fur-thermore, we introduce the notion of articulatory/ener-getic cost for vocalizations, which impacts the learningsystem. This allows us to embody in an operationalsystem the “ease versus distinctiveness” hypothesisproposed by Lindblom (1992). The introduction of anarticulatory cost and its role in the shaping of syllablerepertoires was also studied in the computational modelpresented by Redford et al. (2001). Yet, Redford et al.(2001) used explicit centralized optimization like Lind-blom, and did not study how this could be realized in asociety of decentralized autonomous agents. A previ-ous paper (Oudeyer, 2001) studied the structure of theself-organized shared syllable repertoires that appearin our system, and presented comparisons with actualhuman syllable repertoires, as was done in Lindblom(1992), Redford et al. (2001), and de Boer (2001). Thefocus of this paper is different and new as compared tothese previous studies: We will here concentrate on thestudy of the learnability of the self-organized syllablesystems by children and adult agents, and compare itto the learnability of randomly generated artificial syl-lable systems.

In three other papers (Oudeyer, 2005a,b,), we devel-oped another model of the origins of phonologicalstructures that must not be confused with the one thatwe present in this paper. In both models, agents are ableto produce complex utterances, and rules of sequentialcombination emerge. But the assumptions as well as theobjective of this other model are very different. First ofall, this other model does not assume an explicit pressurefor building repertoires of distinctive sounds: Agents donot play the “imitation game,” and there are no rewardsor feedback between agents. In fact, agents have no skillsto coordinate socially, and are just coupled by the factthat they can hear the vocalizations of the other agentswith which they share the same environment. This meansthat the model did not pre-suppose any convention suchas the rules of a language game. That model was designedto study the bootstrapping of speech: How did the firstspeech code appear when linguistic communicationdid not exist yet? In contrast, the model we presenthere, as well as the one of de Boer, deals with the cul-tural formation and evolution of a speech code oncethe ability to have linguistic interactions has appeared,with all the associated tools such as the interactional

Oudeyer, P-Y (2005) How phonological structures can be culturally selected for learnability, Adaptive Behavior, 13(4), pp. 269--280.

Page 3: How Phonological Structures Can Be Culturally Selected for

Oudeyer Phonological Structure Selection 271

conventions like the “imitation game.” In fact, the modelwe present here, as well as the one of de Boer, can beseen as a study of what happens when the bootstrappedvocalization repertoires generated by the agents inOudeyer (2005b) are actually recruited for communi-cation, which introduces feedback and repulsive forcesbetween speech categories, and as a consequence shapesfurther the bootstrapped repertoires.1

The next section presents an overview of the arti-ficial system. Then we show how agents form sharedsyllable systems efficiently and we summarize the struc-tural properties of the produced syllable systems. Finally,we explore in detail their learnability and the implica-tions for theories of language.

2 The Artificial System

2.1 The Imitation Game

Central to the model is the way agents interact. We usehere the concept of “language game,” operationallyused in a number of computational models of the ori-gins of language (Steels, 1997; Oudeyer, 2001). A gameis a sort of protocol that describes the outline of a con-versation, allowing agents to coordinate by knowingwho should try to say what kind of things at a particu-lar moment. Here, we use the “imitation game” devel-oped by de Boer for his experiments on the emergenceof vowel systems (de Boer, 2001).

A round of a game involves two agents, one beingcalled the speaker, and the other the hearer. Each agentpossesses a repertoire of syllables with a score associ-ated (this is the categorical memory described below).The speaker initiates the conversation by picking uprandomly one item in its repertoire and uttering it. Thenthe hearer tries to imitate this vocalization by produc-ing the item in its repertoire that matches best withwhat it heard. The speaker then evaluates whether theimitation was good or not by checking whether thebest match to this imitation in its repertoire correspondsto the item it uttered initially. It then gives a positive ornegative feedback signal to the hearer. Finally, eachagent updates its repertoire. If the imitation succeeded,the scores of involved items increase. Otherwise, thescore of the association used by the speaker decreasesand there are 2 possibilities for the hearer: Either thescore of the association it used was below a certainthreshold, and this item is modified by the agent who

tries to find a better one, or the score was above thisthreshold, which means that it may not be a good idea tochange this item, and a new item is created, as close aspossible to the utterance of the speaker given the con-straints and knowledge of the hearer at this time of itslife. Regularly the repertoire is cleaned up by removingthe items that have a score below a certain threshold.Initially, the repertoires of agents are empty. Newitems are added either by invention, which takes placeregularly, or by learning from others.

2.2 The Production System

The production system that we use was designed inorder to reflect the complexity of the human vocal appa-ratus so that we can study how some morpho-physio-logical constraints can affect the cultural evolution ofsyllable repertoires. Yet, it is not our goal to build anaccurate model of the vocal tract and its associatedcontrol systems, and certainly it would not be possiblesince the existing conceptions of these systems in thescientific community are rather controversial and notdetailed enough. Nevertheless, for other examples ofvocal tract and control system models, see Bailly (1998),and Guenther (2003).

2.2.1 Vocal Tract A physical computational modelof the vocal tract is used, based on an implementationof Cook’s model (Cook, 1989). It consists in modelingthe vocal tract together with the nasal tract as a set oftubes that act as filters, into which are sent acousticwaves produced by a model of the glottis and a noisesource. There are eight control parameters for the shapeof the vocal tract, used for the production of syllables.The setting of these eight parameters specifies an artic-ulatory configuration.

2.2.2 Control System A vocalization consists in therealization of a sequence of articulatory targets. Anarticulatory target is defined by a specific articulatoryconfiguration to be reached. The vocalizations thatagents produce are called “syllables.” Indeed, we con-sider the syllable from the point of view of the frame–content theory (MacNeilage, 1998) which defines it asan oscillation of the jaw (the frame) modulated byintermediary specific articulatory configurations, whichrepresent segmental content (the content), correspond-

Oudeyer, P-Y (2005) How phonological structures can be culturally selected for learnability, Adaptive Behavior, 13(4), pp. 269--280.

Page 4: How Phonological Structures Can Be Culturally Selected for

272 Adaptive Behavior 13(4)

ing to what one may call phonemes. In our simula-tions, this means that agents always start a vocalizationfrom a default rest position corresponding to a closedmouth, and always terminate the vocalization in thisrest position. In between, they specify a number ofarticulatory targets.

There is a control system responsible for continu-ously driving the vocal tract shape parameters from thecurrent articulatory configuration to the next articula-tory target. This control system is simply based on apolynomial interpolation, whose smoothing propertiesallow us to model the phenomenon of co-articulation:The articulatory trajectory around each articulatorytarget, i.e., the realization of each phoneme, is modu-lated by the neighboring targets. This is crucial becauseit determines which syllables are difficult to pronounceand imitate. The constraint of jaw oscillation is mod-eled by a force that pulls in the direction of the positionthe articulators would have if the syllable was a pureframe, which means an oscillation without intermedi-ary targets. This can be viewed as an elastic whose restposition at each time step is the pure frame configura-tion at this time step. The amount of deformation of thepure frame during a vocalization is used to define thearticulatory cost (or energetic effort) of this vocaliza-tion. This cost is used to model the idea that easyvocalizations tend to be remembered more easily thanothers (Lindblom, 1992).

At the beginning of the simulation, agents are ini-tialized with a repertoire of 10 phonemes, i.e., a set of10 pre-defined articulatory targets, that can be thoughtto be the outcome of the system that we described inOudeyer (2005b). Although the degrees of freedomthat we can control here do not correspond exactly tothe degrees that are used to define human phonemes,we chose values that allow them to be good analogs ofvowels (V), liquids (C1) and plosives (C2), whichrespectively correspond to low, medium and high degreeof obstruction of the air flow in the vocal tract.

2.3 The Perception System

The ear of each agent consists of a model of the coch-lea, and in particular of the basilar membrane, asdescribed in Lyon (1997). It is used to compute theevolution of the excitation of this membrane over time.Each excitation trajectory is discretized over both timeand frequency: 20 frequency bins are used and a sam-ple is extracted every 10 ms. As a measure of similar-

ity between two perceptual trajectories, we used atechnique well-known in the field of speech recogni-tion: Dynamic time warping (Sakoe, 1982). Agentsuse this measure to compute which item in their mem-ory is closest. No segmentation into “phonemes” isdone in the recognition process: The recognition isdone over the complete unsegmented vocalization.Agents discover what phonemes compose the syllableonly after recognition of the syllable and by looking atthe articulatory program associated to the matchedperceptual trajectory in the exemplar.

2.4 The Artificial Brain

The brain of our agents consists of two memories ofexemplars associated with a mechanism to shape anduse them. The first memory consists of a set of exem-plars that serve in the imitation process, and allow theagent to retrieve the motor programs which correspondto given acoustic vocalizations (this memory is there-fore called the inverse mapping memory): The set ofthese exemplars represents the skills of agents for thistask. This set is limited in size, which purposefully intro-duces a crucial generic cognitive constraint. Exemplarsconsist in associations between articulatory programs(an articulatory program is a sequence of articulatorytargets) and corresponding perceptual trajectories. Thesecond memory (the categorical memory) is in fact asubset of the inverse-mapping memory, where eachexemplar possesses a score. Categorical memory isused to represent the particular syllables that count ascategories in the vocalization system being collec-tively built by agents (corresponding exemplars areprototypes for categories). It corresponds to the mem-ory of prototypes classically used in the imitation game(de Boer, 2001).

Initially, the inverse mapping memory is builtthrough babbling. Agents generate random articula-tory programs, composed of between one and four artic-ulatory targets, execute them with the control moduleand perceive the produced vocalization. They storeeach trial with a probability inverse to the articulatorycost involved. The number of exemplars that can bestored in this memory is quite limited: In the experi-ments presented below, there are 100 exemplars whereasthe total number of possible syllables is slightly above12,000. So initially the inverse mapping memory iscomposed of exemplars which tend to be more numer-ous in zones where the cost is low than in zones where

Oudeyer, P-Y (2005) How phonological structures can be culturally selected for learnability, Adaptive Behavior, 13(4), pp. 269--280.

Page 5: How Phonological Structures Can Be Culturally Selected for

Oudeyer Phonological Structure Selection 273

the cost is higher. As far as the categorical memory isconcerned, it is initially empty, and will grow throughlearning and invention.

When an agent hears a syllable and wants to imi-tate it, it first looks up in its categorical memory (if it isnot empty) and finds the item whose perceptual trajec-tory is most similar to the one it just heard. Then itexecutes the associated articulatory program. After theinteraction is finished, it tries to improve its imitation.To do that, it finds in its inverse mapping memory theitem whose perceptual trajectory matches best (it maynot be the same as the categorical item). Then it tries,through babbling, a small number of articulatory vari-ations of this item by changing, adding or deleting onearticulatory target. For example, if one agent hears“ble,” and if “fle” is the closest item in its memory, thenit may try “vle,” “fli,” or even “ble” if chance decidesso and if it possesses the phonemes b/l/e/f/v/i (indeed,not all possible mutations are tried, which models atime constraint: Here they typically try 10 mutations).The important point is that these mutation trials are notforgotten for the future (some of them may be uselessnow, but may become very useful in the future): Eachof them is remembered with a probability inverse to itsarticulatory cost. Furthermore, this mechanism impliesthat the hearing of a certain kind of vocalization increasesthe probability to produce similar vocalizations in thefuture and to use them as categorical prototypes. Thisbears some similarities with the phonological attune-ment observed during the vocal babbling of develop-ing infants (Vihman, 1996). Finally, as we have memorylimitations, when new items are added to the inversemapping memory, some others have to be pruned. Thestrategy chosen here is: For each new item, a randomlychosen item is also deleted (only the items that belongto the categorical memory cannot be deleted).

The evolution of the inverse mapping memoryimplied by this mechanism is as follows. Whereas atthe beginning items are spread uniformly across “iso-cost” regions (they have some capacity of imitation formany kinds of vocalizations, but not very precise), atthe end items are clustered in certain zones correspond-ing to the particular vocalization system of the societyof agents (they have specific capacities for precise imi-tation of certain kinds of vocalizations). This is due tothe fact that exemplars closest to vocalizations pro-duced by other agents are differentiated and this leadsto an increase of exemplars in their local region, and asa consequence this leads also to a decrease of the

number of exemplars elsewhere, because of the limitsimposed on memory. Indeed, the memory limits do notallow agents to become very good imitators in thewhole space, but only in focused regions.

It is interesting to remark that what goes on in thehead of each agent is very similar to what happens ingenetic evolution. One can view the set of exemplarsthat an agent possesses as a population of individuals/genomes, each defined by the sequence of articulatorytargets. The fitness function of each individual/sylla-ble is defined by how often it leads to successful imita-tions when it is used, in both speaker and hearer roles.This population of individuals evolve through a generateand select process, generation being performed througha combination of completely random inventions andmutations of syllables (when one changes one articula-tory goal), and selection using the scores of each syllable.The original thing here as compared to many simula-tions modeling either genetic or cultural evolution, isthat the fitness function is not fixed but evolves withtime: Indeed, the fitness of one syllable depends on thepopulation of syllables in the heads of other agentswhose fitness itself depends on this syllable. So wehave a case of coupled dynamic fitness landscapes. Aswe will see, what happens is that those fitness land-scapes synchronize at some point, they become verysimilar and stable. Also, the fitness of one syllabledepends on the other syllables/exemplars in the mem-ory of the agent: Indeed, if a syllable is alone in its partof the space, then few syllables of this area will be pro-duced and other agents will have less opportunity topractice imitation of this kind of syllable, and so thereis a high probability that the syllable will be pruned.The consequence of this is that group selection alsohappens.

3 The Formation of Shared Repertoires of Syllables

The first thing one wants to know about this system iswhether populations of agents manage to develop asyllable system of reasonable size that allows them tohave successful imitations. Figures 1 and 2 show anexample of an experiment involving 15 agents, with amemory limit on inverse-mapping memory of 100exemplars, and with vocalizations comprising betweentwo and four targets included among 10 possible ones(which means that at a given moment, one agent never

Oudeyer, P-Y (2005) How phonological structures can be culturally selected for learnability, Adaptive Behavior, 13(4), pp. 269--280.

Page 6: How Phonological Structures Can Be Culturally Selected for

274 Adaptive Behavior 13(4)

Figure 1 Example of the evolution of the rate of successful imitations in a society of 15 agents starting with empty rep-ertoires of syllables.

Figure 2 Corresponding evolution of the mean number of syllables in the repertoires of agents.

Oudeyer, P-Y (2005) How phonological structures can be culturally selected for learnability, Adaptive Behavior, 13(4), pp. 269--280.

Page 7: How Phonological Structures Can Be Culturally Selected for

Oudeyer Phonological Structure Selection 275

knows more than about 0.8 percent of the syllablespace). In Figure 1, each point represents the averageimitation success in the last 100 games, and in Figure 2,each point represents the average size of categoricalmemory in the population (i.e. the mean number of syl-lables in agents’ repertoires). We see that of course thesuccess is very high right from the start: This is nor-mal, since at the beginning agents have basically oneor two syllables in their repertoire, which implies thateven if an imitation is quite bad in the absolute, it willstill get well matched. The challenge is actually toremain at a high success rate while increasing the sizeof the repertoires. The two graphs show that this iswhat happens. To make these results convincing, theexperiments was repeated 20 times, and the averagenumber of syllables and success was measured in thelast 1,000 games (over a total of 20,000 games): 96.9%is the mean success and 79.1 is the mean number ofcategories/syllables. The plateau that appears in Figure2 is directly explained by the fact that agents have amemory with limited capacity.

The fact that the success remains high as the sizeof repertoires increases can be explained. At the begin-ning, agents have very few items in their repertoires,so even if their imitations are bad in the absolute, theywill be successfully recognized since recognition is doneusing a nearest-neighbor procedure (for example, whentwo agents have only one item, no confusion is possi-ble since there is only one category). As time goes on,while their repertoires become larger, their imitationskills are also increasing: Indeed, agents explore thearticulatory/acoustic mapping locally in areas wherethey hear the syllables of others, and the new syllableprototypes they create are hence also in these areas.The consequence is a positive feed-back loop whichmakes it such that agents who knew very different partsof the mapping initially tend to synchronize their knowl-edge and become experts in the same small area (whereasat the beginning they have skills to imitate very differ-ent kinds of syllables, but are poor when it comes tomaking subtle distinctions in small areas).

4 Structural Properties

It is possible to statistically study the structural proper-ties of the self-organized syllable systems, and to com-pare them to human syllable systems. This was thetopic of a previous paper (Oudeyer, 2001), and is not

the focus of this paper. So, we only summarize theresults.

The produced syllable systems have structuressimilar to what we observe in human languages. Onthe one hand, a number of universal tendencies werefound, like the ranking of syllable types along theirfrequency (CV CVC CCV CCVC/CVCC).Also, the model predicts the preference for syllablesrespecting the sonority hierarchy principle, which statesthat within a syllable, the sonority (or degree of obstruc-tion of the air flow in the vocal apparatus) first increasesuntil a peak (the nucleus) and then decreases. On theother hand, the diversity observed in human languagescould also be observed: Some syllable systems did notfollow the trend in syllable type preference, and cate-gorical differences exist (some syllable systems havecertain syllable types not possessed by others). Thisconstitutes a viable alternative to the mainstream viewon phonological systems, optimality theory (Arch-angeli & Langendoen, 1997), which requires the pres-ence of innate explicit formal constraints in the genometo account for universal tendencies (an example ofconstraint is the *COMPLEX constraint which statesthat syllables can have at most one consonant at an edge),and explains diversity by different orderings in thestrengths of these formal constraints (which is basicallythe only thing that is learnt).

5 Learnability

The learnability of the resulting systems by new agentsconfronted directly with the complete vocalizationsystem is an important question. Indeed, the learnabil-ity of language has been the subject of many experi-ments, theories and debates. Experiments have shownfor example that language acquisition is most success-ful when it is begun early in life (Long, 1990), whichrefers to the well-known concept of the critical learningperiod (Lenneberg, 1967). Also, learners of a secondlanguage typically have many more difficulties thanlearners of a first language (Flege, 1992). Until relativelyrecently, these facts were interpreted in favor of theidea that humans have an innate language acquisitiondevice (Pinker & Bloom, 1990; Piattelli-Palmarini,1989) which partly consists in pre-giving a number oflinguistically specific constraints: For example, Long(1990), argues that it is strong evidence for “matura-tionaly scheduled language specific learning abilities.”

≥ ≥ ≥

Oudeyer, P-Y (2005) How phonological structures can be culturally selected for learnability, Adaptive Behavior, 13(4), pp. 269--280.

Page 8: How Phonological Structures Can Be Culturally Selected for

276 Adaptive Behavior 13(4)

This view is also supported by a number of theoreticalstudies, like Gold’s theorem (Gold, 1967), which basi-cally states that in the absence of enough explicit neg-ative evidence, one cannot learn languages belongingto the superfinite class, which includes context-freeand context-sensitive languages. Nevertheless, the appli-cability to human languages has been challenged (Dea-con, 1997).

There is an alternative view to which our modelbrings plausibility. It consists in explaining the factthat the learning skills of adults are lower than those ofchildren by the fact that the brain resources needed todo so have already been recruited for other tasks or fora different language/vocalization system (Rohde &Plaut, 1999). Said another way, children learn a com-pletely new vocalization system better than adultsbecause their cognitive capabilities are less committed,whereas adults are already specialized. This is indeedwhat we can observe in the artificial system. A numberof experiments were conducted in which on the one

hand, some children agents (i.e. new agents) had to learnan already established syllable system, and on the otherhand, adult agents had to learn the same establishedsyllable system, which was for them a “second lan-guage” vocalization system. More precisely, in eachexperiment, first a society of agents was run to pro-duce a syllable system: After 15,000 games, an agentwas randomly chosen and called the teacher. Thisteacher was then used in the same game describedabove with a second agent, the learner, except that herethe teacher did not update its memory (he is supposedto know that he knows the syllable system well com-pared to the learner). The learner was in a first series of20 runs a child agent, and in a second series of 20 runsthe learner was an agent taken from another societyafter 15,000 games (which models an adult who alreadyknows another vocalization system). Typical examplesof imitation success curves are in Figure 3: The uppercurve is the one for child learning, and the lower curvefor adult learning. Each point in the curve represents

Figure 3 Evolution of the rate of successful imitations for a child agent which learns an already established syllablesystem (top curve), and for an adult agent already knowing another established syllable system (bottom curve). We ob-serve that the child learns the syllable system very well, while the adult never manages to master it. This shows the“second language learning effect”: adults are no longer capable of perfectly learning a new vocalization system.

Oudeyer, P-Y (2005) How phonological structures can be culturally selected for learnability, Adaptive Behavior, 13(4), pp. 269--280.

Page 9: How Phonological Structures Can Be Culturally Selected for

Oudeyer Phonological Structure Selection 277

the mean success in the last 100 games at a particulartime t. The mean success after 5,000 games of the 20runs was of 97.3% for children and 80.8% for adults.This conforms well to the idea of a critical period:Adults never manage to perfectly learn another vocal-ization system. There is an explanation for that: Whereaschildren start with a high plasticity in their inversemapping memory (because they have no categories yetand so can freely delete and create many new items)and have no strong biases towards a particular zone ofthe syllable space, adults, in contrast, are already com-mitted to another vocalization system, and have moredifficulty in creating new items in the appropriate zoneof the syllable space because their memory resources(items in inverse mapping memory that are not proto-types of one of their previous language categories) aremuch lower. Of course, some of these category proto-types may be pruned, thus freeing some resources,because they are unsuccessful for the new vocalizationsystem. But in practice it seems that a number of themallow for successful imitations with the new vocalizationsystem, which prevents the freeing of enough resourcesand so the remaining confusions cannot be resolved.To conclude, we see that our model fits very well withthe idea that critical periods and second language learn-ing effect do not require a genetically programmed lan-guage-specific mechanism to find an explanation, andthat the more parsimonious idea of (non-)commitmentof the cognitive system can account for it.

We saw that children could actually nearly per-fectly learn a fully developed syllable system. Thisresult is not obvious since they are faced directly withthe complete syllable system, as opposed to the agentswho co-built it: The building was incremental and thesyllable system complexified progressively, which doesnot mean that their job was easier since negotiationalso had to take place, but it was different. An experi-ment was performed that on the one hand shows hownon-obvious the task is and on the other hand illus-trates the principle of cultural selection for learnabilitywhich we detailed in the introduction. Children agentswere put in a situation of trying to learn a random syl-lable system: The adult/teacher was artificially built byputting in its categorical repertoire items whose articu-latory programs were completely random (chosen amongthe complete set of combinatorially possible articulatoryprograms with less than four phonemes). This experi-ment was repeated 20 times. Figure 4 shows the curvesof two experiments: The top curve is for child learning

success when the target syllable system was generatedby a population of agents and the bottom curve is forchild learning success when the target syllable systemwas random. The mean success over the 20 experi-ments after 5,000 games is 97.3% for “natural” sylla-ble systems and 78.2% for random syllable systems.We see that children never learn the random syllablesystems well. This result is experimentally and func-tionally very similar to an experiment about syntaxdescribed by Christiansen (2000) in which human sub-jects were asked to learn small languages whose syntaxwas either that of an existing natural language or a ran-dom/artificial one. They found that indeed subjects weremuch better at learning the language where the syntaxwas “natural” than the language where the syntax was“artificial.” Deacon (1997) also made a point about this:“If language were a random set of associations, chil-dren would likely be significantly handicapped by theirhighly biased guessing.”

This state of affairs is in fact compatible with mostof theories of language, which all basically suggestthat human languages have many particular structures(that make them non-random) and that we are innatelyendowed with constraints that bias us towards an eas-ier learning of these languages. Where considerabledisagreement arises again about the nature and the ori-gins of these constraints. On the one hand, the nativistapproach (Pinker & Bloom, 1990) suggests that they areencoded in a Universal Grammar, genetically coded andlinguistically specific, and considers language as asystem mainly independent of its users (humans) whomay have undergone biological evolution so as to beable to acquire and use it in an efficient way. This isnot only true for syntax but also down to phonetics:This approach posits that we have an innate knowl-edge of what features (for example the labiality of aphoneme) and combinations of features can be used inlanguage (Chomsky & Halle, 1968).

On the other hand, a more recent approach consid-ers that language itself evolved and its features wereselected to fit the generic and already existing learningand processing capabilities of humans (Brighton et al.,2005), and that complex linguistic structures may haveemerged through a process of self-organization at mul-tiple levels (Steels, 1997). The fact that language evolvedand adapted to the primitive human brain’s ecologicalniche, and in particular to the brains of children, explainswhy “children have an uncanny ability to make luckyguesses” though they do not possess innate linguistic

Oudeyer, P-Y (2005) How phonological structures can be culturally selected for learnability, Adaptive Behavior, 13(4), pp. 269--280.

Page 10: How Phonological Structures Can Be Culturally Selected for

278 Adaptive Behavior 13(4)

knowledge (Deacon, 1997). Again the present modeltends to bring more plausibility to the second approach.Indeed, it is clear here that on the one hand there areinnate generic motor, perceptual and cognitive mecha-nisms which are not specific to speech (and could beused to learn the coordination between the vision ofthe hand and the muscles of the arm for example), thatbias the way agents explore and acquire parts of thesyllable space. On the other hand, the mechanism bywhich agents culturally negotiate which will be theirparticular syllable system makes them preferentiallyselect systems which allow for easy imitation, henceeasier learning. For instance, syllables that are verysensitive to noise will tend to be avoided/pruned sincethey lead to confusions and so introduce difficulties inthe learning of the repertoire. Also, syllable systemswill tend to be coherent both with the process of explo-ration by differentiation and the tendency to betterremember easy syllable prototypes than difficult ones:

Given a part of a syllable system, the rest may be foundquite easily by focusing the exploration on small vari-ants of items of this part, and exploration is also mademaximally efficient by focusing on easy parts.

6 Conclusion

In this paper, we presented an artificial system in whichagents developed shared syllable systems through cul-tural evolution. This system extends the one of de Boer(2001), and is complementary to the model presented in(Oudeyer, 2005a,b,c), which showed that it was possi-ble to bootstrap complex vocalization systems withoutthe need to pre-suppose an explicit pressure for buildingrepertoires of distinctive sounds, and without the needto pre-suppose interactional conventions such as lan-guage games. Here, we showed how a syllable systemcould evolve in a population of agents once an explicit

Figure 4 Evolution of the rate of successful imitations for a child agent which learns a syllable system established by apopulation of agents (top curve), and for a child agent which learns a syllable system established randomly by the exper-imenter (bottom curve). We observe the “artificial language learning effect”: children can only perfectly learn the vocali-zation systems which evolved in a population of agents. This shows that in fact, vocalization systems were culturallyselected for learnability.

Oudeyer, P-Y (2005) How phonological structures can be culturally selected for learnability, Adaptive Behavior, 13(4), pp. 269--280.

Page 11: How Phonological Structures Can Be Culturally Selected for

Oudeyer Phonological Structure Selection 279

pressure for efficient communication has been intro-duced, using the “imitation game” interactional frame-work. Moreover, this system constituted the basis forexperiments which illustrated the concept of culturalselection for learnability in the context of phonology.This complements the existing simulations of this phe-nomenon in the field of syntax (Zuidema, 2003;Brighton et al., 2005). We showed, for example, that achild agent could easily learn a syllable system whichevolved in a population of agents, but could not learn arandom syllable system. Indeed, agents possessgeneric learning biases for which certain kinds of syl-lable systems are easy, and some others are difficult tolearn. As a consequence, the process of cultural evolu-tion selects the phonological systems which are easiestto learn and to transmit. These simulations develop ourintuitions about the fantastic ability that children haveto learn vocalization systems. In particular, this showsthat an innate Language Acquisition Device whichspecifies explicitly the form of possible linguistic sys-tems is not necessary to account for the performance ofchildren. Their performance may on the contrary bedue to the fact that speech evolved so as to becomeeasily learnable. Yet, we do not exclude the possibilitythat biological evolution driven by the need to adapt toa linguistic environment took a role; in fact it is veryprobable that genes (in particular those implicated inthe development of the neural system) co-evolved withlanguage, but, as Deacon puts it: “languages have cer-tainly done most of the adapting” (Deacon, 1997).

Note

1 Indeed, the simulations in Oudeyer (2005b,c) showed that,already, much structure can self-organize without the explicitneed for building repertoires of distinctive sounds: Thisincludes phonemic coding (the logical discreteness of con-tinuous vocalizations), statistical regularities of vowel sys-tems, and phonotactics.

References

Archangeli, D., & Langendoen, T. (1997). Optimality theory,an overview. Oxford: Blackwell.

Bailly, G. (1998). Learning to speak. Sensori-motor control ofspeech movements. Speech Communication, 22(2-3),251–267.

Brighton, H., Kirby, S., & Smith, K. (2005). Cultural selectionfor learnability: Three hypotheses underlying the view thatlanguage adapts to be learnable. Oxford: Oxford Univer-sity Press.

Cangelosi, A., & Parisi, D. (2002). Simulating the evolution oflanguage. Berlin: Springer.

Chomsky, N., & Halle, M. (1968). The sound pattern of Eng-lish. New York: Harper Row.

Christiansen, M. (2000). Using artificial language learning tostudy language evolution: Exploring the emergence ofword order universals in language evolution. In J. Dessal-les, A. Wray, & C. Knight (Eds.), Transitions to language(pp. 45–48). Oxford: Oxford University Press.

Cook, P. R. (1989). Synthesis of the singing voice using aphysically parameterized model of the human vocal tract.In Proceedings of the International Computer Music Con-ference (pp. 69–72), Columbus, OH.

de Boer, B. (2001). The origins of vowel systems (Oxford Lin-guistics Series). Oxford: Oxford University Press.

Deacon, T. (1997). The symbolic species. New York: The Pen-guin Press.

Flege, J. (1992). Speech learning in a second language. In C.Ferguson, L. Menn, & C. Stoel-Gammon (Eds.), Phono-logical development: Models, research, implications (pp.565–604). Timonium, MD: York Press.

Gold, E. (1967). Language identification in the limit. Informa-tion and Control, 10, 447–474.

Guenther, F. (2003). Neural control of speech movements. InA. Meyer & N. Schiller (Eds.), Phonetics and phonologyin language comprehension and production: Differencesand similarities (pp. 209–239). Berlin: de Gruyter.

Kaplan, F. (2001). La naissance d’une langue chez les robots.Paris: Hermes Science.

Kirby, S. (2001). Spontaneous evolution of linguistic struc-ture—an iterated learning model of the emergence of reg-ularity and irregularity. IEEE Transactions onEvolutionary Computation, 5(2), 102–110.

Lenneberg, E. (1967). Biological foundations of language.New York: Wiley.

Lindblom, B. (1992). Phonological units as adaptive emergentsof lexical development. In C. Ferguson, L. Menn, & C.Stoel-Gammon (Eds.), Phonological development: Mod-els, research, implications (pp. 565–604). Timonium,MD: York Press.

Long, M. (1990). Maturational constraints on language devel-opment. Studies in Second Language Acquisition, 12,251–285.

Lyon, R. (1997). All pole models of auditory filtering. In E. R.Lewis et al. (Eds.), Diversity in auditory mechanics (pp.205–211). Singapore: World Scientific.

MacNeilage, P. (1998). The frame/content theory of evolutionof speech production. Behavioral and Brain Sciences, 21,499–548.

Oudeyer, P-Y (2005) How phonological structures can be culturally selected for learnability, Adaptive Behavior, 13(4), pp. 269--280.

Page 12: How Phonological Structures Can Be Culturally Selected for

280 Adaptive Behavior 13(4)

Oudeyer, P.-Y. (2001). The origins of syllable systems: anoperational model. In J. Moore & K. Stenning (Eds.). Pro-ceedings of the 23rd Annual Conference of the CognitiveScience Society (pp. 744–749). Mahwah, NJ: LawrenceErlbaum Associates.

Oudeyer, P.-Y. (2005a). From holistic to discrete vocalizations:The blind snow-flake maker hypothesis (pp. 68–99).Oxford: Oxford University Press.

Oudeyer, P.-Y. (2005b). The self-organization of speechsounds. Journal of Theoretical Biology, 233(3), 435–449.

Oudeyer, P.-Y. (2005c). The self-organisation of combinatori-ality and phonotactics in vocalisation systems. ConnectionScience, 17(3), 1–17.

Piattelli-Palmarini, M. (1989). Evolution, selection and cogni-tion: from “learning” to parameter setting in biology andin the study of language. Cognition, 31, 1–44.

Pinker, S., & Bloom, P. (1990). Natural language and naturalselection. Brain and Behavioral Sciences, 13, 707–784.

Redford, M. A., Chen, C. C., & Miikkulainen, R. (2001). Con-strained emergence of universals and variation in syllablesystems. Language and Speech, 44, 27–56.

Rohde, D., & Plaut, D. (1999). Language acquisition in theabsence of explicit negative evidence: how important isstarting small? Cognition, 72, 67–109.

Sakoe, H. (1982). Dynamic programming optimization for spo-ken word recognition. IEEE Transactions on Acoustics,Speech, and Signal Processing, 26, 263–266.

Steels, L. (1997). The synthetic modeling of language origins.Evolution of Communication, 1, 1–35.

Tomasello, M. (2003). Constructing a language: A usage-based theory of language acquisition. Cambridge, MA:Harvard University Press.

Vihman, M. (1996). Phonological development: The origins oflanguage in the child. Oxford: Blackwell.

Vogt, P. (2003). Anchoring of semiotic symbols. Robotics andAutonomous Systems, 43(2-3), 109–120.

Zuidema, W. (2003). How the poverty of the stimulus solvesthe poverty of the stimulus. In S. T. Becker, & K. Ober-mayer (Eds.), Advances in Neural Information ProcessingSystems 15 (Proceedings of NIPS’02) (pp. 51–68). Cam-bridge, MA: MIT Press.

About the Author

Pierre-Yves Oudeyer studied theoretical computer science at Ecole Normale Supérieurede Lyon, and obtained his PhD in Artificial Intelligence at University Paris VI, which pre-sented a computational theory of the origins of speech sounds. He is doing research onthe origins of speech, and builds societies of robots to study how linguistic codes canappear. In particular, he studies the role of self-organization in the coupling between thesensory–motor modalities within agents, and in the coupling of agents. He is also doingresearch in the area of developmental robotics, where he studies how a robot may auton-omously set up its own tasks so that the complexity of its behavior increases in an open-ended manner. In particular, he is working on algorithms of adaptive curiosity.

Oudeyer, P-Y (2005) How phonological structures can be culturally selected for learnability, Adaptive Behavior, 13(4), pp. 269--280.