Metathesis in English and Hebrew
A Computational Account of Usage-Based Phonology

Paul De Palma
George Luger
Departments of Computer Science
Gonzaga University
University of New Mexico
(depalma@gonzaga.edu)

Metathesis

Reversal of the expected linear ordering of sounds
Instead of xy we find yx
Examples
    tl shift: borrowed noun chipotle → chipolte (SAE: a spice)
    ts shift: binyan 5 hitsader → histader (Modern Hebrew: “he got organized”)
    hr shift: dative singular tehernek, dative plural terhek (Hungarian: “load”)
    rh shift: expected tiirhisaskhus, actual tihriasku (Pawnee: “he is called”)
Metathesis myth: sporadic, irregular, due to performance errors
String of sounds realized as xy in language A can be yx in language B

Model Levels

A usage-based phonological account (Elizabeth Hume)
    Primarily synchronic
    Can be extended to language change
Utterance Selection Theory (William Croft)
Genetic Algorithm operationalizes Utterance Selection Theory

A Usage-Based Account

Metathesis requires two conditions
    1. An indeterminate speech signal
    2. Output that conforms to existing patterns in language
Example: chipotle/chipolte
    In SAE, tl (stop consonant preceding a lateral) is indeterminate
    Stop consonant following the lateral is frequent in post-vocalic position (cold, sold, mold, fold, molt, bolt, jolt, colt)
    SAE speakers transform tl to lt
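
The attestation condition can be illustrated with a toy count. The sketch below is not part of the authors' model: the class name `Attestation`, the letter classes, and the use of spelling in place of phonetic transcription are all assumptions made for illustration. It counts post-vocalic lateral + stop sequences against post-vocalic stop + lateral sequences in a word list.

```java
import java.util.List;

/** Hypothetical attestation count over a toy word list; spelling stands in for sounds. */
final class Attestation {
    // Assumed letter classes, not taken from the slides.
    private static final String VOWELS = "aeiou";
    private static final String STOPS = "ptkbd";

    /**
     * Counts two-letter sequences that directly follow a vowel.
     * lateralFirst = true  counts l + stop (as in "colt");
     * lateralFirst = false counts stop + l (as in "chipotle").
     */
    static int count(List<String> words, boolean lateralFirst) {
        int n = 0;
        for (String w : words) {
            for (int i = 1; i + 1 < w.length(); i++) {
                char a = w.charAt(i);
                char b = w.charAt(i + 1);
                boolean postVocalic = VOWELS.indexOf(w.charAt(i - 1)) >= 0;
                boolean match = lateralFirst
                        ? a == 'l' && STOPS.indexOf(b) >= 0
                        : STOPS.indexOf(a) >= 0 && b == 'l';
                if (postVocalic && match) {
                    n++;
                }
            }
        }
        return n;
    }
}
```

Over the eight words listed above plus chipotle, the lateral-first count is 8 and the stop-first count is 1, which is the kind of asymmetry the second condition relies on. A real test would need a transcribed corpus rather than a handful of spelled words.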

Utterance Selection Theory

Natural selection requires:
    A population of individuals with distinct characteristics
    A mechanism for replicating those characteristics
    Interaction among individuals and the environment
    Selective pressure from the environment producing differential reproduction of the individuals and characteristics
Extended to language:
    Language: a population of utterances (not a system of signs or a collection of words and rules that operate on them)
    Normal replication: utterance conforms to the conventions of language use
    Altered replication: utterance violates convention
    Selection: gradual establishment of a new convention through use

Genetic Algorithm (GA)

Operationalizes (i.e., renders computationally precise)
    Usage-based account of metathesis
    Usage-based account of language change
Based loosely on the Darwinian notion of natural selection

More Precisely

GA()
{
    Initialize(population);      // build initial population
    ComputeCost(population);     // apply cost function
    Sort(population);            // rank population
    while (population has not converged on a good-enough solution)
    {
        Pair(population);            // decide which members reproduce
        Mate(population);            // exchange characteristics
        Mutate(population);          // randomly perturb genes
        ComputeCost(population);     // re-apply cost function to the new strings
        Sort(population);            // rank population
        TestConvergence(population); // has a new species appeared?
    }
}
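
The control flow of this loop can be sketched in Java. This is a minimal sketch under stated assumptions, not the authors' implementation: the class name `GaLoop`, the higher-is-better fitness convention, the generation cap, and the convergence test (the target string reaching a given share of the population) are all hypothetical. Pairing, mating, and mutation are passed in as a single `breed` operator so that the loop itself stays visible.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleFunction;
import java.util.function.UnaryOperator;

/** Hypothetical skeleton of the GA loop; every name here is illustrative. */
final class GaLoop {

    /**
     * fitness: scores one string (higher is better; an assumed convention).
     * breed:   stands in for Pair + Mate + Mutate; takes the ranked
     *          population and returns the next generation.
     * target:  the string whose share of the population is watched.
     * share:   stop once target makes up at least this fraction.
     */
    static List<String> run(List<String> initial,
                            ToDoubleFunction<String> fitness,
                            UnaryOperator<List<String>> breed,
                            String target,
                            double share,
                            int maxGenerations) {
        List<String> population = new ArrayList<>(initial);
        rank(population, fitness);
        for (int gen = 0;
             gen < maxGenerations && !converged(population, target, share);
             gen++) {
            population = new ArrayList<>(breed.apply(population));
            rank(population, fitness);
        }
        return population;
    }

    /** ComputeCost + Sort: best-scoring strings first. */
    static void rank(List<String> population, ToDoubleFunction<String> fitness) {
        population.sort(Comparator.<String>comparingDouble(fitness).reversed());
    }

    /** TestConvergence: has the target reached the required share? */
    static boolean converged(List<String> population, String target, double share) {
        long hits = population.stream().filter(target::equals).count();
        return hits >= share * population.size();
    }
}
```

Keeping `fitness` and `breed` as parameters means the phonological theory and the genetic operators can each be swapped without touching the loop.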

Cost Function

Embodies most of the theory being modeled. For example,

1. Prevocalic stop (e.g., te) is more salient than a postvocalic stop. Give a fitness boost.
2. Penalize words with postvocalic stops (e.g., et).
3. Glottals (e.g., g), liquids (e.g., l), glides (e.g., w) bleed into adjacent sounds when followed by a stop (e.g., t). Penalize sequences like lt.
4. A stop followed by any non-stop consonant (e.g., tl) is perceptually weak. Penalize stop/non-stop consonant sequences.
5. A stop followed by a strident (e.g., ts) is perceptually weak. Penalize prestrident stops.
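
The five rules above can be sketched as a scoring function. The slides do not give the actual weights or sound classes, so everything specific here is an assumption: the class name `CueScore`, unit weights for every boost and penalty, spelling as a stand-in for sounds, and the letter classes (the "bleeding" class uses only the slide's own examples g, l, and w).

```java
/** Hypothetical scoring of a spelled word by the five rules; unit weights are assumed. */
final class CueScore {
    // Assumed letter classes; spelling stands in for phonetic transcription.
    private static final String VOWELS = "aeiou";
    private static final String STOPS = "ptkbd";
    private static final String STRIDENTS = "sz";
    private static final String BLEEDERS = "glw"; // the slide's examples for rule 3

    static boolean in(String letterClass, char c) {
        return letterClass.indexOf(c) >= 0;
    }

    static boolean isConsonant(char c) {
        return Character.isLetter(c) && !in(VOWELS, c);
    }

    /** Higher is better. Each rule adds or subtracts 1 per matching position. */
    static int score(String word) {
        int total = 0;
        for (int i = 0; i < word.length(); i++) {
            char cur = word.charAt(i);
            char prev = i > 0 ? word.charAt(i - 1) : '#';               // '#' marks a word edge
            char next = i + 1 < word.length() ? word.charAt(i + 1) : '#';
            if (in(STOPS, cur)) {
                if (in(VOWELS, next)) total += 1;                        // rule 1: prevocalic stop
                if (in(VOWELS, prev)) total -= 1;                        // rule 2: postvocalic stop
                if (isConsonant(next) && !in(STOPS, next)) total -= 1;   // rule 4: stop + non-stop consonant
                if (in(STRIDENTS, next)) total -= 1;                     // rule 5: stop + strident
            }
            if (in(BLEEDERS, cur) && in(STOPS, next)) total -= 1;        // rule 3: e.g. lt
        }
        return total;
    }
}
```

With these assumed unit weights, chipolte scores 0 against -2 for chipotle, and histader scores 1 against -3 for hitsader, so the metathesized form ranks higher in both pairs. Rules 4 and 5 both fire on ts, which counts that sequence twice; whether the original model does the same is not stated.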

As A Result

Each utterance in the population is tagged with a collection of boosts and penalties

The collection makes the underlying phonological theory computationally precise

Method

Encode the GA as a collection of objects in Java, executable under Linux
Parameters
    Population size: 64 strings
    Mutation factor: 0.5%
    For each of 1, 2, 4 base strings in the population, begin at parity, then double the number of target strings three times
    Fill out the balance of the population with randomly generated character sequences
For each population configuration
    Run GA 250 times
    250 generations per run
    Collect results per run

More Precisely

1. Input an initial population of the base word and the target word.
2. Generate random sequences of characters that fill out the population.
3. Assign a fitness value to each of the sequences that make up the population.
4. Sort the population by fitness value.
5. Collect the population into two-tuples from highest to lowest fitness.
6. Exchange pieces of sounds between each pair.
7. Randomly shift a fixed fraction of the sounds, mimicking the action of chemical/biological/radiological mutagens on individuals.
8. Sort the population. Stop if some predetermined condition is met, else go to step 3.
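
Steps 5 to 7 can be sketched as three small operators. The slides do not specify the operators, so the details here are assumptions: the class name `Operators`, single-point crossover for the exchange, per-character random replacement for mutation (a rate of 0.005 would correspond to the 0.5% mutation factor), a lowercase Latin alphabet, an even population size, and strings of at least two characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical versions of steps 5 to 7; operator details are assumed. */
final class Operators {
    static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz";

    /** Step 6: single-point crossover; the two children swap tails at cut. */
    static String[] mate(String a, String b, int cut) {
        return new String[] {
            a.substring(0, cut) + b.substring(cut),
            b.substring(0, cut) + a.substring(cut)
        };
    }

    /** Step 7: each character is replaced by a random letter with probability rate. */
    static String mutate(String s, double rate, Random rng) {
        StringBuilder out = new StringBuilder(s);
        for (int i = 0; i < out.length(); i++) {
            if (rng.nextDouble() < rate) {
                out.setCharAt(i, ALPHABET.charAt(rng.nextInt(ALPHABET.length())));
            }
        }
        return out.toString();
    }

    /** Steps 5 to 7 over a population that is already sorted best-first. */
    static List<String> nextGeneration(List<String> ranked, double rate, Random rng) {
        List<String> next = new ArrayList<>();
        for (int i = 0; i + 1 < ranked.size(); i += 2) { // step 5: rank neighbours form a pair
            String a = ranked.get(i);
            String b = ranked.get(i + 1);
            // Cut point lies strictly inside the shorter string.
            int cut = 1 + rng.nextInt(Math.min(a.length(), b.length()) - 1);
            for (String child : mate(a, b, cut)) {
                next.add(mutate(child, rate, rng));
            }
        }
        return next;
    }
}
```

For example, `mate("chipotle", "chipolte", 6)` yields `chipotte` and `chipolle`. Replacing every pair with its two children applies no selective pressure on its own; how the original model favors fitter strings when choosing parents or survivors is not stated in the slides.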

Results

chipotle/chipolte
    After 60 generations, chipolte tokens are 95% of the population
    chipotle disappears within 3 generations
hitsader/histader
    After 48 generations, histader tokens are 97.3% of the population
    hitsader disappears within 2 generations

Metathesis: Conclusions

Accurate but underspecified
Computational model supplies missing precision
Usage-based aspect modeled as a frequency effect
Target tokens tend to stabilize more quickly, and at a higher fraction, as their number in the initial population increases
The larger the number of base tokens in the initial population, the better the performance

Utterance Selection: Conclusions

Hume’s account of metathesis can be reframed as an account of (one type of) language change

Can be rendered computationally precise using the Genetic Algorithm

Base:Target Influences Stabilization

Data: Chipotle

Ratio of Base   Generation Chipotle   Generation Chipolte   Percent of Chipolte Tokens
to Target       Disappeared           Stabilized            at Stabilization
1:1             3                     119                   73.4
1:2             3                     68                    93.7
1:4             3                     60                    96.8
1:8             2                     44                    98.4
2:2             3                     90                    92.1
2:4             3                     72                    96.8
2:8             2                     50                    98.4
2:16            2                     31                    98.4
4:4             3                     58                    96.8
4:8             2                     44                    98.4
4:16            2                     33                    98.4
4:32            1                     26                    98.4

Data: Hitsader

Ratio of Base   Generation Hitsader   Generation Histader   Percent of Histader Tokens
to Target       Disappeared           Stabilized            at Stabilization
1:1             2                     79                    84.3
1:2             2                     65                    98.4
1:4             2                     55                    98.4
1:8             2                     39                    98.4
2:2             2                     59                    98.4
2:4             2                     47                    98.4
2:8             2                     43                    98.4
2:16            1                     33                    98.4
4:4             2                     49                    98.4
4:8             1                     43                    98.4
4:16            1                     32                    98.4
4:32            1                     23                    100

Future Research

Use transcribed corpora to determine the frequency of both vulnerable cues and the targets of metathetic change

Use frequencies to weight penalties and rewards (adding precision to statements like “[they] contribute to indeterminacy: /t/ with perceptually vulnerable cues and /l/ with stretched out features,” Hume, 2004, p. 223)

Generate all instances of metathesis within a language

References

Croft, W. (2000). Explaining Language Change: An Evolutionary Approach. Harlow, England: Pearson.

Hume, E. (2004). The Indeterminacy/Attestation Model of Metathesis. Language 80(2): 203-237.