a stochastic ot analysis of production and perception in ... · web view16 manipulated stimuli for...

A Stochastic OT Analysis of Production and Perception in Yucatec Maya

Melissa FrazierUniversity of North Carolina at Chapel Hill

[email protected]

NELS 39 at Cornell UniversityNovember 7, 2008

I. Introduction

(1) modeling the production and perception of glottalization and pitch in Yucatec Maya at the phonetics-phonology interfacea. production: how the speaker uses a stored abstract form

to generate a continuous phonetic formb. perception: how the listener uses a continuous phonetic

form to identify a stored abstract formc. model used in this paper: Boersma’s (1997, 2006,

2007a,b) bidirectional Stochastic OT in combination with the Gradual Learning Algorithm (Boersma and Hayes 2001)

d. preview of results the GLA can handle real phonetic data to create an

accurate production grammar the associated perception grammar predicts pitch to be

a cue in perceiving a contrast, but a perception experiment yields different results

(2) Boersma’s (2007a: 2031) “bidirectional model of phonology and phonetics with parallel production”:

|underlying form|

/surface form/

[auditory form]

[articulatory form]

a. “the interface between phonology and phonetics resides in a connection between the phonological surface form and the auditory-phonetic form” (Boersma 2007b: 4)

b. focus of this paper (see §3):

/surface form/

[phonetic form]

(3) accounting for variation with Stochastic OTa. variation naturally occurs in continuous phonetic

productions; e.g. a surface form is consistently marked for high tone, but individual phonetic forms are produced with different pitch values

b. a constraint ranking is defined by the ranking value of each constraint; if C1 has a larger ranking value than C2, then C1 » C2

c. in classic OT, constraints have the same ranking value at every point of evaluation

C1 » C2 » C399 88 83| | |

d. in StOT, a constraint’s rank is defined by a mean ranking value; at a single point of evaluation, statistical noise is added to each constraint’s mean ranking value:

C1 » C2 »/« C399 88 83

e. at any point of evaluation, the constraint ranking, and hence the winning candidate, may differ from another point of evaluation

(4) Why Yucatec Maya?a. the sound systems of Mayan languages are under-

studied; data from these languages is important for cross-linguistic analysis

b. Yucatec Maya utilizes an unusual set of suprasegmental contrasts (see §2); some aspects of this system seem to be traditionally phonological (i.e. categorical), while others seem to be traditionally phonetic (i.e. gradient).

(5) summary of talka. phonological and phonetic description of Yucatec Mayab. bidirectional Stochastic OT and the Gradual Learning

Algorithm: description and analysis

1

production

co

mpr

ehen

sion

prod.

com

p.

c. methods and results of perception experimentd. discussion and conclusions

2. Yucatec Maya

(6) about Yucatec Mayaa. member of the Yucatecan branch of the Mayan language

familyb. spoken by about 700,000 in Yucatán, Campeche, and

Quintana Roo, México (1990 census, Gordon 2005).

(7) relevant phonological contrasts (see Bricker et al. 1998 for concise description)a. four different “vowel shapes”:

short (no tonal marking), e.g. chak ‘red’ long, low tone, e.g. chaak ‘boil’ long, high tone, e.g. cháak ‘rain’ long, modal voice interrupted by creaky voice, initial

high pitch, e.g. cha’ak ‘starch’b. five vowel qualities [i e a o u]c. each quality can appear with each shaped. only the last two vowel shapes, henceforth “high tone”

and “glottalized”, are analyzed in this paper e. orthographically, high tone vowels (áa) are marked by an

acute accent and glottalized vowels (a’a) are marked by an apostrophe

(8) phonetic analysis of high tone and glottalized vowelsa. see Frazier (to appear) for full details on experimental

methods used to obtain these results b. this paper only reports on a subset of the collected data

only participants from Santa Elena, Yucatán, México only existing words (nonce forms excluded) 12 participants (5 males, 7 females) each produce 19

words with a glottalized vowel and 20 words with a high tone vowel

(9) production of glottalization

a. only 50% of glottalized vowels are produced with glottalization

b. common types of glottalization: creaky middle (22%) creaky last half

(6%) glottal stop (2%)ta’ab ‘salt’ ta’ab ‘salt’ k’a’an ‘strong’

c. 3% of high tone vowels are produced with glottalization

(10) measurement of pitcha. PRAAT (Boersma and Weenink 2006) used to extract

pitch values in Hz, then Hz converted to semitones (Nolan 2003), relative to each speaker’s baseline (similar to methods of Pierrehumbert 1980), where the baseline is the average pitch value produced by a given speaker at the mid point of low tone vowels

b. equation: 12*log2(produced Hz/baseline Hz)c. e.g.: a pitch measurement of 2.7 indicates that pitch is

2.7 semitones higher than that speaker’s baselined. allows for direct comparison of measurements from

males and females

(11) production of pitch

-0.5

0

0.5

1

1.5

2

initial middle final

12*lo

g2(p

rodu

ced

Hz/

bas

elin

e H

z)

glottalized

high tone

a. both high tone and glottalized vowels display a falling pitch contour

b. initial pitch is significantly higher in glottalized vowels than in high tone vowels (p=.03, t(427)=2.21, using a linear mixed regression model to account for multiple observations within subjects)

(12) comparison of glottalized and high tone vowelsa. glottalization is more likely to occur with the glottalized

2

vowel, but it occurs only half the time and is usually realized as creaky voice

b. initial pitch is generally higher with glottalized vowels and drops more rapidly (due to onset of creaky voice)

c. glottalized and high tone vowels significantly differ in the production of both glottalization and pitch

d. the production grammar accounts for these differences, and results in a perception grammar that treats both glottalization and pitch as important cues to comprehension (§3)

e. results of perception experiment show that native speakers do not use pitch as a cue to comprehension (§4)

3. Bidirectional Stochastic OT and the Gradual Learning Algorithm

3.1 Description of the Model

(13) relevant constraints for production and perception (Boersma 2007b: 5)

/surface form/ structural constraints (markedness)

cue constraints(faithfulness)[phonetic form] articulatory constraints(markedness)

(14) the production grammara. a surface form is the input in each tableau, and phonetic

forms compete as output candidatesb. articulatory constraints act like markedness constraints

and penalize costly phonetic productions (regardless of what the input is); e.g. “*F1=700 Hz”

c. cue constraints act like faithfulness constraints and penalize certain pairings of surface and phonetic forms; e.g. “*/a/, [F1=500 Hz]” says that an /a/ is not produced with an F1 of 500 Hz and assigns a violation mark any time a surface form with /a/ is paired with a phonetic form with an F1 of 500 Hz

d. example production tableau:

/a/*/a/, [F1= 500 Hz]

*/a/, [F1= 700 Hz]

*[F1= 700 Hz]

*/a/, [F1= 600 Hz]

[F1= 500 Hz]

*!

[F1= 600 Hz]

*

[F1= 700 Hz]

*! *

(15) the perception grammara. a phonetic form is the input in each tableau, and surface

forms compete as output candidatesb. articulatory constraints have no effect (because they do

not penalize surface forms, which are the candidates)c. structural constraints assign violation marks to surface

forms as applicabled. example perception tableau:

[F1= 600 Hz]

*/u/, [F1= 600 Hz]

*/o/, [F1= 600 Hz]

*[F1= 600 Hz]

*/back/ */a/, [F1= 600 Hz]

/a/ * * /o/ *! * /u/ *! *

(16) the Gradual Learning Algorithm (Boersma and Hayes 2001)a. model of language acquisition: how the learner adjusts an

interim constraint ranking when faced with data that contradicts that ranking

b. initial state: all constraints have same ranking valuec. each time the interim ranking predicts an “incorrect

winner” (), all constraints that favor this candidate over the “correct winner” () are demoted and all constraints that favor the “correct winner” over the “incorrect winner” are promoted

→ → ← → ←/s. f./ C1 C2 C3 C4 C5 C6

[cand1] * * * [cand2] * *

3.2 Analysis with the Model

(17) PRAAT is used to run learning simulations with the GLA (all default settings, see Boersma 1999 or the PRAAT manual)

3

(18) production input and outputa. only “glottalized vowels” and “high tone vowels” are

possible inputs: /gl/, /hi/b. outputs consist of initial pitch values and glottalization

types, e.g. [150, creaky] is an output produced with an initial pitch of 150 Hz and with creaky voice

c. each acoustic dimension used to define phonetic forms has to be categorized in some way; in order to simplify the analyses, I work with rather broad categories, e.g. defining glottalization as either [modal], [creaky], or [glottal stop]

d. the distributions for specific input and output pairings come from the production study (see §2; tokens where no pitch value could be determined for the initial portion of the vowel are excluded)

(19) constraintsa. one cue constraint penalizing the pairing of each possible

input with each possible output (as defined for a specific phonetic dimension)

b. e.g., if the possible glottalization types are [+glot] and [-glot], we need four cue constraints: */gl/, [+gl]; */hi/, [+gl]; */gl/, [-gl]; */hi/, [-gl]

c. articulatory and structural constraints are ignored

(20) learning simulationa. two possible inputs: /gl/, /hi/b. sixteen possible outputs

four types of glottalization: [modal], [short creak], [long creak], [glottal stop], where [short creak] ≤ 40 ms and [long creak] > 40 ms

four types of initial pitch: [L], [ML], [MH], [H], where [L] ≤ 0 (semitones above baseline), 0 < [LM] ≤ 1, 1 < [MH] ≤ 3, [H] > 3

c. distribution of outputs from production study L ML MH H

modal 39 18% 29 13

% 22 10% 21 10

%/gl/

n=215

short creak 13 7% 7 3% 9 4% 6 3%long

creak 22 10% 9 4% 22 10

% 11 5%glottal

stop 3 1% 1 0% 1 0% 0 0%

modal 103

46% 51 23

% 50 22% 15 7%

/hi/n=22

5

short creak 0 0% 0 0% 1 0% 0 0%long

creak 2 1% 1 0% 2 1% 0 0%glottal

stop 0 0% 0 0% 0 0% 0 0%d. constraint ranking

94

96

98

100

102

104

106

[mod] [creak] [gs]

/gl/ /hi/

92

99

106

[mod] [sc] [lc] [gs]98

100

102

[L] [ML] [MH] [H]

e. empirical output distributions (shaded cells) compared to predicted output distributions (non-shaded cells)

L ML MH Hmodal 1

817

13

10

10

13

10 9

/gl/short creak 6 6 3 4 3 4 3 3long

creak10

10 4 6 1

0 8 5 5glottal

stop 1 1 0 1 0 1 0 0

modal 46

44

23

23

22

23 7 7

/hi/short creak 0 0 0 0 0 0 0 0long

creak 1 1 0 0 1 0 0 0glottal

stop 0 0 0 0 0 0 0 0 predicted output distributions closely match empirical

output distributions – the GLA and Stochastic OT can handle real linguistic data to develop a constraint ranking that accurately predicts the phonetic output distributions for a given surface form

f. example tableau for one point of evaluation103. 103. 101. 100. 100. 99.2 99.5 98.8

4

2 0 8 3 2

/gl/

*/gl/,

[ML]

*/gl/,

[gs]

*/gl/,

[lc]

*/gl/,

[MH]

*/gl/,

[sc]

*/gl/,

[H]

*/gl/,

[L]

*/gl/,

[m

od]

[L, mod] * * [L, sc] *! * [L, lc] *! * [L, gs] *! * [ML, mod]

*! *

[ML, sc] *! * [ML, lc] *! * [ML, gs] *! * [MH, mod]

*! *

[MH, sc] *! * [MH, lc] *! * [MH, gs] *! * [H, mod] *! * [H, sc] *! * [H, lc] *! * [H, gs] *! *

(21) see appendix for additional learning simulations with comparable results (simulations used nine possible outputs (three-way phonetic dimensions) and twenty-five possible outputs (five-way phonetic dimensions))

(22) perception grammara. in the perception grammar, a phonetic form (e.g. [L,

modal]) is the input and the surface forms /gl/ and /hi/ compete as candidates

b. same cue constraints with same mean ranking valuesc. constraints that conflict in perception are not the same as

the constraints that conflict in production

(23) percentage of times the perception grammar predicts a given input (phonetic form) will be heard as a glottalized vowel /gl/:

L ML MH Hmodal 32% 39% 41% 59%short creak 55% 55% 57% 65%long

creak 49% 49% 52% 63%glottal 88% 87% 88% 88%

stopa. both a glottal stop and creaky voice increase the

probability of being perceived as /gl/ (glottal stop more so)

b. [long creak] is less likely than [short creak] to be perceived as /gl/

c. as pitch increases, the probability of being perceived as /gl/ increases (except in conjunction with [glottal stop])

d. prediction: pitch and glottalization participate in a trading relation in perception (Repp 1982)

4. Perception Experiment: Testing the Grammar

(24) participants: 14 native speakers of Yucatec Maya living in Santa Elena, Yucatán, Méxicoa. 5 males (ages 23, 43, 44, 64, 69); 9 females (ages 21,

21, 21, 23, 26, 31, 34, 38, 65)b. most had only lived in Santa Elenac. all fluent in Spanish, two also fluent in Englishd. perception experiment occurred one year after

production experiment – many subjects participated in both experiments

(25) procedurea. forced choice task: participants heard a stimulus and

were asked to choose between a word with a high tone vowel and a word with a glottalized vowel (either k’a’an ‘strong’ vs. k’áan ‘hammock’ or cha’ak ‘starch’ vs. cháak ‘rain’)

b. stimuli were manipulated from one production of k’an ’ripe’ and of chak ‘red’ (short vowel with mid pitch and modal voice, produced by a male from Mérida, Yucatán)

c. 16 manipulated stimuli for each minimal pair (32 stimuli total) four types of glottalization (all modal, ≈30 ms of

creaky voice, ≈70 ms of creaky voice, full glottal stop)

four values for initial pitch (125, 140, 155, 170 Hz; -0.7, 1.2, 3.0, 4.6 semitones above baseline)

d. vowels of all manipulated stimuli ≈200 ms longe. participants heard each stimulus three times, for 96 trialsf. rejected data: three participants always selected cháak

5

when given the choice of cháak vs. cha’ak, and so these responses were not included in this analysis (48 rejected trials for 3 participants)

(26) resultsa. percentage of time participants selected a word with a

glottalized vowelL (-0.7) ML

(1.2)MH

(3.0) H (4.6)modal 27% 25% 23% 29%

short creak (≈20 ms) 44% 44% 41% 37%

long creak (≈70 ms) 61% 63% 63% 69%

glottal stop 79% 85% 77% 83%b. significant effect of glottalization (p < .0001, Wald

χ2(3)=189.2), nonsignificant effect of pitch (p =.64, Wald χ2(3)=1.67), nonsignificant interaction (p =.92, Wald χ2(9)=3.81)

c. productions of glottalized and high tone vowels differ by both pitch and glottalization and the perception grammar predicts pitch to be a cue, but listeners are choosing by glottalization alone and not pitch

d. surprisingly, even a full glottal stop does not guarantee the selection of a glottalized vowel

e. comparison of empirical results (shaded cells) and predicted results (non-shaded cells)

L ML MH Hmodal 2

732

25

39

23

41

29

59

short creak 44

55

44

55

41

57

37

65

long creak 61

49

63

49

63

52

69

63

glottal stop 79

88

85

87

77

88

83

88

f. for native listeners, an increase in the amount of glottalization (regardless of pitch value) increases the probability of a an utterance being comprehended as a word with a glottalized vowel

(27) two incorrect predictionsa. predicted %/gl/ increases with pitchb. predicted %/gl/ lower for [long creak] than [short creak]

5. Discussion and Conclusions

(28) implications for future worka. regardless of the model used for analysis, production and

perception results show that, in Yucatec Maya, speakers control a phonetic parameter that listeners do not use

b. cf. difference between prototypicality judgments and production frequencies (Boersma 2006): listeners prefer phonetic forms with more extreme values, but such phonetic forms are sparse in production data

c. using bidirectional StOT, cue constraints cannot account for mismatches between production and perception

d. articulatory constraints can account for such mismatches, but is there articulatory motivation for the production of super-high pitch?

(29) conclusionsa. native speakers of Yucatec Maya produce glottalized

vowels and high tone vowels with consistent differences in both pitch and glottalization

b. native listeners of Yucatec Maya determine whether they heard a glottalized vowel or a high tone vowel on the basis of glottalization alone

c. the GLA and StOT can use empirical data to develop a constraint ranking that accurately accounts for production

d. a model of production and perception must be able to account for the use of extreme phonetic values in production that are not used in perception

6

ReferencesBoersma, Paul. 1997. How We Learn Variation, Optionality, and

Probability. Proceedings of the Institute of Phonetic Sciences, University of Amsterdam 21: 43–58.

Boersma, Paul. 1999. Optimality-Theoretic Learning in the PRAAT Program. IFA Proceedings 23: 17-35.

Boersma, Paul. 2006. Prototypicality Judgments as Inverted Perception. In G. Fanselow, C. Féry, M. Schlesewsky & R. Vogel (eds.): Gradience in Grammar, 167-184. Oxford: Oxford University Press.

Boersma, Paul. 2007a. Some Listener-Oriented Accounts of h-Aspiré in French. Lingua 117: 1989-2054.

Boersma, Paul. 2007b. Cue Constraints are their Interaction in Phonological Perception and Production. Ms. University of Amsterdam. ROA 944.

Boersma, Paul and Bruce Hayes. 2001. Empirical Tests of the Gradual Learning Algorithm. Linguistic Inquiry 32: 45-86.

Boersma, Paul and David Weenink. 2006. PRAAT: doing phonetics by computer (Version 4.4.05) [Computer program]. Retrieved from http://www.praat.org/.

Bricker, Victoria, Eleuterio Po?ot Yah, and Ofelia Dzul Po?ot. 1998. A Dictionary of The Maya Language As Spoken in Hocabá, Yucatán. Salt Lake City: University of Utah Press.

Frazier, Melissa. To appear. “Tonal Dialects and Consonant-Pitch Interaction in Yucatec Maya.” New Perspectives in Mayan Linguistics. WPL-MIT. H. Avelino, J. Coon, and E. Norcliffe (Eds.)

Gordon, Raymond G., Jr. (ed.), 2005. Ethnologue: Languages of the World, Fifteenth edition. Dallas, Tex.: SIL International. Online version: http://www.ethnologue.com/.

Nolan, Francis. 2003. Intonational Equivalence: an Experimental Evaluation of Pitch Scales. Proceedings of the 15th International Congress of Phonetic Sciences, Barcelona. 771-774.

Pierrehumbert, Janet. 1980. The Phonology and Phonetics of English Intonation. Ph.D. dissertation, MIT.

Repp, Bruno H. 1982. Phonetic Trading Relations and Context Effects: New Experimental Evidence for a Speech Mode of Perception. Psychological Bulletin 92: 81-110.

AcknowledgementsI enthusiastically thank all the participants in both the production and perception experiments and Santiago Domínguez for invaluable assistance in Santa Elena. I would also like to thank Jennifer Smith, Elliott Moreton, and David Mora-Marin for helpful feedback at various stages of this project and Chris Wiesen for statistical consulting.This work was support by the Luis Quirós Varela Graduate Student Travel Fund (supplemented by the ISA Mellon Dissertation Fellowship at UNC-CH) and the Jacobs Research Fund (Whatcom Museum, Bellingham, WA).

Appendix: Additional Learning Simulations

(1) learning simulation with three-way phonetic dimensionsa. two possible inputs: /gl/, /hi/b. nine possible outputs

three types of glottalization: [modal], [creaky], [glottal stop]

three types of initial pitch: [L], [M], [H], where [L] < 1 (semitones above baseline), 1 ≤ [M] < 3, [H] ≥ 3

c. resulting constraint ranking

94

96

98

100

102

104

106

[mod] [creak] [gs]

/gl/ /hi/

94

98

102

106

[mod] [creak] [gs]99

100

101

[L] [M] [H]

d. production grammar: empirical output distributions (shaded cells) compared to predicted output distributions (non-shaded cells)

/gl/ /hi/L M H L M H

mod 18

19

19

18

14

15

46

46

36

36

16

16

creak 16

17

15

16

15

13

1 1 1 1 0 0

gs 1 1 1 1 0 1 0 0 0 0 0 0e. perception grammar: percentage of times /gl/ is selected

by participants (shaded cells) compared to percentage of times /gl/ is the outp ut of the perception grammar (non-shaded cells)

L M Hmod 22 34 24 38 21 55

creak 56 53 57 55 60 86gs 82 86 83 86 86 86

(2) learning simulation with five-way phonetic dimensionsa. two possible inputs: /gl/, /hi/b. twenty-five possible outputs

five types of glottalization: [modal], [short creak], [medium creak], [long creak], [glottal stop], where [short creak] < 30 ms, [medium creak] < 60 ms, and [long creak] ≥ 60 ms

7

five types of initial pitch: [L], [LM], [M], [HM], [H], where [L] < 0 (semitones above baseline), 0 ≤ [LM] < 1, 1 ≤ [M] < 2, 2 ≤ [MH] < 4, [H] ≥ 4;

c. resulting constraint ranking

94

96

98

100

102

104

106

[mod] [creak] [gs]

/gl/ /hi/

92

99

106

[mod] [sc] [mc] [lc] [gs]98

100

102

[L] [ML] [M] [MH] [H]

d. production grammar: empirical output distributions (shaded cells) compared to predicted output distributions (non-shaded cells)

L ML M MH Hmod 5 6 1

313

13

11

10

13

10 9

/gl/sc 0 1 2 2 2 2 2 2 2 2mc 3 2 6 5 4 4 0 1 2 2lc 2 2 3 4 6 4 2 3 3 2gs 0 0 1 1 0 0 0 1 0 0mod

10 9 3

634

23

23

22

23 7 6

/hi/sc 0 0 0 0 0 0 0 0 0 0mc 0 0 0 1 0 0 1 0 0 0lc 0 0 0 0 0 0 0 0 0 0gs 0 0 0 0 0 0 0 0 0 0

e. perception grammar: percentage of times /gl/ is selected by participants (shaded cells) compared to percentage of times /gl/ is the output of the perception grammar (non-shaded cells)

L ML M MH Hmod 27 46 na 33 25 39 23 41 29 60

sc na 84 na 87 na 87 na 88 na 88mc 44 45 na 36 44 40 41 42 37 58

lc 61 45 na 36 63 41 63 41 69 56gs 79 90 na 90 85 90 77 90 83 90

8

a stochastic ot analysis of production and perception in ... · web view16 manipulated stimuli for...

Documents