Synthesis & evaluation of prosodically exaggerated utterances: A preliminary study

Kyuchul Yoon, Division of English, Kyungnam University
Spring 2008 Joint Conference of KSPS & KASS

Contents

• Synthesis & evaluation of human utterances with exaggerated prosody
• Synthesis of exaggerated prosody
  – Useful for native utterances
  – The definition of prosody “exaggeration”
  – The algorithm
• Evaluation of exaggerated prosody
  – Useful for evaluating learner utterances
  – The algorithm & an experiment

Teaching & evaluating prosody

• Teaching language prosody
  – The need for “exaggeration” of native utterances
  – How to define “exaggeration”
• Evaluating language prosody
  – Given the native version of an utterance, evaluate learners’ utterances with atypical prosody
  – How to measure the differences between the native and learner utterances

Exaggerating native prosody

• Exaggeration of the F0 contour
  – One way would be to make the pitch peaks higher and the pitch valleys lower
• Exaggeration of the intensity contour
  – One way would be to manipulate the intensity contour at the pitch peaks/valleys
• Exaggeration of the segmental durations
  – One way would be to manipulate the segmental durations at the pitch peaks/valleys

Exaggerating native prosody

Figure: The fundamental frequency (F0) contour of the utterance “Marianna!”

Exaggerating native prosody

Figure: The intensity contour of the utterance “Marianna!”

Exaggerating native prosody

Figure: The segmental durations of the utterance “Marianna!” before and after exaggeration

Algorithm: prosody exaggeration

• Definition of prosody exaggeration
  – F0 contour
    • Make pitch peaks higher and pitch valleys lower (in Hz)
  – Intensity contour
    • Make the intensity at pitch peaks higher (in dB)
  – Segmental durations
    • Make the segments at pitch peaks longer (in time)

Algorithm: prosody exaggeration (F0)
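
A Python sketch of the F0 rule from the definition (raise peaks, lower valleys in Hz). It is not the study's Praat script, which the slides do not show. It assumes the contour has already been reduced to time-ordered pitch targets `(time_s, f0_hz)` and treats a peak or valley as a strict local maximum or minimum among the interior targets; `exaggerate_f0` and `delta_hz` are hypothetical names.

```python
# Hypothetical sketch of the F0 exaggeration rule (not the study's Praat script).
# A pitch target is a (time_s, f0_hz) pair; peaks/valleys are assumed to be
# strict local extrema among the interior targets.

def exaggerate_f0(points, delta_hz):
    """Raise pitch peaks and lower pitch valleys by delta_hz.

    points   -- time-ordered list of (time_s, f0_hz) pitch targets
    delta_hz -- exaggeration amount in Hz (a negative value flattens instead)
    """
    f0 = [hz for _, hz in points]  # original values, so edits don't affect later tests
    result = []
    for i, (t, hz) in enumerate(points):
        if 0 < i < len(points) - 1:
            if hz > f0[i - 1] and hz > f0[i + 1]:      # local peak
                hz = hz + delta_hz
            elif hz < f0[i - 1] and hz < f0[i + 1]:    # local valley
                hz = hz - delta_hz
        result.append((t, hz))
    return result


contour = [(0.0, 180), (0.1, 240), (0.2, 160), (0.3, 200), (0.4, 170)]
print(exaggerate_f0(contour, 20))
# [(0.0, 180), (0.1, 260), (0.2, 140), (0.3, 220), (0.4, 170)]
```

Large values of `delta_hz` can push valleys toward implausibly low F0, so a floor would be needed in practice; turning the edited targets back into audio (for example with PSOLA [2]) is outside the sketch.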

Algorithm: prosody exaggeration (intensity)
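
A Python sketch of the intensity rule (higher dB values at pitch peaks), assuming the intensity contour is a list of `(time_s, intensity_db)` frames and the pitch-peak times are already known. The slides do not say how wide the manipulated region around a peak is, so the ±50 ms window (`half_width_s`) and the function names are hypothetical.

```python
# Hypothetical sketch of the intensity rule: add a fixed gain (dB) to the
# intensity frames that fall within +/- half_width_s of a pitch-peak time.

def exaggerate_intensity(frames, peak_times, gain_db, half_width_s=0.05):
    """frames: list of (time_s, intensity_db); returns a new list."""
    result = []
    for t, db in frames:
        near_peak = any(abs(t - p) <= half_width_s for p in peak_times)
        result.append((t, db + gain_db if near_peak else db))
    return result


def db_to_amplitude_factor(gain_db):
    """Amplitude multiplier equivalent to a gain in dB (20 * log10 convention)."""
    return 10 ** (gain_db / 20)


frames = [(0.0, 60), (0.1, 70), (0.2, 65)]
print(exaggerate_intensity(frames, peak_times=[0.1], gain_db=5))
# [(0.0, 60), (0.1, 75), (0.2, 65)]
```

The dB-to-amplitude helper shows what such a gain means for the waveform: +20 dB multiplies the amplitude by 10, and +6 dB roughly doubles it.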

Algorithm: prosody exaggeration (durations)
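
A Python sketch of the duration rule (longer segments at pitch peaks), assuming the utterance is segmented into boundary times and that a segment "carries" a peak when a peak time falls in its half-open interval `[start, end)`. The function name and the single scaling `factor` are hypothetical; later boundaries shift so the segments stay contiguous.

```python
# Hypothetical sketch of the duration rule: segments that contain a pitch
# peak are scaled by `factor`; all later boundaries shift accordingly.

def exaggerate_durations(bounds, peak_times, factor):
    """bounds: strictly increasing segment boundary times (s).

    Returns new boundary times after scaling the peak-bearing segments.
    """
    new_bounds = [bounds[0]]
    for b0, b1 in zip(bounds, bounds[1:]):
        duration = b1 - b0
        if any(b0 <= p < b1 for p in peak_times):
            duration *= factor
        new_bounds.append(new_bounds[-1] + duration)
    return new_bounds


stretched = exaggerate_durations([0.0, 0.1, 0.3, 0.4], peak_times=[0.2], factor=1.5)
print([round(b, 3) for b in stretched])  # [0.0, 0.1, 0.4, 0.5]
```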

How the Praat script [1] works

How the Praat script works

Figure panels: F0; Intensity; Durations

How the Praat script works

Figure panels: Original; F0; Durations; Intensity

Evaluating learner prosody

• Assumes the existence of the native version
• Evaluates the learner versions against it
• Evaluation of the F0 & intensity contours
  – Is preceded by duration manipulation:
    • The durations of the matching segments of the two utterances are made identical [3]
  – Is preceded by F0/intensity normalization & F0 smoothing:
    • The mean difference between the two utterances is added to or subtracted from the learner utterance
  – Is followed by a point-to-point comparison of pitch/intensity
• Evaluation of segmental durations
  – Done without any duration manipulation: a segment-to-segment comparison
• Evaluation measure: the Euclidean distance metric
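
The normalization and smoothing steps can be sketched in Python as follows, assuming both contours are lists of values for voiced frames only, already matched in time by the duration manipulation. Shifting the learner contour onto the native mean follows the slide; the centered moving average (with a window that shrinks at the edges) is an assumed stand-in, since the slides do not name the smoothing method.

```python
# Hypothetical sketch of the pre-comparison steps: mean normalization and
# F0 smoothing. Contours are plain lists of values (voiced frames only),
# already aligned in time by the duration manipulation.
from statistics import mean


def normalize_to_native(native, learner):
    """Shift the learner contour by the mean difference so both share the native mean."""
    offset = mean(native) - mean(learner)
    return [x + offset for x in learner]


def smooth(contour, window=3):
    """Centered moving average; the window shrinks at the edges (assumed method)."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(contour)
    smoothed = []
    for i in range(n):
        segment = contour[max(0, i - half):min(n, i + half + 1)]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


print(normalize_to_native([200, 220, 210], [150, 170, 160]))  # [200, 220, 210]
print(smooth([0, 3, 6, 3, 0]))  # [1.5, 3.0, 4.0, 3.0, 1.5]
```

Because a constant register difference (e.g. a lower-pitched learner voice) is removed by the mean shift, only differences in contour shape remain for the comparison.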

Algorithm: prosody evaluation

Before & after duration manipulation

Figure panels: native; learner (before); learner (after)
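
The duration manipulation itself is described in [3]. As a hypothetical Python sketch of the underlying idea: if both utterances are segmented into the same sequence of segments, each learner time point can be mapped piecewise-linearly onto the native time axis, so that every learner segment occupies exactly the span of its native counterpart. Only contour time stamps are remapped; resynthesizing the learner audio is outside the sketch, and `warp_time`/`align_contour` are illustrative names.

```python
# Hypothetical sketch: piecewise-linear time mapping that makes each learner
# segment span exactly the same interval as the matching native segment.
# It only remaps time stamps; it does not resynthesize audio.
from bisect import bisect_right


def warp_time(t, learner_bounds, native_bounds):
    """Map learner time t (s) onto the native time axis.

    Both boundary lists are strictly increasing, equally long, and list the
    start of the first segment, every internal boundary, and the final end.
    """
    if len(learner_bounds) != len(native_bounds) or len(learner_bounds) < 2:
        raise ValueError("need matching boundary lists with at least one segment")
    if t <= learner_bounds[0]:
        return native_bounds[0]
    if t >= learner_bounds[-1]:
        return native_bounds[-1]
    i = bisect_right(learner_bounds, t) - 1  # index of the segment containing t
    l0, l1 = learner_bounds[i], learner_bounds[i + 1]
    n0, n1 = native_bounds[i], native_bounds[i + 1]
    return n0 + (t - l0) * (n1 - n0) / (l1 - l0)


def align_contour(points, learner_bounds, native_bounds):
    """Remap the times of (time_s, value) points; values are unchanged."""
    return [(warp_time(t, learner_bounds, native_bounds), v) for t, v in points]


learner_bounds = [0.0, 0.2, 0.6]   # learner segment boundaries (s)
native_bounds = [0.0, 0.1, 0.5]    # native segment boundaries (s)
print(round(warp_time(0.1, learner_bounds, native_bounds), 3))  # 0.05
print(round(warp_time(0.4, learner_bounds, native_bounds), 3))  # 0.3
```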

Algorithm: prosody evaluation

F0 point-to-point comparison between native and learner

Figure panels: native; learner (after duration manipulation)
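
A point-to-point comparison needs both contours at the same time points. A hypothetical Python sketch, assuming the learner contour has already been duration-aligned and normalized: it is linearly interpolated at the native time points and the Euclidean distance is taken over all points. Linear interpolation and the function names are assumptions; the same function would apply unchanged to intensity contours in dB.

```python
# Hypothetical sketch of the point-to-point comparison: the learner contour
# (already duration-aligned and normalized) is linearly interpolated at the
# native time points, then the Euclidean distance is taken over all points.
import math
from bisect import bisect_left


def interpolate(times, values, t):
    """Linear interpolation of a contour at time t, clamped at both ends."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    i = bisect_left(times, t)  # times[i - 1] < t <= times[i]
    t0, t1 = times[i - 1], times[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (t - t0) * (v1 - v0) / (t1 - t0)


def point_to_point_distance(native_times, native_values, learner_times, learner_values):
    """Euclidean distance between the native contour and the resampled learner contour."""
    resampled = [interpolate(learner_times, learner_values, t) for t in native_times]
    return math.dist(native_values, resampled)


print(point_to_point_distance([0.0, 0.1, 0.2], [200, 220, 210],
                              [0.0, 0.2], [200, 210]))  # 15.0
```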

Algorithm: prosody evaluation

Intensity point-to-point comparison between native and learner

Figure panels: native; learner (after duration manipulation)

Algorithm: prosody evaluation

Duration segment-to-segment comparison between native and learner

Figure panels: native; learner (before duration manipulation)

Evaluation measure: the Euclidean distance metric. For $P = (p_1, p_2, \dots, p_n)$ and $Q = (q_1, q_2, \dots, q_n)$ in Euclidean n-space,

$$d(P, Q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$$
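
For durations the vectors P and Q are the segment durations of the native and learner utterances, compared segment by segment without any duration manipulation. A minimal Python sketch, assuming both utterances are segmented into the same number of segments given as boundary times; the function names are hypothetical, and `math.dist` (Python 3.8+) computes exactly the distance d(P, Q).

```python
# Hypothetical sketch of the duration measure: segment durations are taken
# from boundary times (no duration manipulation), then compared
# segment-to-segment with the Euclidean distance d(P, Q).
import math


def segment_durations(bounds):
    """Durations (s) of consecutive segments given their boundary times."""
    return [b1 - b0 for b0, b1 in zip(bounds, bounds[1:])]


def duration_distance(native_bounds, learner_bounds):
    p = segment_durations(native_bounds)
    q = segment_durations(learner_bounds)
    if len(p) != len(q):
        raise ValueError("utterances must have the same number of segments")
    # Equivalent to sqrt(sum((p_i - q_i) ** 2 over all segments i))
    return math.dist(p, q)


print(round(duration_distance([0.0, 0.1, 0.3], [0.0, 0.13, 0.37]), 3))  # 0.05
```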

A pilot experiment

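Figure panels: native; learner (after duration manipulation)

Expected outcome: the Euclidean distance should be at its minimum when the learner version is identical to the native one.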

A pilot experiment

Figure panels: native; learner (after duration manipulation)

Stimuli:
• F0: -100 Hz to +100 Hz in 10 Hz steps (21 stimuli)
• Intensity: -25 dB to +25 dB in 5 dB steps (11 stimuli)
• Duration: 0.25, 0.50, 0.75, 1.00, 1.50, 2.00, 2.50, and 3.00 times the original (8 stimuli)
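
The stimulus counts follow from the step sizes: (100 - (-100)) / 10 + 1 = 21 and (25 - (-25)) / 5 + 1 = 11. A minimal Python sketch rebuilds the three grids and, on made-up native segment durations, checks the expectation that the Euclidean distance is smallest for the unscaled (1.00) version. Scaling all segments uniformly is an assumption made only for this toy check, not necessarily how the stimuli were built.

```python
# Hypothetical sketch: the pilot stimulus grids and a toy check that the
# Euclidean distance is smallest for the unmanipulated (1.00x) durations.
import math

F0_STEPS_HZ = list(range(-100, 101, 10))
INTENSITY_STEPS_DB = list(range(-25, 26, 5))
DURATION_FACTORS = [0.25, 0.50, 0.75, 1.00, 1.50, 2.00, 2.50, 3.00]


def duration_distances(native_durations, factors):
    """Distance between native segment durations and uniformly scaled copies (toy model)."""
    return {f: math.dist(native_durations, [d * f for d in native_durations])
            for f in factors}


print(len(F0_STEPS_HZ), len(INTENSITY_STEPS_DB), len(DURATION_FACTORS))  # 21 11 8
dists = duration_distances([0.08, 0.12, 0.20], DURATION_FACTORS)  # made-up durations (s)
print(min(dists, key=dists.get))  # 1.0
```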

Results & Conclusion

• Prosody exaggeration
  – Can be a tool for teaching language prosody
  – Can be used to test measures for evaluating prosody
• Limitation of the current prosody evaluation
  – Native utterances must exist for the measures to be computed
    • TTS systems with advanced prosody models could help
  – The “weights” of the three separate measures (F0/intensity/duration) need to be determined
    • Experiments with human evaluators could provide the weights

References

[1] Boersma, P. 2001. Praat, a system for doing phonetics by computer. Glot International 5(9/10). pp. 341-345.

[2] Moulines, E. & F. Charpentier. 1990. Pitch synchronous waveform processing techniques for text-to-speech synthesis using diphones. Speech Communication 9. pp. 453-467.

[3] Yoon, K. 2007. Imposing native speakers' prosody on non-native speakers' utterances: The technique of cloning prosody. Journal of the Modern British & American Language & Literature 25(4). pp. 197-215.