Synthesis & evaluation of prosodically exaggerated utterances: A preliminary study
Kyuchul Yoon
Division of English, Kyungnam University
Spring 2008 Joint Conference of KSPS & KASS
Contents
• Synthesis & evaluation of human utterances with exaggerated prosody
• Synthesis of exaggerated prosody
  – Useful for native utterances
  – The definition of prosody “exaggeration”
  – The algorithm
• Evaluation of exaggerated prosody
  – Useful for evaluating learner utterances
  – The algorithm & an experiment
Teaching & evaluating prosody
• Teaching language prosody
  – The need for “exaggeration” of native utterances
  – How to define “exaggeration”
• Evaluating language prosody
  – Given the native version of an utterance, evaluate the learner’s utterances with atypical prosody
  – How to measure the differences between the native and learner utterances
Exaggerating native prosody
• Exaggeration of the F0 contour
  – One way would be to make the pitch peaks/valleys higher/lower
• Exaggeration of the intensity contour
  – One way would be to manipulate the intensity contour of the pitch peaks/valleys
• Exaggeration of the segmental durations
  – One way would be to manipulate the segmental durations of the pitch peaks/valleys
Exaggerating native prosody: F0
Figure: The fundamental frequency (F0) contour of the utterance “Marianna!”.
Exaggerating native prosody: Intensity
Figure: The intensity contour of the utterance “Marianna!”.
Exaggerating native prosody: Duration
Figure: The segmental durations of the utterance “Marianna!” before and after the exaggeration.
Algorithm: prosody exaggeration
• Definition of prosody exaggeration
  – F0 contour
    • Make pitch peaks/valleys higher/lower in Hz values
  – Intensity contour
    • Make pitch peaks higher in dB values
  – Segmental durations
    • Make pitch peaks longer, as a multiple of the original duration
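The definition above can be sketched in a few lines. This is only an illustration, not the Praat script used in the study: it assumes that "higher peaks, lower valleys" can be approximated by scaling each F0 point's deviation from the utterance mean, and the function names and the factor of 1.5 are hypothetical.

```python
def exaggerate_contour(values, factor=1.5):
    """Scale each point's deviation from the contour mean by `factor`.

    With factor > 1, points above the mean (peaks) move up and points
    below the mean (valleys) move down. Assumed scheme, for illustration.
    """
    mean = sum(values) / len(values)
    return [mean + factor * (v - mean) for v in values]


def lengthen_segment(durations, index, factor=1.5):
    """Return a copy of `durations` with one segment made `factor` times longer."""
    out = list(durations)
    out[index] *= factor
    return out


# Toy F0 contour in Hz (made-up values, mean = 200 Hz).
f0_hz = [180.0, 220.0, 260.0, 200.0, 140.0]
exaggerated = exaggerate_contour(f0_hz, factor=1.5)

# Toy segment durations in seconds; segment 1 is taken to carry the pitch peak.
lengthened = lengthen_segment([0.08, 0.12, 0.10], index=1, factor=1.5)
```

The same deviation-scaling idea could be applied to an intensity contour in dB, although the slides only call for raising the peaks there.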
Algorithm: prosody exaggeration (F0)
Algorithm: prosody exaggeration (Intensity)
Algorithm: prosody exaggeration (Durations)
How the Praat script works
Figures: panels labeled Original, F0, Intensity, and Durations.
Evaluating learner prosody
• Assumes the existence of the native version
• Evaluates the learner versions
• Evaluation of the F0 & intensity contours
  – Is preceded by duration manipulation
    • The durations of the matching segments of the two utterances are made identical [3]
  – Is preceded by F0/intensity normalization & F0 smoothing
    • The mean difference between the two utterances is added to or subtracted from the learner utterance
  – Is followed by pitch/intensity point-to-point comparison
• Evaluation of segmental durations
  – Done without any duration manipulation
  – Segment-to-segment comparison
• Evaluation measure: Euclidean distance metric
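The normalization and point-to-point comparison steps can be sketched as follows. The sketch assumes the duration manipulation has already been done, so both contours have the same number of time-aligned points, and it leaves out F0 smoothing. The function names and the sample values are hypothetical.

```python
import math


def normalize_to_native(native, learner):
    """Shift the learner contour so that its mean equals the native mean."""
    shift = sum(native) / len(native) - sum(learner) / len(learner)
    return [v + shift for v in learner]


def contour_distance(native, learner):
    """Point-to-point Euclidean distance after mean normalization."""
    if len(native) != len(learner):
        raise ValueError("contours must have the same number of points")
    return math.dist(native, normalize_to_native(native, learner))


# Toy F0 contours in Hz (made-up values).
native_f0 = [200.0, 240.0, 220.0, 180.0]
learner_f0 = [150.0, 170.0, 190.0, 130.0]

f0_distance = contour_distance(native_f0, learner_f0)
```

Under this scheme a learner contour that differs from the native one only by a constant offset scores a distance of zero, so the measure responds to the shape of the contour rather than to the speaker's overall pitch or loudness level.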
Algorithm: prosody evaluation
Figure: Before & after duration manipulation (panels: native, learner before, learner after).
Figure: F0 point-to-point comparison between native and learner (panels: native, learner after).
Figure: Intensity point-to-point comparison between native and learner (panels: native, learner after).
Figure: Duration segment-to-segment comparison between native and learner (panels: native, learner before).
Euclidean distance metric for evaluation measure
P = (p1, p2, p3, ..., pn) and Q = (q1, q2, q3, ..., qn) in Euclidean n-space
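For reference, the standard Euclidean distance between the two points P and Q defined above is:

```latex
d(P, Q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}
```

Here each p_i and q_i would be a matching pair of values from the native and learner utterances: a pitch or intensity point, or a segment duration.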
A pilot experiment
Figure: native vs. learner (after); the Euclidean distance should be at a minimum.
Stimuli:
• F0: -100 Hz to +100 Hz at 10 Hz intervals (21 stimuli)
• Intensity: -25 dB to +25 dB at 5 dB intervals (11 stimuli)
• Duration: 0.25, 0.50, 0.75, 1.00, 1.50, 2.00, 2.50, 3.00 times the original (8 stimuli)
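The stimulus grid can be enumerated directly. The slides do not spell out how each manipulation was applied to the utterance, so the second half of this sketch rests on an assumption: the F0 shift is applied to the single highest point of a made-up contour, and the raw Euclidean distance to the unmodified contour is computed. The helper `shift_peak` and the contour values are hypothetical.

```python
import math

# Manipulation steps as listed on the slide.
f0_shifts_hz = list(range(-100, 101, 10))
intensity_shifts_db = list(range(-25, 26, 5))
duration_factors = [0.25, 0.50, 0.75, 1.00, 1.50, 2.00, 2.50, 3.00]


def shift_peak(contour, shift):
    """Return a copy of `contour` with its highest point moved by `shift`."""
    peak = contour.index(max(contour))
    out = list(contour)
    out[peak] += shift
    return out


# Toy native F0 contour in Hz (made-up values).
native_f0 = [180.0, 220.0, 260.0, 200.0, 140.0]

# Distance between the native contour and each manipulated version.
distances = {
    shift: math.dist(native_f0, shift_peak(native_f0, shift))
    for shift in f0_shifts_hz
}
best_shift = min(distances, key=distances.get)
```

In this toy setup the distance is smallest for the unshifted stimulus and grows with the size of the shift, which is the behavior a usable evaluation measure should show.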
Results & Conclusion
• Prosody exaggeration
  – Can be a tool for teaching language prosody
  – Can be used to test measures for evaluating prosody
• Limitation of the current prosody evaluation
  – Native utterances should exist to yield measures
    • TTS systems with advanced prosody models could be helpful
  – “Weights” of the three separate measures (F0/intensity/duration) need to be determined
    • Experiments with human evaluators could provide the weights
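One possible way to combine the three measures once weights are available is a weighted sum. This form is an assumption, since the slides leave the combination open. The equal default weights are placeholders, not results, and `combined_score` is a hypothetical name.

```python
def combined_score(f0_dist, intensity_dist, duration_dist,
                   weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of the three distance measures.

    The default weights are placeholders; the actual weights would have
    to be estimated, for example from human evaluators' judgments.
    """
    w_f0, w_intensity, w_duration = weights
    return (w_f0 * f0_dist
            + w_intensity * intensity_dist
            + w_duration * duration_dist)


# Example with made-up distances and made-up weights.
score = combined_score(3.0, 6.0, 9.0, weights=(0.5, 0.25, 0.25))
```

Because the three distances are in different units (Hz, dB, and seconds or ratios), they would probably need to be rescaled before such a sum is meaningful.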
References
[1] Boersma, P. 2001. Praat, a system for doing phonetics by computer. Glot International 5(9/10). pp.341-345.
[2] Moulines, E. & F. Charpentier. 1990. Pitch synchronous waveform processing techniques for text-to-speech synthesis using diphones. Speech Communication 9. pp.453-467.
[3] Yoon, K. 2007. Imposing native speakers' prosody on non-native speakers' utterances: The technique of cloning prosody. Journal of the Modern British & American Language & Literature 25(4). pp.197-215.