
Imposing native speakers’ prosody on non-native speakers’ utterances:

Preliminary studies

Kyuchul Yoon

Spring 2006 NAELL

The Division of English

Kyungnam University

2

Contents

• Acquiring prosody in language learning ... 3
• Previous approaches ... 4
• A new tool ... 5
• Technical details ... 6
• Implications of the technique ... 15
• Preliminary plans for an experiment ... 16

3

Acquiring prosody in language learning

• One of the critical tasks in language learning

• Prosody as non-segmental features of speech
1. phrase breaks
2. intonation (F0) contour
3. segmental durations
4. intensity contour

4

Previous approaches

• Explicit teaching of prosodic features such as the intonation contours, segmental durations, etc.

• Audio aid: Listen and repeat!

• Visual aid in computer software: Dr.Speaking® (F0 contour comparison between native speaker and non-native speaker)

5

A new tool

• A new kind of audio aid, in the form of a non-native speaker’s utterance with the prosodic features of a native speaker’s utterance

• How this works
1. Software presents a native speaker’s utterance
2. A non-native speaker repeats the utterance
3. Software records the non-native speaker’s utterance
4. Software imposes the native speaker’s prosody onto the non-native speaker’s utterance
5. Software presents the processed non-native utterance

6

Technical details

• Manipulation of
1. segmental durations, including phrase breaks
2. F0 contours
3. intensity contours

• For 1 and 2: PSOLA (Pitch-Synchronous Overlap and Add), developed by Moulines & Charpentier (1990), as implemented in Praat

• For 3: Intensity swap in Praat

7

Technical details: Moulines & Charpentier, 1990

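[Figure: PSOLA illustration with four panels. Original waveform; windowed waveform, with pitch-synchronous frames numbered 1 to 19; shortened waveform, built from frames 1, 4, 7, 10, 13, 16, 19; waveform with lower F0, built from frames 1, 3, 5, 7, 9, 11, 13, 15, 17, 19.]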
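The frame numbers in the figure can be read as a frame-selection rule, sketched below. This is a minimal illustration, not the PSOLA implementation in Praat: the function names are hypothetical, and the assumption that the lower-F0 version places its frames twice as far apart (keeping the overall duration) is inferred from the frame numbers, not stated in the slides.

```python
def select_frames(n_frames, step):
    """Return the 1-based indices of the frames kept when every `step`-th frame is used."""
    return list(range(1, n_frames + 1, step))


def place_frames(frame_ids, spacing):
    """Return the output position of each kept frame, in units of the original pitch period."""
    return [i * spacing for i in range(len(frame_ids))]


N_FRAMES = 19  # number of windowed frames shown in the figure

# Shortening: keep every third frame and overlap-add them at the original spacing.
shortened = select_frames(N_FRAMES, 3)
shortened_pos = place_frames(shortened, 1.0)

# Lowering F0: keep every other frame and (assumption) place them two periods apart,
# so the pitch period doubles while the total duration stays the same.
lowered = select_frames(N_FRAMES, 2)
lowered_pos = place_frames(lowered, 2.0)

print(shortened, shortened_pos[-1])
print(lowered, lowered_pos[-1])
```

With the original spacing, the 19 frames span 18 pitch periods; the shortened version spans 6, while the lower-F0 version under the stated assumption still spans 18.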

8

Technical details 1: Segmental durations

• Segmental alignment & PSOLA processing: Alignment can be manual or automatic (with the help of speech recognition)

native: k eI m i n (“…came in…”)

non-native: k eI m i n
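Once the two utterances are aligned segment by segment, the duration manipulation reduces to one scaling factor per segment. The sketch below shows that arithmetic only; the segment boundaries (in milliseconds) are invented for illustration, and `duration_factors` is a hypothetical helper, not a Praat function.

```python
# Hypothetical alignments for "...came in...": (label, start_ms, end_ms).
native = [("k", 0, 60), ("eI", 60, 200), ("m", 200, 270), ("i", 270, 330), ("n", 330, 400)]
non_native = [("k", 0, 80), ("eI", 80, 180), ("m", 180, 320), ("i", 320, 400), ("n", 400, 500)]


def duration_factors(target, source):
    """Return (label, factor) pairs; factor > 1 lengthens the source segment, < 1 shortens it."""
    if [seg[0] for seg in target] != [seg[0] for seg in source]:
        raise ValueError("both utterances must contain the same segment sequence")
    factors = []
    for (label, t_start, t_end), (_, s_start, s_end) in zip(target, source):
        factors.append((label, (t_end - t_start) / (s_end - s_start)))
    return factors


factors = duration_factors(native, non_native)
for label, factor in factors:
    print(f"{label}: x{factor:.2f}")
```

A factor list of this kind is what a duration-manipulation step would consume: each non-native segment is stretched or compressed until it matches the native segment's duration.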

9

Technical details 2: F0 contours

• PSOLA processing on duration-treated utterance

native: k eI m i n

non-native: k eI m i n

(Figure annotations: higher F0, lower F0)
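Because the F0 step runs on the duration-treated utterance, the two contours can be compared frame by frame. The sketch below computes, per frame, the ratio by which the non-native F0 would have to be raised or lowered; the F0 values are invented, `None` marks a frame without measurable F0, and the actual resynthesis (PSOLA in Praat) is not reproduced here.

```python
def f0_adjustments(native_f0, non_native_f0):
    """Return per-frame ratios native/non-native; None where either frame has no F0."""
    if len(native_f0) != len(non_native_f0):
        raise ValueError("contours must be sampled on the same frame grid")
    ratios = []
    for target, source in zip(native_f0, non_native_f0):
        if target is None or source is None:
            ratios.append(None)  # voiceless frame: nothing to transfer
        else:
            ratios.append(target / source)
    return ratios


# Hypothetical F0 values in Hz on a shared frame grid.
native_f0 = [None, 200.0, 220.0, 180.0, 150.0]
non_native_f0 = [None, 160.0, 220.0, 200.0, None]

ratios = f0_adjustments(native_f0, non_native_f0)
for ratio in ratios:
    if ratio is None:
        print("no F0")
    elif ratio > 1:
        print(f"higher F0 (x{ratio:.2f})")
    elif ratio < 1:
        print(f"lower F0 (x{ratio:.2f})")
    else:
        print("unchanged")
```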

10

Technical details 3: Intensity contours

• Mathematically “neutralize” the non-native speaker’s intensity contour and transfer the native speaker’s intensity contour in Praat (Holger Mitterer, personal communication)

native: k eI m i n

non-native: k eI m i n
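One way to read “neutralize and transfer” is as a per-frame gain: dividing out the non-native intensity and multiplying in the native one is the same as applying the dB difference between the two contours. The sketch below assumes that reading; the dB values, the frame length, and both function names are hypothetical, and the stepwise gain (no smoothing between frames) is a simplification, not the Praat procedure.

```python
def intensity_gains(native_db, non_native_db):
    """Return one linear amplitude gain per frame from the dB difference of the contours."""
    if len(native_db) != len(non_native_db):
        raise ValueError("contours must have the same number of frames")
    return [10 ** ((target - source) / 20) for target, source in zip(native_db, non_native_db)]


def apply_gains(samples, gains, frame_len):
    """Scale each sample by the gain of the frame it falls in."""
    last = len(gains) - 1
    return [x * gains[min(i // frame_len, last)] for i, x in enumerate(samples)]


# Hypothetical intensity contours in dB, one value per frame.
native_db = [60.0, 70.0, 50.0]
non_native_db = [60.0, 50.0, 70.0]
gains = intensity_gains(native_db, non_native_db)

# Hypothetical non-native samples, two samples per frame.
samples = [0.5, -0.5, 0.1, -0.1, 0.2, -0.2]
scaled = apply_gains(samples, gains, frame_len=2)
print(gains)
print(scaled)
```

A 20 dB difference corresponds to a factor of 10 in amplitude, so the second frame is amplified tenfold and the third is attenuated tenfold.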

11

Technical details

• Weaknesses
1. Voiceless segments can be made “voiced” in the windowing process (pitch-synchronous technique)
2. Excessive manipulation results in unnatural synthesis

• Segment alignment should be fine-tuned according to the voiced/voiceless status of the (sub-)segments for better results

12

Technical details: Examples

Praat script

native utterance

non-native utterance

synthetic non-native

13

Technical details: Comparison before synthesis – duration, F0 & intensity

native utterance

non-native utterance

(blue & yellow)

14

Technical details: Comparison after synthesis – duration, F0 & intensity

native utterance

synthetic non-native

(blue & yellow)

15

Implications of the technique

• The technique can be used in second language education:

to facilitate/motivate acquisition of the target language prosody

to emphasize the importance of prosody in achieving native speaker fluency

• ASR (Automatic Speech Recognition) can be employed to automate the segment aligning stage

16

Preliminary plans for an experiment

• Hypothesis: The new type of audio feedback improves the efficiency of language learning, specifically the learning of prosody

• Method: Key idea (in a listen-and-repeat type of language learning): contrast the “old” type of audio feedback, i.e. playing native utterances only, with the “new” type of audio feedback, i.e. playing native and synthetic utterances.

17

Preliminary plans for an experiment

• Method
1. Baseline: Group the non-native learners into two groups (“good” and “bad”)
2. Administration: Learning with either the “old” type or the “new” type of audio feedback
3. Evaluation: Evaluate the two types of feedback by examining the recordings of the learners

In 1 and 3, a native speaker marks the recordings of the non-native learners on a categorical/numerical scale.

In 2, the two groups (good/bad) are divided into four subgroups (good-A/good-B/bad-A/bad-B) so that the A groups are given the “old” type of audio feedback and the B groups are given the “new” type of audio feedback.
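The subgroup assignment can be sketched as follows. The slides do not say how learners are split or assigned, so the median split on the baseline mark and the alternating A/B assignment are assumptions made for illustration (a real study might randomize instead); the learner names and marks are invented.

```python
def assign_groups(scores):
    """Map each learner to good-A, good-B, bad-A or bad-B from baseline marks."""
    # Rank by mark (highest first); ties are broken by name to keep the result stable.
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    half = len(ranked) // 2
    groups = {}
    for level, members in (("good", ranked[:half]), ("bad", ranked[half:])):
        for i, name in enumerate(members):
            feedback = "A" if i % 2 == 0 else "B"  # A = "old" feedback, B = "new" feedback
            groups[name] = f"{level}-{feedback}"
    return groups


# Hypothetical baseline marks given by a native speaker on a 1-5 scale.
scores = {"L1": 5, "L2": 4, "L3": 4, "L4": 3, "L5": 2, "L6": 2, "L7": 1, "L8": 1}
groups = assign_groups(scores)
for name in sorted(groups):
    print(name, groups[name])
```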
