

Results

• Clear distinction between two question intonations: perception and understanding level

• Three distinct prototypes for different interpretations

Future work

• Will be implemented and tested in Higgins to evaluate user responses and dialogue efficiency

• Corpus study of human-human dialog

• Multi-syllable, multi-word, accent II

• Back-channels (hmm, eh)

• Multimodal synthesis

[Figure: Number of votes (max 48) for ACCEPT, CLARIFY UNDERSTANDING and CLARIFY PERCEPTION, by F0 peak position (early, mid, late) and F0 peak height (HIGH, LOW).]

The Effects of Prosodic Features on the Interpretation of Clarification Ellipses

Jens Edlund, David House and Gabriel Skantze

Abstract

In this paper, the effects of prosodic features on the interpretation of elliptical clarification requests in dialogue are studied. An experiment is presented where subjects were asked to listen to short human-computer dialogue fragments in Swedish, where a synthetic voice was making an elliptical clarification after a user turn. The prosodic features of the synthetic voice were systematically varied, and the subjects were asked to judge what was actually intended by the computer. The results show that an early low F0 peak signals acceptance, that a late high peak is perceived as a request for clarification of what was said, and that a mid high peak is perceived as a request for clarification of the meaning of what was said.

Setting

Levels of understanding

Allwood et al. (1992), Clark (1996)

Ellipsis interpretation

Centre for Speech Technology

Errors and clarification in dialog

• Dialog not always error free

• Error detection often made using explicit or implicit spoken clarification/verification:

User: […] on the right I see a red building.

System (low conf.): Did you say ’A red building’?

System (high conf.): A red building… ok, take a left […]?
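The choice between explicit and implicit verification in the example above can be sketched as a simple confidence check. This is only an illustration: the function name `verify`, its parameters, and the threshold value of 0.5 are assumptions made for the sketch, not details taken from the poster or from the Higgins system.

```python
def verify(fragment: str, confidence: float, next_instruction: str,
           threshold: float = 0.5) -> str:
    """Return an explicit or implicit verification of a recognized fragment.

    The threshold is a hypothetical value chosen for illustration only.
    """
    if confidence < threshold:
        # Low confidence: explicit verification, ask the user to confirm.
        return f"Did you say '{fragment}'?"
    # High confidence: implicit verification, echo the fragment and move on.
    echoed = fragment[:1].upper() + fragment[1:]
    return f"{echoed}... ok, {next_instruction}"


print(verify("a red building", 0.2, "take a left"))
print(verify("a red building", 0.9, "take a left"))
```

Both strategies repeat the whole user phrase, which is the behaviour that one-word clarification ellipses are meant to shorten.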

Traditionally:

Clarification Ellipses

User: […] on the right I see a red building.

System: red(?)

Advantages:

Level (H = hearer, S = speaker)

Acceptance: H accepts what S says

Understanding: H understands what S says

Perception: H hears what S says

Contact: H hears that S speaks

Experiment

• 8 subjects judged the meaning of one-word elliptical clarification requests in dialogue context

• Task: Select paraphrase for elliptical system utterance

• Swedish

• System utterance: red, blue, yellow

• F0 peak position: early, mid, late

• F0 peak height: low, high

• Vowel duration: normal, long

= 36 stimuli (3 × 3 × 2 × 2)

• LUKAS diphone MBROLA synthesis
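The stimulus count follows from fully crossing the four factors listed above. The sketch below enumerates the design and also checks the vote ceiling per prosodic condition; it assumes that each of the 8 subjects judged every stimulus exactly once, which the poster does not state explicitly but which is consistent with the "max 48" shown in the results figure. All variable names are illustrative.

```python
from itertools import product

# Factor levels as listed in the experiment description.
COLORS = ["red", "blue", "yellow"]
PEAK_POSITIONS = ["early", "mid", "late"]
PEAK_HEIGHTS = ["low", "high"]
VOWEL_DURATIONS = ["normal", "long"]
N_SUBJECTS = 8

# Full factorial design: one stimulus per combination of factor levels.
stimuli = [
    {"color": c, "peak_position": p, "peak_height": h, "vowel_duration": d}
    for c, p, h, d in product(COLORS, PEAK_POSITIONS, PEAK_HEIGHTS, VOWEL_DURATIONS)
]

# Each (peak position, peak height) cell pools over color and duration.
# Assumption: every subject judged every stimulus once.
stimuli_per_cell = len(COLORS) * len(VOWEL_DURATIONS)
votes_per_cell = N_SUBJECTS * stimuli_per_cell

print(len(stimuli))      # 36
print(votes_per_cell)    # 48
```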

Level: Paraphrase

Signal Acceptance: Ok, red.

Clarify Understanding: Do you really mean red?

Clarify Perception: Did you say red?

The Higgins spoken dialog system for pedestrian navigation

No effects for:

• Subject

• Color

• Duration

Prototypes:

• Accept: Early low peak

• Clarify Understanding: Mid high peak

• Clarify Perception: Late high peak
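The three prototypes can be written as a small lookup table, which is roughly what a dialogue system would need in order to pick prosody for an intended reading. This is a minimal sketch: the names `PROTOTYPES`, `interpret`, and `prosody_for` are hypothetical, and the three remaining position/height combinations are deliberately left unmapped because the poster names no prototype for them.

```python
from typing import Optional, Tuple

# (F0 peak position, F0 peak height) -> prototypical interpretation.
PROTOTYPES = {
    ("early", "low"): "accept",
    ("mid", "high"): "clarify_understanding",
    ("late", "high"): "clarify_perception",
}


def interpret(peak_position: str, peak_height: str) -> Optional[str]:
    """Return the prototype reading, or None if the combination is not a prototype."""
    return PROTOTYPES.get((peak_position, peak_height))


def prosody_for(intent: str) -> Tuple[str, str]:
    """Return the (peak position, peak height) prototype for an intended reading."""
    for prosody, label in PROTOTYPES.items():
        if label == intent:
            return prosody
    raise ValueError(f"no prototype for intent: {intent!r}")


print(interpret("late", "high"))
print(prosody_for("accept"))
```

A `None` result here only means that no prototype was reported for that combination; it says nothing about how listeners judged those stimuli.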

The Problem

• Elliptical one-word clarification requests are potentially ambiguous

• Little syntax and structure

• Prosody more critical

• How do prosodic features affect the interpretation of these utterances?

Traditional clarification requests:

• Constructed as full propositions

• Often perceived as tedious

• Clarifies entire user utterances

Advantages of clarification ellipses:

• Fast

• Focuses on problematic fragment

• Often used in human-human dialog

Question intonation

Swedish question intonation

• Raised top-line and widened F0 range on focal accent (Gårding, 1998)

• Delayed focal peak (House, 2003)

German dialog

• Rodriguez & Schlangen (2004)

• Rising boundary tones to clarify acoustic problems (perception)

• Used less for reference resolution (understanding)