Dissertation Defense Saarbrücken – November ??th 2004 Automatic Classification of Speech Recognition Hypotheses Using Acoustic and Pragmatic Features Malte Gabsdil Universität des Saarlandes


Page 1

Dissertation Defense
Saarbrücken – November ??th 2004

Automatic Classification of Speech Recognition Hypotheses Using Acoustic and Pragmatic Features

Malte Gabsdil

Universität des Saarlandes

Page 2

The Problem (theoretical)

• Grounding: establishing common ground between dialogue participants
  – “Did H (the hearer) correctly understand what S (the speaker) said?”

• Combination of bottom-up (“signal”) and top-down (“expectation”) information

• Clark (1996): action ladders
  – upward completion
  – downward evidence

Page 3

The Problem (practical)

• Assessment of recognition quality for spoken dialogue systems

• Information sources
  – speech/recognition output (“acoustic”)
  – dialogue/task context (“pragmatic”)

• Crucial for usability and user satisfaction
  – avoid misunderstandings
  – promote dialogue flow and efficiency

Page 4

The General Picture

• Dialogue System Architecture

[Figure: dialogue system architecture with ASR, the Dialogue Manager (interpretation, dialogue history, dialogue model, response selection) and Generation]

Page 5

A Closer Look

• How to assess recognition quality?
  – decision problem

[Figure: two ways of passing ASR output to the Dialogue Manager (interpretation, dialogue history, dialogue model, response selection): the best hypothesis + confidence, filtered by confidence rejection thresholds, or the n-best hypotheses + confidence, classified by a machine learning classifier using acoustic and pragmatic features]

Page 6

Overview

• Machine learning classifiers

• Acoustic and pragmatic features

• Experiment 1: Chess
  – exemplary domain

• Experiment 2: WITAS
  – complex spoken dialogue system

• Conclusions & Topics for Future Work

Page 7

Machine Learning Classifiers

• Concept learners
  – learn a decision function $f(\mathbf{x}) \in \{c_1, \dots, c_n\}$ that maps feature vectors to classes
  – training: present feature vectors annotated with the correct class
  – testing: classify unseen feature vectors

• Combine acoustic and pragmatic features to classify recognition hypotheses as accept, (clarify), reject, or ignore
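The training/testing cycle of a concept learner can be illustrated with one of the simplest such learners, a 1-nearest-neighbour (memory-based) classifier of the general kind implemented by TiMBL. This is a sketch under stated assumptions, not TiMBL or the setup used in the thesis: the class name, the Euclidean distance and the two hypothetical features per vector (a recogniser confidence and a DM bigram probability) are all illustrative choices.

```python
import math
from typing import List, Sequence, Tuple

# One annotated training example: (feature vector, correct class).
Instance = Tuple[Sequence[float], str]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two numeric feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class NearestNeighbour:
    """1-NN concept learner (hypothetical illustration).

    Training stores the annotated vectors; testing returns the class of
    the stored vector closest to the unseen one.
    """

    def __init__(self) -> None:
        self.memory: List[Instance] = []

    def train(self, data: List[Instance]) -> None:
        # Memory-based learning: keep every annotated example as-is.
        self.memory = list(data)

    def classify(self, x: Sequence[float]) -> str:
        # The learned decision function f(x): label of the nearest example.
        _, label = min(self.memory, key=lambda inst: euclidean(inst[0], x))
        return label


# Hypothetical features: (recogniser confidence, DM bigram probability).
training: List[Instance] = [
    ((0.9, 0.6), "accept"),
    ((0.8, 0.5), "accept"),
    ((0.3, 0.1), "reject"),
    ((0.2, 0.05), "reject"),
    ((0.1, 0.0), "ignore"),
]

learner = NearestNeighbour()
learner.train(training)
print(learner.classify((0.87, 0.58)))  # accept
```

Training only memorises the annotated vectors; all the work happens at classification time, which is why memory-based learners are sensitive to feature scaling and to the choice of distance metric.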

Page 8

Acoustic Information

• Derived from speech waveforms and recognition output

• Low-level features
  – amplitude, pitch (f0), duration, tempo (e.g. Levow 1998, Litman et al. 2000)

• Recogniser confidences
  – normalised probability that a sequence of recognised words is correct (e.g. Wessel et al. 2001)

Page 9

Pragmatic Information

• Derived from the dialogue context and task knowledge

• Dialogue features
  – adjacency pairs: current/previous dialogue move (DM), DM bigram frequencies
  – reference: unresolvable definite NPs and pronouns (PROs)

• Task features (scenario-dependent)
  – evaluation of move scores (Chess)
  – conflicts in action preconditions and effects (WITAS)
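The DM bigram feature can be estimated from a corpus of annotated dialogues. A minimal sketch, assuming each dialogue is available as a sequence of dialogue-move labels and using unsmoothed maximum-likelihood estimates; the function name and move labels are hypothetical, and the thesis may estimate or smooth these values differently.

```python
from collections import Counter
from typing import Dict, List, Tuple


def dm_bigram_model(dialogues: List[List[str]]) -> Dict[Tuple[str, str], float]:
    """Unsmoothed maximum-likelihood estimates of P(current DM | previous DM)."""
    pair_counts = Counter()
    prev_counts = Counter()
    for moves in dialogues:
        # Count each adjacent pair of moves within one dialogue.
        for prev, cur in zip(moves, moves[1:]):
            pair_counts[(prev, cur)] += 1
            prev_counts[prev] += 1
    return {
        (prev, cur): count / prev_counts[prev]
        for (prev, cur), count in pair_counts.items()
    }


# Hypothetical dialogue-move sequences.
corpus = [
    ["wh-question", "answer", "ack", "command"],
    ["wh-question", "answer", "ack"],
    ["yn-question", "yes", "command"],
    ["wh-question", "command"],
]

model = dm_bigram_model(corpus)
# Feature value for a hypothesis: probability of its move after the previous move.
print(model.get(("wh-question", "answer"), 0.0))  # 2 of the 3 wh-questions are answered
print(model.get(("wh-question", "yes"), 0.0))     # unseen pair -> 0.0
```

Unsmoothed estimates give every unseen move pair a probability of zero, which a deployed system would probably want to smooth.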

Page 10

Experiment 1: Chess

• Recognise spoken chess move instructions
  – speech interface to a computer chess program

• Exemplary domain to test the methodology
  – nice properties, easy to control

• Pragmatic features: automatic move evaluation scores (Crafty)

• Acoustic features: recogniser confidence scores (Nuance 8.0)

Page 11

Data & Design

• Subjects replay given chess games
  – instruct each other to move pieces
  – approx. 2000 move instructions in different data sets (devel, train, test)

• 5 x 2 x 6 design
  – 5 systems for classifying recognition results (main effect)
  – 2 game levels (strong vs. weak)
  – 6 pairs of players

Page 12

Players and Instructions

Page 13

Systems

• Task: accept or reject recognition hypotheses

• Baseline
  – confidence rejection threshold
  – binary classification of the best hypothesis

• ML System
  – SVM learner (best on the development set)
  – binary classification of the 10-best results
  – choose the first hypothesis classified as accept, else reject
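The two decision strategies, Baseline and ML System, can be sketched as follows. This is an illustration only: the threshold value, the hypothesis representation, the example move strings and the `is_accept` callback (standing in for the trained SVM) are all hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# An n-best entry: (recognised word string, recogniser confidence).
Hypothesis = Tuple[str, float]


def baseline_decision(nbest: List[Hypothesis], threshold: float) -> Optional[str]:
    """Confidence rejection threshold: accept the best hypothesis if its
    confidence reaches the threshold, otherwise reject (None)."""
    words, confidence = nbest[0]
    return words if confidence >= threshold else None


def ml_decision(
    nbest: List[Hypothesis], is_accept: Callable[[int, Hypothesis], bool]
) -> Optional[str]:
    """Classify the (up to 10) n-best hypotheses in rank order and accept
    the first one labelled 'accept'; reject (None) if there is none."""
    for rank, hyp in enumerate(nbest[:10], start=1):
        if is_accept(rank, hyp):
            return hyp[0]
    return None


# Hypothetical 3-best list for a chess move instruction.
nbest = [("bishop e4", 0.41), ("bishop e5", 0.38), ("pawn e4", 0.20)]

# Hypothetical stand-in for the trained classifier, mixing a pragmatic
# feature (is the move legal in the current position?) with an acoustic one.
legal_moves = {"bishop e5", "pawn e4"}


def toy_classifier(rank: int, hyp: Hypothesis) -> bool:
    return hyp[0] in legal_moves and hyp[1] > 0.3


print(baseline_decision(nbest, threshold=0.5))  # None (rejected)
print(ml_decision(nbest, toy_classifier))       # bishop e5
```

The baseline only ever looks at the top hypothesis, whereas the ML strategy can recover a correct hypothesis further down the n-best list.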

Page 14

Results

• Accuracy:
  – Baseline: 64.3%, ML System: 97.2%

[Figure: bar chart of correct accepts, correct rejects, false accepts and false rejects (y-axis 0–800) for the Baseline and the ML System]

Page 15

Evaluation

• 82.2% relative error rate reduction

• χ² test on confusion matrices
  – highly significant (p < .001)

• Combination of acoustic and pragmatic information outperforms the standard approach (confidence rejection threshold)

• System reacts appropriately more often → expected increase in usability

Page 16

Experiment 2: WITAS

• Operator interaction with a robot helicopter

• Multi-modal references, collaborative activities, multi-tasking

• Differences from the chess experiment
  – complex dialogue scenario
  – complex system (ISU-based, planning, …)
  – much larger grammar and vocabulary (Chess: 37 GR, vocabulary 50 [something missing])
  – open mic recordings (ignore class)

Page 17

WITAS Screenshot

Page 18

Data Preparation

• 30 dialogues (6 users, 303 utterances)

• Manual transcriptions

• Offline recognition (10-best) and parsing → quasi-logical forms (QLFs)

• Hypothesis labelling:
  – accept: same QLF as the transcription
  – reject: out-of-grammar or different QLF
  – ignore: “crosstalk” not directed to the system
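The labelling scheme amounts to a simple decision rule per hypothesis. A minimal sketch, assuming QLFs can be compared as strings, that `None` marks an utterance or hypothesis the grammar cannot parse, and that crosstalk was marked during manual transcription; the function name, arguments and QLF strings are hypothetical.

```python
from typing import Optional


def label_hypothesis(
    hyp_qlf: Optional[str], ref_qlf: Optional[str], crosstalk: bool
) -> str:
    """Assign accept/reject/ignore to one n-best hypothesis.

    hyp_qlf   -- QLF of the recognition hypothesis (None if unparsable)
    ref_qlf   -- QLF of the manual transcription (None if out-of-grammar)
    crosstalk -- True if the utterance was not directed to the system
    """
    if crosstalk:
        return "ignore"
    if ref_qlf is None or hyp_qlf is None:
        # Out-of-grammar utterance or unparsable hypothesis: cannot match.
        return "reject"
    return "accept" if hyp_qlf == ref_qlf else "reject"


print(label_hypothesis("go(tower)", "go(tower)", False))   # accept
print(label_hypothesis("go(school)", "go(tower)", False))  # reject
```

Comparing QLFs rather than word strings means that recognition errors which do not change the meaning (e.g. a dropped filler word) still count as accept.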

Page 19

Example Features

• Acoustic
  – low level: amplitude (RMS)
  – recogniser: hypothesis confidence score, rank in n-best list

• Pragmatic
  – dialogue: current/previous DM, DM bigram probability, #unresolvable definite NPs
  – task: #conflicts in planning operators (e.g. already satisfied effects)

Page 20

Results

[Figure: two bar charts of accept, reject and ignore classifications (y-axis 0–200), one for the Baseline (Lemon 2004) and one for the Best ML System]

Baseline (Lemon 2004)
• Context-sensitive LMs
• Accuracy: 65.68%
• Wfscore (weighted f-score): 61.81%
• (higher price)

Best ML System
• TiMBL (optimised)
• Accuracy: 86.14%
• Wfscore (weighted f-score): 86.39%
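For reference, accuracy and a class-weighted f-score can be computed from gold and predicted labels as below. This sketch assumes that Wfscore is the per-class F1 (harmonic mean of precision and recall) weighted by each class's share of the gold labels; the exact weighting used in the thesis may differ, and the label sequences are made up.

```python
from collections import Counter
from typing import List


def accuracy(gold: List[str], pred: List[str]) -> float:
    """Fraction of hypotheses whose predicted class matches the gold class."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def weighted_f_score(gold: List[str], pred: List[str]) -> float:
    """Per-class F1, weighted by the class frequency in the gold labels."""
    support = Counter(gold)
    total = 0.0
    for cls, n_gold in support.items():
        tp = sum(g == p == cls for g, p in zip(gold, pred))
        n_pred = sum(p == cls for p in pred)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_gold
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        total += f1 * n_gold
    return total / len(gold)


# Hypothetical gold and predicted classes for six hypotheses.
gold = ["accept", "accept", "reject", "ignore", "accept", "reject"]
pred = ["accept", "reject", "reject", "ignore", "accept", "accept"]
print(f"{accuracy(gold, pred):.2%}")          # 66.67%
print(f"{weighted_f_score(gold, pred):.4f}")  # 0.6667
```

Unlike plain accuracy, the f-score also penalises a classifier that rarely predicts a class (low recall) or predicts it too often (low precision).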

Page 21

Evaluation

• 59.6% relative error rate reduction

• χ² test on confusion matrices
  – highly significant (p < .001)

• Combination of acoustic and pragmatic information outperforms the grammar switching approach

• System reacts appropriately more often → expected increase in usability
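The 59.6% relative error rate reduction follows from the two WITAS accuracies (65.68% for the baseline, 86.14% for TiMBL): the error rate drops from 34.32% to 13.86%, and 20.46 / 34.32 ≈ 0.596. A small helper makes the arithmetic explicit; the function name is hypothetical.

```python
def relative_error_reduction(baseline_acc: float, system_acc: float) -> float:
    """(baseline error - system error) / baseline error, accuracies in [0, 1]."""
    baseline_err = 1.0 - baseline_acc
    system_err = 1.0 - system_acc
    return (baseline_err - system_err) / baseline_err


# WITAS accuracies: baseline 65.68%, TiMBL 86.14%.
print(f"{relative_error_reduction(0.6568, 0.8614):.1%}")  # 59.6%
```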

Page 22

WITAS Features

• Importance according to χ² (ac = acoustic, pr = pragmatic)
  1. confidence (ac: recogniser)
  2. DMBigramFrequency (pr: dialogue)
  3. currentDM (pr: dialogue)
  4. minAmp (ac: low level)
  5. hypothesisLength (ac: recogniser)
  6. RMSamp (ac: low level)
  7. currentCommand (pr: dialogue)
  8. minWordConf (ac: recogniser)
  9. aqMatch (pr: dialogue)
  10. nbestRank (ac: recogniser)

Page 23

Summary/Achievements

• Assessment of recognition quality for spoken dialogue systems (grounding)

• Combination of acoustic and pragmatic information via machine learning

• Highly significant improvements in classification accuracy over standard methods (incl. “grammar switching”)

• Expect better system behaviour and user satisfaction

Page 24

Topics for Future Work

• Usability evaluation
  – systems with and without the classification module

• Generic and system-specific features
  – which features are available across systems?

• Tools for ISU-based systems
  – module in the DIPPER software library

• Clarification
  – flexible generation (alternative questions, word-level clarification)

Page 25

APPENDIX

Page 26

Our Proposal

• Combine acoustic and pragmatic information in a principled way

• Machine learning to predict the grounding status of competing recognition hypotheses of user utterances

• Evaluation against standard methods in spoken dialogue system engineering
  – confidence rejection thresholds

Page 27

Application

[Figure: ASR n-best list (1.–9.) → ML classifier, drawing on acoustic information and pragmatic information (dialogue context, task knowledge from the Dialogue Manager) → classified n-best list (1.–9.) → Dialogue Manager]

Page 28

Data Collection

Game Pair1 Pair2 Pair3 Pair4 Pair5 Pair6 Total

Trial (1) (1) (1) (1) (1) (1)

LMw (2) (2) (3)

LMs (4) (5) (5)

DSw 69 (5) 68 (4) 62 (4) 199

DSs 65 (3) 68 (3) 62 (4) 200

TRw 68 (4) 64 (2) 66 (4) 60 (3) 69 (5) 62 (3) 389

TRs 70 (5) 63 (3) 70 (5) 59 (2) 62 (4) 68 (2) 392

TEw 64 (6) 64 (6) 64 (6) 64 (6) 64 (6) 64 (6) 384

TEs 69 (7) 69 (7) 69 (7) 69 (7) 69 (7) 69 (7) 414

Total 336 329 337 320 331 325 1978

Page 29

Chess Results

• Accuracy:
  – Baseline: 64.3%, LegalMoves (LM): 93.5%, ML System: 97.2%

[Figure: bar chart of correct accepts, correct rejects, false accepts and false rejects (y-axis 0–800) for the Baseline, LegalMoves and the ML System]

Page 30

Data/Baseline

• Data from a user study with WITAS
  – 6 subjects, 5 tasks each (“open mic”)
  – 30 dialogues (303 user utterances)
  – recorded utterances and logs of the WITAS Information State (dialogue history)

• Originally collected to evaluate a “grammar switching” version of WITAS (= Baseline; Lemon 2004)

Page 31

Data Preparation/Setup

• Manual transcription of all utterances

• Offline recognition (10-best) with the “full” grammar and processing with the NLU component (quasi-logical forms)

• Hypothesis labelling:
  – accept: same QLF
  – reject: out-of-grammar or different QLF
  – ignore: “crosstalk” not directed to the system

Page 32

Acoustic Features

• Low level:
  – RMSamp, minAmp (abs), meanAmp (abs)
  – motivation: detect crosstalk

• Recogniser output/confidence scores:
  – nbest rank, hypothesisLength (words)
  – hyp. confidence, confidence zScore, confidence SD, minWordConf
  – motivation: quality estimation within and across hypotheses
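Several of these features are straightforward to compute once the waveform samples and the n-best confidences are available. The sketch below assumes that RMSamp is the root-mean-square of the samples, that meanAmp (abs) is the mean absolute sample value, and that the confidence zScore and SD are taken over the confidences within one n-best list; these readings of the feature names, and the function names, are assumptions rather than definitions from the thesis.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def amplitude_features(samples: Sequence[float]) -> Dict[str, float]:
    """Low-level amplitude features over one utterance's samples."""
    return {
        "RMSamp": math.sqrt(sum(s * s for s in samples) / len(samples)),
        "meanAmp": mean(abs(s) for s in samples),
    }


def nbest_confidence_features(confidences: List[float]) -> List[Dict[str, float]]:
    """Per-hypothesis rank, confidence and confidence z-score within the
    n-best list, plus the standard deviation over the whole list."""
    mu = mean(confidences)
    sd = pstdev(confidences)
    return [
        {
            "nbestRank": rank,
            "confidence": c,
            # How far this hypothesis stands out from its competitors.
            "confidenceZScore": (c - mu) / sd if sd > 0 else 0.0,
            "confidenceSD": sd,
        }
        for rank, c in enumerate(confidences, start=1)
    ]


print(amplitude_features([0.5, -0.5, 0.5, -0.5]))
print(nbest_confidence_features([0.6, 0.4, 0.2])[0])
```

The z-score makes confidences comparable across utterances whose n-best lists have very different overall confidence levels.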

Page 33

Pragmatic Features

• Dialogue:
  – currentDM, DMTactiveNode, qaMatch, aqMatch, DMbigramFreq, currentCommand
  – #unresNPs, #unresPROs, #uniqueIndefs
  – motivation: adjacency pairs, unlikely references

• Task:
  – taskConflict (same command already active)
  – taskConstraintConflict (fly vs. land)
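The two task features can be illustrated with a toy task state. Everything here is a hypothetical simplification of what a planning-based system such as WITAS would track: the command names, the set of currently active commands, the table of mutually exclusive commands, and the choice to count constraint conflicts rather than flag them.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical pairs of commands whose constraints exclude each other.
CONSTRAINT_CONFLICTS: Set[FrozenSet[str]] = {
    frozenset({"fly", "land"}),
    frozenset({"take_off", "land"}),
}


def task_features(command: str, active: Set[str]) -> Dict[str, int]:
    """taskConflict: the hypothesised command is already active.
    taskConstraintConflict: number of active commands it contradicts."""
    return {
        "taskConflict": int(command in active),
        "taskConstraintConflict": sum(
            frozenset({command, other}) in CONSTRAINT_CONFLICTS for other in active
        ),
    }


# Hypothetical state: a "fly" command is currently being executed.
active_commands = {"fly"}
print(task_features("fly", active_commands))
print(task_features("land", active_commands))
```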

Page 34

Importance of Features

• What are the most predictive features?
  – χ² statistics: correlate feature values with different classes
  – computed for each feature from value/class contingency tables

$$\chi^2 = \sum_i \sum_j \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad E_{ij} = \frac{n_{i.}\, n_{.j}}{n_{..}}$$

  – Oij: observed frequencies; Eij: expected frequencies
  – n.j: sum over column j; ni.: sum over row i; n..: #instances

Page 35

Simple Example

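Feature A       c1     c2     total
v1             100    100       200
v2             100    100       200
v3             100    100       200
v4             100    100       200
total          400    400   n = 800

• χ² = 0

Feature B       c1     c2     total
v1              20     80       100
v2              75     75       150
v3              20    130       150
v4             285    115       400
total          400    400   n = 800

• χ² = 2·(30²/50) + 2·0 + 2·(55²/75) + 2·(85²/200) ≈ 188.92
  (both class totals are 400 of n = 800, so Eij = ni./2 and the two cells of each row deviate from Eij by the same amount: 30, 0, 55 and 85)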
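The χ² values of the two example tables can be checked with a short function that applies the formula directly, with Eij = ni. n.j / n.. (rows are feature values, columns are classes). The function and variable names are illustrative; for significance tests such as those on the confusion matrices, `scipy.stats.chi2_contingency` computes the same statistic together with a p-value (note that it applies Yates' continuity correction to 2×2 tables by default).

```python
from typing import List


def chi_square(table: List[List[float]]) -> float:
    """Pearson chi-square for a value/class contingency table
    (rows = feature values v_i, columns = classes c_j)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # E_ij = n_i. * n_.j / n_..
            chi2 += (observed - expected) ** 2 / expected
    return chi2


# First example table (uniform counts) and second example table.
feature_a = [[100, 100], [100, 100], [100, 100], [100, 100]]
feature_b = [[20, 80], [75, 75], [20, 130], [285, 115]]
print(round(chi_square(feature_a), 2))  # 0.0
print(round(chi_square(feature_b), 2))  # 188.92
```

The function assumes every expected count is non-zero, i.e. no feature value or class has a zero total.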