Dialogue Structure and Pronoun Resolution

Joel Tetreault and James Allen
University of Rochester
Department of Computer Science
DAARC, September 23, 2004

WELCOME TO DAARC!!!

Reference in Spoken Dialogue

Resolving anaphoric expressions correctly is critical in task-oriented domains

Makes conversation easier for humans

The reference resolution module (RRM) provides feedback to other components in the system, e.g. incremental parsing and the interpretation module

Investigate how to improve the RRM: discourse structure could be effective in reducing the search space of antecedents and improving accuracy (Grosz and Sidner, 1986)

Paucity of empirical work: Byron and Stent (1998), Eckert and Strube (2001), Byron (2002)

Goal

To evaluate whether shallow approaches to dialogue structure can improve a reference resolution algorithm (LRC used as baseline model to augment)

Investigated two models:
  Eckert & Strube (manual and automatic versions)
  “Literal QUD” model (manual)

Outline

Background
  Dialogue Act synchronization (Eckert and Strube model)
  QUD (Craige Roberts)

Monroe Corpus

Algorithm

Results
  3rd person pronoun evaluation
  Dialogue Structure

Summary

Past approaches in structure and reference

Veins: the nuclei of RST trees are the most salient discourse units, and the entities in these units are thus more salient than others

Tetreault (2003): Penn Treebank subset annotated with RST
  Used G&S approximations to try to improve on the LRC baseline. Result: performed the same as the baseline
  Veins: decreased performance slightly

Problem: fine-grained approaches (RST) are difficult to annotate reliably and to apply in real time

Perhaps shallow approaches can work?

Literal QUD

Questions Under Discussion (Craige Roberts, Jonathan Ginzburg): “what are we talking about?” Topics create discourse segments

Literally: questions or modals can be viewed as creating a discourse segment

Result: questions provide a shallow discourse structuring, and that may be enough to improve performance, especially in a task-oriented domain

Entities in the QUD main segment can be viewed as the topic

Segment is closed when the question is answered (use ack sequences, change in entities used); after that, only entities from the answer and entities in the question are accessible

Can be used in TRIPS to reduce the search space of entities (set context size)

QUD Annotation Scheme

Annotate:
  Start utterance
  End utterance
  Type (aside, repeated question, unanswered, open-ended, clarification)

Kappa (compared with reconciled data):

Annotator   Start   End    Type   Overall
1           0.86    0.80   0.93   0.73
2           0.86    0.73   0.86   0.73
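The slides do not say which kappa variant was used. As a minimal sketch, assuming Cohen's kappa computed between one annotator's labels and the reconciled labels, the statistic can be calculated as below. The function name `cohens_kappa` and the sample labels are illustrative and are not taken from the Monroe annotation.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of category labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each label's marginal proportions, summed.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both sequences use a single identical label
    return (observed - expected) / (1.0 - expected)


# Hypothetical segment-type labels for six QUD segments (not real data).
annotator = ["aside", "clarification", "aside",
             "unanswered", "clarification", "aside"]
reconciled = ["aside", "clarification", "clarification",
              "unanswered", "clarification", "aside"]
kappa = cohens_kappa(annotator, reconciled)
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is usually preferred over percent agreement when a few segment types dominate.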

Example - QUD

utt06 U: Where is it?

utt07 U: Just a second

utt08 U: I can't find the Rochester airport

utt09 S: It's

--------------------------------------------------------

utt10 U: I think I have a disability with maps

utt11 U: Have I ever told you that before

utt12 S: It's located on brooks avenue

utt13 U: Oh thank you

utt14 S: Do you see it?

utt15 U: Yes

(QUD-entry

:start utt06

:end utt13

:type clarification)

(QUD-entry

:start utt10

:end utt11

:type aside)

Example - QUD (utt10-11 processed)

utt06 U: Where is it?

utt09 S: It's

[utt10,11 removed]

--------------------------------------------------------

utt15 U: Yes

(QUD-entry

:start utt06

:end utt13

:type clarification)

(QUD-entry

:start utt10

:end utt11

:type aside)

Example - QUD (s13 processed)

[utt06-13 collapsed: {the Rochester airport, brooks avenue}]

--------------------------------------------------------

utt15 U: Yes

(QUD-entry

:start utt06

:end utt13

:type clarification)
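The two processing steps in the example (dropping the aside, then collapsing the answered question) can be sketched as follows. This is an illustration under stated assumptions, not the TRIPS implementation: the names `Utterance`, `QudSegment` and `close_segment` are hypothetical, the `answer` field is an added convenience for marking the answering utterance, the entity lists per utterance are assumed resolutions, and the stack of open QUDs is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    utt_id: int
    entities: list  # entity names, in left-to-right order


@dataclass
class QudSegment:
    start: int
    end: int
    kind: str         # e.g. "aside", "clarification"
    answer: int = -1  # hypothetical: id of the utterance that answers the question


def close_segment(history, seg):
    """Return a new history with the closed segment removed or collapsed."""
    before = [u for u in history if u.utt_id < seg.start]
    after = [u for u in history if u.utt_id > seg.end]
    if seg.kind == "aside":
        return before + after  # asides are blocked off entirely
    # Otherwise keep only entities from the question and from its answer.
    kept = []
    for u in history:
        if u.utt_id in (seg.start, seg.answer):
            for name in u.entities:
                if name not in kept:
                    kept.append(name)
    return before + [Utterance(seg.start, kept)] + after


# utt06-utt13 from the example; entity lists are assumed, with "it" resolved.
history = [
    Utterance(6, ["the Rochester airport"]),
    Utterance(7, []),
    Utterance(8, ["the Rochester airport"]),
    Utterance(9, ["the Rochester airport"]),
    Utterance(10, ["a disability", "maps"]),
    Utterance(11, []),
    Utterance(12, ["the Rochester airport", "brooks avenue"]),
    Utterance(13, []),
]
history = close_segment(history, QudSegment(10, 11, "aside"))
history = close_segment(history, QudSegment(6, 13, "clarification", answer=12))
```

After both calls the history holds a single collapsed node whose entities are the Rochester airport and brooks avenue, matching the collapsed set shown on the slide.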


QUD Issues

Issue 1: easy to detect Q’s (use Speech-Act information), but how do you know Q is answered?

Cue words, multiple acknowledgements, changes in entities discussed provide strong clues that question is finishing, but general questions such as “how are we going to do this?” can be ambiguous

Issue 2: what is more salient to a QUD pronoun – the QUD topic or a more recent entity?

Dialogue Act Segmentation

E&S: model to resolve all types of pronouns (3rd person and abstract) in spoken dialogue

Intuition: grounding is very important in spoken dialogue

Utterances that are not acknowledged by the listener may not be in common ground and thus not accessible to pronominal reference

Dialogue Act Segmentation

Each utterance is marked as:
  (I): contains content (initiation), question
  (A): acknowledgment
  (C): combination of the above
  (N): none of the above

Basic algorithm: utterances not ack’d or not in a string of I’s are removed from the discourse before next sentence is processed

Evaluation showed improvement for pronouns referring to abstract entities, and strong annotator reliability

Pronoun performance? Unclear: no comparison is given against the same measure without the DA model
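One possible reading of the basic algorithm is sketched below. It assumes that an (I) or (C) utterance stays in the discourse only if the next utterance is an acknowledgment (A or C) or continues a string of I's, and that (A) and (N) utterances contribute no antecedents. These are simplifying assumptions: the sketch ignores speaker identity and may differ from Eckert and Strube's full model. The function name `prune_unacknowledged` and the label sequence are hypothetical.

```python
def prune_unacknowledged(utterances):
    """utterances: list of (utt_id, da_label) pairs in dialogue order.

    Returns the ids of utterances kept in the discourse history.
    """
    kept = []
    for i, (utt_id, label) in enumerate(utterances):
        if label not in ("I", "C"):
            continue  # assumed: A and N utterances add no antecedents
        if i + 1 == len(utterances):
            kept.append(utt_id)  # most recent utterance: not yet decidable
            continue
        next_label = utterances[i + 1][1]
        # Kept if acknowledged next, or if the string of I's continues.
        if next_label in ("A", "C", "I"):
            kept.append(utt_id)
    return kept


# Hypothetical DA label sequence, not taken from the corpus.
dialogue = [(1, "I"), (2, "N"), (3, "I"), (4, "I"), (5, "A"), (6, "I")]
kept_ids = prune_unacknowledged(dialogue)
```

Here utterance 1 is dropped because it is followed by an (N) utterance, while 3 and 4 survive because the string of I's ends in an acknowledgment.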

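Example - DA model

utt06 U: Where is it? (I)
utt07 U: Just a second (N)
utt08 U: I can't find the Rochester airport (I)
utt09 S: It's (N)
utt10 U: I think I have a disability with maps (I) (removed)
utt11 U: Have I ever told you that before (I)
utt12 S: It's located on brooks avenue (I)
utt13 U: Oh thank you (A)
utt14 S: Do you see it? (I)
utt15 U: Yes (A)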

Parsing Monroe Domain

Domain: Monroe Corpus of 20 transcriptions (Stent, 2001) of human subjects collaborating on Emergency Rescue 911 tasks

Each dialogue was at least 10 minutes long, and most were over 300 utterances long

Work presented here focuses on 5 of the dialogues (1756 utterances, 278 3rd person pronouns)

Goals: develop a corpus of sentences parsed with rich syntactic, semantic, discourse information to

Able to parse the 5-dialogue sub-corpus with 84% accuracy

For more details, see ACL Discourse Annotation ’04

TRIPS Parser

Broad-coverage, deep parser

Uses bottom-up algorithm with CFG and domain-independent ontology combined with a domain model

Flat, unscoped LF with events and labeled semantic roles based on FrameNet

Semantic information for noun phrases based on EuroWordNet

Parser information for Reference

Rich parser output is helpful for discourse annotation and reference resolution:
  Referring expressions identified (pronoun, NP, impros)
  Verb roles and temporal information (tense, aspect) identified
  Noun phrases have semantic information associated with them
  Speech act information (question, acknowledgment)
  Discourse markers (so, but)

Semi-automatic annotation increases reliability

Semantics Example: “an ambulance”

(TERM :VAR V213818
      :LF (A V213818 (:* LF::LAND-VEHICLE W::AMBULANCE)
             :INPUT (AN AMBULANCE))
      :SEM ($ F::PHYS-OBJ
              (SPATIAL-ABSTRACTION SPATIAL-POINT)
              (GROUP -)
              (MOBILITY LAND-MOVABLE)
              (FORM ENCLOSURE)
              (ORIGIN ARTIFACT)
              (OBJECT-FUNCTION VEHICLE)
              (INTENTIONAL -)
              (INFORMATION -)
              (CONTAINER (OR + -))
              (TRAJECTORY -)))

Reference Annotation

Annotated dialogues for reference with undergraduate researchers (created a Java tool: PronounTool)

Markables determined by LF terms

Identification numbers determined by the :VAR field of the LF term

Used a stand-off file to encode what each pronoun refers to (refers-to) and the relation between pronoun and antecedent (relation)

Post-processing phase assigns a unique identification number to coreference chains

Also annotated coreference between definite noun phrases

Reference Annotation

Used slightly modified MATE scheme; pronouns divided into the following types:
  IDENTITY (coreference) (278), includes set constructions (6)
  FUNCTIONAL (20)
  PROPOSITION/D.DEIXIS (41)
  ACTION/EVENT (22)
  INDEXICAL (417)
  EXPLETIVE (97)
  DIFFICULT (5)

LRC Algorithm

LRC: modified centering algorithm (Tetreault ’01) that does not use Cb or transitions, but keeps a Cf-list (history) for each utterance

While processing an utterance’s entities (left to right), push each entity onto Cf-list-new; for a pronoun p, attempt to resolve:
  Search through Cf-list-new (left to right), taking the first candidate that meets gender, agreement, binding, and semantic feature constraints.
  If none is found, search past utterances’ Cf-lists, starting from the previous utterance and working back to the beginning of the discourse.
  When p is resolved, push the pronoun, with semantic features from its antecedent, onto Cf-list-new.

For more details, see SemDial ’04
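The search order described above can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: it checks only gender and number, whereas the slide also lists binding and semantic feature constraints. The names `Entity`, `compatible` and `lrc_resolve` are hypothetical, as are the sample entities, and the sketch assumes unresolved pronouns are not added to the Cf-list (the slide does not say).

```python
from dataclasses import dataclass


@dataclass
class Entity:
    name: str
    gender: str  # "m", "f", or "n"
    number: str  # "sg" or "pl"
    is_pronoun: bool = False


def compatible(pronoun, candidate):
    """Simplified constraint check: gender and number only."""
    return (pronoun.gender == candidate.gender
            and pronoun.number == candidate.number)


def lrc_resolve(utterances):
    """utterances: list of entity lists, each in left-to-right order.

    Returns {(utterance index, position): antecedent name or None}.
    """
    history = []  # Cf-lists of earlier utterances, oldest first
    links = {}
    for u_idx, entities in enumerate(utterances):
        cf_new = []
        for pos, ent in enumerate(entities):
            if not ent.is_pronoun:
                cf_new.append(ent)
                continue
            # 1. Search the current utterance's Cf-list, left to right.
            antecedent = next((c for c in cf_new if compatible(ent, c)), None)
            # 2. Otherwise search earlier Cf-lists, most recent first.
            if antecedent is None:
                for cf_old in reversed(history):
                    antecedent = next(
                        (c for c in cf_old if compatible(ent, c)), None)
                    if antecedent is not None:
                        break
            links[(u_idx, pos)] = antecedent.name if antecedent else None
            # 3. Push the resolved pronoun with its antecedent's features.
            if antecedent is not None:
                cf_new.append(Entity(antecedent.name, antecedent.gender,
                                     antecedent.number))
        history.append(cf_new)
    return links


# Hypothetical three-utterance dialogue.
dialogue = [
    [Entity("the ambulance", "n", "sg"), Entity("the paramedics", "n", "pl")],
    [Entity("it", "n", "sg", True), Entity("the hospital", "n", "sg")],
    [Entity("they", "n", "pl", True)],
]
links = lrc_resolve(dialogue)
```

In this toy dialogue "it" resolves to the ambulance from the previous utterance, and "they" skips the singular entities of the second utterance to reach the paramedics in the first.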

LRC Algorithm with Structure Info

Augmented algorithm with extensions to handle QUD and E&S input

For QUD, at the start and end of processing an utterance, QUDs are started (pushed on a stack) or ended (entities are collapsed), so the Cf-list history changes

For E&S, each utterance is assigned a DA code and then removed or kept depending on the next utterance (if it is an acknowledgement, or a series of I’s)

Results

Metric   Baseline   QUD     E&S Auto   E&S Manual
+sem     67.9%      67.9%   64.7%      60.4%
-sem     61.5%      61.5%   60.1%      54.7%

Error Analysis

Though QUD and the +sem baseline performed the same (89 errors each), each got 3 pronouns right that the other did not

Baseline right, QUD wrong (3): collapsing nodes removed the correct antecedent

QUD right, baseline wrong (3): 2 associated with blocking off an aside, 1 associated with collapsing (intervening nodes blocked)

15 pronouns: both got them wrong, but made different predictions

Remaining 71: both made the same error
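As a quick consistency check, the error breakdown can be recomputed from the counts on the slides. The sketch assumes the evaluation set is the 278 third-person pronouns reported for the five dialogues; the variable names are illustrative.

```python
total_pronouns = 278    # assumed evaluation set: 3rd person pronouns in 5 dialogues
errors_per_system = 89  # errors for both the +sem baseline and QUD

only_other_right = 3       # wrong in this system, right in the other
both_wrong_different = 15  # both wrong, different predictions
both_wrong_same = 71       # both wrong, same prediction

# The three error categories should account for every error of one system.
assert only_other_right + both_wrong_different + both_wrong_same == errors_per_system

correct = total_pronouns - errors_per_system
accuracy = 100 * correct / total_pronouns
print(f"{correct} correct, {accuracy:.2f}% accuracy")
```

Under that assumption each system resolves 189 pronouns correctly, about 68%, which agrees with the 67.9% in the results table up to rounding.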

Issues

Structuring methods are probably more trouble than they are worth with the corpora available right now
  Also, they only affect a few pronouns

Segment ends are least reliable
  What constitutes an end?
  3 errors show either that boundaries are marked incorrectly, if pronouns are accessing elements in a “closed” DS
  Or perhaps the collapsing routine is too harsh

Small corpus size
  Hard to draw definite conclusions given only 3 criss-crossed errors
  Need more data for statistical evaluations

Issues

E&S model has the advantage over QUD of being easier to automate, but fares worse since it takes into account only a small window of utterances (extremely shallow)

QUD model can be semi-automated (detecting question starts is easy), but detecting ends and types is harder

QUD could definitely be improved by taking into account plan initiations and suggestions, instead of limiting to questions only, but tradeoff is reliability
