
Speaking while monitoring addressees for understanding

Torsten Jachmann

16.12.2013

Herbert H. Clark and Meredyth A. Krych

Seminar „Gaze as function of instructions - and vice versa“

Research Question
• Speaking and listening in dialog
o Unilateral: speakers and listeners act autonomously; no interaction
o Bilateral: speakers and listeners monitor their respective partners; joint activity
• What do speakers monitor? How do they use that information?

Grounding
• Level 1
o Attend to the vocalization
• Level 2
o Identify words, phrases and sentences
• Level 3
o Understand the meaning
• Level 4
o Consider answering

Grounding
A: Were you there when they erected the new signs?
B: Th… which new signs? (Level 3)
A: Little notice boards, indicating where you had to go for everything
B: No.
• Bilateral account

Monitoring
• Voices
o Attention to the partner's utterances
• Faces
o Gaze and facial expressions as indicators of understanding
• Workspaces
o Region in front of the body
o Manual gestures (but also games, etc.)

• Bodies
o Head and torso movements as indicators
• Shared scenes
o Scenery beyond the workspace
• Signals vs. symptoms
o Signals are constructed to get meaning across
o Symptoms are not intentionally created

Least joint effort
• Opportunistic
o Selection of the available methods that take the least effort to produce
• “Tailored”
o Overhearers (not monitored by the speaker) may misunderstand utterances

Method
• Pairs of directors and builders
o 76 students (34 male / 42 female)
• Instructions to build 10 simple Lego models
• 2 × 2 design (interactive)
o 28 pairs
• Additional non-interactive condition
o 10 pairs
• Video and audio analyses

Interactive
• Mixed design
o Workspace (between subjects): visible vs. invisible
o Faces (within subjects): visible vs. invisible
• No restrictions on time or talk

Non-interactive
• Only one condition
• Director records instructions
o No time or talk constraints
o Prototype can be examined as long as wanted before recording
• Builders listen to the instructions
o No constraints on actions
o Start, stop, rewind

Results
• Efficiency
• Turns
• Gestures and grounding
o Deictic expressions
o Gestures by addressees
o Cross-timing of actions
o Timing strategies
o Visual monitoring

Efficiency

• Visibility of workspace improves efficiency

Efficiency: non-interactive
• Building took much longer (245 s non-interactive vs. 183 s interactive)
• Strong drop in accuracy
o Inadequate instructions

Turns
• Builders take fewer SPOKEN turns when the workspace is visible

Deictic expressions
• Mostly unusable when the workspace is hidden
o Joint attention is needed
o Otherwise they can only refer to previously mentioned situations

Gestures by addressees
• Mostly accompanied by deictic utterances (if any)
• An explicit verdict is usually given only on such utterances (otherwise the addressee simply continues)

Cross-timing
• Gestural signals
o Reflect understanding at that moment
• Overlapping signals
o Usually not found in spoken dialog
o Start as soon as there is “sufficient information”
• Projecting
o Prediction of the following actions/instructions
• Initiation time
o Waiting until the partner is able to attend to the following utterance
• Timing of uptake
o Responses have to be timed exactly to the action and situation

Timing strategies
• Self-interruption
o Dealing with evidence from the addressee
o The interrupted utterance is usually not continued
• Collaborative references
o Deictic references rely on the addressee's actions

Visual monitoring
• Mainly used when the director runs into a problem
• Eye gaze as support

Conclusion
• Grounding is fundamental
• A visible workspace increases grounding speed
• In task-oriented dialogs, faces are not important
• Compensation is possible (but only if some form of monitoring is available)
• Updating common ground
• Increments are determined jointly
• Much evidence for the bilateral account
o Addressees signal their current understanding
o Speakers monitor addressees to update and change their utterances
• Opportunistic process
o Offering options
o Self-interruptions
o Waiting
o Instant revision
• Multimodal process
o Speech and gestures are combined if possible
o Speech alone takes more time

Remarks
• Gaze is only important for certain types of tasks
• Time measurements may be outdated (“old” study)
• No contradicting studies (to some extent common sense)

Gaze and Turn-Taking Behavior in Casual Conversation Interactions
Kristiina Jokinen, Hirohisa Furukawa, Masafumi Nishida and Seiichi Yamamoto

Differences

• Three-party dialogue

• No instructional task

• Stronger focus on eye gaze

Research Question
• How well can eye gaze help in predicting turn taking?
• What is the role of eye gaze when the speaker holds the turn?
• Is eye gaze as important in three-party dialogue as in two-party dialogue?

Hypothesis
• In group discussions, eye gaze is important in turn management (especially in turn-holding cases)
• The speaker is more influential than the other partners in coordinating interactions (selects the next speaker)

Method
• Three-person conversational eye gaze corpus
o Natural conversations
o Balanced familiarity (50% familiar; 50% unfamiliar)
o Balanced gender (male-only; female-only; mixed)
• 28 conversations among Japanese students in their early 20s, with three participants each
• Each conversation lasted about 10 minutes
• Eye gaze recorded for one participant
• Eye tracker fixed on the table to preserve naturalness

Method: used data
• Features estimated over the last 300 ms of an utterance, if the utterance is followed by a 500 ms pause
• Dialog acts
• Speech features
o F0 values, etc.
• Eye gaze
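The data-selection criterion above can be made concrete. The following Python sketch is a hypothetical reconstruction, not the authors' procedure: it assumes utterances are given as speaker labels with start/end times in integer milliseconds, reads “followed by a 500 ms pause” as at least 500 ms of silence from all participants after the utterance ends, and skips the final utterance because its following pause is unknown. The names `Utterance`, `following_pause` and `analysis_windows` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker: str
    start: int  # ms (assumed unit)
    end: int    # ms (assumed unit)


def following_pause(current, utterances):
    """Silence (ms) after `current` ends, or None if nobody speaks afterwards."""
    for other in utterances:
        # Someone else is still talking, or starts exactly now: no pause.
        if other is not current and other.start <= current.end < other.end:
            return 0
    later = [o.start for o in utterances
             if o is not current and o.start >= current.end]
    return min(later) - current.end if later else None


def analysis_windows(utterances, window_ms=300, min_pause_ms=500):
    """Return (speaker, window_start, window_end) for the final `window_ms`
    of every utterance followed by at least `min_pause_ms` of silence
    (assumed interpretation of the 500 ms pause criterion)."""
    windows = []
    for utt in utterances:
        pause = following_pause(utt, utterances)
        if pause is not None and pause >= min_pause_ms:
            # Clip so the window never starts before the utterance itself.
            windows.append((utt.speaker, max(utt.start, utt.end - window_ms), utt.end))
    return windows


if __name__ == "__main__":
    demo = [
        Utterance("A", 0, 2000),     # 600 ms silence afterwards: kept
        Utterance("B", 2600, 4000),  # only 200 ms silence: dropped
        Utterance("C", 4200, 5000),  # A still speaking at 5000: dropped
        Utterance("A", 4500, 7000),  # 1000 ms silence afterwards: kept
        Utterance("B", 8000, 8100),  # short utterance, window clipped
        Utterance("C", 9000, 9500),  # last utterance: pause unknown
    ]
    print(analysis_windows(demo))
    # → [('A', 1700, 2000), ('A', 6700, 7000), ('B', 8000, 8100)]
```

Measuring the pause against all participants, rather than only against the next utterance in start order, prevents an utterance from counting as “followed by a pause” while another participant is still talking.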

Results

Conclusion

• The speaker signals with eye gaze whether they intend to give the turn or to hold it
o Fixating a listener vs. focusing attention elsewhere
• Eye gaze is as important in multi-participant conversations as in two-participant conversations
• Eye gaze is used to select the next speaker (seems to be correct)
• The Japanese data may interfere with the value of the speech data
o Comparison study?
• Listeners focus on the speaker, not vice versa

Remarks
• Vague information and data presentation
o Although various data exist, the interaction of factors is not presented
o Some conclusions rely on the aforementioned point
• Setup only takes one participant into consideration
• Much of the data was unused
o Lacking in quality and in the way it was created
• Study is based on data collected for another study
o Setup is not optimal
• Realistic design
o Yet it contains biasing flaws (situation of the participants, only one eye tracker)

Comparison
• Clark and Krych present interesting ideas, but eye gaze is only rarely addressed
o How could this be altered?
• Jokinen et al. focus on eye gaze in a (more or less) natural situation but are lacking in scientific results and setup
o Which points and ideas of this setup could be beneficial?
