Speaking while monitoring addressees for understanding
Torsten Jachmann
16.12.2013
Herbert H. Clark and Meredyth A. Krych
Seminar „Gaze as function of instructions - and vice versa“
Research Question
• Speaking and listening in dialog
  o Unilateral
    • Speakers and listeners act autonomously
    • No interaction
  o Bilateral
    • Speakers and listeners monitor their respective partner
    • Joint activity
What do speakers monitor? How do they use that information?
Grounding
• Level 1
  o Attend to vocalization
• Level 2
  o Identify words, phrases and sentences
• Level 3
  o Understand the meaning
• Level 4
  o Consider answering
Grounding
A: Were you there when they erected the new signs?
B: Th… which new signs? (Level 3)
A: Little notice boards, indicating where you had to go for everything
B: No.
Bilateral account
Monitoring
• Voices
  o Attendance to the partner's utterances
• Faces
  o Gaze and facial expressions as indicators of understanding
• Workspaces
  o Region in front of the body
  o Manual gestures (but also games, etc.)
Monitoring
• Bodies
  o Head and torso movement as indicator
• Shared Scenes
  o Scenery beyond workspace
• Signals vs. Symptoms
  o Signals are constructed to get meaning across
  o Symptoms are not intentionally created
Least joint effort
• Opportunistic
  o Selection of the available methods that take the least effort to produce
• “Tailored”
  o Overhearers (not monitored by speaker) may misunderstand utterances
Method
• Pairs of directors and builders
  o 76 students (34 male / 42 female)
• Instructions to build 10 simple Lego models
• 2 x 2 design (interactive)
  o 28 pairs
• Additional non-interactive condition
  o 10 pairs
• Video and audio analyses
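The participant numbers on this slide can be cross-checked with a few lines of arithmetic. This is only a consistency check on the figures quoted above; the variable names are made up for illustration.

```python
# Figures taken from the Method slide.
interactive_pairs = 28
non_interactive_pairs = 10
students_reported = 76
male, female = 34, 42

# Each pair consists of one director and one builder.
total_pairs = interactive_pairs + non_interactive_pairs
students_from_pairs = total_pairs * 2

print(total_pairs, students_from_pairs, male + female)
```

Both the pair counts and the gender split add up to the reported 76 students.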
Interactive
• Mixed design
  o Workspace (between subject)
    • Visible
    • Invisible
  o Faces (within subject)
    • Visible
    • Invisible
• No restrictions on time or talk
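A minimal sketch of how the 2 x 2 layout above could be written down, assuming the usual reading of "between subject" (each pair gets one workspace level) and "within subject" (each pair goes through both face levels). The names `workspace_levels`, `face_levels` and `conditions_for_pair` are illustrative and not taken from the study.

```python
from itertools import product

workspace_levels = ["visible", "invisible"]  # between-subject factor
face_levels = ["visible", "invisible"]       # within-subject factor

# All four cells of the 2 x 2 design as (workspace, faces) tuples.
cells = list(product(workspace_levels, face_levels))


def conditions_for_pair(workspace):
    """Conditions one pair goes through: a fixed workspace level, both face levels."""
    return [(workspace, faces) for faces in face_levels]


for workspace in workspace_levels:
    print(workspace, conditions_for_pair(workspace))
```

How many of the 28 interactive pairs were assigned to each workspace level is not stated on the slides, so the sketch leaves that open.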
Non-interactive
• Only one condition
• Director records instructions
  o No time or talk constraints
  o Prototype can be examined as long as wanted before recording
• Builders listen to instructions
  o No constraints on actions
    • Start, stop, rewind
Results
• Efficiency
• Turns
• Gestures and grounding
  o Deictic expressions
  o Gestures by addressees
  o Cross-timing of actions
  o Timing strategies
  o Visual monitoring
Efficiency
• Visibility of workspace improves efficiency
Efficiency
Non-interactive
• Time needed to build much longer (245 s non-interactive vs. 183 s interactive)
• Strong drop in accuracy
  o Inadequate instructions
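To put the two building times quoted above into proportion, the difference can be worked out directly. The sketch uses only the two figures from the slide; what exactly they average over is not stated here.

```python
# Building times quoted on the slide, in seconds.
non_interactive_s = 245
interactive_s = 183

extra_s = non_interactive_s - interactive_s   # absolute difference
relative_increase = extra_s / interactive_s   # relative to the interactive condition

print(f"{extra_s} s longer, {relative_increase:.1%} more time")
# prints "62 s longer, 33.9% more time"
```

In other words, builders working from recorded instructions needed roughly a third more time than builders in the interactive conditions.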
Turns
• Fewer SPOKEN turns of builder when workspace is visible
Deictic expressions
• Mainly unusable when workspace hidden
  o Joint attention needed
  o Can only refer to a previously mentioned situation
Gestures by addressees
• Mostly accompanied by deictic utterances (if any)
• Explicit verdict usually only on such utterances (otherwise continuing)
Cross-timing
• Gestural signals
  o Reflect understanding at that moment
Cross-timing
• Overlapping signals
  o Usually not in spoken dialog
  o Start with “sufficient information”
Cross-timing
• Projecting
  o Prediction of following actions/instructions
Cross-timing
• Initiation time
  o Waiting for partner to be able to attend to the following utterance
Cross-timing
• Time uptake
  o Responses have to be timed exactly to the action and situation
Timing strategies
• Self-interruption
  o Dealing with evidence from the addressee
  o Usually not continued
Timing strategies
• Collaborative references
  o Deictic references rely on the addressee's actions
Visual monitoring
• Mainly used when director reaches a problem
• Eye gaze as support
Conclusion
• Grounding is fundamental
• Visible workspace enhances grounding speed
• In task-oriented dialogs faces are not important
• Compensation possible (only if any monitoring is available)
Conclusion
• Updating common ground
• Increments are determined jointly
• Much evidence for bilateral account
  o Addressees provide statement about current understanding
  o Speakers monitor to update and change utterances
Conclusion
• Opportunistic process
  o Offering options
  o Self-interruptions
  o Waiting
  o Instant revision
• Multi-modal process
  o Speech and gestures are combined if possible
  o Speech alone takes more time
Remarks
• Gaze only important for certain types of tasks
• Measurement of time may be outdated (“old” study)
• No contradicting studies (to some extent common sense)
Gaze and Turn-Taking Behavior in Casual Conversation Interactions
Kristiina Jokinen, Hirohisa Furukawa, Masafumi Nishida and Seiichi Yamamoto
Differences
• Three-party dialogue
• No instructional task
• Stronger focus on eye gaze
Research Question
• How well can eye gaze help in predicting turn taking?
• What is the role of eye gaze when the speaker holds the turn?
• Is the role of eye gaze as important in three-party dialogs as in two-party dialogs?
Hypothesis
• In group discussions, eye gaze is important in turn management (especially in turn holding cases)
• The speaker is more influential than the other partners in coordinating interactions (selects the next speaker)
Method
• Three-person conversational eye gaze corpus
  o Natural conversations
  o Balanced familiarity (50% familiar; 50% unfamiliar)
  o Balanced gender (male-only; female-only; mixed)
Method
• 28 conversations among Japanese students in their early 20s with three participants each
• Each conversation about 10 minutes
• Eye gaze recorded for one participant
Method
• Eye tracker fixed on the table to preserve naturalness
Method
Used data
• Estimated at the last 300 ms of an utterance if followed by a 500 ms pause
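The selection rule above can be sketched as a small filter over time-stamped utterances. This is an illustration under assumptions, not the authors' pipeline: "followed by a 500 ms pause" is read here as at least 500 ms of silence before the next utterance starts, the final utterance is skipped because no following pause can be measured, and the `Utterance` type and `gaze_windows` function are hypothetical names.

```python
from dataclasses import dataclass

WINDOW_MS = 300     # analysis window at the end of an utterance (from the slide)
MIN_PAUSE_MS = 500  # pause that has to follow the utterance (from the slide)


@dataclass
class Utterance:
    speaker: str
    start_ms: int
    end_ms: int


def gaze_windows(utterances):
    """Return (utterance, window_start_ms, window_end_ms) for qualifying utterances."""
    ordered = sorted(utterances, key=lambda u: u.start_ms)
    windows = []
    # Pair each utterance with the one that follows it in time.
    for current, following in zip(ordered, ordered[1:]):
        pause_ms = following.start_ms - current.end_ms
        if pause_ms >= MIN_PAUSE_MS:
            # Clip the window so it never starts before the utterance itself.
            window_start = max(current.start_ms, current.end_ms - WINDOW_MS)
            windows.append((current, window_start, current.end_ms))
    return windows


example = [
    Utterance("A", 0, 1000),     # followed by a 600 ms pause: kept
    Utterance("B", 1600, 2500),  # followed by a 200 ms pause: dropped
    Utterance("C", 2700, 3000),  # last utterance: no following pause, dropped
]
for utt, start, end in gaze_windows(example):
    print(utt.speaker, start, end)
```

Gaze samples falling inside each returned window would then be the ones used to estimate where the participant was looking at the end of the utterance.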
Used data
• Dialog acts
• Speech features
  o Values of F0, etc.
• Eye gaze
Results
Conclusion
• Speaker uses eye gaze to signal whether he intends to give the turn or hold it
  o Fixating the listener vs. focusing attention elsewhere
• Eye gaze in multi-participant conversation as important as in two-participant conversations
Conclusion
• Eye gaze is used to select next speaker (seems to be correct)
• Maybe Japanese data interferes with value of speech data
  o Comparison study?
• Listeners focus on speaker not vice versa
Remarks
• Vague information and data presentation
  o Although various data exists, interaction of factors is not presented
  o Some conclusions rely on the aforementioned point
• Setup only takes one participant into consideration
• Much of the data was unused
  o Lack in quality and way of creation
Remarks
• Study is based on data for another study
  o Setup is not optimal
• Realistic design
  o Yet, contains biasing flaws (situation of the participants, only one eye tracker)
Comparison
• Clark and Krych present interesting ideas but eye gaze is only rarely handled
  o How could this be altered?
• Jokinen et al. focus on eye gaze in a (more or less) natural situation but lack in scientific results and setup
  o What points and ideas of this setup could be beneficial?