11/26/2008
Human-Computer Interaction
CSG 170 – Round 10
Prof. Timothy Bickmore
Quiz
Quiz – background
Discourse Analysis Notation
One utterance per line.
Brackets represent overlapped behavior
  1. A: Did you really [mean that]?
  2. B: [yes]
Parentheses used for nonverbal behavior and other notations
  1. A: (eyebrow raise) Really?
  2. B: (inaudible)
Ellipsis ‘…’ used for pauses
Quiz
Characterize the following 15 seconds of dialog
Diane (on left) is a real estate agent, Beth & Ned Flood are new clients.
Video will be played 10 times
You will then have 5 minutes to analyze & write up your results
Hint: think about each communicative modality separately, starting with speech
Human Dialogue
Hierarchical Structure of Language
[Figure: levels of linguistic analysis and the structures they operate on]
Analyses: Phonological, Morphological, Lexical, Syntactic, Semantic, Pragmatic
Structures: Sounds, Phonemes, Morphemes, Words, Syntactic Structures, Representation Structures, Communicative Intentions
Discourse, Dialogue & NVB are all pragmatic concerns
Intimately depend on context
Discourse is concerned with
  The combinatorial meaning of utterances (internal context)
  Other contextual phenomena (external context)
    Deixis
    Social deixis
    Grounding & referring
    Etc.
Conversational behavior vs. function
Eyebrow raise is a conversational behavior
Emphasis is a conversational function
There is a many-to-many mapping between behaviors and functions
[Figure: behaviors (Eyebrow raise, BEAT gesture, Intonation pitch peak) linked to functions (Emphasis, Affective display)]
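The many-to-many mapping can be made concrete with a pair of lookup tables. This is only an illustrative sketch: the behavior and function names come from the slides, but the specific pairings are assumed for the example (the arrows in the original figure are not recoverable), and `functions_of` and `behaviors_for` are hypothetical helper names.

```python
from collections import defaultdict

# (conversational behavior, conversational function) pairs.
# Names are from the slides; the pairings themselves are assumed.
PAIRS = [
    ("eyebrow raise", "emphasis"),
    ("eyebrow raise", "affective display"),
    ("beat gesture", "emphasis"),
    ("intonation pitch peak", "emphasis"),
]


def build_indexes(pairs):
    """Return (behavior -> functions, function -> behaviors) lookup tables."""
    by_behavior = defaultdict(set)
    by_function = defaultdict(set)
    for behavior, function in pairs:
        by_behavior[behavior].add(function)
        by_function[function].add(behavior)
    return by_behavior, by_function


BY_BEHAVIOR, BY_FUNCTION = build_indexes(PAIRS)


def functions_of(behavior):
    """All functions that one observed behavior might be serving."""
    return sorted(BY_BEHAVIOR.get(behavior, ()))


def behaviors_for(function):
    """All behaviors that could be used to realize one function."""
    return sorted(BY_FUNCTION.get(function, ()))
```

An analyst (or a conversational agent) reads the table in both directions: interpreting an observed eyebrow raise means choosing among `functions_of("eyebrow raise")`, while generating emphasis means choosing among `behaviors_for("emphasis")`.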
Types of Conversational Function
Propositional
Interactional
Affective
Attitudinal
Relational
…
Proxemics
Engagement & disengagement
Social distance
Immediacy behavior
Eye gaze
Attention
Deictic
Turn-taking
Eyebrows
Emphasis
Affective displays
Headnods
Emphasis
Greetings
Backchannels
Acknowledgements
Hand gesture
Form (from McNeill)
  Beat
  Deictic
  Iconic
  Metaphoric
Function
  Emphasis
  Propositional/Semantic
  Turn-taking / interruption
Turn-taking
Interlocutors cannot talk at once
Cues for ‘giving turn’
  Gaze at next speaker
  Pause
  Rising end intonation
Cues for ‘taking turn’
  Speaking
  Gesturing
REA: Embodied Conversational Agent
Grounding
Process by which interlocutors come to a shared understanding of what is said
A collaborative process
Mechanisms
  Requests for acknowledgement
  Acknowledgements
    Can be contingent move
  Request for repair
  Repair
Discourse understanding
How does hearer understand?
[Figure: Speaker, Communicative behavior, Hearer, Discourse Context]
Assumptions
• Physical context
• Language abilities
• “Commonsense”
• Cultural context
• Gricean maxims (quantity, quality, relevance, etc)
• etc
Specifications for updating discourse context given assumptions
Speech act theory
Our utterances are actions
  Bless, Vow, Curse
  Assert, Promise, Request, Reply, etc.
Locutionary act (uttering the words)
Illocutionary act (speech act)
Perlocutionary act (ultimate effect)
e.g. “It’s cold in here.”
Other phenomena
Anaphora / Cataphora
  John ate an apple. He liked it.
  If you need one, there’s a towel in the top drawer.
Ellipsis
  Anything wrong?
Representations of Dialog as used in dialog systems
State transition networks
  Hierarchical
  ATNs
Rule-based
  Chunks of dialog are represented by a partially-ordered sequence of states
    e.g. adjacency pairs
  Chunks are negotiated and deployed dynamically at runtime
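A state transition network can be written down as a table from (state, dialog act) to next state. The sketch below is a minimal illustration, not any particular system's representation: the state names, act labels, and the `run` helper are all hypothetical, and the two adjacency pairs (greeting/greeting, question/answer) are chosen only as familiar examples.

```python
# Each state maps an allowed dialog act to the state that follows it.
# All state and act names here are hypothetical.
NETWORK = {
    "START": {"greeting": "GREETED"},
    "GREETED": {"greeting": "OPEN"},        # adjacency pair: greeting / greeting
    "OPEN": {"question": "AWAIT_ANSWER", "farewell": "END"},
    "AWAIT_ANSWER": {"answer": "OPEN"},     # adjacency pair: question / answer
}


def run(network, acts, state="START"):
    """Follow the network over a sequence of dialog acts.

    Returns the list of states visited, and raises ValueError if an act
    is not allowed in the current state.
    """
    path = [state]
    for act in acts:
        transitions = network.get(state, {})
        if act not in transitions:
            raise ValueError(f"act {act!r} not allowed in state {state!r}")
        state = transitions[act]
        path.append(state)
    return path
```

A hierarchical network would let one state stand for a whole sub-network; that is omitted here to keep the sketch short.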
Grosz & Sidner ’86
Attention, Intentions, and the Structure of Discourse
• Discourse can be partitioned into discourse segments
• The segments have a hierarchical structure
• This structure is dynamically determined
• Interlocutors must maintain a stack representing where they are in the hierarchy
• This stack is used to efficiently generate and resolve references
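The stack idea in the bullets above can be sketched in a few lines. This is a deliberate simplification under stated assumptions: each open segment holds only a label-to-referent table, whereas the focus spaces in the original theory carry more than that. The `FocusStack` class and the entity labels are hypothetical; the segment names are borrowed from the COLLAGEN example in this lecture.

```python
class FocusStack:
    """Stack of open discourse segments and the entities each made salient."""

    def __init__(self):
        self._segments = []  # list of (segment name, {label: referent})

    def push(self, name):
        """Open a new segment nested inside the current one."""
        self._segments.append((name, {}))

    def pop(self):
        """Close the current segment and return its name."""
        return self._segments.pop()[0]

    def mention(self, label, referent):
        """Record an entity in the segment currently in focus."""
        self._segments[-1][1][label] = referent

    def resolve(self, label):
        """Search from the innermost open segment outward; None if not found."""
        for _name, entities in reversed(self._segments):
            if label in entities:
                return entities[label]
        return None

    def open_segments(self):
        """Names of the segments still open, outermost first."""
        return [name for name, _ in self._segments]
```

For example, after pushing `DiscussPreviousDay`, mentioning a step count, and then pushing `ShowGraph`, a reference to the step count still resolves because its segment is open. Once `ShowGraph` is popped, anything mentioned only inside it is no longer reachable, which is how the stack keeps reference resolution cheap.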
Example Grosz & Sidner / COLLAGEN
[Figure: three linked panels; numbered markers 1-3 tie items across the panels]
Intentional Structure
  Opening, DiscussPreviousDay, DiscussNextDay, Closing
  EvaluateExercise, Ask Steps, ShowGraph, DiscussGraph
Attentional State
  DiscussPreviousDay
  EvaluateExercise
Linguistic Structure
  Interacting about exercise.
  Done opening interaction.
  Discussing previous day.
  Done client identifying steps walked yesterday as 1000.
    Coach says “How many steps did you walk yesterday?”
    Client says “1000.”
  Coach showing graph for the week.
    Coach says “Here’s how you’ve done this week.”
  Next expecting coach to show graph for the week.
  Expecting to discuss graph.
  Expecting to discuss next day.
  Expecting to close interaction.
State of the art – Interactive Voice Response (IVR) systems
Current status of IVR systems
Speech Recognition: accuracy of 85-90%+
Speech Synthesis: high quality for short utterances
Dialog Management: strongly-managed dialog flow
Language Processing: Very limited
Language Generation: Very trivial
Current IVR Apps
Directory assistance
Taxi bookings
Stock transactions
Remote banking
Travel reservations
Auto-attendants
Pizza ordering
And…
State of the Art: Scripted Interactions
Scripts written by teams of experts
Represented as flow charts or Augmented transition networks
Implemented as state machines / ATNs / VXML
[Figure: flow chart of the "GetCommitment" script, with states GC_START, GC_1 through GC_22 and GC_END, and links to the MotivateDuration and MotivateToExercise scripts]
Agent prompts shown in the chart:
  Are you going to get some [more] exercise today?
  Are you going to workout tomorrow?
  What kind of exercise are you going to do?
  (TEXTENTRY) What kind of exercise?
  Great. How much aerobic exercise do you plan to do?
  Great. How long do you plan to go for?
  How long do you plan to play for?
  X again?
  Which one?
  Where are you going to walk?
  Are you going to (location X) again?
  Are you going to go with X again?
  Are you going to go with anyone?
  Who?
  Do you think you can go for X minutes?
  Do you think you can increase your time a little today/tomorrow?
  Can you keep up the same time as yesterday/etc?
  You shouldn't try to do so much so soon... How about X minutes this time?
  OK, but you should try to increase gradually..
  Is it because of your illness/injury?
  Great.
User responses shown in the chart:
  I'm going to workout at the gym.
  I'm going to go for a walk.
  I'm going to play a sport.
  Something else.
  Yes / Yep / No / OK / I can't / No, I really want to.
Branch conditions shown in the chart:
  (time2bed<2), (time2bed > 2)
  (below exp), (at exp), (above exp)
  if REL & know location, if REL & know sport, (if REL & know buddy), (else)
  (loner)
  (no illness/injury), (illness|injury)
Commercial & Research Tools Available
Intervoice InVision Studio
Also:
  Telera DeVXchange Appbuilder
  Audium Audium3
  VoiceGenie GenieBuilder
What is it?
Language for specifying voice dialogs
Output: Prerecorded audio and text-to-speech (TTS)
Input: Touch-tone keys and Automatic Speech Recognition (ASR)
Extension of XML
Designed to interact with web-based applications
History
1995
  Phone Web project by AT&T Research
1999
  Lucent and AT&T have incompatible dialects of Phone Markup Language
  So, VoiceXML Forum created with AT&T, Lucent, Motorola, and IBM
  Team develops VoiceXML 0.9, a first pass at standardization
2000
  VoiceXML 1.0 was created and submitted to World-Wide Web Consortium (W3C)
2001
  VoiceXML 2.0 by W3C’s Voice Browser Working Group
Big Picture: The phone as a web browser
VXML is a kind of XML
Tags and body
  <cmu>
    <welcome>Welcome to CMU!</welcome>
    <ecom>
      <welcome>Welcome to the E-commerce!</welcome>
    </ecom>
  </cmu>
Zero or more attributes
  <welcome accent="texan">Welcome</welcome>
  <welcome accent="pittsburgh">Welcome</welcome>
Tag with no body
  <breath/>
A simple example
  <?xml version="1.0" encoding="UTF-8"?>
  <vxml version="2.1">
    <form>
      <field name="goingtoBeach" type="boolean">
        <prompt>
          “Are you going to Daytona Beach this year?”
        </prompt>
        <filled>
          Ohh...
          <if cond="goingtoBeach">
            That’s great!
            <goto next="goingDocument.vxml"/>
          <else/>
            bummer maybe next year.
            <goto next="notgoingDocument.vxml"/>
          </if>
        </filled>
      </field>
    </form>
  </vxml>
The slides repeat this listing three more times, each with a different caption:
  ASR grammar & TTS prompt (the type="boolean" field and its <prompt>)
  Execution conditioned on ASR recognition (the <filled> block)
  if/else with goto (the <if>, <else/> and <goto> elements)
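Since VXML is XML, a page can be checked for well-formedness with any XML parser before it is uploaded to a voice platform. The sketch below uses Python's standard `xml.etree.ElementTree`; the `summarize` helper is hypothetical, and the check covers XML well-formedness only, not whether a given voice browser will accept the document.

```python
import xml.etree.ElementTree as ET

# The listing from the slide, with straight quotes so that it parses.
VXML = b"""<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1">
  <form>
    <field name="goingtoBeach" type="boolean">
      <prompt>Are you going to Daytona Beach this year?</prompt>
      <filled>
        Ohh...
        <if cond="goingtoBeach">
          That's great!
          <goto next="goingDocument.vxml"/>
        <else/>
          bummer maybe next year.
          <goto next="notgoingDocument.vxml"/>
        </if>
      </filled>
    </field>
  </form>
</vxml>
"""


def summarize(vxml_bytes):
    """Parse a VXML document and list its fields and goto targets.

    Raises xml.etree.ElementTree.ParseError if the input is not well-formed.
    """
    root = ET.fromstring(vxml_bytes)
    fields = [(f.get("name"), f.get("type")) for f in root.iter("field")]
    targets = [g.get("next") for g in root.iter("goto")]
    return fields, targets
```

Curly quotes around attribute values, as they appear in the extracted slide text, make the parser raise `ParseError`, so this kind of check catches copy-and-paste damage early.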
Custom Input Grammar
  <form>
    <field name="month">
      <prompt>…</prompt>
      <grammar src="month.grxml"/>
    </field>
  </form>
Many formats for grammars, both ASR and DTMF.
Recorded audio instead of TTS
  <prompt>
    Hello! Thanks for calling.
    <audio src="welcome.wav"/>
  </prompt>
Some error handling
  <noinput>
    I did not hear anything. Please try again.
    <reprompt/>
  </noinput>
  <nomatch>
    I did not recognize that character. Please try again.
    <reprompt/>
  </nomatch>
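The control flow behind these two handlers can be simulated outside a voice browser. The sketch below is an assumption-laden illustration, not how any VXML interpreter is implemented: `collect_field` and `yes_no` are hypothetical helpers, silence is modeled as `None`, and the `max_attempts` cutoff is added for the example (the handlers on the slide simply reprompt).

```python
NOINPUT_MSG = "I did not hear anything. Please try again."
NOMATCH_MSG = "I did not recognize that character. Please try again."


def yes_no(heard):
    """Toy recognizer: True/False for yes/no, None when nothing matches."""
    return {"yes": True, "no": False}.get(heard.strip().lower())


def collect_field(prompt, recognize, inputs, max_attempts=3):
    """Simulate filling one field from a sequence of caller inputs.

    Returns (value, said) where `said` is everything the system spoke.
    A value of None means the field was never filled.
    """
    said = [prompt]
    for attempt, heard in enumerate(inputs, start=1):
        if heard is None:                 # plays the role of <noinput>
            said.append(NOINPUT_MSG)
        else:
            value = recognize(heard)
            if value is not None:         # field filled, stop prompting
                return value, said
            said.append(NOMATCH_MSG)      # plays the role of <nomatch>
        if attempt >= max_attempts:
            break
        said.append(prompt)               # plays the role of <reprompt/>
    return None, said
```

Feeding it silence, then an unrecognized word, then "yes" produces the prompt three times with one message of each kind in between, and returns `True`.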
Evaluation of IVRs
PARADISE
  Decision-theoretic framework to derive a single metric that takes system accuracy, efficiency and subjective factors into account
Check lists
  TRINDI
  DISC
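To give a feel for what a single combined metric looks like, the sketch below scores each dialog as a weighted task-success term minus weighted cost terms, with every measure z-normalized first so that different units can be combined. This follows the general shape of the PARADISE performance function as commonly described, but treat it as an assumption-laden illustration: the function names, the example cost name, and all weights are hypothetical. In PARADISE the weights are estimated by regressing user-satisfaction ratings on the measures, and that step is not shown here.

```python
from statistics import mean, pstdev


def z_normalize(values):
    """Rescale to mean 0 and standard deviation 1 (constant input gives zeros)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def performance(success, costs, alpha, weights):
    """Per-dialog score: alpha * N(success) - sum_i weights[i] * N(costs[i]).

    success: one task-success value per dialog
    costs:   dict mapping a cost name to one value per dialog
    weights: dict mapping the same cost names to their weights
    """
    n_success = z_normalize(success)
    n_costs = {name: z_normalize(vals) for name, vals in costs.items()}
    scores = []
    for d in range(len(success)):
        penalty = sum(weights[name] * n_costs[name][d] for name in costs)
        scores.append(alpha * n_success[d] - penalty)
    return scores
```

With two dialogs, one successful in 10 turns and one failed in 20 turns, the first dialog scores higher on both terms, so its combined score is the larger of the two.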
“TRINDI Tick List”
(A subset)
Q1: Is Utterance Interpretation Sensitive to Context?
S: When do you want to arrive?
U: Tomorrow.
S: You want to arrive on Thursday October 31st.
Context = Time, Space, Person, etc.
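The Q1 exchange only works if the system interprets "Tomorrow" against the current date. A minimal sketch of that step is below; `resolve_day` and `confirm` are hypothetical helpers, and the year 2002 in the usage note is an assumption chosen only because October 31st falls on a Thursday that year.

```python
from datetime import date, timedelta

RELATIVE_DAYS = {"today": 0, "tomorrow": 1}
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")
MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def resolve_day(utterance, now):
    """Map a relative day expression to a date, given the current date.

    Returns None when the expression is not one this sketch knows.
    """
    key = utterance.strip().rstrip(".").lower()
    offset = RELATIVE_DAYS.get(key)
    if offset is None:
        return None
    return now + timedelta(days=offset)


def confirm(day):
    """Build the system's confirmation from the resolved date."""
    weekday = WEEKDAYS[day.weekday()]
    month = MONTHS[day.month - 1]
    return f"You want to arrive on {weekday} {month} {day.day}."
```

Calling `resolve_day("Tomorrow.", date(2002, 10, 30))` yields 31 October 2002, and `confirm` turns that into the Thursday confirmation. The same utterance on a different day resolves to a different date, which is exactly the context sensitivity Q1 asks about.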
Q2: Is Over-answering Possible?
S: Where do you want to go to?
U: Boston at 3pm.
Q3: Can Answers be for Unasked Questions?
S: Where do you want to go?
U: I'm leaving New York on the 29th June.
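Q2 and Q3 both come down to whether the dialog manager accepts information it did not just ask for. One common way to allow this is a form with named slots that is updated from whatever the user supplied. The sketch below assumes a separate understanding step has already turned the utterance into slot/value pairs; the slot names and the `update` and `next_question` helpers are hypothetical.

```python
SLOT_ORDER = ["origin", "destination", "date", "time"]


def new_form():
    """An empty form: every slot starts unfilled."""
    return {slot: None for slot in SLOT_ORDER}


def update(form, parsed):
    """Merge every recognized slot into the form, asked for or not.

    Pairs whose slot name is not part of the form are ignored.
    """
    for slot, value in parsed.items():
        if slot in form:
            form[slot] = value
    return form


def next_question(form):
    """Name of the first slot still empty, or None when the form is complete."""
    for slot in SLOT_ORDER:
        if form[slot] is None:
            return slot
    return None
```

If the system asks for the destination and the user says "Boston at 3pm", both `destination` and `time` are filled (over-answering). If the user then says "I'm leaving New York on the 29th June", `origin` and `date` are filled even though neither was asked for, and `next_question` reports that nothing is left to ask.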
Q4: Can Answers be Underinformative?
S: On what date do you want to go?
U: Sometime next week.
S: I need to know which day next week.
U: Tuesday.
Q5: Can the User use Ambiguous Designators?
S: Where do you want to go?
U: London.
S: London Heathrow or London Gatwick?
U: Heathrow.
Q6: Can Answers Provide Negative Information?
S: When do you want to leave?
U: Not before lunchtime.
Q7: Can Help be Asked For in Appropriate Ways?
S: What meal do you want?
U: What are the choices?
S: Standard, vegetarian, Kosher and Halal.
U: Halal, please.
Q8: Can the User Initiate Subdialogs?
S: Where do you want to go?
U: Are there any economy flights to Boston today?
S: Yes.
U: I want to go to Boston today and New York on Thursday.
Q9: Can the System Reformulate an Utterance?
S: Do you want to fly cabin class?
U: I don't understand.
S: There are two classes of seats, business and cabin; cabin is also known as economy class. Do you want to fly cabin class?
Additional questions
Q10 – Can the system deal with inconsistent information?
Q11 – Can the system deal with belief revision?
Q12 – Can the system deal with no answer to a question at all?
Q13 – Can the system repeat an utterance on request?
Q14 – Does the system make it explicitly clear that it is not a human?
Q15 – Can the system keep track of multiple entities (e.g., routes) at the same time?
Homework I7 – Part I
Evaluate a commercial IVR system
Call 1-800-FANDANGO
Characterize the dialog for checking the times of a particular movie at a particular theatre as a hierarchical STN. Start with the top level and work down. No more than 30 states.
Evaluate the TRINDI tick list items (yes or no) presented in class.
EMAIL your results.
Homework I7 – Part II
Write Voice XML
Get the Voxeo 'helloworld' VXML tutorial application working. This involves signing up for a Voxeo developer account.
NOTE: host your vxml pages on your own web server (not Voxeo's).
Write a new VXML application to tell 'knock knock' jokes
As many as the user wants to hear - repeating after 3 is OK.
To do
Read
  Dix Ch 19 - CSCW
  3 CSCW papers
Homework I8: call 1-800-FANDANGO & VXML
T7 - software prototype
  Address feedback from heuristic evaluations
    assign each a severity rating (cosmetic, minor, major, catastrophic)
    brainstorm possible solutions
  Modify your system to correct as many of the problems found as possible (in priority order)
    document how you do this.
Get started on final report
  CHI format
Papers: Speech Interfaces
Similarity is More Important than Expertise: Accent Effects in Speech Interfaces - CHI'07 [Tracy]
Dealing with System Response Times in Interactive Speech Applications - CHI'05 [Bhavna]
Shaping User Input in Speech Graffiti: a First Pass – CHI’06 [Arpit]