Audiovisual Attentive User Interfaces

Attending to the needs and actions of the user

Paulina Modlitba
T-121.900 Seminar on User Interfaces and Usability

What is an Attentive User Interface? (1/2)

• Negotiate the timing and volume of communication with the user

• Use specific input, output and turn-taking techniques to determine what task, device or person a user is attending to

• Detect the user’s presence, orientation, speech activity and gaze, and statistically model attention and interaction

• Four characteristic components
– visual attention
– turn-taking techniques
– modeling techniques for attention
– focus and context displays and visualisation

• Dürsteler (2003)

What is an Attentive User Interface? (2/2)

Why are they needed?

• Roel Vertegaal (2003)

• Multiple ubiquitous computing devices lead to growing demands on users’ attention

• Metaphor: modern traffic light system
– Sensors
– Statistical models of traffic volume
– Peripheral displays (traffic lights)

• The disruptive effect of interruptions can be avoided
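The traffic-light metaphor can be illustrated with a small sketch: a gate that delivers a message immediately when the user is free ("green"), holds it while the user is attending to something else ("red"), and releases the held messages, most urgent first, once attention becomes available again. The class names, the integer priority scale and the `urgent_threshold` rule are hypothetical; in a real attentive interface the busy flag would come from sensors and statistical models of attention rather than being set by hand.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Message:
    priority: int                        # lower number = more urgent
    text: str = field(compare=False)     # ordering uses priority only


class AttentiveNotifier:
    """Hypothetical 'traffic light' gate between applications and the user."""

    def __init__(self, urgent_threshold: int = 0) -> None:
        self.urgent_threshold = urgent_threshold  # priorities <= this always pass
        self.user_busy = False                    # in a real AUI: set from sensors
        self._held: list[Message] = []            # min-heap of held messages

    def set_user_busy(self, busy: bool) -> list[str]:
        """Update the attention estimate; on 'green', release held messages."""
        self.user_busy = busy
        if busy:
            return []
        released = []
        while self._held:
            released.append(heapq.heappop(self._held).text)  # most urgent first
        return released

    def notify(self, msg: Message) -> list[str]:
        """Deliver immediately if the user is free or the message is urgent."""
        if not self.user_busy or msg.priority <= self.urgent_threshold:
            return [msg.text]
        heapq.heappush(self._held, msg)
        return []
```

For example, while the user is busy a priority-2 message is held, a priority-0 message passes straight through, and `set_user_busy(False)` returns the held messages in priority order.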

Evolution of human-machine interaction

• 1960s-1980s: many-one
• 1980s-1990s: one-one
• 1990s-2000s: one-many
• 2000s-2010s: many-many

Visual attention

• Eye-gaze tracking: detecting the user’s visual focus of attention

• Eye trackers typically operate by directing an infrared light source toward the user’s eye

• Provides information about the context

• Central I/O channel in communication

• Limitations in existing hardware/software

• Biological limitations

Reasons for implementing gaze tracking

• Kaur et al. (2003)

• The gaze location is the only reliable predictor of the locus of visual attention

• Gaze can be used as a “natural” mode of input that avoids the need for learned hand-eye coordination

• Gaze selection of screen objects is expected to be significantly faster than selection by traditional hand-eye coordination (e.g. with a mouse)

• Gaze allows for hands-free interaction

Current issues

• Limited size of the fovea (1-3° of visual angle)

• Subconscious eye movements

• Eyes are not control organs (Zhai et al., 2003)

• No natural analogy to current input devices, e.g. mouse

• Gaze is always active (Kaur et al., 2003)
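The fovea limitation translates directly into how precisely gaze can point. An angle θ seen from viewing distance d covers an on-screen extent of s = 2·d·tan(θ/2). The sketch below performs this conversion and expresses the result in pixels; the 60 cm viewing distance and the 96 dpi display are illustrative assumptions, not values from the source.

```python
import math


def visual_angle_to_mm(angle_deg: float, distance_mm: float) -> float:
    """On-screen extent (mm) subtended by angle_deg at the given viewing distance."""
    return 2 * distance_mm * math.tan(math.radians(angle_deg) / 2)


def mm_to_px(size_mm: float, dpi: float) -> float:
    """Convert millimetres to pixels for a display with the given resolution."""
    return size_mm / 25.4 * dpi


for angle in (1.0, 3.0):
    mm = visual_angle_to_mm(angle, distance_mm=600)   # assumed 60 cm distance
    px = mm_to_px(mm, dpi=96)                         # assumed 96 dpi display
    print(f"{angle:.0f} deg -> {mm:.1f} mm, about {px:.0f} px")
```

Under these assumptions the 1-3° range corresponds to roughly 10-31 mm, or about 40-120 pixels, so targets much smaller than that are hard to select by gaze alone.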

Current state

• Eye-gaze control is used as an additional input channel

• Provides context to the action

• Combined with manual input, gaze tracking can improve the robustness and reliability of a system

EASE Chinese Input (1/2)

• Zhai et al. (2002)

• Supports pinyin typing
– the official Chinese phonetic alphabet, based on Roman letters
– many Chinese characters are homophones: each syllable corresponds to several characters
– when the user types the pinyin of a character, a number of possible characters with the same pronunciation are displayed

• Normally, the user chooses a character by pressing a number on the keyboard

• With EASE, the user only has to press the spacebar as soon as he or she sees the desired character in the list

• The system then selects the character closest to the user’s current gaze location
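The EASE selection step reduces to a nearest-neighbour choice: when the spacebar is pressed, pick the displayed candidate nearest to the current gaze point. The sketch assumes the interface knows the screen coordinates of each candidate; the function name, the coordinates and the candidate row for the pinyin "ma" are illustrative.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def select_by_gaze(candidates: Sequence[Tuple[str, Point]], gaze: Point) -> str:
    """Return the candidate character displayed closest to the gaze point.

    Intended to be called when the user presses the spacebar.
    """
    if not candidates:
        raise ValueError("no candidate characters are displayed")
    char, _pos = min(candidates, key=lambda c: math.dist(c[1], gaze))
    return char


# Hypothetical candidate row for the pinyin "ma", laid out left to right.
candidates = [("妈", (100, 300)), ("麻", (140, 300)), ("马", (180, 300)),
              ("骂", (220, 300)), ("吗", (260, 300))]
print(select_by_gaze(candidates, gaze=(183, 306)))
```

Because the gaze only has to land nearer to the wanted character than to its neighbours, the candidate spacing, rather than the tracker's absolute accuracy, sets the required precision.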

EASE Chinese Input (2/2)

Speech recognition (1/2)

• Limited technology, despite extensive research and progress

• Crucial issues
– the error rate of speech recognition engines and how these errors can be reduced
– the effort required to port speech technology applications between different application domains or languages (Deng & Huang, 2004)

• Three directions for enhancing the technique
– improving microphone ergonomics to enhance the signal-to-noise ratio
– equipping speech recognizers with the ability to learn and to correct errors
– adding semantic (meaning) and pragmatic (application context) knowledge (Deng & Huang, 2004)
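The error rate of a recognizer is commonly measured as the word error rate (WER): the number of word substitutions S, deletions D and insertions I needed to turn the recognizer's output into the reference transcript, divided by the number of reference words N, i.e. WER = (S + D + I) / N. The sketch computes it with a word-level edit distance; this is the standard textbook formulation, not a method taken from Deng & Huang (2004).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: (S + D + I) / N, via a word-level edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] = edits needed to turn the reference words seen so far
    # into the first j hypothesis words (dynamic programming, row by row).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: reference word missing
                curr[j - 1] + 1,         # insertion: extra hypothesis word
                prev[j - 1] + (r != h),  # substitution (or match if equal)
            )
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("turn on the kitchen light", "turn of the light"))
```

Here one substitution ("on" heard as "of") and one deletion ("kitchen") give a WER of 2/5 = 0.4.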

Speech recognition (2/2)

Multimodal interfaces

• Can provide more natural human-machine interaction

• Can improve the robustness of the interaction by using redundant or complementary information

• Today: usually gaze or speech combined with manual control (e.g. a mouse)

• Future: gaze + speech, or gaze or speech alone
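One simple way a multimodal interface can exploit redundant information is late fusion: each channel scores the candidate referents independently, and the scores are then combined. The sketch multiplies the speech and gaze scores and renormalises, which assumes the channels are conditionally independent; the function name, the floor value for candidates one channel did not report, and the example scores are all hypothetical.

```python
def fuse(speech: dict[str, float], gaze: dict[str, float],
         floor: float = 1e-3) -> dict[str, float]:
    """Late fusion of two channels' scores over candidate referents.

    Multiplies the per-candidate scores (treating the channels as conditionally
    independent) and renormalises so the result sums to 1. A candidate that one
    channel did not report gets the small `floor` score instead of zero, so a
    single channel cannot veto it outright.
    """
    candidates = set(speech) | set(gaze)
    joint = {c: speech.get(c, floor) * gaze.get(c, floor) for c in candidates}
    total = sum(joint.values())
    if total == 0:
        raise ValueError("all fused scores are zero")
    return {c: p / total for c, p in joint.items()}


# Speech cannot separate "salt" from "malt"; gaze says the user looks at the salt.
scores = fuse({"salt": 0.5, "malt": 0.5}, {"salt": 0.8, "pepper": 0.2})
print(max(scores, key=scores.get))
```

With these numbers the ambiguity in the speech channel is resolved by gaze: the fused score for "salt" ends up close to 1.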

Main issue

• Shumin Zhai (2003)

• “We need to design unobtrusive, transparent and subtle turn-taking processes that coordinate attentive input with the user’s explicit input in order to contribute to the user’s goal without the burden of explicit dialogues.”

Manual and Gaze Input Cascaded (MAGIC) Pointing

• An interaction technique that uses eye movement to assist the manual cursor control task

• Zhai et al. (1999) constructed two MAGIC pointing techniques: one liberal and one conservative

Liberal approach (1/2)

• The cursor is warped to every new object that the user looks at

• The user can then manually take control of the cursor near (or on) the target, or ignore it and search for the next target

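• A gaze position counts as a new target when it is more than a set distance (e.g. 120 pixels) from the current cursor position

• Trade-off: pro-active (the cursor waits readily near each potential target) but overactive (merely looking at an object is enough to move the cursor)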
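The liberal behaviour can be sketched as a small state machine over fixation events and mouse movements. The class and method names are hypothetical, and fixations are assumed to arrive already detected by the eye tracker; the 120-pixel threshold is the example value given above.

```python
import math
from typing import Optional, Tuple

Point = Tuple[float, float]


class LiberalMagicPointer:
    """Sketch of liberal MAGIC pointing: warp the cursor to each new gaze target."""

    def __init__(self, cursor: Point, threshold_px: float = 120.0) -> None:
        self.cursor = cursor
        self.threshold_px = threshold_px  # minimum jump that counts as a new target

    def on_fixation(self, gaze: Point) -> Optional[Point]:
        """Warp only if the fixation is far enough away to be a new target."""
        if math.dist(gaze, self.cursor) > self.threshold_px:
            self.cursor = gaze
            return self.cursor
        return None  # near the cursor already: leave fine positioning to the hand

    def on_manual_move(self, dx: float, dy: float) -> Point:
        """The user takes over with the mouse for the final, precise movement."""
        self.cursor = (self.cursor[0] + dx, self.cursor[1] + dy)
        return self.cursor
```

The threshold is what keeps small gaze shifts around the current target from moving the cursor, while every larger shift still triggers a warp, which is exactly the "overactive" side of the trade-off.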

Liberal approach (2/2)

Conservative approach (1/2)

• Warps the cursor to a target only after the manual input device has been actuated

• Once moved, the cursor appears in motion towards the target

• Hence, the cursor never jumps directly to a target that the user does not intend to acquire

• May be slower than the liberal approach
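A sketch of the conservative rule: nothing happens on gaze alone, and when the manual device starts to move, the cursor is placed a short distance before the gaze point along the direction of the hand movement, so that continuing the movement carries it onto the target. The function name, the placement rule and the 30-pixel offset are simplifying assumptions for illustration; Zhai et al. (1999) describe the behaviour, not this exact formula.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def conservative_warp(cursor: Point, gaze: Point, manual_dir: Tuple[float, float],
                      offset_px: float = 30.0) -> Point:
    """Return the cursor position after the manual device has been actuated.

    With no manual movement (zero direction) the cursor stays put: gaze alone
    never moves it. Otherwise the cursor is placed offset_px before the gaze
    point along the hand's direction of motion, so it appears to be moving
    toward the target instead of jumping onto it.
    """
    length = math.hypot(*manual_dir)
    if length == 0:
        return cursor
    ux, uy = manual_dir[0] / length, manual_dir[1] / length
    return (gaze[0] - ux * offset_px, gaze[1] - uy * offset_px)
```

The extra gating on `manual_dir` is what makes this variant slower but safer than the liberal one: the cursor only moves once the user has signalled intent with the hand.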

Conservative approach (2/2)

EyeCOOK

• Bradbury et al. (2003)

• Multimodal attentive cookbook that helps users unaccustomed to computers cook a meal

• The user interacts with the eyeCOOK system using eye-gaze and speech commands

• The system responds visually and verbally

• The user can say “this” in a command, and the system substitutes the object of the user’s gaze for the word

• If the user’s gaze cannot be tracked by the eyeCOOK system, the user has to specify the target verbally
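The substitution of "this" and the verbal fallback can be sketched as a small resolver between the speech recognizer and the command handler. The function name, the whitespace tokenisation and the example recipe term are hypothetical; the actual eyeCOOK command grammar is not described here.

```python
from typing import Optional


def resolve_command(utterance: str, gaze_target: Optional[str]) -> str:
    """Substitute the gazed-at object for the word "this" in a spoken command.

    Raises LookupError when "this" is used but no gaze target is available,
    so the caller can ask the user to name the target verbally instead.
    """
    words = utterance.split()
    if not any(w.lower() == "this" for w in words):
        return utterance  # the target was already named explicitly
    if gaze_target is None:
        raise LookupError("gaze not tracked; please name the item")
    return " ".join(gaze_target if w.lower() == "this" else w for w in words)


print(resolve_command("define this", gaze_target="braise"))
```

Raising an error rather than guessing keeps the fallback explicit: when the tracker has lost the user's eyes, the system asks instead of acting on a stale gaze target.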

EyeCOOK in Page Display Mode

GAZE-2

• Vertegaal et al. (2003)

• A new group video conferencing system that uses gaze-controlled cameras to convey eye contact

• Consists of a video tunnel that makes it possible to place cameras behind the participant images on the screen

• The system automatically directs the video cameras in this tunnel using a gaze tracker, selecting the camera closest to the user’s current focus of attention (gaze location)

GAZE-2 system structure

3D rendering

• The 2D video images of the participants are displayed in a 3D virtual meeting room and are automatically rotated to face the participant each user is looking at.

• In the picture below, everyone is looking at the person on the left, whose image is broadcast at a higher resolution.
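The automatic rotation amounts to computing, for each participant, the yaw angle that turns their video panel toward the panel of the person they are looking at. The sketch assumes panels stand on a 2D floor plane with coordinates (x, z) and that a yaw of 0° faces +z; the seat layout, names and functions are hypothetical.

```python
import math

Point2 = tuple[float, float]  # (x, z) position of a panel on the room's floor plane


def facing_yaw_deg(source: Point2, target: Point2) -> float:
    """Yaw in degrees about the vertical axis that turns a panel at `source`
    toward `target`; 0 deg faces +z, positive angles turn toward +x."""
    dx, dz = target[0] - source[0], target[1] - source[1]
    return math.degrees(math.atan2(dx, dz))


def panel_orientations(seats: dict[str, Point2],
                       looking_at: dict[str, str]) -> dict[str, float]:
    """Rotate every participant's panel toward whoever that participant looks at."""
    return {who: facing_yaw_deg(seats[who], seats[target])
            for who, target in looking_at.items()}


seats = {"A": (-1.0, 1.0), "B": (0.0, 2.0), "C": (1.0, 1.0)}
# B and C both look at A; A looks at B.
print(panel_orientations(seats, {"A": "B", "B": "A", "C": "A"}))
```

Recomputing these angles whenever the gaze tracker reports a new focus of attention is enough to keep each panel turned toward the participant it is addressing.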

Turn-taking in video conferencing

• Misunderstandings cause interruptions

• Eye contact plays an important role in turn-taking (Vertegaal et al., 2003)

References

• Bradbury et al. (2003)
• Deng & Huang (2004)
• Dürsteler (2003)
• Kaur et al. (2003)
• Vertegaal (2003)
• Vertegaal et al. (2003)
• Zhai (2003)
• Zhai et al. (1999)
• Zhai et al. (2002)

Things missing

• Are attentive user interfaces better at following the user in order to "capture his/her context" and take proactive actions for him/her, or are they better used as input devices (the approach you take)?

• The distinction between explicit and implicit input, as presented by Horvitz (you can find a link from the seminar homepage), is thus important here and could benefit you.

• Please bring some real-world examples of prototypes and real situations to your presentation. This makes the idea easier to grasp and the argument more concrete. You might consider presenting other application ideas as well as the ones already in the paper.

• I think you would benefit from considering in more detail, for each particular application, why attention and preferences are tracked and how they might be combined effectively to minimize disruption and make interaction more fluent. Binding the presentation more tightly to the "let's make interruptions go away" theme of the seminar is important here.

• Consequently, in the presentation it would be nice to see your analysis of "how things were" and "how things are" (now with AUIs).

Oulasvirta

• Attention
• Working memory
• Long-term memory
• Task resumption
• Control
• Trust
• Stress
• Social interaction
