From Head Pose to Focus of Attention - Rainer Stiefelhagen

From Head Pose to Focus of Attention

Overview

• Motivation
• Face Tracking
• Head Pose Tracking
– A model-based approach
– An appearance-based approach
• From Head Pose to Focus of Attention
• Some Applications
• Outlook

Why Track Focus of Attention?

• FoA in Meetings:
– To determine the addressee of a speech act
– To determine to whom a person is listening
– To understand the dynamics of interaction
– For meeting indexing / retrieval

• FoA for Context-Aware Interaction:
– To know what a user is interacting with (TV, fridge, household robot, …?)
– To know whether a user is aware of something

Gaze and Attention

• Gaze is a good indicator of attention:
– We look at people to whom we talk or listen
– We look at objects that are of interest to us
– We look at objects to which we refer
– Looking at someone is perceived as a signal of attention

• Eye gaze is difficult to track …
– with a changing number of users at changing locations

• Head orientation is a sufficient cue in many cases

Tracking of Human Faces

• A face provides different functions:
– identification
– perception of emotional expressions

• Human-Computer Interaction requires tracking of faces:
– lip-reading
– gaze tracking
– facial action analysis / synthesis

• Video conferencing / video telephony applications:
– tracking the speaker
– achieving low bit rate transmission

Color Based Face Tracking

Human skin colors:
• cluster in a small area of a color space
• skin colors of different people mainly differ in intensity!
• variance can be reduced by color normalization
• distribution can be characterized by a Gaussian model

Chromatic colors:

$$r = \frac{R}{R + G + B}, \qquad g = \frac{G}{R + G + B}$$
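
The normalization and the Gaussian model can be sketched in a few lines of Python. This is a minimal illustration, not the tracker from the talk: the function names, the training pixels, and the small regularization constant are assumptions made for the example.

```python
import math

def chromatic(r, g, b):
    """Map an (R, G, B) pixel to chromatic (r, g) coordinates."""
    total = r + g + b
    if total == 0:                      # black pixel: chromaticity is undefined
        return (1 / 3, 1 / 3)
    return (r / total, g / total)

def fit_gaussian(samples, eps=1e-6):
    """Fit mean and covariance of a 2-D Gaussian to chromatic samples.

    eps is a hypothetical regularizer that keeps the covariance invertible.
    """
    n = len(samples)
    mr = sum(s[0] for s in samples) / n
    mg = sum(s[1] for s in samples) / n
    srr = sum((s[0] - mr) ** 2 for s in samples) / n + eps
    sgg = sum((s[1] - mg) ** 2 for s in samples) / n + eps
    srg = sum((s[0] - mr) * (s[1] - mg) for s in samples) / n
    return (mr, mg), (srr, srg, sgg)

def skin_likelihood(pixel, mean, cov):
    """Evaluate the 2-D Gaussian density at the chromatic color of a pixel."""
    cr, cg = chromatic(*pixel)
    dr, dg = cr - mean[0], cg - mean[1]
    srr, srg, sgg = cov
    det = srr * sgg - srg * srg
    d2 = (sgg * dr * dr - 2 * srg * dr * dg + srr * dg * dg) / det
    return math.exp(-0.5 * d2) / (2 * math.pi * math.sqrt(det))

# Made-up skin pixels, used only to have something to fit.
training = [(200, 120, 100), (180, 110, 90), (220, 140, 120),
            (150, 95, 80), (190, 125, 100)]
mean, cov = fit_gaussian([chromatic(*p) for p in training])

skin_score = skin_likelihood((210, 130, 110), mean, cov)
blue_score = skin_likelihood((50, 80, 200), mean, cov)
```

Because both coordinates are divided by R + G + B, the same color at half the brightness maps to the same (r, g) point, which is why the normalization removes most of the intensity variation between people.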

Color Model

Advantages:
• very fast
• orientation invariant
• stable object representation
• not person-dependent
• model parameters can be quickly adapted

Disadvantages:
• environment dependent (light sources heavily affect color distribution)

Tracking Gaze and Focus of Attention

• In meetings:
– to determine the addressee of a speech act
– to track the participants' attention
– to analyse who was in the center of focus
– for meeting indexing / retrieval

• Interactive rooms
– to guide the environment's focus to the right application
– to suppress unwanted responses

• Virtual collaborative workspaces (CSCW)
• Human-Robot Cooperation
• Cars (driver monitoring)

Tracking a User’s Focus of Attention

• Focus of Attention tracking:
– To detect a person’s interest
– To know what a user is interacting with
– To understand his actions/intentions
– To know whether a user is aware of something

• In meetings:
– to determine the addressee of a speech act
– to understand the dynamics of interaction
– for meeting indexing / retrieval

• Other areas
– Smart environments
– Video-conferencing
– Human-Robot Interaction

Attention during Social Interaction

• “Looking Means Listening!” (Ruusuvuori 2000)

• Attentional signals:
– Eye gaze
– Head orientation
– Body posture
– Gestures

• Head orientation is a good cue to predict focus of attention!

• Eye-gaze is difficult to track …

Typical Eye-Gaze Trackers

• Head-mounted systems
– user is allowed to move (limited by wires, etc.)
– accurate head pose and eye gaze (< 1 degree error)
– fast (> 60 Hz possible)

• Very intrusive!

Eye Gaze Trackers

Remote Trackers:

• User's movements are very limited
• Problems with head rotations

Eye Gaze Tracking

• Problems:
– Requires the user to wear head gear
– or the position of the user relative to the camera is rather fixed
– Certainly not acceptable for everyday use in multimodal rooms

Head Pose Estimation

• Model-based approaches:
– Locate and track a number of facial features
– Compute head pose from 2D to 3D correspondences (Gee & Cipolla '94, Stiefelhagen et al. '96, Jebara & Pentland '97, Toyama '98)

• Example-based approaches:
– estimate new pose with a function approximator (such as an ANN) (Beymer et al. '94, Schiele & Waibel '95, Rae & Ritter '98)
– use a face database to encode images (Pentland et al. '94)

Model-based Head Pose estimation

[Figure: image, 3D model and real world (axes X, Y, Z); stages: feature tracking, pose estimation]

• Find correspondences between points in a 3D model and points in the image

• Iteratively solve a linear equation system to find the pose parameters (rx, ry, rz, tx, ty, tz)

Real-Time Facial Feature Tracking

Top-down approach:
• searching for the face with the face tracker
• searching and tracking of 6 facial feature points: pupils, nostrils and lip corners

Real-Time Facial Feature Tracking

Searching pupils:
• searching for 2 dark regions
• anthropometric constraints restrict the search area within the found face
• iterative, to adapt to different lighting conditions

Apply iterative thresholding on the grayscale image
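
One way to read "iterative thresholding" is to raise a darkness threshold step by step until exactly two dark blobs appear in the search area. The sketch below follows that reading; the step size, the 4-connected flood fill, and the `max_row_gap` check (a crude stand-in for the anthropometric constraints) are assumptions, not details given in the talk.

```python
def dark_regions(image, threshold):
    """Return 4-connected regions of pixels darker than threshold."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] >= threshold:
                continue
            stack, region = [(r, c)], []
            seen[r][c] = True
            while stack:                       # iterative flood fill
                y, x = stack.pop()
                region.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and image[ny][nx] < threshold):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            regions.append(region)
    return regions

def centroid(region):
    """Mean (row, col) position of a region."""
    n = len(region)
    return (sum(p[0] for p in region) / n, sum(p[1] for p in region) / n)

def find_pupils(image, start=10, stop=250, step=10, max_row_gap=2):
    """Raise the threshold until two dark regions on roughly one row appear."""
    for threshold in range(start, stop, step):
        regions = dark_regions(image, threshold)
        if len(regions) == 2:
            left, right = sorted((centroid(r) for r in regions),
                                 key=lambda p: p[1])
            if abs(left[0] - right[0]) <= max_row_gap:
                return threshold, [left, right]
    return None

# Toy eye region: bright skin with two dark pupil pixels.
face = [[200] * 10 for _ in range(6)]
face[2][2] = 30
face[2][7] = 40
result = find_pupils(face)
```

Starting from a low threshold and increasing it means the search adapts to the image at hand: a darker scene simply reaches the two-region condition at a lower threshold.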

Real-Time Facial Feature Tracking

Searching lip corners:
• predict search window, using eye position and face model
• find vertical position of the line between the lips (using horizontal integral projection)
• find horizontal boundaries using edge detection and vertical integral projection

Searching nostrils:
• search area restricted to area between eyes and lips
• same technique as used to find eyes
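
The two integral projections used for the lip corners can be illustrated as follows. This is a simplified sketch: the three-row band around the lip line, the simple difference operator used as edge detector, and the `edge_threshold` value are assumptions chosen for the example.

```python
def horizontal_projection(window):
    """Sum of gray values along each row."""
    return [sum(row) for row in window]

def vertical_projection(window):
    """Sum of gray values along each column."""
    return [sum(col) for col in zip(*window)]

def lip_line_row(window):
    """The dark line between the lips is the row with the smallest sum."""
    projection = horizontal_projection(window)
    return min(range(len(projection)), key=projection.__getitem__)

def lip_corners(window, row, edge_threshold=100):
    """Find the left and right end columns of the lip line."""
    top = max(0, row - 1)
    bottom = min(len(window) - 1, row + 1)
    band = window[top:bottom + 1]
    # Horizontal gradient magnitude; entry x is the step between columns x and x + 1.
    edges = [[abs(r[x + 1] - r[x]) for x in range(len(r) - 1)] for r in band]
    projection = vertical_projection(edges)
    strong = [x for x, value in enumerate(projection) if value >= edge_threshold]
    if not strong:
        return None
    return strong[0] + 1, strong[-1]

# Toy mouth window: bright skin with a dark lip line in row 2, columns 2 to 5.
mouth = [[200] * 8 for _ in range(5)]
for x in range(2, 6):
    mouth[2][x] = 50

row = lip_line_row(mouth)
corners = lip_corners(mouth, row)
```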

Demo / Video

Demo: Facial Feature Tracking

Demo: Model-based Head Pose

System at Daimler Chrysler

DA A. King, System from Tilo Schwarz (Daimler-Chrysler, Ulm)

Model-based Head Pose

• Pose estimation accuracy depends on correct feature localization!

• Problems:
– Choice of good features
– Occlusion due to strong head rotation
– Fast head movement
– Detection of tracking failure / re-initialization
– Requires good image resolution

• Video

Estimating Head Pose with ANNs

• Train neural network to estimate head orientation

• Preprocessed image of the face used as input

Network Architecture

• Input retina: up to 3 x 20x30 pixels = 1,800 units
• Hidden layer: 40 to 150 units
• Output: pan (tilt)
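
To make the layer sizes concrete, here is a minimal forward pass for a network of this shape. Only the sizes (1,800 inputs, 40 hidden units, one network per angle) come from the slide; the tanh and sigmoid activations, the single output unit scaled to a range of -90 to +90 degrees, and the random untrained weights are assumptions for illustration.

```python
import math
import random

class PoseNet:
    """Three-layer perceptron: input retina -> hidden layer -> one angle."""

    def __init__(self, n_in=3 * 20 * 30, n_hidden=40, seed=0):
        rng = random.Random(seed)
        scale_in = 1 / math.sqrt(n_in)
        scale_hidden = 1 / math.sqrt(n_hidden)
        # Untrained random weights; a real system would learn these.
        self.w1 = [[rng.uniform(-scale_in, scale_in) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-scale_hidden, scale_hidden)
                   for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, retina):
        """Return an angle in degrees, squashed into (-90, 90)."""
        hidden = [math.tanh(sum(w * x for w, x in zip(row, retina)) + b)
                  for row, b in zip(self.w1, self.b1)]
        z = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return 180.0 / (1.0 + math.exp(-z)) - 90.0

# One network estimates pan; a second one of the same shape would estimate tilt.
pan_net = PoseNet()
retina = [0.5] * (3 * 20 * 30)          # placeholder for a preprocessed face
pan = pan_net.forward(retina)
```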

Image Preprocessing

• Automatic extraction of faces
– using a skin-color model

• Preprocessing
a) Histogram normalization of grayscale images
b) Extracting horizontal and vertical edges
– Down-sampling to 20x30 pixels

Histogram Transformation

[Figure: possible target functions; gray-value distribution; accumulated gray values]

• Mapping of gray values to a fixed distribution
• expands the range of intensities in the window
• compensates for different camera gains
• improves contrast (for gray levels near histogram maxima)

Visual Preprocessing

Gray-value modification - example histogram:

$$f'(p) = T(f(p))$$

f(p): original gray value
T: modification function
f'(p): new gray value
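
A common choice for the modification function T is histogram equalization, where T is built from the accumulated gray-value histogram so that the output approximates a uniform distribution. The sketch below uses that choice as an assumption; the talk mentions several possible target functions and does not say which one was used.

```python
def equalize(image, levels=256):
    """Apply f'(p) = T(f(p)) with T derived from the accumulated histogram."""
    hist = [0] * levels
    for row in image:
        for value in row:
            hist[value] += 1
    total = sum(hist)

    # Accumulated (cumulative) histogram, normalized to the range 0..1.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running / total)

    # Modification function T as a lookup table over all gray levels.
    T = [round((levels - 1) * c) for c in cdf]
    return [[T[value] for value in row] for row in image]

# Low-contrast toy image: all values lie between 100 and 120.
dull = [[100, 100],
        [110, 120]]
stretched = equalize(dull)
```

After the mapping, the brightest input value lands on 255 and the order of gray values is preserved, so a narrow input range is spread across the full intensity range.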

Estimating Head Pose with ANNs

• Estimate head pan and tilt from the facial image
• Data collection:
– perspective views of users are collected with a pano-cam
– labels from a magnetic field pose tracker
– data collected from fourteen users, at four positions around the table

ANN: Training / Results

• Training on 6100 images from 12 users (+ mirrored images)
• Cross-evaluation on 750 images from the same 12 users
• Tested on 750 images from these users
• Additional user-independent test set: 1500 images from two new users

Average error in degrees for pan / tilt:

         training set   test set    new users
histo    4.8 / 3.6      5.5 / 4.1   10.4 / 9.3
edges    4.3 / 2.9      5.6 / 3.5   12.2 / 10.3
both     2.1 / 2.1      3.1 / 2.5    9.5 / 9.8

FoA Tracking in Meetings

How:
1. Track all participants' faces
2. Estimate their head orientations (ANNs)
3. Map head orientations onto likely targets (e.g. the other participants)

Why:
– to determine the addressee of a speech act
– to track the participants' attention
– to analyse who was in the center of focus
– for meeting indexing / retrieval, ...

Focus of Attention Tracking in Meetings

Tracking People in a Panoramic View

[Figure: camera view, panoramic view and perspective view]

Demo

Modeling Focus from Head Orientation

$$P(\mathrm{Focus}_S = T \mid x) = \frac{p(x \mid \mathrm{Focus}_S = T)\, P(\mathrm{Focus}_S = T)}{p(x)}$$

x: head pan in degrees, T ∈ {Person 1, Person 2, …, Person M}

• Head orientation is a strong indicator of social attention
• Eye gaze is difficult to measure

Estimate a person's focus of attention based on his head orientation
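
The posterior over focus targets is a direct application of Bayes' rule and can be sketched as below. The Gaussian class-conditionals, their means and widths, and the priors are illustrative values only; they are not taken from the talk.

```python
import math

def gaussian(x, mean, std):
    """1-D Gaussian density."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def focus_posterior(x, targets):
    """Return P(Focus = T | x) for every target T.

    targets maps a target name to (mean pan, std, prior), so that
    p(x | Focus = T) is a Gaussian and P(Focus = T) is the prior.
    """
    joint = {name: gaussian(x, mean, std) * prior
             for name, (mean, std, prior) in targets.items()}
    evidence = sum(joint.values())          # p(x), the normalizer
    return {name: value / evidence for name, value in joint.items()}

# Hypothetical layout: three other participants seen at -45, 0 and +45 degrees.
targets = {
    "Person 1": (-45.0, 12.0, 0.3),
    "Person 2": (0.0, 12.0, 0.4),
    "Person 3": (45.0, 12.0, 0.3),
}
posterior = focus_posterior(-30.0, targets)
most_likely = max(posterior, key=posterior.get)
```

Dividing by p(x) only rescales the values so they sum to one; the most likely target is already determined by the product of class-conditional and prior.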

Head Pan Distributions p(x|Fi)

[Figure: head pan distributions for Person 1, Person 2 and Person 3]

• Head pan distributions are dependent on
– personal head-turning “styles”
– location of targets

• Distributions should be adapted for
– each person
– each meeting

Adaptation of the Model

[Figure panels: all observations p(x); class-conditional distributions p(x|F); posteriors P(F|x)]

• p(x) is modeled as a mixture of Gaussians:

$$p(x) = \sum_{j=1}^{M} p(x \mid j)\, P(j)$$

• Model parameters are found using an EM approach
• Individual Gaussian components are used as class-conditionals
• Priors of the mixture model P(j) are used as focus priors P(F = T)
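
The unsupervised adaptation can be sketched with a plain EM loop for a 1-D Gaussian mixture. This is a generic textbook EM, not the author's implementation: the quantile-based initialization, the fixed iteration count, the numerical guards, and the synthetic head-pan data are all assumptions.

```python
import math
import random

def gaussian(x, mean, std):
    """1-D Gaussian density."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def fit_mixture(xs, m, iterations=50):
    """Fit p(x) = sum_j p(x|j) P(j) with M = m Gaussian components by EM."""
    ordered = sorted(xs)
    n = len(xs)
    # Initialization: means at evenly spaced quantiles, equal widths and priors.
    means = [ordered[int((j + 0.5) * n / m)] for j in range(m)]
    spread = (ordered[-1] - ordered[0]) / (2 * m) or 1.0
    stds = [spread] * m
    priors = [1.0 / m] * m

    for _ in range(iterations):
        # E-step: responsibility of component j for observation x.
        resp = []
        for x in xs:
            joint = [priors[j] * gaussian(x, means[j], stds[j]) for j in range(m)]
            total = sum(joint) or 1e-300
            resp.append([value / total for value in joint])
        # M-step: re-estimate mean, width and prior of every component.
        for j in range(m):
            nj = max(sum(r[j] for r in resp), 1e-12)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            stds[j] = max(math.sqrt(var), 1e-3)
            priors[j] = nj / n
    return means, stds, priors

# Synthetic head-pan observations: two targets at about -40 and +35 degrees.
rng = random.Random(0)
pans = ([rng.gauss(-40.0, 8.0) for _ in range(300)]
        + [rng.gauss(35.0, 8.0) for _ in range(200)])
means, stds, priors = fit_mixture(pans, m=2)
```

Each fitted component then plays the role of p(x | F) for one target and its mixture weight plays the role of the focus prior, so no labeled focus data is needed for the adaptation.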

Focus based on Head Orientation

Percentage of correctly assigned focus targets based on computing P(Focus | Head pan):

                              P(Focus | Gaze)
Meeting A (4 participants)    68.8 %
Meeting B (4 participants)    73.4 %
Meeting C (4 participants)    79.5 %
Meeting D (4 participants)    69.8 %
Avg.                          72.9 %

• Results on four meetings
– 4 participants in each meeting
– Participants automatically detected and tracked
– Unsupervised adaptation of model parameters

Head Orientation and Eye Gaze in Meetings

• Head orientation and eye gaze point in the same direction 87 % of the time

• Head orientation accounted for 69 % of the overall horizontal gaze direction

• Focus can be predicted based on head orientation in 88.7 % of the frames (using our model ...)

Experiment: Four people in a meeting, head orientation and eye-gaze measured using ISCAN head-eye tracker

Head Orientation and Eye Gaze in Meetings (2)

[Figure: histograms of line of sight and head orientation, and plot of horizontal orientation of head, eye and line of sight, for Person 1 and Person 2]

Head Orientation and Eye Gaze in Meetings

• Four people in a meeting, head orientation and eye-gaze measured using ISCAN head-eye tracker

Head Orientation and Eye Gaze in Meetings (3)

• Focus can be predicted based on head orientation in 88.7 % of the frames (using our model ...)

[Figure: histograms of line of sight and head orientation for Person 1 and Person 2]

Contribution of Head Orientation

Subject   #Frames   Eye blinks   Same direction   Head contribution
1         36003     25 %         83 %             62 %
2         35994     23 %         80 %             53 %
3         38071     19 %         92 %             64 %
4         35991     20 %         93 %             97 %
Average             22 %         87 %             69 %

Focus from Head Orientation

Adapt mixture model to head-orientation data.

Use components as class-conditionals.

P(Target | Head Pan)

Focus Detection Results

Subject   Accuracy
1         86 %
2         83 %
3         93 %
4         93 %
Average   89 %

Predicting Focus from Sound

• Idea: People tend to look at the speaker
– To signal attention
– To speech-read
– To monitor gestures, facial expressions, etc.

• Use information about who is speaking to estimate where each participant is looking:

P(Focus | “Audio”)

Predicting Focus from Sound (2)

• Finding P(F | A_t) by counting focus targets on training data for each “speaker case”

A_t = (S, L, C, R)   Left   Straight   Right
0 0 0 0              0.26   0.49       0.23
0 0 0 1              0.11   0.27       0.60
0 0 1 0              0.12   0.74       0.11
0 0 1 1              0.07   0.49       0.40
0 1 0 0              0.59   0.28       0.11
...                  ...    ...        ...
1 1 0 1              0.29   0.44       0.26
1 1 1 0              0.35   0.56       0.08
1 1 1 1              0.50   0.50       0.00

Focus prediction: 56.4 % accuracy (four meetings)
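
Estimating such a table by counting amounts to computing relative frequencies per speaker case. The sketch below shows the idea; the data layout (a list of pairs of speaker case and observed focus target) and the toy training frames are assumptions for the example.

```python
from collections import Counter, defaultdict

def estimate_focus_given_audio(frames):
    """Estimate P(F | A_t) as relative frequencies.

    frames is an iterable of (speaker_case, focus_target) pairs, where
    speaker_case is a tuple of 0/1 flags such as (S, L, C, R).
    """
    counts = defaultdict(Counter)
    for case, target in frames:
        counts[case][target] += 1
    table = {}
    for case, target_counts in counts.items():
        total = sum(target_counts.values())
        table[case] = {target: n / total for target, n in target_counts.items()}
    return table

def predict_focus(table, case):
    """Pick the most probable focus target for a speaker case."""
    distribution = table[case]
    return max(distribution, key=distribution.get)

# Toy training data: while only the person in the center speaks,
# the observed participant mostly looks straight ahead.
training = ([((0, 0, 1, 0), "Straight")] * 3
            + [((0, 0, 1, 0), "Left")]
            + [((0, 0, 0, 1), "Right")] * 2)
table = estimate_focus_given_audio(training)
prediction = predict_focus(table, (0, 0, 1, 0))
```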

Predicting Focus from Sound (3)

• Short-time “speaker history” influences where people look
– predict P(F | A_t, A_{t-1}, …, A_{t-N})

• ANN used to predict P(F | A_t, A_{t-1}, …, A_{t-N})

Sound-based Focus with ANNs

• ANN used to predict P(F | A_t, A_{t-1}, …, A_{t-N})
• Training and cross-evaluation sets taken round robin

[Figure: accuracy [%] (axis from 52 to 68) versus history length [frames] (1 to 40), one curve each for 2, 3, 5, 10 and 20 hidden units. Caption: Sound-based prediction on four meetings]

Predicting Focus from Sound (5)

• Counts on training data used to predict P(F | A_t)
• ANN used to predict P(F | A_t, A_{t-1}, …, A_{t-9})

Audio-based focus prediction results:

         P(F | A_t)   P(F | A_t, A_{t-1}, …, A_{t-9})
Set A    57.7         63.0
Set B    57.6         67.2
Set C    46.9         60.2
Set D    63.2         72.6
Avg.     56.4         65.8

Combination of Gaze and Sound

         Video only   Audio only   Combined (α = 0.6)
Set A    68.8         63.0         71.4
Set B    73.4         67.2         77.1
Set C    79.5         60.2         80.5
Set D    69.8         72.1         75.7
Avg.     72.9         65.6         76.2

• Accuracy increases from 72.9 % to 76.2 %
• 12 % relative error reduction using a fixed α

Combination: (1 - α) P(Fi | Gaze) + α P(Fi | Sound)
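
The weighted combination and the quoted relative error reduction can be checked with a few lines of code. The function names and the example posteriors are assumptions; the accuracies 72.9 % and 76.2 % and α = 0.6 are the values from the table.

```python
def combine(p_gaze, p_sound, alpha=0.6):
    """Linear combination (1 - alpha) * P(F|Gaze) + alpha * P(F|Sound)."""
    return {target: (1 - alpha) * p_gaze[target] + alpha * p_sound[target]
            for target in p_gaze}

def relative_error_reduction(accuracy_before, accuracy_after):
    """Fraction of the original error removed (accuracies in percent)."""
    error_before = 100.0 - accuracy_before
    error_after = 100.0 - accuracy_after
    return (error_before - error_after) / error_before

# Hypothetical posteriors for one frame: gaze favors Left, sound favors Straight.
p_gaze = {"Left": 0.5, "Straight": 0.3, "Right": 0.2}
p_sound = {"Left": 0.1, "Straight": 0.7, "Right": 0.2}
combined = combine(p_gaze, p_sound)
decision = max(combined, key=combined.get)

# Error falls from 27.1 % to 23.8 %, i.e. by roughly 12 % of the original error.
reduction = relative_error_reduction(72.9, 76.2)
```

Since both inputs are probability distributions and the weights sum to one, the combined scores again sum to one and can be compared directly across targets.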

Applications: “Meeting Browser”

• Transcription
• Summarization
• Dialog Processing
• Focus of Attention
• Speaker + Face ID

FoA for Indexing and Analysis of Meetings

• Activity / Attention Analysis of Meetings
– temporally filtering the focus / speaking activity

• Meeting statistics:
– how often were persons paying attention to the speaker?
– how often did people talk?
– how often have people been paid attention to?

• In a Meeting Browser
– to get additional tags for retrieval
– for focus-oriented replay of (parts of) meetings

Applications: Human-Robot Interaction

• Focus of Attention tracking:
– to understand the user’s actions/intentions
– to establish joint/shared attention
– to determine the addressee of a speech act
– to know when someone else is addressed

See: www.sfb588.uni-karlsruhe.de

Focus-Aware Human-Robot Interaction

• Demo system: interaction with a household robot, a voice-controlled VCR, a broadcast-news retrieval system and people in the room

[Diagram labels: attention tracker; speech recognition, dialogue processing, TV database, …; speech recognition, parser, dialog processing, speech synthesis, animation of some actions; other people: no action]

Problems / Open Issues

• Robustness against changing illumination

• Handling of moving users, dynamic scenes

• Detection of new targets (objects, persons, ...)