Challenging 5 Common Assumptions about Videoconferencing

Milton Chen

Computer Systems Lab

Stanford University

Presented at Internet2 Advanced Applications Track 10/28/2002

Copyright 2002 Milton Chen

The Stanford Video Auditorium

desktop interface

15’ x 5’ video wall

Video Auditorium publicity/users

– Intel president Paul Otellini’s Intel Developer Forum keynote
– Invited demo to NASA headquarters for Paul G. Pastorek

– CANARIE, Canada
– CUDI, Mexico
– Comdex, Brazil
– IBM Almaden Lab
– Manhattan College
– Hopkins Marine Station
– Stanford Medical School
– Stanford Learning Lab
– Stanford Center for Design Research
– Berkeley Bioengineering Lab
– Universidade Federal do Rio Grande do Sul, Brazil

Outline

Common assumptions
– Technology
  1. High-fidelity AV requires dedicated hardware
  2. Difficult to install and use
– Human factors
  3. Life size displays are ideal
  4. Floor control requires interactive frame rate
  5. Eye contact is difficult

Beyond MCU and H323
– Peer-to-peer
– Stanford’s Port Bootstrap Protocol
– Personal directory

An evaluation of distance learning at Stanford

Why videoconferencing is not ubiquitous

1. High-fidelity low-latency AV requires dedicated hardware

$700 Pentium 4 computer outperforms $7000 systems

Your PC outperforms all dedicated systems

Comparison of videoconferencing solutions

System                      Max number of links   Max video resolution   BW required at 352x288 15fps
NetMeeting                  1                     352x288                200 Kbps
WIDE DVTS                   1                     720x480                3000 Kbps
Vbrick                      1                     720x480                2000 Kbps
Polycom, Sony, …            4                     352x288                200 Kbps
AccessGrid, VRVS            many                  720x480                400 Kbps
Stanford Video Auditorium   16 to more than 100   720x480                100 Kbps

* CUSeeME, iVisit, Yahoo messenger have unacceptable latency

demo

* TrueSpeech 8.5
* MPEG-4
* Encrypted, AES (Rijndael), streaming
* Simultaneous AV recording
* Perceptual streaming adapts to network conditions

A scalable AV streaming architecture

audio: capture, compress, send, receive, decompress, render

video: capture, compress, send, receive, decompress, render
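The slide names only the six stages of each stream. As a rough sketch of how such a pipeline could be wired up, the Python below runs an audio and a video stream side by side, each with its own sender and receiver thread. Everything beyond the stage names is an assumption made for illustration: `run_stream` and `run_av` are hypothetical names, `zlib` stands in for the real codecs (TrueSpeech, MPEG-4), and an in-memory queue stands in for the network.

```python
import queue
import threading
import zlib


def run_stream(frames):
    """One media stream: capture -> compress -> send | receive -> decompress -> render."""
    link = queue.Queue()  # hypothetical stand-in for the network link
    rendered = []         # hypothetical stand-in for a speaker or display

    def sender():
        for frame in frames:                # capture
            packet = zlib.compress(frame)   # compress (placeholder codec)
            link.put(packet)                # send
        link.put(None)                      # end-of-stream marker

    def receiver():
        while True:
            packet = link.get()             # receive
            if packet is None:
                break
            rendered.append(zlib.decompress(packet))  # decompress, then render

    workers = [threading.Thread(target=sender), threading.Thread(target=receiver)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return rendered


def run_av(audio_frames, video_frames):
    """Run the audio and video streams concurrently and collect what was rendered."""
    results = {}

    def run(name, frames):
        results[name] = run_stream(frames)

    streams = [
        threading.Thread(target=run, args=("audio", audio_frames)),
        threading.Thread(target=run, args=("video", video_frames)),
    ]
    for stream in streams:
        stream.start()
    for stream in streams:
        stream.join()
    return results
```

Keeping audio and video in separate stage chains means a slow video frame never blocks audio, which is one plausible reason for drawing the two rows independently.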

Beyond MCU and H323

MCU vs. peer-to-peer
– Scalability
– Ease of deployment

H323 vs. Stanford’s Port-Bootstrap Protocol
– Firewall
– Ease of deployment

Personal directory

2. Videoconferencing systems are difficult to install and use

One click operation

To use the Video Auditorium
– “Nothing” to install
– One click on the html speed dial

<OBJECT CLASSID="CLSID:E80F7B8F-7906-4A89-B59E-B19871F474A9"
        CODEBASE="runtime/VA_Start.ocx#Version=-1,-1,-1,-1">
  <PARAM NAME="addr" VALUE="stanford -client_only">
</OBJECT>

Makes conferencing as simple as surfing the web

3. Life size displays are ideal

Each video should be between 6° and 14° wide

[Figure: smile recognition time (msec, 0 to 700) vs. video size (deg of visual angle, 0 to 30)]

* 12 people sat 10’ from the display

Subjectively, people reported 6° as the minimum and 14° as ideal. Life size is 12°.
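To turn these visual angles into physical sizes, the standard relation width = 2 · d · tan(θ/2) can be used. The sketch below is an illustration rather than part of the study; the function names are hypothetical, and the 10’ viewing distance is taken from the note above.

```python
import math


def width_for_visual_angle(angle_deg, distance):
    """Width (in the units of `distance`) that subtends `angle_deg` at the eye."""
    return 2.0 * distance * math.tan(math.radians(angle_deg) / 2.0)


def visual_angle_deg(width, distance):
    """Visual angle in degrees subtended by an object of `width` at `distance`."""
    return math.degrees(2.0 * math.atan(width / (2.0 * distance)))


# At the study's 10-foot viewing distance:
for angle in (6, 12, 14):
    print(f"{angle} deg -> {width_for_visual_angle(angle, 10.0):.2f} ft wide")
```

At 10’, 6° works out to roughly 1 ft of width per video and 14° to roughly 2.5 ft.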

Balance between size and head movements

* 12 people viewed 9 and 36 students on a large display and on an immersive display. The immersive display requires head movements to see all the students.

[Figure: preference (0% to 100%) by class size (9 students, 36 students); displays: immersive (64°), large (27°); annotation: 14°]

4. Effective floor control requires interactive frame rate

Minimum required frame rate

Interactive 10 fps

Tolerable 5 fps
– [Tang and Isaacs ’93]

Lip synchronization 5 fps
– [Watson and Sasse ’96]

Content understanding 5 fps
– [Ghinea and Thomas ’98]

Sign language recognition 1 fps
– [Johnson and Caird ’96]

Gesture Detection Algorithm

[Figure: visualization of algorithm; panels: input image, frame difference, after erosion]
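The slide shows only the three stages: input image, frame difference, and the result after erosion. A minimal pure-Python sketch of that idea follows, assuming grayscale frames stored as lists of rows. The threshold, the 3x3 structuring element, the `min_pixels` cutoff, and all function names are assumptions for illustration, not the Video Auditorium’s actual parameters.

```python
def frame_difference(prev, curr, threshold=30):
    """Binary mask: 1 where a pixel changed by more than `threshold` (assumed value)."""
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


def erode(mask):
    """3x3 binary erosion: keep a pixel only if its whole neighbourhood is set.

    Border pixels are always cleared, since their neighbourhood is incomplete.
    """
    height, width = len(mask), len(mask[0])
    out = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            if all(mask[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)):
                out[y][x] = 1
    return out


def gesture_detected(prev, curr, threshold=30, min_pixels=1):
    """True if enough changed pixels survive erosion (isolated noise does not)."""
    eroded = erode(frame_difference(prev, curr, threshold))
    return sum(sum(row) for row in eroded) >= min_pixels
```

Erosion removes isolated changed pixels such as camera noise, so only a coherent moving region such as a raised hand would trigger sending a new frame.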

Requires 10% of full motion bandwidth

[Figure: frame size (kbits, 0 to 100) vs. time (frame number, 0 to 300); two panels: full-motion (10 fps) and gesture-sensitive (0.2 fps)]

* MPEG4 encoded at 320x240

Gesture sensitive allows dynamic discussion

[Figure: speaker changes per minute (0 to 5) for full motion (15 fps), gesture sensitive (~0.2 fps), and low update (0.2 fps)]

* 8 groups of 4 people during a discussion

5. Eye contact is difficult

Eye contact fires up our brain

[Kampe et al. ’01]

Eye contact is difficult

Looking into the camera Attempting eye contact

Solutions to eye contact

– Half-silvered mirror [Rosenthal ’47]
– MAJIC [Okada, et al. ’94]
– ClearBoard [Ishii, et al. ’92]
– GazeMaster [Gemmell, et al. ’00]

A simple solution

Hydra [Sellen, Buxton, and Arnott ’92]

Eye contact sensitivity is high

Spatial perception task

As good as Snellen acuity [Gibson and Pick ’63]

[Figure: looker and observer 2 m apart; eye contact (%, 0 to 100) vs. angle (deg, -8.5 to 8.5); stdev = 2.8°]

* 6 observers judged 1 looker

Sensitivity is symmetric

– Cline ’67
– Kruger and Huckstedt ’69
– Anstis, et al. ’69
– Stokes ’69
– Ellgring ’70

PicturePhone: camera above display

Hydra: camera below display

Methodology

* Two rooms can be linked in a videoconferencing session

Observers watch videos of looker and judge eye contact

large display with camera at the center

Record lookers gazing at different targets

Sensitivity is asymmetric

* 16 observers judged recorded videos of 1 looker

An anatomical explanation

[Figure panels: looking at you, looking sideways, looking up, looking down, eye closing]

Illustrations from The Artist’s Guide to Facial Expression [Faigin ’90]

Sensitivity is less in conversation

[Figure: eye contact (%, 0 to 100) vs. visual angle (deg, down; 0 to 15); curves: recorded, conversation]

* 16 observers judged videos of 1 looker

Sensitivity is less in video

[Figure: eye contact (%, 0 to 100) vs. visual angle (deg, down; 0 to 15); curves: face-to-face, video]

* 16 observers judged 1 looker in conversation

We are biased to perceive contact

[Figure: eye contact (%, 0 to 100) vs. angle, from Snellen acuity to conferencing acuity; curves: sideways and up; down; down & video; down & video & conversation]

Maximum camera to eyes distance

* Assuming a sensitivity of 7°

device      minimum viewing distance   camera to rendered eyes distance
Palm held   1’                         1.5”
Desktop     2’                         3”
Wall size   8’                         12”
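The table’s values are consistent with simple trigonometry: if the camera may sit at most 7° away from the rendered eyes as seen by the viewer, the allowed offset is d · tan(7°). The sketch below assumes that relation, which the slide does not state explicitly; `max_camera_offset_inches` is a hypothetical name.

```python
import math

SENSITIVITY_DEG = 7.0  # sensitivity assumed on the slide


def max_camera_offset_inches(viewing_distance_feet, sensitivity_deg=SENSITIVITY_DEG):
    """Largest camera-to-rendered-eyes distance that stays within the angle."""
    distance_inches = viewing_distance_feet * 12.0
    return distance_inches * math.tan(math.radians(sensitivity_deg))


for device, feet in (("Palm held", 1), ("Desktop", 2), ("Wall size", 8)):
    print(f"{device}: {max_camera_offset_inches(feet):.1f} inches")
```

Under this assumption the three rows come out just under 1.5”, 3”, and 12”, in line with the rounded figures in the table.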

Eye contact in the Video Auditorium

Why is videoconferencing essential to distance learning:

An evaluation of distance learning at Stanford

Distance learning at Stanford

Remote students can call in during class

Instructor cannot see the remote students

a 1969 classroom

a 2002 operator console

a 2002 lecture viewer

Students like distance learning

[Figure: attitude toward distance learning (0% to 100%) for students, TAs, and faculty; categories: enjoy, does not matter, dislike, other]

* 120 students, 15 TAs, and 41 faculty

Learning is less effective

[Figure: learning outcome (0% to 100%) for students, TAs, and faculty; categories: increase greatly, increase somewhat, does not change, decrease somewhat, decrease greatly]

* 120 students, 15 TAs, and 41 faculty

F2F interaction is important

[Figure: importance of f2f interaction (0% to 100%) for students, TAs, and faculty; categories: extremely, very, moderately, somewhat, not]

F2F is important for lecturing and crucial for discussions

No interaction with remote students

Classroom observation of 4 CS classes
– Instructor on average asked 9 questions per session
– Local students on average asked/made 3 questions/comments per session
– Remote students spoke once in 6 months

Value of video beyond audio

Cues only transmitted by the visual channel
– Negative feedback, …

Emotional bond
– Establishing and maintaining relationships

Can you imagine it?
– A new face, …

A proposal

The world’s largest video wall: link all Internet2 members for Spring ’03

Developed technology
– One Mouse
– AV stream migration

Bandwidth: 2 x 300 x (100 Kbps + 10 Kbps), about 60 Mbps

Cost: 10 P4 laptops + 10 portable projectors, $30K
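The bandwidth estimate can be checked with a few lines of arithmetic. The slide does not label its factors, so the parameter names below are guesses made only for illustration (300 streams, 100 Kbps plus 10 Kbps per stream, and a factor of 2); the code also assumes 1 Mbps = 1000 Kbps.

```python
def aggregate_kbps(factor=2, streams=300, video_kbps=100, audio_kbps=10):
    """Multiply out the slide's expression: factor x streams x (video + audio)."""
    return factor * streams * (video_kbps + audio_kbps)


total_kbps = aggregate_kbps()
print(f"{total_kbps} Kbps = {total_kbps / 1000:.0f} Mbps")
```

Multiplied out exactly, the expression gives 66 Mbps, so the slide’s 60 Mbps reads as a rounded figure.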

A prediction

A plane that does not fly is not a plane

First flight, Wrights 1903

A videophone that limits communication is not a videophone
• poor audio fidelity
• poor video fidelity
• excessive latency
• no eye contact
• poor lip synchronization

Why all videoconferencing products have failed

Threshold of quality for the 2nd revolution

first mobile phone, 1924 first handheld phone, 1973

1st Revolution: Possible 2nd Revolution: Practical

first videoconferencing system, 1927

Conclusion

Common assumptions
1. High-fidelity AV requires dedicated hardware: higher on a PC
2. Difficult to install/use: one click
3. Life size displays are ideal: 6° to 14°
4. Floor control requires at least 10 fps: 0.2 fps avg
5. Eye contact is difficult: 7° down

Videoconferencing is essential to distance learning

A MCU-less and H323-less future

You already have a one-click high-fidelity multiparty videoconferencing system

We are at the dawn of a videoconferencing revolution that will fuel the demand for a 1000X increase in available bandwidth

Acknowledgement
– NASA
– Intel
– Sony
– Interval Research
– Wallenberg Global Learning Network
– Department of Defense

Future work
– Gold release for Feb 2003
– SDK
– The Wall

