Telepresence Tutorial
July 30, 2012
Overview
Introduction to Telepresence (Stephen Botzko)
What is Telepresence: Co-location
At its core, Telepresence uses technology and “stagecraft” to create a sense of co-location (meeting participants feel they are in the same space).
Key Aspects: Gaze Awareness, Eye Contact, Actual Size Rendering
Telepresence Dinner
History
• “Toward the Telehandshake” (1983), Media for Interactive Communications; Bretz and Schmidbauer
• Commercial systems began in the 90s
  – TeleSuite founded in 1993
  – Cisco, HP, Polycom, etc. by 2010
Some Product Examples
Telepresence: Definition
• Telepresence: An interactive audio-visual communications experience between remote locations, where the users enjoy a strong sense of realism and presence between all participants by optimizing a variety of attributes such as audio and video quality, eye contact, body language, spatial audio, coordinated environments and natural image size.
How is it done? Lay out physical space / Identify sight lines
How is it done? Partition the space
How is it done? Place cameras and displays
Essential Co-location Requirements
• Preserve spatial relationships between streams
• Maintain coherence of audio and video “stage”
• Ability to scale images to true size
• Ability to select best sight line
• Many of these facilities can also be used to enhance other non-telepresence applications.
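Scaling to true size comes down to arithmetic once the receiver knows how wide the captured area physically is and how large its own display is. The sketch below assumes both widths are known in metres. The function names and numbers are hypothetical and do not come from CLUE.

```python
def true_size_width_px(capture_width_m: float,
                       display_width_m: float,
                       display_width_px: int) -> int:
    """Horizontal pixels an image must span so the captured area appears at real size.

    capture_width_m:  physical width of the captured area at the participants
                      (assumed known, e.g. from a capture's spatial description)
    display_width_m:  physical width of the remote display's visible area
    display_width_px: horizontal resolution of that display
    """
    pixels_per_metre = display_width_px / display_width_m
    return round(capture_width_m * pixels_per_metre)


def scale_factor(source_width_px: int, target_width_px: int) -> float:
    """Factor by which the received image must be scaled to reach the target width."""
    return target_width_px / source_width_px


# Hypothetical numbers: a 1.2 m wide capture area shown on a 1.6 m wide, 1920-pixel display.
target = true_size_width_px(1.2, 1.6, 1920)  # 1920 / 1.6 = 1200 px per metre -> 1440 px
factor = scale_factor(1920, target)          # a 1920-px-wide stream is scaled by 0.75
```

If the true-size width exceeds the display, the receiver cannot show the capture life-size on that screen. This is one reason the receiver may prefer a different view of the scene.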
IETF CLUE Working Group (Mary Barnes)
What is CLUE?
• CLUE = ControLling mUltiple streams for tElepresence
• Motivation:
  – Currently deployed telepresence systems are not interoperable
• Objectives:
  – Describe the data required for the handling of multiple streams
  – Define the behavior required to negotiate the use of multiple streams of audio and video media flows
Scope of CLUE
• RTP and SIP based systems
• Define signaling for transporting CLUE information
• Apply existing protocols for signaling and transport
• Extensions to existing protocols in appropriate WGs (e.g., AVTCORE and MMUSIC)
Working Towards a Solution
(Diagram: Requirements, Use Cases, Data Model, Call Flows, RTP Usage, Signaling)
CLUE Telepresence Scenarios (Roni Even)
Overview
• Telepresence (TP) systems
  – Primary objective is an immersive experience as close to “being there” as possible
• Life-size video display
• Eye contact
• Gaze direction
• Spatial audio
Central cameras, semicircular seating
Cameras located with screens, semicircular or linear classroom seating
Telepresence architecture
• TP systems will typically have multiple cameras and microphones
  – A typical system will have the same number of monitors and cameras (1 and 3 are common, but some systems have 2 or 4)
Additional Use Cases
• Dynamically add video sources from an endpoint based on meeting context
  – E.g., turn on a document camera or provide a video stream of a presentation
• Different numbers of cameras and screens, e.g., 3 cameras with 6 screens, or with one big screen
CLUE Framework (Allyn Romanow, Andy Pepperell)
Power of the Framework
• Receiver driven
  – Chooses what to receive and how it is encoded
• Media captures
  – Description used by renderer
  – Advertised by provider
  – Chosen by consumer
What is the Framework?
Provider and Consumer
(Diagram: Vendor One, Vendor Two and an MCU; each one says “I am a provider. I advertise. I am also a consumer. I choose.”)
Basic Idea
“Meow. Send me 2 streams of 360 at 1080p, and 1 audio.”
“I can send you 1 image of both of us, or 2 images, each of 1 of us. I can send them at 1080p, 720p or 360p. 1 mono audio at 64 kbps.”
“I can send you 1 image, 2 images, or 3 images; 1 or 2 mono audio streams. I can send streams at 1080p, 720p, and 360p as long as the total is not over 4896 Mbps. All at 4 Gbps, not exceeding 6 Gbps. Audio at 64 kbps each.”
“Woof woof. I’ll take the single stream at 720p, single mono audio.”
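The cartoon's exchange can be modelled in a few lines. The provider states what it can send, and the consumer picks something within those limits. This is a toy sketch, not the CLUE message format. The per-resolution bitrates and all class and function names are assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical per-stream bitrates in kbps; the cartoon gives no such table.
BITRATE_KBPS = {"1080p": 4000, "720p": 2000, "360p": 500}


@dataclass(frozen=True)
class Advertisement:
    """What a provider offers (simplified; not the CLUE message format)."""
    image_counts: frozenset  # how many images it can send, e.g. {1, 2}
    resolutions: frozenset   # resolutions it can encode
    max_total_kbps: int      # overall bandwidth it is willing to send


@dataclass(frozen=True)
class Choice:
    """What a consumer picks from an advertisement."""
    images: int
    resolution: str


def provider_accepts(adv: Advertisement, choice: Choice) -> bool:
    """True if the consumer's choice stays within what the provider advertised."""
    if choice.images not in adv.image_counts or choice.resolution not in adv.resolutions:
        return False
    return choice.images * BITRATE_KBPS[choice.resolution] <= adv.max_total_kbps


provider = Advertisement(frozenset({1, 2}), frozenset({"1080p", "720p", "360p"}), 6000)
```

With this provider, `Choice(1, "720p")` (the single 720p stream) is accepted. `Choice(2, "1080p")` would need 8000 kbps and is refused. The decision about what to receive stays with the consumer.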
Media Captures
• Fundamental CLUE concept
• A media capture is a media representation of some portion of the provided scene
  – E.g. #1: video from the left camera of 3
  – E.g. #2: a stereo audio capture of a room’s audio
Capture Attributes
– Each capture is described via its attributes
– High-level categorization: audio vs. video
– Spatial information (3-D Cartesian coordinates) to enable correct rendering
– Switched capture
– Mechanism for extensibility
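The attribute list above can be pictured as a small record type. This is a sketch under stated assumptions: the field names, the use of metres, and the free-form `extensions` dictionary are hypothetical and are not the CLUE data model.

```python
from dataclasses import dataclass, field
from typing import Optional

# 3-D Cartesian coordinates; units and origin are assumptions for this sketch.
Point = tuple[float, float, float]


@dataclass
class MediaCapture:
    capture_id: str
    media_type: str                           # high-level category: "audio" or "video"
    point_of_capture: Optional[Point] = None  # spatial information, if meaningful
    switched: bool = False                    # e.g. switched on voice activity
    extensions: dict[str, str] = field(default_factory=dict)  # room for new attributes

    def __post_init__(self) -> None:
        if self.media_type not in ("audio", "video"):
            raise ValueError(f"unknown media type: {self.media_type}")

    def has_spatial_info(self) -> bool:
        return self.point_of_capture is not None


left = MediaCapture("VC0", "video", point_of_capture=(-1.0, 0.0, 0.0))
switched = MediaCapture("VC5", "video", switched=True)
room_audio = MediaCapture("AC0", "audio", extensions={"channels": "stereo"})
```

A renderer would use `point_of_capture` to place a capture correctly. A switched capture typically has no fixed position, so it carries none.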
Capture Scene
Each alternative representation of a scene is a capture entry in a capture scene. Example capture scene (main media):
• Three cameras: (VC0, VC1, VC2)
• Two cameras, moved and zoomed out: (VC3, VC4)
• Switched (based on voice), composed PiP: (VC5)
• Audio: (AC0)
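Since the video entries are alternatives, a consumer picks one of them. A minimal sketch follows, assuming a hypothetical policy of one capture per screen, where the entry with the most captures that still fits wins. CLUE itself leaves this choice to the consumer.

```python
# The example capture scene: each entry is one alternative representation of the scene.
VIDEO_ENTRIES = [
    ("VC0", "VC1", "VC2"),  # three cameras
    ("VC3", "VC4"),         # two cameras, moved and zoomed out
    ("VC5",),               # switched (voice-based), composed PiP
]


def pick_entry(num_screens: int) -> tuple[str, ...]:
    """Choose the richest video entry that fits the consumer's screens.

    Hypothetical consumer policy: one capture per screen, most captures wins.
    """
    fitting = [entry for entry in VIDEO_ENTRIES if len(entry) <= num_screens]
    if not fitting:
        raise ValueError("consumer needs at least one screen")
    return max(fitting, key=len)
```

A three-screen room gets (VC0, VC1, VC2), a two-screen room gets (VC3, VC4), and a single-screen endpoint gets (VC5,).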
Basic CLUE Messaging
Provider → Consumer: capture advertisement
Consumer → Provider: stream choice
Provider → Consumer: media streams
Potentially multiple further exchanges follow: a new capture advertisement or a new stream choice, then updated media streams.
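The advertise / choose / send loop can be sketched as a tiny state holder for one direction of a call. A simplifying assumption, which is not taken from the slides, is made loudly here: when a later advertisement withdraws a capture, any choice of that capture is dropped until the consumer chooses again.

```python
class CaptureExchange:
    """One direction of the advertise / choose / send loop (simplified sketch).

    Assumption for illustration: when a new advertisement withdraws a capture,
    any choice of that capture is dropped until the consumer chooses again.
    """

    def __init__(self) -> None:
        self.advertised: set[str] = set()
        self.chosen: set[str] = set()

    def advertise(self, captures: set[str]) -> None:
        self.advertised = set(captures)
        self.chosen &= self.advertised  # forget choices that are no longer offered

    def choose(self, captures: set[str]) -> None:
        if not set(captures) <= self.advertised:
            raise ValueError("choice refers to a capture that was not advertised")
        self.chosen = set(captures)

    def media_streams(self) -> list[str]:
        """The provider sends exactly what the consumer chose."""
        return sorted(self.chosen)


ex = CaptureExchange()
ex.advertise({"VC0", "VC1", "VC2", "VC5"})
ex.choose({"VC5"})
first = ex.media_streams()
ex.advertise({"VC0", "VC1", "VC2"})  # a further exchange: VC5 is withdrawn
ex.choose({"VC0", "VC1", "VC2"})
second = ex.media_streams()
```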
Provider Capture Advertisement
• Provider tells consumer about its media captures
  – Enumeration of available media captures
    • Includes organisation of captures into scenes
  – Physical constraints
    • Center camera may also be used for “zoomed out” view
  – Encoding constraints
    • Provider expresses its overall encoding capabilities
    • Allows modelling of multiple constituent physical units
Consumer Choice
• Consumer tells provider which captures it wishes to receive
  – Encoding parameters such as max resolution, mbps, etc.
  – Instantiates provider media captures to “real” streams
    • Captures can have multiple instantiations; not a simple one-to-one mapping between captures and encodings
  – Media model no longer simply “transmitter chooses”
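Multiple instantiations mean that a single capture can appear more than once in a consumer's choice, each time with its own encoding limits. The sketch below uses hypothetical field names and values.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class EncodingRequest:
    """One stream the consumer asks for: a capture plus encoding limits (hypothetical fields)."""
    capture_id: str
    max_height: int  # 1080 -> up to 1080p
    max_kbps: int


# Hypothetical choice: the same switched capture VC5 instantiated twice,
# large for the main screen and small for, say, a recording or a thumbnail.
choice = [
    EncodingRequest("VC5", 1080, 4000),
    EncodingRequest("VC5", 360, 500),
    EncodingRequest("VC0", 720, 2000),
]

streams_per_capture = Counter(req.capture_id for req in choice)
```

Three streams are requested from two captures, so the number of encodings cannot be inferred from the number of captures.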
Receiver Choosing is Powerful
• Consumer does its own layout
  – Knows its display hardware
  – Number of streams, bandwidth, resolution
• Receiver can receive multiple representations of the same scene
  – Recording
  – MCU switches different versions out
• Expanded functionality, flexibility
Framework Realization (Rob Hansen)
Example Endpoint - Alice
(Diagram: camera set, main display, speaker, microphone, desk, seats and presentation display, with the room divided into Regions A, B and C.)
Example Call-Flow
Alice → Bob: SIP: INVITE
Bob → Alice: SIP: 200 OK
Alice → Bob: SIP: ACK
(optional) Single-stream RTP + RTCP
Alice → Bob: CLUE: Advertisement
Bob → Alice: CLUE: Configure
Alice → Bob: Multi-stream RTP + RTCP
Example SIP INVITE
• Acceptable to non-CLUE endpoints
• As always, SDP defines limits of RTP sessions
• INVITE contains CLUE transport details
• Alice’s SDP has 1 audio m-line and 1 video m-line:
    v=0
    o=alice 2890844526 2890844526 IN IP4 client.atlanta.example.com
    s=-
    c=IN IP4 192.0.2.101
    b=AS:6064
    t=0 0
    m=audio 49172 RTP/AVP 0
    a=rtpmap:0 PCMU/8000
    m=video 49174 RTP/AVP 96
    b=AS:6000
    a=rtpmap:96 H264/90000
    a=fmtp:96 profile-level-id=42e016;max-mbps=244800;max-fs=8160
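The numbers in this SDP fit together. In the H.264 payload format, max-fs is a frame size in 16x16 macroblocks, and max-mbps is a macroblock rate per second. The sketch below checks that these values correspond to 1080p at 30 frames per second, and that the session bandwidth is the video bandwidth plus PCMU audio. The fmtp parser is a deliberately simplified helper, not a general SDP parser.

```python
def parse_fmtp(line: str) -> dict[str, str]:
    """Split 'a=fmtp:<pt> k=v;k=v;...' into a dict (simplified: no spaces inside parameters)."""
    _, params = line.split(" ", 1)
    return dict(item.split("=", 1) for item in params.split(";"))


MB_SIZE = 16  # an H.264 macroblock covers 16x16 luma samples


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def frame_size_in_macroblocks(width: int, height: int) -> int:
    """Frame size in macroblocks, rounding each dimension up (1080 lines -> 68 rows)."""
    return ceil_div(width, MB_SIZE) * ceil_div(height, MB_SIZE)


fmtp = parse_fmtp("a=fmtp:96 profile-level-id=42e016;max-mbps=244800;max-fs=8160")
max_fs = int(fmtp["max-fs"])      # largest frame, in macroblocks
max_mbps = int(fmtp["max-mbps"])  # macroblocks per second

fs_1080p = frame_size_in_macroblocks(1920, 1080)  # 120 * 68 = 8160
fps_at_1080p = max_mbps / fs_1080p                # 244800 / 8160 = 30.0

# Session b=AS:6064 = video b=AS:6000 + PCMU audio (8000 samples/s * 8 bits = 64 kbps).
pcmu_kbps = 8000 * 8 // 1000
```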
Example CLUE Advertisement
• Capture Scene
  – Captures
  – Entries
• Simultaneous Transmission Sets
• Encoding Group
  – Encodings
Example Captures
Video:
• Capture 1: static video, spatial parameters
• Capture 2: static video, spatial parameters
• Capture 3: static video, spatial parameters
• Capture 4: switched video, no spatial parameters
Audio:
• Capture 5: mixed audio, no spatial parameters
Capture Spatial Parameters
(Diagram: Alice’s room with camera set, main display, speaker, microphone, desk, seats and presentation display.)
• Capture 1: static video, Region A
• Capture 2: static video, Region B
• Capture 3: static video, Region C
• Spatial parameters: Point of Capture, Axis of Capture, Area of Capture
Example Entries
• Entries of the same media type define alternative views of the scene.
• Alice advertises three entries:
  – Entry 1: Video captures 1, 2 & 3 (three static cameras)
  – Entry 2: Video capture 4 (switched video stream)
  – Entry 3: Audio capture 5 (mixed audio stream)
Encoding Group & Encodings
• Encodings define the maximum encoding parameters available for streams.
• Alice advertises the ability to encode up to three video streams at 1080p, 4 Mbps each, but with an overall limit of 6 Mbps:
  – Video encodings: Max 1080p @ 4 Mbps; Max 1080p @ 4 Mbps; Max 1080p @ 4 Mbps
  – Audio encoding: Max 64 kbps
  – Encoding group limit: Max bandwidth 6 Mbps
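An encoding group applies two limits at once. Each stream must fit one of the encodings, and the streams together must fit the group's bandwidth. A minimal sketch follows, using Alice's values. The class and function names are hypothetical. Because Alice's three encodings are identical, pairing requests with encodings in order is sufficient here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Encoding:
    max_height: int  # 1080 -> up to 1080p
    max_kbps: int


# Alice's advertised video encodings and group limit (values from the slide).
VIDEO_ENCODINGS = (Encoding(1080, 4000), Encoding(1080, 4000), Encoding(1080, 4000))
GROUP_MAX_KBPS = 6000


def group_allows(requests: list[tuple[int, int]]) -> bool:
    """requests: one (height, kbps) pair per requested video stream.

    Each stream needs its own encoding within that encoding's limits, and the
    streams together must stay within the group's bandwidth limit.
    """
    if len(requests) > len(VIDEO_ENCODINGS):
        return False
    for (height, kbps), enc in zip(requests, VIDEO_ENCODINGS):
        if height > enc.max_height or kbps > enc.max_kbps:
            return False
    return sum(kbps for _, kbps in requests) <= GROUP_MAX_KBPS
```

Three 1080p streams at 4 Mbps each (12 Mbps) are refused, and so are two of them (8 Mbps). One 1080p stream at 4 Mbps plus one 720p stream at 2 Mbps fits exactly.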
CLUE Configure
• Bob selects the three static camera streams at 720p, and the mixed audio stream:
  – Video: Static capture 1, max 720p @ 2 Mbps
  – Video: Static capture 2, max 720p @ 2 Mbps
  – Video: Static capture 3, max 720p @ 2 Mbps
  – Audio: Mixed capture 5
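Bob's request uses Alice's video budget exactly. Three 2 Mbps streams make 6 Mbps, the encoding group limit. With 64 kbps of audio the total is 6064 kbps, which matches the session-level b=AS:6064 in Alice's SDP. The sketch below does that bookkeeping. The tuple layout is hypothetical, and the audio rate is assumed to be the advertised 64 kbps maximum.

```python
# Bob's first CLUE Configure as (capture, kind, max kbps). The audio figure is
# assumed to be the advertised 64 kbps maximum.
BOB_CONFIGURE = [
    ("static capture 1", "video", 2000),
    ("static capture 2", "video", 2000),
    ("static capture 3", "video", 2000),
    ("mixed capture 5", "audio", 64),
]

VIDEO_GROUP_LIMIT_KBPS = 6000  # Alice's encoding group limit
SESSION_AS_KBPS = 6064         # b=AS:6064 from Alice's SDP


def totals(configure):
    """Sum the requested video and audio bandwidth separately."""
    video = sum(kbps for _, kind, kbps in configure if kind == "video")
    audio = sum(kbps for _, kind, kbps in configure if kind == "audio")
    return video, audio


video_kbps, audio_kbps = totals(BOB_CONFIGURE)
fits = video_kbps <= VIDEO_GROUP_LIMIT_KBPS and video_kbps + audio_kbps <= SESSION_AS_KBPS
```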
Multi-stream media
Alice → Bob: one audio RTP session (audio port) and one video RTP session (video port)
• Alice sends 1 audio stream
• Alice sends 3 video streams, multiplexed in the single video RTP session
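With three video streams in one RTP session, the receiver has to separate them. The usual RTP handle for this is the SSRC in each packet header. The sketch below reads the SSRC from the fixed RTP header and looks it up. The SSRC values and the SSRC-to-capture table are hypothetical. How CLUE binds SSRCs to captures is left to the RTP usage drafts and is not modelled here.

```python
import struct

# Hypothetical SSRC values and mapping; not defined by the slides.
SSRC_TO_CAPTURE = {0x1111: "capture 1", 0x2222: "capture 2", 0x3333: "capture 3"}


def rtp_header(ssrc: int, seq: int = 0, timestamp: int = 0, payload_type: int = 96) -> bytes:
    """Build a minimal 12-byte RTP fixed header (V=2, no padding/extension/CSRCs, M=0)."""
    return struct.pack("!BBHII", 0x80, payload_type & 0x7F, seq, timestamp, ssrc)


def ssrc_of(packet: bytes) -> int:
    """Read the SSRC field, which occupies bytes 8-11 of the RTP fixed header (RFC 3550)."""
    if len(packet) < 12:
        raise ValueError("packet too short for an RTP header")
    (ssrc,) = struct.unpack_from("!I", packet, 8)
    return ssrc


def demux(packet: bytes) -> str:
    """Route a packet from the shared video RTP session to its capture."""
    return SSRC_TO_CAPTURE.get(ssrc_of(packet), "unknown")
```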
2nd CLUE Configure
Alice → Bob: SIP: INVITE
Bob → Alice: SIP: 200 OK
Alice → Bob: SIP: ACK
(optional) Single-stream media
Alice → Bob: CLUE: Advertisement
Bob → Alice: CLUE: Configure
Alice → Bob: Multi-stream media
Bob changes his request:
Bob → Alice: CLUE: Configure
Alice → Bob: Different multi-stream media
• Advertise/configure is not offer/answer; messages are sent independently
• Bob now requests the single, switched video stream at 1080p, and the mixed audio stream:
  – Video: Switched capture 4, max 1080p @ 4 Mbps
  – Audio: Mixed capture 5
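Because Configure is not tied to an offer/answer round, Bob can send a new one at any time against the Advertisement he already holds. The sketch below models the provider side under a stated assumption: each Configure fully replaces the previous selection. The class and method names are hypothetical.

```python
class Provider:
    """Provider-side handling of Configure messages (simplified sketch).

    Assumption: each Configure fully replaces the previous selection, and the
    Advertisement stays valid, so no new Advertisement is needed.
    """

    def __init__(self, advertised, video_group_kbps):
        self.advertised = set(advertised)
        self.video_group_kbps = video_group_kbps
        self.active = {}  # capture -> kbps currently being sent

    def on_configure(self, video, audio):
        """video / audio: dict mapping capture name -> requested kbps."""
        unknown = (set(video) | set(audio)) - self.advertised
        if unknown:
            raise ValueError(f"not advertised: {sorted(unknown)}")
        if sum(video.values()) > self.video_group_kbps:
            raise ValueError("video exceeds the encoding group limit")
        self.active = {**video, **audio}
        return self.active


alice = Provider([f"capture {n}" for n in range(1, 6)], video_group_kbps=6000)
alice.on_configure({"capture 1": 2000, "capture 2": 2000, "capture 3": 2000},
                   {"capture 5": 64})
alice.on_configure({"capture 4": 4000}, {"capture 5": 64})  # Bob changes his request
```

After the second Configure, Alice sends only the switched capture and the mixed audio. The 4 Mbps video request is still within the 6 Mbps group limit.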
Summary
1) CLUE is about more than telepresence: it is developing building blocks for other multi-stream applications.
2) CLUE uses SIP and SDP signaling for session setup.
3) CLUE defines additional non-offer/answer signaling to communicate CLUE-specific information.
References
• CLUE Requirements: draft-ietf-clue-telepresence-requirements
• CLUE Use Cases: draft-ietf-clue-telepresence-use-cases
• CLUE Framework: draft-ietf-clue-framework
• RTP Usage:
  – draft-lennox-clue-rtp-usage
  – draft-even-clue-rtp-mapping
• Call Flows: draft-romanow-clue-call-flow
• Data Model: draft-romanow-clue-data-model
Contributors to the tutorial (alphabetical order)
• Mary Barnes
• Espen Berger
• Stephen Botzko
• Mark Duckworth
• Roni Even
• Rob Hansen
• Paul Kyzivat
• Jonathan Lennox
• Andy Pepperell
• Allyn Romanow
Questions?