Technical Challenges & Opportunities for Live VR
JULES DAVIS – CTO FOCAL POINT VR
What is VR for?
VR Goal: Teleportation?
Video: Teleportation to Reality
Live Video: Presence at an Event
What is Presence?
Wikipedia: It is defined as a person's subjective sensation of being there in a scene depicted by a medium
Michael Abrash: “Presence is VR Magic…it engages you at a deeper, more visceral level than any other form of entertainment”
Presence Requirements
Feature                     VR Today       Human Perception
Field of View (per eye)     ~80° x 90°     160° x 130°
Acuity (pixels / degree)    12 - 18        ~60 (and True HD)
Resolution (per eye)        ~1k x 1k       ~10k x 8k
Refresh Rate                90 Hz          120 Hz ?
Tracking / latency          5 - 20 ms      4 ms ?
Michael Abrash at Steam Dev Days 2014
http://media.steampowered.com/apps/abrashblog/Abrash%20Dev%20Days%202014.pdf
Video Mechanics - Capture
Samsung Beyond iZugur Z63DC
Google Jump / GoPro Odyssey
Capture
Left Eye Only
12 Camera GoPro Rig
  5 pairs horizontally
  1 up and 1 down
Stitching / Projection
  Stitch images together
  To map onto a sphere surrounding viewer
  Just like map projection in geography
  Most common is equirectangular projection
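To make the equirectangular idea concrete, here is a minimal sketch of how a viewing direction maps to a pixel in the stitched image. The function name and the angle conventions (yaw 0 at the image centre, pitch +90 at the top row) are assumptions for illustration, not something the slides specify.

```python
def direction_to_equirect(yaw_deg, pitch_deg, width, height):
    """Map a viewing direction to (column, row) in an equirectangular image.

    Assumed conventions (hypothetical, for illustration only):
      yaw_deg   longitude in [-180, 180), 0 = centre of the image
      pitch_deg latitude in [-90, 90], +90 = straight up = top row
    """
    u = (yaw_deg + 180.0) / 360.0   # 0..1 across the image (longitude)
    v = (90.0 - pitch_deg) / 180.0  # 0..1 down the image (latitude)
    col = min(int(u * width), width - 1)    # clamp the right-hand edge
    row = min(int(v * height), height - 1)  # clamp the bottom edge
    return col, row


centre = direction_to_equirect(0, 0, 3840, 1920)
```

Longitude maps linearly to columns and latitude linearly to rows, which is why every row has the same width even though rows near the poles cover a much smaller circle on the sphere.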
Broadcast and Playback
  Upload stitched video to cloud
  Download or stream video to headset
  Project video onto a sphere around the viewer
Live Virtual Reality Video
VR camera | Video Processor | Cloud | VR Player | Broadcast
Market changing fast
Capture
  Huge variety of cameras
  No camera meets all needs
  NextVR, Nokia, Samsung, GoPro, Ricoh, Kodak, Sphericam, Vuze
Stitching and Projection
  Some cameras have it built in
  VideoStitch have Vahana
Broadcast
  YouTube, Facebook and many video streaming companies
Videos Today
  Max resolution 4k x 2k video
  Mix of mono and stereo
  Almost all using equirectangular
Challenges
  Many choices
  Capture quality
  Dynamic range
  Resolution / Bandwidth
  Head Movement
  Stereo Quality
  …
Challenge 1 – Resolution and Bandwidth
Resolution / Bandwidth
  4k video is normally 3840 x 2160 x 8 bit
  H.264 good quality: 18 Mbps
  Bandwidth for 1 hour of video at 18 Mbps
    60 * 60 * 18 / 8 = 8,100 MB ≈ 8 gigabytes
  For 100,000 viewers
    8 GB * 100,000 = 800 terabytes
  Bandwidth might be 5p per GB
    Cost = 0.05 * 800,000 = £40,000
  20x cost of equivalent SD broadcast (4x 1080p)
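The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the function and parameter names are made up for illustration, and it uses decimal units (1 GB = 1000 MB), as the slide's rounding implies.

```python
def broadcast_cost(bitrate_mbps, hours, viewers, pounds_per_gb):
    """Return (GB per viewer, total GB, total cost in pounds)."""
    seconds = hours * 60 * 60
    # megabits -> megabytes (/8) -> gigabytes (/1000, decimal units assumed)
    gb_per_viewer = bitrate_mbps * seconds / 8 / 1000
    total_gb = gb_per_viewer * viewers
    return gb_per_viewer, total_gb, total_gb * pounds_per_gb


per_viewer, total, cost = broadcast_cost(18, 1, 100_000, 0.05)
print(f"{per_viewer:.1f} GB per viewer, {total / 1000:.0f} TB total, £{cost:,.0f}")
```

Without rounding, one hour at 18 Mbps is 8.1 GB per viewer, so 100,000 viewers need about 810 TB and cost about £40,500. The slide rounds to 8 GB, 800 TB and £40,000.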
Target is Headset Resolution
  Gear VR has highest pixel density
    H.FoV = 72.9° & H.Res = 1280
    ~17.5 pixels per degree
  Target resolution ~6.3k x 3.2k per eye
    Many H.264 codecs won’t handle this
  4K video on Gear VR gives
    ~10.5 pixels per degree horizontally
    ~5.4 vertically
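The target resolution follows directly from the headset's pixel density. A small sketch, using the Gear VR figures quoted on the slide (the function names are hypothetical):

```python
def pixels_per_degree(h_res, h_fov_deg):
    """Average horizontal pixel density of a headset display."""
    return h_res / h_fov_deg


def full_sphere_resolution(ppd):
    """Equirectangular size (width, height) that matches a given density."""
    return round(360 * ppd), round(180 * ppd)


ppd = pixels_per_degree(1280, 72.9)          # Gear VR figures from the slide
target = full_sphere_resolution(17.5)        # slide rounds the density to 17.5
print(f"{ppd:.2f} px/deg, target {target[0]} x {target[1]} per eye")
```

At 17.5 pixels per degree, a full 360° x 180° image needs 6300 x 3150 pixels per eye, which the slide rounds to 6.3k x 3.2k.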
Resolution / Bandwidth
Simulation of 4k video displayed on native Gear VR
Technical Challenge #1
Bandwidth / Resolution
  Native headset resolution video in Stereo
  Equivalent quality to 18 Mbps 4K video
  But at much lower bandwidth – ideally 3-4 Mbps
Look for Redundancy
  Native resolution 17.5 pixels per degree
  Equirectangular texture = 6.3k x 3.2k x 2
  Notice how stretched it is at poles
Look for redundancy – Projection
  Native resolution for Gear VR
    6k x 3k x 2 => ~40 Megapixels for stereo pair
  Actual pixels needed are far fewer
    Surface of a sphere with circumference 6k (res² / π)
    24.5 Megapixels (~60% extra pixels wasted)
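The "res² / π" figure is the surface area of a sphere whose circumference is the image width in pixels: with radius r = C / 2π, the area 4πr² simplifies to C² / π. The sketch below compares that with the C x C/2 pixels of an equirectangular image; the function names are illustrative only.

```python
import math


def equirect_pixels(circumference):
    """Pixels in an equirectangular image: width C, height C / 2."""
    return circumference * (circumference / 2)


def sphere_pixels(circumference):
    """Sphere surface area in pixel units: 4 * pi * r^2 with r = C / (2 * pi)."""
    return circumference ** 2 / math.pi


c = 6300  # pixels around the equator at ~17.5 pixels per degree
overhead = equirect_pixels(c) / sphere_pixels(c) - 1
print(f"stereo equirect: {2 * equirect_pixels(c) / 1e6:.1f} MP")
print(f"stereo sphere:   {2 * sphere_pixels(c) / 1e6:.1f} MP")
print(f"overhead:        {overhead:.0%}")
```

The ratio is π/2 regardless of resolution, so equirectangular always carries roughly 57% more pixels than the sphere needs. The slide rounds this to ~60%, and its megapixel figures differ slightly from the printed ones depending on how the circumference is rounded.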
Why use equirectangular?
Pros
  Plenty of software out there to generate it
  Fairly simple to render
  Creates one continuous rectangular array
  Simple for highly optimised video codecs
Cons
  Requires 60% extra pixels to achieve equivalent quality
  Big distortion – straight lines become curves
    Video codecs optimised for straight lines
    Rendering artefacts caused by non-linearity
Are there alternatives?
Cube-maps?
  + Minimal distortion – straight lines stay straight
  + Hardware accelerated rendering
  - Nearly 2x pixels of ideal minimum
Pyramids?
  Facebook have blogged about pyramids
  Cube-maps in disguise
  5 planar projections instead of 6
  Compress more efficiently
Problem is as old as astronomy
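The "nearly 2x" figure for cube-maps can be reproduced with a short calculation. The sketch assumes each face covers 90° and that the face is sized so its centre, where a planar projection has its lowest angular density, still matches the native pixels per degree. That sizing rule is an assumption for illustration; the slides do not say how the figure was derived.

```python
import math


def cubemap_pixels(circumference):
    """Total pixels in a 6-face cube-map matching density at face centres.

    On a face of side s, a pixel at angle t from the centre sits at
    (s / 2) * tan(t), so the density at the centre is s / 2 pixels per radian.
    Matching the sphere's C / (2 * pi) pixels per radian gives s = C / pi.
    """
    face_side = circumference / math.pi
    return 6 * face_side ** 2


def sphere_pixels(circumference):
    """Ideal minimum: sphere surface area in pixel units (C^2 / pi)."""
    return circumference ** 2 / math.pi


ratio = cubemap_pixels(6300) / sphere_pixels(6300)
print(f"cube-map needs {ratio:.2f}x the ideal minimum")
```

Under these assumptions the ratio is 6/π, about 1.91, independent of resolution.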
Optimise Equirectangular?
  Too much horizontal resolution at poles
  Horizontal resolution is at least 2x oversampled above 60 degrees latitude
  Chop the top and bottom off and halve their width
Optimise Equirectangular
  Halve width of polar regions
  Removes 30/180 of image => 5/6 * 40 = ~33 Megapixels
  Now we’re only 35% worse than ideal
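The saving from halving the polar bands is simple to compute. The sketch below uses the slide's figures (bands above 60° latitude, width halved, ~40 megapixel stereo pair, 24.5 megapixel ideal); the function and parameter names are made up for illustration.

```python
def optimised_equirect_fraction(cap_start_deg=60, width_scale=0.5):
    """Fraction of the original pixels kept after narrowing both polar bands."""
    # Both caps together span 2 * (90 - cap_start) degrees of the 180 rows.
    cap_rows = 2 * (90 - cap_start_deg) / 180
    return (1 - cap_rows) + cap_rows * width_scale


fraction = optimised_equirect_fraction()   # 5/6 for the slide's settings
megapixels = 40 * fraction                 # slide's ~40 MP stereo pair
print(f"{megapixels:.1f} MP, {megapixels / 24.5 - 1:.0%} above the 24.5 MP ideal")
```

The two 30° bands make up 60/180 of the rows; halving their width removes 30/180 of the pixels, leaving about 33 megapixels, roughly 36% above the spherical ideal.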
General lesson
  We can divide sphere into regions
  Change projection and resolution
Can we do better?
  Divide into multiple regions
  Remove down
  Vary resolution
    Base on projection
    And area of interest
Other Options
  Lots of redundancy between left / right eye
    Stereo aware compression as in 3D movies
    Reduction can be as much as 60%
  Viewer often cares about one direction much more than another
    Broadcast of this event: screens and speaker more important
    Give them more bandwidth
    Reduce resolution of off directions or reduce codec quality
  Send area around direction user is looking
    Minimise switching latency
  Better codecs
    H.265/HEVC – 50% bitrate saving if you’re lucky
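One way to "send area around direction user is looking" is to encode several versions of the video, each favouring a different direction, and have the player switch to whichever is closest to the current head yaw. The sketch below shows only that selection step. The stream layout (four streams centred 90° apart) and the function names are hypothetical, not taken from the talk.

```python
def angular_distance(a_deg, b_deg):
    """Smallest absolute angle between two headings, in degrees (0..180)."""
    d = abs(a_deg - b_deg) % 360
    return min(d, 360 - d)


def pick_stream(yaw_deg, stream_centres_deg):
    """Index of the stream whose high-quality region is nearest the head yaw."""
    return min(
        range(len(stream_centres_deg)),
        key=lambda i: angular_distance(yaw_deg, stream_centres_deg[i]),
    )


centres = [0, 90, 180, 270]        # assumed: four view-dependent streams
chosen = pick_stream(350, centres)  # looking just left of straight ahead
```

How quickly the player can switch streams after the head turns is the "switching latency" the slide refers to; until the switch completes, the viewer sees the lower-quality region.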
The Future
  This Year
    1k x 1k per eye
  3 years
    2k x 2k per eye (4k screens here now)
  5+ years
    4k x 4k per eye (wider field of view?)
  Human vision
    Target per eye 8k x 8k may be sufficient?
Challenge 2 – 3D Vision
Stereo VR Videos
  Effectively a video for each eye
  Parallax comes from camera positioning
  Packed vertically (left = top, right = bottom)
  Much stronger sense of presence
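With top-bottom packing, a player splits each decoded frame into its two halves before projecting one onto each eye's sphere. A minimal sketch, treating a frame as a plain list of rows (the function name is illustrative):

```python
def split_top_bottom(frame_rows):
    """Split a vertically packed stereo frame into (left_eye, right_eye).

    frame_rows is any sequence of rows; the top half is the left eye and the
    bottom half is the right eye, as described on the slide.
    """
    half = len(frame_rows) // 2
    return frame_rows[:half], frame_rows[half:2 * half]


frame = ["L0", "L1", "R0", "R1"]   # toy 4-row frame
left, right = split_top_bottom(frame)
```

Because both eyes share one video frame, each eye gets only half the vertical resolution of the encoded video.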
Stereo Vision
  Replace eyes by cameras
Stereo Vision
  Turn camera around head centre of rotation
Stitch and Project
  Add a camera top and bottom
  Stitch all the left eyes together
  Stitch all the right eyes together
Stereo Vision
Truth about 3D VR Video
  Creates a convincing sense of depth
  Increases sense of presence
This is good. Yay!
Truth about 3D VR Video
  Up and down are mono
    Unavoidably – look up, turn 90°, look up again
  Effective stereo separation varies with viewing angle
Truth about 3D VR Video
  No toe in
    Human eyes track together
    Don’t look straight forward
    This impacts all VR for now
Truth about 3D VR Video
  Camera is in a fixed position
    Don’t move your head
  Camera pairs have fixed separation and orientation
    Don’t roll/tip your head
Truth about 3D VR Video
  Camera positions fixed
    Position
    Roll
    IPD (based on view angle)
  Perfect when eyes aligned with camera
    Less perfect elsewhere
  More cameras and clever processing can improve
    Still limited by fixed view in each half of stereo video
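A simplified way to see why the effective IPD depends on view angle: for a single fixed camera pair, only the part of the baseline perpendicular to the line of sight contributes to stereo. The sketch below uses that cosine model, which is an assumption for illustration; real stitched rigs blend several pairs and behave in a more complicated way.

```python
import math


def effective_separation(baseline_mm, off_axis_deg):
    """Baseline component perpendicular to the viewing direction.

    off_axis_deg is the angle between the viewing direction and the direction
    the camera pair faces (simplified single-pair model, assumed here).
    """
    return baseline_mm * math.cos(math.radians(off_axis_deg))


aligned = effective_separation(64, 0)    # looking where the pair points
oblique = effective_separation(64, 60)   # looking 60 degrees off-axis
```

In this model the separation is the full baseline when the viewer is aligned with the camera pair and shrinks to nothing at 90° off-axis, which matches the slide's point that stereo is perfect only when eyes are aligned with the cameras.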
What can be done?
  Need more 3D information
    Depth and Occlusion
  Reconstruct view each frame
Reconstruction
  With depth and occlusion (geometry)
    Generate right eye from left
    Correct stereo for up and roll
  Reconstruct different positions and orientations
    Some head movement
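To illustrate "generate right eye from left", here is a toy sketch of depth-based view synthesis on a single scanline. It uses the standard pinhole relation disparity = focal length x baseline / depth and forward-warps each pixel. This is one common textbook approach, not necessarily what the speaker has in mind, and all names and numbers are hypothetical.

```python
def disparity_pixels(focal_px, baseline_m, depth_m):
    """Horizontal shift between the two eyes for a point at the given depth."""
    return focal_px * baseline_m / depth_m


def shift_row(row, depths, focal_px, baseline_m, hole=None):
    """Forward-warp one left-eye scanline into a right-eye scanline.

    Nearer pixels win when two land on the same target (occlusion); targets
    that nothing lands on stay as `hole` and would need filling in practice.
    """
    out = [hole] * len(row)
    out_depth = [float("inf")] * len(row)
    for x, (value, z) in enumerate(zip(row, depths)):
        # In the right eye, points appear shifted left by their disparity.
        target = x - round(disparity_pixels(focal_px, baseline_m, z))
        if 0 <= target < len(row) and z < out_depth[target]:
            out[target] = value
            out_depth[target] = z
    return out


# Toy scanline: pixel "c" is a near object (1 m), the rest are far (10 m).
right = shift_row(["a", "b", "c", "d"], [10, 10, 1, 10], focal_px=10, baseline_m=0.1)
```

In the toy example the near pixel moves one place left and covers a far pixel, leaving a gap behind it that the left eye never saw. That gap is the occlusion information the slides say is needed in addition to depth.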
Practical?
  Challenging computer vision problem
    Probably not full-scene in real-time yet
  Multiple inward facing cameras
    Motion capture suites
  Potentially laser scan fixed scene in advance
    Capture foreground objects live
  Examples from HoloLens, 8i and others
  Specular lighting difficult to reconstruct
Teleportation?
2021