technical challenges and opportunities for live vr

Technical Challenges & Opportunities for Live VR
JULES DAVIS – CTO FOCAL POINT VR

What is VR for?

VR Goal: Teleportation?

Video: Teleportation to Reality

Live Video: Presence at an Event

What is Presence?

Wikipedia: It is defined as a person's subjective sensation of being there in a scene depicted by a medium

Michael Abrash: “Presence is VR Magic…it engages you at a deeper, more visceral level than any other form of entertainment”

Presence Requirements

Feature                    VR Today      Human Perception
Field of View (per eye)    ~80° x 90°    160° x 130°
Acuity (pixels / degree)   12 - 18       ~60 (and True HD)
Resolution (per eye)       ~1k x 1k      ~10k x 8k
Refresh Rate               90 Hz         120 Hz ?
Tracking / latency         5 - 20 ms     4 ms ?

Source: Michael Abrash at Steam Dev Days 2014
http://media.steampowered.com/apps/abrashblog/Abrash%20Dev%20Days%202014.pdf

Video Mechanics - Capture

Samsung Beyond iZugur Z63DC

Google Jump / GoPro Odyssey

Capture

Left Eye Only

12 Camera GoPro Rig
5 pairs horizontally
1 up and 1 down

Stitching / Projection
Stitch images together to map onto a sphere surrounding the viewer
Just like map projection in geography
Most common is equirectangular projection
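As a rough illustration of the equirectangular mapping, the Python sketch below converts a viewing direction into pixel coordinates in the stitched image. The axis convention (y up, -z straight ahead) and the function name are assumptions made for this example, not something the slides specify.

```python
import math

def direction_to_equirect(x, y, z, width, height):
    """Map a 3D view direction to (column, row) in an equirectangular image.

    Assumed convention (not from the slides): y is up, -z is straight ahead,
    longitude 0 lands in the centre column and row 0 is straight up.
    """
    length = math.sqrt(x * x + y * y + z * z)
    lon = math.atan2(x, -z)        # -pi..pi, 0 = straight ahead
    lat = math.asin(y / length)    # -pi/2..pi/2, positive = up
    col = (lon / (2 * math.pi) + 0.5) * width
    row = (0.5 - lat / math.pi) * height
    return col, row

# Looking straight ahead lands in the centre of a 4096 x 2048 frame.
centre = direction_to_equirect(0, 0, -1, 4096, 2048)
```

Longitude maps linearly to columns and latitude linearly to rows, which is why a single point at the pole is stretched across the whole top row.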

Stitching / Projection

Broadcast and Playback
Upload stitched video to cloud
Download or stream video to headset
Project video onto a sphere and render it in the headset

Live Virtual Reality Video

Broadcast: VR camera -> Video Processor -> Cloud -> VR Player

Market changing fast

Capture
  Huge variety of cameras
  No camera meets all needs
  NextVR, Nokia, Samsung, GoPro, Ricoh, Kodak, Sphericam, Vuze

Stitching and Projection
  Some cameras have it built in
  VideoStitch have Vahana

Broadcast
  YouTube, Facebook and many video streaming companies

Videos Today
Max resolution 4k x 2k video
Mix of mono and stereo
Almost all using equirectangular

Challenges

Many choices
Capture quality
Dynamic range
Resolution / Bandwidth
Head Movement
Stereo Quality
…

Challenge 1 – Resolution and Bandwidth

Resolution / Bandwidth
4k video is normally 3840 x 2160 x 8 bit
  H.264 good quality: 18 Mbps
Bandwidth for 1 hour of video at 18 Mbps
  60 * 60 * 18 / 8 = 8,100 megabytes, roughly 8 gigabytes
For 100,000 viewers
  8 GB * 100,000 = 800 terabytes
Bandwidth might be 5p per GB
  Cost = 0.05 * 800,000 = £40,000
20x cost of equivalent SD broadcast (4x 1080p)
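The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the function name is made up for the example, and the 5p per GB price is the slide's own guess rather than a quoted tariff.

```python
def broadcast_cost(bitrate_mbps, hours, viewers, pence_per_gb=5):
    """Return (GB per viewer, total GB, total cost in pounds)."""
    seconds = hours * 60 * 60
    # megabits -> megabytes (divide by 8) -> gigabytes (divide by 1000)
    gb_per_viewer = bitrate_mbps * seconds / 8 / 1000
    total_gb = gb_per_viewer * viewers
    cost_pounds = total_gb * pence_per_gb / 100
    return gb_per_viewer, total_gb, cost_pounds

per_viewer_gb, total_gb, cost_pounds = broadcast_cost(18, 1, 100_000)
print(f"{per_viewer_gb:.1f} GB per viewer, {total_gb / 1000:.0f} TB total, £{cost_pounds:,.0f}")
```

Without rounding 8.1 GB down to 8 GB the total comes out slightly above the slide's £40,000, which does not change the conclusion.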

Target is Headset Resolution
Gear VR has highest pixel density
  H.FoV = 72.9° & H.Res = 1280
  ~17.5 pixels per degree
Target resolution ~6.3k x 3.2k per eye
  Many H.264 codecs won’t handle this
4K video on Gear VR gives
  ~10.5 pixels per degree horizontally
  ~5.4 vertically
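A small Python sketch of the pixel density reasoning, using the Gear VR figures quoted on the slide. The helper names are illustrative only, and the full-sphere target simply assumes the same density is wanted over 360° by 180°.

```python
def pixels_per_degree(pixels, fov_degrees):
    """Average pixel density across a field of view."""
    return pixels / fov_degrees

def full_sphere_resolution(ppd):
    """Equirectangular size needed to hold `ppd` across 360 x 180 degrees."""
    return round(ppd * 360), round(ppd * 180)

gear_vr_ppd = pixels_per_degree(1280, 72.9)        # headset panel
target_w, target_h = full_sphere_resolution(gear_vr_ppd)
video_ppd = pixels_per_degree(3840, 360)           # 4k video wrapped around 360 degrees

print(f"headset: {gear_vr_ppd:.1f} px/deg, target: {target_w} x {target_h}")
print(f"4k video: {video_ppd:.1f} px/deg horizontally")
```

The results land close to the slide's rounded figures: about 17.5 pixels per degree on the headset against roughly 10.5 from a 4k source.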

Resolution / Bandwidth

Simulation of 4k video displayed on native Gear VR

Technical Challenge #1: Bandwidth / Resolution

Native headset resolution video in stereo
Equivalent quality to 18 Mbps 4K video
But at much lower bandwidth – ideally 3-4 Mbps

Look for Redundancy
Native resolution 17.5 pixels per degree
Equirectangular texture = 6.3k x 3.2k x 2
Notice how stretched it is at poles

Look for redundancy – Projection
Native resolution for Gear VR
  6k x 3k x 2 => ~40 Megapixels for stereo pair
Actual number of pixels needed is much less
  Surface of a sphere with circumference 6k (res² / π)
  24.5 Megapixels (~60% extra pixels wasted)
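The "res² / π" figure is the surface area of a sphere whose circumference equals the image width. The sketch below compares it with the equirectangular pixel count; the function names and the 6300 pixel width are illustrative choices.

```python
import math

def equirect_pixels(circumference_px):
    """Equirectangular image: full circumference wide, half as tall."""
    return circumference_px * (circumference_px / 2)

def sphere_pixels(circumference_px):
    """Area of a sphere with that circumference: 4*pi*r^2 with r = C / (2*pi)."""
    return circumference_px ** 2 / math.pi

c = 6300
overhead = equirect_pixels(c) / sphere_pixels(c) - 1
print(f"stereo equirectangular: {2 * equirect_pixels(c) / 1e6:.1f} MP")
print(f"stereo sphere surface:  {2 * sphere_pixels(c) / 1e6:.1f} MP")
print(f"overhead: {overhead:.0%}")
```

The overhead works out to π/2 - 1 whatever the resolution, which is where the slide's figure of roughly 60% comes from.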

Why use equirectangular?
Pros
  Plenty of software out there to generate it
  Fairly simple to render
  Creates one continuous rectangular array
  Simple for highly optimised video codecs
Cons
  Requires 60% extra pixels to achieve equivalent quality
  Big distortion – straight lines become curves
    Video codecs optimised for straight lines
    Rendering artefacts caused by non-linearity

Are there alternatives?
Cube-maps?
  + Minimal distortion – straight lines stay straight
  + Hardware accelerated rendering
  - nearly 2x pixels of ideal minimum
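One way to arrive at "nearly 2x" for cube-maps is sketched below. It assumes each cube face is sampled uniformly and that the pixel density at the centre of a face matches the density wanted on the sphere; that is an assumption for this example and other sizing rules give different ratios.

```python
import math

def sphere_pixels(circumference_px):
    """Ideal pixel count: area of a sphere with that circumference."""
    return circumference_px ** 2 / math.pi

def cubemap_pixels(circumference_px):
    """Six square faces, each matching the sphere's density at its centre."""
    px_per_radian = circumference_px / (2 * math.pi)
    # A face spans +/-45 degrees, i.e. tan(45 deg) either side of its centre.
    face_width = 2 * math.tan(math.radians(45)) * px_per_radian
    return 6 * face_width ** 2

ratio = cubemap_pixels(6300) / sphere_pixels(6300)
print(f"cube-map uses {ratio:.2f}x the ideal pixel count")
```

Under these assumptions the ratio is 6/π, a little over 1.9, because the face corners are sampled more densely than the centres.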

Pyramids?
  Facebook have blogged about pyramids
  Cube-maps in disguise
  5 planar projections instead of 6
  Compress more efficiently
Problem is as old as astronomy

Optimise Equirectangular?
Too much horizontal resolution at poles
  Horizontal resolution is 2x or more what is needed above 60 degrees
Chop the top and bottom off and halve their width

Optimise Equirectangular
Halve width of polar regions
  Removes 30/180 of image => 5/6 * 40 = ~33 Megapixels
Now we’re only 35% worse than ideal
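The saving from halving the polar bands can be sketched as follows. The function and parameter names are hypothetical; the 40 and 24.5 megapixel figures are the rounded values from the earlier slides.

```python
def trimmed_equirect_fraction(cap_start_deg=60, cap_width_scale=0.5):
    """Fraction of the original pixels left after narrowing both polar bands."""
    # Rows above +cap_start_deg and below -cap_start_deg, as a share of 180 degrees.
    cap_rows = 2 * (90 - cap_start_deg) / 180
    return (1 - cap_rows) + cap_rows * cap_width_scale

fraction = trimmed_equirect_fraction()
trimmed_mp = fraction * 40          # ~40 MP stereo equirectangular
excess = trimmed_mp / 24.5 - 1      # compared with ~24.5 MP sphere surface
print(f"{fraction:.3f} of the pixels, {trimmed_mp:.1f} MP, {excess:.0%} above ideal")
```

The two bands cover 60 of the 180 degrees of latitude, so halving their width removes 30/180 of the image and leaves 5/6 of the pixels.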

General lesson
We can divide sphere into regions
Change projection and resolution

Can we do better?
Divide into multiple regions
Remove the downward region
Vary resolution
  Base on projection
  And area of interest

Other Options
Lots of redundancy between left / right eye
  Stereo aware compression as in 3D movies
  Reduction can be as much as 60%
Viewer often cares about one direction much more than another
  Broadcast of this event, screens and speaker more important
  Give them more bandwidth
  Reduce resolution of off directions or reduce codec quality
Send area around direction user is looking
  Minimise switching latency
Better codecs
  H.265/HEVC – 50% if you’re lucky
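To make the idea of giving the important direction more bandwidth concrete, here is a purely hypothetical Python sketch that splits a bitrate budget across yaw tiles. The tiling, the linear falloff and the `floor_weight` parameter are all invented for illustration; the slides do not describe a specific scheme.

```python
def allocate_bitrate(total_mbps, tile_yaws_deg, interest_yaw_deg, floor_weight=0.2):
    """Split a bitrate budget across tiles, favouring the direction of interest."""
    weights = []
    for yaw in tile_yaws_deg:
        # Smallest angle between the tile centre and the direction of interest (0..180).
        diff = abs((yaw - interest_yaw_deg + 180) % 360 - 180)
        # Linear falloff from 1.0 (facing the interest) to floor_weight (directly behind).
        weights.append(floor_weight + (1 - floor_weight) * (1 - diff / 180))
    total_weight = sum(weights)
    return [total_mbps * w / total_weight for w in weights]

# Four 90-degree tiles, stage at yaw 0, 4 Mbps budget.
rates = allocate_bitrate(4, [0, 90, 180, 270], 0)
```

Tiles behind the viewer still get some bitrate, so turning round shows a lower quality picture rather than nothing while the stream catches up.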

The Future
This year: 1k x 1k per eye
3 years: 2k x 2k per eye (4k screens here now)
5+ years: 4k x 4k per eye (wider field of view?)
Human vision target: 8k x 8k per eye may be sufficient?

Challenge 2 – 3D Vision

Stereo VR Videos
Effectively a video for each eye
  Parallax comes from camera positioning
  Packed vertically (left = top, right = bottom)
Much stronger sense of presence
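A minimal sketch of unpacking the vertical (top-bottom) layout, assuming a decoded frame is available as a list of pixel rows. The function name and the frame representation are assumptions for this example.

```python
def split_top_bottom(frame_rows):
    """Split a top-bottom packed stereo frame into (left_eye, right_eye) rows."""
    if len(frame_rows) % 2 != 0:
        raise ValueError("packed frame must have an even number of rows")
    half = len(frame_rows) // 2
    return frame_rows[:half], frame_rows[half:]   # left = top, right = bottom

# Tiny 4-row stand-in for a decoded frame.
left_eye, right_eye = split_top_bottom(["L0", "L1", "R0", "R1"])
```

Each eye only gets half the vertical resolution of the packed frame, which is part of why stereo makes the bandwidth problem worse.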

Stereo Vision
Replace eyes with cameras

Stereo Vision
Turn cameras around the head's centre of rotation

Stitch and Project

Add a camera top and bottom
Stitch all the left eyes together
Stitch all the right eyes together

Stereo Vision

Truth about 3D VR Video

Creates a convincing sense of depth Increases sense of presence

This is good. Yay!

Truth about 3D VR Video
Up and down are mono
  Unavoidably – look up, turn 90°, look up again
Effective stereo separation varies with viewing angle
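A simplified geometric model of why separation varies with viewing angle: for one fixed camera pair, only the part of the baseline perpendicular to the viewing direction contributes to stereo. The sketch assumes a single pair, rotation about the vertical axis only, and a 64 mm separation chosen purely for illustration.

```python
import math

def effective_separation_mm(camera_separation_mm, view_offset_deg):
    """Baseline seen from a direction rotated away from the pair's forward axis."""
    return camera_separation_mm * math.cos(math.radians(view_offset_deg))

# With 5 pairs spaced 72 degrees apart, a view is at most 36 degrees off a pair's axis.
for offset in (0, 18, 36):
    print(offset, round(effective_separation_mm(64, offset), 1))
```

Real rigs blend between neighbouring pairs during stitching, so this only shows the trend and not the exact separation a viewer experiences.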

Truth about 3D VR Video
No toe in
  Human eyes track together
  Don’t look straight forward
This impacts all VR for now

Truth about 3D VR Video

Camera is in a fixed position
  Don’t move your head
Camera pairs have fixed separation and orientation
  Don’t roll/tip your head

Truth about 3D VR Video
Camera positions fixed
  Position
  Roll
  IPD (based on view angle)
Perfect when eyes aligned with camera
  Less perfect elsewhere
More cameras and clever processing can improve
  Still limited by fixed view in each half of stereo video

What can be done?
Need more 3D information
  Depth and Occlusion
Reconstruct view each frame

Reconstruction
With depth and occlusion (geometry)
  Generate right eye from left
  Correct stereo for up and roll
Reconstruct different positions and orientations
  Some head movement
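As a toy illustration of generating the right eye from the left using depth, the sketch below warps one scanline with the standard pinhole stereo relation disparity = focal length x baseline / depth. It is a swapped-in textbook technique, not the method used by any system named in the talk, and the function name, parameters and one-row data are hypothetical.

```python
def render_right_eye_row(left_row, depth_row, focal_px, baseline_m):
    """Forward-warp one scanline of the left view into the right view.

    Returns a row in which None marks holes: areas the right eye can see
    but the left camera could not (occlusion).
    """
    width = len(left_row)
    right_row = [None] * width
    nearest = [float("inf")] * width           # depth buffer: closest surface wins
    for x, (colour, depth) in enumerate(zip(left_row, depth_row)):
        disparity = focal_px * baseline_m / depth
        target = round(x - disparity)          # nearer points shift further left
        if 0 <= target < width and depth < nearest[target]:
            right_row[target] = colour
            nearest[target] = depth
    return right_row

# A near object ("c", 6 m) in front of a distant background (1000 m).
row = render_right_eye_row(["a", "b", "c", "d"], [1000, 1000, 6, 1000],
                           focal_px=100, baseline_m=0.06)
```

The hole left behind the shifted foreground pixel is exactly the occlusion information the slide says is needed; a real system has to fill it from other cameras or earlier frames.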

Practical?
Challenging computer vision problem
  Probably not full-scene in real-time yet
Multiple inward facing cameras
  Motion capture suites
Potentially laser scan fixed scene in advance
  Capture foreground objects live
Examples from HoloLens, 8i and others
Specular lighting difficult to reconstruct

Teleportation?

2021
