© Copyright Khronos Group 2014 - Page 1
GPU Acceleration for the Web State of the Union
Neil Trevett Khronos President
NVIDIA Vice President Mobile Content
Khronos Connects Software to Silicon
Open Consortium creating ROYALTY-FREE, OPEN STANDARD APIs for hardware acceleration
Defining the roadmap for low-level silicon interfaces needed on every platform
Graphics, compute, rich media, vision, sensor and camera processing
Rigorous specifications AND conformance tests for cross-vendor portability
Acceleration APIs BY the Industry FOR the Industry
Well over a BILLION people use Khronos APIs Every Day…
http://accelerateyourworld.org/
Mobile Web is a Real Time Application
Buttery smooth touch interaction needs continuous 60Hz updates
Device            Resolution   Pixels   DPI
Apple iPhone      320x480      153K     163
Apple iPad        1024x768     786K     132
Apple iPad Mini   2048x1536    3100K    326
In 5 years the number of pixels to process on mobile screens has gone up by a factor of TWENTY
Need GPU Acceleration for Web Rendering!
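The factor-of-twenty figure can be checked from the resolutions quoted on this slide. The sketch below is a minimal illustration rather than part of the original talk; the helper names are made up for the example, and the per-second figure simply assumes every pixel is redrawn on each 60Hz update.

```python
# Screen resolutions quoted on the slide (width, height in pixels).
SCREENS = {
    "Apple iPhone": (320, 480),
    "Apple iPad": (1024, 768),
    "Apple iPad Mini": (2048, 1536),
}

REFRESH_HZ = 60  # update rate the slide asks for


def pixel_count(width: int, height: int) -> int:
    """Total pixels on a screen of the given resolution."""
    return width * height


def pixels_per_second(width: int, height: int, hz: int = REFRESH_HZ) -> int:
    """Pixels to produce each second if the whole screen is redrawn every frame."""
    return pixel_count(width, height) * hz


first = pixel_count(*SCREENS["Apple iPhone"])
latest = pixel_count(*SCREENS["Apple iPad Mini"])
growth = latest / first               # ratio of largest to smallest screen
frame_budget_ms = 1000 / REFRESH_HZ   # time available per frame at 60Hz

for name, (w, h) in SCREENS.items():
    print(f"{name}: {pixel_count(w, h):,} pixels, "
          f"{pixels_per_second(w, h):,} pixels/s at {REFRESH_HZ} Hz")
print(f"growth factor: {growth:.1f}x, frame budget: {frame_budget_ms:.1f} ms")
```

The growth factor comes out slightly above twenty, and the frame budget at 60Hz is a little under 17 ms regardless of how many pixels have to be filled.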
Access to 3D on Over 2 BILLION Devices
300M Desktops / year
1.9B Mobiles / year
1B Browsers / year
Source: Gartner (December 2013)
Pervasive WebGL
http://caniuse.com/#feat=webgl
WebGL on EVERY major desktop browser
And coming to ALL major mobile browsers
Portable (NO source change) 3D applications are possible for the first time
WebGL Tool/Engine Ecosystem
https://www.youtube.com/watch?v=l9KRBuVBjVo
Epic Citadel - WebGL HTML 5 Benchmark (Firefox 22)
WebGL on Mobile
http://crypt-webgl.unigine.com/
Unigine Engine Demo
WebGL Roadmap
OpenGL ES feature timeline underlying WebGL:
  2003  OpenGL ES 1.0  Fixed function pipeline
  2004  OpenGL ES 1.1  Fixed function pipeline (driver update)
  2007  OpenGL ES 2.0  Programmable shaders (silicon update)
  2012  OpenGL ES 3.0  32-bit integers and floats, NPOT, 3D/depth textures, texture arrays, multiple render targets (silicon update)
  2014  OpenGL ES 3.1  Compute shaders (driver update)
WebGL 1.0: 2011
WebGL 2.0: under development, open review at http://www.khronos.org/registry/webgl/specs/latest/2.0/
glTF - Transmitting 3D Assets to WebGL Apps
• ‘GL Transmission Format’
  - Runtime asset format for WebGL, OpenGL ES, and OpenGL applications
• Efficient Representation = Small Size AND Minimal Load Processing
  - JSON for scene structure and other high-level constructs
  - Binary mesh and animation data
  - Little or no processing to drop glTF data into client application
• Runtime Neutral
  - Can be created and used by any app or runtime
• Khronos is prototyping a standards-based pipeline
  - Conditioning of COLLADA assets into glTF for WebGL applications
[Diagram labels: Authoring, Playback]
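The split between JSON structure and binary payload can be illustrated with a toy asset. This is a sketch of the general idea only: the field names (`nodes`, `meshes`, `byteOffset`, `byteLength`, `componentType`, `vertexCount`) and the `load_mesh` helper are hypothetical and are not taken from the glTF schema.

```python
import json
import struct

# Hypothetical asset: one triangle. Field names below are illustrative only
# and are NOT taken from the glTF schema.
positions = [0.0, 0.0, 0.0,  1.0, 0.0, 0.0,  0.0, 1.0, 0.0]

# Binary part: tightly packed little-endian float32 values, in a layout that
# could be copied into a vertex buffer without per-value text parsing.
mesh_bytes = struct.pack(f"<{len(positions)}f", *positions)

# JSON part: scene structure that points into the binary blob.
scene_json = json.dumps({
    "nodes": [{"name": "triangle", "mesh": 0}],
    "meshes": [{"byteOffset": 0, "byteLength": len(mesh_bytes),
                "componentType": "float32", "vertexCount": 3}],
})


def load_mesh(scene_text: str, blob: bytes, mesh_index: int) -> list:
    """Read one mesh's vertex data back using only offsets from the JSON."""
    mesh = json.loads(scene_text)["meshes"][mesh_index]
    start = mesh["byteOffset"]
    end = start + mesh["byteLength"]
    count = mesh["byteLength"] // 4  # 4 bytes per float32
    return list(struct.unpack(f"<{count}f", blob[start:end]))


loaded = load_mesh(scene_json, mesh_bytes, 0)
print(len(mesh_bytes), "bytes of binary mesh data;", len(loaded), "floats loaded")
```

The loader only slices the blob at the offsets named in the JSON, which is the sense in which little or no processing is needed on the client side.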
COLLADA and glTF Ecosystem
Tool Interop
Three.js glTF Importer. Rest3D initiative
COLLADA2GLTF Translator
OpenCOLLADA Importer/Exporter and COLLADA Conformance Tests on GitHub
Pervasive WebGL deployment
Other authoring formats
Web-based Tools
Open Source Resources for glTF
• COLLADA2GLTF open-source converter is gaining robustness and momentum
  - https://github.com/KhronosGroup/glTF/tree/master/converter/COLLADA2GLTF
  - Binaries are available on GitHub for easy use
• Three.js glTF loader
  - https://github.com/KhronosGroup/glTF/tree/master/loaders/threejs
  - Most glTF features are already supported
• Open specification; Open process
  - Spec and sample code: https://github.com/KhronosGroup/glTF
  - All features backed up by multiple implementations in code
  - glTF 0.8 schema available - getting very close to glTF 1.0!
• Converter using Open3DGC to compress 3D Meshes, Skinning, Animations
  - Available at https://github.com/fabrobinet/glTF-webgl-viewer
glTF and Compression Extension
• Benchmarking 3D compression formats for implementation as glTF extensions
  - Baseline is GZIP
  - MPEG royalty-free Scalable Complexity 3D Mesh Compression codec (MPEG-SC3DMC)
  - Open3DGC JavaScript and C/C++ implementation
  - WebGL-loader is Google's lightweight compression format for WebGL content

Format                CAD Models    3D Scanned Models   MPEG dataset
                      (Mbytes)      (Mbytes)            (Mbytes)
OBJ                   1310 (100%)   736 (100%)          600 (100%)
Gzip                  336 (26%)     204 (28%)           157 (26%)
Webgl-loader          219 (17%)     117 (16%)           103 (17%)
Open3DGC              67 (5%)       22 (3%)             22 (4%)
Webgl-loader + Gzip   80 (6%)       38 (5%)             26 (4%)
Open3DGC is 5x-9x more efficient than Gzip and 1.2x-1.5x more efficient than webgl-loader
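The ratios behind this summary can be recomputed from the table. The sketch below copies the sizes from the slide; the helper names are made up for the example. The second comparison assumes the "webgl-loader" figure in the summary refers to the Webgl-loader + Gzip row, which the slide does not state.

```python
# Sizes in Mbytes copied from the table: (CAD, 3D scanned, MPEG dataset).
SIZES = {
    "OBJ": (1310, 736, 600),
    "Gzip": (336, 204, 157),
    "Webgl-loader": (219, 117, 103),
    "Open3DGC": (67, 22, 22),
    "Webgl-loader + Gzip": (80, 38, 26),
}


def size_ratio(baseline: str, candidate: str) -> list:
    """How many times smaller `candidate` is than `baseline`, per dataset."""
    return [round(b / c, 2) for b, c in zip(SIZES[baseline], SIZES[candidate])]


def percent_of_obj(fmt: str) -> list:
    """Size of `fmt` as a rounded percentage of the uncompressed OBJ size."""
    return [round(100 * s / o) for s, o in zip(SIZES[fmt], SIZES["OBJ"])]


vs_gzip = size_ratio("Gzip", "Open3DGC")
# Assumption: the slide's webgl-loader comparison uses the "+ Gzip" row.
vs_webgl_loader = size_ratio("Webgl-loader + Gzip", "Open3DGC")

print("Open3DGC vs Gzip:", vs_gzip)
print("Open3DGC vs Webgl-loader + Gzip:", vs_webgl_loader)
for fmt in SIZES:
    print(fmt, percent_of_obj(fmt))
```

Printing the per-dataset ratios side by side makes it easy to compare them with the ranges quoted in the summary line.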
Motivation for WebCL
• Parallel acceleration for compute-intensive web applications
  - Portable and efficient access to heterogeneous multicore devices in JavaScript
• Typical Use Cases
  - 3D asset codecs, video codecs and processing, imaging and vision processing
  - Physics for WebGL games, Online data visualization, Augmented Reality
• WebCL 1.0 specification officially released at GDC March 2014
  - https://www.khronos.org/webcl
http://www.youtube.com/user/SamsungSISA#p/a/u/1/9Ttux1A-Nuc
WebCL - Heterogeneous Computing for Web
• OpenCL = Two APIs and C-based Kernel language
  - Platform Layer API to query, select and initialize compute devices
  - Kernel language - Subset of ISO C99 + language extensions
  - C Runtime API to build and execute kernels across multiple devices
• WebCL defines JavaScript binding to the OpenCL APIs
  - Enables initiation of OpenCL C Kernels from within the browser
[Diagram: a JavaScript Platform API (to query, select and initialize compute devices) and a JavaScript Runtime API (to build and execute kernels across multiple devices) dispatch OpenCL C kernel code to GPU, DSP and CPU hardware]
WebGL/WebCL Ecosystem
• Content (JavaScript, HTML, CSS, ...)
  - Content downloaded from the Web
• JavaScript Middleware
  - Middleware can make WebGL and WebCL accessible to non-expert programmers
  - E.g. three.js library (http://threejs.org/), used by the majority of WebGL content
• Browser (JavaScript, HTML5 / CSS)
  - Browser provides WebGL and WebCL alongside other HTML5 technologies
  - No plug-in required
• OS Provided Drivers
  - WebGL uses OpenGL ES 2.0, or ANGLE for OpenGL ES 2.0 over DX9
  - WebCL uses OpenCL 1.X
Low-level APIs provide a powerful foundation for a rich JavaScript middleware ecosystem
WebCL - Designed-in Architectural Security
• Leverages OpenCL 1.2 robustness/security extensions
  - Context Termination: to prevent DoS from long running kernels
  - Memory Initialization: no leakage from out of bounds memory access
• API and Language Restrictions to ensure no unsafe behavior is possible
  - Structures are not supported as kernel arguments
  - Kernel names must be less than 256 characters
  - Mapping of CL memory objects into host memory space is not supported
  - Program binaries are not supported
  - Some OpenCL API functions & builtin functions may require translation
• WebCL Kernel Validator
  - Open source on GitHub
  - Static and dynamic kernel checking
  - Verifies memory accesses are inside valid memory areas
  - Run-time checks injected in code if necessary
  - https://github.com/KhronosGroup/webcl-validator
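Two of these restrictions are easy to picture in code: the kernel name length limit and run-time bounds checks on memory accesses. The sketch below is only an illustration of the idea and is not how the WebCL Kernel Validator is implemented; `is_valid_kernel_name` and `CheckedBuffer` are hypothetical names, and the policy for out-of-bounds accesses (reads return zero, writes are dropped) is an assumption made for the example.

```python
MAX_KERNEL_NAME_LENGTH = 256  # slide: names must be less than 256 characters


def is_valid_kernel_name(name: str) -> bool:
    """Static check: reject kernel names that reach the length limit."""
    return len(name) < MAX_KERNEL_NAME_LENGTH


class CheckedBuffer:
    """Toy stand-in for a kernel-visible memory region (hypothetical helper)."""

    def __init__(self, size: int) -> None:
        # Memory initialization: start from zeros so stale data cannot leak.
        self._data = [0] * size

    def _in_bounds(self, index: int) -> bool:
        return 0 <= index < len(self._data)

    def load(self, index: int) -> int:
        # Run-time check: an out-of-bounds read yields 0 (assumed policy).
        return self._data[index] if self._in_bounds(index) else 0

    def store(self, index: int, value: int) -> None:
        # Run-time check: an out-of-bounds write is dropped (assumed policy).
        if self._in_bounds(index):
            self._data[index] = value


buf = CheckedBuffer(4)
buf.store(2, 7)    # in bounds: stored
buf.store(10, 99)  # out of bounds: ignored
print(buf.load(2), buf.load(10), buf.load(-1))
print(is_valid_kernel_name("vector_add"), is_valid_kernel_name("k" * 256))
```

Either check keeps a misbehaving kernel from reading or writing memory it does not own, which is the property the slide describes.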
WebCL Open Source Resources
• Implementations
  - Nokia - Firefox extension (Mozilla Public License 2.0)
    https://github.com/toaarnio/webcl-firefox
  - Samsung - WebKit (BSD)
    https://github.com/SRA-SiliconValley/webkit-webcl
  - Motorola - Uses Node.js (BSD)
    https://github.com/Motorola-Mobility/node-webcl
  - AMD - Chromium build
    https://github.com/amd/Chromium-WebCL
• WebCL Kernel Validator (open source)
  - https://github.com/KhronosGroup/webcl-validator
• OpenCL to WebCL Translator
  - https://github.com/wolfviking0/webcl-translator
• WebCL Conformance Tests
  - https://github.com/KhronosGroup/WebCL-conformance/
http://fract.ured.me/
Based on Iñigo Quilez, Shader Toy
Based on Apple QJulia
Based on Iñigo Quilez, Shader Toy
© 2014 NVIDIA - Page 19
Path Rendering Acceleration
Offload the CPU so the application can run as fast as possible
Make maximum use of the GPU for best performance and power
• CPU creates paths, CPU renders paths
  - Software scanline renderers can be high quality and portable
  - CPU has to process the complete pipeline, stealing cycles from the application
  - Software rendering limits performance
• CPU creates paths, CPU tessellates paths into polygons, GPU uses standard 3D commands to process polygons
  - Tessellation loads the CPU, stealing cycles from the application, so perf is sometimes slower than software alone
  - Tessellation consumes a lot of data and memory bandwidth = power
  - Quality can be compromised due to tessellation accuracy
• CPU creates paths, GPU processes paths directly: define new OpenGL path commands
  - Maximum CPU offload
  - Compact data format sent to GPU renderer
  - GPU provides excellent performance and power
  - GPU can increase quality and functionality
NV_path_rendering OpenGL Extension
Brings Path processing directly to OpenGL
  - OpenGL processes paths as a fundamental primitive
  - No tessellation necessary
Goals
  - Functionally complete for key standards: SVG, Canvas, PostScript etc.
  - Much faster, often 4x to 100x faster than CPUs
  - Enhanced quality: can avoid approximations needed by CPU renderers
  - Lower power by leveraging GPU hardware
  - New functionality, e.g. mix 2D paths with 3D and programmable shading
Stencil then Cover Approach
Create a path object and pass directly to the GPU
  - Cubic & quadratic Bezier segments, line segments, partial elliptical arcs
Step 1: Stencil
  - GPU “Stencils” the path object into the stencil buffer
  - GPU provides massively parallel stenciling of filled or stroked paths
  - Calculate winding rule or containment at every sub-pixel sample in parallel
Step 2: Cover
  - “Cover” the path object and stencil test against its coverage
  - Test against path coverage determined in the 1st step and shade the path
Repeat steps 1 and 2 for each path
Uses GPU MSAA anti-aliasing
  - 8 or 16 samples/pixel gives good quality
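The two steps can be modelled on the CPU to make the data flow concrete. The sketch below is a simplified illustration, not the NV_path_rendering implementation: it handles line-segment paths only, uses a regular 4x4 sample grid to get 16 samples per pixel, applies a non-zero fill rule, and clears the stencil during the cover step. All function names are made up for the example.

```python
import math


def winding_number(px, py, path):
    """Signed number of times the closed polygon `path` winds around (px, py)."""
    wn = 0
    for i in range(len(path)):
        x0, y0 = path[i]
        x1, y1 = path[(i + 1) % len(path)]
        # Positive when the point lies to the left of the directed edge.
        is_left = (x1 - x0) * (py - y0) - (px - x0) * (y1 - y0)
        if y0 <= py < y1 and is_left > 0:      # upward crossing
            wn += 1
        elif y1 <= py < y0 and is_left < 0:    # downward crossing
            wn -= 1
    return wn


def sample_offsets(samples_per_axis):
    """Regular grid of sub-pixel sample positions inside a unit pixel."""
    step = 1.0 / samples_per_axis
    return [((i + 0.5) * step, (j + 0.5) * step)
            for j in range(samples_per_axis) for i in range(samples_per_axis)]


def stencil_path(path, width, height, samples_per_axis=4):
    """Step 1: record the winding number of every sample in a stencil buffer."""
    offsets = sample_offsets(samples_per_axis)
    return [[[winding_number(x + ox, y + oy, path) for ox, oy in offsets]
             for x in range(width)] for y in range(height)]


def cover_path(stencil, path, color=1.0):
    """Step 2: shade bounding-box pixels whose samples pass the stencil test."""
    height, width = len(stencil), len(stencil[0])
    image = [[0.0] * width for _ in range(height)]
    xs, ys = [p[0] for p in path], [p[1] for p in path]
    x_lo, x_hi = max(0, math.floor(min(xs))), min(width, math.ceil(max(xs)))
    y_lo, y_hi = max(0, math.floor(min(ys))), min(height, math.ceil(max(ys)))
    for y in range(y_lo, y_hi):
        for x in range(x_lo, x_hi):
            samples = stencil[y][x]
            passed = sum(1 for w in samples if w != 0)  # non-zero fill rule
            image[y][x] = color * passed / len(samples)
            stencil[y][x] = [0] * len(samples)  # reset for the next path
    return image


# A square path deliberately offset by half a pixel to show partial coverage.
square = [(0.5, 0.5), (2.5, 0.5), (2.5, 2.5), (0.5, 2.5)]
stencil = stencil_path(square, width=4, height=4)
image = cover_path(stencil, square)
for row in image:
    print(row)
```

Pixels fully inside the square end up fully shaded, edge pixels get fractional coverage from the share of samples that pass the stencil test, and the stencil is left clear so the same two steps can be repeated for the next path.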
Enhanced Quality on GPU
Eliminate Conflation Artifacts
  - Conflation artifacts (color bleeding) on CPU, conflation free on GPU
  - Multiple color AND stencil samples per pixel
Stroking approximations avoided by GPU
  - [Image comparison: Cairo, NV_path_rendering, Skia; annotations: "feathers?", "weird big holes"]
GPU Offers Jittered Sampling for Free
  - Regular grid on CPU: sub-optimal antialiasing
  - Jitter pattern on GPU for better antialiasing
Proper gradient filtering on GPU
  - [Image comparison: GPU, Qt, Cairo; annotation: Moiré artifacts, similar for Qt & Skia]
  - GPUs great at texturing: mip-mapping, anisotropic filtering, wrap modes
New GPU Functionality
• Programmable Shading
  - Paint in GLSL, for filter and blending acceleration
  - Example: light source position for bump mapping
• Projective Transformation
• Fast Arbitrary Path Clipping
• Mixing depth tested Text, 3D, and Paths
• Fully sRGB Correct Rendering
  - Linear RGB: transition between saturated red and saturated blue has a dark purple region
  - sRGB: perceptually smooth transition from saturated red to saturated blue
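The red-to-blue comparison can be reproduced numerically. The sketch below assumes that "linear RGB" on the slide means interpolating the stored color values directly, and that "sRGB" means blending in linear light and then applying the sRGB transfer function on output; the slide does not spell this out. The transfer functions are the standard sRGB ones, and the function names are made up for the example.

```python
def srgb_encode(linear: float) -> float:
    """Standard sRGB transfer function: linear light -> encoded value."""
    if linear <= 0.0031308:
        return 12.92 * linear
    return 1.055 * linear ** (1 / 2.4) - 0.055


def srgb_decode(encoded: float) -> float:
    """Inverse sRGB transfer function: encoded value -> linear light."""
    if encoded <= 0.04045:
        return encoded / 12.92
    return ((encoded + 0.055) / 1.055) ** 2.4


def mix(a, b, t):
    """Component-wise linear interpolation between two colors."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def naive_gradient(a, b, t):
    """Interpolate the stored values directly, ignoring the sRGB encoding."""
    return mix(a, b, t)


def srgb_correct_gradient(a, b, t):
    """Decode to linear light, interpolate there, then re-encode for display."""
    linear = mix(tuple(map(srgb_decode, a)), tuple(map(srgb_decode, b)), t)
    return tuple(srgb_encode(c) for c in linear)


RED, BLUE = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)
naive_mid = naive_gradient(RED, BLUE, 0.5)
correct_mid = srgb_correct_gradient(RED, BLUE, 0.5)
print("naive midpoint:       ", naive_mid)
print("sRGB-correct midpoint:", tuple(round(c, 3) for c in correct_mid))
```

Under these assumptions the naive midpoint sends noticeably lower red and blue values to the display than the sRGB-correct midpoint, which corresponds to the dark purple band in the middle of the gradient.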
NVPR Resources and Adoption
Adobe Illustrator CC shipping with NVPR!
  - Significant application acceleration
Developer resources
  - http://developer.nvidia.com/nv-path-rendering
  - Open source SVG Renderer - pr_svg
  - Whitepapers, FAQ, specification
  - NVprSDK - software development kit
  - Email: [email protected]
Availability
  - Shipping in Release 275 drivers and beyond
  - All CUDA-capable NVIDIA GPUs, including mobile
Proposed open standard
  - Submitted royalty-free to Khronos
  - For use in any OpenGL-family API including WebGL
Path Rendering Acceleration on Android Tablet
Vision Pipeline Challenges and Opportunities
Sensor Proliferation: diverse sensor awareness of the user and surroundings
• Light / Proximity
• 2 cameras
• 3 microphones
• Touch
• Position
  - GPS
  - WiFi (fingerprint)
  - Cellular trilateration
  - NFC/Bluetooth Beacons
• Accelerometer
• Magnetometer
• Gyroscope
• Pressure / Temp / Humidity
Growing Camera Diversity: capturing color, range and lightfields
• Camera sensors >20MPix
• Novel sensor configurations
• Stereo pairs
• Plenoptic Arrays
• Active Structured Light
• Active TOF
Diverse Vision Processors: driving for high performance and low power
• Camera ISPs
• Dedicated vision IP blocks
• DSPs and DSP arrays
• Programmable GPUs
• Multi-core CPUs
Flexible sensor and camera control to generate required image stream
Use best processing available for image stream processing, with code portability
Control/fuse vision data by/with all other sensor data on device
Khronos and W3C Cooperation
• Khronos and W3C liaison for Web APIs
  - Leverage proven native APIs
  - Fast API development/deployment
  - Designed by hardware community
  - Familiar foundation reduces developer learning curve
• W3C Augmented Web Community Group discussing many of these vision issues for the Web, e.g. leveraging WebRTC in the short term
  - http://w3.org/community/ar
[Diagram legend: Native APIs shipping or Khronos working group; JavaScript API shipping, acceleration being developed or work underway; Possible future JavaScript APIs or acceleration]
[Diagram labels: Native; JavaScript; Canvas; Path Rendering; WebAudio; WebSL? JS Binding to; WebVX? Vision Processing; WebKCAM? Camera control; WebStream? Sensor Fusion]
Questions?
• www.khronos.org • [email protected] • @neilt3d