khronos computing and graphics apis in smart devices · pdf fileworkshop on compiler...
TRANSCRIPT
© Copyright Khronos Group 2016 - Page 1
Khronos computing and graphics APIs in smart devices
Bor-Sung Liang, Ph.D
Workshop on Compiler Techniques and System Software for High-Performance and Embedded Computing
CTHPC 2016
May 27, 2016
© Copyright Khronos Group 2016 - Page 2
http://accelerateyourworld.org/
© Copyright Khronos Group 2016 - Page 3
Khronos Open Standards for Graphics and Compute
Portable intermediate representation
for graphics and parallel compute
High-efficiency GPU graphics and
compute for performance critical apps
Workhorse cross-platform professional 3D apps & gaming
Ubiquitous mobile gaming & graphics apps
Heterogeneous parallel compute
2000’s
2008
1990’s
2014
2015
Safety Critical Graphics 2005
© Copyright Khronos Group 2016 - Page 4
Khronos Connects Software to Silicon
Low-level silicon APIs
needed on almost every platform:
graphics, parallel compute,
rich media, vision, sensor
and camera processing
Software
Silicon
Conformance Tests and Adopters
Programs for specification integrity
and cross-vendor portability
Industry Consortium creating OPEN STANDARD APIs for hardware acceleration
Any company is welcome – one company one vote
ROYALTY-FREE specifications
State-of-the art IP framework protects
members AND the standards
International, non-profit organization
Membership and Adopters fees cover
operating and engineering expenses
Strong industry momentum
100s of man years invested by industry experts
Well over a BILLION people use Khronos APIs Every Day…
© Copyright Khronos Group 2016 - Page 5
Over 130 members worldwide Any company is welcome to join
PROMOTER MEMBERS
Members in East Asia Members in Taiwan
© Copyright Khronos Group 2016 - Page 6
Khronos Cooperative Framework
Royalty-Free
Specifications
Conformance Tests and
Adopters Program
Documentation, Tools
SDKs, Code Samples,
Adopters Build conformant
implementation and products
Developers Develop applications
using the APIs
Educators / Certifiers Create Courses
Training and Certification
Educator Guidelines
Courseware Materials
API Working Groups (Industry and Academic
members)
Members
Wider Community
$
$
“Khronos” “Vulkan”
8
Chronos (Khronos) is the personification of Time
in pre-Socratic philosophy and later literature.
Greek: Χρόνος, "time," also transliterated as Chronos ,
Khronos or Latinised as Chronus
Source: Wikipedia
Chronos (Khronos)
Vulcan
Vulcan is the god of fire including the fire of volcanoes,
metalworking, and the forge in ancient Roman religion and myth.
Latin: Volcānus or Vulcānus
Source: Wikipedia
Example: Apple Application Processor
9
SoC A4 A5 A6 A7 A8 A8x A9 A9x
Year 2010 2011 2012 2013 2014 2014 2015 2015
Proc. 45nm 32nm 32nm 28nm 20nm 20nm 16nm(T)/14nm(S) 16nm(T)
Size 53.3 mm2 69.6 mm2 96.71 mm2 102 mm2 89 mm2 128 mm2 104.5 / 96mm2 147mm2
CPU
CA8 Single-core
1.0 GHz
CA9 Dual-core 1.0 GHz
Swift Dual-core 1.3 GHz
Cyclone Dual-core
1.3GHz
Cyclone Dual-core 1.4 GHz
Cyclone Triple-core
1.5 GHz
Twister dual-core 1.85 GHz
Twister dual-core
2.16~2.26 GHz
GPU
SGX535 Single-core 200 MHz 2 GFLOPS
SGX543MP2 dual-core 250 MHz
16 GFLOPS
SGX543MP3 Tri-core
266 MHz 25.5 GFLOPS[
G6430 Quad-core 450 MHz
115.2 GFLOPS
GX6450 Quad-core
450 MH 115.2 GFLOPS
GXA6850 Octa-core 450 MHz
230.4 GFLOPS
GT7600 hexa-core 450 MHz
172.8 GFLOPS
7XT Series dodeca-core
custom design 345.6GFlops
CPU CPU CPU
GPU GPU GPU GPU GPU GPU GPU GPU GPU GPU GPU GPU GPU GPU GPU GPU GPU GPU
GPU GPU GPU GPU
GPU GPU GPU
GPU GPU GPU
CPU CPU CPU CPU CPU CPU CPU CPU CPU CPU CPU CPU CPU
GPU GPU GPU
GPU GPU GPU
GPU GPU GPU
GPU GPU GPU
Source: Wikipedia, Apple, Chipworks, TechInsights and public information on internet
Performance Gap between CPU & GPU
10
Driver
Driver overhead affects performance!!
Example: Apple iPad family
* Source: Apple, public information on internet
GPU
Direct GPU Control
Application
responsible for
memory allocation and thread
management to generate command
buffers
New Low overhead API
11
GPU
Traditional graphics
drivers include significant
context, memory and error
management
Application
Driver Overhead
iOS iOS OSX
12
Low Overhead API in PC Direct X11 (Traditional) AMD Mantle Microsoft DirectX 12
Low Overhead API Low Overhead API
https://www.youtube.com/watch?v=tTh1xOhADuA
Source: GamerBoy PLAYS, Directx 11 VS Mantle VS Directx 12 3Dmark API Overhead Test R9 280x Catalyst 15.200.1012.2 Win 10TP
OpenCL API /Runtime
13
OpenCL C Kernel language
GLSL Shader Language
SPIR
2014 Status
OGS/ES API /Runtime
RSC Kernel Language
RenderScript API /Runtime
RenderScript
OpenCL API /Runtime
14
OpenCL C/C++ Kernel language
Low-Overhead
Vulkan API /Runtime
SPIR-V
2016 Status
Low-Overhead
C++
Metal API/Runtime
Metal
OGS/ES API /Runtime
RSC Kernel Language
RenderScript API /Runtime
RenderScript
GLSL Shader Language
GLSL Shader Language
SPIR-V
© Copyright Khronos Group 2016 - Page 16
The Genesis of Vulkan
Vulkan Working Group Participants
Significant proposals, IP contributions
and engineering effort from many
working group members
Khronos members from all
segments of the graphics industry
agree the need for new
generation cross-platform GPU API
Including an unprecedented level of
participation from game engine developers
Khronos’ first API
‘hard launch’
Specification, Conformance Tests, SDKs - all open source…
Reference Materials, Compiler front-ends, Samples…
Multiple Conformant Drivers on multiple OS
18 months
A high-energy
working group effort
© Copyright Khronos Group 2016 - Page 17
The Need for a New Generation GPU API • Explicit
- Open up the high-level driver abstraction to give direct, low-level GPU control
• Streamlined
- Faster performance, lower overhead, less latency
• Portable
- Cloud, desktop, console, mobile and embedded
• Extensible
- Platform for rapid innovation
OpenGL has evolved over 25 years and
continues to meet industry needs – but there is
a need for a complementary API approach
GPUs will accelerate graphics, compute, vision
and deep learning across diverse platforms:
FLEXIBILITY and PORTABILITY are key
GPUs are increasingly programmable and
compute capable + platforms are becoming
mobile, memory-unified and multi-core
© Copyright Khronos Group 2016 - Page 18
Vulkan Explicit GPU Control
GPU
High-level Driver
Abstraction Context management
Memory allocation
Full GLSL compiler
Error detection
Layered GPU Control
Application Single thread per context
GPU
Thin Driver Explicit GPU Control
Application Memory allocation
Thread management
Multi-threaded generation
of command buffers
Multi-queue work
submission
Language Front-end
Compilers Initially GLSL
Loadable debug and
validation layers
Vulkan 1.0 provides access to
OpenGL ES 3.1 / OpenGL 4.X-class GPU functionality
but with increased performance and flexibility
Loadable Layers No error handling overhead in
production code
SPIR-V Pre-compiled Shaders: No front-end compiler in driver
Future shading language flexibility
Simpler drivers: Improved efficiency/performance
Reduced CPU bottlenecks
Lower latency
Increased portability
Graphics, compute and DMA queues: Work dispatch flexibility
Command Buffers: Command creation can be multi-threaded
Multiple CPU cores increase performance
Resource management in app code: Less hitches and surprises
Vulkan Benefits
SPIR-V pre-compiled shaders
© Copyright Khronos Group 2016 - Page 19
Next Generation GPU APIs
Only Apple Only Windows 10
Cross Platform
© Copyright Khronos Group 2016 - Page 20
Vulkan - No Compromise Performance
Potential Performance Gain
Retains Traditional Binding Model (but missing functionality such as
Tessellation and Geometry Shaders)
Amount of work to port from traditional
OpenGL and OpenGL ES
© Copyright Khronos Group 2016 - Page 21
Which Developers Should Use Vulkan? • Vulkan puts more work and responsibility into the application
- Not every developer will need or want to make that extra investment
• For many developers OpenGL and OpenGL ES will remain the most effective API
- Khronos actively evolving OpenGL and OpenGL ES in parallel with Vulkan
Is your app
CPU bound?
Can your graphics
work creation be
parallelized?
You put
a premium on
avoiding
hitches
You can
manage your
graphics resource
allocations
You’ll
do whatever
it takes to squeeze
out maximum
performance
Yes
Yes
Yes
Yes
Yes
No
Vulkan provides more choice to developers and can be used to
create new classes of end-user experience
Vulkan vs OpenGL ES
https://www.youtube.com/watch?v=P_I8an8jXuM
Source: Imagination, PowerVR Rogue GPUs running Gnome Horde demo (Vulkan prototype)
© Copyright Khronos Group 2016 - Page 23
Fish demo (nVidia)
• Vulkan and OpenGL ES 3.1
• Can change
- # of schools of fish
- # of fish per school
- # of fish per drawcall
• Worker threads create command
buffers in Vulkan mode
• Reports
- Drawcalls/sec
- FPS
- CPU time per thread
- GPU time
• Android and Windows
• Source code will be available soon
https://www.youtube.com/watch?v=6DpJY1izxc4
Source, nVidia, Vulkan API - Multithreaded Fish Demo Download
© Copyright Khronos Group 2016 - Page 24
200K Fishies, 100 fish per draw call
0
20,000
40,000
60,000
80,000
100,000
120,000
140,000
160,000
180,000
200,000
Geforce GTX 980 SHIELD Android TV SHIELD Tablet K1
OpenGL ES
Vulkan
drawcalls / sec
7x
1.5x
1.2x
= 2K Draw call @ 1 frame
= 60K Draw call @ 30 frame
= 120K Draw call @ 60 frame
= 180K Draw call @ 90 frame
© Copyright Khronos Group 2016 - Page 25
200K Fishies, 1 fish per draw call
0
2,000,000
4,000,000
6,000,000
8,000,000
10,000,000
12,000,000
14,000,000
16,000,000
18,000,000
Geforce GTX 980 SHIELD Android TV SHIELD Tablet K1
OpenGL ES
Vulkan
drawcalls / sec
6x
5x
19x = 200K draw calls @ 1 frame
= 6,000K draw calls @ 30 frame
= 12,000K draw calls @ 60 frame
= 18,000K draw calls @ 90 frame
© Copyright Khronos Group 2016 - Page 26
The Power of a Three Layer Ecosystem
Applications
can use Vulkan
directly for
maximum
flexibility and
control Utility libraries
and layers
Application
Game Engines
fully optimized
over Vulkan
Application uses utility libraries to
speed
development
Rich Area for Innovation • Many utilities and layers will be in open source
• Layers to ease transition from OpenGL • Domain specific flexibility
Applications using game engines
will automatically benefit from
Vulkan’s enhanced performance
Similar ecosystem dynamic as WebGL A widely pervasive, powerful, flexible foundation layer enables diverse middleware tools and libraries
© Copyright Khronos Group 2016 - Page 28
OpenCL Ecosystem
Hardware Implementers Desktop/Mobile/Embedded/FPGA
Working Group Members Apps/Tools/Tests/Courseware
Single Source C++ Programming
Portable Kernel Intermediate Language
Core API and Language Specs
OpenCL 2.2 – Top to Bottom C++
© Copyright Khronos Group 2016 - Page 29
OpenCL 2.2 • Provisional - seeking industry feedback before finalization at SIGGRAPH or SC16
• OpenCL C++ kernel language into core
• SPIR-V 1.1 adds OpenCL C++ support
• SYCL 2.2 fully leverages OpenCL 2.2 from a single source file
• Runs on any OpenCL 2.0-capable hardware
OpenCL 1.0 Specification
Dec08 Jun10 OpenCL 1.1
Specification
Nov11 OpenCL 1.2
Specification OpenCL 2.0 Specification
Nov13
Device partitioning
Separate compilation and linking
Enhanced image support
Built-in kernels / custom devices
Enhanced DX and OpenGL Interop
Shared Virtual Memory
On-device dispatch
Generic Address Space
Enhanced Image Support
C11 Atomics
Pipes
Android ICD
3-component vectors
Additional image formats
Multiple hosts and devices
Buffer region operations
Enhanced event-driven execution
Additional OpenCL C built-ins
Improved OpenGL data/event interop
18 months 18 months 24 months
OpenCL 2.1 Specification
Nov15 24 months
SPIR-V in Core
Subgroups into core
Subgroup query operations
clCloneKernel
Low-latency device
timer queries
OpenCL C++ Kernel Language
SPIR-V 1.1 with C++ support
SYCL 2.2 for single source C++
OpenCL 2.2 PROVISIONAL
May16 7months
© Copyright Khronos Group 2016 - Page 30
OpenCL Implementations
OpenCL 1.0 Specification
Dec08 Jun10 OpenCL 1.1
Specification
Nov11 OpenCL 1.2
Specification OpenCL 2.0
Specification
Nov13
1.0 | Jul13
1.0 | Aug09
1.0 | May09
1.0 | May10
1.0 | Feb11
1.0 | May09
1.0 | Jan10
1.1 | Aug10
1.1 | Jul11
1.2 | May12
1.2 | Jun12
1.1 | Feb11
1.1 |Mar11
1.1 | Jun10
1.1 | Aug12
1.1 | Nov12
1.1 | May13
1.1 | Apr12
1.2 | Apr14
1.2 | Sep13
1.2 | Dec12
Desktop
Mobile
FPGA
2.0 | Jul14
OpenCL 2.1 Specification
Nov15
1.2 | May15
2.0 | Dec14
1.0 | Dec14
1.2 | Dec14
1.2 | Sep14
Vendor timelines are
first implementation of
each spec generation
1.2 | May15
Embedded
1.2 | Aug15
1.2 | Mar16
2.0 | Nov15
© Copyright Khronos Group 2016 - Page 31
Add Compute to Vulkan? In Discussion…
Embedded Use cases: Signal and Pixel Processing
Roadmap: arbitrary precision for power
efficiency, hard real-time scheduling,
asynch DMA
FPGAs Use cases: Network and
Stream Processing
Roadmap: enhanced execution model, self-
synchronized and self-scheduled graphs, fine-
grained synchronization between kernels,
DSL in C++
HPC, SciViz, Datacenter Use case: Numerical Simulation, Virtualization
Roadmap: enhanced streaming processing,
enhanced library support
Vulkan Compute? Gaming Compute, Pixel Processing, Inference
Fine grain graphics and compute (no interop needed)
SPIR-V for shading language flexibility – C/C++
Low-latency, fine grain run-time
Google Android adoption
Competes well with Metal (=C++/OpenCL 1.2)
Roadmap: arbitrary precision, SVM,
dynamic parallelism, pre-emption and QoS scheduling
Desktop Use cases: Video and Image Processing, Gaming Compute
Roadmap: Vulkan interop, arbitrary precision for
increased performance, pre-emption, collective
programming and improved execution model
Mobile Use case: Photo and Vision Processing
Roadmap: arbitrary precision for
inference engine and pixel processing efficiency, pre-
emption and QoS scheduling for power efficiency
Vulkan Lessons 1. Engine developer insights were essential during design
2. Engine prototyping during design was essential during design
3. Open sourcing tests, tools, specs drives deeper community engagement
4. Explicit API – supports strong middleware ecosystem
BUT its ‘just’ a GPU API – still need OpenCL!
© Copyright Khronos Group 2016 - Page 32
Possible OpenCL Evolution
Increasing parallel hardware flexibility Execution and memory model enhancements
Pre-emption, virtual memory, on-device dispatch, synchronization
Increasing language expressiveness Guaranteeing degrees of forward progress
Definitions of concurrency
Evolution of OpenCL … … filling the gap between imprecise HLL and imperfect hardware
Should OpenCL evolve to focus on the things that ONLY OpenCL can do… 1. Enable low-level, explicit access to heterogeneous hardware – needed by languages and libraries
2. Provide efficient runtime coordination of tasks, resources, scheduling on target hardware
3. Leverage, synergize and co-exist with Vulkan compute – and learn from Vulkan …
4. Define feature sets so target hardware does not have to implement inappropriate functionality
5. Adopt layered tools architecture to drive tools momentum and decrease run-time overhead
6. Leave usability, portability and performance portability to higher levels in the ecosystem
© Copyright Khronos Group 2016 - Page 34
SPIR-V Ecosystem
LLVM
Third party kernel and
shader Languages
SPIR-V • Khronos defined and controlled
cross-API intermediate language
• Native support for graphics
and parallel constructs
• 32-bit Word Stream
• Extensible and easily parsed
• Retains data object and control
flow information for effective
code generation and translation
OpenCL C++ OpenCL C
GLSL Khronos has open sourced
these tools and translators
IHV Driver
Runtimes
Other
Intermediate
Forms
SPIR-V Validator
SPIR-V Tools
SPIR-V (Dis)Assembler
LLVM to SPIR-V
Bi-directional
Translator
Khronos plans to open
source these tools soon
https://github.com/KhronosGroup/SPIR/tree/spirv-1.1
Open source C++ front-end released
© Copyright Khronos Group 2016 - Page 35
SPIR-V serves many APIs
Vulkan
OpenGL
OpenCL
● Need ongoing coherent design.
Talk to us early.
● Aim to serve everyone with
common features
● Updates need to be ratified by
SPIR WG and client API WG.
© Copyright Khronos Group 2016 - Page 36
Evolution of SPIR Family • SPIR–V is first fully specified Khronos-defined SPIR standard
- Does not use LLVM to isolate from LLVM roadmap changes
- Includes full flow control, graphics and parallel constructs beyond LLVM
- Khronos will open source SPIR-V <-> LLVM conversion tools
SPIR 1.2 SPIR 2.0 SPIR-V 1.0
LLVM Interaction Uses LLVM 3.2 Uses LLVM 3.4 100% Khronos defined
Round-trip lossless conversion
Compute Constructs Metadata/Intrinsics Metadata/Intrinsics Native
Graphics Constructs No No Native
Supported Language
Feature Set OpenCL C 1.2
OpenCL C 1.2 OpenCL C 2.0
OpenCL C 1.2 / 2.0 OpenCL C++
GLSL
OpenCL Ingestion OpenCL 1.2 Extension
OpenCL 2.0 Extension
OpenCL 2.1 Core
Vulkan Ingestion - - Vulkan Core
© Copyright Khronos Group 2016 - Page 37
Support for Both SPIR-V and LLVM • LLVM is an SDK, not a formally defined standard
- Khronos moved away from trying to use LLVM IR as a standard
- Issues with versioning, metadata, etc.
• But LLVM is a treasure chest of useful transforms
- SPIR-V tools can encapsulation and use LLVM to do useful SPIR-V transforms
• SPIR-V tools can all use different rules – and there will be lots of these
- May be lossy and only support SPIR-V subset
- Internal form is not standardized
- May hide LLVM version, metadata
SPIR-V
‘Rendezvous’ format
for interchange Native expression of graphics
and parallel functionality for
Khronos APIs
Tool-encapsulated
LLVM
HLSL
GLSL OpenCL C
OpenCL C++
Transform
Tool - Compression
- Optimization
- Stripping
- Linker/Merger
Driver
© Copyright Khronos Group 2016 - Page 39
Khronos Open Standards for Graphics and Compute
Portable intermediate representation
for graphics and parallel compute
High-efficiency GPU graphics and
compute for performance critical apps
Workhorse cross-platform professional 3D apps & gaming
Ubiquitous mobile gaming & graphics apps
Heterogeneous parallel compute
2000’s
2008
1990’s
2014
2015
Safety Critical Graphics 2005
New Extensions to enable latest
desktop graphics capabilities
OpenGL ES 3.2 released to bring
AEP functionality to core
New Safety Critical Working
Group – Call for Participation
OpenCL 2.0 specification update
and C++ Headers released
Provisional Spec Update and
significant open source activity
Adopted by Android and other
platforms. Building ecosystem
LATEST STATUS
© Copyright Khronos Group 2016 - Page 40
Roadmap Possibilities
1. C++ Shading Language
2. Single source C++
Programming from SYCL
3. OpenCL-class
Heterogeneous Compute to
Vulkan runtime
SPIR-V Ingestion for OpenGL and OpenGL
ES for shading language flexibility
Thin and
predictable
graphics and
compute for
safety critical
systems
© Copyright Khronos Group 2016 - Page 41
Virtual Reality Will Influence Graphics APIs • The ability to generate ‘Presence’ is becoming achievable at reasonable cost
- Using visual input to generate subconscious belief in a virtual situation
- PC-based AND mobile systems
• VR Requirements will affect how graphics APIs generate visual imagery
- Control over generation of stereo pairs – slightly different view for each eye
- Optical system geometric correction in rendering path
- Reduce latency through elimination of in-driver buffering
- Asynchronously warp framebuffer for
instantaneous response to head movement
Samsung Gear VR
Many companies driving the VR revolution are
participating in the Vulkan working group
© Copyright Khronos Group 2016 - Page 42
Khronos Standards for Advanced Processing
Low-power vision processing
for tracking, odometry and
scene analysis
Heterogeneous Processing Acceleration
e.g. Neural Net Processing for scene understanding
3D Graphics for
Portable display of augmentations
and visualizations on every platform Image: ScreenMedia
3D File formats for
AUTHORING and TRANSMISSION
of 3D runtime assets
© Copyright Khronos Group 2016 - Page 43
Thank You!
Bor-Sung Liang, Ph.D [email protected]
Workshop on Compiler Techniques and System Software for High-Performance and Embedded Computing
CTHPC 2016