
© Copyright Khronos Group 2016 - Page 1

Khronos computing and graphics APIs in smart devices

Bor-Sung Liang, Ph.D

Workshop on Compiler Techniques and System Software for High-Performance and Embedded Computing

CTHPC 2016

May 27, 2016

[email protected]

© Copyright Khronos Group 2016 - Page 2

http://accelerateyourworld.org/

© Copyright Khronos Group 2016 - Page 3

Khronos Open Standards for Graphics and Compute

Portable intermediate representation for graphics and parallel compute

High-efficiency GPU graphics and compute for performance critical apps

Workhorse cross-platform professional 3D apps & gaming

Ubiquitous mobile gaming & graphics apps

Heterogeneous parallel compute

Safety Critical Graphics (2005)

Timeline labels for the other APIs: 1990’s, 2000’s, 2008, 2014, 2015

© Copyright Khronos Group 2016 - Page 4

Khronos Connects Software to Silicon

Low-level silicon APIs needed on almost every platform: graphics, parallel compute, rich media, vision, sensor and camera processing

Software

Silicon

Conformance Tests and Adopters Programs for specification integrity and cross-vendor portability

Industry Consortium creating OPEN STANDARD APIs for hardware acceleration

- Any company is welcome – one company one vote

ROYALTY-FREE specifications

- State-of-the-art IP framework protects members AND the standards

International, non-profit organization

- Membership and Adopters fees cover operating and engineering expenses

Strong industry momentum

- 100s of man years invested by industry experts

Well over a BILLION people use Khronos APIs Every Day…

© Copyright Khronos Group 2016 - Page 5

Over 130 members worldwide

Any company is welcome to join

PROMOTER MEMBERS

Members in East Asia

Members in Taiwan

© Copyright Khronos Group 2016 - Page 6

Khronos Cooperative Framework

Royalty-Free Specifications

Conformance Tests and Adopters Program

Documentation, Tools, SDKs, Code Samples

Educator Guidelines, Courseware Materials

Adopters: Build conformant implementation and products

Developers: Develop applications using the APIs

Educators / Certifiers: Create Courses, Training and Certification

API Working Groups (Industry and Academic members)

Members

Wider Community

© Copyright Khronos Group 2016 - Page 7

Evolution of Vulkan, OpenCL, SPIR-V

“Khronos” “Vulkan”

8

Chronos (Khronos)

Chronos (Khronos) is the personification of Time in pre-Socratic philosophy and later literature.

Greek: Χρόνος, "time," also transliterated as Chronos, Khronos or Latinised as Chronus

Source: Wikipedia

Vulcan

Vulcan is the god of fire including the fire of volcanoes, metalworking, and the forge in ancient Roman religion and myth.

Latin: Volcānus or Vulcānus

Source: Wikipedia

Example: Apple Application Processor

9

SoC   Year  Process          Die size        CPU                               GPU
A4    2010  45nm             53.3 mm2        CA8 single-core, 1.0 GHz          SGX535 single-core, 200 MHz, 2 GFLOPS
A5    2011  32nm             69.6 mm2        CA9 dual-core, 1.0 GHz            SGX543MP2 dual-core, 250 MHz, 16 GFLOPS
A6    2012  32nm             96.71 mm2       Swift dual-core, 1.3 GHz          SGX543MP3 tri-core, 266 MHz, 25.5 GFLOPS
A7    2013  28nm             102 mm2         Cyclone dual-core, 1.3 GHz        G6430 quad-core, 450 MHz, 115.2 GFLOPS
A8    2014  20nm             89 mm2          Cyclone dual-core, 1.4 GHz        GX6450 quad-core, 450 MHz, 115.2 GFLOPS
A8x   2014  20nm             128 mm2         Cyclone triple-core, 1.5 GHz      GXA6850 octa-core, 450 MHz, 230.4 GFLOPS
A9    2015  16nm(T)/14nm(S)  104.5 / 96 mm2  Twister dual-core, 1.85 GHz       GT7600 hexa-core, 450 MHz, 172.8 GFLOPS
A9x   2015  16nm(T)          147 mm2         Twister dual-core, 2.16~2.26 GHz  7XT Series dodeca-core custom design, 345.6 GFLOPS

[Diagram: CPU and GPU core blocks per SoC]

Source: Wikipedia, Apple, Chipworks, TechInsights and public information on internet

Performance Gap between CPU & GPU

10

Driver

Driver overhead affects performance!!

Example: Apple iPad family

* Source: Apple, public information on internet

New Low overhead API

11

Driver Overhead (Application, driver, GPU): Traditional graphics drivers include significant context, memory and error management

Direct GPU Control (Application, GPU): Application responsible for memory allocation and thread management to generate command buffers

Platform labels: iOS, iOS, OSX

12

Low Overhead API in PC

DirectX 11 (Traditional)

AMD Mantle (Low Overhead API)

Microsoft DirectX 12 (Low Overhead API)

https://www.youtube.com/watch?v=tTh1xOhADuA

Source: GamerBoy PLAYS, Directx 11 VS Mantle VS Directx 12 3Dmark API Overhead Test R9 280x Catalyst 15.200.1012.2 Win 10TP

2014 Status

13

OpenCL C Kernel language -> SPIR -> OpenCL API / Runtime

GLSL Shader Language -> OGL/ES API / Runtime

RSC Kernel Language -> RenderScript API / Runtime (RenderScript)

2016 Status

14

OpenCL C/C++ Kernel language -> SPIR-V -> OpenCL API / Runtime

GLSL Shader Language -> SPIR-V -> Vulkan API / Runtime (Low-Overhead)

C++ -> Metal API / Runtime (Metal, Low-Overhead)

GLSL Shader Language -> OGL/ES API / Runtime

RSC Kernel Language -> RenderScript API / Runtime (RenderScript)

Low overhead 3D Graphics API

© Copyright Khronos Group 2016 - Page 16

The Genesis of Vulkan

Khronos members from all segments of the graphics industry agree on the need for a new-generation cross-platform GPU API

Vulkan Working Group Participants

Significant proposals, IP contributions and engineering effort from many working group members

Including an unprecedented level of participation from game engine developers

18 months: a high-energy working group effort

Khronos’ first API ‘hard launch’

- Specification, Conformance Tests, SDKs - all open source…

- Reference Materials, Compiler front-ends, Samples…

- Multiple Conformant Drivers on multiple OS

© Copyright Khronos Group 2016 - Page 17

The Need for a New Generation GPU API

• Explicit

- Open up the high-level driver abstraction to give direct, low-level GPU control

• Streamlined

- Faster performance, lower overhead, less latency

• Portable

- Cloud, desktop, console, mobile and embedded

• Extensible

- Platform for rapid innovation

OpenGL has evolved over 25 years and continues to meet industry needs – but there is a need for a complementary API approach

GPUs will accelerate graphics, compute, vision and deep learning across diverse platforms: FLEXIBILITY and PORTABILITY are key

GPUs are increasingly programmable and compute capable + platforms are becoming mobile, memory-unified and multi-core

© Copyright Khronos Group 2016 - Page 18

Vulkan Explicit GPU Control

Layered GPU Control

- Application: single thread per context

- High-level Driver Abstraction: context management, memory allocation, full GLSL compiler, error detection

- GPU

Explicit GPU Control

- Application: memory allocation, thread management, multi-threaded generation of command buffers, multi-queue work submission

- Language Front-end Compilers: initially GLSL

- Loadable debug and validation layers

- Thin Driver

- GPU

Vulkan 1.0 provides access to OpenGL ES 3.1 / OpenGL 4.X-class GPU functionality but with increased performance and flexibility

Vulkan Benefits

• Loadable Layers: no error handling overhead in production code

• SPIR-V Pre-compiled Shaders: no front-end compiler in driver; future shading language flexibility

• Simpler drivers: improved efficiency/performance, reduced CPU bottlenecks, lower latency, increased portability

• Graphics, compute and DMA queues: work dispatch flexibility

• Command Buffers: command creation can be multi-threaded; multiple CPU cores increase performance

• Resource management in app code: fewer hitches and surprises
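
The slide states that command creation can be multi-threaded while the application keeps responsibility for thread management. As a rough illustration of that pattern only, the Python sketch below models it with plain lists and a thread pool. It is not the Vulkan API: `record_commands`, `build_frame`, `submit`, and the command tuples are hypothetical names invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def record_commands(worker_id, objects):
    """Record draw commands for one slice of the scene into a private buffer.

    Each worker owns its buffer, so no locking is needed while recording.
    """
    buffer = []
    for obj in objects:
        buffer.append(("bind", obj))
        buffer.append(("draw", obj))
    return worker_id, buffer


def build_frame(scene, num_workers):
    """Split the scene across workers and record the buffers in parallel."""
    slices = [scene[i::num_workers] for i in range(num_workers)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = list(pool.map(record_commands, range(num_workers), slices))
    # Keep a deterministic order before the single submission step.
    results.sort(key=lambda r: r[0])
    return [buf for _, buf in results]


def submit(buffers):
    """Stand-in for queue submission: flatten the buffers into one stream."""
    return [cmd for buf in buffers for cmd in buf]


if __name__ == "__main__":
    scene = [f"fish{i}" for i in range(8)]
    stream = submit(build_frame(scene, num_workers=4))
    print(len(stream), "commands recorded by 4 workers")
```

Recording is the parallel part and submission is a single ordered step, which is the split the slide attributes to command buffers and queues.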

© Copyright Khronos Group 2016 - Page 19

Next Generation GPU APIs

Metal: only Apple

DirectX 12: only Windows 10

Vulkan: cross platform

© Copyright Khronos Group 2016 - Page 20

Vulkan - No Compromise Performance

[Chart: Potential Performance Gain plotted against Amount of work to port from traditional OpenGL and OpenGL ES]

Chart annotation: Retains Traditional Binding Model (but missing functionality such as Tessellation and Geometry Shaders)

© Copyright Khronos Group 2016 - Page 21

Which Developers Should Use Vulkan?

• Vulkan puts more work and responsibility into the application

- Not every developer will need or want to make that extra investment

• For many developers OpenGL and OpenGL ES will remain the most effective API

- Khronos actively evolving OpenGL and OpenGL ES in parallel with Vulkan

Flowchart boxes (branches labelled Yes / No):

- Is your app CPU bound?

- Can your graphics work creation be parallelized?

- You put a premium on avoiding hitches

- You can manage your graphics resource allocations

- You’ll do whatever it takes to squeeze out maximum performance

Vulkan provides more choice to developers and can be used to create new classes of end-user experience

Vulkan vs OpenGL ES

https://www.youtube.com/watch?v=P_I8an8jXuM

Source: Imagination, PowerVR Rogue GPUs running Gnome Horde demo (Vulkan prototype)

© Copyright Khronos Group 2016 - Page 23

Fish demo (NVIDIA)

• Vulkan and OpenGL ES 3.1

• Can change

- # of schools of fish

- # of fish per school

- # of fish per drawcall

• Worker threads create command buffers in Vulkan mode

• Reports

- Drawcalls/sec

- FPS

- CPU time per thread

- GPU time

• Android and Windows

• Source code will be available soon

https://www.youtube.com/watch?v=6DpJY1izxc4

Source: NVIDIA, Vulkan API - Multithreaded Fish Demo Download

© Copyright Khronos Group 2016 - Page 24

200K Fishies, 100 fish per draw call

[Bar chart: draw calls / sec (axis 0 to 200,000), OpenGL ES vs Vulkan, on GeForce GTX 980, SHIELD Android TV and SHIELD Tablet K1. Vulkan advantage labels on the chart: 7x, 1.5x, 1.2x]

Reference levels marked on the chart:

= 2K draw calls @ 1 frame/sec

= 60K draw calls @ 30 frames/sec

= 120K draw calls @ 60 frames/sec

= 180K draw calls @ 90 frames/sec

© Copyright Khronos Group 2016 - Page 25

200K Fishies, 1 fish per draw call

[Bar chart: draw calls / sec (axis 0 to 18,000,000), OpenGL ES vs Vulkan, on GeForce GTX 980, SHIELD Android TV and SHIELD Tablet K1. Vulkan advantage labels on the chart: 6x, 5x, 19x]

Reference levels marked on the chart:

= 200K draw calls @ 1 frame/sec

= 6,000K draw calls @ 30 frames/sec

= 12,000K draw calls @ 60 frames/sec

= 18,000K draw calls @ 90 frames/sec
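
The reference levels on the two fish-demo charts follow from simple arithmetic: fish count divided by fish per draw call, multiplied by the frame rate. A minimal sketch of that calculation, with function names chosen here for illustration and the "@ N frame" labels read as frames per second (an assumption that the numbers bear out):

```python
def draw_calls_per_frame(total_fish, fish_per_draw_call):
    """Draw calls needed to render every fish once."""
    # Ceiling division: a partially filled batch still costs one draw call.
    return -(-total_fish // fish_per_draw_call)


def draw_calls_per_second(total_fish, fish_per_draw_call, fps):
    """Draw-call rate the CPU must sustain to hold the given frame rate."""
    return draw_calls_per_frame(total_fish, fish_per_draw_call) * fps


if __name__ == "__main__":
    for per_call in (100, 1):
        for fps in (1, 30, 60, 90):
            rate = draw_calls_per_second(200_000, per_call, fps)
            print(f"{per_call:>3} fish/draw call @ {fps:>2} fps: {rate:,} draw calls/sec")
```

With 100 fish per draw call, 60 fps needs 120,000 draw calls per second; with 1 fish per draw call the same frame rate needs 12,000,000, which is why the second chart stresses driver overhead so much harder.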

© Copyright Khronos Group 2016 - Page 26

The Power of a Three Layer Ecosystem

Applications can use Vulkan directly for maximum flexibility and control

Utility libraries and layers: Application uses utility libraries to speed development

Game Engines fully optimized over Vulkan: Applications using game engines will automatically benefit from Vulkan’s enhanced performance

Rich Area for Innovation

• Many utilities and layers will be in open source

• Layers to ease transition from OpenGL

• Domain specific flexibility

Similar ecosystem dynamic as WebGL: a widely pervasive, powerful, flexible foundation layer enables diverse middleware tools and libraries
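
The deck describes layers as optional components that sit between the application and a thin driver, with the validation layer dropped in production so that no error handling overhead remains. The sketch below illustrates that idea in plain Python. It is an analogy and not the Vulkan loader mechanism: `ThinDriver`, `ValidationLayer`, and `create_device` are hypothetical names made up for this example.

```python
class ThinDriver:
    """Hypothetical stand-in for a thin driver: performs no argument checks."""

    def draw(self, vertex_count):
        return f"draw {vertex_count} vertices"


class ValidationLayer:
    """Optional layer that checks arguments, then forwards down the chain."""

    def __init__(self, next_in_chain):
        self._next = next_in_chain

    def draw(self, vertex_count):
        if not isinstance(vertex_count, int) or vertex_count <= 0:
            raise ValueError(f"invalid vertex_count: {vertex_count!r}")
        return self._next.draw(vertex_count)


def create_device(enable_validation):
    """Build the call chain; a production build simply leaves the layer out."""
    device = ThinDriver()
    if enable_validation:
        device = ValidationLayer(device)
    return device


if __name__ == "__main__":
    debug_device = create_device(enable_validation=True)
    release_device = create_device(enable_validation=False)
    print(debug_device.draw(3))
    print(release_device.draw(3))
```

Because the layer exposes the same interface as the driver, application code is identical in both configurations; only the chain built at start-up differs.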

© Copyright Khronos Group 2016 - Page 27

OpenCL Computing API

© Copyright Khronos Group 2016 - Page 28

OpenCL Ecosystem

Hardware Implementers Desktop/Mobile/Embedded/FPGA

Working Group Members Apps/Tools/Tests/Courseware

Single Source C++ Programming

Portable Kernel Intermediate Language

Core API and Language Specs

OpenCL 2.2 – Top to Bottom C++

© Copyright Khronos Group 2016 - Page 29

OpenCL 2.2

• Provisional - seeking industry feedback before finalization at SIGGRAPH or SC16

• OpenCL C++ kernel language into core

• SPIR-V 1.1 adds OpenCL C++ support

• SYCL 2.2 fully leverages OpenCL 2.2 from a single source file

• Runs on any OpenCL 2.0-capable hardware

Specification timeline:

Dec08: OpenCL 1.0 Specification

Jun10 (18 months): OpenCL 1.1 Specification
- 3-component vectors
- Additional image formats
- Multiple hosts and devices
- Buffer region operations
- Enhanced event-driven execution
- Additional OpenCL C built-ins
- Improved OpenGL data/event interop

Nov11 (18 months): OpenCL 1.2 Specification
- Device partitioning
- Separate compilation and linking
- Enhanced image support
- Built-in kernels / custom devices
- Enhanced DX and OpenGL Interop

Nov13 (24 months): OpenCL 2.0 Specification
- Shared Virtual Memory
- On-device dispatch
- Generic Address Space
- Enhanced Image Support
- C11 Atomics
- Pipes
- Android ICD

Nov15 (24 months): OpenCL 2.1 Specification
- SPIR-V in Core
- Subgroups into core
- Subgroup query operations
- clCloneKernel
- Low-latency device timer queries

May16 (7 months): OpenCL 2.2 PROVISIONAL
- OpenCL C++ Kernel Language
- SPIR-V 1.1 with C++ support
- SYCL 2.2 for single source C++

© Copyright Khronos Group 2016 - Page 30

OpenCL Implementations

Specification timeline: OpenCL 1.0 (Dec08), OpenCL 1.1 (Jun10), OpenCL 1.2 (Nov11), OpenCL 2.0 (Nov13), OpenCL 2.1 (Nov15)

Vendor timelines are first implementation of each spec generation

Chart rows: Desktop, Mobile, Embedded, FPGA

Version | date labels on the chart (vendor logos are not present in this transcript): 1.0 | Jul13, 1.0 | Aug09, 1.0 | May09, 1.0 | May10, 1.0 | Feb11, 1.0 | May09, 1.0 | Jan10, 1.1 | Aug10, 1.1 | Jul11, 1.2 | May12, 1.2 | Jun12, 1.1 | Feb11, 1.1 | Mar11, 1.1 | Jun10, 1.1 | Aug12, 1.1 | Nov12, 1.1 | May13, 1.1 | Apr12, 1.2 | Apr14, 1.2 | Sep13, 1.2 | Dec12, 2.0 | Jul14, 1.2 | May15, 2.0 | Dec14, 1.0 | Dec14, 1.2 | Dec14, 1.2 | Sep14, 1.2 | May15, 1.2 | Aug15, 1.2 | Mar16, 2.0 | Nov15

© Copyright Khronos Group 2016 - Page 31

Add Compute to Vulkan? In Discussion…

Embedded

- Use cases: Signal and Pixel Processing

- Roadmap: arbitrary precision for power efficiency, hard real-time scheduling, asynch DMA

FPGAs

- Use cases: Network and Stream Processing

- Roadmap: enhanced execution model, self-synchronized and self-scheduled graphs, fine-grained synchronization between kernels, DSL in C++

HPC, SciViz, Datacenter

- Use case: Numerical Simulation, Virtualization

- Roadmap: enhanced streaming processing, enhanced library support

Desktop

- Use cases: Video and Image Processing, Gaming Compute

- Roadmap: Vulkan interop, arbitrary precision for increased performance, pre-emption, collective programming and improved execution model

Mobile

- Use case: Photo and Vision Processing

- Roadmap: arbitrary precision for inference engine and pixel processing efficiency, pre-emption and QoS scheduling for power efficiency

Vulkan Compute?

- Gaming Compute, Pixel Processing, Inference

- Fine grain graphics and compute (no interop needed)

- SPIR-V for shading language flexibility – C/C++

- Low-latency, fine grain run-time

- Google Android adoption

- Competes well with Metal (=C++/OpenCL 1.2)

- Roadmap: arbitrary precision, SVM, dynamic parallelism, pre-emption and QoS scheduling

Vulkan Lessons

1. Engine developer insights were essential during design

2. Engine prototyping was essential during design

3. Open sourcing tests, tools, specs drives deeper community engagement

4. Explicit API – supports strong middleware ecosystem

BUT it’s ‘just’ a GPU API – still need OpenCL!

© Copyright Khronos Group 2016 - Page 32

Possible OpenCL Evolution

Increasing parallel hardware flexibility

Execution and memory model enhancements

Pre-emption, virtual memory, on-device dispatch, synchronization

Increasing language expressiveness

Guaranteeing degrees of forward progress

Definitions of concurrency

Evolution of OpenCL … filling the gap between imprecise HLL and imperfect hardware

Should OpenCL evolve to focus on the things that ONLY OpenCL can do…

1. Enable low-level, explicit access to heterogeneous hardware – needed by languages and libraries

2. Provide efficient runtime coordination of tasks, resources, scheduling on target hardware

3. Leverage, synergize and co-exist with Vulkan compute – and learn from Vulkan …

4. Define feature sets so target hardware does not have to implement inappropriate functionality

5. Adopt layered tools architecture to drive tools momentum and decrease run-time overhead

6. Leave usability, portability and performance portability to higher levels in the ecosystem

© Copyright Khronos Group 2016 - Page 33

SPIR – Standard Portable Intermediate Representation

© Copyright Khronos Group 2016 - Page 34

SPIR-V Ecosystem

SPIR-V

• Khronos defined and controlled cross-API intermediate language

• Native support for graphics and parallel constructs

• 32-bit Word Stream

• Extensible and easily parsed

• Retains data object and control flow information for effective code generation and translation

Diagram labels:

- OpenCL C++, OpenCL C, GLSL

- Third party kernel and shader Languages

- LLVM

- LLVM to SPIR-V Bi-directional Translator

- SPIR-V Tools: SPIR-V Validator, SPIR-V (Dis)Assembler

- Other Intermediate Forms

- IHV Driver Runtimes

Khronos has open sourced these tools and translators

Khronos plans to open source these tools soon

Open source C++ front-end released: https://github.com/KhronosGroup/SPIR/tree/spirv-1.1
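
To make the "32-bit Word Stream" and "easily parsed" points concrete, here is a minimal sketch of walking a SPIR-V module in Python. It assumes the binary layout given in the SPIR-V specification: a five-word header (magic number, version, generator, ID bound, reserved schema word) followed by instructions whose first word packs the word count in the high 16 bits and the opcode in the low 16 bits. The function name `parse_spirv` and the tiny hand-built module are inventions for this example, and only little-endian input is handled.

```python
import struct

SPIRV_MAGIC = 0x07230203  # magic number defined by the SPIR-V specification


def parse_spirv(data):
    """Return (header, instructions) for a little-endian SPIR-V module.

    instructions is a list of (opcode, operand_words) tuples.
    """
    if len(data) % 4 != 0:
        raise ValueError("SPIR-V is a stream of 32-bit words")
    words = struct.unpack(f"<{len(data) // 4}I", data)
    if len(words) < 5 or words[0] != SPIRV_MAGIC:
        raise ValueError("not a little-endian SPIR-V module")
    header = {
        "version": ((words[1] >> 16) & 0xFF, (words[1] >> 8) & 0xFF),
        "generator": words[2],
        "bound": words[3],
    }
    instructions = []
    i = 5  # the first instruction starts right after the 5-word header
    while i < len(words):
        word_count = words[i] >> 16
        opcode = words[i] & 0xFFFF
        if word_count == 0 or i + word_count > len(words):
            raise ValueError("malformed instruction")
        instructions.append((opcode, words[i + 1:i + word_count]))
        i += word_count
    return header, instructions


if __name__ == "__main__":
    # Hand-built module: header, OpCapability Shader, OpMemoryModel Logical GLSL450.
    module = struct.pack(
        "<10I",
        SPIRV_MAGIC, 0x00010000, 0, 1, 0,
        (2 << 16) | 17, 1,
        (3 << 16) | 14, 0, 1,
    )
    header, instructions = parse_spirv(module)
    print(header["version"], instructions)
```

Because every instruction carries its own word count, a tool can skip opcodes it does not understand, which is one way to read the slide's "extensible" claim.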

© Copyright Khronos Group 2016 - Page 35

SPIR-V serves many APIs

Vulkan, OpenGL, OpenCL

● Need ongoing coherent design. Talk to us early.

● Aim to serve everyone with common features

● Updates need to be ratified by SPIR WG and client API WG.

© Copyright Khronos Group 2016 - Page 36

Evolution of SPIR Family

• SPIR–V is first fully specified Khronos-defined SPIR standard

- Does not use LLVM to isolate from LLVM roadmap changes

- Includes full flow control, graphics and parallel constructs beyond LLVM

- Khronos will open source SPIR-V <-> LLVM conversion tools

                                SPIR 1.2              SPIR 2.0                    SPIR-V 1.0
LLVM Interaction                Uses LLVM 3.2         Uses LLVM 3.4               100% Khronos defined; round-trip lossless conversion
Compute Constructs              Metadata/Intrinsics   Metadata/Intrinsics         Native
Graphics Constructs             No                    No                          Native
Supported Language Feature Set  OpenCL C 1.2          OpenCL C 1.2, OpenCL C 2.0  OpenCL C 1.2 / 2.0, OpenCL C++, GLSL
OpenCL Ingestion                OpenCL 1.2 Extension  OpenCL 2.0 Extension        OpenCL 2.1 Core
Vulkan Ingestion                -                     -                           Vulkan Core

© Copyright Khronos Group 2016 - Page 37

Support for Both SPIR-V and LLVM

• LLVM is an SDK, not a formally defined standard

- Khronos moved away from trying to use LLVM IR as a standard

- Issues with versioning, metadata, etc.

• But LLVM is a treasure chest of useful transforms

- SPIR-V tools can encapsulate and use LLVM to do useful SPIR-V transforms

• SPIR-V tools can all use different rules – and there will be lots of these

- May be lossy and only support SPIR-V subset

- Internal form is not standardized

- May hide LLVM version, metadata

Diagram labels:

- SPIR-V: ‘Rendezvous’ format for interchange. Native expression of graphics and parallel functionality for Khronos APIs

- HLSL, GLSL, OpenCL C, OpenCL C++

- Tool-encapsulated LLVM

- Transform Tool: Compression, Optimization, Stripping, Linker/Merger

- Driver

© Copyright Khronos Group 2016 - Page 38

Next Step

© Copyright Khronos Group 2016 - Page 39

Khronos Open Standards for Graphics and Compute

Portable intermediate representation for graphics and parallel compute

High-efficiency GPU graphics and compute for performance critical apps

Workhorse cross-platform professional 3D apps & gaming

Ubiquitous mobile gaming & graphics apps

Heterogeneous parallel compute

Safety Critical Graphics (2005)

Timeline labels for the other APIs: 1990’s, 2000’s, 2008, 2014, 2015

LATEST STATUS

- New Extensions to enable latest desktop graphics capabilities

- OpenGL ES 3.2 released to bring AEP functionality to core

- New Safety Critical Working Group – Call for Participation

- OpenCL 2.0 specification update and C++ Headers released

- Provisional Spec Update and significant open source activity

- Adopted by Android and other platforms. Building ecosystem

© Copyright Khronos Group 2016 - Page 40

Roadmap Possibilities

1. C++ Shading Language

2. Single source C++ Programming from SYCL

3. OpenCL-class Heterogeneous Compute to Vulkan runtime

SPIR-V Ingestion for OpenGL and OpenGL ES for shading language flexibility

Thin and predictable graphics and compute for safety critical systems

© Copyright Khronos Group 2016 - Page 41

Virtual Reality Will Influence Graphics APIs

• The ability to generate ‘Presence’ is becoming achievable at reasonable cost

- Using visual input to generate subconscious belief in a virtual situation

- PC-based AND mobile systems

• VR Requirements will affect how graphics APIs generate visual imagery

- Control over generation of stereo pairs – slightly different view for each eye

- Optical system geometric correction in rendering path

- Reduce latency through elimination of in-driver buffering

- Asynchronously warp framebuffer for instantaneous response to head movement

Samsung Gear VR

Many companies driving the VR revolution are participating in the Vulkan working group

© Copyright Khronos Group 2016 - Page 42

Khronos Standards for Advanced Processing

Low-power vision processing for tracking, odometry and scene analysis

Heterogeneous Processing Acceleration, e.g. Neural Net Processing for scene understanding

3D Graphics for portable display of augmentations and visualizations on every platform

3D File formats for AUTHORING and TRANSMISSION of 3D runtime assets

Image: ScreenMedia

© Copyright Khronos Group 2016 - Page 43

Thank You!

Bor-Sung Liang, Ph.D [email protected]

Workshop on Compiler Techniques and System Software for High-Performance and Embedded Computing

CTHPC 2016