PyCUDA: Even Simpler GPU Programming with Python

Andreas Klöckner

Courant Institute of Mathematical Sciences, New York University

Nvidia GTC · September 22, 2010


Thanks

Jan Hesthaven (Brown)

Tim Warburton (Rice)

Leslie Greengard (NYU)

PyCUDA contributors

PyOpenCL contributors

Nvidia Corporation


Outline

1 Scripting GPUs with PyCUDA

2 PyOpenCL

3 The News

4 Run-Time Code Generation

5 Showcase


Outline

1 Scripting GPUs with PyCUDA

    PyCUDA: An Overview

    Do More, Faster with PyCUDA

2 PyOpenCL

3 The News

4 Run-Time Code Generation

5 Showcase


Whetting your appetite

    import pycuda.driver as cuda
    import pycuda.autoinit
    import numpy

    a = numpy.random.randn(4,4).astype(numpy.float32)
    a_gpu = cuda.mem_alloc(a.nbytes)
    cuda.memcpy_htod(a_gpu, a)

[This is examples/demo.py in the PyCUDA distribution.]


Whetting your appetite

    # SourceModule lives in pycuda.compiler in current PyCUDA releases.
    from pycuda.compiler import SourceModule

    # Compute kernel: each of the 4x4 threads doubles one array entry.
    mod = SourceModule("""
        __global__ void twice(float *a)
        {
          int idx = threadIdx.x + threadIdx.y*4;
          a[idx] *= 2;
        }
        """)

    func = mod.get_function("twice")
    func(a_gpu, block=(4,4,1))

    a_doubled = numpy.empty_like(a)
    cuda.memcpy_dtoh(a_doubled, a_gpu)
    print(a_doubled)
    print(a)


Why do Scripting for GPUs?

GPUs are everything that scripting languages are not.

Highly parallel

Very architecture-sensitive

Built for maximum FP/memory throughput

→ complement each other

CPU: largely restricted to control tasks (∼1000/sec)

Scripting fast enough

Python + CUDA = PyCUDA

Python + OpenCL = PyOpenCL


Scripting: Python

One example of a scripting language: Python

Mature

Large and active community

Emphasizes readability

Written in widely-portable C

A ‘multi-paradigm’ language

Rich ecosystem of sci-comp related software


Scripting: Interpreted, not Compiled

Program creation workflow:

Edit

Compile

Link

Run


PyCUDA: Workflow

[Flowchart, box labeled PyCUDA] Edit → Run → SourceModule("...") → Cache? → (no) nvcc → .cubin → Upload to GPU → Run on GPU
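The "Cache?" decision in this workflow can be sketched in plain Python. This is only an illustration of the idea, not PyCUDA's actual implementation: `compile_with_cache` and `fake_nvcc` are hypothetical names, and the stand-in "compiler" just returns bytes so that the sketch runs without a GPU or nvcc.

```python
import hashlib
import os
import tempfile


def fake_nvcc(source):
    """Hypothetical stand-in for the real compiler: returns a fake 'binary'."""
    return b"CUBIN:" + source.encode("utf-8")


def compile_with_cache(source, cache_dir, compiler=fake_nvcc):
    """Return (binary, cache_hit); compile only if this source is new."""
    key = hashlib.sha256(source.encode("utf-8")).hexdigest()
    path = os.path.join(cache_dir, key + ".cubin")
    if os.path.exists(path):          # Cache? -> yes: reuse the stored binary
        with open(path, "rb") as f:
            return f.read(), True
    binary = compiler(source)         # Cache? -> no: run the compiler
    with open(path, "wb") as f:
        f.write(binary)
    return binary, False


cache_dir = tempfile.mkdtemp()
src = "__global__ void twice(float *a) { a[threadIdx.x] *= 2; }"
first_binary, first_hit = compile_with_cache(src, cache_dir)
second_binary, second_hit = compile_with_cache(src, cache_dir)
print(first_hit, second_hit)
```

Keying the cache on a hash of the source text means an edited kernel is recompiled automatically, while an unchanged one skips the compiler on the next run.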


How are High-Performance Codes constructed?

“Traditional” Construction of High-Performance Codes:

C/C++/Fortran

Libraries

“Alternative” Construction of High-Performance Codes:

Scripting for ‘brains’

GPUs for ‘inner loops’

Play to the strengths of each programming environment.


PyCUDA Philosophy

Provide complete access

Automatically manage resources

Provide abstractions

Check for and report errors automatically

Full documentation

Integrate tightly with numpy


What’s this “numpy”, anyway?

Numpy: package for large, multi-dimensional arrays.

Vectors, Matrices, ...

A+B, sin(A), dot(A,B)

la.solve(A, b), la.eig(A)

cube[:, :, n-k:n+k], cube+5

All much faster than functional equivalents in Python.

“Python’s MATLAB”: Basis for SciPy, plotting, ...
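The expressions listed on this slide can be tried directly on the host. The following is a small sketch, assuming `la` stands for `numpy.linalg`; the matrices and the `cube` shape are made-up example data.

```python
import numpy as np
import numpy.linalg as la

A = np.array([[2.0, 1.0], [1.0, 3.0]])
B = np.eye(2)
b = np.array([3.0, 5.0])

total = A + B                  # elementwise sum
sines = np.sin(A)              # elementwise function
product = np.dot(A, B)         # matrix product
x = la.solve(A, b)             # solves A x = b
eigenvalues, eigenvectors = la.eig(A)

cube = np.zeros((4, 4, 8))     # example 3D array
n, k = 4, 2
window = cube[:, :, n-k:n+k]   # slice along the last axis
shifted = cube + 5             # scalar is broadcast to every entry
```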


gpuarray: Simple Linear Algebra

pycuda.gpuarray:

Meant to look and feel just like numpy.

gpuarray.to_gpu(numpy_array)

numpy_array = gpuarray.get()

+, -, *, /, fill, sin, exp, rand, basic indexing, norm, inner product, ...

Mixed types (int32 + float32 = float64)

print gpuarray for debugging.

Allows access to raw bits

Use as kernel arguments, textures, etc.
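The mixed-type rule quoted on this slide (int32 + float32 = float64) is the same promotion rule numpy applies to arrays. A host-side sketch with plain numpy, which runs without a GPU, shows it; whether every gpuarray operation promotes identically is not checked here.

```python
import numpy as np

ints = np.arange(4, dtype=np.int32)
floats = np.ones(4, dtype=np.float32)

# float32 cannot represent every int32 exactly, so numpy promotes to float64.
mixed = ints + floats
print(mixed.dtype)
```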


Whetting your appetite, Part II

    import numpy
    import pycuda.autoinit
    import pycuda.gpuarray as gpuarray

    a_gpu = gpuarray.to_gpu(
        numpy.random.randn(4,4).astype(numpy.float32))
    a_doubled = (2*a_gpu).get()
    print(a_doubled)
    print(a_gpu)


gpuarray: Elementwise expressions

Avoiding extra store-fetch cycles for elementwise math:

    import numpy.linalg as la
    import pycuda.autoinit
    import pycuda.gpuarray as gpuarray

    from pycuda.curandom import rand as curand
    a_gpu = curand((50,))
    b_gpu = curand((50,))

    from pycuda.elementwise import ElementwiseKernel
    lin_comb = ElementwiseKernel(
        "float a, float *x, float b, float *y, float *z",
        "z[i] = a*x[i] + b*y[i]")

    c_gpu = gpuarray.empty_like(a_gpu)
    lin_comb(5, a_gpu, 6, b_gpu, c_gpu)

    # Compare against the same expression built from gpuarray operators.
    assert la.norm((c_gpu - (5*a_gpu+6*b_gpu)).get()) < 1e-5
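To see what "avoiding extra store-fetch cycles" means, the following host-side sketch contrasts an unfused evaluation, which stores two temporaries, with a fused single pass. It uses plain Python lists in place of GPU arrays, and the function names are hypothetical.

```python
def lin_comb_unfused(a, x, b, y):
    """Three passes: each intermediate is stored, then fetched again."""
    ax = [a * xi for xi in x]                  # pass 1: store a*x
    by = [b * yi for yi in y]                  # pass 2: store b*y
    return [p + q for p, q in zip(ax, by)]     # pass 3: fetch both and add


def lin_comb_fused(a, x, b, y):
    """One pass: the body mirrors z[i] = a*x[i] + b*y[i]."""
    return [a * xi + b * yi for xi, yi in zip(x, y)]


x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]
z_unfused = lin_comb_unfused(5, x, 6, y)
z_fused = lin_comb_fused(5, x, 6, y)
print(z_fused)
```

Both versions give the same values. On a GPU the fused form matters because each extra pass reads and writes a full array in device memory.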


gpuarray: Reduction made easy

Example: A scalar product calculation

    import numpy
    import pycuda.autoinit

    from pycuda.reduction import ReductionKernel
    dot = ReductionKernel(dtype_out=numpy.float32, neutral="0",
        reduce_expr="a+b", map_expr="x[i]*y[i]",
        arguments="const float *x, const float *y")

    from pycuda.curandom import rand as curand
    x = curand((1000*1000), dtype=numpy.float32)
    y = curand((1000*1000), dtype=numpy.float32)

    x_dot_y = dot(x, y).get()
    x_dot_y_cpu = numpy.dot(x.get(), y.get())
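The roles of `neutral`, `map_expr` and `reduce_expr` can be shown with a serial host-side sketch. The helper `reduction_kernel` below is hypothetical and only mimics the meaning of the three parameters; the GPU version combines partial results in parallel rather than left to right.

```python
from functools import reduce


def reduction_kernel(neutral, reduce_expr, map_expr):
    """Build a host-side function with ReductionKernel-like semantics."""
    def run(*arrays):
        # map_expr is evaluated once per index i, like "x[i]*y[i]".
        mapped = (map_expr(i, *arrays) for i in range(len(arrays[0])))
        # reduce_expr combines two partial results, starting from neutral.
        return reduce(reduce_expr, mapped, neutral)
    return run


dot = reduction_kernel(
    neutral=0.0,
    reduce_expr=lambda a, b: a + b,
    map_expr=lambda i, x, y: x[i] * y[i])

x = [1.0, 2.0, 3.0]
y = [4.0, 5.0, 6.0]
x_dot_y = dot(x, y)
print(x_dot_y)
```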


PyCUDA: Vital Information

http://mathema.tician.de/software/pycuda

Complete documentation

MIT License (no warranty, free for all use)

Requires: numpy, Python 2.4+ (Win/OS X/Linux)

Support via mailing list


Outline

1 Scripting GPUs with PyCUDA

2 PyOpenCL

3 The News

4 Run-Time Code Generation

5 Showcase


OpenCL’s perception problem

OpenCL does not presently get the credit it deserves.

Single abstraction works well for GPUs, CPUs

Vendor-independence

Compute Dependency DAG

A JIT C compiler baked into a library


Introducing... PyOpenCL

PyOpenCL is “PyCUDA for OpenCL”

Complete, mature API wrapper

Has: Arrays, elementwise operations, RNG, ...

Near feature parity with PyCUDA

Tested on all available implementations, OSs

http://mathema.tician.de/software/pyopencl


Introducing... PyOpenCL

Same flavor, different recipe:

    import pyopencl as cl, numpy

    a = numpy.random.rand(50000).astype(numpy.float32)

    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)

    a_buf = cl.Buffer(ctx, cl.mem_flags.READ_WRITE, size=a.nbytes)
    # Newer PyOpenCL releases spell this cl.enqueue_copy(queue, a_buf, a).
    cl.enqueue_write_buffer(queue, a_buf, a)

    prg = cl.Program(ctx, """
        __kernel void twice(__global float *a)
        {
          int gid = get_global_id(0);
          a[gid] *= 2;
        }
        """).build()

    prg.twice(queue, a.shape, None, a_buf).wait()


Outline

1 Scripting GPUs with PyCUDA

2 PyOpenCL

3 The News

    Exciting Developments in GPU-Python

4 Run-Time Code Generation

5 Showcase


Step 1: Download

Hot off the presses:

PyCUDA 0.94.1

PyOpenCL 0.92

All the goodies from this talk, plus

Supports all new features in CUDA 3.0, 3.1, 3.2rc, OpenCL 1.1

Allows printf() (see example in Wiki)

New stuff shows up in git very quickly. Still needed: better release schedule.


Step 2: Installation

PyCUDA and PyOpenCL no longer depend on Boost C++

Eliminates major install obstacle

Easier to depend on PyCUDA and PyOpenCL

easy_install pyopencl works on Macs out of the box

Boost is still there, just not user-visible by default.


Step 3: Usage

Complex numbers

... in GPUArray

... in user code (pycuda-complex.hpp)

If/then/else for GPUArrays

Support for custom device pointers

Smarter device picking/context creation

PyFFT: FFT for PyOpenCL and PyCUDA

scikits.cuda: CUFFT, CUBLAS, CULA


Sparse Matrix-Vector on the GPU

New feature in 0.94: Sparse matrix-vector multiplication

Uses “packeted format” by Garland and Bell (also includes parts of their code)

Integrates with scipy.sparse.

Conjugate-gradients solver included

Deferred convergence checking
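One plausible reading of "deferred convergence checking" is that the solver does not test the residual on every iteration, since on a GPU each test forces a device-to-host transfer. The sketch below illustrates that idea with a plain-Python conjugate-gradients loop on a small dense matrix. It is not PyCUDA's solver; the `check_every` parameter and all function names are assumptions made for this example.

```python
def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cg(A, b, tol=1e-10, max_iter=100, check_every=2):
    """Conjugate gradients for a symmetric positive definite matrix A.

    The residual norm is compared against tol only on every
    check_every-th iteration instead of on each one.
    """
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x for the start x = 0
    p = list(r)
    rr = dot(r, r)
    checks = 0
    iterations = 0
    for iterations in range(1, max_iter + 1):
        Ap = matvec(A, p)
        pAp = dot(p, Ap)
        if pAp == 0.0:               # search direction vanished: converged
            break
        alpha = rr / pAp
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        if iterations % check_every == 0:   # the deferred check
            checks += 1
            if rr_new < tol * tol:
                break
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, iterations, checks


A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x, iterations, checks = cg(A, b)
print(iterations, checks)
```

The trade-off is a few possibly unnecessary iterations in exchange for fewer synchronization points.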


Step 4: Debugging

New in 0.94.1: Support for CUDA gdb:

    $ cuda-gdb --args python -m pycuda.debug demo.py

Automatically:

Sets Compiler flags

Retains source code

Disables compiler cache


Outline

1 Scripting GPUs with PyCUDA

2 PyOpenCL

3 The News

4 Run-Time Code Generation

    Writing Code when the most Knowledge is Available

5 Showcase


GPU Programming: Implementation Choices

Many difficult questions

Insufficient heuristics

Answers are hardware-specific and have no lasting value

Proposed Solution: Tune automatically for hardware at run time, cache tuning results.

Decrease reliance on knowledge of hardware internals

Shift emphasis from tuning results to tuning ideas


Metaprogramming

[Diagram] Idea → Python Code → GPU Code → GPU Compiler → GPU Binary → GPU → Result (stages labeled Human or Machine)

In GPU scripting, GPU code does not need to be a compile-time constant.

(Key: Code is data. It wants to be reasoned about at run time.)

Good for code generation

PyCUDA, PyOpenCL
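Because the GPU source is an ordinary Python string, it can be assembled at run time from information that is only known then. A minimal sketch, assuming a hypothetical `make_kernel_source` helper and type table; the resulting string is what would be handed to SourceModule.

```python
KERNEL_TEMPLATE = """
__global__ void scale(%(ctype)s *a, %(ctype)s factor)
{
  int idx = threadIdx.x + blockDim.x * blockIdx.x;
  a[idx] *= factor;
}
"""

# Hypothetical mapping from a dtype name to its C spelling.
C_TYPES = {"float32": "float", "float64": "double", "int32": "int"}


def make_kernel_source(dtype_name):
    """Return kernel source specialized for one element type."""
    return KERNEL_TEMPLATE % {"ctype": C_TYPES[dtype_name]}


float_source = make_kernel_source("float32")
double_source = make_kernel_source("float64")
print(double_source)
```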


Machine-generated Code

Why machine-generate code?

Automated Tuning (cf. ATLAS, FFTW)

Data types

Specialize code for given problem

Constants faster than variables (→ register pressure)

Loop Unrolling
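Two of these points, constants instead of variables and loop unrolling, can be illustrated with plain string building. The generator below is a sketch with hypothetical names: it writes the unrolled statements out one by one and bakes the sizes into the source as literals.

```python
def make_unrolled_kernel(type_name, unroll, stride):
    """Generate CUDA C source with the loop unrolled at generation time.

    unroll and stride end up as literal constants in the source, so the
    compiled kernel needs no loop counter and no size arguments.
    """
    lines = [
        "__global__ void twice(%s *tgt)" % type_name,
        "{",
        "  int idx = threadIdx.x + %d * blockIdx.x;" % (unroll * stride),
    ]
    for i in range(unroll):
        lines.append("  tgt[idx + %d] *= 2;" % (i * stride))
    lines.append("}")
    return "\n".join(lines)


source = make_unrolled_kernel("float", unroll=4, stride=128)
print(source)
```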


RTCG via Templates

    from jinja2 import Template
    from pycuda.compiler import SourceModule

    # block_size and thread_block_size are integers chosen by the caller.
    tpl = Template("""
        __global__ void twice({{ type_name }} *tgt)
        {
          int idx = threadIdx.x +
            {{ thread_block_size }} * {{ block_size }} * blockIdx.x;

          {% for i in range(block_size) %}
              {% set offset = i*thread_block_size %}
              tgt[idx + {{ offset }}] *= 2;
          {% endfor %}
        }
        """)

    rendered_tpl = tpl.render(
        type_name="float", block_size=block_size,
        thread_block_size=thread_block_size)

    smod = SourceModule(rendered_tpl)


Outline

1 Scripting GPUs with PyCUDA

2 PyOpenCL

3 The News

4 Run-Time Code Generation

5 Showcase

    Python+GPUs in Action

    Conclusions


Discontinuous Galerkin Method

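Let $\Omega := \bigcup_k D_k \subset \mathbb{R}^d$.

Goal

Solve a conservation law on $\Omega$: $u_t + \nabla \cdot F(u) = 0$

Example

Maxwell’s Equations: EM field: $E(x,t)$, $H(x,t)$ on $\Omega$ governed by

$$\partial_t E - \frac{1}{\varepsilon} \nabla \times H = -\frac{j}{\varepsilon}, \qquad \partial_t H + \frac{1}{\mu} \nabla \times E = 0,$$

$$\nabla \cdot E = \frac{\rho}{\varepsilon}, \qquad \nabla \cdot H = 0.$$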


GPU DG Showcase

Electromagnetism

Poisson

CFD

Shock-laden flows


GPU-DG: Performance on GTX280

[Plot: GFlops/s (axis 0 to 300) against polynomial order N (0 to 10), with one curve each for GPU and CPU.]


16 T10s vs. 64 = 8× 2× 4 Xeon E5472

[Plot: Flop Rates and Speedups: 16 GPUs vs 64 CPU cores. GFlops/s (axis 0 to 4000) against polynomial order N (2 to 8), with GPU and CPU curves.]

Tim Warburton: Shockingly fast and accurate CFD simulations, Wednesday, 11:00–11:50

(Several posters/talks on GPU-DG at GTC.)

Computational Visual Neuroscience

Nicolas Pinto: Easy GPU Metaprogramming: A Case Study in Biologically-Inspired Computer Vision
Thursday, 10:00–10:50, Room A1

Copperhead

    from copperhead import *
    import numpy as np

    @cu
    def axpy(a, x, y):
        return [a * xi + yi for xi, yi in zip(x, y)]

    x = np.arange(100, dtype=np.float64)
    y = np.arange(100, dtype=np.float64)

    with places.gpu0:
        gpu = axpy(2.0, x, y)

    with places.here:
        cpu = axpy(2.0, x, y)

Bryan Catanzaro: Copperhead: Data-Parallel Python for the GPU
Wednesday, 15:00–15:50 (next slot!), Room N
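The arithmetic performed by the Copperhead axpy example can be checked without Copperhead or a GPU. The following is a minimal sketch in plain Python, assuming ordinary lists in place of the NumPy arrays on the slide; `axpy_reference` is a hypothetical name, and the code does not use the `@cu` decorator or `places`.

```python
def axpy_reference(a, x, y):
    """Element-wise a*x + y, using the same comprehension as the slide."""
    return [a * xi + yi for xi, yi in zip(x, y)]

# Same input values as the slide (0.0 .. 99.0), held in plain lists.
x = [float(i) for i in range(100)]
y = [float(i) for i in range(100)]

result = axpy_reference(2.0, x, y)
print(result[:4])  # → [0.0, 3.0, 6.0, 9.0]
```

Because x and y hold the same values here, every entry of the result is three times its index, which gives a quick sanity check for a GPU run.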

Conclusions

Fun time to be in computational science

Even more fun with Python and PyCUDA/PyOpenCL

With no compromise in performance

GPUs and scripting work well together

Enable Metaprogramming

The “Right” way to develop computational codes

Bake all runtime-available knowledge into code
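One way to read "bake all runtime-available knowledge into code" is to generate the kernel source as a string once the problem size and constants are known, so they appear in the GPU code as literals. The following is a minimal sketch using only the Python standard library; the kernel, the template placeholders (`$dtype`, `$n`, `$factor`) and the helper `generate_scale_kernel` are hypothetical and not taken from the talk.

```python
from string import Template

# Hypothetical CUDA C kernel template; placeholders are filled in at run time.
KERNEL_TEMPLATE = Template("""
__global__ void scale($dtype *dest, const $dtype *src)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < $n)
        dest[i] = $factor * src[i];
}
""")

def generate_scale_kernel(dtype, n, factor):
    """Return CUDA C source with the run-time values baked in as literals."""
    return KERNEL_TEMPLATE.substitute(dtype=dtype, n=n, factor=factor)

# The array length and scale factor are only known when the script runs.
source = generate_scale_kernel("float", 1024, "2.0f")
print(source)
```

With PyCUDA installed and a CUDA-capable GPU available, a string like this could be passed to `pycuda.compiler.SourceModule(source)` and the kernel retrieved with `get_function("scale")`; that step is left out here so the sketch runs without a GPU.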

Where to from here?

More at...

→ http://mathema.tician.de/

CUDA-DG

AK, T. Warburton, J. Bridge, J.S. Hesthaven, “Nodal Discontinuous Galerkin Methods on Graphics Processors”, J. Comp. Phys., 2009.

GPU RTCG

AK, N. Pinto et al. PyCUDA: GPU Run-Time Code Generation for High-Performance Computing, in prep.

Questions?

Thank you for your attention!

http://mathema.tician.de/

image credits

Image Credits

Fermi GPU: Nvidia Corp.
C870 GPU: Nvidia Corp.
Python logo: python.org
Old Books: flickr.com/ppdigital
Adding Machine: flickr.com/thomashawk
Floppy disk: flickr.com/ethanhein
Thumbs up: sxc.hu/thiagofest
OpenCL logo: Ars Technica/Apple Corp.
Newspaper: sxc.hu/brandcore
Boost C++ logo: The Boost C++ project
?/! Marks: sxc.hu/svilen001
Machine: flickr.com/13521837@N00
