Intro to GPU Programming

Upload: 3dmashup

Post on 04-Apr-2018


  • 7/31/2019 Intro to GPU Programming

    1/29

    Introduction to GPU Programming

    Silicon Valley Camp 2012

    By Tim Child

    Overview

    What We Will Cover Today

    Speaker's Biography

    Why Program on GPUs?

    What's a GPU?

    Parallel Algorithms

    Programming Model

    Programming Language Alternatives

        CUDA

        OpenCL

        C++ AMP

        Other Languages

    Demo

    GPU Performance

    Feeding the Beast

    Tools

    Trends

    Reference Material

    Q&A

    Speaker's Biography

    30+ years in Computing

    Currently CEO 3DMashUp Start-up

    Formerly VP Engineering

    Oracle, Informix & BEA Systems

    Programming GPUs

    3+ years (CUDA & OpenCL)

    Why Program on GPUs?

    Much More Data to Crunch: Social Media, Web Logs, 3D Scanners

    More Advanced Problems to Solve: Monte Carlo Simulation, Data Mining

    More Need for Real-time Answers: Web Advertising, High Freq. Trading

    Single CPU Performance Growth Slowing: More Multi-core Chips

    Data Centers - Power Constrained: 100 MW Data Center; 12 GW Power, 2011 US Data Centers

    Parallelism Overview

    Data Center: AWS EC2, >100K Cores, ~30 MW Power, VMs, InfiniBand

    Cluster: Sparc T4, 32 Cores, 5 - 10 KW Power, O/S Threads / MPI, Fabric Switch

    Node: Core i7 + GPU, 16 - 2048 Cores, ~1 KW Power, GPU Programming, PCIe Bus

    All Networked!

    What's a GPU?

    CPU

    1 - 16 Cores

    2.5B Transistors

    3 - 4 GHz Clock

    4 - 8 Wide SIMD Vector ALU

    0 - 2 Hyper Threads

    1 - 256G RAM

    20 - 30 GB/s RAM Bandwidth

    GPU

    3 - 2048 Cores

    7B Transistors

    ~800 MHz - 1 GHz Clock

    16 Wide SIMD Vector ALU

    32 - 64 H/W Threads

    1 - 6G RAM

    200 - 300 GB/s RAM Bandwidth

    Why So Powerful?

    GPU Internals

    Nvidia GK104

    3.5B Transistors, 1536 Cores, 1 GHz, 3 TFLOPs, 192 GB/s, 195 W
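The throughput figure above can be roughly reproduced from the core count and clock. This is a minimal sketch, assuming each core retires one fused multiply-add per cycle, counted as 2 floating-point operations. That factor of 2 is an assumption, not something the slide states, and `peak_gflops` is a hypothetical helper name.

```c
#include <assert.h>

/* Hypothetical helper: theoretical peak throughput in GFLOPs.
 * flops_per_cycle = 2 assumes one fused multiply-add per core per cycle
 * (an assumption; the slide does not state it). */
double peak_gflops(int cores, double clock_ghz, int flops_per_cycle)
{
    assert(cores > 0 && clock_ghz > 0.0 && flops_per_cycle > 0);
    return cores * clock_ghz * flops_per_cycle;
}
```

Under that assumption, `peak_gflops(1536, 1.0, 2)` gives 3072 GFLOPs, which is about 3 TFLOPs.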

    Running a GPU Kernel

    O/S: Windows, Linux

    GPU Driver

    GPU SDK: CUDA, OpenCL, C++ AMP

    Host Application: Kernel, Environment, Buffers

    Parallel Algorithm

    Memory Strategy: Arrangement, Access Pattern

    Execution Model: Threads, Concurrency

    GPU Program Execution Sequence

    Parallel Algorithms

    Parallel Reduction

    Used for: Sum, Min, Max

    Pass 1

    Pass 2

    Pass 3

    Pass 4
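The passes above can be sketched in plain C. This is a sequential CPU emulation, not GPU code: each outer iteration stands for one pass, and the inner loop stands for work that a GPU would spread across parallel work items. The function name `reduce_sum` and the in-place layout are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Tree-style sum reduction, emulated sequentially on the CPU.
 * Each outer iteration is one "pass"; on a GPU every iteration of the
 * inner loop would be handled by a separate work item running in parallel.
 * The array is reduced in place and the total ends up in data[0].
 * Returns the number of passes performed. */
int reduce_sum(float *data, size_t n)
{
    assert(data != NULL && n > 0);
    int passes = 0;

    /* Find the smallest power of two >= n; the first pass uses half of it. */
    size_t stride = 1;
    while (stride < n)
        stride *= 2;

    for (stride /= 2; stride > 0; stride /= 2) {
        for (size_t i = 0; i < stride; ++i) {   /* parallel on a GPU */
            if (i + stride < n)
                data[i] += data[i + stride];
        }
        ++passes;
    }
    return passes;
}
```

With 16 input values the loop runs 4 passes (strides 8, 4, 2, 1), matching the four passes listed on the slide. Swapping `+=` for a min or max update gives the other two uses.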

    Prefix Sum

    Prefix Sum Used for Stream Compaction
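To show how a prefix sum drives stream compaction, here is a small plain C sketch. It runs sequentially on the CPU; GPU versions compute the same scan in parallel passes. The names `exclusive_scan` and `compact_nonzero`, and the choice of "keep the non-zero elements" as the filter, are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Exclusive prefix sum: out[i] = flags[0] + ... + flags[i-1], out[0] = 0.
 * Written as a simple sequential loop; GPU versions compute the same
 * result in parallel passes. Returns the total of all inputs. */
int exclusive_scan(const int *flags, int *out, size_t n)
{
    int running = 0;
    for (size_t i = 0; i < n; ++i) {
        out[i] = running;
        running += flags[i];
    }
    return running;
}

/* Stream compaction: copy the non-zero elements of in[] to the front of
 * out[], preserving order. scratch[] must hold 2*n ints.
 * Returns the number of elements kept. */
size_t compact_nonzero(const int *in, int *out, int *scratch, size_t n)
{
    assert(in != NULL && out != NULL && scratch != NULL);
    int *flags = scratch;        /* 1 if the element is kept, else 0 */
    int *pos   = scratch + n;    /* destination index from the scan */

    for (size_t i = 0; i < n; ++i)
        flags[i] = (in[i] != 0);

    int kept = exclusive_scan(flags, pos, n);

    for (size_t i = 0; i < n; ++i)   /* each element is independent */
        if (flags[i])
            out[pos[i]] = in[i];

    return (size_t)kept;
}
```

The scan gives every kept element its own output slot, so the final scatter loop has no dependencies between elements. That independence is what makes the pattern suit a GPU.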

    Programming Model

    2D Array of Work Groups Mapped to 2D Array of Cores

    Work Group

    2D Array of Work Items (Threads)
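The relationship between work groups and work items can be sketched with a little index arithmetic in plain C. With no global offset, OpenCL's `get_global_id()` equals the group id times the local size plus the local id. The `WorkItem` struct and both function names below are hypothetical, introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct describing one work item's position in a 2D launch. */
typedef struct {
    size_t group_x, group_y;   /* which work group */
    size_t local_x, local_y;   /* position inside the work group */
} WorkItem;

/* Global coordinate along one dimension: the value OpenCL's
 * get_global_id() reports when no global offset is used. */
size_t global_id(size_t group_id, size_t local_size, size_t local_id)
{
    assert(local_id < local_size);
    return group_id * local_size + local_id;
}

/* Flatten a 2D global coordinate into a row-major array index. */
size_t flat_index(WorkItem w, size_t local_w, size_t local_h, size_t global_w)
{
    size_t gx = global_id(w.group_x, local_w, w.local_x);
    size_t gy = global_id(w.group_y, local_h, w.local_y);
    assert(gx < global_w);
    return gy * global_w + gx;
}
```

For example, with 16 x 16 work groups, the work item at local position (3, 5) in group (2, 1) has global coordinates (35, 21).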

    Programming Languages & GPU Libraries

    Language/Lib   Description                               Vendor    Platform
    CUDA           Original GPU programming language         Nvidia    Linux, Windows, Mac, Nvidia
    Thrust         C++ CUDA library                          Nvidia    Linux, Windows, Mac, Nvidia
    CUBLAS         CUDA Linear Math Algorithms               Nvidia    Linux, Windows, Mac, Nvidia
    CUFFT          CUDA FFT library                          Nvidia    Linux, Windows, Mac, Nvidia
    OpenCL         C99 based GPU/CPU programming language    Many      Nvidia, AMD, Intel, Mac, ARM
    OpenGL         C 3D Graphics API, GLSL Shader            Khronos   Nvidia, AMD, Intel, Mac, ARM
    WebCL          JavaScript OpenCL                         Many      Firefox, Chrome, Node.js
    C++ AMP        MSFT C++ Extensions                       MSFT      MSFT, Nvidia, AMD, Windows

    CUDA

    Compute Unified Device Architecture

    C/C++ Low Level and High Level API

    Embedded in Host C/C++ source

    5 Years

    Richest GPU Ecosphere

    OpenCL

    Open Computing Language

    C99 Subset/Superset

    Requires C API

    Host CPU and GPU support

    Claims write once, run anywhere

    ~3 years old

    Limited Ecosphere

    C++ AMP

    Microsoft

    C++ Accelerated Massive Parallelism

    VS 2012 and Windows 8

    Built on DirectX 11

    C++ Language extensions

    Emerging technology

    Limited Ecosphere

    Other Languages

    CUDA Bindings

    Python

    SQL

    Perl

    .NET

    ...

    OpenCL

    JavaScript

    SQL

    Python

    Perl

    .NET

    ...

    Example Programs

    /**
     * vectoradd: an OpenCL kernel function to add two float4 vectors
     * @param[in]  a  float4 vector
     * @param[in]  b  float4 vector
     * @param[out] c  float4 vector
     * @param[in]  n  number of float4 vectors
     */
    __kernel void vectoradd(
        __global float4 *a,
        __global float4 *b,
        __global float4 *c,
        const int n)
    {
        // One work item handles one float4 element
        int i = get_global_id(0);

        // Skip work items whose index falls past the end of the arrays
        if (i < n) {
            c[i] = a[i] + b[i];
        }
    }
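The bounds check in the kernel above matters because host code commonly rounds the global work size up to a multiple of the work-group size, which OpenCL 1.x requires whenever an explicit local size is passed. A minimal host-side sketch in plain C follows; `round_up_global_size` and `idle_work_items` are hypothetical helper names, not part of the OpenCL API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side helper: smallest multiple of local_size >= n. */
size_t round_up_global_size(size_t n, size_t local_size)
{
    assert(local_size > 0);
    return ((n + local_size - 1) / local_size) * local_size;
}

/* Number of work items that fail the kernel's bounds check and do nothing. */
size_t idle_work_items(size_t n, size_t local_size)
{
    return round_up_global_size(n, local_size) - n;
}
```

For 1000 elements and a work-group size of 64, the global size becomes 1024, so 24 work items are launched only to fail the check and exit.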

    Live Demo

    Demo System

    JavaScript IDE: WebStorm

    Intel or AMD OpenCL SDK

    Firefox 15

    Nokia Research WebCL Extension

    AMD APU

    GPU Performance

    Get Data on GPU and Keep it There

    Give the GPU Plenty of Work

    Reuse Data within GPU to avoid Limited Bandwidth to CPU
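A rough back-of-the-envelope calculation shows why keeping data on the GPU pays off. The sketch below is plain C with hypothetical helper names. The 16 GB/s host-link figure used in the usage note is an assumed illustrative value that does not come from the slides; the 200 GB/s figure is the low end of the GPU RAM bandwidth range quoted earlier in the deck.

```c
#include <assert.h>

/* Seconds needed to move `bytes` over a link sustaining `gb_per_s`
 * gigabytes per second (1 GB = 1e9 bytes). Ignores latency and overheads. */
double transfer_seconds(double bytes, double gb_per_s)
{
    assert(bytes >= 0.0 && gb_per_s > 0.0);
    return bytes / (gb_per_s * 1e9);
}

/* How many times faster on-board GPU memory is than the host link. */
double bandwidth_ratio(double gpu_gb_per_s, double host_link_gb_per_s)
{
    assert(gpu_gb_per_s > 0.0 && host_link_gb_per_s > 0.0);
    return gpu_gb_per_s / host_link_gb_per_s;
}
```

Under those assumptions, `transfer_seconds(2e9, 16.0)` is 0.125 s to copy 2 GB across the host link, and `bandwidth_ratio(200.0, 16.0)` is 12.5, so every byte reused on the GPU avoids a trip over a link more than ten times slower.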

    Memory Arrangements

    Array of Structs

    ID Name DOB Gender | ID Name DOB Gender | ID Name DOB Gender | ID Name DOB Gender

    Struct of Arrays

    ID ID ID ID ID

    Name Name Name Name

    DOB DOB DOB DOB
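The two arrangements can be written down as C types. This is a minimal sketch: the field widths, the integer date encoding, and the record count of 4 are assumptions chosen for illustration, and all type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define N_RECORDS 4

/* Array of Structs: the fields of one record sit next to each other. */
typedef struct {
    int  id;
    char name[16];
    int  dob;      /* e.g. 19700131, an illustrative encoding */
    char gender;
} PersonAoS;

/* Struct of Arrays: all values of one field sit next to each other. */
typedef struct {
    int  id[N_RECORDS];
    char name[N_RECORDS][16];
    int  dob[N_RECORDS];
    char gender[N_RECORDS];
} PeopleSoA;

/* Byte distance between the dob fields of neighbouring records. */
size_t aos_dob_stride(void) { return sizeof(PersonAoS); }
size_t soa_dob_stride(void) { return sizeof(int); }

/* Convert one layout into the other, field by field. */
void aos_to_soa(const PersonAoS *in, PeopleSoA *out)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < N_RECORDS; ++i) {
        out->id[i] = in[i].id;
        for (size_t c = 0; c < 16; ++c)
            out->name[i][c] = in[i].name[c];
        out->dob[i] = in[i].dob;
        out->gender[i] = in[i].gender;
    }
}
```

When neighbouring work items each read the `dob` of neighbouring records, the Struct of Arrays layout places those reads one `int` apart, while the Array of Structs layout spaces them a whole record apart.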

    Feeding the Beast

    Coalesced Memory Access

    Block Xfer: Aligned

    Block Xfer: Aligned

    Block Xfer: Not Aligned
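One way to see the cost of a misaligned or strided access is to count how many aligned memory segments a group of work items touches. The plain C sketch below models that under stated assumptions: the segment size and group size passed in are illustrative parameters, not figures from the slides, and `segments_touched` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Count how many aligned memory segments a group of work items touches
 * when item t reads elem_size bytes at base + t * stride_bytes.
 * Addresses rise with t, so segments are visited in non-decreasing order
 * and remembering the highest one counted is enough to avoid duplicates. */
size_t segments_touched(size_t base, size_t stride_bytes, size_t elem_size,
                        size_t items, size_t segment_bytes)
{
    assert(segment_bytes > 0 && elem_size > 0 && stride_bytes > 0);
    size_t count = 0;
    size_t last_seen = 0;   /* highest segment index counted so far */

    for (size_t t = 0; t < items; ++t) {
        size_t addr  = base + t * stride_bytes;
        size_t first = addr / segment_bytes;
        size_t last  = (addr + elem_size - 1) / segment_bytes;
        for (size_t s = first; s <= last; ++s) {
            if (count == 0 || s > last_seen) {
                ++count;
                last_seen = s;
            }
        }
    }
    return count;
}
```

Assuming 32 work items reading 4-byte values and 128-byte segments, an aligned contiguous read (`base = 0`, `stride_bytes = 4`) touches 1 segment, the same read starting at byte 64 touches 2, and a 128-byte stride touches 32.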

    Ecosphere & Tools

    Debuggers

    Profilers

    System Monitors

    Kernel Analysis Tools

    HPC & GPU Meet-Up Groups

    HPC & GPU Supercomputing Group of Silicon Valley
    http://www.meetup.com/HPC-GPU-Supercomputing-Group-of-Silicon-Valley/

    Heterogeneous Computing
    http://www.meetup.com/Heterogenous-Computing/

    Future Trends

    Heterogeneous Computing: AMD + ARM + Others = HSA

    AMD A10 APU

    Mobile GPUs

    Dynamic Parallelism

    Conclusions

    Continued Sector Growth

    Compute-Only GPU: 50% CAGR

    GPU used in RDBMS/OLAP: PgOpenCL, Jedox

    GPUs are Our Future

    Not Just for HPC

    Greener Computing

    Parallel Programming Skills Become Important

    GPUs Everywhere: Servers, Desktops, Laptops, Tablets, Mobile

    Expect Better CPU-GPU Integration: HSA, Nvidia, Intel

    Reference Materials

    GPU SDKs

    CUDA

    AMD OpenCL

    Intel OpenCL

    Books

    Q & A