cuda vs opencl

21
ArrayFire Webinar: OpenCL and CUDA Trade-Offs and Comparisons

Upload: john-melonakos

Post on 10-May-2015

1.588 views

Category:

Technology


7 download

DESCRIPTION

The slidedeck from a webinar given

TRANSCRIPT

Page 1: CUDA vs OpenCL

ArrayFire Webinar: OpenCL and CUDA Trade-Offs and Comparisons

Page 2: CUDA vs OpenCL

GPU Software Features

Portability Scalability

Community

Programmability

Performance

Page 3: CUDA vs OpenCL

Leading GPU Computing Platforms

Portability Scalability

Community

Programmability

Performance

Page 4: CUDA vs OpenCL

Performance

• Both CUDA and OpenCL are fast

• Both can fully utilize the hardware

• Devil in the details:

– Hardware type, algorithm type, code quality

Page 5: CUDA vs OpenCL

Performance

• ArrayFire results at end of webinar

Page 6: CUDA vs OpenCL

Scalability

• Laptops --> Single GPU Machine

– Both CUDA and OpenCL scale, no code change

• Single GPU Machine --> Multi-GPU Machine

– User managed, low-level synchronization

• Multi-GPU Machine --> Cluster

– MPI

Page 7: CUDA vs OpenCL

Scalability

Interesting developments:

• Memory Transfer Optimizations

– CUDA GPUDirect technology

• Mobile GPU Computing

– OpenCL available on ARM, Imgtec, Freescale, …

Page 8: CUDA vs OpenCL

Scalability

• Laptops --> Single GPU Machine

– ArrayFire’s JIT optimizes for GPU type

• Single GPU Machine --> Multi-GPU Machine

– ArrayFire’s deviceset() function is super easy

• Multi-GPU Machine --> Cluster

– MPI

Page 9: CUDA vs OpenCL

Portability

• CUDA is NVIDIA-only

– Open source announcement

– Does not provide CPU fallback

• OpenCL is the open industry standard

– Runs on AMD, Intel, and NVIDIA

– Provides CPU fallback

Page 10: CUDA vs OpenCL

Portability

• ArrayFire is fully portable

– Same ArrayFire code runs on CUDA or OpenCL

– Simply select the right library

Page 11: CUDA vs OpenCL

Community

• NVIDIA CUDA Forums – 26,893 topics

• AMD OpenCL Forums – 4,038 topics

• Stackoverflow CUDA Tag – 1,709 tags

• Stackoverflow OpenCL Tag – 564 tags

Page 12: CUDA vs OpenCL

Community

• AccelerEyes GPU Forums – 1,435 topics

– Largest GPU forums by a software company

• Next largest

– PGI GPU Forums – 485 topics

Page 13: CUDA vs OpenCL

Programmability

• Both CUDA and OpenCL are low-level

– Time consuming kernel development

– Data-parallel algorithm design

• Focus on programmability interfaces

Page 14: CUDA vs OpenCL

Programmability

Time-consuming

SSE or AVX

Easy-to-use

Faster

Slower

Page 15: CUDA vs OpenCL

Programmability

Time-consuming

Writing Kernels

SSE or AVX

Easy-to-use

Faster

Slower

Page 16: CUDA vs OpenCL

Programmability

Time-consuming

Writing Kernels

SSE or AVX

Easy-to-use

Compiler Directives

Faster

Slower

Page 17: CUDA vs OpenCL

Programmability

Faster

Time-consuming

Writing Kernels

Using Libraries

Compiler Directives

SSE or AVX

Slower

Easy-to-use

Page 18: CUDA vs OpenCL

Libraries Make the Difference

Raw math libraries in NVIDIA CUDA

• CUBLAS, CUFFT, CULA, Magma

– Provides all BLAS, LAPACK, and FFT routines necessary for most dense matrix operations

• CUSPARSE

– A good start for sparse linear algebra

Page 19: CUDA vs OpenCL

Libraries Make the Difference

Raw math libraries in AMD OpenCL

• clAmdBlas, clAmdFft

– Provides most important Blas routines

– Provides radix 2, 3, and 5 FFT routines

– No LAPACK support

– No sparse data support

Page 20: CUDA vs OpenCL

Libraries Make the Difference

Raw math libraries in AMD OpenCL

• clAmdBlas, clAmdFft

– Provides most important Blas routines

– Provides radix 2, 3, and 5 FFT routines

– No LAPACK support

– No sparse data support

– Runs on any OpenCL-compliant device

Page 21: CUDA vs OpenCL

ArrayFire Code and Benchmarks