GPU Ecosystem
DESCRIPTION
This presentation describes the components of a GPU ecosystem for compute, provides an overview of existing ecosystems, and contains a case study on NVIDIA Nsight.
TRANSCRIPT
Page 1
GPU Ecosystem Introduction & Case Study
Ofer Rosenberg
October 2013
Page 2
Content
GPU Ecosystem
Ecosystem on Mobile/Embedded Platforms
NSIGHT - Tools case study
Libraries
Page 3
GPU Ecosystem
Software product development cycle (diagram): Design, Write Code, Debug, Profile, Product
The role of the GPU ecosystem is to support, speed up, and improve this cycle for GPU compute.
Page 4
GPU Ecosystem
Support writing code by:
IDE integration – Compiler, Parser, Wizards
Libraries: Math (BLAS, IPP-like, Matrix, etc.), STL-like (Thrust, Bolt)
Support debugging by:
Integrating the debugger into the IDE (preferred)
Providing usable execution control (breakpoints, pause/resume, etc.)
Providing a reliable memory view of the various address spaces
Support profiling by:
Providing two levels of profiling: System Tracing and Kernel Profiling
System Tracing - quick highlighting of hotspots and optimal device access
Statistical and timeline-based Kernel Profiling (using performance counters)
Page 5
Ecosystem on
Mobile/Embedded Platforms
Page 6
ARM MALI
Part of ARM SoC
OpenCL 1.1 Full Profile (Linux, Android)
Renderscript (Android only)
OpenCL SDK – Samples, Tutorials, etc.
No GPU debugging capability
ARM DS-5 (Developer Suite 5)
Eclipse IDE integration
Compiler, Debugger (CPU only)
System Trace – CPU & GPU
Deep Profiling - CPU & GPU
Page 7
Intel Haswell GPU
Part of Haswell (CPU & GPU)
OpenCL 1.2 Full Profile
Windows only for now (Linux at alpha stage)
OpenCL SDK
Samples
Tools: Kernel Builder, VS/Eclipse Integration, Offline Compiler, GDB support (CPU Only)
No GPU debugging capability
VTune Amplifier XE supports OpenCL (CPU & GPU)
System level tracing (Application, Memory, Kernel launch)
Kernel Profiling
Page 8
Intel Bay Trail platform (Atom)
Bay Trail < 13W, Bay Trail-M < 6.5W
Valleyview SoC (Z37xx)
GPU is based on Gen7 (same architecture as Ivy Bridge)
Same as previous slide:
OpenCL 1.2 (Windows only for now)
OpenCL SDK
VTune support
System level tracing
Kernel Profiling
Page 9
NVIDIA Tegra 5? (Codename: Logan)
Disclaimer: Logan is due in early 2014. Part of this information is speculation.
Development boards and samples are available to selected customers
Logan SoC – 2W
ARM CPU A15 4+1: speculated
Kepler-based GPU: verified
CUDA support: verified
CUDA SDK – Dozens of samples
CUDA Libraries: Thrust, cuBLAS, NPP, etc.
NSIGHT: speculated
System Trace
Profiling, Debugging
Page 10
NSIGHT TOOLS CASE STUDY
Page 11
Nsight Highlights
“NVIDIA® Nsight™ is the ultimate development platform for heterogeneous computing” (taken from the Nsight page)
IDE integration
Windows – integration with Visual Studio
Linux – specialized Eclipse version
Debugging, System Trace, Profiling
Graphics (DX, OpenGL)
Computing (OpenCL, CUDA, C++ AMP)
Profiling only on CUDA kernels
Debug/Trace/Profile information is carefully presented
Highly efficient information fields, windows, diagrams
Feedback from professional users is visibly taken into account
Page 12
Debugging
Much more than “just integrated” with the IDE
Well-designed windows showing valuable info:
Assembly (GPU!)
Variables across all warps
Visible layout of the stopped thread
Page 13
Debugging – Eclipse edition
The Eclipse integration seems to be deeper than the Visual Studio integration
Unified CPU / GPU Debugging
Simultaneous visibility into both CPU and GPU state
Multi-GPU support
Slides from: “CUDA Development Using NVIDIA Nsight, Eclipse Edition” by David Goodwin, SC12
Full GPU debugging
Set kernel breakpoints
Single-step, run until, etc.
View values across multiple GPU threads at the same time
Examine thread, warp, block state
Source and assembly level debugging
Page 14
System Trace
Page 15
Kernel Profiling
Choose a kernel to profile
Skip N kernels, Profile M kernels
Choose “experiments”
An experiment is a type of profiling/analysis
Nsight runs each profiled kernel launch dozens of times with the same data
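The "Skip N kernels, Profile M kernels" selection above can be pictured as a window over the sequence of kernel launches. The following is a minimal sketch under that assumption; `ProfileWindow` and `should_profile` are hypothetical names for illustration and are not part of Nsight or any NVIDIA API.

```cpp
#include <cstddef>

// Hypothetical helper (not an Nsight API): models "skip N, profile M"
// as a window over zero-based kernel launch indices.
struct ProfileWindow {
    std::size_t skip;     // N: number of initial launches to ignore
    std::size_t profile;  // M: number of launches to profile after that

    // True if the launch with this index falls inside the profiled window.
    bool should_profile(std::size_t launch_index) const {
        return launch_index >= skip && launch_index - skip < profile;
    }
};
```

With `ProfileWindow{2, 3}`, launches 0 and 1 are skipped, launches 2 through 4 are profiled, and everything from launch 5 onward is ignored again.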
Page 16
Profiling Results
Experiment list
Each experiment is a tabbed window
Profiling information is presented as graphs, pie charts, diagrams, etc.
HW counters are turned into easy-to-understand graphics
Information targets known HW bottlenecks, code inefficiencies, etc.
Amazingly well presented…
Page 17
Profiling Results
The information provides a quick, easy, and methodical way to identify performance bottlenecks
[Figure: four numbered panels, 1 to 4]
Page 18
Eclipse Edition - Source Code Editor
Project Templates
CUDA code highlighting
CUDA-aware refactoring
CUDA-aware code completion and inline help
Page 19
LIBRARIES EXAMPLES
Page 20
CUDA Libraries – Part of the SDK
cuFFT
cuBLAS
cuRAND
cuSPARSE
NPP (like IPP)
Math Library
Thrust (next slide)
Page 21
Thrust Library
https://developer.nvidia.com/thrust
Works on top of CUDA
Open-source version is available on GitHub
http://thrust.github.io/
Presentations:
http://on-demand.gputechconf.com/gtc-express/2011/presentations/introductiontothrust.pdf
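To illustrate what "STL-like" means for Thrust, here is a host-only sketch written against the C++ standard library. It does not use Thrust or a GPU; the comments name the Thrust algorithms (`thrust::sort`, `thrust::reduce`) and container (`thrust::device_vector`) that play the corresponding roles. The function name `sum_of_largest` is made up for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Sums the k largest values. Standard-library version of a pattern that
// Thrust expresses with the same algorithm names on device containers.
int sum_of_largest(std::vector<int> values, std::size_t k) {
    // Thrust counterpart: thrust::sort over a thrust::device_vector<int>.
    std::sort(values.begin(), values.end(), std::greater<int>());
    if (k > values.size()) {
        k = values.size();  // clamp so the range below stays valid
    }
    // Thrust counterpart: thrust::reduce over the first k elements.
    return std::accumulate(values.begin(),
                           values.begin() + static_cast<std::ptrdiff_t>(k),
                           0);
}
```

The point of the comparison is that a developer who knows the STL algorithms can read the Thrust equivalents with little extra learning, which is the ecosystem benefit the slides attribute to STL-like libraries.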
Page 22
OPENCL LIBRARIES
Page 23
CLPP
OpenCL Data Parallel Primitives Library (similar to Thrust)
Source : https://code.google.com/p/clpp/
7 committers, last commit 1.5 years ago
Page 24
OpenCL BLAS
http://openclblas.sourceforge.net/
Code is available here (GPLv2):
http://sourceforge.net/projects/openclblas/
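For readers unfamiliar with BLAS, the following is a plain CPU reference for one of its simplest routines, the level-1 AXPY operation (y <- alpha * x + y). It is only a sketch of what such a routine computes; it is not the API of OpenCL BLAS, ViennaCL, or cuBLAS, and the function name `axpy` is chosen here for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for BLAS level-1 AXPY: y <- alpha * x + y, element by element.
// A GPU BLAS library computes the same result with one work-item per element.
void axpy(float alpha, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());  // both vectors must have the same length
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = alpha * x[i] + y[i];
    }
}
```

Each output element depends only on the matching input elements, which is why routines like this map well onto GPU compute.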
Page 25
ViennaCL
BLAS implementation
http://viennacl.sourceforge.net/
Looks very promising
Page 26
REFERENCES
Page 27
Platform links:
ARM
Developer site : http://malideveloper.arm.com
OpenCL tracing : http://malideveloper.arm.com/develop-for-mali/tools/mali-graphics-debugger/
DS-5 suite : http://www.arm.com/products/tools/software-tools/ds-5/index.php
OpenCL SDK : http://malideveloper.arm.com/develop-for-mali/sdks/mali-opencl-sdk/
OpenCL developer guide:
Online: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0538e/index.html
PDF: http://infocenter.arm.com/help/topic/com.arm.doc.dui0538e/DUI0538E_mali_t600_opencl_dg.pdf
NVIDIA
http://www.anandtech.com/show/7169/nvidia-demonstrates-logan-soc-mobile-kepler
http://www.slashgear.com/nvidia-tegra-logan-detailed-with-game-changing-cuda-integration-19274630/
http://www.ubergizmo.com/2013/07/nvidia-tegra-5-release-date-specs-news/
Page 28
Links:
Intel
OpenCL SDK http://software.intel.com/en-us/vcsource/tools/opencl-sdk
GPA http://software.intel.com/en-us/vcsource/tools/intel-gpa
VTune support for OpenCL http://software.intel.com/en-us/articles/intel-vtune-amplifier-xe-getting-started-with-opencl-performance-analysis-on-intel-hd-graphics
http://www.theinquirer.net/inquirer/news/2266966/intel-releases-opencl-sdk-for-windows-and-linux
Haswell Linux support: http://www.phoronix.com/scan.php?page=news_item&px=MTA3NDc
OpenCL “Beignet” – open-source Linux compiler:
http://software.intel.com/en-us/forums/topic/402118
http://linux.slashdot.org/story/13/04/16/014233/intel-releases-new-opencl-implementation-for-gnulinux
Atom Bay Trail:
http://arstechnica.com/gadgets/2013/02/intel-gets-aggressive-with-new-smartphone-and-tablet-chips/
http://www.anandtech.com/show/7314/intel-baytrail-preview-intel-atom-z3770-tested
http://www.tomshardware.com/reviews/bay-trail-celeron-j1750-performance,3614-6.html
http://software.intel.com/en-us/forums/topic/476221
http://en.wikipedia.org/wiki/List_of_Intel_Atom_microprocessors#.22Bay_Trail.22_.2822_nm.29
Page 29
NSIGHT Links
http://www.nvidia.com/object/nsight.html
https://developer.nvidia.com/nsight-visual-studio-edition-videos
https://developer.nvidia.com/developer-webinars
http://on-demand.gputechconf.com/supercomputing/2012/presentation/SB006-Goodwin-CUDA-Development-Nsight.pdf
http://on-demand.gputechconf.com/gtc/2013/presentations/S3011-CUDA-Optimization-With-Nsight-VSE.pdf