nvidia parallel nsight™ 2.0 and · cuda update: new in 4.0 easier programming use any gpu on any...

35
NVIDIA Parallel Nsight™ 2.0 and CUDA 4.0 for the Win! Jeff Kiel, Manager of Graphics Tools NVIDIA Corporation, SIGGRAPH 2011

Upload: phungthien

Post on 25-Aug-2018

216 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: NVIDIA Parallel Nsight™ 2.0 and · CUDA Update: New in 4.0 Easier Programming Use any GPU on any thread C++ new/delete and support for virtual functions Inline PTX assembly Thrust

NVIDIA Parallel Nsight™ 2.0 and

CUDA 4.0 for the Win!

Jeff Kiel, Manager of Graphics Tools

NVIDIA Corporation, SIGGRAPH 2011

Page 2: NVIDIA Parallel Nsight™ 2.0 and · CUDA Update: New in 4.0 Easier Programming Use any GPU on any thread C++ new/delete and support for virtual functions Inline PTX assembly Thrust

Agenda

CUDA Update

NVIDIA Parallel Nsight

CUDA Debugging and Profiling

Graphics Debugging and Profiling

Application Analysis

Page 3: NVIDIA Parallel Nsight™ 2.0 and · CUDA Update: New in 4.0 Easier Programming Use any GPU on any thread C++ new/delete and support for virtual functions Inline PTX assembly Thrust

CUDA Update: New in 4.0

Easier Programming

Use any GPU on any thread

C++ new/delete and support for virtual functions

Inline PTX assembly

Thrust C++ Template Performance Primitives Libraries (sort, reduce, etc.)

NVIDIA Performance Primitives library (image/video processing)

Faster Multi-GPU Programming

Unified Virtual Addressing (UVA)

GPU Direct 2.0: GPU peer-to-peer communication technology

Developer Tools for Linux and MacOS

CUDA-GDB

Visual Profiler with Automated Performance Analysis

Page 4: NVIDIA Parallel Nsight™ 2.0 and · CUDA Update: New in 4.0 Easier Programming Use any GPU on any thread C++ new/delete and support for virtual functions Inline PTX assembly Thrust

Build Debug Profile

Visual Studio with Parallel Nsight

Integrated development for CPU and GPU

Page 5: NVIDIA Parallel Nsight™ 2.0 and · CUDA Update: New in 4.0 Easier Programming Use any GPU on any thread C++ new/delete and support for virtual functions Inline PTX assembly Thrust

NVIDIA Parallel Nsight for Graphics & Compute

Graphics Inspector

Real-time inspection of Direct3D API calls

Investigate GPU pipeline state

See contributing fragments with Pixel History

Profile frames to find GPU bottlenecks

System Analysis

View CPU & GPU events on a single timeline

Examine workload dependencies

CUDA, Direct3D, and OpenGL API Trace

Profile CUDA kernels using performance counters

Free License!

CUDA/Graphics Debugger

Visual Studio debugging environment

GPU Accelerated CUDA and HLSL debugging

Examine code executing in parallel

Conditional breakpoints, memory viewer, etc.

Page 6: NVIDIA Parallel Nsight™ 2.0 and · CUDA Update: New in 4.0 Easier Programming Use any GPU on any thread C++ new/delete and support for virtual functions Inline PTX assembly Thrust

System Analysis Graphics Inspector

Host + Target (32/64 bit)

Install appropriate NVIDIA driver

Install Parallel Nsight Host and Monitor

One computer, one NVIDIA GPU

Page 7: NVIDIA Parallel Nsight™ 2.0 and · CUDA Update: New in 4.0 Easier Programming Use any GPU on any thread C++ new/delete and support for virtual functions Inline PTX assembly Thrust

System Analysis Graphics Inspector CUDA Debugger

Install appropriate NVIDIA driver

Install Parallel Nsight Host and Monitor

Configure Local Headless Debugging (see User’s Manual)

One computer, two NVIDIA GPUs

Host + Target (32/64 bit)

Page 8: NVIDIA Parallel Nsight™ 2.0 and · CUDA Update: New in 4.0 Easier Programming Use any GPU on any thread C++ new/delete and support for virtual functions Inline PTX assembly Thrust

System Analysis Graphics Inspector CUDA Debugger Graphics Debugger

Network

Target (32/64-bit)Host (32/64-bit)

Two computers, one with NVIDIA GPUs

Install appropriate NVIDIA driver on the Target System

Install Parallel Nsight Monitor on the Target System

Install Parallel Nsight Host on the Development System

Page 9: NVIDIA Parallel Nsight™ 2.0 and · CUDA Update: New in 4.0 Easier Programming Use any GPU on any thread C++ new/delete and support for virtual functions Inline PTX assembly Thrust

Host + Target (32/64-bit)

Install Parallels Desktop and guest OS

Install appropriate NVIDIA drivers

Install Parallel Nsight Host and Monitor

One computer, two NVIDIA GPUs

Full GPU acceleration

System Analysis Graphics Inspector CUDA Debugger Graphics Debugger

Page 10: NVIDIA Parallel Nsight™ 2.0 and · CUDA Update: New in 4.0 Easier Programming Use any GPU on any thread C++ new/delete and support for virtual functions Inline PTX assembly Thrust

FFT Ocean Demo Overview

Page 11: NVIDIA Parallel Nsight™ 2.0 and · CUDA Update: New in 4.0 Easier Programming Use any GPU on any thread C++ new/delete and support for virtual functions Inline PTX assembly Thrust

FFT Ocean Demo Overview

Based on “Simulating Ocean Water” by Jerry Tenssendorf

Statistical model, not physics based

Generates wave distribution in frequency domain on GPU

Inverse FFT on GPU

Movies: Large height map (2048x2048)

Games on CPU: Small height map (64x64)

GPU Based: Medium height map (512x512)

The ocean surface is composed by enormous simple waves

Each simple wave is a hybrid sine wave (Gerstner wave)

A mass point on the surface is doing vertical circular motion

Page 12: NVIDIA Parallel Nsight™ 2.0 and · CUDA Update: New in 4.0 Easier Programming Use any GPU on any thread C++ new/delete and support for virtual functions Inline PTX assembly Thrust

FFT Ocean Demo Overview

Wave Simulation in CUDA

Rendering in DX11

Scene Rendering Components

Page 13: NVIDIA Parallel Nsight™ 2.0 and · CUDA Update: New in 4.0 Easier Programming Use any GPU on any thread C++ new/delete and support for virtual functions Inline PTX assembly Thrust

Demo: Launching…Start Nsight Monitor

Launch Your Application

Configure Parallel Nsight Project Settings

Page 14: NVIDIA Parallel Nsight™ 2.0 and · CUDA Update: New in 4.0 Easier Programming Use any GPU on any thread C++ new/delete and support for virtual functions Inline PTX assembly Thrust

Demo: CUDA Debugging

Page 15: NVIDIA Parallel Nsight™ 2.0 and · CUDA Update: New in 4.0 Easier Programming Use any GPU on any thread C++ new/delete and support for virtual functions Inline PTX assembly Thrust

Demo: CUDA Debugging

Page 16: NVIDIA Parallel Nsight™ 2.0 and · CUDA Update: New in 4.0 Easier Programming Use any GPU on any thread C++ new/delete and support for virtual functions Inline PTX assembly Thrust

Demo: CUDA Debugging

Page 17: NVIDIA Parallel Nsight™ 2.0 and · CUDA Update: New in 4.0 Easier Programming Use any GPU on any thread C++ new/delete and support for virtual functions Inline PTX assembly Thrust

Demo: Analysis

Page 18: NVIDIA Parallel Nsight™ 2.0 and · CUDA Update: New in 4.0 Easier Programming Use any GPU on any thread C++ new/delete and support for virtual functions Inline PTX assembly Thrust

Demo: Analysis

Page 19: NVIDIA Parallel Nsight™ 2.0 and · CUDA Update: New in 4.0 Easier Programming Use any GPU on any thread C++ new/delete and support for virtual functions Inline PTX assembly Thrust

Demo: Graphics Debugging - HUD

Page 20: NVIDIA Parallel Nsight™ 2.0 and · CUDA Update: New in 4.0 Easier Programming Use any GPU on any thread C++ new/delete and support for virtual functions Inline PTX assembly Thrust

Demo: HUD Showing 2x2 Texture

Page 21: NVIDIA Parallel Nsight™ 2.0 and · CUDA Update: New in 4.0 Easier Programming Use any GPU on any thread C++ new/delete and support for virtual functions Inline PTX assembly Thrust

Demo: HUD in Graphics Inspector

Page 22: NVIDIA Parallel Nsight™ 2.0 and · CUDA Update: New in 4.0 Easier Programming Use any GPU on any thread C++ new/delete and support for virtual functions Inline PTX assembly Thrust

Demo: HUD Render Target, Depth & Stencil

Page 23: NVIDIA Parallel Nsight™ 2.0 and · CUDA Update: New in 4.0 Easier Programming Use any GPU on any thread C++ new/delete and support for virtual functions Inline PTX assembly Thrust

Demo: Host Frames Page

Page 24: NVIDIA Parallel Nsight™ 2.0 and · CUDA Update: New in 4.0 Easier Programming Use any GPU on any thread C++ new/delete and support for virtual functions Inline PTX assembly Thrust

Demo: Draw Call Page

Page 25: NVIDIA Parallel Nsight™ 2.0 and · CUDA Update: New in 4.0 Easier Programming Use any GPU on any thread C++ new/delete and support for virtual functions Inline PTX assembly Thrust

Demo: Texture Viewer

Page 26: NVIDIA Parallel Nsight™ 2.0 and · CUDA Update: New in 4.0 Easier Programming Use any GPU on any thread C++ new/delete and support for virtual functions Inline PTX assembly Thrust

Demo: Pixel Shader State Inspector

Page 27: NVIDIA Parallel Nsight™ 2.0 and · CUDA Update: New in 4.0 Easier Programming Use any GPU on any thread C++ new/delete and support for virtual functions Inline PTX assembly Thrust

Demo: Buffer Inspector

Page 28: NVIDIA Parallel Nsight™ 2.0 and · CUDA Update: New in 4.0 Easier Programming Use any GPU on any thread C++ new/delete and support for virtual functions Inline PTX assembly Thrust

Demo: Output Merger Inspector

Page 29: NVIDIA Parallel Nsight™ 2.0 and · CUDA Update: New in 4.0 Easier Programming Use any GPU on any thread C++ new/delete and support for virtual functions Inline PTX assembly Thrust

Demo: Pixel History

Page 30: NVIDIA Parallel Nsight™ 2.0 and · CUDA Update: New in 4.0 Easier Programming Use any GPU on any thread C++ new/delete and support for virtual functions Inline PTX assembly Thrust

Demo: Shader Debugger Breakpoint

Page 31: NVIDIA Parallel Nsight™ 2.0 and · CUDA Update: New in 4.0 Easier Programming Use any GPU on any thread C++ new/delete and support for virtual functions Inline PTX assembly Thrust

Demo: Focus Picker

Page 32: NVIDIA Parallel Nsight™ 2.0 and · CUDA Update: New in 4.0 Easier Programming Use any GPU on any thread C++ new/delete and support for virtual functions Inline PTX assembly Thrust

Demo: Frame Profiler

Page 33: NVIDIA Parallel Nsight™ 2.0 and · CUDA Update: New in 4.0 Easier Programming Use any GPU on any thread C++ new/delete and support for virtual functions Inline PTX assembly Thrust

View all graphics resources at a glance

Numerous usability and workflow improvements

Graphics profiler performance and accuracy

Driver independence

Stability improvements

Support for latest drivers and hardware

Version 2.0

NVIDIA Parallel Nsight: Graphics Features

Page 34: NVIDIA Parallel Nsight™ 2.0 and · CUDA Update: New in 4.0 Easier Programming Use any GPU on any thread C++ new/delete and support for virtual functions Inline PTX assembly Thrust

Version 2.0

NVIDIA Parallel Nsight: Compute Features

CUDA Toolkit 4.0 Support

Full Visual Studio 2010 Platform Support

Tesla Compute Cluster (TCC) Analysis

PTX Assembly Debugging

Attach to Process

Derived Metrics and Experiments

Concurrent Kernel Trace

Runtime API Trace

Advanced Conditional Breakpoints

Support for latest drivers hardware

Page 35: NVIDIA Parallel Nsight™ 2.0 and · CUDA Update: New in 4.0 Easier Programming Use any GPU on any thread C++ new/delete and support for virtual functions Inline PTX assembly Thrust

Wrap Up…Thank You!

Call to action!

Download Parallel Nsight and try it out

Send us feedback on what features you find important

Come talk to us here at SIGGRAPH

Contact us on the NVIDIA Developer Forums

http://forums.nvidia.com/index.php?showforum=191