Performance Tools

Jeff Kiel, Manager, Graphics Tools

Post on 12-Sep-2021

Page 1: Performance Tools

Performance Tools

Jeff Kiel
Manager, Graphics Tools

Page 2: Performance Tools

Performance Tools Agenda

- Eric Preisz: Optimizing Marble Blast Ultra
- Introducing PerfHUD 6.0
- PerfKit
- PerfSDK: GPU & Driver Performance Data
- Graphic Remedy’s gDEBugger
- ShaderPerf
- FX Composer 2.5: Shader Debugger

© NVIDIA Corporation 2008

Page 3: Performance Tools

Optimizing Marble Blast Ultra

By: Eric Preisz

Page 4: Performance Tools

Marble Blast Ultra

130 Hz
78 Hz

Page 5: Performance Tools

The Game Pipeline

[Diagram: the game pipeline. CPU side: AI, physics, culling, game code, DirectX, drivers. PCIe links the CPU to the GPU. GPU side: vertex assembly, V/G/F shader, primitive assembly, rasterization, R.Op, texture. Memory: system memory, non-local video memory, video memory.]

Page 6: Performance Tools

Levels of Optimization

System - % of hardware utilization

Application - Class/function/algorithm

Micro - Line by line optimizations.

Page 7: Performance Tools

Simple Mistakes - Resource Allocations

Page 8: Performance Tools

CPU vs GPU: Performance Dashboard

GPU Busy % for the best FPS increase

[Scale: GPU Busy % marked at 0%, 80%, 90% and 100%, running from CPU BOUND to GPU BOUND]
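The GPU Busy scale on this slide can be read as a simple classification rule. The sketch below is one possible reading, not the speaker's exact rule: it assumes readings below the 80% mark indicate a CPU-bound frame, readings at or above the 90% mark indicate a GPU-bound frame, and the band in between is ambiguous. The function name and the "mixed" label are hypothetical.

```python
def classify_bound(gpu_busy_percent, cpu_below=80.0, gpu_above=90.0):
    """Classify a frame from its GPU Busy % reading.

    Assumption: the 80% and 90% marks on the slide separate the
    CPU-bound region, an ambiguous band, and the GPU-bound region.
    """
    if not 0.0 <= gpu_busy_percent <= 100.0:
        raise ValueError("GPU Busy % must be between 0 and 100")
    if gpu_busy_percent < cpu_below:
        return "CPU bound"
    if gpu_busy_percent >= gpu_above:
        return "GPU bound"
    return "mixed"


# Hypothetical readings, including the 25% figure quoted later for the 8800.
for reading in (25.0, 85.0, 97.0):
    print(f"GPU busy {reading:5.1f}% -> {classify_bound(reading)}")
```

The point of the rule is to decide where to spend effort first: GPU-side tuning pays off only once the GPU is the limiting side.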

Page 9: Performance Tools

GPU System Level: Performance Dashboard

VA(%) ~5%
VS(%) ~5%
GS(%) ~0%
PS(%) ~55%
Shader(%) ~55%
Texture(%) ~45%
ROP(%) ~25%
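At the system level, the per-unit percentages point to where the GPU spends its time. As a rough sketch, the readings from this slide can be ranked to see which units to look at first. The dictionary keys follow the slide labels; the ranking helper is hypothetical and is not part of PerfHUD.

```python
# Approximate per-unit utilization readings taken from the slide.
readings = {
    "VA": 5,
    "VS": 5,
    "GS": 0,
    "PS": 55,
    "Shader": 55,
    "Texture": 45,
    "ROP": 25,
}


def rank_units(utilization):
    """Return (unit, percent) pairs sorted from busiest to least busy."""
    return sorted(utilization.items(), key=lambda item: item[1], reverse=True)


ranked = rank_units(readings)
for unit, percent in ranked:
    print(f"{unit:8s} ~{percent}%")
```

With these numbers the shader and texture units come out on top, which matches the talk's later remark that the frame is shader and texture bound.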

Page 10: Performance Tools

GPU Application Level: Frame Profiler

Page 11: Performance Tools

GPU Application Level: Frame Profiler

State Bucket = 4.125 ms, 307,695 pixels

Draw Call = 3.247 ms, 257,662 pixels

Page 12: Performance Tools

GPU Micro Level

Page 13: Performance Tools

GPU System Level: Performance Dashboard

Page 14: Performance Tools

GPU Application Level

Problem: Inexpensive geometry, low number of draw calls, high number of expensive pixels…

Solution: Render the scene twice and disable color writes on the first pass (a depth-only pre-pass), so the expensive pixels are shaded only where they end up visible!
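Why rendering twice can be cheaper is easiest to see by counting shaded fragments. The toy model below is a sketch under stated assumptions, not the game's renderer: each fragment is a hypothetical (pixel, depth) pair in draw order, a smaller depth is nearer, the first pass uses a less-than depth test with color writes off, and the second pass shades only fragments whose depth equals the stored value.

```python
def shaded_without_prepass(fragments):
    """Count pixel-shader runs when the scene is drawn once."""
    depth = {}
    shaded = 0
    for pixel, z in fragments:
        if z < depth.get(pixel, float("inf")):
            depth[pixel] = z
            shaded += 1  # shaded now, possibly overwritten by a nearer fragment
    return shaded


def shaded_with_prepass(fragments):
    """Count pixel-shader runs when a depth-only pass is drawn first."""
    depth = {}
    # Pass 1: depth only, color writes disabled, no expensive shading.
    for pixel, z in fragments:
        if z < depth.get(pixel, float("inf")):
            depth[pixel] = z
    # Pass 2: shade only the fragment that matches the stored depth.
    return sum(1 for pixel, z in fragments if z == depth[pixel])


# Pixel 0 is drawn back to front (worst case); pixel 1 front to back.
fragments = [(0, 3.0), (0, 2.0), (0, 1.0), (1, 1.0), (1, 2.0)]
print(shaded_without_prepass(fragments), shaded_with_prepass(fragments))
```

The model ignores the cost of the extra geometry pass, which is why the technique suits the case on the slide: inexpensive geometry, few draw calls, expensive pixels.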

Page 15: Performance Tools

GPU Application Level

Before:
State Bucket = 4.125 ms, 307,695 pixels
Draw Call = 3.247 ms, 257,662 pixels

After:
State Bucket = 1.830 ms, 455,928 pixels
Draw Call = 0.834 ms, 125,868 pixels
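The Frame Profiler figures on this slide can be turned into a speedup and a cost per pixel. The sketch below assumes the first pair of measurements was taken before the change and the second pair after it; the helper names are hypothetical.

```python
def ns_per_pixel(milliseconds, pixels):
    """Average GPU time per pixel, in nanoseconds."""
    return milliseconds * 1e6 / pixels


def speedup(before_ms, after_ms):
    """How many times faster the 'after' measurement is."""
    return before_ms / after_ms


# (time in ms, pixel count) as read from the slide; before/after is an assumption.
measurements = {
    "State Bucket": ((4.125, 307_695), (1.830, 455_928)),
    "Draw Call": ((3.247, 257_662), (0.834, 125_868)),
}

for name, (before, after) in measurements.items():
    print(
        f"{name}: {speedup(before[0], after[0]):.2f}x faster, "
        f"{ns_per_pixel(*before):.2f} -> {ns_per_pixel(*after):.2f} ns/pixel"
    )
```

Note that the state bucket touches more pixels afterwards while taking less time, which is consistent with a cheap extra pass followed by far less expensive shading.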

Page 16: Performance Tools

GPU System Level: Performance Dashboard

Still very GPU bound (A), still very shader and texture bound (B)…

Page 17: Performance Tools

GPU Application Level: Frame Profiler

Now we are limited by an FFP (fixed-function pipeline) call. Using the “scrubber”, we determine that the call renders the glow buffer quad to the screen.

Page 18: Performance Tools

GPU System Level: Performance Dashboard

Moving on to a texture optimization

Page 19: Performance Tools

GPU System Level: Performance Dashboard

A StretchRect optimization reduces texture work, making pixel optimizations even more valuable.

Page 20: Performance Tools

GPU System Level: Performance Dashboard

A driver optimization increases GPU utilization.

Page 21: Performance Tools

GPU System Level: Performance Dashboard

What’s next…

- Investigate more pixel optimizations – Scissor test confirms.
- Compressed texture usage – forcing 2x2 gives ~5 frame improvement.
- SLI optimizations
- Investigate CPU optimizations; on the 8800, GPU utilization is only 25%

Page 22: Performance Tools

Thanks Eric!!!

Page 23: Performance Tools

Introducing PerfHUD 6.0!

- Unified Driver on Vista: use any release driver!
- Comprehensive SLI Support
  - Graphs for SLI specific data
  - Insight into SLI performance gotchas
- Powerful new debugging features
  - Texture visualization modes
  - API Call data mining and analysis
  - Shader visualization
- Usability Features
  - All new hot key support
  - Rich use of Direct3D PerfMarkers

Page 24: Performance Tools

PerfHUD 6: Performance Dashboard

- Graph GPU and driver data
- Edit to suit your needs
- SLI Graph for multi-GPU
- API usage statistics

Crysis used with permission from Crytek. © Crytek GmbH. All Rights Reserved. Crysis and CryENGINE are trademarks or registered trademarks of Crytek GmbH in the U.S. and/or other countries.

Page 25: Performance Tools

PerfHUD 6: Frame Debugger

- Scrub through scene
- Visualize draw call info
- Textures and RTs
- Tooltips on buffers

Page 26: Performance Tools

PerfHUD 6: Frame Debugger

- Texture analysis: substitute precomputed textures
- Controllable via PerfMarkers

Page 27: Performance Tools

PerfHUD 6: Frame Debugger
Buffer Visualization

- Visualize any buffer full screen
- 2D/3D/Cube/Arrays
- Pan/Zoom
- Change mipmap level

Page 28: Performance Tools

PerfHUD 6: Frame Debugger
API Call List

- Based on a frame capture
- See frame events including parameters
- Tooltips for details
- Connected to scrubber

Page 29: Performance Tools

PerfHUD 6: Frame Debugger
Draw Call Dependencies

- Show producer and consumer dependencies for each call
- These can hurt single GPU and SLI performance

Page 30: Performance Tools

PerfHUD 6: Adv Frame Debugger
Vertex Assembly

- Geometry Preview
- Vertex and index buffer setup

Page 31: Performance Tools

PerfHUD 6: Adv Frame Debugger
Vertex, Geometry and Pixel Shaders

- Edit & Continue Shaders
- Visualize input textures
- Constants
- Sampler overrides

Page 32: Performance Tools

PerfHUD 6: Adv Frame Debugger

- Display and modify all render state settings
- Render targets displayed

Page 33: Performance Tools

PerfHUD 6: Frame Profiler
One button bottleneck determination

- All draw calls profiled
- Draw calls grouped by State Buckets: multiply performance optimizations
- Multiple result graphs

Page 34: Performance Tools

PerfHUD 6: Adv Frame Profiler

Same advanced features now in the profiling context

Page 35: Performance Tools

PerfHUD 6: A taste of things to come!

- Better control via PerfMarkers: add them now!
- API time graph
- More performance hints: VSync on, windowed mode, event queries, not all render targets used, VBs not managed, etc.
- Subtotals in Frame Profiler
- Break (_int 3) on draw call
- 32-bit apps on 64-bit OSs

Page 36: Performance Tools

PerfKit: Features

- PerfSDK
  - Real time performance information in your game
  - Driver data, GPU counters, etc.
  - Simplified Experiments for easy bottleneck analysis
  - Simple API, code samples and helper classes
- GLExpert
  - Detailed feedback on pipeline setup
  - SLI performance feedback
  - Warnings for software fallback
  - VBO/FBO performance information
- Microsoft PIX for Windows plugin
  - GPU & driver counters alongside PIX data

Page 37: Performance Tools

Simplified G80+ Counter Diagram

[Diagram: the G80+ pipeline blocks IA, VS, GEOM, GS, SO, SETUP, RASTERIZER, PS, SHADER (Unified Shading Units), TEXTURE, ROP and FB, connected by the Vertex Shading Path, Geometry Shading Path and Pixel Shading Path. Blocks are annotated with counter numbers and experiment letters: 3-5, A-B; 6-9, C; 10-11; 12-13; 14-15; 16; 17-21, I; 22-23, D, J; 24-25; 26-29, G; 30-34, E; 35-39, F; H.]

1. gpu_idle (Global)
2. gpu_busy (Global)
3. input_assembler_busy
4. input_assembler_waits_for_fb
5. vertex_attribute_count
6. geom_busy
7. geom_waits_for_shader
8. geom_vertex_in_count
9. geom_primitive_in_count
10. vertex_shader_busy
11. vertex_shader_instruction_rate
12. geometry_shader_busy
13. geometry_shader_instruction_rate
14. geom_vertex_out_count
15. geom_primitive_out_count
16. stream_out_busy
17. setup_primitive_culled_count
18. setup_primitive_count
19. setup_triangle_count
20. setup_point_count
21. setup_line_count
22. shaded_pixel_count
23. rasterizer_pixels_killed_zcull_count
24. pixel_shader_busy
25. pixel_shader_instruction_rate
26. shader_busy
27. shader_waits_for_texture
28. shader_waits_for_geom
29. shader_waits_for_rop
30. texture_busy
31. texture_waits_for_fb
32. texture_waits_for_shader
33. texture_sample_base_level_rate
34. texture_sample_average_level
35. rop_busy
36. rop_waits_for_fb
37. rop_waits_for_shader
38. rop_pixels_killed_earlyz_count
39. rop_pixels_killed_latez_count

A. 2D Bottleneck and SOL
B. IDX Bottleneck and SOL
C. GEOM Bottleneck and SOL
D. ZCULL Bottleneck and SOL
E. TEX Bottleneck and SOL
F. ROP Bottleneck and SOL
G. SHD Bottleneck and SOL
H. FB Bottleneck and SOL
I. Primitive Setup Bottleneck and SOL
J. Rasterization Bottleneck and SOL
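The busy counters listed on this slide are most useful once they are expressed as percentages. The sketch below shows one way to do that and makes no PerfSDK calls. It assumes each sample arrives as a (value, cycles) pair and that a busy counter's utilization is value divided by cycles; both the data layout and the sample numbers are hypothetical. Only the counter names come from the slide.

```python
def busy_percent(value, cycles):
    """Utilization of a unit, assuming value counts busy cycles out of `cycles`."""
    if cycles <= 0:
        return 0.0
    return 100.0 * value / cycles


# Hypothetical samples keyed by counter names from the slide: (value, cycles).
samples = {
    "gpu_busy": (950_000, 1_000_000),
    "shader_busy": (550_000, 1_000_000),
    "texture_busy": (450_000, 1_000_000),
    "rop_busy": (250_000, 1_000_000),
}

percents = {name: busy_percent(v, c) for name, (v, c) in samples.items()}

# gpu_busy is global, so leave it out when looking for the busiest unit.
unit_names = [name for name in percents if name != "gpu_busy"]
busiest = max(unit_names, key=percents.get)

for name, pct in percents.items():
    print(f"{name:14s} {pct:5.1f}%")
print("busiest unit counter:", busiest)
```

The lettered "Bottleneck and SOL" experiments report this kind of analysis directly, so a hand-rolled version like this is mainly useful for custom in-game overlays.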

Page 38: Performance Tools

Graphic Remedy’s gDEBugger

- OpenGL and OpenGL ES Debugger and Profiler
- NVIDIA PerfKit and GLExpert integrated
- Shorten development time
- Improve application quality
- Optimize performance
- Texture/buffer viewer
- Windows XP & Vista, Linux too!
- Discounted academic licenses available

http://www.gremedy.com

Page 39: Performance Tools

Graphic Remedy’s gDEBugger

Page 40: Performance Tools

ShaderPerf 2.0

- Determine shader performance
- Compare with shader cycle budgets
- Test optimization opportunities
- Automated regression analysis
- Integrated in FX Composer
- Interactive GUI
- Artists/TDs code expensive shaders
- Achieve optimum performance
- GeForce 8 series performance data

Beta 2.0 available now!

Page 41: Performance Tools

FX Composer
Shader Authoring Made Easy!

- DirectX 10 backend
- Shader Debugger
- GeForce 8 Series Shader Performance
- Full-featured code editor
- Shader creation wizard with templates
- Integration with online Shader Library
- Materials panel to organize materials

Page 42: Performance Tools

FX Composer
HLSL10 Shader Debugging!

Page 43: Performance Tools

Questions?

Online: downloads, videos, etc.

http://developer.nvidia.com/PerfKit
http://developer.nvidia.com/PerfHUD
http://developer.nvidia.com/ShaderPerf
http://developer.nvidia.com/FXComposer

Feedback and Support: http://developer.nvidia.com/forums