dlprof plugin for tensorboard · this panel displays common issues detected in the profiled network...

40
DU-09833-002 _v20.06 | June 2020 DLProf Plugin for TensorBoard User Guide

Upload: others

Post on 12-Jul-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: DLProf Plugin for TensorBoard · This panel displays common issues detected in the profiled network and provides potential solutions and suggestions to address the issues. The panel

DU-09833-002 _v20.06   |   June  2020

DLProf Plugin for TensorBoard

User Guide

Page 2: DLProf Plugin for TensorBoard · This panel displays common issues detected in the profiled network and provides potential solutions and suggestions to address the issues. The panel

DLProf Plugin for TensorBoard DU-09833-002 _v20.06   |   ii

Table of Contents

Chapter 1. DLProf TensorBoard Plugin...............................................................................11.1. Overview.....................................................................................................................................1

1.2. What's New in 20.06................................................................................................................. 1

1.3. Features.....................................................................................................................................1

Chapter 2. Quick Start..........................................................................................................32.1. Prerequisites............................................................................................................................. 3

2.2. Using the NGC Docker Container............................................................................................3

2.3. Generating TensorBoard Event Files.......................................................................................3

2.4. Starting TensorBoard............................................................................................................... 4

Chapter 3. DLProf Plugin..................................................................................................... 53.1. Overview.....................................................................................................................................5

3.2. Pane Overview........................................................................................................................... 5

3.3. Navigation Pane........................................................................................................................ 6

3.4. Details Pane.............................................................................................................................. 8

3.4.1. Op Details Panel.................................................................................................................8

3.4.2. Kernel Details Panel.......................................................................................................... 9

Chapter 4. Content Pane.................................................................................................... 114.1. Dashboard................................................................................................................................11

4.1.1. GPU Idle Panel................................................................................................................. 11

4.1.2. Op Summary Panel.......................................................................................................... 12

4.1.3. Kernel Summary Panel....................................................................................................13

4.1.4. Tensor Core Kernel Efficiency Panel.............................................................................. 14

4.1.5. Performance Summary Panel......................................................................................... 14

4.1.6. Top 10 GPU Ops Panel.....................................................................................................16

4.1.7. System Configuration Panel............................................................................................ 17

4.1.8. Recommendations Panel................................................................................................. 17

4.1.9. Guidance Panel.................................................................................................................18

Chapter 5. Op Type Summary............................................................................................ 195.1. Ops and Kernels......................................................................................................................21

5.2. Ops Data Table........................................................................................................................21

5.3. Kernel Summaries Data Table...............................................................................................22

Chapter 6. Kernels by Iteration View................................................................................. 236.1. Iteration Summary Data Table...............................................................................................24

6.2. Kernels in Selected Iteration................................................................................................. 25

Page 3: DLProf Plugin for TensorBoard · This panel displays common issues detected in the profiled network and provides potential solutions and suggestions to address the issues. The panel

DLProf Plugin for TensorBoard DU-09833-002 _v20.06   |   iii

Chapter 7. Kernels by Op View.......................................................................................... 267.1.  Iteration Summary Table........................................................................................................27

7.2. Ops in Selected Iteration Table..............................................................................................28

7.3. Kernels Selected Iteration / Op Combination Table............................................................. 29

7.4. Using Data Tables...................................................................................................................29

Chapter 8. Graphs Plugin...................................................................................................318.1. Network View Panel................................................................................................................31

8.2. Guidance Panel....................................................................................................................... 32

8.3. Tensor Core Help Panel......................................................................................................... 32

8.4. Navigation Panel..................................................................................................................... 32

Chapter 9. XLA View........................................................................................................... 34

Page 4: DLProf Plugin for TensorBoard · This panel displays common issues detected in the profiled network and provides potential solutions and suggestions to address the issues. The panel

DLProf Plugin for TensorBoard DU-09833-002 _v20.06   |   iv

Page 5: DLProf Plugin for TensorBoard · This panel displays common issues detected in the profiled network and provides potential solutions and suggestions to address the issues. The panel

DLProf Plugin for TensorBoard DU-09833-002 _v20.06   |   1

Chapter 1. DLProf TensorBoard Plugin

1.1.  OverviewThe DLProf Plugin for TensorBoard makes it easy to visualize the performance of your modelsby showing Top 10 operations that took the most time, eligibility of Tensor Core operations andTensor Core usage, as well as interactive iteration reports.

1.2.  What's New in 20.06‣ Compatible with event files generated by DLProf v0.12.0 / r20.06.

‣ Improved UI/UX experience:

‣ Navigate to reports directly from relevant summary page widgets

‣ Improved navigation

‣ Improved Expert Systems feedback

‣ A link is provided to the appropriate report and highlights issues that triggered thewarning.

‣ Added more detailed iteration information to summary table.

‣ Added support for event files generated from PyTorch profiles

1.3.  FeaturesThis release includes these commands and features:

‣ Panelized Dashboard Summary View: A summary view comprising several panels thatprovide a quick overview of the performance results.

‣ Top-level Key Metrics: The Summary view displays several key metrics that are used toquickly gauge the quality of the performance, including Average Iteration Time and TensorCore Utilization.

Page 6: DLProf Plugin for TensorBoard · This panel displays common issues detected in the profiled network and provides potential solutions and suggestions to address the issues. The panel

DLProf TensorBoard Plugin

DLProf Plugin for TensorBoard DU-09833-002 _v20.06   |   2

‣ Top 10 GPU Ops Panel: A table in the Model Summary view lists the top 10 Ops with themost time spent on the GPU.

‣ Expert Systems Panel: This panel displays any issues detected by the DLProf ExpertSystems, along with suggestions on how to address the issues and improve the modelsperformance.

‣ Interactive Tables: All tables in detailed views are completely interactive, allowing the useto sort, filter, and paginate the display.

‣ Interoperable Tables: Several views have the ability to drill down for more information.Selecting a row in one table will populate the next table with performance informationpertaining to the selection.

‣ Modified Graph Plugin: Graph plugin will color code nodes based on Tensor Core eligibilityand usage.

‣ XLA Visualization: The graph view displays the original ops within the compiled XLA ops inthe Graph Plugin when XLA is enabled in the run.

Page 7: DLProf Plugin for TensorBoard · This panel displays common issues detected in the profiled network and provides potential solutions and suggestions to address the issues. The panel

DLProf Plugin for TensorBoard DU-09833-002 _v20.06   |   3

Chapter 2. Quick Start

2.1.  PrerequisitesThese steps are required to use pre-built NGC containers:

‣ Ensure you have access and are logged into NGC. For step-by-step instructions, see theNGC Getting Started Guide.

‣ Install Docker and nvidia-docker. For DGX users, see Preparing to use NVIDIA Containers.For users other than DGX, see nvidia-docker installation documentation.

2.2.  Using the NGC Docker ContainerMake sure you log into NGC as described in Prerequisites before attempting the steps in thissection. Use docker pull to get the TensorFlow container from NGC:$ docker pull nvcr.io/nvidia/tensorflow:<xx.yy>-tf1-py3

Where <xx.yy> is the version of the TensorFlow container that you want to pull.

Assuming the training data for the model is available in /full/path/to/training/data, you canlaunch the container with the following command:$ docker run --rm --gpus=1 --shm-size=1g --ulimit memlock=-1 \--ulimit stack=67108864 -it -p6006:6006 -v/full/path/to/training/data:/data \nvcr.io/nvidia/tensorflow:<xx.yy>-tf1-py3

2.3.  Generating TensorBoard Event FilesEvent files are created directly from the Deep Learning Profiler. Refer to the Deep LearningProfiler User Guide for information on how to generate the appropriate TensorBoard EventFiles.

Page 8: DLProf Plugin for TensorBoard · This panel displays common issues detected in the profiled network and provides potential solutions and suggestions to address the issues. The panel

Quick Start

DLProf Plugin for TensorBoard DU-09833-002 _v20.06   |   4

2.4.  Starting TensorBoardTensorBoard and the GPU Plugin are installed in the TensorFlow 1.x container on the NVIDIAGPU Cloud (NGC). The container must be run with the -p6006:6006 option to open port 6006 forthe TensorBoard server. (Note, use any port number such as 6007, 6008, etc.)

TensorBoard is launched directly from the container:tensorboard --logdir <event_files>

Where <event_files> is the path to the event files directory. Once running, TensorBoard canbe viewed in a browser with the URL:http://<machine IP Address>:6006

Page 9: DLProf Plugin for TensorBoard · This panel displays common issues detected in the profiled network and provides potential solutions and suggestions to address the issues. The panel

DLProf Plugin for TensorBoard DU-09833-002 _v20.06   |   5

Chapter 3. DLProf Plugin

This section describes each of the available views in the DLProf Plugin. For more details onthe TensorBoard web interface and a description of the TensorBoard features, go to: https://github.com/tensorflow/tensorboard/blob/master/docs/r1/graphs.md.

3.1.  OverviewThe following information is common to all views within the DLProf Plugin.

Terms and Definitions

Term Definition

Group Node A collection of Ops and other group nodes.This is also called a subgraph. A GroupNode’s subgraph can be visualized by double-clicking the group node.

Compatible Node An Op Node whose operational type is oneof many operational types that are capableof running on a Tensor Core. For example, aconvolution node can have a kernel that runson a Tensor Core, but a Const node or NoOpnode cannot.

Op Node A node in the graph where an operation isperformed on the incoming tensor.

Model, Graph, Network <synonyms>

Tensor Data that flows between the operations.

3.2.  Pane OverviewThe DLProf plugin user interface is divided into three panes:

Page 10: DLProf Plugin for TensorBoard · This panel displays common issues detected in the profiled network and provides potential solutions and suggestions to address the issues. The panel

DLProf Plugin

DLProf Plugin for TensorBoard DU-09833-002 _v20.06   |   6

Pane Definintion

Navigation Pane This pane is where you can navigate to differentviews and select different NVTX ranges.Additionally, you can navigate to the onlinedocumentation and launch your email app tosend us comments. Click the ‘X’ to remove thenavigator Pane.

Content Pane Otherwise known as the VIEWS pane, this iswhere you will see all of the different pieces of theprofiled neural network.

Details Pane This pane is to display additional details about theinformation selected in the content pane. Let’ssay you’re looking at the Kernel Summary panelin the content pane, and you want to see moreinformation about the kernels.

Click on the

button (tooltip = ‘Show Kernel Details’). Thedetailed kernel information will appear in theDetails Pane. Click the ‘X’ button to remove thedetails pane.

3.3.  Navigation Pane

Page 11: DLProf Plugin for TensorBoard · This panel displays common issues detected in the profiled network and provides potential solutions and suggestions to address the issues. The panel

DLProf Plugin

DLProf Plugin for TensorBoard DU-09833-002 _v20.06   |   7

UI Controls

Control Definition

X Closes the Navigator pane and widens all theother panels.

Select View Clicking on the green "Select View" buttonprovides a drop-down list of available views.Clicking on a name in the drop down list loadsthat view in the main display panel. Availableviews are:

‣ Dashboard

‣ Op Type Summary

‣ Ops and Kernels

‣ Kernels by Iteration

‣ Kernels by Op

Select Domain (optional) This optional drop down list appears when anetwork has been profiled using the NVIDIA ToolsExtension (NVTX) plugin. See github for moredetails: https://github.com/NVIDIA/nvtx-plugins

Each domain is profiled independently, so whena different domain is selected, all values in theentire DLProf plugin will change. This pluginallows network programmers to isolate andprofile certain areas of a network.

Online Documentation This hyperlink navigates users to the onlineversion of this document.

Page 12: DLProf Plugin for TensorBoard · This panel displays common issues detected in the profiled network and provides potential solutions and suggestions to address the issues. The panel

DLProf Plugin

DLProf Plugin for TensorBoard DU-09833-002 _v20.06   |   8

Email Us Let us know! If you have a comment, question,or suggestion, click this link. It will launch yourdefault email software with the TO addressalready filled in. Just fill in the Subject line, typeyour email message, and click send.

3.4.  Details PaneThe Details pane holds panels that show more details about a particular area of the system.

3.4.1.  Op Details PanelThe Op Details panel is displayed at the top-right of the browser underneath the TensorBoardTitle Bar.

Fields

Field Definition

All Ops Aggregates the total GPU time and count for all ops inthe network.

Ops Using TC Aggregates the total GPU time and count for ops in thenetwork that use Tensor Cores.

Page 13: DLProf Plugin for TensorBoard · This panel displays common issues detected in the profiled network and provides potential solutions and suggestions to address the issues. The panel

DLProf Plugin

DLProf Plugin for TensorBoard DU-09833-002 _v20.06   |   9

Ops Eligible For TC, But Not Using Aggregates the total GPU time and count for ops in thenetwork that are not using Tensor Cores but are eligibleto do so.

All Other Ops Aggregates the total GPU time and counts for all othertypes of ops in the network.

UI Controls

Control Definition

X Closes the Details Pane

3.4.2.  Kernel Details PanelThe Kernel Details panel is displayed at the top-right of the browser underneath theTensorBoard Title Bar. This panel provides key metrics about the kernels in the networkaggregated over the specific iteration range.

Fields

Field Definition

All Kernels Aggregates the total GPU time and count for allkernels in the network.

Page 14: DLProf Plugin for TensorBoard · This panel displays common issues detected in the profiled network and provides potential solutions and suggestions to address the issues. The panel

DLProf Plugin

DLProf Plugin for TensorBoard DU-09833-002 _v20.06   |   10

Kernels Using TC Aggregates the total GPU time and count for allkernels using Tensor Cores.

Memory Kernels Aggregates the total GPU time and count for allmemory kernels.

All Other Kernels Aggregates the total GPU time and count for allremaining kernel types.

UI Controls

Control Definition

X Closes the Details Pane.

Page 15: DLProf Plugin for TensorBoard · This panel displays common issues detected in the profiled network and provides potential solutions and suggestions to address the issues. The panel

DLProf Plugin for TensorBoard DU-09833-002 _v20.06   |   11

Chapter 4. Content Pane

4.1.  DashboardThe Dashboard view provides a high level summary of the performance results in a panelizedview. This view serves as a starting point in analyzing the results and provides several keymetrics.

4.1.1.  GPU Idle PanelThe GPU Idle panel visually indicates the percentage of GPU idle time during execution ofaggregated iterations.

Page 16: DLProf Plugin for TensorBoard · This panel displays common issues detected in the profiled network and provides potential solutions and suggestions to address the issues. The panel

Content Pane

DLProf Plugin for TensorBoard DU-09833-002 _v20.06   |   12

4.1.2.  Op Summary PanelThis panel provides key metrics about the ops in the network aggregated over the specificiteration range. The charts provide a graphical representation of the data.

‣ Hovering over a slice in the chart will show the aggregated GPU time.

‣ Clicking a legend item will toggle its visualization in the chart.

Fields

Legend Label Definition

Page 17: DLProf Plugin for TensorBoard · This panel displays common issues detected in the profiled network and provides potential solutions and suggestions to address the issues. The panel

Content Pane

DLProf Plugin for TensorBoard DU-09833-002 _v20.06   |   13

Ops Using TC Aggregates the total GPU time for ops in thenetwork that use Tensor Cores.

Ops Eligible For TC Aggregates the total GPU time for ops in thenetwork that are not using Tensor Cores but areeligible to do so.

All Other Ops Aggregates the total GPU time for all other typesof ops.

UI Controls

Control Definition

Show Op Details panel in Details Pane.

More Show drop-down menu of more views.

4.1.3.  Kernel Summary PanelThe panel provides key metrics about the kernels in the network aggregated over the specificiteration range.

‣ Hovering over a slice in the chart will show the aggregated GPU time.

‣ Clicking a legend item will toggle its visualization in the chart.

Fields

Legend Label Definition

Kernels Using TC Aggregates the total GPU time for all kernels.

Page 18: DLProf Plugin for TensorBoard · This panel displays common issues detected in the profiled network and provides potential solutions and suggestions to address the issues. The panel

Content Pane

DLProf Plugin for TensorBoard DU-09833-002 _v20.06   |   14

Memory Kernels Aggregates the total GPU time for all kernelsusing Tensor Cores.

All Other Kernels Aggregates the total GPU time for all remainingkernel types.

UI Controls

Control Definition

Show Kernel Details panel in Details Pane.

More... Show drop-down menu of more views.

4.1.4.  Tensor Core Kernel Efficiency Panel

‣ Hovering over a slice in the chart will show the percentage.

‣ Clicking a legend item will toggle its visualization in the chart.

4.1.5.  Performance Summary PanelThe Performance Summary panel provides top level key metrics about the performance dataaggregated over the specific iteration range.

Page 19: DLProf Plugin for TensorBoard · This panel displays common issues detected in the profiled network and provides potential solutions and suggestions to address the issues. The panel

Content Pane

DLProf Plugin for TensorBoard DU-09833-002 _v20.06   |   15

Field Definition

Wall Clock Time This is the total run time for the aggregationrange, and is defined as the time between thestart time of the first op in the starting iterationon the CPU and the end time last op in the finaliteration on either the CPU or GPU, whichevertimestamp is greatest.

Tensor Core Kernel Efficiency % This high level metric represents the utilizationof Tensor Core enabled kernels. TensorCore operations can provide a performanceimprovement and should be used when possible.This metric is calculated by:

[Total GPU Time for Tensor Core kernels] / [TotalGPU Time for Tensor Core Eligible Ops]

A 100% Tensor Core Utilization means thatall eligible Ops are running only Tensor Coreenabled kernels on the GPU. A 50% Tensor CoreUtilization can mean anything from all eligible Opsare running Tensor Core kernels only half of thetime to only half of all eligible Ops are runningTensor Core kernels only. This metric should beused with the Op Summary Panel to determinethe quality of Tensor Core usage.

Higher is better.

Page 20: DLProf Plugin for TensorBoard · This panel displays common issues detected in the profiled network and provides potential solutions and suggestions to address the issues. The panel

Content Pane

DLProf Plugin for TensorBoard DU-09833-002 _v20.06   |   16

GPU Idle % Percent of the time that the GPU is idle. Note, thisincludes the time that the GPU is waiting on thedata pipeline.

Lower is better.

Total Iterations The total number of iterations found in thenetwork.

Profiled Iterations The total number of iterations used to aggregatethe performance results. This number iscalculated using ‘Start Iteration’ and ‘StopIteration’.

Start Iteration The starting iteration number used to generateperformance results.

Stop Iteration The ending iteration number used to generateperformance results.

Average Iteration Time The average iteration time is the total Wall Timedivided by the number of iterations.

4.1.6.  Top 10 GPU Ops PanelTop 10 GPU Ops table shows the top 10 operations with the largest execution times on theGPU. This table comes pre-sorted with the order of each row in descending GPU Time. Thetable is not sortable or searchable.

Column Definition

GPU Time Cumulative time executing all GPU kernelslaunched for the op.

Op Name The name of the op.

Op Type The type of the op.

Calls The number of times the op was called.

TC Eligible A true/false field indicating whether or not the opis eligible to use Tensor Core kernels.

Page 21: DLProf Plugin for TensorBoard · This panel displays common issues detected in the profiled network and provides potential solutions and suggestions to address the issues. The panel

Content Pane

DLProf Plugin for TensorBoard DU-09833-002 _v20.06   |   17

Using TC A true/false field indicating whether or not one ofthe kernels launched in this op is using TensorCores.

4.1.7.  System Configuration PanelThis panel displays the various environmental attributes, such as the number of GPUs, thenames of all the GPUs, and the versions of CUDA and DLProf.

Field Definition

GPU Count The number of GPU devices found on thecomputer during training.

GPU Name(s) A list of the GPU devices found on the computerduring training

GPU Driver Version The version of the driver used for NVIDIA GraphicsGPU.

CUDA Version The version of the CUDA parallel computingplatform.

DLProf Version The version of the Deep Learning Profiler used togenerate the data visualized in the DLProf Pluginin TensorBoard.

DLProf Plugin Version The version of the DLProf Plugin.

TensorBoard Version The version of TensorBoard framework.

4.1.8.  Recommendations PanelThe Recommendations panel displays common issues detected in the profiled network andprovides potential solutions and suggestions to address the issues. The panel will only showissues that have been detected by DLProf. For a full list of potential issues that DLProf lookfor, please refer to the Expert Systems section in the Deep Learning Profiler User Guide.

Page 22: DLProf Plugin for TensorBoard · This panel displays common issues detected in the profiled network and provides potential solutions and suggestions to address the issues. The panel

Content Pane

DLProf Plugin for TensorBoard DU-09833-002 _v20.06   |   18

The double-green arrows will show additional information about the detected problem.

Column Definition

Problem The description of the scenario that DLProfdetected when profiling the network.

Recommendation A recommendation or actionable feedback,a tangible suggestion that the user can do toimprove the network.

4.1.9.  Guidance PanelThis panel provides static guidance to the user to help the user learn more about TensorCores, Mixed Precision training. The panel has hyperlinks for further reading.

Page 23: DLProf Plugin for TensorBoard · This panel displays common issues detected in the profiled network and provides potential solutions and suggestions to address the issues. The panel

DLProf Plugin for TensorBoard DU-09833-002 _v20.06   |   19

Chapter 5. Op Type Summary

This table aggregates metrics over all op types and enables users to see the performance ofall the ops in terms of its types, such as Convolutions, Matrix Multiplications, etc.

See this description for all the features available in all DataTables.

Op Type Data Table

Column name Description

Op Type The TensorFlow operation name.

No. Ops The number of ops that have the Op Typeabove.

No. Calls Number of instances that the operation wascalled / executed.

TC Eligible Indicates if the op can use Tensor Coresbased on operation name.

Page 24: DLProf Plugin for TensorBoard · This panel displays common issues detected in the profiled network and provides potential solutions and suggestions to address the issues. The panel

Op Type Summary

DLProf Plugin for TensorBoard DU-09833-002 _v20.06   |   20

Column name Description

Using TC Indicates if a Tensor Core enabled kernel wasused by any op having this op type.

Total CPU Time (ns) The total CPU time of all instances of this optype.

Avg. CPU Time (ns) The average CPU time of all instances of thisop type.

Min CPU Time (ns) The minimum CPU time found amongst allinstances of this op type.

Max CPU Time (ns) The maximum CPU time found amongst allinstances of this op type.

Total GPU Time (ns) The total GPU time of all instances of this optype.

Avg. GPU Time (ns) The average GPU time of all instances of thisop type.

Min GPU Time (ns) The minimum GPU time found amongst allinstances of this op type.

Max GPU Time (ns) The maximum GPU time found amongst allinstances of this op type.

Total CPU Overhead Time (ns) The total CPU overhead of all instances ofthis op type.

Avg. CPU Overhead Time (ns) The average CPU overhead of all instances ofthis op type.

Min CPU Overhead Time (ns) The minimum CPU overhead found amongstall instances of this op type.

Max CPU Overhead Time (ns) The maximum CPU overhead found amongstall instances of this op type.

Total GPU Idle Time (ns) The total GPU idle time of all instances of thisop type.

Avg. GPU Idle Time (ns) The average GPU idle time of all instances ofthis op type.

Min GPU Idle Time (ns) The minimum GPU idle time found amongstall instances of this op type.

Max GPU Idle Time (ns) The maximum GPU idle time found amongstall instances of this op type.

Page 25: DLProf Plugin for TensorBoard · This panel displays common issues detected in the profiled network and provides potential solutions and suggestions to address the issues. The panel

Op Type Summary

DLProf Plugin for TensorBoard DU-09833-002 _v20.06   |   21

5.1.  Ops and KernelsThis view enables users to view, search, sort all ops and their corresponding kernels in theentire network.

5.2.  Ops Data TableWhen a row is selected in the Ops table, a summary of each kernel for that op is displayed inthe bottom table.

See this description for all the features available in all DataTables.

Entry Description

GPU Time (μs) Cumulative time executing all GPU kernelslaunched for the op

Page 26: DLProf Plugin for TensorBoard · This panel displays common issues detected in the profiled network and provides potential solutions and suggestions to address the issues. The panel

Op Type Summary

DLProf Plugin for TensorBoard DU-09833-002 _v20.06   |   22

Entry Description

CPU Time (μs) Cumulative time executing all op instances onthe CPU

Op Name The name of the op.

Op Type The operation of the op.

Origin Where the op originated from. Examplesinclude GraphDef for the graphdef file, andOptimized for a Tensorflow optimized op.

Calls The number of times the op was called.

TC Eligible A true/false field indicating whether or notthe op is eligible to use Tensor Core kernels.

Using TC A true/false field indicating whether or notone of the kernels launched in this op is usingTensor Cores.

Kernel Calls The number of kernels launched in the op.

Data Type The data type of this op (eg, float16, int64,int32)

Input Shapes The input shapes of the op.

5.3.  Kernel Summaries Data TableSee this description for all the features available in all DataTables.

Kernels Summary table

Entry Description

Kernel name The full name of the kernel.

Using TC A true/false field indicating whether or notthe kernel is actually using a Tensor Core.

Calls The number of times this kernel waslaunched.

GPU Time (μs) The aggregate duration of each time thekernel was launched.

Avg (μs) The average duration of each time this kernelwas launched.

Min (μs) The minimum duration of all kernel launches.

Max (μs) The maximum duration of all kernellaunches.

Page 27: DLProf Plugin for TensorBoard · This panel displays common issues detected in the profiled network and provides potential solutions and suggestions to address the issues. The panel

DLProf Plugin for TensorBoard DU-09833-002 _v20.06   |   23

Chapter 6. Kernels by Iteration View

The Kernels by Iterations view shows operations and their execution time for every iteration. Ata glance, you can compare iterations with respect to time as well as Kernels executed.

Page 28: DLProf Plugin for TensorBoard · This panel displays common issues detected in the profiled network and provides potential solutions and suggestions to address the issues. The panel

Kernels by Iteration View

DLProf Plugin for TensorBoard DU-09833-002 _v20.06   |   24

6.1.  Iteration Summary Data TableTo see the kernels for a specific iteration, click a row in the top table. The ‘Kernels in SelectedIteration’ table will be filled with the kernels from the selection iteration.

See this description for all the features available in all DataTables.

Page 29: DLProf Plugin for TensorBoard · This panel displays common issues detected in the profiled network and provides potential solutions and suggestions to address the issues. The panel

Kernels by Iteration View

DLProf Plugin for TensorBoard DU-09833-002 _v20.06   |   25

Column Description

Iteration The iteration interval numbers.

Total Kernels The number of GPU kernels called during thisiteration.

TC Kernels The number of GPU Tensor Core kernelscalled during the iteration.

Total GPU Time (μs) Cumulative execution time for all kernels onthe GPU during the iteration.

TC GPU Time (μs) Cumulative execution time for all Tensor Corekernels on the GPU during the iteration.

%TC Duration Percentage of GPU time spent executingTensor Cores for the iteration.

6.2.  Kernels in Selected IterationSee this description for all the features available in all DataTables.

Column Description

Kernel Timestamp (ns) The timestamp for when the CUDA API callwas made for this kernel in the CPU thread.Useful to see the order in which the kernelswere called.

Op Name The name of the op that launched the kernel.

Uses TC A true/false field indicating whether thekernel uses Tensor Cores.

GPU Time (ns) The time spent executing the kernel on theGPU.

Kernel Name The name of the kernel.

Page 30: DLProf Plugin for TensorBoard · This panel displays common issues detected in the profiled network and provides potential solutions and suggestions to address the issues. The panel

DLProf Plugin for TensorBoard DU-09833-002 _v20.06   |   26

Chapter 7. Kernels by Op View

The Kernels by Op view is a variation of the Kernels by Iterations view. It has the capability tofilter the list of kernels by iterations and op.

See this description for all the features available in all DataTables.

Page 31: DLProf Plugin for TensorBoard · This panel displays common issues detected in the profiled network and provides potential solutions and suggestions to address the issues. The panel

Kernels by Op View

DLProf Plugin for TensorBoard DU-09833-002 _v20.06   |   27

7.1.  Iteration Summary TableSelecting an iteration in the Iteration Summary table will populate the Ops in SelectedIteration table with all the profile data for the ops from the selected iteration. Selecting an opin the Ops in Selected Iteration table will populate the Kernels in Selected Op table with the listof kernels and timing data executed for the selected Op and Iteration.

See this description for all the features available in all DataTables.

Page 32: DLProf Plugin for TensorBoard · This panel displays common issues detected in the profiled network and provides potential solutions and suggestions to address the issues. The panel

Kernels by Op View

DLProf Plugin for TensorBoard DU-09833-002 _v20.06   |   28

Column Description

Iteration The iteration interval numbers

Total Kernels The number of GPU kernels called during thisiteration.

TC Kernels The number of GPU Tensor Core kernelscalled during the iteration.

Total GPU Time (μs) Cumulative execution time for all kernels onthe GPU during the iteration.

TC GPU Time (μs) Cumulative execution time for all Tensor Corekernels on the GPU during the iteration.

% TC Duration Percentage of GPU time spent executingTensor Cores for the iteration.

7.2.  Ops in Selected Iteration TableOps in Selected Iteration

See this description for all the features available in all DataTables.

Column Description

Op Name The name of the op that launched the kernel.

Op Type The type of the op.

Op Start The time the op was launched. Used to sortops chronologically.

Total Kernels The number of GPU kernels called during thisiteration.

TC Kernels The number of GPU Tensor Core kernelscalled during the iteration.

Total GPU Time (ns) Cumulative execution time for all kernels onthe GPU during the op.

TC GPU Time (ns) Cumulative execution time for all Tensor Corekernels on the GPU during the op.

% TC Duration Percentage of GPU time spent executingTensor Cores for the op.

Data Type The data type of this op (eg, float16, int64,int32).

Input Shapes The input shapes of the op.

Page 33: DLProf Plugin for TensorBoard · This panel displays common issues detected in the profiled network and provides potential solutions and suggestions to address the issues. The panel

Kernels by Op View

DLProf Plugin for TensorBoard DU-09833-002 _v20.06   |   29

7.3.  Kernels Selected Iteration / OpCombination Table

Kernels in Selected Iteration / Op combination

See this description for all the features available in all DataTables.

Column Description

Kernel Timestamp (ns) The timestamp for when the CUDA API callwas made for this kernel in the CPU thread.Useful to see the order in which the kernelswere called.

Uses TC A true/false field indicating whether thekernel uses Tensor Cores.

GPU Time (ns) The time spent executing the kernel on theGPU.

Kernel Name The name of the kernel.

7.4.  Using Data TablesDataTables are used in many views in the DLProf Plugin. The features in DataTables enableusers to quickly find information. Below is a screenshot to see the location of the features,followed by a table describing the functionality of each feature.

Page 34: DLProf Plugin for TensorBoard · This panel displays common issues detected in the profiled network and provides potential solutions and suggestions to address the issues. The panel

Kernels by Op View

DLProf Plugin for TensorBoard DU-09833-002 _v20.06   |   30

Data Table Feature Definition

Showing label The label under the table (bottom left) will show areal-time count of rows in the table.

Search text box Filter results by text search. Typing in this fieldwill display only those rows that contain the text inthe box. Adding or removing text in the Search boxwill update the “Showing…” label.

Sort toggle button Clicking on a column header will sort thetable. When clicked, all rows are sorted eitherascending or descending. The initial sort on mostnumeral columns are descending.

Show Entries drop list This drop list allows the user to display 10, 25, 50,or 100 rows. Changing the setting will update the“Showing…” label.

Pagination buttons Previous, next, and page# navigation. Allowsusers to quickly page through large data sets.Uses ‘Show Entries’ setting and updates the“Showing…” label.

Page 35: DLProf Plugin for TensorBoard · This panel displays common issues detected in the profiled network and provides potential solutions and suggestions to address the issues. The panel

DLProf Plugin for TensorBoard DU-09833-002 _v20.06   |   31

Chapter 8. Graphs Plugin

The Graphs Plugin will only be available if a graphdef file is specified in DLProf and thetfevents.<xxx>.<yyy>.pb event file is created. This is a modified version of the TensorBoard'sdefault graph plugin that highlights TensorCore usage and eligibility within the graph. Thisplugin is only available in the NGC containers.

8.1.  Network View PanelThis view visualizes your neural network in the form of connected nodes. Graphs are often verylarge, so you can manipulate the graph visualization:

‣ Scroll to zoom in and out

‣ Drag to pan

‣ Double clicking toggles node expansion (a node can be a container for other nodes)

Page 36: DLProf Plugin for TensorBoard · This panel displays common issues detected in the profiled network and provides potential solutions and suggestions to address the issues. The panel

Graphs Plugin

DLProf Plugin for TensorBoard DU-09833-002 _v20.06   |   32

You can also see metadata by clicking on a node. This allows you to see inputs, outputs,shapes and other details.

8.2.  Guidance PanelWhen the graph is in view without a node selected, a Guidance Panel is displayed. The textinside the panel provides guidance on how to improve Tensor Core usage by giving a fewpractices.

8.3.  Tensor Core Help PanelWhen an op node is selected in the network that is eligible to run tensor cores but does not, ahelp panel appears. It provides an explanation as to why Tensor Cores may not be used on thisnode. The help panel also has helpful links to help the user with additional information andresources.

8.4.  Navigation PanelThe elements in this panel enable you to view Tensor Core eligibility, and search for nodes ona graph.

Search Nodes

When you enter the name of a node in the text box, TensorBoard displays a list of matchingnodes in a list box, located directly below the text box. Click a node to zoom-in to the node, andselect it in the graph.

‣ Regular expressions are supported.

‣ The list will disappear if you enter a node that does not exist in the graph.

‣ To remove the list, delete the text in the text box.

Tensor Core Compatibility radio button

When selected, all nodes in the graph will show compatibility with Tensor Cores. The nodeswill be color-coded to visually see each node’s compatibility. For details, see the Legend itembelow.

Legend

All nodes in the graph are color coded:

‣ Green - Node uses Tensor Cores

‣ Red - Node is compatible with Tensor Cores but is not running Tensor Cores

‣ Blue - Node is not compatible with Tensor Cores

Page 37: DLProf Plugin for TensorBoard · This panel displays common issues detected in the profiled network and provides potential solutions and suggestions to address the issues. The panel

Graphs Plugin

DLProf Plugin for TensorBoard DU-09833-002 _v20.06   |   33

An op node is colored green if any of its kernels use a Tensor Core. A group node is coloredgreen if any of the contained op nodes use a Tensor Core.

Fit to Screen

Click Fit to Screen to center the graph of the browser window.

Page 38: DLProf Plugin for TensorBoard · This panel displays common issues detected in the profiled network and provides potential solutions and suggestions to address the issues. The panel

DLProf Plugin for TensorBoard DU-09833-002 _v20.06   |   34

Chapter 9. XLA View

The Graphs Plugin can also display results from an XLA-enabled profiling. Below is the graphview of the ResNet50 v1.5 profiling with XLA optimizations enabled. The network shown here isthe same network, just without XLA optimizations enabled.

Page 39: DLProf Plugin for TensorBoard · This panel displays common issues detected in the profiled network and provides potential solutions and suggestions to address the issues. The panel

Notice

This document is provided for information purposes only and shall not be regarded as a warranty of a certain functionality, condition, or quality of a product. NVIDIACorporation (“NVIDIA”) makes no representations or warranties, expressed or implied, as to the accuracy or completeness of the information contained in thisdocument and assumes no responsibility for any errors contained herein. NVIDIA shall have no liability for the consequences or use of such information or for anyinfringement of patents or other rights of third parties that may result from its use. This document is not a commitment to develop, release, or deliver any Material(defined below), code, or functionality.

NVIDIA reserves the right to make corrections, modifications, enhancements, improvements, and any other changes to this document, at any time without notice.

Customer should obtain the latest relevant information before placing orders and should verify that such information is current and complete.

NVIDIA products are sold subject to the NVIDIA standard terms and conditions of sale supplied at the time of order acknowledgement, unless otherwise agreedin an individual sales agreement signed by authorized representatives of NVIDIA and customer (“Terms of Sale”). NVIDIA hereby expressly objects to applying anycustomer general terms and conditions with regards to the purchase of the NVIDIA product referenced in this document. No contractual obligations are formedeither directly or indirectly by this document.

NVIDIA products are not designed, authorized, or warranted to be suitable for use in medical, military, aircraft, space, or life support equipment, nor in applicationswhere failure or malfunction of the NVIDIA product can reasonably be expected to result in personal injury, death, or property or environmental damage. NVIDIAaccepts no liability for inclusion and/or use of NVIDIA products in such equipment or applications and therefore such inclusion and/or use is at customer’s own risk.

NVIDIA makes no representation or warranty that products based on this document will be suitable for any specified use. Testing of all parameters of each productis not necessarily performed by NVIDIA. It is customer’s sole responsibility to evaluate and determine the applicability of any information contained in this document,ensure the product is suitable and fit for the application planned by customer, and perform the necessary testing for the application in order to avoid a default ofthe application or the product. Weaknesses in customer’s product designs may affect the quality and reliability of the NVIDIA product and may result in additionalor different conditions and/or requirements beyond those contained in this document. NVIDIA accepts no liability related to any default, damage, costs, or problemwhich may be based on or attributable to: (i) the use of the NVIDIA product in any manner that is contrary to this document or (ii) customer product designs.

No license, either expressed or implied, is granted under any NVIDIA patent right, copyright, or other NVIDIA intellectual property right under this document.Information published by NVIDIA regarding third-party products or services does not constitute a license from NVIDIA to use such products or services or a warrantyor endorsement thereof. Use of such information may require a license from a third party under the patents or other intellectual property rights of the third party,or a license from NVIDIA under the patents or other intellectual property rights of NVIDIA.

Reproduction of information in this document is permissible only if approved in advance by NVIDIA in writing, reproduced without alteration and in full compliancewith all applicable export laws and regulations, and accompanied by all associated conditions, limitations, and notices.

THIS DOCUMENT AND ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS(TOGETHER AND SEPARATELY, “MATERIALS”) ARE BEING PROVIDED “AS IS.” NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OROTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, ANDFITNESS FOR A PARTICULAR PURPOSE. TO THE EXTENT NOT PROHIBITED BY LAW, IN NO EVENT WILL NVIDIA BE LIABLE FOR ANY DAMAGES, INCLUDINGWITHOUT LIMITATION ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, PUNITIVE, OR CONSEQUENTIAL DAMAGES, HOWEVER CAUSED AND REGARDLESS OFTHE THEORY OF LIABILITY, ARISING OUT OF ANY USE OF THIS DOCUMENT, EVEN IF NVIDIA HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.Notwithstanding any damages that customer might incur for any reason whatsoever, NVIDIA’s aggregate and cumulative liability towards customer for the productsdescribed herein shall be limited in accordance with the Terms of Sale for the product.

VESA DisplayPort

DisplayPort and DisplayPort Compliance Logo, DisplayPort Compliance Logo for Dual-mode Sources, and DisplayPort Compliance Logo for Active Cables aretrademarks owned by the Video Electronics Standards Association in the United States and other countries.

OpenCL

OpenCL is a trademark of Apple Inc. used under license to the Khronos Group Inc.

NVIDIA Corporation  |  2788 San Tomas Expressway, Santa Clara, CA 95051http://www.nvidia.com

Page 40: DLProf Plugin for TensorBoard · This panel displays common issues detected in the profiled network and provides potential solutions and suggestions to address the issues. The panel

Trademarks

NVIDIA, the NVIDIA logo, and cuBLAS, CUDA, cuDNN, DALI, DIGITS, DGX, DGX-1, DGX-2, DGX Station, DLProf, Jetson, Kepler, Maxwell, NCCL, Nsight Compute,Nsight Systems, NvCaffe, NVIDIA Ampere GPU Architecture, PerfWorks, Pascal, SDK Manager, Tegra, TensorRT, Triton Inference Server, Tesla, TF-TRT, and Voltaare trademarks and/or registered trademarks of NVIDIA Corporation in the U.S. and other countries. Other company and product names may be trademarks of therespective companies with which they are associated.

Copyright© 2020-2020 NVIDIA Corporation. All rights reserved.

NVIDIA Corporation  |  2788 San Tomas Expressway, Santa Clara, CA 95051http://www.nvidia.com