Sebastien Domine, May 2017
S7519: DEVELOPER TOOLS FOR AUTOMOTIVE, DRONES AND INTELLIGENT CAMERA APPLICATIONS
2
AGENDA
Some Context
Development Flows and Challenges
Hardware and Software Topologies
Soul Use Cases
Developer Tools Support
Conclusion and Q&A
3
INTELLIGENT SYSTEMS
AI at the Edge
Industrial Inspection
Search and Rescue
Package Delivery
Factory Automation
Enterprise Collaboration
Public Safety
Personal Assist
Service Robotics
Portable Medical
Self Driving Car
Driver Assistance
4
CHARACTERISTICS
Smart Computer with Machine Learning capabilities – training and/or inference
Real-time constraints
Multiple sensors
Networked
Power limits
What is common to Automotive, Drones and IVA solutions
5
TYPICAL TASKS
Object Detection
Feature Detection
Localization
Path Planning
Real Time
6
EMBEDDED SOFTWARE DEVELOPMENT WORKFLOW
Software Development: Toolchain Setup, Cross-compilation, Porting
Debugging: CPU/GPU, Remote Debugging
Profiling: System/CPU/GPU/IO/…, Remote Profiling
Running: Ship it!
Tools shown: DriveInstall, JetPack, Nsight EE (Eclipse), Tegra/Linux Graphics Debugger, CUDA Visual Profiler, Tegra System Profiler, cuda-gdb, PerfWorks, nvprof, CUPTI, cuda-memcheck, Desktop Tools
7
GETTING STARTED…
Jump-starts development for embedded platforms
Installs the Linux ARM cross-compilation toolchain
Installs developer tools, CUDA, libraries,…
Flashes Drive PX and Jetson OS images
Provides reference documentation and samples
Compiles code samples and pushes them to the devkit
And runs one sample…
JetPack Installer For Jetson and DriveInstall For DRIVE
8
NVIDIA® NSIGHT™
Homogeneous application development for CPU+GPU compute platforms
CUDA-Aware Editor
CUDA Debugger
CPU+GPU
CUDA Profiler
9
NSIGHT ECLIPSE EDITION NEXT-GEN
• True plug-in to Eclipse
• CUDA-GDB upgraded to the GDB 7.12 source base
• Edit, build, debug and profile CUDA-C applications
• CUDA-aware source code editor: syntax highlighting, code completion and inline help
• Debugger: seamless and simultaneous debugging of both CPU and GPU code
• Profiler integration: launch the Visual Profiler as an external application with the CUDA application built in this IDE to easily identify performance bottlenecks
Shipping with CUDA 9.0
10
ECLIPSE INTEGRATION
• Requires Eclipse version 4.4 or 4.5
• Developed on the Eclipse CDT/DSF framework.
• Uses the Eclipse Remote System Explorer (RSE) plugins to connect to remote devices.
• Nsight EE plugins are bundled as an archive file (zip) and can be installed using the standard Eclipse plugin install dialog.
• The dependent plugins (CDT/RSE) are installed automatically.
• It can coexist with other Eclipse plugins in the user environment.
Plugins can be installed on any standard Eclipse installation
11
Visual Profiler
Trace CUDA activities
Profile CUDA kernels
Correlate performance instrumentation with source code
Expert-guided performance analysis
NVPROF
Collect performance events and metrics
GPU Library Advisor
Detect CUDA library optimization opportunities
NVDISASM, CUOBJDUMP
CUDA-MEMCHECK
Detect out-of-bounds memory accesses
Detect race conditions in memory accesses
Detect uninitialized variable accesses
Detect incorrect GPU thread synchronization
CUDA-GDB
Debug CUDA kernels with CLI
Debug CPU and GPU code
CPU and GPU core dump support
CUDA STANDALONE TOOLS
12
NVIDIA JETSON TX2
[Block diagram: Jetson TX2 Developer Kit. The TX2 shows 4× A57 and 2× Denver CPU cores, a Pascal iGPU, and Video Dec/Enc. Other labels: Memory, Storage, Wifi, USB, HDMI, GB/E, 2× CSI.]
13
JETSON SOFTWARE
Deep Learning: TensorRT, cuDNN
Computer Vision: VisionWorks, OpenCV
Graphics: Vulkan, OpenGL
Media: Multimedia API, libargus, Video API
CUDA, CUDA Accelerated libraries
Linux4Tegra (Ubuntu 16.04), ROS Support
14
IVA APPLICATION
Sample of Complex Application
[Pipeline diagram. Blocks: raw video, video conversion (×4), TensorRT Classifier, Tracking, bbox Analytics, TensorRT Attribute Detector (×3), OSD, Display.]
15
ENHANCED TOOLS EXPERIENCE
Application source code decoration and instrumentation
• Highlight execution phases, mark resource utilization
• Visualize in all NVIDIA Developer Tools
Features:
• Markers, nested ranges, and resource naming
• Color, payload, and text
NVIDIA Tools eXtension (NVTX)
// Nested ranges with push/pop
nvtxRangePushA("Compute Work");
  nvtxRangePushA("Sobel");
  …
  nvtxRangePop();
  nvtxRangePushA("CubeGen");
  …
  nvtxRangePop();
nvtxRangePop();
// Start/end range with category, color, and message attributes
nvtxRangeId_t rid_A = nvtx::RangeStart(nvtx::Attributes()
    .category(CATEGORY_CUDA_MEMORY)
    .color(COLOR_RED).message("A"));
cuMemAlloc(&d_A, mem_size_A);
…
cuMemFree(d_A);
nvtx::RangeEnd(rid_A);
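The push/pop pairs above have to be balanced by hand. One common way to make that automatic in C++ is a scoped guard. The sketch below is an assumption and not part of the talk: `ScopedRange`, `range_push`, `range_pop`, and the `USE_NVTX` build flag are hypothetical names. By default it uses stand-in stubs so it builds without the NVTX library; defining `USE_NVTX` switches to the real `nvtxRangePushA` / `nvtxRangePop` calls from `nvToolsExt.h`.

```cpp
#include <string>
#include <vector>

#ifdef USE_NVTX  // hypothetical build flag: forward to the real NVTX C API
#include <nvToolsExt.h>
inline void range_push(const char* name) { nvtxRangePushA(name); }
inline void range_pop() { nvtxRangePop(); }
#else
// Stand-in stubs (assumption): track the currently open ranges so the
// sketch runs on a machine without NVTX installed.
inline std::vector<std::string>& open_ranges() {
    static std::vector<std::string> stack;
    return stack;
}
inline void range_push(const char* name) { open_ranges().push_back(name); }
inline void range_pop() {
    if (!open_ranges().empty()) open_ranges().pop_back();
}
#endif

// Hypothetical RAII guard: pushes a range on construction and pops it
// when the enclosing scope exits, including on early return or exception.
class ScopedRange {
public:
    explicit ScopedRange(const char* name) { range_push(name); }
    ~ScopedRange() { range_pop(); }
    ScopedRange(const ScopedRange&) = delete;
    ScopedRange& operator=(const ScopedRange&) = delete;
};

// Mirrors the nesting on the slide: "Compute Work" contains "Sobel",
// then "CubeGen".
inline void compute_work() {
    ScopedRange work("Compute Work");
    {
        ScopedRange sobel("Sobel");
        // ... Sobel filter work would go here ...
    }
    {
        ScopedRange cube("CubeGen");
        // ... cube generation work would go here ...
    }
}
```

With a guard like this, a range cannot be left open by a forgotten `nvtxRangePop()`, which keeps nested ranges well formed in the timeline.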
16
TEGRA SYSTEM PROFILER
Visualize multi-core CPU and GPU activities w/ timeline view
Visualize thread state
Thread core migration
Time range filtering
Trace CUDA & OpenGL/ES API calls
Trace GPU compute & graphics workloads
NVIDIA Tools eXtension (NVTX) support
Multi-core CPU profiler and System Trace
17
DEMO IVA APP
- CPU utilization
- Thread/Core affinity and migration
- NVTX
- CUDA API and workload trace w/ correlation
- OpenGL API and workload trace w/ correlation
- GPU process trace
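For the thread/core affinity item in this list, one way an application can stop the scheduler from migrating a thread is to pin it to a single core. The sketch below is an illustration, not something shown in the demo. It assumes Linux with glibc, where `pthread_getaffinity_np` and `pthread_setaffinity_np` are available; `first_allowed_cpu` and `pin_current_thread` are hypothetical helper names.

```cpp
#include <pthread.h>
#include <sched.h>

// Returns the lowest-numbered CPU the calling thread may currently run on,
// or -1 if the affinity mask cannot be read.
inline int first_allowed_cpu() {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (pthread_getaffinity_np(pthread_self(), sizeof(set), &set) != 0) {
        return -1;
    }
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &set)) return cpu;
    }
    return -1;
}

// Restricts the calling thread to a single CPU so it can no longer be
// migrated to another core. Returns true on success.
inline bool pin_current_thread(int cpu) {
    if (cpu < 0 || cpu >= CPU_SETSIZE) return false;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set) == 0;
}
```

A pinned thread would be expected to stay on one core row in a per-core timeline, which can make migration effects easier to compare against an unpinned run.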
18
CPU UTILIZATION
CPU Core Utilization
Thread Utilization
Core Occupancy
Thread State
19
BLOCKED STATE BACKTRACE
Diagnose issues with blocking calls, sched_yield, sleep, etc.
Including poor GPU API usage!
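As an illustration of the kind of pattern a blocked-state view can help distinguish, the sketch below contrasts a yield-based polling wait with a condition-variable wait. It uses only standard C++, with `std::this_thread::yield` standing in for `sched_yield`. The names `Signal`, `wait_polling`, `wait_blocking`, and `notify` are hypothetical and do not come from the talk.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical one-shot signal shared between a producer and its waiters.
struct Signal {
    std::mutex m;
    std::condition_variable cv;
    bool ready = false;                    // guarded by m
    std::atomic<bool> ready_flag{false};   // lock-free copy for the poller
};

// Polling wait: the thread stays runnable and repeatedly gives up its
// time slice. Returns how many times it yielded.
inline long wait_polling(Signal& s) {
    long yields = 0;
    while (!s.ready_flag.load(std::memory_order_acquire)) {
        std::this_thread::yield();
        ++yields;
    }
    return yields;
}

// Blocking wait: the thread sleeps until notified, so it consumes no CPU
// time while waiting.
inline void wait_blocking(Signal& s) {
    std::unique_lock<std::mutex> lock(s.m);
    s.cv.wait(lock, [&s] { return s.ready; });
}

// Publishes the signal to both kinds of waiter.
inline void notify(Signal& s) {
    {
        std::lock_guard<std::mutex> lock(s.m);
        s.ready = true;
    }
    s.ready_flag.store(true, std::memory_order_release);
    s.cv.notify_all();
}
```

In a thread-state timeline, the polling waiter would typically appear as running or runnable for the whole wait, while the blocking waiter would appear as blocked until `notify` is called.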
20
NVTX
21
SYSTEM TRACE
CUDA & OPENGL & NVTX Trace timeline
Graphics API calls
CPU CUDA API invocations
GPU CUDA events
NVTX API
22
CALL-STACK SAMPLING
Hot functions filtered by timeline range
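To make "filtered by timeline range" concrete, here is a minimal sketch of the idea. It is not the profiler's actual implementation: given timestamped call-stack samples, it counts the top-of-stack function only for samples that fall inside the selected range. `Sample` and `hot_functions` are hypothetical names.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sample record: when the sample was taken and which function
// was on top of the call stack at that moment.
struct Sample {
    std::uint64_t timestamp_ns;
    std::string function;
};

using FunctionCount = std::pair<std::string, int>;

// Counts samples per function within [begin_ns, end_ns] (inclusive) and
// returns the functions sorted from most to least sampled.
inline std::vector<FunctionCount> hot_functions(
        const std::vector<Sample>& samples,
        std::uint64_t begin_ns, std::uint64_t end_ns) {
    std::map<std::string, int> counts;
    for (const Sample& s : samples) {
        if (s.timestamp_ns >= begin_ns && s.timestamp_ns <= end_ns) {
            ++counts[s.function];
        }
    }
    std::vector<FunctionCount> result(counts.begin(), counts.end());
    // stable_sort keeps the alphabetical order from std::map among ties.
    std::stable_sort(result.begin(), result.end(),
                     [](const FunctionCount& a, const FunctionCount& b) {
                         return a.second > b.second;
                     });
    return result;
}
```

Narrowing the range to a single NVTX range or frame would then show which functions dominate that phase rather than the whole capture.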
23
GPU CONTEXT SWITCH
24
DRIVE AUTOCRUISE
[Block diagram: one TX2 SoC (4× A57 and 2× Denver CPU cores, Pascal iGPU, Video Dec/Enc). Other labels: Memory, Storage, Wifi, USB, GB/E, 2× CSI, CAN.]
25
NVIDIA AUTOCHAUFFEUR
[Block diagram: two TX2 SoCs (each with 4× A57 and 2× Denver CPU cores, Pascal iGPU, Video Dec/Enc), two Pascal dGPUs, and Aurix. Other labels: USB, GB/E, 2× CSI, CAN.]
26
DRIVE SOFTWARE
Deep Learning: TensorRT, cuDNN
Computer Vision: VisionWorks
Graphics, Media and Sensor: DriveWorks, OpenGL-ES, OpenGL / Vulkan, NVMEDIA
CUDA, CUDA Accelerated libraries
DRIVE Linux (Ubuntu 16.04), Guest OSes
DRIVE Hypervisor
27
NVIDIA HYPERVISOR ARCHITECTURE
[Architecture diagram. Layers: Tegra™ Hardware (ARM, GPU & SoC Peripherals); DRIVE Hypervisor (Hypervisor). Partitions: Resource Manager Server, I/O Server, Partition Monitor, Guest OS 0, Guest OS 1, Guest OS 2, Early Boot Partition.]
28
MULTI-OS SYSTEM ARCHITECTURE
QNX RTOS for cluster & HUD
Linux with Genivi for IVI
Linux with Co-pilot
Android for application sandboxing
Foundation type-1 hypervisor & services
Foundation
- Secure boot loader
- Trusted Execution Environment
- Secure partition loader
- Monitor partition
[Diagram labels: Sandbox, Cluster, Co-Pilot, Foundation – DRIVE Hypervisor]
29
DRIVER ASSISTANCE
“Co-Pilot / KITT”
[Pipeline diagram. Blocks: raw video and video conversion (×2); TensorRT networks for Head pose, FaceID, Gaze, Eye Openness, and Lip reading; GRID Objects; CAN; GPS; Nav; Speech Engine; Risk Assessment Module; UI.]
30
AUTONOMOUS DRIVING
“RoadRunner”
[Pipeline diagram. Blocks: raw video and video conversion (×2); TensorRT Lane Detection; TensorRT Object Detection; point cloud; Sensor Data Filtering; Tracking; Sensor Fusion; Localization; HD Mapping; Path Planning; Prediction Engine; Vehicle State I/O; Driver Assistance; Car Control System; Actuators.]
31
DEMO
- Multi-process
- Multi-OS
- Multi-node
- Discrete Pascal GPU and integrated Pascal GPU
- Hypervisor event trace
DrivePX2 - RoadRunner and Co-Pilot / IVI
32
MULTI-PROCESS TRACE
All processes running during the capture
Low-impact threads and processes are filtered out
Kernel=Red
Requires Root
33
MULTI-OS
LINUX+LINUX on 1 Tegra SoC
1 OS with 4 CPU Cores
1 OS with 2 CPU Cores
34
MULTI-NODE
1 OS per SoC
35
MULTI-GPU
iGPU and dGPU
1 Process using 2 GPUs
GPU Process Trace
dGPU and iGPU
36
TEGRA SYSTEM PROFILER NEXT
Hypervisor Event Trace
37
TEGRA GRAPHICS DEBUGGER
Next-gen graphics development tools
Supports OpenGL ES 2.0/3.0/3.1/3.2 + Android Extension Pack, OpenGL 4.x
Monitor key software and hardware performance metrics
Debug draw calls, related states and resources
Live capture of a single rendering frame
Automatic GPU bottleneck analysis
38
TEGRA GRAPHICS DEBUGGER
Performance Monitor
Range Profiler
Automated bottleneck analysis
Shader performance analysis
Offline perf simulation
Dynamic Shader editor
Advanced & Targeted GPU Profiling
Select section of interest based on scene ranges, render targets used, etc.
Overview of selected range, including time spent, call count, etc.
Show efficiency of pipeline units’ usage
Break down memory subsystem utilization
39
FUTURE DEVELOPER TOOLS
Additional improvement and unification for the out-of-box experience
Ubuntu 16.04 as a host OS
Better cross-compilation support
Hypervisor developer tools support
More HW units to be traced
More consistency of the developer tools offering across Desktop and Tegra
Vulkan support
40
CONCLUSION
Complete Developer Tools offering for Application Development on Heterogeneous Platforms
Extensive coverage for devkit topologies, HW units and SW stack in system trace
Developer Tools support for soul use cases for each platform
41
Q&A