OUR TEAM
Enable GPUs in the container ecosystem:
• Monitoring
• Orchestration
• Images
• Runtime
• OS
Core Container Technologies
CHALLENGES
A Typical Cluster
• Ubuntu 14.04, Drivers 367, 4x Maxwell
• CentOS 7, Drivers 361, 4x Kepler
• Ubuntu 16.04, Drivers 375, 8x Pascal
Software stacks:
• CUDA 7.5
• CUDA 7.0, cuDNN 4
• CUDA 7.5, cuDNN 6
• CUDA 8.0
• Patches
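This heterogeneity matters because each CUDA toolkit needs a minimum driver version, so not every stack runs on every node. A minimal sketch that checks the cluster above, assuming the minimum Linux drivers from the compatibility table in NVIDIA's CUDA release notes (CUDA 7.0 needs 346.46, CUDA 7.5 needs 352.31, CUDA 8.0 GA1 needs 367.48). The node labels are hypothetical, and only the driver branch is compared because the slide lists branches rather than full versions.

```python
# Minimum Linux driver branch per CUDA toolkit (assumption: taken from the
# CUDA release notes table: 7.0 >= 346.46, 7.5 >= 352.31, 8.0 GA1 >= 367.48).
MIN_DRIVER_BRANCH = {"7.0": 346, "7.5": 352, "8.0": 367}

# Nodes from the slide; labels are hypothetical, values are driver branches.
NODES = {
    "ubuntu14.04-maxwell": 367,
    "centos7-kepler": 361,
    "ubuntu16.04-pascal": 375,
}


def nodes_for_cuda(cuda_version: str) -> list:
    """Return the nodes whose driver branch meets the toolkit's minimum.

    Only the branch is compared; a 367.xx driver older than 367.48 would
    still be too old for CUDA 8.0 in practice.
    """
    required = MIN_DRIVER_BRANCH[cuda_version]
    return sorted(name for name, branch in NODES.items() if branch >= required)


if __name__ == "__main__":
    for version in sorted(MIN_DRIVER_BRANCH):
        print(f"CUDA {version}: {nodes_for_cuda(version)}")
```

Under these assumptions the CentOS 7 node (driver 361) can host the CUDA 7.0 and 7.5 stacks but not CUDA 8.0.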
CONTAINERS
To the rescue
• Portable and reproducible builds
• Ease of deployment
• Isolation of resources
• Run across heterogeneous CUDA toolkit environments (sharing the host driver)
• Bare-metal performance
• Facilitate collaboration
DOCKERHUB IMAGES
Multiple flavors
• CUDA 8.0 runtime
• CUDA 8.0 runtime, cuDNN v5 runtime
• CUDA 8.0 devel, cuDNN v5 runtime
• CUDA 8.0 devel, cuDNN v5 devel
• cuDNN v5 devel, cuDNN v6 devel
• Base OS: Ubuntu 14.04, Ubuntu 16.04
• Frameworks: NVIDIA/Caffe 0.15.13, TensorFlow, DIGITS 5.0, CNTK, PyTorch
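NVIDIA-DOCKER 1.0 Internals
Diagram components and links:
• nvidia-docker → docker → dockerd (http+unix)
• dockerd → nvidia-docker-plugin (http+unix)
• nvidia-docker → nvidia-docker-plugin (http): GPU information
• nvidia-docker-plugin → NVIDIA driver (CUDA + NVML)
• dockerd → container process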
$ NV_GPU=0 nvidia-docker run -ti nvidia/cuda
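In 1.0, the wrapper turns `NV_GPU` into extra `docker run` arguments (device nodes plus a driver volume) using information from nvidia-docker-plugin. The sketch below only mimics that translation. The helper name `docker_gpu_args`, the control-device list, and the volume format `nvidia_driver_<version>` mounted at `/usr/local/nvidia` are assumptions modeled on the plugin's output, not its actual code. It accepts only GPU indices, although the real tool also took UUIDs.

```python
import os

# Control devices typically exposed alongside per-GPU nodes (assumption; the
# real list was reported by nvidia-docker-plugin at runtime).
CONTROL_DEVICES = ["/dev/nvidiactl", "/dev/nvidia-uvm"]


def docker_gpu_args(nv_gpu: str, gpu_count: int, driver_version: str) -> list:
    """Hypothetical: translate NV_GPU into extra `docker run` arguments.

    nv_gpu is a comma-separated list of GPU indices such as "0,1"; an empty
    value selects every GPU, mirroring the 1.0 default.
    """
    if nv_gpu.strip():
        indices = [int(tok) for tok in nv_gpu.split(",") if tok.strip()]
    else:
        indices = list(range(gpu_count))
    for idx in indices:
        if not 0 <= idx < gpu_count:
            raise ValueError(f"GPU {idx} does not exist (found {gpu_count})")
    args = [
        "--volume-driver=nvidia-docker",
        f"--volume=nvidia_driver_{driver_version}:/usr/local/nvidia:ro",
    ]
    args += [f"--device={dev}" for dev in CONTROL_DEVICES]
    args += [f"--device=/dev/nvidia{idx}" for idx in indices]
    return args


if __name__ == "__main__":
    extra = docker_gpu_args(os.environ.get("NV_GPU", ""), gpu_count=4, driver_version="367.48")
    print(" ".join(["docker", "run", *extra, "-ti", "nvidia/cuda"]))
```

Because the isolation lives in this client-side rewrite, anything that bypasses the wrapper (another CLI or an orchestrator calling the Docker API) gets no GPUs, which is the first limit listed for 1.0.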
LIMITS OF NVIDIA-DOCKER 1.0
Limited scope
• Works only with the Docker CLI
• Docker plugins are difficult to manage
• Not extensible (OpenGL, Vulkan, InfiniBand, KVM, etc.)
• Challenging to support new architectures (Power, ARM)
• Difficult to integrate into the container ecosystem
WHAT’S A CONTAINER?
Kernel building blocks
• Init system
• Namespaces
• Cgroups
• BPF
• LSM
• Netfilter
• Netlink
• Capabilities
• Seccomp
• UnionFS
• KVM
• ...
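A container is an ordinary process placed in its own set of these kernel objects. Namespaces, for example, show up on Linux as symbolic links under `/proc/<pid>/ns`, and two processes share a namespace when the links point to the same identifier. A minimal sketch that lists them for a process (Linux only; it returns an empty mapping when the directory does not exist). The helper name is illustrative.

```python
import os


def process_namespaces(pid: str = "self") -> dict:
    """Map namespace type (e.g. "mnt", "pid", "net") to its identifier.

    Each entry in /proc/<pid>/ns is a symlink whose target looks like
    "mnt:[4026531840]"; processes in the same namespace share that target.
    """
    ns_dir = f"/proc/{pid}/ns"
    if not os.path.isdir(ns_dir):
        return {}  # not Linux, no such process, or /proc unavailable
    result = {}
    for name in sorted(os.listdir(ns_dir)):
        try:
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue  # some entries may be unreadable without privileges
    return result


if __name__ == "__main__":
    for ns_type, ident in process_namespaces().items():
        print(f"{ns_type:20} {ident}")
```

Running the same script inside and outside a container shows different identifiers for the namespaces the runtime unshared.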
LIBNVIDIA-CONTAINER
github.com/NVIDIA/libnvidia-container
• Integrates with the container internals
• Agnostic of the container runtime
• Drop-in GPU support for runtime developers
• Better stability, follows driver releases
• Brings features seamlessly (graphics, display, exclusive mode, VM, etc.)
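libnvidia-container also ships a command-line front end, `nvidia-container-cli`, whose `configure` subcommand prepares a container's root filesystem for GPU access given the PID of its init process. The sketch below shows how a runtime hook might assemble that call. The helper `configure_argv` and the rootfs path are hypothetical. The flag spellings (`--load-kmods`, `--device`, `--compute`, `--utility`, `--pid`) reflect the project's documentation as we understand it and should be checked against `nvidia-container-cli --help` for your version.

```python
import shlex

# Driver capability flags assumed to be accepted by `configure`
# (verify against your installed nvidia-container-cli).
CAPABILITY_FLAGS = {"compute", "utility", "graphics", "video", "display", "compat32"}


def configure_argv(devices: str, capabilities: list, pid: int, rootfs: str) -> list:
    """Hypothetical helper: build the argv a prestart hook could execute.

    devices: "all" or a comma-separated list of GPU indices/UUIDs.
    capabilities: driver capabilities to expose, e.g. ["compute", "utility"].
    """
    unknown = set(capabilities) - CAPABILITY_FLAGS
    if unknown:
        raise ValueError(f"unsupported capabilities: {sorted(unknown)}")
    argv = ["nvidia-container-cli", "--load-kmods", "configure", f"--device={devices}"]
    argv += [f"--{cap}" for cap in capabilities]
    argv += [f"--pid={pid}", rootfs]
    return argv


if __name__ == "__main__":
    # Print the command instead of running it; the path is a placeholder.
    argv = configure_argv("0", ["compute", "utility"], pid=12345, rootfs="/path/to/rootfs")
    print(shlex.join(argv))
```

Keeping the runtime-specific part down to "build an argv and exec it" is what lets the same library back Docker, LXC, or any other runtime.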
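NVIDIA-DOCKER 2.0 Internals
Diagram components and links:
• docker → dockerd (http(s)(+unix))
• dockerd → docker-containerd + shim (grpc+unix)
• docker-containerd + shim → nvidia-runc / nvidia-oci-runtime
• nvidia-runc / nvidia-oci-runtime → libnvidia-container → NVIDIA driver (CUDA + NVML)
• nvidia-runc / nvidia-oci-runtime → container process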
$ docker run -ti -e NVIDIA_VISIBLE_DEVICES=0 --runtime=nvidia nvidia/cuda
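In 2.0, GPU selection moves from a CLI wrapper into the runtime: the NVIDIA runtime reads `NVIDIA_VISIBLE_DEVICES` from the container's OCI spec and, when GPUs are requested, adds a prestart hook that calls into libnvidia-container. A minimal sketch of that decision on a spec loaded as a Python dict. `HOOK_PATH` is a hypothetical placeholder. The value semantics follow the NVIDIA runtime documentation as we understand it: `all`, `none` (no GPUs, driver capabilities still enabled), `void`/empty/unset (behave like plain runc), otherwise a comma-separated list of indices or UUIDs.

```python
from __future__ import annotations

import copy

HOOK_PATH = "/usr/local/bin/example-nvidia-hook"  # hypothetical placeholder path


def visible_devices(env: list[str]) -> list[str] | None:
    """Parse NVIDIA_VISIBLE_DEVICES from an OCI env list of "KEY=VALUE" strings.

    Returns None when GPU support is not requested (unset, empty or "void"),
    [] for "none", ["all"] for "all", otherwise the listed indices/UUIDs.
    """
    value = None
    for entry in env:
        key, sep, val = entry.partition("=")
        if sep and key == "NVIDIA_VISIBLE_DEVICES":
            value = val.strip()  # the last occurrence wins
    if value is None or value in ("", "void"):
        return None
    if value == "none":
        return []
    if value == "all":
        return ["all"]
    return [tok.strip() for tok in value.split(",") if tok.strip()]


def add_gpu_hook(spec: dict) -> dict:
    """Return the spec unchanged, or a copy with a prestart hook if GPUs are requested."""
    devices = visible_devices(spec.get("process", {}).get("env", []))
    if devices is None:
        return spec  # no GPU request: behave like plain runc
    patched = copy.deepcopy(spec)
    prestart = patched.setdefault("hooks", {}).setdefault("prestart", [])
    prestart.append({"path": HOOK_PATH, "args": [HOOK_PATH, "prestart"]})
    return patched
```

For the command above, `-e NVIDIA_VISIBLE_DEVICES=0` lands in `process.env`, so `add_gpu_hook` would return a copy of the spec with one prestart hook; the hook itself would then pick device `0` from the same variable.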
CONTAINER FUTURE
Enable GPUs everywhere
• nvidia-docker 2.0 release
• Multi-arch support (Power, ARM)
• Support for other container runtimes (LXC/LXD, rkt)
• Additional Docker images
• Additional features (OpenGL, Vulkan, InfiniBand, KVM, etc.)
• Support for GPU monitoring (cAdvisor)