TRANSCRIPT
CJ Newburn, HPC Architect, NVIDIA Compute SW
Principal Engineer
HPC IN CONTAINERS: WHY CONTAINERS, WHY HPC, WHY NVIDIA
GTC’18, S8642, Monday March 26, 11am
2GTC’18: HPC Containers
OUTLINE
• Motivation
• What NVIDIA is doing
• Collaborations
• Requested feedback
• Call to action
3GTC’18: HPC Containers
WHY CONTAINERS: MOTIVATIONAL STORIES
• Hard to configure and install HPC apps
• App updates get delayed
• Lack of a reference design
• Many variants, some better than others
• Experimental/simulation hybrid molecular modeling as a service
• Will a given app run on a new platform?
• Better startup times with fewer libs loaded from bottlenecked metadata servers
• Encapsulating pipelines reduces complexity
War stories from the trenches
4GTC’18: HPC Containers
RUNNING A GPU APPLICATION: Customer Pain Points
RHEL 7.3, CUDA 8.0, Driver 375, 4x Pascal, Python 2.7
Ubuntu 16.04, CUDA 9.0, Driver 384, 4x Volta, Python 3.5
▪ “This framework requires installing 6 dependencies from sources”
▪ “I want to train my model on the cluster but it’s running RHEL 7”
▪ “Some machines in the cluster have different NVIDIA hardware & drivers”
▪ “How do I deploy a DL model/application at scale?”
DL Application
5GTC’18: HPC Containers
EXPERIMENTATION+MODELING
• Experimenters
• Run equipment to collect raw data
• Challenge: what’s signal vs. noise?
• Scientists who don’t do code or SW administration
• Augmenting with modeling
• Model helps filter out noise → more accurate with less processing time
• Provide container, e.g. NAMD on 1 GPU in a few hours
HPC modeling as a service
6GTC’18: HPC Containers
EASING THE TRANSITION TO IMPROVED SYSTEMS: Try before you buy, on your own workload
[Figure: moving workloads from legacy systems to the cloud and to the latest GPUs]
7GTC’18: HPC Containers
TRIMMING LIB DEPENDENCIES VIA CONTAINERS
• Size of dependent libraries can become huge
• SquashFS @ 4x can make fit in RAMdisk for faster access
• Metadata server I/O can become bottleneck, e.g. with 20 job groups
• Trim away shared libs and Python include searches
• Fix/patch to merge data locally and move to Lustre at the end of the job avoids conflicts
• RAMdisk access improvements can greatly reduce startup time, even with copy
• Relevant example
• ATLAS (CERN) simulations on Titan, courtesy of Sergey Panitkin of BNL
• Container build defines mount points, installs special versions with perf optimizations
• Optimizing for size and using RAMdisk halved setup time, reduced runtime by >2 minutes (9%)
Container is a good fit for applying special optimization steps
Background info for this use case courtesy of Adam Simpson, ORNL
8GTC’18: HPC Containers
PIPELINE EXAMPLE
• Consider a pipeline of many processes
• Each could have its own dependencies and require its own setup
• But each stage or the whole set of stages could be containerized
• Some relevant work: snakemake, SCI-F by Vanessa Sochat, Stanford: “The Scientific Filesystem,” Containers in HPC Symposium at UCAR, Boulder CO, https://sea.ucar.edu/conference/2018/containers.
Moving toward HPC as a service vs. becoming an app mechanic
Example pipeline stages: index → map → sort → index → report
9GTC’18: HPC Containers
WHY HIGH-PERFORMANCE COMPUTING
• Performance can depend on
• Tuning – discover and apply best-known methods
• Getting the latest version
• We are making a transition from “HPC for experts” to “HPC for the masses”
• Breadth of adoption may strongly depend on ease of use
• The time is ripe!
We in HPC care about performance; democratizing HPC
10GTC’18: HPC Containers
PROBLEMS ADDRESSED VIA CONTAINERIZATION
• Portability
• Repeatability
• Resource isolation
• New telemetry surface
• Bare metal performance, vs. VMs
• Parameterizability and control over runtime
Making it easier for users, admins and developers
11GTC’18: HPC Containers
DESIGNED FOR GPU-ACCELERATED SYSTEMS
RUN ON PASCAL- & VOLTA-POWERED SYSTEMS
Workstations, Supercomputing Clusters, Cloud Computing
12GTC’18: HPC Containers
OPENMPI DOCKERFILE VARIANTS: Real examples, lots of ways, some better than others
RUN OPENMPI_VERSION=3.0.0 && \
    wget -q -O - https://www.open-mpi.org/software/ompi/v3.0/downloads/openmpi-${OPENMPI_VERSION}.tar.gz | tar -xzf - && \
    cd openmpi-${OPENMPI_VERSION} && \
    ./configure --enable-orterun-prefix-by-default --with-cuda --with-verbs \
        --prefix=/usr/local/mpi --disable-getpwuid && \
    make -j"$(nproc)" install && \
    cd .. && rm -rf openmpi-${OPENMPI_VERSION} && \
    echo "/usr/local/mpi/lib" >> /etc/ld.so.conf.d/openmpi.conf && \
    ldconfig
ENV PATH /usr/local/mpi/bin:$PATH

WORKDIR /tmp
ADD http://www.open-mpi.org//software/ompi/v1.10/downloads/openmpi-1.10.7.tar.gz /tmp
RUN tar -xzf openmpi-1.10.7.tar.gz && \
    cd openmpi-* && ./configure --with-cuda=/usr/local/cuda \
        --enable-mpi-cxx --prefix=/usr && \
    make -j 32 && make install && cd /tmp \
    && rm -rf openmpi-*

RUN mkdir /logs
RUN wget -nv https://www.open-mpi.org/software/ompi/v1.10/downloads/openmpi-1.10.7.tar.gz && \
    tar -xzf openmpi-1.10.7.tar.gz && \
    cd openmpi-* && ./configure --with-cuda=/usr/local/cuda \
        --enable-mpi-cxx --prefix=/usr 2>&1 | tee /logs/openmpi_config && \
    make -j 32 2>&1 | tee /logs/openmpi_make && \
    make install 2>&1 | tee /logs/openmpi_install && cd /tmp \
    && rm -rf openmpi-*

RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        libopenmpi-dev \
        openmpi-bin \
        openmpi-common \
    && rm -rf /var/lib/apt/lists/*
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/openmpi/lib

RUN wget -q -O - https://www.open-mpi.org/software/ompi/v3.0/downloads/openmpi-3.0.0.tar.bz2 | tar -xjf - && \
    cd openmpi-3.0.0 && \
    CXX=pgc++ CC=pgcc FC=pgfortran F77=pgfortran ./configure \
        --prefix=/usr/local/openmpi --with-cuda=/usr/local/cuda --with-verbs --disable-getpwuid && \
    make -j4 install && \
    rm -rf /openmpi-3.0.0

COPY openmpi /usr/local/openmpi
WORKDIR /usr/local/openmpi
RUN /bin/bash -c "source /opt/pgi/LICENSE.txt && CC=pgcc CXX=pgc++ F77=pgf77 FC=pgf90 ./configure --with-cuda --prefix=/usr/local/openmpi"
RUN /bin/bash -c "source /opt/pgi/LICENSE.txt && make all install"
Slide annotations on the variants:
• Functional, simpler, but not CUDA or IB aware
• Enable many versions with parameters to a common interface
• Different compilers
• Bad layering
• Control environment
• Parameters vary
13GTC’18: HPC Containers
WHAT NVIDIA IS DOING
• Enabling
• Offerings
• Technology collaboration
14GTC’18: HPC Containers
SCOPE OF ENABLING PLANS
• Ecosystem: nurture a collaborative ecosystem around HPC containers
• Registry: host containerized applications, CUDA base containers
• Ingredients, recipes: Recommend and validate best practices
• HPC Containers: easily derive application containers from these
• Container technologies: GPU enabled
• System SW: OS, container runtime, and scheduler are GPU enabled
• Recommended platforms: known-good solutions for HPC apps
Better, more up-to-date results with less effort
15GTC’18: HPC Containers
MAKING IT EASIER WITH HPC CONTAINERS
• NVIDIA has experience collaborating with developers to containerize HPC apps
• Identifying, improving, creating ingredients
• Developing and optimizing recipes
• Codify those learnings
• Dockerfiles and other recipe files with tuned steps for each recommended ingredient
• Careful layering, for the sake of minimizing size, maximizing cacheability
• Validated combinations in specific HPC base containers from which app containers are derived
• Recipes for building platforms – container runtime, scheduler, OS, system
• Consistent approach to documentation
Potentially easier for non-expert end users
16GTC’18: HPC Containers
RAPID USER ADOPTION
HPC APPS CONTAINERS ON NVIDIA GPU CLOUD
RAPID CONTAINER ADDITION
GAMESS, CHROMA*, CANDLE, GROMACS, LAMMPS,
NAMD, RELION, Lattice Microbes, MILC*
*Coming soon
17GTC’18: HPC Containers
NVIDIA GPU CLOUD FOR HPC VISUALIZATION
ParaView with NVIDIA OptiX
ParaView with NVIDIA Holodeck
ParaView with NVIDIA IndeX
VMD with NVIDIA IndeX
18
NVIDIA CONTAINER RUNTIME: Enables GPU support in popular container runtimes
▶ NVIDIA-Docker makes GPU containers truly portable
▶ Integrates Linux container internals instead of wrapping specific runtimes (e.g. Docker)
▶ Better integration into the container ecosystem - Kubernetes (CRI), HPC (rootless)
▶ 2M downloads
[Stack diagram: containerized applications (Caffe, NAMD, TensorFlow, MILC) on container runtimes (Docker, LXC, CRI-O, etc.); nvidia-container-runtime plugs into the container runtime through the OCI runtime interface and uses libnvidia-container, which sits on CUDA, NVML, and the NVIDIA driver]
19
KUBERNETES ON NVIDIA GPUs
▶ GPU enhancements to mainline Kubernetes: get features faster than community releases
▶ Updated with each release of K8s (current version is v1.9) and close collaboration with community to upstream changes
▶ Minimize friction to adoption of Kubernetes on GPUs
▶ Fully open-source
[Stack diagram: Kubernetes on the NVIDIA Container Runtime on the NVIDIA driver]
20GTC’18: HPC Containers
HPC CONTAINER MAKER - HPCCM
• Collect and codify best practices
• Make recipe file creation easy, repeatable, modular, qualifiable
• Using this as a reference and a vehicle to drive collaboration
• Container implementation neutral
• Write Python code that calls primitives and building blocks vs. roll your own (a minimal sketch follows below)
• Leverage latest and greatest building blocks
“h-p-see-um”
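As a quick illustration, a minimal hpccm input recipe might look like the following (a sketch only; the building blocks and parameters shown here follow the examples later in this deck, and the full documentation is in RECIPES.md):

# Minimal hpccm input recipe (sketch): primitives plus parameterized building blocks
Stage0 += baseimage(image='nvidia/cuda:9.0-devel')        # primitive: choose the base image
Stage0 += apt_get(ospackages=['gcc', 'g++', 'gfortran'])  # building block: OS packages
Stage0 += openmpi(version='3.0.0')                        # building block: CUDA-aware OpenMPI
Stage0 += shell(commands=['echo recipe works'])           # primitive: arbitrary shell step

The same recipe is then fed to the hpccm CLI tool to emit either a Dockerfile or a Singularity recipe file.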
21GTC’18: HPC Containers
BIG PICTURE
HPC Container Maker: hpccm
• Reference hpccm input recipe: Python file with references to primitives and parameterized building blocks
• hpccm CLI tool: script that transforms the input recipe into a container recipe file, using the primitive and building block (recipe) implementations and base images
• Container spec file: Dockerfile or Singularity recipe file
• Container build: $ docker build … (Buildkit, buildah, …) or $ singularity build …; docker2singularity can also convert a Docker image into a Singularity image
• Container image: Docker image or Singularity image
22GTC’18: HPC Containers
HPCCM CONCEPTS AND TERMINOLOGY
• hpccm input recipe file: what hpccm ingests
• container recipe file: what hpccm produces, e.g. Dockerfile, Singularity recipe file
• primitive: line in an hpccm input recipe file that has a 1:1 mapping with a primitive implementation line in the container recipe file
• building block: line in an hpccm input recipe file with a 1:many primitive mapping; the mapping is codified in the hpccm implementation; these are parameterized (see the sketch after this list)
• recipe implementations: collection of implementations of primitives and building blocks
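To make the 1:1 vs. 1:many distinction concrete, here is a small sketch using the primitives and building blocks that appear later in this deck:

# primitive: maps 1:1 to a line in the container recipe file
Stage0 += copy(src='a', dest='b')   # becomes "COPY a b" in a Dockerfile

# building block: maps 1:many; hpccm expands it into apt-get, wget, configure,
# make, and ENV lines, parameterized by version, prefix, toolchain, etc.
Stage0 += openmpi(version='3.0.0')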
23GTC’18: HPC Containers
RECIPES INCLUDED WITH CONTAINER MAKER
• HPC base recipe with GNU compilers
• Ubuntu 16.04
• CUDA 9.0
• Python 2 and 3
• GNU compilers (upstream)
• Mellanox OFED 3.4-1.0.0.0
• OpenMPI 3.0.0
• FFTW 3.3.7
• HDF5 1.10.1
• HPC base recipe with PGI compilers
• Ubuntu 16.04
• CUDA 9.0
• Python 2 and 3
• PGI compilers 17.10
• Mellanox OFED 3.4-1.0.0.0
• OpenMPI 3.0.0
• FFTW 3.3.7
• HDF5 1.10.1
Shown in current build order
HPC application samples coming
24GTC’18: HPC Containers
BUILDING AN HPC APPLICATION IMAGE
1. Use the HPC base image as your starting point
2. Generate a Dockerfile from the HPC base recipe and manually edit it to add the steps to build your application
3. Copy the HPC base recipe file and add your application build steps to the recipe
Analogous workflows for Singularity
1. Base recipe → Dockerfile → Base image → App Dockerfile
2. Base recipe → Dockerfile → App Dockerfile
3. Base recipe → App recipe (a sketch of this option follows below)
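For option 3, a hedged sketch of the copied-and-extended recipe (the application name, URL, and build commands below are placeholders, not a published recipe):

# Start from a copy of the HPC base recipe (compilers, OFED, OpenMPI, FFTW, HDF5)
# ... base recipe contents as shown on the following slides ...

# Hypothetical application build steps appended to the copied recipe
Stage0 += shell(commands=['git clone https://example.com/myapp.git /myapp',  # placeholder URL
                          'cd /myapp',
                          'make -j4',
                          'make install'])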
25GTC’18: HPC Containers
HIGHER LEVEL ABSTRACTION: Building block encapsulates simplified best practices, avoids duplication
# OpenMPI version 3.0.0
RUN apt-get update -y && \
    apt-get install -y --no-install-recommends \
        file \
        hwloc && \
    rm -rf /var/lib/apt/lists/*
RUN mkdir -p /tmp && wget -q --no-check-certificate -P /tmp https://www.open-mpi.org/software/ompi/v3.0/downloads/openmpi-3.0.0.tar.bz2 && \
    tar -x -f /tmp/openmpi-3.0.0.tar.bz2 -C /tmp -j && \
    cd /tmp/openmpi-3.0.0 && ./configure --prefix=/usr/local/openmpi --disable-getpwuid --with-cuda --without-verbs && \
    make -j4 && \
    make -j4 install && \
    rm -rf /tmp/openmpi-3.0.0.tar.bz2 /tmp/openmpi-3.0.0
ENV PATH=/usr/local/openmpi/bin:$PATH \
    LD_LIBRARY_PATH=/usr/local/openmpi/lib:$LD_LIBRARY_PATH
ompi = openmpi(version='3.0.0', toolchain=tc)
Stage0 += ompi
26GTC’18: HPC Containers
"""HPC Base imageContents:
CUDA version 9.0FFTW version 3.3.7GNU compilers (upstream)HDF5 version 1.10.1Mellanox OFED version 3.4-1.0.0.0OpenMPI version 3.0.0Python 2 and 3 (upstream)
"""
Stage0 += comment(__doc__, reformat=False)
Stage0 += baseimage(image='nvidia/cuda:9.0-devel', _as='devel')
# Python (use upstream)Stage0 += apt_get(ospackages=['python', 'python3'])
# Compilers (use upstream)Stage0 += apt_get(ospackages=['gcc', 'g++', 'gfortran'])
# Create a toolchaintc = hpccm.toolchain(CC='gcc', CXX='g++', F77='gfortran', F90='gfortran',
FC='gfortran', CUDA_HOME='/usr/local/cuda')
# Mellanox OFEDofed = mlnx_ofed(version='3.4-1.0.0.0')Stage0 += ofed
# OpenMPIompi = openmpi(version='3.0.0', toolchain=tc)Stage0 += ompi
# FFTWfftw = fftw(version='3.3.7', toolchain=tc)Stage0 += fftw
# HDF5hdf5 = hdf5(version='1.10.1', toolchain=tc)Stage0 += hdf5
#
# HPC Base image
#
# Contents:
#   CUDA version 9.0
#   FFTW version 3.3.7
#   GNU compilers (upstream)
#   HDF5 version 1.10.1
#   Mellanox OFED version 3.4-1.0.0.0
#   OpenMPI version 3.0.0
#   Python 2 and 3 (upstream)
#

FROM nvidia/cuda:9.0-devel AS devel

RUN apt-get update -y && \
    apt-get install -y --no-install-recommends \
        python \
        python3 && \
    rm -rf /var/lib/apt/lists/*

RUN apt-get update -y && \
    apt-get install -y --no-install-recommends \
        gcc \
        g++ \
        gfortran && \
    rm -rf /var/lib/apt/lists/*

# Mellanox OFED version 3.4-1.0.0.0
RUN apt-get update -y && \
    apt-get install -y --no-install-recommends \
        libnl-3-200 \
        libnl-route-3-200 \
        libnuma1 \
        wget && \
    rm -rf /var/lib/apt/lists/*
RUN mkdir -p /tmp && wget -q --no-check-certificate -P /tmp http://content.mellanox.com/ofed/MLNX_OFED-3.4-1.0.0.0/MLNX_OFED_LINUX-3.4-1.0.0.0-ubuntu16.04-x86_64.tgz && \
    tar -x -f /tmp/MLNX_OFED_LINUX-3.4-1.0.0.0-ubuntu16.04-x86_64.tgz -C /tmp -z && \
    dpkg --install /tmp/MLNX_OFED_LINUX-3.4-1.0.0.0-ubuntu16.04-x86_64/DEBS/libibverbs1_*_amd64.deb && \
    dpkg --install /tmp/MLNX_OFED_LINUX-3.4-1.0.0.0-ubuntu16.04-x86_64/DEBS/libibverbs-dev_*_amd64.deb && \
    dpkg --install /tmp/MLNX_OFED_LINUX-3.4-1.0.0.0-ubuntu16.04-x86_64/DEBS/libmlx5-1_*_amd64.deb && \
    dpkg --install /tmp/MLNX_OFED_LINUX-3.4-1.0.0.0-ubuntu16.04-x86_64/DEBS/ibverbs-utils_*_amd64.deb && \
    rm -rf /tmp/MLNX_OFED_LINUX-3.4-1.0.0.0-ubuntu16.04-x86_64.tgz /tmp/MLNX_OFED_LINUX-3.4-1.0.0.0-ubuntu16.04-x86_64

# OpenMPI version 3.0.0
RUN apt-get update -y && \
    apt-get install -y --no-install-recommends \
        file \
        hwloc \
        openssh-client \
        wget && \
    rm -rf /var/lib/apt/lists/*
RUN mkdir -p /tmp && wget -q --no-check-certificate -P /tmp https://www.open-mpi.org/software/ompi/v3.0/downloads/openmpi-3.0.0.tar.bz2 && \
    tar -x -f /tmp/openmpi-3.0.0.tar.bz2 -C /tmp -j && \
    cd /tmp/openmpi-3.0.0 && CC=gcc CXX=g++ F77=gfortran F90=gfortran FC=gfortran ./configure --prefix=/usr/local/openmpi --disable-getpwuid --enable-orterun-prefix-by-default --with-cuda=/usr/local/cuda --with-verbs && \
    make -j4 && \
    make -j4 install && \
    rm -rf /tmp/openmpi-3.0.0.tar.bz2 /tmp/openmpi-3.0.0
ENV PATH=/usr/local/openmpi/bin:$PATH \
    LD_LIBRARY_PATH=/usr/local/openmpi/lib:$LD_LIBRARY_PATH

# FFTW version 3.3.7
RUN apt-get update -y && \
    apt-get install -y --no-install-recommends \
        file \
        make \
        wget && \
    rm -rf /var/lib/apt/lists/*
RUN mkdir -p /tmp && wget -q --no-check-certificate -P /tmp ftp://ftp.fftw.org/pub/fftw/fftw-3.3.7.tar.gz && \
    tar -x -f /tmp/fftw-3.3.7.tar.gz -C /tmp -z && \
    cd /tmp/fftw-3.3.7 && CC=gcc CXX=g++ F77=gfortran F90=gfortran FC=gfortran ./configure --prefix=/usr/local/fftw --enable-shared --enable-openmp --enable-threads --enable-sse2 && \
    make -j4 && \
    make -j4 install && \
    rm -rf /tmp/fftw-3.3.7.tar.gz /tmp/fftw-3.3.7
ENV LD_LIBRARY_PATH=/usr/local/fftw/lib:$LD_LIBRARY_PATH

# HDF5 version 1.10.1
RUN apt-get update -y && \
    apt-get install -y --no-install-recommends \
        file \
        make \
        wget \
        zlib1g-dev && \
    rm -rf /var/lib/apt/lists/*
RUN mkdir -p /tmp && wget -q --no-check-certificate -P /tmp http://www.hdfgroup.org/ftp/HDF5/releases/hdf5-1.10/hdf5-1.10.1/src/hdf5-1.10.1.tar.bz2 && \
    tar -x -f /tmp/hdf5-1.10.1.tar.bz2 -C /tmp -j && \
    cd /tmp/hdf5-1.10.1 && CC=gcc CXX=g++ F77=gfortran F90=gfortran FC=gfortran ./configure --prefix=/usr/local/hdf5 --enable-cxx --enable-fortran && \
    make -j4 && \
    make -j4 install && \
    rm -rf /tmp/hdf5-1.10.1.tar.bz2 /tmp/hdf5-1.10.1
ENV PATH=/usr/local/hdf5/bin:$PATH \
    HDF5_DIR=/usr/local/hdf5 \
    LD_LIBRARY_PATH=/usr/local/hdf5/lib:$LD_LIBRARY_PATH
27GTC’18: HPC Containers
MULTI-STAGE BUILDS

Generated Dockerfile (excerpt):

FROM nvidia/cuda:9.0-devel AS devel
...
# OpenMPI version 3.0.0
RUN apt-get update -y && \
    apt-get install -y --no-install-recommends \
        file \
        hwloc \
        openssh-client \
        wget && \
    rm -rf /var/lib/apt/lists/*
RUN mkdir -p /tmp && wget -q --no-check-certificate -P /tmp https://www.open-mpi.org/software/ompi/v3.0/downloads/openmpi-3.0.0.tar.bz2 && \
    tar -x -f /tmp/openmpi-3.0.0.tar.bz2 -C /tmp -j && \
    cd /tmp/openmpi-3.0.0 && CC=gcc CXX=g++ F77=gfortran F90=gfortran FC=gfortran ./configure --prefix=/usr/local/openmpi --disable-getpwuid --enable-orterun-prefix-by-default --with-cuda=/usr/local/cuda --with-verbs && \
    make -j4 && \
    make -j4 install && \
    rm -rf /tmp/openmpi-3.0.0.tar.bz2 /tmp/openmpi-3.0.0
ENV PATH=/usr/local/openmpi/bin:$PATH \
    LD_LIBRARY_PATH=/usr/local/openmpi/lib:$LD_LIBRARY_PATH
...

FROM nvidia/cuda:9.0-runtime
...
# OpenMPI
RUN apt-get update -y && \
    apt-get install -y --no-install-recommends \
        hwloc \
        openssh-client && \
    rm -rf /var/lib/apt/lists/*
COPY --from=0 /usr/local/openmpi /usr/local/openmpi
ENV PATH=/usr/local/openmpi/bin:$PATH \
    LD_LIBRARY_PATH=/usr/local/openmpi/lib:$LD_LIBRARY_PATH
...

hpccm input recipe (excerpt):

Stage0 += baseimage(image='nvidia/cuda:9.0-devel', _as='devel')
...
# OpenMPI
ompi = openmpi(version='3.0.0', toolchain=tc)
Stage0 += ompi
...

######
# Runtime image
######
Stage1 += baseimage(image='nvidia/cuda:9.0-runtime')
...
# OpenMPI
Stage1 += ompi.runtime()
...
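For reference, a minimal end-to-end two-stage recipe assembled from the excerpts above might look like the following sketch (defaults and package choices follow the building blocks shown earlier):

# hypothetical minimal multi-stage hpccm input recipe (sketch)
Stage0 += baseimage(image='nvidia/cuda:9.0-devel', _as='devel')  # build stage
Stage0 += apt_get(ospackages=['gcc', 'g++', 'gfortran'])         # compilers
ompi = openmpi(version='3.0.0')                                  # CUDA-aware OpenMPI building block
Stage0 += ompi

Stage1 += baseimage(image='nvidia/cuda:9.0-runtime')             # deployment stage
Stage1 += ompi.runtime()                                         # copy only the runtime pieces of OpenMPI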
28GTC’18: HPC Containers
PARAMETERIZED BUILDING BLOCKS
• openmpi(check=False,                     # run "make check"?
          configure_opts=['--disable-getpwuid', '--enable-orterun-prefix-by-default'],
          cuda=True,
          directory='',                    # path to source in build context
          infiniband=True,
          ospackages=['file', 'hwloc', 'openssh-client', 'wget'],
          prefix='/usr/local/openmpi',
          toolchain=toolchain(),
          version='3.0.0')                 # version to download
• mlnx_ofed(ospackages=['libnl-3-200', 'libnl-route-3-200', 'libnuma1', 'wget'],
            packages=['libibverbs1', 'libibverbs-dev', 'libmlx5-1', 'ibverbs-utils'],
            version='3.4-1.0.0.0')         # version to download
Parameters enable specialization; implementations invoke Python code
Also: apt-get, FFTW, HDF5, Linux OFED, PGI compiler
Full recipe documentation can be found in RECIPES.md
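As an illustration of how these parameters might be used to specialize a build (the values below are only an example; the actual defaults are documented in RECIPES.md):

# hypothetical specialization of the OpenMPI building block (sketch)
ompi = openmpi(version='3.0.0',
               cuda=True,                          # build CUDA-aware MPI
               infiniband=False,                   # skip verbs on a TCP-only cluster
               prefix='/opt/openmpi',
               configure_opts=['--disable-getpwuid'],
               toolchain=tc)                       # reuse the toolchain defined earlier
Stage0 += ompi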
29GTC’18: HPC Containers
CONTAINER IMPLEMENTATION ABSTRACTION
• baseimage(image='ubuntu:16.04') → Dockerfile: FROM ubuntu:16.04 | Singularity: Bootstrap: docker / From: ubuntu:16.04
• shell(commands=['a', 'b', 'c']) → Dockerfile: RUN a && \ b && \ c | Singularity: %post / a / b / c
• copy(src='a', dest='b') → Dockerfile: COPY a b | Singularity: %files / a b
Single source to either Docker or Singularity
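For example, the same input recipe can be rendered to either format with the hpccm CLI tool (assuming its --recipe and --format options; check the project README for the exact flags):

$ hpccm --recipe recipe.py --format docker > Dockerfile
$ hpccm --recipe recipe.py --format singularity > Singularity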
30GTC’18: HPC Containers
FULL PROGRAMMING LANGUAGE AVAILABLE
# get and validate precision
VALID_PRECISION = ['single', 'double', 'mixed']
precision = os.environ.get('LAMMPS_PRECISION', 'single')
if precision not in VALID_PRECISION:
    raise ValueError('Invalid precision')
...
Stage0 += shell(commands=[f'make -f Makefile.linux.{precision}', ...])
...
Conditional branching, validation, etc. in hpccm input recipe
Courtesy of Logan Herche
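An alternative to reading environment variables is hpccm's USERARG mechanism, which the MILC recipe later in this deck uses; a sketch, assuming user arguments are passed on the hpccm command line (e.g. --userarg LAMMPS_PRECISION=double):

# sketch: build-time parameter via USERARG instead of os.environ
precision = USERARG.get('LAMMPS_PRECISION', 'single')
if precision not in ['single', 'double', 'mixed']:
    raise ValueError('Invalid precision')
Stage0 += shell(commands=['make -f Makefile.linux.{}'.format(precision)])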
31GTC’18: HPC Containers
ENVISIONED FLOW: Accelerating container creation and usage
[Flow diagram elements: hpccm; GPU-enabled technologies; validated HPC devel container and HPC runtime container; app source → app test binary → app final binary → app image; NVIDIA registry (NGC); GPU clusters with a validated container runtime, scheduler, and OS]
32GTC’18: HPC Containers
V0.5 OUTSIDE-CONTAINER TRADE-OFFS
• CUDA: version 9.0. Supports Kepler through Volta, highest performance.
• Container runtimes: Docker, LXC, Shifter, Singularity. Docker has the best GPU support today; NVIDIA is investing in LXC for rootless.
• Orchestration & scheduling: SLURM, Kubernetes. SLURM is widely used in HPC; Kubernetes is widely used in the cloud.
• GPU enablement: NVIDIA Container Runtime SDK. OCI compliant, enables multiple container runtimes, multi-node support.
• OS: Ubuntu 16.04, CentOS 7. Application-based choice; Ubuntu has more testing for GPU-enabled containers; CentOS uses RPMs.
Situation- and environment-based choices
33GTC’18: HPC Containers
SAMPLE DOCKER FILES
• We’re in the process of normalizing our containers with respect to these devel and runtime HPC container offerings
• GROMACS
• MILC
34GTC’18: HPC Containers
GROMACS DOCKERFILE PART 1; BUILD STAGE
FROM nvidia/cuda:9.0-devel-ubuntu16.04 AS devel
RUN apt-get update -y && \
apt-get install -y --no-install-recommends \
ca-certificates cmake file git hwloc \
libibverbs-dev openssh-client python wget && \
rm -rf /var/lib/apt/lists/*
RUN mkdir -p /tmp && \
wget -q --no-check-certificate -P /tmp https://www.open-mpi.org/software/ompi/v3.0/downloads/openmpi-3.0.0.tar.bz2 && \
tar -x -f /tmp/openmpi-3.0.0.tar.bz2 -C /tmp -j && \
cd /tmp/openmpi-3.0.0 && \
./configure --prefix=/opt/openmpi --enable-mpi-cxx --with-cuda \
--with-verbs && \
make -j32 && \
make -j32 install && \
rm -rf /tmp/openmpi-3.0.0.tar.bz2 /tmp/openmpi-3.0.0
ENV LD_LIBRARY_PATH=/opt/openmpi/lib:$LD_LIBRARY_PATH \
PATH=/opt/openmpi/bin:$PATH
Initialize build stage
Install packages and cleanup
Install OpenMPI
35GTC’18: HPC Containers
GROMACS DOCKERFILE PART 2
RUN mkdir -p /gromacs/install && \
mkdir -p /gromacs/builds && \
mkdir -p /tmp && git -C /tmp clone --depth=1 --branch v2018 \
https://github.com/gromacs/gromacs && \
mv /tmp/gromacs /gromacs/src && \
cd /gromacs/builds && \
CC=gcc CXX=g++ cmake /gromacs/src -DCMAKE_BUILD_TYPE=Release \
-DCMAKE_INSTALL_PREFIX=/gromacs/install \
-DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda \
-DGMX_BUILD_OWN_FFTW=ON -DGMX_GPU=ON -DGMX_MPI=OFF \
-DGMX_OPENMP=ON -DGMX_PREFER_STATIC_LIBS=ON \
-DMPIEXEC_PREFLAGS=--allow-run-as-root \
-DREGRESSIONTEST_DOWNLOAD=ON && \
make -j && \
make install && \
make check
Build GROMACS
36GTC’18: HPC Containers
GROMACS DOCKERFILE PART 3; RUNTIME STAGE
FROM nvidia/cuda:9.0-runtime-ubuntu16.04
RUN apt-get update -y && \
apt-get install -y --no-install-recommends \
hwloc \
libgomp1 \
libibverbs-dev \
openssh-client \
python && \
rm -rf /var/lib/apt/lists/*
COPY --from=devel /opt/openmpi /opt/openmpi
ENV LD_LIBRARY_PATH=/opt/openmpi/lib:$LD_LIBRARY_PATH \
PATH=/opt/openmpi/bin:$PATH
COPY --from=devel /gromacs/install /gromacs/install
ENV PATH=$PATH:/gromacs/install/bin
WORKDIR /workspace
Initialize release stage
Install packages and cleanup
Copy OpenMPI from build
Copy GROMACS from build
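For comparison, a hedged sketch of how this GROMACS Dockerfile might be expressed as an hpccm input recipe, using only the primitives and building blocks shown elsewhere in this deck (the parameter details are illustrative, not the published recipe):

# Devel stage (sketch)
Stage0 += baseimage(image='nvidia/cuda:9.0-devel-ubuntu16.04', _as='devel')
Stage0 += apt_get(ospackages=['ca-certificates', 'cmake', 'git', 'hwloc',
                              'libibverbs-dev', 'openssh-client', 'python', 'wget'])
ompi = openmpi(version='3.0.0', prefix='/opt/openmpi',
               configure_opts=['--enable-mpi-cxx'])
Stage0 += ompi
Stage0 += shell(commands=[
    'mkdir -p /gromacs/builds && git -C /tmp clone --depth=1 --branch v2018 https://github.com/gromacs/gromacs',
    'mv /tmp/gromacs /gromacs/src',
    'cd /gromacs/builds && CC=gcc CXX=g++ cmake /gromacs/src -DCMAKE_BUILD_TYPE=Release '
    '-DCMAKE_INSTALL_PREFIX=/gromacs/install -DGMX_BUILD_OWN_FFTW=ON -DGMX_GPU=ON -DGMX_OPENMP=ON',
    'cd /gromacs/builds && make -j && make install'])

# Runtime stage (sketch)
Stage1 += baseimage(image='nvidia/cuda:9.0-runtime-ubuntu16.04')
Stage1 += apt_get(ospackages=['hwloc', 'libgomp1', 'libibverbs-dev', 'openssh-client', 'python'])
Stage1 += ompi.runtime()
Stage1 += copy(_from='devel', src='/gromacs/install', dest='/gromacs/install')
Stage1 += workdir(directory='/workspace')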
37GTC’18: HPC Containers
MILC DOCKERFILE PART 1; BUILD STAGE
FROM nvidia/cuda:9.0-devel-ubuntu16.04 AS devel
RUN apt-get update -y && \
apt-get install -y --no-install-recommends \
autoconf automake ca-certificates cmake dapl2-utils \
file git hwloc ibutils ibverbs-utils \
infiniband-diags libdapl-dev libibcm-dev \
libibmad5 libibverbs-dev libibverbs1 \
libmlx4-1 libmlx4-dev libmlx5-1 libmlx5-dev \
libnuma-dev librdmacm-dev librdmacm1 opensm \
openssh-client rdmacm-utils wget && \
rm -rf /var/lib/apt/lists/*
Initialize build stage
Install packages and cleanup
38GTC’18: HPC Containers
MILC DOCKERFILE PART 2
RUN mkdir -p /tmp && wget -q --no-check-certificate -P /tmp https://www.open-mpi.org/software/ompi/v3.0/downloads/openmpi-3.0.0.tar.bz2 && \
tar -x -f /tmp/openmpi-3.0.0.tar.bz2 -C /tmp -j && \
cd /tmp/openmpi-3.0.0 && \
./configure --prefix=/opt/openmpi --enable-mpi-cxx \
--with-cuda --with-verbs && \
make -j32 && \
make -j32 install && \
rm -rf /tmp/openmpi-3.0.0.tar.bz2 /tmp/openmpi-3.0.0
ENV LD_LIBRARY_PATH=/opt/openmpi/lib:$LD_LIBRARY_PATH \
PATH=/opt/openmpi/bin:$PATH
Build OpenMPI
39GTC’18: HPC Containers
MILC DOCKERFILE PART 3
WORKDIR /quda
RUN mkdir -p /tmp && git -C /tmp clone --depth=1 --branch release/0.8.x https://github.com/lattice/quda && \
mv /tmp/quda /quda/src && \
mkdir -p /quda/build && \
cd /quda/build && \
cmake ../src -DCMAKE_BUILD_TYPE=RELEASE \
-DQUDA_DIRAC_CLOVER=ON -DQUDA_DIRAC_DOMAIN_WALL=ON \
-DQUDA_DIRAC_STAGGERED=ON \
-DQUDA_DIRAC_TWISTED_CLOVER=ON \
-DQUDA_DIRAC_TWISTED_MASS=ON -DQUDA_DIRAC_WILSON=ON \
-DQUDA_FORCE_GAUGE=ON -DQUDA_FORCE_HISQ=ON \
-DQUDA_GPU_ARCH=sm_70 -DQUDA_INTERFACE_MILC=ON \
-DQUDA_INTERFACE_QDP=ON -DQUDA_LINK_HISQ=ON \
-DQUDA_MPI=ON && \
make -j32 && \
rm -rf /quda/src
Build QUDA
40GTC’18: HPC Containers
MILC DOCKERFILE PART 4
RUN mkdir -p /tmp && \
git -C /tmp clone --depth=1 https://github.com/milc-qcd/milc_qcd && \
mv /tmp/milc_qcd /milc && \
cd /milc/ks_imp_rhmc/ && \
cp /milc/Makefile /milc/ks_imp_rhmc/ && \
sed -i 's/WANTQUDA\(.*\)=.*/WANTQUDA\1= true/g' Makefile && \
sed -i 's/\(WANT_.*_GPU\)\(.*\)= .*/\1\2= true/g' Makefile && \
sed -i 's/QUDA_HOME\(.*\)= .*/QUDA_HOME\1= \/quda\/build/g' Makefile && \
sed -i 's/CUDA_HOME\(.*\)= .*/CUDA_HOME\1= \/usr\/local\/cuda/g' Makefile && \
sed -i 's/#\?MPP = .*/MPP = true/g' Makefile && \
sed -i 's/#\?CC = .*/CC = mpicc/g' Makefile && \
sed -i 's/LD\(\s+\)= .*/LD\1= mpicxx/g' Makefile && \
sed -i 's/PRECISION = \d+/PRECISION = 2/g' Makefile && \
sed -i 's/WANTQIO = .*/WANTQIO = #true or blank. Implies HAVEQMP./g' Makefile && \
sed -i 's/CGEOM =.*-DFIX_NODE_GEOM.*/CGEOM = #-DFIX_NODE_GEOM/g' Makefile && \
C_INCLUDE_PATH=/quda/build/include make su3_rhmd_hisq
Build MILC
41GTC’18: HPC Containers
MILC DOCKERFILE PART 5; RUNTIME STAGE
FROM nvidia/cuda:9.0-runtime-ubuntu16.04
RUN apt-get update -y && \
apt-get install -y --no-install-recommends \
hwloc \
libibverbs1 \
libnuma1 \
librdmacm1 \
openssh-client && \
rm -rf /var/lib/apt/lists/*
COPY --from=devel /opt/openmpi /opt/openmpi
ENV LD_LIBRARY_PATH=/opt/openmpi/lib:$LD_LIBRARY_PATH \
PATH=/opt/openmpi/bin:/milc:$PATH
COPY --from=devel /milc/ks_imp_rhmc/su3_rhmd_hisq /milc/su3_rhmd_hisq
COPY examples /workspace/examples
WORKDIR /workspace
Initialize release stage
Install packages and cleanup
Copy OpenMPI from build stage
Copy MILC from build stage
Copy examples into container
Multi-node MPI is enabled
42GTC’18: HPC Containers
MILC HPCCM INPUT RECIPE FILE
Ubuntu 16.04, CUDA 9.0, QUDA, MPI and MILC.
Build with:
nvidia-docker build -t milc .
Run with:
nvidia-docker run -it milc
Header
43GTC’18: HPC Containers
MILC HPCCM INPUT RECIPE FILE, PART 1
# pylint: disable=invalid-name, undefined-variable, used-before-assignment
# pylama: ignore=E0602

gpu_arch = USERARG.get('GPU_ARCH', 'sm_70')

# add docstring to Dockerfile
Stage0 += comment(__doc__.strip(), reformat=False)

###############################################################################
# Devel stage
###############################################################################
Stage0.name = 'devel'
Stage0 += baseimage(image='nvidia/cuda:9.0-devel-ubuntu16.04', AS=Stage0.name)

Stage0 += apt_get(ospackages=['autoconf', 'automake', 'cmake', 'git', 'ca-certificates'])

Stage0 += ofed()

mpi_prefix = '/opt/openmpi'
ompi = openmpi(configure_opts=['--enable-mpi-cxx'], prefix=mpi_prefix,
               parallel=32, version="3.0.0")
Stage0 += ompi
44GTC’18: HPC Containers
MILC HPCCM INPUT RECIPE FILE, PART 2: QUDA
# build QUDA
git = hpccm.git()
quda_build_dir = '/quda/build'
Stage0 += workdir(directory="/quda")
Stage0 += shell(commands=[
    git.clone_step(repository="https://github.com/lattice/quda", branch="release/0.8.x"),
    'mv /tmp/quda /quda/src',
    'mkdir -p {}'.format(quda_build_dir),
    'cd {}'.format(quda_build_dir),
    ('cmake ../src ' +
     '-DCMAKE_BUILD_TYPE=RELEASE ' +
     '-DQUDA_DIRAC_CLOVER=ON ' +
     '-DQUDA_DIRAC_DOMAIN_WALL=ON ' +
     '-DQUDA_DIRAC_STAGGERED=ON ' +
     '-DQUDA_DIRAC_TWISTED_CLOVER=ON ' +
     '-DQUDA_DIRAC_TWISTED_MASS=ON ' +
     '-DQUDA_DIRAC_WILSON=ON ' +
     '-DQUDA_FORCE_GAUGE=ON ' +
     '-DQUDA_FORCE_HISQ=ON ' +
     '-DQUDA_GPU_ARCH={} '.format(gpu_arch) +
     '-DQUDA_INTERFACE_MILC=ON ' +
     '-DQUDA_INTERFACE_QDP=ON ' +
     '-DQUDA_LINK_HISQ=ON ' +
     '-DQUDA_MPI=ON'),
    'make -j32',
    'rm -rf /quda/src'])
45GTC’18: HPC Containers
MILC HPCCM INPUT RECIPE FILE, PART 3: MILC
# build MILC
Stage0 += shell(commands=[
    git.clone_step(repository="https://github.com/milc-qcd/milc_qcd"),
    'mv /tmp/milc_qcd /milc',
    'cd /milc/ks_imp_rhmc/',
    'cp /milc/Makefile /milc/ks_imp_rhmc/',
    r"sed -i 's/WANTQUDA\(.*\)=.*/WANTQUDA\1= true/g' Makefile",
    r"sed -i 's/\(WANT_.*_GPU\)\(.*\)= .*/\1\2= true/g' Makefile",
    r"sed -i 's/QUDA_HOME\(.*\)= .*/QUDA_HOME\1= \/quda\/build/g' Makefile",
    r"sed -i 's/CUDA_HOME\(.*\)= .*/CUDA_HOME\1= \/usr\/local\/cuda/g' Makefile",
    r"sed -i 's/#\?MPP = .*/MPP = true/g' Makefile",
    r"sed -i 's/#\?CC = .*/CC = mpicc/g' Makefile",
    r"sed -i 's/LD\(\s+\)= .*/LD\1= mpicxx/g' Makefile",
    r"sed -i 's/PRECISION = \d+/PRECISION = 2/g' Makefile",
    r"sed -i 's/WANTQIO = .*/WANTQIO = #true or blank. Implies HAVEQMP./g' Makefile",
    r"sed -i 's/CGEOM =.*-DFIX_NODE_GEOM.*/CGEOM = #-DFIX_NODE_GEOM/g' Makefile",
    'C_INCLUDE_PATH={}/include make su3_rhmd_hisq'.format(quda_build_dir)])
46GTC’18: HPC Containers
MILC HPCCM INPUT RECIPE FILE, PART 4: RELEASE
###############################################################################
# Release stage
###############################################################################
Stage1 += baseimage(image='nvidia/cuda:9.0-runtime-ubuntu16.04')
Stage1 += apt_get(ospackages=['libnuma1', 'ssh', 'libibverbs1', 'librdmacm1'])
Stage1 += ompi.runtime()
Stage1 += copy(_from=Stage0.name, src='/milc/ks_imp_rhmc/su3_rhmd_hisq', dest='/milc/su3_rhmd_hisq')
Stage1 += copy(src='examples', dest='/workspace/examples')
Stage1 += workdir(directory='/workspace')
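To turn this input recipe into the Dockerfile shown on the preceding slides (assuming the recipe is saved as milc.py and that the hpccm CLI accepts --recipe, --format, and --userarg options; check the project README for the exact flags):

$ hpccm --recipe milc.py --format docker --userarg GPU_ARCH=sm_70 > Dockerfile
$ nvidia-docker build -t milc .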
47GTC’18: HPC Containers
COLLABORATIONS
• Mellanox [Yong Qin]
• Negligible overhead from using containers
• Working on resolving driver versioning issues
• Collaborating on best recipes, including for multi-node
• Dell [Nishanth Dandapanthula]
• Negligible overhead from using containers
• Ease of use
• Universities and labs
• Evaluations, feedback, use cases
Nurturing a communal effort
48© 2018 Mellanox Technologies
RDMA Performance
▪ RDMA performance is in line between host and containers
▪ Images built with hpccm
[Charts: RDMA BW (EDR) in MB/sec and RDMA Latency (EDR) in us vs. message size in bytes, comparing Host, Singularity, and Docker]
Courtesy of Yong Qin, Mellanox
49© 2018 Mellanox Technologies
RDMA Performance (Cont.)
▪ Larger variations are observed at small message sizes due to container runtime overheads
[Chart: RDMA latency (99th percentile) in us vs. message size in bytes, comparing Host, Singularity, and Docker]
Courtesy of Yong Qin, Mellanox
50GTC’18: HPC Containers
Courtesy of Nishanth Dandapanthula, HPC & DL Solutions Engineering, DellEMC
51GTC’18: HPC Containers
Courtesy of Nishanth Dandapanthula, HPC & DL Solutions Engineering, DellEMC
52GTC’18: HPC Containers
Courtesy of Nishanth Dandapanthula, HPC & DL Solutions Engineering, DellEMC
Image built with hpccm
53GTC’18: HPC Containers
Courtesy of Nishanth Dandapanthula, HPC & DL Solutions Engineering, DellEMC
54GTC’18: HPC Containers
KEY ISSUES
Addressing key issues across the ecosystem to increase container adoption
• Developers
• Posting containerized HPC apps to our registry
• Infrastructure for making containers: recipes, scripts (hpccm), validated images
• Admins
• Driver matching
• Multi-node containers
• End users
• Working with OEMs to assure best performance
• Using containers from our registry
Working with the community to deliver leading reference solutions
55GTC’18: HPC Containers
DRIVER VERSIONING
• Problem
• Container doesn’t know which kernel driver versions are installed on the target platform
• Mismatches may be problematic, e.g. between CUDA or MOFED user-space and kernel drivers
• One approach
• Appropriate kernel driver is loaded into the container with container runtime enabling
• Relevance
• There are available solutions for Docker and Singularity for the CUDA driver case
• This issue is being actively worked in the Mellanox driver case
• Plans for Mellanox drivers
• Nail down support matrix
• Share test cases for regression suite, based on hpccm input recipe and platform config
One down, more to go
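As a hedged aside, one way to confirm which host kernel driver the container runtime has exposed inside a running container is to query nvidia-smi, for example from Python (a sketch; it assumes the NVIDIA container runtime has injected the matching driver libraries and tools):

# sketch: report the host driver version visible inside the container
import subprocess

driver = subprocess.check_output(
    ['nvidia-smi', '--query-gpu=driver_version', '--format=csv,noheader'],
    universal_newlines=True).strip()
print('NVIDIA kernel driver exposed in this container:', driver)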
56GTC’18: HPC Containers
YOUR FEEDBACK
• What value do you see in using containers that motivates you to containerize your app?
• What ingredients do you most want to see
• inside of an HPC container?
• outside of an HPC container?
• What are your pain points around developing containers?
• What pain points do you hear about for deploying containers?
• Are you willing to try out
• NVIDIA’s containerized HPC apps?
• the hpccm infrastructure that helps with containerization of HPC apps?
57GTC’18: HPC Containers
CALL TO ACTION
• Try https://github.com/NVIDIA/hpc-container-maker - OSS project
• Find this content at GTC website for Monday Mar 26 11am by CJ Newburn
• App developers
• Build your containers with HPCCM & deploy on NGC, offer feedback
• Take the opportunity to focus efforts, collaborate around a reference
• System Admins
• Make your cluster container ready with Docker, LXC and/or Singularity runtimes
• Application users
• Pull and run containers from ngc.nvidia.com
• Enjoy HPC apps with greater ease and confidence
• OEMs
• Build container-ready systems with NGC
58GTC’18: HPC Containers
FAQ
• Supported systems - The containers must run on Pascal, Volta, and newer GPU-powered systems
• Testing & performance – NVIDIA may QA and benchmark the container
• License agreement – Developer has to comply with all the app license requirements
• Ownership – Container developer owns and retains all the rights, title, and interest in and to HPC containers
• Support – Developer must provide technical support to the end user of the container
• Cost – NVIDIA will host the containers on NGC for free
• Container removal – Both NVIDIA and the container developer have the right to take down the
container at any time for any reason
59GTC’18: HPC Containers
REQUEST FOR FEEDBACK: HPC RUNTIME CONTAINER V0.5
• OS: Ubuntu 16.04. Aligned with DL, current focus. Alternative: CentOS 7.
• CUDA version: 9.0. Backwards compatible [driver upgrade].
• CUDA type: runtime. For deployment, not development.
• Compiler: PGI, gcc [and Intel] runtimes. Actual compiler not needed for most usages.
• Comms libraries: CUDA-aware OpenMPI 3.0.0, MOFED 3.4-1 libs. OpenMPI CUDA enabling is underway via UCX.
• Scientific libraries: FFTW 3.3.7 [and MKL]. Most commonly used.
• Infrastructure: Python 2 and 3, HDF5 1.10.1. Commonly used, may add more tools.
Design and validate HPC container, derive from there for apps
60GTC’18: HPC Containers
REQUEST FOR FEEDBACK: HPC DEVEL CONTAINER V0.5
• OS: Ubuntu 16.04. Aligned with DL, current focus. Alternative: CentOS 7.
• CUDA version: 9.0. Backwards compatible [driver upgrade].
• CUDA type: devel. For development, includes CUDA toolkit.
• Compiler: PGI, gcc [and Intel] compilers. Compiler and its license in private images only.
• Comms libraries: CUDA-aware OpenMPI 3.0.0, MOFED 3.4-1 libs. OpenMPI CUDA enabling is underway via UCX.
• Scientific libraries: FFTW 3.3.7 [and MKL]. Most commonly used.
• Infrastructure: Python 2 and 3, HDF5 1.10.1. Commonly used, may add more tools.
Design and validate HPC container, derive from there for apps
61© 2018 Mellanox Technologies
Container Namespace Isolation
• Namespace isolation: Docker shares almost nothing; Singularity shares almost everything.
• File system (mount): Docker is isolated by default and can bind mount host volumes; Singularity mounts $HOME, /proc, /sys, /tmp, etc. from the host by default and can bind mount other host volumes.
• PID: Docker isolated; Singularity shared.
• Network: Docker isolated (can be expanded with full support); Singularity shared (with limited support).
Courtesy of Yong Qin, Mellanox
62© 2018 Mellanox Technologies
MPI
• MPI library: Docker: inside the container; Singularity: outside the container (host).
• MPI program binary: inside the container for both.
• Network: Docker: container; Singularity: host.
• Security: Docker: Docker daemon; Singularity: inherited from host.
Courtesy of Yong Qin, Mellanox