TRANSCRIPT
CJ Newburn, HPC Architect, NVIDIA Compute SW
Principal Engineer
HPC IN CONTAINERS: WHY CONTAINERS, WHY HPC, WHY NVIDIA
GTC’18, S8642, Monday March 26, 11am
2GTC’18: HPC Containers
OUTLINE
• Motivation
• What NVIDIA is doing
• Collaborations
• Requested feedback
• Call to action
3GTC’18: HPC Containers
WHY CONTAINERS: MOTIVATIONAL STORIES
• Hard to configure and install HPC apps
• App updates get delayed
• Lack of a reference design
• Many variants, some better than others
• Experimental/simulation hybrid molecular modeling as a service
• Will a given app run on a new platform?
• Better startup times with fewer libs loaded from bottlenecked metadata servers
• Encapsulating pipelines reduces complexity
War stories from the trenches
4GTC’18: HPC Containers
RUNNING A GPU APPLICATION: Customer Pain Points
RHEL 7.3, CUDA 8.0, Driver 375, 4x Pascal, Python 2.7
Ubuntu 16.04, CUDA 9.0, Driver 384, 4x Volta, Python 3.5
▪ “This framework requires installing 6 dependencies from sources”
▪ “I want to train my model on the cluster but it’s running RHEL 7”
▪ “Some machines in the cluster have different NVIDIA hardware & drivers”
▪ “How do I deploy a DL model/application at scale?”
DL Application
5GTC’18: HPC Containers
EXPERIMENTATION+MODELING
• Experimenters
• Run equipment to collect raw data
• Challenge: what’s signal vs. noise?
• Scientists who don’t do code or SW administration
• Augmenting with modeling
• Model helps filter out noise → more accurate with less processing time
• Provide container, e.g. NAMD on 1 GPU in a few hours
HPC modeling as a service
6GTC’18: HPC Containers
EASING THE TRANSITION TO IMPROVED SYSTEMS: Try before you buy, on your own workload
[Figure: moving workloads from legacy systems to the cloud and to the latest GPUs]
7GTC’18: HPC Containers
TRIMMING LIB DEPENDENCIES VIA CONTAINERS
• Size of dependent libraries can become huge
• SquashFS @ 4x can make fit in RAMdisk for faster access
• Metadata server I/O can become bottleneck, e.g. with 20 job groups
• Trim away shared libs and Python include searches
• Fix/patch to merge data locally and move to Lustre at the end of the job avoids conflicts
• RAMdisk access improvements can greatly reduce startup time, even with copy
• Relevant example
• ATLAS (CERN) simulations on Titan, courtesy of Sergey Panitkin of BNL
• Container build defines mount points, installs special versions with perf optimizations
• Optimizing for size and using RAMdisk halved setup time, reduced runtime by >2 minutes (9%)
Container is a good fit for applying special optimization steps
Background info for this use case courtesy of Adam Simpson, ORNL
8GTC’18: HPC Containers
PIPELINE EXAMPLE
• Consider a pipeline of many processes
• Each could have its own dependencies and require its own setup
• But each stage or the whole set of stages could be containerized
• Some relevant work: snakemake, SCI-F by Vanessa Sochat, Stanford: “The Scientific Filesystem,” Containers in HPC Symposium at UCAR, Boulder CO, https://sea.ucar.edu/conference/2018/containers.
Moving toward HPC as a service vs. becoming an app mechanic
Example pipeline stages: index → map → sort → index → report
9GTC’18: HPC Containers
WHY HIGH-PERFORMANCE COMPUTING
• Performance can depend on
• Tuning – discover and apply best-known methods
• Getting the latest version
• We are making a transition from “HPC for experts” to “HPC for the masses”
• Breadth of adoption may strongly depend on ease of use
• The time is ripe!
We in HPC care about performance; democratizing HPC
10GTC’18: HPC Containers
PROBLEMS ADDRESSED VIA CONTAINERIZATION
• Portability
• Repeatability
• Resource isolation
• New telemetry surface
• Bare metal performance, vs. VMs
• Parameterizability and control over runtime
Making it easier for users, admins and developers
11GTC’18: HPC Containers
DESIGNED FOR GPU-ACCELERATED SYSTEMS
RUN ON PASCAL- & VOLTA-POWERED SYSTEMS
Workstations, Supercomputing Clusters, Cloud Computing
12GTC’18: HPC Containers
OPENMPI DOCKERFILE VARIANTS: Real examples, lots of ways, some better than others
RUN OPENMPI_VERSION=3.0.0 && \
    wget -q -O - https://www.open-mpi.org/software/ompi/v3.0/downloads/openmpi-${OPENMPI_VERSION}.tar.gz | tar -xzf - && \
    cd openmpi-${OPENMPI_VERSION} && \
    ./configure --enable-orterun-prefix-by-default --with-cuda --with-verbs \
        --prefix=/usr/local/mpi --disable-getpwuid && \
    make -j"$(nproc)" install && \
    cd .. && rm -rf openmpi-${OPENMPI_VERSION} && \
    echo "/usr/local/mpi/lib" >> /etc/ld.so.conf.d/openmpi.conf && \
    ldconfig
ENV PATH /usr/local/mpi/bin:$PATH

WORKDIR /tmp
ADD http://www.open-mpi.org//software/ompi/v1.10/downloads/openmpi-1.10.7.tar.gz /tmp
RUN tar -xzf openmpi-1.10.7.tar.gz && \
    cd openmpi-* && ./configure --with-cuda=/usr/local/cuda \
        --enable-mpi-cxx --prefix=/usr && \
    make -j 32 && make install && cd /tmp \
    && rm -rf openmpi-*

RUN mkdir /logs
RUN wget -nv https://www.open-mpi.org/software/ompi/v1.10/downloads/openmpi-1.10.7.tar.gz && \
    tar -xzf openmpi-1.10.7.tar.gz && \
    cd openmpi-* && ./configure --with-cuda=/usr/local/cuda \
        --enable-mpi-cxx --prefix=/usr 2>&1 | tee /logs/openmpi_config && \
    make -j 32 2>&1 | tee /logs/openmpi_make && \
    make install 2>&1 | tee /logs/openmpi_install && cd /tmp \
    && rm -rf openmpi-*

RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        libopenmpi-dev \
        openmpi-bin \
        openmpi-common \
    && rm -rf /var/lib/apt/lists/*
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/openmpi/lib

RUN wget -q -O - https://www.open-mpi.org/software/ompi/v3.0/downloads/openmpi-3.0.0.tar.bz2 | tar -xjf - && \
    cd openmpi-3.0.0 && \
    CXX=pgc++ CC=pgcc FC=pgfortran F77=pgfortran ./configure \
        --prefix=/usr/local/openmpi --with-cuda=/usr/local/cuda --with-verbs --disable-getpwuid && \
    make -j4 install && \
    rm -rf /openmpi-3.0.0

COPY openmpi /usr/local/openmpi
WORKDIR /usr/local/openmpi
RUN /bin/bash -c "source /opt/pgi/LICENSE.txt && CC=pgcc CXX=pgc++ F77=pgf77 FC=pgf90 ./configure --with-cuda --prefix=/usr/local/openmpi"
RUN /bin/bash -c "source /opt/pgi/LICENSE.txt && make all install"
Slide annotations on the variants:
• Functional, simpler, but not CUDA or IB aware
• Enable many versions with parameters to a common interface
• Different compilers
• Bad layering
• Control environment
• Parameters vary
13GTC’18: HPC Containers
WHAT NVIDIA IS DOING
• Enabling
• Offerings
• Technology collaboration
14GTC’18: HPC Containers
SCOPE OF ENABLING PLANS
• Ecosystem: nurture a collaborative ecosystem around HPC containers
• Registry: host containerized applications, CUDA base containers
• Ingredients, recipes: Recommend and validate best practices
• HPC Containers: easily derive application containers from these
• Container technologies: GPU enabled
• System SW: OS, container runtime, and scheduler are GPU enabled
• Recommended platforms: known-good solutions for HPC apps
Better, more up-to-date results with less effort
15GTC’18: HPC Containers
MAKING IT EASIER WITH HPC CONTAINERS
• NVIDIA has experience collaborating with developers to containerize HPC apps
• Identifying, improving, creating ingredients
• Developing and optimizing recipes
• Codify those learnings
• Dockerfiles and other recipe files with tuned steps for each recommended ingredient
• Careful layering, for the sake of minimizing size, maximizing cacheability
• Validated combinations in specific HPC base containers from which app containers are derived
• Recipes for building platforms – container runtime, scheduler, OS, system
• Consistent approach to documentation
Potentially easier for non-expert end users
16GTC’18: HPC Containers
RAPID USER ADOPTION
HPC APPS CONTAINERS ON NVIDIA GPU CLOUD
RAPID CONTAINER ADDITION
GAMESS, CHROMA*, CANDLE, GROMACS, LAMMPS,
NAMD, RELION, Lattice Microbes, MILC*
*Coming soon
17GTC’18: HPC Containers
NVIDIA GPU CLOUD FOR HPC VISUALIZATION
ParaView with NVIDIA OptiX
ParaView with NVIDIA Holodeck
ParaView with NVIDIA IndeX
VMD with NVIDIA IndeX
18
NVIDIA CONTAINER RUNTIME: Enables GPU support in popular container runtimes
▶ NVIDIA-Docker makes GPU containers truly portable
▶ Integrates Linux container internals instead of wrapping specific runtimes (e.g. Docker)
▶ Better integration into the container ecosystem - Kubernetes (CRI), HPC (rootless)
▶ 2M downloads
[Stack diagram: containerized applications (Caffe, NAMD, TensorFlow, MILC) on container runtimes (Docker, LXC, CRI-O, etc.); nvidia-container-runtime plugs into the container runtime through the OCI runtime interface and uses libnvidia-container, which sits on CUDA, NVML, and the NVIDIA driver]
19
KUBERNETES ON NVIDIA GPUs
▶ GPU enhancements to mainline Kubernetes: get features faster than community releases
▶ Updated with each release of K8s (current version is v1.9) and close collaboration with community to upstream changes
▶ Minimize friction to adoption of Kubernetes on GPUs
▶ Fully open-source
[Stack diagram: Kubernetes on the NVIDIA Container Runtime on the NVIDIA driver]
20GTC’18: HPC Containers
HPC CONTAINER MAKER - HPCCM
• Collect and codify best practices
• Make recipe file creation easy, repeatable, modular, qualifiable
• Using this as a reference and a vehicle to drive collaboration
• Container implementation neutral
• Write Python code that calls primitives and building blocks vs. roll your own (a minimal sketch follows below)
• Leverage latest and greatest building blocks
“h-p-see-um”
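As a quick illustration, a minimal hpccm input recipe might look like the following (a sketch only; the building blocks and parameters shown here follow the examples later in this deck, and the full documentation is in RECIPES.md):

# Minimal hpccm input recipe (sketch): primitives plus parameterized building blocks
Stage0 += baseimage(image='nvidia/cuda:9.0-devel')        # primitive: choose the base image
Stage0 += apt_get(ospackages=['gcc', 'g++', 'gfortran'])  # building block: OS packages
Stage0 += openmpi(version='3.0.0')                        # building block: CUDA-aware OpenMPI
Stage0 += shell(commands=['echo recipe works'])           # primitive: arbitrary shell step

The same recipe is then fed to the hpccm CLI tool to emit either a Dockerfile or a Singularity recipe file.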
21GTC’18: HPC Containers
BIG PICTURE
HPC Container Maker: hpccm
• Reference hpccm input recipe: Python file with references to primitives and parameterized building blocks
• hpccm CLI tool: script that transforms the input recipe into a container recipe file, using the primitive and building block (recipe) implementations and base images
• Container spec file: Dockerfile or Singularity recipe file
• Container build: $ docker build … (Buildkit, buildah, …) or $ singularity build …; docker2singularity can also convert a Docker image into a Singularity image
• Container image: Docker image or Singularity image
22GTC’18: HPC Containers
HPCCM CONCEPTS AND TERMINOLOGY
• hpccm input recipe file: what hpccm ingests
• container recipe file: what hpccm produces, e.g. Dockerfile, Singularity recipe file
• primitive: line in an hpccm input recipe file that has a 1:1 mapping with a primitive implementation line in the container recipe file
• building block: line in an hpccm input recipe file with a 1:many primitive mapping; the mapping is codified in the hpccm implementation; these are parameterized (see the sketch after this list)
• recipe implementations: collection of implementations of primitives and building blocks
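To make the 1:1 vs. 1:many distinction concrete, here is a small sketch using the primitives and building blocks that appear later in this deck:

# primitive: maps 1:1 to a line in the container recipe file
Stage0 += copy(src='a', dest='b')   # becomes "COPY a b" in a Dockerfile

# building block: maps 1:many; hpccm expands it into apt-get, wget, configure,
# make, and ENV lines, parameterized by version, prefix, toolchain, etc.
Stage0 += openmpi(version='3.0.0')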
23GTC’18: HPC Containers
RECIPES INCLUDED WITH CONTAINER MAKER
• HPC base recipe with GNU compilers
• Ubuntu 16.04
• CUDA 9.0
• Python 2 and 3
• GNU compilers (upstream)
• Mellanox OFED 3.4-1.0.0.0
• OpenMPI 3.0.0
• FFTW 3.3.7
• HDF5 1.10.1
• HPC base recipe with PGI compilers
• Ubuntu 16.04
• CUDA 9.0
• Python 2 and 3
• PGI compilers 17.10
• Mellanox OFED 3.4-1.0.0.0
• OpenMPI 3.0.0
• FFTW 3.3.7
• HDF5 1.10.1
Shown in current build order
HPC application samples coming
24GTC’18: HPC Containers
BUILDING AN HPC APPLICATION IMAGE
1. Use the HPC base image as your starting point
2. Generate a Dockerfile from the HPC base recipe and manually edit it to add the steps to build your application
3. Copy the HPC base recipe file and add your application build steps to the recipe
Analogous workflows for Singularity
1. Base recipe → Dockerfile → Base image → App Dockerfile
2. Base recipe → Dockerfile → App Dockerfile
3. Base recipe → App recipe (a sketch of this option follows below)
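For option 3, a hedged sketch of the copied-and-extended recipe (the application name, URL, and build commands below are placeholders, not a published recipe):

# Start from a copy of the HPC base recipe (compilers, OFED, OpenMPI, FFTW, HDF5)
# ... base recipe contents as shown on the following slides ...

# Hypothetical application build steps appended to the copied recipe
Stage0 += shell(commands=['git clone https://example.com/myapp.git /myapp',  # placeholder URL
                          'cd /myapp',
                          'make -j4',
                          'make install'])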
25GTC’18: HPC Containers
HIGHER LEVEL ABSTRACTION: Building block encapsulates simplified best practices, avoids duplication
# OpenMPI version 3.0.0
RUN apt-get update -y && \
    apt-get install -y --no-install-recommends \
        file \
        hwloc && \
    rm -rf /var/lib/apt/lists/*
RUN mkdir -p /tmp && wget -q --no-check-certificate -P /tmp https://www.open-mpi.org/software/ompi/v3.0/downloads/openmpi-3.0.0.tar.bz2 && \
    tar -x -f /tmp/openmpi-3.0.0.tar.bz2 -C /tmp -j && \
    cd /tmp/openmpi-3.0.0 && ./configure --prefix=/usr/local/openmpi --disable-getpwuid --with-cuda --without-verbs && \
    make -j4 && \
    make -j4 install && \
    rm -rf /tmp/openmpi-3.0.0.tar.bz2 /tmp/openmpi-3.0.0
ENV PATH=/usr/local/openmpi/bin:$PATH \
    LD_LIBRARY_PATH=/usr/local/openmpi/lib:$LD_LIBRARY_PATH
ompi = openmpi(version='3.0.0', toolchain=tc)
Stage0 += ompi
26GTC’18: HPC Containers
"""HPC Base imageContents:
CUDA version 9.0FFTW version 3.3.7GNU compilers (upstream)HDF5 version 1.10.1Mellanox OFED version 3.4-1.0.0.0OpenMPI version 3.0.0Python 2 and 3 (upstream)
"""
Stage0 += comment(__doc__, reformat=False)
Stage0 += baseimage(image='nvidia/cuda:9.0-devel', _as='devel')
# Python (use upstream)Stage0 += apt_get(ospackages=['python', 'python3'])
# Compilers (use upstream)Stage0 += apt_get(ospackages=['gcc', 'g++', 'gfortran'])
# Create a toolchaintc = hpccm.toolchain(CC='gcc', CXX='g++', F77='gfortran', F90='gfortran',
FC='gfortran', CUDA_HOME='/usr/local/cuda')
# Mellanox OFEDofed = mlnx_ofed(version='3.4-1.0.0.0')Stage0 += ofed
# OpenMPIompi = openmpi(version='3.0.0', toolchain=tc)Stage0 += ompi
# FFTWfftw = fftw(version='3.3.7', toolchain=tc)Stage0 += fftw
# HDF5hdf5 = hdf5(version='1.10.1', toolchain=tc)Stage0 += hdf5
#
# HPC Base image
#
# Contents:
#   CUDA version 9.0
#   FFTW version 3.3.7
#   GNU compilers (upstream)
#   HDF5 version 1.10.1
#   Mellanox OFED version 3.4-1.0.0.0
#   OpenMPI version 3.0.0
#   Python 2 and 3 (upstream)
#

FROM nvidia/cuda:9.0-devel AS devel

RUN apt-get update -y && \
    apt-get install -y --no-install-recommends \
        python \
        python3 && \
    rm -rf /var/lib/apt/lists/*

RUN apt-get update -y && \
    apt-get install -y --no-install-recommends \
        gcc \
        g++ \
        gfortran && \
    rm -rf /var/lib/apt/lists/*

# Mellanox OFED version 3.4-1.0.0.0
RUN apt-get update -y && \
    apt-get install -y --no-install-recommends \
        libnl-3-200 \
        libnl-route-3-200 \
        libnuma1 \
        wget && \
    rm -rf /var/lib/apt/lists/*
RUN mkdir -p /tmp && wget -q --no-check-certificate -P /tmp http://content.mellanox.com/ofed/MLNX_OFED-3.4-1.0.0.0/MLNX_OFED_LINUX-3.4-1.0.0.0-ubuntu16.04-x86_64.tgz && \
    tar -x -f /tmp/MLNX_OFED_LINUX-3.4-1.0.0.0-ubuntu16.04-x86_64.tgz -C /tmp -z && \
    dpkg --install /tmp/MLNX_OFED_LINUX-3.4-1.0.0.0-ubuntu16.04-x86_64/DEBS/libibverbs1_*_amd64.deb && \
    dpkg --install /tmp/MLNX_OFED_LINUX-3.4-1.0.0.0-ubuntu16.04-x86_64/DEBS/libibverbs-dev_*_amd64.deb && \
    dpkg --install /tmp/MLNX_OFED_LINUX-3.4-1.0.0.0-ubuntu16.04-x86_64/DEBS/libmlx5-1_*_amd64.deb && \
    dpkg --install /tmp/MLNX_OFED_LINUX-3.4-1.0.0.0-ubuntu16.04-x86_64/DEBS/ibverbs-utils_*_amd64.deb && \
    rm -rf /tmp/MLNX_OFED_LINUX-3.4-1.0.0.0-ubuntu16.04-x86_64.tgz /tmp/MLNX_OFED_LINUX-3.4-1.0.0.0-ubuntu16.04-x86_64

# OpenMPI version 3.0.0
RUN apt-get update -y && \
    apt-get install -y --no-install-recommends \
        file \
        hwloc \
        openssh-client \
        wget && \
    rm -rf /var/lib/apt/lists/*
RUN mkdir -p /tmp && wget -q --no-check-certificate -P /tmp https://www.open-mpi.org/software/ompi/v3.0/downloads/openmpi-3.0.0.tar.bz2 && \
    tar -x -f /tmp/openmpi-3.0.0.tar.bz2 -C /tmp -j && \
    cd /tmp/openmpi-3.0.0 && CC=gcc CXX=g++ F77=gfortran F90=gfortran FC=gfortran ./configure --prefix=/usr/local/openmpi --disable-getpwuid --enable-orterun-prefix-by-default --with-cuda=/usr/local/cuda --with-verbs && \
    make -j4 && \
    make -j4 install && \
    rm -rf /tmp/openmpi-3.0.0.tar.bz2 /tmp/openmpi-3.0.0
ENV PATH=/usr/local/openmpi/bin:$PATH \
    LD_LIBRARY_PATH=/usr/local/openmpi/lib:$LD_LIBRARY_PATH

# FFTW version 3.3.7
RUN apt-get update -y && \
    apt-get install -y --no-install-recommends \
        file \
        make \
        wget && \
    rm -rf /var/lib/apt/lists/*
RUN mkdir -p /tmp && wget -q --no-check-certificate -P /tmp ftp://ftp.fftw.org/pub/fftw/fftw-3.3.7.tar.gz && \
    tar -x -f /tmp/fftw-3.3.7.tar.gz -C /tmp -z && \
    cd /tmp/fftw-3.3.7 && CC=gcc CXX=g++ F77=gfortran F90=gfortran FC=gfortran ./configure --prefix=/usr/local/fftw --enable-shared --enable-openmp --enable-threads --enable-sse2 && \
    make -j4 && \
    make -j4 install && \
    rm -rf /tmp/fftw-3.3.7.tar.gz /tmp/fftw-3.3.7
ENV LD_LIBRARY_PATH=/usr/local/fftw/lib:$LD_LIBRARY_PATH

# HDF5 version 1.10.1
RUN apt-get update -y && \
    apt-get install -y --no-install-recommends \
        file \
        make \
        wget \
        zlib1g-dev && \
    rm -rf /var/lib/apt/lists/*
RUN mkdir -p /tmp && wget -q --no-check-certificate -P /tmp http://www.hdfgroup.org/ftp/HDF5/releases/hdf5-1.10/hdf5-1.10.1/src/hdf5-1.10.1.tar.bz2 && \
    tar -x -f /tmp/hdf5-1.10.1.tar.bz2 -C /tmp -j && \
    cd /tmp/hdf5-1.10.1 && CC=gcc CXX=g++ F77=gfortran F90=gfortran FC=gfortran ./configure --prefix=/usr/local/hdf5 --enable-cxx --enable-fortran && \
    make -j4 && \
    make -j4 install && \
    rm -rf /tmp/hdf5-1.10.1.tar.bz2 /tmp/hdf5-1.10.1
ENV PATH=/usr/local/hdf5/bin:$PATH \
    HDF5_DIR=/usr/local/hdf5 \
    LD_LIBRARY_PATH=/usr/local/hdf5/lib:$LD_LIBRARY_PATH
27GTC’18: HPC Containers
MULTI-STAGE BUILDS

Generated Dockerfile (excerpt):

FROM nvidia/cuda:9.0-devel AS devel
...
# OpenMPI version 3.0.0
RUN apt-get update -y && \
    apt-get install -y --no-install-recommends \
        file \
        hwloc \
        openssh-client \
        wget && \
    rm -rf /var/lib/apt/lists/*
RUN mkdir -p /tmp && wget -q --no-check-certificate -P /tmp https://www.open-mpi.org/software/ompi/v3.0/downloads/openmpi-3.0.0.tar.bz2 && \
    tar -x -f /tmp/openmpi-3.0.0.tar.bz2 -C /tmp -j && \
    cd /tmp/openmpi-3.0.0 && CC=gcc CXX=g++ F77=gfortran F90=gfortran FC=gfortran ./configure --prefix=/usr/local/openmpi --disable-getpwuid --enable-orterun-prefix-by-default --with-cuda=/usr/local/cuda --with-verbs && \
    make -j4 && \
    make -j4 install && \
    rm -rf /tmp/openmpi-3.0.0.tar.bz2 /tmp/openmpi-3.0.0
ENV PATH=/usr/local/openmpi/bin:$PATH \
    LD_LIBRARY_PATH=/usr/local/openmpi/lib:$LD_LIBRARY_PATH
...

FROM nvidia/cuda:9.0-runtime
...
# OpenMPI
RUN apt-get update -y && \
    apt-get install -y --no-install-recommends \
        hwloc \
        openssh-client && \
    rm -rf /var/lib/apt/lists/*
COPY --from=0 /usr/local/openmpi /usr/local/openmpi
ENV PATH=/usr/local/openmpi/bin:$PATH \
    LD_LIBRARY_PATH=/usr/local/openmpi/lib:$LD_LIBRARY_PATH
...

hpccm input recipe (excerpt):

Stage0 += baseimage(image='nvidia/cuda:9.0-devel', _as='devel')
...
# OpenMPI
ompi = openmpi(version='3.0.0', toolchain=tc)
Stage0 += ompi
...

######
# Runtime image
######
Stage1 += baseimage(image='nvidia/cuda:9.0-runtime')
...
# OpenMPI
Stage1 += ompi.runtime()
...
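For reference, a minimal end-to-end two-stage recipe assembled from the excerpts above might look like the following sketch (defaults and package choices follow the building blocks shown earlier):

# hypothetical minimal multi-stage hpccm input recipe (sketch)
Stage0 += baseimage(image='nvidia/cuda:9.0-devel', _as='devel')  # build stage
Stage0 += apt_get(ospackages=['gcc', 'g++', 'gfortran'])         # compilers
ompi = openmpi(version='3.0.0')                                  # CUDA-aware OpenMPI building block
Stage0 += ompi

Stage1 += baseimage(image='nvidia/cuda:9.0-runtime')             # deployment stage
Stage1 += ompi.runtime()                                         # copy only the runtime pieces of OpenMPI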
28GTC’18: HPC Containers
PARAMETERIZED BUILDING BLOCKS
• openmpi(check=False,                     # run "make check"?
          configure_opts=['--disable-getpwuid', '--enable-orterun-prefix-by-default'],
          cuda=True,
          directory='',                    # path to source in build context
          infiniband=True,
          ospackages=['file', 'hwloc', 'openssh-client', 'wget'],
          prefix='/usr/local/openmpi',
          toolchain=toolchain(),
          version='3.0.0')                 # version to download
• mlnx_ofed(ospackages=['libnl-3-200', 'libnl-route-3-200', 'libnuma1', 'wget'],
            packages=['libibverbs1', 'libibverbs-dev', 'libmlx5-1', 'ibverbs-utils'],
            version='3.4-1.0.0.0')         # version to download
Parameters enable specialization; implementations invoke Python code
Also: apt-get, FFTW, HDF5, Linux OFED, PGI compiler
Full recipe documentation can be found in RECIPES.md
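As an illustration of how these parameters might be used to specialize a build (the values below are only an example; the actual defaults are documented in RECIPES.md):

# hypothetical specialization of the OpenMPI building block (sketch)
ompi = openmpi(version='3.0.0',
               cuda=True,                          # build CUDA-aware MPI
               infiniband=False,                   # skip verbs on a TCP-only cluster
               prefix='/opt/openmpi',
               configure_opts=['--disable-getpwuid'],
               toolchain=tc)                       # reuse the toolchain defined earlier
Stage0 += ompi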
29GTC’18: HPC Containers
CONTAINER IMPLEMENTATION ABSTRACTION
• baseimage(image='ubuntu:16.04') → Dockerfile: FROM ubuntu:16.04 | Singularity: Bootstrap: docker / From: ubuntu:16.04
• shell(commands=['a', 'b', 'c']) → Dockerfile: RUN a && \ b && \ c | Singularity: %post / a / b / c
• copy(src='a', dest='b') → Dockerfile: COPY a b | Singularity: %files / a b
Single source to either Docker or Singularity
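For example, the same input recipe can be rendered to either format with the hpccm CLI tool (assuming its --recipe and --format options; check the project README for the exact flags):

$ hpccm --recipe recipe.py --format docker > Dockerfile
$ hpccm --recipe recipe.py --format singularity > Singularity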
30GTC’18: HPC Containers
FULL PROGRAMMING LANGUAGE AVAILABLE
# get and validate precision
VALID_PRECISION = ['single', 'double', 'mixed']
precision = os.environ.get('LAMMPS_PRECISION', 'single')
if precision not in VALID_PRECISION:
    raise ValueError('Invalid precision')
...
Stage0 += shell(commands=[f'make -f Makefile.linux.{precision}', ...])
...
Conditional branching, validation, etc. in hpccm input recipe
Courtesy of Logan Herche
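An alternative to reading environment variables is hpccm's USERARG mechanism, which the MILC recipe later in this deck uses; a sketch, assuming user arguments are passed on the hpccm command line (e.g. --userarg LAMMPS_PRECISION=double):

# sketch: build-time parameter via USERARG instead of os.environ
precision = USERARG.get('LAMMPS_PRECISION', 'single')
if precision not in ['single', 'double', 'mixed']:
    raise ValueError('Invalid precision')
Stage0 += shell(commands=['make -f Makefile.linux.{}'.format(precision)])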
31GTC’18: HPC Containers
ENVISIONED FLOW: Accelerating container creation and usage
[Flow diagram elements: hpccm; GPU-enabled technologies; validated HPC devel container and HPC runtime container; app source → app test binary → app final binary → app image; NVIDIA registry (NGC); GPU clusters with a validated container runtime, scheduler, and OS]
32GTC’18: HPC Containers
V0.5 OUTSIDE-CONTAINER TRADE-OFFS
• CUDA: version 9.0. Supports Kepler through Volta, highest performance.
• Container runtimes: Docker, LXC, Shifter, Singularity. Docker has the best GPU support today; NVIDIA is investing in LXC for rootless.
• Orchestration & scheduling: SLURM, Kubernetes. SLURM is widely used in HPC; Kubernetes is widely used in the cloud.
• GPU enablement: NVIDIA Container Runtime SDK. OCI compliant, enables multiple container runtimes, multi-node support.
• OS: Ubuntu 16.04, CentOS 7. Application-based choice; Ubuntu has more testing for GPU-enabled containers; CentOS uses RPMs.
Situation- and environment-based choices
33GTC’18: HPC Containers
SAMPLE DOCKER FILES
• We’re in the process of normalizing our containers with respect to these devel and runtime HPC container offerings
• GROMACS
• MILC
34GTC’18: HPC Containers
GROMACS DOCKERFILE PART 1; BUILD STAGE
FROM nvidia/cuda:9.0-devel-ubuntu16.04 AS devel
RUN apt-get update -y && \
apt-get install -y --no-install-recommends \
ca-certificates cmake file git hwloc \
libibverbs-dev openssh-client python wget && \
rm -rf /var/lib/apt/lists/*
RUN mkdir -p /tmp && \
wget -q --no-check-certificate -P /tmp https://www.open-mpi.org/software/ompi/v3.0/downloads/openmpi-3.0.0.tar.bz2 && \
tar -x -f /tmp/openmpi-3.0.0.tar.bz2 -C /tmp -j && \
cd /tmp/openmpi-3.0.0 && \
./configure --prefix=/opt/openmpi --enable-mpi-cxx --with-cuda \
--with-verbs && \
make -j32 && \
make -j32 install && \
rm -rf /tmp/openmpi-3.0.0.tar.bz2 /tmp/openmpi-3.0.0
ENV LD_LIBRARY_PATH=/opt/openmpi/lib:$LD_LIBRARY_PATH \
PATH=/opt/openmpi/bin:$PATH
Initialize build stage
Install packages and cleanup
Install OpenMPI
35GTC’18: HPC Containers
GROMACS DOCKERFILE PART 2
RUN mkdir -p /gromacs/install && \
mkdir -p /gromacs/builds && \
mkdir -p /tmp && git -C /tmp clone --depth=1 --branch v2018 \
https://github.com/gromacs/gromacs && \
mv /tmp/gromacs /gromacs/src && \
cd /gromacs/builds && \
CC=gcc CXX=g++ cmake /gromacs/src -DCMAKE_BUILD_TYPE=Release \
-DCMAKE_INSTALL_PREFIX=/gromacs/install \
-DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda \
-DGMX_BUILD_OWN_FFTW=ON -DGMX_GPU=ON -DGMX_MPI=OFF \
-DGMX_OPENMP=ON -DGMX_PREFER_STATIC_LIBS=ON \
-DMPIEXEC_PREFLAGS=--allow-run-as-root \
-DREGRESSIONTEST_DOWNLOAD=ON && \
make -j && \
make install && \
make check
Build GROMACS
36GTC’18: HPC Containers
GROMACS DOCKERFILE PART 3; RUNTIME STAGE
FROM nvidia/cuda:9.0-runtime-ubuntu16.04
RUN apt-get update -y && \
apt-get install -y --no-install-recommends \
hwloc \
libgomp1 \
libibverbs-dev \
openssh-client \
python && \
rm -rf /var/lib/apt/lists/*
COPY --from=devel /opt/openmpi /opt/openmpi
ENV LD_LIBRARY_PATH=/opt/openmpi/lib:$LD_LIBRARY_PATH \
PATH=/opt/openmpi/bin:$PATH
COPY --from=devel /gromacs/install /gromacs/install
ENV PATH=$PATH:/gromacs/install/bin
WORKDIR /workspace
Initialize release stage
Install packages and cleanup
Copy OpenMPI from build
Copy GROMACS from build
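For comparison, a hedged sketch of how this GROMACS Dockerfile might be expressed as an hpccm input recipe, using only the primitives and building blocks shown elsewhere in this deck (the parameter details are illustrative, not the published recipe):

# Devel stage (sketch)
Stage0 += baseimage(image='nvidia/cuda:9.0-devel-ubuntu16.04', _as='devel')
Stage0 += apt_get(ospackages=['ca-certificates', 'cmake', 'git', 'hwloc',
                              'libibverbs-dev', 'openssh-client', 'python', 'wget'])
ompi = openmpi(version='3.0.0', prefix='/opt/openmpi',
               configure_opts=['--enable-mpi-cxx'])
Stage0 += ompi
Stage0 += shell(commands=[
    'mkdir -p /gromacs/builds && git -C /tmp clone --depth=1 --branch v2018 https://github.com/gromacs/gromacs',
    'mv /tmp/gromacs /gromacs/src',
    'cd /gromacs/builds && CC=gcc CXX=g++ cmake /gromacs/src -DCMAKE_BUILD_TYPE=Release '
    '-DCMAKE_INSTALL_PREFIX=/gromacs/install -DGMX_BUILD_OWN_FFTW=ON -DGMX_GPU=ON -DGMX_OPENMP=ON',
    'cd /gromacs/builds && make -j && make install'])

# Runtime stage (sketch)
Stage1 += baseimage(image='nvidia/cuda:9.0-runtime-ubuntu16.04')
Stage1 += apt_get(ospackages=['hwloc', 'libgomp1', 'libibverbs-dev', 'openssh-client', 'python'])
Stage1 += ompi.runtime()
Stage1 += copy(_from='devel', src='/gromacs/install', dest='/gromacs/install')
Stage1 += workdir(directory='/workspace')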
37GTC’18: HPC Containers
MILC DOCKERFILE PART 1; BUILD STAGE
FROM nvidia/cuda:9.0-devel-ubuntu16.04 AS devel
RUN apt-get update -y && \
apt-get install -y --no-install-recommends \
autoconf automake ca-certificates cmake dapl2-utils \
file git hwloc ibutils ibverbs-utils \
infiniband-diags libdapl-dev libibcm-dev \
libibmad5 libibverbs-dev libibverbs1 \
libmlx4-1 libmlx4-dev libmlx5-1 libmlx5-dev \
libnuma-dev librdmacm-dev librdmacm1 opensm \
openssh-client rdmacm-utils wget && \
rm -rf /var/lib/apt/lists/*
Initialize build stage
Install packages and cleanup
38GTC’18: HPC Containers
MILC DOCKERFILE PART 2
RUN mkdir -p /tmp && wget -q --no-check-certificate -P /tmp https://www.open-mpi.org/software/ompi/v3.0/downloads/openmpi-3.0.0.tar.bz2 && \
tar -x -f /tmp/openmpi-3.0.0.tar.bz2 -C /tmp -j && \
cd /tmp/openmpi-3.0.0 && \
./configure --prefix=/opt/openmpi --enable-mpi-cxx \
--with-cuda --with-verbs && \
make -j32 && \
make -j32 install && \
rm -rf /tmp/openmpi-3.0.0.tar.bz2 /tmp/openmpi-3.0.0
ENV LD_LIBRARY_PATH=/opt/openmpi/lib:$LD_LIBRARY_PATH \
PATH=/opt/openmpi/bin:$PATH
Build OpenMPI
39GTC’18: HPC Containers
MILC DOCKERFILE PART 3
WORKDIR /quda
RUN mkdir -p /tmp && git -C /tmp clone --depth=1 --branch release/0.8.x https://github.com/lattice/quda && \
mv /tmp/quda /quda/src && \
mkdir -p /quda/build && \
cd /quda/build && \
cmake ../src -DCMAKE_BUILD_TYPE=RELEASE \
-DQUDA_DIRAC_CLOVER=ON -DQUDA_DIRAC_DOMAIN_WALL=ON \
-DQUDA_DIRAC_STAGGERED=ON \
-DQUDA_DIRAC_TWISTED_CLOVER=ON \
-DQUDA_DIRAC_TWISTED_MASS=ON -DQUDA_DIRAC_WILSON=ON \
-DQUDA_FORCE_GAUGE=ON -DQUDA_FORCE_HISQ=ON \
-DQUDA_GPU_ARCH=sm_70 -DQUDA_INTERFACE_MILC=ON \
-DQUDA_INTERFACE_QDP=ON -DQUDA_LINK_HISQ=ON \
-DQUDA_MPI=ON && \
make -j32 && \
rm -rf /quda/src
Build QUDA
40GTC’18: HPC Containers
MILC DOCKERFILE PART 4
RUN mkdir -p /tmp && \
git -C /tmp clone --depth=1 https://github.com/milc-qcd/milc_qcd && \
mv /tmp/milc_qcd /milc && \
cd /milc/ks_imp_rhmc/ && \
cp /milc/Makefile /milc/ks_imp_rhmc/ && \
sed -i 's/WANTQUDA\(.*\)=.*/WANTQUDA\1= true/g' Makefile && \
sed -i 's/\(WANT_.*_GPU\)\(.*\)= .*/\1\2= true/g' Makefile && \
sed -i 's/QUDA_HOME\(.*\)= .*/QUDA_HOME\1= \/quda\/build/g' Makefile && \
sed -i 's/CUDA_HOME\(.*\)= .*/CUDA_HOME\1= \/usr\/local\/cuda/g' Makefile && \
sed -i 's/#\?MPP = .*/MPP = true/g' Makefile && \
sed -i 's/#\?CC = .*/CC = mpicc/g' Makefile && \
sed -i 's/LD\(\s+\)= .*/LD\1= mpicxx/g' Makefile && \
sed -i 's/PRECISION = \d+/PRECISION = 2/g' Makefile && \
sed -i 's/WANTQIO = .*/WANTQIO = #true or blank. Implies HAVEQMP./g' Makefile && \
sed -i 's/CGEOM =.*-DFIX_NODE_GEOM.*/CGEOM = #-DFIX_NODE_GEOM/g' Makefile && \
C_INCLUDE_PATH=/quda/build/include make su3_rhmd_hisq
Build MILC
41GTC’18: HPC Containers
MILC DOCKERFILE PART 5; RUNTIME STAGE
FROM nvidia/cuda:9.0-runtime-ubuntu16.04
RUN apt-get update -y && \
apt-get install -y --no-install-recommends \
hwloc \
libibverbs1 \
libnuma1 \
librdmacm1 \
openssh-client && \
rm -rf /var/lib/apt/lists/*
COPY --from=devel /opt/openmpi /opt/openmpi
ENV LD_LIBRARY_PATH=/opt/openmpi/lib:$LD_LIBRARY_PATH \
PATH=/opt/openmpi/bin:/milc:$PATH
COPY --from=devel /milc/ks_imp_rhmc/su3_rhmd_hisq /milc/su3_rhmd_hisq
COPY examples /workspace/examples
WORKDIR /workspace
Initialize release stage
Install packages and cleanup
Copy OpenMPI from build stage
Copy MILC from build stage
Copy examples into container
Multi-node MPI is enabled
42GTC’18: HPC Containers
MILC HPCCM INPUT RECIPE FILE
Ubuntu 16.04, CUDA 9.0, QUDA, MPI and MILC.
Build with:
nvidia-docker build -t milc .
Run with:
nvidia-docker run -it milc
Header
43GTC’18: HPC Containers
MILC HPCCM INPUT RECIPE FILE, PART 1
# pylint: disable=invalid-name, undefined-variable, used-before-assignment
# pylama: ignore=E0602

gpu_arch = USERARG.get('GPU_ARCH', 'sm_70')

# add docstring to Dockerfile
Stage0 += comment(__doc__.strip(), reformat=False)

###############################################################################
# Devel stage
###############################################################################
Stage0.name = 'devel'
Stage0 += baseimage(image='nvidia/cuda:9.0-devel-ubuntu16.04', AS=Stage0.name)

Stage0 += apt_get(ospackages=['autoconf', 'automake', 'cmake', 'git', 'ca-certificates'])

Stage0 += ofed()

mpi_prefix = '/opt/openmpi'
ompi = openmpi(configure_opts=['--enable-mpi-cxx'], prefix=mpi_prefix,
               parallel=32, version="3.0.0")
Stage0 += ompi
44GTC’18: HPC Containers
MILC HPCCM INPUT RECIPE FILE, PART 2: QUDA
# build QUDA
git = hpccm.git()
quda_build_dir = '/quda/build'
Stage0 += workdir(directory="/quda")
Stage0 += shell(commands=[
    git.clone_step(repository="https://github.com/lattice/quda", branch="release/0.8.x"),
    'mv /tmp/quda /quda/src',
    'mkdir -p {}'.format(quda_build_dir),
    'cd {}'.format(quda_build_dir),
    ('cmake ../src ' +
     '-DCMAKE_BUILD_TYPE=RELEASE ' +
     '-DQUDA_DIRAC_CLOVER=ON ' +
     '-DQUDA_DIRAC_DOMAIN_WALL=ON ' +
     '-DQUDA_DIRAC_STAGGERED=ON ' +
     '-DQUDA_DIRAC_TWISTED_CLOVER=ON ' +
     '-DQUDA_DIRAC_TWISTED_MASS=ON ' +
     '-DQUDA_DIRAC_WILSON=ON ' +
     '-DQUDA_FORCE_GAUGE=ON ' +
     '-DQUDA_FORCE_HISQ=ON ' +
     '-DQUDA_GPU_ARCH={} '.format(gpu_arch) +
     '-DQUDA_INTERFACE_MILC=ON ' +
     '-DQUDA_INTERFACE_QDP=ON ' +
     '-DQUDA_LINK_HISQ=ON ' +
     '-DQUDA_MPI=ON'),
    'make -j32',
    'rm -rf /quda/src'])
45GTC’18: HPC Containers
MILC HPCCM INPUT RECIPE FILE, PART 3: MILC
# build MILC
Stage0 += shell(commands=[
    git.clone_step(repository="https://github.com/milc-qcd/milc_qcd"),
    'mv /tmp/milc_qcd /milc',
    'cd /milc/ks_imp_rhmc/',
    'cp /milc/Makefile /milc/ks_imp_rhmc/',
    r"sed -i 's/WANTQUDA\(.*\)=.*/WANTQUDA\1= true/g' Makefile",
    r"sed -i 's/\(WANT_.*_GPU\)\(.*\)= .*/\1\2= true/g' Makefile",
    r"sed -i 's/QUDA_HOME\(.*\)= .*/QUDA_HOME\1= \/quda\/build/g' Makefile",
    r"sed -i 's/CUDA_HOME\(.*\)= .*/CUDA_HOME\1= \/usr\/local\/cuda/g' Makefile",
    r"sed -i 's/#\?MPP = .*/MPP = true/g' Makefile",
    r"sed -i 's/#\?CC = .*/CC = mpicc/g' Makefile",
    r"sed -i 's/LD\(\s+\)= .*/LD\1= mpicxx/g' Makefile",
    r"sed -i 's/PRECISION = \d+/PRECISION = 2/g' Makefile",
    r"sed -i 's/WANTQIO = .*/WANTQIO = #true or blank. Implies HAVEQMP./g' Makefile",
    r"sed -i 's/CGEOM =.*-DFIX_NODE_GEOM.*/CGEOM = #-DFIX_NODE_GEOM/g' Makefile",
    'C_INCLUDE_PATH={}/include make su3_rhmd_hisq'.format(quda_build_dir)])
46GTC’18: HPC Containers
MILC HPCCM INPUT RECIPE FILE, PART 4: RELEASE
###############################################################################
# Release stage
###############################################################################
Stage1 += baseimage(image='nvidia/cuda:9.0-runtime-ubuntu16.04')
Stage1 += apt_get(ospackages=['libnuma1', 'ssh', 'libibverbs1', 'librdmacm1'])
Stage1 += ompi.runtime()
Stage1 += copy(_from=Stage0.name, src='/milc/ks_imp_rhmc/su3_rhmd_hisq', dest='/milc/su3_rhmd_hisq')
Stage1 += copy(src='examples', dest='/workspace/examples')
Stage1 += workdir(directory='/workspace')
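To turn this input recipe into the Dockerfile shown on the preceding slides (assuming the recipe is saved as milc.py and that the hpccm CLI accepts --recipe, --format, and --userarg options; check the project README for the exact flags):

$ hpccm --recipe milc.py --format docker --userarg GPU_ARCH=sm_70 > Dockerfile
$ nvidia-docker build -t milc .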
47GTC’18: HPC Containers
COLLABORATIONS
• Mellanox [Yong Qin]
• Negligible overhead from using containers
• Working on resolving driver versioning issues
• Collaborating on best recipes, including for multi-node
• Dell [Nishanth Dandapanthula]
• Negligible overhead from using containers
• Ease of use
• Universities and labs
• Evaluations, feedback, use cases
Nurturing a communal effort
48© 2018 Mellanox Technologies
RDMA Performance
▪ RDMA performance is in line between host and containers
▪ Images built with hpccm
[Charts: RDMA BW (EDR) in MB/sec and RDMA Latency (EDR) in us vs. message size in bytes, comparing Host, Singularity, and Docker]
Courtesy of Yong Qin, Mellanox
49© 2018 Mellanox Technologies
RDMA Performance (Cont.)
▪ Larger variations are observed at small message sizes due to container runtime overheads
[Chart: RDMA latency (99th percentile) in us vs. message size in bytes, comparing Host, Singularity, and Docker]
Courtesy of Yong Qin, Mellanox
50GTC’18: HPC Containers
Courtesy of Nishanth Dandapanthula, HPC & DL Solutions Engineering, DellEMC
51GTC’18: HPC Containers
Courtesy of Nishanth Dandapanthula, HPC & DL Solutions Engineering, DellEMC
52GTC’18: HPC Containers
Courtesy of Nishanth Dandapanthula, HPC & DL Solutions Engineering, DellEMC
Image built with hpccm
53GTC’18: HPC Containers
Courtesy of Nishanth Dandapanthula, HPC & DL Solutions Engineering, DellEMC
54GTC’18: HPC Containers
KEY ISSUES
Addressing key issues across the ecosystem to increase container adoption
• Developers
• Posting containerized HPC apps to our registry
• Infrastructure for making containers: recipes, scripts (hpccm), validated images
• Admins
• Driver matching
• Multi-node containers
• End users
• Working with OEMs to assure best performance
• Using containers from our registry
Working with the community to deliver leading reference solutions
55GTC’18: HPC Containers
DRIVER VERSIONING
• Problem
• Container doesn’t know which kernel driver versions are installed on the target platform
• Mismatches may be problematic, e.g. between CUDA or MOFED user-space and kernel drivers
• One approach
• Appropriate kernel driver is loaded into the container with container runtime enabling
• Relevance
• There are available solutions for Docker and Singularity for the CUDA driver case
• This issue is being actively worked in the Mellanox driver case
• Plans for Mellanox drivers
• Nail down support matrix
• Share test cases for regression suite, based on hpccm input recipe and platform config
One down, more to go
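As a hedged aside, one way to confirm which host kernel driver the container runtime has exposed inside a running container is to query nvidia-smi, for example from Python (a sketch; it assumes the NVIDIA container runtime has injected the matching driver libraries and tools):

# sketch: report the host driver version visible inside the container
import subprocess

driver = subprocess.check_output(
    ['nvidia-smi', '--query-gpu=driver_version', '--format=csv,noheader'],
    universal_newlines=True).strip()
print('NVIDIA kernel driver exposed in this container:', driver)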
56GTC’18: HPC Containers
YOUR FEEDBACK
• What value do you see in using containers that motivates you to containerize your app?
• What ingredients do you most want to see
• inside of an HPC container?
• outside of an HPC container?
• What are your pain points around developing containers?
• What pain points do you hear about for deploying containers?
• Are you willing to try out
• NVIDIA’s containerized HPC apps?
• the hpccm infrastructure that helps with containerization of HPC apps?
57GTC’18: HPC Containers
CALL TO ACTION
• Try https://github.com/NVIDIA/hpc-container-maker - OSS project
• Find this content at GTC website for Monday Mar 26 11am by CJ Newburn
• App developers
• Build your containers with HPCCM & deploy on NGC, offer feedback
• Take the opportunity to focus efforts, collaborate around a reference
• System Admins
• Make your cluster container ready with Docker, LXC and/or Singularity runtimes
• Application users
• Pull and run containers from ngc.nvidia.com
• Enjoy HPC apps with greater ease and confidence
• OEMs
• Build container-ready systems with NGC
58GTC’18: HPC Containers
FAQ
• Supported systems - The containers must run on Pascal, Volta, and newer GPU-powered systems
• Testing & performance – NVIDIA may QA and benchmark the container
• License agreement – Developer has to comply with all the app license requirements
• Ownership – Container developer owns and retains all the rights, title, and interest in and to HPC containers
• Support – Developer must provide technical support to the end user of the container
• Cost – NVIDIA will host the containers on NGC for free
• Container removal – Both NVIDIA and the container developer have the right to take down the
container at any time for any reason
59GTC’18: HPC Containers
REQUEST FOR FEEDBACK: HPC RUNTIME CONTAINER V0.5
• OS: Ubuntu 16.04. Aligned with DL, current focus. Alternative: CentOS 7.
• CUDA version: 9.0. Backwards compatible [driver upgrade].
• CUDA type: runtime. For deployment, not development.
• Compiler: PGI, gcc [and Intel] runtimes. Actual compiler not needed for most usages.
• Comms libraries: CUDA-aware OpenMPI 3.0.0, MOFED 3.4-1 libs. OpenMPI CUDA enabling is underway via UCX.
• Scientific libraries: FFTW 3.3.7 [and MKL]. Most commonly used.
• Infrastructure: Python 2 and 3, HDF5 1.10.1. Commonly used, may add more tools.
Design and validate HPC container, derive from there for apps
60GTC’18: HPC Containers
REQUEST FOR FEEDBACK: HPC DEVEL CONTAINER V0.5
• OS: Ubuntu 16.04. Aligned with DL, current focus. Alternative: CentOS 7.
• CUDA version: 9.0. Backwards compatible [driver upgrade].
• CUDA type: devel. For development, includes CUDA toolkit.
• Compiler: PGI, gcc [and Intel] compilers. Compiler and its license in private images only.
• Comms libraries: CUDA-aware OpenMPI 3.0.0, MOFED 3.4-1 libs. OpenMPI CUDA enabling is underway via UCX.
• Scientific libraries: FFTW 3.3.7 [and MKL]. Most commonly used.
• Infrastructure: Python 2 and 3, HDF5 1.10.1. Commonly used, may add more tools.
Design and validate HPC container, derive from there for apps
61© 2018 Mellanox Technologies
Container Namespace Isolation
• Namespace isolation: Docker shares almost nothing; Singularity shares almost everything.
• File system (mount): Docker is isolated by default and can bind mount host volumes; Singularity mounts $HOME, /proc, /sys, /tmp, etc. from the host by default and can bind mount other host volumes.
• PID: Docker isolated; Singularity shared.
• Network: Docker isolated (can be expanded with full support); Singularity shared (with limited support).
Courtesy of Yong Qin, Mellanox
62© 2018 Mellanox Technologies
MPI
• MPI library: Docker: inside the container; Singularity: outside the container (host).
• MPI program binary: inside the container for both.
• Network: Docker: container; Singularity: host.
• Security: Docker: Docker daemon; Singularity: inherited from host.
Courtesy of Yong Qin, Mellanox