Caffe + H2O - by Cyprien Noel
Context - me
● Distributed systems - trading, air control, neural nets
● Multi-GPU Caffe
● Caffe over InfiniBand in Spark
Now at UCB
● Caffe: Python, help merge forks
● Project: how to generalize the work above?
  ○ Help leverage devices, e.g. in H2O
  ○ New distributed Caffe, meta graph
Context - industry
Example
Problem
● DPDK
● Libfabric
● Accelio
● UCX
● PMEM
● GPUDirect
● NVM Express
● HMM
● CAPI
● CCIX
● HSA
● OFED
● More every week...
A single abstraction?
● Intra-machine (device bus) vs inter-machine (networks)
  ○ E.g. CUDA copy and sockets
  ○ RDMA blurs local and remote devices
● Communication vs persistence
  ○ Sockets vs files is orthogonal to location
  ○ NVMe allows storage on remote disks
● Ephemeral vs durable
  ○ 3D XPoint & ReRAM are in between RAM and SSD
  ○ Intel's pmem exposes the device directly as memory
Proposal
● An in-memory file system
  ○ Location-transparent mmap
  ○ Transactional
Example - GPU kernel on data in storage
Today
● Client reads HDFS path
● HDFS client resolves worker
● Establishes connection
● Server accepts connection
● Authentication, authorization
● File system operation
● Network transfer
● CUDA transfer
BFS
data = mmap("/path")
gpu_kernel(data)
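The two-line mmap example above can be approximated on a single machine with Python's standard `mmap` module. This is only a rough sketch under stated assumptions: the file is local rather than location transparent, and `cpu_kernel`, `run_on_mapped_file` and `demo` are hypothetical names, with `cpu_kernel` standing in on the CPU for the slide's `gpu_kernel`.

```python
import mmap
import os
import tempfile


def cpu_kernel(view):
    """Hypothetical stand-in for gpu_kernel: sums the mapped bytes on the CPU."""
    return sum(view)


def run_on_mapped_file(path, kernel=cpu_kernel):
    """Map the file at `path` read-only and hand the mapping to `kernel`."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as data:
            # memoryview exposes the mapping without copying it; it must be
            # released before the mapping itself is closed.
            with memoryview(data) as view:
                return kernel(view)


def demo():
    """Write the bytes 0..9 to a temporary file and run the kernel on them."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.bin")
        with open(path, "wb") as f:
            f.write(bytes(range(10)))
        return run_on_mapped_file(path)
```

The caller never opens a connection or issues a read: it names a path and receives memory, which is the property the slide contrasts with the HDFS sequence.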
Example - Compute graph in hardware
/app/jpgs/*
/layers/*
/vars/*
// Access DB
/redis
db = redis.open("./redis")
● Everything is a file
  ○ Using mmap, named pipes, Unix sockets
  ○ E.g. input jpgs, weights, activations, counters
● All state and coordination in the fs
  ○ Minimal code, e.g. persistent GPU kernels
  ○ Location independent → dynamic placement
  ○ Arbitrary graph splitting, e.g. data- & model-parallel ML
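As a small illustration of keeping state in the file system, the sketch below stores one integer variable as a file and lets two independent mappings of that file share it. It assumes an ordinary local file system (not the proposed one). The directory names echo the slide's layout, while `step`, `SLOT`, `create_var`, `open_var` and `demo` are hypothetical names chosen for the example.

```python
import contextlib
import mmap
import os
import struct
import tempfile

SLOT = struct.Struct("<q")  # one little-endian signed 64-bit integer


def create_var(path, value=0):
    """Create a file that holds a single integer variable."""
    with open(path, "wb") as f:
        f.write(SLOT.pack(value))


@contextlib.contextmanager
def open_var(path):
    """Map the variable's file read/write; the mapping is shared by default."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), SLOT.size) as m:
            yield m


def demo():
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical layout echoing the slide: jpgs, layers and vars folders.
        for sub in ("jpgs", "layers", "vars"):
            os.makedirs(os.path.join(root, "app", sub))
        path = os.path.join(root, "app", "vars", "step")
        create_var(path)
        # Two independent mappings of the same path, as two components would hold.
        with open_var(path) as writer, open_var(path) as reader:
            SLOT.pack_into(writer, 0, 7)
            (value,) = SLOT.unpack_from(reader, 0)
        return value
```

The two components agree only on a path; neither holds a reference to the other, which is what makes placement a matter of where the path resolves.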
Example - Caffe & H2O
● H2O can write to Caffe input layers
  ○ Data placed directly on GPUs
  ○ RDMA atomic ops to count dependencies
● Can form pipelines
  ○ No need for pairwise integrations
  ○ Uniform monitoring, logging etc.
  ○ Leverage the best device for each step
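Counting dependencies with atomic operations can be sketched as follows: each producer writes its input, then atomically decrements a shared counter, and whichever producer brings the counter to zero triggers the consuming step. This is a single-process approximation in which a `threading.Lock` stands in for an RDMA atomic fetch-and-add. `DependencyCounter`, `arrive` and `run_layer` are hypothetical names, not part of Caffe or H2O.

```python
import threading


class DependencyCounter:
    """Counts outstanding inputs; a lock stands in for an RDMA atomic op."""

    def __init__(self, pending):
        self._pending = pending
        self._lock = threading.Lock()

    def arrive(self):
        """Atomically decrement; True only for the caller that reaches zero."""
        with self._lock:
            self._pending -= 1
            return self._pending == 0


def run_layer(num_inputs):
    """Run `num_inputs` producers; the last one to arrive fires the layer."""
    counter = DependencyCounter(num_inputs)
    inputs = [None] * num_inputs
    fired = []

    def producer(i):
        inputs[i] = i * i            # write this producer's input slot
        if counter.arrive():         # last arrival: every slot is filled
            fired.append(sum(inputs))

    threads = [threading.Thread(target=producer, args=(i,))
               for i in range(num_inputs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return fired
```

No producer needs to know about the others or about the consumer; the counter is the only coordination point, so the layer fires exactly once regardless of arrival order.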
Benefits
● Performance
  ○ mmap: lowest possible overhead
  ○ Leverages hardware, e.g. GPUDirect, RDMA, NVMe, atomic ops
● Complexity
  ○ Unified naming, permissioning, distributed state management
  ○ Hierarchical naming & location transparency → HA, placement
● Security
  ○ File permissions are familiar & kernel level; other networking disabled
  ○ Mounting a folder gives access to well-defined resources / capabilities
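The security point relies on ordinary file permissions enforced by the kernel. As a minimal sketch on a POSIX system, a resource exposed as a file can be restricted to its owner with the standard `os.chmod` call. The file name `weights.bin` and the helpers `restrict_to_owner`, `mode_bits` and `demo` are hypothetical.

```python
import os
import stat
import tempfile


def restrict_to_owner(path):
    """Allow only the owning user to read and write the file (mode 0o600)."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


def mode_bits(path):
    """Return just the permission bits of the file's mode."""
    return stat.S_IMODE(os.stat(path).st_mode)


def demo():
    with tempfile.TemporaryDirectory() as root:
        path = os.path.join(root, "weights.bin")  # hypothetical resource
        with open(path, "wb") as f:
            f.write(b"\x00" * 16)
        restrict_to_owner(path)
        return mode_bits(path)
```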
Prototype
● Single master with metadata
● Distributed mmap (CPU)
● Embedded platform (X1)
● Ethernet, InfiniBand
Summary
● Caffe progress - multi-GPU in Python, merge NV work
● Working on a new programming model
  ○ “Unix philosophy for modern apps”
  ○ Helps leverage devices, e.g. in H2O
  ○ Simplifies app integration & pipelines
  ○ Distributed version of Caffe as first use case