omnix: an accelerator-centric os for omni-programmable systems · 2020-02-29 · mark silberstein,...

62
Omnix: an accelerator-centric OS for omni-programmable systems Rethinking the role of CPUs in modern computers Mark Silberstein EE, Technion

Upload: others

Post on 06-Jul-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Omnix: an accelerator-centric OS for omni-programmable systems

Rethinking the role of CPUs in modern computers

Mark SilbersteinEE, Technion

Page 2: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 2

Oratorio for CPUs, accelerators and OS in 4 parts

accelerators WePart 1: Accelerando con brio

Part 2: Amore SOSPenuto appassionato

OS CPU accelerators

CPU

Part 4: Tutti accelerando a capella

OmniX = acceleratorsOS

Part 3: Subito non-CPUtto

Page 3: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 3

2004: The free lunch is over

Page 4: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 4

2015: no more lunch as we know itThe last International Technology

Roadmap for Seminconductors (ITRS)

Page 5: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 5

Looking beyond CMOS

● Cryogenic computing● Approximate/stochastic computing● Neuromorphic computing● Biological computing/storage● Quantum computing

Page 6: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 6

Looking beyond CMOS

From «IEEE rebooting computing»

Page 7: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 7

What to do until the next revolution?

Performance

Today

Birth of new technology

New technology matured

?????????

Page 8: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 8

What to do until the next revolution?

Performance

Today

Birth of new technology

New technology matured

Hardware specialization and

near-data accelerators

Page 9: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 9

Computer hardware: circa ~2017

Network I/O accelerator

Storage I/O accelerator

GPU parallelaccelerator

[size ~ transistor count]

Page 10: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 10

Central Processing Units (CPUs) are no longer Central

Network I/O accelerator

Storage I/O accelerator

GPU parallelaccelerator

Programmability

Page 11: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 11

Omni-programmable systemNear-X-execution Units: NXUs

Network I/O accelerator

Storage I/O accelerator

GPU parallelaccelerator

Near-Data Processing

Near-Data Processing

AcceleratedProcessing

Programmability

Page 12: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 12

Challenges of programming omni-programmable systems

Part 2: Amore SOSPenuto appassionato

OS acceleratorsCPU

Page 13: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 13

Truisms

Programming acclerators is hard

but

Programming is hardWriting efficient programs is hard

Multi-threaded programming is hardso…?

Page 14: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 14

Number of NXUs

ProgrammerProductivity

CPU CPU+GPU

Zero CPU+GPU+FPGA

Masochism

Maintaining whole-application efficiency will be hard

Page 15: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 15

Example: image server

1. put: parse → contrast-enhance → store 2. get: parse → resize → store → marshal

putget

Page 16: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 16

Accelerating with NXUs

1. put: parse → contrast-enhance → store 2. get: parse → resize → store → marshal

NIC SSD GPU CPU

Page 17: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 17

Accelerating with NXUs

1. put: parse → contrast-enhance → store 2. get: parse → resize → store → marshal

NIC SSD GPU CPU

Page 18: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 18

Closer look at get

parse req

resize imgstore img

marshal resp

SSDNIC

parse → resize → store → marshal

Page 19: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 19

send(soc,resp)

marshal resp

Realtiy: offloading overheads dominate

SSDNIC

get: parse → resize → store → marshal

CPU

recv(soc,req)

in=open(«f»)c=open(«cache»)read(in,img)

parse req

resize img

write (c,img)

Page 20: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 20

send(soc,resp)

marshal resp

NXUs use CPU to access I/O abstractions!

SSDNIC

get: parse → resize → store → marshal

CPU

recv(soc,req)

in=open(«f»)c=open(«cache»)read(in,img)

parse req

resize img

write (c,img)

No sockets, isolation, transport layer …

No files, protection...

Page 21: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 21

THETHE problem:OS architecture is CPU - centric

GPUStorage

NXU

CPU

NetworkNXU

Page 22: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 22

OmniX: accelerator-centric OS architecture

CPU

Hardware is already here (almost)

OS

S

ervices

Operating system

OS Services

OS

S

erv

ices

GPU StorageNXU

NetworkNXU

Page 23: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 23

marshal resp

send(soc,resp)

Wouldn't it be lovely?

NIC

get: parse → resize → store → marshal

c=open(«cache»)read(in,img)

parse req

resize img

write (c,img)

recv(soc,req)

in=open(«f»)

SSD

Page 24: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 24

OmniX design choices

CPU

Part 3: Subito non-CPUtto

Page 25: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 25

NXU hardware: What does the future hold?

● Q:General purpose computations?

● Q:Support for self-management: Interrupt handling, VM management, privileged execution?

● Q:Discrete or integrated?

● Q:Memory organization?

● Q:System memory model?

Page 26: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 26

NXU hardware: What does the future hold?

● Q:General purpose computations?

● Q:Support for self-management: Interrupt handling, VM management, privileged execution?

● Q:Discrete or integrated?

● Q:Memory organization?

● Q:System memory model?

Page 27: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 27

Will NXUs support self-management?

● Essential for running an OS: – Interrupt handling, in-device VM and address

space management, privileged execution

Self-management is unlikely in the next generations of NXUs

Speculation

Page 28: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 28

Lets learn from GPGPUs

● Emerged as a hack, then endorsed by NVIDIA● Dramatic programmability improvements

~2000

Programmableshaders

~2006

Compute Unified Device Architecture [CUDA]

Cpointersrandom access

functionsrecursion

«fork/exec»

page faults

C++11static link

~2016

Shared Virtual Memory

Any self-management

feature?

Page 29: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 29

Lets learn from GPGPUs

● Emerged as a hack, then endorsed by NVIDIA● Dramatic programmability improvements

~2000

Programmableshaders

~2006

Compute Unified Device Architecture [CUDA]

Cpointersrandom access

functionsrecursion

«fork/exec»

page faults

C++11static link

~2016

Shared Virtual Memory

Great for graphics: Major hardware changes: great performance

Page 30: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 30

Lets learn from GPGPUs

● Emerged as a hack, then endorsed by NVIDIA● Dramatic programmability improvements

~2000

Programmableshaders

~2006

Compute Unified Device Architecture [CUDA]

Cpointersrandom access

functionsrecursion

«fork/exec»

page faults

C++11static link

~2016

Shared Virtual Memory

Great for graphics: Major hardware changes: great performance

Not needed for graphics: Minor hardware changes, Performance so-so!

Exceptions: double precision, precise exceptions

Page 31: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 31

Support for self-management in GPUs?

● Improves graphics performance — NO!● Requires major hardware changes — YES!● Guess what the answer is (and probably will

be)

Speculation

Page 32: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 32

Emerging NXUs: similar piggiback on high-end I/O hardware

● Mellanox Innova: an FPGA glued into Connect-X4 HCA as a bump-in-the-wire

– No self-management support● Smart SSDs from Samsung: re-use existing ARM cores.

– Possibly can run an OS, but normaly do not

Page 33: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 33

Emerging NXUs: similar piggiback on high-end I/O hardware

● Mellanox Innova: an FPGA glued into Connect-X4 HCA as a bump-in-the-wire

– No self-management support● Smart SSDs from Samsung: re-use existing ARM cores.

– Possibly can run an OS, but normaly do not

Performance first, Programmability lastSelf-management will be added if it

contributes to performance + has low hardware cost

Speculation

Page 34: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 34

Emerging NXUs: similar piggiback on high-end I/O hardware

● Mellanox Innova: an FPGA glued into Connect-X4 HCA as a bump-in-the-wire

– No self-management support● Smart SSDs from Samsung: re-use existing ARM cores.

– Possibly can run an OS, but normaly do not

OmniX does not rely on self-management in NXUs

Performance first, Programmability lastSelf-management will be added if it

contributes to performance + has low hardware cost

Speculation

Page 35: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 35

System memory model

● Shared virtual memory with the host● Coherence + remote atomics● Extreme NUMA

Speculation

Page 36: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 36

CAPI/OpenCAPI/CCIX...

● Emerging chip-to-chip interconnects add support to VM and coherence

Page 37: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 37

CAPI/OpenCAPI/CCIX...

● Emerging chip-to-chip interconnects add support to VM and coherence

Page 38: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 38

Part 4: Tutti accelerando a capella

OmniX = acceleratorsOS

Page 39: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 39

OmniX design

● Each NXU runs an optimized library OS ● Shared socket/FD namespace ● Shared virtual address space● Single application OS (unikernel)● Protection via SRIOV

● NXUs invoke tasks and perform I/O directly on their peers, without CPU mediation

● CPU used for setup and management

Page 40: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 40

OmniX design

● Each NXU runs an optimized library OS ● Shared socket/FD namespace ● Shared virtual address space● Single application OS (unikernel)

● NXUs invoke tasks and perform I/O directly on their peers, without CPU mediation

● CPU used for setup and management● Protection via SRIOV

Page 41: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 41

Why CPU mediation is bad?

● Higher latency● CPU is the bottleneck● Poor scalability ● Poor performance isolation

Page 42: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 42

1. Increased I/O latency

GPU-to-GPU roundtrip latency via Infiniband

GPUnet (CPU-mediated): 50 usec

GPUrdma (NIC controled by GPU): 10 usec

Page 43: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 43

2. CPU is the bottleneck● Example: Image Similarity Search, 6 GPUs● Dataset statically partitioned, random data

access, 2ms latency per request● GPU-driven: GPU invocations without CPU

involvement

GPU Driven

Page 44: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 44

3. Poor scaling with #NXUs

GPU DrivenCPU Driven

Page 45: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 45

4. Poor performance isolation

● CPU hog running with multi-GPU server

GPU DrivenCPU Driven

Page 46: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 46

OmniX removes the CPU from both data and control planes

● NXUs invoke tasks and I/O operations on each other and on themselves

NICGPU

nx_wait(nx_exec(MARSHAL)

); NXU

QP/CQ

MARSHAL(){ send(create_msg()) }

Page 47: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 47

Virtual memory as a capability

● NXU task/IO queues are mapped into the shared virtual address space

● Without access to the queue the NXU/device cannot be used

● Mapping/unmapping corresponds to grant/revoke: a privileged operation

Page 48: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 48

Coherent Virtual Shared Memory

● Location and transfer type agnostic– SSD performs send to the NIC

– NIC passes GPU buffer to code running on SSD

● Coherence essential with local caches

Page 49: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 49

Role of the CPU

● Not really. ● Handle exceptions, first access to resources

(files, sockets), cleanup, any privileged operations ● Runs the main program

Page 50: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 50

Open questions

● Does it require reimplementing the FS/Network stack on NXUs?

● Support for near-memory computations● How do we do scheduling?● Support for asynchronous execution on non-premptive

devices

Observation: the problem is similar to that of RDMA access to file server

Page 51: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 51

First steps: OS services for GPUs/NICs

● GPUfs: file system access from GPUs (ASPLOS13,TOCS14,CACM15)

● GPUnet: network abstractions for GPUs (OSDI14,TOCS16)

● GPUrdma: native RDMA for GPUs (ROSS16)

● ActivePointers: In-GPU VM Management (ISCA16)

● GPUPipe: CPU-less network servers (under submission)

● NICA: network application accelerators (ongoing)

Page 52: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 52

OmniX is an ongoing work in Accelerator Computer Systems Lab

● Haggai Eran, Amir Watad, Shai Bergman, Tanya Brokhman, Vasilis Dimistas, Lior Zeno, Maroun Tork, Meni Orenbach, Shai Vakhnin, Lev Rosenblit, Marina Minkin, Pavel Lifshits and Gabi Malka

Page 53: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 53

OmniX: Accelerator-centric OS is essential for

efficiency and programmability of

omni-programmable systems

[email protected]

Page 54: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 54

backup

Page 55: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 55

NXU hardware: What does the future hold?

● Q:General purpose computations?

A: Slow

● Q:Support for self-management:

A: Limited or absent

● Q:Discrete or integrated?

A: Both, but discrete will be faster

● Q:Memory organization?

A: NUMA (NUM-B)

● Q:System memory model?

A: Coherent Shared Virtual Memory

Page 56: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 56

OmniX design implications

● Q:General purpose computations?

A: Slow

● Q:Support for self-management:

A: Limited or absent

● Q:Discrete or integrated?

A: Both, but discrete will be faster

● Q:Memory organization?

A: NUMA (NUM-B)

● Q:System memory model?

A: Coherent Shared Virtual Memory

Run hardware-optimized tasks

User-space code onlyNeed CPU for privileged ops

Locality is critical

Use VM for protection

Page 57: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 57

On-NXU I/O services● Internal services

– I/O abstractions for NXU programs

GPU applicationusing GPUfs File API

OS File System Interface

GPU Memory(Page cache)CPU Memory

GPUfs Distributed Buffer Cache

Unchanged applicationsusing OS File API

GPUfs hooks GPUfs GPU File I/O library

CPU GPU

Page 58: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 58

On-NXU I/O services

● I/O services for the hosting device – Networking on NICs, files on SSDs

● Interfaces– CQ/QP: for networking

– mmap: for storage

● Requirements from hardware– Isolation and QoS

– Security, e.g., avoid spoofing or listen on arbitrary port

Page 59: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 59

Direct local/remote task invocation

Page 60: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 60

Direct local/remote task invocation

NICGPU

nx_wait(nx_exec(MARSHAL)

);

callee

SSD h=nx_exec(SEND);nx_run_after(h,[]{ printf(«done»);}

NXU

QP/CQ

→ SEND

Page 61: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 61

Direct local/remote task invocation

NICGPU

nx_wait(nx_exec(MARSHAL)

);

callee

SSD h=nx_exec(SEND);nx_run_after(h,[]{ printf(«done»);}

NXU

QP/CQ

→ SEND

We implement (almost) that today with GPUsGPUrdma: direct control of the NIC from the GPU

GPUpipe: direct task invocation among GPUs

Page 62: Omnix: an accelerator-centric OS for omni-programmable systems · 2020-02-29 · Mark Silberstein, Technion 33 Emerging NXUs: similar piggiback on high-end I/O hardware Mellanox Innova:

Mark Silberstein, Technion 62

Case in point: CAPI improves performance!