high-performance heterogeneous resource management — cyborg · this encompasses fpgas managed by...

26
High-performance Heterogeneous Resource Management — Cyborg Huawei Luwei he

Upload: others

Post on 25-Mar-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: High-performance Heterogeneous Resource Management — Cyborg · This encompasses FPGAs managed by Cyborg as well as VGPUs (of multiple types) managed by Nova. Note: This includes

H i g h - p e r f o r m a n c e H e t e r o g e n e o u s R e s o u r c e M a n a g e m e n t — C y b o r g

H u a w e iL u w e i h e

Page 2: High-performance Heterogeneous Resource Management — Cyborg · This encompasses FPGAs managed by Cyborg as well as VGPUs (of multiple types) managed by Nova. Note: This includes

1. Cyborg introdcution

2. Cyborg journey

3. Cyborg concept overview

4. Mainly work in Train release

Agenda

Page 3: High-performance Heterogeneous Resource Management — Cyborg · This encompasses FPGAs managed by Cyborg as well as VGPUs (of multiple types) managed by Nova. Note: This includes

New Era of Domain Specific Architecture

NPU

Neural network processors for machine learning

GPU

GPUs for graphics, virtual reality, ML

SmartNIC/FPGA

Programmable network switches and hardware

Page 4: High-performance Heterogeneous Resource Management — Cyborg · This encompasses FPGAs managed by Cyborg as well as VGPUs (of multiple types) managed by Nova. Note: This includes

How To Manage These DSAs For Cloud ?

Page 5: High-performance Heterogeneous Resource Management — Cyborg · This encompasses FPGAs managed by Cyborg as well as VGPUs (of multiple types) managed by Nova. Note: This includes

Cyborg is a general management framework for acceleratorsProud OpenStack Official Project since 2017.09 (https://github.com/openstack/cyborg )

Page 6: High-performance Heterogeneous Resource Management — Cyborg · This encompasses FPGAs managed by Cyborg as well as VGPUs (of multiple types) managed by Nova. Note: This includes

An Unbelievable Journey

Page 7: High-performance Heterogeneous Resource Management — Cyborg · This encompasses FPGAs managed by Cyborg as well as VGPUs (of multiple types) managed by Nova. Note: This includes

Cyborg’s growth (popularity)

2016 .02- 2017.02

Page 8: High-performance Heterogeneous Resource Management — Cyborg · This encompasses FPGAs managed by Cyborg as well as VGPUs (of multiple types) managed by Nova. Note: This includes

Cyborg’s growth (popularity)

2016 .02- 2017.02 2017 .02- 2017.09

Page 9: High-performance Heterogeneous Resource Management — Cyborg · This encompasses FPGAs managed by Cyborg as well as VGPUs (of multiple types) managed by Nova. Note: This includes

Cyborg’s growth (popularity)

2016 .02- 2017.02 2017 .02- 2017.09 2017 .09- 2019.06

Page 10: High-performance Heterogeneous Resource Management — Cyborg · This encompasses FPGAs managed by Cyborg as well as VGPUs (of multiple types) managed by Nova. Note: This includes

Cyborg’s growth (Diversity)

Q 2018.02 R 2018.08 S 2019.04

Page 11: High-performance Heterogeneous Resource Management — Cyborg · This encompasses FPGAs managed by Cyborg as well as VGPUs (of multiple types) managed by Nova. Note: This includes

Cyborg’s growth (Maturity)

Q 2018.02 R 2018.08 S 2019.04

Page 12: High-performance Heterogeneous Resource Management — Cyborg · This encompasses FPGAs managed by Cyborg as well as VGPUs (of multiple types) managed by Nova. Note: This includes

Cyborg’s growth (Arch Evolution)

2017.09 2018.02 2019.04

Page 13: High-performance Heterogeneous Resource Management — Cyborg · This encompasses FPGAs managed by Cyborg as well as VGPUs (of multiple types) managed by Nova. Note: This includes

cyborg-api

cyborg-conductor

cyborg-agent

SPDK driver

FPGAdriver

GPUdriver

NVMeSSD

GPUdevice

FPGAdevice

cyborg-dbController

Compute Node

Client

Cyborg arch overview

AICHIPdriver

AICHIPdriver

Page 14: High-performance Heterogeneous Resource Management — Cyborg · This encompasses FPGAs managed by Cyborg as well as VGPUs (of multiple types) managed by Nova. Note: This includes

Device

Accelerators

Attach HandlesControlPath ID

Deployables

Term Meaning Placement Represent

Deployable A logical structure in a device that provides a resource.A resource can be an accelerator, local memory, etc.

Resource Provider

Accelerator A logical resource to offload computation, etc. Resource Class Inventory

Device Physical hardware. E.g. PCI card. Includes board (Flash/BMC). --

ControlPath ID Unique identifier to access the device. E.g. PCI PF. --

Attach Handle An ID of the handle to attach to an instance. E.g. PCI VF, mdev UUID.

--

Abstract Device Model

Page 15: High-performance Heterogeneous Resource Management — Cyborg · This encompasses FPGAs managed by Cyborg as well as VGPUs (of multiple types) managed by Nova. Note: This includes

Compute NodeVCPU: 16, Mem: 32768

FPGA ResourceAcc: 2

NPU ResourceAcc: 1

# lspci | grep d1000c:00.0 Processing accelerators: Device 19e5:d100 (rev 20)

Devices discover

Placement

Nova-api

Cyborg agent

Cyborg driver

NPU Device

DB

Compute node

Cyborg-api

Controller node

Page 16: High-performance Heterogeneous Resource Management — Cyborg · This encompasses FPGAs managed by Cyborg as well as VGPUs (of multiple types) managed by Nova. Note: This includes

The mainly work in 2019 (Train Release)

https://etherpad.openstack.org/p/cyborg-train-goals

• Nova-Cyborg Integration.

• Generic Driver Implementation.

• Driver improvement and support.

• Python 3 migration.

• Testing and validation.

Page 17: High-performance Heterogeneous Resource Management — Cyborg · This encompasses FPGAs managed by Cyborg as well as VGPUs (of multiple types) managed by Nova. Note: This includes

Nova-Cyborg Integration

[1] https://specs.openstack.org/openstack/nova-specs/priorities/train-priorities.html

[2] Nova-Cyborg spec: https://review.opendev.org/#/c/603955/ Merged in 2019.06.20

Enable requesting an instance with one or more accelerators either preprogrammed or dynamically programmed.

This encompasses FPGAs managed by Cyborg as well as VGPUs (of multiple types) managed by Nova.

Note: This includes cross-project work with Cyborg. The Cyborg team’s cycle priorities are aligned accordingly.

Page 18: High-performance Heterogeneous Resource Management — Cyborg · This encompasses FPGAs managed by Cyborg as well as VGPUs (of multiple types) managed by Nova. Note: This includes

Nova-Cyborg Integration

Placement

Nova-api

Cyborg agent

Cyborg driver

NPU Device

DB

Rest API

Compute node

Cyborg-api

Controller node

# vim /etc/cyborg/cyborg.conf[agent]enabled_drivers = huawei_ascend_driver

1. Operator can enable or disable drivers

Page 19: High-performance Heterogeneous Resource Management — Cyborg · This encompasses FPGAs managed by Cyborg as well as VGPUs (of multiple types) managed by Nova. Note: This includes

{"name": “ascend-aichip", "groups": [ {"resources:ACCELERATOR_ASCEND": "1", "trait:AICHIP_ASCEND": "required“ } ]}

2. Operator defines device profiles(accelerator flavor)

Nova-Cyborg Integration

Placement

Nova-api

Cyborg agent

Cyborg driver

NPU Device

DB

Rest API

Compute node

Cyborg-api

Controller node

Page 20: High-performance Heterogeneous Resource Management — Cyborg · This encompasses FPGAs managed by Cyborg as well as VGPUs (of multiple types) managed by Nova. Note: This includes

$ openstack flavor set --property \ “accel:device_profile_name=ascend-aichip” \ ascend-flavor

3. Operator sets device profiles in compute flavor

Nova-Cyborg Integration

Placement

Nova-api

Cyborg agent

Cyborg driver

NPU Device

DB

Rest API

Compute node

Cyborg-api

Controller node

Page 21: High-performance Heterogeneous Resource Management — Cyborg · This encompasses FPGAs managed by Cyborg as well as VGPUs (of multiple types) managed by Nova. Note: This includes

Placement

Nova-api

Cyborg agent

Cyborg driver

NPU Device

Rest API

$ openstack boot server –-flavor ascend-flavor vm-with-ascend

4. User requests vm with a flaovr

Cyborg

Nova-sch

Nova-cond

Nova-compute

②③

Nova-Cyborg Integration

Page 22: High-performance Heterogeneous Resource Management — Cyborg · This encompasses FPGAs managed by Cyborg as well as VGPUs (of multiple types) managed by Nova. Note: This includes

The key flow of nova-cyborg interaction

Nova Placement Cyborg Nova Compute

GET /v2/device_profiles?name=mydp

Select a candidate

GET /allocation_candidates

Build and run instance

post accelerator_request

patch accelerator_requestPOST /os-server-external-events

Wait for event notification

get accelerator_request?instance=$uuid&bind_state=resolved

https://review.opendev.org/#/c/603955/ Merged

Page 23: High-performance Heterogeneous Resource Management — Cyborg · This encompasses FPGAs managed by Cyborg as well as VGPUs (of multiple types) managed by Nova. Note: This includes

Generic driver

In order to easy to add the support for the common accelerator, we propose to improve the generic driver to manage these devices.

[1] https://github.com/openstack/cyborg/blob/master/cyborg/accelerator/drivers/generic_driver.py

What we want to be done

class FPGADriver(GenericDriver)

class GPUDriver(GenericDriver)

class NPUDRIVER(GenericDriver)

class FakeDRIVER(GenericDriver)

class GenericDriver(object):

# Discover a specified accelerator. def discover(self):

# Update the device firmware with specific image. def update(self, control_path, image_path)

# Collects device stats def get_stats(self):

Page 24: High-performance Heterogeneous Resource Management — Cyborg · This encompasses FPGAs managed by Cyborg as well as VGPUs (of multiple types) managed by Nova. Note: This includes

Driver improve and support

• FPGA Driver

• Nvidia GPU Driver

• (New)Huawei Ascend Driver

• (New)Intel Movidius Driver

Page 25: High-performance Heterogeneous Resource Management — Cyborg · This encompasses FPGAs managed by Cyborg as well as VGPUs (of multiple types) managed by Nova. Note: This includes

Join us

IRC: #openstack-cyborg

Wechat: Cyborg中国开发讨论组

Page 26: High-performance Heterogeneous Resource Management — Cyborg · This encompasses FPGAs managed by Cyborg as well as VGPUs (of multiple types) managed by Nova. Note: This includes

THANKS