
1

Demand for Greater Efficiency

Using Arm® Processing for Efficient Hyperscale Storage

Scott Furey, Marvell, Associate VP - Enterprise Storage Business

Storage Demand in Data Centers

2

Increased Capacity

Increased Performance

Demand for Greater Efficiency

[Chart: power density and performance trends (latency in µs, throughput in GB/s)]

Balance Performance, Power, Cost

Architectural tradeoffs on core components

Storage + Networking + Processing

3

Hyperconverged

[Diagram: hyperconverged servers - each node pairs CPUs with direct-attached SSDs behind the network; capacity grows by adding incremental compute + storage together]

Hyperscale (Disaggregated Compute/Storage)

5

[Diagram: disaggregated compute and storage pools connected over the network; compute and storage scale independently - add incremental compute or add incremental storage as needed]

Current Disaggregated All Flash Array Architecture

6

Substantial CPU capability

Enormous network capacity

Performance at all cost

[Diagram: dual-CPU all flash array server with PCIe-attached SSDs (12-24 drives) and N x 100G network uplinks]

Cost of Adding Capacity

Expensive compute/network replicated per instance

Next tiers of storage tend to have a very similar server architecture

7

Can embedded storage controllers with Arm® processors better address this?

[Diagram: compute CPUs and SSD shelves connected over the network, with embedded Arm® processors placed alongside the storage]
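To make "expensive compute/network replicated per instance" concrete, here is a toy C sketch, not taken from the presentation, that counts how many server CPUs and 100G NICs get deployed when capacity is added one full storage server at a time versus fronting embedded-controller shelves from a single server. All of the inputs (target drive count, drives per shelf, CPUs and NICs per server) are hypothetical values chosen only to show the scaling pattern.

/* Toy scale-out model: resources replicated when every shelf is a full
 * server vs. shelves built around embedded storage controllers.
 * Every constant below is a hypothetical illustration, not a quoted figure. */
#include <stdio.h>

int main(void)
{
    const int drives_needed    = 240;  /* target capacity, in drives */
    const int drives_per_shelf = 24;   /* e.g. the 12-24 drive enclosure above */
    const int cpus_per_server  = 2;
    const int nics_per_server  = 2;

    int shelves = (drives_needed + drives_per_shelf - 1) / drives_per_shelf;

    /* Model A: each shelf ships with its own server CPUs and NICs. */
    printf("full-server scaling : %d shelves -> %d CPUs, %d NICs\n",
           shelves, shelves * cpus_per_server, shelves * nics_per_server);

    /* Model B: one server front-ends embedded-controller shelves over the fabric. */
    printf("embedded controllers: %d shelves -> %d CPUs, %d NICs, %d controllers\n",
           shelves, cpus_per_server, nics_per_server, shelves);
    return 0;
}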

Embedded Arm® Processor in Storage

8

Evolving storage architectures are removing several performance barriers

Compute + HW Acceleration = Reduced Cost / Power / Latencies

Improved Efficiencies in Storage IO Stacks

[Chart: NVMe cuts I/O stack overhead by ~50% versus SCSI]

SCSI traditional stack, access time in ms: Application → VFS → OS Scheduler → Block Driver → SCSI/SATA Translation → Device Driver

NVMe SSD stack, access time in µs: Application → VFS → OS Scheduler → Block Driver → Device Driver
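To get a feel for the per-I/O cost of this kernel path, the following minimal C sketch times a single 4 KiB O_DIRECT read as it passes through VFS, the scheduler, the block layer, and the device driver. The device path and block size are assumptions for illustration (running it against a raw device needs appropriate permissions); a user-space polled driver avoids most of the path being timed here.

/* Minimal sketch: time one 4 KiB direct read through the kernel block stack. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    const char *path = (argc > 1) ? argv[1] : "/dev/nvme0n1"; /* hypothetical device */
    const size_t blk = 4096;
    void *buf;

    if (posix_memalign(&buf, blk, blk) != 0)      /* O_DIRECT requires aligned buffers */
        return 1;

    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    ssize_t n = pread(fd, buf, blk, 0);           /* one trip through the whole stack */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    if (n != (ssize_t)blk) { perror("pread"); return 1; }

    double us = (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_nsec - t0.tv_nsec) / 1e3;
    printf("one 4 KiB read took %.1f us end to end\n", us);

    close(fd);
    free(buf);
    return 0;
}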

Reduced Networking Overhead with RoCEv2

10

[Diagram: NVMe-oF layering - NVMe™ host software → host-side transport abstraction → NVMe RDMA → RDMA verbs → RoCE, across the RDMA fabric, then RoCE → RDMA verbs → NVMe RDMA → controller-side transport abstraction → NVMe controller]

[Chart: networking overhead of RoCEv2 versus TCP]
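For reference, here is a simplified C sketch of the command capsule the host-side transport abstraction hands to the RDMA layer: the 64-byte NVMe submission queue entry plus optional in-capsule data. The field names and comments are a simplification of the public NVMe and NVMe-oF specifications, not Marvell's implementation; over fabrics the data pointer carries an SGL descriptor and the metadata pointer goes unused.

/* Simplified NVMe-oF command capsule layout (illustrative; see the specs). */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

struct nvme_sqe {                 /* 64-byte submission queue entry */
    uint8_t  opcode;
    uint8_t  flags;               /* fused-op and PSDT (PRP vs. SGL) bits */
    uint16_t cid;                 /* command identifier */
    uint32_t nsid;                /* namespace identifier */
    uint32_t cdw2, cdw3;
    uint64_t mptr;                /* metadata pointer (unused over fabrics) */
    uint8_t  dptr[16];            /* data pointer: SGL descriptor over fabrics */
    uint32_t cdw10_15[6];         /* command-specific dwords */
};

struct nvmeof_cmd_capsule {       /* capsule = SQE + optional in-capsule data */
    struct nvme_sqe sqe;
    uint8_t data[];               /* in-capsule data, size set by the transport */
};

static_assert(sizeof(struct nvme_sqe) == 64, "SQE must be 64 bytes");

int main(void)
{
    printf("capsule header is %zu bytes\n", sizeof(struct nvme_sqe));
    return 0;
}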

Reduced Complexity

Adoption of Linux User Space Drivers

Eliminates Interrupts

Fast Scheduling of Threads

Lower Latency / Higher Performance

11

[Diagram: user-space storage stack - storage applications, the NVMe front end, the polled SAS back end, the mapper driver, hardware accelerators, and DDR sit in user space on a runtime VM or RTOS and service the fast path; a kernel NVMe stub driver handles only the slow path, initialization, and interrupts]
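A minimal sketch of the polling pattern behind "eliminates interrupts": the consumer spins on a valid flag in a shared completion ring instead of sleeping until an interrupt fires. The ring, the fake device thread, and all sizes are stand-ins for a real hardware queue pair, not any particular driver API; build with -pthread.

/* Polled completion queue sketch: a second thread plays the device and
 * posts completions; the main loop busy-polls with no syscalls or interrupts. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

#define RING 16

struct cqe { int id; atomic_int valid; };   /* completion entry plus valid flag */
static struct cqe ring[RING];

static void *fake_device(void *arg)
{
    (void)arg;
    for (int i = 0; i < RING; i++) {
        usleep(100);                        /* pretend the I/O took ~100 us */
        ring[i].id = i;
        atomic_store_explicit(&ring[i].valid, 1, memory_order_release);
    }
    return NULL;
}

int main(void)
{
    pthread_t dev;
    pthread_create(&dev, NULL, fake_device, NULL);

    for (int head = 0; head < RING; head++) {
        while (!atomic_load_explicit(&ring[head].valid, memory_order_acquire))
            ;                               /* fast path: spin, no context switch */
        printf("completed I/O %d\n", ring[head].id);
    }

    pthread_join(dev, NULL);
    return 0;
}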

Efficiency Through Integration/Optimization

Application Optimized Network Capability

Hardware Acceleration

Scale CPU Cores for the Application

Eliminate Additional Storage Fan-Out Devices

12

[Diagram: integrated controller - Arm compute sits between the network (N x 25G) and the attached SSDs]

Hardware Optimizations for Hyperscale Storage

13

Storage
•  Configurable IO to support any storage service
•  Virtualize any storage device as an NVMe namespace

Networking
•  Optimized for full datapath offload of NVMe-oF
•  Zero copy

Hash / Compression / Encryption / Erasure Codes
•  Line rate throughput / concurrent operation
•  Memory utilization reduced by 60%
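To illustrate the "zero copy" bullet at the software level, the short C sketch below hands a header and a payload to the kernel as a scatter/gather list with writev(), so nothing is coalesced into a staging buffer first; the controller's NVMe-oF datapath offload goes further and DMAs directly between the wire and the drives. The buffers and their contents are made up for the demo.

/* Scatter/gather write: two separate buffers, one syscall, no intermediate copy. */
#include <stdio.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

int main(void)
{
    char header[]  = "hdr:len=11;";
    char payload[] = "hello world";

    struct iovec iov[2] = {
        { .iov_base = header,  .iov_len = strlen(header)  },
        { .iov_base = payload, .iov_len = strlen(payload) },
    };

    if (writev(STDOUT_FILENO, iov, 2) < 0)   /* gathers both buffers in one call */
        perror("writev");
    return 0;
}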

Embedded Scale-Out Storage Controller

Multi-Core 64-bit Arm®

4x25G Target RDMA Ports

24 Flexible Storage Ports

Line Rate Hardware Offload

Power < 25W

14

[Block diagram: multi-core Arm v8 processors with cache; dual DDR3/DDR3L/DDR4 64-bit DRAM with ECC; storage accelerators (NVMe/SAS/SATA) behind 24 high-speed SERDES lanes; network accelerators (RoCEv2/NVMe-oF) behind 4 x 25G Ethernet; and security/encryption, RAID/erasure code, and SHA/compression engines]

Single Controller All Flash Array

100W Fully Populated

Compact Scalable Unit

Integrate Several Arrays into a Single Chassis

15

Up to 12 direct attached M.2

Single Controller Hybrid Array

HDD Cost/Capacity with Improved Performance

Attach HDDs to Unified NVMe-oF Interconnect

16

2x4 SSD and 16 direct attached SAS

NVMe-oF JBOD/JBOF - Appliances

17

Low power footprint for flexible scaling of capacity vs. performance

[Diagram: compute servers (CPUs) reach NVMe-oF JBOD/JBOF appliances over the network]

Distributed Storage Cluster

18

Embedded CPUs enable clustering applications

HW acceleration for erasure code generation

Hierarchical / Hybrid - self-replicating / self-healing

[Diagram: compute servers (CPUs) connected over the network via NVMe-oF to storage CPUs and storage controllers, each with directly attached SSDs]
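A toy C sketch of the erasure-coding step the embedded controller offloads: single XOR parity across K data blocks, the simplest possible code, followed by rebuilding a lost block from the survivors, which is the essence of the self-healing behavior mentioned above. Block count, block size, and contents are arbitrary demo values; production clusters use stronger codes such as Reed-Solomon and run them in hardware at line rate.

/* XOR parity: generate parity over K blocks, then rebuild a lost block. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define K   4       /* data blocks */
#define BLK 8       /* tiny block size for the demo */

int main(void)
{
    uint8_t data[K][BLK], parity[BLK] = {0}, rebuilt[BLK];

    for (int i = 0; i < K; i++)              /* fill blocks with arbitrary bytes */
        memset(data[i], 0x11 * (i + 1), BLK);

    for (int i = 0; i < K; i++)              /* parity = XOR of all data blocks */
        for (int j = 0; j < BLK; j++)
            parity[j] ^= data[i][j];

    /* "Self-healing": block 2 is lost; recover it from parity + survivors. */
    memcpy(rebuilt, parity, BLK);
    for (int i = 0; i < K; i++)
        if (i != 2)
            for (int j = 0; j < BLK; j++)
                rebuilt[j] ^= data[i][j];

    printf("rebuilt block 2 %s the original\n",
           memcmp(rebuilt, data[2], BLK) == 0 ? "matches" : "differs from");
    return 0;
}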

Conclusion

An extension of the storage controller, not another server

Optimize the attachment of storage to the server over the network

Integration ensures storage isn't disproportionately burdened by the cost/power of the disaggregating hardware

Embedded compute enables Software Defined Storage, while hardware offload improves overall efficiency/performance

19

Thank You

20