Taming I/O-hungry Application Beasts with BeeGFS & NVMesh
Sven Breuner, Field CTO, [email protected]
2019-08-22 · 26 slides


Page 1

Taming I/O-hungry Application Beasts with BeeGFS & NVMesh

Sven Breuner, Field CTO, [email protected]

Page 2

About Your Speaker: Sven Breuner

M.Sc. in Computer Science, specialization in Distributed Algorithms & Applications

Joined Fraunhofer Center for High Performance Computing in 2005 to create a new parallel file system, which later became BeeGFS

Co-Founder, CEO & CTO of ThinkParQ in 2013 to provide professional services and to extend BeeGFS for use-cases beyond HPC

Joined Excelero in 2019 as Field CTO to be at the forefront of the paradigm shift to the new generation of NVMe-based storage systems

Page 3

Why is everyone talking about NVMe storage these days?

• Ultra low latency & high throughput storage to…
  • Enable deeper analytics & more realistic simulations for better insights
  • Complete critical tasks in shorter time
  • Allow for better parallelism in multi-user environments
  • Enable a new class of efficient algorithms that are not designed around avoiding disk seeks
• Stay ahead of the competition

But: While it's easy to throw some NVMe drives at a server these days, it's usually considered hard to make them play well with the cluster.

Page 4

Ingredients for ultra low latency & high throughput storage

• Hardware technology for ultra low latency & high throughput: NVMe
• A fast network: Ideally InfiniBand or 100GbE
• Software for network access to NVMe, designed for ultra low latency: NVMesh
• Software for scale-out volumes on top of NVMe drives: NVMesh
• A cluster file system that nicely handles…
  • Small & large files, different access patterns, various system sizes, concurrent access: BeeGFS
• Flexibility to…
  • Have a good system balance (e.g. network interconnect vs drives per server)
  • Pick the hardware (models & amount) that works best for you
  • Keep your solution affordable
  ⇨ Software-defined storage (NVMesh + BeeGFS)
• Easy management, so that you don't have to hug your storage every day to keep it running:
  ⇨ NVMesh + BeeGFS

Page 5

Enter NVMesh…
Turn individual NVMe drives into something that's actually useful for a cluster

Page 6

Ingredient #1: Remote NVMe Access at Local Latency

Page 7

Ingredient #2: Scale-out Volumes

[Diagram: I/O-hungry application beasts on Linux servers use local or parallel file systems, built on logical volumes with multi-pathing on top of ultra-fast remote NVMe access in the OS.]

Page 8

Ingredient #3: Flexible Data Protection

• No Redundancy: Concatenated, Striped (Distributed RAID0)
• Mirroring: Mirrored (Distributed RAID1), Striped & Mirrored (Distributed RAID10)
• Parity-based: N+M Erasure Coding (Distributed RAID5 / RAID6), new, with 90+% usable capacity

Page 9

Ingredient #4: Deployment Your Way

Converged (Local Storage in Servers):
• Single, unified storage pool
• NVMesh runs on all nodes
• NVMesh bypasses server CPU
• Rotating parity
• Linearly scalable

Top-of-Rack Flash:
• Single, unified storage pool
• NVMesh Target runs on storage nodes
• NVMesh Client runs on application servers
• Applications get performance of local storage
• Rotating parity
• Linearly scalable

Page 10

Enter BeeGFS…
Turn fast block storage into fast file storage…

Page 11

Ingredient #5: Turn fast blocks into fast files

BeeGFS is a hardware-independent parallel cluster file system, designed for performance-critical environments.

[Diagram: Files in /mnt/beegfs/dir1 are split into chunks that are striped across Storage Servers #1 to #5, while Metadata Server #1 holds the file metadata.]

Simply grow CAPACITY and PERFORMANCE to the level that you need.
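The striping idea above can be sketched as a simple round-robin mapping of file chunks to storage servers. This is only an illustration of the concept, not BeeGFS's actual placement logic; the chunk size and server count here are made-up example values:

```python
# Illustrative round-robin striping: map each chunk of a file to one of
# the storage servers, the way a parallel file system spreads file data.
# Chunk size and server count are hypothetical, not BeeGFS defaults.
CHUNK_SIZE = 512 * 1024  # 512 KiB per chunk
NUM_SERVERS = 5          # storage servers as in the diagram

def chunk_placement(file_size, first_server=0):
    """Return the storage server index for every chunk of a file."""
    num_chunks = -(-file_size // CHUNK_SIZE)  # ceiling division
    return [(first_server + i) % NUM_SERVERS for i in range(num_chunks)]

# A 2 MiB file has 4 chunks, each landing on a different server,
# so reads and writes proceed in parallel across all of them:
print(chunk_placement(2 * 1024 * 1024))  # [0, 1, 2, 3]
```

Because each file's chunks start on a different server, aggregate capacity and throughput grow with every storage server added, which is the "simply grow" claim above.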

Page 12

BeeGFS Services Overview

▪ CLIENT SERVICE
  ▪ Native Linux module to mount the file system
▪ STORAGE SERVICE
  ▪ Store the (distributed) file contents
▪ METADATA SERVICE
  ▪ Maintain striping information for files
  ▪ Not involved in data access between file open/close
▪ MANAGEMENT SERVICE + GUI
  ▪ Service registry and watchdog

▪ BeeGFS servers assume fast, locally-attached, protected volumes to deliver best-in-class performance ⇨ NVMesh

Page 13

BeeOND: BeeGFS-On-Demand for NVMes you already have

Create parallel file system instances on-the-fly:
▪ Start/stop with one simple command
▪ Can be integrated into cluster batch systems (Slurm, Univa, PBS, …)
▪ Common use case: per-job parallel file system
  ▪ Aggregate the performance and capacity of local drives in the compute nodes of a job
  ▪ Take load off the global storage
  ▪ Speed up "nasty" I/O patterns
▪ NVMesh flexibly creates the volumes and makes them available where needed
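The "one simple command" workflow can be sketched roughly as follows. This is a hedged example: the node file and mount paths are made-up placeholders, and exact flags may differ between BeeGFS releases, so check `beeond -h` on your system:

```shell
# Start a per-job BeeOND instance across the nodes of a job.
#   -n: file listing one hostname per line (here: a hypothetical node list)
#   -d: fast local path on each node, used as BeeGFS storage/metadata target
#   -c: mountpoint where the on-demand file system appears on every node
beeond start -n /tmp/job_nodes.txt -d /mnt/nvmesh_vol -c /mnt/beeond

# ... run the job against /mnt/beeond ...

# Tear the instance down again when the job finishes;
# -L and -d also clean up the local log and data directories.
beeond stop -n /tmp/job_nodes.txt -L -d
```

In a batch system, the start and stop commands typically go into the job prolog and epilog, so each job automatically gets its own scratch file system.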

Page 14

Turbo-boosted Storage with BeeGFS & NVMesh

Sven Breuner, Field CTO, [email protected]

Page 15

BeeGFS + NVMesh: How are they related?

[Diagram: the combined stack on Linux, from NVMe drives through NVMesh volumes up to the BeeGFS mount at /: efficient applications & happy users.]

Page 16

How to boost Storage with BeeGFS & NVMesh

BeeGFS runs seamlessly on top of NVMesh:
• Various booster options
  • Metadata target, Storage Pool, BeeOND, all-flash system
• Data protection
  • NVMesh adds improved mirroring and erasure coding to BeeGFS
• Logical volumes
  • NVMe drives can be shared between BeeGFS and other use-cases
• Easy & flexible monitoring
  • Grafana dashboards can show BeeGFS & NVMesh workloads

Page 17

Typical AI System with BeeGFS + NVMesh in a Box

The DGX-2s here are very happy not to be starving on I/O :-)

Page 18

Recipe for balanced NVMe Servers

What to keep in mind for perfect balance:
• 48 PCIe lanes per Intel socket (96 PCIe lanes for dual socket)
• 16 PCIe lanes per 100Gbps NIC
• 4 PCIe lanes per NVMe drive
• To avoid crossing sockets, you are limited to 1 NIC per socket (because 2 NICs would consume more than half the available lanes)

⇨ Thus, 1 NIC + 4 NVMe drives per Intel socket are optimal (2 NICs + 8 NVMe drives per dual-socket server)
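The lane arithmetic behind this recipe can be checked directly, using only the per-device lane counts quoted above:

```python
# Sanity-check the PCIe lane budget from the slide:
# 48 lanes per Intel socket, 16 per 100Gbps NIC, 4 per NVMe drive.
LANES_PER_SOCKET = 48
LANES_PER_NIC = 16
LANES_PER_NVME = 4

def lanes_used(nics, nvme_drives):
    """Total PCIe lanes consumed by the given devices on one socket."""
    return nics * LANES_PER_NIC + nvme_drives * LANES_PER_NVME

# Two NICs alone (32 lanes) already consume more than half of a
# socket's 48 lanes, which is why the recipe stops at 1 NIC per socket:
assert lanes_used(2, 0) > LANES_PER_SOCKET / 2

# The recommended balance of 1 NIC + 4 NVMe drives uses 32 of 48 lanes,
# leaving headroom for other devices without crossing sockets:
assert lanes_used(1, 4) == 32
```

The balance also matches bandwidth: four NVMe drives at 4 lanes each can roughly saturate one 16-lane 100Gbps NIC, so neither side of the server sits idle.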

Page 19

BeeGFS & NVMesh in a Box

Balanced Reference Design:
• Elegant and dense
  • 4x server in 2U, 24x NVMe, 8x 100Gbps NIC
• Always on
  • Nicely goes with RAID10 or RAID6 (6+2)
• Fully unleashed NVMe performance
  • Random 4K write IOPS boosted to <imagine_crazy_high_number_here>
  • File creates boosted to <imagine_insanely_cool_number_here>

Page 20

BeeGFS & NVMesh Performance "out of the box"

RDMA Fabric (100Gb InfiniBand)

Clients / Compute Nodes: 8x BeeGFS Client, each:
• 2x Intel 2630 CPU, 128 GB RAM
• 1x Mellanox ConnectX-4 100Gb InfiniBand

4-Node 2U Server with 24 NVMe Drives: 4x BeeGFS & NVMesh Storage Node, each:
• 6x WD SN200 3.2TB NVMe
• 2x Intel Xeon 4114 CPU, 192GB RAM
• 2x Mellanox ConnectX-5 100Gb InfiniBand

BeeGFS Storage Node setup, each:
• 4x 100GB MeshProtect 10 volumes for metadata services
• 6x 1.5TB MeshProtect 10 volumes for storage services, OR
• 6x 2.25TB MeshProtect 6 (6+2) volumes for storage services

Page 21

Metadata Write & Read Operations: MeshProtect 10 vs. Buddy Mirror

BeeGFS file creates boosted to 600,000/s (3x improvement on same hardware)

BeeGFS file stats boosted to >60,000,000/s (2.5x improvement on same hardware)

Page 22

Small I/O (4K) Performance: MeshProtect 10 vs. Buddy Mirror

BeeGFS random small writes boosted to >1.25M/s (2.5x improvement on same hardware)

Page 23

Large I/O Performance

[Chart: large-I/O throughput per protection scheme, comparing mirrored volumes at 50% usable capacity with MeshProtect Erasure Coding (6+2) at 75% usable capacity: high throughput, more capacity.]

NVMesh Erasure Coding enables over 90% Usable Capacity (11+1) for BeeGFS at NVMe Speed
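The usable-capacity figures above follow directly from the N+M layout, where N data units are stored for every N+M units of raw capacity:

```python
# Usable capacity of the N+M protection layouts mentioned in this deck.
def usable_fraction(data_units, parity_units):
    """Fraction of raw capacity available for data in an N+M layout."""
    return data_units / (data_units + parity_units)

assert usable_fraction(1, 1) == 0.50   # mirroring / RAID10: 50% usable
assert usable_fraction(6, 2) == 0.75   # MeshProtect 6+2: 75% usable
assert usable_fraction(11, 1) > 0.90   # 11+1: about 91.7%, i.e. over 90%
```

This is why moving from mirroring to 11+1 erasure coding nearly doubles the file system capacity delivered by the same set of NVMe drives.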

Page 24

See what's going on for BeeGFS & NVMesh together

Combine BeeGFS and NVMesh Grafana dashboards to produce a unified, end-to-end cluster view. In this example:
• NVMesh cluster IOPS
• NVMesh cluster throughput
• BeeGFS I/O load
• BeeGFS metadata operations
• BeeGFS file system throughput

Page 25

To sum it up: How do NVMesh & BeeGFS boost your storage?

• Protected Storage & Protected Investment
  • Easy-to-manage software-defined storage with flexible redundancy
  • Freedom to choose components from different manufacturers
  • No lock-in
• Full NVMe Advantage & Scale-out Performance
  • NVMesh provides ultra low latency access to NVMe volumes for BeeGFS over the network
  • Hardware, volumes and service instances can be added on-the-fly

Page 26

Sven Breuner, Field CTO, [email protected]

Thank you! Questions?

Taming I/O-hungry Application Beastswith BeeGFS & NVMesh