Ceph Day New York 2014: Ceph over High Performance Networks



DESCRIPTION

Presented by Asaf Wachtel, Mellanox

TRANSCRIPT

Page 1: Ceph Day New York 2014: Ceph over High Performance Networks

Building World Class Data Centers

Mellanox High Performance Networks for Ceph

Ceph Day, October 8th, 2014

Page 2: Ceph Day New York 2014: Ceph over High Performance Networks


Agenda

Who are we, and what do we have to do with CEPH?

Benefits of High Performance Interconnect for CEPH

Typical deployment scenarios

Advanced performance work in progress

Examples of Commercial Solutions

Page 3: Ceph Day New York 2014: Ceph over High Performance Networks


Leading Supplier of End-to-End Interconnect Solutions

Virtual Protocol Interconnect

[Diagram: end-to-end portfolio spanning Server/Compute (56G InfiniBand, 10/40/56GbE), Switch/Gateway, Storage Front/Back-End (56G IB & FCoIB, 10/40/56GbE & FCoE) and Metro/WAN, covering ICs, adapter cards, switches/gateways, cables/modules and host/fabric software]

Comprehensive End-to-End InfiniBand and Ethernet Portfolio

Page 5: Ceph Day New York 2014: Ceph over High Performance Networks


From Scale-Up to Scale-Out Architecture

• The only way to support storage capacity growth in a cost-effective manner
• We saw this transition on the compute side in HPC in the early 2000s
• Scaling performance linearly requires “seamless connectivity” (i.e. lossless, high bandwidth, low latency, CPU offloads)

Interconnect Capabilities Determine Scale Out Performance

Page 6: Ceph Day New York 2014: Ceph over High Performance Networks


CEPH and Networks

High performance networks enable maximum cluster availability
• Clients, OSDs, Monitors and Metadata Servers communicate over multiple network layers
• Real-time requirements for heartbeat, replication, recovery and re-balancing

Cluster (“backend”) network performance dictates the cluster’s performance and scalability (see the ceph.conf sketch below)
• “Network load between Ceph OSD Daemons easily dwarfs the network load between Ceph Clients and the Ceph Storage Cluster” (Ceph documentation)
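In practice this split between client-facing and backend traffic is expressed in the [global] section of ceph.conf. A minimal sketch, assuming an illustrative 10GbE public subnet and a 40GbE cluster subnet (the addresses are placeholders, not taken from the slides):

    [global]
    # Client <-> Monitor/OSD traffic rides on the public network (e.g. 10GbE/40GbE)
    public network  = 192.168.10.0/24
    # OSD <-> OSD replication, recovery and heartbeat traffic rides on the cluster network (e.g. 40GbE)
    cluster network = 192.168.40.0/24

With this split, the heavy replication and rebalancing traffic the documentation warns about stays off the client-facing links.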

Page 7: Ceph Day New York 2014: Ceph over High Performance Networks


How Customers Deploy CEPH with Mellanox Interconnect

Building Scalable, High-Performing Storage Solutions
• Cluster network @ 40Gb Ethernet
• Clients @ 10Gb/40Gb Ethernet

Directly Connect over 500 Client Nodes
• Target retail cost: US$350/1TB

Scale-Out Customers Use SSDs
• For OSDs and Journals

8.5PB System Currently Being Deployed

Page 8: Ceph Day New York 2014: Ceph over High Performance Networks


CEPH Deployment Using 10GbE and 40GbE

Cluster (Private) Network @ 40GbE
• Smooth HA, unblocked heartbeats, efficient data balancing

Throughput Clients @ 40GbE
• Guarantees line rate for high ingress/egress clients

IOPS Clients @ 10GbE/40GbE
• 100K+ IOPS per client @ 4K blocks

20x Higher Throughput, 4x Higher IOPS with 40Gb Ethernet Clients! (http://www.mellanox.com/related-docs/whitepapers/WP_Deploying_Ceph_over_High_Performance_Networks.pdf)

Throughput testing results based on the fio benchmark: 8M block, 20GB file, 128 parallel jobs, RBD kernel driver with Linux kernel 3.13.3, RHEL 6.3, Ceph 0.72.2. IOPS testing results based on the fio benchmark: 4K block, 20GB file, 128 parallel jobs, RBD kernel driver with Linux kernel 3.13.3, RHEL 6.3, Ceph 0.72.2.
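The fio parameters above can be reproduced against an RBD-backed block device. A rough sketch, assuming the kernel RBD driver has the image mapped at /dev/rbd0 (the device path and job name are placeholders, not from the slides):

    # Throughput test: sequential reads, 8M blocks, 20GB per job, 128 parallel jobs
    fio --name=ceph-throughput --filename=/dev/rbd0 \
        --rw=read --bs=8m --size=20g --numjobs=128 \
        --ioengine=libaio --direct=1 --group_reporting

    # IOPS test: same layout with --bs=4k (the slide does not state whether
    # the reads were sequential or random)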

[Diagram: Admin Node and Ceph Nodes (Monitors, OSDs, MDS) connected over a 40GbE cluster network; Client Nodes connected over a 10GbE/40GbE public network]

Page 9: Ceph Day New York 2014: Ceph over High Performance Networks


I/O Offload Frees Up CPU for Application Processing

[Chart: CPU utilization split into user space and system space, without RDMA vs. with RDMA and offload]
• Without RDMA: ~53% CPU efficiency, ~47% CPU overhead/idle
• With RDMA and offload: ~88% CPU efficiency, ~12% CPU overhead/idle

Page 10: Ceph Day New York 2014: Ceph over High Performance Networks


Phase 1: RDMA (multi-transport) integration of XioMessenger, based on Accelio (a configuration sketch follows the table below)
• Basic integration complete, in upstream contribution process
• http://wiki.ceph.com/Planning/Blueprints/Hammer/Accelio_RDMA_Messenger

Eliminating the XioMessenger bottleneck uncovered several others… Increased focus on performance, with weekly community calls involving Red Hat, Cisco, Fujitsu and others:
• http://pad.ceph.com/p/performance_weekly

CEPH+RDMA: Coming Soon

                    Accelio   Native simple   Xio         Ceph TCP   Ceph RDMA   Expected ramdisk
                              messenger       messenger                          local perf
BW Read (MB/s)      3200*     980**           3200*       950        1300        2534
BW Write (MB/s)     3200*     980**           3200*       435        500         2361

* Measured on a Gen2 PCIe system; 5000 expected on Gen3 systems
** Increased pipe depth to 10000 and 1000000
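For reference, the blueprint-era XioMessenger prototype was switched on through ceph.conf messenger-type options. The sketch below is our assumption about that development branch (the option value "xio" in particular is not confirmed by the slides and may differ in the upstream merge):

    [global]
    # Assumed options from the Accelio/XioMessenger development branch:
    # use the RDMA-capable XioMessenger instead of the default TCP messenger
    ms_type = xio
    # alternatively, restrict XioMessenger to the backend (cluster) network only
    # ms_cluster_type = xio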

Page 11: Ceph Day New York 2014: Ceph over High Performance Networks


Commercial Solutions in the Market Today

ISS: Storage Super Core
• 82,000 IOPS on 512B reads

• 74,000 IOPS on 4KB reads

• 1.1 GB/sec on 256KB reads.

http://www.iss-integration.com/supercore.html

OnyxCCS: ElectraStack

Turnkey IaaS multi-tenant computing system

5X Faster Node/Data Restoration

https://www.onyxccs.com/products/8-series

Scalable Informatics: Unison

Archive-like storage density with tier-1 performance
• 60 drives in a 4U chassis, all driven at full bandwidth

https://www.scalableinformatics.com/unison.html

Daystrom/StorageFoundry: Nautilus

Massive-scale storage platform
• Single client reaching up to 2.2GB/sec on read throughput

Easy integration with existing FC storage

http://storagefoundry.net/

[Slide labels for the solutions above: Healthcare, High Performance Storage, Cloud, Media/Entertainment]

Generally Available, Pre-configured, Pre-tuned Solutions leveraging 40/56Gb/s Cluster Networks

Page 12: Ceph Day New York 2014: Ceph over High Performance Networks


Summary

CEPH cluster scalability and availability rely on high performance networks

End-to-end 40/56Gb/s transport with full CPU offloads is available and being deployed
• 100Gb/s around the corner

RDMA and more performance improvements coming soon

Page 13: Ceph Day New York 2014: Ceph over High Performance Networks

Thank You