
Copyright 2015 Bit-isle Inc. All Rights Reserved

Approaching

Open-Source Hyper-Converged OpenStack using 40Gbit Ethernet Network

Ikuo Kumagai – Bit-isle Inc.

Yuki Kitajima – Altima Corp.

Special Thanks: Masayoshi Oka – Netone Systems


Background

The Services of Bit-isle

‣ IDC service

▪ iDCs

- 5 iDCs in the Tokyo metropolitan area and 1 iDC in Osaka.

▪ Network connectivity

- Provides Internet or private network connectivity.

▪ Rental service

- Server and network equipment rentals are available for colocation.

▪ Managed

- Offers a fully managed environment spanning on-premises and colocation in the data center.

‣ There are also cloud services

▪ They are not covered in today's talk.


Hyper-converged infrastructure – our needs

Elements of hyper-convergence

‣ Structured as simply as possible

‣ Deployed as rapidly as possible

‣ Managed in an integrated way

‣ Scalability that is as flexible as possible

Our Concept

‣ No special appliances

‣ No proprietary products


Our Goals ① Easy provisioning

【Goal】

‣ Short lead time at low cost

▪ Supply as fast as possible

▪ Keep as little stock as possible

▪ Keep costs as low as possible

‣ Scale easily

▪ Easy to deploy physical machines

▪ Easy to deploy logical components

【Method】

‣ Keep physical systems as simple as possible

▪ Use simple 1U servers (our service base)

▪ Only two switch systems

▪ Use a Ceph cluster


Network devices

‣ 1 × 40G network for all services

‣ 1 × 1G network for IPMI

OpenStack nodes

‣ 1 × Control and Network node

‣ 5 × Compute and Storage nodes

Deployment node

‣ Juju/MAAS server (bootstrap sketch after the diagram below)

Basic Structure

[Diagram: a router, the MAAS/Juju deployment node, 1 CTRL/NW node, and 5 Compute/OSD nodes connected on the OpenStack segment, with a separate IPMI segment]
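As a rough sketch of how the deployment node is brought up (Juju 1.x-era CLI against a MAAS provider; the environment name "maas" and the exact workflow are assumptions, not stated on the slides):

    juju generate-config      # write ~/.juju/environments.yaml, then fill in the maas-server URL and API key
    juju switch maas          # select the MAAS environment
    juju bootstrap            # MAAS powers a node on via IPMI, installs Ubuntu, and starts the Juju state server
    juju status               # confirm the environment is ready for charm deployment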


Our Goals ② Performance

【Goal】

‣ Much higher performance

▪ The base is provided as open source

▪ Application-specific options are offered for profit

【Method】

▪ Basic servers (spec can be upgraded linearly)

▪ Use 40Gbit/56Gbit switches

▪ PCIe SSD (for the Ceph journal & OSD)


Compute & Storage Server

[Diagram: the compute & storage (resource) servers are connected by 40Gb Ethernet; each node runs a KVM hypervisor hosting multiple VMs, and together they form a Ceph cluster with an SSD for the OSD and an SSD for the journal on every node]

Server fundamentals

Server      HP ProLiant DL360 Gen9
CPU         E5-2690v3 2.60GHz 1P/12C × 2
HDD         SAS 1TB HDD × 2 (RAID1 for the OS)
PCIe SSD    Fusion-io ioDrive 320GB × 1 (1 device for the journal, 1 for the OSD)
40Gbps NIC  Mellanox ConnectX-3 Pro


Server & Storage

Server (HP DL360 Gen9)

PCIe SSD

‣ Fusion-io ioDrive Duo 320GB


Network Device

HW selection

‣ Adapter (NIC)

▪ 10 / 40 / 56GbE

▪ RDMA supported

▪ VXLAN offload supported

‣ Switch

▪ 36 ports × QSFP

▪ 48 ports × SFP+ and 12 ports × QSFP

▪ 12 ports × QSFP (48 ports × SFP+ with breakout cables)

▪ 10 / 40 / 56GbE

▪ 220ns low latency

‣ Best suited for an SDS network


Our Goals ③ Knowledge Sharing

【Goal】

‣ Easy to customize

‣ Deploy servers more easily

‣ Share knowledge

【Method】

‣ Use Juju/MAAS (an open-source deployment toolchain)


Deploy

Using Juju/MAAS

‣ Node setup (by local charms)

▪ Installing the OS

▪ Installing device drivers & network settings

- For the 40G NIC & PCIe SSD drivers

‣ Deploy the Ceph and OpenStack components with the following charms (a deployment sketch follows the list)

cs:trusty/ntp
cs:trusty/ceph
cs:trusty/ceph-osd
cs:trusty/rsyslog
cs:trusty/rsyslog-forwarder-ha
local:trusty/nova-compute
cs:trusty/percona-cluster
cs:trusty/rabbitmq-server-32
cs:trusty/keystone
local:trusty/openstack-dashboard
local:trusty/nova-cloud-controller
cs:trusty/neutron-api
cs:trusty/neutron-gateway
cs:trusty/cinder
cs:trusty/glance
cs:trusty/cinder-ceph
cs:trusty/neutron-openvswitch
cs:trusty/hacluster
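A minimal sketch of how a few of these charms could be deployed and related with Juju 1.x (unit counts, the local charm repository path /opt/charms, and the chosen relations are illustrative assumptions, not the exact commands used here):

    juju deploy cs:trusty/ceph -n 3                                  # Ceph monitors
    juju deploy cs:trusty/ceph-osd -n 5                              # one OSD service unit per compute/storage node
    juju deploy --repository=/opt/charms local:trusty/nova-compute   # customized local charm
    juju add-relation ceph ceph-osd                                  # OSDs join the Ceph cluster
    juju add-relation nova-compute ceph                              # compute nodes consume Ceph (RBD) storage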


Performance Test


Test Items (Network)

[Diagram: two compute nodes, each running KVM and OVS with a VXLAN tunnel between them, hosting the test VMs]

‣ VM-to-VM traffic between physical nodes

‣ 1 – 16 VMs per physical node

‣ Measured with iperf3, TCP & UDP (see the example below)
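The iperf3 runs take roughly the following shape (the server address 10.0.0.10, run length, and UDP datagram size here are placeholders; only the tool, the protocols, and "iperf3 defaults" are stated on the slides):

    # on a VM on compute node 2 (server side)
    iperf3 -s
    # on a VM on compute node 1 (client side)
    iperf3 -c 10.0.0.10 -t 60              # TCP bandwidth test
    iperf3 -c 10.0.0.10 -u -b 0 -l 1400    # UDP test at a given datagram size, unthrottled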


Basic Performance: TCP Bandwidth

Total & average bandwidth in Gbit/s (iperf3 defaults)

VM pairs   1-1    2-2    4-4    8-8    16-16
Total      2.05   3.18   5.74   7.18   10.53
Average    2.05   1.59   1.43   0.90   0.66

[Chart: total and average TCP bandwidth for each VM-pair count]


Basic Performance: UDP Bandwidth by packet size

[Charts: total and average UDP bandwidth for each packet size]


Basic Performance: UDP Latency by packet size

[Charts: latency (jitter) and lost packets for each packet size]


Ceph Cluster

Test Items (IOPS)

[Diagram: three compute nodes running KVM, each hosting test VMs; the nodes are connected over the network to the Ceph cluster, which uses an SSD journal and HDD OSDs on every node; fio runs inside the VMs]

‣ fio (8k block size, 100 jobs)

‣ 1 – 16 VMs (1, 2, or 4 VMs per host; host count: 1 – 4)


Basic Performance of Storage (Bandwidth)

[Charts: total and average bandwidth (8k blocks, MByte/sec) for each VM count]

【FYI】 fio parameters: bs=8k size=10M runtime=60 iodepth=32 numjobs=80 group_reporting (see the sketch below)
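These parameters correspond roughly to an invocation like the following (the target file, rw pattern, ioengine, and direct flag are assumptions; the slides do not state them):

    fio --name=ceph-bench --filename=/mnt/test/fio.dat \
        --rw=randwrite --bs=8k --size=10M --runtime=60 \
        --iodepth=32 --numjobs=80 --group_reporting \
        --ioengine=libaio --direct=1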


Basic Performance of Storage (IOPS)

[Charts: total and average IOPS (8k blocks) for each VM count]

【FYI】 fio parameters: bs=8k size=10M runtime=60 iodepth=32 numjobs=80 group_reporting


More Speed


Counter Plan

In order to use the 40Gbit network more effectively:

‣ Network performance improvement

▪ Using VXLAN offload

- Offload the CPU workload of VXLAN processing to the NIC

▪ Using DPDK

- Reduce the cost of the Linux kernel's network functions

‣ Ceph IO performance improvement

▪ Using Ceph RDMA

- Enable remote direct memory access over Ethernet for the storage cluster


VXLAN offload


VXLAN offload

OVS + Normal NIC [General Understanding]

‣ VXLAN processing is handled by OVS.

‣ This means the CPU does the packet processing for the VXLAN packets.

‣ A normal NIC can NOT take care of

▪ checksum offload, TSO, RSS, etc.


VXLAN offload

What is VXLAN offload?

‣ Offload VXLAN protocol processing to the edge point (the NIC)

‣ The VXLAN offload engine enables TCP/IP offloads

▪ Enables checksum offload, TSO, RSS, GRO

‣ More throughput, lower latency, and lower CPU usage

The VM generates the inner packet; OVS generates the outer packet.
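Whether a NIC actually exposes these offloads can be checked with ethtool; a sketch, assuming the interface is named eth0 (feature names vary by driver and kernel version):

    ethtool -k eth0 | egrep 'tx-udp_tnl|tcp-segmentation|generic-receive'   # list VXLAN/TSO/GRO offload flags
    ethtool -K eth0 tx-udp_tnl-segmentation on                              # enable UDP-tunnel (VXLAN) segmentation offload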


VXLAN offload

HW selection

Model       MCX311A-XCCT     MCX312B-XCCT     MCX313A-BCCT        MCX314A-BCCT
Port        Single 10GbE     Dual 10GbE       Single 10/40/56GbE  Dual 10/40/56GbE
Port type   SFP+             SFP+             QSFP                QSFP
Cable       Copper, Optical (all models)
Host bus    PCIe 3.0 x8 (all models)
Features    VXLAN/NVGRE offload, RDMA, SR-IOV, etc. (all models)
OS          RHEL, SLES, Microsoft Windows Server, FreeBSD, Ubuntu, VMware ESXi (all models)


VXLAN offload result (TCP Bandwidth)

Comparing VXLAN offload with normal

Bandwidth in Gbit/s – with VXLAN offload

VM pairs   1-1     2-2     4-4     8-8     16-16
Total      14.40   21.70   30.00   31.43   24.63
Average    14.40   10.85   7.50    3.93    1.54

Bandwidth in Gbit/s – normal

VM pairs   1-1     2-2     4-4     8-8     16-16
Total      2.05    3.18    5.74    7.18    10.53
Average    2.05    1.59    1.43    0.90    0.66

[Charts: total and average TCP bandwidth for each VM-pair count, with and without VXLAN offload]


DPDK (Data Plane Development Kit)


Virtualization bottleneck

Bandwidth in Gbit/s – with VXLAN offload (same data as the previous slide)

VM pairs   1-1     2-2     4-4     8-8     16-16
Total      14.40   21.70   30.00   31.43   24.63
Average    14.40   10.85   7.50    3.93    1.54

[Chart: total and average TCP bandwidth for each VM-pair count]

Why?


How to use CPU

Allocate CPU cores to DPDK explicitly.

[Diagram: two processors with four cores each; dedicated cores run the DPDK data plane while the remaining cores run Linux and the control plane]
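With OVS-DPDK the core allocation can be made explicit through the Open_vSwitch table; a sketch assuming a DPDK-enabled OVS build (2.6-style configuration) with illustrative core masks:

    ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true   # initialize DPDK in ovs-vswitchd
    ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x2        # core 1 for DPDK housekeeping threads
    ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0xC           # cores 2-3 dedicated to poll-mode (data plane) threads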


Network Bottleneck in the Linux Kernel Stack

‣ Data Plane Development Kit

[Diagram: in the general process, the application reaches the network device through a system call, the kernel stack, a packet copy, and the device driver (plus the hypervisor's kernel and driver); with DPDK, the application uses the DPDK library to drive the device directly, bypassing the kernel network stack]


DPDK

1 to 1 performance

N to N performance

To be verified next month


Ceph RDMA


Ceph RDMA

What is RDMA?

‣ Remote DMA

‣ Zero-copy technology

‣ Protocols

▪ iSER, RoCE, iWARP

[Diagram: in the general process, the application uses the socket API and the kernel's socket/TCP layers and device driver to reach the network device; in the RDMA process, the application uses the RDMA verbs API and data moves directly between the device and application memory, bypassing the kernel TCP stack]


Ceph RDMA

An RDMA network suits flash storage

RDMA advantages for Ceph

‣ Reduces the hypervisors' CPU workload for IO transactions

‣ Much faster IO for east-west traffic and fail-over (fail-back)

‣ Higher throughput and IOPS

[Diagram: per-IO latency breakdown – total 45 µsec without RDMA vs. total 25.7 µsec with RoCE]


Ceph RDMA

Ceph supports RDMA

‣ v0.94 Hammer released

https://ceph.com/releases/v0-94-hammer-released/

http://tracker.ceph.com/projects/ceph/wiki/Accelio_RDMA_Messenger

Function: XioMessenger / Library: Accelio
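A hedged sketch of enabling the experimental XioMessenger in ceph.conf, based on the Accelio RDMA Messenger wiki page linked above (the option name and behavior depend on the build; this was experimental in Hammer):

    [global]
    ms_type = xio    # use the Accelio-based XioMessenger instead of the default messenger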


Ceph RDMA Result

For reference purposes only

‣ 3-node Ceph cluster; fio accessing RBD directly

[Charts: bandwidth and IOPS]


Summary

VXLAN offload is one of the effective solutions

The other solutions require continued verification

To be continued.


Next Plan

More Performance

‣ Network workload offload

‣ Increase memory (DIMM NAND flash)

‣ NVMe SSD and DIMM Storage

Scale Flexibility

‣ Scale Internet Gateway (SDN or NFV)

‣ Multi-region scaling


Thank you. See you in our DCs.