FCoE Deep Dive

Post on 24-Apr-2015

DCB and FCoE Deep Dive

Jaromír Pilař (jpilar@cisco.com), Consulting Systems Engineer, CCIE 2910

2006 Cisco Systems, Inc. All rights reserved.

Cisco Confidential

What Is I/O Consolidation?

IT organizations operate multiple parallel networks:
  IP and other LAN protocols over an Ethernet network
  SAN over a Fibre Channel network
  HPC/IPC over an InfiniBand network

I/O consolidation carries all three types of traffic on a single network

Servers have a common interface adapter that supports all three types of traffic

IPC: Inter-Process Communication

Consolidation: one of the major trends in the datacenter

But where is the main consolidation potential?

The majority of ports in a fabric are in the access layer, regardless of fabric type, so the access layer has the highest potential for consolidation

Different fabrics (network, SAN, HPC) have different requirements. Do we have a technology that can serve them all at once?

If we have it, is the technology mature and affordable enough to be deployed at scale?

I/O Consolidation in the Network

[Figure: processor and memory with separate I/O for storage, LAN, and IPC, shown next to processor and memory with a single I/O subsystem carrying storage, LAN, and IPC]

I/O Consolidation in the Host

Fewer CNAs (Converged Network Adapters) instead of NICs, HBAs, and HCAs

Limited number of interfaces for blade servers

[Figure: a server with two FC HBAs (FC traffic), three NICs (Ethernet traffic), and two HCAs (IPC traffic), shown next to a server with two CNAs where all traffic goes over 10 GE]

Cabling and I/O Consolidation

Merging the Requirements

LAN/IP
  Must be Ethernet
    Too much investment
    Too many applications that assume Ethernet

Storage
  Must follow the Fibre Channel model
    Losing frames is not an option

IPC (Inter-Process Communication)
  Doesn't care about the underlying network, provided that:
    It is cheap
    It is low latency
    It supports APIs like OFED, RDS, MPI, sockets

Why Have Consolidation Attempts Not Succeeded Yet?

Previous attempts
  Fibre Channel: never credible as data network infrastructure
  InfiniBand: not Ethernet
  iSCSI: not Fibre Channel

Before PCI-Express there was not enough I/O bandwidth in the servers

It needs to be Ethernet, but 1 GE didn't have enough bandwidth

Drivers for 10GE to the Servers

Multicore CPU architectures allowing bigger and multiple workloads on the same machine

Server virtualization driving the need for more bandwidth per server due to server consolidation

Growing need for network storage driving the demand for higher network bandwidth to the server

Multicore CPUs and server virtualization driving the demand for higher-bandwidth network connections

Enabling Technologies

Three Challenges + One

Why Are Frames Lost?

Collision
  No longer present in full duplex Ethernet

Transmission error
  Very rare in the data center

Congestion
  Most common cause

Congestion is a switch issue, not a link issue
  A full duplex IEEE 802.3 link does not lose frames
  It must be dealt with in the bridge/switch
  By IEEE 802.1

Can Ethernet Be Lossless?

Yes, with the Ethernet PAUSE frame

[Figure: Switch A and Switch B joined by an Ethernet link; Switch B's queue is full, it sends PAUSE, and Switch A stops]

Defined in IEEE 802.3, Annex 31B

The PAUSE operation is used to inhibit transmission of data frames for a specified period of time

Ethernet PAUSE transforms Ethernet into a lossless fabric

How PAUSE Works

[Figure: B's queue reaches a threshold and B sends a PAUSE frame to A; A stops sending frames for the given interval of time, then starts sending frames again]

Let's Compare PAUSE with FC Buffer-to-Buffer Credit

Eight credits preagreed

[Figure: A sends frames to B; B returns R_RDY to A]
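The credit scheme on this slide can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the class name CreditedLink and its method names are hypothetical, one credit stands for one receive buffer, and every R_RDY returns exactly one credit to the sender.

```python
class CreditedLink:
    """Toy model of FC buffer-to-buffer credit between sender A and receiver B."""

    def __init__(self, bb_credit):
        # Number of credits agreed in advance (eight on the slide).
        self.credits = bb_credit
        self.rx_buffers_used = 0

    def send_frame(self):
        """A may transmit only while it holds at least one credit."""
        if self.credits == 0:
            return False  # A must wait; nothing is dropped
        self.credits -= 1
        self.rx_buffers_used += 1
        return True

    def receiver_returns_r_rdy(self):
        """B frees one buffer and sends R_RDY, which restores one credit at A."""
        if self.rx_buffers_used == 0:
            raise RuntimeError("no buffer in use, nothing to acknowledge")
        self.rx_buffers_used -= 1
        self.credits += 1


link = CreditedLink(bb_credit=8)
accepted = [link.send_frame() for _ in range(10)]
print(accepted.count(True), "frames sent before A ran out of credit")
```

The contrast with PAUSE is that the sender here stops on its own when credits run out, whereas with PAUSE the receiver has to tell the sender to stop before its buffers overflow.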

PAUSE Frame Format

A standard Ethernet frame, not tagged

PAUSE frame fields:
  Destination MAC = 01:80:C2:00:00:01
  Source Station MAC
  EtherType = 0x8808
  Opcode = 0x0001
  Pause_Time
  Pad 42 Bytes
  CRC

EtherType = 0x8808 means MAC Control Frame

Opcode = 0x0001 means PAUSE

Pause_Time is the time the link needs to remain paused, in Pause Quanta (512 bit times)

There is a single Pause_Time for the whole link
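As a rough illustration of the layout above, the following Python sketch assembles the PAUSE frame fields with the standard struct module and converts a Pause_Time value into seconds. The function names are hypothetical, the 4-byte CRC is left out because the MAC hardware normally appends it, and the 10 Gb/s link speed in the usage line is just an example value.

```python
import struct

PAUSE_DST_MAC = bytes.fromhex("0180C2000001")  # reserved multicast destination
MAC_CONTROL_ETHERTYPE = 0x8808
PAUSE_OPCODE = 0x0001
PAUSE_QUANTUM_BITS = 512  # one pause quantum is 512 bit times


def build_pause_frame(src_mac, pause_quanta):
    """Return the PAUSE frame bytes without the trailing CRC (60 bytes)."""
    if len(src_mac) != 6:
        raise ValueError("source MAC must be 6 bytes")
    if not 0 <= pause_quanta <= 0xFFFF:
        raise ValueError("Pause_Time is a 16-bit field")
    header = PAUSE_DST_MAC + src_mac + struct.pack("!H", MAC_CONTROL_ETHERTYPE)
    payload = struct.pack("!HH", PAUSE_OPCODE, pause_quanta)
    return header + payload + bytes(42)  # 42 bytes of zero padding


def pause_duration_seconds(pause_quanta, link_bps):
    """Time the link stays paused for a given Pause_Time and link speed."""
    return pause_quanta * PAUSE_QUANTUM_BITS / link_bps


frame = build_pause_frame(bytes.fromhex("020000000001"), 0xFFFF)
print(len(frame), "bytes before CRC")
print(pause_duration_seconds(0xFFFF, 10e9), "seconds at 10 Gb/s")
```

Because the quantum is defined in bit times, the same Pause_Time value pauses a faster link for a shorter wall-clock interval.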

Why Is PAUSE Not Widely Deployed?

Inconsistent implementations
  Standard allows for asymmetric implementations
  Easy to fix

PAUSE applies to the whole link
  Single mechanism for all traffic classes
  This may cause traffic interference
  e.g., storage traffic paused due to congestion on IP traffic

Priority Flow Control (PFC)

a.k.a. PPP (Per Priority Pause)

PFC enables PAUSE functionality per Ethernet priority
  IEEE 802.1Q defines eight priorities
  Traffic classes are mapped to different priorities:
    No traffic interference
    IP traffic may be paused while storage traffic is being forwarded
    Or, vice versa

Requires independent resources per priority (buffers)

High level of industry support
  Cisco distributed proposal
  Standard track in IEEE 802.1Qbb

IEEE 802.1Q Tag
  EtherType = IEEE 802.1Q (16 bits)
  Priority (3 bits)
  CFI (1 bit)
  VLAN ID (12 bits)
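The tag layout (16-bit EtherType, 3-bit priority, 1-bit CFI, 12-bit VLAN ID) can be made concrete with a small Python sketch. The function names are hypothetical; the value 0x8100 is the EtherType assigned to IEEE 802.1Q tags, which the slide names but does not spell out.

```python
import struct

TPID_8021Q = 0x8100  # EtherType that identifies an IEEE 802.1Q tag


def pack_vlan_tag(priority, cfi, vlan_id):
    """Pack the 4-byte 802.1Q tag: EtherType, then priority/CFI/VLAN ID."""
    if not 0 <= priority <= 7:
        raise ValueError("priority is a 3-bit field (0..7)")
    if cfi not in (0, 1):
        raise ValueError("CFI is a single bit")
    if not 0 <= vlan_id <= 0x0FFF:
        raise ValueError("VLAN ID is a 12-bit field")
    tci = (priority << 13) | (cfi << 12) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)


def unpack_vlan_tag(tag):
    """Return (priority, cfi, vlan_id) from a 4-byte 802.1Q tag."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID_8021Q:
        raise ValueError("not an 802.1Q tag")
    return tci >> 13, (tci >> 12) & 0x1, tci & 0x0FFF


tag = pack_vlan_tag(priority=3, cfi=0, vlan_id=100)
print(tag.hex(), unpack_vlan_tag(tag))
```

The three priority bits are what give PFC its eight independently pausable classes.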

Priority Flow Control in Action

[Figure: Switch A's eight transmit queues and Switch B's eight receive queues, one per priority, share one Ethernet link; a PAUSE for a single priority stops only that queue]

PFC Frame Format

Similar to the PAUSE frame

Priority Flow Control frame fields:
  Destination MAC = 01:80:C2:00:00:01
  Source Station MAC
  EtherType = 0x8808
  Opcode = 0x0101
  Class Enable Vector
  Time (Class 0) through Time (Class 7)
  Pad 26 Bytes
  CRC

Opcode = 0x0101 is used to distinguish PFC from PAUSE

Class vector indicates for which priorities the frame carries valid Pause information

There are eight Time fields, one per priority
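A hedged Python sketch of the PFC frame layout follows. The function name is hypothetical, and it assumes that bit n of the class enable vector corresponds to priority n, counting from the least significant bit. The padding is computed so that the frame reaches the 60-byte minimum Ethernet size before the CRC, which the sketch does not generate.

```python
import struct

PFC_DST_MAC = bytes.fromhex("0180C2000001")
MAC_CONTROL_ETHERTYPE = 0x8808
PFC_OPCODE = 0x0101
MIN_FRAME_WITHOUT_CRC = 60


def build_pfc_frame(src_mac, quanta_by_priority):
    """Build a PFC frame (without CRC) pausing the given priorities.

    quanta_by_priority maps a priority (0..7) to its pause time in quanta.
    """
    if len(src_mac) != 6:
        raise ValueError("source MAC must be 6 bytes")
    enable_vector = 0
    times = [0] * 8
    for priority, quanta in quanta_by_priority.items():
        if not 0 <= priority <= 7:
            raise ValueError("priority must be 0..7")
        if not 0 <= quanta <= 0xFFFF:
            raise ValueError("each Time field is 16 bits")
        enable_vector |= 1 << priority  # assumption: bit n marks priority n
        times[priority] = quanta
    header = PFC_DST_MAC + src_mac + struct.pack("!H", MAC_CONTROL_ETHERTYPE)
    payload = struct.pack("!HH8H", PFC_OPCODE, enable_vector, *times)
    pad = bytes(MIN_FRAME_WITHOUT_CRC - len(header) - len(payload))
    return header + payload + pad


# Pause only priority 3 (for example, the class carrying storage traffic).
frame = build_pfc_frame(bytes.fromhex("020000000001"), {3: 0xFFFF})
print(len(frame), frame[14:18].hex())
```

Priorities that are absent from the mapping keep a zero enable bit, so the receiver ignores their Time fields and traffic in those classes keeps flowing.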

Is Anything Else Required?

In order to build a deployable I/O consolidation solution, the following additional components are required:
  Discovery protocol (DCBX)
  Bandwidth manager

DCBX

Hop-by-hop negotiation for:
  Priority Flow Control (PFC)
  Bandwidth management
  Applications
  Logical link-down

Based on LLDP (Link Layer Discovery Protocol)
  Added reliable transport

Allows either full configuration or configuration checking
  Link partners can choose supported features and willingness to accept configuration from peer

Bandwidth Management

IEEE 802.1Q defines priorities, but not a simple, effective, and consistent scheduling mechanism

Products typically implement some form of Deficit Weighted Round Robin (DWRR)
  Configuration and interworking is problematic

Proposal for HW-efficient, two-level DWRR with strict priority support
  Consistent behavior and configuration across network elements
  Standard track in IEEE 802.1Qaz
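For readers unfamiliar with DWRR, here is a minimal single-level sketch in Python. It is not the two-level proposal with strict priority mentioned on the slide; the queue names, frame sizes, and quantum values are hypothetical and chosen only to show how per-queue quanta translate into bandwidth shares.

```python
from collections import deque


def dwrr(queues, quanta, rounds):
    """Serve queues with deficit weighted round robin.

    queues: dict of queue name -> deque of frame sizes in bytes
    quanta: dict of queue name -> bytes of credit added per round
    Returns the list of (queue name, frame size) in transmission order.
    """
    deficit = {name: 0 for name in queues}
    sent = []
    for _ in range(rounds):
        for name, queue in queues.items():
            if not queue:
                deficit[name] = 0  # an idle queue does not bank credit
                continue
            deficit[name] += quanta[name]
            # Send frames while the head-of-line frame fits in the deficit.
            while queue and queue[0] <= deficit[name]:
                size = queue.popleft()
                deficit[name] -= size
                sent.append((name, size))
            if not queue:
                deficit[name] = 0
    return sent


queues = {"lan": deque([1000] * 10), "san": deque([1000] * 10)}
order = dwrr(queues, quanta={"lan": 2000, "san": 1000}, rounds=3)
print(order)
```

With a quantum twice as large, the "lan" queue sends two 1000-byte frames per round for every one frame from "san", which is the bandwidth ratio the weights are meant to express.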

Priority Groups

[Figure: priorities are assigned to individual traffic classes, collected into LAN, SAN, and IPC priority groups; the first level of scheduling happens inside each group, the priority groups are then scheduled, and the result is the final link behavior]

Example of Link Bandwidth Allocation

Configuration on a 10 GE link:
  HPC traffic: priority class High, 20% guaranteed bandwidth
  LAN traffic: priority class Medium, 50% guaranteed bandwidth
  Storage traffic: priority class Medium-High, 30% default bandwidth

Offered traffic (Gb/s)
               T1     T2     T3
  HPC           3      3      2
  LAN           3      4      6
  Storage       3      3      3

Realized traffic utilization of the 10 GE link
               T1     T2     T3
  HPC          30%    30%    20%
  LAN          30%    40%    50%
  Storage      30%    30%    30%
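The behavior in this example, where a class may exceed its guarantee while others leave bandwidth unused, can be approximated by a weighted water-filling calculation. The Python sketch below is one possible model, not the scheduler a switch actually runs; the function name allocate is hypothetical, and it assumes every class has a positive guarantee.

```python
def allocate(link_gbps, guarantees_pct, offered_gbps):
    """Share a link by guaranteed percentages, lending out unused bandwidth.

    guarantees_pct: dict of class name -> guaranteed share in percent
    offered_gbps:   dict of class name -> offered load in Gb/s
    Returns a dict of class name -> realized bandwidth in Gb/s.
    """
    alloc = {name: 0.0 for name in offered_gbps}
    active = [name for name in offered_gbps if offered_gbps[name] > 0]
    remaining = float(link_gbps)
    while active and remaining > 1e-9:
        total_weight = sum(guarantees_pct[name] for name in active)
        spent = 0.0
        still_hungry = []
        for name in active:
            # Each unsatisfied class gets a weighted share of what is left.
            share = remaining * guarantees_pct[name] / total_weight
            need = offered_gbps[name] - alloc[name]
            give = min(share, need)
            alloc[name] += give
            spent += give
            if offered_gbps[name] - alloc[name] > 1e-9:
                still_hungry.append(name)
        remaining -= spent
        if len(still_hungry) == len(active):
            break  # nobody was satisfied, so the link is fully used
        active = still_hungry
    return alloc


guarantees = {"hpc": 20, "lan": 50, "storage": 30}
for label, offered in [
    ("T1", {"hpc": 3, "lan": 3, "storage": 3}),
    ("T2", {"hpc": 3, "lan": 4, "storage": 3}),
    ("T3", {"hpc": 2, "lan": 6, "storage": 3}),
]:
    print(label, allocate(10, guarantees, offered))
```

At T1 and T2 the HPC class receives 3 Gb/s, above its 20% guarantee, because LAN traffic is not using its full 50%. At T3 the LAN class offers 6 Gb/s but is held to 5 Gb/s, since HPC and storage are both using exactly their guaranteed shares.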

FCoE: Fibre Channel over Ethernet
