Advanced Storage Area Network Design
Edward Mazurek
Technical Lead Data Center Storage Networking
@TheRealEdMaz
BRKSAN-2883
Agenda
• Introduction
• Technology Overview
• Design Principles
• Storage Fabric Design Considerations
• Data Center SAN Topologies
• Intelligent SAN Services
• Q&A
An Era of Massive Data Growth: Creating New Business Imperatives for IT
By 2020 (IDC, April 2014: The Digital Universe of Opportunities: Rich Data and the Increasing Value of the Internet of Things):
• 10X increase in data produced (from 4.4T GB to 44T GB)
• 32B IoT devices will be connected to the Internet
• 85% of data will be data for which enterprises have liability and responsibility
• 40% of data will be "touched" by cloud
Evolution of Storage Networking
• Block and/or file arrays; enterprise apps (OLTP, VDI, etc.); big data and scale-out NAS; cloud (object) storage with compute nodes behind a REST API
• Multi-protocol (FC, FICON, FCIP, FCoE, NAS, iSCSI, HTTP)
• Performance (16G FC, 10GE, 40GE, 100GE)
• Scale (tens of thousands of physical/virtual devices, billions of objects)
• Operational simplicity (automation, self-service provisioning)
Enterprise Flash Drives = More IO
• Drive performance hasn’t changed since 2003 (15K drives)
• Supports new application performance requirements
• Price/performance making SSD more affordable
• Solid state drives dramatically increase IOPS that a given array can support
• Increased IO directly translates to increased throughput
Significantly more IO/s per drive at much lower response time
[Chart: Response Time (msec) vs. IOPs — 100% random read miss, 8KB, one drive per DA processor (8 processors). SATA drives (8 drives) and 15K rpm drives (8 drives) saturate at low IOPs, while Enterprise Flash Drives (8 drives) sustain far higher IOPs at much lower response time.]
Fibre Channel – Foundations
• Foundational protocol, forms the basis of an I/O transaction
• Communications are based upon Point to Point
• Storage is accessed at a block-level via SCSI
• High-performance interconnect providing high I/O throughput
• The foundation for all block-based storage connectivity
• Mature - SCSI-1 developed in 1986
Based on SCSI
[Diagram: Host (Initiator) connected to Disk (Target) over a SCSI I/O channel, for both SCSI READ and SCSI WRITE operations.]
Fibre Channel - Communications
Point-to-point oriented
• Facilitated through device login
N_Port-to-N_Port connection
• Logical node connection point
Flow controlled
• On a buffer-to-buffer and an end-to-end basis
Acknowledged
• For certain classes of traffic, none for others
Multiple connections allowed per device
[Diagram: transmitter/receiver pairs on N_ports — host (initiator) connected directly to disk (target), and host (initiator) connected to a SAN switch.]
Fibre Channel Addressing
Every Fibre Channel port and node has a 64-bit hard-coded address called a World Wide Name (WWN):
• NWWN (node) uniquely identifies a device
• PWWN (port) uniquely identifies each port in a device
• Allocated to the manufacturer by the IEEE
• Coded into each device when manufactured
Switch Name Server maps PWWN to FCID
Example: a dual-port HBA with PWWNs 10:00:00:00:c9:6e:a8:16 and 10:00:00:00:c9:6e:a8:17, and an array port with PWWN 50:0a:09:83:9d:53:43:54.

phx2-9513# show int fc 1/1
fc1/1 is up
    Hardware is Fibre Channel, SFP is short wave laser
    Port WWN is 20:01:00:05:9b:29:e8:80

WWN format (64 bits):
  Format Identifier (4 bits) | Port or Node Identifier (12 bits) | IEEE Organizational Unique ID (OUI, 24 bits, assigned to each vendor) | Locally Assigned Identifier (24 bits, vendor-unique assignment)
Port Initialization – FLOGIs and PLOGIs
[Diagram: initiator HBA N_Port attached to an F_Port on the FC fabric, with E_Ports between switches and the target on the far side.]
Step 1: Fabric Login (FLOGI)
• Determines the presence or absence of a Fabric
• Exchanges Service Parameters with the Fabric
• Switch identifies the WWN in the service parameters of the accept frame and assigns a Fibre Channel ID (FCID)
• Initializes the buffer-to-buffer credits
Step 2: Port Login (PLOGI)
• Required between nodes that want to communicate
• Similar to FLOGI – Transports a PLOGI frame to the destination node port
• In P2P topology (no fabric present), initializes buffer-to-buffer credits
FC_ID Address Model
• FC_ID address models help speed up FC routing
• Switches assign FC_ID addresses to N_Ports
• Some addresses are reserved for fabric services
• Private loop devices only understand the 8-bit address (0x0000xx)
• FL_Port can provide proxy service for public address translation
• Maximum switch domains = 239 (based on standard)

FC_ID formats (24 bits):
  Switch topology model:     Switch Domain (8 bits) | Area (8 bits) | Device (8 bits)
  Public loop device model:  Switch Domain (8 bits) | Area (8 bits) | Arbitrated Loop Physical Address (AL_PA, 8 bits)
  Private loop device model: 00 (8 bits) | 00 (8 bits) | AL_PA (8 bits)
FSPF – Fabric Shortest Path First
• Provides routing services within any FC fabric
• Supports multipath routing
• Bases path status on a link state protocol similar to OSPF
• Routes hop by hop, based only on the domain ID
• Runs on E ports or TE ports and provides a loop free topology
• Runs on a per VSAN basis. Connectivity in a given VSAN in a fabric is guaranteed only for the switches configured in that VSAN.
• Uses a topology database to keep track of the state of the links on all switches in the fabric and associates a cost with each link
• Fibre Channel standard ANSI T11 FC-SW2
FSPF
phx2-5548-3# show fspf database vsan 12
FSPF Link State Database for VSAN 12 Domain 0x02(2)
LSR Type = 1
Advertising domain ID = 0x02(2)
LSR Age = 1400
Number of links = 4
NbrDomainId IfIndex NbrIfIndex Link Type Cost
-----------------------------------------------------------------------------
0x01(1) 0x00010101 0x00010003 1 125
0x01(1) 0x00010100 0x00010002 1 125
0x03(3) 0x00040000 0x00040000 1 62
0x03(3) 0x00040001 0x00040001 1 125
FSPF Link State Database for VSAN 12 Domain 0x03(3)
LSR Type = 1
Advertising domain ID = 0x03(3)
LSR Age = 1486
Number of links = 2
NbrDomainId IfIndex NbrIfIndex Link Type Cost
-----------------------------------------------------------------------------
0x02(2) 0x00040000 0x00040000 1 62
0x02(2) 0x00040001 0x00040001 1 125
phx2-5548-3#
[Diagram: 5548 (domain D2) and 9148-2 (domain D3) connected by a two-member 16G port-channel (fc1/1–fc2/13, fc1/2–fc2/14) plus additional 16G and 8G ISLs (fc1/3–fc2/15, fc1/4–fc2/16). In the database above, cost 62 corresponds to a 16G link and cost 125 to an 8G link.]
Fibre Channel FC-2 Hierarchy
[Diagram: a ULP information unit maps to an Exchange (identified by OX_ID and RX_ID); each Exchange is made up of Sequences (identified by SEQ_ID and counted by SEQ_CNT); each Sequence is made up of Frames.]
• Multiple exchanges are initiated between initiators (hosts) and targets (disks)
• Each exchange consists of one or more sequences; the exchange may be bidirectional
• Each sequence consists of one or more frames
• For the SCSI3 ULP, each exchange maps to a SCSI command
What Is FCoE?
From a Fibre Channel standpoint it’s
• FC connectivity over a new type of cable called… Ethernet
From an Ethernet standpoint it's
• Yet another ULP (Upper Layer Protocol) to be transported
It’s Fibre Channel
FC-0 Physical Interface
FC-1 Encoding
FC-2 Framing & Flow Control
FC-3 Generic Services
FC-4 ULP Mapping
Ethernet Media Access Control
Ethernet Physical Layer
FC-2 Framing & Flow Control
FC-3 Generic Services
FC-4 ULP Mapping
FCoE Logical End Point
24
Standards for FCoE
FCoE is fully defined in the FC-BB-5 standard. FCoE works alongside additional technologies to make I/O consolidation a reality:

  Body            Standard        Role                              Status
  T11             FC-BB-5         FCoE; FC on other network media   Technically stable October 2008; completed June 2009; published May 2010
  IEEE 802.1 DCB  PFC 802.1Qbb    Lossless Ethernet                 Sponsor ballot July 2010; published Fall 2011
  IEEE 802.1 DCB  ETS 802.1Qaz    Priority grouping                 Sponsor ballot October 2010; published Fall 2011
  IEEE 802.1 DCB  DCBX 802.1Qaz   Configuration verification        Sponsor ballot October 2010; published Fall 2011
FCoE Flow Control: IEEE 802.1Qbb Priority Flow Control
• The VLAN tag enables 8 priorities for Ethernet traffic (IEEE 802.1p)
• PFC enables flow control on a per-priority basis using PAUSE frames
• The receiving device/switch sends a Pause frame when its receive buffer passes a threshold
• Two types of Pause frames:
  • Quanta = 65535 = 3.3 ms pause (at 10GE)
  • Quanta = 0 = immediate resume
• Distance support is determined by how much buffer is available to absorb the data in flight after the Pause frame is sent
[Diagram: FCoE traffic on the Ethernet wire being paused and resumed in 3.3 ms increments.]
ETS: Enhanced Transmission Selection (IEEE 802.1Qaz)
• Allows you to create priority groups
• Can guarantee bandwidth
• Can assign bandwidth percentages to groups (e.g., 80% FCoE / 20% other traffic on the same wire, as in the sketch below)
• Not all priorities need to be used or in groups
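A hedged Nexus-style queuing sketch of an 80/20 ETS split (the policy name is illustrative and exact syntax varies by platform and release; class-fcoe is the system-defined FCoE class on Nexus 5000):

policy-map type queuing ets-sketch
  class type queuing class-fcoe
    bandwidth percent 80        ! guaranteed to FCoE under congestion
  class type queuing class-default
    bandwidth percent 20        ! remainder for other traffic
system qos
  service-policy type queuing output ets-sketch

Note that ETS guarantees are minimums under congestion; an idle class's bandwidth remains available to the others.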
FCoE Is Really Two Different Protocols
FCoE itself:
• Is the data plane protocol
• It is used to carry most of the FC frames and all the SCSI traffic
• Ethertype 0x8906
FIP (FCoE Initialization Protocol):
• Is the control plane protocol
• It is used to discover the FC entities connected to an Ethernet cloud
• It is also used to log in to and log out from the FC fabric
• Uses the unique BIA on the CNA for its MAC address
• Ethertype 0x8914
The two protocols have two different Ethertypes and two different frame formats; both are defined in FC-BB-5.
http://www.cisco.com/en/US/prod/collateral/switches/ps9441/ps9670/white_paper_c11-560403.html
FPMA – Fabric Provided MAC Address: FCoE Addressing Scheme
• An FPMA is assigned for each FCID
• The FPMA is composed of an FC-MAP and an FCID
• The FC-MAP (Mapped Address Prefix, 0E-FC-xx) is the upper 24 bits of the FPMA
• The FCID is the lower 24 bits of the FPMA
• FCoE forwarding decisions are still made based on FSPF and the FCID within the FPMA
[Diagram: an FC fabric with Domain ID 10 and Domain ID 11; devices with FCIDs 10.00.01 and 11.00.01 get FPMAs formed by prepending the FC-MAP (0E-FC-xx) to the FCID.]
What Is an FCoE Switch?
• The FCF (Fibre Channel Forwarder) accepts a Fibre Channel frame encapsulated in an Ethernet packet and forwards that packet over a VLAN across an Ethernet network to a remote FCoE end device
• The FCF is a logical FC switch inside an FCoE switch
• Fibre Channel login happens at the FCF
• Contains an FCF-MAC address
• Consumes a Domain ID
• FCoE encapsulation/decapsulation happens within the FCF
• NPV devices are not FCFs and do not have domains
[Diagram: MDS and Nexus FCFs connecting FC-attached and FCoE-attached storage.]
FCoE Is Operationally Identical
• Supports both FC and FCoE
• FCoE is treated exactly the same as FC
• After zoning, devices perform registration and then perform discovery

phx2-9513# show fcns database vsan 42
VSAN 42:
--------------------------------------------------------------------------
FCID      TYPE  PWWN                     (VENDOR)  FC4-TYPE:FEATURE
--------------------------------------------------------------------------
0xac0600  N     50:0a:09:83:8d:53:43:54  (NetApp)  scsi-fcp:target
0xac0700  N     50:0a:09:84:9d:53:43:54  (NetApp)  scsi-fcp:target
0xac0c00  N     20:41:54:7f:ee:07:9c:00  (Cisco)   npv
0xac1800  N     10:00:00:00:c9:6e:b7:f0            scsi-fcp:init fc-gs
0xef0000  N     20:01:a0:36:9f:0d:eb:25            scsi-fcp:init fc-gs

Which are the FCoE hosts?
After Link Is Up, Accessing Storage: FIP and FCoE Login Process
Step 1: FIP Discovery Process
• Enables FCoE adapters to discover which VLAN to transmit and receive FCoE frames on
• Enables FCoE adapters and FCoE switches to discover other FCoE-capable devices
• Occurs over lossless Ethernet
Step 2: FIP Login Process
• Similar to the existing Fibre Channel login (FLOGI) process – sent to the upstream FCF
• The FCF assigns the host an FCID and returns the FCID and the Fabric Provided MAC Address (FPMA) to the ENode, to be used for FCoE forwarding
[Diagram: ENode CNA VN_Port logging in through a VF_Port on the FCF, with E_Ports or VE_Ports toward the FC or FCoE fabric and target.]
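A minimal Nexus-style FCoE access sketch over which this FIP discovery and login would run (the VLAN, VSAN, and interface numbers are illustrative):

feature fcoe
vlan 100
  fcoe vsan 42                 ! map the FCoE VLAN to a VSAN
interface vfc11
  bind interface ethernet1/1   ! virtual FC port bound to the CNA-facing port
  no shutdown
vsan database
  vsan 42 interface vfc11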
SCSI Is the Foundation for All
[Diagram: the operating system issues SCSI; SCSI rides over FCP on a native Fibre Channel physical wire; over iSCSI/TCP/IP/Ethernet; over FCP/FCIP/TCP/IP/Ethernet; and over FCP/FCoE/lossless Ethernet.]
Connectivity Types
[Diagram: FC — E/TE ISLs between switches, a blade server chassis NPV switch NP-port to a switch F-port, and initiator/target N-ports to switch F-ports. FCoE — VE-port ISLs between switches, a blade server chassis VNP-port to a VF-port, and initiator/target VN-ports to switch VF-ports.]
Fibre Channel Port Types – Summary
• E_Port ↔ E_Port and TE_Port ↔ TE_Port: ISLs between fabric switches (TE when trunking VSANs); VE_Port ↔ VE_Port for FCoE ISLs
• F_Port ↔ N_Port and VF_Port ↔ VN_Port: fabric switch to end node
• F_Port ↔ NP_Port and VF_Port ↔ VNP_Port: fabric switch to NPV switch uplink
• G_Port: a generic port that has not yet come up as a specific type
The Story of Interface Speeds
• Comparing speeds is more complex than just the "apparent" speed
• Data throughput is based on both the interface clocking (how fast the interface transmits) and how efficiently the interface transmits (how much encoding overhead)

  Protocol   Clocking (Gbps)  Encoding  Data Rate (Gbps)  Data Rate (MB/s)
  8G FC      8.500            8b/10b    6.8               850
  10G FC     10.51875         64b/66b   10.2              1,275
  10G FCoE   10.3125          64b/66b   10.0              1,250
  16G FC     14.025           64b/66b   13.6              1,700
  32G FC     28.050           64b/66b   27.2              3,400
  40G FCoE   41.250           64b/66b   40.0              5,000

For example, 8G FC clocks at 8.500 Gbps but uses 8b/10b encoding, so only 8 of every 10 bits carry data: 8.500 × 0.8 = 6.8 Gbps = 850 MB/s.
VSANs
• A Virtual SAN (VSAN) provides a method to allocate ports within a physical fabric and create virtual fabrics
• Analogous to VLANs in Ethernet
• Virtual fabrics created from a larger, cost-effective, redundant physical fabric
• Reduces the wasted ports of a SAN-island approach
• Fabric events are isolated per VSAN, which gives further isolation for high availability
• FC features can be configured on a per-VSAN basis
• Introduced in 2002; standardized by the ANSI T11 committee as Virtual Fabrics and now part of the Fibre Channel standards
• Allocation is per port
VSAN
• Assign ports to VSANs
• Logically separate fabrics
• Hardware enforced
• Prevents fabric disruptions
• RSCN sent within fabric only
• Each fabric service (zone server, name server, login server, etc.) operates independently in each VSAN
• Each VSAN is configured and managed independently

vsan database
  vsan 2 interface fc1/1
  vsan 2 interface fc1/2
  vsan 4 interface fc1/8
  vsan 4 interface fc1/9

phx2-9513# show fspf vsan 43
FSPF routing for VSAN 43
FSPF routing administration status is enabled
FSPF routing operational status is UP
It is an intra-domain router
Autonomous region is 0
MinLsArrival = 1000 msec , MinLsInterval = 2000 msec
Local Domain is 0xe6(230)
Number of LSRs = 3, Total Checksum = 0x00012848

phx2-9513# show zoneset active vsan 43
zoneset name UCS-Fabric-B vsan 43
  zone name UCS-B-VMware-Netapp vsan 43
Zoning & VSANs
[Diagram: VSAN 2 contains Host1, Host2, and Disk1–Disk4 in Zones A, B, and C under Zoneset 1; VSAN 3 contains Host3, Host4, Disk5, and Disk6 in Zones A and B under its own Zoneset 1.]
1. Assign physical ports to VSANs
2. Configure zones within each VSAN – a zone consists of multiple zone members
3. Assign zones to a zoneset – each VSAN has its own zoneset
4. Activate the zoneset in the VSAN (a minimal CLI sketch follows this list)
• Members in a zone can access each other; members in different zones cannot access each other
• Devices can belong to more than one zone
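A minimal NX-OS sketch of the four steps (the interface, VSAN, zone name, and pWWN values are illustrative):

vsan database
  vsan 2 interface fc1/1                ! 1. assign the port to a VSAN
zone name ZoneA vsan 2                  ! 2. define a zone and its members
  member pwwn 10:00:00:00:c9:6e:a8:16
  member pwwn 50:0a:09:83:9d:53:43:54
zoneset name Zoneset1 vsan 2            ! 3. add the zone to the VSAN's zoneset
  member ZoneA
zoneset activate name Zoneset1 vsan 2   ! 4. activate the zoneset in the VSAN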
Zoning Examples
• Non-zoned devices are members of the default zone
• A physical fabric can have a maximum of 16,000 zones (9700-only network)
• Member attributes can include pWWN, FC alias, FCID, fWWN, switch interface fc x/y, symbolic node name, and device alias

zone name AS01_NetApp vsan 42
  member pwwn 20:03:00:25:b5:0a:00:06
  member pwwn 50:0a:09:84:9d:53:43:54

device-alias name AS01 pwwn 20:03:00:25:b5:0a:00:06
device-alias name NTAP pwwn 50:0a:09:84:9d:53:43:54

zone name AS01_NetApp vsan 42
  member device-alias AS01
  member device-alias NTAP
The Trouble with Sizable Zoning: All Zone Members Are Created Equal
• The standard zoning model just has "members"
• Any member can talk to any other member
• Recommendation: 1-1 zoning
• Each pair consumes an ACL entry in TCAM
• Result: n*(n-1) entries
[Chart: number of ACL entries vs. number of zone members; entries grow quadratically, approaching 10,000 at 100 members.]
Smart Zoning
• Feature added in NX-OS 5.2(6)
• Allows storage admins to create larger zones while still keeping the premise of single initiator & single target
• Dramatic reduction in SAN administrative time for zoning
• Utility to convert an existing zone or zoneset to Smart Zoning

Example: one zone with 8 initiators and 4 targets (see the sketch after this table):

  Operation         Today – 1:1 Zoning     Today – Many-to-Many   Smart Zoning
                    Zones  Cmds  ACLs      Zones  Cmds  ACLs      Zones  Cmds  ACLs
  Create zone(s)    32     96    64        1      13    132       1      13    64
  Add an initiator  +4     +12   +8        —      +1    +24       —      +1    +8
  Add a target      +8     +24   +16       —      +1    +24       —      +1    +16
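A hedged sketch of what Smart Zoning looks like in NX-OS (the VSAN number and device-alias names are illustrative, and the convert utility's exact syntax may vary by release):

zone smart-zoning enable vsan 42        ! enable Smart Zoning for the VSAN
zone name AS_NetApp vsan 42
  member device-alias AS01 init         ! tagged as an initiator
  member device-alias AS02 init
  member device-alias NTAP target       ! tagged as a target
zone convert smart-zoning vsan 42       ! convert existing zones/zonesets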
Zoning Best Practices
• zone mode enhanced
• Acquires lock on all switches while zoning changes are underway
• Enables full zoneset distribution
• zone confirm-commit
• Causes zoning changes to be displayed during zone commit
• zoneset overwrite-control – New in NX-OS 6.2(13)
• Prevents a different zoneset than the currently activated zoneset from being inadvertently activated
Note: the above settings are per-VSAN; a minimal sketch follows.
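A minimal sketch applying all three practices to one VSAN (the VSAN number is illustrative):

zone mode enhanced vsan 42          ! lock-protected changes, full zoneset distribution
zone confirm-commit enable vsan 42  ! display pending changes at commit time
zoneset overwrite-control vsan 42   ! guard against activating the wrong zoneset (NX-OS 6.2(13)+)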
IVR – Inter-VSAN Routing
[Diagram: the same VSAN 2 / VSAN 3 zoning topology as before, with IVR allowing selected devices to communicate across the VSAN boundary.]
• Enables devices in different VSANs to communicate
• Allows selective routing between specific members of two or more VSANs
• Traffic flows only between the selected devices
• Resource sharing, e.g., tape libraries and disks
• IVR zoneset: a collection of IVR zones that must be activated to be operational (see the sketch below)
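A hedged sketch of an IVR configuration (the pWWNs and names are illustrative; IVR NAT and CFS distribution are commonly enabled alongside auto topology):

feature ivr
ivr nat                                  ! translate FCIDs between VSANs
ivr distribute                           ! distribute IVR config via CFS
ivr vsan-topology auto                   ! discover the IVR VSAN topology
ivr zone name Tape_Share
  member pwwn 50:0a:09:84:9d:53:43:54 vsan 2
  member pwwn 10:00:00:00:c9:6e:a8:16 vsan 3
ivr zoneset name IVR_ZS1
  member Tape_Share
ivr zoneset activate name IVR_ZS1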
Forward Error Correction – FEC
• Allows for the correction of some errors in frames
• Almost zero latency penalty
• Can prevent SCSI timeouts and aborts
• Applies to MDS 9700 FC and MDS 9396S
• Applies to 16G fixed-speed FC ISLs only (switchport speed 16000)
• Configured via: switchport fec tts
• No reason not to use it! (See the sketch after the output below.)
9710-2# show interface fc1/8
fc1/8 is trunking
…
Port mode is TE
Port vsan is 1
Speed is 16 Gbps
Rate mode is dedicated
Transmit B2B Credit is 500
Receive B2B Credit is 500
B2B State Change Number is 14
Receive data field Size is 2112
Beacon is turned off
admin fec state is up
oper fec state is up
Trunk vsans (admin allowed and active) (1-2,20,237)
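Putting it together, a minimal sketch of enabling FEC on a 16G ISL (the interface number is illustrative):

interface fc1/8
  switchport speed 16000   ! FEC requires 16G fixed speed
  switchport fec tts       ! negotiate FEC via Transmitter Training Signal
  no shutdown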
Trunking & Port Channels
Up to 16 links can be combined into a PortChannel, increasing the aggregate bandwidth by distributing traffic granularly among all functional links in the channel.
Load balances across multiple links and maintains optimum bandwidth utilization. Load balancing is based on the source ID, destination ID, and exchange ID.
If one link fails, traffic previously carried on this link is switched to the remaining links. To the upper protocol the link is still there, although the bandwidth is diminished; the routing tables are not affected by link failure.
A single-link ISL or a PortChannel ISL can be configured to become an EISL (TE_Port):
• Traffic engineering by pruning VSANs on/off the trunk
• Efficient use of ISL bandwidth
[Diagram: TE ports trunking VSANs 1–3 over a single trunk and over a PortChannel of E/TE ports.]
N-Port Virtualization: Scaling Fabrics with Stability
• N-Port Virtualizer (NPV) utilizes NPIV functionality to allow a “switch” to act like a server/HBA performing multiple fabric logins through a single physical link
• Physical servers connect to the NPV switch and login to the upstream NPIV core switch
• No local switching is done on an FC switch in NPV mode
• FC edge switch in NPV mode does not take up a domain ID
• Helps to alleviate domain ID exhaustion in large fabrics
[Diagram: Server1–Server3 log in through a blade-server NPV switch whose NP-port connects to an F_Port on the FC NPIV core switch; each server gets its own N_Port_ID.]

On the NPIV core switch:
phx2-9513(config)# feature npiv
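For the edge side, a hedged sketch of putting a switch into NPV mode (enabling NPV is disruptive and reloads the switch on many platforms; the interface number is illustrative):

feature npv                 ! disruptive: the switch reloads and the config is reset
interface fc1/5
  switchport mode NP        ! uplink toward the NPIV core
  no shutdown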
Comparison Between NPIV and NPV
NPIV (N-Port ID Virtualization)
• Used by HBAs and FC switches
• Enables multiple logins on a single interface
• Allows the SAN to control and monitor virtual machines (VMs)
• Used for VMware, MS Virtual Server, and Linux Xen applications
NPV (N-Port Virtualizer)
• Used by FC switches (MDS 9124, 9148, 9148S, etc.), FCoE switches (Nexus 5K), blade switches, and Cisco UCS Fabric Interconnects (UCS 6100)
• Aggregates multiple physical/logical logins to the core switch
• Addresses the explosion of the number of FC switches
• Used for server consolidation applications
NPV Uplink Selection
NPV supports automatic selection of NP uplinks. When a server interface is brought up, the NP uplink interface with the minimum load is selected from the available NP uplinks in the same VSAN as the server interface.
When a new NP uplink interface becomes operational, the existing load is not redistributed automatically to include the newly available uplink. Server interfaces that become operational after the NP uplink can select the new NP uplink.
The manual method, NPV traffic maps, associates one or more NP uplink interfaces with a server interface (see the sketch below).
Note: with parallel NPV links, each server login is pinned to one NPV link; with SAN port-channels under NPV, actual traffic is load balanced across the members.
http://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus5000/sw/configuration/guide/cli_rel_4_0_1a/CLIConfigurationGuide/npv.html#wp1534672
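A hedged sketch of a traffic map pinning a server port to a specific uplink (the interface numbers are illustrative):

npv traffic-map server-interface fc1/1 external-interface fc1/5   ! pin server port to uplink
show npv traffic-map      ! verify the mapping
show npv status           ! see server and external interface state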
NPV Uplink Selection – UCS Example
• NPV uplink selection can be automatic or manual
• With UCS autoselection, the vHBAs will be uniformly assigned to the available uplinks depending on the number of logins on each uplink
[Diagram: Cisco UCS blade-server NPV switch with NP-ports fc1/1–fc1/6 up to F_Ports on the FC NPIV core switch.]
Uplink Port Failure
• Failure of an uplink moves pinned hosts from the failed port to the remaining up port(s)
• Path selection is the same as when new hosts join the NPV switch and a pathing decision is made
[Diagram: one NP uplink to the FC NPIV core switch goes down; the two devices pinned to it re-login through the surviving uplinks.]
Uplink Port Recovery
• No automatic redistribution of hosts to the recovered NP port
[Diagram: the failed uplink comes back up, but previously moved logins stay where they are.]
New F-Port Attached Host
• A new host entering the fabric is automatically pinned to the recovered NP_Port
• Previously pinned hosts are still not automatically redistributed
[Diagram: a new server login lands on the recovered uplink.]
New NP_Port & New F-Port Attached Host
• NPV continues to distribute new hosts joining the fabric
[Diagram: a new uplink port is added; new server logins are distributed across the available uplinks.]
Auto-Load-Balance

npv_switch(config)# npv auto-load-balance disruptive

This is disruptive. Disruptive load balance works independently of automatic interface selection and of a configured traffic map of external interfaces. It forces reinitialization of the server interfaces to achieve load balance when it is enabled and whenever a new external interface comes up. To avoid flapping the server interfaces too often, enable this feature once and then disable it whenever the needed load balance is achieved.
If disruptive load balance is not enabled, you need to manually flap the server interface to move some of the load to a new external interface.
http://www.cisco.com/c/en/us/td/docs/switches/datacenter/mds9000/sw/6_2/configuration/guides/interfaces/nx-os/cli_interfaces/npv.html#pgfId-1072790
F-Port Port Channel and F-Port Trunking: Enhanced Blade Switch Resiliency
F-Port Port Channel w/ NPV
• Bundle multiple ports into one logical link; any port, any module
• High availability (HA): blade servers are unaffected if a cable, port, or line card fails
• Traffic management: higher aggregate bandwidth, hardware-based load balancing
F-Port Trunking w/ NPV
• Partition the F-Port to carry traffic for multiple VSANs
• Extend VSAN benefits to blade servers: separate management domains, separate fault-isolation domains
• Differentiated services: QoS, security
[Diagram: blade systems connecting to core directors via an F-Port port channel (F-Ports to N-Ports) and via F-Port trunking carrying VSANs 1–3.]
Port Channeling & Trunking - Configuration
phx2-5548-3# show run interface fc 2/13-14
interface fc2/13
channel-group 1 force
no shutdown
interface fc2/14
channel-group 1 force
no shutdown
[Diagram: Nexus 5548 ports fc2/13–14 port-channeled (san-port-channel 1) to MDS 9148 ports fc1/1–2, domains D2 and D3.]

phx2-5548-3# show run interface san-port-channel 1
interface san-port-channel 1
  switchport trunk allowed vsan 1
  switchport trunk allowed vsan add 12
Port Channeling & Trunking - Configuration
phx2-9148-2# show run interface fc1/1-2
interface fc1/1
channel-group 1 force
no shutdown
interface fc1/2
channel-group 1 force
no shutdown
[Diagram: the same topology, seen from the MDS 9148 side.]

phx2-9148-2# show run interface port-channel 1
interface port-channel1
  switchport trunk allowed vsan 1
  switchport trunk allowed vsan add 12
Port Channel – Nexus switch config
phx2-5548-3# show run int fc 2/9-10
interface fc2/9
switchport mode F
channel-group 3 force
no shutdown
interface fc2/10
switchport mode F
channel-group 3 force
no shutdown
phx2-5548-3# show run int san-port-channel 3
interface san-port-channel 3
channel mode active
switchport mode F
switchport trunk allowed vsan 1
switchport trunk allowed vsan add 12
[Diagram: Nexus 5548 ports fc2/9–10 in F-mode san-port-channel 3 down to Fabric Interconnect ports fc2/1–2, domain D2.]
FLOGI – Before Port Channel
phx2-5548-3# show flogi database
--------------------------------------------------------------------------------
INTERFACE VSAN FCID PORT NAME NODE NAME
--------------------------------------------------------------------------------
fc2/9 12 0x020000 20:41:00:0d:ec:fd:9e:00 20:0c:00:0d:ec:fd:9e:01
fc2/9 12 0x020001 20:02:00:25:b5:0b:00:02 20:02:00:25:b5:00:00:02
fc2/9 12 0x020002 20:02:00:25:b5:0b:00:04 20:02:00:25:b5:00:00:04
fc2/9 12 0x020003 20:02:00:25:b5:0b:00:01 20:02:00:25:b5:00:00:01
fc2/10 12 0x020020 20:42:00:0d:ec:fd:9e:00 20:0c:00:0d:ec:fd:9e:01
fc2/10 12 0x020021 20:02:00:25:b5:0b:00:03 20:02:00:25:b5:00:00:03
fc2/10 12 0x020022 20:02:00:25:b5:0b:00:00 20:02:00:25:b5:00:00:00
Total number of flogi = 7
phx2-5548-3#
[Diagram: the same 5548-to-Fabric-Interconnect topology, before the port channel is formed.]
FLOGI – After Port Channel
phx2-5548-3# show flogi database
--------------------------------------------------------------------------------
INTERFACE VSAN FCID PORT NAME NODE NAME
--------------------------------------------------------------------------------
San-po3 12 0x020040 24:0c:00:0d:ec:fd:9e:00 20:0c:00:0d:ec:fd:9e:01
San-po3 12 0x020001 20:02:00:25:b5:0b:00:02 20:02:00:25:b5:00:00:02
San-po3 12 0x020002 20:02:00:25:b5:0b:00:04 20:02:00:25:b5:00:00:04
San-po3 12 0x020003 20:02:00:25:b5:0b:00:01 20:02:00:25:b5:00:00:01
San-po3 12 0x020021 20:02:00:25:b5:0b:00:03 20:02:00:25:b5:00:00:03
San-po3 12 0x020022 20:02:00:25:b5:0b:00:00 20:02:00:25:b5:00:00:00
Total number of flogi = 6
phx2-5548-3#
[Diagram: the same topology after the port channel is formed; the Fabric Interconnect now logs in once on san-po3 instead of once per member link, so the total drops from 7 to 6 FLOGIs.]
Port-Channel Design Considerations
• All types of switches:
  • Name port-channels the same on both sides
  • Use common port allocation in both fabrics
  • ISL speeds should be >= edge device speeds
  • Maximum of 16 members per port-channel allowed
  • Multiple port-channels to the same adjacent switch should be equal cost
  • Make members part of VSAN 1 and trunk the other VSANs
  • Check TCAM usage: show system internal acl tcam-usage
Port-Channel Design Considerations
• Director class:
  • Split port-channel members across multiple line cards
  • When possible, use the same port on each line card, e.g., fc1/5, fc2/5, fc3/5, fc4/5, etc.
  • If there are multiple members per line card, distribute them across port-groups: show port-resources module x
Port-Channel Design Considerations
• Fabric switches:
  • Ensure enough credits for distance; you can "rob" buffers from other ports in the port-group that are "out-of-service"
  • Split port-channel members across different forwarding engines to distribute ACL TCAM:
    • For F port-channels to NPV switches (like UCS FIs), each device's zoning ACL TCAM programming is repeated on each port-channel member
    • For E port-channels using IVR, each host/target session that gets translated takes up ACL TCAM on each member
  • Use the following tables. For example, on a 9148S a six-member port-channel could be allocated across the 3 forwarding engines as fc1/1, fc1/2, fc1/17, fc1/18, fc1/33, and fc1/34
  • Consider the MDS 9396S for larger-scale deployments
F Port-Channel Design Considerations
Ports are allocated to forwarding engines according to the following tables:

  Switch/Module  Fwd Engines  Port Range(s)                    Fwd-Eng #  Zoning Region Entries  Bottom Region Entries
  MDS 9148       3            fc1/25-36 & fc1/45-48            1          2852                   407
                              fc1/5-12 & fc1/37-44             2          2852                   407
                              fc1/1-4 & fc1/13-24              3          2852                   407
  MDS 9250i      4            fc1/5-12 & eth1/1-8              1          2852                   407
                              fc1/1-4 & fc1/13-20 & fc1/37-40  2          2852                   407
                              fc1/21-36                        3          2852                   407
                              ips1/1-2                         4          2852                   407
  MDS 9148S      3            fc1/1-16                         1          2852                   407
                              fc1/17-32                        2          2852                   407
                              fc1/33-48                        3          2852                   407
F Port-Channel Design Considerations (continued)

  Switch/Module  Fwd Engines  Port Range(s)  Fwd-Eng #  Zoning Region Entries  Bottom Region Entries
  MDS 9396S      12           1-8            0          49136                  19664
                              9-16           1          49136                  19664
                              17-24          2          49136                  19664
                              25-32          3          49136                  19664
                              33-40          4          49136                  19664
                              41-48          5          49136                  19664
                              49-56          6          49136                  19664
                              57-64          7          49136                  19664
                              65-72          8          49136                  19664
                              73-80          9          49136                  19664
                              81-88          10         49136                  19664
                              89-96          11         49136                  19664
F Port-Channel Design Considerations (continued)

  Switch/Module   Fwd Engines  Port Range(s)  Fwd-Eng #  Zoning Region Entries  Bottom Region Entries
  DS-X9248-48K9   1            1-48           0          27168                  2680
  DS-X9248-96K9   2            1-24           0          27168                  2680
                               25-48          1          27168                  2680
  DS-X9224-96K9   2            1-12           0          27168                  2680
                               13-24          1          27168                  2680
  DS-X9232-256K9  4            1-8            0          49136                  19664
                               9-16           1          49136                  19664
                               17-24          2          49136                  19664
                               25-32          3          49136                  19664
  DS-X9248-256K9  4            1-12           0          49136                  19664
                               13-24          1          49136                  19664
                               25-36          2          49136                  19664
                               37-48          3          49136                  19664
F Port-Channel Design Considerations (continued)

  Switch/Module   Fwd Engines  Port Range(s)  Fwd-Eng #  Zoning Region Entries  Bottom Region Entries
  DS-X9448-768K9  6            1-8            0          49136                  19664
                               9-16           1          49136                  19664
                               17-24          2          49136                  19664
                               25-32          3          49136                  19664
                               33-40          4          49136                  19664
                               41-48          5          49136                  19664
Internal CRC Handling
• New feature to handle frames internally corrupted due to bad hardware
• Frames that arrive corrupted are dropped at the ingress port; those frames are not covered by this feature
• In rare cases frames can get corrupted internally due to bad hardware; these are then dropped and are sometimes difficult to detect
• The new feature detects the condition and isolates the failing hardware
• There are 5 possible stages where frames can get corrupted
Internal CRC Handling
Stages of internal CRC detection and isolation — the five possible stages at which internal CRC errors may occur in a switch:
1. Ingress buffer of a module
2. Ingress crossbar of a module
3. Crossbar of a fabric module
4. Egress crossbar of a module
5. Egress buffer of a module
Internal CRC Handling
• The modules that support this functionality are:
  • Cisco MDS 9700 48-Port 16-Gbps Fibre Channel Switching Module
  • Cisco MDS 9700 48-Port 10-Gbps Fibre Channel over Ethernet Switching Module
  • Cisco MDS 9700 Fabric Module 1
  • Cisco MDS 9700 Supervisors
• Enabled via the following configuration command: hardware fabric crc threshold 1-100
• When the condition is detected, the failing module is powered down
• New in NX-OS 6.2(13)
Device-alias
• A device-alias (DA) is a way of naming PWWNs
• DAs are distributed on a fabric basis via CFS
• The device-alias database is independent of VSANs: if a device is moved from one VSAN to another, no DA changes are needed
• device-alias can run in two modes:
  • Basic – device-alias names can be used, but PWWNs are substituted in the config
  • Enhanced – device-alias names exist in the configuration natively, which allows renames without zoneset re-activations (see the sketch below)
• device-aliases are used in zoning, IVR zoning, and port-security
• copy running-config startup-config fabric after making changes!
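A minimal sketch of enhanced mode and a rename (the alias names are illustrative):

device-alias mode enhanced            ! aliases stored natively in the config
device-alias database
  device-alias rename AS01 AS01_esx   ! rename without zoneset re-activation
device-alias commit                   ! distribute via CFS
copy running-config startup-config fabric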
Device-alias
• device-alias confirm-commit
• Displays the changes and prompts for confirmation
MDS9710-2(config)# device-alias confirm-commit enable
MDS9710-2(config)# device-alias database
MDS9710-2(config-device-alias-db)# device-alias name edm pwwn 1000000011111111
MDS9710-2(config-device-alias-db)# device-alias commit
The following device-alias changes are about to be committed
+ device-alias name edm pwwn 10:00:00:00:11:11:11:11
Do you want to continue? (y/n) [n]
Device-alias
• Note: To prevent problems the same device-alias is only allowed once per commit.
• Example:
MDS9148s-1(config)# device-alias database
MDS9148s-1(config-device-alias-db)# device-alias name test pwwn 1122334455667788
MDS9148s-1(config-device-alias-db)# device-alias rename test test1
Command rejected. Device-alias reused in current session :test
Please use 'show device-alias session rejected' to display the rejected set of commands and for the device-alias best-practices recommendation.
Cisco Prime Data Center Network Manager: Feature Support and User Interface
VMpath Analysis provides VM connectivity to network and storage across Unified Compute and Unified Fabric
• Visibility past the physical access (switch) layer
• Standard & custom reports
• On Nexus and MDS platforms
• Dynamic topology views
• Rule-based event filtering and forwarding
• Threshold alerting
• Integration via the vCenter API
SAN Design Security Challenges
SAN design security is often overlooked as an area of concern:
• Application integrity and security are addressed, but not the back-end storage network carrying the actual data
• SAN extension solutions now push SANs outside data center boundaries
Not all compromises are intentional:
• Accidental breaches can still have the same consequences
SAN design security is only one part of a complete data center solution:
• Host access security — one-time passwords, auditing, VPNs
• Storage security — data-at-rest encryption, LUN security
[Diagram: threats to the SAN/LAN — unauthorized internal connections, application tampering (Trojans, etc.), privilege escalation/unintended privilege, external DoS or other intrusion, data tampering, theft.]
SAN Security
Secure management access
• Role-based access control
• CLI, SNMP, and web access
Secure management protocols
• SSH, SFTP, and SNMPv3
Secure switch control protocols
• TrustSec
• FC-SP (DH-CHAP)
AAA – RADIUS, TACACS+, and LDAP
• User, switch, and iSCSI host authentication
Fabric binding
• Prevents unauthorized switches from joining the fabric
[Diagram: shared physical storage with VSANs providing secure isolation; hardware-based zoning via port and WWN; iSCSI-attached servers; device/SAN management secured via SSH, SFTP, SNMPv3, and user roles; FC-SP protocol security; and a RADIUS, TACACS+, or LDAP server for authentication.]
Slow Drain
Slow Drain Device Detection and Congestion Avoidance
• Devices can impart slowness in a fabric
• Fabric features can expose such a device for remediation
• See BRKSAN-3446: SAN Congestion! Understanding, Troubleshooting, Mitigating in a Cisco Fabric
• White paper (2013): http://www.cisco.com/en/US/prod/collateral/ps4159/ps6409/ps12970/white_paper_c11-729444.pdf
The Importance of "Architecture"
SAN designs are traditionally robust — dual fabrics, because data loss is not tolerated.
Must manage ratios:
• Fan-in/fan-out
• ISL oversubscription
• Virtualized storage IO streams (NPIV-attached devices, server RDM, LPARs, etc.)
• Queue depth
Latency:
• Initiator to target
• Slow drain
• Performance under load: does my fabric perform the same?
Application independence:
• Consistent fabric performance regardless of changes to the SCSI profile: number of frames, frame size, speed or throughput
SAN Major Design Factors
Port density
• How many now, how many later?
• Topology to accommodate port requirements
Network performance
• What is acceptable? Unavoidable?
Traffic management
• Preferential routing or resource allocation
Fault isolation
• Consolidation while maintaining isolation
Management
• Secure, simplified management
[Diagram callouts: large port-count directors; high-performance crossbar; QoS, congestion control, reduced FSPF routes; failure of one device has no impact on others.]
Scalability — Port Density: Topology Requirements
Considerations
• Number of ports for end devices
• How many ports are needed now?
• What is the expected life of the SAN?
• How many will be needed in the future?
• Hierarchical SAN design
Best practice
• Design to cater for future requirements
• Doesn't imply "build it all now," but means "cater for it" and avoids costly retrofits tomorrow
Scalability — Port Density: MDS Switch Selection
• MDS 9148S – 48 ports 16G FC
• MDS 9250i – 40 ports 16G FC + 8 ports 10G FCoE + 2 FCIP ports
• MDS 9396S – 96 ports 16G FC
• MDS 9706 – up to 192 ports 16G FC and/or 10G FCoE and/or 40G FCoE
• MDS 9710 – up to 384 ports 16G FC and/or 10G FCoE and/or 40G FCoE
• MDS 9718 – up to 768 ports 16G FC and/or 10G FCoE and/or 40G FCoE
• All MDS 97xx chassis are 32G ready!
• All 16G MDS platforms are full line rate
Scalability — Port Density: Nexus Switch Selection
• Nexus 55xx – up to 96 ports 10G FCoE and/or 8G FC
• Nexus 5672UP – up to 48 ports 10G FCoE and/or 16 ports 8G FC
• Nexus 5672UP-16G – up to 48 ports 10G FCoE and/or 24 ports 16G FC
• Nexus 5624Q – 12 ports 40G or 48 ports 10G FCoE
• Nexus 5648Q – 24 ports 40G or 96 ports 10G FCoE
• Nexus 5696Q – up to 32 ports 100G / 96 ports 40G / 384 ports 10G FCoE, or 60 ports 8G FC
• Nexus 56128P – up to 96 ports 10G FCoE and/or 48 ports 8G FC
• All Nexus platforms are full line rate
Traffic Management
Do different apps/servers have different performance requirements?
• Should bandwidth be reserved for specific applications?
• Is preferential treatment/QoS necessary?
Given two alternate paths for traffic between data centers, should traffic use one path in preference to the other?
• Preferential routes
Network Performance: Oversubscription Design Considerations
All SAN designs have some degree of oversubscription:
• Without oversubscription, SANs would be too costly
• Oversubscription is introduced at multiple points
• Switches are rarely the bottleneck in SAN implementations
• Device capabilities (peak and sustained) must be considered along with network oversubscription
• Must consider oversubscription during a network failure event
Host oversubscription: the largest variance is observed at this level — DB servers run close to line rate, others are highly oversubscribed. Port channels help reduce oversubscription while maintaining HA requirements, and 16Gb line cards are non-oversubscribed.
Disk oversubscription: disks do not sustain wire-rate I/O with "realistic" I/O mixtures; vendors may recommend anywhere from a 6:1 to as high as a 20:1 host-to-disk fan-out ratio; highly application dependent.
ISL oversubscription: in a two-tier design, the ratio is less than the fan-out ratio.
Tape oversubscription: need to sustain close to the maximum data rate; LTO-6 native transfer rate ~ 160 MBps.
Fault Isolation
Consolidation of storage:
• Single fabric = increased storage utilization + reduced administration overhead
Major drawback:
• Faults are no longer isolated
• Technologies such as VSANs enable consolidation and scalability while maintaining security and stability
• VSANs constrain fault impacts: faults in one virtual fabric (VSAN) are contained and do not impact other virtual fabrics
• Physical SAN islands are virtualized onto a common SAN infrastructure
DC Infrastructure Changes
Denser: cabinets, cross-connects, cable runs
• Horizontal cabling: from 10G, through 40G, to 100G — longer distances
• Vertical cabling: match the appropriate server connectivity choice
• Denser server cabinets: from 42U to ~58U — what are the implications?
• Uplinks change from 40 GE servers to 4x 10G servers
• Is SAN EoR economical now?
[Diagram: main cross-connect, EoR cross-connect, and ToR cabling zones.]
Structured Cabling: Supporting New EoR & ToR Designs
• Pricing advantage for manufactured cabling systems
• Removes the guessing game of how many strands to pull per cabinet
• Growth at 6 or 12 LC ports per cassette
• Fiber-only cable plant designs possible
Core-Edge: Highly Scalable Network Design
• Traditional SAN design for growing SANs
• High-density directors in the core; fabric switches, directors, or blade switches on the edge
• Predictable performance
• Scalable growth up to core and ISL capacity
• Evolves to support EoR & ToR (blade server, end-of-row, or top-of-rack edges)
• MDS 9718 as core
Large Edge-Core-Edge / End-of-Row Design (2,496 End Device Ports per Fabric)
• The traditional edge-core-edge design is ideal for very large centralized services and consistent host-disk performance regardless of location
• Full line rate ports, no fabric oversubscription
• 8Gb or 16Gb hosts and targets
• Services consolidated in the core
• Easy expansion
"A" fabric shown; repeat for "B" fabric. MDS 9710 switches; 240 storage ports at 16Gb (optionally 480 @ 8Gb without changing bandwidth ratios); 240 ISLs from the storage edge to the core @ 16Gb; 240 ISLs from the host edge to the core @ 16Gb; 1,680 hosts @ 8Gb or 16Gb.

  Ports Deployed               3,456 per fabric; 6,912 total
  Used Ports                   5,760 total @ 16Gb; 6,240 total @ 8Gb
  Storage Ports                480 total @ 16Gb, or 960 total @ 8Gb
  Host Ports                   3,360 total
  ISL Ports                    960 total
  Host ISL Oversubscription    7:1 @ 16Gb
  End-to-End Oversubscription  7:1 @ 16Gb storage; 7:1 @ 8Gb storage
Very Large Edge-Core-Edge / End-of-Row Design (6,144 End Device Ports per Fabric)
• The traditional core-edge design is ideal for very large centralized services and consistent host-disk performance regardless of location
• Full line rate ports, no fabric oversubscription
• 16Gb hosts and targets
• Services consolidated in the core
• Easy expansion
"A" fabric shown; repeat for "B" fabric. MDS 9718 core with MDS 9710 edges; 576 storage ports at 16Gb (288 per switch); 768 ISLs from the host edge to the core @ 16Gb (48 per switch); 4,032 hosts @ 8Gb or 16Gb (252 per switch).

  Ports Deployed               12,288
  Used Ports                   10,368 @ 16Gb
  Storage Ports                1,152 @ 16Gb
  Host Ports                   8,064
  ISL Ports                    768
  Host ISL Oversubscription    7:1 @ 16Gb
  End-to-End Oversubscription  7:1 @ 16Gb storage
SAN Top of Rack – MDS 9148S (5,376 Usable Ports)
• Ideal for centralized services while reducing cabling requirements
• Consistent host/target performance regardless of location in the rack
• 8Gb hosts & 16Gb targets
• Easy edge expansion
• Massive cabling infrastructure avoided as compared to EoR designs
• Additional efficiencies with in-rack IO convergence
MDS 9710 core, MDS 9148S top of rack; 48 racks with 44 dual-attached servers per rack; 4 ISLs from each edge to the core @ 16Gb; 352 storage ports at 16Gb; 4,224 hosts @ 16Gb; dual fabrics A and B.

  Ports Deployed               5,376
  Used Ports                   5,344
  Storage Ports                352 @ 16Gb
  Host Ports                   4,224
  Host ISL Oversubscription    12:1 @ 16Gb
  End-to-End Oversubscription  12:1 @ 16G hosts
Top-of-Rack Design – Blade Centers (1,920 Usable Ports per Fabric)
• Ideal for centralized services
• Consistent host/target performance regardless of location in the blade enclosure or rack
• 8Gb hosts & 16Gb targets
• Need to manage more SAN edge switches/blade switches
• NPV attachment reduces fabric complexity
• Assumes little east-west SAN traffic
• Add blade server ISLs to reduce fabric oversubscription
MDS 9710 core; 12 racks, 72 chassis, 96 dual-attached blade servers per rack; 8 ISLs from each edge to the core @ 8Gb; 96 storage ports at 16Gb; 960 hosts @ 8Gb; dual fabrics A and B.

  Ports Deployed               1,920
  Used Ports                   192 @ 16Gb; 1,056 @ 8Gb
  Storage Ports                192 @ 16Gb, or 192 @ 8Gb
  Host Ports                   2,304
  Host ISL Oversubscription    4:1 @ 8G
  End-to-End Oversubscription  6:1 @ 16Gb storage; 12:1 @ 8Gb storage
Medium Scale Dual Fabric: Collapsed Core Design (768 Usable Ports per Fabric)
• Ideal for centralized services
• Consistent host/target performance regardless of location
• 8Gb or 16Gb hosts & targets (if they exist)
• Relatively easy edge expansion to core/edge
• EoR design
• Supports blade center connectivity
MDS 9710; 96 storage ports at 16Gb; 672 hosts @ 16Gb. "A" fabric shown; repeat for "B" fabric.

  Ports Deployed               768
  Used Ports                   768 @ 16Gb
  Storage Ports                96 @ 16Gb
  Host Ports                   672 @ 16Gb
  Host ISL Oversubscription    N/A
  End-to-End Oversubscription  7:1 @ 16Gb
POD SAN Design
• Ideal for centralized services
• Consistent host/target performance regardless of location in the blade enclosure or rack
• 10/16Gb hosts & 16Gb targets
• Need to manage more SAN edge switches/blade switches
• NPV attachment reduces fabric complexity
• Add blade server ISLs to reduce fabric oversubscription
MDS 9396S core pair, with MDS 9148S rack edges and UCS FI 6248UP blade edges; dual fabrics A and B.
• Rack PODs: 6 racks, 42 dual-attached servers per rack (252 servers); 6 ISLs from each edge to the core @ 16Gb; 36-48 storage ports at 16Gb; 252 hosts @ 16Gb or 288 hosts @ 10Gb
• Blade PODs: 6 racks, 288 blades, 48 dual-attached blade servers per rack; 8 ISLs from each edge to the core @ 8Gb
FI 6332-16UP, FI 6332 UCS SAN Design
FI 6332-16UP use case:
• UCS B-Series (B200, B260, B460 with IOM 2304) and UCS C-Series (C220, C240, C460) servers attach at 40G
• 40G uplinks to Nexus 7K/9K for LAN
• 16G FC uplinks through MDS 9700 to the storage array
FI 6332 use case:
• Same 40G server attachment and 40G Nexus 7K/9K uplinks
• 40G FCoE uplinks through MDS 9700 to the storage array
Enhancing SAN Design with Services
Extend fabrics:
• FCIP
• Extended buffer-to-buffer credits
• Encrypt the pipe
SAN services extend the effective distance for remote applications:
• SAN IO acceleration
• Write acceleration
• Tape acceleration
These enhance array replication requirements, reduce WAN-induced latency, and improve application performance over distance. Data migration with DMM and IO acceleration work because the fabric is aware of all data frames from initiator to target.
SAN Extension with FCIP: Fibre Channel over IP
• Encapsulation of Fibre Channel frames into IP packets, tunneled through an existing TCP/IP network infrastructure in order to connect geographically distant islands
• Write acceleration to improve throughput and latency
• Hardware-based compression
• Hardware-based IPsec encryption
[Diagram: array-to-array replication over an FCIP tunnel terminating on TE ports.]
A configuration sketch follows.
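A hedged sketch of a basic FCIP tunnel on an MDS (the addresses, profile number, and interface number are illustrative):

feature fcip
fcip profile 10
  ip address 10.1.1.1            ! local Gigabit Ethernet endpoint
interface fcip10
  use-profile 10
  peer-info ipaddr 10.2.2.2      ! remote FCIP endpoint
  write-accelerator              ! optional: FCIP write acceleration
  no shutdown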
FC Redirect – How IOA Works
[Diagram: initiator and target on either side of a MAN/WAN, with IOA (I/O Accelerator) engines in each fabric.]
1. Replication starts, initiator to target
2. The flow is redirected to an IOA engine (virtual initiator / virtual target)
3. The flow is accelerated and sent along the normal routing path
Data Acceleration
• Accelerates SCSI I/O
  • Over both Fibre Channel (FC) and Fibre Channel over IP (FCIP) links
  • For both Write Acceleration (WA) and Tape Acceleration (TA)
• I/O Acceleration node platforms: MSM-18/4, SSN-16, MDS-9222i, MDS-9250i
• Uses FC Redirect: a fabric service to accelerate I/O between SAN devices
IOA FCIP Tape Backup: Large Health Insurance Firm
MDS IOA results: 92% FCIP throughput increase
• Highly resilient – clustering of IOA engines allows for load balancing and failover
• Improved scalability – scale without increasing management overhead
• Significant reutilization of existing infrastructure – all chassis and common equipment re-utilized
• Flat VSAN topology – simple capacity and availability planning
SAN Extension – FC over Long Distance: BB_Credits and Distance
• BB_Credits are used to ensure enough FC frames are in flight
• A full (2112-byte) FC frame is approximately 1 km long @ 2 Gbps, 0.5 km @ 4 Gbps, 0.25 km @ 8 Gbps, and 0.125 km @ 16 Gbps — so a 16 km link holds 16 frames at 2 Gbps but 128 frames at 16 Gbps
• As distance increases, the number of available BB_Credits needs to increase as well
• Insufficient BB_Credits will throttle performance: no data will be transmitted until R_RDY is returned

phx2-9513(config)# feature fcrxbbcredit extended
phx2-9513(config)# interface fc1/1
phx2-9513(config-if)# switchport fcrxbbcredit extended 1000
phx2-9513# show interface fc1/1
fc1/1 is up
…..
Transmit B2B Credit is 128
Receive B2B Credit is 1000
SAN Extension – FCoE over Long Distance: FCoE Flow Control
For long-distance FCoE, the receiving switch's ingress buffer must be large enough to absorb all packets in flight from the time the Pause frame is sent to the time the Pause frame is received.
• A 10GE, 50 km link can hold ~300 frames
• That means 600+ frames could be either in flight or transmitted between the time the receiver detects buffer congestion and sends a Pause frame and the time the Pause frame is received and the sender stops transmitting
• The latency buffer Pause threshold (latency buffer tuning) is platform specific
[Diagram: egress and ingress buffers on either end of the link, with the Pause sent at the ingress buffer threshold while frames are still in flight.]
Data Mobility: Data Mobility Manager
• Migrates data between storage arrays for:
  • Technology refreshes
  • Workload balancing
  • Storage consolidation
• DMM offers:
  • Online migration of heterogeneous arrays
  • Simultaneous migration of multiple LUNs
  • Unequal-size LUN migration
  • Rate-adjusted migration
  • Verification of migrated data
  • Dual fabric support
  • CLI and wizard-based management with Cisco Fabric Manager
  • Not metered on the number of terabytes migrated or the number of arrays
  • Requires no SAN reconfiguration or rewiring
  • Uses FC Redirect
[Diagram: application servers keep issuing application I/O to the old array while DMM migrates data to the new array.]
SAN Extension – CWDM: Coarse Wavelength Division Multiplexing
• 8-channel WDM using 20nm spacing
• Colored CWDM SFPs used in the FC switch
• Optical multiplexing done in an OADM
• Passive device
[Diagram: optical transmitters and receivers multiplexed through OADMs onto an optical fiber pair.]
SAN Extension – DWDM: Dense Wavelength Division Multiplexing
• DWDM systems use optical devices to combine the output of several optical transmitters
• Higher-density technology compared with CWDM: <1nm spacing
• Optical splitter protection available
[Diagram: optical transmitters and receivers combined by DWDM devices onto an optical fiber pair.]
Dense vs. Coarse (DWDM vs. CWDM)

                     DWDM                CWDM
  Application        Long haul           Metro
  Amplifiers         Typically EDFAs     Almost never
  # Channels         Up to 80            Up to 8
  Channel Spacing    0.4 nm              20 nm
  Distance           Up to 3000 km       Up to 80 km
  Spectrum           1530 nm to 1560 nm  1270 nm to 1610 nm
  Filter Technology  Intelligent         Passive

[Diagram: a DWDM site pair using ONS platforms with MDS switches and arrays; a CWDM site pair using passive filters.]
Summary
Drivers in the DC are forcing change:
• 10G convergence & server virtualization
• It's not just about FCP anymore: FCoE, NFS, and iSCSI are being adopted
Proper SAN design is holistic in its approach:
• Performance, scale, and management attributes all play critical roles
• Not all security issues are external
• Fault isolation goes beyond SAN A/B separation
• Consider performance under load
• Design for SAN services
Many design options:
• Some optimized for performance
• Some for management
• Others for cable plant optimization
Additional Relevant Sessions
• BRKSAN-3446 - SAN Congestion! Understanding, Troubleshooting, Mitigating in a Cisco Fabric
• Friday 9AM
Storage Networking – Cisco Live Berlin
Call to Action
• Visit the World of Solutions for:
  • Multiprotocol Storage Networking booth – see the MDS 9718, Nexus 5672UP, 2348UPQ, and MDS 40G FCoE blade
  • Data Center Switching Whisper Suite – strategy & roadmap (product portfolio includes Cisco Nexus 2K, 5K, 6K, 7K, and MDS products)
  • Technical Solution Clinics
• Meet the Engineer
  • Available Tuesday and Thursday
Complete Your Online Session Evaluation
• Please complete your online session evaluations after each session. Complete 4 session evaluations & the Overall Conference Evaluation (available from Thursday) to receive your Cisco Live T-shirt.
• All surveys can be completed via the Cisco Live Mobile App or the Communication Stations