A New Approach to Building Data Centers. MetaFabric Demonstration
DESCRIPTION
Slides for a talk given at the Juniper New Network Day conference on 01.01.2014. Speaker: Ivan Lysogor, Senior Network Engineer at Juniper Networks. A video recording of the talk from the conference's online broadcast is available here: http://www.youtube.com/watch?v=yBXWI8YyKss&hd=1
TRANSCRIPT
METAFABRIC ARCHITECTURE
Ivan Lysogor, Systems Engineer
2 Copyright © 2013 Juniper Networks, Inc.
INTRODUCING THE METAFABRIC ARCHITECTURE
[Figure: VMs running on virtual and physical infrastructure across my on-premises data center, my hosted service provider, my managed service provider and my cloud service provider]
SIMPLE. OPEN. SMART.
METAFABRIC ARCHITECTURE PILLARS
Simple: easy to deploy & use
Open: maximize flexibility
Smart: save time, improve performance
METAFABRIC ARCHITECTURE PORTFOLIO
Switching: flexible building blocks; simple switching fabrics
Routing: universal data center gateways
Management: smart automation and orchestration tools
SDN: simple and flexible SDN capabilities
Data Center Security: adaptive security to counter data center threats
Solutions & Services: reference architectures and professional services
METAFABRIC REFERENCE ARCHITECTURE
Validated and tested designs
Version 1.0 – virtualized (VMware) Enterprise data center with key partners (IBM, EMC, F5)
Reduce risk – accelerate customer adoption
QFX5100 DEPLOYMENT OPTIONS
Managed as a single switch:
• Virtual Chassis (improved): up to 10 members
• Virtual Chassis Fabric (new): spine-leaf, up to 20 members
• QFabric (improved): up to 128 members
Layer 3 Fabric: L3 fabric built from QFX5100 switches
QFX5100 PLATFORM: Q4 2013, Q1 2014
• 1.5GHz dual-core Intel Sandy Bridge x86 CPU, 8GB memory, 2 x 16GB SSD
• Innovative Junos software architecture
• Redundant, hot-swappable AC or DC power supplies
• Redundant, hot-swappable fan tray; AFI (FRU to port side) or AFO (port to FRU side) airflow
• Beacon LED, no LCD panel
• L2/L3 line-rate forwarding at 10GbE/40GbE, and FCoE
• Feature-rich Junos: full L2/L3 protocols, MPLS
Models:
• QFX5100-48S: 48 x 1/10GbE + 6 x 40GbE
• QFX5100-24Q: 24 x 40GbE + 2 expansion slots (Slot 1, Slot 2), each for a 4 x 40GbE QSFP module
• QFX5100-96S: 96 x 1/10GbE + 8 x 40GbE
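As a rough sanity check on these port layouts, the aggregate front-panel bandwidth of each model is the sum of port count times port speed. This is a minimal sketch under stated assumptions: every 1/10GbE port is counted at 10G, both QFX5100-24Q expansion slots are assumed to hold the 4 x 40GbE QSFP module, and the result is one-direction bandwidth computed from the slide, not a datasheet figure.

```python
# Port layouts from the slide: (number of ports, speed in Gbit/s).
# Assumption: both QFX5100-24Q expansion slots hold a 4 x 40GbE QSFP module.
PORT_LAYOUTS = {
    "QFX5100-48S": [(48, 10), (6, 40)],
    "QFX5100-24Q": [(24, 40), (2 * 4, 40)],
    "QFX5100-96S": [(96, 10), (8, 40)],
}


def front_panel_gbps(model: str) -> int:
    """Sum of port speeds in one direction, with every 1/10GbE port at 10G."""
    return sum(count * speed for count, speed in PORT_LAYOUTS[model])


if __name__ == "__main__":
    for model in PORT_LAYOUTS:
        print(f"{model}: {front_panel_gbps(model)} Gbit/s per direction")
```

Run as a script, it reports 720 Gbit/s for the QFX5100-48S and 1280 Gbit/s for both the QFX5100-24Q and the QFX5100-96S.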
ADVANCED JUNOS SOFTWARE ARCHITECTURE
• Provides the foundation for advanced functions:
  • ISSU (In-Service Software Upgrade)
  • Other Juniper applications for additional services in a single switch
  • Third-party applications
• Can bring up the system much faster
[Figure: Linux kernel (CentOS) host with a host network bridge and KVM, running Junos VMs (active and standby), third-party applications and Juniper apps]
ISSU (IN-SERVICE SOFTWARE UPGRADE)
1. The master Junos VM controls the hardware (PFE and FRUs) on the system.
2. The master issues the upgrade command.
3. The system launches a new Junos VM with the new image as backup.
4. All states are synchronized to the new backup Junos.
5. The PFE is detached from the current master, then attached to the backup Junos (hot move).
6. The PFE control component in the new master controls forwarding.
7. The new backup VM (the former master) is stopped.
[Figure: host OS running a master VM and a backup VM, each with PFE control, master/backup election and other Junos processes, connected by a software bridge; below them, the PFE hardware partition (warm boot) and other hardware]
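The ISSU sequence above can be modelled as a small state machine to make the ordering explicit: the new VM is fully synchronized before the PFE moves, so forwarding never loses its controller. This is a toy sketch, not Junos internals; the classes, the naming scheme (`junos-<image>`) and the single `state` dictionary are illustrative assumptions standing in for the real state synchronization and PFE warm boot.

```python
from dataclasses import dataclass, field


@dataclass
class JunosVM:
    """Toy stand-in for a Junos VM running on the host hypervisor."""
    name: str
    image: str
    state: dict = field(default_factory=dict)
    running: bool = True


@dataclass
class Host:
    """Toy host: tracks which VM is master and which VM owns the PFE."""
    master: JunosVM
    pfe_owner: str = ""
    log: list = field(default_factory=list)

    def __post_init__(self) -> None:
        # Step 1: the master Junos VM controls the PFE.
        self.pfe_owner = self.master.name


def issu(host: Host, new_image: str) -> JunosVM:
    """Walk through the ISSU steps; returns the stopped former master."""
    old = host.master
    host.log.append(f"{old.name}: upgrade to {new_image} requested")
    # Launch a new Junos VM with the new image as backup.
    backup = JunosVM(name=f"junos-{new_image}", image=new_image)
    host.log.append(f"{backup.name} launched as backup")
    # Synchronize all state to the new backup (copied, so the VMs do not share it).
    backup.state = dict(old.state)
    host.log.append("state synchronized")
    # Hot move: detach the PFE from the old master and attach it to the backup,
    # which becomes the new master and takes over forwarding control.
    host.pfe_owner = backup.name
    host.master = backup
    host.log.append(f"PFE attached to {backup.name}, now master")
    # Stop the new backup VM (the former master).
    old.running = False
    host.log.append(f"{old.name} stopped")
    return old


host = Host(JunosVM("junos-13.2", "13.2", {"routes": 10}))
former = issu(host, "14.1")
print("\n".join(host.log))
```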
INSIGHT TECHNOLOGY
• Hotspots and microbursts impact application performance
• Not visible with traditional counters: network operations are blindfolded
• Captures microburst events that exceed defined thresholds
• Adjustable sampling intervals
• Reports microburst events instantaneously via:
  • CLI
  • Syslog
  • Log file (human-readable format)
  • Streaming (JavaScript Object Notation (JSON), CSV, TSV formats)
[Figure: buffer utilization monitoring and reporting; queue depth or queue latency plotted over time, with a microburst crossing the high and low thresholds]
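One way to read the high and low thresholds in the figure is as a hysteresis band: a microburst event opens when queue depth rises above the high threshold and closes only when it falls below the low one, so jitter around a single threshold does not produce a flood of tiny events. The slide does not say exactly how the two thresholds are combined, so this is an assumption; the function name and the sample data are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def detect_microbursts(
    samples: Iterable[Tuple[float, int]], high: int, low: int
) -> List[Tuple[float, float, int]]:
    """Return (start_time, end_time, peak_depth) for each burst.

    `samples` are (time, queue depth) pairs taken at the sampling interval.
    A burst opens above `high` and closes below `low` (hysteresis).
    """
    if low > high:
        raise ValueError("low threshold must not exceed high threshold")
    events = []
    start: Optional[float] = None
    peak = 0
    t = 0.0
    for t, depth in samples:
        if start is None:
            if depth > high:
                start, peak = t, depth
        else:
            peak = max(peak, depth)
            if depth < low:
                events.append((start, t, peak))
                start = None
    if start is not None:  # burst still open at the end of the capture
        events.append((start, t, peak))
    return events


samples = [(0, 10), (1, 50), (2, 120), (3, 200), (4, 90), (5, 40), (6, 10), (7, 130), (8, 20)]
print(detect_microbursts(samples, high=100, low=50))  # [(2, 5, 200), (7, 8, 130)]
```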
UNIFIED FORWARDING TABLE
• Flexibly allocate L2 MAC, L3 host and LPM (Longest Prefix Match) resources from a single pool
• The L3 host table holds /32 IPv4 or /128 IPv6 routes; the LPM table holds any routes not handled by the L3 host table
• Optimized forwarding table size based on deployment scenarios
• Uses system resources efficiently
[Figure: UFT (Unified Forwarding Table) = L2 MAC + L3 Host + LPM, shown as one pool split into L2 MAC, L3 host and LPM regions]
UNIFIED FORWARDING TABLE
UFT (Unified Forwarding Table) = L2 MAC + L3 Host + LPM
Profile                              L2 MAC   L3 Host   LPM
Profile 1: l2-heavy-one              288K     16K       16K
Profile 2: l2-heavy-two              224K     80K       16K
Profile 3: l2-heavy-three (default)  160K     144K      16K
Profile 4: l3-heavy                  96K      208K      16K
Profile 5: LPM-heavy*                32K      16K       128K
*Under test; may come after FRS (first revenue shipment)
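Because all three tables share one pool, choosing a profile amounts to finding one whose MAC, L3 host and LPM sizes all cover the expected deployment. The sketch below encodes the per-profile sizes from this slide (in thousands of entries) and filters them; the function name and requirement figures are hypothetical, and the CLI listing on the VCF scale slide shows different L3 host numbers for some profiles, so treat the values as indicative.

```python
from typing import Dict, List

# Per-profile table sizes in thousands of entries, copied from the slide.
UFT_PROFILES: Dict[str, Dict[str, int]] = {
    "l2-heavy-one": {"mac": 288, "host": 16, "lpm": 16},
    "l2-heavy-two": {"mac": 224, "host": 80, "lpm": 16},
    "l2-heavy-three": {"mac": 160, "host": 144, "lpm": 16},  # default profile
    "l3-heavy": {"mac": 96, "host": 208, "lpm": 16},
    "lpm-heavy": {"mac": 32, "host": 16, "lpm": 128},  # under test at the time
}


def fitting_profiles(mac: int, host: int, lpm: int) -> List[str]:
    """Profiles whose MAC, L3 host and LPM sizes all cover the requirement (in K)."""
    need = {"mac": mac, "host": host, "lpm": lpm}
    return [
        name
        for name, size in UFT_PROFILES.items()
        if all(size[table] >= need[table] for table in need)
    ]


print(fitting_profiles(mac=150, host=100, lpm=10))  # ['l2-heavy-three']
print(fitting_profiles(mac=20, host=10, lpm=100))   # ['lpm-heavy']
```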
AUTOMATION*
Simple network architecture
Network automation: zero-touch provisioning, ops/event scripts, Python, Network Director API
Data center automation: VMware, Puppet, Chef, OpenStack, CloudStack
*Not all features will be available at FRS
JUNOS ENHANCED AUTOMATION IMAGE
• The Junos Enhanced Automation image provides increased flexibility to large data center customers
• VeriExec is disabled on Junos Flex, enabling customers to run unsigned binaries on the QFX5100
• Ability to run Python/Ruby with custom libraries and tools such as collectd, Ganglia and Monit
• Puppet and Chef are packaged with Junos Flex to help MSDCs (massively scalable data centers) automate configuration
VIRTUAL CHASSIS FABRIC
VCF ESSENTIALS
[Figure: logical and physical views of a VCF built from 1 RU, 48 SFP+ & 1 QIC nodes (node #1 to node #16), with active and backup nodes]
• Single device to manage
• Accessible from any member of the fabric
• In-band virtual backplane to enable Junos LC-RE (line card to routing engine) communications
• Multi-path forwarding
VCF BUILDING BLOCKS
VCF 10/40GE spine nodes: QFX5100-24Q (40GE), QFX5100-48S (10GE)
VCF 1/10/40GE leaf nodes: QFX5100-24Q (40GE), QFX5100-48S (10GE), QFX3500 (10GE), QFX3600 (40GE), EX4300 (1GE)
VCF BUILDING BLOCKS - COMPATIBILITY MATRIX
Scales to 20 members
Platform VCF spine node VCF leaf node
QFX5100-24Q ✓ ✓
QFX5100-48S ✓ ✓
QFX5100-96S ✓ ✓
QFX3500 ✗ ✓
QFX3600 ✗ ✓
EX4300 ✗ ✓
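The compatibility matrix and the 20-member limit translate directly into a pre-deployment check. This is a minimal sketch; the function name and the (platform, role) input format are hypothetical, and it checks only what the matrix states, not cabling or software versions.

```python
from typing import List, Tuple

# Platform roles from the compatibility matrix.
SPINE_CAPABLE = {"QFX5100-24Q", "QFX5100-48S", "QFX5100-96S"}
LEAF_CAPABLE = SPINE_CAPABLE | {"QFX3500", "QFX3600", "EX4300"}
MAX_MEMBERS = 20


def validate_vcf(members: List[Tuple[str, str]]) -> List[str]:
    """Check (platform, role) pairs against the matrix; return a list of problems."""
    problems = []
    if len(members) > MAX_MEMBERS:
        problems.append(f"{len(members)} members exceeds the {MAX_MEMBERS}-member limit")
    for index, (platform, role) in enumerate(members, start=1):
        if role == "spine":
            allowed = SPINE_CAPABLE
        elif role == "leaf":
            allowed = LEAF_CAPABLE
        else:
            problems.append(f"member {index}: unknown role {role!r}")
            continue
        if platform not in allowed:
            problems.append(f"member {index}: {platform} cannot be a VCF {role} node")
    return problems


print(validate_vcf([("QFX5100-24Q", "spine"), ("QFX3500", "spine"), ("EX4300", "leaf")]))
# ['member 2: QFX3500 cannot be a VCF spine node']
```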
VCF SCALE
All QFX5100:
• Spine: QFX5100-24Q, QFX5100-48S
• Leaf: QFX5100-48S, QFX5100-24Q, QFX5100-96S
• Scale: QFX5100
Mixed:
• Spine: QFX5100-24Q, QFX5100-48S
• Leaf: QFX5100-48S, QFX5100-24Q, QFX5100-96S, QFX3500 & QFX3600, EX4300
• Scale: lowest common scale
QFX5100 scale:
root@opus# set chassis forwarding-options ?
Possible completions:
  l2-profile-one     MAC: 288K  L3-host: 16K   LPM: 16K
  l2-profile-three   MAC: 160K  L3-host: 88K   LPM: 16K
  l2-profile-two     MAC: 224K  L3-host: 56K   LPM: 16K
  l3-profile         MAC: 96K   L3-host: 120K  LPM: 16K
  lpm-profile        MAC: 32K   L3-host: 16K   LPM: 128K
QFX3500/3600 scale: L2 MAC 128K, L3 host 8K, L3 LPM 16K, L3 multicast 4K; IPv6 scale = IPv4 LPM/4
EX4300 scale: L2 MAC 64K, L3 host 32K, L3 LPM 16K, L3 multicast 16K
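"Lowest common scale" for a mixed fabric can be read as a per-table minimum across every platform family present. The sketch below assumes that reading; the QFX5100 figures are the l2-profile-three line from the CLI listing, the QFX3500/3600 and EX4300 figures are our reading of the flattened slide text, and multicast is left out because no QFX5100 multicast figure is given.

```python
from typing import Dict, Iterable

# Scale figures in thousands of entries (assumptions noted above).
SCALE_K: Dict[str, Dict[str, int]] = {
    "QFX5100": {"mac": 160, "host": 88, "lpm": 16},  # l2-profile-three per the CLI listing
    "QFX3500/3600": {"mac": 128, "host": 8, "lpm": 16},
    "EX4300": {"mac": 64, "host": 32, "lpm": 16},
}


def fabric_scale(families: Iterable[str]) -> Dict[str, int]:
    """Per-table minimum across every platform family present in the fabric."""
    families = list(families)
    if not families:
        raise ValueError("fabric has no members")
    return {table: min(SCALE_K[f][table] for f in families) for table in ("mac", "host", "lpm")}


print(fabric_scale(["QFX5100"]))
print(fabric_scale(["QFX5100", "QFX3500/3600", "EX4300"]))
```

The second call shows the practical consequence: adding a single QFX3500 leaf caps L3 host scale for the whole fabric.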
DEPLOYMENT FLEXIBILITY
10/40G spine nodes & 1/10/40G leaf nodes
[Figure: 10G POD, 1/10/40G POD and 1G POD attached to the spine over 10G and 40G links]
• Spine nodes: QFX5100-24Q, QFX5100-48S
• Leaf nodes: QFX5100-48S, QFX5100-24Q, QFX5100-96S, QFX3500 & QFX3600, EX4300
1GE, 10GE & 40GE all in one fabric
OPERATIONAL SIMPLICITY - PLUG ‘N’ PLAY
member 1 {
    role routing-engine;
    serial-number SERIALNUM1;
}
member 2 {
    role routing-engine;
    serial-number SERIALNUM2;
}
member 3 {
    role routing-engine;
    serial-number SERIALNUM3;
}
member 4 {
    role routing-engine;
    serial-number SERIALNUM4;
}
• Spine nodes & leaf nodes are auto-provisioned
• A factory-default device will join the fabric; a non-factory-default or third-party device will not join the fabric
• Configuration and image synchronization
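The admission rules on this slide can be sketched as a decision function for a newly cabled device. Assumptions, not stated on the slide: the members configured with `role routing-engine` are the spines, and any other factory-default device is auto-provisioned as a leaf (line card). The serial numbers and function name are hypothetical, and configuration and image synchronization are not modelled.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    serial: str
    factory_default: bool


# Hypothetical serial numbers of the four members configured with
# "role routing-engine" (assumed to be the spine nodes).
ROUTING_ENGINE_SERIALS = {"SPINE-SN-1", "SPINE-SN-2", "SPINE-SN-3", "SPINE-SN-4"}


def join_decision(device: Candidate) -> str:
    """Apply the plug-and-play admission rules to a newly cabled device."""
    if not device.factory_default:
        return "not admitted: configuration is not factory default"
    if device.serial in ROUTING_ENGINE_SERIALS:
        return "admitted as spine (routing-engine role)"
    # Assumption: any other factory-default device is auto-provisioned as a leaf.
    return "admitted as leaf (line-card role)"


print(join_decision(Candidate("SPINE-SN-2", factory_default=True)))
print(join_decision(Candidate("LEAF-SN-7", factory_default=True)))
print(join_decision(Candidate("LEAF-SN-8", factory_default=False)))
```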
HA - RESILIENT CONTROL & DATA PLANE
Control plane redundancy:
• Active, hot-backup and backup routing engines: quaternary RE (routing engine) redundancy
• Resilient in-band control plane
• GRES, NSR, NSB
Data plane redundancy:
• Server multi-homing
• Active-active uplink forwarding
• Uplink redundancy
[Figure: virtual servers (VMs behind a vSwitch) multi-homed to a VCF of 1 RU, 48 SFP+ & 1 QIC nodes with redundant routing engines]
FORWARDING PLANE (SMART TRUNKS)
• Automatic fabric trunks: fabric links are automatically aggregated into trunks (LAGs)
• Fabric trunk types:
  • Next Hop (NH) trunks: from the local node to its direct neighbors
  • Remote Destination (RD) trunks: from the local node to a remote destination PFE
• Weights are based on the path bandwidth ratio (instead of the NH link bandwidth) to avoid fabric congestion
[Figure: fabric of switches SW 1 to SW 16 with links L1 to L16 and a fabric trunk T2]
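The difference between NH-link weighting and path-bandwidth weighting shows up when the second hop is narrower than the first. In this sketch the bandwidth of each path through a spine is taken as the narrower of its two hops, and the RD-trunk splits traffic in proportion to that; the exact formula Junos uses is not given on the slide, so this is an illustrative assumption, and the link speeds are hypothetical.

```python
from fractions import Fraction
from typing import Dict


def nh_weights(first_hop_gbps: Dict[str, int]) -> Dict[str, Fraction]:
    """Split traffic by local next-hop link bandwidth only."""
    total = sum(first_hop_gbps.values())
    return {spine: Fraction(bw, total) for spine, bw in first_hop_gbps.items()}


def path_weights(first_hop_gbps: Dict[str, int], second_hop_gbps: Dict[str, int]) -> Dict[str, Fraction]:
    """Split traffic by end-to-end path bandwidth: the narrower hop via each spine."""
    path_bw = {
        spine: min(bw, second_hop_gbps.get(spine, 0))
        for spine, bw in first_hop_gbps.items()
    }
    total = sum(path_bw.values())
    if total == 0:
        raise ValueError("no usable path to the remote PFE")
    return {spine: Fraction(bw, total) for spine, bw in path_bw.items()}


# Hypothetical fabric: the local leaf has 40G to every spine, but the remote
# leaf has only 10G to SW 3 and SW 4.
to_spine = {"SW 1": 40, "SW 2": 40, "SW 3": 40, "SW 4": 40}
spine_to_remote = {"SW 1": 40, "SW 2": 40, "SW 3": 10, "SW 4": 10}
print(nh_weights(to_spine))                     # 1/4 per spine
print(path_weights(to_spine, spine_to_remote))  # 2/5, 2/5, 1/10, 1/10
```

With NH weighting, SW 3 and SW 4 each receive a quarter of the traffic but can deliver only 10G onward, which is exactly the fabric congestion the path-based weights avoid.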
HA - HITLESS UPGRADE WITH ISSU
Today:
• Upgrade one rack/node at a time
• Applications run on half bandwidth
• Long maintenance window
VCF (hitless upgrade using a single switch):
• Upgrade multiple racks at a time
• Applications run on full bandwidth
• Shorter maintenance window
• Does not require hardware redundancy
VCF ARCHITECTURE PROVIDES
• Predictable application performance
• Deterministic latency
• Resilient multi-path
• High bisection bandwidth
• Smart leafs (local switching)
• Network ports on spine switches
• Mixed 1/10/40G fabric
• Integrated control plane
• Integrated RE
• GRES/NSR/NSB
• Plug-and-play fabric
• Analytics on fabric ports
[Figure: virtual servers (VMs behind a vSwitch) and bare-metal servers attached to a VCF of 1 RU, 48 SFP+ & 1 QIC nodes, with a services gateway and WAN/core connectivity]
NG DC INTERCONNECT - EVPN
VM MOBILITY TRAFFIC OPTIMIZATION
[Figure: DC1 and DC2 with VLAN 10 stretched over a private MPLS WAN; scenario without VMTO vs. scenario with VMTO enabled]
VPLS DEPLOYMENT OPTIONS WITH MX – TODAY
[Figure: three designs, each with a switch, an MX Series router and an SRX (NAT, FW, LB, IPSec), connected by LAGs to an IP/MPLS network]
• VPLS Multi-Homing: >1 VPLS devices; VPLS-controlled active-standby per VLAN
• VPLS with MC-LAG Active-Standby: >1 VPLS devices; MC-LAG-controlled active-standby on the LAN, per VLAN
• VPLS with MX Virtual Chassis: one VPLS device; active forwarding through all links of the LAG
DCI WITH VPLS AND VRRP
[Figure: DC 1 (Server 1, VLAN 20, 20.20.20.100/24), DC 2 (Server 2, VLAN 10, 10.10.10.100/24) and DC 3 (Server 3, VLAN 10, 10.10.10.200/24) connected by a private MPLS WAN; one active and three standby VRRP routers, all using default gateway 10.10.10.1]
Task: Server 3 in Data Center 3 needs to send packets to Server 1 in Data Center 1.
Problem: Server 3’s active default gateway for VLAN 10 is in Data Center 2.
Effect:
1. Traffic must travel via Layer 2 from Data Center 3 to Data Center 2 to reach VLAN 10’s active default gateway.
2. The packet must reach the default gateway in order to be routed towards Data Center 1. This results in duplicate traffic on WAN links and suboptimal routing, hence the “Egress Trombone Effect.”
EVPN REQUIREMENTS (ON TOP OF VPLS)
EVPN provides standards-based VLAN extension over a shared IP/MPLS network.
http://datatracker.ietf.org/doc/draft-ietf-l2vpn-evpn/?include_text=1
• All-active multi-homing: all available paths should be used (CE-PE, PE-PE)
• Better control over MAC learning: MAC learning happens in the control plane
• ARP/ND flooding minimization: proxy ARP support
• L3 egress traffic forwarding optimization: use of the Default Gateway extended community
• L3 ingress traffic forwarding optimization: automatic advertisement of host routes into the L3 VPN
EVPN: NO EGRESS TROMBONE EFFECT
[Figure: DC 1 (Server 1, VLAN 20, 20.20.20.100/24), DC 2 (Server 2, VLAN 10, 10.10.10.100/24) and DC 3 (Server 3, VLAN 10, 10.10.10.200/24) connected by a private MPLS WAN; four routers, each an active RVI default gateway 10.10.10.1]
Task: Server 3 in Datacenter 3 needs to send packets to Server 1 in Datacenter 1.
Solution: Virtualize and distribute the default gateway so it is active on every router that participates in the VLAN.
Effect:
1. Egress packets can be sent to any router on VLAN 10, allowing the routing to be done in the local datacenter. This eliminates the “Egress Trombone Effect” and creates the most optimal forwarding path for the inter-DC traffic.
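The gain from distributing the gateway can be counted in WAN crossings for the Server 3 to Server 1 case on the last two slides. This is a toy model under stated assumptions: a packet is bridged to its default gateway and then routed, a local active gateway is always preferred, and the return path is ignored; the function name is hypothetical.

```python
from typing import Set


def egress_wan_crossings(src_dc: str, dst_dc: str, active_gateway_dcs: Set[str]) -> int:
    """WAN crossings for a packet that is bridged to its default gateway, then routed.

    If a gateway is active in the source DC it is used; otherwise the packet is
    bridged across the WAN to a remote active gateway (the lexicographically
    smallest one, just to keep the toy model deterministic).
    """
    gateway_dc = src_dc if src_dc in active_gateway_dcs else min(active_gateway_dcs)
    bridged_over_wan = gateway_dc != src_dc
    routed_over_wan = gateway_dc != dst_dc
    return int(bridged_over_wan) + int(routed_over_wan)


# VRRP: a single active gateway in DC 2.
print(egress_wan_crossings("DC 3", "DC 1", {"DC 2"}))                  # 2
# EVPN: the gateway is active on every router in the VLAN.
print(egress_wan_crossings("DC 3", "DC 1", {"DC 1", "DC 2", "DC 3"}))  # 1
```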
EVPN TEST TOPOLOGY
[Figure: EVPN test topology]
SUPPORTED CE-PE TOPOLOGY
• Do not try to configure MC-LAG on the PEs
• Do not try to configure a single LAG towards two PEs
[Figure: supported CE-PE config: CE (QFabric) connected over ae1 links to PE1 (MX240-3) and PE2 (MX240-4), with ae0 and MPLS on the PE side; panels show the PE1/PE2 config and the CE config]
HOW TO PREVENT DUPLICATE COPIES ON MULTI-HOMED SEGMENTS?
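A Designated Forwarder (DF) is elected for each EVI or for the entire Ethernet Segment. The DF is responsible for forwarding BUM (broadcast, unknown unicast and multicast) traffic.
[Figure: CE1 multi-homed over a LAG to PE1 and PE2; PE3 connects CE2; the PEs are interconnected over MPLS]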
EVI LOAD BALANCING
By default, all CE links are actively used for traffic forwarding: half of the EVIs have PE1 as DF and the other half have PE2 as DF.
[Figure: CE dual-homed to PE1 and PE2]
VM EGRESS TRAFFIC OPTIMIZATION
EVPN advantages over VPLS:
• No need for VRRP, multi-homing VPLS or MC-LAG (less machinery and fewer protocol dependencies)
• IRB within the EVPN VRF is configured on all PEs with the same IP address (copy and paste the IRB config on all PEs)
• Each PE has a mapping between the default gateway IP and all PEs’ MACs
• If a VM moves from DC1 to DC2, it continues to use the “old” gateway MAC address of the PE located in DC1. However, both PEs in DC2 forward traffic destined to this MAC locally.
[Figure: IRB MACs on MX240-4, MX480-3 and MX480-4]
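The gateway-MAC behaviour described above can be sketched as a PE that routes any frame addressed to a known gateway MAC, whether that MAC is its own IRB MAC or another PE's. In EVPN these remote gateway MACs would arrive in MAC/IP routes tagged with the Default Gateway extended community; here learning is a plain method call, and the class, PE names and MAC values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class EvpnPE:
    """Toy PE: routes any frame addressed to a known default-gateway MAC."""
    name: str
    irb_mac: str
    gateway_macs: Set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        self.gateway_macs.add(self.irb_mac.lower())

    def learn_gateway_mac(self, mac: str) -> None:
        """Record another PE's IRB MAC (advertised as a default-gateway MAC)."""
        self.gateway_macs.add(mac.lower())

    def forward(self, dst_mac: str) -> str:
        return "route locally" if dst_mac.lower() in self.gateway_macs else "bridge"


# Hypothetical IRB MACs; all PEs share the same gateway IP on their IRB.
pe_dc1 = EvpnPE("PE-DC1", "00:aa:00:00:00:01")
pe_dc2 = EvpnPE("PE-DC2", "00:aa:00:00:00:02")
pe_dc2.learn_gateway_mac(pe_dc1.irb_mac)

# A VM moved from DC1 to DC2 still has PE-DC1's MAC in its ARP cache.
print(pe_dc2.forward("00:AA:00:00:00:01"))  # route locally
print(pe_dc2.forward("00:bb:00:00:00:99"))  # bridge
```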
EVPN ROUTE TYPE 2: MAC ADVERTISEMENT ROUTE
If you need to decode pcaps with EVPN NLRIs, you can use the dissector I contributed to the Wireshark Git repository: https://code.wireshark.org/review/#/c/296/
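For quick experiments without a capture, the MAC/IP Advertisement route can be built and parsed by hand. The field layout below follows the route type 2 format in the EVPN specification (RFC 7432, section 7.2): RD (8 octets), ESI (10), Ethernet Tag ID (4), MAC length and MAC, IP length and IP (0, 4 or 16 octets), then MPLS Label1 (3 octets, label value in the high-order 20 bits). This is a simplified sketch: it omits the optional Label2, does not validate ESI types, and the RD, MAC, IP and label values are hypothetical. It is not a substitute for the Wireshark dissector.

```python
import ipaddress
import struct
from typing import Optional

MAC_IP_ADVERTISEMENT = 2  # EVPN route type 2


def encode_mac_ip_route(rd: bytes, esi: bytes, ethernet_tag: int,
                        mac: str, ip: Optional[str], label1: int) -> bytes:
    """Build a type 2 NLRI: route type, length, then the route fields."""
    if len(rd) != 8 or len(esi) != 10:
        raise ValueError("RD must be 8 octets and ESI 10 octets")
    if not 0 <= label1 < 2 ** 20:
        raise ValueError("MPLS label must fit in 20 bits")
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    ip_bytes = ipaddress.ip_address(ip).packed if ip else b""
    body = (
        rd
        + esi
        + struct.pack("!I", ethernet_tag)
        + struct.pack("!B", len(mac_bytes) * 8) + mac_bytes  # length in bits
        + struct.pack("!B", len(ip_bytes) * 8) + ip_bytes    # 0, 32 or 128 bits
        + (label1 << 4).to_bytes(3, "big")                    # label in high 20 bits
    )
    return struct.pack("!BB", MAC_IP_ADVERTISEMENT, len(body)) + body


def decode_mac_ip_route(nlri: bytes) -> dict:
    """Parse the fields back out of a type 2 NLRI built as above."""
    route_type, length = nlri[0], nlri[1]
    if route_type != MAC_IP_ADVERTISEMENT:
        raise ValueError(f"not a MAC/IP advertisement route: type {route_type}")
    body = nlri[2:2 + length]
    (ethernet_tag,) = struct.unpack("!I", body[18:22])
    mac_len = body[22] // 8
    mac = body[23:23 + mac_len]
    offset = 23 + mac_len
    ip_len = body[offset] // 8
    ip_raw = body[offset + 1:offset + 1 + ip_len]
    offset += 1 + ip_len
    ip = None
    if ip_len == 4:
        ip = str(ipaddress.IPv4Address(ip_raw))
    elif ip_len == 16:
        ip = str(ipaddress.IPv6Address(ip_raw))
    return {
        "rd": body[:8],
        "esi": body[8:18],
        "ethernet_tag": ethernet_tag,
        "mac": ":".join(f"{b:02x}" for b in mac),
        "ip": ip,
        "label1": int.from_bytes(body[offset:offset + 3], "big") >> 4,
    }


# Hypothetical values: type 1 RD 192.0.2.1:100, single-homed (all-zero ESI).
rd = struct.pack("!H", 1) + ipaddress.IPv4Address("192.0.2.1").packed + struct.pack("!H", 100)
nlri = encode_mac_ip_route(rd, bytes(10), 0, "00:aa:00:00:00:01", "10.10.10.100", 300000)
print(len(nlri), decode_mac_ip_route(nlri))
```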
WITHOUT VMTO: INGRESS TROMBONE EFFECT
[Figure: DC 1 (Server 1, VLAN 20, 20.20.20.100/24), DC 2 (Server 2, VLAN 10, 10.10.10.100/24) and DC 3 (Server 3, VLAN 10, 10.10.10.200/24) connected by a private MPLS WAN; DC 2 advertises 10.10.10.0/24 with cost 5 and DC 3 advertises 10.10.10.0/24 with cost 10]
Task: Server 1 in Datacenter 1 needs to send packets to Server 3 in Datacenter 3.
Problem: Datacenter 1’s edge router prefers the path to Datacenter 2 for the 10.10.10.0/24 subnet. It has no knowledge of individual host IPs.
Effect:
1. Traffic from Server 1 is first routed across the WAN to Datacenter 2 due to a lower-cost route for the 10.10.10.0/24 subnet.
2. Then the edge router in Datacenter 2 will send the packet via Layer 2 to Datacenter 3.
DC 1’s Edge Router Table Without VMTO
Route        Mask  Cost  Next Hop
10.10.10.0   24    5     Datacenter 2
10.10.10.0   24    10    Datacenter 3
WITH VMTO: NO INGRESS TROMBONE EFFECT
[Figure: the same three data centers; in addition to 10.10.10.0/24 (cost 5 via DC 2, cost 10 via DC 3), DC 2 advertises 10.10.10.100/32 and DC 3 advertises 10.10.10.200/32, both with cost 5]
Task: Server 1 in Datacenter 1 needs to send packets to Server 3 in Datacenter 3.
Solution: In addition to sending a summary route of 10.10.10.0/24, the datacenter edge routers also send host routes that represent the location of local servers.
Effect:
1. Ingress traffic destined for Server 3 is sent directly across the WAN from Datacenter 1 to Datacenter 3. This eliminates the “Ingress Trombone Effect” and creates the most optimal forwarding path for the inter-DC traffic.
DC 1’s Edge Router Table WITH VMTO
Route          Mask  Cost  Next Hop
10.10.10.0     24    5     Datacenter 2
10.10.10.0     24    10    Datacenter 3
10.10.10.100   32    5     Datacenter 2
10.10.10.200   32    5     Datacenter 3
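The two edge-router tables can be checked with a longest-prefix-match lookup: without host routes both /24 entries match and the lower cost wins, which points at Datacenter 2; with VMTO the /32 host route for Server 3 is more specific and wins regardless of cost. The tables below are copied from the slides; the function name is hypothetical, and ranking by prefix length first, then cost, is a simplification of real route selection.

```python
import ipaddress
from typing import List, Optional, Tuple

Route = Tuple[str, int, str]  # (prefix, cost, next hop)

WITHOUT_VMTO: List[Route] = [
    ("10.10.10.0/24", 5, "Datacenter 2"),
    ("10.10.10.0/24", 10, "Datacenter 3"),
]
WITH_VMTO: List[Route] = WITHOUT_VMTO + [
    ("10.10.10.100/32", 5, "Datacenter 2"),
    ("10.10.10.200/32", 5, "Datacenter 3"),
]


def next_hop(table: List[Route], destination: str) -> Optional[str]:
    """Longest prefix wins; among equal-length prefixes the lowest cost wins."""
    address = ipaddress.ip_address(destination)
    candidates = [
        (ipaddress.ip_network(prefix).prefixlen, -cost, hop)
        for prefix, cost, hop in table
        if address in ipaddress.ip_network(prefix)
    ]
    return max(candidates)[2] if candidates else None


print(next_hop(WITHOUT_VMTO, "10.10.10.200"))  # Datacenter 2 (ingress trombone)
print(next_hop(WITH_VMTO, "10.10.10.200"))     # Datacenter 3
```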
REFERENCES
MetaFabric Solution Brief: http://www.juniper.net/us/en/local/pdf/solutionbriefs/3510495-en.pdf
MetaFabric 1.0 Reference Architecture: http://www.juniper.net/us/en/local/pdf/reference-architectures/8030012-en.pdf
MetaFabric 1.0 Design and Implementation Guide: http://www.juniper.net/us/en/local/pdf/design-guides/8020020-en.pdf