TRANSCRIPT
VXLAN BGP EVPN: TECHNOLOGY BUILDING BLOCKS
UNDERLAY / OVERLAY / IP FABRIC / VXLAN / EVPN / MULTI-TENANT
Jide Akintola
TechForceNG
17/06/2019
TECHFORCENG.COM
AGENDA
Ø VxLAN Overview and Configuration
Ø EVPN Overview and Configuration
Ø Underlay Configuration Walkthrough
Ø Overlay Configuration Walkthrough
Ø EVPN VxLAN Service Configuration Walkthrough
Ø Sample Legacy Device Migration to VxLAN BGP EVPN
DATA CENTER TECHNOLOGY – Sample Vendors' Supported Options
Traditional Ethernet Fabric: L2 + STP + L3 + RVI / MC-LAG / Virtual Chassis / Virtual Chassis Fabric / QFabric / VCP / VCP+ / ACI
IP Fabric: CLOS 3/5-Stage / VXLAN + EVPN Fabric
VXLAN ACRONYMS
Ø VXLAN - Virtual eXtensible Local Area Network
Ø VNI - VXLAN Network Identifier (or VXLAN Segment ID)
Ø VXLAN Segment - VXLAN Layer 2 overlay network over which VMs communicate.
Ø VTEP - VXLAN Tunnel End Point. An entity that originates and/or terminates VXLAN tunnels.
Ø VXLAN Gateway - an entity that forwards traffic between VXLANs.
VXLAN OVERVIEW
Ø VxLAN is a Layer 2 overlay scheme over an existing Layer 3 network infrastructure.
Ø An overlay network is used to carry the MAC traffic from the individual VMs/hosts in an encapsulated format over a logical, stateless "tunnel".
Ø With VxLAN overlay network, the original packet is encapsulated on the ingress device with an outer header before being forwarded to the egress device. All intermediate devices simply forward the encapsulated packet based on the outer header and are not aware of the original packet payload. At the egress device, the encapsulated packet header is removed and the original packet is forwarded based on the inner payload.
VXLAN OVERVIEW
Ø Each overlay is termed a VXLAN segment. Only VMs/hosts within the same VXLAN segment can communicate with each other.
Ø Each VXLAN segment is identified through a 24-bit segment ID, termed the "VXLAN Network Identifier (VNI)". This allows up to 16 Million VXLAN segments to coexist within the same administrative domain.
Ø The VNI is in an outer header that encapsulates the inner MAC frame originated by the VM/host, hence providing traffic isolation while allowing for overlapping MAC addresses across different VNI.
Ø The underlay network, on the contrary, is a transport network that provides network reachability between the ingress and egress overlay devices.
VXLAN NETWORK OVERLAY
(Figure: overlay tunnels built on top of the underlay network)
VXLAN OVERVIEW – UDP WHY?
Ø VXLAN uses UDP encapsulation to take advantage of load balancing in the network.
Ø The UDP source port can be set to a hash of inner packet fields, and the UDP destination port is set to 4789.
Ø Setting the UDP source port from the packet hash allows for load balancing of the packets using 5-tuples.
Ø The existing IP network infrastructure supports this, and no changes are required to support VXLAN in the network.
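The source-port selection described above can be sketched in a few lines of Python. This is illustrative only: the hash function (CRC32) and the ephemeral port range are assumptions, not any vendor's implementation.

```python
import zlib

VXLAN_DST_PORT = 4789  # IANA-assigned VXLAN destination port

def vxlan_source_port(src_ip, dst_ip, proto, sport, dport):
    """Derive the outer UDP source port from a hash of the inner
    5-tuple so the underlay can ECMP different flows.
    The 49152-65535 ephemeral range is an assumption."""
    key = f"{src_ip}|{dst_ip}|{proto}|{sport}|{dport}".encode()
    return 49152 + zlib.crc32(key) % 16384

# Same inner flow -> same outer source port, so packets stay in order;
# different flows usually hash to different ports and spread over ECMP.
p1 = vxlan_source_port("20.20.20.1", "20.20.20.3", 6, 40000, 443)
p2 = vxlan_source_port("20.20.20.1", "20.20.20.3", 6, 40001, 443)
```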
VXLAN VTEP PEER DISCOVERY
Ø The vanilla implementation of VxLAN has no mechanism for VTEP peer auto-discovery but instead relies on manual definition of the VxLAN overlay edge devices as part of the device configuration. EVPN is used to address this shortcoming.
VXLAN END-HOST DEVICES DISCOVERY
Ø Similar to VPLS, the original implementation of VxLAN relies on the data plane flood and learn (F&L) discovery scheme.
Ø To address the scalability concerns of the F&L discovery scheme, however, controller-less control-plane discovery schemes such as BGP EVPN and OVSDB have been defined. It is also worth noting that SDN controller-based discovery schemes such as Cisco APIC or Juniper Contrail can also be used.
VXLAN AND MULTICAST TRAFFIC
Ø The original VxLAN implementation mandated that the underlay network support native IP multicast for forwarding BUM (broadcast, unknown unicast & multicast) traffic.
Ø Each Layer 2 VNI is mapped to an IP multicast group address; the VTEP then sends out PIM Join/Prune messages expressing interest in the multicast traffic. The network does the replication.
Ø Newer software from all vendors now support Ingress Replication (IR) or Head-End Replication (HER), eliminating the need for the underlay to support native IP multicast.
Ø With HER, the ingress router builds a flood list which basically specifies all remote VTEPs to replicate the BUM traffic to.
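The HER flood list above can be sketched as a simple per-VNI table. The 10.1.1.4 entry matches the lab's flood-vtep configuration; 10.1.1.5 is a hypothetical extra VTEP, added only to show replication toward more than one peer.

```python
# Per-L2VNI flood list, as built from "vxlan vlan ... flood vtep ..."
# style configuration. 10.1.1.5 is hypothetical, for illustration.
FLOOD_LIST = {10020: ["10.1.1.4", "10.1.1.5"]}

def her_replicate(vni, frame):
    """Head-End Replication: the ingress VTEP makes one unicast copy
    of a BUM frame per remote VTEP in the flood list for that VNI."""
    return [(vtep, frame) for vtep in FLOOD_LIST.get(vni, [])]
```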
VXLAN PACKET HEADER AND ENCAPSULATION
Outer MAC | Outer IP | Outer UDP | VXLAN Header | Original L2 Frame | FCS
VXLAN header: Flags (R R R R I R R R) | Reserved (24 bits) | VXLAN Network Identifier (VNI, 24 bits) | Reserved (8 bits)
The I flag is set to 1 for a valid VNI. R flags are reserved and must be set to 0.
50 bytes (14+20+8+8) of additional overhead added.
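The 8-byte VXLAN header laid out above can be packed with Python's struct module. This is a minimal sketch of just the VXLAN header, not a full encapsulation stack.

```python
import struct

def build_vxlan_header(vni: int) -> bytes:
    """Pack the 8-byte VXLAN header: a flags byte with the I bit set,
    3 reserved bytes, the 24-bit VNI, and 1 trailing reserved byte."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI is a 24-bit field")
    flags = 0x08  # I flag = 1 (valid VNI), all R flags = 0
    # Flags + 3 reserved bytes, then VNI shifted into the top 24 bits
    return struct.pack("!B3s", flags, b"\x00" * 3) + struct.pack("!I", vni << 8)

hdr = build_vxlan_header(10020)  # the VNI used in the sample configs
```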
VXLAN – TEST TOPOLOGY
VXLAN – SAMPLE CONFIGURATION ARISTA

Leaf 1
!
hostname aris-lf1
!
vlan 20
   name vla20
!
interface Ethernet3
   switchport trunk allowed vlan 10,20,30,40
   switchport mode trunk
!
interface Loopback0
   ip address 10.1.1.3/32
!
interface Vxlan1
   vxlan source-interface Loopback0
   vxlan udp-port 4789
   vxlan vlan 20 vni 10020
   vxlan vlan 20 flood vtep 10.1.1.4
!

Leaf 2
!
hostname aris-lf2
!
vlan 20
   name vla20
!
interface Ethernet5
   switchport trunk allowed vlan 20
   switchport mode trunk
!
interface Loopback0
   ip address 10.1.1.4/32
!
interface Vxlan1
   vxlan source-interface Loopback0
   vxlan udp-port 4789
   vxlan vlan 20 vni 10020
   vxlan vlan 20 flood vtep 10.1.1.3
!
VXLAN – SAMPLE CONFIGURATION ARISTA

CLIENT1
!
hostname CLIENT1
!
vlan 20
   name vla20
!
interface Ethernet1
   switchport trunk allowed vlan 10,20
   switchport mode trunk
!
interface Vlan20
   ip address 20.20.20.1/24
!
ip routing
!

CLIENT3
!
hostname CLIENT3
!
vlan 20
   name vla20
!
interface Ethernet2
   switchport trunk allowed vlan 20,30
   switchport mode trunk
!
interface Vlan20
   ip address 20.20.20.3/24
!
ip routing
!
VXLAN – SAMPLE OUTPUTS

aris-lf1#sh ip route 10.1.1.4
VRF: default
Codes: C - connected, S - static, K - kernel,
       O - OSPF, IA - OSPF inter area, E1 - OSPF external type 1,
       E2 - OSPF external type 2, N1 - OSPF NSSA external type 1,
       N2 - OSPF NSSA external type 2, B I - iBGP, B E - eBGP,
       R - RIP, I L1 - IS-IS level 1, I L2 - IS-IS level 2,
       O3 - OSPFv3, A B - BGP Aggregate, A O - OSPF Summary,
       NG - Nexthop Group Static Route, V - VXLAN Control Service,
       DH - Dhcp client installed default route

 O      10.1.1.4/32 [110/30] via 10.10.10.4, Ethernet2
aris-lf1#

aris-lf2#sh ip route 10.1.1.3
VRF: default
(codes legend as above)

 O      10.1.1.3/32 [110/30] via 10.10.10.8, Ethernet2
aris-lf2#
VXLAN – SAMPLE OUTPUTS

CLIENT3#ping 20.20.20.1
PING 20.20.20.1 (20.20.20.1) 72(100) bytes of data.
80 bytes from 20.20.20.1: icmp_seq=1 ttl=64 time=212 ms
80 bytes from 20.20.20.1: icmp_seq=2 ttl=64 time=216 ms
80 bytes from 20.20.20.1: icmp_seq=3 ttl=64 time=228 ms
80 bytes from 20.20.20.1: icmp_seq=4 ttl=64 time=248 ms
80 bytes from 20.20.20.1: icmp_seq=5 ttl=64 time=244 ms

--- 20.20.20.1 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 864ms
rtt min/avg/max/mdev = 212.013/229.614/248.016/14.451 ms, pipe 2, ipg/ewma 216.013/221.817 ms
CLIENT3#

CLIENT1#ping 20.20.20.3
PING 20.20.20.3 (20.20.20.3) 72(100) bytes of data.
80 bytes from 20.20.20.3: icmp_seq=1 ttl=64 time=200 ms
80 bytes from 20.20.20.3: icmp_seq=2 ttl=64 time=220 ms
80 bytes from 20.20.20.3: icmp_seq=3 ttl=64 time=248 ms
80 bytes from 20.20.20.3: icmp_seq=4 ttl=64 time=260 ms
80 bytes from 20.20.20.3: icmp_seq=5 ttl=64 time=268 ms

--- 20.20.20.3 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 824ms
rtt min/avg/max/mdev = 200.013/239.215/268.017/25.476 ms, pipe 2, ipg/ewma 206.013/221.345 ms
CLIENT1#
VXLAN – SAMPLE OUTPUTS

aris-lf1#sh vxlan vtep
Remote VTEPS for Vxlan1:
10.1.1.4
Total number of remote VTEPS: 1
aris-lf1#

aris-lf1#sh vxlan address-table
          Vxlan Mac Address Table
----------------------------------------------------------------------
VLAN  Mac Address     Type     Prt  VTEP      Moves  Last Move
----  -----------     ----     ---  ----      -----  ---------
  20  5000.00d7.ee0b  DYNAMIC  Vx1  10.1.1.4  1      0:01:01 ago
Total Remote Mac Addresses for this criterion: 1
aris-lf1#

aris-lf2#sh vxlan vtep
Remote VTEPS for Vxlan1:
10.1.1.3
Total number of remote VTEPS: 1
aris-lf2#

aris-lf2#sh vxlan address-table
          Vxlan Mac Address Table
----------------------------------------------------------------------
VLAN  Mac Address     Type     Prt  VTEP      Moves  Last Move
----  -----------     ----     ---  ----      -----  ---------
  20  5000.00af.d3f6  DYNAMIC  Vx1  10.1.1.3  1      0:02:02 ago
Total Remote Mac Addresses for this criterion: 1
aris-lf2#
EVPN ACRONYMS
Ø EVPN - Ethernet VPN.
Ø EVI - EVPN Instance. An EVPN instance spanning the Provider Edge (PE) devices participating in that EVPN.
Ø MAC-VRF - A Virtual Routing and Forwarding table for Media Access Control (MAC) addresses on a PE.
Ø IP-VRF - A Virtual Routing and Forwarding table for Internet Protocol (IP) addresses on a PE.
Ø DF – Designated Forwarder.
EVPN ACRONYMS
Ø ES - Ethernet Segment. When a customer site (device or
network) is connected to one or more PEs via a set of Ethernet
links, then that set of links is referred to as an ’Ethernet segment’.
Ø VTEP - VXLAN Tunnel End Point. An entity that originates and/or terminates VXLAN tunnels.
Ø NVE - Network Virtualization Edge (same as a PE/VTEP).
Ø NVGRE - Network Virtualization using Generic Routing
Encapsulation.
EVPN OVERVIEW
Ø While the VxLAN draft defines an extensible data plane for virtual networks, a control plane was never specified. The implication of this is that the vanilla VxLAN implementation relies on the data plane flood and learn (F&L) approach, leading to scalability concerns.
Ø EVPN was developed to address the above limitation in VxLAN.
Ø EVPN technology is also used within the data center to offer multi-tenancy.
EVPN OVERVIEW
Ø In EVPN, MAC learning between PEs occurs not in the data plane, as was the case in VPLS, but in the control plane. Data plane MAC learning in EVPN is limited to the PE-CE link only.
Ø Data plane learning requires the flooding of unknown unicast and Address Resolution Protocol (ARP) frames, whereas control plane learning eliminates flooding.
Ø Moreover, control plane information is distributed with MP-BGP, which allows for auto-discovery of the PE devices participating in a given EVPN instance.
ADVANTAGES OF EVPN
Ø Improved network efficiency and scalability
Ø Reduced unknown-unicast flooding due to control-plane MAC learning.
Ø Multi-path traffic over multiple spine switches.
Ø Multi-path traffic to active/active dual-homed servers.
Ø Distributed Layer-3 gateway.
Ø Very scalable MP-BGP-based control plane.
Ø Improved network convergence
Ø Faster re-convergence when a link to a dual-homed server fails (mass withdrawal).
EVPN DATA PLANE OPTIONS – IP / MPLS
Ø The following data-plane encapsulations are defined and supported with EVPN:

Value  Name
8      VXLAN Encapsulation
9      NVGRE Encapsulation
10     MPLS Encapsulation
11     MPLS in GRE Encapsulation
12     VXLAN GPE Encapsulation
EVPN DATA PLANE ENCAP – IP / MPLS
MPLS:  Transport Label | Service Label | Payload
VXLAN: Outer IP Header | VXLAN VNID | Payload
NVGRE: Outer IP Header | NVGRE VSID | Payload

Ø Both VXLAN and NVGRE are examples of technologies that provide a data plane encapsulation which is used to transport a packet over a native IP infrastructure.
EVPN-VXLAN
Ø Multiprotocol Border Gateway Protocol Ethernet Virtual Private Network (MP-BGP EVPN) is used as the control plane for VXLAN.
Ø It provides VTEP peer discovery and end-host reachability information
distribution.
Ø It allows more scalable VXLAN overlay network designs suitable for
private and public clouds.
Ø The MP-BGP EVPN control plane introduces a set of features that reduce or eliminate traffic flooding in the overlay network and enable optimal forwarding for both east-west and north-south traffic.
EVPN SERVICE MODEL – VLAN-TO-VNI MAPPING
Ø The current EVPN service models, otherwise known as deployment scenarios, specify different ways in which VLAN-to-VNI mapping can be achieved. The following three service models are defined:
1. VLAN-Based Service Interface
2. VLAN Bundle Service Interface / Port-Based Service Interface
3. VLAN-Aware Bundle Service Interface
Ø Most vendors, however, only support options 1 and 3 from the list above.
EVPN SERVICE MODEL – VLAN-to-VNI MAPPING
• VLAN-Based Service Interface:
Ø Has a one-to-one mapping between a VLAN ID (VID) on the
interface and a MAC-VRF. Also, the EVPN instance consists of only
a single broadcast domain.
• VLAN Bundle Service Interface:
Ø Has a many-to-one mapping between VLANs and a MAC-VRF, and the MAC-VRF consists of a single bridge table. Also, the EVPN instance corresponds to multiple broadcast domains.
EVPN SERVICE MODEL – VLAN-to-VNI MAPPING
• VLAN-Aware Bundle Service Interface:
Ø Here the EVPN instance consists of multiple broadcast domains, with each VLAN having its own bridge table.
EVPN SERVICES MODEL SUMMARY
Attribute                    VLAN-Based Service  VLAN Bundle Service  VLAN-Aware Service
VLAN to EVPN Instance Ratio  1:1                 N:1                  N:1
Route Target                 VLAN                VRF                  VRF
Service Label                VLAN                VRF                  VLAN
VLAN Normalization           Yes                 No                   Yes
Overlapping MAC Addresses    Yes                 No                   Yes
EVPN ROUTE TYPES

Route Type  Description              Usage
1           Ethernet Auto-Discovery  PE Discovery and Mass Withdraw
2           MAC Advertisement        MAC Advertisement
3           Multicast Route          BUM Flooding
4           Ethernet Segment Route   ES Discovery and DF Election
5           IP Prefix Route          IP Route Advertisement

Ø There is no change to the encoding of the original EVPN routes to support VXLAN or NVGRE data-plane encapsulation.
Ø To indicate which data-plane encapsulation is to be used, the BGP encapsulation extended community is included with all EVPN routes.
EVPN ROUTE TYPES FORMAT – TYPE 1

Route Distinguisher (RD) (8 octets)
Ethernet Segment Identifier (10 octets)
Ethernet Tag ID (4 octets)
MPLS Label / VNI (3 octets)

Ø An Ethernet Tag ID is a 32-bit field containing either a 12-bit or 24-bit identifier that identifies a particular broadcast domain, for instance a VLAN, in an EVPN instance.
Ø Also known as Ethernet Auto-Discovery Route (Ethernet A-D per ESI and Ethernet A-D per EVI).
Ø Used for remote VTEP auto-discovery.
Ø Used for advertising the split-horizon label.
Ø Also provides for fast convergence through mass withdrawal.
EVPN ROUTE TYPES FORMAT – TYPE 2

Route Distinguisher (RD) (8 octets)
Ethernet Segment Identifier (10 octets)
Ethernet Tag ID (4 octets)
MAC Address Length (1 octet)
MAC Address (6 octets)
IP Address Length (1 octet)
IP Address (0, 4, or 16 octets)
MPLS Label 1 / VNI Field (3 octets)
MPLS Label 2 (0 or 3 octets)

Ø Also known as MAC/IP Advertisement Route.
Ø Used to provide end-host reachability information.
EVPN ROUTE TYPES FORMAT – TYPE 3

Route Distinguisher (RD) (8 octets)
Ethernet Tag ID (4 octets)
IP Address Length (1 octet)
Originating Router's IP Address (4 or 16 octets)

Ø Also known as Inclusive Multicast Ethernet Tag (IMET) Route.
Ø Used to create the distribution list for ingress replication.
Ø Used to set up paths for BUM traffic on a per-VLAN, per-EVI basis.
Ø Used to discover the multicast tunnels among the endpoints associated with a given EVI. This route is tagged with the PMSI Tunnel attribute, which is used to encode the type of multicast tunnel to be used.

The following PMSI Tunnel attribute types are supported for VXLAN/NVGRE encapsulation:
Ø PIM-SSM Tree
Ø PIM-SM Tree
Ø BIDIR-PIM Tree
Ø Ingress Replication
EVPN ROUTE TYPES FORMAT – TYPE 4

Route Distinguisher (RD) (8 octets)
Ethernet Segment Identifier (10 octets)
IP Address Length (1 octet)
Originating Router's IP Address (4 or 16 octets)

Ø Also known as Ethernet Segment Route.
Ø Used for Ethernet Segment auto-discovery by allowing NVEs with the same ESI to discover each other.
Ø It also allows for designated forwarder (DF) election.
EVPN ROUTE TYPES FORMAT – TYPE 5

Route Distinguisher (RD) (8 octets)
Ethernet Segment ID (ESI) (10 octets)
Ethernet Tag ID (4 octets)
IP Prefix Length (1 octet)
IP Prefix (4 or 16 octets)
Gateway IP Address (4 or 16 octets)
MPLS Label / VNI (3 octets)

Ø Also known as IP Prefix Route.
Ø Used to decouple the IP prefix from the MAC/IP route to provide IP prefix advertisement.
DESIGNATED FORWARDER (DF)
Ø The designated forwarder (DF) is the NVE/PE router responsible for sending broadcast, unknown unicast and multicast (BUM) traffic to a multi-homed CE on a particular Ethernet Segment (ES) within a given VLAN.
Ø The original DF election process elects a DF per <ES, EVI> and uses the following election algorithm: each PE that is multi-homed to a given Ethernet Segment builds an ordered list of the IP addresses of all the PE nodes connected to that Ethernet Segment, including itself, in numerically ascending order. Each IP address in the list is then assigned an ordinal number based on its position in the list. The ordinals start from zero, with zero assigned to the PE that has the lowest IP address. Then, given a total of N PEs multi-homed to the same Ethernet Segment, the PE with ordinal number "o" is the DF if (VLAN-ID mod N == o), where VLAN-ID is the dividend, N is the divisor, "mod" is the modulo operation, and "o" is the remainder.
Ø To ensure that the service is evenly carved, this original DF election algorithm assumes the Ethernet Tag/VLAN values are uniformly distributed between odd and even values. For cases where this uniformity does not exist, for example when all VLAN IDs are odd or all are even, no DF load balancing happens and one of the PEs never gets elected at all.
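The modulo-based election above can be sketched in a few lines of Python (the PE addresses in the usage note reuse loopbacks from the earlier lab; a real implementation lives in the NOS, not in a script):

```python
def elect_df(pe_addrs, vlan_id):
    """Default per-<ES, EVI> DF election from RFC 7432: sort the PE IPs
    in ascending numerical order, assign ordinals starting at 0, and
    pick the PE whose ordinal equals (VLAN-ID mod N)."""
    ordered = sorted(pe_addrs,
                     key=lambda ip: tuple(int(o) for o in ip.split(".")))
    return ordered[vlan_id % len(ordered)]
```

With two PEs and only even VLAN IDs, `vlan_id % 2` is always 0, so the lowest-addressed PE wins every time, which is exactly the carving problem the next slide illustrates.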
DESIGNATED FORWARDER
Ø Example: assume we have two PEs (PE0 and PE1) connected to the same Ethernet Segment, meaning N=2, and assume that all the VLAN IDs are even, say 4, 34, 44, 88. Applying the default DF election algorithm, the PE with ordinal number "o" is the DF if (VLAN-ID mod N == o) ==> (4 mod 2 == 0; 34 mod 2 == 0; 44 mod 2 == 0; 88 mod 2 == 0). As can be seen, PE0 would always be elected as the DF for all these even VLAN IDs with the default algorithm, defeating the service-carving notion.
Ø The proposed updated DF election process is defined in draft-ietf-bess-evpn-df-election-framework-09 and elects a DF per <ES, BD>, as opposed to the default DF election per <ES, EVI>.
Ø The new DF election algorithm is based on the Highest Random Weight (HRW) algorithm, which allows for fair load distribution, avoidance of needless service disruption, redundancy, and fast access.
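A minimal HRW sketch follows, assuming SHA-256 as the pseudo-random weight function; the draft defines the exact hash, so this particular choice is an assumption for illustration only. The point is that each (ES, VLAN) pair picks a winner independently, so even-only or odd-only VLAN sets still spread across the PEs.

```python
import hashlib

def elect_df_hrw(pe_addrs, es_id, vlan_id):
    """Highest Random Weight DF election: every PE gets a pseudo-random
    weight per (ES, VLAN), and the PE with the highest weight wins."""
    def weight(pe):
        data = f"{es_id}|{vlan_id}|{pe}".encode()
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")
    # Sorting first makes the (astronomically unlikely) tie deterministic.
    return max(sorted(pe_addrs), key=weight)
```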
EVPN ROUTE EXTENDED COMMUNITY – MAC MOBILITY
Ø Advertised along with MAC/IP advertisement routes.
Ø The sequence number is used to ensure that PEs retain the correct MAC/IP Advertisement route when multiple updates occur for the same MAC address.
Ø A PE increments the sequence number.
Ø The PE with the highest sequence number wins.
Ø If a tie occurs, the PE with the highest router-id flushes its cache.

Type | Sub-Type | Flags (1 octet) | Reserved | Sequence Number (4 octets)
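The tie-break rule above can be sketched as a selection function; the route dictionaries and their field names (`seq`, `router_id`) are hypothetical, for illustration only.

```python
def select_mac_route(routes):
    """Keep the MAC/IP route with the highest MAC-mobility sequence
    number; on a tie, retain the route from the lowest router-id
    (the PE with the highest router-id flushes its entry)."""
    def rank(route):
        rid = tuple(int(o) for o in route["router_id"].split("."))
        return (-route["seq"], rid)   # highest seq first, lowest rid wins ties
    return min(routes, key=rank)
```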
EVPN ROUTE EXTENDED COMMUNITY – ES-IMPORT RT
Ø A transitive Route Target extended community carried with the Ethernet Segment (Type 4) route.
Ø Enables all the PEs connected to the same multi-homed site to import the Ethernet Segment routes, hence limiting the scope of the ES route to the multi-homed segment.

Type | Sub-Type | ES-Import (6 octets)
EVPN ROUTE EXTENDED COMMUNITY – ESI LABEL
Ø Advertised with the Ethernet Auto-Discovery routes.
Ø Used for split-horizon filtering in multi-homed sites and used to encode the split-horizon label.
Ø It is also used to indicate whether an ES is operating in Single-Active or All-Active redundancy mode.

Type | Sub-Type | Flags (1 octet) | Reserved | ESI Label (3 octets)
EVPN ROUTE EXTENDED COMMUNITY– ESI LABEL
• Split Horizon Operation
Ø In EVPN with MPLS encapsulation setup, an MPLS label is used for split-horizon filtering to
support all-active multi-homing where an ingress NVE adds an ESI label corresponding to
the site of origin when encapsulating the packet.
Ø The egress NVE checks the ESI label when attempting to forward a multi-destination frame
out an interface, and if the label corresponds to the same site identifier (ESI) associated with
that interface, the packet gets dropped. This prevents the occurrence of forwarding loops on
that segment.
Ø With VXLAN or NVGRE encapsulation, however, there is no concept of labels; hence every NVE tracks the IP addresses associated with the other NVEs with which it shares multi-homed ESs.
Ø When the NVE receives a multi-destination frame from the overlay network, it examines the
source IP address in the tunnel header and filters out the frame on all local interfaces
connected to ESs that are shared with the ingress NVE.
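The source-IP based filtering described above can be sketched as follows; the shared-ES table, the ES names, and the interface names are hypothetical, for illustration only.

```python
# Remote NVE IP -> set of Ethernet Segments it shares with this NVE.
SHARED_ES = {"10.1.1.4": {"ES1"}}

def egress_interfaces(src_vtep_ip, local_ifaces):
    """Source-IP split horizon for VXLAN/NVGRE: when a BUM frame arrives
    from the overlay, skip every local interface whose ES is shared with
    the ingress NVE, so the frame never loops back onto that segment."""
    shared = SHARED_ES.get(src_vtep_ip, set())
    return [name for name, es in local_ifaces.items() if es not in shared]
```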
EVPN ROUTE EXTENDED COMMUNITY– ESI LABEL
• Split Horizon Operation
Ø It is also worth noting that with VXLAN or NVGRE encapsulation, the ingress NVE is "locally biased", meaning that the ingress NVE performs replication locally to all directly attached Ethernet Segments, regardless of the DF election state, for all flooded traffic ingressing from the access interfaces.
EVPN MASS WITHDRAW – FAST CONVERGENCE
(Figure: leaves L1–L6, all-active mode, LAG-attached CE)
Ø The PE withdraws the set of Ethernet A-D per ES routes. This triggers all PEs that receive the withdrawal to update their next-hop adjacencies for all MAC addresses associated with the Ethernet Segment in question.
Ø The PE then withdraws all MAC addresses associated with the Ethernet Segment (ES).
EVPN MAC ALIASING
(Figure: leaves L1–L4; the MAC is learned on one PE but not on the other)
Ø Aliasing improves load balancing by allowing remote NVEs to continue to load-balance traffic evenly even though they have only received a single MAC/IP route from a single ingress NVE.
Ø Aliasing is defined as the ability of a PE to signal that it has reachability to an EVPN instance on a given ES even when it has learned no MAC addresses from that EVI/ES.
Ø Aliasing uses the Ethernet A-D per EVI (Type 1) routes.
Ø A remote PE that receives a MAC/IP Advertisement route with a non-reserved ESI considers the advertised MAC address to be reachable via all PEs that have advertised reachability to that MAC address's EVI/ES via the Ethernet A-D per EVI route.
DISTRIBUTED ANYCAST GATEWAY
(Figure: spines S1–S2, leaves L1–L3, Server1 and Server2, with each leaf hosting the same .1/24 gateway)
Ø The gateway is closer to the end-hosts, reducing the failure domain.
Ø Eliminates traffic hair-pinning and unnecessary traffic backhauling to a centralized gateway.
Ø Uses an Anycast Gateway MAC (AGM) address to prevent traffic black-holing resulting from MAC mobility.
INTEGRATED ROUTING AND BRIDGING (IRB)
• Two different operations are specified for IRB in VXLAN BGP EVPN deployments, depending on the number of operations carried out on the ingress and egress NVE:
Ø Asymmetric IRB
Ø Symmetric IRB
• Asymmetric IRB performs two operations on the ingress and one operation on the egress device, hence the name. It follows the bridge-route-bridge approach: bridging and routing operations are performed on the ingress NVE, followed by bridging to the respective destination through the Layer-2 VNI (L2VNI) on the egress NVE. This means that the device hosting the first-hop gateway function is required to have all possible destination MAC/IP binding information, resulting in scaling concerns.
• Symmetric IRB, on the other hand, uses a bridge-route-route-bridge approach, meaning both the ingress and egress devices perform the same number of operations (route-bridge). Routed traffic from ingress to egress is forwarded via a transit segment, defined on a per-VRF basis and termed the Layer-3 VNI or L3VNI. This means that only the MAC/IP bindings associated with locally attached end-points are required on the device hosting the first-hop gateway function, making this a more scalable approach.
VXLAN BGP EVPN FABRIC RECOMMENDATION
(Figure: Spine 1–2 in AS101; Leaf 1–4 in AS201–AS204)
Ø Simple design, suitable for most enterprises, unless traffic engineering (TE) is required intra- and inter-DC, in which case Segment Routing can be considered.
Ø Underlay eBGP bound to /31 physical interfaces.
Ø Export loopback prefixes for the overlay EVPN session.
Ø iBGP bound to the loopback interface in the overlay.
Ø BGP ASN per switch pair.
Ø No IGP required, so there is a single protocol to manage, unless TE / Segment Routing is required and used.
Ø /31 interface addresses can be re-used across multiple data centers, meaning a new DC can be turned up very quickly.
Ø Ethernet OAM – Link Fault Management (LFM).
VXLAN BGP EVPN FABRIC RECOMMENDATION
(Figure: AS 65000 spine pair; AS 65100–65103 leaf pairs)
• AS per spine pair and per leaf pair
• /31 per link, eBGP
• Multipath for ECMP
• Export loopbacks
• Easy configuration templating
• Ethernet OAM – LFM
VXLAN BGP EVPN – TEST TOPOLOGY
VXLAN BGP EVPN UNDERLAY – CONFIG ARISTA

Spine 5 / Spine 6
service routing protocols model multi-agent
router bgp 65000
   router-id 4.4.4.4
   distance bgp 20 200 200
   maximum-paths 8 ecmp 16
   neighbor ALL-UNDERLAY peer-group
   neighbor ALL-UNDERLAY fall-over bfd
   neighbor ALL-UNDERLAY description "ALL UNDERLAY NEIGHBOURS"
   neighbor ALL-UNDERLAY allowas-in 2
   neighbor ALL-UNDERLAY maximum-routes 12000
   neighbor 10.10.10.11 peer-group ALL-UNDERLAY
   neighbor 10.10.10.11 remote-as 65001
   neighbor 10.10.10.13 peer-group ALL-UNDERLAY
   neighbor 10.10.10.13 remote-as 65001
   neighbor 10.10.10.15 peer-group ALL-UNDERLAY
   neighbor 10.10.10.15 remote-as 65002
   neighbor 10.10.10.17 peer-group ALL-UNDERLAY
   neighbor 10.10.10.17 remote-as 65002
   neighbor 10.10.10.19 peer-group ALL-UNDERLAY
   neighbor 10.10.10.19 remote-as 60000
   redistribute connected route-map ADV-LOOPBACK
VXLAN BGP EVPN UNDERLAY – CONFIG ARISTA

Leaf 9
service routing protocols model multi-agent
router bgp 65001
   router-id 1.1.1.1
   distance bgp 20 200 200
   maximum-paths 8 ecmp 16
   neighbor MLAG-IBGP peer-group
   neighbor MLAG-IBGP remote-as 65001
   neighbor MLAG-IBGP next-hop-self
   neighbor MLAG-IBGP weight 0
   neighbor MLAG-IBGP description "MLAG PEER UNDERLAY"
   neighbor MLAG-IBGP send-community
   neighbor MLAG-IBGP maximum-routes 12000
   neighbor SPINE-PEERS-UNDERLAY peer-group
   neighbor SPINE-PEERS-UNDERLAY remote-as 65000
   neighbor SPINE-PEERS-UNDERLAY weight 100
   neighbor SPINE-PEERS-UNDERLAY description "SPINE NEIGHBOURS UNDERLAY"
   neighbor SPINE-PEERS-UNDERLAY allowas-in 2
   neighbor SPINE-PEERS-UNDERLAY maximum-routes 12000
   neighbor 10.0.0.2 peer-group MLAG-IBGP
   neighbor 10.10.10.0 peer-group SPINE-PEERS-UNDERLAY
   neighbor 10.10.10.10 peer-group SPINE-PEERS-UNDERLAY
   redistribute connected route-map ADV-LOOPBACK

Leaf 10
service routing protocols model multi-agent
router bgp 65001
   router-id 3.3.3.3
   distance bgp 20 200 200
   maximum-paths 8 ecmp 16
   neighbor MLAG-IBGP peer-group
   neighbor MLAG-IBGP remote-as 65001
   neighbor MLAG-IBGP next-hop-self
   neighbor MLAG-IBGP weight 0
   neighbor MLAG-IBGP description "MLAG PEER UNDERLAY"
   neighbor MLAG-IBGP send-community
   neighbor MLAG-IBGP maximum-routes 12000
   neighbor SPINE-PEERS-UNDERLAY peer-group
   neighbor SPINE-PEERS-UNDERLAY remote-as 65000
   neighbor SPINE-PEERS-UNDERLAY weight 100
   neighbor SPINE-PEERS-UNDERLAY description "SPINE NEIGHBOURS UNDERLAY"
   neighbor SPINE-PEERS-UNDERLAY allowas-in 2
   neighbor SPINE-PEERS-UNDERLAY maximum-routes 12000
   neighbor 10.0.0.1 peer-group MLAG-IBGP
   neighbor 10.10.10.2 peer-group SPINE-PEERS-UNDERLAY
   neighbor 10.10.10.12 peer-group SPINE-PEERS-UNDERLAY
   redistribute connected route-map ADV-LOOPBACK
VXLAN BGP EVPN UNDERLAY – OUTPUTS
(Screenshots: Leaf 9 and Spine 5 underlay BGP session and route outputs)
VXLAN BGP EVPN OVERLAY – CONFIG ARISTA

Spine 5
router bgp 65000
   router-id 2.2.2.2
   distance bgp 20 200 200
   maximum-paths 8 ecmp 16
   neighbor CORE-RR-OVERLAY peer-group
   neighbor CORE-RR-OVERLAY remote-as 60000
   neighbor CORE-RR-OVERLAY local-as 60000 no-prepend replace-as
   neighbor CORE-RR-OVERLAY update-source Loopback0
   neighbor CORE-RR-OVERLAY fall-over bfd
   neighbor CORE-RR-OVERLAY description "CORE-RR OVERLAY"
   neighbor CORE-RR-OVERLAY allowas-in 3
   neighbor CORE-RR-OVERLAY send-community extended
   neighbor CORE-RR-OVERLAY maximum-routes 12000
   neighbor 15.15.15.1 peer-group CORE-RR-OVERLAY
   neighbor 15.15.15.2 peer-group CORE-RR-OVERLAY
!
address-family evpn
   neighbor CORE-RR-OVERLAY activate
!
address-family ipv4
   no neighbor CORE-RR-OVERLAY activate
!

Spine 6 (identical to Spine 5 except router-id 4.4.4.4)
VXLAN BGP EVPN OVERLAY – CONFIG ARISTA

Leaf 9
router bgp 65001
   router-id 1.1.1.1
   distance bgp 20 200 200
   maximum-paths 8 ecmp 16
   neighbor CORE-RR-OVERLAY peer-group
   neighbor CORE-RR-OVERLAY remote-as 60000
   neighbor CORE-RR-OVERLAY local-as 60000 no-prepend replace-as
   neighbor CORE-RR-OVERLAY update-source Loopback0
   neighbor CORE-RR-OVERLAY fall-over bfd
   neighbor CORE-RR-OVERLAY description "CORE-RR OVERLAY"
   neighbor CORE-RR-OVERLAY allowas-in 3
   neighbor CORE-RR-OVERLAY send-community extended
   neighbor CORE-RR-OVERLAY maximum-routes 12000
   neighbor 15.15.15.1 peer-group CORE-RR-OVERLAY
   neighbor 15.15.15.2 peer-group CORE-RR-OVERLAY
!
address-family evpn
   neighbor CORE-RR-OVERLAY activate
!
address-family ipv4
   no neighbor CORE-RR-OVERLAY activate

Leaf 10 (identical to Leaf 9 except router-id 3.3.3.3)
VXLAN BGP EVPN OVERLAY – OUTPUTS
(Screenshots: Leaf 9, Spine 5 and Core-RR overlay EVPN session outputs)
VXLAN BGP EVPN OVERLAY SERVICE – PURE L2

Leaf 9
!
vlan 400
   name OVERLAY-L2
!
interface Vxlan1
   vxlan source-interface Loopback1
   vxlan udp-port 4789
   vxlan vlan 400 vni 10400
!
router bgp 65001
   <..>
   vlan 400
      rd 1.1.1.1:10400
      route-target both 10400:10400
      redistribute learned
      redistribute static
!

Leaf 13
!
vlan 400
   name OVERLAY-L2
!
interface Vxlan1
   vxlan source-interface Loopback1
   vxlan udp-port 4789
   vxlan vlan 400 vni 10400
!
router bgp 64001
   <..>
   vlan 400
      rd 100.1.1.1:10400
      route-target both 10400:10400
      redistribute learned
      redistribute static
!
VXLAN BGP EVPN OVERLAY SERVICE – CONFIG ARISTA

Client27
!
vlan 400
   name OVERLAY-L2
!
interface Port-Channel1
   description "to LF9-LF10"
   switchport trunk allowed vlan 29,400,729
   switchport mode trunk
!
interface Vlan400
   vrf forwarding OVERLAY-L2-CLIENT-VRF-LITE
   ip address 40.40.40.27/24
!

Client29
!
vlan 400
   name OVERLAY-L2
!
interface Port-Channel1
   description "to LF13-LF14"
   switchport trunk allowed vlan 29,400,729
   switchport mode trunk
!
interface Vlan400
   vrf forwarding OVERLAY-L2-CLIENT-VRF-LITE
   ip address 40.40.40.29/24
!
VXLAN BGP EVPN OVERLAY SERVICE – OUTPUTS
(Screenshots: Client29 and Leaf 9 service verification outputs)
SAMPLE MIGRATION OF LEGACY PLATFORM TO VXLAN EVPN
Ø Use case: Juniper QFabric
L2 CONNECTIVITY INTRA-DC – MIGRATION
Ø A dedicated border leaf (BLF) connects to the QFabric NNG device via a Layer 2 trunk interface.
Ø Each customer's L2 domain is stretched to the BLF and terminates in a per-customer MAC-VRF.
Ø Connectivity between the BLF and leaves LF1…N is via VXLAN EVPN.
(Figure: DCI1/DCI2, BLF, LF1…N, SP1/SP2, QFabric with backbone trunk, servers and other L2/L3 devices)
L3 CONNECTIVITY INTRA/INTER-DC – MIGRATION
Ø A separate BGP EVPN session is needed between the DCI and the border leaf to learn remote EVPN routes.
Ø A per-customer SVI / IP-VRF eBGP session runs between the QFabric and the BLF. To allow for smooth decommissioning of the QFabric, a per-tenant VRF eBGP session between the DCI VRF and the BLF is also needed.
Ø Both the DCI and the QFabric advertise a single default route into the per-customer IP VRF on the BLF. BGP attributes can then be manipulated to control the exit traffic path from the BLF.
Ø Specific routes are advertised from the per-customer IP VRF on the BLF back to the QFabric and DCI. BGP attributes can be manipulated to determine the traffic flow.
Ø Connectivity between the BLF, leaves LF1…N and the new remote inter-DC sites is via VXLAN EVPN.
(Figure: DCI1/DCI2, BLF, LF1…N, SP1/SP2, QFabric; the L3 link runs 802.1Q from the DCI VRF / eBGP/PIM)
TOPOLOGY POST DC MIGRATION
(Figure: DCI1/DCI2, BLF, LF1…N, SP1/SP2, backbone, servers and other L2/L3 devices; the L3 link runs 802.1Q from the DCI VRF / eBGP/PIM)
REFERENCES
Ø Cisco Press – Building Data Centers with VXLAN BGP EVPN, by Lukas Krattiger, Shyam Kapadia, David Jansen.
Ø O'Reilly – Juniper QFX10000 Series: A Comprehensive Guide to Building Next-Generation Data Centers, by Douglas Richard Hanks, Jr.
Ø http://eve-ng.net/
Ø https://tools.ietf.org/html/rfc7348
Ø https://tools.ietf.org/html/rfc7432
Ø https://datatracker.ietf.org/doc/draft-ietf-bess-evpn-df-election-framework/?include_text=1
Ø https://www.arista.com/en
Ø https://www.microsoft.com/en-us/research/wp-content/uploads/2017/02/HRW98.pdf
Ø https://www.juniper.net/documentation/en_US/junos/topics/concept/qfabric-overview.html
?