© 2006 Cisco Systems, Inc. All rights reserved. Cisco Confidential
Advanced BGP Convergence Techniques
Pradosh Mohapatra
Apricot 2006
Agenda
• Terminology
• Convergence Scenarios
– Core Link Failure
– Edge Node Failure
– Edge Link Failure
Basic Terminology
• Prefix – A route that is learnt by routing protocols.
–12.0.0.0/16
• Pathlist – A list of Next Hop paths learnt by routing protocols.
–12.0.0.0/16 (non-recursive)
Via POS1/0
Via GE2/0, 5.5.5.5
–10.0.0.0/16 (recursive: depends on the resolution of the next-hop)
Via 5.5.5.5
Forwarding Table Structure
BGP PL: path 1, path 2
IGP PL: path 1, path 2 (Intf1/NH1, Intf2/NH2)
IGP PL: path 1, path 2 (Intf3/NH3, Intf4/NH4)
Salient Features
• Pathlist Sharing:
All BGP prefixes that have the same set of paths point to a single pathlist.
• Hierarchical Structure:
BGP prefixes (recursive) point to IGP prefixes (non-recursive).
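The two features above can be sketched in a few lines of Python. This is only an illustrative model, not the actual forwarding-table implementation: the names `PathList`, `Fib`, and `install`, and the prefixes 6.6.6.6/32 and 11.0.0.0/16, are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class PathList:
    """A list of paths, shared by every prefix that uses the same paths."""
    paths: list = field(default_factory=list)


class Fib:
    """Toy forwarding table with pathlist sharing (illustrative only)."""

    def __init__(self):
        self.prefixes = {}   # prefix -> PathList
        self._shared = {}    # tuple of paths -> PathList

    def install(self, prefix, paths):
        key = tuple(paths)
        # Reuse an existing pathlist if one already holds the same paths.
        pathlist = self._shared.setdefault(key, PathList(list(paths)))
        self.prefixes[prefix] = pathlist
        return pathlist


fib = Fib()
# IGP prefixes (non-recursive): paths are interface/next-hop pairs.
igp_a = fib.install("5.5.5.5/32", ["Intf1/NH1", "Intf2/NH2"])
igp_b = fib.install("6.6.6.6/32", ["Intf3/NH3", "Intf4/NH4"])
# BGP prefixes (recursive): paths are next-hops resolved via IGP prefixes.
bgp_1 = fib.install("10.0.0.0/16", ["5.5.5.5", "6.6.6.6"])
bgp_2 = fib.install("11.0.0.0/16", ["5.5.5.5", "6.6.6.6"])
shared = bgp_1 is bgp_2   # both BGP prefixes point to one pathlist object
```

Because both BGP prefixes reference the same object, a change to that one pathlist is seen by every prefix that shares it.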
Core Link Failure
Multipath BGP, Multipath IGP, IGP path goes down
BGP PL: path 1, path 2
IGP PL: path 1, path 2
IGP PL: path 1, path 2
• Initial organization before failure of IGP path 1.
• Link to Path 1 goes down.
Multipath BGP, Multipath IGP, IGP path goes down
BGP PL: path 1, path 2
IGP PL: path 2
IGP PL: path 1, path 2
• IGP pathlist modified after Path 1 failure.
• BGP Convergence = IGP Convergence.
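A minimal sketch of why BGP convergence equals IGP convergence in this case, assuming a toy model in which pathlists are plain Python lists. The next-hop names `NH-A` and `NH-B`, the 1000 generated prefixes, and the helper functions are hypothetical.

```python
# IGP pathlists, keyed by the BGP next-hop they resolve (names hypothetical).
igp_pathlists = {
    "NH-A": ["path 1", "path 2"],
    "NH-B": ["path 1", "path 2"],
}
# One BGP pathlist shared by many BGP prefixes; each path is a next-hop.
bgp_pathlist = ["NH-A", "NH-B"]
bgp_prefixes = {f"10.{i}.0.0/16": bgp_pathlist for i in range(1000)}


def igp_path_down(next_hop, path):
    """Remove a failed path in place; return the number of objects edited."""
    igp_pathlists[next_hop].remove(path)
    return 1  # one pathlist edited, however many BGP prefixes depend on it


def resolve(prefix):
    """Expand a BGP prefix into its (next-hop, IGP path) forwarding choices."""
    return [(nh, p) for nh in bgp_prefixes[prefix] for p in igp_pathlists[nh]]


touched = igp_path_down("NH-A", "path 1")
after = resolve("10.7.0.0/16")
```

Only the IGP pathlist is edited; none of the BGP prefixes or the BGP pathlist is touched, so the repair cost does not depend on the number of BGP prefixes.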
Multipath BGP, Multipath IGP, IGP prefix is deleted
BGP PL: path 1, path 2
IGP PL: path 1, path 2
IGP PL: path 1, path 2
• Initial organization before deletion of IGP prefix 1.
• IGP Prefix 1 gets deleted.
• Fix up the BGP PL to point to the second path.
Multipath BGP, Multipath IGP, IGP prefix is deleted
BGP PL: path 1
IGP PL: path 1, path 2
• BGP pathlist modified after deletion of IGP prefix 1.
• BGP Convergence = IGP Convergence.
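The fix-up can be sketched as follows, again as an assumption-laden toy model rather than the real data structures. The names `NH-A`, `NH-B`, and `igp_prefix_deleted` are hypothetical.

```python
igp_prefixes = {
    "NH-A": ["path 1", "path 2"],
    "NH-B": ["path 1", "path 2"],
}
bgp_pathlist = ["NH-A", "NH-B"]   # one BGP pathlist shared by all prefixes
bgp_prefixes = {f"10.{i}.0.0/16": bgp_pathlist for i in range(1000)}


def igp_prefix_deleted(next_hop):
    """Delete an IGP prefix and fix up the shared BGP pathlist in place."""
    del igp_prefixes[next_hop]
    # Drop every BGP path that resolved through the deleted prefix.
    bgp_pathlist[:] = [nh for nh in bgp_pathlist if nh in igp_prefixes]


igp_prefix_deleted("NH-A")
remaining = bgp_prefixes["10.42.0.0/16"]
```

The slice assignment edits the shared list object itself, so all 1000 prefixes see the surviving path after a single operation.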
Multipath BGP, Multipath IGP, IGP path modified
BGP PL: path 1, path 2
IGP PL: path 1, path 2
IGP PL: path 1, path 2
• Initial organization before modification of IGP Path 1.
• IGP Path 1 gets modified.
• BGP Convergence = IGP Convergence
Conclusion
• In case of core link failure:
Sub-second convergence.
BGP Prefix-independent & In-place modification of the forwarding table.
Make-before-break solution
Edge Node Failure
Edge node failure
• PE1 has selected the path via PE2 as bestpath and has installed only that path in the forwarding table.
• What PE1 needs upon PE2’s failure is fast detection of unreachability.
• Declaring PE2 unreachable requires that all of PE2’s IGP neighbors have detected the failure and have sent their LSPs to PE1.
• PE1 then needs to point to PE3.
[Figure: topology with PE1, P1, P2, PE2, and PE3.]
BGP Next-Hop Tracking
• Event-driven reaction to BGP next-hop changes
– BGP communicates its next-hops to RIB.
– If RIB gets a modify/delete/add of an entry covering these next-hops, it notifies BGP.
– BGP runs bestpath algorithm.
• Stability requirement
– Fast reaction to isolated events
– Delayed reaction to too frequent events
• Classification of events
– Next-hop unreachable is critical: react faster.
– Metric change is non-critical: react slower.
BGP NHT – Implementation highlights
• RIB implements dampening algorithm
– Next-hops flapping too often are dampened.
• RIB classifies next-hop changes as critical or non-critical.
– Critical events are sent immediately to BGP. Non-critical events are delayed up to 3 seconds.
• BGP has an initial delay before it reacts to next-hop changes.
– Default: 5s. Configurable.
– Capture as many changes as possible within the initial delay before running bestpath.
router bgp 1
 bgp nexthop-trigger-delay 1
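The interplay of event classification and the initial delay can be sketched as below. This is a simplified model under stated assumptions: non-critical events are held for the full 3 seconds (the worst case), the delay timer starts at the first notification, and all notifications arriving before it expires are handled in one bestpath run. The function names and event labels are hypothetical.

```python
CRITICAL = {"unreachable"}     # next-hop lost: react faster
NON_CRITICAL_DELAY_MAX = 3.0   # seconds the RIB may hold a non-critical event


def rib_notify_time(event_time, kind):
    """Latest time at which the RIB hands the event to BGP."""
    if kind in CRITICAL:
        return event_time
    return event_time + NON_CRITICAL_DELAY_MAX


def bestpath_run_time(events, trigger_delay=5.0):
    """Return (time of the first bestpath run, notifications it covers).

    The delay timer starts at the first notification; every notification
    delivered before the timer expires is handled in the same run.
    """
    notified = sorted(rib_notify_time(t, kind) for t, kind in events)
    run_at = notified[0] + trigger_delay
    batched = [t for t in notified if t <= run_at]
    return run_at, len(batched)


# (time in seconds, kind) for a hypothetical burst of next-hop events.
events = [(0.0, "unreachable"), (0.2, "metric"), (0.4, "unreachable")]
default_run = bestpath_run_time(events)                     # 5 s default
tuned_run = bestpath_run_time(events, trigger_delay=1.0)    # 1 s delay
```

With the 5 s default the single run covers all three notifications; with a 1 s delay BGP reacts four seconds sooner but the held metric change falls into a later run. That is the trade-off between reaction time and batching.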
BGP NHT - example
[Figure: timeline of events T1 to T4: link down, IGP convergence, first RIB next-hop notification, NHScan + bestpath.]
• T1: Link failure triggering IGP convergence.
• T2: First next-hop notification to BGP.
• T3: BGP reads the next-hop updates and starts initial delay timer.
• T4: Initial delay period expires. BGP runs NHScan and the bestpath change (duration is a function of the table size).
BGP NHT
• Principle: The first SPF must declare PE2 as unreachable
We want to make sure that if PE2 fails, all its neighbors have had the time to detect the failure, originate their LSPs, and flood them to PE1
We want to make sure that when PE1 starts its SPF, the LSPs of all of PE2’s neighbors are in PE1’s database
• Dependency
fast failure detection
fast flooding
SPF initial-wait that is conservative enough
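The dependency on the SPF initial-wait can be illustrated with a small check, assuming SPF is scheduled a fixed initial-wait after the first LSP arrives. The function name and the arrival times are hypothetical.

```python
def first_spf_sees_all(lsp_arrivals_ms, spf_initial_wait_ms):
    """True if every neighbor LSP is in the database when the first SPF starts."""
    spf_start = min(lsp_arrivals_ms) + spf_initial_wait_ms
    return all(t <= spf_start for t in lsp_arrivals_ms)


# Hypothetical arrival times (ms after the failure) of the LSPs
# originated by PE2's neighbors.
arrivals = [50, 80, 140]
conservative = first_spf_sees_all(arrivals, spf_initial_wait_ms=150)
too_eager = first_spf_sees_all(arrivals, spf_initial_wait_ms=50)
```

With a 150 ms wait the first SPF starts at 200 ms and sees all three LSPs. With a 50 ms wait it starts at 100 ms, misses the last LSP, and cannot yet declare PE2 unreachable.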
BGP NHT – Typical Timing
• 0: PE2 failure
• 50ms: PE1 receives the 1st LSP and schedules SPF at T=200ms
– the other LSPs have ample time to arrive in the meantime
• 200ms: PE1 starts SPF
– we assume a duration of 30ms; with iSPF it would be ~1ms
• 232ms: PE1 deletes PE2’s loopback and schedules BGP NHT at T=1232ms
– there are few prefixes to modify as this is a node failure
• 1232ms: PE1 runs BGP NHT
– table scan: ~6us per entry; if PE1 has 20k routes: ~120ms
– RIB modify: ~140us per entry; if PE1 has 5k routes from PE2: ~700ms
– distribution download: 70ms
• 2122ms: PE1/LC has finished modifying the BGP entries to use nh=PE3. They still need to be resolved.
– resolution starts within [0, 1000ms]
– resolution lasts ~100us per entry
• 3622ms: Convergence is finished in the worst case
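The arithmetic behind these milestones can be checked with the short calculation below. It uses integer microseconds to avoid rounding. One assumption is labeled in the code: resolution is applied to the same 5k routes learned from PE2, which is what makes the 3622 ms total come out.

```python
MS = 1_000  # microseconds per millisecond

rib_delete = 232 * MS                 # PE2's loopback deleted
nht_start = rib_delete + 1_000 * MS   # BGP NHT scheduled 1 s later

table_scan = 6 * 20_000               # ~6 us/entry over 20k routes
rib_modify = 140 * 5_000              # ~140 us/entry over 5k routes from PE2
download = 70 * MS                    # distribution download
bgp_done = nht_start + table_scan + rib_modify + download

resolve_wait_max = 1_000 * MS         # worst case of the [0, 1000 ms] window
resolve_work = 100 * 5_000            # ~100 us/entry (assumed: the same 5k routes)
worst_case = bgp_done + resolve_wait_max + resolve_work

print(nht_start // MS, bgp_done // MS, worst_case // MS)  # 1232 2122 3622
```

The largest single contributors are the 1 s next-hop trigger delay, the up-to-1 s wait before resolution, and the per-entry RIB modification, which scales with the number of routes learned from the failed node.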
Conclusion – Edge node failure
• Sub-5s is achievable
The analyzed scenario leads to a worst case (WC) of ~3600ms
• Sub-Second is challenging
• Ongoing work to improve this further:
Backup path
Backup Path
BGP PL: path 1, backup path
IGP PL: path 1, path 2 (Intf3/NH3, Intf4/NH4)
IGP PL: path 1, path 2 (Intf1/NH1, Intf2/NH2)
• No multipath. The prefix always points to Path 1.
• Reroute is triggered per IGP prefix: fix up Path 1 to point to the backup path.
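A rough sketch of the backup-path idea, assuming the backup is precomputed and stored next to the active path in a shared pathlist. The class `BgpPathList`, its method, and the 5000 generated prefixes are hypothetical.

```python
class BgpPathList:
    """BGP pathlist with one active path and a precomputed backup."""

    def __init__(self, primary, backup):
        self.active = primary
        self.backup = backup

    def next_hop_lost(self, next_hop):
        """Fix up the pathlist if the lost next-hop is the active one."""
        if self.active == next_hop and self.backup is not None:
            self.active, self.backup = self.backup, None
            return True
        return False


# All prefixes learned via PE2 share one pathlist object.
pathlist = BgpPathList(primary="PE2", backup="PE3")
prefixes = {f"10.{i}.0.0/16": pathlist for i in range(5000)}

switched = pathlist.next_hop_lost("PE2")
active_now = prefixes["10.99.0.0/16"].active
```

The switch is a single in-place update triggered by the loss of the IGP prefix, so it avoids the per-entry scan and modification that dominate the timing analyzed earlier.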
Backup Path – Contd.
• Problem:
How to know the backup path? BGP advertises only one path.
Peering with RRs: RR sends only the bestpath it computes.
• Solution:
Add-path draft.
ADD-PATH
• Mechanism that allows the advertisement of multiple paths for the same prefix without the new paths implicitly replacing any previous ones.
• Add a path identifier to the encoding to distinguish between different paths for the same prefix.
+-----------------------------+
| Path Identifier (4 octets)  |
+-----------------------------+
| Length (1 octet)            |
+-----------------------------+
| Label (3 octets)            |
+-----------------------------+
...............................
+-----------------------------+
| Prefix (variable)           |
+-----------------------------+

+-----------------------------+
| Path Identifier (4 octets)  |
+-----------------------------+
| Length (1 octet)            |
+-----------------------------+
| Prefix (variable)           |
+-----------------------------+
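The unlabeled encoding can be exercised with a short Python sketch. It assumes IPv4, a length field counted in bits, and a prefix field carrying only the significant octets, as in ordinary BGP NLRI. The function names are hypothetical.

```python
import ipaddress
import struct


def encode_nlri(path_id, prefix):
    """Pack Path Identifier (4 octets), Length (1 octet), Prefix (variable)."""
    net = ipaddress.ip_network(prefix)
    nbytes = (net.prefixlen + 7) // 8   # only the significant octets
    head = struct.pack("!IB", path_id, net.prefixlen)
    return head + net.network_address.packed[:nbytes]


def decode_nlri(data):
    """Inverse of encode_nlri for a single IPv4 entry."""
    path_id, length = struct.unpack_from("!IB", data)
    nbytes = (length + 7) // 8
    raw = data[5:5 + nbytes].ljust(4, b"\x00")   # pad back to 4 octets
    net = ipaddress.IPv4Network((int.from_bytes(raw, "big"), length))
    return path_id, str(net)


# Two paths for the same prefix, told apart only by the path identifier.
wire_1 = encode_nlri(1, "12.0.0.0/16")
wire_2 = encode_nlri(2, "12.0.0.0/16")
decoded = [decode_nlri(wire_1), decode_nlri(wire_2)]
```

Both entries carry the same prefix, so without the path identifier the second advertisement would implicitly replace the first.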
ADD-PATH - Operation
• New capability: Add-path
• Advertisement of the capability indicates ability to receive multiple paths for all negotiated AFI/SAFI.
• Advertisement of specific AFI/SAFI information in the capability indicates the intent to send multiple paths.
• Only in these cases must the new encoding be used.
• Concern: does the cost of advertising multiple paths outweigh the convergence benefits?
Edge Link Failure
Example: PE-CE Link Failure
[Figure: topology with CE1, CE2, CE3 (VPN1 site and VPN1 HQ), PE1, PE2, PE3, and route reflectors RRA1, RRA2, RRB1, RRB2.]
Edge Link Failure scenarios
• Edge Link Failure: Next-hop on the peering link
Convergence behavior is the same as in the previous two scenarios.
• Edge Link Failure: Next-hop-self
Default behavior for L3VPN
In-place modification and/or BGP NHT do not help.
Advanced BGP signaling required.
Any Questions?