The Impact of Internet Policy and Topology on Delayed Routing Convergence
Good Old Days
The Internet always was, and still is, BAD in reliability, availability & QoS.
For historical reasons QoS simply was not there initially: the “Best Effort” principle.
E-mail & web surfing did not demand high standards.
Money Time
It CAN’T be tolerated any more.
The Internet is becoming a major factor in the economy:
e-commerce, VoIP, real-time video, etc.
QoS Battle
Efforts to bring QoS to the Internet are enormous, BUT:
A stable underlying infrastructure MUST exist for any application-level solution!
Bad News
The existing Internet backbone DOES NOT provide rapid restoration and rerouting.
There is NO effective inter-domain path fail-over!
Fail-over for a single failure takes milliseconds in the PSTN, minutes in the Internet.
It hurts!
The impact on performance is huge. While a path is being restored:
30 times more packet loss
4 times the end-to-end latency
Some fail-overs take 15 minutes; the average is 3 minutes.
What’s the Problem?
Slow convergence during fail-over.
Routing tables oscillate for a long period after a failure, seeking a consistent view of the network.
WHO is to blame? BGP, the currently used inter-domain routing protocol.
Nasty things about BGP
AS-path-based BGP solves the count-to-infinity problem of RIP, but exacerbates the number of routing table oscillations.
With unbounded delay, BGP may explore ALL possible paths after a single failure: O(N!).
And More…
Even assuming bounded delay, BGP convergence for a full-mesh topology without filters is O(N * T DELAY).
N is the number of ASes, and there are 70000 of them in the Internet.
T DELAY is about 30 sec. (the recommendation is 30 sec +/- a short random jitter).
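To get a feel for this bound, the figures on the slide can simply be multiplied together. The sketch below assumes the constant hidden by the O() notation is 1, which is purely an illustrative assumption; `full_mesh_bound_seconds` is a hypothetical helper name.

```python
def full_mesh_bound_seconds(n_as: int, t_delay_s: float) -> float:
    """Estimate N * T_DELAY, taking the hidden constant as 1 (an assumption)."""
    return n_as * t_delay_s


# Figures quoted on the slide: 70000 ASes, about 30 seconds per round.
bound = full_mesh_bound_seconds(70_000, 30.0)
print(f"{bound:.0f} s = {bound / 86_400:.1f} days")
```

This is a worst case for an unfiltered full mesh, not a prediction for the real Internet.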
And Even More…
It is possible for autonomous systems to define “unsafe” policies causing persistent route oscillation.
Any-Way?
BGP4, as used in Internet routers, has bounded delay, provided by the MinRouteAdver timer, which delays the distribution of too-rapid updates.
So O(N!) performance is irrelevant in the real Internet.
Who cares?
BGP divergence has never been observed in practice and remains a theoretical problem.
There are modifications to BGP policies that guarantee convergence.
What a Mesh?!
The Internet topology is a long way from being a complete mesh.
BGP update filtering is done by almost every BGP node.
Now What?
Experimental results indicate fail-over problems caused by poor BGP performance.
To study and resolve those problems, much more realistic models of Internet BGP processes should be developed.
Drug “Providers?”
The Internet retains a hierarchy with several tiers of ISPs.
This hierarchy is specified by commercial relationships.
Smaller ISPs are customers of bigger ones.
Talk to me…
Transit – an upstream provider provides transit service to the customer.
Default-free routing tables are passed downstream.
Customer & backbone routes are passed upstream.
Peer – a symmetric connection providing access to each other's customers.
Never used for transit to other ISPs.
Only customer & backbone routes are exchanged.
Backup transit – normally acts like a Peer; provides transit after fault detection.
It is strictly business
The filtering mechanisms of AS border routers are used to enforce those commercial relationships: if you don't want the other side to use some route, you should not announce it. So:
Send customer & backbone routes to all peers.
Provide other routing information (learned from peers & upstream) only to customers.
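The announcement rule above can be sketched as a small decision function. This is a minimal illustration, not a router configuration: the enum and function names are hypothetical, and "backbone" is taken to mean the ISP's own routes.

```python
from enum import Enum


class Source(Enum):
    """Where a route was learned (names are illustrative)."""
    BACKBONE = "backbone"   # assumed to mean the ISP's own routes
    CUSTOMER = "customer"
    PEER = "peer"
    UPSTREAM = "upstream"


class Neighbor(Enum):
    """Commercial relationship with the BGP neighbor."""
    CUSTOMER = "customer"
    PEER = "peer"
    UPSTREAM = "upstream"


def should_announce(source: Source, neighbor: Neighbor) -> bool:
    """Return True if a route from `source` may be announced to `neighbor`."""
    # Customer and backbone routes are announced to everyone.
    if source in (Source.BACKBONE, Source.CUSTOMER):
        return True
    # Routes learned from peers or upstream go to customers only.
    return neighbor is Neighbor.CUSTOMER


# Print, for each route source, the neighbor types it may be announced to.
for src in Source:
    allowed = [n.value for n in Neighbor if should_announce(src, n)]
    print(f"{src.value:8} -> {allowed}")
```

Not announcing a peer-learned route to another peer is what keeps a peering link from turning into free transit.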
No Free Lunches
Transit relations – Inbound filters
Prefix filters limit customer announcements to the “legitimate” address space of the customer.
Used by 100% of ISPs.
The upstream provider is willing to transit routes for its customers only.
Friend to friend
Peer relations – Outbound filters
Community filters are based on tagging routes to distinguish customer routes. Only updates for routes tagged as customer routes pass the filter.
Used by 73% of ISPs.
Don’t talk too much
Peer relations – Outbound filters (cont.)
Prefix filters may also be used to distinguish customer routes.
Applying a prefix filter only (done by 13% of ISPs) may create an unintentional back-up transit path.
Check it
Peer relations – Outbound filters (cont.)
AS-path regular expressions are used to explicitly permit route advertisements.
A combination of AS-path & prefix filters prevents the creation of unintentional back-up transit paths.
Both AS-path & prefix filters are used by 13% of ISPs.
[Figure: example topology of ASes A to J across Tier 1, Tier 2 and Tier 3. Link types: Peer, Transit, Back-up, Unintentional Back-up. The path “D-C” is marked.]
Example: In the absence of an AS-path check, the path “D-C”, learned after the A-D link failure, will be announced to B by A, providing an unintentional back-up path from C to B through A.
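The example can be sketched as two outbound checks applied by A before announcing a route to its peer B. Everything concrete here is hypothetical: the prefix is a documentation prefix, the AS numbers are documentation AS numbers standing in for C and D, and the AS path is written as A would receive it.

```python
import ipaddress
import re
from typing import Sequence

# Hypothetical values: a documentation prefix and documentation AS numbers.
AS_C, AS_D = 64501, 64500
CUSTOMER_PREFIXES = [ipaddress.ip_network("192.0.2.0/24")]  # address space of customer D
CUSTOMER_PATH = re.compile(rf"{AS_D}( {AS_D})*")            # only D's AS, possibly prepended


def prefix_ok(prefix: str) -> bool:
    """Prefix filter: the prefix must lie inside a customer address block."""
    net = ipaddress.ip_network(prefix)
    return any(net.subnet_of(block) for block in CUSTOMER_PREFIXES)


def as_path_ok(as_path: Sequence[int]) -> bool:
    """AS-path filter: the whole path must match the customer expression."""
    text = " ".join(str(asn) for asn in as_path)
    return CUSTOMER_PATH.fullmatch(text) is not None


def announce_to_peer(prefix: str, as_path: Sequence[int], check_as_path: bool) -> bool:
    """Decide whether the route is announced to the peer."""
    return prefix_ok(prefix) and (as_path_ok(as_path) or not check_as_path)


direct = ("192.0.2.0/24", [AS_D])        # learned from D directly
via_c = ("192.0.2.0/24", [AS_C, AS_D])   # learned through C after the A-D link fails

print(announce_to_peer(*via_c, check_as_path=False))  # prefix filter only
print(announce_to_peer(*via_c, check_as_path=True))   # prefix and AS-path filters
```

With the prefix filter alone, the route through C still looks like a customer route because the prefix belongs to D, so it leaks to B. The AS-path check rejects it because the path no longer consists of D's AS only.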
Trust Me…
Peer relations – Inbound filters
Generally ISPs just trust their peers to send only valid information.
Only “bogon” filters, which identify generally illegal (private, unallocated, etc.) addresses, are applied.
80% of ISPs use “bogon” filters.
20% of ISPs use none.
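A “bogon” filter of this kind can be sketched with Python's standard `ipaddress` module. The block list below is only an illustrative subset of private and special-use IPv4 ranges; a real filter would also track unallocated space, which changes over time. The function names are hypothetical.

```python
import ipaddress

# Illustrative subset only: private and other special-use IPv4 blocks.
BOGON_BLOCKS = [
    ipaddress.ip_network(block)
    for block in (
        "10.0.0.0/8",
        "172.16.0.0/12",
        "192.168.0.0/16",
        "127.0.0.0/8",
        "169.254.0.0/16",
    )
]


def is_bogon(prefix: str) -> bool:
    """Return True if the announced prefix lies inside a listed bogon block."""
    net = ipaddress.ip_network(prefix)
    return any(net.subnet_of(block) for block in BOGON_BLOCKS)


def inbound_filter(prefixes):
    """Keep only the announcements that are not bogons."""
    return [p for p in prefixes if not is_bogon(p)]


print(inbound_filter(["10.1.0.0/16", "198.51.100.0/24", "172.20.0.0/16"]))
```

Such a filter only rejects address space that nobody should announce. It says nothing about whether the peer is entitled to announce an otherwise valid prefix.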
Let Us Introduce…
The model of BGP convergence is a directed graph.
A node represents an AS.
The model is given for a fixed destination X.
The shortest path is chosen.
Arc e(u,v) exists iff u informs v about its best route to X (not vice versa).
The graph is not symmetric.
The topology of the graph differs for different destinations X.
Up And Down
Given X, a client connected to the network by a single arc to node A (the AS of X).
Link goes down: TDOWN is the time elapsed until every node knows there is no path to X (new stable state).
Connection re-established: TUP is the time elapsed until all nodes have added a route to X to their tables.
What We Want to Hear
After the connection is established, a node learns its best path to X in a time that depends on its shortest path to X.
Proof by simple induction.
TUP convergence is governed by d, the maximal shortest distance from X to any node:
O(d * T DELAY), where T DELAY is T WAIT + T SEND.
T DELAY may be of the same order as MinRouteAdver, especially if it is implemented on a per-peer (not peer + destination) basis.
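The quantity d can be computed with a breadth-first search over the arcs of the model. The sketch below uses a made-up five-node graph, measures hops from A (the AS of X; counting from X itself adds one hop), and takes T DELAY as 30 seconds with the O() constant set to 1. All of these are assumptions for illustration.

```python
from collections import deque

T_DELAY = 30.0  # seconds, the approximate figure used in this talk

# Hypothetical graph: an arc u -> v means u informs v about its best route to X.
ARCS = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}


def hop_distances(arcs, start):
    """Breadth-first search: shortest hop count from `start` to each reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in arcs.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


dist = hop_distances(ARCS, "A")
d = max(dist.values())
print(f"d = {d}, TUP estimate = {d * T_DELAY:.0f} s")
```

Each hop adds at most one T DELAY, which is the induction step behind the O(d * T DELAY) bound.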
And What We don’t…
After the A-X link goes down, multiple update messages are sent along the arcs.
Nodes will announce back-up paths for which a withdrawal has not been received yet.
Generally, updates propagate more slowly via long paths because each router adds a delay of 0 to 30 sec.
A router always adds ~30 sec after the initial update is received.
A simple path from X to A is covered by time T if each node in the path has received an update from the preceding node and resent an update to the next node before time T.
Long Down
Node U has no route to X at time T iff all simple paths from X to U are covered.
A simple path of length L is covered in O(L * T DELAY) time.
TDOWN convergence is governed by D, the length of the longest simple path from X to any node:
O(D * T DELAY)
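For a tiny graph, D can be found by brute-force depth-first search over all simple paths. The sketch below uses a hypothetical full mesh of four nodes, measures path length in arcs from A, and takes T DELAY as 30 seconds with the O() constant set to 1. Finding the longest simple path is NP-hard in general, so this enumeration is only workable for toy examples.

```python
T_DELAY = 30.0  # seconds, the approximate figure used in this talk


def full_mesh(nodes):
    """Build a graph in which every node informs every other node."""
    return {u: [v for v in nodes if v != u] for u in nodes}


def longest_simple_path(arcs, start):
    """Length in arcs of the longest simple path starting at `start` (brute force)."""
    best = 0

    def dfs(node, visited, length):
        nonlocal best
        best = max(best, length)
        for nxt in arcs.get(node, ()):
            if nxt not in visited:
                visited.add(nxt)
                dfs(nxt, visited, length + 1)
                visited.remove(nxt)

    dfs(start, {start}, 0)
    return best


mesh = full_mesh(["A", "B", "C", "D"])
D = longest_simple_path(mesh, "A")
print(f"D = {D}, TDOWN estimate = {D * T_DELAY:.0f} s")
```

In the mesh every node is one hop from A, yet the longest simple path visits all four nodes, which is why dense connectivity helps TUP but hurts TDOWN.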
What Do You Want?
Minimize the network diameter to improve TUP - increase connectivity!
Minimize the longest possible paths to improve TDOWN - decrease connectivity!
NP-complete problem.
For a full mesh, the diameter is 1 and the longest path is N.
Welcome to the Reality
6 months of experimental studies.
Geographically and topologically diverse BGP sessions with > 20 ISPs.
Artificial BGP transitions (announcements & withdrawals) injected at > 10 providers.
A broad spectrum of other ISPs surveyed.
Real World Example
A Japanese ISP (ISP4) has BGP peer sessions with providers ISP1, ISP2 and ISP3 at Mae-West.
Withdraw route Ri from ISPi.
Observe the paths announced by ISP4 in each case.
R1 Fault
[Figure: steady-state topology of ISP4, ISP1 and ISP5.]
The only back-up path explored is ISP1 -> ISP5 -> ISP4.
The path was explored in 96% of events, 92 sec. on average.
No path was explored in 4% of events, 32 sec. on average.
R2 Fault
[Figure: steady-state topology of ISP4, ISP2, ISP5, ISP6, ISP10, ISP11, ISP12 and ISP13, with a “Vagabond Path!” marked.]
No path was explored in 7% of events, 54 sec. on average.
Only ISP2-ISP5-ISP4 was explored in 63% of events, 79 sec. on average.
ISP2-ISP5-ISP4 & ISP2-ISP5-ISP6-ISP4 were explored in 7% of events, 88 sec. on average.
11 more unique paths appeared in 45 distinct sequences of announcements. Most of them are “vagabond” back-up paths resulting from router configuration errors.
It Was an Easy One…
The withdrawal of R3 from ISP3 causes exploration of a fairly complex topology.
More than 20 distinct paths were announced.
Almost 150 different combinations of announcements.
Much longer convergence times (~140 sec).
Only 35% of those paths are “legitimate”; the rest are “vagabond” unintentional back-up paths.
Do not Interfere!
The selection & order of back-up paths depend on the interaction of MinRouteAdver timers on routers.
MinRouteAdver is usually implemented on a per-peer (not peer + address) basis, so earlier instability interferes.
For example: in the ISP1 case, in 4% of cases the initial delay at ISP4 was longer than the delay needed to propagate the back-up path.
LA to SF via Haifa
Vagabond paths were found in the majority of 200 monitored ISP pairs.
They usually persist for a short period (several days).
Those erroneous paths do not conform to any intended or published policy.
A single error may have global impact, mainly because of the lack of inbound filters on peer connections.
Vagabond paths may impact performance and need to be detected automatically!
You call It line?
The average convergence delay clearly corresponds to the length of the longest back-up path.
Back-up paths are determined by policy and topology.
The data contain significant variability, but they suggest a linear relationship.
But Some are more equal
Topology depends on the ISP tier.
Smaller ISPs typically purchase transit from multiple upstream providers.
Smaller ISPs implement back-up transit policies that are unnecessary in large ISPs.
Longest legitimate path: 9 ASes for Tier 1, 12 ASes for Tier 2.
This way
Supported by the provided example:
ISP1 is a large tier-1 backbone provider.
ISP2 is a moderate-sized US-based tier-2 provider.
ISP3 is a regional tier-3 network.
Tier-1 & tier-2 topology is much simpler, and their customers are much less affected by fail-over problems.
Now You See
The Internet lacks the level of reliability required by its future role.
Route fail-over complexity scales linearly with the longest back-up path for the route.
The back-up path length depends on the number of contractual relationships & the policy implementation.
Advice is for free
For Customers: If you do mission-critical stuff, connect to large providers.
For Small ISPs: Limit the number of transit & backup transit connections.
For All ASes: Avoid vagabond paths.
Better route validation & authentication mechanisms are needed.