Internet Routing Instability
Three Papers Presented by Michael A. Smith

Craig Labovitz, G. Robert Malan, Farnam Jahanian, "Internet Routing Instability." IEEE/ACM Transactions on Networking, 6(5):515-528, 1998.
Craig Labovitz, G. Robert Malan, Farnam Jahanian, "Origins of Internet Routing Instability." IEEE INFOCOM 1999.
Craig Labovitz, Abha Ahuja, Farnam Jahanian, "Experimental Study of Internet Stability and Wide-Area Backbone Failures." FTCS 1999.

Spring 2006 Internet Routing Instability 2 of 50
Background
Events
  NSFNet backbone ended in April '95
  Evident:
    Network degradation
    bandwidth shortages
    lack of router switching capacity
"Death of Internet is Imminent" reported by popular press
Routing Instability ("route flaps")
  Informally defined as:
    "the rapid change of network reachability and topology information"

The Internet Backbone
12 large ISPs, tier one
4000-6000 tier two providers
Large public exchange points are considered the "core" of the Internet.
Backbone service providers must maintain a complete map, or default-free routing table.
Divided into different regions of administrative control called autonomous systems (AS's).
Most AS's exchange routing information through the Border Gateway Protocol (BGP).

Routing Instability
Origins
  Router configuration errors
  Transient physical and data link problems
  Software bugs
Effects
  Poorer end-to-end network performance
  Degradation of overall efficiency of the Internet infrastructure

Route Flaps
Result in a large number of routing updates passed to core Internet exchange point routers.
Network instability spreads from router to router and propagates throughout the network.
Effects in Internet infrastructure:
  Increased packet loss
  Delays in time for network convergence
  Resource overhead (CPU, memory, etc.)

BGP
An incremental protocol
Unlike IGRP and OSPF, does not flood the intra-domain network with topological information or link state entries
Sends update information only upon changes in topology or policy
Uses TCP as underlying transport mechanism (rather than building reliability on top of a datagram service)
As a path vector routing protocol, it limits the distribution of reachability information.

Routing on the Backbone
path - sequence of intermediate AS's between source and destination routers that form a directed route for packets to travel
Router configuration files allow the stipulation of routing policies which may:
  specify the filtering of specific routes
  modify path attributes before sharing
Policy decisions can be made based on:
  announcement of routes from peers
  attributes of announced routes (such as MED's)
After each router makes a new local decision on the best route to a destination, it sends that route to its peers.
As the route propagates, each AS appends its unique number to the route's ASPATH, which, in conjunction with the prefix, provides a specific handle for transit.
The ASPATH mechanism allows a router to detect and prevent routing loops.
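
The loop check described above can be sketched in a few lines of Python. This is only an illustration, not router code: the function names, the AS numbers, and the prefix are made up, and the path is kept as a plain list to which each AS adds its number, following the slide's wording.

```python
from typing import List, Optional, Tuple

Route = Tuple[str, List[int]]  # (prefix, ASPATH); a simplified stand-in for a BGP route


def receive_route(local_as: int, route: Route) -> Optional[Route]:
    """Accept a route unless our own AS number already appears in its ASPATH."""
    prefix, as_path = route
    if local_as in as_path:
        return None  # loop detected: the route has already passed through this AS
    return (prefix, as_path)


def propagate_route(local_as: int, route: Route) -> Route:
    """Add our AS number to the ASPATH before passing the route on to a peer."""
    prefix, as_path = route
    return (prefix, as_path + [local_as])


# A route originated by (hypothetical) AS 65001 travels through AS 65002 and AS 65003.
route: Route = ("192.0.2.0/24", [65001])
for asn in (65002, 65003):
    accepted = receive_route(asn, route)
    assert accepted is not None
    route = propagate_route(asn, accepted)

print(route)                        # the path now lists all three ASes
print(receive_route(65002, route))  # None: AS 65002 finds itself in the path
```

The check only tells a router that a route has already visited it. It says nothing about how long the network as a whole takes to settle.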

Routing Information in BGP
Two forms:
  Announcements
    Indicate that a router has either learned a new network attachment or has made a policy decision to prefer a different route to a destination.
  Withdrawals
    Sent when a router decides that a network is no longer reachable
    Paper distinguishes between:
      Explicit - associated with actual withdrawal message
      Implicit - existing route replaced by new route
A BGP update may contain multiple announcements and withdrawals.
  Ideally, routers should only generate routing updates for relatively infrequent policy changes and the addition of new physical networks.
It's been found that BGP's ASPATH mechanism is not sufficient to ensure network convergence.
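
To make the explicit/implicit distinction concrete, here is a minimal sketch of a per-peer route table. The `apply_update` helper and the event strings are hypothetical, and a route is reduced to its ASPATH; real BGP compares more attributes than this.

```python
from typing import Dict, List, Tuple

# One BGP update may carry several withdrawals and several announcements.
# An announcement is (prefix, ASPATH); a withdrawal is just a prefix.
Announcement = Tuple[str, Tuple[int, ...]]


def apply_update(rib: Dict[str, Tuple[int, ...]],
                 withdrawals: List[str],
                 announcements: List[Announcement]) -> List[str]:
    """Apply one update from a peer to that peer's table and describe each change."""
    events = []
    for prefix in withdrawals:
        if rib.pop(prefix, None) is not None:
            events.append(f"explicit withdrawal of {prefix}")
    for prefix, path in announcements:
        if prefix in rib and rib[prefix] != path:
            # No withdrawal message was sent: the new route silently replaces the old.
            events.append(f"implicit withdrawal of {prefix} (route replaced)")
        elif prefix not in rib:
            events.append(f"new announcement of {prefix}")
        rib[prefix] = path
    return events


rib: Dict[str, Tuple[int, ...]] = {}
first = apply_update(rib, [], [("192.0.2.0/24", (65001, 65002))])
second = apply_update(rib, [], [("192.0.2.0/24", (65003,))])
third = apply_update(rib, ["192.0.2.0/24"], [])
print(first, second, third)
```

The three calls produce a new announcement, then an implicit withdrawal, then an explicit one, leaving the table empty.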

Methodology of Studies
Geographically diverse exchange points.
Although the route servers do not forward network traffic, the route servers do peer with over 90% of the service providers at each exchange point.

Route Tracker Architecture
Developed on Sun workstations
Uses MRT and IPMA toolkits to analyze BGP updates

"Internet Routing Instability"
Monitored BGP updates generated by five service provider backbone routers at the major U.S. public exchange points over a period of nine months.
Paper distinguishes three types of updates:
  forwarding instability - may reflect legitimate topological changes and affects the paths on which data will be forwarded
  routing policy fluctuation - reflects changes in routing policy information that do not affect forwarding paths
  pathological - redundant BGP updates that reflect neither routing nor forwarding instability
Instability is defined as:
  an instance of either forwarding instability or policy fluctuation
Data reflects the stability of inter-domain Internet routing, or changes in topology or policy among AS's
"Intra-domain routing instability is not explicitly measured and is only indirectly observed through BGP information exchanged with a domain's peer."

Results of Study
The number of BGP updates exchanged per day in the Internet core is one or more orders of magnitude larger than expected.
Routing information is dominated by pathological, or redundant, updates, which may not reflect changes in routing policy or topology.
Instability and redundant updates exhibit a specific periodicity of 30 and 60 seconds.
Instability and redundant updates show a surprising correlation to network usage and exhibit corresponding daily and weekly cyclic trends.
Instability is not dominated by a small set of autonomous systems or routes.

Results of Study (2)
Instability and redundant updates exhibit both strong high and low frequency components. Much of the high frequency instability is pathological.
Discounting the contribution of redundant updates, the majority (over 80%) of Internet routes exhibit a high degree of stability.
This work has led to specific architectural and protocol changes in commercial Internet routers through collaboration with vendors.

Methodology of Study (2)
12 Gb of data starting in January '96
Uses several tools from XYZ toolkit
Focuses on largest exchange, Mae-East
Data verification against BGP backbone logs from a number of large service providers

More Background
Problems of network topology fluctuation (non-convergence):
  packets get dropped
  packets delivered out of order
Internet routers of the day were based on route caching architecture.
  Each interface card maintains a routing table cache of destination and next-hop lookups
  If found, then switch on CPU-independent "fast-path."
Sustained levels of instability increase the probability of a packet encountering a cache miss, which leads to:
  increased load on CPU
  increased switching latency
  dropped or lost packets
  queuing delay, preventing timely routing of Keep-Alive packets
It should be noted that new generations of routers that do not require caching and are able to maintain the full routing table in memory do not exhibit the same pathological loss under heavy routing updates.
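
A toy model helps show why flapping routes hurt a route-caching router. The sketch below is not drawn from the papers: the `CachingInterface` class, the peer names, and the rule that every routing update invalidates the matching cache entry are all simplifying assumptions.

```python
from typing import Dict


class CachingInterface:
    """Toy model of an interface card with a destination -> next-hop cache."""

    def __init__(self, full_table: Dict[str, str]) -> None:
        self.full_table = full_table       # held by the route processor (CPU)
        self.cache: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def forward(self, destination: str) -> str:
        if destination in self.cache:
            self.hits += 1                 # fast path, no CPU involvement
            return self.cache[destination]
        self.misses += 1                   # slow path: CPU lookup, then cache fill
        next_hop = self.full_table[destination]
        self.cache[destination] = next_hop
        return next_hop

    def routing_update(self, destination: str, next_hop: str) -> None:
        self.full_table[destination] = next_hop
        self.cache.pop(destination, None)  # assumed: an update invalidates the entry


def miss_count(flap_every: int, packets: int = 100) -> int:
    """Forward `packets` packets to one destination, flapping its route periodically."""
    card = CachingInterface({"192.0.2.1": "peer-a"})
    for i in range(packets):
        if flap_every and i % flap_every == 0:
            card.routing_update("192.0.2.1", "peer-b" if i % 2 else "peer-a")
        card.forward("192.0.2.1")
    return card.misses


stable = miss_count(flap_every=0)    # route never changes
flapping = miss_count(flap_every=5)  # route changes every fifth packet
print(stable, flapping)
```

With a stable route only the first packet takes the slow path. With a flap every fifth packet, one packet in five does, and each of those misses lands on the CPU.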

Route Flap Storms
A failed router can instigate a "route flap storm."
  This pathological oscillation causes overloaded routers to be marked as unreachable since the required interval of Keep-Alive transmissions is not met.
  Peers of the failed router find alternative paths for destinations previously reachable and transmit updates.
  After the failed router recovers, it will re-initiate BGP peering sessions with peers, transmit large state dumps, and cause more routers to fail.
"Route Flap Storms" in 1996 caused extended outages for several million network customers.
Newer generations of routers provide a mechanism for giving BGP and Keep-Alive messages higher priority.

Battling Routing Instability
Route Aggregation (Supernetting):
  combines a number of smaller IP prefixes into a single, less specific route announcement.
  reduces overall number of networks visible on the core Internet
  fails in multi-homing (when end-sites have redundant connections to the Internet via multiple service providers).
    In 1996, more than 25% (and growing) of prefixes were multi-homed and therefore non-aggregatable.
Deployment of route dampening algorithms
  "hold-down" updates that exceed certain parameters (e.g. a quota of updates per hour)
  can introduce artificial connectivity problems as "legitimate" announcements are delayed.
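
Both countermeasures can be sketched briefly. The aggregation half uses Python's standard `ipaddress.collapse_addresses`. The dampening half is a deliberately naive quota-per-window hold-down, much simpler than the penalty-and-decay schemes routers deploy. The `HoldDown` class, the quota of 3, and the example prefixes are assumptions for illustration.

```python
import ipaddress
from typing import Dict, List


def aggregate(prefixes: List[str]) -> List[str]:
    """Collapse contiguous prefixes into the fewest covering supernets."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(networks)]


class HoldDown:
    """Suppress a prefix once it exceeds a quota of updates within a sliding window."""

    def __init__(self, quota: int, window_s: float = 3600.0) -> None:
        self.quota = quota
        self.window_s = window_s
        self.history: Dict[str, List[float]] = {}  # prefix -> recent update times

    def accept(self, prefix: str, now: float) -> bool:
        recent = [t for t in self.history.get(prefix, []) if now - t < self.window_s]
        recent.append(now)  # suppressed updates still count toward the quota
        self.history[prefix] = recent
        return len(recent) <= self.quota


# Four contiguous /24s become one /22; an unrelated /24 cannot be folded in.
aggregated = aggregate(["10.1.0.0/24", "10.1.1.0/24", "10.1.2.0/24", "10.1.3.0/24"])
mixed = aggregate(["10.1.0.0/24", "10.1.1.0/24", "10.9.0.0/24"])

# Updates for one prefix at t = 0, 60, 120, 180 and 4000 seconds, quota of 3 per hour.
damper = HoldDown(quota=3)
decisions = [damper.accept("192.0.2.0/24", t) for t in (0.0, 60.0, 120.0, 180.0, 4000.0)]
print(aggregated, mixed, decisions)
```

The fourth update is held down even if it is a legitimate announcement, which is the artificial connectivity problem the slide warns about. The fifth update is accepted again only after the earlier ones have aged out of the window.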

Problems
The Internet continues to exhibit high levels of routing instability despite the increased emphasis on aggregation and route dampening.
Internet topology is growing increasingly less hierarchical with the addition of new exchange points and peering relationships.
The behavior and dynamics of Internet routing stability had gone mostly without formal study prior to the publication of the paper. Little was known!

Observations
Disproportionalism:
  42,000 Internet prefixes
  1300 Autonomous Systems
  1500 unique ASPATHs
  3-6 million routing updates per day
  125 updates per network per day
At times, 100 prefix announcements per sec.
  Once exceeded 30 million updates in a day; the monitor crashed!
This is a problem for all but the most high-end of commercial routers, and even they exhibit problems.
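
As a quick sanity check on these figures, the per-prefix rate can be recomputed from the totals. The helper name is made up, and the calculation assumes "network" on the slide means one of the 42,000 prefixes.

```python
def updates_per_prefix(updates_per_day: int, prefixes: int = 42_000) -> float:
    """Average number of routing updates per prefix per day."""
    return updates_per_day / prefixes


low = updates_per_prefix(3_000_000)
high = updates_per_prefix(6_000_000)
implied_total = 125 * 42_000          # daily total implied by 125 updates per prefix
peak_rate = 30_000_000 / 86_400       # updates per second on a 30-million-update day
print(round(low), round(high), implied_total, round(peak_rate))
```

Under that assumption, 3 to 6 million updates per day works out to roughly 71 to 143 per prefix, so the quoted 125 sits toward the upper end of the range. A 30-million-update day averages about 347 updates per second.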

Classification of BGP Updates
WADiff - A route is explicitly withdrawn as it becomes unreachable and it is later replaced with an alternative route to the same destination; forwarding instability.
AADiff - A route is implicitly withdrawn and replaced by an alternative route as the original route becomes unreachable, or a preferred alternative path becomes available; forwarding instability.
WADup - A route is explicitly withdrawn and then re-announced as reachable. This may reflect transient topological (link or router) failure, or it may represent a pathological oscillation; forwarding instability or pathological behavior (see next slide)
All considered to be instability

Classification of Pathological Behavior (Redundant Updates)
AADup - A route is implicitly withdrawn and replaced with a duplicate of the original route (a router should only send an update for a change in topology).
WWDup - The repeated transmission of BGP withdrawals for a prefix that is currently unreachable.
All considered to be pathological instability.
Pathological updates may have a minimal impact on the performance of the Internet.
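
The five categories amount to comparing each update with the previous one for the same prefix and peer. The sketch below is one possible reading, not the papers' tooling. It treats two routes as identical when their ASPATHs match, whereas real updates carry more attributes, and the labels "announce" and "withdraw" for updates outside the five categories are made up.

```python
from typing import Dict, List, Optional, Tuple

# (prefix, "A" for announcement or "W" for withdrawal, ASPATH or None)
Update = Tuple[str, str, Optional[Tuple[int, ...]]]


def classify(updates: List[Update]) -> List[str]:
    """Label each update from one peer by comparing it with that prefix's history."""
    last_kind: Dict[str, str] = {}
    last_route: Dict[str, Optional[Tuple[int, ...]]] = {}  # last announced route
    labels = []
    for prefix, kind, path in updates:
        prev = last_kind.get(prefix)
        if kind == "W":
            label = "WWDup" if prev == "W" else "withdraw"
        elif prefix not in last_route:
            label = "announce"          # first announcement seen for this prefix
        else:
            same = last_route[prefix] == path
            if prev == "W":
                label = "WADup" if same else "WADiff"
            else:
                label = "AADup" if same else "AADiff"
        if kind == "A":
            last_route[prefix] = path
        last_kind[prefix] = kind
        labels.append(label)
    return labels


p = "192.0.2.0/24"
stream: List[Update] = [
    (p, "A", (65001, 65002)),  # first announcement
    (p, "A", (65001, 65002)),  # same route again
    (p, "A", (65001, 65003)),  # different route, no withdrawal in between
    (p, "W", None),            # explicit withdrawal
    (p, "W", None),            # withdrawal of an already withdrawn prefix
    (p, "A", (65001, 65003)),  # same route as before the withdrawal
    (p, "W", None),
    (p, "A", (65001, 65004)),  # different route after a withdrawal
]
labels = classify(stream)
print(labels)
```

Here the explicit withdrawal is recorded on its own and the W-A pair is labelled when the announcement arrives. Counting the pair as a single event would be an equally reasonable choice.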

Expected Instability
Problems affecting aggregation into supernets:
  Multi-homing
  initial lack of hierarchical IP address space allocation
  reluctance to renumber IP addresses
Result: Large number of globally visible addresses
Each globally visible address is reachable by one or more paths.
You would expect Internet instability to be proportional to the total number of available paths to all globally visible network addresses or aggregates

Mae-East Routing Updates
Most WWDup withdrawals are transmitted by routers belonging to AS's that never previously announced reachability for the withdrawn prefixes.
On average, 500,000 to 6 million pathological withdrawals per day

Update Totals per ISP on a Given Day
Many of the exchange point routers withdraw an order of magnitude more routes than they announce during a given day.
Provider I shows the disproportionate effect that a single service provider can have on the global routing mesh.

More Observations
Guess what:
  There is a strong causal relationship between the manufacturer of the router used by an ISP and the ISP's exhibited level of pathological BGP behavior.
Routing updates have a regular, specific periodicity, usually either 30 or 60 seconds.
The persistence of instability is the duration of time that routing information fluctuates before it stabilizes.

Origins of Routing Pathologies
Some pathological withdrawals can be attributed to implementation decisions
  time-space trade off in not maintaining state of advertisements
  stateless BGP = O(N*U) updates
  Presentation of results led a router vendor to update its software to keep partial state
Stateless BGP contributes an insignificant number of updates and does not account for oscillating behavior of WWDup and AADup updates.
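
The O(N*U) figure can be illustrated by counting messages. The sketch reads N as the number of peers and U as the number of withdrawals to pass on, which is an interpretation the slide does not spell out. The function, the peer names, and the prefixes are hypothetical.

```python
from typing import Dict, List, Set


def withdrawals_sent(peers: Set[str],
                     advertised_to: Dict[str, Set[str]],
                     withdrawn_prefixes: List[str],
                     stateless: bool) -> int:
    """Count withdrawal messages a router emits for a batch of withdrawn prefixes."""
    sent = 0
    for prefix in withdrawn_prefixes:
        if stateless:
            # No record of past advertisements is kept, so every peer is told.
            sent += len(peers)
        else:
            # Only peers that actually heard an announcement need the withdrawal.
            sent += len(advertised_to.get(prefix, set()) & peers)
    return sent


peers = {"p1", "p2", "p3", "p4"}                 # N = 4
advertised_to = {
    "10.0.0.0/8": {"p1"},
    "172.16.0.0/12": set(),                      # never announced to anyone
    "192.168.0.0/16": {"p1", "p2"},
}
withdrawn = list(advertised_to)                  # U = 3
stateless_count = withdrawals_sent(peers, advertised_to, withdrawn, stateless=True)
stateful_count = withdrawals_sent(peers, advertised_to, withdrawn, stateless=False)
print(stateless_count, stateful_count)
```

The stateless router sends N * U = 12 withdrawals, most of them for prefixes the recipient never heard announced. The stateful one sends 3, at the cost of remembering what it advertised to whom.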

Origins of Routing Pathologies (2)
Single-homed, stateless peer routers should result in at most O(N) updates, but instead:
  It seemed that each legitimate withdrawal induces some type of short-lived pathological network oscillation
  Persistence of these updates is between 1 and 5 minutes
Periodic routing instability may be caused by:
  inadvertent synchronization on update transmission
  improper configuration of interaction between IGP and BGP (conversion is lossy)
Internet routing instability still remains poorly understood

Forwarding Instability
Instability Density
Black squares are above a particular threshold (mean of detrended data) (345 updates in March, 770 in September)

Forwarding Instability (2)
A week of raw forwarding instability
Little instability over the weekend

Forwarding Instability (3)
Time series analyses, FFT and MEM spectral estimation, validate results.
Routing instability corresponds closely to trends in Internet bandwidth usage and packet loss (intuitively obvious?)
Rigorous justification of network usage equating to routing instability is problematic due to the size and heterogeneity of the Internet.

Fine-grained Instability Stats.
No single AS consistently dominates the instability statistics.
There is not a correlation between the size (# routes responsible for in table) of an AS and its proportion of the instability statistics.
A small set of paths or prefixes does not dominate the instability statistics; instability is evenly distributed across routes

Fine-grained Instability Stats. (2)
Internet routing tables are dominated by 6-8 ISPs
Over the course of the month, their share of the default-free routing tables did not change significantly

Fine-grained Instability Stats. (3)
Internet routing tables are dominated by 6-8 ISPs
Over the course of the month, their share of the default-free routing tables did not change significantly

Fine-grained Instability Stats. (4)
80-100% of the daily instability is contributed by Prefix+AS pairs announced fewer than 50 times.
(a) ISP A announced seven routes between 630 and 650 times with no withdrawals

Fine-grained Instability Stats. (5)
80-100% of the daily instability is contributed by Prefix+AS pairs announced fewer than 50 times.
(c) ISP A announced seven routes between 630 and 650 times with no withdrawals

Fine-grained Instability Stats. (6)
(a) 20-90% of AADiff events are contributed by routes that changed 10 times or less
No single route consistently dominates the instability measured.
Some days, a single Prefix+AS pair contributes substantially (40%) - accounts for lowest curve in (a) (ISP A)
WADiff climbs to a plateau of about 95% faster than the other three categories.
WADiff has the fewest Prefix+AS pairs that dominate their days.
  Comforting, since this category probably best represents topological instability
Investigation on prefix alone provided similar results.

Temporal Properties of Instability Statistics
Update frequency distributions for instability events at Prefix+AS level
  Update frequency is the inverse of the inter-arrival time between routing updates; higher frequency corresponds to a shorter inter-arrival time
Other work has been able to capture the lower frequencies through both routing table snapshots and end-to-end techniques

Temporal Properties of Instability Statistics (2)
Histogram distribution captured in 30 second and 1 minute bins
You would expect a Poisson distribution reflecting exogenous events, such as power outages, fiber cuts, and other natural and human events.
30 second periodicity suggests widespread systematic influence in origin.
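
The binning step can be sketched as follows. The timestamps are invented for illustration, not measured data, and the function name is hypothetical. It computes the gaps between consecutive updates for one Prefix+AS pair and counts them in fixed-width bins.

```python
from collections import Counter
from typing import Dict, List


def interarrival_histogram(timestamps: List[float], bin_s: float = 30.0) -> Dict[float, int]:
    """Count gaps between consecutive updates, keyed by the lower edge of each bin."""
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    counts = Counter(int(gap // bin_s) * bin_s for gap in gaps)
    return dict(sorted(counts.items()))


# Made-up arrival times (seconds) of updates for a single Prefix+AS pair.
arrivals = [0.0, 30.0, 61.0, 90.0, 150.0, 181.0, 600.0]
histogram = interarrival_histogram(arrivals)
print(histogram)
```

In this invented series, half of the gaps fall in the bin starting at 30 seconds. Purely random outages would spread the gaps out far more smoothly, which is why a pile-up at 30 or 60 seconds points to a timer rather than to external events.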

Temporal Properties of Instability Statistics (3)
Histogram distribution captured in 30 second and 1 minute bins
You would expect a Poisson distribution reflecting exogenous events, such as power outages, fiber cuts, and other natural and human events.
30 second periodicity suggests widespread systematic influence in origin.

Conclusions
Routing instability can have a significant deleterious impact on Internet infrastructure
Majority (99%) of routing information is pathological and may not reflect real network topological changes.
Instability is well distributed across AS's and prefix space.
Instability and redundant routing information exhibit a strong periodicity (of unknown origin).

Conclusions (2)
Proportion of Internet Routes affected by routing updates

Conclusions (3)
Current trends in the evolution of the Internet may have a significant impact on routing instability and the future performance of the network.
25% of networks are multi-homed and the growth rate is about linear
Proliferation of exchange points is leading to a less hierarchical Internet.
This research helps characterize the effect of added topological complexity since the end of the NSFNet backbone.

"Origins of Internet Routing Instability"
28 months gathering data from more than 40 commercial routers, switches, and Unix-based PC routers
Also collected IBGP information at the state of Michigan's public Internet backbone, MichNet
Maintains that routing instability remains well distributed across prefix and AS space but that instability is not related to prefix length.
Since previous paper's work, the volume of inter-domain routing messages in the Internet core has decreased by an order of magnitude.
Research Pays Off
Number of BGP updates almost doubled in 28 months
Number of announcements per day eventually (finally) surpassed the number of withdrawals at Mae East.
On average, across the backbone, exchange point routers generated only half as many withdrawals as announcements
New Routing Update Categories
We still have AADiff, AADup, and WWDup, but we add:
Tup and Tdown: fluctuation in the reachability for a given prefix. An announced route is withdrawn and transitions down, or a currently unreachable prefix is announced as reachable and transitions up
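The five categories can be sketched as a small classifier over the update stream from a single peer. This is only an illustration, not the papers' measurement code: it assumes that AADup means a re-announcement of the route already in place, AADiff an announcement that replaces it with a different route, and WWDup a withdrawal for a prefix that is already unreachable. The function name `classify_update`, the `state` dictionary, and the sample prefix and paths are hypothetical.

```python
from typing import Dict, Hashable, Optional


def classify_update(
    state: Dict[str, Optional[Hashable]],
    prefix: str,
    route: Optional[Hashable],
) -> str:
    """Label one update for `prefix` and record the new state.

    `route` is the announced route (any hashable value), or None for a
    withdrawal. `state` maps each prefix to its current route; a missing
    or None entry means the prefix is currently unreachable.
    """
    current = state.get(prefix)
    if route is None:
        # Withdrawal: a real transition down, or a duplicate withdrawal.
        # A withdrawal for a never-seen prefix is counted as WWDup here.
        label = "Tdown" if current is not None else "WWDup"
    elif current is None:
        # Announcement for an unreachable prefix: transition up.
        label = "Tup"
    elif route == current:
        # Same route announced again: duplicate announcement.
        label = "AADup"
    else:
        # Route implicitly replaced by a different one.
        label = "AADiff"
    state[prefix] = route
    return label


# Hypothetical update stream for one prefix from one peer.
state: Dict[str, Optional[Hashable]] = {}
updates = [
    ("192.0.2.0/24", ("AS1", "AS2")),
    ("192.0.2.0/24", ("AS1", "AS2")),
    ("192.0.2.0/24", ("AS1", "AS3")),
    ("192.0.2.0/24", None),
    ("192.0.2.0/24", None),
]
labels = [classify_update(state, p, r) for p, r in updates]
```

Counting the labels per day and per peer would give a breakdown like the one the slides report.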
Breakdown of BGP Updates
Tup is roughly equal to Tdown, indicating connection recovery (good!)
Fluctuation in prefix reachability accounts for over 40% of all non-WWDup BGP traffic
After January ’98, AADup comprised the largest category of updates.
Analysis of AADiffs
90% of MED oscillations involve only two large ISPs, product of their specific routing policies.
Dynamically Mapped MED
AS2 always wants traffic flowing from AS1 to AS3 to take the shortest path through its network. Instead of setting the MED value via static configuration rules, AS2 dynamically maps the IGP distance between R5 and R3, and between R5 and R4, to the MED attribute of the route advertisements that R3 and R4 send to AS1.
AS2 thereby influences how AS1 reaches Network A: AS1 will prefer the route via R4.
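The mapping described above can be sketched in a few lines. The IGP distances below are made-up values chosen so that R4 wins, matching the slide; the function names `advertise_med` and `preferred_entry` are hypothetical, and the name-based tie-break is an assumption added only to keep the example deterministic. The one protocol fact relied on is that BGP prefers the lowest MED among routes learned from the same neighboring AS.

```python
from typing import Dict


def advertise_med(igp_distance_to_r5: Dict[str, int]) -> Dict[str, int]:
    """Dynamic mapping: each border router advertises its current
    IGP distance to R5 as the MED on routes sent to AS1."""
    return {router: dist for router, dist in igp_distance_to_r5.items()}


def preferred_entry(meds: Dict[str, int]) -> str:
    """AS1 picks the advertisement with the lowest MED.
    Ties are broken by router name (an assumption for this sketch)."""
    return min(sorted(meds), key=lambda router: meds[router])


# Hypothetical IGP distances inside AS2.
before = advertise_med({"R3": 30, "R4": 10})
choice_before = preferred_entry(before)

# An internal IGP change in AS2 alters the advertised MEDs, so AS1 sees
# new announcements and may switch entry points.
after = advertise_med({"R3": 30, "R4": 40})
choice_after = preferred_entry(after)
```

Because the MED follows the IGP metric, every internal metric change in AS2 is exported to AS1 as a fresh announcement for the same prefix, which is one plausible way such a policy could produce the MED oscillations counted among the AADiffs.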
More Results
Conclusions
Improvement
Routing update messages reduced by an order of magnitude
Suppressed pathological withdrawals
Instability is still well distributed across AS and prefix space
More bugs in router software led to anomalies
“Experimental Study of Internet Stability and Wide-Area Backbone Failures”
Conclusions
Internet has proven remarkably robust.
A small number of routes contribute to overall unavailability.
40% of routes exhibit multiple failures
Outages lasting longer than two hours usually represent long-term outages requiring significant engineering effort for repair
BGP failures must stem from non-hardware/software sources, probably TCP characteristics.