Detection of Routing Loops and Analysis of Its Causes
Sue Moon
Dept. of Computer Science
KAIST
Joint work with Urs Hengartner, Ashwin Sridharan, Richard Mortier, Christophe Diot
2
Link Utilization
Internet backbone link
A routing loop causes link utilization to increase by 25%!
3
Overview
- Routing protocols have a large impact on the performance of the network
  - How do we detect routing loops?
  - How often do loops occur?
  - How do they impact loss and delay?
- Analyze causes of loops
  - What causes them?
4
Possible Causes of Routing Loops
- Persistent routing loops
  - E.g., due to misconfiguration.
  - Loops can last hours if undetected.
- Transient routing loops
  - Routing state is dynamic.
  - Inconsistencies in routing state can cause loops.
  - Inconsistencies should disappear within seconds/minutes.
  - Expectation: Loops last seconds/minutes.
5
How Can a Transient Routing Loop Occur?
[Figure: three routers R1, R2, and R3]
6
Detection of “Loops” in Packet Traces
- Detect replicas in a packet trace
  - Packets with the exact same header except for TTL and CRC
  - TTL difference: 2 or larger
  - Set of replicas = Packet Loop
  - Set of packet loops associated with a routing event = Routing Loop
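A minimal sketch of the replica detection described above. The `Packet` record and its fields are hypothetical simplifications (a real trace carries the full header, and the checksum is not modeled here); the sketch only shows the grouping idea, not the authors' implementation.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    # Hypothetical, simplified trace record used only for illustration.
    ts: float    # capture timestamp in seconds
    src: str
    dst: str
    ip_id: int
    proto: int
    ttl: int


def find_packet_loops(trace, min_ttl_delta=2):
    """Group packets that are identical except for TTL into packet loops.

    Returns a list of replica lists (each with at least two packets) in which
    every replica's TTL is at least `min_ttl_delta` below the previous one.
    """
    by_key = defaultdict(list)
    for pkt in trace:
        # Key on every modeled field except TTL.
        by_key[(pkt.src, pkt.dst, pkt.ip_id, pkt.proto)].append(pkt)

    loops = []
    for pkts in by_key.values():
        pkts.sort(key=lambda p: p.ts)
        current = [pkts[0]]
        for pkt in pkts[1:]:
            if current[-1].ttl - pkt.ttl >= min_ttl_delta:
                current.append(pkt)      # another trip around the loop
            else:
                if len(current) > 1:
                    loops.append(current)
                current = [pkt]          # start a new candidate set
        if len(current) > 1:
            loops.append(current)
    return loops
```

Grouping packet loops into routing loops (by destination prefix and time) would be a further step that this sketch leaves out.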
7
Traces
- Backbone traces
  - NYC and SJ links from Nov. 8th, 2001
  - NYC links from Oct. 9th, 2002
8
Packet Traces
Trace        Length (hours)   Avg BW (Mbps)   Total Packets (10^6)   Looped Packets
Backbone 1   4                12              50                     4.839%
Backbone 2   7.5              243             1677                   0.118%
Backbone 3   11               2.2             20                     1.687%
Backbone 4   11               107             1350                   0.026%

On average, loops do not affect much traffic, but…
…loops occur in bursts and can affect up to 25% of packets!
9
Observations about Packet Loops
- General observations
  - Loop size: # of nodes involved in packet loop
  - Number of replicas in packet loop
- Properties of packet loops
  - Packet types
- Duration
  - Of packet loops in packets
10
Loop Size
Loop size: value by which TTL field in packet loops gets decremented.
Figure 2
11
Packet Loop Length
How often does a packet show up before it expires?
Figure 3
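As a rough worked example of the question above: a packet first seen with TTL t in a loop of size s is seen again at t - s, t - 2s, and so on while the TTL stays positive. The sketch below assumes the loop persists and no replica is lost; the function name is hypothetical.

```python
def expected_replicas(first_ttl: int, loop_size: int) -> int:
    """Number of times a packet is observed before its TTL expires.

    Assumes the packet is first seen with TTL `first_ttl` (>= 1) and the TTL
    drops by `loop_size` on every trip around the loop.
    """
    if first_ttl < 1 or loop_size < 1:
        raise ValueError("first_ttl and loop_size must be positive")
    # Observed TTLs are first_ttl, first_ttl - loop_size, ... while > 0.
    return (first_ttl - 1) // loop_size + 1
```

For instance, a packet first seen with TTL 60 in a two-node loop would be observed 30 times under these assumptions.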
12
Traffic Types
- Different types of Internet traffic.
- Routers are oblivious to type of traffic.
- Expectation: Traffic types in packet loops are distributed similarly to traffic types in overall traffic.
13
Traffic Types (Backbone 2)
- By protocol
  - TCP: 10% (93%)
  - UDP: 16% (6%)
  - ICMP: 77% (0.3%)
- TCP flags
  - SYN: 51% (5%)
  - ACK: 73% (97%)
  - RST: 13% (1.5%)
  - FIN: 8% (4%)
14
Reasons for Increases
- TCP SYN traffic.
  - TCP is connection oriented.
  - End point tries to open connection, sends SYN packet.
  - SYN packet loops and expires, no other packets are sent.
- UDP traffic.
  - UDP is connectionless, no feedback from receiver.
  - Sending application is oblivious of loop.
- ICMP traffic.
  - Caused by traceroute/ping applications.
  - People are exploring the loop.
Observations confirm presence of loops!
15
Out-Of-Order Delivery
16
Causes of Packet Loops: BGP
[Figure: customer network, AS 1, AS 2, and routers A, B, C, D]
17
Matching BGP Updates
- Any advertisement of the longest prefix?
- Temporal vicinity of 2 minutes to packet loops?
- Change in next hop or AS path?
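The three checks above can be sketched as follows. The `BgpUpdate` record is a hypothetical layout, the longest prefix is chosen among prefixes seen in the update feed (an assumption, since the slides do not say where the prefix table comes from), and the first update seen for a prefix is treated as a change.

```python
import ipaddress
from typing import NamedTuple, Optional


class BgpUpdate(NamedTuple):
    # Hypothetical record layout, for illustration only.
    ts: float                  # seconds
    prefix: str                # e.g. "10.1.0.0/16"
    next_hop: Optional[str]    # None models a withdrawal
    as_path: tuple


def longest_prefix(dst, prefixes):
    """Most specific prefix containing dst, or None if none matches."""
    addr = ipaddress.ip_address(dst)
    nets = [ipaddress.ip_network(p) for p in prefixes]
    matches = [n for n in nets if addr in n]
    return max(matches, key=lambda n: n.prefixlen, default=None)


def matching_updates(loop_dst, loop_ts, updates, window=120.0):
    """Updates for the longest matching prefix that lie within `window`
    seconds of the loop and change the next hop or the AS path."""
    best = longest_prefix(loop_dst, {u.prefix for u in updates})
    if best is None:
        return []
    hits, previous = [], None
    for u in sorted(updates, key=lambda u: u.ts):
        if ipaddress.ip_network(u.prefix) != best:
            continue
        changed = previous is None or (
            (u.next_hop, u.as_path) != (previous.next_hop, previous.as_path)
        )
        if changed and abs(u.ts - loop_ts) <= window:
            hits.append(u)
        previous = u
    return hits
```

The 120-second default mirrors the 2-minute vicinity on the slide; everything else is an illustrative choice.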
18
Causes of Loops: IS-IS
[Figure: IS-IS topology with routers R1, R2, R3, R4, R5; five links of weight 1 and one link of weight 4]
19
Time-Line at Nodes R2 and R3
[Figure: event timeline at each node]
R2: Failure Detection -> LSP Generation -> Shortest Path Computation and LSP Flooding -> FIB Update
R3: LSP Arrival -> Shortest Path Computation -> FIB Update
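To illustrate why the gap between the two FIB updates matters, here is a small simulation. The three-router topology and its weights are hypothetical (not the topology from the slides): R2 detects the failure of link R1-R2 and updates its FIB first, while R3 still forwards along the old shortest path.

```python
import heapq


def next_hops(graph, dest):
    """Dijkstra from dest on an undirected graph: node -> next hop toward dest."""
    dist, hop, heap = {dest: 0}, {}, [(0, dest)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                hop[v] = u  # v reaches dest by forwarding to u
                heapq.heappush(heap, (d + w, v))
    return hop


def without_link(graph, a, b):
    """Copy of graph with the undirected link a-b removed."""
    return {u: {v: w for v, w in nbrs.items() if {u, v} != {a, b}}
            for u, nbrs in graph.items()}


def forward(fib, src, dest):
    """Follow next hops from src; return (path, looped)."""
    path, seen, node = [src], {src}, src
    while node != dest:
        node = fib.get(node)
        if node is None:
            return path, False  # no route
        path.append(node)
        if node in seen:
            return path, True   # revisited a router: forwarding loop
        seen.add(node)
    return path, False


# Hypothetical topology: R3 prefers to reach R1 through R2 (cost 2 vs 4).
graph = {"R1": {"R2": 1, "R3": 4},
         "R2": {"R1": 1, "R3": 1},
         "R3": {"R1": 4, "R2": 1}}
old_fib = next_hops(graph, "R1")
new_fib = next_hops(without_link(graph, "R1", "R2"), "R1")
# R2 has already updated its FIB; R3 has not processed the LSP yet.
mixed_fib = {"R2": new_fib["R2"], "R3": old_fib["R3"]}
```

With `mixed_fib`, a packet from R2 to R1 bounces between R2 and R3 until R3 also installs its new FIB, after which forwarding follows R2, R3, R1.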
20
Matching IS-IS Updates
- Upon receipt of an LSP, compute the shortest path from the observation node to the egress router
- If the forwarding path changed and the change is within temporal vicinity of a loop
  - see if the observation node lies on the shortest path before or after the change
21
BGP Update Matches
Trace    % transient   % persistent (BGP)   % persistent (no BGP)   Total
NYC-20   40.1          0                    50.8                    90.8
NYC-21   80.2          0                    7.5                     87.9
NYC-22   18.8          0                    80.6                    99.4
NYC-23   3.3           0                    0                       3.3
NYC-24   70.0          0                    0                       70.0
NYC-25   43.7          15.5                 0                       59.2
22
Factors Behind Varying Success
- Persistent loops
  - Events occurred before trace collection
- BGP changes external to Sprint
  - Comparison with RouteViews updates: increase in matches
- Geographical distribution of loop destinations
  - Measurement PoP not involved in route changes
  - Avg # of ASes traversed: longest for NYC-23
23
Conclusions
- Loops can be detected and analyzed
- Loops are not uncommon
- Most are due to BGP updates
- BGP changes farther away from the observation point may not be identified
BACKUP SLIDE
25
CDF of Number of Replicas
26
CDF of Inter-Replica Spacing Time
27
Packet Types of All Traffic
28
Packet Types of Loops
29
Destination Addresses of Loops
Regional 2 Backbone 1
30
CDF of Replica Stream Duration in Time
31
CDF of Routing Loop Duration in Time
32
Overview
- Types and causes behind routing loops
  - Transient: part of normal routing protocol operation
  - Persistent: “long-lasting”, manual intervention required
- Detection of routing loops in packet traces
  - Detection algorithm
  - Observations about the routing loops
- Analysis of performance impact
  - Loss, delay, out-of-order delivery
- On-line detection algorithm
- Summary
33
Fraction of Packets in Loops
Backbone 1 Backbone 4
34
Construction of a Typical End-To-End Path
10 hops in the Backbone
DSL/LAN/Cable/Phone
Regional to Backbone
35
Estimate of End-to-End Loss
- Assume:
  - No loss on the access link due to routing loops
  - Losses are independent across links
- Estimate:
  - Lr: from Regional traces
  - Lb: from Backbone traces, except Backbone 4
  - 1 - (1 - Lr)^2 (1 - Lb)^10 = 0.003 ~ 0.025
  - Implications on SLA??
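The estimate above can be evaluated directly. In this sketch the exponents follow the formula on the slide (2 regional links, 10 backbone hops); the function name and the sample loss rates are hypothetical and are not values taken from the traces.

```python
def end_to_end_loss(l_regional, l_backbone, regional_links=2, backbone_hops=10):
    """Loss probability along a path, assuming independent per-link losses."""
    delivered = (1.0 - l_regional) ** regional_links \
        * (1.0 - l_backbone) ** backbone_hops
    return 1.0 - delivered


# Hypothetical per-link loss rates, chosen only to illustrate the arithmetic.
example = end_to_end_loss(l_regional=0.001, l_backbone=0.0001)
```

With these illustrative inputs the result is roughly 0.003, the low end of the range quoted on the slide.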
36
Delay Due to Routing Loops
37
Out-Of-Order Delivery
38
Causes of Loop
                           Likely Cause (%)
Trace        Total Loops   BGP     IGP
Backbone 1   413           57.4    2.67
Backbone 2   124           13.4    none
Backbone 3   150           10.7    24
Backbone 4   857           93.2    none
Backbone 5   14            85.7    none
Backbone 6   194           4.12    none
39
Overview
- Types and causes behind routing loops
  - Transient: part of normal routing protocol operation
  - Persistent: “long-lasting”, manual intervention required
- Detection of routing loops in packet traces
  - Detection algorithm
  - Observations about the routing loops
- Analysis of performance impact
  - Loss, delay, out-of-order delivery
- On-line detection algorithm
- Summary & Future Work
40
To Detect a Loop On-line
- Focus on persistent loops
- Questions:
  - More focus on persistent loops
  - How much traffic is affected? -> alarm
  - What prefix is affected? -> warning
41
On-Line Detection Algorithm
- How many packets to a /24 get looped? 100 -> WARNING
- How many looped packets / million? 5%
- How long (in millions) did it last? 10 millions -> ALARM
- By the time an alarm is raised, warnings have already been raised and help debug the system
- Fixed memory and computation complexity
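One possible reading of the thresholds above, as a sketch rather than the authors' algorithm: a WARNING fires when 100 looped packets have been seen for one /24, and an ALARM fires when at least 5% of packets are looped in each of 10 consecutive windows of one million packets. The class name, the input format (a destination address plus a flag set by an upstream replica detector), and the unbounded per-prefix counter are all assumptions.

```python
import ipaddress
from collections import Counter


class OnlineLoopDetector:
    def __init__(self, warn_count=100, window=1_000_000,
                 alarm_fraction=0.05, alarm_windows=10):
        self.warn_count = warn_count
        self.window = window
        self.alarm_fraction = alarm_fraction
        self.alarm_windows = alarm_windows
        self.per_prefix = Counter()   # looped packets per /24
        self.warned = set()           # prefixes already reported
        self.seen = 0                 # packets in the current window
        self.looped = 0               # looped packets in the current window
        self.bad_windows = 0          # consecutive windows over threshold

    def observe(self, dst, looped):
        """Feed one packet; return the events it triggers."""
        events = []
        if looped:
            self.looped += 1
            prefix = str(ipaddress.ip_network(f"{dst}/24", strict=False))
            self.per_prefix[prefix] += 1
            if (self.per_prefix[prefix] >= self.warn_count
                    and prefix not in self.warned):
                self.warned.add(prefix)
                events.append(("WARNING", prefix))
        self.seen += 1
        if self.seen == self.window:          # window boundary
            if self.looped / self.window >= self.alarm_fraction:
                self.bad_windows += 1
                if self.bad_windows == self.alarm_windows:
                    events.append(("ALARM", None))
            else:
                self.bad_windows = 0
            self.seen = self.looped = 0
        return events
```

The per-packet work is constant, but the per-prefix table here grows with the number of distinct /24s; keeping the fixed-memory property from the slide would need a bounded table, which this sketch does not attempt.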
42
Validation of On-Line Algorithm
43
Summary
- Impact of routing on performance has been analyzed in terms of loss and delay.
  - Per-link loss varies greatly.
  - Excluding “outliers”, end-to-end loss of 0.3% is unavoidable.
  - For a small number of packets that escape the loops, 50 ~ 500 msec delay is added on average.
- On-line detection algorithm
  - In conjunction with routing protocol monitoring, it will help detect and fix persistent loops.
44
Future Work
- More work needed to determine causes behind routing loops
  - Correlate with BGP/IS-IS updates
    - Address hijacking
    - Wrong aggregation
    - Origin misconfiguration
    - Export misconfiguration
- Integration with existing monitoring tools
Backup Slides
46
Superbowl Sunday, 2/3/2002
47
Superbowl Sunday, 2/3/2002
48
What Next?
- Alarms and warnings
  - How to extract just enough info to be useful
  - How to relate it with BGP/IS-IS update info
  - How to integrate with management/monitoring infrastructure