A Measurement Study on the Impact of Routing Events on End-to-End Internet Path Performance Feng Wang 1 , Zhuoqing Morley Mao 2 Jia Wang 3 , Lixin Gao 1 , Randy Bush 4 September 14, 2006 1 University of Massachusetts, Amherst 2 University of Michigan 3 AT&T Labs-Research 4 Internet Initiative Japan

A Measurement Study on the Impact of Routing Events on

End-to-End Internet Path Performance

Feng Wang 1, Zhuoqing Morley Mao 2, Jia Wang 3, Lixin Gao 1, Randy Bush 4

September 14, 2006

1 University of Massachusetts, Amherst
2 University of Michigan
3 AT&T Labs-Research
4 Internet Initiative Japan

Motivation

Real-time services have made high availability of end-to-end Internet paths of paramount importance.

– low packet loss rate, low delay, high network availability, and fast reaction time

Internet path failures are widespread [Labovitz:98, Markopoulou:04, Feamster:03].

– can last as long as 10 minutes

Degraded end-to-end path performance is correlated with routing dynamics.

Open Questions

How do routing changes result in degraded end-to-end path performance?

What kinds of routing dynamics cause degraded end-to-end performance?

How do factors such as topological properties or routing policies affect performance degradation?

Our Work

Study end-to-end performance under realistic topologies.

Investigate several metrics to characterize the end-to-end loss, delay, and out-of-order packets.

Characterize the kinds of routing changes that impact end-to-end path performance.

Analyze the impact of topology, routing policies, MRAI timer and iBGP configurations on end-to-end path performance.

Methodology

A multi-homed prefix

– BGP Beacon prefix: 192.83.230.0/24

Controlled routing changes

– Failover events: the Beacon changes from the state of having both providers to the state of having only a single provider.

– Recovery events: the Beacon changes from the state of having a single provider for connectivity to the state of having both providers.

[Figure: the Beacon and its links to Provider 1 and Provider 2, shown for a failover event and for a recovery event.]
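The two event types can be expressed as transitions between sets of active providers. The following is a minimal sketch of that definition only; the function name `classify_event` and the provider labels are illustrative and do not come from the study's tooling.

```python
from typing import FrozenSet, Optional


def classify_event(before: FrozenSet[str], after: FrozenSet[str]) -> Optional[str]:
    """Label a change in the Beacon's set of active providers.

    Hypothetical helper: it encodes only the slide's definitions of
    failover (two providers -> one) and recovery (one provider -> two).
    """
    if len(before) == 2 and len(after) == 1 and after < before:
        return "failover"
    if len(before) == 1 and len(after) == 2 and before < after:
        return "recovery"
    return None  # any other change is outside the two controlled event types


both = frozenset({"Provider 1", "Provider 2"})
only_p2 = frozenset({"Provider 2"})

print(classify_event(both, only_p2))  # failover
print(classify_event(only_p2, both))  # recovery
```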

Active Probing

From 37 PlanetLab hosts to the Beacon host (a host within the Beacon prefix)

– Back-to-back traceroutes

– Back-to-back pings

– UDP probing (50 ms interval)

Data plane performance metrics

– Packet loss

– Delay

– Out-of-order packets

[Figure: hosts A, B, and C probe the Beacon host across the Internet via Provider 1 and Provider 2; traceroute, ping, and UDP probing feed the metrics.]
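To make the UDP probing and the out-of-order metric concrete, here is a small sketch. The slides give only the 50 ms probing interval; the packet layout (a sequence number plus a send timestamp) and the reordering rule used below are assumptions made for illustration, not the study's actual format.

```python
import struct

PROBE_INTERVAL_S = 0.050  # from the slide: one UDP probe every 50 ms

# Assumed payload layout: 32-bit sequence number + 64-bit float send time,
# in network byte order. The real probe format is not given in the slides.
_PROBE = struct.Struct("!Id")


def encode_probe(seq: int, send_time: float) -> bytes:
    return _PROBE.pack(seq, send_time)


def decode_probe(payload: bytes):
    return _PROBE.unpack(payload)


def count_out_of_order(arrival_seqs) -> int:
    """Count probes that arrive after a probe with a higher sequence number."""
    highest = -1
    reordered = 0
    for seq in arrival_seqs:
        if seq < highest:
            reordered += 1
        else:
            highest = seq
    return reordered


print(decode_probe(encode_probe(7, 7 * PROBE_INTERVAL_S)))
print(count_out_of_order([0, 1, 3, 2, 4]))  # probe 2 arrives after probe 3
```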

Packet Loss

Loss burst: consecutive UDP probing packets lost during a routing change event.

[Figure: two panels, Failover and Recovery.]
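The loss burst definition above can be sketched as a scan over probe sequence numbers. This assumes probes are numbered 0 to `sent_count - 1` and that the receiver logs the sequence numbers it sees; both the function name and the log format are hypothetical.

```python
def loss_bursts(sent_count, received_seqs):
    """Return (first_lost_seq, length) for each run of consecutive lost probes."""
    received = set(received_seqs)
    bursts = []
    start = None
    for seq in range(sent_count):
        if seq not in received:
            if start is None:
                start = seq  # a new burst begins here
        elif start is not None:
            bursts.append((start, seq - start))
            start = None
    if start is not None:  # burst still open when probing stopped
        bursts.append((start, sent_count - start))
    return bursts


# Probes 2-4 and probe 7 never arrive: two loss bursts in one event.
print(loss_bursts(10, [0, 1, 5, 6, 8, 9]))
```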

Correlating Packet Loss with Routing Failures

ICMP replies

– temporary loss of reachability (!N or !H)

– forwarding loops (exceeded TTL)

Routing failures

– temporary loss of reachability and transient routing loops

Correlate loss bursts with ICMP messages

– time window [-1 sec, 1 sec]

This underestimates the number of loss bursts due to routing failures

– ICMP packets can be missing.
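A sketch of the correlation step follows. The slide gives only the window [-1 sec, 1 sec]; this example assumes the window is applied around the span of the loss burst (one second before its start to one second after its end), and that ICMP timestamps are sorted. The function name and sample timestamps are made up.

```python
from bisect import bisect_left

WINDOW_S = 1.0  # from the slide: [-1 sec, 1 sec]


def explained_by_icmp(burst_start, burst_end, icmp_times, window=WINDOW_S):
    """True if any ICMP message time lies in [burst_start - window, burst_end + window].

    Assumption: the window is anchored on the burst's start and end times.
    icmp_times must be sorted in ascending order.
    """
    i = bisect_left(icmp_times, burst_start - window)
    return i < len(icmp_times) and icmp_times[i] <= burst_end + window


icmp_times = [10.2, 55.0]  # illustrative arrival times of !N, !H, or TTL-exceeded replies
print(explained_by_icmp(9.5, 9.9, icmp_times))    # an ICMP reply falls inside the window
print(explained_by_icmp(30.0, 31.0, icmp_times))  # no reply nearby: burst stays unidentified
```

A burst with no matching ICMP message is not necessarily unrelated to routing, which is why the slide notes that the method underestimates.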

An Example

planet02.csc.ncsu.edu experiences packet loss on July 30, 2005

Loss Bursts due to Routing Failures

– Failover events: 76% packets lost

– Recovery events: 26% packets lost

[Figure: two panels, Failover and Recovery.]

How Routing Failures Occur (Failover)?

[Figure: failover topology. Labels: Beacon (AS 0), Provider 1, Provider 2, routers R1 to R6, customer link, peer link; AS paths shown at the routers include (0), (1 0), and (2 0).]

Prefer-customer routing policy: routes received from a provider’s customers are always preferred over those received from its peers.
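The prefer-customer policy can be sketched as a ranking on the neighbor relationship, applied before path length. The slide states only that customer routes beat peer routes; ranking provider routes last and breaking ties on AS-path length are common conventions assumed here, and the names are illustrative.

```python
# Higher value wins. Only "customer over peer" is stated on the slide;
# placing "provider" last is an assumption.
PREFERENCE = {"customer": 3, "peer": 2, "provider": 1}


def best_route(routes):
    """Pick the best route from a list of (relationship, as_path) pairs.

    Relationship is compared first, so a customer route wins even when
    its AS path is longer than a peer route's.
    """
    return max(routes, key=lambda r: (PREFERENCE[r[0]], -len(r[1])))


# A router in Provider 1 holding a direct customer route and a route via its peer:
print(best_route([("peer", (2, 0)), ("customer", (0,))]))
# After the customer route is withdrawn, only the peer route remains:
print(best_route([("peer", (2, 0))]))
```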

How Routing Failures Occur (Failover)? (contd.)

[Figure: failover topology extended with a third provider. Labels: Beacon (AS 0), Provider 1, Provider 2, Provider 3, routers R1 to R9, peer links; AS paths shown at the routers include (0), (1 0), and (2 0).]

No-valley routing policy: peers do not transit traffic from one peer to another.
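The no-valley policy can be written as an export filter. The slide states the peer-to-peer case; treating provider-learned routes the same way is the usual form of the rule and is an assumption here, as is the function name.

```python
def may_export(learned_from: str, send_to: str) -> bool:
    """No-valley export rule (sketch).

    Routes learned from customers may be exported to any neighbor.
    Routes learned from peers or providers are exported only to customers.
    """
    return learned_from == "customer" or send_to == "customer"


print(may_export("peer", "peer"))      # a peer-learned route is not passed to another peer
print(may_export("customer", "peer"))  # a customer route is announced to peers
```

Under this filter, a provider that loses its customer route and falls back to a peer route has nothing it can announce to its other peers, so those peers must find an alternative on their own.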

How Routing Failures Occur? (Recovery)

[Figure: recovery topology. Labels: Beacon (AS 0), Provider 1, Provider 2, routers R1 to R4, announcements of path (0), and a withdrawal of path (2 0).]

1. Path (0) recovers at R3.

2. R3 sends the path to R2.

3. R2 sends a withdrawal to R1.

4. R3 sends the recovery path to R1.

5. R1 regains its connection to the Beacon.

iBGP constraint: a route received from an iBGP peer cannot be forwarded to another iBGP peer.
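The iBGP constraint can be sketched as a one-line re-advertisement check. This assumes a plain iBGP full mesh without route reflectors or confederations, which the slides do not specify; the session labels are illustrative.

```python
def may_readvertise(learned_via: str, send_via: str) -> bool:
    """iBGP rule (sketch): a route learned over iBGP is not sent on over iBGP.

    Assumes a full-mesh iBGP setup with no route reflectors.
    """
    return not (learned_via == "ibgp" and send_via == "ibgp")


# R3 learns the recovered path over eBGP and may tell R2 over iBGP:
print(may_readvertise("ebgp", "ibgp"))
# R2 learned it over iBGP, so it cannot relay it to R1 over iBGP:
print(may_readvertise("ibgp", "ibgp"))
```

One reading of the numbered steps is that R1 has no usable route between R2's withdrawal and R3's own announcement, because R2 is not allowed to pass along what it learned from R3.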

Summary

During failover and recovery events

– Routing changes impact packet loss significantly.

– Multiple loss bursts are observed in 60% of events.

– Routing changes can lead to long packet round-trip delays and reordering.

Loss bursts explained by routing failures last longer than unidentified ones.

Loss bursts caused by forwarding loops last longer than those caused by loop-free routing failures.

Conclusions

During failover and recovery events

– Routing failures contribute to end-to-end packet loss significantly.

Routing policies, iBGP configuration and MRAI timer values play a major role in causing packet loss during routing events.

Degraded end-to-end performance can be experienced by a diverse set of hosts when there is a routing change.

Accommodating routing redundancy may eliminate the majority of identified path failures.

The End

Thanks!

Location of Loss Bursts (Failover events)

Location of the first loss bursts caused by routing failures.

From ISP 2’s BGP updates:

– Routing failures do occur and are not visible from ICMP messages due to short duration.

From another AS’s BGP updates, and Oregon RouteViews:

– Routing failures are cascaded to other ASes.

Class        ISP 1   ISP 2   Other tier-1   Non tier-1
Failover 1   92%     0%      5%             3%
Failover 2   0%      9%      73%            18%

Location of Loss Bursts (Recovery events)

Location of the first loss bursts caused by routing failures.

BGP updates from ISP 2

– 12 withdrawals over 724 recovery events

Class        ISP 1   ISP 2   Other tier-1   Non tier-1
Failover 1   90%     N/A     0%             10%
Failover 2   N/A     0%      59%            41%

Representativeness

Connectivity of Destination Prefixes

– SS: Single-homed prefixes via a single upstream link

– SM: Single-homed prefixes via multiple upstream links

– MS: Multi-homed prefixes via a single upstream link

– MM: Multi-homed prefixes via multiple upstream links

Routing tables from one tier-1 ISP on January 15, 2006

Class        SS    SM   MS    MM
Percentage   48%   6%   29%   17%
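The four classes combine two independent properties, so they can be derived mechanically. In this sketch the inputs (whether a prefix is multi-homed, and how many upstream links reach it) are assumed to be already known; how the study extracted them from the routing tables is not described in the slides.

```python
# Shares from the table above (one tier-1 ISP's routing tables, January 15, 2006).
SHARE = {"SS": 48, "SM": 6, "MS": 29, "MM": 17}


def connectivity_class(multi_homed: bool, upstream_links: int) -> str:
    """Two-letter class: homing (S/M) followed by upstream links (S/M)."""
    homing = "M" if multi_homed else "S"
    links = "M" if upstream_links > 1 else "S"
    return homing + links


multi_homed_share = SHARE["MS"] + SHARE["MM"]  # prefixes that are multi-homed like the Beacon

print(connectivity_class(True, 2))
print(multi_homed_share)
```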

Representativeness (contd.)

Multi-homed destination prefixes

[Figure: multi-homed destination. Labels: ISP 1, ISP 2, ISP 3, destination, customer links, peer link.]

Representativeness (contd.)

Multi-homed destination prefixes with multi-upstream links

[Figure: two example topologies involving ISP 1 and ISP 2.]

Loss Burst Length

Loss burst length can be as long as 480 packets for failover events, and 180 packets for recovery events.

[Figure: loss burst length, with one panel for failover events and one for recovery events.]
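Since a loss burst is counted in consecutive UDP probes and the probes are sent every 50 ms, burst lengths translate into approximate outage durations. A quick calculation, assuming the bursts here were measured with those 50 ms probes:

```python
PROBE_INTERVAL_MS = 50  # UDP probing interval from the Active Probing slide


def burst_duration_s(lost_packets: int) -> float:
    """Approximate time spanned by a run of consecutive lost probes, in seconds."""
    return lost_packets * PROBE_INTERVAL_MS / 1000


print(burst_duration_s(480))  # longest failover burst: about 24 seconds
print(burst_duration_s(180))  # longest recovery burst: about 9 seconds
```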

Multiple Loss Bursts

Multiple loss bursts after the injection of a withdrawal message or an announcement.

[Figure: two panels, Failover and Recovery.]

Methodology Evaluation

Our measurement is not significantly biased by ICMP blocking

– The number of ICMP messages in the absence of routing change (0.6%).

– ICMP messages from 68 ASes, and 53% of them belong to 10 tier-1 ASes.

– 52% of ISP 1’s routers and 95% of ISP 2’s routers generate ICMP messages.