Transient BGP Loops Do they matter, and what can be done about them? Nate Kushman MIT/Akamai Srikanth Kandula, Dina Katabi and John Wroclawski

Post on 18-Dec-2015


Page 1

Transient BGP Loops

Do they matter, and what can be done about them?

Nate Kushman

MIT/Akamai

Srikanth Kandula, Dina Katabi and John Wroclawski

Page 2

What causes “Transient BGP Loops”?

[Diagram: networks MIT, Bob, Joe, AT&T and Sprint. Annotations: Maintenance; Withdraw MIT]

Page 3

What causes “Transient BGP Loops”?

[Diagram: networks MIT, Bob, Joe, AT&T and Sprint. Annotation: Maintenance]

Page 4

What causes “Transient BGP Loops”?

[Diagram: networks MIT, Bob, Joe, AT&T and Sprint. Annotations: Maintenance; Withdraw MIT]

Page 5

What causes “Transient BGP Loops”?

[Diagram: networks MIT, Bob, Joe, AT&T and Sprint. Annotations: Maintenance; Routing Loop]

Page 6

What causes “Transient BGP Loops”?

[Diagram: networks MIT, Bob, Joe, AT&T and Sprint. Annotations: Maintenance; Withdraw MIT]

Page 7

What causes “Transient BGP Loops”?

[Diagram: networks MIT, Bob, Joe, AT&T and Sprint. Annotation: Maintenance]

Page 8

What causes “Transient BGP Loops”?

[Diagram: networks MIT, Bob, Joe, AT&T and Sprint. Annotation: Maintenance]

Page 9

How common are “Transient Inter-domain Routing Loops”?

• Sprint Study (IMC 2003, IMW 2002):

– Looked at packet traces from the Sprint backbone

– Up to 90% of the observed packet loss was caused by routing loops

– 60-100% of the loops were attributable to BGP

Page 10

Routing Loop Damage

• Our Study:

– 20 vantage points with BGP feeds

– 2 months

– 70,000 unique prefixes

– Pinged once every 2 minutes

– Trace-routed once every 30 minutes

– TTL Exceeded responses used to detect loops

– Additional pings and traceroutes when loops were detected
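A minimal sketch of how a loop might be flagged from traceroute output. The slides only say that TTL Exceeded responses were used to detect loops; the rule below (a router address that answers at two different hop counts) is an assumption for illustration, and `find_loop` and the documentation-range addresses are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple


def find_loop(hops: Sequence[Optional[str]]) -> Optional[Tuple[int, int]]:
    """Return (first_index, repeat_index) for the first address seen twice.

    `hops` is the list of router addresses that sent TTL Exceeded replies,
    in hop order. A None entry stands for a probe that got no reply.
    Returns None when no address repeats.
    """
    first_seen: Dict[str, int] = {}
    for index, hop in enumerate(hops):
        if hop is None:
            continue  # unanswered probe, nothing to compare
        if hop in first_seen:
            return (first_seen[hop], index)
        first_seen[hop] = index
    return None


# Hypothetical traces: the first bounces between two routers, the second does not.
looping_trace = ["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.2", "192.0.2.3"]
clean_trace = ["192.0.2.1", None, "192.0.2.9"]

loop_result = find_loop(looping_trace)
clean_result = find_loop(clean_trace)
```

A real measurement would also need to rule out benign repeats, which this sketch does not attempt.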

Page 11

Routing Loop Damage

10-15% of updates cause routing loops

Page 12

Collateral Damage

[Diagram: AS A, AS B, AS C, AS D, AS E and AS F]

Page 13

Collateral Damage

[Diagram: AS A, AS B, AS C, AS D, AS E and AS F. Annotations: Collateral Damage; X]

Page 14

Collateral Damage

Prefixes sharing a loopy link see 19% loss

[Plot: percentage of packet loss (y-axis, 0 to 20) over 100 second windows around sharing a loopy link (x-axis, -1000 to 1000)]

Page 15

What should be done?

We should prevent forwarding loops

Page 16

A loop occurs because:

One AS pushes a route update to its data plane, but other ASes, not yet aware of the change, still send packets along the old route
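The mechanism can be illustrated with a toy forwarding simulation. This is a sketch under stated assumptions: the topology (Joe reaches MIT via Bob or AT&T, AT&T via Joe or Sprint, Sprint directly) is pieced together from the diagram labels and callouts in these slides, and `forward` and `has_loop` are hypothetical helpers, not part of any BGP implementation.

```python
from typing import Dict, List, Optional

NextHops = Dict[str, Optional[str]]


def forward(next_hop: NextHops, src: str, dst: str, ttl: int = 8) -> List[str]:
    """Follow per-AS next hops from src toward dst until arrival, drop, or TTL expiry."""
    path = [src]
    node = src
    while node != dst and ttl > 0:
        nxt = next_hop.get(node)
        if nxt is None:
            break  # no route at this AS: the packet is dropped
        path.append(nxt)
        node = nxt
        ttl -= 1
    return path


def has_loop(path: List[str]) -> bool:
    """A path loops if any AS appears in it more than once."""
    return len(set(path)) < len(path)


# Assumed next hops toward MIT before the maintenance event.
before: NextHops = {"Joe": "Bob", "Bob": "MIT", "AT&T": "Joe", "Sprint": "MIT"}

# Bob has withdrawn MIT and Joe has already switched its data plane to AT&T,
# but AT&T has not heard about the change and still forwards via Joe.
transient: NextHops = {"Joe": "AT&T", "Bob": None, "AT&T": "Joe", "Sprint": "MIT"}

# Once AT&T learns of the change, it moves to the Sprint path.
converged: NextHops = {"Joe": "AT&T", "Bob": None, "AT&T": "Sprint", "Sprint": "MIT"}

path_before = forward(before, "Joe", "MIT")
path_transient = forward(transient, "Joe", "MIT")
path_converged = forward(converged, "Joe", "MIT")
```

In the transient state the packet bounces between Joe and AT&T until its TTL runs out, which is the window in which a transient loop exists.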

Page 17

How can we avoid Routing Loops?

[Diagram: networks MIT, Bob, Joe, AT&T and Sprint. Annotations: Maintenance; Withdraw MIT]

Page 18

How can we avoid Routing Loops?

[Diagram: networks MIT, Bob, Joe, AT&T and Sprint. Annotations: Maintenance; Withdraw MIT]

AT&T still thinks Joe is routing through Bob

Page 19

How can we avoid Routing Loops?

[Diagram: networks MIT, Bob, Joe, AT&T and Sprint. Annotation: Maintenance]

What if: AT&T knew about Joe’s change before making its own?

Page 20

Suspension

• Continue to route traffic

• Tell control system not to propagate the route

Page 21

How can we avoid Routing Loops?

[Diagram: networks MIT, Bob, Joe, AT&T and Sprint. Annotations: Maintenance; Withdraw MIT]

Page 22

How can we avoid Routing Loops?

[Diagram: networks MIT, Bob, Joe, AT&T and Sprint. Annotations: Maintenance; Withdraw MIT]

What if: Joe sends its update before changing its forwarding table?

Page 23

How can we avoid Routing Loops?

[Diagram: networks MIT, Bob, Joe, AT&T and Sprint. Annotation: Maintenance]

Page 24

How can we avoid Routing Loops?

[Diagram: networks MIT, Bob, Joe, AT&T and Sprint. Annotation: Maintenance]

And also waits for an Ack from AT&T before updating its forwarding table?

Page 25

How can we avoid Routing Loops?

[Diagram: networks MIT, Bob, Joe, AT&T and Sprint. Annotation: Maintenance]

Then we can be sure that AT&T knows about the path change before it happens and will not use the path

Page 26

How can we avoid Routing Loops?

[Diagram: networks MIT, Bob, Joe, AT&T and Sprint. Annotation: Maintenance]

Instead, AT&T will move immediately to the Sprint path and the loop is avoided.
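The ordering described on the preceding slides (announce first, wait for the Ack, then change the forwarding table) can be sketched as a sequence of forwarding-state snapshots. This is an illustration, not the authors' protocol: the topology is the same assumed one as in the diagram labels, Bob is assumed to keep forwarding during the exchange (the "Suspension" idea), and `forward`, `loop_free` and `ordered_switch` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

NextHops = Dict[str, Optional[str]]


def forward(next_hop: NextHops, src: str, dst: str, ttl: int = 8) -> List[str]:
    """Follow per-AS next hops from src toward dst until arrival, drop, or TTL expiry."""
    path = [src]
    node = src
    while node != dst and ttl > 0:
        nxt = next_hop.get(node)
        if nxt is None:
            break
        path.append(nxt)
        node = nxt
        ttl -= 1
    return path


def loop_free(next_hop: NextHops, dst: str = "MIT") -> bool:
    """True if no AS in the table has a forwarding path that revisits an AS."""
    for src in next_hop:
        path = forward(next_hop, src, dst)
        if len(set(path)) < len(path):
            return False
    return True


def ordered_switch(start: NextHops) -> List[Tuple[str, NextHops]]:
    """Return labeled snapshots of the forwarding state after each step."""
    fib = dict(start)
    snapshots = []
    # Step 1: Joe announces its new path via AT&T but keeps forwarding via Bob.
    snapshots.append(("Joe announces, data plane unchanged", dict(fib)))
    # Step 2: AT&T learns Joe's new path would run through AT&T itself,
    # so it moves to the Sprint path and then sends its Ack.
    fib["AT&T"] = "Sprint"
    snapshots.append(("AT&T moves to Sprint and Acks", dict(fib)))
    # Step 3: only after the Ack does Joe update its forwarding table.
    fib["Joe"] = "AT&T"
    snapshots.append(("Joe switches to AT&T", dict(fib)))
    return snapshots


# Assumed starting state; Bob still forwards while the old route is suspended.
start: NextHops = {"Joe": "Bob", "Bob": "MIT", "AT&T": "Joe", "Sprint": "MIT"}
snapshots = ordered_switch(start)
all_loop_free = all(loop_free(fib) for _, fib in snapshots)
final_path = forward(snapshots[-1][1], "Joe", "MIT")
```

Every snapshot is loop-free in this toy case because AT&T leaves the path through Joe before Joe starts sending traffic to AT&T.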

Page 27

More Generally

• We have proven:

– Loops are prevented in the general case

– Convergence properties similar to normal BGP

• All sorts of good proofs and stuff:

– http://nms.lcs.mit.edu/~nkushman/

Page 28

Your feedback

• Clearly:

– Planned maintenance events

• 20% of update events are caused by planned maintenance

– Link up events

• What about?

– Unplanned link down events

– Trade-off between loss on current path and collateral damage

Page 29

In Short

• Routing loops cause significant performance problems

• Even prefixes with no BGP updates are significantly affected by loops

• A simple change to BGP can avoid all routing loops

Page 30

Questions?