Studying Black Holes on the Internet with Hubble

Ethan Katz-Bassett, Harsha V. Madhyastha, John P. John, Arvind Krishnamurthy, David Wetherall, Thomas Anderson
University of Washington
NSDI, April 2008

This work partially supported by


Global Reachability

• When an address is reachable from every other address
• Most basic goal of Internet, especially BGP
  “There is only one failure, and it is complete partition” (Clark, Design Philosophy of the DARPA Internet Protocols)
• Physical path → BGP path → traffic reaches
• Black hole: BGP path, but traffic persistently does not reach


Does Internet give global reachability?

• From use, seems to usually work
• Can we assume the protocols just make it work?

“Please try to reach my network 194.9.82.0/24 from your networks…. Kindly anyone assist.” Operator on NANOG mailing list, March 2008.



Hubble System Goal

In real-time on a global scale, automatically monitor long-lasting reachability problems and classify causes


Problem Seen by Hubble on Oct. 8, 2007

1. Target Identification – distributed ping monitors detect when the destination becomes unreachable

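[Figure: ping probes and replies between monitors and destination D (packet labels "Fr: X To: D Ping?", "Fr: D To: X Ping!", "Fr: Z To: D Ping?"); timestamps 5:09 a.m. and 5:11 a.m.]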


Problem Seen by Hubble on Oct. 8, 2007

1. Target Identification – distributed ping monitors

2. Reachability analysis – distributed traceroutes determine the extent of unreachability

[Figure timestamp: 5:13 a.m.]


Problem Seen by Hubble on Oct. 8, 2007

1. Target Identification – distributed ping monitors
2. Reachability analysis – distributed traceroutes
3. Problem Classification
   a) group failed traceroutes


Problem Seen by Hubble on Oct. 8, 2007

1. Target Identification – distributed ping monitors
2. Reachability analysis – distributed traceroutes
3. Problem Classification
   a) group failed traceroutes
   b) spoofed probes to isolate direction of failure

[Figure: spoofed ping probes (packet labels "Fr: X To: D Ping?", "Fr: Y To: D Ping?", "Fr: D To: Y Ping!"). Results shown: D to Y works, Y to D fails; D to Z works, Z to D fails.]


Architecture: Detect Problem

• Ping prefix to check if still reachable
  • Every 2 minutes from PlanetLab
  • Report target after series of failed pings
• Maintain BGP tables from RouteViews feeds
  • Allows IP → AS mapping
  • Identify prefixes undergoing BGP changes as targets
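The ping-based detection step can be sketched as a small per-prefix counter. This is a minimal sketch, not Hubble's implementation: the class name `PingMonitor`, the `failure_threshold` parameter, and its default of two consecutive failed rounds are assumptions for illustration (the slide only says "series of failed pings"), and the ping outcomes are made up.

```python
from collections import defaultdict


class PingMonitor:
    """Tracks consecutive failed ping rounds per prefix (hypothetical helper)."""

    def __init__(self, failure_threshold=2):
        # Assumed value: the slide does not give the exact length of the series.
        self.failure_threshold = failure_threshold
        self.consecutive_failures = defaultdict(int)

    def record_round(self, prefix, ping_succeeded):
        """Record one probing round; return True if the prefix should be reported as a target."""
        if ping_succeeded:
            # Any successful ping resets the failure streak.
            self.consecutive_failures[prefix] = 0
            return False
        self.consecutive_failures[prefix] += 1
        return self.consecutive_failures[prefix] >= self.failure_threshold


monitor = PingMonitor()
rounds = [True, False, False, True]  # made-up outcomes of four 2-minute rounds
reports = [monitor.record_round("194.9.82.0/24", ok) for ok in rounds]
print(reports)
```

A single lost ping does not trigger a report here; only a streak of failures does, which keeps transient loss from being flagged as a reachability problem.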


Architecture: Assess Extent of Problem

• Traceroutes to gather topological data
  • Keep probing while problem persists
  • Every 15 minutes from 35 PlanetLab sites
• Analyze which traceroutes reach
  • BGP table to map addresses to ASes
  • Alias information to map interfaces to routers
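Mapping traceroute hops to ASes with a BGP table amounts to a longest-prefix match. The sketch below illustrates that idea using Python's standard `ipaddress` module. The table contents (documentation prefixes and private-use AS numbers) and the function names `ip_to_as` and `as_path` are hypothetical, not taken from Hubble.

```python
import ipaddress

# Hypothetical BGP table: prefix -> origin AS (documentation prefixes, private-use ASNs).
BGP_TABLE = {
    "192.0.2.0/24": 64500,
    "198.51.100.0/24": 64501,
    "198.51.100.128/25": 64502,
}


def ip_to_as(address, table=BGP_TABLE):
    """Return the origin AS of the longest matching prefix, or None if unrouted."""
    ip = ipaddress.ip_address(address)
    best_len, best_as = -1, None
    for prefix, origin_as in table.items():
        network = ipaddress.ip_network(prefix)
        if ip in network and network.prefixlen > best_len:
            best_len, best_as = network.prefixlen, origin_as
    return best_as


def as_path(traceroute_hops):
    """Collapse a list of hop IPs into the sequence of distinct ASes traversed."""
    path = []
    for hop in traceroute_hops:
        asn = ip_to_as(hop)
        if asn is not None and (not path or path[-1] != asn):
            path.append(asn)
    return path


print(ip_to_as("198.51.100.200"))
print(as_path(["192.0.2.1", "192.0.2.9", "198.51.100.5", "198.51.100.200"]))
```

A linear scan is fine for a toy table; a real table with hundreds of thousands of prefixes would want a trie or similar structure.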


Architecture: Classify Problem

To aid operators in diagnosis and repair:
• Which ISP contains problem?
• Which routers?
• Which destinations?


Architecture: Classify Problem

• Real-time, automated classification
• Find common entity that explains substantial number of failed traceroutes to a prefix
  • Does not have to explain all failed traceroutes
  • Not necessarily pinpointing exact problem


Classifying with Current Topology

• Group failed/successful traceroutes by last AS, router

Example: Router problem
• No probes reach P through router R
• Some reach through R’s AS
• 28% of classified problems
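The "router problem" pattern can be illustrated with a short grouping routine. This is a sketch under stated assumptions, not Hubble's classifier: the traceroute record layout (`reached`, `hops` as `(router, asn)` pairs) and the function name are hypothetical, and the sample data is invented.

```python
from collections import Counter


def classify_router_problem(traceroutes):
    """Return (router, asn) if a last-hop router fits the 'router problem' pattern, else None.

    Each traceroute is a dict: {"reached": bool, "hops": [(router, asn), ...]}.
    """
    failed = [t for t in traceroutes if not t["reached"] and t["hops"]]
    succeeded = [t for t in traceroutes if t["reached"]]

    # Group failed traceroutes by their last responsive hop.
    last_hops = Counter(t["hops"][-1] for t in failed)

    for (router, asn), _count in last_hops.most_common():
        # No probe reaches the prefix through router R ...
        through_router = any((router, asn) in t["hops"] for t in succeeded)
        # ... but some probe reaches it through R's AS.
        through_as = any(asn in {a for _, a in t["hops"]} for t in succeeded)
        if not through_router and through_as:
            return router, asn
    return None


traces = [
    {"reached": False, "hops": [("A1", "AS_A"), ("R", "AS_X")]},
    {"reached": False, "hops": [("B1", "AS_B"), ("R", "AS_X")]},
    {"reached": True, "hops": [("C1", "AS_C"), ("Q", "AS_X"), ("P1", "AS_P")]},
]
print(classify_router_problem(traces))
```

Candidates are tried in order of how many failed traceroutes they explain, which matches the goal of finding an entity that explains a substantial number of failures rather than all of them.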


Classifying with Historical Topology

• Daily probes from PlanetLab to all prefixes
• Gives baseline view of paths before problems

Example: “Next hop” problem
• Paths previously converged on router R
• Now terminate just before R
• 14% of classified problems
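One way to picture the "next hop" check is to compare where each failed traceroute stops against that vantage point's baseline path. The sketch below is an illustration only: the dictionaries keyed by vantage point and the function name `next_hop_problem` are assumptions, and routers are represented by plain strings.

```python
def next_hop_problem(baseline_paths, failed_paths):
    """Return router R if all failed paths stop at the hop just before R, else None.

    baseline_paths: {vantage: [router, ...]} from the daily historical probes.
    failed_paths:   {vantage: [router, ...]} hops seen before the current traceroute died.
    """
    if not failed_paths:
        return None
    suspects = set()
    for vantage, hops in failed_paths.items():
        baseline = baseline_paths.get(vantage)
        if not baseline or not hops or hops[-1] not in baseline:
            return None  # no usable baseline for this vantage point
        idx = baseline.index(hops[-1])
        if idx + 1 >= len(baseline):
            return None  # failed at the end of the baseline path
        suspects.add(baseline[idx + 1])  # router that used to come next
    # The baseline paths must converge on a single router R.
    return suspects.pop() if len(suspects) == 1 else None


baseline = {"v1": ["a", "b", "R", "p"], "v2": ["c", "d", "R", "p"]}
failed = {"v1": ["a", "b"], "v2": ["c", "d"]}
print(next_hop_problem(baseline, failed))
```

Here R never appears in the failed traceroutes themselves, which is why the historical baseline is needed to name it.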


Classifying with Direction Isolation

• Internet paths can be asymmetric
• Traceroutes only return routers on forward path
  • Might assume last hop is problem
  • Even so, require working reverse path
  • Hard to determine reverse path
• Isolate forward from reverse to test individually
• Without node behind problem, use spoofed probes
  • Spoof from S to check forward path from S
  • Spoof as S to check reverse path back to S
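The two spoofed tests can be expressed as a small decision procedure. This is a simulation sketch, not probing code: `works(a, b)` is a hypothetical oracle standing in for real packet delivery, `helper` is assumed to be a vantage point with working paths to and from the destination, and the returned labels are chosen here for illustration.

```python
def isolate_direction(works, source, helper, dest):
    """Classify a source<->dest failure as 'forward', 'reverse', 'both', or 'none'.

    works(a, b) -> bool says whether packets sent from a arrive at b
    (a stand-in for actually sending probes).
    """
    # Forward test: source sends a ping spoofed as helper; dest's reply goes to helper.
    forward_ok = works(source, dest) and works(dest, helper)
    # Reverse test: helper sends a ping spoofed as source; dest's reply goes to source.
    reverse_ok = works(helper, dest) and works(dest, source)

    if forward_ok and reverse_ok:
        return "none"
    if forward_ok:
        return "reverse"
    if reverse_ok:
        return "forward"
    return "both"


# Made-up scenario: packets from Y to D are dropped, everything else is delivered.
broken = {("Y", "D")}


def delivered(a, b):
    return (a, b) not in broken


print(isolate_direction(delivered, "Y", "X", "D"))
```

Each test exercises only one direction of the suspect path, because the other half of the round trip runs over the helper's working path.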


Classifying with Direction Isolation

• Hubble deployment on RON employs spoofed probes
  • 6 of 13 RON nodes permit source spoofing
  • PlanetLab does not support source spoofing

Example: Multi-homed provider problem
• Probes through Provider B fail
• Some reach through Provider A
• Like Cox/USC
• 6% of classified problems


Architecture: Summary of Approach

• Synthesis of multiple information sources
  • Passive monitoring of route advertisements
  • Active monitoring from distributed vantage points
• Historical monitoring data to enable troubleshooting
• Topological classification and spoofing point at problem


Evaluation

• Target Identification
  • How much of the Internet does Hubble monitor?
• Reachability Analysis
  • What percentage of the various paths to a prefix does Hubble analyze?
• Problem Classification
  • How often can Hubble identify a common entity that explains the failed paths to a prefix?
  • How often does spoofing isolate the failure direction?

For further evaluation, please see the paper.


How much does Hubble monitor?

Every 2 minutes:
• 110,000 prefixes
• 89% of Internet’s edge address space
• 92% of edge ASes
• Origin ASes for 99% of 14M BitTorrent users


What % of paths does Hubble monitor?

[Figure: AS-level topology with tiers labeled Tier 1, Transit, and Stub; ASes shown: AT&T, Sprint, Cenic, Gigapop, Abilene, UW, WSU, UT, UM, MIT, Intel]

• PlanetLab’s restricted size and homogeneity limit uphill
• 90% of our failed traceroutes terminate within 2 AS hops of prefix’s origin
• Compare with BGP paths of 447 RIPE peers (downhill ASes)


What % of paths does Hubble monitor?

[Figure: AS-level topology with tiers labeled Tier 1, Transit, and Stub; ASes shown: AT&T, Sprint, Cenic, Gigapop, Abilene, UW, WSU, UT, UM, MIT, Intel]

• BGP ASes: { AT&T, Sprint, Gigapop, Cenic, Intel }
• Also on Traceroutes: { Sprint, Gigapop, Cenic, Intel }
• Coverage for Intel prefix: 4 of 5 downhill ASes = 80%
• Compare with BGP paths of 447 RIPE peers (downhill ASes)
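The coverage figure for the Intel prefix is a set intersection over the downhill ASes. A minimal sketch using the AS sets from the slide (the function name `coverage` is hypothetical):

```python
# AS sets as listed on the slide for the Intel prefix.
bgp_ases = {"AT&T", "Sprint", "Gigapop", "Cenic", "Intel"}
traceroute_ases = {"Sprint", "Gigapop", "Cenic", "Intel"}


def coverage(bgp, traced):
    """Fraction of downhill ASes on BGP paths that also appear on traceroutes."""
    return len(bgp & traced) / len(bgp)


print(f"{coverage(bgp_ases, traceroute_ases):.0%}")
```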


What % of paths does Hubble monitor?

[Figure: AS-level topology with tiers labeled Tier 1, Transit, and Stub; ASes shown: AT&T, Sprint, Cenic, Gigapop, Abilene, UW, WSU, UT, UM, MIT, Intel]

Overall for prefixes monitored by Hubble:
• For >60% of prefixes, traverse ALL downhill RIPE ASes
• For 90% of prefixes, traverse more than half the ASes
• Compare with BGP paths of 447 RIPE peers (downhill ASes)


How often can Hubble classify?

• 9 classes currently
  • Based on topology
  • Point to an AS and/or router
• Results from first week of February 2008
  • Automatically classified 375,775/457,960 (82%) of problems as they occurred


How often does spoofing work?

When a RON path works and another does not:
• Isolate 68% of failures from spoofing sources
• 47% forward, 21% reverse


How long do black holes last?

• 3 week study starting September 17, 2007
• 31,000 black holes involving 10,000 prefixes
• 20% lasted at least 10 hours!
• 68% were cases of partial reachability


Partial reachability:
• Can’t be just hardware failure
• Configuration/policy


Other Measurement Results

• Can’t find problems using only BGP updates
  • Only 38% of problems correlate with RouteViews updates
• Multi-homing may not give resilience against failure
  • 100s of multi-homed prefixes had provider problems like Cox/USC, and ALL occurred on path TO prefix
• Inconsistencies across an AS
  • For an AS responsible for partial reachability, usually some paths work and some do not
• Path changes accompany failures
  • 3/4 router problems are with routers NOT on baseline path


Conclusions and Future Work

• Hubble: working real-time system
• Lots of reachability problems, some long lasting
• Baseline/fine-grained data enable problem classification
• Spoofing to isolate direction of path failures

http://hubble.cs.washington.edu
Uses iPlane, MaxMind, Google Maps


Thanks!

http://hubble.cs.washington.edu


Long term prospects for spoofing?

Support for spoofing:
• No complaints about our spoofed probes
• Can receive spoofed probes at PlanetLab
• PlanetLab support in future kernels?
• Router vendor talking to us about router support for measurements

Alternatives to spoofing:
• Traceroute servers behind problems
• End-hosts behind problems


Comparison to PlanetSeer [OSDI ‘04]

• Most similar system
• Passively monitors CoDeeN clients, probes on anomalies
• Different and complementary analysis

PlanetSeer:
• Clients that connected within 15 minutes
• 43% edge ASes (sum over 3 months)
• Not problems that prevent access to CDN

Hubble:
• All prefixes every 2 minutes
• 92% edge ASes (every 2 minutes)
• All partial or complete reachability problems


Characteristics of Problems of Interest

• Routable prefix present in BGP tables
• Persistent through 2 rounds of probes
• Routing infrastructure failures
  • Not simply end-system/end-network failure
  • Judgments based on connectivity to origin AS
• Not simply source problem
  • Monitor if less than 90% of vantages reach
  • Based on 4 months of probes to 110K prefixes
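These criteria can be read as a filter applied to each prefix. The sketch below is one possible reading, not Hubble's code: it assumes "reach" means reaching the origin AS, treats the 90% figure as a per-round threshold on the fraction of vantage points, and uses hypothetical names (`is_problem_of_interest`, `min_rounds`, `reach_threshold`).

```python
def is_problem_of_interest(in_bgp_table, reach_fraction_by_round,
                           min_rounds=2, reach_threshold=0.90):
    """Apply the slide's filters to one prefix (a sketch under the assumptions above).

    reach_fraction_by_round: fraction of vantage points reaching the origin AS,
    one entry per probing round, oldest first.
    """
    if not in_bgp_table:
        return False  # must be a routable prefix present in BGP tables
    recent = reach_fraction_by_round[-min_rounds:]
    if len(recent) < min_rounds:
        return False  # not yet observed for enough rounds
    # Persistent: every one of the last min_rounds rounds is below the threshold.
    return all(fraction < reach_threshold for fraction in recent)


print(is_problem_of_interest(True, [1.0, 0.6, 0.5]))
print(is_problem_of_interest(True, [0.5, 0.95]))
```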


How well does Hubble work?

Scale:
• 89% of the Internet’s edge address space
• 92% of edge ASes
• Origin ASes for 99% of 14M BitTorrent users

Effectiveness:
• Finds 85% of black holes, 95% of those that last at least 1 hr [compared to pervasive approach]

Cost:
• 5.5% of the probes required by pervasive approach


Does spoofing work?

When 3+ spoofing RON nodes fail to reach:
• Isolate all failed paths in 61% of cases
• 42% forward, 16% reverse, 3% mixed
• For 95% of cases, all paths isolate to same direction


Provider(s) Unreachable

• No probes reach even the provider(s) of Origin AS

• Probes fail in AS upstream

3% of classified problems (1-13% at any point in time)


Single-homed Origin AS Down

• No probes reach single-homed Origin AS

• Some reach its provider

17% of classified problems (4-37% at any point in time)


Multi-homed Origin AS Down

• No probes reach multi-homed Origin AS

• Some reach its provider(s)

9% of classified problems (2-30% at any point in time)


Provider AS Problem for Multi-Homed

• Probes through Provider B fail to reach P

• Some reach through Provider A

6% of classified problems (1-17% at any point in time)


Non-Provider AS Problem

• Probes through Non-Provider C fail

• Some reach through other ASes

17% of classified problems (1-37% at any point in time)


Router Problem on Known Path

• Last hop router R was seen on recent paths reaching P

• No probes reach P through R

• Some reach through R’s AS

7% of classified problems (1-40% at any point in time)

[Figure label: Historical Traceroutes]


Router Problem on New Path

• Last hop router R not seen on recent paths reaching P

• No probes reach P through R

• Some reach through R’s AS

21% of classified problems (1-40% at any point in time)


Next Hop Problem on Known Paths

• No last hop router or AS explains problem

• Paths previously converged on router R

• Now terminate just before R

14% of classified problems (1-39% at any point in time)


Topological classification results

Of ones we classify: Overall (range over time)

1. Provider(s) unreachable: 3% (1-13%)

2. Single-homed origin AS down: 17% (4-37%)

3. Multi-homed origin AS down: 9% (2-30%)

4. Provider AS problem for multi-homed origin AS: 6% (1-17%)

5. Non-provider AS problem: 17% (1-37%)

6. Router problem on old path: 7% (1-40%)

7. Router problem on new path: 21% (1-40%)

8. Next hop problem on known paths: 14% (1-39%)

9. Prefix unreachable: 22% (7-79%)