network troubleshooting: rcc and beyond nick feamster georgia tech (joint with russ clark, yiyi...

16
Network Troubleshooting: rcc and Beyond Nick Feamster Georgia Tech (joint with Russ Clark, Yiyi Huang, Anukool Lakhina)

Upload: diego-lee

Post on 27-Mar-2015

215 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Network Troubleshooting: rcc and Beyond Nick Feamster Georgia Tech (joint with Russ Clark, Yiyi Huang, Anukool Lakhina)

Network Troubleshooting:rcc and Beyond

Nick FeamsterGeorgia Tech

(joint with Russ Clark, Yiyi Huang, Anukool Lakhina)

Page 2: Network Troubleshooting: rcc and Beyond Nick Feamster Georgia Tech (joint with Russ Clark, Yiyi Huang, Anukool Lakhina)

2

rcc: Router Configuration Checker• Proactive routing configuration analysis

• Idea: Analyze configuration before deployment

ConfigureDetectFaults

Deploy

rcc

Many faults can be detected with static analysis.

Page 3: Network Troubleshooting: rcc and Beyond Nick Feamster Georgia Tech (joint with Russ Clark, Yiyi Huang, Anukool Lakhina)

3

rcc Implementation

Preprocessor Parser

Verifier

Distributed routerconfigurations Relational

Database(mySQL)

Constraints

Faults

(Cisco, Avici, Juniper, Procket, etc.)

http://nms.csail.mit.edu/rcc/

Page 4: Network Troubleshooting: rcc and Beyond Nick Feamster Georgia Tech (joint with Russ Clark, Yiyi Huang, Anukool Lakhina)

4

rcc Interface

Page 5: Network Troubleshooting: rcc and Beyond Nick Feamster Georgia Tech (joint with Russ Clark, Yiyi Huang, Anukool Lakhina)

5

Parsing Configuration

Page 6: Network Troubleshooting: rcc and Beyond Nick Feamster Georgia Tech (joint with Russ Clark, Yiyi Huang, Anukool Lakhina)

6

List of Faults

Page 7: Network Troubleshooting: rcc and Beyond Nick Feamster Georgia Tech (joint with Russ Clark, Yiyi Huang, Anukool Lakhina)

7

Yes, but Surprises Happen!

• Link failures• Node failures• Traffic volumes shift• Network devices “wedged”• …

• Two problems– Detection– Localization

Page 8: Network Troubleshooting: rcc and Beyond Nick Feamster Georgia Tech (joint with Russ Clark, Yiyi Huang, Anukool Lakhina)

8

A Closer Look

• Proactive analysis– Fault avoidance– Policy conformance

• Reactive diagnosis– Correcting network faults

• Detection• Localization

– Active and passive measurements– Need user’s perspective

Idea: These analyses should inform each other

Page 9: Network Troubleshooting: rcc and Beyond Nick Feamster Georgia Tech (joint with Russ Clark, Yiyi Huang, Anukool Lakhina)

9

Detection: Analyze Routing Dynamics

• Idea: Routers exhibit correlated behavior

Blips across signals may be more operationally interesting than any spike in one.

Page 10: Network Troubleshooting: rcc and Beyond Nick Feamster Georgia Tech (joint with Russ Clark, Yiyi Huang, Anukool Lakhina)

10

Detection Three Types of Events

• Single-router bursts• Correlated bursts• Multi-router bursts

• Common• Commonly

missed using thresholds

Page 11: Network Troubleshooting: rcc and Beyond Nick Feamster Georgia Tech (joint with Russ Clark, Yiyi Huang, Anukool Lakhina)

11

Localization: Joint Dynamic/Static

• Which routers are “border routers” for that burst• Topological properties of routers in the burst

Static Dynamic

Proactive Analysis

Deployment

Reactive Detection

Diagnosis/Correction

Page 12: Network Troubleshooting: rcc and Beyond Nick Feamster Georgia Tech (joint with Russ Clark, Yiyi Huang, Anukool Lakhina)

12

Configuration Analysis: Next Steps

• BGP/MPLS Layer 3 VPNs– Need access to these configurations to do this!– Help needed!

• Firewall and switch configurations– Take high-level operator policy as input– Analyze static configuration to see whether

configuration matches policy– Perform active probing experiments to check

Page 13: Network Troubleshooting: rcc and Beyond Nick Feamster Georgia Tech (joint with Russ Clark, Yiyi Huang, Anukool Lakhina)

13

Firewall configuration: Case Study

• Georgia Tech Campus Network– Research and Administrative Network– 180 buildings– 130+ firewalls– 1700+ switches– 55000+ ports

• Problem: Availability/Reachability– Flux in firewall, router, switch configurations– No common authority over changes made

Page 14: Network Troubleshooting: rcc and Beyond Nick Feamster Georgia Tech (joint with Russ Clark, Yiyi Huang, Anukool Lakhina)

14

Specific Focus: Firewall Configuration

• Difficult to understand and audit configs

• Subject to continual modifications– Roughly 1-2 touches per day

• Federated policy, distributed dependencies– Each department has independent policies– Local changes may affect global behavior

Page 15: Network Troubleshooting: rcc and Beyond Nick Feamster Georgia Tech (joint with Russ Clark, Yiyi Huang, Anukool Lakhina)

15

Firewall Configurations

• Georgia Tech Campus Network– Research and Administrative Network– 180 buildings– 130+ firewalls– 1700+ switches– 55000+ ports

• Problem: Availability/Reachability– Flux in firewall, router, switch configurations– No common authority over changes made

Page 16: Network Troubleshooting: rcc and Beyond Nick Feamster Georgia Tech (joint with Russ Clark, Yiyi Huang, Anukool Lakhina)

16

Specific Focus: Firewall Configuration

• Difficult to understand and audit configs

• Subject to continual modifications– Roughly 1-2 touches per day

• Federated policy, distributed dependencies– Each department has independent policies– Local changes may affect global behavior