iknowwhatyourpacketdidlasthop: usingpackethistories … · takes through a network plus the switch...

34
I Know What Your Packet Did Last Hop: Using Packet Histories to Troubleshoot Networks Nikhil Handigol With Brandon Heller, Vimal Jeyakumar, David Mazières, Nick McKeown NSDI 2014, SeaOle, WA April 2, 2014

Upload: others

Post on 20-Jul-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history

I  Know  What  Your  Packet  Did  Last  Hop:  Using  Packet  Histories    

to  Troubleshoot  Networks  

Nikhil  Handigol  With  

Brandon  Heller,  Vimal  Jeyakumar,  David  Mazières,  Nick  McKeown  NSDI  2014,  SeaOle,  WA  

April  2,  2014  

Page 2: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history

2  

Bug  Story:  Incomplete  Handover  A  

B  

Switch  X  

WiFi  AP  Y   WiFi  AP  Z  

Match:    AcQon  Src  A,  Dst  B:  Output  to  Y  

Page 3: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history

Network  Outages  make  news  headlines  

3  

HosQng.com's  New  Jersey  data  center  was  taken  down  on  June  1,  2010,  igniQng  a  cloud  outage  and  connec7vity  loss  for  nearly  two  hours…  HosQng.com  said  the  connecQvity  loss  was  due  to  a  so>ware  bug  in  a  Cisco  switch  that  caused  the  switch  to  fail.  

On  April  26,  2010,  NetSuite  suffered  a  service  outage  that  rendered  its  cloud-­‐based  applicaQons  inaccessible  to  customers  worldwide  for  30  minutes…  NetSuite  blamed  a  network  issue  for  the  downQme.  

The  Planet  was  rocked  by  a  pair  of  network  outages  that  knocked  it  off  line  for  about  90  minutes  on  May  2,  2010.  The  outages  caused  disrupQons  for  another  90  minutes  the  following  morning....  InvesQgaQon  found  that  the  outage  was  caused  by  a  fault  in  a  router  in  one  of  the  company's  data  centers.    

Page 4: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history

TroubleshooQng  Networks  is  Hard  Today    

     

 

   

4  

ping

traceroute

tcpdump/SPAN/sFlow

SNMP

Forwarding  State  

Forwarding  State  

Forwarding  State  

Forwarding  State  

Forwarding  State  

•  Tedious  and  ad  hoc  •  Requires  skill  and  experience  •  Not  guaranteed  to  provide  helpful  answers  

Lots  and  lots  of  graphs  

(source:  NANOG  Survey  in  “AutomaQc  Test  Packet  GeneraQon”,  Hongyi  Zeng,  et.  al.)  

Page 5: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history

We  want  complete  network  visibility  

5  

ping

traceroute sFlow

SNMP

Complete  visibility:  every  event  that  ever  happened  to  every  packet  

0   100  

We  want  to  be  here  Visibility  Spectrum  

Page 6: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history

Talk  Outline  

1.  How  to  achieve  complete  network  visibility  – An  abstracQon:  Packet  History  – A  plagorm:  NetSight  

2.  Why  achieving  complete  visibility  is  feasible  – Data  compression  – MapReduce-­‐style  scale-­‐out  design  

6  

Page 7: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history

7  

   

   

   

Forwarding  State  

Forwarding  State  

Forwarding  State  

Forwarding  State  

Packet  History  

Packet  history    =    Path  taken  by  a  packet                                  +  Header  modificaQons  

                                         +  Switch  state  encountered  

Page 8: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history

Our  TroubleshooQng  Workflow  

8  

   

   

   

Forwarding  State  

Forwarding  State  

Forwarding  State  

Forwarding  State  

1.  Record  and  store  all  packet  histories  2.  Query  and  use  packet  histories  of  errant  packets  

Page 9: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history

NetSight  A  plagorm  to  capture  and  filter  packet  histories  of  interest  

9  

Page 10: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history

Postcard  Collector  

10  

   

   

   

Control  Plane  

Flow  Table  State  Recorder  

Page 11: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history

Postcard  Collector  

11  

   

   

   

Control  Plane  

Flow  Table  State  Recorder  Match   ACT  

Match   ACT  

Page 12: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history

Postcard  Collector  

12  

   

   

   

Control  Plane  

Flow  Table  State  Recorder  

Page 13: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history

Postcard  Collector  

13  

   

   

   

Control  Plane  

Flow  Table  State  Recorder  

Version  -­‐>  Flow  Table  State  

Packet  Header  

Switch  ID   Output  port  

Version  

Step  1:  Generate              postcards  

Page 14: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history

ReconstrucQng  Packet  Histories  Step  2:  Group  postcards  by  generaQng  packet  

Packet  Header  

Switch  ID   Output  port  

Version  

Packet  Header  

Switch  ID   Output  port  

Version  

Packet  Header  

Switch  ID   Output  port  

Version  

Packet  Header  

Switch  ID   Output  port  

Version  

14  

Page 15: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history

ReconstrucQng  Packet  Histories  Step  3:  Sort  postcards  using  topology  

     

 

   

15  

Topo-­‐sort  

Page 16: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history

Postcard  Collector  

16  

   

   

   

Control  Plane  

Flow  Table  State  Recorder  

1.   <Match,  AcQon>  2.   <Match,  AcQon>  3.   <Match,  AcQon>  4.   <Match,  AcQon>  5.   <Match,  AcQon>    6.   …  7.   …  

1.   <Match,  AcQon>  2.   <Match,  AcQon>  3.   <Match,  AcQon>  4.   <Match,  AcQon>  5.   <Match,  AcQon>    6.   …  7.   …  

1.   <Match,  AcQon>  2.   <Match,  AcQon>  3.   <Match,  AcQon>  4.   <Match,  AcQon>  5.   <Match,  AcQon>    6.   …  7.   …  

1.   <Match,  AcQon>  2.   <Match,  AcQon>  3.   <Match,  AcQon>  4.   <Match,  AcQon>  5.   <Match,  AcQon>    6.   …  7.   …  

Page 17: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history

17  

     

 

   

Control  Plane  

Flow  Table  State  Recorder  

Postcard  Collector  

NetSight  API  Packet  History  Filter:  A  regular-­‐expression-­‐like  language  to  specify  packet  histories  of  interest  

•  Reachability  errors  •  IsolaQon  violaQon  •  Black  holes  •  Waypoint  rouQng  violaQon  

TroubleshooQng  Apps  

Postcards  

Packet  History  Assembly  

TroubleshooQng  ApplicaQon  

TroubleshooQng  ApplicaQon  

TroubleshooQng  ApplicaQon  

TroubleshooQng  App  

Filtered  Packet  Histories  

Page 18: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history

Bug  Story:  Incomplete  Handover  

18  

     

 

   

Packet  History  Filter  

Packet  History  

WiFi  AP  Y   WiFi  AP  Z  

Switch  X  

Packet  History  Filter  “Pkts from server not reaching the client”  

Packet  History  Switch X: inport: p0, outports: [p1] mods: [...] state version: 3 Switch Y: inport p1, outports: [p3] mods: ... …

 

Y  

X  

Page 19: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history

TroubleshooQng  Apps  

netshark nprof

ndb netwatch

ndb:  InteracQve  network  debugger  

netwatch:  Live  network  invariant  monitor  

netshark:  Network-­‐wide  wireshark  

nprof:  Hierarchical  network  profiler  

19  

Page 20: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history

But  will  it  scale?  

20  

Page 21: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history

Why  generaQng  postcards  for  every  packet  at  every  hop  is  crazy!  

Network  Overhead  – 64  byte-­‐postcard/pkt/hop  – Stanford  Network:  5  hops  avg,  1031  byte  avg  pkt  – 31%  extra  traffic!  

Processing  Overhead  – Packet  history  assembly  and  filtering  

Storage  Overhead  

21  

Page 22: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history

Why  generaQng  postcards  for  every  packet  at  every  hop  is      ^      crazy!  

Cost  is  OK  for  low-­‐uQlizaQon  networks  – E.g.,  test  networks,  “bring-­‐up  phase”  networks  – Single  server  can  handle  enQre  Stanford  traffic  

22  

not  

Page 23: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history

Why  generaQng  postcards  for  every  packet  at  every  hop  is      ^      crazy!  

Huge  redundancy  in  packet  header  fields  – Only  a  few  fields  change  –  IP  ID,  TCP  seq.  no.  – Postcards  can  be  compressed  to  10-­‐20  bytes/pkt  

23  

not  

Diff-­‐based  compression  

Page 24: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history

Why  generaQng  postcards  for  every  packet  at  every  hop  is      ^      crazy!  

Postcard  processing  is  embarrassingly  parallel  – Each  packet  history  can  be  processed  independent  of  other  packet  histories  

24  

not  

Assembly   Filtering  

Page 25: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history

Scaling  NetSight  Performance  

25  

Switch  

Switch  

Switch  

NetSight  Server  

NetSight  Server  

NetSight  Server  

NetSight  Server  

NetSight  Server  

NetSight  Server  

Disk  

Disk  

Disk  

…   …  

…  

…  

Postcards  

Shuffle  

Compressed  Postcard  Lists  

Compressed  Packet  Histories  

Page 26: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history

Scaling  NetSight  Performance  

26  

Switch  

Switch  

Switch  

NetSight  Server  

NetSight  Server  

NetSight  Server  

NetSight  Server  

NetSight  Server  

NetSight  Server  

Disk  

Disk  

Disk  

…   …  

…  

…  

Postcards  

Shuffle  

Compressed  Postcard  Lists  

Compressed  Packet  Histories  

Page 27: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history

Scaling  NetSight  Performance  

27  

Switch  

Switch  

Switch  

NetSight  Server  

NetSight  Server  

NetSight  Server  

NetSight  Server  

NetSight  Server  

NetSight  Server  

Disk  

Disk  

Disk  

…   …  

…  

…  

Postcards  

Shuffle  

Compressed  Postcard  Lists  

Compressed  Packet  Histories  

Page 28: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history

Scaling  NetSight  Performance  

28  

Switch  

Switch  

Switch  

NetSight  Server  

NetSight  Server  

NetSight  Server  

NetSight  Server  

NetSight  Server  

NetSight  Server  

Disk  

Disk  

Disk  

…   …  

…  

…  

Postcards  

Shuffle  

Compressed  Postcard  Lists  

Compressed  Packet  Histories  

Page 29: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history

NetSight  Variants  

29  

Page 30: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history

NetSight-­‐SwitchAssist  moves  postcard  compression  to  switches  

30  

Switch  

Switch  

Switch  

NetSight  Server  

NetSight  Server  

NetSight  Server  

Disk  

Disk  

Disk  

…   …  

…  Shuffle  

Move  postcard  compression  to  switches  with  simple  hardware  mechanisms  

Page 31: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history

NetSight-­‐HostAssist  exploits  visibility  from  the  hypervisor  

31  

Switch  

Switch  

Switch  

NetSight  Server  

NetSight  Server  

NetSight  Server  

Disk  

Disk  

Disk  

…   …  

…  Shuffle  

HV  Packet  Header  

Mini-­‐postcards  contain  only    unique  pkt  ID  and  switch  state  version  

(1)   Store  packet  header  at  the  hypervisor  (2)   Add  unique  pkt  ID  

Page 32: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history

Overhead  ReducQon  in  NetSight  

Basic  (naïve)  NetSight  :  31%  extra  traffic  in  Stanford  backbone  network  

 NetSight  Switch-­‐Assist:  7%  

 NetSight  Host-­‐Assist:  3%  

32  

Page 33: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history

Takeaways  

Complete  network  visibility  is  possible  – Packet  History:  a  powerful  troubleshooQng  abstracQon  that  gives  complete  visibility  

– NetSight:  a  plagorm  to  capture  and  filter  packet  histories  of  interest  

Complete  network  visibility  is  feasible    –  It  is  possible  to  collect  and  filter  packet  histories  at  scale  

33  

Page 34: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history

34  

I Know What Your Packet Did Last Hop:Using Packet Histories to Troubleshoot Networks

Nikhil Handigol†, Brandon Heller†, Vimalkumar Jeyakumar†, David Mazieres, Nick McKeown{nikhilh,brandonh}@cs.stanford.edu, {jvimal,nickm}@stanford.edu, http: // www. scs. stanford. edu/

~

dm/ addr/

Stanford University, Stanford, CA USA† These authors contributed equally to this work

Abstract

The complexity of networks has outpaced ourtools to debug them; today, administrators use man-ual tools to diagnose problems. In this paper, weshow how packet histories—the full stories of everypacket’s journey through the network—can simplifynetwork diagnosis. To demonstrate the usefulnessof packet histories and the practical feasibility ofconstructing them, we built NetSight, an extensibleplatform that captures packet histories and enablesapplications to concisely and flexibly retrieve packethistories of interest. Atop NetSight, we built fourapplications that illustrate its flexibility: an inter-active network debugger, a live invariant monitor,a path-aware history logger, and a hierarchical net-work profiler. On a single modern multi-core server,NetSight can process packet histories for the traf-fic of multiple 10 Gb/s links. For larger networks,NetSight scales linearly with additional servers andscales even further with straightforward additions tohardware- and hypervisor-based switches.

1 IntroductionOperating networks is hard. When networks godown, administrators have only a rudimentary set oftools at their disposal (traceroute, ping, SNMP, Net-Flow, sFlow) to track down the root cause of the out-age. This debugging toolkit has remained essentiallyunchanged, despite an increase in distributed proto-cols that modify network state. Network adminis-trators have become “masters of complexity” [40]who use their skill and experience to divine the rootcause of each bug. Humans are involved almost ev-ery time something goes wrong, and we are still farfrom an era of automated troubleshooting.

We could easily diagnose many network problemsif we could ask the network about suspect tra�c andreceive an immediate answer. For example:

1. “Host A cannot talk to Host B. Show me wherepackets from A intended for B are going, alongwith any header modifications.”

2. “I don’t want forwarding loops in my network,even transient ones. Show me every packet thatpasses the same switch twice.”

3. “Some hosts are failing to grab IP addresses.Show me where DHCP tra�c is going in thenetwork.”

4. “One port is experiencing congestion. Show methe tra�c sources causing the congestion.”

Today, we cannot “just ask” these questions. Ournetwork diagnosis tools either provide no way topose such a question, or lack access to the informa-tion needed to provide a useful answer. But, thesequestions could be answered with an omniscient viewof every packet’s journey through the network. Wecall this notion a packet history. More specifically,

Definition A packet history is the route a packettakes through a network plus the switch state andheader modifications it encounters at each hop.

A single packet history can be the “smoking gun”that reveals why, how, and where a network failed,evidence that would otherwise remain hidden in gi-gabytes of message logs, flow records [8, 34], andpacket dumps [15, 18, 32].

Using this construct, it becomes possible to buildnetwork analysis programs to diagnose problems.We built four such applications: (1) ndb, an inter-active network debugger, (2) netwatch, a live net-work invariant monitor, (3) netshark, a network-wide packet history logger, and (4) nprof, a hier-archical network profiler. The problems describedabove are a small sample from the set of problemsthese applications can help solve.

These four applications run on top of a prototypeplatform we built, called NetSight. With a view of

1

Every  

NetSight API

hOp://yuba.stanford.edu/netsight