iknowwhatyourpacketdidlasthop: usingpackethistories … · takes through a network plus the switch...
TRANSCRIPT
![Page 1: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history](https://reader033.vdocuments.mx/reader033/viewer/2022050212/5f5ec2d996cc9535d54c8b48/html5/thumbnails/1.jpg)
I Know What Your Packet Did Last Hop: Using Packet Histories
to Troubleshoot Networks
Nikhil Handigol With
Brandon Heller, Vimal Jeyakumar, David Mazières, Nick McKeown NSDI 2014, SeaOle, WA
April 2, 2014
![Page 2: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history](https://reader033.vdocuments.mx/reader033/viewer/2022050212/5f5ec2d996cc9535d54c8b48/html5/thumbnails/2.jpg)
2
Bug Story: Incomplete Handover A
B
Switch X
WiFi AP Y WiFi AP Z
Match: AcQon Src A, Dst B: Output to Y
![Page 3: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history](https://reader033.vdocuments.mx/reader033/viewer/2022050212/5f5ec2d996cc9535d54c8b48/html5/thumbnails/3.jpg)
Network Outages make news headlines
3
HosQng.com's New Jersey data center was taken down on June 1, 2010, igniQng a cloud outage and connec7vity loss for nearly two hours… HosQng.com said the connecQvity loss was due to a so>ware bug in a Cisco switch that caused the switch to fail.
On April 26, 2010, NetSuite suffered a service outage that rendered its cloud-‐based applicaQons inaccessible to customers worldwide for 30 minutes… NetSuite blamed a network issue for the downQme.
The Planet was rocked by a pair of network outages that knocked it off line for about 90 minutes on May 2, 2010. The outages caused disrupQons for another 90 minutes the following morning.... InvesQgaQon found that the outage was caused by a fault in a router in one of the company's data centers.
![Page 4: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history](https://reader033.vdocuments.mx/reader033/viewer/2022050212/5f5ec2d996cc9535d54c8b48/html5/thumbnails/4.jpg)
TroubleshooQng Networks is Hard Today
4
ping
traceroute
tcpdump/SPAN/sFlow
SNMP
Forwarding State
Forwarding State
Forwarding State
Forwarding State
Forwarding State
• Tedious and ad hoc • Requires skill and experience • Not guaranteed to provide helpful answers
Lots and lots of graphs
(source: NANOG Survey in “AutomaQc Test Packet GeneraQon”, Hongyi Zeng, et. al.)
![Page 5: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history](https://reader033.vdocuments.mx/reader033/viewer/2022050212/5f5ec2d996cc9535d54c8b48/html5/thumbnails/5.jpg)
We want complete network visibility
5
ping
traceroute sFlow
SNMP
Complete visibility: every event that ever happened to every packet
0 100
We want to be here Visibility Spectrum
![Page 6: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history](https://reader033.vdocuments.mx/reader033/viewer/2022050212/5f5ec2d996cc9535d54c8b48/html5/thumbnails/6.jpg)
Talk Outline
1. How to achieve complete network visibility – An abstracQon: Packet History – A plagorm: NetSight
2. Why achieving complete visibility is feasible – Data compression – MapReduce-‐style scale-‐out design
6
![Page 7: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history](https://reader033.vdocuments.mx/reader033/viewer/2022050212/5f5ec2d996cc9535d54c8b48/html5/thumbnails/7.jpg)
7
Forwarding State
Forwarding State
Forwarding State
Forwarding State
Packet History
Packet history = Path taken by a packet + Header modificaQons
+ Switch state encountered
![Page 8: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history](https://reader033.vdocuments.mx/reader033/viewer/2022050212/5f5ec2d996cc9535d54c8b48/html5/thumbnails/8.jpg)
Our TroubleshooQng Workflow
8
Forwarding State
Forwarding State
Forwarding State
Forwarding State
1. Record and store all packet histories 2. Query and use packet histories of errant packets
![Page 9: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history](https://reader033.vdocuments.mx/reader033/viewer/2022050212/5f5ec2d996cc9535d54c8b48/html5/thumbnails/9.jpg)
NetSight A plagorm to capture and filter packet histories of interest
9
![Page 10: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history](https://reader033.vdocuments.mx/reader033/viewer/2022050212/5f5ec2d996cc9535d54c8b48/html5/thumbnails/10.jpg)
Postcard Collector
10
Control Plane
Flow Table State Recorder
![Page 11: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history](https://reader033.vdocuments.mx/reader033/viewer/2022050212/5f5ec2d996cc9535d54c8b48/html5/thumbnails/11.jpg)
Postcard Collector
11
Control Plane
Flow Table State Recorder Match ACT
Match ACT
![Page 12: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history](https://reader033.vdocuments.mx/reader033/viewer/2022050212/5f5ec2d996cc9535d54c8b48/html5/thumbnails/12.jpg)
Postcard Collector
12
Control Plane
Flow Table State Recorder
![Page 13: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history](https://reader033.vdocuments.mx/reader033/viewer/2022050212/5f5ec2d996cc9535d54c8b48/html5/thumbnails/13.jpg)
Postcard Collector
13
Control Plane
Flow Table State Recorder
Version -‐> Flow Table State
Packet Header
Switch ID Output port
Version
Step 1: Generate postcards
![Page 14: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history](https://reader033.vdocuments.mx/reader033/viewer/2022050212/5f5ec2d996cc9535d54c8b48/html5/thumbnails/14.jpg)
ReconstrucQng Packet Histories Step 2: Group postcards by generaQng packet
Packet Header
Switch ID Output port
Version
Packet Header
Switch ID Output port
Version
Packet Header
Switch ID Output port
Version
Packet Header
Switch ID Output port
Version
14
![Page 15: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history](https://reader033.vdocuments.mx/reader033/viewer/2022050212/5f5ec2d996cc9535d54c8b48/html5/thumbnails/15.jpg)
ReconstrucQng Packet Histories Step 3: Sort postcards using topology
15
Topo-‐sort
![Page 16: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history](https://reader033.vdocuments.mx/reader033/viewer/2022050212/5f5ec2d996cc9535d54c8b48/html5/thumbnails/16.jpg)
Postcard Collector
16
Control Plane
Flow Table State Recorder
1. <Match, AcQon> 2. <Match, AcQon> 3. <Match, AcQon> 4. <Match, AcQon> 5. <Match, AcQon> 6. … 7. …
1. <Match, AcQon> 2. <Match, AcQon> 3. <Match, AcQon> 4. <Match, AcQon> 5. <Match, AcQon> 6. … 7. …
1. <Match, AcQon> 2. <Match, AcQon> 3. <Match, AcQon> 4. <Match, AcQon> 5. <Match, AcQon> 6. … 7. …
1. <Match, AcQon> 2. <Match, AcQon> 3. <Match, AcQon> 4. <Match, AcQon> 5. <Match, AcQon> 6. … 7. …
![Page 17: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history](https://reader033.vdocuments.mx/reader033/viewer/2022050212/5f5ec2d996cc9535d54c8b48/html5/thumbnails/17.jpg)
17
Control Plane
Flow Table State Recorder
Postcard Collector
NetSight API Packet History Filter: A regular-‐expression-‐like language to specify packet histories of interest
• Reachability errors • IsolaQon violaQon • Black holes • Waypoint rouQng violaQon
TroubleshooQng Apps
Postcards
Packet History Assembly
TroubleshooQng ApplicaQon
TroubleshooQng ApplicaQon
TroubleshooQng ApplicaQon
TroubleshooQng App
Filtered Packet Histories
![Page 18: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history](https://reader033.vdocuments.mx/reader033/viewer/2022050212/5f5ec2d996cc9535d54c8b48/html5/thumbnails/18.jpg)
Bug Story: Incomplete Handover
18
Packet History Filter
Packet History
WiFi AP Y WiFi AP Z
Switch X
Packet History Filter “Pkts from server not reaching the client”
Packet History Switch X: inport: p0, outports: [p1] mods: [...] state version: 3 Switch Y: inport p1, outports: [p3] mods: ... …
Y
X
![Page 19: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history](https://reader033.vdocuments.mx/reader033/viewer/2022050212/5f5ec2d996cc9535d54c8b48/html5/thumbnails/19.jpg)
TroubleshooQng Apps
netshark nprof
ndb netwatch
ndb: InteracQve network debugger
netwatch: Live network invariant monitor
netshark: Network-‐wide wireshark
nprof: Hierarchical network profiler
19
![Page 20: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history](https://reader033.vdocuments.mx/reader033/viewer/2022050212/5f5ec2d996cc9535d54c8b48/html5/thumbnails/20.jpg)
But will it scale?
20
![Page 21: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history](https://reader033.vdocuments.mx/reader033/viewer/2022050212/5f5ec2d996cc9535d54c8b48/html5/thumbnails/21.jpg)
Why generaQng postcards for every packet at every hop is crazy!
Network Overhead – 64 byte-‐postcard/pkt/hop – Stanford Network: 5 hops avg, 1031 byte avg pkt – 31% extra traffic!
Processing Overhead – Packet history assembly and filtering
Storage Overhead
21
![Page 22: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history](https://reader033.vdocuments.mx/reader033/viewer/2022050212/5f5ec2d996cc9535d54c8b48/html5/thumbnails/22.jpg)
Why generaQng postcards for every packet at every hop is ^ crazy!
Cost is OK for low-‐uQlizaQon networks – E.g., test networks, “bring-‐up phase” networks – Single server can handle enQre Stanford traffic
22
not
![Page 23: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history](https://reader033.vdocuments.mx/reader033/viewer/2022050212/5f5ec2d996cc9535d54c8b48/html5/thumbnails/23.jpg)
Why generaQng postcards for every packet at every hop is ^ crazy!
Huge redundancy in packet header fields – Only a few fields change – IP ID, TCP seq. no. – Postcards can be compressed to 10-‐20 bytes/pkt
23
not
Diff-‐based compression
![Page 24: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history](https://reader033.vdocuments.mx/reader033/viewer/2022050212/5f5ec2d996cc9535d54c8b48/html5/thumbnails/24.jpg)
Why generaQng postcards for every packet at every hop is ^ crazy!
Postcard processing is embarrassingly parallel – Each packet history can be processed independent of other packet histories
24
not
Assembly Filtering
![Page 25: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history](https://reader033.vdocuments.mx/reader033/viewer/2022050212/5f5ec2d996cc9535d54c8b48/html5/thumbnails/25.jpg)
Scaling NetSight Performance
25
Switch
Switch
Switch
NetSight Server
NetSight Server
NetSight Server
NetSight Server
NetSight Server
NetSight Server
Disk
Disk
Disk
… …
…
…
Postcards
Shuffle
Compressed Postcard Lists
Compressed Packet Histories
![Page 26: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history](https://reader033.vdocuments.mx/reader033/viewer/2022050212/5f5ec2d996cc9535d54c8b48/html5/thumbnails/26.jpg)
Scaling NetSight Performance
26
Switch
Switch
Switch
NetSight Server
NetSight Server
NetSight Server
NetSight Server
NetSight Server
NetSight Server
Disk
Disk
Disk
… …
…
…
Postcards
Shuffle
Compressed Postcard Lists
Compressed Packet Histories
![Page 27: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history](https://reader033.vdocuments.mx/reader033/viewer/2022050212/5f5ec2d996cc9535d54c8b48/html5/thumbnails/27.jpg)
Scaling NetSight Performance
27
Switch
Switch
Switch
NetSight Server
NetSight Server
NetSight Server
NetSight Server
NetSight Server
NetSight Server
Disk
Disk
Disk
… …
…
…
Postcards
Shuffle
Compressed Postcard Lists
Compressed Packet Histories
![Page 28: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history](https://reader033.vdocuments.mx/reader033/viewer/2022050212/5f5ec2d996cc9535d54c8b48/html5/thumbnails/28.jpg)
Scaling NetSight Performance
28
Switch
Switch
Switch
NetSight Server
NetSight Server
NetSight Server
NetSight Server
NetSight Server
NetSight Server
Disk
Disk
Disk
… …
…
…
Postcards
Shuffle
Compressed Postcard Lists
Compressed Packet Histories
![Page 29: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history](https://reader033.vdocuments.mx/reader033/viewer/2022050212/5f5ec2d996cc9535d54c8b48/html5/thumbnails/29.jpg)
NetSight Variants
29
![Page 30: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history](https://reader033.vdocuments.mx/reader033/viewer/2022050212/5f5ec2d996cc9535d54c8b48/html5/thumbnails/30.jpg)
NetSight-‐SwitchAssist moves postcard compression to switches
30
Switch
Switch
Switch
NetSight Server
NetSight Server
NetSight Server
Disk
Disk
Disk
… …
… Shuffle
Move postcard compression to switches with simple hardware mechanisms
![Page 31: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history](https://reader033.vdocuments.mx/reader033/viewer/2022050212/5f5ec2d996cc9535d54c8b48/html5/thumbnails/31.jpg)
NetSight-‐HostAssist exploits visibility from the hypervisor
31
Switch
Switch
Switch
NetSight Server
NetSight Server
NetSight Server
Disk
Disk
Disk
… …
… Shuffle
HV Packet Header
Mini-‐postcards contain only unique pkt ID and switch state version
(1) Store packet header at the hypervisor (2) Add unique pkt ID
![Page 32: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history](https://reader033.vdocuments.mx/reader033/viewer/2022050212/5f5ec2d996cc9535d54c8b48/html5/thumbnails/32.jpg)
Overhead ReducQon in NetSight
Basic (naïve) NetSight : 31% extra traffic in Stanford backbone network
NetSight Switch-‐Assist: 7%
NetSight Host-‐Assist: 3%
32
![Page 33: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history](https://reader033.vdocuments.mx/reader033/viewer/2022050212/5f5ec2d996cc9535d54c8b48/html5/thumbnails/33.jpg)
Takeaways
Complete network visibility is possible – Packet History: a powerful troubleshooQng abstracQon that gives complete visibility
– NetSight: a plagorm to capture and filter packet histories of interest
Complete network visibility is feasible – It is possible to collect and filter packet histories at scale
33
![Page 34: IKnowWhatYourPacketDidLastHop: UsingPacketHistories … · takes through a network plus the switch state and header modifications it encounters at each hop. A single packet history](https://reader033.vdocuments.mx/reader033/viewer/2022050212/5f5ec2d996cc9535d54c8b48/html5/thumbnails/34.jpg)
34
I Know What Your Packet Did Last Hop:Using Packet Histories to Troubleshoot Networks
Nikhil Handigol†, Brandon Heller†, Vimalkumar Jeyakumar†, David Mazieres, Nick McKeown{nikhilh,brandonh}@cs.stanford.edu, {jvimal,nickm}@stanford.edu, http: // www. scs. stanford. edu/
~
dm/ addr/
Stanford University, Stanford, CA USA† These authors contributed equally to this work
Abstract
The complexity of networks has outpaced ourtools to debug them; today, administrators use man-ual tools to diagnose problems. In this paper, weshow how packet histories—the full stories of everypacket’s journey through the network—can simplifynetwork diagnosis. To demonstrate the usefulnessof packet histories and the practical feasibility ofconstructing them, we built NetSight, an extensibleplatform that captures packet histories and enablesapplications to concisely and flexibly retrieve packethistories of interest. Atop NetSight, we built fourapplications that illustrate its flexibility: an inter-active network debugger, a live invariant monitor,a path-aware history logger, and a hierarchical net-work profiler. On a single modern multi-core server,NetSight can process packet histories for the traf-fic of multiple 10 Gb/s links. For larger networks,NetSight scales linearly with additional servers andscales even further with straightforward additions tohardware- and hypervisor-based switches.
1 IntroductionOperating networks is hard. When networks godown, administrators have only a rudimentary set oftools at their disposal (traceroute, ping, SNMP, Net-Flow, sFlow) to track down the root cause of the out-age. This debugging toolkit has remained essentiallyunchanged, despite an increase in distributed proto-cols that modify network state. Network adminis-trators have become “masters of complexity” [40]who use their skill and experience to divine the rootcause of each bug. Humans are involved almost ev-ery time something goes wrong, and we are still farfrom an era of automated troubleshooting.
We could easily diagnose many network problemsif we could ask the network about suspect tra�c andreceive an immediate answer. For example:
1. “Host A cannot talk to Host B. Show me wherepackets from A intended for B are going, alongwith any header modifications.”
2. “I don’t want forwarding loops in my network,even transient ones. Show me every packet thatpasses the same switch twice.”
3. “Some hosts are failing to grab IP addresses.Show me where DHCP tra�c is going in thenetwork.”
4. “One port is experiencing congestion. Show methe tra�c sources causing the congestion.”
Today, we cannot “just ask” these questions. Ournetwork diagnosis tools either provide no way topose such a question, or lack access to the informa-tion needed to provide a useful answer. But, thesequestions could be answered with an omniscient viewof every packet’s journey through the network. Wecall this notion a packet history. More specifically,
Definition A packet history is the route a packettakes through a network plus the switch state andheader modifications it encounters at each hop.
A single packet history can be the “smoking gun”that reveals why, how, and where a network failed,evidence that would otherwise remain hidden in gi-gabytes of message logs, flow records [8, 34], andpacket dumps [15, 18, 32].
Using this construct, it becomes possible to buildnetwork analysis programs to diagnose problems.We built four such applications: (1) ndb, an inter-active network debugger, (2) netwatch, a live net-work invariant monitor, (3) netshark, a network-wide packet history logger, and (4) nprof, a hier-archical network profiler. The problems describedabove are a small sample from the set of problemsthese applications can help solve.
These four applications run on top of a prototypeplatform we built, called NetSight. With a view of
1
Every
NetSight API
hOp://yuba.stanford.edu/netsight