Detecting Shared Congestion of Flows Via
End-to-end Measurement (and other inference problems)
Dan Rubenstein
joint work with Jim Kurose and Don Towsley
UMass Amherst
Network Inference
• What’s going on in there?
NETWORK
• Where are packets getting lost / delayed?
• Where is congestion occurring?
• Where are the network hot spots?
• What are routers doing (WFQ, RED)?
• What version of TCP are end-hosts using?
Multiple Autonomous Systems
• What routing capabilities does your ISP provide? “That’s proprietary info”
• Who’s to blame for poor service?
• Consequence: who has to figure out what and where the problem is and how to fix it?
somebody else!
Overview
• Overview of other inference work:
– Identifying bottleneck capacities
– Multicast inference of loss (MINC)
– TCP inference (TBIT)
• Detecting shared points of congestion
Identifying bottleneck bandwidths
• Links have different capacities
– “skinniest” link processes slowest: creates a rate bottleneck
– can the bottleneck rate be identified?
• Lots of work here [Carter’96, Jacobson’97, Downey’99, Lai’99, Melander’99, Lai’00]
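One idea used in this line of work is packet-pair dispersion: two packets sent back to back leave the bottleneck link spaced by the time that link needs to transmit one packet, so capacity is roughly packet size divided by the spacing seen at the receiver. A minimal sketch, assuming little cross traffic between the two packets of a pair. The function name and the sample gaps are illustrative and are not taken from any of the cited tools.

```python
from statistics import median

def packet_pair_capacity(packet_size_bytes, arrival_gaps_s):
    """Estimate bottleneck capacity in bits/s from packet-pair arrival gaps."""
    # Each gap is the receiver-side spacing of two packets sent back to back.
    # Cross traffic can stretch or compress individual gaps, so use the median.
    gap = median(arrival_gaps_s)
    return packet_size_bytes * 8 / gap

# Illustrative (made-up) gaps for 1500-byte packets, one of them stretched.
gaps = [0.0080, 0.0081, 0.0080, 0.0120, 0.0079]
estimate = packet_pair_capacity(1500, gaps)
print(f"{estimate / 1e6:.2f} Mb/s")
```

The median is only one possible filter; it is used here because a single stretched gap should not move the estimate.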
Multicast Inference
• Infer loss points on multicast tree via correlation patterns of receivers w/in a multicast group [Ratnas’99, Caceres’99 (3), LoPresti’99, Adler’00]
[Figure: multicast tree from source S to receivers R, with pts of loss marked]
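To make the correlation idea concrete, here is a minimal sketch for the smallest possible tree: S feeds one shared link that branches to two receivers. It assumes losses on the three links are independent, in which case the pass probability of the shared link is P(R1 receives) · P(R2 receives) / P(both receive). The function name and the probe trace are illustrative; this is not the general-tree estimator of the cited papers.

```python
def infer_two_leaf_tree(received1, received2):
    """Estimate link pass probabilities for the tree S -> shared link -> {R1, R2}.

    received1[k] / received2[k] is 1 if receiver 1 / 2 got probe k, else 0.
    Assumes losses on the three links are independent of each other.
    """
    n = len(received1)
    g1 = sum(received1) / n                                        # P(R1 receives)
    g2 = sum(received2) / n                                        # P(R2 receives)
    g12 = sum(a and b for a, b in zip(received1, received2)) / n   # P(both receive)
    shared = g1 * g2 / g12   # pass probability of the shared link
    leaf1 = g12 / g2         # pass probability of the link to R1
    leaf2 = g12 / g1         # pass probability of the link to R2
    return shared, leaf1, leaf2

# Made-up trace of 10 probes: probes 0 and 1 are lost before the branch point.
r1 = [0, 0, 1, 1, 1, 1, 1, 1, 0, 0]
r2 = [0, 0, 1, 1, 1, 0, 0, 0, 1, 0]
shared, leaf1, leaf2 = infer_two_leaf_tree(r1, r2)
print(shared, leaf1, leaf2)
```

Neither receiver can see where a probe was dropped; only the pattern of joint and individual receptions separates shared-link loss from leaf-link loss.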
TCP Inference (TBIT)
• Many versions of TCP exist
– RENO, TAHOE, VEGAS
• Many “optional” components
– SACK, ECN compliance
• Are specification reqmts being met?
– initial window sizes, slow start
• TBIT: TCP Behavior Identification Tool [Padhye’00]
– stress-tests a server’s TCP by intentionally delaying / dropping various ACKs
– different TCPs / TCP options respond differently to the delayed / dropped ACKs
[Figure labels: Client; Point of congestion]
Detecting Shared Pts of Congestion: Why bother?
• When flows share common point of congestion (POC), bandwidth can be “transferred” between flows w/o impacting other traffic
• Applications: WWW servers, multi-flow (multi-media) sessions, multi-sender multicast
• Can limit “transfer” to flows w/ identical e2e data paths [Balak’99]
– ensures flows have common bottleneck
– but limits applicability
[Figure labels: Server; Point of congestion]
Detecting Shared POCs
Q: Can we identify whether two flows share the same Point of Congestion (POC)?
Network Assumptions:
– routers use FIFO forwarding
– the two flows’ POCs are either all shared or all separate
Techniques for detecting shared POCs
• Requirement: flows’ senders or receivers are co-located
• Packet ordering through a potential shared POC (SPOC) is the same as that at the co-located end-system
• Good SPOC candidates
[Figure: two topologies with senders S1, S2 and receivers R1, R2: co-located senders and co-located receivers]
Simple Queueing Models of POCs for two flows
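[Figure: two queueing models across the Internet. A Shared POC: FG Flow 1 and FG Flow 2 pass through one queue together with BG traffic. Separate POCs: FG Flow 1 and FG Flow 2 each pass through their own queue, each with its own BG traffic.]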
Approach (High level)
• Idea: Packets passing through same POC close in time experience loss and delay correlations [Moon’98, Yajnik’99]
• Using either loss or delay statistics, compute two measures of correlation:
– Mc: cross-measure (correlation between flows)
– Ma: auto-measure (correlation within a flow)
• such that
– if Mc < Ma then infer POCs are separate
– else Mc > Ma and infer POCs are shared
The Correlation Statistics...
Loss-Corr for co-located senders:
Mc = Pr(Lost(i) | Lost(i-1))
Ma = Pr(Lost(i) | Lost(prev(i)))
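A minimal sketch of how these two conditional probabilities could be estimated from a trace, under two assumptions that the slide does not spell out: prev(i) is the most recent earlier packet of the same flow as i, and an Mc sample is taken only when packets i-1 and i belong to different flows. The function name and the trace are illustrative.

```python
def loss_corr(trace):
    """trace: list of (flow_id, lost) in send order. Returns (Mc, Ma)."""
    cross_hits = cross_total = 0
    auto_hits = auto_total = 0
    last_lost = {}     # flow_id -> loss flag of that flow's latest packet, prev(i)
    prev_pkt = None    # (flow_id, lost) of packet i-1
    for flow, lost in trace:
        # Mc sample: packet i-1 belongs to the other flow and was lost.
        if prev_pkt is not None and prev_pkt[0] != flow and prev_pkt[1]:
            cross_total += 1
            cross_hits += lost
        # Ma sample: the previous packet of the same flow was lost.
        if last_lost.get(flow):
            auto_total += 1
            auto_hits += lost
        last_lost[flow] = lost
        prev_pkt = (flow, lost)
    mc = cross_hits / cross_total if cross_total else float("nan")
    ma = auto_hits / auto_total if auto_total else float("nan")
    return mc, ma

# Made-up trace of two interleaved flows (True = packet lost).
trace = [(1, False), (2, True), (1, True), (2, False), (1, False),
         (2, True), (1, True), (2, True), (1, False), (2, False)]
mc, ma = loss_corr(trace)
print("shared" if mc > ma else "separate", mc, ma)
```

A real trace needs many loss events before either estimate is stable, since samples are only collected after a lost packet.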
Loss-Corr for co-located receivers: a bit more complex
Delay: Either co-located topology:
Mc = C(Delay(i), Delay(i-1))
Ma = C(Delay(i), Delay(prev(i)))
C(X,Y) = \frac{E[XY] - E[X]E[Y]}{\sqrt{(E[X^2] - E^2[X])(E[Y^2] - E^2[Y])}}
[Figure: timeline of Flow 1 pkts and Flow 2 pkts interleaved in time, labeled i-4 through i+1]
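A minimal sketch of the delay statistics, taking C(X,Y) to be the sample correlation coefficient and making the same two assumptions as for loss: prev(i) is the most recent earlier packet of the same flow, and Mc samples are taken only when i-1 and i belong to different flows. Function names and the toy delays are illustrative, not measured data.

```python
from math import sqrt

def corr(xs, ys):
    """Sample correlation coefficient C(X, Y) of paired observations."""
    n = len(xs)
    ex, ey = sum(xs) / n, sum(ys) / n
    exy = sum(x * y for x, y in zip(xs, ys)) / n
    ex2 = sum(x * x for x in xs) / n
    ey2 = sum(y * y for y in ys) / n
    return (exy - ex * ey) / sqrt((ex2 - ex ** 2) * (ey2 - ey ** 2))

def delay_corr(trace):
    """trace: list of (flow_id, delay) in send order. Returns (Mc, Ma)."""
    cross, auto = [], []
    last_delay = {}    # flow_id -> delay of that flow's latest packet, prev(i)
    prev_pkt = None    # (flow_id, delay) of packet i-1
    for flow, delay in trace:
        if prev_pkt is not None and prev_pkt[0] != flow:
            cross.append((delay, prev_pkt[1]))
        if flow in last_delay:
            auto.append((delay, last_delay[flow]))
        last_delay[flow] = delay
        prev_pkt = (flow, delay)
    return corr(*zip(*cross)), corr(*zip(*auto))

# Toy trace: flows alternate and the delay drifts up and down smoothly.
delays = [1, 2, 3, 4, 3, 2, 1, 2, 3, 4, 3, 2]
trace = [(1 + k % 2, d) for k, d in enumerate(delays)]
mc, ma = delay_corr(trace)
print("shared" if mc > ma else "separate", mc, ma)
```

Unlike the loss version, every packet contributes a sample here, not only those that follow a loss.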
Intuition: Why the comparison works
[Figure: inter-arrival times Tarr(i-1, i) and Tarr(prev(i), i)]
• Recall: Pkts closer together exhibit higher correlation
• E[Tarr(i-1, i)] < E[Tarr(prev(i), i)]
– On avg, i “more correlated” with i-1 than with prev(i)
– True for many distributions, e.g.,
   • deterministic, any
   • Poisson, Poisson
• Rest of talk: assume Poisson, Poisson
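The inequality can be checked numerically for the Poisson, Poisson case. The sketch below merges two independent Poisson streams and averages both kinds of gap, assuming prev(i) is the previous packet of the same flow and that Tarr(i-1, i) is sampled only when i-1 and i belong to different flows. The rates, horizon and seed are arbitrary choices. For two 20 pps sources the means should come out near 1/40 s and 1/20 s.

```python
import random

def poisson_times(rate, horizon, rng):
    """Arrival times of a Poisson process with the given rate on (0, horizon]."""
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate)
        if t > horizon:
            return times
        times.append(t)

def mean_gaps(rate1, rate2, horizon, seed=0):
    """Return (mean Tarr(i-1, i), mean Tarr(prev(i), i)) for two merged flows."""
    rng = random.Random(seed)
    packets = sorted(
        [(t, 1) for t in poisson_times(rate1, horizon, rng)]
        + [(t, 2) for t in poisson_times(rate2, horizon, rng)]
    )
    cross, auto = [], []
    last_time = {}   # flow_id -> send time of that flow's latest packet
    for k, (t, flow) in enumerate(packets):
        if k > 0 and packets[k - 1][1] != flow:
            cross.append(t - packets[k - 1][0])   # gap to i-1 (other flow)
        if flow in last_time:
            auto.append(t - last_time[flow])      # gap to prev(i) (same flow)
        last_time[flow] = t
    return sum(cross) / len(cross), sum(auto) / len(auto)

cross_mean, auto_mean = mean_gaps(20.0, 20.0, horizon=1000.0)
print(cross_mean, auto_mean)
```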
Analytical Results
As # samples → ∞:
• Loss-Correlation technique:
– Assume POC(s) are M+M/M/1/K queues
– Thm: Co-located senders: Mc > Ma iff flows share POCs
– Co-located receivers: Mc > Ma iff flows share POCs, shown via extensive tests using recursive solutions of Mc and Ma
• Delay-Correlation technique:
– Assume POC(s) are M+G/G/1/∞ queues
– Thm: Both co-located topologies: Mc > Ma iff flows share POCs
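The delay result can be illustrated with a small simulation. The sketch below is a special case only: exponential service times, Poisson probes and Poisson background traffic, an infinite FIFO buffer simulated with the Lindley recursion, and delay taken as waiting time plus service time. All rates, the horizon and the seed are arbitrary and are not the settings used in the talk. Mc and Ma are computed with prev(i) taken as the previous packet of the same flow.

```python
import random
from math import sqrt

def simulate_fifo(bg_rate, probe_flows, service_rate, horizon, rng):
    """FIFO queue with infinite buffer. probe_flows: {flow_id: Poisson rate}.
    Returns [(send_time, flow_id, delay)] for the probe packets only."""
    flows = [None] + list(probe_flows)              # None marks background traffic
    rates = [bg_rate] + list(probe_flows.values())
    total = sum(rates)
    t, wait, service, probes = 0.0, 0.0, 0.0, []
    while True:
        gap = rng.expovariate(total)                # merged Poisson arrivals
        t += gap
        if t > horizon:
            return probes
        wait = max(0.0, wait + service - gap)       # Lindley recursion
        service = rng.expovariate(service_rate)
        flow = rng.choices(flows, weights=rates)[0]
        if flow is not None:
            probes.append((t, flow, wait + service))

def corr(pairs):
    """Sample correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    ex = sum(x for x, _ in pairs) / n
    ey = sum(y for _, y in pairs) / n
    exy = sum(x * y for x, y in pairs) / n
    vx = sum(x * x for x, _ in pairs) / n - ex * ex
    vy = sum(y * y for _, y in pairs) / n - ey * ey
    return (exy - ex * ey) / sqrt(vx * vy)

def delay_measures(probes):
    """Return (Mc, Ma) for probes sorted by send time."""
    cross, auto, last, prev = [], [], {}, None
    for _, flow, delay in probes:
        if prev is not None and prev[0] != flow:
            cross.append((delay, prev[1]))
        if flow in last:
            auto.append((delay, last[flow]))
        last[flow] = delay
        prev = (flow, delay)
    return corr(cross), corr(auto)

rng = random.Random(1)
horizon = 1000.0
# Shared POC: both probe flows and the background traffic use one queue.
shared = simulate_fifo(100.0, {1: 20.0, 2: 20.0}, 200.0, horizon, rng)
# Separate POCs: one independent queue per probe flow, merged by send time.
separate = sorted(simulate_fifo(120.0, {1: 20.0}, 200.0, horizon, rng)
                  + simulate_fifo(120.0, {2: 20.0}, 200.0, horizon, rng))
mc_shared, ma_shared = delay_measures(shared)
mc_sep, ma_sep = delay_measures(separate)
print("shared:  ", mc_shared, ma_shared)
print("separate:", mc_sep, ma_sep)
```

With separate queues the two flows' delays are independent, so Mc should sit near zero while Ma stays clearly positive.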
Simulation Setup
• Co-located senders: Shared POCs
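[Figure: simulated topology with senders S1, S2 and receivers R1, R2; link delays of 10 to 30 ms; link capacities of 1.5 Mb/s and 1000 Mb/s; background TCP traffic and on/off sources; the two flows labeled 20 pps each]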
2nd Simulation Setup
• Co-located senders: Independent POCs
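[Figure: simulated topology with senders S1, S2 and receivers R1, R2; link delays of 10 to 30 ms; link capacities of 1000 Mb/s and 1.5 Mb/s; two sets of background TCP traffic and on/off sources; the two flows labeled 20 pps each]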
Simulation results
[Figure: result plots, panels “Independent POCs” and “Shared POCs”]
• Delay-corr an order of magnitude faster than loss-corr
• The Shared loss-corr dip: bias due to delayed Mc samples
• Similar results on co-located receiver topology simulations
Internet Experiments
• Goal: Verify techniques using real Internet traces
• Experimental Setup:
– Choose topologies where POC status (shared or unshared) can be reasonably assumed
– Use traceroute to assess shared links and approximate per-link delays
[Figure: example topology among UMass, ACIRI and UCL, labeled “Separate POCs (?)”, with path delays of 193 ms, 264 ms and 30 ms]
Experimental Results
[Figure: results chart with legend Correct / Inconclusive / Wrong]
Sites: 3 UMass (MA), Columbia (NY), UCL (UK), ACIRI (Calif.), AT&T (Calif.)
Summary
• E2E Shared-POC detecting techniques
– Delay-based techniques more accurate, take less time (order of magnitude)
• Future Directions:
– Experiment with non-Poisson foreground traffic
– Focus on making techniques more practical (e.g., Byers @ BU CS for recent TR)
• Paper available (SIGMETRICS’00)