Internet Congestion Characteristics and Implications
Jim Gast
University of Wisconsin / Madison
October 25, 2001
Research Question
• What are the characteristics of congestion in the Internet?
– duration
– propagation
– complexity
– location
– implications
Challenges
• Accurately recreating congestion in the lab
– multiple bottlenecks, appropriate length paths
• Visualizing congestion to gain understanding
– choosing the right level of detail (router vs. AS)
• Inferring congestion from data we can realistically gather
– passive vs. active
– packet headers vs. flow level
Packet Switching
[Figure: a router with links labeled 100 and an outbound queue]
• Routers receive packets and forward them to the next hop toward their destination.
• Packets from an inbound link might arrive when their outbound link is busy. They are queued.
• When the queue is full, the router drops packets.
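The three bullets above can be sketched as a small drop-tail queue simulation. This is a minimal illustration, not a model of any particular router: the function name, the tick-based timing, and the workload are all assumptions made for the example.

```python
from collections import deque


def simulate_drop_tail(arrivals, service_per_tick, capacity):
    """Simulate a FIFO outbound queue that drops arrivals when it is full.

    arrivals: number of packets arriving in each tick (hypothetical workload)
    service_per_tick: packets the outbound link can send per tick
    capacity: maximum number of packets the queue can hold
    Returns (forwarded, dropped, packets_still_queued).
    """
    queue = deque()
    forwarded = 0
    dropped = 0
    next_id = 0
    for arriving in arrivals:
        # Enqueue each arrival, or drop it if the queue is already full.
        for _ in range(arriving):
            if len(queue) < capacity:
                queue.append(next_id)
            else:
                dropped += 1
            next_id += 1
        # The outbound link drains at most service_per_tick packets.
        for _ in range(min(service_per_tick, len(queue))):
            queue.popleft()
            forwarded += 1
    return forwarded, dropped, len(queue)


# Four busy ticks followed by two idle ticks.
print(simulate_drop_tail([3, 3, 3, 3, 0, 0], service_per_tick=2, capacity=4))
```

Arrivals that find the queue full are the only ones lost, which is why this policy is usually called drop-tail.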
Speed Matching
• Think about a 10 MByte transfer from a research lab to a home computer connected at 0.5 MBps
• The sender wants to send data at 10 MBps
• Historically, the only way routers could ask connections to slow down was to drop packets
[Figure: path between Vern and Sally over links labeled 100 and a link labeled 0.5, carrying packets 9 8 7 5 4 and a returning ACK]
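The arithmetic behind this slide can be sketched as follows. The sketch assumes "MBps" means megabytes per second, and the buffer size is a hypothetical value, since the slide does not give one.

```python
def transfer_summary(size_mb, sender_rate, bottleneck_rate, buffer_mb):
    """Return (minimum_transfer_seconds, seconds_until_buffer_is_full).

    Rates are in MB per second. buffer_mb is a hypothetical queue size at the
    router in front of the slow link; it is not taken from the slides.
    """
    minimum_transfer_seconds = size_mb / bottleneck_rate
    excess_rate = sender_rate - bottleneck_rate
    if excess_rate <= 0:
        # The sender is no faster than the bottleneck, so the queue never fills.
        return minimum_transfer_seconds, None
    return minimum_transfer_seconds, buffer_mb / excess_rate


total, until_full = transfer_summary(10, sender_rate=10, bottleneck_rate=0.5, buffer_mb=0.95)
print(total, until_full)
```

Under these assumptions the transfer cannot finish in less than 20 seconds, while a sender running at full speed fills the buffer in a fraction of a second. Without another signal, the router's only way to slow the sender is to start dropping.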
Your fair share
[Figure: four flows crossing three links; labels 400, 800, 800, 1200, 1100, 400, 300]
Assume 4 flows through 3 links. Each flow can support speeds up to 1000 packets per second.
Your fair share
[Figure: the same network after the blue flow ends; labels 400, 800, 400, 800, 1200, 1100, 0, 800?, 300]
At time t1, the blue flow ends.
When one of the flows ends, there is reduced congestion at the link with capacity 800. After t1, the black flow slowly increases toward 800 pps.
Your fair share
[Figure: the same network with the middle link congested; labels 400, 800, 800, 800, 1200, 1100, 0, 600?, 600?, 300]
Increased flow of black packets causes congestion at the middle link. It responds by randomly dropping brown and black packets until the sum of brown and black drops to 1200.
Your fair share
[Figure: the same network after the brown flow slows; labels 400, 800, 800, 800, 300, 1200, 1100, 0, 600?, 600?, 500?]
Congestion at the right link disappears. Decreased flow of brown packets leaves more room for green so green’s usage slowly rises toward 500 pps.
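The sequence on the four "Your fair share" slides can be replayed as simple per-link arithmetic. The topology below is an assumption inferred from the slide text and labels: blue and black share the 800 pps link, black and brown share the 1200 pps link, and brown and green share the 1100 pps link. The starting rates are likewise inferred from the figure labels.

```python
# Assumed topology (inferred from the slide text, not stated in it).
LINKS = {
    "left": {"capacity": 800, "flows": ("blue", "black")},
    "middle": {"capacity": 1200, "flows": ("black", "brown")},
    "right": {"capacity": 1100, "flows": ("brown", "green")},
}

# Flow rates in packets per second at each step of the narrative.
STAGES = [
    ("start", {"blue": 400, "black": 400, "brown": 800, "green": 300}),
    ("blue ends, black ramps up", {"blue": 0, "black": 800, "brown": 800, "green": 300}),
    ("middle link drops packets", {"blue": 0, "black": 600, "brown": 600, "green": 300}),
    ("green ramps up", {"blue": 0, "black": 600, "brown": 600, "green": 500}),
]


def link_loads(rates):
    """Sum the rates of the flows that cross each link."""
    return {name: sum(rates[f] for f in link["flows"]) for name, link in LINKS.items()}


def overloaded(rates):
    """Return the links whose offered load exceeds their capacity."""
    loads = link_loads(rates)
    return [name for name, link in LINKS.items() if loads[name] > link["capacity"]]


for label, rates in STAGES:
    print(label, link_loads(rates), overloaded(rates))
```

Only the second stage overloads a link (the middle one, at 1600 pps against a capacity of 1200), which is the congestion event that propagates to the brown and green flows.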
Related Work - E2E TCP
• Congestion control and avoidance (Jacobson)
• Equation-based, steady state analyses of losses (Floyd, Mathis, Padhye)
– Predicts Bulk Transfer Capacity and Goodput
– Assumes losses are IID
• Temporal dependence in packet loss (Moon)
• TCP in a bursty world (Balakrishnan, Huang, Paxson, Willinger)
– Especially in long paths
Related Work - Network Aware
• Random Early Detection (Floyd)
– drop packets probabilistically before queue fills
• Explicit Congestion Notification (Floyd, Black, Low)
– like DECbit
– ask senders to slow down
• Coordinated Congestion Control (Padmanabhan)
• Pacing (Savage)
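The Random Early Detection bullet above can be illustrated with a simplified sketch of its drop decision. It shows only the linear ramp between two thresholds and the moving average of the queue length. The full algorithm also spaces drops out using a count of packets since the last drop, which is omitted here. All parameter values in the usage line are hypothetical.

```python
def update_average(avg_queue, current_queue, weight):
    """Exponentially weighted moving average of the queue length."""
    return (1 - weight) * avg_queue + weight * current_queue


def red_drop_probability(avg_queue, min_th, max_th, max_p):
    """Simplified RED drop probability as a function of the average queue.

    Below min_th nothing is dropped, at or above max_th everything is
    dropped, and in between the probability rises linearly up to max_p.
    """
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)


# Hypothetical thresholds: start dropping at 5 packets, drop everything at 15.
for avg in (2, 5, 10, 14, 15):
    print(avg, red_drop_probability(avg, min_th=5, max_th=15, max_p=0.1))
```

Because drops begin while the queue still has room, senders see a few early losses and back off before the queue overflows.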
Related Work - Gathering the Data
• Critical Path Analysis (Barford)
– identifying the specific packets that contribute to delay
• FlowScan (Plonka)
– passive data gathering at an aggregate level
• PacketScope (Rexford)
– combining passive data from neighboring nodes
Related Work - Macroscopic
• NPD (Paxson)
– 1,000 distinct Internet paths
• NIMI (Mathis)
– 36 systems
– also a measurement framework
• Surveyor (Kalidindi)
– 61 systems capable of measuring 1-way delay and packet loss
– highly accurate clock
• Skitter / CAIDA
– 16 monitors probing 7,600 AS’es
• Mercator (Govindan)
– discovers routers, resolves aliases
Related Work - Topology
• Router-level mapping (Cheswick, Burch)
• Netflow data can be used instead of packet headers. AS Flows as aggregates (Uhlig, Bonaventure)
• AS Graph (Gao, Rexford)
– customer - to - provider
– peer - to - peer
My thesis
I believe understanding disruptive losses is key to understanding congestion events.
Per Connection Congestion Control
[Figure: congestion window in KB (0 to 70) versus time in seconds (1.0 to 7.0), with markers for coarse timeouts, normal packet arrivals, and packets that will need to be retransmitted]
The blue line is the congestion window of this connection. Each connection tries to find the best throughput.
Copyright MKP, 1996 (Peterson & Davie)
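The sawtooth shape of a congestion window can be sketched with additive increase and multiplicative decrease (AIMD). This is a deliberately reduced model: it omits slow start and the coarse timeouts shown in the figure, and the capacity, step sizes, and function name are assumptions made for illustration.

```python
def aimd_trace(rounds, capacity, increase=1.0, decrease=0.5, start=1.0):
    """Return the window size at the start of each round trip.

    The window grows by `increase` per round. When it exceeds `capacity`
    a loss is assumed and the window is multiplied by `decrease`.
    """
    window = start
    trace = []
    for _ in range(rounds):
        trace.append(window)
        if window > capacity:
            # A loss was detected: back off multiplicatively.
            window = max(1.0, window * decrease)
        else:
            # No loss: probe for more bandwidth.
            window += increase
    return trace


print(aimd_trace(rounds=10, capacity=4))
```

Each connection keeps probing upward until it sees a loss, so the loss itself is part of how the connection finds its rate.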
Conceptual (unproven) Simulation
[Figure: Flow Rates. Rate (0 to 900) versus time (0 to 300) for the Blue, Black, Brown and Green flows; network labels 400, 800, 400, 800, 300, 1200, 1100]
Conceptual (unproven) Simulation
[Figure: Congestion Events. Rate (0 to 900) versus Time (0 to 300) for the Blue, Black, Brown and Green flows, with congestion events (CongEvt) marked; network labels 400, 800, 400, 800, 300, 1200, 1100]
Simple bottleneck - pacing losses
[Figure: Losses during steady pacing. Loss (0 or 1) versus packet number (0 to 200)]
Multiple Congestion
[Figure: Losses on complex path. Loss (0 or 1) versus time (0 to 2), annotated with the desired signal and the disruptive losses]
Analysis Example
[Figure: Wavelets Level 4. Daub18 and Daub8 series (-0.6 to 0.6) versus time (-2.00 to 5.00), with the disruptive losses marked]
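A multi-level wavelet decomposition of a loss series can be sketched in a few lines. The slide uses Daubechies wavelets (Daub8 and Daub18); this sketch swaps in the much simpler unnormalized Haar wavelet, so its coefficients will not match the figure. The sample loss series is invented for the example.

```python
def haar_step(signal):
    """One level: pairwise averages (approximation) and half-differences (detail)."""
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_decompose(signal, levels):
    """Return (final_approximation, [detail_level_1, ..., detail_level_n])."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) < 2 or len(approx) % 2:
            raise ValueError("signal length must be divisible by 2**levels")
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


# Hypothetical loss indicator series: a burst of losses in samples 8 to 11.
losses = [0] * 8 + [1] * 4 + [0] * 4
approx, details = haar_decompose(losses, levels=4)
print(approx, details[2], details[3])
```

In this toy series the burst is invisible at the two finest levels and shows up only in the level 3 and level 4 detail coefficients, which is the kind of scale separation a level 4 analysis relies on.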
Shared Congestion Suspicion
[Figure: Loss reports at a site. Connections that report congestion (-2.0 to 20.0) versus time (0 to 2)]
Component contributed by the connection in the prior slide
Total of congestion reports from all of the connections that pass through this site
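The total described above can be sketched by counting, per time bin, how many distinct connections through a site report a loss. The bin width, the threshold, and the sample reports are hypothetical. The slides do not specify how suspicion is scored.

```python
from collections import Counter


def reports_per_bin(reports, bin_width):
    """Count distinct connections reporting congestion in each time bin.

    reports: iterable of (connection_id, timestamp_seconds) pairs.
    """
    seen = set()
    counts = Counter()
    for connection, timestamp in reports:
        time_bin = int(timestamp // bin_width)
        # Count each connection at most once per bin.
        if (connection, time_bin) not in seen:
            seen.add((connection, time_bin))
            counts[time_bin] += 1
    return counts


def suspicious_bins(counts, threshold):
    """Bins in which at least `threshold` connections reported congestion."""
    return sorted(b for b, n in counts.items() if n >= threshold)


reports = [("a", 0.10), ("b", 0.12), ("c", 0.14), ("a", 0.16),
           ("d", 0.90), ("b", 1.51), ("c", 1.53), ("d", 1.55)]
counts = reports_per_bin(reports, bin_width=0.25)
print(dict(counts), suspicious_bins(counts, threshold=3))
```

A single connection's loss says little on its own. Several connections reporting in the same bin is what raises the suspicion that they share a congested link.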
Research Plan
• Simulation
• Measurement
• Emulation
• Visualization
• Congestion Inference
• Validation
Simulation
• Develop basic understanding of:
– the data that will be needed
– the experimental environment
– the tools and their limitations
• Simulate simple topologies and demand
– cascade
– loop
– star
Measurement
• Use external measurement infrastructure (e.g. NIMI) to measure the traffic in the Internet at the AS level
– throughput
– delay
– loss
– effective queue length
Emulation
• Construct a lab environment that emulates the conditions found in the wild
– connectivity
– delay attributed to each AS
– effective queue length of each AS
– loss at boundaries
• Record the congestion events
• Observe the extent of congestion propagation
• Measure the effects of multiple congestion
Visualization
• Predict path of packets across AS’es
– Using BGP tables
• Determine best paths
– Subset based on provider - to - customer links
Congestion Inference and Validation
• Using assumed path, attribute data to transit nodes
• Using ECN in the lab, infer location and duration of congestion
• Using flow data, infer location and duration of congestion
• Compare measured congestion to inferred congestion
Expected Contributions
• Foundation for understanding congestion events
– Definition of pacing loss
– Definition of disruptive loss
• Improved understanding of protocols’ performance (e.g. RED, ECN)
• Near real-time congestion localization tools
• Visualization tools
• Analysis of equitable bandwidth sharing
Implications
• Improved tuning and performance of existing protocols
– RED
– ECN
• Fewer surprises when the next killer app hits campus
• Basis for cross-protocol next generation congestion control primitives
• Lossless congestion control
• Reduced delay, reduced jitter, reduced retransmissions when sharing
• Improved Internet fairness, predictability, and scalability
Summary
• Congestion phenomena have not been adequately studied in the wild
• Wisconsin has unique resources to bring to bear
– WAIL
– Wavelet IDR
• Upcoming protocols will need better congestion metrics
Schedule
• Oct 01 Prelim
• Nov 01 Simulation & Wide-area measurements
• Feb 02 Analyze Measurements
• Apr 02 Emulate large subset in the lab
• May 02 Analyze lab results
» congestion propagation
» multiple congestion
• Oct 02 Write thesis
• May 03 Graduate
Questions?
SPARE SLIDES AFTER THIS POINT
Where is the Backbone?
• Use the BGP tables to create an Adjacency Matrix, A(i,j)

A(i,j)  Atl  Bos  Chi   LA   NY   SF
Atl       1    1    0    0    1    0
Bos       1    1    1    0    1    0
Chi       0    1    1    0    0    1
LA        0    0    0    1    0    1
NY        1    1    0    0    1    0
SF        0    0    1    1    0    1

[Figure: graph of the six nodes Atl, Bos, Chi, LA, NY, SF]
Where is the Backbone?
• Compute A^2

A^2     Atl  Bos  Chi   LA   NY   SF
Atl       3    3    1    0    3    0
Bos       3    4    2    0    3    1
Chi       1    2    3    1    1    2
LA        0    0    1    2    0    2
NY        3    3    1    0    3    0
SF        0    1    2    2    0    3

[Figure: graph of the six nodes Atl, Bos, Chi, LA, NY, SF]
Where is the Backbone?
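• Compute A^3, then BB(j)

A^3     Atl  Bos  Chi   LA   NY   SF  BB?
Atl       9   10    4    0    9    1  FALSE
Bos      10   12    7    1   10    3  TRUE
Chi       4    7    7    3    4    6  TRUE
LA        0    1    3    4    0    5  FALSE
NY        9   10    4    0    9    1  FALSE
SF        1    3    6    5    1    7  TRUE

[Figure: graph of the six nodes Atl, Bos, Chi, LA, NY, SF]
BB(j) is computed over i = 1, ..., n from A^3(i,j) and A^2(i,j). In this example BB? is TRUE exactly for the columns of A^3 with no zero entry (Bos, Chi, SF).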
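The three "Where is the Backbone?" steps can be reproduced with plain matrix multiplication. The adjacency matrix is copied from the slides. The backbone test is an assumption: it marks a node TRUE when every entry in its column of A^3 is nonzero, meaning every node reaches it within 3 hops. That rule reproduces the BB? column here, but it is not necessarily the exact formula on the slide.

```python
NODES = ["Atl", "Bos", "Chi", "LA", "NY", "SF"]

# Adjacency matrix A(i,j) from the slide (each node is adjacent to itself).
A = [
    [1, 1, 0, 0, 1, 0],  # Atl
    [1, 1, 1, 0, 1, 0],  # Bos
    [0, 1, 1, 0, 0, 1],  # Chi
    [0, 0, 0, 1, 0, 1],  # LA
    [1, 1, 0, 0, 1, 0],  # NY
    [0, 0, 1, 1, 0, 1],  # SF
]


def matmul(x, y):
    """Multiply two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def reached_by_all(power, j):
    """Assumed backbone test: column j of the matrix has no zero entry."""
    return all(power[i][j] > 0 for i in range(len(power)))


A2 = matmul(A, A)
A3 = matmul(A2, A)
backbone = {name: reached_by_all(A3, j) for j, name in enumerate(NODES)}
print(backbone)
```

Entry (i,j) of A^k counts the walks of length k from i to j, so a nonzero entry means j is within k hops of i.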
12 AS’es that see 90% of the AS Graph within 3 hops
AS     Name            Degree  Size  Peers  Depth
1      BBNPlanet          171   394     11      4
174    PSINet              25    51      8      4
286    EUNet               21    56      6      3
293    ESNet               41   186      8      3
701    Alternet           491   972     10      5
1239   Sprint             408   715      7      4
2828   Concentric          30   136      7      5
2914   Verio              152   980     11      5
3561   Cable&Wireless     287   603      8      3
6453   Teleglobe           58   381      8      5
6461   AboveNet            58  1211      9      5
7018   AT&T               282   698      9      4
Explicit Congestion Notification
[Figure: IPv4 Packet Header. Bit offsets 0, 4, 8, 16, 19, 31; fields Version, HLen, TOS, Length, Ident, Flags, Offset, TTL, Protocol, Checksum, SourceAddr, DestinationAddr, Options (variable), Pad (variable), Data; the ECN field is called out]
ECT = ECN Capable Transport
CE = Congestion Experienced
00 = Not ECN
01 = ECT(1)
10 = ECT(0)
11 = CE
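The four codepoints can be read from, and written to, the two low-order bits of the IPv4 TOS byte. The following is a minimal sketch with hypothetical function names. It handles only the byte itself, not packet parsing or checksum updates.

```python
ECN_MASK = 0b11

# Codepoint names as listed on the slide.
ECN_CODEPOINTS = {
    0b00: "Not ECN",
    0b01: "ECT(1)",
    0b10: "ECT(0)",
    0b11: "CE",
}


def ecn_codepoint(tos_byte):
    """Name the ECN codepoint carried in the low two bits of the TOS byte."""
    return ECN_CODEPOINTS[tos_byte & ECN_MASK]


def mark_congestion(tos_byte):
    """Set CE on an ECN-capable packet; leave non-ECN packets unchanged.

    A router cannot mark a packet whose transport is not ECN capable, so for
    those packets it has to fall back to dropping.
    """
    if tos_byte & ECN_MASK == 0b00:
        return tos_byte
    return tos_byte | ECN_MASK


print(ecn_codepoint(0x02), ecn_codepoint(mark_congestion(0x02)))
```

The upper six bits of the byte are left untouched, so marking does not disturb whatever else the TOS field carries.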
Operational Questions
• I want to see relevant Internet congestion at a glance
– What is the topology of the Internet?
• Timely updating
– What level of detail is appropriate?
• Router Level vs. AS Level
• Hierarchical
• Can we gather the data realistically?
– Active vs. Passive
– Netflow / FlowScan
– Infer congestion