Experiences in Design and Implementation of a High Performance Transport Protocol

Yunhong Gu, Xinwei Hong, and Robert L. Grossman

National Center for Data Mining

Outline

• TCP’s inefficiency in grid applications

• UDT

• Design issues

• Implementation issues

• Conclusion and future work

TCP and AIMD

• TCP has been very successful in the Internet
– AIMD (Additive Increase Multiplicative Decrease)
• Fair: max-min fairness
• Stable: globally asynchronously stable
• But inefficient and not scalable
– In grid networks (with high bandwidth-delay product)
• RTT bias

Efficiency of TCP

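• On a 1 Gb/s link with 200 ms RTT (between Tokyo and Chicago), TCP takes 28 minutes to recover from a single loss.
• On a 10 Gb/s link with 200 ms RTT, it takes 4 hours 43 minutes to recover from a single loss.
• TCP's throughput model: $\frac{S}{RTT}\sqrt{\frac{3}{2p}}$, where S is the packet size and p is the packet loss rate.
• TCP needs an extremely low loss rate on high bandwidth-delay product networks.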
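The recovery times and the throughput model above can be checked with a few lines of arithmetic. The sketch below assumes standard AIMD behaviour (the window is halved on a loss and then grows by one packet per RTT) and a packet size of 1500 bytes; both are assumptions for illustration, and the function names are hypothetical. With these inputs the results land close to, but not exactly on, the figures quoted on the slide, which presumably used a slightly different packet size.

```cpp
#include <cassert>
#include <cmath>

// Window (in packets) needed to fill the link: bandwidth-delay product / packet size.
double bdp_packets(double link_bps, double rtt_s, double packet_bytes) {
    assert(link_bps > 0.0 && rtt_s > 0.0 && packet_bytes > 0.0);
    return link_bps * rtt_s / (8.0 * packet_bytes);
}

// Seconds for AIMD to regain the full window after one loss:
// the window is halved, then grows by one packet per RTT.
double aimd_recovery_seconds(double link_bps, double rtt_s, double packet_bytes) {
    return (bdp_packets(link_bps, rtt_s, packet_bytes) / 2.0) * rtt_s;
}

// Throughput model from the slide, (S / RTT) * sqrt(3 / (2p)), in bits per second.
double tcp_model_bps(double packet_bytes, double rtt_s, double loss_rate) {
    assert(loss_rate > 0.0 && rtt_s > 0.0);
    return (8.0 * packet_bytes / rtt_s) * std::sqrt(3.0 / (2.0 * loss_rate));
}

// Loss rate p at which the model yields target_bps (the model solved for p).
double required_loss_rate(double target_bps, double rtt_s, double packet_bytes) {
    assert(target_bps > 0.0 && rtt_s > 0.0);
    const double ratio = 8.0 * packet_bytes / (rtt_s * target_bps);
    return 1.5 * ratio * ratio;
}
```

For 1 Gb/s and 200 ms RTT, `aimd_recovery_seconds` gives about 1667 s (roughly 28 minutes), and `required_loss_rate` gives about 5.4e-9, which illustrates how low the loss rate must be for TCP to sustain the full link rate. Because the model is proportional to 1/RTT, it also reflects the RTT bias noted earlier.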

Fairness of TCP

• Merge two real-time data streams at Chicago 2
– From Chicago 1 to Chicago 2 (1 Gb/s, 1 ms RTT): 800 Mbps
– From Amsterdam to Chicago 2 (1 Gb/s, 100 ms RTT): 80 Mbps
• The throughput is limited by the slowest stream!

UDT – UDP-based Data Transfer Protocol

• Application level transport protocol built above UDP

• Reliable data delivery

• End-to-end approach

• Bi-directional

• General transport API; not a (file transfer) tool.

• Open source

UDT Architecture

[Figure: UDT architecture. The sender and receiver exchange DATA, ACK, ACK2, and NAK packets, driven by the packet scheduling, ACK, NAK, retransmission, and rate control timers.]

UDT – Objectives

• Goals
– Easy to install and use
– Efficient for bulk data transfer
– Fair
– Friendly to TCP

• Non-goals
– TCP replacement
– Messaging service

Design Issues

• Reliability/Acknowledging

• Congestion/Flow Control

• Performance evaluation
– Efficiency
– Fairness and friendliness
– Stability

Reliability/Acknowledging

• Acknowledging is expensive
– Packet processing at end hosts and routers
– Buffer processing

• Timer-based selective acknowledgement
– Send an acknowledgement at a constant time interval (if there are packets to be acknowledged)

• Explicit negative acknowledgement

Congestion Control

• AIMD with decreasing increases

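• Increase formula: $\alpha(x) = 10^{\lceil \log_{10}(L - C(x)) \rceil - 9} \times \frac{1500}{S} \times \frac{1}{SYN}$
– x is the sending rate in packets per second, and C(x) is the same rate in bits per second
– L is the link capacity in bits per second, and S is the packet size in bytes

• Decrease
– The sending rate is decreased by 1/9

• Control interval is constant
– SYN = 0.01 second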

UDT Algorithm

L = 10 Gbps, S = 1500 bytes

C (Mbps)          L - C (Mbps)      Increment (pkts/SYN)
[0, 9000)         (1000, 10000]     10
[9000, 9900)      (100, 1000]       1
[9900, 9990)      (10, 100]         0.1
[9990, 9999)      (1, 10]           0.01
[9999, 9999.9)    (0.1, 1]          0.001
9999.9+           <0.1              0.00067
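The table can be reproduced by a small function. The sketch below is an illustration, not the UDT source: it assumes the increment is a power of ten chosen from the spare bandwidth L - C in bits per second, scaled by 1500/S, with a lower bound of 1/S packets per SYN. The lower bound is an assumption inferred from the last table row (0.00067 is about 1/1500), and the function name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Rate increment in packets per SYN interval.
// link_bps is L, rate_bps is C, packet_bytes is S (L and C in bits per second).
double udt_increment_pkts_per_syn(double link_bps, double rate_bps, double packet_bytes) {
    assert(link_bps > 0.0 && packet_bytes > 0.0);
    // Assumed minimum increment, inferred from the last row of the table.
    const double floor_inc = 1.0 / packet_bytes;
    const double spare_bps = link_bps - rate_bps;
    if (spare_bps <= 0.0) {
        return floor_inc;
    }
    // One decade of spare bandwidth maps to one decade of increment.
    const double exponent = std::ceil(std::log10(spare_bps)) - 9.0;
    const double inc = std::pow(10.0, exponent) * 1500.0 / packet_bytes;
    return std::max(inc, floor_inc);
}
```

For example, with L = 10 Gbps and S = 1500 bytes, a flow currently sending at 5 Gbps would increase by 10 packets per SYN, and a flow at 9.5 Gbps by 1 packet per SYN, matching the first two rows.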

UDT: Efficiency and Fairness Characteristics

• Takes 7.5 seconds to reach 90% of the link capacity, independent of BDP

• Satisfies max-min fairness if all the flows have the same end-to-end link capacity
– Otherwise, any flow will obtain at least half of its fair share

• Does not take more bandwidth than a concurrent TCP flow as long as

6/10822 SYNLRTT
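The 7.5 second figure can be reproduced with a short simulation. This is a sketch under stated assumptions: a single flow starts at rate 0, sees no loss, and applies the per-SYN increment from the UDT Algorithm table (a power of ten chosen from the spare bandwidth in bits per second, scaled by 1500/S). The function name is hypothetical, and only link capacities of 1 Gb/s and 10 Gb/s were checked.

```cpp
#include <cassert>
#include <cmath>

// Seconds for a loss-free flow starting at rate 0 to reach `fraction`
// of the link capacity, stepping once per SYN interval (0.01 s).
double udt_ramp_up_seconds(double link_bps, double fraction, double packet_bytes) {
    assert(link_bps > 0.0 && packet_bytes > 0.0);
    assert(fraction > 0.0 && fraction < 1.0);
    const double syn_seconds = 0.01;
    const double syns_per_second = 100.0;
    const double target_bps = fraction * link_bps;
    double pkts_per_syn = 0.0;  // current sending rate
    long intervals = 0;
    for (;;) {
        const double rate_bps = pkts_per_syn * packet_bytes * 8.0 * syns_per_second;
        if (rate_bps >= target_bps) {
            break;
        }
        // Spare bandwidth is positive here because rate < target < link.
        const double spare_bps = link_bps - rate_bps;
        const double exponent = std::ceil(std::log10(spare_bps)) - 9.0;
        pkts_per_syn += std::pow(10.0, exponent) * 1500.0 / packet_bytes;
        ++intervals;
    }
    return intervals * syn_seconds;
}
```

With L = 10 Gb/s and S = 1500 bytes, the rate grows by 10 packets per SYN until 9 Gb/s (7500 packets per SYN) is reached, which takes about 750 intervals, or roughly 7.5 seconds. The RTT does not appear anywhere in the loop, which is why the ramp-up time does not depend on the bandwidth-delay product.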

Efficiency

[Figure: throughput (Mbps) versus time (s) for UDT flows to Chicago (1 Gbps, 0.04 ms), to Canarie (OC-12, 16 ms), and to Amsterdam (1 Gbps, 110 ms).]

• UDT bandwidth utilization
– 960 Mb/s on 1 Gb/s
– 580 Mb/s on OC-12 (622 Mb/s)

Fairness

[Figure: throughput (Mbps) versus time (s) of the three concurrent flows.]

• Fair bandwidth sharing between networks with different RTTs and bottleneck capacities
– 330 Mb/s each for the 3 flows from Chicago to: Chicago (local) via 1 Gb/s, Amsterdam via 1 Gb/s, and Ottawa via 622 Mb/s

Fairness

[Figure: fairness index versus RTT (ms, log scale) for UDT and TCP.]

• Fairness index
– Simulation: Jain's Fairness Index for 10 UDT and TCP flows over a 100 Mb/s link with different RTTs

RTT Fairness

[Figure: RTT fairness index versus RTT (ms, log scale).]

• Fairness index of TCP flows with different RTTs
– 2 flows: one has 1 ms RTT, the other varies from 1 ms to 1000 ms

Fairness and Friendliness

50 TCP flows and 4 UDT flows between SARA and StarLight

Realtime snapshot of the throughput

The 4 UDT flows have similar performance and leave enough space for TCP flows

TCP Friendliness

[Figure: TCP throughput (Mbps) versus number of UDT flows.]

• Impact on short-lived TCP flows
– 500 TCP flows of 1 MB each with 1 to 10 bulk UDT flows, over a 1 Gb/s link between Chicago and Amsterdam

Stability

[Figure: stability index versus RTT (ms, log scale) for UDT and TCP.]

• Stability index of UDT and TCP
– Stability: average standard deviation of throughput per unit time
– 10 UDT flows and 10 TCP flows with different RTTs
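One way to compute such a stability index is sketched below. The slide only states that it is an average standard deviation of throughput per unit time, so the details here are assumptions: each flow's sample standard deviation is divided by that flow's mean throughput so that flows of different speeds are comparable, and the result is averaged over flows. Lower values mean steadier throughput. The function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// samples_per_flow[i] holds the throughput of flow i sampled once per unit time.
// Returns the mean over flows of (sample standard deviation / mean throughput).
double stability_index(const std::vector<std::vector<double>>& samples_per_flow) {
    assert(!samples_per_flow.empty());
    double total = 0.0;
    for (const auto& samples : samples_per_flow) {
        assert(samples.size() >= 2);
        const double n = static_cast<double>(samples.size());
        double mean = 0.0;
        for (double x : samples) {
            mean += x;
        }
        mean /= n;
        assert(mean > 0.0);
        double sq_dev = 0.0;
        for (double x : samples) {
            sq_dev += (x - mean) * (x - mean);
        }
        const double stddev = std::sqrt(sq_dev / (n - 1.0));
        total += stddev / mean;  // normalization by the mean is an assumption
    }
    return total / static_cast<double>(samples_per_flow.size());
}
```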

Implementation Issues

• Efficiency and CPU utilization

• Loss information processing

• Memory management

• API

• Conformance

Efficiency and CPU utilization

• Efficiency = Mbps/MHz

• Maximize throughput
– Use as little CPU time as possible, so that the CPU is not used up before the network bottleneck is reached
– Remove CPU bursts, which can cause packet loss: distribute processing evenly

• Minimize CPU utilization

Loss Processing

• On high BDP networks, the number of lost packets can be very large during a loss event

• Access to the loss information may take a long time

• Acknowledgement may take several packets

[Figure: number of lost packets per loss event.]

Loss Processing

• UDT loss processing
– Most losses are contiguous
– Record loss events rather than individual lost packets
– Access time is almost constant

[Figure: access time (µs) per loss event.]
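The idea of recording loss events rather than individual packets can be sketched as a list of sequence-number ranges. The class below is an illustration only and not UDT's actual data structure: `LossList` and its methods are hypothetical names, it uses an ordered map (logarithmic rather than constant-time access), and it ignores sequence-number wrap-around. It does show why a burst of thousands of contiguous losses costs a single entry.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>

class LossList {
public:
    // Record that every sequence number in [first, last] was lost,
    // merging with any stored range that overlaps or touches it.
    void insert(std::int64_t first, std::int64_t last) {
        assert(first <= last);
        auto it = ranges_.upper_bound(first);
        if (it != ranges_.begin()) {
            auto prev = std::prev(it);
            if (prev->second + 1 >= first) {  // overlaps or is adjacent
                first = prev->first;
                if (prev->second > last) last = prev->second;
                it = ranges_.erase(prev);
            }
        }
        while (it != ranges_.end() && it->first <= last + 1) {
            if (it->second > last) last = it->second;
            it = ranges_.erase(it);
        }
        ranges_[first] = last;
    }

    // True if seq falls inside any recorded loss range.
    bool contains(std::int64_t seq) const {
        auto it = ranges_.upper_bound(seq);
        if (it == ranges_.begin()) return false;
        return std::prev(it)->second >= seq;
    }

    // Number of stored loss events (ranges).
    std::size_t events() const { return ranges_.size(); }

    // Total number of lost packets covered by all ranges.
    std::int64_t packets() const {
        std::int64_t total = 0;
        for (const auto& r : ranges_) total += r.second - r.first + 1;
        return total;
    }

private:
    std::map<std::int64_t, std::int64_t> ranges_;  // range start -> range end (inclusive)
};
```

Inserting the ranges 100 to 199 and 200 to 250 leaves one stored event covering 151 packets, so the storage and lookup cost follows the number of loss events, not the number of lost packets.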

Memory Processing

• Memory copy avoidance
• Overlapped IO
• Data scattering/gathering
• Speculation of next packet

[Figure: data and new data moving between the protocol buffer and the user buffer.]

API

• Socket-like API

• Support overlapped IO

• File transfer API
– sendfile/recvfile

• Thread safe

• Performance monitoring

API - Example

// UDT
UDTSOCKET client = UDT::socket(AF_INET, SOCK_STREAM, 0);
UDT::connect(client, (sockaddr*)&serv_addr, sizeof(serv_addr));
if (UDTERROR == UDT::send(client, data, size, 0)) {
    // error processing
}

// TCP (BSD sockets), for comparison
int client = socket(AF_INET, SOCK_STREAM, 0);
connect(client, (sockaddr*)&serv_addr, sizeof(serv_addr));
if (-1 == send(client, data, size, 0)) {
    // error processing
}

Implementation Efficiency

[Figure: CPU usage (%) per sample event for UDT sending, UDT receiving, TCP sending, and TCP receiving.]

• CPU usage of UDT and TCP
– UDT takes about 10% more CPU than TCP
– More code optimizations are ongoing

Conclusion

• TCP is not suitable for distributed data-intensive applications over grid networks

• We introduced a new application-level protocol named UDT to overcome the shortcomings of TCP

• We explained the design rationale and implementation details in this paper

Future Work

• Bandwidth Estimation

• CPU utilization
– Self-clocking
– Code optimization

• Theoretical work

References

• More details can be found in our paper.

• UDT specification
– draft-gg-udt-01.txt

• Congestion control
– Paper at the GridNets '04 workshop

• UDT open source project
– http://udt.sf.net

Thank you!

Questions and comments are welcome!

For more information, please visit

Booth 653 (UIC/NCDM) at Exhibition Floor

UDT Project: http://udt.sf.net

NCDM: http://www.ncdm.uic.edu