End-host Route Selection in the CHEETAH Networking Solution

1 End-host Route Selection in the CHEETAH Networking Solution Zhanxiang Huang 05/01/2006 Advisor: Malathi Veeraraghavan Master’s Project Presentation Acknowledgement: This work was carried out under the sponsorship of NSF ITR-0312376, NSF ANI-0335190, NSF ANI- 0087487, and DOE DE-FG02-04ER25640 grants.

Page 1: End-host Route Selection in the CHEETAH Networking Solution

End-host Route Selection in the CHEETAH Networking Solution

Zhanxiang Huang

05/01/2006

Advisor: Malathi Veeraraghavan

Master’s Project Presentation

Acknowledgement: This work was carried out under the sponsorship of NSF ITR-0312376, NSF ANI-0335190, NSF ANI-0087487, and DOE DE-FG02-04ER25640 grants.

Page 2: End-host Route Selection in the CHEETAH Networking Solution

Outline

• CHEETAH project overview
• End-host route selection problem
• Model-based solution
• Measurement-based solution
• Conclusion and future work

Page 3: End-host Route Selection in the CHEETAH Networking Solution

Circuit-switched High-speed End-to-End Transport ArcHitecture (CHEETAH)

Goal: high-speed, rate-guaranteed end-to-end circuits with call-by-call bandwidth sharing.

[Figure: existing options for an end-to-end connection. Connectionless best-effort Internet: congestion, delay, jitter, loss. Telephony network: 64 kbps circuits. Long-term leased line: under-utilized and expensive.]

Page 4: End-host Route Selection in the CHEETAH Networking Solution

CHEETAH Applications

• Applications:
  – video telephony
  – high-speed file transfer
  – remote visualization

These applications matter especially in the eScience community, e.g. the Terascale Supernova Initiative (TSI) project.

Page 5: End-host Route Selection in the CHEETAH Networking Solution

Current CHEETAH Network

Components: high-speed network, dynamic signaling scheme, end-host software.

[Figure: current CHEETAH network. SN16000 switches (each with a control card, OC192 cards, and GbE/10GbE cards) at ORNL, Atlanta, and NC are connected by OC-192 links, and each switch has a signaling engine. Labeled end sites and hosts: Cray X-1 (ORNL), GTech, NCSU, UVA, CUNY.]

Page 6: End-host Route Selection in the CHEETAH Networking Solution

CHEETAH End-host Software Architecture

– OCS: checks Optical Connection Service availability.
– Routing Decision: chooses between the circuit and the Internet path for each file transfer.
– RSVP-TE Module: provisions circuits dynamically.
– C-TCP: transport-layer protocol optimized for circuits.

[Figure: two end-hosts, each running an application over TCP/IP and C-TCP alongside the CHEETAH software (OCS Client, Routing Decision, RSVP-TE Module). Each host has two NICs (NIC I and NIC II); one path between the hosts crosses the Internet and the other crosses the CHEETAH network.]

Page 7: End-host Route Selection in the CHEETAH Networking Solution

Circuit or Internet Path?

• Circuit setup requests may be denied.
• The choice depends on the data transfer delays on the two paths.

An extreme example: transfer a 1 KB file using TCP.

– Internet (best-effort path): round trip time = 24 ms, bottleneck link rate = 100 Mbps. The Internet transfer delay is about 100 ms.
– CHEETAH network (circuit): round trip time = 8 ms, circuit rate = 1 Gbps, setup delay = 5 seconds. The circuit transfer delay is about 5.1 seconds.

[Figure: two end-hosts connected by both the Internet (best-effort path) and the CHEETAH network (circuit).]

Page 8: End-host Route Selection in the CHEETAH Networking Solution

What Determines Data Transfer Delays?

• Over paths:
  – Circuit:
    • Circuit rate
    • Round trip time
    • Setup delay
  – Internet:
    • Round trip time
    • Bottleneck link rate
    • Packet loss rate
• At end-hosts:
  – Transport layer protocol and parameter settings
  – OS process scheduling
  – Hard disk throughput

Page 9: End-host Route Selection in the CHEETAH Networking Solution

How to Estimate Data Transfer Delays?

• Model-based solution
  – Construct mathematical models for computing file transfer delays over the circuit and Internet paths.
• Measurement-based solution
  – Estimate file transfer delays based on delay measurements of past file transfers.

Page 10: End-host Route Selection in the CHEETAH Networking Solution

Model-based Solution

• Modeling TCP delay over the Internet path
  – TCP Reno delay model [UMass98]
• Modeling delay over a CHEETAH circuit
  – Let $P_b$ be the call blocking probability.
  – The average delay over the circuit is

$$(1 - P_b)\,(\text{setup\_delay} + \text{transfer\_delay\_over\_circuit}) + P_b\,(\text{average\_setup\_failure\_delay} + \text{delay\_over\_Internet})$$
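The average-delay expression on this slide weights the successful-setup case by (1 - Pb) and the blocked case by Pb. A minimal Python sketch of that weighting follows; the function and parameter names are illustrative assumptions, not part of the CHEETAH software.

```python
def average_circuit_delay(p_block, setup_delay, circuit_transfer_delay,
                          setup_failure_delay, internet_delay):
    """Expected delay (seconds) when a transfer first attempts a circuit.

    With probability (1 - p_block) the setup succeeds and the file goes over
    the circuit; with probability p_block the call is blocked and the file
    falls back to the Internet path after the failed setup attempt.
    All names here are hypothetical.
    """
    if not 0.0 <= p_block <= 1.0:
        raise ValueError("p_block must be a probability in [0, 1]")
    success = (1.0 - p_block) * (setup_delay + circuit_transfer_delay)
    blocked = p_block * (setup_failure_delay + internet_delay)
    return success + blocked
```

With a blocking probability of zero the result reduces to setup delay plus circuit transfer delay, which is the best case considered later in the deck.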

Page 11: End-host Route Selection in the CHEETAH Networking Solution

Inputs to Delay Models

• Inputs to TCP Reno delay model:
  – File size
  – Bottleneck link rate
  – Round trip time
  – Packet loss rate
  – Initial congestion window size
  – Sender and receiver buffer sizes
• Inputs to circuit delay model:
  – File size
  – Circuit rate
  – Round trip time over the circuit path
  – Round trip time over the signaling path
  – Call processing delay at each switch
  – Signaling engine call load
  – Number of switches on the path
  – Call blocking probability

Page 12: End-host Route Selection in the CHEETAH Networking Solution

Limitations of the Model-based Solution

• Packet loss rate is difficult to measure. (Tools that I tested include Sting, iperf, ping, and badabing.)
• The same is true of call blocking probability and signaling engine call load.
• Many TCP variants are emerging, but there is no delay model for them yet.
  – e.g. BIC-TCP has been included in Linux kernel 2.6 but has not been modeled yet.

Page 13: End-host Route Selection in the CHEETAH Networking Solution

Measurement-based Solution

• Assumptions
  – Circuit rates are fixed, e.g. 1 Gbps, 100 Mbps, …
  – The number of destinations with which an end-host typically communicates is not large.
  – Internet traffic has repeating patterns over time, which means that during a specific recurring time period, the round trip time, packet loss rate, and call blocking probability are likely to be the same.

[Figure: delay versus file size for the circuit and the Internet path; the two curves intersect at the crossover file size.]

Idea: Discretize time and file size. In each time slot, for each destination and each circuit rate, measure the delays of file transfers over both paths to find the crossover file size.

Page 14: End-host Route Selection in the CHEETAH Networking Solution

Active and Passive Measurements

• Active measurements
  – Traffic is injected into the network explicitly for the purpose of obtaining measurements.
• Passive measurements
  – Data is collected under normal network usage.

Page 15: End-host Route Selection in the CHEETAH Networking Solution

A Best-case Active-measurement Experiment

Best-case means packet loss rate and call blocking probability are equal to zero. TCP buffers are set to Bandwidth Delay Product values.

Drawback: significant measurement traffic overhead

Page 16: End-host Route Selection in the CHEETAH Networking Solution

Active Measurements

Delays on Internet path and circuit are random variables, DI and DC.

1. Find an interval (min, max) that contains the crossover file size;

2. Measure delays on both paths for file size mid=(min+max)/2;

3. If |E(DI)-E(DC)| < e, where e is a small tolerance, set crossover=mid and stop;

4. If E(DI)>E(DC), max=mid;

5. If E(DI)<E(DC), min=mid;

6. Go to 2;

[Figure: delay versus file size for the circuit and the Internet path, with min, mid, and max marked on the file-size axis around the crossover point.]

Drawback: measurement traffic overhead

Let M be the initial max file size and N be the initial min file size. Traffic size = O(M*log(M-N)).
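The numbered steps on this slide amount to a bisection on file size. The sketch below assumes two hypothetical callables, `measure_internet` and `measure_circuit`, that return the mean measured delay in seconds for a given file size; in a real deployment each call would trigger actual test transfers, which is the overhead the slide warns about.

```python
def find_crossover(measure_internet, measure_circuit, lo, hi, eps, max_iter=50):
    """Bisection search for the crossover file size.

    measure_internet / measure_circuit: hypothetical callables mapping a
    file size to the mean transfer delay (seconds) on that path.
    (lo, hi): initial interval assumed to contain the crossover.
    eps: tolerance on |E(DI) - E(DC)|, the 'e' in step 3.
    """
    for _ in range(max_iter):
        mid = (lo + hi) / 2.0
        d_i = measure_internet(mid)
        d_c = measure_circuit(mid)
        if abs(d_i - d_c) < eps:
            return mid          # step 3: delays are close enough
        if d_i > d_c:
            hi = mid            # step 4: circuit already wins, look lower
        else:
            lo = mid            # step 5: Internet still wins, look higher
    return (lo + hi) / 2.0      # give up refining after max_iter probes


# Synthetic stand-ins for real measurements (sizes in bits, delays in seconds).
internet = lambda size: size / 100e6          # 100 Mbps bottleneck
circuit = lambda size: 5.0 + size / 1e9       # 5 s setup, 1 Gbps circuit
estimate = find_crossover(internet, circuit, 0.0, 1e10, eps=1e-3)
```

The `max_iter` cap is an addition for safety with noisy measurements; the slide's loop has no explicit bound.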

Page 17: End-host Route Selection in the CHEETAH Networking Solution

Passive Measurements

1. Initialize (min, max) to (0, +inf).

2. If file size < min, choose the Internet path;

3. If file size > max, choose the circuit;

4. If min <= file size <= max, choose each path with probability ½ and record the data transfer delay.

5. Once there are sufficient records to compute Pr(DI-DC>0) for a file size in (min, max), adjust min or max based on Pr(DI-DC>0).

[Figure: p = Pr(DI-DC>0) versus file size, on a scale from 0 to 1; min and max bracket the crossover file size, which is marked at p = 1/2.]

(Note that min and max are file sizes seen in application queries, and that DI and DC are assumed to follow normal distributions.)
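Steps 2 to 4 of the passive scheme can be sketched as a small routing function. This is an illustration only; the function name, the string labels, and the `rng` parameter are assumptions introduced here.

```python
import math
import random


def choose_path(file_size, lo=0.0, hi=math.inf, rng=random):
    """Pick a path for one transfer under the passive-measurement scheme.

    (lo, hi) is the current (min, max) interval around the crossover file
    size; it starts at (0, +inf), so every early transfer is randomized.
    rng is any object with a random() method (hypothetical hook for tests).
    """
    if file_size < lo:
        return "internet"       # step 2: small files use the Internet path
    if file_size > hi:
        return "circuit"        # step 3: large files use the circuit
    # Step 4: inside the uncertain interval, try both paths equally often
    # so that delay records accumulate for DI and DC.
    return "internet" if rng.random() < 0.5 else "circuit"
```

The caller would record the observed delay of each randomized transfer and later narrow (lo, hi) as described in step 5.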

Page 18: End-host Route Selection in the CHEETAH Networking Solution

Hybrid Measurements

• Fast startup
  – Find the bottleneck link rate of the Internet path and the circuit setup delay through either passive or active measurement.
  – Solve the following equation for file_size:

$$\frac{\text{file\_size}}{\text{circuit\_rate}} + \text{estimated\_setup\_delay} = \frac{\text{file\_size}}{\text{Internet\_path\_bottleneck\_link\_rate}}$$

  – Initialize (min, max) to (file_size/2, file_size*2).
• Use active measurements when initiated by administrator users.
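The fast-startup equation balances circuit delay (file_size/circuit_rate plus the estimated setup delay) against Internet delay (file_size divided by the bottleneck link rate), so it has a closed-form solution. The sketch below solves it and builds the initial interval; the function name and units (bits, bits per second, seconds) are assumptions.

```python
def fast_startup_interval(setup_delay_s, circuit_rate_bps, bottleneck_rate_bps):
    """Solve size/circuit_rate + setup_delay = size/bottleneck_rate for size.

    Returns (size_bits, (min_bits, max_bits)) with the interval initialized
    to (size/2, size*2) as on the slide. Names and units are hypothetical.
    """
    if bottleneck_rate_bps >= circuit_rate_bps:
        # The Internet path is never slower per bit, so no crossover exists.
        raise ValueError("circuit rate must exceed the bottleneck link rate")
    size = setup_delay_s / (1.0 / bottleneck_rate_bps - 1.0 / circuit_rate_bps)
    return size, (size / 2.0, size * 2.0)


# Values used elsewhere in the deck: 5 s setup, 1 Gbps circuit, 100 Mbps bottleneck.
size_bits, (lo_bits, hi_bits) = fast_startup_interval(5.0, 1e9, 100e6)
size_mbytes = size_bits / 8e6   # roughly 69 MByte
```

Under these inputs the estimate lands near 69 MByte, which is in line with the 50 MByte to 70 MByte crossover range shown in the bookkeeping example.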

Page 19: End-host Route Selection in the CHEETAH Networking Solution

Bookkeeping Data Structure

Time Slot              Destination     Circuit Rate   Crossover File Size
02:00 – 03:00 Sunday   128.109.34.22   1Gbps          50MByte – 70MByte

Transfer Delay Records for this entry:

File Size   DI (sec)   DC (sec)
50MByte     5.081      5.715
60MByte     5.060      5.066
70MByte     5.033      4.002
…           …          …
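One way to picture this bookkeeping table in code is a dictionary keyed by (time slot, destination, circuit rate), with each entry holding the crossover range and its delay records. The class names, field names, and the `lookup` helper below are illustrative assumptions; the slides do not show the actual RD database schema.

```python
from dataclasses import dataclass, field


@dataclass
class DelayRecord:
    file_size_mb: float
    di_sec: float   # measured delay on the Internet path
    dc_sec: float   # measured delay on the circuit


@dataclass
class CrossoverEntry:
    crossover_min_mb: float
    crossover_max_mb: float
    records: list = field(default_factory=list)


def lookup(db, time_slot, destination, circuit_rate, file_size_mb):
    """Return a routing hint for one query (hypothetical helper)."""
    entry = db.get((time_slot, destination, circuit_rate))
    if entry is None:
        return "unknown"        # no measurements for this key yet
    if file_size_mb < entry.crossover_min_mb:
        return "internet"
    if file_size_mb > entry.crossover_max_mb:
        return "circuit"
    return "measure"            # inside the crossover range: keep sampling


# The example row from the slide.
db = {
    ("Sun 02:00-03:00", "128.109.34.22", "1Gbps"): CrossoverEntry(
        50.0, 70.0,
        [DelayRecord(50.0, 5.081, 5.715),
         DelayRecord(60.0, 5.060, 5.066),
         DelayRecord(70.0, 5.033, 4.002)],
    )
}
```

A query for a 1 GByte file to this destination in this time slot would fall above the crossover range and be routed to the circuit.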

Page 20: End-host Route Selection in the CHEETAH Networking Solution

Interaction Between CHEETAH Software Modules and Applications

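[Figure: module interaction diagram with numbered steps 1 to 7. The Routing Decision Module contains three threads, Decision-making (Thread 1), Measurement Monitor (Thread 2), and Active Measurement Scheduler (Thread 3), plus the RD Database and the Query, Report, and Admin interfaces. The application sends a query through the RD API and receives a reply; the decision-making thread queries the database. The RSVP / C-TCP modules and TCP report delays and blocks through the Report Interface, and the measurement monitor updates the database. The administrator triggers active measurements through the Admin Interface; the scheduler triggers measurement tools through the SysCall interface and the RSVP API, and they report delays or bandwidth.]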

Page 21: End-host Route Selection in the CHEETAH Networking Solution

Evaluation

• Experiment setup
  – The Routing Decision server and an application run on a Linux 2.6 box with two 2.8 GHz Xeon CPUs and 1 GB of memory.
  – The application queries with the parameters <128.109.34.22, 1 Gbps circuit rate, 1 GByte file size, time slot 02:00 Sunday>. The database has an entry corresponding to this IP address and time slot.
  – Internet path: bottleneck link rate = 100 Mbps; round trip time = 24 ms. Circuit: round trip time = 8 ms.
• Delay
  – An application submits 100 queries.
  – Mean query delay = 0.0055 sec < round trip time << 5 sec (the average setup delay).
  – Query delay standard deviation = 2.3608e-004 sec < 0.3 ms.

Page 22: End-host Route Selection in the CHEETAH Networking Solution

Conclusion and Future Work

• Conclusion
  – The measurement-based solution is better than the model-based solution:
    • Adaptive to new TCP variants
    • Adaptive to traffic pattern changes
    • Adaptive to hardware or software configuration changes
    • Low overhead
• Future work
  – Scalability issues
    • For a computer that communicates with a large number of end-hosts (e.g. a web server), we can separate the RD module from the computer and run a separate RD server for it.
    • For computers in the same LAN and with the same hardware and software configurations, we can create one RD server for the whole LAN.

Page 23: End-host Route Selection in the CHEETAH Networking Solution

Reference

[CHEETAH] M. Veeraraghavan, X. Zheng, H. Lee, M. Gardner, W. Feng, CHEETAH: Circuit-switched High-speed End-to-End Transport ArcHitecture, Proc. of Opticomm 2003, Oct. 13-17, 2003. Dallas, TX, Won Best Student Paper Award.

[C-TCP] A. P. Mudambi, X. Zheng, and M. Veeraraghavan, A Transport Protocol for Dedicated End-to-End Circuits, accepted by ICC 2006.

[UMass98] J. Padhye, V. Firoiu, D. Towsley and J. Kurose. Modeling TCP throughput: A simple model and its empirical validation. In SIGCOMM ’98, September 1998.

Page 24: End-host Route Selection in the CHEETAH Networking Solution

Backup Slides

Page 25: End-host Route Selection in the CHEETAH Networking Solution

How to compute Pr(DI-DC>0)?

• Assume the delays observed on the Internet path and the circuit are normally distributed random variables, DI and DC. Each file size has its own pair of random variables.

$$E(D_I - D_C) = E(D_I) - E(D_C)$$

$$V(D_I - D_C) = V(D_I) + V(D_C)$$

$$n = \left(\frac{2\, z_{\alpha/2}\, \sigma}{w}\right)^2,$$

where $z_{\alpha/2}$ is taken from the standard normal distribution, $\sigma$ is the sample standard deviation, $\alpha$ is the confidence level, and $w$ is the width of the confidence interval.

[Figure: probability density P(DI-DC) plotted against DI-DC, with the mean E(DI-DC) and the origin 0 marked.]
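Under the normality assumption on this slide, Pr(DI-DC>0) follows from the mean and variance of the difference. The sketch below uses Python's standard `statistics.NormalDist` and additionally assumes that DI and DC are independent, so that their sample variances add; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def prob_internet_slower(di_samples, dc_samples):
    """Estimate Pr(DI - DC > 0) from delay samples for one file size.

    Assumes DI and DC are independent and normally distributed, so
    DI - DC is normal with mean E(DI) - E(DC) and variance V(DI) + V(DC).
    Each sample list needs at least two values.
    """
    mu = mean(di_samples) - mean(dc_samples)
    var = variance(di_samples) + variance(dc_samples)
    if var == 0.0:
        # Degenerate case: no spread at all in either sample set.
        return 1.0 if mu > 0 else (0.5 if mu == 0 else 0.0)
    # Pr(X > 0) = 1 - CDF(0) for X ~ Normal(mu, sqrt(var)).
    return 1.0 - NormalDist(mu, sqrt(var)).cdf(0.0)
```

A value near 1 suggests the circuit is faster for that file size, a value near 0 suggests the Internet path is faster, and a value near 1/2 indicates the file size is close to the crossover.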

Page 26: End-host Route Selection in the CHEETAH Networking Solution

CHEETAH network

[Figure: detailed CHEETAH network topology, by Xuan Zheng. Three switches (cheetah-nc, Cheetah-atl, Cheetah-ornl) at NC, Atlanta, and ORNL are connected by OC192 links, including an OC-192 lambda. Labeled equipment includes a Centuar FastIron FESX448, an MCNC Catalyst 7600, a UVa Catalyst 4948, a CUNY Foundry switch, Juniper T320 routers, a Force10 E300 switch, WASH HOPI and NYC HOPI Force10 switches, a WASH Abilene T640, and an NCSU M20. Labeled hosts include the Orbitty compute nodes Compute-0-0 to Compute-0-4 (152.48.249.2 to 152.48.249.6), Wukong (152.48.249.102), Wuneng (152.48.249.103), Zelda1 to Zelda5 (10.0.0.11 to 10.0.0.15), an X1(E) at ORNL, a UVa host, and a CUNY host, attached over 1G and 10GbE ports. The legend distinguishes direct fibers, VLANs (3x1G VLAN), and MPLS tunnels (2x1G MPLS tunnels).]

Page 27: End-host Route Selection in the CHEETAH Networking Solution

Delay model

The average delay when using a CHEETAH circuit is:

$$E[T_{cheetah}] = (1 - P_b)\,\big(E[T_{setup}] + E[T_{circuit}]\big) + P_b\,\big(E[T_{fail}] + E[T_{tcp}^{primary}]\big), \quad (1)$$

Comparing (1) with $E[T_{tcp}^{primary}]$, we get:

$$(1) - E[T_{tcp}^{primary}] = (1 - P_b)\,\big(E[T_{setup}] + E[T_{circuit}] - E[T_{tcp}^{primary}]\big) + P_b\,E[T_{fail}], \quad (2)$$

Approximating $E[T_{fail}]$ by $E[T_{setup}]$, we get:

$$(2) \approx (1 - P_b)\,\big(E[T_{circuit}] - E[T_{tcp}^{primary}]\big) + E[T_{setup}], \quad (3)$$

If (3) < 0, then the application should try to set up a circuit; otherwise it should use the primary access link.

In other words, if

$$E[T_{tcp}^{primary}] - E[T_{circuit}] > \frac{E[T_{setup}]}{1 - P_b}, \quad (4)$$

then attempt circuit setup; otherwise resort to the TCP/IP Internet path.
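The decision rule derived on this slide (attempt a circuit when the expected TCP delay exceeds the expected circuit transfer delay by more than E[T_setup]/(1 - Pb)) can be written as a single predicate. The sketch below is a hedged illustration; the function and argument names are assumptions, and the inputs would come from the delay models or from measurements.

```python
def should_try_circuit(e_tcp, e_circuit, e_setup, p_block):
    """Apply the slide's rule: E[T_tcp] - E[T_circuit] > E[T_setup] / (1 - Pb).

    e_tcp:     expected transfer delay over the Internet path (seconds)
    e_circuit: expected transfer delay over the circuit, excluding setup
    e_setup:   expected circuit setup delay
    p_block:   call blocking probability Pb
    """
    if not 0.0 <= p_block <= 1.0:
        raise ValueError("p_block must be a probability in [0, 1]")
    if p_block == 1.0:
        return False            # every call is blocked, so never try
    return e_tcp - e_circuit > e_setup / (1.0 - p_block)


def expression_3(e_tcp, e_circuit, e_setup, p_block):
    """The quantity whose sign the slide tests before rearranging it."""
    return (1.0 - p_block) * (e_circuit - e_tcp) + e_setup
```

For any blocking probability below 1, the predicate is true exactly when `expression_3` is negative, which mirrors the rearrangement between the two forms of the rule.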

Page 28: End-host Route Selection in the CHEETAH Networking Solution

Circuit delay model (1)

(a) $P_b$ is the call blocking probability.

(b) $$E[T_{circuit}] = \frac{f}{r_c} + \frac{T_{prop}^{circuit}}{2},$$

in which

$f$ is the size of the file to transfer,

$r_c$ is the data rate of the circuit, and

$T_{prop}^{circuit}$ is the round-trip propagation delay of the circuit.

Page 29: End-host Route Selection in the CHEETAH Networking Solution

Circuit delay model (2)

(c) $$E[T_{setup}] = \frac{m_{sig}}{r_s}\left[1 + \frac{\rho_{sig}}{2(1-\rho_{sig})}\right](k+1) + \left[1 + \frac{\rho_{sp}}{2(1-\rho_{sp})}\right] k\,T_{sp} + T_{prop}^{signaling}, \quad (6)$$

in which

$m_{sig}$ is the cumulative size of signaling messages used in circuit setup,

$r_s$ is the signaling link rate, assuming all the signaling links have the same rate,

$\rho_{sig}$ is the traffic load of the M/D/1 queue model of the signaling link,

$k$ is the number of switches on the circuit path,

$T_{sp}$ is the call-processing delay incurred at each switch,

$\rho_{sp}$ is the traffic load of the M/D/1 queue model of the signaling processor, and

$T_{prop}^{signaling}$ is the round-trip propagation delay of the circuit's signaling path.
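The setup-delay model on this slide treats each signaling link and each signaling processor as an M/D/1 queue, where the mean time in the system is the service time multiplied by 1 + ρ/(2(1 - ρ)). A small sketch of that calculation follows; the function and parameter names are assumptions, with sizes in bits, rates in bits per second, and delays in seconds.

```python
def md1_factor(rho):
    """Mean time in an M/D/1 system, as a multiple of the service time."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("load rho must satisfy 0 <= rho < 1")
    return 1.0 + rho / (2.0 * (1.0 - rho))


def expected_setup_delay(m_sig_bits, r_s_bps, rho_sig, k, t_sp, rho_sp,
                         t_prop_signaling):
    """Expected circuit setup delay for a path with k switches.

    Sums three terms: transmission and queueing of signaling messages on
    k + 1 signaling links, call processing and queueing at k switches,
    and the round-trip propagation delay of the signaling path.
    """
    link_term = (m_sig_bits / r_s_bps) * md1_factor(rho_sig) * (k + 1)
    processor_term = md1_factor(rho_sp) * k * t_sp
    return link_term + processor_term + t_prop_signaling
```

At zero load both queueing factors equal 1, so the result is just the raw transmission, processing, and propagation times.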

Page 30: End-host Route Selection in the CHEETAH Networking Solution

TCP-Reno delay model (1)

(d) $$E[T_{tcp}^{primary}] = E[T_{ss}] + E[T_{loss}] + E[T_{ca}] + E[T_{delack}]$$

$E[T_{delack}]$ depends on the specific operating system. Approximate $E[T_{delack}]$ as 100 ms for BSD-derived stacks and 150 ms for Windows.

(i) Calculate $E[T_{ss}]$:

$$E[T_{ss}] = \begin{cases} RTT \cdot \log_\gamma\!\left(\dfrac{E[d_{ss}](\gamma - 1)}{w_1} + 1\right), & \text{when } E[W_{ss}] \le W_{max} \\[2ex] RTT \cdot \left[\log_\gamma\!\left(\dfrac{W_{max}}{w_1}\right) + 1 + \dfrac{1}{W_{max}}\left(E[d_{ss}] - \dfrac{\gamma W_{max} - w_1}{\gamma - 1}\right)\right], & \text{otherwise} \end{cases}$$

in which

$RTT$ = [the round trip delay]

$\gamma = 1 + 1/b$ [the rate of exponential growth of cwnd during slow start]

$$E[d_{ss}] = \frac{\left[1 - (1-p)^d\right](1-p)}{p} + 1$$

$w_1$ = [sender's initial cwnd size]

$W_{max}$ = [the maximum window we would expect TCP to achieve at the end of slow start]

$d$ = [number of segments to send]

$p$ = [data segment loss rate]

$$E[W_{ss}] = \frac{E[d_{ss}](\gamma - 1)}{\gamma} + \frac{w_1}{\gamma}$$

$b$ = [number of segments received before a delayed ACK is sent]
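As a rough illustration of part (i), the sketch below computes the expected number of segments sent in slow start, the expected window at the end of slow start, and the expected slow-start duration. It assumes the standard form of this slow-start model: E[d_ss] = (1 - (1-p)^d)(1-p)/p + 1, E[W_ss] = E[d_ss](γ-1)/γ + w1/γ, and a logarithmic growth phase that is capped once the window reaches W_max. The function name and default values are assumptions.

```python
import math


def expected_slow_start(d, p, rtt, w1=1.0, w_max=64.0, b=2.0):
    """Return (E[d_ss], E[W_ss], E[T_ss]) for a d-segment transfer.

    d: number of segments to send; p: segment loss rate (must be > 0);
    rtt: round trip time in seconds; w1: initial cwnd in segments;
    w_max: maximum window in segments; b: segments per delayed ACK.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    gamma = 1.0 + 1.0 / b                      # cwnd growth rate per RTT
    e_dss = (1.0 - (1.0 - p) ** d) * (1.0 - p) / p + 1.0
    e_wss = e_dss * (gamma - 1.0) / gamma + w1 / gamma
    if e_wss <= w_max:
        # Window never hits the cap: pure exponential growth.
        t_ss = rtt * math.log(e_dss * (gamma - 1.0) / w1 + 1.0, gamma)
    else:
        # Window reaches w_max, then the rest is sent at w_max per round.
        rounds_to_cap = math.log(w_max / w1, gamma) + 1.0
        remaining = e_dss - (gamma * w_max - w1) / (gamma - 1.0)
        t_ss = rtt * (rounds_to_cap + remaining / w_max)
    return e_dss, e_wss, t_ss
```

The loss and congestion-avoidance terms of the model are not covered by this sketch.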

Page 31: End-host Route Selection in the CHEETAH Networking Solution

TCP-Reno delay model (2)

(ii) Calculate $E[T_{loss}]$:

$$E[T_{loss}] = l_{ss}\left[Q(p, E[W_{ss}])\,E[Z^{TO}] + \big(1 - Q(p, E[W_{ss}])\big)\,RTT\right]$$

$$l_{ss} = 1 - (1-p)^d$$

$$Q(p, w) = \min\left(1,\ \frac{1 + (1-p)^3\left[1 - (1-p)^{w-3}\right]}{\left[1 - (1-p)^w\right] / \left[1 - (1-p)^3\right]}\right)$$

$$E[Z^{TO}] = \frac{G(p)\,T_0}{1 - p}$$

$$G(p) = 1 + \sum_{i=1}^{6} 2^{i-1} p^i$$

$T_0$ = [the average duration of the first TO in a sequence of one or more successive timeouts].

Page 32: End-host Route Selection in the CHEETAH Networking Solution

TCP-Reno delay model (3)

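(iii) Calculate $E[T_{ca}]$:

$$E[T_{ca}] = \frac{E[d_{ca}]}{R(p, RTT, T_0, W_{max})}$$

$$E[d_{ca}] = d - E[d_{ss}]$$

$$W(p) = \frac{2 + b}{3b} + \sqrt{\frac{8(1-p)}{3bp} + \left(\frac{2 + b}{3b}\right)^2}$$

$$R(p, RTT, T_0, W_{max}) = \begin{cases} \dfrac{\dfrac{1-p}{p} + \dfrac{W(p)}{2} + Q(p, W(p))}{RTT\left(\dfrac{b}{2}W(p) + 1\right) + \dfrac{Q(p, W(p))\,G(p)\,T_0}{1-p}}, & \text{when } W(p) < W_{max} \\[4ex] \dfrac{\dfrac{1-p}{p} + \dfrac{W_{max}}{2} + Q(p, W_{max})}{RTT\left(\dfrac{b}{8}W_{max} + \dfrac{1-p}{p\,W_{max}} + 2\right) + \dfrac{Q(p, W_{max})\,G(p)\,T_0}{1-p}}, & \text{otherwise} \end{cases}$$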

Page 33: End-host Route Selection in the CHEETAH Networking Solution

Binary Search Algorithm for Determining the Crossover File Size for One Destination

[Flowchart: (1) Start the setup delay timer and call the bandwidth requester. If setup does not succeed, try again, and end after too many failures. On success, stop the setup delay timer and initialize sl = s = su = setup_delay*circuit_rate and cover = false. (2) Time the transfer of a file of size s over the circuit, compute the circuit throughput, and tear down the circuit. (3) Time the transfer of a file of size s over the Internet and compute the Internet throughput. (4) If |T_Internet - T_Circuit| < delta, the crossover file size is s: update the DB and end. (5) Otherwise, compare the Internet and circuit throughputs, move sl or su to s (doubling su while cover is false, and setting cover = true once the interval brackets the crossover), set s = (sl+su)/2, and repeat the measurements.]

s denotes the file size, sl denotes the lower bound of s, su denotes the upper bound of s, cover denotes whether or not (sl, su) has covered the crossover file size, and delta is the threshold for the difference between circuit and Internet throughputs.

Page 34: End-host Route Selection in the CHEETAH Networking Solution

Measurement example room in

Page 35: End-host Route Selection in the CHEETAH Networking Solution

Experiment setup

Host          mvstu6
CPU           2 CPUs, each an Intel(R) Xeon(TM) CPU 2.80GHz with 1024KB cache
Memory        1GB
Hard disk     1 MegaRAID, Model: LD 0 RAID0 69G
OS            2.6.12-1.1381_FC3smp
File system   EXT3
NIC           Intel PRO/1000 Single Port Adapters working at rate 100Mbps, Full Duplex

Page 36: End-host Route Selection in the CHEETAH Networking Solution

Acronym

• CHEETAH – Circuit-switched High-speed End-to-End Transport ArcHitecture
• PLR – Packet Loss Rate
• SD – Setup/Teardown Delay
• RTT – Round Trip Time
• AB – Available Bandwidth
• GMPLS – Generalized Multi-Protocol Label Switching
• SONET – Synchronous Optical NETwork
• SDH – Synchronous Digital Hierarchy