Packet Scheduling (The rest of the dueling bandwidth story)
1
Packet Scheduling (The rest of the dueling bandwidth story)
2
Lab 9: Configuring a Linux Router
• Set NICs in 10 Mbps full-duplex mode
• Enable IPv4 forwarding
• Manually configure routing tables
• Install tcp_sink and udp_sink
• Generate traffic from tcp_gen and udp_gen
• TCP/UDP traffic flow measurements
3
Lab 9 Results
• What is the major issue?
• What impact did TCP's flow control have?
• What impact did UDP's flow control (or lack thereof) have?
• What implications does this have for today's Internet?
4
[Chart: TCP and UDP Traffic for TCP Interarrival Time = 0.0008 sec. X-axis: UDP Interarrival Time (sec); Y-axis: Network Traffic (Mbits/sec); series: UDP Traffic, TCP Traffic.]
5
Lab 9 (first part): Conclusions
• TCP's flow control mechanisms back off in the presence of UDP congestion
• UDP's lack of flow control mechanisms can cause link starvation for TCP flows
• TCP application performance (e-mail, web, FTP) can be degraded significantly by UDP traffic on the same shared link
6
Lab 9 (first part): Conclusions (cont.)
• UDP is the preferred protocol for most multimedia applications. Why?
• Future challenges for the Internet community:
  - Will multimedia applications on the Internet impair the performance of mainstay TCP applications?
  - How can the industry manage this new Internet traffic without stifling the growth of new applications?
7
Lab 9 (Second part): Strict Priority Scheduling
• Our first attempt to solve the problem of TCP and UDP interaction: priority scheduling
• Modified the Linux source code
• Implemented a strict priority scheduler
• Priority based on the layer 4 protocol
• Gave TCP priority over UDP, which has no flow control
• Generate traffic from tcp_gen and udp_gen
• TCP/UDP traffic flow measurements
8
[Chart: TCP and UDP Traffic for UDP Interarrival Time = 0.0015 sec. X-axis: TCP Interarrival Time (sec); Y-axis: Network Traffic (Mbits/sec); series: UDP Traffic, TCP Traffic.]
9
Lab 9 (Second part): Conclusions
• TCP's flow control mechanism is "greedy," but "timid."
• Strict priority scheduling removes the "timid" aspects: TCP greedily consumes all available bandwidth.
• We have not solved the problem; we have just shifted it from UDP to TCP.
10
The “Real” Solution: Fair Scheduling
11
Introduction
• What is scheduling?
• Advantages of scheduling
• Scheduling "wish list"
• Scheduling policies
• Generalized Processor Sharing (GPS)
• Packetized GPS algorithms
• Stochastic Fair Queuing (SFQ) and Class Based Queuing (CBQ)
12
Motivation for Scheduling
• TCP application performance is degraded significantly by UDP traffic on the same shared link
• Different versions of TCP may not co-exist fairly (e.g., TCP Reno vs. TCP Vegas)
• Quality of Service (QoS) requirements for the next generation Internet
• Most important: finishes the story about TCP and UDP traffic mixtures (e-mail and web versus video teleconferencing and Voice over IP)
13
What is Scheduling?
• Sharing of bandwidth always results in contention
• A scheduling discipline resolves contention: which packet should be serviced next?
• Future networks will need the capability to share resources fairly and provide performance guarantees
• Implications for QoS?
14
Where does scheduling occur?
• Anywhere contention may occur
• At every layer of the protocol stack
• Discussion will focus on MAC/network layer scheduling, at the output queues of switches and routers
15
Advantages of Scheduling
1) Differentiation - different users can have different QoS over the same network
2) Performance Isolation - behavior of each flow or class is independent of all other network traffic
3) QoS Resource Allocation - with respect to bandwidth, delay, and loss characteristics
4) Fair Resource Allocation - includes both short and long term fairness
16
Scheduling "Wish List"
An ideal scheduling discipline…
1) Is amenable to high speed implementation
2) Achieves (weighted) fairness
3) Supports multiple QoS classes
4) Provides performance bounds
5) Allows easy admission control decisions
Does such an algorithm exist that can satisfy all these requirements?
17
Requirement 1: High Speed Implementation
• Scheduler must make a decision once every few microseconds
• Should be implementable in hardware. Critical constraint: VLSI area available.
• Should be scalable and efficient in software. Critical constraint: order of growth per flow or class.
18
Requirement 2: Fairness
• A scheduling discipline allocates a scarce resource
• Fairness is defined on both a short term and long term basis
• Fairness is evaluated according to the max-min criteria
19
Max-Min Fairness Criteria
• Each connection gets no more bandwidth than it needs
• Excess bandwidth, if any, is shared equally
• Example: a Generalized Processor Sharing (GPS) scheduler managing three flows with equal priority
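The max-min criteria above can be computed iteratively: fully satisfy the smallest demands first, then split whatever capacity remains equally among the rest. A minimal sketch; the demand values in the example are hypothetical, chosen to fit the 10 Mb/s links used in the labs.

```python
def max_min_allocate(capacity, demands):
    """Max-min fair allocation: no flow gets more than it demands,
    and any excess capacity is shared equally among the rest."""
    alloc = [0.0] * len(demands)
    remaining = list(range(len(demands)))
    cap = float(capacity)
    while remaining:
        share = cap / len(remaining)
        # Flows whose demand fits under the equal share are fully satisfied.
        satisfied = [i for i in remaining if demands[i] <= share]
        if not satisfied:
            for i in remaining:      # everyone left gets an equal share
                alloc[i] = share
            break
        for i in satisfied:
            alloc[i] = float(demands[i])
            cap -= demands[i]
        remaining = [i for i in remaining if i not in satisfied]
    return alloc

# Hypothetical demands of 2, 4, and 10 Mb/s on a 10 Mb/s link:
# the 2 Mb/s flow is satisfied, and the excess is split so the
# other two flows get 4 Mb/s each.
shares = max_min_allocate(10, [2, 4, 10])
```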
20
Benefits of Fairness
• Fair schedulers provide protection
  - Bandwidth-gobbling applications are kept in check
  - Automatic isolation of heavy traffic flows
• Fairness is a global (Internet level) objective, while scheduling is local (router or switch level)
• Global fairness guarantees are beyond the scope of the course (go to grad school :>)
21
Scheduling Policies
1) First Come First Serve (FCFS)
• Packets queued in FCFS order
• No fairness
• Most widely adopted scheme in today's Internet
2) Strict Priority
• Multiple queues with different priorities
• Packets in a given queue are served only when all higher priority queues are empty
3) Generalized Processor Sharing (GPS)
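The strict priority policy in the list above amounts to a very small dequeue rule. A minimal sketch; the queue names and contents are hypothetical, chosen to mirror Lab 9's TCP-over-UDP priority:

```python
def strict_priority_dequeue(queues):
    """Serve the highest-priority non-empty queue; a lower-priority
    queue is served only when every queue above it is empty."""
    for q in queues:          # queues ordered highest priority first
        if q:
            return q.pop(0)
    return None               # all queues empty

# With TCP given priority over UDP (as in Lab 9's second part),
# UDP packets are served only once the TCP queue has drained.
tcp_q, udp_q = ["tcp1", "tcp2"], ["udp1"]
order = [strict_priority_dequeue([tcp_q, udp_q]) for _ in range(3)]
```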
22
Generalized Processor Sharing
• Idealized fair queuing approach based on a fluid model of network traffic
• Divides a link of bandwidth B into a discrete number of channels
• Each channel has bandwidth bi, where:
  B = b1 + b2 + b3 + …
• Extremely simple in concept
• Impossible to implement in practice. Why?
23
Shortcomings of GPS
Reason 1: Inaccurate traffic model
• The underlying model of the network is fluid-based (continuous)
• Actual network traffic consists of discrete units (packets)
• Impossible to divide a link indefinitely
24
Shortcomings of GPS
Reason 2: Transmission is serial
• GPS depicts a parallel division of link usage
• Actual networks transmit bits serially
• "Sending more bits" really means increasing the transmission rate
25
Packetized GPS
• Packetized version of GPS
• Attempts to approximate the behavior of GPS as closely as possible
• All schemes hereafter fall under this category
26
Packetized GPS Algorithms
1) Weighted Fair Queuing (WFQ)
2) Weighted Round Robin (WRR)
3) Deficit Round Robin (DRR)
4) Stochastic Fair Queuing (SFQ)
5) Class Based Queuing (CBQ)
6) Many, many others…
27
Weighted Fair Queuing
• Computes the finish time of each packet under GPS
• Packets are tagged with their finish times
• The packet with the smallest finish time across all queues is serviced first
• Not scalable, due to the overhead of computing the ideal GPS schedule
28
WFQ: An Example
• 3 flows A, B, C, read left to right
• Assume all packets are the same size
• Example weights: A=1, B=2, C=3
• Divide each packet's finish time by its flow's weight
• A weighted fair share of service results
[Figure: packet service order under WFQ for the three flows]
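The weight-scaled finish-tag rule from this example can be sketched for equal-size packets. The weights A=1, B=2, C=3 come from the slide; the number of queued packets per flow is an assumption:

```python
def wfq_order(flows, weights):
    """Toy WFQ for equal-size packets: the k-th packet of a flow with
    weight w gets virtual finish tag (k+1)/w, and packets are served
    in increasing tag order (ties broken by flow name)."""
    tags = []
    for name, npkts in flows.items():
        w = weights[name]
        for k in range(npkts):
            tags.append(((k + 1) / w, name))
    tags.sort()
    return [name for _, name in tags]

# Three queued packets per flow with weights A=1, B=2, C=3:
# flow C, with the largest weight, is served most often early on.
order = wfq_order({"A": 3, "B": 3, "C": 3}, {"A": 1, "B": 2, "C": 3})
```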
29
Weighted Round Robin
• Simplest approximation of GPS
• Queues serviced in round robin fashion, proportional to assigned weights
• Max-min fair over long time scales
• May cause short term unfairness
[Figure: queues A, B, C with fixed Tx schedule: C C C B B A A]
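The fixed schedule in the figure (C C C B B A A) follows directly from weight-proportional visits. A minimal sketch; the weights below are inferred from that schedule, not stated on the slide:

```python
def wrr_schedule(weights):
    """One round of weighted round robin: each queue is visited
    weight-many times per round (dict order = visit order)."""
    schedule = []
    for queue, w in weights.items():
        schedule.extend([queue] * w)
    return schedule

# Weights inferred from the figure's fixed schedule C C C B B A A:
round_schedule = wrr_schedule({"C": 3, "B": 2, "A": 2})
```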
30
Deficit Round Robin (DRR)
• Handles varying size packets
• Each queue begins with zero credits, or "quanta"
• A flow transmits a packet only when it has accumulated enough quanta; the quanta used are then subtracted
• A queue not served during a round accumulates a weighted number of quanta
• The use of quanta permits DRR to fairly serve packets of varying size
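The credit mechanism above can be sketched in a few lines. This is a simplified, unweighted version (a single quantum per visit); packet sizes and the quantum value in the example are assumptions:

```python
from collections import deque

def drr(queues, quantum, rounds):
    """Toy Deficit Round Robin: on each visit a backlogged queue earns
    `quantum` bytes of credit; its head packet is sent only when the
    accumulated deficit covers the packet size, which is then
    subtracted from the deficit."""
    deficit = {name: 0 for name in queues}
    sent = []
    for _ in range(rounds):
        for name, q in queues.items():
            if not q:
                deficit[name] = 0   # empty queues do not hoard credit
                continue
            deficit[name] += quantum
            while q and q[0] <= deficit[name]:
                size = q.popleft()
                deficit[name] -= size
                sent.append((name, size))
    return sent

# A 1500-byte packet must wait two rounds at quantum 1000, while
# two 500-byte packets go out in the first round.
log = drr({"A": deque([1500]), "B": deque([500, 500])}, 1000, 2)
```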
31
Stochastic Fair Queuing*
• Traffic is divided into a large number of FIFO queues serviced in round robin fashion
• Uses a "stochastic" rather than fixed allocation of flows to queues: a hashing algorithm decides which queue each flow is put in
• Prevents unfair bandwidth usage by any one flow
• Frequent recalculation of the hash is necessary to ensure fairness
• Extremely simple to configure in Linux
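The hash-based flow-to-queue mapping can be sketched as follows. This is an illustration only, not the kernel's actual hash; the CRC32 mix, the field choices, and the queue count are assumptions (1024 matches the hash-bucket count shown later in the tc output):

```python
import zlib

def sfq_queue(src, dst, proto, salt, n_queues=1024):
    """Toy SFQ flow-to-queue mapping: hash the flow identifier mixed
    with a salt; redrawing the salt (the `perturb` interval in Linux)
    re-separates flows that happened to collide into one queue."""
    key = "{}|{}|{}|{}".format(src, dst, proto, salt).encode()
    return zlib.crc32(key) % n_queues

# The same flow always lands in the same queue until the salt changes.
q = sfq_queue("10.0.0.1", "10.0.0.2", 17, salt=1)
```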
32
Class Based Queuing*
A framework for organizing hierarchical link sharing
• Link divided into different traffic classes
• Each class can have its own scheduling algorithm, providing enormous flexibility
• Classes can borrow spare capacity from a parent class
• Most difficult scheduling discipline to configure in Linux
33
CBQ: An Example
34
Some results from a previous semester's final lab
• Covered SFQ and CBQ
• Identical experimental setup as Lab 9
• SFQ and CBQ are already built into version 2.4.7-10 and higher of the Linux kernel
• No modification of the source code required
• Repeat TCP and UDP traffic measurements to determine the impact of each scheduling discipline
35
Overview (cont.)
1) Do TCP and UDP flows share the link fairly in the experiment?
2) What are the relative advantages and disadvantages of SFQ vs. CBQ? How does each one meet the 5 requirements of the scheduling "wish list"?
36
Overview (cont.)
3) Are these scheduling disciplines scalable to the complexity required to handle real Internet traffic?
4) How can these scheduling algorithms be used to provide QoS guarantees in tomorrow’s Internet? What might this architecture look like?
37
How we turned on SFQ
• cd /usr/src/linux-2.4.18-14
• make oldconfig
  This command saves all of the options that are currently built into the kernel to a file (.config). This allows you to keep the current options you have selected and add to them, rather than erase the options you have previously turned on.
• cp .config /root (y to overwrite)
• make clean
• make mrproper
• make xconfig
• Click "Load Configuration from file"; in Enter filename, type /root/.config
• We need to turn on several options. In the main menu, select Networking Options. Scroll down and select QoS and/or Fair Queuing. Select <y> for every option in this menu. This will enable every available queuing discipline that is built into the Linux kernel. Click OK. Click Main Menu. Click Save and Exit. Click OK.
• make dep
• make bzImage
• Completed the remaining steps from Lab 9 to compile the kernel
38
How we turned on fair queuing
• Opened an xterm window and typed:
  tc qdisc add dev eth1 root sfq perturb 5
• This line enables SFQ and installs it on the interface eth1, which is connected to your destination. The command tc sets up a traffic classifier in the router. The word qdisc stands for queuing discipline. The value perturb 5 indicates that the hashing scheme used by SFQ is reconfigured once every 5 seconds. In general, the smaller the perturb value, the better the division of bandwidth between TCP and UDP. To change the perturb value to a different value (e.g., 6), type the following:
  tc qdisc del dev eth1 root sfq perturb 5
  then
  tc qdisc add dev eth1 root sfq perturb 6
• Now, type the following command:
  tc -s -d qdisc ls
• This should return a string of text similar to the following:
  qdisc sfq 800c: dev eth1 quantum 1514b limit 128p flows 128/1024 perturb 5sec
  Sent 4812 bytes 62 pkts (dropped 0, overlimits 0)
• The number 800c is the automatically assigned handle number. Limit means that 128 packets can wait in this queue. There are 1024 hash buckets available for accounting, of which 128 can be active at a time (no more than 128 packets would be queued!). Once every 5 seconds, the hashes are reconfigured.
39
Stochastic Fair Queuing
• Enabled SFQ and set the perturb value to 5, which means the hashing scheme used by SFQ is reconfigured once every 5 seconds
40
Measured Results

UDP IA time (sec) | TCP Measured (Mb/s) | UDP Attempted (Mb/s) | UDP Measured (Mb/s)
0.05              | 9.0                 | 0.1638               | 0.15
0.01              | 8.41                | 0.8192               | 0.79
0.005             | 7.56                | 1.6384               | 1.61
0.001             | 4.82                | 8.192                | 4.32
0.0001            | 4.87                | 81.92                | 4.34
41
Stochastic Fair Queuing
[Chart: Measured traffic (Mb/s) vs. Attempted UDP (Mb/s); series: TCP, UDP]
42
How we turned on CBQ
• tc qdisc add dev eth1 root handle 1: cbq bandwidth 10Mbit allot 1514 cell 8 avpkt 1024 mpu 64
• This line enables CBQ and installs it on the interface eth1, which is connected to your destination.
• The command tc sets up anything related to the traffic controller in a router.
• The word qdisc stands for queuing discipline.
• Generally, the classes in CBQ can be constructed into a tree structure, starting from the root and its direct descendants. A descendant is a parent if it has its own direct descendants. Each parent can originate a CBQ with a certain amount of bandwidth available for its direct descendants. Each descendant class is identified by a class identification with the syntax handle x. In this case, the root handle 1:0 means that this CBQ is located at the root, and the classid of a direct descendant class of the root has the form 1:x (e.g., 1:1, 1:2, 1:3).
• bandwidth 10Mbit is the maximum available bandwidth for this CBQ.
• allot is a parameter used by the link sharing scheduler.
• A cell value of 8 indicates that packet transmission time will be measured in units of 8 bytes.
• mpu represents the minimum number of bytes that will be sent in a packet. Packets of size less than mpu are treated as mpu, usually set equal to 64, because for Ethernet-like interfaces the minimum packet size is 64 bytes.
43
How we turned on CBQ
• tc class add dev eth1 parent 1:0 classid 1:1 cbq bandwidth 10Mbit rate 10Mbit allot 1514 cell 8 avpkt 1024 mpu 64 maxburst 40
• tc class add dev eth1 parent 1:1 classid 1:2 cbq bandwidth 10Mbit rate 5Mbit allot 1514 cell 8 avpkt 1024 mpu 64 maxburst 40
• tc class add dev eth1 parent 1:1 classid 1:3 cbq bandwidth 10Mbit rate 5Mbit allot 1514 cell 8 avpkt 1024 mpu 64 maxburst 40
• First, we define a direct descendant class of 1:0, whose classid is 1:1. Then, we define two direct descendant classes of 1:1, whose classids are 1:2 (for TCP traffic) and 1:3 (for UDP traffic).
• tc class add is the command used to define a class.
• parent defines the parent class.
• cbq bandwidth 10Mbit represents the maximum bandwidth available to the class.
• rate 5Mbit is the bandwidth guaranteed to the class.
• For each class, we enable the "bandwidth borrowing" option, in which a descendant class is allowed to borrow available bandwidth from its parent.
• In CBQ, a class can send at most maxburst back-to-back packets, so the rate of a class is proportional to maxburst: rate = packetsize * maxburst * 8 / (kernel clock speed)
44
How we turned on CBQ
• Type the following commands:
  tc filter add dev eth1 parent 1:0 protocol ip u32 match ip protocol 6 0xff flowid 1:2
  tc filter add dev eth1 parent 1:0 protocol ip u32 match ip protocol 17 0xff flowid 1:3
• tc filter add is a command that installs a filter for IP packets passing through a device.
• flowid represents the classid with which the filter is associated. If the IP protocol number in the IP header of a packet is equal to 6 (TCP), the packet belongs to class 1:2. If it is equal to 17 (UDP), the packet belongs to class 1:3.
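The two u32 filters implement a simple protocol-number lookup. The same mapping in a few lines; the class handles 1:2 and 1:3 come from the slide, while treating unmatched traffic as belonging to the root "1:0" is an assumption for illustration:

```python
def classify(ip_protocol):
    """Mirror the two tc u32 filters: IP protocol 6 (TCP) goes to
    class 1:2 and protocol 17 (UDP) to class 1:3; anything else is
    left at the root class here (an assumption of this sketch)."""
    return {6: "1:2", 17: "1:3"}.get(ip_protocol, "1:0")
```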
45
Class Based Queuing
• Can define separate classes for different applications and then treat them equally (or unequally if desired)
• Here CBQ was enabled with each class assigned 5 Mb/s rate
46
Measured Results

UDP IA time (sec) | TCP Measured (Mb/s) | UDP Attempted (Mb/s) | UDP Measured (Mb/s)
0.05              | 8.99                | 0.1638               | 0.16
0.01              | 8.43                | 0.8192               | 0.79
0.005             | 7.71                | 1.6384               | 1.59
0.001             | 4.76                | 8.192                | 4.78
0.0001            | 4.75                | 81.92                | 4.75
47
Class Based Queuing
[Chart: Measured traffic (Mb/s) vs. Attempted UDP (Mb/s); series: TCP, UDP]
48
Conclusions
• Class Based Queuing allocates bandwidth better than any other approach we have used, including SFQ.
• Neither type of traffic gets more than 5 Mb/s (unless there is no other traffic class, in which case more than 5 Mb/s will be allowed).