
    Juniper Networks, Inc.
    1194 North Mathilda Avenue
    Sunnyvale, CA 94089 USA
    408 745 2000 or 888 JUNIPER
    www.juniper.net

    Part Number: 200019-001 12/01

    Supporting Differentiated Service Classes

    in Large IP Networks

    Chuck Semeria

    Technical Marketing Engineer

    John W. Stewart III

    Product Line Manager

    White Paper


    Copyright 2001, Juniper Networks, Inc.

    Contents

    Executive Summary
    Perspective
    Fundamentals of Differentiated Services
        Classic Time-division Multiplexing vs. Statistical Multiplexing
            Classic Time-division Multiplexing (TDM)
            Statistical Multiplexing
        Best-effort Delivery
        Differentiated Service Classes
    The Impact of Statistical Multiplexing on Perceived Quality of Service
        Throughput
        Delay
            Sources of Network Delay
            Managing Delay While Maximizing Bandwidth Utilization
        Jitter
            Impact of Jitter on Perceived QoS
        Loss
            Packet Loss Can Be Good
    A Brief History of Differentiated Services in Large IP Networks
        The First Approach: RFC 791
        The Second Approach: The Integrated Services Model (IntServ)
            IntServ Architecture
            An IntServ Enhancement: Aggregation of RSVP Reservations
        The Third Approach: The Differentiated Services Model (DiffServ)
    The IETF Architecture for Differentiated Services
        Differentiated Services Domain (DS Domain)
        Differentiated Service Router Functions
            Packet Classification
            Traffic Conditioning
            Differentiated Services Router Functions
        Per-hop Behaviors (PHBs)
            Expedited Forwarding (EF PHB)
            Assured Forwarding (AF PHB)
            Default PHB
    General Observations about Differentiated Services
        DiffServ Does Not Create Free Bandwidth
        DiffServ Does Not Change the Speed of Light
        The Strictest Service Guarantees Will Be between Well-Known Endpoints
        Support for Interprovider DiffServ Is a Business Issue
        Providers Do Not Control All Aspects of the User Experience
    Conclusion
    Acronym Definitions
    References
        Requests for Comments (RFCs)
        Internet Drafts
        Textbooks
        Technical Papers


    List of Figures

    Figure 1: Classic Time-division Multiplexing (TDM)
    Figure 2: Statistical Multiplexing
    Figure 3: Classic TDM vs. Statistical Multiplexing
    Figure 4: End-to-end Delay Calculation
    Figure 5: Bandwidth Utilization vs. Round-trip Time (RTT) Delay
    Figure 6: Jitter Makes Packet Spacing Uneven
    Figure 7: Sources of Packet Loss in IP Networks
    Figure 8: Multiple Queues with Different Shares of a Port's Bandwidth
    Figure 9: Tail-drop Queue Management
    Figure 10: RFC 791 Bit Definitions of the ToS Byte
    Figure 11: Resource Reservation Protocol (RSVP)
    Figure 12: Cost Relative to Complexity of Differentiated Services Solutions
    Figure 13: Differentiated Services Field (DS Field)
    Figure 14: Differentiated Services Domain (DS Domain)
    Figure 15: Packet Classifier and Traffic Conditioner
    Figure 16: DiffServ Router Functions

    List of Tables

    Table 1: Serialization Delay (Packet Size vs. Port Speed)
    Table 2: Recommended AF DiffServ Codepoint (DSCP) Values


    Executive Summary

    This white paper is the introduction to a series of papers published by Juniper Networks, Inc. that describe the support of differentiated service classes in large IP networks. This overview presents the motivations for deploying multiple service classes, the fundamentals of statistical multiplexing, and the impact of statistical multiplexing on the quality of service delivered by a network in terms of packet throughput, delay, jitter, and loss. We also provide a brief history of the various approaches that have been proposed to support differentiated service classes, a description of the IETF DiffServ architecture, and general observations about what you can expect from the deployment of multiple service classes in your network. The other papers in this series provide technical discussions of queue scheduling disciplines, queue memory management, host TCP congestion-avoidance mechanisms, and other issues related to the deployment of multiple service classes in your network.

    Perspective

    Service provider IP networks have traditionally supported only public Internet service. Initially, Internet applications (e-mail, remote login, file transfer, and Web access) were not considered mission-critical and did not have specific performance requirements for throughput, delay, jitter, and packet loss. As a result, a single best-effort class of service (CoS) was adequate to support all Internet applications.

    However, the commercial success of the Internet has caused all of this to change, thus affecting service providers in several ways.

    - Your IP network is now the single largest consumer of bandwidth, or at least is trending in that direction.

    - Your network's 24/7 availability and reliability are even more imperative, because Internet services have become mission-critical. For some organizations, such as online retailers or stock markets, an hour-long network outage can be extremely expensive.

    - You need to differentiate your company from the competition by offering a range of service classes with service-level agreements (SLAs) that are specifically tailored to meet your customers' and their customers' requirements.

    - You want to offer better classes of service to your premium customers and charge more for those services.

    - You are probably considering offering services such as voice-over-IP (VoIP) or virtual private networks (VPNs) that have more rigid performance requirements than traditional Internet applications.

    You may also be considering deploying a variety of services, each with different performance requirements, over a shared IP infrastructure. In a multiservice IP network, IP routers, rather than Frame Relay switches, ATM switches, or voice switches, are used to access the transmission network.

    - A larger service portfolio allows you to attract and keep new customers.

    - Converged networks minimize your operating expenses, because you have fewer networks to manage.

    - A packet-based network maximizes bandwidth efficiency through the use of statistical multiplexing.


    There are two fundamentally different approaches to supporting the delivery of multiple service classes in large IP networks. One approach is simply to overprovision the network and throw raw bandwidth at the problem. The other approach is to build a CoS-enabled backbone based on bandwidth management.

    Those who favor overprovisioning argue that:

    - The additional cost and complexity of managing traffic outweighs the gain it provides in bandwidth efficiency.

    - It is very difficult to monitor, verify, and account for multiple service classes in large IP networks.

    - You already have other CoS-enabled infrastructures (TDM and ATM) that you can use to support services that have strict performance requirements.

    Those who favor bandwidth management argue that:

    - Bandwidth management allows you to optimize bandwidth utilization and run your network at close to its maximum capacity.

    - New applications emerge, you deploy new networking equipment, and bandwidth arrives in discrete chunks. These events rarely occur in a coordinated manner, and traffic management allows you to control bandwidth and smoothly handle mismatches in network capacity as these transitions occur.

    - Bandwidth management allows you to increase your revenue by selling multiple service classes over a shared infrastructure, such as a converged IP/MPLS backbone. A converged infrastructure allows you to reduce your operating expenses, to use a single access technology, and to market a wide range of integrated products, such as Internet access, VPN access, and videoconferencing.

    While the arguments for both of these approaches are convincing, their costs are roughly equal. Initially, deploying bandwidth management in your network involves simply enabling specific router functions. However, there are a number of hidden training, operational, and maintenance costs involved in successfully managing bandwidth in a production network. Also, while it is relatively easy to understand how to manage bandwidth from an engineering perspective, service providers have very little practical experience in supporting, debugging, tuning, and accounting for multiple service classes in large IP networks. On the other hand, if you do not have the ability to throttle traffic to some degree, even a network of enormous bandwidth can be overrun by misbehaving applications, to the point that mission-critical and delay-sensitive services are severely impacted.

    Successful providers will adopt a solution that is based on a combination of overprovisioned bandwidth and MPLS traffic engineering to minimize the long-term average level of congestion, while also deploying Integrated Services (IntServ) and Differentiated Services (DiffServ) to address the requirements of delay- and jitter-sensitive traffic during short-term periods of congestion. It is only through a combination of technologies that you will be able to support the delivery of differentiated service classes on a large scale and at a reasonable cost.


    Fundamentals of Differentiated Services

    To support business objectives that require multiple service classes, there is growing interest in the mechanisms that make it possible to deliver differentiated traffic classes over a common IP infrastructure. Because these mechanisms are widely misunderstood, we begin with a discussion of some of the fundamental concepts that are relevant to the deployment of differentiated service classes.

    Classic Time-division Multiplexing vs. Statistical Multiplexing

    As you know, network transmission facilities are an expensive resource. Multiplexing can save you money by allowing many different data flows to share a common physical transmission path, rather than requiring that each flow have a dedicated transmission path. There are currently two basic types of multiplexing used in data communications:

    - Time-division multiplexing (TDM): The transmission facility is divided into multiple channels by allocating the facility to several different channels, one at a time.

    - Frequency-division multiplexing (FDM): The transmission facility is divided into multiple channels by using different frequencies to carry different signals.

    Within TDM, there are two methods of arbitrating bandwidth on an output port: the static allocation of fixed-sized time slots and the dynamic allocation of variable-sized time slots. Classic TDM devices switch traffic by using static arbitration to allocate input bandwidth to an equal amount of output bandwidth and by mapping traffic to a specific output time slot. Packet switches use variable arbitration, with bandwidth allocated on demand on a per-packet basis.

    Classic Time-division Multiplexing (TDM)

    Classic time-division multiplexing (TDM) is a technique that is applied to circuit-switched networks. TDM assumes that data streams are organized into bits, bytes, or words rather than packets. Figure 1 illustrates the basic concept behind TDM.

    Figure 1: Classic Time-division Multiplexing (TDM)

    Although the following description is not the classic definition of TDM, it is sufficient to provide a background for our discussion of differentiated service classes. At the ingress end of the shared link, the TDM multiplexer samples and then interleaves the five discrete input data streams in a round-robin fashion, granting each stream the entire bandwidth of the shared link for a very short time. TDM guarantees that the bandwidth of the output link is never less than the sum of the rates of the individual input streams, because, at input, each unit of bandwidth is mapped at configuration time to an equal-sized unit of bandwidth on the output link. At the egress end of the shared link, the TDM demultiplexer processes the traffic and reconstructs the five individual data streams.

    There are two key features of classic TDM that are relevant to our discussion of supporting multiple service classes:

    - First, it is not necessary to buffer data when input streams are multiplexed onto the shared output link, because the capacity of the output link is always greater than or equal to the sum of the rates of the individual input streams.

    - Second, classic TDM leads to an aggregate underutilization of bandwidth on an output port. Assuming that you are transmitting packet data over a classic TDM system, each input channel consumes somewhere between zero percent and 100 percent of its available bandwidth, depending on the burstiness of the application. If you add up the bandwidth that actually goes unused across all of the channels in your system, the overall bandwidth utilization on the output port can be as low as 10 to 15 percent, depending on the specific behavior of your traffic.

    Two common examples of classic TDM in large carrier or provider networks are:

    - A T-1 multiplexer with 28 T-1 circuits on the input side and one DS-3 circuit on the output side, or

    - A SONET multiplexer with four OC-12c/STM-4s on the input side and one OC-48c/STM-16 on the output side.

    Statistical Multiplexing

    Statistical multiplexing is designed to support packet-switched networks by dynamically allocating variable-length time slots on an output port. Statistical multiplexing devices assume that data flows are organized into packets, frames, or cells rather than bits, bytes, or words. Figure 2 illustrates the basic concept behind statistical multiplexing.

    Figure 2: Statistical Multiplexing

    Unlike classic TDM devices, a statistical multiplexing device does not map each unit of input bandwidth to an equal-sized unit of bandwidth on an output port. Statistical multiplexing dynamically allocates bandwidth on an output port only to active input streams, making better use of the available bandwidth and allowing more streams to be transported across the shared port than with other multiplexing techniques.


    A packet, frame, or cell arriving on one port of a statistical multiplexing device can potentially exit from any other port of the device. The specific output port is determined by the result of a lookup, based on the contents of the packet header: a MAC address, a VPI/VCI, a DLCI, or an IP address. This means that there may be times when more packets, frames, or cells need to be transmitted from a port than the given port has bandwidth to support. When this occurs, the statistical multiplexing device places the oversubscribed packets, frames, or cells into a buffer (queue) that is associated with the output port. The buffer absorbs packets during the extremely short periods of time when the output port experiences congestion.

    Common examples of statistical multiplexing devices in large carrier or provider networks include:

    - IP routers,

    - Ethernet switches, and

    - Frame Relay switches.

    Optimal Buffer Size

    Determining the optimal size for a packet buffer is critical, because providing a packet buffer that is too small is just as bad as providing one that is too large.

    - Small packet buffers can cause packets from bursts to be dropped. This forces a host TCP to reduce its transmission rate by returning to slow-start or congestion-avoidance mode, which can severely reduce the session's overall packet throughput rate.

    - Large packet buffers at each hop can cause the total round-trip time (RTT) to increase to a point where packets that are waiting in buffers in the core of a network are retransmitted by the source TCP even though they have not been dropped. A source TCP maintains a retransmission timer that it uses to decide when it should start retransmitting lost packets if it does not receive an ACK from the destination TCP.

    Optimally, a router buffer needs to be large enough to absorb the burstiness of traffic flows but small enough that the RTT remains relatively small, so that packets waiting in queues are not mistakenly retransmitted.

    The amount of memory that needs to be assigned to each queue is determined by the speed of the link, the behavior of the traffic, and the characteristics of the higher-layer transport protocol that provides flow control. For a queue designed to support UDP-based, real-time applications, such as VoIP, a large packet buffer is not desirable, because it can increase end-to-end delay. However, for a queue designed to support TCP-based applications, optimal performance requires that the bandwidth-delay buffer size be calculated using the following formula:

    Buffer_Size = (port bandwidth) * (longest RTT among flows forwarded across the port)

    For example, the size of the buffer required to support a maximum round-trip delay of 100 ms on an OC-48c/STM-16 port is ~32 MB.
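    As a minimal sketch of this bandwidth-delay calculation (the OC-48c line-rate constant and the helper name below are our own, not from any router implementation):

        def bandwidth_delay_buffer_bytes(port_bps: float, longest_rtt_s: float) -> float:
            """Bandwidth-delay product: queue memory needed to keep TCP flows busy."""
            return port_bps * longest_rtt_s / 8  # divide by 8 to convert bits to bytes

        OC48_BPS = 2.488e9  # OC-48c/STM-16 line rate in bits per second
        buf = bandwidth_delay_buffer_bytes(OC48_BPS, 0.100)  # 100 ms worst-case RTT
        print(f"{buf / 1e6:.0f} MB")  # ~31 MB, in line with the ~32 MB figure above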

    Bandwidth Oversubscription

    Voice networks have always been oversubscribed, in that dedicated bandwidth is not reserved for each potential voice user. Carriers can oversubscribe their voice networks because there are far more voice subscribers than there are voice calls at any given moment. Generally, it is easier to provision a voice network than a data network, because you have a much better understanding of the call activity you expect to see at any time of day than you do of the amount of data traffic your network will be required to transport at the same time. However, we have all experienced situations when all circuits are busy during catastrophic events.


    In packet-based networks, statistical multiplexing takes advantage of the fact that each host attached to a network is not always active, and when it is active, data is transmitted in bursts. As a result, statistical multiplexing allows you to oversubscribe network resources and support a greater number of flows than classic TDM using the same amount of bandwidth. This is known as the statistical multiplexing gain.

    Figure 3 shows three hosts transmitting data. When the network uses classic TDM to access the output port, certain time slots remain empty, which causes the bandwidth of those time slots to be wasted. In contrast, when the network uses statistical multiplexing to access the output port, empty time slots are not transmitted, so this extra bandwidth can be used to support the transmission of other statistically multiplexed flows.

    Figure 3: Classic TDM vs. Statistical Multiplexing

    Let's examine typical oversubscription numbers used by large service providers. Core links are typically oversubscribed by a factor of 2X, while access links are generally oversubscribed by a factor of 8X (eight times more potential capacity going into the core than the core can transport). As long as the queues in the network usually remain empty, the network will continue to provide satisfactory performance at these oversubscription levels. If the traffic patterns in the network are well understood, then it is possible to apply an oversubscription policy that ensures that queues do, in fact, usually remain empty. The oversubscription capabilities supported by statistical multiplexing devices offer monetary savings. For example, an oversubscription policy of 20 percent allows packets from almost 23 E-3 circuits (775 Mbps) to be aggregated onto a single OC-3/STM-1 circuit (155 Mbps).
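    A minimal sketch of this aggregation arithmetic, assuming the 20 percent policy means the output circuit provides one-fifth of the aggregate input capacity (the rate constants are nominal line rates):

        E3_MBPS = 34.368      # nominal E-3 rate
        OC3_MBPS = 155.52     # nominal OC-3/STM-1 rate

        policy = 0.20         # provisioned output = 20% of aggregate input
        aggregate_input_mbps = OC3_MBPS / policy    # ~778 Mbps of offered capacity
        num_e3 = aggregate_input_mbps / E3_MBPS     # ~22.6, i.e., "almost 23" E-3s
        print(f"{aggregate_input_mbps:.0f} Mbps -> {num_e3:.1f} E-3 circuits")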

    Statistical Multiplexing and Multiple Service Classes

    As a foundation to our discussion of differentiated service classes, there are two key features to keep in mind regarding statistical multiplexing:

    - Statistical multiplexing requires packet buffering during transient periods of congestion, when the output-port bandwidth is momentarily less than the sum of the rates of the input flows seeking to use that bandwidth.


    - Statistical multiplexing provides significantly better utilization of the output-port bandwidth than classic TDM. This enhanced utilization can be approximately four times (4X) greater than classic TDM, depending on the specific traffic flows. The higher utilization of output-port bandwidth is the key benefit of statistical multiplexing when compared with classic TDM.

    Best-effort Delivery

    IP routers perform statistical multiplexing because they are packet switches. The Internet Protocol is a datagram protocol, where each packet is routed independently of all other packets, without the concept of a connection. IP has traditionally offered only a single class of service, known as best-effort delivery, where all packets traversing the network are treated with the same priority. Best-effort means that IP makes a reasonable effort to deliver each datagram to its destination with uncorrupted data, but there are no guarantees that a packet will not be corrupted, duplicated, reordered, or misdelivered. Additionally, there are no promises with respect to the amount of throughput, delay, jitter, or loss that a traffic stream will experience. The network makes a best-effort attempt to satisfy its clients and does not arbitrarily discard packets. However, best-effort service without the support of intelligent transport protocols would lead to chaos. The only reason that best-effort works in global IP networks is that TCP does not compromise the network when it experiences congestion, but rather detects and then responds smoothly to packet loss by reducing its transmission rate. TCP is the basic building block that makes the best-effort queue the most well-behaved queue in a router, because it backs off when it experiences congestion.

    Best-effort delivery is not a pejorative term. In fact, the ability to support a single best-effort service has allowed large IP networks and the Internet to become what they are today: the unchallenged technology of choice for supporting mission-critical applications at a global scale. However, there are a number of perceived issues related to IP's ability to support only a single best-effort class of service and the potential impact on IP's continued commercial success. Some carriers and providers see the need to offer multiple service levels if they are to support the deployment of new services, each with different performance requirements, over a shared IP infrastructure.

    Differentiated Service Classes

    Supporting multiple service classes for specific applications or customers is a matter of treating packets that belong to certain data streams differently from packets that belong to other data streams. Multiple service classes are all about providing managed unfairness to certain traffic classes.

    Differentiated service levels are supported by manipulating the key attributes of certain streams to change the customer's perception of the quality of service that the network is delivering. These attributes include:

    - The amount of data that can be transmitted per unit of time (throughput),

    - The amount of time that it takes for data to be transmitted from one point to another point in the network (delay or latency),

    - The variation in this delay over time (jitter) for consecutive packets in a given flow, and

    - The percentage of transmitted data that does not arrive at its destination correctly (loss).

    However, the quality of service provided to a given service class can be only as good as the lowest quality of service delivered by the weakest link in the end-to-end path.


    The concept of multiple service classes is not applicable to classic TDM services, because if a TDM link is up, then bandwidth, delay, and jitter are constant, and packet loss is zero. Any errors that do occur result from the link going down: bandwidth goes to zero, delay goes to infinity, and loss goes to 100 percent. For classic TDM services, the concept of differentiated service classes instead involves providing different uptime commitments and meeting different customer service requirements, restoration times, and so forth.

    The need to provide multiple service classes for customers or applications applies much more to the delivery of statistically multiplexed services. This is because specific packet flows of interest traverse several routers, and the quality of service perceived by individual users is a function of the way that statistical multiplexing is performed at each hop in the path, as well as the characteristics of the individual links in the path. By treating some packets differently from others when performing statistical multiplexing, a network of routers can offer different kinds of throughput, delay, jitter, and loss for different packet flows.

    Finally, supporting differentiated service classes through bandwidth reservations or lower oversubscription factors for higher-priority services results in a less efficient use of network bandwidth than providing only a single best-effort statistical multiplexing service. However, you can compensate for this lower bandwidth efficiency by charging your subscribers a premium for higher-priority services.

    NOTE

    Once you make the business decision to offer multiple levels of service, it is important to perform the analysis necessary to determine exactly how much more you need to charge your subscribers to maintain your profit margins and to compensate for your loss of bandwidth efficiency.

    The Impact of Statistical Multiplexing on Perceived Quality of Service

    In this section, we examine how the statistical multiplexing performed by routers can influence the user's perception of the quality of service delivered by a network. The quality of service attributes that can be affected by statistical multiplexing include:

    - Throughput,

    - Delay,

    - Jitter, and

    - Loss.

    Throughput

    Throughput is a generic term used to describe the capacity of a system to transfer data. It is easy to measure the throughput for a TDM service, because the throughput is simply the bandwidth of the transmission channel. For example, the throughput of a DS-3 circuit is 45 Mbps. However, for TCP/IP statistically multiplexed services, throughput is much harder to define and measure, because there are numerous ways that it can be calculated, including:

    - The packet or byte rate across the circuit,

    - The packet or byte rate of a specific application flow,

    - The packet or byte rate of host-to-host aggregated flows, or

    - The packet or byte rate of network-to-network aggregated flows.


    The most direct way that a router's statistical multiplexing can be tuned to affect throughput is by the amount of bandwidth it allocates to different types of packets.

    - In classic best-effort service, the router does not specifically control the amount of bandwidth assigned to different traffic classes. Instead, during periods of congestion, all packets are placed into a single first-in, first-out (FIFO) queue. When faced with congestion, User Datagram Protocol (UDP) flows continue to transmit at the same rate, but TCP flows detect and then react to packet loss by reducing their transmission rate. As a result, UDP flows end up consuming the majority of the bandwidth on the congested port, while each TCP flow receives a roughly equal share of the leftover bandwidth.

    - When attempting to support differentiated treatment for different traffic classes, each class of traffic can be given a different share of the output-port bandwidth. For example, a router can be configured to allocate different amounts of bandwidth to each class of traffic on the output port; one class of traffic can be given strict priority over all other classes; or one class of traffic can be given strict priority with a bandwidth limit (to prevent the starvation of the other classes). The support of differentiated service classes implies the use of more than just a single FIFO queue on each output port.

    Delay

    Delay (or latency) is the amount of time that it takes for a packet to be transmitted from one point in a network to another point in the network. A number of factors contribute to the amount of delay experienced by a packet as it traverses your network:

    - Forwarding delay,

    - Queuing delay,

    - Propagation delay, and

    - Serialization delay.

    Figure 4 illustrates that the end-to-end delay can be calculated as the sum of the individual forwarding, queuing, serialization, and propagation delays occurring at each node and link in your network.

    Figure 4: End-to-end Delay Calculation
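    A back-of-the-envelope version of this calculation might look like the following sketch; the per-hop numbers are illustrative placeholders, not measurements:

        # End-to-end delay: sum the per-node and per-link components (all in ms).
        hops = [
            # (forwarding, queuing, serialization, propagation) per node + link
            (0.05, 0.20, 0.0048, 8.0),
            (0.05, 0.50, 0.0048, 12.0),
            (0.05, 0.10, 0.0048, 10.0),
        ]
        end_to_end_ms = sum(fwd + q + ser + prop for fwd, q, ser, prop in hops)
        print(f"End-to-end one-way delay: {end_to_end_ms:.2f} ms")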

    However, when examining the causes of application delay in your network, it is important to remember that the routers represent only a part of the end-to-end path, and you must also consider several other factors:


    - The performance bottlenecks within hosts and servers,

    - Operating system scheduling delays,

    - Application resource contention delays,

    - Physical layer framing delays,

    - CODEC encoding, compression, and packetization delays,

    - The quality of the different TCP/IP implementations running on these end systems, and

    - The stability of routing in the network.

    Sources of Network Delay

    In this section, we examine each of the sources of delay: forwarding, queuing, propagation, and serialization delay.

    Forwarding Delay

    Forwarding delay is the amount of time that it takes a router to receive a packet, make a forwarding decision, and then begin transmitting the packet through an uncongested output port. This represents the minimum amount of time that it takes the router to perform its basic function and is typically measured in tens or hundreds of microseconds (0.000001 sec). Other than deploying industry-standard, hardware-based routers, you have no real control over forwarding delay.

    Queuing Delay

    Queuing delay is the amount of time that a packet has to wait in a queue, as the system performs statistical multiplexing and other packets are serviced, before it can be transmitted on the output port. The queuing delay at a given router can vary over time from zero (for an uncongested link) to the sum of the times that it takes to transmit each of the other packets queued ahead of it. During periods of congestion, the queue memory management and queue scheduling disciplines allow you to control the amount of queuing delay experienced by different classes of traffic placed in different queues.

    Propagation Delay

    Propagation delay is the amount of time that it takes for electrons or photons to traverse a physical link. The propagation delay is based on the speed of light and is measured in milliseconds (0.001 sec). When estimating the propagation delay across a point-to-point link, you can assume one millisecond (1 ms) of propagation delay per 100 miles (160 km) of round-trip distance. Consequently, the speed-of-light propagation RTT delay from San Francisco to New York (6000 mi, 9654 km) is between 60 and 70 ms (0.060 sec to 0.070 sec). Because you can't change the speed of light in optical fiber, you have no control over propagation delay.
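    The rule of thumb above reduces to a one-line estimate (the function name is our own):

        def propagation_rtt_ms(round_trip_miles: float) -> float:
            """~1 ms of speed-of-light delay per 100 miles of round-trip distance."""
            return round_trip_miles / 100.0

        # San Francisco to New York: ~6000 round-trip miles -> ~60 ms RTT floor
        print(propagation_rtt_ms(6000))  # 60.0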

    It is interesting to note that the speed of light in optical fiber is approximately 65 percent of the speed of light in a vacuum, while the speed of electron propagation through copper is slightly faster, at 75 percent of the speed of light. Although the signal representing each bit travels slightly faster in copper than in fiber, fiber has numerous advantages over copper: it results in fewer bit errors, supports longer cable runs between repeaters, and allows more bits to be packed into a given length of cable. For example, a 10 Mbps copper interface (traditional Ethernet) transports 78 bits per mile (124 bits per km), resulting in a 1500-byte packet that is 154 miles (248 km) long. In contrast, a 2.488 Gbps fiber interface (OC-48c/STM-16) transports 19,440 bits per mile (31,104 bits per km), creating a 1500-byte packet that is only 3,260 feet (994 m) long.

    Serialization Delay

    Serialization delay is the amount of time that it takes to place the bits of a packet onto the wire when a router transmits a packet. Serialization delay is measured in milliseconds (ms, or 0.001 sec) and is a function of the size of the packet and the speed of the port. Since there is no practical mechanism to control the size of the packets in your network (other than reducing the MTU or forcing packet fragmentation), the only action you can take to reduce serialization delay is to install higher-speed router interfaces.

    Table 1 displays the serialization delay for various packet sizes and different port speeds.

    Table 1: Serialization Delay (Packet Size vs. Port Speed)

    Packet Size   DS-1         DS-3        OC-3        OC-12       OC-48       OC-192
    40 byte       0.2073 ms    0.0072 ms   0.0021 ms   0.0005 ms   0.0001 ms   < 0.0001 ms
    256 byte      1.3264 ms    0.0458 ms   0.0132 ms   0.0033 ms   0.0008 ms   0.0002 ms
    320 byte      1.6580 ms    0.0572 ms   0.0165 ms   0.0041 ms   0.0010 ms   0.0003 ms
    512 byte      2.6528 ms    0.0916 ms   0.0264 ms   0.0066 ms   0.0016 ms   0.0004 ms
    1500 byte     7.7720 ms    0.2682 ms   0.0774 ms   0.0193 ms   0.0048 ms   0.0012 ms
    4470 byte     23.1606 ms   0.7994 ms   0.2307 ms   0.0575 ms   0.0144 ms   0.0036 ms
    9180 byte     47.5648 ms   1.6416 ms   0.4738 ms   0.1181 ms   0.0295 ms   0.0074 ms

    From Table 1, you can see that it takes 7.7 ms to place a 1500-byte packet on a DS-1 circuit. This is a significant amount of time if you consider that the typical one-way propagation delay from San Francisco to New York (3000 mi, 4827 km) is between 30 and 35 ms. On the other hand, the serialization delay for a 1500-byte packet on an OC-192c/STM-64 port is only 0.0012 ms. In a network consisting of high-speed interfaces, serialization delay contributes an insignificant amount to the overall end-to-end delay. However, in a network consisting of low-speed interfaces, serialization delay can contribute significantly to the overall end-to-end delay.
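    The values in Table 1 follow directly from packet size divided by port speed; a minimal sketch (the DS-1 and OC-192c rate constants are nominal):

        def serialization_delay_ms(packet_bytes: int, port_bps: float) -> float:
            """Time to clock a packet onto the wire: bits divided by port speed."""
            return packet_bytes * 8 / port_bps * 1000

        print(f"{serialization_delay_ms(1500, 1.544e6):.4f} ms")  # DS-1: 7.7720 ms
        print(f"{serialization_delay_ms(1500, 9.953e9):.4f} ms")  # OC-192c: 0.0012 ms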

    Managing Delay While Maximizing Bandwidth Utilization

    Given that the only component of end-to-end delay that you can actually control is queuing delay, support for differentiated service classes is based on managing the queuing delay experienced by different traffic classes during periods of network congestion. In the absence of active queue management techniques, such as Random Early Detection (RED), there is a direct relationship between the bandwidth utilization on a link and the RTT delay. If you maintain a 5-minute weighted bandwidth utilization of 10 percent, there will be minimal packet loss and minimal RTT delay, because the output ports are generally underused. However, if you increase the 5-minute weighted bandwidth utilization to approximately 50 percent, the average RTT starts to increase exponentially as the load on your network increases. (See Figure 5.)



    Figure 5: Bandwidth Utilization vs. Round-trip Time (RTT) Delay

    The challenge when trying to manage delay is that, at the same time, you also need to maximize bandwidth utilization in your network for financial reasons. Bear in mind that bandwidth utilization statistics are meaningful only when the length of the circuit observation period is specified. If you measure bandwidth utilization over one nanosecond (0.000000001 sec), you get one of two values: zero percent or 100 percent utilization. If you measure the utilization of a circuit over 5 minutes, you get a reasonably damped average. Whenever we discuss bandwidth utilization here, we always mean a 5-minute weighted average.

    A 5-minute weighted bandwidth utilization of 50 percent doesn't mean just 50 percent utilization. It means that there are short, sub-second intervals when utilization is close to 100 percent, queues fill up, and packets are dropped. It also means that there are other periods when the bandwidth utilization is close to zero percent, queue depth is zero, and packets are never dropped. A 5-minute weighted average utilization of 50 to 60 percent is considered heavy bandwidth utilization. If financial factors compel you to drive your utilization up to 70 or 75 percent, then you dramatically increase the RTT delay and the variation in RTT delay for all applications running across your network.

    So your dilemma is how to optimize the bandwidth utilization of your network while also managing queuing delays for delay-sensitive traffic. To find the solution, you must first determine which applications in your network can cope with increasing delay and delay variation. TCP-based applications are specifically designed to be rate-adaptive and to cope with delay, but there are other types of applications, such as real-time voice, that are unable to operate smoothly when experiencing long delays or delay variation.

    Therefore, the solution to optimizing bandwidth utilization while also managing queuing delays is to isolate the applications that cannot handle delay from the 50 to 60 percent utilization class. You can accomplish this by placing packets from those applications into a dedicated queue that does not experience the aggregate delay caused by the high utilization of the circuit. In effect, you identify a certain set of applications, isolate those applications from other types of traffic by placing them into a dedicated queue, and then control the amount of queuing delay experienced by those specific applications.

    There are three things that you need to keep in mind with respect to delay in your network:

    - In a well-designed and properly functioning network, queuing delay should be close to zero when averaged over time. There will always be extremely short periods of congestion, but network links need to be properly provisioned. Otherwise, queuing delay will increase rapidly, because you are asking too much traffic to cross an underprovisioned link.

    - If you examine the relative impact of the factors other than queuing delay that contribute to delay (forwarding, propagation, and serialization), propagation delay is the major source of delay, by several orders of magnitude.

    - The only delay factor that you can control is queuing delay. The challenge with the other factors is that you have no real control over them.

    Jitter

    Jitter is the variation in delay over time experienced by consecutive packets that are part of the same flow. (See Figure 6.) You can measure jitter by using a number of different techniques, including the mean, standard deviation, maximum, or minimum of the interpacket arrival times for consecutive packets in a given flow.

    Figure 6: Jitter Makes Packet Spacing Uneven
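    As a sketch of the measurement techniques just mentioned, jitter can be summarized from a list of packet arrival timestamps (the timestamps here are invented for illustration):

        from statistics import mean, stdev

        def jitter_stats(arrival_times_ms):
            """Summarize jitter as statistics over interpacket arrival gaps."""
            gaps = [b - a for a, b in zip(arrival_times_ms, arrival_times_ms[1:])]
            return {"mean": mean(gaps), "stdev": stdev(gaps),
                    "min": min(gaps), "max": max(gaps)}

        # Packets sent with a constant 20 ms spacing arrive unevenly spaced:
        print(jitter_stats([0.0, 21.3, 39.8, 62.5, 80.1]))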

    TDM systems can cause jitter, but the variation in delay is so small that, for all practical purposes, you can ignore it. In a statistically multiplexed network, the primary source of jitter is the variability of queuing delay over time for consecutive packets in a given flow. Another potential source of jitter is that consecutive packets in a flow may not follow the same physical path across the network, due to equal-cost load balancing or routing changes.

    Jitter increases exponentially with bandwidth utilization, just like delay. You can see this by executing a number of pings across a highly used link. You will notice not only an increase in delay, but also an increase in the variation of delay.

    There are a couple of other considerations relevant to jitter in statistical multiplexing networks:

    - In statistically multiplexed networks, the end-to-end jitter is never constant, because the level of congestion in a network always changes from place to place and moment to moment. Unless you are assured that the transmission of a packet will begin immediately after a router's forwarding decision, the amount of delay introduced at each hop in an end-to-end path is variable.


    - ATM has traditionally supported real-time traffic by using 53-byte cells as a way to place an upper bound on the amount of delay that a cell is subject to at any single network node. The point is that a 53-byte time period is a lot less than a 1500-byte time period.

    Impact of Jitter on Perceived QoS

    Some applications are unable to handle jitter.

    - With interactive voice or video applications, jitter can result in a jerky or uneven quality to the sound or image. The solution is to properly provision the network, including the queue scheduling discipline, and to condition traffic so that jitter stays within acceptable limits. The jitter that remains can be handled by a short playback buffer on the destination host that buffers packets briefly before playing them back as a smoothed data stream.

    - For emulated TDM service over a statistically multiplexed network, jitter outside of a narrowly defined range can introduce errors. The solution is to properly provision the network, including priority queuing, and to condition traffic at the edges of the network so that jitter stays within a predefined range.

    However, there are other types of applications (such as those that run over TCP/IP) for which jitter is not a problem. Also, for non-interactive applications, such as streaming voice or video, jitter does not present serious problems, because it can be overcome by using large playback buffers.

    Loss

    There are three sources of packet loss in an IP network, as illustrated in Figure 7:

    - A break in a physical link that prevents the transmission of a packet,

    - A packet that is corrupted by noise, which is detected by a checksum failure at the downstream node, and

    - Network congestion that leads to buffer overflow.

    Figure 7: Sources of Packet Loss in IP Networks

    Breaks in physical links do occur, but they are rare, and the combination of self-healing physical layers and redundant topologies responds dynamically to this source of packet loss. With the exception of wireless networking, when using modern physical-layer technologies the chance of packet corruption is statistically insignificant, so you can ignore this source of packet loss as well.

    Consequently, the primary cause of packet loss in a non-wireless IP network is buffer overflow resulting from congestion. The amount of packet loss in a network is typically expressed in terms of the probability that a given packet will be discarded by the network.


    IP networks do not carry a constant load, because traffic is bursty, and this causes the load on the network to vary over time. There are periods when the volume of traffic that the network is asked to carry exceeds the capacity of some of the components in the network. When this occurs, congested network nodes attempt to reduce their load by discarding packets. When the TCP/IP stack on a host system detects a packet loss, it assumes that the packet loss is due to congestion somewhere in the network.

    Packet Loss Can Be Good

    It is important to understand that packet loss in an IP network is not always a bad thing. Each TCP session seeks all of the bandwidth that it can for its flow, but it must find the maximum bandwidth without causing sustained congestion in the network. TCP accomplishes this by transmitting slowly at the beginning of each session (slow-start), and then increasing the transmission rate until it eventually detects the loss of a packet. Since TCP understands that a packet drop means congestion is present at some point in the network, it reacts to the congestion by temporarily reducing the transmission rate of the flow. Given enough time, each TCP flow will eventually settle on the maximum bandwidth it can get across the network without experiencing sustained congestion. When multiple TCP flows do this in parallel, the result is fairness for all TCP sessions across the network. Thus, occasional packet loss is good, because each TCP session needs to experience some amount of packet loss to find all of the bandwidth that it can get for its flow.
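    The probing behavior described above can be caricatured in a few lines. This is a toy additive-increase/multiplicative-decrease loop with an invented link capacity, not a faithful TCP implementation:

        # Toy model: a TCP sender probes for bandwidth until loss, then backs off.
        link_capacity = 100.0   # arbitrary units
        rate = 1.0              # each session starts slowly (slow-start)

        for _ in range(40):
            if rate > link_capacity:  # queue overflows and a packet is dropped
                rate /= 2             # multiplicative decrease on detecting loss
            else:
                rate += 5             # keep probing upward for more bandwidth
        print(f"Settles into a sawtooth below capacity: rate = {rate:.0f}")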

    Host response to network congestion is the same whether your IP network runs over a packet infrastructure or over an ATM infrastructure, because TCP congestion-avoidance mechanisms are executed at the transport layer, not the data link layer. An ATM transport does not possess remarkable properties that allow you to better control the amount of traffic that a host injects into your network, because the applications are native-IP-based, not ATM-based. If you need to control end-system behavior, you are still required to perform traffic policing or shaping at the ingress edges of your network. ATM can support this only at a relatively coarse level, because it is not aware of TCP/IP or the operation of its congestion-avoidance mechanisms. In fact, running TCP/IP over an ATM infrastructure has a number of well-known limitations (cell tax, the number of routing adjacencies required, the inability to identify IP packets in the core of the network without reassembly, and so forth) that may actually obscure congestion-avoidance issues, because there are more network layers that can hide the problem.

    Mindful that a certain amount of packet loss is to be expected in any IP network, how can you support differentiated service classes for specific customers or applications by arranging for some packets to be treated differently from other packets with respect to packet loss? Assume that you offer a fixed amount of bandwidth between two points in your network. As long as the total amount of traffic sent along the path between these two points is less than or equal to the agreed-upon throughput, there should be minimal packet loss after TCP sessions stabilize. This assumption allows us to support the differentiated treatment of packets with respect to loss by deploying multiple queues on each port, rather than just a single FIFO queue. The output traffic stream is first classified, and then different types of packets are placed into different queues. Finally, each queue is given a different share of the port's bandwidth. As long as the amount of traffic placed into each of the queues is less than or equal to the agreed-upon bandwidth for the particular queue, each queue should experience minimal packet loss after the TCP sessions traversing the queue stabilize. (See Figure 8.)


Figure 8: Multiple Queues with Different Shares of a Port's Bandwidth

But what do you do if the amount of traffic placed into a given service class exceeds its agreed-upon throughput? This becomes a policy decision with a number of options to manage the traffic load (a sketch following this list illustrates the options):

• Drop packets that are out-of-profile.

• Mark the packet, and then forward it with an increased drop probability. If the out-of-profile packet experiences congestion at a downstream node, it can be dropped before other in-profile packets are dropped.

• Queue the packet, and then use traffic conditioning tools to control its rate on egress.

• Transmit an explicit congestion notification (ECN) by setting the congestion experienced (CE) bit in the header of packets sourced from ECN-capable transport protocols.
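A minimal sketch of such a policy decision, assuming a meter has already flagged each packet as out-of-profile; the policy names and packet fields here are hypothetical:

    # Hypothetical per-class policy applied to packets the meter has
    # flagged as out-of-profile.
    def handle_out_of_profile(packet, policy):
        if policy == "drop":
            return None                         # discard immediately
        if policy == "mark":
            packet["drop_precedence"] += 1      # forward, but more likely to be
            return packet                       # dropped at a congested hop
        if policy == "shape":
            packet["queue_for_shaping"] = True  # delay on egress to conform
            return packet
        if policy == "ecn" and packet.get("ecn_capable"):
            packet["ce_bit"] = 1                # signal congestion without dropping
            return packet
        return packet  # non-ECN-capable packets fall through unchanged here

    pkt = {"drop_precedence": 0, "ecn_capable": True}
    print(handle_out_of_profile(dict(pkt), "mark"))
    print(handle_out_of_profile(dict(pkt), "ecn"))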

It is important to note that, up to this point, we have limited our discussion of packet loss to the case when a queue becomes 100 percent full. This mechanism is known as tail-drop queue management, because packets are dropped from the logical back, or tail, of the queue. (See Figure 9.)

    Figure 9: Tail-drop Queue Management

Tail-drop queue management is a simple algorithm that is easy to implement. However, it does not discard packets fairly, because it allows a poorly behaved, bursty stream to consume all of a queue's resources, causing packets from other well-behaved streams to be discarded because the queue is 100 percent full.


Although you should expect a limited amount of packet loss in any network, significant packet loss due to sustained congestion adversely affects the operation of your network. Sustained congestion creates critical problems for your network in several ways:

• The exchange of routing information is disrupted, and this can lead to route instability.

• The network is no longer able to absorb bursts of traffic.

• New TCP sessions cannot be established.

• Each of the existing TCP sessions traversing a heavily congested link begins to experience some amount of packet loss. Since packets from different sessions are interleaved, all of the sessions begin to experience packet loss at roughly the same time. This causes each of the individual sessions to go into slow-start, which creates a phenomenon known as global TCP synchronization. When this occurs, all of the TCP sessions across the congested link become synchronized, resulting in periodic surges of traffic. As a result, the link alternates between heavy congestion, as each TCP begins to seek its maximum bandwidth, and light use, as all of the TCPs return to slow-start when they begin to experience congestion. This cycle repeats itself over and over again. Depending on where the congestion occurs in your network, this phenomenon can involve hundreds, thousands, or even tens of thousands of TCP sessions.

Random Early Detection (RED) is an active queue management mechanism that combats the problem of global TCP synchronization, while also introducing a degree of fairness into the discard-selection process.

    A Brief History of Differentiated Services in Large IP Networks

The notion of providing more than just a single best-effort class of service has been part of the IP architecture for more than 20 years. In this section, we examine some of the historical approaches to supporting differentiated service classes in large IP networks.

The First Approach: RFC 791

In September 1981, RFC 791 standardized the Internet Protocol and reserved the second byte of the IP header as the type of service (ToS) field. The bits of the ToS byte were defined as Figure 10 shows.

Figure 10: RFC 791 Bit Definitions of the ToS Byte

       0     1     2     3     4     5     6     7
    +-----------------+-----+-----+-----+-----------+
    |   Precedence    |  D  |  T  |  R  | Reserved  |
    +-----------------+-----+-----+-----+-----------+

The first three bits in the ToS byte (precedence bits) could be set by a node to select the relative priority or precedence of the packet. The next three bits could be set to specify normal or low delay (D), normal or high throughput (T), and normal or high reliability (R). The final two bits of the ToS byte were reserved for future use. However, very little architecture was provided to support the delivery of differentiated service classes in IP networks using these capabilities.


The only application of the IP precedence bits until the mid-1990s was to support a feature known as selective packet discard (SPD). SPD set the precedence bits for control packets (link-level keepalives, routing protocol keepalives, and routing protocol updates) so that, if the network experienced congestion, critical control traffic would be the last to be discarded. The goal was to enhance network stability during periods of congestion. In practice, the DTR bits were never used.

    The Second Approach: The Integrated Services Model (IntServ)

Around 1993, comprehensive work began in the IETF to develop a mechanism that would allow IP to support more than a single best-effort class of service. The goal was to provide real-time service simultaneously with traditional non-real-time service in a shared IP network. This work resulted in the development of the Integrated Services (IntServ) architecture. The IntServ architecture is based on per-flow resource reservation.

    IntServ Architecture

The IntServ architecture defined a reference model that specifies a number of different components and the interplay among these components:

• The resource reservation setup protocol (RSVP) that allows individual applications to request resources from routers and then install per-flow state along the path of the packet flow.

• Two new service models: guaranteed service and controlled load service. Guaranteed service provides firm assurances (through strict admission control, bandwidth allocation, and fair queuing) for applications that require guaranteed bandwidth and delay. The controlled load service does not provide guaranteed bounds on bandwidth or delay, and emulates a lightly loaded, best-effort network.

• Flow specifications that provide a syntax that allows applications to specify their specific resource requirements.

• A packet classification process that examines incoming packets and decides which of the various classes of service should be applied to each packet.

• An admission control process that determines whether a requested reservation can be supported, based on the availability of both local and network resources.

• A policing and shaping process that monitors each flow to ensure that it conforms to its traffic profile.

• A packet scheduling process that distributes network resources (buffers and bandwidth) among the different flows.

The IntServ model requires that source and destination hosts exchange RSVP signaling messages (Path and Resv) to establish packet classification and forwarding state at each node along the path between them. (See Figure 11.)


    Figure 11: Resource Reservation Protocol (RSVP)

While people in the industry learned a tremendous amount during the development of the IntServ architecture, they eventually concluded that IntServ was not a suitable mechanism to support the delivery of differentiated service classes in large IP networks:

• IntServ is not scalable, because it requires significant amounts of per-flow state and packet processing at each node along the end-to-end path. In the absence of state aggregation, the amount of state that needs to be maintained at each node scales in proportion to the number of simultaneous reservations through a given node. The number of flows on a high-speed backbone link could potentially range from tens of thousands to over a million.

• IntServ requires that applications running on end systems support the RSVP signaling protocol. There were very few operating systems that supported an RSVP API that application developers could access.

• IntServ requires that all nodes in the network path support the IntServ model. This includes the ability to map IntServ service classes to link-layer technologies.

While the IntServ model failed, it led to the development and deployment of RSVP, which we now use as a general-purpose signaling protocol for MPLS traffic engineering, fast LSP restoration, and the rapid provisioning of optical links (GMPLS or MPLambdaS). RSVP performs very well as a signaling protocol for MPLS because, in this application, it does not experience the scalability problems associated with IntServ.

    An IntServ Enhancement: Aggregation of RSVP Reservations

As discussed above, one of the major scalability limitations of RSVP is that it does not have the ability to aggregate individually reserved sessions into a single, shared class. In September 2001, RFC 3175 ("Aggregation of RSVP for IPv4 and IPv6 Reservations") defined procedures that allow a single RSVP reservation to aggregate other RSVP reservations across a large IP network. It proposed mechanisms to dynamically establish the aggregate reservation, identify the specific traffic for which the aggregate reservation applies, determine how much bandwidth is required to satisfy the reservation requirement, and reclaim bandwidth when the subreservations are no longer required.

RFC 3175 enhances the scalability of RSVP for use in large IP networks by:

• Reducing the number of signaling messages exchanged and the amount of reservation state that needs to be maintained, by making a limited number of large reservations rather than a large number of small, flow-specific reservations;

• Streamlining the packet classification process in core routers by using the Differentiated Services codepoint, or DSCP (see the discussion of DiffServ that follows), to identify an aggregated flow, instead of the traditional RSVP flow classification mechanism; and

• Simplifying packet queuing and scheduling by combining the aggregated streams into the same queue on an output port.


Among the potential applications for aggregation of RSVP reservations are these three:

• Interconnection of PSTN-call gateways across a provider backbone,

• Aggregation of RSVP paths at the edges of a provider network, and

• Aggregation of RSVP paths across the core of a provider network.

One of the strengths of RSVP is that it supports admission control on a per-flow basis. This can be a powerful tool when supporting premium interactive voice services. Assume that you establish an aggregated RSVP reservation to support 1000 voice calls. As long as there are fewer than 1000 active calls, a new call will be accepted by admission control, which will allocate adequate bandwidth to support subscriber performance requirements. The 1001st call will be denied access by admission control, thus preserving the quality of service delivered to the 1000 established calls.
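A toy sketch of this per-flow admission control against an aggregate reservation (the class and call counts are illustrative):

    class AggregateReservation:
        """Toy per-call admission control against an aggregate RSVP reservation."""
        def __init__(self, max_calls):
            self.max_calls = max_calls
            self.active = 0

        def admit_call(self):
            if self.active < self.max_calls:
                self.active += 1
                return True   # bandwidth reserved; call proceeds at full quality
            return False      # the 1001st call is refused rather than degrading the rest

        def release_call(self):
            self.active = max(0, self.active - 1)

    gateway = AggregateReservation(max_calls=1000)
    results = [gateway.admit_call() for _ in range(1001)]
    print(results.count(True), "admitted,", results.count(False), "denied")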

As you will see in the next section, the DiffServ model performs admission control on a per-packet basis, not on a per-flow basis. This means that, at the edge of a DiffServ domain, calls 1001 through 1100 will be accepted but, because the service class is now out-of-profile, packets will be randomly dropped, thereby degrading the quality of service delivered for all of the calls. You can overcome this limitation of DiffServ by using a combination of aggregated RSVP at the edges of the network to perform per-flow admission control for a voice gateway, plus DiffServ in the core of the network to support application performance requirements across the backbone.

    The Third Approach: The Differentiated Services Model (DiffServ)

Around 1995 or 1996, service providers and various academic institutions began to examine alternative approaches to supporting more than a single best-effort class of service, but this time by using mechanisms that could provide the requisite scalability. As discussed in the previous section, the failure of the IntServ model was due to the signaling explosion and the amount of per-flow state that needed to be maintained at each node in the packet-forwarding path. As a result, all of these new proposals sought to prevent these scalability issues. Figure 12 illustrates the cost, relative to complexity, of the new approaches to supporting differentiated service classes: best-effort service is the simplest and least expensive, IntServ is the most complex and costly, and DiffServ falls in between.

    Figure 12: Cost Relative to Complexity of Differentiated Services Solutions

At that time, there were a number of different proposals to redefine the meaning of the three precedence bits in the ToS byte of the IP header. The proposals ranged from using a single bit, similar to the Frame Relay DE bit, to arbitrary bit definitions and even hybrid approaches, where some bits were used for certain functions and the remaining bits were used for other functions.


There was a lot of talk, some vendor code, but never any real production deployment. The lack of successful deployment was because routers were software-based, and any attempt to make the packet-forwarding process more complicated affected forwarding performance, so it was simply easier to overprovision congested links.

By 1997, the IETF realized that IntServ was not going to be deployed in production networks, and that the commercial sector had been thinking about supporting differentiated service classes for specific customers or applications in a more coarse-grained and more scalable way by using the IP precedence bits. As a result, the IETF created the DiffServ Working Group, which met for the first time in March 1998. The goal of this group was to create relatively simple and coarse methods of providing differentiated classes of service for Internet traffic, to support various types of applications and specific business models.

    The IETF Architecture for Differentiated Services

The DiffServ Working Group has changed the name of the IPv4 ToS octet to the DS byte and defined new meanings for each of the bits. (See Figure 13.) The new specification for the DS Field is applied to both the IPv4 ToS octet and the IPv6 traffic class octet, so that they use a common set of mechanisms to support the delivery of differentiated service classes.

Figure 13: Differentiated Services Field (DS Field)

       0     1     2     3     4     5     6     7
    +-----------------------------------+-----------+
    |  Differentiated Services          |    CU     |
    |  Codepoint (DSCP)                 |           |
    +-----------------------------------+-----------+

The IETF's DiffServ Working Group divides the DS byte into two subfields:

• The six high-order bits are known as the Differentiated Services codepoint (DSCP). The DSCP is used by a router to select the per-hop behavior (PHB) that a packet experiences at each hop within a Differentiated Services domain. A PHB is an externally observable forwarding treatment applied to all packets that belong to the same service class or behavior aggregate (BA).

• The two low-order bits are currently unused (CU) and reserved for future use. These two bits are presently set aside for use by the explicit congestion notification (ECN) experiment. The values of the CU bits are ignored by each node when it determines the PHB to apply to a packet.

The complete DiffServ architecture, defined in RFC 2475, is based on a relatively simple model, whereby traffic that enters a network is first classified, and then possibly conditioned, at the edges of the network. Depending on the result of the packet classification process, each packet is associated with one of the BAs supported by the Differentiated Services domain. The BA that each packet is assigned to is indicated by the specific value carried in the DSCP bits of the DS Field. When a packet enters the core of the network, each router along the transit path applies the appropriate PHB, based on the DSCP carried in the packet's header. It is this combination of traffic conditioning (policing and shaping) at the edges of the network, packet marking at the edges of the network, local per-class forwarding behaviors in the interior of the network, and adequate network provisioning that allows the DiffServ model to support scalable service discrimination across a common IP infrastructure.


    Differentiated Services Domain (DS Domain)

A Differentiated Services domain (DS domain) is a contiguous set of routers that operate with common sets of service provisioning policies and PHB group definitions. (See Figure 14.) A DS domain is typically managed by a single administrative authority that is responsible for ensuring that adequate network resources are available to support the service level specifications (SLSs) and traffic conditioning specifications (TCSs) offered by the domain.

    Figure 14: Differentiated Services Domain (DS Domain)

    A DS domain consists of DS boundary nodes and DS interior nodes.

• DS boundary nodes sit at the edges of a DS domain. DS boundary nodes function as both DS ingress and egress nodes for different directions of traffic flows. When functioning as a DS ingress node, a DS boundary node is responsible for the classification, marking, and possibly conditioning of ingress traffic. It classifies each packet, based on an examination of the packet header, and then writes the DSCP to indicate one of the PHB groups supported within the DS domain. When functioning as a DS egress node, the DS boundary node may be required to perform traffic conditioning functions on traffic forwarded to a directly connected peering domain. DS boundary nodes connect a DS domain to another DS domain or to another non-DS-capable domain.

• DS interior nodes select the forwarding behavior applied to each packet, based on an examination of the packet's DSCP (they honor the PHB indicated in the packet header). DS interior nodes map the DSCP to one of the PHB groups supported by all of the DS interior nodes within the DS domain. DS interior nodes connect only to another DS interior node or a boundary node within the same DS domain.

    Differentiated Service Router Functions

Figure 15 provides a logical view of the operation of a packet classifier and traffic conditioner on a DiffServ-capable router.


    Figure 15: Packet Classifier and Traffic Conditioner

    Packet Classification

A packet classifier selects packets in a traffic stream based on the content of fields in the packet header. The DiffServ architecture defines two types of packet classifiers:

• A behavior aggregate (BA) classifier selects packets based on the value of the DSCP only.

• A multifield (MF) classifier selects packets based on a combination of the values of one or more header fields. These fields can include the source address, destination address, DS Field, protocol ID, source port, destination port, or other information, such as the incoming interface. The result of the classification is written to the DS Field to simplify the packet classification task for nodes in the interior of the DS domain.

After the packet classifier identifies packets that match specific rules, each packet is directed to a logical instance of a traffic conditioner for further processing.
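The following sketch contrasts the two classifier types. The rule table, field names, and codepoints chosen here are hypothetical; a real MF classifier matches on the actual bits of the packet header:

    def ba_classify(packet):
        """Behavior aggregate classification: the DSCP alone selects the class."""
        return packet["dscp"]

    MF_RULES = [
        # (source prefix, destination port, protocol) -> DSCP to write.
        {"src": "10.1.0.", "dport": 5060, "proto": "udp", "dscp": 0b101110},
        {"src": "10.2.0.", "dport": 80,   "proto": "tcp", "dscp": 0b001010},
    ]

    def mf_classify(packet):
        """Multifield classification: match several fields, then mark the DS Field."""
        for rule in MF_RULES:
            if (packet["src"].startswith(rule["src"])
                    and packet["dport"] == rule["dport"]
                    and packet["proto"] == rule["proto"]):
                packet["dscp"] = rule["dscp"]  # simplifies BA classification downstream
                return packet["dscp"]
        return 0b000000  # default PHB

    pkt = {"src": "10.1.0.7", "dport": 5060, "proto": "udp", "dscp": 0}
    print(f"{mf_classify(pkt):06b}")

Note how the MF classifier writes its result into the DS Field, so that interior nodes need only the far cheaper BA lookup.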

Traffic Conditioning

A traffic conditioner may consist of various elements that perform traffic metering, marking, shaping, and dropping. A traffic conditioner is not required to support all of these functions.

• A meter measures a traffic stream to determine whether a particular packet from the stream is in-profile or out-of-profile. The meter passes the in-profile or out-of-profile state information to other traffic conditioning elements so that different conditioning actions can be applied to in-profile and out-of-profile packets.

• A marker writes (or rewrites) the DS Field of a packet header to a specific DSCP, so that the packet is assigned to a particular DS behavior aggregate.

• A shaper delays some or all packets in a traffic stream to bring the stream into conformance with its traffic profile.

• A dropper (policer) discards some or all packets in a traffic stream to bring the stream into conformance with its traffic profile.
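A common way to build the meter element is a token bucket. In the sketch below (the rate and bucket depth are illustrative), a packet is declared in-profile when the bucket holds enough tokens to cover it, and out-of-profile otherwise:

    class TokenBucketMeter:
        """Toy token-bucket meter: 'rate' tokens (bytes) per second, up to 'depth'."""
        def __init__(self, rate, depth):
            self.rate, self.depth = rate, depth
            self.tokens = depth
            self.last = 0.0

        def measure(self, packet_bytes, now):
            # Refill tokens for the elapsed time, capped at the bucket depth.
            self.tokens = min(self.depth, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= packet_bytes:
                self.tokens -= packet_bytes
                return "in-profile"     # pass through unchanged
            return "out-of-profile"     # candidate for remarking, shaping, or dropping

    meter = TokenBucketMeter(rate=125_000, depth=10_000)  # ~1 Mbps, 10 KB burst
    for t in (0.0, 0.01, 0.02, 0.03):
        print(t, meter.measure(5_000, t))

The bucket depth sets how large a burst the meter tolerates before packets start being flagged out-of-profile, which is exactly the knob a provider uses when writing a traffic profile.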


    Differentiated Services Router Functions

Figure 16 illustrates the functions that are typically performed by DS boundary routers and DS interior routers.

    Figure 16: DiffServ Router Functions

The DS ingress boundary router generally performs MF packet classification and traffic conditioning functions on incoming microflows. A microflow is a single instance of an application-to-application flow that is ultimately assigned to a behavior aggregate. A DS ingress boundary router can also apply the appropriate PHB, based on the result of this packet classification process.

NOTE: A DS ingress boundary router may also perform BA packet classification if it trusts an upstream DS domain's packet classification.

A DS interior router usually performs BA packet classification to associate each packet with a behavior aggregate. It then applies the appropriate PHB by using specific buffer-management and packet-scheduling mechanisms to support the specific packet-forwarding treatment. Although the DiffServ architecture assumes that the majority of complex packet classification and conditioning occurs at DS boundary routers, the use of MF classification is also supported in the interior of the network.

The DS egress boundary router normally performs traffic shaping as packets leave the DS domain for another DS domain or non-DS-capable domain. A DS egress boundary router may also perform MF or BA packet classification and precedence rewriting if it has an agreement with a downstream DS domain.


    Per-hop Behaviors (PHBs)

A per-hop behavior (PHB) is a description of the externally observable forwarding behavior applied to a particular behavior aggregate. The PHB is the means by which a DS node allocates its resources to different behavior aggregates. The DiffServ architecture supports the delivery of scalable service discrimination based on this hop-by-hop resource allocation mechanism.

PHBs are defined in terms of the behavior characteristics that are relevant to a provider's service provisioning policies. A specific PHB may be defined in terms of:

• The amount of resources allocated to the PHB (buffer size and link bandwidth),

• The relative priority of the PHB compared with other PHBs, or

• The observable traffic characteristics (delay, jitter, and loss).

However, PHBs are not defined in terms of specific implementation mechanisms. Consequently, a variety of different implementation mechanisms may be acceptable for implementing a specific PHB group.

The IETF DiffServ Working Group has defined two PHBs:

• Expedited forwarding PHB

• Assured forwarding PHB

In the future, new DSCPs can be assigned by a provider for its own local use or by new standards activity.

    Expedited Forwarding (EF PHB)

According to the IETF's DiffServ Working Group, the Expedited Forwarding (EF) PHB is designed to provide "low loss, low delay, low jitter, assured bandwidth, end-to-end service." In effect, the EF PHB simulates a virtual leased line to support highly reliable voice or video and to emulate dedicated circuit services. The recommended DSCP for the EF PHB is 101110.

Since the only aspect of delay that you can control in your network is the queuing delay, you can minimize both delay and jitter when you minimize queuing delays. Thus, the intent of the EF PHB is to arrange that suitably marked packets encounter extremely short or empty queues to ensure minimal delay and jitter. You can achieve this only if the service rate for EF packets on a given output port exceeds the usual rate of packet arrival at that port, independent of the load on other (non-EF) PHBs.

The EF PHB can be supported on DS-capable routers in several ways:

• By policing EF microflows to prescribed values at the edge of the DS domain (this is required to ensure that the service rate for EF packets exceeds their arrival rate in the core of the network),

• By ensuring adequate provisioning of bandwidth across the core of your network,

• By placing EF packets in the highest strict-priority queue and ensuring that the minimum output rate is at least equal to the maximum input rate (see the sketch after this list), or

• By rate-limiting the EF aggregate load in the core of your network to prevent inadequate bandwidth for other service classes.

Generally, you will not use RED as a queue memory-management mechanism when supporting the EF PHB, because the majority of the traffic is UDP-based, and UDP does not respond to packet drops by reducing its transmission rate.


    Assured Forwarding (AF PHB)

The Assured Forwarding (AF) PHB is a group of PHBs designed to ensure that packets are forwarded with a high probability of delivery, as long as the aggregate traffic in a forwarding class does not exceed the subscribed information rate. If ingress traffic exceeds its subscribed information rate, then out-of-profile traffic is not delivered with as high a probability as traffic that is in-profile.

The AF PHB group includes four traffic classes. Packets within each AF class can be marked with one of three possible drop-precedence values. The AF PHB group can be used to implement Olympic-style service that consists of three service classes: gold, silver, and bronze. If you wish, you can further differentiate packets within each class by giving them either low, medium, or high drop precedence within the service class. Table 2 summarizes the recommended DSCPs for the four AF PHB groups.

Table 2: Recommended AF DiffServ Codepoint (DSCP) Values

                              AF Class 1   AF Class 2   AF Class 3   AF Class 4
    Low drop precedence       001010       010010       011010       100010
    Medium drop precedence    001100       010100       011100       100100
    High drop precedence      001110       010110       011110       100110
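The recommended AF codepoints in Table 2 follow a regular pattern: the three high-order bits encode the class, the next two bits encode the drop precedence, and the final bit is zero. A short sketch that reproduces the table:

    def af_dscp(af_class, drop_precedence):
        """Build an AF DSCP: class in 1..4, drop precedence 1 (low) to 3 (high)."""
        return (af_class << 3) | (drop_precedence << 1)

    for dp, label in ((1, "low"), (2, "medium"), (3, "high")):
        row = "  ".join(f"{af_dscp(c, dp):06b}" for c in range(1, 5))
        print(f"{label:>6} drop precedence: {row}")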

The AF PHB groups have not been assigned specific service definitions by the DiffServ Working Group. The groups can be viewed as the mechanism that allows a provider to offer differentiated levels of forwarding assurances for IP packets. It is the responsibility of each DS domain to set the quantitative and qualitative differences between AF classes.

In a DS-capable router, the level of forwarding assurance for any given packet depends on:

• The amount of bandwidth and buffer space allocated to the packet's AF class,

• The amount of congestion for the AF class within the router, and

• The drop precedence of the packet.

The AF PHB group can be supported on DS-capable routers by:

• Policing AF microflows to prescribed values at the edge of the DS domain,

• Ensuring adequate provisioning of bandwidth across the core of your network,

• Placing each AF service class into a separate queue,

• Selecting the appropriate queue scheduling discipline to allocate buffer space and bandwidth to each AF service class, and

• Configuring RED to honor the three low-order bits in the DSCP to determine how aggressively a packet is dropped during periods of congestion.

Default PHB

RFC 1812 specifies the default PHB as the conventional best-effort forwarding behavior. When no other agreements are in place, all packets are assumed to belong to this traffic aggregate. A packet assigned to this aggregate may be sent into a network without following any specific


rules, and the network will deliver as many of these packets as possible, as soon as possible, subject to other resource-policy constraints. The recommended DSCP for the default PHB is 000000.

    General Observations about Differentiated Services

In this section, we discuss general observations about the nature of the DiffServ architecture to help you understand what you can or should expect if you decide to deploy it. It is important to maintain a healthy skepticism about DiffServ, because it does not provide a magic solution that can solve all of the congestion-related problems in your network.

    DiffServ Does Not Create Free Bandwidth

Routers are statistical multiplexing devices; therefore, they can experience congestion when the amount of traffic that needs to traverse a port exceeds the output port's capacity. This me