Deriving Traffic Demands for Operational IP Networks: Methodology and Experience


Anja Feldmann, Albert Greenberg, Carsten Lund, Nick Reingold, Jennifer Rexford, and Fred True

Abstract: Engineering a large IP backbone network without an accurate, network-wide view of the traffic demands is challenging. Shifts in user behavior, changes in routing policies, and failures of network elements can result in significant (and sudden) fluctuations in load. In this paper, we present a model of traffic demands to support traffic engineering and performance debugging of large Internet Service Provider networks. By defining a traffic demand as a volume of load originating from an ingress link and destined to a set of egress links, we can capture and predict how routing affects the traffic traveling between domains. To infer the traffic demands, we propose a measurement methodology that combines flow-level measurements collected at all ingress links with reachability information about all egress links. We discuss how to cope with situations where practical considerations limit the amount and quality of the necessary data. Specifically, we show how to infer interdomain traffic demands using measurements collected at a smaller number of edge links: the peering links connecting to neighboring providers. We report on our experiences in deriving the traffic demands in the AT&T IP Backbone, by collecting, validating, and joining very large and diverse sets of usage, configuration, and routing data over extended periods of time. The paper concludes with a preliminary analysis of the observed dynamics of the traffic demands and a discussion of the practical implications for traffic engineering.

1 Introduction

The engineering of large IP backbone networks faces a number of difficult challenges. Owing to the astonishing success of Internet applications and the continuing rollout of faster access technologies, demand for bandwidth across backbones is growing explosively. In addition, shifts in user behavior, publishing of new Web content, and deployment of new applications result in significant fluctuations in the volume of traffic exchanged between various hosts in the Internet. Furthermore, changes in routing policies and failures of network elements can cause sudden fluctuations in how traffic flows through the backbone. This leaves network operators in the difficult situation of trying to tune the configuration of the network to adapt to changes in the traffic demands. The task is particularly daunting since the Internet Service Provider (ISP) responsible for administering the backbone does not have end-to-end control of the path from the source to the destination. The majority of traffic in an Internet Service Provider network travels across multiple administrative domains.

A. Feldmann is with the Computer Science Department at Universität des Saarlandes in Saarbrücken, Germany. A. Greenberg, C. Lund, N. Reingold, J. Rexford, and F. True are with the Internet and Networking Systems Center at AT&T Labs - Research in Florham Park, NJ, USA.

The networking community has responded with research and development on increasing link and router capacity and providing a more easily configurable infrastructure. However, relatively little attention has been given to the systems needed to guide the operation and management of the improved infrastructure. In particular, there has been very little work on models for traffic demands or on techniques for populating these models from network measurements. Most existing measurement techniques provide views of the effects of the traffic demands: poor end-to-end performance (e.g., high delay and low throughput) and heavy load (e.g., high link utilization and long queues). These effects are captured by active measurements of delay, loss, or throughput on a path through the network [1], or passive monitoring of individual routers and links [2, 3].

However, managing an ISP backbone begs for a network-wide understanding of the flow of traffic. An accurate view of the traffic demands is crucial for a number of important tasks, such as debugging performance problems, optimizing the configuration of the routing protocols, and planning the rollout of new capacity. In particular, the recently-formed IETF working group on Internet Traffic Engineering recognizes that (i) accurate demand models are crucial for effective traffic engineering of IP networks, but (ii) developing such models and populating them via appropriate measurements are open problems [4, 5]. These are precisely the topics we address in this paper. As far as we know, no comparable study of the network-wide traffic demands in an ISP backbone has been conducted to date.

How should traffic demands be modeled and inferred from network measurements? At one extreme, IP traffic could be represented at the level of individual source-destination pairs, possibly aggregating sources and destinations to the network address or autonomous system level. Such an end-to-end traffic matrix would lend insight into the fluctuations in load over the Internet across time. However, representing all hosts or network addresses would result in an extremely large traffic matrix. In addition, no single ISP is likely to see all of the traffic to and from each network address, making it virtually impossible to populate such a model.

Alternatively, IP traffic could be aggregated to point-to-point demands between edge links or routers in the ISP backbone, an option suggested in [6] in the context of MPLS-enabled networks. However, this approach has fundamental difficulty in dealing with traffic that traverses multiple domains. A given destination (network address) is typically reachable from multiple edge routers, as shown in Figure 1.

Figure 1: Point-to-multipoint IP traffic demand (traffic from a source enters the ISP backbone at an ingress link; the destination is reachable via multiple egress links)
As a result, IP traffic demands are naturally modeled as point-to-multipoint volumes. This is a simple but crucial difference between IP networks and connection-oriented networks (such as Frame Relay), where demands are naturally modeled as point-to-point volumes. The set of egress links depends on the ISP's routing policies and the BGP advertisements received from neighboring domains. The selection of a unique link from this set depends on intradomain routing information. In the example, suppose the traffic exits the network via the top egress link. A link failure or a change in the configuration of intradomain routing could cause the traffic to move to the bottom egress link. A change in the ISP's interdomain policies or the withdrawal of a route advertisement from a neighboring domain could also alter the flow of traffic. Modeling interdomain traffic as point-to-point would couple the demand model to the internal routing configuration, making it difficult to predict how such changes would affect network load; the routing change itself could have a major impact on the point-to-point demands.

In this paper, first we propose a simple traffic demand model, which effectively handles interdomain traffic. As discussed in Section 2, the model is invariant to changes in the internal routing configuration, and as such provides a sound basis for traffic engineering. Our demand model allows us to predict how changing the internal routing configuration impacts the distribution of load on internal links. Second, we provide a methodology for populating the model from usage measurements collected at ingress links and reachability information collected at egress links. Third, we consider how to apply the model when practical considerations severely limit the location of usage measurements to a much smaller number of edge links. Specifically, in Section 3, we propose a methodology for populating the interdomain demand model when usage measurements are limited to the links to neighboring service providers, coping (in particular) with not having usage measurements at customer access points.

Next, we describe our practical experiences applying the methods of Sections 2 and 3 in a large, operational ISP network: the AT&T IP Backbone. This is where we must confront practical limitations in the usage, configuration, and routing data available in today's IP networks. In Section 4, we describe the challenges of processing router configuration files, forwarding tables, flow-level measurements, and SNMP data collected from multiple locations in the network over an extended period of time. In particular, we highlight how we addressed several practical constraints that arose in processing the large (and lossy) flow-level measurements.

In Section 5, we present results showing the effectiveness of the techniques in Sections 2 and 3. We show that the data sets collected at multiple times and locations are remarkably coherent, and present a detailed explanation of the occasional inconsistencies that arise from network dynamics.

Our analysis of the measured demands focuses on the time scale of tens of minutes to hours or days. Traffic engineering tasks occur on this time scale [7], where fundamental shifts in user behavior and changes in network routing introduce traffic variability beyond statistical fluctuations. On a smaller time scale, Internet traffic fluctuates in reaction to bursty user behavior and congestion control mechanisms. In populating our demand model, we focus on large aggregates of traffic, rather than the dynamics of individual flows. The distribution of traffic through the network is sensitive to the dynamics of interdomain routing, which may change the set of egress points for a particular destination. Our demand model can be applied to investigate the impact of such changes in reachability information, due to network failures, reconfigurations, or policy changes.

2 Traffic Demands

This section presents a brief overview of ISP backbone architectures and routing protocols. We also propose a model for IP traffic demands, and discuss its application to several important traffic-engineering tasks. Then, we describe how to compute these demands from flow-level measurements at ingress links and reachability information about egress links.

2.1 ISP Backbone Networks

An ISP backbone network consists of a collection of IP routers and bidirectional layer-three links, as shown in Figure 2. Backbone links connect routers inside the ISP backbone, and edge links connect to downstream customers or neighboring providers. Edge links are divided into access links and peering links. For example, an access link could connect to a modem bank for dial-up users, a web-hosting complex, or a particular business or university campus. Multi-homed customers have two or more access links for higher capacity, load balancing, or fault tolerance. Peering links connect to neighboring service providers. A peering link could connect to a public Internet exchange point, or directly to a private peer or transit provider. An ISP often has multiple peering links to each neighboring provider, typically in different geographic locations. Depending on the contractual relationships, the ISP may or may not allow a pair of peers to communicate across the backbone.

Each link is bidirectional. When carrying traffic into the ISP backbone, an edge link is referred to as an ingress link; when carrying traffic away from the ISP backbone, an edge link is referred to as an egress link.

Figure 2: Traffic flows in an ISP backbone (access links, peering links, and backbone links carrying internal, inbound, outbound, and transit flows)
Figure 2 illustrates the four possible scenarios: internal traffic that travels from an ingress access link to an egress access link, transit traffic that travels from an ingress peering link to an egress peering link, inbound traffic that travels from an ingress peering link to an egress access link, and outbound traffic that travels from an ingress access link to an egress peering link. Much of the traffic in the Internet must travel through multiple domains en route from the source to the destination. Hence, most of the traffic in an ISP backbone enters or leaves the network on a peering link. As such, an ISP rarely has complete control of the entire path from the source to the destination. Even for internal traffic, the customer exercises control over how the traffic enters the ISP backbone, and how the traffic travels from the egress link through the customer's network to the destination host.

The path traveled by an IP packet depends on the interplay between interdomain and intradomain routing. The ISP employs an intradomain routing protocol, such as OSPF or IS-IS, to select paths through the backbone between ingress and egress links. Under OSPF and IS-IS, the routers exchange link-state information and forward packets along shortest paths, based on the sum of link weights chosen by the ISP. Typically, customers and peers do not participate directly in these protocols with the ISP. Communicating across domains requires coordination with customers and peers to exchange reachability information. Interdomain routing operates at the level of a network address, or prefix, consisting of an IP address and a mask length (e.g., 135.207.119.0/24 has a 24-bit mask that specifies a block of 256 contiguous addresses). An ISP typically uses static routes to direct traffic toward customers who have a fixed set of network addresses and do not participate in an interdomain routing protocol. The Border Gateway Protocol (BGP) is used to exchange dynamic reachability information with the remaining customers and neighboring providers.

2.2 Demand Model

The interplay between intradomain and interdomain routing has important implications for how we define a traffic demand. The ISP network lies in the middle of the Internet, and may not have a direct connection to the sender or the receiver of any particular flow of packets.

As such, a particular destination prefix may be reachable via multiple egress links from the ISP. A multi-homed customer may receive traffic on two or more links that connect to different points in the ISP backbone. Likewise, an ISP may have multiple links connecting to a neighboring provider. When a router learns multiple routes to the same destination prefix, the ultimate decision of which route to use depends on the BGP route-selection process. The decision process involves multiple steps to select from the set of advertised routes. First, import policies may prefer one route over another. For example, the router may prefer routes via one neighboring autonomous system over another. Then, the decision process considers the length of the path, in terms of the number of autonomous systems involved, followed by several other criteria [8, 9].

Later in the tie-breaking process, the selection of a route (and the corresponding egress link) depends on information from the intradomain routing protocol. For example, suppose the BGP selection process results in two routes leaving the ISP backbone in New York and San Francisco, respectively. The egress link for a particular packet would depend on where this traffic entered the network. The packet would travel to the "closest" egress link, where closeness depends on the intradomain routing parameters. For example, traffic entering the ISP backbone in Los Angeles would travel to the San Francisco egress link, whereas traffic entering the ISP backbone in Washington, D.C., would travel to the New York egress link. Finally, if both egress links have the same shortest-path cost, the tie is broken by comparing the identifiers of the two routers responsible for advertising these routes. The dependence on intradomain routing implies that a change in the backbone topology or routing configuration could change which egress link is selected. Similarly, if traffic enters the backbone in a different location, the egress link could change.

To be practical, our representation of traffic demands should enable experimentation with changes to the network topology and routing configuration. Hence, we associate each traffic demand with a set of egress links that could carry the traffic. The set represents the outcome of the early stages of the BGP route-selection process, before the consideration of the intradomain protocol. This is in contrast to models that use a multipoint set to capture uncertainty in the distribution of customer traffic across a set of different destinations [10]. In our model, the selection of a particular egress link within the set depends on the configuration of intradomain and interdomain routing. The ISP typically has very limited control over the selection of the ingress link of the traffic. The selection of the ingress link depends on the routing policies of other autonomous systems and directly-connected customers. For our initial work on computing and analyzing the traffic matrix, we do not attempt to model how the ingress link is selected. Our model of a traffic demand consists of an ingress link, a set of egress links, and a volume of load.
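As a concrete illustration of this definition, the minimal Python sketch below shows one way a demand and the later egress-link selection could be represented. The names (Demand, select_egress, igp_cost, router_id) and the data layout are our own illustrative assumptions, not the paper's software.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Demand:
        """A traffic demand: ingress link, set of candidate egress links, and volume."""
        ingress: str            # ingress link identifier
        egress_set: frozenset   # egress links surviving the early BGP steps
        volume: float           # bytes in some time bin

    def select_egress(demand, igp_cost, router_id):
        """Pick the egress link the traffic would actually use: the closest egress by
        intradomain (IGP) path cost from the ingress, breaking ties on the identifier
        of the router advertising the route."""
        return min(demand.egress_set,
                   key=lambda e: (igp_cost[(demand.ingress, e)], router_id[e]))

    # Example: two egress links with equal early-BGP preference.
    d = Demand("LA", frozenset({"SF", "NY"}), 1.5e9)
    igp_cost = {("LA", "SF"): 10, ("LA", "NY"): 60}
    router_id = {"SF": "12.0.0.1", "NY": "12.0.0.2"}
    print(select_egress(d, igp_cost, router_id))   # -> "SF"

Because the egress set is fixed by interdomain routing while select_egress depends only on intradomain costs, changing link weights changes the chosen egress link without changing the demand itself, which is the invariance the model relies on.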

2.3 Traffic-Engineering Applications

The volume of load represents the quantity of traffic distributed from the ingress link to the set of possible egress links, averaged over some time scale. This introduces issues of both spatial and temporal aggregation. At one extreme, it is possible to compute a separate demand for each source-destination pair that exchanges traffic across the backbone. At the other extreme, there could be a single demand for all traffic with the same ingress link and set of egress links. The appropriate choice depends on the application. For example, consider the task of optimizing the configuration of intradomain routing to balance network load [11]. This application could combine all traffic with the same ingress link and set of egress links into a single demand. Changes in intradomain routing could affect the selection of the egress link for each demand. The details of which packets or flows contribute to the demand are not important. Minor extensions to this approach could define a separate demand for each traffic class under differentiated services. This would enable the route optimization to consider the load imparted on each link by each traffic class.

As another application, suppose an ISP is considering a change in its BGP import policies for routes to a particular destination prefix belonging to another provider. A destination prefix that receives a large amount of traffic could result in heavy congestion on one or more peering links. Redirecting this load to a different set of egress links could alleviate the congestion. BGP route advertisements, or entries in the BGP tables, could be used to determine the egress links for a destination prefix. A change in BGP import policies, such as filtering a route advertisement or assigning different local-preference values, would change the set of egress links associated with this destination prefix. Similarly, network failures, policy changes in other domains, and even network congestion could result in fluctuations in the BGP reachability information [12, 13]. These intentional and unintentional changes would result in a new traffic demand. To experiment with different sets of egress links, the ISP would need to know which traffic is associated with this particular prefix. For this application, traffic destined to this prefix should not be aggregated with other traffic with the same ingress link and set of egress links.

An ISP may also need to predict the effects of adding or moving an access link. For example, the ISP could rehome an existing customer to a different edge router. In this situation, all outbound demands associated with this customer should originate from the new location, and all inbound demands would have a new set of egress links to reflect the rehomed access link. This would enable the ISP to predict how rehoming the customer would affect the load on the backbone. Similarly, an existing customer may request a new access link for higher bandwidth or fault tolerance. The new link could be added to the set of egress links for inbound demands. The ISP may also have information about how the customer would direct outbound traffic to its access links. This would enable the ISP to predict what portion of the existing outbound traffic from this customer is likely to enter the network on the new access link.

Finally, the ISP may need to estimate the effects of adding a new customer. In some situations, the ISP may have information that can aid in predicting the demands. For example, a customer that hosts Web content may have server logs. The traffic statistics could be aggregated to the client prefix level [14] to predict the outbound demands for the new access link.

Ultimately, the spatial aggregation of the traffic demands depends on the particular application, ranging from performance debugging and backbone traffic engineering to BGP policy changes and capacity planning. Likewise, the temporal aggregation depends on the application. Long-term capacity planning could consider the traffic on a relatively coarse time scale, whereas debugging short-term performance problems would require a more careful consideration of how load fluctuates across time. In our initial study of traffic demands, we focus on backbone traffic engineering [7]. As such, we aggregate traffic with the same ingress link and set of egress links into a single demand. In a large, operational ISP network, this results in a fairly large number of traffic demands. The volume of load associated with each demand is identified by flow-level measurements at the ingress links. The set of egress links is identified based on snapshots of the forwarding tables from the routers in the operational network. Then, we compute the current set of demands over a variety of time scales and study the traffic dynamics.

2.4 Measurement Methodology

To compute the traffic demands, fine-grain traffic measurements should be collected at all ingress links. This enables us to identify the traffic as it enters the ISP backbone. However, collecting packet-level traces at each ingress link would be prohibitively expensive. In addition, traffic engineering does not necessarily need to operate at the small time scale of individual packets. Instead, we propose that flow-level statistics should be collected at each ingress link. These measurements can be collected directly by the incident router [15, 16]. A flow is defined as a set of packets that match in the key IP and TCP/UDP header fields (such as the source and destination addresses, and port numbers) and arrive on the same ingress link. The router should generate a record summarizing the traffic statistics on a regular basis, either after the flow has become inactive or after an extended period of activity. The flow record should include sufficient information for computing the traffic demands: the input link and the dest IP address to identify the end-points of the demand, the start and finish times of the flow, and the total number of bytes in the flow. (Any additional information in the measurement records, such as TCP/UDP port numbers or type-of-service bits, could be used to compute separate traffic demands for each quality-of-service class or application.) Sampling of the measurements may be performed to reduce the total amount of data.

For each flow: (input, dest, start, finish, bytes)
  dest_prefix = longest_prefix_match(dest, dest_prefix_set);
  egress_set = reachability(dest_prefix);
  start_bin = floor(start / width) * width;
  finish_bin = floor(finish / width) * width;
  if (start_bin == finish_bin)
    volume[input, egress_set, start_bin] += bytes;
  else  /* Compute volume of traffic for each time bin */
    byte_rate = bytes / (finish - start);
    volume[input, egress_set, start_bin] += byte_rate * (start_bin + width - start);
    for (time_bin = start_bin + width; time_bin < finish_bin; time_bin += width)
      volume[input, egress_set, time_bin] += byte_rate * width;
    volume[input, egress_set, finish_bin] += byte_rate * (finish - finish_bin);
Output for each aggregate: (input, egress_set, time_bin, volume)

Figure 3: Computing traffic demands based on measurements at ingress links

Computing the traffic demands requires information about the destination prefixes associated with each egress link. The aggregation process draws on a list, dest_prefix_set, of network addresses, each consisting of an IP address and mask length. Each prefix, dest_prefix, can be associated with a set of egress links, reachability(dest_prefix). In an operational network, these prefixes could be determined from the forwarding tables of the routers that terminate egress links. In particular, each forwarding-table entry identifies the next-hop link(s) for a particular prefix. This enables identification of the prefixes associated with each egress link. (The router connected to the egress links has the most detailed view of the destination prefix. Suppose a router has egress links that connect to customers that were assigned contiguous blocks of IP addresses. This egress router's forwarding table would have an entry for each prefix to direct traffic to the appropriate access link. However, the other routers in the ISP backbone, and the rest of the Internet, do not need such detailed information. As such, the edge router may advertise an aggregated network address to the rest of the backbone. In this case, information available at the ingress router would not be sufficient to identify the customer prefix and the associated set of egress links.)

Each flow spans some time interval (from start to finish) and contributes some volume of traffic (bytes). Computing traffic demands across a collection of flows at different routers introduces a number of timing challenges. The flow records do not capture the timing of the individual packets within a flow. Since traffic engineering occurs on a time scale larger than most flow durations, we compute demands on a time scale of tens of minutes to multiple hours. Consequently, we are not concerned with small variations in timing on the scale of less than a few minutes. We divide time into consecutive width-second bins. Most flows start and finish in a single bin. When a flow spans multiple bins, we subdivide the traffic in proportion to the fraction of time spent in each time period. For example, suppose a 10-kbyte flow spent one second in the first bin and nine seconds in the second bin. Then, we associate 1 kbyte with the first bin and 9 kbytes with the second bin. The algorithm for computing the traffic demands is summarized in Figure 3.
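The pseudocode in Figure 3 translates almost directly into code. Below is a minimal, runnable Python sketch of the same binning logic; longest_prefix_match and the reachability table are stubbed out with a plain dictionary keyed by prefix strings, which is an assumption for illustration rather than the paper's implementation.

    import math
    from collections import defaultdict
    from ipaddress import ip_address, ip_network

    def longest_prefix_match(dest, prefixes):
        """Return the most specific prefix in 'prefixes' containing 'dest', or None."""
        addr = ip_address(dest)
        best = None
        for p in prefixes:
            net = ip_network(p)
            if addr in net and (best is None or net.prefixlen > ip_network(best).prefixlen):
                best = p
        return best

    def aggregate(flows, reachability, width):
        """flows: iterable of (input_link, dest_ip, start, finish, bytes).
        reachability: dict mapping prefix -> frozenset of egress links.
        Returns dict keyed by (input, egress_set, time_bin) holding byte volumes."""
        volume = defaultdict(float)
        for input_link, dest, start, finish, nbytes in flows:
            prefix = longest_prefix_match(dest, reachability)
            if prefix is None:
                continue                      # unmatched destination (a "miss")
            egress = reachability[prefix]
            start_bin = math.floor(start / width) * width
            finish_bin = math.floor(finish / width) * width
            if start_bin == finish_bin:
                volume[(input_link, egress, start_bin)] += nbytes
            else:                             # spread bytes across bins by time fraction
                rate = nbytes / (finish - start)
                volume[(input_link, egress, start_bin)] += rate * (start_bin + width - start)
                for b in range(int(start_bin + width), int(finish_bin), int(width)):
                    volume[(input_link, egress, b)] += rate * width
                volume[(input_link, egress, finish_bin)] += rate * (finish - finish_bin)
        return volume

    # The 10-kbyte example from the text: 1 second in one 10-second bin, 9 in the next.
    reach = {"135.207.0.0/16": frozenset({"NY", "SF"})}
    vols = aggregate([("in1", "135.207.119.5", 9.0, 19.0, 10000)], reach, width=10)
    # vols assigns 1000 bytes to the bin starting at 0 and 9000 bytes to the bin at 10.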

3 Measurement at Peering Links

Collecting fine-grain measurements at each ingress link would be the ideal way to determine the traffic demands. In this section, we extend our methodology to measurements collected at a much smaller number of edge links: the links connecting the ISP to neighboring providers. We describe how to infer where outbound traffic enters the backbone, based on customer address information and a model of how traffic from each of the customer's access links would be routed across the ISP backbone.

3.1 Adapted Measurement Methodology

Collecting fine-grained measurements at every ingress link would introduce substantial overhead in a large network. ISP backbones typically include a large number of access links that connect to customers at various speeds. The routers that terminate these links often vary in functionality and must perform computationally intensive access-control functions to filter traffic to/from customers. Collecting flow-level statistics at every access link is not always feasible in practice. In some cases, a router may not be capable of collecting fine-grain measurements. In other cases, enabling measurement would impart a heavy load on the router or preclude the use of other functionality on the router. In contrast, a small number of high-end routers are used to connect to neighboring providers. These routers typically have far fewer links, with substantial functionality (including measurement functions) implemented directly on the interface cards. Collecting fine-grain measurements on these links is much less difficult. In addition, throughout the Internet, the links between major service providers carry a large amount of traffic and are vulnerable to fluctuations in interdomain routing, making it very important to have detailed usage statistics from these locations.

By monitoring both the ingress and egress links at these locations, we capture a large fraction of the traffic in the ISP backbone. Measurement data is available at the ingress links for inbound and transit traffic, and at the egress links for outbound traffic. Reachability data is available at the egress links for all four types of traffic. Measuring only at the peering links introduces three main issues:

- Internal traffic: Monitoring the peering links does not capture the internal traffic sent from one access link to another. Some customer traffic may travel over particularly important access links to and from the ISP's e-mail, Web, and DNS services. Flow-level measurements should be enabled on these access links, effectively treating these connections as peering links.

- Ambiguous ingress point for outbound traffic: Computing the outbound demands that travel from access links to peering links becomes more difficult, since flow-level measurements are not available at the ingress points. Inferring how these flows entered the network is the main focus of the rest of this section.

- Duplicate measurement of transit traffic: Measuring both ingress and egress traffic at the peering links may result in duplicate measurements of transit traffic. These flows should not be counted twice.

Classifying a flow: The first step in computing the traffic demands is to classify a flow as inbound, transit, or outbound, as illustrated in Figure 2. The classification depends on the input and output links at the router that measured the flow, as summarized in Table 1. We initially focus our discussion on inbound and outbound flows, and discuss transit traffic in greater depth at the end of the subsection. For inbound flows, traveling from a peering link to a backbone link, we can directly apply our methodology from Section 2, since flow-level measurements are available from the ingress link. The process for aggregating the flow records is summarized in Figure 4, skipping the details from Figure 3 of dividing the bytes of the flow across multiple time bins.

Handling outbound flows: Outbound flows require more careful handling. The flow measurements provide two pieces of information that help us infer the ingress link responsible for outbound traffic: the source IP address and the input/output links that observed the flow at the egress router. The source address indicates which customer generated the traffic. We can match the source address with a customer prefix and, in turn, match this prefix with a set of possible access links that could have generated the traffic. (Note that we must assume that the source address correctly identifies the sender of the traffic. Although this is typically the case, a small fraction of the packets may have spoofed source addresses; that is, the sender may put a bogus source address in the IP packet header to evade detection while attacking the destination host.) The pseudocode in Figure 4 draws on a list, src_access_prefix_set, of the network addresses introducing traffic at access links. Each source prefix, src_prefix, can be associated with a set of ingress links based on the map sendability(). We also retain information about the input and output links that measured the flow. This information helps us infer which of these access link(s) could have originated the traffic, as discussed in Section 3.3.
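The classification in Table 1 and the first half of Figure 4 reduce to a small amount of code. The sketch below is our own illustration (the LinkType enum, function names, and the idea of passing in a longest-prefix-match helper are assumptions for the example), not the measurement software described in the paper.

    from enum import Enum

    class LinkType(Enum):
        PEER = "peer"
        BACKBONE = "backbone"

    def classify(input_type, output_type):
        """Classify a flow observed at a peering-link router by its input/output link
        types, following Table 1. Returns an action label."""
        if input_type is LinkType.PEER and output_type is LinkType.BACKBONE:
            return "inbound-or-transit"      # measured at its ingress: build a demand
        if input_type is LinkType.PEER and output_type is LinkType.PEER:
            return "single-hop-transit"      # also measured at its ingress: build a demand
        if input_type is LinkType.BACKBONE and output_type is LinkType.BACKBONE:
            return "backbone"                # omit: not an edge observation
        return "outbound-or-egress-transit"  # backbone -> peer: resolve ingress or discard

    def handle_egress_flow(source_ip, sendability, lpm):
        """For a flow leaving on a peering link, map its source address to candidate
        ingress (access) links using a longest-prefix-match helper 'lpm'; transit
        flows fail the match and are dropped (returned as None)."""
        src_prefix = lpm(source_ip, sendability)
        if src_prefix is None:
            return None                      # egress leg of a multi-hop transit flow
        return sendability[src_prefix]       # set of candidate ingress links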

Handling transit flows: Next, we discuss how our methodology applies to transit traffic that travels from one neighboring provider to another. Transit traffic falls into two categories, single-hop and multiple-hop, as shown in Figure 2. A single-hop transit flow enters and exits the ISP backbone at the same edge router, without traversing any backbone links; in this case, the flow is measured once, at this router. A multi-hop transit flow enters at one router, traverses one or more backbone links, and exits at another router; in this case, the flow is measured twice, at the ingress and egress points. The best place to capture a transit flow is at its ingress link, where we can apply the methodology of Section 2. To avoid double-counting the flow, we need to discard the flow records generated by multi-hop transit flows as they leave the network. This requires distinguishing outbound flows (introduced by an access link) from transit flows (introduced by a peering link). For a flow leaving the network, the algorithm in Figure 4 attempts to match the source IP address with a customer prefix. For transit flows, this matching process would fail, and the associated flow record would be excluded.

3.2 Identifying Candidate Ingress Links

To associate each outbound flow with a set of ingress links, we must determine which source IP addresses could introduce traffic on each access link. On the surface, this problem seems equivalent to determining the set of destination prefixes associated with each access link. However, Internet routing is not symmetric. Traffic to and from a customer does not necessarily leave or enter the backbone on the same link. Hence, the forwarding table of the router terminating the access link may not have sufficient information to identify the source prefixes. For example, suppose a customer with two IP prefixes has two access links to the ISP. For load-balancing purposes, the customer may wish to receive traffic for one prefix on the first access link, and the rest of the traffic on the second access link. (This may involve configuring static routes for these prefixes in the edge routers that terminate the access links. Alternatively, the customer may announce these routes to the ISP using a routing protocol such as RIP or BGP.) In this example, each prefix would be reachable via a single access link. Yet, the customer could conceivably send traffic from either prefix via either access link. Hence, the router forwarding tables are not sufficient for identifying the source addresses that might generate traffic on an access link.

Fortunately, an ISP typically knows the IP addresses of its directly-connected customers. In fact, the customer may be assigned IP prefixes from a larger address block belonging to the ISP.

Input      Output     Classification                  Action
Peer       Backbone   Inbound or multi-hop transit    Point-to-multipoint demand
Peer       Peer       Single-hop transit              Point-to-multipoint demand
Backbone   Backbone   Backbone traffic                Omit flow
Backbone   Peer       Outbound or multi-hop transit   Identify possible ingress link(s);
                                                      omit flow or compute demand

Table 1: Flow classification based on input and output links

For each flow: (input, output, source, dest, start, finish, bytes)
  dest_prefix = longest_prefix_match(dest, dest_prefix_set);
  egress_set = reachability(dest_prefix);
  if (input.type == peer)  /* Inbound or (ingress) transit flow */
    compute volume[input, egress_set, input, output, time_bin] for each bin;
  else                     /* Outbound or (egress) transit flow */
    src_prefix = longest_prefix_match(source, src_access_prefix_set);
    if (src_prefix is not null)
      ingress_set = sendability(src_prefix);
      compute volume[ingress_set, egress_set, input, output, time_bin] for each bin
Output for each aggregate: (ingress_set, egress_set, input, output, time_bin, volume)

Figure 4: Computing traffic demands based on measurements at peering links

In other situations, the customer already has its own block of IP addresses. Information about the assignment of address blocks may be available in a variety of places, including the ISP's database of its customers or in a configuration database. As part of provisioning a new customer, the ISP configures the router that terminates the associated access link. Packet filters are specified to detect and remove traffic with bogus source IP addresses [17]. These packet filters indicate which sources might send traffic via a particular access link. The packet filters for each interface are specified in the router's configuration file. By parsing the router configuration files, we can determine which source prefixes to associate with each access link. From this information, we can determine the set of access links associated with each source prefix.

Using packet filters to identify source IP addresses is most appropriate for access links to directly-connected customers that do not connect to other service providers, or have downstream customers of their own. For customers that do connect to other service providers, or have downstream customers of their own, it is difficult to specify static packet filters for each source prefix on each possible ingress link. For example, when a neighboring domain acquires a new customer, traffic from these new source addresses could enter the ISP's backbone. Although the downstream provider typically performs packet filtering, these filters may not be known to the upstream ISP. This is a fundamental issue that arises in the Internet due to the use of dynamic routing protocols based on destination reachability information. In these situations, our measurement methodology would argue for performing flow-level measurements at the ingress links, rather than depending on knowing the set of links where these sources could enter the ISP backbone.

3.3 Matching Flows with Routes

For inbound and transit flows, the algorithm in Figure 4 results in a point-to-multipoint demand. However, each outbound flow is associated with a set of ingress links, resulting in a multipoint-to-multipoint aggregate. Computing point-to-multipoint demands for outbound traffic requires an additional step to determine which access link initiated the traffic. Knowledge of intradomain routing can help resolve the ambiguity. For example, consider a source IP address that is associated with access links in Los Angeles and Washington, D.C., as shown by the two "?" symbols in Figure 5. Suppose the customer sends traffic to a destination with egress links in San Francisco and New York, and the actual flow was observed leaving the backbone on a peering link in San Francisco. Suppose that traffic from the Washington, D.C., access link to that destination would have been routed to the New York peering link. Then, the flow observed in San Francisco could not have originated from the D.C. access link. Instead, the flow entered the ISP backbone in Los Angeles.

Determining whether an outbound flow could have entered the network at a given ingress link requires knowledge of the backbone topology and intradomain routing configuration at the time the flow was measured. For a given ingress link and set of egress links, we determine on which egress link the flow would exit the network. If this was not the egress link where the flow was observed, then this ingress link can be eliminated from consideration. In fact, knowing the path(s) from the ingress link to the egress link provides additional information. The flow was observed as it traveled from an input (backbone) link to an output (peering) link at the egress router. The path of the flow from the ingress link must include both of the links that observed the flow.

For each aggregate: (ingress_set, egress_set, input, output, time_bin, volume)
  For each a in ingress_set
    route = Route(a, egress_set);
    if (route does not use input and output links)
      remove a from ingress_set;
  if (ingress_set is not empty)
    for each a in ingress_set
      dvolume[a, egress_set, time_bin] += volume / size_of(ingress_set);
  else
    count as a miss;
Output for each demand: (a, egress_set, time_bin, dvolume)

Figure 6: Disambiguating the set of ingress links based on routing information
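For concreteness, here is a small Python rendering of the disambiguation loop in Figure 6. The route() helper stands in for the routing model described in Section 3.3 (intradomain path computation and tie-breaking); its interface here is an assumption made for the example.

    from collections import defaultdict

    def disambiguate(aggregates, route):
        """aggregates: iterable of (ingress_set, egress_set, input_link, output_link,
        time_bin, volume). route(a, egress_set) returns the set of links that traffic
        from ingress link 'a' toward 'egress_set' would traverse, per a routing model.
        Returns (demands, missed_volume); demands is keyed by (ingress, egress_set, bin)."""
        demands = defaultdict(float)
        missed_volume = 0.0
        for ingress_set, egress_set, in_link, out_link, time_bin, volume in aggregates:
            # Keep only candidate ingress links whose modeled path uses both observed links.
            feasible = [a for a in ingress_set
                        if {in_link, out_link} <= route(a, egress_set)]
            if feasible:
                share = volume / len(feasible)   # split evenly across remaining candidates
                for a in feasible:
                    demands[(a, egress_set, time_bin)] += share
            else:
                missed_volume += volume          # no candidate ingress link fits the route
        return demands, missed_volume

A toy route() for testing could simply return a precomputed set of link identifiers per (ingress, egress set) pair; in the paper's setting this role is played by the OSPF routing model validated against the router forwarding tables.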

Figure 5: Example of disambiguating the ingress link (candidate access links, marked "?", in LA and Washington, D.C.; egress peering links in SF and NY; the outbound flow is measured in SF)
Otherwise, this ingress link should be excluded from consideration. This process must be repeated for each of the possible ingress links, as shown in Figure 6. The disambiguation process has three possible outcomes:

- One ingress link: A single ingress link could have generated the traffic. This is the ideal situation, resulting in a single point-to-multipoint demand.

- Multiple ingress links: More than one of the candidate ingress links could have generated the traffic. This would happen if multiple ingress links would send the traffic to the same egress router, and would enter this router on the same input link. Traffic from these access links may follow a similar path through the backbone, imparting load on some of the same links and routers. When multiple access links could have generated the traffic, the disambiguation process generates multiple demands, each with an equal fraction of the traffic.

- Zero ingress links: If none of the candidate ingress links could have generated the traffic, the disambiguation process has failed and the flow record is discarded. These "misses" are discussed in more detail in Section 5.2.

In a similar manner, the routing model is useful for verifying the correctness of the inbound and transit demands.

The disambiguation process depends on knowing the possible paths from each ingress link to each egress link. We obtain this information from a routing model that captures the details of path selection in the ISP backbone. For each point-to-multipoint demand, the routing model determines the particular egress point as well as the path(s) through the network from the ingress link to the egress link. The set of egress links represents the outcome of the early steps of the BGP route-selection process. The routing model captures the last two steps: selection of the shortest-path egress link(s) based on the intradomain routing protocol, and tie-breaking based on the router identifier. The main complexity stems from the modeling of intradomain routing. Our routing model [7] captures the details of OSPF routing in networks with multiple areas, including the splitting of traffic across multiple shortest-path routes. Snapshots of the router forwarding tables from the operational network have been used to verify the correctness of our routing software.

4 Data Sets from AT&T Backbone

This section describes our experiences harvesting, parsing, and joining four large data sets, each collected from multiple locations in the AT&T IP Backbone. Monitoring the peering links produces, on average, one byte of measurement data for every 138 bytes of data traffic. We describe how we join these flow-level measurements with information from router configuration files, router forwarding tables, and SNMP data to compute the traffic demands. Then, we discuss how we addressed several practical constraints in processing the large set of flow-level measurements.

4.1 Data Sets

The computation of the traffic demands draws on several different data sets, as summarized in Table 2:

Router configuration files: Router configuration files reflect the configuration of a router as part of the IP network. The file specifies the configuration of the router hardware, the partitioning of resources (e.g., buffers and link capacity), the routing protocols (e.g., static routes, OSPF, and BGP), and the packet-forwarding policies.

A global view of the network topology and configuration can be constructed by joining information across the configuration files of the various routers in the ISP's backbone [18]. This enables us to identify all of the routers and links, and their connectivity. In addition, the joined configuration files enable us to determine the type of each link (access, peering, or backbone), as well as the packet filters associated with each access link. This information is important for aggregating the flow-level measurements. Finally, the configuration files indicate the link capacities, as well as the OSPF weight and area for each backbone link, which are necessary input for the routing model.

Router forwarding tables: Each router has a forwarding table that identifies the IP address(es) and name(s) of the next-hop interface(s) for each destination prefix (e.g., "135.207.0.0/16 12.126.223.194 Serial2/0/0:26"). We use the forwarding tables to associate each destination prefix with a set of egress links. The name of the next-hop interface is joined with the name of the corresponding egress link from the router configuration files. Joining this information produces the list, dest_prefix_set, of destination prefixes and the map, reachability(), of each destination prefix to a set of egress links. In addition, the forwarding tables are used to verify the correctness of the routing model. Having snapshots of the forwarding tables close together in time enables us to determine how each router forwarded traffic toward each destination prefix. In particular, the forwarding tables enable us to identify which subset of the backbone links would carry traffic destined to a particular prefix. These paths were checked against the routes computed by our routing model.

Netflow records: The flow-level measurements were collected by enabling Netflow [15] on each of the routers that terminate peering links. Each router exports the measurements as UDP packets in groups of one to thirty flow records. These UDP packets are sent to a collection server. Each flow record corresponds to a collection of packets that match in their key IP and TCP/UDP header fields and were routed in the same manner (i.e., same input and output link, and same forwarding-table entry). The record includes the packet header and routing information, as well as the time (i.e., start and finish time in seconds) and size (i.e., number of bytes and packets) of the flow. Our analysis focuses on the source and destination IP addresses, the input and output links, the start and finish time, and the number of bytes. The other fields in the Netflow records could be used to compute separate traffic demands for different subsets of the traffic.

SNMP interface indices: Processing a Netflow record requires associating the input and output link that observed the flow with the corresponding links in our model of the network topology. However, the Netflow record identifies each link in terms of an integer SNMP index, whereas the forwarding tables and router configuration files reference a link by its name and IP address. The SNMP index is an integer value that uniquely identifies each interface in the router. The index does not change unless the interface is moved or another interface is installed in the same router. However, this identifier is not available in the router configuration files or the router forwarding tables. Periodic polling of the router's SNMP variables allows us to determine the IP address and name associated with each SNMP index. SNMP data also includes statistics about the number of bytes carried on each link on a five-minute time scale. We used these statistics as an independent verification of the loads computed by aggregating the Netflow data.
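As a rough illustration of how these data sets are joined, the sketch below builds the reachability() map from forwarding-table entries and resolves SNMP interface indices to interface names. The input formats shown (one tuple per forwarding entry, a simple index-to-name poll result) are simplified assumptions for the example, not the exact formats of the operational data feeds.

    from collections import defaultdict

    def build_reachability(forwarding_entries, ifname_to_link):
        """forwarding_entries: iterable of (router, prefix, next_hop_ip, next_hop_ifname),
        e.g. ("nyc1", "135.207.0.0/16", "12.126.223.194", "Serial2/0/0:26").
        ifname_to_link: maps (router, interface name) -> link id from the config files,
        restricted to egress (access/peering) links.
        Returns dict mapping prefix -> frozenset of egress link ids."""
        reach = defaultdict(set)
        for router, prefix, _next_hop, ifname in forwarding_entries:
            link = ifname_to_link.get((router, ifname))
            if link is not None:               # next hop is an egress link on this router
                reach[prefix].add(link)
        return {p: frozenset(links) for p, links in reach.items()}

    def resolve_snmp_index(snmp_poll, router, index):
        """snmp_poll: dict (router, snmp_index) -> (ip_address, interface_name),
        gathered by periodic SNMP polling. Returns the interface name for a Netflow
        record's input/output index, or None if the index is unknown."""
        entry = snmp_poll.get((router, index))
        return entry[1] if entry else None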

4.2 Practical Constraints

The processing of the Netflow data introduced several practical challenges, which we briefly summarize:

Router clock synchronization: Each router synchronizes the clock of its route processor to a central server using the Network Time Protocol (NTP). However, the clocks on individual interface cards are not always synchronized, due to a historical bug in Cisco's Internet operating system. We addressed this problem by aligning the Netflow records collected on the interface cards with records from the route processor. All timestamps within the Netflow data are relative to a base clock. For each router, it suffices to adjust the base clock of the records originating from each link to match those originated by the route processor. In post-processing the Netflow data, we realign the base clock of each interface to match the most recent record from the route processor. The route processor receives a relatively small number of data packets (such as routing-protocol traffic and packets with expired TTL values), compared to the interface cards. Still, flow records are generated by the route processor quite frequently on a busy router; during a sampled 24-hour period, the interarrival time of flow records from the route processor has a mean of 0.32 seconds and a maximum of 91.4 seconds. Hence, the uncertainty introduced by correcting for timestamp problems is very small compared to the time scale of the subsequent aggregation.

Lost measurement data: Netflow records are transmitted to the collection server using UDP. As such, the measurement data is not delivered reliably. Limited bandwidth to our collection server resulted in loss of up to 90% of the UDP packets during heavy load periods. Nearly all of these packets were lost on the link connecting to our measurement server, dwarfing the losses experienced by the Netflow data in the rest of the backbone. The traffic on the link to the collection server consists mainly of the UDP Netflow data. The traffic dynamics typically introduced by TCP congestion control do not arise in this context. The dominance of UDP traffic, coupled with the limited bandwidth, results in a nearly random sampling of the measurement records. To test our hypothesis of random packet loss, we analyzed the loss patterns based on the sequence numbers of the Netflow records that arrived successfully. Detailed analysis of the loss characteristics showed that the distribution of the number of consecutive losses is consistent with assuming independent loss. Based on the assumption of random independent loss, we apply a correction factor to the received flow records to account for lost measurement data, taking into account the fact that the loss rate varies across time and (potentially) across routers. First, we determine the loss rate during each (ten-minute) time interval for each router (and each "engine" that exports Netflow data), based on sequence numbers in the stream of flow records. Then, we assume that flows that are observed are statistically similar to other flows that ended during the same time period. We apply a correction factor based on the loss probability during that time period, corresponding to the time that the flow record was exported to the collection server. This correction factor is applied to all bytes within the flow. To verify our approach, and to select the ten-minute interval for applying the loss correction, we compared our corrected Netflow data against independent link-load statistics from SNMP. For example, Figure 7 plots the utilization of both directions of a peering link on a one-hour timescale over the course of a day. The plots for loss-corrected Netflow match the SNMP statistics relatively closely.
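A minimal sketch of this loss correction is shown below, assuming each export source (router/engine pair) numbers its flow records sequentially; the helper names and the exact scaling rule (dividing by the estimated delivery probability) are our reading of the description above, not code from the paper.

    from collections import defaultdict

    def loss_rate_per_interval(received, interval=600):
        """received: iterable of (source, export_time, seq_no) for flow records that
        arrived, where 'source' identifies a router/engine pair. Estimates the loss
        rate per 'interval'-second bin as 1 - received/expected, with 'expected'
        approximated by the span of observed sequence numbers in that bin."""
        seen = defaultdict(list)
        for source, t, seq in received:
            bin_start = int(t // interval) * interval
            seen[(source, bin_start)].append(seq)
        rates = {}
        for key, seqs in seen.items():
            expected = max(seqs) - min(seqs) + 1
            rates[key] = 1.0 - len(seqs) / expected
        return rates

    def corrected_bytes(nbytes, source, export_time, rates, interval=600):
        """Scale a flow's byte count by the inverse of the estimated delivery
        probability for the interval in which its record was exported."""
        bin_start = int(export_time // interval) * interval
        loss = rates.get((source, bin_start), 0.0)
        return nbytes / (1.0 - loss) if loss < 1.0 else nbytes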

Dataset              Location    Key fields
Configuration files  Router      Per interface: IP address, name, type (peer/customer/core), and capacity
                                 Per core interface: OSPF configuration (weight and area)
                                 Per customer interface: network addresses for packet filters
Forwarding tables    Router      Per interface: set of network addresses (IP address and prefix length)
Netflow logs         Peer link   Per flow: input and output interfaces (SNMP index), src and dest IP
                                 addresses, start and finish times, and number of bytes and packets
SNMP data            Interface   Per interface: SNMP index, IP address, name, and utilization

Table 2: Datasets and key fields used in computing and validating the traffic demands

Netflow logs   Configuration files   Forwarding tables   SNMP indices
(all day)      (8pm GMT)             (4pm GMT)           (8pm GMT)
11/03/1999     11/03/1999            11/04/1999          11/01/1999
11/04/1999     11/04/1999            11/04/1999          11/01/1999
11/11/1999     11/11/1999            11/14/1999          11/08/1999
11/12/1999     11/12/1999            11/14/1999          11/08/1999

Table 3: Collection times for each data set

Data sets from multiple time periods: Computing the traffic demands required joining four independent data sets, each collected from multiple locations in the network at different times during the day. This introduces significant challenges in joining and analyzing the data. These data sets also provide a unique opportunity to quantify the effects of routing instability on an operational network. Table 3 summarizes the data sets used in the experiments in the remainder of the section. We focus on four days in November 1999.

Figure 7: Comparison of link utilization for SNMP and corrected Netflow (loss-corrected flow data vs. SNMP on 12/29/99; utilization (%) of both directions of a peering link over the day from 12/29 00:00 to 12/30 00:00; curves: In:Netflow, In:SNMP, Out:Netflow, Out:SNMP)
November 3 and 4 are a Wednesday and a Thursday, respectively. November 11 and 12 are a Thursday and a Friday, respectively. These flow measurements enable us to compare traffic on two consecutive days and in two consecutive weeks. Daily configuration files are used to generate the topology model. Each experiment uses the most recent forwarding tables and SNMP data available. The SNMP data is the least sensitive, since the SNMP index for each link does not change unless the network undergoes a structural change; these changes occur infrequently on the routers that terminate peering links. Independent verification confirmed that this did not occur during the periods of our data collection.

5 Experimental Results

In this section, we present the results from aggregating and validating the flow-level measurements collected at the peering links. Then, we discuss the application of the routing model to disambiguate and validate the demands. In both cases, we discuss the implications of the ambiguity of the ingress links for outbound flows, fluctuations in egress reachability information, and inconsistencies across the various data sets. Then, we present our initial results from analyzing the spatial and temporal properties of the traffic demands.

5.1 Netflow Aggregation

The first phase of computing the traffic demands applies the methodology in Figure 4 to the Netflow data. Typically, more than 98% of the bytes observed at the peering links can be mapped to a point-to-multipoint aggregate (inbound/transit flows) or a multipoint-to-multipoint aggregate (outbound flows), as shown in the "miss" column of Table 4. These mismatches stem from the three key steps in Figure 4: (i) identifying the input and output links that observed the flow, (ii) associating the destination IP address with a set of egress links, and (iii) associating the source IP address (of an outbound flow) with a set of ingress links.

Identifying the input and output links that observed the flow: A significant fraction of the misses can be explained by step (i), as shown in the "Out 0" and "Loop" columns in Table 4. Each Netflow record logs the SNMP indices for the input and output links that observed the flow. In our data sets, every Netflow record had valid input and output fields that matched with our SNMP data. Approximately 0.5-0.7% of the bytes in the network had an output link of 0, meaning that the data was delivered to the route processor. Further inspection of the raw Netflow data reveals that about 0.4% of these bytes stem from traffic actually destined to the router itself. (The destination address of these flows is the router's loopback IP address. This traffic comes from ICMP (Internet Control Message Protocol) messages, Telnet, and SNMP polling for routine operational tasks.) The remaining flows with an output link of 0 correspond to unroutable traffic. For example, a packet with an expired TTL (time-to-live) field, as generated by traceroute, would fall in this category. These unroutable packets are directed to the route processor for further handling. The second category of misses ("loop") arises when a flow enters and leaves the router on the same link. These transient forwarding loops account for an extremely small portion of the total bytes in the network (e.g., less than 0.03%).

Associating the destination IP address with a set of egress links: As expected, the matching process is most successful when measurements are collected at the ingress link, as seen in the "Inbound (Egress)" column. Still, a small number of mismatches arise in associating a flow's destination IP address with the egress links. That is, the destination IP address does not match any of the prefixes observed in the snapshots of the router forwarding tables. These mismatches stem from transient changes in reachability information. For example, the destination may have been temporarily unreachable when the forwarding tables were collected. Or, perhaps the destination's egress point moved from one router to another, with neither snapshot showing a route to that destination. These kinds of fluctuations in reachability information are unavoidable in dynamic routing protocols. Fortunately, they did not have a significant effect on our ability to match the flows.

To verify this hypothesis, we identified the top few destinations responsible for the majority of the missed traffic, and found that these destinations were represented in the forwarding tables collected on subsequent days. Identifying the egress links for outbound traffic has similar challenges, as seen in the statistics in the "Outbound (Egress)" column. (The statistics for egress matching for outbound traffic are slightly lower than the corresponding statistics for inbound traffic. This arises from the operation of our aggregation software, which does not try to identify a set of egress links for an outbound flow unless one or more possible ingress links could be identified.)

Associating the source IP address (of an outbound flow) with a set of ingress links: The most challenging part of aggregating the outbound flows arises in matching the source IP address with one or more access links, as shown in the "Outbound (Ingress)" column in Table 4. The aggregation process identifies at least one candidate ingress link for over 99.3% of the outbound bytes. However, matching the source IP address with a set of access links does not necessarily imply that one of these links actually generated the traffic. This check does not occur until the later stage of disambiguating the set of ingress links based on the routing model.

Overall, the results are consistent across the four experiments. However, the November 11 data has a higher proportion of mismatched bytes (4.7% vs. less than 2% for the other days). These extra misses arise in two categories: egress links for inbound traffic and ingress links for outbound traffic. Both errors relate to customer IP addresses. Upon further inspection, most of these misses stem from a single access link. The access link was upgraded some time after 8pm GMT, when the configuration files were collected. Hence, our copy of the configuration file of the router terminating the new link had the name and IP address of the old link. The forwarding table was collected several days later, on November 14. In this table, the next-hop entries pointing to the new link are used to direct traffic to a collection of customer prefixes. However, in our automated joining of the data sets, we did not associate these customer prefixes with the old link specified in the configuration file. Hence, these customer prefixes were unknown during the aggregation of the Netflow data. Manually associating the prefixes with the old link, and repeating the experiment, reduced the number of egress misses (for inbound traffic) from 1.019% of the bytes to 0.468% and the number of ingress misses (for outbound traffic) from 2.939% of the bytes to 0.595%, consistent with results from other days.

5.2 Route Disambiguation

The second phase of computing demands applies the methodology in Figure 6 to match the aggregated traffic with routes. The disambiguation process is primarily used to infer the ingress link associated with each outbound demand. However, we find that the procedure also provides a useful consistency check on our initial processing of the flow-level data, and aids in studying the dynamics of the other data sets involved in the computation.

Run        Miss    Out 0   Loop    Inbound (Egress)   Outbound (Ingress)   Outbound (Egress)
11/03/99   1.720   0.691   0.006   0.442              0.451                0.127
11/04/99   1.630   0.749   0.010   0.379              0.452                0.039
11/11/99   4.720   0.540   0.009   1.019              2.939                0.210
11/12/99   1.778   0.563   0.022   0.475              0.642                0.074

Table 4: Percent of bytes unmatched in aggregating the Netflow data

5.2 Route Disambiguation

The second phase of computing the demands applies the methodology in Figure 6 to match the aggregated traffic with routes. The disambiguation process is primarily used to infer the ingress link associated with each outbound demand. However, we find that the procedure also provides a useful consistency check on our initial processing of the flow-level data, and aids in studying the dynamics of the other data sets involved in the computation.

5.2.1 Inbound and Transit Flows

Our methodology is most effective for inbound and transit traffic, where measurements are available at the ingress links. In this case, the techniques in Section 3 produce a point-to-multipoint demand. Still, our experimental results from aggregating the Netflow data are not sufficient to show that we associated each traffic demand with the correct ingress link and set of egress links. The routing model provides an important consistency check by verifying that traffic from the ingress link to the set of egress links would actually traverse the links that measured the flow. The results of this check are shown in the "Inbound (miss)" column in Table 5, which shows that the routing test failed for less than 1% of all bytes entering the network at the peering links. This is very promising, though the check is not perfect: not all changes in the set of egress links would result in a change in how the observed traffic would exit the network. Still, an error rate of less than 1% suggests that our methodology is effective for handling traffic measured on ingress links.
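The consistency check can be pictured as follows. A routing model yields the backbone links that a demand from a given ingress link toward a given egress link would traverse; the test asks whether the link that actually measured the flow appears on some such path. The sketch below is a hedged illustration of that idea under assumed names: the routed_links function and its hard-coded paths are purely hypothetical stand-ins for a model of the intradomain routing protocol, not the implementation described above.

    # Hypothetical routing model: for an (ingress link, egress link) pair, return the
    # set of links on the selected path through the backbone.
    def routed_links(ingress, egress):
        example_paths = {
            ("peer1:pos6/0", "router7:pos3/0"): {"peer1:pos6/0", "bb4-bb7", "router7:pos3/0"},
            ("peer1:pos6/0", "router9:pos1/2"): {"peer1:pos6/0", "bb4-bb9", "router9:pos1/2"},
        }
        return example_paths.get((ingress, egress), set())

    def route_test(ingress, egress_links, measured_link):
        # The check fails (an "Inbound (miss)") only if no path toward the egress set
        # crosses the link that observed the flow.
        return any(measured_link in routed_links(ingress, egress) for egress in egress_links)

    print(route_test("peer1:pos6/0", {"router7:pos3/0", "router9:pos1/2"}, "bb4-bb7"))  # True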

5.2.2 Outbound Flows

We expect our approach to be less effective for outbound traffic, due to unavoidable ambiguity about the ingress links. In addition, the peering links are vulnerable to fluctuations in reachability information due to the dynamics of interdomain routing between neighboring ISPs.

Inconsistent forwarding tables at peering links: In a small number of cases, the forwarding tables at the peering links are inconsistent with the observed flows. That is, the forwarding table suggests that the router that observed the flow would have forwarded the traffic to a different link! These inconsistencies are flagged in the "baddest" column, and account for 1.0-3.5% of the bytes leaving the network on peering links. We observed the fewest errors on the November 4 data set, where we had forwarding tables and Netflow data from the same day. To verify our hypothesis that these inconsistencies stem from fluctuations in reachability information, we inspected a single day of flow data in greater detail. The source-destination pairs in the affected demands typically moved in and out of the "baddest" category across the day, suggesting that the forwarding table entries in the operational router were changing across time.

Disambiguation to a single ingress point: Computing demands for outbound flows is also complicated by uncertainty about which ingress link generated the traffic. Approximately 35-45% of the bytes leaving the network at the peering links were associated with multiple candidate ingress links. Some of this ambiguity could be resolved by the routing model. In fact, between one-fourth and one-third of these bytes could be resolved to a single ingress link after applying the disambiguation process outlined in Figure 6, as seen in the "perfect" column in Table 5. With further inspection, we see that some of these demands came from customers with access links on the east and west coasts. Traffic from these access links is likely to exit the network on different egress links. However, complications arise when a customer has more than one link in the same region of the country. For example, a single customer may have two access links terminating on different routers in the same city. This offers protection from the failure of a single router without forcing customer traffic to enter the network in a different city.

Disambiguation to multiple ingress points in the same city: The routing model typically cannot disambiguate traffic from two access links from the same customer in the same city. Traffic from these two access links would typically exit the network on the same peering links for most (if not all) destination prefixes, and often follows a similar path through the ISP backbone. This occurs when the intradomain path costs to and from these two access links are very similar, if not the same. In this case, successfully disambiguating the two access links is not very important! Associating the traffic with the wrong access link does not have much influence on the flow of traffic in the backbone. In fact, assuming that the traffic was split evenly across the two links, as shown in Figure 6, is quite reasonable. Customers often configure their routers to perform load balancing across their multiple access links, resulting in a nearly even division of the traffic on the two links. Overall, disambiguation to a single city accounts for 13-19% of the outbound bytes, as seen in the "one city" column in Table 5. In total, about two-thirds of the ambiguous ingress sets were resolved to one or more access links in a single city ("perfect" or "one city").
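When the candidate ingress links cannot be told apart, the natural rule from Figure 6 is an even split. The short sketch below, with hypothetical link names, shows that rule in isolation; it is a simplified illustration rather than the actual disambiguation code.

    def split_outbound_demand(candidate_ingresses, egress_links, nbytes):
        # Divide an outbound volume evenly across ingress links that the routing
        # model cannot distinguish (e.g., two access links in the same city).
        share = nbytes / len(candidate_ingresses)
        return {(ingress, frozenset(egress_links)): share
                for ingress in candidate_ingresses}

    # Two access links of the same customer, terminating on different routers in one city.
    print(split_outbound_demand(["cust3:serial1/0", "cust3:serial2/0"],
                                ["peer2:pos1/0"], 10_000_000))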

Disambiguation to multiple ingress points in different cities: Some customers are multi-homed to routers in different cities, and may even generate traffic from a single block of IP addresses on both links. Such multi-homing is useful for additional fault-tolerance, or because the customer has sites in multiple geographic locations. When the homing locations are relatively close to each other, the routing model may not be able to disambiguate the set of ingress links. This is a situation where additional measurement at the ingress links would be useful. Still, overall, the disambiguation process is quite successful. Only 2.5-4% of the bytes could not be associated with (one or more) point-to-multipoint demands. These results are shown in the "Outbound (miss)" column in Table 5, which includes the contribution of the "Outbound (baddest)" statistics. Based on these results, the rest of this section focuses on analyzing the statistical properties of the observed demands.

5.3 Traffic Analysis

In this section, we present initial results of a statistical analysis of the traffic demands. We focus on: (i) statistical characteristics of inbound and outbound traffic, at different levels of aggregation (point-to-multipoint demands, or corresponding point-to-point loads on edge routers), (ii) time-of-day variations in traffic demands, and (iii) variations at coinciding time intervals within the two weeks.

Statistical characteristics of inbound and outbound traffic: A network with many access and peering links has a large number of point-to-multipoint demands. However, a very small proportion of these demands contributes the majority of the traffic. In Figure 8, we rank point-to-multipoint demands (or point-to-point loads) from largest to smallest, and plot the percentage of the total traffic attributable to each. These plots are restricted to the upper tail of the distribution, accounting for 80% of the total traffic. We refer to the particular demands (or loads) in this upper tail as the heavy hitters. We found the plots to be nearly linear on the log-log scale, as is characteristic of a Zipf-like distribution, where the contribution of the k-th most popular item varies as 1/k^a, for some a. We found this general pattern to hold for all data sets and at all levels of temporal and spatial aggregation. Figure 8(b) shows a greater concentration of traffic over fewer heavy hitters in outbound versus inbound traffic. Similar trends have been seen in earlier studies that consider the load on individual links or servers. For example, link-level traces show that the distribution of traffic at the prefix and AS level follows Zipf's law [19]. Studies of the World Wide Web have shown that a small fraction of the requests, resources, and servers are responsible for the bulk of the traffic [20, 21].

The small number of heavy hitters has important implications for traffic engineering. On the positive side, since the leading heavy hitters account for so much traffic, care in routing just these demands should provide most of the benefit. In addition, when measuring traffic demands, relatively coarse statistical subsampling should suffice. On the negative side, if the internal routing configuration concentrates heavy-hitter traffic on common links, through error or inherent fluctuations in the identities of the heavy hitters, the negative impact on performance could be severe. In general, the concentration of demand on a few sources opens up the possibility of large-scale network variability if these sources change behavior.
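A simple way to quantify this concentration from a set of per-demand byte counts is sketched below: rank the demands, find how many of them cover 80% of the bytes, and estimate a rough Zipf exponent from the slope of log(volume) versus log(rank) over that upper tail. The demand volumes in the example are synthetic and the function name is introduced here only for illustration; this is not the analysis code used for Figure 8.

    import math

    def heavy_hitter_profile(volumes, coverage=0.8):
        # Rank demand volumes; report (i) how many "heavy hitters" account for the
        # given fraction of total bytes and (ii) a rough Zipf exponent, taken as the
        # negated slope of log(volume) vs. log(rank) over that upper tail.
        ranked = sorted(volumes, reverse=True)
        total = sum(ranked)
        running, hitters = 0.0, 0
        for v in ranked:
            running += v
            hitters += 1
            if running >= coverage * total:
                break
        if hitters < 2:
            return hitters, float("nan")
        xs = [math.log(k + 1) for k in range(hitters)]
        ys = [math.log(ranked[k]) for k in range(hitters)]
        mx, my = sum(xs) / hitters, sum(ys) / hitters
        slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
                 / sum((x - mx) ** 2 for x in xs))
        return hitters, -slope

    synthetic = [1e9 / k ** 1.2 for k in range(1, 2001)]   # Zipf-like toy volumes
    print(heavy_hitter_profile(synthetic))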

Time-of-day variations in traffic demands: In Figure 9(a) we plot the percentage of bytes attributable to the top 500 point-to-multipoint outbound demands over a half hour, ranked in decreasing order. The graph includes 48 curves, to cover the day. (The identities of the top 500 may change from half hour to half hour.) There is significant variation in demand sizes at the highest ranks. We have looked at the top demands more closely, and found that they may exhibit quite different time-of-day patterns. This is demonstrated in Figure 9(b), where we have plotted the time-of-day variation for three heavy demands entering the network in San Francisco. We informally label these three demands as domestic consumer, domestic business, and international, because they correspond to the usage patterns of consumer and business domestic dial traffic, with international traffic roughly similar to a time-shifted business pattern.

Traffic variations at coinciding time intervals within the two weeks: To investigate change among the heavy hitters more systematically, we consider grouping the demands into quantiles (e.g., the first group corresponds to the highest-ranked demands together accounting for 5% of the traffic, the second group to the remaining highest-ranked demands accounting for the next 5% of the traffic, and so forth). How do the groupings change with time of day? Figure 10 provides a two-dimensional histogram, where the gray-scale of the (i, j)-th block indicates the proportion of the demands in quantile i in one time period that move to quantile j in another time period, h hours later. In Figure 10(a), the lag h is a half hour, and in Figure 10(b) it is 24 hours. The top demands (top right corner) show the least variation. In both cases, the concentration of mass along the diagonal indicates little quantile jumping. Demands in a given quantile appear in the same quantile or a nearby quantile 24 hours later. Varying h over the 24-hour interval, we found that the mass along the diagonal first tends to diffuse, with the band widening up to h = 12 hours; the mass then tends to concentrate again, with the band narrowing as h approaches 24 hours. These preliminary results suggest a certain amount of stability in the identity of the top demands across time.
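The histogram behind Figure 10 can be computed from two sets of per-demand volumes as sketched below, assuming each group is meant to cover roughly 5% of the bytes starting from the largest demands. This is a simplified reconstruction of the grouping rule described above, with made-up demand identifiers, not the code used to produce the figure.

    from collections import defaultdict

    def quantile_groups(volumes, step=0.05):
        # Assign each demand to a group, where each group accounts for roughly
        # `step` of the total bytes, starting from the largest demands.
        total = sum(volumes.values())
        groups, running, group = {}, 0.0, 0
        for demand, v in sorted(volumes.items(), key=lambda kv: kv[1], reverse=True):
            groups[demand] = group
            running += v
            while running >= (group + 1) * step * total:
                group += 1
        return groups

    def transition_histogram(volumes_t, volumes_t_plus_h):
        # Count how many demands move from group i at time t to group j at time t + h.
        g1, g2 = quantile_groups(volumes_t), quantile_groups(volumes_t_plus_h)
        hist = defaultdict(int)
        for demand in set(g1) & set(g2):
            hist[(g1[demand], g2[demand])] += 1
        return hist

    # Toy example: the same three demands in two half-hour intervals.
    print(transition_histogram({"d1": 900, "d2": 80, "d3": 20},
                               {"d1": 850, "d2": 120, "d3": 30}))

Rendering the resulting counts as a gray-scale matrix gives the kind of view shown in Figure 10.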

Run        Inbound (miss)   mult. ingress   perfect   one city   mult. cities   Outbound (miss)   baddest
11/03/99   0.83             37.89           9.49      16.27      11.25          3.55              1.51
11/04/99   0.71             38.49           9.78      16.42      11.77          2.45              0.96
11/11/99   0.14             39.73           11.91     13.75      12.47          4.39              3.47
11/12/99   0.98             44.07           11.43     18.95      11.96          4.01              3.12

Table 5: Disambiguation statistics


[Figure 8: Percent of bytes attributed to the top-ranked traffic volumes, listed in decreasing order (sorted demands vs. percent of bytes, log-log scale). (a) Inbound traffic, comparing point-to-point and point-to-multipoint aggregation; (b) inbound and outbound traffic.]

[Figure 9: Time-of-day effects in the measured traffic demands for November 3 and 4 (percent of bytes). (a) Top-ranked demands in each of 48 thirty-minute time intervals (Nov 4); (b) three demands (domestic business, domestic consumer, international) across a two-day period (Nov 3-4).]

[Figure 10: Stability of the measured traffic demands across time (two-dimensional histograms of quantile transitions). (a) Demands at 1pm and 1:30pm (Nov 3); (b) demands on Nov 3 and Nov 4 (1pm).]

6 Conclusion

Engineering a large ISP backbone introduces fundamental challenges that stem from the dynamic nature of user behavior and reachability information, and the lack of end-to-end control over the path of a flow. Yet, careful engineering of the network is important, since the routing configuration and backbone topology have significant implications for user performance and resource efficiency. In this paper, we propose a model of traffic demands that captures (i) the volume of data, (ii) the entry point into the ISP network, and (iii) destination reachability information. This simple abstraction facilitates a wide range of traffic engineering applications, such as performance debugging, route optimization, and capacity planning. We also present a methodology for populating the demand model from flow-level measurements and interdomain routing information, and apply our approach to a large, operational ISP network. Our analysis of the measured demands reveals significant variations in demand sizes and popularities by time of day, but a certain amount of stability between consecutive days.

In populating our demand model, we faced three main challenges:

• Working with four different datasets: Organizing access to all data sets during the same time period is difficult. Ensuring their completeness and consistency posed both operational and computational challenges. Finally, determining how best to join the datasets forced us to address the questions of subsampling and temporal uncertainties between the datasets.

• Ambiguity of ingress points: For a flow measured only at its egress link, determining the ingress link is challenging. This difficulty arises because hop-by-hop routing (based on the destination IP address) implies that downstream routers do not necessarily have (or need!) information about how packets entered the domain. In addition, the increasing decentralization of the Internet makes it difficult for any one ISP to know the source IP addresses of downstream domains.

• Dynamics of the egress points: Policy changes in one domain can have unforeseen implications for the reachability information seen by other ISPs. We see evidence of this in the churn in the forwarding tables across time, and in the resulting inconsistencies between the data sets. This complicated the identification of the set of egress links for traffic demands.

Despite these challenges, our approach for populating the demand model performs quite well. The inconsistencies that arose could be explained by natural network dynamics. Our ongoing work focuses on how to compute the traffic demands in real time in an efficient and accurate manner. This requires a scalable monitoring architecture for collecting measurement data:

• Flow/packet sampling: Collecting fine-grain traffic statistics introduces a large amount of measurement data. Relatively aggressive sampling should suffice for estimating the traffic demands, especially since the traffic volumes are dominated by a small number of "heavy hitters," as discussed in Section 5.3. We are interacting with router vendors to socialize the need for good support for packet-header sampling in the line cards. The industry needs to treat measurement functionality as a "first-class" feature of routers and line cards.




• Distributed collection infrastructure: Transferring a large number of measurement records to a central collection machine places a load on the network and on the processor. A distributed infrastructure can substantially reduce the overhead of collecting and processing the data. With the core research and development team for the AT&T IP Backbone, we are in the process of developing a distributed collection infrastructure for collecting and aggregating measurement data.

• Measurement at ingress points: Computing traffic demands from ingress measurements is more accurate than depending on route disambiguation to identify the ingress points. Efficient sampling techniques and a distributed collection infrastructure reduce the overhead of enabling measurement at the edge links. However, some routers and line cards may not have efficient support for measurement. Selective measurement at certain important edge links may be sufficient to capture the bulk of the important traffic.

An accurate, real-time view of topology and reachability data is also important:

• Online intradomain data: Computing the traffic demands and engineering the flow of traffic through the backbone requires an accurate view of the current topology and routing configuration. This requires tracking the up/down status of individual routers and

links, as well as changes in the configuration of the routing protocols (e.g., OSPF weights). With our colleagues, we are in the process of building an intradomain route monitor that captures this information.

• Online reachability data: The computation of the traffic demands depends on associating each destination prefix with a set of egress links. The set of egress links changes over time. However, frequent dumping of the forwarding tables imposes a high overhead on the router. Monitoring the BGP advertisements sent through the backbone is a more efficient way to collect reachability information. With the core research and development team, we are in the process of building an interdomain route monitor that tracks this information.

In addition to evolving the monitoring infrastructure, we plan to devote more attention to the analysis of our measured traffic demands. The network-wide view of configuration and usage data in an ISP backbone provides a rich opportunity to characterize the fluctuations in IP traffic demands. Our initial analysis suggests that these demands have interesting spatial and temporal properties with significant implications for Internet traffic engineering. Further statistical analysis of this data would lend insight into new techniques for the design and operation of IP backbone networks.

Acknowledgments

Numerous colleagues in AT&T Labs and AT&T IP Services have provided valuable feedback on this work. We would like to thank Han Nguyen for his role in formulating and supporting the NetScope project that motivated this work, and Jay Borkenhagen, Michael Langdon, Anthony Longhitano, Muayyad Al-Chalabi, Quynh Nguyen, Fred Stringer, and Joel Gottlieb for helping us access and understand the network measurement and configuration data. Finally, we thank Walter Willinger, Michael Merritt, and the anonymous reviewers for their valuable suggestions on how to improve the presentation of the material.

References

[1] V. Paxson, G. Almes, J. Mahdavi, and M. Mathis, "Framework for IP performance metrics," Request for Comments 2330, May 1998.

[2] W. Stallings, SNMP, SNMPv2, SNMPv3, and RMON 1 and 2. Addison-Wesley, 1999.

[3] K. Thompson, G. J. Miller, and R. Wilder, "Wide-area Internet traffic patterns and characteristics," IEEE Network Magazine, vol. 11, pp. 10-23, November/December 1997.

[4] D. O. Awduche, A. Chiu, A. Elwalid, I. Widjaja, and X. Xiao, "A framework for Internet traffic engineering," Internet Draft draft-ietf-tewg-framework-02.txt, work in progress, June 2000.

[5] D. O. Awduche, "MPLS and traffic engineering in IP networks," IEEE Communications Magazine, pp. 42-47, December 1999.

[6] X. Xiao, A. Hannan, B. Bailey, and L. Ni, "Traffic engineering with MPLS in the Internet," IEEE Network Magazine, March 2000.

[7] A. Feldmann, A. Greenberg, C. Lund, N. Reingold, and J. Rexford, "NetScope: Traffic engineering for IP networks," IEEE Network Magazine, March 2000.

[8] B. Halabi, Internet Routing Architectures. Cisco Press, 1997.

[9] J. W. Stewart, BGP4: Inter-Domain Routing in the Internet. Addison-Wesley, 1998.

[10] N. Duffield, P. Goyal, A. Greenberg, P. Mishra, K. Ramakrishnan, and J. van der Merwe, "A flexible model for resource management in virtual private networks," in Proc. ACM SIGCOMM, September 1999.

[11] B. Fortz and M. Thorup, "Internet traffic engineering by optimizing OSPF weights," in Proc. IEEE INFOCOM, March 2000.

[12] C. Labovitz, R. Malan, and F. Jahanian, "Internet routing instability," IEEE/ACM Trans. Networking, vol. 6, pp. 515-528, October 1998.

[13] C. Labovitz, A. Ahuja, and F. Jahanian, "Experimental study of Internet stability and wide-area network failures," in Proc. International Symposium on Fault-Tolerant Computing, June 1999.

[14] B. Krishnamurthy and J. Wang, "On network-aware clustering of Web clients," in Proc. ACM SIGCOMM, August/September 2000.

[15] Cisco NetFlow. http://www.cisco.com/warp/public/732/netflow/index.html

[16] S. Handelman, S. Stibler, N. Brownlee, and G. Ruth, "RTFM: New attributes for traffic flow measurement," Request for Comments 2724, October 1999.

[17] P. Ferguson and D. Senie, "Network ingress filtering: Defeating denial of service attacks which employ IP source address spoofing," Request for Comments 2267, January 1998.

[18] A. Feldmann and J. Rexford, "IP network configuration for traffic engineering," Tech. Rep. 000526-02, AT&T Labs - Research, May 2000.

[19] W. Fang and L. Peterson, "Inter-AS traffic patterns and their implications," in Proc. IEEE Global Internet Symposium, December 1999.

[20] M. E. Crovella and A. Bestavros, "Self-similarity in World Wide Web traffic: Evidence and possible causes," IEEE/ACM Trans. Networking, vol. 5, pp. 835-846, December 1997.

[21] L. Breslau, P. Cao, L. Fan, G. Phillips, and S. Shenker, "Web caching and Zipf-like distributions: Evidence and implications," in Proc. IEEE INFOCOM, pp. 126-134, March 1999.