A Distributed and Scalable Routing Table Manager for the Next Generation of IP Routers

Kim-Khoa Nguyen, Brigitte Jaumard, and Anjali Agarwal, Concordia University

IEEE Network • March/April 2008

Abstract
In recent years, the exponential growth of Internet users with increased bandwidth requirements has led to the emergence of the next generation of IP routers. Distributed architecture is one of the promising trends providing petabit routers with a large switching capacity and high-speed interfaces. Distributed routers are designed with an optical switch fabric interconnecting line and control cards. Computing and memory resources are available on both control and line cards to perform routing and forwarding tasks. This new hardware architecture is not efficiently utilized by the traditional software models, where a single control card is responsible for all routing and management operations. The routing table manager plays an extremely critical role by managing routing information and, in particular, a forwarding information table. This article presents a distributed architecture set up around a distributed and scalable routing table manager. This architecture also provides improvements in robustness and resiliency. The proposed architecture is based on a sharing mechanism between control and line cards and is able to meet the scalability requirements for route computations, notifications, and advertisements. A comparative scalability evaluation is made between distributed and centralized architectures in terms of required memory and computing resources.

The explosive growth of the Internet has resulted in very stringent scalability requirements on routers and other network systems. Until very recently, core network operators met these requirements by adding more routers, usually mid-size routers, to their networks. This approach, referred to as a router cluster, imposes extra cost for management and maintenance, particularly when the number of connections grows very quickly. We present a more cost-effective approach, where a cluster of mid-size routers is replaced by a next-generation router with a very large switching capacity. One of the main challenges for this new approach is that router architectures have not evolved much recently with respect to the increased traffic demand. For example, the throughput per single chassis of the recent Cisco CRS did not increase compared to the previous 12000 router series (both offer 1.2 Tb/s). The reason is that these routers are designed with only one powerful controller card, following the router-cluster approach rather than building more powerful routers. In recent routers (e.g., the Cisco CRS), throughput increases only if more chassis, with additional control cards, are added. In fact, very few control tasks are offloaded to the line cards, mostly for forwarding information table (FIT) management. Because all line cards share a single control card, current architectures are not scalable.

Thus, an innovative solution based on task sharing between control and line cards is required to increase the scalability of routers. Such a solution enables router modules to be added as capacity requirements increase, and it guarantees equal performance of the routing software components regardless of the number of physical interfaces, router adjacencies, and IP routes. Resiliency is also improved by the redundancy and replication of critical functions over multiple modules. Availability is provided by the modular structure, which limits the impact of faults in individual modules. With a modular design, routing software components can run independently on the same or separate central processing units (CPUs) and interact with each other, regardless of their respective physical locations. This approach produces a robust network that is not vendor-specific and can use modules developed by different manufacturers.

The routing table manager (RTM) [1] is one of the main software components of the router. It links the different routing protocol modules. In core routers, the RTM plays an important role by managing all of the best routes coming from various sources. Possible sources are the different routing protocols, such as Open Shortest Path First (OSPF), Intermediate System to Intermediate System (IS-IS), or Border Gateway Protocol (BGP). The RTM also gathers information from other sources, such as the static routes configured by the system user and the dynamic routes. Based on all of the route information, the RTM module computes the overall best routes. The RTM is also responsible for redistributing routes coming from one routing protocol to other routing protocols. In addition, it can filter route information that is being redistributed.
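To make this selection step concrete, the following minimal sketch (Python; the preference values and data shapes are illustrative assumptions, not the authors' implementation) picks one best route per destination prefix, following the article's convention that a higher preference value wins:

```python
# Minimal sketch of RTM best-route selection across route sources.
# Preference values and data shapes are illustrative assumptions.
from dataclasses import dataclass

# Per-protocol preference; higher wins (the article's convention).
PREFERENCE = {"static": 200, "ospf": 110, "isis": 115, "bgp": 20}

@dataclass
class Route:
    prefix: str      # destination prefix, e.g., "30.0.0.0/24"
    next_hop: str    # next hop address or outgoing interface
    protocol: str    # source protocol that learned the route

def select_best(routes):
    """Keep all routes, but choose one best route per prefix for the FIT."""
    best = {}
    for r in routes:
        cur = best.get(r.prefix)
        if cur is None or PREFERENCE[r.protocol] > PREFERENCE[cur.protocol]:
            best[r.prefix] = r
    return best

fit = select_best([
    Route("30.0.0.0/24", "20.2.1.1", "ospf"),
    Route("30.0.0.0/24", "Serial0", "static"),  # wins: higher preference
])
```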

With the ever-increasing number of interconnections between routers, the size of the routing table managed by the RTM module tends to increase rapidly. This requires routers to have more CPU cycles, more powerful accompanying hardware resources, and an increased memory size to contain all available routing information. Until recently, the only valid solution to support the increasing Internet traffic was to periodically upgrade the router control card on which the RTM module was running, or to replace the whole router with a new one having more powerful hardware resources (e.g., CPU and increased memory size), demanding some service interruptions. An alternate solution is to implement distributed and scalable routers [2].

In this article, we describe the benefits and limitations of a distributed router design and propose a distributed architecture for the RTM. We first review the hardware architecture of next-generation routers and provide an overview of the functionality of the RTM. The critical issues for a centralized RTM architecture are then discussed, leading to a proposal of a completely distributed architecture for the RTM. We then present a comparative scalability evaluation of the proposed distributed architecture against a centralized one, in terms of required memory and computing resources.

Next-Generation Routers and the Routing Table Manager

The first and second generations of IP routers were basically made of a single central processor running all routing protocol modules, with multiple line cards interconnected through a shared bus. Their performance depends on the throughput of the shared bus and on the speed and capabilities of the central processor; therefore, they are not able to meet today's bandwidth requirements. The third generation, or current generation, of routers was introduced to solve the bottlenecks of the second generation [3]. A switch fabric replaces the shared bus: it is a crossbar connecting multiple cards together, thus providing ample bandwidth for transmitting packets simultaneously among line cards. These routers have a set of line cards, a set of forwarding engines, and a single control card, all interconnected through the switch fabric. The header of an incoming packet entering a line card interface is sent through the switch fabric to the appropriate forwarding engine. The forwarding engine determines to which outgoing interface the packet should be sent. This information is sent back to the line card through the switch fabric, and the line card then forwards the packet to the egress line card. Other functionality, such as resource reservation and maintenance of the routing table, is handled by modules running on the control card.

The architecture for next-generation routers is essentially switch-based; however, the switching capacity is enhanced up to petabits per second [4]. The hardware architecture of these routers is based on three types of cards (Fig. 1a):

• The line card provides multiple gigabit interfaces. The ingress network processor (iNP) is programmable, with parallel processing capability; it performs packet forwarding, classification, and flow policing. The iNP contains a FIT that is used to determine the destination of data packets. Control packets can be filtered and forwarded to the CPU for processing. The ingress traffic manager (iTM) forwards packets from the iNP to the switch fabric while maintaining traffic load balancing, using traffic access control, buffer management, and packet scheduling mechanisms. Data packets travel through the switch fabric to the egress line card, and control packets are sent to the control card. The egress traffic manager (eTM) receives packets from the switch fabric plane directly connected to its line card, performs packet re-ordering, and controls congestion. The egress network processor (eNP) sends out the packets with per-egress-port output scheduling mechanisms. The CPU is multi-purpose and able to perform control plane functions with the help of the built-in memory.

• The control card, or route processor, is designed to run the main routing protocol modules (i.e., BGP, OSPF, IS-IS, and Multiprotocol Label Switching [MPLS]), the RTM, and the command line interface (CLI). The control card architecture is similar to that of a line card, but its processing power and storage capabilities are far superior, and there is no interface to external devices. The control card has one iTM chip and one eTM chip to provide interfaces between the local processor and the switch fabric planes; they are responsible for managing flows of control packets.

• The control and line cards are interconnected by a scalable switch fabric that is distributed into identical and independent switching planes. The switch fabric is made of so-called matrix cards that provide the data switching functions. Per-flow scheduling, path balancing, and congestion management within the switch fabric are achieved by the fabric traffic manager chipsets integrated on the matrix cards. Each line card or control card has an ingress port and an egress port connecting to a matrix card. Each switching plane is made of the same number of matrix cards. Several topologies may be used to connect the matrix cards; the Benes topology [4] is recommended, due to its non-blocking characteristics.

One of the most important software components of the router is the RTM. It builds the FIT from the routing database that stores all routes learned by the different routing and signaling protocols, including the best and the non-best routes. For a set of routes having the same destination prefix, only one route is deemed the best, based on a pre-configured preference value assigned to each routing protocol. For example, if static routes have a high preference value and OSPF routes have a low preference value, and if a route entry having the same destination prefix was recorded by each protocol, the static route is considered the best route and is added to the FIT (Fig. 1b). However, some services, such as the Resource Reservation Protocol (RSVP), can use non-best routes to forward data with respect to user-defined parameters. Therefore, the RTM must keep all routes, allow users or requesting modules to access the route database, and make routing decisions based on requested next-hop and explicit route resolution; notify any change in the routing tables generated by the underlying routing protocols (e.g., Routing Information Protocol [RIP], OSPF, IS-IS, BGP); alert the routing protocols about the current state of physical links, such as up/down status and available bandwidth, to manage associated link states and, indirectly, route status; communicate with a policy manager module for making route filtering decisions for routing protocols (e.g., OSPF or BGP); and alert the routing protocols about resource reservation failures.
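Once the best routes are installed in the FIT, the forwarding engine resolves each packet's destination with a longest-prefix match. The sketch below shows that lookup with a plain linear scan for clarity (real forwarding engines use tries or TCAM; the table entries are hypothetical):

```python
# Hedged sketch of a FIT longest-prefix-match lookup (entries hypothetical).
import ipaddress

fit = {
    "30.0.0.0/8":  "Serial0",    # e.g., installed from a static route
    "30.0.0.0/24": "20.2.1.1",   # e.g., installed from an OSPF route
}

def lookup(dst: str):
    """Return the next hop of the longest matching prefix, or None."""
    addr = ipaddress.ip_address(dst)
    best_len, best_hop = -1, None
    for prefix, next_hop in fit.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and net.prefixlen > best_len:
            best_len, best_hop = net.prefixlen, next_hop
    return best_hop

assert lookup("30.0.0.7") == "20.2.1.1"   # the /24 wins over the /8
assert lookup("30.1.2.3") == "Serial0"    # only the /8 matches
```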

Another requirement is that the RTM must contain a very large number of routes, such as the ever-increasing BGP routes. Because router vendors do not increase the memory of the main control card by much, Internet service providers (ISPs) are very careful about the amount of information their routers must store.

Toward a Distributed RTM Architecture

In legacy routers, although a data packet is transmitted over line cards, forwarding decisions are made by the control card or by separate forwarding engines. Consequently, data transmission is interrupted if the control card fails. The integration of the forwarding table on line cards, as in recent router products [5], enables them to perform non-stop forwarding; in other words, a data packet can be transmitted regardless of control card failures. Making a local copy of the forwarding table on a line card is easy, and the control plane need not be changed. A non-stop routing system [6] enables the control card to recover from a failure without disrupting the routing protocol interaction with other routers and without dropping any packets. Having back-up control cards is not enough to achieve this goal, due to the large time gap to switch over.

In addition, the control card is usually costly. A possible solution is based on protocol extensions that require a peer router to wait for the failed one to be restarted [7]. However, this may not be supported by all legacy routers. A better solution for the next-generation router is to design a more scalable control plane, which we address in this article with a specific architecture for the RTM.

Figure 1. Next-generation routers and the RTM: a) architecture of the next-generation router; b) routing database update and selection of the best routes by the RTM.

In recent router products, the RTM module is neither distributed nor scalable (Fig. 2a). Legacy routers consist principally of an RTM located on a single control card [1] that processes information from all routing protocols and networks to which the router connects. There is no RTM module running on any line card. Resiliency can be enhanced at the control card level by a back-up instance that takes over from the primary one in case of failure.

One of the primary requirements for next-generation routers is scalability [4]. In general, a router installed in a core network must exchange control messages with hundreds of peers. Due to the growing bandwidth, a large number of line cards will be added to the router platform. This imposes several challenges on the operation of routing protocols. Current-generation routers provide terabit throughput, whereas next-generation routers will reach petabit throughput. Such a system must be able to support about one hundred thousand routes with high flapping rates, which exceeds the capacity of a single control card. Therefore, task sharing should be taken into account to make the system more scalable. One possible solution is to have additional control cards [9]. Each control card runs an instance of a routing protocol module or manages certain parts of the global routing table. However, control cards are often costly, and the processing capabilities are not improved much, due to the quantity and delay of the messages exchanged between the different control and line cards in a system.

Some recent products have also been introduced with standalone modules responsible for each protocol. Each protocol module is attached to a smaller RTM, denoted Interior Gateway Protocol (IGP)-RTM or Exterior Gateway Protocol (EGP)-RTM, managing the routes coming from the different domains of the routing protocols, as shown in Fig. 2b [9]. The global RTM (G-RTM) collects the best routes from the IGP/EGP-RTMs to build the FIT. When a routing protocol receives a link-state notification message through the corresponding signaling component on a line card, the control component located on the control card re-computes the best routes and updates its local IGP-RTM. The G-RTM is also notified through its link with the IGP-RTM. The overall best routes of the system are selected among those provided by the different protocols. The route update of each routing protocol is advertised by the G-RTM to the other routing protocols in order to notify the neighbors. Finally, the overall best routes are propagated to the FITs on the line cards through the connection with the G-RTM.

Such an architecture enables the routing protocols to have flexible access to the routing tables managed by the G-RTM. Resiliency is improved because a routing protocol can still use the IGP/EGP-RTMs when the G-RTM temporarily fails. However, the following critical issues remain:
• Although the IGP/EGP-RTMs are distributed on a per-protocol basis, they are basically independent processes running on the same control card. This leads to heavy resource consumption and to some overloading of the control card as the number of routes increases.
• In the case where routing protocols are distributed on the line cards to improve scalability and fully exploit the available memory and CPU resources of the line cards [9], the IGP/EGP-RTM modules must also be migrated to the line cards.
• It is not very efficient to perform the FIT update operations at the control card level through the G-RTM, because the FITs are hosted by the line cards.

To make the control plane more scalable, some router vendors and researchers have introduced early products where some protocol functions are implemented in a distributed way. For example, the OSPF Hello protocol is designed to run at the line card level in the Avici TSR product [6]. Similar work was also presented in [8]. However, to the best of our knowledge, no product or router model with a distributed route management function has been introduced in the market yet.

Figure 2. Current RTM architectures: a) the RTM in a non-distributed routing architecture; b) the RTM distributed on the control card.

Proposed Model for Distributed RTM

Basically, the RTM module is responsible for managing the routing tables and the routing policy modules. It also provides APIs that allow exchanges of routing information obtained from routing protocols for processing and making path decisions.

To take advantage of the new-generation router architecture, which provides ultra-high internal switching speed and additional processing and memory resources on line cards, we investigate the ability to move some functions of the RTM from the control card to the line cards. This proposal targets next-generation routers with petabit switching capacity and full memory and processing capabilities on line cards. These routers are also designed with a distributed control plane where some parts of the routing protocols run on line cards [9].

Our distributed model of the RTM consists of two main components (Fig. 3a):
• Each line card runs a line card (LC)-RTM process. The LC-RTM obtains route information from the local instances of the routing protocol modules running on its line card and computes the best routes for each network domain associated with the line card, depending on its port connections. This task can be achieved by exchanging information among the line cards connected to the same domain; in some cases (i.e., for interdomain routes), the LC-RTM may obtain routing information from the control card in order to make routing decisions at the platform level. Line cards are organized in a cluster framework, where each cluster corresponds to a set of line card ports. Most often, all ports of a given line card are connected to the same domain; therefore, each cluster usually corresponds to a domain or sub-domain in a network (Fig. 3b).
• The G-RTM runs on a control card and obtains routing information from the LC-RTMs to update the routing table and, consequently, the forwarding table of the router. The G-RTM also manages the static routes configured by users (through an external routing policy module) and traffic engineering (TE)-based routes. Additional control cards can be added to share processing tasks or to hold back-up information of the G-RTM for resiliency purposes; however, load balancing and G-RTM resiliency are beyond the scope of this article. We also assume that a routing policy module located on the control card allows users to configure route filtering policies and IGP/EGP interworking, and to modify path attributes for the BGP routing protocol according to specific policies.

We investigate the capability of the proposed distributed RTM model based on the following aspects:
• Link state notification: The RTMs must be notified of changes in the routing information generated by the underlying routing protocols (i.e., RIP, OSPF, IS-IS, BGP) or by the user, so that the best routes and/or TE/quality of service (QoS)-based routes are re-computed and the forwarding table is updated.
• Advertisement: The RTMs must send alert messages to the routing protocols about the current state of physical links, such as the available bandwidth. This information helps the routing protocols to update their link state databases (LSDBs), to flood QoS-related information to the routing domain, or to build QoS forwarding tables.
• Path computation: Best routes or TE/QoS-based routes are computed based on information collected from the different routing protocols and from the user through a CLI. When the RTM is distributed on the line cards, the information provided by each process must be consistent and unique for the whole platform.
• QoS and traffic engineering: The routing policy module on the control card establishes the QoS and TE-based routes for specific connections and replaces the existing best routes. These routes can be defined by the user or by QoS-enabled protocols such as the Resource Reservation Protocol with Traffic Engineering Extensions (RSVP-TE) or the constraint-based routing label distribution protocol (CR-LDP). Note that traffic behavior is not dealt with in this article.

Figure 3. Proposed RTM distributed architecture: a) distributed RTM architecture on the control card and line cards; b) distribution of the RTM.

The distributed RTM model may also be required to handle additional platform-specific functions that are not considered in this article:
• Management of routing tables generated by underlying unicast and multicast protocols.
• Management of the static routing tables (containing default routes or routes to often-accessed networks).
• Management of routing tables on a per virtual private network (VPN) basis (allowing overlapping addresses or service to each VPN).
• Asynchronous notifications to users about changes in the routing tables.

Although the RTM plays a central and key role in a router, its architecture has never been revealed by manufacturers. Some recent research [8] stated that the routing table management and update functions should remain on the control card. That conclusion is suitable for medium-scale routers, where the computing and memory resources on line cards are not sufficient to support an independent LC-RTM (e.g., routers having ten line cards and one hundred interfaces). The model we propose in this article deals primarily with very large scale core routers having up to thousands of line cards with petabit switching capacity. The available memory on each line card is also on the order of tens to hundreds of Mbytes in total. Such a router is less concerned with resource limitation problems.

Figure 4 presents the architectures we propose for the G-RTM and the LC-RTM. The inter-card communication between the LC-RTMs located on line cards and the G-RTM located on the control card, or among LC-RTMs, is achieved by a specific communication channel called distribution services (DS). Designed as an abstraction layer, DS also provides a synchronization mechanism to manage module activations, monitoring, and state transitioning facilities (active, back-up, in-service upgrade, etc.). DS maintains a distribution database that enables requesting modules to obtain the appropriate data. The G-RTM is able to record the FIT through routing socket services provided by the IP stack. Route update information can be received from neighbor routers through interfaces between the LC-RTM and the routing protocols running on its line card (Fig. 4b). Route advertisements can also be sent to neighbor routers using the same interface.

Figure 4. Architecture of the G-RTM and LC-RTM: a) architecture of the G-RTM located on a control card; b) architecture of the LC-RTM located on a line card, and its interfaces with the OSPF and BGP routing protocols.

Basically, the model we propose works as follows.

Link State Notification

In the proposed architecture, LSDBs are stored on line cards, making them locally available to the requesting processes, such as the LC-RTM or RSVP-TE. Recall that an LSDB is specific to a routing domain, such as an OSPF area. In a centralized model, the LSDB is handled by the control card; hence, synchronization is not required. In our distributed model, we must ensure that all line cards connected to a routing domain maintain the same LSDB. This can be achieved by having one line card act as a master, assuming the path computation for the cluster of line cards connecting to the same routing domain. When a line card in the cluster receives a link state notification message, it forwards the message to the master. The master updates its database and synchronizes the other line cards in its cluster. An appropriate election mechanism for the master line card is required for each cluster. To simplify the architecture, we can assign the first line card on which the routing protocol is activated as the master for that cluster.
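The following sketch captures this rule as just stated: the first activated line card becomes the cluster master, and every link state notification is funneled to the master before being replicated to the rest of the cluster (the classes and hooks are hypothetical, not the authors' implementation):

```python
# Sketch of per-cluster master election and LSDB synchronization.
# The LineCard/Cluster shapes are hypothetical illustrations of the rule.
class LineCard:
    def __init__(self, slot: int):
        self.slot = slot
        self.lsdb = {}             # local copy of the domain's LSDB

class Cluster:
    """All line cards connected to the same routing domain."""
    def __init__(self):
        self.master = None
        self.members = []

    def activate(self, lc: LineCard):
        self.members.append(lc)
        if self.master is None:    # first activated card becomes the master
            self.master = lc

    def on_link_state_update(self, receiver: LineCard, lsa_id, lsa):
        if receiver is not self.master:
            # a non-master card forwards the notification to the master
            self.on_link_state_update(self.master, lsa_id, lsa)
            return
        self.master.lsdb[lsa_id] = lsa   # the master updates its database...
        for lc in self.members:          # ...and synchronizes the cluster
            if lc is not self.master:
                lc.lsdb[lsa_id] = lsa
```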

Advertisement

When a line card detects a change in its physical link states, or a link state is changed by the user, the LC-RTM located on that line card is asked to broadcast a notification to all line cards in the router. Routing tables must then be recalculated, and notifications are sent to the neighbor routers.

Path Computation

Path computation is performed on a per-routing-protocol basis. For link-state routing protocols (e.g., OSPF, IS-IS), the path computation can be performed by the master line card of the cluster. Distance vector-based protocols, on the other hand, send the route update information they obtain from neighbors to the control card, which performs the computation. Basically, the path computation process proceeds as follows:
• The routing protocol modules receive update information from neighbors or detect local link modifications by themselves.
• The LC-RTM running on the same line card is notified. Based on the protocol identification, it decides whether to send the notification to the G-RTM located on the control card or to forward it to the master line card of the cluster to which it belongs (a sketch of this dispatch rule follows the list).
• The G-RTM or the appropriate master line card runs specific algorithms (e.g., Dijkstra for link-state protocols or Bellman-Ford for distance vector-based protocols) to build the network topology and produce the best routes.
• A new route or an updated route is registered to the forwarding tables located on the line cards through the G-RTM.
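A compact sketch of the dispatch decision in the second step (the protocol classification sets and the send() hooks are assumptions for illustration):

```python
# Hedged sketch: where an LC-RTM forwards a route update, per the steps above.
LINK_STATE = {"ospf", "isis"}   # best routes computed on the master line card
OTHER = {"rip", "bgp"}          # distance/path vector; computed centrally

def dispatch_update(protocol: str, update, cluster_master, g_rtm):
    """Route a notification to the card that performs the path computation."""
    if protocol in LINK_STATE:
        cluster_master.send(update)   # master runs Dijkstra for its cluster
    elif protocol in OTHER:
        g_rtm.send(update)            # control card runs Bellman-Ford
    else:
        raise ValueError(f"unhandled protocol: {protocol}")
```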

QoS and Traffic Engineering Specification

The model we propose provides user QoS and traffic engineering specification functions through an interface between the routing policy module and the G-RTM, both located on the control card. QoS and TE-based routes can also be established using specific protocols such as RSVP-TE or the constraint-based routing label distribution protocol (CR-LDP). In that case, specific parameters are first updated to the LC-RTM; the computed routes are then updated to the G-RTM.

Offloading some of the processing tasks from the control card to the line cards helps to reduce potential bottlenecks on the control card when the number of requests increases, resulting from an increasing number of routes, and hence of line cards, to be supported by the core router. The LC-RTM is also able to react rapidly to physical link modifications and to efficiently exploit the additional resources available on the line cards of next-generation routers. In addition, the model we propose has the following advantages:
• Scalability: It balances the path computation load between the control card and the line cards. RTM functions are distributed as far as possible, leaving the control card available for more complicated tasks, such as router management and user interaction.
• High availability: Because route information and LSDBs are backed up on the line cards, we provide a high redundancy level for RTMs. Also, problems on the control card will not slow down the procedures on the line cards.
• Robustness: In our architecture, path computation is performed per cluster instead of for the whole router, which leads to rapid convergence in case of topology changes. Routing information and notifications can also arrive faster and more efficiently at the requesting modules, because they can be provided directly by the LC-RTMs. Communication among routing protocols and RTMs is also more efficient, and bandwidth on the switch fabric can be saved.

Implementation and Scalability Evaluation

To manage BGP routes, the RTM has two tables. The input routing information base (RIB-IN) holds the routes advertised by BGP neighbor routers (so-called BGP speakers). The local routing information base (RIB-LOC) contains the routes the router discovers by itself (e.g., physical links of the line card or routes learned by other protocols such as OSPF). By combining these two tables, the RTM determines the best routes for BGP, which are stored in the output routing information base (RIB-OUT) table, taking into account the additional user policy configurations. The RIB-OUT table is then advertised to the BGP neighbor routers.
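As a rough illustration of this flow (the route shapes, the preference rule, and the policy hook are assumptions, not the authors' implementation):

```python
# Illustrative RIB-IN + RIB-LOC -> RIB-OUT combination (hypothetical shapes).
def build_rib_out(rib_in, rib_loc, export_policy):
    """Combine peer-learned (RIB-IN) and locally known (RIB-LOC) routes,
    keep one best route per prefix, then apply the user export policy."""
    best = {}
    for table in (rib_in, rib_loc):
        for prefix, route in table.items():
            cur = best.get(prefix)
            if cur is None or route["pref"] > cur["pref"]:  # higher wins
                best[prefix] = route
    # RIB-OUT: the best routes that the export policy allows us to advertise
    return {p: r for p, r in best.items() if export_policy(p, r)}

rib_out = build_rib_out(
    rib_in={"10.0.0.0/8": {"next_hop": "20.2.1.1", "pref": 20}},   # BGP peer
    rib_loc={"10.0.0.0/8": {"next_hop": "Serial0", "pref": 200}},  # static
    export_policy=lambda prefix, route: True,                      # allow all
)
```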

The LC-RTM has access to the LSDB managed by the OSPF module running on the same line card. This enables OSPF to be updated with the route changes and the link status information managed by the LC-RTM. The OSPF best route computation is performed by the OSPF module itself, so the LC-RTM is not involved in this process. However, the final results are stored in the routing table through the RTM API services.

The functions provided by the G-RTM and the LC-RTMs are implemented as APIs. They include store, access, look-up, list, remove, update, and back-up functions. Each function is represented by a type-length-value (TLV) structure. A module, for example MPLS, can execute an RTM function by sending a message containing this data structure to the G-RTM or an LC-RTM. The Type field is the name of the operation, followed by the length of the structure; the Value field contains additional information on the function, such as the parameters to be processed.
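A minimal sketch of such a TLV-framed message follows; the numeric operation codes and field widths are assumptions, since the article does not specify the wire format:

```python
# Hedged sketch of the TLV structure carrying an RTM API call.
import struct

OPS = {"store": 1, "access": 2, "lookup": 3, "list": 4,
       "remove": 5, "update": 6, "backup": 7}        # assumed code points

def encode_tlv(op: str, value: bytes) -> bytes:
    # Type (2 bytes) | Length of Value (2 bytes) | Value (parameters)
    return struct.pack("!HH", OPS[op], len(value)) + value

def decode_tlv(msg: bytes):
    op_code, length = struct.unpack("!HH", msg[:4])
    return op_code, msg[4:4 + length]

# e.g., an MPLS module asking the G-RTM to store a route:
msg = encode_tlv("store", b"30.0.0.0/24 via 20.2.1.1")
assert decode_tlv(msg) == (1, b"30.0.0.0/24 via 20.2.1.1")
```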

Based on a local routing table, such a distributed RTM architecture helps to compute constrained shortest path first (CSPF) routes effectively. On the line card, the LC-RTM module consists of two main components (Fig. 5):
• The traffic engineering database (TED) contains the topology and resource information of the cluster. The TED may be fed by an IGP protocol instance running on the same line card or on the control cards.
• The path computation element (PCE) performs the path computation on a network graph, applying computational constraints during the computation. We investigate the distributed path computation model in the interdomain, intradomain, and interlayer contexts.
– Interdomain path computation may involve the association of topology, routing, and policy information from multiple domains. This can be performed at the LC-RTM level.
– Intradomain path computation deals with routing information coming from a single domain. This is achieved by routing protocols running on the line cards, such as OSPF or IS-IS.
– Interlayer path computation aims at performing the path computation at one or multiple layers, taking into account topology and resource information at these layers. This is achieved by the LC-RTM and local QoS (L-QoS) modules.

Figure 5. Line card components of the distributed RTM.

The CSPF computation process can be described as follows:
• The RSVP-TE module on the ingress line card of the router receives a PATH message from the upstream router.

• The RSVP-TE module on the ingress line card checks the admission status (grant/deny) for the new request, based on information in the TED.
• The LC-RTM computes the next-hop (downstream) router using the PCE and the traffic engineering database (a CSPF sketch follows the list).
– In case of interdomain path computation, the request is sent to the master of the domain, which is able to build the interdomain topology with other domains.
– In case of intradomain path computation, the routing protocol modules running on the same line card are invoked.
– In case of interlayer path computation, the PCE uses the information contained in the traffic engineering database.
• The egress line card connecting to the downstream router is contacted in order to forward the PATH message.
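The sketch below shows the core of such a CSPF computation: links that cannot satisfy the requested bandwidth are pruned using TED information, and a shortest path is computed over the remaining graph (the TED layout is a hypothetical simplification):

```python
# Hedged CSPF sketch: Dijkstra over TED links that satisfy the bandwidth
# constraint. The TED layout is a simplified assumption.
import heapq

def cspf_next_hop(ted, src, dst, min_bw):
    """ted: {node: [(neighbor, cost, available_bw), ...]}.
    Returns the first hop of the cheapest feasible path, or None."""
    dist = {src: 0.0}
    first_hop = {src: None}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return first_hop[u]
        if d > dist.get(u, float("inf")):
            continue                  # stale heap entry
        for v, cost, bw in ted.get(u, []):
            if bw < min_bw:           # prune links violating the constraint
                continue
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                first_hop[v] = v if u == src else first_hop[u]
                heapq.heappush(heap, (nd, v))
    return None                       # no feasible path admitted
```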

Scalability Evaluation

To compare the scalability achieved by the distributed architecture and the centralized architecture, we estimate the number of exchanged messages, the number of consumed CPU cycles, and the amount of required memory in the two architectures, with different router configurations.

The router configuration parameters for the scalability evaluation of our architecture are as follows:
• The number of line cards the router supports. The more line cards that are added, the higher the connectivity of the router.
• The number of interfaces (ports) located on a line card. They are optical interfaces with high capacity (10-40 Gb/s). In practice, a line card can have about 10 ports, and we use this configuration in our evaluation.
• The number of routing protocols (Np) currently running on the router. In practice, a router may support one or more of the following protocols: RIP, OSPF, IS-IS, BGP, MPLS, Label Distribution Protocol (LDP), and RSVP.

The number of messages going through the switch fabric is almost the same in both the centralized and the distributed architectures. In the centralized architecture, link notification messages received by all line cards are forwarded to the G-RTM on the control card through the switch fabric. In the distributed architecture, they are forwarded to the master line card of each cluster, and only the best routes are sent to the G-RTM on the control card. Therefore, our architecture does not increase the traffic on the switch fabric.

In the centralized architecture, all available routes are stored by the G-RTM located on the control card; thus it occupies a lot of memory. In the proposed distributed architecture, the available routes of each cluster are kept by a master line card, using the line card memory. Therefore, we compare the memory requirement of the control card in the centralized architecture with that of the master line cards in the distributed architecture. As can be seen in Fig. 6a, the memory requirement increases with the number of line cards and the number of protocols running on the router. Figure 6a also shows the memory requirement on the control card for the distributed architecture, which is reduced considerably because most of the memory requirement has been moved to the line cards.

Although some optimization techniques can be deployed to save memory on recent routers, the proposed architecture considerably improves the scalability of the router by distributing route storage, especially label switched path (LSP) storage, over the line cards. Hence, each line card stores only the LSPs going through that line card. In the centralized architecture, each line card must store all of the LSPs going through the router.

In the centralized architecture, there is no RTM on the line cards, so the CPU resources for the RTM are consumed mainly on the control card. In the proposed architecture, on the other hand, the CPU cycles for the RTM are consumed mainly on the master line cards. Therefore, we compare these two congestion points in Fig. 6b. We can see that the CPU utilization is much higher in the centralized architecture than on each master line card. In other words, the distribution enables the load on the control card to be transferred to the master line cards, so control card congestion can be avoided. Each master line card serves only a small set of line cards; therefore, its capacity can satisfy the current demand. Even if the size of a cluster increases, we can still divide it into smaller segments, with a master for each segment, to avoid bottlenecks.

Figure 6. Performance comparison between the centralized and the proposed distributed architectures: a) memory used by RTMs in our proposed distributed architecture and in the centralized architecture; b) CPU resources used by RTMs in our proposed distributed architecture and in the centralized architecture.

Conclusion

The RTM is one of the most important components of a router. It plays a decisive role in the routing performance and connectivity of the network. In this article, we presented a novel distributed architecture model for the RTM for next-generation IP routers. The model we propose can exploit the additional computing and memory resources that are available on line cards and the very high-speed communication channel among line cards. This model can use the highly scalable hardware architecture of IP routers efficiently. Routes can be computed more efficiently and in a scalable manner, based on interfaces between the LC-RTMs and the routing protocols running on line cards. The robustness, availability, and resiliency of the router can also be considerably improved. The scalability evaluation of the proposed architecture against a centralized one, in terms of required memory and computing resources, shows that the load of the control card is moved to the line cards, thus enabling the router to support a larger number of line cards.

Acknowledgement

The authors would like to thank Hyperchip, Inc. for providing financial support. The project also benefited from the support of the Concordia Research Chair of B. Jaumard on the optimization of communication networks.

References
[1] A. Zini, Cisco IP Routing, Addison-Wesley, 2002, pp. 80–111.
[2] O. Hagsand, M. Hidell, and P. Sjodin, "Design and Implementation of a Distributed Router," Proc. 5th IEEE Int'l. Symp. Signal Processing and Info. Tech., Dec. 2005, pp. 227–32.
[3] A. Csaszar et al., "Converging the Evolution of Router Architectures and IP Networks," IEEE Network, vol. 21, no. 4, July–Aug. 2007.
[4] H. J. Chao and B. Liu, High Performance Switches and Routers, Wiley-Interscience, 2007.
[5] Cisco Systems, "Cisco 12000 Series Internet Router Architecture"; http://www.cisco.com
[6] H. Kaplan, "Non-Stop Routing Technology," white paper, Avici Systems Inc., 2002.
[7] M. Leelanivas, Y. Rekhter, and R. Aggarwal, "Graceful Restart Mechanism for Label Distribution Protocol," IETF RFC 3478, Feb. 2003.
[8] M. Deval et al., "Distributed Control Plane Architecture for Network Elements," Intel Tech. J., vol. 7, no. 4, 2003.
[9] K. K. Nguyen et al., "Towards a Distributed Control Plane Architecture for Next Generation Routers," Proc. ECUMN 2007, France, Feb. 2007.

Biographies
KIM-KHOA NGUYEN ([email protected]) received his M.Sc. in computer science from the Francophone Institute for Computer Science in 2001 and his Ph.D. in electrical engineering from Concordia University in 2007. Since 2002 he has been working with the Optimization of Communication Networks Research Laboratory at Concordia University. His current research includes router architectures and QoS for distributed systems. From 1998 to 2002 he was a senior engineer at Vietnam Data-Communication.

ANJALI AGARWAL [SM'03] received her Ph.D. in electrical engineering in 1996 from Concordia University, Montreal, her M.Sc. in electrical engineering in 1986 from the University of Calgary, and her B.E. in electronics and communication engineering in 1983 from Delhi College of Engineering, India. She is currently an associate professor in the Department of Electrical and Computer Engineering at Concordia University. Her current research interests are various aspects of real-time and multimedia communication over the Internet and wireless access networks. Prior to joining the faculty at Concordia, she worked as a protocol design engineer and a software engineer in industry.

BRIGITTE JAUMARD holds a Concordia University Research Chair, Tier 1, on the optimization of communication networks at the Concordia Institute for Information Systems and Engineering (CIISE) of Concordia University. She was previously awarded a Canada Research Chair, Tier 1, in the Department of Computer Science and Operations Research at the Université de Montréal. She is an active researcher in combinatorial optimization and mathematical programming, with a focus on applications in telecommunications and artificial intelligence. Recent contributions include the development of efficient methods for solving large-scale mathematical programs and their applications to the design and management of optical, wireless, and 3G/4G networks. In artificial intelligence, her contributions include the development of efficient optimization algorithms for probabilistic logic (reasoning under uncertainty) and automated mechanical design. She has published over 150 papers in international journals in operations research and telecommunications.
