
FIB Efficiency in Distributed Platforms (eprints.networks.imdea.org/1375/1/PID4413703.pdf, 2016-08-18)

FIB Efficiency in Distributed Platforms

Kirill Kogan∗, Sergey I. Nikolenko†‡, Patrick Eugster§, Alexander Shalimov¶, Ori Rottenstreich‖

∗IMDEA Networks Institute, Madrid, [email protected]
†Steklov Institute of Mathematics at St. Petersburg, [email protected]
‡Kazan Federal University, Kazan
§TU Darmstadt and Purdue University, [email protected]
¶Lomonosov Moscow State University and ARCCN, [email protected]
‖Princeton University, [email protected]

Abstract—The Internet routing ecosystem is facing substantial scalability challenges due to continuous, significant growth of the state represented in the data plane. Distributed switch architectures introduce additional constraints on efficient implementations from both lookup time and memory footprint perspectives. In this work we explore efficient FIB representations in common distributed switch architectures. Our approach introduces substantial savings in memory footprint transparently for existing hardware. Our results are supported by an extensive simulation study on real IPv4 and IPv6 FIBs.

I. INTRODUCTION

The Internet continues to grow rapidly: FIB sizes have doubled over the last 5 years, exceeding half a million entries [1], [2]. The data plane is bogged down by the rapid expansion of the forwarding tables constituting forwarding information bases (FIBs), a largely unsolved problem to date [3], [4]. Efficient FIB representations in the data plane become even more important with the advent of IPv6, since most methods that efficiently represent IP-based FIBs do not scale well to IPv6 [5]–[8] due to its significantly larger 128-bit address width. This is a major reason why state-of-the-art solutions for IPv6 FIBs are implemented in very expensive and power-hungry ternary content-addressable memories (TCAMs) [9]–[11].

In this work, we consider the problem of representing FIBs on distributed platforms [9], [10], where several line-cards (LCs) are interconnected by a switching fabric (see Fig. 1b). Each ingress (RX) LC must separately maintain a FIB table (or parts of one) to forward traffic to the correct egress (TX) LC, which transmits it further over an output port. The fundamental question here is how to efficiently implement a FIB table (Fig. 1a) across such a distributed switching platform.

Currently, there are two major types of FIB implementations for such distributed platforms: one-stage and two-stage forwarding. In the former, pointers to output ports are already found on ingress LCs (see Fig. 2a). Later, on an egress LC, the corresponding traffic is encapsulated based on the pointer coming from the ingress LC [9], so every ingress LC should contain information about “routable” prefixes of all egress LCs, whilst egress LCs encapsulate packets based on the pointer to the output port from the ingress LC. The major drawback of this representation is that maintaining prefixes mapped to output ports of all egress LCs on every ingress LC is not scalable. While such one-stage forwarding representations do not scale well, they are simple and require only dummy processing on egress LCs (processing pipelines are usually doubled to isolate ingress and egress traffic). For example, different generations of Cisco C12000 [9] LCs implement one-stage forwarding.

The work of Sergey Nikolenko was partially supported by the Government of the Russian Federation grant 14.Z50.31.0030 and the Russian Presidential Grant for Young Ph.D.s MK-7287.2016.1. The work of Patrick Eugster was partially supported by the German Research Foundation under grant SFB-1053 MAKI and the European Research Consortium under grant FP7-617805 LiveSoft.

Fig. 1. How to represent FIB table (a) on distributed switching platform (b).

Fig. 2. One- and two-stage forwarding implemented on existing distributed platforms [10], [12]; lookups are done on all address bits.

In contrast to one-stage forwarding, where output ports are already computed at ingress LCs, ingress LCs in two-stage forwarding only have to choose which egress LC to send a packet to (see Fig. 2b). Egress LCs then have to choose a specific output port by running an additional FIB lookup on the whole destination address for every packet. This mitigates memory constraints on ingress LCs, and each egress LC now contains only prefixes corresponding to its output ports. As a concrete router example, different generations of Cisco CRS-1 [9] LCs implement two-stage forwarding.

FIB tables can be represented in different ways. Informally, we say that two representations K and K′ are equivalent if for any input header both K and K′ yield the same action.


Different equivalent FIB representations can optimize distinct objectives, e.g., minimize required memory or the number of entries [7], [13]. The above-mentioned one- and two-stage forwarding representations are equivalent to the FIB representation on a virtual switch (see Fig. 1a) and are currently used in real distributed platforms [9], [10]. Note that both one- and two-stage forwarding use all bits of addresses, which severely limits efficient implementations, especially on ingress LCs.

The primary contribution of this work is to exploit structural FIB properties to balance the bits that are looked up between ingress and egress LCs, minimizing required memory and lookup time. This is not a specific classifier implementation; instead, we propose an abstraction layer that defines the subsets of bit indices that should be involved in the lookup process on ingress and egress LCs respectively. As a result, a classifier based on these bit indices can be transparently represented by other schemes (e.g., [7], [14]). Our abstraction based on the proposed structural properties generalizes beyond IPv4, IPv6, and flow-based FIB representations, leaving network-wide control plane signaling and the underlying network infrastructure intact: FIBs are given and computed by any existing routing protocol. In contrast to compact routing, our approach does not affect routing decisions based on the requested address space (i.e., traffic not dropped by the original FIB representation), yet our methods can achieve significant memory savings without increasing lookup time (actually improving it) and are applicable even if input FIBs have already been optimized. As a byproduct, we show a counter-intuitive result: IPv6 FIBs can be implemented on existing IPv4 or MPLS hardware- or software-based FIB implementations without increasing lookup time or memory requirements or requiring changes in hardware. Apart from prefix-based classifiers with longest-prefix-match (LPM) priorities, we also consider the general case of classifiers applicable to clean-slate architectures.

The paper is organized as follows. Section II introduces the basic model; Section III introduces the notion of MATCH EQUIVALENCE and shows how to use its three flavors for efficient FIB implementations on distributed systems. In Section IV, we propose memory minimization algorithms for FIBs based on MATCH EQUIVALENCE; Section V discusses dynamic updates. Section VI demonstrates 50-80% savings over optimized equivalent solutions achieved by state-of-the-art Boolean minimization techniques on real FIBs, both IPv4 and IPv6, and Section VII compares lookup times for the proposed representations against baseline IPv4 and IPv6 implementations in the Intel Data Plane Development Kit (DPDK) [15]. Besides achieving substantial memory savings, the proposed representations are also approximately twice as fast as the baseline DPDK IPv6 implementation and on par with the baseline IPv4 implementation. Section VIII discusses related work, and we conclude with Section IX.

II. MODEL DESCRIPTION

A packet header H = (h1, ..., hw) is a sequence of w bits from a packet and internal databases of network elements, where each bit hi of H has a value of zero or one, 1 ≤ i ≤ w.

We denote by W the ordered set of w indices of the bits in headers (i.e., (1, ..., w)). A classifier K = (R1, ..., RN) is an ordered set of rules, where each rule Ri consists of a filter Fi and a pointer to the corresponding action Ai. A filter F is an ordered set of w values corresponding to the bits in headers. Possible values are 0, 1, and ∗ (“don't care”). A header H matches a filter F if for any bit of H, the corresponding bit of F has the same value or ∗. A header H matches a rule Ri if Ri's filter is matched. The set of rules has a non-cyclic ordering ≺; if a header matches both Ri and Rj for Ri ≺ Rj, the action of rule Ri is applied. We say that two filters F and F′ of a classifier intersect if there is at least one header that matches both F and F′; otherwise, F and F′ are disjoint. For instance, for w = 4, F1 = (1 0 0 ∗), F2 = (0 1 ∗ ∗), F3 = (1 ∗ ∗ ∗), F1 and F3 intersect (for instance, the header (1 0 0 0) matches both filters), while F1 and F2 are disjoint.
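The matching semantics above can be sketched in a few lines. This is our own illustration, not code from the paper: filters are encoded as strings over {'0', '1', '*'}, and the first matching rule in the order wins.

```python
def matches(header: str, filt: str) -> bool:
    """A header matches a filter if every non-'*' filter bit equals the header bit."""
    return all(f in ('*', h) for h, f in zip(header, filt))

def classify(rules, header):
    """rules: ordered list of (filter, action); the highest-priority match wins."""
    for filt, action in rules:
        if matches(header, filt):
            return action
    return None  # no rule matches

def disjoint(f1: str, f2: str) -> bool:
    """Two filters are disjoint iff some position fixes conflicting concrete values."""
    return any(a != b and a != '*' and b != '*' for a, b in zip(f1, f2))

# The w = 4 example from the text: F1 and F3 intersect, F1 and F2 are disjoint.
F1, F2, F3 = '100*', '01**', '1***'
```

For instance, the header 1000 matches both F1 and F3, while no header can match both F1 and F2 (they disagree in their first two fixed bits).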

Let B be a set of bit indices, B ⊆ W, referring to a subset of the bits in packet headers. For a header H, we denote by HB the (sub)header of |B| bits obtained by taking only bits of H (in their original order in W) with indices in B. Likewise, for a rule R or a filter F we denote by RB and FB, respectively, the (sub)rule and the (sub)filter defined on |B| bits by removing the bits from W \ B. Finally, for a classifier K = (R1, ..., RN), let KB be the (sub)classifier obtained from K by replacing each rule R in the classifier by RB. The notions of matching, intersection, and disjointness carry over to such subsequences. Similarly, for a header H, we denote by H−B the header of w − |B| bits obtained from H by considering only bits with indices in W \ B. Likewise, let R−B and F−B be the rule and the filter, respectively, defined on w − |B| bits by eliminating the requirement on the bits in B for a rule R or its filter F. Let K−B be the classifier obtained by replacing each rule R in a classifier K by R−B. We denote the set of all filters in K based on the ordered set of bit indices B ⊆ W by FB; F = FW. We denote the set of all actions in K by A.

Consider a classifier K with a classification function f : {0,1}^w → A. For a set of bit indices B ⊆ W, we say that a classifier K−B with classification function g : {0,1}^(w−|B|) → A is an equivalent representation of K if f(H) = g(H−B) for every header H ∈ {0,1}^w.
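The projection notation and the equivalence definition can be checked mechanically for small w. The sketch below is our own (not the paper's); for convenience it is phrased in terms of the kept bit set rather than the removed set B, and uses 0-based indices.

```python
from itertools import product

def matches(header, filt):
    return all(f in ('*', h) for h, f in zip(header, filt))

def project(s: str, keep) -> str:
    """Keep only the positions in `keep`, preserving their original order."""
    return ''.join(s[i] for i in keep)

def classify(rules, header):
    for filt, action in rules:
        if matches(header, filt):
            return action
    return None

def is_equivalent(rules, reduced_rules, keep, w) -> bool:
    """Brute-force check of the definition: f(H) = g(H_kept) for all 2^w headers."""
    for bits in product('01', repeat=w):
        h = ''.join(bits)
        if classify(rules, h) != classify(reduced_rules, project(h, keep)):
            return False
    return True
```

For example, a w = 2 classifier whose action depends only on the first bit is equivalent to its projection onto that bit alone.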

III. FIB REPRESENTATIONS

Before we introduce representations based on the proposed structural properties formally, we present an illustrative example (Fig. 1a) to motivate our contributions.

A. Traditional Forwarding Representations

Consider a virtual switch (Fig. 1b) representing a distributed system. The major drawback of the one- and two-stage forwarding approaches is the fact that all bits of a lookup address are represented in FIB tables. In particular, in the case of one-stage forwarding (Fig. 2a), every ingress LC requires five 5-bit entries (not including a default entry) without an additional FIB table on egress cards, so the total memory needed for eight slots is 6 × 5 × 8 = 240 bits, and there is a single FIB lookup per packet, done on ingress LCs.


Fig. 3. Equivalent FIB representations with reduced address bits.

In the two-stage forwarding case (Fig. 2b, in our recurring example), actions A1 and A2 for output ports are unified to some “action” LC1, and actions for output ports A3 and A4 to LC2 on ingress LCs. This allows us to further compress FIB entries on ingress LCs, uniting for instance prefixes 01000 and 01001 with the same action LC1 into a filter 0100∗. So every ingress LC requires three 5-bit entries (not including a default entry). In addition, each of the egress LCs LC1 and LC2 requires two 5-bit entries. The total memory needed on all ingress and egress LCs in our example is 4 × 5 × 8 + 2 × 5 × 2 = 180 bits.

B. Balancing Bits between Ingress and Egress

Note that interconnecting switch fabrics should support full line rate for any combination of ingress and egress LCs, and therefore should not become a bottleneck. Also, FIB lookups on egress LCs in the case of two-stage forwarding (or dereferencing of a pointer found on ingress in the case of one-stage forwarding) are usually the first thing to be completed on egress LCs to determine further processing of packets (e.g., the pipeline architecture of the packet processing engine in the Cisco C12000 router [16]). As long as these invariants are maintained, we can move away from equivalent representations on the LC level to equivalent representations within the boundaries of the same distributed system by balancing between evaluating bits of a represented FIB table on ingress LCs versus egress LCs. In the following, we propose several structural properties of classifiers and show how to exploit them to find more efficient representations of FIBs on distributed switching platforms.

C. Match Equivalence

When we reduce a classifier's width, we often get a classifier that does not preserve the semantics (we do not have strict equivalence). However, removing a given bit index from a classifier is in essence equivalent to changing the bit's value to ∗ in all filters, so reducing width cannot decrease the set of headers covered by filters. Hence, for this relaxation we just need to guarantee a correct match of headers that are matched by the original classifier. We introduce a novel property, MATCH EQUIVALENCE, to improve classifier efficiency. This property relaxes the classification function by keeping only bit indices that preserve the original classifier K's action for any header matched by K. Formally, a classifier KB is MATCH EQUIVALENT (ME) to classifier KW, B ⊆ W, if the action for any matched header HW in KW coincides with the action for the corresponding header HB in KB. If HW has no match in KW, then HB can either be matched or remain unmatched in KB.

Problem 1 (Minimal MATCH EQUIVALENCE). For a given classifier K, find a MATCH EQUIVALENT classifier to K whose filters require minimal memory (in bits).

This definition of MATCH EQUIVALENCE does not directly yield algorithms for the MINME problem. Next we propose three different flavors of MATCH EQUIVALENCE. In Section VI we compare the impact of these structural properties on the desired objective.

D. Filter-order Independence

First we consider a family of MATCH EQUIVALENT classifiers that exploit filter-order independence. Classifiers whose filters are all pairwise disjoint (i.e., no two filters match the same header) are filter-order-independent. Intuitively, representations based on filter-order independence (alone) do not take into account the number of different actions. Kogan et al. [17] originally demonstrated the effect of this characteristic on classifiers whose fields are represented by ranges; the impact of filter-order independence on FIB representations was left unaddressed. Filter-order independence guarantees that only a single filter is matched, so to build an equivalent solution we need to check on egress LCs whether the remaining bits of the packet (those that did not participate in the classification process on ingress) still match an admissible packet (see Fig. 3a). If yes, the forwarded packet is sent out through the corresponding output port; otherwise it is dropped, similarly to the representations in Fig. 2. Observe that in this case there is no lookup table on egress LCs and the remaining bits are fetched from the pointer found on ingress, so filter-order independence is a good fit for one-stage forwarding. Every packet matching the ingress FIB (except the default entry) in Fig. 2a (which implements one-stage forwarding) triggers the same action as in the compressed ingress FIB in Fig. 3a. But now packets previously matched by the default entry in the ingress FIB in Fig. 2a may be matched by the compressed ingress FIB in Fig. 3a and are dropped only on egress LCs.
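The filter-order independence test reduces to pairwise disjointness, which is easy to sketch (our own encoding, filters as strings over {'0', '1', '*'}):

```python
from itertools import combinations

def disjoint(f1: str, f2: str) -> bool:
    """Disjoint iff some position fixes conflicting concrete values in both filters."""
    return any(a != b and '*' not in (a, b) for a, b in zip(f1, f2))

def filter_order_independent(filters) -> bool:
    """A classifier is filter-order-independent iff its filters are pairwise disjoint."""
    return all(disjoint(f1, f2) for f1, f2 in combinations(filters, 2))
```

On the w = 4 filters from Section II, {F1, F2} = {100∗, 01∗∗} is filter-order-independent, while adding F3 = 1∗∗∗ breaks the property because F1 and F3 intersect.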


Fig. 4. Non-equivalent FIB representations with reduced address bits.

Still, this representation is equivalent within the system boundaries and allows us to balance lookup time and memory requirements between ingress and egress LCs. In the running example, the prefixes of the given FIB are filter-order-independent, and the FIB on every ingress LC can be reduced to only five 3-bit entries (see Fig. 3a) plus a default entry, with the total memory requirement for this representation being 6 × 3 × 8 + 2 × 2 × 2 = 152 bits. As a significant side effect, this representation can fit IPv6 into already existing IPv4 or even MPLS ingress LCs. We demonstrate the feasibility of this representation in Section VI for real IPv6 FIBs. Note that filter-order independence can be too restrictive; one does not have to require it from rules with the same action.

E. Action-order Independence

The major drawback of filter-order independence is that it can only reduce the number of bit indices involved in FIB representations on ingress LCs; it cannot reduce the number of entries in these ingress tables.

To address this limitation, we introduce a family of MATCH EQUIVALENT classifiers that exploit action-order independence. This characteristic is based on the observation that a classification result really is the action of the rule with the highest priority, rather than the filter associated with this rule. Two rules with filters F^B_1 and F^B_2 are action-order-independent if either (a) they have the same action or (b) F^B_1 and F^B_2 are disjoint; otherwise, we say that the two rules are action-order-dependent. Similarly, a classifier K as a whole is called action-order-independent if every pair of its rules is action-order-independent; otherwise K is action-order-dependent. As a result, a classifier K′ that implements action-order independence on a subset B ⊂ W of bit indices of an originally action-order-independent classifier K matches the same headers as K, and thus K′ is MATCH EQUIVALENT to K with respect to B.
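The action-order independence condition on a kept bit subset can be sketched as follows (our own encoding and 0-based indices; rules may overlap only if they share an action):

```python
from itertools import combinations

def disjoint(f1, f2):
    return any(a != b and '*' not in (a, b) for a, b in zip(f1, f2))

def project(filt, keep):
    return ''.join(filt[i] for i in keep)

def action_order_independent(rules, keep) -> bool:
    """rules: ordered (filter, action) pairs; keep: retained bit indices (0-based).
    Every pair must either share an action or have disjoint projected filters."""
    return all(a1 == a2 or disjoint(project(f1, keep), project(f2, keep))
               for (f1, a1), (f2, a2) in combinations(rules, 2))
```

For instance, two overlapping rules are tolerated when they map to the same egress LC, but overlapping rules with different actions violate the property.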

Action-order independence allows us to further reduce the number of entries for filters with the same action. In this case we can reduce both the number of entries and the number of bit indices participating in the FIB lookup on ingress. The price for this is a slightly more complex “false-positive” check, and thus processing, on the egress LCs, which in contrast to filter-order independence requires a FIB lookup table based on a subset of bit indices (see Fig. 3b). Note that the two-stage forwarding approach with action-order independence yields an additional reduction in the number of entries on ingress, since the number of actions is reduced to the number of different LCs. In our example the memory requirement for this representation type is 3 × 2 × 8 + 3 × 3 + 3 × 4 = 69 bits.¹ Since this representation is more general than the previous one, it supports an even better balance of memory and lookup times between ingress and egress LCs. As we will show in a comprehensive comparison on real classifiers in Section VI, it suffices to achieve action-order independence on a subset of bit indices to find much more space-efficient representations of classifiers than representations that exploit filter-order independence.

F. Non-conflicting Rules

While exploiting action-order independence to implement MATCH EQUIVALENCE has a huge potential for memory savings (as demonstrated by our evaluations in Section VI), it becomes apparent that we need to better understand the applicability and expression of MATCH EQUIVALENCE. To that end, we suggest expressing the third type of MATCH EQUIVALENCE through the notion of conflicting rules. Two rules R^X_1 and R^X_2 with different actions, R^X_1 ≺ R^X_2, are conflicting with respect to bit indices B ⊂ X ⊆ W if there is a header H^X matching R^X_2 that is not matched by R^X_1 while R^B_1 matches H^B.

Example 1. In the following classifier with |X| = |W| = 3

          #1 #2 #3
  R^X_1  ( ∗  1  0 ) → A1
  R^X_2  ( 1  1  ∗ ) → A2

R^X_1 and R^X_2 are conflicting with respect to B = {1, 2}: for the header H^X = (1 1 1), R^X_2 matches H^X while R^X_1 does not, and yet R^B_1 matches H^B = (1 1). The projection onto B = {1, 2} is

          #1 #2
  R^B_1  ( ∗  1 ) → A1
  R^B_2  ( 1  1 ) → A2

On the other hand, for B = {2, 3}, the rules R^X_1 and R^X_2 are not conflicting with respect to B.

¹Note that only in this specific example no default entry is required on ingress LCs, since all combinations of values are already covered.


          #2 #3
  R^B_1  ( 1  0 ) → A1
  R^B_2  ( 1  ∗ ) → A2

A classifier K^B is MATCH EQUIVALENT to a classifier K^X with respect to B, B ⊂ X, if no two rules R^X_1 ≺ R^X_2 in K^X with different actions are conflicting with respect to B.

For our running example, the FIB representation based on non-conflicting rules requires two 2-bit entries on each ingress LC (see Fig. 3c), with a total memory requirement of 2 × 2 × 8 + 3 × 3 + 3 × 5 = 56 bits.

The consequences of all three types of MATCH EQUIVALENCE that we consider are counter-intuitive. For equivalent representations on the LC level, more bit indices in a given classifier with a fixed number of filters implies a bigger memory footprint (e.g., IPv6 or OpenFlow versus IPv4). This is not the case for MATCH EQUIVALENCE. Representing an ingress FIB table with N actions requires at least N · log N bits, since we need at least one filter for every action and log N bits are required to distinguish among N actions. For MATCH EQUIVALENCE, IPv6 and OpenFlow have better chances than IPv4 to contain bit indices that distinguish between actions. This allows us not only to significantly minimize the memory footprint versus optimal equivalent representations on the LC level, but also to implement ingress IPv6-based FIBs or other clean-slate solutions (e.g., OpenFlow) on existing IPv4 or even MPLS infrastructures without increasing memory requirements and lookup time, transparently to the hardware. We will demonstrate this effect in Section VI. Fig. 3 summarizes all proposed representations, which are at the system level equivalent to the one- and two-stage forwarding representations in Fig. 2 and introduce substantial savings on ingress and egress LCs.
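The N · log N lower bound above can be spelled out as a one-line sketch (ours, not the paper's): one filter per action, and ceil(log2 N) bits per filter to tell N actions apart.

```python
import math

def min_ingress_bits(num_actions: int) -> int:
    """Lower bound on ingress FIB memory: one filter per action,
    ceil(log2 N) distinguishing bits per filter."""
    return num_actions * math.ceil(math.log2(num_actions))
```

For example, 4 actions need at least 4 × 2 = 8 bits, and 8 actions at least 8 × 3 = 24 bits, regardless of the original address width.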

Having illustrated the power of the proposed representations, let us now proceed with algorithms that compute them.

IV. ALGORITHMS FOR MATCH EQUIVALENCE

In this section, we propose algorithms that compute optimized MATCH EQUIVALENT classifiers. Algorithm 1 tests whether one can keep only a subset of bit indices B in a classifier, given the set of filters F^X, B ⊂ X, while preserving MATCH EQUIVALENCE on the bit indices of B; to test this, one simply needs to check that no two rules with different actions have a conflict after removing the X \ B bit indices. Due to space constraints we present only ISME based on non-conflicting rules; for the other two characteristics we can substitute conflicting rules with filter- or action-order dependence.

Algorithm 1 ISME(F^X, B)
1: for every F^X_1 ∈ F^X do
2:   for every F^X_2 ∈ F^X s.t. F^X_1 ≺ F^X_2 and A(R1) ≠ A(R2) do
3:     if F^X_1 and F^X_2 are conflicting with respect to B then
4:       return False
5: return True

So all algorithms proposed below check whether removing a given bit still retains MATCH EQUIVALENCE; they differ in how they choose which bit to remove.
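A brute-force sketch of the conflict test and ISME may help make the definition concrete. This is our own illustration (0-based indices, so the paper's B = {1, 2} becomes [0, 1]); a real implementation would avoid enumerating all headers.

```python
from itertools import product

def matches(header, filt):
    return all(f in ('*', h) for h, f in zip(header, filt))

def project(s, keep):
    return ''.join(s[i] for i in keep)

def conflicting(r1, r2, keep, w) -> bool:
    """r1 precedes r2. They conflict w.r.t. `keep` iff some header matches r2
    but not r1, while r1's projection matches the projected header."""
    f1, f2 = r1[0], r2[0]
    for bits in product('01', repeat=w):
        h = ''.join(bits)
        if (matches(h, f2) and not matches(h, f1)
                and matches(project(h, keep), project(f1, keep))):
            return True
    return False

def is_me(rules, keep, w) -> bool:
    """ISME: no two rules with different actions conflict w.r.t. `keep`."""
    for i, r1 in enumerate(rules):
        for r2 in rules[i + 1:]:
            if r1[1] != r2[1] and conflicting(r1, r2, keep, w):
                return False
    return True

# Example 1 from the text: conflict on bits {1, 2}, but not on bits {2, 3}.
rules = [('*10', 'A1'), ('11*', 'A2')]
```

Running it on Example 1 confirms the claim: keeping bits {1, 2} (0-based [0, 1]) creates a conflict, while keeping {2, 3} (0-based [1, 2]) preserves MATCH EQUIVALENCE.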

Some of the proposed heuristics that compute MATCH EQUIVALENT representations use, as underlying building blocks, algorithms that optimize equivalent representations. We use a Boolean minimization technique based on ordered AC⁰ circuits that allows representing order-dependent classifiers [18]. In what follows we denote it by the function SE() (strict equivalence). This function computes an equivalent representation in minimal memory. In Section VI we use an implementation of SE() that is based on minimization of Boolean expressions with O(N³ · w) running time.

Algorithm 2 MINME(K^X)
1: B ← X, I^B ← K^X
2: repeat
3:   B′ ← B
4:   for each j ∈ B do
5:     if ISME(I^(B\{j})) then
6:       j* ← arg min_j SE(I^(B\{j}))
7:   B ← B \ {j*}, I^B ← SE(I^B)
8: until B = B′
9: return B, I^B

Algorithm 2 considers bit indices that can be removed while preserving MATCH EQUIVALENCE. At each step, it removes one bit from all filters; the bit is selected to maximize the memory savings obtained by the reduction. The algorithm continues as long as there is a bit that can be removed while preserving MATCH EQUIVALENCE.

Example 2. Consider the following classifier:

   #1 #2 #3 #4
  ( 0  1  0  0 ) → A1
  ( 0  1  1  1 ) → A2
  ( 1  1  1  0 ) → A2

The proposed heuristic MINME() will consider three bits that can potentially be removed in the first step: #1, #2, and #4. Among these bits, removing #2 does not lead to further memory reductions, while removing, say, #1 leads to F^(−{1})_2 = (1 1 1) and F^(−{1})_3 = (1 1 0), which can now be united by resolution to F′ = (1 1 ∗); similarly, removing #4 leads to F′ = (∗ 1 1) on bits 1-3. Therefore, the heuristic chooses either #1 or #4, say #1, leading to the following MATCH EQUIVALENT classifier:

   #2 #3 #4
  ( 1  0  0 ) → A1
  ( 1  1  ∗ ) → A2

It is impossible to further reduce the set of rules (all rules now have different actions), so the proposed heuristic has no preference in subsequently removing bits #2 and #4; after removing them, we get the final representation based only on bit #3.
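A simplified sketch of the greedy bit removal (our own, not the paper's implementation): it repeatedly drops any bit whose removal keeps the classifier MATCH EQUIVALENT, checked by brute force, and omits the SE()-based tie-breaking of Algorithm 2. On Example 2's classifier it reduces the kept bit set down to bit #3 alone (0-based index 2).

```python
from itertools import product

def matches(header, filt):
    return all(f in ('*', h) for h, f in zip(header, filt))

def project(s, keep):
    return ''.join(s[i] for i in keep)

def classify(rules, header):
    for filt, action in rules:
        if matches(header, filt):
            return action
    return None

def match_equivalent(rules, keep, w) -> bool:
    """Every header matched by the original classifier keeps its action;
    previously unmatched headers are unconstrained (the ME relaxation)."""
    proj_rules = [(project(f, keep), a) for f, a in rules]
    for bits in product('01', repeat=w):
        h = ''.join(bits)
        a = classify(rules, h)
        if a is not None and classify(proj_rules, project(h, keep)) != a:
            return False
    return True

def greedy_min_me(rules, w):
    """Greedily drop bits while MATCH EQUIVALENCE is preserved."""
    keep = list(range(w))
    changed = True
    while changed:
        changed = False
        for j in list(keep):
            cand = [i for i in keep if i != j]
            if match_equivalent(rules, cand, w):
                keep = cand
                changed = True
    return keep

# Example 2's classifier (bits are 0-based here).
rules = [('0100', 'A1'), ('0111', 'A2'), ('1110', 'A2')]
```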

To make this representation more efficient, we propose to find a MATCH EQUIVALENT classifier on a subset of the rules of a given classifier. To do so, we assign all rules to two disjoint groups I and D, where I^B remains MATCH EQUIVALENT to I for some subset of bit indices B ⊆ W, and D = K \ I contains all other rules of the original classifier. The header of an incoming packet is matched in both groups of rules and


the rule with the highest priority is chosen. This separation proves to be very fruitful for size reduction, as we will see shortly in the experimental evaluation.

Different systems have different memory types with constraints on the supported classification. For instance, a network element can have limited resources to represent 128-bit classifiers directly (e.g., TCAMs). Note that the D subset may be stored in a different memory unit than the bulk of the classifier I^B. Hence, we introduce an additional constraint l on the maximal value of |B| that can be used for I^B.

Example 3. In the following classifier

   #1 #2 #3 #4 #5 #6
  ( 0  0  1  0  0  0 ) → A1
  ( 0  0  0  1  0  0 ) → A2
  ( 1  0  0  0  1  0 ) → A3
  ( 1  1  0  0  0  1 ) → A4
  ( ∗  ∗  0  0  0  0 ) → A5

R5 does not let us remove bits 3-6 because it conflicts with rules R1-R4 with respect to these bits. However, once we set D = {R5}, the remaining I^({1,2}) becomes MATCH EQUIVALENT to the following classifier on bits {1, 2}: (0 0) → A1, (0 0) → A2, (1 0) → A3, (1 1) → A4.

Problem 2 (MINME). Given a classifier K on a set of bits W and a positive number l ≤ w, find a subset of bits B ⊆ W, |B| ≤ l, and a subset of rules I ⊆ K such that I_B ∪ D for D = K \ I is MATCH EQUIVALENT to K and minimizes the value of |B| · |I| + w · |D|.
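As a worked instance of the objective (an illustrative sketch, not part of the paper): for Example 3 with w = 6, the split B = {1, 2}, I = {R1, ..., R4}, D = {R5} costs |B| · |I| + w · |D| = 2 · 4 + 6 · 1 = 14 bits of rule storage, versus w · |K| = 30 for the unsplit classifier.

```python
# Illustrative sketch: the MINME objective |B|*|I| + w*|D| for a candidate
# split, compared against storing the whole classifier at full width.

def minme_cost(w, B, I, D):
    return len(B) * len(I) + w * len(D)

# Example 3: w = 6, B = {1, 2}, I = {R1..R4}, D = {R5}
w = 6
cost = minme_cost(w, B={1, 2}, I=['R1', 'R2', 'R3', 'R4'], D=['R5'])  # 14
full = w * 5  # storing all five rules at full width: 30
```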

Algorithm 3 MINME(K_X, l)
 1: D_0 ← ∅, I_0 ← K, B ← X, B_0 ← X
 2: for i = 1..w do
 3:   for each j ∈ B do
 4:     D_{i,j} ← ∅
 5:     for each R ∈ I do
 6:       for R′ ∈ K, R′ ≺ R do
 7:         if R, R′ conflict w.r.t. B \ {j} then D_{i,j} ← D_{i,j} ∪ {R}
 8:   j* ← arg min_j |D_{i,j}|
 9:   D_i ← SE(D_{i−1} ∪ D_{i,j*}), I_i ← SE(K \ D_i)
10:   B ← B \ {j*}, I ← I_B, K ← K_B
11: i* ← arg min_{i ≥ w−l} (w · |D_i| + (w − i) · |I_i|)
12: return i*, D_{i*}, I_{i*}

Algorithm 3 removes at least w − l bits by greedily minimizing |D|, which is equivalent to minimizing the value of |B| · |I_B| + w · |D|.
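A simplified rendering of this greedy strategy in Python (a sketch, not the paper's exact Algorithm 3: the conflict test below is a conservative stand-in that sends a rule to D whenever an earlier, differently-acting rule becomes indistinguishable from it on the remaining bits, and the SE() calls are omitted):

```python
# Sketch of the greedy bit-removal loop: at each step, drop the bit whose
# removal forces the fewest rules into D. Rules are (pattern, action) pairs,
# highest priority first; patterns are strings over '0', '1', '*'.

def intersect(p1, p2):
    """True if two '*'-patterns can match a common header."""
    return all(a == '*' or b == '*' or a == b for a, b in zip(p1, p2))

def restrict(pat, B):
    return ''.join(pat[i] for i in sorted(B))

def forced_to_D(rules, B):
    """Indices of rules shadowed on bits B by an earlier, differently-acting rule."""
    out = set()
    for i, (pat, act) in enumerate(rules):
        for pat2, act2 in rules[:i]:
            if act2 != act and intersect(restrict(pat, B), restrict(pat2, B)):
                out.add(i)
                break
    return out

def greedy_minme(rules, w, l):
    """Drop bits until |B| <= l; return (B, indices of I, indices of D)."""
    B = set(range(w))
    while len(B) > l:
        # remove the bit whose removal sends the fewest rules to D
        j = min(B, key=lambda j: len(forced_to_D(rules, B - {j})))
        B -= {j}
    D = forced_to_D(rules, B)
    I = [i for i in range(len(rules)) if i not in D]
    return B, I, sorted(D)
```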

Property 1. For a classifier with N rules and headers of w bits, the time complexity of MINME() is O(N^3 · w^2).

Network elements may be able to perform different numbers of lookups in parallel. In what follows, we explore how the level of parallelism β affects representation efficiency.

Problem 3 (PARMINME). Given a classifier K on bits W and two positive numbers l ≤ w and β, find β subsets of bits B_i, i = 1..β, and assign the rules to β disjoint groups I_i ⊆ K and D = K \ ∪_{i=1}^{β} I_i, such that ∪_{i=1}^{β} I_i ∪ D is MATCH EQUIVALENT to K, and Σ_{i=1}^{β} |B_i| · |I_i| + w · |D| is minimized.

Algorithm 4 PARMINME(K, l, β)
1: D ← K, I_0 ← ∅, k ← 0
2: while ∪_{s=0}^{k} I_s ≠ K and k ≤ β do
3:   k ← k + 1
4:   D ← SE(D)
5:   i_k, D, I_k ← MINME(D, l)
6: return k, {(i_s, I_s)}_{s=1}^{k}, D

Algorithm 4 shows the basic idea of greedy multi-group optimization with MATCH EQUIVALENCE. Note that we apply SE() to D on each iteration because rules from I might have prevented reductions that now become possible.
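The multi-group loop can be sketched as follows (illustrative only: `split_once` is a placeholder with MINME's interface that simply peels off the largest same-action subset; any splitting routine with the same interface could be plugged in, and the SE() reductions are omitted):

```python
# Sketch of Algorithm 4's outer loop: repeatedly split the still-unassigned
# rules until either all rules are grouped or beta groups exist; whatever
# remains stays in D. Rules are (pattern, action) pairs.

from collections import Counter

def split_once(rules):
    """Return (I, D): one group plus the remaining rules (stand-in for MINME)."""
    if not rules:
        return [], []
    top = Counter(act for _, act in rules).most_common(1)[0][0]
    I = [r for r in rules if r[1] == top]
    D = [r for r in rules if r[1] != top]
    return I, D

def par_minme(rules, beta):
    groups, D = [], list(rules)
    while D and len(groups) < beta:
        I, D = split_once(D)    # a real implementation would call MINME here
        groups.append(I)
    return groups, D
```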

Property 2. For a classifier with N rules and w-bit headers, the time complexity of PARMINME() is O(β · N^3 · w^2).

In Section VI we empirically compare the efficiency of classifiers based on various structural characteristics.

V. DYNAMIC UPDATES

Another practical consideration is to enable dynamic updates, since new rules may appear and old rules may become obsolete; this section studies dynamic update heuristics. A rule can be simply removed from the I part; for D, it depends on the representation. Inserting a rule may be more complicated: if an inserted rule R breaks MATCH EQUIVALENCE with the current I, we have to insert R into D. If adding R to one of the groups does not break MATCH EQUIVALENCE on the current set of bit indices, we add R to that group. We can begin to recompute both I and D in the background, but we still face a tradeoff between the current optimality of the I representation and the time needed to insert another rule. If a rule R in I is modified only in w − |B| bits, no further updates to the abstraction layer are required. If the change involves a bit required to retain MATCH EQUIVALENCE of a specific group, we have to make sure for every group that its rules do not conflict in I after the change. If they do not, we can simply update R in the representation of the corresponding group; if R conflicts with some other rule, we can move the modified rule to D. The selection of bits that distinguish among rules can be recalculated offline to accommodate recently added rules. As in the rule insertion case, modification of a rule in D depends on the representation of D.
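A hedged sketch of this insertion policy (illustrative names; the conflict test is a conservative stand-in for the full MATCH EQUIVALENCE check):

```python
# Sketch of rule insertion: try to add a new rule to an existing group if,
# on that group's bit subset, it does not become indistinguishable from a
# differently-acting rule; otherwise fall back to the full-width set D.

def intersect(p1, p2):
    return all(a == '*' or b == '*' or a == b for a, b in zip(p1, p2))

def restrict(pat, B):
    return ''.join(pat[i] for i in sorted(B))

def insert_rule(rule, groups, D):
    """groups: list of (B, rules); rule: (pattern, action)."""
    pat, act = rule
    for B, rules in groups:
        clash = any(act2 != act and intersect(restrict(pat, B), restrict(p2, B))
                    for p2, act2 in rules)
        if not clash:
            rules.append(rule)
            return 'group'
    D.append(rule)   # recomputation of the split can happen in the background
    return 'D'
```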

VI. EVALUATION

All our algorithms are applicable to multi-field classifiers in the general case, but due to the lack of scalable real classifiers for this case we evaluate the impact of our results on IP and IPv6 FIB classifiers. We compare the results of three different types of MATCH EQUIVALENCE with the algorithms SE() (minimization of Boolean expressions) from Section IV, not only to compare results with MATCH EQUIVALENCE but also to see additional memory reductions when MINME() is applied to already optimized equivalent representations.

We have analyzed more than 100 real IP FIBs ranging in size from 400 kbits to 16 Mbits, provided by various sources including large ISPs (unfortunately, specifics omitted due to


             Original           One-stage (Fig. 2a)   Two-stage (Fig. 2b)
                                                      ingress            egress
   Actions   Rules    Size      Rules    Size         Rules    Size      Rules    Size
   IPv4 benchmarks
1    194    410454  13134.5    146164   4677.2       123096   3939      145090   4642.8
2      3    410513  13136.4     94496   3023.8        95859   3067.4     95869   3067.8
3    101    502023  16064.7    102281   3272.9       103166   3301.3    103727   3319.2
4    154     14319    458.2      8401    268.8         7857    251.4      8441    270.1
   IPv4 benchmarks with random padding to 128 bits
1    194    410454  52538.1    410454  52538.1       410454  52538.1    410454  52538
2      3    410513  52545.6    410513  52545.6       410513  52545.6    410513  52545.5
3    101    502023  64258.9    502023  64258.9       502023  64258.9    502023  64258.9
4    154     14319   1832.8     14319   1832.8        14319   1832.8     14319   1832.8
   IPv6 benchmarks
1     36      1476    188.9      1476    188.9         1476    188.9      1476    188.9
2     30      1477    189        1477    189           1477    189        1477    189
3      8      1475    188.8      1475    188.8         1475    188.8      1475    188.8
4     20      1476    188.9      1476    188.9         1476    188.9      1476    188.9

TABLE I
ONE-STAGE FORWARDING (FIG. 2A) AND TWO-STAGE FORWARDING (FIG. 2B).

NDAs); the tables of evaluation results show only a few characteristic samples due to space constraints. Our purpose was to show that with the proposed approach one can actually implement IPv6 classifiers on existing IPv4 infrastructure, so we used IPv6 benchmarks generated in two different ways. The third, bottom part of every evaluation table shows our results for IPv6 FIBs from the Internet2 project [19]; again, due to space constraints we only show a few representative examples. Since infrastructural constraints have, to a certain extent, held back the advancement of the IPv6 protocol, available IPv6 benchmarks are relatively small and may not reflect the full extent to which the extra address space can be used. We have generated benchmarks that would correspond to a mature and widely used IPv6 protocol by randomly padding our IPv4 benchmarks up to the length of 128 bits.
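The padding procedure is not spelled out in detail; one plausible construction, shown here only as a sketch, keeps the original IPv4 prefix bits, appends random specified bits, and wildcards the remainder of the 128-bit width:

```python
import random

# Illustrative sketch of padding an IPv4 FIB entry to a 128-bit rule: the
# original prefix bits are kept, a block of random bits lengthens the
# specified part, and the rest is left wildcarded. The parameter choices
# (e.g., 96 extra specified bits) are assumptions, not the paper's values.

def pad_prefix(prefix_bits, target_width=128, extra_specified=96, rng=random):
    """prefix_bits: specified bits of an IPv4 prefix, e.g. '00001010' for 10/8."""
    padding = ''.join(rng.choice('01') for _ in range(extra_specified))
    specified = prefix_bits + padding
    return specified + '*' * (target_width - len(specified))
```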

We organize evaluation tables in parallel with Figures 2, 3, and 4. Table I shows FIB sizes for one- and two-stage forwarding schemes implemented on existing distributed platforms (Fig. 2). We have applied Boolean heuristics for equivalent FIB reduction to the original classifier (one-stage, Fig. 2a), to the original classifier with two actions corresponding to different linecards, and to the total size of two egress classifiers for the two LCs (two-stage, Fig. 2b). We see that equivalent heuristics can reduce IPv4 classifiers by a factor of 2-3 and ingress classifiers in two-stage forwarding by an additional 10-30%. However, there is virtually no reduction for both original IPv6 classifiers and IPv4 classifiers randomly padded to IPv6 size.

Table II shows equivalent FIB representations as shown in Fig. 3. For ingress classifiers, in every case we show the width and size of the first group produced by PARMINME (these results correspond to two parallel lookups) and the total final number of groups and final size reduced by PARMINME; for RX classifiers, we only show the total final size. As we can see from Table II, on IPv4 benchmarks the PARMINME algorithm provides an additional reduction by a factor of 1.5-2 compared to the results of Boolean equivalent heuristics; in some cases, there is a big difference between classifiers with the original actions and ingress classifiers with actions for linecards, a difference that could not be exploited as well by Boolean

       Input           One-stage      Group 1                     Group 2                     #    Final
   A.  Rules   Size    Rules   Size   W.   |I|  |D|   Tot. size   W.   |I|  |D|   Tot. size   gr.  size
1  36   1476  188.9     1131  144.7   13   625  506     72.8      13   251  255     66.3      6    59.7
2  30   1477  189       1150  147.2   13   584  566     80        13   234  331     73.9      6    65.4
3   8   1475  188.8     1066  136.4   13   681  385     58.1      13   226  157     52.2      5    48.1
4  20   1476  188.9     1080  138.2   13   673  407     60.8      13   251  155     54.3      5    50.3

TABLE V
SIMULATION RESULTS, IPV6 INTERNET2: MAX. 13 BIT IDENTITIES PER GROUP, FIB REPRESENTATION BASED ON NON-CONFL. RULES.

heuristics. As for the IPv6 classifiers, we see in Table II that here, already in equivalent FIB representations, we achieve a huge advantage over both original IP and IPv6 classifiers. This is an important property of our FIB representation schemes: adding more bit identities to the original FIBs does not increase the required memory footprint and may even help. In both Internet2 IPv6 benchmarks and IPv4 benchmarks padded to 128 bits, we see a further reduction by a factor of 6-7 compared to the representations from Table I for both ingress and egress classifiers.

Table IV summarizes all results. We see that the total size progressively decreases from left to right, from existing two-stage forwarding schemes to equivalent FIB representations and then to non-equivalent FIB representations. Total space reductions are by a factor of 3-5 for IP FIBs and by a factor of 8-10 for IPv6 FIBs. Note also that across all benchmarks, the first group's I covers a large majority of the rules, and the resulting total size is comparable to the total multigroup size after applying PARMINME throughout all heuristics. This suggests that even if only two parallel lookups are available, the proposed schemes improve hugely upon existing solutions. We will revisit the problem of parallel lookups in Section VII.

Table III deals with FIB representations that violate strict equivalence, as shown in Fig. 4. The remarks made above about Table II also apply here. In this case we use the same schemes for ingress classifiers as in Table II but achieve significant further space savings across all benchmarks in egress classifiers.

Throughout our experiments with IPv6 benchmarks, both padded IPv4 benchmarks and real IPv6 FIBs from the Internet2 project [19], we never encountered a group more than 32 bits wide. The first group usually has both the most rules and the most bits; note in Tables II and III how we can represent all IPv6 Internet2 classifiers with two groups no more than 16 bits wide and all padded IPv4 classifiers with groups about 20 bits wide. This means that in practice, one can use existing IPv4 infrastructure designed to handle 32-bit IPv4 FIBs to implement IPv6 classifiers. Our experiments suggest that IPv6 FIBs can be represented even on existing MPLS infrastructure, where MPLS lookup tables are usually implemented as tables with 2^13 entries [9]. These results, for the same Internet2 IPv6 benchmarks, are shown in Table V. Naturally, in this case there are more groups, but still only 5-6 groups are required in all cases.

VII. DPDK IMPLEMENTATION

The goal of this evaluation is to compare the multi-group classification approach, with independent “pseudo-parallel” lookups


Fig. 3a: one-stage ingress, filter-order independent, false positive check on egress
      Group 1                       #    Final     Egress
   W.    |I|     |D|     Size      gr.   size
   IPv4 benchmarks
1  22  129651  16513    3380.7      6   3215.6    3145.9
2  22   88978   5518    2134        5   2078.9    2044.2
3  22   95730   6551    2315.6      6   2250.1    2215.8
4  19    6987   1414     178        6    159.6     156.6
   IPv4 benchmarks with random padding to 128 bits
1  25  409543    911   10355.1      3  10293.0   10294.1
2  25  409229   1284   10395.0      3  10384.6   10386.1
3  26  499220   2803   13338.5      3  13178.4   13182.2
4  20   13720    599     351        3    286.3     286.8
   IPv6 benchmarks
1  15    1459     17      24        2     22.1      21.9
2  15    1461     16      23.9      2     22.1      22
3  16    1467      8      24.4      2     23.6      23.5
4  15    1463     13      23.6      2     22.1      22

Fig. 3b: two-stage forwarding, RX action-order independent, TX lookup
      Group 1                       #    Final     Egress
   W.    |I|     |D|     Size      gr.   size
   IPv4 benchmarks
1  19  100028  23068    2638.7      6   2338.8    2305.5
2   7   89935   5924     819.1      5    671       670.7
3   9   96161   7005    1089.6      7    928.4     952.7
4  14    6595   1262     132.7      6    109.9     109.5
   IPv4 benchmarks with random padding to 128 bits
1  24  409543    911    9945.6      3   9876      9882.3
2  25  410294    219   10285.3      3  10267.8   10270.8
3  25  495002   7021   13273.7      3  13048.0   13035.9
4  19   13951    368     312.1      3    272       272.6
   IPv6 benchmarks
1  14    1460     16      22.4      2     20.6      20.5
2  15    1470      7      22.9      2     22.1      22
3  14    1456     19      22.8      2     20.6      20.4
4  14    1460     16      22.4      2     20.6      20.4

Fig. 3c: two-stage forwarding, RX non-conflicting rules and TX lookup
      Group 1                       #    Final     Egress
   W.    |I|     |D|     Size      gr.   size
   IPv4 benchmarks
1  19  106085  17011    2559.9      5   2338.8    2280.3
2   6   88292   7567     771.8      4    575.1     566.9
3   9   94649   8517    1124.3      6    928.4     941.8
4  13    6364   1493     130.5      5    102.1     100.6
   IPv4 benchmarks with random padding to 128 bits
1  24  409757    697    9923.3      3   9850.8    9851.2
2  24  410294    219    9875        2   9852.3    9901
3  24  492774   9249   13010.4      3  12048.5   12149.2
4  18   13865    454     307.6      3    257.7     258.5
   IPv6 benchmarks
1  14    1460     16      22.4      2     20.6      20.5
2  15    1470      7      22.9      2     22.1      22
3  14    1456     19      22.8      2     20.6      20.4
4  14    1460     16      22.4      2     20.6      20.4

TABLE II
EQUIVALENT FIB REPRESENTATIONS: ONE-STAGE INGRESS, FILTER-ORDER IND., FALSE POSITIVE CHECK ON EGRESS (FIG. 3A); TWO-STAGE FORWARDING, RX ACTION-ORDER IND., TX LOOKUP (FIG. 3B); TWO-STAGE FORWARDING, RX NON-CONFL. RULES AND TX LOOKUP (FIG. 3C).

Fig. 4a: one-stage, RX action-order independent, no check on egress
      Group 1                        #    Final     Egress
   W.    |I|     |D|      Size      gr.   size
   IPv4 benchmarks
1  20  118161  28003     3259.3      8   2923.2    2887.5
2   8   90267   4229      857.4      5    755.9     755.9
3  10   92633   9648     1235        8   1022.8    1049.9
4  14    6324   2077      155        6    117.6     122.2
   IPv4 benchmarks with random padding to 128 bits
1  24  405503   4951   10365800      3  10303.6    9134.1
2  24  406501   4012   10269560      3  10259.2    9560.3
3  24  497694   4329   12498768      3  12348.7   12883.9
4  19   13936    383      313.8      3    272       273.4
   IPv6 benchmarks
1  15    1465     11       23.3      2     22.1      22
2  15    1462     15       23.8      2     22.1      22
3  14    1457     18       22.7      2     20.6      20.5
4  14    1462     14       22.2      2     20.6      20.5

Fig. 4b: two-stage, RX action-order independent, TX action-order independent lookup
      Group 1                        #    Final     Egress
   W.    |I|     |D|      Size      gr.   size
   IPv4 benchmarks
1  19  100028  23068     2638.7      6   2338.8    2620.4
2   7   89935   5924      819.1      5    671       447.6
3   9   96161   7005     1089.6      7    928.4     557.8
4  14    6595   1262      132.7      6    109.9     110.4
   IPv4 benchmarks with random padding to 128 bits
1  24  409543    911     9945.6      3   9876      9223.1
2  25  410294    219    10285.3      3  10267.8    9981.8
3  25  495002   7021    13273.7      3  13048.0   12112.5
4  19   13951    368      312.1      3    272       236.2
   IPv6 benchmarks
1  14    1460     16       22.4      2     20.6      19.7
2  15    1470      7       22.9      2     22.1      20.1
3  14    1456     19       22.8      2     20.6      19
4  14    1460     16       22.4      2     20.6      19.9

Fig. 4c: two-stage, RX non-conflicting rules, TX action-order independent lookup
      Group 1                        #    Final     Egress
   W.    |I|     |D|      Size      gr.   size
   IPv4 benchmarks
1  19  106085  17011     2559.9      5   2338.8    2620.4
2   6   88292   7567      771.8      4    575.1     447.6
3   9   94649   8517     1124.3      6    928.4     557.8
4  13    6364   1493      130.5      5    102.1     110.4
   IPv4 benchmarks with random padding to 128 bits
1  24  409757    697     9923.3      3   9850.8    9223.1
2  24  410294    219     9875        2   9852.3    9981.8
3  24  492774   9249    13010.4      3  12048.5   12112.5
4  18   13865    454      307.6      3    257.7     236.2
   IPv6 benchmarks
1  14    1460     16       22.4      2     20.6      19.7
2  15    1470      7       22.9      2     22.1      20.1
3  14    1456     19       22.8      2     20.6      19
4  14    1460     16       22.4      2     20.6      19.9

TABLE III
NON-EQUIVALENT FIB REPRESENTATIONS: ONE-STAGE, RX ACTION-ORDER IND., NO CHECK ON EGRESS (FIG. 4A); TWO-STAGE, RX ACTION-ORDER IND., TX ACTION-ORDER IND. LOOKUP (FIG. 4B); TWO-STAGE, RX NON-CONFL. RULES, TX ACTION-ORDER IND. LOOKUP (FIG. 4C).

     Fig. 2a  | Fig. 2b                 | Fig. 3a                 | Fig. 3b                 | Fig. 3c
     total    | ingress  egress  total  | ingress  egress  total  | ingress  egress  total  | ingress  egress  total
   IPv4 benchmarks
1    4677.2   | 3939     4642    8581   | 3215.6   3145.9  6361.5 | 2338.8   2305.5  4644.3 | 2338.8   2280.3  4619.1
2    3023.8   | 3067.4   3067    6134.4 | 2078.9   2044.2  4123.1 | 671      670.7   1341.7 | 575.1    566.9   1142
3    3272.9   | 3301.3   3319    6620.3 | 2250.1   2215.8  4465.9 | 928.4    952.7   1881.1 | 928.4    941.8   1870.2
4     268.8   | 251.4    270     521.4  | 159.6    156.6   316.2  | 109.9    109.5   219.4  | 102.1    100.6   202.7
   IPv4 benchmarks with random padding to 128 bits
1   52538     | 52538    52538   105076 | 10293    10294   20587  | 9876     9882    19758  | 9850     9851    19702
2   52545     | 52545    52545   105091 | 10384    10386   20770  | 10267    10270   20538  | 9852     9901    19753
3   64258     | 64258    64258   128517 | 13178    13182   26360  | 13048    13035   26083  | 12048    12149   24197
4    1832.8   | 1832.8   1832    3664.8 | 272      272.6   544.6  | 257.7    258.5   516.2  | 286.3    286.8   573.1
   IPv6 benchmarks
1     188.9   | 188.9    188     376.9  | 22.1     21.9    44     | 20.6     20.5    41.1   | 20.6     20.5    41.1
2     189     | 189      189     378    | 22.1     22      44.1   | 22.1     22      44.1   | 22.1     22      44.1
3     188.8   | 188.8    188     376.8  | 23.6     23.5    47.1   | 20.6     20.4    41     | 20.6     20.4    41
4     188.9   | 188.9    188     376.9  | 22.1     22      44.1   | 20.6     20.4    41     | 20.6     20.4    41

     Fig. 4a  | Fig. 4b                 | Fig. 4c
     total    | ingress  egress  total  | ingress  egress  total
   IPv4 benchmarks
1    2923.2   | 2338.8   2620.4  4959.2 | 2338.8   2620.4  4959.2
2     755.9   | 671      447.6   1118.6 | 575.1    447.6   1022.7
3    1022.8   | 928.4    557.8   1486.2 | 928.4    557.8   1486.2
4     117.6   | 109.9    110.4   220.3  | 102.1    110.4   212.5
   IPv4 benchmarks with random padding to 128 bits
1    9134     | 9876     9223    19099  | 9850     9223    19073
2    9560     | 10267    9981    20249  | 9852     9981    19834
3   12883     | 13048    12112   25160  | 12048    12112   24161
4     272     | 257.7    236.2   493.9  | 257.7    236.2   493.9
   IPv6 benchmarks
1      22.1   | 20.6     19.7    40.3   | 20.6     19.7    40.3
2      22.1   | 22.1     20.1    42.2   | 22.1     20.1    42.2
3      20.6   | 20.6     19      39.6   | 20.6     19      39.6
4      20.6   | 20.6     19.9    40.5   | 20.6     19.9    40.5

TABLE IV
SUMMARY TABLE: INGRESS, EGRESS, AND TOTAL CLASSIFIER SIZE FOR ALL FIB REPRESENTATIONS.

to various LPM group tables with short prefixes against traditional approaches with a single LPM table and long prefixes. For the multi-group implementation, we adopt the Intel Data Plane Development Kit (DPDK) [15]. The DPDK provides LPM (LPM6) libraries that implement the Longest Prefix Match (LPM) table search method for IP (IPv6) packets. The LPM implementation uses the DIR-24-8 algorithm [20] with 2 levels of serial lookups, with 24 bits and 8 bits respectively. In the IPv6 case, the LPM6 implementation uses DIR-24-8-..-8 with 14 levels; adding a rule can lead to prefix expansion (when the prefix length is not a multiple of 8 bits and greater than 24 bits). We modified the original DPDK LPM implementation for two cases: a single 16-bit lookup for our multi-group classification and serial 16-8-8-bit lookups for the Cisco-style representation [9].
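For illustration, a DIR-24-8-style serial lookup can be sketched as follows (this is not DPDK code: flat multi-megabyte arrays are replaced by sparse dictionaries, entries track prefix lengths, and only two stages of 16 + 8 bits are shown rather than the full 16-8-8, so prefixes are limited to /24):

```python
# Sketch of a two-stage serial lookup table: the first stage is indexed by
# the top 16 bits and either stores a (plen, next_hop) entry directly or
# points to a 256-entry second stage keyed by the next 8 bits. Prefixes
# whose length is not a stage boundary are "expanded" into every covered
# entry, which is the expansion effect described above.

class SerialLpm:
    def __init__(self):
        self.t1 = {}   # 16-bit index -> (plen, next_hop) or second-stage dict

    def add(self, addr, plen, nh):
        """addr: 32-bit int, plen: prefix length (<= 24), nh: next hop."""
        if plen <= 16:
            base = (addr >> 16) & ~((1 << (16 - plen)) - 1)
            for i in range(base, base + (1 << (16 - plen))):
                cur = self.t1.get(i)
                if isinstance(cur, dict):       # fill gaps, keep longer prefixes
                    for j in range(256):
                        if cur.get(j, (0, None))[0] < plen:
                            cur[j] = (plen, nh)
                elif cur is None or cur[0] < plen:
                    self.t1[i] = (plen, nh)
        else:   # 16 < plen <= 24: expand into the second stage
            i = addr >> 16
            cur = self.t1.get(i)
            if not isinstance(cur, dict):
                cur = {} if cur is None else {j: cur for j in range(256)}
                self.t1[i] = cur
            base = ((addr >> 8) & 0xFF) & ~((1 << (24 - plen)) - 1)
            for j in range(base, base + (1 << (24 - plen))):
                if cur.get(j, (0, None))[0] < plen:
                    cur[j] = (plen, nh)

    def lookup(self, addr):
        e = self.t1.get(addr >> 16)
        if isinstance(e, dict):                 # second serial memory access
            e = e.get((addr >> 8) & 0xFF)
        return e[1] if e else None
```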

A. Multi-Group vs. Baseline IP Representation

Since the IP implementation in DPDK uses only two consecutive lookups, we do not expect any lookup time improvement for our multi-group representation. In the IP case the goal is to confirm that the desired number of parallel groups can be

Page 9: FIB Efficiency in Distributed Platformseprints.networks.imdea.org/1375/1/PID4413703.pdf · 2016. 8. 18. · FIB Efficiency in Distributed Platforms Kirill Kogan , Sergey I. Nikolenkoyz,

Fig. 5. Multi-group vs. 16-8-8 throughput as a function of packet size (64-1500 bytes): (a) packets transmitted (Mpps); (b) Mbits per second. Curves: 16x1 through 16x7 multi-group lookups and the 16+8+8 baseline.

looked up during a single baseline IP lookup. Our approach should benefit more from “instruction-level parallelism” of modern CPU pipelines because the processor is not waiting for memory addresses for the next lookups as in the serial case. All group addresses are known and will be efficiently prefetched by the processor. Thus, in our experiments we focus on how many 16-bit pseudo-parallel lookups can be executed in the time of a single 16-8-8 serial lookup. In the experiments, we used two physical servers with two 8-core Intel Xeon CPUs E5-2670 @ 2.60GHz, 96 GB DDR3 RAM, Ubuntu 14.04.2 LTS, and two Intel X520 NICs with two 10GE ports. On the first server we ran the Pktgen-dpdk [21] traffic generator; on the second, our LPM implementations. Physical ports on the first NIC are connected with physical ports on the second NIC. In the LPM code we measure the average number of cycles required for a lookup. We use modified rte_lpm_lookup() calls and two rte_rdtsc_precise() calls before and after the lookup to get the number of cycles required to execute the set of lookups. Fig. 5 shows that it is possible to run the multi-group representation on 1-6 parallel groups in the same time as a single IPv4 lookup, which leads to significant memory size reduction with no loss in performance.
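The pseudo-parallel lookup pattern itself can be sketched as follows (illustrative only: each group probes its own table, built over its own selected 16 bits, with no data dependence between probes, which is what lets the CPU overlap their memory accesses; per-group results are merged by rule priority):

```python
# Sketch of the multi-group lookup pattern: k independent table probes,
# one per group, combined by rule priority. Names are illustrative.

def extract(header, bit_positions):
    """Build a table key from the group's selected bit positions."""
    key = 0
    for pos in bit_positions:
        key = (key << 1) | ((header >> pos) & 1)
    return key

def multi_group_lookup(header, groups, default=None):
    """groups: list of (bit_positions, table); table maps key -> (prio, action)."""
    results = []
    for bits, table in groups:
        key = extract(header, bits)   # probes are independent of each other
        if key in table:
            results.append(table[key])
    return min(results)[1] if results else default
```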

B. Multi-Group vs. Baseline IPv6 Representation

Since the baseline implementation in DPDK for IPv6 requires 14 serial lookups, we can check for improvement in lookup time. We compared multi-group representations based on l = 24 or 32 versus the baseline DPDK implementation provided by LPM6. Fig. 6 shows that it is possible to run the multi-group representation on 14-18 groups in the same time as a single IPv6 lookup implemented in DPDK. Hence, IPv6 FIB representations that require fewer groups than this can lead to a substantial improvement in lookup time.

VIII. RELATED WORK

Finding efficient implementations of FIBs and packet classifiers is a well-studied problem, with solutions divided into two main categories: algorithmic solutions with data structures dedicated to FIBs, and TCAM-based implementations applicable to more general classifiers. The ORTC algorithm [22] obtains an optimal representation of FIBs in the minimal possible number of prefix rules. A similar approach called FIB aggregation combines rules with the same action [7],

Fig. 6. Number of clock cycles as a function of the number of parallel lookups (4-20): LPM6 and multi-group representations for l = 24 and l = 32.

[13], [23], [24]. Interestingly, Retvari et al. [7] showed that entropy lower bounds on the size of FIBs are much smaller than actual FIB sizes. Compression schemes that try to achieve these bounds have also been described [13]. Coding schemes to reduce the width of forwarding tables were suggested in [25], [26]. Likewise, several works [8], [27]–[29] suggested how to partition the multi-dimensional rule space. Hash-based solutions to match a packet to its possible matching rules have also been discussed [30], e.g., splitting rules into subsets of disjoint rules such that a Bloom filter can be used for a fast indication of whether a match can be found in each subset. Different approaches have been described to reduce TCAM space [31]–[34]; they include removing redundancies [35], applying block permutations in the header space [36], and performing transformations [37]. Kogan et al. [17] introduced representations based on filter disjointness and addressed efficient time-space tradeoffs for multi-field classification, where fields are represented by ranges. Exploiting filter disjointness for FIB representations is considered in [38].

A recent approach suggested distributing the rules of a classifier among several limited-size TCAMs, especially in software-defined networks [39]–[41]. The works [39], [40] study how to decompose a large forwarding table into several subtables that can be located based on previous knowledge of the paths of packets. The work [41] suggested solving classification decisions that cannot be determined within a switch in special dedicated authority switches in the data plane, avoiding access to the controller. Lossy compression was presented in [42], achieving memory efficiency by completing the classification


of some small amount of traffic in a slower memory. Reducing the memory size of representing network policies by smart IP address allocation was described in [43]. Our work can serve as a level of abstraction that simplifies a classifier before it is represented with one of the above techniques. For instance, before rule splitting [44] we can reduce the classifier width in order to reduce the number of disjoint sets that the rules are split into and accordingly decrease the number of required Bloom filters. As a second example, reducing the classifier width clearly decreases the size of the binary tree that represents the classifier and decreases entropy lower bounds on the memory in the classifier's representation [7].

IX. CONCLUSION

Identifying appropriate invariants around which to implement lookup can significantly affect lookup efficiency and, in particular, space requirements for the corresponding data structures. In this work we introduce new structural properties of classifiers that can be leveraged by appropriate mechanisms to achieve significantly better results than optimized equivalent solutions on distributed platforms. Our methods define an abstraction layer which does not increase lookup time and minimizes memory for lookup tables whilst remaining transparent to the methods used for representation of lookup tables in network elements.

REFERENCES

[1] “The weekly routing table report,” http://bgp.potaroo.net/index-bgp.html, 2015.

[2] “The Size of the Internet Global Routing Table and Its Potential Side Effects,” https://supportforums.cisco.com/document/12202206/size-internet-global-routing-table-and-its-potential-side-effects, 2014.

[3] A. Elmokashfi, A. Kvalbein, and C. Dovrolis, “BGP churn evolution: A perspective from the core,” Trans. Networking, vol. 20, no. 2, pp. 571–584, 2012.

[4] A. Elmokashfi and A. Dhamdhere, “Revisiting BGP churn growth,” CCR, vol. 44, no. 1, pp. 5–12, 2014.

[5] V. Srinivasan and G. Varghese, “Faster IP lookups using controlled prefix expansion,” in SIGMETRICS, 1998, pp. 1–10.

[6] W. Eatherton, G. Varghese, and Z. Dittia, “Tree bitmap: hardware/software IP lookups with incremental updates,” CCR, vol. 34, no. 2, pp. 97–122, 2004.

[7] G. Retvari, J. Tapolcai, A. Korosi, A. Majdan, and Z. Heszberger, “Compressing IP forwarding tables: towards entropy bounds and beyond,” in SIGCOMM, 2013, pp. 111–122.

[8] T. Yang, G. Xie, Y. Li, Q. Fu, A. X. Liu, Q. Li, and L. Mathy, “Guarantee IP lookup performance with FIB explosion,” in SIGCOMM, 2014, pp. 39–50.

[9] “Cisco XR 12000 and 12000 Series Gigabit Ethernet Line Cards,” http://www.cisco.com/c/en/us/products/collateral/routers/xr12000series-router/product_data_sheet0900aecd803f856f.html.

[10] “Cisco CRS Forwarding Processor Cards,” http://www.cisco.com/c/en/us/products/collateral/routers/carrierrouting-system/datasheetc78730790.pdf.

[11] “The Cisco QuantumFlow Processor: Cisco's Next Generation Network Processor,” http://www.cisco.com/c/en/us/products/collateral/routers/asr-1000seriesaggregationservicesrouters/solution_overview_c22-448936.html.

[12] “Cisco 12000 Series Routers,” http://www.cisco.com/en/US/products/hw/routers/ps167/index.html.

[13] A. Korosi, J. Tapolcai, B. Mihalka, G. Meszaros, and G. Retvari, “Compressing IP forwarding tables: Realizing information-theoretical space bounds and fast lookups simultaneously,” in ICNP, 2014, pp. 332–343.

[14] H. Song and J. Turner, “ABC: Adaptive binary cuttings for multidimensional packet classification,” Trans. Networking, vol. 21, no. 1, pp. 98–109, 2013.

[15] “Data Plane Development Kit (DPDK),” http://dpdk.org.

[16] “Pipelined packet switching and queuing architecture,” US7643486B2, 2010.

[17] K. Kogan, S. I. Nikolenko, O. Rottenstreich, W. Culhane, and P. T. Eugster, “Exploiting order independence for scalable and expressive packet classification,” Trans. Networking, vol. 24, no. 2, pp. 1251–1264, 2016.

[18] E. Allender, L. Hellerstein, P. McCabe, T. Pitassi, and M. E. Saks, “Minimizing DNF formulas and AC0_d circuits given a truth table,” in Conference on Computational Complexity, 2006.

[19] “Internet2 - forwarding information base,” http://vn.grnoc.iu.edu/Internet2/fib/index.cgi.

[20] P. Gupta, B. Prabhakar, and S. P. Boyd, “Near optimal routing lookups with bounded worst case performance,” in INFOCOM, 2000, pp. 1184–1192.

[21] “The Pktgen-DPDK application,” http://pktgen.readthedocs.org.

[22] R. Draves, C. King, V. Srinivasan, and B. Zill, “Constructing optimal IP routing tables,” in Infocom, 1999.

[23] Z. A. Uzmi, M. E. Nebel, A. Tariq, S. Jawad, R. Chen, A. Shaikh, J. Wang, and P. Francis, “SMALTA: practical and near-optimal FIB aggregation,” in Co-NEXT, 2011.

[24] X. Zhao, Y. Liu, L. Wang, and B. Zhang, “On the aggregatability of router forwarding tables,” in Infocom, 2010.

[25] O. Rottenstreich et al., “Compressing forwarding tables for datacenter scalability,” JSAC, vol. 32, no. 1, pp. 138–151, 2014.

[26] O. Rottenstreich, A. Berman, Y. Cassuto, and I. Keslassy, “Compression for fixed-width memories,” in ISIT, 2013.

[27] P. Gupta and N. McKeown, “Classifying packets with hierarchical intelligent cuttings,” Micro, vol. 20, no. 1, pp. 34–41, 2000.

[28] S. Singh, F. Baboescu, G. Varghese, and J. Wang, “Packet classification using multidimensional cutting,” in SIGCOMM, 2003.

[29] B. Vamanan, G. Voskuilen, and T. N. Vijaykumar, “EffiCuts: optimizing packet classification for memory and throughput,” in SIGCOMM, 2010.

[30] S. Dharmapurikar, P. Krishnamurthy, and D. E. Taylor, “Longest prefix matching using Bloom filters,” Trans. Networking, vol. 14, no. 2, pp. 397–409, 2006.

[31] A. Kesselman, K. Kogan, S. Nemzer, and M. Segal, “Space and speed tradeoffs in TCAM hierarchical packet classification,” J. Comput. Syst. Sci., vol. 79, no. 1, pp. 111–121, 2013.

[32] K. Kogan, S. I. Nikolenko, W. Culhane, P. Eugster, and E. Ruan, “Towards efficient implementation of packet classifiers in SDN/OpenFlow,” in HotSDN, 2013, pp. 153–154.

[33] K. Kogan, S. I. Nikolenko, P. T. Eugster, and E. Ruan, “Strategies for mitigating TCAM space bottlenecks,” in HOTI, 2014, pp. 25–32.

[34] O. Rottenstreich, I. Keslassy, A. Hassidim, H. Kaplan, and E. Porat, “Optimal in/out TCAM encodings of ranges,” Trans. Networking, vol. 24, no. 1, pp. 555–568, 2016.

[35] A. X. Liu, C. R. Meiners, and Y. Zhou, “All-match based complete redundancy removal for packet classifiers in TCAMs,” in Infocom, 2008.

[36] R. Wei, Y. Xu, and H. J. Chao, “Block permutations in Boolean space to minimize TCAM for packet classification,” in Infocom, 2012.

[37] C. R. Meiners, A. X. Liu, and E. Torng, “Topological transformation approaches to TCAM-based packet classification,” Trans. Networking, vol. 19, no. 1, pp. 237–250, 2011.

[38] S. I. Nikolenko, K. Kogan, G. Retvari, E. R. Kovacs, and A. Shalimov, “How to represent IPv6 forwarding tables on IPv4 or MPLS dataplanes,” in Infocom Workshops, 2016, pp. 1–6.

[39] Y. Kanizo, D. Hay, and I. Keslassy, “Palette: Distributing tables in software-defined networks,” in IEEE Infocom, 2013.

[40] N. Kang, Z. Liu, J. Rexford, and D. Walker, “Optimizing the ‘one big switch’ abstraction in software-defined networks,” in CoNEXT, 2013, pp. 13–24.

[41] M. Yu, J. Rexford, M. J. Freedman, and J. Wang, “Scalable flow-based networking with DIFANE,” in SIGCOMM, 2010.

[42] O. Rottenstreich and J. Tapolcai, “Lossy compression of packet classifiers,” in ANCS, 2015.

[43] N. Kang, O. Rottenstreich, S. G. Rao, and J. Rexford, “Alpaca: Compact network policies with attribute-carrying addresses,” in CoNEXT, 2015.

[44] S. Dharmapurikar, H. Song, J. S. Turner, and J. W. Lockwood, “Fast packet classification using Bloom filters,” in ANCS, 2006.