
Infinite CacheFlow in Software-Defined Networks

Naga Katta
Princeton University

[email protected]

Omid Alipourfard
University of Southern California

[email protected]

Jennifer Rexford
Princeton University

[email protected]

David Walker
Princeton University

[email protected]

ABSTRACT

Software-Defined Networking (SDN) enables fine-grained policies for firewalls, load balancers, routers, traffic monitoring, and other functionality. While Ternary Content Addressable Memory (TCAM) enables OpenFlow switches to process packets at high speed based on multiple header fields, today’s commodity switches support just thousands to tens of thousands of rules. To realize the potential of SDN on this hardware, we need efficient ways to support the abstraction of a switch with arbitrarily large rule tables. To do so, we define a hardware-software hybrid switch design that relies on rule caching to provide large rule tables at low cost. Unlike traditional caching solutions, we neither cache individual rules (to respect rule dependencies) nor compress rules (to preserve the per-rule traffic counts). Instead we “splice” long dependency chains to cache smaller groups of rules while preserving the semantics of the network policy. Our design satisfies four core criteria: (1) elasticity (combining the best of hardware and software switches), (2) transparency (faithfully supporting native OpenFlow semantics, including traffic counters), (3) fine-grained rule caching (placing popular rules in the TCAM, despite dependencies on less-popular rules), and (4) adaptability (to enable incremental changes to the rule caching as the policy changes).

Categories and Subject Descriptors

C.2.3 [Computer-Communication Networks]: Network Architecture and Design

Keywords

Rule Caching; Software-Defined Networking; OpenFlow; Commodity Switch; TCAM.

1. INTRODUCTION


Software-Defined Networking (SDN) enables a wide range of applications by applying fine-grained packet-processing rules that match on multiple header fields [1]. For example, an access-control application may match on the “five tuple” (e.g., source and destination IP addresses, transport protocol, and source and destination port numbers), while a server load-balancing application may match on the destination IP address and the source IP prefix. The finer the granularity of the policies, the larger the number of rules in the switches. More sophisticated SDN controller applications combine multiple functions (e.g., routing, access control, monitoring, and server load balancing) into a single set of rules, leading to more rules at even finer granularity.

Hardware switches store these rules in Ternary Content Addressable Memory (TCAM) [2] that performs a parallel lookup on wildcard patterns at line rate. Today’s commodity switches support just 2-20K rules [3]. High-end backbone routers handle much larger forwarding tables, but typically match only on destination IP prefix (and optionally a VLAN tag or MPLS label) and are much more expensive. Continued advances in switch design will undoubtedly lead to larger rule tables [4], but the cost and power requirements for TCAMs will continue to limit the granularity of policies SDNs can support. For example, TCAMs are 400X more expensive [5] and consume 100X more power [6] per Mbit than the RAM-based storage in servers.

On the surface, software switches built on commodity servers are an attractive alternative. Modern software switches process packets at a high rate [7–9] (about 40 Gbps on a quad-core machine) and store large rule tables in main memory (and the L1 and L2 cache). However, software switches have relatively limited port density, and cannot handle multi-dimensional wildcard rules efficiently. While software switches like Open vSwitch cache exact-match rules in the “fast path” in the kernel, the first packet of each microflow undergoes slower user-space processing (i.e., linear search) to identify the highest-priority matching wildcard rule [10].
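To make the cost of that slow path concrete, the following sketch (an illustration written for this discussion, not Open vSwitch code) classifies a packet by linearly scanning a priority-ordered list of multi-field wildcard rules; the kernel’s exact-match cache exists precisely so that only the first packet of a microflow pays this cost.

```python
# Illustration of user-space wildcard classification by linear search.
# Each rule matches on several header fields; None means "wildcard".
from dataclasses import dataclass
from typing import Optional

@dataclass
class Rule:
    priority: int
    src_ip: Optional[str] = None   # exact value or None (wildcard)
    dst_ip: Optional[str] = None
    proto: Optional[int] = None
    dst_port: Optional[int] = None
    action: str = "drop"

def matches(rule: Rule, pkt: dict) -> bool:
    # A field matches if the rule wildcards it or the values are equal.
    return all(
        getattr(rule, f) is None or getattr(rule, f) == pkt[f]
        for f in ("src_ip", "dst_ip", "proto", "dst_port")
    )

def classify(rules: list[Rule], pkt: dict) -> str:
    # Linear scan in priority order: the cost grows with the table size,
    # which is why only the first packet of a microflow should take this path.
    for rule in sorted(rules, key=lambda r: -r.priority):
        if matches(rule, pkt):
            return rule.action
    return "drop"  # default action

rules = [
    Rule(priority=10, dst_ip="10.0.0.1", dst_port=80, action="fwd:1"),
    Rule(priority=5, proto=6, action="fwd:2"),  # any TCP traffic
]
print(classify(rules, {"src_ip": "10.0.0.9", "dst_ip": "10.0.0.1",
                       "proto": 6, "dst_port": 80}))  # -> fwd:1
```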

Instead, we argue that the TCAM in the hardware switch should handle as many of the packets as possible, and divert the remaining traffic to a software switch (or software agent on the hardware switch) for processing. This gives an unmodified controller application the illusion of an arbitrarily large rule table, while minimizing the performance penalty for exceeding the TCAM size. For example, an 800 Gbps hardware switch, together with a single 40 Gbps software switch, could easily handle traffic with a 5% “miss rate” in the TCAM.
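Spelling out the sizing arithmetic behind this example (a worked check rather than a measurement from the paper):

```latex
0.05 \times 800\,\text{Gbps} = 40\,\text{Gbps}
```

which matches the roughly 40 Gbps that a single quad-core software switch can sustain, so one such switch can absorb the cache-miss traffic.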

The two main approaches for managing rule-table space are compression and caching. Rule compression combines


Figure 5: ClassBench Access Control Policy (% cache-hit traffic vs. % TCAM cache size, log scale; mixed-set, cover-set, and dependent-set algorithms).

Figure 6: Stanford Backbone Router Policy (% cache-hit traffic vs. % TCAM cache size, log scale; mixed-set, cover-set, and dependent-set algorithms).

The software switch simply matches on the tag, pops the tag, and forwards to the designated output port(s). If a cache-miss rule has an action that sends the packet to the controller, the CacheMaster transforms the packet_in message from the software switch by (i) copying the inport from the tag into the inport of the packet_in message and (ii) stripping the tag from the packet before sending the message to the controller.
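The following sketch illustrates that rewriting step on a packet_in event. It is a simplified stand-alone example: the helper name rewrite_packet_in and the assumption that the tag is an 802.1Q VLAN header whose VLAN ID encodes the original inport are ours, since this excerpt does not pin down the tag format.

```python
# Hypothetical sketch: rewrite a packet_in from the software switch so the
# controller sees it as if it had arrived at the hardware switch directly.
# Assumes the cache-miss tag is a VLAN header whose VID encodes the inport.

def rewrite_packet_in(event: dict) -> dict:
    """event = {'in_port': ..., 'packet': bytes} as received from the
    software switch; returns the message to relay to the controller."""
    pkt = bytearray(event["packet"])
    vid = int.from_bytes(pkt[14:16], "big") & 0x0FFF  # 12-bit VLAN ID (TCI)
    # (i) copy the original inport out of the tag
    original_inport = vid
    # (ii) strip the 4-byte 802.1Q tag (frame bytes 12..15)
    untagged = bytes(pkt[:12] + pkt[16:])
    return {"in_port": original_inport, "packet": untagged}
```

With this rewriting in place, the controller cannot tell that the packet_in originated at the software switch rather than the hardware switch.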

Traffic counts, barrier messages, and rule timeouts: CacheFlow preserves the semantics of OpenFlow 1.0 constructs like queries on traffic statistics, barrier messages, and rule timeouts by emulating all of these features in the CacheMaster; i.e., the behavior of the switches maintained by CacheFlow is no different from that of a single OpenFlow switch with infinite rule space. For example, CacheMaster maintains packet and byte counts for each rule installed by the controller, updating its local information each time a rule moves to a different part of the cache hierarchy. Similarly, CacheFlow emulates rule timeouts by installing rules without timeouts, and explicitly removing the rules when the software timeout expires, similar to prior work on LIME [19].
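A minimal sketch of this bookkeeping is shown below; the class and method names are hypothetical, and the real CacheFlow code is not shown in this excerpt. The idea is that CacheMaster owns the authoritative per-rule counters and folds in whatever the switches report, so counters survive rule movements, while idle timeouts are tracked in software and enforced by explicit deletion.

```python
import time

class RuleState:
    """Per-rule bookkeeping kept by CacheMaster (hypothetical sketch)."""
    def __init__(self, idle_timeout=None):
        self.packets = 0            # totals accumulated across cache moves
        self.bytes = 0
        self.idle_timeout = idle_timeout
        self.last_hit = time.time()

class CacheMasterCounters:
    def __init__(self):
        self.rules = {}             # rule_id -> RuleState

    def install(self, rule_id, idle_timeout=None):
        self.rules[rule_id] = RuleState(idle_timeout)

    def on_stats_reply(self, rule_id, pkts, byts):
        """Fold in counters read from whichever switch currently holds the
        rule; called when the rule is polled or moved in the cache hierarchy
        (the switch-side counter is then reinstalled from zero)."""
        st = self.rules[rule_id]
        st.packets += pkts
        st.bytes += byts
        if pkts:
            st.last_hit = time.time()

    def expired(self):
        """Rules whose software-emulated idle timeout has elapsed; CacheMaster
        deletes these explicitly from the switches and notifies the
        controller, mimicking an ordinary OpenFlow rule expiry."""
        now = time.time()
        return [rid for rid, st in self.rules.items()
                if st.idle_timeout and now - st.last_hit > st.idle_timeout]
```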

3.2 Implementation and Evaluation
We implemented a prototype for CacheFlow in Python on top of the Ryu controller platform. At the moment, the prototype transparently supports the semantics of the OpenFlow 1.0 features mentioned earlier, except rule timeouts and barrier messages. We ran the prototype on a collection of two Open vSwitch 1.10 instances, where one switch acts as the hardware cache (where the rule-table capacity is limited by CacheFlow) and another acts as the software switch that holds the cache-miss rules. We evaluate our prototype for three algorithms (dependent-set, cover-set, and mixed-set) and three policies, and measure the cache-hit rate.

Figure 7: REANNZ IXP Policy (% cache-hit traffic vs. % TCAM cache size, log scale; mixed-set algorithm).

The first policy is a synthetic Access Control List (ACL) generated using ClassBench [20]. The policy has 10K rules that match on the source IP address, with long dependency chains with maximum depth of 10. In the absence of a traffic trace, we assume the weight of each rule is proportional to the portion of flow space it matches. Figure 5 shows the cache-hit percentage across a range of TCAM sizes, expressed relative to the size of the policy. The mixed-set and cover-set algorithms have similar cache-hit rates and do much better than the dependent-set algorithm because they splice the longer dependency chains in the policy. While mixed-set and cover-set have a hit rate of around 87% for 1% cache size (of the total rule table), all three algorithms have around 90% hit rate with just 5% of the rules in the TCAM.
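As a concrete reading of this weighting (a simplification added here for illustration; it ignores overlaps between rules), a source-IP rule with prefix length p covers 2^(32-p) addresses, and the expected cache-hit rate is the total weight of the cached rules divided by the total weight of the policy:

```python
# Hypothetical sketch of the flow-space weighting used when no traffic trace
# is available. Overlap between rules is ignored for simplicity.

def rule_weight(prefix_len: int, field_bits: int = 32) -> int:
    # A prefix of length p on a w-bit field leaves w - p wildcarded bits,
    # i.e. it covers 2**(w - p) points of the flow space.
    return 2 ** (field_bits - prefix_len)

def cache_hit_rate(all_rules: dict, cached: set) -> float:
    """all_rules maps rule_id -> prefix length; cached is the set of rule_ids
    placed in the TCAM (together with any rules needed for correctness)."""
    total = sum(rule_weight(p) for p in all_rules.values())
    hit = sum(rule_weight(p) for r, p in all_rules.items() if r in cached)
    return 100.0 * hit / total

rules = {"r1": 8, "r2": 16, "r3": 24}          # /8, /16, /24 source prefixes
print(cache_hit_rate(rules, cached={"r1"}))    # the /8 dominates the weight
```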

Figure 6 shows results for a real-world Cisco router configuration on a Stanford backbone router [21], which we transformed into an OpenFlow policy. The policy has 5K OpenFlow 1.0 rules that match on the destination IP address, with dependency chains that vary in depth from 1 to 6. We analyzed the cache-hit ratio by assigning traffic volume to each rule proportional to the size of its flow space. The mixed-set algorithm does the best among all three and dependent-set does the worst because there is a mixture of shallow and deep dependencies. While there are differences in the cache-hit rate, all three algorithms achieve at least 70% hit rate with a cache size of 5% of the policy.

Figure 7 shows results for an SDN-enabled Internet eXchange Point (IXP) that supports the REANNZ research and education network [22]. This real-world policy has 460 OpenFlow 1.0 rules matching on multiple packet headers like inport, dst_ip, eth_type, src_mac, etc. Most dependency chains have depth 1. We replayed a two-day traffic trace from the IXP, updated the cache every two minutes, and measured the cache-hit rate over the two-day period. Because of the shallow dependencies, all three algorithms have the same performance, and hence we only show the mixed-set algorithm in the figure. The mixed-set algorithm sees a cache-hit rate of 75% with a hardware cache of just 2% of the rules; with just 10% of the rules, the cache-hit rate increases to as much as 97%.

4. RELATED WORK
Earlier work on IP route caching [12–15] stores only a small number of IP prefixes in the switch line cards and the rest in inexpensive slow memory. Most of these systems exploit the fact that IP traffic exhibits both temporal and spatial locality to implement route caching. However, most of them do not deal with cross-rule dependencies, and none of them deals with complex multi-dimensional packet classification. The TCAM Razor [11, 23] line of work compresses multi-dimensional packet-classification rules to minimal TCAM rules using decision trees and multi-dimensional topological transformation. Dong et al. [24] propose a caching technique for ternary rules that creates new rules out of existing rules to handle evolving traffic, but it requires special hardware and does not preserve counters. In general, the above techniques that use compression to reduce TCAM space suffer from not being able to (i) preserve rule counters and (ii) make efficient incremental changes. In recent SDN literature, DIFANE [18] advocates caching of ternary rules, but uses TCAM to handle cache misses, leading to a TCAM-hungry solution. Other work [25, 26] shows how to distribute rules over multiple switches along a path, but cannot handle rule sets larger than the aggregate table size. vCRIB [27] redirects packets over a longer path before the entire policy is applied, and its partitioning approach is not amenable to incremental change or transparency.

5. CONCLUSION
CacheFlow enables fine-grained policies in SDNs by optimizing the use of the limited rule-table space in hardware switches, while preserving the semantics of OpenFlow. Cache “misses” could be handled in several different ways: (i) an inline CPU or network processor in the data plane of the hardware switch (if available), (ii) one or more software switches (to keep packets in the “fast path”, at the expense of introducing new components in the network), (iii) in a software agent on the hardware switch (to minimize the use of link ports and bandwidth, at the expense of imposing extra CPU and I/O load on the switch), or (iv) at the SDN controller (to avoid introducing new components while also enabling new network-wide optimizations, at the expense of extra latency and controller load). In our ongoing work, we plan to explore these trade-offs, and also evaluate our algorithms under a wider range of SDN policies and workloads.

Acknowledgments. The authors wish to thank the HotSDN reviewers and members of the Frenetic project for their feedback. We would especially like to thank Josh Bailey for giving us access to the REANNZ OpenFlow policy and for being part of helpful discussions related to the system implementation. This work was supported in part by the NSF under the grant TS-1111520; the ONR under award N00014-12-1-0757; and a Google Research Award.

6. REFERENCES
[1] N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson, J. Rexford, S. Shenker, and J. Turner, “OpenFlow: Enabling innovation in campus networks,” SIGCOMM CCR, vol. 38, no. 2, pp. 69–74, 2008.
[2] B. Salisbury, “TCAMs and OpenFlow: What every SDN practitioner must know.” See http://tinyurl.com/kjy99uw, 2012.
[3] B. Stephens, A. Cox, W. Felter, C. Dixon, and J. Carter, “PAST: Scalable Ethernet for data centers,” in ACM SIGCOMM CoNEXT, 2012.
[4] P. Bosshart, G. Gibb, H.-S. Kim, G. Varghese, N. McKeown, M. Izzard, F. Mujica, and M. Horowitz, “Forwarding metamorphosis: Fast programmable match-action processing in hardware for SDN,” in ACM SIGCOMM, 2013.
[5] “SDN system performance.” See http://pica8.org/blogs/?p=201, 2012.
[6] E. Spitznagel, D. Taylor, and J. Turner, “Packet classification using extended TCAMs,” in ICNP, 2003.
[7] M. Dobrescu, N. Egi, K. Argyraki, B.-G. Chun, K. Fall, G. Iannaccone, A. Knies, M. Manesh, and S. Ratnasamy, “RouteBricks: Exploiting parallelism to scale software routers,” in SOSP, 2009.
[8] S. Han, K. Jang, K. Park, and S. Moon, “PacketShader: A GPU-accelerated software router,” in SIGCOMM, 2010.
[9] “Intel DPDK overview.” See http://tinyurl.com/cepawzo.
[10] “The rise of soft switching.” See http://tinyurl.com/bjz8469.
[11] A. X. Liu, C. R. Meiners, and E. Torng, “TCAM Razor: A systematic approach towards minimizing packet classifiers in TCAMs,” IEEE/ACM Trans. Netw., Apr. 2010.
[12] N. Sarrar, S. Uhlig, A. Feldmann, R. Sherwood, and X. Huang, “Leveraging Zipf’s law for traffic offloading,” SIGCOMM Comput. Commun. Rev., 2012.
[13] C. Kim, M. Caesar, A. Gerber, and J. Rexford, “Revisiting route caching: The world should be flat,” in Passive and Active Measurement, 2009.
[14] D. Feldmeier, “Improving gateway performance with a routing-table cache,” in INFOCOM, 1988.
[15] H. Liu, “Routing prefix caching in network processor design,” in ICCN, 2001.
[16] G. Borradaile, B. Heeringa, and G. Wilfong, “The knapsack problem with neighbour constraints,” J. of Discrete Algorithms, vol. 16, pp. 224–235, Oct. 2012.
[17] S. Khuller, A. Moss, and J. S. Naor, “The budgeted maximum coverage problem,” Inf. Process. Lett., Apr. 1999.
[18] M. Yu, J. Rexford, M. J. Freedman, and J. Wang, “Scalable flow-based networking with DIFANE,” in ACM SIGCOMM, 2010.
[19] E. Keller, S. Ghorbani, M. Caesar, and J. Rexford, “Live migration of an entire network (and its hosts),” in HotNets, Oct. 2012.
[20] D. E. Taylor and J. S. Turner, “ClassBench: A packet classification benchmark,” in IEEE INFOCOM, 2004.
[21] “Stanford backbone router forwarding configuration.” http://tinyurl.com/o8glh5n.
[22] “REANNZ.” http://reannz.co.nz/.
[23] C. R. Meiners, A. X. Liu, and E. Torng, “Topological transformation approaches to TCAM-based packet classification,” IEEE/ACM Trans. Netw., vol. 19, Feb. 2011.
[24] Q. Dong, S. Banerjee, J. Wang, and D. Agrawal, “Wire speed packet classification without TCAMs: A few more registers (and a bit of logic) are enough,” in ACM SIGMETRICS, 2007.
[25] Y. Kanizo, D. Hay, and I. Keslassy, “Palette: Distributing tables in software-defined networks,” in IEEE INFOCOM Mini-conference, Apr. 2013.
[26] N. Kang, Z. Liu, J. Rexford, and D. Walker, “Optimizing the ‘one big switch’ abstraction in Software-Defined Networks,” in ACM SIGCOMM CoNEXT, Dec. 2013.
[27] M. Moshref, M. Yu, A. Sharma, and R. Govindan, “Scalable rule management for data centers,” in NSDI, 2013.