chimpp: a click-based programming and simulation ...chimpp: a click-based programming and simulation...

15
Keyword(s): Abstract: © Chimpp: A Click-based Programming and Simulation Environment for Reconfigurable Networking Hardware Erik Rubow, Rick McGeer, Jeffrey C. Mogul, Amin Vahdat HP Laboratories HPL-2010-25 FPGAs, modular programming, Click Reconfigurable network hardware makes it easier to experiment with and prototype high-speed networking systems. However, these devices are still relatively hard to program; for example, the NetFPGA requires users to develop in Verilog. Further, these devices are commonly designed to work with software on a host computer, requiring the co-development of these hardware and software components. We address this situation with Chimpp, a development environment for reconfigurable network hardware, modeled on the popular Click software modular router system. Chimpp employs a modular approach to designing hardware-based packet-processing systems, featuring a simple configuration language similar to that of Click. We demonstrate this development environment by targeting the NetFPGA platform. Chimpp can be combined with Click itself at the software layer for a highly modular, mixed hardware and software design framework. We also enable the integrated simulation of the hardware and software components of a network device together with other network devices using the OMNeT++ network imulator. In contrast to some prior work, Chimpp focuses on making experimentation easy, rather than on optimizing hardware performance. Chimpp also avoids unnecessary restrictions on communication patterns and design styles such as were imposed by prior approaches. We describe our design and implementation of Chimpp, and provide initial evaluations showing how Chimpp makes it easy to implement, simulate, and modify a variety of packet-processing systems on the NetFPGA platform. External Posting Date: March 6, 2010 [Fulltext] Approved for External Publication Internal Posting Date: March 6, 2010 [Fulltext] Copyright 2010 Hewlett-Packard Development Company, L.P.

Upload: others

Post on 19-Mar-2020

17 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Chimpp: A Click-based Programming and Simulation ...Chimpp: A Click-based Programming and Simulation Environment for Reconfigurable Networking Hardware Erik Rubow, Rick McGeer, Jeffrey

Keyword(s): Abstract:

©

Chimpp: A Click-based Programming and Simulation Environment forReconfigurable Networking HardwareErik Rubow, Rick McGeer, Jeffrey C. Mogul, Amin Vahdat

HP LaboratoriesHPL-2010-25

FPGAs, modular programming, Click

Reconfigurable network hardware makes it easier to experiment with and prototype high-speed networkingsystems. However, these devices are still relatively hard to program; for example, the NetFPGA requiresusers to develop in Verilog. Further, these devices are commonly designed to work with software on a hostcomputer, requiring the co-development of these hardware and software components. We address thissituation with Chimpp, a development environment for reconfigurable network hardware, modeled on thepopular Click software modular router system. Chimpp employs a modular approach to designinghardware-based packet-processing systems, featuring a simple configuration language similar to that ofClick. We demonstrate this development environment by targeting the NetFPGA platform. Chimpp can becombined with Click itself at the software layer for a highly modular, mixed hardware and software designframework. We also enable the integrated simulation of the hardware and software components of anetwork device together with other network devices using the OMNeT++ network imulator. In contrast tosome prior work, Chimpp focuses on making experimentation easy, rather than on optimizing hardwareperformance. Chimpp also avoids unnecessary restrictions on communication patterns and design stylessuch as were imposed by prior approaches. We describe our design and implementation of Chimpp, andprovide initial evaluations showing how Chimpp makes it easy to implement, simulate, and modify avariety of packet-processing systems on the NetFPGA platform.

External Posting Date: March 6, 2010 [Fulltext] Approved for External PublicationInternal Posting Date: March 6, 2010 [Fulltext]

Copyright 2010 Hewlett-Packard Development Company, L.P.

Page 2: Chimpp: A Click-based Programming and Simulation ...Chimpp: A Click-based Programming and Simulation Environment for Reconfigurable Networking Hardware Erik Rubow, Rick McGeer, Jeffrey

Chimpp: A Click-based Programming and Simulation Environmentfor Reconfigurable Networking Hardware

Erik Rubow+

[email protected] McGeer∗

[email protected] C. Mogul∗

[email protected] Vahdat+

[email protected]

∗HP Labs, Palo Alto, CA 94304 +UC San Diego

AbstractReconfigurable network hardware makes it easier to

experiment with and prototype high-speed networkingsystems. However, these devices are still relatively hardto program; for example, the NetFPGA requires users todevelop in Verilog. Further, these devices are commonlydesigned to work with software on a host computer, re-quiring the co-development of these hardware and soft-ware components.

We address this situation with Chimpp, a developmentenvironment for reconfigurable network hardware, mod-eled on the popular Click software modular router sys-tem. Chimpp employs a modular approach to designinghardware-based packet-processing systems, featuring asimple configuration language similar to that of Click.We demonstrate this development environment by tar-geting the NetFPGA platform. Chimpp can be combinedwith Click itself at the software layer for a highly modu-lar, mixed hardware and software design framework. Wealso enable the integrated simulation of the hardware andsoftware components of a network device together withother network devices using the OMNeT++ network sim-ulator.

In contrast to some prior work, Chimpp focuses onmaking experimentation easy, rather than on optimizinghardware performance. Chimpp also avoids unnecessaryrestrictions on communication patterns and design stylessuch as were imposed by prior approaches.

We describe our design and implementation ofChimpp, and provide initial evaluations showing howChimpp makes it easy to implement, simulate, and mod-ify a variety of packet-processing systems on the NetF-PGA platform.

1 IntroductionReconfigurable network hardware, based on FPGAs,

makes it much easier to experiment with and rapidly de-velop new ideas in networking, while achieving packetand data rates unsustainable with software-based pro-totypes. Designing with FPGAs is considerably easierthan designing with ASICs, but unfortunately still morecumbersome than writing software, especially since the

typical tools for designing with FPGAs are unfamiliar tomany network experts.

The NetFPGA project [6] in particular has made ma-jor steps towards simplifying the development of FPGA-based network hardware, by providing an accessiblenetwork-specific reconfigurable hardware platform to-gether with an open-source collection of useful designs,as well as an active developer community. However,the NetFPGA requires users to develop using Verilog,which is unfamiliar to most network developers and re-searchers, and which can be cumbersome to use. Further-more, it can be difficult to reuse portions of contributedprojects for integration into other designs.

We address this situation with Chimpp, a developmentenvironment for the NetFPGA modeled on the popularClick software modular router [2]. Starting with theClick approach gives us a well-understood, modular ap-proach to designing packet-processing systems. Throughfiner-grained modularity of packet-processing elements,we simplify and encourage code reuse. By using a high-level Click-like configuration language, we decrease theeffort associated with integration and extension, and atthe same time make it possible to build a variety of in-teresting designs without the need to use Verilog. Wecall this “Chimpp” (Click for reconfigurable-HardwareImplementations of Modular Packet Processing).

It is often the case that packet processing devicesare made up of of some combination of hardware andsoftware components. We explore the combined useof Chimpp hardware elements and Click software ele-ments. The Click portion can operate as both a controlplane and as a slow path for handling uncommon traffic.Click elements can be associated with Chimpp elementsto manage the operation and internal state of hardwareelements. This approach results in highly modular andeasily extensible designs at both the hardware and thesoftware level. It also allows us to shift the boundary be-tween hardware and software; for example, an elementinitially implemented in software can be re-implementedin hardware with a similar interface, if more speed isneeded. Conversely, a hardware module can be shifted

1

Page 3: Chimpp: A Click-based Programming and Simulation ...Chimpp: A Click-based Programming and Simulation Environment for Reconfigurable Networking Hardware Erik Rubow, Rick McGeer, Jeffrey

2

to software if it becomes too complex, or if its perfor-mance proves to be non-critical.

We do not attempt to automatically convert Click el-ements into Chimpp elements, and we do not automat-ically generate element implementations, but rather weassume that Chimpp elements are either developed byhand or possibly generated by some other tool. Instead,we focus on providing a toolbox of useful elements withdefined module interfaces that can be automatically in-stantiated and interconnected from a high-level descrip-tion. We avoid any single standard interface but insteadallow new interfaces and elements to be easily definedwithin any number of add-on packages.

We also provide support for mixed simulation of hard-ware and software within a simulated network. This canbe particularly useful where a design consists of a singlelogical entity that spans the hardware/software boundary.When the co-development of hardware and software re-quires several changes to hardware, this kind of simula-tion can significantly decrease the development cycle byavoiding the time-consuming synthesis process. It alsoprovides greater visibility into a system while undergo-ing dynamic and realistic hardware/software interactionsand traffic patterns. For example, TCP connections anddynamic routing protocols can be simulated in this envi-ronment.

In this paper, we describe our design and implementa-tion of Chimpp, as well as some initial evaluations thatshow how Chimpp makes it easy to implement, simulate,and modify NetFPGA-based packet-processing systems.

Our specific contributions include:

• Generation of a top-level Verilog file from a high-level script.

• Simple XML-based definitions of elements andtheir interfaces.

• A framework for designing systems with Click soft-ware elements and Chimpp hardware elements.

• Integration of combined hardware and softwaresimulation within a network simulator.

2 Related WorkSeveral previous systems have provided high-level

programming of reconfigurable network hardware,mostly using the Click model. We start with a brief sum-mary of the essential features of Click, and then describethe other prior work.

2.1 ClickThe Click Modular Router [2] is a widely-used soft-

ware architecture for the flexible creation of software-based routers (and similar packet-processing systems).Click’s modules are calledelements, which can be con-nected into a graph. Elements have input and outputports, and connections between elements can either bepush, in which earlier elements call later ones to move a

packet through the graph, orpull, in which the calls aremade by the consuming element. Connections betweenelements are typed, so that (for example) one cannot im-properly connect a pull port to a push port. Also, certainkinds of ports can be connected only once. Queues areexplicit elements that connect a pull subgraph to a pushsubgraph.

Over the years, a large library of Click elements havebeen built, allowing designers to re-use the work of oth-ers.

2.2 NP-ClickThe NP-Click project [8] developed a version of Click

tailored specifically for the Intel IXP1200 architecture,a popular network processor with six RISC cores anda StrongARM central core, and a multi-tiered memoryarchitecture. NP-Click enabled efficient use of these re-sources in realizing a Click design. To that end, NP-Clickused the classic Click element architecture. Elementswere implemented in IXP-C, a somewhat restricted sub-set of the C language supported by the IXP architecture.In order to efficiently use the six “microengine” cores ofthe IXP, multiple elements could be instantiated on theIXP board.

NP-Click also addressed efficient use of the memoryhierarchy. The IPX1200 has large, slow memories sharedbetween microengines, and small, fast microengine-specific memories. NP-Click keywords allowed the pro-grammer to annotate data to specify where a particularobject could be stored. NP-Click supported programmer-controllable arbitration of shared resources between par-allel elements. The NP-Click authors estimated a four-fold productivity increase using NP-Click, and demon-strated performance between 60% and 99% of nativeIXP-C coded designs.

2.3 CliffCliff (Click for FPGAs) [3] was the first attempt to im-

plement Click configurations directly on an FPGA base.As with NP-Click, Cliff made the Click element the cen-ter of the design. The heart of the Cliff approach was astandard communication protocol among elements. Eachelement implemented a memory interface, and an inter-element communication protocol consisting of two sig-nalling wires per port. The wires implemented a three-way handshake between connected ports. Each Cliffelement implemented a simple, user-extensible finite-state machine (FSM) with three states: waiting for input,ready to send, or processing a packet.

The Cliff system offered no simulation tools, and theCliff driver simply instantiated and connected predefinedCliff elements. The predefined Cliff elements were re-sponsible for implementing the protocols given above.Effectively, Cliff was a netlisting tool for predefined Ver-ilog modules.

There were several notable features of Cliff de-

Page 4: Chimpp: A Click-based Programming and Simulation ...Chimpp: A Click-based Programming and Simulation Environment for Reconfigurable Networking Hardware Erik Rubow, Rick McGeer, Jeffrey

3

signs [7]. Data was passed between elements on a 344-bit-wide bus that obviated the need for memory storageof packet headers, though packet payloads were stored inmemory. However, the full bus width had to be filled be-fore packet processing could begin, resulting in a latencypenalty of several tens of cycles. Also, only one Cliff ele-ment was active at any time in processing a given packet;this failure to exploit the parallelism available in hard-ware implementations significantly limited performance.

2.4 CUSPClick Utilizing Speculation and Parallelism

(CUSP) [7] extended the Cliff work, but with sev-eral changes. CUSP used a much narrower (16-bit)data bus, and provided explicit support for parallel andspeculative execution of blocks of elements. It also pro-vides some automatic generation of the communicationbetween these blocks.

Parallelism in CUSP is entirely at the block level: theblocks in a design execute separately or in sequence, butwithin a block the elements operate in parallel as de-scribed above.

CUSP demonstrated a 2x performance improvementover Cliff in FPGA space, and came within 20% to ahand design, and processed packets at near line rate ontwo sample designs.

2.5 GG [1] represents another recent attempt to realize high-

level designs in a NetFPGA environment. A G descrip-tion consists of a set ofrules for packet rewriting andstate manipulation. Each rule can be guarded (executedconditionally). As in CUSP, G descriptions are auto-matically clustered into optimized parallel microarchi-tectures; G uses automatic dependence analysis amongrules to optimize scheduling.

2.6 NetThreadsNetThreads [4] takes a different approach to the prob-

lem. Rather than compiling to raw hardware (as Ver-ilog), the NetThreads team took a two-step approach:building a number of “soft” processors from lookup ta-bles (LUTs) on the FPGA, and then compiling a multi-threaded program to run on the soft processors. The ad-vantage of this approach over the use of hardware-basedapproaches is that C code can be compiled directly ontothe NetFPGA platform, and run on essentially a stan-dard processor. In order to make programming easy, Net-Threads permits a static number of threads (a fixed num-ber of threads/processor and a fixed number of proces-sors/FPGA) and uses a static round-robin thread sched-ule.

In preliminary results [4], performance was not dra-matically better than could be expected from Click run-ning on a modern x86. The paper reported packet ratesfor three designs: a DHCP implementation, a packet

classifier, and a NAT, respectively as 500, 1800, and 4600packets/sec. (At 1500 bytes/packet, this works out to 6Mb/sec – 59 Mb/sec.) We note that these were prelimi-nary experiments, reported in a workshop paper, and thatbetter performance may follow subsequent optimization.

2.7 Summary of related workWe summarize the work in this area in Table 1, at-

tempting to highlight the various features of the workin this area. The table shows that previous work essen-tially split into two groups: NP-Click and NetThreads,focussed on ensuring that multithreaded Click code rancorrectly on embedded processors; and G, CUSP andCliff, focussed on realizing high-level Click designs ef-ficiently in a hardware-only implementation. CUSP andCliff both relied on pre-designed hardware blocks, andimposed a rigid protocol and design style on the libraryelements to ensure correctness and composability. Thecontribution of CUSP over Cliff was the insertion ofautomatically-generated communication blocks to per-mit parallelism and speculative execution. G has onlybeen briefly and recently described, and its input formatis sufficiently high-level that it is difficult to determineexactly the implementation details. However, it seemslikely that it is effectively a layer sitting on top of a CUSPbase.

Like Cliff and CUSP (and, probably, like G as well)Chimpp relies on a library of pre-designed hardwareblocks, and focusses on integrating them into a de-sign. Unlike Cliff and CUSP, Chimpp explicitly supportsmixed hardware/software designs, since typical NetF-PGA designs have some components that reside on thehost. Further, Chimpp does not impose a rigid protocolon the hardware side of the design. Chimpp elementsmay communicate in any fashion they desire; all that isforbidden is cycles in the communication graph. In thissense, Chimpp’s hardware synthesis may be regarded asa lower-level primitive than either Cliff’s or CUSP’s.

Indeed, we believe that it would be straightforward tore-implement either Cliff or CUSP to generate Chimppdesigns. The advantages and disadvantages of Chimpp’sapproach are the usual ones of a lower-level vs. a higher-level framework. By not imposing a specific policy, a laCliff and CUSP, we permit designers more freedom andpotentially more efficient implementations. However,designers must think about the communication protocolthey want between blocks, something Cliff and CUSPspares them.

2.8 A Brief Discussion of VerilogSince this paper is targetted to a networking audience,

it is likely that some readers will be unfamiliar with theVerilog language and what it is like to develop hardwareusing Verilog. We briefly describe it here. Readers inter-ested in a fuller description should consult[9]. Readerswho are familiar with Verilog may skip this section.

Page 5: Chimpp: A Click-based Programming and Simulation ...Chimpp: A Click-based Programming and Simulation Environment for Reconfigurable Networking Hardware Erik Rubow, Rick McGeer, Jeffrey

4

Hardware Simulation Integration ofSystem Approach datapath supported HW/SW elements?

NetFPGA User-written Verilog Virtex-II FPGA Verilog NoNP-Click [8] Compiling high-level code to multicore

network processorsIXP1200 NP N/A N/A

CLIFF [3] Composing Verilog blocks representingClick elements on NetFPGA

Fixed Protocol Verilog No

CUSP [7] Composing VHDL blocks representingClick elements on NetFPGA, optimiz-ing parallelism and speculation

Fixed Protocol VHDL No

G [1] Compiling high-level rules into a NetF-PGA hardware datapath

(unknown) gsim, High-levelsimulation (notcycle accurate)

No

NetThreads [4] Compiling multi-threaded C code ontosoft processors implemented on NetF-PGA

Mutithreadedprocessor cores

N/A N/A

Chimpp Composing Verilog blocks representingClick elements on NetFPGA togetherwith software Click elements

Designer’s choice Omnet++ and Ver-ilog: multi-modesimulation

Yes

Table 1: Feature Comparison of NetFPGA Programming Environments

The syntax of Verilog is reminiscent of C with ex-tensive bit operations, but the semantics are very dif-ferent, reflecting the semantics of the implementationdomain. A hardware circuit is typically composed ofacyclic graphs of logic gates (AND, OR, NOT, etc),calledcombinational logic blocks, separated by banks ofstateholding elements calledflipflops. The flipflops aregoverned by clocks; they change value only on the ris-ing, or falling, edge of the clock. Conversely, logic gateschange state whenever one of their inputs changes state.

Verilog models this behavior with two separate formsof assignment operation. Thecontinuous assignmentop-eration, denoted by the keywordassign, operates as alogic gate – whenever a value on the right-hand side ofthe assignment statement changes, the left-hand side isre-evaluated. This means that there is no flow of controlin a Verilog program; everyassign statement is con-ceptually evaluated in parallel, all the time.

The second assignment operation issequential assign-ment. There are two forms,blocking assignment, de-noted by the= operator, andnon-blocking assignment,denoted by the<= operator. These statement both rep-resent flipflops, and thus only appear when guarded by aclock, represented by thealways guard, of the form:always @(posedge clk) c <= dwhich indicates thatc is a flipflop, clocked on the ris-

ing edge ofclk, inputd. This statement is only executedwhenclk has a rising edge.

Similarly,modules in Verilog have a superficial sim-ilarity to C routines, but the semantics are very different.A module in Verilog represents a chunk of hardware.Its parameters are physical wires running into and out of

the module. As with continuous assignment, all modulesin Verilog conceptually execute in parallel.

Note that while Verilog has syntactic similarities withhigh-level software languages, it does not hide the messydetails of hardware design from the developer. Synthe-sizable Verilog code must be designed with an idea ofwhat the synthesized hardware will look like. Timingconsiderations are particularly important to be mindfulof. The developer must consider, for example, at whichclock cycle a value will become available, and how tomanage the storage of other values in the meantime, par-ticularly if pipelining is required to maintain throughput.

Such being the case, it is clear that being able to reuseVerilog code is very important.

3 Design and implementationIn this section, we describe the design and implemen-

tation of Chimpp.

3.1 Mapping Click concepts to hardwareThe Click Modular Router is a software-based plat-

form. While Chimpp aims to map the Click approach tohardware, we borrowed only those concepts that madesense in the context of hardware and which yielded prac-tical benefit without unduly limiting the designer’s flexi-bility.

The element is the most fundamental Click concept.In Click, elements are C++ objects, belonging to a par-ticular element class. Click elements allow per-instancecustomization via configuration strings. This is similarto modules in Verilog, where parameters may be usedfor per-instance customization.

Click elements have some number of ports by which

Page 6: Chimpp: A Click-based Programming and Simulation ...Chimpp: A Click-based Programming and Simulation Environment for Reconfigurable Networking Hardware Erik Rubow, Rick McGeer, Jeffrey

5

they connect to other elements. These connections corre-spond to paths through which packets may flow. Packetflow is implemented in software as a virtual function callfrom one element to another, where one element passes apointer to a packet as a parameter to a method invocationon another element. This is the natural, intuitive way toimplement packet flow in software.

In hardware, there are many ways a designer mightchoose to have a packet flow from module to module,each with its advantages and disadvantages. In the caseof the NetFPGA reference framework, the datapath forpacket flow consists of a 64-bit data bus and 8-bit con-trol bus, with upstream and downstream flow control sig-nals. The entire contents of a packet flow through thisbus, preceded by some per-packet information (referredto as “module headers”) as indicated by the control bus.

However, one might prefer to use a different method,perhaps due to a different set of goals or constraints. Forexample, one might prefer to have a different bus width,or to extract the header fields for processing while stor-ing the payload to memory, or to divide packets intofixed-sized cells. A common interface for packet flowbetween hardware elements is needed, yet there is nosingle “right” interface. Note that Click’s push and pullconcepts are specific to procedural implementation; inhardware with bidirectional flow control, there is no suchdistinction.

As a packet flows through Click elements, some per-packet information may be attached to it. These arecalled annotations, and provide a way for elements tocommunicate to downstream elements information as-sociated with a packet, but external to the packet itself.A hardware implementation of an annotation would de-pend on the packet transfer mechanism. In the case ofthe NetFPGA reference framework, the so-called “mod-ule headers” that precede each packet are the equivalentof Click annotations.

Communication between Click elements is not re-stricted to these connections for packet flow. Elementsmay also export arbitrary method interfaces accessible byother elements. This communication is not representedby port connections. Elements may locate each other ei-ther explicitly by a name in the configuration string, orimplicitly by proximity in the graph. The latter methodis referred to as “flow-based router context”. A hardwareequivalent of these arbitrary method interfaces would bean arbitrary set of signals. We view flow-based routercontext as a way to work around the fact that method in-terface associations are not directly indicated in the lan-guage, and as it will be made clear later, we eliminate theneed for it in our framework.

Handlers provide a way for other applications to in-teract with Click elements. A Click handler can be readfrom or written to through the Linux proc file system.

Analogously, a hardware design may provide mecha-nisms for software interaction and control of hardwareelements. In the case of the NetFPGA reference frame-work, this interaction takes the form of register reads andwrites initiated by software on the host system.

The fact that Click configurations form a directedgraph, with potentially complex interconnections be-tween elements, can introduce certain complexities forhardware implementation. In software these connectionsbetween elements are implemented as a function call. Acomplex graph poses no particular difficulty for a soft-ware implementation – for example, when Click pathsmerge during push processing, nothing special needs tobe done.

However, there is an important difference betweensoftware and hardware datapaths. A software datapathoperates on abstractions such as function calls, objects,and pointers. A hardware datapath operates on data flow-ing between fixed physical modules over physical wires.

Complex graphs, as used in Click, therefore createproblems for hardware-based designs. In particular,they require arbiters wherever paths merge. Also, inan FPGA-based design, a complex graph could make itmore difficult for the development tools to do placementand routing.

Another difference arises from the fact that, whileClick is single-threaded, hardware elements all operatesimultaneously in parallel. Suppose we have multipleparallel paths which begin and end at the same element.In Click, as long as there is no switch between push andpull processing in between, packet order would be main-tained. In hardware, the parallel paths can have differentprocessing delays, so a packet could be forced to waitfor another packet which arrived after it did. If a packetmust pass through multiple arbiters during processing,then processing time could be highly variable.

In section 3.7 we discuss one design strategy that canbe used to mitigate these problems.

3.2 The Chimpp frameworkIn this section, we describe the abstractions and frame-

work of Chimpp, by which we generate a top-level Ver-ilog file from an element library and a user-suppliedscript.

In consideration of the fact that there is no single“right” interface for packet flow in hardware and that itis desirable to allow other forms of communication be-tween elements than packet flow, we chose not to requireChimpp elements to conform to any single standard in-terface. Instead, Chimpp provides a general method tospecify connections between compatible element inter-faces, defined in a library of Chimpp hardware elements.These connections include not only those by which pack-ets flow from element to element, but also any sort ofinter-element communication. This eliminates the need

Page 7: Chimpp: A Click-based Programming and Simulation ...Chimpp: A Click-based Programming and Simulation Environment for Reconfigurable Networking Hardware Erik Rubow, Rick McGeer, Jeffrey

6

for something like Click’s flow-based router context, andalso makes the design easier to understand, as there areno hidden element interactions. Furthermore, there areno required features that all elements must implement,and no restrictions on the structure of internal implemen-tations.

The Click language provides an easy and concise wayto instantiate and customize elements from an existingelement library, and interconnect them to form a directedgraph. The idea of instantiating and interconnecting ele-ments, rather than writing sequential code, actually mapsmuch more naturally to hardware than to software. Wehave created a simple language to concisely instantiateand interconnect Chimpp elements at a high level. Ascript written in this language, together with a library ofexisting elements, generates the top-level Verilog for agiven design. This approach enables the construction ofdesigns without the need to change any Verilog, providedthat all required elements exist in the library. Even forthose comfortable with coding in Verilog directly, thisprovides a valuable productivity benefit. Furthermore,the approach encourages the development of modular,reusable elements. Our language does not attempt topreserve the syntax of the Click language, particularlybecause it has different semantics; for example, unlikeClick, Chimpp connections do not have any notion of di-rection.

3.3 Bus and element typesThe Chimpp framework is built on the concepts of

typedbuses, elements, andinterfaces.A bus consists of a set of signals. A signal corre-

sponds to a single “wire” declaration in Verilog, whichmay have a width of one or more. An element is im-plemented as a Verilog module, which may optionallycontain submodules. Bus instances belong to named bustypes, and element instances belong to named elementtypes. Bus types specify a set of named signals, and listsome set of named interface types that may connect tothat bus type. Different interfaces may have different di-rections (input, output, or inout) for each signal in thebus. Element types specify a set of named interfaces,each belonging to a type specified by the correspondingbus type, along with a mapping between the local signalnames and the bus signal names. Bus types carry withthem an understanding of how the bus is to be used bythe various elements which interface with them.

The specific operational semantics of a particular bustype are not enforced by this framework, but it is ex-pected that element designers will take care that the op-erational semantics and assumptions about the meaningsof signal values are consistent across elements that usea common bus. For example, if there is a signal that in-dicates that there is data to be read, or that a module isnot yet ready to receive data, then individual elements

are expected to respond appropriately.

3.4 Connection rulesA few simple rules are used to validate connections

between elements (or rather, connections of element in-terfaces to a common bus instance):

• The interfaces must belong to the same bus type.• Each signal must have exactly one “output” driver

or one or more “inout” drivers.

Bus types could be used to prevent a connection be-tween element interfaces which have different opera-tional assumptions, even if the signals themselves arecompatible. For example, it might not be desirable to al-low an interface expecting IPv4 packets to be connectedto an interface which emits IPv6 packets. It is not a re-quirement that every output signal have an input signalto drive. However, a warning is presented to the user ifthis is the case.

Any useful design must have some connection to the“outside world.” In Click, these connections are embed-ded within certain elements. For example, the “ToDe-vice” and “FromDevice” elements send packets to, andreceive packets from, a given network interface. In a Ver-ilog design, the interface with the outside world consistsof the interface of the top-level module. This interfaceis platform-specific. This “top-level” module could bethe same top-level module used for synthesis, or it maybe some submodule within the design which is intendedto implement the core user logic. In any case, Chimpprepresents this interface to the “outside world” as a spe-cial type of element which we refer to as anenvironment.An environment element is instantiated in the same wayas other elements, and uses interfaces in the same way.However, it is treated differently during the Verilog gen-eration process, as it provides a skeleton for the top-levelfile rather than a module instance within the file. Exactlyone environment element must be present in a design.

3.5 Libraries and packagesChimpp has no standard or built-in element or bus

types. All types are library-defined, and it is easy to addnew types. The library is structured as a collection ofpackages. A package contains a group of related busand element definitions. For a particular design, a usermust specify which packages are to be used. This pro-vides some protection against naming conflicts, and alsomakes it easier to distribute and share library elements.Bus and element types are defined in simple XML files,which can be easily written and modified by hand. Fig-ures 1 and 2 show examples of these XML formats.Element types also come with one or more Verilog filesthat implement the element. Our tool does not generatethe Verilog that implements elements, only the Verilogthat instantiates and interconnects them. However, theseelement Verilog files could be generated by another tool

Page 8: Chimpp: A Click-based Programming and Simulation ...Chimpp: A Click-based Programming and Simulation Environment for Reconfigurable Networking Hardware Erik Rubow, Rick McGeer, Jeffrey

7

<?xml version="1.0"?><bus name="nf2_pkts">

<interface name="in"><signal name="data" type="input" width="64"/><signal name="ctrl" type="input" width="8"/><signal name="wr" type="input" width="1"/><signal name="rdy" type="output" width="1"/>

</interface><interface name="out">

<signal name="data" type="output"/><signal name="ctrl" type="output"/><signal name="wr" type="output"/><signal name="rdy" type="input"/>

</interface><interface name="spy">

<signal name="data" type="input"/><signal name="ctrl" type="input"/><signal name="wr" type="input"/><signal name="rdy" type="input"/>

</interface></bus>

Figure 1: An example bus definition<?xml version="1.0"?><element name="ipv4_decrement_ttl">

<interface name="clk" type="clock.in"><signal name="clk"/>

</interface><interface name="reset" type="reset.in">

<signal name="reset"/></interface><interface name="pkts_in" type="nf2_pkts.in">

<signal name="in_data"/><signal name="in_ctrl"/><signal name="in_wr"/><signal name="in_rdy"/>

</interface><interface name="pkts_out" type="nf2_pkts.out">

<signal name="out_data"/><signal name="out_ctrl"/><signal name="out_wr"/><signal name="out_rdy"/>

</interface></element>

Figure 2: An example element definition

from some high-level specification (such as G), and usedtogether with hand-written elements in the same design.

3.6 Language syntaxWe use a very simple language for specifying a partic-

ular design. There are three kinds of statements:1. Those that specify which packages to use2. Those that instantiate elements3. Those that connect element interfaces to each otherA short example will quickly illustrate these state-

ments. The example design in Fig. 3 contains threeelement instances: two of type someelem and one oftype debugelem. The environment, instantiated with thename env, is of type someenvironment. The only pack-age used in this example is called somepkg, so all busand element types must be specified in that package. Anelement instantiation statement (as indicated by the “::”token) may or may not include a list of Verilog param-eters enclosed in parentheses. The string enclosed inparentheses is not interpreted by this tool but is passedalong to the generated Verilog instantiation.

// some commentuse some_pkg;

env :: some_environment;elem1 :: some_elem(.SOME_PARAM(1));elem2 :: some_elem(.SOME_PARAM(2));debug :: debug_elem(.STR("debug"));

env.clk <=> *.clk;env.reset <=> *.reset;

env.data_in <=> elem1.in;elem1.out <=> elem2.in <=> debug.spy;elem2.out <=> env.data_out;

Figure 3: A simple Chimpp example

The most notable difference between this languageand the Click language is that connections in the Clicklanguage are directed, whereas here they are not. The“<=>” token indicates that the indicated interfaces areconnected to the same bus. Multiple such connectionsmay be made with a single statement. For example, theline “elem1.out<=> elem2.in<=> debug.spy” indi-cates that interfaceout of elementelem1, interfacein ofelementelem2, and interfacespy of elementdebugareall connected to the same bus. While their implementa-tions are specified here, it could be that the debug ele-ment is there for the purpose of monitoring and perhapsprinting the communications between elem1 and elem2during simulation.

Another way to connect multiple interfaces togetherwith a single statement is to use “*” as the element name,as in the line “env.clk<=> *.clk”. This would cause allinterfaces with the name clk to be connected to env.clk.Note that when we say there is no standard or built-in in-terface, we don’t make any exception for clock and resetsignals. Some designs may require multiple clocks, so itis up to the designer to specify these connections as well.

When the script in Fig. 3 is processed, a top-level Ver-ilog file is generated. All necessary signals are automat-ically named and declared, and all Verilog modules areinstantiated with the proper connections. Additionally,the implementing Verilog files for all elements used inthe design are copied into a working directory.

Notice that this framework is not specific to net-working. In fact, it is not domain-specific at all, andcan therefore be used in a wide variety of applications.The domain-specific and platform-specific aspects of thisframework are contained within the element librariesthemselves. Nevertheless, we have obtained a conciseand high-level way to instantiate and interconnect ele-ments, as in Click.

Page 9: Chimpp: A Click-based Programming and Simulation ...Chimpp: A Click-based Programming and Simulation Environment for Reconfigurable Networking Hardware Erik Rubow, Rick McGeer, Jeffrey

8

3.7 Design of a NetFPGA Element LibraryHere we discuss a few of our design decisions relating

to the initial development of a library of packet process-ing elements specifically for the NetFPGA platform.

The NetFPGA reference framework was designedwith the intent that all application-specific user logicwould be contained within a single module, which isnamed userdatapath. Within this module, the user logicis presented with interfaces to packet queues, to a registerbus, and to off-chip memory, hiding such details as DMAtransfers. Changes outside the userdatapath module areusually unnecessary. We decided to operate within thismodule, adopting it as our environment of choice andtaking advantage of the existing framework.

We used the same interface for packet transfer betweenelements as defined in the reference framework, consist-ing of a 64-bit data bus and an 8-bit control bus, withupstream and downstream flow control. Additional in-terfaces for inter-element communication include the ex-isting register bus and memory interfaces, as well as oth-ers that were defined as needed. The flexibility providedby our tool in defining new bus and interface types hasproven itself to be a useful and desirable feature.

In section 3.1 we discussed the problem of requir-ing arbiters for every path merge. To address this, wecame up with an alternative to having strictly dedicatedinterfaces for packet flow, which is to flatten the packetflow graph, to have all packets pass through all packet-processing elements, and to mark each packet with anindication of whether a particular element should pro-cess it or forward it unchanged. For this purpose wedefined a special packet annotation which we refer toas a “PTAG”. The PTAG is a 16-bit field that precedespackets as part of the first annotation word (or “mod-ule header”). It takes the place of a field that was pre-viously defined in NetFPGA as the word length of thepacket. The word length is used by the outputqueuesmodule, but can safely be overwritten and later recon-structed from the byte length field.

Each element which obeys the PTAG convention hasa PTAG associated with it, and only processes a packetif its PTAG matches the one contained in the packet an-notation. When an element processes a packet, it canchange the PTAG annotation to match that of the nextelement which should process it. These elements onlyrequire a single input interface and a single output inter-face for packet flow. The need for arbiters for every pathmerger goes away, and processing delays become moreconsistent.

A side effect of this approach is that only acyclicgraphs can be represented in this way. We consider thisa good thing, because cycles in the graph can introducedeadlock due to a flow control loop, unless special care istaken to avoid it. Note that this approach does not prevent

the use of other dedicated interfaces. The PTAG methodmay be freely combined with the method of having ded-icated interfaces. This may be desirable, for example, ifcertain packet streams are to receive higher priority treat-ment.

More attention will be given to the particular elementsthat we implemented in section 6.

4 Modular design with hardware and soft-ware elements

There are many functions of a network device that arebest implemented in software rather than hardware, ei-ther because hardware implementation is infeasible, toocostly, or yields a negligible performance benefit. There-fore software is often written in support of the hardwaredatapath. With modular and easily modifiable hardwaredesigns, it is desirable for the supporting software to alsobe modular and easily modifiable. One way to achievethis goal is to employ Click itself as the control planeof a hardware-based data plane. Click can not only op-erate as a control plane for the hardware, but also as anextension of the data plane for traffic that may requiresophisticated treatment.

There are two mechanisms that are provided by the ex-isting NetFPGA kernel module and reference frameworkfor communication between the host software and theNetFPGA hardware. First of all, the NetFPGA presentsitself as a NIC to the operating system. Thus packets aresent to and from the NetFPGA in the same way that theywould be sent to and from a NIC. Therefore, Click caneasily be configured to send packets to and receive pack-ets from the NetFPGA simply by using the appropriateinterfaces.

The other communication mechanism is for readingand writing 32-bit user-defined registers within the NetF-PGA. These registers can be embedded within individ-ual hardware elements. Each hardware element whichcontains software-accessible registers must be allocatedsome section of the register address space. Click soft-ware elements can be written to monitor and manipulatethe state of one or more hardware elements.

For example, a Click element could manipulate a rout-ing table contained within a hardware element. A soft-ware element can be associated with a hardware elementby providing the base register address of the hardwareelement in the configuration string. The individual regis-ters within that element could then be accessed by usinga known offset from the base address.

A useful practice would be to have special read-onlyregisters in each hardware element indicating the typeof element and a version number that software elementscan check to avoid possible configuration or consistencyproblems.

This modular approach to mixed hardware and soft-

Page 10: Chimpp: A Click-based Programming and Simulation ...Chimpp: A Click-based Programming and Simulation Environment for Reconfigurable Networking Hardware Erik Rubow, Rick McGeer, Jeffrey

9

ware design can make it much easier to develop and mod-ify these kinds of systems. Hardware and software ele-ment pairs can be co-designed to implement specific fea-tures, and their insertion into or removal from a largersystem can be done easily. Furthermore, this approachcan make the overall design easier to understand at a highlevel.

5 Simulation EnvironmentThe current practice for testing a NetFPGA design is

to first simulate the hardware in isolation using manuallygenerated stimuli. Then, if those tests are successful,the design is synthesized and loaded onto a NetFPGA,and tested with independently written software in a realnetwork. When things do not interoperate correctly in arunning system, it can be very difficult to debug. Incre-mental hardware changes can result in a long develop-ment cycle when synthesis is required after each change.Furthermore, the network topology scenarios that can betested are limited to those which are physically availableto the developer.

To address this, and to aid in the development of mod-ular hardware and software designs, we created an envi-ronment in which the hardware and software componentsof a design can be simulated together within a simulatednetwork topology. The foundation for this environmentis OMNeT++ [10], a popular discrete-event network sim-ulator.

5.1 Integrating the hardware simulatorIn order to integrate Verilog simulation into the net-

work simulator, it is necessary that the Verilog simula-tor be interactive: that is, the inputs can be determineddynamically at run-time and the outputs can be reportedto the network simulation kernel as they are generated.To satisfy this requirement, we used Verilator1, an open-source Verilog-to-C++ compiler. Verilator compiles asynthesizable Verilog design into a C++ class. This classprovides a simple way to manipulate the top-level inputs,allow the stimuli to propagate within the design, and readthe outputs. Aside from the fact that this enables the in-teractivity that we require, it also happens to be very fast,and the fact that it can be linked directly into the networksimulator is valuable for performance reasons.

We use Verilator to compile the userdatapath mod-ule, which contains the application-specific packet pro-cessing logic. A wrapper around the resulting class pro-vides an interface for injecting packets into the MAC andCPU queues, injecting register reads and write requests,advancing the clock, and receiving any generated out-put packets or responses to the register requests. Once apacket is injected into the wrapper object, it is fed intothe Verilator-generated object one 64-bit word at a time.The injection of a packet into the wrapper corresponds

1http://www.veripool.org/wiki/verilator

to the completed receipt of a full frame by the MAC orby DMA transfer. Packets are likewise received from theoutput queues 64 bits at a time, subject to external flowcontrol (for example, due to a simulated bandwidth limi-tation corresponding to that of a PCI bus). Once a packethas fully been received from an output queue, the wrap-per reports the event. Another wrapper around this wrap-per turns it into an OMNeT++ simple module.

We had to extend the OMNeT++ scheduler to supportthis integration. NetFPGA modules register themselveswith the scheduler during initialization. Then, beforeeach event is removed from the event heap, the clocksof all registered NetFPGA modules are advanced untileither the time of the next scheduled event is reached,or until a NetFPGA emits a packet or register access re-sponse. If the latter occurs, then all such events emittedduring that cycle are converted into objects which canbe handled by the simulator, and inserted into the eventheap, after which the new next event is removed and pro-cessed as usual.

To shed light on the internal operation of a NetFPGAmodule during simulation, it may optionally be config-ured to dump a waveform to a specified file. Addition-ally, Verilog’s $display system task is available. For ex-ample, this can be used to implement hardware elementssimilar to Click’s Print element, effective within the sim-ulation environment.

5.2 Integration with simulated softwareAfter integrating the Verilog simulator into the net-

work simulator, our next step was to integrate the soft-ware component as well. We explored two alternativeapproaches to this integration.

The first approach, in line with our goal of supportingdesigns with modular hardware and software elements,consisted of integrating Click into OMNeT++. BecauseClick had previously been integrated into the ns-2 simu-lator2, we were able to harness the existing nsclick inter-face, inserting the appropriate hooks and writing an ap-propriate wrapper to create an OMNeT++ simple mod-ule. A compound module could then be easily cre-ated by combining a NetFPGA module with a Clickmodule. Bandwidth and delay characteristics of hard-ware/software communication can be represented by set-ting OMNeT++’s datarate and delay parameters on theconnections between Click elements and hardware mod-ules. This setup is enough to simulate packet flows span-ning hardware and software. To add NetFPGA registeraccess support to Click within the simulation environ-ment, however, some special attention is needed. Spe-cial attention would also be needed to support elementswhich use user-level sockets.

Before discussing these issues for Click simulator in-tegration, we will discuss an alternative framework for

2http://www.isi.edu/nsnam/ns/

Page 11: Chimpp: A Click-based Programming and Simulation ...Chimpp: A Click-based Programming and Simulation Environment for Reconfigurable Networking Hardware Erik Rubow, Rick McGeer, Jeffrey

10

integrating software into a network simulation. For somedevelopers or applications, Click might not be the pre-ferred or ideal way to implement a software controlplane. Therefore, we considered a more general ap-proach: simulating a single-threaded program imple-mented as a C++ object.

OMNeT++ modules which represent user-level appli-cations are typically created for the purpose of generatingsynthetic network traffic, with characteristics that resem-ble real applications. In contrast, we would like to exe-cute real application code with real data in a simulatedenvironment.

There are a number of challenges associated with try-ing to run real applications within a network simula-tor such as OMNeT++; Some of these are discussedin [5]. One is the simple problem of control flow andprogram structure. OMNeT++ uses a callback function,handleMessage(), to deliver messages to simple mod-ules. However, real user-level applications are built onsystem calls, some of which may block until an eventof interest occurs. If there is more than one potentiallyblocking system call within a given application, then thismeans that there should be multiple entry and exit pointsat which the control flow transfers between the applica-tion module and the rest of the simulation environment.When control is restored to the application module, thestack should be in the same state in which it was beforethe blocking call was made. The callback mechanism ofOMNeT++ does not provide these semantics, and mod-ifying an application to fit this structure could requiresignificant code refactoring.

There is, however, a feature of OMNeT++ that canprovide these semantics. Simple modules have the optioneither to use the handleMessage() callback function or torun as a coroutine in an infinite loop. In the latter case,modules export an activity() function, which is the entrypoint at the beginning of the simulation. At any point,modules can call a receive(), function which yields con-trol to the simulation environment until the next messagefor that module is received, at which point the receive()function returns a pointer to the message. Each modulewhich uses activity()/receive() is allocated its own stack,which enables this kind of behavior. All context switchesbetween coroutines are cooperative, and only one exe-cutes at any given point in time.

The activity()/receive() approach is discouraged byOMNeT++ developers on the grounds that it requiresmore memory (for the stack) and that it can make forsloppy code in the hands of careless developers. How-ever, it serves our goals quite well, and since we areinterested in smaller-scale simulations, the memory re-quirements are not a significant concern.

We can use the activity()/receive() approach to simu-late an I/O-bound process with potentially blocking sys-

tem calls. We assume the process is I/O-bound and thatCPU processing time is negligible because, to the pro-cess, simulation time only advances during a receive()call. We consider this to be a reasonable assumption tomake. If a more accurate timing model is desired, wecould actually measure the elapsed time between block-ing calls and perform a simulated sleep for that amountof time. All receive() calls are made through a singlewrapper function which is prepared to process any kindof message, and which buffers events appropriately. Fora given blocking call, this wrapper function is called re-peatedly until the desired condition is met: for example,some receive buffer is not empty, or the response to aNetFPGA register read has been received, or a timer hasexpired.

We have used this framework to enable real applica-tions to operate in the simulated environment with verylittle special treatment in the source code. In particular,we can use the same source code for both implementa-tion and simulation, requiring only recompilation.

Because of incompatibilities of OMNeT++ and mul-tithreading, we limit our attention to single-threaded ap-plications. To facilitate simulator integration, we imple-ment the application as a C++ class. Applications canuse the standard sockets API for network communica-tion and a user-level NetFPGA library for register ac-cess. When compiled for use in a real environment, thesefunctions operate unhindered. When compiled for sim-ulation, these function calls are intercepted (by declar-ing class member functions with the same interface) androuted to the the associated application module in thesimulated environment to handle. This enables protocolssuch as TCP to operate in simulation time rather than realtime, since simulator implementations of TCP would usesimulator time for things like round trip time and time-outs.

Now that we have discussed this in detail, I willbriefly explain what is needed to provide register accessor socket support to Click while in simulation. Basi-cally, the same approach is needed because these are callsmade within Click which must return control to the sim-ulation environment. In the case of a register read, somesmall amount of time must pass for the NetFPGA to pro-cess the request and return the value. This requires thatwe use the activity() approach to enable this feature.

6 Experimental evaluationIn order to evaluate the effectiveness of our frame-

work, and to begin development of a useful element li-brary, we experimented with a few example designs. Inparticular, we attempt to validate several specific claimsmade in this paper:

1. Chimpp makes it easy to create and modify NetF-PGA designs.

2. Chimpp hardware elements can be easily integrated

Page 12: Chimpp: A Click-based Programming and Simulation ...Chimpp: A Click-based Programming and Simulation Environment for Reconfigurable Networking Hardware Erik Rubow, Rick McGeer, Jeffrey

11

Figure 4: A Simple IPv4 Router Example

Figure 5: A Router enhanced with NAT, IP-in-IP encapsulation, and hardware-generated ARP responses

with Click software elements in the same system.3. We can simulate these mixed hardware-software de-

signs within a simulated network.4. Performance in terms of packet-processing rate is

competitive and FPGA resource consumption iscomparable.

We developed several designs as part of our evalua-tion:

• An IPv4 router: to compare against the standardNetFPGA implementation.

• Hardware-based ARP support: as a modificationof the IPv4 router.

• NAT support: also as a modification of the IPv4router.

• IP-in-IP encapsulation and decapsulation: alsoas a modification of the IPv4 router.

6.1 IPv4 router designFigure 4 provides a graphical depiction of the hard-

ware portion of the simple IPv4 router composed ofChimpp elements. Several different bus types are rep-resented by the lines interconnecting elements. The di-rection of the arrows reflects the direction of informa-tion flow across these buses. The directions are depictedfor clarity; again, they are not represented in the lan-guage. The elements in the main packet pipeline havea notation indicating their behavior with respect to thePTAG convention discussed in Section 3. For exam-ple, the notation “2->3,7” indicates that the element pro-

Page 13: Chimpp: A Click-based Programming and Simulation ...Chimpp: A Click-based Programming and Simulation Environment for Reconfigurable Networking Hardware Erik Rubow, Rick McGeer, Jeffrey

12

cesses packets with PTAG equal to 2, and may thenmodify that PTAG to a value of 3 or 7. The elementsalong the top, which include “inarb” (input arbiter), “in-terfaceconf”, “routing table”, “arp table”, and “outqs”(output queues), each have a register interface. This al-lows the software to perform operations such as config-uring the MAC and IP addresses of each interface and toinsert routing table and ARP cache entries.

The routing table and ARP table are implemented asseparate elements from those that perform the lookup anddata transformation logic. The motivation for this sepa-ration was so that other tables could be easily pluggedin. For example, it may be desired to use a different ta-ble lookup algorithm or structure, or to store the table inoff-chip memory.

The “mapptag” element sets the PTAG according tothe source port, and the “mapdst port” element setsthe destination output port according to the source port.The purpose of both these elements is to send packetswhich originate from the host CPU directly out on theright network interface, and to send certain packets upto to host CPU (for example, non-IP packets or a lookupmiss). The input arbiter and output queues elements wereadopted from the NetFPGA reference library unmodi-fied. All other elements were written from scratch.

This simple router design has essentially the samefunctionality as the NetFPGA reference router. How-ever, the structure of it is very different. The referencerouter encapsulates all of the functionality between theinput arbiter and the output queues into a single modulecalled “outputport lookup”. It implements layer 2 andlayer 3 processing in a very tightly coupled way. The op-erations of filtering Ethernet addresses, validating the IPheader, performing the next hop lookup and ARP lookup,decrementing the TTL, recomputing the checksum, andmanaging packet flow to and from the host CPU are allperformed within this module, coordinated by a single“mastermind” module. The advantage to this approachis that many of these things can be done in parallel. Thedisadvantage is that it is very difficult to add new func-tionality to this design, or to extract portions of the de-sign for reuse elsewhere. In contrast, our design decou-ples these functions into finer-grained elements. Eachelement has a simple and well-defined operation to per-form, and the task of determining the correct output portis distributed across these elements.

6.2 Quantitative comparisonsWe can make a few quantitative comparisons between

our Chimpp router and the NetFPGA reference router.It should be noted that during the development of theChimpp router, there was no attempt to optimize any ofthese metrics, save that we required line rate operation.

1. Lines of Verilog: The only difference in Verilogcode between the two router designs is between the

input arbiter and the output queues. To implementthis portion, the Chimpp router used 2,093 lines ofVerilog, and the reference router used 3,096 linesof Verilog. Neither of these figures include the theuserdatapath module itself.

2. FPGA Resources: The reference router uses 46%of the slices on the FPGA (a slice is a basic unit ofreconfigurable logic) and 11% of the block RAMs.The Chimpp router uses 50% of the slices and 12%of the block RAMs. The higher slice utilization ap-pears to be due to the use of more distributed RAM,though there are fewer slices used as logic.

3. Processing Latency: The processing latencythrough the main pipeline (the section between theinput arbiter and the output queues) for the ref-erence router is 18 cycles, or 144 ns. The pro-cessing latency for the Chimpp router is 26 cycles,or 208 ns. This is due to the fact that the refer-ence router performs certain operations in parallel.The lower latency makes little difference, however;as the NetFPGA framework does not allow cut-through operation, large frames incur a store-and-forward delay of over 1500 cycles. The input ar-biter introduces its own latencies due to contention,and the biggest source of latency overall is likely tobe the output queues, even if buffers are relativelysmall.

4. Throughput: Both routers operate at line rate for allpacket sizes.

6.3 Integration with ClickThe Click portion of our router design essentially con-

sists of a basic router with an overall structure which issimilar to the hardware datapath. It looks like a plainClick router, but contains elements which are modifiedto reflect their state in their corresponding hardware el-ements. We modified (subclassed) two existing Clickelements and created one new element. The new ele-ment, called NF2AddrInit, takes as parameters the reg-ister block tag of the interfaceconf element and theMAC and IP addresses of those elements. This elementsimply initializes the appropriate registers when Clickloads. The other two elements, NF2ARPQuerier andNF2LinearIPLookup are equivalent to their superclassesARPQuerier and LinearIPLookup except that they pushtheir ARP tables and routing tables into their associ-ated hardware elements. This can be thought of as ahardware-accelerated Click router. We grouped theseNetFPGA-specific elements into a separate Click pack-age. By packaging up these specialized Click elementsseparately, they can easily be distributed along withpackages of Chimpp hardware elements.

6.4 Adding features to the routerWe will now discuss our experiments in extending the

functionality of the example router.

Page 14: Chimpp: A Click-based Programming and Simulation ...Chimpp: A Click-based Programming and Simulation Environment for Reconfigurable Networking Hardware Erik Rubow, Rick McGeer, Jeffrey

13

ARP Responder: The ARP responder is simple: itconsumes ARP requests, and when the request is forthe configured IP address of the interface on which thepacket was received, it generates an ARP response. Thiscan easily be inserted after the l3classify element, whichshould be configured to set the appropriate PTAG forARP requests. Due to the relative infrequency of ARPrequests, this will not yield a significant performance im-pact. However, it does demonstrate the ease with whichthe router can be extended.

This is an example of a feature which was alreadypresent in the Click configuration but which was pusheddown into the hardware. Figure 5 depicts a modifiedversion of the router which contains this ARP responder(labeled arpresp), as well as other additional elementswhich we will now discuss.

Network Address Translation: The next enhance-ment to the IPv4 router that we implemented was a Net-work Address (Port) Translation element. This NAT im-plementation was implemented purely in hardware usingsimple mechanisms.

Outgoing packets are identified by having a privatesource IP address and a public destination IP address.For these packets, the source IP address and TCP/UDPport number are hashed to a 15-bit value which is usedto replace the source port (We only use the upper halfof the port space for NAT). The source IP address is re-placed with the IP address of the external NetFPGA in-terface (for simplicity we assumed only one external in-terface). When the hash is performed, a reverse mappingof 15-bit port number to the internal IP address and portis installed in a directly indexed table.

Incoming packets destined to a port number for whichsuch an entry exists will have their destination IP addressand port number replaced with these internal entries.

The advantages to this implementation are that it is fastand that the table is small enough to fit in on-chip mem-ory. The disadvantages are that external port collisionscan occur, and it does not currently validate the source IPaddress for incoming traffic (this would be easy to do butwould require more storage). Once addresses have beenreplaced, the packet can be routed normally, so this el-ement should be placed before the nexthop lookup ele-ment. However, this demonstrates the point that we wereable to add this functionality simply by inserting an ele-ment into the design.

IP-in-IP encapsulation: The final enhancement to theIPv4 router that we have made thus far is simple IP-in-IP encapsulation and decapsulation. The desired behav-ior here is to conditionally insert a basic IP header withsome configured source and destination IP address. Atthe terminating node of this IP tunnel, the header shouldbe removed.

The decision about whether to apply encapsulation

or decapsulation of a packet is currently made stati-cally by generic classifiers (labeled ipencapfilter andip decapfilter). These classifiers are provided with setsof bitmasks and values to match on for one or more of thefirst eight words of a packet. In this way, any combina-tion of arbitrary field bits in the first 64 bytes of a packetmay be matched against specific values. In the case ofthe encapsulator filter, it checked for IPv4 packets with aparticular destination IP prefix. In the case of the decap-sulator filter, it checked for IPv4 packets with a protocolfield value of 4 (indicating IP-in-IP) and a specific desti-nation IP address.

These generic classifiers are fairly flexible, and the im-plementation is trivial. Ideally, however, we would likethese encapsulation and decapsulation elements to beconfigurable at runtime (for example, to correspond witha configured interface IP address). One possible way toachieve this would be to use dynamic generic classifiers,where the bitmasks and values to match against are con-tained in software-accessible registers.

During the development of all of these features, weused Verilator for individual unit tests and the mixedsimulation environment based on OMNeT++ for multi-element simulations, before attempting to synthesize adesign. The simulation environment proved to be quiteuseful, and the synthesized designs generally operatedexactly as expected. The ability to simulate arbitraryC++ programs had other uses aside from acting as a con-trol plane for the NetFPGA. For example we could writea program to transfer a file from one virtual node to an-other through the NetFPGA and write it to a file. Wecould do the same for multiple node instances, and ver-ify that the file was transmitted correctly.

7 Summary and conclusionsIn this paper, we presented Chimpp, a development

environment for reconfigurable network hardware, andits implementation for NetFPGA. Chimpp differs fromprevious attempts in that it permits a wide variety ofcommunication patterns and bus structures in the im-plemented design, handles communications with a Clicksoftware layer, and, as part of the Chimpp environment,incorporates common simulation tools. We demon-strated a sample design with Chimpp, the IPv4 router,and showed it had minor overhead in LUT count, latency,and no penalty in throughput compared to the hand-coded router design , and 33% less Verilog code. Wewere able to demonstrate easily adding enhancements tothe router: Network Address Translation, ARP response,and IP-in-IP encapsulation.

Chimpp is primarily designed to be a vehicle for net-work researchers wishing to use the NetFPGA platformfor experimentation. We believe that the value of an en-vironment like Chimpp is determined by its user com-munity. In particular, essential to Chimpp’s success is

Page 15: Chimpp: A Click-based Programming and Simulation ...Chimpp: A Click-based Programming and Simulation Environment for Reconfigurable Networking Hardware Erik Rubow, Rick McGeer, Jeffrey

14

a broad range of pre-written modules to permit networkexperimenters to rapidly build novel and interesting de-signs on the NetFPGA platform. Therefore, we will re-lease Chimpp and our library of Chimpp hardware mod-ules on the NetFPGA website, and encourage networkresearchers and NetFPGA users to use the environmentand contribute back any modules they have designed foruse with Chimpp.

References[1] M. Attig and G. Brebner. High-level programming of

the FPGA on NetFPGA. InProc. NetFPGA Developers’Workshop, August 2009.

[2] E. Kohler, R. Morris, B. Chen, J. Jannotti, and M. F.Kaashoek. The Click Modular Router.TOCS, 18(3):263–297, 2000.

[3] C. Kulkarni, G. Brebner, and G. Schelle. Mapping a Do-main Specific Language to a Platform FPGA. InProc.Design Automation Conference, pages 924–927, June2004.

[4] M. Labrecque, J. G. Steffan, G. Salmon, M. Ghobadi,and Y. Ganjali. NetThreads: Programming NetFPGAwith Threaded Software. InProc. NetFPGA Developers’Workshop, August 2009.

[5] C. P. Mayer, T. Gamer, and C. P. Mayer. Integrating realworld applications into omnet++, 2008.

[6] J. Naous, G. Gibb, S. Bolouki, and N. McKeown. A Pro-gramming Model for Reusable Hardware in NetFPGA. InProc. PRESTO, Aug. 2008.

[7] G. Schelle and D. Grunwald. CUSP: A Modular Frame-work for High Speed Network Applications on FPGAs.In FPGA ’05, pages 246–257, February 2005.

[8] N. Shah, W. Plishker, and K. Keutzer. NP-Click: A Pro-gramming Model for the Intel IXP1200. InProc. 2ndWorkshop on Network Processors, February 2003.

[9] D. E. Thomas and P. R. Moorby.The Verilog hardwaredescription language. Kluwer Academic Publishers, 101Philip Drive, Assinippi Park, Norwell, MA, 02061, fifthedition, 2002.

[10] A. Varga. The OMNeT++ discrete event simulation sys-tem. In Proc. European Simulation Multiconference,pages 319–324, 2001.