An Experimental Study of Network Performance Impact of Increased Latency in Software Defined Radios

Thomas Schmid, Oussama Sekkat, Mani B. Srivastava
Networked and Embedded Systems Laboratory
Electrical Engineering Department, University of California, Los Angeles
{thomas.schmid, osekkat, mbs}@ucla.edu

ABSTRACT

Software Defined Radios are becoming increasingly prevalent. They are a big success in the radio amateur community in particular, and the wireless industry also has considerable interest in their dynamic reconfigurability and other advantages. Our research focuses on the latency of Software Defined Radios and its impact on throughput in modern wireless protocols. Software Defined Radio systems often employ a bus to transfer samples from a radio frontend to the processor, which introduces a non-negligible latency. Additionally, the signal processing calculations on general-purpose processors introduce latencies not found in conventional radios. This work concentrates on one particular Software Defined Radio system, GNU Radio, an open source Software Defined Radio application, and one of its hardware components, the Universal Software Radio Peripheral (USRP), and analyzes its receive and transmit latencies. We use these measurements to characterize the performance impact on an IEEE 802.15.4 implementation in GNU Radio. Additionally, we present two Software Defined Radio implementations of short-range radio standards: an FSK scheme used in the Chipcon CC1000 radio, and the physical layer of IEEE 802.15.4. We use these implementations for round trip time measurements and introduce two sample applications, a physical layer bridge between the FSK scheme and IEEE 802.15.4, and a dual channel receiver that receives two radio channels concurrently.

Categories and Subject Descriptors C.2.1 [Computer Systems Organization]: Wireless Communication

General Terms Design, Experimentation

Keywords Software Defined Radio, IEEE 802.15.4, GNU Radio

1. INTRODUCTION
Software Defined Radio (SDR) has lately gained a lot of attention in the wireless industry as well as in academia. Recent advances in hardware design and the wide availability of very powerful processors allow the implementation of Software Defined Radio applications that were not feasible before. For example, the first software radio from the SpectrumWare Project [1] ran on a 133 MHz Pentium Pro processor and was not even able to process one AMPS cellular phone channel in real time. In 2001, Vanu Inc. utilized similar code running on a 1 GHz Pentium III processor and was able to run seven AMPS channels in real time. This was achieved not through better algorithms, but simply through advances in the underlying hardware itself. Earlier this year, Vanu Inc. presented the first FCC-approved multi-mode software radio base station, which combines GSM, iDEN, and CDMA in one device. This shows that Software Defined Radios have become a reality and today represent a very flexible and viable alternative to conventional radio systems.

There is still considerable academic and industrial research concerning Software Defined Radios. In [2], Joe Mitola proposed a software radio phase space, which is a good way to compare software radios and to describe directions of ongoing research. On one side, researchers are investigating new ways to build wideband radio frontends [3]. These efforts are generally concerned with hardware rather than software. Nevertheless, such frontends give an immense amount of flexibility through the tunability and control knobs they expose to software. Another area of interest is high-speed A/D converters [4], which reach the GSample/s range. The hope is that one day they will be fast enough to remove the need for a radio frontend altogether. Yet another area of research is "Cognitive Radios", introduced by J. Mitola in [5] and part of a larger academic field called "Cognitive Dynamic Systems" [6]. Cognitive radios are a model for wireless radio nodes that can dynamically change their radio components, including their modulation scheme. The goal is to adapt to a changing wireless environment and to exploit a larger spectrum of frequencies. These ideas were fostered by the finding of the United States Federal Communications Commission that while some frequency bands are heavily used, others are underused; additionally, frequency usage depends heavily on time and place. Thus, a cognitive radio could exploit these facts and use unused spectrum to improve its performance.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
WiNTECH'07, September 10, 2007, Montréal, Québec, Canada.
Copyright 2007 ACM 978-1-59593-738-4/07/0009...$5.00.

The SpectrumWare project showed that general-purpose computers can be used as Software Defined Radio platforms. One might wonder why one should use such a computer when more specialized hardware such as DSPs and FPGAs is available. Here are some of the reasons:

• Portability. Code written for one general purpose CPU can easily be ported to other architectures and operating systems, especially if the code is well abstracted and modularized. This gives increased flexibility and choice in hardware: one can use widely available hardware to fulfill the task, without being locked into a specific architecture or hardware platform.

• Availability of Development Tools. Development tools for general-purpose computers are more widely available, especially under open source licenses. This is very important for researchers who often need full access to the whole tool-chain in order to design new systems.

• Application Integration. If the Software Defined Radio code runs on the same computer as the application, then it can be better integrated into it. This can enable new models of how applications configure and use the communication primitives available to them (see J. Mitola III's dissertation [5]).

Many modern MAC protocols rely on specific receive and transmit latencies for different reasons, and impose deadlines by which certain actions must be completed. For example, in IEEE 802.11, a radio needs to wait for one Distributed Interframe Space period (DIFS, 128 µs) from the time it first senses the channel to be idle until it can actually send the message. In 802.11e, different values of DIFS are used to provide differentiated channel access to traffic of different priorities. Likewise, the acknowledgement packet in response to a data packet must be sent within a Short Interframe Space period (SIFS). Thus, tight control over latency and timing is necessary to meet these deadlines.
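To make the timing budget concrete, here is a small illustrative sketch (not from the paper): the DIFS value comes from the text above, while the SDR and conventional-radio turnaround figures are assumed order-of-magnitude placeholders that anticipate the measurements later in this paper.

```python
# Illustrative deadline check for MAC-layer interframe spacings.
# DIFS (128 us) is the value quoted in the text; the two turnaround
# latencies are assumed placeholders, not measured values.

DIFS_US = 128.0                 # Distributed Interframe Space deadline
SDR_RX_LATENCY_US = 1000.0      # ~1 ms: order of the USRP receive latency at 8 MS/s
CONVENTIONAL_LATENCY_US = 1.0   # hypothetical on-chip turnaround of a conventional radio

def meets_deadline(turnaround_us: float, deadline_us: float) -> bool:
    """True if the radio can react before the deadline expires."""
    return turnaround_us <= deadline_us

print(meets_deadline(CONVENTIONAL_LATENCY_US, DIFS_US))  # True: conventional radio fits
print(meets_deadline(SDR_RX_LATENCY_US, DIFS_US))        # False: SDR bus latency blows the budget
```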

In this article, we discuss two aspects of our current research in Software Defined Radios: throughput and latency for packet radio systems. Michael Ismert brushes the surface of this in [7], showing that an increase in the buffer size of their system results in a decrease in processing overhead, but an increase in the latency of the samples traveling from the A/D converter to the processor. Little attention has been paid to this inherent trade-off in Software Defined Radio solutions for general purpose computing systems. In our research, we quantify the increase in latency for a currently commercially available Software Defined Radio hardware platform and study its impact on higher layer protocols. More specifically, we focus on short-range wireless protocols, namely a simple FSK scheme used in the Chipcon CC1000 radio [1] and the IEEE 802.15.4 standard.

The rest of this article is structured as follows. Section 2 describes GNU Radio and the Universal Software Radio Peripheral (USRP) used in our research. Section 3 quantifies the sending and receiving latency of the hardware. Section 4 explains the short-range wireless protocols we consider and briefly introduces two applications that our implementations make possible: an SDR bridge that translates between two incompatible wireless standards, and an SDR basestation that receives two radio channels at the same time. Section 5 evaluates the impact of the measured hardware latency on the MAC protocols presented earlier and suggests changes to the hardware, software, and protocols to address these problems. Section 6 describes related work, and Section 7 gives concluding remarks.

2. Platform
Our research concentrates on short-range wireless radios. More specifically, we implemented the FSK physical layer found in the Chipcon CC1000 radio chip [1] and the physical layer of IEEE 802.15.4; for the latter, we consider a radio chip using that standard, the Chipcon CC2420 [1]. Both standards are widely used in the sensor network community: the popular Mica2 (CC1000 radio chip) and MicaZ (CC2420 radio chip) from Crossbow, as well as the T-Mote Sky from Moteiv, all use them. The next two sub-sections give a short introduction to GNU Radio and the USRP, the two tools we use in our research.

2.1 GNU Radio
GNU Radio [8] is a collection of open source software. Combined with minimal hardware, it allows the construction of radios, thus turning a typical hardware problem into multiple software problems. The main goal of GNU Radio is to facilitate the combination of signal and data processing blocks into powerful modulation, demodulation, or more complex signal processing systems. GNU Radio achieves this by providing simple signal processing primitives written in C++. By using SWIG, an interface compiler that allows easy integration of C/C++ into scripting languages, GNU Radio provides a simple interface to the signal processing blocks from within Python, a dynamic object-oriented scripting language. Thus, the Python code simply connects the signal processing blocks, which run at native speed without any interpretation.
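The connect-and-run model can be illustrated with a toy, pure-Python analogue. This deliberately mimics the idea only; the real GNU Radio API differs (its blocks are C++ and its scheduler streams samples continuously).

```python
# Toy analogue of GNU Radio's flowgraph model: a scripting layer that
# only wires processing blocks together, while the blocks do the work.

class Block:
    def work(self, samples):
        raise NotImplementedError

class MultiplyConst(Block):
    """Stand-in for a gain block (in GNU Radio this would be C++)."""
    def __init__(self, k):
        self.k = k
    def work(self, samples):
        return [self.k * s for s in samples]

class FlowGraph:
    def __init__(self):
        self.blocks = []
    def connect(self, *blocks):
        self.blocks.extend(blocks)   # chain blocks in order
    def run(self, samples):
        for b in self.blocks:
            samples = b.work(samples)
        return samples

fg = FlowGraph()
fg.connect(MultiplyConst(2.0), MultiplyConst(0.5))   # net gain of 1
print(fg.run([1.0, 2.0, 3.0]))   # [1.0, 2.0, 3.0]
```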

Several example applications are already included in GNU Radio to demonstrate its diverse signal processing tools, ranging from an HDTV decoder and an AM/FM broadcast radio en/decoder to simple AM, FM, and PSK modulation schemes. Additionally, there is an example implementation of a packet radio system that uses GMSK modulation and demodulation to transmit packets from one host to another. The problem with this system, however, is that GNU Radio doesn't provide good support for packet-based processing, since it is stream oriented. A DARPA project called "Adaptive Distributed Radio Open-source Intelligent Network (ADROIT)" [9] attempted to change this by implementing a new primitive in GNU Radio, called the "m-block" [10]. This primitive would allow a simpler implementation of block-based processing and permits annotating samples with metadata such as timestamps. By the time of writing this article (Spring 2007), the m-block implementation in GNU Radio was well advanced, but not far enough to be considered for our implementations.

GNU Radio by itself is not very useful, as it still needs some hardware to interface to the real world. Fortunately, GNU Radio supports several hardware platforms [11], such as sound cards or RF frontends that receive different bands of the RF spectrum. The most commonly used is the Universal Software Radio Peripheral (USRP), which we cover in more detail in the next section.

2.2 The Universal Software Radio Peripheral
Matt Ettus developed the USRP [12] as a flexible low-cost platform for software defined radios. It consists of a motherboard that holds the ADCs, DACs, and an FPGA for simple but bandwidth-consuming processing. Additionally, the FPGA is used to reduce the sampling rate so that the samples can be sent to a PC over a USB 2.0 connection. There are two Analog Devices AD9862 chips, each containing two 12-bit ADCs with up to 64 MS/s sampling rate. This allows an effective receive bandwidth of up to 32 MHz. The DACs have 14-bit resolution and run at up to 128 MS/s, allowing us to generate signals of up to around 50 MHz in bandwidth. From these figures we can see that the bottleneck is the USB 2.0 connection, which has a theoretical data rate of 480 Mbit/s. In practice it is worse: a benchmark program included in GNU Radio shows that the USB 2.0 bus can sustain about 32 MByte/s of continuous data throughput, thus limiting the transfer to a maximum of 8 MS/s of complex signals (16-bit I and Q channels).
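The 8 MS/s figure follows directly from the measured bus throughput; a one-line sanity check:

```python
# Back-of-the-envelope check of the USB 2.0 bottleneck described above.
usb_throughput = 32e6        # bytes/s sustained, per GNU Radio's benchmark program
sample_size = 4              # bytes per complex sample: 16-bit I + 16-bit Q
max_sample_rate = usb_throughput / sample_size
print(max_sample_rate)       # 8 million complex samples/s -> 8 MS/s
```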

3. Delay Measurements for the Receive and Transmit Paths
Current MAC layer protocols have stringent demands on delays and latencies. Conventional radio chips have no problem meeting them, because the logic is implemented near the physical layer processing units; thus, latencies in processing and bus transfers are negligible. The following paragraphs discuss two scenarios that motivate why we need to measure and characterize the latencies in the SDR system for our research.

GNU Radio does most of the processing on a general-purpose computer, and the computed samples need to be transferred over a bus system to the radio frontend. In the best case, all the samples can be precomputed and pushed down to the radio frontend, where a trigger can release them. This is currently not implemented in GNU Radio, but it represents an optimal system that achieves the smallest latency possible. Figure 1 depicts this scenario. However, the channel sensing logic is still executed on the CPU, and thus we incur the latency of the bus system at least once. This latency creates a "blind spot" during which another node could grab the channel, causing the SDR system to produce a collision.

Figure 2 shows a different scenario. We cannot always precompute the packet the radio is about to send, because it may depend on the content of the packet just received. One such example in a current MAC layer protocol is the CTS in 802.11: it must contain the receiver ID of the node the RTS came from, as well as a NAV period, which depends on the NAV period received in the RTS. Thus, the CTS needs to be calculated on the fly, and we incur the bus system latency at least twice. ACK packets have a similar problem: we can only send an ACK once we have fully decoded the message and checked it for errors, introducing a considerable calculation delay that needs to be characterized.

To study higher layer protocol performance, one needs to know the receive and transmit latencies of the USRP. In other words, we are interested in the time from generating a sample on the CPU until it is sent out through the USRP, and vice versa. To measure this latency, we used an external oscilloscope and the computer's parallel port. The parallel port has a maximum latency of about 1 µs (see footnote 1), which is, as we will see later, much smaller than the USRP latency and can thus be ignored. Note that the USB bus limits the sampling rate to 8 MS/s for 16-bit complex samples; we therefore only measure the latency up to that point.

The latency heavily depends on the buffering introduced in the chain between the ADCs and the processing blocks running on the computer. This buffering is necessary because GNU Radio is temporally decoupled from the USRP, i.e., GNU Radio doesn't care about the physical sampling rate and processes the samples as fast as possible until its buffers are either completely full or empty. It is the USB driver that rate limits the data going to the USRP and matches the USB transfer rate to the sampling rate. Therefore, the sampling rate influences the rate at which the buffers are filled or emptied and thus changes the latency. The USRP uses the USB in isochronous transfer mode to guarantee access to the full USB bandwidth. The downside of this mode is that the data is sent in packets over the bus, which introduces an additional latency component. We will quantify all these latencies for streamed data as well as for burst traffic in the following paragraphs. Our test system consists of a dual Pentium IV Xeon CPU clocked at 3.75 GHz with 2 GB of RAM. The system runs a Linux 2.6.17 kernel, and we used GNU Radio at Subversion revision 3941 with libusb version 0.1.12.

Footnote 1: The latency here is defined as the time from calling outb until the parallel-port pin actually toggles. We use the latest real-time scheduling additions of the Linux 2.6 kernel and run the processes at the highest priority (-20).

Figure 1: A conventional radio has the channel sensing part very close to the PHY layer processing; thus, latencies are small and negligible. In the ideal case, an SDR system can precompute the packet and load it into the RF frontend. But if the channel sensing is done on the CPU, it introduces a non-negligible latency that creates a "blind spot" that increases collisions.

Figure 2: Precomputation of a packet is not possible if it depends on the packet a radio just received. Thus, the bus system latency becomes very important and has to be as small as possible in order to minimize interframe spacings.

3.1 Receive Latency
The theoretic receive latency can be calculated as

l = l_USRP + l_USB + l_GNURadio,

where l_USRP is the latency introduced by the USRP hardware (from the output of the antenna to the FPGA), l_USB is the latency introduced by transmitting the data over the USB bus (from the data entering the USB bus at the FPGA until it is handed to the USB driver on the PC), and l_GNURadio is the latency introduced by the data processing on the computer. We can calculate the latency introduced by the USB, since a USB packet is only sent once a sufficient amount of data has been collected in the USRP buffer: the smallest allowed USB packet is 512 bytes, and the largest is specified by the user through two parameters, fusb_nblocks and fusb_block_size. Thus, the USB latency is

l_USB = f(512, fusb_nblocks × fusb_block_size) / (sample_size × f_s),   (1)

where f(x, y) depends on the amount of data in the buffer and is at least x and at most y, and f_s is the sampling frequency. Since we use complex 16-bit samples, the sample size is

sample_size = 2 × 16 bit = 4 bytes.
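Equation 1 can be evaluated numerically. The sketch below uses fusb_nblocks=8 and fusb_block_size=2048, the parameter values used in the measurements later in this section.

```python
# Numeric evaluation of Equation 1: min/max USB receive latency.
# The minimum corresponds to the smallest allowed USB packet (512 bytes),
# the maximum to a full user-configured buffer.

def usb_latency_bounds(fs, fusb_nblocks=8, fusb_block_size=2048, sample_size=4):
    """Return (min, max) USB latency in seconds for sampling rate fs."""
    min_bytes = 512                               # smallest allowed USB packet
    max_bytes = fusb_nblocks * fusb_block_size    # largest buffered amount
    return (min_bytes / (sample_size * fs),
            max_bytes / (sample_size * fs))

lo, hi = usb_latency_bounds(fs=8e6)
print(lo, hi)   # at 8 MS/s: 16 us minimum, 512 us maximum
```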

With the help of an external function generator we can quantify the receive latency. The function generator produces a square wave at 1 Hz, which is fed into the antenna port of the USRP and, at the same time, into the external oscilloscope. In GNU Radio we used a very simple signal-processing block to detect the low-to-high edges of the square wave and toggle a pin on the parallel port accordingly. This pin is connected to a second channel of the external oscilloscope, where we can measure the latency between the generated square wave and the square wave reproduced on the parallel-port pin. We set the USB transfer parameters to fusb_nblocks=8 and fusb_block_size=2048; these are reasonable values and give a good tradeoff between USB protocol overhead and USB packet payload at high data rates. Figure 3 illustrates the measured receive latency for different sampling rates and compares it to the theoretic minimum and maximum USB latency calculated using Equation 1. We report the median as well as the minimum and maximum measured latency, because the collected samples do not follow a normal distribution.
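The edge-detection logic can be sketched as follows. This is an illustrative pure-Python stand-in for the GNU Radio block, which toggles a parallel-port pin at each detected edge instead of recording indices.

```python
# Minimal rising-edge detector over a sample stream: the receive-latency
# measurement is triggered at each low-to-high transition of the square wave.

def rising_edges(samples, threshold=0.5):
    """Return the indices where the signal crosses from low to high."""
    edges = []
    prev_high = samples[0] > threshold
    for i, s in enumerate(samples[1:], start=1):
        high = s > threshold
        if high and not prev_high:
            edges.append(i)   # latency would be measured from this instant
        prev_high = high
    return edges

print(rising_edges([0, 0, 1, 1, 0, 0, 1]))   # edges at indices 2 and 6
```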

The behavior of the measured USRP receive latency is as expected. On average, the latency matches the maximum USB latency, i.e., each USB packet carries the maximum number of allowed bytes. The graph also shows that there is a maximum latency that can be met with high probability. However, this latency is very high, ranging from 1 ms for a sampling rate of 8 MS/s up to 30 ms for a sampling rate of 250 kS/s. This is too large for any of the current wireless protocol standards, especially as the CPU still has to do additional processing of the samples. In Section 5 we investigate the effects of this latency on MAC layers and give hints on how it can be decreased.

3.2 Transmit Latency
The transmit latency is slightly more complicated to explain, since there is a 32 kByte buffer between the GNU Radio code and the USB code for the USRP. GNU Radio immediately fills this buffer, because it is temporally decoupled from the USRP, i.e., it is not rate limited; the rate coupling happens in the USB driver code. Thus, with a continuous stream of transmit samples, each sample has to go through this 32k buffer before it is sent to the USRP. But GNU Radio can also operate in a different mode, where upper layer code generates packets that are converted into samples through the modulation process. These samples arrive in a burst to an empty buffer and are thus transmitted to the USRP as soon as there are enough samples to fill a USB packet.

We describe only one measurement for a continuous transmission, since the continuous streaming case is less interesting for our research. At a sample rate of 320 kS/s, it takes a complex 16-bit sample approximately 25 ms to go through the 32k buffer, because GNU Radio fills that buffer as fast as possible and keeps it full. There is another 8 kByte buffer in the USRP itself, which adds an additional 6.25 ms, giving a theoretic total of 31.25 ms. We measured this latency with a setup similar to the receive-latency measurement. This time, GNU Radio generates a square wave, and when the waveform goes high or low, the code toggles one of the parallel-port pins. The USRP output and the parallel port are connected to two different channels of an external oscilloscope, where we can measure the delay between the two waves. The median latency for a signal sampled at 320 kS/s is 32.9 ms, the minimum measured latency 28.9 ms, and the maximum 36.9 ms. This matches our theoretic result.

Figure 3: Theoretic maximum USB latency vs. measured RX latency for fusb_nblocks=8 and fusb_block_size=2048. A higher sampling rate fills the buffers faster and thus decreases the latency.

Figure 4: Transmit latency measurements for different sampling rates. The measurements are for bursty transmissions.
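The buffer-traversal figures above can be reproduced with a short calculation. Note that the text's numbers (25 ms and 6.25 ms) match if "32k" and "8k" are read as decimal kilobytes; this reading is an assumption on our part.

```python
# Time a sample spends crossing a full transmit buffer at a given
# sampling rate, for the continuous-streaming case described above.

def buffer_delay(buffer_bytes, fs, sample_size=4):
    """Seconds to drain buffer_bytes of complex samples at rate fs."""
    return (buffer_bytes / sample_size) / fs

fs = 320e3                        # 320 kS/s, as in the measurement
print(buffer_delay(32_000, fs))   # ~25 ms for the 32k host-side buffer
print(buffer_delay(8_000, fs))    # ~6.25 ms for the 8k USRP buffer
```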

The more interesting case is the burst arrival of samples. In this case, the samples need to go through neither the 32k buffer nor the 8k buffer, since both should be empty, and the delay is reduced dramatically. Figure 4 depicts the measured latencies. We can see that there is a minimum latency of 200 µs; we assume this is the approximate time it takes for the data to get from user space into the USB driver, since it is the same for all sampling rates.

As for the receive latency, we can define a maximum transmit latency that can be met with high probability; it ranges from 240 µs at 8 MS/s up to 730 µs at 250 kS/s. Again, this latency does not include any preprocessing of the data, which would have to be included in the latency calculations of a specific protocol implementation. As with the receive latency, the transmit latency is too high to meet the specifications of current wireless protocols.

4. Physical Layer Implementation and Applications
We implemented two short-range wireless physical layer standards commonly used in sensor networks to study the impact of physical layer processing on a general purpose processor on the latency and round trip times of a packet-based radio. The first physical layer uses a simple FSK modulation scheme, and the second is the physical layer of IEEE 802.15.4, which uses an O-QPSK modulation. Both implementations are based on the GMSK packet radio example available in GNU Radio. For more details on the implementation, see [13].
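The despreading step in the IEEE 802.15.4 receive path can be illustrated with a minimum-Hamming-distance decoder. The 4-chip table below is made up purely for illustration: the standard's 2.4 GHz PHY actually uses 32-chip PN sequences, and, as noted in Section 4.1, our implementation requires an exact chip match rather than nearest-distance decoding.

```python
# Illustrative despreading: map a received chip sequence to the symbol
# whose spreading sequence is closest in Hamming distance.
# CHIP_TABLE is a made-up 4-chip example, NOT the 802.15.4 sequences.

CHIP_TABLE = {
    0b00: (0, 0, 1, 1),
    0b01: (0, 1, 1, 0),
    0b10: (1, 1, 0, 0),
    0b11: (1, 0, 0, 1),
}

def despread(chips):
    """Return the symbol with minimum Hamming distance to the received chips."""
    def dist(a, b):
        return sum(x != y for x, y in zip(a, b))
    return min(CHIP_TABLE, key=lambda sym: dist(CHIP_TABLE[sym], chips))

print(despread((0, 1, 1, 0)))   # exact match -> symbol 1
print(despread((1, 1, 0, 1)))   # one chip error -> still decodes to symbol 2
```

An exact-match decoder, as in our implementation, would instead reject any sequence not present in the table, which explains the packet losses reported in Section 4.1.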

4.1 Implementation Verification
We tested both implementations with the Crossbow MicaZ and Mica2 motes, which feature the Chipcon CC2420 [1] and CC1000 [1] radio transceivers, respectively. On the motes we run SOS [14], an operating system for mote-class wireless devices developed at UCLA. The network stack on the MicaZ mote is the proprietary Chipcon stack implementation, which is compliant with the IEEE 802.15.4 standard. On the Mica2 we use the default SOS stack for the Mica2 mote.

We created two test scenarios to test the transmitter and the receiver code. In the first scenario, we programmed a mote to regularly send out a message, which allowed us to verify that the receiver works correctly according to the standard. The messages were very simple, and the pure MAC layer payload consisted of 27 bytes; thus, the total number of bytes sent over the physical channel was 45 bytes for the MicaZ and 37 bytes for the Mica2 (not counting the synchronization sequence at the beginning of the message). The messages were sent at an interval of 100 ms. We conducted five trials, each time sending approximately 1000 messages, and counted the number of messages successfully decoded by our GNU Radio implementation. For comparison, we also equipped a second mote with base-station code and recorded how many messages it received. On average, the GNU Radio IEEE 802.15.4 code received 92.8% of the messages the base-station MicaZ mote received. This number is expected, because the GNU Radio code doesn't allow any errors in the spreading sequence. The results for the GNU Radio FSK code are slightly better, 94.2%, though the native Mica2 base-station still exceeds the GNU Radio implementation's performance.

The second test checked the transmit code of the GNU Radio implementations. We sent out the same message the mote produced, this time with the GNU Radio system. A second computer with a second USRP and daughter-board received the messages; additionally, a base-station mote also received the messages, and we counted them as well. The messages were sent out at an interval of 500 ms. On average, the GNU Radio IEEE 802.15.4 code successfully decoded 98.6% of the messages the base-station mote received. This shows that the transmitting code works and that both the transmit and receive code can communicate with other IEEE 802.15.4 compliant transceivers. Again, the results for the Mica2 code are very similar, at 98.8%.

4.2 Round Trip Time Measurements of the SDR Protocol Implementations
The round trip time of a radio is an important measure for upper-layer protocols. It defines the time for a basic message exchange between two communication partners and sets limits on congestion backoffs, interframe spacing, and throughput. We compared measurements for the two SDR implementations and for the commercial radios. Figure 5 illustrates the measured times. We can see that the SDR implementation of the FSK protocol is about 8 times faster than the conventional Mica2 platform. This is not astonishing, because the radio chip of the Mica2 is byte oriented and leaves much of the processing to the microcontroller: it has to search for the start of the message itself and spend a considerable amount of processing power on it, which in turn results in a higher round trip time.

Figure 5: Measured round trip times for the conventional radios and the SDR implementations.

Page 6: An Experimental Study of Network Performance Impact of ... cores.ee.ucla.edu/images/c/c8/Wintech2007.pdf · Software Defined Radio solutions for general purpose computing systems

For the IEEE 802.15.4 implementation it is the other way around. Here, the conventional chip does most of the processing and, once a packet is received, sends the whole packet to the microcontroller. Thus, the processing is much faster and results in a 3 times shorter round trip time. Additionally, the round trip time is very deterministic, i.e., there is little variability in the measured times. The SDR implementation, on the other hand, has to do a considerable amount of computation to despread the data and retrieve the message, which introduces a considerable delay into the communication chain.

The round trip time measurements show that we definitely need to reduce the processing delay, and must not concentrate solely on the hardware latency. For example, from the previous sections we can infer that the minimal average round trip time without any data processing for the IEEE 802.15.4 SDR implementation should be on the order of 3 ms at a sampling rate of 4 MS/s. However, we measured an average round trip time of 26.5 ms, which means that the processing, i.e., the modulation, spreading, demodulation, and despreading of the packet, introduced an additional 23.5 ms. Therefore, we can improve the performance dramatically by implementing more efficient algorithms and by using other performance enhancements on the computer side, such as parallelization.

5. Implication of Physical Layer Delays on the MAC Layer, and Possible Solutions
In the previous section we showed the interaction between our SDR implementations and current wireless sensing system hardware. However, this hardware does not use any IEEE MAC layer standard. We will therefore next investigate the impact of longer latencies on such a standard, namely IEEE 802.15.4, and discuss possible solutions to the problems that arise.

5.1 Impact on IEEE 802.15.4
The default turnaround time in IEEE 802.15.4 is defined as 12 symbol periods for a short interframe spacing (SIFS) and 40 symbol periods for a long interframe spacing (LIFS). One symbol period in the 2.4 GHz band is 16 µs, which makes the receive-transmit turnaround time 192 µs or 640 µs, respectively. The round trip time of a packet radio is defined as RTT = 2×RX + 2×TX, and from our round trip time measurements in Section 4.2 we know that the average round trip time of our IEEE 802.15.4 physical layer implementation is 26.5 ms. This translates to a receive-transmit delay of 13.25 ms, or roughly 828 symbol periods, more than 20 times a LIFS. Thus our current implementation could never meet the timing requirements of IEEE 802.15.4.
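The timing arithmetic above can be checked with a few lines of Python (using the stated 16 µs symbol period and the RTT decomposition from Section 4.2):

```python
# Sanity-check the IEEE 802.15.4 timing figures quoted above.
SYMBOL_PERIOD_US = 16.0  # one symbol period in the 2.4 GHz band

sifs_us = 12 * SYMBOL_PERIOD_US  # short interframe spacing
lifs_us = 40 * SYMBOL_PERIOD_US  # long interframe spacing

rtt_ms = 26.5                       # measured average SDR round trip time
rx_tx_delay_us = rtt_ms * 1000 / 2  # RTT = 2 * (RX delay + TX delay)
symbols = rx_tx_delay_us / SYMBOL_PERIOD_US

print(f"SIFS = {sifs_us:.0f} us, LIFS = {lifs_us:.0f} us")
print(f"SDR receive-transmit delay = {rx_tx_delay_us:.0f} us "
      f"({symbols:.0f} symbol periods, {rx_tx_delay_us / lifs_us:.1f}x LIFS)")
```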

We used ns-2 [15], version 2.30, to study the impact of a longer receive-transmit turnaround time in an extended IEEE 802.15.4 protocol. Ns-2 includes a full implementation of IEEE 802.15.4, written by J. Zheng [16]. The scenario for our simulations included three nodes. Two of the nodes started sending constant bit rate traffic to the third node at time 7 s. We let the simulation run for a total of 1000 s, after which we calculated the total throughput and delivery rate over that period.

Figure 6 illustrates the results of the simulation for different receive-transmit turnaround times. We can see that under a high network load, an increase in the turnaround time decreases the total throughput of the network, because of the longer idle times between packets. On the other hand, if the offered packet rate is lower than the network's capacity, the larger turnaround times do not matter and the throughput stays the same.
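The qualitative effect can also be seen in a first-order analytic model (our own illustration, not the paper's simulation): under saturation, each packet occupies the channel for its time on air plus the turnaround, so the achievable throughput shrinks as the turnaround grows.

```python
# First-order model of saturated throughput versus RX-TX turnaround.
# Assumptions (illustrative only): back-to-back packets, no backoff or
# collisions; 250 kbps raw bit rate as in 802.15.4 at 2.4 GHz.

def saturated_throughput_kbps(payload_bytes, bitrate_kbps, turnaround_us):
    tx_time_us = payload_bytes * 8 / bitrate_kbps * 1000  # time on air
    packets_per_s = 1e6 / (tx_time_us + turnaround_us)
    return packets_per_s * payload_bytes * 8 / 1000

for t_us in (192, 640, 13250):  # SIFS, LIFS, measured SDR delay
    print(f"turnaround {t_us:>6} us -> "
          f"{saturated_throughput_kbps(100, 250, t_us):6.1f} kbps")
```

Even this crude model shows the trend: a millisecond-scale turnaround dominates the time on air of a short packet and collapses the saturated throughput.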

5.2 How to Solve the Problem If Added Delay Decreases Throughput
In the last section we showed that we are not able to meet the short deadlines of current wireless protocols with the current version of the USRP. We also showed that relaxing these deadlines works, but significantly reduces the maximal throughput. In this section we discuss possible solutions to these problems. In general, the solutions are of two kinds: either we change the hardware/software architecture in order to comply with current protocols, or we modify protocols to cope with the restrictions imposed by SDR systems.

5.2.1 Hardware / Software Architecture Changes
There are multiple parts in hardware and software that can be improved in order to get better performance. The following list is not exhaustive and mentions just some of the more obvious solutions:

• USB: Instead of USB 2.0, the USRP could use a different bus technology with higher bandwidth and shorter latencies to the CPU. One possible candidate is PCI Express, which offers a bandwidth of up to 8 GByte/s and could use DMA to put the samples directly into memory. However, most bus architectures found in general purpose computers are optimized for throughput, whereas an SDR system requires excellent latency as well as throughput performance. Additionally, the measurements we presented in Section 4 show that even if the bus latency could be reduced to zero, a large latency contribution from processing the samples on a general purpose CPU would remain.

• RSSI gating and m-block: An enhancement already in development in GNU Radio is the message block (m-block). This new component will allow annotating blocks of samples with metadata such as timing information. With this mechanism, one will be able to instruct the USRP at what time it should send out which samples. Another possibility, driven by the scenario depicted earlier in Figure 1, would be to link the USRP's FPGA to the RSSI circuit and annotate the samples with specific RSSI characteristics instead of timing information. The USRP then sends out the samples once the measured RSSI meets these characteristics, instead of doing the channel sensing in the CPU itself. This mechanism allows for a very simple implementation of channel access methods, similar to what current wireless protocols do. However, it still cannot be used for packets that depend on earlier received packets, since those cannot be precomputed.

Figure 6: Experimental throughput study of IEEE 802.15.4 in ns-2 for different RX-TX turnaround times.
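The RSSI-gating mechanism described in the bullet above can be sketched in a few lines (a hypothetical model of the frontend-side logic, not GNU Radio's actual m-block API): precomputed samples wait in a buffer near the radio and are released as soon as the measured RSSI indicates a clear channel.

```python
# Hypothetical sketch of RSSI-gated transmission: the host queues
# precomputed sample blocks, and the frontend releases them only when
# the measured RSSI falls below a clear-channel threshold, so channel
# sensing never round-trips through the host CPU.

class GatedTransmitBuffer:
    def __init__(self, rssi_threshold_dbm):
        self.rssi_threshold_dbm = rssi_threshold_dbm
        self.pending = []  # sample blocks queued by the host

    def queue(self, samples):
        self.pending.append(samples)

    def on_rssi_sample(self, rssi_dbm):
        """Called per RSSI measurement; returns samples to send, or None."""
        if rssi_dbm < self.rssi_threshold_dbm and self.pending:
            return self.pending.pop(0)  # channel is clear: send immediately
        return None
```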

• Optimizing via precomputation: We have mentioned precomputation several times as a means to decrease latency. The goal is to calculate the samples that need to be sent over the air ahead of time, before the message needs to be sent. The samples are transferred to the radio frontend and stored in a buffer. That way, when the radio frontend receives an interrupt from the general purpose processor, or when it detects some specific feature during channel sensing, the samples can be sent immediately, without the transfer over a bus system. An extension of this scheme would be to store partial packet fragments that always stay constant in a buffer close to the radio frontend. The processor then only needs to compute the parts that change, which saves processing time as well as bus transfers.
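The fragment-caching variant of precomputation could look roughly like this (an illustrative sketch with assumed helper names; `modulate` stands in for whatever bytes-to-samples function the physical layer uses):

```python
# Illustrative sketch: constant packet fragments (e.g. header, trailer)
# are modulated once and cached; only the changing payload is modulated
# at send time, saving CPU time as well as bus transfers.

class FragmentCache:
    def __init__(self, modulate):
        self.modulate = modulate  # assumed function: bytes -> list of samples
        self._cache = {}

    def samples_for(self, fragment: bytes):
        if fragment not in self._cache:
            self._cache[fragment] = self.modulate(fragment)
        return self._cache[fragment]

def build_packet(cache, header: bytes, payload: bytes, trailer: bytes):
    # Header and trailer samples come from the cache; payload is fresh.
    return (cache.samples_for(header)
            + cache.modulate(payload)
            + cache.samples_for(trailer))
```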

• Offload frequently used code into hardware: Some of the most compute-intensive tasks in an SDR system are the filters, and general purpose CPUs are not designed to calculate these filters efficiently. Therefore, it would make sense to push some general filter algorithms, such as low-pass, high-pass, band-pass, and frequency-translating filters, down into the hardware itself. Fortunately, these filters come early in the processing chain and are thus ideal candidates to be run on the USRP's FPGA before the data is transferred to the computer.

5.2.2 Protocol Changes
Another solution to the increase in hardware latency is to design protocols that do not have such hard deadline constraints. The following is a non-exhaustive list of possible protocols, including changes to existing protocols, that could solve the problem:

• TDMA: Using a TDMA protocol would solve most of the latency problems. However, it also introduces new problems that need to be solved, such as synchronization between the participating nodes. Additionally, the m-block implementation would be needed to guarantee that samples are sent out at a specific time. This is currently not possible in GNU Radio, but should be supported in future versions.

• Universal header coding: One could imagine that the headers and ACKs use a very simple modulation scheme that is the same across all the different protocols. This would allow implementing the demodulation in hardware, and thus a quick ACK or CTS response could be guaranteed. A similar scheme is already used today in IEEE 802.11, where headers and beacons are sent at 1 Mbps, and only after the header does the modulation switch to the specific higher speed.

• Delayed ACK: The increased latency is a big problem for the ACK reply and the RTS/CTS exchange. For the latter we currently have no solution, because we need to decode the RTS on the CPU in order to get the address and timing information. Additionally, the RTS reserves the channel for a certain amount of time during which no one except the node addressed in the RTS can use it. Thus, we have to accept the latency increase. For ACKs we have more liberty; for example, we can delay them to a later point in time. A hypothetical protocol could look like this: assume a protocol similar to IEEE 802.11, i.e., the channel needs to be clear for at least a DIFS period before one can try to acquire it. Unfortunately, the SDR implementation cannot reply to a message until 1.5×DIFS, i.e., if someone else tries to acquire the channel in order to send a message, the ACK cannot be sent immediately after the message. Therefore, if no one needs the channel (low channel utilization), we can afford the latency and send the ACK after 1.5×DIFS. Otherwise, if someone acquired the channel to send a message, the ACK-ing node waits until the end of that transmission and then grabs the channel at 0.5×DIFS, i.e., it has priority over all the other nodes. Note that we did not do any formal analysis of such a protocol. It is simply an idea of how one might solve the latency problem, and it certainly needs much more work to determine its feasibility.
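The scheduling rule of this hypothetical delayed-ACK protocol can be written down compactly (the DIFS value below is an assumed placeholder for illustration, not taken from any standard):

```python
# Sketch of the hypothetical delayed-ACK rule described above:
# - idle channel: send the ACK at 1.5 x DIFS after the message, and
# - busy channel: wait for the ongoing transmission to end, then send
#   at 0.5 x DIFS, which gives the ACK priority over ordinary
#   contenders (who must wait a full DIFS).

DIFS_US = 128.0  # assumed DIFS duration in microseconds, illustrative only

def ack_send_time_us(now_us, channel_busy_until_us=None):
    """Return the time at which the ACK should be transmitted."""
    if channel_busy_until_us is None:
        return now_us + 1.5 * DIFS_US  # late, but the channel stayed free
    return channel_busy_until_us + 0.5 * DIFS_US  # priority slot
```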

6. Related Work
S. Valentin et al. describe in [17] the delays measured in their GNU Radio testbed. They state that the USB bus transfer and USRP calculations can be neglected. As we showed in Section 3, this is not the case. Unfortunately, [17] does not mention the sampling rate at which they ran the USRP. However, if we assume that they ran it at 8 MS/s, then from our measurements the total USB latency is 2×(RX latency + TX latency) = 1.6 ms. This is close to the 1.4 ms delay Valentin found for their setup.

The SpectrumWare project had its own interface card, the GuPPI [7]. This card used the PCI bus and DMA for fast memory access. The minimum input latency they could achieve was a little over 1 ms. Additionally, the GuPPI project used virtual memory tricks to decrease the OS overhead of memory access. It remains to be investigated whether such tricks could also decrease the latency in GNU Radio.

The CalRADIO 1 [18] from UCSD takes a different approach. Its goal is to provide a general development platform for layers 2-7 of 802.11b. The physical layer is fixed and implemented in the Intersil (Conexant) Prism chip. Everything else is done in software on a Texas Instruments C5471 DSP, which contains an ARM processor.

There are many other SDR platforms available, commercially, such as the XMC-3321 from Spectrum Signal, and in academia, such as the MIFE system from EPFL [19]. Unfortunately, a discussion of each of these systems is beyond the scope of this publication.

7. Conclusion
SDR platforms have emerged as a flexible alternative to conventional radios, but their very nature of pushing the physical layer processing into general purpose processors increases the latency through the involved interconnects. This latency needs to be characterized and should be as small as possible. We conducted such latency measurements for the receive and transmit path of the USRP in order to study the impact on MAC layer protocols. We found that the minimal receive latency is about 600 µs and the minimal transmit latency about 200 µs. These latencies are already too long for the IFS requirements of modern MAC protocols, and they do not yet include any computation in the processor itself, which increases the latency even more.

In order to study the impact of larger latencies, we implemented two short-range radio standards, the physical layer of IEEE 802.15.4 and a simple FSK scheme. These two implementations allowed us to assess the latency including modulation/demodulation and packet processing. In our experimental analysis we found that the round trip time of our IEEE 802.15.4 SDR implementation is 25 ms on average and 50 ms at maximum. This is slow compared to a conventional radio chip, which achieves a round trip time of 8 ms, and shows that latency is a big issue. As a consequence, we studied the impact of an increase in latency on the MAC layer. We found that the throughput of IEEE 802.15.4 decreases by about 30% under heavy network load. Finally, we presented ideas on how to solve this problem. The solutions are twofold: either change hardware and software in order to meet the latency requirements of existing MAC protocols, or modify and adapt the wireless protocols themselves to cope with the problems introduced by SDR systems.

The two physical layer implementations in GNU Radio allow us to do further research on different aspects of Software Defined Radio. The two sample applications, a dual-channel radio and a physical layer bridge, give a first impression of the potential of Software Defined Radios in wireless sensing systems. The field of cognitive radios is growing rapidly, and we plan to integrate cognitive radio principles, such as dynamic channel and modulation selection, into our software solution.

8. REFERENCES
[1] D. Tennenhouse, V. Bose. The SpectrumWare Approach to Wireless Signal Processing. Wireless Network Journal, 1996.
[2] J. Mitola. Software Radio Architecture: a Mathematical Perspective. IEEE Journal on Selected Areas in Communications 17(4) (1999) 514-538.
[3] S. Lang, B. Daneshrad. Design and Implementation of a 5.25 GHz Radio Transceiver for a MIMO Testbed. Wireless Communications and Networking Conference, 2005.
[4] K. Lundberg. High-Speed Analog-to-Digital Converter Survey. Unpublished, 2005.
[5] J. Mitola. Cognitive Radio. Licentiate Thesis, Dept. of Teleinformatics, Royal Institute of Technology, Sweden, 1999.
[6] S. Haykin. Cognitive Dynamic Systems. Under preparation, 2007.
[7] M. Ismert. Making Commodity PCs Fit for Signal Processing. USENIX, 1998.
[8] GNU Radio. http://www.gnu.org/software/gnuradio/, May 2007.
[9] ADROIT website. http://acert.ir.bbn.com/, May 2007.
[10] ADROIT: GNU Radio Architectural Changes. http://acert.ir.bbn.com/downloads/adroit/gnuradio-architectural-enhancements-3.pdf, May 2007.
[11] Supported GNU Radio hardware. http://comsec.com/wiki?GnuRadioHardware, May 2007.
[12] Ettus Research LLC. Universal Software Radio Peripheral. http://www.ettus.com/, May 2007.
[13] http://acert.ir.bbn.com/projects/gr-ucla, May 2007.
[14] C. Han, R. Rengaswamy, R. Shea, E. Kohler, M. Srivastava. SOS: A Dynamic Operating System for Sensor Networks. SenSys 2005.
[15] The Network Simulator - ns-2. http://www.isi.edu/nsnam/ns/, May 2007.
[16] J. Zheng, M. Lee. A Comprehensive Performance Study of IEEE 802.15.4. IEEE Press Book, 2004.
[17] S. Valentin, H. von Malm, H. Karl. Evaluating the GNU Software Radio Platform for Wireless Testbeds. Technical Report TR-RI-06-273, February 2006.
[18] UCSD, CalRadio1. http://calradio.calit2.net, May 2007.
[19] MIFE. http://lcmwww.epfl.ch/SRAD/, May 2007.