radar signal processing - college of engineering

Radar Signal Processing: Hardware Accelerator and Hardware Update

First Semester Report

Fall Semester 2007

by

Michael Neuberg

Christopher Picard

Prepared to partially fulfill the requirements for ECE401

Department of Electrical and Computer Engineering

Colorado State University

Fort Collins, Colorado 80523

Report Approved:

Project Advisor

Senior Design Coordinator

Abstract

This project consists of two efforts, both involving Colorado State University’s (CSU’s) radar systems. The first is to update CSU’s Pawnee Doppler radar system in order to improve its performance and the quality of its data. The second adds a hardware accelerator to the implementation of the Parametric Time Domain Method (PTDM) clutter mitigation algorithm. Graphics cards will serve as the hardware accelerator.

The Pawnee radar system is outdated, which makes it hard to find replacement parts and limits performance. The main retrofit is to replace the synchros, which report rotational position, with optical encoders. This should improve the angular resolution roughly tenfold, which improves the accuracy with which targets are located at long range. For the PTDM clutter mitigation implementation, a standard computer system cannot compute results in real time. Graphics cards can perform the needed calculations in parallel at high rates, which will allow the algorithm to be applied in real time.

Updating the system with optical encoders requires interfacing them to the Digital Signal Processor (DSP). A Field Programmable Gate Array (FPGA) will exchange data with the encoders and pass the results to the DSP, acting as the interface between the two devices. A printed circuit board carrying the FPGA will be made to transmit and receive the data. The hardware to be used is a Xilinx Spartan-3E FPGA and Stegmann ARS-20 absolute optical encoders. This hardware will then be installed at the Pawnee radar system, where it will improve the data about the position of the radar antenna. After researching different graphics cards for the implementation of the PTDM clutter mitigation algorithm, the Nvidia Quadro FX 5600 was chosen. The graphics card has 16 multiprocessors, each of which can execute 768 simultaneous threads. Nvidia provides a programming environment called Compute Unified Device Architecture (CUDA), which allows the programmer to use a C style language to code tasks on the graphics cards. This will greatly shorten the development time for the implementation.

Acknowledgments

The authors of this paper would like to thank Dr. Chandra for the opportunity to work on this project. It has given us valuable experience with the radar facilities at Colorado State University (CSU). We would also like to thank Jim George as well as the rest of the graduate students that are part of the CSU radar research team. We could not have gotten this far on our project without their guidance.

We would like to thank the CSU CHILL facilities for donating the components to help research and test for the CSU Pawnee update project. We would also like to thank Hewlett Packard for their donation of the workstation platform and graphics cards to the hardware accelerator project.

TABLE OF CONTENTS

Title
Abstract
Acknowledgments
Table of Contents
List of Figures and Tables
I. Introduction
II. Background Information of Radar & CSU Facilities
III. Hardware Accelerator
    A. Review of Previous Work
    B. Problem Requirements and Solution Approaches
    C. Solution Implementation
        C.1 Hardware
        C.2 CUDA Implementation
IV. Pawnee Hardware Update
    A. Colorado State University Pawnee Radar Facility
    B. Pawnee Hardware
    C. Problems Arising With Old Hardware
    D. Encoder Background
    E. Inner Workings of the Encoder
    F. Xilinx Spartan-3E FPGA Details
    G. Methodology for Project
    H. Current Work and Results
V. Future Work and Conclusions
References
Bibliography
Appendix A – Abbreviations
Appendix B – Budget
Appendix C – Gray to Binary Comparison

List of Figures

Figure 1 CHILL Doppler Radar Facility
Figure 2 Pawnee Doppler Radar Facility
Figure 3 PPI Without Ground Clutter Filter
Figure 4 PPI With Ground Clutter Filter
Figure 5 Nvidia CUDA Architecture
Figure 6 PPI CHILL Radar Image, 2003, Tornado
Figure 7 Pawnee Radar Inside the Radome
Figure 8 Stegmann ARS-20 Absolute Optical Encoder
Figure 9 Inside Components of Optical Encoder
Figure 10 Digilent BASYS Board
Figure 11 Clock Signal and Encoder Data

List of Tables

Table 1 FX 5600 Specifications
Table 2 Xilinx Spartan-3E Specifications
Table 3 Pawnee Hardware Update Schedule
Table 4 Hardware Accelerator Schedule

I. Introduction

This report covers two separate projects that together benefit both of Colorado State University’s (CSU’s) radar sites. The first project addresses the issue of ground clutter in a radar image. There are several methods for mitigating ground clutter, each with its own benefits and disadvantages. This report discusses one of the more effective ground clutter mitigation techniques, as well as the difficulties in implementing it.

This report covers a method for creating a hardware accelerator that performs matrix calculations in parallel to significantly reduce computation time. The accelerator uses the preset parallel architecture of a graphics card. Using this architecture and a C based extension language, the graphics card will be turned into the hardware accelerator for this project. The accelerator will allow one of the ground clutter mitigation methods to be performed in real time.

The second project updates hardware at one of the CSU radar facilities. Most of the hardware is outdated, which makes it hard to find replacement parts. Updating the hardware will enhance the radar positioning system and provide more precise data. The main update is to replace the synchros with encoders that determine the precise angle and direction in which the dish is pointed. More details of the project are discussed along with the methods of implementation.

First, a background description of how radar works and its history is given, followed by information on the two CSU radar facilities. Chapter III covers the hardware accelerator project. Chapter IV discusses the specifics of the hardware update. The last chapter wraps up the two projects and outlines the future work to be completed.

II. Background Information on Radar and CSU Facilities

The basic idea of radar is very simple. During WWII, when ships or airplanes crossed the path of radio signals, their echoes could be heard. This led to the idea of using radio signals to determine the location of objects. The technology has been developed continuously from WWII to the present day, and many technological advances have contributed to modern radar. Research continues today to improve and enhance radar systems.

How radar works is simple to understand. A radar consists of four basic parts: 1) a transmitter that generates a signal, 2) an antenna that sends and receives the signal, 3) a receiver that detects the signal, and 4) a display unit used to analyze the data. Together these components make up even very complex radar systems.

Colorado State University currently uses two radar facilities. The CHILL Doppler radar

(Figure 1) is a dual polarized S-band system located by Greeley, Colorado.

Figure 1 CHILL Radar Facility

The other is the Pawnee Doppler radar (Figure 2), which is a single polarized S-band system located near Nunn, Colorado.

Figure 2 Pawnee Radar Facility

The two radar facilities operate together in a dual-Doppler configuration. The generated waveform is Gaussian. A klystron amplifier then amplifies the waveform to 1 MW per channel, and the amplified signal is transmitted into the sky. Both systems are sponsored by the National Science Foundation and Colorado State University.

III. Hardware Accelerator

A. Review of previous work:

One of the major problems that weather radar systems face is ground clutter. Ground clutter occurs when the radar beam reflects off objects on the ground, such as trees, buildings, and mountains. In order to get a good weather image over areas with ground clutter, a clutter mitigation algorithm must be applied. Figure 3 shows a Plan Position Indicator (PPI) image with no ground clutter mitigation applied. Figure 4 shows an image taken seconds later with ground clutter mitigation applied.

Figure 3 PPI Without Ground Clutter Filter

Figure 4 PPI With Ground Clutter Filter

As one can see, this filter significantly reduces the presence of ground clutter, although it does not remove the clutter completely.

The parametric time domain method (PTDM) is a method for radar ground clutter mitigation. It helps suppress the effect of ground clutter close to the radar site. Unlike previous methods, this approach does not have the problem of signal loss or spectral leakage (Nguyen, Moisseev, Chandrasekar 2007). PTDM is not subject to these problems because it does not apply a Fourier transform to move the signal into the frequency domain. Instead, the method performs its calculations in the time domain. The main issue with the PTDM approach is that it is computationally expensive, primarily because of the complex matrix calculations that must be performed on large matrices. When simulated in Matlab, the calculations were shown to be too slow to run in real time on a standard personal computer.

B. Problem Requirements and Solution Approaches:

The problem is to find an approach to implement a hardware accelerator for calculating determinants and inverses of large matrices. The accelerator must be able to perform the calculations fast enough to run the PTDM algorithm in real time.
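To make the two target operations concrete, the following is a minimal reference sketch in plain Python, assuming Gauss-Jordan elimination with partial pivoting as the method. The report does not say which algorithm the accelerator will use, so the function name `det_and_inverse` and the choice of algorithm are illustrative assumptions only. The inner loop over rows is the kind of independent work that a parallel device could distribute.

```python
from typing import List, Tuple

Matrix = List[List[complex]]


def det_and_inverse(a: Matrix) -> Tuple[complex, Matrix]:
    """Return (determinant, inverse) of a square matrix.

    Illustrative only: Gauss-Jordan elimination with partial pivoting,
    not necessarily the method used in the PTDM implementation.
    """
    n = len(a)
    # Build the augmented matrix [A | I] from a copy of the input.
    aug = [
        [complex(v) for v in row] + [1 + 0j if i == j else 0j for j in range(n)]
        for i, row in enumerate(a)
    ]
    det = 1 + 0j
    for col in range(n):
        # Partial pivoting: choose the row with the largest magnitude entry.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        if pivot != col:
            aug[col], aug[pivot] = aug[pivot], aug[col]
            det = -det  # a row swap flips the sign of the determinant
        p = aug[col][col]
        det *= p
        aug[col] = [v / p for v in aug[col]]
        # Every row update below is independent of the others, so this loop
        # is the part that could be spread across many threads.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [rv - f * cv for rv, cv in zip(aug[r], aug[col])]
    return det, [row[n:] for row in aug]


if __name__ == "__main__":
    d, inv = det_and_inverse([[4, 7], [2, 6]])
    print(d, inv)
```

The cost of this procedure grows with the cube of the matrix dimension, which is consistent with the report's observation that large matrices make a serial implementation too slow.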

The first solution approach investigated to enhance the performance of matrix calculations was implementing the calculations on a Field Programmable Gate Array (FPGA). This would involve using logic gates to parallelize the operations required to take determinants and inverses of matrices. The main drawbacks of this approach are the development time and the high cost of custom designing a circuit.

The second solution approach investigated was to perform the matrix calculations on a Graphics Processing Unit (GPU). A GPU is a processor located on a graphics card. In order for a graphics card to render images and handle the movement of those images on a screen, it must perform matrix operations at an accelerated rate. The graphics card accomplishes this by parallelizing the calculations and by using a GPU designed to optimize them.

After comparing the two approaches, it was determined that using a GPU was the better approach. The primary reason is that a significant amount of development time has already been invested in optimizing GPUs for matrix calculations. This makes the development time of the implementation significantly shorter.

C. Solution Implementation:

This section will talk about the hardware the PTDM will be implemented on. It also

discusses the method for turning a GPU into a usable hardware accelerator.

C.1 Hardware:

Two graphics card vendors were researched as possible suppliers for this project. The first vendor is ATI. ATI has a development method called Close To the Metal (CTM). This development tool allows the user to write a custom driver level piece of code for the graphics card. The primary disadvantage of this approach is the learning curve for development.

The second vendor is Nvidia. Nvidia has a development method called Compute Unified Device Architecture (CUDA). This method builds a general architecture on the graphics card and then allows the user to write in a C style language, making special function calls to compute on the graphics card. With this C style language, development time is significantly shorter.

When comparing the two vendors, the Nvidia approach was selected because of the shorter development time. The specific graphics card picked was the Quadro FX 5600. This card is one of the few that work with the CUDA architecture and is supported on the Hewlett Packard xw9400. Two of these cards, as well as the platform, were donated to the project by Hewlett Packard. The graphics card's specifications are listed in Table 1.

Specification       Value
GPU                 G80
GPU Speed           600 MHz
RAM                 1.5 GB GDDR3
RAM Speed           800 MHz
Memory Interface    384 bit
Memory Bandwidth    76.8 GB/sec

Table 1 FX 5600 Specifications
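The memory bandwidth entry in Table 1 can be cross-checked from the interface width and the memory clock. The short sketch below assumes that GDDR3 transfers data twice per clock cycle (double data rate); the function name is illustrative.

```python
def memory_bandwidth_gb_per_s(bus_width_bits: int,
                              memory_clock_mhz: float,
                              transfers_per_clock: int = 2) -> float:
    """Peak memory bandwidth in GB/s (1 GB = 1e9 bytes).

    transfers_per_clock=2 is the double-data-rate assumption for GDDR3.
    """
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = memory_clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9


if __name__ == "__main__":
    # 384-bit interface at 800 MHz, as listed in Table 1
    print(round(memory_bandwidth_gb_per_s(384, 800), 1))  # 76.8
```

Under that assumption, 48 bytes per transfer at 1.6 billion transfers per second reproduces the 76.8 GB/sec figure in the table.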

C.2 CUDA Implementation:

CUDA will be used to turn the two graphics cards into hardware accelerators. This architecture presents the graphics card as a set of multiprocessors. Each multiprocessor has several individual processors with dedicated registers as well as access to shared memory. Each multiprocessor can handle 768 concurrent threads and has 8192 registers (Nvidia CUDA Programming Guide 2007). For the entire architecture see Figure 5.

Figure 5 Nvidia CUDA Architecture
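The per-multiprocessor limits quoted in this report imply a simple resource budget for each card. The sketch below works through that arithmetic using only the figures stated in the report (16 multiprocessors, 768 threads and 8192 registers per multiprocessor); the constant and function names are illustrative.

```python
MULTIPROCESSORS = 16                 # per card, as quoted in the abstract
THREADS_PER_MULTIPROCESSOR = 768     # concurrent threads per multiprocessor
REGISTERS_PER_MULTIPROCESSOR = 8192  # registers per multiprocessor


def total_concurrent_threads(multiprocessors: int = MULTIPROCESSORS,
                             threads_per_mp: int = THREADS_PER_MULTIPROCESSOR) -> int:
    """Upper bound on threads resident on one card at the same time."""
    return multiprocessors * threads_per_mp


def registers_per_thread(registers_per_mp: int = REGISTERS_PER_MULTIPROCESSOR,
                         threads_per_mp: int = THREADS_PER_MULTIPROCESSOR) -> int:
    """Whole registers available to each thread when every thread slot is used."""
    return registers_per_mp // threads_per_mp


if __name__ == "__main__":
    print(total_concurrent_threads())  # 12288
    print(registers_per_thread())      # 10
```

The second number matters in practice: a matrix routine that needs more than about ten registers per thread cannot keep all 768 thread slots of a multiprocessor occupied.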

CUDA also allows the user to write their program in C code, using extensions in order to

communicate with the card. This allows for the original C code version of the algorithm to be

rewritten into a CUDA version of the code. All of the matrix calculations will be sent to the

graphics cards taking advantage of the parallel design of CUDA.

IV. Pawnee Hardware Update

The goal of this project is to update the positioning system and hardware on the CSU-Pawnee Doppler radar. This will be done by replacing the synchros with optical encoders and interfacing them to the radar signal processor using an FPGA. This will greatly improve the accuracy of the antenna position data at this site.

A. Colorado State University Pawnee Radar Facility

The Pawnee radar is located about 48 km north of the CSU-CHILL radar in Greeley. This radar was shown earlier in Figure 2. The antenna is located within the radome on the right-hand side. The trailer on the left houses the main radar hardware, the electrical equipment, and the operators. This is a single-polarization radar system, which means it transmits signals in only one polarization.

B. Pawnee Hardware

The Pawnee radar has outdated equipment, which is hard to repair and for which replacement parts are hard to find. While the radar still works, it can be improved. At present the radar uses synchros, which were developed during WWII, to communicate rotational position to the operator. The two rotational positions are azimuth and elevation. A synchro is essentially a three-phase system in which transformers drive currents to a motor, and these currents can be used to determine angular position. Synchros also need many wires to transmit the data in parallel over long distances, which is costly, prone to noise, and prone to damage. Because of the wiring and the age of the synchros, only a limited number of bits of resolution can be obtained. The current synchros have a resolution of approximately 0.1° with the number of data bits available.

C. Problems Arising With Old Hardware

What sort of error can be caused by the low resolution of the synchros? An example of a typical radar image is shown below in Figure 6.

Figure 6 PPI CHILL Radar Image, 2003, Tornado

This image of Burns, Wyoming, which is approximately 100 km away, was taken from the CHILL facility in August 2003. The area within the large circle contains a tornado, indicated by the “hook” shape in the reflectivity. If the image had come from the Pawnee radar site rather than from CHILL, the following error could result from the low resolution of the synchros.

Suppose that, measured from the radar at the center, the tornado is located at a range of 100 km and an angle of 75°. Basic trigonometry (x = r cos θ, y = r sin θ) gives the precise location of the tornado:

x = 25.8819 km

y = 96.5926 km

If the measured angle is off by 0.1° (74.9° instead of 75°), the computed location becomes:

x = 26.0505 km

y = 96.5473 km

This can place the tornado almost 0.2 km (656 ft) from where it actually is. If the location of the weather system could be tracked more exactly, people directly in the path of the tornado could be warned. This error is for a range of only 100 km; many weather radars reach a few hundred kilometers, and the error caused by the low resolution grows in proportion to range. The goal is to reduce these errors.
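The worked example above can be reproduced with a few lines of Python. This is a sketch of the arithmetic only; the function names are illustrative, and the angle error is applied as 74.9° instead of 75° because that choice matches the x and y values quoted in the report.

```python
import math
from typing import Tuple


def polar_to_xy(range_km: float, angle_deg: float) -> Tuple[float, float]:
    """Convert a range and angle to Cartesian coordinates in km."""
    theta = math.radians(angle_deg)
    return range_km * math.cos(theta), range_km * math.sin(theta)


def position_error_km(range_km: float, angle_deg: float,
                      angle_error_deg: float) -> float:
    """Distance between the true position and the one computed with an angle error."""
    x0, y0 = polar_to_xy(range_km, angle_deg)
    x1, y1 = polar_to_xy(range_km, angle_deg - angle_error_deg)
    return math.hypot(x1 - x0, y1 - y0)


if __name__ == "__main__":
    print(polar_to_xy(100.0, 75.0))            # true position
    print(polar_to_xy(100.0, 74.9))            # position with a 0.1 degree error
    print(position_error_km(100.0, 75.0, 0.1)) # about 0.17 km
```

For small angle errors the displacement is close to the range multiplied by the error in radians, which is why the error scales linearly with distance from the radar.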

D. Encoder Background

A good solution to the issue of low resolution is to use optical encoders. Stegmann ARS-20 absolute encoders were purchased to replace the synchros. Absolute encoders are used because the position data is retained even after a power loss. This is important because the radar is not operational all the time. The absolute encoder was compared to an incremental encoder, which loses its position upon power loss, and the absolute encoder was by far the better choice. The encoder transmits 15 bits of data serially for each position. This means that it can read 2^15 = 32,768 different values within 360°, which gives a resolution of approximately 0.01°. Repeating the earlier calculation with this resolution, the error at 100 km is only about 0.02 km (65 ft), which is very good. Another benefit of optical encoders is that they are not susceptible to noise and electromagnetic interference. They also require fewer wires: only two wires are needed to transmit data and two wires are used for the clock signal, which reduces noise and wiring cost. A view of the radar inside the radome is shown below in Figure 7. The encoders will go inside the platform where the synchros currently reside. One encoder is needed to track the azimuth position and one is needed for the elevation position.

Figure 7 Pawnee Radar Inside the Radome
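The relationship between encoder bit count, angular resolution, and position error at range can be sketched as follows. The function names are illustrative, and the error is approximated as arc length (range multiplied by the angular step in radians), which is an assumption that holds well for small angles.

```python
import math


def angular_resolution_deg(bits: int) -> float:
    """Smallest angle step of an absolute encoder with the given number of bits."""
    return 360.0 / (1 << bits)


def cross_range_error_km(range_km: float, resolution_deg: float) -> float:
    """Approximate position error at a given range for one resolution step."""
    return range_km * math.radians(resolution_deg)


if __name__ == "__main__":
    step = angular_resolution_deg(15)
    print(step)                               # 0.010986328125 degrees
    print(cross_range_error_km(100.0, step))  # roughly 0.02 km
    print(cross_range_error_km(100.0, 0.1))   # synchro case, roughly 0.17 km
```

Each additional encoder bit halves the angular step, so the 15 bit encoder gives roughly a ninefold smaller step than the 0.1° synchro resolution.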

E. Inner Workings of the Encoder

How do these encoders work? Absolute optical encoders use a light source, a rotating disk, and photo detectors to produce an output signal. The important part is the rotating disk. The disk has fifteen tracks because one track is needed for each bit of data, and each track has its own arrangement of holes. As the light shines onto the disk, the tracks let the light through for certain bits, and the resulting pattern represents the position to which the encoder is turned. The light is picked up by the photo detectors and the bits are transmitted. As noted before, the optical encoder works well because the light is not affected by electromagnetic waves and fields.

Gray code is used instead of binary because, to change by a value of one, only one bit needs to change. With binary data, a change of one may require multiple bits to change. A comparison of the two can be found in Appendix C. Because only one bit changes at a time, a reading taken at the boundary between two adjacent positions can be off by at most one step. The encoder and an inside view are shown below in Figures 8 and 9.

Figure 8 Stegmann ARS-20 Absolute Optical Encoder

Figure 9 Inside Components of Optical Encoder
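The Gray code property described above, and the conversion back to binary that the interface must eventually perform, can be illustrated with a short Python sketch. These are the standard reflected Gray code conversions; the function names are illustrative and this is not the VHDL used in the project.

```python
def binary_to_gray(n: int) -> int:
    """Reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def gray_to_binary(g: int) -> int:
    """Inverse of binary_to_gray: XOR together all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


if __name__ == "__main__":
    # Reproduces the comparison in Appendix C for the numbers 0 through 8.
    for number in range(9):
        gray = binary_to_gray(number)
        print(f"{gray:04b} {number:04b} {number}")
```

Adjacent values in the Gray column differ in exactly one bit, which is the property the encoder relies on.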

F. Xilinx Spartan-3E FPGA Details

Now that the encoders can improve the resolution of the radar, how is the data communicated back and forth? A Field Programmable Gate Array (FPGA) was chosen because it is easier to configure and program than transistor logic chips. In addition, most FPGAs have more than enough functionality for a task of this kind. A development board was needed for FPGA testing, and a Digilent BASYS programmable logic board was used.

Figure 10 Digilent BASYS Board

The board contained a Xilinx Spartan-3E FPGA. The chip specifications of the Spartan-3E are

shown in Table 2.

Specification    Value
Speed            500 MHz
Block RAM        72 K bits
Multipliers      4 at 18 bits each
I/O pins         144
System Gates     100 K

Table 2 Xilinx Spartan-3E Specifications

Testing with this board works well because it allows the FPGA code to be written and debugged before the final implementation. The specifications of the board were researched using online documentation from Xilinx and Digilent. This documentation covers the operation of the BASYS board, the Digilent software used to download a design to the device, and how the external connectors and switches are connected to the FPGA. The FPGA data sheets were also studied in detail. All 144 I/O pins were reviewed to determine which would be needed. Other features, such as the Digital Clock Manager (DCM), electrical characteristics, timing, and memory, were also examined.

G. Methodology for Project

This project requires the FPGA to be programmed using VHSIC Hardware Description Language (VHDL). Many tutorials and examples were studied online to understand the language better. VHDL is used to describe the logic and the physical I/O signals of the FPGA. In the summer of 2007, Darryl Benally completed a Research Experience for Undergraduates project, and some of the code from that project was used as a reference. The tasks involved in programming this board were very difficult. To properly receive data from the encoders, the FPGA needs to generate a square wave of 17 pulses. The encoder transmits 15 bits of data and 2 error bits. The radar transmits approximately once every millisecond. To get one data bit from the encoders with each transmit, the clock would need to run at a minimum of 1 kHz. With 17 bits per encoder, the clock needs to be at least 17 kHz if a complete reading is required every time the radar transmits. A faster clock would allow more readings per transmit interval. With a clock signal going to each encoder, one for elevation and one for azimuth, the encoders send back 17 bits each on two separate lines. The FPGA needs to read in this data and process it.
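The clock-rate requirement above is a short calculation, sketched here in Python. The 50 MHz source frequency in the example is a hypothetical value chosen only to show how an integer divider would be picked; it is not taken from the report. The `idle_periods` parameter is likewise an assumption that stands in for the pause between frames.

```python
def min_ssi_clock_hz(bits_per_frame: int, frames_per_second: int,
                     idle_periods: int = 0) -> int:
    """Lowest clock rate that fits one full frame into each transmit interval.

    idle_periods is an assumed number of extra clock periods reserved for the
    pause between frames; 0 reproduces the 17 kHz figure in the text.
    """
    return (bits_per_frame + idle_periods) * frames_per_second


def clock_divider(source_hz: int, target_hz: int) -> int:
    """Largest integer divider whose output is still at least target_hz."""
    if target_hz <= 0 or source_hz < target_hz:
        raise ValueError("source clock must be at least the target frequency")
    return source_hz // target_hz


if __name__ == "__main__":
    required = min_ssi_clock_hz(17, 1000)
    print(required)  # 17000
    # Hypothetical 50 MHz oscillator, used only as an example source clock.
    divider = clock_divider(50_000_000, required)
    print(divider, 50_000_000 / divider)
```

Because a pause is needed between frames, the practical clock rate has to be somewhat above 17 kHz; passing a nonzero `idle_periods` shows how much.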

The encoder transmits the data in Synchronous Serial Interface (SSI) format, which is patented by Stegmann. In SSI, a clock signal is sent to the encoder and the data is sent back. The first high to low transition latches the data into a serial format ready to be transmitted. On the first low to high transition, the most significant bit of the Gray code is transmitted, and each following low to high transition transmits the next bit. With 17 pulses only 15 bits of data are taken in, because the first and last bits are held high and are used to detect errors. It is important to put a delay between successive clock bursts so that the data sets can be distinguished. For example, the clock starts high, transmits a square wave of 17 pulses, and finishes high; after a wait, the signal repeats. The benefits of SSI are that it needs only four lines, has a low component count, and allows data to be stored simultaneously.

Once the data is received by the FPGA from the encoder, it is processed. First, it must be converted from Gray code to binary. Then one of the data sets is shifted so that the elevation and azimuth position data can be combined into the same thirty-two bit word. This is called the “parallel to serial conversion” because the data from two parallel lines can then be transmitted on a single line. The single set of data is transmitted over fiber optic lines to the radar signal processor unit. The whole process is completed in less than a millisecond, so a fresh position reading is available for every radar transmission.
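The combining step can be sketched as a pair of bit operations. The layout used here (azimuth in the upper 16 bits, elevation in the lower 16 bits) is purely an assumption for illustration, since the report does not specify the bit positions, and the function names are hypothetical.

```python
from typing import Tuple

POSITION_MASK = 0x7FFF  # 15-bit binary position value
FIELD_SHIFT = 16        # assumed layout: azimuth occupies the upper half-word


def pack_position_word(azimuth: int, elevation: int) -> int:
    """Combine two 15-bit positions into one 32-bit word (assumed layout)."""
    for value in (azimuth, elevation):
        if not 0 <= value <= POSITION_MASK:
            raise ValueError("position must fit in 15 bits")
    return (azimuth << FIELD_SHIFT) | elevation


def unpack_position_word(word: int) -> Tuple[int, int]:
    """Recover (azimuth, elevation) from a word built by pack_position_word."""
    return (word >> FIELD_SHIFT) & POSITION_MASK, word & POSITION_MASK


if __name__ == "__main__":
    word = pack_position_word(0x7FFF, 1)
    print(hex(word))                   # 0x7fff0001
    print(unpack_position_word(word))  # (32767, 1)
```

With two 15 bit fields in a 32 bit word, two bits remain unused in this layout; they could carry status flags, though the report does not say whether they do.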

H. Current Work and Results

As of the end of the fall semester of 2007, the status of the project is as follows. Programming the FPGA in VHDL is the task at hand. The onboard clock is divided down to get the desired frequency. The FPGA outputs a clock signal with 17 pulses followed by a delay to both of the encoders, and the encoders respond with data. An example of the clock and data signals is shown in Figure 11. The encoder data at the bottom is a 15 bit serial data stream. As noted before, it represents the pattern of light passing through the encoder’s rotating disk.

Figure 11 Clock Signal and Encoder Data

The tasks that still need to be completed are continuously updating the data, converting Gray code to binary, combining the two parallel data sets into one serial stream, and transmitting the data over fiber optic lines. Much work remains, but a good understanding of VHDL and the FPGA has been gained, and the remaining work should proceed at a quicker pace.

V. Future Work and Conclusions

Next semester, the plan for the Pawnee hardware update is to design the circuit board on which the components and FPGA will rest. For testing purposes at present, another board attaches to the external connectors on the BASYS with loose wires running between the two. This setup is an acceptable short-term solution, but a long-term solution is to have all the components on the same board. The design of the board will be challenging due to the complexity of the BASYS circuit. Determining which parts need to be included and which can be excluded will also take careful consideration. Once the board is designed, it will be fabricated and populated with all the components. The board will then be tested to verify that the design is correct. A schedule for next semester is laid out in Table 3.

Task                                     Start Date    Finish Date    Duration
Finish Programming Interface for FPGA    ----------    1/27/08        12 Weeks
Design Printed Circuit Board             1/27/08       3/23/08        8 Weeks
Build, Test, and Debug Circuit Board     3/23/08       4/13/08        3 Weeks

Table 3 Pawnee Hardware Update Schedule

For the hardware accelerator project, the plan for next semester is to implement the accelerator on the final hardware. The system must first be configured. Then simple test CUDA programs will be run. The next step will be to write code that computes matrix determinants and inverses efficiently on the graphics card. After that code is written, it will be integrated with the code for the PTDM. Finally, the system will be configured to integrate with the radar site’s interface. A timeline for these tasks is in Table 4.

Task                                                                              Start Date    Finish Date    Duration
Get system setup and successfully run simple test CUDA programs.                  12/10/07      1/21/08        6 Weeks
Finish CUDA code that optimizes performance on real sample data                   1/21/08       2/18/08        4 Weeks
Integrate CUDA code with algorithm code.                                          2/18/08       3/31/08        6 Weeks
Configure system to interact with radar interface and integrate system into site. 3/31/08       4/14/08        2 Weeks

Table 4 Hardware Accelerator Schedule

After the new encoders and the hardware accelerator are in place, CSU’s radar sites will produce more accurate radar data. As outlined in this report, the two projects improve accuracy in different ways: the encoders improve the antenna position data, and the accelerator allows PTDM clutter mitigation to run in real time. Future research projects at the sites will benefit from the more accurate data.

References:

[1] Absolute Encoder; Sick/Stegmann

[2] ARS 20, ARS 25: Single-turn Absolute Encoder; Sick/Stegmann

[3] CUDA Programming Guide 1.1, Nvidia; November 2007

[4] Digilent Basys Board Reference Manual; Digilent

[5] Spartan-3 generation FPGA User Guide; Xilinx; April 2007

[6] Spartan-3E FPGA Family: Complete Data Sheet; Xilinx; May 2007

[7] Synchronous Serial Interface for Absolute Encoders; Stegmann

Bibliography

[1] Benally, Darryl, 2007: Antenna Control for the Pawnee Radar. Colorado State University

[2] Nguyen, Cuong M., Dmitri N. Moisseev, and V. Chandrasekar, 2007: A parametric time domain method for spectral moment estimation and clutter mitigation for weather radars. Colorado State University

[3] Rinehart, Ronald E., 1997: Radar for Meteorologists. 3rd Edition: Knight Printing 2001

Appendix A- Abbreviations

CTM- Close To the Metal

CSU- Colorado State University

CUDA- Compute Unified Device Architecture

DCM- Digital Clock Manager

DSP- Digital Signal Processor

FPGA- Field Programmable Gate Array

PPI- Plan Position Indicator

PTDM- Parametric Time Domain Method

SSI- Synchronous Serial Interface

VHDL- VHSIC Hardware Description Language

VHSIC- Very High Speed Integrated Circuits

Appendix B- Budget

Item                            Price
Digilent BASYS Board            $50
2 Quadro FX 5600                $3,000 ($1,500 apiece)
2 Stegmann Absolute Encoder     $1,200 ($600 apiece)
xw 9400                         $1,500

Appendix C- Gray to Binary Comparison

Gray Code    Binary Code    Number
0000         0000           0
0001         0001           1
0011         0010           2
0010         0011           3
0110         0100           4
0111         0101           5
0101         0110           6
0100         0111           7
1100         1000           8
