Radar Signal Processing: Hardware Accelerator and Hardware Update
First Semester Report
Fall Semester 2007
by
Michael Neuberg
Christopher Picard
Prepared to partially fulfill the requirements for ECE401
Department of Electrical and Computer Engineering
Colorado State University
Fort Collins, Colorado 80523
Report Approved:
Project Advisor
Senior Design Coordinator
Abstract
This report covers two projects, both dealing with Colorado State University’s (CSU’s) radar systems. The first project updates the hardware of CSU’s Pawnee Doppler radar system to improve the accuracy of its antenna position data. The second project adds a hardware accelerator to the implementation of the Parametric Time Domain Method (PTDM) clutter mitigation algorithm, using graphics cards as the acceleration device.
The Pawnee radar system is outdated, which makes replacement parts hard to find and limits performance. The main retrofit is to replace the synchros, which report rotational position, with optical encoders. This should improve the angular resolution roughly tenfold, which improves position accuracy at long range. For the PTDM clutter mitigation implementation, a standard computer system cannot compute results in real time. Graphics cards can perform the needed calculations much faster, which will allow the algorithm to be applied in real time.
Updating the system with optical encoders requires interfacing them to the Digital Signal Processor (DSP). A Field Programmable Gate Array (FPGA) will act as the interface between the two devices: it will exchange signals with the encoders and pass their data on to the DSP. A printed circuit board will be made to carry the FPGA. The hardware that will be used is a Xilinx Spartan-3E FPGA and the Stegmann ARS-20 absolute optical encoder. This hardware will then be installed at the Pawnee radar system, where it will improve the data about the position of the radar antenna. After researching graphics cards for the implementation of the PTDM clutter mitigation algorithm, the Nvidia Quadro FX 5600 was chosen. The graphics card has 16 multiprocessors, each of which can execute 768 simultaneous threads. Nvidia provides a programming environment called Compute Unified Device Architecture (CUDA), which allows the programmer to use a C-style language to code tasks on the graphics cards. This will greatly shorten development time for the implementation.
Acknowledgments
The authors of this paper would like to thank Dr. Chandra for the opportunity to work on this project. It has given us valuable experience with the radar facilities at Colorado State University (CSU). We would also like to thank Jim George and the rest of the graduate students who are part of the CSU radar research team. We could not have gotten this far on our project without their guidance.
We would like to thank the CSU CHILL facilities for donating the components to help research and test for the CSU Pawnee update project. We would also like to thank Hewlett Packard for their donation of the workstation platform and graphics cards to the hardware accelerator project.
TABLE OF CONTENTS
Title
Abstract
Acknowledgments
Table of Contents
List of Figures and Tables
I. Introduction
II. Background Information on Radar and CSU Facilities
III. Hardware Accelerator
A. Review of Previous Work
B. Problem Requirements and Solution Approaches
C. Solution Implementation
C.1 Hardware
C.2 CUDA Implementation
IV. Pawnee Hardware Update
A. Colorado State University Pawnee Radar Facility
B. Pawnee Hardware
C. Problems Arising With Old Hardware
D. Encoder Background
E. Inner Workings of the Encoder
F. Xilinx Spartan-3E FPGA Details
G. Methodology for Project
H. Current Work and Results
V. Future Work and Conclusions
References
Bibliography
Appendix A – Abbreviations
Appendix B – Budget
Appendix C – Gray to Binary Comparison
List of Figures
Figure 1 CHILL Doppler Radar Facility
Figure 2 Pawnee Doppler Radar Facility
Figure 3 PPI Without Ground Clutter Filter
Figure 4 PPI With Ground Clutter Filter
Figure 5 Nvidia CUDA Architecture
Figure 6 PPI CHILL Radar Image, 2003, Tornado
Figure 7 Pawnee Radar Inside the Radome
Figure 8 Stegmann ARS-20 Absolute Optical Encoder
Figure 9 Inside Components of Optical Encoder
Figure 10 Digilent BASYS Board
Figure 11 Clock Signal and Encoder Data
List of Tables
Table 1 FX 5600 Specifications
Table 2 Xilinx Spartan-3E Specifications
Table 3 Pawnee Hardware Update Schedule
Table 4 Hardware Accelerator Schedule
I. Introduction
This report covers two separate projects that together benefit both of Colorado State
University’s (CSU’s) radar sites. The first project addresses the issue of ground clutter in
a radar image. There are several methods for mitigating ground clutter, each with its own
benefits and disadvantages. This report discusses one of the more effective ground clutter
mitigation techniques, as well as the difficulties in implementing it.
This report covers a method for creating a hardware accelerator that performs matrix
calculations in parallel to significantly reduce computation time. The accelerator is built on
a predefined computing architecture provided for the graphics card. Using this architecture and
a C-based extension language, the graphics card will be turned into the hardware accelerator
for this project. The accelerator will allow one of the ground clutter mitigation methods to be
performed in real time.
The second project updates hardware at one of the CSU radar facilities. Most of the
hardware is outdated, which makes it hard to find replacement parts. Updating the hardware
will enhance the radar positioning system and provide more precise data. The main hardware
update is to replace the synchros with encoders that determine the precise angle and
direction in which the dish is pointed. More details of the project are discussed along with
the methods of implementation.
First, a background description of how radar works and its history is given, followed by
information on the two CSU radar facilities. Chapter III deals with the hardware accelerator
project. Chapter IV discusses the specifics of the hardware update. The last section wraps up
the two projects and outlines the future work to be completed.
II. Background Information on Radar and CSU Facilities
The basic idea of radar is very simple. During WWII, when ships or airplanes crossed
the path of radio signals, their echoes could be heard. This led to the idea of using radio
signals to determine the location of objects. The technology has been developed further from
WWII through the present day. Many technological advances have contributed to modern radar,
and research continues to improve and enhance radar systems.
The operation of radar is simple in principle. A radar basically consists of four
parts: 1.) a transmitter that generates a signal, 2.) an antenna that sends and receives
the signal, 3.) a receiver that detects the signal, and 4.) a display unit to analyze the data.
Together, these components make up radar systems that can be very complex.
Colorado State University currently uses two radar facilities. The CHILL Doppler radar
(Figure 1) is a dual-polarized S-band system located near Greeley, Colorado.
Figure 1 CHILL Radar Facility
The other is the Pawnee Doppler radar (Figure 2), which is a single-polarized S-band system
located near Nunn, Colorado.
Figure 2 Pawnee Radar Facility
Both radar facilities are combined to operate in a dual-Doppler configuration. The waveform
generated is a Gaussian waveform. A klystron amplifier then boosts the waveform to
1 MW per channel. Next, the signal is passed to the transmitter and radiated into the sky.
Both systems are sponsored by the National Science Foundation and Colorado State University.
III. Hardware Accelerator
A. Review of previous work:
One of the major problems that weather radar systems face is ground clutter. Ground
clutter occurs when the radar beam bounces off objects located on the ground. These objects can
be trees, buildings, mountains, et cetera. In order to get a good weather image over areas with
ground clutter, a clutter mitigation algorithm must be applied. Figure 3 shows a Plan Position
Indicator (PPI) image with no ground clutter mitigation applied. Figure 4 shows an
image taken seconds later with a ground clutter mitigation approach applied.
Figure 3 PPI Without Ground Clutter Filter
Figure 4 PPI With Ground Clutter Filter
As one can see, this filter significantly reduces the presence of ground clutter, although it
does not remove the clutter completely.
The Parametric Time Domain Method (PTDM) is a method for radar ground clutter mitigation.
This method helps suppress the effect of ground clutter close to the radar site. Unlike previous
methods, this approach does not have the problem of signal loss or spectral leakage (Nguyen,
Moisseev, Chandrasekar 2007). PTDM is not subject to these problems because it does not apply
Fourier transforms to move the signal into another domain. Instead, the method performs its
calculations in the time domain. The main issue with the PTDM approach is that it requires
computationally expensive calculations. This is primarily due to the complex matrix
calculations that must be performed on large matrices. When simulated in Matlab, the
calculations were shown to be too slow to run on a standard personal computer.
B. Problem Requirements and Solution Approaches:
The problem is to find an approach to implement a hardware accelerator for calculating
determinants and inverses of large matrices. The accelerator must perform the
calculations fast enough to run the PTDM algorithm in real time.
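To make the cost of these two operations concrete, the following Python sketch computes a determinant and an inverse with Gauss-Jordan elimination. It is only an illustration under stated assumptions: the function name `det_and_inverse` and the sample matrix are hypothetical, and this is not the PTDM code or the GPU implementation described in this report. The nested passes over the matrix are what make the work grow roughly with the cube of the matrix size.

```python
def det_and_inverse(matrix):
    """Return (determinant, inverse) of a square matrix via Gauss-Jordan elimination.

    Hypothetical helper for illustration only; `matrix` is a list of row lists.
    """
    n = len(matrix)
    # Augment a copy of the matrix with the identity: [A | I].
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    det = 1.0
    for col in range(n):
        # Partial pivoting: use the row with the largest magnitude in this column.
        pivot_row = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot_row][col]) < 1e-12:
            raise ValueError("matrix is singular")
        if pivot_row != col:
            aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
            det = -det  # a row swap flips the sign of the determinant
        pivot = aug[col][col]
        det *= pivot
        # Scale the pivot row so the pivot becomes 1.
        aug[col] = [value / pivot for value in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                if factor != 0:
                    aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    # The right half of the augmented matrix now holds the inverse.
    inverse = [row[n:] for row in aug]
    return det, inverse


if __name__ == "__main__":
    d, inv = det_and_inverse([[4.0, 7.0], [2.0, 6.0]])
    print(d)    # determinant of the sample matrix
    print(inv)  # inverse of the sample matrix
```

Each elimination step on one row is independent of the other rows, which is the kind of work that can be spread across many parallel threads.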
The first solution approach investigated to enhance the performance of matrix
calculations was implementing the calculations on a Field Programmable Gate Array (FPGA).
This would involve using logic gates to parallelize the operations required to take determinants
and inverses of matrices. The main drawbacks of this approach are the development time and
the high cost of custom designing a circuit.
The second solution approach investigated was to perform the matrix calculations on a
Graphics Processing Unit (GPU). A GPU is a processor that is located on a graphics card. In
order for a graphics card to render images and handle the movement of those images on a screen,
it must perform matrix operations at an accelerated rate. The graphics card accomplishes this by
parallelizing the calculations and by using a GPU designed to optimize these calculations.
After comparing the two approaches, it was determined that using a GPU was the better
approach. The primary reason is that a significant amount of development time has already
been spent optimizing matrix calculations on GPUs. This makes the development time of the
implementation significantly shorter.
C. Solution Implementation:
This section describes the hardware on which the PTDM will be implemented. It also
discusses the method for turning a GPU into a usable hardware accelerator.
C.1 Hardware:
Two graphics card vendors were researched as possible suppliers for this project.
The first vendor is ATI. ATI has a development method called Close To the Metal
(CTM). This development tool allows the user to write a custom driver-level piece of code for
the graphics card. The primary disadvantage of this approach is the learning curve for
development.
The second vendor is Nvidia. Nvidia has a development method called Compute Unified
Device Architecture (CUDA). This method builds a general architecture on the graphics card and
then allows the user to write in a C-style language, making special function calls to compute on
the graphics card. With this C-style language, development time is significantly shorter.
When comparing the two vendors, the Nvidia approach was selected because of the
shorter development time. The specific graphics card picked was the Quadro FX 5600. This card
is one of the few that will work with the CUDA architecture and is supported on the Hewlett
Packard xw9400. Two of these cards as well as the platform were donated to the project by
Hewlett Packard. The graphics card’s specifications are listed in Table 1.
Specification       Value
GPU                 G80
GPU Speed           600 MHz
RAM                 1.5 GB GDDR3
RAM Speed           800 MHz
Memory Interface    384-bit
Memory Bandwidth    76.8 GB/s
Table 1 FX 5600 Specifications
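The memory bandwidth in Table 1 follows from the memory interface width and the RAM speed. The Python sketch below reproduces that arithmetic. It assumes two transfers per memory clock, since GDDR3 is a double data rate memory; that factor and the function name `memory_bandwidth_gb_per_s` are not taken from the table and are stated here as assumptions.

```python
def memory_bandwidth_gb_per_s(bus_width_bits, memory_clock_mhz, transfers_per_clock=2):
    """Peak memory bandwidth in GB/s (1 GB = 1e9 bytes).

    transfers_per_clock=2 is an assumption based on GDDR3 being double data rate.
    """
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = memory_clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9


if __name__ == "__main__":
    # 384-bit interface and 800 MHz RAM speed, both from Table 1.
    print(memory_bandwidth_gb_per_s(384, 800))  # about 76.8
```

A 384-bit bus moves 48 bytes per transfer, so 1.6 billion transfers per second give the 76.8 GB/s listed in the table.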
C.2 CUDA Implementation:
CUDA will be used to turn the two graphics cards into hardware accelerators.
This architecture presents the graphics card as a set of multiprocessors. Each
multiprocessor has several individual processors with dedicated registers as well as access to
shared memory. Each multiprocessor can handle 768 concurrent threads and has 8192 registers
(Nvidia CUDA Programming Guide 2007). For the entire architecture see Figure 5.
Figure 5 Nvidia CUDA Architecture
CUDA also allows the user to write the program in C code, using extensions to
communicate with the card. This allows the original C version of the algorithm to be
rewritten into a CUDA version. All of the matrix calculations will be sent to the
graphics cards, taking advantage of the parallel design of CUDA.
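The per-multiprocessor limits quoted in this section set a rough budget for any kernel written for the card. The Python sketch below works through that arithmetic using the figures stated in this report (16 multiprocessors, 768 threads and 8192 registers per multiprocessor). The function names are hypothetical, and the calculation is a simplification that ignores other limits such as shared memory.

```python
MULTIPROCESSORS = 16                  # stated in the abstract of this report
THREADS_PER_MULTIPROCESSOR = 768      # stated in this section
REGISTERS_PER_MULTIPROCESSOR = 8192   # stated in this section


def max_concurrent_threads(multiprocessors, threads_per_mp):
    """Upper bound on threads resident on the whole card at once."""
    return multiprocessors * threads_per_mp


def registers_per_thread(registers_per_mp, threads_per_mp):
    """Whole registers available to each thread when a multiprocessor is full."""
    return registers_per_mp // threads_per_mp


if __name__ == "__main__":
    print(max_concurrent_threads(MULTIPROCESSORS, THREADS_PER_MULTIPROCESSOR))           # 12288
    print(registers_per_thread(REGISTERS_PER_MULTIPROCESSOR, THREADS_PER_MULTIPROCESSOR))  # 10
```

Under these assumptions, a kernel that needs more than 10 registers per thread could not keep all 768 threads resident on a multiprocessor at once.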
IV. Pawnee Hardware Update
The goal of this project is to update the positioning system and hardware on the
CSU-Pawnee Doppler radar. This will be done by replacing the synchros with optical encoders and
interfacing them to the radar signal processor using an FPGA. This will greatly improve the
accuracy of the position data at this site.
A. Colorado State University Pawnee Radar Facility
The Pawnee radar is located about 48 km north of the CSU-CHILL radar in Greeley.
This radar was shown earlier in Figure 2. The antenna is located within the radome on the
right-hand side. The trailer on the left is where the main radar hardware, electrical equipment,
and operators are located. This is a single-polarization radar system, which means it only
transmits signals polarized along one axis.
B. Pawnee Hardware
The Pawnee radar has outdated equipment, which makes it hard to repair and to find
replacement parts. While the radar still works, it can be improved. The radar currently uses
synchros, which were developed during WWII, to communicate rotational position to the
operator. The two rotational positions are the azimuth position and the elevation position.
A synchro is basically a three-phase system in which transformers drive currents to a motor,
and these currents can be used to determine angular position. The synchros also use many
wires to transmit the data in parallel over long distances, which can be costly, prone
to noise, and prone to damage. Because of the wiring and the age of the synchros, only a
limited number of bits of resolution is available. The current synchros have a resolution of
approximately 0.1° with the number of data bits available.
C. Problems Arising With Old Hardware
What sort of error can be caused by the low resolution of the synchros? An example of a
typical radar image is shown below in Figure 6.
Figure 6 PPI CHILL Radar Image, 2003, Tornado
This image of a storm near Burns, Wyoming, approximately 100 km away, was taken from the
CHILL facility in August 2003. The area within the large circle marks a tornado, indicated by
the “hook” shape in the reflectivity. If the image had come from the Pawnee radar site rather
than from CHILL, the low resolution of the synchros would lead to the errors calculated
below.
Suppose that, measured from the radar at the center, the tornado is located at an angle of 75°.
Simple trigonometry with a range of 100 km gives the precise location of the tornado:
x = 100 km × cos(75°) = 25.8819 km
y = 100 km × sin(75°) = 96.5926 km
If the angle is off by 0.1°, the computed location becomes:
x = 100 km × cos(74.9°) = 26.0505 km
y = 100 km × sin(74.9°) = 96.5473 km
This places the tornado about 0.17 km (roughly 570 ft) from where it actually is. If this
value could be improved to track the location of the weather system more precisely, people
directly in the path of the tornado could be warned more reliably. This example is only for a
range of 100 km, while many weather radars can see out to a few hundred kilometers, which
increases the error caused by the low resolution. The goal is to reduce these errors.
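The calculation above can be reproduced with a short Python sketch. It converts a range and angle to x and y coordinates and then measures how far the computed position moves when the angle is off by a given amount. The function names are hypothetical, and the 100 km range, 75° angle, and 0.1° error are the example values from the text.

```python
import math


def polar_to_xy(range_km, angle_deg):
    """Convert a range and an angle (measured from the x axis) to x, y in km."""
    theta = math.radians(angle_deg)
    return range_km * math.cos(theta), range_km * math.sin(theta)


def position_error_km(range_km, angle_deg, angle_error_deg):
    """Distance between the true position and the position with an angle error."""
    x_true, y_true = polar_to_xy(range_km, angle_deg)
    x_off, y_off = polar_to_xy(range_km, angle_deg - angle_error_deg)
    return math.hypot(x_off - x_true, y_off - y_true)


if __name__ == "__main__":
    print(polar_to_xy(100.0, 75.0))             # about (25.8819, 96.5926)
    print(polar_to_xy(100.0, 74.9))             # about (26.0505, 96.5473)
    print(position_error_km(100.0, 75.0, 0.1))  # about 0.1745 km
```

Because the error is nearly proportional to range, the same 0.1° error at 300 km would displace the target by roughly three times as much.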
D. Encoder Background
A good solution to the issue of low resolution is to use optical encoders. Stegmann ARS-20
absolute encoders were purchased to replace the synchros. Absolute encoders are used because
the position data is retained even after power loss, which is important because the radar is
not operational all the time. An incremental encoder, by comparison, loses its position upon
power loss, so an absolute encoder was by far the best choice. This encoder transmits 15 bits
of data serially for each position, which means it can report 2^15 = 32,768 different values
within 360°. This gives a resolution of about 0.01°. Doing the same math as before with the
better resolution, the position error at 100 km is only about 0.02 km (65 ft), which is very
good. Another benefit of optical encoders is that they are not susceptible to noise and
electromagnetic interference. They also require fewer wires: only two wires are required to
transmit data and two wires are used for the clock signal, which reduces noise and wiring
cost. Figure 7 below shows a view of the radar inside the radome. The encoders will go inside
the platform where the synchros currently reside. One encoder is needed to track the azimuth
position and one is needed for the elevation position.
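The resolution figures in this section follow directly from the number of bits. The Python sketch below, with hypothetical function names, computes the angular step of a 15-bit encoder and the corresponding cross-range error at 100 km using the small-angle relation (range multiplied by the angle in radians).

```python
import math


def encoder_counts(bits):
    """Number of distinct positions an absolute encoder with `bits` bits reports."""
    return 2 ** bits


def encoder_resolution_deg(bits):
    """Angular step in degrees for one full turn divided over all positions."""
    return 360.0 / encoder_counts(bits)


def cross_range_error_km(range_km, angle_error_deg):
    """Approximate position error at a given range for a small angle error."""
    return range_km * math.radians(angle_error_deg)


if __name__ == "__main__":
    print(encoder_counts(15))                                       # 32768
    print(encoder_resolution_deg(15))                               # about 0.011 degrees
    print(cross_range_error_km(100.0, encoder_resolution_deg(15)))  # about 0.019 km
```

The same function applied to the 0.1° synchro resolution gives about 0.17 km at 100 km, roughly nine times the encoder figure.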
Figure 7 Pawnee Radar Inside the Radome
E. Inner Workings of the Encoder
How do these encoders work? Absolute optical encoders use a light source, a rotating
disk, and photo detectors to produce an output signal. The important part is the rotating
disk. The disk carries fifteen tracks, one for each bit of data, and each track has its own
arrangement of holes. As the light shines onto the disk, the tracks let light through for
certain bits, and the resulting pattern represents the position to which the encoder is
turned. This light is picked up by the photo detectors and the bits are transmitted. As said
before, the optical encoder is well suited to this application because the light is not
affected by electromagnetic waves and fields.
Gray code is used instead of binary because only one bit needs to change to move between
adjacent values. With binary data, changing by a value of one may require multiple bits to
change. A comparison of the two can be found in Appendix C. Because only one bit changes at a
time, a reading taken at the boundary between two positions can be wrong by at most one step.
The encoder and an inside view are shown below in Figures 8 and 9.
Figure 8 Stegmann ARS-20 Absolute Optical Encoder
Figure 9 Inside Components of Optical Encoder
F. Xilinx Spartan-3E FPGA Details
Now that the encoders will be able to improve the resolution of the radar, how is the data
communicated back and forth? A Field Programmable Gate Array (FPGA) was chosen because
of its ease of configuration and programming when compared to discrete transistor logic chips.
Most FPGAs also have more than enough functionality to complete most tasks. A development
board was needed for FPGA testing, and a Digilent BASYS programmable logic board was used.
Figure 10 Digilent BASYS Board
The board contained a Xilinx Spartan-3E FPGA. The chip specifications of the Spartan-3E are
shown in Table 2.
Xilinx Spartan-3E    Value
Speed                500 MHz
Block RAM            72 Kbits
Multipliers          4 at 18 bits each
I/O pins             144
System Gates         100 K
Table 2 Xilinx Spartan-3E Specifications
Testing with this board works well because it gives a chance to complete the coding
and debugging of the FPGA before implementation. The specifications of the board were
researched using online documentation from Xilinx and Digilent. The documentation covers the
BASYS board and its operation, the Digilent software used to download designs to the device,
and how the external connectors and switches are connected to the FPGA. The FPGA data sheets
were also studied in detail. With 144 pins, all the I/O were checked to determine which would
need to be used. Other features such as the Digital Clock Manager (DCM), electrical
characteristics, timing, and memory were also looked into when working with this chip.
G. Methodology for Project
This project requires the FPGA to be programmed using VHSIC Hardware Description
Language (VHDL). Many tutorials and examples were researched online to understand this
language better. VHDL is used to describe the signals and logic that the FPGA implements and
connects to its I/O pins. In the summer of 2007, a Research Experience for Undergraduates
project was done by Darryl Benally, and some of the code from that project was used as a
reference. The tasks involved in programming this board were very difficult. To properly
receive data from the encoders, the FPGA needs to generate a 17-pulse square wave. The encoder
transmits 15 bits of data and 2 error bits. The radar transmits a pulse approximately once
every millisecond. To get one data bit from the encoders with each transmit, the clock signal
would need to be at least 1 kHz. With 17 bits per encoder, the clock signal needs to be at
least 17 kHz if a complete reading is required every time the radar transmits. A faster clock
signal would allow for more readings during the transmit interval. With a clock signal going
to each encoder, one for elevation and one for azimuth, the encoders will send back 17 bits of
data on two separate lines. The FPGA needs to read in this data and process it.
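The clock rate requirement above is a simple product of bits per reading and readings per second. The Python sketch below restates it, with hypothetical function names. It assumes a pulse repetition rate of 1 kHz (one transmit per millisecond, as stated in the text) and ignores the delay between successive readings, so the second function gives an upper bound only.

```python
def min_encoder_clock_hz(bits_per_reading, pulse_repetition_hz):
    """Slowest clock that still delivers one full reading per radar pulse."""
    return bits_per_reading * pulse_repetition_hz


def readings_per_pulse(clock_hz, bits_per_reading, pulse_repetition_hz):
    """Upper bound on complete readings per pulse interval (inter-reading delay ignored)."""
    return clock_hz // (bits_per_reading * pulse_repetition_hz)


if __name__ == "__main__":
    print(min_encoder_clock_hz(17, 1000))        # 17000 Hz, i.e. 17 kHz
    print(readings_per_pulse(100000, 17, 1000))  # 5 readings at a 100 kHz clock
```

The 100 kHz value in the usage example is an arbitrary illustration, not a clock rate chosen for this project.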
The encoder transmits the data in Synchronous Serial Interface (SSI) format, which is
patented by Stegmann. In this format a clock signal is sent to the encoder and the data is sent
back. The first high-to-low transition latches the data into a serial format ready to be
transmitted. Then on the first low-to-high transition the most significant bit of the Gray code
is transmitted. On each following low-to-high transition the next bit is transmitted, and so on.
With 17 pulses only 15 bits of data are taken in, because the first and last signals are held
high and are used to detect errors. An important detail is to put a delay between successive
clock bursts so the data sets can be distinguished. For example, the clock signal starts high,
transmits a square wave with 17 steps, and then finishes high. After a wait, the signal is
repeated. The benefits of SSI are the need for only four lines (two for data and two for the
clock), a low conventional component count, and the ability to store data simultaneously.
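The framing described above can be sketched in Python as follows. This is only an illustration of the description in this report, not of the SSI specification itself: it assumes one sampled line level per clock pulse, with the first and last of the 17 samples held high and the 15 data bits in between, most significant bit first. The function name and the sample frame are hypothetical.

```python
FRAME_PULSES = 17  # clock pulses per reading, as stated in the report
DATA_BITS = 15     # position bits per reading, as stated in the report


def parse_ssi_frame(samples):
    """Return the 15-bit Gray-coded value carried by one 17-sample frame.

    `samples` is a list of 0/1 line levels, one per clock pulse (assumed layout).
    """
    if len(samples) != FRAME_PULSES:
        raise ValueError("expected %d samples, got %d" % (FRAME_PULSES, len(samples)))
    # Assumption: the first and last samples are held high and serve as error checks.
    if samples[0] != 1 or samples[-1] != 1:
        raise ValueError("framing error: first and last samples must be high")
    value = 0
    # The most significant bit arrives first, so shift left before adding each bit.
    for bit in samples[1:1 + DATA_BITS]:
        value = (value << 1) | bit
    return value


if __name__ == "__main__":
    frame = [1] + [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1] + [1]
    print(parse_ssi_frame(frame))  # 21845, which is 0b101010101010101
```

The value returned is still Gray coded and would need to be converted to binary before use.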
Once the data is received by the FPGA from the encoder, it is processed. First, it must be
converted from Gray code to binary. Then one of the data sets needs to be shifted so that the
elevation and azimuth position data can be combined into the same thirty-two-bit word.
This is called the “parallel to serial conversion” because the data from two parallel lines can
then be transmitted on a single line. The combined data is then transmitted over fiber optic
lines to the radar signal processor unit. The whole process is completed in less than a
millisecond, so the position data stays current with each radar transmit.
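The shift-and-combine step can be sketched in Python as follows. The bit layout is an assumption made for illustration: the report states only that one value is shifted so both fit in a thirty-two-bit word, so placing azimuth in the upper 16 bits and elevation in the lower 16 bits is hypothetical, as are the function names.

```python
POSITION_MASK = 0x7FFF  # 15 bits of position data per encoder


def pack_position_word(azimuth, elevation):
    """Combine two 15-bit positions into one 32-bit word.

    Assumed layout: azimuth in bits 16-30, elevation in bits 0-14.
    """
    return ((azimuth & POSITION_MASK) << 16) | (elevation & POSITION_MASK)


def unpack_position_word(word):
    """Recover (azimuth, elevation) from a word built by pack_position_word."""
    return (word >> 16) & POSITION_MASK, word & POSITION_MASK


if __name__ == "__main__":
    word = pack_position_word(1, 2)
    print(hex(word))                   # 0x10002
    print(unpack_position_word(word))  # (1, 2)
```

Keeping each value on a 16-bit boundary leaves one spare bit per field, which could carry a status flag if the receiving side agreed on its meaning.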
H. Current Work and Results
As of the end of the fall semester of 2007, the status of the project is as follows.
Programming the FPGA in VHDL is the task at hand. An onboard clock is divided down to get the
desired frequency. The FPGA is outputting a clock signal with 17 pulses and a delay to both of
the encoders, and the encoders are responding with data. An example of the clock and data
signals is shown in Figure 11. The encoder data on the bottom represents a 15-bit serial data
stream. As noted before, this data corresponds to the light passing through the encoder’s
rotating disk.
Figure 11 Clock Signal and Encoder Data
The tasks that still need to be completed are continuously updating the data, Gray code to
binary conversion, parallel to serial combination, and transmitting data over fiber optic
lines. Much work remains, but a good understanding of VHDL and the FPGA has been gained, so
the remaining work should go at a quicker pace.
V. Future Work and Conclusions
Next semester, the plan for the Pawnee hardware update is to design the circuit
board on which the components and FPGA will rest. For testing purposes right now, another
board attaches to the external connectors on the BASYS with loose wires between the two. This
setup is an acceptable short-term solution, but a long-term solution would be to have all
the components on the same board. The design of the board will be challenging due to the
complexity of the BASYS circuit. Determining which parts need to be included and which
can be excluded will also take careful consideration. Once the board is designed, it
will be fabricated and populated with all the components. The board will then be tested
to verify the design. The planned timing for next semester is laid out in Table 3.
Task                                     Start Date    Finish Date    Duration
Finish Programming Interface for FPGA    ----------    1/27/08        12 Weeks
Design Printed Circuit Board             1/27/08       3/23/08        8 Weeks
Build, Test, and Debug Circuit Board     3/23/08       4/13/08        3 Weeks
Table 3 Pawnee Hardware Update Schedule
For the hardware accelerator project, the plan for next semester is to implement the
accelerator on the final hardware. The system must first be configured. Then simple test CUDA
programs will be run. The next step will be to write code that optimizes matrix determinants
and inverses on the graphics card. After that code is written, it will be integrated with the
code for the PTDM. Finally, the system will be configured to integrate with the radar site’s
interface. A timeline for these tasks is given in Table 4.
Task                                                                        Start Date    Finish Date    Duration
Get system set up and successfully run simple test CUDA programs            12/10/07      1/21/08        6 Weeks
Finish CUDA code that optimizes performance on real sample data             1/21/08       2/18/08        4 Weeks
Integrate CUDA code with algorithm code                                     2/18/08       3/31/08        6 Weeks
Configure system to interact with radar interface and integrate into site   3/31/08       4/14/08        2 Weeks
Table 4 Hardware Accelerator Schedule
After the implementation of the new encoders and the hardware accelerator system,
CSU’s radar sites will produce more accurate radar data. As outlined in this report, the two
projects will increase the accuracy of the radar data in two different ways. Further research
projects at the sites will benefit from the more accurate data.
References:
[1] Absolute Encoder; Sick/Stegmann
[2] ARS 20, ARS 25: Single-turn Absolute Encoder; Sick/Stegmann
[3] CUDA Programming Guide 1.1, Nvidia; November 2007
[4] Digilent Basys Board Reference Manual; Digilent
[5] Spartan-3 generation FPGA User Guide; Xilinx; April 2007
[6] Spartan-3E FPGA Family: Complete Data Sheet; Xilinx; May 2007
[7] Synchronous Serial Interface for Absolute Encoders; Stegmann
Bibliography
[1] Benally, Darryl, 2007: Antenna Control for the Pawnee Radar. Colorado State University
[2] Nguyen, Cuong M., Dmitri N. Moisseev, and V. Chandrasekar, 2007: A parametric time domain method for spectral moment estimation and clutter mitigation for weather radars. Colorado State University
[3] Rinehart, Ronald E., 1997: Radar for Meteorologists. 3rd Edition: Knight Printing 2001
Appendix A- Abbreviations
CTM- Close To the Metal
CSU- Colorado State University
CUDA- Compute Unified Device Architecture
DCM- Digital Clock Manager
DSP- Digital Signal Processor
FPGA- Field Programmable Gate Array
PPI- Plan Position Indicator
PTDM- Parametric Time Domain Method
SSI- Synchronous Serial Interface
VHDL- VHSIC Hardware Description Language
VHSIC- Very High Speed Integrated Circuits
Appendix B- Budget
Item                            Price
Digilent BASYS Board            $50
2 Quadro FX 5600                $3,000 ($1,500 apiece)
2 Stegmann Absolute Encoders    $1,200 ($600 apiece)
xw9400                          $1,500
Appendix C- Gray to Binary Comparison
Gray Code    Binary Code    Number
0000         0000           0
0001         0001           1
0011         0010           2
0010         0011           3
0110         0100           4
0111         0101           5
0101         0110           6
0100         0111           7
1100         1000           8
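The mapping in the table above can be generated and inverted with a few bit operations. The Python sketch below shows the standard reflected Gray code conversions; the function names are hypothetical, and the report’s own conversion is planned for VHDL on the FPGA, so this is an illustration of the technique rather than the project code.

```python
def binary_to_gray(value):
    """Convert a non-negative integer to its reflected Gray code."""
    return value ^ (value >> 1)


def gray_to_binary(gray):
    """Convert a reflected Gray code back to the ordinary binary value."""
    binary = gray
    shift = gray >> 1
    # XOR in every right-shifted copy of the Gray code until nothing is left.
    while shift:
        binary ^= shift
        shift >>= 1
    return binary


if __name__ == "__main__":
    for number in range(9):
        gray = binary_to_gray(number)
        print(format(gray, "04b"), format(gray_to_binary(gray), "04b"), number)
```

Running the loop prints the same nine rows as the table, and the same functions work unchanged for the 15-bit values produced by the encoder.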