
Radar Signal Processing: Hardware Accelerator and Hardware Update

Academic Year 2007-2008

by

Michael Neuberg

Christopher Picard

Prepared to partially fulfill the requirements for ECE401/ECE402

Department of Electrical and Computer Engineering

Colorado State University

Fort Collins, Colorado 80523

Report Approved:

Project Advisor

Senior Design Coordinator


Abstract

This project consists of two efforts, both dealing with Colorado State University's (CSU's) radar systems. The first is an update of CSU's Pawnee Doppler radar system to improve its performance and the quality of the data it produces. The second adds a hardware accelerator, based on graphics cards, to the implementation of the Parametric Time Domain Method (PTDM) clutter mitigation algorithm.

The Pawnee radar system is outdated, which makes replacement parts hard to find and limits performance. The main retrofit is to replace the synchros, which report rotational position, with optical encoders. This should improve the angular resolution roughly tenfold, which in turn improves positioning accuracy at long range. For the PTDM clutter mitigation implementation, a standard computer system cannot compute results in real time. Graphics cards can perform the needed calculations at much higher rates, which will allow the algorithm to be applied in real time.

Updating the system with optical encoders requires interfacing to the Digital Signal Processor (DSP). A Field Programmable Gate Array (FPGA) will send data to and from the encoders and then to the DSP, acting as the interface between the two devices. A printed circuit board will be built around the FPGA to transmit and receive the data. The hardware used is a Xilinx Spartan-3E FPGA and Stegmann ARS-20 absolute optical encoders. This hardware will be installed at the Pawnee radar site, improving the data on the position of the radar antenna. After researching graphics cards for the PTDM clutter mitigation implementation, the Nvidia Quadro FX 5600 was chosen. The card has 16 multiprocessors, each of which can execute 128 simultaneous threads. Nvidia provides a programming model called Compute Unified Device Architecture (CUDA), which lets the programmer use a C-style language to code tasks on the graphics card. This greatly shortens the development time for the implementation.


Acknowledgments

The authors of this paper would like to thank Dr. Chandra for the opportunity to work on this project. It has given us valuable experience with the radar facilities at Colorado State University (CSU). We would also like to thank Jim George as well as the rest of the graduate students that are part of the CSU radar research team. We could not have gotten this far on our project without their guidance.

We would like to thank the CSU-CHILL facility for donating components for research and testing on the CSU Pawnee update project. We would also like to thank Hewlett Packard for donating the workstation platform and graphics cards for the hardware accelerator project.


TABLE OF CONTENTS

Title
Abstract
Acknowledgments
Table of Contents
List of Figures and Tables
I. Introduction
II. Background Information on Radar & CSU Facilities
III. Hardware Accelerator
A. Review of Previous Work
B. Problem Requirements and Solution Approaches
C. Solution Implementation
C.1 Hardware
C.2 CUDA Implementation
C.3 Current Project Status
IV. Pawnee Hardware Update
A. Colorado State University Pawnee Radar Facility
B. Pawnee Hardware
C. Problems Arising With Old Hardware
D. Solution
E. Implementation of Software
F. Implementation of Hardware
F.1 Components and Requirements
F.2 PCB Design and Layout
V. Future Work and Conclusions
References
Bibliography
Appendix A – Abbreviations
Appendix B – Budget
Appendix C – Gray to Binary Comparison
Appendix D – PCB Components List
Appendix E – Circuit Board Schematic Layout

List of Figures

Figure 1 CHILL Doppler Radar Facility
Figure 2 Pawnee Doppler Radar Facility
Figure 3 PPI without Ground Clutter Filter
Figure 4 PPI with Ground Clutter Filter
Figure 5 Nvidia CUDA Architecture
Figure 6 Benchmark Results
Figure 7 PPI CHILL Radar Image, 2003, Tornado
Figure 8 Pawnee Radar Inside the Radome
Figure 9 Clock Signal and Encoder Data
Figure 10 Xilinx XPower Estimator Tool
Figure 11 PCB Top Layer
Figure 12 PCB Ground Plane
Figure 13 PCB Power Plane
Figure 14 PCB Top Layer and Power Layer

List of Tables

Table 1 FX 5600 Specifications


I. Introduction

There are two separate projects, integrated together to benefit both of Colorado State University's (CSU's) radar sites. The first project addresses the issue of ground clutter in a radar image. There are several methods for mitigating ground clutter, each with its own benefits and disadvantages. This report will discuss one of the more effective ground clutter mitigation techniques, as well as the difficulties in implementing the approach.

This report will cover a method for creating a hardware accelerator that performs matrix calculations in parallel to significantly reduce computation time. It uses a preset architecture on a graphics card. Using that architecture and a C-based extension language, the graphics card is transformed into the hardware accelerator for this project. Using this hardware accelerator will allow one of the ground clutter mitigation methods to be performed in real time.

The second project updates hardware at one of the CSU radar facilities. Most of the hardware is outdated, which makes replacement parts hard to find. Updating the hardware will enhance the radar positioning system and provide more precise data. The main hardware update is to replace synchros with encoders to determine the precise angle and direction in which the dish is pointed. More details of the project will be discussed along with the methods of implementation.

First, a background description of how radar works and its history will be given. Then information on the two CSU radar facilities will be provided. Chapter III covers the hardware accelerator project, and Chapter IV discusses the specifics of the hardware update. The last section wraps up the two projects and gives an idea of the future work to be completed.

II. Background Information on Radar and CSU Facilities

The basic idea of radar is very simple. During WWII, when ships or airplanes crossed the path of radio signals, their echoes could be detected. This led to the idea of using radio signals to determine the location of objects. The technology has been developed continuously from WWII to the present day, and much research continues to improve and enhance radar systems.

A radar system basically consists of four parts: 1) a transmitter that generates a signal, 2) an antenna that sends and receives the signal, 3) a receiver that detects the returned signal, and 4) a display unit to analyze the data. Together these components make up very complex radar systems.

Colorado State University currently operates two radar facilities. The CHILL Doppler radar (Figure 1) is a dual-polarized S-band system located near Greeley, Colorado.

Figure 1 CHILL Radar Facility

The other is the Pawnee Doppler radar (Figure 2), which is a single-polarized S-band system located near Nunn, Colorado.

Figure 2 Pawnee Radar Facility

Both radar facilities are operated together as a dual-Doppler configuration. The transmitted waveform is Gaussian, and a klystron amplifier boosts it to 1 MW per channel before the signal is transmitted into the sky. Both systems are sponsored by the National Science Foundation and Colorado State University.

III. Hardware Accelerator

A. Review of Previous Work:

One of the major problems that weather radar systems face is ground clutter. Ground clutter occurs when the radar beam bounces off objects located on the ground, such as trees, buildings, and mountains. In order to get a good weather image over areas with ground clutter, a clutter mitigation algorithm must be applied. Figure 3 shows a Plan Position Indicator (PPI) image with no ground clutter mitigation applied, and Figure 4 shows an image taken seconds later with a ground clutter mitigation approach applied.

Figure 3 PPI without Ground Clutter Filter


Figure 4 PPI with Ground Clutter Filter

As one can see, this filter significantly reduces the presence of ground clutter, although it does not remove the clutter completely.

The Parametric Time Domain Method (PTDM) is a method for radar ground clutter mitigation. It helps suppress the effect of ground clutter close to the radar site. Unlike previous methods, this approach does not suffer from signal loss or spectral leakage (Nguyen, Moisseev, Chandrasekar 2007). The reason PTDM avoids these problems is that it does not apply a Fourier transform to move the signal into the frequency domain; instead, the method performs its calculations in the time domain. The main issue with the PTDM approach is that it requires computationally expensive calculations, primarily complex matrix operations that must be performed on large matrices. When simulated in Matlab, the calculations were shown to be too slow to run in real time on a standard personal computer.

B. Problem Requirements and Solution Approach:

The problem is to find an approach to implementing a hardware accelerator for matrix operations, including determinants and inverses. The accelerator must be able to perform these calculations at a rate sufficient to run the PTDM algorithm in real time.

The chosen implementation performs the matrix calculations on a Graphics Processing Unit (GPU). A GPU is the processor located on a graphics card. In order for a graphics card to render images and handle the movement of those images on a screen, it must perform a large number of floating point operations in a small amount of time. The GPU accomplishes this by performing many of these operations in parallel, as well as by optimizing floating point operations. Both of these factors make the GPU an ideal choice for implementing PTDM.

C. Solution Implementation:

This section describes the hardware on which PTDM will be implemented and the method used to turn a GPU into a usable hardware accelerator.

C.1 Hardware:

Two graphics card vendors were researched as possible suppliers for this project. The first vendor is ATI. ATI has a development method called Close To the Metal (CTM), a tool that allows the user to write custom driver-level code for the graphics card. The primary disadvantage of this approach is the learning curve for development.

The second vendor is Nvidia. Nvidia has a development method called Compute Unified Device Architecture (CUDA). This method builds a general-purpose architecture on the graphics card and then allows the user to write in a C-style language, making special function calls to compute on the graphics card. With this C-style language, development time is significantly shorter.

When comparing the two vendors, the Nvidia approach was selected because of the shorter development time. The specific graphics card picked was the Quadro FX 5600. This card is one of the few that works with the CUDA architecture and is supported on the Hewlett Packard xw9400. Two of these cards, as well as the platform, were donated to the project by Hewlett Packard. The graphics card's specifications are listed in Table 1.

FX 5600 Specification      Value
GPU                        G80
GPU Speed                  600 MHz
RAM                        1.5 GB GDDR3
RAM Speed                  800 MHz
Memory Interface           384-bit
Memory Bandwidth           76.8 GB/sec

Table 1 FX 5600 Specifications

The xw9400 was chosen because it has two full 16-lane (x16) PCIe slots, which allows data to be transferred from main memory to the graphics card memory at a higher rate. The system has two AMD Opteron processors, providing a high-end CPU against which to accurately assess the speed of the CUDA implementation.

C.2 CUDA Implementation:

In order to turn the two graphics cards into hardware accelerators, CUDA is used. This architecture organizes the graphics card into multiprocessor blocks. Each multiprocessor has eight individual processors with dedicated registers as well as access to shared memory. There are 8,192 registers and 16 KB of shared memory per multiprocessor block (Nvidia CUDA Programming Guide 2007). The entire architecture is shown in Figure 5.


Figure 5 Nvidia CUDA Architecture

The software side of CUDA is a C-style language. When an operation is to be performed on the graphics card, a segment of code called a kernel is executed. Each kernel is launched as an array of blocks, and each block consists of a large number of threads. Only one block executes on a multiprocessor at a time; however, other blocks can be swapped onto the multiprocessor to maintain performance while a block is idle.
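To make the kernel, block, and thread structure concrete, here is a minimal CUDA sketch (not the project's PTDM code; the kernel name, matrix count, and scaling operation are made up for illustration) that launches one block per 16 x 16 matrix:

    #include <cuda_runtime.h>
    #include <cstdio>

    // Illustrative kernel: each block scales one 16x16 matrix by a constant.
    // The block index selects the matrix; the thread index selects the element.
    __global__ void scaleMatrices(float *data, int n, float factor)
    {
        int matrix  = blockIdx.x;          // one block per matrix
        int element = threadIdx.x;         // one thread per element (256 threads)
        int idx     = matrix * 256 + element;
        if (matrix < n)
            data[idx] *= factor;
    }

    int main()
    {
        const int n = 1000;                // number of 16x16 matrices
        const size_t bytes = (size_t)n * 256 * sizeof(float);

        float *h = (float *)malloc(bytes);
        for (size_t i = 0; i < (size_t)n * 256; ++i) h[i] = 1.0f;

        float *d;
        cudaMalloc(&d, bytes);
        cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);

        // Grid of n blocks with 256 threads each: the "array of blocks" described above.
        scaleMatrices<<<n, 256>>>(d, n, 2.0f);
        cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);

        printf("h[0] = %f\n", h[0]);       // expect 2.0
        cudaFree(d);
        free(h);
        return 0;
    }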

In order to use this architecture in a high-performance setting, there are some important constraints to understand. The most important is the 16 KB of shared memory: all of the data being processed on a multiprocessor must fit in that 16 KB shared memory space. Any data that does not fit in shared memory incurs a 400 to 600 clock cycle delay per read (Nvidia CUDA Programming Guide 2007). The second constraint is that threads are organized into groups of 32 called warps, which means that for optimal performance there should be a minimum of 64 threads per block. The last important constraint is the memory layout of the device. Shared memory is organized into sixteen banks, and no two threads in a half-warp can access the same bank at the same time without causing a conflict. Global memory is organized differently; to get optimal read times, consecutive threads should read from it at 32-bit intervals so the accesses can be coalesced (Nvidia CUDA Programming Guide 2007).
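A short sketch of how these constraints shape a kernel (again illustrative, not the project's code): each block uses 64 threads, a multiple of the warp size, and stages its working set in shared memory with coalesced 32-bit loads.

    // Illustrative kernel respecting the constraints above: 64 threads per block
    // (a multiple of the 32-thread warp size) and a per-block working set staged
    // in shared memory so repeated accesses avoid the 400-600 cycle global latency.
    __global__ void sumColumns(const float *in, float *out)
    {
        __shared__ float tile[64 * 16];        // 4 KB, well under the 16 KB limit

        int t    = threadIdx.x;                // 0..63
        int base = blockIdx.x * 64 * 16;       // this block's 16x64 tile of input

        // Coalesced, 32-bit loads: consecutive threads read consecutive floats.
        for (int row = 0; row < 16; ++row)
            tile[row * 64 + t] = in[base + row * 64 + t];
        __syncthreads();

        // Work entirely out of shared memory from here on.
        float sum = 0.0f;
        for (int row = 0; row < 16; ++row)
            sum += tile[row * 64 + t];
        out[blockIdx.x * 64 + t] = sum;
    }

It would be launched in the same way as the previous sketch, for example sumColumns<<<numBlocks, 64>>>(d_in, d_out).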

C.3 Current Project Status:

The first important segment of code for implementing PTDM is the inverse operator; it involves the most floating point computations of any part of PTDM. The current performance of the inverse operator is shown in Figure 6, where the x axis is the number of 16 x 16 matrices the inverse was performed on and the y axis is the computation time in seconds for the single-precision CPU, double-precision CPU, and CUDA implementations.

Figure 6 Benchmark Results

The performance of CUDA shown here is greatly improved from the original implementation, which took 7.9 sec at 10,000 matrices. CUDA is still about 33% slower than the system's processor; this difference has been shown to be due to the data transfer rate from main memory to the graphics card. When the inverse operator is performed on the data ten times, the CUDA implementation becomes 35% faster than the system processor.

Three key improvements were made over the original implementation of the CUDA code. The first was a batched data transfer structure from main memory to the graphics card memory: instead of sending all of the data to the graphics card at once, batches of 100 matrices are sent. This single improvement reduced the time from 7.9 sec to 1.2 sec on 10,000 matrices. The second improvement was the use of host page-locked memory. This locks the pages containing the data to be transferred to the graphics cards in place, so the system can no longer move the data to a different memory location or page it out of main memory. The end result of this restriction is a data transfer rate between main memory and the graphics card of twice the original rate. The last improvement was to calculate the inverse of two matrices per block. Due to the data dependencies of the inverse operation, it is difficult to fully utilize the parallelism of the graphics card; adding the second matrix doubles the throughput in the low-parallelism sections of the inverse operator. No more matrices could be added because of the 16 KB limit on shared memory.
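The first two improvements, batched transfers and page-locked host memory, can be sketched as follows (a minimal illustration, not the project's code; the matrix count and batch size follow the numbers quoted above, and the inverse kernel itself is omitted):

    #include <cuda_runtime.h>

    // Sketch of batched transfers from page-locked host memory.
    // Assumes 16x16 single-precision matrices and batches of 100, as in the text.
    int main()
    {
        const int nMatrices   = 10000;
        const int batchSize   = 100;
        const size_t matBytes = 16 * 16 * sizeof(float);

        float *h_data;                                   // host page-locked buffer
        cudaHostAlloc((void **)&h_data, nMatrices * matBytes, cudaHostAllocDefault);

        float *d_batch;                                  // device buffer for one batch
        cudaMalloc((void **)&d_batch, batchSize * matBytes);

        for (int first = 0; first < nMatrices; first += batchSize) {
            // Copy one batch of 100 matrices; pinned memory roughly doubles this rate.
            cudaMemcpy(d_batch, h_data + first * 16 * 16,
                       batchSize * matBytes, cudaMemcpyHostToDevice);

            // ... launch the inverse kernel on this batch here ...

            cudaMemcpy(h_data + first * 16 * 16, d_batch,
                       batchSize * matBytes, cudaMemcpyDeviceToHost);
        }

        cudaFree(d_batch);
        cudaFreeHost(h_data);
        return 0;
    }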

The next important operator PTDM requires is the matrix determinant. The code for this operator is not complete at the present time, but it is slated to be finished within the next two weeks. With the data transfer structure, the inverse, and the determinant operator in place, the remaining parts of PTDM should take less time than these three parts.
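For reference, one standard way to compute a determinant, which may or may not be the approach the project ends up using, is Gaussian elimination with partial pivoting: the determinant is the product of the pivots, with the sign flipped for each row swap. A plain C sketch for a single 16 x 16 real matrix:

    #include <math.h>
    #include <stdio.h>

    #define N 16

    /* Determinant by Gaussian elimination with partial pivoting:
       the product of the pivots, negated once per row swap. */
    double determinant(double a[N][N])
    {
        double det = 1.0;
        for (int col = 0; col < N; ++col) {
            int pivot = col;                          /* largest pivot in column */
            for (int r = col + 1; r < N; ++r)
                if (fabs(a[r][col]) > fabs(a[pivot][col]))
                    pivot = r;
            if (a[pivot][col] == 0.0)
                return 0.0;                           /* singular matrix */
            if (pivot != col) {                       /* row swap flips the sign */
                for (int c = 0; c < N; ++c) {
                    double t = a[col][c]; a[col][c] = a[pivot][c]; a[pivot][c] = t;
                }
                det = -det;
            }
            det *= a[col][col];
            for (int r = col + 1; r < N; ++r) {       /* eliminate below the pivot */
                double f = a[r][col] / a[col][col];
                for (int c = col; c < N; ++c)
                    a[r][c] -= f * a[col][c];
            }
        }
        return det;
    }

    int main(void)
    {
        double m[N][N] = {{0}};
        for (int i = 0; i < N; ++i) m[i][i] = 2.0;    /* 2*I, determinant 2^16 */
        printf("det = %g\n", determinant(m));         /* expect 65536 */
        return 0;
    }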

IV. Pawnee Hardware Update

The goal of this project is to update the positioning system and hardware on the CSU-Pawnee Doppler radar. This will be done by replacing the synchros with optical encoders and interfacing them to the radar signal processor using an FPGA, which will greatly improve the positioning accuracy of this site.

A. Colorado State University Pawnee Radar Facility

The Pawnee radar is located about 48 km north of the CSU-CHILL radar in Greeley. This radar was shown earlier in Figure 2. The antenna is located within the radome on the right-hand side, and the trailer on the left houses the main radar hardware, electrical equipment, and operators. This is a single-polarization radar system, meaning it transmits with only one polarization.

B. Current Pawnee Hardware

The Pawnee radar uses outdated equipment, which makes it hard to repair and to find replacement parts. While the radar still works fine, it can definitely be enhanced. Right now the radar uses synchros, a technology developed during WWII, to communicate rotational position to the operator. The two rotational positions are the azimuth and the elevation. Synchros use many wires to transmit the data in parallel over long distances, which can be costly, prone to noise, and prone to damage. Because of the number of wires and the age of the synchros, only so many bits of resolution can be obtained; the current synchros have a resolution of approximately 0.1° with the number of data bits available.

C. Problems Arising With Old Hardware

What sort of error can be caused by the low resolution of the synchros? An example of a typical radar image is shown below in Figure 7.

Figure 7 PPI CHILL Radar Image, 2003, Tornado

This image was taken in August 2003 from the CHILL facility and shows a storm near Burns, Wyoming, approximately 100 km away. The area within the large circle represents a tornado, indicated by the "hook" shape in the reflectivity. If the image above had come from the Pawnee radar site rather than from CHILL, the following errors could be introduced by the low resolution of the synchros.

Suppose the tornado is located at an angle of 75° from the radar at the center and 100 km away. Simple trigonometry gives its position as

x = 25.8819 km
y = 96.5926 km

If the measured angle is off by 0.1°, the computed position becomes

x = 26.0505 km
y = 96.5473 km
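As a check on the size of this error (my own arithmetic, using the 100 km range above and plane trigonometry in place of the law of sines), the displacement is approximately the arc length swept by the angular error:

\[
x = r\cos\theta, \qquad y = r\sin\theta,
\]
\[
\Delta s \approx r\,\Delta\theta = (100\ \mathrm{km})\left(0.1^\circ \times \frac{\pi}{180^\circ}\right) \approx 0.175\ \mathrm{km},
\]
\[
\Delta s \approx (100\ \mathrm{km})\left(0.01^\circ \times \frac{\pi}{180^\circ}\right) \approx 0.017\ \mathrm{km}
\]

for the improved 0.01° resolution discussed in the next section.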

This would place the tornado almost 0.2 km (656 ft) from where it actually is. If this value could be improved, the exact location of the weather system could be tracked and people directly in the path of the tornado could be warned. This is only at 100 km; many weather radars reach out to a few hundred kilometers, which increases the error caused by the low resolution. The goal is to improve these values.

D. Solution

A good solution to the issue of low resolution is to use optical encoders. Stegmann ARS-20 absolute encoders were purchased to replace the synchros. Absolute encoders are used because the position data is retained even after a power loss, which matters because the radar is not operational all the time. The absolute encoder was compared to an incremental encoder, which loses its position on power loss, and the absolute encoder was by far the better choice. The best feature of this encoder is that it transmits 15 bits of data serially for each position, giving 2^15 = 32,768 different values over 360°, or a resolution of roughly 0.01°. Repeating the earlier calculation with this resolution, the error at 100 km is only about 0.02 km (65 ft), which is very good. Other benefits of optical encoders are that they are not susceptible to noise and electromagnetic interference and that they require fewer wires: only two wires are needed to transmit data and two for the clock signal, which reduces both noise and cabling cost. The encoder is shown below in Figure 8 with a view of the radar inside the radome. It will go inside the platform where the synchros currently reside. One encoder is needed to track the azimuth position and one for the elevation position.

Figure 8 Pawnee Radar Inside the Radome

E. Implementation of Software

To complete this project, a Field Programmable Gate Array (FPGA) is used as the interface between the optical encoders and the DSP unit. To test the design, a Digilent Basys development board containing a Xilinx Spartan-3E FPGA was used to implement test code that checks the data from the encoders. In the summer of 2007, a Research Experience for Undergraduates project was done by Darryl Benally, and some of the code from that project was used as a reference. Many VHDL tutorials were also used while developing the code for this system.

The FPGA performs the following functions. First, to receive data from the encoders, the FPGA needs to generate a 17-pulse square wave; each encoder transmits 15 bits of data and 2 error bits. The radar transmits a pulse approximately once every millisecond. To get one data bit from the encoders per transmission, the clock needs to run at a minimum of 1 kHz per bit, so with 17 bits per encoder the clock signal needs to be at least 17 kHz if a reading is required every time the radar sends a pulse. A faster clock allows more readings during the transmit interval, so this project uses a 1 MHz clock signal in order to get many data readings for each radar transmission. With a clock signal going to each encoder, one for elevation and one for azimuth, each encoder sends back its 17 bits of data on a separate line. The FPGA must read in this data and process it.
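As a quick check of these numbers (my arithmetic, based on the 1 ms transmit interval, 17-bit frames, and 1 MHz clock given above):

\[
f_{\min} = \frac{17\ \text{bits}}{1\ \text{ms}} = 17\ \text{kHz},
\qquad
t_{\text{frame}} = \frac{17\ \text{bits}}{1\ \text{MHz}} = 17\ \mu\text{s},
\]

so at 1 MHz roughly 1 ms / 17 µs ≈ 58 complete readings fit into each transmit interval, before accounting for the idle time required between frames.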

The encoder transmits the data in the Synchronous Serial Interface (SSI) format, which is patented by Stegmann. The clock line is held high, a falling edge initiates the sequence, and the encoder then returns one data bit on each subsequent rising edge. An important detail is to insert a delay between successive clock bursts so that the data frames can be distinguished. The clock signal sent to the encoder therefore starts high, toggles through 17 cycles, and finishes high. An example of the clock and data signals is shown in Figure 9; the encoder data on the bottom represents a 15-bit serial data stream.

Figure 9 Clock Signal and Encoder Data

After the 17 bits are received, the clock line idles for the required delay and the sequence repeats. The benefits of SSI are that only four signal lines are needed, the conventional component count is low, and data from multiple encoders can be latched simultaneously.

Once the data is received from the encoder, the FPGA processes it. First, the value must be converted from Gray code to binary. Then one of the data sets is shifted so that the elevation and azimuth position data can be combined into the same thirty-two bit word. This is referred to as the "parallel to serial conversion" because the data arriving on two parallel lines is then transmitted on one single serial line. The combined data is transmitted over fiber optic lines to the radar signal processor unit. The whole process completes in less than a millisecond, so the position data accompanying each radar pulse is always current.
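A sketch of this processing step in C (illustrative only; the project implements it in VHDL on the FPGA, and the exact placement of azimuth and elevation within the 32-bit word is an assumption):

    #include <stdint.h>
    #include <stdio.h>

    /* Convert a 15-bit Gray-coded position to plain binary.
       Standard method: XOR the value with progressively smaller right shifts. */
    static uint16_t gray_to_binary(uint16_t gray)
    {
        uint16_t bin = gray;
        bin ^= bin >> 8;
        bin ^= bin >> 4;
        bin ^= bin >> 2;
        bin ^= bin >> 1;
        return bin & 0x7FFF;                       /* keep 15 bits */
    }

    /* Pack azimuth and elevation into one 32-bit word, azimuth in the upper
       half (assumed layout), matching the "parallel to serial conversion"
       described in the text. */
    static uint32_t pack_positions(uint16_t az_gray, uint16_t el_gray)
    {
        uint32_t az = gray_to_binary(az_gray);
        uint32_t el = gray_to_binary(el_gray);
        return (az << 16) | el;
    }

    int main(void)
    {
        /* Example: Gray 0010 is binary 3, as in Appendix C. */
        printf("gray 0010 -> binary %u\n", gray_to_binary(0x2));
        printf("packed word: 0x%08X\n", pack_positions(0x2, 0x3));
        return 0;
    }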

F. Implementation of Hardware

To complete this project, a Printed Circuit Board (PCB) is needed to hold all the components. The board is designed using PCB Artist from Advanced Circuits. Accomplishing this required reviewing current board layouts and components, and another important consideration was finding components that were available from Digikey in usable packages.

F.1 Components and Requirements

A schematic of the Digilent Basys board was used as a reference for the general configuration. An important difference between the project design and the test board is the FPGA: the software used to program the device, the available FPGA devices, and the PCB parts all needed to match. A Xilinx Spartan-3 FPGA was used, which has a different pinout and specifications from the development board's Spartan-3E chip. The device has fewer gates and less functionality, but it is fully adequate for this project. Powering the FPGA is actually a fairly involved task, as it requires three different voltage levels. A nice feature of Xilinx chips is the XPower Estimator tool, shown below in Figure 10.

Figure 10 Xilinx XPower Estimator Tool

This tool gives good estimates of the power required at each of the device's power inputs and is used to determine what size voltage regulators are needed to supply adequate current. Based on these estimates, a Texas Instruments 1 A dual-output voltage regulator is used, providing 3.3 V and 1.2 V outputs, and another Texas Instruments regulator supplies 2.5 V. Many other components need these voltage levels, and some devices need 5 V and 12 V supplies, so an additional voltage regulator, a 7805, is used. With these regulators, only one external power supply is needed. The setup is as follows: a 12 V supply powers the encoders and feeds a regulator that brings it down to 5 V; the 5 V rail is used by some devices and is also regulated down to 1.2 V, 2.5 V, and 3.3 V for the others.

Other components include a 40 MHz crystal oscillator, which the FPGA uses to generate the clock signal sent to the encoders. A Xilinx Platform Flash chip is used to program the device; it ties the JTAG programming cable to the FPGA and allows the device to be programmed. Differential line drivers and receivers are used to transmit and receive data from the encoders; they make it possible to distinguish the high and low TTL signals over long distances. A dual peripheral driver is also used for the TTL outputs. These signals are then sent to a fiber optic transceiver, which sends the data to the DSP unit and the angle display. A detailed parts list can be found in Appendix D. As noted before, all of these parts are readily available through electronic component distributors.

F.2 PCB Design and Layout

A schematic design was the first step in the PCB process. Using PCB Artist, a schematic layout was created; this involved connecting nets, choosing parts, and creating parts, taking into account the correct package size and specifications of each component. The schematic is broken down into six sections: power, transmitter, receiver, flash memory, clock, and programming. These sections can be seen in Appendix E.

Once the schematic was completed, it was translated into a printed circuit board layout. Each component was placed in a logical position relative to the routing requirements of the board, and everything was then routed with copper traces.

The design started as a two-layer PCB. This proved very difficult: the traces were hard to follow, the overall flow of the nets was poor, and many components were hard to place and then hard to route where needed. Because of these difficulties, the decision was made to use a four-layer board with a top layer, ground layer, power layer, and bottom layer. With this type of board the flow improved, and it was much easier to route all of the components.

The top layer contains approximately 95% of the components, which makes the flow easier to follow. Many decoupling capacitors are used on the inputs and outputs of most pins to reduce noise; these needed to be placed very close to the pins. All of the parts are labeled and documented on the top silk screen. Because of the room afforded by a four-layer board, extra components were added to the top layer, including extra DIP component options and other small parts, so the board can be extended later if needed. The top layer can be viewed in Figure 11 below. The bottom layer, not shown, is used only for a few decoupling capacitors and copper traces.

Figure 11 PCB Top Layer

The ground plane is one of the biggest advantages of a four-layer board. It allows a ground connection from almost any point on the board, whereas with a two-layer board many nets were needed to connect ground at various points. The ground plane is shown below in Figure 12.

Figure 12 PCB Ground Layer

The black represents the copper pour area and the white the etchings. Whenever a component needs to be grounded, a via is added to connect it straight to the ground plane; the white star patterns show vias connected to this plane. The star pattern provides thermal relief: heat does not concentrate at one location on the plane but can be dispersed through four sides. Some components, such as heat sinks on the top copper layer, require many vias with thermals to dissipate heat. The plain rings are component holes that do not touch the ground plane and pass through to another layer.

Another concern at the Pawnee radar site is that it frequently gets struck by lightning, which could destroy the board through its connections. To help prevent this, modifications were made to the ground plane: an outer ground ring, partially isolated from the inner ground plane, was created to protect the components. Any permanent connectors, such as those for the encoders, have their wire shielding connected to this outer ring. That way, if charge builds up in a wire, it travels around the outside edge and up to the upper right side of the board where the power supply connects, instead of across the middle of the board where the FPGA and the other components are located. The outer ring still shares a common ground with the rest of the board through the connection in the upper right-hand corner.

A power plane layer is also used to provide all of the components with the supplies they need. This involved carefully planned polygon pours laid out according to the component locations. The power layer is shown below in Figure 13.

Figure 13 PCB Power Layer

As one can see, each isolated area represents a different voltage level. The voltage regulators are placed between the planes so that each one can take its input voltage from one area and output the regulated voltage to another. As with the ground plane, this plane is connected through vias with thermals to disperse the current and heat. The design provides voltage levels of 12 V, 5 V, 3.3 V, 2.5 V, and 1.2 V from a single power source, which is important given the different voltage requirements of the components and the three separate voltage levels needed by the FPGA. The board's outline with the component and power layers is shown below in Figure 14 to show how the voltage regulators supply power and where the components lie.

Figure 14 PCB Top Layer and Power Layer


V. Future Work and Conclusions

As of May 3, 2008, the Pawnee hardware update project is almost complete. The board is currently being manufactured and the parts have been ordered; they are expected to arrive within the next few days. The next task will be to place and solder all of the components on the board. Once the board has been tested and checked, the software from the fall semester will be tested with it, using an angle display at the radar facility to determine whether it is working correctly.

It is not recommended that this project be continued in the future; it will essentially be done within a week. The only task left is to actually install the encoders at the Pawnee site, which can easily be done this summer when other maintenance work is being completed at the radar.

It is more difficult to say whether the hardware accelerator project should be continued. Despite the claims that CUDA is an easy implementation method, it has an extremely high learning curve and very little reference material is available, which is what prevented the full PTDM algorithm from being implemented at the present time. If the project is continued, a larger team is needed to help offset the difficulty of the implementation.

There are three main steps left in the hardware accelerator project. The first is to take the existing code and add the rest of PTDM to it. After that, the performance of the code needs to be measured and improved until it is sufficient to run PTDM in real time. The final step is to integrate the PTDM code with the existing data interface at the radar site. Once these steps are accomplished, the project will be ready to be installed at the CHILL radar site.

After the new encoders, hardware, and software of the Pawnee retrofit project are in place, CSU's radar sites will provide more accurate radar data. Even though the PTDM implementation was not completed this semester, there is now a good base for future developers to build on, which will help lower the large learning curve associated with CUDA.

Given the facts outlined in this paper, these two projects will increase the accuracy of the radar data in two different ways: the first will provide higher-precision position data and the second a cleaner image. With the more accurate data, further research projects at the sites will benefit from this work.


References:

[1] Absolute Encoder; Sick/Stegmann

[2] ARS 20, ARS 25: Single-turn Absolute Encoder; Sick/Stegmann

[3] CUDA Programming Guide 1.1, Nvidia; November 2007

[4] Digilent Basys Board Reference Manual; Digilent

[5] Digilent Basys Board Schematic Revision E; Digilent

[6] PCB Artist Layout Software Step by Step; Advanced Circuits

[7] PCB Artist Layout Software Part Creation; Advanced Circuits

[8] Spartan-3 generation FPGA User Guide; Xilinx; April 2007

[9] Spartan-3E FPGA Family: Complete Data Sheet; Xilinx; May 2007

[10] Synchronous Serial Interface for Absolute Encoders; Stegmann


Bibliography

[1] Benally, Darryl, 2007: Antenna Control for the Pawnee Radar. Colorado State University

[2] Nguyen, Cuong M., Dmitri N. Moisseev, and V. Chandrasekar, 2007: A parametric time domain method for spectral moment estimation and clutter mitigation for weather radars. Colorado State University

[3] Rinehart, Ronald E., 1997: Radar for Meteorologists. 3rd Edition: Knight Printing 2001


Appendix A- Abbreviations

CTM- Close To the Metal

CSU- Colorado State University

CUDA- Compute Unified Device Architecture

DCM- Digital Clock Manager

DSP- Digital Signal Processor

FPGA- Field Programmable Gate Array

PCB- Printed Circuit Board

PCIe- Peripheral Component Interconnect Express

PPI- Plan Position Indicator

PTDM- Parametric Time Domain Method

SSI- Synchronous Serial Interface

VHDL- VHSIC Hardware Description Language

VHSIC- Very High Speed Integrated Circuits


Appendix B- Budget

Item Price

Digilent BASYS Board $50

2 Quadro FX 5600 $3000 ($1,500 a piece)

2 Stegmann Absolute Encoder $1200 ($600 a piece)

xw 9400 $1500


Appendix C- Gray to Binary Comparison

Gray Code Binary Code Number

0000 0000 0

0001 0001 1

0011 0010 2

0010 0011 3

0110 0100 4

0111 0101 5

0101 0110 6

0100 0111 7

1100 1000 8


Appendix D- PCB Components List

1A Voltage Regulator 5V, Fairchild Semiconductor

40MHz Crystal Oscillator, CTS-Frequency Controls

5mm Terminal Block, On Shore Technology

600W Transient Voltage Suppressor, Diodes Inc

Dual Output Voltage Regulator 1.2V, 3.3V, Texas Instruments

Dual Peripheral Driver, Texas Instruments

ESD Diode Array, NXP Semiconductors

Fiber Optic Transceiver, Agilent

Heatsink, Aavid Thermalloy

Linear Voltage Regulator 2.5V, Texas Instruments

Platform Flash, Xilinx

Quadruple Differential Line Receiver, Texas Instruments

Quadruple Differential Line Driver, Texas Instruments

Spartan-3 FPGA, Xilinx

Various Size Resistors

Various Size Capacitors


Appendix E- Circuit Board Schematic Layout

Power Schematic

Transmitter Schematic

Receiver Schematic

Flash Memory Schematic

Clock Schematic

Programming Schematic