skau1 approach to scalable parallel processing for space-based radar

Skau 1 Confidential and Proprietary

Approach to Scalable Parallel Processing for Space-Based Radar


Example: SAR and GMTI Partitioning/Mapping/Utilization


Basic SBR Signal Processing

Basic GMTI Processing Flow (figure blocks): Input, A/D, Jammer Nulling, Pulse Compression, Doppler Processing, Clutter Suppression (STAP), CFAR, Output

Basic SAR Processing Flow (figure blocks): Input, A/D, Jammer Nulling, Range Compression, Range Correction & Azimuth Processing, Azimuth Filtering, Image Formation & Post Processing, Output


Example: SAR Processing Assumptions

• Assumed SAR Parameters
  • 16 channels (12 subarrays and 4 auxiliary channels)
  • 400 - 650 Msamples/sec (2.5 - 1.5 nsec per sample)
  • 500 µsec receive window (200,000 - 333,000 range cells per pulse)
  • 1 kHz PRF (1 msec PRI)
  • 16,384 pulses in azimuth (16.4 sec collection time)
  • The last 1/2 of the collected samples are used as the first 1/2 of the samples for the image (processed only through range pulse compression and stored)
  • 1 beam formed for SAR image; input 16,384 ranges x 16,384 pulses
  • 8 bits (1 Byte) per A/D sample
  • 64 bits (8 Bytes) for internal data storage of complex data (4 Bytes real, 4 Bytes imaginary)
  • 32 bits (4 Bytes) for internal data storage of real data
  • 4/3 oversampling out of the Polyphase Channelizer
  • 128 subbands formed; only 40 processed
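The derived figures in this list (range cells per pulse, collection time) follow from simple arithmetic on the receive window, sample rate, and pulse interval. A minimal Python sketch, assuming a 500 microsecond receive window and a 1 msec PRI (both consistent with the quoted range-cell counts and collection time); the function and constant names are illustrative and do not come from the original charts:

```python
# Sketch: derive range cells per pulse and collection time from the
# assumed SAR parameters. Names are illustrative, not from the charts.

RECEIVE_WINDOW_S = 500e-6   # assumed 500 microsecond receive window
PRI_S = 1e-3                # assumed 1 msec pulse repetition interval


def range_cells_per_pulse(sample_rate_hz: float) -> int:
    """A/D samples (range cells) captured in one receive window."""
    return round(RECEIVE_WINDOW_S * sample_rate_hz)


def collection_time_s(num_pulses: int) -> float:
    """Time to collect num_pulses at one pulse per PRI."""
    return num_pulses * PRI_S


for rate_hz in (400e6, 650e6):
    print(f"{rate_hz / 1e6:.0f} Msamples/sec -> "
          f"{range_cells_per_pulse(rate_hz):,} range cells per pulse")
print(f"16,384 pulses -> {collection_time_s(16_384):.1f} sec collection time")
```

At exactly 650 Msamples/sec the window holds 325,000 samples; the 333,000 upper figure corresponds to the rounded 1.5 nsec sample period.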


Example: SAR/GMTI Partitioning on BEP System

• Set local memory per processing element at 128 KBytes to be able to handle 16K FFTs for SAR mode (not double buffered)

• For the GMTI mode, the 128 KBytes of local memory can handle

- 256 pulses and 64 ranges per main memory access that can be processed in the pulse or Doppler dimension

or

- 8,000 ranges and 2 pulses worth of data per main memory access that can be processed in the range dimension

or

- any combination that maximizes throughput by “blocking,” i.e., effectively “caching”, and “striding” data for optimum performance

• Data can also be partitioned across beams, channels, and segments when they are independent variables relative to high level data flow processing
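The block sizes quoted above can be checked against the 128 KByte local memory directly. A small Python sketch, assuming 8 Bytes per complex sample and 1 KByte = 1024 Bytes; the helper names are hypothetical:

```python
# Sketch: check which data blocks fit in one processing element's
# local memory. Assumes 8-Byte complex samples and 1 KByte = 1024 Bytes.

LOCAL_MEMORY_BYTES = 128 * 1024
COMPLEX_SAMPLE_BYTES = 8


def block_bytes(pulses: int, ranges: int) -> int:
    """Size of a pulses x ranges block of complex samples."""
    return pulses * ranges * COMPLEX_SAMPLE_BYTES


def fits_local_memory(pulses: int, ranges: int) -> bool:
    """True if the block fits without double buffering."""
    return block_bytes(pulses, ranges) <= LOCAL_MEMORY_BYTES


blocks = {
    "16K-point FFT (SAR)": (16_384, 1),
    "256 pulses x 64 ranges (GMTI, Doppler dimension)": (256, 64),
    "2 pulses x 8,000 ranges (GMTI, range dimension)": (2, 8_000),
}
for name, (pulses, ranges) in blocks.items():
    print(f"{name}: {block_bytes(pulses, ranges):,} Bytes, "
          f"fits = {fits_local_memory(pulses, ranges)}")
```

The first two blocks fill the local memory exactly, which is why the SAR case is noted as not double buffered.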


SAR Processing Flow/Global Memory Accessing (1 of 2)

Figure (processing flow; each stage extracts a block from global memory, processes it, and stores the result back):

1. Range Pulse Compression: 16,384 ranges for each pulse; store data from each pulse in global memory.

2. Cross-range FFT: extract pulse (cross-range) data, 16,384 pulses x 1 range x 8 Bytes = 131.1 KBytes; perform cross-range FFT; store processed data (131.1 KBytes). Loop 16,384 times = 16,384 ranges / 1 range per loop.

3. Polar Reformatting: extract 2D polar reformat data, 128 pulses x 128 ranges x 8 Bytes = 131.1 KBytes; perform polar reformatting; store processed data. Small overlap to process 126 x 126 patch. Loop 16,908 times = 268.44 x 10^6 cells / 15.876 x 10^3 cells/patch.

4. Range FFT: extract range data, 16,384 ranges x 1 pulse x 8 Bytes = 131.1 KBytes; perform range FFT; store processed data (transposed). Loop 16,384 times = 16,384 pulses / 1 pulse per loop.

5. Cross-range FFT: extract pulse (cross-range) data, 16,384 pulses x 1 range (131.1 KBytes); perform cross-range FFT; store processed data.

6. Auto Focus: extract pulse (cross-range) data, 16,384 pulses x 1 range (131.1 KBytes); perform Auto Focus; store processed data.

7. Magnitude Function: extract pulse (cross-range) data, 16,384 pulses x 1 range; perform Magnitude Function; store processed data (65.6 KBytes).

Loop 1024 times = 32,768 ranges / 32 ranges per loop

All of this, from cross-range FFT through Magnituding, can be done with the data in place, i.e., no need to extract and restore data until all of the processing is complete. (I set it up that way because the 2nd half of the 2D FFT, Auto Focus, and Magnituding appear to be performed only in the cross-range dimension.)
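The block sizes and loop counts for this flow can be reproduced with a few lines of arithmetic. A hedged Python sketch, assuming a 16,384 x 16,384 image, 8 Bytes per complex sample, and decimal KBytes (1 KByte = 1000 Bytes, which matches the 131.1 KByte figure); all names are illustrative:

```python
# Sketch: block sizes and loop counts for the 16K x 16K SAR flow.
# Assumes 8-Byte complex samples and decimal KBytes (1000 Bytes).
import math

IMAGE_SIDE = 16_384
COMPLEX_BYTES = 8


def block_kbytes(pulses: int, ranges: int) -> float:
    """Size in KBytes of one extracted pulses x ranges block."""
    return pulses * ranges * COMPLEX_BYTES / 1000


def line_loops(total_lines: int, lines_per_loop: int) -> int:
    """Global-memory accesses needed to cover every range or pulse line."""
    return math.ceil(total_lines / lines_per_loop)


def patch_loop_estimate(image_side: int, usable_patch_side: int) -> float:
    """Image cells divided by usable cells per patch (overlap excluded)."""
    return image_side ** 2 / usable_patch_side ** 2


print(f"FFT line block:   {block_kbytes(IMAGE_SIDE, 1):.1f} KBytes")
print(f"128 x 128 patch:  {block_kbytes(128, 128):.1f} KBytes")
print(f"FFT loops:        {line_loops(IMAGE_SIDE, 1):,}")
print(f"Patch loops:      {patch_loop_estimate(IMAGE_SIDE, 126):,.1f}")
```

The patch estimate is a plain ratio of cell counts, as on the chart; tiling with whole patches along each axis would need somewhat more loops.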


SAR Processing Flow/Global Memory Utilization

Figure (global memory map; 16,384 pulses per region):
- 2.2 GBytes complex data: storage of pulse compressed range data for further processing
- 1.1 GBytes of new data from current CPI collection
- Storage of cross-range FFT processed data through polar reformatting
- Storage of 1st half of 2D FFT result, transposed (corner-turned), through Magnituding
- 2.2 GBytes complex data

Total global memory required for SAR (worst case) = 9.9 GBytes *

* This might be reduced to 7.8 GBytes if the 2D FFT can be done “in place.”

2.1 GBytes available for storing training samples for AWC (Adaptive Weight Computation) for ECCM

Figure (timeline along platform motion): CPI N-1, CPI N, CPI N+1

Notes:
1) For continuous map, the last half of the previous CPI can be used as the first half of the data for the next CPI.
2) Not exactly sure how this works with ECCM, collecting training samples, etc. Obviously, you do not want a big jammer to mess up the formation of the SAR image. How to null a big jammer out without affecting the image is a major consideration.
3) For onboard DTED processing, I assume the global memory requirements would double because two (2) beams would be formed, an upper beam and a lower beam with slightly different look-down angles that could be used to form the elevation differential ISAR image.
4) Historically, SAR processing hasn’t required the arithmetic precision of GMTI, e.g., 4-bit A/D converters and 8-16 bit data representation in the processing chain. The memory requirement is a function of the arithmetic precision.
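Note 4 can be made concrete: the size of one full-image buffer scales linearly with the bytes used per sample. A minimal Python sketch, assuming decimal GBytes; the function name and the 2-Byte reduced-precision case are illustrative assumptions:

```python
# Sketch: size of one full-image buffer in global memory as a function
# of arithmetic precision. Assumes decimal GBytes (1e9 Bytes).


def image_buffer_gbytes(num_ranges: int, num_pulses: int,
                        bytes_per_sample: int = 8) -> float:
    """GBytes needed to hold one num_ranges x num_pulses data set."""
    return num_ranges * num_pulses * bytes_per_sample / 1e9


full_16k = image_buffer_gbytes(16_384, 16_384)
full_32k = image_buffer_gbytes(32_768, 32_768)
print(f"16K x 16K, 8 Bytes/sample: {full_16k:.2f} GBytes")
print(f"32K x 32K, 8 Bytes/sample: {full_32k:.2f} GBytes")

# Hypothetical reduced precision: 8-bit real + 8-bit imaginary = 2 Bytes.
reduced_16k = image_buffer_gbytes(16_384, 16_384, bytes_per_sample=2)
print(f"16K x 16K, 2 Bytes/sample: {reduced_16k:.2f} GBytes")
```

With 8-Byte complex samples the 16K x 16K buffer comes to about 2.15 GBytes, close to the 2.2 GByte region size on the chart; the half-CPI of new data is half of that.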


Possible SAR Partitioning/Mapping/Utilization (32K x 32K Image)


SAR Processing - Assumptions

• Assumed SAR Parameters
  • 16 channels (12 subarrays and 4 auxiliary channels)
  • 400 - 650 Msamples/sec (2.5 - 1.5 nsec per sample)
  • 500 µsec receive window (200,000 - 333,000 range cells per pulse)
  • 1 kHz PRF (1 msec PRI)
  • 32,768 pulses in azimuth (32.8 sec collection time)
  • The last 1/2 of the collected samples are used as the first 1/2 of the samples for the image (processed only through range pulse compression and stored)
  • 1 beam formed for SAR image; input 32,768 ranges x 32,768 pulses
  • 8 bits (1 Byte) per A/D sample
  • 64 bits (8 Bytes) for internal data storage of complex data (4 Bytes real, 4 Bytes imaginary)
  • 32 bits (4 Bytes) for internal data storage of real data
  • 4/3 oversampling out of the Polyphase Channelizer
  • 128 subbands formed; only 40 processed

• Assumed Processing Resources
  • 32 processing nodes per board
  • 64 GFLOPS peak throughput per board
  • 48 GFLOPS sustained throughput per board (assumed @ 75% execution efficiency)
  • 32 MBytes local data memory per board
    - 8 MBytes local data memory per processing cluster per board
      -- 2 MBytes local data memory per processing node
  • 256 KBytes local memory per processor
  • 32 GBytes Global Memory (worst case SAR requirement)



Figure (SAR processing flow/global memory accessing for the 32K x 32K image; each stage extracts a block from global memory, processes it, and stores the result back):

1. Range Pulse Compression: 32,768 ranges; store data from each pulse in global memory.

2. Cross-range FFT: extract pulse (cross-range) data, 32,768 pulses x 32 ranges x 8 Bytes = 8.4 MBytes; perform cross-range FFT; store processed data (8.4 MBytes).

3. Polar Reformatting: extract 2D polar reformat data, 1012 pulses x 1012 ranges x 8 Bytes = 8.2 MBytes; perform polar reformatting; store processed data (8.2 MBytes). Small overlap to process 1010 x 1010 patch. Loop 1060 times = 1073.74 x 10^6 cells / 1.012 x 10^6 cells/patch.

4. Range FFT: extract range data, 32,768 ranges x 32 pulses (8.4 MBytes); perform range FFT; store processed data (transposed).

Note: This sizing assumes that the extracted data fills the available 8 MBytes of external memory available on the Compute Cluster. It might be safer to assume only twenty-eight (28) ranges per extraction ==> more loops, but the bandwidth is about the same because the same amount of data has to be extracted and re-stored. The same thing is true for the patch size, except there is a little more bandwidth required because of the overlap needed to do the interpolation. This gets a little tricky because of the assumed in-place calculation.

5. Cross-range FFT: extract pulse (cross-range) data, 32,768 pulses x 32 ranges (8.4 MBytes); perform cross-range FFT; store processed data.

6. Auto Focus: extract pulse (cross-range) data, 32,768 pulses x 32 ranges (8.4 MBytes); perform Auto Focus; store processed data.

7. Magnitude Function: extract pulse (cross-range) data, 32,768 pulses x 32 ranges; perform Magnitude Function; store processed data (4.2 MBytes).

All of this, from cross-range FFT through Magnituding, can be done with the data in place, i.e., no need to extract and restore data until all of the processing is complete. (I set it up that way because the 2nd half of the 2D FFT, Auto Focus, and Magnituding appear to be performed only in the cross-range dimension.)

Note: This sizing assumes that the extracted data fills the available 2 MBytes of external memory available on the Compute Cluster. It might be safer to assume only twenty-eight (28) ranges per extraction ==> more loops, but the bandwidth is about the same because the same amount of data has to be extracted and re-stored.


SAR Processing Flow/Global Memory Utilization

Figure (global memory map; 32,768 pulses per region):
- Storage of pulse compressed range data for further processing
- 4.3 GBytes of new data from current CPI collection
- Storage of cross-range FFT processed data through polar reformatting
- Storage of 1st half of 2D FFT result, transposed (corner-turned), through Magnituding
- 8.6 GBytes complex data

Total global memory required for SAR (worst case) = 30.1 GBytes *

* This might be reduced to 21.4 GBytes if the 2D FFT can be done “in place.”

2.9 GBytes available for storing training samples for AWC (Adaptive Weight Computation) for ECCM

Figure (timeline along platform motion): CPI N-1, CPI N, CPI N+1

Notes:
1) For continuous map, the last half of the previous CPI can be used as the first half of the data for the next CPI.
2) I am not sure how this works with ECCM, collecting training samples, etc. Obviously, you wouldn’t want a big jammer to mess up the formation of the SAR image, but I am not sure how you null a big jammer out without affecting the image.
3) For onboard DTED processing, I assume the global memory requirements would double because two (2) beams would be formed, an upper beam and a lower beam with slightly different look-down angles that could be used to form the elevation differential ISAR image.
4) Historically, SAR processing hasn’t required the arithmetic precision of GMTI, e.g., 4-bit A/D converters and 8-16 bit data representation in the processing chain. The memory requirement is a function of the arithmetic precision.


SAR Processing Flow/Board-Level Partitioning/Utilization

Figure (three processing stages of four boards each, connected through global memory):
- Polyphase Channelizer: Boards #1 - #4
- ECCM, AWC, & PP Combination: Boards #1 - #4
- Range PC, Freq. Conv, Polar Reform, 2D FFT, Auto Focus, & Mag.: Boards #1 - #4
- Global Memory: Max. Memory = 32 GBytes; Used Memory = 30.1 GBytes
- Training Samples & Adaptive Weights
- Output

Stage throughput labels:
- Max. Sustained T-put per Stage = 192 GOPS; Used T-put per Stage = 144 GOPS
- Max. Sustained T-put per Stage = 192 GOPS; Used T-put per Stage = 96 GOPS

Board functions:
- Each PPC Channelizer Board processes 4 radar channels with 288,000 range cells per pulse; generates 128 subbands; 40 subbands are output to the Global Memory for processing in the ECCM nodes.
- Each ECCM/AWC/PP Combo Board processes 16 radar channels with 750 range cells per pulse and 40 subbands per channel; forms 1 beam; combines 40 subbands; each ECCM node outputs 8192 ranges to each of the next stage receiving nodes.
- Each Range PC - Mag node performs range compression on a pulse by pulse basis and sends processed data to global memory; frequency conversion, polar reformatting, 2D FFT, Auto Focus, and Magnituding are performed working out of global memory, restoring processed data in global memory on a function by function basis.

Data rates:
- 1.15 GBytes/sec input to each PPC board
- 16 ch x 256 pulses x 288,000 ranges x 4 Bytes every CPI (0.256 sec) = 18.4 GBytes/sec
- 4 ch x 40 subbands x 1 pulse x 750 ranges x 8 Bytes output from each PPC board and input to each ECCM board every PRI (1 msec) = 0.96 GBytes/sec; Aggregate BW = 15.4 GBytes/sec
- 1 beam x 8192 ranges x 8 Bytes input to each Range PC board from each ECCM board every PRI = 65.5 MBytes/sec; Aggregate BW = 1.05 GBytes/sec
- 524.8 MBytes/sec between each board and global memory; Aggregate BW = 2.1 GBytes/sec


SAR Processing Flow/Node-Level Partitioning/Utilization

Figure (Compute Board #N connected to Global Memory): four Compute Clusters (#1 - #4), each with an 8 MByte Local Data Memory and four CNAs (CNA #1 - #4).

Possible Partitioning/Utilization for Range Pulse Compression, Frequency Conversion in Range, 2D FFT, Auto Focus, and Magnituding:

Ranges or Cross-Ranges M+1 to M+32 => Compute Cluster #1
  Ranges or Cross-Ranges M+1 to M+8 => CNA #1
  Ranges or Cross-Ranges M+9 to M+16 => CNA #2
  Ranges or Cross-Ranges M+17 to M+24 => CNA #3
  Ranges or Cross-Ranges M+25 to M+32 => CNA #4
Ranges or Cross-Ranges M+33 to M+64 => Compute Cluster #2
  Ranges or Cross-Ranges M+33 to M+40 => CNA #1
  Ranges or Cross-Ranges M+41 to M+48 => CNA #2
  Ranges or Cross-Ranges M+49 to M+56 => CNA #3
  Ranges or Cross-Ranges M+57 to M+64 => CNA #4
Ranges or Cross-Ranges M+65 to M+96 => Compute Cluster #3
  Ranges or Cross-Ranges M+65 to M+72 => CNA #1
  Ranges or Cross-Ranges M+73 to M+80 => CNA #2
  Ranges or Cross-Ranges M+81 to M+88 => CNA #3
  Ranges or Cross-Ranges M+89 to M+96 => CNA #4
Ranges or Cross-Ranges M+97 to M+128 => Compute Cluster #4
  Ranges or Cross-Ranges M+97 to M+104 => CNA #1
  Ranges or Cross-Ranges M+105 to M+112 => CNA #2
  Ranges or Cross-Ranges M+113 to M+120 => CNA #3
  Ranges or Cross-Ranges M+121 to M+128 => CNA #4
for 1 < M < 128

Similarly across all four (4) Compute boards with the associated change in indexing. (Polar Reformatting is similar except data is in Rng-XRng patches.)
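The assignment above is a regular block mapping: 8 consecutive ranges (or cross-ranges) per CNA, 4 CNAs per Compute Cluster, 4 clusters per board. A short Python sketch of that mapping for one board, where `offset` is the position 1..128 within the board's block (i.e., the number added to M); the function name and the bounds check are assumptions made for illustration:

```python
# Sketch: map an offset (1..128) within one board's block of ranges or
# cross-ranges to its Compute Cluster and CNA. Names are illustrative.
from typing import Tuple

RANGES_PER_CNA = 8
CNAS_PER_CLUSTER = 4
CLUSTERS_PER_BOARD = 4
RANGES_PER_BOARD = RANGES_PER_CNA * CNAS_PER_CLUSTER * CLUSTERS_PER_BOARD


def assign(offset: int) -> Tuple[int, int]:
    """Return (cluster, cna), both 1-based, for offset in 1..128."""
    if not 1 <= offset <= RANGES_PER_BOARD:
        raise ValueError(f"offset must be in 1..{RANGES_PER_BOARD}")
    index = offset - 1
    cluster = index // (RANGES_PER_CNA * CNAS_PER_CLUSTER) + 1
    cna = (index // RANGES_PER_CNA) % CNAS_PER_CLUSTER + 1
    return cluster, cna


for offset in (1, 32, 33, 97, 128):
    cluster, cna = assign(offset)
    print(f"M+{offset}: Compute Cluster #{cluster}, CNA #{cna}")
```

Each CNA then holds 8 lines of 32,768 complex samples, i.e. 2 MBytes at 8 Bytes per sample.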

Input to Board from ECCM nodes = 65.6 MBytes/sec/ECCM board x 4 ECCM boards = 262 MBytes/sec => perform range compression

Range Compression Output to Global Memory = 262 MBytes/sec

Post-Range Compression Processing out of global memory:

Input to Board: 33.6 MBytes per fetch x 128 fetches per function per board every 65.5 sec = 4.3 GBytes/65.5 sec = 65.6 MBytes/sec per function per board.

Output from Board: 33.6 MBytes per store x 128 stores per function per board every 65.5 sec = 4.3 GBytes/65.5 sec = 65.6 MBytes/sec per function per board.

Aggregate Bandwidth per board = 131.2 MBytes/sec x 4 functions = 524.8 MBytes/sec
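The global-memory bandwidth budget can be worked through numerically. A minimal Python sketch using the chart's figures as inputs (33.6 MBytes per fetch, 128 fetches, 65.5 sec, 4 functions, 4 boards); the function names are hypothetical, and the one-way rate is passed in as the chart's rounded 65.6 MBytes/sec:

```python
# Sketch: global-memory bandwidth per board for post-range-compression
# processing. Inputs are the chart's figures; names are illustrative.


def one_way_rate_mbytes(fetch_mbytes: float, fetches: int,
                        period_s: float) -> float:
    """Average fetch (or store) rate for one function on one board."""
    return fetch_mbytes * fetches / period_s


def aggregate_per_board_mbytes(one_way_mbytes: float, functions: int) -> float:
    """Fetch plus store traffic, summed over all functions on a board."""
    return 2 * one_way_mbytes * functions


computed = one_way_rate_mbytes(33.6, 128, 65.5)
per_board = aggregate_per_board_mbytes(65.6, functions=4)
all_boards_gbytes = per_board * 4 / 1000

print(f"One-way rate from fetch size: {computed:.1f} MBytes/sec")
print(f"Aggregate per board:          {per_board:.1f} MBytes/sec")
print(f"Aggregate over 4 boards:      {all_boards_gbytes:.1f} GBytes/sec")
```

With the rounded 65.6 MBytes/sec one-way rate, fetch plus store is 131.2 MBytes/sec per function, and four functions give 524.8 MBytes/sec per board, or about 2.1 GBytes/sec over four boards.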


Example GMTI Partitioning/Mapping/Utilization


GMTI Processing - Assumptions

• Assumed GMTI Parameters
  • 16 channels (12 subarrays and 4 auxiliary channels)
  • 400 - 650 Msamples/sec (2.5 - 1.5 nsec per sample)
  • 256 pulses per CPI
  • 1 kHz PRF (1 msec PRI)
  • 500 µsec receive window (200,000 - 333,000 range cells per pulse) into the Polyphase Channelizer
  • 4/3 oversampling out of the Polyphase Channelizer
  • 128 subbands formed; only 40 processed
  • 6 beams formed
  • 10 bits (2 Bytes) per A/D sample
  • 64 bits (8 Bytes) for internal data storage of complex data (4 Bytes real, 4 Bytes imaginary)
  • 32 bits (4 Bytes) for internal data storage of real data

• Assumed Processing Resources
  • 32 processing nodes per board
  • 64 GFLOPS peak throughput per board
  • 48 GFLOPS sustained throughput per board (assumed @ 75% execution efficiency)
  • 32 MBytes local data memory per board
    - 8 MBytes local data memory per processing cluster per board
      -- 2 MBytes local data memory per processing node
  • 256 KBytes local memory per processor
  • 32 GBytes Global Memory (worst case SAR requirement)


GMTI Processing Flow/Board-Level Partitioning/Utilization #1


Figure (four processing stages connected through global memory):
- PPC Channelizer: Boards #1 - #4
- ECCM/AWC/PP Combination: Boards #1 - #3
- Pulse Comp.: Boards #1 - #3
- Doppler, STAP, & CFAR: Board #1
- Global Memory: Max. Memory = 32 GBytes; Used Memory = 21 GBytes
- Output

Stage throughput labels:
- Max. Sustained T-put per Stage = 144 GOPS; Used T-put per Stage = 100.3 GOPS
- Max. Sustained T-put per Stage = 144 GOPS; Used T-put per Stage = 101 GOPS

Board functions:
- Each PPC Channelizer Board processes 4 radar channels with 288,000 range cells per pulse; generates 128 subbands; 40 subbands are output to the Global Memory for processing in the ECCM nodes.
- Each ECCM/AWC/PP Combo Board processes 16 radar channels with 1000 range cells per pulse and 40 subbands per channel; forms 6 beams; combines 40 subbands; each ECCM node outputs 30,000 ranges to global memory for Pulse Compression processing.
- Each Pulse Compression board outputs 2 beams with 256 pulses and 72,000 ranges; the Doppler board receives 6 beams with 256 pulses and 72,000 ranges and outputs 12 beams with 256 Doppler cells and 72,000 ranges (staggered); STAP outputs 3 beams with 256 Doppler cells and 72,000 ranges; CFAR outputs target detections.

Data rates:
- 16 ch x 256 pulses x 288,000 ranges x 4 Bytes every CPI (0.256 sec) = 18.4 GBytes/sec
- 4 ch x 40 subbands x 1 pulse x 3000 ranges x 8 Bytes output from each PPC board and input to global memory every PRI (1 msec) = 3.84 GBytes/sec; Aggregate BW = 15.4 GBytes/sec
- 16 ch x 256 pulses x 1000 ranges x 40 subbands x 8 Bytes input to each ECCM board from global memory every CPI = 5.1 GBytes/sec; Aggregate BW = 15.4 GBytes/sec
- 6 beams x 256 pulses x 30,000 ranges x 8 Bytes output to global memory from each ECCM board every CPI = 1.44 GBytes/sec; Aggregate BW = 4.32 GBytes/sec
- 2 beams x 256 pulses x 90,000 ranges x 8 Bytes input to each Pulse Compression board from global memory each CPI = 1.44 GBytes/sec; Aggregate BW = 4.32 GBytes/sec
- 2 beams x 256 pulses x 72,000 ranges x 8 Bytes output from each Pulse Compression board to global memory every CPI = 1.15 GBytes/sec; Aggregate BW = 3.46 GBytes/sec
- 6 beams x 256 pulses x 72,000 ranges x 8 Bytes input to Doppler board from global memory every CPI = 3.46 GBytes/sec
- Doppler Output @ 6.92 GBytes/sec
- STAP Input @ 6.92 GBytes/sec
- STAP Output @ 1.73 GBytes/sec
- CFAR Input @ 1.73 GBytes/sec
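Each per-CPI data rate on this chart is the product of beams (or channels), pulses, ranges, and bytes per sample, divided by the CPI duration. A small Python sketch of that calculation, assuming a 0.256 sec CPI (256 pulses at 1 msec PRI), 8-Byte complex samples, and decimal GBytes; the function name is illustrative:

```python
# Sketch: sustained data rate for a block transferred once per CPI.
# Assumes 0.256 sec CPI, 8-Byte complex samples, decimal GBytes.

CPI_S = 0.256
COMPLEX_BYTES = 8


def cpi_rate_gbytes(beams: int, pulses: int, ranges: int,
                    bytes_per_sample: int = COMPLEX_BYTES) -> float:
    """GBytes/sec needed to move beams x pulses x ranges every CPI."""
    return beams * pulses * ranges * bytes_per_sample / CPI_S / 1e9


eccm_out = cpi_rate_gbytes(6, 256, 30_000)   # per ECCM board
pc_out = cpi_rate_gbytes(2, 256, 72_000)     # per Pulse Compression board
doppler_in = cpi_rate_gbytes(6, 256, 72_000)

print(f"ECCM board output:       {eccm_out:.2f} GBytes/sec")
print(f"Pulse Comp board output: {pc_out:.2f} GBytes/sec")
print(f"Doppler board input:     {doppler_in:.2f} GBytes/sec")
print(f"ECCM aggregate (3 bds):  {3 * eccm_out:.2f} GBytes/sec")
```

Aggregate figures then follow by multiplying the per-board rate by the number of boards in the stage.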




Figure (alternative board-level partitioning in which the PPC boards output directly to the ECCM boards):
- PPC Channelizer: Boards #1 - #4
- ECCM/AWC/PP Combination: Boards #1 - #3
- Pulse Comp.: Boards #1 - #3
- Doppler, STAP, & CFAR: Board #1
- Global Memory
- Output

Stage throughput label: Max. Sustained T-put per Stage = 144 GOPS; Used T-put per Stage = 100.3 GOPS

Board functions:
- Each PPC Channelizer Board processes 4 radar channels with 288,000 range cells per pulse; generates 128 subbands; 40 subbands are output to the ECCM boards.
- Each ECCM/AWC/PP Combo Board processes 16 radar channels with 1000 range cells per pulse and 40 subbands per channel; forms 6 beams; combines 40 subbands; each ECCM node outputs 30,000 ranges to global memory for Pulse Compression processing.
- Each Pulse Compression board outputs 2 beams with 256 pulses and 72,000 ranges; the Doppler board receives 6 beams with 256 pulses and 72,000 ranges and outputs 12 beams with 256 Doppler cells and 72,000 ranges (staggered); STAP outputs 3 beams with 256 Doppler cells and 72,000 ranges; CFAR outputs target detections.

Data rates:
- 16 ch x 256 pulses x 288,000 ranges x 4 Bytes every 0.256 seconds = 18.4 GBytes/sec
- 4 ch x 40 subbands x 1 pulse x 1000 ranges x 8 Bytes output from each PPC board to each ECCM board every PRI (1 msec) = 5.12 GBytes/sec; Aggregate BW = 15.4 GBytes/sec
- 6 beams x 256 pulses x 30,000 ranges x 8 Bytes output to global memory from each ECCM board every CPI (0.256 sec) = 1.44 GBytes/sec; Aggregate BW = 4.32 GBytes/sec
- 2 beams x 256 pulses x 90,000 ranges x 8 Bytes input from global memory to each Pulse Compression board every CPI (0.256 sec) = 1.44 GBytes/sec; Aggregate BW = 4.32 GBytes/sec
- 2 beams x 256 pulses x 72,000 ranges x 8 Bytes output to global memory from each Pulse Comp. board every CPI (0.256 sec) = 1.15 GBytes/sec; Aggregate BW = 3.46 GBytes/sec
- 6 beams x 256 pulses x 72,000 ranges x 8 Bytes input to Doppler board from global memory every CPI (0.256 sec) = 3.46 GBytes/sec
- Doppler Output @ 6.92 GBytes/sec
- STAP Input @ 6.92 GBytes/sec
- STAP Output @ 1.73 GBytes/sec





Figure (alternative board-level partitioning in which ECCM and Pulse Compression share boards):
- PPC Channelizer: Boards #1 - #4
- ECCM, AWC, & PPC Comb. & Pulse Comp.: Boards #1 - #5
- Doppler, STAP, & CFAR: Board #1
- Global Memory
- Output

Board functions:
- Each PPC Channelizer Board processes 4 radar channels with 288,000 range cells per pulse; generates 128 subbands; 40 subbands are output to the ECCM nodes.
- Each ECCM/AWC/PP Combo/Pulse Compression Board processes 16 radar channels with 90,000 range cells per pulse and 40 subbands per channel; forms 1 or 2 beams (only 1 forms 2 beams); combines 40 subbands; each ECCM node outputs 72,000 ranges to global memory for Doppler processing.
- Each ECCM/Pulse Compression board outputs 6 beams with 72,000 ranges per pulse; the Doppler board receives 6 beams with 256 pulses and 72,000 ranges and outputs 12 beams with 256 Doppler cells and 72,000 ranges (staggered); STAP outputs 3 beams with 256 Doppler cells and 72,000 ranges; CFAR outputs target detections.

Data rates:
- 16 ch x 256 pulses x 288,000 ranges x 4 Bytes every 0.256 seconds = 18.4 GBytes/sec
- 4 ch x 40 subbands x 1 pulse x 3000 ranges x 8 Bytes input to ECCM boards from each PPC board every PRI (1 msec) = 3.07 GBytes/sec; Aggregate BW = 15.4 GBytes/sec
- 6 beams x 72,000 ranges x 1 pulse x 8 Bytes output to global memory from the ECCM/Pulse Compression Boards every PRI (1 msec) = 3.46 GBytes/sec Aggregate BW
- 6 beams x 256 pulses x 72,000 ranges x 8 Bytes input to the Doppler/STAP/CFAR board from global memory every CPI = 3.46 GBytes/sec
- Doppler Output (12 beams x 72,000 ranges x 256 Dopplers) @ 6.92 GBytes/sec
- STAP Input (12 beams x 72,000 ranges x 256 Dopplers) @ 6.92 GBytes/sec
- STAP Output (3 beams x 72,000 ranges x 256 Dopplers) @ 1.73 GBytes/sec

Backup Charts


Hypothetical Real-Time Adaptive Space-Based Radar Design

Figure (block diagram). Blocks: Radar Event Scheduler/Time Line Generator; Real-Time Waveform Designer; Beam Steering Controller; Onboard Processing Configurator; S/C Attitude Determination & Control; S/C Guidance, Navigation & Control; Exciter/Transmitter; Electronically Steerable Antenna Subsystem; Receiver; Programmable Onboard Processing Subsystem; Communication Subsystem (Uplink/Downlink).

Inputs and signals: Search Requirements; Field-Requested Events; Track Revisit Requirements; Mode/Context Control; Mode Control; Health & Status; Processed Data/Target Reports.


Space-Based Radar

Advantages

- the ultimate “high ground”
  -- radar “horizon” greater than for airborne or ground radar
- 24-hour all weather capability (IR sensors can’t see through clouds, optical sensors blind in dark)
- less vulnerable/more survivable than airborne assets
- once launched, lower logistics costs than airborne assets (fuel, ground support, fighter protection, etc.)
- continuous world-wide coverage with full constellation of satellites
- High Range Resolution (HRR) with frequency jumped burst waveforms
- SAR (Synthetic Aperture Radar)
- IFSAR (Interferometric SAR)
- DTED (Digital Terrain Elevation Data)
- DAR (Distributed Aperture Radar)
- multi-mission
  -- GMTI (Joint STARS)
  -- SAR (Joint STARS)
  -- AMTI (AWACS [E-3A] & E-2C)
- GPIR (Ground Penetrating Imaging Radar)
- foliage penetration capability
- reduced downlink requirements with onboard processing in many cases

Disadvantages/Issues

- limited power generation and power dissipation capability in space
- limited aperture size (antenna dimensions)
- high altitude (R^4 losses)
- ionospheric effects
- steep look-down angle/Nadir Hole
- clutter Doppler/clutter Doppler spread
  -- function of satellite-target geometry (earth background) and platform velocity
- optimal waveform design for target detection performance and clutter cancellation
  -- frequency of operation (X, L, UHF/VHF)
  -- polarization (Tx and Rx)
  -- pulse width, PRF, CPI
  -- range resolution/bandwidth
  -- Doppler resolution
  -- range ambiguities
  -- Doppler ambiguities
- constellation
  -- altitude, inclination, number of satellites, phasing
- environmental
  -- radiation, micro-meteorites, etc.
- launch vehicle capability
- initial system cost


Considerations for Optimal Onboard Processing

Processor
- Processing Engine(s)
  -- Heterogeneous vs. Homogeneous
  -- GP vs. Accelerator
  -- Custom ASIC vs. FPGA vs. PIM (Processor In Memory)
  -- Clock Rate
  -- Algorithm/Architecture Coupling Efficiency
  -- Flexibility/Programmability vs. Performance
- Mode Switching
- Optimal Waveform Design
  -- Performance
- Rad Hard vs. Rad Tolerant
  -- Available Software Development Environment
- Number of Processing Engines
  -- Number of Modules
  -- Number of Processing Engines per Module
  -- Parallelization Efficiencies
- Interconnect Capacity/Capability
- Overlap Processing and Communication
- Bandwidth

Memory Storage
- Amount (size)
- Location
- Access Bandwidth
- Utilization
  -- Pipeline
  -- Double Buffering
  -- “Caching”/Data Blocking

Algorithm Implementation
- Analog vs. Digital vs. Optical
- Performance/Efficiency
- Optimum utilization of available arithmetic units
- Arithmetic Precision
  -- Fixed Point vs. Floating Point
  -- # of bits
  -- Single vs. Double
  -- Programmable
- Time Domain vs. Frequency Domain Pulse Compression
  -- Optimal Size of FFT for Frequency Domain Conv.
- Radix Size(s) for FFTs
- Power of 2 FFTs vs. odd-point FFTs
- Method for Adaptive Weight Computation
- Data Flow vs. Other Implementation Paradigms
  -- Medium/Coarse Grained vs. Fine Grained
- Partitioning/Parallelism
  -- Data Partitioning
  -- Algorithm Partitioning
  -- Round Robin vs. Distributed Parallel
- Phantom Functions, e.g., corner turns
- Order of Execution of Functions
- Grouping of Functions
- Combination of Functions
- Minimize Memory Required
- Minimize Communication Requirements
- Fault Tolerance
- Real Time Execution


GMTI/SAR Mode Switching for SBR

Figure (radar processing time line along the line of flight): GMTI CPIs, SAR CPI, GMTI CPIs, SAR CPI, GMTI CPIs; SAR beams and GMTI search & track beams; location of last detection of a suspected Stop & Go target; mode processing context switch points.

- Context switch from GMTI to Spotlight SAR to image the location of the last detection of a Stop-and-Go target which may have stopped
- Context switch from SAR back to normal search and track patterns

Figure (block diagram): Onboard Processor (GMTI Application, SAR Application, SAGE Capture); Tracker; Radar Event Scheduler; Radar Synchronizer/Controller; Beam Steering Controller. Data paths: Radar Data, GMTI Detections, SAR Images, Tracks.
