skau1 approach to scalable parallel processing for space-based radar


Upload: audra-mathews

Post on 04-Jan-2016


Page 1: Skau1 Approach to Scalable Parallel Processing For Space-Based Radar

Skau 1 Confidential and Proprietary

Approach to Scalable Parallel Processing for Space-Based Radar

Page 2: Skau1 Approach to Scalable Parallel Processing For Space-Based Radar

Example: SAR and GMTI Partitioning/Mapping/Utilization

Page 3: Skau1 Approach to Scalable Parallel Processing For Space-Based Radar

Basic SBR Signal Processing

[Figure: two block diagrams]

Basic SAR Processing Flow (blocks shown): Input, A/D, Jammer Nulling, Range Compression, Range Correction & Azimuth Processing, Azimuth Filtering, Image Formation & Post Processing, Output

Basic GMTI Processing Flow (blocks shown): Input, A/D, Jammer Nulling, Pulse Compression, Doppler Processing, STAP Clutter Suppression, CFAR, Output

Page 4: Skau1 Approach to Scalable Parallel Processing For Space-Based Radar

Example: SAR Processing Assumptions

• Assumed SAR Parameters
  • 16 channels (12 Subarrays and 4 Auxiliary Channels)
  • 400 - 650 Msamples/sec (2.5 - 1.5 nsec per sample)
  • 500 µsec receive window (200,000 - 333,000 range cells per pulse)
  • 1 kHz PRF (1 msec PRI)
  • 16,384 pulses in azimuth (16.4 sec collection time)
  • The last 1/2 of the collected samples are used as the first 1/2 of the samples for the image (processed only through range pulse compression and stored)
  • 1 beam formed for SAR image; input 16,384 ranges x 16,384 pulses
  • 8 bits (1 Byte) per A/D sample
  • 64 bits (8 Bytes) for internal data storage of complex data (4 Bytes real, 4 Bytes imaginary)
  • 32 bits (4 Bytes) for internal data storage of real data
  • 4/3 oversampling out of the Polyphase Channelizer
  • 128 Subbands formed; only 40 processed
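The range-cell and collection-time figures in these assumptions follow from simple arithmetic. The sketch below reproduces them, assuming the receive window is 500 microseconds (the value consistent with the quoted 200,000 - 333,000 range cells) and a 1 msec pulse repetition interval. The function and constant names are illustrative, not taken from the slides.

```python
# Sketch only: names are hypothetical; the 500-microsecond window is an
# assumption inferred from the quoted range-cell counts.
RECEIVE_WINDOW_S = 500e-6   # assumed receive window (seconds)
PRF_HZ = 1_000              # 1 msec pulse repetition interval


def range_cells_per_pulse(sample_rate_hz, window_s=RECEIVE_WINDOW_S):
    """Samples collected in one receive window at the given A/D rate."""
    return round(sample_rate_hz * window_s)


def collection_time_s(pulses, prf_hz=PRF_HZ):
    """Time needed to collect the given number of pulses."""
    return pulses / prf_hz


# 2.5 nsec per sample (400 Msamples/sec) and 1.5 nsec per sample
print(range_cells_per_pulse(400e6))        # about 200,000 cells
print(range_cells_per_pulse(1 / 1.5e-9))   # about 333,000 cells
print(collection_time_s(16_384))           # about 16.4 seconds
```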

Page 5: Skau1 Approach to Scalable Parallel Processing For Space-Based Radar

Example: SAR/GMTI Partitioning on BEP System

• Set local memory per processing element at 128 KBytes to be able to handle 16K FFTs for SAR mode (not double buffered)

• For the GMTI mode, the 128 KBytes of local memory can handle

- 256 pulses and 64 ranges per main memory access that can be processed in the pulse or Doppler dimension

or

- 8,000 ranges and 2 pulses worth of data per main memory access that can be processed in the range dimension

or

- any combination that maximizes throughput by “blocking,” i.e., effectively “caching”, and “striding” data for optimum performance

• Data can also be partitioned across beams, channels, and segments when they are independent variables relative to high level data flow processing
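The block shapes quoted for the 128 KByte local memory can be checked with a few lines of arithmetic. This is a minimal sketch assuming 8 Bytes per complex sample and no double buffering; the helper names are hypothetical.

```python
# Sketch only: helper names are illustrative; sizes come from the slide.
LOCAL_MEMORY_BYTES = 128 * 1024   # 128 KBytes per processing element
BYTES_PER_COMPLEX_SAMPLE = 8      # 4 Bytes real + 4 Bytes imaginary


def block_bytes(pulses, ranges):
    """Size of a pulses-by-ranges block of complex samples."""
    return pulses * ranges * BYTES_PER_COMPLEX_SAMPLE


def fits_in_local_memory(pulses, ranges):
    """True if the block fits in local memory without double buffering."""
    return block_bytes(pulses, ranges) <= LOCAL_MEMORY_BYTES


def max_ranges_per_block(pulses):
    """Largest number of range cells that fits for a given pulse count."""
    return LOCAL_MEMORY_BYTES // (pulses * BYTES_PER_COMPLEX_SAMPLE)


print(fits_in_local_memory(1, 16_384))   # one 16K-point FFT (SAR mode)
print(fits_in_local_memory(256, 64))     # Doppler-dimension block (GMTI)
print(fits_in_local_memory(2, 8_000))    # range-dimension block (GMTI)
print(max_ranges_per_block(256))         # 64 ranges for 256 pulses
```

Any other pulses-by-ranges combination can be tested the same way when choosing a blocking and striding pattern.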

Page 6: Skau1 Approach to Scalable Parallel Processing For Space-Based Radar

SAR Processing Flow/Global Memory Accessing (1 of 2)

• Range Pulse Compression
- 16,384 ranges for each pulse
- Store data from each pulse in global memory (16,384 ranges for each pulse)

• Cross-Range FFT
- Extract pulse (cross-range) data: 16,384 pulses x 1 range x 8 Bytes = 131.1 KBytes
- Perform cross-range FFT
- Store processed data: 131.1 KBytes
- Loop 16,384 times = 16,384 ranges / 1 range per loop

• Polar Reformatting
- Extract 2D Polar Reformat data: 128 pulses x 128 ranges x 8 Bytes = 131.1 KBytes
- Perform Polar Reformatting (small overlap to process a 126 x 126 patch)
- Store processed data
- Loop 16,908 times = 268.44 x 10^6 cells / 15.876 x 10^3 cells per patch

• Range FFT
- Extract range data: 16,384 ranges x 1 pulse x 8 Bytes = 131.1 KBytes
- Perform range FFT
- Store processed data (transposed): 131.1 KBytes
- Loop 16,384 times = 16,384 pulses / 1 pulse per loop
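The loop counts on this slide can be reproduced with a short calculation. The sketch below assumes a 16,384 x 16,384 image and a 126 x 126 usable patch inside each 128 x 128 extraction; the function names are illustrative. It also shows a second, hypothetical way of counting patches (whole tiles per dimension), which gives a slightly larger number than the area ratio used on the slide.

```python
import math

# Sketch only: function names are hypothetical, sizes come from the slide.
IMAGE_SIDE = 16_384     # ranges and pulses in the SAR input
USABLE_PATCH = 126      # usable patch side after the interpolation overlap


def loops_per_pass(total_lines, lines_per_loop):
    """Number of extract/process/store iterations for one 1D pass."""
    return math.ceil(total_lines / lines_per_loop)


def patch_count_by_area(side, usable):
    """Patch count as a ratio of areas, as written on the slide."""
    return (side * side) / (usable * usable)


def patch_count_by_tiling(side, usable):
    """Patch count if whole tiles must cover each dimension (assumption)."""
    return math.ceil(side / usable) ** 2


print(loops_per_pass(IMAGE_SIDE, 1))                           # 16384
print(round(patch_count_by_area(IMAGE_SIDE, USABLE_PATCH)))    # 16908
print(patch_count_by_tiling(IMAGE_SIDE, USABLE_PATCH))         # 17161
```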

Page 7: Skau1 Approach to Scalable Parallel Processing For Space-Based Radar

SAR Processing Flow/Global Memory Accessing (2 of 2)

• Cross-Range FFT (2nd half of the 2D FFT)
- Extract pulse (cross-range) data: 16,384 pulses x 1 range x 8 Bytes = 131.1 KBytes
- Perform cross-range FFT
- Store processed data: 131.1 KBytes
- Loop 16,384 times = 16,384 ranges / 1 range per loop

• Auto Focus
- Extract pulse (cross-range) data: 16,384 pulses x 1 range x 8 Bytes = 131.1 KBytes
- Perform Auto Focus
- Store processed data: 131.1 KBytes
- Loop 16,384 times = 16,384 ranges / 1 range per loop

• Magnitude Function
- Extract pulse (cross-range) data: 16,384 pulses x 1 range x 8 Bytes = 131.1 KBytes
- Perform Magnitude Function
- Store processed data: 65.6 KBytes
- Loop 16,384 times = 16,384 ranges / 1 range per loop

All of this, from cross-range FFT through Magnituding, can be done with the data in place, i.e., no need to extract and restore data until all of the processing is complete. (I set it up that way because the 2nd half of the 2D FFT, Auto Focus, and Magnituding appear to be performed only in the cross-range dimension.)

Page 8: Skau1 Approach to Scalable Parallel Processing For Space-Based Radar

SAR Processing Flow/Global Memory Utilization

• 16,384 pulses
• Storage of pulse compressed range data for further processing: 2.2 GBytes complex data
• 1.1 GBytes of new data from current CPI collection
• Storage of cross-range FFT processed data through polar reformatting: 2.2 GBytes complex data
• Storage of 1st half of 2D FFT result, transposed (corner-turned), through Magnituding: 2.2 GBytes complex data

Total global memory required for SAR (worst case) = 9.9 GBytes *

* This might be able to be reduced to 7.8 GBytes if 2D FFT can be done "in place." **

2.1 GBytes available for storing training samples for AWC (Adaptive Weight Computation) for ECCM

[Figure: CPI N-1, CPI N, and CPI N+1 laid out along the direction of platform motion]

Notes:
1) For continuous map, the last half of the previous CPI can be used as the first half of the data for the next CPI.
2) Not exactly sure how this works with ECCM, collecting training samples, etc. Obviously, you do not want a big jammer to mess up the formation of the SAR image. How to null a big jammer out without affecting the image is a major consideration.
3) For onboard DTED processing, I assume the global memory requirements would double because two (2) beams would be formed, an upper beam and a lower beam with slightly different look-down angles that could be used to form the elevation differential ISAR image.
4) Historically, SAR processing hasn't required the arithmetic precision of GMTI, e.g., 4-bit A/D converters and 8-16 bit data representation in the processing chain. The memory requirement is a function of the arithmetic precision.
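The per-buffer sizes behind the global memory figures can be estimated directly from the image dimensions and the sample width. The sketch below is a rough check under the stated assumption of 8 Bytes per complex sample; the function names are hypothetical. A 16,384 x 16,384 buffer comes to about 2.15 GBytes (quoted as 2.2 GBytes on the slide), and half of one buffer is about 1.1 GBytes.

```python
# Sketch only: function names are illustrative, not from the slides.
def image_buffer_bytes(ranges, pulses, bytes_per_sample=8):
    """Bytes needed to hold one ranges-by-pulses image buffer."""
    return ranges * pulses * bytes_per_sample


def gbytes(num_bytes):
    """Convert bytes to decimal GBytes."""
    return num_bytes / 1e9


full_16k = image_buffer_bytes(16_384, 16_384)
print(f"16K x 16K complex buffer: {gbytes(full_16k):.2f} GBytes")
print(f"half buffer (new CPI data): {gbytes(full_16k // 2):.2f} GBytes")

# Note 4: memory scales with arithmetic precision. With a hypothetical
# 2 Bytes per complex sample, the same buffer is four times smaller.
print(f"at 2 Bytes per sample: {gbytes(image_buffer_bytes(16_384, 16_384, 2)):.2f} GBytes")
```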

Page 9: Skau1 Approach to Scalable Parallel Processing For Space-Based Radar

Possible SAR Partitioning/Mapping/Utilization (32K x 32K Image)

Page 10: Skau1 Approach to Scalable Parallel Processing For Space-Based Radar

SAR Processing - Assumptions

• Assumed SAR Parameters
  • 16 channels (12 Subarrays and 4 Auxiliary Channels)
  • 400 - 650 Msamples/sec (2.5 - 1.5 nsec per sample)
  • 500 µsec receive window (200,000 - 333,000 range cells per pulse)
  • 1 kHz PRF (1 msec PRI)
  • 32,768 pulses in azimuth (32.8 sec collection time)
  • The last 1/2 of the collected samples are used as the first 1/2 of the samples for the image (processed only through range pulse compression and stored)
  • 1 beam formed for SAR image; input 32,768 ranges x 32,768 pulses
  • 8 bits (1 Byte) per A/D sample
  • 64 bits (8 Bytes) for internal data storage of complex data (4 Bytes real, 4 Bytes imaginary)
  • 32 bits (4 Bytes) for internal data storage of real data
  • 4/3 oversampling out of the Polyphase Channelizer
  • 128 Subbands formed; only 40 processed

• Assumed Processing Resources
  • 32 processing nodes per board
  • 64 GFLOPS peak throughput per board
  • 48 GFLOPS sustained throughput per board (assumed @ 75% execution efficiency)
  • 32 MBytes local data memory per board
    - 8 MBytes local data memory per processing cluster per board
      -- 2 MBytes local data memory per processing node
  • 256 KBytes local memory per processor
  • 32 GBytes Global Memory (worst case SAR requirement)

Page 11: Skau1 Approach to Scalable Parallel Processing For Space-Based Radar

SAR Processing Flow/Global Memory Accessing (1 of 2)

• Range Pulse Compression
- 32,768 ranges for each pulse
- Store data from each pulse in global memory (32,768 ranges for each pulse)

• Cross-Range FFT
- Extract pulse (cross-range) data: 32,768 pulses x 32 ranges x 8 Bytes = 8.4 MBytes
- Perform cross-range FFT
- Store processed data: 8.4 MBytes
- Loop 1024 times = 32,768 ranges / 32 ranges per loop

• Polar Reformatting
- Extract 2D Polar Reformat data: 1012 pulses x 1012 ranges x 8 Bytes = 8.2 MBytes
- Perform Polar Reformatting (small overlap to process a 1010 x 1010 patch)
- Store processed data: 8.2 MBytes
- Loop 1053 times = 1073.74 x 10^6 cells / 1.0201 x 10^6 cells per patch

• Range FFT
- Extract range data: 32,768 ranges x 32 pulses x 8 Bytes = 8.4 MBytes
- Perform range FFT
- Store processed data (transposed): 8.4 MBytes
- Loop 1024 times = 32,768 pulses / 32 pulses per loop

Note: This sizing assumes that the extracted data fills the available 8 MBytes of external memory available on the Compute Cluster. It might be safer to assume only twenty-eight (28) ranges per extraction ==> more loops, but the bandwidth is about the same because the same amount of data has to be extracted and re-stored. The same thing is true for the patch size, except there is a little more bandwidth required because of the overlap needed to do the interpolation. This gets a little tricky because of the assumed in-place calculation.
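The trade-off described in this note (fewer ranges per extraction means more loops, but the same total data movement) can be illustrated with a small calculation. This is a sketch under the slide's assumptions of a 32,768 x 32,768 image and 8 Bytes per complex sample; the function name is hypothetical.

```python
import math

# Sketch only: the function name and default arguments are illustrative.
def extraction_plan(ranges_per_loop, total_ranges=32_768, pulses=32_768,
                    bytes_per_sample=8):
    """Return (loops, bytes moved per loop, total bytes moved one way)."""
    loops = math.ceil(total_ranges / ranges_per_loop)
    bytes_per_loop = pulses * ranges_per_loop * bytes_per_sample
    total_bytes = pulses * total_ranges * bytes_per_sample
    return loops, bytes_per_loop, total_bytes


for n in (32, 28):
    loops, per_loop, total = extraction_plan(n)
    print(f"{n} ranges/loop: {loops} loops, "
          f"{per_loop / 1e6:.1f} MBytes per extraction, "
          f"{total / 1e9:.2f} GBytes total")
```

With 28 ranges per extraction each fetch drops to about 7.3 MBytes, leaving headroom in an 8 MByte cluster memory, while the total volume read from global memory is unchanged.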

Page 12: Skau1 Approach to Scalable Parallel Processing For Space-Based Radar

SAR Processing Flow/Global Memory Accessing (2 of 2)

• Cross-Range FFT (2nd half of the 2D FFT)
- Extract pulse (cross-range) data: 32,768 pulses x 32 ranges x 8 Bytes = 8.4 MBytes
- Perform cross-range FFT
- Store processed data: 8.4 MBytes
- Loop 1024 times = 32,768 ranges / 32 ranges per loop

• Auto Focus
- Extract pulse (cross-range) data: 32,768 pulses x 32 ranges x 8 Bytes = 8.4 MBytes
- Perform Auto Focus
- Store processed data: 8.4 MBytes
- Loop 1024 times = 32,768 ranges / 32 ranges per loop

• Magnitude Function
- Extract pulse (cross-range) data: 32,768 pulses x 32 ranges x 8 Bytes = 8.4 MBytes
- Perform Magnitude Function
- Store processed data: 4.2 MBytes
- Loop 1024 times = 32,768 ranges / 32 ranges per loop

All of this, from cross-range FFT through Magnituding, can be done with the data in place, i.e., no need to extract and restore data until all of the processing is complete. (I set it up that way because the 2nd half of the 2D FFT, Auto Focus, and Magnituding appear to be performed only in the cross-range dimension.)

Note: This sizing assumes that the extracted data fills the 8 MBytes of external memory available on the Compute Cluster (2 MBytes per compute node). It might be safer to assume only twenty-eight (28) ranges per extraction ==> more loops, but the bandwidth is about the same because the same amount of data has to be extracted and re-stored.

Page 13: Skau1 Approach to Scalable Parallel Processing For Space-Based Radar

SAR Processing Flow/Global Memory Utilization

• 32,768 pulses
• Storage of pulse compressed range data for further processing: 8.6 GBytes complex data
• 4.3 GBytes of new data from current CPI collection
• Storage of cross-range FFT processed data through polar reformatting: 8.6 GBytes complex data
• Storage of 1st half of 2D FFT result, transposed (corner-turned), through Magnituding: 8.6 GBytes complex data

Total global memory required for SAR (worst case) = 30.1 GBytes *

* This might be able to be reduced to 21.4 GBytes if 2D FFT can be done "in place." **

2.9 GBytes available for storing training samples for AWC (Adaptive Weight Computation) for ECCM

[Figure: CPI N-1, CPI N, and CPI N+1 laid out along the direction of platform motion]

Notes:
1) For continuous map, the last half of the previous CPI can be used as the first half of the data for the next CPI.
2) I am not sure how this works with ECCM, collecting training samples, etc. Obviously, you wouldn't want a big jammer to mess up the formation of the SAR image, but I am not sure how you null a big jammer out without affecting the image.
3) For onboard DTED processing, I assume the global memory requirements would double because two (2) beams would be formed, an upper beam and a lower beam with slightly different look-down angles that could be used to form the elevation differential ISAR image.
4) Historically, SAR processing hasn't required the arithmetic precision of GMTI, e.g., 4-bit A/D converters and 8-16 bit data representation in the processing chain. The memory requirement is a function of the arithmetic precision.

Page 14: Skau1 Approach to Scalable Parallel Processing For Space-Based Radar

SAR Processing Flow/Board-Level Partitioning/Utilization

[Figure: board-level block diagram. Boards and labels shown:]

• Polyphase Channelizer Boards #1 - #4
• ECCM, AWC, & PP Combination Boards #1 - #4
• Range PC, Freq. Conv., Polar Reform., 2D FFT, Auto Focus, & Mag. Boards #1 - #4

Max. Sustained T-put per Stage = 192 GOPS; Used T-put per Stage = 144 GOPS

Max. Sustained T-put per Stage = 192 GOPS; Used T-put per Stage = 96 GOPS

Each ECCM/AWC/PP Combo Board processes 16 radar channels with 750 range cells per pulse and 40 subbands per channel; forms 1 beam; combines 40 subbands; each ECCM node outputs 8192 ranges to each of the next stage receiving nodes

Each Range PC - Mag node performs range compression on a pulse-by-pulse basis and sends processed data to global memory; frequency conversion, polar reformatting, 2D FFT, Auto Focus, and Magnituding are performed working out of global memory, restoring processed data in global memory on a function-by-function basis

1.15 GBytes/sec input to each PPC board

1 beam x 8192 ranges x 8 Bytes input to each Range PC board from each ECCM board every PRI = 65.5 MBytes/sec; Aggregate BW = 1.05 GBytes/sec

Global Memory: Max. Memory = 32 GBytes; Used Memory = 30.1 GBytes

Output

524.8 MBytes/sec between each board and global memory; Aggregate BW = 2.1 GBytes/sec

Training Samples & Adaptive Weights

16 ch x 256 pulses x 288,000 ranges x 4 Bytes every CPI (0.256 sec) = 18.4 GBytes/sec

4 ch x 40 subbands x 1 pulse x 750 ranges x 8 Bytes output from each PPC board and input to each ECCM board every PRI (1 msec) = 0.96 GBytes/sec; Aggregate BW = 15.4 GBytes/sec

Each PPC Channelizer Board processes 4 radar channels with 288,000 range cells per pulse; generates 128 subbands; 40 subbands are output to the Global Memory for processing in the ECCM nodes
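The per-link bandwidths on this slide are each a byte count per pulse multiplied by the pulse rate. The sketch below recomputes three of them, assuming a 1 msec PRI and 1 Byte per SAR A/D sample. The function name is hypothetical, and the count of 16 board-to-board links (4 sending boards x 4 receiving boards) is an inference from the quoted aggregate figures rather than something stated on the slide.

```python
# Sketch only: names are illustrative; the 16-link count is an inference.
PRF_HZ = 1_000          # one pulse every 1 msec
LINKS_BETWEEN_STAGES = 4 * 4


def bytes_per_second(bytes_per_pri, prf_hz=PRF_HZ):
    """Sustained rate for a transfer that repeats every PRI."""
    return bytes_per_pri * prf_hz


# 4 channels x 288,000 range cells x 1 Byte per A/D sample
ppc_input = bytes_per_second(4 * 288_000 * 1)

# 4 channels x 40 subbands x 1 pulse x 750 ranges x 8 Bytes
ppc_to_eccm = bytes_per_second(4 * 40 * 1 * 750 * 8)

# 1 beam x 8192 ranges x 8 Bytes
eccm_to_range_pc = bytes_per_second(1 * 8192 * 8)

print(f"PPC board input:     {ppc_input / 1e9:.2f} GBytes/sec")
print(f"PPC -> ECCM link:    {ppc_to_eccm / 1e9:.2f} GBytes/sec")
print(f"ECCM -> Range PC:    {eccm_to_range_pc / 1e6:.1f} MBytes/sec")
print(f"ECCM -> Range PC aggregate: "
      f"{LINKS_BETWEEN_STAGES * eccm_to_range_pc / 1e9:.2f} GBytes/sec")
```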

Page 15: Skau1 Approach to Scalable Parallel Processing For Space-Based Radar

SAR Processing Flow/Node-Level Partitioning/Utilization

[Figure: Compute Board #N connected to Global Memory. The board contains Compute Clusters #1 - #4; each cluster has an 8 MByte Local Data Memory and four nodes labeled CNA #1 - CNA #4.]

Possible Partitioning/Utilization for Range Pulse Compression, Frequency Conversion in Range, 2D FFT, Auto Focus, and Magnituding:

• Ranges or Cross-Ranges M+1 to M+32 => Compute Cluster #1
- Ranges or Cross-Ranges M+1 to M+8 => CNA #1
- Ranges or Cross-Ranges M+9 to M+16 => CNA #2
- Ranges or Cross-Ranges M+17 to M+24 => CNA #3
- Ranges or Cross-Ranges M+25 to M+32 => CNA #4
• Ranges or Cross-Ranges M+33 to M+64 => Compute Cluster #2
- Ranges or Cross-Ranges M+33 to M+40 => CNA #1
- Ranges or Cross-Ranges M+41 to M+48 => CNA #2
- Ranges or Cross-Ranges M+49 to M+56 => CNA #3
- Ranges or Cross-Ranges M+57 to M+64 => CNA #4
• Ranges or Cross-Ranges M+65 to M+96 => Compute Cluster #3
- Ranges or Cross-Ranges M+65 to M+72 => CNA #1
- Ranges or Cross-Ranges M+73 to M+80 => CNA #2
- Ranges or Cross-Ranges M+81 to M+88 => CNA #3
- Ranges or Cross-Ranges M+89 to M+96 => CNA #4
• Ranges or Cross-Ranges M+97 to M+128 => Compute Cluster #4
- Ranges or Cross-Ranges M+97 to M+104 => CNA #1
- Ranges or Cross-Ranges M+105 to M+112 => CNA #2
- Ranges or Cross-Ranges M+113 to M+120 => CNA #3
- Ranges or Cross-Ranges M+121 to M+128 => CNA #4

for 1 < M < 128

Similarly across all four (4) Compute boards with the associated change in indexing. (Polar Reformatting is similar except data is in Rng-XRng patches.)
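The assignment of ranges (or cross-ranges) to clusters and nodes is a regular block mapping: 32 consecutive lines per Compute Cluster and 8 per CNA. A minimal sketch of that mapping follows, assuming offsets 1 through 128 relative to the base index M on one board; the function name is hypothetical.

```python
# Sketch only: the function name is illustrative, not from the slides.
LINES_PER_CLUSTER = 32
LINES_PER_CNA = 8
LINES_PER_BOARD = 128


def assign(offset):
    """Map offset k (for line M+k, 1 <= k <= 128) to (cluster, CNA)."""
    if not 1 <= offset <= LINES_PER_BOARD:
        raise ValueError("offset must be in 1..128")
    index = offset - 1
    cluster = index // LINES_PER_CLUSTER + 1
    cna = (index % LINES_PER_CLUSTER) // LINES_PER_CNA + 1
    return cluster, cna


for k in (1, 8, 9, 32, 33, 97, 128):
    cluster, cna = assign(k)
    print(f"M+{k} => Compute Cluster #{cluster}, CNA #{cna}")
```

Extending the mapping across the four compute boards would add one more level of integer division by 128.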

Input to Board from ECCM nodes = 65.6 MBytes/sec/ECCM board x 4 ECCM boards = 262 MBytes/sec => perform range compression

Range Compression Output to Global Memory = 262 MBytes/sec

Post-Range Compression Processing out of global memory:

Input to Board: 33.6 MBytes per fetch x 128 fetches per function per board every 65.5 sec = 4.3 GBytes/65.5 sec = 65.6 MBytes/sec per function per board

Output from Board: 33.6 MBytes per store x 128 stores per function per board every 65.5 sec = 4.3 GBytes/65.5 sec = 65.6 MBytes/sec per function per board

Aggregate Bandwidth per board = 131.2 MBytes/sec x 4 functions = 524.8 MBytes/sec

Page 16: Skau1 Approach to Scalable Parallel Processing For Space-Based Radar

Example GMTI Partitioning/Mapping/Utilization

Page 17: Skau1 Approach to Scalable Parallel Processing For Space-Based Radar

GMTI Processing - Assumptions

• Assumed GMTI Parameters
  • 16 channels (12 Subarrays and 4 Auxiliary Channels)
  • 400 - 650 Msamples/sec (2.5 - 1.5 nsec per sample)
  • 256 pulses per CPI
  • 1 kHz PRF (1 msec PRI)
  • 500 µsec receive window (200,000 - 333,000 range cells per pulse) into the Polyphase Channelizer
  • 4/3 oversampling out of the Polyphase Channelizer
  • 128 Subbands formed; only 40 processed
  • 6 beams formed
  • 10 bits (2 Bytes) per A/D sample
  • 64 bits (8 Bytes) for internal data storage of complex data (4 Bytes real, 4 Bytes imaginary)
  • 32 bits (4 Bytes) for internal data storage of real data

• Assumed Processing Resources
  • 32 processing nodes per board
  • 64 GFLOPS peak throughput per board
  • 48 GFLOPS sustained throughput per board (assumed @ 75% execution efficiency)
  • 32 MBytes local data memory per board
    - 8 MBytes local data memory per processing cluster per board
      -- 2 MBytes local data memory per processing node
  • 256 KBytes local memory per processor
  • 32 GBytes Global Memory (worst case SAR requirement)

Page 18: Skau1 Approach to Scalable Parallel Processing For Space-Based Radar

GMTI Processing Flow/Board-Level Partitioning/Utilization #1

[Figure: board-level block diagram. Boards and labels shown:]

• Polyphase Channelizer Boards #1 - #4
• ECCM, AWC, & PP Combination Boards #1 - #3: Max. Sustained T-put per Stage = 144 GOPS; Used T-put per Stage = 100.3 GOPS
• Pulse Comp. Boards #1 - #3: Max. Sustained T-put per Stage = 144 GOPS; Used T-put per Stage = 101 GOPS
• Doppler, STAP, & CFAR Board #1: Max. Sustained T-put per Stage = 48 GOPS; Used T-put per Stage = 34 GOPS
• Global Memory: Max. Memory = 32 GBytes; Used Memory = 21 GBytes
• Output

Each PPC Channelizer Board processes 4 radar channels with 288,000 range cells per pulse; generates 128 subbands; 40 subbands are output to the Global Memory for processing in the ECCM nodes

Each ECCM/AWC/PP Combo Board processes 16 radar channels with 1000 range cells per pulse and 40 subbands per channel; forms 6 beams; combines 40 subbands; each ECCM node outputs 30,000 ranges to global memory for Pulse Compression processing

Each Pulse Compression board outputs 2 beams with 256 pulses and 72,000 ranges; Doppler board receives 6 beams with 256 pulses and 72,000 ranges and outputs 12 beams with 256 Doppler cells and 72,000 ranges (staggered); STAP outputs 3 beams with 256 Doppler cells and 72,000 ranges; CFAR outputs target detections

Data rates:

• 16 ch x 256 pulses x 288,000 ranges x 4 Bytes every CPI (0.256 sec) = 18.4 GBytes/sec
• 4.61 GBytes/sec input to each PPC board
• 4 ch x 40 subbands x 1 pulse x 3000 ranges x 8 Bytes output from each PPC board and input to global memory every PRI (1 msec) = 3.85 GBytes/sec; Aggregate BW = 15.4 GBytes/sec
• 16 ch x 256 pulses x 1000 ranges x 40 subbands x 8 Bytes input to each ECCM board from global memory every CPI = 5.1 GBytes/sec; Aggregate BW = 15.4 GBytes/sec
• 6 beams x 256 pulses x 30,000 ranges x 8 Bytes output to global memory from each ECCM board every CPI = 1.44 GBytes/sec; Aggregate BW = 4.32 GBytes/sec
• 2 beams x 256 pulses x 90,000 ranges x 8 Bytes input to each Pulse Compression board from global memory each CPI = 1.44 GBytes/sec; Aggregate BW = 4.32 GBytes/sec
• 2 beams x 256 pulses x 72,000 ranges x 8 Bytes output from each Pulse Compression board to global memory every CPI = 1.15 GBytes/sec; Aggregate BW = 3.46 GBytes/sec
• 6 beams x 256 pulses x 72,000 ranges x 8 Bytes input to Doppler board from global memory every CPI = 3.46 GBytes/sec
• Doppler Output @ 6.92 GBytes/sec
• STAP Input @ 6.92 GBytes/sec
• STAP Output @ 1.73 GBytes/sec
• CFAR Input @ 1.73 GBytes/sec
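Each GMTI data rate on this slide is the size of a data cube divided by the CPI duration. The sketch below recomputes several of them, assuming 256 pulses per CPI at a 1 msec PRI (0.256 sec per CPI); the function names are hypothetical.

```python
# Sketch only: function names are illustrative, sizes come from the slide.
PRF_HZ = 1_000
PULSES_PER_CPI = 256   # CPI duration = 256 / 1000 = 0.256 sec


def cube_bytes(beams_or_channels, pulses, ranges, bytes_per_sample):
    """Size of one data cube in bytes."""
    return beams_or_channels * pulses * ranges * bytes_per_sample


def rate_per_cpi(bytes_per_cpi):
    """Bytes per second for a cube transferred once per CPI."""
    return bytes_per_cpi * PRF_HZ // PULSES_PER_CPI


rates = {
    "A/D input (16 ch)": rate_per_cpi(cube_bytes(16, 256, 288_000, 4)),
    "ECCM board output (6 beams)": rate_per_cpi(cube_bytes(6, 256, 30_000, 8)),
    "Pulse Comp. board output (2 beams)": rate_per_cpi(cube_bytes(2, 256, 72_000, 8)),
    "Doppler input (6 beams)": rate_per_cpi(cube_bytes(6, 256, 72_000, 8)),
    "Doppler output (12 beams)": rate_per_cpi(cube_bytes(12, 256, 72_000, 8)),
    "STAP output (3 beams)": rate_per_cpi(cube_bytes(3, 256, 72_000, 8)),
}

for name, rate in rates.items():
    print(f"{name}: {rate / 1e9:.2f} GBytes/sec")
```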

Page 19: Skau1 Approach to Scalable Parallel Processing For Space-Based Radar

GMTI Processing Flow/Board-Level Partitioning/Utilization #2

[Figure: board-level block diagram. Boards and labels shown:]

• Polyphase Channelizer Boards #1 - #4
• ECCM, AWC, & PP Combination Boards #1 - #3: Max. Sustained T-put per Stage = 144 GOPS; Used T-put per Stage = 100.3 GOPS
• Pulse Comp. Boards #1 - #3: Max. Sustained T-put per Stage = 144 GOPS; Used T-put per Stage = 101 GOPS
• Doppler, STAP, & CFAR Board #1: Max. Sustained T-put per Stage = 48 GOPS; Used T-put per Stage = 34 GOPS
• Global Memory: Max. Memory = 32 GBytes; Used Memory = 13.1 GBytes
• Output

Each PPC Channelizer Board processes 4 radar channels with 288,000 range cells per pulse; generates 128 subbands; 40 subbands are output to the ECCM boards

Each ECCM/AWC/PP Combo Board processes 16 radar channels with 1000 range cells per pulse and 40 subbands per channel; forms 6 beams; combines 40 subbands; each ECCM node outputs 30,000 ranges to global memory for Pulse Compression processing

Each Pulse Compression board outputs 2 beams with 256 pulses and 72,000 ranges; Doppler board receives 6 beams with 256 pulses and 72,000 ranges and outputs 12 beams with 256 Doppler cells and 72,000 ranges (staggered); STAP outputs 3 beams with 256 Doppler cells and 72,000 ranges; CFAR outputs target detections

Data rates:

• 16 ch x 256 pulses x 288,000 ranges x 4 Bytes every 0.256 seconds = 18.4 GBytes/sec
• 4.61 GBytes/sec input to each PPC board
• 4 ch x 40 subbands x 1 pulse x 1000 ranges x 8 Bytes output from each PPC board to each ECCM board every PRI (1 msec) = 1.28 GBytes/sec (5.12 GBytes/sec into each ECCM board from the 4 PPC boards); Aggregate BW = 15.4 GBytes/sec
• 6 beams x 256 pulses x 30,000 ranges x 8 Bytes output to global memory from each ECCM board every CPI (0.256 sec) = 1.44 GBytes/sec; Aggregate BW = 4.32 GBytes/sec
• 2 beams x 256 pulses x 90,000 ranges x 8 Bytes input to each Pulse Compression board from global memory every CPI (0.256 sec) = 1.44 GBytes/sec; Aggregate BW = 4.32 GBytes/sec
• 2 beams x 256 pulses x 72,000 ranges x 8 Bytes output to global memory from each Pulse Comp. board every CPI (0.256 sec) = 1.15 GBytes/sec; Aggregate BW = 3.46 GBytes/sec
• 6 beams x 256 pulses x 72,000 ranges x 8 Bytes input to Doppler board from global memory every CPI (0.256 sec) = 3.46 GBytes/sec
• Doppler Output @ 6.92 GBytes/sec
• STAP Input @ 6.92 GBytes/sec
• STAP Output @ 1.73 GBytes/sec
• CFAR Input @ 1.73 GBytes/sec

Page 20: Skau1 Approach to Scalable Parallel Processing For Space-Based Radar

GMTI Processing Flow/Board-Level Partitioning/Utilization #3

[Figure: board-level block diagram. Boards and labels shown:]

• Polyphase Channelizer Boards #1 - #4
• ECCM, AWC, & PPC Comb. & Pulse Comp. Boards #1 - #5: Max. Sustained T-put per Stage = 240 GOPS; Used T-put per Stage = 202 GOPS
• Doppler, STAP, & CFAR Board #1: Max. Sustained T-put per Stage = 48 GOPS; Used T-put per Stage = 34 GOPS
• Global Memory: Max. Memory = 128 GBytes; Used Memory = 8.66 GBytes
• Output

Each PPC Channelizer Board processes 4 radar channels with 288,000 range cells per pulse; generates 128 subbands; 40 subbands are output to the ECCM nodes

Each ECCM/AWC/PP Combo/Pulse Compression Board processes 16 radar channels with 90,000 range cells per pulse and 40 subbands per channel; forms 1 or 2 beams (only 1 board forms 2 beams); combines 40 subbands; each ECCM node outputs 72,000 ranges to global memory for Doppler processing

Each ECCM/Pulse Compression board outputs 6 beams with 72,000 ranges per pulse; Doppler board receives 6 beams with 256 pulses and 72,000 ranges and outputs 12 beams with 256 Doppler cells and 72,000 ranges (staggered); STAP outputs 3 beams with 256 Doppler cells and 72,000 ranges; CFAR outputs target detections

Data rates:

• 16 ch x 256 pulses x 288,000 ranges x 4 Bytes every 0.256 seconds = 18.4 GBytes/sec
• 4.61 GBytes/sec input to each PPC board
• 4 ch x 40 subbands x 1 pulse x 3000 ranges x 8 Bytes input to ECCM boards from each PPC board every PRI (1 msec) = 3.07 GBytes/sec; Aggregate BW = 15.4 GBytes/sec
• 6 beams x 72,000 ranges x 1 pulse x 8 Bytes output to global memory from the ECCM/Pulse Compression Boards every PRI (1 msec) = 3.46 GBytes/sec Aggregate BW
• 6 beams x 256 pulses x 72,000 ranges x 8 Bytes input to the Doppler/STAP/CFAR board from global memory every CPI = 3.46 GBytes/sec
• Doppler Output (12 beams x 72,000 ranges x 256 Dopplers) @ 6.92 GBytes/sec
• STAP Input (12 beams x 72,000 ranges x 256 Dopplers) @ 6.92 GBytes/sec
• STAP Output (3 beams x 72,000 ranges x 256 Dopplers) @ 1.73 GBytes/sec
• CFAR Input @ 1.73 GBytes/sec

Page 21: Skau1 Approach to Scalable Parallel Processing For Space-Based Radar

Backup Charts

Page 22: Skau1 Approach to Scalable Parallel Processing For Space-Based Radar

Hypothetical Real-Time Adaptive Space-Based Radar Design

[Figure: system block diagram. Blocks and labels shown:]

• Radar Event Scheduler/Time Line Generator
• Search Requirements
• Field-Requested Events
• Track Revisit Requirements
• Real-Time Waveform Designer
• Beam Steering Controller
• Onboard Processing Configurator
• S/C Attitude Determination & Control
• S/C Guidance Navigation & Control
• Exciter/Transmitter
• Electronically Steerable Antenna Subsystem
• Receiver
• Programmable Onboard Processing Subsystem
• Mode/Context Control
• Mode Control
• Health & Status
• Processed Data/Target Reports
• Communication Subsystem
• Uplink/Downlink

Page 23: Skau1 Approach to Scalable Parallel Processing For Space-Based Radar

Space-Based Radar

Advantages

- the ultimate "high ground"
  -- radar "horizon" greater than for airborne or ground radar
- 24-hour all weather capability (IR sensors can't see through clouds, optical sensors blind in dark)
- less vulnerable/more survivable than airborne assets
- once launched, lower logistics costs than airborne assets (fuel, ground support, fighter protection, etc.)
- continuous world-wide coverage with full constellation of satellites
- High Range Resolution (HRR) with frequency jumped burst waveforms
- SAR (Synthetic Aperture Radar)
- IFSAR (Interferometric SAR)
- DTED (Digital Terrain Elevation Data)
- DAR (Distributed Aperture Radar)
- multi-mission
  -- GMTI (Joint STARS)
  -- SAR (Joint STARS)
  -- AMTI (AWACS [E3-A] & E-2C)
- GPIR (Ground Penetrating Imaging Radar)
- foliage penetration capability
- reduced downlink requirements with onboard processing in many cases

Disadvantages/Issues

- limited power generation and power dissipation capability in space
- limited aperture size (antenna dimensions)
- high altitude (R^4 losses)
- ionospheric effects
- steep look-down angle/Nadir Hole
- clutter Doppler/clutter Doppler spread
  -- function of satellite-target geometry (earth background) and platform velocity
- optimal waveform design for target detection performance and clutter cancellation
  -- frequency of operation (X, L, UHF/VHF)
  -- polarization (Tx and Rx)
  -- pulse width, PRF, CPI
  -- range resolution/bandwidth
  -- Doppler resolution
  -- range ambiguities
  -- Doppler ambiguities
- constellation
  -- altitude, inclination, number of satellites, phasing
- environmental
  -- radiation, micro-meteorites, etc.
- launch vehicle capability
- initial system cost

Page 24: Skau1 Approach to Scalable Parallel Processing For Space-Based Radar

Considerations for Optimal Onboard Processing

Processor

- Processing Engine(s)
  -- Heterogeneous vs. Homogeneous
  -- GP vs. Accelerator
  -- Custom ASIC vs. FPGA vs. PIM (Processor In Memory)
  -- Clock Rate
  -- Algorithm/Architecture Coupling Efficiency
  -- Flexibility/Programmability vs. Performance
- Mode Switching
- Optimal Waveform Design
  -- Performance
- Rad Hard vs. Rad Tolerant
  -- Available Software Development Environment
- Number of Processing Engines
  -- Number of Modules
  -- Number of Processing Engines per Module
  -- Parallelization Efficiencies
- Interconnect Capacity/Capability
- Overlap Processing and Communication
- Bandwidth

Memory Storage

- Amount (size)
- Location
- Access Bandwidth
- Utilization
  -- Pipeline
  -- Double Buffering
  -- "Caching"/Data Blocking

Algorithm Implementation

- Analog vs. Digital vs. Optical
- Performance/Efficiency
- Optimum utilization of available arithmetic units
- Arithmetic Precision
  -- Fixed Point vs. Floating Point
  -- # of bits
  -- Single vs. Double
  -- Programmable
- Time Domain vs. Frequency Domain Pulse Compression
  -- Optimal Size of FFT for Frequency Domain Conv.
- Radix Size(s) for FFTs
- Power of 2 FFTs vs. odd-point FFTs
- Method for Adaptive Weight Computation
- Data Flow vs. Other Implementation Paradigms
  -- Medium/Coarse Grained vs. Fine Grained
- Partitioning/Parallelism
  -- Data Partitioning
  -- Algorithm Partitioning
  -- Round Robin vs. Distributed Parallel
- Phantom Functions, e.g., corner turns
- Order of Execution of Functions
- Grouping of Functions
- Combination of Functions
- Minimize Memory Required
- Minimize Communication Requirements
- Fault Tolerance
- Real Time Execution

Page 25: Skau1 Approach to Scalable Parallel Processing For Space-Based Radar

GMTI/SAR Mode Switching for SBR

[Figure: radar processing time line along the line of flight, with a block diagram of the onboard processing chain. Labels shown:]

• Line of Flight
• Radar Processing Time Line: GMTI CPIs, SAR CPI, GMTI CPIs, SAR CPI, GMTI CPIs
• SAR Beams; GMTI Search & Track Beams
• Location of Last Detection of a Suspected Stop & Go Target
• Mode Processing Context Switch Points
  - Context Switch from GMTI to Spotlight SAR to image location of last detection of a Stop-and-Go target which may have stopped
  - Context Switch from SAR back to normal search and track patterns
• Onboard Processor: GMTI Application, SAR Application
• SAGE Capture
• Tracker; Tracks
• Radar Event Scheduler
• Radar Synchronizer/Controller
• Beam Steering Controller
• Radar Data (GMTI, SAR); Detections; Images