Multi-core GPU – Fast parallel SAR image generation


Upload: maheshkha

Post on 20-Mar-2017


Abstract

This poster presents a design for parallel processing of synthetic aperture radar (SAR) data on multi-core general-purpose graphics processing units (GP-GPUs). In this design, a two-dimensional image is reconstructed from the received echo pulses and their corresponding response values. The performance of the proposed parallel implementation is then tabulated for several NVIDIA GPUs (GT, Tesla and Fermi). Experiments on images ranging from 1,024 x 1,024 pixels to 4,096 x 4,096 pixels led us to conclude that the NVIDIA Fermi S2050 delivers the highest throughput.

Introduction

Generating images from raw synthetic aperture radar (SAR) video data requires complex computations; the requirements are tabulated in the data sample for RADARSAT-1 [1].

In a SAR system, the fundamental principle is to match the phase of the received (Rx) signal with the phase of the transmitted (Tx) signal at a stable transmission frequency. The Fast Fourier Transform (FFT) is used to calculate the phase of the received and transmitted signals.

The modified algorithm presented in [2] divides the impulse response of the Rx/Tx signal into several blocks of equal length and computes an FFT on each block. This allows the range and azimuth dimensions to be processed in parallel. Because a GP-GPU offers large computational capability for data-parallel processing, the modified SAR image generation algorithm is implemented on the GPU. The proposed parallel algorithm achieves faster execution in both the range and azimuth dimensions.

Algorithm And Method

Data Sample

Implementation

Typical Parallel Algorithm

Proposed Parallel Algorithm

Process distribution to generate SAR

Range Image Generation Timing (1024x1024)

Azimuth Image Generation Timing (1024x1024)

Final SAR Image Generation (1024x1024)

Result - SAR Image Generation

References

[1] Ian G. Cumming and Frank H. Wong, Digital Processing of Synthetic Aperture Radar Data, Artech House, Boston, London, Jan. 2005.

[2] C. Bhattacharya, “Parallel processing of satellite-borne SAR data for accurate and efficient image formation,” EUSAR 2010, pp. 1046-49, 7-10 June 2010, Aachen, Germany.

[3] NVIDIA Corporation, Compute Unified Device Architecture (CUDA), http://developer.nvidia.com/object/cuda.html.

[4] A. K. Agarwal et al., “Accelerated SAR image generation on GPGPU platform,” APSAR, 2011.

Mahesh Khadtare (1), Pragati Dharmale (2), Prakalp Somwanshi (3) & C Bhattacharya (4)

1: I2IT, Pune, IN; 2: SNHU, NH, US; 3: CRL, Pune, IN; 4: DIAT, Pune, IN

Parameter                                    Typical value
Number of range samples                      6840
Number of azimuth samples                    4096
Block size (image size)                      8192 x 8192
Data size (actual size)                      11.27 M samples/sec
Total computational requirement (range)      1.83 GFLOPS
Total computational requirement (azimuth)    3.28 GFLOPS
Total load computing requirement             5.11 GFLOPS

NOTE: All calculations in O(GFLOPS)

[Figure: data extent. Range (R): 6840 pixels, 50 km. Azimuth (s): 4096 lines, 3.6 km.]

RADARSAT-1 data from C-band sensor

• 50 km range
• 3.6 km azimuth
• 7.2 m ground range resolution
• 5.26 m azimuth resolution

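[Figure: typical parallel algorithm. Image (1), Image (2), ..., Image (N) are each handed to one GPU processing element, GPU-PE(1), GPU-PE(2), ..., GPU-PE(N), which processes that image.]

[Figure: proposed parallel algorithm. Each image, from Image 1 to Image N, is distributed over G-PE 1, G-PE 2, ..., G-PE N, which together produce the processed image, P-Image 1 to P-Image N.]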
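The difference between the two distribution schemes can be sketched as a scheduling decision. The following is a minimal illustration, not the poster's implementation: the function names are hypothetical, and splitting an image into contiguous row blocks is an assumption, since the figures only show that one image is shared among all processing elements.

```python
def typical_assignment(num_images, num_pes):
    """Typical scheme: every whole image is handled by a single PE."""
    # Image k goes to PE k (wrapping around if there are more images than PEs).
    return {image: image % num_pes for image in range(num_images)}


def proposed_assignment(num_rows, num_pes):
    """Proposed scheme (assumed row-block split): one image is shared by all PEs.

    Returns a list of (pe, first_row, end_row) with end_row exclusive.
    """
    base, extra = divmod(num_rows, num_pes)
    blocks = []
    start = 0
    for pe in range(num_pes):
        # The first `extra` PEs take one additional row so all rows are covered.
        size = base + (1 if pe < extra else 0)
        blocks.append((pe, start, start + size))
        start += size
    return blocks
```

With four processing elements, `typical_assignment(4, 4)` maps each image to its own element, while `proposed_assignment(1024, 4)` gives every element a 256-row slice of the same 1024-row image, so all elements work on one image at the same time.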

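Mathematical Model

Exploit linearity of the correlation process:

$$u(n) = \sum_{i=0}^{P-1} u_i(n), \quad iK \le n \le (i+1)K - 1$$

$$y(n) = \sum_{j=0}^{Q-1} y_j(n), \quad jK \le n \le (j+1)K - 1$$

$$r(l) = \sum_{j=0}^{Q-1} \sum_{i=0}^{P-1} \sum_{n=0}^{2K-2} y_j(n+l)\, u_i(n), \quad -K \le l \le K - 1$$

Perform block correlation in the transform domain:

$$r_{ij}(l) = \mathrm{IDFT}\left[ U_i^{*}(k)\, Y_j(k) \right], \quad 0 \le k \le 2K - 1$$

Arrange the correlation vector as the output of a matrix transform:

X = A*T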
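The block correlation idea can be checked numerically with a small sketch. It is an illustration under stated assumptions, not the authors' GPU code: it runs on the CPU in plain Python, the function names are hypothetical, K must be a power of two because of the simple radix-2 FFT, each block is indexed locally from 0, and block pair (i, j) is assumed to contribute at a lag offset of (j - i)K. The conjugate on the transmit spectrum follows the transform-domain expression.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two (unnormalized)."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(spectrum):
    """Inverse FFT with 1/n normalization."""
    n = len(spectrum)
    return [v / n for v in fft(spectrum, inverse=True)]


def block_correlation(u, y, K):
    """Correlate u and y by summing 2K-point transform-domain block correlations.

    Entry [lag + len(u) - 1] holds r(lag) = sum_n conj(u[n]) * y[n + lag].
    """
    u, y = list(u), list(y)
    P, Q = len(u) // K, len(y) // K
    r = [0j] * (len(u) + len(y) - 1)
    # Zero-pad every length-K block to 2K so circular correlation equals linear.
    U = [fft(u[i * K:(i + 1) * K] + [0j] * K) for i in range(P)]
    Y = [fft(y[j * K:(j + 1) * K] + [0j] * K) for j in range(Q)]
    for i in range(P):
        for j in range(Q):
            r_ij = ifft([a.conjugate() * b for a, b in zip(U[i], Y[j])])
            for index, value in enumerate(r_ij):
                local_lag = index if index < K else index - 2 * K  # -K .. K-1
                position = local_lag + (j - i) * K + len(u) - 1
                if 0 <= position < len(r):
                    r[position] += value
    return r


def direct_correlation(u, y):
    """Reference time-domain correlation used to check the block version."""
    r = [0j] * (len(u) + len(y) - 1)
    for lag in range(-(len(u) - 1), len(y)):
        total = 0j
        for n in range(len(u)):
            if 0 <= n + lag < len(y):
                total += complex(u[n]).conjugate() * y[n + lag]
        r[lag + len(u) - 1] = total
    return r
```

Each (i, j) pair in the double loop is independent of the others, which is the property that lets the block correlations be spread over many GPU threads or devices before the partial results are accumulated.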

[Figure: processing chain from input data stream to output data stream, with range and azimuth axes]

Phase-I, Range Processing (range compress processing): sensor signal data (time), Range FFT, Multiplication of Reference Function, Range IFFT

Corner-turn

Phase-II, Azimuth Processing (azimuth compress processing): Azimuth FFT, Range Migration, Multiplication of Reference Function, Azimuth IFFT, Reconstructed Picture

The rate of execution time*: 34 %, 6 %, 60 %

*1 thread, conventional corner turn
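The two-phase chain can be sketched on a toy data set as follows. This is a simplified CPU illustration and not the poster's code: all function names are hypothetical, a naive DFT stands in for the FFT, the reference function is assumed to be a matched filter (the conjugate spectrum of a reference waveform), correlation is circular, and the range migration step is left out entirely.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) DFT, adequate only for a small demonstration."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def compress(line, reference):
    """FFT, multiplication by the reference function, then IFFT for one line."""
    spectrum = dft(line)
    ref_spectrum = dft(reference)
    product = [s * r.conjugate() for s, r in zip(spectrum, ref_spectrum)]
    return dft(product, inverse=True)


def corner_turn(matrix):
    """Transpose so that the other dimension becomes contiguous."""
    return [list(column) for column in zip(*matrix)]


def form_image(raw, range_ref, azimuth_ref):
    """raw[a][r] is indexed by azimuth line a and range sample r."""
    range_done = [compress(row, range_ref) for row in raw]          # Phase I
    turned = corner_turn(range_done)                                # corner-turn
    azimuth_done = [compress(col, azimuth_ref) for col in turned]   # Phase II
    return corner_turn(azimuth_done)  # back to [azimuth][range] order


def chirp(n):
    """Hypothetical test waveform: a sampled linear chirp (n should be even)."""
    return [cmath.exp(1j * cmath.pi * m * m / n) for m in range(n)]


def point_target(n_azimuth, n_range, a0, r0):
    """Simulated raw data for one point target at azimuth a0 and range r0."""
    az, rg = chirp(n_azimuth), chirp(n_range)
    return [
        [az[(a - a0) % n_azimuth] * rg[(r - r0) % n_range] for r in range(n_range)]
        for a in range(n_azimuth)
    ]
```

Passing `point_target(8, 8, 3, 5)` through `form_image` with `chirp(8)` as both reference functions should focus the energy into a single peak at azimuth 3, range 5. Every row in Phase I and every column in Phase II is processed independently, which is where the parallelism comes from, while the corner-turn is pure data movement.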

Range Image Processing

CPU / GPU                   Core details   Time (sec)   Speedup
AMD Athlon X2 Dual Core     1              45.337       1x
Fx1800                      64             19.50        2.32x
GeForce GT 525M             96             4.150        10.92x
Tesla GeForce GTX 275       192            1.421        31.90x
Fermi S2050 (Multi GPU)     4x448          0.321        141.21x

[Figure: bar chart "Range Image Processing" (axis 0 to 50) for AMD Athlon X2 Dual Core, Fx1800, GeForce GT 525M and Fermi S2050]

Azimuth Image Processing

CPU / GPU                   Core details   Time (sec)   Speedup
AMD Athlon X2 Dual Core     1              50.065       1x
Fx1800                      64             36.60        1.368x
GeForce GT 525M             96             8.501        5.884x
Tesla GeForce GTX 275       192            3.725        13.44x
Fermi S2050 (Multi GPU)     4x448          0.981        51.01x

[Figure: bar chart "Azimuth Image Processing" (axis 0 to 60) for AMD Athlon X2 Dual Core, Fx1800, GeForce GT 525M and Fermi S2050]

SAR Image Processing

CPU / GPU                   Core details   Time (sec)   Speedup
AMD Athlon X2 Dual Core     1              95.00        1x
Fx1800                      64             36.60        2.6x
GeForce GT 525M             96             9.23         10.29x
Tesla GeForce GTX 275       192            4.4563       21.31x
Fermi S2050 (Multi GPU)     4x448          1.280        74.21x

[Figure: bar chart "SAR Image Processing" (axis 0 to 100) for AMD Athlon X2 Dual Core, Fx1800, GeForce GT 525M and Fermi S2050]
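The speedup column appears to be the CPU time divided by the device time. A short sketch, with hypothetical names and the final SAR image timings copied from the table, reproduces that calculation. Because the published times are rounded, a recomputed value can differ from the printed one in the last digit.

```python
def speedup(baseline_sec, device_sec):
    """Speedup of a device relative to the baseline execution time."""
    if device_sec <= 0:
        raise ValueError("execution time must be positive")
    return baseline_sec / device_sec


# Final SAR image generation times in seconds, as listed in the table.
FINAL_TIMINGS = {
    "AMD Athlon X2 Dual Core": 95.00,
    "Fx1800": 36.60,
    "GeForce GT 525M": 9.23,
    "Tesla GeForce GTX 275": 4.4563,
    "Fermi S2050 (Multi GPU)": 1.280,
}


def speedup_table(timings, baseline):
    """Return {device: speedup} with the named baseline as the 1x reference."""
    base = timings[baseline]
    return {name: speedup(base, seconds) for name, seconds in timings.items()}
```

Calling `speedup_table(FINAL_TIMINGS, "AMD Athlon X2 Dual Core")` gives the ratios for every device in one pass; the same function applies to the range and azimuth tables once their times are substituted.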

[Figure: process distribution flow]

Transmit signal (M blocks): formation of M*N blocks, then FFT of the M*N block vector of the transmit signal

Received signal (N blocks): formation of M*N blocks, then FFT of the M*N block vector of the received signal

IFFT with normalization of the correlated M*N blocks of data

Rearrange the data

Processed output image

Contact name

Mahesh Khadtare: [email protected]

Poster P5142

Category: Video & Image Processing - VI02