A Multi-Processor System on Chip Architecture for Real Time Remote Sensing Data Processing


School of Engineering, Autonomous University of Yucatan, Merida, Mexico.

2011/07/25

Presenter: Dr. Alejandro Castillo Atoche

A Multi-Processor System on Chip Architecture for Real Time Remote Sensing Data Processing

IGARSS’11

Outline

Introduction
Previous Work
MPSoC via the HW/SW Co-design
Case Study: RBR Algorithms
Algorithm Analysis
Network on Chip (NoC)-based Accelerator Integration in a Co-design scheme
New Perspective: Network of FPGA-VLSI architectures
Hardware Implementation Results
Performance Analysis
Conclusions


Introduction: Radar Imagery, Facts

The starting problem for geospatial remote sensing (RS) imagery is to solve the ill-conditioned inverse spatial spectrum pattern (SSP) estimation problem under model uncertainties via the Bayesian minimum risk (BMR) estimation strategy.

In previous works, alternative MPSoC designs have been proposed, but without systolic array techniques or Network on a Chip structures.


Introduction: HW implementation, Facts

Why Multiprocessor System on a Chip? Because MPSoCs are single-chip multiprocessors designed for real-time signal processing applications.

Why Network on a Chip accelerators? Networks-on-chip (NoCs) are multiprocessor interconnection networks designed to achieve real-time signal processing (SP). They avoid bottlenecks in HW/SW co-designs.


MOTIVATION

To efficiently conceptualize and implement an architecture that combines parallel computing and systolic array mapping techniques in a novel network on a chip (NoC) accelerator scheme via the HW/SW co-design paradigm.


CONTRIBUTIONS:

First, a high-speed robust Bayesian regularization hardware accelerator is designed for the real-time enhancement of large-scale geospatial imagery.

Second, High Performance Computing techniques are applied in an efficient architecture based on a Network on a Chip (NoC).


Algorithmic ref. Implementation


Method        RSF                      RBR
SNR [dB]      15      20      25       15      20      25
IOSNR [dB]    10.15   15.32   20.25    6.15    10.62   13.04
PIOSNR (%)    81.37   86.62   85.24    95.18   90.29   98.24
MSE           0.16    0.46    0.57     0.03    0.29    0.34

Partitioning Stage

[Figure: HW/SW partitioning on the FPGA. The embedded processor performs data acquisition, holds the parameters, and runs the pre-computed software stage that supplies $F$ and $\Omega$. Coprocessor 1 (robust SS vector) computes $V = \mathrm{diag}\{F u u^{+} F^{+}\}$ from each data vector $u^{(j)}$. Coprocessor 2 (RBR estimator) computes $\hat{b}_{RBR} = b_0 + \Omega V$.]
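As a rough illustration of this partitioning, the sketch below evaluates the two coprocessor computations in plain Python. It assumes the forms $V = \mathrm{diag}\{F u u^{+} F^{+}\}$ and $\hat{b}_{RBR} = b_0 + \Omega V$, reads $^{+}$ as the conjugate transpose, and uses hypothetical function names (`matvec`, `robust_ss_vector`, `rbr_estimate`) that do not come from the slides. It models only the arithmetic, not the hardware scheduling.

```python
def matvec(M, x):
    """Plain matrix-vector product for lists of (possibly complex) numbers."""
    return [sum(M[i][k] * x[k] for k in range(len(x))) for i in range(len(M))]


def robust_ss_vector(F, u):
    """Coprocessor 1 (assumed form): V = diag{F u u^+ F^+}, in three stages."""
    n = len(u)
    # Stage 1: TEMP_1 = F u, an (n x 1) vector.
    temp1 = matvec(F, u)
    # Stage 2: TEMP_2 = TEMP_1 u^+, an (n x n) outer product.
    temp2 = [[t * uj.conjugate() for uj in u] for t in temp1]
    # Stage 3: keep only the diagonal of TEMP_2 F^+.
    # (F^+)[k][i] is the conjugate of F[i][k]; the diagonal is real-valued.
    return [
        sum(temp2[i][k] * F[i][k].conjugate() for k in range(n)).real
        for i in range(n)
    ]


def rbr_estimate(b0, Omega, V):
    """Coprocessor 2 (assumed form): b_RBR = b0 + Omega V."""
    return [b + w for b, w in zip(b0, matvec(Omega, V))]
```

Because $\mathrm{diag}\{F u u^{+} F^{+}\}_i = |(Fu)_i|^2$, the three-stage result can be cross-checked against the squared magnitudes of `matvec(F, u)`.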

NoC oriented structure of the proposed coprocessors

(a) Robust SS vector

[Figure: data path of the robust SS vector coprocessor. Data input ($u$, $F$) arrives from the embedded processor through the input FIFO. Fixed-sized PA1 ($m = 32$) computes $F u$; fixed-sized PA2 ($m = 4$) computes $F u u^{+}$; fixed-sized PA3 ($m = 32$) computes $\mathrm{diag}\{F u u^{+} F^{+}\}$. The stages are linked by FIFOs and memory buffers, each with tiled control. The data output $V$ returns to the embedded processor through the output FIFO.]

NoC oriented structure of the proposed coprocessors

(b) RBR estimator

[Figure: data path of the RBR estimator coprocessor. Data input ($\Omega$, $V$) arrives from the embedded processor through the input FIFO. Fixed-sized PA4 ($m = 32$) computes $\Omega V$ under tiled control, with a FIFO and memory buffer. The term $b_0$, also supplied by the embedded processor, is added to give $\hat{b}_{RBR} = b_0 + \Omega V$, which returns to the embedded processor through the output FIFO.]

Aggregation of parallel computing techniques

Application

for (tile = 0; tile < L; tile++) {
  for (i = 0; i < m; i++) {
    for (j = 0; j < n; j++) {
      for (k = 0; k < r; k++) {
        a(i,j,k) = a(i,j-1,k);    /* a propagates along j */
        b(i,j,k) = b(i-1,j,k);    /* b propagates along i */
        c(i,j,k) = c(i,j,k-1) + a(i,j,k)*b(i,j,k);    /* accumulate along k */
      }
    }
  }
}

[Figure: 3-D dependence graph (DG) of the recurrence and its signal flow graph (SFG) projection. The entries A[0,0] ... A[2,2] and B[0,0] ... B[2,2] enter the array in skewed order. Each node receives a[i,j-1] and b[i-1,j], passes them on as a[i,j] and b[i,j], and uses a multiplier (mul) and an adder (add) to update c[i,j].]

Linear schedule: a set of parallel and uniformly spaced hyperplanes.
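The loop nest on this slide is the single-assignment (uniform recurrence) form of a matrix product. A minimal Python sketch of it follows, under assumptions the slide does not state: `A` is injected at the `j = 0` boundary, `B` at the `i = 0` boundary, `c` starts at zero, and the linear schedule is taken as `t = i + j + k`. The outer tile loop is omitted and the function name `recurrence_matmul` is hypothetical.

```python
def recurrence_matmul(A, B):
    """Evaluate c(i,j,k) = c(i,j,k-1) + a(i,j,k) * b(i,j,k) node by node."""
    m, r, n = len(A), len(B), len(B[0])
    a, b, c = {}, {}, {}
    # Visit the index space hyperplane by hyperplane: all nodes with the
    # same t = i + j + k could fire in parallel (assumed schedule).
    nodes = sorted(
        ((i, j, k) for i in range(m) for j in range(n) for k in range(r)),
        key=sum,
    )
    for i, j, k in nodes:
        # a propagates along j; assumed to enter from A at j = 0.
        a[i, j, k] = A[i][k] if j == 0 else a[i, j - 1, k]
        # b propagates along i; assumed to enter from B at i = 0.
        b[i, j, k] = B[k][j] if i == 0 else b[i - 1, j, k]
        # c accumulates along k; assumed to start from zero.
        prev = 0 if k == 0 else c[i, j, k - 1]
        c[i, j, k] = prev + a[i, j, k] * b[i, j, k]
    # The finished sums sit on the last k-plane.
    return [[c[i, j, r - 1] for j in range(n)] for i in range(m)]
```

Every dependence of a node points to a node with a schedule value one smaller, which is why sorting by `i + j + k` is a valid evaluation order.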

Tiling technique

[Figure: tiling of the large-scale real-world image onto a fixed-size systolic array. The image is partitioned into tiles (labeled, for example, (1,2) and (2,1)). Each tile is mapped in turn onto the same fixed-size array of processing elements (PEs), with FIFOs and I/O ports at the array boundary.]
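To make the tiling idea concrete, here is a small Python sketch of a matrix-vector product processed in tiles no larger than a fixed array size `m`. The square `m x m` tile shape, the traversal order, and the name `tiled_matvec` are assumptions for illustration; the slides do not specify them.

```python
def tiled_matvec(F, u, m):
    """Compute y = F u by sweeping tiles of at most m x m over F."""
    rows, cols = len(F), len(u)
    y = [0] * rows  # partial sums kept between passes (buffer role)
    for row0 in range(0, rows, m):        # tile row index
        for col0 in range(0, cols, m):    # tile column index
            # One pass of the fixed-size array over the current tile.
            for i in range(row0, min(row0 + m, rows)):
                acc = y[i]
                for k in range(col0, min(col0 + m, cols)):
                    acc += F[i][k] * u[k]
                y[i] = acc
    return y
```

The result does not depend on `m`; only the number of passes over the fixed-size array changes, which is the trade-off the tiling technique exploits.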

Fixed-Sized NoC-PAs-based Robust SS vector co-processor

Stage 1 (PA1): $V_{TEMP\_1} = F u$, where $F$ is $(n \times n)$ and $u$ and $V_{TEMP\_1}$ are $(n \times 1)$.

[Figure: Stage 1 data path. The degraded large-scale RS image (1k × 1k) supplies the data vectors $u^{(1)}, u^{(2)}, u^{(3)}, \ldots, u^{(n)}$. Under global control (full/en signals), the embedded processor loads $u$ and $F$ into 32-bit FIFO buffers. The entries of $F$ reach the fixed-sized PA1 ($m = 32$) as skewed, zero-padded data: $F_{1,1}, 0, 0, 0$; then $F_{2,1}, F_{1,2}, 0, 0$; then $F_{3,1}, F_{2,2}, F_{1,3}, 0$; and so on. Each PE multiplies [32,23] operands, truncates the [64,46] product to [32,23], and passes the partial sum $V_{TEMP\_1}$ on through a one-step delay. Tiled control and local control sequence the array. Legend: D = one-step delay, T = truncate.]
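The skewed feeding of $F$ in Stage 1 can be imitated in software. The sketch below is a simplified model, not the presented hardware: it assumes that PE `j` holds `u[j]`, that column `j` of `F` arrives delayed by `j` steps, and that partial sums move one PE per step through the delay elements D. The names `skew` and `systolic_matvec` are hypothetical.

```python
def skew(F):
    """Return the skewed stream: stream[t][j] is what PE j reads at step t."""
    rows, cols = len(F), len(F[0])
    steps = rows + cols - 1
    return [
        [F[t - j][j] if 0 <= t - j < rows else 0 for j in range(cols)]
        for t in range(steps)
    ]


def systolic_matvec(F, u):
    """Simulate a linear array computing F u from the skewed stream."""
    cols = len(u)
    pipe = [0] * cols  # pipe[j]: partial sum leaving PE j at this step
    out = []
    for t, inputs in enumerate(skew(F)):
        # Update right to left so each PE still sees its left
        # neighbour's value from the previous step (the D delay).
        for j in range(cols - 1, -1, -1):
            left = pipe[j - 1] if j > 0 else 0
            pipe[j] = left + inputs[j] * u[j]
        # The first finished row leaves the last PE after cols - 1 steps.
        if t >= cols - 1:
            out.append(pipe[cols - 1])
    return out
```

For a 2 × 2 matrix `[[1, 2], [3, 4]]`, `skew` yields the rows `[1, 0]`, `[3, 2]`, `[0, 4]`, which mirrors the zero-padded pattern shown on the slide.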

Fixed-Sized NoC-PAs-based Robust SS vector co-processor

Stage 2 (PA2): $V_{TEMP\_2} = V_{TEMP\_1} u^{+}$, where $V_{TEMP\_2}$ is $(n \times n)$, $V_{TEMP\_1}$ is $(n \times 1)$ and $u^{+}$ is $(1 \times n)$. Overall, PA2 forms $F u u^{+}$.

[Figure: Stage 2 data path. The fixed-sized PA2 ($m = 4$) is a 2-D array of PEs linked by one-step delays (D). The entries of $V_{TEMP\_1}$ enter in blocks of $m$: indices $(1,1) \ldots (m,1)$, then $(m+1,1) \ldots (2m,1)$, up to $(255m+1,1) \ldots (n,1)$. The entries of $u$ enter in the same way: $u_{1,1} \ldots u_{1,m}$, then $u_{1,m+1} \ldots u_{1,2m}$, up to $u_{1,255m+1} \ldots u_{1,n}$. Operands are in [32,23] format; each [64,46] product is truncated (T) to [32,23]. Output: $V_{TEMP\_2}$, of size $(n \times n)$.]
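The [32,23] and [64,46] labels next to the truncation block T suggest fixed-point arithmetic. The sketch below assumes that [W,F] means a W-bit two's complement word with F fractional bits, so that the product of two [32,23] values is a [64,46] value that T cuts back to [32,23]. This reading, the wrap-around on overflow, and all function names are assumptions, not details confirmed by the slides.

```python
FRAC_BITS = 23   # fractional bits of the assumed [32,23] format
WORD_BITS = 32   # total word length of the assumed [32,23] format


def to_fixed(x):
    """Quantize a real number to the assumed [32,23] format."""
    return int(round(x * (1 << FRAC_BITS)))


def to_float(q):
    """Convert an assumed [32,23] integer back to a real number."""
    return q / (1 << FRAC_BITS)


def truncate(p):
    """T: reduce a [64,46] product to [32,23]."""
    q = p >> FRAC_BITS               # drop 23 low fractional bits (floors)
    q &= (1 << WORD_BITS) - 1        # keep the low 32 bits
    if q >= 1 << (WORD_BITS - 1):    # reinterpret as two's complement
        q -= 1 << WORD_BITS
    return q


def fixed_mul(a, b):
    """Multiply two [32,23] values and truncate the [64,46] result."""
    return truncate(a * b)
```

Dropping bits with a right shift rounds toward minus infinity, so a real design would have to decide whether that bias is acceptable or whether rounding and saturation are needed instead.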

Fixed-Sized NoC-PAs-based Robust SS vector co-processor

Stage 3 (PA3): $V = \mathrm{diag}\{V_{TEMP\_2} F^{+}\}$, where $V$ is $(n \times 1)$ and $V_{TEMP\_2}$ is $(n \times n)$. Overall, PA3 forms $\mathrm{diag}\{F u u^{+} F^{+}\}$.

[Figure: Stage 3 data path. The fixed-sized PA3 ($m = 32$) receives the entries of $F$ ($F_{1,1}$; $F_{1,2}, F_{2,1}$; $F_{1,3}, F_{2,2}, F_{3,1}$; ... $F_{m,m}$) and of $V_{TEMP\_2}$ as skewed, zero-padded data under tiled control. The PEs accumulate the entries of $V$. The robust SS vector $V$, with entries $v_{1,1}, v_{1,2}, \ldots, v_{n,n}$, leaves through a 32-bit FIFO buffer under global control (full signal) to the embedded processor.]

Fixed-Sized NoC-PAs-based RBR estimator co-processor

$\hat{b}_{RBR} = b_0 + \Omega V$

[Figure: RBR estimator data path. The embedded processor loads $\Omega$, $V$ and $b_0$ into 32-bit FIFO buffers under global control (full/en signals). The entries of $\Omega$ reach the fixed-sized PA as skewed, zero-padded data, and the entries $V_{1,1}, V_{1,2}, \ldots, V_{1,m}$ are held in the PEs behind one-step delays. Each PE multiplies [32,23] operands, truncates the [64,46] product to [32,23], and passes the partial sum on under tiled and local control. After $b_0$ is added, the result $\hat{b}_{RBR}$ returns through a 32-bit FIFO buffer to the embedded processor. The estimates $\hat{b}_{RBR}^{(1)}, \hat{b}_{RBR}^{(2)}, \hat{b}_{RBR}^{(3)}, \ldots, \hat{b}_{RBR}^{(n)}$ form the reconstructed RS image (1k × 1k). Legend: D = one-step delay, T = truncate.]

New Perspective: VLSI-FPGA Platforms

The novel VLSI-FPGA platform represents a new perspective for real-time processing of newer RS applications.

VLSI-FPGA Platform

[Figure: block diagram of the VLSI-FPGA platform. Labeled blocks: control system, embedded processor, spatial-temporal reorder, FPGA, buffer memory, FIFO, bit-level MPPA architecture, VLSI co-processor. Signals: image data $u$ and the robustified reconstruction operator $F$. Input: the degraded large-scale RS image (1k × 1k), supplied as data vectors $u^{(j)}$. Output: the reconstructed RS image (1k × 1k), assembled from the estimates $\hat{b}_{RFS}^{(j)}$.]

Performance Analysis: FPGA

Synthesis metrics    Robust SS vector    RBR estimator
Slices               8158                3289
*DSP48               144                 32
^LUTs                7539                2278
Flip-Flops           6304                2788

Performance Analysis: FPGA

Implementation                           Processing time (seconds), RBR
Evaluated PC-Oriented Implementation     19.7
Proposed Efficient RBR architecture      1.26

Conclusions

The implementation results show that the proposed NoC-PA-oriented architecture drastically reduces the overall processing time of the RBR algorithm. The presented architecture is implemented efficiently in MPSoC mode, instead of employing systems based on traditional DSPs or PC-cluster platforms.

The implementation of the RBR algorithm using the proposed architecture takes only 1.26 seconds for the large-scale RS image reconstruction, in contrast to the 19.7 seconds required by the C++ implementation. Thus, the achieved processing time is approximately 16 times shorter than that of the conventional C++ PC-based implementation.
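As a quick check of the quoted factor, using only the two processing times reported above (the symbols $t_{\mathrm{PC}}$ and $t_{\mathrm{MPSoC}}$ are labels introduced here, not taken from the slides):

```latex
\frac{t_{\mathrm{PC}}}{t_{\mathrm{MPSoC}}}
  = \frac{19.7\ \mathrm{s}}{1.26\ \mathrm{s}}
  \approx 15.6
```

This rounds to the factor of about 16 stated in the conclusions.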


Recent Selected Journal Papers


A. Castillo Atoche, D. Torres, Yuriy V. Shkvarko, “Towards Real Time Implementation of Reconstructive Signal Processing Algorithms Using Systolic Arrays Coprocessors”, JOURNAL OF SYSTEMS ARCHITECTURE (JSA), Edit. ELSEVIER, Volume 56, Issue 8, August 2010, Pages 327-339, ISSN: 1383-7621, doi:10.1016/j.sysarc.2010.05.004. JCR.

A. Castillo Atoche, D. Torres, Yuriy V. Shkvarko, “Descriptive Regularization-Based Hardware/Software Co-Design for Real-Time Enhanced Imaging in Uncertain Remote Sensing Environment”, EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING (JASP), Edit. HINDAWI, Volume 2010, 31 pages, 2010. ISSN: 1687-6172, e-ISSN: 1687-6180, doi:10.1155/ASP. JCR.

Yuriy V. Shkvarko, A. Castillo Atoche, D. Torres, “Near Real Time Enhancement of Geospatial Imagery via Systolic Implementation of Neural Network-Adapted Convex Regularization Techniques”, JOURNAL OF PATTERN RECOGNITION LETTERS, Edit. ELSEVIER, 2011. JCR. In Press

Thanks for your attention.

Dr. Alejandro Castillo Atoche

Email: acastill@uady.mx