Page 1: A Multi-Processor System on Chip Architecture for Real Time Remote Sensing Data Processing

School of Engineering, Autonomous University of Yucatan, Merida, Mexico.

2011/07/25

Presenter: Dr. Alejandro Castillo Atoche

A Multi-Processor System on Chip Architecture for Real Time Remote Sensing Data Processing

IGARSS’11

Page 2: A Multi-Processor System on Chip Architecture for Real Time Remote Sensing Data Processing

Outline

Introduction
Previous Work
MPSoC via the HW/SW Co-design
Case Study: RBR Algorithms
Algorithm Analysis
Network on Chip (NoC)-based Accelerator Integration in a Co-design Scheme
New Perspective: Network of FPGA-VLSI Architectures
Hardware Implementation Results
Performance Analysis
Conclusions

Page 3: A Multi-Processor System on Chip Architecture for Real Time Remote Sensing Data Processing

Introduction: Radar Imagery, Facts

The initial problem addressed by this proposal for geospatial remote sensing (RS) imagery is to solve the ill-conditioned inverse problem of spatial spectrum pattern (SSP) estimation under model uncertainties, via the Bayesian minimum risk (BMR) estimation strategy.

In previous work, alternative MPSoC proposals have been developed, but without systolic array techniques or Network on a Chip structures.

Page 4: A Multi-Processor System on Chip Architecture for Real Time Remote Sensing Data Processing

Introduction: HW implementation, Facts

Why a Multiprocessor System on a Chip (MPSoC)? Because MPSoCs are single-chip multiprocessors designed for real-time signal processing applications.

Why Network on a Chip accelerators? Networks on chip (NoCs) are multiprocessor interconnection networks designed to achieve real-time signal processing (SP). They avoid bottlenecks in HW/SW co-designs.

Page 5: A Multi-Processor System on Chip Architecture for Real Time Remote Sensing Data Processing

MOTIVATION

To conceptualize and efficiently implement an architecture that aggregates parallel computing and systolic array mapping techniques in a novel Network on a Chip (NoC) accelerator scheme via the HW/SW co-design paradigm.

Page 6: A Multi-Processor System on Chip Architecture for Real Time Remote Sensing Data Processing

CONTRIBUTIONS:

First, a high-speed robust Bayesian regularization (RBR) hardware accelerator is designed for the real-time enhancement of large-scale geospatial imagery.

Second, an efficient architecture based on Network on a Chip (NoC) is developed using High Performance Computing techniques.

Page 7: A Multi-Processor System on Chip Architecture for Real Time Remote Sensing Data Processing

Algorithmic ref. Implementation

Page 8: A Multi-Processor System on Chip Architecture for Real Time Remote Sensing Data Processing

Algorithmic ref. Implementation

Page 9: A Multi-Processor System on Chip Architecture for Real Time Remote Sensing Data Processing

Algorithmic ref. Implementation

Method          RSF                        RBR
SNR [dB]        15       20       25       15       20       25
IOSNR [dB]      10.15    15.32    20.25    6.15     10.62    13.04
PIOSNR (%)      81.37    86.62    85.24    95.18    90.29    98.24
MSE             0.16     0.46     0.57     0.03     0.29     0.34

Page 10: A Multi-Processor System on Chip Architecture for Real Time Remote Sensing Data Processing

Partitioning Stage

[Figure: HW/SW partitioning on the FPGA. The embedded processor handles data acquisition of $u^{(j)}$, the algorithm parameters, and the pre-computed SW stage ($F$, $\Omega$). Coprocessor 1 computes the robust SS vector and Coprocessor 2 computes the RBR estimator.]

Robust SS vector (Coprocessor 1): $V = \{ F u u^{+} F^{+} \}_{\mathrm{diag}}$

RBR estimator (Coprocessor 2): $\hat{b}_{RBR} = b_0 + \Omega V$
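The split between the two coprocessors can be illustrated with a minimal software sketch. It assumes the slide's equations read $V = \{F u u^{+} F^{+}\}_{\mathrm{diag}}$ and $\hat{b}_{RBR} = b_0 + \Omega V$, with $^{+}$ taken as the conjugate transpose. The function names `matvec`, `robust_ss_vector` and `rbr_estimate` are hypothetical and do not come from the slides.

```python
def matvec(M, x):
    """Plain matrix-vector product, M given as a list of rows."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


def robust_ss_vector(F, u):
    """Coprocessor 1 (sketch): V = diag(F u u^+ F^+), computed in three stages."""
    n = len(u)
    # Stage 1: temp1 = F u, an n-vector.
    temp1 = matvec(F, u)
    # Stage 2: temp2 = temp1 u^+, an n x n outer product.
    temp2 = [[a * b.conjugate() for b in u] for a in temp1]
    # Stage 3: keep only the diagonal of temp2 F^+.
    # Entry i is sum_k temp2[i][k] * conj(F[i][k]), which is real.
    return [
        sum(temp2[i][k] * F[i][k].conjugate() for k in range(n)).real
        for i in range(n)
    ]


def rbr_estimate(b0, Omega, V):
    """Coprocessor 2 (sketch): b_RBR = b0 + Omega V (assumed form)."""
    return [b + w for b, w in zip(b0, matvec(Omega, V))]


# Small made-up example, only to exercise the data flow.
F = [[1, 1j], [2, 0]]
u = [1 + 1j, 2]
V = robust_ss_vector(F, u)
b_rbr = rbr_estimate([1, 2], [[0.5, 0], [0, 0.25]], V)
```

Entry i of the result equals the squared magnitude of entry i of F u, so a software reference only needs Stage 1 followed by an element-wise squared magnitude. The three explicit stages are kept here because they mirror the PA1, PA2 and PA3 decomposition shown on the later slides.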

Page 11: A Multi-Processor System on Chip Architecture for Real Time Remote Sensing Data Processing

NoC oriented structure of the proposed coprocessors

(a) Robust SS vector

[Figure: data path of the robust SS vector coprocessor. From Embedded Processor → Input FIFO (data input: $u$, $F$) → fixed-sized PA1 ($m = 32$), computing $F u$ → FIFO and memory buffer → fixed-sized PA2 ($m = 4$), computing $F u u^{+}$ → FIFO and memory buffer → fixed-sized PA3 ($m = 32$), computing $\{F u u^{+} F^{+}\}_{\mathrm{diag}}$ → Output FIFO (data output: $V$) → To Embedded Processor. Each PA is driven by tiled control blocks.]

Page 12: A Multi-Processor System on Chip Architecture for Real Time Remote Sensing Data Processing

NoC oriented structure of the proposed coprocessors

(b) RBR estimator

[Figure: data path of the RBR estimator coprocessor. From Embedded Processor → Input FIFO (data input: $\Omega$, $V$) → fixed-sized PA4 ($m = 32$), computing $\Omega V$ → FIFO and memory buffer → combination with $b_0$ → Output FIFO (data output: $\hat{b}_{RBR}$) → To Embedded Processor. The PA is driven by tiled control blocks.]

Page 13: A Multi-Processor System on Chip Architecture for Real Time Remote Sensing Data Processing

Aggregation of parallel computing techniques

Application

for (tile = 0; tile < L; tile++) {
    for (i = 0; i < m; i++) {
        for (j = 0; j < n; j++) {
            for (k = 0; k < r; k++) {
                a(i,j,k) = a(i,j-1,k);
                b(i,j,k) = b(i-1,j,k);
                c(i,j,k) = c(i,j,k-1) + a(i,j,k)*b(i,j,k);
            }
        }
    }
}

[Figure: 3-D Dependence Graph (DG) of the loop nest and its SFG projection. The elements A[0,0] to A[2,2] and B[0,0] to B[2,2] enter the array as skewed streams. Each PE receives a[i,j-1] and b[i-1,j], passes on a[i,j] and b[i,j], and updates c[i,j] with one multiplier (mul) and one adder (add).]

Linear schedule: a set of parallel and uniformly spaced hyperplanes.
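The loop nest and linear schedule on this slide can be exercised with a short simulation. This is a sketch under stated assumptions: the recurrences are read as a single-assignment matrix product, with a(i,-1,k) initialised to A[i][k], b(-1,j,k) to B[k][j] and c(i,j,-1) to 0, and the schedule is taken as the hyperplanes t = i + j + k. The slides do not state the schedule vector, and the name `scheduled_matmul` is hypothetical.

```python
def scheduled_matmul(A, B):
    """Evaluate the recurrences hyperplane by hyperplane (t = i + j + k).

    A is m x r, B is r x n. Returns (C, steps), where steps is the number of
    hyperplanes, i.e. the time steps an ideal array would need.
    """
    m, r, n = len(A), len(B), len(B[0])
    a, b, c = {}, {}, {}
    steps = 0
    for t in range(m + n + r - 2):
        # Every node on hyperplane t depends only on nodes of hyperplane t - 1,
        # so all of them could execute in parallel.
        for i in range(m):
            for j in range(n):
                k = t - i - j
                if not 0 <= k < r:
                    continue
                a[i, j, k] = A[i][k] if j == 0 else a[i, j - 1, k]
                b[i, j, k] = B[k][j] if i == 0 else b[i - 1, j, k]
                prev = 0 if k == 0 else c[i, j, k - 1]
                c[i, j, k] = prev + a[i, j, k] * b[i, j, k]
        steps += 1
    C = [[c[i, j, r - 1] for j in range(n)] for i in range(m)]
    return C, steps


C, steps = scheduled_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

For m = n = r = 2 the sequential loop nest performs 8 multiply-accumulate operations, while the schedule groups them into 4 hyperplanes. That reduction is what the projection onto a processor array exploits.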

Page 14: A Multi-Processor System on Chip Architecture for Real Time Remote Sensing Data Processing

Tiling technique

[Figure: a large-scale real-world image is partitioned into tiles that are fed through I/O ports and FIFOs into a fixed-size systolic array of PEs.]

Page 15: A Multi-Processor System on Chip Architecture for Real Time Remote Sensing Data Processing

Tiling technique

[Figure: successive tiles of the large-scale real-world image, such as tiles (1,2) and (2,1), are mapped in turn onto the same fixed-size systolic array of PEs, with FIFOs and I/O ports at the array boundary.]
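The tiling idea can be sketched in software as a large matrix-vector product processed in m × m blocks, each block standing for one pass through the fixed-size array. This is only an illustration. The block order, the handling of partial edge tiles and the name `tiled_matvec` are assumptions, not details taken from the slides.

```python
def tiled_matvec(F, u, m):
    """Compute y = F u using passes over tiles of at most m x m entries.

    Returns (y, tiles), where tiles is the number of array passes.
    """
    n_rows, n_cols = len(F), len(u)
    y = [0] * n_rows
    tiles = 0
    for r0 in range(0, n_rows, m):          # tile row origin
        for c0 in range(0, n_cols, m):      # tile column origin
            # One pass of the fixed-size array over the current tile.
            for i in range(r0, min(r0 + m, n_rows)):
                acc = 0
                for j in range(c0, min(c0 + m, n_cols)):
                    acc += F[i][j] * u[j]
                # Partial results are accumulated across tiles of the same row.
                y[i] += acc
            tiles += 1
    return y, tiles


F = [[i * 5 + j for j in range(5)] for i in range(5)]
u = [1, 2, 3, 4, 5]
y, tiles = tiled_matvec(F, u, 2)
```

The array size m stays fixed while the number of passes grows with the image size, so the same hardware can serve problems of any size at the cost of more tile passes.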

Page 16: A Multi-Processor System on Chip Architecture for Real Time Remote Sensing Data Processing

Fixed-Sized NoC-PAs-based Robust SS vector co-processor

Stage 1 (PA1): $V_{\mathrm{TEMP\_1}} = F u$, with dimensions $(n \times 1) = (n \times n)(n \times 1)$.

[Figure: the degraded large-scale RS image supplies the data vectors $u^{(1)}, u^{(2)}, u^{(3)}, \ldots, u^{(n)}$. The entries of $u$ and $F$ arrive from the embedded processor through FIFO buffers (32-bit) under global control (full and en signals). The entries of $F$ enter the fixed-sized PA1 ($m = 32$) as skewed, zero-padded data, and the entries of $u$ pass through one-step delay elements. Operands in [32,23] format produce a [64,46] result that is truncated back to [32,23] while $V_{\mathrm{TEMP\_1}}$ is accumulated. Local and tiled control blocks sequence the array.]

D: one-step delay. T: truncate.
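The skewed, zero-padded input pattern of Stage 1 can be illustrated with a small cycle-level simulation of a linear array computing a matrix-vector product. This is one common design, assumed here for illustration: PE j holds u[j], receives F[t-j][j] at cycle t, and adds its product to the partial sum arriving from PE j-1. The slides do not confirm this exact arrangement, and `skew` and `systolic_matvec` are hypothetical names.

```python
def skew(F):
    """Skewed input schedule: schedule[t][j] is the entry fed to PE j at cycle t.

    Column j is delayed by j cycles and padded with zeros.
    """
    n, m = len(F), len(F[0])
    return [
        [F[t - j][j] if 0 <= t - j < n else 0 for j in range(m)]
        for t in range(n + m - 1)
    ]


def systolic_matvec(F, u):
    """Simulate the linear array cycle by cycle and collect y = F u."""
    m = len(u)
    pipe = [0] * m              # pipe[j]: partial sum latched at the output of PE j
    y = []
    for t, row in enumerate(skew(F)):
        new = [0] * m
        for j in range(m):
            # Each PE reads its left neighbour's value from the previous cycle.
            incoming = pipe[j - 1] if j > 0 else 0
            new[j] = incoming + row[j] * u[j]
        pipe = new
        if t >= m - 1:          # the last PE emits one finished result per cycle
            y.append(pipe[m - 1])
    return y


F = [[1, 2, 3], [4, 5, 6]]
schedule = skew(F)
y = systolic_matvec(F, [1, 1, 2])
```

The skew ensures that the operands belonging to one output row meet the travelling partial sum at the right PE, which is why the input triangle on the slide is padded with zeros.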

Page 17: A Multi-Processor System on Chip Architecture for Real Time Remote Sensing Data Processing

Fixed-Sized NoC-PAs-based Robust SS vector co-processor

Stage 2 (PA2): $V_{\mathrm{TEMP\_2}} = V_{\mathrm{TEMP\_1}} u^{+}$, with dimensions $(n \times n) = (n \times 1)(1 \times n)$.

[Figure: the fixed-sized PA2 ($m = 4$) is a grid of PEs with one-step delay elements (D) between them. The entries of $V_{\mathrm{TEMP\_1}}$ and of $u$ are streamed into the grid tile by tile. Operands in [32,23] format produce a [64,46] result that is truncated (T) back to [32,23] to form the entries of $V_{\mathrm{TEMP\_2}}$.]

Page 18: A Multi-Processor System on Chip Architecture for Real Time Remote Sensing Data Processing

Fixed-Sized NoC-PAs-based Robust SS vector co-processor

Stage 3 (PA3): $V = \mathrm{diag}\{ V_{\mathrm{TEMP\_2}} F^{+} \}$, where $V$ is the $(n \times 1)$ robust SS vector.

[Figure: the entries of $V_{\mathrm{TEMP\_2}}$ and of $F$ enter the fixed-sized PA3 ($m = 32$) as skewed, zero-padded streams under tiled control. The resulting entries $v_{1,1}, v_{1,2}, \ldots, v_{n,n}$ of the robust SS vector $V$ are written to a FIFO buffer (32-bit) under global control (full signal) and sent to the embedded processor.]

Page 19: A Multi-Processor System on Chip Architecture for Real Time Remote Sensing Data Processing

Fixed-Sized NoC-PAs-based RBR estimator co-processor

RBR estimator: $\hat{b}_{RBR} = b_0 + \Omega V$

[Figure: the entries of $V$ and $\Omega$ arrive from the embedded processor through FIFO buffers (32-bit) under global control (full and en signals). The entries of $\Omega$ enter the fixed-sized PA as skewed, zero-padded data, and the entries of $V$ pass through one-step delay elements. Operands in [32,23] format produce a [64,46] result that is truncated back to [32,23]. The vector $b_0$ arrives through its own FIFO buffer, and the result $\hat{b}_{RBR}$ is returned to the embedded processor through an output FIFO buffer to form the reconstructed RS image $\hat{b}_{RBR}^{(1)}, \hat{b}_{RBR}^{(2)}, \hat{b}_{RBR}^{(3)}, \ldots, \hat{b}_{RBR}^{(n)}$. Local and tiled control blocks sequence the array.]

D: one-step delay. T: truncate.
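The [32,23] and [64,46] annotations together with the truncate (T) element suggest fixed-point arithmetic inside each PE. The sketch below assumes that [w,f] denotes a w-bit word with f fractional bits, which the slides do not state, and it ignores overflow handling. The helper names are hypothetical.

```python
FRAC_BITS = 23                       # assumed fractional bits of the [32,23] format


def to_fixed(x):
    """Quantise a real value to an integer holding FRAC_BITS fractional bits."""
    return int(round(x * (1 << FRAC_BITS)))


def to_float(q):
    """Convert a [32,23]-style integer back to a float."""
    return q / (1 << FRAC_BITS)


def fixed_mul(a, b):
    """Multiply two [32,23] values and truncate the result back to [32,23].

    The raw product carries 2 * FRAC_BITS = 46 fractional bits, as in [64,46].
    """
    product = a * b
    return product >> FRAC_BITS      # T: drop the 23 least significant bits


def fixed_mac(acc, a, b):
    """One multiply-accumulate step of a PE: acc + a * b, all in [32,23]."""
    return acc + fixed_mul(a, b)


result = to_float(fixed_mac(to_fixed(1.0), to_fixed(0.5), to_fixed(0.5)))
```

Truncating after every product keeps the datapath at 32 bits between PEs, at the cost of an error of less than one least significant bit (2^-23) per multiplication.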

Page 20: A Multi-Processor System on Chip Architecture for Real Time Remote Sensing Data Processing

New Perspective: VLSI-FPGA Platforms

The novel VLSI-FPGA platform represents a new perspective for real-time processing of newer RS applications.

Page 21: A Multi-Processor System on Chip Architecture for Real Time Remote Sensing Data Processing

VLSI-FPGA Platform

[Figure: block diagram of the VLSI-FPGA platform. The FPGA hosts the control system, the embedded processor, the spatial-temporal reorder block, the buffer memory and a FIFO. The VLSI co-processor implements the robustified reconstruction operator on a bit-level MPPA architecture and receives the image data $u$ and $F$. Input: degraded large-scale RS image, $u^{(j)}$. Output: reconstructed RS image, $\hat{b}_{RFS}^{(j)}$.]

Page 22: A Multi-Processor System on Chip Architecture for Real Time Remote Sensing Data Processing

Performance Analysis: FPGA

Synthesis metrics of the HW co-processors

Metric          Robust SS vector    RBR estimator
Slices          8158                3289
DSP48           144                 32
LUTs            7539                2278
Flip-Flops      6304                2788

Page 23: A Multi-Processor System on Chip Architecture for Real Time Remote Sensing Data Processing

Performance Analysis: FPGA

Implementation                             RBR processing time (seconds)
Evaluated PC-oriented implementation       19.7
Proposed efficient RBR architecture        1.26

Page 24: A Multi-Processor System on Chip Architecture for Real Time Remote Sensing Data Processing

Conclusions

The implementation results show that the proposed NoC-PA-oriented architecture drastically reduces the overall processing time of the RBR algorithm. The presented architecture is efficiently implemented in MPSoC mode, instead of employing systems based on traditional DSPs or PC-cluster platforms.

The RBR algorithm implemented on the proposed architecture takes only 1.26 seconds to reconstruct the large-scale RS image, in contrast to the 19.7 seconds required by the C++ implementation. The achieved processing time is thus approximately 16 times shorter than that of the conventional PC-based C++ implementation.
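The quoted factor of about 16 follows directly from the two reported processing times. As a quick check, using only the figures given on the slides (the variable names are illustrative):

```python
pc_time_s = 19.7       # reported time of the PC-based C++ implementation
mpsoc_time_s = 1.26    # reported time of the proposed architecture

speedup = pc_time_s / mpsoc_time_s
print(f"speedup: {speedup:.1f}x")
```

The ratio is roughly 15.6, which the slides round to 16.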

Page 25: A Multi-Processor System on Chip Architecture for Real Time Remote Sensing Data Processing

Recent Selected Journal Papers

A. Castillo Atoche, D. Torres, Y. V. Shkvarko, “Towards Real Time Implementation of Reconstructive Signal Processing Algorithms Using Systolic Arrays Coprocessors”, Journal of Systems Architecture (JSA), Elsevier, Volume 56, Issue 8, August 2010, Pages 327-339, ISSN: 1383-7621, doi:10.1016/j.sysarc.2010.05.004. JCR.

A. Castillo Atoche, D. Torres, Y. V. Shkvarko, “Descriptive Regularization-Based Hardware/Software Co-Design for Real-Time Enhanced Imaging in Uncertain Remote Sensing Environment”, EURASIP Journal on Advances in Signal Processing (JASP), Hindawi, Volume 2010, 31 pages, 2010, ISSN: 1687-6172, e-ISSN: 1687-6180, doi:10.1155/ASP. JCR.

Y. V. Shkvarko, A. Castillo Atoche, D. Torres, “Near Real Time Enhancement of Geospatial Imagery via Systolic Implementation of Neural Network-Adapted Convex Regularization Techniques”, Pattern Recognition Letters, Elsevier, 2011. JCR. In press.

Page 26: A Multi-Processor System on Chip Architecture for Real Time Remote Sensing Data Processing

Thanks for your attention.

Dr. Alejandro Castillo Atoche
Email: [email protected]