
Page 1: Hierarchical Design of Parallel Architectures for Signal Processing Applications Patrice Quinton, Tanguy Risset IRISA - COSI

Hierarchical Design of Parallel Architectures for Signal Processing Applications

Patrice Quinton, Tanguy Risset
IRISA - COSI
http://www.irisa.fr/cosi/ALPHA


July 2001 Samos - 2001 2

Outline

• What is MMAlpha?
• Example of design flow
• The Alpha language
• Structured scheduling
• Performance
• Conclusion


What is MMAlpha?

A public domain silicon compiler for loop nests

[Figure: a loop nest is compiled to VHDL, targeting FPGA or ASIC]

for i = 1 to n do
  for k = 1 to m do
    y[i,k] = y[i,k-1] + w[i,k]*x[i-k]
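
The loop nest above can be run directly as a software reference model. The following is a minimal Python sketch, not MMAlpha output. The boundary conditions are assumptions that the slide does not state: y[i,0] = 0, and x[t] = 0 for any index t with no supplied sample. The function name `loop_nest` is hypothetical.

```python
def loop_nest(w, x, n, m):
    """Evaluate y[i,k] = y[i,k-1] + w[i,k]*x[i-k] for 1<=i<=n, 1<=k<=m.

    Assumptions (not on the slide): y[i,0] = 0, and x[t] = 0 when no
    sample is supplied for index t.
    """
    y = {}
    for i in range(1, n + 1):
        y[i, 0] = 0  # assumed initial value of the accumulation
        for k in range(1, m + 1):
            y[i, k] = y[i, k - 1] + w[i, k] * x.get(i - k, 0)
    return y


# Usage: unit weights, x[t] = t, so y[i,m] sums the m samples before x[i].
n, m = 4, 2
w = {(i, k): 1 for i in range(1, n + 1) for k in range(1, m + 1)}
x = {t: t for t in range(0, n + 1)}
y = loop_nest(w, x, n, m)
result = [y[i, m] for i in range(1, n + 1)]
```

Each y[i,k] depends only on y[i,k-1], so the n rows are independent. That independence is the kind of parallelism a loop-nest compiler can exploit.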


What is MMAlpha?

User controlled design process

[Figure: MMAlpha design flow. An Alpha program goes through uniformization, scheduling/mapping and HDL derivation; the flow yields several VHDL variants (Vhdl1, Vhdl2, Vhdl3).]

Design script easily reused


Target applications

Data intensive applications

• Signal processing: FIR, adaptive LMS, Kalman filtering
• Multimedia: motion estimators, 2D filters
• Bio-informatics: DNA sequencing


MMAlpha highlights

Compilation of loop nests to parallel circuits by means of the polyhedral model

for i = 1 to n do
  for k = 1 to m do
    y[i,k] = y[i,k-1] + w[i,k]*x[i-k]


MMAlpha highlights

Semi-automatic design space exploration

[Figure: MMAlpha design flow from Alpha through uniformization, with two alternative scheduling/mapping choices (Scheduling/Mapping1, Scheduling/Mapping2) ahead of HDL derivation.]


MMAlpha highlights

Hierarchical design methodology

[Figure: MMAlpha design flow (uniformization, scheduling/mapping, HDL derivation) with two Alpha programs as input.]


MMAlpha highlights

Multi-target output for codesign

[Figure: the loop nest below is compiled to both VHDL and C]

for i = 1 to n do
  for k = 1 to m do
    y[i,k] = y[i,k-1] + w[i,k]*x[i-k]



The algorithm


The algorithm

Alignment matrix


Equations

Recurrence for one point of the matrix


Iteration space


Uniformization


Uniformization


Scheduling


Mapping


Architecture


Architecture


Control


Control



Alpha code for dot product

[Figure: dot-product dependence graph along k, with inputs a(1)..a(4) and b(1)..b(4), initial value 0 and result c]

system dot:{N|1<=N} (a,b:{k|1<=k<=N} of integer)
       returns (c: integer);
var Acc:{k|0<=k<=N} of integer;
let
  Acc[k] = case
    {|k=0}: 0[];
    {|k>0}: Acc[k-1] + a[k]*b[k];
  esac;
  c[] = Acc[N];
tel;
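
For readers unfamiliar with Alpha, the recurrence defined by the system `dot` can be mirrored in a few lines of Python. This is only an illustrative sketch: Python lists are 0-based, so a[k] in Alpha becomes `a[k - 1]`, and the function name `dot` is reused for readability.

```python
def dot(a, b):
    """Mirror of the Alpha system `dot`:
    Acc[0] = 0, Acc[k] = Acc[k-1] + a[k]*b[k], c = Acc[N]."""
    n = len(a)
    assert n == len(b) and n >= 1          # parameter domain {N | 1<=N}
    acc = [0] * (n + 1)                    # Acc is defined on 0<=k<=N
    for k in range(1, n + 1):
        acc[k] = acc[k - 1] + a[k - 1] * b[k - 1]   # shift to 0-based lists
    return acc[n]                          # c[] = Acc[N]


c = dot([1, 2, 3, 4], [5, 6, 7, 8])
```

Keeping the whole `acc` array, rather than a single running sum, follows the single-assignment style of Alpha, where every Acc[k] is a distinct value.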


Matrix product using dot product

{i,j,k|1<=i,j,k<=N}: A[i,j,k] = A[i,j-1,k-1];
{i,j,k|1<=i,j,k<=N}: B[i,j,k] = B[i-1,j,k-1];
use {i,j|1<=i,j<=N} dot[N](A,B) returns (C);

[Figure: (i, j, k) iteration space showing one dot-product instance, with inputs A(1,1,1)..A(1,1,4) and B(1,1,1)..B(1,1,4), initial value 0 and result c[1,1]]
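
The `use` statement instantiates one copy of `dot` for every point (i, j) of its extension domain. A rough Python analogue is sketched below. It assumes that instance (i, j) receives row i of the first matrix and column j of the second, which is the usual reading of a matrix product but is not spelled out on the slide. The name `matmul_with_dot` is hypothetical.

```python
def dot(a, b):
    """Accumulate a[k]*b[k] over k, as in the Alpha system `dot`."""
    acc = 0
    for ak, bk in zip(a, b):
        acc += ak * bk
    return acc


def matmul_with_dot(a, b):
    """One `dot` instance per (i, j); assumes square N x N inputs."""
    n = len(a)
    return [[dot([a[i][k] for k in range(n)],      # row i of a
                 [b[k][j] for k in range(n)])      # column j of b
             for j in range(n)]
            for i in range(n)]


c = matmul_with_dot([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

The N*N instances share no intermediate values, so a layout found once for `dot` is a natural candidate for reuse across all of them.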



Classical linear schedule in MMAlpha design flow

[Figure: example loop "for k = 1 to N" with six statements A to F and their durations (1, 4, 2, 6, 10, 14); the statement bodies are not recoverable from this transcript]


Classical linear schedule in MMAlpha design flow

[Figure: dependence graph over statements A to F, with edges labelled by dependence distances and nodes annotated with durations]


Classical linear schedule in MMAlpha design flow

schedule[ scheduleType -> sameLinearPart,
          durations -> {0,0,1,4,2,6,10,14,0,0},
          addConstraints -> {/* linear constraints */} ]


Multi-dimensional scheduling

Virtual clock counter is a vector (hours, minutes, ...)

[Figure: a two-component time vector (t1, t2) with bounds in N; the formula on the slide is garbled in this transcript]
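
The "hours, minutes" analogy means that time vectors are compared lexicographically: the first component decides, and later components only break ties. A small Python sketch of that ordering follows. It relies on Python's built-in tuple comparison, and the helper name `before` is hypothetical.

```python
def before(t, u):
    """True when time vector t is scheduled strictly before time vector u."""
    return tuple(t) < tuple(u)   # Python compares tuples lexicographically


# A computation at (hour 1, minute 59) precedes one at (hour 2, minute 0).
early, late = (1, 59), (2, 0)

# Sorting a set of time vectors gives their execution order.
events = [(2, 1), (1, 3), (1, 1), (2, 0)]
ordered = sorted(events)
```

A one-dimensional schedule is the special case in which every vector has a single component.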


Multi-dimensional scheduling

Useful for
• Fast prototyping of parallelism in complex applications
  SVD (S. Robert, 1997)
  Kalman filtering (A. Mozipo, 1998)
• Efficient code generation (Quilleré, 1999)
• Structured scheduling


Extension to structured scheduling

Structured systems of recurrence equations (Dinechin97)

Example:
• Matrix product can be expressed as independent dot products.

Question:
• Provided we have a layout for the dot product, can we use it for matrix product?


Example: Matrix-Matrix product

{i,j,k|1<=i,j,k<=N}: A[i,j,k] = A[i,j-1,k-1];
{i,j,k|1<=i,j,k<=N}: B[i,j,k] = B[i-1,j,k-1];
{i,j,k|1<=i,j,k<=N}: C[i,j,k] = C[i,j,k-1] + A[i,j,k]*B[i,j,k];
{i,j|1<=i,j<=N}: c[i,j] = C[i,j,N];


Example: Matrix-Matrix product

[Figure: (i, j, k) iteration space of the matrix product, showing the accumulation for c[1,1] from A(1,1,1)..A(1,1,4) and B(1,1,1)..B(1,1,4), starting at 0]



Structured dependence graph

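SDG: A and B feed the subsystem dot, instantiated on {i,j | 1<=i,j<=N}, which produces C.

DG:
• A to ACC, on {i,j,k | 1<=i,j,k<=N}: (i,j,k) -> (i,j,k)
• B to ACC, on {i,j,k | 1<=i,j,k<=N}: (i,j,k) -> (i,j,k)
• ACC to ACC, on {i,j,k | 1<=i,j,k<=N}: (i,j,k) -> (i,j,k-1)
• ACC to C, on {i,j | 1<=i,j<=N}: (i,j) -> (i,j,N)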


What is a structured scheduling?

Schedule each computation such that
• dependencies are respected
• timing functions are positive

All instances of a given subsystem refer to the same schedule

Schedule is built from the structured dependence graph.


Necessary form of a structured scheduling

This form can be imposed in terms of linear constraints.

use {i,j | 1<=i,j<=N} dot (A,B) returns (C)

T^{MM}_A(i,j,k) = l(i,j) + T^{dot}_a(k)
T^{MM}_B(i,j,k) = l(i,j) + T^{dot}_b(k)
T^{MM}_c(i,j) = l(i,j) + T^{dot}_c()


Schedule of the dot product

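T^{dot}_a(k) = k,  T^{dot}_b(k) = k,
T^{dot}_{Acc}(k) = k + 1,  T^{dot}_c() = N + 1

[Figure: dot-product dependence graph along k, with inputs a(1)..a(4) and b(1)..b(4), initial value 0 and result c]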
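
A schedule is valid when every value is produced before it is consumed and all times are non-negative. The Python sketch below checks this for the dot product, reading the slide's schedule as T_a(k) = k, T_b(k) = k, T_Acc(k) = k + 1 and T_c() = N + 1. Two points are assumptions rather than statements from the slides: each Acc update takes one time step, and the output c is a zero-delay copy of Acc[N]. The `acc_offset` parameter is hypothetical and exists only to try other candidate schedules.

```python
def check_dot_schedule(n, acc_offset=1):
    """Check causality and positivity of T_Acc(k) = k + acc_offset."""
    t_a = lambda k: k
    t_b = lambda k: k
    t_acc = lambda k: k + acc_offset
    t_c = n + acc_offset

    # Each entry: (consumer time, producer time, assumed consumer delay).
    deps = []
    for k in range(1, n + 1):
        deps.append((t_acc(k), t_acc(k - 1), 1))   # Acc[k] reads Acc[k-1]
        deps.append((t_acc(k), t_a(k), 1))         # Acc[k] reads a[k]
        deps.append((t_acc(k), t_b(k), 1))         # Acc[k] reads b[k]
    deps.append((t_c, t_acc(n), 0))                # c is a copy of Acc[N]

    causal = all(tc >= tp + d for tc, tp, d in deps)
    non_negative = all(t >= 0 for tc, tp, _ in deps for t in (tc, tp))
    return causal and non_negative


valid = check_dot_schedule(4)                   # schedule from the slide
too_early = check_dot_schedule(4, acc_offset=0) # Acc[k] computed at time k
```

With `acc_offset=0` the check fails, because Acc[k] would be computed in the same step in which a[k] and b[k] arrive.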


Structured 1D schedule

T^{MM}_A(i,j,k) = i + j + k,
T^{MM}_B(i,j,k) = i + j + k,
T^{MM}_C(i,j) = i + j + N + 1.

use {i,j|1<=i,j<=N} dot[N](A,B) returns (C);
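
The structured 1D schedule has the form "offset of the instance plus schedule of the subsystem". The Python sketch below composes it that way, assuming the offset is l(i,j) = i + j and the dot-product schedule is T_a(k) = T_b(k) = k, T_c() = N + 1. These values are read from the slides' formulas, and the function names are hypothetical.

```python
def l(i, j):
    """Assumed start offset of the dot instance at (i, j)."""
    return i + j


def t_dot_a(k):
    return k            # also used for b, which has the same schedule


def t_dot_c(n):
    return n + 1


def t_mm_a(i, j, k):
    return l(i, j) + t_dot_a(k)     # same form for T^MM_B


def t_mm_c(i, j, n):
    return l(i, j) + t_dot_c(n)


n = 4
domain = range(1, n + 1)
inputs_match = all(t_mm_a(i, j, k) == i + j + k
                   for i in domain for j in domain for k in domain)
outputs_match = all(t_mm_c(i, j, n) == i + j + n + 1
                    for i in domain for j in domain)
last_result = max(t_mm_c(i, j, n) for i in domain for j in domain)
```

Only l(i,j) has to be found for the matrix product, since the schedule of `dot` is reused unchanged.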


2D schedule

T^{MM}_A(i,j,k) = (j, i + k),
T^{MM}_B(i,j,k) = (j, i + k),
T^{MM}_c(i,j) = (j, i + N + 1).

use {i,j|1<=i,j<=N} dot[N](A,B) returns (C);
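
One way to interpret a 2D schedule is to sort all computations by their time vector. The Python sketch below assumes the schedule reads T(i,j,k) = (j, i + k), with j as the first and slowest component. That ordering is an interpretation of the slide, not a confirmed detail.

```python
def t_mm_2d(i, j, k):
    """Assumed 2D time vector: j is the slow clock, i + k the fast one."""
    return (j, i + k)


n = 3
points = [(i, j, k)
          for i in range(1, n + 1)
          for j in range(1, n + 1)
          for k in range(1, n + 1)]

# Tuples sort lexicographically, which is the order of 2D time vectors.
ordered = sorted(points, key=lambda p: t_mm_2d(*p))

# The value of j for each computation, in execution order.
phases = [j for (_, j, _) in ordered]
```

Under this reading, all computations with j = 1 finish before any with j = 2 starts, so the same hardware can serve successive values of j.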


How to interpret this?

[Figure: (i, j, k) iteration space of the matrix product, showing the accumulation for c[1,1] from A(1,1,1)..A(1,1,4) and B(1,1,1)..B(1,1,4), starting at 0]


j=1

[Figure: animation frames for j=1 in the (i, j, k) iteration space, with the accumulation for c[1,1] highlighted]



j=2

[Figure: animation frame for j=2 in the (i, j, k) iteration space]


j=3

[Figure: animation frame for j=3 in the (i, j, k) iteration space]


Structured schedules for MM

Dot product:

T^{dot}_a(k) = k,  T^{dot}_b(k) = k,
T^{dot}_{Acc}(k) = k + 1,  T^{dot}_c() = N + 1

Matrix product:

use {i,j | 1<=i,j<=N} dot (A,B) returns (c)

T^{MM}_A(i,j,k) = i + j + k,
T^{MM}_B(i,j,k) = i + j + k,
T^{MM}_C(i,j) = i + j + N + 1.

Or

T^{MM}_A(i,j,k) = (j, i + k),
T^{MM}_B(i,j,k) = (j, i + k),
T^{MM}_c(i,j) = (j, i + N + 1).


Matrix product re-using hardware

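[Figure: matrix product re-using the dot-product hardware. A dot-product cell (multiplier, adder and Acc register, with inputs A and B and output C) processes rows A1,*, A2,*, A3,* against B*,* to produce C1,*, C2,*, C3,*.]

T^{MM}_A(i,j,k) = (j, i + k),
T^{MM}_B(i,j,k) = (j, i + k),
T^{MM}_c(i,j) = (j, i + N + 1).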


Advantage of structured scheduling

• Preserves designer's structuring
• Re-uses hardware
• Constraints are linear: uses a classical schedule tool
• Reduces the schedule computation complexity
• Improves readability of the schedule information



Experiments

• Vertex method: Simplex of Mathematica
• Farkas method: Pip software
• Evaluation of structured vs « flat » scheduling


Test set (vertex method)

Algorithm                              #Variables  #Constraints  Time (seconds)
Full adder                                     35           119             2.0
Fuzzy set (flat)                              201           698           268.0
Fuzzy set (structured)                         70           213             7.1
Mux                                            21            77             0.7
Givens                                         45           277            13.9
Dot product                                    13            32             0.1
Kalman (flat)                                 249          1177          1083.5
Kalman SQRT (flat)                            170           874           399.6
Kalman SQRT (structured)                       85           314            20.1
Matrix vector                                  16            48             0.3
Matrix multiplication                          19           101             1.3
Matrix multiplication 2                        28           120             2.0
Matrix multiplication 3 (structured)           26            92             0.6
Matrix multiplication 3 (flat)                 56           310            22.0
Neural network (structured)                    82           223             7.4
Neural network (flat)                         144           404            56.7
Gauss-Seidel                                   26           100             1.3
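
The effect of structuring is easiest to see as a ratio of scheduling times. The short Python sketch below computes that ratio for the four algorithms that appear in both a flat and a structured version in the test set. The figures are copied from the test set, and the dictionary layout is only illustrative.

```python
# (flat time, structured time) in seconds, vertex method.
timings = {
    "Fuzzy set": (268.0, 7.1),
    "Kalman SQRT": (399.6, 20.1),
    "Matrix multiplication 3": (22.0, 0.6),
    "Neural network": (56.7, 7.4),
}

# Speedup of the structured schedule computation over the flat one.
speedups = {name: flat / structured
            for name, (flat, structured) in timings.items()}

for name, ratio in sorted(speedups.items(), key=lambda kv: -kv[1]):
    print(f"{name:26s} {ratio:6.1f}x")
```

In every pair the structured version also has fewer variables and constraints, which is consistent with the shorter scheduling times.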


#Constraints vs #Variables

[Plot: number of constraints (0 to 1200) versus number of variables (0 to 250), vertex method]


Schedule time vs #constraints

[Plot: scheduling time (0 to 1200) versus number of constraints (0 to 1200), vertex method]


#Constraints vs #Variables (Farkas method)

[Plot: number of constraints (0 to 200) versus number of variables (0 to 200), Farkas method]


Schedule time vs #constraints

[Plot: scheduling time (0 to 600) versus number of constraints (0 to 1200), Farkas method]


Flat vs structured schedule

             Time flat   Time struct   Latency flat   Latency struct
             schedule    schedule      schedule       schedule
FullAdder        1.43        0.26      2+4b           1+2b
Matmult          5.65        0.39      6+2M           2+2M
Fuzzylogic     474.72        8.43      15+3M          7+3M
Kalman         511.27       41.67      26+5M          15+5M
NeuralNet       58.32        4.80      13+2M+N        5+2M+N

SVD: 7.75, 4N(M+6) (only one time and one latency are given; their columns are not recoverable from this transcript)