

Derivation of Efficient FSM from Polyhedral Loop Nests

Tomofumi Yuki, Antoine Morvan, Steven Derrien

INRIA/Université de Rennes 1


High-Level Synthesis

Writing HDL is too costly
  Led to the emergence of HLS tools

HLS is sensitive to the input source
  Must be written in a "HW-aware" manner

Source-to-source transformations
  Common in optimizing compilers
  (Semi-)automated exploration at the HLS stage
  Further enhance productivity/performance


HLS Specific Transformations

Not all optimizing compiler transformations make sense in an embedded context
  The converse is also true

Finite state machines are an example
  for loops are preferred in a general-purpose context

    for i
      for j
        S0
      for k
        S1

FSM derivation turns the loop nest into a single while loop:

    while (…)
      if (…) S0;
      if (…) S1;
      if (…) k = k+1;
      if (…) i = i+1; j = 0;


Contributions

Analytical model of loop pipelining
  Understanding when to use Nested Loop Pipelining (NLP) w.r.t. Single Loop Pipelining (SLP)

Derivation of finite state machines
  Handles imperfectly nested loops
  Based on polyhedral techniques

Pipelining of the control-path
  Computing n states ahead
  Improves performance of the control-path


Outline

Modeling Loop Pipelining
  Single Loop Pipelining
  Nested Loop Pipelining
  NLP vs SLP

FSM Derivation

Evaluation

Conclusion


Single Loop Pipelining

Overlapped execution of the innermost loop

    for i=1:M
      for j=1:N
        S(i,j);

The loop body is split into pipeline stages:

    for i=1:M
      for j=1:N
        stage0(i,j); stage1(i,j); stage2(i,j); stage3(i,j);

The stages of successive j iterations are then overlapped (one row per step):

    for i=1:M
      s0(i,1);
      s1(i,1); s0(i,2);
      s2(i,1); s1(i,2); …
      s3(i,1); s2(i,2); … s0(i,N);
      s3(i,2); … s1(i,N);
      … s2(i,N);
      s3(i,N);
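The overlapped schedule can be reproduced with a small Python sketch. It assumes a 4-stage pipeline in which one new iteration enters every cycle, so that in cycle t (counted from 0) stage s works on iteration j = t - s + 1. The function name `slp_schedule` and its parameters are illustrative and do not come from the slides.

```python
def slp_schedule(n_iters, n_stages=4):
    """Return, per cycle, the (stage, j) pairs active in one pipelined inner loop.

    Assumption: one new iteration enters the pipeline every cycle.
    """
    cycles = []
    for t in range(n_iters + n_stages - 1):
        active = []
        # Oldest iteration (deepest stage) first, as in the slide's listing.
        for s in reversed(range(n_stages)):
            j = t - s + 1  # 1-based iteration handled by stage s in cycle t
            if 1 <= j <= n_iters:
                active.append((s, j))
        cycles.append(active)
    return cycles


if __name__ == "__main__":
    for t, active in enumerate(slp_schedule(n_iters=5)):
        print(t, " ".join(f"s{s}(i,{j});" for s, j in active))
```

The first and last few cycles have fewer than four active stages; these are the fill and flush phases of the pipeline.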


Pipeline flush/fill Overhead

Overhead for each iteration of the outer loop

    for i=1:M
      for j=1:N
        s0(i,j); s1(i,j); s2(i,j); s3(i,j);

[Figure: pipeline schedule for i=1, i=2, i=3, with the under-utilized stages marked]


Nested Loop Pipelining

"Compress" by pipelining all together

[Figure: pipeline schedule with the iterations i=1, i=2, i=3 compressed together]

    for i=1:M
      for j=1:N
        s0(i,j); s1(i,j); s2(i,j); s3(i,j);

    for i=1:M
      j=j+1; j<N
        s0(i,j); s1(i,j); s2(i,j); s3(i,j);

    while(has_next)
      i,j=next(i,j)
      s0(i,j); s1(i,j); s2(i,j); s3(i,j);
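To make the "compression" concrete, the following Python sketch counts cycles for both schemes. It rests on simplifying assumptions that are not stated on the slides: a rectangular M-by-N nest, an S-stage pipeline, and one new iteration issued per cycle. Under these assumptions SLP fills and flushes the pipeline once per outer iteration, while NLP does so once for the whole nest.

```python
def slp_cycles(m, n, stages):
    """Innermost loop pipelined: each of the m outer iterations pays fill/flush."""
    return m * (n + stages - 1)


def nlp_cycles(m, n, stages):
    """Whole nest pipelined: the m*n iterations share a single fill/flush."""
    return m * n + stages - 1


if __name__ == "__main__":
    m, n, stages = 100, 8, 4
    print("SLP cycles:", slp_cycles(m, n, stages))
    print("NLP cycles:", nlp_cycles(m, n, stages))
```

The gap between the two counts grows with the number of stages and shrinks as the innermost trip count grows.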


NLP Overhead

Larger control-path
  FSM for the loop nest, instead of a single loop
  FSM for SLP is a simple check on the loop bound

Hinders maximum frequency
  Complex control-path may take longer than one data-path stage

Savings in flush/fill overhead must be greater than the loss in frequency


Modeling Trade-offs

Important parameters:
  Frequency degradation due to NLP
  Innermost trip count
  Number of pipeline stages

f*: NLP frequency normalized to SLP
  f* = 0.9 means 10% degradation in frequency

α = #stages / trip count
  larger α means larger flush/fill overhead


When is NLP Beneficial?

Model speedup as a function of f* and α

α \ f*   0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9   1.0
0.2     0.12  0.24  0.36  0.48  0.60  0.72  0.84  0.96  1.08  1.20
0.4     0.14  0.28  0.42  0.56  0.70  0.84  0.98  1.12  1.27  1.41
0.6     0.16  0.32  0.48  0.64  0.80  0.96  1.12  1.28  1.45  1.59
0.8     0.18  0.36  0.54  0.72  0.90  1.08  1.27  1.45  1.61  1.79
1.0     0.20  0.40  0.60  0.80  1.00  1.20  1.41  1.59  1.79  2.00
1.2     0.22  0.44  0.66  0.88  1.10  1.32  1.54  1.75  1.96  2.22
1.4     0.24  0.48  0.72  0.96  1.20  1.45  1.67  1.92  2.17  2.38
1.6     0.26  0.52  0.78  1.04  1.30  1.56  1.82  2.08  2.33  2.63
1.8     0.28  0.56  0.84  1.12  1.41  1.67  1.96  2.22  2.50  2.78
2.0     0.30  0.60  0.90  1.20  1.49  1.79  2.08  2.38  2.70  3.03

f* (columns): higher = less degradation
  Improving the control-path is possible

α (rows): larger = smaller innermost trip count
  Program characteristic (cannot change)
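The tabulated values agree, to within a few hundredths, with speedup ≈ f* · (1 + α). This closed form is inferred here from the numbers and is not stated on the slide. Under it, NLP pays off when f* > 1 / (1 + α). A minimal Python sketch of this assumed model, with illustrative function names:

```python
def nlp_speedup(f_star, alpha):
    """Assumed model: NLP removes a relative flush/fill overhead of alpha
    but runs at f_star times the SLP clock frequency."""
    return f_star * (1.0 + alpha)


def break_even_frequency(alpha):
    """Smallest normalized frequency at which NLP matches SLP under the model."""
    return 1.0 / (1.0 + alpha)


if __name__ == "__main__":
    for alpha in (0.2, 1.0, 2.0):
        row = " ".join(f"{nlp_speedup(f / 10, alpha):.2f}" for f in range(1, 11))
        print(f"alpha={alpha}: {row}")
        print(f"  break-even f* = {break_even_frequency(alpha):.2f}")
```

For example, with α = 1 (as many pipeline stages as inner iterations) the model tolerates a frequency drop of up to one half before NLP stops being beneficial, which matches the 1.00 entry at f* = 0.5 in the table.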


Outline

Modeling Loop Pipelining

FSM Derivation
  Polyhedral Representation
  Computing Transitions
  State Look Ahead

Evaluation

Conclusion


Polyhedral Representation

Represent loops as mathematical objects

    for i = 0:N
      for j = 0:M
        S

[Figure: iteration domain of S, with extents N and M]

    for i = 0:N
      for j = 0:i
        S0
      for k = 0:N-i
        S1

[Figure: iteration domains of S0 and S1]
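As a rough illustration of what "mathematical objects" means here, the second loop nest can be described by two integer sets, one per statement, plus a schedule that orders their points lexicographically. The Python sketch below assumes inclusive bounds (0:N includes N) and uses the common convention of inserting a constant dimension to order S0 before S1 within one iteration of i. The names `domain_s0`, `domain_s1` and `schedule` are illustrative.

```python
def domain_s0(n):
    """Iteration domain of S0: {(i, j) | 0 <= i <= n, 0 <= j <= i}."""
    return [(i, j) for i in range(n + 1) for j in range(i + 1)]


def domain_s1(n):
    """Iteration domain of S1: {(i, k) | 0 <= i <= n, 0 <= k <= n - i}."""
    return [(i, k) for i in range(n + 1) for k in range(n - i + 1)]


def schedule(n):
    """Order all statement instances by the timestamp (i, statement index, inner)."""
    stamps = [(i, 0, j) for i, j in domain_s0(n)]
    stamps += [(i, 1, k) for i, k in domain_s1(n)]
    return sorted(stamps)  # tuple comparison in Python is lexicographic


def loop_nest(n):
    """Reference: the original imperfectly nested loops, executed directly."""
    trace = []
    for i in range(n + 1):
        for j in range(i + 1):
            trace.append((i, 0, j))
        for k in range(n - i + 1):
            trace.append((i, 1, k))
    return trace


if __name__ == "__main__":
    assert schedule(3) == loop_nest(3)
    print(schedule(1))
```

Sorting the timestamps reproduces the execution order of the loops, which is the property the next function relies on.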


FSM Derivation

next function
  Find a piece-wise function that gives the immediate successor in lexicographic order
  Proposed in 1998 for low-level code generation

Direct application to FSM
  Each piece = condition of a transition
  Function = transition

Can be composed to obtain next^n
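A hand-written example may help show how each piece of next becomes one FSM transition. The Python sketch below covers the imperfect nest from the Polyhedral Representation slide, assuming inclusive bounds. It was derived by hand for illustration rather than produced by the polyhedral algorithm, and `next_n` simply applies next repeatedly instead of building a composed piece-wise function symbolically. All names are illustrative.

```python
def next_state(state, n):
    """Piece-wise successor for: for i = 0:n { for j = 0:i S0; for k = 0:n-i S1 }.

    A state is (i, stmt, x), with stmt 0 for S0 and 1 for S1.
    Returns None after the last point of the nest.
    """
    i, stmt, x = state
    if stmt == 0 and x < i:
        return (i, 0, x + 1)   # stay in the j loop
    if stmt == 0:
        return (i, 1, 0)       # x == i: j loop done, enter the k loop
    if x < n - i:
        return (i, 1, x + 1)   # stay in the k loop
    if i < n:
        return (i + 1, 0, 0)   # k loop done, start the next outer iteration
    return None


def next_n(state, n, steps):
    """Apply next `steps` times (a stand-in for the composed next^steps)."""
    for _ in range(steps):
        if state is None:
            break
        state = next_state(state, n)
    return state


def run_fsm(n):
    """While-loop FSM: visit every state, starting from the first point."""
    trace, state = [], (0, 0, 0)
    while state is not None:
        trace.append(state)
        state = next_state(state, n)
    return trace


def loop_nest(n):
    """Reference: the original loops, executed directly."""
    trace = []
    for i in range(n + 1):
        for j in range(i + 1):
            trace.append((i, 0, j))
        for k in range(n - i + 1):
            trace.append((i, 1, k))
    return trace


if __name__ == "__main__":
    assert run_fsm(3) == loop_nest(3)
    print(run_fsm(1))
```

Each `if` branch pairs a condition with an update, mirroring "each piece = condition of a transition".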


State Look Ahead

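Pipelining the control-flow
  When the data-path is heavily pipelined, the control-path becomes the critical path

Computing n states ahead
  Allows n-stage pipelining of the control-path

[Figure: control-path using next2, with states (i,j), (i',j') and (i'',j'') feeding the data-path, compared with a control-path using next, with states (i,j) and (i',j')]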
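The look-ahead idea can be sketched in Python for a simple rectangular nest. The sketch assumes a hypothetical domain 0 <= i < m, 0 <= j < n with m, n >= 1, and writes next2 as a plain composition of next rather than as a flattened piece-wise function. The point it illustrates is that each new state depends only on the state from two steps earlier, which is what would leave room for a two-stage control-path.

```python
def next_state(state, m, n):
    """Successor in lexicographic order for: for i in 0..m-1, for j in 0..n-1."""
    if state is None:
        return None
    i, j = state
    if j < n - 1:
        return (i, j + 1)
    if i < m - 1:
        return (i + 1, 0)
    return None


def next2(state, m, n):
    """Two-step look-ahead, written here as next applied twice."""
    return next_state(next_state(state, m, n), m, n)


def run_with_lookahead(m, n):
    """Drive the data-path from two state registers, cur and ahead."""
    cur = (0, 0)
    ahead = next_state(cur, m, n)  # second register, initialised once
    trace = []
    while cur is not None:
        trace.append(cur)                     # data-path consumes cur
        cur, ahead = ahead, next2(cur, m, n)  # new ahead depends only on old cur
    return trace


if __name__ == "__main__":
    expected = [(i, j) for i in range(3) for j in range(4)]
    assert run_with_lookahead(3, 4) == expected
    print(run_with_lookahead(2, 2))
```

In software the two versions behave identically; the benefit only appears in hardware, where the computation of `ahead` may be spread over two clock cycles.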


Other Optimizations

Merging transitions
  The computed next can have many transitions
  Some can be merged by looking at their context

    next(i,j) = (i,j+1) if i<N
    next(i,j) = (N,j+1) if i=N

  merged into

    next(i,j) = (i,j+1) if i≤N

Common sub-expressions
  HLS tools sometimes fail to catch them

    if (a>b && c>d) A;
    if (a>b && e>f) B;

  rewritten as

    x = a>b;
    if (x && c>d) A;
    if (x && e>f) B;


Evaluation Methodology

Focus on control-path
  empty data-path (incrementing arrays)
  independent iterations
  loops with different shapes

3 versions: different pipelining
  SLP: innermost loop
  NLP: all loops
  FSM-LA2: while loop of FSM with next2


Evaluation: HLS Phase

Maximum Target Frequency

[Figure: bar chart comparing SLP, NLP and FSM-LA2 on rect 2d, rect 3d, triangular 2d and triangular 3d; vertical axis from 0 to 500]


Evaluation: Synthesized Design

Achieved Frequency

[Figure: bar chart comparing SLP, NLP and FSM-LA2 on rect 2d, rect 3d, triangular 2d and triangular 3d; vertical axis from 0 to 500]


Conclusion

Improved FSM generation from for loops
  Example of an HLS-specific transformation
  State look ahead to pipeline the control-path
  HLS tools currently lack compiler optimizations

Applied to Nested Loop Pipelining
  Enlarge applicability by reducing its overhead

Future directions
  Other uses of the next function
  Other HLS-specific transformations
