muhammad noman ashraf

Electrical and Computer Engineering. Muhammad Noman Ashraf. Optimization of Data-Flow Computations Using Canonical TED Representation. M. Ciesielski, D. Gomez-Prado, Q. Ren, J. Guillot and Emmanuel Boutillon, “Optimization of Data-Flow Computations Using Canonical TED Representation”, in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. ECE 667 Synthesis and Verification of Digital Systems, Spring 2011. Slides adapted from D. Gomez-Prado, Q. Ren, M. Ciesielski, J. Guillot and Emmanuel Boutillon, “Optimizing Data Flow Graphs to Minimize Hardware Implementations”, DATE (2009)

Page 1: Muhammad Noman Ashraf

Electrical and Computer Engineering

Muhammad Noman Ashraf

Optimization of Data-Flow Computations Using Canonical TED Representation

M. Ciesielski, D. Gomez-Prado, Q. Ren, J. Guillot and Emmanuel Boutillon, “Optimization of Data-Flow Computations Using Canonical TED Representation”, in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

ECE 667 Synthesis and Verification of Digital Systems, Spring 2011

Slides adapted from D. Gomez-Prado, Q. Ren, M. Ciesielski, J. Guillot and Emmanuel Boutillon, “Optimizing Data Flow Graphs to Minimize Hardware Implementations”, DATE (2009)

Page 2: Muhammad Noman Ashraf

Overview

• Motivation
• TED Review
• Related Work
• TED Decomposition System
• TED Linearization
• Product Term Extraction
• Sum-Term Extraction
• Reordering
• DFG Generation
• Replacing constant multipliers by Shifters
• Conclusion
• References

Page 3: Muhammad Noman Ashraf

Motivation

Original expression: F = a⋅f⋅g + a⋅f⋅d⋅c + a⋅c⋅e⋅g
• Number of operations: 8 MPY, 2 ADD

Factored form: F = a⋅(f⋅(g + d⋅c) + c⋅e⋅g)
• Minimum number of operations: 5 MPY, 2 ADD
• Res: 2 MPY, 1 ADD
• L = 3 MPY + 2 ADD (schedule steps 1 to 5)

Alternative form: F = (a⋅f)⋅(g + d⋅c) + (a⋅c)⋅e⋅g
• Number of operations: 6 MPY, 2 ADD
• Res: 2 MPY, 1 ADD
• L = 3 MPY + 1 ADD (schedule steps 1 to 4)

The form with the fewest operations is not the one with the shortest latency.
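The operation counts and latencies quoted on this slide can be checked mechanically. The following is a minimal sketch, not part of the original slides: it parses each form with Python's `ast` module, counts MPY and ADD nodes, and reports the (MPY, ADD) mix along the slowest path. The helper names `op_counts` and `critical_path` and the relative delays (MPY = 2, ADD = 1) are assumptions chosen only for illustration.

```python
import ast

def op_counts(expr):
    """Return (multiplications, additions) for an expression built from + and *."""
    ops = [type(n.op) for n in ast.walk(ast.parse(expr, mode="eval"))
           if isinstance(n, ast.BinOp)]
    return ops.count(ast.Mult), ops.count(ast.Add)

def critical_path(expr, mpy_delay=2, add_delay=1):
    """Return (MPY, ADD) counts along the slowest input-to-output path.

    The delays are assumed values; they only decide which operand is slower.
    """
    def walk(node):
        if not isinstance(node, ast.BinOp):
            return (0, 0)                      # a variable contributes no delay
        paths = [walk(node.left), walk(node.right)]
        m, a = max(paths, key=lambda p: p[0] * mpy_delay + p[1] * add_delay)
        return (m + 1, a) if isinstance(node.op, ast.Mult) else (m, a + 1)
    return walk(ast.parse(expr, mode="eval").body)

forms = {
    "expanded":    "a*f*g + a*f*d*c + a*c*e*g",
    "factored":    "a*(f*(g + d*c) + c*e*g)",
    "alternative": "(a*f)*(g + d*c) + (a*c)*e*g",
}
for name, expr in forms.items():
    print(name, "ops:", op_counts(expr), "critical path:", critical_path(expr))
```

Note that Python parses `a*f*g` as a chain of two-input multiplications, which matches how a DFG breaks multi-operand operations into binary ones. Resource constraints are not modeled here, so the sketch reports path composition only, not a schedule.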

Page 4: Muhammad Noman Ashraf

TED Review [Construction]

F = x⋅(z⋅u + q⋅w) + p⋅w^2 + y⋅w

Construction steps (figure not preserved):
• z⋅u and q⋅w
• (z⋅u + q⋅w)
• x⋅(z⋅u + q⋅w)
• + p⋅w^2
• + y⋅w

Canonical for the given order: x, z, u, q, p, y, w

Notation: NON-LINEAR (the w^2 term appears as an edge labeled ^2 on node w)
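As a reminder of what each TED node encodes, the worked expansion below applies the Taylor series decomposition to the function this slide appears to construct, F = x(zu + qw) + pw^2 + yw. This is a sketch added for clarity, not taken from the slides; the auxiliary name G is an assumption introduced here.

```latex
% Taylor expansion around x = 0: the decomposition a TED node for x encodes
F(x,\dots) = F\big|_{x=0} + x\,\frac{\partial F}{\partial x}\Big|_{x=0}
           + \tfrac{1}{2}\,x^{2}\,\frac{\partial^{2} F}{\partial x^{2}}\Big|_{x=0} + \dots

% Applied to F = x(zu + qw) + pw^2 + yw with top variable x (F is linear in x):
F\big|_{x=0} = p\,w^{2} + y\,w, \qquad
\frac{\partial F}{\partial x}\Big|_{x=0} = z\,u + q\,w

% The additive part G = pw^2 + yw is non-linear in w:
G\big|_{w=0} = 0, \qquad
\frac{\partial G}{\partial w}\Big|_{w=0} = y, \qquad
\tfrac{1}{2}\,\frac{\partial^{2} G}{\partial w^{2}}\Big|_{w=0} = p
```

The zeroth-order term corresponds to the additive (^0) edge, the first-order term to the multiplicative (^1) edge, and the second-order term to the ^2 edge that makes the diagram non-linear.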

Page 5: Muhammad Noman Ashraf

RELATED WORK

HDL compilers
• High-level synthesis systems (Cyber, Spark, Catapult C): lack local optimality

Kernel-based decomposition [Hosangadi et al., “Optimizing Polynomial Expressions by Algebraic Factorization and CSE”, IEEE Transactions, 2005]
• Lacks canonicity

Cut-based decomposition (TED based) [Askar et al., “Data-flow transformations using Taylor expansion diagrams,” in Proc. Des. Autom. Test Eur., 2007]
• Limitation: only applicable to TEDs with the disjoint decomposition property

Page 6: Muhammad Noman Ashraf

Cut-based decomposition (Related Work)

• Top-down approach
• Apply a series of cuts (additive and multiplicative) to the edges such that each cut separates the graph into two disjoint sub-graphs
• A different sequence of cuts results in a different DFG

Sequence – A3, A1, M1, A2

Page 7: Muhammad Noman Ashraf

Cut-based decomposition (Related Work), continued

Sequence – A1, A3, M1, A2

Sequence – A3, A1, M1, A2

Page 8: Muhammad Noman Ashraf

TED decomposition [TDS]

The cut-based decomposition mentioned earlier only works for TEDs with the disjoint decomposition property
• Many TEDs don’t have this property

New approach: bottom up
• Identify algebraic operations and extract them from the graph
• Also works for TEDs without the disjoint decomposition property
• TED-based factorization, CSE, and decomposition are jointly referred to as TED decomposition

Systematically involves
• Linearization
• Product-term extraction
• Sum-term extraction
• Reordering
• DFG generation

Page 9: Muhammad Noman Ashraf

TDS System Overview

[Flow diagram not preserved; the labels below are recovered from the figure]

TDS flow
• Inputs: matrix transforms, polynomials
• TED-based transformations (functional TED): TED linearization, variable ordering, TED factorization & decomposition, constant multiplication & shifter generation, common subexpression elimination (CSE)
• DFG-based transformations (structural DFG): static timing analysis, latency optimization, resource constraints
• Other labels: behavioral transformations, structural elements, design objectives, design constraints
• Outputs: TDS netlist, optimized DFG

HLS flow
• Inputs: C, behavioral HDL
• DFG extraction, original DFG
• High Level Synthesis (GAUT)
• RTL VHDL

Page 10: Muhammad Noman Ashraf

TED Linearization

• TED naturally represents a polynomial in its factored form
• This efficiency is missing when considering non-linear expressions

Example: F = a^2⋅c + a⋅b⋅c
• a could be factored out
• Split a^2 into a1 and a2
• F = a1⋅(a2 + b)⋅c
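To make the benefit of the split concrete, the short sketch below checks that the linearized form equals the original whenever a1 = a2 = a, and compares operation counts. It is an illustration only; `count_ops` is a hypothetical helper, and the test values are arbitrary.

```python
import ast

def count_ops(expr):
    """Return (multiplications, additions) for an expression built from + and *."""
    ops = [type(n.op) for n in ast.walk(ast.parse(expr, mode="eval"))
           if isinstance(n, ast.BinOp)]
    return ops.count(ast.Mult), ops.count(ast.Add)

original = "a*a*c + a*b*c"        # F = a^2*c + a*b*c
linearized = "a1*(a2 + b)*c"      # F = a1*(a2 + b)*c with a1 = a2 = a

# The two forms agree whenever both split variables carry the value of a.
for a, b, c in [(2, 3, 5), (-1, 4, 7), (0, 9, 1)]:
    lhs = eval(original, {}, {"a": a, "b": b, "c": c})
    rhs = eval(linearized, {}, {"a1": a, "a2": a, "b": b, "c": c})
    assert lhs == rhs

print("original:  ", count_ops(original))
print("linearized:", count_ops(linearized))
```

The split variables are only a bookkeeping device inside the TED; in the final DFG both a1 and a2 are driven by the same input a.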

Page 11: Muhammad Noman Ashraf

TED Linearization [back to previous example]

TED Decomposition

Split w^2 into w1 and w2

Page 12: Muhammad Noman Ashraf

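TED Linearization [Concept]

[Figure not preserved. Before linearization: node x has edges labeled ^0, ^1, …, ^n leading to F0, F1, …, Fn. After linearization: x is replaced by a chain of nodes x1, x2, …, xn joined by ^1 edges, with ^0 edges leading to F0, F1, …, Fn-1 and the last ^1 edge leading to Fn.]

• Split x^k = x1⋅x2⋅x3⋅…⋅xk, where xi = xj for all i, j
• Iteratively perform the splitting on high-order nodes
• The above substitution results in Horner form, which contains the minimum number of multiplications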
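The claim about Horner form can be illustrated with a small comparison. The sketch below is not from the slides: `horner` and `naive` are hypothetical helpers, and the coefficient list stands in for the cofactors F0 … Fn, here taken as plain numbers for simplicity.

```python
def horner(coeffs, x):
    """Evaluate F0 + x*(F1 + x*(F2 + ...)); return (value, multiplications)."""
    result, mults = coeffs[-1], 0
    for c in reversed(coeffs[:-1]):
        result = result * x + c       # one multiplication per nesting level
        mults += 1
    return result, mults

def naive(coeffs, x):
    """Evaluate F0 + F1*x + F2*x*x + ... term by term; return (value, multiplications)."""
    total, mults = coeffs[0], 0
    for k, c in enumerate(coeffs[1:], start=1):
        term = c
        for _ in range(k):            # k multiplications for the x^k term
            term *= x
            mults += 1
        total += term
    return total, mults

coeffs = [1, 2, 3, 4]                 # F0, F1, F2, F3
print(horner(coeffs, 2))
print(naive(coeffs, 2))
```

For a degree-n polynomial the nested form uses n multiplications, while the term-by-term evaluation above uses n(n+1)/2.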

Page 13: Muhammad Noman Ashraf

Product Term Extraction

Extractable product term: a product of variables that appear in the expression only once
• Can be extracted from the TED without duplicating any of its variables

In the TED: a set of nodes connected by a series of multiplicative edges only
• Starting and ending nodes can have incident additive edges
• Starting and ending nodes can have more than one incoming or outgoing multiplicative edge
• The ending node can be terminal node 1

[TDS] recursively identifies such terms by traversing the graph in a bottom-up fashion
• For each node, use a depth-first approach for including nodes in the product term

Page 14: Muhammad Noman Ashraf

Product-Term Extraction [back to example]

start
• u has only one * parent … YES
• u has only one child path … YES → CONTINUE
• z has only one * parent … YES
• z has only one * child path … NO → BACKTRACK

Labels in figure: zu, P1, P2
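The bottom-up walk in this example can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the TDS implementation: the `Node` class, the dictionary encoding of the TED fragment for x⋅(z⋅u + q⋅w), and the helpers `parents` and `product_term` are all hypothetical, and the stopping test mirrors only the two YES/NO questions asked on the slide.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    var: str
    mul: Optional[str] = None   # target of the multiplicative edge (None = terminal 1)
    add: Optional[str] = None   # target of the additive edge (None = no additive edge)

# Assumed encoding of the TED fragment for x*(z*u + q*w), order x, z, u, q, w
ted = {n.var: n for n in [
    Node("x", mul="z"),
    Node("z", mul="u", add="q"),
    Node("u"),
    Node("q", mul="w"),
    Node("w"),
]}

def parents(ted, var):
    """List (parent, edge_kind) pairs for every edge entering `var`."""
    found = []
    for n in ted.values():
        if n.mul == var:
            found.append((n.var, "mul"))
        if n.add == var:
            found.append((n.var, "add"))
    return found

def product_term(ted, bottom):
    """Walk upward from `bottom` while the slide's two tests both answer YES."""
    chain = [bottom]
    node = ted[bottom]
    while True:
        incoming = parents(ted, node.var)
        one_mul_parent = len(incoming) == 1 and incoming[0][1] == "mul"
        one_child_path = node.add is None
        if not (one_mul_parent and one_child_path):
            break                        # BACKTRACK: this node ends the chain
        node = ted[incoming[0][0]]       # CONTINUE: include the parent
        chain.append(node.var)
    return chain[::-1]                   # top-to-bottom order

print(product_term(ted, "u"))
print(product_term(ted, "w"))
```

Starting at u, the walk includes z and then stops because z also has an additive edge, which yields the product z⋅u; x is never absorbed into the term.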

Page 15: Muhammad Noman Ashraf

Sum Term Extraction

Extractable sum term: a sum of variables that appear in the expression only once
• Can be extracted from the TED without duplicating any of its variables

“Set of nodes incident to multiplicative edges joined at a single common node, such that nodes in question are connected by a chain of additive edges only”

[TDS] recursively identifies such terms by traversing the graph in a bottom-up fashion
• For each node, make a list of incident nodes and extract the nodes from the list if connected by additive edges only

[TDS] uses the associativity property of addition

Page 16: Muhammad Noman Ashraf

Sum-Term Extraction [back to example]

Labels in figure: start, S1, Keep support (irreducible)


Page 18: Muhammad Noman Ashraf


Example to illustrate Associativity*

S1=b+d

S2=a+c

Page 19: Muhammad Noman Ashraf

Sum-Term Extraction [cont. – back to example]

• If sum-term extraction results in more product terms, go back
• Stop when the TED is irreducible
• Now generate the DFG (to be explained later)

Page 20: Muhammad Noman Ashraf

Reordering [Back to previous example -> Iteration 2 extraction]

Labels in figure: P3, P4, P5, S2, S3

Stop when the TED is irreducible.

Page 21: Muhammad Noman Ashraf

DFG Generation and Optimization

Transform each irreducible TED into a simple DFG
• Additive edge -> addition operation
• Multiplicative edge -> multiplication operation
• Break multiple-operand operations into a chain of operations

[TDS] maintains a hash table for DFG nodes, keyed by the corresponding function
• Helps in reusing a node if the same function/expression is found again
• Captures redundancy due to poor variable order during factorization

The DFG is not unique
• Can be restructured and balanced to minimize cost
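The hash-table idea can be sketched as follows. This is an assumption-laden illustration rather than the TDS data structure: `build_dfg` and `cost` are hypothetical helpers, the canonical key is simply a string with commutative operands sorted, and expressions are parsed with Python's `ast` module instead of being read off a TED.

```python
import ast

def build_dfg(expr):
    """Return a hash table mapping each distinct function to its operation kind."""
    table = {}   # key: canonical function string, value: "var", "+" or "*"

    def visit(node):
        if isinstance(node, ast.Name):
            key, kind = node.id, "var"
        elif isinstance(node, ast.BinOp) and isinstance(node.op, (ast.Add, ast.Mult)):
            kind = "+" if isinstance(node.op, ast.Add) else "*"
            # Sort operands so that a*b and b*a share one key (commutativity).
            left, right = sorted((visit(node.left), visit(node.right)))
            key = f"({left}{kind}{right})"
        else:
            raise ValueError("only variables, + and * are supported")
        table.setdefault(key, kind)      # reuse the node if the function exists
        return key

    visit(ast.parse(expr, mode="eval").body)
    return table

def cost(table):
    """Return (MPY, ADD) counts of the distinct operations in the table."""
    kinds = list(table.values())
    return kinds.count("*"), kinds.count("+")

print(cost(build_dfg("w*(q + p*w + y) + x*z*u")))
print(cost(build_dfg("a*b + b*a*c")))
```

In the second call the sub-expression b⋅a maps to the same key as a⋅b, so only one multiplier node is created for it.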

Page 22: Muhammad Noman Ashraf

Data Flow Graph

[DFG figure not preserved; schedule steps 1 to 4]

Reordering cost
• L = 2 MPY + 2 ADD
• Req 3 MPY, 2 ADD
• total: 5 MPY, 3 ADD

Page 23: Muhammad Noman Ashraf

Reordering [-> Iteration 3 extraction]

Labels in figure: S2, S3, P3, P4

L = 2 MPY + 2 ADD

Req 3 MPY, 2 ADD

Cost involves
• Reordering of variables
• Extraction
• DFG generation
• Annotating latency and resource requirements

Page 24: Muhammad Noman Ashraf

Generating and evaluating new Data Flow Graph [Iteration 3]

F = S3 = P4 + P3 = w⋅S2 + x⋅P1 = w⋅(q + S1) + x⋅(z⋅u) = w⋅(q + P2 + y) + x⋅z⋅u = w⋅(q + p⋅w + y) + x⋅z⋅u

total: 4 MPY, 3 ADD

[DFG figures not preserved; one schedule with steps 1 to 4 and one with steps 1 to 5]

L = 2 MPY + 2 ADD

L = 2 MPY + 3 ADD, Req 1 MPY, 1 ADD

Reordering cost
• L = 2 MPY + 2 ADD
• Req 2 MPY, 1 ADD

Previous cost
• L = 2 MPY + 2 ADD
• Req = 3 MPY, 2 ADD

Page 25: Muhammad Noman Ashraf

Reordering [-> Iteration 4 extraction, DFG generation]

Design Space Exploration

Through reordering all cases can be obtained

[DFG figure not preserved; schedule steps 1 to 4]

Page 26: Muhammad Noman Ashraf

Replacing constant multipliers* by shifters

• Transform constant multiplications into shifters, while considering factorization involving shifters

Steps
• Represent each constant in CSD format, using a shift variable L^i (instead of 2^i) for shifting by i bits
• Generate the TED with shift variables, linearize it and perform decomposition
• Replace terms involving shift variables (L^i) by i-bit shifters

Example
7a + 6b = (L^3 − 1)⋅a + (L^3 − L)⋅b = L^3⋅(a + b) − L⋅b − a = ((a + b) << 3) − (a + (b << 1))
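The CSD step and the 7a + 6b example can be checked with a short sketch. It is illustrative only: `csd` and `shift_add` are hypothetical helper names, `csd` handles positive integers only, and the digit-by-digit conversion shown is a standard textbook way of producing canonical signed digits, not necessarily the one TDS uses.

```python
def csd(n):
    """Canonical signed-digit form of a positive integer, least significant digit first.

    Each digit is -1, 0 or +1, and no two adjacent digits are non-zero.
    """
    digits = []
    while n:
        if n & 1:
            d = 2 - (n & 3)      # +1 if n % 4 == 1, -1 if n % 4 == 3
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits

def shift_add(digits, x):
    """Evaluate constant * x as a signed sum of shifted copies of x."""
    return sum(d * (x << i) for i, d in enumerate(digits))

print(csd(7))   # digits of 8 - 1
print(csd(6))   # digits of 8 - 2

# The factored shifter form from the slide matches 7a + 6b.
for a in range(-4, 5):
    for b in range(-4, 5):
        assert ((a + b) << 3) - (a + (b << 1)) == 7 * a + 6 * b
        assert shift_add(csd(7), a) + shift_add(csd(6), b) == 7 * a + 6 * b
```

Evaluating the two constants separately needs two 3-bit shifts, while the factored form shares a single 3-bit shift of (a + b); this is the factorization the first bullet refers to.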

Page 27: Muhammad Noman Ashraf

TDS – TED Decomposition System RECAP

• Read in the CDFG file (cdfg), a polynomial expression (poly), or pre-coded DSP transforms (tr)
• Translate into a functional TED (dfg2ted) and structural elements (comparators etc.)
• Linearize its data path (linearize)
• Iterate
  • Iterate
    • Product term extraction
    • Sum term extraction
  • Reorder to minimize latency (reorder)
• Set of irreducible TEDs
• Produce the final DFG (ted2dfg) and annotate back the CDFG file (write)
• Data-flow and computation-intensive designs: DSP

Design Space Exploration

Page 28: Muhammad Noman Ashraf

Conclusion

Results in the paper show a 15% latency improvement and a 7% area reduction when using the DFG generated by TDS instead of the one from kernel-based decomposition (KBD)
• Far better results when compared to the original DFG

TDS serves as a front end to GAUT

Fundamental limitation: the decomposition depends on variable reordering, which is an expensive operation

Page 29: Muhammad Noman Ashraf

REFERENCES

• M. Ciesielski, D. Gomez-Prado, Q. Ren, J. Guillot and E. Boutillon, “Optimization of Data-Flow Computations Using Canonical TED Representation”, in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
• M. Ciesielski, S. Askar, D. Gomez-Prado, J. Guillot, and E. Boutillon, “Data-flow transformations using Taylor expansion diagrams,” in Proc. Des. Autom. Test Eur., 2007, pp. 455–460
• TDS: TED-Based Dataflow Decomposition System, Univ. Massachusetts, Amherst, MA. [Online]. Available: http://www.ecs.umass.edu/ece/labs/vlsicad/tds.html

Page 30: Muhammad Noman Ashraf


QUESTIONS?

Page 31: Muhammad Noman Ashraf

Experiment Setup*

[Flow diagram not preserved. It repeats the TDS flow and HLS flow labels of the TDS System Overview slide, with the additional labels KBD, ORIGINAL and TED.]

Page 32: Muhammad Noman Ashraf

Results*

[Result charts not preserved; surviving label: KBD]

Page 33: Muhammad Noman Ashraf

Results: Quintic Spline*

[Result chart not preserved; surviving label: KBD]

Page 34: Muhammad Noman Ashraf

Results: Quartic spline*

[Result chart not preserved; surviving label: KBD]

Page 35: Muhammad Noman Ashraf

Improvement over KBD and Original*

[Result charts not preserved; surviving label: KBD]
