spark: a parallelizing high-level synthesis...

46
Center for Embedded Computer Systems University of California, Irvine and San Diego http://www.cecs.uci.edu/~spark SPARK: A Parallelizing High-Level Synthesis Framework Supported by Semiconductor Research Corporation & Intel Inc Sumit Gupta Rajesh Gupta, Nikil Dutt, Alex Nicolau

Upload: others

Post on 05-Jul-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: SPARK: A Parallelizing High-Level Synthesis Frameworkembedded.eecs.berkeley.edu/esd-seminar/fall03/... · Center for Embedded Computer Systems University of California, Irvine and

Center for Embedded Computer SystemsUniversity of California, Irvine and San Diego

http://www.cecs.uci.edu/~spark

SPARK: A Parallelizing High-Level Synthesis Framework

Supported by Semiconductor Research Corporation & Intel Inc

Sumit Gupta

Rajesh Gupta, Nikil Dutt, Alex Nicolau

Page 2: SPARK: A Parallelizing High-Level Synthesis Frameworkembedded.eecs.berkeley.edu/esd-seminar/fall03/... · Center for Embedded Computer Systems University of California, Irvine and

2Copyright Sumit Gupta 2003

System Level SynthesisSystem Level Synthesis

System LevelModel

TaskAnalysis

HW/SWPartitioning

ASIC

ProcessorCore

Memory

FPGA

I/O

HardwareBehavioralDescription

SoftwareBehavioralDescription

SoftwareCompiler

HighLevel

Synthesis

Page 3: SPARK: A Parallelizing High-Level Synthesis Frameworkembedded.eecs.berkeley.edu/esd-seminar/fall03/... · Center for Embedded Computer Systems University of California, Irvine and

3Copyright Sumit Gupta 2003

High Level SynthesisHigh Level Synthesis

M e m o r y

ALUCon

trol

Data path

d = e - f g = h + i

If NodeT Fc

x = a + bc = a < b

j = d x gl = e + x

x = a + b;c = a < b;if (c) thend = e – f;

elseg = h + i;

j = d x g;l = e + x;

Transform behavioral descriptions to RTL/gate level

From C to CDFG to Architecture

Page 4: SPARK: A Parallelizing High-Level Synthesis Frameworkembedded.eecs.berkeley.edu/esd-seminar/fall03/... · Center for Embedded Computer Systems University of California, Irvine and

4Copyright Sumit Gupta 2003

High Level SynthesisHigh Level Synthesis

M e m o r y

ALUCon

trol

Data path

d = e - f g = h + i

If NodeT Fc

x = a + bc = a < b

j = d x gl = e + x

x = a + b;c = a < b;if (c) thend = e – f;

elseg = h + i;

j = d x g;l = e + x;

Transform behavioral descriptions to RTL/gate level

From C to CDFG to Architecture

Problem # 1Problem # 1 :: Poor quality of HLS results beyond Poor quality of HLS results beyond straightstraight--line behavioral descriptionsline behavioral descriptionsPoor/No controllability of the HLSPoor/No controllability of the HLSresultsresults

Problem # 2Problem # 2 ::

Page 5: SPARK: A Parallelizing High-Level Synthesis Frameworkembedded.eecs.berkeley.edu/esd-seminar/fall03/... · Center for Embedded Computer Systems University of California, Irvine and

5Copyright Sumit Gupta 2003

OutlineOutlinenn Motivation and Background Motivation and Background nn Our Approach to Our Approach to ParallelizingParallelizing HighHigh--Level SynthesisLevel Synthesisnn Code Transformations Techniques for PHLSCode Transformations Techniques for PHLS

nn Parallelizing Transformations Parallelizing Transformations nn Dynamic TransformationsDynamic Transformations

nn The PHLS Framework and Experimental ResultsThe PHLS Framework and Experimental Resultsnn Multimedia and Image Processing ApplicationsMultimedia and Image Processing Applicationsnn Case Study: Intel Instruction Length DecoderCase Study: Intel Instruction Length Decoder

nn Conclusions and Future WorkConclusions and Future Work

Page 6: SPARK: A Parallelizing High-Level Synthesis Frameworkembedded.eecs.berkeley.edu/esd-seminar/fall03/... · Center for Embedded Computer Systems University of California, Irvine and

6Copyright Sumit Gupta 2003

HighHigh--level Synthesislevel Synthesisnn WellWell--researched area: from early 1980’sresearched area: from early 1980’s

nn Renewed interest due to new system level design methodologies Renewed interest due to new system level design methodologies nn Large number of synthesis optimizations have been proposed Large number of synthesis optimizations have been proposed

nn Either Either operation leveloperation level: algebraic transformations on DSP codes: algebraic transformations on DSP codesnn or or logic levellogic level: Don’t Care based control optimizations: Don’t Care based control optimizationsnn In contrast, compiler transformations operate at both operation In contrast, compiler transformations operate at both operation level level

(fine(fine--grain) and source level (coarsegrain) and source level (coarse--grain) grain) nn Parallelizing Compiler TransformationsParallelizing Compiler Transformations

nn Different optimization objectives and cost models than HLSDifferent optimization objectives and cost models than HLSØØOur aimOur aim: Develop Synthesis and Parallelizing Compiler : Develop Synthesis and Parallelizing Compiler

Transformations that are “useful” for HLS Transformations that are “useful” for HLS nn Beyond scheduling results: in Beyond scheduling results: in Circuit Area and DelayCircuit Area and Delaynn For large designs with For large designs with complex control flowcomplex control flow (nested (nested

conditionals/loops)conditionals/loops)

Page 7: SPARK: A Parallelizing High-Level Synthesis Frameworkembedded.eecs.berkeley.edu/esd-seminar/fall03/... · Center for Embedded Computer Systems University of California, Irvine and

7Copyright Sumit Gupta 2003

Our Approach: Our Approach: ParallelizingParallelizing HLS (PHLS)HLS (PHLS)

C Input VHDLOutput

Original CDFG

Optimized CDFG

Scheduling& Binding

Source-Level CompilerTransformations

Scheduling Compiler & Dynamic Transformations

nn Optimizing Compiler and Parallelizing Compiler transformations Optimizing Compiler and Parallelizing Compiler transformations applied at applied at SourceSource--levellevel (Pre(Pre--synthesis) and during synthesis) and during SchedulingSchedulingnn SourceSource--level code refinement using level code refinement using PrePre--synthesissynthesis transformationstransformationsnn Code Restructuring by Code Restructuring by SpeculativeSpeculative Code MotionsCode Motionsnn Operation Operation replicationreplication to improve concurrencyto improve concurrencynn DynamicDynamic transformations: transformations: exploit new opportunities during exploit new opportunities during

schedulingscheduling

Page 8: SPARK: A Parallelizing High-Level Synthesis Frameworkembedded.eecs.berkeley.edu/esd-seminar/fall03/... · Center for Embedded Computer Systems University of California, Irvine and

8Copyright Sumit Gupta 2003

PHLS Transformations PHLS Transformations Organized into Four GroupsOrganized into Four Groups

1.1. PrePre--synthesissynthesis: Loop: Loop--invariant code motions, Loop invariant code motions, Loop unrolling, CSEunrolling, CSE

2.2. SchedulingScheduling: Speculative Code Motions, Multi: Speculative Code Motions, Multi--cycling, Operation Chaining, Loop Pipeliningcycling, Operation Chaining, Loop Pipelining

3.3. DynamicDynamic: Transformations applied dynamically : Transformations applied dynamically during scheduling: Dynamic CSE, Dynamic Copy during scheduling: Dynamic CSE, Dynamic Copy Propagation, Dynamic Branch BalancingPropagation, Dynamic Branch Balancing

4.4. Basic Compiler TransformationsBasic Compiler Transformations: Copy : Copy Propagation, Dead Code EliminationPropagation, Dead Code Elimination

Page 9: SPARK: A Parallelizing High-Level Synthesis Frameworkembedded.eecs.berkeley.edu/esd-seminar/fall03/... · Center for Embedded Computer Systems University of California, Irvine and

9Copyright Sumit Gupta 2003

Speculative Code MotionsSpeculative Code Motions

+

+If Node

T FReverse

Speculation

Conditional Speculation

Speculation

Across HierarchicalBlocks

_

a

b

c

Operation Movement to reduce impact of Programming Style on Quality of HLS Results

Early Condition Execution

Evaluates conditionsAs soon as possible

Page 10: SPARK: A Parallelizing High-Level Synthesis Frameworkembedded.eecs.berkeley.edu/esd-seminar/fall03/... · Center for Embedded Computer Systems University of California, Irvine and

10Copyright Sumit Gupta 2003

Dynamic TransformationsDynamic Transformationsnn Called Called “dynamic”“dynamic” since they are applied during since they are applied during

scheduling (versus a pass before/after scheduling)scheduling (versus a pass before/after scheduling)nn Dynamic Branch BalancingDynamic Branch Balancingnn Increase the scope of code motionsIncrease the scope of code motionsnn Reduce impact of programming style on HLS resultsReduce impact of programming style on HLS results

nn Dynamic CSE and Dynamic Copy PropagationDynamic CSE and Dynamic Copy Propagationnn Exploit the Operation movement and duplication due Exploit the Operation movement and duplication due

to speculative code motionsto speculative code motionsnn Create new opportunities to apply these transformations Create new opportunities to apply these transformations

nn Reduce the number of operations Reduce the number of operations

Page 11: SPARK: A Parallelizing High-Level Synthesis Frameworkembedded.eecs.berkeley.edu/esd-seminar/fall03/... · Center for Embedded Computer Systems University of California, Irvine and

11Copyright Sumit Gupta 2003

Dynamic Branch BalancingDynamic Branch Balancing

If NodeT F

_ e

BB 0

BB 2BB 1

BB 3

BB 4

+ a

+ b

_ c

_ dS0

S1

S2

S3

++Resource Allocation

Original Design

If NodeT F

_ e

BB 0

BB 2BB 1

BB 3

BB 4

+a

+b

_ c _ d

Scheduled Design

UnbalancedConditional

Longest PathLongest Path

Page 12: SPARK: A Parallelizing High-Level Synthesis Frameworkembedded.eecs.berkeley.edu/esd-seminar/fall03/... · Center for Embedded Computer Systems University of California, Irvine and

12Copyright Sumit Gupta 2003

Insert New Scheduling Step in Shorter BranchInsert New Scheduling Step in Shorter Branch

If NodeT F

_ e

BB 0

BB 2BB 1

BB 3

BB 4

+a

+b

_ c _ d

If NodeT F

_ e

BB 0

BB 2BB 1

BB 3

BB 4

+ a

+ b

_ c

_ dS0

S1

S2

S3

++Resource Allocation

Original Design Scheduled Design

Page 13: SPARK: A Parallelizing High-Level Synthesis Frameworkembedded.eecs.berkeley.edu/esd-seminar/fall03/... · Center for Embedded Computer Systems University of California, Irvine and

13Copyright Sumit Gupta 2003

Insert New Scheduling Step in Shorter BranchInsert New Scheduling Step in Shorter Branch

If NodeT F

BB 0

BB 2BB 1

BB 3

BB 4

+a

+b

_ c _ d

If NodeT F

_ e

BB 0

BB 2BB 1

BB 3

BB 4

+ a

+ b

_ c

_ dS0

S1

S2

S3

++Resource Allocation

e_ _e

Original Design Scheduled Design

Dynamic Branch Balancing inserts new scheduling stepsDynamic Branch Balancing inserts new scheduling stepsnn Enables Conditional SpeculationEnables Conditional Speculationnn Leads to further code compactionLeads to further code compaction

Page 14: SPARK: A Parallelizing High-Level Synthesis Frameworkembedded.eecs.berkeley.edu/esd-seminar/fall03/... · Center for Embedded Computer Systems University of California, Irvine and

14Copyright Sumit Gupta 2003

Dynamic CSEDynamic CSE: Going beyond Traditional CSE: Going beyond Traditional CSE

a = b + c;cd = b < c;if (cd)

d = b + c;else

e = g + h;

C Description

BB 2 BB 3

BB 1

d = b + c

BB 4

a = b + c

e = g + h

HTG Representation

If NodeT F

BB 0

BB 2 BB 3

BB 1

d = a

BB 4

a = b + c

e = g + h

After Traditional CSE

If NodeT F

BB 0

Page 15: SPARK: A Parallelizing High-Level Synthesis Frameworkembedded.eecs.berkeley.edu/esd-seminar/fall03/... · Center for Embedded Computer Systems University of California, Irvine and

15Copyright Sumit Gupta 2003

a = b + c;cd = b < c;if (cd)

d = b + c;else

e = g + h;

C Description

BB 2 BB 3

BB 1

d = b + c

BB 4

a = b + c

e = g + h

HTG Representation

If NodeT F

BB 0

BB 2 BB 3

BB 1

d = a

BB 4

a = b + c

e = g + h

After Traditional CSE

If NodeT F

BB 0

nn We use notion of We use notion of DominanceDominance of Basic Blocksof Basic Blocksnn Basic block Basic block BBiBBi dominates dominates BBjBBj if all control paths from if all control paths from

the the initialinitial basic block of the design graph leading to basic block of the design graph leading to BBjBBj goes through goes through BBiBBi

nn We can eliminate an operation We can eliminate an operation opjopj in in BBjBBj using common using common expression in expression in opiopi if if BBiBBi dominates dominates BBjBBj

Dynamic CSEDynamic CSE: Going beyond Traditional CSE: Going beyond Traditional CSE

Page 16: SPARK: A Parallelizing High-Level Synthesis Frameworkembedded.eecs.berkeley.edu/esd-seminar/fall03/... · Center for Embedded Computer Systems University of California, Irvine and

16Copyright Sumit Gupta 2003

New Opportunities for “New Opportunities for “DynamicDynamic” CSE” CSEDue to Code MotionsDue to Code Motions

BB 2 BB 3

BB 1

a = b + c

BB 6 BB 7

BB 5

d = b + c

BB 4

BB 8

Scheduler decides to Speculate

BB 2 BB 3

BB 1

a = dcse

BB 6 BB 7

BB 5

d = b + c

BB 4

BB 8

dcse = b + c BB 0BB 0

CSE CSE not possiblenot possible since BB2 since BB2 does not dominate BB6does not dominate BB6

CSE CSE possiblepossible now since now since BB0 does not dominate BB6BB0 does not dominate BB6

Page 17: SPARK: A Parallelizing High-Level Synthesis Frameworkembedded.eecs.berkeley.edu/esd-seminar/fall03/... · Center for Embedded Computer Systems University of California, Irvine and

17Copyright Sumit Gupta 2003

BB 2 BB 3

BB 1

a = b + c

BB 6 BB 7

BB 5

d = b + c

BB 4

BB 8

BB 2 BB 3

BB 1

a = dcse

BB 6 BB 7

BB 5

d = dcse

BB 4

BB 8

dcse = b + c BB 0BB 0Scheduler decides to Speculate

New Opportunities for “New Opportunities for “DynamicDynamic” CSE” CSEDue to Code MotionsDue to Code Motions

CSE CSE not possiblenot possible since BB2 since BB2 does not dominate BB6does not dominate BB6

CSE CSE possiblepossible now since now since BB0 does not dominate BB6BB0 does not dominate BB6

If scheduler moves or duplicates an operation op, apply CSE on remaining operations using op

If scheduler moves or duplicates an operation If scheduler moves or duplicates an operation opop, apply CSE on , apply CSE on remaining operations using remaining operations using opop

Page 18: SPARK: A Parallelizing High-Level Synthesis Frameworkembedded.eecs.berkeley.edu/esd-seminar/fall03/... · Center for Embedded Computer Systems University of California, Irvine and

18Copyright Sumit Gupta 2003

Condition Speculation & Dynamic CSECondition Speculation & Dynamic CSE

BB 1 BB 2

BB 0

BB 5 BB 6

BB 4

a = b + c

BB 3

BB 7

d = b + c

BB 1 BB 2

BB 0

BB 5 BB 6

BB 4

a = a'

BB 3

BB 7

a' = b + c a' = b + c

d = b + c BB 8BB 8

Scheduler decides to

ConditionallySpeculate

Page 19: SPARK: A Parallelizing High-Level Synthesis Frameworkembedded.eecs.berkeley.edu/esd-seminar/fall03/... · Center for Embedded Computer Systems University of California, Irvine and

19Copyright Sumit Gupta 2003

Condition Speculation & Dynamic CSECondition Speculation & Dynamic CSE

BB 1 BB 2

BB 0

BB 5 BB 6

BB 4

a = b + c

BB 3

BB 7

d = b + c

BB 1 BB 2

BB 0

BB 5 BB 6

BB 4

a = a'

BB 3

BB 7

a' = b + c a' = b + c

d = a'BB 8 BB 8

Scheduler decides to

ConditionallySpeculate

Page 20: SPARK: A Parallelizing High-Level Synthesis Frameworkembedded.eecs.berkeley.edu/esd-seminar/fall03/... · Center for Embedded Computer Systems University of California, Irvine and

20Copyright Sumit Gupta 2003

Condition Speculation & Dynamic CSECondition Speculation & Dynamic CSE

BB 1 BB 2

BB 0

BB 5 BB 6

BB 4

a = b + c

BB 3

BB 7

d = b + c

BB 1 BB 2

BB 0

BB 5 BB 6

BB 4

a = a'

BB 3

BB 7

a' = b + c a' = b + c

d = a'BB 8 BB 8

Scheduler decides to

ConditionallySpeculate

nn Use the notion of dominance Use the notion of dominance by groups of basic blocksby groups of basic blocksn All Control Paths leading up

to BB8 come from either BB1 or BB2: =>=> BB1 and BB1 and BB2 BB2 together dominatetogether dominate BB8BB8

Page 21: SPARK: A Parallelizing High-Level Synthesis Frameworkembedded.eecs.berkeley.edu/esd-seminar/fall03/... · Center for Embedded Computer Systems University of California, Irvine and

21Copyright Sumit Gupta 2003

Loop Shifting: An Incremental Loop Loop Shifting: An Incremental Loop Pipelining TechniquePipelining Technique

BB 0

b +_ d

LoopExit

Loop Node

BB 3

BB 2

BB 1

BB 4

BB 0

b +_ d

LoopExit

Loop Node

BB 3

BB 2

BB 1

BB 4

a + c_

a + c_

LoopLoopShiftingShifting

a + c_

Page 22: SPARK: A Parallelizing High-Level Synthesis Frameworkembedded.eecs.berkeley.edu/esd-seminar/fall03/... · Center for Embedded Computer Systems University of California, Irvine and

22Copyright Sumit Gupta 2003

Loop Shifting: An Incremental Loop Loop Shifting: An Incremental Loop Pipelining TechniquePipelining Technique

BB 0

a +

b

c_

+_ d

LoopExit

Loop Node

BB 3

BB 2

BB 1

BB 4

BB 0

b +_ d

LoopExit

Loop Node

BB 3

BB 2

BB 1

BB 4

a +

c_

a + c_

LoopLoopShiftingShifting

CompacCompac--tiontion

Page 23: SPARK: A Parallelizing High-Level Synthesis Frameworkembedded.eecs.berkeley.edu/esd-seminar/fall03/... · Center for Embedded Computer Systems University of California, Irvine and

23Copyright Sumit Gupta 2003

SPARKSPARKHigh Level High Level Synthesis Synthesis

FrameworkFramework

Page 24: SPARK: A Parallelizing High-Level Synthesis Frameworkembedded.eecs.berkeley.edu/esd-seminar/fall03/... · Center for Embedded Computer Systems University of California, Irvine and

24Copyright Sumit Gupta 2003

SPARK Parallelizing HLS FrameworkSPARK Parallelizing HLS Frameworknn C input and C input and SynthesizableSynthesizable RTL VHDL outputRTL VHDL outputnn ToolTool--box box of Transformations and Heuristicsof Transformations and Heuristics

nn Each of these can be developed independently of the otherEach of these can be developed independently of the othernn Script based Script based control over transformations & heuristicscontrol over transformations & heuristicsnn Hierarchical Intermediate Representation (Hierarchical Intermediate Representation (HTGsHTGs))

nn Retains structural information about design (conditional blocks,Retains structural information about design (conditional blocks,loops)loops)

nn Enables efficient and structured application of transformationsEnables efficient and structured application of transformationsnn Complete HLS tool:Complete HLS tool: Does Binding, Control Synthesis and Does Binding, Control Synthesis and

Backend VHDL generationBackend VHDL generationnn Interconnect Minimizing Resource BindingInterconnect Minimizing Resource Binding

nn Enables Enables Graphical VisualizationGraphical Visualization of Design description of Design description and intermediate resultsand intermediate results

nn 100,000+ lines of C++ code100,000+ lines of C++ code

Page 25: SPARK: A Parallelizing High-Level Synthesis Frameworkembedded.eecs.berkeley.edu/esd-seminar/fall03/... · Center for Embedded Computer Systems University of California, Irvine and

25Copyright Sumit Gupta 2003

Synthesizable CSynthesizable Cnn ANSIANSI--C front end from Edison Design Group (EDG)C front end from Edison Design Group (EDG)nn Features of C not supported for synthesisFeatures of C not supported for synthesis

nn PointersPointersnn However, Arrays and passing by reference However, Arrays and passing by reference areare supportedsupported

nn Recursive Function CallsRecursive Function Callsnn GotosGotos

nn Features for which support has not been implementedFeatures for which support has not been implementednn MultiMulti--dimensional arraysdimensional arraysnn StructsStructsnn Continue, BreaksContinue, Breaks

nn Hardware component generated for each function Hardware component generated for each function nn A called function is instantiated as a hardware component in A called function is instantiated as a hardware component in

calling functioncalling function

Page 26: SPARK: A Parallelizing High-Level Synthesis Frameworkembedded.eecs.berkeley.edu/esd-seminar/fall03/... · Center for Embedded Computer Systems University of California, Irvine and

26Copyright Sumit Gupta 2003

HTGHTG DFGDFGGraph VisualizationGraph Visualization

Page 27: SPARK: A Parallelizing High-Level Synthesis Frameworkembedded.eecs.berkeley.edu/esd-seminar/fall03/... · Center for Embedded Computer Systems University of California, Irvine and

27Copyright Sumit Gupta 2003

Resource Utilization GraphResource Utilization Graph

SchedulingScheduling

Page 28: SPARK: A Parallelizing High-Level Synthesis Frameworkembedded.eecs.berkeley.edu/esd-seminar/fall03/... · Center for Embedded Computer Systems University of California, Irvine and

28Copyright Sumit Gupta 2003

Example of Example of ComplexComplex HTGHTGnn Example of a real design: Example of a real design:

MPEGMPEG--1 pred2 function1 pred2 functionnn Just for demonstration; you are Just for demonstration; you are

not expected to read the textnot expected to read the text

nn Multiple nested loops and Multiple nested loops and conditionalsconditionals

Page 29: SPARK: A Parallelizing High-Level Synthesis Frameworkembedded.eecs.berkeley.edu/esd-seminar/fall03/... · Center for Embedded Computer Systems University of California, Irvine and

29Copyright Sumit Gupta 2003

ExperimentsExperimentsnn Results presented here forResults presented here for

nn PrePre--synthesis transformationssynthesis transformationsnn Speculative Code MotionsSpeculative Code Motionsnn Dynamic CSEDynamic CSE

nn We used We used SPARKSPARK to synthesize designs derived from to synthesize designs derived from several industrial designsseveral industrial designsnn MPEGMPEG--1, MPEG1, MPEG--2, GIMP Image Processing software2, GIMP Image Processing softwarenn Case StudyCase Study of Intel Instruction Length Decoderof Intel Instruction Length Decoder

nn Scheduling ResultsScheduling Resultsnn Number of States in FSMNumber of States in FSMnn Cycles on Longest Path Cycles on Longest Path

through Designthrough Design

nn VHDL: Logic Synthesis VHDL: Logic Synthesis nn Critical Path Length (ns)Critical Path Length (ns)nn Unit AreaUnit Area

Page 30: SPARK: A Parallelizing High-Level Synthesis Frameworkembedded.eecs.berkeley.edu/esd-seminar/fall03/... · Center for Embedded Computer Systems University of California, Irvine and

30Copyright Sumit Gupta 2003

Target ApplicationsTarget Applications

1501503535221111GIMP GIMP tilertiler

2602606161441818MPEGMPEG--2 2 dp_framedp_frame

2872874545661111MPEGMPEG--1 1 pred2pred2

12312317172244MPEGMPEG--1 1 pred1pred1

# of # of OperationsOperations

# Non# Non--Empty Empty Basic BlocksBasic Blocks

# of # of LoopsLoops

# of Ifs# of IfsDesignDesign

Page 31: SPARK: A Parallelizing High-Level Synthesis Frameworkembedded.eecs.berkeley.edu/esd-seminar/fall03/... · Center for Embedded Computer Systems University of California, Irvine and

31Copyright Sumit Gupta 2003

MPEG-1 Pred1 Function

0

0.2

0.4

0.6

0.8

1

1.2

Longest Path(lcyc)

Critical Path(cns)

Total Delay (c*l) Unit Area

+ Speculative Code Motions

+ Pre-Synthesis Transforms

+ Dynamic CSE

MPEG-1 Pred2 Function

0

0.2

0.4

0.6

0.8

1

1.2

Longest Path(lcyc)

Critical Path(cns)

Total Delay (c*l) Unit Area

Scheduling & Logic Synthesis ResultsScheduling & Logic Synthesis Results

Non-speculative CMs: Within BBs & Across Hier Blocks

42%

10%

36%

36%

8%

39%

Page 32: SPARK: A Parallelizing High-Level Synthesis Frameworkembedded.eecs.berkeley.edu/esd-seminar/fall03/... · Center for Embedded Computer Systems University of California, Irvine and

32Copyright Sumit Gupta 2003

MPEG-1 Pred1 Function

0

0.2

0.4

0.6

0.8

1

1.2

Longest Path(lcyc)

Critical Path(cns)

Total Delay (c*l) Unit Area

+ Speculative Code Motions

+ Pre-Synthesis Transforms

+ Dynamic CSE

MPEG-1 Pred2 Function

0

0.2

0.4

0.6

0.8

1

1.2

Longest Path(lcyc)

Critical Path(cns)

Total Delay (c*l) Unit Area

Scheduling & Logic Synthesis ResultsScheduling & Logic Synthesis Results

Non-speculative CMs: Within BBs & Across Hier Blocks

42%

10%

36%

36%

8%

39%

Overall: 63Overall: 63--66 % improvement in Delay66 % improvement in Delay

Almost constant Area Almost constant Area

Page 33: SPARK: A Parallelizing High-Level Synthesis Frameworkembedded.eecs.berkeley.edu/esd-seminar/fall03/... · Center for Embedded Computer Systems University of California, Irvine and

33Copyright Sumit Gupta 2003

Non-speculative CMs: Within BBs & Across Hier Blocks

+ Speculative Code Motions

+ Pre-Synthesis Transforms

+ Dynamic CSE

Scheduling & Logic Synthesis ResultsScheduling & Logic Synthesis ResultsMPEG-2 DpFrame Function

0

0.2

0.4

0.6

0.8

1

1.2

Longest Path(lcyc)

Critical Path(cns)

Total Delay (c*l) Unit Area

GIMP Tiler Function

0

0.2

0.4

0.6

0.8

1

1.2

Longest Path(lcyc)

Critical Path(cns)

Total Delay (c*l) Unit Area

14%

20%1%

33%

41%

52%

Page 34: SPARK: A Parallelizing High-Level Synthesis Frameworkembedded.eecs.berkeley.edu/esd-seminar/fall03/... · Center for Embedded Computer Systems University of California, Irvine and

34Copyright Sumit Gupta 2003

Non-speculative CMs: Within BBs & Across Hier Blocks

+ Speculative Code Motions

+ Pre-Synthesis Transforms

+ Dynamic CSE

Scheduling & Logic Synthesis ResultsScheduling & Logic Synthesis ResultsMPEG-2 DpFrame Function

0

0.2

0.4

0.6

0.8

1

1.2

Longest Path(lcyc)

Critical Path(cns)

Total Delay (c*l) Unit Area

GIMP Tiler Function

0

0.2

0.4

0.6

0.8

1

1.2

Longest Path(lcyc)

Critical Path(cns)

Total Delay (c*l) Unit Area

14%

20%1%

33%

41%

52%

Overall: 48Overall: 48--76 % improvement in Delay76 % improvement in Delay

Almost constant Area Almost constant Area

Page 35: SPARK: A Parallelizing High-Level Synthesis Frameworkembedded.eecs.berkeley.edu/esd-seminar/fall03/... · Center for Embedded Computer Systems University of California, Irvine and

35Copyright Sumit Gupta 2003

Case Study: Case Study: IntelIntel Instruction Length DecoderInstruction Length Decoder

Stream ofInstructions

Instruction Length Decoder

FirstInsn

SecondInsn

ThirdInstruction

Instruction BufferInstruction Buffer

Page 36: SPARK: A Parallelizing High-Level Synthesis Frameworkembedded.eecs.berkeley.edu/esd-seminar/fall03/... · Center for Embedded Computer Systems University of California, Irvine and

36Copyright Sumit Gupta 2003

Example Design: ILD Block from IntelExample Design: ILD Block from Intel

nn Case Study: A design derived from the Case Study: A design derived from the Instruction Length Instruction Length DecoderDecoder of the Intel Pentiumof the Intel Pentium®® class of processorsclass of processorsnn Decodes length of instructions streaming from Decodes length of instructions streaming from

memorymemorynnHas to look at up to 4 bytes at a timeHas to look at up to 4 bytes at a time

nn Has to execute in Has to execute in one cycleone cycle and decode about 64 bytes and decode about 64 bytes of instructionsof instructions

ØØ Characteristics of Microprocessor functional blocksCharacteristics of Microprocessor functional blocksnn Low Latency: Single or Dual cycle implementationLow Latency: Single or Dual cycle implementationnn Consist of several small computationsConsist of several small computationsnn Intermix of control and data logicIntermix of control and data logic

Page 37: SPARK: A Parallelizing High-Level Synthesis Frameworkembedded.eecs.berkeley.edu/esd-seminar/fall03/... · Center for Embedded Computer Systems University of California, Irvine and

37Copyright Sumit Gupta 2003

Basic Instruction Length Decoder:Basic Instruction Length Decoder:Initial DescriptionInitial Description

Length Contribution 1

Need Byte 4 ?

Need Byte 2 ?

Need Byte 3 ?

Byt

e 1

Byt

e 2

Byt

e 3

Byt

e 4

=+

+

+

Total Length Of Instruction

Length Contribution 2

Length Contribution 3

Length Contribution 4

Page 38: SPARK: A Parallelizing High-Level Synthesis Frameworkembedded.eecs.berkeley.edu/esd-seminar/fall03/... · Center for Embedded Computer Systems University of California, Irvine and

38Copyright Sumit Gupta 2003

Instruction Length Decoder: Instruction Length Decoder: Decoding 2Decoding 2ndnd InstructionInstruction

Length Contribution 1

Need Byte 4 ?

Need Byte 2 ?

Need Byte 3 ?

Byt

e 3

Byt

e 4

=+

+

+

Total Length Of Insn

Length Contribution 2

Length Contribution 3

Length Contribution 4B

yte

5

Byt

e 6First

Insn

After decoding the length of an instruction

v Start looking from next bytev Again examine up to 4 bytes to determine length of next instruction

Page 39: SPARK: A Parallelizing High-Level Synthesis Frameworkembedded.eecs.berkeley.edu/esd-seminar/fall03/... · Center for Embedded Computer Systems University of California, Irvine and

39Copyright Sumit Gupta 2003

Instruction Length Decoder:Instruction Length Decoder:ParallelizedParallelized DescriptionDescription

Need Byte 4 ?

Need Byte 2 ?

Need Byte 3 ?

Byt

e 1

Byt

e 2

Byt

e 3

Byt

e 4

Length Contribution 1

Length Contribution 2

Length Contribution 3

Length Contribution 4

=+

+

+

Total Length Of Instruction

v Speculatively calculate the length contribution of all 4 bytes at a timev Determine actual total length of instruction based on this data

Page 40: SPARK: A Parallelizing High-Level Synthesis Frameworkembedded.eecs.berkeley.edu/esd-seminar/fall03/... · Center for Embedded Computer Systems University of California, Irvine and

40Copyright Sumit Gupta 2003

ILD: Extracting Further ParallelismILD: Extracting Further ParallelismB

yte

1

Byt

e 2

Byt

e 3

Byt

e 4

Byte 1Insn.Len Calc

Byte 3Insn.Len Calc

Byte 5Insn.Len Calc

Byte 2Insn.Len Calc

Byte 4Insn.Len Calc

Byt

e 5

v Speculativelycalculate length of instructions assuming a new instruction starts at each bytev Do this calculation for all bytes in parallelv Traverse from 1st

byte to last v Determine length of instructions starting from the 1st till the lastv Discard unused calculations

Page 41: SPARK: A Parallelizing High-Level Synthesis Frameworkembedded.eecs.berkeley.edu/esd-seminar/fall03/... · Center for Embedded Computer Systems University of California, Irvine and

41Copyright Sumit Gupta 2003

Initial:Initial: MultiMulti--Cycle Cycle SequentialSequential ArchitectureArchitecture

Length Contribution 1

Need Byte 4 ?

Need Byte 3 ?

Byt

e 1

Byt

e 2

Byt

e 3

Byt

e 4

Length Contribution 2

Length Contribution 3

Length Contribution 4

Need Byte 2 ?

Page 42: SPARK: A Parallelizing High-Level Synthesis Frameworkembedded.eecs.berkeley.edu/esd-seminar/fall03/... · Center for Embedded Computer Systems University of California, Irvine and

42Copyright Sumit Gupta 2003

ILD Synthesis: Resulting ArchitectureILD Synthesis: Resulting ArchitectureSpeculate Operations,

Fully Unroll Loop,Eliminate Loop Index

Variable

Page 43: SPARK: A Parallelizing High-Level Synthesis Frameworkembedded.eecs.berkeley.edu/esd-seminar/fall03/... · Center for Embedded Computer Systems University of California, Irvine and

43Copyright Sumit Gupta 2003

ILD Synthesis: Resulting ArchitectureILD Synthesis: Resulting ArchitectureSpeculate Operations,

Fully Unroll Loop,Eliminate Loop Index

Variable

Multi-cycle Sequential

Architecture

Multi-cycle Sequential

Architecture

Single cycle Parallel

Architecture

Single cycle Parallel

Architecture

nn Our toolbox approach enables us to develop a script to Our toolbox approach enables us to develop a script to synthesize applications from different domainssynthesize applications from different domains

nn Final design looks close to the actual implementation done Final design looks close to the actual implementation done by Intelby Intel

Page 44: SPARK: A Parallelizing High-Level Synthesis Frameworkembedded.eecs.berkeley.edu/esd-seminar/fall03/... · Center for Embedded Computer Systems University of California, Irvine and

44Copyright Sumit Gupta 2003

ConclusionsConclusionsnn Parallelizing code transformations enable a new range of Parallelizing code transformations enable a new range of

HLS transformationsHLS transformationsnn Provide the needed improvement in quality of HLS results Provide the needed improvement in quality of HLS results

nn Possible to be competitive against manually designed circuits. Possible to be competitive against manually designed circuits. nn Can enable productivity improvements in microelectronic designCan enable productivity improvements in microelectronic design

nn Built a synthesis system with a range of code transformationsBuilt a synthesis system with a range of code transformationsnn Platform for applying Coarse and FinePlatform for applying Coarse and Fine--grain Optimizationsgrain Optimizationsnn ToolTool--box approach where transformations and heuristics can be box approach where transformations and heuristics can be

developeddevelopednn Enables the designer to find the right Enables the designer to find the right synthesis scriptsynthesis script for for

different application domainsdifferent application domainsnn Performance improvements of 60Performance improvements of 60--70 % across a number of designs70 % across a number of designsnn We have shown its effectiveness on an Intel designWe have shown its effectiveness on an Intel design

Page 45: SPARK: A Parallelizing High-Level Synthesis Frameworkembedded.eecs.berkeley.edu/esd-seminar/fall03/... · Center for Embedded Computer Systems University of California, Irvine and

45Copyright Sumit Gupta 2003

AcknowledgementsAcknowledgementsnn AdvisorsAdvisorsnn Professors Rajesh Gupta, Professors Rajesh Gupta, NikilNikil DuttDutt, Alex , Alex NicolauNicolau

nn Contributors to SPARK frameworkContributors to SPARK frameworknn Nick Nick SavoiuSavoiu, , MehrdadMehrdad ReshadiReshadi, , SunwooSunwoo KimKim

nn Intel Strategic CAD Labs (SCL)Intel Strategic CAD Labs (SCL)nn Timothy Timothy KamKam, Mike , Mike KishinevskyKishinevsky

nn Supported by Semiconductor Research Supported by Semiconductor Research Corporation and Intel SCLCorporation and Intel SCL

Page 46: SPARK: A Parallelizing High-Level Synthesis Frameworkembedded.eecs.berkeley.edu/esd-seminar/fall03/... · Center for Embedded Computer Systems University of California, Irvine and

Thank YouThank You