Optimizations for a Simulator Construction System Supporting Reusable Components

David A. Penry and David I. August

The Liberty Architecture Research Group

Princeton University

2

Architectural Exploration

Architectural options are studied using simulators

More iterations = better decisions

Need fast path to simulator

Need fast simulator

[Figure: Architecture Options, Architectural Simulator]

3

Simulator Construction Systems

Reuse simulator infrastructure

[Figure: Architecture Description, Simulator Builder, Architectural Simulator Instance]

But still must be able to reuse descriptions
  Structural composition
  Medium-grained components
  Standard communication contracts
  High parameterizability
  Separation of concerns

4

The Reuse Penalty

Reusability leads to a speed penalty:
  more component instances
  more signals
  more general code

Therefore: reusable systems are often slower

How can we mitigate the reuse penalty?

5

Liberty Simulation Environment

Simulator construction system for high reuse

Two-tiered specifications
  Leaf module templates in C
  Netlisting language for instantiation and customization

Three-signal standard communications contract with overrides (control functions)

Code is generated

[Figure: the three signals of the contract: Enable, Data, Ack]

6

Contrast: SystemC

Simulator construction libraries (C++)

Partially supports reuse:
  + Structural composition
  + Module granularity varies
  ? Communications contracts by convention
  - Low parameterizability
  - Separation of concerns

Description is a C++ program

7

Models of Computation

SystemC uses Discrete Event (DE)

LSE uses Heterogeneous Synchronous Reactive (HSR)
  Edwards (1997)
  Unparsed code blocks (black boxes)
  Values begin unresolved and resolve monotonically
  Chaotic scheduling

[Figure: animation of an example netlist with blocks A, B, C, D]
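The HSR points above (unresolved starting values, monotonic resolution, chaotic scheduling) can be illustrated with a small sketch. It is not LSE code: the `UNRESOLVED` marker, `run_timestep`, and the three blocks are hypothetical names invented for this illustration. Blocks are treated as black boxes that are invoked in some order until a full sweep changes nothing.

```python
UNRESOLVED = None  # assumed marker for "no value yet"


def run_timestep(blocks, order):
    """Sweep over the blocks in `order` until a sweep changes no signal."""
    signals = {}
    invocations = 0
    changed = True
    while changed:
        changed = False
        for name in order:
            invocations += 1
            for sig, value in blocks[name](signals).items():
                if signals.get(sig, UNRESOLVED) is UNRESOLVED:
                    signals[sig] = value  # unresolved -> resolved
                    changed = True
                elif signals[sig] != value:
                    # Monotonic resolution: a resolved value never changes.
                    raise ValueError(f"{name} changed resolved signal {sig}")
    return signals, invocations


# Hypothetical blocks; each returns only the outputs it can resolve so far.
def block_a(s):
    return {"x": 1}


def block_b(s):
    return {"y": s["x"] + 1} if "x" in s else {}


def block_c(s):
    return {"z": s["y"] * 2} if "y" in s else {}


blocks = {"A": block_a, "B": block_b, "C": block_c}

good = run_timestep(blocks, ["A", "B", "C"])
bad = run_timestep(blocks, ["C", "B", "A"])
print(good)  # ({'x': 1, 'y': 2, 'z': 4}, 6)
print(bad)   # ({'x': 1, 'y': 2, 'z': 4}, 12)
```

Both orders reach the same signal values, but the poor order needs twice as many block invocations, which is the cost a static schedule tries to avoid.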

8

Potential HSR Benefits vs. DE

Static schedules possible

Lower per-signal overhead

Use of unresolved value to avoid redundant computation

[Figure: example netlist with blocks A, B, C, D]

9

Experimental methodology

Three models of a 4-way out-of-order microprocessor
  SystemC using custom speed-optimized components
  LSE model using custom speed-optimized components
  LSE model using standard reusable components

9 benchmarks (CPU 2000/MediaBench)
  See paper for compiler, etc.

Non-edge signals, Signals, Instances, Model

481383 Custom LSE

42348911 Reusable LSE

32714 Custom SystemC

10

Custom LSE vs. SystemC

Custom LSE outperforms custom SystemC
  Reduction in overhead
  Use of unresolved signal value
  Static instantiation and code specialization

Dynamic schedule for both

Model            Cycles/sec   Speedup
Custom SystemC   53722        -
Custom LSE       155111       2.88

11

Reuse Penalty

Reusable model suffers large reuse penalty (0.26 of custom LSE speed)
  Many more signals
  Many more non-edge signals
  More components

All dynamic schedules

Model            Cycles/sec   Speedup
Custom SystemC   53722        -
Custom LSE       155111       2.88
Reusable LSE     40649        0.76
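As a quick check of where the 0.26 and 0.76 figures come from, the ratios can be recomputed from the cycles/sec column. The helper name `relative_speed` is only for this illustration.

```python
# Cycles/sec figures copied from the table above.
cycles_per_sec = {
    "Custom SystemC": 53722,
    "Custom LSE": 155111,
    "Reusable LSE": 40649,
}


def relative_speed(model, baseline):
    """Simulation speed of `model` as a fraction of `baseline`."""
    return cycles_per_sec[model] / cycles_per_sec[baseline]


reuse_penalty = relative_speed("Reusable LSE", "Custom LSE")
vs_systemc = relative_speed("Reusable LSE", "Custom SystemC")
print(f"{reuse_penalty:.2f}")  # 0.26
print(f"{vs_systemc:.2f}")     # 0.76
```

The reuse penalty compares the reusable LSE model with the custom LSE model, while the Speedup column compares each model with custom SystemC.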

12

Creating Static Schedules

Edwards’ algorithm (1997)
  Construct a signal dependency graph
  Break into strongly-connected components (SCCs). Schedule in topological order
  Partition each SCC into a head and tail
  Schedule tail recursively, then repeat head (any order) and tail’s schedule
  Coalesce

[Figure: example netlist with blocks A, B, C, D]

13

Creating Static Schedules

[Figure: example netlist with blocks A, B, C, D and signals 1, 2, 3, 4, next to the dependency graph over signals 1, 2, 3, 4]

14

Creating Static Schedules

[Figure: dependency graph over signals 1, 2, 3, 4 grouped into SCCs a, b, c]

Schedule: a b c

15

Creating Static Schedules

[Figure: SCC b partitioned into a head (H) and a tail (T)]

Schedule: 1 b 4

16

Creating Static Schedules

[Figure: SCC b partitioned into a head (H) and a tail (T)]

Schedule: 1 2 3 2 4
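The steps of the algorithm can be sketched as follows. This is an illustrative reading of the slide, not the LSE implementation. The edge set for signals 1 to 4 is an assumption chosen to reproduce the schedule 1 2 3 2 4, the head is picked by a simple placeholder heuristic (one node, the largest label), and repeating the head and tail once per head node is this sketch's interpretation of "repeat".

```python
def sccs_in_topological_order(nodes, edges):
    """Tarjan's algorithm; returns SCCs with producers before consumers."""
    index, low, on_stack, stack, out = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in edges.get(v, ()):
            if w not in nodes:
                continue  # ignore edges that leave the current subgraph
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            out.append(comp)

    for v in sorted(nodes):
        if v not in index:
            visit(v)
    return out[::-1]  # Tarjan emits sinks first, so reverse


def schedule(nodes, edges):
    """Static schedule: tail, then (head, tail) once per head node."""
    order = []
    for comp in sccs_in_topological_order(set(nodes), edges):
        if len(comp) == 1:
            order.append(comp[0])
            continue
        head = [max(comp)]  # assumed heuristic, not an optimal partition
        tail = [n for n in comp if n not in head]
        tail_schedule = schedule(tail, edges)
        order += tail_schedule
        for _ in head:
            order += head + tail_schedule
    return order


# Assumed dependencies: 1 feeds 2, 2 and 3 feed each other, 2 feeds 4.
edges = {1: [2], 2: [3, 4], 3: [2]}
result = schedule([1, 2, 3, 4], edges)
print(result)  # [1, 2, 3, 2, 4]
```

With signals 2 and 3 forming the only cycle, the sketch schedules the tail (2), then the head (3), then the tail again, matching the schedule shown above.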

17

Creating Static Schedules

Choosing an optimal partition is exponential

[Figure: example netlist with blocks A, B, C, D and signals 1, 2, 3, 4; SCC partitioned into a head (H) and a tail (T)]

Schedule: 1 2 3 2 4
          A B C B (D)
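One way to read the step from a signal schedule to a block schedule is to replace each signal by the block that computes it and then merge adjacent invocations of the same block. The sketch below assumes that reading of "coalesce" and assumes that signals 1, 2, 3, 4 are produced by blocks A, B, C, D respectively; neither is stated on the slide.

```python
from itertools import groupby


def coalesce(signal_schedule, producer):
    """Map signals to producing blocks and merge adjacent repeats."""
    blocks = [producer[s] for s in signal_schedule]
    return [block for block, _ in groupby(blocks)]


# Assumed signal-to-block mapping for the running example.
producer = {1: "A", 2: "B", 3: "C", 4: "D"}
block_schedule = coalesce([1, 2, 3, 2, 4], producer)
print(block_schedule)  # ['A', 'B', 'C', 'B', 'D']

# Hypothetical case where block B computes two adjacent signals (2 and 5).
merged = coalesce([1, 2, 5, 3], {1: "A", 2: "B", 5: "B", 3: "C"})
print(merged)  # ['A', 'B', 'C']
```

In the running example no two adjacent entries share a block, so nothing merges. The second call shows the case where one invocation of B covers two signals.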

18

Dynamic sub-schedule embedding

SCCs arise due to incomplete information

“Optimal” schedules are optimal w.r.t. information

“Optimal” schedule may be worse than dynamic

[Figure: example with blocks A, B, C]

When an SCC is “too big”, just schedule that section dynamically
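A minimal sketch of what embedding a dynamic section in a static schedule could look like at run time. The plan format (a string is a static step, a tuple is a dynamically scheduled section) and all block names are assumptions made for this illustration; the plan itself is written by hand here rather than derived from an SCC size threshold.

```python
def run_plan(plan, blocks):
    """Run static steps once; iterate dynamic sections until nothing changes."""
    signals = {}
    for step in plan:
        if isinstance(step, str):
            # Static step: the schedule guarantees one invocation is enough.
            signals.update(blocks[step](signals))
            continue
        changed = True
        while changed:  # dynamic section: sweep until a fixed point
            changed = False
            for name in step:
                for sig, value in blocks[name](signals).items():
                    if signals.get(sig) != value:
                        signals[sig] = value
                        changed = True
    return signals


# Hypothetical blocks. B and C depend on each other at the block level.
def block_a(s):
    return {"x": 1}


def block_b(s):
    out = {}
    if "x" in s:
        out["y"] = s["x"] + 1
    if "z" in s:
        out["w"] = s["z"] + 1
    return out


def block_c(s):
    return {"z": s["y"] * 2} if "y" in s else {}


blocks = {"A": block_a, "B": block_b, "C": block_c}
final = run_plan(["A", ("C", "B")], blocks)
print(final)  # {'x': 1, 'y': 2, 'z': 4, 'w': 5}
```

Block A runs once as a static step, while the B and C section is iterated until its signals settle, so the cost of dynamic scheduling is paid only inside that section.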

19

Dependency information enhancement

In practice, we see big SCCs

Peek in the black box
  Simple parsing of communication overrides (control functions)
  Can ask user to tell about internal dependencies
  Not too painful because it is reused

[Figure: example with blocks A, B, C]
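The effect of internal dependency information on SCC size can be shown with a small sketch. It assumes that a black-box block is treated conservatively (every output depends on every input) and that user-supplied information narrows this to per-output dependencies. The block descriptions and signal names are hypothetical.

```python
def dependency_edges(blocks, internal=None):
    """Signal dependency edges; black boxes tie every output to every input."""
    internal = internal or {}
    edges = {}
    for name, (inputs, outputs) in blocks.items():
        for out in outputs:
            deps = internal.get(name, {}).get(out, inputs)
            for inp in deps:
                edges.setdefault(inp, set()).add(out)
    return edges


def reachable(start, edges):
    seen, stack = set(), [start]
    while stack:
        for w in edges.get(stack.pop(), ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def largest_cyclic_scc(edges):
    """Size of the largest group of mutually reachable signals (0 if acyclic)."""
    nodes = set(edges) | {w for ws in edges.values() for w in ws}
    reach = {n: reachable(n, edges) for n in nodes}
    best = 0
    for n in nodes:
        if n in reach[n]:  # n lies on a cycle
            group = {m for m in reach[n] if n in reach[m]}
            best = max(best, len(group))
    return best


# Hypothetical blocks: name -> (inputs, outputs).
blocks = {"B": (("x", "z"), ("y", "w")), "C": (("y",), ("z",))}
# Hypothetical user annotation: in B, y needs only x and w needs only z.
internal = {"B": {"y": ("x",), "w": ("z",)}}

before = largest_cyclic_scc(dependency_edges(blocks))
after = largest_cyclic_scc(dependency_edges(blocks, internal))
print(before, after)  # 2 0
```

Under the conservative assumption, y and z appear to depend on each other. With the annotation, the cycle disappears and the signals can be scheduled statically in one pass.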

20

Evaluation of Information Enhancement

Control function parsing more useful alone
  Not principally through scheduling

It is important to have both kinds of enhancement

Optimization                    Cycles/sec   Speedup
No static scheduling            40649        -
With control function parsing   47850        1.18
With internal dependencies      41306        1.02
With both                       57046        1.40
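A short calculation on the table's figures shows why both kinds of enhancement matter: if the two gains were independent, the combined speedup would be roughly their product, but the measured combined speedup is clearly larger. The variable names are only for this illustration.

```python
baseline = 40649  # cycles/sec with no static scheduling
measured = {
    "control function parsing": 47850,
    "internal dependencies": 41306,
    "both": 57046,
}

speedup = {name: rate / baseline for name, rate in measured.items()}
independent_estimate = (
    speedup["control function parsing"] * speedup["internal dependencies"]
)
print(f"{independent_estimate:.2f}")  # 1.20
print(f"{speedup['both']:.2f}")       # 1.40
```

The gap between 1.20 and 1.40 suggests the two enhancements reinforce each other rather than contributing separately.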

21

Reuse Penalty Revisited

Reuse penalty mitigated in part

Model                            Cycles/sec   Speedup   Build time (s)
Custom SystemC                   53722        -         49.1
Custom LSE                       155111       2.88      15.4
Reusable LSE w/o optimization    40649        0.76      33.9
Reusable LSE with optimization   57046        1.06      34.4

Reusable LSE model 6% faster than custom SystemC

22

Conclusions

A tradeoff exists between speed and reuse

The simulator construction system can help
  Higher base speed makes reuse penalty less painful

Optimizations are possible with HSR model
  Ability of scheduler to adapt to information available is powerful
  This adaptation is not possible with DE

You can have high reuse at reasonable speeds

23

Future Work

Release of LSE
  Fall 2003
  http://liberty.princeton.edu

Hybrid model of computation
  Embed HSR in DE, DE in HSR
  Automatic extraction of HSR portions from DE

24

Other optimizations

Improved block coalescing
  See paper

Code specialization
  Implementation of APIs depends upon environment