Optimizations for a Simulator Construction System Supporting Reusable Components
David A. Penry and David I. August
The Liberty Architecture Research Group
Princeton University
2
Architectural Exploration
Architectural options are studied using simulators
More iterations = better decisions
Need fast path to simulator
Need fast simulator
[Figure labels: Architecture Options, Architectural Simulator]
3
Simulator Construction Systems
Reuse simulator infrastructure
[Figure labels: Architecture Description, Simulator Builder, Architectural Simulator Instance]
But still must be able to reuse descriptions
  Structural composition
  Medium-grained components
  Standard communication contracts
  High parameterizability
  Separation of concerns
4
The Reuse Penalty
Reusability leads to a speed penalty:
  more component instances
  more signals
  more general code
Therefore: reusable systems are often slower
How can we mitigate the reuse penalty?
5
Liberty Simulation Environment
Simulator construction system for high reuse
Two-tiered specifications
  Leaf module templates in C
  Netlisting language for instantiation and customization
Three-signal standard communications contract with overrides (control functions)
Code is generated
[Figure labels: Enable, Data, Ack]
6
Contrast: SystemC
Simulator construction libraries (C++)
Partially supports reuse:
  + Structural composition
  + Module granularity varies
  ? Communications contracts by convention
  - Low parameterizability
  - Separation of concerns
Description is a C++ program
7
Models of Computation
SystemC uses Discrete Event (DE)
LSE uses Heterogeneous Synchronous Reactive (HSR)
  Edwards (1997)
  Unparsed code blocks (black boxes)
  Values begin unresolved and resolve monotonically
  Chaotic scheduling
[Figure: blocks A, B, C, D]
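To make the HSR bullets concrete, here is a minimal sketch of monotonic resolution under chaotic scheduling. It is not LSE code: the `UNRESOLVED` marker, the `and_gate` block, and `run_to_fixed_point` are hypothetical names chosen for illustration. Each block is treated as a black box that may resolve its output from partial inputs, and the scheduler invokes blocks in an arbitrary order until nothing changes.

```python
UNRESOLVED = None  # hypothetical marker for a signal with no value yet


def and_gate(a, b):
    """Black-box block: resolves early when one input is already False."""
    if a is False or b is False:
        return False
    if a is True and b is True:
        return True
    return UNRESOLVED  # not enough information yet


def run_to_fixed_point(blocks, signals):
    """Invoke blocks in arbitrary order until no signal changes.

    blocks: list of (output_name, function, input_names).
    A resolved signal is never recomputed, so resolution is monotonic.
    """
    changed = True
    while changed:
        changed = False
        for out, fn, ins in blocks:
            if signals[out] is not UNRESOLVED:
                continue  # already resolved: skip redundant work
            value = fn(*(signals[name] for name in ins))
            if value is not UNRESOLVED:
                signals[out] = value
                changed = True
    return signals


# y and z depend on each other, yet both resolve because x is False.
signals = {"x": False, "t": True, "y": UNRESOLVED, "z": UNRESOLVED}
blocks = [("z", and_gate, ("y", "t")), ("y", and_gate, ("x", "z"))]
result = run_to_fixed_point(blocks, signals)
print(result)
```

The loop terminates because every pass either resolves at least one more signal or stops, and the number of signals is finite.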
8
Potential HSR Benefits vs. DE
Static schedules possible
Lower per-signal overhead
Use of unresolved value to avoid redundant computation
[Figure: blocks A, B, C, D]
9
Experimental methodology
Three models of a 4-way out-of-order microprocessor
  SystemC using custom speed-optimized components
  LSE model using custom speed-optimized components
  LSE model using standard reusable components
9 benchmarks (CPU 2000/MediaBench)
See paper for compiler, etc.
[Table, columns in source order: Non-edge signals, Signals, Instances, Model; the numeric columns are fused in the transcript]
481383    Custom LSE
42348911  Reusable LSE
32714     Custom SystemC
10
Custom LSE vs. SystemC
Custom LSE outperforms custom SystemC
  Reduction in overhead
  Use of unresolved signal value
  Static instantiation and code specialization
Dynamic schedule for both
Model Cycles/sec Speedup
Custom SystemC 53722 -
Custom LSE 155111 2.88
11
Reuse Penalty
Reusable model suffers a large reuse penalty (0.26x the speed of custom LSE)
  Many more signals
  Many more non-edge signals
  More components
All dynamic schedules
Model Cycles/sec Speedup
Custom SystemC 53722 -
Custom LSE 155111 2.88
Reusable LSE 40649 0.76
12
Creating Static Schedules
Edwards’ algorithm (1997)
  Construct a signal dependency graph
  Break into strongly-connected components (SCC). Schedule in topological order
  Partition each SCC into a head and tail
  Schedule tail recursively, then repeat head (any order) and tail’s schedule
  Coalesce
[Figure: blocks A, B, C, D]
13
Creating Static Schedules
[Figure: blocks A, B, C, D with signals 1, 2, 3, 4, and the signal dependency graph over 1, 2, 3, 4]
14
Creating Static Schedules
[Figure: signal dependency graph over 1, 2, 3, 4 broken into SCCs a, b, c]
Schedule: a b c
15
Creating Static Schedules
[Figure: dependency graph with SCCs a, b, c; head (H) and tail (T) marked]
Schedule: 1 b 4
16
Creating Static Schedules
[Figure: dependency graph with SCCs a, b, c; head (H) and tail (T) marked]
Schedule: 1 2 3 2 4
17
Creating Static Schedules
[Figure: dependency graph with nodes labeled A, B, C; head (H) and tail (T) marked]
Choosing an optimal partition is exponential
Schedule: 1 2 3 2 4 A B C B (D)
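The scheduling steps on these slides can be sketched in a few lines of Python. This is an illustrative reading of the bullets, not the LSE implementation: the edge set of the example graph is an assumption (the figure did not survive in the transcript), the head is picked naively as the last signal of each SCC rather than by any optimal partition, and the final coalescing step is omitted. The function names are hypothetical.

```python
def sccs_in_topological_order(graph):
    """Tarjan's algorithm; graph maps each signal to the signals depending on it."""
    index, low, stack, on_stack, found = {}, {}, [], set(), []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in sorted(graph[v]):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of an SCC
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            found.append(sorted(comp))

    for v in sorted(graph):
        if v not in index:
            visit(v)
    found.reverse()  # Tarjan emits sinks first; reverse for topological order
    return found


def static_schedule(graph):
    """Schedule SCCs in topological order; expand each SCC as tail, (head, tail)*."""
    schedule = []
    for comp in sccs_in_topological_order(graph):
        if len(comp) == 1:
            schedule.append(comp[0])
            continue
        head, tail = [comp[-1]], comp[:-1]  # naive head choice (assumption)
        tail_graph = {v: graph[v] & set(tail) for v in tail}
        tail_schedule = static_schedule(tail_graph)  # schedule tail recursively
        schedule.extend(tail_schedule)
        for _ in head:  # repeat head and tail's schedule once per head signal
            schedule.extend(head)
            schedule.extend(tail_schedule)
    return schedule


# Assumed edges: 1 -> 2, 2 -> 3, 3 -> 2, 3 -> 4 (signals 2 and 3 form an SCC).
example = {1: {2}, 2: {3}, 3: {2, 4}, 4: set()}
print(static_schedule(example))
```

With these assumed edges the sketch yields the signal order 1 2 3 2 4, matching the schedule shown on the slides before coalescing.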
18
Dynamic sub-schedule embedding
SCCs arise due to incomplete information
“Optimal” schedules are optimal only with respect to the available information
“Optimal” schedule may be worse than dynamic
When an SCC is “too big”, just schedule that section dynamically
[Figure: blocks A, B, C]
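A rough sketch of the embedding decision, assuming the SCCs have already been found and ordered topologically. The size threshold `max_static_size` and the function name are hypothetical; the slides do not say how "too big" is determined.

```python
def embed_dynamic_regions(sccs, max_static_size):
    """Mark each SCC as a static entry or as an embedded dynamic region."""
    plan = []
    for comp in sccs:
        # Large SCCs would unroll into long repeated schedules, so they are
        # left to a run-time (dynamic) scheduler instead.
        kind = "dynamic" if len(comp) > max_static_size else "static"
        plan.append((kind, list(comp)))
    return plan


plan = embed_dynamic_regions([[1], [2, 3], [4, 5, 6, 7, 8], [9]], max_static_size=2)
print(plan)
```

The surrounding static schedule keeps its order; only the oversized region falls back to dynamic evaluation.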
19
Dependency information enhancement
In practice, we see big SCCs
Peek in the black box
  Simple parsing of communication overrides (control functions)
  Can ask user to tell about internal dependencies
  Not too painful because it is reused
[Figure: blocks A, B, C]
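The following sketch illustrates why internal dependency information can shrink SCCs. It assumes two hypothetical modules, M1 and M2, each with ports `req_in`, `ack_in`, `req_out`, `ack_out`; these names are invented for the example and are not LSE identifiers. Without declarations, a black box must be treated as if every output depends on every input, which creates a cycle through the backward acknowledgement wire. With declared dependencies the cycle disappears.

```python
from itertools import product


def module_edges(name, inputs, outputs, declared=None):
    """Input -> output edges inside one module.

    declared maps each output to the inputs it actually reads; when it is
    None the module is a black box and all input/output pairs are assumed.
    """
    if declared is None:
        pairs = product(inputs, outputs)
    else:
        pairs = ((i, o) for o, ins in declared.items() for i in ins)
    return {(f"{name}.{i}", f"{name}.{o}") for i, o in pairs}


def has_cycle(edges):
    """Depth-first search for a back edge in the dependency graph."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set())
    state = {}  # node -> "visiting" or "done"

    def visit(u):
        state[u] = "visiting"
        for v in graph[u]:
            if state.get(v) == "visiting":
                return True
            if v not in state and visit(v):
                return True
        state[u] = "done"
        return False

    return any(visit(u) for u in graph if u not in state)


ports_in, ports_out = ["req_in", "ack_in"], ["req_out", "ack_out"]
wires = {("M1.req_out", "M2.req_in"), ("M2.ack_out", "M1.ack_in")}
declared = {"req_out": ["req_in"], "ack_out": ["ack_in"]}

conservative = (wires | module_edges("M1", ports_in, ports_out)
                | module_edges("M2", ports_in, ports_out))
enhanced = (wires | module_edges("M1", ports_in, ports_out, declared)
            | module_edges("M2", ports_in, ports_out, declared))
print(has_cycle(conservative), has_cycle(enhanced))
```

Because a reusable module is declared once and instantiated many times, the cost of writing `declared` is paid only once per module.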
20
Evaluation of Information Enhancement
Control function parsing more useful alone
  Not principally through scheduling
It is important to have both kinds of enhancement
Optimization                    Cycles/sec  Speedup
No static scheduling            40649       -
With control function parsing   47850       1.18
With internal dependencies      41306       1.02
With both                       57046       1.40
21
Reuse Penalty Revisited
Reuse penalty mitigated in part
Model                            Cycles/sec  Speedup  Build time (s)
Custom SystemC                   53722       -        49.1
Custom LSE                       155111      2.88     15.4
Reusable LSE w/o optimization    40649       0.76     33.9
Reusable LSE with optimization   57046       1.06     34.4
Reusable LSE model 6% faster than custom SystemC
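As a quick check on how the reported ratios relate to the cycles/sec figures, the snippet below recomputes them from the table. The variable names are chosen for illustration. The ratios against custom SystemC reproduce 0.76 and 1.06, and the optimized-to-unoptimized ratio reproduces 1.40; the custom LSE ratio comes out at about 2.887, so the 2.88 on the slide may have been truncated or derived differently (for example, averaged per benchmark).

```python
rates = {
    "Custom SystemC": 53722,
    "Custom LSE": 155111,
    "Reusable LSE w/o optimization": 40649,
    "Reusable LSE with optimization": 57046,
}
baseline = rates["Custom SystemC"]

# Speedup column: each model relative to custom SystemC.
for model, rate in rates.items():
    print(f"{model}: {rate / baseline:.3f}x vs. custom SystemC")

# Reuse penalty: reusable model relative to custom LSE, before and after.
penalty_before = rates["Reusable LSE w/o optimization"] / rates["Custom LSE"]
penalty_after = rates["Reusable LSE with optimization"] / rates["Custom LSE"]

# Gain from the optimizations alone.
gain = rates["Reusable LSE with optimization"] / rates["Reusable LSE w/o optimization"]
print(f"penalty before: {penalty_before:.2f}, after: {penalty_after:.2f}, gain: {gain:.2f}")
```

Relative to custom LSE, the reusable model moves from roughly 0.26x to roughly 0.37x, which is the sense in which the penalty is mitigated only in part.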
22
Conclusions
A tradeoff exists between speed and reuse
The simulator construction system can help
  Higher base speed makes reuse penalty less painful
Optimizations are possible with HSR model
  Ability of the scheduler to adapt to the information available is powerful
  This adaptation is not possible with DE
You can have high reuse at reasonable speeds
23
Future Work
Release of LSE
  Fall 2003
  http://liberty.princeton.edu
Hybrid model of computation
  Embed HSR in DE, DE in HSR
  Automatic extraction of HSR portions from DE