The ESW Paradigm
Manoj Franklin & Gurindar S. Sohi
05/10/2002
Observations
• Large exploitable ILP, theoretically
• Nearby instructions tend to be dependent; parallelism is available further downstream
• Centralized resources are bad
• Minimizing communication cost is important
What about others?
- Dataflow model
  + most general
  - unconventional PL paradigm
  - communication cost can be high
- SS, VLIW (sequential)
  + temporal locality
  - large centralized HW
  - compiler too dumb
  - not scalable
- ESW = dataflow + sequential
Design Goals
• Decentralized resources
• Minimize wasted execution
• Speculative memory address disambiguation
• Realizability
Replace one large dynamic window with many small ones
How it works
• Basic window
  – Single-entry, loop-free, call-free block
  – Equal to, a superset of, or a subset of a basic block
• Execute basic windows in parallel
• Multiple independent stages
  – Each complete with branch prediction, L1 cache, register file, etc.
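A minimal sketch of how basic windows might be handed to stages. It assumes the stages behave like a queue in which the oldest window commits first and a misprediction squashes the offending window and everything younger. The slide does not state this organisation, and the names `StageQueue`, `assign`, `commit_head` and `squash_from` are hypothetical.

```python
from collections import deque


class StageQueue:
    """Toy model: a fixed number of stages, each holding one basic window."""

    def __init__(self, num_stages):
        self.num_stages = num_stages
        self.active = deque()  # oldest window on the left (head)

    def assign(self, window_id):
        """Give the next predicted basic window to a free stage, if any."""
        if len(self.active) >= self.num_stages:
            return False  # every stage is busy; fetch must wait
        self.active.append(window_id)
        return True

    def commit_head(self):
        """Retire the oldest window and free its stage."""
        return self.active.popleft() if self.active else None

    def squash_from(self, window_id):
        """Discard window_id and every younger window (assumed recovery policy)."""
        if window_id not in self.active:
            return []
        squashed = []
        while self.active[-1] != window_id:
            squashed.append(self.active.pop())
        squashed.append(self.active.pop())
        squashed.reverse()  # report oldest first
        return squashed
```

With two stages, a third window cannot be assigned until the head commits, which is the sense in which the small windows together stand in for one large dynamic window.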
Distributed Instruction Supply
Optimization: snooping on L2-L1 cache traffic
Distributed Inter-Instruction Communication
Observation:
1. Register use is mostly within a basic block
2. The rest is in subsequent blocks
Architecture:
1. Distributed future file
2. Create/use masks for dependency checking
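The create/use mask idea can be sketched as follows. This assumes a create mask marks the registers a basic window writes and a use mask marks the registers it reads before writing them itself. That reading of the masks, the instruction encoding and the function names are all assumptions made for illustration.

```python
def window_masks(instructions):
    """Return (create_mask, use_mask) for one basic window.

    instructions: list of (dest_reg or None, [src_regs]) in program order.
    Bit r of a mask corresponds to register r.
    """
    create = 0
    use = 0
    for dest, srcs in instructions:
        for r in srcs:
            if not (create >> r) & 1:  # value not produced inside this window
                use |= 1 << r
        if dest is not None:
            create |= 1 << dest
    return create, use


def registers_to_wait_for(use_mask, earlier_create_masks):
    """Registers this window needs that an earlier active window will still write."""
    busy = 0
    for mask in earlier_create_masks:
        busy |= mask
    return use_mask & busy


# Window 0: r1 = r2 + r3 ; r4 = r1 + r1
w0 = [(1, [2, 3]), (4, [1, 1])]
# Window 1: r5 = r4 + r6
w1 = [(5, [4, 6])]

create0, use0 = window_masks(w0)
create1, use1 = window_masks(w1)
pending = registers_to_wait_for(use1, [create0])  # only bit 4 (r4) is set
```

Here window 1 only has to wait for r4; r6 can be read from its local register copy immediately, which matches the observation that most register use stays inside a block.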
Distributed Data Memory System
Problem:
1. Address space is large, so create/use masks cannot be used
2. Need to maintain consistency between multiple copies
Solution: ARB
ARB (Address Resolution Buffer)
Q. What happens when the ARB is full?
- Bits are cleared upon commit
- Stages are restarted when a dependency is violated
- On a load, the value is forwarded from the ARB if one already exists
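The three behaviours above can be sketched in a toy model. It assumes stages are identified by increasing sequence numbers and commit oldest first. Capacity is not modelled, so the question of a full ARB is left open. The class layout and method names are hypothetical, not the hardware organisation.

```python
class ARB:
    """Toy address resolution buffer; stages are identified by sequence number."""

    def __init__(self, memory):
        self.memory = memory  # dict: address -> committed value
        self.entries = {}     # address -> {"loads": {seq: source}, "stores": {seq: value}}

    def _entry(self, addr):
        return self.entries.setdefault(addr, {"loads": {}, "stores": {}})

    def load(self, seq, addr):
        """Forward from the closest earlier (or own) store, else read memory."""
        e = self._entry(addr)
        earlier = [s for s in e["stores"] if s <= seq]
        source = max(earlier) if earlier else -1  # -1 means "read from memory"
        # Remember the oldest source this stage has read from.
        e["loads"][seq] = min(source, e["loads"].get(seq, source))
        return e["stores"][source] if earlier else self.memory.get(addr, 0)

    def store(self, seq, addr, value):
        """Buffer the store; return later stages that already loaded a stale value."""
        e = self._entry(addr)
        e["stores"][seq] = value
        return sorted(t for t, src in e["loads"].items() if t > seq and src <= seq)

    def squash(self, seq):
        """Drop all records of stage seq and younger stages before they restart."""
        for e in self.entries.values():
            for table in (e["loads"], e["stores"]):
                for s in [s for s in table if s >= seq]:
                    del table[s]

    def commit(self, seq):
        """Write the stage's stores to memory and clear its records."""
        for addr in list(self.entries):
            e = self.entries[addr]
            if seq in e["stores"]:
                self.memory[addr] = e["stores"].pop(seq)
            e["loads"].pop(seq, None)
            if not e["loads"] and not e["stores"]:
                del self.entries[addr]
```

A typical sequence: stage 2 loads an address speculatively, stage 1 then stores to it, `store` reports stage 2 as violated, the caller squashes and restarts stage 2, and the repeated load is forwarded the buffered value.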
Simulation Environment
• Custom simulator using MIPS R2000 pipeline
• Up to 2 instructions fetched/decoded/issued per IE
• Up to 32 inst per basic window
• 4K-word L1 cache, 64 KB L2 DM cache (100% hit rate, what??)
• 3-bit counter branch prediction
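For reference, a 3-bit counter predictor can be sketched as a saturating up/down counter. The initial state and the taken threshold at the midpoint are assumptions; the slide gives only the counter width.

```python
class ThreeBitCounter:
    """Saturating 3-bit counter: states 0..7, predict taken when state >= 4."""

    def __init__(self, state=4):
        self.state = state  # assumed initial state: weakly taken

    def predict(self):
        return self.state >= 4

    def update(self, taken):
        # Move toward 7 on taken, toward 0 on not taken, saturating at both ends.
        if taken:
            self.state = min(7, self.state + 1)
        else:
            self.state = max(0, self.state - 1)
```

Once saturated, a single opposite outcome does not flip the prediction, so the usual loop-exit branch costs one misprediction rather than two.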
Results
Optimizations:
1. Moving up instructions
2. Expanding the basic window (in eqntott and espresso)
Basic window <= basic block
But is 100% cache hit rate reasonable?
Discussion
• Compare this to CMP? RAW?
• Does the trade-off strike a balance?
New Results (1)
In-order execution
New Results (2)
Out-of-order execution