![Page 1: Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-order Processors Onur Mutlu, The University of Texas at Austin Jared Start,](https://reader035.vdocuments.mx/reader035/viewer/2022062417/55165310550346a2698b4c99/html5/thumbnails/1.jpg)
Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-order Processors
Onur Mutlu, The University of Texas at Austin
Jared Stark, Microprocessor Research, Intel Labs
Chris Wilkerson, Desktop Platforms Group, Intel Corp
Yale N. Patt, The University of Texas at Austin
Presented by: Mark Teper
Outline
- The Problem
- Related Work
- The Idea: Runahead Execution
- Details
- Results
- Issues
Brief Overview
- Instruction Window: set of in-order instructions that have not yet been committed
- Scheduling Window: set of unexecuted instructions that still need to be selected for execution
- What can go wrong?
[Figure: Program Flow, Instruction Window, Scheduling Windows, Execution Units]
The Problem
[Figure: the Instruction Window along the Program Flow. Legend: Unexecuted Instruction, Executing Instruction, Long Running Instruction, Committed Instruction]
Filling the Instruction Window
[Figure: IPC chart with a "Better" direction marker]
Related Work
- Caches:
  - Alter size and structure of caches
  - Attempt to reduce unnecessary memory reads
- Prefetching:
  - Attempt to fetch data into a nearby cache before it is needed
  - Hardware & software techniques
- Other techniques:
  - Waiting instruction buffer (WIB)
  - Long-latency block retirements
[Figure: memory hierarchy. CPU; L1 Cache: 1 cycle; L2 Cache: 10 cycles; Memory: 1000 cycles]
RunAhead Execution
- Continue executing instructions during long stalls
- Disregard results once data is available
[Figure: the Instruction Window along the Program Flow. Legend: Unexecuted Instruction, Executing Instruction, Long Running Instruction, Committed Instruction]
Benefits
- Acts as a high-accuracy prefetcher
  - Software prefetchers have less information
  - Hardware prefetchers can't analyze code as well
- Biases predictors
- Makes use of cycles that are otherwise wasted
Entering RunAhead
- Processors can enter run-ahead mode at any point
  - L2 cache misses are the trigger used in the paper
- Architecture needs to be able to checkpoint and restore register state
  - Including the branch-history register and return address stack
Handling Avoided Read
- Run-ahead trigger returns immediately
  - Value is marked as INV
- Processor continues fetching and executing instructions
ld r1, [r2]
add r3, r2, r2
add r3, r1, r2
move r1, 0
[Figure: register table with rows R1, R2, R3]
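The INV marking described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's hardware: the tuple encoding of instructions, the `run_ahead` function, the `missing_addrs` set (addresses that would miss in the cache), and the initial register values are all hypothetical. It replays the four instructions from the slide and records the register state after each one.

```python
INV = object()  # hypothetical sentinel standing in for the INV (invalid) marker

def run_ahead(program, regs, missing_addrs, memory):
    """Execute a tiny instruction list, propagating INV to dependent results.

    Returns a list with one register snapshot per executed instruction.
    """
    regs = dict(regs)
    trace = []
    for op, dst, *srcs in program:
        if op == "ld":
            addr = regs[srcs[0]]
            if addr is INV or addr in missing_addrs:
                regs[dst] = INV  # the load "returns" immediately with INV
            else:
                regs[dst] = memory.get(addr, 0)
        elif op == "add":
            a, b = regs[srcs[0]], regs[srcs[1]]
            # any INV source makes the result INV
            regs[dst] = INV if (a is INV or b is INV) else a + b
        elif op == "move":
            regs[dst] = srcs[0]  # an immediate value makes the register valid again
        trace.append(dict(regs))
    return trace

# The instruction sequence from the slide; r2 = 8 is an assumed starting value.
program = [
    ("ld", "r1", "r2"),
    ("add", "r3", "r2", "r2"),
    ("add", "r3", "r1", "r2"),
    ("move", "r1", 0),
]
trace = run_ahead(program, {"r1": 0, "r2": 8, "r3": 0}, missing_addrs={8}, memory={})
```

In this sketch r1 becomes INV after the load, r3 is valid after the first add, INV after the second add (which reads r1), and r1 becomes valid again after the move.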
Executing Instructions in RunAhead
- Instructions are fetched and executed as normal
- Instructions are committed (retired) out of the instruction window in program order
  - If an instruction's registers are INV, it can be retired without executing
- No data is ever observable outside the CPU
Branches during RunAhead
- Divergence points: an incorrect branch prediction on a branch that depends on an INV value
1. Predict branch
2. Does the branch depend on INV?
   - Yes: assume the predictor is correct, continue execution
   - No: evaluate the branch
3. Was the branch predictor correct?
   - Yes: continue execution
   - No: flush the instruction queue
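The branch flow on this slide can be written as a small decision function. This is a hedged sketch: the function name `resolve_runahead_branch`, its parameters, and the string return values are assumptions made for illustration only.

```python
def resolve_runahead_branch(predicted_taken, depends_on_inv, actual_taken=None):
    """Return "continue" or "flush" for a branch seen in run-ahead mode.

    actual_taken is only meaningful when the branch does not depend on INV.
    """
    if depends_on_inv:
        # The condition cannot be evaluated, so the prediction is trusted.
        # If it happens to be wrong, this is a divergence point that goes unnoticed.
        return "continue"
    if predicted_taken == actual_taken:
        return "continue"
    return "flush"  # misprediction detected: flush the instruction queue
```

For example, `resolve_runahead_branch(True, False, actual_taken=False)` returns `"flush"`, while any branch with `depends_on_inv=True` returns `"continue"`.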
Exiting RunAhead
- Occurs when the stalling memory access finally returns
- Checkpointed architecture is restored
- All instructions in the machine are flushed
- Processor starts fetching again at the instruction which caused RunAhead execution
- Paper presented an optimization where fetching started slightly before the stalled instruction returned
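Checkpointing on entry and restoring on exit can be sketched as below. This is a software analogy under stated assumptions: the `ArchState` and `Core` classes and their field names are hypothetical, and a deep copy stands in for whatever hardware checkpoint mechanism is actually used. The checkpoint covers the registers, the branch-history register, and the return address stack.

```python
import copy
from dataclasses import dataclass, field

@dataclass
class ArchState:
    # Hypothetical container for the state the slides say must be checkpointed.
    regs: dict
    branch_history: int = 0
    return_stack: list = field(default_factory=list)

class Core:
    def __init__(self, state):
        self.state = state
        self.checkpoint = None
        self.resume_pc = None
        self.in_runahead = False

    def enter_runahead(self, stalled_pc):
        """Checkpoint state and remember the instruction that caused the stall."""
        self.checkpoint = copy.deepcopy(self.state)
        self.resume_pc = stalled_pc
        self.in_runahead = True

    def exit_runahead(self):
        """Discard run-ahead results, restore the checkpoint, return the refetch PC."""
        self.state = self.checkpoint
        self.checkpoint = None
        self.in_runahead = False
        return self.resume_pc

core = Core(ArchState(regs={"r1": 1}))
core.enter_runahead(stalled_pc=0x40)
core.state.regs["r1"] = 99            # speculative run-ahead update
core.state.return_stack.append(0x80)  # speculative call during run-ahead
refetch_pc = core.exit_runahead()
```

After `exit_runahead`, the speculative updates are gone and fetching restarts at the stalled instruction.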
Biasing Branch Predictors
- RunAhead can cause branch predictors to be biased twice on the same branch
- Several alternatives:
  (1) Always train branch predictors
  (2) Never train branch predictors
  (3) Create a list of predicted branches
  (4) Create a separate branch predictor
RunAhead Cache
- RunAhead execution disregards stores
  - Can't produce externally observable results
- However, this data is needed for communication
  - Solution: Run-Ahead cache
Loop:
…
store r1, [r2]
add r1, r3, r1
store r1, [r4]
load r1, [r2]
bne r1, r5, Loop
Stores and Loads in Run Ahead
Loads
1. If the address is INV, the data is automatically INV
2. Next look in:
   1. Store buffer
   2. RunAhead Cache
3. Finally go to memory
   1. If in cache, treat as valid
   2. If not, treat as INV, don't stall
Stores
1. Use the store buffer as usual
2. On commit:
   1. If the address is INV, ignore
   2. Otherwise write the data to the RunAhead Cache
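The load and store rules above can be sketched as a lookup chain. This is a minimal model under stated assumptions: the `RunaheadMemory` class, its method names, and the use of Python dictionaries for the data cache and RunAhead Cache are hypothetical, and real structures track INV per byte or per line rather than per value.

```python
from collections import deque

INV = object()  # hypothetical sentinel for an invalid address or value

class RunaheadMemory:
    def __init__(self, data_cache):
        self.data_cache = data_cache  # addr -> value for lines present in the cache
        self.store_buffer = deque()   # uncommitted stores, oldest first
        self.runahead_cache = {}      # committed run-ahead stores

    def store(self, addr, value):
        """Stores enter the store buffer as usual."""
        self.store_buffer.append((addr, value))

    def commit_oldest_store(self):
        """On commit, drop INV-address stores; others go to the RunAhead Cache."""
        addr, value = self.store_buffer.popleft()
        if addr is not INV:
            self.runahead_cache[addr] = value

    def load(self, addr):
        if addr is INV:
            return INV  # unknown address: result is automatically INV
        for a, v in reversed(self.store_buffer):  # youngest matching store wins
            if a is not INV and a == addr:
                return v
        if addr in self.runahead_cache:
            return self.runahead_cache[addr]
        if addr in self.data_cache:
            return self.data_cache[addr]  # cache hit: treat as valid
        return INV  # cache miss: mark INV instead of stalling

mem = RunaheadMemory(data_cache={0x10: 7})
hit = mem.load(0x10)
miss = mem.load(0x20)
mem.store(0x20, 5)
forwarded = mem.load(0x20)      # served from the store buffer
mem.commit_oldest_store()
after_commit = mem.load(0x20)   # served from the RunAhead Cache
mem.store(INV, 9)
mem.commit_oldest_store()       # ignored because the address is INV
```

Nothing in this model writes back to `data_cache`, which mirrors the requirement that run-ahead stores never become externally observable.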
Run-Ahead Cache Results
- Found that not passing data from stores to loads resulted in poor performance
  - Significant number of INV results
[Figure: results chart with a "Better" direction marker]
Details: Architecture
Results
[Figure: results chart with a "Better" direction marker]
Results (2)
[Figure: results chart with a "Better" direction marker]
Issues
- Some wrong assumptions about future machines
  - Future baseline corresponds poorly to modern architectures
- Not a lot of detail on the architectural requirements for this technique
  - Increased architecture size
  - Increased power requirements