Chen-Yong Cher & T. N. Vijaykumar
School of Electrical and Computer Engineering, Purdue University
http://www.ece.purdue.edu/~vijay
Copyright © 2001 by Chen-Yong Cher & T. N. Vijaykumar
Accuracy is not 100% due to difficult branches
  • Complex branching patterns
  • Conflicts in prediction tables
Trends show deeper pipelines (e.g., 20-stage Pentium 4)
  • One misprediction squash
    • At least 15 cycles or 15 x 4 = 60 instructions
    • At 5% mispredictions, CPI = 0.25 + 0.2*0.05*15 = 0.40
    • Actually, squashes cost more due to late outcomes
Branch mispredictions cause significant performance loss
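The CPI estimate above can be checked with a few lines of arithmetic. This is a minimal sketch; the function and variable names are ours, and reading 0.25 as the base CPI of a 4-wide machine and 0.2 as the fraction of instructions that are branches is an assumption inferred from the numbers on the slide.

```python
def cpi_with_mispredictions(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """Base CPI plus the average squash cycles charged to each instruction."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


issue_width = 4                      # assumed: 4-wide issue, so base CPI = 1/4
base_cpi = 1 / issue_width
penalty_cycles = 15                  # "at least 15 cycles" per squash

cpi = cpi_with_mispredictions(base_cpi, branch_fraction=0.2,
                              mispredict_rate=0.05, penalty_cycles=penalty_cycles)
lost_slots = penalty_cycles * issue_width   # instructions lost per squash

print(f"CPI = {cpi:.2f}, instructions lost per squash = {lost_slots}")
```

Under these assumptions the misprediction term (0.15) adds 60% to the base CPI of 0.25.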
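[Figure: control-flow graph. Branch PC2 falls through to A (not taken) or jumps to B (taken); both paths are control-flow dependent. They reconverge at C, which is control-flow independent and executed irrespective of the branch outcome.]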
Skip over control-flow dependent code
  • For only difficult branches
  • Without even fetching control-flow dependent code
  • Execute control-flow independent code
  • Execute control-flow dependent code after branch resolves
  • Conserve hardware resources
Today’s OoO pipelines routinely exploit data independence
  • But not control-flow independence directly
• Introduction
• Skipper: When and Where to Skip
• Skipper: How to skip
• Results
• Conclusions
[Figure: timelines for Branch PC2 (paths A and B reconverging at C) when the branch resolves not taken.]

Case       Prediction          Before branch resolves        After branch resolves
Correct    Predict not taken   Some of A & C                 Rest of A & C
Incorrect  Predict taken       Some of B & C                 Squash ALL B & C; re-execute ALL of A & C
Skipper    Skip                Some of data-independent C    A & rest of C
Execution is out of order
  • But fetch and rename are in order
  • Instruction window maintains precise interrupts
Relies on fetching in program order
[Figure: pipeline stages: predict/fetch, decode, rename, OoO issue, regread, execute, branch or cache, writeback]
Skipping results in out-of-order fetching
  • First fetch control-flow independent
  • Then fetch control-flow dependent
Convince an in-order fetch pipeline to fetch out-of-order!
• Introduction
• Skipper: When and Where to Skip
• Skipper: How to skip
• Results
• Conclusions
• When: only difficult branches
  • JRS low-confidence predictor [MICRO ‘96]
  • Count consecutive correct predictions
  • Identify as difficult if recently mispredicted
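The confidence test above can be sketched as a table of resetting counters. This is a minimal illustration, not the evaluated hardware: the class name, the PC-only index, and the threshold are assumptions, while the 4K entries and 4-bit counters match the configuration reported in the results section.

```python
class ResettingConfidenceTable:
    """Counts consecutive correct predictions per table entry."""

    def __init__(self, entries=4096, counter_bits=4, threshold=15):
        self.entries = entries
        self.max_count = (1 << counter_bits) - 1   # saturating limit
        self.threshold = threshold                 # assumed cut-off for "confident"
        self.counters = [0] * entries

    def _index(self, pc):
        # Assumption: word-aligned PCs and a PC-only index (no history hashing).
        return (pc >> 2) % self.entries

    def is_difficult(self, pc):
        """Low confidence: too few consecutive correct predictions so far."""
        return self.counters[self._index(pc)] < self.threshold

    def update(self, pc, prediction_correct):
        i = self._index(pc)
        if prediction_correct:
            self.counters[i] = min(self.counters[i] + 1, self.max_count)
        else:
            self.counters[i] = 0                   # recently mispredicted: reset


table = ResettingConfidenceTable(threshold=4)
pc = 0x4000
for _ in range(4):
    table.update(pc, prediction_correct=True)
confident_after_streak = not table.is_difficult(pc)
table.update(pc, prediction_correct=False)
difficult_after_miss = table.is_difficult(pc)
```

A single misprediction clears the streak, so a branch is treated as difficult again until it has been predicted correctly `threshold` times in a row.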
• Hardware heuristic based on If-Then-Else
• Learn and keep in table

        Branch PC2    # difficult branch (step 1)
        A
        …
        Jump PC3      # jump instruction (step 2)
  PC2:  B             # target of difficult branch
        …
  PC3:  C             # target of jump instruction

• Reconvergence PC: PC3 for If-Then-Else, PC2 for If (step 3)
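The three steps above can be sketched in software as follows. This is a hedged illustration only: the dictionary encoding of a program, unit-sized instructions, and the rule "look at the instruction just before the branch target" are our assumptions about how the jump in step 2 is found, not a description of the actual hardware.

```python
def learn_reconvergence_pc(program, branch_pc):
    """Return the reconvergence PC for a forward difficult branch.

    program maps PC -> (kind, target), with kind in {"branch", "jump", "op"}.
    """
    kind, target = program[branch_pc]              # step 1: difficult branch to PC2
    if kind != "branch":
        raise ValueError("branch_pc must hold a branch")

    # Step 2 (assumed detection rule): an if-then-else ends its fall-through
    # path with a forward jump placed right before the branch target.
    prev = program.get(target - 1)
    if prev is not None and prev[0] == "jump" and prev[1] > target:
        return prev[1]                             # step 3: If-Then-Else -> PC3
    return target                                  # step 3: plain If -> PC2


# If-Then-Else: branch at 0 targets 3; the jump at 2 skips B and lands on C at 5.
if_then_else = {0: ("branch", 3), 1: ("op", None), 2: ("jump", 5),
                3: ("op", None), 4: ("op", None), 5: ("op", None)}
# Plain If: no jump before the target, so the target itself is the reconvergence PC.
plain_if = {0: ("branch", 3), 1: ("op", None), 2: ("op", None), 3: ("op", None)}

reconvergence_table = {
    "if_then_else": learn_reconvergence_pc(if_then_else, 0),
    "plain_if": learn_reconvergence_pc(plain_if, 0),
}
print(reconvergence_table)
```

The learned PC would be kept in a table indexed by the branch, so that the next time the branch is flagged as difficult, fetch can jump directly to the reconvergence point.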
• Introduction
• Skipper: When and Where to Skip
• Skipper: How to skip
• Results
• Conclusions
• Create a gap in instruction window
• Fill the gap later when fetching skipped instructions
• Learn the gap length from past
  • Use largest length of if/else paths conservatively
  • Squash if actual instruction count exceeds gap length
Despite out-of-order fetch, program order in I-window
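The gap-length policy above can be sketched as a small table plus a fill check. The names are hypothetical, and padding unused slots with None is an assumption made for illustration; the slides do not say how leftover slots are handled.

```python
class GapLengthTable:
    """Remembers, per difficult branch, the longest path seen so far."""

    def __init__(self):
        self.gap_length = {}                       # branch PC -> slots to reserve

    def train(self, branch_pc, path_length):
        # Conservative: keep the largest observed if/else path length.
        previous = self.gap_length.get(branch_pc, 0)
        self.gap_length[branch_pc] = max(previous, path_length)

    def reserve(self, branch_pc):
        # None means nothing has been learned yet, so the branch is not skipped.
        return self.gap_length.get(branch_pc)


def fill_gap(reserved_slots, skipped_insts):
    """Place skipped instructions in the gap; return None to signal a squash."""
    if len(skipped_insts) > reserved_slots:
        return None                                # path longer than the gap
    padding = [None] * (reserved_slots - len(skipped_insts))
    return list(skipped_insts) + padding           # assumed: leftover slots stay empty


table = GapLengthTable()
for observed in (3, 5, 2):                         # path lengths seen in the past
    table.train(0x40, observed)

slots = table.reserve(0x40)
filled = fill_gap(slots, ["i1", "i2"])
overflow = fill_gap(slots, ["i1", "i2", "i3", "i4", "i5", "i6"])
```

Because the gap sits between the branch and the control-flow independent code, the window stays in program order even though the gap is filled last.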
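[Figure: instruction window from Head to Tail in program order. After Branch PC2, a gap is reserved for the control-flow dependent code (A or B), followed by the control-flow independent code C. C is fetched first and the gap contents are fetched later (out-of-order fetching).]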
[Figure: the same instruction window, with the gap for A or B annotated with its input registers (inputregs, 2 shown) and output registers (outputregs, 1 shown). C is fetched first; the gap contents are fetched later.]
• How will data-dependent instructions wait for skipped instructions?
  • Learn outputregs written by control-dependent insts
  • Preallocate and preassign physical registers for outputregs, mark “busy”
  • Insert pmove instructions after gap filled
  • Pmoves copy values to the preallocated registers after gap filled
  • If actual output not in outputregs, squash
Use normal rename and wake-up mechanism
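The preallocation and pmove steps above can be sketched with a dictionary-based rename map. This is an illustrative model under stated assumptions: all function and register names are hypothetical, the free list is a plain Python list, and a pmove is represented as a (source, destination) pair of physical registers.

```python
def preallocate_outputs(rename_map, free_list, busy, outputregs):
    """At skip time: map each learned output register to a fresh busy register."""
    prealloc = {}
    for arch in outputregs:
        phys = free_list.pop(0)
        rename_map[arch] = phys        # later consumers rename to this register
        busy.add(phys)                 # ...and wait on it via normal wake-up
        prealloc[arch] = phys
    return prealloc


def make_pmoves(prealloc, gap_rename_map, written_regs):
    """After the gap is filled: build pmoves, or return None to squash."""
    if not set(written_regs) <= set(prealloc):
        return None                    # actual output not in learned outputregs
    # Copy each output's final value inside the gap to its preallocated register.
    return [(gap_rename_map[arch], dst) for arch, dst in prealloc.items()]


rename_map = {"r1": "p1", "r2": "p2"}
free_list = ["p10", "p11"]
busy = set()

prealloc = preallocate_outputs(rename_map, free_list, busy, ["r2"])
# A skipped instruction later writes r2 into p11 (its mapping inside the gap).
gap_rename_map = {"r1": "p1", "r2": "p11"}
pmoves = make_pmoves(prealloc, gap_rename_map, written_regs=["r2"])
squash = make_pmoves(prealloc, gap_rename_map, written_regs=["r3"])
```

In this model a control-flow independent consumer of r2 is renamed to p10 before the skipped code is fetched, and it wakes up only when the pmove from p11 to p10 completes.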
• How will control-flow dependent instructions know the correct registers to source?
  • Learn inputregs read by control-dependent insts
  • Cannot back up all rename maps in a single cycle
  • Back up only inputregs and outputregs
  • Skipped instructions use backup rename table
  • If actual input not in inputregs, squash
Use normal rename backup mechanism
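The backup rename table above can be sketched as a partial copy of the rename map taken at the difficult branch. This is a simplified model under stated assumptions: the function names, the list-based free list, and the (dest, srcs) instruction shape are all hypothetical.

```python
def backup_rename_entries(rename_map, inputregs, outputregs):
    """At the difficult branch: copy only the learned input/output mappings."""
    regs = set(inputregs) | set(outputregs)
    return {arch: rename_map[arch] for arch in regs}


def rename_skipped(backup_map, free_list, dest, srcs):
    """Rename one skipped instruction; return None to signal a squash."""
    if any(src not in backup_map for src in srcs):
        return None                    # actual input not in learned inputregs
    src_phys = [backup_map[src] for src in srcs]
    dest_phys = free_list.pop(0)       # allocate a new register for the result
    backup_map[dest] = dest_phys       # later skipped instructions see this value
    return dest_phys, src_phys


rename_map = {"r1": "p1", "r2": "p2", "r3": "p3"}
backup = backup_rename_entries(rename_map, inputregs=["r1"], outputregs=["r2"])

# Control-flow independent code fetched first may overwrite r1 in the main map.
rename_map["r1"] = "p20"

ok = rename_skipped(backup, ["p30"], dest="r2", srcs=["r1"])
bad = rename_skipped(backup, ["p31"], dest="r2", srcs=["r3"])
```

The skipped instruction reads r1 from p1, the mapping in effect at the branch, rather than from p20, which belongs to code that is later in program order.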
[Figure: Skipper pipeline (fet, dec, ren, OoO issue, read, exec, mem/br, wb) with flows labeled Usual, Difficult Branch, and Skipped.]
  • Difficult branch: backup in/outputregs’ rename maps; preassign for outputregs; mark busy; create Inst-Window gap; fetch next from reconv PC
  • Skipped: fetch skipped insts; lookup in backup rename table; allocate new regs; place in Inst-Window gap; last inst inserts pmoves
• Introduction
• Skipper: When and Where to Skip
• Skipper: How to skip
• Results
• Conclusions
SimpleScalar simulator
  • 8K/8K/8K-entry hybrid predictors, commit-update
  • 9-cycle misprediction penalty
  • 4K-entry, 4-bit JRS
  • 64K 2-way L1 I & D caches, 2M L2 cache
  • 128-entry information table of 3KB total
• Speedup 10% over base
  • compress: deep data dependences
  • cc1, go: mispredictions in control-dependent path
  • perl, vortex: low misprediction rate and low coverage
[Figure: bar chart of speedup (y-axis 0.90 to 1.20) for cc1, compress, go, ijpeg, li, m88ksim, perl, vortex; series labeled 128 and 256.]
• Speedup 8% over Polypath
  • Polypath executes both if & else paths
  • Equal I-cache bandwidth for all machines
[Figure: bar chart of speedup (y-axis 0.90 to 1.20) for cc1, compress, go, ijpeg, li, m88ksim, perl, vortex; series: Skipper 128, Polypath 128, Skipper 256, Polypath 256.]
• Actual coverage mean: 23% of mispredictions
• Overshoot mean: 4.3% of all branches
• Mean branch misprediction rate
  • Skipper’s: 4.06%
  • Superscalar’s: 6.53%
Exploits control-flow independence for difficult branches
  • Fetch control-independent code while branch resolves
  • Fetch control-dependent code after the branch is resolved
  • Out-of-order instruction fetch
  • Mechanisms: Inst-Window gap, preallocation, pmoves
  • Performs better
    • 10% over Superscalar
    • 8% over Polypath
Predictor relies on fetching in-order
[Figure: Branch PC2 and Branch PC6 with blocks A, B, C… in program order. Prediction history is shifted into the pattern history; skipping leaves missing pattern histories.]
• Compiler might confuse reconvergence PC heuristic
  1. Compiler changes code patterns (trace scheduling)
     • But only performed on non-difficult branches
  2. Compiler changes control instructions (branch to jump)
  3. Compiler increases # of control-dependent instructions (example: tail duplication)
     • Increasing gap length to unacceptably large number
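[Table: per-benchmark results. Column groups: Heuristic Accuracy, Coverage, Misprediction Ratio. Columns as extracted, left to right: Superscalar’s, Skipper’s, Overshoot, Actual, Heuristic, JRS, Benchmarks. Cell values are run together in each row.]
2416891208798   go
112177988       vortex
432169894       perl
4211329078      m88ksim
846177796       li
938589690       ijpeg
12892510098     compress
1087197592      cc1
Detached column values: 981009910098 10092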
Benchmarks  #gaps  #in  #out  #inst  #slot
go          1.4    8    8     10     21
vortex      1.0    5    5     8      13
perl        1.3    5    5     4      8
m88ksim     2.1    4    4     5      9
li          1.2    5    5     9      16
ijpeg       2.0    6    6     5      13
compress    1.5    4    3     4      10
cc1         1.4    6    4     7      14