selective topics in micro-architecture design: branch...
TRANSCRIPT
Selective Topics in Micro-Architecture Design:
Branch Prediction
Lei Chen
June 15, 2009 IEEE CTS ComSoc/SP Talk 2
Outline• Processor, Pipelining and the Branch Problem
• The “Solution”: Branch Prediction
– Design Examples
• The Journal of Instruction-Level Parallelism (JILP) Championship
Branch Prediction Competition
• Branch Prediction and Others
• Branch Prediction Research and Design in Academia vs. Industry
• Issues in Branch Prediction and its Future
• Conclusions
June 15, 2009 IEEE CTS ComSoc/SP Talk 3
The Basic 5-stage PipelinePattterson and Hennessy, “Computer Organization and Design”
IM Reg DM Reg
IM Reg DM Reg
CC 1 CC 2 CC 3 CC 4 CC 5 CC 6
Time (in clock cycles)
beq $10, $1, 40
Program execution order (in instructions)
sub $11, $2, $3
ALU
ALU
June 15, 2009 IEEE CTS ComSoc/SP Talk 4
The Actual Processor
• Instruction Fetch• Decode and Rename• Instruction Issue
• Instruction Execution• Writeback and
Commit
Instruction Cache
Data Cache
Instruction Fetch
Predictor Branch
Instruction Fetch Queue Decode Rename ROB Register File
Function Units
Instruction Issue Queue
June 15, 2009 IEEE CTS ComSoc/SP Talk 5
Two Real Processorshttp://bwrc.eecs.berkeley.edu/CIC/die_photos/pentium.gif/, http://www.theinquirer.net/img/3911/nove3.jpg.
Pentium Power5
June 15, 2009 IEEE CTS ComSoc/SP Talk 6
The Branch Problem
• History– ENIAC
• First electronic computer– Stretch (IBM 7030)
• Pipeline processing
• More critical in superscalar and deeply-pipelined processors– more branch instrs, higher mispred penalty– Performance bottleneck
June 15, 2009 IEEE CTS ComSoc/SP Talk 7
The “Solution”: Branch Prediction
• An educated guess of branch direction– Taken or not-taken
• Static– fixed before program execution– adv: little hardware cost– disadv: low accuracy, application-specific
• Dynamic– adv: higher accuracy– disadv: hardware cost, power
June 15, 2009 IEEE CTS ComSoc/SP Talk 8
Another “Solution”: Branch Predication
• Allow instructions dependant on branches to conditionally update architecture state
• Basic idea: – Conditional branches set predicate registers– Dependant instructions continue execution
but update architecture state based on the associated predicate register value
• Good for short branches, no mispredictionbut fixed overhead
June 15, 2009 IEEE CTS ComSoc/SP Talk 9
Static Branch Prediction
• Always taken– Sun SuperSparc
• Always not taken– MIPS R4000, Intel i486
• Instruction specific– PowerPC 601
• Backward-taken-forward-not-taken: – HP PA-7x00
• Profile-based
June 15, 2009 IEEE CTS ComSoc/SP Talk 10
Dynamic Branch Prediction
• 1-bit– Alpha 21064 instruction cache
• Bimodal (2-bit): 00, 01, 10, 11– Alpha 21064A, Alpha 21164 and PowerPC 604
• History-based– Gselect and Gshare
• use global history– Local
• use history from each branch– Path-based
• further distinguish the paths leading to a branch
June 15, 2009 IEEE CTS ComSoc/SP Talk 11
Branch Prediction AccuracyLinley Gwennap, “New Algorithm Improves Branch Prediction”
June 15, 2009 IEEE CTS ComSoc/SP Talk 12
2-level Adaptive Branch PredictionPattern History Table (PHT)
Branch History Register (BHR)(Shift left when update)
Re-kRe-k+1 ...... Re-2Re-1
.......1 1 1 0
00......0000......01 . . . . . .11......1011......11
Branch History Pattern
Index
June 15, 2009 IEEE CTS ComSoc/SP Talk 13
Local Branch Predictor
Index
PatternHistoryTable(GPHT)
Per-addressBranchHistoryTable(PBHT)
.
.
.
Global
June 15, 2009 IEEE CTS ComSoc/SP Talk 14
Combined Branch Predictor
Local PHT
MUX
XOR
Global History
Selector PHT
Global PHT
Local BHT
PC
Selector
Final Prediction
Global Prediction
Local Prediction
Select
Local Predictor
Gshare Predictor
June 15, 2009 IEEE CTS ComSoc/SP Talk 15
Multi-Hybrid Predictor
. . .
Branch Target Buffer
Predictor Selection Counters
Priority Encoder
1 N. . .
Predictor 1 Predictor N
Prediction
June 15, 2009 IEEE CTS ComSoc/SP Talk 16
The Aliasing Problem
• Finite indexing tables• Destructive aliasing
– different bias• Non-destructive aliasing
– same bias• De-aliasing branch predictors
– Agree, Bi-Mode, Skewed, Filter, and YAGS
June 15, 2009 IEEE CTS ComSoc/SP Talk 17
Agree Branch Predictor
bias bit
Branch Target Buffer (BTB)
address history
+
Prediction
PHT
agree/disagree
June 15, 2009 IEEE CTS ComSoc/SP Talk 18
Bi-Mode Branch Predictor
T PHT
+
Prediction
address history
Choice PHT NT PHT
June 15, 2009 IEEE CTS ComSoc/SP Talk 19
Skew Branch Predictor
Bank 1
address history
Prediction
majority vote
Bank 2 Bank 3
f1 f2 f3
June 15, 2009 IEEE CTS ComSoc/SP Talk 20
Filter Branch Predictor
bias bit
address history
PHT+
Branch Target Buffer (BTB)
Prediction
counter
June 15, 2009 IEEE CTS ComSoc/SP Talk 21
2Bc-gskew Branch Predictor (EV8)BIM
G0
G1
Meta
address
address history
majorityvote
Prediction
metaprediction
bimodal prediction
e-gskew prediction
June 15, 2009 IEEE CTS ComSoc/SP Talk 22
Perceptron Branch Predictor
June 15, 2009 IEEE CTS ComSoc/SP Talk 23
Branch Prediction Using Register Values
• PREG prediction– Using profile information to find the branches
correlated with data values • Branch difference predictor
– Maintains the history of differences between branch source registers
– backing predictor, rare event predictor• Branch prediction through value prediction
June 15, 2009 IEEE CTS ComSoc/SP Talk 24
Branch Prediction through Value Prediction
June 15, 2009 IEEE CTS ComSoc/SP Talk 25
Branch Prediction Using Data Dependence Information
• ARVI branch predictor– Available Register Value Information– Data dependence -> branch calculation
• Dynamic Dataflow-based Identification of Correlated Branches – Identify correlated branches from a large
global history• Multiple-cycle overriding schemes
June 15, 2009 IEEE CTS ComSoc/SP Talk 26
Intuition Behind ARVI
• Register Values
+Branch Calculation
=
Branch Outcome
ADDIU R2, R7, #1SUBU R3, R12, R2SLT R2, R3, R7BEQ R2, R0, #2
R12 R7
Branch Outcome
June 15, 2009 IEEE CTS ComSoc/SP Talk 27
p1 xor p2 p3
p2 + p3 p4
p4 or p4 p5
BEQ p5, 0, #8
• The outstanding data dependence chain leading to the branch
Branch Dependence Chain
Committed Instructions
Outstanding Instructions
June 15, 2009 IEEE CTS ComSoc/SP Talk 28
DDT ConfigurationValid
1
1
1
0
p1 p2 p3 p4 p5In-Flight Instructions
p1 xor p2 p3
p2 + p3 p4
p4 + p4 p5
X X
X
X
X
X
Xi, j: Instri pjdependence
June 15, 2009 IEEE CTS ComSoc/SP Talk 29
DDT OperationValid p1 p2 p3 p4 p5
0
0
0
0
In-Flight Instructions
p1 xor p2 p3
June 15, 2009 IEEE CTS ComSoc/SP Talk 30
X
DDT OperationValid p1 p2 p3 p4 p5
0
0
0
1 p1 xor p2 p3
In-Flight Instructions
OR ANDPDEST
June 15, 2009 IEEE CTS ComSoc/SP Talk 31
DDT OperationValid p1 p2 p3 p4 p5
1
0
0
0
X p1 xor p2 p3
In-Flight Instructions
p2 + p3 p4
OR ANDPDEST
June 15, 2009 IEEE CTS ComSoc/SP Talk 32
X
X
DDT OperationValid p1 p2 p3 p4 p5
1
0
0
p2 + p3 p41
X p1 xor p2 p3
In-Flight Instructions
OR ANDPDEST
June 15, 2009 IEEE CTS ComSoc/SP Talk 33
DDT Operation
p1 xor p2 p3
p2 + p3 p4
Valid p1 p2 p3 p4 p5
1
1
0
0
X
In-Flight Instructions
X
X
p4 + p4 p5
OR ANDPDEST
June 15, 2009 IEEE CTS ComSoc/SP Talk 34
X
X
X
DDT OperationValid p1 p2 p3 p4 p5
1
1
0
X
1
X
X
In-Flight Instructions
p1 xor p2 p3
p2 + p3 p4
p4 + p4 p5
OR ANDPDEST
June 15, 2009 IEEE CTS ComSoc/SP Talk 35
DDT OperationValid p1 p2 p3 p4 p5
1 X XX
X1
1
In-Flight Instructions
p1 xor p2 p3
p2 + p3 p4
p4 + p4 p5
X
X
0
OR ANDPDEST
June 15, 2009 IEEE CTS ComSoc/SP Talk 36
DDT Operation
p2 + p3 p4
p4 + p4 p5
Valid p1 p2 p3 p4 p5
XX
X
X
X
X1
1
0
In-Flight Instructions
0
< Committed >
OR ANDPDEST
June 15, 2009 IEEE CTS ComSoc/SP Talk 37
Register Set Extractor (RSE)
Valid p1 p2 p3 p4 p5
1
1
1
X XX
X X
X1
In-Flight Instructions
p1 xor p2 p3
p2 + p3 p4
p4 + p4 p5
BEQ p5, 0, #8
p1 p2 p3 p4 p5
S S T
S S
S
T
T
1 1 0 0 0
June 15, 2009 IEEE CTS ComSoc/SP Talk 38
ARVI Operations
DDT
RSE Filter
Index = H(Reg Values, PC)
BVIT
Tag Prediction
HitComparePrediction
Tag = F(logical reg ID, path length)
p1 xor p2 p3
p2 + p3 p4
p4 or p4 p5
BEQ p5, 0, #8
Extract the reg set ofthe branch DD chain
p1, p2, p3, p4, p5
June 15, 2009 IEEE CTS ComSoc/SP Talk 39
ARVI Access Timing
RSEDDT
depth calculationData dependence(first level
predictoraccessedhere)
Cycle 1 Cycle 5Cycle 4Cycle 3Cycle 2
BVIT
tag cntr.perf.depth ID
tag
Adder tree of register IDs
table lookupRegister map
instructionBranch
initial prediction
map tableShadow
Reg FileShadow
Read Read
Read
comparetags
hit
predictionARVIpred
PC
XOR tree
June 15, 2009 IEEE CTS ComSoc/SP Talk 40
Outline• Processor, Pipelining and the Branch Problem
• The “Solution”: Branch Prediction
– Design Examples
• The Journal of Instruction-Level Parallelism (JILP) Championship
Branch Prediction Competition
• Branch Prediction and Others
• Branch Prediction Research and Design in Academia vs. Industry
• Issues in Branch Prediction and its Future
• Conclusions
June 15, 2009 IEEE CTS ComSoc/SP Talk 41
JILP Championship Branch Prediction Competition (CBP)
• Objective: Compare branch predictors in a common framework
• Held in conjunction with MICRO• CBP-1: 2004, CBP-2: 2006• Result:
– Most are adapted perceptron predictor– A great deal of design engineering– Submissions from all around the world
June 15, 2009 IEEE CTS ComSoc/SP Talk 42
JILP Championship Branch Prediction Competition (CBP)
• Different branches need to be predicted by different schemes– Bias and profiling– Loop, call/return
• History length– Global– Variable length
• Update scheme is very important• No other information utilized
– static information about the instructions, data values, and memory addresses.
June 15, 2009 IEEE CTS ComSoc/SP Talk 43
• Processor, Pipelining and the Branch Problem
• The “Solution”: Branch Prediction
– Design Examples
• The Journal of Instruction-Level Parallelism (JILP) Championship
Branch Prediction Competition
• Branch Prediction and Others
• Branch Prediction Research and Design in Academia vs. Industry
• Issues in Branch Prediction and its Future
• Conclusions
Outline
June 15, 2009 IEEE CTS ComSoc/SP Talk 44
Branch Prediction and Others
• Data compression and information theory– Partial matching, Markov predictor– Entropy
• Signal processing– FAB: Fourier Analysis Branch
• Analog circuits– Digital-to-analog conversion for neural networks -> our brains?
• Hacking Intel branch prediction• Data encryption
– A spy-process running simultaneously with an RSA-process -> resource sharing
– Square-and-Multiply Algorithm (SM) -> branches -> measure mispredictions -> key
June 15, 2009 IEEE CTS ComSoc/SP Talk 45
Branch Prediction and Others
June 15, 2009 IEEE CTS ComSoc/SP Talk 46
• The Base Processor and Base Pipeline
• The Branch Problem
• The “Solution”: Branch Prediction
– Design Examples
• The Journal of Instruction-Level Parallelism (JILP) Championship Branch
Prediction Competition
• Branch Prediction and Others
• Branch Prediction Research and Design in Academia vs. Industry
• Issues in Branch Prediction and its Future
• Conclusions
Outline
June 15, 2009 IEEE CTS ComSoc/SP Talk 47
Branch Prediction in AcademiaISCA Branch Prediction Papers
0
1
2
3
4
5
198119
8919
9019
9119
9219
9319
9419
95*
1996
*19
97*
1998
*19
9920
0020
0120
0220
0320
0420
0520
0620
0720
0820
09
June 15, 2009 IEEE CTS ComSoc/SP Talk 48
Branch Prediction in IndustryUSPTO Granted "Branch Prediction" Patents
0
2
4
6
8
10
12
14
16
18
1984
1985
1986
1987
1988
1989
1990
1991
1992
1993
1994
1995
1996
1997
1998
1999
2000
2001
2002
2003
2004
2005
2006
2007
2008
June 15, 2009 IEEE CTS ComSoc/SP Talk 49
Branch Prediction in IndustryUSPTO Granted "Branch Prediction" Patents
0
5
10
15
20
25
30
35
AMD
ARM
DEC+
HP
Free
scal
e+Mo
toro
laFu
jits
u
IBM
Inte
lIP
-Fir
stMi
tsub
ishi
Next
gen
Sun
Tosh
iba
June 15, 2009 IEEE CTS ComSoc/SP Talk 50
Issues in Branch Prediction and Its Future
• Aliasing still a problem*– Time and space
• Global Branch History Length*– Longer and variable (CBP)
• Branch Predictor Delay– Overriding scheme
• Power Issues*• Systematic design approach
– Target prediction
June 15, 2009 IEEE CTS ComSoc/SP Talk 51
Conclusions
• The branch problem is due to pipeline processing and it will become more critical
• Static and dynamic branch prediction schemes have been proposed to solve the branch problem
• Branch prediction is still unsolved in industry and requires careful systematic engineering
Selective Topics in Micro-Architecture Design:
Branch Prediction
Lei Chen