Advanced Microarchitecture Lecture 4: Branch Predictors

Post on 15-Jan-2016


Page 1: Lecture 4: Branch Predictors. Direction: 0 or 1 Target: 32- or 64-bit value Turns out targets are generally easier to predict –Don’t need to predict NT

Advanced Microarchitecture
Lecture 4: Branch Predictors

Page 2: Direction vs. Target

• Direction: 0 or 1
• Target: 32- or 64-bit value

• Turns out targets are generally easier to predict
  – Don’t need to predict the NT target
  – The T target doesn’t usually change
    • or has a “nice” pattern, like subroutine returns

Lecture 4: Correlated Branch Predictors

Page 3: Branches Have Locality

• If a branch was previously taken, there’s a good chance it’ll be taken again in the future

    for(i = 0; i < 100000; i++){
        /* do stuff */
    }

This branch will be taken 99,999 times in a row.

Page 4: Simple Predictor

• Always predict NT
  – no fetch bubbles (always just fetch the next line)
  – does horribly on the previous for-loop example

• Always predict T
  – does pretty well on the previous example
  – but what if you have other control besides loops?

    p = calloc(num, sizeof(*p));
    if(p == NULL)
        error_handler( );

This branch is practically never taken.

Page 5: Last Outcome Predictor

• Do what you did last time

    0xDC08: for(i = 0; i < 100000; i++){
    0xDC44:     if( (i % 100) == 0 )
                    tick( );
    0xDC50:     if( (i & 1) == 1 )
                    odd( );
            }

Page 6: Misprediction Rates?

How often is the branch outcome != the previous outcome?

DC08: TTTTTTTTTTT ... TTTTTTTTTTNTTTTTTTTT ...  (100,000 iterations)
      2 / 100,000 mispredicts  →  99.998% prediction rate

DC44: TTTTT ... TNTTTTT ... TNTTTTT ...
      2 / 100  →  98.0%

DC50: TNTNTNTNTNTNTNTNTNTNTNTNTNTNT ...
      2 / 2  →  0.0%

Page 7: Saturating Two-Bit Counter

[FSM diagrams: last-outcome prediction uses two states (0 = predict NT, 1 = predict T). The 2bC (2-bit counter) uses four states 0–3, where states 0 and 1 predict NT and states 2 and 3 predict T. A T outcome transitions toward state 3; an NT outcome transitions toward state 0.]

Page 8: Example

[Worked warm-up example, 1bC vs. 2bC on the DC08 loop branch. The 1bC starts untrained at 0, mispredicts the first T, then predicts the remaining T’s correctly; each loop-ending N flips it back to 0, costing a second misprediction on the following T. The 2bC starts at 0 and takes a couple of T outcomes to saturate at 3, but each loop-ending N only drops it to state 2, so it still predicts T and mispredicts just once per loop.]

Only 1 mispredict per N branches now!
DC08: 99.999%   DC44: 99.0%

Page 9: Importance of Branches

• 98% → 99%
  – Whoop-Dee-Do!
  – Actually, it’s a 2% misprediction rate → 1%
  – That’s a halving of the number of mispredictions

• So what?
  – If the misprediction rate equals 50%, and 1 in 5 instructions is a branch, then the number of useful instructions that we can fetch is:
    5 × (1 + ½ + (½)² + (½)³ + … ) = 10
  – If we halve the miss rate down to 25%:
    5 × (1 + ¾ + (¾)² + (¾)³ + … ) = 20
  – Halving the miss rate doubles the number of useful instructions that we can try to extract ILP from

Page 10: Typical Organization of 2bC Predictor

[Figure: the PC (32 or 64 bits) is hashed down to log2(n) bits, which index a table of n counters; the selected counter supplies the prediction. FSM update logic takes the actual outcome and writes the updated counter back into the table.]

… back to predictors

Page 11: Typical Hash

• Just take the log2(n) least significant bits of the PC
• May need to ignore a few bits
  – In a 32-bit RISC ISA, all instructions are 4 bytes wide and all instruction addresses are 4-byte aligned → the two least significant bits of the PC are always zero and so they are not included
    • equivalent to right-shifting the PC by two positions before hashing
  – In a variable-length CISC ISA (ex. x86), instructions may start on arbitrary byte boundaries
    • probably don’t want to shift

Page 12: How about the Branch at 0xDC50?

• 1bC and 2bC don’t do too well (50% at best)
• But it’s still obviously predictable
• Why?
  – It has a repeating pattern: (NT)*
  – How about other patterns? (TTNTN)*
• Use branch correlation
  – The outcome of a branch is often related to previous outcome(s)

Page 13: Idea: Track the History of a Branch

[Figure: each predictor entry stores the branch’s previous outcome plus two 2-bit counters, one used when prev = 0 and one used when prev = 1. For the alternating (TN)* branch at 0xDC50, the entry trains until the prev=1 counter saturates at 0 (predict N) and the prev=0 counter saturates at 3 (predict T), after which every prediction is correct.]

Page 14: Deeper History Covers More Patterns

• What pattern has this branch predictor entry learned?

[Figure: the entry stores the last 3 outcomes plus one 2-bit counter per 3-bit history value, from prev=000 through prev=111.]

001 → 1; 011 → 0; 110 → 0; 100 → 1  ⇒  00110011001… = (0011)*

Page 15: Predictor Organizations

[Figure: three organizations. (1) The PC hash selects a private history and counter set: a different pattern for each branch PC. (2) The PC hash indexes into a shared set of patterns. (3) A mix of both.]

Page 16: Example (1)

• 1024 counters (2^10)
  – 32 sets (2^5)
    • 5-bit PC hash chooses a set
  – Each set has 32 counters
    • 32 × 32 = 1024
    • History length of 5 (log2(32) = 5)
• Branch collisions
  – 1000’s of branches collapsed into only 32 sets

[Figure: a 5-bit PC hash selects the set; 5 bits of history select the counter within the set.]

Page 17: Example (2)

• 1024 counters (2^10)
  – 128 sets (2^7)
    • 7-bit PC hash chooses a set
  – Each set has 8 counters
    • 128 × 8 = 1024
    • History length of 3 (log2(8) = 3)
• Limited patterns/correlation
  – Can now only handle a history length of three

[Figure: a 7-bit PC hash selects the set; 3 bits of history select the counter within the set.]

Page 18: Two-Level Predictor Organization

• Branch History Table (BHT)
  – 2^a entries
  – h-bit history per entry
• Pattern History Table (PHT)
  – 2^b sets
  – 2^h counters per set
• Total size in bits
  – h·2^a + 2·2^(b+h)

[Figure: a bits of the PC hash index the BHT; b bits index a PHT set; the h-bit history read from the BHT selects a counter within that set. Each PHT entry is a 2-bit counter.]

Page 19: Classes of Two-Level Predictors

• h = 0 or a = 0 (degenerate case)
  – Regular table of 2bC’s (b = log2 of the number of counters)
• h > 0, a > 1
  – “Local History” 2-level predictor
• h > 0, a = 1
  – “Global History” 2-level predictor

Page 20: Global vs. Local Branch History

• Local behavior
  – What is the predicted direction of branch A given the outcomes of previous instances of branch A?
• Global behavior
  – What is the predicted direction of branch Z given the outcomes of all* previous branches A, B, …, X and Y?

* the number of previous branches tracked is limited by the history length

Page 21: Why Global Correlations Exist

• Example: related branch conditions

        p = findNode(foo);
    A:  if ( p is parent )
            do something;

        do other stuff; /* may contain more branches */

    B:  if ( p is a child )
            do something else;

The outcome of the second branch is always the opposite of the first branch.

Page 22: Other Global Correlations

• Testing same/similar conditions
  – code might test for NULL before a function call, and the function might test for NULL again
  – in some cases it may be faster to recompute a condition rather than save a previous computation in memory and re-load it
  – partial correlations: one branch could test for cond1, and another branch could test for cond1 && cond2 (if cond1 is false, then the second branch can be predicted as false)
  – multiple correlations: one branch tests cond1, a second tests cond2, and a third tests a combination of cond1 and cond2 (which can always be predicted if the first two branches are known)

Page 23: A Global-History Predictor

[Figure: a single global branch history register (BHR) holds the outcomes of the last h branches. Its h bits are combined with a b-bit PC hash to form the (b+h)-bit index into the table of 2-bit counters.]

Page 24: Similar Tradeoff Between b and h

• For a fixed number of counters
  – Larger h → smaller b
    • Larger h → longer history
      – able to capture more patterns
      – longer warm-up/training time
    • Smaller b → more branches map to the same set of counters
      – more interference
  – Larger b → smaller h
    • just the opposite…

Page 25: Motivation for Combined Indexing

• Not all 2^h “states” are used
  – (TTNN)* only uses half of the states for a history length of 3, and only ¼ of the states for a history length of 4
  – (TN)* only uses two states no matter how long the history is
• Not all bits of the PC are uniformly distributed
• Not all bits of the history are uniformly likely to be correlated
  – more recent history is more likely to be strongly correlated

Page 26: Combined Index Example: gshare

• S. McFarling (DEC-WRL TR, 1993)

[Figure: a k-bit PC hash is XORed with the k-bit global history to form the table index, where k = log2 of the number of counters.]

Page 27: gshare Example

Branch Address   Global History   Gselect 4/4   Gshare 8/8
00000000         00000001         00000001      00000001
00000000         00000000         00000000      00000000
11111111         00000000         11110000      11111111
11111111         10000000         11110000      01111111

Insufficient history leads to a conflict: gselect’s 4/4 concatenation produces the same index (11110000) for the last two cases, while gshare’s 8-bit XOR keeps them distinct.

Page 28: Some Interference May Be Tolerable

• Branch A: always not-taken
• Branch B: always taken
• Branch C: TNTNTN…
• Branch D: TTNNTTNN…

[Figure: even with these branches sharing counters, each global-history pattern (000, 111, 010, 101, 001, 011, 100, 110) still maps to a consistent outcome, so the shared counters saturate at 0 or 3 and the predictions stay correct.]

Page 29: And Then It Might Not

• Branch X: TTTNTTTN…
• Branch Y: TNTNTN…
• Branch Z: TTTT…

[Figure: with this mix of branches sharing the table, some history patterns now map to conflicting outcomes, so the shared counters (and the resulting predictions) become ambiguous.]

Page 30: Interference-Reducing Predictors

• There are patterns and asymmetries in branches
• Not all patterns occur with the same frequency
• Branches have biases
• This lecture:
  – Bi-Mode (Lee et al., MICRO ’97)
  – gskewed (Michaud et al., ISCA ’97)
• These are global-history predictors, but the ideas can be applied to other types of predictors

Page 31: gskewed Idea

• Interference occurs because two (or more) branches hash to the same index
• A different hash function can prevent this collision
  – but may cause other collisions
• Use multiple hash functions such that a collision can only occur in a few cases
  – use a majority vote to make the final decision

Page 32: gskewed Organization

[Figure: the PC and global history feed three different hash functions (hash1, hash2, hash3), each indexing its own pattern history table (PHT1, PHT2, PHT3); a majority vote over the three counters produces the prediction.]

The hash functions are designed so that if hash1(x) = hash1(y), then hash2(x) ≠ hash2(y) and hash3(x) ≠ hash3(y).

Page 33: gskewed Example

[Figure: worked example with two branches A and B. They collide in one of the three tables, but the majority vote over the other two still yields the correct prediction for each.]

Page 34: Combining Predictors

• Some branches exhibit local-history correlations
  – ex. loop branches
• While others exhibit global-history correlations
  – “spaghetti logic”, ex. if-elsif-elsif-elsif-else branches
• Using a global-history predictor prevents accurate prediction of branches exhibiting local-history correlations
• And vice versa

Page 35: Tournament Hybrid Predictors

Pred0       Pred1       Meta Update
incorrect   incorrect   ---
incorrect   correct     Inc
correct     incorrect   Dec
correct     correct     ---

[Figure: Pred0 and Pred1 each make a prediction; a meta-predictor (a table of 2-/3-bit counters) selects between them to form the final prediction.]

If the meta-counter MSB = 0, use pred0, else use pred1.

Page 36: Common Combinations

• Global history + local history
• “Easy” branches + global history
  – 2bC and gshare
• Short history + long history

• Many types of behavior, many combinations

Page 37: Multi-Hybrids

• Why only combine two predictors?

[Figure: four component predictors P0–P3, combined either by a tree of meta-predictors (M01 and M23 each pick within a pair, then another M picks between the pairs) or by a single meta-predictor M that selects among all four.]

• Tradeoff between making good individual predictions (P’s) vs. making good meta-predictions (M’s)
  – for a fixed hardware budget, improving one may hurt the other

Page 38: Prediction Fusion

• Selection discards information from n−1 predictors
• Fusion attempts to synthesize all information
  – more info to work with
  – possibly more junk to sort through

[Figure: with selection, M picks one of P0–P3’s predictions; with fusion, all of P0–P3’s predictions feed into M, which forms the final prediction.]

Page 39: Using Long Branch Histories

• Long global history provides more context for branch prediction/pattern matching
  – more potential sources of correlation
• Costs
  – For a PHT-based approach, HW cost increases exponentially: O(2^h) counters
  – Training time increases, which may decrease overall accuracy

Page 40: Predictor Training Time

• Ex: the prediction equals the opposite of the 2nd-most-recent outcome

• Hist len = 2 → 4 states to train:
  NN → T, NT → T, TN → N, TT → N

• Hist len = 3 → 8 states to train:
  NNN → T, NNT → T, NTN → N, NTT → N, TNN → T, …

Page 41: Neural Branch Prediction

• Uses the “perceptron” from classical machine learning theory
  – simplest form of a neural net (single layer, single node)
• Inputs are past branch outcomes
• Compute a weighted sum of the inputs
  – output is a linear function of the inputs
  – the sign of the output is used for the final prediction

Page 42: Perceptron Predictor

[Figure: each history bit is encoded as +1 (taken) or −1 (not-taken) to form inputs x1…xn; each xi is multiplied by its weight wi, and the products are summed in an adder along with a “bias” weight w0 whose input x0 is always 1. The prediction is taken if the sum ≥ 0.]

Page 43: Perceptron Predictor (2)

• The magnitude of weight wi determines how correlated branch i is to the current branch
• The sign of the weight determines positive or negative correlation
• Ex. the outcome is usually the opposite of the 5th-oldest branch
  – w5 has a large magnitude (L), but is negative
  – if x5 is taken, then w5·x5 = (−L)(1) = −L
    • tends to make the sum more negative (toward a NT prediction)
  – if x5 is not taken, then w5·x5 = (−L)(−1) = L

Page 44: Perceptron Predictor (3)

• When the actual branch outcome is known:
  – if xi = outcome, then increment wi (positive correlation)
  – if xi ≠ outcome, then decrement wi (negative correlation)
  – for x0, increment if the branch is taken, decrement if NT
• “Done with training”
  – if |Σ wi·xi| > θ, then don’t update the weights unless mispredicted

Page 45: Perceptron Trains Quickly

• If no correlation exists with branch i, then wi will just get incremented and decremented back and forth: wi ≈ 0
• If a correlation exists with branch j, then wj will be consistently incremented (or decremented) to have a large influence on the overall sum

Page 46: Linearly Inseparable Functions

• The perceptron computes a linear combination of its inputs
• It can only learn linearly separable functions

[Figure: two truth tables over xi, xj ∈ {−1, 1}. The left function (T only when xi = xj = −1, N otherwise) is separable, e.g. by f() = −3·xi − 4·xj − 5 with weights wi = −3, wj = −4, w0 = −5. The right function (T when xi ≠ xj, N when xi = xj) is not.]

• For the right function, no values of wi, wj, w0 exist to satisfy these outputs
• No straight line exists that separates the T’s from the N’s

Page 47: Overall Hardware Organization

[Figure: a PC hash selects one set of weights from a table of n perceptrons; the BHR bits are multiplied by the selected weights, the products are summed in an adder, and prediction = sign(sum).]

Size = (h+1)·k·n + h bits, plus Area(multipliers) + Area(adder)
  h = history length, k = counter (weight) width, n = number of perceptrons in the table

Page 48: GEHL

• GEometric History Length predictor

[Figure: several tables of k-bit weights are indexed by the PC hashed with increasingly long prefixes (L1 < L2 < L3 < L4) of a very long global branch history; the selected weights are summed and prediction = sign(sum).]

• L(i) = a^(i−1) · L(1): the history lengths form a geometric progression

Page 49: PPM Predictors

• PPM = Partial Pattern Matching
  – Used in data compression
  – Idea: Use the longest history necessary, but no longer

[Figure: a base table of 2bc counters plus four partially-tagged tables, indexed using progressively longer histories h1 < h2 < h3 < h4 (most recent to oldest). Each tagged table compares its stored partial tag against the lookup; a chain of muxes selects the prediction from the longest-history table whose tag matches, falling back to the base 2bc table.]

Page 50: TAGE Predictor

• Similar to PPM, but uses geometric history lengths
  – Currently the most accurate type of branch prediction algorithm
• References (www.jilp.org):
  – PPM: Michaud (CBP-1)
  – O-GEHL: Seznec (CBP-1)
  – TAGE: Seznec & Michaud (JILP)
  – L-TAGE: Seznec (CBP-2)