branch.1 10/14 branch prediction static, dynamic branch prediction techniques

23
branch.1 10/14 Branch Prediction Static, Dynamic Branch prediction techniques

Upload: roland-emery-goodman

Post on 17-Jan-2016

236 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Branch.1 10/14 Branch Prediction Static, Dynamic Branch prediction techniques

branch.110/14

Branch Prediction

Static, Dynamic Branch prediction techniques

Page 2: Branch.1 10/14 Branch Prediction Static, Dynamic Branch prediction techniques

branch.210/14

I-cache

Fetch Buffer

IssueBuffer

Func.Units

Arch.State

Execute

Decode

ResultBuffer Commit

PC

Fetch

Branchexecuted

Next fetch started

Modern processors have 10 -14 pipeline stages between next PC calculation and branch resolution !

Control Flow PenaltyWhy Branch Prediction

work lost if pipeline makes wrong prediction

~ Loop length x pipeline width

Page 3: Branch.1 10/14 Branch Prediction Static, Dynamic Branch prediction techniques

branch.310/14

Branch Penalties in a Superscalarare extensive

Page 4: Branch.1 10/14 Branch Prediction Static, Dynamic Branch prediction techniques

branch.410/14

Reducing Control Flow Penalty Software solutions

• Minimize branches - loop unrolling Increases the run length

Hardware solutions• Find something else to do - delay slots• Speculate –Dynamic branch prediction

Speculative execution of instructions beyond branch

Page 5: Branch.1 10/14 Branch Prediction Static, Dynamic Branch prediction techniques

branch.510/14

Motivation:Branch penalties limit performance of deeply pipelined processors

Much worse for superscalar processors

Modern branch predictors have high accuracy(>95%) and can reduce branch penalties significantly

Required hardware support:Dynamic Prediction HW:

• Branch history tables, branch target buffers, etc.

Mispredict recovery mechanisms:• Keep computation result separate from commit• Kill instructions following branch• Restore state to state following branch

Branch Prediction

Page 6: Branch.1 10/14 Branch Prediction Static, Dynamic Branch prediction techniques

branch.610/14

Static Branch Prediction- reviewOverall probability a branch is taken is ~60-70% but:

ISA can attach preferred direction semantics to branches, e.g., Motorola MC88110

bne0 (preferred taken) beq0 (not taken)

ISA can allow arbitrary choice of statically predicted direction, e.g., HP PA-RISC, Intel IA-64 typically reported as ~80% accurate

JZ

JZbackward

90%forward

50%

Page 7: Branch.1 10/14 Branch Prediction Static, Dynamic Branch prediction techniques

branch.710/14

Branch Prediction Needs

• Target address generation– Get register: PC, Link reg, GP reg.

– Calculate: +/- offset, auto inc/dec

– Target speculation

• Condition resolution– Get register: condition code reg, count reg.,

other reg.

– Compare registers

– Condition speculation

Page 8: Branch.1 10/14 Branch Prediction Static, Dynamic Branch prediction techniques

branch.810/14

Target address generation takes time

Page 9: Branch.1 10/14 Branch Prediction Static, Dynamic Branch prediction techniques

branch.910/14

Condition resolution takes time

Page 10: Branch.1 10/14 Branch Prediction Static, Dynamic Branch prediction techniques

branch.1010/14

Solution: Branch speculation

Page 11: Branch.1 10/14 Branch Prediction Static, Dynamic Branch prediction techniques

branch.1110/14

Branch Prediction Schemes

1. 2-bit Branch-Prediction Buffer

2. Branch Target Buffer

3. Correlating Branch Prediction Buffer

4. Tournament Branch Predictor

5. Integrated Instruction Fetch Units

6. Return Address Predictors (for subroutines, Pentium, Core Duo)

7. Predicated Execution (Itanium)

Page 12: Branch.1 10/14 Branch Prediction Static, Dynamic Branch prediction techniques

branch.1210/14

Dynamic Branch Predictionlearning based on past behavior

• Incoming stream of addresses

• Fast outgoing stream of predictions

• Correction information returned from pipeline

BranchPredictor

Incoming Branches{ Address }

Prediction{ Address, Value }

Corrections{ Address, Value }

History Information

Page 13: Branch.1 10/14 Branch Prediction Static, Dynamic Branch prediction techniques

branch.1310/14

Branch History Table (BHT)Table of predictors

• Each branch given its own predictor

• BHT is table of “Predictors”– Could be 1-bit or more

– Indexed by PC address of Branch

• Problem: in a loop, 1-bit BHT will cause two mispredictions (avg is 9 iterations before exit):

– End of loop case: when it exits loop– First time through loop, it predicts exit instead of looping

• most schemes use at least 2 bit predictors• Performance = ƒ(accuracy, cost of misprediction)

– Misprediction Flush Reorder Buffer

• In Fetch state of branch:– Use Predictor to make prediction

• When branch completes– Update corresponding Predictor

Predictor 0

Predictor 7

Predictor 1Branch PC

Page 14: Branch.1 10/14 Branch Prediction Static, Dynamic Branch prediction techniques

branch.1410/14

Branch History Table OrganizationTarget PC calculation takes time

4K-entry BHT, 2 bits/entry, ~80-90% correct predictions

0 0Fetch PC

Branch? Target PC

+

I-Cache

Opcode offset

Instruction

k

BHT Index

2k-entryBHT,2 bits/entry

Taken/¬Taken?

Page 15: Branch.1 10/14 Branch Prediction Static, Dynamic Branch prediction techniques

branch.1510/14

• Better Solution: 2-bit scheme where change prediction only if get misprediction twice:

• Red: stop, not taken

• Green: go, taken

• Adds hysteresis to decision making process

2-bit Dynamic Branch Predictionmore accurate than 1-bit

T

T

NT

Predict Taken

Predict Not Taken

Predict Taken

Predict Not TakenT

NT

T

NT

NT

Page 16: Branch.1 10/14 Branch Prediction Static, Dynamic Branch prediction techniques

branch.1610/14

BTB: Branch Address at Same Time as Prediction

• Branch Target Buffer (BTB): Address of branch index to get prediction AND branch address (if taken)

Branch PC Predicted PC

=?

PC

of in

stru

ctio

nFETC

H

prediction statebits

Yes: instruction is branch and use predicted PC as next PC

No: branch not predicted, proceed normally

(Next PC = PC+4)Only predicted taken branches and jumps held in BTB

Next PC determined before branch fetched and decodedlater: check prediction, if wrong kill instruction, update BPb

Page 17: Branch.1 10/14 Branch Prediction Static, Dynamic Branch prediction techniques

branch.1710/14

BTB contains only Branch & Jump Instructions

BTB contains information for branch and jump instructions only not updated for other instructions

For all other instructions the next PC is PC+4 !

Achieved without decoding instruction

Page 18: Branch.1 10/14 Branch Prediction Static, Dynamic Branch prediction techniques

branch.1810/14

Combining BTB and BHT• BTB entries considerably more expensive than BHT,

fetch redirected earlier in pipeline - can accelerate indirect branches (JR)

• BHT can hold many more entries - more accurate

A PC Generation/MuxP Instruction Fetch Stage 1F Instruction Fetch Stage 2B Branch Address Calc/Begin DecodeI Complete DecodeJ Steer Instructions to Functional unitsR Register File ReadE Integer Execute

BTB

BHTBHT in later pipeline stage corrects when BTB misses a predicted taken branch

BTB/BHT only updated after branch resolves in E stage

Page 19: Branch.1 10/14 Branch Prediction Static, Dynamic Branch prediction techniques

branch.1910/14

Subroutine Return Stack• Small stack – accelerate subroutine returns

• more accurate than BTBs.

Push return address when function call executed

Pop return address when subroutine return decoded

&nexta

&nextb

&nextc k entries(typically k=8-16)

Page 20: Branch.1 10/14 Branch Prediction Static, Dynamic Branch prediction techniques

branch.2010/14

Mispredict Recovery

In-order execution machines:– Instructions issued after branch cannot write-back before branch

resolves

– all instructions in pipeline behind mispredicted branch Killed

Page 21: Branch.1 10/14 Branch Prediction Static, Dynamic Branch prediction techniques

branch.2110/14

• Avoid branch prediction by turning branches into conditionally executed instructions:

if (x) then A = B op C else NOP– If false, then neither store result nor cause exception

– Expanded ISA of Alpha, MIPS, PowerPC, SPARC have conditional move; PA-RISC can annul any following instr.

– IA-64: 64 1-bit condition fields selected so conditional execution of any instruction

– This transformation is called “if-conversion”

• Drawbacks to conditional instructions– Still takes a clock even if “annulled”

– Stall if condition evaluated late

– Complex conditions reduce effectiveness; condition becomes known late in pipeline

x

A = B op C

Predicated Execution

Page 22: Branch.1 10/14 Branch Prediction Static, Dynamic Branch prediction techniques

branch.2210/14

Accuracy v. Size (SPEC89)

Page 23: Branch.1 10/14 Branch Prediction Static, Dynamic Branch prediction techniques

branch.2310/14

Dynamic Branch Prediction Summary

• Prediction becoming important part of scalar execution

• Branch History Table: 2 bits for loop accuracy

• Correlation: Recently executed branches correlated with next branch.

• Tournament Predictor: more resources to competitive solutions and pick between them

• Branch Target Buffer: include branch address & prediction

• Predicated Execution can reduce number of branches, number of mispredicted branches

• Return address stack for prediction of indirect jump