A 256 Kbits L-TAGE branch predictor

Page 1: A 256 Kbits L-TAGE branch predictor

André Seznec Caps Team

IRISA/INRIA

A 256 Kbits L-TAGE branch predictor

André Seznec

IRISA/INRIA/HIPEAC

Page 2: A 256 Kbits L-TAGE branch predictor

Directly derived from:

A case for (partially) tagged branch predictors, A. Seznec and P. Michaud JILP Feb. 2006

+ Tricks:

• Loop predictor
• Kernel/user histories

Page 3: A 256 Kbits L-TAGE branch predictor

TAGE: TAgged GEometric history length predictors

The genesis

Page 4: A 256 Kbits L-TAGE branch predictor

Back around 2003

2bcgskew was state-of-the-art, but was lagging behind neural-inspired predictors on a few benchmarks

Just wanted to get the best of both behaviors and maintain:

Reasonable implementation cost:
• Use only global history
• Medium number of tables

In-time response

Page 5: A 256 Kbits L-TAGE branch predictor

The basis: a multiple length global history predictor

[Figure: tables T0 to T4, indexed with history lengths L(0) to L(4); how to combine their outputs is left open (?)]

Page 6: A 256 Kbits L-TAGE branch predictor

GEometric History Length predictor

$L(0) = 0$, $L(i) = \alpha^{i-1} L(1)$

The set of history lengths forms a geometric series

{0, 2, 4, 8, 16, 32, 64, 128}

What is important: L(i) - L(i-1) is drastically increasing

Most of the storage goes to short histories!

Capture correlation on very long histories
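
As a rough illustration of the geometric series on this slide, the sketch below computes L(i) = α^(i-1) · L(1) from a chosen shortest and longest history length. The function name, the way α is derived from the two endpoints, and the rounding to the nearest integer are assumptions made for this example; they are not taken from the slides.

```python
def geometric_history_lengths(l_first, l_last, n_tagged):
    """Return [L(0), L(1), ..., L(n_tagged)] with L(0) = 0.

    Hypothetical helper: alpha is derived from the two endpoints and
    each length is rounded to the nearest integer (an assumption).
    """
    if n_tagged < 2:
        raise ValueError("need at least two non-zero history lengths")
    alpha = (l_last / l_first) ** (1.0 / (n_tagged - 1))
    lengths = [0]  # L(0) = 0: the first table uses no history
    for i in range(1, n_tagged + 1):
        lengths.append(int(l_first * alpha ** (i - 1) + 0.5))
    return lengths


print(geometric_history_lengths(2, 128, 7))
```

With L(1) = 2, a longest history of 128 and 7 non-zero lengths, this prints the series shown on the slide: [0, 2, 4, 8, 16, 32, 64, 128].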

Page 7: A 256 Kbits L-TAGE branch predictor

Combining multiple predictions ?

Classical solution: use of a meta-predictor
• “Wasting” storage!?
• Choosing among 5 or 10 predictions??

Neural-inspired predictors (Jimenez and Lin 2001):
• Use an adder tree instead of a meta-predictor

Partial matching (Chen et al 96, Michaud 2005):
• Use tagged tables and the longest matching history

Page 8: A 256 Kbits L-TAGE branch predictor

CBP-1 (2004): OGEHL

[Figure: tables T0 to T4, indexed with history lengths L(0) to L(4), feed an adder (∑)]

Final computation through a sum

Prediction = Sign

12 components: 3.670 misp/KI
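
The "sum, then sign" computation can be sketched in a few lines. This is a minimal illustration, assuming each table delivers a signed counter value and that a sum of exactly zero counts as taken; neither detail is stated on the slide, and `gehl_predict` is a hypothetical name.

```python
def gehl_predict(counters):
    """Combine per-table signed counters through a sum.

    counters: signed counter values read from each table (assumed).
    Returns (taken, total); a zero sum is treated as taken (assumption).
    """
    total = sum(counters)
    return total >= 0, total


taken, total = gehl_predict([3, -1, -1])
print(taken, total)
```

No meta-predictor is needed in this scheme: every table contributes to the sum, so the storage is spent only on prediction counters.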

Page 9: A 256 Kbits L-TAGE branch predictor

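TAGE: Geometric history length + PPM-like + optimized update policy

[Figure: a tagless base predictor indexed with pc, plus tagged tables indexed with hashes of pc and h[0:L1], h[0:L2], h[0:L3]; each tagged entry holds ctr, u and tag fields, a second hash is compared (=?) with the stored tag, and the comparisons select the final prediction]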

Page 10: A 256 Kbits L-TAGE branch predictor

[Figure: tag comparisons (=?) on the tagged tables yield two hits and one miss; the hitting components supply Pred and Altpred]

Page 11: A 256 Kbits L-TAGE branch predictor

Prediction computation

General case:
• Longest matching component provides the prediction

Special case:
• Many mispredictions on newly allocated entries: weak Ctr
• On many applications, Altpred is then more accurate than Pred
• Property dynamically monitored through a single 4-bit counter
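
The selection between Pred and Altpred can be sketched as follows. This is an illustrative sketch under stated assumptions: counters are modeled as signed 3-bit values in [-4, 3] where a non-negative value means taken, "weak" means -1 or 0, and Altpred is preferred when the 4-bit monitoring counter is at or above 8. The names `Entry`, `tage_predict` and `USE_ALT_THRESHOLD` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

USE_ALT_THRESHOLD = 8  # assumed midpoint of the 4-bit monitoring counter


@dataclass
class Entry:
    tag: int
    ctr: int      # signed 3-bit counter in [-4, 3]; >= 0 means taken (assumed)
    u: int = 0    # 2-bit useful counter


def is_weak(ctr: int) -> bool:
    # The two counter values closest to the taken/not-taken boundary.
    return ctr in (-1, 0)


def tage_predict(base_taken: bool, hits: List[Optional[Entry]],
                 use_alt_on_weak: int) -> bool:
    """hits[i] is the tag-matching entry of tagged table i (ordered from
    shortest to longest history), or None when that table misses."""
    matching = [e for e in hits if e is not None]
    if not matching:
        return base_taken                  # only the tagless base predictor
    provider = matching[-1]                # longest matching component
    pred = provider.ctr >= 0
    altpred = matching[-2].ctr >= 0 if len(matching) > 1 else base_taken
    if is_weak(provider.ctr) and use_alt_on_weak >= USE_ALT_THRESHOLD:
        return altpred                     # special case: weak Ctr
    return pred                            # general case
```

A natural update rule for the monitoring counter would increment it when Altpred was right and Pred wrong on a weak entry, and decrement it in the opposite case; that rule is not spelled out on the slide.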

Page 12: A 256 Kbits L-TAGE branch predictor

TAGE update policy

General principle:

Minimize the footprint of the prediction.

Just update the longest history matching component and allocate at most one entry on mispredictions

Page 13: A 256 Kbits L-TAGE branch predictor

A tagged table entry

Entry layout: Tag | Ctr | U

Ctr: 3-bit prediction counter

U: 2-bit useful counter
• Was the entry recently useful?

Tag: partial tag

Page 14: A 256 Kbits L-TAGE branch predictor

Updating the U counter

If (Altpred ≠ Pred) then
• Pred = taken: U = U + 1
• Pred ≠ taken: U = U - 1

Graceful aging:
Periodic shift of all U counters
• implemented through the reset of a single bit
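
A minimal sketch of the two mechanisms on this slide, assuming the 2-bit U counter saturates at 0 and 3 and that aging clears one chosen bit of every counter. Which bit is cleared at each aging period is left to the caller here, since the slide does not say; the function names are hypothetical.

```python
U_MAX = 3  # 2-bit useful counter, assumed to saturate at 0 and 3


def update_useful(u, pred, altpred, outcome):
    """Update U only when the provider and the alternate prediction differ."""
    if pred != altpred:
        if pred == outcome:
            u = min(u + 1, U_MAX)   # the entry made the difference
        else:
            u = max(u - 1, 0)       # the entry was harmful
    return u


def age_useful(u_counters, clear_msb):
    """Clear a single bit (MSB or LSB) of every 2-bit U counter."""
    mask = 0b01 if clear_msb else 0b10
    return [u & mask for u in u_counters]


print(age_useful([3, 2, 1, 0], clear_msb=True))
```

Clearing the most significant bit of the counters [3, 2, 1, 0] gives [1, 0, 1, 0], so entries that were recently useful keep some credit instead of being wiped at once.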

Page 15: A 256 Kbits L-TAGE branch predictor

Allocating a new entry on a misprediction

Find a single “useless” entry with a longer history:

Privilege the smallest possible history
• To minimize footprint

But not too much
• To avoid ping-pong phenomena

Initialize Ctr as weak and U as zero
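
One way to favor short histories "but not too much" is a randomized choice in which each candidate table is twice as likely as the next longer one. The sketch below assumes that policy, assumes "useless" means U = 0, and models a weak Ctr as 0 (weakly taken) or -1 (weakly not taken) in a signed encoding. All names are hypothetical, and the slide does not say what happens when no useless entry is found, so the sketch simply returns None in that case.

```python
import random
from typing import List, Optional


def pick_allocation_table(u_values: List[int], provider: int,
                          rng: random.Random) -> Optional[int]:
    """u_values[i]: U counter of the entry selected in tagged table i
    (ordered from shortest to longest history).
    provider: index of the providing table, or -1 for the base predictor."""
    candidates = [i for i in range(provider + 1, len(u_values))
                  if u_values[i] == 0]
    if not candidates:
        return None                 # no useless entry with a longer history
    for i in candidates[:-1]:
        if rng.random() < 0.5:      # favor the shorter history, not always
            return i
    return candidates[-1]


def new_entry(tag: int, taken: bool) -> dict:
    """Freshly allocated entry: weak Ctr in the outcome direction, U = 0."""
    return {"tag": tag, "ctr": 0 if taken else -1, "u": 0}


rng = random.Random(0)
print(pick_allocation_table([0, 1, 0, 1], provider=0, rng=rng))
```

In the usage line only table 2 qualifies (longer history than the provider and U = 0), so it is always the one selected.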

Page 16: A 256 Kbits L-TAGE branch predictor

Improve the global history

Address + conditional branch history:
• path confusion on short histories

Address + path:
• Direct hashing leads to path confusion

1. Represent all branches in branch history

2. Also use path history (1 bit per branch, limited to 16 bits)
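
The two histories can be maintained with plain shift registers, as in the sketch below. Several details are assumptions made for the example: unconditional branches are recorded as taken, the path bit is the lowest bit of the branch address, and the branch history is capped at 640 bits. The class name `Histories` is hypothetical.

```python
PATH_HISTORY_BITS = 16  # path history: 1 bit per branch, limited to 16 bits


class Histories:
    def __init__(self, max_history_bits=640):
        self.ghist = 0   # branch history, most recent branch in bit 0
        self.phist = 0   # path history, most recent branch in bit 0
        self.ghist_mask = (1 << max_history_bits) - 1
        self.phist_mask = (1 << PATH_HISTORY_BITS) - 1

    def update(self, pc, taken):
        """Call for every branch, conditional or not (pass taken=True for
        unconditional branches; that encoding is an assumption)."""
        self.ghist = ((self.ghist << 1) | int(taken)) & self.ghist_mask
        # Assumed choice of path bit: lowest bit of the branch address.
        self.phist = ((self.phist << 1) | (pc & 1)) & self.phist_mask


h = Histories()
h.update(0x401, True)
h.update(0x400, False)
h.update(0x403, True)
print(bin(h.ghist), bin(h.phist))
```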

Page 17: A 256 Kbits L-TAGE branch predictor

Design tradeoff for CBP2 (1)

13 components:
Bring the best accuracy on distributed traces
• 8 components not very far!

History length:
Min = 4, Max = 640
• Could use any Min in [2,6] and any Max in [300, 2000]

Page 18: A 256 Kbits L-TAGE branch predictor

Design tradeoff for CBP2 (2)

Tag width tradeoff:
(destructive) false match is better tolerated on shorter history
• 7 bits on T1 to 15 bits on T12

Tuning the number of table entries:
• Smaller number for very long histories
• Smaller number for very short histories

Page 19: A 256 Kbits L-TAGE branch predictor

Adding a loop predictor

The loop predictor captures the number of iterations of a loop

When it encounters the same number of iterations 4 times in succession, the loop predictor provides the prediction.

Advantages:
• Very reliable
• Small storage budget: 256 52-bit entries

Complexity?
• Might be difficult to manage speculative iteration numbers on deep pipelines
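
The behavior of a single loop predictor entry can be sketched as below. This is a simplified, non-speculative model: it assumes the loop branch is taken on every iteration and not taken on exit, and it counts the first observation of an iteration count as the first of the 4. Field names are hypothetical, and tagging and the 52-bit entry layout are not modeled.

```python
from typing import Optional

CONFIDENCE_THRESHOLD = 4  # same iteration count seen 4 times in a row


class LoopEntry:
    def __init__(self):
        self.trip_count = 0   # iteration count seen on the last executions
        self.current = 0      # iterations seen in the ongoing execution
        self.confidence = 0   # consecutive executions with the same count

    def predict(self) -> Optional[bool]:
        """Return the predicted direction, or None when not confident."""
        if self.confidence < CONFIDENCE_THRESHOLD:
            return None
        return self.current < self.trip_count

    def update(self, taken: bool) -> None:
        if taken:
            self.current += 1
            return
        # Loop exit: compare this execution with the recorded count.
        if self.current == self.trip_count:
            self.confidence = min(self.confidence + 1, CONFIDENCE_THRESHOLD)
        else:
            self.trip_count = self.current
            self.confidence = 1
        self.current = 0


def run_loop(entry: LoopEntry, iterations: int) -> None:
    """Feed one full loop execution: `iterations` takens, then the exit."""
    for _ in range(iterations):
        entry.update(True)
    entry.update(False)
```

After four calls to `run_loop(entry, 3)`, the entry predicts taken for the next three occurrences of the branch and not taken for the fourth. On a deep pipeline `current` would have to be updated speculatively and repaired on a misprediction, which is the complexity the slide warns about.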

Page 20: A 256 Kbits L-TAGE branch predictor

Using a kernel history and a user history

Traces mix user and kernel activities:
Kernel activity after exception
• Global history pollution

Solution: use two separate global histories
• User history is updated only in user mode
• Kernel history is updated in both modes
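
The two update rules translate directly into code. In the sketch below, the choice of which history feeds the predictor (kernel history in kernel mode, user history otherwise) is an assumption, as are the class name `SplitHistory` and the 640-bit cap.

```python
class SplitHistory:
    def __init__(self, bits=640):
        self.mask = (1 << bits) - 1
        self.user = 0
        self.kernel = 0

    def update(self, taken, kernel_mode):
        bit = int(taken)
        # Kernel history is updated in both modes.
        self.kernel = ((self.kernel << 1) | bit) & self.mask
        # User history is updated only in user mode.
        if not kernel_mode:
            self.user = ((self.user << 1) | bit) & self.mask

    def current(self, kernel_mode):
        """History used for prediction (assumed selection rule)."""
        return self.kernel if kernel_mode else self.user


s = SplitHistory()
s.update(True, kernel_mode=False)
s.update(False, kernel_mode=True)   # kernel activity after an exception
s.update(True, kernel_mode=False)
print(bin(s.user), bin(s.kernel))
```

The user history ends as 0b11: the branch executed in kernel mode left no trace in it, so user-mode correlation is not polluted.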

Page 21: A 256 Kbits L-TAGE branch predictor

L-TAGE submission accuracy (distributed traces)

3.314 misp/KI

Page 22: A 256 Kbits L-TAGE branch predictor

Reducing L-TAGE complexity

Included 241.5 Kbits TAGE predictor: 3.368 misp/KI

Loop predictor beneficial only on gzip:
• Might not be worth the extra complexity
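
As a quick arithmetic check on the storage figures quoted in these slides, the sketch below adds the loop predictor (256 entries of 52 bits) to the 241.5 Kbits TAGE predictor. It assumes 1 Kbit = 1024 bits; the function names are hypothetical.

```python
KBIT = 1024  # assumption: 1 Kbit = 1024 bits


def loop_predictor_kbits(entries=256, bits_per_entry=52):
    return entries * bits_per_entry / KBIT


def combined_kbits(tage_kbits=241.5):
    return tage_kbits + loop_predictor_kbits()


print(loop_predictor_kbits(), combined_kbits())
```

Under that assumption the loop predictor costs 13.0 Kbits and the two components together come to 254.5 Kbits.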

Page 23: A 256 Kbits L-TAGE branch predictor

Using fewer tables

8-component 256 Kbits TAGE predictor: 3.446 misp/KI

Page 24: A 256 Kbits L-TAGE branch predictor

TAGE prediction computation time ?

3 successive steps:
• Index computation
• Table read
• Partial match + multiplexor

Does not fit on a single cycle:
• But can be ahead pipelined!

Page 25: A 256 Kbits L-TAGE branch predictor

Ahead pipelining a global history branch predictor (principle)

Initiate branch prediction X+1 cycles in advance to provide the prediction in time

Use information available:
• X-block ahead instruction address
• X-block ahead history

To ensure accuracy:
• Use intermediate path information

Page 26: A 256 Kbits L-TAGE branch predictor

Practice

Ahead pipelined TAGE: 4 parallel prediction computations

[Figure: ahead-pipelining example with labels A, B, C, Ha, b, c]

Page 27: A 256 Kbits L-TAGE branch predictor

3-branch ahead pipelined 8 component 256 Kbits TAGE

3.552 misp/KI

Page 28: A 256 Kbits L-TAGE branch predictor

A final case for the Geometric History Length predictors

Delivers state-of-the-art accuracy

Uses only global information:
• Very long history: 300+ bits!

Can be ahead pipelined

Many effective design points:
• OGEHL or TAGE
• Number of tables, history lengths

Page 29: A 256 Kbits L-TAGE branch predictor

The End