architecture of aca

8/10/2019 Architecture of Aca


Branch Prediction

BTB basics, return address prediction, correlating prediction

Blah blah blah



Why what??

All the unwanted, creepy things in a pipeline happen due to stalls and other hazards, which are bound up with one another.

We wanna break this bond and achieve our ultimate aim of getting the pipeline's ideal CPI, which of course we know is impossible.

Human tendency, what to do..?? We challenge.

So here I am, as a human, thinking about handling branch problems and stuff. Actually Hennessy and Patterson have done it; I am redoing it.


Speculation: what do you mean by that?

Speculative execution is an optimization technique where a computer system performs some task that may not actually be needed. The main idea is to do work before it is known whether that work will be needed at all, so as to prevent a delay that would have to be incurred by doing the work after it is known whether it is needed. If it turns out the work was not needed after all, any changes made by the work are reverted and the results are ignored.


Branch Target Buffer

Branch Target Buffer (BTB): the address of the branch indexes the buffer to get the prediction AND the branch target address (if taken).
Note: must check for a branch match now, since we can't use the wrong branch address.

Example: BTB combined with BHT

[Figure: the PC of the instruction in FETCH is compared (=?) against the Branch PC column of the BTB. Each entry holds a Branch PC, a Predicted PC, and extra prediction state bits.]

Yes: instruction is a branch; use the predicted PC as the next PC.
No: branch not predicted; proceed normally (Next PC = PC+4).



What does the BTB do??

The PC of the instruction being fetched is matched against a set of instruction addresses stored in the first column; these represent the addresses of known branches. If the PC matches one of these entries, then the instruction being fetched is a taken branch, and the second field, predicted PC, contains the prediction for the next PC after the branch. Fetching begins immediately at that address. The third field, which is optional, may be used for extra prediction state bits.
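The lookup described above can be sketched in a few lines of Python. This is only an illustration, not how any real processor stores its BTB: the class name BranchTargetBuffer, the methods update and next_pc, and the use of an unbounded dictionary (instead of a fixed-size table indexed by low PC bits with a tag compare) are all assumptions made for the example.

```python
from typing import Dict, Tuple


class BranchTargetBuffer:
    """Toy BTB: maps the PC of a known taken branch to its predicted next PC."""

    def __init__(self) -> None:
        # branch PC -> (predicted PC, optional prediction state bits)
        self.entries: Dict[int, Tuple[int, int]] = {}

    def update(self, branch_pc: int, target_pc: int, state_bits: int = 0) -> None:
        # Record (or overwrite) the entry once a taken branch has been resolved.
        self.entries[branch_pc] = (target_pc, state_bits)

    def next_pc(self, pc: int) -> int:
        entry = self.entries.get(pc)
        if entry is None:
            # No match: branch not predicted, proceed normally.
            return pc + 4
        # Match: treat the instruction as a taken branch and fetch from the predicted PC.
        return entry[0]


btb = BranchTargetBuffer()
btb.update(0x100, 0x200)
print(hex(btb.next_pc(0x100)), hex(btb.next_pc(0x104)))
```

The point of the sketch is that the decision is made from the fetch PC alone, before the instruction is even decoded.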



Return Address Prediction

Why?

Register-indirect branches: hard to predict the address
Many callers, one callee
Jump to multiple return addresses from a single address (no PC-target correlation)
SPEC89: 85% of such branches are procedure returns

Since procedures follow a stack discipline, save the return address in a small buffer that acts like a stack: 8 to 16 entries gives a small miss rate.



HOW??? Return address stack [RAS]

Return address stacks are also very simple: they're fixed-size stacks of return addresses.

To use a return address stack, we push PC+4 onto the stack when we execute a procedure call instruction. This pushes the return address of the call instruction onto the stack: when the call is finished, it will return to PC+4 of the procedure call instruction. When we execute a return instruction, we pop an address off the stack and predict that the return instruction will return to the popped address.

Since return instructions almost always return to just after the most recent procedure call instruction, return address stacks are highly accurate.

Remember that return address stacks only generate predictions for return instructions. They don't help at all for procedure call instructions [we use the BTB to predict calls].
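The push-on-call, pop-on-return behaviour described above can be sketched as follows. The class name ReturnAddressStack and its method names are hypothetical, and dropping the oldest entry on overflow is an assumption about what a fixed-size stack would do; the slides do not specify it.

```python
from collections import deque
from typing import Optional


class ReturnAddressStack:
    """Toy fixed-size return address stack (RAS)."""

    def __init__(self, size: int = 8) -> None:
        # With maxlen set, appending to a full deque silently drops the oldest entry.
        self.stack: deque = deque(maxlen=size)

    def on_call(self, call_pc: int) -> None:
        # A call pushes its return address, PC+4.
        self.stack.append(call_pc + 4)

    def predict_return(self) -> Optional[int]:
        # A return pops the most recent address; None means no prediction available.
        return self.stack.pop() if self.stack else None


ras = ReturnAddressStack(size=8)
ras.on_call(0x100)   # outer call
ras.on_call(0x200)   # nested call
print(hex(ras.predict_return()))  # predicted target of the inner return
print(hex(ras.predict_return()))  # predicted target of the outer return
```

With nested calls deeper than the stack size, the oldest return addresses are lost and those returns get no prediction, which is where the small miss rate of an 8 to 16 entry stack comes from.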


Problems

Suppose I have a standard 5-stage pipeline where branches and jumps are resolved in the execute stage. 20% of my instructions are branches, and they're taken 60% of the time. 5% of my instructions are return instructions [return instructions are not branches!].

What's the CPI of my system if I always stall my processor on branches and jumps? What if I always predict that branches and jumps are taken?
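One way to work the numbers, as a hedged sketch rather than the official answer: the slide does not state the penalties, so the code below assumes a base CPI of 1, a 2-cycle bubble for anything resolved in the execute stage, a free correctly predicted taken branch (that is, the target is available at fetch, for example from a BTB), and returns that still pay the full penalty because their target is not predicted. The function names are made up for this example.

```python
def cpi_always_stall(branch_frac: float, return_frac: float,
                     penalty: int = 2, base_cpi: float = 1.0) -> float:
    # Every branch and every return stalls until it resolves in execute.
    return base_cpi + (branch_frac + return_frac) * penalty


def cpi_predict_taken(branch_frac: float, taken_frac: float, return_frac: float,
                      penalty: int = 2, base_cpi: float = 1.0) -> float:
    # Assumption: only not-taken branches are mispredicted and pay the penalty.
    mispredicted = branch_frac * (1.0 - taken_frac)
    # Assumption: return targets are unknown at fetch, so returns still pay the penalty.
    return base_cpi + mispredicted * penalty + return_frac * penalty


print(round(cpi_always_stall(0.20, 0.05), 2))
print(round(cpi_predict_taken(0.20, 0.60, 0.05), 2))
```

Under these assumptions the always-stall CPI is 1 + 0.25 x 2 = 1.5 and the predict-taken CPI is 1 + 0.08 x 2 + 0.05 x 2 = 1.26. A different penalty model (for instance, a 1-cycle cost even for correctly predicted taken branches) changes the second number.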

My idea: it's a waste to consider CPI; chuck parallelism and enjoy serial execution.

But again, human tendency???? I am a human... with an "I don't give up" kind of attitude...


What to do??

Integrated Instruction Fetch Units

To meet the demands of multiple-issue processors, many recent designers have chosen to implement an integrated instruction fetch unit, as a separate autonomous unit that feeds instructions to the rest of the pipeline. Essentially, this amounts to recognizing that characterizing instruction fetch as a simple single pipe stage is no longer valid, given the complexities of multiple issue.

Instead, recent designs have used an integrated instruction fetch unit that integrates several functions:


1. Integrated branch prediction: The branch predictor becomes part of the instruction fetch unit and is constantly predicting branches, so as to drive the fetch pipeline.

2. Instruction prefetch: To deliver multiple instructions per clock, the instruction fetch unit will likely need to fetch ahead. The unit autonomously manages the prefetching of instructions (see Chapter 5).

Because Patterson says so.



3. Instruction memory access and buffering: When fetching multiple instructions per cycle, a variety of complexities are encountered, including the difficulty that fetching multiple instructions may require accessing multiple cache lines. The instruction fetch unit encapsulates this complexity, using prefetch to try to hide the cost of crossing cache blocks. The instruction fetch unit also provides buffering, essentially acting as an on-demand unit to provide instructions to the issue stage as needed and in the quantity needed.


Terminology??? ROB

Which place, a bank or a mint??

Sorry, it's the reorder buffer.

So what do you mean by that??


A re-order buffer (ROB) is used in a Tomasulo algorithm for out-of-order instruction execution. It allows instructions to be committed in order.

Normally, there are three stages of instructions: "Issue", "Execute", "Write Result". In the Tomasulo algorithm, there is an additional stage, "Commit". In this stage, the results of instructions are stored in a register or memory. In the "Write Result" stage, the results are just put in the re-order buffer. All contents in this buffer can then be used when executing other instructions depending on these.
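A minimal sketch of the "Write Result" versus "Commit" split described above, assuming a toy ROB that only tracks register destinations. The class ReorderBuffer, its dictionary-based entries, and its method names are hypothetical and leave out memory operations, exceptions, and misprediction recovery.

```python
from collections import deque
from typing import Any, Dict, List


class ReorderBuffer:
    """Toy ROB: results may arrive out of order, but commit in program order."""

    def __init__(self) -> None:
        self.entries: deque = deque()       # oldest instruction at the left
        self.regfile: Dict[str, Any] = {}   # architecturally visible registers

    def issue(self, dest: str) -> dict:
        # Allocate an entry in program order at issue time.
        entry = {"dest": dest, "value": None, "done": False}
        self.entries.append(entry)
        return entry

    def write_result(self, entry: dict, value: Any) -> None:
        # The result is only placed in the ROB, not in the register file.
        entry["value"] = value
        entry["done"] = True

    def commit(self) -> List[str]:
        # Retire finished instructions from the head only, preserving order.
        committed = []
        while self.entries and self.entries[0]["done"]:
            entry = self.entries.popleft()
            self.regfile[entry["dest"]] = entry["value"]
            committed.append(entry["dest"])
        return committed


rob = ReorderBuffer()
first = rob.issue("r1")
second = rob.issue("r2")
rob.write_result(second, 20)   # the younger instruction finishes first
print(rob.commit())            # nothing commits: the head is not done yet
rob.write_result(first, 10)
print(rob.commit())            # both commit, oldest first
```

The younger result sits in the buffer until the older instruction finishes, which is what keeps the register file consistent with program order.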

Register Renaming versus Reorder Buffers

With the addition of speculation, register values may also temporarily reside in the ROB. In either case, if the processor does not issue new instructions for a period of time, all existing instructions will commit, and the register values will appear in the register file, which directly corresponds to the architecturally visible registers.

So what?? In the register-renaming approach, an extended set of physical registers is used to hold both the architecturally visible registers as well as temporary values.

Thus, the extended registers replace the function of both the ROB and the reservation stations. During instruction issue, a renaming process maps the names of architectural registers to physical register numbers in the extended register set, allocating a new unused register for the destination.



An advantage???? of the renaming approach versus the ROB approach is that instruction commit is simplified, since it requires only two simple actions: record that the mapping between an architectural register number and physical register number is no longer speculative, and free up any physical registers being used to hold the older value of the architectural register. In a design with reservation stations, a station is freed up when the instruction using it completes execution, and a ROB entry is freed up when the corresponding instruction commits.
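The rename-at-issue and two-action commit described above can be sketched like this. The RenameTable class, its two maps, and the free list are illustrative assumptions; real designs differ in how they track speculative versus committed mappings and in how they recover from mispredictions, which this sketch ignores.

```python
from typing import List, Tuple


class RenameTable:
    """Toy register renaming with an extended physical register set."""

    def __init__(self, num_arch: int, num_phys: int) -> None:
        # Assumption: architectural register i starts out in physical register i.
        self.spec_map: List[int] = list(range(num_arch))       # speculative mapping
        self.committed_map: List[int] = list(range(num_arch))  # non-speculative mapping
        self.free_list: List[int] = list(range(num_arch, num_phys))

    def rename(self, srcs: List[int], dest: int) -> Tuple[List[int], int]:
        # Sources read the current mapping; the destination gets a new unused register.
        phys_srcs = [self.spec_map[s] for s in srcs]
        new_phys = self.free_list.pop(0)  # IndexError if empty; hardware would stall instead
        self.spec_map[dest] = new_phys
        return phys_srcs, new_phys

    def commit(self, dest: int, new_phys: int) -> None:
        # Action 1: the mapping is no longer speculative.
        old_phys = self.committed_map[dest]
        self.committed_map[dest] = new_phys
        # Action 2: free the register that held the older value.
        self.free_list.append(old_phys)


table = RenameTable(num_arch=4, num_phys=8)
srcs, dest_phys = table.rename(srcs=[1, 2], dest=1)   # e.g. r1 = r1 + r2
print(srcs, dest_phys)
table.commit(dest=1, new_phys=dest_phys)
print(table.committed_map, table.free_list)
```

Commit here touches only a map entry and the free list, with no result value being copied, which is the simplification the slide is pointing at.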



How do we ever know which registers are the architectural registers if they are constantly changing? Most of the time when the program is executing, it does not matter. There are clearly cases, however, where another process, such as the operating system, must be able to know exactly where the contents of a certain architectural register reside. To understand how this capability is provided, assume the processor does not issue instructions for some period of time. Eventually all instructions in the pipeline will commit, and the mapping between the architecturally visible registers and physical registers will become stable. At that point, a subset of the physical registers contains the architecturally visible registers, and the value of any physical register not associated with an architectural register is unneeded. It is then easy to move the architectural registers to a fixed subset of physical registers so that the values can be communicated to another process.


How Much to Speculate

One of the significant advantages of speculation is its ability to uncover events that would otherwise stall the pipeline early, such as cache misses. This potential advantage, however, comes with a significant potential disadvantage. Speculation is not free: it takes time and energy, and the recovery of incorrect speculation further reduces performance. In addition, to support the higher instruction execution rate needed to benefit from speculation, the processor must have additional resources, which take silicon area and power. Finally, if speculation causes an exceptional event to occur, such as a cache or TLB miss, the potential for significant performance loss increases, if that event would not have occurred without speculation.

To maintain most of the advantage, while minimizing the disadvantages, most pipelines with speculation will allow only low-cost exceptional events (such as a first-level cache miss) to be handled in speculative mode. If an expensive exceptional event occurs, such as a second-level cache miss or a translation lookaside buffer (TLB) miss, the processor will wait until the instruction causing the event is no longer speculative before handling the event. Although this may slightly degrade the performance of some programs, it avoids significant performance losses in others, especially those that suffer from a high frequency of such events coupled with less-than-excellent branch prediction.
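The policy in the last paragraph boils down to a small decision rule, sketched below. The event names and the function handle_now are invented for illustration; which events count as low-cost is a design choice, and the set here simply mirrors the examples in the text.

```python
# Assumption: event names are made-up labels; only the first-level cache miss
# is treated as cheap enough to service while still speculative.
LOW_COST_EVENTS = {"l1_miss"}


def handle_now(event: str, is_speculative: bool) -> bool:
    """Return True if the event is serviced immediately, False if the processor
    waits until the instruction causing it is no longer speculative."""
    if not is_speculative:
        return True
    return event in LOW_COST_EVENTS


for event in ("l1_miss", "l2_miss", "tlb_miss"):
    print(event, handle_now(event, is_speculative=True))
```

Deferring the expensive events means a wrongly speculated instruction never triggers a second-level miss or TLB refill that the correct path would not have needed.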



[Figure: Accuracy of Return Address Predictor]
