Architecture of ACA

Upload: jesse-english

Post on 02-Jun-2018


  • 8/10/2019 Architecture of Aca


    Branch Prediction

    BTB basics, return address prediction, correlating prediction

    Blah blah blah


    Why? What?

    All the unwanted things that happen in a pipeline come from stalls and other hazards, which are bound up with one another.

    We want to break this bond and achieve our ultimate aim of reaching the ideal pipeline CPI, which of course we know is impossible.

    Human tendency, what to do? We take up the challenge.

    So here I am, as a human, thinking about how to handle branch problems and the like. Actually, Hennessy and Patterson have already done it; I am redoing it.


    Speculation: what do you mean by that?

    Speculative execution is an optimization technique where a computer system performs some task that may not actually be needed. The main idea is to do work before it is known whether that work will be needed at all, so as to prevent a delay that would have to be incurred by doing the work after it is known whether it is needed. If it turns out the work was not needed after all, any changes made by the work are reverted and the results are ignored.


    Branch Target Buffer

    Branch Target Buffer (BTB): the address of the branch indexes the buffer to get the prediction AND the branch target address (if taken).

    Note: must check for a branch match now, since we can't use the wrong branch address.

    Example: BTB combined with BHT

    [Figure: the PC of the instruction in FETCH is compared (=?) against the "Branch PC" column of the BTB; each entry holds Branch PC, Predicted PC, and extra prediction state bits.]

    Yes: instruction is a branch; use the predicted PC as the next PC.

    No: branch not predicted; proceed normally (Next PC = PC+4).


    What does the BTB do?

    The PC of the instruction being fetched is matched against a set of instruction addresses stored in the first column; these represent the addresses of known branches. If the PC matches one of these entries, then the instruction being fetched is a taken branch, and the second field, predicted PC, contains the prediction for the next PC after the branch. Fetching begins immediately at that address. The third field, which is optional, may be used for extra prediction state bits.
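    The lookup described above can be sketched in a few lines of Python. This is only an illustrative model, not how any real BTB is built: the class and method names are made up, the table is treated as fully associative and keyed by the full branch PC, and the update policy (insert on a taken branch, evict on a not-taken one) is an assumption.

```python
from dataclasses import dataclass
from typing import Dict

INSTR_BYTES = 4  # assumed fixed instruction size, matching "Next PC = PC+4"


@dataclass
class BTBEntry:
    predicted_pc: int    # second field: predicted next PC
    state_bits: int = 0  # optional third field: extra prediction state


class BranchTargetBuffer:
    """Toy BTB keyed by the full branch PC (hypothetical, fully associative)."""

    def __init__(self) -> None:
        self.entries: Dict[int, BTBEntry] = {}

    def predict_next_pc(self, pc: int) -> int:
        entry = self.entries.get(pc)
        if entry is not None:
            # Match: treat the instruction as a taken branch.
            return entry.predicted_pc
        # No match: not predicted as a branch, fall through.
        return pc + INSTR_BYTES

    def update(self, pc: int, taken: bool, target: int) -> None:
        # Assumed policy: remember taken branches, forget not-taken ones.
        if taken:
            self.entries[pc] = BTBEntry(predicted_pc=target)
        else:
            self.entries.pop(pc, None)
```

    A fetch stage would call predict_next_pc every cycle and call update once the branch resolves.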


    Return Address Prediction: Why?

    Register indirect branch: hard to predict the target address.

    Many callers, one callee.

    Jump to multiple return addresses from a single address (no PC-target correlation).

    SPEC89: 85% of such branches are procedure returns.

    Since procedures follow a stack discipline, save the return address in a small buffer that acts like a stack: 8 to 16 entries gives a small miss rate.


    How? Return address stack (RAS)

    Return address stacks are also very simple: they're fixed-size stacks of return addresses.

    To use a return address stack, we push PC+4 onto the stack when we execute a procedure call instruction. This pushes the return address of the call instruction onto the stack: when the call is finished, it will return to PC+4 of the procedure call instruction. When we execute a return instruction, we pop an address off the stack and predict that the return instruction will return to the popped address.

    Since return instructions almost always return to the instruction after the most recent procedure call, return address stacks are highly accurate.

    Remember that return address stacks only generate predictions for return instructions. They don't help at all for procedure call instructions (we use the BTB to predict calls).
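    The push-on-call, pop-on-return behavior can be modeled as below. This is a minimal sketch under assumptions: the class name and the default depth of 8 are illustrative, and on overflow the oldest entry is silently dropped, which is one possible policy rather than the only one.

```python
from collections import deque
from typing import Optional

INSTR_BYTES = 4  # assumed fixed instruction size, so the return address is PC+4


class ReturnAddressStack:
    """Toy fixed-size return address stack (hypothetical model)."""

    def __init__(self, depth: int = 8) -> None:
        # A deque with maxlen discards the oldest entry when full (assumed policy).
        self.stack = deque(maxlen=depth)

    def on_call(self, call_pc: int) -> None:
        # Push the address of the instruction after the call.
        self.stack.append(call_pc + INSTR_BYTES)

    def on_return(self) -> Optional[int]:
        # Pop the predicted return target; None means "no prediction".
        return self.stack.pop() if self.stack else None
```

    With a depth of 2, three nested calls overflow the stack, so the outermost return gets no prediction; this is the kind of miss a deeper stack avoids.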


    Problems

    Suppose I have a standard 5-stage pipeline where branches and jumps are resolved in the execute stage. 20% of my instructions are branches, and they're taken 60% of the time. 5% of my instructions are return instructions (return instructions are not branches!).

    What's the CPI of my system if I always stall my processor on branches and jumps? What if I always predict that branches and jumps are taken?

    My idea: it is a waste to consider CPI; chuck parallelism and enjoy serial execution.

    But again, human tendency: I am a human, with a don't-give-up kind of attitude.
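    One way to work the CPI problem above is sketched below. The slide leaves several things unstated, so every one of these is an assumption: the base CPI is 1, resolving in the execute stage costs 2 cycles per stall or misprediction, a correctly predicted taken branch costs nothing (the target is available at fetch, for example from a BTB), returns always pay the full penalty because no return address stack is modeled, and returns are the only jumps counted. The function names are made up for illustration.

```python
def cpi_always_stall(branch_frac: float, return_frac: float,
                     penalty: int = 2, base_cpi: float = 1.0) -> float:
    """Every branch and return stalls for `penalty` cycles."""
    return base_cpi + (branch_frac + return_frac) * penalty


def cpi_predict_taken(branch_frac: float, taken_frac: float, return_frac: float,
                      penalty: int = 2, base_cpi: float = 1.0) -> float:
    """Branches pay the penalty only when they turn out not taken;
    returns are assumed to always pay it (target unknown at fetch)."""
    mispredicted = branch_frac * (1.0 - taken_frac)
    return base_cpi + (mispredicted + return_frac) * penalty


stall_cpi = cpi_always_stall(0.20, 0.05)
taken_cpi = cpi_predict_taken(0.20, 0.60, 0.05)
```

    Under these assumptions the always-stall CPI is 1 + 0.25 x 2 = 1.5, and the predict-taken CPI is 1 + (0.08 + 0.05) x 2 = 1.26. Different assumptions about the penalty or about how returns are handled give different numbers.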


    What to do?

    Integrated Instruction Fetch Units

    To meet the demands of multiple-issue processors, many recent designers have chosen to implement an integrated instruction fetch unit, as a separate autonomous unit that feeds instructions to the rest of the pipeline. Essentially, this amounts to recognizing that characterizing instruction fetch as a simple single pipe stage, given the complexities of multiple issue, is no longer valid.

    Instead, recent designs have used an integrated instruction fetch unit that integrates several functions:


    1. Integrated branch prediction: The branch predictor becomes part of the instruction fetch unit and is constantly predicting branches, so as to drive the fetch pipeline.

    2. Instruction prefetch: To deliver multiple instructions per clock, the instruction fetch unit will likely need to fetch ahead. The unit autonomously manages the prefetching of instructions (see Chapter 5).

    Because Patterson says so.


    3. Instruction memory access and buffering: When fetching multiple instructions per cycle, a variety of complexities are encountered, including the difficulty that fetching multiple instructions may require accessing multiple cache lines. The instruction fetch unit encapsulates this complexity, using prefetch to try to hide the cost of crossing cache blocks. The instruction fetch unit also provides buffering, essentially acting as an on-demand unit to provide instructions to the issue stage as needed and in the quantity needed.
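    The cache-line difficulty mentioned in the third function can be shown with a little arithmetic. The sketch below is illustrative only: the function name, the 4-byte instruction size, and the 64-byte line size are assumptions, not values from the slides.

```python
from typing import List


def cache_lines_touched(pc: int, n_instr: int,
                        instr_bytes: int = 4, line_bytes: int = 64) -> List[int]:
    """Return the indices of the cache lines covered by a fetch group
    of n_instr sequential instructions starting at pc."""
    first_line = pc // line_bytes
    last_byte = pc + n_instr * instr_bytes - 1
    last_line = last_byte // line_bytes
    return list(range(first_line, last_line + 1))


aligned = cache_lines_touched(0x00, 4)   # whole group fits in one line
crossing = cache_lines_touched(0x38, 4)  # group straddles a line boundary
```

    A group of four instructions starting at byte 0x38 covers bytes 56 to 71, so it needs two lines; this is the case a fetch unit tries to hide with prefetch and buffering.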


    Terminology: ROB?

    Rob which place, a bank or a mint?

    Sorry, it's the reorder buffer.

    So what do you mean by that?


    A re-order buffer (ROB) is used in the Tomasulo algorithm for out-of-order instruction execution. It allows instructions to be committed in order.

    Normally, there are three stages of instructions: "Issue", "Execute", "Write Result". In the Tomasulo algorithm with a ROB, there is an additional stage, "Commit". In this stage, the results of instructions will be stored in a register or memory. In the "Write Result" stage, the results are just put in the re-order buffer. All contents in this buffer can then be used when executing other instructions depending on these.
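    The split between "Write Result" and "Commit" can be sketched as a small queue. This is a simplified model under assumptions: the names are made up, only register destinations are handled (no stores, branches, or exceptions), and any number of ready entries may commit at once.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Optional


@dataclass
class ROBEntry:
    dest: str                    # architectural destination register
    value: Optional[int] = None  # result, filled in at Write Result
    ready: bool = False


class ReorderBuffer:
    """Toy ROB: entries are allocated and retired in program order."""

    def __init__(self, size: int = 4) -> None:
        self.size = size
        self.entries: Deque[ROBEntry] = deque()
        self.registers: Dict[str, int] = {}  # architectural register file

    def issue(self, dest: str) -> Optional[ROBEntry]:
        if len(self.entries) == self.size:
            return None  # ROB full: issue must stall
        entry = ROBEntry(dest)
        self.entries.append(entry)
        return entry

    def write_result(self, entry: ROBEntry, value: int) -> None:
        # Results may arrive out of order; they only land in the ROB.
        entry.value = value
        entry.ready = True

    def commit(self) -> List[str]:
        # Retire from the head only, so register updates stay in order.
        committed = []
        while self.entries and self.entries[0].ready:
            entry = self.entries.popleft()
            self.registers[entry.dest] = entry.value
            committed.append(entry.dest)
        return committed
```

    If a younger instruction finishes first, commit retires nothing until the older instruction at the head is also ready.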


    Register Renaming versus Reorder Buffers

    With the addition of speculation, register values may also temporarily reside in the ROB. In either case, if the processor does not issue new instructions for a period of time, all existing instructions will commit, and the register values will appear in the register file, which directly corresponds to the architecturally visible registers.

    So what? In the register-renaming approach, an extended set of physical registers is used to hold both the architecturally visible registers as well as temporary values.

    Thus, the extended registers replace the function of both the ROB and the reservation stations. During instruction issue, a renaming process maps the names of architectural registers to physical register numbers in the extended register set, allocating a new unused register for the destination.


    An advantage of the renaming approach versus the ROB approach is that instruction commit is simplified, since it requires only two simple actions: record that the mapping between an architectural register number and physical register number is no longer speculative, and free up any physical registers being used to hold the older value of the architectural register. In a design with reservation stations, a station is freed up when the instruction using it completes execution, and a ROB entry is freed up when the corresponding instruction commits.
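    Renaming at issue and the simplified commit can be sketched as a map table plus a free list. This is a rough model under assumptions: the names are made up, recovery from misspeculation is not modeled, and "no longer speculative" is represented only by freeing the old physical register.

```python
from typing import Dict, List, Tuple


class RenameTable:
    """Toy rename map: architectural register name -> physical register number."""

    def __init__(self, arch_regs: List[str], num_physical: int) -> None:
        # Assumed initial state: architectural register i lives in physical i.
        self.mapping: Dict[str, int] = {r: i for i, r in enumerate(arch_regs)}
        self.free_list: List[int] = list(range(len(arch_regs), num_physical))

    def rename(self, dest: str, sources: List[str]) -> Tuple[int, List[int], int]:
        # Read source mappings first, so "r1 = r1 + r2" sees the old r1.
        src_phys = [self.mapping[s] for s in sources]
        if not self.free_list:
            raise RuntimeError("no free physical register: issue must stall")
        new_phys = self.free_list.pop(0)
        old_phys = self.mapping[dest]
        self.mapping[dest] = new_phys
        return new_phys, src_phys, old_phys

    def commit(self, old_phys: int) -> None:
        # The older value of the architectural register is no longer needed.
        self.free_list.append(old_phys)
```

    The old physical register is returned by rename so the instruction can hand it back at commit, which is the second of the two commit actions described above.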


    How do we ever know which registers are the architectural registers if they are constantly changing? Most of the time when the program is executing it does not matter. There are clearly cases, however, where another process, such as the operating system, must be able to know exactly where the contents of a certain architectural register reside. To understand how this capability is provided, assume the processor does not issue instructions for some period of time. Eventually all instructions in the pipeline will commit, and the mapping between the architecturally visible registers and physical registers will become stable. At that point, a subset of the physical registers contains the architecturally visible registers, and the value of any physical register not associated with an architectural register is unneeded. It is then easy to move the architectural registers to a fixed subset of physical registers so that the values can be communicated to another process.


    How Much to Speculate

    One of the significant advantages of speculation is its ability to uncover events that would otherwise stall the pipeline early, such as cache misses. This potential advantage, however, comes with a significant potential disadvantage. Speculation is not free: it takes time and energy, and the recovery of incorrect speculation further reduces performance. In addition, to support the higher instruction execution rate needed to benefit from speculation, the processor must have additional resources, which take silicon area and power. Finally, if speculation causes an exceptional event to occur, such as a cache or TLB miss, the potential for significant performance loss increases, if that event would not have occurred without speculation.

    To maintain most of the advantage, while minimizing the disadvantages, most pipelines with speculation will allow only low-cost exceptional events (such as a first-level cache miss) to be handled in speculative mode. If an expensive exceptional event occurs, such as a second-level cache miss or a translation lookaside buffer (TLB) miss, the processor will wait until the instruction causing the event is no longer speculative before handling the event. Although this may slightly degrade the performance of some programs, it avoids significant performance losses in others, especially those that suffer from a high frequency of such events coupled with less-than-excellent branch prediction.
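    The policy in the last paragraph amounts to a small decision rule, sketched below. The event names and the function are hypothetical labels chosen for illustration; real pipelines classify events in hardware and differ in which ones they treat as low cost.

```python
# Assumed classification, following the examples in the text above.
LOW_COST_EVENTS = {"l1_cache_miss"}
EXPENSIVE_EVENTS = {"l2_cache_miss", "tlb_miss"}


def handle_now(event: str, is_speculative: bool) -> bool:
    """Return True if the event should be handled immediately,
    False if the processor should wait until the instruction
    causing it is no longer speculative."""
    if not is_speculative:
        return True  # non-speculative instructions are always serviced
    return event in LOW_COST_EVENTS
```

    Under this rule a speculative instruction that misses in the first-level cache is serviced right away, while one that misses in the TLB waits until it is known to be on the correct path.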


    Accuracy of Return Address Predictor