Lecture 5. Dynamic Scheduling II


Post on 01-Feb-2016


COM515 Advanced Computer Architecture. Lecture 5. Dynamic Scheduling II. Prof. Taeweon Suh, Computer Science Education, Korea University.


  • Lecture 5. Dynamic Scheduling II
    Prof. Taeweon Suh
    Computer Science Education
    Korea University
    COM515 Advanced Computer Architecture

    Korea Univ

    Modern Processors
      • Branch prediction results in speculative execution
      • Speculative instructions (if wrongly speculated) must not alter the architectural state
          • Architecture registers
          • Memory
      • Requirement of precise exceptions/interrupts
      Prof. Sean Lee's Slide

    Modern Out-of-Order Core
      • Register Alias Table renames architecture registers
      • Allocate instructions
      • ROB: Reorder Buffer maintains state information (physical registers) for precise interrupts and speculative execution
      • Reservation Station issues instructions to functional units
      • Architectural register file
      • LSQ: Load Store Queue maintains memory access ordering
      Prof. Sean Lee's Slide

    Register Renaming
      [Figure: architectural registers R0 to R7]
      • No false dependencies!
      • Sandy Bridge: 160 PRs for INT, 144 PRs for FP
      Adapted from Prof. G. Loh's Slides

    Register Renaming
      • Dest = Src1 op Src2
      • Mapping mechanism: Src1 -> TagS1, Src2 -> TagS2
      • TagD = TagS1 op TagS2
      • Repeat for each instruction
      Adapted from Prof. G. Loh's Slides

    Register Alias Table (RAT)
      • Use a lookup table for renaming
      • One entry per architectural register
      • Each entry maps to the most recent version of the architectural register, which could be in the
          • Physical register file
          • Architectural register file
      Prof. Sean Lee's Slide

    RAT Example
        Original         Renamed            Free physical regs (before renaming)
        R1 = R2 + R3     T13 = R2 + R3      T13, T14, T15, T16
        R5 = R4 - R1     T14 = R4 - T13     T14, T15, T16
        R1 = R1 * R5     T15 = T13 * T14    T15, T16
        R2 = R5 / R1     T16 = T14 / T15    T16
      Adapted from Prof. G. Loh's Slides
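The renaming walk-through above can be reproduced with a few lines of Python. This is only a minimal sketch: the `rename` function, the tuple layout and the use of a dictionary as the RAT are illustrative assumptions, not something taken from the slides, and the `-` operator in the second instruction is assumed because the operator is missing in the transcript.

```python
from collections import deque

def rename(instructions, free_regs):
    """Rename (dest, src1, op, src2) tuples with a RAT and a free list."""
    rat = {}                    # architectural register -> physical tag
    free = deque(free_regs)     # free physical registers, handed out in order
    renamed = []
    for dest, src1, op, src2 in instructions:
        # Sources read the most recent mapping. A register with no mapping
        # keeps its architectural name (its value is in the architectural RF).
        s1 = rat.get(src1, src1)
        s2 = rat.get(src2, src2)
        tag = free.popleft()    # allocate a new physical register for dest
        rat[dest] = tag         # later readers of dest now see this tag
        renamed.append((tag, s1, op, s2))
    return renamed, rat, list(free)

program = [
    ("R1", "R2", "+", "R3"),
    ("R5", "R4", "-", "R1"),   # "-" is an assumption (operator lost in transcript)
    ("R1", "R1", "*", "R5"),
    ("R2", "R5", "/", "R1"),
]
renamed, rat, free = rename(program, ["T13", "T14", "T15", "T16"])
for tag, s1, op, s2 in renamed:
    print(f"{tag} = {s1} {op} {s2}")
```

Each destination gets a fresh tag, so the second write to R1 no longer conflicts with the earlier readers of R1 (no WAR or WAW dependence remains).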

    Superscalar Rename
        Instruction        Src L tag   Src R tag   (read from the RAT)
        R1 = R2 + R3       T16         T23
        R4 = R5 - R7       T39         T7
        R3 = R0 / R2       T14         T16
        R5 = Ld 12[R6]     T5          X
      • Don't rename immediates
      • For N-wide superscalar:
          • 2N RAT read-ports
          • N RAT write-ports
      Prof. Sean Lee's Slide

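    Intra-Group Dependencies
        Instruction        Dest tag   Src L tag   Src R tag
        R2 = R2 + R3       T10        T16         T23
        R4 = R5 - R7       T31        T39         T7
        R3 = R0 / R2       T19        T14         T16
        R5 = Ld 12[R6]     T6         T5          X
      • Destination tags come from the free register pool
      • R3 = R0 / R2 reads R2, which R2 = R2 + R3 writes earlier in the same group, so the T16 read from the RAT is stale and the source should be T10
      Prof. Sean Lee's Slide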

    Intra-Group Dependencies
        Instruction        Src L tag   Src R tag   (read from the RAT: R1 -> T34, R2 -> T16)
        R1 = R2 + R1       T16         T34
        R2 = R1 - R2       T34         T16
        R1 = R2 / R1       T16         T34
        R1 = R2 >> R1      T16         T34
      [Figure: correct final renamed registers]
      Modified from Prof. Sean Lee's Slide

    Resolving Intra-Group Dependencies
      [Figure: Inst 0 to Inst 3 each supply Src L, Src R and Dest. Source tags T0L to T3L and T0R to T3R from the RAT, and destination tags Pdst0 to Pdst2 from the free register pool, feed the Intra-Group Dependency Checker.]
      Adapted from Prof. G. Loh's Slides

    Intra-Group Dependency Checking
      [Figure: dependency-checking logic with inputs dst0, dst1, dst2 and Pdst0, Pdst1, Pdst2]
      Adapted from Prof. G. Loh's Slides

    Mapping Selection
        R1 = R2 + R1
        R2 = R1 - R2
        R1 = R2 / R1
        R1 = R2 >> R1
      • Only the mapping of the last instruction (R1 = R2 >> R1) should be written into the RAT for R1
      • Condition: use the mapping if the instruction is the last writer to the register
      Adapted from Prof. G. Loh's Slides
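The intra-group dependency check and the last-writer condition can be sketched in software as follows. This is a sequential model of what the hardware does with parallel comparators; the function name `rename_group`, the tuple layout and the tag name `Pdst3` are assumptions made for illustration (the slides only name `Pdst0` to `Pdst2`).

```python
def rename_group(group, rat, free_tags):
    """Rename one group of instructions that enter the RAT together.

    group:     list of (dest, src_l, src_r) architectural register names
    rat:       dict, architectural register -> tag, state before the group
    free_tags: one free physical tag per instruction (Pdst0, Pdst1, ...)
    """
    # Step 1: all sources read the RAT as it was before the group.
    looked_up = [(rat[src_l], rat[src_r]) for _, src_l, src_r in group]

    renamed = []
    for i, (dest, src_l, src_r) in enumerate(group):
        tag_l, tag_r = looked_up[i]
        # Step 2: dependency check against older instructions in the group.
        # A later match overwrites an earlier one, so the nearest older
        # writer supplies the tag.
        for j in range(i):
            if group[j][0] == src_l:
                tag_l = free_tags[j]
            if group[j][0] == src_r:
                tag_r = free_tags[j]
        renamed.append((free_tags[i], tag_l, tag_r))

    # Step 3: mapping selection. Only the last writer of each architectural
    # register updates the RAT.
    new_rat = dict(rat)
    for i, (dest, _, _) in enumerate(group):
        is_last_writer = all(group[k][0] != dest for k in range(i + 1, len(group)))
        if is_last_writer:
            new_rat[dest] = free_tags[i]
    return renamed, new_rat

group = [("R1", "R2", "R1"), ("R2", "R1", "R2"), ("R1", "R2", "R1"), ("R1", "R2", "R1")]
renamed, new_rat = rename_group(
    group, {"R1": "T34", "R2": "T16"}, ["Pdst0", "Pdst1", "Pdst2", "Pdst3"]
)
```

With the four instructions from the slide, only the first instruction uses both tags read from the RAT; every later source that matches an older destination in the group takes that instruction's new tag instead.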

    Issue with Imprecise Interrupt
        lw  r5, 8(r10)
        add r10, r9, r8
        add r12, r10, r7
      • E.g., add instructions take one cycle
      • The load (lw) induces a data page fault
      • If out-of-order completion is allowed
          • r10 and r12 will be modified
          • Wrong values will be used by the re-issued load
      • Interrupt classes
          • Program interrupts (exceptions or traps)
          • External interrupts (asynchronous)
      Modified from Prof. Sean Lee's Slide
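The problem with the three-instruction sequence above can be made concrete with a toy model. The register values and the `run` function are hypothetical and chosen only for illustration; the sketch compares the address the load was supposed to use with the address it uses when it is re-issued after the page fault.

```python
def run(out_of_order_completion):
    # Hypothetical initial register values.
    regs = {"r7": 1, "r8": 2, "r9": 3, "r10": 100, "r12": 0}
    intended_addr = regs["r10"] + 8            # lw r5, 8(r10)

    if out_of_order_completion:
        # The one-cycle adds finish and update the register file while the
        # slower load is still outstanding.
        regs["r10"] = regs["r9"] + regs["r8"]  # add r10, r9, r8
        regs["r12"] = regs["r10"] + regs["r7"] # add r12, r10, r7

    # The load now reports a data page fault. After the handler returns,
    # the load is re-issued and reads r10 from the current register file.
    reissued_addr = regs["r10"] + 8
    return intended_addr, reissued_addr

print(run(out_of_order_completion=False))
print(run(out_of_order_completion=True))
```

With in-order completion both addresses agree. With out-of-order completion the re-issued load computes its address from the r10 value written by the younger add, which is the failure the slide describes.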

    Precise Interrupts
      • To reflect a sequential architecture model
          • Serially correct (think about a single-issue, non-pipelined processor)
      • Keep a precise state of execution
          • All instructions before the interrupted instruction must be completed
          • The state should appear as if no instruction was issued after the interrupted instruction
          • The interrupted PC should be presented to the interrupt handler (restartable)
      • Similar to branch misprediction handling
      • Out-of-order execution makes the ordering hard
          • Undo what comes after an interrupt
      Prof. Sean Lee's Slide

    Why Support Precise Interrupts
      • Need to maintain a precise state (for recovery)
          • Software debugging
          • I/O or timer interrupts
          • Virtual memory (page fault)
          • Instruction emulation
          • Virtual machines
      Prof. Sean Lee's Slide

    Support Precise Interrupt
      • Buffer results
      • Can reconstruct the scenario (state) as sequential execution
      • Restart from saved PC with saved PC state
      Prof. Sean Lee's Slide

    Reorder Buffer (ROB) [Smith & Pleszkun 85, 88]
      • Architecture register file keeps the in-order state
      • Reorder Buffer (ROB)
          • A circular buffer
          • Contains all in-flight instructions
          • Buffers the lookahead state
          • In-order allocation/deallocation with head/tail pointers
      • When an exception occurs
          • Halt instruction issue
          • Revert to the in-order state using the RF and discard ROB results
      • Also used for branch misprediction recovery
      • Pentium Pro/II/III integrates the physical register file within the ROB
      • Pentium 4 decouples the ROB and the physical register file
      Modified from Prof. Sean Lee's Slide

    ROB (with physical registers)
      • Entry fields: V, Data (physical register), Exp event, RegDst, Done?, Spec?, PC
      • Head: oldest instruction
      • Tail: next instruction to be allocated
      • Sandy Bridge: 168-entry ROB
      Prof. Sean Lee's Slide
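A ROB with head and tail pointers can be modelled as a small circular buffer. The following is a minimal sketch, not the organization of any particular processor: the class name, the method names and the reduced set of entry fields (`pc`, `reg_dst`, `data`, `done`) are assumptions made for illustration.

```python
class ReorderBuffer:
    def __init__(self, size):
        self.entries = [None] * size
        self.head = 0    # oldest in-flight instruction
        self.tail = 0    # next entry to be allocated
        self.count = 0   # number of in-flight instructions

    def allocate(self, pc, reg_dst):
        """In-order allocation at the tail; returns None when the ROB is full."""
        if self.count == len(self.entries):
            return None
        idx = self.tail
        self.entries[idx] = {"pc": pc, "reg_dst": reg_dst, "data": None, "done": False}
        self.tail = (self.tail + 1) % len(self.entries)
        self.count += 1
        return idx

    def complete(self, idx, data):
        """Out-of-order completion: the result is buffered, not yet architectural."""
        self.entries[idx]["data"] = data
        self.entries[idx]["done"] = True

    def commit(self, arf):
        """In-order deallocation: retire finished instructions from the head."""
        retired = []
        while self.count > 0 and self.entries[self.head]["done"]:
            entry = self.entries[self.head]
            arf[entry["reg_dst"]] = entry["data"]   # update the in-order state
            retired.append(entry["pc"])
            self.entries[self.head] = None
            self.head = (self.head + 1) % len(self.entries)
            self.count -= 1
        return retired

arf = {"R1": 1, "R2": 2, "R3": 3}
rob = ReorderBuffer(size=4)
a = rob.allocate(pc=0x00, reg_dst="R1")
b = rob.allocate(pc=0x04, reg_dst="R2")
c = rob.allocate(pc=0x08, reg_dst="R3")

rob.complete(c, 30)        # the youngest instruction finishes first
first = rob.commit(arf)    # the head is not done, so nothing retires
rob.complete(a, 10)
second = rob.commit(arf)   # a retires; b still blocks c
rob.complete(b, 20)
third = rob.commit(arf)    # b and c retire in program order
```

The architectural register file only changes inside `commit`, so at any point it holds the in-order state even though results arrive out of order.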

    Handling Precise Interrupts
      [Figure sequence, six frames: ROB entries and ARF contents as the following instructions are allocated, complete and commit]
        Instruction          RegDst   PC field (hex)
        R1 = R1 + 10
        R2 = R2 * 2          R2       A004
        FR1 = FR2 / 0.0      FR1      A008
        R3 = R3 + 1          R3       A00C
        R4 = R4 * 2          R4       A010
        FR4 = FR4 * 2.0      FR4      A014
      • Back up PC and current RF
      • These values (results still held in the ROB) were not committed into RF
      • Depending on the exception, the process will either abort or the instruction will be resumed from this excepting instruction
      Prof. Sean Lee's Slide
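The recovery step can be sketched as a retire loop that stops at the first excepting instruction. The instruction sequence follows the slides, but the initial register values, the full PC values and the `retire` helper are assumptions made for illustration, and the ROB is simplified to a `deque` in program order.

```python
from collections import deque

def retire(rob, arf):
    """Commit from the head of the ROB in program order.

    Stops when the ROB is empty, the head has not finished, or the head
    raised an exception. Returns the PC of the excepting instruction,
    or None if no exception reached the head.
    """
    while rob and rob[0]["done"]:
        head = rob[0]
        if head["exception"]:
            saved_pc = head["pc"]   # presented to the handler (restartable)
            rob.clear()             # discard this and all younger results
            return saved_pc
        arf[head["dst"]] = head["data"]
        rob.popleft()
    return None

# Hypothetical initial architectural state.
arf = {"R1": 1, "R2": 1, "R3": 1, "R4": 4, "FR1": 1.0, "FR4": 2.0}

def entry(pc, dst, data, exception=False, done=True):
    return {"pc": pc, "dst": dst, "data": data, "exception": exception, "done": done}

rob = deque([
    entry(0xA000, "R1", 11),                      # R1 = R1 + 10
    entry(0xA004, "R2", 2),                       # R2 = R2 * 2
    entry(0xA008, "FR1", None, exception=True),   # FR1 = FR2 / 0.0
    entry(0xA00C, "R3", 2),                       # R3 = R3 + 1
    entry(0xA010, "R4", 8),                       # R4 = R4 * 2
    entry(0xA014, "FR4", None, done=False),       # FR4 = FR4 * 2.0, still executing
])

saved_pc = retire(rob, arf)
print(hex(saved_pc), arf)
```

The two older instructions update the register file, the division is reported with its PC, and the results of the younger instructions never reach the register file even though some of them had already completed.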

    Handling Speculative Execution
      [Figure sequence: ROB entries and ARF contents for the following instructions; PC fields B004 and C100 appear in the ROB entries]
        R1 = R1 + 10
        BEQ R1, R0, L1
        R2=R3
      Prof. Sean Lee's Slide
