tomasulo algorithmlecture 11: case study—ricardo/courses/advtopcompsys/material/tomasulo... ·...

52
RHK.F95 1 Lecture 11: Case Study— Tomasulo Algorithm Professor Randy H. Katz Computer Science 252 Fall 1995

Upload: others

Post on 18-Aug-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 1

Lecture 11: Case Study—Tomasulo Algorithm

Professor Randy H. KatzComputer Science 252

Fall 1995

Page 2: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 2

Review: Scoreboard Summary

• Speedup 1.7 from compiler; 2.5 by hand BUT slow memory (no cache)

• Limitations of 6600 scoreboard– No forwarding– Limited to instructions in basic block (small window)– Number of functional units(structural hazards)– Wait for WAR hazards– Prevent WAW hazards

Page 3: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 3

Another Dynamic Algorithm: Tomasulo Algorithm

• For IBM 360/91 about 3 years after CDC 6600• Goal: High Performance without special compilers• Differences between IBM 360 & CDC 6600 ISA

– IBM has only 2 register specifiers/instr vs. 3 in CDC 6600– IBM has 4 FP registers vs. 8 in CDC 6600

• Differences between Tomasulo Algorithm & Scoreboard– Control & buffers distributed with Function Units vs. centralized in

scoreboard; called “reservation stations”– Registers in instructions replaced by pointers to reservation station

buffer– HW renaming of registers to avoid WAR, WAW hazards– Common Data Bus broadcasts results to all FUs– Load and Stores treated as FUs as well

Page 4: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 4

From instruction unit

Floating-pointoperations

Frommemory

Load buffersFP registers

Store buffers

To memory

654321 3

2

1

Reservationstations

FP adders FP multipliers

321

21

Common data bus (CDB)

Operation bus

Operandbuses

LoadBuffer

FPRegisters

FP Op Queue

StoreBuffer

FP AddRes.Station

FP MulRes.Station

CommonDataBus

Tomasulo Organization

Page 5: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 5

Reservation Station Components

Op—Operation to perform in the unit (e.g., + or –)Qj, Qk—Reservation stations producing source registers Vj, Vk—Value of Source operandsRj, Rk—Flags indicating when Vj, Vk are ready

Busy—Indicates reservation station and FU is busy

Register result status—Indicates which functional unit will write each register, if one exists. Blank when no pending instructions that will write that register.

Page 6: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 6

Three Stages of Tomasulo Algorithm

1.Issue—get instruction from FP Op Queue If reservation station free, the scoreboard issues instr &

sends operands (renames registers).

2.Execution—operate on operands (EX) When both operands ready then execute;

if not ready, watch CDB for result

3.Write result—finish execution (WB) Write on Common Data Bus to all awaiting units;

mark reservation station available.

Page 7: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 7

Tomasulo Example Cycle 0

Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 Load1 NoLD F2 45+ R3 Load2 NoMULTDF0 F2 F4 Load3 NoSUBD F8 F6 F2DIVD F10 F0 F6ADDD F6 F8 F2Reservation Stations S1 S2 RS for j RS for k

Time Name Busy Op V j Vk Qj Qk0 Add1 No0 Add2 No

Add3 No0 Mult1 No0 Mult2 No

Register result status

Clock F0 F2 F4 F6 F8 F10 F12 ... F300 FU

Page 8: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 8

Tomasulo Example Cycle 1

Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 Load1 No 34+R2LD F2 45+ R3 Load2 NoMULTDF0 F2 F4 Load3 NoSUBD F8 F6 F2DIVD F10 F0 F6ADDD F6 F8 F2Reservation Stations S1 S2 RS for j RS for k

Time Name Busy Op V j Vk Qj Qk0 Add1 No0 Add2 No

Add3 No0 Mult1 No0 Mult2 No

Register result status

Clock F0 F2 F4 F6 F8 F10 F12 ... F301 FU Load1

Page 9: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 9

Tomasulo Example Cycle 2

Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 Load1 Yes 34+R2LD F2 45+ R3 2 Load2 Yes 45+R3MULTDF0 F2 F4 Load3 NoSUBD F8 F6 F2DIVD F10 F0 F6ADDD F6 F8 F2Reservation Stations S1 S2 RS for j RS for k

Time Name Busy Op V j Vk Qj Qk0 Add1 No0 Add2 No

Add3 No0 Mult1 No0 Mult2 No

Register result status

Clock F0 F2 F4 F6 F8 F10 F12 ... F302 FU Load2 Load1

Page 10: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 10

Tomasulo Example Cycle 3

Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 Load1 Yes 34+R2LD F2 45+ R3 2 Load2 Yes 45+R3MULTDF0 F2 F4 3 Load3 NoSUBD F8 F6 F2DIVD F10 F0 F6ADDD F6 F8 F2Reservation Stations S1 S2 RS for j RS for k

Time Name Busy Op V j Vk Qj Qk0 Add1 No0 Add2 No

Add3 No0 Mult1 Yes MULTD R(F4) Load20 Mult2 No

Register result status

Clock F0 F2 F4 F6 F8 F10 F12 ... F303 FU Mult1 Load2 Load1

Page 11: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 11

Tomasulo Example Cycle 4

Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 Load2 Yes 45+R3MULTDF0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4DIVD F10 F0 F6ADDD F6 F8 F2Reservation Stations S1 S2 RS for j RS for k

Time Name Busy Op V j Vk Qj Qk0 Add1 Yes SUBD M(34+R2) Load20 Add2 No

Add3 No0 Mult1 Yes MULTD R(F4) Load20 Mult2 No

Register result status

Clock F0 F2 F4 F6 F8 F10 F12 ... F304 FU Mult1 Load2 M(34+R2) Add1

Page 12: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 12

Tomasulo Example Cycle 5

Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 5 Load2 Yes 45+R3MULTDF0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4DIVD F10 F0 F6 5ADDD F6 F8 F2Reservation Stations S1 S2 RS for j RS for k

Time Name Busy Op V j Vk Qj Qk0 Add1 Yes SUBD M(34+R2) Load20 Add2 No

Add3 No0 Mult1 Yes MULTD R(F4) Load20 Mult2 Yes DIVD M(34+R2) Mult1

Register result status

Clock F0 F2 F4 F6 F8 F10 F12 ... F305 FU Mult1 Load2 M(34+R2) Add1 Mult2

Page 13: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 13

Tomasulo Example Cycle 6

Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 5 6 Load2 NoMULTDF0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4DIVD F10 F0 F6 5ADDD F6 F8 F2 6Reservation Stations S1 S2 RS for j RS for k

Time Name Busy Op V j Vk Qj Qk2 Add1 Yes SUBD M(34+R2) M(45+R3)0 Add2 Yes ADDD M(45+R3) Add1

Add3 No1 0 Mult1 Yes MULTD M(45+R3) R(F4)

0 Mult2 Yes DIVD M(34+R2) Mult1Register result status

Clock F0 F2 F4 F6 F8 F10 F12 ... F306 FU Mult1 M(45+R3) Add2 Add1 Mult2

Page 14: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 14

Tomasulo Example Cycle 7

Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 5 6 Load2 NoMULTDF0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4DIVD F10 F0 F6 5ADDD F6 F8 F2 6Reservation Stations S1 S2 RS for j RS for k

Time Name Busy Op V j Vk Qj Qk1 Add1 Yes SUBD M(34+R2) M(45+R3)0 Add2 Yes ADDD M(45+R3) Add1

Add3 No9 Mult1 Yes MULTD M(45+R3) R(F4)0 Mult2 Yes DIVD M(34+R2) Mult1

Register result status

Clock F0 F2 F4 F6 F8 F10 F12 ... F307 FU Mult1 M(45+R3) Add2 Add1 Mult2

Page 15: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 15

Tomasulo Example Cycle 8

Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 5 6 Load2 NoMULTDF0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4 8DIVD F10 F0 F6 5ADDD F6 F8 F2 6Reservation Stations S1 S2 RS for j RS for k

Time Name Busy Op V j Vk Qj Qk0 Add1 Yes SUBD M(34+R2) M(45+R3)0 Add2 Yes ADDD M(45+R3) Add1

Add3 No8 Mult1 Yes MULTD M(45+R3) R(F4)0 Mult2 Yes DIVD M(34+R2) Mult1

Register result status

Clock F0 F2 F4 F6 F8 F10 F12 ... F308 FU Mult1 M(45+R3) Add2 Add1 Mult2

Page 16: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 16

Tomasulo Example Cycle 9

Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 5 6 Load2 NoMULTDF0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4 8 9DIVD F10 F0 F6 5ADDD F6 F8 F2 6Reservation Stations S1 S2 RS for j RS for k

Time Name Busy Op V j Vk Qj Qk0 Add1 No0 Add2 Yes ADDD M()–M() M(45+R3)

Add3 No7 Mult1 Yes MULTD M(45+R3) R(F4)0 Mult2 Yes DIVD M(34+R2) Mult1

Register result status

Clock F0 F2 F4 F6 F8 F10 F12 ... F309 FU Mult1 M(45+R3) Add2 M()–M() Mult2

Page 17: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 17

Tomasulo Example Cycle 10

Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 5 6 Load2 NoMULTDF0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4 8 9DIVD F10 F0 F6 5ADDD F6 F8 F2 6Reservation Stations S1 S2 RS for j RS for k

Time Name Busy Op V j Vk Qj Qk0 Add1 No2 Add2 Yes ADDD M()–M() M(45+R3)

Add3 No7 Mult1 Yes MULTD M(45+R3) R(F4)0 Mult2 Yes DIVD M(34+R2) Mult1

Register result status

Clock F0 F2 F4 F6 F8 F10 F12 ... F301 0 FU Mult1 M(45+R3) Add2 M()–M() Mult2

Page 18: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 18

Tomasulo Example Cycle 11

Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 5 6 Load2 NoMULTDF0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4 8 9DIVD F10 F0 F6 5ADDD F6 F8 F2 6Reservation Stations S1 S2 RS for j RS for k

Time Name Busy Op V j Vk Qj Qk0 Add1 No1 Add2 Yes ADDD M()–M() M(45+R3)

Add3 No5 Mult1 Yes MULTD M(45+R3) R(F4)0 Mult2 Yes DIVD M(34+R2) Mult1

Register result status

Clock F0 F2 F4 F6 F8 F10 F12 ... F301 1 FU Mult1 M(45+R3) Add2 M()–M() Mult2

Page 19: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 19

Tomasulo Example Cycle 12

Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 5 6 Load2 NoMULTDF0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4 8 9DIVD F10 F0 F6 5ADDD F6 F8 F2 6 1 2Reservation Stations S1 S2 RS for j RS for k

Time Name Busy Op V j Vk Qj Qk0 Add1 No0 Add2 Yes ADDD M()–M() M(45+R3)

Add3 No4 Mult1 Yes MULTD M(45+R3) R(F4)0 Mult2 Yes DIVD M(34+R2) Mult1

Register result status

Clock F0 F2 F4 F6 F8 F10 F12 ... F301 2 FU Mult1 M(45+R3) Add2 M()–M() Mult2

Page 20: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 20

Tomasulo Example Cycle 13

Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 5 6 Load2 NoMULTDF0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4 8 9DIVD F10 F0 F6 5ADDD F6 F8 F2 6 1 2 1 3Reservation Stations S1 S2 RS for j RS for k

Time Name Busy Op V j Vk Qj Qk0 Add1 No0 Add2 No

Add3 No3 Mult1 Yes MULTD M(45+R3) R(F4)0 Mult2 Yes DIVD M(34+R2) Mult1

Register result status

Clock F0 F2 F4 F6 F8 F10 F12 ... F301 3 FU Mult1 M(45+R3) (M–M)+M() M()–M() Mult2

Page 21: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 21

Tomasulo Example Cycle 14

Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 5 6 Load2 NoMULTDF0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4 8 9DIVD F10 F0 F6 5ADDD F6 F8 F2 6 1 2 1 3Reservation Stations S1 S2 RS for j RS for k

Time Name Busy Op V j Vk Qj Qk0 Add1 No0 Add2 No

Add3 No2 Mult1 Yes MULTD M(45+R3) R(F4)0 Mult2 Yes DIVD M(34+R2) Mult1

Register result status

Clock F0 F2 F4 F6 F8 F10 F12 ... F301 4 FU Mult1 M(45+R3) (M–M)+M() M()–M() Mult2

Page 22: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 22

Tomasulo Example Cycle 15

Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 5 6 Load2 NoMULTDF0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4 8 9DIVD F10 F0 F6 5ADDD F6 F8 F2 6 1 2 1 3Reservation Stations S1 S2 RS for j RS for k

Time Name Busy Op V j Vk Qj Qk0 Add1 No0 Add2 No

Add3 No1 Mult1 Yes MULTD M(45+R3) R(F4)0 Mult2 Yes DIVD M(34+R2) Mult1

Register result status

Clock F0 F2 F4 F6 F8 F10 F12 ... F301 5 FU Mult1 M(45+R3) (M–M)+M() M()–M() Mult2

Page 23: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 23

Tomasulo Example Cycle 16

Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 5 6 Load2 NoMULTDF0 F2 F4 3 1 6 Load3 NoSUBD F8 F6 F2 4 8 9DIVD F10 F0 F6 5ADDD F6 F8 F2 6 1 2 1 3Reservation Stations S1 S2 RS for j RS for k

Time Name Busy Op V j Vk Qj Qk0 Add1 No0 Add2 No

Add3 No0 Mult1 Yes MULTD M(45+R3) R(F4)0 Mult2 Yes DIVD M(34+R2) Mult1

Register result status

Clock F0 F2 F4 F6 F8 F10 F12 ... F301 6 FU Mult1 M(45+R3) (M–M)+M() M()–M() Mult2

Page 24: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 24

Tomasulo Example Cycle 17

Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 5 6 Load2 NoMULTDF0 F2 F4 3 1 6 1 7 Load3 NoSUBD F8 F6 F2 4 8 9DIVD F10 F0 F6 5ADDD F6 F8 F2 6 1 2 1 3Reservation Stations S1 S2 RS for j RS for k

Time Name Busy Op V j Vk Qj Qk0 Add1 No0 Add2 No

Add3 No0 Mult1 No0 Mult2 Yes DIVD M*F4 M(34+R2)

Register result status

Clock F0 F2 F4 F6 F8 F10 F12 ... F301 7 FU M*F4 M(45+R3) (M–M)+M() M()–M() Mult2

Page 25: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 25

Tomasulo Example Cycle 18

Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 5 6 Load2 NoMULTDF0 F2 F4 3 1 6 1 7 Load3 NoSUBD F8 F6 F2 4 8 9DIVD F10 F0 F6 5ADDD F6 F8 F2 6 1 2 1 3Reservation Stations S1 S2 RS for j RS for k

Time Name Busy Op V j Vk Qj Qk0 Add1 No0 Add2 No

Add3 No0 Mult1 No

4 0 Mult2 Yes DIVD M*F4 M(34+R2)Register result status

Clock F0 F2 F4 F6 F8 F10 F12 ... F301 8 FU M*F4 M(45+R3) (M–M)+M() M()–M() Mult2

Page 26: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 26

Tomasulo Example Cycle 57

Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 5 6 Load2 NoMULTDF0 F2 F4 3 1 6 1 7 Load3 NoSUBD F8 F6 F2 4 8 9DIVD F10 F0 F6 5ADDD F6 F8 F2 6 1 2 1 3Reservation Stations S1 S2 RS for j RS for k

Time Name Busy Op V j Vk Qj Qk0 Add1 No0 Add2 No

Add3 No0 Mult1 No1 Mult2 Yes DIVD M*F4 M(34+R2)

Register result status

Clock F0 F2 F4 F6 F8 F10 F12 ... F305 7 FU M*F4 M(45+R3) (M–M)+M() M()–M() Mult2

Page 27: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 27

Tomasulo Example Cycle 58

Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 5 6 Load2 NoMULTDF0 F2 F4 3 1 6 1 7 Load3 NoSUBD F8 F6 F2 4 8 9DIVD F10 F0 F6 5 5 8ADDD F6 F8 F2 6 1 2 1 3Reservation Stations S1 S2 RS for j RS for k

Time Name Busy Op V j Vk Qj Qk0 Add1 No0 Add2 No

Add3 No0 Mult1 No0 Mult2 Yes DIVD M*F4 M(34+R2)

Register result status

Clock F0 F2 F4 F6 F8 F10 F12 ... F305 8 FU M*F4 M(45+R3) (M–M)+M() M()–M() Mult2

Page 28: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 28

Tomasulo Example Cycle 59

Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 5 6 Load2 NoMULTDF0 F2 F4 3 1 6 1 7 Load3 NoSUBD F8 F6 F2 4 8 9DIVD F10 F0 F6 5 5 8 5 9ADDD F6 F8 F2 6 1 2 1 3Reservation Stations S1 S2 RS for j RS for k

Time Name Busy Op V j Vk Qj Qk0 Add1 No0 Add2 No

Add3 No0 Mult1 No0 Mult2 No

Register result status

Clock F0 F2 F4 F6 F8 F10 F12 ... F305 9 FU M*F4 M(45+R3) (M–M)+M() M()–M() M*F4/M

Page 29: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 29

Tomasulo Loop Example

Loop: LD F0 0 R1 MULTD F4 F0 F2 SD F4 0 R1 SUBI R1 R1 #8 BNEZ R1 Loop

• Multiply takes 4 clocks• Load have cache misses

Page 30: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 30

Loop Example Cycle 0Instruction status ExecutionWriteInstruction j k iteration Issue completeResult Busy AddressLD F0 0 R1 1 Load1 NoMULTDF4 F0 F2 1 Load2 NoSD F4 0 R1 1 Load3 No QiLD F0 0 R1 2 Store1 NoMULTDF4 F0 F2 2 Store2 NoSD F4 0 R1 2 Store3 NoReservation Stations S1 S2 RS for jRS for k

Time Name Busy Op V j Vk Qj Qk Code:0 Add1 No LD F0 0 R10 Add2 No MULTDF4 F0 F20 Add3 No SD F4 0 R10 Mult1 No SUBI R1 R1 # 80 Mult2 No BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12... F30

0 8 0 Qi

Page 31: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 31

Loop Example Cycle 1Instruction status ExecutionWriteInstruction j k iteration Issue completeResult Busy AddressLD F0 0 R1 1 1 Load1 Yes 8 0MULTDF4 F0 F2 1 Load2 NoSD F4 0 R1 1 Load3 No QiLD F0 0 R1 2 Store1 NoMULTDF4 F0 F2 2 Store2 NoSD F4 0 R1 2 Store3 NoReservation Stations S1 S2 RS for jRS for k

Time Name Busy Op V j Vk Qj Qk Code:0 Add1 No LD F0 0 R10 Add2 No MULTDF4 F0 F20 Add3 No SD F4 0 R10 Mult1 No SUBI R1 R1 # 80 Mult2 No BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12... F30

1 8 0 Qi Load1

Page 32: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 32

Loop Example Cycle 2Instruction status ExecutionWriteInstruction j k iteration Issue completeResult Busy AddressLD F0 0 R1 1 1 Load1 Yes 8 0MULTDF4 F0 F2 1 2 Load2 NoSD F4 0 R1 1 Load3 No QiLD F0 0 R1 2 Store1 NoMULTDF4 F0 F2 2 Store2 NoSD F4 0 R1 2 Store3 NoReservation Stations S1 S2 RS for jRS for k

Time Name Busy Op V j Vk Qj Qk Code:0 Add1 No LD F0 0 R10 Add2 No MULTDF4 F0 F20 Add3 No SD F4 0 R10 Mult1 Yes MULTD R(F2) Load1 SUBI R1 R1 # 80 Mult2 No BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12... F30

2 8 0 Qi Load1 Mult1

Page 33: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 33

Loop Example Cycle 3Instruction status ExecutionWriteInstruction j k iteration Issue completeResult Busy AddressLD F0 0 R1 1 1 Load1 Yes 8 0MULTDF4 F0 F2 1 2 Load2 NoSD F4 0 R1 1 3 Load3 No QiLD F0 0 R1 2 Store1 Yes 8 0 Mult1MULTDF4 F0 F2 2 Store2 NoSD F4 0 R1 2 Store3 NoReservation Stations S1 S2 RS for jRS for k

Time Name Busy Op V j Vk Qj Qk Code:0 Add1 No LD F0 0 R10 Add2 No MULTDF4 F0 F20 Add3 No SD F4 0 R10 Mult1 Yes MULTD R(F2) Load1 SUBI R1 R1 # 80 Mult2 No BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12... F30

3 8 0 Qi Load1 Mult1

Page 34: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 34

Loop Example Cycle 4Instruction status ExecutionWriteInstruction j k iteration Issue completeResult Busy AddressLD F0 0 R1 1 1 Load1 Yes 8 0MULTDF4 F0 F2 1 2 Load2 NoSD F4 0 R1 1 3 Load3 No QiLD F0 0 R1 2 Store1 Yes 8 0 Mult1MULTDF4 F0 F2 2 Store2 NoSD F4 0 R1 2 Store3 NoReservation Stations S1 S2 RS for jRS for k

Time Name Busy Op V j Vk Qj Qk Code:0 Add1 No LD F0 0 R10 Add2 No MULTDF4 F0 F20 Add3 No SD F4 0 R10 Mult1 Yes MULTD R(F2) Load1 SUBI R1 R1 # 80 Mult2 No BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12... F30

4 7 2 Qi Load1 Mult1

Page 35: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 35

Loop Example Cycle 5Instruction status ExecutionWriteInstruction j k iteration Issue completeResult Busy AddressLD F0 0 R1 1 1 Load1 Yes 8 0MULTDF4 F0 F2 1 2 Load2 NoSD F4 0 R1 1 3 Load3 No QiLD F0 0 R1 2 Store1 Yes 8 0 Mult1MULTDF4 F0 F2 2 Store2 NoSD F4 0 R1 2 Store3 NoReservation Stations S1 S2 RS for jRS for k

Time Name Busy Op V j Vk Qj Qk Code:0 Add1 No LD F0 0 R10 Add2 No MULTDF4 F0 F20 Add3 No SD F4 0 R10 Mult1 Yes MULTD R(F2) Load1 SUBI R1 R1 # 80 Mult2 No BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12... F30

5 7 2 Qi Load1 Mult1

Page 36: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 36

Loop Example Cycle 6Instruction status ExecutionWriteInstruction j k iteration Issue completeResult Busy AddressLD F0 0 R1 1 1 Load1 Yes 8 0MULTDF4 F0 F2 1 2 Load2 Yes 7 2SD F4 0 R1 1 3 Load3 No QiLD F0 0 R1 2 6 Store1 Yes 8 0 Mult1MULTDF4 F0 F2 2 Store2 NoSD F4 0 R1 2 Store3 NoReservation Stations S1 S2 RS for jRS for k

Time Name Busy Op V j Vk Qj Qk Code:0 Add1 No LD F0 0 R10 Add2 No MULTDF4 F0 F20 Add3 No SD F4 0 R10 Mult1 Yes MULTD R(F2) Load1 SUBI R1 R1 # 80 Mult2 No BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12... F30

6 7 2 Qi Load1 Mult1

Page 37: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 37

Loop Example Cycle 7Instruction status ExecutionWriteInstruction j k iteration Issue completeResult Busy AddressLD F0 0 R1 1 1 Load1 Yes 8 0MULTDF4 F0 F2 1 2 Load2 Yes 7 2SD F4 0 R1 1 3 Load3 No QiLD F0 0 R1 2 6 Store1 Yes 8 0 Mult1MULTDF4 F0 F2 2 7 Store2 NoSD F4 0 R1 2 Store3 NoReservation Stations S1 S2 RS for jRS for k

Time Name Busy Op V j Vk Qj Qk Code:0 Add1 No LD F0 0 R10 Add2 No MULTDF4 F0 F20 Add3 No SD F4 0 R10 Mult1 Yes MULTD R(F2) Load1 SUBI R1 R1 # 80 Mult2 Yes MULTD R(F2) Load2 BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12... F30

7 7 2 Qi Load2 Mult2

Page 38: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 38

Loop Example Cycle 8Instruction status ExecutionWriteInstruction j k iteration Issue completeResult Busy AddressLD F0 0 R1 1 1 Load1 Yes 8 0MULTDF4 F0 F2 1 2 Load2 Yes 7 2SD F4 0 R1 1 3 Load3 No QiLD F0 0 R1 2 6 Store1 Yes 8 0 Mult1MULTDF4 F0 F2 2 7 Store2 Yes 7 2 Mult2SD F4 0 R1 2 8 Store3 NoReservation Stations S1 S2 RS for jRS for k

Time Name Busy Op V j Vk Qj Qk Code:0 Add1 No LD F0 0 R10 Add2 No MULTDF4 F0 F20 Add3 No SD F4 0 R10 Mult1 Yes MULTD R(F2) Load1 SUBI R1 R1 # 80 Mult2 Yes MULTD R(F2) Load2 BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12... F30

8 7 2 Qi Load2 Mult2

Page 39: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 39

Loop Example Cycle 9Instruction status ExecutionWriteInstruction j k iteration Issue completeResult Busy AddressLD F0 0 R1 1 1 9 Load1 Yes 8 0MULTDF4 F0 F2 1 2 Load2 Yes 7 2SD F4 0 R1 1 3 Load3 No QiLD F0 0 R1 2 6 Store1 Yes 8 0 Mult1MULTDF4 F0 F2 2 7 Store2 Yes 7 2 Mult2SD F4 0 R1 2 8 Store3 NoReservation Stations S1 S2 RS for jRS for k

Time Name Busy Op V j Vk Qj Qk Code:0 Add1 No LD F0 0 R10 Add2 No MULTDF4 F0 F20 Add3 No SD F4 0 R10 Mult1 Yes MULTD R(F2) Load1 SUBI R1 R1 # 80 Mult2 Yes MULTD R(F2) Load2 BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12... F30

9 6 4 Qi Load2 Mult2

Page 40: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 40

Loop Example Cycle 10Instruction status ExecutionWriteInstruction j k iteration Issue completeResult Busy AddressLD F0 0 R1 1 1 9 1 0 Load1 NoMULTDF4 F0 F2 1 2 Load2 Yes 7 2SD F4 0 R1 1 3 Load3 No QiLD F0 0 R1 2 6 1 0 Store1 Yes 8 0 Mult1MULTDF4 F0 F2 2 7 Store2 Yes 7 2 Mult2SD F4 0 R1 2 8 Store3 NoReservation Stations S1 S2 RS for jRS for k

Time Name Busy Op V j Vk Qj Qk Code:0 Add1 No LD F0 0 R10 Add2 No MULTDF4 F0 F20 Add3 No SD F4 0 R14 Mult1 Yes MULTD M(80) R(F2) SUBI R1 R1 # 80 Mult2 Yes MULTD R(F2) Load2 BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12... F30

1 0 6 4 Qi Load2 Mult2

Page 41: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 41

Loop Example Cycle 11Instruction status ExecutionWriteInstruction j k iteration Issue completeResult Busy AddressLD F0 0 R1 1 1 9 1 0 Load1 NoMULTDF4 F0 F2 1 2 Load2 NoSD F4 0 R1 1 3 Load3 Yes 6 4 QiLD F0 0 R1 2 6 1 0 1 1 Store1 Yes 8 0 Mult1MULTDF4 F0 F2 2 7 Store2 Yes 7 2 Mult2SD F4 0 R1 2 8 Store3 NoReservation Stations S1 S2 RS for jRS for k

Time Name Busy Op V j Vk Qj Qk Code:0 Add1 No LD F0 0 R10 Add2 No MULTDF4 F0 F20 Add3 No SD F4 0 R13 Mult1 Yes MULTD M(80) R(F2) SUBI R1 R1 # 84 Mult2 Yes MULTD M(72) R(F2) BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12... F30

1 1 6 4 Qi Mult2

Page 42: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 42

Loop Example Cycle 12Instruction status ExecutionWriteInstruction j k iteration Issue completeResult Busy AddressLD F0 0 R1 1 1 9 1 0 Load1 NoMULTDF4 F0 F2 1 2 Load2 NoSD F4 0 R1 1 3 Load3 Yes 6 4 QiLD F0 0 R1 2 6 1 0 1 1 Store1 Yes 8 0 Mult1MULTDF4 F0 F2 2 7 Store2 Yes 7 2 Mult2SD F4 0 R1 2 8 Store3 NoReservation Stations S1 S2 RS for jRS for k

Time Name Busy Op V j Vk Qj Qk Code:0 Add1 No LD F0 0 R10 Add2 No MULTDF4 F0 F20 Add3 No SD F4 0 R12 Mult1 Yes MULTD M(80) R(F2) SUBI R1 R1 # 83 Mult2 Yes MULTD M(72) R(F2) BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12... F30

1 2 6 4 Qi Mult2

Page 43: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 43

Loop Example Cycle 13Instruction status ExecutionWriteInstruction j k iteration Issue completeResult Busy AddressLD F0 0 R1 1 1 9 1 0 Load1 NoMULTDF4 F0 F2 1 2 Load2 NoSD F4 0 R1 1 3 Load3 Yes 6 4 QiLD F0 0 R1 2 6 1 0 1 1 Store1 Yes 8 0 Mult1MULTDF4 F0 F2 2 7 Store2 Yes 7 2 Mult2SD F4 0 R1 2 8 Store3 NoReservation Stations S1 S2 RS for jRS for k

Time Name Busy Op V j Vk Qj Qk Code:0 Add1 No LD F0 0 R10 Add2 No MULTDF4 F0 F20 Add3 No SD F4 0 R11 Mult1 Yes MULTD M(80) R(F2) SUBI R1 R1 # 82 Mult2 Yes MULTD M(72) R(F2) BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12... F30

1 3 6 4 Qi Mult2

Page 44: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 44

Loop Example Cycle 14Instruction status ExecutionWriteInstruction j k iteration Issue completeResult Busy AddressLD F0 0 R1 1 1 9 1 0 Load1 NoMULTDF4 F0 F2 1 2 1 4 Load2 NoSD F4 0 R1 1 3 Load3 Yes 6 4 QiLD F0 0 R1 2 6 1 0 1 1 Store1 Yes 8 0 Mult1MULTDF4 F0 F2 2 7 Store2 Yes 7 2 Mult2SD F4 0 R1 2 8 Store3 NoReservation Stations S1 S2 RS for jRS for k

Time Name Busy Op V j Vk Qj Qk Code:0 Add1 No LD F0 0 R10 Add2 No MULTDF4 F0 F20 Add3 No SD F4 0 R10 Mult1 Yes MULTD M(80) R(F2) SUBI R1 R1 # 81 Mult2 Yes MULTD M(72) R(F2) BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12... F30

1 4 6 4 Qi Mult2

Page 45: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 45

Loop Example Cycle 15Instruction status ExecutionWriteInstruction j k iteration Issue completeResult Busy AddressLD F0 0 R1 1 1 9 1 0 Load1 NoMULTDF4 F0 F2 1 2 1 4 1 5 Load2 NoSD F4 0 R1 1 3 Load3 Yes 6 4 QiLD F0 0 R1 2 6 1 0 1 1 Store1 Yes 8 0 M(80)*R(F2)MULTDF4 F0 F2 2 7 1 5 Store2 Yes 7 2 Mult2SD F4 0 R1 2 8 Store3 NoReservation Stations S1 S2 RS for jRS for k

Time Name Busy Op V j Vk Qj Qk Code:0 Add1 No LD F0 0 R10 Add2 No MULTDF4 F0 F20 Add3 No SD F4 0 R10 Mult1 No SUBI R1 R1 # 80 Mult2 Yes MULTD M(72) R(F2) BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12... F30

1 5 6 4 Qi Mult2

Page 46: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 46

Loop Example Cycle 16Instruction status ExecutionWriteInstruction j k iteration Issue completeResult Busy AddressLD F0 0 R1 1 1 9 1 0 Load1 NoMULTDF4 F0 F2 1 2 1 4 1 5 Load2 NoSD F4 0 R1 1 3 Load3 Yes 6 4 QiLD F0 0 R1 2 6 1 0 1 1 Store1 Yes 8 0 M(80)*R(F2)MULTDF4 F0 F2 2 7 1 5 1 6 Store2 Yes 7 2 M(72)*R(72)SD F4 0 R1 2 8 Store3 NoReservation Stations S1 S2 RS for jRS for k

Time Name Busy Op V j Vk Qj Qk Code:0 Add1 No LD F0 0 R10 Add2 No MULTDF4 F0 F20 Add3 No SD F4 0 R10 Mult1 Yes MULTD R(F2) Load3 SUBI R1 R1 # 80 Mult2 No BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12... F30

1 6 6 4 Qi Mult1

Page 47: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 47

Loop Example Cycle 17Instruction status ExecutionWriteInstruction j k iteration Issue completeResult Busy AddressLD F0 0 R1 1 1 9 1 0 Load1 NoMULTDF4 F0 F2 1 2 1 4 1 5 Load2 NoSD F4 0 R1 1 3 Load3 Yes 6 4 QiLD F0 0 R1 2 6 1 0 1 1 Store1 Yes 8 0 M(80)*R(F2)MULTDF4 F0 F2 2 7 1 5 1 6 Store2 Yes 7 2 M(72)*R(72)SD F4 0 R1 2 8 Store3 Yes 6 4 Mult1Reservation Stations S1 S2 RS for jRS for k

Time Name Busy Op V j Vk Qj Qk Code:0 Add1 No LD F0 0 R10 Add2 No MULTDF4 F0 F20 Add3 No SD F4 0 R10 Mult1 Yes MULTD R(F2) Load3 SUBI R1 R1 # 80 Mult2 No BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12... F30

1 7 6 4 Qi Mult1

Page 48: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 48

Loop Example Cycle 18Instruction status ExecutionWriteInstruction j k iteration Issue completeResult Busy AddressLD F0 0 R1 1 1 9 1 0 Load1 NoMULTDF4 F0 F2 1 2 1 4 1 5 Load2 NoSD F4 0 R1 1 3 1 8 Load3 Yes 6 4 QiLD F0 0 R1 2 6 1 0 1 1 Store1 Yes 8 0 M(80)*R(F2)MULTDF4 F0 F2 2 7 1 5 1 6 Store2 Yes 7 2 M(72)*R(72)SD F4 0 R1 2 8 Store3 Yes 6 4 Mult1Reservation Stations S1 S2 RS for jRS for k

Time Name Busy Op V j Vk Qj Qk Code:0 Add1 No LD F0 0 R10 Add2 No MULTDF4 F0 F20 Add3 No SD F4 0 R10 Mult1 Yes MULTD R(F2) Load3 SUBI R1 R1 # 80 Mult2 No BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12... F30

1 8 5 6 Qi Mult1

Page 49: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 49

Loop Example Cycle 19Instruction status ExecutionWriteInstruction j k iteration Issue completeResult Busy AddressLD F0 0 R1 1 1 9 1 0 Load1 NoMULTDF4 F0 F2 1 2 1 4 1 5 Load2 NoSD F4 0 R1 1 3 1 8 1 9 Load3 Yes 6 4 QiLD F0 0 R1 2 6 1 0 1 1 Store1 NoMULTDF4 F0 F2 2 7 1 5 1 6 Store2 Yes 7 2 M(72)*R(72)SD F4 0 R1 2 8 Store3 Yes 6 4 Mult1Reservation Stations S1 S2 RS for jRS for k

Time Name Busy Op V j Vk Qj Qk Code:0 Add1 No LD F0 0 R10 Add2 No MULTDF4 F0 F20 Add3 No SD F4 0 R10 Mult1 Yes MULTD R(F2) Load3 SUBI R1 R1 # 80 Mult2 No BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12... F30

1 9 5 6 Qi Mult1

Page 50: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 50

Loop Example Cycle 20Instruction status ExecutionWriteInstruction j k iteration Issue completeResult Busy AddressLD F0 0 R1 1 1 9 1 0 Load1 NoMULTDF4 F0 F2 1 2 1 4 1 5 Load2 NoSD F4 0 R1 1 3 1 8 1 9 Load3 Yes 6 4 QiLD F0 0 R1 2 6 1 0 1 1 Store1 NoMULTDF4 F0 F2 2 7 1 5 1 6 Store2 Yes 7 2 M(72)*R(72)SD F4 0 R1 2 8 2 0 Store3 Yes 6 4 Mult1Reservation Stations S1 S2 RS for jRS for k

Time Name Busy Op V j Vk Qj Qk Code:0 Add1 No LD F0 0 R10 Add2 No MULTDF4 F0 F20 Add3 No SD F4 0 R10 Mult1 Yes MULTD R(F2) Load3 SUBI R1 R1 # 80 Mult2 No BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12... F30

2 0 5 6 Qi Mult1

Page 51: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 51

Loop Example Cycle 21Instruction status ExecutionWriteInstruction j k iteration Issue completeResult Busy AddressLD F0 0 R1 1 1 9 1 0 Load1 NoMULTDF4 F0 F2 1 2 1 4 1 5 Load2 NoSD F4 0 R1 1 3 1 8 1 9 Load3 Yes 6 4 QiLD F0 0 R1 2 6 1 0 1 1 Store1 NoMULTDF4 F0 F2 2 7 1 5 1 6 Store2 NoSD F4 0 R1 2 8 2 0 2 1 Store3 Yes 6 4 Mult1Reservation Stations S1 S2 RS for jRS for k

Time Name Busy Op V j Vk Qj Qk Code:0 Add1 No LD F0 0 R10 Add2 No MULTDF4 F0 F20 Add3 No SD F4 0 R10 Mult1 Yes MULTD R(F2) Load3 SUBI R1 R1 # 80 Mult2 No BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12... F30

2 1 5 6 Qi Mult1

Page 52: Tomasulo AlgorithmLecture 11: Case Study—ricardo/Courses/AdvTopCompSys/Material/tomasulo... · RHK.F95 3 Another Dynamic Algorithm: Tomasulo Algorithm • For IBM 360/91 about 3

RHK.F95 52

Tomasulo Summary

• Prevents Register as bottleneck• Avoids WAR, WAW hazards of Scoreboard• Allows loop unrolling in HW• Not limited to basic blocks (provided branch

prediction)• Lasting Contributions

– Dynamic scheduling– Register renaming– Load/store disambiguation

• Next: More branch prediction