lecture 6 tomasulo algorithm - iowa state...
TRANSCRIPT
1
1
Lecture 6 Tomasulo Algorithm
CprE 581 Computer SystemsArchitecture
Reading:Textbook (4th ed: 2.4, 2.5)(5th ed: 3.4, 3.5)
2
Tomasulo AlgorithmHistory:
1966: scoreboarding in CDC6600, implements limited dynamic schedulingThree years later: Tomasulo in IBM 360/91, introduces register renaming and reservation stationNow appearing in todays Dec Alpha, SGI MIPS, SUN UltraSparc, Intel Pentium, IBM PowerPC, and others
In today’s perspective: It implements reservation-based dynamic scheduling without data forwarding (obsolete)
The merit is the use of tag to identify instructions in flight: register renaming, reservation station, tag broadcast
It is an “algorithm” first
Adapted from UCB CS252 S98, Copyright 1998 USB
2
Assumptions
Assumptions for time being:
• No branches• Only concern is register dependence• Only floating-point instructions are
executed out-of-order
3
4
Register Renaming and Correctness1. The same set of instructions write to
user-visible register and memory;2. Each instruction receives the same
operands as in the sequential execution; and
3. Any register or memory word receives the value of the last write as in the sequential execution
3
Tomasulo Organization
5
6
Reservation Station ComponentsOp—Operation to perform in the unit (e.g., + or –)Vj, Vk—Value of Source operands Store buffer has V field, result to be stored
Qj, Qk—Reservation stations producing source registers (value to be written) Note: No ready flags as in Scoreboard; Qj,Qk=0 =>
ready Store buffers only have Qi for RS producing result
Busy—Indicates reservation station or FU is busy
How are dependences represented in reservation station?
Adapted from UCB CS252 S98B
4
7
Register Result Status
It’s a table to record the RS index of the latest pending instruction, if there is one, that writes to each register
It’s kind of register renaming: rename register to RS if there is a pending write
8
Three Stages of Tomasulo Algorithm1. Issue—get instruction from FP Op Queue
If reservation station free (no structural hazard), control issues instr & sends operands (renames registers).
2.Execution—operate on operands (EX)When both operands ready then execute;if not ready, watch Common Data Bus for result
3.Write result—finish execution (WB)Write on Common Data Bus to all awaiting units; mark reservation station available
Issue: dependence check for new instWriteback: Wakeup dependent instructions
Adapted from UCB CS252 S98
5
9
Issue Stage and RenamingSource renaming: rename the two source registers if there are pending writes
Assigns a free RS
Dest. Renaming: Updates register result status with the allocated RS
Renaming removes name dependences
10
Execute Stage
Only “ready” instructions can join the competitionThere is a select logic to select instructions for FU execution Some policy may be used, e.g. age basedNon-ready instructions can be “woken up” during writeback of its parent inst
6
11
Writeback and Common Data BusNormal data bus: data + destination (“go to” bus)Common data bus: data + source (“come from” bus) 64 bits of data + 4 bits of source index
(tag)Does the broadcast to every instruction in
flightWaiting instructions do tag matching and update their ready bits and value fields (if the tag matches theirs)
Adapted from UCB CS252 S98, Copyright 1998 USB
12
Code ExampleLD F6,34(R2)
LD F2,45(R3)
MULTI F0,F2,F4
SUBD F8,F6,F2
DIVD F10,F0,F6
ADD F6,F8,F2
LD1 LD2
MULTISUBD
DIVDADD
Operation latencies: load/store 2 cycles,Add/sub 2 cycles, Mult 10 cycles, divide 40 cycle
7
13
Tomasulo Example Cycle 0Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 Load1 NoLD F2 45+ R3 Load2 NoMULTD F0 F2 F4 Load3 NoSUBD F8 F6 F2DIVD F10 F0 F6ADDD F6 F8 F2Reservation Stations S1 S2 RS for j RS for k
TimeNam BusyOp Vj Vk Qj Qk0 Add1No0 Add2No0 Add3No0 Mult1No0 Mult2No
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F300 FU
Adapted from UCB CS252 S98, Copyright 1998 UCB
14
Tomasulo Example Cycle 1Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 Load1 No 34+R2LD F2 45+ R3 Load2 NoMULT F0 F2 F4 Load3 NoSUBDF8 F6 F2DIVD F10 F0 F6ADDDF6 F8 F2Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk0 Add1 No0 Add2 No
Add3 No0 Mult1 No0 Mult2 No
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F301 FU Load1
Yes
Adapted from UCB CS252 S98, Copyright 1998 UCB
8
15
Tomasulo Example Cycle 2
Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 Load1 Yes 34+R2LD F2 45+ R3 2 Load2 Yes 45+R3MULTDF0 F2 F4 Load3 NoSUBD F8 F6 F2DIVD F10 F0 F6ADDD F6 F8 F2Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk0 Add1 No0 Add2 No
Add3 No0 Mult1 No0 Mult2 No
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F302 FU Load2 Load1
Adapted from UCB CS252 S98, Copyright 1998 UCB
16
Tomasulo Example Cycle 3
Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 Load1 Yes 34+R2LD F2 45+ R3 2 Load2 Yes 45+R3MULT F0 F2 F4 3 Load3 NoSUBD F8 F6 F2DIVD F10 F0 F6ADDD F6 F8 F2Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk0 Add1 No0 Add2 No
Add3 No0 Mult1 Yes MULTD R(F4) Load20 Mult2 No
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F303 FU Mult1 Load2 Load1
• Note: registers names are removed (“renamed”) in Reservation Stations
• Load1 completing; what is waiting for Load1?Adapted from UCB CS252 S98, Copyright 1998 UCB
9
17
Tomasulo Example Cycle 4
Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 Load2 Yes 45+R3MULTDF0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4DIVD F10 F0 F6ADDD F6 F8 F2Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk0 Add1 Yes SUBD M(34+R2) Load20 Add2 No
Add3 No0 Mult1 Yes MULTD R(F4) Load20 Mult2 No
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F304 FU Mult1 Load2 M(34+R2) Add1
• Load2 completing; what is waiting for it?
Adapted from UCB CS252 S98, Copyright 1998 UCB
18
Tomasulo Example Cycle 5
Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULT F0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4DIVD F10 F0 F6 5ADDD F6 F8 F2Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk2 Add1 Yes SUBD M(34+R2) M(45+R3)0 Add2 No
Add3 No10 Mult1 Yes MULTD M(45+R3) R(F4)
0 Mult2 Yes DIVD M(34+R2) Mult1Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F305 FU Mult1 M(45+R3) M(34+R2) Add1 Mult2
Adapted from UCB CS252 S98, Copyright 1998 UCB
10
19
Tomasulo Example Cycle 6
Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULT F0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4DIVD F10 F0 F6 5ADDD F6 F8 F2 6Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk1 Add1 Yes SUBD M(34+R2) M(45+R3)0 Add2 Yes ADDD M(45+R3) Add1
Add3 No9 Mult1 Yes MULTD M(45+R3) R(F4)0 Mult2 Yes DIVD M(34+R2) Mult1
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F306 FU Mult1 M(45+R3) Add2 Add1 Mult2
Adapted from UCB CS252 S98, Copyright 1998 UCB
20
Tomasulo Example Cycle 7
Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULT F0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4 7DIVD F10 F0 F6 5ADDD F6 F8 F2 6Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk0 Add1 Yes SUBD M(34+R2) M(45+R3)0 Add2 Yes ADDD M(45+R3) Add1
Add3 No8 Mult1 Yes MULTD M(45+R3) R(F4)0 Mult2 Yes DIVD M(34+R2) Mult1
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F307 FU Mult1 M(45+R3) Add2 Add1 Mult2
• Add1 completing; what is waiting for it?
Adapted from UCB CS252 S98, Copyright 1998 UCB
11
21
Tomasulo Example Cycle 8
Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTF0 F2 F4 3 Load3 NoSUBDF8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDDF6 F8 F2 6Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk0 Add1 No 2 Add2 Yes ADDD M()-M() M(45+R3)0 Add3 No7 Mult1 Yes MULTDM(45+R3) R(F4)0 Mult2 Yes DIVD M(34+R2) Mult1
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F308 FU Mult1 M(45+R3) Add2 M()-M() Mult2
Adapted from UCB CS252 S98, Copyright 1998 UCB
22
Tomasulo Example Cycle 9
Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULT F0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDD F6 F8 F2 6Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk0 Add1 No1 Add2 Yes ADDD M()–M() M(45+R3)0 Add3 No6 Mult1 Yes MULTD M(45+R3) R(F4)0 Mult2 Yes DIVD M(34+R2) Mult1
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F309 FU Mult1 M(45+R3) Add2 M()–M() Mult2
Adapted from UCB CS252 S98, Copyright 1998 UCB
12
23
Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULT F0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDD F6 F8 F2 6 10Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk0 Add1 No0 Add2 Yes ADDD M()–M() M(45+R3)0 Add3 No5 Mult1 Yes MULTD M(45+R3) R(F4)0 Mult2 Yes DIVD M(34+R2) Mult1
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F3010 FU Mult1 M(45+R3) Add2 M()–M() Mult2
Tomasulo Example Cycle 10
• Add2 completing; what is waiting for it?
Adapted from UCB CS252 S98, Copyright 1998 UCB
24
Tomasulo Example Cycle 11
Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTF0 F2 F4 3 Load3 NoSUBDF8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDDF6 F8 F2 6 10 11Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk0 Add1 No0 Add2 No0 Add3 No4 Mult1 Yes MULTDM(45+R3) R(F4)0 Mult2 Yes DIVD M(34+R2) Mult1
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F3011 FU Mult1 M(45+R3) (M-M)+M() M()ŠM() Mult2
• Write result of ADDD here vs. scoreboard?
Adapted from UCB CS252 S98, Copyright 1998 UCB
13
25
Tomasulo Example Cycle 12
Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMUL F0 F2 F4 3 Load3 NoSUB F8 F6 F2 4 7 8DIVDF10 F0 F6 5ADD F6 F8 F2 6 10 11Reservation Stations S1 S2 RS for j RS for k
Time Name BusyOp Vj Vk Qj Qk0 Add1 No0 Add2 No0 Add3 No3 Mult1 Yes MULTM(45+R3) R(F4)0 Mult2 Yes DIVD M(34+R2) Mult1
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F3012 FU Mult1 M(45+R3) (M-M)+M()M()–M(Mult2
Adapted from UCB CS252 S98, Copyright 1998 UCB
26
Tomasulo Example Cycle 13
Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULT F0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDD F6 F8 F2 6 10 11Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk0 Add1 No0 Add2 No
Add3 No2 Mult1 Yes MULTD M(45+R3) R(F4)0 Mult2 Yes DIVD M(34+R2) Mult1
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F3013 FU Mult1 M(45+R3) (M–M)+M() M()–M() Mult2
Adapted from UCB CS252 S98, Copyright 1998 UCB
14
27
Tomasulo Example Cycle 14
Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULT F0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDD F6 F8 F2 6 10 11Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk0 Add1 No0 Add2 No0 Add3 No1 Mult1 Yes MULTD M(45+R3) R(F4)0 Mult2 Yes DIVD M(34+R2) Mult1
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F3014 FU Mult1 M(45+R3) (M–M)+M() M()–M() Mult2
Adapted from UCB CS252 S98, Copyright 1998 UCB
28
Tomasulo Example Cycle 15
Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULT F0 F2 F4 3 15 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDD F6 F8 F2 6 10 11Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk0 Add1 No0 Add2 No
Add3 No0 Mult1 Yes MULTD M(45+R3) R(F4)0 Mult2 Yes DIVD M(34+R2) Mult1
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F3015 FU Mult1 M(45+R3) (M–M)+M() M()–M() Mult2
• Mult1 completing; what is waiting for it?
Adapted from UCB CS252 S98, Copyright 1998 UCB
15
29
Tomasulo Example Cycle 16
Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULT F0 F2 F4 3 15 16 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDD F6 F8 F2 6 10 11Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk0 Add1 No0 Add2 No
Add3 No0 Mult1 No
40 Mult2 Yes DIVD M*F4 M(34+R2)Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F3016 FU M*F4 M(45+R3) (M–M)+M() M()–M() Mult2
• Note: Just waiting for divide
Adapted from UCB CS252 S98, Copyright 1998 UCB
30
Tomasulo Example Cycle 55
Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULT F0 F2 F4 3 15 16 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDD F6 F8 F2 6 10 11Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk0 Add1 No0 Add2 No
Add3 No0 Mult1 No1 Mult2 Yes DIVD M*F4 M(34+R2)
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F3055 FU M*F4 M(45+R3) (M–M)+M() M()–M() Mult2
Adapted from UCB CS252 S98, Copyright 1998 UCB
16
31
Tomasulo Example Cycle 56
Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULT F0 F2 F4 3 15 16 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5 56ADDD F6 F8 F2 6 10 11Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk0 Add1 No0 Add2 No
Add3 No0 Mult1 No0 Mult2 Yes DIVD M*F4 M(34+R2)
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F3056 FU M*F4 M(45+R3) (M–M)+M() M()–M() Mult2
• Mult 2 completing; what is waiting for it?
Adapted from UCB CS252 S98, Copyright 1998 UCB
32
Tomasulo Example Cycle 57
Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULT F0 F2 F4 3 15 16 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5 56 57ADDD F6 F8 F2 6 10 11Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk0 Add1 No0 Add2 No
Add3 No0 Mult1 No0 Mult2 No
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F3057 FU M*F4 M(45+R3) (M–M)+M() M()–M() M*F4/M
• Again, in-oder issue, out-of-order execution, completion
Adapted from UCB CS252 S98, Copyright 1998 UCB
17
33
34
Scheduling Results
Inst Disp Schd EXE MEM CDBL.D 1 - - 2-3 4L.D 2 - - 3-4 5MULT 3 4-5 6-15 - 16SUB.D 4 5 6-7 - 8DIV.D 5 5-11 17-56 - 57ADD.D 6 7-8 9-10 - 11
It’s not perfect: lose one cycle for CDB broadcasting
18
35
Correctness of TomasuloEach instruction receives the same
operands as in the sequential execution
Or, each instruction receives its operand from its parent in program order (data flow)
How many possible ways of register data flow in Tomasulo?
36
Correctness of Tomasulo
Any register or memory word receives the value of the last write as in the sequential execution No stale writes
How can Tomasulo prevent stale writes?
19
37
Review Dependences
How are dependences enforced or removed in Tomasulo Algorithm?Data dependences (RAW)Must be enforced: timing and data flowAntidependence (WAR)Hazard to data flow, can be removedOutput Dependence (WAW)Hazard of stale write, can be removed
38
Renaming in Tomasulo
Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 Load2 Yes 45+R3MULTDF0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4DIVD F10 F0 F6ADDD F6 F8 F2Reservation Stations S1 S2 RS for j RS for k
Time Name Busy Op Vj Vk Qj Qk0 Add1 Yes SUBD M(34+R2) Load20 Add2 No
Add3 No0 Mult1 Yes MULTD R(F4) Load20 Mult2 No
Register result status
Clock F0 F2 F4 F6 F8 F10 F12 ... F304 FU Mult1 Load2 M(34+R2) Add1
Tomasulo Example Cycle 4
Adapted from UCB CS252 S98, Copyright 1998 UCB
Can you see renamed instructions?
20
39
Renaming in Tomasulo
LD F6,34(R2)
LD F2,45(R3)
MULTI F0,F2,F4
SUBD F8,F6,F2
DIVD F10,F0,F6
ADD F6,F8,F2
LD F6/Load1,34(R2)
LD F2/Load2,45(R3)
MULTI F0/Mult1,Load2,F4
SUBD F8/Add1,Load1,Load2
DIVD F10/Mult2,Mult1,F6
ADD F6/Add2,Add1,F2
Left: Original instructionsRight: Renamed form after issuing (dispatching)
40
Renaming in Tomasulo
LD F6,34(R2)
LD F8,45(R3)
MULTI F10,F8,F4
SUBD F12,F6,F8
DIVD F14,F10,F6
ADD F16,F12,F8
LD F6/Load1,34(R2)
LD F8/Load2,45(R3)
MULTI F10/Mult1,Load2,F4
SUBD F12/Add1,Load1,Load2
DIVD F14/Mult2,Mult1,F6
ADD F16/Add2,Add1,F2
Left: F2=>F8, F0=>F10, F8=>F12, F10=>F14, F6=>F16Right: The renamed forms use the same reservation stations
21
41
Renaming in Tomasulo
LD F6,34(R2)
LD F2,45(R3)
MULTI F0,F2,F4
DIVD F8,F0,F6
SUBD F8,F6,F2
ADD F6,F8,F2
LD F6/Load1,34(R2)
LD F2/Load2,45(R3)
MULTI F0/Mult1,Load2,F4
DIVD F8/Mult2,Mult1,F6
SUBD F8/Add1,Load1,Load2
ADD F6/Add2,Add1,F2
Left: Moving DVD and change its destination register, causinga hazard to WAR and WAW dependences in dynamic schedulingQ1: How does ADD receive correct input values?Q2: How does F8 receive the last value of write?
42
Register Renaming and Correctness1. The same set of instructions write to
user-visible register and memory;2. Each instruction receives operands from
the same parent(s) as in the sequential execution; and
3. Any register or memory word receives the value of the last write as in the sequential execution
Tomasulo algorithm enforces properties 2 and 3 regarding register values, and helps maintain property 1 (to be discussed), in dynamic execution
22
43
Modern Renaming: Use of TagIn Tomasulo, RS and
load/store buffer index is used as tag.Renaming assigns every new reg. value a unique tagRS stores tags to preserve dependencesCDB broadcasts tag with data for data passing and wakeup
Why tag is so chosen?
What does tag really represent?
What else can be used as tag?
44
Tomasulo SummaryReservations stations: Increases effective number of registers Distributes scheduling logic
Register renaming: Avoids WAR and WAW dependence
Tag + Data broadcasting for waking up child instructions
Pros: can be effectively combined with speculative execution
Cons: CDB broadcasting adds one-cycle delay (addressed in modern instruction scheduling)
Adapted from UCB CS252 S98, Copyright 1998 USB