lecture 6 tomasulo algorithm - iowa state...

22
1 1 Lecture 6 Tomasulo Algorithm CprE 581 Computer Systems Architecture Reading:Textbook (4 th ed: 2.4, 2.5) (5 th ed: 3.4, 3.5) 2 Tomasulo Algorithm History: 1966: scoreboarding in CDC6600, implements limited dynamic scheduling Three years later: Tomasulo in IBM 360/91, introduces register renaming and reservation station Now appearing in todays Dec Alpha, SGI MIPS, SUN UltraSparc, Intel Pentium, IBM PowerPC, and others In today’s perspective: It implements reservation-based dynamic scheduling without data forwarding (obsolete) The merit is the use of tag to identify instructions in flight: register renaming, reservation station, tag broadcast It is an “algorithm” first Adapted from UCB CS252 S98, Copyright 1998 USB

Upload: others

Post on 18-Aug-2020

8 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Lecture 6 Tomasulo Algorithm - Iowa State Universityclass.ece.iastate.edu/tyagi/cpre581/lectures/Lecture8.pdf · 2013. 10. 3. · Lecture 6 Tomasulo Algorithm CprE 581 Computer Systems

1

1

Lecture 6 Tomasulo Algorithm

CprE 581 Computer SystemsArchitecture

Reading:Textbook (4th ed: 2.4, 2.5)(5th ed: 3.4, 3.5)

2

Tomasulo AlgorithmHistory:

1966: scoreboarding in CDC6600, implements limited dynamic schedulingThree years later: Tomasulo in IBM 360/91, introduces register renaming and reservation stationNow appearing in todays Dec Alpha, SGI MIPS, SUN UltraSparc, Intel Pentium, IBM PowerPC, and others

In today’s perspective: It implements reservation-based dynamic scheduling without data forwarding (obsolete)

The merit is the use of tag to identify instructions in flight: register renaming, reservation station, tag broadcast

It is an “algorithm” first

Adapted from UCB CS252 S98, Copyright 1998 USB

Page 2: Lecture 6 Tomasulo Algorithm - Iowa State Universityclass.ece.iastate.edu/tyagi/cpre581/lectures/Lecture8.pdf · 2013. 10. 3. · Lecture 6 Tomasulo Algorithm CprE 581 Computer Systems

2

Assumptions

Assumptions for time being:

• No branches• Only concern is register dependence• Only floating-point instructions are

executed out-of-order

3

4

Register Renaming and Correctness1. The same set of instructions write to

user-visible register and memory;2. Each instruction receives the same

operands as in the sequential execution; and

3. Any register or memory word receives the value of the last write as in the sequential execution

Page 3: Lecture 6 Tomasulo Algorithm - Iowa State Universityclass.ece.iastate.edu/tyagi/cpre581/lectures/Lecture8.pdf · 2013. 10. 3. · Lecture 6 Tomasulo Algorithm CprE 581 Computer Systems

3

Tomasulo Organization

5

6

Reservation Station ComponentsOp—Operation to perform in the unit (e.g., + or –)Vj, Vk—Value of Source operands Store buffer has V field, result to be stored

Qj, Qk—Reservation stations producing source registers (value to be written) Note: No ready flags as in Scoreboard; Qj,Qk=0 =>

ready Store buffers only have Qi for RS producing result

Busy—Indicates reservation station or FU is busy

How are dependences represented in reservation station?

Adapted from UCB CS252 S98B

Page 4: Lecture 6 Tomasulo Algorithm - Iowa State Universityclass.ece.iastate.edu/tyagi/cpre581/lectures/Lecture8.pdf · 2013. 10. 3. · Lecture 6 Tomasulo Algorithm CprE 581 Computer Systems

4

7

Register Result Status

It’s a table to record the RS index of the latest pending instruction, if there is one, that writes to each register

It’s kind of register renaming: rename register to RS if there is a pending write

8

Three Stages of Tomasulo Algorithm1. Issue—get instruction from FP Op Queue

If reservation station free (no structural hazard), control issues instr & sends operands (renames registers).

2.Execution—operate on operands (EX)When both operands ready then execute;if not ready, watch Common Data Bus for result

3.Write result—finish execution (WB)Write on Common Data Bus to all awaiting units; mark reservation station available

Issue: dependence check for new instWriteback: Wakeup dependent instructions

Adapted from UCB CS252 S98

Page 5: Lecture 6 Tomasulo Algorithm - Iowa State Universityclass.ece.iastate.edu/tyagi/cpre581/lectures/Lecture8.pdf · 2013. 10. 3. · Lecture 6 Tomasulo Algorithm CprE 581 Computer Systems

5

9

Issue Stage and RenamingSource renaming: rename the two source registers if there are pending writes

Assigns a free RS

Dest. Renaming: Updates register result status with the allocated RS

Renaming removes name dependences

10

Execute Stage

Only “ready” instructions can join the competitionThere is a select logic to select instructions for FU execution Some policy may be used, e.g. age basedNon-ready instructions can be “woken up” during writeback of its parent inst

Page 6: Lecture 6 Tomasulo Algorithm - Iowa State Universityclass.ece.iastate.edu/tyagi/cpre581/lectures/Lecture8.pdf · 2013. 10. 3. · Lecture 6 Tomasulo Algorithm CprE 581 Computer Systems

6

11

Writeback and Common Data BusNormal data bus: data + destination (“go to” bus)Common data bus: data + source (“come from” bus) 64 bits of data + 4 bits of source index

(tag)Does the broadcast to every instruction in

flightWaiting instructions do tag matching and update their ready bits and value fields (if the tag matches theirs)

Adapted from UCB CS252 S98, Copyright 1998 USB

12

Code ExampleLD F6,34(R2)

LD F2,45(R3)

MULTI F0,F2,F4

SUBD F8,F6,F2

DIVD F10,F0,F6

ADD F6,F8,F2

LD1 LD2

MULTISUBD

DIVDADD

Operation latencies: load/store 2 cycles,Add/sub 2 cycles, Mult 10 cycles, divide 40 cycle

Page 7: Lecture 6 Tomasulo Algorithm - Iowa State Universityclass.ece.iastate.edu/tyagi/cpre581/lectures/Lecture8.pdf · 2013. 10. 3. · Lecture 6 Tomasulo Algorithm CprE 581 Computer Systems

7

13

Tomasulo Example Cycle 0Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 Load1 NoLD F2 45+ R3 Load2 NoMULTD F0 F2 F4 Load3 NoSUBD F8 F6 F2DIVD F10 F0 F6ADDD F6 F8 F2Reservation Stations S1 S2 RS for j RS for k

TimeNam BusyOp Vj Vk Qj Qk0 Add1No0 Add2No0 Add3No0 Mult1No0 Mult2No

Register result status

Clock F0 F2 F4 F6 F8 F10 F12 ... F300 FU

Adapted from UCB CS252 S98, Copyright 1998 UCB

14

Tomasulo Example Cycle 1Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 Load1 No 34+R2LD F2 45+ R3 Load2 NoMULT F0 F2 F4 Load3 NoSUBDF8 F6 F2DIVD F10 F0 F6ADDDF6 F8 F2Reservation Stations S1 S2 RS for j RS for k

Time Name Busy Op Vj Vk Qj Qk0 Add1 No0 Add2 No

Add3 No0 Mult1 No0 Mult2 No

Register result status

Clock F0 F2 F4 F6 F8 F10 F12 ... F301 FU Load1

Yes

Adapted from UCB CS252 S98, Copyright 1998 UCB

Page 8: Lecture 6 Tomasulo Algorithm - Iowa State Universityclass.ece.iastate.edu/tyagi/cpre581/lectures/Lecture8.pdf · 2013. 10. 3. · Lecture 6 Tomasulo Algorithm CprE 581 Computer Systems

8

15

Tomasulo Example Cycle 2

Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 Load1 Yes 34+R2LD F2 45+ R3 2 Load2 Yes 45+R3MULTDF0 F2 F4 Load3 NoSUBD F8 F6 F2DIVD F10 F0 F6ADDD F6 F8 F2Reservation Stations S1 S2 RS for j RS for k

Time Name Busy Op Vj Vk Qj Qk0 Add1 No0 Add2 No

Add3 No0 Mult1 No0 Mult2 No

Register result status

Clock F0 F2 F4 F6 F8 F10 F12 ... F302 FU Load2 Load1

Adapted from UCB CS252 S98, Copyright 1998 UCB

16

Tomasulo Example Cycle 3

Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 Load1 Yes 34+R2LD F2 45+ R3 2 Load2 Yes 45+R3MULT F0 F2 F4 3 Load3 NoSUBD F8 F6 F2DIVD F10 F0 F6ADDD F6 F8 F2Reservation Stations S1 S2 RS for j RS for k

Time Name Busy Op Vj Vk Qj Qk0 Add1 No0 Add2 No

Add3 No0 Mult1 Yes MULTD R(F4) Load20 Mult2 No

Register result status

Clock F0 F2 F4 F6 F8 F10 F12 ... F303 FU Mult1 Load2 Load1

• Note: registers names are removed (“renamed”) in Reservation Stations

• Load1 completing; what is waiting for Load1?Adapted from UCB CS252 S98, Copyright 1998 UCB

Page 9: Lecture 6 Tomasulo Algorithm - Iowa State Universityclass.ece.iastate.edu/tyagi/cpre581/lectures/Lecture8.pdf · 2013. 10. 3. · Lecture 6 Tomasulo Algorithm CprE 581 Computer Systems

9

17

Tomasulo Example Cycle 4

Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 Load2 Yes 45+R3MULTDF0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4DIVD F10 F0 F6ADDD F6 F8 F2Reservation Stations S1 S2 RS for j RS for k

Time Name Busy Op Vj Vk Qj Qk0 Add1 Yes SUBD M(34+R2) Load20 Add2 No

Add3 No0 Mult1 Yes MULTD R(F4) Load20 Mult2 No

Register result status

Clock F0 F2 F4 F6 F8 F10 F12 ... F304 FU Mult1 Load2 M(34+R2) Add1

• Load2 completing; what is waiting for it?

Adapted from UCB CS252 S98, Copyright 1998 UCB

18

Tomasulo Example Cycle 5

Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULT F0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4DIVD F10 F0 F6 5ADDD F6 F8 F2Reservation Stations S1 S2 RS for j RS for k

Time Name Busy Op Vj Vk Qj Qk2 Add1 Yes SUBD M(34+R2) M(45+R3)0 Add2 No

Add3 No10 Mult1 Yes MULTD M(45+R3) R(F4)

0 Mult2 Yes DIVD M(34+R2) Mult1Register result status

Clock F0 F2 F4 F6 F8 F10 F12 ... F305 FU Mult1 M(45+R3) M(34+R2) Add1 Mult2

Adapted from UCB CS252 S98, Copyright 1998 UCB

Page 10: Lecture 6 Tomasulo Algorithm - Iowa State Universityclass.ece.iastate.edu/tyagi/cpre581/lectures/Lecture8.pdf · 2013. 10. 3. · Lecture 6 Tomasulo Algorithm CprE 581 Computer Systems

10

19

Tomasulo Example Cycle 6

Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULT F0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4DIVD F10 F0 F6 5ADDD F6 F8 F2 6Reservation Stations S1 S2 RS for j RS for k

Time Name Busy Op Vj Vk Qj Qk1 Add1 Yes SUBD M(34+R2) M(45+R3)0 Add2 Yes ADDD M(45+R3) Add1

Add3 No9 Mult1 Yes MULTD M(45+R3) R(F4)0 Mult2 Yes DIVD M(34+R2) Mult1

Register result status

Clock F0 F2 F4 F6 F8 F10 F12 ... F306 FU Mult1 M(45+R3) Add2 Add1 Mult2

Adapted from UCB CS252 S98, Copyright 1998 UCB

20

Tomasulo Example Cycle 7

Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULT F0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4 7DIVD F10 F0 F6 5ADDD F6 F8 F2 6Reservation Stations S1 S2 RS for j RS for k

Time Name Busy Op Vj Vk Qj Qk0 Add1 Yes SUBD M(34+R2) M(45+R3)0 Add2 Yes ADDD M(45+R3) Add1

Add3 No8 Mult1 Yes MULTD M(45+R3) R(F4)0 Mult2 Yes DIVD M(34+R2) Mult1

Register result status

Clock F0 F2 F4 F6 F8 F10 F12 ... F307 FU Mult1 M(45+R3) Add2 Add1 Mult2

• Add1 completing; what is waiting for it?

Adapted from UCB CS252 S98, Copyright 1998 UCB

Page 11: Lecture 6 Tomasulo Algorithm - Iowa State Universityclass.ece.iastate.edu/tyagi/cpre581/lectures/Lecture8.pdf · 2013. 10. 3. · Lecture 6 Tomasulo Algorithm CprE 581 Computer Systems

11

21

Tomasulo Example Cycle 8

Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTF0 F2 F4 3 Load3 NoSUBDF8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDDF6 F8 F2 6Reservation Stations S1 S2 RS for j RS for k

Time Name Busy Op Vj Vk Qj Qk0 Add1 No 2 Add2 Yes ADDD M()-M() M(45+R3)0 Add3 No7 Mult1 Yes MULTDM(45+R3) R(F4)0 Mult2 Yes DIVD M(34+R2) Mult1

Register result status

Clock F0 F2 F4 F6 F8 F10 F12 ... F308 FU Mult1 M(45+R3) Add2 M()-M() Mult2

Adapted from UCB CS252 S98, Copyright 1998 UCB

22

Tomasulo Example Cycle 9

Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULT F0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDD F6 F8 F2 6Reservation Stations S1 S2 RS for j RS for k

Time Name Busy Op Vj Vk Qj Qk0 Add1 No1 Add2 Yes ADDD M()–M() M(45+R3)0 Add3 No6 Mult1 Yes MULTD M(45+R3) R(F4)0 Mult2 Yes DIVD M(34+R2) Mult1

Register result status

Clock F0 F2 F4 F6 F8 F10 F12 ... F309 FU Mult1 M(45+R3) Add2 M()–M() Mult2

Adapted from UCB CS252 S98, Copyright 1998 UCB

Page 12: Lecture 6 Tomasulo Algorithm - Iowa State Universityclass.ece.iastate.edu/tyagi/cpre581/lectures/Lecture8.pdf · 2013. 10. 3. · Lecture 6 Tomasulo Algorithm CprE 581 Computer Systems

12

23

Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULT F0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDD F6 F8 F2 6 10Reservation Stations S1 S2 RS for j RS for k

Time Name Busy Op Vj Vk Qj Qk0 Add1 No0 Add2 Yes ADDD M()–M() M(45+R3)0 Add3 No5 Mult1 Yes MULTD M(45+R3) R(F4)0 Mult2 Yes DIVD M(34+R2) Mult1

Register result status

Clock F0 F2 F4 F6 F8 F10 F12 ... F3010 FU Mult1 M(45+R3) Add2 M()–M() Mult2

Tomasulo Example Cycle 10

• Add2 completing; what is waiting for it?

Adapted from UCB CS252 S98, Copyright 1998 UCB

24

Tomasulo Example Cycle 11

Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULTF0 F2 F4 3 Load3 NoSUBDF8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDDF6 F8 F2 6 10 11Reservation Stations S1 S2 RS for j RS for k

Time Name Busy Op Vj Vk Qj Qk0 Add1 No0 Add2 No0 Add3 No4 Mult1 Yes MULTDM(45+R3) R(F4)0 Mult2 Yes DIVD M(34+R2) Mult1

Register result status

Clock F0 F2 F4 F6 F8 F10 F12 ... F3011 FU Mult1 M(45+R3) (M-M)+M() M()ŠM() Mult2

• Write result of ADDD here vs. scoreboard?

Adapted from UCB CS252 S98, Copyright 1998 UCB

Page 13: Lecture 6 Tomasulo Algorithm - Iowa State Universityclass.ece.iastate.edu/tyagi/cpre581/lectures/Lecture8.pdf · 2013. 10. 3. · Lecture 6 Tomasulo Algorithm CprE 581 Computer Systems

13

25

Tomasulo Example Cycle 12

Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMUL F0 F2 F4 3 Load3 NoSUB F8 F6 F2 4 7 8DIVDF10 F0 F6 5ADD F6 F8 F2 6 10 11Reservation Stations S1 S2 RS for j RS for k

Time Name BusyOp Vj Vk Qj Qk0 Add1 No0 Add2 No0 Add3 No3 Mult1 Yes MULTM(45+R3) R(F4)0 Mult2 Yes DIVD M(34+R2) Mult1

Register result status

Clock F0 F2 F4 F6 F8 F10 F12 ... F3012 FU Mult1 M(45+R3) (M-M)+M()M()–M(Mult2

Adapted from UCB CS252 S98, Copyright 1998 UCB

26

Tomasulo Example Cycle 13

Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULT F0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDD F6 F8 F2 6 10 11Reservation Stations S1 S2 RS for j RS for k

Time Name Busy Op Vj Vk Qj Qk0 Add1 No0 Add2 No

Add3 No2 Mult1 Yes MULTD M(45+R3) R(F4)0 Mult2 Yes DIVD M(34+R2) Mult1

Register result status

Clock F0 F2 F4 F6 F8 F10 F12 ... F3013 FU Mult1 M(45+R3) (M–M)+M() M()–M() Mult2

Adapted from UCB CS252 S98, Copyright 1998 UCB

Page 14: Lecture 6 Tomasulo Algorithm - Iowa State Universityclass.ece.iastate.edu/tyagi/cpre581/lectures/Lecture8.pdf · 2013. 10. 3. · Lecture 6 Tomasulo Algorithm CprE 581 Computer Systems

14

27

Tomasulo Example Cycle 14

Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULT F0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDD F6 F8 F2 6 10 11Reservation Stations S1 S2 RS for j RS for k

Time Name Busy Op Vj Vk Qj Qk0 Add1 No0 Add2 No0 Add3 No1 Mult1 Yes MULTD M(45+R3) R(F4)0 Mult2 Yes DIVD M(34+R2) Mult1

Register result status

Clock F0 F2 F4 F6 F8 F10 F12 ... F3014 FU Mult1 M(45+R3) (M–M)+M() M()–M() Mult2

Adapted from UCB CS252 S98, Copyright 1998 UCB

28

Tomasulo Example Cycle 15

Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULT F0 F2 F4 3 15 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDD F6 F8 F2 6 10 11Reservation Stations S1 S2 RS for j RS for k

Time Name Busy Op Vj Vk Qj Qk0 Add1 No0 Add2 No

Add3 No0 Mult1 Yes MULTD M(45+R3) R(F4)0 Mult2 Yes DIVD M(34+R2) Mult1

Register result status

Clock F0 F2 F4 F6 F8 F10 F12 ... F3015 FU Mult1 M(45+R3) (M–M)+M() M()–M() Mult2

• Mult1 completing; what is waiting for it?

Adapted from UCB CS252 S98, Copyright 1998 UCB

Page 15: Lecture 6 Tomasulo Algorithm - Iowa State Universityclass.ece.iastate.edu/tyagi/cpre581/lectures/Lecture8.pdf · 2013. 10. 3. · Lecture 6 Tomasulo Algorithm CprE 581 Computer Systems

15

29

Tomasulo Example Cycle 16

Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULT F0 F2 F4 3 15 16 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDD F6 F8 F2 6 10 11Reservation Stations S1 S2 RS for j RS for k

Time Name Busy Op Vj Vk Qj Qk0 Add1 No0 Add2 No

Add3 No0 Mult1 No

40 Mult2 Yes DIVD M*F4 M(34+R2)Register result status

Clock F0 F2 F4 F6 F8 F10 F12 ... F3016 FU M*F4 M(45+R3) (M–M)+M() M()–M() Mult2

• Note: Just waiting for divide

Adapted from UCB CS252 S98, Copyright 1998 UCB

30

Tomasulo Example Cycle 55

Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULT F0 F2 F4 3 15 16 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5ADDD F6 F8 F2 6 10 11Reservation Stations S1 S2 RS for j RS for k

Time Name Busy Op Vj Vk Qj Qk0 Add1 No0 Add2 No

Add3 No0 Mult1 No1 Mult2 Yes DIVD M*F4 M(34+R2)

Register result status

Clock F0 F2 F4 F6 F8 F10 F12 ... F3055 FU M*F4 M(45+R3) (M–M)+M() M()–M() Mult2

Adapted from UCB CS252 S98, Copyright 1998 UCB

Page 16: Lecture 6 Tomasulo Algorithm - Iowa State Universityclass.ece.iastate.edu/tyagi/cpre581/lectures/Lecture8.pdf · 2013. 10. 3. · Lecture 6 Tomasulo Algorithm CprE 581 Computer Systems

16

31

Tomasulo Example Cycle 56

Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULT F0 F2 F4 3 15 16 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5 56ADDD F6 F8 F2 6 10 11Reservation Stations S1 S2 RS for j RS for k

Time Name Busy Op Vj Vk Qj Qk0 Add1 No0 Add2 No

Add3 No0 Mult1 No0 Mult2 Yes DIVD M*F4 M(34+R2)

Register result status

Clock F0 F2 F4 F6 F8 F10 F12 ... F3056 FU M*F4 M(45+R3) (M–M)+M() M()–M() Mult2

• Mult 2 completing; what is waiting for it?

Adapted from UCB CS252 S98, Copyright 1998 UCB

32

Tomasulo Example Cycle 57

Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 5 Load2 NoMULT F0 F2 F4 3 15 16 Load3 NoSUBD F8 F6 F2 4 7 8DIVD F10 F0 F6 5 56 57ADDD F6 F8 F2 6 10 11Reservation Stations S1 S2 RS for j RS for k

Time Name Busy Op Vj Vk Qj Qk0 Add1 No0 Add2 No

Add3 No0 Mult1 No0 Mult2 No

Register result status

Clock F0 F2 F4 F6 F8 F10 F12 ... F3057 FU M*F4 M(45+R3) (M–M)+M() M()–M() M*F4/M

• Again, in-oder issue, out-of-order execution, completion

Adapted from UCB CS252 S98, Copyright 1998 UCB

Page 17: Lecture 6 Tomasulo Algorithm - Iowa State Universityclass.ece.iastate.edu/tyagi/cpre581/lectures/Lecture8.pdf · 2013. 10. 3. · Lecture 6 Tomasulo Algorithm CprE 581 Computer Systems

17

33

34

Scheduling Results

Inst Disp Schd EXE MEM CDBL.D 1 - - 2-3 4L.D 2 - - 3-4 5MULT 3 4-5 6-15 - 16SUB.D 4 5 6-7 - 8DIV.D 5 5-11 17-56 - 57ADD.D 6 7-8 9-10 - 11

It’s not perfect: lose one cycle for CDB broadcasting

Page 18: Lecture 6 Tomasulo Algorithm - Iowa State Universityclass.ece.iastate.edu/tyagi/cpre581/lectures/Lecture8.pdf · 2013. 10. 3. · Lecture 6 Tomasulo Algorithm CprE 581 Computer Systems

18

35

Correctness of TomasuloEach instruction receives the same

operands as in the sequential execution

Or, each instruction receives its operand from its parent in program order (data flow)

How many possible ways of register data flow in Tomasulo?

36

Correctness of Tomasulo

Any register or memory word receives the value of the last write as in the sequential execution No stale writes

How can Tomasulo prevent stale writes?

Page 19: Lecture 6 Tomasulo Algorithm - Iowa State Universityclass.ece.iastate.edu/tyagi/cpre581/lectures/Lecture8.pdf · 2013. 10. 3. · Lecture 6 Tomasulo Algorithm CprE 581 Computer Systems

19

37

Review Dependences

How are dependences enforced or removed in Tomasulo Algorithm?Data dependences (RAW)Must be enforced: timing and data flowAntidependence (WAR)Hazard to data flow, can be removedOutput Dependence (WAW)Hazard of stale write, can be removed

38

Renaming in Tomasulo

Instruction status Execution WriteInstruction j k Issue complete Result Busy AddressLD F6 34+ R2 1 3 4 Load1 NoLD F2 45+ R3 2 4 Load2 Yes 45+R3MULTDF0 F2 F4 3 Load3 NoSUBD F8 F6 F2 4DIVD F10 F0 F6ADDD F6 F8 F2Reservation Stations S1 S2 RS for j RS for k

Time Name Busy Op Vj Vk Qj Qk0 Add1 Yes SUBD M(34+R2) Load20 Add2 No

Add3 No0 Mult1 Yes MULTD R(F4) Load20 Mult2 No

Register result status

Clock F0 F2 F4 F6 F8 F10 F12 ... F304 FU Mult1 Load2 M(34+R2) Add1

Tomasulo Example Cycle 4

Adapted from UCB CS252 S98, Copyright 1998 UCB

Can you see renamed instructions?

Page 20: Lecture 6 Tomasulo Algorithm - Iowa State Universityclass.ece.iastate.edu/tyagi/cpre581/lectures/Lecture8.pdf · 2013. 10. 3. · Lecture 6 Tomasulo Algorithm CprE 581 Computer Systems

20

39

Renaming in Tomasulo

LD F6,34(R2)

LD F2,45(R3)

MULTI F0,F2,F4

SUBD F8,F6,F2

DIVD F10,F0,F6

ADD F6,F8,F2

LD F6/Load1,34(R2)

LD F2/Load2,45(R3)

MULTI F0/Mult1,Load2,F4

SUBD F8/Add1,Load1,Load2

DIVD F10/Mult2,Mult1,F6

ADD F6/Add2,Add1,F2

Left: Original instructionsRight: Renamed form after issuing (dispatching)

40

Renaming in Tomasulo

LD F6,34(R2)

LD F8,45(R3)

MULTI F10,F8,F4

SUBD F12,F6,F8

DIVD F14,F10,F6

ADD F16,F12,F8

LD F6/Load1,34(R2)

LD F8/Load2,45(R3)

MULTI F10/Mult1,Load2,F4

SUBD F12/Add1,Load1,Load2

DIVD F14/Mult2,Mult1,F6

ADD F16/Add2,Add1,F2

Left: F2=>F8, F0=>F10, F8=>F12, F10=>F14, F6=>F16Right: The renamed forms use the same reservation stations

Page 21: Lecture 6 Tomasulo Algorithm - Iowa State Universityclass.ece.iastate.edu/tyagi/cpre581/lectures/Lecture8.pdf · 2013. 10. 3. · Lecture 6 Tomasulo Algorithm CprE 581 Computer Systems

21

41

Renaming in Tomasulo

LD F6,34(R2)

LD F2,45(R3)

MULTI F0,F2,F4

DIVD F8,F0,F6

SUBD F8,F6,F2

ADD F6,F8,F2

LD F6/Load1,34(R2)

LD F2/Load2,45(R3)

MULTI F0/Mult1,Load2,F4

DIVD F8/Mult2,Mult1,F6

SUBD F8/Add1,Load1,Load2

ADD F6/Add2,Add1,F2

Left: Moving DVD and change its destination register, causinga hazard to WAR and WAW dependences in dynamic schedulingQ1: How does ADD receive correct input values?Q2: How does F8 receive the last value of write?

42

Register Renaming and Correctness1. The same set of instructions write to

user-visible register and memory;2. Each instruction receives operands from

the same parent(s) as in the sequential execution; and

3. Any register or memory word receives the value of the last write as in the sequential execution

Tomasulo algorithm enforces properties 2 and 3 regarding register values, and helps maintain property 1 (to be discussed), in dynamic execution

Page 22: Lecture 6 Tomasulo Algorithm - Iowa State Universityclass.ece.iastate.edu/tyagi/cpre581/lectures/Lecture8.pdf · 2013. 10. 3. · Lecture 6 Tomasulo Algorithm CprE 581 Computer Systems

22

43

Modern Renaming: Use of TagIn Tomasulo, RS and

load/store buffer index is used as tag.Renaming assigns every new reg. value a unique tagRS stores tags to preserve dependencesCDB broadcasts tag with data for data passing and wakeup

Why tag is so chosen?

What does tag really represent?

What else can be used as tag?

44

Tomasulo SummaryReservations stations: Increases effective number of registers Distributes scheduling logic

Register renaming: Avoids WAR and WAW dependence

Tag + Data broadcasting for waking up child instructions

Pros: can be effectively combined with speculative execution

Cons: CDB broadcasting adds one-cycle delay (addressed in modern instruction scheduling)

Adapted from UCB CS252 S98, Copyright 1998 USB