COSC 6385 Computer Architecture
Tomasulo’s Algorithm
Edgar Gabriel
Fall 2009
COSC 6385 – Computer Architecture
Edgar Gabriel
Analyzing a short code-sequence
DIV.D F0, F2, F4
ADD.D F6, F0, F8
S.D F6, 0(R1)
SUB.D F8, F10, F14
MUL.D F6, F10, F8
• 3 true data dependencies (RAW hazards): DIV.D -> ADD.D through F0, ADD.D -> S.D through F6, SUB.D -> MUL.D through F8
• Anti-dependencies (WAR hazards): ADD.D reads F8 before SUB.D writes it; S.D reads F6 before MUL.D writes it
• Output dependency (WAW hazard): ADD.D and MUL.D both write F6
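All three kinds of dependences above can be found mechanically. The following Python sketch is not part of the original slides. It treats each instruction as one destination register plus a tuple of source registers, and it links every read or write to the most recent earlier writer of the same register. `Instr` and `classify` are hypothetical names. Dependences through memory, such as the 0(R1) operand of S.D, are deliberately ignored.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    text: str
    dest: Optional[str]   # register written (None for a store)
    srcs: tuple           # registers read


def classify(code):
    """Return a list of (kind, i, j, reg), where instruction i precedes j."""
    deps = []
    for j, later in enumerate(code):
        # RAW: each source of j depends on the most recent earlier writer.
        for reg in later.srcs:
            for i in range(j - 1, -1, -1):
                if code[i].dest == reg:
                    deps.append(("RAW", i, j, reg))
                    break
        if later.dest is None:
            continue
        # WAR/WAW: scan back to the previous writer of j's destination.
        for i in range(j - 1, -1, -1):
            if later.dest in code[i].srcs:
                deps.append(("WAR", i, j, later.dest))
            if code[i].dest == later.dest:
                deps.append(("WAW", i, j, later.dest))
                break
    return deps


CODE = [
    Instr("DIV.D F0, F2, F4", "F0", ("F2", "F4")),
    Instr("ADD.D F6, F0, F8", "F6", ("F0", "F8")),
    Instr("S.D F6, 0(R1)", None, ("F6", "R1")),
    Instr("SUB.D F8, F10, F14", "F8", ("F10", "F14")),
    Instr("MUL.D F6, F10, F8", "F6", ("F10", "F8")),
]

for kind, i, j, reg in classify(CODE):
    print(f"{kind}: {CODE[i].text} -> {CODE[j].text} ({reg})")
```

For this five-instruction sequence the sketch reports the three RAW pairs, the two WAR pairs (through F8 and F6) and the single WAW pair (through F6) listed above.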
Analyzing a short code-sequence
DIV.D F0, F2, F4
ADD.D S, F0, F8
S.D S, 0(R1)
SUB.D T, F10, F14
MUL.D F6, F10, T
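• Renaming some registers removes the WAR and WAW hazards: F6 is renamed to S in ADD.D and S.D, and F8 is renamed to T in SUB.D and MUL.D (S and T are temporary registers)
– Any subsequent use of F8 must be replaced by T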
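The slide renames only F6 and F8, by hand. The Python sketch below shows the general idea under one assumption: an unlimited pool of hypothetical physical registers named P0, P1 and so on. It gives every destination a fresh name and routes each source through the current mapping. `rename` is a hypothetical name.

```python
from itertools import count


def rename(code):
    """code: list of (opcode, dest or None, srcs). Returns (renamed, mapping)."""
    latest = {}                          # architectural reg -> current physical name
    fresh = (f"P{n}" for n in count())   # hypothetical physical names P0, P1, ...
    renamed = []
    for op, dest, srcs in code:
        # Read sources through the mapping *before* renaming the destination,
        # so an instruction such as ADD.D F6, F6, F2 still reads the old F6.
        new_srcs = tuple(latest.get(r, r) for r in srcs)
        new_dest = None
        if dest is not None:
            new_dest = next(fresh)
            latest[dest] = new_dest
        renamed.append((op, new_dest, new_srcs))
    return renamed, latest


CODE = [
    ("DIV.D", "F0", ("F2", "F4")),
    ("ADD.D", "F6", ("F0", "F8")),
    ("S.D",   None, ("F6", "R1")),
    ("SUB.D", "F8", ("F10", "F14")),
    ("MUL.D", "F6", ("F10", "F8")),
]

renamed, latest = rename(CODE)
for op, dest, srcs in renamed:
    print(op, dest, srcs)
print(latest)   # where each architectural register's final value lives
```

After renaming, all destinations are distinct, and no instruction overwrites a name that an earlier instruction still reads. Only the three RAW chains remain. The final mapping records that the architectural F6 now lives in P3 (the MUL.D result) and F8 lives in P2. This is the sketch's counterpart of replacing later uses of F8 by T.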
Tomasulo’s Algorithm
• Register renaming is provided by reservation stations
– Buffer the operands of instructions waiting to be issued to a functional unit
– Fetch an operand as soon as it is available
– Eliminate the need to get an operand from a register
– Pending instructions designate the reservation station that will provide their input
• For overlapping successive writes to a register: only the last one actually updates the register
Tomasulo’s Algorithm
• There can be more reservation stations than architectural registers
• Hazard detection is distributed (instead of centralized
as in the Scoreboard)
• Results are broadcast directly from the functional units to the waiting reservation stations (and registers) over a common data bus (CDB), bypassing the register file
• Each reservation station holds the opcode for the
pending instruction and either operand values or names
of reservation stations that will provide them
• Load and store buffers hold data and addresses for
memory access
[Figure: basic structure of the FP unit using Tomasulo’s algorithm. Components: instruction queue (from the instruction unit), FP registers, load buffers, store buffers, address unit, memory unit, reservation stations, FP adders, FP multipliers and the common data bus. Load-store operations and FP operations are shown separately.]
Tomasulo’s Algorithm
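• Load/store buffers:
– Hold the components of the effective address until it is computed
– Hold the destination memory address (= effective address)
– Hold the value (the data to be stored, or a loaded result waiting for the CDB)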
Tomasulo’s Algorithm
• Only three steps per instruction; each step can take an arbitrary number of cycles
– Issue:
• Get the next instruction from the FIFO instruction queue
• Search for a matching empty reservation station
– If found: issue the instruction with the operand values
– If not found: structural hazard -> the instruction stalls
– If operands are not in the registers: keep track of the functional units (reservation stations) that will produce them
Tomasulo’s Algorithm
– Execute:
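• If operands are not available: monitor the common data bus
• When all operands are available: execute
– Write result:
• Write the result on the CDB and from there into the registers and into any reservation stations (including store buffers) waiting for it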
Data fields for reservation stations
• Op: operation to perform on source operands S1 and S2
• Qj, Qk: reservation stations producing the source operands (empty if the value is already in Vj or Vk)
• Vj, Vk: value of each source operand
• A: holds information for the memory address calculation (immediate field, later the effective address)
• Busy: indicates that the reservation station and its functional unit are occupied
• Qi (register result status, one per register): the reservation station that will produce the value to be written into this register (empty if no pending instruction writes it)
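The Python sketch below models these fields under simplifying assumptions: values are plain floats, there is no cycle timing, and each reservation station has its own functional unit. `ReservationStation`, `issue` and `write_result` are hypothetical names chosen for illustration. At issue, the sketch reads the register result status Qi to decide whether each operand is a value (Vj/Vk) or a producer tag (Qj/Qk). The write-result step then broadcasts a (tag, value) pair, as the CDB would.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReservationStation:
    name: str
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[float] = None   # operand values, valid only when qj / qk is None
    vk: Optional[float] = None
    qj: Optional[str] = None     # name of the RS that will produce each operand
    qk: Optional[str] = None


regs = {"F2": 2.0, "F4": 4.0, "F6": 6.0}   # hypothetical register contents
qi = {}                                    # register result status: reg -> RS name


def read_operand(reg):
    """Return (value, None) if the register is ready, else (None, producing RS)."""
    if reg in qi:
        return None, qi[reg]
    return regs[reg], None


def issue(rs, op, dest, src1, src2):
    rs.busy, rs.op = True, op
    rs.vj, rs.qj = read_operand(src1)   # read the sources first ...
    rs.vk, rs.qk = read_operand(src2)
    qi[dest] = rs.name                  # ... then rename dest to this RS


def write_result(stations, tag, value):
    """Broadcast (tag, value) on the CDB to every waiting RS and register."""
    for rs in stations:
        if rs.qj == tag:
            rs.vj, rs.qj = value, None
        if rs.qk == tag:
            rs.vk, rs.qk = value, None
        if rs.name == tag:
            rs.busy = False             # the producing RS is released
    for reg in [r for r, producer in qi.items() if producer == tag]:
        regs[reg] = value               # only registers still waiting on this tag
        del qi[reg]


mult1, add1 = ReservationStation("Mult1"), ReservationStation("Add1")
issue(mult1, "MUL.D", "F0", "F2", "F4")   # both operands ready: vj=2.0, vk=4.0
issue(add1, "ADD.D", "F8", "F0", "F6")    # F0 pending: qj="Mult1", vk=6.0
write_result([mult1, add1], "Mult1", mult1.vj * mult1.vk)
print(add1)   # vj now holds 8.0 and qj is None
```

In this sketch, operands are read before Qi is updated for the destination. An instruction that reads and writes the same register therefore still waits for the older producer.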
The same example as for scoreboarding
L.D F6, 34(R2)
L.D F2, 45(R3)
MUL.D F0, F2, F4
SUB.D F8, F6, F2
DIV.D F10, F0, F6
ADD.D F6, F8, F2
The following slides are based on a lecture by Jelena Mirkovic,
University of Delaware, http://www.cis.udel.edu/~sunshine/courses/F04/CIS662/class12.pdf
Assumptions:
ADD and SUB take 2 clock cycles
MULT takes 10 clock cycles
DIV takes 40 clock cycles
2 Load/Store, 3 ADD and 2 Mult functional units/reservation stations
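Under the assumptions above, a small Python model can reproduce the issue, execute and write-result cycles shown in the tables. It is a sketch, not the slides' own method, and it makes these assumptions:
- Loads take 2 cycles (address calculation plus memory access, as the tables show).
- Execution starts the cycle after the last operand appears on the CDB.
- A single CDB carries one result per cycle.
- A reservation station is freed after its write-result cycle.

`schedule` and the constant tables are hypothetical names.

```python
from dataclasses import dataclass

# Assumptions from the slide: ADD/SUB 2 cycles, MULT 10, DIV 40; loads modeled
# as 2 cycles (address calculation + memory access), as in the tables.
LATENCY = {"L.D": 2, "ADD.D": 2, "SUB.D": 2, "MUL.D": 10, "DIV.D": 40}
RS_CLASS = {"L.D": "Load", "ADD.D": "Add", "SUB.D": "Add",
            "MUL.D": "Mult", "DIV.D": "Mult"}
RS_COUNT = {"Load": 2, "Add": 3, "Mult": 2}


@dataclass
class Timing:
    issue: int
    exec_start: int
    exec_end: int
    write: int


def schedule(program):
    """program: list of (op, dest, srcs) in program order -> list of Timing."""
    free_from = {c: [1] * n for c, n in RS_COUNT.items()}  # cycle each RS is free
    producer = {}      # register result status Qi: reg -> index of latest writer
    cdb_used = set()   # a single CDB carries one result per cycle
    timings, last_issue = [], 0
    for op, dest, srcs in program:
        stations = free_from[RS_CLASS[op]]
        k = min(range(len(stations)), key=stations.__getitem__)
        issue = max(last_issue + 1, stations[k])  # in order; stall if no free RS
        start = issue + 1
        for r in srcs:                            # wait for operands on the CDB
            if r in producer:
                start = max(start, timings[producer[r]].write + 1)
        end = start + LATENCY[op] - 1
        write = end + 1
        while write in cdb_used:                  # CDB conflict: wait a cycle
            write += 1
        cdb_used.add(write)
        stations[k] = write + 1                   # RS released after write result
        if dest is not None:
            producer[dest] = len(timings)         # rename dest to this RS
        timings.append(Timing(issue, start, end, write))
        last_issue = issue
    return timings


PROGRAM = [
    ("L.D",   "F6",  ("R2",)),
    ("L.D",   "F2",  ("R3",)),
    ("MUL.D", "F0",  ("F2", "F4")),
    ("SUB.D", "F8",  ("F6", "F2")),
    ("DIV.D", "F10", ("F0", "F6")),
    ("ADD.D", "F6",  ("F8", "F2")),
]

for (op, dest, _), t in zip(PROGRAM, schedule(PROGRAM)):
    print(f"{op:6} {dest:4} issue={t.issue:2} exec={t.exec_start}-{t.exec_end} "
          f"write={t.write}")
```

The model gives issue cycles 1 to 6 and write-result cycles 4, 5, 16, 8, 57 and 11, which agree with the Time= captions in the tables. For example, MUL.D executes in cycles 6 to 15, and DIV.D starts in cycle 17, once the multiply result has been broadcast.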
Instruction status
Instruction Issue Execute Write result
L.D F6, 34(R2) ✓
L.D F2, 45(R3)
MUL.D F0, F2, F4
SUB.D F8, F6, F2
DIV.D F10, F0, F6
ADD.D F6, F8, F2
Reservation station
Name Busy Op Vj Vk Qj Qk A
Load1 Yes Load Regs[R2] 34
Load2
Add1
Add2
Add3
Mult1
Mult2
Register result status
F0 F2 F4 F6 F8 F10 F12 / F30
Qi: F6 = Load1
Time=1 Issue first load
Instruction status
Instruction Issue Execute Write result
L.D F6, 34(R2) ✓ ✓
L.D F2, 45(R3) ✓
MUL.D F0, F2, F4
SUB.D F8, F6, F2
DIV.D F10, F0, F6
ADD.D F6, F8, F2
Reservation station
Name Busy Op Vj Vk Qj Qk A
Load1 Yes Load Regs[R2] +34
Load2 Yes Load Regs[R3] 45
Add1
Add2
Add3
Mult1
Mult2
Register result status
F0 F2 F4 F6 F8 F10 F12 / F30
Qi: F2 = Load2, F6 = Load1
Time=2 First load calc. address. Second load issued
Instruction status
Instruction Issue Execute Write result
L.D F6, 34(R2) ✓ ✓
L.D F2, 45(R3) ✓ ✓
MUL.D F0, F2, F4 ✓
SUB.D F8, F6, F2
DIV.D F10, F0, F6
ADD.D F6, F8, F2
Reservation station
Name Busy Op Vj Vk Qj Qk A
Load1 Yes Load Regs[R2]+34
Load2 Yes Load Regs[R3] +45
Add1
Add2
Add3
Mult1 Yes Mult Regs[F4] Load2
Mult2
Register result status
F0 F2 F4 F6 F8 F10 F12 / F30
Qi: F0 = Mult1, F2 = Load2, F6 = Load1
Time=3 First load read from mem. Second load calc address. Mult is issued
Instruction status
Instruction Issue Execute Write result
L.D F6, 34(R2) ✓ ✓ ✓
L.D F2, 45(R3) ✓ ✓
MUL.D F0, F2, F4 ✓
SUB.D F8, F6, F2 ✓
DIV.D F10, F0, F6
ADD.D F6, F8, F2
Reservation station
Name Busy Op Vj Vk Qj Qk A
Load1
Load2 Yes Load Regs[R3]+45
Add1 Yes Sub Mem[34+Regs[R2]] Load2
Add2
Add3
Mult1 Yes Mult Regs[F4] Load2
Mult2
Register result status
F0 F2 F4 F6 F8 F10 F12 / F30
Qi: F0 = Mult1, F2 = Load2, F8 = Add1
Time=4 First load write res. Second load read mem. Mult stalled, Sub issued
Instruction status
Instruction Issue Execute Write result
L.D F6, 34(R2) ✓ ✓ ✓
L.D F2, 45(R3) ✓ ✓ ✓
MUL.D F0, F2, F4 ✓
SUB.D F8, F6, F2 ✓
DIV.D F10, F0, F6 ✓
ADD.D F6, F8, F2
Reservation station
Name Busy Op Vj Vk Qj Qk A
Load1
Load2
Add1 Yes Sub Mem[34+Regs[R2]] Mem[45+Regs[R3]]
Add2
Add3
Mult1 Yes Mult Mem[45+Regs[R3]] Regs[F4]
Mult2 Yes Div Mem[34+Regs[R2]] Mult1
Register result status
F0 F2 F4 F6 F8 F10 F12 / F30
Qi: F0 = Mult1, F8 = Add1, F10 = Mult2
Time=5 Second load write res. Mult stalled, Sub stalled, Div. issued
Instruction status
Instruction Issue Execute Write result
L.D F6, 34(R2) ✓ ✓ ✓
L.D F2, 45(R3) ✓ ✓ ✓
MUL.D F0, F2, F4 ✓ ✓
SUB.D F8, F6, F2 ✓ ✓
DIV.D F10, F0, F6 ✓
ADD.D F6, F8, F2 ✓
Reservation station
Name Busy Op Vj Vk Qj Qk A
Load1
Load2
Add1 Yes Sub Mem[34+Regs[R2]] Mem[45+Regs[R3]]
Add2 Yes Add Mem[45+Regs[R3]] Add1
Add3
Mult1 Yes Mult Mem[45+Regs[R3]] Regs[F4]
Mult2 Yes Div Mem[34+Regs[R2]] Mult1
Register result status
F0 F2 F4 F6 F8 F10 F12 / F30
Qi: F0 = Mult1, F6 = Add2, F8 = Add1, F10 = Mult2
Time=6 Mult executes (1/10), Sub executes (1/2), Div. stalled, Add issued
Instruction status
Instruction Issue Execute Write result
L.D F6, 34(R2) ✓ ✓ ✓
L.D F2, 45(R3) ✓ ✓ ✓
MUL.D F0, F2, F4 ✓ ✓
SUB.D F8, F6, F2 ✓ ✓
DIV.D F10, F0, F6 ✓
ADD.D F6, F8, F2 ✓
Reservation station
Name Busy Op Vj Vk Qj Qk A
Load1
Load2
Add1 Yes Sub Mem[34+Regs[R2]] Mem[45+Regs[R3]]
Add2 Yes Add Mem[45+Regs[R3]] Add1
Add3
Mult1 Yes Mult Mem[45+Regs[R3]] Regs[F4]
Mult2 Yes Div Mem[34+Regs[R2]] Mult1
Register result status
F0 F2 F4 F6 F8 F10 F12 / F30
Qi: F0 = Mult1, F6 = Add2, F8 = Add1, F10 = Mult2
Time=7 Mult executes (2/10), Sub executes (2/2), Div. stalled, Add stalled
Instruction status
Instruction Issue Execute Write result
L.D F6, 34(R2) ✓ ✓ ✓
L.D F2, 45(R3) ✓ ✓ ✓
MUL.D F0, F2, F4 ✓ ✓
SUB.D F8, F6, F2 ✓ ✓ ✓
DIV.D F10, F0, F6 ✓
ADD.D F6, F8, F2 ✓
Reservation station
Name Busy Op Vj Vk Qj Qk A
Load1
Load2
Add1
Add2 Yes Add Mem[34+Regs[R2]]-Mem[45+Regs[R3]] Mem[45+Regs[R3]]
Add3
Mult1 Yes Mult Mem[45+Regs[R3]] Regs[F4]
Mult2 Yes Div Mem[34+Regs[R2]] Mult1
Register result status
F0 F2 F4 F6 F8 F10 F12 / F30
Qi: F0 = Mult1, F6 = Add2, F8 = Add1, F10 = Mult2
Time=8 Mult executes (3/10), Sub writes res., Div. stalled, Add stalled
Instruction status
Instruction Issue Execute Write result
L.D F6, 34(R2) ✓ ✓ ✓
L.D F2, 45(R3) ✓ ✓ ✓
MUL.D F0, F2, F4 ✓ ✓
SUB.D F8, F6, F2 ✓ ✓ ✓
DIV.D F10, F0, F6 ✓
ADD.D F6, F8, F2 ✓ ✓
Reservation station
Name Busy Op Vj Vk Qj Qk A
Load1
Load2
Add1
Add2 Yes Add Mem[34+Regs[R2]]-Mem[45+Regs[R3]] Mem[45+Regs[R3]]
Add3
Mult1 Yes Mult Mem[45+Regs[R3]] Regs[F4]
Mult2 Yes Div Mem[34+Regs[R2]] Mult1
Register result status
F0 F2 F4 F6 F8 F10 F12 / F30
Qi: F0 = Mult1, F6 = Add2, F10 = Mult2
Time=9 Mult executes (4/10), Div. stalled, Add executes (1/2)
Instruction status
Instruction Issue Execute Write result
L.D F6, 34(R2) ✓ ✓ ✓
L.D F2, 45(R3) ✓ ✓ ✓
MUL.D F0, F2, F4 ✓ ✓
SUB.D F8, F6, F2 ✓ ✓ ✓
DIV.D F10, F0, F6 ✓
ADD.D F6, F8, F2 ✓ ✓
Reservation station
Name Busy Op Vj Vk Qj Qk A
Load1
Load2
Add1
Add2 Yes Add Mem[34+Regs[R2]]-Mem[45+Regs[R3]] Mem[45+Regs[R3]]
Add3
Mult1 Yes Mult Mem[45+Regs[R3]] Regs[F4]
Mult2 Yes Div Mem[34+Regs[R2]] Mult1
Register result status
F0 F2 F4 F6 F8 F10 F12 / F30
Qi: F0 = Mult1, F6 = Add2, F10 = Mult2
Time=10 Mult executes (5/10), Div. stalled, Add executes (2/2)
Instruction status
Instruction Issue Execute Write result
L.D F6, 34(R2) ✓ ✓ ✓
L.D F2, 45(R3) ✓ ✓ ✓
MUL.D F0, F2, F4 ✓ ✓
SUB.D F8, F6, F2 ✓ ✓ ✓
DIV.D F10, F0, F6 ✓
ADD.D F6, F8, F2 ✓ ✓ ✓
Reservation station
Name Busy Op Vj Vk Qj Qk A
Load1
Load2
Add1
Add2
Add3
Mult1 Yes Mult Mem[45+Regs[R3]] Regs[F4]
Mult2 Yes Div Mem[34+Regs[R2]] Mult1
Register result status
F0 F2 F4 F6 F8 F10 F12 / F30
Qi: F0 = Mult1, F10 = Mult2
Time=11 Mult executes (6/10), Div. stalled, Add writes result
Instruction status
Instruction Issue Execute Write result
L.D F6, 34(R2) ✓ ✓ ✓
L.D F2, 45(R3) ✓ ✓ ✓
MUL.D F0, F2, F4 ✓ ✓ ✓
SUB.D F8, F6, F2 ✓ ✓ ✓
DIV.D F10, F0, F6 ✓
ADD.D F6, F8, F2 ✓ ✓ ✓
Reservation station
Name Busy Op Vj Vk Qj Qk A
Load1
Load2
Add1
Add2
Add3
Mult1
Mult2 Yes Div Mem[45+Regs[R3]]*Regs[F4] Mem[34+Regs[R2]]
Register result status
F0 F2 F4 F6 F8 F10 F12 / F30
Qi: F10 = Mult2
Time=16 Mult writes result, Div. stalled
Instruction status
Instruction Issue Execute Write result
L.D F6, 34(R2) ✓ ✓ ✓
L.D F2, 45(R3) ✓ ✓ ✓
MUL.D F0, F2, F4 ✓ ✓ ✓
SUB.D F8, F6, F2 ✓ ✓ ✓
DIV.D F10, F0, F6 ✓ ✓
ADD.D F6, F8, F2 ✓ ✓ ✓
Reservation station
Name Busy Op Vj Vk Qj Qk A
Load1
Load2
Add1
Add2
Add3
Mult1
Mult2 Yes Div Mem[45+Regs[R3]]*Regs[F4] Mem[34+Regs[R2]]
Register result status
F0 F2 F4 F6 F8 F10 F12 / F30
Qi: F10 = Mult2
Time=17 Div executes (1/40)
Instruction status
Instruction Issue Execute Write result
L.D F6, 34(R2) ✓ ✓ ✓
L.D F2, 45(R3) ✓ ✓ ✓
MUL.D F0, F2, F4 ✓ ✓ ✓
SUB.D F8, F6, F2 ✓ ✓ ✓
DIV.D F10, F0, F6 ✓ ✓ ✓
ADD.D F6, F8, F2 ✓ ✓ ✓
Reservation station
Name Busy Op Vj Vk Qj Qk A
Load1
Load2
Add1
Add2
Add3
Mult1
Mult2
Register result status
F0 F2 F4 F6 F8 F10 F12 / F30
Qi: all empty
Time=57 Div writes result
Some remarks
• To preserve exception behavior, no instruction is
allowed to initiate execution until all branches
preceding the instruction have completed
• Loads and stores can be executed in a different order if they access different addresses
– Not easy to verify, since 100(R3) can point to the same effective address as 0(R5)!
-> A load must wait for any uncompleted earlier stores to the same effective memory address
-> A store must wait until there are no unexecuted earlier loads/stores to the same memory address
Some remarks (II)
• Effective memory address calculations have to be executed in program order
• For a load operation:
– Calculate the effective memory address
– Check for conflicts with all active (= pending) store buffers
– If conflict: the load stalls
• Often done instead: bypass memory and forward the data from the store buffer directly to the load buffer
– Else: execute the load
• For a store operation:
– Similarly, check for conflicts with both active load and active store buffers
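The load and store rules above can be sketched in Python as follows. The sketch assumes that effective addresses have already been computed (in program order, as stated above) and are compared as plain integers. `StoreEntry`, `load_action` and `store_may_execute` are hypothetical names. The rules work like this:
- If the youngest older store to the same address has its data available, the load forwards that data.
- If that store's data is not yet available, the load stalls.
- Otherwise, the load goes to memory.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoreEntry:
    address: int                   # effective address (computed in program order)
    value: Optional[float] = None  # None until the data arrives on the CDB


def load_action(load_addr, older_stores):
    """older_stores: pending stores ahead of the load, oldest first."""
    for st in reversed(older_stores):      # the youngest matching store wins
        if st.address == load_addr:
            if st.value is None:
                return "stall", None        # conflict and data not yet available
            return "forward", st.value      # bypass memory from the store buffer
    return "memory", None                   # no conflict: execute the load


def store_may_execute(store_addr, older_load_addrs, older_stores):
    """A store waits while an older load or store to the same address is pending."""
    if store_addr in older_load_addrs:
        return False
    return all(st.address != store_addr for st in older_stores)


stores = [StoreEntry(100, 1.5), StoreEntry(200)]
print(load_action(100, stores))   # ('forward', 1.5)
print(load_action(200, stores))   # ('stall', None)
print(load_action(300, stores))   # ('memory', None)
```

Scanning from the youngest older store matters because the load must see the most recent value written to its address.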
A loop based example
Loop: LD F0, 0(R1)
MULTD F4, F0, F2
SD F4, 0(R1)
SUBI R1, R1, #8
BNEZ R1, Loop
• This time, assume a multiply takes 4 clocks
• Assume the 1st load takes 8 clocks in total (1 for the effective address + 7 for the memory access; L1 cache miss) and the 2nd load takes 1 clock (hit)
• For clarity, the clocks used by SUBI and BNEZ are indicated
– In reality, integer instructions run ahead of the floating-point instructions
• 2 iterations are shown
Slide based on a lecture by David A. Patterson,
University of California, Berkeley
http://www.cs.berkeley.edu/~pattrsn/252S01
Instruction status
Instruction Issue Execute done Write result done
L.D F0, 0(R1) 1
MUL.D F4, F0, F2
S.D F4, 0(R1)
L.D F0, 0(R1)
MUL.D F4, F0, F2
S.D F4, 0(R1)
Reservation station
Name Busy Op Vj Vk Qj Qk A
Load1 Yes Load Regs[R1] 0
Load2
Store1
Store2
Add1
Mult1
Mult2
Register result status
F0 F2 F4 F6 F8 F10 F12 / F30
Qi: F0 = Load1
Time=1 Issue first load
Instruction status
Instruction Issue Execute done Write result done
L.D F0, 0(R1) 1
MUL.D F4, F0, F2 2
S.D F4, 0(R1)
L.D F0, 0(R1)
MUL.D F4, F0, F2
S.D F4, 0(R1)
Reservation station
Name Busy Op Vj Vk Qj Qk A
Load1 Yes Load Regs[R1] +0
Load2
Store1
Store2
Add1
Mult1 Yes Mult Regs[F2] Load1
Mult2
Register result status
F0 F2 F4 F6 F8 F10 F12 / F30
Qi: F0 = Load1, F4 = Mult1
Time=2 first load effective address calc., Issue mult
Instruction status
Instruction Issue Execute done Write result done
L.D F0, 0(R1) 1
MUL.D F4, F0, F2 2
S.D F4, 0(R1) 3
L.D F0, 0(R1)
MUL.D F4, F0, F2
S.D F4, 0(R1)
Reservation station
Name Busy Op Vj Vk Qj Qk A
Load1 Yes Load Regs[R1]+0
Load2
Store1 Yes Store Regs[R1] Mult1 0
Store2
Add1
Mult1 Yes Mult Regs[F2] Load1
Mult2
Register result status
F0 F2 F4 F6 F8 F10 F12 / F30
Qi: F0 = Load1, F4 = Mult1
Time=3 first load mem. access (1/7), mult stalled, issue store
Instruction status
Instruction Issue Execute done Write result done
L.D F0, 0(R1) 1
MUL.D F4, F0, F2 2
S.D F4, 0(R1) 3
L.D F0, 0(R1)
MUL.D F4, F0, F2
S.D F4, 0(R1)
Reservation station
Name Busy Op Vj Vk Qj Qk A
Load1 Yes Load Regs[R1]+0
Load2
Store1 Yes Store Regs[R1] Mult1 +0
Store2
Add1
Mult1 Yes Mult Regs[F2] Load1
Mult2
Register result status
F0 F2 F4 F6 F8 F10 F12 / F30
Qi: F0 = Load1, F4 = Mult1
Time=4 first load exec (2/7), mult stalled, store eff. addr. calc., SUBI (not shown)
Instruction status
Instruction Issue Execute done Write result done
L.D F0, 0(R1) 1
MUL.D F4, F0, F2 2
S.D F4, 0(R1) 3
L.D F0, 0(R1)
MUL.D F4, F0, F2
S.D F4, 0(R1)
Reservation station
Name Busy Op Vj Vk Qj Qk A
Load1 Yes Load Regs[R1]+0
Load2
Store1 Yes Store Mult1 Regs[R1] +0
Store2
Add1
Mult1 Yes Mult Regs[F2] Load1
Mult2
Register result status
F0 F2 F4 F6 F8 F10 F12 / F30
Qi: F0 = Load1, F4 = Mult1
Time=5 first load exec (3/7), mult stalled, store stalled, BNEZ (not shown)
Instruction status
Instruction Issue Execute done Write result done
L.D F0, 0(R1) 1
MUL.D F4, F0, F2 2
S.D F4, 0(R1) 3
L.D F0, 0(R1) 6
MUL.D F4, F0, F2
S.D F4, 0(R1)
Reservation station
Name Busy Op Vj Vk Qj Qk A
Load1 Yes Load Regs[R1]+0
Load2 Yes Load Regs[R1] 0
Store1 Yes Store Mult1 Regs[R1]+0
Store2
Add1
Mult1 Yes Mult Regs[F2] Load1
Mult2
Register result status
F0 F2 F4 F6 F8 F10 F12 / F30
Qi: F0 = Load2, F4 = Mult1
Time=6 first load exec (4/7), mult stalled, store stalled, issue load2
Instruction status
Instruction Issue Execute done Write result done
L.D F0, 0(R1) 1
MUL.D F4, F0, F2 2
S.D F4, 0(R1) 3
L.D F0, 0(R1) 6
MUL.D F4, F0, F2 7
S.D F4, 0(R1)
Reservation station
Name Busy Op Vj Vk Qj Qk A
Load1 Yes Load Regs[R1]+0
Load2 Yes Load Regs[R1] +0
Store1 Yes Store Mult1 Regs[R1]+0
Store2
Add1
Mult1 Yes Mult Regs[F2] Load1
Mult2 Yes Mult Regs[F2] Load2
Register result status
F0 F2 F4 F6 F8 F10 F12 / F30
Qi: F0 = Load2, F4 = Mult2
Time=7 first load exec (5/7), mult stalled, store stalled, load2 eff. addr., issue mult2
Instruction status
Instruction Issue Execute done Write result done
L.D F0, 0(R1) 1
MUL.D F4, F0, F2 2
S.D F4, 0(R1) 3
L.D F0, 0(R1) 6
MUL.D F4, F0, F2 7
S.D F4, 0(R1) 8
Reservation station
Name Busy Op Vj Vk Qj Qk A
Load1 Yes Load Regs[R1]+0
Load2 Yes Load Regs[R1]+0
Store1 Yes Store Mult1 Regs[R1]+0
Store2 Yes Store Regs[R1] Mult2 0
Add1
Mult1 Yes Mult Regs[F2] Load1
Mult2 Yes Mult Regs[F2] Load2
Register result status
F0 F2 F4 F6 F8 F10 F12 / F30
Qi: F0 = Load2, F4 = Mult2
Time=8 first load exec (6/7); mult, store, mult2 stalled; load2 exec, issue store2
Instruction status
Instruction Issue Execute done Write result done
L.D F0, 0(R1) 1 9
MUL.D F4, F0, F2 2
S.D F4, 0(R1) 3
L.D F0, 0(R1) 6
MUL.D F4, F0, F2 7
S.D F4, 0(R1) 8
Reservation station
Name Busy Op Vj Vk Qj Qk A
Load1 Yes Load Regs[R1]+0
Load2 Yes Load Regs[R1]+0
Store1 Yes Store Mult1 Regs[R1]+0
Store2 Yes Store Regs[R1] Mult2 +0
Add1
Mult1 Yes Mult Regs[F2] Load1
Mult2 Yes Mult Regs[F2] Load2
Register result status
F0 F2 F4 F6 F8 F10 F12 / F30
Qi: F0 = Load2, F4 = Mult2
Time=9 first load exec (7/7); mult, store, mult2 stalled; load2 exec, store2
Instruction status
Instruction Issue Execute done Write result done
L.D F0, 0(R1) 1 9 10
MUL.D F4, F0, F2 2
S.D F4, 0(R1) 3
L.D F0, 0(R1) 6 10
MUL.D F4, F0, F2 7
S.D F4, 0(R1) 8
Reservation station
Name Busy Op Vj Vk Qj Qk A
Load1
Load2 Yes Load Regs[R1]+0
Store1 Yes Store Mult1 Regs[R1]+0
Store2 Yes Store Mult2 Regs[R1]+0
Add1
Mult1 Yes Mult Mem[Load1] Regs[F2] Load1
Mult2 Yes Mult Regs[F2] Load2
Register result status
F0 F2 F4 F6 F8 F10 F12 / F30
Qi: F0 = Load2, F4 = Mult2
Time=10 first load writes result; mult, store, mult2 stalled; load2 finishes; store2 stalled
Instruction status
Instruction Issue Execute done Write result done
L.D F0, 0(R1) 1 9 10
MUL.D F4, F0, F2 2
S.D F4, 0(R1) 3
L.D F0, 0(R1) 6 10 11
MUL.D F4, F0, F2 7
S.D F4, 0(R1) 8
Reservation station
Name Busy Op Vj Vk Qj Qk A
Load1
Load2
Store1 Yes Store Mult1 Regs[R1]+0
Store2 Yes Store Mult2 Regs[R1]+0
Add1
Mult1 Yes Mult Mem[Load1] Regs[F2] Load1
Mult2 Yes Mult Mem[Load2] Regs[F2] Load2
Register result status
F0 F2 F4 F6 F8 F10 F12 / F30
Qi: F4 = Mult2
Time=11 Load2 writes result, Mult1 (1/4); mult2, store1, store2 stalled
Instruction status
Instruction Issue Execute done Write result done
L.D F0, 0(R1) 1 9 10
MUL.D F4, F0, F2 2 14
S.D F4, 0(R1) 3
L.D F0, 0(R1) 6 10 11
MUL.D F4, F0, F2 7
S.D F4, 0(R1) 8
Reservation station
Name Busy Op Vj Vk Qj Qk A
Load1
Load2
Store1 Yes Store Mult1 Regs[R1]+0
Store2 Yes Store Mult2 Regs[R1]+0
Add1
Mult1 Yes Mult Mem[Load1] Regs[F2] Load1
Mult2 Yes Mult Mem[Load2] Regs[F2] Load2
Register result status
F0 F2 F4 F6 F8 F10 F12 / F30
Qi: F4 = Mult2
Time=14 Mult1 (4/4), Mult2 (3/4), store1, store2 stalled
Instruction status
Instruction Issue Execute done Write result done
L.D F0, 0(R1) 1 9 10
MUL.D F4, F0, F2 2 14 15
S.D F4, 0(R1) 3
L.D F0, 0(R1) 6 10 11
MUL.D F4, F0, F2 7 15
S.D F4, 0(R1) 8
Reservation station
Name Busy Op Vj Vk Qj Qk A
Load1
Load2
Store1 Yes Store Mult1 Regs[R1]+0
Store2 Yes Store Mult2 Regs[R1]+0
Add1
Mult1
Mult2 Yes Mult Mem[Load2] Regs[F2] Load2
Register result status
F0 F2 F4 F6 F8 F10 F12 / F30
Qi: F4 = Mult2
Time=15 Mult1 write res., Mult2 (4/4), store1 exec, store2 stalled
Instruction status
Instruction Issue Execute done Write result done
L.D F0, 0(R1) 1 9 10
MUL.D F4, F0, F2 2 14 15
S.D F4, 0(R1) 3
L.D F0, 0(R1) 6 10 11
MUL.D F4, F0, F2 7 15 16
S.D F4, 0(R1) 8
Reservation station
Name Busy Op Vj Vk Qj Qk A
Load1
Load2
Store1 Yes Store Mult1 Regs[R1]+0
Store2 Yes Store Mult2 Regs[R1]+0
Add1
Mult1
Mult2
Register result status
F0 F2 F4 F6 F8 F10 F12 / F30
Qi: all empty
Time=16 store1, store2 exec
Tomasulo’s Algorithm
• Please note:
– F0 never sees the data from the first load
– The register file is completely detached from the computation
– The first and second iterations overlap completely
– Assuming two Mult units/reservation stations, a third mult operation for the next iteration of the loop could not have been issued
-> no third store instruction could be issued either
• In-order issue, out-of-order execution, out-of-order completion
Slide based on a lecture by David A. Patterson,
University of California, Berkeley
http://www.cs.berkeley.edu/~pattrsn/252S01
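The first remark, that F0 never sees the data from the first load, follows from the write-result rule. A register is updated only if its Qi entry still names the broadcasting reservation station. The Python sketch below models that rule by replaying the issue order from the tables. SUBI and BNEZ are omitted, and the loaded values 3.0 and 5.0 are made up for illustration. `write_result` is a hypothetical name.

```python
def write_result(qi, regs, tag, value):
    """CDB write: update a register only if its Qi entry still names `tag`."""
    for reg in [r for r, producer in qi.items() if producer == tag]:
        regs[reg] = value
        del qi[reg]


regs = {"F0": 0.0, "F4": 0.0}   # hypothetical initial register contents
qi = {}                          # register result status: reg -> RS name

# Issue order from the tables, SUBI/BNEZ omitted. Each issue overwrites Qi.
# (At issue, MultK copies the tag LoadK into Qj and StoreK copies MultK into
# Qk; those reservation stations are not modeled here.)
for k in (1, 2):
    qi["F0"] = f"Load{k}"    # L.D   F0, 0(R1)
    qi["F4"] = f"Mult{k}"    # MUL.D F4, F0, F2

write_result(qi, regs, "Load1", 3.0)   # cycle 10: Load1 broadcasts
print(regs["F0"], qi["F0"])            # prints: 0.0 Load2
write_result(qi, regs, "Load2", 5.0)   # cycle 11: Load2 broadcasts
print(regs["F0"], qi.get("F0"))        # prints: 5.0 None
```

Mult1 still receives Load1's value, because it recorded the tag Load1 at issue. Only the register file ignores the value.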
Why can Tomasulo overlap iterations of loops?
• Register renaming
– Multiple iterations use different physical destinations for registers (dynamic loop unrolling)
• Reservation stations
– Permit instruction issue to advance past integer control flow operations
– Also buffer old values of registers, totally avoiding the WAR stalls seen with the scoreboard
• Another perspective: Tomasulo’s algorithm builds the data flow dependency graph on the fly
Slide based on a lecture by David A. Patterson,
University of California, Berkeley
http://www.cs.berkeley.edu/~pattrsn/252S01
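The data-flow view can be illustrated by recording an edge at issue time. The edge runs from the producing reservation station (the current Qi of each source register) to the consumer. The Python sketch below is illustrative only, under these assumptions:
- `build_dataflow` is a hypothetical name.
- SUBI is omitted, so R1 and F2 are treated as ready.
- No producer finishes before its consumers issue, which holds for the two iterations in the tables.

```python
def build_dataflow(program):
    """program: (tag, dest, srcs) in issue order -> list of (producer, consumer, reg)."""
    qi = {}        # register result status: reg -> tag of the latest producer
    edges = []
    for tag, dest, srcs in program:
        for reg in srcs:
            if reg in qi:                  # operand will come from another RS
                edges.append((qi[reg], tag, reg))
        if dest is not None:
            qi[dest] = tag                 # rename: later readers wait on this tag
    return edges


# The two loop iterations, using the RS names from the tables.
LOOP = [
    ("Load1",  "F0", ("R1",)),
    ("Mult1",  "F4", ("F0", "F2")),
    ("Store1", None, ("F4", "R1")),
    ("Load2",  "F0", ("R1",)),
    ("Mult2",  "F4", ("F0", "F2")),
    ("Store2", None, ("F4", "R1")),
]

for producer, consumer, reg in build_dataflow(LOOP):
    print(f"{producer} -> {consumer} via {reg}")
```

The result is two disjoint chains, Load1 -> Mult1 -> Store1 and Load2 -> Mult2 -> Store2, so nothing forces the second iteration to wait for the first. Renaming F0 and F4 to different tags in each iteration is what keeps the chains apart.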