CS/ECE 552: Pipelining (Part 3)
Prof. Matthew D. Sinclair
Lecture notes based in part on slides created by Mikko Lipasti, Mark Hill, Josh San Miguel, and John Shen
Announcements 2/20
• Project Design Review Monday 2/24
  – My office, 6369 CS
• Midterm coming up next week (3/5 in class)
  – Closed book, one double-sided hand-written cheat sheet
  – Calculators allowed
  – MIPS green cards provided
  – Covers Weeks 1 through 6
  – Will post additional Midterm Details today under Week 7 on Canvas
• HW3 Posted Tomorrow, Due 2/28
• Project Phase 1 due 3/13
• HW1 Grades Released
• HW2 Canvas Submission – per group
Announcements 2/25
• Midterm coming up next week (3/5 in class)
  – Closed book, one double-sided hand-written cheat sheet
  – Calculators allowed
  – MIPS green cards provided
  – Covers Weeks 1 through 6
  – Posted additional Midterm Details
• Practice Exams posted on Canvas Week 7
• Link to Course Website with topics that will be covered
  – Next Tuesday: exam review – bring questions!
• HW3 Posted Friday, Due 2/28
• Project Phase 1 due 3/13
• HW1 Grades Released
  – Expectations for your homework and project submissions
• HW2 Grading In Progress
Today’s Learning Objectives
• Analyze how branches impact the performance of pipelined programs
• Identify branch delay slots, and revise code to utilize them
• Demonstrate how branches require additional forwarding
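Data Hazards?
• Pipelining, without forwarding:
  – assume RF bypassing
  – average CPI = 1 + (1 × 20%) + (2 × 50%) = 2.2
  – 2.2 × 350 ps = 770 ps per instruction
Stage latencies (IF, ID, EX, MEM, WB): 250 ps, 150 ps, 100 ps, 350 ps, 150 ps
Instructions in flight (IF to WB): I5 I4 I3 I2 I1
• RAW (20%): 1 stall cycle
• RAW (50%): 2 stall cycles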
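Data Hazards?
• Pipelining, with forwarding:
  – assume RF bypassing
  – average CPI = 1 + (1 × 25%) = 1.25
  – 1.25 × 350 ps = 437.5 ps per instruction
Stage latencies (IF, ID, EX, MEM, WB): 250 ps, 150 ps, 100 ps, 350 ps, 150 ps
Instructions in flight (IF to WB): I5 I4 I3 I2 I1
• load-to-use (25%): 1 stall cycle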
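The two CPI calculations above follow one pattern: start from the ideal CPI of 1 and add each hazard's stall cycles weighted by how often it occurs. A minimal Python sketch of that arithmetic, assuming the 350 ps clock period implied by the per-instruction times on the slides (the function names are illustrative and not part of the course materials):

```python
CLOCK_PS = 350  # assumed clock period: the slowest stage latency on the slide


def average_cpi(hazards):
    """Ideal CPI of 1 plus frequency-weighted stall cycles.

    hazards is a list of (stall_cycles, fraction_of_instructions) pairs.
    """
    return 1 + sum(stalls * fraction for stalls, fraction in hazards)


def ps_per_instruction(cpi, clock_ps=CLOCK_PS):
    """Average time per instruction in picoseconds."""
    return cpi * clock_ps


# Without forwarding: 20% of instructions stall 1 cycle, 50% stall 2 cycles.
cpi_no_fwd = average_cpi([(1, 0.20), (2, 0.50)])
# With forwarding: only load-to-use (25%) still stalls, for 1 cycle.
cpi_fwd = average_cpi([(1, 0.25)])

print(f"no forwarding:   CPI {cpi_no_fwd:.2f}, {ps_per_instruction(cpi_no_fwd):.1f} ps")
print(f"with forwarding: CPI {cpi_fwd:.2f}, {ps_per_instruction(cpi_fwd):.1f} ps")
```

Changing the hazard list is enough to try other mixes, for example a workload with fewer load-to-use dependences.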
Control Dependences
• Conditional branches (e.g., beq, bne):
  – Branch must execute to determine which instruction to fetch next; subsequent instructions are control-dependent on the branch instruction
  – COD Figure 4.65, branches resolved in ID stage:
[Figure: COD Figure 4.65, with the branch target and condition logic labeled]
Control Dependences
      beq $s1, $s2, SKIP
      add $s4, $s5, $s6
      ...
SKIP: sub $s4, $s5, $s6
With predict-not-taken (flush otherwise); "=" marks the flushed instruction:
insn\cycle   1   2   3   4   5   6   7   8   9   10  11  12  13
beq          F   D   X   M   W
add              F   =
sub                  F   D   X   M   W
Control Dependences
[Figure: COD Figure 4.65, annotated:]
• Set PC to IF/ID.BranchAddr
• Set IF/ID.Instruction to 0x00000000 (sll $0, $0, 0)
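The two annotations describe what the hardware does when a branch in ID turns out to be taken under predict-not-taken. A minimal Python sketch of that update, assuming 4-byte instructions and a not-taken path that simply fetches PC + 4 (the function and parameter names are hypothetical, and the instruction word in the usage lines is arbitrary):

```python
NOP = 0x00000000  # encodes sll $0, $0, 0


def next_fetch_state(pc, fetched_insn, branch_taken, branch_addr):
    """Return (next PC, next IF/ID.Instruction) for a branch resolved in ID.

    pc           -- address of the instruction being fetched this cycle
    fetched_insn -- the word fetched this cycle (the predict-not-taken guess)
    branch_addr  -- target computed for the branch sitting in ID
    """
    if branch_taken:
        # Flush: redirect fetch and overwrite the wrongly fetched word.
        return branch_addr, NOP
    # The guess was right: keep the fetched word and fall through.
    return pc + 4, fetched_insn


taken = next_fetch_state(0x1004, 0x02B6A020, True, 0x1040)
not_taken = next_fetch_state(0x1004, 0x02B6A020, False, 0x1040)
print([hex(v) for v in taken], [hex(v) for v in not_taken])
```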
Control Hazards?
• Pipelining, with predict-not-taken:
  – assume branches resolved in ID, flush if branch taken
  – average CPI = 1 + (1 × 60%) = 1.6
  – 1.6 × 350 ps = 560 ps per instruction
Stage latencies (IF, ID, EX, MEM, WB): 250 ps, 150 ps, 100 ps, 350 ps, 150 ps
Instructions in flight (IF to WB): I5 I4 I3 I2 I1
• 60% branches taken
Control Hazards?
• Pipelining, with dynamic branch prediction:
  – assume branches resolved in ID, flush if branch mispredicted
  – average CPI = 1 + (1 × 10%) = 1.1
  – 1.1 × 350 ps = 385 ps per instruction
Stage latencies (IF, ID, EX, MEM, WB): 250 ps, 150 ps, 100 ps, 350 ps, 150 ps
Instructions in flight (IF to WB): I5 I4 I3 I2 I1
• 90% branches predicted correctly
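Both control-hazard results come from the same one-cycle flush penalty applied at different rates. The sketch below reproduces them and compares the two schemes. Like the slides, it applies the rate to every instruction instead of scaling by the fraction of instructions that are branches; the 350 ps clock and all names are assumptions for illustration.

```python
CLOCK_PS = 350      # assumed clock period (slowest stage on the slide)
FLUSH_PENALTY = 1   # one flushed instruction when branches resolve in ID


def control_hazard_cpi(flush_rate, penalty=FLUSH_PENALTY):
    """Ideal CPI of 1 plus the flush penalty weighted by how often it occurs."""
    return 1 + penalty * flush_rate


not_taken_cpi = control_hazard_cpi(0.60)    # flush whenever the branch is taken
dynamic_cpi = control_hazard_cpi(1 - 0.90)  # flush only on a misprediction

# Same clock in both cases, so the speedup is just the CPI ratio.
speedup = (not_taken_cpi * CLOCK_PS) / (dynamic_cpi * CLOCK_PS)

print(f"predict-not-taken:  {not_taken_cpi * CLOCK_PS:.1f} ps per instruction")
print(f"dynamic prediction: {dynamic_cpi * CLOCK_PS:.1f} ps per instruction")
print(f"speedup from dynamic prediction: {speedup:.3f}x")
```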
…But RAW Hazard at ID?
[Figure: COD Figure 4.65, with the branch condition logic in ID labeled]
…But RAW Hazard at ID?
      add $s1, $s2, $s3
      beq $s1, $s4, SKIP
With no forwarding to branch decision circuit in ID (assume RF bypassing):
insn\cycle   1   2   3   4   5   6   7   8   9   10  11  12  13
add          F   D   X   M   W
beq              F   *   *   D   X   M   W
…But RAW Hazard at ID?
      add $s1, $s2, $s3
      beq $s1, $s4, SKIP
With forwarding to branch decision circuit in ID (assume RF bypassing):
insn\cycle   1   2   3   4   5   6   7   8   9   10  11  12  13
add          F   D   X   M   W
beq              F   *   D   X   M   W
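The stall counts in the two diagrams (two cycles without forwarding, one with) can be derived from when the add's result becomes visible to the branch logic in ID. The sketch below is a simplified model under stated assumptions: a five-stage F D X M W pipeline, an ALU producer instead of a load, RF bypassing, and only the two instructions of interest in flight. The function name and parameters are hypothetical.

```python
def branch_stalls(producer_fetch, branch_fetch, forwarding_to_id):
    """Stall cycles for a branch in ID that reads an ALU result.

    Cycle numbers are 1-based fetch cycles, matching the diagrams.
    """
    producer_x = producer_fetch + 2
    producer_w = producer_fetch + 4
    if forwarding_to_id:
        # The result leaves the ALU at the end of X; ID can use it next cycle.
        value_ready = producer_x + 1
    else:
        # With RF bypassing only, the branch can decode in the producer's W cycle.
        value_ready = producer_w
    unstalled_decode = branch_fetch + 1
    decode = max(unstalled_decode, value_ready)
    return decode - unstalled_decode


# add fetched in cycle 1, beq fetched in cycle 2 (back to back).
print(branch_stalls(1, 2, forwarding_to_id=False))
print(branch_stalls(1, 2, forwarding_to_id=True))
```

Moving the branch further from its producer (a larger `branch_fetch`) shrinks the stall count toward zero, which is what a compiler achieves by scheduling independent instructions between the two.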
CS/ECE 552: Pipelining (Part 4)
Prof. Matthew D. Sinclair
Lecture notes based in part on slides created by Mikko Lipasti, Mark Hill, Josh San Miguel, and John Shen
Pipeline Diagrams
      beq $s1, $s2, DEST
      add $s4, $s5, $s6
      ...
DEST: sub $s4, $s5, $s6
With predict-not-taken:
insn\cycle   1   2   3   4   5   6   7   8   9   10  11  12  13
beq          F   D   X   M   W
add              F   =
DEST                 F   D   X   M   W
Equivalently, the flushed add continues down the pipeline as a NOP:
insn\cycle   1   2   3   4   5   6   7   8   9   10  11  12  13
beq          F   D   X   M   W
add (NOP)        F   D   X   M   W
DEST                 F   D   X   M   W
Control Hazards
Stage occupancy by cycle for the diagram above (only the three instructions shown in the diagram are listed):
cycle   IF      ID          EX          MEM         WB
2       add     beq
3       DEST    add (NOP)   beq
4               DEST        add (NOP)   beq
5                           DEST        add (NOP)   beq
6                                       DEST        add (NOP)
Branch Delay Slots
      beq $s1, $s2, DEST
      add $s4, $s5, $s6    # branch delay slot
      ...
DEST: sub $s4, $s5, $s6
With one branch delay slot:
insn\cycle   1   2   3   4   5   6   7   8   9   10  11  12  13
beq          F   D   X   M   W
add              F   D   X   M   W
sub                  F   D   X   M   W
Branch Delay Slots
      beq $s1, $s2, DEST
      sll $0, $0, 0        # branch delay slot
      add $s4, $s5, $s6
      ...
DEST: sub $s4, $s5, $s6
With one branch delay slot:
insn\cycle   1   2   3   4   5   6   7   8   9   10  11  12  13
beq          F   D   X   M   W
NOP              F   D   X   M   W
sub                  F   D   X   M   W
Branch Delay Slots
      sub $s5, $s6, $s7
      add $s1, $s2, $s3
      beq $s1, $s4, T1
      sll $0, $0, 0        # branch delay slot
      add $s4, $s5, $s6
      j T2
      …
T1:   or $s4, $s4, $s7
      slt $s2, $s4, $s6
T2:   and $s2, $s2, $s3
Which instructions can legally fill the branch delay slot in place of the NOP?
• sub $s5, $s6, $s7 (from before the branch) – legal? yes: neither the add nor the beq reads or writes $s5, and the sub executes on both paths anyway
• add $s1, $s2, $s3 (from before the branch) – legal? no: beq reads $s1, so the add must execute before the branch
• add $s4, $s5, $s6 (from the fall-through path) – legal? no: on the taken path, the or at T1 reads $s4, which the add would have overwritten
• or $s4, $s4, $s7 (from the branch target) – legal? yes: on the fall-through path, add $s4, $s5, $s6 overwrites $s4 without reading it
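The four answers above reduce to two register-dependence checks. An instruction taken from before the branch must not conflict with anything it is moved past. An instruction taken from one path must not write a register that the other path still reads. The Python sketch below encodes the listing by hand and applies those checks. It is a simplified illustration, not a complete scheduler: it ignores memory dependences, assumes the hoisted instruction is the first one on its path, and models only the part of the fall-through path that the listing shows. All names are hypothetical.

```python
from collections import namedtuple

Insn = namedtuple("Insn", "text dest srcs")

BEFORE = [
    Insn("sub $s5, $s6, $s7", "$s5", ("$s6", "$s7")),
    Insn("add $s1, $s2, $s3", "$s1", ("$s2", "$s3")),
]
BRANCH = Insn("beq $s1, $s4, T1", None, ("$s1", "$s4"))
FALL_THROUGH = [Insn("add $s4, $s5, $s6", "$s4", ("$s5", "$s6"))]
TAKEN = [
    Insn("or $s4, $s4, $s7", "$s4", ("$s4", "$s7")),
    Insn("slt $s2, $s4, $s6", "$s2", ("$s4", "$s6")),
    Insn("and $s2, $s2, $s3", "$s2", ("$s2", "$s3")),
]


def conflicts(moved, other):
    """True if moving `moved` to after `other` changes a value (RAW, WAR, WAW)."""
    return (
        moved.dest in other.srcs       # other needs the value moved produces
        or other.dest in moved.srcs    # other overwrites a value moved reads
        or (other.dest is not None and other.dest == moved.dest)
    )


def legal_from_before(index):
    """Can BEFORE[index] sink past the later instructions and the branch?"""
    moved = BEFORE[index]
    passed = BEFORE[index + 1:] + [BRANCH]
    return not any(conflicts(moved, other) for other in passed)


def read_before_written(reg, path):
    """True if `path` reads `reg` before overwriting it."""
    for insn in path:
        if reg in insn.srcs:
            return True
        if insn.dest == reg:
            return False
    return True  # code beyond the listing is unknown, so stay conservative


def legal_from_other_path(insn, opposite_path):
    """A hoisted instruction must not clobber a value the other path reads."""
    return not read_before_written(insn.dest, opposite_path)


print("sub from before:      ", legal_from_before(0))
print("add $s1 from before:  ", legal_from_before(1))
print("add $s4 fall-through: ", legal_from_other_path(FALL_THROUGH[0], TAKEN))
print("or from target:       ", legal_from_other_path(TAKEN[0], FALL_THROUGH))
```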
Branch Delay Slots
      jal FUNC
      sll $0, $0, 0        # branch delay slot
      add $s4, $s5, $s6
      ...
FUNC:
      or $s4, $s5, $s6
      jr $ra
      sll $0, $0, 0        # branch delay slot
BACKUP
Why Pipelining?
[Figure sequence: instructions I1 through I8 advancing through the datapath]
Why Pipelining?
• Single-cycle:
  – clock period = 250 + 150 + 100 + 350 + 150 ps = 1 ns
  – CPI = 1
  – 1 ns per instruction
Stage latencies (IF, ID, EX, MEM, WB): 250 ps, 150 ps, 100 ps, 350 ps, 150 ps
Why Pipelining?
• Pipelining:
  – clock period = max{IF, ID, EX, MEM, WB} = 350 ps
  – individual CPI = 5, average CPI = (#insns + 4) / #insns ≈ 1
  – 350 ps per instruction
Stage latencies (IF, ID, EX, MEM, WB): 250 ps, 150 ps, 100 ps, 350 ps, 150 ps
Instructions in flight (IF to WB): I5 I4 I3 I2 I1
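The single-cycle and pipelined numbers can be compared directly for a given instruction count. The sketch below uses the stage latencies from the slide and assumes an ideal pipeline with no stalls or flushes; the function names are illustrative only.

```python
STAGE_PS = {"IF": 250, "ID": 150, "EX": 100, "MEM": 350, "WB": 150}


def single_cycle_time_ps(n_insns):
    """Each instruction takes one long cycle covering all five stages."""
    return n_insns * sum(STAGE_PS.values())


def pipelined_cycles(n_insns):
    """The first instruction needs 5 cycles; each later one finishes 1 cycle later."""
    return n_insns + len(STAGE_PS) - 1


def pipelined_time_ps(n_insns):
    """The clock period is set by the slowest stage."""
    return pipelined_cycles(n_insns) * max(STAGE_PS.values())


for n in (5, 1000):
    avg_cpi = pipelined_cycles(n) / n
    speedup = single_cycle_time_ps(n) / pipelined_time_ps(n)
    print(f"{n} insns: average CPI {avg_cpi:.3f}, speedup {speedup:.2f}x")
```

As the instruction count grows, the average CPI approaches 1 and the speedup approaches 1000 ps / 350 ps. It stays below the stage count of 5 because the stages are unbalanced.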
Pipeline Diagrams
      lw $s1, 0($s2)
      lw $s3, 4($s1)
      add $s5, $s4, $s3
Assume full forwarding and bypassing ("*" marks a stalled cycle; each (NOP) row is the bubble a stall inserts):
insn\cycle   1   2   3   4   5   6   7   8   9   10  11  12  13
lw $s1       F   D   X   M   W
(NOP)                    X   M   W
lw $s3           F   D*  D   X   M   W
(NOP)                            X   M   W
add                  F*  F   D*  D   X   M   W
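The stall pattern in this diagram can be generated mechanically. The sketch below is a simplified in-order scheduling model, not the course's reference solution. It assumes a five-stage pipeline with full forwarding, in which a consumer can enter X one cycle after an ALU producer's X stage and two cycles after a load's X stage (one cycle after its M stage). The instruction encoding and function name are made up for illustration.

```python
from collections import namedtuple

Insn = namedtuple("Insn", "name op dest srcs")

PROGRAM = [
    Insn("lw $s1", "lw", "$s1", ("$s2",)),
    Insn("lw $s3", "lw", "$s3", ("$s1",)),
    Insn("add", "add", "$s5", ("$s4", "$s3")),
]


def schedule(program):
    """Return one list of per-cycle cells per instruction (index 0 is cycle 1)."""
    rows = []
    last_writer = {}   # register -> (op, cycle of that producer's X stage)
    prev_d_start = 1   # seed value so the first fetch lands in cycle 1
    prev_x = 0
    for insn in program:
        f_start = prev_d_start              # IF frees up when the previous insn enters ID
        d_start = max(f_start + 1, prev_x)  # ID frees up when the previous insn enters X
        x = d_start + 1
        for src in insn.srcs:
            if src in last_writer:
                op, producer_x = last_writer[src]
                # A load's value exists after M; an ALU result after X.
                ready = producer_x + (2 if op == "lw" else 1)
                x = max(x, ready)
        cells = [""] * (f_start - 1)
        cells += ["F*"] * (d_start - f_start - 1) + ["F"]
        cells += ["D*"] * (x - d_start - 1) + ["D"]
        cells += ["X", "M", "W"]
        rows.append(cells)
        if insn.dest:
            last_writer[insn.dest] = (insn.op, x)
        prev_d_start, prev_x = d_start, x
    return rows


for insn, cells in zip(PROGRAM, schedule(PROGRAM)):
    print(f"{insn.name:<8}" + "".join(f"{cell:<4}" for cell in cells))
```

The printed rows use the same F, F*, D, D* notation as the diagram, so the output can be checked against it column by column.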
Data Hazards
Stage occupancy by cycle for the diagram above (only the three instructions shown and the inserted NOPs are listed):
cycle   IF      ID          EX          MEM         WB
3       add     lw $s3      lw $s1
4       add     lw $s3      NOP         lw $s1
5               add         lw $s3      NOP         lw $s1
6               add         NOP         lw $s3      NOP
7                           add         NOP         lw $s3
8                                       add         NOP
MEM-to-EX Forwarding
      lw $s1, 0($s2)
      lw $s3, 4($s1)
      add $s5, $s4, $s3
Assume full forwarding and bypassing:
insn\cycle   1   2   3   4   5   6   7   8   9   10  11  12  13
lw $s1       F   D   X   M   W
(NOP)                    X   M   W
lw $s3           F   D*  D   X   M   W
(NOP)                            X   M   W
add                  F*  F   D*  D   X   M   W
MEM-to-EX Forwarding
[Figure: COD Figure 4.65]
Why not this?
[Figure: COD Figure 4.65]
How about this?
[Figure: COD Figure 4.65]