![Page 1: Pipelining - sites.fas.harvard.edusites.fas.harvard.edu/~cscie287/spring2019/slides/Pipelining.pdf · Pipelining •Decreases the average number of cycles per instruction (CPI) –](https://reader030.vdocuments.mx/reader030/viewer/2022011809/5d40dd3a88c9938c3f8d199d/html5/thumbnails/1.jpg)
Pipelining
Prof. James L. Frankel
Harvard University
Version of 2:05 AM 2-Apr-2019
Copyright © 2019 James L. Frankel. All rights reserved.
Single-Cycle CPU Implementation
• The clock cycle must be long enough to accommodate the longest instruction
• There is no overlapping of the execution of instructions
• With time-consuming instructions (floating point, memory access), the penalty is high
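As a rough illustration of the penalty, the sketch below picks the clock period for a single-cycle design and reports how long each instruction class sits idle. The latencies in `LATENCY_PS` are hypothetical numbers chosen for the example, not values from the slides.

```python
# Hypothetical latencies in picoseconds for a few instruction classes.
# These numbers are illustrative assumptions, not values from the slides.
LATENCY_PS = {"add": 600, "beq": 500, "sw": 700, "lw": 800}


def single_cycle_clock_period(latencies):
    """The single-cycle clock must be long enough for the slowest instruction."""
    return max(latencies.values())


def idle_time(latencies):
    """Time each instruction class spends waiting for the next clock edge."""
    period = single_cycle_clock_period(latencies)
    return {name: period - t for name, t in latencies.items()}


period = single_cycle_clock_period(LATENCY_PS)
wasted = idle_time(LATENCY_PS)
print(f"clock period: {period} ps")
for name, t in wasted.items():
    print(f"{name}: idle for {t} ps of every cycle")
```

Under these assumed numbers every instruction pays for the slowest one, which is the cost the bullets above describe.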
Pipelining
• Decreases the average number of cycles per instruction (CPI): throughput improves
• Does not decrease the time for any single instruction: latency is unchanged
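The two bullets can be checked with a small calculation. This is a minimal sketch that assumes an ideal pipeline (equal stage times, no stalls) and compares it with a baseline in which each instruction passes through every stage before the next one starts. The stage count and the 200 ps stage time are hypothetical.

```python
def unpipelined_time(n_instructions, n_stages, stage_time):
    """Each instruction runs through every stage before the next one starts."""
    return n_instructions * n_stages * stage_time


def pipelined_time(n_instructions, n_stages, stage_time):
    """The first instruction needs n_stages cycles; each later one
    completes one cycle after its predecessor (ideal case, no stalls)."""
    return (n_stages + n_instructions - 1) * stage_time


def latency(n_stages, stage_time):
    """Time for one instruction to pass through all stages."""
    return n_stages * stage_time


N_STAGES, STAGE_PS = 5, 200  # hypothetical values
for n in (1, 3, 1000):
    slow = unpipelined_time(n, N_STAGES, STAGE_PS)
    fast = pipelined_time(n, N_STAGES, STAGE_PS)
    print(f"{n} instructions: {slow} ps unpipelined, {fast} ps pipelined, "
          f"latency per instruction {latency(N_STAGES, STAGE_PS)} ps")
```

The per-instruction latency is the same in both cases; only the total time for a stream of instructions shrinks.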
Memory Organization
• We will assume that there are “two” memories: one for instructions and one for data
• This is not really the case
• We will have an instruction cache and a data cache
  • These will create the illusion that we have two separate memories
Instruction Set Design for Pipelining
• All instructions are the same length
  • x86 ISA has 1 to 15 bytes per instruction
• Small number of instruction formats
  • Operands can be fetched from registers before determining the instruction type
• Load/store architecture means that the EX (ALU) stage can compute the memory address
  • If operands could come from memory, then the memory access would need to precede the EX stage
• All memory accesses are aligned
  • Only a single memory access is required
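A short sketch of why fixed-length instructions with few formats help: every field sits at a fixed bit position, so the register numbers can be extracted before the instruction type is known. The bit ranges [25..21], [20..16], [15..11], [15..0] and [25..0] are the ones labeled on the datapath figures. The opcode position [31..26], the example encoding, and the function names are assumptions based on the standard MIPS format.

```python
def decode_fields(word):
    """Split a 32-bit instruction word into its fixed bit fields."""
    return {
        "opcode": (word >> 26) & 0x3F,   # bits [31..26] (assumed MIPS layout)
        "rs": (word >> 21) & 0x1F,       # bits [25..21]
        "rt": (word >> 16) & 0x1F,       # bits [20..16]
        "rd": (word >> 11) & 0x1F,       # bits [15..11]
        "imm16": word & 0xFFFF,          # bits [15..0]
        "target26": word & 0x3FFFFFF,    # bits [25..0]
    }


def sign_extend16(value):
    """Interpret a 16-bit field as a signed two's-complement number."""
    return value - 0x10000 if value & 0x8000 else value


# lw $1, 100($0) in the standard MIPS encoding (assumed for illustration).
fields = decode_fields(0x8C010064)
print(fields["rs"], fields["rt"], sign_extend16(fields["imm16"]))
```

All fields are extracted unconditionally; the control logic decides later which of them the instruction actually uses.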
Data Path Split into Pipeline Stages
[Figure: the datapath divided into five pipeline stages: IF (Instruction Fetch), ID (Instruction Decode/Register File Read), EX (Execute/Address Calculation), MEM (Memory Access), and WB (Write Back)]
Right to Left Data Paths are Problematic
[Figure: five-stage datapath with stages IF, ID, EX, MEM, and WB]
Pipelining Three lw Instructions
Instruction Execution Order
lw $1, 100($0)    IM   Reg  ALU  DM   Reg
lw $2, 200($0)         IM   Reg  ALU  DM   Reg
lw $3, 300($0)              IM   Reg  ALU  DM   Reg
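A diagram of this kind can be generated mechanically: each instruction uses the same five stages, shifted one clock cycle to the right of the instruction before it. The sketch below is a minimal text renderer; the function name and column width are arbitrary choices for illustration, and it assumes no stalls.

```python
STAGES = ["IM", "Reg", "ALU", "DM", "Reg"]
CELL = 5  # characters per clock-cycle column


def pipeline_diagram(instructions):
    """Return text rows; instruction i (0-based) starts in clock cycle i + 1."""
    label_width = max(len(text) for text in instructions) + 2
    rows = []
    for i, text in enumerate(instructions):
        stages = "".join(stage.ljust(CELL) for stage in STAGES)
        # Shift each instruction right by one column per earlier instruction.
        rows.append((text.ljust(label_width) + " " * (CELL * i) + stages).rstrip())
    return rows


rows = pipeline_diagram(["lw $1, 100($0)", "lw $2, 200($0)", "lw $3, 300($0)"])
print("\n".join(rows))
```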
Staging the Pipelined Data
• Add pipeline registers between stages
• Allows the pipeline stages to run independently within the same clock cycle
[Figure: the five-stage datapath with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB inserted between the stages]
lw Instruction in IF Pipeline Stage
[Figure: pipelined datapath (pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB) with the IF-stage components highlighted: Mux, PC (LoadPC), Adder, and instruction memory (InsMemAddr, InsOut)]
ID (Instruction Decode) Stage
• No need to keep the IR as a register because the current instruction is stored in the IF/ID pipeline register
lw Instruction in ID Pipeline Stage
[Figure: pipelined datapath (pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB) with the ID-stage components highlighted: RegArray (Reg1#, Reg2#, RegW#, DataIn, Reg1, Reg2, RegReadOrWrite) and Sign extend]
lw Instruction in EX Pipeline Stage
[Figure: pipelined datapath (pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB) with the EX-stage components highlighted: Mux and ALU]
lw Instruction in MEM Pipeline Stage
[Figure: pipelined datapath (pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB) with the MEM-stage components highlighted: data memory (Addr, WriteData, ReadData)]
lw Instruction in WB Pipeline Stage
[Figure: pipelined datapath (pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB) with the WB-stage components highlighted: Mux and RegArray (Reg1#, Reg2#, RegW#, DataIn, Reg1, Reg2, RegReadOrWrite)]
Problem with lw in WB Pipeline Stage
• Unfortunately, the incorrect register is written in the WB stage
• The Regw# is coming from the wrong pipeline stage!
• Let’s correct the problem by keeping the Regw# with the value that will be written to that register
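The effect of the fix can be sketched with a toy model of the four pipeline registers. This is only an illustration under stated assumptions: it assumes the flawed design takes Regw# from the instruction currently held in IF/ID (that is, the one in the ID stage), and the function name, dictionary keys, and register numbers are hypothetical.

```python
def simulate(dest_regs, carry_regw):
    """Return (instruction index, register written) for each write-back.

    dest_regs: destination register number of each load, in program order.
    carry_regw: True  -> Regw# travels through ID/EX, EX/MEM, MEM/WB with
                         its instruction (the corrected design);
                False -> WB uses the Regw# of whatever instruction is in
                         IF/ID at that moment (assumed flawed wiring).
    """
    pending = [{"index": i, "regw": r} for i, r in enumerate(dest_regs)]
    if_id = id_ex = ex_mem = mem_wb = None  # None means the stage is empty
    writes = []
    while pending or any(r is not None for r in (if_id, id_ex, ex_mem, mem_wb)):
        # WB stage: act on the instruction held in MEM/WB during this cycle.
        if mem_wb is not None:
            if carry_regw:
                regw = mem_wb["regw"]
            else:
                regw = if_id["regw"] if if_id is not None else None
            writes.append((mem_wb["index"], regw))
        # Clock edge: every instruction moves one pipeline register forward.
        mem_wb, ex_mem, id_ex = ex_mem, id_ex, if_id
        if_id = pending.pop(0) if pending else None
    return writes


loads = [1, 2, 3, 4, 5]  # five loads writing $1..$5 (hypothetical stream)
print(simulate(loads, carry_regw=False))
print(simulate(loads, carry_regw=True))
```

In the flawed run the first load's data lands in the register named by the instruction three positions later; with the register number carried along, each value reaches its own destination.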
Corrected Pipeline Diagram for lw Instruction in WB Pipeline Stage
[Figure: corrected pipelined datapath (pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB) with the WB-stage components highlighted: Mux and RegArray; the 5-bit Regw# is carried through the pipeline registers]
Executing an sw Instruction
sw $4, 400($0)    IM   Reg  ALU  DM   Reg
sw Instruction in MEM Pipeline Stage
[Figure: pipelined datapath (pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB) with the MEM-stage components highlighted: data memory (Addr, WriteData, ReadData)]
sw Instruction in WB Pipeline Stage
[Figure: pipelined datapath (pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB) during the WB stage of the sw instruction]
Sequence of Instructions
• Imagine the pipelined execution of a sequence of five instructions:
lw $10, 20($1)
sub $11, $2, $3
add $12, $3, $4
lw $13, 24($1)
add $14, $5, $6
Multi-clock-cycle Pipeline Diagram of Above Instructions with Pipeline Registers
Instruction Execution Order
lw $10, 20($1)    IM   Reg  ALU  DM   Reg
sub $11, $2, $3        IM   Reg  ALU  DM   Reg
add $12, $3, $4             IM   Reg  ALU  DM   Reg
lw $13, 24($1)                   IM   Reg  ALU  DM   Reg
add $14, $5, $6                       IM   Reg  ALU  DM   Reg
Identification of Control Lines
[Figure: pipelined datapath (pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB) with its control lines identified: LoadPC, ClearPC, BrchOrJmpOrSquencl, Regw#From20..16Or15..11, RegReadOrWrite, ALUSecondOpImmOrReg2, ALUFcn (incl. Carryin), Zero, Branch, MemRead, MemWrite, and WriteRegFromMemoryOrALU]
Problem with Control Lines
• Each instruction at each stage of the pipeline needs to have its specific control lines asserted
• Difficult problem to manage the state of each instruction in each stage
• However, what if we stage the control lines with the pipelined data?
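One way to picture staging the control lines: decode the control values once, group them by the stage that uses them (EX, MEM, WB), and let each pipeline register pass along only the groups still needed downstream. The sketch below is illustrative only. The signal names are borrowed from the datapath figure labels, but the values assigned to each opcode and the helper names are assumptions.

```python
def decode_control(op):
    """Hypothetical control values, grouped by the stage that uses them."""
    table = {
        "lw": {
            "EX": {"ALUSecondOpImmOrReg2": "Imm"},
            "MEM": {"MemRead": True, "MemWrite": False},
            "WB": {"RegReadOrWrite": "Write", "WriteRegFromMemoryOrALU": "Memory"},
        },
        "sw": {
            "EX": {"ALUSecondOpImmOrReg2": "Imm"},
            "MEM": {"MemRead": False, "MemWrite": True},
            "WB": {"RegReadOrWrite": "Read", "WriteRegFromMemoryOrALU": None},
        },
        "add": {
            "EX": {"ALUSecondOpImmOrReg2": "Reg2"},
            "MEM": {"MemRead": False, "MemWrite": False},
            "WB": {"RegReadOrWrite": "Write", "WriteRegFromMemoryOrALU": "ALU"},
        },
    }
    return table[op]


def without(groups, used):
    """Drop the control group that a stage has just consumed."""
    if groups is None:
        return None
    return {stage: signals for stage, signals in groups.items() if stage != used}


def clock(pipe, op_in_id):
    """One clock edge: control moves forward alongside the instruction's data."""
    return {
        "ID/EX": decode_control(op_in_id) if op_in_id else None,
        "EX/MEM": without(pipe["ID/EX"], "EX"),
        "MEM/WB": without(pipe["EX/MEM"], "MEM"),
    }


pipe = {"ID/EX": None, "EX/MEM": None, "MEM/WB": None}
for op in ("lw", "sw", "add"):
    pipe = clock(pipe, op)
for name, groups in pipe.items():
    print(name, groups)
```

After three clock edges each pipeline register holds the control for a different instruction, so no central controller has to track which instruction is in which stage.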
Pipelined Control Summary
[Figure: pipeline stages IF, ID, EX, MEM, and WB separated by the registers IF/ID, ID/EX, EX/MEM, and MEM/WB. The Control unit decodes the instruction in ID; ID/EX carries the EX, MEM, and WB control groups, EX/MEM carries MEM and WB, and MEM/WB carries WB]
Pipelined Control Detail
[Figure: pipelined datapath (pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB) with control lines, including MemRead, MemWrite, Zero, and Branch]
Sequence of Instructions with Dependencies
• Pipelined execution of a sequence of five instructions with dependencies:
sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
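The dependencies in this sequence can be found mechanically by tracking the most recent writer of each register. The sketch below is a minimal read-after-write scan. The parser handles only the instruction shapes used on these slides, and both function names are hypothetical.

```python
import re


def parse(instr):
    """Return (destination, sources) for the instruction shapes used here."""
    op, rest = instr.split(None, 1)
    regs = re.findall(r"\$\d+", rest)
    if op == "sw":
        return None, regs       # sw reads both of its registers, writes none
    return regs[0], regs[1:]    # first register is the destination


def raw_dependencies(program):
    """List (producer index, consumer index, register) read-after-write pairs."""
    deps = []
    last_writer = {}
    for i, instr in enumerate(program):
        dest, sources = parse(instr)
        for reg in dict.fromkeys(sources):  # drop duplicates, keep order
            if reg in last_writer:
                deps.append((last_writer[reg], i, reg))
        if dest is not None:
            last_writer[dest] = i
    return deps


program = [
    "sub $2, $1, $3",
    "and $12, $2, $5",
    "or $13, $6, $2",
    "add $14, $2, $2",
    "sw $15, 100($2)",
]
deps = raw_dependencies(program)
print(deps)
```

Every later instruction reads $2, the register written by the sub, which is what makes this sequence a useful test case.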
Multi-clock-cycle Pipeline Diagram of Above Instructions with Dependencies
Instruction Execution Order
sub $2, $1, $3    IM   Reg  ALU  DM   Reg
and $12, $2, $5        IM   Reg  ALU  DM   Reg
or $13, $6, $2              IM   Reg  ALU  DM   Reg
add $14, $2, $2                  IM   Reg  ALU  DM   Reg
sw $15, 100($2)                       IM   Reg  ALU  DM   Reg
Multi-clock-cycle Pipeline Diagram of Above Instructions with Dependencies & Forwarding
Instruction Execution Order
sub $2, $1, $3    IM   Reg  ALU  DM   Reg
and $12, $2, $5        IM   Reg  ALU  DM   Reg
or $13, $6, $2              IM   Reg  ALU  DM   Reg
add $14, $2, $2                  IM   Reg  ALU  DM   Reg
sw $15, 100($2)                       IM   Reg  ALU  DM   Reg
Forwarding
• It is the responsibility of the Forwarding Unit to determine how to forward available values to where they are used
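The decision a forwarding unit makes for one ALU operand can be sketched as follows. This follows the common textbook formulation for a five-stage pipeline (prefer the newest result in EX/MEM, then MEM/WB, and never forward register 0); it is an assumption that the unit in these slides works the same way, and the function name and dictionary keys are hypothetical.

```python
def forward_select(src_reg, ex_mem, mem_wb):
    """Choose the source of one ALU operand for the instruction in EX.

    src_reg: register number the instruction in EX wants to read.
    ex_mem, mem_wb: None, or dicts with 'reg_write' (bool) and 'rd'
                    (destination register number) for the instruction
                    held in that pipeline register.
    """
    # The most recent result wins, so check EX/MEM first.
    if ex_mem and ex_mem["reg_write"] and ex_mem["rd"] != 0 and ex_mem["rd"] == src_reg:
        return "EX/MEM"
    if mem_wb and mem_wb["reg_write"] and mem_wb["rd"] != 0 and mem_wb["rd"] == src_reg:
        return "MEM/WB"
    return "REGFILE"


sub = {"reg_write": True, "rd": 2}        # sub $2, $1, $3
and_instr = {"reg_write": True, "rd": 12}  # and $12, $2, $5

# and $12, $2, $5 is in EX while sub is in MEM.
print(forward_select(2, ex_mem=sub, mem_wb=None))
# or $13, $6, $2 is in EX while and is in MEM and sub is in WB.
print(forward_select(6, ex_mem=and_instr, mem_wb=sub))
print(forward_select(2, ex_mem=and_instr, mem_wb=sub))
```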
Sequence of Instructions with Hazards & Stalls
• Pipelined execution of a sequence of five instructions with dependencies:
lw $2, 20($1)
and $4, $2, $5
or $8, $2, $6
add $9, $4, $2
slt $1, $6, $7
Multi-clock-cycle Pipeline Diagram of Above Instructions with Hazards & Stalls
Instruction Execution Order
lw $2, 20($1)     IM   Reg  ALU  DM   Reg
and $4, $2, $5         IM   Reg  ALU  DM   Reg
or $8, $2, $6               IM   Reg  ALU  DM   Reg
add $9, $4, $2                   IM   Reg  ALU  DM   Reg
slt $1, $6, $7                        IM   Reg  ALU  DM   Reg
Multi-clock-cycle Pipeline Diagram of Above Instructions with Hazards & Stalls & Forwarding
Instruction Execution Order
lw $2, 20($1)     IM   Reg  ALU  DM   Reg
and $4, $2, $5         IM   Reg  ALU  DM   Reg
or $8, $2, $6               IM   Reg  ALU  DM   Reg
add $9, $4, $2                   IM   Reg  ALU  DM   Reg
slt $1, $6, $7                        IM   Reg  ALU  DM   Reg
Forwarding Does Not Solve All Our Problems
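• We really need the value of $2 from lw $2, 20($1) in and $4, $2, $5 before it is available
  • The lw does not have the loaded value until the end of its MEM stage, but the and needs it at the start of its EX stage, which falls in that same clock cycle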
• This can’t be corrected by using forwarding
• We need to delay the execution of and $4, $2, $5 for one clock cycle until $2 is available
Hazard Detection Unit
• It is the responsibility of the Hazard Detection Unit to determine if a stall is required
• The Hazard Detection Unit inserts bubbles
• A bubble causes the CPU to perform no operation
  • We refer to this as a nop (pronounced: no op)
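The check a hazard detection unit performs for the load-use case can be sketched as below. This is a minimal illustration that assumes forwarding is available, so only a load followed immediately by a reader of its destination needs a one-cycle stall. The dictionary keys and function names are hypothetical.

```python
def needs_stall(in_ex, in_id):
    """Stall when the instruction in EX is a load whose destination
    register is read by the instruction currently in ID."""
    return bool(in_ex and in_ex["mem_read"] and in_ex["dest"] in in_id["sources"])


def insert_bubbles(program):
    """Return the issue order with a nop wherever a load-use stall is needed."""
    issued = []
    previous = None
    for instr in program:
        if needs_stall(previous, instr):
            issued.append("nop")
        issued.append(instr["text"])
        previous = instr
    return issued


program = [
    {"text": "lw $2, 20($1)",  "mem_read": True,  "dest": 2, "sources": [1]},
    {"text": "and $4, $2, $5", "mem_read": False, "dest": 4, "sources": [2, 5]},
    {"text": "or $8, $2, $6",  "mem_read": False, "dest": 8, "sources": [2, 6]},
    {"text": "add $9, $4, $2", "mem_read": False, "dest": 9, "sources": [4, 2]},
    {"text": "slt $1, $6, $7", "mem_read": False, "dest": 1, "sources": [6, 7]},
]
issued = insert_bubbles(program)
print("\n".join(issued))
```

Only the and directly after the lw triggers a bubble; the later readers of $2 are far enough behind the load that forwarding or the register file can supply the value.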
Multi-clock-cycle Pipeline Diagram of Above Instructions with Hazards & Stalls Inserted
Instruction Execution Order
lw $2, 20($1)                 IM   Reg  ALU  DM   Reg
and $4, $2, $5 becomes nop         IM   Reg  ALU  DM   Reg
and $4, $2, $5                          IM   Reg  ALU  DM   Reg
or $8, $2, $6                                IM   Reg  ALU  DM   Reg
add $9, $4, $2                                    IM   Reg  ALU  DM   Reg