chapter 4 sections 4.1 – 4.4 appendix d.1 and d.2 dr. iyad f. jafar basic mips architecture:...
TRANSCRIPT
Chapter 4Sections 4.1 – 4.4
Appendix D.1 and D.2
Dr. Iyad F. Jafar
Basic MIPS Architecture:Single-Cycle Datapath and Control
Outline
Introduction Clocking Single-cycle Datapath Single-cycle Control Performance Analysis
2
Introduction
3
So far, we have built a small ALUADD, SUB, SLT, AND, OR, …
What about Memory and registers?Control operations? Interpreting (decoding) instructions?
The big pictureThe CPU’s datapath deals with moving data
around The CPU’s control manages the data
Generic implementationFetchPC = PC+4
Decode
Execute
The clocking methodology defines when signals can be read and when they are written An edge-triggered methodology
Typical execution read contents of state elements send values through combinational logic write results to one or more state elements
Assumes state elements are written on every clock cycle; if not, need explicit write control signal
write occurs only when both the write control is asserted and the clock edge occurs
Clocking
4
State Element
State Element
Combinationallogic
clock
one clock cycle
Single-Cycle Datapath
5
The first implementation considered
All instructions start and finish execution in one cycle!
This include the time required to fetch, decode, and execute the instruction
In the following, we will consider the datapath of each of these steps
Single-Cycle Datapath
6
Fetch DatapathFetching the instruction from memory requires
Sending the PC to memory to read the instruction Update the PC to point to the next instruction
Do we need an explicit write signal for writing the PC?Do we need an explicit read signal for reading the
memory?
PC Read
AddressData
InstructionMemory
+4
Instruction
Single-Cycle Datapath
7
Decode DatapathRegardless of the instruction
Send the opcode (31-26) and the function (5-0) fields of the instruction to the control unit
Read two registers; rs (25-21) and rt (20-16)Reading is not harmful!
Instruction
Write Data
Read Addr 1
Read Addr 2
Write Addr
Reg
iste
r File
Read Data 1
Read Data 2
ControlUnit
R[rs]
R[rt]
Single-Cycle Datapath
8
Inside the Register FileHow can we read a register out of 32
registers?
Register 0Register 1Register 2….
Register 31
32-t
o-1
M
UX
32-t
o-1
M
UX
Read Register 1
Read Register 2
Read Data 1
Read Data 2
0
1
31
0
1
31
Single-Cycle Datapath
9
Inside the Register FileHow can we write a register out of 32
registers?
Register Number
5-t
o-3
2 D
ecod
er
Write Data
Register 0D
C
Register 1D
C
Register 2D
C
…..D
C
Register 31D
C
Write
31
1
0
Clock
Single-Cycle Datapath
10
Execution DatapathR-type instructions (ADD, SUB, SLT, AND,
OR)The two registers are read already!Perform operation based on OPCODE and FUNC fieldsStore the result back into the register file (the
destination register is specified in rd field of the instruction (15-11)!
The register file is not written on every cycle! Need an explicit write signal
Write Data
Read Addr 1
Read Addr 2
Write AddrR
eg
iste
r File
Read Data 1
Read Data 2
R[rs]
R[rt]
InstructionWrite
ALU
RegWrite ALU Control
Single-Cycle Datapath
11
Execution DatapathLoad Instruction
Compute the load address Store the loaded data in the register file. The
destination register is the rt field of the instruction (20-16)
Write Data
Read Addr 1
Read Addr 2
Write Addr
Reg
iste
r File
Read Data 1
Read Data 2
R[rs]
R[rt]
InstructionWrite
ALU
RegWrite ALU Control
Address
Data
Data
Mem
ory
Sign Ext.
WriteData
MemRead
MemWrite
Single-Cycle Datapath
12
Execution DatapathStore Instruction
Compute the load address Store register in the memory
Write Data
Read Addr 1
Read Addr 2
Write Addr
Reg
iste
r File
Read Data 1
Read Data 2
R[rs]
R[rt]
InstructionWrite
ALU
RegWrite ALU Control
Address
Data
Data
Mem
ory
Sign Ext.
WriteData
MemRead
MemWrite
Single-Cycle Datapath
13
Execution DatapathBranch Instruction
Compare the two registers Compute the branch address Change PC if true !
Write Data
Read Addr 1
Read Addr 2
Write Addr
Reg
iste
r File
Read Data 1
Read Data 2
Instruction
Write
RegWrite
Sign Ext.
ALU
ALU Contro
l
PC
+4
x4
+
Zero
Branch
Address
1
0
Branch Address
Zero
Single-Cycle Datapath
14
Execution DatapathJump Instruction
Compute the jump addressStore it in the PC
PC Read
AddressData
InstructionMemory
+4
Instruction x4
jump address
1
0
Jump
Single-Cycle Datapath
15
Creating the Single DatapathAssemble the datapath segments and add
control lines and multiplexors as neededSingle cycle design
Fetch, decode and execute each instructions in one clock cycle
No datapath resource can be used more than once per instruction, so some must be duplicated (e.g., separate Instruction Memory and Data Memory, several adders)
Multiplexors needed at the input of shared elements with control lines to do the selection
Write signals to control writing to the Register File and Data Memory
Cycle time is determined by length of the longest path
Single-Cycle Datapath
16
ReadAddress
Instr[31-0]
InstructionMemory
+
PC
4
Write Data
Read Addr 1
Read Addr 2
Write Addr
Register
File
Read Data 1
Read Data 2
ALU
ovf
zero
RegWrite
DataMemory
Address
Write Data
Read Data
MemWrite
MemRead
SignExtend16 32
MemtoReg
ALUSrc
Shift
left 2
+PCSrc
RegDst
ALUcontrol
1
1
1
00
0
0
1
ALUOp
Instr[5-0]
Instr[15-0]
Instr[25-21]
Instr[20-16]
Instr[15 -11]
ControlUnit
Instr[31-26]
Branch
0
1Shift
left 2Instr[25-0]
PC[31-28]
Jump
Single-Cycle Control
17
Need to design the control that generates the appropriate control signals based on the Opcode and Function fields to Specify the operation of the ALUControl the data flow by selecting the appropriate input of
the multiplexors
With the following observations across different instructions Op field is always in bits 31-26 of the instruction Address of registers to be read are always specified by
The rs field (bits 25-21) The rt field (bits 20-16)
For LW and SW, the rs field is the base registerAddress of register to be written is in one of two places
For LW, the address is the rt field (bits 20-16 ) For R-type, the address is the rd field (bits 15-11)
Offset for BEQ, LW, and SW is always in bits 15-0 of the instruction
Single-Cycle Control
18
Signal Name
Effect when Deassereted (0)
Effect when Asserted (1)
RegDst The destination register is from rt field
The destination register is from rd field
RegWrite None Enable writing to the register selected by the Write register port
ALUSrcThe second ALU operand comes from the second
register file output
The second ALU operand is the sign extended offset
PCSrc PC value is PC+4 PC is the branch address
MemRead None Contents of memory address are put on Read data output
MemWrite None Data on the Write data input is placed in the specified address
MemtoRegThe data fed to the register file Write data input comes
from ALU
The data fed to the register file Write data input comes from
memory
ALUOp Used with the function field of the instruction to generate the ALUOp signal that specify the ALU operation
R-type Instruction Data/Control Flow
19
ReadAddress
Instr[31-0]
InstructionMemory
+
PC
4
Write Data
Read Addr 1
Read Addr 2
Write Addr
Register
File
Read Data 1
Read Data 2
ALU
ovf
zero
RegWrite
DataMemory
Address
Write Data
Read Data
MemWrite
MemRead
SignExtend16 32
MemtoReg
ALUSrc
Shift
left 2
+PCSrc
RegDst
ALUcontrol
1
1
1
00
0
0
1
ALUOp
Instr[5-0]
Instr[15-0]
Instr[25-21]
Instr[20-16]
Instr[15 -11]
ControlUnit
Instr[31-26]
Branch
0
1Shift
left 2Instr[26-0]
PC[31-28]
Jump
Load Word Instruction Data/Control Flow
20
ReadAddress
Instr[31-0]
InstructionMemory
+
PC
4
Write Data
Read Addr 1
Read Addr 2
Write Addr
Register
File
Read Data 1
Read Data 2
ALU
ovf
zero
RegWrite
DataMemory
Address
Write Data
Read Data
MemWrite
MemRead
SignExtend16 32
MemtoReg
ALUSrc
Shift
left 2
+PCSrc
RegDst
ALUcontrol
1
1
1
00
0
0
1
ALUOp
Instr[5-0]
Instr[15-0]
Instr[25-21]
Instr[20-16]
Instr[15 -11]
ControlUnit
Instr[31-26]
Branch
0
1Shift
left 2Instr[26-0]
PC[31-28]
Jump
Branch Equal Instruction Data/Control Flow
21
ReadAddress
Instr[31-0]
InstructionMemory
+
PC
4
Write Data
Read Addr 1
Read Addr 2
Write Addr
Register
File
Read Data 1
Read Data 2
ALU
ovf
zero
RegWrite
DataMemory
Address
Write Data
Read Data
MemWrite
MemRead
SignExtend16 32
MemtoReg
ALUSrc
Shift
left 2
+PCSrc
RegDst
ALUcontrol
1
11
00
0
0
1
ALUOp
Instr[5-0]
Instr[15-0]
Instr[25-21]
Instr[20-16]
Instr[15 -11]
ControlUnit
Instr[31-26]
Branch
0
1Shift
left 2Instr[26-0]
PC[31-28]
Jump
Jump Instruction Data/Control Flow
22
ReadAddress
Instr[31-0]
InstructionMemory
+
PC
4
Write Data
Read Addr 1
Read Addr 2
Write Addr
Register
File
Read Data 1
Read Data 2
ALU
ovf
zero
RegWrite
DataMemory
Address
Write Data
Read Data
MemWrite
MemRead
SignExtend16 32
MemtoReg
ALUSrc
Shift
left 2
+PCSrc
RegDst
ALUcontrol
1
11
00
0
0
1
ALUOp
Instr[5-0]
Instr[15-0]
Instr[25-21]
Instr[20-16]
Instr[15 -11]
ControlUnit
Instr[31-26]
Branch
0
1Shift
left 2Instr[26-0]
PC[31-28]
Jump
Single-Cycle Control
23
The Main Control UnitThe input is the Op field (6 bits) from the instruction The output is nine control signals The truth table !
Inputs Outputs
Op5
Op4
Op3
Op2
Op1
Op0
RegDist
ALUsrc
MemtoReg
RegWrite
MemRead
MemWrite
Branch
ALUop1
ALUop0
R-type 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0
LW 1 0 0 0 1 1 0 1 1 1 1 0 0 0 0
SW 1 0 1 0 1 1 X 1 X 0 0 1 0 0 0
BEQ 0 0 0 1 0 0 X 0 X 0 0 0 1 0 1
Single-Cycle Control
24
The Main Control UnitTo design the logic circuit, generate the appropriate
minterms for each output signalSimply, use a PLA!
Single-Cycle Control
25
The ALU Control UnitIt has two inputs
ALUop (2 bits) from Main control Func (6 bits) from the instruction
It has two outputs Bengate (1 bits)Operation (2 bits)
Supported Operations
Function
Bnegate
Operation
and 0 00or 0 01
add 0 10sub 1 10slt 1 11
ALUcontrol
ALUop
Func
Bnegate
Operation
Single-Cycle Control
26
The ALU Control UnitTruth Table !
Inputs Outputs
ALUop1
ALUop0
F5 F4 F3 F2 F1 F0Bnegate
Operation
1
Operation0
AND 1 0 1 0 0 1 0 0 0 0 0
OR 1 0 1 0 0 1 0 1 0 0 1
ADD 1 0 1 0 0 0 0 0 0 1 0
SUB 1 0 1 0 0 0 1 0 1 1 0
SLT 1 0 1 0 1 0 1 0 1 1 1
LW 0 0 n/a 0 1 0
SW 0 0 n/a 0 1 0
BEQ 0 1 n/a 1 1 0
Single-Cycle Control
27
The ALU Control UnitHardware ImplementationGenerating minterms!! Minimization!! By inspection!
Performance Analysis
28
All instructions have to finish in one cycle!How long is the cycle time?
Different units are used in different instructionsEach unit has its own delay Need to find the longest path!
Assume the following times
Thus, the cycle time should be at least 8 ns
R-type: Instr. Fetch Register Read ALU Register Write
6ns
8ns
7ns
5ns
2ns
Branch: Instr. Fetch Register Read ALU
LW: Instr. Fetch Register Read ALU Memory Read Register Write
SW: Instr. Fetch Register Read ALU Memory Write
Jump: Instr. Fetch
Unit Delay
ALU 2 ns
Memory 2 ns
Register File 1 ns
Performance Analysis
29
The cycle time is fixed!However, not all instructions require the same time!
There is a wasted time for some instructions?!
Possible Solution?
Clock
LW SW
Cycle 1 Cycle 2
waste
Performance Analysis
30
Example 1. Example 1. consider the following two implementations of a single cycle machine: Machine A : all instructions execute in one cycle of
fixed lengthMachine B: all instructions execute in one cycle ,
however, the cycle time adapts to instruction types
Use the information given in the tables to compare the two machines
Unit Time (ps)
Memory 200
ALU and adders 100
Register File 50
Instruction type
Percentage %
ALU 45
Load 25
Store 10
Branch 15
Jump 5
Performance Analysis
31
Example 1. Continued. CPU Execution Time = IC x CPI x Clock cycle timeCPI A = CPIB = 1 ICA = ICB
CCA= 600 ns
CCB = 600 x 0.25 + 550 x 0.1 + 400 x 0.45 + 350 x 0.15 + 200 x 0.05 = 447.5 ps
PerformancB / PerformanceA = 600 / 447.5 = 1.34So, adaptive clock cycle is faster; however it is hard
to implement !
Instruction Type
Inst. Memor
y
Register
ReadALU
Data Memor
y
Register
WriteTotal
R-type 200 50 100 0 50 400
Load 200 50 100 200 50 600
Store 200 50 100 200 550
Branch 200 50 100 0 350
Jump 200 200
Single Cycle Disadvantages & Advantages
32
Single-cycle implementation assumes that all instructions can execute in one cycles
AdvantagesSimple and easy to understand
Disadvantages Hardware duplication!Uses the clock cycle inefficiently – the clock cycle
must be timed to accommodate the slowest instruction (especially problematic for more complex instructions like floating point multiply)