chapter five
DESCRIPTION
Chapter Five. The Processor: Datapath & Control. We're ready to look at an implementation of the MIPS Simplified to contain only: memory-reference instructions: lw, sw arithmetic-logical instructions: add, sub, and, or, slt control flow instructions: beq, j Generic Implementation: - PowerPoint PPT PresentationTRANSCRIPT
12004 Morgan Kaufmann Publishers
Chapter Five
22004 Morgan Kaufmann Publishers
• We're ready to look at an implementation of the MIPS
• Simplified to contain only:
– memory-reference instructions: lw, sw – arithmetic-logical instructions: add, sub, and, or, slt– control flow instructions: beq, j
• Generic Implementation:
– use the program counter (PC) to supply instruction address
– get the instruction from memory
– read registers
– use the instruction to decide exactly what to do
• All instructions use the ALU after reading the registers
Why? memory-reference? arithmetic? control flow?
The Processor: Datapath & Control
32004 Morgan Kaufmann Publishers
• Abstract / Simplified View:
Two types of functional units:
– elements that operate on data values (combinational)
– elements that contain state (sequential)
More Implementation Details
Data
Register #
Register #
Register #
PC Address Instruction
Instructionmemory
Registers ALU Address
Data
Datamemory
AddAdd
4
42004 Morgan Kaufmann Publishers
• Unclocked vs. Clocked
• Clocks used in synchronous logic
– when should an element that contains state be updated?
State Elements
Clock period Rising edge
Falling edge
cycle time
52004 Morgan Kaufmann Publishers
• The set-reset latch
– output depends on present inputs and also on past inputs
An unclocked state element
R
S
Q
Q
62004 Morgan Kaufmann Publishers
• Output is equal to the stored value inside the element(don't need to ask for permission to look at the value)
• Change of state (value) is based on the clock
• Latches: whenever the inputs change, and the clock is asserted
• Flip-flop: state changes only on a clock edge(edge-triggered methodology)
"logically true", — could mean electrically low
A clocking methodology defines when signals can be read and written— wouldn't want to read a signal at the same time it was being written
Latches and Flip-flops
72004 Morgan Kaufmann Publishers
• Two inputs:
– the data value to be stored (D)
– the clock signal (C) indicating when to read & store D
• Two outputs:
– the value of the internal state (Q) and it's complement
D-latch
Q
C
D
_Q
D
C
Q
82004 Morgan Kaufmann Publishers
D flip-flop
• Output changes only on the clock edge
D
C
Q
D
C
Dlatch
D
C
QD
latch
D
C
Q Q
92004 Morgan Kaufmann Publishers
Our Implementation
• An edge triggered methodology
• Typical execution:
– read contents of some state elements,
– send values through some combinational logic
– write results to one or more state elements
Stateelement
1
Stateelement
2Combinational logic
Clock cycle
102004 Morgan Kaufmann Publishers
• Built using D flip-flops
Register File
Read registernumber 1 Read
data 1Read registernumber 2
Readdata 2
Writeregister
WriteWritedata
Register file
Read registernumber 1
Register 0
Register 1
. . .
Register n – 2
Register n – 1
M
u
x
Read registernumber 2
M
u
x
Read data 1
Read data 2
Do you understand? What is the “Mux” above?
112004 Morgan Kaufmann Publishers
Abstraction
• Make sure you understand the abstractions!
• Sometimes it is easy to think you do, when you don’t
Mux
C
Select
32
32
32
B
A
Mux
Select
B31
A31
C31
Mux
B30
A30
C30
Mux
B0
A0
C0
...
...
122004 Morgan Kaufmann Publishers
Register File
• Note: we still use the real clock to determine when to write
Write
01
n-to-2n
decoder
n – 1
n
Register 0
C
D
Register 1
C
D
Register n – 2
C
D
Register n – 1
C
D
...
Register number...
Register data
132004 Morgan Kaufmann Publishers
Simple Implementation
• Include the functional units we need for each instruction
Why do we need this stuff?
PC
Instructionaddress
Instruction
Instructionmemory
Add Sum
a. Instruction memory b. Program counter c. Adder
Readregister 1
Readregister 2
Writeregister
WriteData
Registers ALUData
Data
Zero
ALUresult
RegWrite
a. Registers b. ALU
5
5
5
Registernumbers
Readdata 1
Readdata 2
ALU operation4
AddressReaddata
Datamemory
a. Data memory unit
Writedata
MemRead
MemWrite
b. Sign-extension unit
Signextend
16 32
142004 Morgan Kaufmann Publishers
Building the Datapath
• Use multiplexors to stitch them together
Readregister 1
Readregister 2
Writeregister
Writedata
Writedata
Registers ALU
Add
Zero
RegWrite
MemRead
MemWrite
PCSrc
MemtoReg
Readdata 1
Readdata 2
ALU operation4
Signextend
16 32
InstructionALU
result
Add
ALUresult
Mux
Mux
Mux
ALUSrc
Address
Datamemory
Readdata
Shiftleft 2
4
Readaddress
Instructionmemory
PC
152004 Morgan Kaufmann Publishers
Control
• Selecting the operations to perform (ALU, read/write, etc.)
• Controlling the flow of data (multiplexor inputs)
• Information comes from the 32 bits of the instruction
• Example:
add $8, $17, $18 Instruction Format:
000000 10001 10010 01000 00000 100000
op rs rt rd shamt funct
• ALU's operation based on instruction type and function code
162004 Morgan Kaufmann Publishers
• e.g., what should the ALU do with this instruction• Example: lw $1, 100($2)
35 2 1 100
op rs rt 16 bit offset
• ALU control input
0000 AND0001 OR0010 add0110 subtract0111 set-on-less-than1100 NOR
• Why is the code for subtract 0110 and not 0011?
Control
172004 Morgan Kaufmann Publishers
• Must describe hardware to compute 4-bit ALU control input
– given instruction type 00 = lw, sw01 = beq, 10 = arithmetic
– function code for arithmetic
• Describe it using a truth table (can turn into gates):
ALUOp computed from instruction type
Control
Instruction RegDst ALUSrcMemto-
RegReg
WriteMem Read
Mem Write Branch ALUOp1 ALUp0
R-format 1 0 0 1 0 0 0 1 0lw 0 1 1 1 1 0 0 0 0sw X 1 X 0 0 1 0 0 0beq X 0 X 0 0 0 1 0 1
Readregister 1
Readregister 2
Writeregister
Writedata
Writedata
Registers
ALU
Add
Zero
Readdata 1
Readdata 2
Signextend
16 32
Instruction[31–0] ALU
result
Add
ALUresult
Mux
Mux
Mux
Address
Datamemory
Readdata
Shiftleft 2
4
Readaddress
Instructionmemory
PC
1
0
0
1
0
1
Mux
0
1
ALUcontrol
Instruction [5–0]
Instruction [25–21]
Instruction [31–26]
Instruction [15–11]
Instruction [20–16]
Instruction [15–0]
RegDstBranchMemReadMemtoRegALUOpMemWriteALUSrcRegWrite
Control
192004 Morgan Kaufmann Publishers
Control
• Simple combinational logic (truth tables)
Operation2
Operation1
Operation0
Operation
ALUOp1
F3
F2
F1
F0
F (5– 0)
ALUOp0
ALUOp
ALU control block
R-format Iw sw beq
Op0
Op1
Op2
Op3
Op4
Op5
Inputs
Outputs
RegDst
ALUSrc
MemtoReg
RegWrite
MemRead
MemWrite
Branch
ALUOp1
ALUOpO
202004 Morgan Kaufmann Publishers
• All of the logic is combinational
• We wait for everything to settle down, and the right thing to be done
– ALU might not produce “right answer” right away
– we use write signals along with clock to determine when to write
• Cycle time determined by length of the longest path
Our Simple Control Structure
We are ignoring some details like setup and hold times
Stateelement
1
Stateelement
2Combinational logic
Clock cycle
212004 Morgan Kaufmann Publishers
Single Cycle Implementation
• Calculate cycle time assuming negligible delays except:
– memory (200ps), ALU and adders (100ps), register file access (50ps)
Readregister 1
Readregister 2
Writeregister
Writedata
Writedata
Registers ALU
Add
Zero
RegWrite
MemRead
MemWrite
PCSrc
MemtoReg
Readdata 1
Readdata 2
ALU operation4
Signextend
16 32
InstructionALU
result
Add
ALUresult
Mux
Mux
Mux
ALUSrc
Address
Datamemory
Readdata
Shiftleft 2
4
Readaddress
Instructionmemory
PC
222004 Morgan Kaufmann Publishers
Where we are headed
• Single Cycle Problems:
– what if we had a more complicated instruction like floating point?
– wasteful of area
• One Solution:
– use a “smaller” cycle time
– have different instructions take different numbers of cycles
– a “multicycle” datapath:
Data
Register #
Register #
Register #
PC Address
Instructionor dataMemory Registers ALU
Instructionregister
Memorydata
register
ALUOut
A
BData
232004 Morgan Kaufmann Publishers
• We will be reusing functional units
– ALU used to compute address and to increment PC
– Memory used for instruction and data
• Our control signals will not be determined directly by instruction
– e.g., what should the ALU do for a “subtract” instruction?
• We’ll use a finite state machine for control
Multicycle Approach
242004 Morgan Kaufmann Publishers
• Break up the instructions into steps, each step takes a cycle
– balance the amount of work to be done
– restrict each cycle to use only one major functional unit
• At the end of a cycle
– store values for use in later cycles (easiest thing to do)
– introduce additional “internal” registers
Multicycle Approach
Readregister 1
Readregister 2
Writeregister
Writedata
Registers ALU
Zero
Readdata 1
Readdata 2
Signextend
16 32
Instruction[25–21]
Instruction[20–16]
Instruction[15–0]
ALUresult
Mux
Mux
Shiftleft 2
Instructionregister
PC 0
1
Mux
0
1
Mux
0
1
Mux
0
1A
B 0
1
2
3
ALUOut
Instruction[15–0]
Memorydata
register
Address
Writedata
Memory
MemData
4
Instruction[15–11]
252004 Morgan Kaufmann Publishers
Instructions from ISA perspective
• Consider each instruction from perspective of ISA.
• Example:
– The add instruction changes a register.
– Register specified by bits 15:11 of instruction.
– Instruction specified by the PC.
– New value is the sum (“op”) of two registers.
– Registers specified by bits 25:21 and 20:16 of the instructionReg[Memory[PC][15:11]] <= Reg[Memory[PC][25:21]] op Reg[Memory[PC][20:16]]
– In order to accomplish this we must break up the instruction.(kind of like introducing variables when programming)
262004 Morgan Kaufmann Publishers
Breaking down an instruction
• ISA definition of arithmetic:
Reg[Memory[PC][15:11]] <= Reg[Memory[PC][25:21]] op Reg[Memory[PC][20:16]]
• Could break down to:– IR <= Memory[PC]– A <= Reg[IR[25:21]]– B <= Reg[IR[20:16]]– ALUOut <= A op B– Reg[IR[20:16]] <= ALUOut
• We forgot an important part of the definition of arithmetic!– PC <= PC + 4
272004 Morgan Kaufmann Publishers
Idea behind multicycle approach
• We define each instruction from the ISA perspective (do this!)
• Break it down into steps following our rule that data flows through at most one major functional unit (e.g., balance work across steps)
• Introduce new registers as needed (e.g, A, B, ALUOut, MDR, etc.)
• Finally try and pack as much work into each step (avoid unnecessary cycles)
while also trying to share steps where possible(minimizes control, helps to simplify solution)
• Result: Our book’s multicycle Implementation!
282004 Morgan Kaufmann Publishers
• Instruction Fetch
• Instruction Decode and Register Fetch
• Execution, Memory Address Computation, or Branch Completion
• Memory Access or R-type instruction completion
• Write-back step
INSTRUCTIONS TAKE FROM 3 - 5 CYCLES!
Five Execution Steps
292004 Morgan Kaufmann Publishers
• Use PC to get instruction and put it in the Instruction Register.
• Increment the PC by 4 and put the result back in the PC.
• Can be described succinctly using RTL "Register-Transfer Language"
IR <= Memory[PC];PC <= PC + 4;
Can we figure out the values of the control signals?
What is the advantage of updating the PC now?
Step 1: Instruction Fetch
302004 Morgan Kaufmann Publishers
• Read registers rs and rt in case we need them
• Compute the branch address in case the instruction is a branch
• RTL:
A <= Reg[IR[25:21]];B <= Reg[IR[20:16]];ALUOut <= PC + (sign-extend(IR[15:0]) << 2);
• We aren't setting any control lines based on the instruction type (we are busy "decoding" it in our control logic)
Step 2: Instruction Decode and Register Fetch
312004 Morgan Kaufmann Publishers
• ALU is performing one of three functions, based on instruction type
• Memory Reference:
ALUOut <= A + sign-extend(IR[15:0]);
• R-type:
ALUOut <= A op B;
• Branch:
if (A==B) PC <= ALUOut;
Step 3 (instruction dependent)
322004 Morgan Kaufmann Publishers
• Loads and stores access memory
MDR <= Memory[ALUOut];or
Memory[ALUOut] <= B;
• R-type instructions finish
Reg[IR[15:11]] <= ALUOut;
The write actually takes place at the end of the cycle on the edge
Step 4 (R-type or memory-access)
332004 Morgan Kaufmann Publishers
• Reg[IR[20:16]] <= MDR;
Which instruction needs this?
Write-back step
342004 Morgan Kaufmann Publishers
Summary:
352004 Morgan Kaufmann Publishers
• How many cycles will it take to execute this code?
lw $t2, 0($t3)lw $t3, 4($t3)beq $t2, $t3, Label #assume notadd $t5, $t2, $t3sw $t5, 8($t3)
Label: ...
• What is going on during the 8th cycle of execution?
• In what cycle does the actual addition of $t2 and $t3 takes place?
Simple Questions
Readregister 1
Readregister 2
Writeregister
Writedata
Registers ALU
Zero
Readdata 1
Readdata 2
Signextend
16 32
Instruction[31–26]
Instruction[25–21]
Instruction[20–16]
Instruction[15–0]
ALUresult
Mux
Mux
Shiftleft 2
Shiftleft 2
Instructionregister
PC 0
1
Mux
0
1
Mux
0
1
Mux
0
1A
B 0
1
2
3
Mux
0
1
2
ALUOut
Instruction[15–0]
Memorydata
register
Address
Writedata
Memory
MemData
4
Instruction[15–11]
PCWriteCond
PCWrite
IorD
MemRead
MemWrite
MemtoReg
IRWrite
PCSource
ALUOp
ALUSrcB
ALUSrcA
RegWrite
RegDst
26 28
Outputs
Control
Op[5–0]
ALUcontrol
PC [31–28]
Instruction [25-0]
Instruction [5–0]
Jumpaddress[31–0]
372004 Morgan Kaufmann Publishers
• Finite state machines:
– a set of states and
– next state function (determined by current state and the input)
– output function (determined by current state and possibly input)
– We’ll use a Moore machine (output based only on current state)
Review: finite state machines
Inputs
Current state
Outputs
Clock
Next-statefunction
Outputfunction
Nextstate
382004 Morgan Kaufmann Publishers
Review: finite state machines
• Example:
B. 37 A friend would like you to build an “electronic eye” for use as a fake security device. The device consists of three lights lined up in a row, controlled by the outputs Left, Middle, and Right, which, if asserted, indicate that a light should be on. Only one light is on at a time, and the light “moves” from left to right and then from right to left, thus scaring away thieves who believe that the device is monitoring their activity. Draw the graphical representation for the finite state machine used to specify the electronic eye. Note that the rate of the eye’s movement will be controlled by the clock speed (which should not be too great) and that there are essentially no inputs.
392004 Morgan Kaufmann Publishers
• Value of control signals is dependent upon:
– what instruction is being executed
– which step is being performed
• Use the information we’ve accumulated to specify a finite state machine
– specify the finite state machine graphically, or
– use microprogramming
• Implementation can be derived from specification
Implementing the Control
402004 Morgan Kaufmann Publishers
• Note:– don’t care if not mentioned– asserted if name only
– otherwise exact value
• How many state bits will we need?
Graphical Specification of FSMMemRead
ALUSrcA = 0IorD = 0IRWrite
ALUSrcB = 01ALUOp = 00
PCWritePCSource = 00
ALUSrcA = 0ALUSrcB = 11ALUOp = 00
ALUSrcA = 1ALUSrcB = 00ALUOp = 10
ALUSrcA = 1ALUSrcB = 10ALUOp = 00
MemReadIorD = 1
MemWriteIorD = 1
RegDst = 1RegWrite
MemtoReg = 0
RegDst = 1RegWrite
MemtoReg = 0
PCWritePCSource = 10
ALUSrcA = 1ALUSrcB = 00ALUOp = 01
PCWriteCondPCSource = 01
Instruction decode/register fetch
Instruction fetch
0 1
Start
Jumpcompletion
9862
3
4
5 7
Memory readcompleton step
R-type completionMemoryaccess
Memoryaccess
ExecutionBranch
completionMemory address
computation
412004 Morgan Kaufmann Publishers
• Implementation:
Finite State Machine for Control
PCWrite
PCWriteCond
IorD
MemtoReg
PCSource
ALUOp
ALUSrcB
ALUSrcA
RegWrite
RegDst
NS3NS2NS1NS0
Op5
Op4
Op3
Op2
Op1
Op0
S3
S2
S1
S0
State register
IRWrite
MemRead
MemWrite
Instruction registeropcode field
Outputs
Control logic
Inputs