S3 - Processor Hardware Implementation (Control & Datapath)

Hardware Implementation: Datapath & Control (Processor). Mohammed Ali Abbaker, Yossof Ali Abd-Elgadir, Mohammed AbdAlhakam. December 2010

Upload: mohammed-ali-abbaker-ali

Post on 26-Oct-2014


DESCRIPTION

computer architecture, computer organization & design

TRANSCRIPT



Computer Organization


Introduction

• Before designing a machine, we must discuss:
– How the logical implementation of the machine will operate (‘Datapath’).
– How the machine is clocked (‘Control’).


Logical elements ‘Functional Units’

Combinational elements:
- The output depends only on the current inputs.
- The same inputs always produce the same output.
- Example: ALU.

Sequential (state) elements:
- The output depends on earlier inputs (the stored state) as well as the current inputs.
- They contain memory that saves the state.
- Example: register.

- A state element can be read at any time.
- The clock is used to determine when the state element should be written.


Clocking methodology:

- Defines when signals can be read and when they can be written.

- If a signal is written at the same time it is read, the value read could be:
1. The old value.
2. The newly written value.
3. Some mix of the two.
- This situation must be avoided.


Edge-triggered clocking

• For simplicity, an edge-triggered clocking methodology is assumed, which means that any values stored in the machine are updated only on a clock edge. The state elements all update their internal storage on the clock edge.
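The read-old-value behaviour of edge-triggered state elements can be illustrated with a small Python sketch. The class `EdgeTriggeredRegister` and the helper `swap_on_edge` are hypothetical names, not from the slides; the model captures only the idea that a stored value changes solely on a clock edge, so every input is computed from values stored during the previous cycle.

```python
class EdgeTriggeredRegister:
    """Hypothetical state element: readable at any time, written only on an edge."""

    def __init__(self, value=0):
        self.q = value    # current (visible) output
        self._d = value   # value presented at the input by combinational logic

    def set_input(self, d):
        # Changing the input does not change the output yet.
        self._d = d

    def clock_edge(self):
        # The stored value is updated only here, on the clock edge.
        self.q = self._d


def swap_on_edge(r1, r2):
    # Both inputs are computed from the OLD outputs before the edge ...
    r1.set_input(r2.q)
    r2.set_input(r1.q)
    # ... and both registers update on the same edge.
    r1.clock_edge()
    r2.clock_edge()


a, b = EdgeTriggeredRegister(5), EdgeTriggeredRegister(9)
swap_on_edge(a, b)
print(a.q, b.q)
```

Because both inputs are computed before either output changes, the two registers exchange their values (printing `9 5`) in a single edge without a temporary register.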


Basic MIPS Implementation

• This implementation includes a subset of the core MIPS instruction set, drawn from four basic classes:

- The memory-reference instructions: load word (lw) and store word (sw).
- The arithmetic-logical instructions: add, sub, and, or, and slt.
- The conditional instruction: branch equal (beq).
- The unconditional instruction: jump (j).

• It does not include all integer instructions (for example, shift, multiply, and divide are missing), nor any floating-point instructions.


Basic MIPS Implementation cont.

• But the key principles of creating a datapath and designing the control will become clear.

• Most concepts used in implementing the MIPS subset in this seminar and in the next seminar (‘Pipeline’) are the same basic ideas used to construct a broad spectrum of computers, from high-performance servers to general-purpose microprocessors to embedded processors.


Overview of the implementation:

• For the three classes of MIPS instructions, much of the work needed to implement them is the same, independent of the instruction class.

• For every instruction, the first two steps are identical:

1. Send the program counter (PC) to the instruction memory and fetch the instruction.
2. Read one or two registers, using fields of the instruction to select them.

• After these two steps, the actions depend on the instruction class. Fortunately, for each of the three instruction classes, the actions are largely the same.


Overview of the implementation (cont.)

3. All instructions except jump (j) use the ALU:
- the memory-reference instructions use it for an address calculation;
- the arithmetic-logical instructions use it to execute the operation;
- the branch instruction uses it for comparison.

• The simplicity and regularity of the instruction set simplify the implementation by making the execution of different instruction classes similar.


Overview of the implementation (cont.)

4. The actions now depend on the class:

- a memory-reference instruction needs to access the memory, to write data (store) or to read data (load);

- an arithmetic-logical instruction must write the ALU result back into a register;

- a branch instruction may need to change the next instruction address based on the comparison; otherwise, the PC is incremented by 4 to give the next instruction address.


High-level view of the MIPS implementation:

Figure 1: Abstract view of the MIPS subset implementation, showing the major functional units and the connections between them


Figure 1

- The input of a particular unit can come from two sources.
- These sources cannot simply be wired together.
- Instead, they are connected through an element called a data selector, which chooses one of the multiple sources.
- The data selector is in fact a multiplexer (mux).
- The mux selects from among its sources depending on the setting of its control lines.
- The control-line setting is based on information taken from the instruction being executed.
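A multiplexer can be sketched in Python as a function that forwards exactly one of its inputs, chosen by a control value. The function name `mux` and the operand values are illustrative assumptions; the example mimics choosing the second ALU input between a register value and an offset taken from the instruction.

```python
def mux(select, *inputs):
    """Hypothetical N-way multiplexer: forwards inputs[select] to the output.

    Unlike wiring the sources together, only one source drives the
    output, chosen by the control value `select`.
    """
    if not 0 <= select < len(inputs):
        raise ValueError("select value out of range for this mux")
    return inputs[select]


# Example: a 2-way mux choosing the ALU's second operand.
read_data_2 = 7            # value read from a register
sign_extended_offset = 16  # offset field from the instruction
print(mux(0, read_data_2, sign_extended_offset))  # register value
print(mux(1, read_data_2, sign_extended_offset))  # instruction offset
```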


High-level view of the MIPS implementation (cont.)

• Many other units are controlled depending on the type of instruction:

- The data memory must read on a load and write on a store.

- The register file must be written on a load and on an arithmetic-logical instruction.

- The ALU must perform one of several operations.


Figure 2: Basic implementation of MIPS subset with necessary multiplexers & control lines


Figure 2

- The top multiplexer controls what value replaces the PC: PC + 4 or the branch target address.
- The middle multiplexer steers either the ALU output (arithmetic-logical instructions) or the data memory output (loads) to the register file.
- The bottom multiplexer selects the second ALU input: either a register (arithmetic-logical instructions and branches) or the offset field of the instruction (loads and stores).

• In the following slides, more functional units will appear, the number of connections between them will increase, and, of course, more control signals will be needed.


Datapath

• The collection of state elements, computation elements, and interconnections that together provide a conduit for the flow and transformation of data in the processor during execution.


Control

• The component of the processor that commands the datapath, memory, and I/O devices according to the instructions of the program.


Abstract View of the DataPath

• The figure shows an abstract view of the datapath; let's first look at its elements:

[Figure: abstract view of the datapath: PC, instruction memory, register file, ALU, and data memory]


Abstract View of the DataPath

• Arithmetic Logic Unit (ALU): is a digital circuit that performs arithmetic and logical operations.

[Figure: abstract view of the datapath: PC, instruction memory, register file, ALU, and data memory]


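ALU design

• The figure shows a single 1-bit ALU cell, which performs the operation selected by its control inputs.

• 32 such cells form the 32-bit ALU; special attention is given to the most significant bit (MSB) cell, from which overflow is detected and the set-on-less-than comparison result is produced.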


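ALU design

• The four ALU control lines specify the operation: the two least significant bits select the basic operation (AND, OR, add, or set on less than), and the two upper bits invert input A (Ainvert) and negate input B (Bnegate).

ALU control lines   Function
0000                AND
0001                OR
0010                add
0110                subtract
0111                set on less than
1100                NOR

[Figure: ALU symbol with two 32-bit inputs A and B, a 4-bit ALU operation input, CarryIn, and outputs Result (32 bits), Zero, Overflow, and CarryOut]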
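The behaviour selected by each 4-bit ALU control code in the table can be sketched in Python. This is a behavioural model, not the gate-level cell design: the names `alu` and `to_signed` are hypothetical, results wrap to 32 bits, slt compares the operands as signed two's-complement values, and the Overflow and CarryOut outputs are omitted.

```python
MASK32 = 0xFFFF_FFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x8000_0000 else x


def alu(control, a, b):
    """Behavioural 32-bit ALU selected by the 4-bit control lines.

    Returns (result, zero); Overflow and CarryOut are not modelled.
    """
    a &= MASK32
    b &= MASK32
    if control == 0b0000:      # AND
        result = a & b
    elif control == 0b0001:    # OR
        result = a | b
    elif control == 0b0010:    # add
        result = a + b
    elif control == 0b0110:    # subtract
        result = a - b
    elif control == 0b0111:    # set on less than (signed comparison)
        result = 1 if to_signed(a) < to_signed(b) else 0
    elif control == 0b1100:    # NOR
        result = ~(a | b)
    else:
        raise ValueError(f"unused ALU control code {control:04b}")
    result &= MASK32           # keep 32 bits
    return result, result == 0  # Zero output
```

For example, `alu(0b0110, 5, 5)` returns `(0, True)`: subtracting equal operands raises Zero, which is how beq later detects equality.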


Abstract View of the DataPath

• Registers file: A state element that consists of a set of registers that can be read and written by supplying a register number to be accessed.

[Figure: abstract view of the datapath: PC, instruction memory, register file, ALU, and data memory]


Register file (reading):

[Figure: Reg 0 to Reg 31 feed two 32-bit multiplexers; the 5-bit ‘Read register 1’ and ‘Read register 2’ numbers drive the mux select lines, producing ‘Read data 1’ and ‘Read data 2’]

• To select the two source registers out of the 32 registers, apply their register numbers (addresses) to the select inputs of two multiplexers.


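Register file (writing):

• To specify the destination register out of the 32 registers, its 5-bit address is decoded by a 5-to-32 decoder, so that the data is written into that register only, and only when ‘Write’ is high.

[Figure: a 5-to-32 decoder, driven by the 5-bit ‘Write register number’ and gated by ‘Write’, enables one of Reg 0 to Reg 31 (each with data input D and enable E); the 32-bit ‘Register data’ is presented to all registers]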


Register file:

• Now we can combine the read and write structures into the complete register file:
- Inputs: Read register 1 (5 bits), Read register 2 (5 bits), Write register (5 bits), Write data (32 bits), and the Write control signal.
- Outputs: Read data 1 (32 bits) and Read data 2 (32 bits).
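The complete register file can be sketched as a small Python class: each read port indexes the register array the way a 5-bit register number drives a 32-to-1 multiplexer, and the write port builds a one-hot enable vector like the 5-to-32 decoder gated by Write. The class and method names are hypothetical. Real MIPS hardwires register 0 to zero; like the figures here, this sketch does not model that.

```python
class RegisterFile:
    """Hypothetical 32 x 32-bit register file: two read ports, one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, read_reg1, read_reg2):
        # Each 5-bit register number acts as the select of a 32-to-1 mux.
        return self.regs[read_reg1 & 0x1F], self.regs[read_reg2 & 0x1F]

    def write(self, write_reg, write_data, write_enable):
        # 5-to-32 decoder gated by Write: at most one enable line is 1.
        enables = [int(write_enable and i == (write_reg & 0x1F)) for i in range(32)]
        for i, enabled in enumerate(enables):
            if enabled:
                self.regs[i] = write_data & 0xFFFF_FFFF


rf = RegisterFile()
rf.write(9, 42, True)    # Write high: register 9 is updated
rf.write(10, 7, False)   # Write low: nothing changes
print(rf.read(9, 10))
```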


Abstract View of the DataPath

• Memory system:
– We can use either a unified memory or two memories (as in our case): one read-only memory for instructions and one read/write memory for data.

[Figure: abstract view of the datapath: PC, instruction memory, register file, ALU, and data memory]


Memory system

• Remember that each of the two memories can hold 2^32 bytes = 4 GB, or 2^30 = 1 G words (1 word = 4 bytes).

• To access a data word or an instruction, we must supply an address that is a multiple of 4; the four bytes starting at that address are then read or written, in a fashion similar to the register file.

[Figure: memory as an array of bytes with addresses 0, 1, 2, ..., 7, ..., 2^32 - 1]
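Aligned word access to a byte-addressed memory can be sketched in Python. Two assumptions not stated on the slides: the memory is big-endian (the most significant byte sits at the lowest address), and only bytes actually written are stored, in a dictionary, since allocating 2^32 bytes is impractical. The class name is hypothetical.

```python
class ByteAddressedMemory:
    """Hypothetical byte-addressed memory with aligned 32-bit word access.

    Assumed big-endian; unwritten bytes read as 0.
    """

    def __init__(self):
        self.bytes = {}

    @staticmethod
    def _check_aligned(address):
        # A word address must be a multiple of 4.
        if address % 4 != 0:
            raise ValueError(f"unaligned word address {address:#x}")

    def write_word(self, address, value):
        self._check_aligned(address)
        for i in range(4):  # most significant byte at the lowest address
            self.bytes[address + i] = (value >> (8 * (3 - i))) & 0xFF

    def read_word(self, address):
        self._check_aligned(address)
        value = 0
        for i in range(4):
            value = (value << 8) | self.bytes.get(address + i, 0)
        return value


mem = ByteAddressedMemory()
mem.write_word(8, 0x12345678)
print(hex(mem.read_word(8)))
```

Reading at address 6 raises `ValueError`, because 6 is not a multiple of 4.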


Abstract View of the DataPath

• Detailed datapath: now we can combine these elements to execute the program.

[Figure: abstract view of the datapath: PC, instruction memory, register file, ALU, and data memory]


Fetching an Instruction

• Program counter (PC): the register containing the address of the instruction in the program being executed.
• An instruction memory unit holds the instructions that are to be executed.
• We need an ALU that performs only addition (an adder) to calculate the address of the next instruction to fetch.

[Figure: elements for instruction fetch: a. instruction memory (read address in, instruction out), b. program counter, c. adder (Add, Sum), with the adder computing PC + 4]


The Register File access

• For R-type instructions, we read the contents of two registers, perform an ALU operation on them, and write the result back into a third register; the write needs the value to be written, the register number, and the write control signal.

[Figure: register file feeding the ALU for R-type instructions: Read register 1, Read register 2, Write register, Write data, Read data 1, Read data 2, Write (RegWrite), 3-bit ALU operation, ALU result, Zero; register fields taken from instruction bits [25-11]]


Data Memory

• For load and store instructions, we need the data memory and a unit that sign-extends the 16-bit constant (immediate) field of the I-type instruction to 32 bits. In addition, we use the existing ALU to compute the address to access.

– In a store:

• Read register 1 supplies the base for the address calculation, Read register 2 holds the data to be written, and MemWrite is set.

– In a load:

• Read register 1 supplies the base for the address calculation, Write register selects the destination for the data read from memory, and both MemRead and RegWrite are set.

[Figure: load/store datapath: register file, sign-extend unit (16 to 32 bits), ALU (3-bit ALU operation), and data memory (Address, Write data, Read data); control signals MemRead, MemWrite, RegWrite; instruction fields [25-21], [20-16], [15-0]]


Branch Equal

• For the beq instruction, two registers are compared, and a 16-bit offset is used to compute the branch target address relative to the branch. To implement this instruction, we sign-extend the offset, shift it left by 2, and add it to PC + 4 (the address of the instruction following the branch).

• Sign-extend: to increase the size of a data item by replicating the high-order sign bit of the original data item in the high-order bits of the larger destination data item.

• The ‘Shift left 2’ unit is simply a routing of the signals between input and output that appends 00 to the low-order end of the sign-extended offset field; no actual shift hardware is needed, since the shift amount is constant.

• Control logic decides, based on the Zero output of the ALU, whether the incremented PC or the branch target replaces the PC.
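The sign extension, ‘shift left 2’, and branch-target addition can be sketched in Python with explicit 32-bit wraparound. The function names are hypothetical; `zero` stands for the ALU's Zero output after subtracting the two compared registers.

```python
MASK32 = 0xFFFF_FFFF


def sign_extend16(imm16):
    """Replicate bit 15 of a 16-bit field into bits 31..16."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF_0000) if (imm16 & 0x8000) else imm16


def branch_target(pc, imm16):
    """beq target: (PC + 4) + (sign-extended offset << 2), wrapped to 32 bits."""
    offset = (sign_extend16(imm16) << 2) & MASK32  # 'shift left 2' appends 00
    return (pc + 4 + offset) & MASK32


def next_pc_beq(pc, imm16, zero):
    # Zero = 1 (registers equal) selects the branch target; otherwise PC + 4.
    return branch_target(pc, imm16) if zero else (pc + 4) & MASK32
```

For example, an offset of 0xFFFF (-1) makes `branch_target(0x1000, 0xFFFF)` return 0x1000, the branch itself, because the target is computed from PC + 4.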


Branch Equal Diagram


Designing the complete datapath

• Comparing the three previous slides to build a common datapath, we note:

– For the register file inputs, ‘Read register 1’ is always [25-21] and ‘Read register 2’ is [20-16] (R-type, store, and beq), but ‘Write register’ may be [15-11] (R-type) or [20-16] (load), so a mux is needed.

– For the ALU inputs, the first input is always ‘Read data 1’, while the second may be ‘Read data 2’ (R-type and beq) or the sign-extend output (lw and sw), so a mux is needed.

– The data written to the register file may come from the ALU output (R-type) or from memory (load).

– The next PC may come from the incremented PC (PC + 4) or from the beq branch target.
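Extracting the instruction fields routed to the register file, together with the ‘Write register’ multiplexer, can be sketched in Python. The field names rs, rt, rd, shamt, funct, and imm16 follow common MIPS usage and are not taken from the slides; `reg_dst` plays the role of the RegDst control signal.

```python
def decode_fields(instr):
    """Split a 32-bit MIPS instruction into the fields used by the datapath."""
    return {
        "op": (instr >> 26) & 0x3F,     # [31-26]
        "rs": (instr >> 21) & 0x1F,     # [25-21] -> Read register 1
        "rt": (instr >> 16) & 0x1F,     # [20-16] -> Read register 2 / load destination
        "rd": (instr >> 11) & 0x1F,     # [15-11] -> R-type destination
        "shamt": (instr >> 6) & 0x1F,   # [10-6]
        "funct": instr & 0x3F,          # [5-0]
        "imm16": instr & 0xFFFF,        # [15-0] -> sign-extend unit
    }


def write_register(fields, reg_dst):
    # 'Write register' mux: RegDst = 1 selects rd (R-type), 0 selects rt (load).
    return fields["rd"] if reg_dst else fields["rt"]


f = decode_fields(0x01364820)
print(f["op"], f["rs"], f["rt"], f["rd"], f["shamt"], f["funct"])
```

Decoding 0x01364820 gives op = 0, rs = 9, rt = 22, rd = 9, shamt = 0, funct = 32, the R-type example field values used in these slides; 0x8D280004 gives op = 35, rs = 9, rt = 8, imm16 = 4, the load example.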


Datapath Diagram

[Figure: single-cycle datapath: PC, instruction memory, register file, sign-extend (16 to 32 bits), shift-left-2, two adders, ALU with ALU control (Instruction [5-0]), data memory, and multiplexers controlled by RegDst, ALUSrc, MemtoReg, and PCSrc; other control signals: ALUOp, MemRead, MemWrite, RegWrite]


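Datapath and control

• From the diagram, bearing in mind that the ALU receives 3 control inputs from the ALU control unit (the low three bits of the 4-bit codes listed for the ALU, since NOR is not needed in this subset):
000 AND
001 OR
010 add (add, lw, sw)
110 subtract (sub, beq)
111 slt

• So, combining the eight control signals, we get: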


Datapath & control

[Figure: single-cycle datapath with control: the Control unit decodes Instruction [31-26] and drives RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite; ALU control uses ALUOp and Instruction [5-0]; PCSrc selects PC + 4 or the branch target]


R-type

[Figure: single-cycle datapath with control, showing the path used by an R-type instruction. Example instruction fields (op, rs, rt, rd, shamt, funct): 0, 9, 22, 9, 0, 32]


Load

[Figure: single-cycle datapath with control, showing the path used by a load. Example instruction fields (op, rs, rt, offset): 35, 9, 8, 4]


Branch equal

[Figure: single-cycle datapath with control, showing the path used by branch equal. Example instruction fields (op, rs, rt, offset): 4, 8, 21, 1]


ALU control design

[Figure: single-cycle datapath with control; the ALU control unit takes ALUOp from Control and Instruction [5-0] (funct) and drives the ALU]


ALU control design:

• The ALUOp signal from the main control should tell the ALU control to do one of three things:
– add (lw or sw);
– subtract (beq);
– perform the arithmetic/logical operation given by the function field (instruction [5-0]).

• So the ALU control has 8 input bits (6 funct bits + 2 ALUOp bits) and three output bits.

Inputs: ALUOp (ALUOp1, ALUOp0) and the funct field (F5-F0). Output: Operation (3-bit ALU control).

Instruction  ALUOp1  ALUOp0  F5 F4 F3 F2 F1 F0  Operation
lw, sw       0       0       X  X  X  X  X  X   010
beq          X       1       X  X  X  X  X  X   110
add          1       X       X  X  0  0  0  0   010
sub          1       X       X  X  0  0  1  0   110
and          1       X       X  X  0  1  0  0   000
or           1       X       X  X  0  1  0  1   001
slt          1       X       X  X  1  0  1  0   111
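The ALU control truth table can be transcribed into a Python function. `alu_control` is a hypothetical name; F5 and F4 are ignored as don't-cares, exactly as in the table, and the unused ALUOp combination 11 is simply treated like beq in this sketch.

```python
def alu_control(alu_op1, alu_op0, funct):
    """3-bit ALU operation from ALUOp and the funct field (truth-table sketch)."""
    if alu_op1 == 0 and alu_op0 == 0:   # lw, sw: add to form the address
        return 0b010
    if alu_op0 == 1:                    # beq: subtract to compare
        return 0b110
    # ALUOp1 = 1: R-type; only F3..F0 matter (F5, F4 are don't-cares).
    by_low_bits = {
        0b0000: 0b010,  # add
        0b0010: 0b110,  # sub
        0b0100: 0b000,  # and
        0b0101: 0b001,  # or
        0b1010: 0b111,  # slt
    }
    low4 = funct & 0xF
    if low4 not in by_low_bits:
        raise ValueError(f"funct {funct:06b} is not in this subset")
    return by_low_bits[low4]


print(format(alu_control(1, 0, 0b101010), "03b"))  # slt
```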


Main control unit design

[Figure: single-cycle datapath with the main Control unit, which takes Instruction [31-26] as input and drives RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite]


Main Control Unit design

• In this circuit, the control is completely combinational, so we can design it using gates or a PLA.

• From the diagram, we have 6 inputs and 9 outputs, as shown in the truth table:

Control signal   R-format  lw  sw  beq
Inputs
Op5              0         1   1   0
Op4              0         0   0   0
Op3              0         0   1   0
Op2              0         0   0   1
Op1              0         1   1   0
Op0              0         1   1   0
Outputs
RegDst           1         0   X   X
ALUSrc           0         1   1   0
MemtoReg         0         1   X   X
RegWrite         1         1   0   0
MemRead          0         1   0   0
MemWrite         0         0   1   0
Branch           0         0   0   1
ALUOp1           1         0   0   0
ALUOp0           0         0   0   1
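Because the main control is purely combinational, a PLA implementing it behaves like a fixed lookup on the opcode, which can be sketched in Python as a dictionary. The names are hypothetical and don't-care (X) entries are written as `None`. The `pc_src` helper assumes PCSrc is driven by Branch AND Zero; the truth table itself lists only Branch.

```python
# Control outputs per opcode (Op5..Op0), transcribed from the truth table.
# None marks a don't-care (X) entry.
CONTROL_TABLE = {
    0b000000: dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1, MemRead=0,
                   MemWrite=0, Branch=0, ALUOp1=1, ALUOp0=0),     # R-format
    0b100011: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1, MemRead=1,
                   MemWrite=0, Branch=0, ALUOp1=0, ALUOp0=0),     # lw (35)
    0b101011: dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0, MemRead=0,
                   MemWrite=1, Branch=0, ALUOp1=0, ALUOp0=0),     # sw (43)
    0b000100: dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0, MemRead=0,
                   MemWrite=0, Branch=1, ALUOp1=0, ALUOp0=1),     # beq (4)
}


def main_control(opcode):
    """Return the nine control outputs for a 6-bit opcode."""
    try:
        return CONTROL_TABLE[opcode & 0x3F]
    except KeyError:
        raise ValueError(f"opcode {opcode:06b} is not in this subset") from None


def pc_src(branch, zero):
    # Assumed gate: take the branch target only for beq when the ALU result is zero.
    return branch & zero


print(main_control(35)["MemRead"], pc_src(main_control(4)["Branch"], 1))
```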


Adding the Jump Instruction

• For the j instruction, the jump address is formed by concatenating the upper 4 bits of PC + 4 with the 26-bit address field of the J-type instruction shifted left by 2 (28 bits).

[Figure: single-cycle datapath extended for jump: Instruction [25-0] is shifted left 2 (26 to 28 bits) and concatenated with PC+4 [31-28] to form Jump address [31-0]; an additional multiplexer, controlled by the new Jump signal from Control, selects the next PC]
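The jump-address formation can be sketched in Python. `jump_target` is a hypothetical name; `pc` is the address of the jump instruction itself, so PC + 4 supplies the upper four bits.

```python
def jump_target(pc, instr):
    """j target: PC+4[31-28] concatenated with (IR[25-0] << 2)."""
    upper4 = (pc + 4) & 0xF000_0000        # bits 31..28 of PC + 4
    low28 = (instr & 0x03FF_FFFF) << 2     # 26-bit field becomes 28 bits
    return upper4 | low28


# Instruction word 0x08100000: opcode 2 (j), address field 0x0100000.
print(hex(jump_target(0x0040_0018, 0x0810_0000)))
```

Here the jump at address 0x00400018 goes to 0x00400000; the target always stays within the same 256 MB region as PC + 4.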


Performance of Single-Cycle Machines

• The processor we have designed is single-cycle: each instruction takes exactly one clock cycle, since all of its control signals are applied simultaneously.

• Let's assume that the operation times of the following units are:

– Memory: 2 ns

– ALU and adders: 2 ns

– Register file (read or write): 1 ns

• We will assume that multiplexers, control, sign extension, PC accesses, and wires have no delay.

• Which implementation is faster?
1. Every instruction operates in one clock cycle of fixed length.
2. Every instruction operates in one clock cycle of varying length.

• Let's look at the time needed by each instruction (in ns):

Instruction  Inst. fetch  Reg. read  ALU op  Memory  Reg. write  Total
R-type       2            1          2       0       1           6 ns
Load         2            1          2       2       1           8 ns
Store        2            1          2       2       -           7 ns
Branch       2            1          2       -       -           5 ns
Jump         2            -          -       -       -           2 ns


Fixed vs. Variable Cycle Length

• Let's assume a program has the following instruction mix: 24% loads, 12% stores, 44% R-type, 18% branches, 2% jumps.

• CPU execution time = Instruction count × CPI × Clock cycle time, with CPI = 1 in both cases.

• For the fixed cycle length, the cycle time is 8 ns, long enough for the longest instruction (load). Thus each instruction takes 8 ns to execute.

• For the variable cycle length, the average clock cycle is:

8 × 24% + 7 × 12% + 6 × 44% + 5 × 18% + 2 × 2% = 6.34 ns ≈ 6.3 ns

• The variable-clock implementation is clearly faster, but it is extremely hard to implement.

• So why not use the fixed-cycle implementation, which is only about 8/6.34 ≈ 1.26 times slower?

• When instructions such as multiply and divide are added, which can take tens of times longer than simple instructions, the single fixed cycle must stretch to fit the slowest of them, and this scheme becomes too slow (which is why single-cycle implementations are not used).
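The per-class totals and the weighted average can be reproduced with a short Python calculation using the unit delays assumed on the slide. The dictionary names are illustrative; each class total simply adds the delays of the units it uses in sequence.

```python
# Unit delays assumed on the slide (ns); MUXs, control, and wires are free.
MEM, ALU, REG = 2, 2, 1

# Units used by each instruction class, in order.
STEPS = {
    "load":   [MEM, REG, ALU, MEM, REG],
    "store":  [MEM, REG, ALU, MEM],
    "R-type": [MEM, REG, ALU, REG],
    "branch": [MEM, REG, ALU],
    "jump":   [MEM],
}
MIX = {"load": 0.24, "store": 0.12, "R-type": 0.44, "branch": 0.18, "jump": 0.02}

time_ns = {cls: sum(units) for cls, units in STEPS.items()}
fixed_cycle = max(time_ns.values())                       # longest instruction
variable_avg = sum(MIX[c] * time_ns[c] for c in MIX)      # weighted average
ratio = fixed_cycle / variable_avg                        # fixed vs. variable

print(time_ns)
print(f"fixed = {fixed_cycle} ns, variable avg = {variable_avg:.2f} ns, ratio = {ratio:.2f}")
```

Running it gives totals of 8, 7, 6, 5, and 2 ns, an average of 6.34 ns, and a ratio of about 1.26.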


A Multicycle Implementation

• To shorten the clock cycle, each instruction is broken into several steps, one clock cycle per step, so an instruction executes over several cycles (the clock period is fixed, but the number of cycles per instruction varies).

• The multicycle implementation allows a functional unit to be used more than once per instruction, as long as it is used on different clock cycles.

[Figure: multicycle datapath overview: PC, a single memory for instructions and data, instruction register, memory data register, register file, A and B registers, ALU, and ALUOut]

We now have only a single memory unit and a single ALU. In addition, we need registers to hold the output of each step for use in the next cycle.


New Registers and MUXs

• We have now added several new registers (which are transparent to the programmer) and some new MUXs:

– Instruction Register (IR): the instruction fetched.

– Memory Data Register (MDR): data read from memory.

– A, B: registers read from the register file.

– ALUOut: result of the ALU operation.

• The new MUXs added are:

– An additional MUX on the first ALU input, which chooses between the A register and the PC.

– The MUX on the second ALU input is changed from a 2-way to a 4-way MUX. The additional inputs are the constant 4 (used to increment the PC) and the sign-extended and shifted offset field (used in beq).


Multicycle Diagram

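• IR needs a write control signal (IRWrite), but the other new registers don't.
• A MUX selects between two sources for the memory address (IorD: PC or ALUOut); the memory needs a read signal.
• The PC and A feed one ALU input through a MUX; four sources feed the other input.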

[Figure: multicycle datapath: PC, memory (IorD, MemRead, MemWrite), instruction register (IRWrite), memory data register, register file (RegDst, RegWrite, MemtoReg), sign-extend and shift-left-2 units, A and B registers, ALU with the ALUSrcA and 4-way ALUSrcB multiplexers, ALU control (ALUOp), and ALUOut]


Multicycle Datapath & control

• There are 3 possible sources for the PC value:
– The output of the ALU, which is PC + 4;
– The register ALUOut, which holds the computed branch target address;
– The lower 26 bits of the IR shifted left by 2, concatenated with the upper 4 bits of the PC.

• Two write-control lines for the PC: PCWrite (unconditional write) and PCWriteCond (write only if the ALU Zero output is set).

[Figure: complete multicycle datapath with control: PC, memory, instruction register, memory data register, register file, A, B, ALU, ALUOut, jump-address logic (Instruction [25-0] shifted left 2, concatenated with PC [31-28]), a 3-input PC source multiplexer, and the control outputs PCWriteCond, PCWrite, IorD, MemRead, MemWrite, MemtoReg, IRWrite, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst, driven from Op[5-0] = Instruction [31-26]]
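The PC update logic of the multicycle datapath, with its three-input PCSource multiplexer and the two write controls, can be sketched in Python. The function and parameter names are hypothetical; the PCSource encoding (0 = ALU result, 1 = ALUOut, 2 = jump address) follows the multiplexer input numbers in the diagram, and `pc` is assumed to already hold PC + 4 when a jump completes.

```python
def update_pc(pc, alu_result, alu_out, ir, pc_source, pc_write, pc_write_cond, zero):
    """Next PC value in the multicycle datapath (sketch).

    pc_source: 0 -> ALU result (PC + 4), 1 -> ALUOut (branch target),
    2 -> jump address. The PC is written only if PCWrite is set, or if
    PCWriteCond is set and the ALU Zero output is 1.
    """
    jump_address = (pc & 0xF000_0000) | ((ir & 0x03FF_FFFF) << 2)
    candidates = (alu_result, alu_out, jump_address)
    if pc_write or (pc_write_cond and zero):
        return candidates[pc_source] & 0xFFFF_FFFF
    return pc  # PC keeps its value this cycle


# Branch taken: PCWriteCond set, Zero = 1, PCSource = 1 (ALUOut).
print(hex(update_pc(0x1004, 0, 0x2000, 0, 1, False, True, 1)))
```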


1) Instruction Fetch

Fetch the instruction from memory and compute the address of the next sequential instruction:
IR = Memory[PC];
PC = PC + 4;

[Figure: complete multicycle datapath with control]


2) Instruction Decode (ID) and register fetch

Read the registers from the register file and compute the potential branch target address (even if it turns out not to be needed):
A = Reg[IR[25-21]];
B = Reg[IR[20-16]];
ALUOut = PC + (sign-extend(IR[15-0]) << 2);

[Figure: complete multicycle datapath with control]


3) Execution (EX), memory address computation, or branch completion

In this step the operation is determined by the instruction class:
A. Memory reference: ALUOut = A + sign-extend(IR[15-0]);
B. R-type: ALUOut = A op B;

[Figure: complete multicycle datapath with control]

C. Branch: if (A == B) PC = ALUOut;
D. Jump: PC = PC[31-28] || (IR[25-0] << 2); where || denotes concatenation


4) Memory access or R-type completion

During this step a load or store instruction accesses memory, or an arithmetic-logical instruction writes its result:
A. Memory reference:
MDR = Memory[ALUOut]; (load)
Memory[ALUOut] = B; (store)
B. R-type: Reg[IR[15-11]] = ALUOut;

[Figure: multicycle datapath with control unit]


5) Memory read completion

The load completes by writing the value read from memory (held in MDR) into register rt: Reg[IR[20-16]] = MDR;
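A load writes its destination through the rt field IR[20-16], while an R-type instruction writes through rd, IR[15-11]; this is why the datapath needs the RegDst multiplexer. The sketch below splits an instruction word into its fields and performs step 5. The helper names are hypothetical, and the encoding 0x8E280008 (lw $t0, 8($s1), with $t0 = register 8 and $s1 = register 17) is an assumed example.

```python
def fields(ir):
    """Split a 32-bit MIPS instruction word into its standard fields."""
    return {
        "op": (ir >> 26) & 0x3F,   # IR[31-26]
        "rs": (ir >> 21) & 0x1F,   # IR[25-21]
        "rt": (ir >> 16) & 0x1F,   # IR[20-16]
        "rd": (ir >> 11) & 0x1F,   # IR[15-11]
        "imm": ir & 0xFFFF,        # IR[15-0]
    }


def memory_read_completion(ir, mdr, regs):
    """Step 5 (loads only): Reg[IR[20-16]] = MDR."""
    regs[fields(ir)["rt"]] = mdr


regs = [0] * 32
lw_t0 = 0x8E280008   # lw $t0, 8($s1): op=35, rs=17, rt=8, imm=8
memory_read_completion(lw_t0, 1234, regs)
```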

[Figure: multicycle datapath with control unit]


Summary of the Steps

• Instruction fetch (all instruction classes):
– IR = Memory[PC]; PC = PC + 4
• Instruction decode/register fetch (all instruction classes):
– A = Reg[IR[25-21]]; B = Reg[IR[20-16]]
– ALUOut = PC + (sign-extend(IR[15-0]) << 2)
• Execution, address computation, branch/jump completion:
– R-type: ALUOut = A op B
– Memory reference: ALUOut = A + sign-extend(IR[15-0])
– Branch: if (A == B) then PC = ALUOut
– Jump: PC = PC[31-28] || (IR[25-0] << 2), where || denotes concatenation
• Memory access or R-type completion:
– R-type: Reg[IR[15-11]] = ALUOut
– Memory reference: Load: MDR = Memory[ALUOut] or Store: Memory[ALUOut] = B
• Memory read completion:
– Load: Reg[IR[20-16]] = MDR
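Since each step of the summary takes one clock cycle in the multicycle design, the number of cycles per instruction class is just the number of steps that class goes through. A small sketch that tabulates this (the step labels are paraphrased from the summary):

```python
# One clock cycle per step, following the summary of the steps.
STEPS = {
    "R-type": ["fetch", "decode/register fetch", "execution", "R-type completion"],
    "load": ["fetch", "decode/register fetch", "address computation",
             "memory access", "memory read completion"],
    "store": ["fetch", "decode/register fetch", "address computation", "memory access"],
    "branch": ["fetch", "decode/register fetch", "branch completion"],
    "jump": ["fetch", "decode/register fetch", "jump completion"],
}


def cycles(instruction_class):
    """Clock cycles an instruction of this class takes in the multicycle design."""
    return len(STEPS[instruction_class])


for cls in STEPS:
    print(f"{cls:7s} {cycles(cls)} cycles")
```

Loads are the longest instruction (5 cycles); branches and jumps finish after 3.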


Hardwired control unit design

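States of the control finite-state machine (Start = state 0):

• State 0, Instruction Fetch: next state 1
• State 1, Instruction Decode: next state 2 (Load + Store), 6 (R-type), 8 (Branch), 9 (Jump)
• State 2, Address Computation: next state 3 (Load), 5 (Store)
• State 3, Memory Read: next state 4
• State 4, Write Back: next state 0
• State 5, Memory Write: next state 0
• State 6, Execution: next state 7
• State 7, R-Type Completion: next state 0
• State 8, Branch Completion: next state 0
• State 9, Jump Completion: next state 0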
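The state diagram can be written as a next-state function and used to trace the states each instruction class visits. This is a sketch: the lw and sw opcodes match the values quoted later in this section, while the R-type (000000), beq (000100) and j (000010) opcodes are the standard MIPS encodings, added here as assumptions; `next_state` and `trace` are hypothetical names.

```python
# MIPS opcodes (IR[31-26]).
LW, SW, RTYPE, BEQ, J = 0b100011, 0b101011, 0b000000, 0b000100, 0b000010


def next_state(state, op):
    """Next-state function of the 10-state control FSM."""
    if state == 0:                      # Instruction fetch
        return 1
    if state == 1:                      # Decode: dispatch on the opcode
        dispatch = {LW: 2, SW: 2, RTYPE: 6, BEQ: 8, J: 9}
        if op not in dispatch:
            raise ValueError(f"unsupported opcode {op:06b}")
        return dispatch[op]
    if state == 2:                      # Address computation
        return 3 if op == LW else 5
    if state == 3:                      # Memory read
        return 4
    if state == 6:                      # Execution
        return 7
    return 0                            # States 4, 5, 7, 8, 9 go back to fetch


def trace(op):
    """States visited by one instruction, starting from fetch."""
    states, s = [0], next_state(0, op)
    while s != 0:
        states.append(s)
        s = next_state(s, op)
    return states
```

The length of each trace equals the cycle count of that instruction class, e.g. `trace(LW)` visits five states.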


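Control signals set in each state:

• State 0 (Instruction Fetch): MemRead, ALUSelA=0, IorD=0, IRWrite, ALUSelB=01, ALUOp=00, PCWrite, PCSource=00
• State 1 (Instruction Decode): ALUSelA=0, ALUSelB=11, ALUOp=00, TargetWrite
• State 2 (Address Computation): ALUSelA=1, ALUSelB=10, ALUOp=00
• State 3 (Memory Read): MemRead, ALUSelA=1, IorD=1, ALUSelB=10, ALUOp=00
• State 4 (Write Back): MemRead, ALUSelA=1, IorD=1, RegWrite, MemtoReg=1, RegDst=0, ALUSelB=10, ALUOp=00
• State 5 (Memory Write): MemWrite, ALUSelA=1, IorD=1, ALUSelB=10, ALUOp=00
• State 6 (Execution): ALUSelA=1, ALUSelB=00, ALUOp=10
• State 7 (R-Type Completion): ALUSelA=1, RegDst=1, RegWrite, MemtoReg=0, ALUSelB=00, ALUOp=10
• State 8 (Branch Completion): ALUSelA=1, ALUSelB=00, ALUOp=01, PCWriteCond, PCSource=01
• State 9 (Jump Completion): PCWrite, PCSource=10

Transitions: Start → 0 → 1; from 1 to 2 (Load + Store), 6 (R-type), 8 (Branch), 9 (Jump); from 2 to 3 (Load) or 5 (Store); 3 → 4; 6 → 7.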


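Control unit design:

• The input to the control circuit is the opcode field IR[31-26] (Op[5-0]).

• The state diagram has 10 states, so a 4-bit state register (4 flip-flops) is needed, since 2^3 = 8 < 10 ≤ 16 = 2^4. The current state, together with the opcode input, determines the next state and the control outputs.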
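The flip-flop count is ceil(log2(number of states)). A short sketch that computes it exactly with integer arithmetic and lists the resulting binary state codes S3S2S1S0 (the function names are hypothetical):

```python
def state_register_bits(num_states):
    """Flip-flops needed to encode num_states states in binary: ceil(log2(n))."""
    return max(1, (num_states - 1).bit_length())


NUM_STATES = 10
BITS = state_register_bits(NUM_STATES)   # 4 flip-flops


def encode(state):
    """Binary code S3S2S1S0 for a state number."""
    return format(state, f"0{BITS}b")


print(BITS, [encode(s) for s in range(NUM_STATES)])
```

With this binary encoding, state 0 is 0000 and state 9 is 1001; codes 1010 through 1111 are unused.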


• The following table, derived from the state diagram, is used instead of a full state table to simplify the design. Note the following:

• All control outputs are determined by the current state alone, e.g.

– PCWrite = 1 when S3S2S1S0 = 0000 or 1001 (state 0 or state 9)

• Each of the ten NextState signals is determined by the current state and the opcode input, e.g.

– NextState2 = 1 when S3S2S1S0 = 0001 and (Op = 100011 or Op = 101011), i.e. state 1 with a lw or sw instruction

• The four next-state bits NS3-NS0 are then derived from the NextState signals, e.g.

– NS0 = 1 when NextState1, 3, 5, 7, or 9 is true (the odd-numbered states)
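Each next-state bit NSk is an OR of the NextState signals whose state number has bit k set. The sketch below derives those OR terms and encodes a one-hot set of NextState signals into NS3-NS0, assuming the binary state encoding above; the function names are hypothetical.

```python
NUM_STATES = 10
STATE_BITS = 4


def or_terms(k):
    """NextState signals that feed the OR gate for bit NSk."""
    return [s for s in range(NUM_STATES) if (s >> k) & 1]


def ns_bits(next_state_signals):
    """Encode one-hot NextState0..NextState9 into the tuple (NS3, NS2, NS1, NS0)."""
    return tuple(
        int(any(next_state_signals[s] for s in or_terms(k)))
        for k in reversed(range(STATE_BITS))
    )


for k in reversed(range(STATE_BITS)):
    print(f"NS{k} = " + " + ".join(f"NextState{s}" for s in or_terms(k)))
```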


Output        Current states                              Op
PCWrite       state0 + state9
PCWriteCond   state8
IorD          state3 + state5
MemRead       state0 + state3
MemWrite      state5
IRWrite       state0
MemtoReg      state4
PCSource1     state9
PCSource0     state8
ALUOp1        state6
ALUOp0        state8
ALUSrcB1      state1 + state2
ALUSrcB0      state0 + state1
ALUSrcA       state2 + state6 + state8
RegWrite      state4 + state7
RegDst        state7
NextState0    state4 + state5 + state7 + state8 + state9
NextState1    state0
NextState2    state1                                      (Op = 'lw') + (Op = 'sw')
NextState3    state2                                      (Op = 'lw')
NextState4    state3
NextState5    state2                                      (Op = 'sw')
NextState6    state1                                      (Op = 'R-type')
NextState7    state6
NextState8    state1                                      (Op = 'beq')
NextState9    state1                                      (Op = 'jmp')
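Because the outputs depend only on the current state, the output rows of this table can be transcribed directly into a lookup: each output is the OR of the states listed for it. This sketch covers only the 16 control outputs (the NextState rows also need the opcode); `OUTPUTS` and `control_word` are hypothetical names.

```python
# Each output is the OR of the states listed for it in the control table.
OUTPUTS = {
    "PCWrite": {0, 9},
    "PCWriteCond": {8},
    "IorD": {3, 5},
    "MemRead": {0, 3},
    "MemWrite": {5},
    "IRWrite": {0},
    "MemtoReg": {4},
    "PCSource1": {9},
    "PCSource0": {8},
    "ALUOp1": {6},
    "ALUOp0": {8},
    "ALUSrcB1": {1, 2},
    "ALUSrcB0": {0, 1},
    "ALUSrcA": {2, 6, 8},
    "RegWrite": {4, 7},
    "RegDst": {7},
}


def control_word(state):
    """Set of control outputs asserted (= 1) in the given state."""
    return {name for name, states in OUTPUTS.items() if state in states}


for s in range(10):
    print(s, sorted(control_word(s)))
```

For instance, state 0 asserts PCWrite, MemRead, IRWrite and ALUSrcB0 (so ALUSrcB = 01 selects the constant 4 for PC + 4).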


PLA implementation of the control logic


Sequential Control Design:

• A control unit involves four design choices: the initial representation, the sequencing control, the logic representation, and the implementation technique. Several options exist for each, and each may be chosen independently of the others.

                          Hardwired control              Microprogrammed control
Initial representation    Finite state diagram           Microprogram
Sequencing control        Explicit next-state function   Microprogram counter + dispatch ROMs
Logic representation      Logic equations                Truth tables
Implementation technique  PLA                            ROM
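The cost of the truth-table/ROM option can be estimated from the control table. This back-of-the-envelope sketch assumes the 16 control outputs and 4 next-state bits listed there and a 6-bit opcode, and counts raw ROM bits only.

```python
STATE_BITS = 4        # S3..S0
OPCODE_BITS = 6       # Op[5-0]
CONTROL_OUTPUTS = 16  # PCWrite ... RegDst in the control table
NEXT_STATE_BITS = 4   # NS3..NS0

# One ROM addressed by {state, opcode} that produces every signal.
single_rom_bits = 2 ** (STATE_BITS + OPCODE_BITS) * (CONTROL_OUTPUTS + NEXT_STATE_BITS)

# The control outputs depend only on the state, so they can sit in a small ROM
# addressed by the state alone; only the next-state logic needs the opcode.
split_rom_bits = (2 ** STATE_BITS * CONTROL_OUTPUTS
                  + 2 ** (STATE_BITS + OPCODE_BITS) * NEXT_STATE_BITS)

print(single_rom_bits, split_rom_bits)
```

The single ROM needs 1024 words of 20 bits (20,480 bits), while splitting it reduces the total to 4,352 bits. Most of the entries are still wasted on unused opcodes, which is one reason a PLA, storing only the needed product terms, can be much smaller.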


References:

• David Patterson and John Hennessy, Computer Organization and Design, 3rd ed.

• M. Morris Mano and Charles Kime, Logic and Computer Design Fundamentals, 3rd ed.
