arquitectura de computadoras - cinvestavadiaz/arqcomp/10-multicycle.pdf · tecnologías de...
TRANSCRIPT
Laboratorio deTecnologías de Información
Arquitectura de Computadoras Multicycle- 1
Designing a Multicycle Designing a Multicycle ProcessorProcessor
Arquitectura de ComputadorasArquitectura de ComputadorasArturo DArturo Dííaz Paz Péérezrez
Centro de InvestigaciCentro de Investigacióón y de Estudios Avanzados del IPNn y de Estudios Avanzados del IPNLaboratorio de TecnologLaboratorio de Tecnologíías de Informacias de Informacióónn
[email protected]@cinvestav.mx
Recap: Putting it All Together: A Single Cycle Recap: Putting it All Together: A Single Cycle ProcessorProcessor
32
ALUctr
Clk
busW
RegWr
3232
busA
32busB
55 5
Rw Ra Rb32 32-bitRegisters
Rs
Rt
Rt
RdRegDst
Extender
Mux
Mux
3216imm16
ALUSrc
ExtOp
Mux
MemtoReg
Clk
Data InWrEn
32Adr
DataMemory
32
MemWrA
LU
InstructionFetch Unit
Clk
Zero
Instruction<31:0>
0
1
0
1
01<21:25>
<16:20>
<11:15>
<0:15>
Imm16RdRsRt
MainControl
op6
ALUControlfunc
6
3ALUop
ALUctr3
RegDst
ALUSrc:
Instr<5:0>
Instr<31:26>
Instr<15:0>
nPC_sel
The The ““Truth TableTruth Table”” for the Main Controlfor the Main Control
R-type ori lw sw beq jumpRegDstALUSrcMemtoRegRegWriteMemWritenPC_selJumpExtOpALUop (Symbolic)
1001000x
“R-type”
01010000
Or
01110001
Add
x1x01001
Add
x0x0010x
Subtract
xxx0001x
xxx
op 00 0000 00 1101 10 0011 10 1011 00 0100 00 0010
ALUop <2> 1 0 0 0 0 xALUop <1> 0 1 0 0 0 xALUop <0> 0 0 0 0 1 x
MainControl
op6
ALUControl(Local)
func
3
6
ALUop
ALUctr3
RegDstALUSrc
:
Laboratorio deTecnologías de Información
Arquitectura de Computadoras Multicycle- 4
PLA Implementation of the Main ControlPLA Implementation of the Main Control
op<0>
op<5>. .op<5>. .<0>
op<5>. .<0>
op<5>. .<0>
op<5>. .<0>
op<5>. .<0>
R-type ori lw sw beq jumpRegWrite
ALUSrc
MemtoRegMemWrite
BranchJump
RegDst
ExtOp
ALUop<2>ALUop<1>ALUop<0>
Laboratorio deTecnologías de Información
Arquitectura de Computadoras Multicycle- 5
Systematic Generation of ControlSystematic Generation of Control
♦
In our single-cycle processor, each instruction is realized by exactly one control command or “microinstruction”■
in general, the controller is a finite state machine■
microinstruction can also control sequencing (see later)
Control Logic / Store(PLA, ROM)
OPcode
Datapath
Inst
ruct
ion
Decode
Con
ditio
nsControlPoints
microinstruction
Laboratorio deTecnologías de Información
Arquitectura de Computadoras Multicycle- 6
Outline of TodayOutline of Today’’s Lectures Lecture
♦
Recap: Single cycle processor♦
Faster designs
♦
Multicycle Datapath♦
Performance Analysis
♦
Multicycle Control
Laboratorio deTecnologías de Información
Arquitectura de Computadoras Multicycle- 7
Abstract View of our single Abstract View of our single cycle processorcycle processor
♦
looks like a FSM with PC as state
PC
Nex
t PC
Reg
iste
rFe
tch ALU Reg
. W
rt
Mem
Acc
ess
Dat
aM
emInst
ruct
ion
Fetc
h
Res
ult S
tore
ALU
ctr
Reg
Dst
ALU
Src
ExtO
p
Mem
Wr
Equ
al
nPC
_sel
Reg
Wr
Mem
Wr
Mem
Rd
MainControl
ALUcontrol
op
fun
Ext
Laboratorio deTecnologías de Información
Arquitectura de Computadoras Multicycle- 8
WhatWhat’’s wrong with our CPI=1 s wrong with our CPI=1 processor?processor?
♦
Long Cycle Time♦
All instructions take as much time as the slowest
♦
Real memory is not as nice as our idealized memory■
cannot always get the job done in one (short) cycle
PC Inst Memory mux ALU Data Mem mux
PC Reg
FileInst Memory mux ALU mux
PC Inst Memory mux ALU Data Mem
PC Inst Memory cmp mux
Reg
File
Reg
File
Reg
File
Arithmetic & Logical
Load
Store
Branch
Critical Path
setup
setup
setup
Laboratorio deTecnologías de Información
Arquitectura de Computadoras Multicycle- 9
Memory Access TimeMemory Access Time
♦
Physics => fast memories are small (large memories are slow)
■
question: register file vs. memory♦
=> Use a hierarchy of memories
Storage Array
selected word line
addressstorage cell
bit line
sense ampsaddressdecoder
CacheProcessor
1 time-period
proc
. bus
L2Cache
mem
. bus
2-3 time-periods 20 - 50 time-periods
memory
Laboratorio deTecnologías de Información
Arquitectura de Computadoras Multicycle- 10
Reducing Cycle TimeReducing Cycle Time
♦
Cut combinational dependency graph and insert register / latch♦
Do same work in two fast cycles, rather than one slow one♦
May be able to short-circuit path and remove some components for some instructions!
storage element
Acyclic CombinationalLogic
storage element
storage element
Acyclic CombinationalLogic (A)
storage element
storage element
Acyclic CombinationalLogic (B)
⇒
Laboratorio deTecnologías de Información
Arquitectura de Computadoras Multicycle- 11
Basic Limits on Cycle TimeBasic Limits on Cycle Time
♦
Next address logic■
PC <= branch ? PC + offset : PC + 4♦
Instruction Fetch■
InstructionReg
<= Mem[PC]♦
Register Access■
A <= R[rs]♦
ALU operation■
R <= A + B
PC
Nex
t PC
Ope
rand
Fetc
h Exec Reg
. Fi
le
Mem
Acc
ess
Dat
aM
emInst
ruct
ion
Fetc
h
Res
ult S
tore
ALU
ctr
Reg
Dst
ALU
Src
ExtO
p
Mem
Wr
nPC
_sel
Reg
Wr
Mem
Wr
Mem
Rd
Control
Laboratorio deTecnologías de Información
Arquitectura de Computadoras Multicycle- 12
Partitioning the CPI=1 Partitioning the CPI=1 DatapathDatapath
♦
Add registers between smallest steps
♦
Place enables on all registers
PC
Nex
t PC
Ope
rand
Fetc
h Exec Reg
. Fi
le
Mem
Acc
ess
Dat
aM
emInst
ruct
ion
Fetc
h
Res
ult S
tore
ALU
ctr
Reg
Dst
ALU
Src
ExtO
p
Mem
Wr
nPC
_sel
Reg
Wr
Mem
Wr
Mem
Rd
Equ
al
Laboratorio deTecnologías de Información
Arquitectura de Computadoras Multicycle- 13
Example Multicycle Example Multicycle DatapathDatapath
♦
Critical Path ?
PC
Nex
t PC
Ope
rand
Fetc
h
Inst
ruct
ion
Fetc
h
nPC
_sel
IRRegFile E
xtA
LU Reg
. Fi
le
Mem
Acc
ess
Dat
aM
em
Res
ult S
tore
Reg
Dst
Reg
Wr
Mem
Wr
Mem
Rd
S
M
Mem
ToR
eg
Equ
al
ALU
ctr
ALU
Src
ExtO
p
A
B
E
Laboratorio deTecnologías de Información
Arquitectura de Computadoras Multicycle- 14
Recall: StepRecall: Step--byby--step Processor step Processor DesignDesign
Step 1: ISA => Logical Register Transfers
Step 2: Components of the Datapath
Step 3: RTL + Components => Datapath
Step 4: Datapath + Logical RTs => Physical RTs
Step 5: Physical RTs => Control
Laboratorio deTecnologías de Información
Arquitectura de Computadoras Multicycle- 15
Step 4: RStep 4: R--rtypertype (add, sub, . . .)(add, sub, . . .)
♦
Logical Register Transfer
♦
Physical Register Transfers
inst Logical Register Transfers
ADDU R[rd] <– R[rs] + R[rt]; PC <– PC + 4
inst Physical Register TransfersIR <–
MEM[pc]ADDU A<– R[rs]; B <– R[rt]
S <– A + BR[rd] <– S; PC <– PC + 4
Tim
e
Exe
c
Reg
. Fi
le
Mem
Acc
ess
Dat
aM
em
S
M
Reg
File
PC
Nex
t PC
IR
Inst
. Mem A
B
E
Laboratorio deTecnologías de Información
Arquitectura de Computadoras Multicycle- 16
Step 4: Logical Step 4: Logical immedimmed
♦
Logical Register Transfer
♦
Physical Register Transfers
inst Logical Register Transfers
ORI R[rt] <– R[rs] OR ZExt(Im16); PC <– PC + 4
inst Physical Register TransfersIR <–
MEM[pc]ORI A<– R[rs]; B <– R[rt]
S <– A or ZExt(Im16)R[rt] <– S; PC <– PC + 4
Tim
e
Exe
c
Reg
. Fi
le
Mem
Acc
ess
Dat
aM
em
S
M
Reg
File
PC
Nex
t PC
IR
Inst
. Mem A
B
E
Laboratorio deTecnologías de Información
Arquitectura de Computadoras Multicycle- 17
Step 4 : LoadStep 4 : Load
♦
Logical Register Transfer
♦
Physical Register Transfers
inst Logical Register Transfers
LW R[rt] <– MEM[R[rs] + SExt(Im16)];
PC <– PC + 4inst Physical Register Transfers
IR <–
MEM[pc]LW A<– R[rs]; B <– R[rt]
S <– A + SExt(Im16)M <– MEM[S]R[rd] <– M; PC <– PC + 4
Tim
e
Exe
c
Reg
. Fi
le
Mem
Acc
ess
Dat
aM
em
S
M
Reg
File
PC
Nex
t PC
IR
Inst
. Mem A
B
E
Laboratorio deTecnologías de Información
Arquitectura de Computadoras Multicycle- 18
Step 4 : StoreStep 4 : Store
♦
Logical Register Transfer
♦
Physical Register Transfers
inst Logical Register Transfers
SW MEM[R[rs] + SExt(Im16)] <– R[rt];
PC <– PC + 4
inst Physical Register TransfersIR <–
MEM[pc]SW A<– R[rs]; B <– R[rt]
S <– A + SExt(Im16); MEM[S] <– B PC <– PC + 4
Tim
e
Exe
c Reg
. Fi
le
Mem
Acc
ess
Dat
aM
em
S
M
Reg
File
PC
Nex
t PC
IR
Inst
. Mem A
B
E
Laboratorio deTecnologías de Información
Arquitectura de Computadoras Multicycle- 19
Step 4 : BranchStep 4 : Branch
♦
Logical Register Transfer
♦
Physical Register Transfers
Exe
c
Reg
. Fi
le
Mem
Acc
ess
Dat
aM
em
S
M
Reg
File
PC
Nex
t PC
IR
Inst
. Mem A
B
E
inst Logical Register Transfers
BEQ if R[rs] == R[rt]
then PC <= PC + 4+SExt(Im16) || 00
else PC <= PC + 4inst Physical Register Transfers
IR <–
MEM[pc]BEQ E<– (R[rs] = R[rt])
if E then PC <– PC + 4 else PC <–PC+4+SExt(Im16)||00
Tim
e
Laboratorio deTecnologías de Información
Arquitectura de Computadoras Multicycle- 20
Alternative datapath (book): Alternative datapath (book): Multiple Cycle DatapathMultiple Cycle Datapath♦
Minimizes Hardware: 1 memory, 1 adder
IdealMemoryWrAdrDin
RAdr
32
32
32Dout
MemWr32
AL
U
3232
ALUOp
ALUControl
Instruction Reg
32
IRWr
32
Reg File
Ra
Rw
busW
Rb5
5
32busA
32busB
RegWr
Rs
Rt
Mux
0
1
Rt
Rd
PCWr
ALUSelA
Mux 01
RegDst
Mux
0
1
32
PC
MemtoReg
Extend
ExtOp
Mux
0
132
0
1
23
4
16Imm 32
<< 2
ALUSelB
Mux
1
0
Target32
Zero
ZeroPCWrCond PCSrc BrWr
32
IorD
AL
U O
ut
Laboratorio deTecnologías de Información
Arquitectura de Computadoras Multicycle- 21
Our Control ModelOur Control Model
♦
State specifies control points for Register Transfer♦
Transfer occurs upon exiting state (same falling edge)
State X
Register TransferControl Points
Depends on Input
Control State
Next StateLogic
Output Logic
inputs (conditions)
outputs (control points)
Step 4 Step 4 ⇒⇒
Control Specification for Control Specification for multicyclemulticycle procproc
IR <= MEM[PC]
R-type
A <= R[rs]B <= R[rt]
S <= A fun B
R[rd] <= SPC <= PC + 4
S <= A or ZX
R[rt] <= SPC <= PC + 4
ORi
S <= A + SX
R[rt] <= MPC <= PC + 4
M <= MEM[S]
LW
S <= A + SX
MEM[S] <= BPC <= PC + 4
BEQ
PC <=Next(PC,Equal)
SW
“instruction fetch”
“decode / operand fetch”
Exe
cute
Mem
ory
Writ
e-ba
ck
Laboratorio deTecnologías de Información
Arquitectura de Computadoras Multicycle- 23
Traditional FSM ControllerTraditional FSM Controller
State
6
4
11nextState
op
Equal
control points
state op condnextstate control points
Truth Table
datapath
State
Laboratorio deTecnologías de Información
Arquitectura de Computadoras Multicycle- 24
Step 5 Step 5 ⇒⇒
((datapathdatapath + state + state diagramdiagram ⇒ ⇒
control)control)
♦
Translate RTs
into control points
♦
Assign states
♦
Then go build the controller
Mapping Mapping RTsRTs to Control Pointsto Control PointsIR <= MEM[PC]
R-type
A <= R[rs]B <= R[rt]
S <= A fun B
R[rd] <= SPC <= PC + 4
S <= A or ZX
R[rt] <= SPC <= PC + 4
ORi
S <= A + SX
R[rt] <= MPC <= PC + 4
M <= MEM[S]
LW
S <= A + SX
MEM[S] <= BPC <= PC + 4
BEQPC <=
Next(PC,Equal)
SW
“instruction fetch”
“decode”
imem_rd, IRen
ALUfun, Sen
RegDst, RegWr,PCen
Aen, Ben,Een
Exe
cute
Mem
ory
Writ
e-ba
ck
Assigning StatesAssigning StatesIR <= MEM[PC]
R-type
A <= R[rs]B <= R[rt]
S <= A fun B
R[rd] <= SPC <= PC + 4
S <= A or ZX
R[rt] <= SPC <= PC + 4
ORi
S <= A + SX
R[rt] <= MPC <= PC + 4
M <= MEM[S]
LW
S <= A + SX
MEM[S] <= BPC <= PC + 4
BEQPC <= Next(PC)
SW
“instruction fetch”
“decode”
0000
0001
0100
0101
0110
0111
1000
1001
1010
00111011
1100
Exe
cute
Mem
ory
Writ
e-ba
ck
Detailed Control SpecificationDetailed Control Specification
0000 ??????? 0001 10001 BEQ x 0011 1 1 0001 R-type x 0100 1 1 0001 orI x 0110 1 1 0001 LW x 1000 1 1 0001 SW x 1011 1 1
0011 xxxxxx 0 0000 1 00011 xxxxxx 1 0000 1 10100 xxxxxx x 0101 0 1 fun 10101 xxxxxx x 0000 1 0 0 1 10110 xxxxxx x 0111 0 0 or 10111 xxxxxx x 0000 1 0 0 1 01000 xxxxxx x 1001 1 0 add 11001 xxxxxx x 1010 1 0 01010 xxxxxx x 0000 1 0 1 1 01011 xxxxxx x 1100 1 0 add 11100 xxxxxx x 0000 1 0 0 1
State
Op field
Eq
Next IR
PC
Ops
Exec
Mem
Write-Backen sel
A B Ex Sr
ALU S R W M
M-R Wr
Dst
R:
ORi:
LW:
SW:
-all same in Moore machine
BEQ:
Laboratorio deTecnologías de Información
Arquitectura de Computadoras Multicycle- 28
Performance EvaluationPerformance Evaluation
♦
What is the average CPI?■
state diagram gives CPI for each instruction type
■
workload gives frequency of each type
Type
CPIi
for type
Frequency
CPIi
x freqIiArith/Logic
4
40%
1.6
Load
5
30%
1.5
Store
4
10%
0.4
branch
3
20%
0.6
Average CPI:4.1
Laboratorio deTecnologías de Información
Arquitectura de Computadoras Multicycle- 29
Controller DesignController Design
♦
The state diagrams that arise define the controller for an instruction set processor are highly structured
♦
Use this structure to construct a simple “microsequencer”♦
Control reduces to programming this very simple device■
⇒ microprogramming
sequencercontrol
datapath
control
micro-PCsequencer
microinstruction
Laboratorio deTecnologías de Información
Arquitectura de Computadoras Multicycle- 30
Example: JumpExample: Jump--CounterCounter
op-codeMap ROM
Counterzeroincload
0000i
i+1
i
None of above: Do nothing(for wait states)
Using a Jump CounterUsing a Jump CounterIR <= MEM[PC]
R-type
A <= R[rs]B <= R[rt]
S <= A fun B
R[rd] <= SPC <= PC + 4
S <= A or ZX
R[rt] <= SPC <= PC + 4
ORi
S <= A + SX
R[rt] <= MPC <= PC + 4
M <= MEM[S]
LW
S <= A + SX
MEM[S] <= BPC <= PC + 4
BEQPC <= Next(PC)
SW
“instruction fetch”
“decode”
0000
0001
0100
0101
0110
0111
1000
1001
1010
00111011
1100
inc
load
zero zerozero
zero
zeroinc inc inc inc
inc
Exe
cute
Mem
ory
Writ
e-ba
ck
Laboratorio deTecnologías de Información
Arquitectura de Computadoras Multicycle- 32
Our Our MicrosequencerMicrosequencer
Micro-PC
Map ROM
op-code
Z I L datapath
control
taken
Laboratorio deTecnologías de Información
Arquitectura de Computadoras Multicycle- 33
Microprogram Control SpecificationMicroprogram Control Specification
0000 ? inc 10001 0 load 1 1
0011 0 zero 1 00011 1 zero 1 10100 x inc 0 1 fun 10101 x zero 1 0 0 1 10110 x inc 0 0 or 10111 x zero 1 0 0 1 01000 x inc 1 0 add 11001 x inc 1 0 01010 x zero 1 0 1 1 01011 x inc 1 0 add 11100 x zero 1 0 0 1
µPC
Taken
Next IR
PC
Ops
Exec
Mem
Write-Backen sel
A B Ex Sr
ALU S R W M
M-R Wr
Dst
R:
ORi:
LW:
SW:
BEQ
Laboratorio deTecnologías de Información
Arquitectura de Computadoras Multicycle- 34
Mapping ROMMapping ROM
R-type000000 0100
BEQ
000100 0011
ori
001101 0110
LW
100011 1000
SW
101011 1011
Laboratorio deTecnologías de Información
Arquitectura de Computadoras Multicycle- 35
Example: Controlling MemoryExample: Controlling Memory
PC
InstructionMemory
Inst. Reg
addr
data
IR_en
InstMem_rd
IM_wait
Controller handles nonController handles non--ideal memoryideal memoryIR <= MEM[PC]
R-type
A <= R[rs]B <= R[rt]
S <= A fun B
R[rd] <= SPC <= PC + 4
S <= A or ZX
R[rt] <= SPC <= PC + 4
ORi
S <= A + SX
R[rt] <= MPC <= PC + 4
M <= MEM[S]
LW
S <= A + SX
MEM[S] <= B
BEQPC <=
Next(PC)
SW
“instruction fetch”
“decode / operand fetch”
Exe
cute
Mem
ory
Writ
e-ba
ck
~wait wait
~wait wait
PC <= PC + 4
~wait wait
Really Simple TimeReally Simple Time--State ControlState Controlin
stru
ctio
nfe
tch
deco
deE
xecu
teM
emor
yIR <= MEM[PC]
R-type
A <= R[rs]B <= R[rt]
S <= A fun B
R[rd] <= SPC <= PC + 4
S <= A or ZX
R[rt] <= SPC <= PC + 4
ORi
S <= A + SX
R[rt] <= MPC <= PC + 4
M <= MEM[S]
LW
S <= A + SX
MEM[S] <= B
BEQ
PC <=Next(PC)
SW
~wait wait
wait
PC <= PC + 4
wait
writ
e-ba
ck
Laboratorio deTecnologías de Información
Arquitectura de Computadoras Multicycle- 38
TimeTime--state Control Pathstate Control Path
♦
Local decode and control at each stage
Exe
c Reg
. Fi
le
Mem
Acc
ess
Dat
aM
em
A
B
S
M
Reg
File
Equ
al
PC
Nex
t PC
IR
Inst
. Mem
Valid
IRex
Dcd
Ctrl
IRm
em
Ex
Ctrl
IRw
b
Mem
Ctrl
WB
Ctrl
Laboratorio deTecnologías de Información
Arquitectura de Computadoras Multicycle- 39
Overview of ControlOverview of Control
♦
Control may be designed using one of several initial representations. The choice of sequence control, and how logic is represented, can then be determined independently; the control can then be implemented with one of several methods using a structured logic technique.
Initial Representation Finite State Diagram Microprogram
Sequencing Control
Explicit Next State Microprogram counter Function + Dispatch ROMs
Logic Representation
Logic Equations
Truth Tables
Implementation PLA
R
OM Technique
“hardwired control” “microprogrammed control”
Laboratorio deTecnologías de Información
Arquitectura de Computadoras Multicycle- 40
SummarySummary
♦
Disadvantages of the Single Cycle Processor■
Long cycle time
■
Cycle time is too long for all instructions except the Load
♦
Multiple Cycle Processor:■
Divide the instructions into smaller steps
■
Execute each step (instead of the entire instruction) in one cycle
♦
Partition datapath
into equal size chunks to minimize cycle time■
~10 levels of logic between latches
♦
Follow same 5-step method for designing “real”
processor
Laboratorio deTecnologías de Información
Arquitectura de Computadoras Multicycle- 41
Summary (contSummary (cont’’d)d)
♦
Control is specified by finite state diagram♦
Specialize state-diagrams easily captured by microsequencer■
simple increment & “branch”
fields
■
datapath
control fields
♦
Control design reduces to Microprogramming ♦
Control is more complicated with:■
complex instruction sets
■
restricted datapaths
(see the book)
♦
Simple Instruction set and powerful datapath
⇒
simple control■
could try to reduce hardware (see the book)
■
rather go for speed => many instructions at once!