ec 513 - ascs · 2019-05-03 · digital design circuit design compiler operating system...

Post on 09-Jun-2020

2 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Department of Electrical & Computer Engineering

EC 513Computer Architecture

Prof. Michel A. Kinsy

Single-cycle ISA Implementation

Department of Electrical & Computer Engineering

Computer System View

DigitalDesign

CircuitDesign

Compiler

OperatingSystem

Applications

Firmware

Datapath &Control

Layout

I/OsystemProcessor MemoryorganizationISA

Department of Electrical & Computer Engineering

Role of the Instruction Set

Instruction SetHardware

Software

Department of Electrical & Computer Engineering

§ Instructions are the language the computer understand

§ Instruction Set is the vocabulary of that language§ It serves as the hardware/software interface

§ Defines instruction format (bit encoding)§ Number of explicit operands per instruction§ Operand location§ Number of bits per instruction§ Instruction length: fixed, short, long, or variable., …

§ Examples: MIPS, Alpha, x86, IBM 360, VAX, ARM, JVM

Instruction Set Architecture (ISA)

Department of Electrical & Computer Engineering

§ Many possible implementations of the same ISA§ 360 implementations: model 30 (c. 1964), z900 (c.

2001)§ x86 implementations: 8086 (c. 1978), 80186, 286,

386, 486, Pentium, Pentium Pro, Pentium-4, Core i7, AMD Athlon, AMD Opteron, Transmeta Crusoe, SoftPC

§ MIPS implementations: R2000, R4000, R10000, ...§ JVM: HotSpot, PicoJava, ARM Jazelle, ...

Instruction Set Architecture (ISA)

Department of Electrical & Computer Engineering

ISA and Performance§ Instructions per program depends on source code,

compiler technology and ISA§ Cycles per instructions (CPI) depends upon the

ISA and the microarchitecture§ Time per cycle depends upon the

microarchitecture and the base technology

Time =Instructions Cycles Time

ProgramProgram*Instruction*Cycle

Department of Electrical & Computer Engineering

Processor Performance§ Instructions per program depends on source code,

compiler technology and ISA§ Cycles per instructions (CPI) depends upon the

ISA and the microarchitecture§ Time per cycle depends upon the

microarchitecture and the base technologyMicroarchitecture CPI cycle time

Microcoded >1 short

Single-cycle unpipelined 1 long

Pipelined 1 short

Department of Electrical & Computer Engineering

Brief Overview of the RISC-V ISA§ A new, open, free ISA from Berkeley§ Several variants

• RV32, RV64, RV128 – Different data widths• ‘I’ – Base Integer instructions• ‘M’ – Multiply and Divide• ‘A’ – Atomic memory instructions• ‘F’ and ‘D’ – Single and Double precision floating point• ‘V’ – Vector extension• And many other modular extensions

§ We will focus on the RV32I the base 32-bit variant

Department of Electrical & Computer Engineering

RV32I Register State§ 32 general purpose registers (GPR)

§ x0, x1, …, x31§ 32-bit wide integer registers§ x0 is hard-wired to zero

§ Program counter (PC)§ 32-bit wide

§ CSR (Control and Status Registers)§ User-mode

§ cycle (clock cycles) // read only§ instret (instruction counts) // read only

§ Machine-mode§ hartid (hardware thread ID) // read only§ mepc, mcause etc. used for exception handling

§ Custom§ mtohost (output to host) // write only – custom extension

Department of Electrical & Computer Engineering

Instruction Types§ Register-to-Register Arithmetic and Logical

operations§ Control Instructions alter the sequential control flow§ Memory Instructions move data to and from memory§ CSR Instructions move data between CSRs and

GPRs; the instructions often perform read-modify-write operations on CSRs

§ Privileged Instructions are needed by the operating systems, and most cannot be executed by user programs

Department of Electrical & Computer Engineering

Instruction Formats§ R-type instruction

§ I-type instruction & I-immediate (32 bits)

§ I-imm = signExtend(inst[31:20])§ S-type instruction & S-immediate (32 bits)

§ S-imm = signExtend({inst[31:25], inst[11:7]})

funct7 rs2 funct3rs1 rd opcode7 5 5 3 5 7

imm[11:0] funct3rs1 rd opcode12 5 3 5 7

imm[11:5] rs2 funct3rs1 imm[4:0] opcode7 5 5 3 5 7

Department of Electrical & Computer Engineering

Datapath: Reg-Reg ALU Instructions

0x4Add

clk

addrinst

Inst.Memory

PC

inst<19:15>inst<24:20>

inst<11:7>

inst<14:12>

OpCode

ALU

ALUControl

RegWriteclk

rd1

GPRs

rs1rs2

wswd rd2

we

rd ß(rs) func (rt)funct7 rs2 funct3rs1 rd opcode7 5 5 3 5 7

Department of Electrical & Computer Engineering

Datapath: Reg-Imm ALU Instructions

OpCode

0x4Add

clk

addrinst

Inst.Memory

PCALU

RegWrite

clk

rd1

GPRs

rs1rs2

wswd rd2

weinst<19:15>

inst<11:7>

inst<14:12> ALUControl

ImmSelect

ImmSel

inst<31:20>

rt ß (rs) op immediateimm[11:0] funct3rs1 rd opcode12 5 3 5 7

Department of Electrical & Computer Engineering

Conflicts in Merging Datapath

OpCode

0x4Add

clk

addrinst

Inst.Memory

PCALU

RegWrite

clk

rd1

GPRs

rs1rs2

wswd rd2

weinst<19:15>inst<24:20>

inst<14:12> ALUControl

ImmSelect

ImmSel

inst<31:20>

inst<15:11>

rd ß(rs) func (rt)funct7 rs2 funct3rs1 rd opcode7 5 5 3 5 7

rt ß (rs) op immediateimm[11:0] funct3rs1 rd opcode

Department of Electrical & Computer Engineering

Op2SelReg / Imm

Conflicts in Merging Datapath

rd ß(rs) func (rt)funct7 rs2 funct3rs1 rd opcode7 5 5 3 5 7

rt ß (rs) op immediateimm[11:0] funct3rs1 rd opcode

OpCode

0x4Add

clk

addrinst

Inst.Memory

PCALU

RegWrite

clk

rd1

GPRs

rs1rs2

wswd rd2

weinst<19:15>inst<24:20>

inst<14:12> ALUControl

ImmSelect

ImmSel

inst<31:20>

inst<15:11>

Department of Electrical & Computer Engineering

Instruction Formats§ SB-type instruction & B-immediate (32 bits)

§ B-imm = signExtend({inst[31], inst[7], inst[30:25], inst[11:8], 1’b0})

§ U-type instruction & U-immediate (32 bits)

§ U-imm = signExtend({inst[31:12], 12’b0})

§ UJ-type instruction & J-immediate (32 bits)

§ J-imm = signExtend({inst[31], inst[19:12], inst[20], inst[30:21], 1’b0})

imm[12] imm[10:5] rs2 rs1 funct3 imm[4:1] imm[11] opcode1 6 5 5 3 4 1 7

rd opcode5 7

imm[31:12]20

imm[20] imm[10:1] rdimm[19:12]imm[11] opcode1 10 1 8 5 7

Department of Electrical & Computer Engineering

§ Register-Register instructions (R-type)

§ opcode=OP: rd ß rs1 (funct3, funct7) rs2§ funct3 = SLT/SLTU/AND/OR/XOR/SLL § funct3= ADD

§ funct7 = 0000000: rs1 + rs2§ funct7 = 0100000: rs1 – rs2

§ funct3 = SRL§ funct7 = 0000000: logical shift right§ funct7 = 0100000: arithmetic shift right

funct7 rs2 funct3rs1 rd opcode7 5 5 3 5 7

Computational Instructions

Department of Electrical & Computer Engineering

§ Register-immediate instructions (I-type)

§ opcode = OP-IMM: rd ß rs1 (funct3) I-imm§ I-imm = signExtend(inst[31:20])§ funct3 = ADDI/SLTI/SLTIU/ANDI/ORI/XORI

§ A slight variant in coding for shift instructions - SLLI / SRLI / SRAI§ rd ß rs1 (funct3, inst[30]) I-imm[4:0]

imm[11:0] funct3rs1 rd opcode12 5 3 5 7

Computational Instructions

Department of Electrical & Computer Engineering

§ Register-immediate instructions (U-type)

§ opcode = LUI : rd ß U-imm§ opcode = AUIPC : rd ß pc + U-imm§ U-imm = {inst[31:12], 12’b0}

rd opcode5 7

imm[31:12]20

Computational Instructions

Department of Electrical & Computer Engineering

Control Instructions§ Unconditional jump and link (UJ-type)

§ opcode = JAL: rd ß pc + 4; pc ß pc + J-imm§ J-imm = signExtend({inst[31], inst[19:12], inst[20], inst[30:21],

1’b0})§ Jump ±1MB range

§ Unconditional jump via register and link (I-type)

§ opcode = JALR: rd ß pc + 4; pc ß (rs1 + I-imm) & ~0x01§ I-imm = signExtend(inst[31:20])

imm[20] imm[10:1] rdimm[19:12]imm[11] opcode1 10 1 8 5 7

imm[11:0] funct3rs1 rd opcode12 5 3 5 7

Department of Electrical & Computer Engineering

Control Instructions § Conditional branches (SB-type)

§ opcode = BRANCH: pc ß compare(funct3, rs1, rs2) ? pc + B-imm : pc + 4

§ B-imm = signExtend({inst[31], inst[7], inst[30:25], inst[11:8], 1’b0})

§ Jump ±4KB range§ funct3 = BEQ/BNE/BLT/BLTU/BGE/BGEU

imm[12] imm[10:5] rs2 rs1 funct3 imm[4:1] imm[11] opcode1 6 5 5 3 4 1 7

Department of Electrical & Computer Engineering

Conditional Branches (BEQ/BNE/BLT/BGE/BLTU/BGEU)

0x4Add

PCSel

clk

WBSelMemWrite

addr

wdata

rdataData Memory

we

Op2SelImmSelOpCode

clk

clk

addrinst

Inst.Memory

PC rd1

GPRs

rs1rs2wswd rd2

we

ImmSelect

ALU

ALUControl

Add

br

pc+4

RegWrite

Br Logic

Department of Electrical & Computer Engineering

§ Load (I-type)

§ opcode = LOAD: rd ß mem[rs1 + I-imm]§ I-imm = signExtend(inst[31:20])§ funct3 = LW/LB/LBU/LH/LHU

§ Store (S-type)

§ opcode = STORE: mem[rs1 + S-imm] ß rs2§ S-imm = signExtend({inst[31:25], inst[11:7]})§ funct3 = SW/SB/SH

imm[11:0] funct3rs1 rd opcode12 5 3 5 7

imm[11:5] rs2 funct3rs1 imm[4:0] opcode7 5 5 3 5 7

Load & Store Instructions

Department of Electrical & Computer Engineering

Datapath for Memory Instructions

§ Should program and data memory be separate?§ Harvard style: separate (Aiken and Mark 1

influence)§ read-only program memory§ read/write data memory

§ Princeton style: the same (von Neumann’s influence)§ single read/write memory for program and data

§ Executing a Load or Store instruction requires accessing the memory more than once

Department of Electrical & Computer Engineering

Harvard ArchitectureWBSelALU / Mem

Op2Sel

“base”

disp

ImmSelOpCode

ALUControl

ALU

0x4

Add

clk

addrinst

Inst.Memory

PC

RegWrite

clk

rd1

GPRs

rs1rs2

wswd rd2

we

ImmSelect

clk

MemWrite

addr

wdata

rdataData Memory

we

Store (rs) + displacementimm rs2 funct3rs1 imm opcode7 5 5 3 5 7

Loadimm[11:0] funct3rs1 rd opcode

Department of Electrical & Computer Engineering

Hardwired Control§ Hardwired Control is pure Combinational Logic

Combinational Logic

op code

equal?

ImmSel

Op2Sel

FuncSel

MemWrite

WBSel

RegDst

RegWrite

PCSel

Department of Electrical & Computer Engineering

ALU Control & Immediate Extension

Inst<6:0> (Opcode)

Decode Map

Inst<14:12> (Func3)

ALUop

0?

+

FuncSel( Func, Op, +, 0? )

ImmSel( IType12,SType12, UType20)

Department of Electrical & Computer Engineering

Hardwired Control TableHardwiredControlTable

51

Opcode ImmSel Op2Sel FuncSel MemWr RFWen WBSel WASel PCSel

ALU ALUi LW SW BEQtrue

BEQfalse

J JAL

JALR

Op2Sel=Reg/Imm WBSel=ALU/Mem/PCWASel=rd/X1 PCSel=pc+4/br/rind/jabs

* * * no yes rindPC rdjabs* * * no yes PC X1

jabs* * * no no * *pc+4SBType12 * * no no * *

brSBType12 * * no no * *pc+4SType12 Imm + yes no * *

pc+4* Reg Func no yes ALU rdIType12 Imm Op pc+4no yes ALU rd

pc+4IType12 Imm + no yes Mem rd

Department of Electrical & Computer Engineering

Single-Cycle Hardwired Control

§ Harvard architecture: we will assume that§ clock period is sufficiently long for all of and the

following steps to be “completed”:§ 1. instruction fetch§ 2. decode and register fetch§ 3. ALU operation§ 4. data fetch if required§ 5. register write-back setup time

§ tC > tIFetch + tRFetch + tALU+ tDMem+ tRWB

Department of Electrical & Computer Engineering

Princeton Microarchitecture

IR

0x4

clk

RegDst

PCSel RegWrite

Op2Sel

WBSel

31

ImmSelOpCode

Add

rd1

GPRs

rs1rs2

wswd rd2

we

ImmSelect

addr

wdata

rdataData Memory

ALU

Add

ALUControl

clk

we

MemWrite

clk

PC

PCen

IRen AddrSel

clk

Fetch phase

Br Logic

Department of Electrical & Computer Engineering

Two-State Controller§ In the Princeton Microarchitecture, a flipflop can

be used to remember the phase

fetch phase

execute phaseAddrSel=ALUIRen=offPCen=onWen=on

AddrSel=PCIRen=onPCen=offWen=off

Department of Electrical & Computer Engineering

Hardwired Controller

oldcombinational logic(Harvard)

op code

equal?

ImmSel, Op2Sel,FuncSel, MemWrite,WBSel, RegDst,RegWrite, PCSel

MemWrite

IR

newcombinational logic

Pcen

Iren

AddrSel

S

1-bit Toggle FF I-fetch / Execute

RegWrite

.

.

.

Wen

Department of Electrical & Computer Engineering

Clock Period

§ Princeton architecture§ tC-Princeton > max {tM , tRF+ tALU+ tM + tWB}§ tC-Princeton > tRF+ tALU+ tM + tWB

§ while in the hardwired Harvard architecture§ tC-Harvard > tM + tRF + tALU+ tM+ tWB

§ which will execute instructions faster?

Department of Electrical & Computer Engineering

Clock Rate vs CPI§ Suppose tM >> tRF+ tALU + tWB

§ tC-Princeton = 0.5 * tC-Harvard

§ CPIPrinceton = 2§ CPIHarvard = 1

§ No difference in performance!

Department of Electrical & Computer Engineering

Princeton Microarchitecture§ Can we overlap instruction fetch and execute?

fetchphase execute phase

addr

wdata

rdataMemory

weALU

ImmSelect

PC

0x4Add

IRaddr

wdata

rdata

Memory

we rd1

GPRs

rs1rs2

wswd rd2

we

Department of Electrical & Computer Engineering

Princeton Microarchitecture§ Only one of the phases is active in any cycle

§ a lot of datapath is not in use at any given time

The same(mux not shown)

fetchphase execute phase

addr

wdata

rdataMemory

weALU

ImmSelect

PC

0x4Add

IRaddr

wdata

rdata

Memory

we rd1

GPRs

rs1rs2

wswd rd2

we

Department of Electrical & Computer Engineering

Stalling the instruction fetch

§ When stall condition is indicated§ don’t fetch a new instruction and don’t change the PC§ insert a nop in the IR § set the Memory Address mux to ALU (not shown)

fetchphase execute phase

addr

wdata

rdataMemory

weALU

ImmSelect

rd1

GPRs

rs1rs2

wswdrd2

we

PC

0x4Add

addr

wdata

rdata

Memory

we

stall?

nop

IR

Department of Electrical & Computer Engineering

Pipelined Princeton Architecture

§ Clock: tC-Princeton > tRF+ tALU+ tM

§ CPI: (1- f) + 2f cycles per instructionwhere f is the fraction of instructions that cause a stall

Department of Electrical & Computer Engineering

Instructions to Read and Write CSR

§ opcode = SYSTEM§ CSRW rs1, csr (funct3 = CSRRW, rd = x0): csr ß rs1§ CSRR csr, rd (funct3 = CSRRS, rs1 = x0): rd ß csr

csr funct3rs1 rd opcode12 5 3 5 7

Department of Electrical & Computer Engineering

GCD in C// require: x >= 0 && y > 0int gcd(int a, int b) {int t;while(a != 0) {if(a >= b) {

a = a - b;}else {

t = a; a = b; b = t;}

}return b;

}

Department of Electrical & Computer Engineering

GCD in RISC-V Assembler// a: x1, b: x2, t: x3begin:

beqz x1, done // if(x1 == 0) goto done

blt x1, x2, bigger // if(x1 < x2) goto b_bigger

sub x1, x1, x2 // x1 := x1 - x2

j begin // goto beginbigger:

mv x3, x1 // x3 := x1

mv x1, x2 // x1 := x2

mv x2, x3 // x2 := x3

j begin // goto begindone: // now x2 contains the gcd

Department of Electrical & Computer Engineering

GCD in RISC-V Assembler// a: a0, b: a1, t: t0gcd:beqz a0, doneblt a0, a1, biggersub a0, a0, a1j gcd

bigger:mv t0, a0 mv a0, a1mv a1, t0j gcd

done: // now a1 contains the gcdmv a0, a1 // move to a0 for returningret // jr ra

Department of Electrical & Computer Engineering

Exception handling § When an exception is caused

§ Hardware saves the information about the exception in CSRs:§ mepc – exception PC§ mcause – cause of the exception§ mstatus.mpp – privilege mode of exception

§ Processor jumps to the address of the trap handler (stored in the mtvec CSR) and increases the privilege level

§ An exception handler, a software program, takes over and performs the necessary action

Department of Electrical & Computer Engineering

Software for interrupt handling§ Hardware transfers control to the common

software interrupt handler (CH) which:1. Saves all GPRs into the memory pointed by

mscratch

2. Passes mcause, mepc, stack pointer to the IH (a C function) to handle the specific interrupt

3. On the return from the IH, writes the return value to mepc

4. Loads all GPRs from the memory5. Execute ERET, which does:

§ set pc to mepc§ pop mstatus (mode, enable) stack

IH

IH

IH

CH12345

GPR

Department of Electrical & Computer Engineering

Single Cycle RISC-V Processor

Department of Electrical & Computer Engineering

Single Cycle RISC-V Processor

Department of Electrical & Computer Engineering

Single Cycle RISC-V Processor

Department of Electrical & Computer Engineering

Single Cycle RISC-V Processor

Department of Electrical & Computer Engineering

Single Cycle RISC-V Processor

Department of Electrical & Computer Engineering

Next Class§ Instruction Pipelining and Hazards

top related