cmsc 22200 computer architecture - department of … · · 2016-09-29cmsc 22200 computer...
CMSC 22200 Computer Architecture
Lecture 2: ISA
Prof. Yanjing Li Department of Computer Science
University of Chicago
Administrative Stuff
- Lab1 is out!
  - Due next Thursday (10/6)
- Lab2
  - Out next Thursday
Lecture Outline
- Introduction to ISA
- Case Study: ARMv8 / LEGv8
Review: Basic Concepts
- Basic concepts
  - What is a computer?
  - What is the von Neumann model?
  - What is an ISA?
  - What is a microarchitecture (uarch)?
  - Design point
ISA or uarch?
- Instruction (e.g., add)
- Number of general purpose registers
- Number of ports to the register file
- Number of cycles to execute the MUL instruction
- Whether or not the machine employs pipelined instruction execution
- Power/thermal management
- Support for virtual memory
ISA
- Instructions
  - Opcodes, Addressing Modes, Data Types
  - Instruction Types and Formats
  - Registers, Condition Codes
- Memory organization
  - Address space, Addressability, Alignment
  - Virtual memory management
- Call, Interrupt/Exception Handling
- Access Control, Priority/Privilege
- I/O: memory-mapped vs. special instructions
- Task/thread Management
- Power and Thermal Management
- Multi-threading support, Multiprocessor support
Many Different ISAs Over Decades
- x86
- ARM
- MIPS
- SPARC
- IBM 360
- What are the fundamental differences, and why?
ISA Element: Instruction
- An instruction (machine code) consists of
  - opcode: what the instruction does (add, sub, …)
  - operands: what it operates on (register, memory, immediate)
- Example
Data Types
- Representation of information for which there are instructions that operate on the representation
- ARMv8
  - Integer (byte, halfword, word, doubleword, quadword)
  - Floating point (half-, single-, double-precision)
  - Fixed point
  - Vector formats
- Others (e.g., x86)
  - BCD, strings
Instruction Processing Style
- Specifies the number of “operands” an instruction “operates” on and how it does so
- 0, 1, 2, 3 address machines
  - 0-address: stack machine (op, push, pop)
  - 1-address: accumulator machine (e.g., add mem)
  - 2-address: 2-operand machine (op D, S; one operand is both source and dest)
  - 3-address: 3-operand machine (op D, S1, S2; sources and dest are separate)
- E.g., ARMv8 is a 3-address machine
Instruction Classes
- Operate instructions
  - Process data: arithmetic and logical operations
  - Fetch operands, compute result, store result
  - Implicit sequential control flow (e.g., PC <= PC + 4)
- Data movement instructions
  - Move data between memory, registers, I/O devices
  - Implicit sequential control flow
- Control flow instructions
  - Change the sequence of instructions that are executed
Instruction Addressing Modes
- Specifies how to obtain an operand of an instruction
  - Register
  - Immediate
  - Memory (displacement, register indirect, indexed, absolute, memory indirect, autoincrement, autodecrement, …)
- Fewer or more addressing modes? Tradeoffs?
Instruction Addressing Modes for Memory
- Specify how to obtain memory operands
  - Absolute: LW Rt, 10000
    use immediate value as address
  - Register indirect: LW Rt, (rbase)
    use GPR[rbase] as address
  - Displaced or based: LW Rt, offset(rbase)
    use offset + GPR[rbase] as address
  - Indexed: LW Rt, (rbase, rindex)
    use GPR[rbase] + GPR[rindex] as address
  - Memory indirect: LW Rt, ((rbase))
    use value at M[GPR[rbase]] as address
  - Autoincrement/autodecrement: LW Rt, (rbase)
    use GPR[rbase] as address, but increment or decrement GPR[rbase] each time
Instruction Length
- Fixed length: all instructions have the same length
  + Easier to decode a single instruction in hardware
  + Easier to decode multiple instructions concurrently (superscalar)
  -- Wasted bits in instructions (Why is this bad?)
  -- Harder-to-extend ISA (how to add new instructions?)
- Variable length: instructions have different lengths
  + Compact encoding (Why is this good?)
  + Extensibility
  -- More logic to decode a single instruction
  -- Harder to decode multiple instructions concurrently
- Tradeoffs
  - Code size (memory space, bandwidth, latency) vs. hardware complexity
  - ISA extensibility and expressiveness vs. hardware complexity
  - Performance/energy efficiency? Smaller code vs. ease of decode
Uniform/Non-uniform Decode of Instructions
- Uniform decode: the same bits in each instruction have the same meaning
  - Opcode is always in the same location
  - Ditto operand specifiers, immediate values, …
  - Many "RISC" ISAs: MIPS, SPARC
  + Easier decode, simpler hardware
  + Enables parallelism: generate target address before knowing the instruction is a branch
  -- Restricts instruction format (fewer instructions?) or wastes space
- Non-uniform decode
  - E.g., opcode can be the 1st-7th byte in x86
  + More compact and powerful instruction format
  -- More complex decode logic
- Uniform decode usually means fixed length as well
x86 vs. MIPS Instruction Formats
- x86
- MIPS:
  R-type: 0 (6-bit) | rs (5-bit) | rt (5-bit) | rd (5-bit) | shamt (5-bit) | funct (6-bit)
  I-type: opcode (6-bit) | rs (5-bit) | rt (5-bit) | immediate (16-bit)
  J-type: opcode (6-bit) | immediate (26-bit)
ISA Element: Registers
- Fast storage
- How many?
- Size of each register?
- General purpose vs. special purpose?
- Why is having registers a good idea?
  - Because programs exhibit a characteristic called data locality
  - A recently produced/accessed value is likely to be used more than once (temporal locality)
    - Storing that value in a register eliminates the need to go to memory each time that value is needed
    - Compiler: register optimization is important!
ISA Element: Memory Organization
- Address space: how many uniquely identifiable locations are in memory
- Addressability: how much data each uniquely identifiable location stores
  - Byte addressable: most ISAs
- Aligned/unaligned access
  MSB                         LSB
  byte-3 byte-2 byte-1 byte-0
  byte-7 byte-6 byte-5 byte-4
Load/Store vs. Memory/Memory Architectures
- Load/store architecture: operate instructions operate only on registers
  - E.g., MIPS, ARM and many RISC ISAs
- Memory/memory architecture: operate instructions can operate on memory locations
  - E.g., x86
ISA Element: I/O
- How to interface with I/O devices
  - Memory mapped I/O
    - A region of memory is mapped to I/O devices
    - I/O operations are loads and stores to those locations
  - Special I/O instructions
    - IN and OUT instructions in x86 deal with ports of the chip
  - Tradeoffs?
    - Which one is more general purpose?
Other ISA Elements
- Privilege modes
  - User vs. supervisor
  - Who can execute what instructions?
- Exception and interrupt handling
  - What procedure is followed when something goes wrong with an instruction?
  - What procedure is followed when an external device requests the processor?
- Virtual memory
  - Each program has the illusion of the entire memory space, which is greater than physical memory
- Access protection
CISC vs. RISC
- CISC, complex instruction set computer -> complex instructions
  - Initially motivated by "not good enough" code generation
  - Memory size/bandwidth considerations
- RISC, reduced instruction set computer -> simple instructions
  - Goal: enable better compiler control and optimization
  - Motivated by
    - Simplifying the hardware -> lower cost, higher frequency
    - Enabling the compiler to optimize the code better
- Simple compiler, complex hardware vs. complex compiler, simple hardware
CISC vs. RISC
- Usually, …
- RISC
  - Simple instructions
  - Fixed length
  - Uniform decode
  - Few addressing modes
- CISC
  - Complex instructions
  - Variable length
  - Non-uniform decode
  - Many addressing modes
CISC vs. RISC
- Example: x86
- Each x86 instruction can be translated into a sequence of micro-instructions (uops)
  - Uops can be RISC-like
  - Stored in a read-only memory structure (UROM)
  - Why uops?
    - Simple processing engine to support complex instructions
    - Extensibility
    - Flexibility (can be patched to fix bugs)
- Translation -> unification of ISAs (ARM, x86, GPU)?
Aside: Ultimate RISC
(figure source: Wikipedia)
Review: Programmer Visible (Architectural) State
- Memory: M[0], M[1], M[2], M[3], M[4], …, M[N-1]
  array of storage locations indexed by an address
- Program Counter
  memory address of the current instruction
- Registers
  - given special names in the ISA (as opposed to addresses)
  - general vs. special purpose
- Instructions (and programs) specify how to transform the values of programmer visible state
Programmer Invisible State
- Microarchitectural state
- Programmer cannot access this directly
  - E.g., cache state
  - E.g., pipeline registers
ARMv8/LEGv8 Case Study
The ARMv8 ISA
- Commercialized by ARM Holdings (www.arm.com)
- Large share of embedded core market
  - Applications in mobile, consumer electronics, network/storage equipment, cameras, printers, …
- Typical of many modern ISAs
- Reference (5740 pages)
  - https://developer.arm.com/docs/ddi0487/a/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile
** Based on original figure from [P&H CO&D, COPYRIGHT 2016 Elsevier. ALL RIGHTS RESERVED.]
ARMv8 Overview
- RISC, load/store architecture, both 32- and 64-bit
- 3-address machine
- 32-bit instructions
- Simple datatypes
  - int, fp, fixed point/vector interpretation
- Addressing modes: reg, imm, simple mem addressing
  - mem address from reg and instruction contents only
- 32 GPRs, PC, SP, ELR, 32 SIMD/FP registers
- Byte addressable
- Memory space and memory alignment?
- You will implement ARMv8 in C (Lab1)
LEGv8
- A subset of ARMv8
  - With some differences
- Reference
  - Green card from textbook
  - Also available online
  - http://booksite.elsevier.com/9780128017333/arm_ref.php
Instruction Formats
Registers
- 32 × 64-bit register file, and one 64-bit PC
Memory Accesses
- Memory is byte addressed
  - Each address identifies an 8-bit byte
- Alignment
  - Does not require words (4 bytes, or 32 bits) to be aligned in memory, except for instructions and the stack
R-format Instructions
  opcode  | Rm     | shamt  | Rn     | Rd
  11 bits | 5 bits | 6 bits | 5 bits | 5 bits
- Instruction fields
  - opcode: operation code
  - Rm: the second register source operand
  - shamt: shift amount
  - Rn: the first register source operand
  - Rd: the register destination
R-format Example
  opcode  | Rm     | shamt  | Rn     | Rd
  11 bits | 5 bits | 6 bits | 5 bits | 5 bits
ADD X9,X20,X21 // add the values in X20 and X21, and put the result in X9,
               // i.e., GPR[X9] = GPR[X20] + GPR[X21]
  10001011000 | 10101 | 000000 | 10100 | 01001 (binary)
  1000 1011 0001 0101 0000 0010 1000 1001 (binary) = 8B150289 (hex)
shamt in R-format instructions
  opcode  | Rm     | shamt  | Rn     | Rd
  11 bits | 5 bits | 6 bits | 5 bits | 5 bits
- shamt: how many positions to shift
- Shift left logical (LSL)
  - R[Rd] <- R[Rn] << shamt // Shift left and fill with 0 bits
  - LSL by i bits: multiplies by 2^i
- Shift right logical (LSR)
  - R[Rd] <- R[Rn] >> shamt // Shift right and fill with 0 bits
  - LSR by i bits: divides by 2^i (unsigned only)
- Note: R-format instructions in ARMv8 support shifting the second operand before applying the operation specified by the opcode
C to Assembly 101
- C code:
    f = (g + h) - (i + j);
  - f, …, j in X19, X20, …, X23
- Compiled into assembly:
    ADD X9, X20, X21
    ADD X10, X22, X23
    SUB X19, X9, X10
I-format Instructions
  opcode  | immediate | Rn     | Rd
  10 bits | 12 bits   | 5 bits | 5 bits
- Immediate instructions
  - Rn: source register
  - Rd: destination register
  - Immediate field: constant data; zero-extended
- Example: ADDI X22, X22, #4
  - What does the machine code look like for ADDI?
D-format Instructions
  opcode  | address | op2    | Rn     | Rt
  11 bits | 9 bits  | 2 bits | 5 bits | 5 bits
- Load/store instructions
  - Rn: base register
  - address: constant offset from contents of base register (+/- 32 doublewords)
  - op2: expands the opcode field
  - Rt: destination (load) or source (store) register number
- Example: LDUR X9,[X22,#64]
  - LDUR opcode: 11111000010 (binary); op2: 0
  - X9 (Rt field)
  - X22 (Rn field)
C to Assembly 201
- C code:
    A[12] = h + A[8];
  - h in X21, base address of A in X22
- Compiled code:
  - Index 8 requires offset of 64 (byte-addressed memory, 8 bytes per doubleword)
    LDUR X9,[X22,#64]
    ADD X9,X21,X9
    STUR X9,[X22,#96]
B-format Instructions
  opcode | BR_address
  6 bits | 26 bits
- Example: B L1
  - branch unconditionally to the instruction labeled L1
- B opcode: 0A0-0BF (hex)
  - In ARMv8, it is 000101 (binary)
- Effect: if taken, PC = PC + BranchAddr