
1

INTRODUCTION & INSTRUCTIONS

Dr. Bill Yi
Santa Clara University

(Based on text: David A. Patterson & John L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, 3rd Ed., Morgan Kaufmann, 2007)

(Also based on presentation: Dr. Nam Ling, COEN210 Lecture Notes)

2

COURSE CONTENTS

- Introduction
- Instructions
- Computer Arithmetic
- Processor: Datapath
- Processor: Control
- Pipelining Techniques
- Memory
- Input/Output Devices

3

INTRODUCTION

- Overview of Computer Systems
- Evolution of Memory and Processor
- Historical Perspective
- Levels of Representation

4

A Desktop Computer

- Left figure: a desktop computer
- Right figure: motherboard, I/O interface board, board for memory chips, power supply, disk drives

5

Inside a PC

[Figure source: Patterson & Hennessy, Morgan Kaufmann, 2007]

6

PC Motherboard

- Intel Pentium 4 processor: upper left, covered by metal fins (heat sink)
- Main memory DRAM: middle, on small boards (DIMMs) perpendicular to the motherboard
- The rest: mostly connectors for external I/O devices

7

Processor Chip - 1

[Figure: earlier Intel Pentium chip, with labeled regions: data cache, instruction cache, bus, integer datapath, floating-point datapath, branch, control]

8

Processor Chip - 2

[Figure: Intel Pentium 4 die photo (Hennessy & Patterson, Morgan Kaufmann, 2003)]

[Figure: Intel Pentium 4 (3 GHz) package (Intel, 2003)]

9

Processor Chip - 3

[Figure: Intel Pentium 4]

10

Hardware / Software

- Hardware: physical components
- System software: operating system, compiler, ...
- Application software: PowerPoint, spreadsheet, ...

[Figure: layers of application software, system software, and hardware]

11

Five Classic Components of a Computer + Network

- Datapath: performs arithmetic & logic operations
- Control: tells the datapath, memory, and I/O what to do according to the instructions
- Memory: stores programs + data
  - cache (SRAM): small & fast
  - DRAM: main memory
  - optical disk (CD, DVD), magnetic disk, FLASH, magnetic tape: secondary, nonvolatile
- Input: inputs instructions, data, etc.; e.g. keyboard, mouse (electromechanical or optical), disk, ...
- Output: outputs results, information, etc.; e.g. monitor (flat-panel LCD or CRT), printer, disk, ...
- Network: communicates with other computers for resource sharing and non-local access; e.g. LAN, Internet, ...

[Figure: CPU (control + datapath), memory, input, output, and network]

12

A Historical Perspective

- 1946: J. Presper Eckert & John Mauchly (U. Penn.) announced ENIAC (Electronic Numerical Integrator and Calculator). It used vacuum tubes and performed 1900 adds/sec.
- John von Neumann joined Eckert & Mauchly and built EDVAC (Electronic Discrete Variable Automatic Computer), a stored-program computer
- 1948: U. Manchester built the Mark-I, the first operational stored-program computer
- 1949: Maurice Wilkes (Cambridge U.) built EDSAC (Electronic Delay Storage Automatic Calculator), the first full-scale, operational, stored-program computer
- 1940s: Other pioneers include Konrad Zuse (Germany) and Alan Turing (UK)
- 1940s: Howard Aiken (Harvard) built the Mark-III & Mark-IV, with separate memories for instructions & data, hence the name "Harvard architecture"
- 1947: Whirlwind started at MIT, using magnetic core memory
- 1951: The 1st successful commercial computer, UNIVAC I (Universal Automatic Computer), was built and sold (Remington-Rand / Eckert-Mauchly Computer Corp.)
- 1952: IBM shipped the IBM 701

13

A Historical Perspective

- 1964: IBM System/360; IBM/360 architectures dominated the large-computer market
- 1965: DEC unveiled the PDP-8, the 1st commercial minicomputer
- 1971: Intel invented the 1st microprocessor, the Intel 4004
- 1963: Seymour Cray at CDC announced the CDC 6600, the 1st supercomputer
- 1976: Cray announced the Cray-1, then the fastest supercomputer
- There was no single fountainhead for the personal computer
- 1977: The Apple II by Steve Jobs & Steve Wozniak set standards for low cost and high volume
- 1981: IBM announced the IBM PC, which became the best-selling computer of any kind; its success gave Intel the most popular microprocessor and Microsoft the most popular operating system
- 1990s: Multimedia, networks, Internet, embedded processors, graphics, etc.
- 2000-: Wireless & mobile (e.g. cell phones), 3-D graphics, multimedia (e.g. video), Internet, GHz processors, embedded, dual-core, quad-core, multi-core, etc.
- 1990s, 2000-: Architectural techniques: superscalar, dynamic pipelining, speculative execution, VLIW, multithreading, multi-core architectures, etc.

14

Intel 80x86 History

- 1978: Intel announced the 8086, a 16-bit architecture (an extension of the 8-bit 8080)
- 1980: Intel announced the 8087 floating-point coprocessor
- 1982: Intel announced the 80286, with the address space extended to 24 bits
- 1985: Intel announced the 80386, a 32-bit architecture
- 1989: Intel 80486, with improved performance through pipelining
- 1992: Intel Pentium, improved performance
- 1995: Intel Pentium Pro, improved performance (> 100 MHz)
- 1997: MMX extension, a set of instructions to accelerate multimedia & communication applications
- 1997: Intel Pentium II
- 1999: Intel Pentium III
- 2000: Intel Pentium III > 1 GHz; competition from AMD; Pentium 4 (11/00)
- 2002: Intel Pentium 4 > 3 GHz (3.06 GHz) with multithreading, in 0.13-micron technology
- 2005: Intel Pentium D, a dual-core Pentium 4: 2 independent execution cores in the same processor package
- 2006-07: Intel quad-core processors, 65 nm technology

15

Technology Trends - 1

16

Technology Trends - 2

Moore’s law: transistor capacity doubles every 18-24 months
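As a rough feel for what an 18-24 month doubling period means, the sketch below counts whole doublings over a span of months. The helper name `moore_growth_factor` and the whole-doubling simplification are assumptions for illustration only; real transistor counts do not follow the curve exactly.

```c
#include <stdint.h>

/* Hypothetical helper: growth factor in transistor capacity after
 * `months` months, counting only complete doublings of length
 * `period_months` (e.g. 18 or 24). period_months must be nonzero. */
uint64_t moore_growth_factor(unsigned months, unsigned period_months)
{
    unsigned doublings = months / period_months;  /* whole doublings only */
    return (uint64_t)1 << doublings;              /* 2^doublings */
}
```

Over 10 years (120 months), a 24-month period gives 5 doublings (32x) and an 18-month period gives 6 whole doublings (64x), so the choice of period matters a great deal.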

17

Multithreading & Multi-core CPUs

Threads (threads of execution): a program forks itself into 2 or more simultaneously (or pseudo-simultaneously) running tasks.

Multiple threads can be executed in parallel on many computers:
- Single processor: by time slicing, where the processor switches between threads so fast that it gives the illusion of simultaneity
- Multiprocessor or multi-core system: via multiprocessing, where different threads and processes run simultaneously on different processors or cores

Multi-core CPUs:
- Multi-chip approach: the cores are separate chips put together in a single package. The cores communicate over the front-side bus, and each has its own L2 cache.
- Monolithic approach: the cores are manufactured on a single chip and do not need the front-side bus to communicate. The cache is shared between the two cores, giving better performance.

18

Levels of Representation

High-level language program:
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
  ↓ Compiler
Assembly language program:
    lw $15, 0($2)
    lw $16, 4($2)
    sw $16, 0($2)
    sw $15, 4($2)
  ↓ Assembler
Object: machine language module, e.g.
    00000000101000010000000000011000
  ↓ Linker (together with Object: library routines in machine language)
Executable: machine language program
  ↓ Loader
Memory

19

INSTRUCTIONS

- Instruction Types
- Instruction Format
- Addressing Modes

20

Introduction

- Instruction: a word of the machine's language
- Instruction set: the set of instructions (the machine's vocabulary)
- RISC (Reduced Instruction Set Computer) design principles:
  - Principle 1: Simplicity favors regularity
  - Principle 2: Smaller is faster
  - Principle 3: Good design demands good compromises
  - Principle 4: Make the common case fast
- We'll be working with the MIPS architecture, used by NEC, Nintendo, Cisco, Silicon Graphics, Sony, ...

21

MIPS Instruction Set Arch.: Registers

Registers: 32 general-purpose registers and 3 special-purpose registers, each 32 bits

General-purpose registers $0 - $31 (name and number):
- $zero (0): constant 0
- $at (1): reserved for assembler
- $v0-$v1 (2-3): values for results & expression evaluation
- $a0-$a3 (4-7): arguments
- $t0-$t7 (8-15): temporaries
- $s0-$s7 (16-23): saved
- $t8-$t9 (24-25): more temporaries
- $k0-$k1 (26-27): reserved for the operating system kernel
- $gp (28): global pointer
- $sp (29): stack pointer
- $fp (30): frame pointer
- $ra (31): return address

3 special-purpose registers:
- PC: program counter
- Hi, Lo: for multiply and divide
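A register-name table is a handy way to check the numbering above. The sketch below is a hypothetical helper (`reg_number` is not part of any MIPS toolchain); it fills slots 26-27 with `$k0`-`$k1`, which the standard MIPS convention reserves for the operating-system kernel.

```c
#include <string.h>

/* Hypothetical lookup table: MIPS register names in number order ($0 - $31). */
static const char *const kRegNames[32] = {
    "$zero", "$at", "$v0", "$v1", "$a0", "$a1", "$a2", "$a3",
    "$t0",   "$t1", "$t2", "$t3", "$t4", "$t5", "$t6", "$t7",
    "$s0",   "$s1", "$s2", "$s3", "$s4", "$s5", "$s6", "$s7",
    "$t8",   "$t9", "$k0", "$k1", "$gp", "$sp", "$fp", "$ra"
};

/* Returns the register number for a name such as "$t0", or -1 if unknown. */
int reg_number(const char *name)
{
    for (int i = 0; i < 32; i++) {
        if (strcmp(kRegNames[i], name) == 0)
            return i;
    }
    return -1;
}
```

For example, `reg_number("$s2")` gives 18, the `rs` value that appears when `add $t0, $s2, $t0` is encoded.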

22

MIPS Instruction Set Arch.: Memory

- Word length = 32 bits
- Memory: byte addressable, big-endian
  - 1 word = 4 bytes
  - Each address refers to a byte
- Registers are smaller than memory, but have faster access time

Note:
- Word: the unit of access in a computer
- Big-endian: uses the leftmost or "big end" byte as the word address
- Little-endian: uses the rightmost or "little end" byte as the word address

[Figure: a 32-bit register vs. memory organized as 8-bit bytes]
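To make the big-endian/little-endian distinction concrete, the sketch below stores one 32-bit word into four consecutive bytes both ways. The function names are hypothetical; the code does its own byte splitting, so it behaves the same regardless of the host machine's byte order.

```c
#include <stdint.h>

/* Big-endian (MIPS as described above): the most significant ("big end")
 * byte goes at the lowest address, which is the word's address. */
void store_word_big_endian(uint8_t mem[4], uint32_t word)
{
    mem[0] = (uint8_t)(word >> 24);
    mem[1] = (uint8_t)(word >> 16);
    mem[2] = (uint8_t)(word >> 8);
    mem[3] = (uint8_t)word;
}

/* Little-endian: the least significant ("little end") byte goes at the
 * lowest address. */
void store_word_little_endian(uint8_t mem[4], uint32_t word)
{
    mem[0] = (uint8_t)word;
    mem[1] = (uint8_t)(word >> 8);
    mem[2] = (uint8_t)(word >> 16);
    mem[3] = (uint8_t)(word >> 24);
}

/* Reassemble a word from big-endian memory (what a load word does on a
 * big-endian machine). */
uint32_t load_word_big_endian(const uint8_t mem[4])
{
    return ((uint32_t)mem[0] << 24) | ((uint32_t)mem[1] << 16) |
           ((uint32_t)mem[2] << 8) | (uint32_t)mem[3];
}
```

Storing 0x12345678 big-endian puts 0x12 at the word's address; little-endian puts 0x78 there.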

23

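Registers vs. Memory

- Operands of arithmetic instructions must be registers
- Only 32 registers are provided
- The compiler associates variables with registers
- What about programs with lots of variables? The remaining variables are kept in memory and moved between memory and registers with load and store instructions.

[Figure: processor (control + datapath), memory, and I/O (input, output)]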

24

Instructions

Load and store instructions. Example:

C code:
    A[12] = h + A[8];

MIPS code (h in $s2, base address of A in $s3):
    lw  $t0, 32($s3)      # $t0 ← A[8]   (8 × 4 = 32 bytes)
    add $t0, $s2, $t0     # $t0 ← h + A[8]
    sw  $t0, 48($s3)      # A[12] ← $t0  (12 × 4 = 48 bytes)

- Can refer to registers by name (e.g., $s2, $t2) instead of number
- Store word has the destination last
- Remember: arithmetic operands are registers, not memory! Can't write:
    add 48($s3), $s2, 32($s3)
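The offsets 32 and 48 come from multiplying the array index by 4 bytes per word. The sketch below models that with a byte-addressed view of the array, one C statement per MIPS instruction. The names `load_word`, `store_word`, and `compile_example` are hypothetical, and `memcpy` uses the host's byte order, which is harmless here because the same order is used for reading and writing.

```c
#include <stdint.h>
#include <string.h>

enum { WORD_BYTES = 4 };  /* one MIPS word = 4 bytes */

/* lw: read a 32-bit word at base + offset (byte offset). */
static int32_t load_word(const uint8_t *base, int32_t offset)
{
    int32_t w;
    memcpy(&w, base + offset, sizeof w);
    return w;
}

/* sw: write a 32-bit word at base + offset (byte offset). */
static void store_word(uint8_t *base, int32_t offset, int32_t w)
{
    memcpy(base + offset, &w, sizeof w);
}

/* A[12] = h + A[8], with a_base playing the role of $s3 and h of $s2. */
void compile_example(uint8_t *a_base, int32_t h)
{
    int32_t t0 = load_word(a_base, 8 * WORD_BYTES);  /* lw  $t0, 32($s3)  */
    t0 = h + t0;                                     /* add $t0, $s2, $t0 */
    store_word(a_base, 12 * WORD_BYTES, t0);         /* sw  $t0, 48($s3)  */
}
```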

25

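Our First Example

Can we figure out the code?

    void swap(int v[], int k)
    {
        int temp;
        temp = v[k];
        v[k] = v[k+1];
        v[k+1] = temp;
    }

    swap:
        sll $2, $5, 2     # $2 ← k × 4 ($5 holds k)
        add $2, $4, $2    # $2 ← address of v[k] ($4 holds v)
        lw  $15, 0($2)    # $15 ← v[k]
        lw  $16, 4($2)    # $16 ← v[k+1]
        sw  $16, 0($2)    # v[k] ← $16
        sw  $15, 4($2)    # v[k+1] ← $15
        jr  $31           # return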
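One way to read the swap listing is to translate each MIPS instruction back into C on a byte-addressed view of the array. The sketch below does that; the function name `swap_like_mips` is hypothetical, and it assumes `k >= 0` with `v[k+1]` in bounds. The index is scaled to a byte offset by shifting left 2, which is the same as multiplying by 4.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* $4 holds v (address of v[0]), $5 holds k. */
void swap_like_mips(int32_t v[], int32_t k)
{
    uint8_t *base = (uint8_t *)v;        /* byte address of v[0] */
    size_t offset = (size_t)k << 2;      /* $2 ← k * 4 bytes (shift left by 2) */
    uint8_t *addr = base + offset;       /* add $2, $4, $2: address of v[k] */
    int32_t r15, r16;
    memcpy(&r15, addr + 0, sizeof r15);  /* lw $15, 0($2) */
    memcpy(&r16, addr + 4, sizeof r16);  /* lw $16, 4($2) */
    memcpy(addr + 0, &r16, sizeof r16);  /* sw $16, 0($2) */
    memcpy(addr + 4, &r15, sizeof r15);  /* sw $15, 4($2) */
}                                        /* jr $31: return to caller */
```

Note that `v[k+1]` is reached with a constant displacement of 4 from the address of `v[k]`, not by recomputing an index.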

26

MIPS Instruction Types

Arithmetic & logic (AL):
    add  $s1, $s2, $s3    # $s1 ← $s2 + $s3
    sub  $s1, $s2, $s3    # $s1 ← $s2 - $s3
- Each register-register AL instruction has exactly 3 operands, all in registers
    addi $s1, $s2, 100    # $s1 ← $s2 + 100
- For addi, the constant is kept in the instruction itself

Data transfer (load & store):
    lw $s1, 100($s2)      # $s1 ← memory[$s2+100]   (load word)
    sw $s1, 100($s2)      # memory[$s2+100] ← $s1   (store word)
    lb $s1, 100($s2)      # $s1 ← memory[$s2+100]   (load byte)
    sb $s1, 100($s2)      # memory[$s2+100] ← $s1   (store byte)
- Load/store byte is commonly used for moving characters (ASCII)
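A typical use of load/store byte is a string copy, modeled here on the textbook strcpy example. In the sketch below (the name `copy_string` is hypothetical), each iteration corresponds to one byte load from `y` and one byte store to `x`, stopping after the terminating 0 byte has been copied.

```c
/* Byte-by-byte string copy: lb from y[i], sb to x[i], stop after '\0'. */
void copy_string(char x[], const char y[])
{
    int i = 0;
    while ((x[i] = y[i]) != '\0')   /* load byte, store byte, test for 0 */
        i += 1;                     /* byte addresses advance by 1, not 4 */
}
```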

27

MIPS Instruction Types

Conditional branch:
    beq $s2, $s3, L1      # branch to L1 if $s2 = $s3
    bne $s2, $s3, L1      # branch to L1 if $s2 ≠ $s3
    beq $s1, $s2, 25      # branch to PC + 4 + 100 (= 4 × 25) if $s1 = $s2
    slt $s2, $s3, $s4     # if $s3 < $s4 then $s2 ← 1;
                          # else $s2 ← 0 (set on less than)

Unconditional branch:
    j   Loop              # go to Loop (jump)
    j   2500              # go to 4 × 2500 = 10000 (jump)
    jr  $t1               # go to the address in $t1 (jump register)
    jal Proc1             # $ra ← PC + 4; go to Proc1 (jump & link)
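The "PC + 4 + 100" in the beq example follows from how the 16-bit branch field is interpreted: it counts words relative to PC + 4 and is sign-extended so branches can go backward. A minimal sketch, with hypothetical function names, of that target calculation and of slt's signed comparison:

```c
#include <stdint.h>

/* beq/bne target: sign-extend the 16-bit field, scale by 4 (words to
 * bytes), and add to PC + 4. Unsigned arithmetic wraps like 32-bit hardware. */
uint32_t branch_target(uint32_t pc, uint16_t offset_field)
{
    int32_t offset = (int32_t)(offset_field ^ 0x8000u) - 0x8000; /* sign-extend */
    return pc + 4u + (uint32_t)offset * 4u;
}

/* slt: set on less than (signed comparison). */
int32_t slt(int32_t rs, int32_t rt)
{
    return rs < rt ? 1 : 0;
}
```

With the branch at address 0x00400000, a field of 25 gives 0x00400068 (PC + 4 + 100), and a field of 0xFFFF (that is, -1) branches back to the branch instruction itself.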

28

Compiling a High-Level Language

- Assignment statement (operands in registers, operands in memory)
- Assignment statement (operands with variable array index)
- If-then-else statement
- Loop with variable array index
- While loop
- Case / switch statement
- Procedure that doesn't call another procedure
- Nested procedures
- Using strings
- Using constants
- Putting things together

29

Compiling a High-Level Language

- Arithmetic instructions: useful for assignment statements
- Data transfer instructions: useful for arrays or structures
- Conditional branches: useful for if-then-else statements & loops
- Unconditional branches: useful for case / switch statements, procedure calls and returns

30

Basic Blocks

- A basic block is a sequence of instructions
  - without branches, except possibly at the end, and
  - without branch targets or branch labels, except possibly at the beginning
- One of the first phases of compilation is breaking the program into basic blocks
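A common way to break code into basic blocks, consistent with the definition above, is to mark "leaders": the first instruction, every branch target, and every instruction right after a branch. Each block then runs from one leader up to the next. This is a standard compiler technique rather than something these notes specify, and the `Instr` record and `find_basic_blocks` name below are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical instruction record: only control-flow facts matter here. */
typedef struct {
    bool is_branch;  /* beq, bne, j, jr, ... */
    int  target;     /* index of the branch target, or -1 if none/unknown */
} Instr;

/* Sets leader[i] for each instruction that starts a basic block and
 * returns the number of blocks. */
size_t find_basic_blocks(const Instr code[], size_t n, bool leader[])
{
    for (size_t i = 0; i < n; i++)
        leader[i] = false;
    if (n == 0)
        return 0;
    leader[0] = true;                        /* first instruction */
    for (size_t i = 0; i < n; i++) {
        if (!code[i].is_branch)
            continue;
        if (code[i].target >= 0 && (size_t)code[i].target < n)
            leader[code[i].target] = true;   /* branch target starts a block */
        if (i + 1 < n)
            leader[i + 1] = true;            /* fall-through starts a block */
    }
    size_t blocks = 0;
    for (size_t i = 0; i < n; i++)
        blocks += leader[i];
    return blocks;
}
```

For a loop such as `0: add; 1: add (label Loop); 2: lw; 3: bne ... Loop; 4: add`, the leaders are 0, 1, and 4, giving three blocks: [0], [1-3], and [4].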

31

Procedure Call

Use the following registers:
- $a0-$a3: to pass parameters
- $v0-$v1: to return values for results & expression evaluation
- $ra: return address
- $sp: stack pointer (points to top of stack)
- $fp: frame pointer

Use the following instructions:
    jal ProcedureAddress  # jumps to the procedure address and saves
                          # the return address (PC + 4) in register $ra
    jr  $ra               # return jump: jump to the address stored in register $ra

Use the stack (a part of memory) to save the registers the callee needs to use, so that their old values can be restored before returning

32

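Nested Procedures

- Use the stack to preserve values: $a0-$a3 (if still needed after a nested call), $s0-$s7, $sp, $ra, the stack above $sp, and $fp & $gp if the procedure uses them
- No need to preserve $t0-$t9, $v0-$v1, or the stack below $sp
- The frame pointer serves as a stable base register within a procedure for local memory references
- Procedure frame (activation record):

[Figure: procedure frame before, during, and after a call; from high address to low address the frame holds argument registers, return address, saved registers, and local arrays & structures, with $fp pointing to the first word of the frame and $sp to the top of the stack]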
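Why $ra and the arguments must be saved becomes clear with a recursive procedure, where the nested call overwrites both. The sketch below models a recursive factorial (in the style of the textbook example) on a hypothetical `Machine` with explicit `a0`, `v0`, `ra`, and a word-indexed stack that grows toward index 0. `AFTER_JAL` is a made-up return address, and unlike the textbook version, the base case is tested before anything is pushed.

```c
#include <stdint.h>

/* Hypothetical register/stack state; stack grows toward index 0. */
typedef struct {
    int32_t a0, v0, ra;
    int32_t stack[64];
    int     sp;          /* index of the current top of stack */
} Machine;

enum { AFTER_JAL = 0x1000 };  /* made-up address that jal would save */

void fact(Machine *m)
{
    if (m->a0 < 1) {             /* base case: no nested call, nothing saved */
        m->v0 = 1;
        return;                  /* jr $ra */
    }
    m->sp -= 2;                  /* addi $sp, $sp, -8 (two words) */
    m->stack[m->sp + 1] = m->ra; /* sw $ra, 4($sp) */
    m->stack[m->sp] = m->a0;     /* sw $a0, 0($sp) */

    m->a0 = m->a0 - 1;           /* argument for the nested call */
    m->ra = AFTER_JAL;           /* jal fact overwrites $ra ... */
    fact(m);                     /* ... and the callee changes $a0 */

    m->a0 = m->stack[m->sp];     /* lw $a0, 0($sp) */
    m->ra = m->stack[m->sp + 1]; /* lw $ra, 4($sp) */
    m->sp += 2;                  /* addi $sp, $sp, 8: pop the frame */
    m->v0 = m->a0 * m->v0;       /* $v0 ← n × fact(n - 1) */
}                                /* jr $ra */
```

After `fact` returns, `a0`, `ra`, and `sp` hold the caller's values again, which is exactly what the preservation rules above require.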

33

Instruction Format

- All instructions are 32 bits
- 3 types of formats: R-type (register), I-type (immediate), J-type (jump)
- Fields (# of bits):
  - op (6): opcode (basic operation)
  - rs (5): 1st register source operand
  - rt (5): 2nd register source operand
  - rd (5): register destination operand
  - shamt (5): shift amount
  - funct (6): function (selects the specific variant of the operation in the op field)
  - address/immediate (16)
  - target address (26)

Field layout:
    R-type:  op (6) | rs (5) | rt (5) | rd (5) | shamt (5) | funct (6)
    I-type:  op (6) | rs (5) | rt (5) | address/immediate (16)
    J-type:  op (6) | target address (26)
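The field widths above translate directly into shifts and masks. The sketch below (hypothetical `Fields` struct and `decode` function) extracts every field from a 32-bit word; which fields are meaningful depends on the opcode. For example, 0x02484020 is the word 000000 10010 01000 01000 00000 100000, i.e. `add $t0, $s2, $t0`.

```c
#include <stdint.h>

typedef struct {
    unsigned op, rs, rt, rd, shamt, funct; /* R-type fields */
    uint16_t imm;                          /* I-type address/immediate */
    uint32_t target;                       /* J-type target address */
} Fields;

/* Bits are numbered 31 (leftmost) down to 0. */
Fields decode(uint32_t instr)
{
    Fields f;
    f.op     = (instr >> 26) & 0x3Fu;         /* bits 31-26 */
    f.rs     = (instr >> 21) & 0x1Fu;         /* bits 25-21 */
    f.rt     = (instr >> 16) & 0x1Fu;         /* bits 20-16 */
    f.rd     = (instr >> 11) & 0x1Fu;         /* bits 15-11 */
    f.shamt  = (instr >> 6) & 0x1Fu;          /* bits 10-6  */
    f.funct  = instr & 0x3Fu;                 /* bits 5-0   */
    f.imm    = (uint16_t)(instr & 0xFFFFu);   /* bits 15-0  */
    f.target = instr & 0x03FFFFFFu;           /* bits 25-0  */
    return f;
}
```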

34

Instruction Format (Examples) - 1

R-type examples:
    add $t0, $s2, $t0
    sub $s1, $s2, $s3
    slt $s1, $s2, $s3
    jr  $ra               # 0s in rt, rd, and shamt fields

I-type examples:
    lw  $s1, 100($s2)     # 100 appears in address/immediate field
    sw  $s1, 100($s2)     # 100 appears in address/immediate field
    beq $s1, $s2, 25      # 25 appears in address/immediate field (equivalent to 100)

J-type examples:
    j   2500              # 2500 appears in target address field (equivalent to 4 × 2500 = 10000)
    jal 2500              # 2500 appears in target address field (equivalent to 4 × 2500 = 10000)

35

Instruction Format (Examples) - 2

R-type example: add $t0, $s2, $t0
    op=0 | rs=18 | rt=8 | rd=8 | shamt=0 | funct=32
    000000 10010 01000 01000 00000 100000

I-type example: lw $s1, 100($s2)
    op=35 | rs=18 | rt=17 | address=100

J-type example: j 2500
    op=2 | target address=2500
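Going the other way, packing the example field values into 32-bit words is a matter of shifting each field into position. A minimal sketch with hypothetical `encode_r`, `encode_i`, and `encode_j` helpers; each value is masked to its field width so an out-of-range argument cannot spill into a neighboring field.

```c
#include <stdint.h>

uint32_t encode_r(unsigned op, unsigned rs, unsigned rt, unsigned rd,
                  unsigned shamt, unsigned funct)
{
    return ((uint32_t)(op & 0x3Fu) << 26) | ((uint32_t)(rs & 0x1Fu) << 21) |
           ((uint32_t)(rt & 0x1Fu) << 16) | ((uint32_t)(rd & 0x1Fu) << 11) |
           ((uint32_t)(shamt & 0x1Fu) << 6) | (uint32_t)(funct & 0x3Fu);
}

uint32_t encode_i(unsigned op, unsigned rs, unsigned rt, uint16_t imm)
{
    return ((uint32_t)(op & 0x3Fu) << 26) | ((uint32_t)(rs & 0x1Fu) << 21) |
           ((uint32_t)(rt & 0x1Fu) << 16) | (uint32_t)imm;
}

uint32_t encode_j(unsigned op, uint32_t target)
{
    return ((uint32_t)(op & 0x3Fu) << 26) | (target & 0x03FFFFFFu);
}
```

With the field values listed above, `add $t0, $s2, $t0` encodes to 0x02484020, `lw $s1, 100($s2)` to 0x8E510064, and `j 2500` to 0x080009C4.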

36

Motivation for I-type Instructions

- For many operations, one operand is a constant: 52% in the C compiler gcc, 69% in Spice
- Design principle: make the common case fast

37

J-Type Instructions

Example:
    j   200               # go to location 800 (= 200 × 4)

Other J-type instruction:
    jal 200               # jump & link: go to location 800 (= 200 × 4)
                          # $31 ($ra) ← PC + 4

38

Assembly Language vs. Machine Language

- Assembly provides a convenient symbolic representation
  - much easier than writing down numbers
  - e.g., destination first
- Machine language is the underlying reality
  - e.g., destination is no longer first
- Assembly can provide 'pseudoinstructions'
  - e.g., "move $t0, $t1" exists only in assembly
  - it would be implemented using "add $t0, $t1, $zero"
- When considering performance, you should count real instructions

39

Overview of MIPS

- Simple instructions, all 32 bits wide
- Very structured, no unnecessary baggage
- Only three instruction formats:
    R:  op | rs | rt | rd | shamt | funct
    I:  op | rs | rt | 16-bit address
    J:  op | 26-bit address
- Address fields in instructions are not 32 bits. How do we handle this with load and store instructions?

40

Addresses in Branches and Jumps

Instructions:
    bne $t4, $t5, Label   # next instruction is at Label if $t4 ≠ $t5
    beq $t4, $t5, Label   # next instruction is at Label if $t4 = $t5
    j   Label             # next instruction is at Label

Formats:
    I:  op | rs | rt | 16-bit address
    J:  op | 26-bit address

41

Addresses in Branches

Instructions:
    bne $t4, $t5, Label   # next instruction is at Label if $t4 ≠ $t5
    beq $t4, $t5, Label   # next instruction is at Label if $t4 = $t5

Format:
    I:  op | rs | rt | 16-bit address

- Could specify a register (like lw and sw) and add it to the address
  - Use the instruction address register (PC = program counter)
  - Most branches are local (principle of locality)
- Jump instructions just use the high-order bits of the PC
  - Address boundaries of 256 MB
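The 256 MB boundary comes from how a jump builds its 32-bit target: the upper 4 bits come from PC + 4, the 26-bit field supplies the next 26 bits, and two 0 bits finish the word-aligned address. The sketch below (hypothetical function names) computes the target and checks whether a destination is reachable by a single j/jal.

```c
#include <stdbool.h>
#include <stdint.h>

/* Pseudodirect target: upper 4 bits of PC + 4, then the 26-bit field,
 * then two 0 bits. */
uint32_t jump_target(uint32_t pc, uint32_t target_field)
{
    return ((pc + 4u) & 0xF0000000u) | ((target_field & 0x03FFFFFFu) << 2);
}

/* Reachable only if word-aligned and inside the same 256 MB (2^28-byte)
 * region as PC + 4. */
bool jump_reachable(uint32_t pc, uint32_t dest)
{
    return (dest & 3u) == 0 &&
           ((pc + 4u) & 0xF0000000u) == (dest & 0xF0000000u);
}
```

When the PC's upper 4 bits are 0, `j 2500` lands at 10000, matching the "4 × 2500" rule of thumb used earlier; farther destinations need a jr through a register.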

42

Addressing Modes

- Register addressing: the operand is in a register, e.g. add $s1, $s2, $s3
- Base or displacement addressing: the operand is at the memory location whose address is a register (the base) plus a constant in the instruction, e.g. the 2nd operand in lw $t0, 200($s1)
- Immediate addressing: the operand is a constant within the instruction, e.g. the 3rd operand in addi $s1, $s2, 10
- PC-relative addressing: address = (PC + 4) + (constant in instruction × 4), e.g. the 3rd operand in bne $s0, $s1, Exit
- Pseudodirect addressing: address = upper 4 bits of the PC concatenated with the 26-bit address in the instruction and two low-order 0 bits
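Base (displacement) addressing is what lets a 16-bit field reach anywhere in a 32-bit address space: the register supplies the full 32-bit base and the constant is a small signed displacement. A minimal sketch with a hypothetical `effective_address` function:

```c
#include <stdint.h>

/* Effective address for lw/sw: base register + sign-extended 16-bit
 * displacement, with 32-bit wraparound. */
uint32_t effective_address(uint32_t base_reg, uint16_t disp_field)
{
    int32_t disp = (int32_t)(disp_field ^ 0x8000u) - 0x8000;  /* sign-extend */
    return base_reg + (uint32_t)disp;
}
```

For `lw $t0, 200($s1)` with $s1 = 0x10010000 the address is 0x100100C8; a negative displacement such as -4 (field 0xFFFC) addresses the word just below the base, as in `-4($sp)`.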

43

Addressing Modes

[Figure: 1. Immediate addressing (op | rs | rt | Immediate); 2. Register addressing (op | rs | rt | rd | ... | funct, operand in a register); 3. Base addressing (op | rs | rt | Address, where register + Address selects a byte, halfword, or word in memory)]

44

Addressing Modes

[Figure: 4. PC-relative addressing (op | rs | rt | Address, where PC + Address selects a word in memory); 5. Pseudodirect addressing (op | Address, where the upper bits of the PC combined with Address select a word in memory)]

45

Other Issues

- The MIPS assembler accepts this pseudoinstruction even though it is not found in the MIPS architecture:
    move $t0, $t1         # $t0 ← $t1
  which it translates to:
    add  $t0, $zero, $t1
- Other pseudoinstructions: mul, blt, bge, etc.
- The assembler keeps track of the addresses of labels in a symbol table
- Details of the assembler, linker, & loader are given in Appendix A
- Details of the MIPS instruction set & architecture are in Appendix A

Frequency of instruction execution (%):

    Instruction class     gcc    spice
    Arithmetic            48%    50%
    Data transfer         33%    41%
    Conditional branch    17%     8%
    Jump & proc. call      2%     1%
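Branch pseudoinstructions such as blt and bge show why $at is reserved for the assembler: a typical expansion computes the comparison into $at with slt and then branches on it with bne or beq against $zero. The sketch below models that expansion as C functions (all names are hypothetical; real assemblers may expand differently) and checks that it behaves like a direct comparison.

```c
#include <stdbool.h>
#include <stdint.h>

static int32_t slt(int32_t rs, int32_t rt) { return rs < rt ? 1 : 0; }

/* blt $s1, $s2, L  =>  slt $at, $s1, $s2
 *                      bne $at, $zero, L */
bool blt_taken(int32_t s1, int32_t s2)
{
    int32_t at = slt(s1, s2);
    return at != 0;          /* bne $at, $zero, L */
}

/* bge $s1, $s2, L  =>  slt $at, $s1, $s2
 *                      beq $at, $zero, L */
bool bge_taken(int32_t s1, int32_t s2)
{
    int32_t at = slt(s1, s2);
    return at == 0;          /* beq $at, $zero, L */
}
```

Each pseudoinstruction here costs two real instructions, which is why performance estimates should count the expanded code.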

46

Instruction Set Architecture Classes

- Use of an accumulator (a default register): 1-address instructions
  - e.g. add A: acc ← acc + mem[A]
  - e.g. EDSAC, IBM 701, DEC PDP-8, MC 6800, Intel 8008
- Use of a stack: 0-address instructions
  - e.g. add: top(stack) ← top(stack) + next_top(stack)
- Use of general-purpose registers:
  - 2-address instructions, e.g. add A, B: A ← A + B
  - 3-address instructions, e.g. add A, B, C: A ← B + C
  - load/store (register/register): e.g. MIPS, Sun's SPARC, MC PowerPC, DEC Alpha
  - memory/memory: e.g. DEC VAX
  - memory/register: e.g. DEC VAX, IBM 360, DEC PDP-11, MC 68000, Intel 80386
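To compare the classes, the sketch below computes C = A + B three ways, one C statement per machine instruction. Memory is a hypothetical three-word array, and the generic mnemonics in the comments (load, add, store, push, pop) illustrate each class rather than any specific machine's syntax.

```c
#include <stdint.h>

enum { A = 0, B = 1, C = 2 };  /* hypothetical memory addresses */

/* Accumulator (1-address): load A; add B; store C */
void accumulator_style(int32_t mem[3])
{
    int32_t acc;
    acc = mem[A];          /* load A  : acc ← mem[A]       */
    acc = acc + mem[B];    /* add B   : acc ← acc + mem[B] */
    mem[C] = acc;          /* store C : mem[C] ← acc       */
}

/* Stack (0-address add): push A; push B; add; pop C */
void stack_style(int32_t mem[3])
{
    int32_t stack[4];
    int top = 0;                      /* number of items on the stack */
    stack[top++] = mem[A];            /* push A */
    stack[top++] = mem[B];            /* push B */
    int32_t rhs = stack[--top];       /* add: pop both operands ... */
    int32_t lhs = stack[--top];
    stack[top++] = lhs + rhs;         /* ... and push their sum */
    mem[C] = stack[--top];            /* pop C */
}

/* Load/store with registers (3-address add): load, load, add, store */
void load_store_style(int32_t mem[3])
{
    int32_t r1 = mem[A];   /* load  r1, A      */
    int32_t r2 = mem[B];   /* load  r2, B      */
    int32_t r3 = r1 + r2;  /* add   r3, r1, r2 */
    mem[C] = r3;           /* store r3, C      */
}
```

The load/store version uses the most instructions here, but its operands live in registers, so later uses of A and B would need no further memory accesses.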

47

RISC vs. CISC

RISC (Reduced Instruction Set Computer) philosophy: instruction sets are measured by how well compilers use them
- Emphasis on software
- Single-clock, reduced instructions only
- Register to register: "LOAD" and "STORE" are independent instructions
- Low cycles per instruction
- Large code sizes
- Spends more transistors on registers

CISC (Complex Instruction Set Computer):
- Emphasis on hardware
- Includes multi-clock complex instructions
- Memory to memory: "LOAD" and "STORE" incorporated in instructions
- Small code sizes, high cycles per instruction
- Transistors used for storing complex instructions

48

Chapter Summary

- Instruction Types
- Instruction Format
- Addressing Modes
- Classes of Instruction Set Architecture
- RISC vs. CISC