COMP 206: Computer Architecture and Implementation



Page 1: COMP 206: Computer Architecture and Implementation

This equipment is representative of the tabulating system invented and developed by Herman Hollerith (1860-1929) and built for the U.S. Census Bureau. These machines were first used in compiling the 1890 Census. Hollerith's patents were acquired by the Computing-Tabulating-Recording Co. (which later became IBM), and this work became the basis of the IBM Punched Card System. The first "tabulator" used simple clock-like counting devices. When an electrical circuit is closed (through a punched hole in a predetermined position on the card), each counter is actuated by an electromagnet. The unit's pointer (clock hand) moves one step each time the magnet is energized. The circuits to the electromagnets are closed by means of a hand-operated press type card reader. The operator places each card in the reader, pulls down a lever, and removes the card after each punched hole is counted.

- IBM Archives (http://www-03.ibm.com/ibm/history/exhibits/attic/attic_071.html)

Page 2: COMP 206: Computer Architecture and Implementation

COMP 206: Computer Architecture and Implementation

Montek Singh

Jan 27-29, 2009

Lecture 4: Instruction Set Architecture

Page 3: COMP 206: Computer Architecture and Implementation

Approaching an ISA

Instruction Set Architecture
  Defines set of operations, instruction format, hardware-supported data types, named storage, addressing modes, sequencing
  Meaning of each instruction is described by RTL on architected registers and memory

Page 4: COMP 206: Computer Architecture and Implementation

Moving Toward Design

Given technology constraints, assemble an adequate datapath
  Architected storage mapped to actual storage
  Function units to do all the required operations
  Possible additional storage (e.g., MAR)
  Interconnect to move information among registers and FUs
Map each instruction to a sequence of RTLs
Collate sequences into a symbolic controller state transition diagram (STD)
Implement controller

Page 5: COMP 206: Computer Architecture and Implementation

Datapath vs Control

Datapath: storage, FUs, and interconnect sufficient to perform the desired functions
  Inputs are control points
  Outputs are signals (such as overflow, negative, etc.)
Controller: state machine to orchestrate operation on the datapath
  Based on desired function and signals

[Figure: the Controller drives the Datapath through control points; the Datapath returns signals to the Controller]

Page 6: COMP 206: Computer Architecture and Implementation

Contents

Design objectives
Information representation
  Endian-ness, aligned access
Organization of instructions
Encoding

Page 7: COMP 206: Computer Architecture and Implementation

Instruction Set Design Objective #1

Code size (code density)
  Depends on:
    size of MM/cache
    access time of cache (on-chip/off-chip)
    CPU-MM bandwidth
  Frequently used instructions should be short
    Implies variable-length instructions
    But there are negatives to this

Page 8: COMP 206: Computer Architecture and Implementation

Instruction Set Design Objective #2

Execution speed (performance)
  Only frequently executed instructions should be included in the instruction set
    Infrequently executed instructions slow down the others
    Complex and long instructions tend to be used infrequently
    Defining hardware-software interface
  Frequently executed instructions should be fast
  Pipelining should be made as easy as possible
    Overlapped execution lowers CPI value
    Single instruction length, simple instruction formats, and few addressing modes for easy decoding
    Three (register) address instructions decouple CPU and memory

Page 9: COMP 206: Computer Architecture and Implementation

Instruction Set Design Objective #3

Minimize size and complexity of hardware (ALU/Control)
  Implementing infrequently executed instructions ties down hardware that is rarely used, and could be used for some other purpose with greater advantage

Page 10: COMP 206: Computer Architecture and Implementation

Instruction Set Design Objective #4

Instruction set as a programming language
  Needs of a human programmer (less important today)
    Several desirable properties of instruction sets have been recognized and described, such as orthogonality (each operand can be specified independently of the others) and consistency (being able to predict the remainder of an architecture given partial knowledge of the system)
  Needs of an optimizing compiler
    Simple instructions are more suitable for code optimizations
    Optimizing compilers try to find the shortest or fastest code sequence that implements the semantics of a HLL program. To make code reorganization tractable, an instruction set is needed that makes:
      – the size of each instruction easy to calculate;
      – the execution time of each instruction easy to calculate;
      – the interactions between instructions easy to figure out.
    ISA features such as complex addressing modes, variable-length instructions, and special-purpose registers provide too many ways of doing the same thing and lead to combinatorial explosion

Page 11: COMP 206: Computer Architecture and Implementation

Notations for Information Representation

64 bits = 8 bytes = 2 words = 1 doubleword

Q: How do we number these various units of information in a consistent manner?

Digits:                      9 6 2 1 7 6 6
“Big End”-ian numbering:     0 1 2 3 4 5 6
“Little End”-ian numbering:  6 5 4 3 2 1 0

The leftmost digit is the Most Significant Digit (MSD), the “Big End”; the rightmost digit is the Least Significant Digit (LSD), the “Little End”.

“On holy wars and a plea for peace”, Danny Cohen, IEEE Computer 14(10), pages 49-54, Oct 1981

Page 12: COMP 206: Computer Architecture and Implementation

Why Is Numbering Important?

English text is written left-to-right and the characters are numbered left-to-right
Numbers can be numbered in two different ways
Memory locations are numbered (addresses)
Consequences of numbering
  Data is stored in memory according to byte numbering (the lower-numbered byte goes into a byte in memory with a smaller address)
  Data is sent through a bit-serial communication channel according to bit numbering (bit 0 goes first, followed by bit 1, etc.)
When displaying computer representation for humans
  Numbers are written in the usual way (MSD on left, LSD on right)
  Text is written in such a way as to match the numbering of numbers

Page 13: COMP 206: Computer Architecture and Implementation

Odds and Ends about Numbering

The Little Endian notation is compatible with mathematical conventions of positional notation
The Little Endian notation has the disadvantage that it displays English text in reverse
  To overcome this, manuals for Little Endian machines usually display character strings vertically
Example machines
  Little Endian: PDP-11, VAX, 80x86
  Big Endian: IBM 370, MIPS, DLX, SPARC
  Mixed: Motorola 68000, Z8000
    Big Endian byte ordering
    Little Endian bit ordering
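As a small illustration of the two byte orderings, the sketch below uses Python's standard `struct` module to lay out one 32-bit value both ways. The value 0x0A0B0C0D and the helper name `byte_layout` are arbitrary choices for this example and do not come from the lecture.

```python
import struct

def byte_layout(value, order):
    """Return the bytes of a 32-bit unsigned value, lowest address first.

    order is "big" or "little" (names chosen for this sketch).
    """
    fmt = ">I" if order == "big" else "<I"
    return list(struct.pack(fmt, value))

value = 0x0A0B0C0D
big = byte_layout(value, "big")        # most significant byte at the lowest address
little = byte_layout(value, "little")  # least significant byte at the lowest address

for addr, (b, l) in enumerate(zip(big, little)):
    print(f"address +{addr}: big-endian 0x{b:02X}   little-endian 0x{l:02X}")
```

Reading the little-endian bytes back with the big-endian convention would yield 0x0D0C0B0A, which is why the two sides of a channel or file format have to agree on the numbering.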

Page 14: COMP 206: Computer Architecture and Implementation

Alignment of Words in Memory

CPU accesses a 32-bit word of data starting at byte address x…x00
  Such an address (multiple of 32[b]/8[b/B] = 4[B]) is called word-aligned
  Memory controller is simple and fast, data available in one cycle
CPU accesses a 32-bit word of data starting at byte address 01111
  Byte addresses are 01111, 10000, 10001, 10010 (misaligned address)
  Doubles the access time of word
Requiring aligned addresses results in simpler memory controller and faster execution
  Costs some loss of storage, and adds complexity in code generators

[Figure: four 8-bit memory banks (00, 01, 10, 11) connected to a memory controller that delivers 32 bits]
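The alignment rule can be sketched in a few lines. This is a simplified model, assuming four byte-wide banks selected by the two low address bits and a cost of one cycle per distinct bank row touched; the function names and the cost model are assumptions made for illustration.

```python
WORD_BYTES = 4  # 32-bit word / 8 bits per byte
NUM_BANKS = 4   # one byte-wide bank per byte lane, as in the slide's figure

def is_word_aligned(addr):
    """A word-aligned byte address has its two low-order bits equal to 00."""
    return addr % WORD_BYTES == 0

def bank_accesses(addr):
    """Return (bank, row) pairs touched by a 32-bit access starting at addr.

    Bank = low two address bits; row = the remaining high bits.
    """
    return [((addr + i) % NUM_BANKS, (addr + i) // NUM_BANKS)
            for i in range(WORD_BYTES)]

def memory_cycles(addr):
    """Assumed cost model: one cycle per distinct row touched."""
    return len({row for _, row in bank_accesses(addr)})

for addr in (0b01100, 0b01111):
    print(f"{addr:05b}: aligned={is_word_aligned(addr)}, "
          f"cycles={memory_cycles(addr)}, accesses={bank_accesses(addr)}")
```

Under this model the access at 01111 touches two rows (one byte from bank 11, then three bytes from the next row), which matches the slide's claim that a misaligned access doubles the access time.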

Page 15: COMP 206: Computer Architecture and Implementation

Sub-Word Accesses

Byte operand in register is usually the rightmost byte of register
Byte may come from any of the four memory banks
Needs routing/permuting hardware
  Either at memory side of bus (justified bus)
    Byte always travels on rightmost quarter of bus
  Or on CPU side (unjustified bus)
    Bus lanes are extensions of memory bank lanes
  Source of complications in either case

[Figure: four 8-bit memory banks (00, 01, 10, 11) connected through a memory controller and a 32-bit bus to the CPU register file (32 bits)]

Page 16: COMP 206: Computer Architecture and Implementation

What is Used?

[Figure not reproduced; data source: SPEC2000]

Page 17: COMP 206: Computer Architecture and Implementation

Organization of an Instruction

Machine instruction
  Syntax
    Format
      Length
        Uniform (e.g., MIPS: 4 bytes)
        Variable (e.g., VAX: 1-37 bytes)
      Opcode
      Operands
        Number: 0 address, 1 address, 2 address, 3 address, implied
        Where: instruction, register, memory
        How specified: addressing modes (immediate, absolute, computed)
      Specifiers
        1) Length of operands
        2) Shift/rotate: direction, amount
        3) Branch condition
  Semantics
    Processing: arithmetic, logical, shift
    Data movement: load (from MM), store (to MM), move (reg-reg), move (MM-MM)
    Transfer of control: unconditional (branch), conditional (jump), call, return
    I/O operations: if I/O is not memory-mapped

Page 18: COMP 206: Computer Architecture and Implementation

Operand Locations

[Figure not reproduced]

Page 19: COMP 206: Computer Architecture and Implementation

Classification by Operands

                   | Stack                  | Accumulator                        | GPR: Load/Store   | GPR: Reg/Mem                                       | GPR: Mem/Mem
ALU operations     | 0 address              | 1 address                          | 3 address         | 2 (or 1.5) address                                 | 3 address
Explicit operands  |                        | (1,1)                              | (0,3)             | (1,2), (1,3), (2,2)                                | (3,3)
Instruction size   | Short                  | Short                              | 4 bytes           | 2/4/6 bytes                                        | variable
Needs separate     | Load/Store             | Load/Store                         | Load/Store        | Store                                              |
Early examples     | Burroughs B5000-B7500  | PDP-8, Intel 8086, Motorola 6809   | CDC 6600          | IBM S/360, IBM S/370                               | DEC VAX-11/780
Current examples   | Transputer             |                                    | All RISC machines | IBM 3033, IBM S/390, Amdahl V, Hitachi, Fujitsu    |
Orthogonality      |                        |                                    | Farthest from     | Intermediate                                       | Closest to
Pipelining         |                        |                                    | Easiest           | Intermediate                                       | Hardest

(m,n) means m memory operands, n total operands

Important machines that are difficult to classify
  Intel 80x86
    Variable instruction size: 1-17 bytes
    Memory can be destination
    Uses implied registers
  Motorola 680x0
    Instruction size: 2, 4, 6, 8, 10 bytes
    Two address format only (2, 2)

Page 20: COMP 206: Computer Architecture and Implementation

Registers versus Cache

Similarities
  Both small, fast, and expensive (flip-flops)
  Both used to increase execution speed of CPU
  Both operate based on locality of reference
Differences
  Registers are visible in ISA; caches are not (except for instructions for invalidation, prefetch, or flushing)
  Number of registers is fixed by instruction format; size of cache is easily changeable
  Registers have higher BW: 3 words/cycle, and are random-access; caches have lower BW: 1 word/cycle, and are associative
  Register access time is fixed; cache access time is statistical
  Register allocation is explicit by compiler; cache allocation is automatic
  Registers require fewer bits to address; caches require full memory addresses
  Registers create no I/O problems; caches do

Page 21: COMP 206: Computer Architecture and Implementation

Organization of Registers

One general-purpose set (all interchangeable, “typeless”)
One general-purpose set (a few with dedicated uses)
  PDP-11: eight 16-bit registers (R6: stack pointer, R7: PC)
  VAX 11/780: sixteen 32-bit registers (four special-purpose, R14: stack pointer, R15: PC)
Two sets
  Motorola 68000: eight 32-bit data, eight 32-bit address
  IBM 370: sixteen 32-bit integer, four 64-bit FP
  DLX, MIPS: 31 32-bit integer, 32 32-bit FP
Three sets
  CDC 6600: eight 18-bit integer, eight 18-bit address, eight 60-bit FP
Many registers with dedicated use
  Intel 80x86

Page 22: COMP 206: Computer Architecture and Implementation

Addressing Modes

We can’t directly refer to data values, only their addresses
  Except for immediate operands
Register deferred and direct addressing modes can be synthesized from displacement addressing mode

Name               | Example              | Meaning                                      | When used
Register           | add r4, r3           | R[r4] := R[r4]+R[r3]                         | When value is in register
Immediate          | add r4, #3           | R[r4] := R[r4]+3                             | For constants
Displacement       | add r4, 100(r1)      | R[r4] := R[r4]+M[100+R[r1]]                  | Accessing local variables
Register deferred  | add r4, (r1)         | R[r4] := R[r4]+M[R[r1]]                      | Pointer, computed address
Indexed            | add r3, (r1+r2)      | R[r3] := R[r3]+M[R[r1]+R[r2]]                | Array addressing
Direct             | add r1, (1001)       | R[r1] := R[r1]+M[1001]                       | Static data
Memory indirect    | add r1, @(r3)        | R[r1] := R[r1]+M[M[R[r3]]]                   | Pointer dereferencing
Autoincrement      | add r1, (r2)+        | R[r1] := R[r1]+M[R[r2]]; R[r2] := R[r2]+d    | Stepping through array
Autodecrement      | add r1, -(r2)        | R[r2] := R[r2]-d; R[r1] := R[r1]+M[R[r2]]    | Stepping through array
Scaled             | add r1, 100(r2)[r3]  | R[r1] := R[r1]+M[100+R[r2]+d*R[r3]]          | Array indexing

R: the register file
M: the memory address space
d: the size of the data item being accessed (1, 2, 4, 8 bytes)
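The RTL meanings in the table can be exercised with a small operand-fetch sketch. The function `operand`, the mode names, and the register and memory contents are all hypothetical, chosen only to mirror the table; the register `r0` is assumed to hold zero for the demonstration.

```python
def operand(mode, R, M, reg=None, reg2=None, const=0, d=4):
    """Return the source operand value for one addressing mode.

    R: register file (dict), M: memory (dict keyed by address),
    d: size of the data item in bytes. Autoincrement and autodecrement
    update R as a side effect, following the table's RTL.
    """
    if mode == "register":
        return R[reg]
    if mode == "immediate":
        return const
    if mode == "displacement":
        return M[const + R[reg]]
    if mode == "register_deferred":
        return M[R[reg]]
    if mode == "indexed":
        return M[R[reg] + R[reg2]]
    if mode == "direct":
        return M[const]
    if mode == "memory_indirect":
        return M[M[R[reg]]]
    if mode == "autoincrement":
        value = M[R[reg]]
        R[reg] += d
        return value
    if mode == "autodecrement":
        R[reg] -= d
        return M[R[reg]]
    if mode == "scaled":
        return M[const + R[reg] + d * R[reg2]]
    raise ValueError(f"unknown mode: {mode}")

# Made-up machine state for the demonstration.
R = {"r0": 0, "r1": 200, "r2": 8, "r3": 2, "r5": 300}
M = {200: 7, 204: 11, 208: 13, 300: 200, 1001: 9}

# Register deferred is displacement with a zero displacement.
print(operand("register_deferred", R, M, reg="r1"),
      operand("displacement", R, M, reg="r1", const=0))
# Direct is displacement off a register that holds zero.
print(operand("direct", R, M, const=1001),
      operand("displacement", R, M, reg="r0", const=1001))
```

The two printed pairs agree, which is the sense in which register deferred and direct addressing can be synthesized from displacement addressing.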

Page 23: COMP 206: Computer Architecture and Implementation

Frequency of Addressing Modes

[Figure not reproduced; data source: SPEC2000]

Register addressing accounts for ½ of all operand references. The figure covers the other ½.

Page 24: COMP 206: Computer Architecture and Implementation

Address Displacement Sizes

[Figure not reproduced; data source: SPEC2000]

This type of data would help you decide how much space to allocate to displacement. Tested on a machine with 16 bits of displacement, so larger displacements can’t be evaluated.
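One way to turn such measurements into a field-width decision is to count how many observed displacements fit in a signed field of a given size. The sketch below assumes two's-complement displacements; the sample values are made up for illustration and are not SPEC2000 data, and the function names are hypothetical.

```python
def signed_bits_needed(value):
    """Smallest two's-complement field width (in bits) that can hold value."""
    bits = 1
    while not (-(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1):
        bits += 1
    return bits

def fraction_covered(displacements, field_bits):
    """Fraction of displacements that fit in a field of field_bits bits."""
    fits = sum(1 for v in displacements if signed_bits_needed(v) <= field_bits)
    return fits / len(displacements)

# Made-up displacement values for illustration only.
sample = [0, 4, -8, 100, -129, 2047, 32767, -32768]
for width in (8, 12, 16):
    print(f"{width}-bit field covers {fraction_covered(sample, width):.0%}")
```

The same counting applies to immediate operands when choosing the size of an immediate field.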

Page 25: COMP 206: Computer Architecture and Implementation

Use of Immediate Operands

[Figure not reproduced]

Page 26: COMP 206: Computer Architecture and Implementation

Length of Immediate Operands

[Figure not reproduced]

Max size was 16 bits. The HP (Hennessy & Patterson) book says that a study on VAX (32-bit immediates) showed 20-25% were longer than 16 bits.

Page 27: COMP 206: Computer Architecture and Implementation

Control Transfer Instructions

Terminology
  BTA (Branch Target Address): the destination address of the branch
    The BTA is static if it is always the same during execution
    The BTA is dynamic if it can vary during a single execution of a program (procedure return, O-O dynamic dispatch, switch statements are major examples)
  Branch taken if next instruction to be executed is at address BTA
  Branch not taken if next instruction to be executed is the one following the branch instruction (“fall-through”)
  Branch outcome: whether the branch is taken or not taken
  Forward branch: BTA > (PC), where (PC) is the address of the branch instruction
  Backward branch: BTA < (PC)
  An unconditional branch is always taken

Page 28: COMP 206: Computer Architecture and Implementation

Code Generation Examples for Branches

Register r3 contains y
Register r4 contains z
Register r5 contains a
Register r6 contains b
Register r7 contains x

if (x > 0) y += z; else y -= z;

        blez  r7, L18
        addu  r3, r3, r4
        j     L33
L18:    subu  r3, r3, r4
L33:

while (a < b) { a++; b--; x++; }

        j     L33
L34:    addu  r5, r5, 1
        addu  r6, r6, -1
        addu  r7, r7, 1
L33:    slt   r2, r5, r6
        bne   r2, r0, L34

Page 29: COMP 206: Computer Architecture and Implementation

Classification of Branches

HP terminology  | Branch                    | Jump             | Call             | Return
                | Conditional               | Unconditional    | Unconditional    | Unconditional
HLL equivalent  | IF-THEN                   | GOTO             | CALL             | RETURN
Relative freq.  | 83%                       | 5%               | 6%               | 6%
Taken           | With probability T        | always           | always           | always
Not taken       | With probability 1-T      | never            | never            | never
BTA static      | most often (PC-relative)  | PC-relative      | most frequent    | never
BTA dynamic     | usually not allowed       | BTA in register  | BTA in register  | always

          | Taken | Not Taken
Forward   | F&T   | F&NT
Backward  | B&T   | B&NT

Classifying branches into these four groups permits us to compute some of the dynamic frequencies if some others have been measured.

Rule of thumb: Backward branches tend to be taken, forward branches tend not to be taken. Why?
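The four-way classification can be tried on a synthetic trace. The sketch below is a thought experiment, not a measurement: it assumes a single loop-closing backward branch, and the addresses, iteration count, and function names are hypothetical.

```python
from collections import Counter

def classify(pc, bta, taken):
    """Label one dynamic branch as F&T, F&NT, B&T, or B&NT."""
    direction = "F" if bta > pc else "B"
    return f"{direction}&{'T' if taken else 'NT'}"

def loop_trace(iterations, branch_pc=0x40, loop_top=0x20):
    """Hypothetical trace of a loop-closing backward branch.

    The branch is taken on every iteration except the last one,
    when the loop exits by falling through.
    """
    return [(branch_pc, loop_top, i < iterations - 1)
            for i in range(iterations)]

counts = Counter(classify(*event) for event in loop_trace(10))
print(counts)
```

In this trace the backward branch is taken on nine of ten executions, which suggests one common explanation for the rule of thumb: backward branches usually close loops, and loops typically run for many iterations.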

Page 30: COMP 206: Computer Architecture and Implementation

Evaluating Branch Conditions

Name                | How is condition tested?                           | Advantages                           | Disadvantages
Condition code      | Special bits set by ALU ops                        | Sometimes condition is set for free  | Extra state, additional constraints on instruction reordering
Condition register  | Test arbitrary register with result of comparison  | Simple                               | Uses up a register
Compare and branch  | Compare is part of branch                          | One instruction rather than two      | May be too much work per instruction

Typical set of condition codes (e.g., Motorola 680x0)
  NegativeResult, ZeroResult, ArithmeticOverflow, CarryOut
Many RISC machines do not use condition codes (e.g., MIPS, Alpha)
  Magnitude comparisons are done with explicit COMPARE instructions that put their results into named registers
  Some instructions have two variants: one traps on overflow, the other does not
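To make the four condition codes concrete, here is a minimal sketch of how they could be derived for a 32-bit addition. The flag letters N, Z, V, C are shorthand assumed for NegativeResult, ZeroResult, ArithmeticOverflow, and CarryOut; this models the general idea, not any particular machine's exact rules.

```python
MASK32 = 0xFFFFFFFF

def sign_bit(x):
    """Bit 31 of a 32-bit value."""
    return (x >> 31) & 1

def add_with_flags(a, b):
    """Add two 32-bit values; return (result, flags) with N, Z, V, C bits."""
    a &= MASK32
    b &= MASK32
    full = a + b
    result = full & MASK32
    flags = {
        "N": sign_bit(result),        # NegativeResult
        "Z": int(result == 0),        # ZeroResult
        # Signed overflow: both operands share a sign that the result lacks.
        "V": int(sign_bit(a) == sign_bit(b) and sign_bit(result) != sign_bit(a)),
        "C": int(full > MASK32),      # CarryOut of bit 31
    }
    return result, flags

for a, b in ((0x7FFFFFFF, 1), (0xFFFFFFFF, 1), (5, 7)):
    result, flags = add_with_flags(a, b)
    print(f"0x{a:08X} + 0x{b:08X} = 0x{result:08X}  {flags}")
```

The first case overflows as a signed addition without producing a carry, while the second produces a carry without signed overflow, so V and C are separate pieces of state.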

Page 31: COMP 206: Computer Architecture and Implementation

Branch Distance

[Figure not reproduced]

Page 32: COMP 206: Computer Architecture and Implementation

Instruction Encoding

These days encoding is more important for embedded processors. PowerPC compresses code in memory and uncompresses it in the icache.

Page 33: COMP 206: Computer Architecture and Implementation

“Typical” RISC ISA

32-bit fixed format instruction (3 formats)
32 32-bit GPRs (R0 contains zero, DP values take a register pair)
3-address, reg-reg arithmetic instruction
Single address mode for load/store: base + displacement
  no indirection
Simple branch conditions
Delayed branch

see: SPARC, MIPS, HP PA-RISC, DEC Alpha, IBM PowerPC, CDC 6600, CDC 7600, Cray-1, Cray-2, Cray-3

Page 34: COMP 206: Computer Architecture and Implementation

Example: MIPS

Register-Register
  Op [31:26] | Rs1 [25:21] | Rs2 [20:16] | Rd [15:11] | Opx [10:0] (the slide also marks bit positions 6 and 5)

Register-Immediate
  Op [31:26] | Rs1 [25:21] | Rd [20:16] | immediate [15:0]

Branch
  Op [31:26] | Rs1 [25:21] | Rs2/Opx [20:16] | immediate [15:0]

Jump / Call
  Op [31:26] | target [25:0]
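The field layouts on this slide translate directly into shift-and-mask code. The following is a sketch under the assumption that the fields sit at the bit positions marked on the slide; the function names and the opcode values used in the example are placeholders, and callers are assumed to pass opcodes and register numbers that fit their fields.

```python
def encode_rr(op, rs1, rs2, rd, opx):
    """Register-Register: Op[31:26] Rs1[25:21] Rs2[20:16] Rd[15:11] Opx[10:0]."""
    return (op << 26) | (rs1 << 21) | (rs2 << 16) | (rd << 11) | (opx & 0x7FF)

def encode_ri(op, rs1, rd, imm):
    """Register-Immediate: Op[31:26] Rs1[25:21] Rd[20:16] immediate[15:0]."""
    return (op << 26) | (rs1 << 21) | (rd << 16) | (imm & 0xFFFF)

def encode_jump(op, target):
    """Jump / Call: Op[31:26] target[25:0]."""
    return (op << 26) | (target & 0x3FFFFFF)

def decode_ri(word):
    """Split a Register-Immediate word back into its fields."""
    imm = word & 0xFFFF
    if imm & 0x8000:          # sign-extend the 16-bit immediate
        imm -= 0x10000
    return {"op": word >> 26, "rs1": (word >> 21) & 0x1F,
            "rd": (word >> 16) & 0x1F, "imm": imm}

# Placeholder opcode 8 with Rs1 = r1, Rd = r4, immediate = -3.
word = encode_ri(op=8, rs1=1, rd=4, imm=-3)
print(f"0x{word:08X}", decode_ri(word))
```

Because every format keeps Op in bits 31:26 and the register fields in fixed positions, the decoder can extract the register numbers before it knows which format the instruction uses, which is part of what makes a fixed-length encoding easy to pipeline.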

Page 35: COMP 206: Computer Architecture and Implementation

Next Time

Pipelining
  If you’ve never looked at pipelining, read Appendix A; otherwise skim it