graduate computer architecture chapter 2
TRANSCRIPT
1
Graduate Computer Architecture
Chapter 2 âInstruction Set Principles
2
Outline
Classifying instruction set architecturesMemory addressingAddressing modesType and size of operands Instructions for controlThe role of compiler
3
Brief Introduction to ISA
Instruction Set Architecture: a set of instructionsâEach instruction is directly executed by the CPUâs hardware
How is it represented?âBy a binary format since the hardware understands only bits
Concatenate together binary encoding for instructions, registers,constants, memories
4
Brief Introduction to ISA (cont.)
Options - fixed or variable length formatsâ Fixed - each instruction encoded in same size field (typically 1 word)â Variable âhalf-word, whole-word, multiple word instructions are possible
Typical physical blobs are bits, bytes, words, n-words Word size is typically 16, 32, 64 bits today
5
An Example of Program Execution
Commandâ Load AC from
Memoryâ Add to AC from
memoryâ Store AC to memory
Add the contents ofmemory 940 to thecontent of memory941 and stores theresult at 941
Fetch Execution
6
Instruction Set Design
timeCycleCPIICTimeCPU _**_ The instruction set influences everything
7
Classifying Instruction Set Architectures
How is typing done? How is the sizespecified
Type and size of operands
What are the options for the opcode?Operations
How is the effective address for anoperand calculated?Can all operands use any mode?
Addressing Modes
How many? Min, Max - maybe evenaverage
Number of explicitoperandsnamed per instruction
Where are they other than memoryOperand Storage in CPU
These choices critically affect - #instructions, CPI, andcycle time
8
Basic CPU Storage Options
9
Comparison
10
Classifying ISAsAccumulator (before 1960):
1 address add A acc acc + mem[A]
Stack (1960s to 1970s):0 address add tos tos + next
Memory-Memory (1970s to 1980s):2 address add A, B mem[A] mem[A] + mem[B]3 address add A, B, C mem[A] mem[B] + mem[C]
Register-Memory (1970s to present):2 address add R1, A R1 R1 + mem[A]
load R1, A R1 <_ mem[A]
Register-Register (Load/Store) (1960s to present):3 address add R1, R2, R3 R1 R2 + R3
load R1, R2 R1 mem[R2]store R1, R2 mem[R1] R2
11
Classifying ISAs
12
Stack Architectures Instruction set:
add, sub, mult, div, . . .push A, pop A
Example: A*B - (A+C*B)push Apush Bmulpush Apush Cpush Bmuladdsub
A BA
A*BA*B
A*BA*B
AAC
A*BA A*B
A C B B*C A+B*C result
13
Stacks: Pros and Cons Pros
â Good code density (implicit operand addressing top of stack)â Low hardware requirementsâ Easy to write a simpler compiler for stack architectures
Consâ Stack becomes the bottleneckâ Little ability for parallelism or pipeliningâ Data is not always at the top of stack when need, so additional
instructions like TOP and SWAP are neededâ Difficult to write an optimizing compiler for stack architectures
14
Accumulator Architectures
âąInstruction set:add A, sub A, mult A, div A, . . .load A, store A
âąExample: A*B - (A+C*B)load Bmul Cadd Astore Dload Amul Bsub D
B B*C A+B*C AA+B*C A*B result
15
Accumulators: Pros and Cons
âąProsâVery low hardware requirementsâEasy to design and understand
âąConsâAccumulator becomes the bottleneckâLittle ability for parallelism or pipeliningâHigh memory traffic
16
Memory-Memory Architectures
âąInstruction set:(3 operands) add A, B, C sub A, B, C mul A, B, C
âąExample: A*B - (A+C*B)â3 operands
mul D, A, Bmul E, C, Badd E, A, Esub E, D, E
17
Memory-Memory:Pros and Cons
âąProsâRequires fewer instructions (especially if 3
operands)âEasy to write compilers for (especially if 3
operands)âąCons
âVery high memory traffic (especially if 3 operands)âVariable number of clocks per instruction
(especially if 2 operands)âWith two operands, more data movements are
required
18
Register-Memory Architectures
âąInstruction set:add R1, A sub R1, A mul R1, Bload R1, A store R1, A
âąExample: A*B - (A+C*B)load R1, Amul R1, B /* A*B */store R1, Dload R2, Cmul R2, B /* C*B */add R2, A /* A + CB */sub R2, D /* AB - (A + C*B) */
19
Memory-Register:Pros and Cons
âąProsâSome data can be accessed without loading firstâInstruction format easy to encodeâGood code density
âąConsâOperands are not equivalent (poor orthogonality)âVariable number of clocks per instruction
20
Load-Store Architectures
âąInstruction set:add R1, R2, R3 sub R1, R2, R3 mul R1, R2, R3load R1, R4 store R1, R4
âąExample: A*B - (A+C*B)load R1, &Aload R2, &Bload R3, &Cload R4, R1load R5, R2load R6, R3mul R7, R6, R5 /* C*B */add R8, R7, R4 /* A + C*B */mul R9, R4, R5 /* A*B */sub R10, R9, R8 /* A*B - (A+C*B) */
21
Load-Store:Pros and Cons
âąProsâSimple, fixed length instruction encodingâInstructions take similar number of cyclesâRelatively easy to pipeline
âąConsâHigher instruction countâNot all instructions need three operandsâDependent on good compiler
22
Registers:Advantages and Disadvantages
âąAdvantagesâFaster than cache (no addressing mode or tags)âCan replicate (multiple read ports)âShort identifier (typically 3 to 8 bits)âReduce memory traffic
âąDisadvantagesâNeed to save and restore on procedure calls and context
switchâCanât take the address of a register (for pointers)âFixed size (canât store strings or structures efficiently)âCompiler must manage
23
Proâs and Conâs of Stack, Accumulator,Register Machine
24
It is the most common choice in todayâs general-purpose computers
Which register is specified by small âaddressâ(3 to6 bits for 8 to 64 registers)
Load and store have one long & one short address:One and half addresses
General Register Machine and InstructionFormats
25
Real Machines Are Not So Simple
Most real machines have a mixture of address instructionsA distinction can be made on whether arithmetic
instructions use data from memory If ALU instructions only use registers for operands and
result, machine type is load-storeâOnly load and store instructions reference memory
Other machines have a mix of register-memory andmemory-memory instructions
26
Combinations of Number of Memory Addressesand Operands Allowed
VAXMemory-memory33
VAXMemory-memory22
IBM 360/370, Intel80x86, Motorola68000, TITMS320C54x
Register-memory21
Alpha, ARM, MIPS,PowerPC, SPARC,SuperH, TrimediaTM5200
Register-register30
ExamplesTypes ofarchitecture
Maximum numberof operandsallowed
Number ofmemoryaddress
27
Compare Three Common General -PurposeRegister Computers
where (m,n) means m memory operands and n total operands
28
Outline
Classifying instruction set architecturesMemory addressingAddressing modesType and size of operands Instructions for controlThe role of compiler
29
Memory Addressing
How memory addresses are interpretedâEndian orderâAlignment
How architectures specify the address of an objectthey will accessâAddressing modes
30
Memory Addressing (cont.)
All instruction sets discussed in this book are byteaddressed
The instruction sets provide access for bytes (8 bits), halfwords (16 bits), words (32 bits), and even double words (64bits)
Two conventions for ordering the bytes within a larger objectâ Little Endianâ Big Endian
31
Little Endian
The low-order byte of an object is stored in memory at thelowest address, and the high-order byte at the highestaddress. (The little end comes first.)
For example, a 4-byte objectâ (Byte3 Byte2 Byte1 Byte0)â Base Address+0 Byte0â Base Address+1 Byte1â Base Address+2 Byte2â Base Address+3 Byte3
Intel processors (those used in PC's) use "Little Endian" byteorder.
Dr. William T. Verts An Essay on Endian Order, http://www.cs.umass.edu/~verts/cs32/endian.html, April 19, 1996
32
Big Endian
The high-order byte of an object is stored in memory at thelowest address, and the low-order byte at the highestaddress. (The big end comes first.)
For example, a 4-byte objectâ (Byte3 Byte2 Byte1 Byte0)â Base Address+0 Byte3â Base Address+1 Byte2â Base Address+2 Byte1â Base Address+3 Byte0
Dr. William T. Verts An Essay on Endian Order, http://www.cs.umass.edu/~verts/cs32/endian.html, April 19, 1996
33
Endian Order is Also Important to File Data
Adobe Photoshop -- Big Endian BMP (Windows and OS/2 Bitmaps) -- Little Endian DXF (AutoCad) -- Variable GIF -- Little Endian JPEG -- Big Endian PostScript -- Not Applicable (text!) Microsoft RIFF (.WAV & .AVI) -- Both, Endian identifier encoded into file Microsoft RTF (Rich Text Format) -- Little Endian TIFF -- Both, Endian identifier encoded into file
Dr. William T. Verts An Essay on Endian Order, http://www.cs.umass.edu/~verts/cs32/endian.html, April 19, 1996
34
MIPS requires that all words start at addresses thatare multiples of 4 bytes
Called Alignment: objects must fall on address thatis multiple of their size
0 1 2 3Aligned
NotAligned
A Note about Memory: Alignment
35
Alignment IssuesâąIf the architecture does not restrict memory accesses to be
aligned thenâSoftware is simpleâHardware must detect misalignment and make 2 memory accessesâExpensive detection logic is requiredâAll references can be made slower
âąSometimes unrestricted alignment is required for backwardscompatibility
âąIf the architecture restricts memory accesses to be aligned thenâSoftware must guarantee alignmentâHardware detects misalignment access and trapsâNo extra time is spent when data is aligned
âąSince we want to make the common case fast, having restrictedalignment is often a better choice, unless compatibility is anissue
36
Memory Addressing
Alignment restrictionsâAccesses to objects larger than a byte must be alignedâAn access to an object of size s bytes at byte address A is
aligned if A mod s = 0âA misaligned access takes multiple aligned memory references
37
Outline
Classifying instruction set architecturesMemory addressingAddressing modesType and size of operands Instructions for controlThe role of compiler
38
Memory Addressing
All architectures must address memory
A number of questions naturally arise What is accessed - byte, word, multiple words?
â todayâs machine are byte addressable this is a legacy and probably makes little sense otherwise
â main memory is really organized in n byte lines e.g. the cache model
Hence there is a natural alignment problemâ accessing a word or double-word which crosses 2 lines
requires 2 referencesâ automagic alignment is possible but hides the number of references
also therefore hides an important case of CPI bloat hence a bad idea - guess which company does this?
39
Addressing Modes
An important aspect of ISA designâhas major impact on both the HW complexity and the ICâHW complexity affects the CPI and the cycle time
Basically a set of mappingsâfrom address specified to address usedâaddress used = effective addressâeffective address may go to memory or to a register array
which is typically dependent on itâs location in the instruction field in some modes multiple fields are combined to form a memory address register addresses are usually more simple - e.g. they need to be fast
âeffective address generation is an important focus since it is the common case - e.g. every instruction needs it it must also be fast
40
Example for Addressing Modes
Accessing using a pointer ora computed address
Regs[R4] <- Regs[R4] +Mem[Regs[R1]]
Add R4, (R1)Register indirect
Accessing local variables (+simulates register redirect,direct addressing modes)
Regs[R4] <- Regs[R4] +Mem[100 + Regs[R1]]
Add R4,100(R1)Displacement
For constantsRegs[R4] <- Regs[R4] + 3Add R4,#3Immediate
When a value is in a registerRegs[R4] <- Regs[R4] +Regs[R3]
Add R4,R3Register
When usedMeaningExampleinstruction
Addressing mode
41
Example for Addressing Modes (cont.)
If R3 is the address of apointer p, the mode yields *p
Regs[R1] <- Regs[R1] +Mem[Mem[Regs[R3]]]
Add R1,@(R3)Memory indirect
Sometimes useful foraccessing static data;address constant may needto be large
Regs[R1] <- Regs[R1] +Mem[1001]
Add R1,(1001)Direct or absolute
Sometimes useful in arrayaddressing: R1 = base ofarray; R2= index amount
Regs[R3] <- Regs[R3] +Mem[Regs[R1] + Regs[R2]]
Add R3,(R1+R2)Indexed
When usedMeaningExampleinstruction
Addressing mode
42
Example for Addressing Modes (cont.)
Used to index arrays. May beapplied to any indexedaddressing mode in somecomputers
Regs[R1] <- Regs[R1] +Mem[100 + Regs[R2] +Regs[R3] * d]
AddR1,100(R2)[R3]
Scaled
Same use as autoincrement.Autodecrement/-increment canalso act as push/pop toimplement a stack
Regs[R2] <- Regs[R2] âdRegs[R1] <- Regs[R1] +Mem[Regs[R2]]
Add R1,-(R2)Autodecrement
Useful for stepping througharrays within a loop. R2 points tostart of array; each referenceincrements R2 by size of anelement, d
Regs[R1] <- Regs[R1] +Mem[Regs[R2]]Regs[R2] <- Regs[R2] + d
Add R1,(R2)+Autoincrement
When usedMeaningExampleinstruction
Addressing mode
43
Summary of Use of Memory Addressing Modedisplacement, immediate, and registerindirect addressing modes represent 75%to 99% of the addressing mode usage
For VAX architecture
44
MIPS implements only displacementâWhy? Experiment on VAX (ISA with every mode) found
distributionâDisp: 61%, reg-ind: 19%, scaled: 11%, mem-ind: 5%, other:
4%â80% use small displacement or register indirect (displacement 0)
I-type instructions: 16-bit displacementâIs 16-bits enough?âYes? VAX experiment showed 1% accesses use displacement
>16
Example: MIPS Addressing Modes
Op(6)
31 26 01516202125
Rs(5) Rd(5) Immediate(16)
45
Determining Field Size
Analyze your programsâusing dynamic tracesâproper application mixâoptimizing compiler
Chooseâdisplacement field sizeâimmediate or literal field sizeâaddress modesâregister file size and structure
Consider cost of these choicesâdatapathCPI and cycle timeâcode density and encoding
46
Displacement Addressing Mode
Whatâs an appropriate range of the displacements?
For Alpha architecture
The size of address should beat least 12-16 bits, whichcapture 75% to 99% of thedisplacements
47
Immediate or Literal Addressing Mode
Does the mode need to be supported for all operations or foronly a subset?
48
Immediate Addressing Mode (cont.)
Whatâs a suitable range of values for immediates?
For Alpha architecture
The size of the immediate fieldshould be at least 8-16 bits,which capture 50% to 80% ofthe immediates
49
Types of Operations
Arithmetic and Logic: AND, ADD Data Transfer: MOVE, LOAD, STORE Control BRANCH, JUMP, CALL System OS CALL, VM Floating Point ADDF, MULF, DIVF Decimal ADDD, CONVERT String MOVE, COMPARE Graphics (DE)COMPRESS
50
Distribution of Data Accessesby Size
51
Addressing Modes for the DSP World
Data is essentially an infinite streamâ hence model memory as a circular buffer
register holds a pointer 2 other registers hold start and end mark autoincrement and autodecrement detect end and then reset
â hence modulo or circular addressing mode FFT is an important application
â FFT shuffle or butterfly reverses the bit order of the effective address
â bit-reverse mode hw reverses the low order bits of an address number of bits reversed is a parameter depending on which step of`the FFT
algorithm youâre in at the time
Importance - 54 DSP algoâs on a TI C54x DSPâ immediate, displacement, register indirect, direct = 70%â auto inc/dec = 20% and the rest counts for < 10%
52
Addressing Modes for Signal Processing
DSPs deal with infinite, continuous streams of data,they routinely rely on circular buffersâModulo or circular addressing mode
For Fast Fourier Transform (FFT)âBit reverse addressingâ0112 1102
53
Frequency of Addressing Modes for TITMS320C54x DSP
54
Outline
Classifying instruction set architecturesMemory addressingAddressing modesType and size of operands Instructions for controlThe role of compiler
55
Type and Size of Operands
Character: 8 bitsHalf word: 16 bitsWords: 32 bitsDouble-precision floating point: 2 words (64 bits)
56
Type and Size of Operands
How is the type of an operand designated?â Encoding in the opcode
For an instruction, the operation is typically specified in one field, called theopcode
â By tag (not used currently)
Common operand typesâ Character
8-bit ASCII 16-bit Unicode (not yet used)
â Integer One-word 2âs complement
57
Common Operand Types (cont.)
âSingle-precision floating point One-word IEEE 754
âDouble-precision floating point 2-word IEEE 754
âPacked decimal (binary-coded decimal) 4 bits encode the values 0-9 2 decimal digits are packed into one byte
58
Two More Addressing Issues
Access alignment: address % size == 0?â Aligned: load-word @XXXX00, load-half @XXXXX0â Unaligned: load-word @XXXX10, load-half @XXXXX1â Question: what to do with unaligned accesses (uncommon case)?
Support in hardware? Makes all accesses slow Trap to software routine? Possibility Use regular instructions
â Load, shift, load, shift, and
MIPS? ISA support: unaligned access using two instructionsâ lwl @XXXX10; lwr @XXXX10
Endian-ness: arrangement of bytes in a wordâ Big-endian: sensible order (e.g., MIPS, PowerPC)
A 4-byte integer: â00000000 00000000 00000010 00000011âis 515â Little-endian: reverse order (e.g., x86)
A 4-byte integer: â00000011 00000010 00000000 00000000 âis 515â Why little endian? To be different? To be annoying? Nobody knows
59
SPEC2000 Operand Sizes
60
Media and Signal Processing
New data typesâe.g. vertex
32-bit floating-point values for x,y,z and w
âpixel 4 8-bit channels: RGB and A (transparency)
New numeric type for DSP landâfixed point for numbers between +1 and -1
0100 0000 0000 0000 2-1
New operationsâinner product is a common case
hence MAC performance is important effectively an ax + previous b style of computation sometimes called a fused operation
61
Operands for Media and SignalProcessing
Vertexâ(x, y, z) + w to help with color or hidden surfacesâ32-bit floating-point values
Pixelâ(R, G, B, A)âEach channel is 8-bit
62
Special DSP Operands
Fixed-point numbersâ A binary point just to the right of the sign bitâ Represent fractions between â1 and +1â Need some registers that are wider to guard against round-off error
Round-off errorâ a computation by rounding results at one or more intermediate steps, resulting in a result
different from that which would be obtained using exact numbers
63
Fixed-point Numbers (cont.)
2âcomplement number Fixed-point numbers
Douglas L. Jones, http://cnx.org/content/m11930/latest/
64
Example
Give three 16-bit patterns:0100 0000 0000 00000000 1000 0000 00000100 1000 0000 1000What values do they represent if they are twoâs complement integers? Fixed-point
numbers? AnswerTwoâs complement: 214, 211, 214 + 211 + 23
Fixed-point numbers: 2-1, 2-4, 2-1 + 2-4 + 2-12
65
1111111011011100101110101001100001110110010101000011001000010000
-1-2-3-4-5-6-7-876543210
0-1-2-3-4-5-6-776543210
151413121110
9876543210
b3b2b1b0
Bit-Pattern, Unsigned, 2âs Comp, 1âs Comp,
66
Outline
Classifying instruction set architecturesMemory addressingAddressing modesType and size of operands Instructions for controlThe role of compiler
67
SPECint92 codes
The most widely executedinstructions are the simpleoperations of an instructionset
The top-10 instructions for80x86 account for 96% ofinstructions executed
Make them fast, as they arethe common case
68
What Operations are Needed
Arithmetic and Logicalâ Add, subtract, multiple, divide, and, or
Data Transferâ Loads-stores
Controlâ Branch, jump, procedure call and return, trap
Systemâ Operating system call, virtual memory management instructions
All computers provide the above operations
69
What Operations are Needed (cont.)
Floating Pointâ Add, multiple, divide, compare
Decimalâ Add, multiply, decimal-to-character conversions
Stringâ move, compare, search
Graphicsâ pixel and vertex operations, compression/decompression operations
The above operations are optional
70
Relative Frequency ofControl Instructions
71
Operations for Media and SignalProcessing
Partitioned addâ16-bit data with a 64-bit ALU would perform four 16-bit adds
in a single clock cycleâSingle-Instruction Multiple-Data (SIMD) or vector
Paired single operationâPack two 32-bit floating-point operands into a single 64-bit
register
72
Operations for Media and SignalProcessing (cont.)
Saturating arithmeticâ If the result is too large to be represented, it is set to the largest representable
number, depending on the sign of the result
Several modes to round the wider accumulators into thenarrower data words
Multiply-accumulate instructionsâ a <- a + b*c
73
Instructions for Control Flow
74
Control Instruction Types
Jumps - unconditional transferConditional Branches
âhow is condition code set?âhow is target specified? How far away is it?
Callsâhow is target specified? How far away is it?âwhere is return address kept?âhow are the arguments passed? Callee vs. Caller save!
Returnsâwhere is the return address? How far away is it?âhow are the results passed?
75
Distribution of Control Flows
76
Addressing Modes for Control FlowInstructions
How to get the destination address of a control flowinstruction?â PC-relative
Supply a displacement that is added to the program counter (PC) Position independence
â Permit the code to run independently of where it is loaded
â A register contains the target addressâ The jump may permit any addressing mode to be used to supply the target
address
77
Usage of Register Indirect Jumps
Case & SwitchVirtual functions or methods
âC++ or Java
High-order functions or function pointersâC, C++, and Lisp
DLLâsâshared libraries that get dynamically loaded
78
Addressing Modes for Control Instructions
Known at compile time for unconditional and conditionalbranches - hence specified in the instructionâ as a register containing the target addressâ as a PC-relative offset
Consider word length addresses, registers, and instructionsâ full address desired? Then pick the register option.â BUT - setup and effective address will take longer.â if you can deal with smaller offset then PC relative worksâ PC relative is also position independent - so simple linker duty
consider the ease in particular for DLLâs
How do you find out what works?â start by measuring your programs of course.
79
Control instructions
Addressing modesâ PC-relative addressing (independent of program load & displacements are
close by) Requires displacement (how many bits?) Determined via empirical study. [8-16 works!]
â For procedure returns/indirect jumps/kernel traps, target may not be knownat compile time. Jump based on contents of register Useful for switch/(virtual) functions/function ptrs/dynamically linked libraries
etc.
80
How Far are Branch Targets fromBranches?
For Alpha architecture
The most frequent in the integer? programs are to targets that can be encoded in 4-8 bitsAbout 75% of the branches are in the forward direction
81
How to Specify the Branch Condition?
Program Status Word
82
Frequency of Different Types ofCompares in Branches
83
Procedure Invocation Options
The return address must be saved somewhere, sometimesin a special link register or just a GPR
Two basic schemes to save registersâ Caller saving
The calling procedure must save the registers that it wants preserved for accessafter the call
â Callee saving The called procedure must save the registers it want to use
84
Encoding an Instruction Set
85
Encoding an Instruction Set
How the instructions are encoded into a binaryrepresentation for execution?â Affects the size of codeâ Affects the CPU design
The operation is typically specified in one field, called theopcode
How to encode the addressing mode with the operationsâ Address specifierâ Addressing modes encoded as part of the opcode
86
Issues on Encoding an Instruction Set
Desire for lots of addressing modes and registers Desire for smaller instruction size and program size with
more addressing modes and registers Desire to have instructions encoded into lengths that will be
easy to handle in a pipelined implementationâ Multiple bytes, rather than arbitrary bitsâ Fixed-length
87
3 Popular Encoding Choices
Variableâ Allow virtually all addressing modes to be with all operations
Fixedâ A single size for all instructionsâ Combine the operations and the addressing modes into the opcodeâ Few addressing modes and operations
Hybridâ Size of programs vs. ease of decoding in the processorâ Set of fixed formats
88
3 Popular Encoding Choices (Cont.)
89
Reduced Code Size in RISCs
More narrower instructionsCompression
90
Encoding an Instruction set
a desire to have as many registers and addressingmode as possible
the impact of size of register and addressing modefields on the average instruction size and hence onthe average program size
a desire to have instruction encode into lengths thatwill be easy to handle in the implementation
91
Outline
Classifying instruction set architecturesMemory addressingAddressing modesType and size of operands Instructions for controlThe role of compiler
92
Compiler vs. ISA
Almost all programming is done in high-levellanguage (HLL) for desktop and server applications
Most instructions executed are the output of acompiler
So, separation from each other is impractical
93
Goals of a Compiler
CorrectnessSpeed of the compiled codeOthers
âFast compilationâDebugging supportâInteroperability among languages
94
Structure of Compiler
95
Optimization types
High level - done at source code levelâ procedure called only once - so put it in-line and save CALL
more general is to in-line if call-count < some threshold Local - done on basic sequential block
â common subexpressions produce same value - either allocate a register or replacewith single copy
â constant propagation - replace constant valued variable with the constant - savesmultiple variable accesses with same value
â stack reduction - rearrange expression to minimize temporary storage needs Global - same as local but done across branches
â primary goal = optimize loops code motion - remove code from loops that compute same value on each pass and put it
before the loop simplify or eliminate array addressing calculations in loops
Register allocationâ Associate registers with operands
Graph coloring Processor-dependent
â Depend on processor knowledge
96
Compiler Optimization Summary
97
Machine Dependent Optimizations
Strength reductionâreplace multiply with shift and add sequence
would make sense if there was no hardware support for MUL a trickier version: 17x = arithmetic left shift 4 and add
Pipeline schedulingâreorder instructions to minimize pipeline stallsâdependency analysisâcompiler canât see run time dynamics - e.g. branch direction
resolution
Branch offset optimizationâreorder code to minimize branch offsets
98
Compiler Based Register Optimization
Assume small number of registers (16-32) Optimizing use is up to compiler HLL programs have no explicit references to registers
â usually âis this always true?
Assign symbolic or virtual register to each candidatevariable
Map (unlimited) symbolic registers to real registers Symbolic registers that do not overlap can share real
registers If you run out of real registers some variables use
memory
99
Graph Coloring
Given a graph of nodes and edgesAssign a color to each nodeAdjacent nodes have different colorsUse minimum number of colorsNodes are symbolic registersTwo registers that are live in the same
program fragment are joined by an edgeTry to color the graph with n colors, where n
is the number of real registersNodes that can not be colored are placed in
memory
100
Graph Coloring Approach
101
Compiler Optimization Levels
Level 0: unoptimized codeLevel 1: local optimization, code scheduling, local register optimizationLevel 2: global optimization, loop transformation, global register optimizationLevel 3: procedure integration
102
Things you care about for the ISA design
Key: make the common case fast and the rarecase correct
How are variables allocated?How are variables addressed?How many registers will be needed?How does optimization change the instruction mix?What control structures are used?How frequently are the control structures used?
103
Allocation of VariablesStack
âused to allocate local variablesâgrown and shrunk on procedure calls and returnsâregister allocation works best for stack-allocated objects
Global data areaâused to allocate global variables and constantsâmany of these objects are arrays or large data structuresâimpossible to allocate to registers if they are aliased
Heapâused to allocate dynamic objectsâheap objects are accessed with pointersânever allocated to registers
104
Register Allocation Problem
Register allocation is much effective for statck-allocated objects than for global variables.
Register allocation is impossible for heap-allocatedobjects because they are accessed with pointers.
Global variables and some stack variables areimpossible to allocated because they are aliased.âMultiple ways to refer the address of a variable, making it
illegal to put it into a register
105
Designing ISA to Improve Compilation
Provide enough general purpose registers to ease registerallocation ( more than 16).
Provide regular instruction sets by keeping the operations,data types, and addressing modes orthogonal.
Provide primitive constructs rather than trying to map to ahigh-level language.
Simplify trade-off among alternatives. Allow compilers to help make the common case fast.
106
ISA Metrics Orthogonality
â No special registers, few special cases, all operand modesavailable with any data type or instruction type
Completenessâ Support for a wide range of operations and target applications
Regularityâ No overloading for the meanings of instruction fields
Streamlined Designâ Resource needs easily determined. Simplify tradeoffs.
Ease of compilation (programming?), Ease of implementation,Scalability
107
Quick Review ofDesign Space of ISA
Five Primary Dimensions Number of explicit operands ( 0, 1, 2, 3 ) Operand Storage Where besides memory? Effective Address How is memory location
specified? Type & Size of Operands byte, int, float, vector, . . .
How is it specified? Operations add, sub, mul, . . .
How is it specifed?Other Aspects Successor How is it specified? Conditions How are they determined? Encodings Fixed or variable? Wide? Parallelism
108
A "Typical" RISC
32-bit fixed format instruction (3 formats) 32 32-bit GPR (R0 contains zero, Double Precision takes a
register pair) 3-address, reg-reg arithmetic instruction Single address mode for load/store:
base + displacementâno indirection
Simple branch conditions
see: SPARC, MIPS, MC88100, AMD2900, i960, i860PARisc, DEC Alpha, Clipper,CDC 6600, CDC 7600, Cray-1, Cray-2, Cray-3
109
MIPS data types
Bytesâ characters
Half-wordsâ Short ints, OS related data-structures
Wordsâ Single FP, Integers
Doublewordsâ Double FP, Long Integers (in some implementations)
110
Instruction Layout for MIPS
111
MIPS (32 bit instructions)
Op
31 26 01516202125
Rs1 Rd Immediate
Op
31 26 025
Op
31 26 01516202125
Rs1 Rs2
target
Rd Opx
1. Register-Register
561011
2a. Register-Immediate
Op
31 26 01516202125
Rs1 Rs2/Opx Displacement
2b. Branch (displacement)
3. Jump / Call
112
MIPS (addressing modes)
Register direct Displacement Immediate Byte addressable & 64 bit address R0 always contains value 0 Displacement = 0 register indirect R0 + Displacement=0 absolute addressing
113
Types of Operations
Loads and StoresALU operationsFloating point operationsBranches and Jumps (control-related)
114
Load/Store Instructions
115
Sample ALU Instructions