Computer = ALU + Memory
[Figure: Registers and ALU, with the values 3, 2 and 5]
Let’s try to compute 3 + 2 = 5
Go to jail and do not collect £200
GPR Architecture (General Purpose Register)
Let’s compute 3 + 2 = 5 again!
[Figure: Registers and ALU connected by Bus X, Bus Y and Bus W]
1. Put 3 on bus X
2. Put 2 on bus Y
3. Stuff X and Y into ALU
4. ALU adds X and Y
5. ALU sends result to bus W
6. Put bus W into Mem
Our programmer needs to do this!
GPR Machine Details
[Figure: Memory (addresses .., 0, 8, 16, 24, 32, ..) connected to registers r0, r1, r2, r3, r4, .., r10, r11 by "Load from Memory" and "Store to Memory" paths; the registers feed the ALU]
1. Load reg from mem
2. Load reg from mem
3. Add reg to reg into reg
4. Store reg in mem
Our programmer needs to do this!
Accumulator Architecture
[Figure: Memory (addresses .., 0, 8, 16, 24, 32, ..) feeding the ALU; the accumulator holds 8]
Get 3 from Memory and ADD!
1. Assume 8 is already in the accumulator. The programmer writes: Add 3
2. The ALU does 3 + 8 = 11 and writes the result back into the accumulator
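The accumulator step can be sketched in a few lines of Python. This is only an illustration: the class name `AccumulatorMachine` is made up, and placing the value 3 at memory address 24 is an assumption, since the slide does not say where the 3 lives.

```python
class AccumulatorMachine:
    """Toy accumulator machine: one implicit register plus memory."""

    def __init__(self, memory, accumulator=0):
        self.memory = dict(memory)
        self.acc = accumulator

    def add(self, address):
        # The ALU adds the addressed memory word to the accumulator
        # and writes the result back into the accumulator.
        self.acc = self.acc + self.memory[address]
        return self.acc


# Assumption: the value 3 is stored at address 24; 8 is already in the accumulator.
machine = AccumulatorMachine(memory={24: 3}, accumulator=8)
result = machine.add(24)
print(result)
```

Note that the instruction names only one operand; the other operand and the destination are always the accumulator.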
Let’s build a Computer
Let’s take a RISC. What do we need?
• Memory
• Registers
• ALU
• Control Circuits
• A programming language
• A good Name - Simple Although Meaningful
What’s needed to build Sam-4?
• PC and Code Memory: to store the program
• Arithmetic-Logic Unit (ALU): to do the maths business
• Registers (r0, r1, r2, with ports X, Y and W): to hold results of computations
• Data Memory (addresses 0, 1, .., 7, with mar and mdr): to hold source and results of our work
Program Memory
[Figure: Code Memory at byte addresses 0, 4, 8, 12 holding load, add, store and halt instructions; PC = 4 drives "Address in" and the instruction "add" appears at "Data out"]
Memory stores program instructions at a sequence of byte addresses. Each instruction is 32 bits, so the addresses increment by 4 bytes.
Here the Program Counter inputs address 4 to the memory, which reads out the data word (32 bits) at address 4. This is the instruction ‘add’.
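A small Python sketch of the fetch described above, assuming a code memory keyed by byte address. Only "add at address 4" comes from the slide; the other entries in `code_memory` and the names `fetch` and `INSTRUCTION_BYTES` are hypothetical.

```python
INSTRUCTION_BYTES = 4  # each instruction is 32 bits, so addresses step by 4

# Hypothetical program layout; only "add" at address 4 is taken from the slide.
code_memory = {0: "load", 4: "add", 8: "store", 12: "halt"}


def fetch(pc):
    """Return the instruction at byte address pc, and the address of the next one."""
    instruction = code_memory[pc]
    return instruction, pc + INSTRUCTION_BYTES


instruction, next_pc = fetch(4)
print(instruction, next_pc)
```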
Registers, Registers
1. Registers store data at addresses. Yep, that’s Memory!
2. There are TWO read ports (X and Y) where data can be simultaneously read out of the reg file.
3. Multiport Registers have an input port (W) where data is sent to be written into the register file.
4. The addresses for the read ports (X and Y) and the write port (W) come in here.
[Figure: register file r0, r1, r2 with read ports X and Y and write port W]
Data Memory
[Figure: Data Memory with addresses 0, 1, .., 7 and the mar and mdr registers; mar holds 7]
Here’s the memory.
The Memory Data Register (MDR) is a parking place for data coming and going from the memory.
The Memory Address Register (MAR) holds the address of the data location selected for read or write, e.g. 7.
Here’s Sam
[Figure: Sam’s datapath: Code Memory and Instruction reg; register file r0, r1, r2 with ports X, Y and W; ALU; Data Memory (addresses 0, 1, .., 7) with mar and mdr]
Fetch-Execute Cycle
(Much more Clever and Useful)
Example: add r3,r2,r1
1. Fetch instruction from memory: Get contents of address 1
2. Decode the opcode and read any registers: ALU <- r1, ALU <- r2
3. Do any ALU operations: ALU add
4. Do any Memory Access: None needed
5. Write back results to registers: r3 <- ALU
First Example
ld r0 , [1]      Load r0 with data at address 1
ld r1 , [2]      Load r1 with data at address 2
add r2,r1,r0     Add r0 and r1. Put result in r2
st r2 , [7]      Store r2 in memory address 7
Note each of these instructions runs through 5 steps of its own F-E Cycle
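The four-instruction program can be traced with a toy interpreter. This is a sketch, not Sam’s actual implementation: the tuple encoding of instructions, the function name `run`, and the data values 3 and 2 at addresses 1 and 2 are assumptions chosen to echo the earlier 3 + 2 = 5 example.

```python
def run(program, data_memory):
    """Run a list of (op, ...) tuples; each pass of the loop is one F-E cycle."""
    regs = {"r0": 0, "r1": 0, "r2": 0}
    pc = 0
    while pc // 4 < len(program):
        op, *args = program[pc // 4]        # 1. fetch (instructions are 4 bytes apart)
        pc += 4
        if op == "ld":                      # 2. decode
            rd, address = args
            regs[rd] = data_memory[address]     # 4. memory access, 5. register write
        elif op == "add":
            rd, rs, rt = args
            regs[rd] = regs[rs] + regs[rt]      # 3. ALU operation, 5. register write
        elif op == "st":
            rd, address = args
            data_memory[address] = regs[rd]     # 4. memory access
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return regs, data_memory


program = [
    ("ld", "r0", 1),
    ("ld", "r1", 2),
    ("add", "r2", "r1", "r0"),
    ("st", "r2", 7),
]
# Assumed data: 3 at address 1 and 2 at address 2.
regs, memory = run(program, {1: 3, 2: 2, 7: 0})
print(regs, memory)
```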
1. Instruction Fetch (ld r0,[1])
[Figure: Sam’s datapath. PC = 0; the instruction ld r0,[1] is fetched from Code Memory into the instruction register as "ld 0 1"]
2. Decode, Reg Ops (ld r0,[1])
[Figure: Sam’s datapath. PC has been incremented (+) to 4; the address constant 1 is read from the instruction]
3. ALU Operation (ld r0,[1])
[Figure: Sam’s datapath. PC = 4; the address 1 passes through the ALU]
4. Memory Access (ld r0,[1])
[Figure: Sam’s datapath. PC = 4; the address 1 is used to access Data Memory through mar and mdr]
5. Register Write (ld r0,[1])
[Figure: Sam’s datapath. PC = 4; the data from mdr is written through port W into r0]
1. Instruction Fetch (add r2,r0,r1)
[Figure: Sam’s datapath. PC = 4; the instruction add r2,r0,r1 is fetched into the instruction register as "add 2 0 1", and PC becomes 8]
2. Decode, Reg Ops (add r2,r0,r1)
[Figure: Sam’s datapath. r0 and r1 are read out on ports X and Y]
3. ALU Operation (add r2,r0,r1)
[Figure: Sam’s datapath. PC = 8; the ALU adds X and Y]
4. Memory Access (add r2,r0,r1)
[Figure: Sam’s datapath. PC = 8; no memory access is needed]
5. Register Write (add r2,r0,r1)
[Figure: Sam’s datapath. PC = 8; the ALU result is written through port W into r2]
Instruction Encoding Example
All Sam’s instructions take up 32 bits.
Sam’s instructions start with the opcode, then the destination register, then the source registers. First 6 bits for the opcode.
add rd, rs, rt means rd <- rs + rt
e.g. add r3, r1, r2 means r3 = r1 + r2

Field:        opcode    rd (destination)    rs (source)    rt (source)    unused
Nr of bits:   6         5                   5              5              11
Bits shown:   010110    00011               00010          00001          unused
Decimal:                3                   2              1
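The 6/5/5/5/11 field layout can be packed and unpacked with shifts and masks. The sketch below assumes the opcode bits 010110 shown on this slide (other slides show different opcode bits for add); the function names `encode_r_type` and `decode_r_type` are made up for illustration.

```python
OPCODE_ADD = 0b010110  # opcode bits as shown on this slide (an assumption elsewhere)


def encode_r_type(opcode, rd, rs, rt):
    """Pack opcode (6 bits), rd, rs, rt (5 bits each); the low 11 bits stay unused."""
    if not 0 <= opcode < 64:
        raise ValueError("opcode must fit in 6 bits")
    for reg in (rd, rs, rt):
        if not 0 <= reg < 32:
            raise ValueError("register number must fit in 5 bits")
    return (opcode << 26) | (rd << 21) | (rs << 16) | (rt << 11)


def decode_r_type(word):
    """Unpack a 32-bit word into (opcode, rd, rs, rt)."""
    return (
        (word >> 26) & 0x3F,
        (word >> 21) & 0x1F,
        (word >> 16) & 0x1F,
        (word >> 11) & 0x1F,
    )


# Fields 3, 2, 1 reproduce the bit pattern shown on the slide.
word = encode_r_type(OPCODE_ADD, rd=3, rs=2, rt=1)
bits = format(word, "032b")
print(bits[:6], bits[6:11], bits[11:16], bits[16:21], bits[21:])
```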
The Instruction Register
[Figure: Code Memory supplies add r2,r1,r3 (add 2 1 3) to the instruction register, which holds 010110 00010 00001 00011 unused; electronic wires lead from the IR to the rest of the CPU]
Loaded with the instruction, the IR decodes this into bits which drive the CPU digital logic circuits.
Control Path
add r2, r1, r3     001010 00010 00001 00011 unused
sub r2, r1, r3     000101 00010 00001 00011 unused
[Figure: two copies of the ALU with inputs r1 and r3; in one the + function is selected (Add!), in the other the - function (Subtract!)]
The add instruction is decoded and produces digital signals which select the + function in the ALU.
The sub instruction, when decoded, produces different digital signals.
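In software terms, the decoded opcode acts as a selector for the ALU function. The sketch below uses the two opcode bit patterns from this slide; the table name `ALU_FUNCTIONS`, the function `execute`, and the register contents are hypothetical.

```python
import operator

# Opcode bits from the slide select which ALU function is used.
ALU_FUNCTIONS = {
    0b001010: operator.add,  # add
    0b000101: operator.sub,  # sub
}


def execute(word, regs):
    """Decode a 32-bit word and apply the selected ALU function: rd <- rs op rt."""
    opcode = (word >> 26) & 0x3F
    rd = (word >> 21) & 0x1F
    rs = (word >> 16) & 0x1F
    rt = (word >> 11) & 0x1F
    alu = ALU_FUNCTIONS[opcode]
    regs[rd] = alu(regs[rs], regs[rt])
    return regs


add_word = int("001010" "00010" "00001" "00011" + "0" * 11, 2)  # add r2, r1, r3
sub_word = int("000101" "00010" "00001" "00011" + "0" * 11, 2)  # sub r2, r1, r3

# Assumed register contents: r1 = 7, r3 = 5.
after_add = execute(add_word, [0, 7, 0, 5])
after_sub = execute(sub_word, [0, 7, 0, 5])
print(after_add[2], after_sub[2])
```

The register fields are identical in both words; only the opcode bits differ, which is why the same datapath can add or subtract.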
Sam and MIPS are 32 bit
Every format is 32 bits wide:
add rd,rs,rt      opcode rd rs rt unused          001010 00110 01001 00011 unused
ldr rd,[rs+c]     opcode rd rs 16-bit address     001010 00010 00001 0101001111111011
ldr rd,[c]        opcode 26-bit address           001010 101001111110010101011011111
Other Arithmetic Instructions
sub rd, rs, rt means rd <- rs - rt

Field:        opcode    rd (destination)    rs (source)    rt (source)    unused
Nr of bits:   6         5                   5              5

Same coding applies to other arithmetic instructions:
sub r3,r2,r1
and r2,r1,r0
or r5,r1,r2
A simple ‘Load’ instruction
‘Load into rd the contents of memory at the address which is in reg rs.’ Simple!
ldr r9 , [r1]
Fields: ld (opcode), rd (destination), rs (single source reg)
[Figure: memory in which address 3 holds 145; r1 = 3; r9 receives 145]
1. Let’s say we have already loaded r1 with 3
2. Get data from mem at addr r1 (=3)
3. Load the data into r9
A more complex ‘Load’
Load register rd with the contents of memory which you find at address rs + c.
ldr r9 , [r1 + 2]
Fields: ldr (opcode), rd (destination), rs (source), constant c
[Figure: memory in which address 5 holds 231; r1 = 3, so the address is 3 + 2 = 5; r9 receives 231]
The mem address is formed as a sum.
… and a ‘Store’ instruction
str r9 , [ r1 + 2 ]
Fields: str (opcode), rd (‘destination’), rs (source), constant c
Note here the data is moved from the ‘destination’ register to the store. Confusing? Mm.
[Figure: r9 holds 196; r1 = 3, so the address is 3 + 2 = 5; memory address 5 receives 196]
1. Get data from r1
2. Write it to memory
What’s this?
‘Load Immediate’
ldi r9 , 5
Fields: ldi (opcode), rd (destination), constant C
In load immediate the constant C sits in the instruction itself, immediately following the opcode and rd, and goes straight into the reg.
All reference to memory has gone!
Load ‘5’ straight into r9
A Summary So Far …
Instruction           Example
add rd,rs,rt          add r3,r1,r1
str rd, [rs + c]      str r6,[r1 + 1]
st rd, [rs]           str r0, [r1]
ldr rd, [rs + c]      ldr r2,[r3 + 4]
ld rd, [rs]           ldr r2,[4]
ldi rd,C              ldi r0,3
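The load, store and load-immediate forms differ only in where the address (or the value) comes from. A minimal Python sketch, using the values from the earlier load and store slides (145 at address 3, 231 at address 5, r1 = 3, 196 stored from r9); the function names `ldr`, `store` and `ldi` are hypothetical helpers, not Sam’s implementation.

```python
def ldr(regs, memory, rd, rs=None, c=0):
    """ldr rd,[rs + c]: with rs omitted this is ldr rd,[c]; with c = 0 it is ld rd,[rs]."""
    base = regs[rs] if rs is not None else 0
    regs[rd] = memory[base + c]


def store(regs, memory, rd, rs=None, c=0):
    """str rd,[rs + c]: the data moves from register rd into memory."""
    base = regs[rs] if rs is not None else 0
    memory[base + c] = regs[rd]


def ldi(regs, rd, constant):
    """ldi rd,C: no memory reference at all."""
    regs[rd] = constant


memory = {3: 145, 5: 231}
regs = {"r1": 3, "r9": 0}

ldr(regs, memory, "r9", rs="r1")          # ldr r9,[r1]      address 3
simple_load = regs["r9"]
ldr(regs, memory, "r9", rs="r1", c=2)     # ldr r9,[r1 + 2]  address 3 + 2 = 5
offset_load = regs["r9"]
ldi(regs, "r9", 196)                      # ldi r9,196
store(regs, memory, "r9", rs="r1", c=2)   # str r9,[r1 + 2]  writes address 5
print(simple_load, offset_load, memory[5])
```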
Now it’s time to move on and look in detail at the hierarchy of computer languages, to see the influence on the ISA.
Assembling a Spreadsheet
[Figure: hierarchy from top to bottom: Excel Application, HLL Implementation, ISA Assembler, Electronics]
HLL Implementation:
Main() {
int f,g,h;
f = g + h;
}
ISA Assembler:
ld r0, [ g ]
ld r1, [ h ]
add r2,r0,r1
st r2, [ f ]
The Great Idea here is that the ISA we need at the bottom must serve the grand master at the top, the Application.
The ISA must support the HLL implementation.
Arrays (= Tables)
How do we sum the array of numbers in column B?
1. We would use the instruction ld r1,[r0 + B] where B=3, the start address of the array
2. Then we load r0 with 0 then 1, then 2, … to scan down the array
ld r0 , 0
ld r3 , 0
ld r1, [r0 + 3]      r0 (=0) + 3 = 3
Arrays (= Tables)
How do we sum the array of numbers in column B?
Get the next cell, load its value and add it to the sum, in r3:
inc r0
ld r1, [r0 + 3]
add r3,r3,r1
1. Increment r0 to get the next data value: inc r0 (0 + 1 = 1)
2. ld r1,[r0 + B] where B=3, the start address of the array, but now r0 contains 1
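The scan down the column can be sketched as a loop over the index register. The array contents and the explicit `length` parameter are assumptions, since the slides do not show the data or how the loop ends; the variable names mirror the registers r0, r1 and r3.

```python
def sum_array(memory, base, length):
    """Sum `length` memory cells starting at address `base`, Sam-style."""
    r0 = 0  # index register: 0, 1, 2, ...
    r3 = 0  # running sum
    while r0 < length:
        r1 = memory[r0 + base]  # ld r1,[r0 + B]
        r3 = r3 + r1            # add r3,r3,r1
        r0 = r0 + 1             # inc r0
    return r3


# Hypothetical column B: four cells starting at address B = 3.
memory = {3: 10, 4: 20, 5: 30, 6: 40}
total = sum_array(memory, base=3, length=4)
print(total)
```

The base-plus-index addressing mode is what lets a single load instruction serve every cell of the array.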
Making Decisions
Let’s say we want to add 2 to a number B if another number C is equal to 10.
You mean, ‘If C = 10, then add 2 to B’
Yep
Here’s how we would do it in C…
if(c == 10) b = b + 2;
What about SAM?
Address   Instruction
16        ldi r1,10          First load the test number 10
20        …
24        …
28        bne r2,r1,36       Branch if r1 and r2 are not equal, to addr 36 (branch around the add)
32        addi r3,r3,2
36
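A toy interpreter makes the "branch around the add" visible. The sketch assumes r2 holds C and r3 holds B, which the slide does not state, and stands in `nop` placeholders for the two elided instructions; the function name `add_two_if_ten` is hypothetical.

```python
def add_two_if_ten(b, c):
    """Run the Sam fragment with r2 = c and r3 = b (assumed); return the new b."""
    regs = {"r1": 0, "r2": c, "r3": b}
    program = {
        16: ("ldi", "r1", 10),
        20: ("nop",),                  # placeholder for an elided instruction
        24: ("nop",),                  # placeholder for an elided instruction
        28: ("bne", "r2", "r1", 36),
        32: ("addi", "r3", "r3", 2),
    }
    pc = 16
    while pc in program:
        op, *args = program[pc]
        pc += 4
        if op == "ldi":
            regs[args[0]] = args[1]
        elif op == "bne" and regs[args[0]] != regs[args[1]]:
            pc = args[2]               # branch around the add
        elif op == "addi":
            regs[args[0]] = regs[args[1]] + args[2]
    return regs["r3"]


print(add_two_if_ten(b=5, c=10), add_two_if_ten(b=5, c=9))
```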
Loops
Let’s say we want to make the sequence 0,3,6,9,12 and stop.
Step:    0   1   2   3   4
Value:   0   3   6   9   12
We take 4 steps and at each step add 3: x = x + 3
So we need a register to keep track of the number of steps (r0)
And a register to hold the sum at each step (r2)
Address   Instruction
0         ldi r0 , 0
4         ldi r1 , 4
8         ldi r2 , 0
12        addi r2 , r2 , 3
16        addi r0 , r0 , 1
20        bne r0,r1,12         Branch unless r0 = r1 = 4
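The loop can be checked by stepping through it with a toy interpreter. The instructions and addresses are the ones on the slide; the tuple encoding and the function name `run_loop` are assumptions made for this sketch.

```python
def run_loop():
    """Execute the six-instruction loop and record every value written to r2."""
    program = {
        0: ("ldi", "r0", 0),
        4: ("ldi", "r1", 4),
        8: ("ldi", "r2", 0),
        12: ("addi", "r2", "r2", 3),
        16: ("addi", "r0", "r0", 1),
        20: ("bne", "r0", "r1", 12),
    }
    regs = {}
    sequence = []
    pc = 0
    while pc in program:
        op, *args = program[pc]
        pc += 4
        if op == "ldi":
            rd, value = args
            regs[rd] = value
        elif op == "addi":
            rd, rs, value = args
            regs[rd] = regs[rs] + value
        elif op == "bne":
            ra, rb, target = args
            if regs[ra] != regs[rb]:
                pc = target            # go round the loop again
            continue
        if rd == "r2":
            sequence.append(regs["r2"])
    return sequence, regs


sequence, regs = run_loop()
print(sequence)
```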
CBP 2005 Comp3070 Computer Architecture
Some x86 instructions
mov ax , [bx + c]
mov [ax] , bx
add ax , bx
add [bx] , ax
These look rather like Sam’s RISC ops.
But the last one (add [bx] , ax) does not. Here the contents of ax is being added straight into memory! The x86 is a register-memory ISA and Sam is a register-register ISA.
Let’s compare the RR and RM ISAs.
Sam:
ldi r1 , a
ldi r2 , b
add r3,r1,r2
st r3 , b
Intel x86:
mov ax, a
add b,ax
Clearly RR needs more memory while the RM uses stronger operations.
Intel Instruction Format
IA-32 Format
Variable Length Instructions
All Sam’s instructions had the same length, 32 bits. This is also true for other RISC ISAs such as SPARC and MIPS. Compare this with the x86, whose instructions vary from 1 to 17 bytes. Here are some stats.
[Figure: bar chart of frequency of use (0% to 30%) against instruction length in bytes (1 to 10) for the programs Espresso, Gcc, Spice and Nasa]
Clearly long complex instructions are used infrequently.
But the use does depend on the app.
Instruction Timing
[Figure: timeline of one instruction over clock cycles T1 to T5: Fetch; Decode, Reg Op; ALU Op; Mem Access; Reg Write. Each stage takes one clock cycle]
All Sam’s instructions take 5 clock cycles.
• A 1 Gigahertz SPARC runs 1 GigaClockCycles in 1 second
• That’s 10^9 cycles
• That’s 1,000,000,000 cycles
• That’s 200,000,000 add ops!
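The arithmetic behind the last bullet is just clock rate divided by cycles per instruction. A short check in Python, with made-up constant and function names:

```python
CLOCK_HZ = 1_000_000_000       # 1 GHz: 10**9 clock cycles per second
CYCLES_PER_INSTRUCTION = 5     # every instruction takes 5 clock cycles


def instructions_per_second(clock_hz, cycles_per_instruction):
    """Instructions completed per second when each takes a fixed number of cycles."""
    return clock_hz // cycles_per_instruction


adds_per_second = instructions_per_second(CLOCK_HZ, CYCLES_PER_INSTRUCTION)
print(adds_per_second)
```

This figure holds when instructions run one after another, each occupying all 5 cycles before the next begins.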
Variable Time Instructions
Here’s a timing diagram for an Intel add:
add ax , [bx + c]
[Figure: two consecutive passes through T1 to T5 (Fetch; Decode, Reg Op; ALU; Mem Access; Reg Write). The first pass forms the address [bx + c]; the second computes ax = ax + mem[]]
We need two adds. The first to get the address summed up …
… and the second to actually add memory to register ax
Potent x86 Instructions
mov x,2          Immediate to memory            6
xlat x           Translate al via table         1
imul x           Multiply memory with ax        4
inc x            Increment memory by 1          4
repne scasb      Scan string for match!         various
[Figure: searching for "Greenspan" at three levels]
1. Application: Greenspan
2. High-Level Language (‘C’): strcmp(str, Greenspan);
3. Intel ISA code
Top 10 Intel x86 Instructions
Rank   Instruction           Usage
1      load                  22%
2      conditional branch    20%
3      arithmetic / logic    19%
4      compare               16%
5      store                 12%
6      move reg - reg        4%
7      call - return         2%
We see that most instructions are simple: load, store, calculate, branch. None of Intel’s potent stuff figures here. So why did Intel design instructions no-one uses?
ISA R&D into the 80’s
1980 Berkeley Patterson RISC (SPARC)
1981 Stanford Hennessy MIPS
Emerging Design Guidelines
- Easy to Decode Ops
- Fast Issue Rate
- Only load and store reference memory
- Lots of registers
Let’s downshift and make things simpler …
• Use simple instructions: load, store, add
• Many of these will do one x86 potent op
• Need more memory, but memory is cheap
• More CPU cycles, but can still be faster
Intel Architecture
Looks great from the outside …
… but is a golden mishmash with a history of add-ons
RISC Architecture
Minimalist, Functional
Summary … so far
RISC
• Minimalist
• Something like Zen
• All instructions the same length in memory
• Small number of instructions
• Small number of addressing modes
• Simple instructions
• 5 clock cycles
• SPARC, MIPS
CISC
• Different lengths in memory
• Large number of instructions
• Huge number of addressing modes
• Complex instructions
• Variable number of clock cycles
• Intel
Today the consequences of …
Intel (CISC) MIPS (RISC)
Laundry Model
[Figure: Washer, Drier, Store, Basket, Wardrobe]
Process Steps
A. Wash then Dry
[Figure: timeline from 9.00 to 11.00; washer running then idle, drier idle then running]
1. Load the washer at 9.00
2. Done at 10, load the drier
3. Drier done at 11
Sequential Process
3 loads take 6 hours
[Figure: timeline with marks at 9.00, 11.00, 13.00 and 15.00]
1. Load washer at 9.00
2. Done at 10, load drier
3. Drier done at 11
4. Reload washer at 11
5. Done at 12, load drier
6. Drier done at 13
7. Reload washer at 13
8. Done at 14, load drier
9. Done at 15
Overlapping Process
3 loads take 4 hours
[Figure: timeline with marks at 9.00, 11.00, 13.00 and 15.00]
1. Load washer at 9.00
2. Done at 10, load drier, reload washer
3. Both done at 11. Reload drier, reload washer
4. Both done at 12. Reload drier
5. Drier done at 13
From 10.00 till 11.00 both washer and drier are running concurrently
Washing Pipeline Filling
[Figure: pipeline timeline from 9.00 to 18.00]
5 loads in 9 hours
5 Cycles !!!
1. Get washing
2. Wash
3. Dry
4. Store
5. Put away
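The laundry timings follow a simple pattern: run sequentially, every load pays for every stage; overlapped, the first load fills the pipeline and each later load adds just one more stage time. A sketch assuming every stage takes one hour (the function names are made up):

```python
def sequential_hours(loads, stages, hours_per_stage=1):
    """Each load passes through every stage before the next load starts."""
    return loads * stages * hours_per_stage


def pipelined_hours(loads, stages, hours_per_stage=1):
    """The first load takes `stages` steps; each further load finishes one step later."""
    return (stages + loads - 1) * hours_per_stage


wash_dry_sequential = sequential_hours(loads=3, stages=2)   # wash then dry, one load at a time
wash_dry_overlapped = pipelined_hours(loads=3, stages=2)    # washer and drier overlap
five_stage_pipeline = pipelined_hours(loads=5, stages=5)    # get, wash, dry, store, put away
print(wash_dry_sequential, wash_dry_overlapped, five_stage_pipeline)
```

For comparison, `sequential_hours(loads=5, stages=5)` gives 25 hours, against 9 with the full pipeline.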
Can we Pipeline SAM ?
[Figure: Sam’s datapath (Code Memory, Instruction reg, register file r0, r1, r2 with ports X, Y and W, ALU, Data Memory with mar and mdr) divided into five stages: 1. Fetch, 2. Dec/Reg, 3. ALU, 4. Mem, 5. RW]
Pipelined Sam4
[Figure: Sam’s datapath laid out along a time axis as five stages (1. Fetch, 2. Dec/Reg, 3. ALU, 4. Mem, 5. RW), with a buffer between stages]
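5 Stages in Pipeline
Let’s take the instruction add r3,r1,r2 and show which stage is needed for each part of the instruction.
[Figure: the stages Mem, Reg, ALU, Mem, Reg along the time axis (1. Fetch, 2. Dec/Reg, 3. ALU, 4. Mem, 5. RW). For add r3,r1,r2: "add" is fetched from Mem, r1 and r2 are read from Reg, the ALU adds, and r3 is written to Reg]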
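Two Instructions
Two instructions into the pipeline:
ld r3,[r0+2]
add r4,r1,r2
[Figure: the two instructions in successive pipeline slots along the time axis. ld reads r0 and the constant 2, accesses Mem and writes r3; add reads r1 and r2, uses the ALU and writes r4]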
Structural Hazard
[Figure: five overlapping instructions (among them add r4,r1,r2 and st r0,[5]), each passing through the stages Mem, Reg, ALU, Mem, Reg; a memory write (store) and a memory read (fetch) fall in the same cycle]
Here we are being asked to read from memory and write to it simultaneously. Impossible!
Solution: use separate code and data memories
Hazardous Washing
[Figure: washing pipeline timeline from 9.00 to 18.00]
The washing basket contains both clean and dirty washing!
Code and Data Memories
[Figure: five overlapping instructions, each passing through the stages Mem, Reg, ALU, Mem, Reg, with separate code and data memories]
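Data Hazard
add r3,r1,r2
add r4,r1,r3
[Figure: the two adds in successive pipeline slots along the time axis. r3 is set at the end of the first add, but the second add needs r3 EARLIER, when it reads its registers]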
Data Hazard
[Figure: five overlapping instructions, each passing through the stages Mem, Reg, ALU, Mem, Reg; the instructions add r3,r1,r2 and add r4,r1,r3 are labelled]
We need the value of r3 for the second instruction before the first is complete.
Pipeline Stalls
[Figure: overlapping instructions including add r3,r1,r2 and add r4,r1,r3; the second add has two Stall cycles inserted after its fetch (Mem), before it reads its registers (Reg)]
Resolve Hazard: insert delay into the second instruction stream. ‘Stall’ cycles.
But this needs extra electronics on the chip. Complex and costly.
Forwarding
[Figure: five overlapping instructions, each passing through the stages Mem, Reg, ALU, Mem, Reg; the instructions add r3,r1,r2 and add r4,r1,r3 are labelled]
We need the value of r3 for the second instruction before the first is complete.
So build in extra circuits to get the data as soon as it is available from the ALU.
Compiler resolves Hazard
add r3,r1,r2
nop
nop
add r4,r1,r3
[Figure: the four instructions overlapped in the pipeline, each passing through the stages Mem, Reg, ALU, Mem, Reg]
The compiler can detect a possible hazard and insert 2 nops (‘no ops’).
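A compiler pass of this kind can be sketched as a scan that pads the instruction stream whenever a source register was written too recently. The sketch assumes a result becomes readable three instruction slots after its producer (which yields the 2 nops on the slide for back-to-back instructions); the constant `MIN_DISTANCE`, the tuple format and the function name `insert_nops` are hypothetical.

```python
MIN_DISTANCE = 3  # assumption: a consumer must sit at least 3 slots after its producer
NOP = ("nop", None, None, None)


def insert_nops(program):
    """Return a copy of `program` ((op, rd, rs, rt) tuples) padded with nops."""
    out = []
    for op, rd, rs, rt in program:
        sources = {rs, rt}
        needed = 0
        # Look back over the last MIN_DISTANCE - 1 emitted instructions.
        recent = reversed(out[-(MIN_DISTANCE - 1):])
        for back, previous in enumerate(recent, start=1):
            if previous[0] != "nop" and previous[1] in sources:
                needed = max(needed, MIN_DISTANCE - back)
        out.extend([NOP] * needed)
        out.append((op, rd, rs, rt))
    return out


program = [
    ("add", "r3", "r1", "r2"),
    ("add", "r4", "r1", "r3"),  # reads r3 straight after it is written
]
padded = insert_nops(program)
for instruction in padded:
    print(instruction)
```

Independent instructions pass through unchanged, so the cost of the nops is paid only where a dependency is actually too close.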
Example
Columns: op code, regs, alu, mem, reg write
ld r1,[7]         ld     r1   [7]
ld r2,[8]         ld     r2   [8]
add r3,r1,r2      add    r1, r2   r3