CO200 - Computer Organization & Architecture
Basavaraj Talawar
Course Syllabus
● Processor Basics
– CPU organization, Data representation and Instruction Sets
● Datapath Design
– Fixed point arithmetic
– Adders, Subtracters, Multipliers, Dividers
– ALU, Floating point arithmetic
● Control Design
– Hardwired control, Microprogrammed control, Pipeline control
● Memory Organization
– Serial vs. Random Access Memories
– Caches, Virtual Memory
● Principles of Pipelining
● Principles of Parallel Computing
Course Structure
● Textbooks
– J P Hayes, Computer Architecture and Organization, 3 ed., McGraw Hill.
– Hwang and Briggs, Computer Architecture and Parallel Processing, McGraw Hill.
– D Patterson and J Hennessy, Computer Organization and Design, MK, 3 ed.
● Other References
– NPTEL course on “High Performance Computing” by Matthew Jacob, IISc.
● Guest Lectures
● About Course
– Surprise Quizzes – 15%, Assignments – 10%, Mid Sem – 25%, Final Exam – 50%
Course Objectives
● To understand how a computer works
● To know the architecture and working of components inside a computer
– Processor, Control unit, ALU, Memory, I/O
Course Objectives – Expanded
● How is a machine language program executed by a computer?
● How does the software instruct the hardware to perform a desired action? How does the hardware instruct a desired unit to perform its corresponding operation?
● Why study all of this?
– To gain insight into the setting in which our programs execute
– To improve the setting in which our programs execute – to improve the performance of the system
What is a Computer?
● An electronic device which is capable of receiving information (data) in a particular form and of performing a sequence of operations in accordance with a predetermined but variable set of procedural instructions (program) to produce a result in the form of information or signals.
Basic Computer Organization
● Machine instructions
– Description of a primitive operation that the machine hardware is able to understand
– In binary
– Example of a 32-bit machine language instruction:
00110011101100000100001110101011
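A machine instruction is just a fixed-width binary number. The sketch below only reads the 32-bit word above as an integer and prints it in hexadecimal; it does not decode any fields, because the slides do not say which instruction set the word belongs to. The variable names are illustrative.

```python
# The 32-bit example word from the slide, kept as a string of bits.
word = "00110011101100000100001110101011"

value = int(word, 2)      # interpret the bit string as an unsigned integer

print(len(word))          # 32
print(hex(value))         # 0x33b043ab
```

Hexadecimal is the usual shorthand for such words, since each hex digit stands for exactly four bits.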
Basic Computer Organization
● Instruction Set
– Complete specification of all the kinds of instructions that the processor hardware was built to execute
– E.g.: ADD, SUB, XOR, JUMP, …
● How are programs written in high-level languages such as C translated into a language that the machine understands?
The Computer Program
● Description of algorithms and data structures to achieve a specific objective
● A compiler translates the high-level language into assembly language.
● An assembler translates the assembly into machine code.
Basic Computer Organization
● Processor – Executes programs
● Main Memory – Holds program and data
● I/O – For communication and data
[Figure: the Processor (CPU), containing the ALU, REGISTERS and CONTROL, connected by a BUS to MEMORY and to four I/O units]
Inside the Processor
● Control Hardware: Hardware to manage instruction execution
● ALU: Arithmetic and Logical Unit (hardware to do arithmetic and logic operations)
● Registers: Small units of memory to hold data/instructions temporarily during execution
● Memory: Stores information being processed by the CPU
● Input: Allows the user to supply information to the computer
● Output: Allows the user to receive information from the computer
Physics in the Real World
Computer Architecture
Computer architecture is the design of the abstraction/implementation layers that allow us to execute information processing applications efficiently using manufacturing technologies.
Application
Algorithm
Programming Language
Operating System/Virtual Machines
Instruction Set Architecture
Organization/Microarchitecture
Register-Transfer Level
Gates
Circuits
Devices
Physics
Architecture vs. Organization
● Architecture/Instruction Set Architecture (ISA)
– Programmer visible state (Memory & Registers)
– Operations (Instructions and how they work)
– Input/Output
– Data Representation – Types/Sizes
● Microarchitecture/Organization
– The way a given ISA is implemented on a processor
Same Architecture, Different Organizations
● AMD Athlon II X4
– x86 ISA
– Quad Core, 2.9GHz, 125W
– 3 Instructions/Cycle/Core
– 64KB L1 Cache, 512KB L2 Cache
● Intel Atom
– x86 Instruction Set
– Single Core, 1.6GHz, 2W
– 2 Instructions/Cycle/Core
– 32KB/24KB L1 I/D Cache, 512KB L2 Cache
Different Architectures, Organizations
● AMD Vishera
– x86 ISA
– 8 Core, 4.7 GHz, 125W
– 64KB L1 Cache, 2MB L2 Cache, 8MB L3
● IBM POWER8
– Power ISA
– 12 cores, 4.5GHz, 250W
– 64KB L1 Cache, 512KB L2 Cache, 8MB L3
Recap
● What is a Computer?
● Computer Organization and Architecture
– Registers, Control Unit, ALU, Memory, I/O, Bus
● ISA, Machine language
● Organization vs. Architecture
Coming up …
● Processor Performance
● Machine Models
Concept of Time and Speed
● The period is the duration of one cycle in a repeating event
– Period = Cycle time
● Frequency: Number of occurrences of a repeating event per unit time.
– SI unit: Hertz (Hz)
$$\text{Cycle Time} = \frac{1}{\text{Frequency}}$$
On Processor Performance
$$\text{Program Execution Time} = \text{Execution Time per Instruction} \times \text{Total Program Instructions}$$
$$\text{CPU Time} = \text{Execution Time per Instruction} \times \text{Instruction Count}$$
$$\text{Execution Time per Instruction} = \text{Cycles spent per Instruction} \times \text{Cycle Time}$$
$$\text{CPU Time} = \text{IC} \times \text{Cycles per Instruction} \times \text{Cycle Time}$$
Example
What is the execution time of a program containing a million instructions, each occupying 4 cycles, on a 2 GHz processor?
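The example can be worked through directly from CPU Time = IC × Cycles per Instruction × Cycle Time. This is a minimal sketch that assumes every instruction takes exactly 4 cycles; the function name `cpu_time` is illustrative and does not come from the slides.

```python
def cpu_time(instruction_count, cycles_per_instruction, frequency_hz):
    """CPU time in seconds: IC x CPI x cycle time, where cycle time = 1 / frequency."""
    cycle_time = 1.0 / frequency_hz
    return instruction_count * cycles_per_instruction * cycle_time

# One million instructions, 4 cycles each, on a 2 GHz processor.
t = cpu_time(1_000_000, 4, 2e9)
print(f"{t * 1e3:.3f} ms")  # 2.000 ms
```

With the instruction count and cycles per instruction held fixed, doubling the frequency halves the cycle time and therefore halves the CPU time.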
● How is frequency related to performance?
Iron Law of Processor Performance
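$$\text{CPU Time} = \text{IC} \times \text{Cycles per Instruction} \times \text{Cycle Time}$$
$$\text{CPU time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$
$$\text{Time per Cycle} = \frac{1}{\text{Frequency}}$$
$$\text{CPU Time} = \frac{\text{IC} \times \text{CPI}}{\text{Frequency}}$$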
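On Processor Performance
$$\text{CPU time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$
[Figure: the terms of this equation annotated with the labels COMPILER and ARCHITECTURE AND ORGANIZATION]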
The GNU C Compiler
● $ gcc hello.c
The compiler and its working: Guest lecture by Dr. Janakiraman, IBM, August 2
Operations and Operands
● C = A + B
● Operation: Addition. Operands: A & B. Result: C.
● Instruction: ADD C, A, B
Where do operands come from and where do results go?
Architectural decision
Memory – Toy Example
● Byte addressable
● Linearly increasing addresses
● Memory is 'growing down'
● Any location can be read from/written into.
● How many bytes can be stored in this example memory?
[Figure: a column of memory cells labelled with the addresses 0x0000, ..., 0x00FF, 0x0100, 0x0101, 0x0102, ..., 0xFFFE, 0xFFFF]
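One way to answer the question about the toy memory is to count the addresses it shows. The sketch assumes the memory covers every address from 0x0000 to 0xFFFF and that each address names one byte (byte addressable); the variable names are illustrative.

```python
first_address = 0x0000
last_address = 0xFFFF
bytes_per_location = 1  # byte addressable: one byte per address

capacity = (last_address - first_address + 1) * bytes_per_location
print(capacity)          # 65536
print(capacity // 1024)  # 64, i.e. 64 KiB
```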
Recap
● Processor performance
● Abstract view of Memory
Example
Your desktop has a 4GB memory. How long (in bits) is its address?
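The address must be wide enough to give every byte its own number. A small sketch, assuming the memory is byte addressable and that 1 GB means 2**30 bytes; `address_bits` is an illustrative helper, not something named in the slides.

```python
def address_bits(memory_bytes):
    """Smallest number of bits that can name every byte, 0 .. memory_bytes - 1."""
    return (memory_bytes - 1).bit_length()

four_gb = 4 * 2**30            # assumption: 1 GB = 2**30 bytes
print(address_bits(four_gb))   # 32
```

The same helper gives 16 bits for a 65536-byte memory, which matches a memory whose addresses run from 0x0000 to 0xFFFF.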
Operations and Operands
[Figure: a PROCESSOR containing Control, an ALU with inputs i1 and i2, and a Register File, drawn next to Memory]
Machine Model – Stack
● Stack is a form of memory
● Top of the Stack (Stack Pointer)
● Push and Pop
[Figure: a STACK drawn as a column of cells with addresses 0x00, 0x01, 0x02, ..., 0xFE, 0xFF and a TOS pointer]
Stack
● Instruction sequence: PUSH 10, PUSH 12, POP 13, PUSH 7
[Figure sequence: a STACK (addresses 0x00 to 0xFF) that initially holds 94, 71, 10 at 0x00 to 0x02 with TOS = 0x02, next to a MEMORY with labelled addresses 0x07, 0x10, 0x12, 0x13 and contents 172, 44, 255, ..., 77. Each PUSH copies the value at the named memory address onto the stack and moves TOS up by one; POP copies the value at TOS into the named memory address and moves TOS down by one. Over the four instructions TOS goes 0x02 → 0x03 → 0x04 → 0x03 → 0x04.]
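The PUSH/POP sequence from these slides can be traced with a few lines of code. This is a sketch under assumptions: the operand of PUSH and POP is read as a hexadecimal memory address, and the pairing of the memory values 172, 44, 255, 77 with the addresses 0x07, 0x10, 0x12, 0x13 is a guess, since the figure layout did not survive. Only the initial stack contents and the TOS movement are taken from the slides.

```python
def run(program, stack_cells, tos, memory):
    """Execute PUSH/POP instructions and return the TOS address after each one."""
    trace = []
    for opcode, address in program:
        if opcode == "PUSH":      # memory -> stack, TOS moves up
            tos += 1
            stack_cells[tos] = memory[address]
        elif opcode == "POP":     # stack -> memory, TOS moves down
            memory[address] = stack_cells[tos]
            tos -= 1
        else:
            raise ValueError(f"unknown opcode: {opcode}")
        trace.append(tos)
    return trace

stack_cells = [94, 71, 10] + [None] * 253                  # addresses 0x00 .. 0xFF
memory = {0x07: 255, 0x10: 77, 0x12: 44, 0x13: 172}        # assumed pairing
program = [("PUSH", 0x10), ("PUSH", 0x12), ("POP", 0x13), ("PUSH", 0x07)]

trace = run(program, stack_cells, tos=0x02, memory=memory)
print([hex(t) for t in trace])  # ['0x3', '0x4', '0x3', '0x4']
```

POP does not erase the popped cell; it only moves TOS, so the next PUSH overwrites the stale value.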
Machine Model – Stack
[Figure: a PROCESSOR containing an ALU fed from a STACK with a TOS pointer, next to MEMORY, which also holds a STACK]
Where do operands come from and where do results go?
● The operands are always TOS, TOS – 1.
● Result always goes into TOS – 1.
● Implicit operands
● Instruction: ADD
● Example equation: d=(a+b)*c
Postfix Expressions
● a + b → ab+
● (a + b)*c
– X*c, where X = (a + b)
– Xc*
– ab+c* (the postfix form of (a + b) is ab+)
Postfix Expressions
● a + (b*c) → abc*+
● (a + b)*(c - d)
– X * (c – d), where X = (a + b)
– X * Y, where Y = (c – d)
– XY*
– Xcd-* (replace Y with its postfix form)
– ab+cd-* (replace X with its postfix form)
● (((a + b)*c)+d)*e
– ((X*c)+d)*e, where X = (a + b)
– (Y+d)*e, where Y = (X*c)
– Z*e, where Z = (Y+d)
– Ze*
– Yd+e* (replace Z with its postfix form)
– Xc*d+e* (replace Y with its postfix form)
– ab+c*d+e* (replace X with its postfix form)
Reverse Polish Notation
● A way of expressing arithmetic expressions that avoids the use of brackets.
● Evaluated left-to-right. Natural on a stack.
● Named after the Polish philosopher and mathematician Jan Łukasiewicz (1878-1956), who devised the prefix (Polish) notation that RPN reverses.
Infix Notation | RPN
a+b | ab+
(a+b)*c | ab+c*
a+(b*c) | abc*+
(a+b)*(c-d) | ab+cd-*
(((a+b)*c)+d)*e | ab+c*d+e*
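The slides convert infix to postfix by hand, substituting one bracketed sub-expression at a time. A program usually does the same job with an operator stack; the sketch below uses the shunting-yard algorithm, which is a swapped-in technique and not the method shown on the slides. It assumes single-character operands, the four operators + - * /, and balanced brackets. `to_postfix` is an illustrative name.

```python
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def to_postfix(infix):
    """Convert an infix string with single-character operands to postfix."""
    output = []
    operators = []  # operator stack
    for token in infix.replace(" ", ""):
        if token.isalnum():
            output.append(token)
        elif token == "(":
            operators.append(token)
        elif token == ")":
            while operators[-1] != "(":
                output.append(operators.pop())
            operators.pop()  # discard the matching "("
        elif token in PRECEDENCE:
            # Pop operators of higher or equal precedence (left associative).
            while (operators and operators[-1] != "("
                   and PRECEDENCE[operators[-1]] >= PRECEDENCE[token]):
                output.append(operators.pop())
            operators.append(token)
        else:
            raise ValueError(f"unexpected character: {token}")
    while operators:
        output.append(operators.pop())
    return "".join(output)

for expression in ["a+b", "(a+b)*c", "a+(b*c)", "(a+b)*(c-d)", "(((a+b)*c)+d)*e"]:
    print(expression, "->", to_postfix(expression))
```

Running it on the five infix expressions in the table gives the same RPN strings as the hand derivation.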
RPN Example
● Postfix form: ab+
– a is pushed, then b is pushed; on +, a and b are replaced on the stack by a + b.
– Infix form: a + b
● Postfix form: ab+c*
– a and b are pushed; + leaves a + b on the stack; c is pushed; * leaves (a + b) * c.
– Infix form: (a + b) * c
● Postfix form: ab*cde/-*
– Infix form: (a*b)*(c-(d/e))
[Figure sequence: the Stack and its TOS after each symbol of the postfix string is read]
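The stack walk-through in these examples can be written as a short evaluator. This is a sketch assuming single-letter operands whose numeric values are supplied in a dictionary; the values used here are made up for illustration and `eval_rpn` is a hypothetical name.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def eval_rpn(postfix, values):
    """Evaluate a postfix string left to right using a stack."""
    stack = []
    for symbol in postfix:
        if symbol in OPS:
            right = stack.pop()   # TOS
            left = stack.pop()    # TOS - 1
            stack.append(OPS[symbol](left, right))
        else:
            stack.append(values[symbol])
    (result,) = stack             # exactly one value must remain
    return result

values = {"a": 2, "b": 3, "c": 10, "d": 8, "e": 4}  # illustrative values
print(eval_rpn("ab+", values))         # 5
print(eval_rpn("ab+c*", values))       # 50
print(eval_rpn("ab*cde/-*", values))   # 48.0, i.e. (2*3)*(10-(8/4))
```

Note the operand order on a pop: the value popped first is the right-hand operand, which matters for - and /.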
Machine Model – Stack
● d = (a + b) * c
● RPN: d = ab+c*
● Sequence of Instructions
– PUSH a
– PUSH b
– ADD
– PUSH c
– MULTIPLY
– POP d
[Figure sequence: a PROCESSOR with an ALU and a STACK, next to a memory holding a, b, c and d. PUSH a and PUSH b copy a and b from memory onto the stack; ADD sends the top two entries through the ALU and leaves a+b on the stack; PUSH c places c above it; MULTIPLY leaves (a+b)*c; POP d stores (a+b)*c into d in memory.]
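The six-instruction program can be run on a tiny simulator to see the stack model at work. This is a minimal sketch: memory is a dictionary keyed by variable name, the numeric values are invented for illustration, and `run_stack_machine` is a hypothetical helper.

```python
def run_stack_machine(program, memory):
    """Run PUSH/POP/ADD/MULTIPLY; ALU operands are implicit (TOS and TOS - 1)."""
    stack = []
    for instruction in program:
        opcode, *operand = instruction.split()
        if opcode == "PUSH":
            stack.append(memory[operand[0]])
        elif opcode == "POP":
            memory[operand[0]] = stack.pop()
        elif opcode in ("ADD", "MULTIPLY"):
            top = stack.pop()
            below = stack.pop()
            stack.append(below + top if opcode == "ADD" else below * top)
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return memory

memory = {"a": 2, "b": 3, "c": 4, "d": None}  # illustrative values
program = ["PUSH a", "PUSH b", "ADD", "PUSH c", "MULTIPLY", "POP d"]
print(run_stack_machine(program, memory)["d"])  # 20
```

ADD and MULTIPLY name no operands at all, which is what "implicit operands" means for this machine model.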
Stack based Machines
● Burroughs B5000 (1960)
● Forth machine
● JVM, Intel x87 floating point unit.
Accumulator Based Machine Model
[Figure: an ALU fed by the ACCUMULATOR and by an operand x from memory]
● One operand is implicit – the accumulator.
● Another operand is brought in from the memory
● The result of an operation is always stored in the accumulator.
● Instruction: ADD x
● Example: d = (a + b) * c
Accumulator Based Machine Model
● d = (a + b) * c
– LOAD a
– ADD b
– MULTIPLY c
– STORE d
● LOAD: Transfer data from the memory into the processor. The accumulator is the implicit destination for the load operation.
● STORE: Transfer data from the processor into the memory. Destination in memory: d. Implicit source: Accumulator.
[Figure sequence: memory holding a, b, c and d next to the ALU and ACCUMULATOR. The accumulator holds a after LOAD a, a+b after ADD b and (a+b)*c after MULTIPLY c; STORE d writes (a+b)*c into d.]
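The same computation on an accumulator machine needs only one named operand per instruction. A minimal sketch, assuming memory is a dictionary keyed by variable name with invented values; `run_accumulator_machine` is a hypothetical helper.

```python
def run_accumulator_machine(program, memory):
    """Every instruction names one memory operand; the accumulator is implicit."""
    accumulator = None
    for instruction in program:
        opcode, name = instruction.split()
        if opcode == "LOAD":          # memory -> accumulator
            accumulator = memory[name]
        elif opcode == "ADD":
            accumulator = accumulator + memory[name]
        elif opcode == "MULTIPLY":
            accumulator = accumulator * memory[name]
        elif opcode == "STORE":       # accumulator -> memory
            memory[name] = accumulator
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return memory

memory = {"a": 2, "b": 3, "c": 4, "d": None}  # illustrative values
program = ["LOAD a", "ADD b", "MULTIPLY c", "STORE d"]
print(run_accumulator_machine(program, memory)["d"])  # 20
```

Four instructions suffice here, against six on the stack machine, because each arithmetic instruction fetches its memory operand itself.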
Accumulator Based Machines
● IBM 701 (1952)
● PDP-8, PDP-12
● Intel 4004, 8008, 8080, 8086 …
● Intel x86 processors still use primary accumulator EAX and secondary accumulator EDX for multiplication and division of large numbers (MUL ECX)
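The MUL ECX case can be modelled in a few lines: the unsigned 32-bit value in EAX is multiplied by the 32-bit operand and the 64-bit product is split across EDX (high half) and EAX (low half). This sketch models only that register behaviour, not flags or any other processor state; `mul32` is an illustrative name.

```python
MASK32 = 0xFFFFFFFF

def mul32(eax, operand):
    """Model of unsigned MUL with a 32-bit operand: returns (EDX, EAX)."""
    product = (eax & MASK32) * (operand & MASK32)  # up to 64 bits wide
    return product >> 32, product & MASK32

edx, eax = mul32(0xFFFFFFFF, 2)   # EAX = 0xFFFFFFFF, ECX = 2
print(hex(edx), hex(eax))         # 0x1 0xfffffffe
```

EAX is both an implicit source and an implicit destination here, which is the accumulator pattern from the earlier slides.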
Register–Memory Machine Models
[Figure: a REGISTER FILE drawn as a column of registers R0, R1, ..., R30, R31]
● Small units of memory to hold data/instructions temporarily during execution
● Each register identified by a number – R0, R1, …, R31
● All the registers make up a Register File
Register–Memory Machine Models
[Figure: an ALU fed by the REGISTER FILE (R0 to R31) and by memory]
● Register file supplies one operand.
● Memory supplies another.
● Result is stored back in the register file.
● No implicit operands
● d = (a + b) * c
Register–Memory Machine Models
● d = (a + b) * c
– LOAD R1, a
– ADD R2, R1, b
– MULTIPLY R3, R2, c
– STORE R3, d
● LOAD R1, a – Source in Memory: a. Destination in Register File: R1.
● STORE R3, d – Source in RF: R3. Destination in Memory: d.
[Figure sequence: memory holding a, b, c and d next to the ALU and register file. R1 holds a after the LOAD, R2 holds a+b after the ADD, R3 holds (a+b)*c after the MULTIPLY, and the STORE writes (a+b)*c into d.]
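The register-memory sequence can be checked with a small simulator in which every operand is named explicitly. This is a sketch assuming the operand order shown on the slides (destination first for LOAD, ADD and MULTIPLY, source register first for STORE); the memory values are invented and `run_register_memory` is a hypothetical helper.

```python
def run_register_memory(program, memory):
    """ALU instructions take one register operand and one memory operand."""
    registers = {}
    for instruction in program:
        opcode, operand_text = instruction.split(maxsplit=1)
        operands = [o.strip() for o in operand_text.split(",")]
        if opcode == "LOAD":
            dest, source = operands
            registers[dest] = memory[source]
        elif opcode == "STORE":
            source, dest = operands
            memory[dest] = registers[source]
        elif opcode in ("ADD", "MULTIPLY"):
            dest, reg_source, mem_source = operands
            left, right = registers[reg_source], memory[mem_source]
            registers[dest] = left + right if opcode == "ADD" else left * right
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return registers, memory

memory = {"a": 2, "b": 3, "c": 4, "d": None}  # illustrative values
program = ["LOAD R1, a", "ADD R2, R1, b", "MULTIPLY R3, R2, c", "STORE R3, d"]
registers, memory = run_register_memory(program, memory)
print(registers, memory["d"])  # {'R1': 2, 'R2': 5, 'R3': 20} 20
```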
Register – Register Machine Model
[Figure: an ALU, a register file and memory]
● No implicit operands
● Both operands are supplied from the Register file.
● Memory is accessed only through Load and Store instructions.
● d = (a + b) * c
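The slide leaves the instruction sequence for d = (a + b) * c to the reader. One plausible sequence is sketched below: it is not taken from the slides, and the register numbers, memory values and the helper `run_load_store` are assumptions. The simulator also counts memory accesses to show that only LOAD and STORE touch memory.

```python
def run_load_store(program, memory):
    """ALU instructions use registers only; returns the number of memory accesses."""
    registers = {}
    memory_accesses = 0
    for opcode, *operands in program:
        if opcode == "LOAD":
            dest, address = operands
            registers[dest] = memory[address]
            memory_accesses += 1
        elif opcode == "STORE":
            source, address = operands
            memory[address] = registers[source]
            memory_accesses += 1
        elif opcode == "ADD":
            dest, left, right = operands
            registers[dest] = registers[left] + registers[right]
        elif opcode == "MULTIPLY":
            dest, left, right = operands
            registers[dest] = registers[left] * registers[right]
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return memory_accesses

memory = {"a": 2, "b": 3, "c": 4, "d": None}  # illustrative values
program = [
    ("LOAD", "R1", "a"), ("LOAD", "R2", "b"), ("LOAD", "R3", "c"),
    ("ADD", "R4", "R1", "R2"), ("MULTIPLY", "R5", "R4", "R3"),
    ("STORE", "R5", "d"),
]
accesses = run_load_store(program, memory)
print(memory["d"], accesses)  # 20 4
```

This version takes six instructions where the register-memory machine took four, but each instruction does a simpler, more uniform job.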
Machine Models – Comparison
● Number of explicitly named operands
● Number of instructions that can access data from memory
● Code size
● Amount of data transferred between memory and processor
● Complexity of hardware
● Ease of compilation (ease of generation of machine code).
Machine Models – Memory Operands
Number of Memory Addresses | Max. No. of operands allowed | Type of architecture | Examples
0 | 3 | Load-store | Alpha, ARM, MIPS, PowerPC, SPARC, SuperH, TM32
1 | 2 | Register – memory | IBM 360/370, Intel x86, Motorola 68000, TI TMS320C54x
2 | 2 | Memory – memory | VAX
3 | 3 | Memory – memory | VAX
Machine Models – Memory Operands
Type | Advantages | Disadvantages
Register-Register (0, 3) | Simple, fixed length encoding. Simple code generation model. Instructions take similar numbers of clocks to execute. | Higher instruction count than architectures with memory references in instructions. More instructions and lower instruction density lead to larger programs.
Register-Memory (1, 2) | Data can be accessed without a separate load. Instruction format easy to encode. Good density. | Source operand is destroyed. Encoding a register number and a memory address in each instruction may restrict the number of registers. Clocks per instruction vary.
Memory-Memory (2, 2) or (3, 3) | Most compact. Doesn't waste registers for temporaries. | Large variations in instruction size, especially for three-operand instructions. Large variation in work per instruction. Memory accesses create a bottleneck.
C = A + B
● STACK
– Push A
– Push B
– Add
– Pop C
● ACCUMULATOR
– Load A
– Add B
– Store C
● REGISTER-MEMORY
– Load R1, A
– Add R3, R1, B
– Store R3, C
● REGISTER-REGISTER
– Load R1, A
– Load R2, B
– Add R3, R1, R2
– Store R3, C
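Two of the comparison criteria, instruction count and the number of explicitly named operands, can be tallied straight from the four sequences for C = A + B. The sketch below copies those sequences from the slide; the `summarize` helper and the rule that A, B and C are memory operands while R1 to R3 are registers are assumptions made for the count.

```python
SEQUENCES = {
    "stack": ["Push A", "Push B", "Add", "Pop C"],
    "accumulator": ["Load A", "Add B", "Store C"],
    "register-memory": ["Load R1, A", "Add R3, R1, B", "Store R3, C"],
    "register-register": ["Load R1, A", "Load R2, B", "Add R3, R1, R2", "Store R3, C"],
}
MEMORY_NAMES = {"A", "B", "C"}  # everything else named in an operand is a register

def summarize(sequence):
    """Return (instructions, explicitly named operands, memory operands)."""
    explicit = 0
    memory_refs = 0
    for instruction in sequence:
        _, _, operand_text = instruction.partition(" ")
        operands = [o.strip() for o in operand_text.split(",") if o.strip()]
        explicit += len(operands)
        memory_refs += sum(1 for o in operands if o in MEMORY_NAMES)
    return len(sequence), explicit, memory_refs

for model, sequence in SEQUENCES.items():
    print(model, summarize(sequence))
# stack (4, 3, 3)
# accumulator (3, 3, 3)
# register-memory (3, 7, 3)
# register-register (4, 9, 3)
```

All four models name A, B and C once each; they differ in how many instructions and how many operand fields it takes to do so, which is what drives code size and encoding complexity.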