basavaraj talawar [email protected] · course structure textbooks – j p hayes, computer...

CO200 - Computer Organization & Architecture

Basavaraj [email protected]

Course Syllabus

● Processor Basics
– CPU organization, Data representation and Instruction Sets

● Datapath Design
– Fixed point arithmetic
– Adders, Subtracters, Multipliers, Dividers.
– ALU, Floating point arithmetic

● Control Design
– Hardwired control, Microprogrammed control, Pipeline control

● Memory Organization
– Serial vs. Random Access Memories
– Caches, Virtual Memory

● Principles of Pipelining

● Principles of Parallel Computing

Course Structure

● Textbooks
– J P Hayes, Computer Architecture and Organization, 3 ed., McGraw Hill.
– Hwang and Briggs, Computer Architecture and Parallel Processing, McGraw Hill.
– D Patterson and J Hennessy, Computer Organization and Design, MK, 3 ed.

● Other References
– NPTEL course on “High Performance Computing” by Matthew Jacob, IISc.

● Guest Lectures

● About Course
– Surprise Quizzes – 15%, Assignments – 10%, Mid Sem – 25%, Final Exam – 50%

Course Objectives

● To understand how a computer works
● To know the architecture and working of components inside a computer
– Processor, Control unit, ALU, Memory, I/O

Course Objectives – Expanded

● How is a machine language program executed by a computer?
● How does the software instruct the hardware to perform a desired action? How does the hardware instruct a desired unit to perform its corresponding operation?
● Why study all of this?
– To gain insight into the setting in which our programs execute
– To improve the setting in which our programs execute – to improve the performance of the system

What is a Computer?

● An electronic device which is capable of receiving information (data) in a particular form and of performing a sequence of operations in accordance with a predetermined but variable set of procedural instructions (program) to produce a result in the form of information or signals.

Basic Computer Organization

● Machine instructions
– Description of a primitive operation that the machine hardware is able to understand
– In binary
– Example of a 32-bit machine language instruction:

00110011101100000100001110101011
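To make the idea of a binary machine instruction concrete, the sketch below treats the 32-bit pattern above as an integer and splits it into fields. The field layout (6-bit opcode, two 5-bit register numbers, 16-bit immediate) is an assumption borrowed from MIPS-style I-type encodings; the slide does not say which instruction set the pattern belongs to, and the names `decode_i_type`, `rs`, `rt` and `imm` are illustrative only.

```python
# Assumption: a MIPS-like I-type layout (opcode:6 | rs:5 | rt:5 | imm:16).
# The slide does not name the instruction set, so this split is illustrative.

def decode_i_type(bits: str) -> dict:
    """Split a 32-character bit string into hypothetical I-type fields."""
    if len(bits) != 32 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 32 binary digits")
    word = int(bits, 2)                 # the instruction is just a 32-bit number
    return {
        "word": word,
        "opcode": (word >> 26) & 0x3F,  # top 6 bits
        "rs": (word >> 21) & 0x1F,      # next 5 bits
        "rt": (word >> 16) & 0x1F,      # next 5 bits
        "imm": word & 0xFFFF,           # low 16 bits
    }

fields = decode_i_type("00110011101100000100001110101011")
print(hex(fields["word"]), fields["opcode"], fields["rs"], fields["rt"], hex(fields["imm"]))
```

The point is only that the hardware sees a fixed-width number whose bit fields select the operation and its operands; a different instruction set would carve up the same 32 bits differently.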

Basic Computer Organization

● Instruction Set
– Complete specification of all the kinds of instructions that the processor hardware was built to execute
– E.g.: ADD, SUB, XOR, JUMP, …

● How are programs written in high level languages such as C translated into a language that the machine understands?

The Computer Program

● Description of algorithms and data structures to achieve a specific objective
● A compiler translates the high level language into assembly language.
● An assembler translates the assembly into machine code.

Basic Computer Organization

● Processor – Executes programs
● Main Memory – Holds program and data
● I/O – For communication and data

[Figure: a Processor (CPU) containing the ALU, REGISTERS and CONTROL, connected over a BUS to MEMORY and several I/O devices.]

Inside the Processor

● Control Hardware: Hardware to manage instruction execution
● ALU: Arithmetic and Logical Unit (hardware to do arithmetic and logic operations)
● Registers: Small units of memory to hold data/instructions temporarily during execution
● Memory: Stores information being processed by the CPU
● Input: Allows the user to supply information to the computer
● Output: Allows the user to receive information from the computer

Physics in the Real World

Computer Architecture

Computer architecture is the design of the abstraction/implementation layers that allow us to execute information processing applications efficiently using manufacturing technologies.

Abstraction layers, from highest to lowest:

– Application
– Algorithm
– Programming Language
– Operating System/Virtual Machines
– Instruction Set Architecture
– Organization/Microarchitecture
– Register-Transfer Level
– Gates
– Circuits
– Devices
– Physics

Architecture vs. Organization

● Architecture/Instruction Set Architecture (ISA)
– Programmer visible state (Memory & Registers)
– Operations (Instructions and how they work)
– Input/Output
– Data Representation – Types/Sizes

● Microarchitecture/Organization
– Is the way a given ISA is implemented on a processor

Same Architecture, Different Organizations

● AMD Athlon II X4
– x86 ISA
– Quad Core, 2.9 GHz, 125 W
– 3 Instructions/Cycle/Core
– 64 KB L1 Cache, 512 KB L2 Cache

● Intel Atom
– x86 ISA
– Single Core, 1.6 GHz, 2 W
– 2 Instructions/Cycle/Core
– 32 KB/24 KB L1 I/D Cache, 512 KB L2 Cache

Different Architectures, Organizations

● AMD Vishera
– x86 ISA
– 8 cores, 4.7 GHz, 125 W
– 64 KB L1 Cache, 2 MB L2 Cache, 8 MB L3

● IBM POWER 8
– Power ISA
– 12 cores, 4.5 GHz, 250 W
– 64 KB L1 Cache, 512 KB L2 Cache, 8 MB L3

Recap

● What is a Computer?
● Computer Organization and Architecture
– Registers, Control Unit, ALU, Memory, I/O, Bus
● ISA, Machine language
● Organization vs. Architecture

Coming up …

● Processor Performance
● Machine Models

Concept of Time and Speed

● The period is the duration of one cycle in a repeating event
– Period = Cycle time

● Frequency: Number of occurrences of a repeating event per unit time.
– SI unit: Hertz (Hz)

$$\text{Cycle Time} = \frac{1}{\text{Frequency}}$$

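On Processor Performance

$$\text{Program Execution Time} = \text{Execution Time per Instruction} \times \text{Total Program Instructions}$$

$$\text{CPU Time} = \text{Execution Time per Instruction} \times \text{Instruction Count}$$

$$\text{Execution Time per Instruction} = \text{Cycles spent per Instruction} \times \text{Cycle Time}$$

$$\text{CPU Time} = \text{IC} \times \text{Cycles per Instruction} \times \text{Cycle Time}$$

where IC is the Instruction Count.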

Example

What is the execution time of a program containing a million instructions, each occupying 4 cycles, on a 2 GHz processor?
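A worked solution to this example, using only the relation between cycle time and frequency and the product of instruction count, cycles per instruction and cycle time (it assumes every instruction takes exactly 4 cycles, as the question states):

```latex
\begin{align*}
\text{Cycle Time} &= \frac{1}{\text{Frequency}} = \frac{1}{2 \times 10^{9}\ \text{Hz}} = 0.5\ \text{ns} \\
\text{CPU Time} &= \text{IC} \times \text{Cycles per Instruction} \times \text{Cycle Time} \\
                &= 10^{6} \times 4 \times 0.5\ \text{ns} = 2 \times 10^{6}\ \text{ns} = 2\ \text{ms}
\end{align*}
```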

● How is frequency related to performance?

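Iron Law of Processor Performance

$$\text{CPU Time} = \text{IC} \times \text{Cycles per Instruction} \times \text{Cycle Time}$$

$$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$

$$\text{Time per Cycle} = \frac{1}{\text{Frequency}}$$

$$\text{CPU Time} = \frac{\text{IC} \times \text{CPI}}{\text{Frequency}}$$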
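The last form of the Iron Law suggests why clock frequency alone does not decide performance. The sketch below is illustrative only: the function name `cpu_time` and both sets of numbers are made up for the comparison and do not describe any real processor.

```python
# Sketch of the Iron Law: CPU time = IC x CPI / frequency.
# The two "designs" below use made-up numbers purely for illustration.

def cpu_time(instruction_count: int, cpi: float, frequency_hz: float) -> float:
    """Return the execution time in seconds."""
    return instruction_count * cpi / frequency_hz

# Hypothetical design A: faster clock, but more cycles per instruction.
time_a = cpu_time(1_000_000, cpi=4.0, frequency_hz=3.0e9)
# Hypothetical design B: slower clock, fewer cycles per instruction.
time_b = cpu_time(1_000_000, cpi=2.0, frequency_hz=2.0e9)

print(f"A: {time_a * 1e3:.3f} ms, B: {time_b * 1e3:.3f} ms")
```

With these assumed values design B finishes first despite its lower clock frequency, because the product of all three factors is what matters.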

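On Processor Performance

$$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$

[Figure annotations on the terms of this equation: COMPILER; ARCHITECTURE AND ORGANIZATION]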

The GNU C Compiler

● $ gcc hello.c

The compiler and its working: Guest lecture by Dr. Janakiraman, IBM, August 2

Operations and Operands

● C = A + B
● Operation: Addition. Operands: A & B. Result: C.
● Instruction: ADD C, A, B

Where do operands come from and where do results go?

Architectural decision

Memory – Toy Example

● Byte addressable
● Linearly increasing addresses
● Memory is 'growing down'
● Any location can be read from/written into.
● How many bytes can be stored in this example memory?

[Figure: a column of memory locations labelled with addresses 0x0000, …, 0x00FF, 0x0100, 0x0101, 0x0102, …, 0xFFFE, 0xFFFF.]
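One way to answer the capacity question for the toy memory is to count the addresses. This is a minimal sketch that assumes the addresses run contiguously from 0x0000 to 0xFFFF with one byte per address, as the byte-addressable figure suggests; the variable names are illustrative.

```python
# Assumption: contiguous addresses 0x0000..0xFFFF, one byte per address.
first_address = 0x0000
last_address = 0xFFFF

capacity_bytes = last_address - first_address + 1
print(capacity_bytes, "bytes =", capacity_bytes // 1024, "KiB")
```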

Recap

● Processor performance
● Abstract view of Memory

Example

Your desktop has a 4 GB memory. How long (in bits) is its address?
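A sketch of how the address width follows from the capacity. It assumes that "4 GB" means 4 × 2^30 bytes and that the memory is byte addressable; `address_bits` is a hypothetical helper name.

```python
# Assumption: "4GB" means 4 * 2**30 bytes and every byte has its own address.

def address_bits(capacity_bytes: int) -> int:
    """Smallest number of bits that can name every byte in the memory."""
    if capacity_bytes < 1:
        raise ValueError("capacity must be positive")
    # The largest address is capacity - 1; its bit length is the address width.
    return (capacity_bytes - 1).bit_length()

print(address_bits(4 * 2**30))
```

Under these assumptions the memory holds 2^32 bytes, so an address needs 32 bits.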

Operations and Operands

[Figure: a PROCESSOR containing Control, an ALU with inputs i1 and i2, and a Register File, connected to Memory.]

Machine Model – Stack

● Stack is a form of memory
● Top of the Stack (Stack Pointer)
● Push and Pop

[Figure: a STACK with addresses 0x00 to 0xFF, holding 94, 71 and 10 at addresses 0x00 to 0x02 with TOS at 0x02, beside a MEMORY with addresses 0x07, 0x10, 0x12 and 0x13 holding values that include 172, 44, 255 and 77.]

Stack

Instruction sequence (each operand is a memory address):

PUSH 10
PUSH 12
POP 13
PUSH 7

[Figure sequence: PUSH 10 places 77 on the stack and moves TOS from 0x02 to 0x03. PUSH 12 places 44 on the stack and moves TOS to 0x04. POP 13 removes 44 from the stack, stores it at memory address 0x13, and moves TOS back to 0x03. PUSH 7 then pushes the value at memory address 0x07 and moves TOS to 0x04.]
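The PUSH and POP behaviour of the stack model can be sketched in a few lines. This is an illustration under stated assumptions, not the slides' own implementation: the class name `StackUnit` is hypothetical, the stack is assumed to grow towards higher addresses, and the memory contents (in particular which value sits at which address) are guessed from the figure residue.

```python
# Minimal sketch of PUSH/POP with memory-address operands.
# Assumptions: the stack grows towards higher addresses and the memory
# contents below are illustrative guesses, not taken verbatim from the slides.

class StackUnit:
    def __init__(self, initial):
        self.cells = list(initial)   # index in this list = stack address

    @property
    def tos(self) -> int:
        """Address of the top element (-1 when the stack is empty)."""
        return len(self.cells) - 1

    def push(self, memory: dict, address: int) -> None:
        """PUSH address: copy memory[address] onto the top of the stack."""
        self.cells.append(memory[address])

    def pop(self, memory: dict, address: int) -> None:
        """POP address: move the top of the stack into memory[address]."""
        if not self.cells:
            raise IndexError("pop from an empty stack")
        memory[address] = self.cells.pop()

memory = {0x07: 255, 0x10: 77, 0x12: 44, 0x13: 172}   # assumed contents
stack = StackUnit([94, 71, 10])                        # TOS starts at 0x02

stack.push(memory, 0x10)   # TOS moves to 0x03
stack.push(memory, 0x12)   # TOS moves to 0x04
stack.pop(memory, 0x13)    # memory[0x13] is overwritten, TOS back to 0x03
stack.push(memory, 0x07)   # TOS moves to 0x04
print(stack.cells, hex(stack.tos), memory[0x13])
```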


Where do operands come from and where do results go?

[Figure: a PROCESSOR containing an ALU and a STACK with a TOS pointer, connected to MEMORY.]

● The operands are always TOS, TOS – 1.
● Result always goes into TOS – 1.
● Implicit operands
● Instruction: ADD
● Example equation: d = (a + b) * c

Postfix Expressions

a + b
→ ab+

(a + b) * c
→ X * c, where X = (a + b)
→ Xc*
→ ab+c* (the postfix form of (a + b) is ab+)

Postfix Expressions

a + (b * c)
→ abc*+

(a + b) * (c - d)
→ X * (c - d), where X = (a + b)
→ X * Y, where Y = (c - d)
→ XY*
→ Xcd-* (replace Y with its postfix form)
→ ab+cd-* (replace X with its postfix form)

(((a + b) * c) + d) * e
→ ((X * c) + d) * e, where X = (a + b)
→ (Y + d) * e, where Y = (X * c)
→ Z * e, where Z = (Y + d)
→ Ze*
→ Yd+e* (replace Z with its postfix form)
→ Xc*d+e* (replace Y with its postfix form)
→ ab+c*d+e* (replace X with its postfix form)

Reverse Polish Notation

● A way of expressing arithmetic expressions that avoids the use of brackets.
● Evaluated left-to-right. Natural on a stack.
● Named after the Polish philosopher and mathematician Jan Łukasiewicz (1878-1956), who devised the original (prefix) Polish notation.

Infix Notation        RPN
a+b                   ab+
(a+b)*c               ab+c*
a+(b*c)               abc*+
(a+b) * (c-d)         ab+cd-*
(((a+b)*c)+d)*e       ab+c*d+e*
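The substitution rule used for these conversions (the postfix form of X op Y is the postfix form of X, then that of Y, then op) can be applied recursively. The sketch below is one way to do it; as a shortcut it borrows Python's own expression parser from the standard `ast` module to build the expression tree, which is an implementation convenience and not something the slides prescribe. The function name `to_postfix` is hypothetical.

```python
import ast

# Symbols for the four binary operators that appear in the slides.
_SYMBOLS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

def to_postfix(infix: str) -> str:
    """Convert an infix expression over variable names to postfix (RPN)."""
    def visit(node) -> str:
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.BinOp) and type(node.op) in _SYMBOLS:
            # postfix(X op Y) = postfix(X) + postfix(Y) + op
            return visit(node.left) + visit(node.right) + _SYMBOLS[type(node.op)]
        raise ValueError("unsupported expression")
    # mode="eval" parses a single expression; .body is its root node.
    return visit(ast.parse(infix, mode="eval").body)

for expr in ["a+b", "(a+b)*c", "a+(b*c)", "(a+b)*(c-d)", "(((a+b)*c)+d)*e"]:
    print(expr, "->", to_postfix(expr))
```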

RPN Example

Postfix form: ab+
Infix form: a + b

Token read    Stack (bottom → top)
a             a
b             a, b
+             a + b

RPN Example

Postfix form: ab+c*
Infix form: (a + b) * c

Token read    Stack (bottom → top)
a             a
b             a, b
+             a + b
c             a + b, c
*             (a + b) * c

RPN Example

Postfix form: ab*cde/-*
Infix form: (a*b)*(c-(d/e))

Token read    Stack (bottom → top)
a             a
b             a, b
*             a*b
c             a*b, c
d             a*b, c, d
e             a*b, c, d, e
/             a*b, c, d/e
-             a*b, c-(d/e)
*             (a*b)*(c-(d/e))
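The left-to-right evaluation shown in these examples can be sketched with an ordinary list used as the stack. The function name `eval_rpn` and the operand values are made up for illustration; operands are assumed to be single letters, as in the slides.

```python
import operator

# The binary operators that appear in the slides' examples.
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def eval_rpn(postfix: str, values: dict):
    """Evaluate a postfix string of single-letter operands left to right."""
    stack = []
    for token in postfix:
        if token in OPS:
            right = stack.pop()          # operand at TOS
            left = stack.pop()           # operand at TOS - 1
            stack.append(OPS[token](left, right))
        else:
            stack.append(values[token])  # operand: push its value
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]

values = {"a": 2, "b": 3, "c": 10, "d": 8, "e": 4}   # made-up operand values
print(eval_rpn("ab+", values))         # a + b
print(eval_rpn("ab+c*", values))       # (a + b) * c
print(eval_rpn("ab*cde/-*", values))   # (a * b) * (c - (d / e))
```

Note that the operand popped first is the right-hand one; this matters for the non-commutative operators `-` and `/`.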


[Figure: a PROCESSOR containing an ALU and a STACK with a TOS pointer, connected to a memory holding a, b, c and d.]

● d = (a + b) * c
● RPN: d = ab+c*

Sequence of Instructions:

PUSH a
PUSH b
ADD
PUSH c
MULTIPLY
POP d

Stack contents after each instruction (bottom → top):

PUSH a        a
PUSH b        a, b
ADD           a+b
PUSH c        a+b, c
MULTIPLY      (a+b)*c
POP d         (empty); memory location d now holds (a+b)*c
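The step from an RPN string to a stack-machine instruction sequence is mechanical: each operand becomes a PUSH, each operator becomes an ALU instruction, and a final POP stores the result. The sketch below illustrates this under a few assumptions: the helper names `compile_rpn` and `run` are hypothetical, and the mnemonics SUBTRACT and DIVIDE are invented here because the slides only show ADD and MULTIPLY.

```python
import operator

# ADD and MULTIPLY appear in the slides; SUBTRACT and DIVIDE are assumed names.
MNEMONICS = {"+": "ADD", "-": "SUBTRACT", "*": "MULTIPLY", "/": "DIVIDE"}
ALU = {"ADD": operator.add, "SUBTRACT": operator.sub,
       "MULTIPLY": operator.mul, "DIVIDE": operator.truediv}

def compile_rpn(postfix: str, destination: str) -> list:
    """Translate a postfix string into stack-machine instructions."""
    program = [MNEMONICS[t] if t in MNEMONICS else f"PUSH {t}" for t in postfix]
    program.append(f"POP {destination}")
    return program

def run(program: list, memory: dict) -> dict:
    """Execute the instructions; ALU operands are implicit (TOS and TOS - 1)."""
    stack = []
    for instruction in program:
        opcode, _, operand = instruction.partition(" ")
        if opcode == "PUSH":
            stack.append(memory[operand])
        elif opcode == "POP":
            memory[operand] = stack.pop()
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(ALU[opcode](left, right))
    return memory

program = compile_rpn("ab+c*", "d")
print(program)
print(run(program, {"a": 2, "b": 3, "c": 4}))   # made-up operand values
```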

Stack based Machines

● Burroughs B5000 (1960)
● Forth machine
● JVM, Intel x87 floating point unit.

Accumulator Based Machine Model

[Figure: a processor containing an ALU and an ACCUMULATOR, connected to a memory holding x.]

● One operand is implicit – the accumulator.
● Another operand is brought in from the memory
● The result of an operation is always stored in the accumulator.
● Instruction: ADD x
● Example: d = (a + b) * c


[Figure: a processor containing an ALU and an ACCUMULATOR, connected to a memory holding a, b, c and d.]

● d = (a + b) * c

LOAD a
ADD b
MULTIPLY c
STORE d

● LOAD: Transfer data from the memory into the processor. Accumulator is the implicit destination for the load operation.
● STORE: Transfer data from the processor into the memory. Destination in memory: d. Implicit source: Accumulator.

Accumulator contents after each instruction:

LOAD a         a
ADD b          a+b
MULTIPLY c     (a+b)*c
STORE d        (a+b)*c (also written to memory location d)
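The accumulator sequence above can be traced with a small interpreter. This is a sketch under assumptions: the function name `run_accumulator`, the tuple encoding of instructions and the operand values are all illustrative.

```python
# Minimal accumulator-machine sketch: one implicit operand (the accumulator),
# one explicit memory operand per instruction.

def run_accumulator(program, memory):
    """Execute (opcode, address) pairs and return the final accumulator."""
    acc = 0
    for opcode, address in program:
        if opcode == "LOAD":          # memory -> accumulator
            acc = memory[address]
        elif opcode == "ADD":         # accumulator + memory -> accumulator
            acc = acc + memory[address]
        elif opcode == "MULTIPLY":    # accumulator * memory -> accumulator
            acc = acc * memory[address]
        elif opcode == "STORE":       # accumulator -> memory
            memory[address] = acc
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return acc

memory = {"a": 2, "b": 3, "c": 4, "d": 0}   # made-up operand values
program = [("LOAD", "a"), ("ADD", "b"), ("MULTIPLY", "c"), ("STORE", "d")]
result = run_accumulator(program, memory)
print(result, memory["d"])
```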

Accumulator Based Machines

● IBM 701 (1952)
● PDP-8, PDP-12
● Intel 4004, 8008, 8080, 8086 …
● Intel x86 processors still use primary accumulator EAX and secondary accumulator EDX for multiplication and division of large numbers (MUL ECX)

Register–Memory Machine Models

[Figure: a REGISTER FILE with registers R0, R1, …, R30, R31.]

● Small units of memory to hold data/instructions temporarily during execution
● Each register identified by a number – R0, R1, …, R31
● All the registers make up a Register File


[Figure: a processor containing an ALU and the REGISTER FILE (R0, R1, …, R30, R31), connected to memory.]

● Register file supplies one operand.
● Memory supplies another.
● Result is stored back in the register file.
● No implicit operands
● d = (a + b) * c

LOAD R1, a
ADD R2, R1, b
MULTIPLY R3, R2, c
STORE R3, d

Register file contents after each instruction (memory holds a, b, c and d):

LOAD R1, a            R1 = a (Source in Memory: a. Destination in Register File: R1)
ADD R2, R1, b         R2 = a+b
MULTIPLY R3, R2, c    R3 = (a+b)*c
STORE R3, d           memory location d = (a+b)*c (Source in RF: R3. Destination in Memory: d)

Register – Register Machine Model

[Figure: a processor containing an ALU and a register file, connected to memory.]

● No implicit operands
● Both operands are supplied from the Register file.
● Memory is accessed only through Load and Store instructions.
● d = (a + b) * c
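For the register-register (load-store) model the slides do not spell out an instruction sequence for d = (a + b) * c. The sketch below uses one plausible sequence; the sequence itself, the register numbers, the tuple encoding and the function name `run_load_store` are assumptions made for illustration.

```python
# Load-store sketch: ALU instructions name only registers; memory is touched
# solely by LOAD and STORE. The program below is an assumed sequence.

def run_load_store(program, memory, num_registers=32):
    """Execute the program and return the final register file."""
    regs = [0] * num_registers
    for instr in program:
        opcode = instr[0]
        if opcode == "LOAD":            # LOAD Rd, address
            _, rd, address = instr
            regs[rd] = memory[address]
        elif opcode == "STORE":         # STORE Rs, address
            _, rs, address = instr
            memory[address] = regs[rs]
        elif opcode == "ADD":           # ADD Rd, Rs1, Rs2
            _, rd, rs1, rs2 = instr
            regs[rd] = regs[rs1] + regs[rs2]
        elif opcode == "MULTIPLY":      # MULTIPLY Rd, Rs1, Rs2
            _, rd, rs1, rs2 = instr
            regs[rd] = regs[rs1] * regs[rs2]
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return regs

memory = {"a": 2, "b": 3, "c": 4, "d": 0}   # made-up operand values
program = [
    ("LOAD", 1, "a"),
    ("LOAD", 2, "b"),
    ("ADD", 3, 1, 2),
    ("LOAD", 4, "c"),
    ("MULTIPLY", 5, 3, 4),
    ("STORE", 5, "d"),
]
regs = run_load_store(program, memory)
print(regs[3], regs[5], memory["d"])
```

Compared with the register-memory sequence, this version needs two extra LOAD instructions, but every ALU instruction has the same simple register-only form.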

Machine Models – Comparison

● Number of explicitly named operands
● Number of instructions that can access data from memory
● Code size
● Amount of data transferred between memory and processor
● Complexity of hardware
● Ease of compilation (ease of generation of machine code).

Machine Models – Memory Operands

Number of memory addresses    Max. no. of operands allowed    Type of architecture    Examples
0                             3                               Load-store              Alpha, ARM, MIPS, PowerPC, SPARC, SuperH, TM32
1                             2                               Register – memory       IBM 360/370, Intel x86, Motorola 68000, TI TMS320C54x
2                             2                               Memory – memory         VAX
3                             3                               Memory – memory         VAX

Machine Models – Memory Operands

● Register-Register (0, 3)
– Advantages: Simple, fixed length encoding. Simple code generation model. Instructions take similar numbers of clocks to execute.
– Disadvantages: Higher instruction count than architectures with memory references in instructions. More instructions and lower instruction density lead to larger programs.

● Register-Memory (1, 2)
– Advantages: Data can be accessed without a separate load. Instruction format easy to encode. Good density.
– Disadvantages: Source operand is destroyed. Encoding a register number and a memory address in each instruction may restrict the number of registers. Clocks per instruction vary.

● Memory-Memory (2, 2) or (3, 3)
– Advantages: Most compact. Doesn't waste registers for temporaries.
– Disadvantages: Large variations in instruction size, especially for three-operand instructions. Large variation in work per instruction. Memory accesses create a bottleneck.

C = A + B

[Figure: four processors side by side, each with an ALU and memory: STACK (with TOS), ACCUMULATOR, REGISTER-MEMORY and REGISTER-REGISTER.]

STACK

Push A
Push B
Add
Pop C

ACCUMULATOR

Load A
Add B
Store C

REGISTER-MEMORY

Load R1, A
Add R3, R1, B
Store R3, C

REGISTER-REGISTER

Load R1, A
Load R2, B
Add R3, R1, R2
Store R3, C
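Two of the comparison criteria, instruction count and the number of explicitly named operands, can be read straight off the four C = A + B sequences. The sketch below tabulates them; the helper name `summarize` and the simple text parsing are illustrative assumptions, and only operands that name the variables A, B or C are counted as memory operands.

```python
# The four sequences for C = A + B, as listed in the slides.
PROGRAMS = {
    "stack": ["Push A", "Push B", "Add", "Pop C"],
    "accumulator": ["Load A", "Add B", "Store C"],
    "register-memory": ["Load R1, A", "Add R3, R1, B", "Store R3, C"],
    "register-register": ["Load R1, A", "Load R2, B", "Add R3, R1, R2", "Store R3, C"],
}
MEMORY_VARIABLES = {"A", "B", "C"}

def summarize(program):
    """Count instructions, memory operands and the widest instruction."""
    memory_operands = 0
    max_explicit_operands = 0
    for line in program:
        _, _, rest = line.partition(" ")          # text after the mnemonic
        operands = [o.strip() for o in rest.split(",") if o.strip()]
        max_explicit_operands = max(max_explicit_operands, len(operands))
        memory_operands += sum(1 for o in operands if o in MEMORY_VARIABLES)
    return {
        "instructions": len(program),
        "memory_operands": memory_operands,
        "max_explicit_operands": max_explicit_operands,
    }

for model, program in PROGRAMS.items():
    print(f"{model:18s}", summarize(program))
```

For this small example every model references memory three times; the models differ in how many instructions they need and in how many operands a single instruction has to encode.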
