advanced computer architectureadiaz/arqcomp/02...cisc, risc, advanced memory systems (caches,...


Post on 23-Jul-2020


Page 1: Advanced Computer Architectureadiaz/ArqComp/02...CISC, RISC, Advanced memory systems (caches, memory, virtual memory) Advanced Instruction Level Parallelism (pipelining, superscalar,

Laboratorio de Tecnologías de Información

Arquitectura de Computadoras Organization- 1

Computer Organization

Arquitectura de Computadoras (Computer Architecture)

Arturo Díaz Pérez

Centro de Investigación y de Estudios Avanzados del IPN

Laboratorio de Tecnologías de Información

[email protected]@cinvestav.mx

Levels of Organization

[Figure: the SPARCstation 20 viewed as a computer made of a processor (control and datapath), memory, and input and output devices]

Workstation design target:

25% of cost on processor

25% of cost on memory (minimum memory size)

Rest on I/O devices, power supplies, box

The SPARCstation 20

[Figure: SPARCstation 20 block diagram. Labels: Memory Controller, SIMM Bus, Memory SIMMs; MBus with MBus slots 0 and 1; MSBI; SBus with SBus slots 0 to 3; SEC and MACIO; SCSI Bus (disk, tape); External Bus (keyboard and mouse, floppy disk)]

The Underlying Interconnect

[Figure: SPARCstation 20 interconnect. Labels: Memory Controller, SIMM Bus, MSBI, SEC, MACIO]

Processor/memory bus: MBus

Standard I/O bus: SCSI Bus

Sun's high-speed I/O bus: SBus

Low-speed I/O bus: External Bus

Processor and Caches

[Figure: SPARCstation 20 MBus with MBus slots 0 and 1. Labels: MBus Module, External Cache, Processor (datapath, registers, internal cache, control)]

Memory

[Figure: SPARCstation 20 memory system. Labels: Memory Controller, Memory SIMM Bus, SIMM slots 0 to 7, and a DRAM SIMM carrying multiple DRAM chips]

Input and Output (I/O) Devices

[Figure: SPARCstation 20 I/O. Labels: SBus with SBus slots 0 to 3; SEC and MACIO; SCSI Bus (disk, tape); External Bus (keyboard and mouse, floppy disk)]

SCSI Bus: standard I/O devices

SBus: high-speed I/O devices

External Bus: low-speed I/O devices

Standard I/O Devices

[Figure: SPARCstation 20 SCSI Bus with disk and tape]

SCSI = Small Computer System Interface

A standard interface (IBM, Apple, HP, Sun, etc.)

Computers and I/O devices communicate with each other over it

The hard disk is one I/O device that resides on the SCSI Bus

High Speed I/O Devices

[Figure: SPARCstation 20 SBus with SBus slots 0 to 3]

SBus is Sun's own high-speed I/O bus

The SS20 has four SBus slots where we can plug in I/O devices

Examples: graphics accelerator, video adaptor, etc.

High speed and low speed are relative terms

Slow Speed I/O Devices

[Figure: SPARCstation 20 External Bus with keyboard and mouse, floppy disk]

There are only four SBus slots in the SS20; "seats" are expensive

The speed of some I/O devices is limited by human reaction time, which is very slow by computer standards

Examples: keyboard and mouse

No reason to use up one of the expensive SBus slots

Summary

All computers consist of five components

Processor: (1) datapath and (2) control

(3) Memory

(4) Input devices and (5) output devices

Not all "memory" is created equal

Cache: fast (expensive) memory is placed closer to the processor

Main memory: less expensive memory, so we can have more of it

Interfaces are where the problems are: between functional units, and between the computer and the outside world

Need to design against constraints of performance, power, area, and cost

Summary: Computer System Components

[Figure: processor, caches, busses, memory, and I/O devices (controllers, adapters, disks, displays, keyboards, networks)]

All have interfaces and organizations

Processor Architecture Review

Levels of Representation

High-level language program:

    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;

The compiler turns it into an assembly language program:

    lw $15, 0($2)
    lw $16, 4($2)
    sw $16, 0($2)
    sw $15, 4($2)

The assembler turns it into a machine language program:

    0000 1001 1100 0110 1010 1111 0101 1000
    1010 1111 0101 1000 0000 1001 1100 0110
    1100 0110 1010 1111 0101 1000 0000 1001
    0101 1000 0000 1001 1100 0110 1010 1111

Machine interpretation turns it into a control signal specification:

    ALUOP[0:3] <= InstReg[9:11] & MASK
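To make the assembly-level step concrete, the following is a minimal Python sketch that emulates the four-instruction load/store sequence on a toy memory. The dictionary-based memory, the `swap_words` helper, and the example addresses are assumptions made for illustration; the slide only gives the instruction sequence, and `$2` is assumed here to hold the byte address of `v[k]`.

```python
def swap_words(memory, base):
    """Emulate the lw/sw sequence; `base` plays the role of register $2."""
    regs = {}
    regs[15] = memory[base + 0]  # lw $15, 0($2)
    regs[16] = memory[base + 4]  # lw $16, 4($2)
    memory[base + 0] = regs[16]  # sw $16, 0($2)
    memory[base + 4] = regs[15]  # sw $15, 4($2)
    return memory


# Hypothetical example: v[k] lives at byte address 100 and v[k+1] at 104.
example_memory = swap_words({100: 7, 104: 9}, 100)
```

Both loads happen before either store, which is why two registers stand in for the single `temp` variable of the high-level version.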

Execution Cycle

Instruction Fetch: obtain instruction from program storage

Instruction Decode: determine required actions and instruction size

Operand Fetch: locate and obtain operand data

Execute: compute result value or status

Result Store: deposit results in storage for later use

Next Instruction: determine successor instruction
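The six steps of the execution cycle can be sketched as an interpreter loop. This is a toy model under stated assumptions, not any real instruction set: the tuple instruction format `(op, dest, src1, src2)`, the register names, and the `run` function are all hypothetical.

```python
def run(program, registers):
    """Run a toy program; each instruction is (op, dest, src1, src2)."""
    pc = 0
    while pc < len(program):
        instruction = program[pc]                # instruction fetch
        op, dest, src1, src2 = instruction       # instruction decode
        if op == "halt":
            break
        a, b = registers[src1], registers[src2]  # operand fetch
        if op == "add":                          # execute
            result = a + b
        elif op == "sub":
            result = a - b
        else:
            raise ValueError(f"unknown opcode: {op}")
        registers[dest] = result                 # result store
        pc += 1                                  # next instruction (no branches here)
    return registers


program = [
    ("add", "r3", "r1", "r2"),
    ("sub", "r4", "r3", "r1"),
    ("halt", None, None, None),
]
final_registers = run(program, {"r1": 5, "r2": 7, "r3": 0, "r4": 0})
```

A real processor overlaps these steps across instructions; the loop above performs them strictly one after another.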

Top 10 80x86 Instructions

Rank  Instruction             Integer average (percent of total executed)
1     load                    22%
2     conditional branch      20%
3     compare                 16%
4     store                   12%
5     add                     8%
6     and                     6%
7     sub                     5%
8     move register-register  4%
9     call                    1%
10    return                  1%
      Total                   96%

Simple instructions dominate instruction frequency

Machine Organization

Basic Processor Architecture

What is in a microprocessor today?

Integer Unit (was 32 bits, going to 64): register file, ALU, logical / shifts, PC unit

Floating Point Unit (64 bits): register file, adder / multiplier / divider

Virtual memory support: TLB

Memory system (split I/D): fast cache memory and associated controller

Block Diagram

[Figure: simplified processor block diagram. Blocks: ICache, ITLB, DTLB, DCache, Integer Unit, Floating Point Unit. Buses: PC Bus, Addr Bus, Data Bus, Inst Bus]

Quite simplified

No external interface

Integer Unit

Core of the machine

Main part is a 32-bit (moving to 64-bit) dataflow

Register file

Holds intermediate results

Almost all machines have at least 32 registers

Some have register windows (SPARC, 2900)

Multi-ported

Need 2 read / 1 write for each instruction

Bypass logic for pipelining

Integer Unit (continued)

Execute unit

Shifter (bits / bytes)

ALU

Integer mult / div

Ld/St interface

Address generation

MDRout, MDRin, Addr registers

Sequences instructions (program counter)

Needs an incrementer and adder

Ports to transfer PC to / from registers

Some registers for holding state

Floating Point Unit

Usually performs IEEE-compatible FP

Has lots of hard stuff in it

Denormals, FP exceptions, rounding modes

Hardware often handles only the common case and traps to software for the rest

Register file

16 to 32 double-precision (64-bit) registers

Adder

Often pipelined

Contains large shifters to align numbers as well as an adder

Multiplier

Built as tree multipliers / pipelined

Sometimes partial trees that iterate

Divider

Either SRT algorithm, or iterative using the multiplier

Virtual Memory

All modern processors use virtual addresses

Internal operations generate a virtual address

Address needs to be mapped to a physical memory location

Mapping is done by the operating system

» Contains protection information too

» Allows OS to move virtual memory to disk

» Allows OS to run multiple programs on same machine

Problems it causes for the hardware

Need to translate address before memory fetch

» Need to store all the translations

» Translation must be fast

Sometimes the requested address is not in memory

Sometimes the requested translation is not where you want it

» Both cause a machine exception that needs to be handled

Memory Translation

Address translation is usually done using a small cache

Store frequently used translations in a Translation Lookaside Buffer (TLB)

Really a translation cache

Usually pretty small, 64 to 1K entries

Stores mapping from virtual page number to physical page number

Pages are 4 KB and getting larger

New TLBs support super-pages (very large pages)

[Figure: TLB organization. A CAM holds the virtual page; a RAM holds the physical page and protection bits]
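A TLB lookup can be sketched in a few lines of Python. This is an illustrative model only: the `TLB` class, the two exception types, the single `writable` protection bit, and the oldest-entry replacement policy are assumptions, not something the slides specify. The 4 KB page size and the virtual page to physical page mapping come from the slide.

```python
PAGE_SIZE = 4096   # 4 KB pages, as on the slide
OFFSET_BITS = 12   # log2(PAGE_SIZE)


class TLBMiss(Exception):
    """No cached translation for the virtual page."""


class ProtectionFault(Exception):
    """The access violates the protection bits of the page."""


class TLB:
    def __init__(self, capacity=64):
        self.capacity = capacity
        self.entries = {}  # virtual page number -> (physical page number, writable)

    def insert(self, vpn, ppn, writable):
        if vpn not in self.entries and len(self.entries) >= self.capacity:
            # Assumed policy: evict the entry that was inserted first.
            del self.entries[next(iter(self.entries))]
        self.entries[vpn] = (ppn, writable)

    def translate(self, vaddr, write=False):
        vpn = vaddr >> OFFSET_BITS          # virtual page number (the CAM key)
        offset = vaddr & (PAGE_SIZE - 1)    # page offset is not translated
        if vpn not in self.entries:
            raise TLBMiss(vpn)
        ppn, writable = self.entries[vpn]   # the RAM payload
        if write and not writable:
            raise ProtectionFault(vpn)
        return (ppn << OFFSET_BITS) | offset
```

In this sketch a miss simply raises an exception, which corresponds to the software-handled option; a hardware reload would instead walk the page table and call `insert` before retrying.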

Memory Translation (continued)

Problem is what happens on a TLB miss?

Take an exception, or use a hardware FSM to reload?

[Figure: TLB organization. A CAM holds the virtual page; a RAM holds the physical page and protection bits]

Memory System

Usually multi-level with caches

Need memory to keep up with processor

Can't use DRAM directly: it is too slow

Use a fast SRAM to hold working set of program

Most accesses go to this fast memory

If data is not present, the cache misses

Fetch data from memory (or larger cache)

First-level cache often integrated on chip

Separate I/D caches for more bandwidth

[Figure: cache line with a tag and four data words (Word0 to Word3). A comparator (Cmp) checks the tag against the physical address to produce the hit signal; a mux selects the data word]
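The tag compare and word select described above can be sketched as a small cache model. The slide shows only a tag plus four words per line; the direct-mapped organization, the eight-line size, the 4-byte word, and the `DirectMappedCache` class are assumptions chosen to keep the example short.

```python
WORDS_PER_LINE = 4   # Word0..Word3, as in the figure
BYTES_PER_WORD = 4   # assumed word size
NUM_LINES = 8        # assumed cache size (direct-mapped)


class DirectMappedCache:
    def __init__(self, memory_words):
        self.memory = memory_words        # backing store: list indexed by word address
        self.lines = [None] * NUM_LINES   # each entry is (tag, [w0, w1, w2, w3]) or None
        self.hits = 0
        self.misses = 0

    def read(self, byte_addr):
        word_addr = byte_addr // BYTES_PER_WORD
        word_in_line = word_addr % WORDS_PER_LINE
        line_addr = word_addr // WORDS_PER_LINE
        index = line_addr % NUM_LINES
        tag = line_addr // NUM_LINES
        entry = self.lines[index]
        if entry is not None and entry[0] == tag:   # Cmp: stored tag matches
            self.hits += 1
        else:                                       # miss: fetch the whole line
            self.misses += 1
            start = line_addr * WORDS_PER_LINE
            entry = (tag, self.memory[start:start + WORDS_PER_LINE])
            self.lines[index] = entry
        return entry[1][word_in_line]               # Mux: pick the requested word


cache = DirectMappedCache(list(range(256)))  # word i holds the value i
values = [cache.read(addr) for addr in (0, 4, 12, 16, 128, 0)]
```

Addresses 0 and 128 map to the same line in this configuration, so the final read of address 0 misses again even though it was cached earlier.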

Machine Performance

Depends on the average time between instruction fetches

Execution time $= N_{inst} \times \mathrm{CPI} \times T_{cycle}$

Indirectly related to how long it takes to complete an instruction

Can start next instruction before previous one is finished

Relation is set by the amount of ILP and pipeline structure

Also depends on the memory system design

What percentage of references hit in the cache

How long it takes when they miss
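As a rough illustration of how these factors combine, the sketch below evaluates the instruction count, CPI, and cycle time product, and folds cache misses into an effective CPI. The function names and all numbers are hypothetical, and the miss-penalty model (base CPI plus memory references per instruction times miss rate times miss penalty) is a common textbook simplification assumed here, not a formula taken from the slides.

```python
def cpu_time(instruction_count, cpi, cycle_time_s):
    """Execution time = N_inst * CPI * T_cycle."""
    return instruction_count * cpi * cycle_time_s


def effective_cpi(base_cpi, mem_refs_per_instr, miss_rate, miss_penalty_cycles):
    """Assumed model: every cache miss stalls the pipeline for the full penalty."""
    return base_cpi + mem_refs_per_instr * miss_rate * miss_penalty_cycles


# Hypothetical machine: 1M instructions, 10 ns cycle, 5% miss rate, 20-cycle penalty.
cpi_with_misses = effective_cpi(1.0, 0.34, 0.05, 20)
time_ideal = cpu_time(1_000_000, 1.0, 10e-9)
time_with_misses = cpu_time(1_000_000, cpi_with_misses, 10e-9)
```

Under these made-up numbers, a modest miss rate adds about a third to the execution time, which is why the hit rate and miss latency matter as much as the pipeline itself.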

Pipelining

[Figure: several tasks, each passing through stages 1 2 3 4, overlapped in time]

A way of exploiting instruction level parallelism

Pipelining (continued)

Time per instruction

TPI = CPI * CPU cycle time

Speedup

$$\text{Speedup} = \frac{\text{TPI}_{\text{without pipeline}}}{\text{TPI}_{\text{with pipeline}}} = \text{Number of pipeline stages}$$

Requires all stages to be perfectly balanced

No latch overhead

Real speedup will be less

Not visible to programmer

That is in the ideal case

Instruction scheduling depends on the pipeline
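The gap between ideal and real speedup can be sketched numerically. The model below is an assumption made for illustration: the unpipelined time per instruction is the sum of the stage delays, and the pipelined cycle time is the slowest stage plus a fixed latch overhead. The `pipeline_speedup` function and the delay values are hypothetical.

```python
def pipeline_speedup(stage_delays, latch_overhead=0.0):
    """Speedup = TPI without pipeline / TPI with pipeline (assumed timing model)."""
    tpi_without = sum(stage_delays)                 # one instruction runs all stages
    tpi_with = max(stage_delays) + latch_overhead   # clock set by the slowest stage
    return tpi_without / tpi_with


ideal = pipeline_speedup([2.0, 2.0, 2.0, 2.0, 2.0])          # balanced, no overhead
realistic = pipeline_speedup([2.0, 2.0, 3.0, 2.0, 2.0], 0.5)  # one slow stage plus latches
```

With five balanced stages and no latch overhead the speedup equals the number of stages; an unbalanced stage or any latch delay pulls it below that bound.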

Pipelined Execution

Ideally we get this:

Instruction      Clock number
                 1    2    3    4    5    6    7    8    9
Instruction i    IF   ID   EX   MEM  WB
Instruction i+1       IF   ID   EX   MEM  WB
Instruction i+2            IF   ID   EX   MEM  WB
Instruction i+3                 IF   ID   EX   MEM  WB
Instruction i+4                      IF   ID   EX   MEM  WB

But in real life there are pipeline hazards:

Structural

» Some resource is not available this cycle

Data

» Data needed has not been produced yet

Control

» Which instruction to execute is not known
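The ideal schedule in the table above can be generated programmatically. This is a minimal sketch assuming one instruction issues per clock and no hazards occur; the helper names `ideal_schedule`, `total_cycles`, and `render` are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def ideal_schedule(num_instructions):
    """Map each instruction index to {clock number: stage}, issuing one per clock."""
    return {
        i: {i + 1 + s: name for s, name in enumerate(STAGES)}
        for i in range(num_instructions)
    }


def total_cycles(num_instructions):
    """Clocks needed to drain the pipeline when there are no stalls."""
    return len(STAGES) + num_instructions - 1


def render(schedule):
    """Format the schedule as one text row per instruction."""
    last_clock = max(max(clocks) for clocks in schedule.values())
    rows = []
    for i, clocks in schedule.items():
        cells = [clocks.get(c, "").ljust(4) for c in range(1, last_clock + 1)]
        rows.append(f"i+{i}".ljust(5) + "".join(cells).rstrip())
    return "\n".join(rows)


schedule = ideal_schedule(5)
chart = render(schedule)
```

Each hazard type listed on the slide would show up in such a chart as a stall: an instruction would repeat or delay a stage, and every later row would shift right by the same number of clocks.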

Modern Processor Architecture

[Figure: R10000, 233 MHz]

Today's Conventional Microprocessors

Instruction sets: CISC, RISC

Advanced memory systems: caches, memory, virtual memory

Advanced instruction level parallelism: pipelining, superscalar, vectors, VLIW

Storage systems (I/O)

Interconnection technology

Basic parallel processing: dual core, quad core

In 10 Years!

[Figure: R10000, 233 MHz]

Using the Silicon

[Figure: alternative ways to use the available silicon, drawn with processing elements (PE) and memory (M). Labels: MPP, More Cache, CISC, MMX, FFT, VIZ, RC5, 64-way Superscalar, Vector, Reconfigurable Processor, Reconfigurable Logic]

Summary

Modern processors have a pipeline architecture

All stages in a pipeline must be balanced

Several resources used for different purposes

Not all instructions take the same time

Clocks per instruction (CPI)

Memory access instructions are among the most frequent (34%: 22% loads plus 12% stores)

Main idea to increase performance is to exploit instruction level parallelism

Performance is a major concern in designing modern processors since more space is available