
Anshul Kumar, CSE IITD

Other Architectures & Examples

Multithreaded architectures

Dataflow architectures

Multiprocessor examples

1st May, 2006

Context switching

• Delays and poor resource utilization due to:
  – data/control hazards
  – cache misses
  – waiting for some event

• Solution: context switch to another thread

• Context switch mechanism:
  – operating system: slow
  – hardware: fast
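
The effect of switch cost on utilization can be sketched with a toy model. Everything here is an assumption made for illustration: each thread runs `run_len` useful cycles and then stalls for `miss_penalty` cycles (for example, a cache miss), and every switch to a different thread costs `switch_cost` idle cycles. The function name and parameters are hypothetical, not taken from the slides.

```python
def utilization(n_threads, run_len, miss_penalty, switch_cost, total_cycles):
    """Fraction of cycles spent on useful work in a toy switch-on-stall model."""
    ready_at = [0] * n_threads   # cycle at which each thread can run again
    now = 0
    useful = 0
    current = None               # thread that ran most recently
    while now < total_cycles:
        # Pick the thread that becomes ready earliest (lowest index on ties).
        t = min(range(n_threads), key=lambda i: ready_at[i])
        if ready_at[t] > now:
            now = ready_at[t]    # nothing is ready: the processor idles
            continue
        if current is not None and t != current:
            now += switch_cost   # pay for the context switch
        current = t
        burst = min(run_len, total_cycles - now)
        if burst <= 0:
            break
        useful += burst
        now += burst
        ready_at[t] = now + miss_penalty   # the thread now waits on its stall
    return useful / total_cycles


if __name__ == "__main__":
    print("1 thread, no switching:", utilization(1, 10, 40, 0, 1000))
    print("5 threads, 1-cycle switch:", utilization(5, 10, 40, 1, 1000))
    print("5 threads, 100-cycle switch:", utilization(5, 10, 40, 100, 1000))
```

In this model a cheap (hardware-like) switch hides most of the stall time, while an expensive (OS-like) switch can cost more than the stall it is meant to hide.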

Multithreaded architecture

• Hardware context switching

• Models
  – control flow or hybrid (control flow, data flow)

• Granularity
  – fine grain or coarse grain

• Memory organization
  – shared? distributed? cache coherent?

• No. of threads
  – small, medium, large

ILP and Multithreading

[Figure: comparison of ILP, coarse MT, fine MT, and SMT. Source: Hennessy and Patterson]

Chip level multithreading

Executing instructions from multiple threads within one processor chip at the same time.

• Multithreading: interleaved issue of multiple instructions from different threads

• Simultaneous multithreading (SMT): issue multiple instructions from multiple threads in one cycle

• Chip-level multiprocessing (CMP or multicore): integrate two or more superscalar processors into one chip, each executing one thread independently

• Any combination of multithreading/SMT/CMP

(Source: Wikipedia)
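
The difference between these issue policies can be sketched by counting filled issue slots. This is a deliberately simplified model with assumed inputs: `ready[c][t]` is the number of instructions thread `t` could issue in cycle `c`, instructions that are not issued do not carry over, and interleaved multithreading is modeled as strict round-robin with one thread per cycle. The function name is hypothetical.

```python
def issue_counts(ready, width):
    """Return instructions issued under three policies (toy model).

    ready[c][t] -- instructions thread t has ready in cycle c (assumed input)
    width       -- issue slots available per cycle
    """
    # Single-threaded superscalar: only thread 0 may use the slots.
    single = sum(min(cycle[0], width) for cycle in ready)
    # Interleaved multithreading: one thread owns all slots each cycle.
    interleaved = sum(
        min(cycle[c % len(cycle)], width) for c, cycle in enumerate(ready)
    )
    # SMT: slots in one cycle may be filled from any mix of threads.
    smt = sum(min(sum(cycle), width) for cycle in ready)
    return single, interleaved, smt


if __name__ == "__main__":
    ready = [[2, 1], [0, 3], [1, 1], [0, 0]]   # 4 cycles, 2 threads
    print(issue_counts(ready, width=4))
```

A CMP would correspond to running the single-threaded count once per core, each with its own issue slots.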

Historical Examples

Machine            Granularity  Procs            Threads/proc        Memory               Year
HEP from Denelcor  fine         max 16           8 active, max 64    shared, centralized  1978
Tera               fine         max 256          128                 distributed shared   1990
Alewife (MIT)      coarse       max 512 Sparcle  1 active, 3 loaded  CC                   1990

Modern examples

• Pentium 4: Hyperthreading

• MIPS MT

• IBM Power 5: dual core, 2 threads each

• UltraSPARC T1: 8 cores with 4 threads each, fine grained multithreading

HEP

[Figure: HEP processor block diagram. Labels: PSW queue, increment control, program memory, matching unit, registers, operand fetch, FU1, FU2, ..., FUn, SFU (to/from data memory), control loop, 8 stage pipeline, scheduler function unit]

Control Flow & Data Flow models

• Control Flow (von Neumann)
  – control flows through a sequence of instructions, branches can alter the flow
  – instructions get data from or put data in memory
  – explicit parallelism through control operators: fork/join

• Data Flow
  – instructions are triggered by availability of data
  – data flows from instruction to instruction
  – explicit parallelism
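
As a small illustration of explicit fork/join parallelism in the control-flow model, the sketch below evaluates R = (A-B)*(B+1), the expression the slides use for the dataflow model. It assumes Python's `concurrent.futures` as a stand-in for fork and join operators; `compute_r` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_r(a, b):
    """Evaluate (a - b) * (b + 1) with the two sub-expressions forked."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        diff = pool.submit(lambda: a - b)    # fork: first sub-expression
        inc = pool.submit(lambda: b + 1)     # fork: second sub-expression
        # join: result() blocks until each forked task has finished
        return diff.result() * inc.result()


if __name__ == "__main__":
    print(compute_r(7, 3))
```

The programmer names the parallel tasks and the join point explicitly; in the dataflow model the same ordering follows from operand availability alone.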

Dataflow Model

[Figure: dataflow graph for R=(A-B)*(B+1). A and B feed a "-" node producing A-B; B and the constant 1 feed a "+" node producing B+1; both results feed a "*" node producing R]
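
The firing rule can be sketched as a tiny interpreter for this graph: a node fires as soon as all of its operands are available. This is a minimal sketch, not a model of any particular machine; the node names `diff`, `inc`, and `one` and the function `run_dataflow` are made up for illustration.

```python
import operator

# node name -> (operation, operand names); names absent from GRAPH are inputs
GRAPH = {
    "diff": (operator.sub, ["A", "B"]),      # A - B
    "inc": (operator.add, ["B", "one"]),     # B + 1
    "R": (operator.mul, ["diff", "inc"]),    # (A - B) * (B + 1)
}


def run_dataflow(graph, inputs):
    """Fire every node whose operands are available, step by step."""
    values = dict(inputs)
    pending = dict(graph)
    steps = []                               # which nodes fired in each step
    while pending:
        ready = [name for name, (_, srcs) in pending.items()
                 if all(s in values for s in srcs)]
        if not ready:
            raise ValueError("some operands never arrive")
        for name in ready:
            op, srcs = pending.pop(name)
            values[name] = op(*(values[s] for s in srcs))
        steps.append(sorted(ready))
    return values, steps


if __name__ == "__main__":
    values, steps = run_dataflow(GRAPH, {"A": 7, "B": 3, "one": 1})
    print(values["R"], steps)
```

The subtraction and the addition become ready in the same step, so they could run in parallel without any fork/join in the program.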

Dataflow Program

[Figure: the graph for R=(A-B)*(B+1) stored as instructions; each instruction names the instruction/operand position that receives its result]

L1: compute B      → L2/2, L3/1
L2: -  (A, B)      → L4/1
L3: +  (B, 1)      → L4/2
L4: *  (A-B, B+1)  → L6/1
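
The destination notation (L4/1 means "operand 1 of instruction L4") can be sketched as a small token-driven interpreter. The assumptions are flagged in the comments: A is taken to arrive at L2/1 (the slide does not label that arc), L1 is replaced by tokens that deliver B directly, and the dictionary layout and the function names are hypothetical. Each instruction is evaluated once.

```python
import operator


def make_program():
    """Instruction templates: operation, operand slots, destinations."""
    return {
        "L2": {"op": operator.sub, "slots": [None, None], "dests": [("L4", 1)]},
        "L3": {"op": operator.add, "slots": [None, 1], "dests": [("L4", 2)]},  # constant 1
        "L4": {"op": operator.mul, "slots": [None, None], "dests": [("L6", 1)]},
    }


def run(program, tokens):
    """Deliver (label, operand position, value) tokens until none remain."""
    queue = list(tokens)
    outputs = {}
    while queue:
        label, pos, value = queue.pop(0)
        if label not in program:             # e.g. L6 is outside this fragment
            outputs[(label, pos)] = value
            continue
        instr = program[label]
        instr["slots"][pos - 1] = value      # positions are 1-based, as on the slide
        if all(v is not None for v in instr["slots"]):
            result = instr["op"](*instr["slots"])
            queue.extend((d, p, result) for d, p in instr["dests"])
    return outputs


if __name__ == "__main__":
    a, b = 7, 3
    # Assumption: A goes to L2/1; B goes to L2/2 and L3/1 as L1 specifies.
    print(run(make_program(), [("L2", 1, a), ("L2", 2, b), ("L3", 1, b)]))
```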

Static Dataflow Architecture

[Figure: block diagram. Labels: activity store, fetch unit, instruction queue, FU1, FU2, ..., FUn, update unit, to/from other PEs]

Tagged-token dataflow architecture

[Figure: block diagram. Labels: token queue, matching unit, matching store, fetch unit, instruction/data memory, FU1, FU2, ..., FUn, form token unit, to/from other PEs]
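
The role of the matching unit and matching store can be sketched for a single two-input instruction: a token waits in the store until a partner with the same tag arrives, then the pair fires. This is a minimal sketch under stated assumptions: tokens are `(tag, port, value)` tuples, the tag identifies one activation (for example, a loop iteration), and each tag receives exactly one token per port. The function name is hypothetical.

```python
import operator


def match_and_fire(tokens, op):
    """Pair tokens by tag and apply op(port0_value, port1_value).

    Returns the results per tag and whatever is left waiting in the store.
    """
    matching_store = {}                       # tag -> (port, value) awaiting a partner
    results = {}
    for tag, port, value in tokens:
        partner = matching_store.pop(tag, None)
        if partner is None:
            matching_store[tag] = (port, value)   # first operand: wait
        else:
            operands = dict([partner, (port, value)])
            results[tag] = op(operands[0], operands[1])
    return results, matching_store


if __name__ == "__main__":
    # Tokens from three activations arrive interleaved; tag 2 stays incomplete.
    tokens = [(0, 0, 7), (1, 0, 10), (1, 1, 4), (0, 1, 3), (2, 0, 5)]
    print(match_and_fire(tokens, operator.sub))
```

Because tokens carry tags, several activations of the same instruction can be in flight at once, which a store with one slot per operand does not allow.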

UMA Examples

• Earlier approach: large number of processors (e.g. Denelcor HEP, NYU Ultracomputer)

• Now realized: good only for a small number of processors (e.g. Encore Multimax in the 1980s, SGI Power Challenge in the 1990s)

SGI Power Challenge

• 18 MIPS R8000 processors

• 16 GB RAM, 8-way interleaved

• 4 POWERchannel-2 I/O buses, each 320 MB/s

• POWERpath-2: split transaction shared bus (256 bit data, 40 bit address)

• Snoopy cache coherence protocol

NUMA Examples

• BBN TC2000

• IBM RP3

• Hector

• Cray T3D

Hector

• Hierarchical structure
  – global ring
  – local rings
  – stations
    – Proc module (P+C+M)
    – I/O module

Hector

[Figure: a global ring connects local rings; each local ring connects stations. Inside a station, proc modules, an I/O module, and a station controller share a station bus]

Cray T3D

• Alpha 21064 processors, Cray Y-MP host

• up to 128 GB memory

• 3D torus: 4x4x4, configurable up to 8x8x8

• 2 PEs in each node
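
The torus figures above can be worked through in a short sketch: each node of a 3D torus has a neighbour in both directions along every axis, with wrap-around at the edges. The function names are hypothetical, and the PE counts simply multiply out the slide's numbers (nodes per dimension times 2 PEs per node).

```python
def torus_neighbors(node, dims):
    """Neighbours of node (x, y, z) in a 3D torus of size dims, with wrap-around."""
    x, y, z = node
    dx, dy, dz = dims
    # A set removes duplicates when a dimension has only 1 or 2 nodes.
    return sorted({
        ((x + 1) % dx, y, z), ((x - 1) % dx, y, z),
        (x, (y + 1) % dy, z), (x, (y - 1) % dy, z),
        (x, y, (z + 1) % dz), (x, y, (z - 1) % dz),
    })


def pe_count(dims, pes_per_node=2):
    """Total PEs for a torus of the given size."""
    dx, dy, dz = dims
    return dx * dy * dz * pes_per_node


if __name__ == "__main__":
    print(torus_neighbors((0, 0, 0), (4, 4, 4)))
    print(pe_count((4, 4, 4)), pe_count((8, 8, 8)))
```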

CC-NUMA examples

Machine              Nodes                          Mem             Cache               Net
Wisconsin Multicube  single proc                    per col bus     snoopy              bus grid
Aquarius Multimulti  single proc                    per node        snoopy + directory  bus grid
Stanford Dash        cluster: 4 R3000 + FPU on bus  per cluster     snoopy + directory  pair of meshes
Stanford Flash       single proc: T5 + magic chip   per node        directory           2D mesh
Convex Exemplar      hyper node: 8 PA-RISC          per hyper node  SCI                 X bar (hyper node), multi rings

Magic chip: memory + I/O + network controller

COMA examples

• DDM (Data Diffusion Machine)
  – single bus (split transaction)
  – can be made hierarchical

• KSR 1
  – hierarchical rings
  – distributed directory is a matrix: rows for pages, columns for caches
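
The matrix view of the directory can be sketched as a table of presence bits. This is only an illustration of the rows-for-pages, columns-for-caches idea: it keeps the whole matrix in one place, whereas the slide describes the KSR 1 directory as distributed, and the class and method names are hypothetical.

```python
class MatrixDirectory:
    """Presence matrix: one row per page, one column per cache."""

    def __init__(self, n_pages, n_caches):
        self.present = [[False] * n_caches for _ in range(n_pages)]

    def record_copy(self, page, cache):
        self.present[page][cache] = True

    def drop_copy(self, page, cache):
        self.present[page][cache] = False

    def holders(self, page):
        """Read along a row: which caches hold this page?"""
        return [c for c, bit in enumerate(self.present[page]) if bit]

    def pages_in(self, cache):
        """Read down a column: which pages does this cache hold?"""
        return [p for p, row in enumerate(self.present) if row[cache]]


if __name__ == "__main__":
    d = MatrixDirectory(n_pages=4, n_caches=3)
    d.record_copy(2, 0)
    d.record_copy(2, 1)
    d.record_copy(3, 1)
    print(d.holders(2), d.pages_in(1))
```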

Distr Mem Arch Examples

Machine            Comp. proc  Comm. proc  Vec. proc  Switch       Topology
nCUBE2             custom      custom      -          -            hypercube
iPSC2              i386        yes         yes        -            hypercube
Intel Paragon      i860        i860        -          custom       2D mesh
Genesis            i870        i870        -          custom       2 level X bar
Manna              i860        i860        -          16x16 X bar  hierarch.
Parsytec           P.PC601     T805        -          C004         3D mesh
Transtech Paramid  i860        T805        -          C004         variable
IBM SP2            Power2      i860        -          custom       fat tree
Meiko CS-2         SPARC       custom      Fujitsu    custom       fat tree
Parsys SN9800      T900        T900        -          C104         hierarch. sw
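
For the hypercube entries in this list, the topology can be sketched with node numbers as bit strings: two nodes are linked when their numbers differ in exactly one bit. This is the generic hypercube rule, not a description of any specific machine's router, and the function names are hypothetical.

```python
def hypercube_neighbors(node, dim):
    """Neighbours of a node in a dim-dimensional hypercube (flip one bit each)."""
    return [node ^ (1 << bit) for bit in range(dim)]


def hops(src, dst):
    """Minimum number of links between two nodes: the count of differing bits."""
    return bin(src ^ dst).count("1")


if __name__ == "__main__":
    print(hypercube_neighbors(5, 3))   # node 101 in a 3-cube
    print(hops(0, 7))
```

Each node has `dim` links, and the longest shortest path is also `dim` hops, which is why the network diameter grows only logarithmically with the number of nodes.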

References

• D. Sima, T. Fountain, P. Kacsuk, "Advanced Computer Architectures: A Design Space Approach", Addison Wesley, 1997.