CS 147 – Parallel Processing

Sophia Soohoo

The use of 2 or more central processing units in a single computer system

The CPUs share the other components of the computer:
• Memory
• Disk
• System bus

Symmetric multiprocessing (SMP)
• More than one processor shares the memory capacity and the data path protocol
• Only one copy of the operating system is used to initiate all the orders executed by the processors involved in the connection
• Each CPU can act independently
• All CPUs can be equal, or some processors can be reserved for particular uses
• Drawback: a bottleneck caused by the bandwidth of the memory bus connecting the various processors, the memory, and the disk arrays
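The bus bottleneck can be pictured with a very simple saturation model: total memory traffic grows with the number of CPUs until it reaches the bandwidth of the shared bus, and then stops growing. The sketch below is only an illustration under that assumption; the function name `smp_throughput`, its parameters, and the numbers in the usage note are hypothetical and are not taken from the slides.

```python
def smp_throughput(num_cpus, per_cpu_demand, bus_bandwidth):
    """Aggregate memory throughput under a simple saturation model.

    Assumption (not from the slides): every CPU asks for the same
    bandwidth, and the shared bus caps the total that can be served.
    All three values use the same unit, for example GB/s.
    """
    if num_cpus < 1:
        raise ValueError("need at least one CPU")
    requested = num_cpus * per_cpu_demand
    # The bus cannot deliver more than its own bandwidth.
    return min(requested, bus_bandwidth)
```

With a bus of 40 units and CPUs that each need 10, two CPUs are served in full, while eight CPUs still receive only 40 units in total, so each one waits on the bus.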

Michael J. Flynn
• Professor at Stanford University
• Received his PhD from Purdue University
• Worked for 10 years in computer organization and design
• Proposed Flynn’s taxonomy in 1966

                 Single instruction   Multiple instruction
Single data      SISD                 MISD
Multiple data    SIMD                 MIMD

Flynn’s taxonomy classifies multiprocessor computer architectures along the 2 independent dimensions of instruction and data. Each dimension is either single or multiple, which gives 4 categories.

• SISD – single instruction, single data
• MISD – multiple instruction, single data
• SIMD – single instruction, multiple data
• MIMD – multiple instruction, multiple data
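As a small illustration of the two dimensions, the category name can be derived from the number of instruction streams and data streams. This is a minimal sketch; `flynn_class` is a hypothetical helper introduced here for illustration and is not part of the original material.

```python
def flynn_class(instruction_streams: int, data_streams: int) -> str:
    """Return the Flynn category for the given stream counts.

    Hypothetical helper: a count of 1 means "single",
    any larger count means "multiple".
    """
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("stream counts must be at least 1")
    instr = "S" if instruction_streams == 1 else "M"
    data = "S" if data_streams == 1 else "M"
    # Category names follow the pattern <instr>I<data>D, e.g. "SIMD".
    return f"{instr}I{data}D"
```

For example, `flynn_class(1, 8)` describes one instruction stream driving eight data streams, which is the SIMD case.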

SISD
• A serial (non-parallel) computer
• Single instruction – only one instruction stream is being acted on by the CPU during any one clock cycle
• Oldest classification
• Uses:
  • Older mainframes
  • Minicomputers
  • Workstations
  • Single-core PCs

SIMD
• A type of parallel computer
• Single instruction – all CPUs execute the same instruction at any given clock cycle
• Multiple data – each CPU can operate on a different data element
• Synchronous (lockstep) execution
• Since only one instruction is processed at a time, it is not necessary for each CPU to fetch and decode the instruction
• Types: processor arrays and vector pipelines
• Uses: computers with GPUs
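The lockstep idea can be sketched in plain Python: one instruction is chosen once and then applied to every data element. This is only a sequential model of the behavior, not real SIMD hardware; the names `simd_step` and `add_vectors` are hypothetical.

```python
def simd_step(instruction, lanes):
    """Apply one instruction to every data element in the same step.

    The instruction is selected once (a single callable) and broadcast
    to all lanes; each lane holds a different data element.
    """
    return [instruction(x) for x in lanes]


def add_vectors(a, b):
    """Element-wise addition expressed as a single broadcast instruction."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Each lane receives one (a[i], b[i]) pair and runs the same add.
    return simd_step(lambda pair: pair[0] + pair[1], list(zip(a, b)))
```

Calling `add_vectors([1, 2, 3], [10, 20, 30])` runs the same add on three lanes and returns `[11, 22, 33]`.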

MISD
• A single data stream is fed into multiple CPUs
• Each CPU operates on the data independently through independent instruction streams
• Advantage – redundancy/failsafe; multiple CPUs perform the same tasks on the same data, which reduces the chance of incorrect results if a single CPU fails
• Disadvantage – expensive
• Uses: rare in practice; systolic arrays are sometimes cited as an example
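The redundancy advantage can be illustrated with a majority vote: several independent instruction streams process the same data, and the result that most of them agree on is accepted. This is a minimal sketch under the assumption that results are compared by simple equality; `misd_vote` and the three summing functions are hypothetical names, and `sum_faulty` deliberately simulates a failed unit.

```python
from collections import Counter


def misd_vote(data, units):
    """Feed the same data to every unit and return the majority result."""
    results = [unit(data) for unit in units]
    winner, votes = Counter(results).most_common(1)[0]
    # Require a strict majority so a single bad unit cannot win.
    if votes <= len(units) // 2:
        raise RuntimeError("no majority among processing units")
    return winner


def sum_builtin(xs):
    return sum(xs)


def sum_loop(xs):
    total = 0
    for x in xs:
        total += x
    return total


def sum_faulty(xs):
    # Simulated hardware fault: the result is off by one.
    return sum(xs) + 1
```

With the units `[sum_builtin, sum_loop, sum_faulty]` and the data `[1, 2, 3]`, two of the three units agree on 6, so the faulty result is outvoted.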

MIMD
• Most common type of parallel computing
• Multiple instruction – every processor may be executing a different instruction stream
• Multiple data – every CPU can work with a different data stream
• Execution can be synchronous or asynchronous
• Examples: supercomputers, SMP multiprocessor computers
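A rough way to picture MIMD is a set of workers that each run a different function on different data. In the sketch below, Python threads stand in for processors; this models independent instruction streams and does not by itself guarantee that the work runs on separate CPUs. The name `run_mimd` is hypothetical.

```python
import threading


def run_mimd(tasks):
    """Run each (function, data) pair on its own thread and collect results.

    tasks is a list of (function, data) pairs; the result list keeps
    the same order as the task list.
    """
    results = [None] * len(tasks)

    def worker(index, func, data):
        # Each worker has its own instruction stream (func) and data stream.
        results[index] = func(data)

    threads = [
        threading.Thread(target=worker, args=(i, func, data))
        for i, (func, data) in enumerate(tasks)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # asynchronous execution, synchronized only at the end
    return results
```

For example, `run_mimd([(sum, [1, 2, 3]), (max, [4, 9, 2]), (sorted, [3, 1, 2])])` runs three unrelated operations on three unrelated inputs.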

The model is divided into 3 main types of memory architecture:
• Shared memory
• Distributed memory
• Distributed shared memory

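Flynn-Johnson classification

Flynn’s categories (instruction streams by data streams):
• SISD (single instruction stream, single data stream) – uniprocessors
• SIMD (single instruction stream, multiple data streams) – array or vector processors
• MISD (multiple instruction streams, single data stream) – rarely used
• MIMD (multiple instruction streams, multiple data streams) – multiprocessors or multicomputers

Johnson’s expansion of MIMD (memory by communication):
• GMSV (global memory, shared variables) – shared-memory multiprocessors
• GMMP (global memory, message passing) – rarely used
• DMSV (distributed memory, shared variables) – distributed shared memory
• DMMP (distributed memory, message passing) – distributed-memory multicomputers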

Shared memory
• All processors can access all memory as a global address space
• Multiple CPUs can operate independently but share the same memory resources
• Changes to a memory location made by one CPU are visible to all other CPUs
• Divided into 2 main classes:
  • UMA
  • NUMA

UMA (Uniform Memory Access)
• All CPUs share the physical memory uniformly
• Access time is independent of which CPU makes the request or which memory chip contains the transferred data
• Each CPU has a private cache
• Identical processors
• Cache coherent – if one processor updates a location in shared memory, all other processors know about the update

In the UMA memory architecture, all processors access shared memory through a bus (or another type of interconnect)
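Cache coherence can be sketched with a toy model in which each CPU has a private cache and every write invalidates the copies held by the other CPUs. The slides do not say which coherence protocol is used; write-through with write-invalidate is an assumption chosen here for simplicity, and the class `CoherentMemory` is hypothetical.

```python
class CoherentMemory:
    """Shared memory with per-CPU caches kept coherent by write-invalidate.

    Assumed protocol (not from the slides): writes go straight through
    to shared memory and remove stale copies from all other caches.
    """

    def __init__(self, num_cpus):
        self.memory = {}                                 # shared memory
        self.caches = [dict() for _ in range(num_cpus)]  # private caches

    def read(self, cpu, address):
        cache = self.caches[cpu]
        if address not in cache:
            # Cache miss: fetch the current value from shared memory.
            cache[address] = self.memory.get(address, 0)
        return cache[address]

    def write(self, cpu, address, value):
        self.memory[address] = value
        for other, cache in enumerate(self.caches):
            if other != cpu:
                # Invalidate stale copies so other CPUs re-fetch.
                cache.pop(address, None)
        self.caches[cpu][address] = value
```

After CPU 0 writes to an address, a later read by CPU 1 misses in its cache and picks up the new value from shared memory, which is the "all other processors know about the update" property.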

NUMA (Non-Uniform Memory Access)
• Used in multiprocessors
• Provides separate memory for each CPU, avoiding the performance hit when several CPUs attempt to address the same memory
• When data is mostly local to the CPU that uses it, can provide a performance benefit over a single shared memory by a factor of roughly the number of processors
• Memory access time depends on the memory location relative to the processor
• A processor can access its own local memory faster than non-local memory

Advantages of shared memory
• The global address space provides a user-friendly programming perspective to memory
• Data sharing between tasks is fast and uniform due to the proximity of memory to CPUs

Disadvantages of shared memory
• Lack of scalability between memory and CPUs: adding more CPUs increases the traffic on the path between shared memory and the CPUs
• The programmer is responsible for synchronization constructs that ensure “correct” access to global memory
• Shared memory machines are expensive to design and produce
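The synchronization responsibility mentioned above can be illustrated with a lock that protects a shared counter. This is a minimal sketch using Python's standard `threading.Lock`; the names `SharedCounter` and `count_in_parallel` are hypothetical.

```python
import threading


class SharedCounter:
    """A counter in shared memory, protected by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one thread at a time may perform the read-modify-write.
        with self._lock:
            self.value += 1


def count_in_parallel(num_threads, increments_per_thread):
    """Have several threads increment the same counter and return the total."""
    counter = SharedCounter()

    def worker():
        for _ in range(increments_per_thread):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

Because every increment happens while the lock is held, `count_in_parallel(4, 1000)` returns 4000. Without the lock, concurrent read-modify-write sequences could overwrite each other and lose updates.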

Memory access time varies with the location of the data to be accessed. If data resides in local memory, access is fast. If data resides in remote memory, access is slower. The advantage of the NUMA architecture as a hierarchical shared memory scheme is its potential to improve average case access time through the introduction of fast, local memory.
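The effect on average access time can be written as a weighted average of the local and remote access times. The sketch below assumes only two latency levels; the function name, the parameters, and the latency figures in the usage note are hypothetical and chosen only to show the arithmetic.

```python
def numa_average_access_time(local_fraction, local_time_ns, remote_time_ns):
    """Average access time as a weighted mix of local and remote accesses.

    local_fraction is the share of accesses served from local memory
    (between 0.0 and 1.0); the rest go to remote memory.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    remote_fraction = 1.0 - local_fraction
    return local_fraction * local_time_ns + remote_fraction * remote_time_ns
```

With an assumed 100 ns local and 300 ns remote access time, serving 90% of accesses locally gives an average of about 120 ns, while serving only 50% locally gives about 200 ns.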

Distributed memory
• Requires a communication network to connect inter-processor memory
• CPUs have their own local memory
• Memory in one CPU does not map to another – each processor sees only its own memory
• No concept of a global address space
• When a processor needs to access data in another CPU, the programmer must define how and when the data is communicated
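Explicit communication can be sketched with nodes that compute on private data and exchange results only through messages. In the sketch below, Python threads and a `queue.Queue` stand in for separate machines and a network; this is a simplification, since threads really share an address space, and a real distributed-memory program would typically use a message-passing library instead. The names `node_sum` and `distributed_sum` are hypothetical.

```python
import queue
import threading


def node_sum(private_data, outbox):
    """A node works on its private memory and sends the result as a message."""
    # The node never touches another node's data; it only sends a message.
    outbox.put(sum(private_data))


def distributed_sum(chunks):
    """Sum all chunks by giving each node one chunk and collecting messages."""
    inbox = queue.Queue()  # stands in for the communication network
    nodes = [
        threading.Thread(target=node_sum, args=(chunk, inbox))
        for chunk in chunks
    ]
    for n in nodes:
        n.start()
    for n in nodes:
        n.join()
    # The coordinator receives exactly one message per node.
    return sum(inbox.get() for _ in chunks)
```

Here the programmer decides what is sent (a partial sum) and when (after each node finishes), which is the responsibility the last bullet describes.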

Distributed shared memory
• Combination of both shared and distributed memory
• The shared memory component is usually a cache-coherent SMP machine
• The distributed memory component is the networking of multiple SMPs
• Network communication is required to move data from one SMP to another

