Yulia Newton, CS 147, Fall 2009, SJSU

What is it?
“Parallel processing is the ability of an entity to carry out multiple operations or tasks simultaneously. The term is used in the contexts of both human cognition and machine computation.” - Wikipedia
Humans
The human body is a parallel processing architecture
The human body is the most complex machine known to man
We perform parallel processing all the time
Why?
We want to do MORE
We want to do it FASTER
Limitations of a Single Processor
Physical
Heat
Electromagnetic interference
Economic
Cost
Parallel Computing
Divide bigger tasks into smaller tasks
Reminds you of the divide-and-conquer concept?
Perform the calculations of those tasks simultaneously (in parallel)
With a Single CPU
Parallel processing is possible even with single-CPU, single-core computers
Requires very sophisticated software, called distributed processing software
Most parallel computing is done with multiple CPUs
Speedup from Parallelization
More processors = faster computing? Speed-up from parallelization is not linear
Ideally: one processor can do a single task in a unit of time
Two processors can do twice as many tasks in the same amount of time
Four processors can do four times as many tasks in the same amount of time
Speedup
Definition: speedup is how much faster a parallel algorithm is than a corresponding sequential algorithm.
Gene Myron Amdahl
Norwegian-American computer architect and high-tech entrepreneur
Amdahl's Law
Formulated in the 1960s
Gives the potential speed-up of a parallel computing system
The small portion of components/elements that cannot be parallelized limits the speed-up possible from the parallelizable components/elements
Amdahl's Law (cont’d)
S = 1 / ((1 - P) + P/N)
where:
S is the overall speed-up of the system
P is the fraction that is parallelizable
N is the number of processors
Amdahl's Law Alternative Form
S = 1 / ((1 - f) + f/s)
where:
f - fraction of the program that is enhanced
s - speedup of the enhanced portion
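Both forms of the law can be checked numerically. A minimal sketch in Python (the function name and the sample values of P and N are illustrative, not from the slides):

```python
def amdahl_speedup(p, n):
    """Overall speedup S when a fraction p of the program is
    parallelizable and runs on n processors; the remaining
    (1 - p) fraction stays serial."""
    return 1.0 / ((1.0 - p) + p / n)

# With 90% parallelizable code, extra processors help less and less:
print(amdahl_speedup(0.9, 10))    # about 5.26
print(amdahl_speedup(0.9, 1000))  # about 9.91; can never reach 10
```

The serial fraction (1 - P) bounds the speedup at 1/(1 - P) no matter how many processors are added.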
Amdahl's Law Example
The program has four parts:
Code 1 - Part 1 = 11%
Loop 1 - Part 2 = 18%
Loop 2 - Part 3 = 23%
Code 2 - Part 4 = 48%
Amdahl's Law Example (cont’d)
Let’s say:
P1 is not sped up, so S1 = 1 or 100%
P2 is sped up 5×, so S2 = 500%
P3 is sped up 20×, so S3 = 2000%
P4 is sped up 1.6×, so S4 = 160%
Amdahl's Law Example (cont’d)
Using P1/S1 + P2/S2 + P3/S3 + P4/S4:
0.11/1 + 0.18/5 + 0.23/20 + 0.48/1.6 = 0.11 + 0.036 + 0.0115 + 0.30 = 0.4575
Amdahl's Law Example (cont’d)
0.4575 is a little less than half of the original running time, which is 1
The overall speed boost is 1 / 0.4575 = 2.186, or a little more than double the original speed
Notice how the 20× and 5× speedups don't have much effect on the overall speed boost and running time when 11% is not sped up at all and 48% is sped up by only 1.6×
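The arithmetic of this worked example can be reproduced directly. A small sketch (the part/speedup pairs are the ones given in the slides):

```python
# (fraction of original running time, speedup applied to that part)
parts = [(0.11, 1.0), (0.18, 5.0), (0.23, 20.0), (0.48, 1.6)]

new_time = sum(p / s for p, s in parts)  # fraction of original time left
overall = 1.0 / new_time                 # overall speed boost

print(new_time)  # about 0.4575
print(overall)   # about 2.186
```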
Amdahl's Law Limitations
Based on a fixed workload or fixed problem size (strong or hard scaling)
Implies that machine size is irrelevant
John L. Gustafson
American computer scientist and businessman
Gustafson's Law
Formulated in 1988
Closely related to Amdahl’s Law
Addresses the shortcoming of Amdahl's Law, which cannot scale to match the availability of computing power as the machine size increases
Gustafson's Law (cont’d)
S(P) = P - α(P - 1)
where P is the number of processors, S is the speedup, and α is the non-parallelizable fraction of the process
Gustafson's Law (cont’d)
Proposes a fixed-time concept, which leads to scaled speed-up for larger problem sizes
Uses weak or soft scaling
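Plugging the formula into code shows the contrast with Amdahl's fixed-workload picture. A sketch (the processor counts and α value below are illustrative, not from the slides):

```python
def gustafson_speedup(p, alpha):
    """Scaled speedup on p processors when a fraction alpha of the
    (scaled) workload is non-parallelizable: S(P) = P - alpha*(P - 1)."""
    return p - alpha * (p - 1)

# With only 5% serial work, scaling stays near-linear as P grows:
print(gustafson_speedup(64, 0.05))    # about 60.85
print(gustafson_speedup(1024, 0.05))  # about 972.85
```

Because the problem size grows with the machine, the speedup keeps growing with P instead of saturating at 1/α.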
Gustafson's Law Limitations
Some problems do not have fundamentally larger datasets
Example: a dataset with one data point per world citizen grows at only a few percent per year
Nonlinear algorithms may make it hard to take advantage of the parallelism "exposed" by Gustafson's law
Parallel Architectures
Superscalar
VLIW
Vector Processors
Interconnection Networks
Shared Memory Multiprocessors
Distributed Computing
Dataflow Computing
Neural Networks
Systolic Arrays
Superscalar Architecture
Implements instruction-level parallelism (multiple instructions are executed simultaneously in each cycle)
The net effect is the same as pipelining
Additional hardware is required
Contains a specialized instruction fetch unit (retrieves multiple instructions simultaneously from memory)
Relies on both hardware and the compiler
Superscalar Architecture
Analogous to adding a lane to a highway: more physical resources are required, but in the end more cars can pass through
Superscalar Architecture
Examples:
Pentium x86 processors
Intel i960CA
AMD 29000-series
VLIW Architecture
Stands for Very Long Instruction Word
Similar to the superscalar architecture
Relies entirely on the compiler
Packs independent instructions into one long instruction
Produces a bigger compiled code size
VLIW Architecture
Examples:
Intel’s Itanium (IA-64)
Floating Point Systems' FPS164
Vector Processors
Many supercomputers are vector processor systems
Specialized, heavily pipelined processors
Efficient operations on entire vectors and matrices
Vector Processors
Heavily pipelined so that arithmetic operations can be overlapped
Vector Processors (example)
Instructions on a traditional processor:
for i = 0 to VectorLength
    V3[i] = V1[i] + V2[i]
Instructions on a vector processor:
LDV V1, R1
LDV V2, R2
ADDV R3, R1, R2
STV R3, V3
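The contrast between the two instruction streams can be sketched in Python (the vector values are illustrative; the comprehension merely simulates the effect of the single whole-vector ADDV, not the hardware itself):

```python
v1 = [1.0, 2.0, 3.0, 4.0]
v2 = [10.0, 20.0, 30.0, 40.0]

# Traditional scalar processor: one add issued per loop iteration.
v3 = [0.0] * len(v1)
for i in range(len(v1)):
    v3[i] = v1[i] + v2[i]

# Vector processor: conceptually a single whole-vector operation,
# with the element-wise adds overlapped in the pipeline.
v3_vec = [a + b for a, b in zip(v1, v2)]

print(v3 == v3_vec)  # True: same result, fewer issued instructions
```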
Vector Processors
Applications:
Weather forecasting
Medical diagnosis systems
Image processing
Vector Processors
Examples:
Cray series of supercomputers
Xtrillion 3.0
Interconnection Networks
MIMD (multiple instruction stream, multiple data stream) type architecture
Each processor has its own memory
Processors are allowed to access other processors’ memory via the network
Interconnection Networks
Can be:
Dynamic - allow paths between two components to change from one communication to another
Static - do not allow a change of path between communications
Interconnection Networks
Can be:
Non-blocking - allow new connections in the presence of other simultaneous connections
Blocking - do not allow simultaneous connections
Interconnection Networks
Sample interconnection networks:
Shared Memory Architecture
MIMD (multiple instruction stream, multiple data stream) type architecture
A single memory pool is accessed by all processors
Can be categorized by how the memory access is performed:
UMA (Uniform Memory Access) - all processors have equal access to the entire memory
NUMA (Non-Uniform Memory Access) - each processor has access to its own piece of memory
Shared Memory Architecture
Diagram of a Shared Memory system:
Distributed Computing
Very loosely coupled multicomputer system, connected by buses or through a network
The main advantage is cost
The main disadvantage is slow communication due to the physical distance between computing units
Distributed Computing
Example:
The SETI (Search for Extra-Terrestrial Intelligence) group at UC Berkeley analyzes data from radio telescopes
To help this project, PC users can install the SETI screen saver on their home computers
Data is analyzed during the processor’s idle time
Half a million years of CPU time accumulated in about 18 months!
Data Flow Computing
Von Neumann machines exhibit sequential control flow; data and instructions are segregated
In dataflow computing, control of the program is directly tied to the data
The order of the instructions does not matter
Each instruction is considered a separate process
Tokens are executed rather than instructions
Data Flow Computing
Data flow graph for y = (c + 3) * f:
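A token-firing sketch of this graph in Python (the node representation and the input values are hypothetical): each node may fire as soon as all of its input tokens are present, so the textual order of the nodes is irrelevant.

```python
def fire(node, tokens):
    """Fire a dataflow node if all of its input tokens have arrived."""
    op, inputs = node
    if all(name in tokens for name in inputs):
        return op(*(tokens[name] for name in inputs))
    return None  # not ready: some input token is still missing

# Initial tokens: c and f arrive as data, 3 is a constant token.
tokens = {"c": 4, "f": 5, "three": 3}

add_node = (lambda a, b: a + b, ("c", "three"))  # computes c + 3
mul_node = (lambda a, b: a * b, ("t1", "f"))     # computes (c + 3) * f

tokens["t1"] = fire(add_node, tokens)  # token t1 = c + 3
y = fire(mul_node, tokens)             # fires only once t1 exists
```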
Data Flow Computing
Data flow system diagram:
Neural Networks
Good for massively parallel applications, fault tolerance, and adapting to changing circumstances
Based on the parallel architecture of the human brain
Composed of a large number of simple processing elements
Trainable
Neural Networks
Model the human brain
The perceptron is the simplest example
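A perceptron reduces to a weighted sum followed by a threshold. A minimal sketch (the AND-gate weights below are hand-picked for illustration, not from the slides):

```python
def perceptron(inputs, weights, bias):
    """Simplest neural processing element: a weighted sum of the
    inputs plus a bias, passed through a step activation."""
    s = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if s > 0 else 0

# With weights (1, 1) and bias -1.5, the perceptron computes logical AND:
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, perceptron((x1, x2), (1.0, 1.0), -1.5))
```

Each such element is trivially simple; the computational power comes from running many of them in parallel, as in the brain.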
Neural Networks
Applications:
Quality control
Financial and economic forecasting
Oil and gas exploration
Speech and pattern recognition
Health care cost reduction
Bankruptcy prediction
Systolic Arrays
Networks of processing elements that rhythmically compute data by circulating it through the system
A variation of the SIMD architecture
Systolic Arrays
Chain of processors in a systolic array system:
To Recap
We completed an overview of the following parallel processing architectures:
Superscalar
VLIW
Vector Processors
Interconnection Networks
Shared Memory Multiprocessors
Distributed Computing
Dataflow Computing
Neural Networks
Systolic Arrays
References and Sources
Null, Linda and Julia Lobur. “Computer Organization and Architecture”
http://en.wikipedia.org
http://www.cs.berkeley.edu/~culler/cs258-s99
http://www.cs.iastate.edu/~prabhu
http://www.answers.com
http://www.gigaflop.demon.co.uk
Questions?
Thank you!