Yulia Newton, CS 147, Fall 2009, SJSU

What is it?
“Parallel processing is the ability of an entity to carry out multiple operations or tasks simultaneously. The term is used in the contexts of both human cognition and machine computation.” - Wikipedia
Humans
The human body is a parallel processing architecture
The human body is the most complex machine known to man
We perform parallel processing all the time
Why?
We want to do MORE
We want to do it FASTER
Limitations of a Single Processor
Physical
Heat
Electromagnetic interference
Economic
Cost
Parallel Computing
Divide bigger tasks into smaller tasks
Reminds you of the divide-and-conquer concept?
Perform the calculations of those tasks simultaneously (in parallel)
With a Single CPU
Parallel processing is possible even with single-CPU, single-core computers
Requires very sophisticated software, called distributed processing software
Most parallel computing is done with multiple CPUs
Speedup from Parallelization
More processors = faster computing? Speed-up from parallelization is not linear
Ideally: one processor can do a single task in a unit of time
Two processors can do twice as many tasks in the same amount of time
Four processors can do four times as many tasks in the same amount of time
Speedup
Definition: speedup is how much faster a parallel algorithm is than a corresponding sequential algorithm.
Gene Myron Amdahl
Norwegian-American computer architect and high-tech entrepreneur
Amdahl's Law
Formulated in the 1960s
Gives the potential speed-up of a parallel computing system
The small portion of components/elements that cannot be parallelized limits the speed-up possible from the parallelizable components/elements
Amdahl's Law (cont’d)
S = 1 / ((1 - P) + P/N)
where:
S is the overall speed-up of the system
P is the fraction that is parallelizable
N is the number of processors
Amdahl's Law Alternative Form
S = 1 / ((1 - f) + f/s)
where:
f - fraction of the program that is enhanced
s - speedup of the enhanced portion
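Both forms of the law can be checked numerically. A minimal sketch in Python (the function name and the sample values of P and N are illustrative, not from the slides):

```python
def amdahl_speedup(p, n):
    """Overall speedup S when a fraction p of the program is
    parallelizable and runs on n processors; the remaining
    (1 - p) fraction stays serial."""
    return 1.0 / ((1.0 - p) + p / n)

# With 90% parallelizable code, extra processors help less and less:
print(amdahl_speedup(0.9, 10))    # about 5.26
print(amdahl_speedup(0.9, 1000))  # about 9.91; can never reach 10
```

The serial fraction (1 - P) bounds the speedup at 1/(1 - P) no matter how many processors are added.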
Amdahl's Law Example
The program has four parts:
Code 1 - Part 1 = 11%
Loop 1 - Part 2 = 18%
Loop 2 - Part 3 = 23%
Code 2 - Part 4 = 48%
Amdahl's Law Example (cont’d)
Let’s say:
P1 is not sped up, so S1 = 1 or 100%
P2 is sped up 5×, so S2 = 500%
P3 is sped up 20×, so S3 = 2000%
P4 is sped up 1.6×, so S4 = 160%
Amdahl's Law Example (cont’d)
Using P1/S1 + P2/S2 + P3/S3 + P4/S4:
0.11/1 + 0.18/5 + 0.23/20 + 0.48/1.6 = 0.11 + 0.036 + 0.0115 + 0.30 = 0.4575
Amdahl's Law Example (cont’d)
0.4575 is a little less than half of the original running time, which is 1
The overall speed boost is 1 / 0.4575 = 2.186, or a little more than double the original speed
Notice how the 20× and 5× speedups don't have much effect on the overall speed boost and running time when 11% is not sped up at all and 48% is sped up by only 1.6×
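The arithmetic of this worked example can be reproduced directly. A small sketch (the part/speedup pairs are the ones given in the slides):

```python
# (fraction of original running time, speedup applied to that part)
parts = [(0.11, 1.0), (0.18, 5.0), (0.23, 20.0), (0.48, 1.6)]

new_time = sum(p / s for p, s in parts)  # fraction of original time left
overall = 1.0 / new_time                 # overall speed boost

print(new_time)  # about 0.4575
print(overall)   # about 2.186
```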
Amdahl's Law Limitations
Based on a fixed workload or fixed problem size (strong or hard scaling)
Implies that machine size is irrelevant
John L. Gustafson
American computer scientist and businessman
Gustafson's Law
Formulated in 1988
Closely related to Amdahl’s Law
Addresses the shortcoming of Amdahl's Law, which cannot scale to match the availability of computing power as the machine size increases
Gustafson's Law (cont’d)
S(P) = P - α(P - 1)
where P is the number of processors, S is the speedup, and α is the non-parallelizable fraction of the process
Gustafson's Law (cont’d)
Proposes a fixed-time concept, which leads to scaled speed-up for larger problem sizes
Uses weak or soft scaling
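Plugging the formula into code shows the contrast with Amdahl's fixed-workload picture. A sketch (the processor counts and α value below are illustrative, not from the slides):

```python
def gustafson_speedup(p, alpha):
    """Scaled speedup on p processors when a fraction alpha of the
    (scaled) workload is non-parallelizable: S(P) = P - alpha*(P - 1)."""
    return p - alpha * (p - 1)

# With only 5% serial work, scaling stays near-linear as P grows:
print(gustafson_speedup(64, 0.05))    # about 60.85
print(gustafson_speedup(1024, 0.05))  # about 972.85
```

Because the problem size grows with the machine, the speedup keeps growing with P instead of saturating at 1/α.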
Gustafson's Law Limitations
Some problems do not have fundamentally larger datasets
Example: a dataset with one data point per world citizen grows at only a few percent per year
Nonlinear algorithms may make it hard to take advantage of the parallelism "exposed" by Gustafson's law
Parallel Architectures
Superscalar
VLIW
Vector Processors
Interconnection Networks
Shared Memory Multiprocessors
Distributed Computing
Dataflow Computing
Neural Networks
Systolic Arrays
Superscalar Architecture
Implements instruction-level parallelism (multiple instructions are executed simultaneously in each cycle)
The net effect is the same as pipelining
Additional hardware is required
Contains a specialized instruction fetch unit (retrieves multiple instructions simultaneously from memory)
Relies on both hardware and the compiler
Superscalar Architecture
Analogous to adding a lane to a highway: more physical resources are required, but in the end more cars can pass through
Superscalar Architecture
Examples:
Pentium x86 processors
Intel i960CA
AMD 29000-series
VLIW Architecture
Stands for Very Long Instruction Word
Similar to the superscalar architecture
Relies entirely on the compiler
Packs independent instructions into one long instruction
Produces a bigger compiled code size
VLIW Architecture
Examples:
Intel’s Itanium (IA-64)
Floating Point Systems' FPS164
Vector Processors
Many supercomputers are vector processor systems
Specialized, heavily pipelined processors
Efficient operations on entire vectors and matrices
Vector Processors
Heavily pipelined so that arithmetic operations can be overlapped
Vector Processors (example)
Instructions on a traditional processor:
for i = 0 to VectorLength
    V3[i] = V1[i] + V2[i]
Instructions on a vector processor:
LDV V1, R1
LDV V2, R2
ADDV R3, R1, R2
STV R3, V3
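The contrast between the two instruction streams can be sketched in Python (the vector values are illustrative; the comprehension merely simulates the effect of the single whole-vector ADDV, not the hardware itself):

```python
v1 = [1.0, 2.0, 3.0, 4.0]
v2 = [10.0, 20.0, 30.0, 40.0]

# Traditional scalar processor: one add issued per loop iteration.
v3 = [0.0] * len(v1)
for i in range(len(v1)):
    v3[i] = v1[i] + v2[i]

# Vector processor: conceptually a single whole-vector operation,
# with the element-wise adds overlapped in the pipeline.
v3_vec = [a + b for a, b in zip(v1, v2)]

print(v3 == v3_vec)  # True: same result, fewer issued instructions
```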
Vector Processors
Applications:
Weather forecasting
Medical diagnosis systems
Image processing
Vector Processors
Examples:
Cray series of supercomputers
Xtrillion 3.0
Interconnection Networks
MIMD (multiple instruction stream, multiple data stream) type architecture
Each processor has its own memory
Processors are allowed to access other processors’ memory via the network
Interconnection Networks
Can be:
Dynamic - allow paths between two components to change from one communication to another
Static - do not allow a change of path between communications
Interconnection Networks
Can be:
Non-blocking - allow new connections in the presence of other simultaneous connections
Blocking - do not allow simultaneous connections
Interconnection Networks
Sample interconnection networks:
Shared Memory Architecture
MIMD (multiple instruction stream, multiple data stream) type architecture
A single memory pool is accessed by all processors
Can be categorized by how the memory access is performed:
UMA (Uniform Memory Access) - all processors have equal access to the entire memory
NUMA (Non-Uniform Memory Access) - each processor has access to its own piece of memory
Shared Memory Architecture
Diagram of a Shared Memory system:
Distributed Computing
Very loosely coupled multicomputer system, connected by buses or through a network
The main advantage is cost
The main disadvantage is slow communication due to the physical distance between computing units
Distributed Computing
Example:
The SETI (Search for Extra-Terrestrial Intelligence) group at UC Berkeley analyzes data from radio telescopes
To help this project, PC users can install the SETI screen saver on their home computers
Data is analyzed during the processor’s idle time
Half a million years of CPU time accumulated in about 18 months!
Data Flow Computing
Von Neumann machines exhibit sequential control flow; data and instructions are segregated
In dataflow computing, control of the program is directly tied to the data
The order of the instructions does not matter
Each instruction is considered a separate process
Tokens are executed rather than instructions
Data Flow Computing
Data flow graph for y = (c + 3) * f:
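A token-firing sketch of this graph in Python (the node representation and the input values are hypothetical): each node may fire as soon as all of its input tokens are present, so the textual order of the nodes is irrelevant.

```python
def fire(node, tokens):
    """Fire a dataflow node if all of its input tokens have arrived."""
    op, inputs = node
    if all(name in tokens for name in inputs):
        return op(*(tokens[name] for name in inputs))
    return None  # not ready: some input token is still missing

# Initial tokens: c and f arrive as data, 3 is a constant token.
tokens = {"c": 4, "f": 5, "three": 3}

add_node = (lambda a, b: a + b, ("c", "three"))  # computes c + 3
mul_node = (lambda a, b: a * b, ("t1", "f"))     # computes (c + 3) * f

tokens["t1"] = fire(add_node, tokens)  # token t1 = c + 3
y = fire(mul_node, tokens)             # fires only once t1 exists
```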
Data Flow Computing
Data flow system diagram:
Neural Networks
Good for massively parallel applications, fault tolerance, and adapting to changing circumstances
Based on the parallel architecture of the human brain
Composed of a large number of simple processing elements
Trainable
Neural Networks
Model the human brain
The perceptron is the simplest example
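A perceptron reduces to a weighted sum followed by a threshold. A minimal sketch (the AND-gate weights below are hand-picked for illustration, not from the slides):

```python
def perceptron(inputs, weights, bias):
    """Simplest neural processing element: a weighted sum of the
    inputs plus a bias, passed through a step activation."""
    s = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if s > 0 else 0

# With weights (1, 1) and bias -1.5, the perceptron computes logical AND:
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, perceptron((x1, x2), (1.0, 1.0), -1.5))
```

Each such element is trivially simple; the computational power comes from running many of them in parallel, as in the brain.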
Neural Networks
Applications:
Quality control
Financial and economic forecasting
Oil and gas exploration
Speech and pattern recognition
Health care cost reduction
Bankruptcy prediction
Systolic Arrays
Networks of processing elements that rhythmically compute data by circulating it through the system
A variation of the SIMD architecture
Systolic Arrays
Chain of processors in a systolic array system:
To Recap
We completed an overview of the following parallel processing architectures:
Superscalar
VLIW
Vector Processors
Interconnection Networks
Shared Memory Multiprocessors
Distributed Computing
Dataflow Computing
Neural Networks
Systolic Arrays
References and Sources
Null, Linda and Julia Lobur. “Computer Organization and Architecture”
http://en.wikipedia.org
http://www.cs.berkeley.edu/~culler/cs258-s99
http://www.cs.iastate.edu/~prabhu
http://www.answers.com
http://www.gigaflop.demon.co.uk
Questions?
Thank you!