15 parallel processing
TRANSCRIPT
-
8/17/2019 15 Parallel Processing
Chapter 17
Parallel Processing
-
Computer Organizations
-
Multiple Processor Organization
• Single instruction, single data stream – SISD
• Single instruction, multiple data stream – SIMD
• Multiple instruction, single data stream – MISD
• Multiple instruction, multiple data stream – MIMD
-
Single Instruction, Single Data Stream - SISD
• Single processor
• Single instruction stream
• Data stored in single memory
-
Single Instruction, Multiple Data Stream - SIMD
• Single machine instruction
  – Each instruction executed on different set of data by different processors
• Number of processing elements
  – Machine controls simultaneous execution
    – Lockstep basis
  – Each processing element has associated data memory
• Application: Vector and array processing
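The lockstep idea above can be sketched in a few lines of Python. This is only an illustration: the function name `simd_execute` and the list-of-lanes representation are assumptions made for the example, not part of the slides, and a real SIMD machine does this in hardware rather than in a loop.

```python
import operator


def simd_execute(instruction, lanes_a, lanes_b):
    """Apply ONE instruction to every processing element's own data.

    Hypothetical model: lanes_a[i] and lanes_b[i] stand for the operands
    held in the data memory of processing element i.
    """
    # Every processing element runs the same instruction in the same step,
    # but each one sees only its own pair of operands.
    return [instruction(a, b) for a, b in zip(lanes_a, lanes_b)]


# One "add" instruction, four processing elements, four different data items.
result = simd_execute(operator.add, [1, 2, 3, 4], [10, 20, 30, 40])
print(result)
```

Swapping `operator.add` for `operator.mul` changes the single instruction stream while the per-element data stays where it is.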
-
Multiple Instruction, Single Data Stream - MISD
• Sequence of data
• Transmitted to set of processors
• Each processor executes different instruction sequence
• Not clear if it has ever been implemented
-
Multiple Instruction, Multiple Data Stream - MIMD
• Set of processors
• Simultaneously execute different instruction sequences
• Different sets of data
• Examples: SMPs, NUMA systems, and Clusters
-
Taxonomy of Parallel Processor Architectures
-
Block Diagram of Tightly Coupled Multiprocessor
• Processors share memory
• Communicate via that shared memory
-
Symmetric Multiprocessor Organization
-
Symmetric Multiprocessors
• A stand-alone computer with the following characteristics
  – Two or more similar processors of comparable capacity
  – Processors share same memory and I/O
  – Processors are connected by a bus or other internal connection
  – Memory access time is approximately the same for each processor
  – All processors share access to I/O
    – Either through same channels or different channels giving paths to same devices
  – All processors can perform the same functions (hence symmetric)
  – System controlled by integrated operating system
    – Providing interaction between processors
    – Interaction at job, task, file and data element levels
-
IBM z990 Multiprocessor Structure
-
SMP Advantages
• Performance
  – If some work can be done in parallel
• Availability
  – Since all processors can perform the same functions, failure of a single processor does not halt the system
• Incremental growth
  – User can enhance performance by adding additional processors
• Scaling
  – Vendors can offer range of products based on number of processors
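The performance bullet ("if some work can be done in parallel") can be made concrete with Amdahl's law, a standard model that the slides themselves do not name. The sketch below assumes a fixed parallelisable fraction and ignores bus contention and coherence overhead, so it gives an upper bound rather than a prediction; the function name is hypothetical.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Upper bound on speedup when only part of the work runs in parallel.

    Assumption (not from the slides): the serial part takes the same time
    regardless of processor count, and the parallel part divides evenly.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# With 90% of the work parallelisable, adding processors gives
# diminishing returns.
for n in (1, 2, 4, 8, 16):
    print(n, round(amdahl_speedup(0.9, n), 2))
```

The same formula explains why incremental growth pays off only while the parallel part of the workload dominates.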
-
Cache Coherence Problems
Popular solution - Snoopy Protocol
• Distribute cache coherence responsibility among cache controllers
• Cache recognizes that a line is shared
• Updates announced to other caches
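A toy model may help show what "updates announced to other caches" means in practice. The sketch below assumes a write-invalidate policy with write-through to memory, which is only one of several possible snoopy schemes; the class names `Bus` and `SnoopyCache` are hypothetical and the model ignores line states, timing and bus arbitration.

```python
class Bus:
    """Shared bus: holds main memory and lets caches snoop on writes."""

    def __init__(self):
        self.caches = []
        self.memory = {}

    def broadcast_write(self, writer, addr):
        # Announce the write to every other cache controller on the bus.
        for cache in self.caches:
            if cache is not writer:
                cache.snoop_write(addr)


class SnoopyCache:
    def __init__(self, bus):
        self.lines = {}  # addr -> value, valid lines only
        self.bus = bus
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:  # miss: fetch the line from memory
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        self.bus.broadcast_write(self, addr)  # other copies become stale
        self.lines[addr] = value
        self.bus.memory[addr] = value  # write-through, for simplicity

    def snoop_write(self, addr):
        # Another cache wrote this line: drop our copy (invalidate).
        self.lines.pop(addr, None)


bus = Bus()
c0, c1 = SnoopyCache(bus), SnoopyCache(bus)
bus.memory[0x10] = 1
c0.read(0x10)
c1.read(0x10)            # the line is now shared by both caches
c0.write(0x10, 7)        # c1's copy is invalidated by the announcement
print(c1.read(0x10))     # c1 misses and re-fetches the new value
```

A write-update variant would replace the `pop` in `snoop_write` with storing the new value, at the cost of more bus traffic per write.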
-
Loosely Coupled - Clusters
• Collection of independent whole uniprocessors or SMPs
  – Usually called nodes
• Interconnected to form a cluster
• Working together as unified resource
  – Illusion of being one machine
• Communication via fixed path or network connections
-
Cluster Configurations
-
Cluster Benefits
• Absolute scalability
• Incremental scalability
• High availability
• Superior price/performance
-
Cluster Computer Architecture
-
Cluster v. SMP
• Both provide multiprocessor support to high demand applications
• Both available commercially
• SMP:
  – Easier to manage and control
  – Closer to single processor systems
    – Scheduling is main difference
    – Less physical space
    – Lower power consumption
• Clustering:
  – Superior incremental & absolute scalability
  – Less cost
  – Superior availability
-
Nonuniform Memory Access (NUMA) (Tightly coupled)
• Alternative to SMP & Clusters
• Nonuniform memory access
  – All processors have access to all parts of memory
    – Using load & store
  – Access time of processor differs depending on region of memory
    – Different processors access different regions of memory at different speeds
• Cache coherent NUMA
  – Cache coherence is maintained among the caches of the various processors
  – Significantly different from SMP and Clusters
-
Motivation
• SMP has practical limit to number of processors
  – Bus traffic limits to between 16 and 64 processors
• In clusters each node has own memory
  – Apps do not see large global memory
  – Coherence maintained by software not hardware
• NUMA retains SMP flavour while giving large scale multiprocessing
• Objective is to maintain transparent system wide memory while permitting multiprocessor nodes, each with own bus or internal interconnection system
-
CC-NUMA Organization
-
NUMA Pros & Cons
• Possibly effective performance at higher levels of parallelism than one SMP
• Not very supportive of software changes
• Performance can break down if too much access to remote memory
  – Can be avoided by:
    – L1 & L2 cache design reducing all memory access
      – Need good temporal locality of software
• Not transparent
  – Page allocation, process allocation and load balancing changes can be difficult
• Availability?
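The warning about remote memory can be illustrated with a simple weighted average. The latencies below (100 ns local, 400 ns remote) are made-up figures chosen only to show the trend, and the function name is hypothetical; real systems also have cache hits, which this sketch leaves out.

```python
def average_access_ns(remote_fraction, local_ns=100.0, remote_ns=400.0):
    """Mean memory access time for a given share of remote accesses.

    local_ns and remote_ns are illustrative assumptions, not measurements.
    """
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    return (1.0 - remote_fraction) * local_ns + remote_fraction * remote_ns


# The more accesses go to another node's memory, the closer the average
# gets to the remote latency.
for share in (0.0, 0.25, 0.5, 1.0):
    print(share, average_access_ns(share))
```

Good cache behaviour and careful page placement both act by keeping `remote_fraction` small.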
-
Multithreading
• Instruction stream divided into smaller streams (threads)
• Executed in parallel
• There are a wide variety of multithreading designs
-
Definitions of Threads and Processes
• Threads in multithreaded processors may or may not be same as software threads
• Process:
  – An instance of program running on computer
• Thread: dispatchable unit of work within process
  – Includes processor context (which includes the program counter and stack pointer) and data area for stack
  – Thread executes sequentially
  – Interruptible: processor can turn to another thread
• Thread switch
  – Switching processor between threads within same process
  – Typically less costly than process switch
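For the software side of these definitions, the following sketch uses Python's standard `threading` module to run several threads inside one process. It shows software threads only, which, as the first bullet notes, may or may not correspond to the hardware threads of a multithreaded processor. The shared dictionary and the worker function are names invented for the example.

```python
import threading

# Data owned by the process; every thread in the process can see it.
counter = {"value": 0}
lock = threading.Lock()


def work(iterations):
    # Each thread has its own stack and program counter, but shares
    # the process's memory, so updates must be synchronised.
    for _ in range(iterations):
        with lock:
            counter["value"] += 1


threads = [threading.Thread(target=work, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter["value"])
```

All four threads update the same `counter` object because they live in one process; separate processes would each get their own copy.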
-
Implicit and Explicit Multithreading
• All commercial processors and most experimental ones use explicit multithreading
  – Concurrently execute instructions from different explicit threads
  – Interleave instructions from different threads on shared pipelines or parallel execution on parallel pipelines
• Implicit multithreading is concurrent execution of multiple threads extracted from single sequential program
  – Implicit threads defined statically by compiler or dynamically by hardware
-
Approaches to Explicit Multithreading
• Interleaved
  – Fine-grained
  – Processor deals with two or more thread contexts at a time
  – Switching thread at each clock cycle
  – If thread is blocked it is skipped
• Blocked
  – Coarse-grained
  – Thread executed until event causes delay
  – E.g. cache miss
  – Effective on in-order processor
  – Avoids pipeline stall
• Simultaneous (SMT)
  – Instructions simultaneously issued from multiple threads to execution units of superscalar processor
• Chip multiprocessing
  – Processor is replicated on a single chip
  – Each processor handles separate threads
-
Scalar Processor Approaches
• Single-threaded scalar
  – Simple pipeline
  – No multithreading
• Interleaved multithreaded scalar
  – Easiest multithreading to implement
  – Switch threads at each clock cycle
  – Pipeline stages kept close to fully occupied
  – Hardware needs to switch thread context between cycles
• Blocked multithreaded scalar
  – Thread executed until latency event occurs
  – Would stop pipeline
  – Processor switches to another thread
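The three scalar approaches can be compared with a small issue-slot simulator. This is a deliberately simplified sketch: each thread is a string in which `i` is an ordinary instruction and `M` is one that causes a latency event (for example a cache miss), the miss penalty of two cycles is an arbitrary assumption, and the function name `simulate` is hypothetical. It models only which thread issues in each cycle, not pipeline stages.

```python
def simulate(threads, policy, miss_penalty=2):
    """Return, per clock cycle, the thread that issued (None = idle slot)."""
    if policy not in ("interleaved", "blocked"):
        raise ValueError("policy must be 'interleaved' or 'blocked'")
    n = len(threads)
    pc = [0] * n          # next instruction index for each thread
    ready_at = [0] * n    # first cycle in which each thread may issue again
    trace = []
    current = 0           # thread with priority in this cycle
    cycle = 0
    while any(pc[t] < len(threads[t]) for t in range(n)):
        # Look for a runnable thread, starting with the one that has priority;
        # blocked or finished threads are skipped.
        chosen = None
        for k in range(n):
            t = (current + k) % n
            if pc[t] < len(threads[t]) and ready_at[t] <= cycle:
                chosen = t
                break
        trace.append(chosen)
        if chosen is not None:
            op = threads[chosen][pc[chosen]]
            pc[chosen] += 1
            if op == "M":
                ready_at[chosen] = cycle + 1 + miss_penalty
            if policy == "interleaved" or op == "M":
                current = (chosen + 1) % n   # switch to the next thread
            else:
                current = chosen             # blocked: stay until a latency event
        cycle += 1
    return trace


print(simulate(["iMi"], "blocked"))              # single thread: idle slots after the miss
print(simulate(["iMi", "iii"], "interleaved"))   # switch every cycle
print(simulate(["iMi", "iii"], "blocked"))       # switch only on the miss
```

With one thread the miss leaves the pipeline idle for the length of the penalty; with a second thread available, both multithreaded policies fill those cycles with useful work.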
-
Scalar Diagrams
-
Multiple Instruction Issue Processors
• Superscalar
! &o multithreading
• Interleaved multithreading superscalar:
  – Each cycle, as many instructions as possible issued from single thread
  – Delays due to thread switches eliminated
  – Number of instructions issued in cycle limited by dependencies
• Blocked multithreaded superscalar
  – Instructions from one thread
  – Blocked multithreading used
-
Multiple Instruction Issue Diagram
-
Multiple Instruction Issue Processors
• Very long instruction word (VLIW)
  – E.g. IA-64
  – Multiple instructions in single word
  – Typically constructed by compiler
  – Operations may be executed in parallel in same word
  – May pad with no-ops
• Interleaved multithreading VLIW
  – Similar efficiencies to interleaved multithreading on superscalar architecture
• Blocked multithreaded VLIW
  – Similar efficiencies to blocked multithreading on superscalar architecture
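The "pad with no-ops" point can be shown with a short sketch. It assumes the compiler has already grouped operations that are safe to run in parallel; the word width of three slots and the name `pack_vliw` are illustrative choices, not a description of IA-64 encoding.

```python
def pack_vliw(groups, width=3):
    """Turn groups of independent operations into fixed-width VLIW words.

    Assumption: each group contains operations the compiler has found to be
    independent. Unused slots in a word are filled with no-ops.
    """
    words = []
    for group in groups:
        if len(group) > width:
            raise ValueError("group does not fit in one instruction word")
        words.append(list(group) + ["nop"] * (width - len(group)))
    return words


# Two operations fit in the first word; the second word carries only a load.
for word in pack_vliw([["add", "mul"], ["ld"]]):
    print(word)
```

The no-op slots are issue capacity that a multithreaded VLIW design can try to fill with work from another thread.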
-
Multiple Instruction Issue Diagram
Parallel Simultaneous
-
Parallel, Simultaneous Execution of Multiple Threads
• Simultaneous multithreading
  – Issue multiple instructions at a time
  – One thread may fill all horizontal slots
  – Instructions from two or more threads may be issued
  – With enough threads, can issue maximum number of instructions on each cycle
• Chip multiprocessor
  – Multiple processors
  – Each has two-issue superscalar processor
  – Each processor is assigned thread
    – Can issue up to two instructions per cycle per thread
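The difference between the two designs shows up in how one cycle's issue slots get filled. In the sketch below, each thread is reduced to a count of instructions it has ready this cycle. The four-wide SMT core is an assumed width for comparison with two two-issue cores, and both function names are hypothetical.

```python
def smt_issue(ready_per_thread, issue_width=4):
    """One SMT core: any thread may use any free slot in this cycle."""
    issued = []
    slots = issue_width
    for ready in ready_per_thread:
        take = min(ready, slots)
        issued.append(take)
        slots -= take
    return issued


def cmp_issue(ready_per_thread, per_core_width=2):
    """Chip multiprocessor: one thread per core, each core two-issue."""
    return [min(ready, per_core_width) for ready in ready_per_thread]


# Thread 0 has three ready instructions, thread 1 has one.
print(smt_issue([3, 1]), cmp_issue([3, 1]))
# A single busy thread can fill all SMT slots but only one core's slots.
print(smt_issue([4, 0]), cmp_issue([4, 0]))
```

In this model the total issue capacity is the same (four per cycle), but SMT can shift slots between threads while the chip multiprocessor caps each thread at its own core's width.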
-
Parallel Diagram
-
Examples
• Some Pentium 4 (single processor)
  – Intel calls it hyperthreading
  – SMT with support for two threads
  – Single multithreaded processor, logically two processors
• IBM Power
  – High-end PowerPC
  – Combines chip multiprocessing with SMT
  – Chip has two separate processors
  – Each supporting two threads concurrently using SMT