![Page 1: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/1.jpg)
Explicitly Parallel Platforms
• Explicit Parallelism, Task Parallelism
• Mostly in the order of >> 10
• Requires active involvement of the programmer and/or compiler (no free lunch)
• Requires additional program constructs
• Requires new programming paradigms
![Page 2: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/2.jpg)
Amdahl's Law
Given a computation of which a fraction q cannot be parallelized, the maximal speedup with P processors is limited to:

$$S_P = \frac{T}{qT + (1-q)\,T/P}$$

with T the sequential time. So, with
q = 0.01, max speedup <= 100 regardless of P
q = 0.05, max speedup <= 20 regardless of P
q = 0.10, max speedup <= 10 regardless of P
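The bound can be checked numerically. The following is a minimal sketch, assuming the formula above with T divided out; the function names `amdahl_speedup` and `amdahl_limit` are illustrative and not part of the lecture.

```python
def amdahl_speedup(q, p):
    """Speedup on p processors when a fraction q of the work is sequential."""
    # S_P = T / (q*T + (1-q)*T/P), with T cancelled out
    return 1.0 / (q + (1.0 - q) / p)


def amdahl_limit(q):
    """Upper bound on the speedup as the number of processors grows without bound."""
    return 1.0 / q


if __name__ == "__main__":
    for q in (0.01, 0.05, 0.10):
        for p in (10, 100, 1000):
            print(f"q={q:.2f} P={p:5d} speedup={amdahl_speedup(q, p):7.2f} "
                  f"limit={amdahl_limit(q):6.1f}")
```

Even with 1000 processors the speedup stays below 1/q, which is why the sequential fraction matters more than the processor count.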
![Page 3: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/3.jpg)
Flynn's Taxonomy
• Processing units in parallel computers either operate under the centralized control of a single control unit or work independently.
• If there is a single control unit that dispatches the same instruction to various processors (that work on different data), the model is referred to as single instruction stream, multiple data stream (SIMD).
• If each processor has its own control unit, each processor can execute different instructions on different data items. This model is called multiple instruction stream, multiple data stream (MIMD).
![Page 4: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/4.jpg)
SIMD and MIMD architectures
A typical SIMD architecture (a) and a typical MIMD architecture (b).
![Page 5: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/5.jpg)
SIMD Processors
• The same instruction on different processors (functional units). Execution is tightly synchronized.
• Some of the earliest parallel computers such as the Illiac IV, MPP, DAP, CM-2, and MasPar MP-1 belonged to this class of machines.
• Variants of this concept have found use in co-processing units such as the MMX units in Intel processors and GPUs like those from NVIDIA.
• SIMD relies on the regular structure of computations (such as those in image processing).
• It is often necessary to selectively turn off operations on certain data items. For this reason, most SIMD programming paradigms allow for an "activity mask", which determines if a processor should participate in a computation or not.
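The activity mask can be illustrated with a small emulation. This is only a sketch in plain Python: each list position stands in for one SIMD lane, and the name `masked_simd_add` is an assumption made for illustration, not an API of any real SIMD platform.

```python
def masked_simd_add(a, b, mask):
    """Emulate one SIMD add: every lane sees the same instruction,
    but only lanes with mask[i] == True update their value."""
    return [x + y if active else x for x, y, active in zip(a, b, mask)]


a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
# Activity mask: only lanes holding an even value participate
mask = [x % 2 == 0 for x in a]
result = masked_simd_add(a, b, mask)
print(result)
```

Inactive lanes still spend the cycle; they simply discard the result, which is why irregular computations map poorly to SIMD hardware.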
![Page 6: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/6.jpg)
MIMD Processors
• In contrast to SIMD processors, MIMD processors can execute different programs on different processors.
• A variant of this, called single program multiple data streams (SPMD), executes the same program on different processors, but allows for different instructions to be executed on each processor (if/case statements).
• It is easy to see that SPMD and MIMD are closely related in terms of programming flexibility and underlying architectural support.
• Examples of such platforms include current generation Sun Ultra Servers, SGI Origin Servers, multiprocessor PCs, workstation clusters, and the IBM SP.
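The SPMD idea, one program whose behavior branches on the processor identity, can be sketched as follows. The names `spmd_program`, `rank` and `nprocs` are hypothetical, and the "processors" are emulated by a sequential loop rather than run in parallel.

```python
def spmd_program(rank, nprocs, data):
    """The single program that every processor runs; `rank` selects the branch."""
    chunk = data[rank::nprocs]  # each rank works on its own strided share of the data
    if rank == 0:
        # Processor 0 takes one branch of the program ...
        return ("sum", sum(chunk))
    # ... all other processors take another branch of the same program
    return ("max", max(chunk))


data = list(range(1, 9))
nprocs = 2
# Emulation only: a real SPMD run would start nprocs copies concurrently
results = [spmd_program(rank, nprocs, data) for rank in range(nprocs)]
print(results)
```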
![Page 7: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/7.jpg)
SIMD-MIMD Comparison
• SIMD computers require less hardware than MIMD computers (single control unit).
• However, since SIMD processors are tightly synchronized and therefore specially designed, they tend to be expensive and have long design cycles. (NVIDIA forms an exception to this. Why?)
• In contrast, platforms supporting the MIMD/SPMD paradigm can be built from inexpensive off-the-shelf components with relatively little effort in a short amount of time.
• Not all applications are naturally suited to SIMD processors.
• MIMD/SPMD platforms have relatively large communication overhead and therefore call for large-grain parallelism.
![Page 8: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/8.jpg)
Communication Model of Parallel Platforms
• There are two primary forms of data exchange between parallel tasks: accessing a shared data space and exchanging messages.
• Platforms that provide a shared data space are called shared-address-space machines or multiprocessors.
• Platforms that support messaging are also called message-passing platforms or multicomputers.
![Page 9: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/9.jpg)
Shared-Address-Space Platforms
• Part (or all) of the memory is accessible to all processors.
• Processors interact by modifying data objects stored in this shared address space.
• If the time taken by a processor to access any memory word in the system (global or local) is identical, the platform is classified as a uniform memory access (UMA) machine. If this is not the case, we refer to it as a non-uniform memory access (NUMA) machine.
![Page 10: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/10.jpg)
NUMA and UMA Shared-Address-Space Platforms
(a) Uniform-memory-access shared-address-space computer; (b) uniform-memory-access shared-address-space computer with caches and memories; (c) non-uniform-memory-access shared-address-space computer with local memory only.
![Page 11: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/11.jpg)
Programming Consequences
• In contrast to UMA platforms, NUMA machines require locality from underlying algorithms for performance.
• Programming shared-address-space platforms is easier since reads and writes are implicitly visible to other processors.
• However, read-write accesses to shared data must be coordinated.
• Caches in such machines require coordinated access to multiple copies. This leads to the cache coherence problem.
• A weaker model of these machines provides an address map, but not coordinated access. These models are called non-cache-coherent shared-address-space machines.
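Why read-write access to shared data must be coordinated can be sketched with threads that share one counter. This is an illustration only: Python's `threading` module stands in for processors sharing an address space, and the names `worker` and `counter` are assumptions.

```python
import threading

counter = 0                # the shared data object
lock = threading.Lock()    # coordinates read-modify-write access


def worker(n_increments):
    """Increment the shared counter; the lock makes each update atomic."""
    global counter
    for _ in range(n_increments):
        with lock:
            counter += 1


threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)
```

Without the lock, two threads may read the same old value and one update can be lost; the writes are visible to everyone, but visibility alone does not give correctness.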
![Page 12: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/12.jpg)
Shared-Address-Space vs. Shared Memory Machines
• We refer to shared-address-space platforms as a programming abstraction and to shared memory machines as physical machines.
• It is possible to provide a shared address space using a physically distributed memory.
![Page 13: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/13.jpg)
Message-Passing Platforms
• These platforms comprise a set of processors, each with its own (exclusive) memory.
• Natural examples are clustered workstations and non-shared-address-space multicomputers.
• These platforms are programmed using (variants of) send and receive primitives.
• Libraries such as MPI (Message Passing Interface) and PVM (Parallel Virtual Machine) provide such primitives. OpenMP is an API based on multithreading.
![Page 14: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/14.jpg)
Message Passing vs. Shared Address Space Platforms
• Message passing requires little hardware support, other than a network.
• Shared address space platforms can easily emulate message passing. The reverse is more difficult to do (in an efficient manner).
![Page 15: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/15.jpg)
Interconnection Networks for Parallel Computers
• Interconnection networks carry data between processors and to memory.
• Interconnects are made of switches and links (wires, fiber).
• Interconnects are classified as static or dynamic.
• Static networks consist of point-to-point communication links among processing nodes and are also referred to as direct networks.
• Dynamic networks are built using switches and communication links. Dynamic networks are also referred to as indirect networks.
![Page 16: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/16.jpg)
Static and Dynamic Interconnection Networks
Classification of interconnection networks: (a) a static network; and (b) a dynamic network.
![Page 17: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/17.jpg)
Network Topologies
• A variety of network topologies have been proposed and implemented.
• These topologies trade off performance for cost.
• Commercial machines often implement hybrids of multiple topologies for reasons of packaging, cost, and available components.
![Page 18: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/18.jpg)
Network Topologies: Buses
• Some of the simplest and earliest parallel machines used buses.
• All processors access a common bus for exchanging data.
• The distance between any two nodes is O(1) in a bus. The bus also provides a convenient broadcast medium.
• However, the bandwidth of the shared bus is a major bottleneck.
• Typical bus-based machines are limited to dozens of nodes. Sun (Cray) servers and Intel Core based shared-bus multiprocessors are examples of such architectures.
![Page 19: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/19.jpg)
Network Topologies: Buses
Bus-based interconnects (a) with no local caches; (b) with local memory/caches.
Since much of the data accessed by processors is local to the processor, a local memory can improve the performance.
![Page 20: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/20.jpg)
Network Topologies: Crossbars
A completely non-blocking crossbar network connecting p processors to b memory banks.
A crossbar network uses a p×m grid of switches to connect p inputs to m outputs in a non-blocking manner.
![Page 21: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/21.jpg)
Network Topologies: Crossbars
• The cost of a crossbar of p processors grows as O(p²).
• This is generally difficult to scale for large values of p.
• Examples of machines that employ crossbars include the Sun Ultra HPC 10000 and the Fujitsu VPP500.
![Page 22: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/22.jpg)
Network Topologies: Multistage Networks
• Crossbars have excellent performance scalability but poor cost scalability.
• Buses have excellent cost scalability, but poor performance scalability.
• Multistage interconnects strike a compromise between these extremes.
![Page 23: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/23.jpg)
Network Topologies: Multistage Networks
The schematic of a typical multistage interconnection network.
![Page 24: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/24.jpg)
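Network Topologies: Multistage Omega Network
• One of the most commonly used multistage interconnects is the Omega network.
• This network consists of log p stages, where p is the number of inputs/outputs.
• At each stage, input i is connected to output j, where:

$$j = \begin{cases} 2i, & 0 \le i \le p/2 - 1 \\ 2i + 1 - p, & p/2 \le i \le p - 1 \end{cases}$$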
![Page 25: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/25.jpg)
Network Topologies: Omega Network
Each stage of the Omega network implements a perfect shuffle as follows:
A perfect shuffle interconnection for eight inputs and outputs.
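The perfect shuffle for eight lines can be reproduced in a few lines of code. The sketch below assumes the usual definition (input i goes to 2i for the lower half of the inputs and to 2i + 1 − p for the upper half), which is the same as rotating the binary label of i left by one bit; the function names are illustrative.

```python
def perfect_shuffle(i, p):
    """Output index for input i in a perfect shuffle on p = 2**k lines."""
    return 2 * i if i < p // 2 else 2 * i + 1 - p


def rotate_left(i, k):
    """Left-rotate the k-bit binary representation of i by one position."""
    return ((i << 1) | (i >> (k - 1))) & ((1 << k) - 1)


p, k = 8, 3
mapping = [perfect_shuffle(i, p) for i in range(p)]
for i, j in enumerate(mapping):
    # Both formulations give the same wiring
    print(f"{i:03b} -> {j:03b}")
```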
![Page 26: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/26.jpg)
Routing in Omega Network
Connecting X1X2X3X4 to Y1Y2Y3Y4 (Sw = switch, PS = perfect shuffle):
X1X2X3X4 -Sw-> X1X2X3Y4 -PS-> X2X3Y4X1 -Sw-> X2X3Y4Y1 -PS-> X3Y4Y1X2 -Sw-> X3Y4Y1Y2 -PS-> Y4Y1Y2X3 -Sw-> Y4Y1Y2Y3 -PS-> Y1Y2Y3Y4
![Page 27: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/27.jpg)
Network Topologies: Multistage Omega Network
• The perfect shuffle patterns are connected using 2×2 switches.
• The switches operate in two modes: crossover or pass-through (switch bit position or not).
Two switching configurations of the 2×2 switch: (a) Pass-through; (b) Cross-over.
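How the two switch modes steer a message can be sketched with destination-tag routing. The code assumes one common convention (shuffle first, then a 2×2 switch that sets the lowest address bit, consuming destination bits most-significant first); the lecture's own diagrams may order the steps differently, and `route_omega` is an illustrative name.

```python
def route_omega(src, dst, k):
    """Trace a message through a k-stage Omega network with p = 2**k lines.

    Returns a list of (stage, switch_mode, line_after_switch) tuples.
    """
    mask = (1 << k) - 1
    pos, path = src, []
    for stage in range(k):
        pos = ((pos << 1) | (pos >> (k - 1))) & mask   # perfect shuffle (rotate left)
        want = (dst >> (k - 1 - stage)) & 1            # next destination bit, MSB first
        mode = "pass-through" if (pos & 1) == want else "cross-over"
        pos = (pos & ~1) | want                        # leave on the wanted switch port
        path.append((stage, mode, pos))
    return path


for step in route_omega(0b010, 0b111, 3):
    print(step)
```

After log p stages every original address bit has been replaced by a destination bit, so the message arrives at `dst` regardless of where it started.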
![Page 28: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/28.jpg)
Network Topologies: Multistage Omega Network
A complete Omega network with the perfect shuffle interconnects and switches can be illustrated as follows:
A complete Omega network Ω8 connecting eight inputs and eight outputs. An Omega network Ωn has n/2 * log n switching nodes (log n stages).
![Page 29: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/29.jpg)
Network Topologies: the Butterfly Network
A variation of the Omega network
Two stages
![Page 30: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/30.jpg)
In fact: the following networks are equivalent
![Page 31: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/31.jpg)
![Page 32: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/32.jpg)
![Page 33: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/33.jpg)
Relationship with FFT
![Page 34: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/34.jpg)
Routing Properties
• Clos/Benes showed that $R_N R_N^{-1}$ can realize any permutation.
Proof is based on Hall's marriage theorem: Imagine two groups; one of n men, and one of n women. For each woman, there is a subset of the men, any one of which she would happily marry; and any man would be happy to marry a woman who wants to marry him. Consider whether it is possible to pair up (in marriage) the men and women so that every person is happy. If we let $A_i$ be the set of men that the i-th woman would be happy to marry, then the marriage theorem states that each woman can happily marry a man if and only if, for any subset of the women, the number of men whom at least one of the women would be happy to marry is at least as big as the number of women in that subset. It is obvious that this condition is necessary: if it does not hold, there are not enough men to share among the women. What is interesting is that it is also a sufficient condition.
• $\Omega_N$ is equivalent with $R_N^{-1}$, so $\Omega_N^{-1} \Omega_N$ can also realize any permutation. Non-blocking!
• This is not the case for $\Omega_N \Omega_N$.
• $\Omega_N \Omega_N \Omega_N$ can also realize any permutation. The proof is based on counting arguments; the actual routing is very complicated.
![Page 35: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/35.jpg)
Network Topologies: Completely Connected Network
• Each processor is connected to every other processor.
• The number of links in the network scales as O(p²).
• While the performance scales very well, the hardware is not realizable for large values of p.
• In this sense, these networks are static counterparts of crossbars.
![Page 36: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/36.jpg)
Network Topologies: Completely Connected and Star Connected Networks
(a) A completely-connected network of eight nodes; (b) a star connected network of nine nodes.
![Page 37: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/37.jpg)
Network Topologies: Star Connected Network
• Every node is connected only to a common node at the center.
• Distance between any pair of nodes is O(1). However, the central node becomes a bottleneck.
• In this sense, star connected networks are static counterparts of buses.
![Page 38: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/38.jpg)
Network Topologies: Linear Arrays, Meshes, and k-d Meshes
• In a linear array, each node (other than the two end nodes) has two neighbors, one to its left and one to its right. If the nodes at either end are connected, we refer to it as a 1-D torus or a ring.
• A generalization to 2 dimensions has nodes with 4 neighbors, to the north, south, east, and west.
• A further generalization to d dimensions has nodes with 2d neighbors.
• A special case of a d-dimensional mesh is a hypercube. Here, d = log p, where p is the total number of nodes.
![Page 39: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/39.jpg)
Network Topologies: Two- and Three-Dimensional Meshes
Two and three dimensional meshes: (a) 2-D mesh with no wraparound; (b) 2-D mesh with wraparound link (2-D torus); and (c) a 3-D mesh with no wraparound.
![Page 40: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/40.jpg)
Network Topologies: Hypercubes and their Construction
Construction of hypercubes from hypercubes of lower dimension.
![Page 41: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/41.jpg)
Properties of Hypercubes
• The distance between any two nodes is at most log p.
• Each node has log p neighbors.
• The distance between two nodes is given by the number of bit positions at which the labels of the two nodes differ, and therefore is limited to log p.
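These properties follow directly from labeling the p = 2^d nodes with d-bit numbers. The sketch below assumes that labeling; the function names are illustrative.

```python
def hypercube_distance(a, b):
    """Number of hops between nodes a and b: the Hamming distance of their labels."""
    return bin(a ^ b).count("1")


def hypercube_neighbors(node, d):
    """Neighbors of a node in a d-dimensional hypercube: flip one bit at a time."""
    return [node ^ (1 << bit) for bit in range(d)]


d = 3                      # p = 2**d = 8 nodes
print(hypercube_distance(0b000, 0b111))
print([f"{n:03b}" for n in hypercube_neighbors(0b101, d)])
```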
![Page 42: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/42.jpg)
Network Topologies: Tree-Based Networks
Complete binary tree networks: (a) a static tree network; and (b) a dynamic tree network.
![Page 43: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/43.jpg)
Network Topologies: Tree Properties
• The distance between any two nodes is no more than 2 log p.
• Links higher up the tree potentially carry more traffic than those at the lower levels.
• For this reason, a variant called a fat-tree fattens the links as we go up the tree.
• Trees can be laid out in 2D with no wire crossings in Ω(√n log n) area.
![Page 44: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/44.jpg)
Network Topologies: Fat Trees
A fat tree network of 16 processing nodes. The bandwidth doubles each time we go up one level.
![Page 45: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/45.jpg)
Evaluating Static Interconnection Networks
• Diameter: The distance between the farthest two nodes in the network. The diameter of a linear array is p − 1, that of a mesh is 2(√p − 1), that of a tree and hypercube is log p, and that of a completely connected network is O(1).
• Bisection Width: The minimum number of wires you must cut to divide the network into two equal parts. The bisection width of a linear array and tree is 1, that of a mesh is √p, that of a hypercube is p/2 and that of a completely connected network is p²/4.
• Arc connectivity: The minimum number of edges (arcs) that need to be removed to make the graph disconnected.
• Vertex connectivity: The minimum number of vertices (nodes) that need to be removed to make the graph disconnected.
• Cost: The number of links or switches (whichever is asymptotically higher) is a meaningful measure of the cost. However, a number of other factors, such as the ability to lay out the network, the length of wires, etc., also factor into the cost.
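To get a feel for how these measures grow, the diameter and bisection width can be tabulated for a concrete p. This is a sketch under stated assumptions: it uses the standard closed forms (p − 1 and 1 for a linear array, 2(√p − 1) and √p for a square 2-D mesh without wraparound, log p and p/2 for a hypercube, 1 and p²/4 for a completely connected network), and the function name is illustrative.

```python
import math


def static_network_metrics(p):
    """Diameter and bisection width of some static networks with p nodes.

    Assumes p is both a perfect square (for the mesh) and a power of two
    (for the hypercube), e.g. p = 16 or p = 64.
    """
    side = math.isqrt(p)
    d = p.bit_length() - 1
    if side * side != p or (1 << d) != p:
        raise ValueError("p must be a perfect square and a power of two")
    return {
        "linear array": {"diameter": p - 1, "bisection": 1},
        "2-D mesh": {"diameter": 2 * (side - 1), "bisection": side},
        "hypercube": {"diameter": d, "bisection": p // 2},
        "completely connected": {"diameter": 1, "bisection": p * p // 4},
    }


for name, m in static_network_metrics(16).items():
    print(f"{name:22s} diameter={m['diameter']:3d} bisection={m['bisection']:3d}")
```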
![Page 46: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/46.jpg)
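Evaluating Static Interconnection Networks

| Network | Diameter | Bisection Width | Arc Connectivity | Cost (No. of links) |
|---|---|---|---|---|
| Completely-connected | | | | |
| Star | | | | |
| Complete binary tree | | | | |
| Linear array | | | | |
| 2-D mesh, no wraparound | | | | |
| 2-D wraparound mesh | | | | |
| Hypercube | | | | |
| Wraparound k-ary d-cube | | | | |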
![Page 47: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/47.jpg)
Evaluating Dynamic Interconnection Networks

| Network | Diameter | Bisection Width | Arc Connectivity | Cost (No. of links) |
|---|---|---|---|---|
| Crossbar | | | | |
| Omega Network | | | | |
| Dynamic Tree | | | | |
![Page 48: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/48.jpg)
Message Passing Costs in Parallel Computers
• The total time to transfer a message over a network comprises the following:
– Startup time ($t_s$): Time spent at sending and receiving nodes (executing the routing algorithm, programming routers, etc.).
– Per-hop time ($t_h$): This time is a function of the number of hops and includes factors such as switch latencies, network delays, etc.
– Per-word transfer time ($t_w$): This time includes all overheads that are determined by the length of the message. This includes bandwidth of links, error checking and correction, etc.
![Page 49: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/49.jpg)
Store-and-Forward Routing
• A message traversing multiple hops is completely received at an intermediate hop before being forwarded to the next hop.
• The total communication cost for a message of size m words to traverse l communication links is

$$t_{comm} = t_s + (m\,t_w + t_h)\,l$$

• In most platforms, $t_h$ is small and the above expression can be approximated by

$$t_{comm} = t_s + m\,l\,t_w$$
![Page 50: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/50.jpg)
Packet Routing
• Store-and-forward makes poor use of communication resources.
• Packet routing breaks messages into packets and pipelines them through the network.
• Since packets may take different paths, each packet must carry routing information, error checking, sequencing, and other related header information.
• The total communication time for packet routing is approximated by:

$$t_{comm} = t_s + t_h\,l + t_w\,m$$

• The factor $t_w$ accounts for overheads in packet headers.
![Page 51: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/51.jpg)
Cut-Through Routing
• Takes the concept of packet routing to an extreme by further dividing messages into basic units called flits.
• Since flits are typically small, the header information must be minimized.
• This is done by forcing all flits to take the same path, in sequence.
• A tracer message first programs all intermediate routers. All flits then take the same route.
• Error checks are performed on the entire message, as opposed to flits.
• No sequence numbers are needed.
![Page 52: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/52.jpg)
Cut-Through Routing
• The total communication time for cut-through routing is approximated by:

$$t_{comm} = t_s + t_h\,l + t_w\,m$$

• This is identical to packet routing; however, $t_w$ is typically much smaller.
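The difference between the routing schemes is easiest to see with numbers. The sketch below assumes the usual linear cost model, with store-and-forward paying the full message time on every link and cut-through paying it once plus a per-hop latency; the parameter values are made up for illustration and the function names are not from the lecture.

```python
def store_and_forward_time(ts, th, tw, m, l):
    """Each of the l links carries the whole m-word message before forwarding it."""
    return ts + (m * tw + th) * l


def cut_through_time(ts, th, tw, m, l):
    """The message is pipelined: per-hop latency on l links, transfer time paid once."""
    return ts + l * th + m * tw


# Hypothetical parameters in arbitrary time units
ts, th, tw = 100, 2, 1
m, l = 1000, 4
print("store-and-forward:", store_and_forward_time(ts, th, tw, m, l))
print("cut-through:      ", cut_through_time(ts, th, tw, m, l))
```

For long messages over several hops the store-and-forward cost grows with the product m·l, while the cut-through cost grows with the sum of the two terms.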
![Page 53: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/53.jpg)
Routing Mechanisms for Interconnection Networks
How does one compute the route that a message takes from source to destination?
– Routing must prevent deadlocks. For this reason, we use dimension-ordered or E-cube routing.
– Routing must avoid hot-spots. For this reason, two-step routing is often used. In this case, a message from source s to destination d is first sent to a randomly chosen intermediate processor i and then forwarded to destination d.
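Both ideas can be sketched for a hypercube. The code assumes E-cube routing corrects differing address bits in a fixed order (lowest dimension first), and it builds the two-step route by routing to a random intermediate node and then on to the destination; the function names are illustrative.

```python
import random


def ecube_route(src, dst, d):
    """Dimension-ordered (E-cube) route in a d-dimensional hypercube.

    Differing bits are corrected from dimension 0 upward, so every message
    crosses dimensions in the same order.
    """
    path, cur = [src], src
    for bit in range(d):
        if (cur ^ dst) & (1 << bit):
            cur ^= 1 << bit
            path.append(cur)
    return path


def two_step_route(src, dst, d, rng=random):
    """Route via a randomly chosen intermediate node to spread traffic."""
    mid = rng.randrange(1 << d)
    return ecube_route(src, mid, d) + ecube_route(mid, dst, d)[1:]


print([f"{n:03b}" for n in ecube_route(0b010, 0b111, 3)])
print([f"{n:03b}" for n in two_step_route(0b010, 0b111, 3, random.Random(0))])
```

The two-step route is usually longer than the direct one; the extra hops are the price paid for avoiding hot-spots.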
![Page 54: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/54.jpg)
Case Studies: The IBM Blue-Gene Architecture
![Page 55: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/55.jpg)
Case Studies: The Cray T3E Architecture
Interconnection network of the Cray T3E: (a) node architecture; (b) network topology.
![Page 56: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/56.jpg)
Case Studies: The SGI Origin 3000 Architecture
![Page 57: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/57.jpg)
The Cedar Architecture
![Page 58: Explicitly Parallel Plaorms - Leiden Universityliacs.leidenuniv.nl/~rietveldkfd/courses/parpro2016/Lecture_3.pdf · SIMD Processors • The same instruc=on on different processors](https://reader034.vdocuments.mx/reader034/viewer/2022042909/5f3a22d536b493389c186f1e/html5/thumbnails/58.jpg)
MasPar MP-1