Concurrency and Parallel Programming "An Introduction"

Post on 14-Jan-2022


Carlos Jaime Barrios Hernandez, PhD. EISI UIS @carlosjaimebh

Concurrency and Parallel Programming "An Introduction"

Concurrent and Parallel

The Rehearsal on Stage, 1874, Edgar Degas, Paris, Musée d'Orsay.

Plan

•  The Traditional Way
•  Design Spaces of Parallel Programming Recall
•  Concurrent Programming
•  Distributed Memory vs. Shared Memory
•  Design Models for Concurrent Algorithms
   •  Task Decomposition
   •  Data Decomposition
•  Concurrent Algorithm Design Features and Forces
•  Not Parallelizable Jobs, Tasks and Algorithms
•  Algorithm Structures
•  Final Notes

Traditional Way

Designing and Building Parallel Programs, by Ian Foster, http://www.mcs.anl.gov/~itf/dbpp/

Design Spaces of Parallel Programming*

•  Patterns for Parallel Programming, Timothy Mattson, Beverly A. Sanders and Berna L. Massingill, Software Pattern Series, Addison-Wesley, 2004

FC • Finding Concurrency (structuring the problem to expose exploitable concurrency)

AS • Algorithm Structure (structuring the algorithm to take advantage of concurrency)

SS • Supporting Structures (interfaces between algorithms and environments)

IM • Implementation Mechanisms (define programming environments)

(Remember) Concurrency and Parallelism

•  A system is "concurrent" if it can support two or more actions in progress at the same time.

•  A system is "parallel" if it can support two or more actions executing simultaneously.

Concurrent programming is all about independent computations that the machine can execute in any order.

Concurrent vs. Parallel

Distributed vs. Parallel

Concurrent Programming General Steps

1.  Analysis
    •  Identify possible concurrency
    •  Hotspot: any partition of the code that has a significant amount of activity
    •  Time spent, independence of the code…

2.  Design and Implementation
    •  Threading the algorithm

3.  Tests of Correctness
    •  Detecting and fixing threading errors
    •  Logical errors, contention, synchronization errors, imbalance, excessive overhead

4.  Tuning of Performance
    •  Removing performance bottlenecks
    •  Tuning performance problems in the code (tuning cycles)

Distributed vs. Shared Memory Programming

Common Features
•  Redundant work
•  Dividing work
•  Sharing data (different methods)
•  Dynamic/static allocation of work
   •  Depending on the nature of the serial algorithm, the resulting concurrent version, and the number of threads/processors

Only in Shared Memory
•  Local declarations and thread-local storage
•  Memory effects:
   •  False sharing
•  Communication in memory
   •  Mutual exclusion
   •  Producer/consumer model
   •  Reader/writer locks (in distributed memory this is boss/worker)
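The producer/consumer model listed above can be sketched with a thread-safe queue, which also supplies the mutual exclusion. This is a minimal illustrative sketch (the function names, the sentinel convention, and the squaring "work" are invented for the example), not code from the slides:

```python
import threading
import queue

def producer(q, items):
    """Feed work items into the shared queue, then signal completion."""
    for item in items:
        q.put(item)          # blocks if the bounded queue is full
    q.put(None)              # sentinel: no more work

def consumer(q, results):
    """Pull items until the sentinel arrives; do some stand-in work."""
    while True:
        item = q.get()
        if item is None:     # sentinel received: stop consuming
            break
        results.append(item * item)  # stand-in for real computation

def run_pipeline(items):
    q = queue.Queue(maxsize=4)       # bounded queue throttles the producer
    results = []
    t_prod = threading.Thread(target=producer, args=(q, items))
    t_cons = threading.Thread(target=consumer, args=(q, results))
    t_prod.start(); t_cons.start()
    t_prod.join(); t_cons.join()
    return results
```

In distributed memory the same roles reappear as the boss/worker structure mentioned above, with messages replacing the shared queue.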

Tasks and Data Decomposition

•  Task decomposition: task parallelism
•  Data decomposition: data parallelism (geometric parallelism)

Concurrent Computation from Serial Codes

•  Sequential consistency property: getting the same answer as the serial code on the same input data set, comparing the sequence of execution in concurrent solutions of the concurrent algorithms.

[Figure: a sequential version (in → P → out) next to a parallel/concurrent version in which the input feeds several P tasks that together produce the output.]

Tasks must be assigned to threads for execution.

Task Decomposition Considerations

•  What are the tasks and how are they defined?

•  What are the dependencies between tasks and how can they be satisfied?

•  How are the tasks assigned to threads?

What are the tasks and how are they defined?

•  There should be at least as many tasks as there will be threads (or cores).
•  It is almost always better to have (many) more tasks than threads.

•  Granularity must be large enough to offset the overhead needed to manage the tasks and threads.
   •  More computation: higher granularity (coarse-grained)
   •  Less computation: lower granularity (fine-grained)

Granularity is the amount of computation done before synchronization is needed.
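The granularity trade-off above can be made concrete by varying the chunk size used to build tasks. The helper below is a hypothetical sketch (the name `make_chunks` and the grain values are invented for illustration):

```python
def make_chunks(data, grain):
    """Split the work into tasks of `grain` elements each.

    Each chunk becomes one task, so each chunk pays the task-management
    overhead once: a smaller grain means more tasks and more overhead."""
    return [data[i:i + grain] for i in range(0, len(data), grain)]

data = list(range(100))
fine = make_chunks(data, 1)     # 100 tasks: fine-grained, overhead paid 100 times
coarse = make_chunks(data, 25)  # 4 tasks: coarse-grained, e.g. one per core on 4 cores
```

Note that no work is lost either way; only the ratio of useful computation to management overhead changes.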

Task Granularity

[Figure: execution timelines on Cores 0–3 comparing a fine-grained decomposition (many small tasks, each preceded by management overhead) with a coarse-grained decomposition (few larger tasks, so the overhead is paid far less often).]

Task Dependencies

•  Order dependency
•  Data dependency

Enchantingly parallel code: code without dependencies.

[Figure: example dataflow graphs of processes — processes chained through each other's outputs (order and data dependencies) versus fully independent processes, each with its own input and output.]

Data Decomposition Considerations (Geometric Decomposition)

Data structures must (commonly) be divided into arrays or logical structures.

•  How should you divide the data into chunks?

•  How should you ensure that the tasks for each chunk have access to all data required for update?

•  How are the data chunks assigned to threads?

How should you divide data into chunks?

•  By individual elements
•  By rows
•  By groups of columns
•  By blocks
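Two of the divisions above (by rows and by blocks) can be sketched for a 2-D grid stored as a list of lists. The helper names and the ceiling-division chunk sizing are illustrative choices, not from the slides:

```python
def split_rows(grid, n):
    """Divide a 2-D grid into n bands of consecutive rows."""
    size = (len(grid) + n - 1) // n           # ceiling division
    return [grid[i:i + size] for i in range(0, len(grid), size)]

def split_blocks(grid, br, bc):
    """Divide a 2-D grid into br x bc rectangular blocks."""
    rs = (len(grid) + br - 1) // br           # rows per block
    cs = (len(grid[0]) + bc - 1) // bc        # columns per block
    return [[[row[j:j + cs] for row in grid[i:i + rs]]
             for j in range(0, len(grid[0]), cs)]
            for i in range(0, len(grid), rs)]
```

Each band or block would then be handed to one task; the choice of shape matters for the shared borders discussed next.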

The Shape of the Chunk

•  Data decomposition has an additional dimension.
•  The shape determines what the neighboring chunks are and how any exchange of data will be handled during the course of the chunk computations.

•  Regular shapes: common regular data organizations.
•  Irregular shapes: may be necessary due to the irregular organization of the data.

[Figure: a regular chunk with 2 shared borders versus an irregular chunk with 5 shared borders.]

How should you ensure that the tasks for each chunk have access to all data required for update?

•  Using ghost cells
   •  Ghost cells hold copied data from a neighboring chunk.

[Figure: the original split with ghost cells added, then data being copied into the ghost cells.]
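The ghost-cell idea can be sketched for a 1-D decomposition with a 3-point stencil. The names, the single-cell ghost width, and the fixed 0.0 boundary value are assumptions made for this example:

```python
def exchange_ghosts(chunks):
    """Add one ghost cell on each side of every chunk, copied from the
    neighboring chunk (or a fixed 0.0 value at the domain boundary)."""
    padded = []
    for i, chunk in enumerate(chunks):
        left = chunks[i - 1][-1] if i > 0 else 0.0
        right = chunks[i + 1][0] if i < len(chunks) - 1 else 0.0
        padded.append([left] + chunk + [right])
    return padded

def stencil_step(padded):
    """3-point average over interior cells; ghosts are read, never written,
    so each chunk can be updated independently after the exchange."""
    return [[(p[j - 1] + p[j] + p[j + 1]) / 3.0 for j in range(1, len(p) - 1)]
            for p in padded]
```

After each step the ghost cells are refreshed from the neighbors, so tasks only synchronize at the exchange, not during the update itself.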

How are the data chunks (and tasks) assigned to threads?

•  Data chunks are associated with tasks and are assigned to threads statically or dynamically, via scheduling:
   •  Static: when the amount of computation within tasks is uniform and predictable
   •  Dynamic: to achieve a good balance despite variability in the computation needed by each chunk
      •  Requires many (more) tasks than threads.
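Dynamic scheduling as described above can be sketched with idle threads pulling the next chunk from a shared queue; the function name, the `sum` stand-in workload, and the sorted return (to make the nondeterministic completion order comparable) are choices made for this sketch:

```python
import threading
import queue

def run_dynamic(tasks, n_workers):
    """Dynamic scheduling: each idle worker pulls the next chunk from a
    shared queue, so faster workers naturally take more chunks."""
    q = queue.Queue()
    for t in tasks:
        q.put(t)
    results, lock = [], threading.Lock()

    def worker():
        while True:
            try:
                t = q.get_nowait()
            except queue.Empty:
                return                 # no chunks left: worker exits
            r = sum(t)                 # stand-in for a variable-cost computation
            with lock:                 # mutual exclusion on the shared result list
                results.append(r)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return sorted(results)             # completion order is nondeterministic
```

A static assignment would instead hand worker i every i-th chunk up front, which is cheaper but balances poorly when chunk costs vary.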

Concurrent Design Model Features

•  Efficiency
   •  Concurrent applications must run quickly and make good use of processing resources.

•  Simplicity
   •  Easier to understand, develop, debug, verify and maintain.

•  Portability
   •  In terms of threading portability.

•  Scalability
   •  The design should be effective on a wide range of numbers of threads and cores, and sizes of data sets.

Task and Domain Decomposition Patterns

•  Task Decomposition Pattern
   •  Understand the computationally intensive parts of the problem.
   •  Finding tasks (as many as possible…)
      •  Actions that are carried out to solve the problem
      •  Actions are distinct and relatively independent.

•  Data Decomposition Pattern
   •  Data decomposition implied by tasks.
   •  Finding domains:
      •  The most computationally intensive part of the problem is organized around the manipulation of a large data structure.
      •  Similar operations are being applied to different parts of the data structure.
   •  In shared memory programming environments, data decomposition will be implied by task decomposition.

Group and Order Tasks Patterns

•  Group Tasks Pattern
   •  Simplify the problem's dependency analysis:
      •  If a group of tasks must work together on a shared data structure
      •  If a group of tasks are dependent

•  Order Tasks Pattern
   •  Find and correctly account for dependencies resulting from constraints on the order of execution of a collection of tasks.
      •  Temporal dependencies
      •  Specific requirements of the tasks

Data Sharing Pattern

•  Data decomposition might define some data that must be shared among the tasks.

•  Data dependencies can also occur when one task needs access to some portion of another task's local data.
   •  Read-only
   •  Effectively local (accessed by only one of the tasks)
   •  Read-write
      •  Accumulative
      •  Multiple-read/single-write

Design Evaluation Pattern

•  Production of the analysis and decomposition:
   •  Task decomposition to identify concurrency
   •  Data decomposition to identify data local to each task
   •  Groups of tasks and order of groups to satisfy temporal constraints
   •  Dependencies among tasks

•  Design evaluation:
   •  Suitability for the target platform
   •  Design quality
   •  Preparation for the next phase of the design

Not Parallelizable Jobs, Tasks and Algorithms

•  Algorithms with state
•  Recurrences
•  Induction variables
•  Reductions
•  Loop-carried dependencies

The Mythical Man-Month: Essays on Software Engineering, by Fred Brooks. Addison-Wesley Professional, 1995.
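Two of the constructs above can be contrasted in a short sketch. A loop-carried dependency blocks naive parallelization because each iteration consumes the previous one's result; a reduction has the same serial form, but associativity lets it be reorganized into independent partials plus a combine step. Both function names are invented for this illustration:

```python
def prefix_scan(xs):
    """Loop-carried dependency: each iteration needs the previous acc,
    so the iterations cannot run in parallel as written."""
    out, acc = [], 0
    for x in xs:
        acc = acc + x          # depends on the previous iteration
        out.append(acc)
    return out

def parallel_style_sum(xs, n_chunks=4):
    """A reduction rewritten in the shape a parallel runtime would use:
    independent partial sums per chunk, then a single combine."""
    size = max(1, (len(xs) + n_chunks - 1) // n_chunks)
    partials = [sum(xs[i:i + size]) for i in range(0, len(xs), size)]  # independent
    return sum(partials)                                               # combine
```

The reduction escape hatch relies on `+` being associative; true recurrences like the prefix scan need more specialized parallel algorithms.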

Algorithm Structures

•  Organizing by tasks
   •  Task parallelism
   •  Divide and conquer

•  Organizing by data decomposition
   •  Geometric decomposition
   •  Recursive data

•  Organizing by flow of data
   •  Pipeline
   •  Event-based coordination

Algorithm Structure Decision Tree (Major Organizing Principle)

Start
•  Organize by tasks
   •  Linear → Task Parallelism
   •  Recursive → Divide and Conquer
•  Organize by data decomposition
   •  Linear → Geometric Decomposition
   •  Recursive → Recursive Data
•  Organize by flow of data
   •  Linear → Pipeline
   •  Recursive → Event-Based Coordination

Divide and Conquer Strategy

[Figure: a problem is repeatedly split into subproblems, each subproblem is solved, and the subsolutions are merged pairwise back into the final solution.]

Divide and Conquer Parallel Strategy

[Figure: the same split / base-case solve / merge tree, where each dashed-line box represents a task.]
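The split / base-case solve / merge structure above can be sketched as a depth-limited parallel merge sort, where each recursive half at the top levels becomes a task run by its own thread. The names and the `depth` cutoff (to avoid spawning a thread per tiny subproblem) are choices made for this sketch:

```python
import threading

def merge(a, b):
    """Merge two sorted lists (the 'merge' boxes in the figure)."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    return out + a[i:] + b[j:]

def parallel_mergesort(xs, depth=2):
    """Split, solve each half (in a new task while depth > 0), then merge."""
    if len(xs) <= 1:
        return xs                                  # base-case solve
    mid = len(xs) // 2
    left_half, right_half = xs[:mid], xs[mid:]     # split
    if depth > 0:
        result = {}
        def solve(key, part):
            result[key] = parallel_mergesort(part, depth - 1)
        t = threading.Thread(target=solve, args=("l", left_half))
        t.start()
        solve("r", right_half)                     # current thread takes one half
        t.join()
        return merge(result["l"], result["r"])     # merge
    return merge(parallel_mergesort(left_half, 0),
                 parallel_mergesort(right_half, 0))
```

The depth limit is the usual granularity control for divide and conquer: below the cutoff the subproblems are solved serially, matching the coarse-grained advice earlier in the deck.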

Recursive Data Strategy

•  Involves an operation on a recursive data structure that appears to require sequential processing:
   •  Lists
   •  Trees
   •  Graphs

•  The recursive data structure is completely decomposed into individual elements.
•  Structure in the form of a loop (top-level structure).
•  Simultaneously updating all elements of the data structure (synchronization).

•  Examples:
   •  Partial sums of a linked list
   •  List ranking
   •  Euler tours and ear decomposition
   •  Finding roots of trees in a forest of rooted directed trees

•  Uses:
   •  Widely used on SIMD platforms (HPF77)
   •  Combinatorial optimization problems
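The "partial sums of a linked list" example above is classically solved with pointer jumping: every node, in parallel, adds its successor's value and then points two hops ahead, so after O(log n) rounds each node holds the sum of its suffix. The sketch below simulates the parallel rounds serially on an array-encoded list (the function name and encoding are assumptions for this illustration):

```python
def list_partial_sums(next_idx, vals):
    """Pointer jumping: next_idx[i] is node i's successor (None at the tail).
    Each round, every live node adds its successor's value and jumps its
    pointer ahead; the inner loop simulates one parallel step serially."""
    n = len(vals)
    vals, nxt = vals[:], next_idx[:]
    while any(p is not None for p in nxt):
        new_vals, new_nxt = vals[:], nxt[:]       # all reads use the old state,
        for i in range(n):                        # as a synchronous step would
            if nxt[i] is not None:
                new_vals[i] = vals[i] + vals[nxt[i]]
                new_nxt[i] = nxt[nxt[i]]
        vals, nxt = new_vals, new_nxt
    return vals                                   # vals[i] = sum of i's suffix
```

The double-buffered copies stand in for the synchronization the slide mentions: all elements are updated simultaneously from the previous round's values.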

Pipeline Strategy

•  Involves performing a calculation on many sets of data, where the calculation can be viewed in terms of data flowing through a sequence of stages.
   •  Instruction pipelines in modern CPUs
   •  Vector processing (loop-level pipelining)
   •  Algorithm-level pipelining
      •  Signal processing
      •  Graphics
      •  Shell programs in Unix
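Algorithm-level pipelining can be sketched with one thread per stage connected by queues, much like a Unix shell pipeline; the names and the sentinel-object shutdown are choices made for this sketch:

```python
import threading
import queue

DONE = object()  # sentinel passed down the pipeline to shut each stage down

def stage(fn, q_in, q_out):
    """One pipeline stage: consume from q_in, apply fn, feed q_out."""
    while True:
        item = q_in.get()
        if item is DONE:
            q_out.put(DONE)      # propagate shutdown to the next stage
            return
        q_out.put(fn(item))

def run_pipeline(data, fns):
    """Chain one thread per function; items flow through all stages in order."""
    queues = [queue.Queue() for _ in range(len(fns) + 1)]
    threads = [threading.Thread(target=stage, args=(fn, queues[i], queues[i + 1]))
               for i, fn in enumerate(fns)]
    for t in threads:
        t.start()
    for x in data:
        queues[0].put(x)
    queues[0].put(DONE)
    out = []
    while (item := queues[-1].get()) is not DONE:
        out.append(item)
    for t in threads:
        t.join()
    return out
```

While item k is in the last stage, item k+1 can already be in the previous one, which is exactly the overlap that makes pipelines pay off on long streams.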

Event-Based Coordination Strategy

•  Application decomposed into groups of semi-independent tasks interacting in an irregular fashion.

•  Interaction determined by a flow of data between the groups, implying ordering constraints between the tasks.

[Figure: three task groups exchanging data in an irregular pattern.]

Final Notes

•  Every parallel algorithm involves a collection of tasks that can execute concurrently.
   •  The key is finding tasks (and collecting them).

•  Data-based decomposition is good if:
   •  The most computationally intensive part of the problem is organized around the manipulation of a large data structure.
   •  Similar operations are being applied independently to different parts of the data structure.

•  However, the desired features of a concurrent/parallel program (efficiency, simplicity, portability and scalability) conflict:
   •  Efficiency conflicts with portability.
   •  Efficiency conflicts with simplicity.

•  Thus a good algorithm design must strike a balance between abstraction and portability on one hand, and suitability for a particular target architecture on the other.

Recommended Readings

•  The Art of Concurrency: A Thread Monkey's Guide to Writing Parallel Applications, by Clay Breshears (O'Reilly, 2009)

•  Writing Concurrent Systems, Part 1, by David Chisnall (InformIT Author's Blog: http://www.informit.com/articles/article.aspx?p=1626979)

•  Patterns for Parallel Programming, by T. Mattson, B. Sanders and B. Massingill (Addison-Wesley, 2009). Web site: http://www.cise.ufl.edu/research/ParallelPatterns/

•  Designing and Building Parallel Programs, by Ian Foster, http://www.mcs.anl.gov/~itf/dbpp/

Class: Delayed Work

•  Review Chapter 2 of Designing and Building Parallel Programs, by Ian Foster, http://www.mcs.anl.gov/~itf/dbpp/

•  Solve exercises 1 and 2 in the Exercises section.
•  Imagine a (conceptual) solution for a real-world, highly complex problem to solve on campus.

•  Read http://www.cs.wisc.edu/multifacet/papers/ieeecomputer08_amdahl_multicore.pdf
