DAA Unit V: Parallel Algorithms and Concurrent Algorithms
By Prof. B. A. Khivsara, Assistant Professor,
Department of Computer Engineering, SNJB's COE Chandwad
Outline
• Sequential and Parallel Computing
• RAM and PRAM Models
• Amdahl's Law
• Brent's Theorem
• Parallel Algorithm Analysis and optimal parallel algorithms
• Graph Problems
• Concurrent Algorithms
• Dining Philosophers Problem
Sequential Vs Parallel Computing

Sequential | Parallel
Sequential programming involves a consecutive and ordered execution of processes, one after another. | Parallel programming involves the concurrent computation or simultaneous execution of processes or threads at the same time.
Processes run one after another, in succession. | Multiple processes execute at the same time.
Simple to implement. | Complex to implement.
Processor utilization is poor. | Processor utilization is efficient.
Random Access Machine (RAM) Model

RAM model of serial computers:
• Memory is a sequence of words, each capable of containing an integer.
• Each memory access takes one unit of time.
• Basic operations (add, multiply, compare) take one unit of time.
• Instructions are not modifiable.
• Read-only input tape, write-only output tape.
RAM Sequential Algorithm Steps

• A READ phase, in which the processor reads a datum from a memory location and copies it into a register.
• A COMPUTE phase, in which the processor performs a basic operation on data from one or two of its registers.
• A WRITE phase, in which the processor copies the contents of an internal register into a memory location.
Parallel Algorithm

In computer science, a parallel algorithm, as opposed to a traditional serial algorithm, is an algorithm which can be executed a piece at a time on many different processing devices, and then combined together again at the end to get the correct result.
Approaches to parallelism

Shared-memory parallel processing:
• A thread consists of a single flow of control, a program counter, a call stack, and a small amount of thread-specific data.
• Threads share memory, and communicate by reading and writing to that memory.

Message-passing parallel processing:
• A process is a thread that has its own private memory.
• Processes send messages to one another.
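The two approaches can be contrasted in a small Python sketch. This is illustrative only, not from the slides: the names `shared`, `worker`, and `sender` are invented, and Python threads stand in for processes on the message-passing side.

```python
import threading
import queue

# --- Shared-memory parallel processing: threads share one address space ---
shared = {"count": 0}
lock = threading.Lock()          # needed because threads write the same memory

def worker(n):
    for _ in range(n):
        with lock:               # communicate by reading/writing shared memory
            shared["count"] += 1

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
assert shared["count"] == 4000

# --- Message-passing parallel processing: no shared state, only messages ---
inbox = queue.Queue()

def sender(msgs):
    for m in msgs:
        inbox.put(m)             # communicate only by sending messages

s = threading.Thread(target=sender, args=([1, 2, 3],))
s.start(); s.join()
received = [inbox.get() for _ in range(3)]
assert received == [1, 2, 3]
```

Without the lock, the shared-memory version could lose updates; the message-passing version avoids shared state entirely, which is the trade-off the two approaches represent.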
PRAM [Parallel Random Access Machine]

PRAM is composed of:
• P processors, each with its own unmodifiable program.
• A single shared memory composed of a sequence of words, each capable of containing an arbitrary integer.
• A read-only input tape.
• A write-only output tape.

The PRAM model is a synchronous, MIMD, shared address space parallel computer.

[Figure: the PRAM model of computation — processors connected to a shared memory]
PRAM model CharacteristicsPRAM model Characteristics
At each unit of time, a processor is eitheractive or idle (depending on id)At each unit of time, a processor is eitheractive or idle (depending on id)
All processors execute same programAll processors execute same programAll processors execute same programAll processors execute same program
At each time step, all processors execute sameinstruction on different data (“data-parallel”)At each time step, all processors execute sameinstruction on different data (“data-parallel”)
Focuses on concurrency onlyFocuses on concurrency only
Variants of PRAM model

                 Exclusive Write   Concurrent Write
Exclusive Read        EREW              ERCW
Concurrent Read       CREW              CRCW

Concurrent (C) means many processors can do the operation simultaneously on the same memory location; Exclusive (E) means they cannot.
More PRAM taxonomy

EREW - exclusive read, exclusive write
• A program isn't allowed to have two processors access the same memory location at the same time.
CREW - concurrent read, exclusive write
CRCW - concurrent read, concurrent write
• Needs a protocol for arbitrating write conflicts.
CROW - concurrent read, owner write
• Each memory location has an official "owner".
Sub-variants of CRCW

Common CRCW
• Concurrent write succeeds iff all processors write the same value.
Arbitrary CRCW
• An arbitrary value from the write set is stored.
Priority CRCW
• The value of the minimum-index processor is stored.
Combining CRCW
• A combination (e.g., sum or maximum) of the written values is stored.
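A hypothetical sketch of how the sub-variants resolve one concurrent write to a single cell; the function `crcw_write` and its policy names are invented for this illustration.

```python
def crcw_write(writes, policy):
    """Resolve a concurrent write. `writes` maps processor index -> value."""
    values = list(writes.values())
    if policy == "common":
        # write succeeds only if all processors write the same value
        assert len(set(values)) == 1, "Common CRCW requires identical values"
        return values[0]
    if policy == "arbitrary":
        return values[0]                  # any one write "wins"
    if policy == "priority":
        return writes[min(writes)]        # minimum-index processor wins
    if policy == "combining":
        return sum(values)                # e.g. combine by summation

writes = {2: 5, 0: 7, 1: 3}
assert crcw_write(writes, "priority") == 7    # processor 0 has the min index
assert crcw_write(writes, "combining") == 15  # 5 + 7 + 3
```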
Amdahl's law

Amdahl's law, also known as Amdahl's argument, is used to find the maximum expected improvement to an overall system when only part of the system is improved. It is often used in parallel computing to predict the theoretical maximum speedup using multiple processors.
In the case of parallelization, Amdahl's law states that if P is the proportion of a program that can be made parallel and (1 − P) is the proportion that cannot be parallelized (serial), then the maximum speedup that can be achieved by using N processors is

S(N) = 1 / ((1 − P) + P/N)

As N goes to infinity, the maximum speedup S(N) = 1/(1 − P).
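The formula can be sketched directly as a function; `amdahl_speedup` is an illustrative helper name, checked here against the worked examples that follow in these slides.

```python
def amdahl_speedup(P, N):
    """Maximum speedup when fraction P of the work is parallelizable over N processors."""
    return 1.0 / ((1 - P) + P / N)

# Example 1 below: 95% parallel on 8 CPUs -> about 5.9x
assert round(amdahl_speedup(0.95, 8), 1) == 5.9
# Limitations example below: 75% parallel on 4 vs 40 processors
assert round(amdahl_speedup(0.75, 4), 3) == 2.286
assert round(amdahl_speedup(0.75, 40), 3) == 3.721
```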
Amdahl's law Example 1

95% of a program's execution time occurs inside a loop that can be executed in parallel. What is the maximum speedup we should expect from a parallel version of the program executing on 8 CPUs?

S = 1 / (0.05 + (1 − 0.05)/8) ≈ 5.9
Amdahl's law Example 2

5% of a parallel program's execution time is spent within inherently sequential code. The maximum speedup achievable by this program, regardless of how many PEs are used, is

lim (p→∞) 1 / (0.05 + (1 − 0.05)/p) = 1/0.05 = 20
Limitations of Amdahl's law

Amdahl's law only applies to cases where the problem size is fixed.

Example
If 75% of a process can be parallelized, and there are four processors, then the possible speedup is
1 / ((1 − 0.75) + 0.75/4) = 2.286
But with 40 processors the speedup is only
1 / ((1 − 0.75) + 0.75/40) = 3.721
This has led to the conclusion that having lots of processors won't help very much.
Brent's Theorem

Brent's theorem specifies that for a sequential algorithm with t time steps and a total of m operations, a run time T is definitely possible on a shared memory machine (PRAM) with p processors.

It may be possible to implement the algorithm faster (by scheduling instructions differently to minimise idle processors, for instance), but it is definitely possible to implement the algorithm in this time given p processors.

Brent's Theorem: Given a parallel algorithm A that performs m computation steps in time t, then p processors can execute algorithm A in time

T = t + (m − t)/p
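The bound can be written as a one-line helper; `brent_time` is an invented name, and the two checks preview the summation examples worked out below.

```python
import math

def brent_time(t, m, p):
    """Brent's bound on parallel run time: T = t + (m - t)/p."""
    return t + (m - t) / p

# Sequential chain sum of n numbers: t = n, m = n -> T = n regardless of p
assert brent_time(100, 100, 8) == 100

# Balanced-tree sum of n = 1024 numbers on p = 16 processors:
# t = log2(n) = 10 steps along the longest chain, m = n operations
n, p = 1024, 16
t = math.log2(n)
assert brent_time(t, n, p) == 10 + (1024 - 10) / 16   # 73.375
```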
Brent's Theorem Example

Say we are summing an array:

    for (i = 0; i < length(a); i++)
        sum = sum + a[i];

Using this algorithm, each add operation depends on the result of the previous one, forming a chain of length n, thus t = n. There are n operations, so m = n.

T = n + 0/p

So no matter how many processors are available, this algorithm will take time n.
Brent's Theorem Example

Now consider adding the array in parallel:
sum(a) = ((A0 + A1) + (A2 + A3)) + ((A4 + A5) + (A6 + A7))

We can add A2 to A3 without needing to know what A0 + A1 is.

For an array of length n, the longest chain(s) will be of length log(n). t = log(n), m = n.

T = log(n) + (n − log(n))/p
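The tree-shaped sum can be simulated sequentially to count the parallel rounds; this sketch (with the invented name `tree_sum`) checks that the number of rounds matches log2(n) for the eight-element example above.

```python
import math

def tree_sum(a):
    """Pairwise reduction; each while-iteration is one parallel time step."""
    rounds = 0
    while len(a) > 1:
        # all the pairs in one round could be added simultaneously
        a = [a[i] + a[i + 1] for i in range(0, len(a) - 1, 2)] + \
            ([a[-1]] if len(a) % 2 else [])
        rounds += 1
    return a[0], rounds

total, rounds = tree_sum([1, 2, 3, 4, 5, 6, 7, 8])
assert total == 36
assert rounds == 3 == math.log2(8)   # log(n) rounds for n = 8
```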
Brent's Theorem: Analysis/Importance

This tells us many useful things about the algorithm:
• If we have n processors, the algorithm can be implemented in log(n) time.
• If we have n/log(n) processors, the algorithm can be implemented in 2log(n) time (asymptotically this is the same as log(n) time).
• If we have 1 processor, the algorithm can be implemented in n time.
Brent's Theorem: Analysis/Importance

If we consider the amount of work done in each case: with one processor we do n work, with n/log(n) processors we do about 2n work, but with n processors we do n log(n) work.

The implementations with 1 or n/log(n) processors are therefore cost optimal, while the implementation with n processors is not.

It is important to remember that Brent's theorem does not tell us how to implement any of these algorithms in parallel; it merely tells us what is possible.
Efficient and optimal parallel algorithms: Analysis

A parallel algorithm is efficient iff it is fast (e.g. polynomial time) and the product of the parallel time and the number of processors is close to the time of the best known sequential algorithm:

T_sequential ≈ T_parallel × N_processors

A parallel algorithm is optimal iff this product is of the same order as the best known sequential time.
Metrics: Speedup, Efficiency and Cost

Speedup: S = Time(sequential algorithm, Ts) / Time(parallel algorithm, Tp)
Efficiency: E = S / N, where N is the number of processors
Cost: C = N × Tp
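The three metrics can be sketched together; the function name and the example numbers are invented for illustration.

```python
def metrics(Ts, Tp, N):
    """Return (speedup S = Ts/Tp, efficiency E = S/N, cost C = N*Tp)."""
    S = Ts / Tp
    return S, S / N, N * Tp

# e.g. sequential time 64 units, parallel time 10 units on N = 8 processors
S, E, C = metrics(64, 10, 8)
assert S == 6.4 and E == 0.8 and C == 80
```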
Metrics: Analysis

A parallel algorithm is cost-optimal if parallel cost = sequential time:
• Cp = T1
• Ep = 100%

Critical when down-scaling: a parallel implementation may become slower than the sequential one, e.g.:
• T1 = n^3
• Tp = n^2.5 when p = n^2
• Cp = n^4.5
Analysis of a parallel algorithm: Example

Adding n numbers in parallel

Example for 8 numbers: we start with 4 processors and each of them adds 2 items in the first step. The number of items is halved at every subsequent step. Hence log n steps are required for adding n numbers. The processor requirement is O(n).

A parallel algorithm is analyzed mainly in terms of its time, processor and work complexities.
• Time complexity T(n): How many time steps are needed? T(n) = O(log n)
• Processor complexity P(n): How many processors are used? P(n) = O(n)
• Work complexity W(n): What is the total work done by all the processors? W(n) = O(n log n)
Graph Problems: Topic Overview

• General framework
• Minimum Spanning Tree: Prim's Algorithm
• Single-Source Shortest Paths: Dijkstra's Algorithm
• Bipartite Graph

Representation of Graphs
(a) An undirected graph and (b) a directed graph.
Graph Problem: General Framework

Let M be an n×n matrix with nonnegative integer coefficients. Let M* be an n×n matrix defined as:
• M*(i,i) = 0 for every i
• M*(i,j) = min { M(i0,i1) + M(i1,i2) + … + M(ik−1,ik) } for every i ≠ j,
where i0 = i, ik = j, and the min is taken over all sequences i0, i1, …, ik.
Algorithm for M*

M[i,j] = M[i,j] for 1 ≤ i, j ≤ n
for r = 1 to log n do {
    q[i,k,j] = M[i,k] + M[k,j] for 1 ≤ i, j, k ≤ n
    In parallel set M[i,j] = min { q[i,1,j], q[i,2,j], …, q[i,n,j] } for 1 ≤ i, j ≤ n
}
Put M*(i,i) = 0 for all i and M*(i,j) = M[i,j] for i ≠ j
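A sequential sketch of the repeated min-plus squaring (the parallelism is only indicated in comments; names are invented). One assumption differs from the slide's wording: the diagonal is zeroed before squaring so that shorter sequences survive each round.

```python
import math

def min_plus_square(M):
    n = len(M)
    # In parallel (conceptually): q[i][k][j] = M[i][k] + M[k][j],
    # then M[i][j] = min over k of q[i][k][j]
    return [[min(M[i][k] + M[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def m_star(M):
    """M*(i,i) = 0; M*(i,j) = min total weight over all sequences from i to j."""
    n = len(M)
    # zero the diagonal so shorter sequences survive each squaring
    A = [[0 if i == j else M[i][j] for j in range(n)] for i in range(n)]
    for _ in range(math.ceil(math.log2(n))):   # log n squarings suffice
        A = min_plus_square(A)
    return A
```

Run on the 6×6 matrix of the example below, this reproduces the stated M* (e.g. M*(1,5) = 0 and M*(3,1) = 1).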
Example

[Figure: a directed graph on vertices 1–6]

M =
0 0 1 1 1 0
1 0 0 0 0 1
1 1 0 1 1 1
1 1 0 0 1 1
1 1 1 0 0 0
0 0 1 0 1 0

M* =
0 0 0 0 0 0
0 0 0 0 0 0
1 1 0 1 1 1
1 1 0 0 1 1
0 0 0 0 0 0
0 0 0 0 0 0

M*(1,5) = 0 since M(1,2) + M(2,5) = 0
M*(3,1) = 1 since for every choice of i1, i2, …, ik−1 the sum M(i0,i1) + M(i1,i2) + … is > 0
Minimum Spanning Tree

A spanning tree of an undirected graph G is a subgraph of G that is a tree containing all the vertices of G.

In a weighted graph, the weight of a subgraph is the sum of the weights of the edges in the subgraph.

A minimum spanning tree (MST) for a weighted undirected graph is a spanning tree with minimum weight.
Minimum Spanning Tree
An undirected graph and its minimum spanning tree.
Minimum Spanning Tree: Prim's Algorithm

Prim's algorithm for finding an MST is a greedy algorithm.
Start by selecting an arbitrary vertex and include it in the current MST.
Grow the current MST by inserting into it the vertex closest to one of the vertices already in the current MST.
Minimum Spanning Tree: Prim's Algorithm

Prim's sequential minimum spanning tree algorithm.
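Since the figure with the pseudocode is not reproduced in this transcript, here is a minimal sequential sketch of Prim's algorithm; the function name is invented, and the input is an adjacency matrix with `float('inf')` marking absent edges.

```python
INF = float("inf")

def prim_mst_weight(W, start=0):
    """Total weight of an MST of the graph given by adjacency matrix W."""
    n = len(W)
    in_mst = [False] * n
    in_mst[start] = True
    d = W[start][:]              # d[v]: lightest edge from v into the current MST
    total = 0
    for _ in range(n - 1):
        # pick the vertex closest to the current MST -- this is the inner
        # loop that the parallel formulation distributes across processes
        u = min((v for v in range(n) if not in_mst[v]), key=lambda v: d[v])
        total += d[u]
        in_mst[u] = True
        for v in range(n):       # update d with edges leaving u
            if not in_mst[v] and W[u][v] < d[v]:
                d[v] = W[u][v]
    return total

# tiny example: triangle with weights 1, 2, 3 -> MST weight 1 + 2 = 3
assert prim_mst_weight([[INF, 1, 3], [1, INF, 2], [3, 2, INF]]) == 3
```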
Prim's Algorithm: Parallel Formulation

The algorithm works in n outer iterations; it is hard to execute these iterations concurrently.
The inner loop is relatively easy to parallelize. Let p be the number of processes, and let n be the number of vertices.
The adjacency matrix is partitioned in a 1-D block fashion, with the distance vector d partitioned accordingly.
In each step, a processor selects the locally closest node, followed by a global reduction to select the globally closest node.
This node is inserted into the MST, and the choice is broadcast to all processors.
Each processor updates its part of the d vector locally.
Prim's Algorithm: Parallel Formulation

The partitioning of the distance array d and the adjacency matrix A among p processes.
Prim's Algorithm: Parallel Analysis

• The cost of a broadcast is O(log p).
• The cost of the local update of the d vector is O(n/p).
• The parallel time per iteration is O(n/p + log p).
• The total parallel time for n vertices with n iterations is O(n²/p + n log p).
• The corresponding isoefficiency is O(p² log² p).
Single-Source Shortest Paths

For a weighted graph G = (V, E, w), the single-source shortest paths problem is to find the shortest paths from a vertex v ∈ V to all other vertices in V.

Dijkstra's algorithm is similar to Prim's algorithm. It maintains a set of nodes for which the shortest paths are known.

It grows this set based on the node closest to the source, using one of the nodes in the current shortest path set.
Single-Source Shortest Paths: Dijkstra's Algorithm

Dijkstra's sequential single-source shortest paths algorithm.
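As with Prim's algorithm, the figure's pseudocode is not in the transcript, so here is a minimal sequential sketch; the function name is invented and the input is an adjacency matrix with `float('inf')` for absent edges.

```python
INF = float("inf")

def dijkstra(W, src=0):
    """l[v]: shortest distance from src to v; grows the known set greedily."""
    n = len(W)
    known = [False] * n
    l = [INF] * n
    l[src] = 0
    for _ in range(n):
        # pick the unknown node closest to the source -- the step each
        # process performs locally in the parallel formulation
        u = min((v for v in range(n) if not known[v]), key=lambda v: l[v])
        known[u] = True
        for v in range(n):       # relax edges out of the newly added node
            if not known[v] and l[u] + W[u][v] < l[v]:
                l[v] = l[u] + W[u][v]
    return l

# edges 0-1:4, 0-2:1, 1-2:2, 1-3:5, 2-3:8 -> distances from 0
assert dijkstra([[0, 4, 1, INF],
                 [4, 0, 2, 5],
                 [1, 2, 0, 8],
                 [INF, 5, 8, 0]]) == [0, 3, 1, 8]
```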
Dijkstra's Algorithm: Parallel Formulation

The weighted adjacency matrix is partitioned using the 1-D block mapping.
Each process selects, locally, the node closest to the source, followed by a global reduction to select the next node.
The node is broadcast to all processors and the l-vector is updated.
The parallel performance of Dijkstra's algorithm is identical to that of Prim's algorithm.
Dijkstra's Algorithm: Parallel Analysis

• The cost of a broadcast is O(log p).
• The cost of the local update of the distance vector is O(n/p).
• The parallel time per iteration is O(n/p + log p).
• The total parallel time for n vertices with n iterations is O(n²/p + n log p).
• The corresponding isoefficiency is O(p² log² p).
A Bipartite Graph

It is an undirected graph G = (V, E) in which V can be partitioned into two (disjoint) sets V1 and V2 such that (u, v) ∈ E implies either u ∈ V1 and v ∈ V2, or vice versa.

That is, all edges go between the two sets V1 and V2, but are not allowed within V1 or V2.
A Bipartite Graph Example
Graph (Bipartite) Matching

Graph G = (V, E).
A graph matching M is a subset of E:
• if e1, e2 are distinct edges in M,
• then they have no vertex in common.
A matching in a bipartite graph is a set of edges chosen in such a way that no two edges share an endpoint.
A Bipartite Graph MatchingA Bipartite Graph Matching
[Figure: bipartite graph with vertex sets U = {1, 2, 3, 4, 5} and V = {6, 7, 8, 9}, shown (a) unmatched and (b) with matching edges highlighted]
Matching M = {(1,6), (3,7), (5,9)}
|M| = 3 edges
No edge in M touches nodes 2, 4, or 8, so these are free nodes.
(a) Bipartite Graph (b) Bipartite Graph Matching
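The "no shared endpoint" property and the free nodes of this example can be verified in a few lines (`is_matching` is a hypothetical helper, not from the slides):

```python
def is_matching(edges):
    """A set of edges is a matching iff no vertex appears in two edges."""
    seen = set()
    for u, v in edges:
        if u in seen or v in seen:
            return False
        seen.update((u, v))
    return True

M = {(1, 6), (3, 7), (5, 9)}
matched = {x for e in M for x in e}
free = set(range(1, 10)) - matched
print(is_matching(M))  # True
print(sorted(free))    # [2, 4, 8] -- the free nodes from the example
```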
A Bipartite Graph: Maximum Matching
Consider free node 4: add the new edge (4,7) to M and remove edge (3,7) from M, so node 3 becomes free.
Blue edges denote original edges in M, green edges denote edges removed from M, and red edges denote edges newly added to M.
Consider the new free node 3: add edge (3,9) to M and remove edge (5,9) from M, so node 5 becomes free.
A Bipartite Graph: Maximum Matching
The new free node is 5, so the new edge (5,8) is added to M. Now only node 2 is free, but adding edge (2,6) is not feasible. No node remains in the free list, so stop.
There is an alternating path (4,7), (7,3), (3,9), (9,5), (5,8). The path 4-7-3-9-5-8 is an augmenting path P, as it starts and ends at free nodes.
New M = {(1,6), (3,9), (4,7), (5,8)}
|M| = 4 edges
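The update from M to the new M is exactly the symmetric difference M ⊕ P, which can be replayed on the example's own numbers (the `augment` helper is ours):

```python
def augment(M, P):
    """M XOR P: keep edges in exactly one of the two sets. Edges become
    frozensets so (3, 7) and (7, 3) compare equal."""
    M = {frozenset(e) for e in M}
    P = {frozenset(e) for e in P}
    return M ^ P  # symmetric difference grows the matching by one edge

M = [(1, 6), (3, 7), (5, 9)]
P = [(4, 7), (7, 3), (3, 9), (9, 5), (5, 8)]  # augmenting path 4-7-3-9-5-8
new_M = augment(M, P)
print(sorted(tuple(sorted(e)) for e in new_M))
# [(1, 6), (3, 9), (4, 7), (5, 8)] -- |M| went from 3 to 4
```

The matched edges on P (here (3,7) and (5,9)) are dropped and the unmatched edges are added; since P has one more unmatched edge than matched, |M| grows by one.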
Augmenting Path P in G, Given Graph Matching M
• An augmenting path p = (e1, e2, …, ek)
  begins and ends at unmatched vertices, and requires
  e1, e3, e5, …, ek ∈ E − M
  e2, e4, …, e(k−1) ∈ M
Maximum Matching Algorithm
• Algorithm:
  input: graph G = (V, E)
  [1] M ← {}
  [2] while there exists an augmenting path P relative to M do
          M ← M ⊕ P
  [3] output: maximum matching M
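A runnable sequential sketch of this scheme is Kuhn's augmenting-path algorithm. The adjacency below is reconstructed from the edges visible in the running example, so treat it as an assumption:

```python
def max_matching(adj_U):
    """Augmenting-path maximum matching for a bipartite graph (Kuhn's
    algorithm): DFS for an augmenting path from each U-vertex.
    adj_U maps each U-vertex to its V-neighbours."""
    match_V = {}  # V-vertex -> U-vertex currently matched to it

    def try_augment(u, visited):
        for v in adj_U[u]:
            if v in visited:
                continue
            visited.add(v)
            # v is free, or its current partner can be re-matched elsewhere
            if v not in match_V or try_augment(match_V[v], visited):
                match_V[v] = u
                return True
        return False

    for u in adj_U:
        try_augment(u, set())
    return {(u, v) for v, u in match_V.items()}

# Edges taken from the slides' example: U = {1..5}, V = {6..9}
adj = {1: [6], 2: [6], 3: [7, 9], 4: [7], 5: [8, 9]}
M = max_matching(adj)
print(len(M))  # 4 -- node 2 stays free, matching the example
```

Each successful `try_augment` call is one application of M ← M ⊕ P: re-matching along the DFS path flips matched and unmatched edges.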
Parallel Algorithm for Bipartite Graph
For parallelism, we discover multiple vertex-disjoint augmenting paths simultaneously.
When searching for vertex-disjoint augmenting paths, we build alternating forests.
On parallel platforms, we can search for and build (vertex-disjoint) alternating forests in two ways:
(1) Source parallel (DFS style): each processor builds one alternating tree in the forest, starting from an unmatched vertex.
(2) Level parallel (BFS style): all processors together build one level of the BFS forest, then move to the next level.
The first approach employs coarse-grained parallelism, because a processor builds a whole tree.
The second approach employs fine-grained parallelism, since a processor can process a single vertex.
Parallelizing Augmenting Path Searches
[Figure: the two strategies on a small bipartite graph with vertices a–e and a'–e'. Source parallel (DFS based, coarse grained): processors T1 and T2 each grow one alternating tree. Level parallel (BFS based, fine grained): T1 and T2 together build levels L0, L1, L2, L3 of the forest.]
Parallel Algorithm for Bipartite Graph
However, note that this searching is not the same as traditional BFS/DFS.
For example, it is not a single parallel DFS, because of the following differences:
(a) We search from multiple sources, i.e., we run multiple DFSs simultaneously.
(b) The paths are vertex-disjoint, which requires an extra check. For example, edges marked with "x" are not part of the forest, to maintain the vertex-disjointness property.
(c) The paths are alternating, i.e., every other step is trivial.
Parallelizing Augmenting Path Searches
[Figure: parallel DFS on the same graph (vertices a–e, a'–e'); trees T1 and T2 illustrate the three differences: 1. multiple source, 2. vertex-disjoint, 3. alternating DFS]
Outline
Sequential and Parallel Computing
RAM and PRAM Models
Amdahl's Law
Brent's Theorem
Parallel Algorithm Analysis and Optimal Parallel Algorithms
Graph Problems
Concurrent Algorithms
Dining Philosophers Problem
Concurrent Algorithms
Concurrency is a property of systems in which several computations are executing simultaneously, and potentially interacting with each other.
The computations may be executing on multiple cores in the same chip, or as preemptively time-shared threads on the same processor.
Models of concurrent computation include the Parallel Random Access Machine (PRAM) model and the Actor model.
Advantages of Concurrency
Concurrent processes can reduce duplication in code.
The overall runtime of the algorithm can be significantly reduced.
More real-world problems can be solved than with sequential algorithms alone.
Redundancy can make systems more reliable.
Disadvantages of Concurrency
Runtime is not always reduced, so careful planning is required.
Concurrent algorithms can be more complex than sequential algorithms.
Shared data can be corrupted.
Communication between tasks is needed.
Achieving Concurrency
Many computers today have more than one processor (multiprocessor machines).
[Figure: two CPUs (CPU 1, CPU 2) connected to a shared memory over a bus]
Achieving Concurrency
Concurrency can also be achieved on a computer with only one processor:
The computer "juggles" jobs, swapping its attention to each in turn.
"Time slicing" allows many users to get CPU resources.
Tasks may be suspended while they wait for something, such as device I/O.
[Figure: one CPU switching among task 1, task 2, and task 3; suspended tasks are shown sleeping]
Concurrency vs. Parallelism
Concurrency is the execution of multiple tasks at the same time, regardless of the number of processors.
Parallelism is the execution of multiple processors on the same task.
Types of Concurrent Systems
Multiprogramming
Multiprocessing
Multitasking
Distributed Systems
Examples of Concurrent Algorithms
Producer/Consumer Bounded-Buffer
Readers and Writers
Dining Philosophers…
Outline
Sequential and Parallel Computing
RAM and PRAM Models
Amdahl's Law
Brent's Theorem
Parallel Algorithm Analysis and Optimal Parallel Algorithms
Graph Problems
Concurrent Algorithms
Dining Philosophers Problem
Dining Philosophers Problem
Philosophers spend their lives alternating between thinking and eating.
Five philosophers are seated around a circular table.
There is a shared bowl of rice.
In front of each one is a plate.
Between each pair of philosophers there is a chopstick, so there are five chopsticks.
It takes two chopsticks to take/eat rice, so while philosopher n is eating, neither n+1 nor n−1 can be eating.
Dining Philosophers Problem
Each one thinks for a while, gets the chopsticks/forks needed, eats, and puts the chopsticks/forks down again, in an endless cycle.
The problem illustrates the difficulty of allocating resources among processes without deadlock and starvation.
Dining Philosophers Problem
The challenge is to grant requests for chopsticks while avoiding deadlock and starvation.
Deadlock can occur if everyone tries to get their chopsticks at once: each gets a left chopstick and is stuck, because each right chopstick is someone else's left chopstick.
The Dining Philosophers Problem
Philosophers:
• Think
• Take forks (one at a time)
• Eat
• Put forks (one at a time)
Eating requires 2 forks.
Pick up one fork at a time.
How to prevent deadlock?
What about starvation?
What about concurrency?
The idea is to capture the concept of multiple processes competing for limited resources.
Dining Philosopher’s Problem(Dijkstra ’71)
What can go wrong?
Starvation: a policy that can leave some philosopher hungry in some situation.
Deadlock: a policy that leaves all the philosophers "stuck", so that nobody can do anything at all.
An Incorrect Solution: Deadlock Is Possible
[Figure: all five philosophers each hold one fork; the marker in the figure means "waiting for this fork"]
An Incorrect Solution: Starvation Is Possible
[Figure: two philosophers alternate eating while a third is blocked indefinitely: starvation]
Dining Philosophers Solution
Each philosopher is a process, with one semaphore per fork:
fork: array[0..4] of semaphores
Initialization: fork[i].count := 1 for i := 0..4

Process Pi:
repeat
    think;
    wait(fork[i]);
    wait(fork[i+1 mod 5]);
    eat;
    signal(fork[i+1 mod 5]);
    signal(fork[i]);
forever

Note: deadlock is possible if each philosopher starts by picking up his left fork!
Dining Philosophers Solution: Possible Solutions to Avoid Deadlock
Allow at most four philosophers to be sitting simultaneously at the table.
Allow a philosopher to pick up the forks only if both are available.
Use an asymmetric solution: an odd-numbered philosopher picks up first the left chopstick and then the right chopstick; an even-numbered philosopher picks up first the right chopstick and then the left chopstick.
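The asymmetric strategy can be sketched in Python with threading.Lock standing in for the fork semaphores (the thread structure, meal count, and names are illustrative assumptions, not from the slides):

```python
import threading

N = 5
forks = [threading.Lock() for _ in range(N)]
done = []

def philosopher(i, meals=3):
    left, right = forks[i], forks[(i + 1) % N]
    # Asymmetric order: odd philosophers take the left fork first, even
    # take the right first, so a cycle of "everyone holds one fork and
    # waits for the next" cannot form.
    first, second = (left, right) if i % 2 else (right, left)
    for _ in range(meals):
        with first:
            with second:
                pass  # eat
    done.append(i)

threads = [threading.Thread(target=philosopher, args=(i,)) for i in range(N)]
for t in threads: t.start()
for t in threads: t.join()
print(sorted(done))  # [0, 1, 2, 3, 4] -- everyone finished, no deadlock
```

Breaking the uniform left-then-right ordering removes the circular-wait condition, which is one of the four necessary conditions for deadlock.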
Dining Philosophers Solution
A solution: admit only 4 philosophers at a time that try to eat.
Then 1 philosopher can always eat when the other 3 are holding 1 fork.
Hence, we can use another semaphore T that limits to 4 the number of philosophers "sitting at the table".
Initialize: T.count := 4

Process Pi:
repeat
    think;
    wait(T);
    wait(fork[i]);
    wait(fork[i+1 mod 5]);
    eat;
    signal(fork[i+1 mod 5]);
    signal(fork[i]);
    signal(T);
forever
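The pseudocode above maps directly onto threading.Semaphore in Python; this is a minimal sketch, and the `table`/`finished` names and the meal count are our assumptions:

```python
import threading

N = 5
forks = [threading.Semaphore(1) for _ in range(N)]
table = threading.Semaphore(N - 1)  # T: at most 4 philosophers at the table
finished = []

def philosopher(i, meals=3):
    for _ in range(meals):
        table.acquire()               # wait(T)
        forks[i].acquire()            # wait(fork[i])
        forks[(i + 1) % N].acquire()  # wait(fork[i+1 mod 5])
        # eat
        forks[(i + 1) % N].release()  # signal(fork[i+1 mod 5])
        forks[i].release()            # signal(fork[i])
        table.release()               # signal(T)
    finished.append(i)

threads = [threading.Thread(target=philosopher, args=(i,)) for i in range(N)]
for t in threads: t.start()
for t in threads: t.join()
print(sorted(finished))  # [0, 1, 2, 3, 4] -- all terminate, no deadlock
```

With at most 4 philosophers holding forks among 5 forks, the pigeonhole argument from the slide guarantees that at least one of them can acquire both forks, so progress is always possible.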