CS 584: Sorting
Post on 21-Dec-2015
Sorting
One of the most common operations
Definition:
– Arrange an unordered collection of elements into a monotonically increasing or decreasing order.
Two categories of sorting
– internal (fits in memory)
– external (uses auxiliary storage)
Sorting Algorithms
Comparison based
– compare-exchange
– O(n log n)
Noncomparison based
– uses known properties of the elements
– O(n), e.g. bucket sort
Parallel Sorting Issues
Input and output sequence storage
– Where? Local to one processor or distributed
Comparisons
– How to compare elements on different nodes
Number of elements per processor
– One (compare-exchange --> communication)
– Multiple (compare-split --> communication)
Sorting Networks
Specialized hardware for sorting
– based on comparators
– a comparator takes two inputs x and y; an increasing comparator outputs min{x,y} then max{x,y}, a decreasing comparator outputs max{x,y} then min{x,y}
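The comparator element can be sketched in Python (a sequential stand-in for the hardware element; the function name is ours):

```python
def comparator(x, y, increasing=True):
    # An increasing comparator emits (min, max); a decreasing one emits (max, min).
    lo, hi = (x, y) if x <= y else (y, x)
    return (lo, hi) if increasing else (hi, lo)
```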
Parallel Sorting Algorithms
Merge Sort
Quick Sort
Bitonic Sort
Others …
Merge Sort
Simplest parallel sorting algorithm?
Steps
– Distribute the elements
– Everybody sorts their own sequence
– Merge the lists
Problem
– How to merge the lists
Quicksort
Simple, low overhead
O(n log n) on average
Divide and conquer
Recursively divide into smaller subsequences.
Quicksort
n elements stored in A[1…n]
Divide
– Divide a sequence into two parts
– A[q…r] becomes A[q…s] and A[s+1…r]
– make all elements of A[q…s] smaller than or equal to all elements of A[s+1…r]
Conquer
– Recursively apply Quicksort to each part
Quicksort
Partition the sequence A[q…r] by picking a pivot.
Performance is greatly affected by the choice of the pivot.
If we pick a bad pivot, we end up with an O(n^2) algorithm.
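The sequential algorithm as a sketch (the slides do not fix a partitioning scheme, so we use the last element as the pivot with Lomuto-style partitioning):

```python
def quicksort(A, q, r):
    # Sort A[q..r] (inclusive bounds) in place.
    if q >= r:
        return
    pivot = A[r]                       # last element as the pivot
    s = q - 1                          # boundary of the "<= pivot" region
    for i in range(q, r):
        if A[i] <= pivot:
            s += 1
            A[s], A[i] = A[i], A[s]
    A[s + 1], A[r] = A[r], A[s + 1]    # place the pivot between the two parts
    quicksort(A, q, s)                 # elements <= pivot
    quicksort(A, s + 2, r)             # elements > pivot
```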
Parallelizing Quicksort
Task parallelism
– At each step of the algorithm, 2 recursive calls are made.
– Farm out one of the recursive calls to another processor.
Problems
– The work of partitioning is done by one processor.
Parallelizing Quicksort
Consider domain decomposition.
Hypercube
– a d-dimensional hypercube can be split into two (d-1)-dimensional hypercubes such that each processor in one cube is connected to one in the other cube.
If all processors know the pivot, neighbors split their respective lists: all elements larger than the pivot are distributed to one subcube, and smaller elements are distributed to the other subcube.
Parallelizing Quicksort
After we go through each dimension, if n > p the numbers are not totally sorted.
– Why?
Each processor then sorts its own sublist using a sequential quicksort.
Pivot selection is particularly important
– Bad pivots eliminate some processors (they end up with empty sublists)
Pivot Selection
Random selection
– During the ith split, one of the processors in each subcube picks a random element from its list and broadcasts it to the others.
Problem
– What if a bad pivot is selected at first?
Pivot Selection
Median selection
– If the distribution is uniform, then each processor's list is a representative sample, thus its median is representative.
Problem
– Is the distribution really uniform?
– Can we assume that a single processor's list has the same distribution as the full list?
Procedure HypercubeQuickSort(B)
  sort B using sequential quicksort
  for i = 1 to d
    select pivot and broadcast, or receive pivot
    partition B into B1 and B2 such that B1 <= pivot < B2
    if ith bit of iproc is zero then
      send B2 to neighbor along ith dimension
      C = subsequence received along ith dimension
      merge B1 and C into B
    else
      send B1 to neighbor along ith dimension
      C = subsequence received along ith dimension
      merge B2 and C into B
    endif
  endfor
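A sequential simulation of the procedure (our sketch: "processors" are lists, the pivot is the subcube-wide median rather than one picked by a single processor, and message exchange is modeled with list comprehensions):

```python
def hypercube_quicksort_sim(data, d=3):
    p = 2 ** d
    # Distribute the elements; everyone sorts locally first.
    procs = [sorted(data[i::p]) for i in range(p)]
    for dim in range(d - 1, -1, -1):        # one split per dimension
        mask = 1 << dim
        size = mask << 1                    # current subcube size
        nxt = [None] * p
        for base in range(0, p, size):      # each subcube agrees on one pivot
            members = list(range(base, base + size))
            pool = sorted(x for r in members for x in procs[r])
            pivot = pool[len(pool) // 2] if pool else 0
            for r in members:
                partner = r ^ mask
                keep_low = (r & mask) == 0  # low half of the subcube keeps <= pivot
                mine = [x for x in procs[r] if (x <= pivot) == keep_low]
                recv = [x for x in procs[partner] if (x <= pivot) == keep_low]
                nxt[r] = sorted(mine + recv)  # merging keeps the sublist sorted
        procs = nxt
    # Concatenating by rank yields the globally sorted sequence.
    return [x for r in range(p) for x in procs[r]]
```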
Analysis
Iterations = log2 p
Select a pivot = O(n)
– keep sublist sorted
Broadcast pivot = O(log2 p)
Split the sequence
– split own sequence = O(log n/p)
– exchange blocks with neighbor = O(n/p)
– merge blocks = O(n/p)
Analysis
Quicksort appears very scalable
Depends heavily on the pivot
Easy to parallelize
Hypercube sorting algorithms depend on the ability to map a hypercube onto the node communication architecture.
Bitonic Sort
Key operation:
– rearrange a bitonic sequence into an ordered one
Bitonic sequence
– a sequence of elements <a0, a1, … , an-1> such that:
– there exists i such that <a0, … , ai> is monotonically increasing and <ai+1, … , an-1> is monotonically decreasing, or
– there exists a cyclic shift of indices such that the above is satisfied.
Bitonic Sequences
<1, 2, 4, 7, 6, 0>
– First it increases, then decreases
– i = 3
<8, 9, 2, 1, 0, 4>
– Consider a cyclic shift
– i will equal 2 or 3
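The definition can be checked by brute force (our helper name; it tries every cyclic shift, so it is O(n^2) and only meant for small examples like the ones above):

```python
def is_bitonic(seq):
    # True if some cyclic shift of seq increases monotonically, then decreases.
    n = len(seq)
    for shift in range(n):
        s = seq[shift:] + seq[:shift]
        i = 0
        while i + 1 < n and s[i] <= s[i + 1]:   # climb the increasing part
            i += 1
        while i + 1 < n and s[i] >= s[i + 1]:   # descend the decreasing part
            i += 1
        if i == n - 1:                          # the whole shift was consumed
            return True
    return False
```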
Rearranging a Bitonic Sequence
Let s = <a0, a1, … , an-1>
– an/2 is the beginning of the decreasing sequence
Let s1 = <min{a0, an/2}, min{a1, an/2+1}, … , min{an/2-1, an-1}>
Let s2 = <max{a0, an/2}, max{a1, an/2+1}, … , max{an/2-1, an-1}>
In sequence s1 there is an element bi = min{ai, an/2+i}
– all elements before bi are from the increasing part
– all elements after bi are from the decreasing part
Sequence s2 has a similar point
Sequences s1 and s2 are bitonic
Rearranging a Bitonic Sequence
Every element of s1 is smaller than every element of s2.
Thus, we have reduced the problem of rearranging a bitonic sequence of size n to rearranging two bitonic sequences of size n/2 and then concatenating the results.
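The split-and-recurse step as code (our sketch; restricted to power-of-two lengths, the usual setting for bitonic networks):

```python
def bitonic_merge(s, increasing=True):
    # Rearrange a bitonic sequence s (length a power of 2) into sorted order.
    n = len(s)
    if n == 1:
        return list(s)
    half = n // 2
    s1 = [min(s[i], s[i + half]) for i in range(half)]  # bitonic; all <= every element of s2
    s2 = [max(s[i], s[i + half]) for i in range(half)]  # bitonic
    if not increasing:
        s1, s2 = s2, s1
    # Solve two half-size problems and concatenate.
    return bitonic_merge(s1, increasing) + bitonic_merge(s2, increasing)
```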
What about unordered lists?
To use the bitonic merge for n items, we must first have a bitonic sequence of n items.
Two elements form a bitonic sequence
Any unsorted sequence is a concatenation of bitonic sequences of size 2
Merge those into larger bitonic sequences until we end up with a bitonic sequence of size n
Creating a Bitonic Sequence
[Figure: a bitonic merging network combining 16 unsorted values, stage by stage, into a single bitonic sequence; the 16 wires are labeled 0000 through 1111.]
Mapping onto a hypercube
One element per processor
Start with the sorting network map
Each wire represents a processor
Map processors to wires to minimize the distance traveled during exchanges
Bitonic Sort
Procedure BitonicSort
  for i = 0 to d-1
    for j = i downto 0
      if (i+1)st bit of iproc <> jth bit of iproc then
        comp_exchange_max(j, item)
      else
        comp_exchange_min(j, item)
      endif
    endfor
  endfor
comp_exchange_max and comp_exchange_min compare and exchange the item with the neighbor along the jth dimension, keeping the larger or the smaller item respectively.
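An equivalent sequential simulation of this network (our sketch: one item per "processor", with the swap direction for each pair taken from the (i+1)st bit of the lower rank, matching the comp_exchange_max/min rule above):

```python
def bitonic_sort_sim(items):
    a = list(items)
    p = len(a)
    d = p.bit_length() - 1
    assert p == 1 << d, "needs a power-of-two number of items"
    for i in range(d):                # grow sorted bitonic runs: 2, 4, ..., p
        for j in range(i, -1, -1):    # for j = i downto 0
            for iproc in range(p):
                partner = iproc ^ (1 << j)
                if partner < iproc:
                    continue          # handle each pair once, from the lower rank
                # this pair sorts ascending iff the (i+1)st bit of the rank is 0
                ascending = ((iproc >> (i + 1)) & 1) == 0
                if (a[iproc] > a[partner]) == ascending:
                    a[iproc], a[partner] = a[partner], a[iproc]
    return a
```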
Assignment
Pick 16 random integers
Draw the Bitonic Sort network
Step through the Bitonic Sort network to produce a sorted list of integers.
Explain how the if statement in the Bitonic Sort algorithm works.