Algorithms Devised for a Google Interview


© Russell John Childs, PhD. Date: 2015-03-21. Algorithms for calculating order statistic.

Module K-STATISTIC Specifications

Description: Given an unsigned integer, k, and a set of arrays, S := {A}, the k smallest items in S shall be found and returned.

Specifications:

K-STATISTIC-1: "A" shall be a sorted array of unsigned, non-duplicative integers.
K-STATISTIC-2: "S" shall be a set of sorted arrays, A.
K-STATISTIC-3: An unsigned integer, "k", shall be within the range 1 <= k <= |D|.
K-STATISTIC-4: "R" shall be a sorted array of unsigned integers.
K-STATISTIC-5: The k smallest elements, over all arrays, A, in the set S, shall be found and returned in R.
K-STATISTIC-6: T shall be the number of threads allocated to the module.
K-STATISTIC-7: The time-order-of-complexity of this module shall be assessed.

Interface Specifications:

K-STATISTIC-8: This module shall provide the interface: template<typename Type>

vector<Type> k_statistic(const vector< vector<Type> >& S, unsigned k)

K-STATISTIC-8.1: k_statistic shall take the set, S := {A}, as an argument.
K-STATISTIC-8.2: k_statistic shall take k as an argument.
K-STATISTIC-8.3: k_statistic shall take a template parameter, Type, which shall map to an integer type.
K-STATISTIC-9: k_statistic shall return the set R.
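A minimal usage sketch of the interface required by K-STATISTIC-8 and K-STATISTIC-9 follows. The body shown here is an assumed reference behaviour (successive std::merge calls truncated to k), not the multi-threaded implementation listed later in this document, which operates on pair-valued ElementArrays:

#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>

using std::vector;

// Assumed reference behaviour for K-STATISTIC-8/9: merge the sorted arrays in S
// and return the k smallest elements, sorted, in R.
template<typename Type>
vector<Type> k_statistic(const vector< vector<Type> >& S, unsigned k)
{
    vector<Type> R;
    for (const auto& A : S)
    {
        //Merge the next sorted array into the running result
        vector<Type> merged;
        std::merge(R.begin(), R.end(), A.begin(), A.end(),
                   std::back_inserter(merged));
        R.swap(merged);
        //Only the k smallest need be retained between merges
        if (R.size() > k)
        {
            R.resize(k);
        }
    }
    return R;
}

int main()
{
    //S satisfies K-STATISTIC-1/2: sorted arrays of non-duplicative unsigned ints
    vector< vector<unsigned> > S = { { 1, 4, 9 }, { 2, 3, 10 }, { 0, 5, 6 } };
    for (unsigned x : k_statistic(S, 4))  //expected: 0 1 2 3
    {
        std::cout << x << " ";
    }
    std::cout << std::endl;
    return 0;
}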

Data Specifications:

K-STATISTIC-10: Each sorted array, A, shall be no larger than 1MB (2^20 bytes).
K-STATISTIC-11: "D" shall be an ordered dataset of unsigned integers.
K-STATISTIC-12: "D" shall be partitioned into the set S := {A}, and each A shall satisfy K-STATISTIC-10.

Module K-STATISTIC Analysis

Multi-threaded Bucket Sort Analysis:

Let B be a bucket sort. Let N be the number of items to be sorted. Let M be the number of buckets in B. Let B_i be the i-th bucket in B. Let N_i be the number of items in B_i.

Items are placed in a container, C, in bucket B_i. Two cases arise:

(1) C is a vectorised, ordered linked-list, s.t. consecutive nodes belong to each of T threads, where T is the thread-count, and each inserted item is multi-casted to all threads. Insertions and searches are O(N_i / T); copying to linear memory is O(N_i / T). Thus, for T approaching N_i it is possible to achieve O(1) insertion, search and copy. Thread-overhead may make SIMD a more suitable choice.

(2) C is a non-vectorised ordered set. Item insertion/search is O(log N_i), where N_i is governed by the entropy, H, of the key distribution, H = -sum_i p_i log p_i with p_i = N_i / N. The full sort is O(sum_i N_i log N_i) = O(N (log N - H)). The entropy of the integers directly affects N_i: a uniform distribution has the lowest N_i, N_i = N/M (H = log M); a delta-function has the highest, N_i = N (H = 0).

With one thread per array, the per-thread cost is O(k log N_i), since each thread needs to process no more than k items from each array.
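As an illustration of case (2), the sketch below distributes keys into M ordered buckets (one std::set per bucket), single-threaded; the bucket count M = 4 and the key range [0, 99] are arbitrary example choices. Each insertion into bucket i costs O(log N_i), so the bucket occupancy, and hence the entropy of the key distribution, governs the overall cost:

#include <iostream>
#include <set>
#include <vector>

//Distribute N keys drawn from [lower, upper] into M ordered buckets.
//Bucket occupancy N_i, and hence the O(log N_i) insertion cost, depends
//on the entropy of the key distribution, as discussed above.
int main()
{
    const unsigned M = 4;                  //bucket count
    const unsigned lower = 0, upper = 99;  //key range
    std::vector<unsigned> keys = { 42, 7, 73, 0, 99, 15, 64, 31 };

    std::vector< std::set<unsigned> > buckets(M);
    const double scale = double(M - 1) / double(upper - lower);
    for (unsigned key : keys)
    {
        unsigned i = unsigned((key - lower) * scale);  //bucket index
        buckets[i].insert(key);                        //O(log N_i)
    }

    //Concatenating the buckets in order yields the sorted sequence
    for (const auto& bucket : buckets)
    {
        for (unsigned key : bucket)
        {
            std::cout << key << " ";
        }
    }
    std::cout << std::endl;  //0 7 15 31 42 64 73 99
    return 0;
}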

Multi-threaded Merge Sort Analysis:

Two sorted arrays are merged according to the following prescription. Let m be the value of the median element of the left-hand array, L, where |L| is its size. Let b be the position of m in the right-hand array, R, found through a binary search. Array L is split about its mid-point into L_lo = L[0, |L|/2) and L_hi = L[|L|/2, |L|). Array R is split about b into R_lo = R[0, b) and R_hi = R[b, |R|). The sub-array pairs (L_lo, R_lo) and (L_hi, R_hi) are then merged independently and recombined, as the thread diagram later in this section depicts.
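One level of this splitting step can be sketched as follows, assuming std::vector ranges; merge_by_split is an illustrative helper only, and the full scheme (merge_pair in the listing below) recurses on each half in a separate task rather than calling std::merge:

#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>

//One level of the vectorised merge: split L about its median value and R
//about that value's insertion point, then merge (lower, lower) and
//(upper, upper) independently. The full algorithm recurses on each half.
std::vector<unsigned> merge_by_split(const std::vector<unsigned>& L,
                                     const std::vector<unsigned>& R)
{
    auto l_mid = L.begin() + (L.end() - L.begin()) / 2;         //mid-point of L
    auto r_mid = std::upper_bound(R.begin(), R.end(), *l_mid);  //binary search in R

    std::vector<unsigned> out;
    std::merge(L.begin(), l_mid, R.begin(), r_mid, std::back_inserter(out));
    std::merge(l_mid, L.end(), r_mid, R.end(), std::back_inserter(out));
    return out;
}

int main()
{
    std::vector<unsigned> L = { 1, 3, 5, 7 };
    std::vector<unsigned> R = { 2, 4, 6, 8 };
    for (unsigned x : merge_by_split(L, R))  //1 2 3 4 5 6 7 8
    {
        std::cout << x << " ";
    }
    std::cout << std::endl;
    return 0;
}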

Case 1: the number of threads, T, is unbounded. The order of complexity for merging two arrays of equal size k is given by the recurrence C(k) = C(k/2) + O(log k), i.e. C(k) = O(log^2 k), since each bifurcation costs one binary search and the two halves are merged in parallel.

Given N arrays of equal size and an infinite number of threads, the arrays may be merged in pairs over log N rounds to give a critical path of O(log N * log^2 k).

For 1 billion arrays and a k-statistic of 1 billion, the critical path would be of order log2(10^9) * log2^2(10^9), roughly 30 * 900, i.e. about 3 * 10^4 operations.

Case 2: the number of threads, T, is finite. The expression for this case splits T between T_m threads for each vectorised pair-merge and T_r threads for the reduction by pairs, so each of the log N reduction rounds processes its pair-merges N/(2 T_r) at a time, with each merge sped up by at most a factor of T_m. NB: T = T_m * T_r.
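An illustrative sketch of the reduction by pairs is given below, assuming one std::async task per pair-merge and k-truncation after each merge; the helper merge_k is for illustration only, and the intra-merge vectorisation and task queue of the full listing are omitted for clarity:

#include <algorithm>
#include <cstddef>
#include <future>
#include <iostream>
#include <iterator>
#include <vector>

//Merge two sorted arrays and keep only the k smallest elements
std::vector<unsigned> merge_k(const std::vector<unsigned>& a,
                              const std::vector<unsigned>& b, unsigned k)
{
    std::vector<unsigned> out;
    std::merge(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
    if (out.size() > k)
    {
        out.resize(k);
    }
    return out;
}

int main()
{
    const unsigned k = 4;
    std::vector< std::vector<unsigned> > arrays =
        { { 1, 5, 9, 13 }, { 2, 6, 10, 14 }, { 3, 7, 11, 15 }, { 4, 8, 12, 16 } };

    //Reduce by pairs: N -> N/2 -> N/4 -> ... -> 1, one task per pair-merge
    while (arrays.size() > 1)
    {
        std::vector< std::future< std::vector<unsigned> > > tasks;
        for (std::size_t i = 0; i + 1 < arrays.size(); i += 2)
        {
            tasks.push_back(std::async(std::launch::async, merge_k,
                                       arrays[i], arrays[i + 1], k));
        }
        std::vector< std::vector<unsigned> > next;
        for (auto& t : tasks)
        {
            next.push_back(t.get());
        }
        if (arrays.size() % 2 != 0)  //odd array carried to the next round
        {
            next.push_back(arrays.back());
        }
        arrays.swap(next);
    }

    for (unsigned x : arrays[0])     //k smallest: 1 2 3 4
    {
        std::cout << x << " ";
    }
    std::cout << std::endl;
    return 0;
}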

(Diagram: bifurcation tree of threads 0-6 merging sub-array chunks, with synchronisation of pushes to the task queue, producing the final ordered array.)

Results (using: Visual Studio 2013, Intel Quad-Core 2.6 GHz i7-3720QM, 8 GB RAM, Dell Precision M4700 Mobile Workstation):

For 1, 2 and 4 threads, the vectorised bucket sort is around five times faster than successive, single-threaded std::merge operations. Sadly, the performance decreases with thread-count, indicating thread-overhead is an issue. Performance-profiling with Intel VTune Amplifier has not yet been undertaken, so the amount of time spent in locks, RFOs and memory fetches to cache is unknown.

The vectorised merge-sort is very slow. This may, again, be the result of thread-overhead and thread oversubscription.

Overall, it is surmised that this sort of vectorisation is better accomplished through SIMD or FPGAs, where each "thread" performs relatively few operations and the thread-count can be far higher.

/**
 * \brief Algorithms for calculating the Kth order statistic given multiple,
 *        sorted arrays. This compiles under Visual Studio 2013. As yet,
 *        code has not been ported to Eclipse and Linux. Code is untested
 *        but available for review.
 * \details This file contains a multi-threaded bucket sort and two methods.
 *          One of the methods uses the bucket sort to obtain the Kth order
 *          statistic and the other uses a merge-sort whose merge is vectorised.
 * \author Russell John Childs, PhD.
 * \date 2015-03-21
 * \copyright Russell John Childs, PhD.
 */
#include <vector>
#include <set>
#include <queue>
#include <chrono>
#include <thread>
#include <future>
#include <condition_variable>
#include <random>
#include <iostream>
#include <sstream>
#include <atomic>      // std::atomic
#include <mutex>       // std::mutex, std::lock_guard, std::unique_lock
#include <functional>  // std::function
#include <algorithm>   // std::sort, std::merge, std::copy, std::min, std::max
#include <cmath>       // std::log2

/**
 * @namespace Sort
 * @brief The namespace for k statistics sorting
 */
namespace Sort
{

/** \class BucketSort
 * @param Type. The type of the element in the array to be sorted
 *
 * @param BucketCount. The number of buckets for the sort
 *
 * @param Comp. The comparator used to order elements within a bucket.
 *
 * Note: sort() and find() take a predicate "unsigned operator()(const Type&)".
 * This must take a parameter of type const Type& and return a unique unsigned
 * integer within [lower, upper] specified in this class's constructor.
 */
template< typename Type, unsigned BucketCount, typename Comp = std::less<Type> >
class BucketSort
{
public:

    /**
     * @param lower : unsigned. The lower value of the range.
     *
     * @param upper : unsigned. The upper value of the range.
     *
     * Notes: If an element lies outside the range it is ignored and not
     * included in the final, sorted result.
     */
    BucketSort(unsigned lower, unsigned upper) :
        m_lower(lower),
        m_upper(upper),
        m_normalisation(static_cast<long double>(BucketCount - 1) /
                        static_cast<long double>(upper - lower)),
        m_size(0)
    {
        //Reset all the buckets
        reset();
    }

    /**
     * No operations specified.
     */
    ~BucketSort(void)
    {
    }

    /**
     * Empties the buckets
     */
    void reset(void)
    {
        m_buckets.clear();
        m_buckets.resize(BucketCount);
        m_size = 0;
    }

    /**
     * Sorts the passed array.
     * @param arr : Type. The fixed array, of type Type, to be sorted
     *
     * Notes: This is a convenience function for handling fixed arrays.
     * It is, at this time, unimplemented and does nothing.
     */
    template<int Size, typename Pred>
    void sort(Type (&arr)[Size], Pred)
    {
        //This method to be implemented at a later date.
    }

    /**
     * Sorts the passed container.
     * @param arr : const Container&. The container to be sorted.
     *
     * @param pred : Pred. A function "unsigned func(const Type& elem)" that
     *               returns the unique unsigned key associated with elem.
     *               Keys outside [lower, upper] cause elem to be skipped.
     *
     * Notes: (1) Container must provide method const Type& operator[](unsigned).
     *        (2) The result is returned by Type& operator[](unsigned) and
     *            get_result(). This allows a succession of arrays to be passed
     *            to sort() before the result is obtained.
     *        (3) sort() may be terminated by pred() returning unsigned(-1).
     *        (4) This implementation uses locks. Conversion to lock-free is
     *            relatively straightforward, but requires substantially more
     *            testing, due to subtle bugs that arise with lock-free.
     */
    template< typename Container, typename Pred >
    void sort(const Container& arr, Pred pred)
    {
        //Get array size
        unsigned size = arr.size();

        //Add each source element to the sorted list
        for (unsigned i = 0; i < size; ++i)
        {
            unsigned key = pred(arr[i]);
            if (m_lower <= key && key <= m_upper)
            {
                unsigned index = unsigned((key - m_lower) * m_normalisation);
                {
                    std::lock_guard<std::mutex> lck(m_mutexes[index]);
                    m_buckets[index].insert(arr[i]);
                    //Update total element count
                    ++m_size;
                }
            }
            else if (key == unsigned(-1))
            {
                //Predicate has signalled that sort should terminate
                i = key - 1;
            }
        }
    }

    /**
     * Returns an element in the sorted array
     * @param index : unsigned. The index of the element in the sorted array.
     *
     * @return const std::vector<Type>&. The list of sorted objects.
     *
     * Notes: 1. This method will be provided at a future date. It requires that
     *           BucketSort utilise a linear, contiguous buffer for its buckets,
     *           allowing for O(1) retrieval of an element. It is, at this time,
     *           not available to the user, who must use the far less efficient
     *           mechanism "get_result()[index]".
     */
    Type& operator[](unsigned index)
    {
    }

    /**
     * Returns a vector containing the sorted elements
     *
     * @param k : unsigned. Number of sorted elements to return. Default = All.
     * @return : const std::vector<Type>&. The sorted elements
     */
    const std::vector<Type>& get_result(unsigned k = 0)
    {
        unsigned size = k;
        if (k == 0)
        {
            //Get total number of elements
            for (auto& bucket : m_buckets)
            {
                size += bucket.size();
            }
        }

        //Resize result vector
        m_result.resize(size);

        //Store sorted result
        unsigned index = 0;
        for (auto& bucket : m_buckets)
        {
            if (index < size)
            {
                auto lim = std::min<unsigned>(bucket.size(), size - index);
                auto iter = bucket.begin();
                for (unsigned i = 0; i < lim; ++i)
                {
                    m_result[index++] = *iter++;
                }
            }
        }

        //Return sorted result
        return m_result;
    }

    /**
     * Finds a specified element
     * @param in : const Type&. The element sought.
     *
     * @param pred : Pred. A function "unsigned func(const Type& elem)" that
     *               returns the unsigned key associated with elem.
     *
     * @param advance : int. For the specified element, finds the element that
     *                  is "advance" elements before (if advance < 0) or
     *                  after (if advance > 0) the specified element "in".
     *
     * @return typename std::set<Type, Comp>::iterator. An iterator to the
     *         element, if present, or end().
     *
     * Notes: This implementation uses locks. Conversion to lock-free is
     *        relatively straightforward, but requires substantially more
     *        testing, due to subtle bugs that arise with lock-free.
     */
    template<typename Pred>
    typename std::set<Type, Comp>::iterator find(const Type& in, Pred pred,
                                                 int advance)
    {
        //Get bucket bounds and bucket index
        int bounds = BucketCount - 1;
        unsigned key = pred(in);
        int index = unsigned((key - m_lower) * m_normalisation);

        //Get beginning and end of the bucket table
        std::unique_lock<std::mutex> lck(m_mutexes[0]);
        auto begin = m_buckets[0].begin();
        lck.unlock();
        std::unique_lock<std::mutex> lck1(m_mutexes[bounds]);
        auto end = m_buckets[bounds].end();
        lck1.unlock();

        //Create return var
        typename std::set<Type, Comp>::iterator ret_val;
        bool is_not_found = index > bounds;
        if (is_not_found == false)
        {
            std::lock_guard<std::mutex> lck(m_mutexes[index]);
            ret_val = m_buckets[index].find(in);
            if (ret_val == m_buckets[index].end())
            {
                ret_val = end;
                is_not_found = true;
            }
        }
        else
        {
            //Out of bounds
            ret_val = end;
        }

        //Increment iterator whilst within bounds
        while (is_not_found == false && advance > 0)
        {
            std::unique_lock<std::mutex> lck(m_mutexes[index]);
            if (ret_val != m_buckets[index].end())
            {
                //Increment if within bounds of current bucket
                ++ret_val;
                --advance;
            }
            else if (++index <= bounds)
            {
                //If within bounds of table, get start of next bucket
                lck.unlock();
                std::lock_guard<std::mutex> lck1(m_mutexes[index]);
                ret_val = m_buckets[index].begin();
            }
            else
            {
                //Out-of-bounds
                ret_val = end;
                is_not_found = true;
            }
        }

        //Decrement iterator whilst within bounds
        while (is_not_found == false && advance < 0)
        {
            std::unique_lock<std::mutex> lck(m_mutexes[index]);
            if (ret_val != m_buckets[index].begin())
            {
                //Decrement if within bounds of current bucket
                --ret_val;
                ++advance;
            }
            else if (--index >= 0)
            {
                //If within bounds of table, go to end of prev bucket
                lck.unlock();
                std::lock_guard<std::mutex> lck1(m_mutexes[index]);
                //auto in = m_buckets[index];
                //ret_val = in.begin() == in.end() ? in.end() : in.end()--;
                ret_val = m_buckets[index].end();
            }
            else
            {
                //Out-of-bounds
                ret_val = begin;
                is_not_found = true;
            }
        }

        return ret_val;
    }

    /**
     * Returns total count of elements in bucket sort
     *
     * @return unsigned. Total count of elements.
     */
    unsigned size(void)
    {
        return m_size;
    }

private:
    unsigned m_lower;
    unsigned m_upper;
    long double m_normalisation;
    unsigned m_size;
    std::vector< std::set<Type, Comp> > m_buckets;
    std::mutex m_mutexes[BucketCount];
    std::vector<Type> m_result;
};

template<typename T = unsigned>
using Element = std::pair<unsigned, T>;

template<typename T>
using ElementArray = std::vector<Element<T>>;

template<typename T>
using ElementArrays = std::vector<ElementArray<T>>;

/** k_statistic
 * @param arrays : ElementArrays. The sorted arrays to be merged
 *
 * @param k : unsigned. The k-statistic, i.e. the k smallest elements in "arrays".
 *
 * Notes: (1) At least one input array must have size >= k. If necessary,
 *            pad the 1st array with values > k-th smallest.
 *        (2) Function uses bucket sort to keep track of the k smallest elements
 *            so far. For each array, if an element is larger than the largest
 *            of the k smallest found so far, then all later elements are
 *            discarded. Otherwise, the set of k smallest elements is updated

 *            with the new element.
 */
namespace KStatisticBucketSort
{

template< typename T >
const ElementArray<T> k_statistic(ElementArrays<T>& arrays, unsigned k)
{
    std::atomic<unsigned> last_found;

    //Find an array, A, with at least k elements
    unsigned index = 0;
    while (arrays[index++].size() < k && index < arrays.size());
    unsigned upper = last_found = arrays[index - 1][k - 1].first;

    //Create a bucket sort shared amongst threads.
    //TODO: Remove magic number (2000) and use dynamically sized buckets
    struct Comp
    {
        bool operator()(const Element<T>& lhs, const Element<T>& rhs)
        {
            return lhs.first < rhs.first;
        }
    };
    BucketSort<Element<T>, 2000, Comp> bucket_sort(0, upper);

    //Add A as the base case
    {
        auto pred = [&](const Element<T>& elem){ return elem.first; };
        bucket_sort.sort(arrays[index - 1], pred);
    }
    unsigned start_array = index - 1;

    //Create a predicate for the bucket sort
    auto pred = [&](const Element<T>& elem)
    {
        //Update last element in list of k smallest found so far
        unsigned old_val = last_found;  //last of the k smallest
        unsigned new_val = old_val;     //new val for last of k smallest
        bool stop_processing = false;   //Flag to continue or discontinue
        do
        {
            old_val = last_found;
            new_val = old_val;
            stop_processing = false;
            if (bucket_sort.size() < k)
            {
                new_val = std::max<unsigned>(elem.first, old_val);
            }
            else if (elem.first > old_val)
            {
                //Simply stop processing array if elem > max(k-smallest)
                new_val = old_val;
                stop_processing = true;
            }
            else
            {
                //Add elem and update max(k-smallest)
                auto tmp = [&](const Element<T>& in){ return in.first; };
                new_val = std::max<unsigned>(elem.first, bucket_sort.find(
                    std::make_pair(old_val, elem.second), tmp, -1)->first);
            }
        } while (last_found.compare_exchange_weak(old_val, new_val) == false &&
                 stop_processing == false);

        return stop_processing == false ? elem.first : unsigned(-1);
    };

    //Create a thread function that adds a new array to the bucket sort
    index = 0;
    std::atomic<bool> start_thread(false);
    std::atomic<unsigned> pop_count(0);
    auto add_array = [&](void)
    {
        //Wait for start signal
        while (start_thread == false);

        //Loop over arrays, "popping" each one processed
        unsigned old_count = pop_count;
        unsigned new_count = old_count + 1;
        while (old_count < arrays.size())
        {
            //Claim an array, by capturing pop count and incrementing
            while (pop_count.compare_exchange_weak(old_count, old_count + 1) == false);

            //Check pop count is within bounds and not the starting array
            if (old_count != start_array && old_count < arrays.size())
            {
                bucket_sort.sort(arrays[old_count], pred);
            }
        }
    };

    //Add arrays to the bucket sort, limit number of threads (4 = magic num,
    //but this is only proof-of-concept code)
    unsigned thread_limit = 4;
    unsigned thread_count = 0;
    std::vector< std::future<void> > results;
    for (auto& arr : arrays)
    {
        //Add array to sort
        results.push_back(std::async(std::launch::async, add_array));

        //If thread lim reached:
        if (thread_count > thread_limit)
        {
            //Wait for existing threads to finish
            start_thread = true;
            for (auto& res : results)
            {
                res.wait();
            }
            //Reset limit checks
            thread_count = 0;
            results.clear();
        }
        ++thread_count;
    }

    //Start threads and wait for results
    start_thread = true;
    for (auto& res : results)
    {
        res.wait();
    }

    //Extract k-th order statistic from bucket table
    return bucket_sort.get_result(k);
}

} //namespace KStatisticBucketSort

/** k_statistic
 * @param arrays : ElementArrays. The sorted arrays to be merged
 *
 * @param k : unsigned. The k-statistic, i.e. the k smallest elements in "arrays".
 *
 * Notes: (1) Function uses merge sort to find smallest elements. The merge is
 *            vectorised.
 *        (2) The implementation of this method is in a STATE OF FLUX.
 *            Focussed on stress-testing Bucket-Sort algorithm.
 */

namespace KStatisticMergeSort
{

template<typename T>
const ElementArray<T> k_statistic(ElementArrays<T>& arrays, unsigned k)
{
    //std::atomic isn't copy-constructible, so need to wrap it for std::vector.
    //Very annoying. Naughty C++ committee.
    struct AtomicUnsigned
    {
        AtomicUnsigned(void) : m_val(new std::atomic<unsigned>(0))
        {
        }

        ~AtomicUnsigned(void)
        {
            delete m_val;
        }

        void operator=(unsigned i)
        {
            *m_val = i;
        }

        void operator+=(unsigned i)
        {
            unsigned old = *m_val;
            while (m_val->compare_exchange_weak(old, old + i) == false);
        }

        operator unsigned(void)
        {
            return *m_val;
        }

        std::atomic<unsigned>* m_val;
    };
    //EOF annoying code-bloat

    //Create sync id for each queue
    std::vector<AtomicUnsigned> sync_id;

    //Struct to hold id, used to sync pushes to queue,
    //and the boundaries of the left and right array chunks to merge
    struct QueueData
    {
        unsigned m_sync_id;
        unsigned m_generations_skipped;
        typename ElementArray<T>::const_iterator m_lhs_first;
        typename ElementArray<T>::const_iterator m_lhs_last;
        typename ElementArray<T>::const_iterator m_rhs_first;
        typename ElementArray<T>::const_iterator m_rhs_last;
    };

    //Create task queues. One queue for each array-pair. Length will halve
    //with each iteration: N/2 -> N/4 -> N/8 ... -> 1 merged array
    std::vector<std::queue<QueueData>> queues;

    //Create array of sorted counts (2*k sorted elems => done!)
    std::vector<AtomicUnsigned> sorted_count;

    //Create pending task count (1 thread spawned per task up to thr limit)
    std::atomic<int> pending_tasks(0);
    std::vector<AtomicUnsigned> queue_tasks;

    //Create merge function. This function should be made lock-free.
    std::vector<std::mutex> mut(arrays.size());

    //Lambda to merge the next pair of array chunks on queue i
    std::function<void(unsigned)> merge_pair = [&](unsigned i)
    {
        //Flag to indicate pair is merged
        bool done = false;

        //Lock queue and peek 1st task
        std::unique_lock<std::mutex> lck(mut[i]);
        QueueData q_data = queues[i].front();

        //Check for termination condition (2*k sorted items)
        if (sorted_count[i] == (k << 1))
        {
            //Check final tree level fully populated
            //Level of node
            unsigned node_level =
                static_cast<unsigned>(std::log2(q_data.m_sync_id + 1) + 1);
            //Depth of tree
            unsigned last_level =
                static_cast<unsigned>(std::log2(sync_id[i] + 1) + 1);
            //Node is at tree depth?
            done = node_level == last_level;
        }

        //Terminate if finished
        if (done == true)
        {
            lck.unlock();
        }
        //Only process queue[i] if we have not reached last node in last level
        else
        {
            //Get latest chunk from queue
            queues[i].pop();
            lck.unlock();

            //Get middle of lhs
            //--------------------      -------------
            //|        | M |     |  OR  | M |       |
            //--------------------      -------------
            auto lhs_middle = q_data.m_lhs_first +
                (q_data.m_lhs_last - q_data.m_lhs_first) / 2;

            //Get "middle" of rhs (insertion point of M)
            //--------------------
            //|   |M'<M |  I>=M  |
            //--------------------
            auto rhs_middle = std::upper_bound(q_data.m_rhs_first,
                q_data.m_rhs_last, *lhs_middle);

            //Create (lhs.lower_half, rhs.lower_half).
            //       ( [beg, mid)   , [beg, I)       )
            QueueData left
            {
                q_data.m_sync_id * 2 + 1,
                q_data.m_generations_skipped,
                q_data.m_lhs_first,
                lhs_middle, //NB: first==mid implies [mid,mid), i.e. "empty"
                q_data.m_rhs_first,
                rhs_middle
            };

            //Create (lhs.upper_half, rhs.upper_half).
            //       ( [mid,last+1) , [I, last+1)    )
            QueueData right
            {
                q_data.m_sync_id * 2 + 2,
                q_data.m_generations_skipped,
                lhs_middle,
                q_data.m_lhs_last,
                rhs_middle,
                q_data.m_rhs_last
            };

            //Termination conditions
            //1. Prev gen lhs sorted (q_data.m_lhs_first==q_data.m_lhs_last)
            //2. All data >= lhs.mid ---> left=empty
            if (right.m_generations_skipped > 0 ||
                q_data.m_lhs_first == q_data.m_lhs_last ||
                /*q_data.m_rhs_first == q_data.m_rhs_last ||*/
                (left.m_lhs_first == left.m_lhs_last &&
                 left.m_rhs_first == left.m_rhs_last))
            {
                //Store in + just keep rhs branch
                right = q_data;
                right.m_sync_id = q_data.m_sync_id * 2 + 2;
                //Keep track of tree levels skipped in bifurcation process
                ++right.m_generations_skipped;
                //Update count of sorted elements
                if (right.m_generations_skipped == 1)
                {
                    sorted_count[i].m_val->fetch_add(
                        ((q_data.m_lhs_last - q_data.m_lhs_first) +
                         (q_data.m_rhs_last - q_data.m_rhs_first)));
                }
            }

            //Wait for synchronisation value
            //          /  \
            //         /    \
            //        / \  / \        wait for right-2 (0 gens skipped)
            //       /\ /\ /\  \      wait for right-2 (1 gen skipped)
            //      /\/\/\/\ /\/\ \   wait for right-4 (2 gens skipped)
            unsigned skipped = std::max<unsigned>(right.m_generations_skipped, 1);
            unsigned prev_id;
            do
            {
                prev_id = right.m_sync_id - (1 << skipped);
                //std::this_thread::sleep_for(std::chrono::microseconds(1));
                std::this_thread::yield();
            } while (sync_id[i].m_val->compare_exchange_weak(prev_id, prev_id)
                     == false);

            //Only push lhs if generation not skipped
            lck.lock();
            if (right.m_generations_skipped == 0)
            {
                //Push chunk to q and update task count
                queues[i].push(left);
                auto& task = *(queue_tasks[i].m_val);
                task++;
                ++pending_tasks;
            }
            //Push rhs chunk, update task count & sync id
            queues[i].push(right);
            auto& task = *(queue_tasks[i].m_val);
            task++;
            ++pending_tasks;
            sync_id[i] = right.m_sync_id;
        }
    };

    //Lambda for performing container.begin()+n
    auto advance = [&](const ElementArray<T>& a,
                       typename ElementArray<T>::const_iterator it,
                       unsigned ind)
    {

        std::advance(it, std::min<unsigned>(ind, a.end() - it));
        return it;
    };

    //Lambda to push array pairs onto queues
    auto push_pairs = [&](const ElementArrays<T>& arrs)
    {
        //Resize queues, task counts, sorted counts, sync ids
        auto size = arrs.size();
        queues.clear();
        queues.resize(size / 2);
        queue_tasks.clear();
        queue_tasks.resize(size / 2);
        sorted_count.clear();
        sorted_count.resize(size / 2);
        sync_id.clear();
        sync_id.resize(queues.size());

        //Loop over array pairs
        unsigned count = 0;
        for (unsigned i = 0; i < (size >> 1) << 1; i += 2)
        {
            //Populate thread id, gens skipped, start/end of lhs of pair,
            //start/end rhs of pair. NB interval = [beg,end) = [0,1,..,k-1,k)
            QueueData data
            {
                0,
                0,
                arrs[i].begin(),
                advance(arrs[i], arrs[i].cbegin(), k),
                arrs[i + 1].cbegin(),
                advance(arrs[i + 1], arrs[i + 1].cbegin(), k)
            };

            //Push chunk to q and update task count
            queues[count].push(data);
            auto& tmp = *(queue_tasks[count++].m_val);
            tmp++;
            ++pending_tasks;
        }
    };

    //Push original arrays
    push_pairs(arrays);

    //Lambda to extract queue results to an array
    auto extract = [&](ElementArrays<T>& arrs)
    {
        unsigned count = 0;
        //Loop over queues
        for (auto& q : queues)
        {
            ElementArray<T> arr;
            while (q.empty() == false)
            {
                //Get start/end of chunk range
                auto pair = q.front();
                q.pop();

                //Only extract if range not empty and elems <= k
                if (pair.m_lhs_first != pair.m_lhs_last)
                {
                    std::copy(pair.m_lhs_first, pair.m_lhs_last,
                        std::back_inserter(arr));
                }
                //Only extract if range not empty and elems <= k
                if (pair.m_rhs_first != pair.m_rhs_last)
                {
                    std::copy(pair.m_rhs_first, pair.m_rhs_last,
                        std::back_inserter(arr));
                }
            }
            //Return sorted array
            arr.resize(k);
            arrs.push_back(arr);
        }
    };

    //This section attempts to balance workload evenly across available threads.
    //
    //For each array-pair in turn, priority is given to vectorising the merge
    //of each pair.
    //
    //Array-pairs are, successively, processed in parallel, until all threads
    //are consumed in the vectorised merges of the pairs processed so far.
    //
    //As each vectorised merge bifurcates (1->2->4->8...) it will consume more
    //threads until all threads are being used to vectorise merges. When this
    //point is reached, array-pair parallel processing ceases, since all threads
    //are vectorising the existing merges.
    //
    //The aim is to minimise the latency of the merge in the hope that this
    //minimises the latency associated with merging all the array-pairs.

    //Storage for results (toggle between ret_val[0<->1], so ret_val[a] has prev
    //results and ret_val[b] has new results in seq N/2 -> N/4 -> N/8)
    ElementArrays<T> ret_val[2];
    unsigned toggle = 0;

    //Vector of futures returned by tasks
    std::vector<std::future<void>> results;

    //Thread limit and count. Magic number, since code=proof-of-concept only
    unsigned thread_limit = 1024;
    unsigned thread_count = 0;

    //Flag to indicate that sorting is complete
    bool done;
    bool once = true;
    do
    {
        done = true;

        //Keep processing any tasks on the queues
        while (pending_tasks > 0)
        {
            //Loop over queues
            unsigned q = 0;
            for (auto& queue : queues)
            {
                //Process tasks for this queue
                std::unique_lock<std::mutex> lck(mut[q]);
                unsigned num = queue_tasks[q];
                for (unsigned task = 0; task < num; ++task)
                {
                    //Limit num threads spawned
                    if (thread_count < thread_limit)
                    {
                        //Spawn 1 thrd/task and update task count
                        results.push_back(std::async(std::launch::async,
                            merge_pair, q));
                        --pending_tasks;
                        auto& tmp = *(queue_tasks[q].m_val);
                        tmp--;
                        ++thread_count;
                    }
                }
                ++q;
                lck.unlock();
            }

            //Wait for current tasks to finish spawning new tasks
            for (auto& res : results)
            {
                res.wait();
                --thread_count;
            }
            results.clear();
        }

        //Extract results
        auto& old_ret_val = ret_val[toggle];
        toggle = (toggle + 1) % 2;
        ret_val[toggle].clear();
        if (once && arrays.size() % 2 != 0)
        {
            ret_val[toggle].push_back(arrays[arrays.size() - 1]);
        }
        once = false;
        if (old_ret_val.size() % 2 != 0)
        {
            ret_val[toggle].push_back(old_ret_val[old_ret_val.size() - 1]);
        }
        extract(ret_val[toggle]);

        //Clear queues and add extracted results
        if (ret_val[toggle].size() > 1)
        {
            queues.clear();
            push_pairs(ret_val[toggle]);
            done = false;
        }
    } while (done == false);

    //Return sorted results
    return ret_val[toggle][0];
}

} //namespace KStatisticMergeSort

} //namespace Sort

namespace TestSort
{

/**
 * @brief Creates random number of randomly-sized, ordered arrays of random numbers.
 *
 * @param k : unsigned. k-th order statistic.
 *
 * @param max_num_arrays : unsigned . Max number of arrays to be sorted.
 *
 * @param max_size_of_array : unsigned . Max number of elements in array.
 *
 * @param out : Sort::ElementArrays<unsigned>& . The arrays to be returned.
 */
template<typename T>
void get_random_arrays(unsigned k, unsigned max_num_arrays,
    unsigned max_size_of_array, Sort::ElementArrays<T>& out)
{
    //Random generator (replace 0 with dev() for random seed)
    std::random_device dev;
    std::mt19937 generator(0/*dev()*/);
    Sort::ElementArrays<unsigned> element_arrays;

    //Create random number for number of arrays
    std::uniform_int_distribution<T> arrays_rnd(2, max_num_arrays);
    unsigned num_arrays = arrays_rnd(generator);

    //Create arrays
    std::set<unsigned> s;
    for (unsigned i = 0; i < num_arrays; ++i)
    {
        //Create random number for sizeof array
        Sort::ElementArray<T> element_array;
        std::uniform_int_distribution<T> elements_rnd(k, max_size_of_array);
        unsigned num_elements = elements_rnd(generator);

        //Create random numbers for elements and add to array
        std::uniform_int_distribution<T> element_rnd(0, 1u << 31);
        for (unsigned j = 0; j < num_elements; ++j)
        {
            T elem = element_rnd(generator);
            /*bool res = s.insert(elem).second;
            if (res == false)
            {
                //std::cout << "duplicate removed" << std::endl;
                j = j - 1;
            }
            else*/
            {
                element_array.push_back(std::make_pair(elem, i));
            }
        }

        //Sort the arrays and return them
        std::sort(element_array.begin(), element_array.end());
        out.push_back(element_array);
    }
}

/**
 * @brief Creates random number of randomly-sized, ordered arrays of random numbers.
 *        Adds all of these arrays to the sorting algorithm and validates the result
 *        against the known result obtained by adding arrays to a sorted set.
 *
 * @param k : unsigned. k-th order statistic.
 *
 * @param max_num_arrays : unsigned . Max number of arrays to be sorted.
 *
 * @param max_size_of_array : unsigned . Max number of elements in array.
 *
 * @return bool : Pass = true.
 */
bool test_bucket(unsigned k, unsigned max_num_arrays, unsigned max_size_of_array)
{
    //Declare arrays
    Sort::ElementArrays<unsigned> test_sort;

    //Print banner
    std::cout << "Creating up to " << max_num_arrays
        << " arrays, each of size <= " << max_size_of_array
        << ". Please wait ..." << std::endl;

    //Create a set of ordered, random integer arrays
    get_random_arrays<unsigned>(k, max_num_arrays, max_size_of_array, test_sort);

    //Print banner
    unsigned ave = 0;
    for (auto& arr : test_sort) ave += arr.size();
    ave /= test_sort.size();
    std::cout << "Testing bucket algorithm with: Num arrays = "
        << test_sort.size() << ", average size = " << ave << ", k = " << k
        << ". Please wait ... " << std::endl;

    //Run sorting algorithm
    auto start = std::chrono::high_resolution_clock::now();
    auto result = Sort::KStatisticBucketSort::k_statistic(test_sort, k);
    auto end = std::chrono::high_resolution_clock::now();

    //Print banner
    std::cout << "Algorithm execution completed, after "
        << std::chrono::duration<double, std::milli>(end - start).count()
        << " ms." << std::endl;

    //Run STL merge
    //Print banner
    std::cout << "Running std::merge. Please wait ..." << std::endl;

    //Start timer
    auto start_1 = std::chrono::high_resolution_clock::now();

    //Merge (out[0]=arr1)+arr2 -> out[1], out[1]+arr2 -> out[0],
    //out[0]+arr3 -> out[1] ....
    std::vector<Sort::Element<unsigned>> output[2];
    output[0].insert(output[0].begin(), test_sort[0].begin(),
        test_sort[0].begin() + k);
    output[0].resize(2 * k);
    output[1].resize(2 * k);
    unsigned toggle = 1;
    for (unsigned i = 1; i < test_sort.size(); ++i)
    {
        toggle = (i - 1) % 2;
        std::merge(output[toggle].begin(), output[toggle].begin() + k,
            test_sort[i].begin(), test_sort[i].begin() + k,
            output[(toggle + 1) % 2].begin());
    }

    //Resize back to k
    output[(toggle + 1) % 2].resize(k);
    auto& validate = output[(toggle + 1) % 2];

    //Stop timer
    auto end_1 = std::chrono::high_resolution_clock::now();

    //Print banner
    std::cout << "std::merge execution completed, after "
        << std::chrono::duration<double, std::milli>(end_1 - start_1).count()
        << " ms." << std::endl;
    std::cout << "Validating results. Please wait ..." << std::endl;

    //Validate that values extracted from set == result from algorithm
    return result == validate;
}

bool test_merge(unsigned k, unsigned max_num_arrays, unsigned max_size_of_array)
{
    using namespace Sort;

    //Declare arrays
    Sort::ElementArrays<unsigned> test_sort;

    //Print banner
    std::cout << "Creating up to " << max_num_arrays
        << " arrays, each of size <= " << max_size_of_array
        << ". Please wait ..." << std::endl;

    //Preliminary debugging tests.
    /*test_sort.push_back(
        { { 0, 0 }, { 2, 2 }, { 4, 4 }, { 6, 6 }, { 8, 8 }, { 10, 10 },
        { 12, 12 }, { 14, 14 }, { 16, 16 }, { 18, 18 }, { 20, 20 }, { 22, 22 },
        { 24, 24 }, { 26, 26 }, { 28, 28 }, { 30, 30 } });
    test_sort.push_back(
        { { 40, 40 }, { 42, 42 }, { 44, 44 }, { 46, 46 }, { 48, 48 },
        { 410, 410 }, { 412, 412 }, { 414, 414 }, { 416, 416 }, { 418, 418 },
        { 420, 420 }, { 422, 422 }, { 424, 424 }, { 426, 426 }, { 428, 428 },
        { 430, 430 } });*/
    /*test_sort.push_back(
        { { 0, 0 }, { 2, 2 }, { 4, 4 }, { 6, 6 }, { 8, 8 }, { 10, 10 },

        { 12, 12 }, { 14, 14 }, { 16, 16 }, { 18, 18 }, { 20, 20 }, { 22, 22 },
        { 24, 24 }, { 26, 26 }, { 28, 28 }, { 30, 30 } });
    test_sort.push_back(
        { { 1, 1 }, { 3, 3 }, { 5, 5 }, { 7, 7 }, { 9, 9 }, { 11, 11 },
        { 13, 13 }, { 15, 15 }, { 17, 17 }, { 19, 19 }, { 21, 21 }, { 23, 23 },
        { 25, 25 }, { 27, 27 }, { 29, 29 }, { 31, 31 } });
    test_sort.push_back(
        { { 1, 1 }, { 3, 3 }, { 5, 5 }, { 7, 7 }, { 9, 9 }, { 11, 11 },
        { 13, 13 }, { 15, 15 }, { 17, 17 }, { 19, 19 }, { 21, 21 }, { 23, 23 },
        { 25, 25 }, { 27, 27 }, { 29, 29 }, { 31, 31 } });
    test_sort.push_back(
        { { 0, 0 }, { 2, 2 }, { 4, 4 }, { 6, 6 }, { 8, 8 }, { 10, 10 },
        { 12, 12 }, { 14, 14 }, { 16, 16 }, { 18, 18 }, { 20, 20 }, { 22, 22 },
        { 24, 24 }, { 26, 26 }, { 28, 28 }, { 30, 30 } });*/
    /*for (unsigned i = 0; i < 4; ++i)
    {
        Sort::ElementArray<unsigned> arr;
        for (unsigned j = 0; j < 2 * k; ++j)
        {
            arr.push_back(std::make_pair(i + (j * 4), i + (j * 4)));
        }
        test_sort.push_back(arr);
    }
    test_sort.pop_back();*/

    //Randomised stress-test.
    get_random_arrays<unsigned>(k, max_num_arrays, max_size_of_array, test_sort);

    //Print banner
    unsigned ave = 0;
    for (auto& arr : test_sort) ave += arr.size();
    ave /= test_sort.size();
    std::cout << "Testing merge algorithm with: Num arrays = "
        << test_sort.size() << ", average size = " << ave << ", k = " << k
        << ". Please wait ... " << std::endl;

    //Run sorting algorithm
    auto start = std::chrono::high_resolution_clock::now();
    auto result = Sort::KStatisticMergeSort::k_statistic(test_sort, k);
    auto end = std::chrono::high_resolution_clock::now();

    //Print banner
    std::cout << "Algorithm execution completed, after "
        << std::chrono::duration<double, std::milli>(end - start).count()
        << " ms." << std::endl;

    //Run STL merge
    //Print banner
    std::cout << "Running std::merge. Please wait ..." << std::endl;

    //Start timer
    auto start_1 = std::chrono::high_resolution_clock::now();

    //Merge (out[0]=arr1)+arr2 -> out[1], out[1]+arr2 -> out[0],
    //out[0]+arr3 -> out[1] ....
    std::vector<Sort::Element<unsigned>> output[2];
    output[0].insert(output[0].begin(), test_sort[0].begin(),
        test_sort[0].begin() + k);
    output[0].resize(2 * k);
    output[1].resize(2 * k);
    unsigned toggle = 1;
    for (unsigned i = 1; i < test_sort.size(); ++i)
    {
        toggle = (i - 1) % 2;
        std::merge(output[toggle].begin(), output[toggle].begin() + k,
            test_sort[i].begin(), test_sort[i].begin() + k,
            output[(toggle + 1) % 2].begin());
    }

    //Resize back to k
    output[(toggle + 1) % 2].resize(k);
    auto& validate = output[(toggle + 1) % 2];

    //Stop timer
    auto end_1 = std::chrono::high_resolution_clock::now();

    //Print banner
    std::cout << "std::merge execution completed, after "
        << std::chrono::duration<double, std::milli>(end_1 - start_1).count()
        << " ms." << std::endl;
    std::cout << "Validating results. Please wait ..." << std::endl;

    //Validate that values extracted from set == result from algorithm
    return result == validate;
}

} //namespace TestSort

int main(void)
{
    bool result_bucket = TestSort::test_bucket(1000, 1000, 10000);
    std::cout << "Bucket Sort Algorithm: "
        << (result_bucket ? "Passed." : "Failed.") << std::endl;
    std::cout << std::endl;

    bool result_merge = TestSort::test_merge(100, 500, 10000);
    std::cout << "Merge Sort Algorithm: "
        << (result_merge ? "Passed." : "Failed.") << std::endl;

    return 0;
}