
Page 1: The C++20 Synchronization Library...Bryce Adelstein Lelbach CUDA C++ Core Libraries Lead ISO C++ Library Evolution Incubator Chair, ISO C++ Tooling Study Group Chair THE C++20 SYNCHRONIZATION

unique_future<std::uint64_t>
fibonacci(execution_policy auto&& s, std::uint64_t n) {
  if (n < 2) co_return n;

  auto n1 = async(s, fibonacci<decltype(s)>, s, n - 1);
  auto n2 = fibonacci(s, n - 2);

  co_return co_await n1 + co_await n2;
}

Bryce Adelstein Lelbach, Meeting C++ 2019

Copyright (C) 2019 Bryce Adelstein Lelbach

The C++20 Synchronization Library

Page 2:

Bryce Adelstein Lelbach

CUDA C++ Core Libraries Lead

ISO C++ Library Evolution Incubator Chair, ISO C++ Tooling Study Group Chair

THE C++20 SYNCHRONIZATION LIBRARY

@blelbach

Page 3:

includecpp.org

Page 5:

namespace stdr = std::ranges;
namespace stdv = std::views;

void f(std::invocable auto&&);
//     ^ Constrained template.

Page 6:

Recipe For a Tasking Runtime

Worker threads.

Multi-consumer, multi-producer concurrent queue.

Termination detection mechanism.

Parallel algorithms.

Page 8:

struct thread_group {
private:
  std::vector<std::thread> members;

public:
  thread_group(std::uint64_t n, std::invocable auto&& f) {
    for (auto i : stdv::iota(0, n))
      members.emplace_back(f);
  }
};

Page 9:

struct thread_group {
private:
  std::vector<std::thread> members;

public:
  thread_group(std::uint64_t n, std::invocable auto&& f) {
    for (auto i : stdv::iota(0, n))
      members.emplace_back(f);
  }
};

int main() {
  std::atomic<std::uint64_t> count(0);

  {
    thread_group tg(6, [&] { ++count; });
    // tg is destroyed here while its std::thread members are still
    // joinable, so std::terminate is called.
  }

  std::cout << count << "\n";
}

Page 11:

struct thread_group {
private:
  std::vector<std::thread> members;

public:
  thread_group(std::uint64_t n, std::invocable auto&& f) {
    for (auto i : stdv::iota(0, n))
      members.emplace_back(f);
  }

  ~thread_group() {
    stdr::for_each(members, [] (std::thread& t) { t.join(); });
  }
};

Page 12:

struct thread_group {
private:
  std::vector<std::jthread> members;

public:
  thread_group(std::uint64_t n, std::invocable auto&& f) {
    for (auto i : stdv::iota(0, n))
      members.emplace_back(f);
  }
};

Page 13:

std::jthread

Joining Thread

Just like std::thread, except:

When destroyed, if the thread is joinable, it joins instead of calling terminate.

Page 15:

struct thread_group {
private:
  std::vector<std::jthread> members;

public:
  thread_group(std::uint64_t n, std::invocable<std::stop_token> auto&& f) {
    for (auto i : stdv::iota(0, n))
      members.emplace_back(f);
  }
};

int main() {
  std::atomic<std::uint64_t> count(0);

  {
    thread_group tg(6,
      [&] (std::stop_token s) { while (!s.stop_requested()) ++count; }
    );
    // Destroying tg requests a stop on each std::jthread and then joins it.
  }

  std::cout << count << "\n";
}

Page 16:

std::jthread

Joining Thread

Just like std::thread, except:

When destroyed, if the thread is joinable, it joins instead of calling terminate.

It supports interruption.

std::jthread invocables will be passed a std::stop_token parameter if they support it.

Interruption API:

[[nodiscard]] stop_source std::jthread::get_stop_source() noexcept;
[[nodiscard]] stop_token std::jthread::get_stop_token() const noexcept;
bool std::jthread::request_stop() noexcept;

Page 17:

std::stop_*

Interruption Facilities

std::stop_source (analogous to a promise)

Producer of stop requests.

Owns the shared state (if any).

std::stop_token (analogous to a future)

Handle to a std::stop_source.

Consumer only; can query for stop requests, but can't make them.

std::stop_callback (analogous to future::then)

Mechanism for registering invocables to be run upon receiving a stop request.

Page 18:

CV Interruption Support

struct condition_variable_any {
  template <typename Lock, typename Predicate>
  bool wait(Lock& lock, stop_token stoken, Predicate pred);

  template <typename Lock, class Clock, typename Duration, typename Predicate>
  bool wait_until(Lock& lock, stop_token stoken,
                  chrono::time_point<Clock, Duration> const& abs, Predicate pred);

  template <typename Lock, typename Rep, typename Period, typename Predicate>
  bool wait_for(Lock& lock, stop_token stoken,
                const chrono::duration<Rep, Period>& rel, Predicate pred);
};

Page 19:

Recipe For a Tasking Runtime

Worker threads.

Multi-consumer, multi-producer concurrent queue.

Termination detection mechanism.

Parallel algorithms.

Page 20:

template <typename T, std::uint64_t QueueDepth>
struct concurrent_bounded_queue {
private:
  std::queue<T> items;
  std::mutex items_mtx;
  std::counting_semaphore<QueueDepth> items_produced{0};
  std::counting_semaphore<QueueDepth> remaining_space{QueueDepth};

  void push(std::convertible_to<T> auto&& u);
  T pop();

public:
  constexpr concurrent_bounded_queue() = default;

  void enqueue(std::convertible_to<T> auto&& u);

  T dequeue();
  std::optional<T> try_dequeue();
};

Page 25:

std::counting_semaphore

template <ptrdiff_t least_max_value = implementation-defined>
struct counting_semaphore {
  static constexpr ptrdiff_t max() noexcept;

  constexpr explicit counting_semaphore(ptrdiff_t desired);

  void release(ptrdiff_t update = 1);

  void acquire();
  bool try_acquire() noexcept;
  template <typename Rep, typename Period>
  bool try_acquire_for(const chrono::duration<Rep, Period>& rel_time);
  template <typename Clock, typename Duration>
  bool try_acquire_until(const chrono::time_point<Clock, Duration>& abs_time);
};

Page 30:

std::counting_semaphore

using binary_semaphore = counting_semaphore<1>;

Page 31:

std::mutex vs std::counting_semaphore<N>

std::mutex

Ensures a resource is only accessed by one thread at a time.

Each thread which needs the resource blocks until it receives it.

Thread identity:

Only the locking thread may unlock.

A locked mutex is unlocked once.

std::counting_semaphore<N>

Does not limit how many threads access resources concurrently.

Each thread which needs a resource blocks until it receives one.

No thread identity:

Any thread may release.

A thread may release up to N count.

Page 34:

template <typename T, std::uint64_t QueueDepth>
struct concurrent_bounded_queue {
private:
  std::queue<T> items;
  std::mutex items_mtx;
  std::counting_semaphore<QueueDepth> items_produced{0};
  std::counting_semaphore<QueueDepth> remaining_space{QueueDepth};

  void push(std::convertible_to<T> auto&& u) {
    std::scoped_lock l(items_mtx);
    items.emplace(std::forward<decltype(u)>(u));
  }
};

Page 37:

template <typename T, std::uint64_t QueueDepth>
struct concurrent_bounded_queue {
private:
  std::queue<T> items;
  std::mutex items_mtx;
  std::counting_semaphore<QueueDepth> items_produced{0};
  std::counting_semaphore<QueueDepth> remaining_space{QueueDepth};

public:
  void enqueue(std::convertible_to<T> auto&& u) {
    remaining_space.acquire();
    push(std::forward<decltype(u)>(u));
    items_produced.release();
  }
};

Page 42:

template <typename T, std::uint64_t QueueDepth>
struct concurrent_bounded_queue {
private:
  std::queue<T> items;
  std::mutex items_mtx;
  std::counting_semaphore<QueueDepth> items_produced{0};
  std::counting_semaphore<QueueDepth> remaining_space{QueueDepth};

  T pop() {
    std::optional<T> tmp;
    std::scoped_lock l(items_mtx);
    tmp = std::move(items.front());
    items.pop();
    return std::move(*tmp);
  }
};

Page 45:

template <typename T, std::uint64_t QueueDepth>
struct concurrent_bounded_queue {
private:
  std::queue<T> items;
  std::mutex items_mtx;
  std::counting_semaphore<QueueDepth> items_produced{0};
  std::counting_semaphore<QueueDepth> remaining_space{QueueDepth};

public:
  T dequeue() {
    items_produced.acquire();
    T tmp = pop();
    remaining_space.release();
    return std::move(tmp);
  }
};

Page 49:

template <typename T, std::uint64_t QueueDepth>
struct concurrent_bounded_queue {
private:
  std::queue<T> items;
  std::mutex items_mtx;
  std::counting_semaphore<QueueDepth> items_produced{0};
  std::counting_semaphore<QueueDepth> remaining_space{QueueDepth};

public:
  std::optional<T> try_dequeue() {
    if (!items_produced.try_acquire()) return {};
    T tmp = pop();
    remaining_space.release();
    return std::move(tmp);
  }
};

Page 53: The C++20 Synchronization Library...Bryce Adelstein Lelbach CUDA C++ Core Libraries Lead ISO C++ Library Evolution Incubator Chair, ISO C++ Tooling Study Group Chair THE C++20 SYNCHRONIZATION

struct spin_mutex {
private:
  std::atomic_flag flag = ATOMIC_FLAG_INIT;

public:
  void lock() {
    while (flag.test_and_set(std::memory_order_acquire))
      ;
  }

  void unlock() {
    flag.clear(std::memory_order_release);
  }
};


struct spin_mutex {
private:
  std::atomic_flag flag = ATOMIC_FLAG_INIT;

public:
  void lock() {
    for (std::uint64_t k = 0; flag.test_and_set(std::memory_order_acquire); ++k) {
      if (k < 16) __asm__ __volatile__( "rep; nop" : : : "memory" );
      else if (k < 64) sched_yield();
      else {
        timespec rqtp = { 0, 0 };
        rqtp.tv_sec = 0; rqtp.tv_nsec = 1000;
        nanosleep(&rqtp, nullptr);
      }
    }
  }

  void unlock() {
    flag.clear(std::memory_order_release);
  }
};
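For comparison, a hedged sketch of the same three-tier back-off using only standard library calls. `backoff_spin_mutex` and `count_with_backoff_lock` are hypothetical names. The standard has no portable pause instruction, so the first tier here simply retries, and the thresholds 16 and 64 are copied from the slide.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical portable variant of the back-off spin_mutex.
struct backoff_spin_mutex {
private:
  std::atomic_flag flag = ATOMIC_FLAG_INIT;

public:
  void lock() {
    for (std::uint64_t k = 0; flag.test_and_set(std::memory_order_acquire); ++k) {
      if (k < 16) continue;                         // tier 1: retry immediately
      else if (k < 64) std::this_thread::yield();   // tier 2: give up the time slice
      else std::this_thread::sleep_for(std::chrono::microseconds(1));  // tier 3: sleep
    }
  }

  void unlock() {
    flag.clear(std::memory_order_release);
  }
};

// Increments a plain counter from several threads under the lock.
inline long count_with_backoff_lock(int threads, int iterations) {
  backoff_spin_mutex mtx;
  long counter = 0;
  std::vector<std::thread> pool;
  for (int t = 0; t < threads; ++t)
    pool.emplace_back([&] {
      for (int i = 0; i < iterations; ++i) {
        std::lock_guard guard(mtx);
        ++counter;
      }
    });
  for (auto& th : pool) th.join();
  return counter;
}
```

The counter is a plain `long`, so the total only comes out right if the lock really provides mutual exclusion.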


struct spin_mutex {
private:
  std::atomic_flag flag = ATOMIC_FLAG_INIT;

public:
  void lock() {
    while (flag.test_and_set(std::memory_order_acquire))
      flag.wait(true, std::memory_order_relaxed);
  }

  void unlock() {
    flag.clear(std::memory_order_release);
    flag.notify_one();
  }
};


struct spin_mutex {
private:
  std::atomic<bool> flag = ATOMIC_VAR_INIT(false);

public:
  void lock() {
    while (flag.exchange(true, std::memory_order_acquire))
      flag.wait(true, std::memory_order_relaxed);
  }

  void unlock() {
    flag.store(false, std::memory_order_release);
    flag.notify_one();
  }
};


std::atomic{_flag}::wait/notify

template <typename T>
struct atomic {
  void wait(T old, memory_order = memory_order::seq_cst) const volatile noexcept;
  void wait(T old, memory_order = memory_order::seq_cst) const noexcept;
  void notify_one() volatile noexcept;
  void notify_one() noexcept;
  void notify_all() volatile noexcept;
  void notify_all() noexcept;
};


std::atomic{_flag}::wait/notify

struct atomic_flag {
  void wait(bool old, memory_order = memory_order::seq_cst) const volatile noexcept;
  void wait(bool old, memory_order = memory_order::seq_cst) const noexcept;
  void notify_one() volatile noexcept;
  void notify_one() noexcept;
  void notify_all() volatile noexcept;
  void notify_all() noexcept;
};


std::atomic_flag::test

struct atomic_flag {
  void wait(bool old, memory_order = memory_order::seq_cst) const volatile noexcept;
  void wait(bool old, memory_order = memory_order::seq_cst) const noexcept;
  void notify_one() volatile noexcept;
  void notify_one() noexcept;
  void notify_all() volatile noexcept;
  void notify_all() noexcept;

  bool test(memory_order = memory_order::seq_cst) const volatile noexcept;
  bool test(memory_order = memory_order::seq_cst) const noexcept;
};


std::atomic{_flag} wait and notify

Some possible implementation strategies

Futex. Supported for certain size objects on Linux and Windows.

Condition Variables. Supported for certain size objects on Linux and Mac.

Contention Table. Used to optimize futex notify or to hold CVs.

Timed back-off. Supported on everything.

Spinlock. Supported on everything. Note: performance is terrible.
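To make the "timed back-off" strategy concrete, here is a rough sketch of how a wait could be emulated by polling with growing sleeps. This is not how any particular standard library implements `wait`. The helper `backoff_wait`, its thresholds, and the 1024 microsecond cap are assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical helper: returns once `a` no longer holds `old`, polling with
// exponentially growing sleeps instead of an OS-level wait primitive.
template <typename T>
void backoff_wait(std::atomic<T> const& a, T old,
                  std::memory_order order = std::memory_order_seq_cst) {
  using namespace std::chrono;
  auto delay = microseconds(1);
  auto const max_delay = microseconds(1024);  // arbitrary cap for this sketch
  for (std::uint64_t k = 0; a.load(order) == old; ++k) {
    if (k < 16) continue;                                 // spin briefly first
    if (k < 64) { std::this_thread::yield(); continue; }  // then yield
    std::this_thread::sleep_for(delay);                   // then sleep, doubling each time
    delay = std::min(delay * 2, max_delay);
  }
}

// Waits for a writer thread to change the value, then returns it.
inline int wait_for_value() {
  std::atomic<int> value{0};
  std::thread writer([&] {
    std::this_thread::sleep_for(std::chrono::milliseconds(5));
    value.store(42, std::memory_order_release);
  });
  backoff_wait(value, 0, std::memory_order_acquire);
  writer.join();
  return value.load();
}
```

A strategy like this needs no cooperation from the notifying side, which is why it can be supported everywhere; the cost is wake-up latency bounded by the longest sleep.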


spin_mutex

[Figure: timeline of lock and unlock calls by Thread A and Thread B on a spin_mutex, annotated "UNFAIR".]


struct ticket_mutex {
private:
  std::atomic<int> in = ATOMIC_VAR_INIT(0);
  std::atomic<int> out = ATOMIC_VAR_INIT(0);

public:
  void lock() {
    auto const my = in.fetch_add(1, std::memory_order_acquire);
    while (true) {
      auto const now = out.load(std::memory_order_acquire);
      if (now == my) return;
      out.wait(now, std::memory_order_relaxed);
    }
  }

  void unlock() {
    out.fetch_add(1, std::memory_order_release);
    out.notify_all();
  }
};


struct ticket_mutex {
private:
  alignas(std::hardware_destructive_interference_size) std::atomic<int> in
    = ATOMIC_VAR_INIT(0);

  alignas(std::hardware_destructive_interference_size) std::atomic<int> out
    = ATOMIC_VAR_INIT(0);

public:
  void lock() {
    auto const my = in.fetch_add(1, std::memory_order_acquire);
    while (true) {
      auto const now = out.load(std::memory_order_acquire);
      if (now == my) return;
      out.wait(now, std::memory_order_relaxed);
    }
  }

  void unlock() {
    out.fetch_add(1, std::memory_order_release);
    out.notify_all();
  }
};


template <typename T, std::uint64_t QueueDepth>
struct concurrent_bounded_queue {
private:
  std::queue<T> items;
  ticket_mutex items_mtx;
  std::counting_semaphore<QueueDepth> items_produced{0};
  std::counting_semaphore<QueueDepth> remaining_space{QueueDepth};

  void push(std::convertible_to<T> auto&& u);
  T pop();

public:
  constexpr concurrent_bounded_queue() = default;

  void enqueue(std::convertible_to<T> auto&& u);

  T dequeue();
  std::optional<T> try_dequeue();
};


Recipe For a Tasking Runtime

Worker threads.

Multi-consumer, multi-producer concurrent queue.

Termination detection mechanism.

Parallel algorithms.


template <std::uint64_t QueueDepth>
struct bounded_depth_task_manager {
private:
  concurrent_bounded_queue<std::any_invocable<void()>, QueueDepth> tasks;
  thread_group threads;

  void process_tasks(std::stop_token s);

public:
  bounded_depth_task_manager(std::uint64_t n);

  void submit(std::invocable auto&& f);
};


template <std::uint64_t QueueDepth>
struct bounded_depth_task_manager {
private:
  concurrent_bounded_queue<std::any_invocable<void()>, QueueDepth> tasks;
  thread_group threads;

public:
  void submit(std::invocable auto&& f) {
    tasks.enqueue(std::forward<decltype(f)>(f));
  }
};


template <std::uint64_t QueueDepth>
struct bounded_depth_task_manager {
private:
  concurrent_bounded_queue<std::any_invocable<void()>, QueueDepth> tasks;
  thread_group threads;

  void process_tasks(std::stop_token s) {
    while (!s.stop_requested())
      tasks.dequeue()();
  }

public:
  bounded_depth_task_manager(std::uint64_t n)
    : threads(n, [&] (std::stop_token s) { process_tasks(s); })
  {}
};


int main() {
  std::atomic<std::uint64_t> count(0);

  {
    bounded_depth_task_manager<64> tm(6);

    for (auto i : stdv::iota(0, 256))
      tm.submit([&] { ++count; });
  }

  std::cout << count << "\n";
}


template <std::uint64_t QueueDepth>
struct bounded_depth_task_manager {
private:
  concurrent_bounded_queue<std::any_invocable<void()>, QueueDepth> tasks;
  thread_group threads;

  void process_tasks(std::stop_token s) {
    while (!s.stop_requested())
      tasks.dequeue()();

    while (true) {
      if (auto f = tasks.try_dequeue()) std::move(*f)();
      else break;
    }
  }

public:
  bounded_depth_task_manager(std::uint64_t n)
    : threads(n, [&] (std::stop_token s) { process_tasks(s); })
  {}
};


template <std::uint64_t QueueDepth>
struct bounded_depth_task_manager {
private:
  void process_tasks(std::stop_token s) {
    while (!s.stop_requested())
      tasks.dequeue()();

    while (true) {
      if (auto f = tasks.try_dequeue()) std::move(*f)();
      else break;
    }
  }

public:
  ~bounded_depth_task_manager() {
    std::latch l(threads.size() + 1);
    for (auto i : stdv::iota(0, threads.size()))
      submit([&] { l.arrive_and_wait(); });

    threads.request_stop();
    l.count_down();
  }
};


std::latch

struct latch {
  static constexpr ptrdiff_t max() noexcept;

  constexpr explicit latch(ptrdiff_t expected);

  latch(const latch&) = delete;
  latch& operator=(const latch&) = delete;

  void count_down(ptrdiff_t update = 1);
  bool try_wait() const noexcept;
  void wait() const;
  void arrive_and_wait(ptrdiff_t update = 1);
};


template <std::uint64_t QueueDepth>
struct bounded_depth_task_manager {
private:
  void process_tasks(std::stop_token s) {
    while (!s.stop_requested())
      tasks.dequeue()();
  }

public:
  ~bounded_depth_task_manager() {
    std::latch l(threads.size() + 1);
    for (auto i : stdv::iota(0, threads.size()))
      submit([&] { l.arrive_and_wait(); });

    threads.request_stop();
    l.count_down();
  }
};


template <stdr::range I, std::random_access_iterator O,
          typename T, std::invocable</* ... */> OP>
  requires /* ... */
void histogram(I&& input, O output, T inc, OP op) {
  stdr::for_each(input, [&] (auto&& t) { output[op(t)] += inc; });
}


Histogram

Input: H e l l o C + +

Output bins (value: count): H: 1, e: 1, l: 2, o: 1, (space): 1, C: 1, +: 2


template <execution_policy EP,
          stdr::random_access_range I, std::random_access_iterator O,
          typename T, std::invocable</* ... */> OP>
  requires /* ... */
void histogram(EP&& exec, I&& input, O output, T inc, OP op);


void histogram(EP&& exec, I&& input, O output, T inc, OP op) {
  std::uint64_t const elements = stdr::distance(input);
  std::uint64_t const chunks = exec.concurrent_agents() * 4;
  std::uint64_t const chunk_size = (elements + chunks - 1) / chunks;

  // ...
}


void histogram(EP&& exec, I&& input, O output, T inc, OP op) {
  std::uint64_t const elements = stdr::distance(input);
  std::uint64_t const chunks = exec.concurrent_agents() * 4;
  std::uint64_t const chunk_size = (elements + chunks - 1) / chunks;

  std::latch l(chunks);

  // ...
}


void histogram(EP&& exec, I&& input, O output, T inc, OP op) {
  // ...

  for (auto chunk : stdv::iota(0, chunks))
    exec.submit(
      // ...
    );
}


void histogram(EP&& exec, I&& input, O output, T inc, OP op) {
  // ...

  for (auto chunk : stdv::iota(0, chunks))
    exec.submit([=, &input, &l] {
      auto const my_begin = chunk * chunk_size;
      auto const my_end = std::min(elements, (chunk + 1) * chunk_size);

      // ...
    });
}


void histogram(EP&& exec, I&& input, O output, T inc, OP op) {
  // ...

  for (auto chunk : stdv::iota(0, chunks))
    exec.submit([=, &input, &l] {
      auto const my_begin = chunk * chunk_size;
      auto const my_end = std::min(elements, (chunk + 1) * chunk_size);

      stdr::for_each(stdr::begin(input) + my_begin,
                     stdr::begin(input) + my_end,
                     [&] (auto&& t) {
                       output[op(t)] += inc;
                     });

      // ...
    });
}


Histogram

[Figure: the input H e l l o C + + shown twice above a single set of output bins.]


void histogram(EP&& exec, I&& input, O output, T inc, OP op) {
  // ...

  for (auto chunk : stdv::iota(0, chunks))
    exec.submit([=, &input, &l] {
      auto const my_begin = chunk * chunk_size;
      auto const my_end = std::min(elements, (chunk + 1) * chunk_size);

      stdr::for_each(stdr::begin(input) + my_begin,
                     stdr::begin(input) + my_end,
                     [&] (auto&& t) {
                       std::atomic_ref r(output[op(t)]);
                       r.fetch_add(inc, std::memory_order_relaxed);
                     });

      // ...
    });
}


std::atomic_ref<T>

std::atomic<T> holds a T.

template <typename T>
struct atomic {
private:
  T data; // exposition only

public:
  // ...
};

std::atomic_ref<T> does not hold a T; it refers to an existing one.

template <typename T>
struct atomic_ref {
private:
  T* ptr; // exposition only

public:
  explicit atomic_ref(T&);

  // Otherwise, same API as std::atomic.
};


std::atomic<floating-point>

template<> struct atomic<floating-point> {
  floating-point fetch_add(floating-point,
                           memory_order = memory_order_seq_cst) volatile noexcept;
  floating-point fetch_add(floating-point,
                           memory_order = memory_order_seq_cst) noexcept;
  floating-point fetch_sub(floating-point,
                           memory_order = memory_order_seq_cst) volatile noexcept;
  floating-point fetch_sub(floating-point,
                           memory_order = memory_order_seq_cst) noexcept;
};


void histogram(EP&& exec, I&& input, O output, T inc, OP op) {
  // ...

  for (auto chunk : stdv::iota(0, chunks))
    exec.submit([=, &input, &l] {
      auto const my_begin = chunk * chunk_size;
      auto const my_end = std::min(elements, (chunk + 1) * chunk_size);

      stdr::for_each(stdr::begin(input) + my_begin,
                     stdr::begin(input) + my_end,
                     [&] (auto&& t) {
                       std::atomic_ref r(output[op(t)]);
                       r.fetch_add(inc, std::memory_order_relaxed);
                     });

      l.count_down();
    });
}


void histogram(EP&& exec, I&& input, O output, T inc, OP op) {
  std::uint64_t const elements = stdr::distance(input);
  std::uint64_t const chunks = exec.concurrent_agents() * 4;
  std::uint64_t const chunk_size = (elements + chunks - 1) / chunks;

  std::latch l(chunks);

  for (std::uint64_t chunk = 0; chunk < chunks; ++chunk)
    exec.submit(
      // ...
    );

  l.wait();
}


template <execution_policy EP,
          stdr::random_access_range I, std::random_access_iterator O,
          std::invocable</* ... */> BO>
  requires /* ... */
void inclusive_scan(EP&& exec, I&& input, O output, BO op);


Inclusive Scan

Input: a b c d e f g h i

Output: a ab abc abcd abcde abcdef abcdefg abcdefgh abcdefghi

Inclusive Scan

Input, split into three chunks: a b c | d e f | g h i

std::inclusive_scan on each chunk: a ab abc | d de def | g gh ghi


Inclusive Scan

std::inclusive_scan on each chunk: a ab abc | d de def | g gh ghi

aggregates = abc def ghi

Inclusive Scan

Per-chunk results: a ab abc | d de def | g gh ghi

std::inclusive_scan over the aggregates: abc def ghi becomes abc abcdef abcdefghi

Inclusive Scan

Upsweep: std::inclusive_scan on each chunk: a ab abc | d de def | g gh ghi

std::inclusive_scan over the aggregates: abc def ghi becomes abc abcdef abcdefghi

Downsweep: increment the second chunk by aggregates[0] and the third chunk by aggregates[1].

Result: a ab abc abcd abcde abcdef abcdefg abcdefgh abcdefghi


void inclusive_scan(EP&& exec, I&& input, O&& output, BO&& op) {
  std::uint64_t const elements = stdr::distance(input);
  std::uint64_t const chunks = exec.concurrent_agents();
  std::uint64_t const chunk_size = (elements + chunks - 1) / chunks;

  using T = stdr::range_value_t<I>;
  std::vector<T> aggregates(chunks);

  std::barrier</* ... */> upsweep_barrier(chunks, /* ... */);

  std::latch downsweep_latch(chunks);

  // ...
}


void inclusive_scan(EP&& exec, I&& input, O&& output, BO&& op) {
  // ...

  for (auto chunk : stdv::iota(0, chunks))
    exec.submit([=, &aggregates, &upsweep_barrier, &downsweep_latch] {
      auto const this_begin = chunk * chunk_size;
      auto const this_end = std::min(elements, (chunk + 1) * chunk_size);
      aggregates[chunk] = *--stdr::inclusive_scan(stdr::begin(input) + this_begin,
                                                  stdr::begin(input) + this_end,
                                                  output + this_begin,
                                                  op);

      upsweep_barrier.arrive_and_wait();

      // ...
    });

  // ...
}


void inclusive_scan(EP&& exec, I&& input, O&& output, BO&& op) {
  std::uint64_t const elements = stdr::distance(input);
  std::uint64_t const chunks = exec.concurrent_agents();
  std::uint64_t const chunk_size = (elements + chunks - 1) / chunks;

  using T = stdr::range_value_t<I>;
  std::vector<T> aggregates(chunks);

  std::barrier<std::function<void()>> upsweep_barrier(chunks,
    [&] { stdr::inclusive_scan(aggregates, aggregates.begin(), op); });

  std::latch downsweep_latch(chunks);

  // ...
}


std::barrier

template <typename CompletionFunction = see below>
struct barrier {
  using arrival_token = see below;

  static constexpr ptrdiff_t max() noexcept;

  constexpr explicit barrier(ptrdiff_t expected,
                             CompletionFunction f = CompletionFunction());

  [[nodiscard]] arrival_token arrive(ptrdiff_t update = 1);
  void wait(arrival_token&& arrival) const;

  void arrive_and_wait();
  void arrive_and_drop();
};


std::latch vs std::barrier


std::latch

Supports asynchronous arrival.

Single phase.

No thread identity:

Threads may arrive multiple times.

Any thread may wait on a latch.

No completion function.

std::barrier

Supports asynchronous arrival.

Multi phase.

Thread identity:

A thread may arrive only once per phase.

Only a thread that has arrived may wait.

Supports completion functions.


std::barrier Synchronization

N x arrives

happens-before

1 x completion

happens-before

N x waits complete


void inclusive_scan(EP&& exec, I&& input, O&& output, BO&& op) {
  // ...

  for (auto chunk : stdv::iota(0, chunks))
    exec.submit([=, &aggregates, &upsweep_barrier, &downsweep_latch] {
      // ...

      upsweep_barrier.arrive_and_wait();

      // The preceding aggregate is the left operand, so a
      // non-commutative op (such as concatenation) keeps its order.
      if (0 != chunk)
        stdr::for_each(output + this_begin, output + this_end,
          [&, chunk] (auto& t) { t = op(aggregates[chunk - 1], std::move(t)); });

      downsweep_latch.count_down();
    });

  // ...
}

void inclusive_scan(EP&& exec, I&& input, O&& output, BO&& op) {
  std::uint64_t const elements = stdr::distance(input);
  std::uint64_t const chunks = exec.concurrent_agents();
  std::uint64_t const chunk_size = (elements + chunks - 1) / chunks;

  std::vector<T> aggregates(chunks);

  std::barrier<std::function<void()>> upsweep_barrier(chunks,
    [&] { stdr::inclusive_scan(aggregates, aggregates.begin(), op); });

  std::latch downsweep_latch(chunks);

  for (auto chunk : stdv::iota(0, chunks))
    exec.submit(
      // ...
    );

  downsweep_latch.wait();
}

C++20 Synchronization Library

std::atomic<T> et al

wait/notify interface

std::atomic_ref<T>

test interface for std::atomic_flag

Floating-point specializations

std::latch & std::barrier

std::counting_semaphore

std::jthread

Joining destructor

std::stop_* interruption mechanism

libcu++: The CUDA C++ Standard Library

Opt-in, heterogeneous, incremental C++ standard library for CUDA.

Port of LLVM’s libc++; contributed C++20 sync library upstream.

Version 1 (next week): <atomic> (Pascal+), <type_traits>.

Version 2 (1H 2020): atomic<T>::wait/notify (Volta+), <barrier> (Volta+), <latch> (Volta+), <semaphore> (Volta+), <chrono>, <ratio>, <functional> minus function.

Future priorities: <complex>, <tuple>, <array>, <utility>, <cmath>, string processing, …

#include <atomic>
std::atomic<int> x;

std:: ISO C++, __host__ only.

#include <cuda/std/atomic>
cuda::std::atomic<int> x;

cuda::std:: CUDA C++, __host__ __device__. Strictly conforming to ISO C++.

#include <cuda/atomic>
cuda::atomic<int, cuda::thread_scope_block> x;

cuda:: CUDA C++, __host__ __device__. Conforming extensions to ISO C++.

CUDA is the only GPU platform that implements C++ parallel forward progress and the C++ memory model; not possible with OpenCL or SYCL.

@blelbach