
Fast and Lock-Free Concurrent Priority Queues for Multi-Thread Systems

Håkan Sundell

Philippas Tsigas

Outline

• Synchronization Methods
• Priority Queues
• Concurrent Priority Queues
• Lock-Free Algorithm: Problems and Solutions
• Experiments
• Conclusions

Synchronization

• Shared data structures need synchronization

• Synchronization using locks: mutually exclusive access to the whole or parts of the data structure

[Figure: processes P1, P2 and P3 accessing a shared data structure]

Blocking Synchronization

Drawbacks:
• Blocking
• Priority inversion
• Risk of deadlock

Locks: semaphores, spinning, disabling interrupts, etc.
• Reduced efficiency because of reduced parallelism

Non-blocking Synchronization

Lock-Free Synchronization: optimistic approach

• Each operation assumes it is alone and prepares its update, which then takes effect (unless interfered with) in one atomic step, using hardware atomic primitives

• Interference is detected via shared memory and the atomic primitives

• The operation retries until it is not interfered with by other operations

• Can cause starvation

Non-blocking Synchronization

Lock-Free Synchronization
• Avoids problems with locks
• Simple algorithms
• Fast under low contention

Wait-Free Synchronization
• Always finishes in a finite number of its own steps
• Complex algorithms
• Memory consuming
• Less efficient on average than lock-free

Priority Queues

• Fundamental data structure
• Works on a set of <value, priority> pairs
• Two basic operations:

Insert(v,p): Adds a new element <v,p> to the priority queue

v=DeleteMin(): Removes the element <v,p> with the highest priority and returns its value v
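The two operations can be sketched sequentially as follows. This is a minimal, non-concurrent illustration built on Python's `heapq`, not the structure presented in these slides. It assumes that a smaller number p means a higher priority (as the name DeleteMin suggests); the class name and the tie-breaking counter are hypothetical.

```python
import heapq
import itertools


class SequentialPriorityQueue:
    """Sequential (not thread-safe) priority queue of <value, priority> pairs.

    Assumption: a smaller priority number means a higher priority.
    """

    def __init__(self):
        self._heap = []
        # Tie-breaker so that equal priorities leave in insertion order
        # and values themselves are never compared.
        self._tie = itertools.count()

    def insert(self, value, priority):
        """Insert(v, p): add the pair <v, p> to the queue."""
        heapq.heappush(self._heap, (priority, next(self._tie), value))

    def delete_min(self):
        """DeleteMin(): remove the pair with the smallest p and return v.

        Returns None when the queue is empty.
        """
        if not self._heap:
            return None
        _, _, value = heapq.heappop(self._heap)
        return value
```

Both operations cost O(log N) here because the search phase is hidden inside the heap's sift operations.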

Sequential Priority Queues

All implementations involve a search phase in either Insert or DeleteMin
• Arrays: maximum complexity O(N)
• Ordered lists: O(N)
• Trees: O(log N)
  • Heaps: O(log N)
• Advanced structures (i.e. combinations)

Randomized Algorithm: Skip Lists

• William Pugh: "Skip Lists: A Probabilistic Alternative to Balanced Trees", 1990
• Layers of ordered lists with different densities, achieving a tree-like behavior
• Time complexity: O(log2 N), probabilistic!

[Figure: skip list with nodes 1 to 7 between Head and Tail; successive levels contain about 50%, 25%, … of the nodes]
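A sequential skip list with the 50% / 25% / … level densities can be sketched as below. This is an illustrative single-threaded version only, with no locks or CAS; `MAX_LEVEL`, the class names and the seeding parameter are assumptions made for the example.

```python
import random

MAX_LEVEL = 16  # assumed cap on node height


class Node:
    def __init__(self, key, value, height):
        self.key = key
        self.value = value
        self.next = [None] * height  # next[i] is the successor on level i


def random_height(rng):
    """Each extra level is taken with probability 1/2 (capped at MAX_LEVEL)."""
    height = 1
    while height < MAX_LEVEL and rng.random() < 0.5:
        height += 1
    return height


class SkipList:
    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        # Sentinel head node; its key is never compared.
        self.head = Node(None, None, MAX_LEVEL)

    def insert(self, key, value):
        # Search from the top level down, remembering the node at which
        # the search dropped to the next lower level.
        preds = [self.head] * MAX_LEVEL
        node = self.head
        for level in range(MAX_LEVEL - 1, -1, -1):
            while node.next[level] is not None and node.next[level].key < key:
                node = node.next[level]
            preds[level] = node
        new = Node(key, value, random_height(self._rng))
        # Link the new node from the lowest level upwards.
        for level in range(len(new.next)):
            new.next[level] = preds[level].next[level]
            preds[level].next[level] = new

    def delete_min(self):
        """Remove the first node on level 0 and return its value (or None)."""
        first = self.head.next[0]
        if first is None:
            return None
        # The minimum is the first node on every level it occupies,
        # so the head is its predecessor on all of those levels.
        for level in range(len(first.next)):
            self.head.next[level] = first.next[level]
        return first.value
```

Because the minimum is always directly after the head, DeleteMin needs no search; the search phase lives entirely in Insert.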

Why Skip Lists for Concurrent Priority Queues?

• Ordered lists are simpler than trees
  • Easier to make efficient concurrently
• Search complexity is important
  • Skip lists are an alternative to trees
• Lotan and Shavit: "Skiplist-Based Concurrent Priority Queues", 2000
  • Implementation using multiple locks

[Figure: skip list with nodes 1 to 7, each level of each node marked L]

Our Lock-Free Concurrent Skip List

Define node state to depend on the insertion status at lowest level as well as a deletion flag

Insert from lowest level going upwards

Set deletion flag. Delete from highest level going downwards

[Figure: skip list with nodes 1 to 7, each carrying a deletion flag D; node layout with levels 1 to 3, value pointer p and deletion flag D]

Overlapping operations on shared data

Example: Insert operation
- which of 2 or 3 gets inserted?

Solution: Compare-And-Swap atomic primitive:

CAS(p: pointer to word, old: word, new: word): boolean
  atomic do
    if *p = old then
      *p := new;
      return true;
    else
      return false;

[Figure: list with nodes 1 and 4; concurrent operations Insert 2 and Insert 3]
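The Insert 2 / Insert 3 race can be illustrated with an emulated CAS. Python exposes no hardware CAS instruction, so the sketch below uses a private lock purely as a stand-in for the atomicity of the primitive; `AtomicRef`, `Node`, `try_insert_after` and `insert_sorted` are hypothetical names, and this is a plain ordered list without deletion, not the skip list algorithm of the talk.

```python
import threading


class AtomicRef:
    """Emulated machine word holding a reference.

    Assumption: the private lock only models the atomicity that real
    hardware provides in a single CAS instruction.
    """

    def __init__(self, value=None):
        self._value = value
        self._guard = threading.Lock()

    def load(self):
        with self._guard:
            return self._value

    def cas(self, old, new):
        """If the word still holds `old`, store `new` and return True."""
        with self._guard:
            if self._value is old:
                self._value = new
                return True
            return False


class Node:
    def __init__(self, key, next_node=None):
        self.key = key
        self.next = AtomicRef(next_node)


def try_insert_after(prev, seen_next, key):
    """Link a new node between prev and seen_next.

    Fails (returns False) if prev.next changed since seen_next was read.
    """
    new_node = Node(key, seen_next)
    return prev.next.cas(seen_next, new_node)


def insert_sorted(head, key):
    """Optimistic retry loop: search, attempt the CAS, redo on interference."""
    while True:
        prev = head
        nxt = prev.next.load()
        while nxt is not None and nxt.key < key:
            prev, nxt = nxt, nxt.next.load()
        if try_insert_after(prev, nxt, key):
            return
```

If two inserters both read the same successor of node 1, only the first CAS succeeds; the second one fails, notices the interference, and retries from a fresh search.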

Dynamic Memory Management

Problem: System memory allocation functionality is blocking!

Solution (lock-free), IBM freelists: pre-allocate a number of nodes, link them into a dynamic stack structure, and allocate/reclaim using CAS

[Figure: freelist Head, Mem 1, Mem 2, …, Mem n; Allocate takes a node from the list, Reclaim returns a used node (Used 1) to it]
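A freelist of this kind can be sketched as a stack whose head is updated with CAS. The code below is a simplified model under stated assumptions: CAS is emulated with a lock because Python has no such primitive, the names `AtomicRef`, `MemNode` and `FreeList` are hypothetical, and the sketch deliberately ignores the ABA problem discussed on the later slides.

```python
import threading


class AtomicRef:
    """Emulated CAS-capable word (a lock stands in for hardware atomicity)."""

    def __init__(self, value=None):
        self._value = value
        self._guard = threading.Lock()

    def load(self):
        with self._guard:
            return self._value

    def cas(self, old, new):
        with self._guard:
            if self._value is old:
                self._value = new
                return True
            return False


class MemNode:
    """One pre-allocated block of memory."""

    def __init__(self, index):
        self.index = index
        self.next_free = None


class FreeList:
    def __init__(self, size):
        self.head = AtomicRef(None)
        # Pre-allocate all nodes and push them onto the stack.
        for i in range(size):
            self.reclaim(MemNode(i))

    def allocate(self):
        """Pop a node from the stack; returns None when the pool is empty.

        NOTE: as written, this is exposed to the ABA problem under real
        concurrency, because next_free is read before the CAS.
        """
        while True:
            top = self.head.load()
            if top is None:
                return None
            if self.head.cas(top, top.next_free):
                top.next_free = None
                return top

    def reclaim(self, node):
        """Push a used node back onto the stack."""
        while True:
            top = self.head.load()
            node.next_free = top
            if self.head.cas(top, node):
                return
```

Allocation and reclamation never call the system allocator after start-up, which is the point of the pre-allocated pool.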

Concurrent Insert vs. Delete operations

Problem:
- both nodes are deleted!

Solution (Harris et al.): Use bit 0 of pointer to mark deletion status

[Figure: nodes 1, 2, 3 and 4 with a concurrent Delete (a) and Insert (b); second panel with the marked pointer (*) and steps a), b), c)]
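The marking trick can be modeled with integers standing in for pointers. The sketch assumes word-aligned node addresses, so that bit 0 is always zero in a valid pointer; the addresses, the `Word` class (CAS emulated with a lock) and the helper names are all hypothetical.

```python
import threading

MARK_BIT = 0x1  # bit 0 is free because addresses are assumed word-aligned


def mark(ptr):
    """Return the pointer value with the deletion mark set."""
    return ptr | MARK_BIT


def is_marked(ptr):
    return (ptr & MARK_BIT) != 0


def unmark(ptr):
    """Return the plain address, with the deletion mark cleared."""
    return ptr & ~MARK_BIT


class Word:
    """Emulated machine word with CAS (a lock models hardware atomicity)."""

    def __init__(self, value):
        self._value = value
        self._guard = threading.Lock()

    def load(self):
        with self._guard:
            return self._value

    def cas(self, old, new):
        with self._guard:
            if self._value == old:
                self._value = new
                return True
            return False


def logical_delete(next_word):
    """Mark a node as deleted by setting bit 0 of its next pointer.

    Returns False if another thread had already marked it.
    """
    while True:
        ptr = next_word.load()
        if is_marked(ptr):
            return False
        if next_word.cas(ptr, mark(ptr)):
            return True
```

Once the next pointer of a node is marked, an Insert that tries to CAS that same word from its old, unmarked value fails, so a new node cannot be linked after a node that is being deleted.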

The ABA problem

Problem: Because of concurrency (pre-emption in particular), the same pointer value does not always mean the same node, yet a CAS on that value still succeeds!

[Figure: list states at Step 1 and Step 2 of an ABA scenario]

The ABA problem

Solution (Valois et al.): Add reference counting to each node, so that a node that is of interest to some thread is not reclaimed until all threads have left it

[Figure: new Step 2 with reference counts on the nodes and marked (*) nodes; the CAS fails!]

Helping Scheme

Threads need to traverse safely

Need to remove marked-to-be-deleted nodes while traversing – Help!

Find the previous node, finish the deletion, and continue traversing from the previous node

[Figure: traversal reaching a marked (*) node in a list with nodes 1, 2 and 4]

Back-Off Strategy

For pre-emptive systems, helping is necessary for efficiency and lock-freeness

For truly concurrent systems, overlapping CAS operations (caused by helping and others) on the same node can cause heavy contention

Solution: For every failed CAS attempt, back off (i.e. sleep) for a certain duration, which increases exponentially
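An exponential back-off around a failing operation might look like the sketch below. The initial delay, growth factor, cap and attempt limit are illustrative assumptions, not values from the talk, and `attempt` is a hypothetical callable standing in for one CAS attempt that returns True on success.

```python
import time


def backoff_delays(initial=1e-6, factor=2, maximum=1e-3):
    """Yield exponentially increasing delays, capped at `maximum` (assumed)."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * factor, maximum)


def retry_with_backoff(attempt, max_attempts=20, sleep=time.sleep,
                       initial=1e-6, factor=2, maximum=1e-3):
    """Call `attempt` until it succeeds, sleeping longer after each failure.

    Returns the number of attempts used, or None if max_attempts is reached.
    """
    delays = backoff_delays(initial, factor, maximum)
    for tries in range(1, max_attempts + 1):
        if attempt():
            return tries
        # Failed attempt: back off before touching the contended word again.
        sleep(next(delays))
    return None
```

Passing `sleep` as a parameter keeps the sketch testable; on a real multiprocessor the back-off would more likely be a short busy-wait than an operating-system sleep.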

Our Lock-Free Algorithm

• Based on Skip Lists
  • Treated as layers of ordered lists
• Uses CAS atomic primitive
• Lock-free memory management
  • IBM freelists
  • Reference counting
• Helping scheme
• Back-off strategy
• All together proved to be linearizable

Experiments

1-30 threads on platforms with different levels of real concurrency

10000 Insert and DeleteMin operations by each thread, with 100 and 1000 initial inserts

Compare with other implementations:
• Lotan and Shavit, 2000
• Hunt et al.: "An Efficient Algorithm for Concurrent Priority Queue Heaps", 1996

[Figure: results, Full Concurrency]

[Figure: results, Medium Pre-emption]

[Figure: results, High Pre-emption]

Conclusions

• Our work includes a real-time extension of the algorithm, using time-stamps and a time-stamp recycling scheme
• Our lock-free algorithm is suitable for both pre-emptive systems and systems with full concurrency
• Will be available as part of the NOBLE software library, http://www.noble-library.org
• See Technical Report for full details, http://www.cs.chalmers.se/~phs

Questions?

Contact Information:

Address: Håkan Sundell and Philippas Tsigas, Computing Science, Chalmers University of Technology

Email: <phs, tsigas> @ cs.chalmers.se

Web: http://www.cs.chalmers.se/~phs/warp

Semaphores

Back-off spinlocks

Jones Skew-Heap

The algorithm in more detail

Insert:
1. Create node with random height
2. Search position (remember drops)
3. Insert or update on level 1
4. Insert on level 2 to top (unless already deleted)
5. If deleted then HelpDelete(1)

All of this while keeping track of references, helping deleted nodes, etc.


DeleteMin:
1. Mark first node at level 1 as deleted, otherwise HelpDelete(1) and retry
2. Mark next pointers on level 1 to top
3. Delete on level top to 1 while detecting helping, indicate success
4. Free node

All of this while keeping track of references, helping deleted nodes, etc.


HelpDelete(level):
1. Mark next pointer at level to top
2. Find previous node (info in node)
3. Delete on level unless already helped, indicate success
4. Return previous node

All of this while keeping track of references, helping deleted nodes, etc.

Correctness

Linearizability (Herlihy 1991)

In order for an implementation to be linearizable, for every concurrent execution there should exist an equivalent sequential execution that respects the partial order of the operations in the concurrent execution

Correctness

• Define precise sequential semantics
• Define abstract state and its interpretation
  • Show that state is atomically updated
• Define linearizability points
  • Show that operations take effect atomically at these points with respect to sequential semantics
• Create a total order using the linearizability points that respects the partial order
  • The algorithm is linearizable

Correctness

Lock-freeness

• At least one operation should always make progress
• There are no cyclic loop dependencies, and all potentially unbounded loops are "gate-kept" by CAS operations
• The CAS operation guarantees that at least one CAS will always succeed
  • The algorithm is lock-free

Real-Time extension

• DeleteMin operations should ignore nodes that are inserted after the DeleteMin operation started
• Nodes are inserted together with a timestamp
• Because timestamps are only used for relative comparisons, there is no need for a real-time clock
  • Generate timestamps with an increasing function

Real-Time extension

• Timestamps are potentially unbounded and will overflow
• Recycle "wrapped-over" timestamp values by having TagFieldSize=MaxTag*2
• Timestamps at nodes can stay forever (MaxTag => unlimited)
• Every operation traverses one step through the skip list and updates "too old" timestamps
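One common way to compare wrapping timestamps, given a field twice as large as the maximum age difference, is modular (serial-number style) comparison. The sketch below shows that idea under stated assumptions: the concrete value of `MAX_TAG` and the function names are hypothetical, and the slides do not confirm that the authors' scheme compares timestamps exactly this way.

```python
MAX_TAG = 1 << 15             # assumed bound on the age gap between live timestamps
TAG_FIELD_SIZE = MAX_TAG * 2  # TagFieldSize = MaxTag * 2, as on the slide


def next_timestamp(current):
    """Increasing function that wraps around when the field overflows."""
    return (current + 1) % TAG_FIELD_SIZE


def is_older(a, b):
    """True if timestamp a was generated before timestamp b.

    Only meaningful while the two timestamps are less than MAX_TAG apart,
    which is why "too old" timestamps must be refreshed in time.
    """
    return 0 < (b - a) % TAG_FIELD_SIZE < MAX_TAG
```

With this comparison, a DeleteMin that started at timestamp t would skip any node n for which `is_older(t, n)` holds, even if the counter has wrapped between the two values.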
