ipdps 2003 - fast and lock free concurrent priority queues


Page 1: IPDPS 2003 - Fast and Lock Free Concurrent Priority Queues

2009-02-24

1

NOBLE: Non-Blocking Programming Support via Lock-Free Shared Abstract Data Types

Håkan Sundell, University College of Borås and Parallel Scalable Solutions AB

Philippas Tsigas, Chalmers University of Technology

2

[Figure: chart with axes #CPUs and #Threads, placing four application classes: traditional desktop applications; traditional multi-threaded desktop applications; multi-threaded applications on new multicore CPU(s); high performance multi-threaded applications on multiprocessors. Label: Concurrent applications.]

Multi-thread Programming

Threads need to share data

Abstract data types

• Queue, Dictionary, List etc.

Implemented using data structures

• Arrays, Linked lists, Skip lists, Hash table, Trees etc.

Must be thread-safe!

Using simple critical sections (locks) limits (kills) performance / scalability.

3

Multi-thread Programming

Typical application scenarios:

Producer-Consumer

• Shared Queue

• Shared Priority Queue

Presenter-Updater (e.g. in-memory database)

• Shared Dictionary

• Shared Linked List

4

5

Shared Data Structures

Several approaches without locks exist.

Software Transactional Memory

• General solutions.

• In research.

• High overhead, low performance.

Ad-hoc solutions

• Specific solutions for each particular type of data structure.

• Best possible performance.

[Figure: two layer stacks. S.T.M.: Application, Transactions, Hardware. Ad-hoc: Application, Hardware.]

6

Ad-hoc Shared Data Structures

NOBLE

• Large commercial software library for C/C++

Real-Time Java

• Uses Wait-Free Queues for communication.

Java 1.6, Concurrency package

• Contains several non-blocking data structures for developers.


Commercial Tools

Intel Threading Building Blocks

• C++

• Windows, Linux, Mac

Microsoft Parallel Extensions to .NET Framework 3.5 (Preview)

• C#

NOBLE

• C/C++

• Windows, Linux, Mac, Unix

7 8

NOBLE Professional Edition:

Contents

Memory Management

• Memory allocation

• Memory reclamation (garbage collection)

Atomic primitives

• Single-word and Multi-word transactions.

Common shared data structures

• Stack

• Queue

• Deque

• Priority Queue

• Dictionary

• Linked Lists

• Snapshots

Properties of ad-hoc data structure algorithms

Sequential Time Complexity

Space Complexity

Scalability

Overall Performance

Semantics

Locality

Dependencies and Limitations

9 10

NOBLE Professional Edition:

Design

Easy to use. Hides the underlying complexity and shows a common and simple interface.

Versatile. Contains lock-based as well as lock-free/wait-free implementations of each component.

Efficient. Designed for best possible performance.

Object-oriented. Designed in C, but can easily be encapsulated in C++ or other language constructs.

Configurable. A lot of optional parameters and functionalities that can be set/tuned to meet specific needs.

11

NOBLE Professional Edition:

Example

Globals

    #include <Noble.h>
    NBLQueueRoot* queue;

Main

    queue=NBLQueueCreateLF();
    /* Create and run the threads */
    NBLQueueFree(queue);

Thread 1

    NBLQueue *handle=NBLQueueGetHandle(queue);
    NBLQueueEnqueue(handle, item);
    NBLQueueFreeHandle(handle);

Thread 2

    NBLQueue *handle=NBLQueueGetHandle(queue);
    item=NBLQueueDequeue(handle);
    NBLQueueFreeHandle(handle);

[Figure: Thread 1 -> Queue -> Thread 2]

C++ Example (Dictionary + Memory)

12

Globals

    class MyObject {
    public:
        int x;
        int y;
        int z;
    };
    NBL::Memory<MyObject> * memory;
    NBL::Dictionary<MyObject,int> *dictionary;

Main

    memory = NBL::Memory<MyObject>::CreateLF_SUU();
    dictionary = NBL::Dictionary<MyObject,int>::CreateLF_EB();
    dictionary->SetValueMemoryHandler(memory);

Thread 1

    MyObject * obj1 = memory->AllocBlock();
    obj1->x = 1;
    obj1->y = 2;
    obj1->z = 3;
    dictionary->Insert(1,obj1);
    memory->ReleaseRef(obj1);

Thread 2

    MyObject * obj1 = dictionary->Find(1);
    ...
    memory->ReleaseRef(obj1);


NBL::Word

For atomic updates of individual memory words

Applications

Shared counters

Concurrent updates of logically connected values

• Graphs
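The slides for the NBL::Word interface did not survive extraction, so the following is only a minimal sketch of the idea of atomic single-word updates, written against standard C++ `std::atomic` rather than NOBLE. The names `counter`, `add_if_below` and `count_in_parallel` are hypothetical and chosen for illustration.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical shared counter; uses std::atomic, not NBL::Word.
std::atomic<long> counter{0};

// Atomically adds `delta` to `word` unless the result would exceed `limit`.
// Returns true if the update was applied.
bool add_if_below(std::atomic<long>& word, long delta, long limit) {
    long seen = word.load(std::memory_order_relaxed);
    while (seen + delta <= limit) {
        // On failure, compare_exchange_weak stores the current value in `seen`.
        if (word.compare_exchange_weak(seen, seen + delta,
                                       std::memory_order_acq_rel,
                                       std::memory_order_relaxed)) {
            return true;
        }
    }
    return false;
}

// Runs `threads` workers that each increment the counter `per_thread` times
// and returns the final value.
long count_in_parallel(int threads, int per_thread) {
    counter.store(0);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& worker : pool) worker.join();
    return counter.load();
}
```

`fetch_add` covers the plain shared-counter case, while the compare-and-swap loop in `add_if_below` shows how a conditional update of one word can be made atomic without a lock.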

13

NBL::Word - Constructors

14

NBL::Word - Members

15

NBL::Word - Members

16

NBL::Word - Example

17

NBL::Memory

For handling of memory reclamation / allocation of user data structures.

Applications

Safely storing and retrieving data in containers (e.g. Dictionary)

Internal use in implementation of lock-free abstract data types
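The `ReleaseRef` calls in the earlier C++ example suggest reference counting, but the transcript does not show how NBL::Memory works internally. As a rough illustration only, here is a sketch of an atomic reference count in standard C++. `Block`, `alloc_block`, `add_ref`, `release_ref` and `live_blocks` are hypothetical names, not NOBLE's API.

```cpp
#include <atomic>

// Hypothetical block with an embedded reference count (not NOBLE's layout).
struct Block {
    std::atomic<int> refs{1};  // the allocating thread holds the first reference
    int payload = 0;
};

// Number of blocks currently allocated; kept only to make the sketch testable.
std::atomic<int> live_blocks{0};

Block* alloc_block() {
    live_blocks.fetch_add(1);
    return new Block;
}

// Caller must already hold a reference to `b`.
void add_ref(Block* b) {
    b->refs.fetch_add(1, std::memory_order_relaxed);
}

// Drops one reference; the thread that drops the last one frees the block.
void release_ref(Block* b) {
    // acq_rel: the freeing thread must observe all writes made by earlier holders.
    if (b->refs.fetch_sub(1, std::memory_order_acq_rel) == 1) {
        delete b;
        live_blocks.fetch_sub(1);
    }
}
```

This simple scheme is only safe when the caller of `add_ref` already owns a reference. Acquiring a first reference through a shared pointer that another thread may be removing concurrently is the harder problem that lock-free reclamation schemes address.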

18


NBL::Memory - Constructors

19

NBL::Memory - Constructors

20

NBL::Memory - Members

21

NBL::Memory - Example

22

NBL::Stack

Storing data in a LIFO container
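For readers unfamiliar with lock-free LIFO containers, the following is a sketch of a Treiber-style stack in standard C++17. It is not NOBLE's implementation, and `LockFreeStack` is a hypothetical name. To stay simple it assumes that memory may be held until the stack is destroyed: popped nodes are parked on a retired list instead of being freed immediately, which avoids the ABA and use-after-free hazards a real library has to solve with a reclamation scheme.

```cpp
#include <atomic>
#include <optional>
#include <utility>

template <typename T>
class LockFreeStack {
    struct Node {
        T value;
        Node* next = nullptr;          // link in the live stack, fixed once published
        Node* retired_next = nullptr;  // link in the retired list
    };
    std::atomic<Node*> head_{nullptr};
    std::atomic<Node*> retired_{nullptr};

    static void free_list(Node* node, Node* Node::*link) {
        while (node) {
            Node* following = node->*link;
            delete node;
            node = following;
        }
    }

public:
    LockFreeStack() = default;
    LockFreeStack(const LockFreeStack&) = delete;
    LockFreeStack& operator=(const LockFreeStack&) = delete;

    // Must run only after all threads have stopped using the stack.
    ~LockFreeStack() {
        free_list(head_.load(), &Node::next);
        free_list(retired_.load(), &Node::retired_next);
    }

    void push(T value) {
        Node* node = new Node{std::move(value)};
        node->next = head_.load(std::memory_order_relaxed);
        // On failure the CAS refreshes node->next with the current head.
        while (!head_.compare_exchange_weak(node->next, node,
                                            std::memory_order_release,
                                            std::memory_order_relaxed)) {
        }
    }

    std::optional<T> pop() {
        Node* old = head_.load(std::memory_order_acquire);
        while (old && !head_.compare_exchange_weak(old, old->next,
                                                   std::memory_order_acquire,
                                                   std::memory_order_acquire)) {
        }
        if (!old) return std::nullopt;  // stack was empty
        T result = std::move(old->value);
        // Park the node; it is freed in the destructor.
        old->retired_next = retired_.load(std::memory_order_relaxed);
        while (!retired_.compare_exchange_weak(old->retired_next, old,
                                               std::memory_order_release,
                                               std::memory_order_relaxed)) {
        }
        return result;
    }
};
```

The trade-off is memory growth proportional to the number of pops, which is acceptable for a sketch but not for a long-running application.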

23

NBL::Stack - Constructors

24


NBL::Stack - Members

25

NBL::Stack - Example

26

NBL::Queue

Storing data in a FIFO container
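As an illustration of a FIFO container that needs no locks, here is a sketch of a bounded ring buffer in standard C++17. It assumes exactly one producer thread and one consumer thread, which is a much narrower setting than a general shared queue, and it is not taken from NOBLE. `SpscQueue` and `fifo_order_holds` are hypothetical names.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>
#include <thread>

// Bounded single-producer/single-consumer FIFO queue.
// One slot stays empty to tell "full" from "empty", so it holds N - 1 items.
template <typename T, std::size_t N>
class SpscQueue {
    std::array<T, N> slots_{};
    std::atomic<std::size_t> head_{0};  // next slot to read, written by the consumer
    std::atomic<std::size_t> tail_{0};  // next slot to write, written by the producer

public:
    bool enqueue(const T& item) {  // producer thread only
        std::size_t tail = tail_.load(std::memory_order_relaxed);
        std::size_t next = (tail + 1) % N;
        if (next == head_.load(std::memory_order_acquire)) return false;  // full
        slots_[tail] = item;
        tail_.store(next, std::memory_order_release);  // publish the item
        return true;
    }

    std::optional<T> dequeue() {  // consumer thread only
        std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) return std::nullopt;  // empty
        T item = slots_[head];
        head_.store((head + 1) % N, std::memory_order_release);  // free the slot
        return item;
    }
};

// Sends 0..count-1 through the queue and reports whether they arrive in order.
bool fifo_order_holds(int count) {
    SpscQueue<int, 8> queue;
    std::thread producer([&queue, count] {
        for (int i = 0; i < count; ++i) {
            while (!queue.enqueue(i)) std::this_thread::yield();
        }
    });
    bool ok = true;
    for (int expected = 0; expected < count; ++expected) {
        std::optional<int> got;
        while (!(got = queue.dequeue())) std::this_thread::yield();
        if (*got != expected) ok = false;
    }
    producer.join();
    return ok;
}
```

Supporting several producers or consumers requires compare-and-swap on the indices or a linked design, which is where a library implementation earns its keep.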

27

NBL::Queue - Constructors

28

NBL::Queue - Members

29

NBL::Queue - Example

30


NBL::Deque

For storing data in a LIFO/FIFO container

Applications

Work-stealing queues

31

NBL::Deque - Constructors

32

NBL::Deque - Members

33

NBL::Deque - Example

34

NBL::PriorityQueue

Storing items in a LIFO/priority container

Applications

Producer-Consumer

Load-Balancing

35

NBL::PriorityQueue - Constructors

36


NBL::PriorityQueue - Members

37

NBL::PriorityQueue - Example

38

NBL::Dictionary

For storing associated data in a container

Applications

In-memory database

39

NBL::Dictionary - Constructors

40

NBL::Dictionary - Members

41

NBL::Dictionary - Example

42


NBL::List

Storing data and local order relations in a container

Container contents are traversable (enumerable, iterable)

43

NBL::List - Constructors

44

NBL::List - Members

45

NBL::List - Example

46

NBL::Snapshot

For a consistent view of a set of shared values
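One classic way to obtain a consistent view without locks is the "double collect" technique: read all values twice and accept the result only if nothing changed in between. The sketch below shows that technique in standard C++. It is not claimed to be what NBL::Snapshot does, and the class name `Snapshot`, its methods and the 32-bit value width are assumptions made for illustration.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>

template <std::size_t N>
class Snapshot {
    // Each cell packs a 32-bit version (high half) and a 32-bit value (low half).
    std::array<std::atomic<std::uint64_t>, N> cells_;

    std::array<std::uint64_t, N> collect() const {
        std::array<std::uint64_t, N> copy{};
        for (std::size_t i = 0; i < N; ++i) copy[i] = cells_[i].load();
        return copy;
    }

public:
    Snapshot() {
        for (auto& cell : cells_) cell.store(0);
    }

    // Writes `value` to cell `i` and bumps its version in one atomic step.
    void update(std::size_t i, std::uint32_t value) {
        std::uint64_t old = cells_[i].load();
        std::uint64_t desired;
        do {
            std::uint64_t version = (old >> 32) + 1;
            desired = (version << 32) | value;
        } while (!cells_[i].compare_exchange_weak(old, desired));
    }

    // Returns values that all held at one instant between two matching collects.
    std::array<std::uint32_t, N> scan() const {
        for (;;) {
            std::array<std::uint64_t, N> first = collect();
            std::array<std::uint64_t, N> second = collect();
            if (first == second) {
                std::array<std::uint32_t, N> values{};
                for (std::size_t i = 0; i < N; ++i) {
                    values[i] = static_cast<std::uint32_t>(first[i] & 0xFFFFFFFFu);
                }
                return values;
            }
        }
    }
};
```

The version stamp makes two equal collects imply that no cell changed between them, ignoring wrap-around after 2^32 updates of one cell. A scan can be delayed indefinitely by continuous updates, so this sketch is lock-free overall but not wait-free.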

47

NBL::Snapshot - Constructors

48


NBL::Snapshot - Members

49

NBL::Snapshot - Example

50

Obstacles

Semantics

Various Semantics used in Literature

Extend, Modify and Optimize

Memory Consistency

Memory Barriers needed

Excessive Care

Application Integration

User Level Objects
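To make the "Memory Barriers needed" point concrete, here is a small sketch in standard C++ of publishing plain data through an atomic flag with release/acquire ordering. The names `payload`, `ready` and `publish_and_read` are hypothetical and the example is independent of NOBLE.

```cpp
#include <atomic>
#include <thread>

int payload = 0;                 // plain, non-atomic data
std::atomic<bool> ready{false};  // flag that publishes the payload

// Starts a writer and a reader thread and returns what the reader observed.
int publish_and_read() {
    payload = 0;
    ready.store(false);
    int seen = -1;
    std::thread writer([] {
        payload = 42;
        // Release: the write to payload becomes visible before the flag does.
        ready.store(true, std::memory_order_release);
    });
    std::thread reader([&seen] {
        // Acquire: once the flag is seen as true, the payload write is visible.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;
    });
    writer.join();
    reader.join();
    return seen;
}
```

With relaxed ordering on both sides, the reader would race with the write to `payload` and the program would have undefined behavior, which is the kind of subtle error the slide's "Excessive Care" warns about.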

51

Obstacles

Languages

C

C++

• Thread Local Storage is Limited

Process vs Threads

Code vs Data address ranges on different CPUs

Architectural Differences

Performance Tweaking (Back-off, ...)

52
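As an example of the back-off tuning mentioned above, the following sketch in standard C++ retries a compare-and-swap with an exponentially growing delay. The delay bounds (1 to 64 microseconds) and the names `increment_with_backoff` and `total_after` are arbitrary assumptions; suitable values depend on the architecture and the contention level.

```cpp
#include <algorithm>
#include <atomic>
#include <chrono>
#include <thread>
#include <vector>

// Increments `word` with a CAS loop. After each failed attempt the thread
// sleeps for a delay that doubles, capped at `max_delay`.
void increment_with_backoff(std::atomic<long>& word) {
    auto delay = std::chrono::microseconds(1);
    const auto max_delay = std::chrono::microseconds(64);
    long seen = word.load(std::memory_order_relaxed);
    while (!word.compare_exchange_weak(seen, seen + 1,
                                       std::memory_order_acq_rel,
                                       std::memory_order_relaxed)) {
        std::this_thread::sleep_for(delay);
        delay = std::min(delay * 2, max_delay);
        seen = word.load(std::memory_order_relaxed);  // refresh after waiting
    }
}

// Runs `threads` workers doing `per_thread` increments each; returns the total.
long total_after(int threads, int per_thread) {
    std::atomic<long> word{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&word, per_thread] {
            for (int i = 0; i < per_thread; ++i) increment_with_backoff(word);
        });
    }
    for (auto& worker : pool) worker.join();
    return word.load();
}
```

Backing off reduces contention on the shared cache line when many threads collide, at the price of extra latency for the threads that wait.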

Benchmarks and Improvements

53 54

How we work

Practical

Efficient

Algorithmic Design

Correctness

Implementation


55

Correctness

Parallel software has infinitely many running scenarios and interleavings.

Testing alone is not enough to be sure.

Proofs of correctness are needed.

Machine-made proofs can be very time-consuming and difficult for humans to read.

Our approach

Intuitive, readable text proofs, using analytical and mathematically oriented methods.

56

Products and Availability

NOBLE Professional Edition

Details described in the Reference Manual (>250 pages)

License model

• Commercial, based on number of developers

Available for modern platforms

• x86, sparc, powerpc, mips etc.

• Win32 (32/64-bit), Linux etc.

• Custom

Availability

NOBLE Professional Edition

Freely available to academia for educational purposes

• Platforms of choice

• Licensed by application

Free Demo Version

• Windows

• Visual Studio C++

57

Future Work

More functionality

Improved support for object-oriented languages and environments

Managed environments

More experiments

Dissemination

58

59

Questions?

Thank You for listening!

www.pss-ab.com

www.adm.hb.se/~hsu

www.cse.chalmers.se/~tsigas