Transactional Memory Lecturer: Danny Hendler


Page 1: Transactional Memory Lecturer: Danny Hendler.  Speeding up uni-processors is harder and harder  Intel, Sun (RIP), AMD, IBM now focusing on “multi-core”

Transactional Memory

Lecturer: Danny Hendler


The Future of Computing

• Speeding up uni-processors is harder and harder
• Intel, Sun (RIP), AMD, IBM now focusing on “multi-core” architectures
• Already, most computers are multiprocessors

How can we write correct and efficient algorithms for multiprocessors?


A fundamental problem of thread-level parallelism

Thread A:
    . . .
    Account[i] = Account[i]-X;
    Account[j] = Account[j]+X;
    . . .

Thread B:
    . . .
    Account[i] = Account[i]-X;
    Account[j] = Account[j]+X;
    . . .

But what if execution is concurrent? Must avoid race conditions.
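
To make the race concrete, here is a minimal sketch that replays one bad interleaving by hand, so the outcome is deterministic. The class and method names are hypothetical, and the starting balance of 100 is an assumption chosen for illustration.

```java
// Hypothetical demo class; the names are illustrative and not from the lecture.
class LostUpdateDemo {
    /**
     * Replays one bad interleaving of two transfers of x from account i to
     * account j, starting from a balance of 100 in both accounts.
     */
    static int[] interleavedTransfers(int i, int j, int x) {
        int[] account = {100, 100};
        // "Account[i] = Account[i] - X" is really three steps: read, subtract, write.
        int readByA = account[i];    // Thread A reads Account[i]
        int readByB = account[i];    // Thread B reads the same value before A writes
        account[i] = readByA - x;    // Thread A writes its result
        account[i] = readByB - x;    // Thread B overwrites it: A's debit is lost
        // In this schedule the two credits do not overlap, so both are applied.
        account[j] = account[j] + x; // Thread A
        account[j] = account[j] + x; // Thread B
        return account;
    }

    public static void main(String[] args) {
        int[] result = interleavedTransfers(0, 1, 10);
        System.out.println(result[0] + " " + result[1]);
    }
}
```

Two transfers of 10 should leave balances of 80 and 120. In this schedule the balances end at 90 and 120, so the total is no longer conserved.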


Inter-thread synch. alternatives


What is a transaction?

• A transaction is a sequence of memory reads and writes, executed by a single thread, that either commits or aborts

• If a transaction commits, all the reads and writes appear to have executed atomically

• If a transaction aborts, none of its stores take effect

• Transaction operations aren't visible until they commit (if they do)


Transaction properties

A transaction satisfies the following key properties:

• Atomicity: Each transaction either commits (its changes seem to take effect atomically) or aborts (its changes have no effect).

• Serializability: All committed transactions issue the same operations and receive the same responses as in some sequential history consisting only of committed transactions.
  – Some work considers weaker or stronger requirements

• Isolation: Transaction writes are not visible outside the transaction until it commits


Transactional Memory Goals

• A new multiprocessor architecture
• The goal: implementing nonblocking synchronization that is
  – efficient
  – easy to use
  compared with conventional techniques based on mutual exclusion

• Implemented by hardware support (such as straightforward extensions to multiprocessor cache-coherence protocols) and / or by software mechanisms


A Usage Example

Original code:

    Account[i] = Account[i] - X;
    Account[j] = Account[j] + X;

Locks:

    Lock(L[i]);
    Lock(L[j]);
    Account[i] = Account[i] - X;
    Account[j] = Account[j] + X;
    Unlock(L[j]);
    Unlock(L[i]);

Transactional Memory:

    atomic {
        Account[i] = Account[i] - X;
        Account[j] = Account[j] + X;
    };
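
One thing the lock version has to get right is acquisition order: if one thread transfers from i to j while another transfers from j to i, locking L[i] then L[j] in each thread can deadlock. The sketch below is one way to write the lock-based variant in Java, always taking the lower-indexed lock first. `LockedBank`, `transfer`, `balance` and `runDemo` are hypothetical names introduced for this illustration.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical class: a lock-based transfer that acquires locks in index order.
class LockedBank {
    private final int[] account;
    private final ReentrantLock[] lock;

    LockedBank(int n, int initialBalance) {
        account = new int[n];
        lock = new ReentrantLock[n];
        for (int k = 0; k < n; k++) {
            account[k] = initialBalance;
            lock[k] = new ReentrantLock();
        }
    }

    void transfer(int i, int j, int x) {
        if (i == j) return;
        // Always lock the lower index first, so opposite transfers cannot deadlock.
        int first = Math.min(i, j);
        int second = Math.max(i, j);
        lock[first].lock();
        try {
            lock[second].lock();
            try {
                account[i] -= x;
                account[j] += x;
            } finally {
                lock[second].unlock();
            }
        } finally {
            lock[first].unlock();
        }
    }

    int balance(int i) {
        lock[i].lock();
        try {
            return account[i];
        } finally {
            lock[i].unlock();
        }
    }

    /** Runs two threads that transfer in opposite directions and returns both balances. */
    static int[] runDemo() {
        LockedBank bank = new LockedBank(2, 1000);
        Thread a = new Thread(() -> { for (int k = 0; k < 10000; k++) bank.transfer(0, 1, 1); });
        Thread b = new Thread(() -> { for (int k = 0; k < 10000; k++) bank.transfer(1, 0, 1); });
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] { bank.balance(0), bank.balance(1) };
    }
}
```

The ordering rule is exactly the kind of global convention that the `atomic { ... }` form lets the programmer stop thinking about.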


Transactions interaction

• Transactions execute in commit order

[Figure: timeline of three transactions. Transaction A executes ld 0xdddd ... st 0xbeef and commits. Transaction B executes ld 0xdddd ... ld 0xbbbb and commits. Transaction C executes ld 0xbeef before A's store to 0xbeef commits, which is flagged as a violation; C re-executes with the new data.]

Taken from a presentation by Royi Maimon & Merav Havuv, prepared for a seminar given by Prof. Yehuda Afek.


Software Transactional Memory for Dynamic-Sized Data Structures (DSTM – Dynamic STM)

Maurice Herlihy, Victor Luchangco, Mark Moir, William N. Scherer III

PODC 2003

Prepared by Adi Suissa


Motivation

•Transactional Memory – simplifies parallel programming

•STM – Software based TM
  ▫Usually simpler than Hardware based TM
  ▫Can handle situations where HTM fails

•However:
  ▫It is immature (supports static data sets and static transactions)
  ▫It is complicated


Overview

•Short recap and what’s new?
•How to use DSTM?
•Example
•Diving into DSTM
•Example 2
•Improving performance
•Obstruction freedom


Transactions

•Transaction – a sequence of steps executed by a single thread

•Transactions are atomic: each transaction either commits (it takes effect) or aborts (its effects are discarded)

•Transactions are linearizable: they appear to take effect in a one-at-a-time order


The computation model

Starting transaction
Read-Transactional(o1)
Write-Transactional(o2)
Read(o3)
Write(o4)
Commit-Transaction


The computation model

•Committing a transaction can have two outcomes:
  ▫Success: the transaction’s operations take effect
  ▫Failure: the operations are discarded

• Implemented in Java and in C++


Previous STM designs

•Only static memory – the memory that transactions may access must be declared statically
  ▫We want the ability to create transactional objects dynamically
•Only static transactions – transactions need to declare which addresses they are going to access before the transaction begins
  ▫We want to let transactions determine which object to access based on information from objects read inside the transaction

This is why it is called Dynamic Software Transactional Memory


Threads

•A thread that executes transactions must inherit from TMThread

•Each thread can run a single transaction at a time

    class TMThread extends Thread {
        void beginTransaction();
        boolean commitTransaction();   // true if the transaction committed
        void abortTransaction();
    }

Don’t forget the run() method


Objects (1)

•All transactional objects must implement the TMCloneable interface:

    interface TMCloneable {
        Object clone();
    }

•This method clones the object, but clone implementors don’t need to handle synchronization issues


Objects (2)

• In order to make an object transactional, it must be wrapped

•TMObject is a container for regular Java objects

[Figure: an Object wrapped inside a TMObject]


Opening an object

•Before using a TMObject in a transaction, it must be opened

•An object can either be opened for READ or WRITE (and read)

    class TMObject {
        TMObject(Object obj);
        enum Mode {READ, WRITE};
        Object open(Mode mode);
    }


An atomic counter (1)

•The counter has a single data member and two operations:

    class Counter implements TMCloneable {
        int counterValue = 0;

        void inc();     // increment the value
        int value();    // returns the value
        Object clone();
    }

•The object is shared by multiple threads


An atomic counter (2)

•When a thread wants to access the counter in a transaction, it must first open the object using the encapsulated version:

Creating the shared counter:

    Counter counter = new Counter();
    TMObject tranCounter = new TMObject(counter);

Using it inside a transaction:

    ((TMThread)Thread.currentThread()).beginTransaction();
    …
    Counter counter = (Counter)tranCounter.open(WRITE);
    counter.inc();
    …
    // returns true/false to indicate commit status
    ((TMThread)Thread.currentThread()).commitTransaction();


DSTM implementation

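•Transactional object structure:

[Figure: a TMObject has a start field that points to a Locator. The Locator has three fields: transaction (points to the transaction, which holds a status), new object and old object (each points to a Data version).]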


Current object version

•The current object version is determined by the status of the transaction that most recently opened the object in WRITE mode:
  ▫committed: the new object is the current
  ▫aborted: the old object is the current
  ▫active: the old object is the current, and the new is tentative
•The actual version only changes when a commit is successful
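
The rule above can be sketched in a few lines of Java. The field names follow the slide (transaction, old object, new object, status), but the class shapes and the `currentVersion` method are assumptions made for this illustration, not the DSTM source.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical types modelled on the slide's locator picture.
enum Status { ACTIVE, COMMITTED, ABORTED }

class Transaction {
    final AtomicReference<Status> status = new AtomicReference<>(Status.ACTIVE);
}

class Locator {
    final Transaction transaction; // the transaction that last opened the object for WRITE
    final Object oldObject;
    final Object newObject;

    Locator(Transaction transaction, Object oldObject, Object newObject) {
        this.transaction = transaction;
        this.oldObject = oldObject;
        this.newObject = newObject;
    }

    /** committed: new object is current; aborted or active: old object is current. */
    Object currentVersion() {
        if (transaction.status.get() == Status.COMMITTED) {
            return newObject;
        }
        return oldObject;
    }
}
```

Because the version is derived from the transaction's status, flipping that one status word from active to committed switches every object the transaction wrote at the same instant.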


Opening an object (1)

•Let's assume transaction A tries to open object o in WRITE mode.

•Let transaction B be the transaction that most recently opened o in WRITE mode.

•We need to distinguish between the following cases:
  ▫B is committed
  ▫B is aborted
  ▫B is active


Opening an object (2) – B committed

[Figure: o's start field points to B's Locator (transaction, new object, old object); B's status is committed, and new object and old object each point to a Data version. A builds its own Locator, whose transaction field points to A (status: active).]

1. A creates a new Locator
2. A clones the previous new object, and sets new object to the clone
3. A sets old object to the previous new object
4. Use CAS in order to replace the locator

If CAS fails, A restarts from the beginning


Opening an object (3) – B aborted

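[Figure: o's start field points to B's Locator (transaction, new object, old object); B's status is aborted, and new object and old object each point to a Data version. A builds its own Locator, whose transaction field points to A (status: active).]

1. A creates a new Locator
2. A clones the previous old object, and sets new object to the clone
3. A sets old object to the previous old object
4. Use CAS in order to replace the locator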


Opening an object (4) – B active

•Problem: B is active and can either commit or abort, so which version (old/new) should we use?

•Answer: A and B are conflicting transactions, that run at the same time

•Use Contention Manager to decide which should continue and which should abort

• If B needs to abort, try to change its status to aborted (using CAS)
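
The three cases for opening an object in WRITE mode (B committed, B aborted, B active) can be put together in one retry loop. This is a minimal sketch under stated assumptions, not the DSTM implementation: `Tx`, `Loc`, `TObj`, `openWrite` and `currentValue` are hypothetical names, cloning is supplied as a function, and the contention policy is hard-wired to "always try to abort B", which is only the simplest possible choice.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

enum TxStatus { ACTIVE, COMMITTED, ABORTED }

class Tx {
    final AtomicReference<TxStatus> status = new AtomicReference<>(TxStatus.ACTIVE);
    boolean commit() { return status.compareAndSet(TxStatus.ACTIVE, TxStatus.COMMITTED); }
}

class Loc<T> {
    final Tx owner; final T oldObject; final T newObject;
    Loc(Tx owner, T oldObject, T newObject) {
        this.owner = owner; this.oldObject = oldObject; this.newObject = newObject;
    }
}

class TObj<T> {
    private final AtomicReference<Loc<T>> start;
    private final UnaryOperator<T> cloner;

    TObj(T initial, UnaryOperator<T> cloner) {
        Tx init = new Tx();
        init.commit(); // the initial version counts as committed
        this.start = new AtomicReference<Loc<T>>(new Loc<T>(init, null, initial));
        this.cloner = cloner;
    }

    T openWrite(Tx me) {
        while (true) {
            Loc<T> cur = start.get();
            if (cur.owner == me) return cur.newObject; // already opened by this transaction
            TxStatus s = cur.owner.status.get();
            if (s == TxStatus.ACTIVE) {
                // B is active: conflict. Assumed policy: try to abort B with CAS, then re-check.
                cur.owner.status.compareAndSet(TxStatus.ACTIVE, TxStatus.ABORTED);
                continue;
            }
            // B committed: its new object is current. B aborted: its old object is current.
            T current = (s == TxStatus.COMMITTED) ? cur.newObject : cur.oldObject;
            Loc<T> mine = new Loc<T>(me, current, cloner.apply(current));
            if (start.compareAndSet(cur, mine)) return mine.newObject;
            // CAS failed: another transaction replaced the locator, so restart.
        }
    }

    T currentValue() {
        Loc<T> cur = start.get();
        return cur.owner.status.get() == TxStatus.COMMITTED ? cur.newObject : cur.oldObject;
    }
}
```

A real implementation would also check that the opening transaction itself is still active and would ask a contention manager what to do instead of always aborting B.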


Opening an object (5)

•Let's assume transaction A opens object o in READ mode
  ▫Fetch the current version just as before
  ▫Add the pair (o, v), where v is that version, to the readers list (read-only table)


Committing a transaction

•The commit needs to do the following:
  1. Validate the transaction
  2. Change the transaction’s status from active to committed (using CAS)


Validating transactions

•What?
  ▫Validate the objects (only) read by the transaction
•Why?
  ▫To make sure that the transaction observes a consistent state
•How?
  1. For each pair (o, v) in the read-only table, verify that v is still the most recently committed version of o
  2. Check that (status == active)

If the validation fails, throw an exception so the user will restart the transaction from the beginning
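
Validation followed by the status CAS can be sketched as below. This is a simplified model with hypothetical names: `Cell` stands in for a transactional object and stores only its latest committed version, and `commit` returns false on a failed validation where the slides would throw an exception.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a transactional object: holds only the committed version.
class Cell {
    final AtomicReference<Object> committed;
    Cell(Object initial) { committed = new AtomicReference<>(initial); }
}

class ReadTx {
    enum State { ACTIVE, COMMITTED, ABORTED }

    final AtomicReference<State> status = new AtomicReference<>(State.ACTIVE);
    // The read-only table: object -> version seen when it was opened for READ.
    private final Map<Cell, Object> readOnlyTable = new IdentityHashMap<>();

    Object openRead(Cell o) {
        Object v = o.committed.get();
        readOnlyTable.put(o, v);
        return v;
    }

    /** Step 1: every recorded version is still current. Step 2: status is still active. */
    boolean validate() {
        for (Map.Entry<Cell, Object> e : readOnlyTable.entrySet()) {
            if (e.getKey().committed.get() != e.getValue()) {
                return false;
            }
        }
        return status.get() == State.ACTIVE;
    }

    /** Validate, then try to move active -> committed with a single CAS. */
    boolean commit() {
        if (!validate()) {
            status.compareAndSet(State.ACTIVE, State.ABORTED);
            return false;
        }
        return status.compareAndSet(State.ACTIVE, State.COMMITTED);
    }
}
```

Versions are compared by reference, which matches the idea that each committed write installs a fresh cloned object.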


Validation inconsistency

•Assume two threads A and B

    Initially:
        o1 = 0
        o2 = 0

    Thread A
        1. x <- read(o1)
        2. w(o2, x + 1)

    Thread B
        1. y <- read(o2)
        2. w(o1, y + 1)

• If B after A, then o1 = 2, o2 = 1
• If A after B, then o1 = 1, o2 = 2
• If they run concurrently we can have o1 = 1, o2 = 1 which is illegal
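
The three outcomes can be checked by replaying each schedule by hand. The sketch below uses plain local variables in place of transactional objects; the class and method names are hypothetical.

```java
// Hypothetical demo: each method replays one schedule of threads A and B.
class InconsistencyDemo {
    /** A runs to completion, then B. Returns {o1, o2}. */
    static int[] aThenB() {
        int o1 = 0, o2 = 0;
        int x = o1;  o2 = x + 1;   // Thread A
        int y = o2;  o1 = y + 1;   // Thread B
        return new int[] { o1, o2 };
    }

    /** B runs to completion, then A. Returns {o1, o2}. */
    static int[] bThenA() {
        int o1 = 0, o2 = 0;
        int y = o2;  o1 = y + 1;   // Thread B
        int x = o1;  o2 = x + 1;   // Thread A
        return new int[] { o1, o2 };
    }

    /** Both threads read before either writes. Returns {o1, o2}. */
    static int[] concurrent() {
        int o1 = 0, o2 = 0;
        int x = o1;                // Thread A, step 1
        int y = o2;                // Thread B, step 1
        o2 = x + 1;                // Thread A, step 2
        o1 = y + 1;                // Thread B, step 2
        return new int[] { o1, o2 };
    }
}
```

The concurrent schedule yields (1, 1), which matches neither serial order. Validating the read set at commit is what rules it out: whichever transaction commits second finds that the object it read has a newer committed version.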


Conflicts

•Conflicts are detected when:
  ▫A transaction first opens an object and finds that it is open for modification by another transaction
  ▫A transaction validates its read set (on opening an object or on commit)


Ordered Integer List – IntSet (1)

[Figure: an ordered linked list Min, 3, 4, 8, Max, and a new node 6 to be inserted between 4 and 8]


Ordered Integer List – IntSet (2)

    class List implements TMCloneable {
        int value;
        TMObject next;

        List(int v) { value = v; }

        public Object clone() {
            List newList = new List(value);
            newList.next = next;
            return newList;
        }
    }


Ordered Integer List – IntSet (3)

    class IntSet {
        TMObject first; // the list’s anchor

        IntSet() {
            List firstList = new List(Integer.MIN_VALUE);
            first = new TMObject(firstList);
            firstList.next = new TMObject(new List(Integer.MAX_VALUE));
        }
    }


Ordered Integer List – IntSet (4)

    class IntSet {
        boolean insert(int v) {
            List newList = new List(v);
            TMObject newNode = new TMObject(newList);
            TMThread thread = (TMThread)Thread.currentThread();
            while (true) {
                thread.beginTransaction();
                boolean result = true;
                try {
                    …
                } catch (Denied d) {}
                // retry until the transaction commits
                if (thread.commitTransaction())
                    return result;
            }
        }
    }


Ordered Integer List – IntSet (5)

    try {
        List prevList = (List)this.first.open(WRITE);
        List currList = (List)prevList.next.open(WRITE);
        while (currList.value < v) {
            prevList = currList;
            currList = (List)currList.next.open(WRITE);
        }
        if (currList.value == v) {
            result = false;
        } else {
            result = true;
            newList.next = prevList.next;
            prevList.next = newNode;
        }
    }


Single entrance

•What is the problem with the previous example?

•How can it be solved?
  ▫Opening for READ on traversal
  ▫Maybe something more sophisticated?


Releasing an object

•An object that was open for READ can be released

•What does it imply?
  ▫Careful planning
  ▫Can increase performance
  ▫What happens if we open an object, release it and open it again in the same transaction?
  ▫Can lead to validation problems


Non-Blocking Algorithms

•A family of algorithms on shared data
•Each sub-family satisfies different progress guarantees
•Usually, there is a correlation between the progress guarantee strength and the complexity of the algorithm


Wait-Free algorithms

•An algorithm is wait-free if every operation has a bound on the number of steps it will take before completing

No Starvation


Lock-Free algorithms

•An algorithm is lock-free if it guarantees global progress: whenever threads keep taking steps, some operation completes within a finite number of steps

•Even if n-1 processes fail (while doing operations on the shared memory), the last process can still complete its operation

•Example: Shavit & Touitou’s STM implementation
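
A much smaller illustration of the lock-free property than an STM is a counter incremented with a compare-and-set retry loop. In the sketch below (hypothetical class `CasCounter`), a thread's CAS fails only when some other thread's CAS on the same value has succeeded, so the system as a whole always makes progress even though an individual thread may retry many times.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example of a lock-free operation built on compare-and-set.
class CasCounter {
    private final AtomicInteger value = new AtomicInteger(0);

    int increment() {
        while (true) {
            int seen = value.get();
            if (value.compareAndSet(seen, seen + 1)) {
                return seen + 1;
            }
            // CAS failed: another thread changed the value in between, so retry.
        }
    }

    int get() {
        return value.get();
    }

    /** Starts the given number of threads, each incrementing perThread times. */
    static int runDemo(int threads, int perThread) {
        CasCounter counter = new CasCounter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) counter.increment();
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter.get();
    }
}
```

The loop is not wait-free: nothing bounds how many times one particular thread may have to retry.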


Obstruction-Free algorithms

•An algorithm is obstruction-free if at any point, a single thread executed in isolation for a bounded number of steps will complete its operation

•Doesn’t avoid live-locks
•Example: DSTM implementation
•What is it good for?


Contention Manager (CM)

•The contention manager arbitrates between two conflicting transactions

•Given two (conflicting) transactions TA, TB, then CM(TA, TB):
  1. Decides who wins
  2. Decides what the loser should do (abort/wait/retry)
•Conflicts policy
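
One way to picture CM(TA, TB) is as a small pluggable interface. Everything in the sketch below is hypothetical: the interface, the `Decision` values and the two example policies are illustrations of the idea, not the DSTM contention manager API.

```java
// All names below are hypothetical; they only illustrate the CM(TA, TB) idea.
class Txn {
    final String name;
    Txn(String name) { this.name = name; }
}

interface ContentionManager {
    enum Decision { ABORT_OTHER, ABORT_SELF, WAIT }

    /**
     * Called by transaction me when it finds other active on an object.
     * attempts counts how many times me has already asked about this conflict.
     */
    Decision resolve(Txn me, Txn other, int attempts);
}

// Simplest policy: the transaction that detects the conflict always wins.
class AggressiveManager implements ContentionManager {
    public Decision resolve(Txn me, Txn other, int attempts) {
        return Decision.ABORT_OTHER;
    }
}

// Gives the other transaction a bounded number of chances to finish first.
class PatientManager implements ContentionManager {
    private final int maxWaits;

    PatientManager(int maxWaits) {
        this.maxWaits = maxWaits;
    }

    public Decision resolve(Txn me, Txn other, int attempts) {
        return attempts < maxWaits ? Decision.WAIT : Decision.ABORT_OTHER;
    }
}
```

Whatever the policy, waiting has to be bounded: obstruction freedom requires that a transaction which eventually runs without interference can finish, so a manager may not make it wait forever.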
