Local Linearizability - FORSYTE (forsyte.at/download/frida15/15sokolova-local-lineariziability)

Local Linearizability
Ana Sokolova, joint work with: Andreas Haas, Michael Lippautz, Christoph Kirsch, Tom Henzinger, Andreas Holzer, Ali Sezgin, Helmut Veith, Hannes Payer


Page 1

Local Linearizability
Ana Sokolova

joint work with: Andreas Haas, Michael Lippautz, Christoph Kirsch, Tom Henzinger, Andreas Holzer, Ali Sezgin, Helmut Veith, Hannes Payer

Page 2

Semantics of concurrent data structures

• Sequential specification = set of legal sequences


• Consistency condition = e.g. linearizability / sequential consistency

Ana Sokolova FRIDA DisCoTec 5.6.15

e.g. pools, queues, stacks

e.g. queue legal sequence: enq(1) enq(2) deq(1) deq(2)
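A legal queue sequence can be recognized by replaying it against a FIFO queue. A minimal sketch; the tuple encoding of operations is ours, not from the slides:

```python
from collections import deque

def is_legal_queue_sequence(ops):
    """Replay ops against a FIFO queue; ops is a list of
    ("enq", v) / ("deq", v) pairs (a hypothetical encoding)."""
    q = deque()
    for op, v in ops:
        if op == "enq":
            q.append(v)
        elif not q or q.popleft() != v:  # deq must return the oldest element
            return False
    return True

print(is_legal_queue_sequence([("enq", 1), ("enq", 2), ("deq", 1), ("deq", 2)]))  # True
print(is_legal_queue_sequence([("enq", 1), ("enq", 2), ("deq", 2), ("deq", 1)]))  # False
```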

e.g. linearizable queue concurrent history:
t1: enq(2) deq(1)
t2: enq(1) deq(2)

Page 3

Consistency conditions


Linearizability [Herlihy, Wing '90]: there exists a sequential witness that preserves precedence.

t1: enq(2) deq(1)
t2: enq(1) deq(2)
(witness: enq(1) enq(2) deq(1) deq(2))

Sequential Consistency [Lamport '79]: there exists a sequential witness that preserves per-thread precedence (program order).

t1: enq(1) deq(2)
t2: deq(1) enq(2)
(witness: enq(1) deq(1) enq(2) deq(2))
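For histories as tiny as these, both conditions can be checked by brute force: enumerate candidate sequential witnesses and test queue legality. A sketch of the sequential-consistency check; the history encoding and helper names are ours, and real checkers avoid the factorial enumeration:

```python
from collections import deque
from itertools import permutations

# An operation is (thread, kind, value); the per-thread order of
# operations in the list is program order. Encoding is ours.
def is_queue_legal(seq):
    q = deque()
    for _, kind, v in seq:
        if kind == "enq":
            q.append(v)
        elif not q or q.popleft() != v:
            return False
    return True

def preserves_program_order(witness, history):
    threads = {t for t, _, _ in history}
    return all([e for e in witness if e[0] == t] ==
               [e for e in history if e[0] == t] for t in threads)

def sequentially_consistent(history):
    """Brute force: some permutation is legal and keeps program order.
    Exponential, but fine for slide-sized histories."""
    return any(is_queue_legal(w) and preserves_program_order(list(w), history)
               for w in permutations(history))

# Slide example: t1: enq(1) deq(2); t2: deq(1) enq(2)
h = [("t1", "enq", 1), ("t1", "deq", 2), ("t2", "deq", 1), ("t2", "enq", 2)]
print(sequentially_consistent(h))  # True: witness enq(1) deq(1) enq(2) deq(2)
```

Linearizability would additionally constrain the witness by real-time precedence between non-overlapping operations, which this simplified encoding does not carry.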

Page 4

Performance and scalability


[Sketch: throughput vs. number of threads / cores, with curves ranging from scalable (:-))) to non-scalable (:-()]

Page 5

Relaxations allow trading correctness for performance: they provide the potential for better-performing implementations.

Page 6

Relaxing the Semantics

• Sequential specification = set of legal sequences

• Consistency condition = e.g. linearizability / sequential consistency


Quantitative relaxations [Henzinger, Kirsch, Payer, Sezgin, Sokolova, POPL '13]

Local linearizability - in this talk for queues only (feel free to ask for more)

not “sequentially correct”

too weak

Page 7

Local Linearizability main idea

• Partition a history into a set of local histories

• Require linearizability per local history


Already present in some shared-memory consistency conditions

(not in our form of choice)

Local sequential consistency… is also possible

no global witness

Page 8

Local Linearizability (queue) example


(sequential) history: enq(1) by t1, enq(2) by t2, then deq(2), deq(1) - not linearizable as a queue

t1-induced history (enq(1) … deq(1)) - linearizable

t2-induced history (enq(2) deq(2)) - linearizable

⇒ locally linearizable

Page 9

Local Linearizability (queue) definition


Queue signature ∑ = {enq(x) | x ∈ V} ∪ {deq(x) | x ∈ V} ∪ {deq(empty)}

For a history h with n threads, we put:

In_h(i) = {enq(x)_i ∈ h | x ∈ V} - the in-methods of thread i: the enqueues performed by thread i

Out_h(i) = {deq(x)_j ∈ h | enq(x)_i ∈ In_h(i)} ∪ {deq(empty)} - the out-methods of thread i: the dequeues (performed by any thread) corresponding to enqueues that are in-methods

h is locally linearizable iff every thread-induced history h_i = h | (In_h(i) ∪ Out_h(i)) is linearizable.

Page 10

Generalizations of Local Linearizability


Signature ∑

For a history h with n threads, choose:

In_h(i) - the in-methods of thread i, the methods that go into h_i

Out_h(i) - the out-methods of thread i, the methods dependent on the methods in In_h(i) (performed by any thread)

h is locally linearizable iff every thread-induced history h_i = h | (In_h(i) ∪ Out_h(i)) is linearizable.

By increasing the in-methods, LL gradually moves to linearizability.

Page 11

Where do we stand?


In general

Linearizability

Sequential Consistency

Local Linearizability

Page 12

Where do we stand?


For queues (and all pool-like data structures)

Linearizability

Sequential Consistency

Local Linearizability

Page 13

Where do we stand?


For queues

Linearizability

Sequential Consistency

Local Linearizability & Pool sequential consistency

Page 14

Properties


Local linearizability is compositional

h (over multiple objects) is locally linearizable iff each per-object subhistory of h is locally linearizable

like linearizability, unlike sequential consistency

Local linearizability is modular / “decompositional”

uses decomposition into smaller histories, by definition

allows for modular verification

Page 15

Verification (queue)


Queue sequential specification (axiomatic)

s is a legal queue sequence iff
1. s is a legal pool sequence, and
2. enq(x) <_s enq(y) ⋀ deq(y) ∈ s ⇒ deq(x) ∈ s ⋀ deq(x) <_s deq(y)

Queue linearizability (axiomatic)

h is queue linearizable iff
1. h is pool linearizable, and
2. enq(x) <_h enq(y) ⋀ deq(y) ∈ h ⇒ deq(x) ∈ h ⋀ deq(y) ≮_h deq(x)
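The two axioms translate almost verbatim into a checker for sequential histories. A sketch under our own encoding, assuming unique values and omitting deq(empty):

```python
def pool_legal(seq):
    """Axiom 1: every deq(x) is preceded by enq(x) and no value is
    dequeued twice. seq is a list of ("enq"/"deq", value) pairs."""
    enq_pos = {v: i for i, (k, v) in enumerate(seq) if k == "enq"}
    deqs = [(i, v) for i, (k, v) in enumerate(seq) if k == "deq"]
    return (len(deqs) == len({v for _, v in deqs}) and
            all(v in enq_pos and enq_pos[v] < i for i, v in deqs))

def queue_legal(seq):
    """Axiom 2: enq(x) <_s enq(y) and deq(y) in s imply deq(x) in s
    and deq(x) <_s deq(y)."""
    if not pool_legal(seq):
        return False
    enq_pos = {v: i for i, (k, v) in enumerate(seq) if k == "enq"}
    deq_pos = {v: i for i, (k, v) in enumerate(seq) if k == "deq"}
    for x in enq_pos:
        for y in deq_pos:
            if y in enq_pos and enq_pos[x] < enq_pos[y]:
                if x not in deq_pos or deq_pos[x] > deq_pos[y]:
                    return False
    return True

print(queue_legal([("enq", 1), ("enq", 2), ("deq", 1), ("deq", 2)]))  # True
print(queue_legal([("enq", 1), ("enq", 2), ("deq", 2), ("deq", 1)]))  # False
```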

Page 16

Verification (queue)


Queue sequential specification (axiomatic)

s is a legal queue sequence iff
1. s is a legal pool sequence, and
2. enq(x) <_s enq(y) ⋀ deq(y) ∈ s ⇒ deq(x) ∈ s ⋀ deq(x) <_s deq(y)

Queue local linearizability (axiomatic)

h is queue locally linearizable iff
1. h is pool locally linearizable, and
2. enq(x) <_{h_i} enq(y) ⋀ deq(y) ∈ h ⇒ deq(x) ∈ h ⋀ deq(y) ≮_h deq(x)

Page 17

Generic Implementations


Your favorite linearizable data structure implementation Φ turns into a locally linearizable implementation LLD Φ by keeping one instance of Φ per thread, in a segment of dynamic size (n):

t1 → Φ, t2 → Φ, …, tn → Φ

local inserts / global (randomly distributed) removes

LLD Φ is pool linearizable & locally linearizable.
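A minimal sketch of this construction, with a lock-protected deque standing in for the linearizable backend Φ. The class name, the random-start scan for removes, and returning None on empty are our illustration choices, not the slides':

```python
import random
import threading
from collections import deque

class LLDQueue:
    """Sketch of the LLD transformation. One instance of Φ per thread
    (here Φ is a lock-protected deque standing in for your favorite
    linearizable queue). Inserts go to the caller's own segment; removes
    start at a random segment and scan onward."""

    def __init__(self, num_threads):
        self.segments = [deque() for _ in range(num_threads)]
        self.locks = [threading.Lock() for _ in range(num_threads)]

    def enq(self, thread_id, x):
        # local insert: only thread_id's own segment is touched
        with self.locks[thread_id]:
            self.segments[thread_id].append(x)

    def deq(self):
        # global, randomly distributed remove
        n = len(self.segments)
        start = random.randrange(n)
        for k in range(n):
            i = (start + k) % n
            with self.locks[i]:
                if self.segments[i]:
                    return self.segments[i].popleft()
        return None  # all segments empty

q = LLDQueue(num_threads=2)
q.enq(0, 1); q.enq(0, 2); q.enq(1, 3)
print(q.deq() in {1, 3})  # per-segment FIFO: first deq yields 1 or 3
```

Each segment preserves its own FIFO order, which is what makes the result locally linearizable; the global order across segments is deliberately relaxed to pool behavior.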

Page 18

Performance


[Plot: million operations per sec (more is better) vs. number of threads (2-80), for MS, LCRQ, k-FIFO (k=80), LL k-FIFO (k=80), static LL DQ (p=40), dynamic LL DQ, 1-RA DQ (p=80)]

(a) FIFO queues and “queue-like” pools

[Plot: million operations per sec (more is better) vs. number of threads (2-80), for Treiber, TS-interval Stack, k-Stack (k=80), LL k-Stack (k=80), static LL DS (p=40), dynamic LL DS, 1-RA DS (p=80)]

(b) Stacks and “stack-like” pools

Figure 9: Performance and scalability of producer-consumer microbenchmarks with an increasing number of threads on a 40-core (2 hyperthreads per core) machine

The speedup of the fastest locally linearizable implementation (dynamic LL) over the fastest linearizable queue (LCRQ) and stack (TS-interval Stack) implementation at 80 threads is 1.93 and 1.79, respectively.

Note that the performance degradation for LCRQ between 30 and 70 threads aligns with the performance of fetch-and-inc, the CPU instruction that atomically retrieves and modifies the contents of a memory location, on the benchmarking machine, which differs from the original benchmarking machine [31]. LCRQ uses fetch-and-inc as its key atomic instruction.

7. Conclusions & Future Work

Local linearizability utilizes the idea of decomposition, yielding an intuitive and verifiable consistency condition for concurrent objects that enables implementations with superior performance and scalability compared to linearizable and relaxed implementations. There are at least two directions for future work: (1) further implications of decomposition for correctness; (2) program-aware correctness, i.e., identifying classes of programs that are (in)sensitive to notions of (relaxed) semantics.

References

[1] Y. Afek, G. Korland, and E. Yanovsky. Quasi-Linearizability: Relaxed Consistency for Improved Concurrency. In Proc. Conference on Principles of Distributed Systems (OPODIS), LNCS, pages 395–410. Springer, 2010.

[2] M. Ahamad, R. Bazzi, R. John, P. Kohli, and G. Neiger. The power of processor consistency. In Proc. Symposium on Parallel Algorithms and Architectures (SPAA), pages 251–260. ACM, 1993.

[3] M. Ahamad, G. Neiger, J. Burns, P. Kohli, and P. Hutto. Causal memory: definitions, implementation, and programming. Distributed Computing, 9(1):37–49, 1995. ISSN 0178-2770.

[4] D. Alistarh, J. Kopinsky, J. Li, and N. Shavit. The SprayList: A scalable relaxed priority queue. In Proc. Symposium on Principles and Practice of Parallel Programming (PPoPP), pages 11–20. ACM, 2015.

[5] J. Aspnes, M. Herlihy, and N. Shavit. Counting networks. J. ACM, 41(5):1020–1048, Sept. 1994. ISSN 0004-5411.

[6] M. Batty, M. Dodds, and A. Gotsman. Library abstraction for C/C++ concurrency. In Proc. Symposium on Principles of Programming Languages (POPL), pages 235–248. ACM, 2013.

[7] A. Bouajjani, M. Emmi, C. Enea, and J. Hamza. Tractable refinement checking for concurrent objects. In Proc. Symposium on Principles of Programming Languages (POPL), pages 651–662. ACM, 2015.

[8] S. Burckhardt, A. Gotsman, H. Yang, and M. Zawirski. Replicated data types: Specification, verification, optimality. In Proc. Symposium on Principles of Programming Languages (POPL), pages 271–284. ACM, 2014.

[9] POPL 2015 Artifact Evaluation Committee. POPL 2015 artifact evaluation. URL http://popl15-aec.cs.umass.edu/home/. Accessed on 01/14/2015.

[10] Computational Systems Group, University of Salzburg. Scal: High-performance multicore-scalable computing. URL http://scal.cs.uni-salzburg.at.

[11] J. Derrick, B. Dongol, G. Schellhorn, B. Tofan, O. Travkin, and H. Wehrheim. Quiescent Consistency: Defining and Verifying Relaxed Linearizability. In Proc. International Symposium on Formal Methods (FM), LNCS, pages 200–214. Springer, 2014.

[12] M. Dodds, A. Haas, and C. Kirsch. A scalable, correct time-stamped stack. In Proc. Symposium on Principles of Programming Languages (POPL), pages 233–246. ACM, 2015.

[13] J. Goodman. Cache consistency and sequential consistency. University of Wisconsin-Madison, Computer Sciences Department, 1991.

[14] A. Haas, T. Henzinger, C. Kirsch, M. Lippautz, H. Payer, A. Sezgin, and A. Sokolova. Distributed queues in shared memory: Multicore performance and scalability through quantitative relaxation. In Proc. International Conference on Computing Frontiers (CF). ACM, 2013.

[15] A. Heddaya and H. Sinha. Coherence, non-coherence and local consistency in distributed shared memory for parallel computing. Technical report, Computer Science Department, Boston University, 1992.

[16] S. Heller, M. Herlihy, V. Luchangco, M. Moir, W. Scherer, and N. Shavit. A lazy concurrent list-based set algorithm. In Proc. Conference on Principles of Distributed Systems (OPODIS), LNCS. Springer, 2005.

[17] J. Hennessy and D. Patterson. Computer Architecture, Fifth Edition: A Quantitative Approach. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 5th edition, 2011. ISBN 9780123838728.

[18] T. Henzinger, C. Kirsch, H. Payer, A. Sezgin, and A. Sokolova. Quantitative relaxation of concurrent data structures. In Proc. Symposium on Principles of Programming Languages (POPL), pages 317–328. ACM, 2013.

[19] T. Henzinger, A. Sezgin, and V. Vafeiadis. Aspect-oriented linearizability proofs. In Proc. International Conference on Concurrency Theory (CONCUR), LNCS, pages 242–256. Springer, 2013.


MS queue vs. LLD MS queue: LLD MS queue performs significantly better than MS queue.

Page 19

Performance


MS queue vs. LLD MS queue: LLD Φ performs significantly better than Φ.

Page 20

Performance


MS queue vs. LLD MS queue: LLD MS queue performs better than the best known pools.

Thank You!