Two Techniques for Proving Lower Bounds

Hagit Attiya, Technion

Goal of this Presentation

• Describe two common techniques for proving lower bounds in distributed computing:
  ▫ Information theory arguments
  ▫ Covering
• Variations
• Applications

My always first slide…

[Figure: diagram with the labels problem, algorithm, nicer system architecture, implementation, real system architecture]

Part I: Information Theory Arguments

Overview

• Bound the flow of information among processes (and memory)
• Show that information takes long to be acquired
• Argue that solving a particular problem requires information about many processes
• Usually applies to:
  ▫ Shared memory systems
  ▫ Synchronous executions (imply lower bounds also for asynchronous executions)
• Details depend on the primitives used

Single-writer registers: Possible argument

• Need to read from each process
• The state of a process can be found only in its own register
• Hence, first process must read n registers

Not really

When processes take steps together, the first process doubles its information in the 2nd step

But can’t do better than that

More Refined Argument

• Consider synchronized executions
  ▫ Processes take steps in rounds
  ▫ All reads appear before all writes
• INF(pi,t-1): The set of inputs influencing process pi at the start of round t
  ▫ For t = 1, INF(pi,t-1) = {pi}
  ▫ For t > 1, if pi reads a value written by pj, INF(pi,t) = INF(pi,t-1) ∪ INF(pj,t-1)
  ▫ For t > 1, if pi writes, INF(pi,t) = INF(pi,t-1)
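The definition of INF can be traced mechanically. The following is a minimal Python sketch, not part of the original slides. It assumes a hypothetical schedule format (for each round, a map from a reader to the process whose register it reads; processes absent from the map write), and it tracks only the sets given by the recurrence, not register contents.

```python
def influence_sets(n, schedule):
    """Track INF(p_i, t) over a synchronized execution.

    schedule[t-1] maps a reader i to the process j whose register it
    reads in round t; processes absent from the map write in that round.
    (This schedule format is an assumption made for illustration.)
    """
    history = [[{i} for i in range(n)]]        # INF(p_i, 0) = {p_i}
    for reads in schedule:
        prev = history[-1]
        cur = []
        for i in range(n):
            if i in reads:                      # p_i reads a value written by p_j
                cur.append(prev[i] | prev[reads[i]])
            else:                               # p_i writes: nothing new is learned
                cur.append(set(prev[i]))
        history.append(cur)
    return history


def doubling_schedule(n, rounds):
    # Hypothetical schedule: in round t+1, process i reads from the
    # process 2^t positions ahead (cyclically).
    return [{i: (i + 2 ** t) % n for i in range(n)} for t in range(rounds)]


history = influence_sets(8, doubling_schedule(8, 3))
sizes = [max(len(s) for s in level) for level in history]
print(sizes)  # [1, 2, 4, 8]
```

In this schedule every influence set doubles in each round, which is the fastest growth the recurrence permits.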

INF determines the state

Lemma: If the states of the processes in INF(pi,t-1) are the same in configurations C and C’, then pi takes the same steps in a t-round execution from C and from C’

Proof by case analysis

Size of INF

• I(t) = max |INF(pi,t)| (max over all processes pi)

Lemma: I(0) = 1, and I(t) ≤ 2 I(t-1)

⇒ I(t) ≤ 2^t
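As a short worked step (added here, using only the lemma), unrolling the recurrence gives the closed form, and hence I(t) < n whenever t < log n:

```latex
I(t) \le 2\,I(t-1) \le 2^{2}\,I(t-2) \le \dots \le 2^{t}\,I(0) = 2^{t},
\qquad\text{so}\qquad t < \log_2 n \;\Longrightarrow\; I(t) < n .
```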

Simple application: Computing OR

• Consider the input configuration C0 = (0, 0, …, 0, …, 0)
• The size of the influence set of a process is < n in all rounds < log n
• Some process pi is not in INF(p1, log n - 1)
• By the lemma, p1 returns the same value in C0 and in C1 = (0, 0, …, 1, …, 0), where only the input of pi is 1

A contradiction (the OR is 0 in C0 and 1 in C1)

Application: Approximate agreement

For a small ε > 0
• Processes start with input in [0,1]
• Must decide on an output in [0,1] such that
  ▫ All outputs are within ε of each other (agreement)
  ▫ If all inputs are v, the output is v (validity)

System is asynchronous and a process must decide even if it runs by itself (solo termination)

Application: Approximate agreement [Attiya, Shavit, Lynch]

• Consider the input configuration C0 = (0, 0, …, 0)
• Run all processes to completion from C0: all must decide 0
• If the number of rounds T < log n ⇒ I(T) < n ⇒ ∃ process pi ∉ INF(p1,T)

Approximate agreement (cont.)

• Consider two input configurations C0 = (0, …, 0) and C1 = (0, …, 1, …, 0), where only the input of pi is 1
• Run pi to completion: it must decide 1
• pi ∉ INF(p1,T) ⇒ p1 still decides 0 when running from this configuration, contradicting agreement

Theorem: Solo-terminating approximate agreement requires Ω(log n) rounds in a synchronous failure-free run


This is the overhead of solo termination, paid even in “nice” runs: without this requirement, a synchronous algorithm can solve the problem in one round.

With multi-writer registers

• Previous theorem does not hold
• A wait-free approximate agreement algorithm that takes O(1) rounds in “nice” executions [Schenk]
• Even simpler: An O(1) OR algorithm


• Only a few initial configurations to distinguish between

Can you find it?

This is the overhead of single-writer registers: the bound separates single-writer from multi-writer registers
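One possible answer to the O(1) OR question, offered as a hedged sketch rather than as the algorithm the speaker had in mind: with a single multi-writer register R (assumed to be initially 0), every process with input 1 writes 1 to R, and then every process reads R. The function name and the sequential simulation of the two rounds are illustrative assumptions.

```python
def or_with_multiwriter_register(inputs):
    """Two-round OR using one multi-writer register (assumed initially 0).

    Round 1: every process whose input is 1 writes 1 to R.
    Round 2: every process reads R and returns its value.
    """
    R = 0                                   # shared multi-writer register
    for x in inputs:                        # round 1: all writers write the same value
        if x == 1:
            R = 1
    return [R for _ in inputs]              # round 2: all processes read R


print(or_with_multiwriter_register([0, 0, 1, 0]))  # [1, 1, 1, 1]
print(or_with_multiwriter_register([0, 0, 0, 0]))  # [0, 0, 0, 0]
```

Because all writers write the same value, the order of the writes within round 1 does not matter, and the number of rounds is independent of n.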

Information flow with multi-writer registers

The previous argument does not hold

Instead, consider how learning more information allows a process to differentiate between input configurations

Capture this as a partitioning of process states and memory values [Beame]

[Figure: example input configurations (0, …, 0), (0, …, 1, …, 0), (1, …, 1, …, 0) and (0, …, 0, …, 1)]

Multi-writer registers: Ordering events

Within each round
• Put all reads, then
• Put all writes

⇒ Reads obtain the value written at the end of the previous round

Partitioning into equivalence classes

For process p and round t, two input configurations are in the same equivalence class of P(p,t) if p is in the same state after t rounds from both (in a synchronous failure-free execution)

P(t): the number of classes after t rounds (max over p)

V(R,t), V(t) defined similarly for locations R

Lemma: P(t) ≤ P(t-1) V(t-1) and V(t) ≤ n P(t-1) + V(t-1)

⇒ P(t), V(t) ≤ (4n+2)^(2^(t−2))
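The recurrences in the lemma can be iterated numerically to see how fast the number of classes can grow. The sketch below is an illustration added here, under two labeled assumptions: the starting values P(0) = 2 and V(0) = 1 (two possible inputs per process, one fixed initial value per location), and the closed-form bound read with exponent 2^(t−2) and checked only for t ≥ 2.

```python
def class_counts(n, rounds, p0=2, v0=1):
    """Iterate the lemma's recurrences, treating each bound as an equality.

    p0 = 2 and v0 = 1 are assumptions: two possible inputs per process and
    a fixed initial value in every location.
    """
    P, V = [p0], [v0]
    for _ in range(rounds):
        p, v = P[-1], V[-1]
        P.append(p * v)            # P(t) <= P(t-1) * V(t-1)
        V.append(n * p + v)        # V(t) <= n * P(t-1) + V(t-1)
    return P, V


n = 4
P, V = class_counts(n, 4)
for t in range(2, 5):
    bound = (4 * n + 2) ** (2 ** (t - 2))   # assumed reading of the closed form
    print(t, P[t], V[t], bound)
```

For n = 4 the worst-case values stay below the assumed bound in every round from t = 2 on, and P(2) meets it exactly (18 = 4n + 2).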

Application: The collect problem

• update(v) stores v as the latest value of a process
• collect() returns a set of values (one per process)

When each process initially stores one of two values ⇒ there are 2^n possible input configurations, each leading to a different output

Previous lemma implies (4n+2)^(2^(t−2)) ≥ P(t) ≥ 2^n

⇒ Must have Ω(log n) rounds
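The last implication can be spelled out by taking logarithms. This worked step is added here and assumes the exponent of the bound is read as 2^(t−2):

```latex
2^{n} \le P(t) \le (4n+2)^{2^{t-2}}
\;\Longrightarrow\; n \le 2^{t-2}\,\log_2(4n+2)
\;\Longrightarrow\; t \ge 2 + \log_2 n - \log_2\log_2(4n+2) = \Omega(\log n)
```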

Also for other primitives (CAS)

Non-reading CAS:

    CAS(R, old, new) {
        if R == old then
            R = new
            return success
        else
            return fail
    }

Reading CAS returns the old value (can be handled, but we won’t do that)

Can also extend to non-reading kCAS

Careful with CAS

More information flow in a sequence of steps

Initially, R == 0

cas(R,0,1) cas(R,1,2) . . . cas(R,n−1,n)

On the other hand

cas(R,n−1,n) cas(R,n−2,n−1) . . . cas(R,0,1)
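The difference between the two orders can be checked with a small simulation. This is an illustrative Python sketch (the dict-based memory and the helper `run` are assumptions, not part of the slides): in increasing order every CAS succeeds, so R ends up depending on all n steps, while in decreasing order only the last CAS succeeds.

```python
def cas(memory, reg, old, new):
    """Non-reading CAS on a dict-based memory: reports only success or failure."""
    if memory[reg] == old:
        memory[reg] = new
        return True
    return False


def run(order):
    """Apply cas(R, old, old + 1) for each `old` in the given order, starting from R == 0."""
    memory = {"R": 0}
    results = [cas(memory, "R", old, old + 1) for old in order]
    return results.count(True), memory["R"]


n = 5
print(run(range(n)))             # (5, 5): every CAS succeeds
print(run(reversed(range(n))))   # (1, 1): only cas(R,0,1) succeeds
```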

Ordering events within a round

Put all reads first. Put all writes last.

For every register R whose current value is v, consider all CAS events:
▫ Put all events with old ≠ v: all fail
▫ Put all events with old == v: only the first succeeds (assumes operations are non-degenerate)

This allows proving a lemma analogous to the one for multi-writer registers (with different constants)
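The ordering rule for CAS events on one register can be sketched as follows. This is a minimal illustration with hypothetical names (`order_cas_events`, the `(process, old, new)` tuples); it takes "non-degenerate" to mean old ≠ new, which is an assumption.

```python
def order_cas_events(value, events):
    """Order one round's CAS events on a register holding `value`.

    events: list of (process, old, new) with old != new (assumed meaning
    of "non-degenerate"). Events whose `old` differs from the current value
    go first and all fail; events whose `old` equals it follow, and only
    the first of them succeeds.
    Returns (ordered events, outcomes, final value).
    """
    failing = [e for e in events if e[1] != value]
    matching = [e for e in events if e[1] == value]
    ordered = failing + matching
    outcomes = []
    for _, old, new in ordered:
        if value == old:
            value = new
            outcomes.append("success")
        else:
            outcomes.append("fail")
    return ordered, outcomes, value


ordered, outcomes, final = order_cas_events(
    0, [("p1", 0, 1), ("p2", 1, 2), ("p3", 0, 3)]
)
print(outcomes, final)  # ['fail', 'success', 'fail'] 1
```

With this order at most one CAS per register succeeds in a round, so a register changes value at most once per round through CAS.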

Information Flow with Bounded Fan-In

Arbitrary objects, but bounded contention
▫ Not too many processes access the same base object simultaneously

Isolate processes in a Q-independent execution
▫ Only processes in Q take steps
▫ Each accesses only objects not modified by other processes in Q

For a process p ∈ Q, a Q-independent execution is indistinguishable from a p-solo execution

Constructing independent executions

Lemma: For any algorithm using only objects with contention ≤ w and every t ≥ 0, there is a t-round Q_t-independent execution, with |Q_t| ≥ n/(w+2)^t

Proof by induction, with a trivial base case.

Induction step: consider a Q_t-independent execution. Look at the next steps processes in Q_t are about to perform, and construct an undirected graph (V,E). We use the following result from graph theory.

Turán’s theorem: Any graph (V,E) has an independent set of size at least |V|^2/(|V|+2|E|)

Induction step: The graph

• V = Q_t
• E contains an edge {pi, pj} if
  ▫ pi and pj access the same object, or
  ▫ pi is about to read an object modified by pj, or
  ▫ pj is about to read an object modified by pi

|E| ≤ |Q_t|(w+1)/2

By Turán’s theorem and the inductive hypothesis, there is an independent set Q_{t+1} of size ≥ |Q_t|/(w+2) ≥ n/(w+2)^(t+1)

Omit all steps of Q_t – Q_{t+1} from the execution to get a Q_{t+1}-independent execution
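The size bound in the induction step follows by substituting the edge bound into Turán's theorem and then applying the inductive hypothesis. As a worked step (added here):

```latex
|Q_{t+1}| \;\ge\; \frac{|Q_t|^{2}}{|Q_t| + 2|E|}
\;\ge\; \frac{|Q_t|^{2}}{|Q_t| + |Q_t|\,(w+1)}
\;=\; \frac{|Q_t|}{w+2}
\;\ge\; \frac{n}{(w+2)^{t+1}}
```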

Application: Weak Test&Set

Weak test&set: Like test&set but at most one success

Take t such that (w+2)^t < n. The lemma gives a t-round {pi,pj}-independent execution

• Each of pi and pj seems to be running solo ⇒ must succeed. Contradiction

Theorem: The solo step complexity of weak test&set is Ω(log n / log w)

Part II: Covering

Covering: The basic idea

Several processes write to the same location

Writes by early processes are lost, if no read in between

⇒ Must write to distinct locations

Other process must read these locations

Max Register

• WriteMax(v,R) operation
• ReadMax operation op returns the maximal value written by a WriteMax operation that
  ▫ completed before op started, or
  ▫ overlaps op
• Special case of a linearizable object

Lower bound for ReadMax operation

[Jayanti, Tan, Toueg]

The proof is constructive

Theorem: ReadMax must read n different registers.

Construction for the lower bound

Proof by induction on k = 0, …, n

[Figure: an execution αk βk γk]
▫ αk: p1 … pk perform WriteMax operations
▫ βk: writes by p1 … pk to R1 … Rk
▫ γk: pn performs a ReadMax operation, and reads R1 … Rk

Base case is simple

Taking k = n yields the result

Inductive Step

[Figure, shown in three builds: the execution αk βk γk (p1 … pk perform WriteMax operations; writes by p1 … pk to R1 … Rk; pn performs a ReadMax operation), and below it the execution αk πk βk γk, in which pk+1 takes steps (πk) after αk]

▫ pk+1 must write to some R ∉ {R1 … Rk}; otherwise γk does not observe pk+1
▫ pn must read some R ∉ {R1 … Rk}
▫ πk ends when pk+1 is about to write to Rk+1

Claim follows with R1 … Rk, Rk+1 and αk+1 = αk πk

Swap objects

Theorem holds for other primitives and objects, e.g., (register-to-memory) swap

    swap(R, v) {
        tmp = R
        R = v
        return tmp
    }

Need some care in constructing πk, γk

Result holds also for other objects

• E.g., counters
• Constructed execution contains many increment operations
• Better algorithms when
  ▫ Few increment operations
  ▫ Max register holds bounded values
  [Aspnes, Attiya, Censor-Hillel]

Counters with CAS

Counters can be implemented with a single location R, and a single CAS per operation:
• To increment, simply:
  ▫ read previous value from R
  ▫ CAS +1 to R
• To read the counter, simply read R

Lots of contention on R! This is inevitable
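The counter described above can be sketched in Python. This is a hedged illustration, not the slide's exact algorithm: a lock stands in for the atomicity of a hardware CAS, and the retry loop on a failed CAS is an added assumption (the slide mentions a single CAS per operation). The class names are hypothetical.

```python
import threading


class CasRegister:
    """A single location R with read and CAS; a lock emulates hardware atomicity."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self._value

    def cas(self, old, new):
        with self._lock:
            if self._value == old:
                self._value = new
                return True
            return False


class CasCounter:
    def __init__(self):
        self.R = CasRegister(0)

    def increment(self):
        while True:                      # retry loop: an assumption, not from the slide
            prev = self.R.read()         # read previous value from R
            if self.R.cas(prev, prev + 1):
                return

    def read(self):
        return self.R.read()             # reading the counter is one read of R


def worker(counter, times):
    for _ in range(times):
        counter.increment()


counter = CasCounter()
threads = [threading.Thread(target=worker, args=(counter, 1000)) for _ in range(4)]
for th in threads:
    th.start()
for th in threads:
    th.join()
print(counter.read())  # 4000
```

Every operation touches the same location R, which is the contention the slide refers to.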

The memory stalls measure [Dwork, Herlihy, Waarts]

If k processes access (or modify) the same location at the same configuration
▫ The first process incurs one step, and no stalls
▫ The second process incurs one step, and one stall
▫ …
▫ The k’th process incurs one step, and k-1 stalls
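The accounting rule above is simple enough to state as code. A tiny sketch (the function name is hypothetical) that lists the stalls charged to each of k processes poised on one location:

```python
def stalls_at_location(k):
    """Stalls charged to each of k processes accessing one location in the same configuration."""
    return list(range(k))        # the i-th process (1-based) incurs i - 1 stalls


per_process = stalls_at_location(4)
print(per_process, sum(per_process))  # [0, 1, 2, 3] 6
```

The last of the k processes alone incurs k - 1 stalls, and the total over all k processes is k(k-1)/2.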

Lower bound on number of stalls

Theorem: ReadCounter must incur n stalls + steps.

[Figure: p1 … pk perform Increment operations and end poised on R1 … Rm, m ≤ k; then pn performs a ReadCounter operation that accesses R1 … Rm]

Similar construction as in previous theorem

⇒ The ReadCounter operation by pn incurs k stalls + steps

Wrap-up

• There are many lower bound results, but fewer techniques…
• Some results & techniques are relevant to questions asked in Transform
• Material is based on monograph-in-writing with Faith Ellen
  ▫ Let me know if you want to proof-read it!
