motivation: reliable software

Parallel and Distributed Systems Laboratory Paradise: A Toolkit for Building Reliable Concurrent Systems On Building Reliable Concurrent Systems Vijay K. Garg Professor, Department of ECE and CS Director, PDSL The University of Texas at Austin Austin, TX 78712

Post on 04-Jan-2016


Page 1: Motivation: Reliable Software

Parallel and Distributed Systems Laboratory

Paradise: A Toolkit for Building Reliable Concurrent Systems

On Building Reliable Concurrent Systems

Vijay K. Garg

Professor, Department of ECE and CS
Director, PDSL

The University of Texas at Austin
Austin, TX 78712

Page 2: Motivation: Reliable Software

Motivation: Reliable Software

Multithreaded and distributed programs are prone to errors.

– Concurrency, nondeterminism, process and channel failures

Techniques to ensure program correctness

Before program development: Model Checking

During: Testing and Debugging

After: Software Fault-Tolerance

Page 3: Motivation: Reliable Software

Paradise Approach

Key Abstraction: Global Properties

Model Checking and Verification: Check global properties against a model of the program (Promela)

Testing and Debugging: Global breakpoints, Trace analysis

Software Fault-Tolerance: Monitoring for global properties, Controlled Reexecution

Page 4: Motivation: Reliable Software

Talk Outline

Motivation

Monitoring Distributed Systems

– Clock : Tracking Dependency

– Camera : Global Snapshot (Checkpoint)

– Sensor : Detecting Global Properties

– Slicer : Computation Slicing

– Supervisor: Controlling Execution

Other Projects at PDSL

Page 5: Motivation: Reliable Software

Paradise Environment

[Figure: the Paradise environment. Boxes: Program, Monitor, Slicer, Predicate. Arrows between Program and Monitor: Observe, Control.]

Page 6: Motivation: Reliable Software

Trace Model: Total Order vs Partial Order

Total order: interleaving of events in a trace

Partial order: Lamport’s happened-before model

Specification: CS1 ∧ CS2

[Figure: a partial order trace on processes P1 and P2 with events e1, e2, f1, f2 and critical sections CS1 and CS2, shown next to two of its interleavings: a "Successful Trace" and a "Faulty Trace". States are labeled with ¬CS1 and ¬CS2.]

Page 7: Motivation: Reliable Software

Tracking Dependency

Computation: a set of events ordered by the “happened before” relation

Problem: Timestamp events to answer

– e happened before f ?

– e concurrent with f ?

Page 8: Motivation: Reliable Software

Clocks in a Distributed System

Vector Clocks [Fidge 89, Mattern 89]

Result: s happened before t iff the vector at s is less than the vector at t.

[Figure: vector timestamps on three processes]

P1: (1,0,0) (2,1,0) (3,1,0)

P2: (0,1,0) (0,2,0)

P3: (0,0,1) (0,0,2) (2,1,3)
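
The timestamps in the figure can be reproduced with a short sketch. It assumes one common vector-clock variant (component i starts at 1 on Pi, a message carries the sender's current vector, and the sender ticks after sending) and a message pattern inferred from the timestamps (P2 to P1, then P1 to P3). The class and method names are illustrative and do not come from the talk.

```python
from typing import List, Tuple

Vector = Tuple[int, ...]


class Process:
    """A process whose states are timestamped with a vector clock."""

    def __init__(self, pid: int, n: int) -> None:
        self.pid = pid
        self.v: List[int] = [0] * n
        self.v[pid] = 1                      # assumed initial value
        self.history: List[Vector] = [tuple(self.v)]

    def _tick(self) -> None:
        self.v[self.pid] += 1
        self.history.append(tuple(self.v))

    def internal(self) -> None:
        self._tick()

    def send(self) -> Vector:
        tag = tuple(self.v)                  # the message carries the current vector
        self._tick()
        return tag

    def receive(self, tag: Vector) -> None:
        # Component-wise maximum with the message tag, then tick.
        self.v = [max(a, b) for a, b in zip(self.v, tag)]
        self._tick()


def happened_before(s: Vector, t: Vector) -> bool:
    """s < t: no component of s exceeds t, and the vectors differ."""
    return all(a <= b for a, b in zip(s, t)) and s != t


def concurrent(s: Vector, t: Vector) -> bool:
    return not happened_before(s, t) and not happened_before(t, s)


p1, p2, p3 = Process(0, 3), Process(1, 3), Process(2, 3)
m1 = p2.send()        # P2 -> P1 (inferred)
p1.receive(m1)
m2 = p1.send()        # P1 -> P3 (inferred)
p3.internal()
p3.receive(m2)

for name, p in (("P1", p1), ("P2", p2), ("P3", p3)):
    print(name, p.history)
print(happened_before((0, 1, 0), (2, 1, 3)))   # True
print(concurrent((0, 2, 0), (3, 1, 0)))        # True
```

Comparing two vectors costs O(n) per query, which is one reason the next slide looks at scalability.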

Page 9: Motivation: Reliable Software

Dynamic Chain Clocks

Problem with vector clocks: scalability, dynamic process structure

Idea: Computing the “chains” in an online fashion [Aggarwal and Garg PODC 05] for relevant events

[Figure: left, a computation with 4 processes (P1, P2, P3, P4) and events a to h; right, the relevant subcomputation.]

Page 10: Motivation: Reliable Software

Experimental Results

Simulation of a computation with 1% relevant events

Measured

– number of components vs number of threads

– total time overhead vs number of threads

Page 11: Motivation: Reliable Software

Talk Outline

Motivation and Overview

Instrumentation

– Clock : Tracking Dependency

Property Checking

– Camera : Global Snapshot (Checkpoint)

– Sensor : Detecting Global Properties

– Slicer : Computation Slicing

Page 12: Motivation: Reliable Software

Global Snapshot

Problem: Compute a global snapshot/checkpoint of the system (state of the processes and the channels)

Motivation

– Checkpointing for fault tolerance

– Distributed debugging

– Detecting stable predicates

Page 13: Motivation: Reliable Software

Key Difficulties: Taking care of messages

Two sites A and B with $400 each. Site A sends a message with $100 to site B.

Problem 1: Inconsistent state

– checkpoint (A) before message sent: $400

– message received before checkpoint (B): $500

Problem 2: Messages in transit

– message sent before checkpoint (A): $300

– checkpoint (B) before message received: $400

[Figure: processes P1, P2, P3 with two global states G1 and G2.]
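
The arithmetic behind the two problems can be checked with a small sketch. The function and field names are hypothetical; the amounts are the ones on the slide. A snapshot is treated as inconsistent when it records the receive of a message without its send, and the channel state is the money sent but not yet received according to the two checkpoints.

```python
from typing import NamedTuple

INITIAL = 400    # starting balance at each site
TRANSFER = 100   # amount A sends to B


class Snapshot(NamedTuple):
    a: int             # balance recorded at A
    b: int             # balance recorded at B
    channel: int       # money in transit according to the two checkpoints
    consistent: bool   # False if a receive is recorded without its send


def snapshot(a_after_send: bool, b_after_receive: bool) -> Snapshot:
    a = INITIAL - TRANSFER if a_after_send else INITIAL
    b = INITIAL + TRANSFER if b_after_receive else INITIAL
    channel = TRANSFER if (a_after_send and not b_after_receive) else 0
    consistent = not (b_after_receive and not a_after_send)
    return Snapshot(a, b, channel, consistent)


problem1 = snapshot(a_after_send=False, b_after_receive=True)
problem2 = snapshot(a_after_send=True, b_after_receive=False)

print(problem1, problem1.a + problem1.b)                      # total 900: inconsistent
print(problem2, problem2.a + problem2.b)                      # total 700 without the channel
print(problem2.a + problem2.b + problem2.channel)             # 800 once the channel is recorded
```

Problem 1 cannot be repaired after the fact, while Problem 2 only requires that the snapshot also record the channel state.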

Page 14: Motivation: Reliable Software

Current Algorithms

Key idea: white/red processes and messages

A process must be red to act on a red message

Record white messages received by red processes

[Chandy and Lamport 85] A "marker" message on every channel

[Mattern 89, SBF+ 04] Number of white messages sent on every channel

O(N^2) messages for a completely connected topology
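
A minimal single-threaded simulation of the marker idea, assuming FIFO channels and a fully connected set of processes. The `Snapshotter` class and its method names are hypothetical, written only to illustrate the white/red rule: a process turns red when it records its state, and a red process records white messages that arrive before the marker on each incoming channel.

```python
from collections import deque

MARKER = "marker"


class Snapshotter:
    def __init__(self, balances):
        self.balance = dict(balances)
        names = list(balances)
        # One FIFO queue per ordered pair of processes.
        self.chan = {(s, d): deque() for s in names for d in names if s != d}
        self.recorded = {}     # process -> recorded local state (process is red)
        self.chan_state = {}   # channel -> white messages recorded for it
        self.open = set()      # channels whose recording is still in progress

    def transfer(self, src, dst, amount):
        self.balance[src] -= amount
        self.chan[(src, dst)].append(amount)

    def start(self, p):
        """Record p's state, send a marker on every outgoing channel,
        and start recording every incoming channel."""
        self.recorded[p] = self.balance[p]
        for (s, d), queue in self.chan.items():
            if s == p:
                queue.append(MARKER)
            elif d == p:
                self.chan_state[(s, d)] = []
                self.open.add((s, d))

    def deliver(self, src, dst):
        msg = self.chan[(src, dst)].popleft()
        if msg == MARKER:
            if dst not in self.recorded:
                self.start(dst)            # turn red before acting on a red message
            self.open.discard((src, dst))  # this channel's recording is complete
        else:
            self.balance[dst] += msg
            if (src, dst) in self.open:    # white message received by a red process
                self.chan_state[(src, dst)].append(msg)


s = Snapshotter({"A": 400, "B": 400})
s.transfer("A", "B", 100)
s.transfer("B", "A", 50)
s.start("A")
s.deliver("B", "A")   # $50 arrives at red A and is recorded as in transit
s.deliver("A", "B")   # $100 arrives at white B
s.deliver("A", "B")   # marker: B records its state
s.deliver("B", "A")   # marker: A stops recording channel B -> A

total = sum(s.recorded.values()) + sum(sum(m) for m in s.chan_state.values())
print(s.recorded, s.chan_state, total)
```

With N processes, each snapshot sends one marker per channel, which is where the O(N^2) count for a completely connected topology comes from.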

Page 15: Motivation: Reliable Software

Our Algorithms

Joint work with Rahul Garg and Yogish Sabharwal, IBM IRL (ACM International Conference on Supercomputing 2006)

– Lower bound: Ω(N log w) messages

– Implementation: Blue Gene/L with MPI

– w: average number of messages in transit

Algorithm      Number of messages    Message size
Grid based     O(N^(3/2))            O(N^(1/2) log w)
Tree based     O(N log N log w)      O(log w + log N)
Centralized    O(N log w)            O(log w + log N)

Page 16: Motivation: Reliable Software

Global Property Detection

Predicate: A global condition expressed using variables on processes

– e.g., more than one process is in critical section, there is no token in the system

Problem: find a global state that satisfies the given predicate

[Figure: processes P1 and P2 with critical sections and two global states G1 and G2.]

Page 17: Motivation: Reliable Software

The Main Difficulty in Partial Order

Algorithm for general predicate [Cooper and Marzullo 91]

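Too many global states: a computation may contain as many as O(k^n) global states

• k: maximum number of events on a process

• n: number of processes

[Figure: a computation on processes P1 and P2 with events e1, e2, f1, f2 and initial event ┴, and its lattice of global states, listed here from bottom to top:]

{┴}

{e1, ┴} {f1, ┴}

{e2, e1, ┴} {e1, f1, ┴}

{e2, e1, f1, ┴} {e1, f2, f1, ┴}

{e2, e1, f2, f1, ┴}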
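
The global states of this small example can be enumerated by brute force. The sketch below assumes the e events are on one process and the f events on the other, and that there is a single message from e1 to f2; that message is inferred from the fact that {f2, f1, ┴} is the only candidate missing from the listed states. The initial event ┴ is left implicit, and the function name is illustrative.

```python
from itertools import product

events = {"P1": ["e1", "e2"], "P2": ["f1", "f2"]}
messages = [("e1", "f2")]   # (send, receive); inferred, not stated in the talk


def consistent_cuts(events, messages):
    """Return every prefix-closed set of events that contains the send
    of each message whose receive it contains."""
    procs = list(events)
    cuts = []
    # Try every combination of "number of events executed" per process.
    for counts in product(*(range(len(events[p]) + 1) for p in procs)):
        cut = set()
        for p, c in zip(procs, counts):
            cut.update(events[p][:c])
        if all(send in cut for send, recv in messages if recv in cut):
            cuts.append(frozenset(cut))
    return cuts


cuts = consistent_cuts(events, messages)
for cut in sorted(cuts, key=lambda c: (len(c), sorted(c))):
    print(sorted(cut))
print(len(cuts))   # 8 consistent global states out of 9 candidates
```

The candidate count is (k+1)^n, so this brute-force loop is only workable for tiny traces, which is the difficulty the slide points at.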

Page 18: Motivation: Reliable Software

Efficient Predicate Detection for Special Cases

stable predicate [Chandy and Lamport 85]

• once the predicate becomes true, it stays true, e.g., deadlock

unstable predicate:

• observer independent predicate [Charron-Bost et al 95]: occurs in one interleaving ⇒ occurs in all interleavings, e.g., any disjunction of local predicates

• linear predicate [Chase and Garg 95], e.g., conjunctive predicates such as “there is no leader in the system”

• relational predicate: x1 + x2 + … + xn ≥ k [Chase and Garg 95], e.g., violation of k-mutual exclusion

Page 19: Motivation: Reliable Software

Linear Predicates

The set of consistent cuts that satisfy a linear predicate is closed under intersection [Chase and Garg 95].

Examples:

– conjunctive predicates: critical1 and critical2

– channel predicates: “all channels are empty”, “there are exactly k messages in the channel from process P to Q”

– some relational predicates: x1 + x2 ≤ k, when xi is monotonically non-decreasing

[Figure: two consistent cuts W and X and their intersection W ∩ X.]

Page 20: Motivation: Reliable Software

Conjunctive Predicates

A predicate that can be expressed as l1 ∧ l2 ∧ … ∧ ln, where li is local to Pi.

Detect errors that may be hidden in some run due to race conditions.

Examples:

– mutual exclusion problem: (P1 in CS) and (P2 in CS)

– missing primary: (P1 is secondary) and (P2 is secondary) and (P3 is secondary)

Importance: Sufficient for detection of any boolean expression of local predicates

Page 21: Motivation: Reliable Software

Conjunctive Predicates: Centralized Algorithm

(l1 ∧ l2 ∧ … ∧ ln) is true iff there exist states si in Pi such that li is true in si, and si and sj are incomparable (neither happened before the other) for all distinct i, j.
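
The condition above suggests a simple elimination loop for a checker, sketched here under the assumption that every process reports, in order, the vector timestamps of the states in which its local predicate holds. A reported state that happened before another candidate can never be part of a solution, so it is discarded. The function names and the sample timestamps are illustrative only.

```python
from collections import deque


def happened_before(s, t):
    """Vector-clock order: s <= t component-wise and s != t."""
    return all(a <= b for a, b in zip(s, t)) and s != t


def detect_conjunctive(queues):
    """queues[i] lists, in process order, the vector timestamps of the
    states of Pi where li is true. Returns one state per process such
    that all are pairwise incomparable, or None if no such states exist."""
    qs = [deque(q) for q in queues]
    while all(qs):                       # stop as soon as some queue is empty
        heads = [q[0] for q in qs]
        stale = {i for i, s in enumerate(heads)
                 for t in heads if happened_before(s, t)}
        if not stale:
            return heads                 # pairwise incomparable: predicate detected
        for i in stale:
            qs[i].popleft()              # this state can never pair with the others
    return None


# P1 is in its critical section at (1,0) and (3,0); P2 at (2,2).
print(detect_conjunctive([[(1, 0), (3, 0)], [(2, 2)]]))   # [(3, 0), (2, 2)]
# P1's only candidate happened before P2's, so there is no violation.
print(detect_conjunctive([[(2, 0)], [(2, 2)]]))           # None
```

Each pass removes at least one reported state or terminates, so the loop ends after at most as many passes as there are reported states.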

Page 22: Motivation: Reliable Software

Algorithms for Conjunctive Predicates

Centralized Algorithm [Garg and Waldecker 92]

Each non-checker process maintains its local vector and sends the chain clock to the checker process whenever

– the local predicate is true

– at most once in each message interval.

Time complexity: the checker requires at most O(n^2 m) comparisons.

Other algorithms:

– token based algorithm [Garg and Chase 95]

– completely distributed algorithm [Garg and Chase 95]

– keeping queues shorter [Chiou and Korfhage 95]

– avoiding control messages [Hurfin, Mizuno, Raynal, Singhal 96]

Page 23: Motivation: Reliable Software

Other Special Classes of Predicates

Relational Predicates

– Let xi be the number of tokens at Pi

– Σxi < k: loss of tokens

– Algorithms: max-flow techniques [Groselj 93, Chase and Garg 95, Wu and Chen 98]

– Dilworth's partition [Tomlinson and Garg 96]

Page 24: Motivation: Reliable Software

Relational Predicates

Let xi ≥ 0 be a variable at Pi. Predicates of the form [Chase and Garg 95]

Σ xi < k

Algorithm: Consistent cut with minimum value = min cut in the flow graph

Page 25: Motivation: Reliable Software

Predicate Detection in General

Explore the state-space (need to examine all global states) without constructing the graph

breadth first manner [Cooper and Marzullo 91]

depth first manner [Alagar and Venkatesan 94]

lexical order [Garg 03]

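[Figure: the lattice of global states of a two-process computation, listed here from bottom to top:]

{┴}

{e1, ┴} {f1, ┴}

{e2, e1, ┴} {e1, f1, ┴}

{e2, e1, f1, ┴} {e1, f2, f1, ┴}

{e2, e1, f2, f1, ┴}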
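
A breadth-first traversal can generate each level of the lattice from the previous one, so the full graph is never stored. The sketch below is a simplified illustration, not the published algorithm: a global state is a tuple of per-process event counts, the single message e1 to f2 is an assumption inferred from the states listed above, and `detect_bfs` and the sample predicates are hypothetical.

```python
def detect_bfs(events, messages, predicate):
    """Explore consistent global states level by level.
    Returns (first state satisfying predicate or None, states visited)."""
    procs = list(events)
    sender_of = {recv: send for send, recv in messages}
    index = {e: (procs.index(p), i + 1)
             for p in procs for i, e in enumerate(events[p])}

    def enabled(cut, pi):
        """True if the next event of process pi can be executed from cut."""
        done = cut[pi]
        if done == len(events[procs[pi]]):
            return False
        e = events[procs[pi]][done]
        if e in sender_of:                     # a receive needs its send first
            sp, si = index[sender_of[e]]
            return cut[sp] >= si
        return True

    level = {tuple(0 for _ in procs)}          # the initial global state
    visited = 0
    while level:
        nxt = set()
        for cut in sorted(level):
            visited += 1
            if predicate(cut):
                return cut, visited
            for pi in range(len(procs)):
                if enabled(cut, pi):
                    nxt.add(cut[:pi] + (cut[pi] + 1,) + cut[pi + 1:])
        level = nxt                            # only the current level is kept
    return None, visited


events = {"P1": ["e1", "e2"], "P2": ["f1", "f2"]}
messages = [("e1", "f2")]

found, visited = detect_bfs(events, messages, lambda cut: cut == (2, 1))
print(found, visited)
missing, total = detect_bfs(events, messages, lambda cut: cut == (0, 2))
print(missing, total)   # the inconsistent state (0, 2) is never generated
```

Memory is bounded by the widest level, but the running time can still be proportional to the number of global states.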

Page 26: Motivation: Reliable Software

Talk Outline

Motivation and Overview

Instrumentation

– Clock : Tracking Dependency

Property Checking

– Camera : Global Snapshot (Checkpoint)

– Sensor : Detecting Global Properties

– Slicer : Computation Slicing

– Supervisor: Controlling Execution

Page 27: Motivation: Reliable Software

The Main Idea of Computation Slicing

[Figure: slicing turns a partial order trace into a slice. The global states of the trace suffer from state explosion; the slice keeps all red global states.]

Page 28: Motivation: Reliable Software

How does Computation Slicing Help?

[Figure: to check b1 ∧ b2 on a partial order trace, slice the trace for b1, which retains all global states that satisfy b1, and then check b2 on the slice.]

Page 29: Motivation: Reliable Software

Example

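Detect predicate (x*y + z < 5) ∧ (x ≥ 1) ∧ (z ≤ 3)

Computation (each event is labeled with the value of the process variable):

P1 (x): a = 1, b = 2, c = -1, d = 0

P2 (y): e = 0, f = 2, g = 1, h = 3

P3 (z): u = 4, v = 1, w = 2, x = 4

Slice with respect to (x ≥ 1) ∧ (z ≤ 3): {a, e, f, u, v}, {b}, {w}, {g}

[Figure: the computation and its slice drawn as process lines with dependency arrows.]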

Page 30: Motivation: Reliable Software

Slice

slice: a sub-trace such that:

it contains all consistent cuts of the trace satisfying the given predicate

it contains the least number of consistent cuts

[Garg and Mittal 01, Mittal and Garg 01]

[Figure: the Slicer takes a trace and a predicate as input and produces the slice.]
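
The two conditions in the definition can be illustrated by brute force on a tiny trace. This sketch assumes that the consistent cuts of a sub-trace are closed under union and intersection, so the smallest candidate is the closure of the satisfying cuts under those two operations. Cuts are tuples of per-process event counts, and the trace and predicate are made up for the example; practical slicing algorithms avoid enumerating cuts at all.

```python
from itertools import product


def slice_cuts(satisfying):
    """Smallest set of cuts that contains `satisfying` and is closed under
    intersection (component-wise min) and union (component-wise max)."""
    closed = set(satisfying)
    frontier = list(closed)
    while frontier:
        a = frontier.pop()
        for b in list(closed):
            for c in (tuple(map(min, a, b)), tuple(map(max, a, b))):
                if c not in closed:
                    closed.add(c)
                    frontier.append(c)
    return closed


# Hypothetical trace: two processes, two events each, no messages,
# so every pair of event counts is a consistent cut (9 in total).
all_cuts = list(product(range(3), repeat=2))

# Hypothetical predicate: exactly one event has been executed overall.
satisfying = [c for c in all_cuts if sum(c) == 1]

result = slice_cuts(satisfying)
print(sorted(satisfying))   # [(0, 1), (1, 0)]
print(sorted(result))       # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

The slice here has 4 cuts against 9 in the trace, and it contains two cuts that do not satisfy the predicate, which is why a predicate still has to be checked on the slice.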

Page 31: Motivation: Reliable Software

Results

Efficient polynomial-time algorithms for computing the slice for:

– linear predicates [Garg and Mittal 01]

• time complexity: O(n^2 m)

– general predicates

• Theorem: Given a computation, if a predicate b can be detected efficiently, then the slice for b can also be computed efficiently. [Mittal, Sen and Garg 03]

– combining slices: Boolean operators

– temporal logic operators: EF, AG, EG

– approximate slice: for arbitrary Boolean expressions

n: number of processes, m: number of events

Page 32: Motivation: Reliable Software

POTA Architecture [Sen and Garg 04]

[Figure: POTA architecture. Components: Instrumentor, Translator, Slicer, Predicate Detector, Analyzer, Execute Program, Execute SPIN. Data: Program, Instrumented Program, Specification, Predicate (Specification), Trace, Slice, Trace Slice, Promela. Outputs: yes/witness, no/counterexample.]

Page 33: Motivation: Reliable Software

Experiments: Dining Philosophers Trace Verification

POTA: Partial Order Trace Analyzer (based on slicing) [Sen and Garg 03]

SPIN: A widely used model checking tool [Holzmann 97]

– SPIN: 250 seconds for n = 6, runs out of memory for n > 6.

– POTA: handles n = 200 in 400 seconds.

Predicate: Two neighboring dining philosophers do not eat concurrently

Page 34: Motivation: Reliable Software

Supervisor: Motivation for Control

maintain global invariants or proper order of events

Examples:

– Distributed debugging

• ensure that busy1 ∨ busy2 is always true

• ensure that m1 is delivered before m2

– Fault tolerance

• on fault, roll back and execute under control

Page 35: Motivation: Reliable Software

Rollback Recovery for Software Faults

Re-execution Problem:

To re-execute in order to avoid a recurrence of a previously detected failure

– Progressive Retry [Wang et al 97]

– Controlled Re-execution [Tarafdar and Garg 98]

[Figure: processes P1, P2, P3 with a faulty state and the restored state used for re-execution.]

Page 36: Motivation: Reliable Software

Controlled Re-execution

Add the synchronization necessary to maintain safety property

– e.g., mutual exclusion

[Figure: processes P1 and P2 with critical sections, events a, b, c, d, and a global state G1.]
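
The effect of an added synchronization arrow can be checked on a toy trace. In this sketch each process has two hypothetical events (1 = enter critical section, 2 = exit), a global state is a tuple of event counts, and the arrow "P1 exits before P2 enters" is chosen by hand. Computing such arrows automatically is what the cited algorithms do and is not shown here.

```python
from itertools import product


def consistent_cuts(lengths, arrows):
    """Enumerate global states as tuples of event counts.
    An arrow ((p, i), (q, j)) means event i of process p must
    precede event j of process q (events are numbered from 1)."""
    cuts = []
    for cut in product(*(range(n + 1) for n in lengths)):
        if all(cut[p] >= i for (p, i), (q, j) in arrows if cut[q] >= j):
            cuts.append(cut)
    return cuts


def in_cs(count):
    """A process is in its critical section after entering and before exiting."""
    return count == 1


def violations(arrows):
    """Global states in which both processes are in their critical sections."""
    return [c for c in consistent_cuts((2, 2), arrows)
            if in_cs(c[0]) and in_cs(c[1])]


uncontrolled = violations([])
control = [((0, 2), (1, 1))]          # P1's exit happens before P2's entry
controlled = violations(control)

print(uncontrolled)                               # [(1, 1)]
print(controlled)                                 # []
print(len(consistent_cuts((2, 2), control)))      # 5 global states remain
```

The arrow removes the violating state but also removes three harmless ones, which is why the results on the next slide talk about minimizing arrows and maximizing concurrency.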

Page 37: Motivation: Reliable Software

Results

Efficient algorithms for computing the synchronization for:

– Locks [Tarafdar, Garg DISC 98]

• O(nm) algorithm for various types of locks

– disjunctive predicate [Mittal, Garg 00], e.g., (n-1)-mutual exclusion

• time complexity: O(m^2)

• minimizes the number of synchronization arrows

– region predicate [Mittal, Garg PODC 00], e.g., virtual clocks of processes are “approximately” synchronized

• time complexity: O(nm^2)

• maximizes the concurrency in the controlled computation

n: number of processes, m: number of events

Page 38: Motivation: Reliable Software

Conclusions

Efficient algorithms possible for monitoring global properties

Observation and Control: a powerful abstraction

Current execution engines are designed for performance rather than fault-tolerance

Page 39: Motivation: Reliable Software

Other Research Projects

Distributed simulation: GVT algorithms, fault-tolerance

Recovery Schemes: Optimistic Message Logging, fault-tolerance without replication

Model Checking: Partial Order Methods

Formal Methods: Petri Nets, Lattice Theory, Max Plus Algebra

Page 40: Motivation: Reliable Software

Questions

?