
© Chinese University, CSE Dept. Distributed Systems / 9 - 1

Distributed Systems

Topic 9: Time, Coordination and Replication

Dr. Michael R. Lyu

Computer Science & Engineering Department

The Chinese University


Outline

1 Time

2 Coordination

3 Replication

– The gossip architecture

– The process group approach

4 Summary


1 Time

The notion of time – story: 12/9/1949

External synchronization

Internal synchronization

Physical clocks and their synchronization

Logical time and logical clocks


1.1 Synchronizing Physical Clocks

Each computer contains its own physical clock.

A physical clock is limited by its resolution: the period between updates of the clock register.

Physical clocks are subject to clock drift.

To compensate for clock drift, computers are synchronized to a time service, e.g., one that distributes UTC (Coordinated Universal Time).

Several algorithms exist for this synchronization.


1.1 Compensating for Clock Drift

S(t) = H(t) + δ(t), where S = application (software clock) time, H = hardware clock time, δ = compensating factor.

Assume a linear relation: δ(t) = aH(t) + b.

Let the value of the software clock be Tskew when H = h, and let the actual time at that moment be Treal.

If S is to give the actual time after N further ticks, we have

Tskew = (1 + a)h + b, and Treal + N = (1 + a)(h + N) + b.

Subtracting the first equation from the second gives

a = (Treal - Tskew) / N and b = Tskew - (1 + a)h.
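The two equations above can be checked numerically. The following is a minimal sketch; the function names and the sample values (a software clock reading 1010 when the real time is 1000, with the correction spread over 100 ticks) are assumptions chosen for illustration, not part of the slides.

```python
def drift_coefficients(t_skew, t_real, h, n):
    """Solve for a and b in delta(t) = a*H(t) + b.

    t_skew: software clock value when the hardware clock reads h
    t_real: actual time at that moment
    n:      number of further ticks over which S should converge
    """
    a = (t_real - t_skew) / n
    b = t_skew - (1 + a) * h
    return a, b


def software_clock(hw, a, b):
    """S = H + delta, with delta = a*H + b."""
    return hw + (a * hw + b)


# Illustrative values only: the software clock is 10 ticks fast.
a, b = drift_coefficients(t_skew=1010, t_real=1000, h=500, n=100)
print(software_clock(500, a, b))  # still reads Tskew at H = h
print(software_clock(600, a, b))  # reads Treal + N after N ticks
```

With these values a is negative, so the software clock runs slightly slow for N ticks and never jumps backwards, which is the usual reason for correcting gradually rather than resetting the clock.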


1.1 Cristian’s Clock Synchronization

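Process P requests the time from server S and measures the round-trip time Tround.

Let the time returned in S’s message mt be t. P should set its clock to t + Tround/2.

If min is the minimum one-way transmission time, the time by S’s clock when the reply message arrives lies in the range [t + min, t + Tround - min], which has width Tround - 2 min, so the accuracy is ±(Tround/2 - min).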
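A minimal sketch of the client side of this exchange follows. The callable `request_server_time` stands in for the real request/reply with S and is hypothetical, as are the parameter names; the clock is injectable only so the example can be run deterministically.

```python
import time


def cristian_estimate(request_server_time, clock=time.monotonic, min_delay=0.0):
    """Return (estimated server time now, accuracy bound).

    request_server_time: hypothetical callable that performs the
        request/reply with S and returns the time t carried in m_t.
    min_delay: assumed minimum one-way transmission time ("min").
    """
    sent_at = clock()
    t = request_server_time()
    t_round = clock() - sent_at
    estimate = t + t_round / 2          # value P should set its clock to
    accuracy = t_round / 2 - min_delay  # +/- bound on the estimate
    return estimate, accuracy


# Deterministic demo with a fake local clock: the round trip takes 0.4.
fake_ticks = iter([10.0, 10.4])
estimate, accuracy = cristian_estimate(
    lambda: 100.0, clock=lambda: next(fake_ticks), min_delay=0.1
)
print(estimate, accuracy)
```

A shorter round trip gives a tighter bound, so a client could repeat the request and keep the estimate from the fastest exchange.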


1.1 The Berkeley Algorithm

A coordinator computer is chosen to act as the master.

The master periodically polls the slaves whose clocks are to be synchronized.

The master estimates their local clock times by observing the round-trip times, and it averages the values obtained.

The master takes a fault-tolerant average, so that faulty clocks do not distort the result.

Should the master fail, another can be elected to take over.
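The averaging step might look like the sketch below. The slides do not say how the fault-tolerant average is computed, so the rule used here (discard readings further than `max_skew` from the low median, then average the rest) is an assumption, as are the function and parameter names. The function returns the adjustment each machine should apply rather than an absolute time.

```python
from statistics import median_low


def berkeley_adjustments(readings, max_skew):
    """readings: dict of machine name -> estimated clock time (master included).

    Assumed fault-tolerance rule: ignore clocks that differ from the
    low median by more than max_skew, then average the remainder.
    """
    mid = median_low(readings.values())  # always one of the readings
    trusted = [t for t in readings.values() if abs(t - mid) <= max_skew]
    average = sum(trusted) / len(trusted)
    # Each machine is told how much to add to its own clock.
    return {name: average - t for name, t in readings.items()}


# Illustrative readings: s3 has a faulty clock and is excluded from the average.
print(berkeley_adjustments(
    {"master": 180.0, "s1": 185.0, "s2": 175.0, "s3": 900.0}, max_skew=10.0
))
```

Sending adjustments instead of the averaged time avoids adding further round-trip uncertainty to the value each slave receives.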


1.1 The Network Time Protocol

NTP distributes time information to provide:

– a service to synchronize clients in the Internet

– a reliable service that survives loss of connection

– frequent resynchronization to offset clients’ clock drift

– protection against interference with the time service

The NTP service is provided by various servers:

– primary servers, secondary servers, and servers of other levels (called strata)

Synchronization subnet: the servers, connected in a logical hierarchy.


1.1 NTP Synchronization Modes

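Estimating delay and offset in the NTP protocol, where T_{i-3} and T_{i-2} are the send and receive times of one message and T_{i-1} and T_i are the send and receive times of the reply:

– a = T_{i-2} - T_{i-3}

– b = T_i - T_{i-1}

– d_i = a + b

– o_i = (a - b)/2

NTP servers synchronize in three modes:

– Multicast mode

– Procedure-call mode

– Symmetric mode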
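The delay and offset estimates above can be worked through with concrete numbers. This is a minimal sketch; the function name and the sample timestamps (server B running 5 units ahead of A, with a one-way delay of 2 in each direction) are assumptions for illustration.

```python
def ntp_estimate(t_i3, t_i2, t_i1, t_i):
    """Return (d_i, o_i) from the four timestamps of one message pair.

    t_i3: send time of the first message (sender's clock)
    t_i2: its receive time (receiver's clock)
    t_i1: send time of the reply (receiver's clock)
    t_i:  receive time of the reply (sender's clock)
    """
    a = t_i2 - t_i3
    b = t_i - t_i1
    delay = a + b          # total transmission time of the two messages
    offset = (a - b) / 2   # estimated offset of the peer's clock
    return delay, offset


# A sends at 100 (A's clock); B receives at 107 and replies at 108
# (B's clock); A receives the reply at 105 (A's clock).
print(ntp_estimate(100, 107, 108, 105))
```

In this symmetric case the estimate recovers the true offset exactly; when the two one-way delays differ, the error in o_i is bounded by half the total delay d_i.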


1.2 Logical Time and Logical Clocks

The order of events:

– two events in the same process occurred in the order in which they appear in that process.

– the event of sending a message occurred before the event of receiving it.

Happened-before relation, denoted by →:

HB1: If there is a process p such that x →p y, then x → y.

HB2: For any message m, send(m) → rcv(m).

HB3: If x, y and z are events such that x → y and y → z, then x → z.


1.2 Logical Timestamps Example

Events occurring at three processes


1.2 Logical Timestamps

Logical clock - a monotonically increasing software counter.

Cp: logical clock for process p; Cp(a): timestamp of event a at p; C(b): timestamp of event b at whatever process it occurred.

LC1: Cp is incremented before each event is issued at process p: Cp := Cp + 1

LC2: a) When p sends a message m to q, it piggybacks the value t = Cp.

b) On receiving (m, t), q sets Cq := max(Cq, t) and then applies LC1 before timestamping rcv(m).

If a → b then C(a) < C(b), but not vice versa!

Extensions: totally ordered logical clocks and vector clocks.
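Rules LC1 and LC2 translate almost directly into code. The class below is a minimal sketch with assumed names; the scenario (three processes, two messages) is an illustrative assumption rather than a transcription of the slide's figure.

```python
class LamportClock:
    """Logical clock for one process."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """LC1: increment before each event; returns the event's timestamp."""
        self.time += 1
        return self.time

    def send(self):
        """LC2(a): a send is an event; its timestamp is piggybacked on m."""
        return self.tick()

    def receive(self, t):
        """LC2(b): take the max with the received value, then apply LC1."""
        self.time = max(self.time, t)
        return self.tick()


p1, p2, p3 = LamportClock(), LamportClock(), LamportClock()
a = p1.tick()        # local event at p1
b = p1.send()        # p1 sends m1
c = p2.receive(b)    # p2 receives m1
d = p2.send()        # p2 sends m2
e = p3.tick()        # local event at p3
f = p3.receive(d)    # p3 receives m2
print(a, b, c, d, e, f)  # 1 2 3 4 1 5
```

Here C(e) = 1 < C(b) = 2 even though e and b are concurrent, which illustrates why C(a) < C(b) does not imply a → b.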


1.2 Logical Timestamps Example

Events occurring at three processes

[Figure: the same events, labelled with their logical timestamps]


2 Coordination

Distributed processes need to coordinate their activities.

Distributed mutual exclusion must satisfy safety, liveness, and ordering properties.

Election algorithms: methods for choosing a unique process for a particular role.


2.1 Distributed Mutual Exclusion

The basic requirements for mutual exclusion:

– ME1 (safety): At most one process may execute in the critical section (CS) at a time.

– ME2 (liveness): A process requesting entry to the CS is eventually granted it.

– ME3 (ordering): Entry to the CS should be granted in happened-before order.

The central server algorithm.

A ring-based algorithm.

A distributed algorithm using logical clocks.
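As an illustration of the first of these algorithms, the sketch below models the central server's bookkeeping only: message passing and failures are left out, and the class and method names are assumptions. The server grants the CS to one process at a time and queues the rest.

```python
from collections import deque


class LockServer:
    """Central server granting entry to a single critical section."""

    def __init__(self):
        self.holder = None      # process currently in the CS, if any
        self.waiting = deque()  # queued requests, oldest first

    def request(self, pid):
        """Return True if entry is granted now, False if the request is queued."""
        if self.holder is None:
            self.holder = pid
            return True
        self.waiting.append(pid)
        return False

    def release(self, pid):
        """Release the CS and return the next holder (or None)."""
        if pid != self.holder:
            raise ValueError(f"{pid!r} does not hold the critical section")
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder


server = LockServer()
print(server.request("p1"))  # True: granted immediately
print(server.request("p2"))  # False: queued behind p1
print(server.release("p1"))  # p2: the queued request is granted next
```

A single holder gives ME1, and the first-come queue gives ME2 as long as holders eventually release and nothing fails. ME3 is not guaranteed, since requests are served in the order they reach the server, which need not match happened-before order.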


2.2 Elections

An election is a procedure carried out to choose a process from a group.

A ring-based election algorithm. The bully algorithm.
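A ring-based election can be simulated in a few lines. The sketch below assumes unique identifiers, a single initiator, no failures, and that the process with the largest identifier is to be elected; the function name is hypothetical, and the final "elected" announcement round is omitted.

```python
def ring_election(ids, starter):
    """Simulate one election on a ring.

    ids:     process identifiers in ring order (assumed unique)
    starter: index of the process that initiates the election
    Returns (elected identifier, number of election messages sent).
    """
    n = len(ids)
    message = ids[starter]         # initiator sends its own identifier
    pos = (starter + 1) % n
    hops = 1
    # A process that receives its own identifier knows it has won.
    while ids[pos] != message:
        message = max(message, ids[pos])  # forward the larger identifier
        pos = (pos + 1) % n
        hops += 1
    return message, hops


print(ring_election([3, 7, 5], starter=0))  # (7, 4)
```

The election message travels until it returns to the process with the largest identifier, so the cost depends on where the initiator sits relative to that process.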


3 Replication

Replication is the maintenance of on-line copies of data and resources, for performance, availability, and fault tolerance.

Basic architectural model.

Consistency and request ordering.

The gossip architecture.

The process group approach.


3 Bulletin Board Example


3 Replication Issues

Replica management models consider the trade-off between accuracy and response time.

– Simple asynchronous model

– Totally synchronous model

– Quorum-based schemes

– Causality-ordered

Multicast updates to a process group. Read/write ratio.


3.1 Basic Architectural Model


3.1 The Gossip Architecture


3.1 The Primary Copy Model


3.2 Consistency and Request Ordering

Criteria: correctness vs. expense.

Total, causal, and sync ordering requirements.

Implementing request ordering.

Implementing total ordering.

Implementing causal ordering with vector timestamps.


3.2.1 Total, Causal, and Sync Ordering

Let r1 and r2 be requests, and let RMs be the replica managers.

Total ordering: Either r1 is processed before r2 at all RMs, or r2 is processed before r1 at all RMs.

Causal ordering: If r1 happened-before r2, then r1 is processed before r2 at all RMs.

FIFO ordering: If r1 is issued before r2 by the same client, then r1 is processed before r2 at all RMs.

Sync-ordering: If r1 is sync-ordered, then for any other request r2, either r1 is processed before r2 at all RMs or r2 is processed before r1 at all RMs.


3.2.1 Example 1


3.2.1 Example 2


3.2.2 Implementing Request Ordering

Hold-back: A received request is not processed by an RM until its ordering constraints can be met.

Stable message: all prior requests have been processed.

Hold-back queue vs. delivery queue.

Safety property: no message will be delivered out of order by being prematurely transferred.

Liveness property: no message should wait on the hold-back queue forever.
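The hold-back idea can be sketched as follows. The ordering constraint assumed here is the simplest one, consecutive sequence numbers starting at 0; the class and attribute names are hypothetical.

```python
class HoldBackQueue:
    """Holds requests back until every earlier sequence number has arrived."""

    def __init__(self):
        self.next_seq = 0    # next sequence number that may be delivered
        self.held = {}       # hold-back queue: seq -> request
        self.delivered = []  # delivery queue, in processing order

    def receive(self, seq, request):
        self.held[seq] = request
        # A request is stable once all prior requests have been delivered.
        while self.next_seq in self.held:
            self.delivered.append(self.held.pop(self.next_seq))
            self.next_seq += 1


rm = HoldBackQueue()
rm.receive(1, "b")   # held back: request 0 has not arrived yet
rm.receive(0, "a")   # releases both 0 and 1
rm.receive(2, "c")
print(rm.delivered)  # ['a', 'b', 'c']
```

Safety follows because a request only moves to the delivery queue when its number is the next expected one. Liveness is not provided by this code alone: it relies on missing requests eventually arriving, for example through retransmission.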


3.2.3 Implementing Total Ordering

Basic approach: assign totally ordered identifiers to requests.

Sequencer.

Distributed agreement in assigning request ids.
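The sequencer approach can be illustrated with a small sketch, in which a single process hands out increasing identifiers and every replica manager processes requests in identifier order. The names are assumptions, and sorting a complete batch stands in for the incremental hold-back a real RM would use.

```python
import itertools


class Sequencer:
    """Assigns totally ordered identifiers to requests."""

    def __init__(self):
        self._ids = itertools.count()

    def assign(self, request):
        return next(self._ids), request


def processing_order(arrivals):
    """Order in which an RM processes a batch of (id, request) pairs."""
    return [request for _, request in sorted(arrivals)]


seq = Sequencer()
tagged = [seq.assign(r) for r in ("r1", "r2", "r3")]

# The two replica managers receive the same requests in different orders.
at_rm_a = [tagged[2], tagged[0], tagged[1]]
at_rm_b = [tagged[1], tagged[2], tagged[0]]
print(processing_order(at_rm_a))  # ['r1', 'r2', 'r3']
print(processing_order(at_rm_b))  # ['r1', 'r2', 'r3']
```

The sequencer is simple but is a potential bottleneck and single point of failure, which is the motivation for the distributed agreement alternative.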


3.2.4 Implementing Causal Ordering

Vector timestamp: a list of counts of update events, one for each of the replica managers.

Merging vector timestamps: choose the largest values from the two vectors, component-wise.

e.g., FE time vector: (2,3,4)
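Merging and comparing vector timestamps is a component-wise operation, as the sketch below shows. The function names and the second vector (3,1,4) are assumptions for illustration; only (2,3,4) comes from the slide.

```python
def merge(u, v):
    """Component-wise maximum of two vector timestamps."""
    if len(u) != len(v):
        raise ValueError("vector timestamps must have the same length")
    return tuple(max(a, b) for a, b in zip(u, v))


def leq(u, v):
    """True if u <= v in every component."""
    return all(a <= b for a, b in zip(u, v))


fe = (2, 3, 4)   # FE time vector from the slide
rm = (3, 1, 4)   # hypothetical timestamp returned by a replica manager
merged = merge(fe, rm)
print(merged)                    # (3, 3, 4)
print(leq(fe, merged))           # True: the merged vector covers fe
print(leq(fe, rm), leq(rm, fe))  # False False: neither precedes the other
```

When neither vector is less than or equal to the other, the corresponding states are concurrent, which is exactly the case a single scalar logical clock cannot detect.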


3.3 The Gossip Architecture

Query and update operations.

Clients communicate with the replica managers via front ends (FEs).


3.4 Process Group Approach

Process group and group communication. Group structure

– peer group

– server group

– client-server group

– subscription group

– hierarchical groups


3.4 Process Group Services

Group membership management

– Create

– Join

– Leave

Group address expansion

Multicast communication

– unreliable multicast

– reliable multicast

– atomic multicast


3.4 Multicast Communication

Sample multicasting operation:

void Multicast (in orderType order, in groupId group, in msg m, in int nReplies, out msgSeq replies) raises (…);

Order types:

– unordered

– total ordering

– causal ordering

– sync-ordering


4 Summary

Timing issues

– Synchronizing physical clocks.

– Logical time and logical clocks.

Distributed coordination and mutual exclusion.

Replication provides good performance, high availability and fault tolerance.

The gossip approach and the process group approach.

CORBA replication service is a research topic.
