Distributed Systems Fall 2010

Time and synchronization

Fall 2010 5DV020 3

Outline

• Introduction
• Basic definitions
• Synchronization algorithms
– Synchronous systems
– Cristian's algorithm
– Berkeley algorithm
– Network Time Protocol

• Summary

Time, and the lack thereof

• A global notion of the correct time would be tremendously useful.

Why?

– Consistency of distributed data, transactions, authenticity checks (ticket lifetimes), duplication detection, distributed debugging and garbage detection, etc.

• Why do we not have global time?

– Clocks drift, are inaccurate, may fail arbitrarily, etc.

– Time is relative, and depends on the observer of the timed events
  • Causal relationships (cause and effect) must not be violated

Basic definitions

• The distributed system is $P$, consisting of $N$ processes $p_i$, $i = 1, 2, \ldots, N$

• Each process $p_i$ has state $s_i$

• Processes communicate only via message passing (network)

• Events $e$ occur in processes
– Internal events
– Send events
– Receive events

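• Events are ordered within a process by the relation $\rightarrow_i$

$e^0 \rightarrow_i e^1 \rightarrow_i e^2$

• Define the history of $p_i$ as the sequence of its events, ordered by $\rightarrow_i$

$\text{history}(p_i) = h_i = \langle e_i^0, e_i^1, e_i^2, \ldots \rangle$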

• Clock skew
– Instantaneous difference between the readings of any two clocks

• Clock drift
– Variations in how clocks count time (oscillations in a crystal), which cause divergence between clocks

• Clock drift rate
– Change in offset between a clock and a perfect reference clock, per unit of time
  • Consumer-level clocks: about $10^{-6}$ seconds/second, roughly 1 second every 11.6 days
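The 11.6-day figure follows directly from the drift rate. A minimal sketch of the arithmetic, where the function name `days_until_drift` is made up for illustration:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def days_until_drift(drift_rate: float, max_error_s: float = 1.0) -> float:
    """Days until a clock drifting at drift_rate (seconds per second)
    has accumulated an error of max_error_s seconds."""
    return max_error_s / drift_rate / SECONDS_PER_DAY


# A drift rate of 10^-6 s/s accumulates one second in about 11.6 days.
print(round(days_until_drift(1e-6), 1))  # prints 11.6
```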

Computer clocks

• Hardware clock $H(t)$
– Gives “raw” time reading

• Software clock
– $C(t) = \alpha H(t) + \beta$
– Scaled by OS to give accurate time
– Used for timestamps
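The relation between the hardware and software clock can be sketched as a small class. This is only an illustration: the class name `SoftwareClock` and its parameters are assumptions, and `time.monotonic` stands in for the raw hardware reading H(t).

```python
import time


class SoftwareClock:
    """Software clock C(t) = alpha * H(t) + beta.

    hardware_read is a stand-in for the raw hardware clock H(t);
    alpha (scale) and beta (offset) are what the OS would tune.
    """

    def __init__(self, hardware_read=time.monotonic, alpha=1.0, beta=0.0):
        self.hardware_read = hardware_read
        self.alpha = alpha
        self.beta = beta

    def now(self) -> float:
        # Scale and shift the raw reading to obtain the timestamp value.
        return self.alpha * self.hardware_read() + self.beta
```

A synchronization algorithm would then correct the clock by changing `alpha` and `beta` rather than by touching the hardware counter.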

Time sources

• Coordinated Universal Time (abbreviated UTC, thanks to the French)
– Based on atomic clocks
– Used for synchronization of all kinds of equipment (e.g. your computer, GPS, fancy radio-controlled clocks, etc.)

Synchronization types

• External synchronization
– Processes are synchronized to an external time source (e.g. UTC)

• Internal synchronization
– “Correct time” exists only within a group of processes
– Need not be synchronized to an external source

Correctness and monotonicity

• Correctness (drift is bounded by $\rho$):
$(1 - \rho)(t' - t) \le H(t') - H(t) \le (1 + \rho)(t' - t)$
– Limits hardware clock drift to the bound $\rho$, which also forbids “jumps” in the hardware clock

• Monotonicity (ever-increasing):
$t' > t \Rightarrow C(t') > C(t)$
– Note: only deals with software clock
– Simpler, and often sufficient
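Both conditions can be written as simple checks. The sketch below is illustrative only: the function names are hypothetical, `rho` is the drift bound from the correctness condition, and the "real" times passed in would in practice not be observable.

```python
def within_drift_bound(h0, h1, t0, t1, rho):
    """Correctness: hardware readings h0, h1 taken at real times t0 < t1
    must advance by between (1 - rho) and (1 + rho) times the elapsed time."""
    elapsed = t1 - t0
    return (1 - rho) * elapsed <= h1 - h0 <= (1 + rho) * elapsed


def is_monotonic(readings):
    """Monotonicity: successive software-clock readings strictly increase."""
    return all(later > earlier for earlier, later in zip(readings, readings[1:]))
```

A hardware clock that jumps forward by a second within a 100-second interval fails the first check for any realistic `rho`, while a software clock that is set backwards fails the second.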

Synchronization algorithms

• Internal synchronization
– In synchronous systems (trivial case)
– Berkeley algorithm

• External synchronization
– Cristian's algorithm
– Network Time Protocol (NTP)

Clock synchronization in synchronous systems

• Synchronous systems define bounds on all relevant parts
– Clock drift
– Message transmission delays
– Process execution step times

• Send request, get response back (containing the sender's time)

• Only uncertainty is the actual transmission delay, which lies between min and max: $u = \max - \min$
– Set time to (time in response) + $(\min + \max)/2$; the skew is then at most $u/2$
– For $N$ processes, the optimum bound on skew is $u(1 - 1/N)$

Internal
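A minimal sketch of the synchronous case, assuming the receiver sets its clock to the midpoint of the possible delay range (time in the message plus (min + max)/2), which leaves a worst-case skew of u/2. The function names are hypothetical.

```python
def sync_clock_value(t_in_message, min_delay, max_delay):
    """Return (new clock value, worst-case skew) for the receiver.

    The true delay lies somewhere in [min_delay, max_delay], so choosing
    the midpoint leaves an error of at most half the uncertainty u.
    """
    u = max_delay - min_delay
    return t_in_message + (min_delay + max_delay) / 2, u / 2


def optimal_bound(u, n):
    """Best achievable skew bound when synchronizing n processes."""
    return u * (1 - 1 / n)
```

For two processes the optimal bound `u * (1 - 1/2)` equals `u / 2`, which matches the midpoint strategy above.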

Cristian's algorithm

• S is connected to time source

• p requests the time (message $m_r$) and receives it (message $m_t$)
– S records its time $t$ in $m_t$ as late as possible before transmitting the message
– p knows total round-trip time $T_{round}$
– Simply set time to $t + T_{round}/2$?

External

• Only if on the same LAN! But then, if the minimum transmission time ($t_{min}$) is known:
– Earliest time S could have placed the time in $m_t$ was $t_{min}$ after p dispatched $m_r$; latest was $t_{min}$ before p received $m_t$
– So the time at S when p receives $m_t$ lies in $[t + t_{min},\; t + T_{round} - t_{min}]$
– Width of range is $T_{round} - 2 t_{min}$, so accuracy is $\pm (T_{round}/2 - t_{min})$

External
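The client side of Cristian's algorithm can be sketched as follows. This is an illustration under stated assumptions: `request_time` is a hypothetical callable that sends the request to S and returns the time t carried in the reply, `clock` is the local clock used to measure the round trip, and `min_delay` is the known minimum transmission time (0 if unknown).

```python
import time


def cristian_estimate(request_time, clock=time.monotonic, min_delay=0.0):
    """Return (estimated current time at S, accuracy of that estimate).

    request_time: hypothetical callable performing the request/reply
                  exchange with S and returning the time t from the reply.
    """
    sent = clock()
    t = request_time()           # blocks until the reply arrives
    t_round = clock() - sent     # total round-trip time measured locally

    estimate = t + t_round / 2           # assume symmetric delays
    accuracy = t_round / 2 - min_delay   # half-width of the possible range
    return estimate, accuracy
```

A shorter round trip gives a tighter accuracy bound, so a client could repeat the exchange and keep the estimate from the fastest reply.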

• Single point of failure!

• Crashing server?
– Multicast to group of servers

• Fake servers?
– Establish cryptographic authentication

• Arbitrarily failing servers?
– Have enough correct ones to achieve agreement

External

Berkeley algorithm

• Uses Cristian's methods

• Master/Slave relationship

• Master polls slaves
– Gets current time in each slave (estimated using round-trip times)
– Averages the times, its own included
– Sends each slave the offset by which to adjust its clock

• Master fails?
– Crash: elect a new one!
– Arbitrary failure? Oops…

Internal
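The master's computation can be sketched in a few lines. This is a simplified illustration: it assumes the slave times have already been corrected for round-trip delay, uses a plain average with no filtering of outlying readings, and the function name and the `"master"` key are made up.

```python
def berkeley_adjustments(master_time, slave_times):
    """Return the adjustment each participant should apply to its clock.

    slave_times: dict mapping slave name -> estimated current time there.
    The master's own reading is included under the key "master"
    (so no slave may use that name in this sketch).
    """
    readings = {"master": master_time, **slave_times}
    average = sum(readings.values()) / len(readings)
    # Send offsets rather than the average itself, so the result does not
    # depend on how long the reply takes to reach each slave.
    return {name: average - reading for name, reading in readings.items()}
```

For example, with the master at 180 and slaves at 205 and 170 (minutes past midnight, say), the average is 185, so the master advances by 5, the first slave goes back by 20, and the second advances by 15.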

Network Time Protocol

• Unlike the others, designed for WAN rather than LAN use
– Time servers close to the time source are more trusted
– Redundant paths → survives disconnects
– Massively scalable
– Authentication of time servers to avoid propagation of arbitrary failures

External

• Synchronization subnets
– Primary level (stratum 1) is directly connected to time source
– Secondary level syncs to primary, tertiary to secondary, etc.
  • Higher stratum number means less accurate
– Dynamically reconfigurable: if a primary server loses its time source, it becomes a secondary server

External

• Multicast mode
– “Time is X” between LAN nodes
  • Only as accurate as LAN allows
  • Used only for nodes that do not need high accuracy

• Procedure-call mode
– Similar to Cristian's algorithm
– More accurate than multicast mode

• Symmetric mode
– Pairs of messages
– Used in lower strata (closer to the time source), where the highest accuracy is needed

External

• All messages sent over UDP

• For procedure-call and symmetric mode, messages contain
– Local times at which the previous NTP message between the nodes was sent and received
– Local time of current message transmission

• Receiver notes local time when message is received

External

• Delay in Server B may be non-negligible

• Messages may be lost along the way

External

• For each message pair calculate
– $o_i$: estimated offset between clocks
– $d_i$: total transmission time (delay)

• True offset is denoted o (without the index)

• Denote transmission time of m as t, and that of m' as t'

External

$T_{i-2} = T_{i-3} + t + o$

$T_i = T_{i-1} + t' - o$

• leads to

$d_i = t + t' = T_{i-2} - T_{i-3} + T_i - T_{i-1}$

• also

$o = o_i + (t' - t)/2$, where

$o_i = (T_{i-2} - T_{i-3} + T_{i-1} - T_i)/2$

External

• Since $t, t' \ge 0$, we know that
$o_i - d_i/2 \le o \le o_i + d_i/2$

• Or, in English: $o_i$ is an estimate of the offset, and $d_i$ is a measure of its accuracy

External
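The offset and delay formulas translate directly into code. In this sketch the function and parameter names are made up; the four arguments are the timestamps T(i-3), T(i-2), T(i-1) and T(i) of one message pair, in that order.

```python
def ntp_offset_delay(t_i3, t_i2, t_i1, t_i):
    """Return (o_i, d_i) for one message pair.

    t_i3: send time of m       t_i2: receive time of m
    t_i1: send time of m'      t_i:  receive time of m'
    """
    d_i = (t_i2 - t_i3) + (t_i - t_i1)        # total transmission time t + t'
    o_i = ((t_i2 - t_i3) + (t_i1 - t_i)) / 2  # estimated offset
    return o_i, d_i


# Example with a true offset o = 50 and transmission times t = 3, t' = 1:
o_i, d_i = ntp_offset_delay(1000, 1053, 1055, 1006)
print(o_i, d_i)  # prints 51.0 4
```

The true offset of 50 lies within `o_i - d_i/2 = 49` and `o_i + d_i/2 = 53`, as the bound requires; the estimate is off by `(t' - t)/2 = -1` because the two transmission times differ.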

• Recent $\langle o_i, d_i \rangle$ pairs are retained and used to judge the quality of the offset estimate

• NTP peers communicate with many other peers, to decrease error

External
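One way to use retained pairs, assumed here for illustration, is to keep a fixed number of the most recent ones and take the offset from the pair with the smallest delay, since a small delay means a tight error bound. The class name `PairFilter` and the default capacity of 8 are assumptions of this sketch, not a description of a real NTP implementation.

```python
from collections import deque


class PairFilter:
    """Keeps the most recent (offset, delay) pairs from one peer."""

    def __init__(self, capacity=8):
        # A bounded deque silently drops the oldest pair when full.
        self.pairs = deque(maxlen=capacity)

    def add(self, offset, delay):
        self.pairs.append((offset, delay))

    def best_offset(self):
        """Offset from the retained pair with the smallest delay."""
        offset, _delay = min(self.pairs, key=lambda pair: pair[1])
        return offset
```

Running one such filter per peer and comparing the results across peers is one way to make use of the many peers a node communicates with.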

Summary

• We do not have universal time
– But we can synchronize clocks “reasonably well” anyway

• Internal vs. external synchronization

• Real-time systems must use more sophisticated algorithms than those covered in this lecture!

• Algorithms
– Synchronous system (trivial)
– Cristian's algorithm
  • Used in many others
– Berkeley algorithm
  • Master/Slave application of Cristian's for internal synchronization
– Network Time Protocol
  • Suitable for WANs
  • Message pairs

Next lecture

• Logical time
• Global states
• Distributed debugging
