distributed systems fall 2010 time and synchronization


Post on 19-Dec-2015



Distributed Systems Fall 2010

Time and synchronization


Fall 2010 5DV020 3

Outline

• Introduction
• Basic definitions
• Synchronization algorithms
  – Synchronous systems
  – Cristian's algorithm
  – Berkeley algorithm
  – Network Time Protocol

• Summary


Time, and the lack thereof

• A global notion of the correct time would be tremendously useful.

Why?

– Consistency of distributed data, transactions, authenticity checks (ticket lifetimes), duplication detection, distributed debugging and garbage detection, etc.


Time, and the lack thereof

• Why do we not have global time?

– Clocks drift, are inaccurate, may fail arbitrarily, etc.

– Time is relative, and depends on the observer of the timed events
  • Causal relationships (cause and effect) may not be violated


Basic definitions

• The distributed system is P, consisting of N processes $p_i$, $i = 1, 2, \ldots, N$
• Each process $p_i$ has state $s_i$
• Processes communicate only via message passing (network)
• Events e occur in processes
  – Internal events
  – Send events
  – Receive events


Basic definitions

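• Events are ordered within a process $p_i$ by the relation $\rightarrow_i$:
$e^0 \rightarrow_i e^1 \rightarrow_i e^2$
• Define the history of $p_i$ as the sequence of its events, ordered by $\rightarrow_i$:
$\mathrm{history}(p_i) = h_i = \langle e_i^0, e_i^1, e_i^2, \ldots \rangle$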


Basic definitions

• Clock skew
  – Instantaneous difference between the readings of any two clocks
• Clock drift
  – Variations in how clocks count time (oscillations in a crystal), which cause divergence between clocks


Basic definitions

• Clock drift rate
  – Change in offset between a clock and a perfect clock, per unit of time
    • Consumer-level clocks: about $10^{-6}$ seconds/second, roughly 1 second every 11.6 days
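The "11.6 days" figure follows directly from the drift rate. A minimal sketch of the arithmetic, where the function name `days_until_drift` is made up for illustration:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def days_until_drift(drift_rate: float, max_error_s: float = 1.0) -> float:
    """Days until a clock drifting at drift_rate (s/s) is off by max_error_s."""
    seconds_needed = max_error_s / drift_rate
    return seconds_needed / SECONDS_PER_DAY


# A drift rate of 10^-6 s/s accumulates one second in about 11.6 days.
print(round(days_until_drift(1e-6), 1))
```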


Computer clocks

• Hardware clock $H(t)$
  – Gives “raw” time reading
• Software clock
  – $C(t) = \alpha H(t) + \beta$
  – Scaled by OS to give accurate time
  – Used for timestamps
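The relation between the two clocks can be sketched as a small class. This is only an illustration: the class name `SoftwareClock` is hypothetical, and `time.monotonic` stands in for the hardware counter H(t), which a real OS would read from a hardware register.

```python
import time


class SoftwareClock:
    """Software clock C(t) = alpha * H(t) + beta over a raw hardware reading."""

    def __init__(self, alpha=1.0, beta=0.0, hardware=time.monotonic):
        self.alpha = alpha        # scale factor compensating for drift rate
        self.beta = beta          # offset aligning the clock with real time
        self.hardware = hardware  # callable returning the raw reading H(t)

    def read(self):
        return self.alpha * self.hardware() + self.beta


# With a fixed fake hardware reading of 10.0: 2.0 * 10.0 + 5.0
clock = SoftwareClock(alpha=2.0, beta=5.0, hardware=lambda: 10.0)
print(clock.read())
```

Synchronization algorithms then only need to adjust alpha and beta, never the hardware counter itself.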


Time sources

• Coordinated Universal Time (abbreviated UTC, thanks to the French)
  – Atomic clocks
  – Used for synchronization of all kinds of equipment (e.g. your computer, GPS, fancy radio-controlled clocks, etc.)


Synchronization types

• External synchronization
  – Processes are synchronized to an external time source (e.g. UTC)
• Internal synchronization
  – “Correct time” exists only within a group of processes
  – Need not be synchronized to an external source


Correctness and monotonicity

• Correctness (drift is bounded):
$(1 - \rho)(t' - t) \le H(t') - H(t) \le (1 + \rho)(t' - t)$
  – Bounds the drift rate of the hardware clock by $\rho$, which forbids “jumps” in its readings
• Monotonicity (ever-increasing):
$t' > t \Rightarrow C(t') > C(t)$
  – Note: only deals with software clock
  – Simpler, and often sufficient
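Both conditions can be checked mechanically on recorded readings. A minimal sketch, assuming real times t < t' are available for comparison (in practice they are not, which is the whole problem). The function names and the parameter `rho` for the drift bound are chosen here for illustration.

```python
def drift_is_bounded(h_t, h_t2, t, t2, rho):
    """Correctness: hardware clock advance stays within (1 +/- rho) of real time."""
    elapsed = t2 - t      # real time elapsed, assumed positive
    ticked = h_t2 - h_t   # what the hardware clock measured
    return (1 - rho) * elapsed <= ticked <= (1 + rho) * elapsed


def is_monotonic(readings):
    """Monotonicity: successive software clock readings strictly increase."""
    return all(a < b for a, b in zip(readings, readings[1:]))
```

A clock that is set backwards during synchronization fails `is_monotonic`, which is why adjustments are usually made by slowing the software clock down rather than stepping it back.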


Synchronization algorithms

• Internal synchronization
  – In synchronous systems (trivial case)
  – Berkeley algorithm
• External synchronization
  – Cristian's algorithm
  – Network Time Protocol (NTP)


Clock synchronization in synchronous systems

• Synchronous systems define bounds on all relevant parts
  – Clock drift
  – Message transmission delays
  – Time required for process execution steps

• Send request, get response back

Internal


Clock synchronization in synchronous systems

• The only uncertainty is the actual current transmission delay, $u = max - min$
  – Set time to (time in response) + $(max + min)/2$; the skew is then at most $u/2$
  – For N processes, the optimum bound on the skew is $u(1 - 1/N)$
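A minimal sketch of the receiver's side, assuming the delay bounds `min_delay` and `max_delay` are known constants of the synchronous system. The sketch adds the midpoint of the two bounds to the received time, which coincides with u/2 when the minimum delay is zero. All names are chosen here for illustration.

```python
def synchronized_time(t_in_message, min_delay, max_delay):
    """Return (new clock value, worst-case skew) for the receiving process."""
    u = max_delay - min_delay  # uncertainty in the transmission delay
    new_time = t_in_message + (min_delay + max_delay) / 2
    return new_time, u / 2


def optimum_bound(u, n):
    """Best achievable skew bound when synchronizing n clocks."""
    return u * (1 - 1 / n)


# Delay between 2 ms and 10 ms: add 6 ms, worst-case skew 4 ms.
print(synchronized_time(100.0, 0.002, 0.010))
```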

Internal


Cristian's algorithm

• S is connected to a time source
• p requests ($m_r$) and receives ($m_t$) the time
  – S records the time t in $m_t$ as late as possible before transmitting the message
  – p knows the total round-trip time $T_{round}$
  – Simply set time to $t + T_{round}/2$?

External


Cristian's algorithm

• Only if on the same LAN! But then, if the minimum transmission time ($t_{min}$) is known:
  – The earliest S could have placed the time in $m_t$ was $t_{min}$ after p dispatched $m_r$, and the latest was $t_{min}$ before p received $m_t$
  – So when $m_t$ arrives, the time by S's clock lies in $[t + t_{min},\ t + T_{round} - t_{min}]$
  – Width of the range is $T_{round} - 2t_{min}$, so the accuracy is $\pm(T_{round}/2 - t_{min})$
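The client side of Cristian's algorithm can be sketched as follows. This is an illustration under stated assumptions: `server_time_fn` is a hypothetical stand-in for the request/reply exchange with S, and `time.monotonic` is used to measure the round trip locally.

```python
import time


def cristian_estimate(t_server, t_round, t_min=0.0):
    """Return (estimated current time, accuracy bound) from one exchange."""
    estimate = t_server + t_round / 2  # assume the reply took half the round trip
    accuracy = t_round / 2 - t_min     # half the width of the possible range
    return estimate, accuracy


def query(server_time_fn, t_min=0.0, clock=time.monotonic):
    """Time one request to the server and turn the reply into an estimate."""
    sent = clock()
    t_server = server_time_fn()  # hypothetical: sends m_r, returns t from m_t
    t_round = clock() - sent
    return cristian_estimate(t_server, t_round, t_min)
```

A short round trip gives a tight accuracy bound, so a client can repeat the query and keep the reply with the smallest round-trip time.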

External


Cristian's algorithm

• Single point of failure!
• Crashing server?
  – Multicast to group of servers
• Fake servers?
  – Establish cryptographic authentication
• Arbitrarily failing servers?
  – Have enough correct ones to achieve agreement

External


Berkeley algorithm

• Uses Cristian's methods
• Master/Slave relationship
• Master polls slaves
  – Gets current time in each slave
  – Sends the offset from own time to each slave
• Master fails?
  – Crash: elect a new one!
  – Arbitrary failure? Oops…

Internal


Network Time Protocol

• Unlike the others, designed for WAN rather than LAN use
  – Time servers close to the time source are more trusted
  – Redundant paths → survives disconnects
  – Massively scalable
  – Authentication of time servers to avoid propagation of arbitrary failures

External


Network Time Protocol

• Synchronization subnets
  – Primary level (stratum) is directly connected to time source
  – Secondary level syncs to primary, tertiary to secondary, etc.
    • A higher stratum number means a less reliable server
  – Dynamically reconfigurable: if time source goes down, primary level becomes secondary level

External


Network Time Protocol

• Multicast mode
  – “Time is X” between LAN nodes
    • Only as accurate as LAN allows
    • Used only for unimportant nodes
• Procedure-call mode
  – Similar to Cristian's algorithm
  – More accurate than multicast mode
• Symmetric mode
  – Pairs of messages
  – Used in lower strata

External


Network Time Protocol

• All messages are sent over UDP
• For procedure-call and symmetric mode, messages contain
  – Local times at which the previous NTP message between the nodes was sent and received
  – Local time of current message transmission
• Receiver notes local time when message is received

External


Network Time Protocol

• Delay in Server B may be non-negligible

• Messages may be lost along the way

External


Network Time Protocol

• For each message pair, calculate
  – $o_i$: estimated offset between the clocks
  – $d_i$: total transmission time (delay)
• The true offset is denoted $o$ (without the index)
• Denote the transmission time of $m$ as $t$, and that of $m'$ as $t'$

External


Network Time Protocol

$T_{i-2} = T_{i-3} + t + o$
$T_i = T_{i-1} + t' - o$
• leads to
$d_i = t + t' = T_{i-2} - T_{i-3} + T_i - T_{i-1}$
• also
$o = o_i + (t' - t)/2$, where
$o_i = (T_{i-2} - T_{i-3} + T_{i-1} - T_i)/2$

External


Network Time Protocol

• Since $t, t' \ge 0$, we know that
$o_i - d_i/2 \le o \le o_i + d_i/2$
• Or, in English: $o_i$ is an estimate of the offset, and $d_i$ is a measure of its accuracy
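The offset and delay formulas translate directly into code. A minimal sketch: the argument names mirror the four timestamps T(i-3), T(i-2), T(i-1) and T(i), and the example values are made up, with a true offset of 5 s, t = 30 ms and t' = 10 ms.

```python
def ntp_offset_and_delay(t_i3, t_i2, t_i1, t_i):
    """Return (o_i, d_i) for one message pair.

    t_i3: local send time of m       t_i2: remote receive time of m
    t_i1: remote send time of m'     t_i:  local receive time of m'
    """
    o_i = (t_i2 - t_i3 + t_i1 - t_i) / 2  # estimated offset
    d_i = t_i2 - t_i3 + t_i - t_i1        # total transmission time t + t'
    return o_i, d_i


o_i, d_i = ntp_offset_and_delay(100.0, 105.03, 105.13, 100.14)
low, high = o_i - d_i / 2, o_i + d_i / 2  # the true offset lies in [low, high]
print(o_i, d_i, low, high)
```

With these values the estimate is about 5.01 s and the delay about 0.04 s, so the true offset of 5 s falls inside the interval, as the inequality requires.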

External


Network Time Protocol

• Pairs are retained for quality calculations

• NTP peers communicate with many other peers, to decrease error
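The slide does not spell out how the retained pairs are used. One plausible sketch, offered as an assumption rather than the protocol's exact filter, keeps a short window of recent (offset, delay) pairs per peer and trusts the offset measured with the smallest delay, since a small delay gives the tightest bound on the true offset. The class name `PeerSamples` and the window size of 8 are chosen here for illustration.

```python
from collections import deque


class PeerSamples:
    """Keeps the most recent (offset, delay) pairs measured against one peer."""

    def __init__(self, size=8):
        self.pairs = deque(maxlen=size)  # oldest pair is dropped automatically

    def add(self, offset, delay):
        self.pairs.append((offset, delay))

    def best_offset(self):
        """Offset from the pair with the smallest delay (tightest bound)."""
        offset, _delay = min(self.pairs, key=lambda pair: pair[1])
        return offset
```

Comparing `best_offset()` across several peers is then one way to spot a peer whose clock disagrees with the rest.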

External


Summary

• We do not have universal time
  – But we can synchronize clocks “reasonably well” anyway

• Internal vs. external synchronization

• Real-time systems must use more sophisticated algorithms than what we have seen during this lecture!


Summary

• Algorithms
  – Synchronous system (trivial)
  – Cristian's algorithm
    • Used in many others
  – Berkeley algorithm
    • Master/Slave application of Cristian's for internal synchronization
  – Network Time Protocol
    • Suitable for WANs
    • Message pairs


Next lecture

• Logical time
• Global states
• Distributed debugging