
IM NTU
Distributed Information Systems 2004

Replication Management

Yih-Kuen Tsay

Dept. of Information Management

National Taiwan University


Motivations for Replication

• Performance enhancement
  – Client vs. server caching
  – Server pools
  – Replication of immutable vs. changing data

• Increased availability
  – Server failures
  – Network partition and disconnected operation

• Fault tolerance: guarantee correctness in spite of faults


General Requirements

• Replication transparency
  – Clients are not aware of multiple physical copies (replicas) of an object.
  – Clients see one logical copy for each object.

• Consistency
  – Servers perform operations in a way that meets the specification of correctness.


Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

An Architecture for Replication Management


About the Servers

• Recoverability

• State machines
  – Consist of state variables and commands
  – Outputs determined by the sequence of requests processed

• Static vs. dynamic set of replica managers
  – Dynamic: servers may crash; new ones may join
  – Static: crashed servers are considered to cease operating (possibly for an indefinite period)


Phases of Request Processing

• Issuance
  – unicast or multicast (from the front end to replica managers)

• Coordination (to ensure consistency)
  – FIFO ordering, causal ordering, total ordering, …

• Execution (maybe tentatively)

• Agreement (to commit or abort)

• Response
  – from one replica manager or several replica managers to the front end

* The ordering of the phases varies for different systems.


Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

Services for Process Groups


Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

View-Synchronous Group Communications


Correctness Criteria

• Linearizability

• Sequential consistency

* Consider individual operations (instead of transactions).


Linearizability

• The interleaved sequence of operations meets the specification of a single correct copy of the objects.

• The order of operations in the interleaving is consistent with the real times at which the operations occurred in the actual execution.


Sequential Consistency

• The one-copy semantics of the replicated objects is respected.

• The order of operations is preserved for each client, i.e., consistent with the program order for each client.

* Every linearizable service is also sequentially consistent.
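
As a concrete illustration (not from the slides; a single register x, initially 0, is assumed), the following history is sequentially consistent but not linearizable: the only serial order that explains the results puts Client 2's read before Client 1's write, which contradicts the real-time order of the operations.

```python
# Illustrative history over a register x (initially 0), two clients.
# Real-time intervals are given in seconds.
history = [
    {"client": 1, "op": "write(x, 1)",  "start": 0.0, "end": 1.0},
    {"client": 2, "op": "read(x) -> 0", "start": 2.0, "end": 3.0},
]

# Sequentially consistent: the interleaving "read(x) -> 0; write(x, 1)"
# respects each client's program order and a single-copy register.
# Not linearizable: the write finished (t=1.0) before the read began
# (t=2.0), so an interleaving that respects real time would force the
# read to return 1.
```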


Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

Consistency is easily guaranteed if the replica managers are organized as a group and the primary uses view-synchronous group communication to send updates.

The Primary-Backup (Passive) Model
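
A minimal sketch of the passive model (assumed class and method names, not from the slides): the front end sends requests only to the primary, which executes them, propagates the resulting state update to the backups, and then replies.

```python
class BackupRM:
    """Backup replica manager: only receives state updates from the primary."""
    def __init__(self):
        self.state = {}

    def apply_update(self, key, value):
        self.state[key] = value


class PrimaryRM:
    """Primary replica manager in the passive (primary-backup) model."""
    def __init__(self, backups):
        self.state = {}
        self.backups = backups

    def handle_request(self, key, value):
        # 1. Execute the request against the primary's copy.
        self.state[key] = value
        # 2. Propagate the state update to every backup
        #    (the slides use view-synchronous group communication for this).
        for b in self.backups:
            b.apply_update(key, value)
        # 3. Reply to the front end.
        return "ok"


backups = [BackupRM(), BackupRM()]
primary = PrimaryRM(backups)
primary.handle_request("x", 42)
assert all(b.state["x"] == 42 for b in backups)
```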


Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

Each front end sends its requests one at a time to all replica managers using a totally ordered multicast primitive, ensuring that all requests are processed in the same order at all replica managers.

Active Replication
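
A minimal sketch of active replication (illustrative names; the totally ordered multicast is simulated by a single agreed-upon request sequence): because every replica manager is a deterministic state machine processing the same requests in the same order, all replicas return identical replies.

```python
class ReplicaManager:
    """Deterministic state machine replicated at every replica manager."""
    def __init__(self):
        self.balance = 0

    def process(self, request):
        op, amount = request
        if op == "deposit":
            self.balance += amount
        return self.balance


replicas = [ReplicaManager() for _ in range(3)]

# Stand-in for a totally ordered multicast: one agreed-upon sequence of
# requests delivered to all replicas in the same order.
totally_ordered_requests = [("deposit", 3), ("deposit", 7)]

for request in totally_ordered_requests:
    replies = [rm.process(request) for rm in replicas]
    # The front end may take the first reply or vote over all of them.
    assert len(set(replies)) == 1   # identical replies from all replicas
```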


The Gossip Architecture

• A framework for providing high availability of service through lazy replication

• A request is normally executed at one replica.

• Replicas are updated by lazy exchange of gossip messages (containing the most recent updates).


Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

Operations in a Gossip Service


Timestamps

• Each front end keeps a vector timestamp reflecting the latest version accessed.

• The timestamp is attached to every request sent to a replica.

• Two front ends may exchange messages directly; these messages also carry timestamps.

• The merging of timestamps is done as usual (componentwise maximum).
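
A minimal sketch (illustrative code, not from the slides) of the vector-timestamp operations assumed here: merging takes the componentwise maximum, and the comparison used by the stability conditions on later slides is componentwise ≤.

```python
def merge(ts_a, ts_b):
    """Merge two vector timestamps: componentwise maximum."""
    return [max(a, b) for a, b in zip(ts_a, ts_b)]

def leq(ts_a, ts_b):
    """ts_a <= ts_b iff every component of ts_a is <= that of ts_b."""
    return all(a <= b for a, b in zip(ts_a, ts_b))

# Example with three replicas: the front end merges the timestamp returned
# in a reply into the timestamp it keeps (the latest version accessed).
front_end_ts = [2, 0, 1]
reply_ts     = [1, 3, 1]
front_end_ts = merge(front_end_ts, reply_ts)   # -> [2, 3, 1]
assert leq([1, 3, 1], front_end_ts)
```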


Timestamps (cont.)

• Each replica keeps a replica timestamp representing those updates it has received.

• It also keeps a value timestamp, reflecting the updates in the replicated value.

• The replica timestamp is attached to the reply to an update, while the value timestamp is attached to the reply to a query.


Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

Timestamp Propagations


The Update Log

• Every update, when received by a replica, is recorded in the update log of the replica.

• Two reasons for keeping a log:
  – The update cannot be applied yet; it is held back.
  – It is uncertain if the update has been received by all replicas.

• The entries are sorted by timestamps.


The Executed Operation Table

• The same update may arrive at a replica from a front end and in a gossip message from another replica.

• To prevent an update from being applied twice, the replica keeps a list of identifiers of the updates that have been applied so far.


Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

A Gossip Replica Manager


Processing Query Requests

• A query request q carries a timestamp q.prev, reflecting the latest version of the value that the front end has seen.

• Request q can be applied (i.e., it is stable) if q.prev ≤ valueTS (the value timestamp of the replica that received q).

• Once q is applied, the replica returns the current valueTS along with the reply.
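
A minimal sketch of query processing at a replica (illustrative names such as `Replica` and `pending_queries`; `value_ts` and `q_prev` mirror valueTS and q.prev from the slide): a query is answered only once it is stable, otherwise it is held back.

```python
def leq(a, b):
    """Componentwise vector-timestamp comparison."""
    return all(x <= y for x, y in zip(a, b))

class Replica:
    def __init__(self, n):
        self.value_ts = [0] * n      # valueTS: updates reflected in the value
        self.value = {}              # the replicated application state
        self.pending_queries = []    # queries held back until they are stable

    def query(self, q_prev, key):
        # Stability condition for a query: q.prev <= valueTS.
        if leq(q_prev, self.value_ts):
            # Apply the query and return the current valueTS with the reply.
            return self.value.get(key), list(self.value_ts)
        # Otherwise hold the query back (e.g., until gossip catches up).
        self.pending_queries.append((q_prev, key))
        return None
```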


Processing Update Requests

• For an update u (not a duplicate), replica i
  – increments the i-th element of its replica timestamp replicaTS by one,
  – adds an entry to the log with a timestamp ts derived from u.prev by replacing the i-th element with that of replicaTS, and
  – returns ts to the front end immediately.

• When the stability condition u.prev ≤ valueTS holds, update u is applied and its ts is merged with valueTS.
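
A minimal sketch of update processing (illustrative; `replica_ts`, `value_ts`, `log`, and `executed` mirror replicaTS, valueTS, the update log, and the executed-operation table from the preceding slides):

```python
def merge(a, b):
    return [max(x, y) for x, y in zip(a, b)]

def leq(a, b):
    return all(x <= y for x, y in zip(a, b))

class GossipReplica:
    def __init__(self, i, n):
        self.i = i                    # index of this replica
        self.replica_ts = [0] * n     # replicaTS: updates received here
        self.value_ts = [0] * n       # valueTS: updates applied to the value
        self.log = []                 # update log (entries carry their ts)
        self.executed = set()         # executed-operation table (update ids)
        self.value = {}

    def update(self, uid, u_prev, key, val):
        if uid in self.executed or any(e["uid"] == uid for e in self.log):
            return None               # duplicate: already seen
        # Increment the i-th element of replicaTS.
        self.replica_ts[self.i] += 1
        # Derive ts from u.prev by replacing the i-th element.
        ts = list(u_prev)
        ts[self.i] = self.replica_ts[self.i]
        self.log.append({"uid": uid, "ts": ts, "prev": u_prev,
                         "key": key, "val": val})
        self._apply_stable_updates()
        return ts                     # returned to the front end immediately

    def _apply_stable_updates(self):
        # Apply log entries whose stability condition u.prev <= valueTS holds;
        # applying one update may make another stable, hence the loop.
        progress = True
        while progress:
            progress = False
            for e in self.log:
                if e["uid"] not in self.executed and leq(e["prev"], self.value_ts):
                    self.value[e["key"]] = e["val"]
                    self.value_ts = merge(self.value_ts, e["ts"])
                    self.executed.add(e["uid"])
                    progress = True
```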


Processing Gossip Messages

• For every gossip message received, a replica does the following:
  – Merge the arriving log with its own; duplicated updates are discarded.
  – Apply updates that have become stable.

• A gossip message need not contain the entire log, if it is certain that some of the updates have been seen by the receiving replica.
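
Continuing the GossipReplica sketch above (illustrative only), a gossip message carrying the sender's log and replica timestamp might be processed like this:

```python
def merge(a, b):
    return [max(x, y) for x, y in zip(a, b)]

def receive_gossip(replica, incoming_log, incoming_replica_ts):
    """Process one gossip message at `replica` (a GossipReplica from the
    sketch above): merge logs, drop duplicates, apply stable updates."""
    known = {e["uid"] for e in replica.log} | replica.executed
    for entry in incoming_log:
        if entry["uid"] not in known:          # discard duplicated updates
            replica.log.append(entry)
    # replicaTS now also covers the updates learned through gossip.
    replica.replica_ts = merge(replica.replica_ts, incoming_replica_ts)
    # Apply any log entries whose stability condition now holds.
    replica._apply_stable_updates()
```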


Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

Updates in Bayou


About Bayou

• Consistency guarantees

• Merging of updates

• Dependency checks

• Merge procedures


Coda vs. AFS

• More general replication

• Greater tolerance toward server crashes

• Allowing disconnected operations


• A replicated transactional service should appear the same as one without replicated data.

• The effects of transactions performed by various clients on replicated data are the same as if they had been performed one at a time on single data items; this property is called one-copy serializability.

Transactions with Replicated Data


• Failures should be serialized with respect to transactions.

• Any failure observed by a transaction must appear to have happened before the transaction started.

Transactions with Replicated Data (cont.)


Schemes for One-Copy Serializability

• Read one/write all

• Available copies replication

• Schemes that also tolerate network partitioning:
  – available copies with validation
  – quorum consensus
  – virtual partition


Source: Instructor’s guide for G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

[Figure: client transactions (T, U) issue getBalance(A) and deposit(B,3) through front ends to replica managers holding copies of A and B.]

Transactions on Replicated Data


Available Copies Replication

• A client's read request on a logical data item may be performed by any available replica, but a client's update request must be performed by all available replicas.

• A local validation procedure is required to ensure that any failure or recovery does not appear to happen during the progress of a transaction.
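
A minimal read-one/write-all-available sketch (illustrative names; the local validation step is omitted):

```python
# Available copies replication: a read goes to any one available replica,
# an update goes to all currently available replicas.

class Copy:
    def __init__(self):
        self.available = True
        self.data = {}

def read(copies, key):
    for c in copies:
        if c.available:                 # read one: any available copy
            return c.data.get(key)
    raise RuntimeError("no available copy")

def write(copies, key, value):
    targets = [c for c in copies if c.available]
    if not targets:
        raise RuntimeError("no available copy")
    for c in targets:                   # write all available copies
        c.data[key] = value

copies = [Copy(), Copy(), Copy()]
copies[1].available = False             # one replica has crashed
write(copies, "A", 100)
assert read(copies, "A") == 100
```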


Source: Instructor’s guide for G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

[Figure: transactions T and U perform getBalance and deposit operations on copies of A and B at replica managers X, Y, P, M, and N.]

Available Copies Replication (cont.)


Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

Network Partition


Available Copies with Validation

• The available copies algorithm is applied within each partition.

• When a partition is repaired, the possibly conflicting transactions that took place in the separate partitions are validated.

• If the validation fails, some of the transactions have to be aborted.


Quorum Consensus Methods

• One way to ensure consistency across different partitions is to make a rule that operations can only be carried out within one of the partitions.

• A quorum is a subgroup of replicas whose size gives it the right to execute operations.

• Version numbers or timestamps may be used to determine whether copies of the data item are up to date.
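
A minimal sketch of quorum consensus in the style of weighted voting (illustrative; N, R, and W are assumed parameters): the overlap condition R + W > N guarantees that every read quorum intersects every write quorum, and version numbers identify the up-to-date copy within a quorum.

```python
# Quorum consensus sketch: with R + W > N every read quorum overlaps every
# write quorum, so a read always sees the latest version somewhere.

N, R, W = 5, 3, 3
assert R + W > N                    # read/write quorum overlap

replicas = [{"version": 0, "value": None} for _ in range(N)]

def write(value, quorum):
    assert len(quorum) >= W
    # Write quorums also overlap each other here (2*W > N), so the maximum
    # version seen within a write quorum is the current one.
    new_version = max(replicas[i]["version"] for i in quorum) + 1
    for i in quorum:
        replicas[i] = {"version": new_version, "value": value}

def read(quorum):
    assert len(quorum) >= R
    # Version numbers tell us which copy in the quorum is up to date.
    latest = max((replicas[i] for i in quorum), key=lambda r: r["version"])
    return latest["value"]

write("balance=97", quorum=[0, 1, 2])
assert read(quorum=[2, 3, 4]) == "balance=97"   # overlaps at replica 2
```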


Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

An Example for Quorum Consensus


Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

Two Network Partitions


Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

Virtual Partition


Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

Overlapping Virtual Partitions


Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

Creating Virtual Partitions
