
IM NTU
Distributed Information Systems 2004

Replication Management

Yih-Kuen Tsay

Dept. of Information Management

National Taiwan University


Motivations for Replication

• Performance enhancement
  – Client vs. server caching
  – Server pools
  – Replication of immutable vs. changing data

• Increased availability
  – Server failures
  – Network partition and disconnected operation

• Fault tolerance: guarantee correctness in spite of faults


General Requirements

• Replication transparency
  – Clients are not aware of multiple physical copies (replicas) of an object.
  – Clients see one logical copy for each object.

• Consistency
  – Servers perform operations in a way that meets the specification of correctness.


Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

An Architecture for Replication Management


About the Servers

• Recoverability

• State machines
  – Consist of state variables and commands
  – Outputs determined by the sequence of requests processed

• Static vs. dynamic set of replica managers
  – Dynamic: servers may crash; new ones may join
  – Static: crashed servers are considered to cease operating (possibly for an indefinite period)


Phases of Request Processing

• Issuance
  – unicast or multicast (from the front end to replica managers)

• Coordination (to ensure consistency)
  – FIFO ordering, causal ordering, total ordering, …

• Execution (maybe tentatively)

• Agreement (to commit or abort)

• Response
  – from one replica manager or several replica managers to the front end

* The ordering of the phases varies for different systems.


Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

Services for Process Groups


Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

View-Synchronous Group Communications


Correctness Criteria

• Linearizability

• Sequential consistency

* Consider individual operations (instead of transactions).


Linearizability

• The interleaved sequence of operations meets the specification of a single correct copy of the objects.

• The order of operations in the interleaving is consistent with the real times at which the operations occurred in the actual execution.


Sequential Consistency

• The one-copy semantics of the replicated objects is respected.

• The order of operations is preserved for each client, i.e., consistent with the program order for each client.

* Every linearizable service is also sequentially consistent.
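
As a concrete illustration (not from the slides; a single register x, initially 0, is assumed), the following history is sequentially consistent but not linearizable: the only serial order that explains the results puts Client 2's read before Client 1's write, which contradicts the real-time order of the operations.

```python
# Illustrative history over a register x (initially 0), two clients.
# Real-time intervals are given in seconds.
history = [
    {"client": 1, "op": "write(x, 1)",  "start": 0.0, "end": 1.0},
    {"client": 2, "op": "read(x) -> 0", "start": 2.0, "end": 3.0},
]

# Sequentially consistent: the interleaving "read(x) -> 0; write(x, 1)"
# respects each client's program order and a single-copy register.
# Not linearizable: the write finished (t=1.0) before the read began
# (t=2.0), so an interleaving that respects real time would force the
# read to return 1.
```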


Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

Consistency is easily guaranteed if the replica managers are organized as a group and the primary uses view-synchronous group communication to send updates.

The Primary-Backup (Passive) Model
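
A minimal sketch of the passive model (assumed class and method names, not from the slides): the front end sends requests only to the primary, which executes them, propagates the resulting state update to the backups, and then replies.

```python
class BackupRM:
    """Backup replica manager: only receives state updates from the primary."""
    def __init__(self):
        self.state = {}

    def apply_update(self, key, value):
        self.state[key] = value


class PrimaryRM:
    """Primary replica manager in the passive (primary-backup) model."""
    def __init__(self, backups):
        self.state = {}
        self.backups = backups

    def handle_request(self, key, value):
        # 1. Execute the request against the primary's copy.
        self.state[key] = value
        # 2. Propagate the state update to every backup
        #    (the slides use view-synchronous group communication for this).
        for b in self.backups:
            b.apply_update(key, value)
        # 3. Reply to the front end.
        return "ok"


backups = [BackupRM(), BackupRM()]
primary = PrimaryRM(backups)
primary.handle_request("x", 42)
assert all(b.state["x"] == 42 for b in backups)
```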


Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

Each front end sends its requests one at a time to all replica managers using a totally ordered multicast primitive, ensuring that all requests are processed in the same order at all replica managers.

Active Replication
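
A minimal sketch of active replication (illustrative names; the totally ordered multicast is simulated by a single agreed-upon request sequence): because every replica manager is a deterministic state machine processing the same requests in the same order, all replicas return identical replies.

```python
class ReplicaManager:
    """Deterministic state machine replicated at every replica manager."""
    def __init__(self):
        self.balance = 0

    def process(self, request):
        op, amount = request
        if op == "deposit":
            self.balance += amount
        return self.balance


replicas = [ReplicaManager() for _ in range(3)]

# Stand-in for a totally ordered multicast: one agreed-upon sequence of
# requests delivered to all replicas in the same order.
totally_ordered_requests = [("deposit", 3), ("deposit", 7)]

for request in totally_ordered_requests:
    replies = [rm.process(request) for rm in replicas]
    # The front end may take the first reply or vote over all of them.
    assert len(set(replies)) == 1   # identical replies from all replicas
```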


The Gossip Architecture

• A framework for providing high availability of service through lazy replication

• A request is normally executed at one replica.

• Replicas are updated by lazy exchange of gossip messages (containing the most recent updates).


Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

Operations in a Gossip Service


Timestamps

• Each front end keeps a vector timestamp reflecting the latest version accessed.

• The timestamp is attached to every request sent to a replica.

• Two front ends may exchange messages directly; these messages also carry timestamps.

• The merging of timestamps is done as usual (componentwise maximum).
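
A minimal sketch (illustrative code, not from the slides) of the vector-timestamp operations assumed here: merging takes the componentwise maximum, and the comparison used by the stability conditions on later slides is componentwise ≤.

```python
def merge(ts_a, ts_b):
    """Merge two vector timestamps: componentwise maximum."""
    return [max(a, b) for a, b in zip(ts_a, ts_b)]

def leq(ts_a, ts_b):
    """ts_a <= ts_b iff every component of ts_a is <= that of ts_b."""
    return all(a <= b for a, b in zip(ts_a, ts_b))

# Example with three replicas: the front end merges the timestamp returned
# in a reply into the timestamp it keeps (the latest version accessed).
front_end_ts = [2, 0, 1]
reply_ts     = [1, 3, 1]
front_end_ts = merge(front_end_ts, reply_ts)   # -> [2, 3, 1]
assert leq([1, 3, 1], front_end_ts)
```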


Timestamps (cont.)

• Each replica keeps a replica timestamp representing those updates it has received.

• It also keeps a value timestamp, reflecting the updates in the replicated value.

• The replica timestamp is attached to the reply to an update, while the value timestamp is attached to the reply to a query.


Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

Timestamp Propagations


The Update Log

• Every update, when received by a replica, is recorded in the update log of the replica.

• Two reasons for keeping a log:
  – The update cannot be applied yet; it is held back.
  – It is uncertain if the update has been received by all replicas.

• The entries are sorted by timestamps.


The Executed Operation Table

• The same update may arrive at a replica from a front end and in a gossip message from another replica.

• To prevent an update from being applied twice, the replica keeps a list of identifiers of the updates that have been applied so far.


Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

A Gossip Replica Manager


Processing Query Requests

• A query request q carries a timestamp q.prev, reflecting the latest version of the value that the front end has seen.

• Request q can be applied (i.e., it is stable) if q.prev ≤ valueTS (the value timestamp of the replica that received q).

• Once q is applied, the replica returns the current valueTS along with the reply.
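
A minimal sketch of query processing at a replica (illustrative names such as `Replica` and `pending_queries`; `value_ts` and `q_prev` mirror valueTS and q.prev from the slide): a query is answered only once it is stable, otherwise it is held back.

```python
def leq(a, b):
    """Componentwise vector-timestamp comparison."""
    return all(x <= y for x, y in zip(a, b))

class Replica:
    def __init__(self, n):
        self.value_ts = [0] * n      # valueTS: updates reflected in the value
        self.value = {}              # the replicated application state
        self.pending_queries = []    # queries held back until they are stable

    def query(self, q_prev, key):
        # Stability condition for a query: q.prev <= valueTS.
        if leq(q_prev, self.value_ts):
            # Apply the query and return the current valueTS with the reply.
            return self.value.get(key), list(self.value_ts)
        # Otherwise hold the query back (e.g., until gossip catches up).
        self.pending_queries.append((q_prev, key))
        return None
```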


Processing Update Requests

• For an update u (not a duplicate), replica i
  – increments the i-th element of its replica timestamp replicaTS by one,
  – adds an entry to the log with a timestamp ts derived from u.prev by replacing the i-th element with that of replicaTS, and
  – returns ts to the front end immediately.

• When the stability condition u.prev ≤ valueTS holds, update u is applied and its ts is merged with valueTS.
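
A minimal sketch of update processing (illustrative; `replica_ts`, `value_ts`, `log`, and `executed` mirror replicaTS, valueTS, the update log, and the executed-operation table from the preceding slides):

```python
def merge(a, b):
    return [max(x, y) for x, y in zip(a, b)]

def leq(a, b):
    return all(x <= y for x, y in zip(a, b))

class GossipReplica:
    def __init__(self, i, n):
        self.i = i                    # index of this replica
        self.replica_ts = [0] * n     # replicaTS: updates received here
        self.value_ts = [0] * n       # valueTS: updates applied to the value
        self.log = []                 # update log (entries carry their ts)
        self.executed = set()         # executed-operation table (update ids)
        self.value = {}

    def update(self, uid, u_prev, key, val):
        if uid in self.executed or any(e["uid"] == uid for e in self.log):
            return None               # duplicate: already seen
        # Increment the i-th element of replicaTS.
        self.replica_ts[self.i] += 1
        # Derive ts from u.prev by replacing the i-th element.
        ts = list(u_prev)
        ts[self.i] = self.replica_ts[self.i]
        self.log.append({"uid": uid, "ts": ts, "prev": u_prev,
                         "key": key, "val": val})
        self._apply_stable_updates()
        return ts                     # returned to the front end immediately

    def _apply_stable_updates(self):
        # Apply log entries whose stability condition u.prev <= valueTS holds;
        # applying one update may make another stable, hence the loop.
        progress = True
        while progress:
            progress = False
            for e in self.log:
                if e["uid"] not in self.executed and leq(e["prev"], self.value_ts):
                    self.value[e["key"]] = e["val"]
                    self.value_ts = merge(self.value_ts, e["ts"])
                    self.executed.add(e["uid"])
                    progress = True
```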


Processing Gossip Messages

• For every gossip message received, a replica does the following:
  – Merge the arriving log with its own; duplicated updates are discarded.
  – Apply updates that have become stable.

• A gossip message need not contain the entire log, if it is certain that some of the updates have been seen by the receiving replica.
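
Continuing the GossipReplica sketch above (illustrative only), a gossip message carrying the sender's log and replica timestamp might be processed like this:

```python
def merge(a, b):
    return [max(x, y) for x, y in zip(a, b)]

def receive_gossip(replica, incoming_log, incoming_replica_ts):
    """Process one gossip message at `replica` (a GossipReplica from the
    sketch above): merge logs, drop duplicates, apply stable updates."""
    known = {e["uid"] for e in replica.log} | replica.executed
    for entry in incoming_log:
        if entry["uid"] not in known:          # discard duplicated updates
            replica.log.append(entry)
    # replicaTS now also covers the updates learned through gossip.
    replica.replica_ts = merge(replica.replica_ts, incoming_replica_ts)
    # Apply any log entries whose stability condition now holds.
    replica._apply_stable_updates()
```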


Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

Updates in Bayou


About Bayou

• Consistency guarantees

• Merging of updates

• Dependency checks

• Merge procedures


Coda vs. AFS

• More general replication

• Greater tolerance toward server crashes

• Allowing disconnected operations


• A replicated transactional service should appear the same as one without replicated data.

• The effects of transactions performed by various clients on replicated data are the same as if they had been performed one at a time on single data items; this property is called one-copy serializability.

Transactions with Replicated Data


• Failures should be serialized with respect to transactions.

• Any failure observed by a transaction must appear to have happened before the transaction started.

Transactions with Replicated Data (cont.)


Schemes for One-Copy Serializability

• Read one/write all

• Available copies replication

• Schemes that also tolerate network partitioning:
  – available copies with validation
  – quorum consensus
  – virtual partition


Source: Instructor’s guide for G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

[Figure: client transactions (T, U) issue getBalance(A) and deposit(B,3) through front ends to replica managers holding copies of A and B.]

Transactions on Replicated Data


Available Copies Replication

• A client's read request on a logical data item may be performed by any available replica, but a client's update request must be performed by all available replicas.

• A local validation procedure is required to ensure that any failure or recovery does not appear to happen during the progress of a transaction.
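
A minimal read-one/write-all-available sketch (illustrative names; the local validation step is omitted):

```python
# Available copies replication: a read goes to any one available replica,
# an update goes to all currently available replicas.

class Copy:
    def __init__(self):
        self.available = True
        self.data = {}

def read(copies, key):
    for c in copies:
        if c.available:                 # read one: any available copy
            return c.data.get(key)
    raise RuntimeError("no available copy")

def write(copies, key, value):
    targets = [c for c in copies if c.available]
    if not targets:
        raise RuntimeError("no available copy")
    for c in targets:                   # write all available copies
        c.data[key] = value

copies = [Copy(), Copy(), Copy()]
copies[1].available = False             # one replica has crashed
write(copies, "A", 100)
assert read(copies, "A") == 100
```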


Source: Instructor’s guide for G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

[Figure: transactions T and U perform getBalance and deposit operations on copies of A and B at replica managers X, Y, P, M, and N.]

Available Copies Replication (cont.)


Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

Network Partition


Available Copies with Validation

• The available copies algorithm is applied within each partition.

• When a partition is repaired, the possibly conflicting transactions that took place in the separate partitions are validated.

• If the validation fails, some of the transactions have to be aborted.


Quorum Consensus Methods

• One way to ensure consistency across different partitions is to make a rule that operations can only be carried out within one of the partitions.

• A quorum is a subgroup of replicas whose size gives it the right to execute operations.

• Version numbers or timestamps may be used to determine whether copies of the data item are up to date.
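
A minimal sketch of quorum consensus in the style of weighted voting (illustrative; N, R, and W are assumed parameters): the overlap condition R + W > N guarantees that every read quorum intersects every write quorum, and version numbers identify the up-to-date copy within a quorum.

```python
# Quorum consensus sketch: with R + W > N every read quorum overlaps every
# write quorum, so a read always sees the latest version somewhere.

N, R, W = 5, 3, 3
assert R + W > N                    # read/write quorum overlap

replicas = [{"version": 0, "value": None} for _ in range(N)]

def write(value, quorum):
    assert len(quorum) >= W
    # Write quorums also overlap each other here (2*W > N), so the maximum
    # version seen within a write quorum is the current one.
    new_version = max(replicas[i]["version"] for i in quorum) + 1
    for i in quorum:
        replicas[i] = {"version": new_version, "value": value}

def read(quorum):
    assert len(quorum) >= R
    # Version numbers tell us which copy in the quorum is up to date.
    latest = max((replicas[i] for i in quorum), key=lambda r: r["version"])
    return latest["value"]

write("balance=97", quorum=[0, 1, 2])
assert read(quorum=[2, 3, 4]) == "balance=97"   # overlaps at replica 2
```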


Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

An Example for Quorum Consensus


Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

Two Network Partitions


Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

Virtual Partition


Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

Overlapping Virtual Partitions


Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

Creating Virtual Partitions
