Post on 04-Jan-2016
IM NTU
Distributed Information Systems 2004: Replication Management
Replication Management
Yih-Kuen Tsay
Dept. of Information Management
National Taiwan University
Motivations for Replication
• Performance enhancement
  – Client vs. server caching
  – Server pools
  – Replication of immutable vs. changing data
• Increased availability
  – Server failures
  – Network partition and disconnected operation
• Fault tolerance: guarantee correctness in spite of faults
General Requirements
• Replication transparency
  – Clients are not aware of multiple physical copies (replicas) of an object.
  – Clients see one logical copy for each object.
• Consistency
  – Servers perform operations in a way that meets the specification of correctness.
An Architecture for Replication Management
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.
About the Servers
• Recoverability
• State machines
  – Consist of state variables and commands
  – Outputs determined by the sequence of requests processed
• Static vs. dynamic set of replica managers
  – Dynamic: servers may crash; new ones may join
  – Static: crashed servers are considered to cease operating (possibly for an indefinite period)
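The state-machine property can be illustrated with a minimal sketch. The class `CounterMachine` and its commands `add` and `read` are hypothetical and chosen only for illustration; the point is that two replicas starting from the same state and processing the same sequence of requests produce the same outputs.

```python
class CounterMachine:
    """A deterministic state machine: one state variable, two commands."""

    def __init__(self):
        self.value = 0  # the only state variable

    def apply(self, command, arg=0):
        # Every command deterministically updates the state and yields an output.
        if command == "add":
            self.value += arg
            return self.value
        if command == "read":
            return self.value
        raise ValueError(f"unknown command: {command}")


def run(requests):
    """Feed a sequence of requests to a fresh machine and collect the outputs."""
    machine = CounterMachine()
    return [machine.apply(command, arg) for command, arg in requests]


requests = [("add", 5), ("read", 0), ("add", -2), ("read", 0)]
replica_a = run(requests)  # outputs of one replica
replica_b = run(requests)  # outputs of another replica fed the same sequence
```

Feeding the same requests in a different order generally gives different outputs, which is why the coordination phase has to agree on an ordering.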
Phases of Request Processing
• Issuance
  – unicast or multicast (from the front end to replica managers)
• Coordination (to ensure consistency)
  – FIFO ordering, causal ordering, total ordering, …
• Execution (maybe tentatively)
• Agreement (to commit or abort)
• Response
  – From one replica manager or several replica managers to the front end
* The ordering of the phases varies for different systems.
Services for Process Groups
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.
View-Synchronous Group Communications
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.
Correctness Criteria
• Linearizability
• Sequential consistency
* Consider individual operations (instead of transactions).
Linearizability
• The interleaved sequence of operations meets the specification of a single correct copy of the objects.
• The order of operations in the interleaving is consistent with the real times at which the operations occurred in the actual execution.
Sequential Consistency
• The one-copy semantics of the replicated objects is respected.
• The order of operations is preserved for each client, i.e., consistent with the program order for each client.
* Every linearizable service is also sequentially consistent.
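The difference between the two criteria can be checked by brute force on very small histories. The sketch below is an illustration only: it assumes a single read/write register with initial value 0, operations recorded as hypothetical tuples `(client, kind, value, start, end)`, and clients that issue their operations one at a time. It tries every interleaving, so it is usable only for a handful of operations.

```python
from itertools import permutations

# Each operation is (client, kind, value, start, end).
# kind is "write" or "read"; for a read, value is what the client observed.


def legal(seq, initial=0):
    """True if seq is a correct history of a single copy of the register."""
    current = initial
    for _, kind, value, _, _ in seq:
        if kind == "write":
            current = value
        elif value != current:
            return False
    return True


def respects_program_order(seq):
    """Each client's operations appear in the order that client issued them."""
    last_start = {}
    for client, _, _, start, _ in seq:
        if start < last_start.get(client, float("-inf")):
            return False
        last_start[client] = start
    return True


def respects_real_time(seq):
    """An operation that ended before another started must precede it."""
    for i, later in enumerate(seq):
        for earlier in seq[:i]:
            if later[4] < earlier[3]:  # 'later' really finished first
                return False
    return True


def sequentially_consistent(history):
    return any(legal(p) and respects_program_order(p)
               for p in permutations(history))


def linearizable(history):
    return any(legal(p) and respects_real_time(p)
               for p in permutations(history))


# Client c2 reads the old value after c1's write has already completed.
stale_read = [("c1", "write", 1, 0, 1), ("c2", "read", 0, 2, 3)]
# The same client writes 1 and then reads 0.
own_write_lost = [("c1", "write", 1, 0, 1), ("c1", "read", 0, 2, 3)]
```

Under these assumptions `stale_read` is sequentially consistent but not linearizable, since placing the read before the write respects each client's program order but contradicts real time. `own_write_lost` satisfies neither criterion.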
The Primary-Backup (Passive) Model
Consistency is easily guaranteed if the replica managers are organized as a group and the primary uses view-synchronous group communication to send updates.
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.
Active Replication
Each front end sends its requests one at a time to all replica managers using a totally ordered multicast primitive, ensuring that all requests are processed in the same order at all replica managers.
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.
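The effect of totally ordered delivery in active replication can be sketched as follows. The slides do not say how the total order is implemented; this sketch assumes a central sequencer, which is only one possible technique, and the names `Sequencer`, `ReplicaManager`, and the `deposit`/`double` operations are hypothetical.

```python
import itertools


class Sequencer:
    """Hands out consecutive sequence numbers (one way to get a total order)."""

    def __init__(self):
        self._counter = itertools.count()

    def stamp(self, request):
        return (next(self._counter), request)


class ReplicaManager:
    def __init__(self):
        self.balance = 0
        self.next_seq = 0
        self.held_back = {}  # sequence number -> request awaiting its turn
        self.applied = []    # requests in the order they were executed

    def receive(self, stamped):
        seq, request = stamped
        self.held_back[seq] = request
        # Deliver every request whose predecessors have all been delivered.
        while self.next_seq in self.held_back:
            self._execute(self.held_back.pop(self.next_seq))
            self.next_seq += 1

    def _execute(self, request):
        op, amount = request
        if op == "deposit":
            self.balance += amount
        elif op == "double":
            self.balance *= 2
        self.applied.append(request)


sequencer = Sequencer()
m1 = sequencer.stamp(("deposit", 10))
m2 = sequencer.stamp(("double", 0))
m3 = sequencer.stamp(("deposit", 1))

rm_a, rm_b = ReplicaManager(), ReplicaManager()
for message in (m1, m2, m3):   # arrives in order at one replica manager
    rm_a.receive(message)
for message in (m3, m1, m2):   # arrives out of order at the other
    rm_b.receive(message)
```

Both replica managers execute the requests in sequence-number order, so they end in the same state even though the operations do not commute.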
The Gossip Architecture
• A framework for providing high availability of service through lazy replication
• A request is normally executed at one replica.
• Replicas are updated by lazy exchange of gossip messages (containing the most recent updates).
Operations in a Gossip Service
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.
Timestamps
• Each front end keeps a vector timestamp reflecting the latest version accessed.
• The timestamp is attached to every request sent to a replica.
• Two front ends may exchange messages directly; these messages also carry timestamps.
• Timestamps are merged as usual for vector timestamps: by taking the component-wise maximum.
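As a small illustration of vector timestamps at a front end, the sketch below shows the usual merge (component-wise maximum) and the usual comparison. The function names `merge` and `leq` and the sample values are assumptions made for the example.

```python
def merge(a, b):
    """Component-wise maximum of two vector timestamps of equal length."""
    return tuple(max(x, y) for x, y in zip(a, b))


def leq(a, b):
    """a <= b holds when every component of a is <= the matching one of b."""
    return all(x <= y for x, y in zip(a, b))


front_end_ts = (2, 4, 6)   # latest version the front end has accessed
reply_ts = (2, 5, 5)       # timestamp returned by a replica
front_end_ts = merge(front_end_ts, reply_ts)   # (2, 5, 6)
```

Note that `(2, 4, 6)` and `(2, 5, 5)` are incomparable under `leq`: neither version includes all the updates of the other, which is why the front end keeps the merged vector.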
Timestamps (cont.)
• Each replica keeps a replica timestamp representing those updates it has received.
• It also keeps a value timestamp, reflecting the updates in the replicated value.
• The replica timestamp is attached to the reply to an update, while the value timestamp is attached to the reply to a query.
Timestamp Propagations
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.
The Update Log
• Every update, when received by a replica, is recorded in the update log of the replica.
• Two reasons for keeping a log:
  – The update cannot be applied yet; it is held back.
  – It is uncertain if the update has been received by all replicas.
• The entries are sorted by timestamps.
The Executed Operation Table
• The same update may arrive at a replica from a front end and in a gossip message from another replica.
• To prevent an update from being applied twice, the replica keeps a list of identifiers of the updates that have been applied so far.
A Gossip Replica Manager
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.
Processing Query Requests
• A query request q carries a timestamp q.prev, reflecting the latest version of the value that the front end has seen.
• Request q can be applied (i.e., it is stable) if q.prev ≤ valueTS (the value timestamp of the replica that received q).
• Once q is applied, the replica returns the current valueTS along with the reply.
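The query rule can be sketched as below. The class and method names are hypothetical, and holding an unstable query in a `pending` list is an assumption about what the replica does while it waits for the missing updates; the slides only state the stability condition and the reply.

```python
def leq(a, b):
    """Vector timestamp comparison: every component of a is <= that of b."""
    return all(x <= y for x, y in zip(a, b))


class Replica:
    def __init__(self, value, value_ts):
        self.value = value
        self.value_ts = value_ts  # valueTS: updates reflected in the value
        self.pending = []         # timestamps of queries that are held back

    def query(self, q_prev):
        """Answer with (value, valueTS) if the query is stable, else hold it."""
        if leq(q_prev, self.value_ts):
            return self.value, self.value_ts
        self.pending.append(q_prev)  # assumed: wait for the missing updates
        return None


replica = Replica(value="v1", value_ts=(2, 5, 5))
answered = replica.query((2, 4, 5))   # the replica is at least as recent
held = replica.query((2, 4, 6))       # front end has seen a newer update
```

The second query is held back because its third component (6) exceeds that of valueTS (5): the front end has already seen an update that this replica has not yet applied.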
Processing Update Requests
• For an update u (not a duplicate), replica i
  – increments the i-th element of its replica timestamp replicaTS by one,
  – adds an entry to the log with a timestamp ts derived from u.prev by replacing the i-th element with that of replicaTS, and
  – returns ts to the front end immediately.
• When the stability condition u.prev ≤ valueTS holds, update u is applied and its ts is merged with valueTS.
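The update steps can be put together in a minimal sketch. Everything beyond the steps on the slide is an assumption made for illustration: the value is an integer, an update adds an amount to it, each update carries a unique identifier `uid`, and a duplicate is detected by looking for that identifier in the log.

```python
from dataclasses import dataclass


def leq(a, b):
    return all(x <= y for x, y in zip(a, b))


def merge(a, b):
    return tuple(max(x, y) for x, y in zip(a, b))


@dataclass(frozen=True)
class Update:
    uid: str      # unique identifier assigned by the front end (assumed)
    amount: int   # illustrative operation: add this amount to the value
    prev: tuple   # u.prev, the front end's timestamp


class Replica:
    def __init__(self, index, n):
        self.i = index
        self.value = 0
        self.value_ts = (0,) * n
        self.replica_ts = (0,) * n
        self.log = []          # (ts, update) entries
        self.executed = set()  # identifiers of updates already applied

    def receive_update(self, u):
        if any(entry.uid == u.uid for _, entry in self.log):
            return None  # duplicate: already logged, nothing to do
        bumped = list(self.replica_ts)
        bumped[self.i] += 1                   # increment replicaTS[i]
        self.replica_ts = tuple(bumped)
        ts = list(u.prev)
        ts[self.i] = self.replica_ts[self.i]  # ts = u.prev with i-th element replaced
        ts = tuple(ts)
        self.log.append((ts, u))
        self.apply_stable()
        return ts                             # returned to the front end

    def apply_stable(self):
        # Keep applying logged updates until no further one is stable.
        progress = True
        while progress:
            progress = False
            for ts, u in sorted(self.log, key=lambda e: e[0]):
                if u.uid not in self.executed and leq(u.prev, self.value_ts):
                    self.value += u.amount
                    self.value_ts = merge(self.value_ts, ts)
                    self.executed.add(u.uid)
                    progress = True


rm = Replica(index=0, n=3)
ts1 = rm.receive_update(Update("u1", 5, (0, 0, 0)))  # stable at once
ts2 = rm.receive_update(Update("u2", 3, (0, 1, 0)))  # depends on an unseen update
dup = rm.receive_update(Update("u1", 5, (0, 0, 0)))  # same identifier again
```

Here `u2` receives the timestamp `(2, 1, 0)` but stays in the log unapplied, because its `prev` refers to an update from replica 1 that this replica has not yet received.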
Processing Gossip Messages
• For every gossip message received, a replica does the following:
  – Merge the arriving log with its own; duplicated updates are discarded.
  – Apply updates that have become stable.
• A gossip message need not contain the entire log, if it is certain that some of the updates have been seen by the receiving replica.
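The two gossip steps can be sketched as follows, under simplifying assumptions that are not in the slides: the log is a dictionary keyed by a hypothetical update identifier, each entry is a tuple `(ts, prev, amount)`, and an update adds `amount` to an integer value. Timestamp bookkeeping other than valueTS is left out.

```python
def leq(a, b):
    return all(x <= y for x, y in zip(a, b))


def merge(a, b):
    return tuple(max(x, y) for x, y in zip(a, b))


class Replica:
    def __init__(self, n):
        self.value = 0
        self.value_ts = (0,) * n
        self.log = {}          # update id -> (ts, prev, amount)
        self.executed = set()  # ids of updates already applied

    def receive_gossip(self, arriving_log):
        # Step 1: merge the arriving log with our own; duplicates are discarded.
        for uid, entry in arriving_log.items():
            self.log.setdefault(uid, entry)
        # Step 2: apply updates that have become stable, until none is left.
        progress = True
        while progress:
            progress = False
            for uid, (ts, prev, amount) in sorted(self.log.items(),
                                                  key=lambda kv: kv[1][0]):
                if uid not in self.executed and leq(prev, self.value_ts):
                    self.value += amount
                    self.value_ts = merge(self.value_ts, ts)
                    self.executed.add(uid)
                    progress = True


b = Replica(n=2)
# b already holds u2, which is held back because it depends on u1.
b.log["u2"] = ((1, 1), (1, 0), 3)
# The gossip message carries u1 and a second copy of u2.
b.receive_gossip({"u1": ((1, 0), (0, 0), 5), "u2": ((1, 1), (1, 0), 3)})
```

Receiving `u1` makes it applicable immediately, which in turn makes the held-back `u2` stable; the duplicate copy of `u2` is ignored.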
Updates in Bayou
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.
About Bayou
• Consistency guarantees
• Merging of updates
• Dependency checks
• Merge procedures
Coda vs. AFS
• More general replication
• Greater tolerance toward server crashes
• Allowing disconnected operations
Transactions with Replicated Data
• A replicated transactional service should appear the same as one without replicated data.
• The effects of transactions performed by various clients on replicated data are the same as if they had been performed one at a time on single data items; this property is called one-copy serializability.
Transactions with Replicated Data (cont.)
• Failures should be serialized with respect to transactions.
• Any failure observed by a transaction must appear to have happened before the transaction started.
Schemes for One-Copy Serializability
• Read one/write all
• Available copies replication
• Schemes that also tolerate network partitioning:
  – available copies with validation
  – quorum consensus
  – virtual partition
Transactions on Replicated Data
[Figure: two clients with front ends run transactions T and U, issuing getBalance(A) and deposit(B,3), against replica managers holding copies of A and B.]
Source: Instructor’s guide for G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.
Available Copies Replication
• A client's read request on a logical data item may be performed by any available replica, but a client's update request must be performed by all available replicas.
• A local validation procedure is required to ensure that any failure or recovery does not appear to happen during the progress of a transaction.
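The read and update rules of available copies replication can be sketched as below. The class, the function names, and the replica names are hypothetical, and the local validation procedure is deliberately left out, so this shows only which replicas take part in each operation.

```python
class ReplicaManager:
    def __init__(self, name):
        self.name = name
        self.available = True
        self.data = {}


def read(replicas, item):
    """Read from any one available replica (here: the first one found)."""
    for rm in replicas:
        if rm.available:
            return rm.data[item]
    raise RuntimeError("no available replica")


def write(replicas, item, value):
    """Write to every available replica; return the names that were updated."""
    targets = [rm for rm in replicas if rm.available]
    if not targets:
        raise RuntimeError("no available replica")
    for rm in targets:
        rm.data[item] = value
    return [rm.name for rm in targets]


m, n, p = ReplicaManager("M"), ReplicaManager("N"), ReplicaManager("P")
group = [m, n, p]
first = write(group, "B", 100)    # all three copies are updated
n.available = False               # N fails
second = write(group, "B", 103)   # only the available copies are updated
balance = read(group, "B")
```

After the failure, N still holds the old value 100. This is the kind of situation the validation procedure has to deal with, so that the failure and any later recovery of N do not appear to happen in the middle of a transaction.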
Available Copies Replication (cont.)
[Figure: two clients with front ends run transactions T and U, issuing getBalance(A), deposit(B,3), getBalance(B), and deposit(A,3), against replica managers X, Y, M, N, and P holding copies of A and B.]
Source: Instructor’s guide for G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.
Network Partition
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.
Available Copies with Validation
• The available copies algorithm is applied within each partition.
• When a partition is repaired, the possibly conflicting transactions that took place in the separate partitions are validated.
• If the validation fails, some of the transactions have to be aborted.
Quorum Consensus Methods
• One way to ensure consistency across different partitions is to make a rule that operations can only be carried out within one of the partitions.
• A quorum is a subgroup of replicas whose size gives it the right to execute operations.
• Version numbers or timestamps may be used to determine whether copies of the data item are up to date.
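A minimal sketch of quorum consensus with version numbers follows. The slides do not give the quorum-size rules; the sketch assumes the commonly used constraints for n replicas with one vote each, namely a write quorum w larger than n/2 and read and write quorums with r + w > n. All names are hypothetical.

```python
def valid_quorums(n, r, w):
    """Assumed constraints: write quorums overlap each other and every read quorum."""
    return w > n / 2 and r + w > n


def write(replicas, reachable, w, value):
    """Update a write quorum of reachable replicas with a new version."""
    quorum = sorted(reachable)[:w]
    if len(quorum) < w:
        raise RuntimeError("write quorum not available")
    # Any write quorum contains at least one up-to-date copy.
    version = max(replicas[name][0] for name in quorum) + 1
    for name in quorum:
        replicas[name] = (version, value)


def read(replicas, reachable, r):
    """Return the (version, value) pair with the highest version in a read quorum."""
    quorum = sorted(reachable)[:r]
    if len(quorum) < r:
        raise RuntimeError("read quorum not available")
    return max((replicas[name] for name in quorum), key=lambda copy: copy[0])


replicas = {"A": (0, "init"), "B": (0, "init"), "C": (0, "init")}
write(replicas, {"A", "B"}, w=2, value="x")   # the partition {A, B} has a quorum
try:
    write(replicas, {"C"}, w=2, value="y")    # the partition {C} does not
    minority_wrote = True
except RuntimeError:
    minority_wrote = False
latest = read(replicas, {"B", "C"}, r=2)      # after the partition is repaired
```

Because only one partition can assemble a write quorum, conflicting updates cannot occur, and the version number lets the read skip the out-of-date copy at C.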
An Example for Quorum Consensus
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.
Two Network Partitions
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.
Virtual Partition
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.
Overlapping Virtual Partitions
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.
Creating Virtual Partitions
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.