Swift Container Sync

Page 1: Swift container sync

Container Sync

Gregory Holt
[email protected]

http://tlohg.com/

OpenStack Design Summit, April 26-29, 2011

Page 2: Swift container sync

Original Goal

Provide greater availability and durability with geographically distinct replicas.

Multi-Region Replication
• Replicate objects to other Swift clusters.
• Allow a configurable number of remote replicas.
• Ideally allow per container configuration.

Problems
• Very complex to implement; the simpler feature I propose is already pretty complex.
• Swift currently only has a cluster-wide replica count.
• Tracking how many replicas are remote, and where, adds complexity.
• Per container remote replica counts add complexity.

Complexity = More Time and More Bugs

Page 3: Swift container sync

New Goal

Provide greater availability and durability with geographically distinct replicas.

Simpler Container Synchronization
• Replicate objects to other Swift clusters.
• Remote replica count is not configurable; it is the number of replicas the remote cluster is already configured for.
• Per container configuration allowed, but just "to where".

Benefits
• Much simpler (but still complex).
• Doesn't alter fundamental Swift internals.
• Per container configuration that doesn't change behavior, only the destination.
• Side benefit: can actually synchronize containers within the same cluster (migrating an account to another, for instance).

Simpler = Less Time and Fewer Bugs

Page 4: Swift container sync

How the User Would Use It

1. Set the first container's X-Container-Sync-To and X-Container-Sync-Key values: the To to the second container's URL and the Key to a made-up value:

$ st post -t https://cluster2/v1/AUTH_gholt/container2 -k secret container1

2. Set the second container's X-Container-Sync-To and X-Container-Sync-Key values: the To to the first container's URL and the Key to the same made-up value:

$ st post -t https://cluster1/v1/AUTH_gholt/container1 -k secret container2

Now, any existing objects in the containers will be synced to one another, as well as any objects added later.

Page 5: Swift container sync

Advanced Container Synchronization

You can synchronize more than just two containers.

Normally you just synchronize the two containers:

Container 1 ↔ Container 2

But you could synchronize more by using a chain:

Container 1 → Container 2 → Container 3
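For example, using the same st commands shown on the previous page (cluster3's URL and the reuse of the key "secret" are made up for illustration), one way to wire the chain is to point each container at the next and the last back at the first, running each command against the cluster that hosts the container being configured:

$ st post -t https://cluster2/v1/AUTH_gholt/container2 -k secret container1
$ st post -t https://cluster3/v1/AUTH_gholt/container3 -k secret container2
$ st post -t https://cluster1/v1/AUTH_gholt/container1 -k secret container3

With the loop closed, an object written to any one container eventually propagates around to the other two.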

Page 6: Swift container sync

Caveats
• Valid X-Container-Sync-To destinations must be configured for each cluster ahead of time. The feature is based on Cluster Trust.

• The Swift cluster clocks need to be set reasonably close to one another. Swift timestamps each operation, and these timestamps are used in conflict resolution. For example, if an object is deleted on one cluster and overwritten on the other, whichever operation has the newest timestamp wins (see the sketch after this list).

• There needs to be enough bandwidth between the clusters to keep up with all the changes to the synchronized containers.

• There will be a burst of bandwidth used when turning the feature on for an existing container full of objects.

• A user has no explicit guarantee when a change will make it to the remote cluster. For example, a successful PUT means the cluster that received it has the object, not that the remote cluster does. The synchronization happens in the background.

• Does not sync object POSTs yet (more on this later).

• Since background syncs come from the container servers themselves, they need to communicate with the remote cluster, probably requiring an HTTP proxy, and probably one per zone to avoid choke points.
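A minimal sketch of the timestamp-based conflict resolution mentioned in the clock caveat above (the record shape and function name are illustrative, not Swift's actual internals):

# Illustrative last-write-wins resolution between two clusters' records for
# the same object. Each record carries the timestamp Swift assigned to the
# operation; a delete is just a record whose "deleted" flag is set.
def resolve(local, remote):
    # Whichever side has the newest timestamp wins, even if it is a delete.
    return local if local["timestamp"] >= remote["timestamp"] else remote

local = {"name": "photo.jpg", "timestamp": 1303800000.123, "deleted": True}
remote = {"name": "photo.jpg", "timestamp": 1303800007.456, "deleted": False}
winner = resolve(local, remote)  # the newer overwrite wins over the older delete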

Page 7: Swift container sync

What’s Left To Do?

HTTP Proxying

Tests

Documentation

POSTs

Because object POSTs don't currently cause a container database update, we need to either cause an update or come up with another way to synchronize them.

The current plan is to modify POSTs to actually be a COPY internally.

Downside: POSTs to large files will take longer.

Upside: We have noticed very few POSTs in production.
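A rough sketch of what POST-as-COPY could look like, assuming a hypothetical client-style helper (get_object and put_object here are stand-ins, not the actual proxy internals): the metadata-only POST becomes a full re-write, so the container database gets a new row that container sync can pick up.

# Hypothetical illustration only: turn a metadata-only POST into a COPY-style
# re-PUT so the container database records a new ROWID for the object.
def post_as_copy(client, container, obj, new_metadata):
    data, old_metadata = client.get_object(container, obj)     # read the current object
    merged = dict(old_metadata)
    merged.update(new_metadata)                                 # apply the POSTed changes
    client.put_object(container, obj, data, metadata=merged)    # re-write -> new container row

This is also why POSTs to large files would take longer: the whole object gets rewritten, not just its metadata.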

Page 8: Swift container sync

Live Account Migrations

This is a big step towards live account migrations.

1. Turn on sync for the linked accounts on the two clusters.
2. Wait for the new account to get caught up.
3. Switch the auth response URL to the new account and revoke all existing account tokens.
4. Put the old account in a read-only mode.
5. Turn off sync from the new account to the old.
6. Wait until the old account is no longer sending updates, plus some safety time.
7. Purge the old account.

Missing Pieces:

• Account sync (creating new containers on both sides, deletes and posts too).

• Account read-only mode.

• Using alternate operator-only headers so they don't conflict with the user's, while keeping the user from seeing or modifying the values.

Page 9: Swift container sync

Implementation

st
• Updated to set/read X-Container-Sync-To and X-Container-Sync-Key.

Swauth and container-server
• Requires a new conf value, allowed_sync_hosts, indicating the allowed remote clusters.
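A hedged example of what that might look like in the container-server configuration (the host names are made up); allowed_sync_hosts lists the remote clusters this cluster will accept as X-Container-Sync-To destinations:

[DEFAULT]
# Remote clusters trusted as X-Container-Sync-To destinations.
allowed_sync_hosts = cluster1.example.com,cluster2.example.com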

swift-container-sync
• New daemon that runs on every container server.

• Scans every container database looking for ones with sync turned on.

• Sends updates based on any new ROWIDs in the container database.

• Keeps sync points in the local container databases of the last ROWIDs sent out.
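A simplified sketch of the daemon's pass over the local container databases as just described; the data shapes are made up for illustration (not Swift's actual schema), and sync point handling is collapsed to a single point here, with the real two-point scheme covered later in the deck.

# Illustrative pass of a container-sync style daemon on one container server.
# container_dbs: list of dicts, one per local container database, e.g.
#   {"sync_to": url-or-None, "sync_key": str, "sync_point": int,
#    "rows": [(rowid, record), ...]}   (rows in ascending ROWID order)
# send_row: callable that issues the PUT or DELETE against the remote container.
def sync_pass(container_dbs, send_row):
    for db in container_dbs:
        if not db["sync_to"]:
            continue                                   # sync not turned on for this container
        for rowid, record in db["rows"]:
            if rowid > db["sync_point"]:               # new ROWIDs since the last run
                send_row(db["sync_to"], db["sync_key"], record)
        if db["rows"]:
            db["sync_point"] = db["rows"][-1][0]       # remember how far we got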

Page 10: Swift container sync

Complexity - swift-container-sync

There are three container databases on different servers for each container.

There is no need for each to send all the updates, and doing so would be quite wasteful.

Easiest solution is to just have one send out the updates, but:

• What if that one is down?

• Couldn't synchronization be done faster if all three were involved?

Instead, each sends a different third of the updates (assuming 3 replicas here).

• Downside: If one is down, a third of the updates will be delayed until it comes back up.

So, in addition, each node will send all older updates to ensure quicker synchronization.

• Normally, each server does a third of the updates.

• Each server also does all older updates for assurance.

• The vast majority of assurance updates will short circuit.

Page 11: Swift container sync

In The Weeds

• Two sync points are kept per container database.

• All rows between the two sync points trigger updates. *

• Any rows newer than both sync points cause updates depending on the node's position for the container (primary nodes do one third, etc. depending on the replica count of course).

• After a sync run, the first sync point is set to the newest ROWID known and the second sync point is set to newest ROWID for which all updates have been sent.

* This is a slight lie. It actually only needs to send the two-thirds of updates it isn't primarily responsible for since it knows it already sent the other third.
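A sketch of that bookkeeping, assuming three replicas and assuming this node's share of the newest rows is chosen by ROWID modulo its position among the container's nodes (the real selection and the footnote's two-thirds optimization are omitted for brevity):

# Illustrative single sync run for one container replica.
# rows: ascending list of (rowid, record) from the local container database.
# node_index: this node's position among the container's replicas (0, 1, or 2).
def sync_run(rows, sync_point1, sync_point2, node_index, replicas=3):
    for rowid, record in rows:
        if rowid <= sync_point2:
            continue                                  # already covered by an "all updates" pass
        if rowid <= sync_point1:
            send(record)                              # "all updates" row, sent for assurance
        elif rowid % replicas == node_index:
            send(record)                              # this node's share of the newest rows
    newest = rows[-1][0] if rows else sync_point1
    # New first point: newest ROWID known. New second point: newest ROWID for
    # which all updates have now been sent (the old first point).
    return newest, sync_point1

def send(record):
    pass  # placeholder for the PUT or DELETE against the remote container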

Page 12: Swift container sync

In The Weeds

An example may help. Assume replica count is 3 and perfectly matching ROWIDs starting at 1.

First sync run, database has 6 rows:

• SyncPoint1 starts as -1.

• SyncPoint2 starts as -1.

• No rows between points, so no "all updates" rows.

• Six rows newer than SyncPoint1, so a third of the rows are sent by node 1, another third by node 2, remaining third by node 3.

• SyncPoint1 is set as 6 (the newest ROWID known).

• SyncPoint2 is left as -1 since no "all updates" rows were synced.

Page 13: Swift container sync

In The Weeds

Next sync run, database has 12 rows:

• SyncPoint1 starts as 6.

• SyncPoint2 starts as -1.

• The rows between -1 and 6 all trigger updates (most of which should short-circuit on the remote end as having already been done).

• Six more rows newer than SyncPoint1, so a third of the rows are sent by node 1, another third by node 2, remaining third by node 3.

• SyncPoint1 is set as 12 (the newest ROWID known).

• SyncPoint2 is set as 6 (the newest "all updates" ROWID).

In this way, under normal circumstances each node sends its share of updates each run and just sends a batch of older updates to ensure nothing was missed.
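As a quick check, a tiny standalone trace (illustrative only) reproduces the numbers above for one node: 6 rows on the first run, 12 on the second, with the node taking every third new row.

# Reproduce the sync point movement from the example for node index 0 of 3.
def run(rowids, sp1, sp2, node_index, replicas=3):
    sent = [r for r in rowids
            if sp2 < r <= sp1                                   # "all updates" rows
            or (r > sp1 and r % replicas == node_index)]        # this node's third
    return max(rowids), sp1, sent                               # new sp1, new sp2, rows sent

sp1, sp2 = -1, -1
sp1, sp2, sent = run(list(range(1, 7)), sp1, sp2, 0)    # first run:  sp1 -> 6,  sp2 -> -1
sp1, sp2, sent = run(list(range(1, 13)), sp1, sp2, 0)   # second run: sp1 -> 12, sp2 -> 6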

Page 14: Swift container sync

Extras

• swift-container-sync can be configured to only spend x amount of time trying to sync a given container -- avoids one crazy container starving out all others.

• A crash of a container server means lost container database copies that will be replaced by one of the remaining copies on the other servers. The reestablished server will get the sync points from the copy, but no updates will be lost due to the "all updates" algorithm the other two followed.

• Rebalancing the container ring moves container database copies around, but results in the same behavior as a crashed server would.

• For bidirectional sync setups, the receiver will send the sender back the updates (though they will short-circuit). The only way I can think of to prevent that is to track where updates were received from (X-Loop), but that's expensive.

Anything Else?

[email protected]

http://tlohg.com/