this lecture was composed from fragments of two lectures ... · this lecture was composed from...

54
This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch [email protected] Consistency, Concurrency, ACID, and CAP Tuesday, November 05, 2013 4:26 PM Consistency_and_Concurrency Page 1

Upload: others

Post on 01-Oct-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing.

Alva [email protected]

Consistency, Concurrency, ACID, and CAPTuesday, November 05, 20134:26 PM

Consistency_and_Concurrency Page 1

Page 2: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

The most important attribute of a distributed system: how consistency is handled.

A consistent distributed object has the property that if many applications access the object via differing interfaces, all interfaces provide "the same view" of the object.

The definition of consistency:

You purchase an item. You ask what you purchased (perhaps through a different interface or mechanism than the one used to purchase the item). If you always get an accurate depiction of what you purchased -- regardless of interface -- then the purchase system is consistent.

Example: credit card purchases

ConsistencyTuesday, November 05, 201312:13 PM

Consistency_and_Concurrency Page 2

Page 3: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

How inconsistency arisesTuesday, November 05, 201312:18 PM

Consistency_and_Concurrency Page 3

Page 4: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

How inconsistency arises: horizontal scalingTuesday, November 05, 201312:23 PM

Consistency_and_Concurrency Page 4

Page 5: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

How inconsistency arises: enter the cloudTuesday, November 05, 201312:23 PM

Consistency_and_Concurrency Page 5

Page 6: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

When using a "cloud resource", the server you utilize often changes for every transaction you make. Reason: horizontal scalability. (elasticity)Unless servers somehow communicate, you receive an inconsistent view of the world. "The cloud" is one model of server communication. "The cloud" has its own concepts of consistency.

The facts of life for inconsistency

The facts of life for inconsistencyTuesday, November 05, 201312:20 PM

Consistency_and_Concurrency Page 6

Page 7: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

Is not a parallel or distributed computing program. Instances of an application do not communicate with each other. Instances have no local persistent memory.The only communication between instances is in sharing the same cloud!

A cloud application…

A cloud application...Wednesday, January 27, 20109:05 AM

Consistency_and_Concurrency Page 7

Page 8: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

Are regular serial application programs. Can have many concurrent instances running in different locations. All instances do the same thing.All instances have the same view of the world.

"Distributable" applications

Distributable programsWednesday, January 27, 20109:06 AM

Consistency_and_Concurrency Page 8

Page 9: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

Give all instances of an application "the same view" of the world. Are opaque to the application. Are distributed in ways that the application cannot detect.

Distributed objects

Distributed objectsWednesday, January 27, 20109:08 AM

Consistency_and_Concurrency Page 9

Page 10: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

The class determines the kind of application. An instance is one copy of a program that satisfies class requirementsThere can be many concurrent instances of a client (e.g., 1000 cellphone users)

For a client:

The class is the kind of serviceAn instance is one copy of a program that provides the service. There can be many instances of the same service(e.g., geographically distributed)

For a service:

Best to think about cloud clients and services in terms of classes and instances:

Classes and instancesMonday, January 24, 20117:58 PM

Consistency_and_Concurrency Page 10

Page 11: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

You run an app on your cellphone. It connects to a service. It does something (e.g., recording your Lat/Long location) It logs out. What can you assume about this transaction?

The concept of binding

you'll ever get the same server again. or that the server you get will have the same view of the cloud.(unless you know something more about the cloud…!)

You cannot assume that

All data must be stored in the cloud. There is no useful concept of local data.

Caveat: when you write a cloud service

BindingMonday, January 24, 20118:01 PM

Consistency_and_Concurrency Page 11

Page 12: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

A model of cloud executionMonday, January 24, 20117:54 PM

Consistency_and_Concurrency Page 12

Page 13: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

ACID: also known as Structured Query Language (SQL)NoSQL: Not Only SQL: makes a choice of desirable ACID properties, leaves some out.The CAP Theorem: constraints the choices NoSQL can enable.

Three different approaches to storing cloud data

ACID, NoSQL, and CAPTuesday, November 05, 201312:58 PM

Consistency_and_Concurrency Page 13

Page 14: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

Atomicity: requested operations either occur or not, and there is nothing in between "occurring" and "not occurring". Consistency: what you wrote is what you read. Isolation: no other factors other than your actions affect data. Durability: what you wrote remains what you read, even after system failures.

Consistent datastores should exhibit what is commonly called ACID:

Why isn't everything ACID? Why do non-ACID systems exist? This is a deep question....

ACIDWednesday, January 27, 201011:09 AM

Consistency_and_Concurrency Page 14

Page 15: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

Two visible properties of a distributed object: consistency and concurrencyConsistency: the extent to which "what you write" is "what you read" back, afterward. Concurrency: what happens when two instances of your application try to do conflicting things at the same exact time?

Consistency and concurrency

Consistency and concurrencyWednesday, January 27, 20109:19 AM

Consistency_and_Concurrency Page 15

Page 16: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

The extent to which "what you write" into a distributed object is "what you read" later.

Strong consistency: if you write something into a distributed object, you always read what you wrote, even immediately after the write. Weak (eventual) consistency: if you write something into a distributed object, then eventually -- after some time passes -- you will be able to read it back. Immediate queries may return stale data.

Two kinds:

Consistency

ConsistencyWednesday, January 27, 20109:20 AM

Consistency_and_Concurrency Page 16

Page 17: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

Strong consistency is like writing to a regular file or database: what you write is always what you get back. Eventual consistency is like writing something on pieces of paper and mailing them to many other people. What you get back depends upon which person you talk to and when you ask. Eventually,they'll all know about the change.

A file analogy

A file analogyWednesday, January 27, 20109:36 AM

Consistency_and_Concurrency Page 17

Page 18: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

How conflicting concurrent operations are handled.

Strong concurrency: if two or more conflicting operations are requested at the same time, they are serialized and done in arrival order, and both are treated as succeeding. Thus the last request determines the outcome. Weak ("opportunistic") concurrency: if two conflicting operations are requested at the same time, the first succeeds and the second fails. Thus the first request determines the outcome.

Two kinds:

Concurrency

ConcurrencyWednesday, January 27, 20109:23 AM

Consistency_and_Concurrency Page 18

Page 19: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

On linux, file writes exhibit strong concurrency, in the sense that conflicting writes all occur and the last one wins. Likewise, in a database, a stream of conflicting operations are serialized and all occur -- the last one determines the outcome. Opportunistic concurrency only occurs when there is some form of data locking, e.g., in a database transaction block.

A file analogy

A file analogyWednesday, January 27, 20109:41 AM

Consistency_and_Concurrency Page 19

Page 20: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

Obviously, we want both strong consistency and strong concurrencyBut we can't have both at the same time!

Consistency/Concurrency tradeoffs

Strong consistency and concurrencyWednesday, January 27, 20109:54 AM

Consistency_and_Concurrency Page 20

Page 21: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

Some form of read blocking until a consistent state is achieved, Which implies a (relatively) slow read time before unblocking.Which means we can't have strong concurrency!

Strong consistency requires

Strong consistency requirementsWednesday, January 27, 201010:01 AM

Consistency_and_Concurrency Page 21

Page 22: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

Some form of write sequencing. A (relatively) fast write time, with little blocking. Which means writes need time to propogate.Which means we can't have strong consistency!

Strong concurrency requires

Strong concurrency requirementsWednesday, January 27, 201010:01 AM

Consistency_and_Concurrency Page 22

Page 23: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

Provides strong consistencyAt the expense of opportunistic concurrency.

Google's "appEngine"

Provides strong concurrency. At the expense of exhibiting eventual consistency.

Amazon's "dynamo"

Two approaches to PAASWednesday, January 27, 201010:03 AM

Consistency_and_Concurrency Page 23

Page 24: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

Strong consistency: what you write is always what you read, even if you read at a (geographically) different place!Opportunistic concurrency: updates can fail; application is responsible for repeating failed update operations. Updates should be contained in "try" blocks!

AppEngine properties

AppEngineWednesday, January 27, 201010:18 AM

Consistency_and_Concurrency Page 24

Page 25: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

A distributed object retrieved from the persistence manager remains "attached" to the cloud. If you "set" something in a persistent object, this implicitly modifies the cloud version and every copy in other concurrently running application instances!This is what strong consistency means! So if some concurrent application instance sets something else in an object instance you fetched,your object will reflect that change (via strong consistency). So mostly, you observe what appears to be strong concurrency.

Modifying distributed objects in AppEngine

Modifying distributed objects in AppEngineWednesday, January 27, 201011:23 AM

Consistency_and_Concurrency Page 25

Page 26: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

How is this actually done? It's actually smoke and mirrors!

The illusion of strong consistency

Every object is "dirty" if changed, and "clean" if not.

A class of objects is dirty if any instance is. An instance is dirty if its data isn't completely propogated.

Very fast mechanisms for propogating "dirty" information (e.g., a bit array).

Actually propogate the data, then relabel the thing as clean.

Relatively slow mechanisms for changing something from "dirty" to "clean".

Applications get "dirty" info immediately, and then wait until the data is clean before proceeding!

Creating strong consistency

The illusion of strong consistencyMonday, January 24, 20118:09 PM

Consistency_and_Concurrency Page 26

Page 27: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

import com.google.appengine.api.datastore.Key;

import java.util.Date;import javax.jdo.annotations.IdGeneratorStrategy;import javax.jdo.annotations.IdentityType;import javax.jdo.annotations.PersistenceCapable;import javax.jdo.annotations.Persistent;import javax.jdo.annotations.PrimaryKey;

@PersistenceCapable(identityType = IdentityType.APPLICATION)public class Employee {

@PrimaryKey@Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)private Key key;

@Persistentprivate String firstName;

@Persistentprivate String lastName;

@Persistentprivate Date hireDate;

public Employee(String firstName, String lastName, Date hireDate) {this.firstName = firstName;this.lastName = lastName;this.hireDate = hireDate;

}

// Accessors for the fields. JDO doesn't use these, but your application does.

An example objectWednesday, January 27, 201011:28 AM

Consistency_and_Concurrency Page 27

Page 28: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

public Key getKey() {return key;

}

public String getFirstName() {return firstName;

} public void setFirstName(String firstName) {

this.firstName = firstName;}

public String getLastName() {return lastName;

} public void setLastName(String lastName) {

this.lastName = lastName;}

public Date getHireDate() {return hireDate;

} public void setHireDate(Date hireDate) {

this.hireDate = hireDate;}

}

Pasted from <http://code.google.com/appengine/docs/java/datastore/dataclasses.html>

Consistency_and_Concurrency Page 28

Page 29: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

A hard thing to understandWednesday, January 27, 20101:57 PM

Consistency_and_Concurrency Page 29

Page 30: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

Changes to a persistent object occur when requested. Access functions must be used; persistent data mustbe private; the Persistence Manager factory adds management code to make this happen! Changes are reflected everywhere the object is being referenced!

AppEngine persistent object caveats

Persistent object caveatsWednesday, January 27, 20102:38 PM

Consistency_and_Concurrency Page 30

Page 31: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

When you change something in an instance of a persistent object, it is changed in every other imageof that instance, including inside other instances of your application!

What does strong consistency mean?

But this is a polite illusion; in reality, other instances of your application wait for data they need to arrive!

What does strong consistency mean?Wednesday, January 27, 20102:02 PM

Consistency_and_Concurrency Page 31

Page 32: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

If you are just modifying one object in straightforward ways, one-attribute-at-a-time, you might think there is strong conurrency. But problems arise when you're trying to update an object, e.g., from itself. The solution to these problems -- and not the problems themselves -- makes concurrency weak!

Is concurrency actually weak?

Is concurrency actually weak?Wednesday, January 27, 20102:07 PM

Consistency_and_Concurrency Page 32

Page 33: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

Consider the code:

Key k = KeyFactory.createKey(Employee.class.getSimpleName(), "[email protected]");// assume existence of persistent getSalary and setSalary methodsEmployee e = pm.getObjectById(Employee.class, k);e.setSalary(e.getSalary()+100); // Give Alfred a raise!

Consider what happens when two application instancesinvoke this code at nearly the same time.

Updating an entity from its own valuesWednesday, January 27, 20102:08 PM

Consistency_and_Concurrency Page 33

Page 34: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

e.setSalary(e.getSalary()+100)The code

tmp = e.getSalary()tmp = tmp + 100e.setSalary(tmp)

is the same thing as (and is implemented as!)

The problem of concurrent updatesWednesday, January 27, 20102:11 PM

Consistency_and_Concurrency Page 34

Page 35: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

So, we can execute this twice according to the following schedule:

Instance 1 Instance 2

e.getSalary

e.getSalary

e.setSalary

e.setSalary

And Alfred gets a $100 raise rather than a $200 raise :(

So, we can execute this as: Wednesday, January 27, 20102:13 PM

Consistency_and_Concurrency Page 35

Page 36: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

Identify operations that should be done without interruption. Keep other concurrent things from interfering between begin() and commit(). E.g.:

Key k = KeyFactory.createKey(Employee.class.getSimpleName(), "[email protected]");pm.currentTransaction().begin();Employee e = pm.getObjectById(Employee.class, k);e.setSalary(e.getSalary()+100); // Give Alfred a raise!

pm.currentTransaction().commit();try {

// ouch: something prevented the transaction!throw ex; // share the pain!

} catch (JDOCanRetryException ex) {

}

Transactions allow us to avoid that problem:

TransactionsWednesday, January 27, 20102:16 PM

Consistency_and_Concurrency Page 36

Page 37: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

The transaction block (from begin() to commit()) attempts to execute before any other changes can be made to e. If no other changes have been made to e between begin() and commit(), the transaction succeeds. If some change in e has been made meanwhile, the transaction fails, e gets the changed values, the whole transaction is cancelled, and the application has to recover somehow (if it can).

How this works

How this worksWednesday, January 27, 20102:32 PM

Consistency_and_Concurrency Page 37

Page 38: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

the object will only change due to a commit.If two commits interleave, an exception is thrown.

If two applications try to do this, then

So we know we've goofed!

How this behaves

It is the use of transactions that "creates" weak concurrency, but without them, we have chaos!

How this behavesWednesday, January 27, 20102:34 PM

Consistency_and_Concurrency Page 38

Page 39: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

Transaction blocks delimit things that should be done together. If something changes about the object between begin() and commit(), the transaction throws an exception.So, the application knows that its request failed.

Optimistic concurrency

Optimistic concurrencyWednesday, January 27, 20102:23 PM

Consistency_and_Concurrency Page 39

Page 40: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

To modify an object, you must read it. If two object operations are attempted at the same time, it is possible for both to contain stale data with respect to each other. Then the only reasonable choice is for the loser to start over by reading e again! Otherwise data is lost! Thus the only reasonable choice that preserves transactional integrity is opportunistic concurrency!

A subtle point of object consistency

A subtle pointWednesday, January 27, 201010:27 AM

Consistency_and_Concurrency Page 40

Page 41: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

So far, we see that the outcome of concurrency is weak (transaction) consistency. Why does strong concurrency work well in Amazon dynamo? This is really subtle. Dynamo is intended as a data warehouse. Thus it is not a database itself, but rather, a historical record of a database. Thus it is never necessary to invoke a transaction on it!

When is strong concurrency a good choice?

When is strong concurrency a good choice? Wednesday, January 27, 201010:34 AM

Consistency_and_Concurrency Page 41

Page 42: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

Often misunderstood: some more radical authors interpret NoSQL as "Prevent Structured Query Language":)

NoSQL = "Not only Structured Query Language"

Assumes that the majority of queries are simple key fetches that don't require SQL structures. Optimizes simple key fetches for quick response. Often contains a fallback to interpreting full SQL, with performance disadvantages. Or doesn't provide all ACID properties.

In actuality, the NoSQL movement:

The NoSQL controversy

The NoSQL controversyWednesday, March 09, 20111:10 PM

Consistency_and_Concurrency Page 42

Page 43: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

Gets the last value associated with a key (most of the time).

ValueType get(KeyType key)

Assert the binding of a key to a value.Subsequent puts update data.

int put(KeyType key, ValueType value)

Operations are:A typical NoSQL service

A typical NoSQL serviceWednesday, March 09, 20113:12 PM

Consistency_and_Concurrency Page 43

Page 44: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

The most common use-case is retrieval by key.Values are just bags of bits; the user can given them structure as needed.The CAP theorem: one cannot build a distributed database system that exhibits all of strong Consistency, high Availability, and robustness in the presence of Partitioning (loss of messages).

Roots of NoSQL

Roots of NoSQLWednesday, March 09, 20111:13 PM

Consistency_and_Concurrency Page 44

Page 45: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

Consistency: all nodes have the same view of data. (High) Availability: data is instantly available from the system. Partitionability: the system continues to respond if messages are lost (due to system or network failure).

The CAP conjecture (Eric Brewer, 2000): any distributed datastore can only exhibit two of the following three properties

Proved to be true by Gilbert and Lynch (2002).

The CAP Theorem

Very similar to my initial claim about the tension between consistency and concurrency.Main contribution: cloud datastores can be categorized in terms of the two (of three) properties C,A,P that they exhibit. AppEngine is in class CP = Consistency + PartitionabilityOnly other reasonable cloud class is class AP = Availability + PartionabilityWe don't want a cloud datastore that loses data (e.g., ¬P)!

In the cloud context:

The CAP TheoremWednesday, March 09, 20111:15 PM

Consistency_and_Concurrency Page 45

Page 46: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

CAP theorem claim is related to my initial claim about the tension between consistency and concurrency.Main contribution: cloud datastores can be categorized in terms of the two (of three) properties C,A,P that they exhibit. AppEngine JDO is in class CP = Consistency + PartitionabilityOnly other reasonable class is AP = Availability + PartionabilityWe don't want a cloud datastore that loses data! (e.g., ¬P)

In the cloud context:

Impact of the CAP theoremWednesday, March 09, 20111:56 PM

Consistency_and_Concurrency Page 46

Page 47: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

Amazon DynamoFacebook's CassandraLinkedIn's Project Voldemort

The class AP contains datastores like:

Everybody needed one. Google didn't publish theirs. And it was CP, not AP.

Why so many?

The class APWednesday, March 09, 20111:58 PM

Consistency_and_Concurrency Page 47

Page 48: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

Periodically, a Hadoop job is run to identify potential friends. Output is stored to a Voldemort (NoSQL) datastore. A web service accesses the datastore in read-only mode.

How LinkedIn's colleague search actually works

Data changes slowly, so the Hadoop job only has to be done once/day or less. The Voldemort datastore is read-only once the job is done. So we don't need the C in CAP, because there are no consistency issues!So Voldemort, in class AP, suffices.

Some notes:

How LinkedIn's colleague search actually worksWednesday, March 09, 20112:02 PM

Consistency_and_Concurrency Page 48

Page 49: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

The back-end cloud datastore for amazon.com itself. Serves requests for shopping carts, purchases. On an eventually consistent datastore…!How?

What is dynamo?

What is dynamo?Wednesday, March 09, 20112:17 PM

Consistency_and_Concurrency Page 49

Page 50: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

Vector clocks for conflict detection. Business logic for conflict resolution.

Unique features of Dynamo

Unique features of dynamoWednesday, March 09, 20112:27 PM

Consistency_and_Concurrency Page 50

Page 51: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

Every change transaction contains a timestamp and a pointer to the previous version of the object. As transactions flow through the system, they are accumulated in a local history on each node.

Pruned if there are no conflicts. Resolved (in an application-specific manner) if conflicts occur.

Local history is

Vector clocks

Vector clocksWednesday, March 09, 20112:56 PM

Consistency_and_Concurrency Page 51

Page 52: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

Example: shopping cartWednesday, March 09, 20115:03 PM

Consistency_and_Concurrency Page 52

Page 53: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

If object is a shopping cart, contents are merged. If object is a purchase record, apparent duplicate purchases are eliminated.

What happens when a conflict occurs is based upon business rules.

Business-based recovery

Business-based recoveryWednesday, March 09, 20113:05 PM

Consistency_and_Concurrency Page 53

Page 54: This lecture was composed from fragments of two lectures ... · This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing. Alva Couch

I told you previously that one would not want to store business data in an eventually consistent store.

embedding business logic. using vector clock updates.for each kind of object, separately.

Amazon "gets away with" doing that, by

This is a complex game, which is why Dynamo isn't available to their customers.

Why Amazon "gets away with" (eventually consistent) Dynamo

Why?Wednesday, March 09, 20113:16 PM

Consistency_and_Concurrency Page 54