Database Scalability, Elasticity, and Autonomy in the Cloud
Agrawal et al.
Oct 24, 2011
Framing
• Survey paper
• Identifies necessary qualities of cloud storage
  – Scalability
  – Sensible consistency / programming model
  – Scale-down and migration
  – Autonomic management
• Pointers to different work in the space
Scalability
• Add more resources, get more performance
  – Handle more requests per second
  – Store more data
• Achievable with scale-up or scale-out
  – Scale-out is the only paradigm for the cloud
• App’s speedup from parallelism is limited by Amdahl’s Law
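To make the Amdahl's Law bullet concrete: if a fraction p of the work parallelizes, the speedup on n nodes is at most 1 / ((1 - p) + p / n). The sketch below is only an illustration; the function name and the 95% parallel fraction are assumptions, not values from the paper.

```python
def amdahl_speedup(parallel_fraction: float, nodes: int) -> float:
    """Upper bound on speedup when only part of the work parallelizes."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / nodes)


if __name__ == "__main__":
    # Assumed workload: 95% parallel. The serial 5% caps the speedup below 20x.
    for n in (1, 10, 100, 1000):
        print(n, round(amdahl_speedup(0.95, n), 2))
```

With a 5% serial portion, adding nodes never pushes the speedup past 20x, which is why scale-out alone does not guarantee more performance.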
Finding the right design point
• What’s the right consistency / programming model?
• Pure key-value stores are too weak
  – Only have transactions on single records
• Traditional RDBMSs are too strong
  – Can’t just run MySQL at scale
• Instead, provide strong consistency within a portion of the data
  – Megastore
  – Vertica, Aster, Teradata, Greenplum, …
Data Fusion vs. Data Fission
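[Figure: consistency spectrum from weak to strong. Dynamo, BigTable, and PNUTS sit at the weak end; MySQL sits at the strong end. Fusion (Megastore, G-Store) starts from the weak end; Fission (Azure, ElasTraS, Relational Cloud) starts from the strong end.]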
Data Fusion
• Start with a key-value store
• Partition records into groups
• Provide multi-record updates within a group
• Cross-group operations handled separately
• Assumes that cross-group ops are rare
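The fusion idea above can be sketched as a toy in-memory key-value store that allows an atomic multi-record update only when every key belongs to the same group. This is a minimal sketch, not Megastore's or G-Store's design; the class name `GroupedKVStore`, its methods, and the one-lock-per-group scheme are all assumptions made for illustration.

```python
import threading


class GroupedKVStore:
    """Toy key-value store with atomic multi-key updates inside one group."""

    def __init__(self):
        self._data = {}      # key -> value
        self._group_of = {}  # key -> group id
        self._locks = {}     # group id -> lock guarding that group's records

    def put(self, key, value, group):
        """Single-record write; also assigns the key to a group."""
        self._group_of[key] = group
        lock = self._locks.setdefault(group, threading.Lock())
        with lock:
            self._data[key] = value

    def get(self, key):
        return self._data[key]

    def update_group(self, updates):
        """Apply all updates under one lock, if every key is in the same group."""
        groups = {self._group_of[key] for key in updates}
        if len(groups) != 1:
            # Cross-group operations are assumed rare and handled elsewhere.
            raise ValueError("update spans several groups")
        (group,) = groups
        with self._locks[group]:
            self._data.update(updates)
```

For example, after putting `alice:inbox` and `alice:sent` into group `alice`, `update_group({"alice:inbox": 0, "alice:sent": 1})` changes both records together, while an update that also touches a key in group `bob` is rejected.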
Data Fission
• Start with a relational database
• Partition tables into shards
• Provide ACID within each shard
• Cross-shard ops are expensive
• Assumes that cross-shard ops are rare
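The fission approach can be sketched the other way around: rows are routed to a shard by hashing the primary key, and a transaction commits only if all of its keys land on one shard. This is a simplified illustration, not how ElasTraS, Azure, or Relational Cloud are implemented; `ShardedTable`, the CRC32 routing, and the copy-then-swap commit are assumptions chosen to keep the example short.

```python
import copy
import zlib


class ShardedTable:
    """Toy table split into shards by primary key, atomic within one shard."""

    def __init__(self, num_shards):
        self.shards = [dict() for _ in range(num_shards)]

    def shard_for(self, primary_key: str) -> int:
        # Deterministic hash routing (assumed scheme for this sketch).
        return zlib.crc32(primary_key.encode("utf-8")) % len(self.shards)

    def insert(self, primary_key, row):
        self.shards[self.shard_for(primary_key)][primary_key] = row

    def transact(self, primary_keys, fn):
        """Run fn on a private copy of the shard and swap it in on success."""
        shard_ids = {self.shard_for(key) for key in primary_keys}
        if len(shard_ids) != 1:
            # A real system would need a distributed commit protocol here.
            raise NotImplementedError("cross-shard transaction")
        (shard_id,) = shard_ids
        working = copy.deepcopy(self.shards[shard_id])
        fn(working)  # if fn raises, the original shard is left untouched
        self.shards[shard_id] = working
```

Copying the whole shard per transaction is far too slow for real use; it is here only to show all-or-nothing behaviour inside a shard and the hard boundary between shards.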
What’s the difference?
• Is Fusion vs. Fission a worthwhile distinction?
• Seems like they both arrive at the same place
• Megastore “Fusion” vs. ElasTraS “Fission”
  – Shard tables based on a table’s primary key
  – Shard is co-located on the same machine
  – ACID transactions within a shard
  – Primary and secondary indexes
  – All Megastore is missing is an SQL interface!
The difference
• Different targeted users
  – Fusion is for people who own datacenters
  – Fission is for people who want SQL in the cloud
• Different exposed API
  – Fusion is more explicit about performance
  – Fission tries to hide partitioning from user
• Anything else?
Elasticity
• Dynamically scaling up and down on-demand
• Important with pay-as-you-go cloud pricing
• Consolidate to reduce costs
• Expand to increase performance
• Need to move state and processing duties around within the system
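The consolidate/expand decision above can be illustrated with a small capacity calculation. This is a hedged sketch, not a policy from the paper: the function names, the per-node capacity figure, and the 75% target utilization are all assumed for the example.

```python
import math


def nodes_needed(requests_per_sec, per_node_capacity, target_utilization=0.75):
    """Smallest node count that keeps each node at or below the target load."""
    if per_node_capacity <= 0 or not 0 < target_utilization <= 1:
        raise ValueError("capacity must be positive and utilization in (0, 1]")
    usable = per_node_capacity * target_utilization
    return max(1, math.ceil(requests_per_sec / usable))


def scaling_action(current_nodes, requests_per_sec, per_node_capacity,
                   target_utilization=0.75):
    """Return ('expand' | 'consolidate' | 'hold', number of nodes to change)."""
    wanted = nodes_needed(requests_per_sec, per_node_capacity, target_utilization)
    if wanted > current_nodes:
        return ("expand", wanted - current_nodes)
    if wanted < current_nodes:
        return ("consolidate", current_nodes - wanted)
    return ("hold", 0)
```

Either action implies moving state between nodes, which is why elasticity depends on cheap migration.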
Live migration of databases
• Shared-disk
  – “Global disk” shared by all DB nodes
  – Just need to copy in-memory state
  – Iterative copy: sync up cached pages + transaction state to minimize the availability hit
• Shared-nothing
  – Each DB node is its own separate DB instance
  – Need to copy both local disk state and memory
  – Push/pull: gradually shift new requests to the new node, sync state in the background
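The iterative-copy idea for the shared-disk case can be simulated in a few lines: copy every cached page once, then keep re-copying only the pages dirtied in the meantime, and stop serving only for the small final round. This is a toy simulation under assumed names (`iterative_copy`, `writes_per_round`, `stop_threshold`); it omits transaction state and is not the exact protocol of any system in the survey.

```python
def iterative_copy(source_cache, writes_per_round, max_rounds=5, stop_threshold=1):
    """Copy cached pages to a new node while the source keeps taking writes.

    source_cache: dict of page id -> contents (mutated to simulate writes).
    writes_per_round: list of dicts; entry i holds the writes that reach the
        source while copy round i is in flight (a simulated workload).
    Returns (destination cache, rounds used, pages copied during handover).
    """
    dest = {}
    dirty = set(source_cache)  # the first round copies every cached page
    rounds = 0
    while len(dirty) > stop_threshold and rounds < max_rounds:
        for page in dirty:
            dest[page] = source_cache[page]
        # Writes that arrived during this round make those pages dirty again.
        incoming = writes_per_round[rounds] if rounds < len(writes_per_round) else {}
        source_cache.update(incoming)
        dirty = set(incoming)
        rounds += 1
    # Handover: the source stops serving and the last dirty pages are copied.
    handover_pages = len(dirty)
    for page in dirty:
        dest[page] = source_cache[page]
    return dest, rounds, handover_pages
```

The availability hit is proportional to `handover_pages`, so the loop trades extra background copying for a shorter pause at the end.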
Database Autonomy
• Need management to be more automatic
• Elasticity and load balancing based on usage and ML predictions
• Performance modeling
  – Migration costs (availability, performance, $$$)
  – Resource isolation (consolidated services)
  – SLAs
Questions?
Tree schema
• Primary table’s primary key used for sharding
• Secondary tables are sharded into row groups
  – Row groups are co-located and transactional
• Global tables are write-rarely, and replicated on all nodes
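The placement rules of a tree schema can be sketched as follows: every row, whether from the primary table or a secondary table, is placed by the primary table's key, so a row group ends up on one node, while global tables are copied everywhere. The class `TreeSchemaCluster`, its methods, and the CRC32 placement are hypothetical names and choices for this illustration only.

```python
import zlib


class TreeSchemaCluster:
    """Toy cluster that places rows by the primary (root) table's key."""

    def __init__(self, num_nodes):
        self.nodes = [{"rows": [], "globals": {}} for _ in range(num_nodes)]

    def node_for(self, root_key: str) -> int:
        # Assumed placement scheme: hash of the root key.
        return zlib.crc32(root_key.encode("utf-8")) % len(self.nodes)

    def insert(self, table, root_key, row):
        """Primary and secondary rows with the same root key share a node."""
        node = self.nodes[self.node_for(root_key)]
        node["rows"].append((table, root_key, row))

    def set_global(self, name, value):
        """Global tables are written rarely and replicated on every node."""
        for node in self.nodes:
            node["globals"][name] = value

    def row_group(self, root_key):
        """All rows for one root key, read from a single node."""
        node = self.nodes[self.node_for(root_key)]
        return [(table, row) for (table, key, row) in node["rows"] if key == root_key]
```

Because a customer row and its order rows share a root key, a transaction over that row group never leaves one node.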