Scalability: RDBMS vs. Other Data Stores
Scalability
• Increasing resources should increase performance (ideally linearly)
• Performance?
– Latency, Capacity, Throughput
• Vertical Scalability (Scaling Up)
– Divide the functionality
• Horizontal Scalability (Scaling Out)
– Divide the data
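A minimal sketch of the two ways of dividing work, as the slide frames them. The node names and the hash-modulo placement rule are assumptions made for illustration; they are not from the slides.

```python
import hashlib

# Hypothetical node names, for illustration only.
FUNCTION_NODES = {"orders": "orders-db", "catalog": "catalog-db"}
DATA_SHARDS = ["shard-0", "shard-1", "shard-2"]


def node_for_function(function_name: str) -> str:
    """Divide the functionality: each functional area gets its own node."""
    return FUNCTION_NODES[function_name]


def shard_for_key(key: str) -> str:
    """Divide the data: a stable hash of the key picks the shard."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return DATA_SHARDS[int(digest, 16) % len(DATA_SHARDS)]
```

Plain modulo placement moves most keys when the shard count changes, which is one reason a store might prefer consistent hashing.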
Relational Database
• Table, Row, Column
• Set, Item, Property
Relational Theory
• Projection: SELECT
• Selection (filter): WHERE
• Join: JOIN, LEFT JOIN, RIGHT JOIN
• Correlation:
SELECT a FROM A WHERE A.b IN (SELECT b FROM B WHERE B.a > A.a)
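The correlated query can be tried with Python's built-in sqlite3 module. This is a sketch: the table contents are invented sample data, and the column references are fully qualified as `B.a > A.a`, which is an assumption about the intended meaning.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (a INTEGER, b INTEGER);
    CREATE TABLE B (a INTEGER, b INTEGER);
    INSERT INTO A VALUES (1, 10), (5, 20), (9, 30);
    INSERT INTO B VALUES (3, 10), (4, 20), (2, 30);
""")

# The subquery refers to A.a, so it is re-evaluated for each row of A.
rows = conn.execute("""
    SELECT a FROM A
    WHERE A.b IN (SELECT b FROM B WHERE B.a > A.a)
    ORDER BY a
""").fetchall()
result = [row[0] for row in rows]
```

Only the row with A.a = 1 qualifies here, because no row of B has a value of a greater than 5 or 9.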
Relational Theory
• Aggregation
– Set Operators: Union, Intersection, Minus
– Group By: MAX, MIN, SUM, AVG
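A short sketch of the set operators and GROUP BY aggregates using sqlite3. The tables and rows are invented sample data. SQLite spells the Minus operator as EXCEPT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE online_orders (customer TEXT, amount INTEGER);
    CREATE TABLE store_orders (customer TEXT, amount INTEGER);
    INSERT INTO online_orders VALUES ('ann', 10), ('bob', 20), ('ann', 30);
    INSERT INTO store_orders VALUES ('bob', 5), ('cid', 15);
""")


def customers(set_operator):
    """Combine the customer columns of both tables with one set operator."""
    sql = (f"SELECT customer FROM online_orders {set_operator} "
           "SELECT customer FROM store_orders ORDER BY customer")
    return [row[0] for row in conn.execute(sql)]


union = customers("UNION")
intersection = customers("INTERSECT")
minus = customers("EXCEPT")

# One output row per customer, with the four aggregates from the slide.
totals = conn.execute("""
    SELECT customer, MAX(amount), MIN(amount), SUM(amount), AVG(amount)
    FROM online_orders GROUP BY customer ORDER BY customer
""").fetchall()
```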
Transactions: Atomicity
• Transaction Level
– An entire logical operation is a transaction
– Multiple statements
• Statement Level
– Each statement either succeeds or fails; no partial success
– Multiple records
• Record Level
– All modifications to a record succeed, or none do
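Transaction-level atomicity can be sketched with sqlite3: two statements form one logical transfer, and a failure in the second undoes the first. The account table and the CHECK constraint are assumptions chosen to force a failure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('a', 100), ('b', 0)")
conn.commit()


def transfer(amount):
    """Move amount from 'a' to 'b'; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = 'b'",
                (amount,))
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = 'a'",
                (amount,))
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(150)  # the debit would make 'a' negative, so it fails
balances = dict(conn.execute("SELECT id, balance FROM account"))
```

After the failed transfer the credit to 'b' is gone as well, which is the transaction-level guarantee.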
Transactions: Consistency
• Integrity Constraints
• Referential Integrity
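A sketch of referential integrity in sqlite3, assuming a hypothetical customer/orders schema. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        oid INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id)
    );
    INSERT INTO customer VALUES (1, 'ann');
""")


def try_insert_order(oid, customer_id):
    """Return True if the order was stored, False if a constraint refused it."""
    try:
        with conn:
            conn.execute("INSERT INTO orders VALUES (?, ?)", (oid, customer_id))
        return True
    except sqlite3.IntegrityError:
        return False
```

An order for customer 1 is accepted; an order for a customer id that does not exist is rejected, so the database never holds a dangling reference.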
Transactions: Isolation Levels
• Serializable
– Some definite serial order of the transactions leads from state A to state B
• Repeatable Read
– Any data read by a transaction stays the same until the transaction completes
• Non-Repeatable Read, aka Read Committed
– Two reads within a transaction may give different results
• Dirty Read, aka Read Uncommitted
– A transaction might read data that is later rolled back
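The difference between the read behaviours can be illustrated with a toy in-memory store. This is only a model with a single writer and hypothetical method names; real engines use locks or multi-versioning.

```python
class ToyStore:
    """Toy single-writer store; not how a real engine implements isolation."""

    def __init__(self, data):
        self.committed = dict(data)
        self.pending = {}  # uncommitted writes of the one in-flight writer

    def write(self, key, value):
        self.pending[key] = value

    def commit(self):
        self.committed.update(self.pending)
        self.pending.clear()

    def rollback(self):
        self.pending.clear()

    def read_uncommitted(self, key):
        # Dirty read: sees the writer's pending value if there is one.
        if key in self.pending:
            return self.pending[key]
        return self.committed[key]

    def read_committed(self, key):
        return self.committed[key]

    def snapshot(self):
        # Repeatable read: the reader keeps the copy taken when it started.
        return dict(self.committed)


store = ToyStore({"x": 1})
snapshot = store.snapshot()
store.write("x", 2)
dirty = store.read_uncommitted("x")  # sees a value that is about to vanish
store.rollback()
before = store.read_committed("x")
store.write("x", 3)
store.commit()
after = store.read_committed("x")    # differs from `before`: non-repeatable
repeatable = snapshot["x"]           # unchanged for the snapshot reader
```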
RDBMS Luxuries
• Multiple Indexes
• Auto Increments/Sequences
• Triggers
Scalability in RDBMS
• Replication
– Read Replication (Master-Slave)
– Read Write Replication (Master-Master)
• Cluster
– Distributed Transaction
– Two-phase commits
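A minimal sketch of a two-phase commit coordinator. The `Participant` class and its methods are hypothetical, and the sketch ignores timeouts, crashes, and recovery, which are the hard parts in practice.

```python
class Participant:
    """Hypothetical resource manager taking part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Phase 1 vote: promise to commit, or refuse.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit only on a unanimous yes, otherwise roll everyone back.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False
```

Every participant is contacted twice and must wait for the slowest vote, which hints at why this approach limits scalability.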
Scalability Impediments
• Performance
– Sub-queries/Correlation, Joins, Aggregates
– Referential Integrity constraints
• Basic Guarantee
– Consistency
– Availability
CAP?
• Conjecture: Distributed systems cannot ensure all three of the following properties at once
– Consistency: The client perceives that a set of operations has occurred all at once.
– Availability: Every operation must terminate in an intended response.
– Partition tolerance: Operations will complete, even if individual components are unavailable.
ACID to BASE
• Basically Available - system seems to work all the time
• Soft State - it doesn't have to be consistent all the time
• Eventually Consistent - becomes consistent at some later time
BASE: An Example
BEGIN Transaction
  INSERT INTO ORDER (oid, timestamp, customer)
  FOREACH item IN itemList
    INSERT INTO ORDER_ITEM (oid, item.id, item.quantity, item.unitprice)
    // UPDATE INVENTORY SET quantity = quantity - item.quantity WHERE item = item.id
  COMMIT
END Transaction
Assume each statement is queued for execution: you will still get COMMIT success.
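One way to picture the queued behaviour is the sketch below. The order is recorded and acknowledged at once, while the inventory update waits in a queue. The in-memory structures and function names are assumptions made for illustration.

```python
from collections import deque

inventory = {"widget": 10}
orders = []
pending = deque()  # queued inventory updates, applied later


def place_order(oid, items):
    """Record the order and return immediately; inventory catches up later."""
    orders.append((oid, dict(items)))
    for item_id, quantity in items.items():
        pending.append((item_id, quantity))
    return "COMMIT success"


def drain_queue():
    """Background worker step: apply every queued inventory update."""
    while pending:
        item_id, quantity = pending.popleft()
        inventory[item_id] -= quantity
```

Between `place_order` and `drain_queue` the inventory count is stale (soft state); once the queue is drained the two views agree (eventual consistency).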
Alternate Implementations
• BigTable – Google – CP
• HBase – Apache – CP
• Hypertable – Community – CP
• Dynamo – Amazon – AP
• SimpleDB – Amazon – AP
• Voldemort – LinkedIn – AP
• Cassandra – Facebook – AP
• MemcacheDB – Community – CP/AP
Data Models
• Key/Value Pairs
– Dynamo, MemcacheDB, Voldemort
• Row-Column
– BigTable, Cassandra, SimpleDB, Hypertable, HBase
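The two data models can be contrasted with plain Python dictionaries. This is a rough sketch: the keys, column names, and values are invented, and real stores add versioning, persistence, and distribution.

```python
# Key/value model: the store sees only an opaque value per key.
kv_store = {}
kv_store["user:42"] = b'{"name": "ann", "city": "Pune"}'

# Row-column model: each row key maps to named columns that can be
# read and written individually.
row_store = {}
row_store.setdefault("user:42", {})["info:name"] = "ann"
row_store.setdefault("user:42", {})["info:city"] = "Pune"


def get_column(row_key, column):
    """Fetch one column of one row, or None if either is missing."""
    return row_store.get(row_key, {}).get(column)
```

With the key/value model the application must fetch and decode the whole value to read one field; the row-column model lets it address a single column.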
Programming Models
// Open the table
Table *T = OpenOrDie("/bigtable/web/webtable");
// Write a new anchor and delete an old anchor
RowMutation r1(T, "com.cnn.www");
r1.Set("anchor:www.c-span.org", "CNN");
r1.Delete("anchor:www.abc.com");
Operation op;
Apply(&op, &r1);
BigTable: Consistent yet Infinitely Scalable
• Single Master
• B+ tree based data distribution
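Range-based lookup, the idea behind a B+ tree style index, can be sketched with a sorted list of split keys and a binary search. The split keys and server names are hypothetical, and the sketch shows a single index level only.

```python
import bisect

# Hypothetical tablets: each covers row keys up to and including its end key.
TABLET_END_KEYS = ["com.cnn.www", "com.google.www", "org.apache.www"]
# One server per tablet, plus one for keys after the last end key.
TABLET_SERVERS = ["server-a", "server-b", "server-c", "server-d"]


def server_for_row(row_key):
    """Binary-search the sorted end keys to find the tablet owning row_key."""
    index = bisect.bisect_left(TABLET_END_KEYS, row_key)
    return TABLET_SERVERS[index]
```

Because rows are kept in sorted key order, neighbouring keys such as pages of the same domain tend to land on the same tablet.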
BigTable: Transactions
• Entities and Entity Groups
– Diagram labels: Invoice, Invoice Item, Delivery Note
Dynamo: Highly available and Infinitely Scalable
• Consistent Hashing
• Peer to Peer Distributed
• Gossip based member discovery
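A minimal consistent-hashing ring, sketched under assumptions: SHA-256 as the hash function, 100 virtual positions per node, and hypothetical class and method names. It is a generic illustration, not Dynamo's actual implementation.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes, virtual_nodes=100):
        self.virtual_nodes = virtual_nodes
        self._positions = []  # sorted ring positions
        self._owner = {}      # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each node owns many positions so that load spreads evenly.
        for i in range(self.virtual_nodes):
            position = _hash(f"{node}#{i}")
            self._owner[position] = node
            bisect.insort(self._positions, position)

    def remove(self, node):
        self._positions = [p for p in self._positions if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def node_for(self, key):
        # Walk clockwise to the first position after the key's hash,
        # wrapping around at the end of the ring.
        index = bisect.bisect_right(self._positions, _hash(key))
        return self._owner[self._positions[index % len(self._positions)]]
```

When a node leaves, only the keys it owned move to other nodes; every other key keeps its owner.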
RDBMS or Other?
• Nature of Business
• Maturity of the Product
• Cost of Adoption
• Maturity of the alternative Datastores
Q&A