Scalable Databases - From Relational Databases To Polyglot Persistence

Sergio Bossa – Javaday IV – Rome – 30 January 2010
SCALABLE DATABASES: From Relational Databases To Polyglot Persistence
http://twitter.com/sbtourist

Upload: sergio-bossa

Post on 10-May-2015


DESCRIPTION

In a world where everyone is connected and everyone's data is on the web, scaling your database is no longer a choice: it is a necessity. In this talk we'll see how to make relational and non-relational databases scale to our needs by understanding and applying old and new patterns. Then we'll look at the most common use cases and how to address them by choosing the right patterns and tools.

TRANSCRIPT

Page 1: Scalable Databases - From Relational Databases To Polyglot Persistence

Sergio Bossa – Javaday IV – Rome – 30 January 2010

SCALABLE DATABASES
From Relational Databases To Polyglot Persistence

Sergio Bossa
http://twitter.com/sbtourist

Page 2: Scalable Databases - From Relational Databases To Polyglot Persistence

About Me
● Software architect and engineer
  ● Gioco Digitale (online gambling and casinos)
● Open Source enthusiast
  ● Terracotta Messaging (http://forge.terracotta.org)
  ● Terrastore (http://code.google.com/p/terrastore)
  ● Actorom (http://code.google.com/p/actorom)
● (Micro-)Blogger
  ● http://twitter.com/sbtourist
  ● http://sbtourist.blogspot.com

Page 3: Scalable Databases - From Relational Databases To Polyglot Persistence

Five fallacies of data-centric systems

Data model is static.
Data volume is predictable.
Data access load is predictable.
Database topology doesn't change.
Database never fails.

Page 4: Scalable Databases - From Relational Databases To Polyglot Persistence

Scalable databases in action

● Scaling your database as a way to address the fallacies above.
  ● Scale to handle heterogeneous data.
  ● Scale to handle more data.
  ● Scale to handle more load.
  ● Scale to handle topology changes due to:
    ● Unplanned growth.
    ● Unpredictable failures.

Page 5: Scalable Databases - From Relational Databases To Polyglot Persistence

Scaling Relational Databases

Page 6: Scalable Databases - From Relational Databases To Polyglot Persistence

Master-Slave replication
● Master-slave replication.
  ● One (and only one) master database.
  ● One or more slaves.
● All writes go to the master.
  ● Replicated to slaves.
● Reads are balanced among master and slaves.
● Major issues:
  ● Single point of failure.
  ● Single point of bottleneck.
  ● Static topology.
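
The read/write routing described on this slide can be sketched roughly as follows. This is a minimal illustration, not a real replication setup: the class name ReplicatedDatabase is hypothetical, plain dicts stand in for database nodes, and replication is done synchronously for simplicity.

```python
import itertools


class ReplicatedDatabase:
    """Toy master-slave router; dicts stand in for database nodes (assumption)."""

    def __init__(self, slave_count):
        self.master = {}
        self.slaves = [{} for _ in range(slave_count)]
        # Reads rotate over the master and every slave.
        self._readers = itertools.cycle([self.master] + self.slaves)

    def write(self, key, value):
        # All writes go to the single master ...
        self.master[key] = value
        # ... and are then replicated to each slave (synchronously here).
        for slave in self.slaves:
            slave[key] = value

    def read(self, key):
        # Reads are balanced round-robin among master and slaves.
        return next(self._readers).get(key)
```

Every write passes through self.master, which is where the single point of failure and the write bottleneck listed above come from.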

Page 7: Scalable Databases - From Relational Databases To Polyglot Persistence

Master-Master replication
● Master-master replication.
  ● One or more masters.
● Writes and reads can go to any master node.
● Writes are replicated among masters.
● Major issues:
  ● Limited performance and scalability (typically due to two-phase commit, 2PC).
  ● Complexity.
  ● Static topology.

Page 8: Scalable Databases - From Relational Databases To Polyglot Persistence

Vertical partitioning
● Vertical partitioning.
  ● Put tables belonging to different functional areas on different database nodes.
● Scale your data and load by function.
● Move joins to the application level.
● Major issues:
  ● No longer truly relational.
  ● What if a functional area grows too much?
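
Moving a join to the application level could look something like the sketch below. The two in-memory collections are stand-ins for tables living on two different database nodes, and all names (users_node, orders_node, orders_with_user_names) are hypothetical.

```python
# Stand-ins for two database nodes, one per functional area (assumption).
users_node = {1: {"name": "alice"}, 2: {"name": "bob"}}
orders_node = [
    {"order_id": 10, "user_id": 1, "total": 30},
    {"order_id": 11, "user_id": 2, "total": 15},
    {"order_id": 12, "user_id": 1, "total": 5},
]


def orders_with_user_names():
    """Join orders to users in application code instead of in SQL."""
    joined = []
    for order in orders_node:                # query against the orders node
        user = users_node[order["user_id"]]  # lookup against the users node
        joined.append(
            {"order_id": order["order_id"], "user": user["name"], "total": order["total"]}
        )
    return joined
```

The database no longer enforces the relation between the two tables, which is the "no longer truly relational" cost noted on the slide.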

Page 9: Scalable Databases - From Relational Databases To Polyglot Persistence

Horizontal partitioning
● Horizontal partitioning.
  ● Split tables by key and put partitions (shards) on different nodes.
● Scale your data and load by key.
● Move joins to the application level.
● Needs some kind of routing.
● Major issues:
  ● No longer truly relational.
  ● What if your partition grows too much?
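
One simple form the routing could take is hash-based key routing, sketched below. This is an assumption for illustration: the slides do not prescribe a routing scheme, the ShardedTable class is hypothetical, and dicts stand in for shard nodes.

```python
import hashlib


class ShardedTable:
    """Toy key-based router; each dict stands in for a shard node (assumption)."""

    def __init__(self, shard_count):
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # A stable hash, so the same key always routes to the same shard.
        digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def put(self, key, row):
        self._shard_for(key)[key] = row

    def get(self, key):
        return self._shard_for(key).get(key)
```

With modulo routing, changing the number of shards remaps most keys, so rebalancing a partition that has grown too much is expensive.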

Page 10: Scalable Databases - From Relational Databases To Polyglot Persistence

Caching
● Put a cache in front of your database.
  ● Distribute.
  ● Write-through for scaling reads.
  ● Write-behind for scaling reads and writes.
● Saves you a lot of pain, but ...
  ● “Only” scales read/write load.
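
The difference between write-through and write-behind can be sketched as follows. Both classes are hypothetical, single-process illustrations (no distribution, no eviction), and a dict stands in for the database.

```python
class WriteThroughCache:
    """Writes hit the cache and the database together; reads are served from the cache."""

    def __init__(self, database):
        self.database = database
        self.cache = {}

    def get(self, key):
        if key not in self.cache:
            self.cache[key] = self.database.get(key)
        return self.cache[key]

    def put(self, key, value):
        self.cache[key] = value
        self.database[key] = value  # synchronous database write


class WriteBehindCache(WriteThroughCache):
    """Writes hit only the cache; the database is updated later, in a batch."""

    def __init__(self, database):
        super().__init__(database)
        self.dirty = set()

    def put(self, key, value):
        self.cache[key] = value
        self.dirty.add(key)  # remember what still has to reach the database

    def flush(self):
        for key in self.dirty:
            self.database[key] = self.cache[key]
        self.dirty.clear()
```

Write-behind takes the database off the write path, at the price of losing unflushed writes if the cache fails before flush() runs.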

Page 11: Scalable Databases - From Relational Databases To Polyglot Persistence

Did we solve our fallacies?
● We tried, but ...
  ● Still bound to the relational model.
  ● Replication only covers a few use cases.
  ● Partitioning is hard.
  ● Caching is good, but not definitive.
  ● ...
● Can we do any better?

Page 12: Scalable Databases - From Relational Databases To Polyglot Persistence

It's Not Only SQL

Page 13: Scalable Databases - From Relational Databases To Polyglot Persistence

NOSQL Characteristics
● Main characterizing traits:
  ● Data Model.
  ● Data Processing.
  ● Consistency Model.
  ● Scale Out.

Page 14: Scalable Databases - From Relational Databases To Polyglot Persistence

Data Model (1)
● Column-family based.
● Structure:
  ● Key-identified rows with a sparse number of columns.
  ● Columns grouped in families.
  ● Multiple families for the same key.
● Highlights:
  ● Dynamically add and remove columns.
  ● Efficiently access columns in the same group (column family).
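
As a rough mental model, a column-family store can be pictured as nested maps: row key, then column family, then column name. The sketch below uses plain Python dicts; the row keys, family names and the read_family helper are hypothetical and do not mirror any product's API.

```python
from collections import defaultdict

# row key -> column family -> column name -> value
table = defaultdict(lambda: defaultdict(dict))

table["user:1"]["profile"]["name"] = "alice"
table["user:1"]["profile"]["city"] = "Rome"
table["user:1"]["stats"]["logins"] = 42
table["user:2"]["profile"]["name"] = "bob"  # sparse: no city, no stats family

# Columns can be added or removed per row at any time.
table["user:2"]["profile"]["nickname"] = "bobby"
del table["user:1"]["profile"]["city"]


def read_family(row_key, family):
    """Fetch all columns of one family for one row in a single access."""
    return dict(table[row_key][family])
```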

Page 15: Scalable Databases - From Relational Databases To Polyglot Persistence

Data Model (2)
● Document based.
● Structure:
  ● Key-identified documents.
  ● Schema-less (but optionally constrained).
    – JSON, XML ...
● Highlights:
  ● Dynamically change the inner structure of documents.
  ● Efficiently access documents as a unit.
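
A minimal sketch of the document model, assuming JSON documents kept in an in-memory dict. The put_document and get_document helpers are hypothetical names, not taken from any product.

```python
import json

documents = {}  # key -> JSON text


def put_document(key, document):
    # The whole document is stored as a unit under its key.
    documents[key] = json.dumps(document)


def get_document(key):
    # The whole document is read back as a unit.
    return json.loads(documents[key])


put_document("post:1", {"title": "Hello", "tags": ["nosql"]})

# Change the inner structure without any schema migration.
post = get_document("post:1")
post["comments"] = [{"author": "bob", "text": "Nice"}]
put_document("post:1", post)
```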

Page 16: Scalable Databases - From Relational Databases To Polyglot Persistence

Data Model (3)
● Graph based.
● Structure:
  ● Nodes to represent your data.
  ● Relations as meaningful links between nodes.
  ● Properties to enrich both.
● Highlights:
  ● Rich data model.
  ● Efficient, fast traversal of nodes and relations.
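
The graph model can be sketched with nodes, typed relations and properties on both, plus a breadth-first traversal. All data and names below are hypothetical, and the traversal is a generic illustration rather than any graph database's API.

```python
from collections import deque

# Nodes with properties.
nodes = {
    "alice": {"age": 30},
    "bob": {"age": 25},
    "carol": {"age": 35},
    "dave": {"age": 40},
}

# Outgoing relations per node: (relation type, target node, relation properties).
relations = {
    "alice": [("KNOWS", "bob", {"since": 2008})],
    "bob": [("KNOWS", "carol", {"since": 2009})],
    "carol": [],
    "dave": [],
}


def reachable(start, relation_type):
    """Breadth-first traversal following only relations of the given type."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        current = queue.popleft()
        for rel_type, target, _properties in relations[current]:
            if rel_type == relation_type and target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order
```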

Page 17: Scalable Databases - From Relational Databases To Polyglot Persistence

Data Model (4)
● Key-Value based.
● Structure:
  ● Key-identified opaque values.
● Highlights:
  ● Great flexibility.
  ● Fast reads/writes for single entries.

Page 18: Scalable Databases - From Relational Databases To Polyglot Persistence

Data Processing
● Several options:
  ● Map/Reduce.
  ● Predicates.
  ● Range Queries.
  ● ...
● One common principle:
  ● Move processing toward related data.
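
The "move processing toward related data" principle can be illustrated with a tiny map/reduce word count. In this sketch each inner list stands in for the data held by one node; the map step runs per partition and only the small partial results are combined. All names and data are hypothetical.

```python
from collections import Counter
from functools import reduce

# Each inner list stands in for the data stored on one node (assumption).
partitions = [
    ["nosql scales", "sql joins"],
    ["nosql documents", "nosql graphs"],
]


def map_partition(lines):
    """Runs next to the data: count words within one partition."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def reduce_counts(left, right):
    """Merge two partial results."""
    return left + right


def word_count():
    partials = [map_partition(partition) for partition in partitions]
    return reduce(reduce_counts, partials, Counter())
```

Only the per-partition counters travel between nodes, not the raw lines.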

Page 19: Scalable Databases - From Relational Databases To Polyglot Persistence

Consistency Model (1)
● Strict Consistency.
  ● All nodes ...
  ● At every point in time ...
  ● See a consistent view of the stored data.
    – Per-key consistency.
    – Multi-key consistency.

Page 20: Scalable Databases - From Relational Databases To Polyglot Persistence

Consistency Model (2)
● Eventual Consistency.
  ● Only a subset of all nodes ...
  ● At a specific point in time ...
  ● See a consistent view of the stored data.
    – Other nodes will serve stale data.
    – Other nodes will eventually get the updates.
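
Eventual consistency can be pictured with a toy store in which a write lands on one replica immediately and reaches the others only when a later propagation step runs. The classes below are hypothetical and deliberately ignore conflicts and failures.

```python
class Replica:
    def __init__(self):
        self.data = {}


class EventuallyConsistentStore:
    """Toy store: replica 0 sees writes at once, the others see them later."""

    def __init__(self, replica_count):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = []  # (replica index, key, value) updates not yet applied

    def write(self, key, value):
        self.replicas[0].data[key] = value
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def read(self, key, replica_index):
        # May return stale (or missing) data if propagation has not run yet.
        return self.replicas[replica_index].data.get(key)

    def propagate(self):
        # Apply queued updates in order, so the latest write wins.
        for index, key, value in self.pending:
            self.replicas[index].data[key] = value
        self.pending.clear()
```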

Page 21: Scalable Databases - From Relational Databases To Polyglot Persistence

Scale Out (1)
● Master-based.
  ● Membership managed and broadcast by masters.
  ● Data consistency guaranteed by masters.
  ● No single point of failure (SPOF) with active/passive masters.
  ● No single point of bottleneck (SPOB) with active/active masters or cluster-cluster replication.
  ● Prone to partitioning failures.

Page 22: Scalable Databases - From Relational Databases To Polyglot Persistence

Scale Out (2)
● Peer-to-peer.
  ● Membership is maintained through multicast or gossip-based protocols.
  ● Data consistency is maintained through quorum protocols.
  ● Easier to scale.
  ● Harder to maintain consistency.
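
A common form of quorum protocol uses N replicas, W write acknowledgements and R read responses, with W + R > N so that every read set overlaps every write set. The sketch below assumes that scheme; the slides do not name a specific protocol. The QuorumStore class is hypothetical and uses a single global version counter for simplicity.

```python
class QuorumStore:
    """Toy N/W/R quorum: reads overlap writes whenever W + R > N."""

    def __init__(self, n, w, r):
        if w + r <= n:
            raise ValueError("need w + r > n for overlapping quorums")
        self.replicas = [{} for _ in range(n)]
        self.w, self.r = w, r
        self.version = 0

    def write(self, key, value):
        self.version += 1
        # Only the first W replicas acknowledge the write.
        for replica in self.replicas[: self.w]:
            replica[key] = (self.version, value)

    def read(self, key):
        # Ask the last R replicas: the least favourable overlap with the writers.
        answers = [rep[key] for rep in self.replicas[-self.r:] if key in rep]
        if not answers:
            return None
        # The highest version among the answers is the latest write.
        return max(answers, key=lambda answer: answer[0])[1]
```

With n=3, w=2, r=2, the read set always shares at least one replica with the write set, so the latest version is always among the answers.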

Page 23: Scalable Databases - From Relational Databases To Polyglot Persistence

NOSQL Use Cases
● Use cases revolve around the following kinds of data:
  ● Rich.
  ● Runtime.
  ● Hot Spot.
  ● Massive.
  ● Computational.
● Do not use the same product for all cases.
● Pick multiple products for different use cases.

Page 24: Scalable Databases - From Relational Databases To Polyglot Persistence

NOSQL Products - Cassandra
● Cassandra (http://incubator.apache.org/cassandra)
● Data Model:
  ● Column-family based.
● Data Processing:
  ● Range queries, Predicates.
● Consistency:
  ● Eventual consistency.
● Scalability:
  ● Peer-to-peer, gossip based.

Page 25: Scalable Databases - From Relational Databases To Polyglot Persistence

NOSQL Products - Mongo DB
● Mongo DB (http://www.mongodb.org)
● Data Model:
  ● Document based (JSON).
● Data Processing:
  ● Map/Reduce, SQL-like queries.
● Consistency:
  ● Per-document strict consistency.
● Scalability:
  ● Replication, partitioning (alpha).

Page 26: Scalable Databases - From Relational Databases To Polyglot Persistence

NOSQL Products - Neo4j
● Neo4j (http://neo4j.org)
● Data Model:
  ● Graph based.
● Data Processing:
  ● Path traversal, Index-based search.
● Consistency:
  ● Strict consistency.
● Scalability:
  ● Replication.

Page 27: Scalable Databases - From Relational Databases To Polyglot Persistence

NOSQL Products - Riak
● Riak (http://riak.basho.com)
● Data Model:
  ● Document based (JSON).
● Data Processing:
  ● Map/Reduce.
● Consistency:
  ● Eventual consistency.
● Scalability:
  ● Peer-to-peer, gossip based.

Page 28: Scalable Databases - From Relational Databases To Polyglot Persistence

NOSQL Products - Terrastore
● Terrastore (http://code.google.com/p/terrastore)
● Data Model:
  ● Document based (JSON).
● Data Processing:
  ● Range queries, Predicates.
● Consistency:
  ● Per-document strict consistency.
● Scalability:
  ● Master-based.

Page 29: Scalable Databases - From Relational Databases To Polyglot Persistence

NOSQL Products - Voldemort
● Voldemort (http://project-voldemort.com)
● Data Model:
  ● Key-Value.
● Data Processing:
  ● None.
● Consistency:
  ● Eventual consistency.
● Scalability:
  ● Peer-to-peer, gossip based.

Page 30: Scalable Databases - From Relational Databases To Polyglot Persistence

NOSQL Products and Use Cases

Page 31: Scalable Databases - From Relational Databases To Polyglot Persistence

Final words
● A New World.
  ● New paradigms.
  ● New use cases.
  ● New products.
● Don't dismiss the old stuff.
  ● Relational databases still have their place.
● Embrace change.
  ● May the NOSQL power be with you.
● Let the Polyglot Persistence era begin!