Enterprise NoSQL: silver bullet or poison pill?

Post on 15-Nov-2014

DESCRIPTION

This is a slightly revised version of the keynote I first gave at StrangeLoop 2010. It tries to show the pros and cons of NoSQL versus SQL and to highlight what's easy and what's not so easy to do, so that people have a better understanding of typical NoSQL products.

TRANSCRIPT

Enterprise NoSQL: Silver bullet or poison pill?

@billynewport

IBM distinguished engineer

NoSQL = not only SQL

Rumors of the SQL DBMS demise are greatly exaggerated.

SQL databases will be around for a long, long time.

However...

NoSQL offers additional technologies, competing and/or complementary, for storing data organized differently from traditional SQL, each with its own set of pros and cons.

Agenda

• Discuss the SQL mindset

• Discuss the NoSQL mindset and contrast

SQL benefits

Centralized schema managed by DBA.

Relatively static schema

Easy Ad hoc query support

Normalized Data

SQL Benefits

Relationships through joins

Easy indexing

No consistency issues, one copy/system of record

No need to partition data model.

SQL means domain centric

• Think about the data, find the nouns

• Nouns become tables

• identify attributes/keys

• normalize the tables to Nth normal form…

Domain centric

• Use SQL to ask any question

• Use indexes to speed up SQL queries.

• Think Data Model first, worry about questions/access patterns later.

SQL ecosystem

• Standards for programming (SQL/JDBC/ODBC/ESQL)

• Easy to port applications between different SQL databases using the right standard.

• IDE support

Ecosystem

• Availability of reporting tools.

• Availability of ETL (extract/transform/load) tools.

• SQL centric brainwashing occurs from a young age in engineers.

SQL implementations

However, these choices lead to vertical single machine implementations

Or at best, shared everything, limited scale out implementations on exotic (expensive) hardware.

But

• A machine with:

• dual-socket Intel multi-core

• 256GB memory

• SSD storage

can likely run >90% of all the SQL databases out there really FAST.

Types of nosql

Static Key value store (memcache)

DataGrid KV store (IBM WebSphere eXtreme Scale, Oracle Coherence, Gigaspaces)

Row oriented Sparse column store (Cassandra, HBase, ...)

Remote shared memory (Terracotta, IBM Cluster Accelerator, IBM 390 Coupling Facility)

Key document store (MongoDB)

Network store (Neo4J)

BUT, as Spiderman said: With great power comes GREAT RESPONSIBILITY!

NoSQL solutions typically relax some of the established constraints in return for implementation flexibility for certain solutions difficult to implement with SQL.

NoSQL means choices

Relax constraints for flexibility

Relaxing some of these choices leads to different possible store implementation strategies.

Simplest is shared key/blob store

Partitioning the data model leads to sharding and linear scale out but:

No cross shard query support

No cheap global indexes

No joins across shards
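A minimal sketch of why these limits follow from sharding, assuming a toy in-memory store. The `ShardedStore` class, its method names, and the customer rows are hypothetical illustrations, not any product's API. A retrieve by key touches one shard; a query on any other field has to visit all of them.

```python
import zlib
from typing import Any, Dict, List, Optional


class ShardedStore:
    """Toy key/value store split across N in-memory shards (hypothetical)."""

    def __init__(self, shard_count: int) -> None:
        self.shards: List[Dict[str, Dict[str, Any]]] = [{} for _ in range(shard_count)]

    def _shard_for(self, key: str) -> Dict[str, Dict[str, Any]]:
        # A stable hash of the key selects exactly one shard.
        return self.shards[zlib.crc32(key.encode("utf-8")) % len(self.shards)]

    def put(self, key: str, row: Dict[str, Any]) -> None:
        self._shard_for(key)[key] = row

    def get(self, key: str) -> Optional[Dict[str, Any]]:
        # Retrieve by key touches one shard, which is why it scales out.
        return self._shard_for(key).get(key)

    def find_by(self, field: str, value: Any) -> List[str]:
        # A query on a non-key field has no index to consult, so it
        # must scan every shard (scatter/gather).
        hits = []
        for shard in self.shards:
            for key, row in shard.items():
                if row.get(field) == value:
                    hits.append(key)
        return sorted(hits)


store = ShardedStore(shard_count=4)
store.put("cust:1", {"name": "Ann", "city": "Austin"})
store.put("cust:2", {"name": "Bob", "city": "Boston"})
store.put("cust:3", {"name": "Cy", "city": "Austin"})

print(store.get("cust:2"))              # one shard consulted
print(store.find_by("city", "Austin"))  # every shard consulted
```

The cost of `find_by` grows with the number of shards, while `get` stays constant; that asymmetry is the trade-off the list above describes.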

Pro/con

• This may allow linear scaling

• This may allow fast relationship traversal

• This may allow more flexible schemas

• This may allow more consistency choices

• But, you must make trade offs to get here

This is not obvious at all to most people!

Question Centric

• NoSQL seems to start with the questions rather than the data.

• Once we know the questions, we can lay out the data using some partitioned model.

• We can now scale it out and all is good

• What could you do if scale wasn’t an issue?

Question Centric

Ask a different question maybe?

Issues

• The new questions may require a different partitioning scheme to be efficient.

• Now it doesn’t scale at all.

• Repartitioning is extremely hard.

• Offline questions can be solved with map/reduce or similar batch approaches with maybe a copy of the data.
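One reason repartitioning hurts can be shown with a little arithmetic. This is a sketch under a deliberately naive assumption: rows are placed by integer id modulo the shard count (the `shard_of` rule and the id range are hypothetical). Real products often use fixed partition counts or consistent hashing to soften this, and switching to a different partition key altogether is harder still.

```python
def shard_of(customer_id: int, shard_count: int) -> int:
    # Simplest possible placement rule: id modulo the number of shards.
    return customer_id % shard_count


def keys_that_move(ids, old_count: int, new_count: int) -> int:
    # Count the ids whose shard changes when the cluster is resized.
    return sum(1 for i in ids if shard_of(i, old_count) != shard_of(i, new_count))


ids = range(1000)
moved = keys_that_move(ids, old_count=4, new_count=5)
print(f"{moved} of {len(ids)} rows change shard")  # 800 of 1000 rows change shard
```

Only ids whose remainder is the same under both counts stay put (here, 4 out of every 20), so adding a single machine relocates 80% of the data under this rule.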

Multiple clusters

• You can try storing the data partitioned different ways in different NoSQL clusters.

• Pick the cluster you want depending on the question.

Multiple clusters

• Data consistency?

• You better not have a lot of questions because this gets expensive fast.

• Lots of different online questions don’t suit sharded NoSQL.

Don’t normalize

• You can’t easily do joins with NoSQL.

• This means you want to denormalize and keep the needed data in the rows even if this means duplicating it.

• Remember, storage/DASD is super cheap in a scale out model.

• Consistency?
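The consistency question raised by denormalizing can be sketched as follows. The `orders` rows and the `rename_customer` helper are hypothetical; the point is only that a value copied into many rows has to be rewritten in every one of them.

```python
from typing import Dict

# Denormalized order rows: the customer's name is copied into every order
# so that reading an order never needs a join.
orders: Dict[str, Dict[str, str]] = {
    "order:100": {"customer_id": "cust:1", "customer_name": "Ann Lee", "item": "disk"},
    "order:101": {"customer_id": "cust:1", "customer_name": "Ann Lee", "item": "cable"},
    "order:102": {"customer_id": "cust:2", "customer_name": "Bob Ray", "item": "rack"},
}


def rename_customer(customer_id: str, new_name: str) -> int:
    """Rewrite every duplicated copy of the name; returns the rows touched."""
    touched = 0
    for row in orders.values():
        if row["customer_id"] == customer_id:
            row["customer_name"] = new_name
            touched += 1
    return touched


touched = rename_customer("cust:1", "Ann Smith")
print(touched)  # 2
```

In a normalized SQL model this would be one update to one row. Here it is one write per copy, and a reader can see old and new names side by side until the last copy is updated.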

System of Record (SOR)

• SQL means DBMS is the System of Record

• People are used to this.

• It’s the first problem implementing any kind of cache on top of a DBMS.

• How do I keep the cache in sync with the database?

NoSQL SOR

• Usually in a NoSQL world, multiple systems of record are NORMAL.

• The application defines consistency rules and just gets on with it.

• Inconsistency is handled with a business process of some kind.

• This is a big mind shift for normal SQL programmers…

Benefits of multiple SOR

• You can SCALE!

• No concurrency bottlenecks

• You can locate data sets around the planet and use the closest one.

• More highly available as there are multiple copies and replicas are typically multi-master.

Drawbacks of multiple SOR

• Consistency is a problem.

• Conflicts need to be reconciled.

• Most products only have rudimentary support for this:

• Imagine bank balances using last write wins…

• But, even with bank balances, inconsistencies can be handled correctly.
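A small sketch of the bank balance case, assuming two replicas that each accept updates independently. The operation format, timestamps, and function names are hypothetical, and replaying operation logs is only one possible reconciliation strategy, not something the talk prescribes. It also assumes each operation appears in exactly one replica's log.

```python
from typing import List, Tuple

# Each replica records (timestamp, delta) operations against the account.
Op = Tuple[int, int]

start_balance = 100
replica_a: List[Op] = [(1, +50)]  # deposit of 50 accepted at site A
replica_b: List[Op] = [(2, -30)]  # withdrawal of 30 accepted at site B


def last_write_wins(start: int, a: List[Op], b: List[Op]) -> int:
    # Each site computes its own balance; the site with the newest write
    # wins and the other site's update is silently discarded.
    balance_a = start + sum(delta for _, delta in a)
    balance_b = start + sum(delta for _, delta in b)
    newest_a = max(ts for ts, _ in a)
    newest_b = max(ts for ts, _ in b)
    return balance_a if newest_a > newest_b else balance_b


def merge_operations(start: int, a: List[Op], b: List[Op]) -> int:
    # Reconcile by replaying both logs: deposits and withdrawals commute,
    # so no update is lost.
    return start + sum(delta for _, delta in a + b)


lww_balance = last_write_wins(start_balance, replica_a, replica_b)
merged_balance = merge_operations(start_balance, replica_a, replica_b)
print(lww_balance)     # 70: the deposit has vanished
print(merged_balance)  # 120
```

Merging works here because the application stores what happened rather than the resulting balance. Rules such as "no overdraft" would still need a business process to handle the cases where the merged result breaks them.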

Operations

SQL

Insert

Select

Update

Delete

NoSQL

Put

Retrieve by key

Delete

Complex Search typically means map/reduce…

Search -> Retrieve

• For online queries, try to convert every search to a retrieve operation.

• Cache query results

• Precalculate every possible query

• Maintain these query caches

• In other words, use some kind of global index for simple search but maintaining it may be expensive.

• Joins/Group By/Limit and so on are more difficult
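Turning a search into a retrieve might look like the sketch below, which assumes an index kept up to date on every write. The `rows` store, the `city_index`, and both helper functions are hypothetical names, and everything lives in one process for brevity.

```python
from collections import defaultdict
from typing import Dict, List, Set

rows: Dict[str, Dict[str, str]] = {}
# Index from city to the keys of customers in that city, updated on every put.
city_index: Dict[str, Set[str]] = defaultdict(set)


def put(key: str, row: Dict[str, str]) -> None:
    old = rows.get(key)
    if old is not None:
        # The maintenance cost: remove the stale index entry first.
        city_index[old["city"]].discard(key)
    rows[key] = row
    city_index[row["city"]].add(key)


def customers_in(city: str) -> List[Dict[str, str]]:
    # The "search" is now one retrieve from the index plus one per hit.
    return [rows[k] for k in sorted(city_index.get(city, ()))]


put("cust:1", {"name": "Ann", "city": "Austin"})
put("cust:2", {"name": "Bob", "city": "Boston"})
put("cust:1", {"name": "Ann", "city": "Boston"})  # Ann moves

boston_names = [row["name"] for row in customers_in("Boston")]
print(boston_names)            # ['Ann', 'Bob']
print(customers_in("Austin"))  # []
```

Every `put` now costs up to three writes instead of one. In a sharded store the index entry usually lives on a different machine from the row, which is where the expense mentioned above comes from.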

Table scans

• Multi-machine table scans won’t work for anything online.

• Google doesn’t run map/reduce for every Google search!

• Offline complex queries can be done using Map/Reduce

• You need to write code for most complex searches!
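To give a feel for "writing code" instead of a query, here is a single-process sketch of what SQL would express as a GROUP BY with SUM. The shard contents and the `map_phase`/`reduce_phase` names are hypothetical; a real map/reduce framework would run the map step on each machine in parallel and shuffle the pairs between them.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, List, Tuple

# Each inner list stands in for the rows held by one shard.
shards: List[List[Dict[str, Any]]] = [
    [{"city": "Austin", "total": 20}, {"city": "Boston", "total": 5}],
    [{"city": "Austin", "total": 7}],
    [{"city": "Boston", "total": 11}, {"city": "Denver", "total": 3}],
]


def map_phase(rows: Iterable[Dict[str, Any]]) -> Iterator[Tuple[str, int]]:
    # Emit one (key, value) pair per row; this runs next to the data.
    for row in rows:
        yield row["city"], row["total"]


def reduce_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    # Combine all values that share a key.
    totals: Dict[str, int] = defaultdict(int)
    for city, total in pairs:
        totals[city] += total
    return dict(totals)


all_pairs = [pair for shard in shards for pair in map_phase(shard)]
sales_by_city = reduce_phase(all_pairs)
print(sales_by_city)  # {'Austin': 27, 'Boston': 16, 'Denver': 3}
```

The equivalent SQL is one line; here the grouping logic is application code that has to be written, tested, and scheduled as a batch job.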

Transaction integrity

Used to be just ‘normal’ transactions

Not any more. Not all transactions are equal.

Synchronous versus write behind.

Chained or asynchronous versus 2pc
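The difference between synchronous and write-behind updates can be sketched like this. The `WriteBehindCache` class and the dictionary standing in for the durable store are hypothetical, and a real product would flush on a timer or background thread rather than on demand.

```python
from collections import deque
from typing import Deque, Dict, Tuple

backing_store: Dict[str, str] = {}  # stands in for the slower durable store


class WriteBehindCache:
    """Toy cache supporting both write styles (hypothetical)."""

    def __init__(self) -> None:
        self.cache: Dict[str, str] = {}
        self.pending: Deque[Tuple[str, str]] = deque()

    def put_sync(self, key: str, value: str) -> None:
        # Synchronous: the caller waits until the backing store has the write.
        self.cache[key] = value
        backing_store[key] = value

    def put_write_behind(self, key: str, value: str) -> None:
        # Write-behind: acknowledge immediately, persist later.
        self.cache[key] = value
        self.pending.append((key, value))

    def flush(self) -> int:
        # Drain queued writes in order; returns how many were persisted.
        flushed = 0
        while self.pending:
            key, value = self.pending.popleft()
            backing_store[key] = value
            flushed += 1
        return flushed


cache = WriteBehindCache()
cache.put_sync("a", "1")
cache.put_write_behind("b", "2")
visible_before_flush = "b" in backing_store  # False: not yet durable
flushed = cache.flush()
visible_after_flush = "b" in backing_store   # True
```

Between `put_write_behind` and `flush` the write exists only in the cache. A crash in that window loses it, which is the sense in which not all transactions are equal.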

Schema

Does the store understand the schema?

Is a row just a blob or does it have shape?

Is the schema an application only idea?

DBAs or app developers own the schema?

Can application developers be trusted?

Skill level

More flexibility and application control

Typically means higher skill level on the development side

Single app company means highly skilled team.

Multiapp company means less highly skilled teams.

Law of big numbers at work. The fewer the developers, the greater the chance of a high skill level.

Thank you

Driving a race car under control is fun!

Being a passenger in an unguided missile is not!

Go in with your eyes open!

Thank you

@billynewport
