nosql

DISIM - University of L’Aquila Ivano Malavolta [email protected] http://www.di.univaq.it/malavolta

Upload: ivano-malavolta

Post on 07-May-2015

DESCRIPTION

NoSQL: Overview of the main features and approaches. This presentation has been developed in the context of the Databases course at the DISIM Department of the University of L’Aquila (Italy). http://www.di.univaq.it/malavolta

TRANSCRIPT

Page 1: NoSQL

DISIM - University of L’Aquila

Ivano Malavolta

[email protected]

http://www.di.univaq.it/malavolta

Page 2: NoSQL

Why, When, Who NOSQL (now)?

The CAP Theorem

NOSQL Approaches

Case Study 1: Instagram

Case Study 2: Twitter

Case Study 3: tumblr

Summary

References

Page 3: NoSQL

ACID

Atomicity

Consistency

Isolation

Durability

Based on Relational Algebra

Select, Projection, Set Operators, Renaming, Joins

Concept of Schema

Standard

Page 4: NoSQL

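The term was reintroduced in its current sense in 2009 by Eric Evans, software developer at the Apache Software Foundation

(Carlo Strozzi had already used the name in 1998 for a lightweight relational database without an SQL interface)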

Class of non-relational data storage systems

Usually do not require a fixed schema

Many NoSQL offerings relax one or more of the ACID properties

Page 6: NoSQL

NoSQL does not mean "No to SQL": we are not against SQL!

NoSQL means "Not only SQL": it’s about recognizing that for some problems other storage solutions are better suited!

http://goo.gl/gWIoy

Page 7: NoSQL

Each NOSQL approach addresses some limitations of relational databases, like:

• horizontal scalability

• read/write performance

• schema limitations

• difficult query patterns

• parallel data processing

• etc.

reason about sharding and master-slave replicas
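
To make the sharding and master-slave remark concrete, here is a minimal sketch in Python. It is not taken from the slides: the `Shard` class, the `shard_for` helper and the choice of an MD5-based hash are illustrative assumptions. Writes go to the master of the shard that owns the key, and reads are served by replicas, which only see the data after replication has run.

```python
import hashlib


class Shard:
    """One shard: a master that takes writes and replicas that serve reads."""

    def __init__(self, n_replicas=2):
        self.master = {}
        self.replicas = [{} for _ in range(n_replicas)]

    def write(self, key, value):
        # Writes are accepted by the master only.
        self.master[key] = value

    def replicate(self):
        # Copy the master state to every replica (hypothetical batch replication).
        for replica in self.replicas:
            replica.clear()
            replica.update(self.master)

    def read(self, key, replica_index=0):
        # Reads are served by a replica, so they may lag behind the master.
        return self.replicas[replica_index].get(key)


def shard_for(key, n_shards):
    """Map a key to a shard index with a stable hash (assumed scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_shards


shards = [Shard() for _ in range(4)]
for user in ["alice", "bob", "carol"]:
    shards[shard_for(user, 4)].write(user, {"name": user})

target = shards[shard_for("alice", 4)]
stale = target.read("alice")   # replicas have not been updated yet
target.replicate()
fresh = target.read("alice")   # now visible on the replica
```

The gap between `stale` and `fresh` is the price of spreading reads over replicas: the data is split and copied, but a replica can briefly return outdated results.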

Page 8: NoSQL

Massive read/write performance

usually fast key-value access

High Availability

Data can be stored on multiple nodes, so it can be partitioned

This helps in avoiding a single point of failure (fault tolerance)

http://goo.gl/PVpoh

http://goo.gl/DAxmN

Page 9: NoSQL

Flexible schema and data types

easy to develop the application layer

(JSON, HTTP access, JS functions, etc.)

Ease of maintenance, administration

many vendors are spending a lot of effort on ease of use, minimal administration, and automated operations

Promotes parallel computing

tremendously performant!

see Map-Reduce

http://goo.gl/PVpoh http://goo.gl/DAxmN
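
As a rough illustration of the Map-Reduce pointer above, the following Python sketch counts words over several text chunks. It is a toy, single-machine version under stated assumptions: the function names are hypothetical, and a thread pool stands in for the cluster nodes that would run the map phase in a real deployment.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped):
    """Group all emitted values by key, as the framework would do between phases."""
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values of each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks):
    # Each chunk is mapped independently, which is what makes the job parallel.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_phase, chunks))
    return reduce_phase(shuffle(mapped))


counts = word_count(["NoSQL scales out", "SQL scales up", "nosql and sql"])
```

Because the map calls share no state, the chunks could live on different nodes; only the shuffle step needs to move data between them.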

Page 10: NoSQL

Supporting large data sets with room to grow

thanks to partitioning, data structures and dedicated algorithms

Tunable for deployment size or functionality

can be used for medium to large datasets, in terms of both size and complexity

CHEAP (open-source)

http://goo.gl/PVpoh

http://goo.gl/DAxmN

Page 11: NoSQL

What are we giving up?

• joins

• group by

• order by

• indexes

• ACID transactions

• complex relationships

• powerful and standard query language (SQL)

• data independence (mainly for data integrity)

• maturity

http://goo.gl/PVpoh

some NOSQL approaches provide some (but not all) of the features listed here

Page 12: NoSQL

– Storage of large amounts of non-transactional data
  • log analysis, web statistics, etc.

– Caching results from slower databases (see Twitter)

– Denormalization of data to avoid expensive join queries

– Managing data that is not easily analyzed in an RDBMS, such as time- or location-based data

– Real-time systems
  • games, financial data, chats, etc.

Do you have somewhere a large set of uncontrolled, unstructured data that you are trying to fit into an RDBMS?
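
The caching use case in the list above can be sketched with the cache-aside pattern. This is only an illustration: `CacheAside` and `slow_lookup` are hypothetical names, and a plain dictionary plays the role of the key-value store that sits in front of the slower database.

```python
class CacheAside:
    """Look in the cache first; on a miss, load from the slow store and remember it."""

    def __init__(self, load_from_db):
        self._load = load_from_db   # callable standing in for the slow database
        self._cache = {}            # stands in for a key-value store
        self.db_hits = 0

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        self.db_hits += 1
        value = self._load(key)
        self._cache[key] = value
        return value

    def invalidate(self, key):
        # Call this after the underlying record changes.
        self._cache.pop(key, None)


def slow_lookup(user_id):
    # Hypothetical expensive query (for example a multi-way join).
    return {"id": user_id, "followers": user_id * 10}


cache = CacheAside(slow_lookup)
first = cache.get(7)    # miss: goes to the database
second = cache.get(7)   # hit: served from the cache
```

The application has to invalidate entries itself when the source data changes, which is one place where consistency is traded for read speed.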

Page 13: NoSQL

Slides 13 to 21: courtesy of Tobias Lindaaker http://www.thobe.org/

Page 22: NoSQL

Why, When, Who NOSQL (now)?

The CAP Theorem

NOSQL Approaches

Case Study 1: Instagram

Case Study 2: Twitter

Case Study 3: tumblr

Summary

References

Page 23: NoSQL

CAP Theorem

formulated by computer scientist Eric Brewer in 2000

It is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:

• Consistency: each client always has the same view of the data

• Availability: every request received by a working node must result in a response

• Partition Tolerance: the system keeps operating even though some messages between the nodes may be lost
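
The trade-off can be seen in a toy two-replica cluster. The sketch below is an illustration under stated assumptions, not a real replication protocol: `Cluster`, `Replica` and the `mode` flag are hypothetical. During a partition, a CP-style cluster rejects the write to stay consistent, while an AP-style cluster accepts it and lets the replicas diverge.

```python
class Unavailable(Exception):
    """Raised when a CP-style cluster refuses a request during a partition."""


class Replica:
    def __init__(self):
        self.value = None


class Cluster:
    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a, self.b = Replica(), Replica()
        self.partitioned = False   # True when a and b cannot talk to each other

    def write(self, replica, value):
        if not self.partitioned:
            # Healthy network: both replicas are updated together.
            self.a.value = self.b.value = value
            return
        if self.mode == "CP":
            # Give up availability to keep both replicas identical.
            raise Unavailable("cannot reach the other replica")
        # AP: stay available, accept the write locally, replicas now differ.
        replica.value = value


cp = Cluster("CP")
cp.write(cp.a, 1)
cp.partitioned = True
try:
    cp.write(cp.a, 2)
    cp_accepted = True
except Unavailable:
    cp_accepted = False

ap = Cluster("AP")
ap.write(ap.a, 1)
ap.partitioned = True
ap.write(ap.a, 2)
```

After the partition, the CP cluster still holds the same value on both replicas but has refused a request, whereas the AP cluster answered every request and now holds two different values.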

Page 24: NoSQL

Demonstration...

Page 25: NoSQL

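To scale out, you have to partition

you then have to choose between consistency and availability

[Diagram: triangle with Consistency, Availability and Partition Tolerance at the corners; the sides are labeled CA (Consistency + Availability), CP (Consistency + Partition Tolerance) and AP (Availability + Partition Tolerance)]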

Page 26: NoSQL

BASE: consistency model weaker than ACID

ACID = Atomicity, Consistency, Isolation, Durability

BASE = Basically Available, Soft state, Eventual consistency

• Basically Available: if a node fails, part of the data will not be available, but the entire data layer stays operational

• Soft state: the state of the system may change over time, even without input

• Eventual consistency: the system becomes consistent at some later time

http://queue.acm.org/detail.cfm?id=1394128

Page 27: NoSQL

BASE example
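
The example shown on the original slide is not part of this transcript. As a stand-in, here is a small Python sketch of eventual consistency; the `Node` class, the last-write-wins rule based on timestamps and the `anti_entropy` function are assumptions chosen for illustration, not the slide's own example.

```python
class Node:
    """A replica that keeps, for every key, the value with the newest timestamp."""

    def __init__(self):
        self.store = {}   # key -> (timestamp, value)

    def put(self, key, value, timestamp):
        current = self.store.get(key)
        # Last write wins: only a newer timestamp replaces the stored value.
        if current is None or timestamp > current[0]:
            self.store[key] = (timestamp, value)

    def get(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


def anti_entropy(x, y):
    """Exchange entries in both directions so that the two nodes converge."""
    for key, (ts, value) in list(x.store.items()):
        y.put(key, value, ts)
    for key, (ts, value) in list(y.store.items()):
        x.put(key, value, ts)


n1, n2 = Node(), Node()
n1.put("cart", ["book"], timestamp=1)
n2.put("cart", ["book", "pen"], timestamp=2)

before = (n1.get("cart"), n2.get("cart"))   # the nodes disagree (soft state)
anti_entropy(n1, n2)
after = (n1.get("cart"), n2.get("cart"))    # the nodes agree (eventual consistency)
```

Both nodes answer requests the whole time (basically available), their state changes without new client input when the synchronization runs (soft state), and they agree only after that exchange (eventual consistency).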

Page 28: NoSQL

Why, When, Who NOSQL (now)?

The CAP Theorem

NOSQL Approaches

Case Study 1: Instagram

Case Study 2: Twitter

Case Study 3: tumblr

Summary

References

Page 29: NoSQL

Four genres of NOSQL databases:

• Key-value

• Columnar

• Document

• Graph

Page 30: NoSQL

Here the focus is on SCALABILITY

designed to handle massive load

stores a collection of key-value pairs

think about maps (or associative arrays) in classical programming languages

http://goo.gl/LfG1N

KEY = a string

VALUE = any kind of element, such as strings, videos, XML files, etc.

Key namespaces are used to avoid collisions

Implementations:

Riak, Redis, Voldemort, Dynamo
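
The key-value model described above can be sketched as a thin wrapper around a Python dictionary. The `KeyValueStore` class and the `namespace:key` naming scheme are illustrative assumptions; real stores such as the ones listed expose their own APIs.

```python
class KeyValueStore:
    """String keys, opaque values, and namespaces to keep keys from colliding."""

    def __init__(self):
        self._data = {}

    @staticmethod
    def _full_key(namespace, key):
        # Prefixing the key with its namespace is an assumed convention.
        return f"{namespace}:{key}"

    def put(self, namespace, key, value):
        self._data[self._full_key(namespace, key)] = value

    def get(self, namespace, key, default=None):
        return self._data.get(self._full_key(namespace, key), default)

    def delete(self, namespace, key):
        self._data.pop(self._full_key(namespace, key), None)


store = KeyValueStore()
# The same key "42" lives in two namespaces without clashing.
store.put("session", "42", {"user": "alice"})
store.put("avatar", "42", b"\x89PNG...")
```

The store never inspects the value: a session dictionary and a binary blob are handled the same way, and the only access path is the key.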

Page 31: NoSQL

PROS

• easy to use

• extreme performance

• no need to maintain indices

• large horizontal data

CONS

• no complex queries (no SQL)

• no transactions
  – actually Redis has transactions

• many data structures cannot be easily modeled as key-value pairs

• must fit in memory

http://goo.gl/PGfjU

Page 32: NoSQL

• Stock prices

• Analytics

• Real-time data collection

• Real-time communication

• User sessions storage

• Caching Data from other DBs

SEE CASE STUDIES LATER IN THIS LECTURE

Page 33: NoSQL

Midway between relational and KV stores

Values are queried by matching keys

Like relational DBs, their values are groups of zero or more columns

Differently from relational DBs, data from a given column is stored together

adding columns is quite inexpensive

Each row can have a different set of columns, or none at all

this allows tables to remain sparse without additional storage cost for null values

Implementations:

HBase, BigTable, Cassandra, Vertica
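
A minimal way to picture the columnar layout described above is a dictionary per column. The `ColumnStore` class below is a hypothetical sketch, not the storage format of any listed product: values of one column are kept together, and a row only occupies space in the columns it actually uses.

```python
from collections import defaultdict


class ColumnStore:
    """Data is grouped by column; missing cells simply do not exist."""

    def __init__(self):
        self._columns = defaultdict(dict)   # column name -> {row key: value}

    def put(self, row_key, column, value):
        # Writing to an unseen column creates it on the fly (cheap to add columns).
        self._columns[column][row_key] = value

    def get_row(self, row_key):
        # Rebuild a row from the columns that contain this key.
        return {
            column: cells[row_key]
            for column, cells in self._columns.items()
            if row_key in cells
        }

    def scan_column(self, column):
        # Reading one column touches only that column's data.
        return dict(self._columns.get(column, {}))


table = ColumnStore()
table.put("u1", "name", "Alice")
table.put("u1", "email", "alice@example.org")
table.put("u2", "name", "Bob")   # u2 has no email: nothing is stored for it
```

Row `u2` costs nothing in the `email` column, which is the sparse-table property, and `scan_column("email")` never reads any names.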

Page 34: NoSQL

PROS

• Easy to Distribute Tasks

• Solving ‘Big Data’ issues

• High Availability

• Garbage collection for expired data

• Scanning is very easy

CONS

• De-normalization

• Expensive to insert

• Requires heavy pre-planning of queries

Page 35: NoSQL

• Search engines

• Logging

• Analysing log data

• When you need to scan huge, two-dimensional, join-less tables

• Banking (consistency enforcement)

• Many implementations provide versioning facilities

• in Cassandra writing is faster than reading values (!)

SEE CASE STUDIES LATER IN THIS LECTURE

Page 36: NoSQL

Superset of key-value DBs: you can also query on the value part

the document portion is structured

Think about documents as tuples with any number of fields (JSON)

Documents can contain nested structures

Documents are often versioned

Different document databases take different approaches for indexing, querying, replication, consistency, etc.

choose wisely!

Implementations:

MongoDB, CouchDB, RavenDB
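
To show what querying on the value part means, here is a small Python sketch of a document store. `DocumentStore`, `find` and the dotted-path syntax are illustrative assumptions and do not reproduce the query language of any of the listed databases.

```python
_MISSING = object()   # marker for "this field does not exist in the document"


def _lookup(document, path):
    """Follow a dotted path such as 'author.city' through nested dictionaries."""
    current = document
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


class DocumentStore:
    """Documents are stored by id, but can also be found by their content."""

    def __init__(self):
        self._docs = {}

    def insert(self, doc_id, document):
        self._docs[doc_id] = document

    def find(self, path, expected):
        # Return the ids of all documents whose field at `path` equals `expected`.
        return [
            doc_id
            for doc_id, doc in self._docs.items()
            if _lookup(doc, path) == expected
        ]


db = DocumentStore()
db.insert("p1", {"title": "Hello", "author": {"name": "Ann", "city": "L'Aquila"}})
db.insert("p2", {"title": "Bye", "tags": ["x"]})   # different fields, no schema

by_city = db.find("author.city", "L'Aquila")
by_title = db.find("title", "Bye")
```

The two documents have different fields and nesting, yet both can be stored and queried; a document without the requested field is simply skipped.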

Page 37: NoSQL

PROS

• Variable data

• Object Oriented Paradigms

• Concurrency

• Works well with de-normalized data

CONS

• Hard to do complex queries

• No Joins

• Enforcing Structured Data

Page 38: NoSQL

• When you don’t know in advance what exactly your data will look like

• They map well to object-oriented programming models

• For accumulating, occasionally changing data, on which pre-defined queries are to be run

• Places where versioning is important

• Services that handle age difference, geographic location, tastes and dislikes, etc.

• A leaderboard system that depends on many variables

SEE CASE STUDIES LATER IN THIS LECTURE

Page 39: NoSQL

Focus on modeling the structure of data & interconnectivity

Inspired by mathematical graph theory (G = (V, E), with V the vertices and E the edges)

Data model is the Property Graph:

• Entities are nodes

• Relationships are edges between Nodes

• Key-Value pairs on both

Excels in dealing with highly interconnected data

Relational DBs can model graphs, but traversing an edge requires a join, which is expensive

Implementations:

Neo4J, OrientDB, FlockDB, Trinity

[Figure: example graph with nodes A, B, C, D, E connected by edges a, b, c, d, e]
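
The property graph model described above can be sketched in a few lines of Python. The `PropertyGraph` class, the `FOLLOWS` label and the sample data are hypothetical; graph databases such as the ones listed provide their own query languages and storage engines.

```python
class PropertyGraph:
    """Nodes and edges both carry key-value properties."""

    def __init__(self):
        self.nodes = {}   # node id -> properties
        self.edges = []   # (source, label, target, properties)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def add_edge(self, source, label, target, **properties):
        self.edges.append((source, label, target, properties))

    def neighbours(self, node_id, label):
        # Follow outgoing edges with the given label.
        return [
            target
            for source, edge_label, target, _ in self.edges
            if source == node_id and edge_label == label
        ]


g = PropertyGraph()
g.add_node("A", name="Ann")
g.add_node("B", name="Bob")
g.add_node("C", name="Cleo")
g.add_edge("A", "FOLLOWS", "B", since=2012)
g.add_edge("B", "FOLLOWS", "C", since=2013)

# Two-hop traversal: who is followed by the people that A follows?
second_degree = [
    far
    for near in g.neighbours("A", "FOLLOWS")
    for far in g.neighbours(near, "FOLLOWS")
]
```

The two-hop query is a plain walk along edges; in a relational schema the same question would need a self-join on the relationship table for every hop.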

Page 41: NoSQL

PROS

• Easy match with the problem domain
  – with relational, you have to create an ER diagram, then normalize, etc.

• Ability to quickly traverse nodes and relationships to find relevant data
  – you can apply Dijkstra's algorithm for querying the DB

• Fit well with object-oriented concepts

• Neo4J has full ACID conformity

CONS

• generally not suitable for network partitioning – due to the high interconnectedness

• No Joins

• Enforcing Structured Data
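
The PROS above mention Dijkstra's algorithm as a way to query a graph. A compact Python version is sketched below on a hypothetical weighted road map; it works on a plain adjacency dictionary and is not tied to the API of any graph database.

```python
import heapq


def dijkstra(graph, source):
    """Return the shortest distance from source to every reachable node.

    graph maps each node to a dict {neighbour: non-negative edge weight}.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue   # stale heap entry: a shorter path was already found
        for neighbour, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return dist


# Hypothetical road map with travel costs on the edges.
roads = {
    "A": {"B": 4, "C": 1},
    "B": {"D": 1},
    "C": {"B": 2, "D": 7},
    "D": {},
}
distances = dijkstra(roads, "A")
```

Here the cheapest route from A to D goes through C and B, with a total cost of 4, rather than through the direct but more expensive edges.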

Page 42: NoSQL

• Social networks

• Recommendation engines

• Geographic data

• Public transport links

• Road maps

• Network topologies

SEE CASE STUDIES LATER IN THIS LECTURE

Page 45: NoSQL

http://goo.gl/0JoW8

Page 46: NoSQL

Why, When, Who NOSQL (now)?

The CAP Theorem

NOSQL Approaches

Case Study 1: Instagram

Case Study 2: Twitter

Case Study 3: tumblr

Summary

References

Page 47: NoSQL

http://goo.gl/xpPac

Page 48: NoSQL

http://goo.gl/xpPac

http://goo.gl/mkfQN

key-value

key-value (in the cloud)

relational

Page 50: NoSQL

http://goo.gl/2kdvm

key-value

graph

columnar

plus Blobstore!

Page 51: NoSQL

http://goo.gl/CrC0P

Page 52: NoSQL

http://goo.gl/CrC0P

key-value

columnar

relational

Page 53: NoSQL

Why, When, Who NOSQL (now)?

The CAP Theorem

NOSQL Approaches

Case Study 1: Instagram

Case Study 2: Twitter

Case Study 3: tumblr

Summary

References

Page 54: NoSQL

SCALABILITY - SCALABILITY – SCALABILITY

SCALABILITY - SCALABILITY - SCALABILITY

SCALABILITY - SCALABILITY – SCALABILITY

...usually at the cost of consistency

NOSQL is not the silver bullet for everything

Polyglot data is the new main trend...

...in 10 years the majority of IT solutions will still be based on RDBMS

both to size and complexity

Page 56: NoSQL

simply drop a line to

[email protected]

Page 57: NoSQL

Chapters 1 and 9

http://goo.gl/ThO63

http://nosql-database.org/

check out my blog for these slides

www.ivanomalavolta.com

Page 58: NoSQL

Neo4j - http://neo4j.org

OrientDB - http://www.orientdb.org

VoltDB - http://www.voltdb.com

CouchDB - http://couchdb.apache.org

Cassandra - http://cassandra.apache.org

Riak - http://www.basho.com

HBase - http://hbase.apache.org

MongoDB - http://www.mongodb.org

Redis - http://code.google.com/p/redis

Oracle Berkeley DB - http://www.oracle.com/database/berkeley-db

FlockDB - http://github.com/twitter/flockdb