Big Data: Document and Graph DBs - CouchDB and OrientDB

Post on 22-Jan-2017

Category:

Software

Big Data

NoSQL Database Types: episode II

Content

▪ Document Store

▪ Graph DB

Graph

Graph DB

▪ Why Graph DB

▪ OrientDB

▪ OrientDB vs Neo4J

Graph DB: Why

Graph databases have been around for a long time, in some form.

Graph DB: Why

Can it handle complexity?

▪ Key/Value

▪ Column Store

▪ Document Store

cannot handle relations

▪ Graph Database!

Graph DB: RDBMS relations

Customer Address

Graph DB: 1 to 1

Customer

id  name     address_id
1   Tom VdB  5
2   Tom C.   4
3   Andriy   2

Address

id  address
2   Antwerp
4   Brussels
5   Essen

Graph DB: 1 to N

Customer

id  name
1   Tom
2   Andriy
3   Jos

Address

id  customer  location
1   3         Antwerp
2   3         Brussels
3   1         Rome

Graph DB: N to M

Customer

id  name
1   Tom
2   Andriy
3   Jos

CustomerAddress

customer  address
3         1
3         5
2         1

Address

id  location
1   Antwerp
5   Brussels

Graph DB: what is wrong

Graph DB: The join

These joins are all executed every time you traverse the relationship.

Graph DB: what is wrong

Chris De Bruyne
I have seen the movie, but I don't understand what this series of photos has to do with the topic on the slides. Maybe I should watch it again?
Tom Van den Bulck
They are meant to accompany the story in the meeting minutes: 1. the question 2. the quest 3. the solution - indexes (the little black knight) 4. but that one is not so easy to adapt

Graph DB: what is wrong

A join means searching for a key in another table

In order to improve performance one adds indexing

But that slows down inserts, updates and deletes
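The three statements above can be sketched in a few lines of Python, using the Customer, CustomerAddress and Address rows from the N to M slide. The function names and the dict-based index are made up for illustration; a real RDBMS would typically use a B-tree index instead.

```python
# Rows copied from the N to M slide.
customers = {1: "Tom", 2: "Andriy", 3: "Jos"}
customer_address = [(3, 1), (3, 5), (2, 1)]   # (customer id, address id)
addresses = {1: "Antwerp", 5: "Brussels"}


def addresses_scan(customer_id):
    """Join without an index: scan the whole link table on every traversal."""
    return [addresses[a] for (c, a) in customer_address if c == customer_id]


# Hypothetical index on CustomerAddress.customer. It speeds up reads, but it
# has to be maintained on every insert, update and delete.
index_by_customer = {}
for c, a in customer_address:
    index_by_customer.setdefault(c, []).append(a)


def addresses_indexed(customer_id):
    """Join with an index: one key lookup instead of a full scan."""
    return [addresses[a] for a in index_by_customer.get(customer_id, [])]


def link(customer_id, address_id):
    """An insert now costs two writes: the row and the index entry."""
    customer_address.append((customer_id, address_id))
    index_by_customer.setdefault(customer_id, []).append(address_id)


print(addresses_scan(3))      # addresses of Jos, found by scanning
print(addresses_indexed(3))   # same addresses, found through the index
```

Note that `link` has to touch both structures; that extra write is the cost the last statement refers to.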

Graph DB: index lookup

A-Z
  A-L
    A-D
      A-B
      C-D
    E-L
      E-G
        E-F
        G
      H-L
        H-J -> Jos
        K-L
  M-Z
    M-R
    S-Z
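A rough Python sketch of such a lookup: every step halves the range that can still contain "Jos". Binary search over a sorted list stands in for the index tree here (an assumption; real databases usually use B-trees), and the list of names is sample data.

```python
import math


def index_lookup(sorted_keys, key):
    """Binary search that records each range it still has to search."""
    lo, hi = 0, len(sorted_keys) - 1
    steps = []
    while lo <= hi:
        mid = (lo + hi) // 2
        steps.append((sorted_keys[lo], sorted_keys[hi]))
        if sorted_keys[mid] == key:
            return mid, steps
        if sorted_keys[mid] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, steps


names = sorted(["Andriy", "Chris", "Jos", "Marko", "Tom"])  # sample keys
position, steps = index_lookup(names, "Jos")
for first, last in steps:
    print(f"searching {first}..{last}")

# Upper bound on the number of halving steps for one lookup in a billion keys.
print(math.ceil(math.log2(10**9)))
```

One lookup stays cheap even for a billion keys, but a join repeats it for every record it scans.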

Graph DB: index lookup

Now imagine billions of records.

Graph DB: index lookup

This lookup is executed for every table involved in the join, multiplied by the number of scanned records.

Graph DB: What about document databases

{ "_id": 1, "name": "Tom", "address_id": 4 }

Graph DB: Is there a better way

“A graph database is any storage system that provides index-free adjacency.”

Marko Rodriguez, author of TinkerPop Blueprints

Graph DB: Is there a better way

Index-free relationships?

Graph DB: Back to school

Graph DB: Back to School

Tom --[lives in]--> Essen

Tom: I am a Vertex

Tom and Essen: We are vertices

lives in: An Edge

Graph DB: Back to School

Tom (Vertex)
  firstname: Tom
  surname: VdB
  company: Ordina

Essen (Vertex)
  population: 17000

lives in (Edge, Tom -> Essen)
  since: 1982
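The same vertices and edge as a minimal Python sketch. The class and field names are chosen for illustration only and are not an OrientDB API. Each vertex keeps direct references to its edges, which is the idea behind index-free adjacency.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Vertex:
    name: str
    properties: dict = field(default_factory=dict)
    out_edges: list = field(default_factory=list)  # direct references, no index
    in_edges: list = field(default_factory=list)


@dataclass(eq=False)
class Edge:
    label: str
    source: Vertex
    target: Vertex
    properties: dict = field(default_factory=dict)


def connect(source, label, target, **properties):
    """Create an edge and register it on both of its vertices."""
    edge = Edge(label, source, target, properties)
    source.out_edges.append(edge)
    target.in_edges.append(edge)
    return edge


tom = Vertex("Tom", {"firstname": "Tom", "surname": "VdB", "company": "Ordina"})
essen = Vertex("Essen", {"population": 17000})
connect(tom, "lives in", essen, since=1982)

# Traversal follows the references held by the vertex itself.
for edge in tom.out_edges:
    print(tom.name, edge.label, edge.target.name, edge.properties)
```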

Graph DB: Back to School

1 to N relationships

Tom --[lives in, since: 1982]--> Essen

Tom --[walked in, when: 1990, 1992]--> Essen

Graph DB: Back to School

Graph Example

Vertices: Tom, Ordina, meetup: bigdata.be

Edges: isMemberOf, Works For, Hosted By, VisitedOffice

Graph DB: Back to school

Congratulations - you have now graduated in graph theory

GraphDB: Index Lookup vs Relations

Graph DB: OrientDB

▪ How does OrientDB manage relationships

▪ Some Limits

▪ Hybrid

▪ Transactions and ACID

▪ Create the Graph

▪ Query vs Traversal

▪ Schema

OrientDB: Manage Relationships

Tom (Vertex)
  Rid: #13:35
  Label: “customer”
  Name: Tom

Essen (Vertex)
  Rid: #13:100
  Label: “city”
  Name: Essen

OrientDB: Manage Relationships

Tom (Vertex)
  Rid: #13:35
  Label: “customer”
  Name: Tom
  out: #14:3

Essen (Vertex)
  Rid: #13:100
  Label: “city”
  Name: Essen
  in: #14:3

Lives in (Edge)
  Rid: #14:3
  Label: “Lives in”
  out: #13:35
  in: #13:100
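A minimal Python sketch of why this needs no index: a RID already says where the record lives (cluster id and position), so following a relationship is a direct fetch. The sketch assumes OrientDB's `#cluster:position` RID notation and that an edge's `out` is the vertex it leaves (Tom) and its `in` the vertex it enters (Essen). The nested dicts are a stand-in for the real storage, and the helper names are hypothetical.

```python
# clusters[cluster_id][position] -> record; a RID is just that pair.
clusters = {13: {}, 14: {}}


def parse_rid(rid):
    """Turn '#13:35' into (13, 35)."""
    cluster_id, position = rid.lstrip("#").split(":")
    return int(cluster_id), int(position)


def load(rid):
    """Fetch a record straight from its cluster and position."""
    cluster_id, position = parse_rid(rid)
    return clusters[cluster_id][position]


clusters[13][35] = {"label": "customer", "name": "Tom", "out": ["#14:3"]}
clusters[13][100] = {"label": "city", "name": "Essen", "in": ["#14:3"]}
clusters[14][3] = {"label": "Lives in", "out": "#13:35", "in": "#13:100"}


def neighbours(rid):
    """Follow every outgoing edge of a vertex by RID, without any index."""
    vertex = load(rid)
    return [load(load(edge_rid)["in"]) for edge_rid in vertex.get("out", [])]


print([v["name"] for v in neighbours("#13:35")])
```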

OrientDB: Some Limits

Databases

Clusters

Records per cluster (Edges, Vertices and Documents)

Records per database

Record Size

Document Properties

Chris De Bruyne
Clusters: multi-master, masterless, master-slave? I would elaborate on this a bit, because the different options are
Chris De Bruyne
What is a database? The physical server or the OrientDB instance? Why would you use 1000 databases (hence the question: what is a database)?
Tom Van den Bulck
A database is the same thing as a database on Oracle. On an OrientDB instance you can, just like on Oracle, create several databases, each with its own schema.

OrientDB: Some Limits

Indexes

Queries

Concurrency Level

Chris De Bruyne
Exclusive lock on what? DB, "table", node?
Tom Van den Bulck
Rather that transactions come in and are processed one by one. OrientDB uses optimistic locking: on a write it checks whether the current state is still the same as the state you saw before you started your transaction.
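The optimistic locking described in the comment above can be sketched as a version check on write. This is a generic illustration in Python, not OrientDB's actual implementation; the `Store` class and the exception name are made up.

```python
class ConcurrentModificationError(Exception):
    """Raised when the record changed since it was read."""


class Store:
    def __init__(self):
        self._records = {}  # key -> (version, value)

    def insert(self, key, value):
        self._records[key] = (0, value)

    def read(self, key):
        return self._records[key]  # (version, value)

    def write(self, key, expected_version, value):
        """Only write if nobody else changed the record in the meantime."""
        current_version, _ = self._records[key]
        if current_version != expected_version:
            raise ConcurrentModificationError(key)
        self._records[key] = (current_version + 1, value)


store = Store()
store.insert("tom", {"city": "Essen"})

version_a, _ = store.read("tom")   # transaction A reads version 0
version_b, _ = store.read("tom")   # transaction B reads version 0
store.write("tom", version_a, {"city": "Antwerp"})       # A commits first
try:
    store.write("tom", version_b, {"city": "Brussels"})  # B is now stale
except ConcurrentModificationError:
    print("B must re-read the record and retry")
```

No lock is held between read and write; the price is that the losing transaction has to retry.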

OrientDB: Class - Records - Cluster

OrientDB: Hybrid Model

OrientDB: Transactions and ACID

OrientDB: Create the Graph - SQL

OrientDB: Create the Graph - Java

Chris De Bruyne
A small Spring Data example? Or was there only support for Neo4J?
Tom Van den Bulck
Just a simple example in Java - I will not be using Spring Data in workshops right away. I prefer to first explain things with the drivers supplied by the makers. OrientDB did create a Spring Data implementation: https://github.com/orientechnologies/spring-data-orientdb

OrientDB: Query vs Traversal

Calendar
  Year 2014
    Month 12/2014
      Day: 1 dec 2014
      Day: 6 dec 2014

Orders: Order 1, Order 2, Order 3

Edges: Special Order, Orders
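A small Python sketch of the difference, under assumptions: the slide only names the orders and the days, so which order belongs to which day is invented here, and nested dicts stand in for the Calendar, Year, Month and Day vertices.

```python
from datetime import date

# Hypothetical orders; the assignment of orders to days is an assumption.
orders = [
    {"id": 1, "day": date(2014, 12, 1)},
    {"id": 2, "day": date(2014, 12, 1)},
    {"id": 3, "day": date(2014, 12, 6)},
]


def query_orders(day):
    """Query style: test every order against the condition."""
    return [o["id"] for o in orders if o["day"] == day]


# Traversal style: Calendar -> Year -> Month -> Day -> Orders, built once.
calendar = {}
for o in orders:
    d = o["day"]
    days = calendar.setdefault(d.year, {}).setdefault(d.month, {})
    days.setdefault(d.day, []).append(o["id"])


def traverse_orders(day):
    """Walk down the calendar tree; other branches are never visited."""
    return calendar.get(day.year, {}).get(day.month, {}).get(day.day, [])


print(query_orders(date(2014, 12, 6)))
print(traverse_orders(date(2014, 12, 6)))
```

Both return the same orders; the query looks at every order, the traversal only at the branch it walks.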

OrientDB: Schema

▪ schema-full

▪ schema-mixed

▪ schema-less

OrientDB: Schema Design

Vertices: Jos, Tom, André

Edges: Sends Email to, Sends Email to

OrientDB: Schema Design

Vertices: Jos, Tom, André, Email

Edges: sends, TO, CC
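The two designs side by side, as a Python sketch with edges written as (source, label, target) tuples. Who sends to whom is not recoverable from the slides, so the sender and recipients below are assumptions, as is the `email-1` identifier.

```python
# Option 1: the email is an edge. Simple, but each edge has one recipient
# and there is no place to distinguish TO from CC.
edges = [
    ("Jos", "Sends Email to", "Tom"),
    ("Jos", "Sends Email to", "André"),
]

# Option 2: the email is a vertex with its own edges (sends, TO, CC).
email_edges = [
    ("Jos", "sends", "email-1"),
    ("email-1", "TO", "Tom"),
    ("email-1", "CC", "André"),
]


def recipients(email, kind):
    """Return the vertices an email points to through a TO or CC edge."""
    return [target for (source, label, target) in email_edges
            if source == email and label == kind]


print(recipients("email-1", "TO"))
print(recipients("email-1", "CC"))
```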

OrientDB: Gremlin

Pipeline of steps

▪ transform

▪ filter

▪ sideEffect

▪ branch
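The four kinds of steps can be imitated with plain Python generators. This is not Gremlin syntax, only a sketch of the pipeline idea; all function names and the sample data are made up.

```python
def transform(fn):
    """Turn each item into something else."""
    return lambda items: (fn(x) for x in items)


def keep(predicate):
    """Filter step: let only matching items through."""
    return lambda items: (x for x in items if predicate(x))


def side_effect(fn):
    """Do something with each item, then pass it on unchanged."""
    def step(items):
        for x in items:
            fn(x)
            yield x
    return step


def branch(predicate, if_true, if_false):
    """Send each item down one of two sub-pipelines."""
    def step(items):
        for x in items:
            chosen = if_true if predicate(x) else if_false
            yield from chosen([x])
    return step


def pipeline(source, *steps):
    for step in steps:
        source = step(source)
    return source


people = [  # sample data
    {"name": "Tom", "city": "Essen"},
    {"name": "Jos", "city": "Antwerp"},
    {"name": "Andriy", "city": "Essen"},
]
seen = []
result = list(pipeline(
    people,
    keep(lambda p: p["city"] == "Essen"),
    side_effect(lambda p: seen.append(p["name"])),
    transform(lambda p: p["name"]),
    branch(lambda n: n.startswith("T"), transform(str.upper), transform(str.lower)),
))
print(result)
```

Nothing is computed until `list` pulls items through the chain, one item at a time.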

OrientDB: Gremlin

Chris De Bruyne
Cooool,

OrientDB: Gremlin

Chris De Bruyne
What are n1, n2, n3?
Tom Van den Bulck
The top 3 graph DBs

Graph DB: Use Cases

▪ Recommendation engines

▪ Ranking/Credibility

▪ Path Finding (shortest, longest, mutual friends)

▪ Social (friendship, following, key connectors)
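Two of these use cases, shortest path and mutual friends, fit in a short Python sketch. The friendship graph is hypothetical, and breadth-first search is one common way to find a shortest path in an unweighted graph, not necessarily what a graph database uses internally.

```python
from collections import deque

friends = {  # hypothetical undirected friendship graph
    "Tom": {"Jos", "Andriy"},
    "Jos": {"Tom", "André"},
    "Andriy": {"Tom", "André"},
    "André": {"Jos", "Andriy", "Chris"},
    "Chris": {"André"},
}


def shortest_path(start, goal):
    """Breadth-first search: the first path that reaches the goal is a shortest one."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in sorted(friends[path[-1]]):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


def mutual_friends(a, b):
    """Friends that a and b have in common."""
    return friends[a] & friends[b]


print(shortest_path("Tom", "Chris"))
print(mutual_friends("Tom", "André"))
```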

Some code to play with

1. Go to https://github.com/tomvdbulck/orientdb_initiation

2. Make sure the following items have been installed on your machine:

o Java 7 or higher

o Git (if you like a pretty interface to deal with git, try SourceTree)

o Maven

3. Install VirtualBox https://www.virtualbox.org/wiki/Downloads

4. Install Vagrant https://www.vagrantup.com/downloads.html

5. Clone the repository into your workspace

6. Open a command prompt, go to the vagrant folder and run

vagrant up

7. This will start up the vagrant box. The first time will take a while (approx. 5 min) as it has to download the OS image and other dependencies.

Want More?

Even More?

Upcoming meetup on 17/06 - @ Ordina

1st meetup of Spark Belgium http://www.meetup.com/Spark-Belgium/events/222632697/

Want More?

Upcoming meetup hosted @ Ordina on Wednesday 24/06 - Neo4j

http://www.meetup.com/graphdb-belgium/events/222504421

Even More?

Upcoming workshop on 2/7 - @ Ordina

Introduction to Hadoop and its zoo

Questions or Suggestions?
