Post on 27-May-2020

Page 1: IKT437 Knowledge Engineering and Representation (grimstad.uia.no/janpn/IKT437/2016/slides/pdf/NoSQL-2016.pdf)

IKT437

Knowledge Engineering and Representation

NoSQL ~ No SQL or Not Only SQL

Jan Pettersen Nytun, UiA

Page 2:

Overview

• Introduction and Motivation

• Types of NoSQL

• What's the most popular NoSQL database?

• Example – The Graph Database Neo4J

Page 3:

Some Possible Characteristics

Not every system supports all of these characteristics.

• Non-relational

• Flexible schema

• Other or additional query languages than SQL

• Distributed – horizontal scaling

• Less structured data

• Supports big data

NoSQL comes in many different variants

Page 4:

The Benefits of NoSQL [https://www.mongodb.com/nosql-explained]

When compared to relational databases, NoSQL databases are more scalable and provide superior performance, and their data model addresses several issues that the relational model is not designed to address:

• Geographically distributed architecture instead of expensive, monolithic architecture

• Large volumes of rapidly changing structured, semi-structured, and unstructured data

• Agile sprints, quick schema iteration, and frequent code pushes

• Object-oriented programming that is easy to use and flexible

Page 5:

[ref: http://www.cs.tut.fi/~tjm/seminars/nosql2012/NoSQL-Intro.pdf]

Page 6:

Overview

• Introduction and Motivation

• Types of NoSQL

• What's the most popular NoSQL database?

• Example – The Graph Database Neo4J

Page 7:

NoSQL Database Types [https://www.mongodb.com/nosql-explained]

• Graph stores are used to store information about networks of data, such as social connections. Graph stores include Neo4J and triple stores like Fuseki.

• Document databases pair each key with a complex data structure known as a document.

• Key-value stores are the simplest NoSQL databases. Every single item in the database is stored as an attribute name (or 'key'), together with its value. Examples of key-value stores are Riak and Berkeley DB.

• Wide-column stores such as Cassandra and HBase are optimized for queries over large datasets, and store columns of data together, instead of rows.

Page 8:

Document Store

• The central concept is the notion of a "document", which roughly corresponds to a row in an RDBMS.

• Documents are encoded in a standard format such as JSON (or BSON, the binary JSON encoding used by MongoDB).

• Each document is addressed in the database via a unique key that represents that document.

• The database offers an API or query language that retrieves documents based on their contents.

• Documents are schema-free, i.e., different documents can have structures and schemas that differ from one another. (An RDBMS requires that each row contain the same columns.)
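
The points above (unique key, content-based retrieval, no fixed schema) can be illustrated with a small in-memory sketch in plain Java. This is not the API of MongoDB or of any real document database: the class DocumentStoreSketch, its methods, the keys "doc1"/"doc2" and the sample field values are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory document store; all names here are hypothetical. */
final class DocumentStoreSketch {
    // Each document is a free-form map of field names to values, stored under a unique key.
    private final Map<String, Map<String, Object>> documents = new LinkedHashMap<>();

    /** Stores a document under its unique key. */
    void put(String id, Map<String, Object> document) {
        documents.put(id, document);
    }

    /** Key-based addressing: returns the document, or null if the key is unknown. */
    Map<String, Object> get(String id) {
        return documents.get(id);
    }

    /** Content-based retrieval: keys of all documents whose field equals the value. */
    List<String> findByField(String field, Object value) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, Map<String, Object>> entry : documents.entrySet()) {
            if (value.equals(entry.getValue().get(field))) {
                ids.add(entry.getKey());
            }
        }
        return ids;
    }

    /** Two documents with different fields, which a fixed table schema would not allow. */
    static DocumentStoreSketch example() {
        Map<String, Object> article = new LinkedHashMap<>();
        article.put("type", "Article");
        article.put("author", "A. Author");
        article.put("body", "Some text");

        Map<String, Object> book = new LinkedHashMap<>();
        book.put("type", "Book");
        book.put("author", "A. Author");
        book.put("isbn", "000-0-0000000-0-0"); // placeholder value, not a real ISBN

        DocumentStoreSketch store = new DocumentStoreSketch();
        store.put("doc1", article);
        store.put("doc2", book);
        return store;
    }

    public static void main(String[] args) {
        DocumentStoreSketch store = example();
        System.out.println(store.findByField("author", "A. Author"));
        System.out.println(store.findByField("type", "Book"));
        System.out.println(store.get("doc1").containsKey("isbn"));
    }
}
```

In this sketch the "isbn" field exists only on the book document; nothing has to be declared up front for that to work.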

Page 9:

MongoDB: two documents (JSON):

{
    _id: ObjectId("51156a1e056d6f966f268f81"),
    type: "Article",
    author: "Derick Rethans",
    title: "Introduction to Document Databases with MongoDB",
    date: ISODate("2013-04-24T16:26:31.911Z"),
    body: "This arti…"
},
{
    _id: ObjectId("51156a1e056d6f966f268f82"),
    type: "Book",
    author: "Derick Rethans",
    title: "php|architect's Guide to Date and Time Programming with PHP",
    isbn: "978-0-9738621-5-7"
}

Page 10:

Overview

• Introduction and Motivation

• Types of NoSQL

• What's the most popular NoSQL database?

• Example – The Graph Database Neo4J

Page 11:

What's the most popular NoSQL database? [https://www.quora.com/Whats-the-most-popular-NoSQL-database]

Vadim Ismakaev, Co-Founder at Grace. Updated Apr 27, 2015

• Asking “what NoSQL database is the most popular” is a bit incorrect since different problems require different types of NoSQL solutions. …focus on solving very specific problems. While this allows to achieve the best possible results in those specific cases, it comes at a cost of some other functionalities.

Page 12:

So - what's the most popular NoSQL database?

Top NoSQL Database Engines, by

http://www.kdnuggets.com/2016/06/top-nosql-database-engines.html

Next Two Slides:

Page 13:

Method of calculating the scores of the DB-Engines Ranking [http://db-engines.com/en/ranking_definition]

We measure the popularity of a system by using the following parameters:

• Number of mentions of the system on websites, …

• General interest in the system. For this measurement, we use the frequency of searches in Google Trends.

• Frequency of technical discussions about the system... Stack Overflow …

• Number of job offers, in which the system is mentioned...

• Number of profiles in professional networks, in which the system is mentioned... LinkedIn …

• Relevance in social networks. We count the number of Twitter tweets, in which the system is mentioned.

Page 14:

The DB-Engines Ranking ranks database management systems of all kinds according to their popularity, not only NoSQL databases

[http://db-engines.com/en/ranking_trend]

Page 15:

• Document databases: MongoDB

• Wide-column stores: Cassandra and HBase

• Key-value stores: Redis

• Graph databases: Neo4j

[http://www.kdnuggets.com/2016/06/top-nosql-database-engines.html]

Page 16:

Overview

• Introduction and Motivation

• Types of NoSQL

• What's the most popular NoSQL database?

• Example – The Graph Database Neo4J

Page 17:

Neo4J

• Graph-oriented

• Implemented in Java and accessible from software written in other languages using the Cypher query language through a transactional HTTP endpoint.

• ACID-compliant transactional database with native graph storage and processing.

• The most popular graph database.

• Everything is stored as an edge, a node or an attribute.

• Each node and edge can have any number of attributes.

• Both the nodes and edges can be labelled.

• Labels can be used to narrow searches.
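
As a rough illustration of the last four points, the sketch below models nodes and edges with labels and key-value attributes in plain Java, and uses a label index so that a search only looks at nodes carrying the requested label. It is not Neo4j's API: PropertyGraphSketch, its nested classes, the method names and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy property graph; every name here is hypothetical and unrelated to Neo4j's real API. */
final class PropertyGraphSketch {
    /** A node with labels and key-value attributes. */
    static final class Node {
        final Set<String> labels = new HashSet<>();
        final Map<String, Object> properties = new HashMap<>();
    }

    /** A directed, labelled edge that can carry its own attributes. */
    static final class Edge {
        final Node start;
        final String label;
        final Node end;
        final Map<String, Object> properties = new HashMap<>();

        Edge(Node start, String label, Node end) {
            this.start = start;
            this.label = label;
            this.end = end;
        }
    }

    final List<Node> nodes = new ArrayList<>();
    final List<Edge> edges = new ArrayList<>();
    // Label index, so that a labelled search never looks at nodes with other labels.
    private final Map<String, List<Node>> nodesByLabel = new HashMap<>();

    Node addNode(String label, String name) {
        Node node = new Node();
        node.labels.add(label);
        node.properties.put("name", name);
        nodes.add(node);
        nodesByLabel.computeIfAbsent(label, k -> new ArrayList<>()).add(node);
        return node;
    }

    Edge addEdge(Node start, String label, Node end) {
        Edge edge = new Edge(start, label, end);
        edges.add(edge);
        return edge;
    }

    /** Searches only the nodes that carry the given label. */
    List<Node> find(String label, String key, Object value) {
        List<Node> hits = new ArrayList<>();
        for (Node node : nodesByLabel.getOrDefault(label, Collections.<Node>emptyList())) {
            if (value.equals(node.properties.get(key))) {
                hits.add(node);
            }
        }
        return hits;
    }

    /** Illustrative data: a character and an episode that share the name "Rose". */
    static PropertyGraphSketch example() {
        PropertyGraphSketch graph = new PropertyGraphSketch();
        Node doctor = graph.addNode("Character", "the Doctor");
        Node rose = graph.addNode("Character", "Rose");
        Node episode = graph.addNode("Episode", "Rose");
        graph.addEdge(rose, "COMPANION_OF", doctor).properties.put("since", 2005);
        graph.addEdge(doctor, "APPEARED_IN", episode);
        graph.addEdge(rose, "APPEARED_IN", episode);
        return graph;
    }

    public static void main(String[] args) {
        PropertyGraphSketch graph = example();
        System.out.println(graph.find("Character", "name", "Rose").size());
        System.out.println(graph.find("Episode", "name", "Rose").size());
    }
}
```

Two nodes share the name "Rose" here; adding the label to the search is what separates the character from the episode.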

Page 18:

Neo4J

The following slides are copied from a presentation made by Jim Webber

Page 19:

[Figure: example Doctor Who graph. Relationship labels shown: stole from, loves, enemy, appeared in, companion. Episode nodes shown: A Good Man Goes to War, Victory of the Daleks.]

Page 20:

Property Graph Model

Page 21:

Property Graph Model

Page 22:

Property Graph Model

name: the Doctor

age: 907

species: Time Lord

first name: Rose

last name: Tyler

vehicle: tardis

model: Type 40

Page 23:

Graphs are very whiteboard-friendly

Page 24:

What’s Neo4j?

• It’s a graph database

• Runs embedded or as a server

• Full ACID transactions

– don’t mess around with durability, ever.

• Schema free

Page 25:

More on Neo4j

• Neo4j is stable

– In 24/7 operation since 2003

• Neo4j is under active development

• High performance graph operations

– Traverses 1,000,000+ relationships / second on commodity hardware

Page 26:

Neo4j Logical Architecture

[Figure: layered architecture diagram. Components shown: client languages (Java, Ruby, Clojure…), REST API, JVM Language Bindings, Traversal Framework, Graph Matching, Core API, Caches, Memory-Mapped (N)IO, Filesystem.]

Page 27:

Neo4J – Application Programming

• Through the Java APIs

– JVM languages have bindings to the same APIs

• JRuby, Jython, Clojure, Scala…

• Managing nodes and relationships

• Indexing

• Traversing

• Path finding

• Pattern matching

Page 28:

Core API

• Deals with graphs in terms of their fundamentals:

  – Nodes

    • Properties

      – KV pairs

  – Relationships

    • Start node

    • End node

    • Properties

      – KV pairs

Page 29:

Creating Nodes

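// Neo4j 1.x embedded Java API, as shown on the original slide; later
// Neo4j releases changed how the database and transactions are opened and closed.
// Needs: org.neo4j.graphdb.{GraphDatabaseService, Node, Transaction}
//        and org.neo4j.kernel.EmbeddedGraphDatabase
GraphDatabaseService db = new EmbeddedGraphDatabase("/tmp/neo");
Transaction tx = db.beginTx();
try {
    Node theDoctor = db.createNode();
    theDoctor.setProperty("character", "the Doctor");
    tx.success();  // mark the transaction as successful
} finally {
    tx.finish();   // commit if success() was called, otherwise roll back
}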

Page 30:

Creating Relationships

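// Continues the previous slide: db is the open GraphDatabaseService.
// Also needs org.neo4j.graphdb.DynamicRelationshipType.
Transaction tx = db.beginTx();
try {
    Node theDoctor = db.createNode();
    theDoctor.setProperty("character", "The Doctor");
    Node susan = db.createNode();
    susan.setProperty("firstname", "Susan");
    susan.setProperty("lastname", "Campbell");
    // Directed relationship: (susan)-[COMPANION_OF]->(theDoctor)
    susan.createRelationshipTo(theDoctor,
        DynamicRelationshipType.withName("COMPANION_OF"));
    tx.success();
} finally {
    tx.finish();
}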

Page 31:

Indexing a Graph?

• Graphs are their own indexes!

• But sometimes we want short-cuts to well-known nodes

• Can do this in our own code

– Just keep a reference to any interesting nodes
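
One way to read "keep a reference to any interesting nodes" is an application-side lookup table from a chosen key to a node, from which traversals can start directly. The sketch below shows that idea in plain Java under that assumption; WellKnownNodesSketch and its Node class are hypothetical stand-ins, not Neo4j types.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Keeps references to "interesting" nodes so traversals can start from them directly. */
final class WellKnownNodesSketch {
    /** Minimal stand-in for a graph node; not Neo4j's Node interface. */
    static final class Node {
        final String name;
        final List<Node> neighbours = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }
    }

    // Application-side shortcut table from a chosen key to a node reference.
    private final Map<String, Node> wellKnown = new HashMap<>();

    void remember(String key, Node node) {
        wellKnown.put(key, node);
    }

    /** Returns the remembered node, or null if the key is unknown. */
    Node lookup(String key) {
        return wellKnown.get(key);
    }

    /** Starts at a remembered node and follows its edges one step. */
    List<String> neighbourNames(String key) {
        List<String> names = new ArrayList<>();
        Node start = lookup(key);
        if (start != null) {
            for (Node neighbour : start.neighbours) {
                names.add(neighbour.name);
            }
        }
        return names;
    }

    /** Illustrative data only. */
    static WellKnownNodesSketch example() {
        Node doctor = new Node("the Doctor");
        doctor.neighbours.add(new Node("Rose"));
        doctor.neighbours.add(new Node("Susan"));
        WellKnownNodesSketch shortcuts = new WellKnownNodesSketch();
        shortcuts.remember("doctor", doctor);
        return shortcuts;
    }

    public static void main(String[] args) {
        System.out.println(example().neighbourNames("doctor"));
    }
}
```

Only the entry point is looked up by key; everything after that is reached by following edges, which is the sense in which the graph acts as its own index.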

Page 32:

Why graph matching?

• It’s super-powerful for looking for patterns in a data set

– E.g. retail analytics

• Higher-level abstraction than raw traversers

– You do less work!
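
To give a feel for why matching is a higher-level abstraction, the sketch below implements a very small pattern matcher in plain Java: the caller declares a pattern of edges with variables (terms starting with "?"), and the matcher does the traversal and bookkeeping. It is a simple backtracking search written for illustration, not Neo4j's graph-matching component; the class GraphMatchSketch, the "BOUGHT" label and the shopping data are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Tiny backtracking pattern matcher over labelled edges; all names are hypothetical. */
final class GraphMatchSketch {
    /** A directed, labelled edge between two named nodes (or pattern terms). */
    static final class Edge {
        final String from;
        final String label;
        final String to;

        Edge(String from, String label, String to) {
            this.from = from;
            this.label = label;
            this.to = to;
        }
    }

    /** Returns every variable binding under which all pattern edges occur in the graph. */
    static List<Map<String, String>> match(List<Edge> graph, List<Edge> pattern) {
        List<Map<String, String>> results = new ArrayList<>();
        extend(graph, pattern, 0, new HashMap<String, String>(), results);
        return results;
    }

    // Tries to match pattern edge number `index` against every graph edge, then recurses.
    private static void extend(List<Edge> graph, List<Edge> pattern, int index,
                               Map<String, String> bindings, List<Map<String, String>> results) {
        if (index == pattern.size()) {
            results.add(new HashMap<>(bindings));
            return;
        }
        Edge wanted = pattern.get(index);
        for (Edge edge : graph) {
            if (!wanted.label.equals(edge.label)) {
                continue;
            }
            Map<String, String> next = new HashMap<>(bindings);
            if (bind(wanted.from, edge.from, next) && bind(wanted.to, edge.to, next)) {
                extend(graph, pattern, index + 1, next, results);
            }
        }
    }

    /** Unifies a pattern term with a node name; terms starting with '?' are variables. */
    private static boolean bind(String term, String value, Map<String, String> bindings) {
        if (!term.startsWith("?")) {
            return term.equals(value);
        }
        String bound = bindings.putIfAbsent(term, value);
        return bound == null || bound.equals(value);
    }

    /** Illustrative retail data: who bought what. */
    static List<Edge> exampleGraph() {
        List<Edge> graph = new ArrayList<>();
        graph.add(new Edge("alice", "BOUGHT", "tea"));
        graph.add(new Edge("alice", "BOUGHT", "biscuits"));
        graph.add(new Edge("bob", "BOUGHT", "tea"));
        graph.add(new Edge("bob", "BOUGHT", "milk"));
        graph.add(new Edge("carol", "BOUGHT", "milk"));
        return graph;
    }

    /** Pattern: some customer bought `product` and also bought ?other. */
    static List<Edge> alsoBought(String product) {
        List<Edge> pattern = new ArrayList<>();
        pattern.add(new Edge("?customer", "BOUGHT", product));
        pattern.add(new Edge("?customer", "BOUGHT", "?other"));
        return pattern;
    }

    public static void main(String[] args) {
        for (Map<String, String> binding : match(exampleGraph(), alsoBought("tea"))) {
            System.out.println(binding.get("?customer") + " also bought " + binding.get("?other"));
        }
    }
}
```

The pattern itself is two lines of data. Note that this naive matcher also reports the trivial case where ?other is the product itself; a real query would filter that out.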