triple stores in a nutshell - ktikti.tugraz.at/.../2012/06/...in_a_nutshell-public.pdf · triple...

Knowledge Technologies Institute

Triple Stores in a Nutshell

Franjo Bratić

Alfred Wertner


Overview

What are the essential characteristics of a Triple Store?

– short introduction

– examples and background information

“The Agony of choice” - what’s on the market? Which one fits me?

- Few examples

Benchmark - Example

Live Demo With AllegroGraph

– Import data

– Use Java Client API and run some queries


Motivation

RDF is good at modeling assertions

– RDF consists of assertions

– Aka triples

Application developers need tools which can manage RDF data

– Import/Export

– Query

– Update

– …

http://www.franz.com/agraph/support/documentation/current/agraph-introduction.html
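
As a rough illustration of "assertions aka triples", the sketch below models each assertion as a (subject, predicate, object) tuple and shows import, query and update as plain set operations. The names `Triple` and `match` and the sample data are assumptions made for this example; they are not part of any triple store API.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [t for t in graph
            if (s is None or t.subject == s)
            and (p is None or t.predicate == p)
            and (o is None or t.obj == o)]


graph = set()

# Import: add assertions to the graph.
graph.add(Triple("Angel", "petOf", "Ground Chuck"))
graph.add(Triple("Ground Chuck", "Type", "Human"))

# Query: find every assertion that uses the predicate "petOf".
pets = match(graph, p="petOf")

# Update: retract one assertion.
graph.discard(Triple("Ground Chuck", "Type", "Human"))
```

A real triple store adds persistence, indexing and a query language on top of this basic model.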


Triple Stores: Essentials

Triple Stores are tools for RDF Data Management

Essential characteristics:

Persist RDF Data

– Native Storage Design (Graph Database)

– Use Relational Database

Query and update the graph

– Support SPARQL


Persist RDF Data: Native Store

Designed for storing graphs

Block diagram of a native store implementation

http://www.franz.com/agraph/support/documentation/current/agraph-introduction.html


Persist RDF Data: Quads

A quad extends a triple with context information

– Fast retrieval of triples

Supported by many Triple Stores

Is not part of RDF!

“Get everything about Chuck’s home page”

Subject        Predicate    Object         Context
Ground Chuck   Type         Human          Chuck’s home page
Angel          petOf        Ground Chuck   Chuck’s home page
petOf          inverseOf    hasPet         English grammar
Dog            subClassOf   Mammal         science
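
The request "Get everything about Chuck's home page" can be sketched as a lookup in an index keyed by context. This is a minimal in-memory illustration using the four quads from the table; the variable names are assumptions, and real stores implement context indexes very differently.

```python
from collections import defaultdict

# Quads from the table: (subject, predicate, object, context).
quads = [
    ("Ground Chuck", "Type", "Human", "Chuck's home page"),
    ("Angel", "petOf", "Ground Chuck", "Chuck's home page"),
    ("petOf", "inverseOf", "hasPet", "English grammar"),
    ("Dog", "subClassOf", "Mammal", "science"),
]

# Group triples by context so that one lookup returns
# everything asserted within that context.
by_context = defaultdict(list)
for s, p, o, c in quads:
    by_context[c].append((s, p, o))

chucks_page = by_context["Chuck's home page"]
```

Without the fourth element, answering the same request would require some other way of recording where each triple came from.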


Persist RDF Data: RDBMS

Stores triples in a relational database

Can you think of a simple way to achieve that?
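
One simple answer to this question is a single three-column table, sketched here with Python's built-in `sqlite3` module. The table name, column names and the index are assumptions for illustration; this is not necessarily how Jena SDB, Virtuoso or any other product lays out its data.

```python
import sqlite3

# In-memory database, so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")

# One row per triple; the primary key rejects duplicate assertions.
conn.execute("""
    CREATE TABLE triples (
        subject   TEXT NOT NULL,
        predicate TEXT NOT NULL,
        object    TEXT NOT NULL,
        PRIMARY KEY (subject, predicate, object)
    )
""")

# A second index helps lookups that start from predicate and object.
conn.execute("CREATE INDEX idx_pred_obj ON triples (predicate, object)")

conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("Angel", "petOf", "Ground Chuck"),
        ("Ground Chuck", "Type", "Human"),
        ("Dog", "subClassOf", "Mammal"),
    ],
)

# "Who is a pet of Ground Chuck?" expressed as a triple pattern in SQL.
rows = conn.execute(
    "SELECT subject FROM triples WHERE predicate = ? AND object = ?",
    ("petOf", "Ground Chuck"),
).fetchall()
```

The drawback of this layout is that graph patterns with several triples turn into self-joins on the same table, which is one reason native stores exist.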


Query and update the Graph: SPARQL

SPARQL support

– SPARQL Query Language

– SPARQL Protocol

SPARQL Protocol

– Query and update operations based on HTTP

– Between client and SPARQL endpoint

SPARQL Query Language

– Queries: SELECT, ASK, DESCRIBE, CONSTRUCT

– Updates: INSERT, DELETE
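
To make "query and update operations based on HTTP" concrete, the sketch below builds the two kinds of request with Python's standard library, without sending them. The endpoint URL and the `example.org` IRIs are hypothetical, and some servers expose updates on a separate URL from queries; the `query` and `update` parameter names come from the SPARQL Protocol.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://localhost:8080/sparql"  # hypothetical endpoint URL


def query_request(endpoint, sparql):
    """Build a query request: HTTP GET with a 'query' parameter."""
    url = endpoint + "?" + urlencode({"query": sparql})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def update_request(endpoint, sparql_update):
    """Build an update request: HTTP POST with a form-encoded 'update' parameter."""
    body = urlencode({"update": sparql_update}).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


select = query_request(ENDPOINT, "SELECT ?s WHERE { ?s ?p ?o } LIMIT 10")
insert = update_request(
    ENDPOINT,
    "INSERT DATA { <http://example.org/Angel> "
    "<http://example.org/petOf> <http://example.org/GroundChuck> }",
)
```

Passing either object to `urllib.request.urlopen` would send it to the endpoint, assuming a server is actually listening there.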


Triple Stores …


The agony of choice …

Are there differences?

Is one of them „the right one“?

How to choose one for the project?

- Requirements / criteria?

- Environment of use?

- Performance?

- Costs?

- …


Set some criteria …

Scalability

- Persistent stores scale better than in-memory stores

Interoperability & portability

- Programming language !!!

- commit to using the entire stack of a store

Optimization

- native stores vs. 3rd party stores

License, Support, Community, …

… only a few left!


AllegroGraph v4.9

load, store, query RDF data

includes an implementation of Prolog

runs natively on 64-bit Linux (x86-64)

Interfaces: Java, Python, Ruby, Perl, C#, Clojure, Common Lisp

Tools: AGWebView, Gruff, …

License: free for fewer than 50 million triples


AllegroGraph v4.9

http://www.franz.com/agraph/allegrograph/ag_client-server_arch_4.2.2.png






OpenLink Virtuoso v6.2

high-performance object-relational SQL database

written in C

distributions for Unix & Windows

Access through: Jena & Sesame

Tools: ISQL, Graphical Conductor

License: GPL v2 & commercial


OpenLink Virtuoso v6.2

http://virtuoso.openlinksw.com/images/varch625.jpg




Jena

Java-based open source framework

represents RDF graphs as native models:

- In-memory

- other data sources (file, database)

Framework includes:

- RDF API

- Reading and writing RDF in RDF/XML, N3 and N-Triples

- OWL API

- In-memory and persistent storage

- Rule-based inference engine

- Query engine that implements the SPARQL specification


Jena TDB

high performance, pure-Java

non-SQL storage subsystem

persistent graph storage layer for Jena

works with Jena SPARQL query engine (ARQ)

number of extensions (e.g. property functions, aggregates, arbitrary length property paths)

custom B+tree implementation

License: BSD license
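
Why a B+tree suits triple storage can be sketched with a sorted index: keys that share a prefix sit next to each other, so a pattern lookup becomes one seek plus a short scan. In the sketch below a sorted Python list with `bisect` stands in for the ordered leaf level of a B+tree. The (predicate, object, subject) key order and the function name are assumptions for illustration, and this is not TDB's actual implementation.

```python
import bisect

triples = [
    ("Angel", "petOf", "Ground Chuck"),
    ("Dog", "subClassOf", "Mammal"),
    ("Ground Chuck", "Type", "Human"),
]

# Index keyed by (predicate, object, subject), kept in sorted order.
pos_index = sorted((p, o, s) for s, p, o in triples)


def scan_prefix(index, prefix):
    """Return all keys in a sorted index that start with the given prefix tuple."""
    # Seek: a shorter tuple sorts before any longer tuple it is a prefix of.
    start = bisect.bisect_left(index, prefix)
    result = []
    # Scan: matching keys are contiguous, so stop at the first mismatch.
    for key in index[start:]:
        if key[:len(prefix)] != prefix:
            break
        result.append(key)
    return result


pets = scan_prefix(pos_index, ("petOf",))
```

Keeping the same triples under several key orders lets a store answer different triple patterns with the same seek-and-scan approach, at the cost of extra storage.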


Jena SDB

is basically a Java loader

Multiple stores supported

- e.g. MySQL, PostgreSQL, Oracle, DB2, Apache Derby, …

provides scalable storage & query of RDF datasets using conventional SQL databases

database tools for load balancing, security, clustering, backup and administration can all be used to manage the installation

designed specifically to support SPARQL


Sesame

framework for processing RDF data

- parsing, storing, inference & querying

runs on top of a variety of storage systems

- relational databases, in-memory, file systems, keyword indexers, …

wide range of tools

- HTTP, SOAP, RMI access

supports 100% of SPARQL (since 2008)

supports the main RDF file formats

- RDF/XML, Turtle, N-Triples, TriG & TriX, …


Sesame

runs as a Java Servlet application in Apache Tomcat

communication over HTTP

http://www.openrdf.org/doc/sesame/users/figures/sesame-server.png






Sesame

Sesame’s overall architecture

http://www.openrdf.org/doc/sesame/users/figures/sesame-arch.png






Benchmark

Which data should be used?

- Lehigh University Benchmark (LUBM)

  - 14 test queries

- Berlin SPARQL Benchmark (BSBM)

  - 12 test queries

- “real-world” data

  - e.g. DBPedia, WordNet, …

Who is testing?

- no central institution

- tests are (mostly) run only by the store creators themselves and can be manipulated

Testing architecture?


Benchmark

Not considered in almost all benchmarks:

- RDFS reasoning

- SPARQL 1.1

- Heavy load

- multiple queries in parallel

Conclusion of every benchmark in advance:

NO store wins in every field!!!


Benchmark example

“Yet Another Triple Store Benchmark”

http://mt.inf.tu-dresden.de/forschung/topics/bm/

Machine Hardware

– CPU: Intel® Xeon® CPU X5660 @ 2.80GHz x 4

– RAM: 16 GB

– Hard disk: 1 x 34 GB, 1 x 42 GB

Software

– OS: Ubuntu 12.04 LTS / 64 Bit

– JRE: JDK 1.7.0_04

– Apache Tomcat Ver. 7.0.28







Benchmark example stores

Fuseki (Jena TDB SPARQL Server) ver. 0.2.3

- TDB Loader of Jena TDB 0.9.0

NanoSPARQLServer of bigdata ver. 1.2.0

- deployed on a Tomcat server

OWLIM LITE ver. 5.0.5001

- via Sesame 2.6.5 deployed on a Tomcat server

OpenLink Virtuoso Ver. 6.01.3127


Benchmark example dataset

                             NYTimes   Jamendo   Movie DB   Yago 2 Core
N-Triples data size (MByte)     56.2     151.0      891.6       5,427.2
Triples (millions)              0.35      1.05       6.15         35.43
Instances (thousands)           13.2     290.4      665.4       2,648.4
Classes                           19        21         53       292,861
Properties                        69        47        222            93
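
The first two rows of the table give a quick feel for how verbose N-Triples is: dividing file size by triple count yields roughly 140 to 160 bytes per triple for all four datasets. The short calculation below uses only the table's figures and assumes 1 MByte = 1,000,000 bytes, since the slide does not say which convention it uses.

```python
# Figures copied from the dataset table:
# (N-Triples size in MByte, number of triples in millions).
datasets = {
    "NYTimes": (56.2, 0.35),
    "Jamendo": (151.0, 1.05),
    "Movie DB": (891.6, 6.15),
    "Yago 2 Core": (5427.2, 35.43),
}

# Assumption: decimal megabytes; the source does not state the convention.
BYTES_PER_MBYTE = 1_000_000

bytes_per_triple = {
    name: size_mb * BYTES_PER_MBYTE / (triples_mio * 1_000_000)
    for name, (size_mb, triples_mio) in datasets.items()
}

for name, value in bytes_per_triple.items():
    print(f"{name}: {value:.0f} bytes per triple")
```

If the sizes are binary megabytes instead, each value is about 5% higher; the order of magnitude stays the same.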


Benchmark example queries

Query 1-6

- generic queries

- same for each dataset

Query 7-13

- SPARQL 1.1 queries specialized for each dataset

Query 14 & 15

- SPARQL Update queries

- delete and insert some data in the graph


Load Time Result

http://mt.inf.tu-dresden.de/forschung/topics/bm/loading.pdf












Memory requirement

http://mt.inf.tu-dresden.de/forschung/topics/bm/memory.pdf












http://mt.inf.tu-dresden.de/forschung/topics/bm/queries_no_inf.pdf











Triple Store

DEMO!!!
