Knowledge Technologies Institute
Triple Stores in a Nutshell
Franjo Bratić
Alfred Wertner
Overview
What are the essential characteristics of a Triple Store?
– short introduction
– examples and background information
“The agony of choice”: what’s on the market? Which one fits me?
- a few examples
Benchmark: an example
Live demo with AllegroGraph
– Import data
– Use the Java Client API and run some queries
Motivation
RDF is good at modeling assertions
RDF consists of assertions
– aka triples
Application developers need tools which can manage RDF data
– Import/Export
– Query
– Update
– …
http://www.franz.com/agraph/support/documentation/current/agraph-introduction.html
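The three operations named above can be illustrated with a minimal sketch, assuming plain Python tuples stand in for triples. The `example.org` IRIs and the helper names `add`, `match` and `to_ntriples` are invented for illustration and belong to no real store's API; a real triple store adds persistence, indexing and a query language on top.

```python
# Minimal in-memory sketch: RDF assertions as (subject, predicate, object) tuples.
# The IRIs and all function names are illustrative assumptions.

EX = "http://example.org/"

graph = set()  # a graph is a set of triples, so duplicates collapse automatically


def add(s, p, o):
    """Update: assert one triple."""
    graph.add((s, p, o))


def match(s=None, p=None, o=None):
    """Query: return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def to_ntriples(triples):
    """Export: serialize IRI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


add(EX + "Angel", EX + "petOf", EX + "GroundChuck")
add(EX + "GroundChuck", EX + "type", EX + "Human")

pets = match(p=EX + "petOf")
print(to_ntriples(pets))
```

The sketch only handles IRIs; literals and blank nodes would need their own N-Triples syntax.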
Triple Stores: Essentials
Triple Stores are tools for RDF Data Management
Essential characteristics:
Persist RDF data
– Native storage design (graph database)
– Use a relational database
Query and update the graph
– Support SPARQL
Persist RDF Data: Native Store
Designed for storing graphs
Block diagram of a native store implementation
http://www.franz.com/agraph/support/documentation/current/agraph-introduction.html
Persist RDF Data: Quads
A quad extends a triple with context information
Fast retrieval of triples
Supported by many Triple Stores
Is not part of RDF!
“Get everything about Chuck’s home page”
Subject      | Predicate  | Object       | Context
Ground Chuck | Type       | Human        | Chuck’s home page
Angel        | petOf      | Ground Chuck | Chuck’s home page
petOf        | inverseOf  | hasPet       | English grammar
Dog          | subClassOf | Mammal       | science
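Why a context makes retrieval fast can be sketched in a few lines, assuming the store keeps an index keyed by context. The dictionary index and the function name `triples_in` are assumptions for illustration, not how any particular store implements quads.

```python
from collections import defaultdict

# Quads from the table above: (subject, predicate, object, context).
quads = [
    ("Ground Chuck", "Type", "Human", "Chuck's home page"),
    ("Angel", "petOf", "Ground Chuck", "Chuck's home page"),
    ("petOf", "inverseOf", "hasPet", "English grammar"),
    ("Dog", "subClassOf", "Mammal", "science"),
]

# Index by context: all triples of one context are found with a single
# dictionary lookup instead of a scan over every quad.
by_context = defaultdict(list)
for s, p, o, c in quads:
    by_context[c].append((s, p, o))


def triples_in(context):
    """Return the triples asserted in the given context (empty list if none)."""
    return by_context.get(context, [])


# "Get everything about Chuck's home page"
home_page = triples_in("Chuck's home page")
for triple in home_page:
    print(triple)
```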
Persist RDF Data: RDBMS
Store triples in a relational database
Can you imagine a simple way to achieve that?
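One simple answer, sketched here with SQLite from the Python standard library, is a single three-column table with one row per triple. The table name, column names and index are assumptions made for this sketch; production stores typically use more elaborate layouts.

```python
import sqlite3

# One row per triple in a single table; names are assumptions for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE triples ("
    " subject TEXT NOT NULL,"
    " predicate TEXT NOT NULL,"
    " object TEXT NOT NULL,"
    " PRIMARY KEY (subject, predicate, object))"
)
# A secondary index supports lookups that start from predicate or object.
conn.execute("CREATE INDEX idx_pos ON triples (predicate, object, subject)")

conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("Ground Chuck", "Type", "Human"),
        ("Angel", "petOf", "Ground Chuck"),
        ("Dog", "subClassOf", "Mammal"),
    ],
)

# "Who is a pet of Ground Chuck?" becomes a plain SQL selection.
pets = conn.execute(
    "SELECT subject FROM triples WHERE predicate = ? AND object = ?",
    ("petOf", "Ground Chuck"),
).fetchall()
print(pets)
```

A known drawback of this layout is that every additional triple pattern in a query turns into a self-join on the same table.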
Query and update the Graph: SPARQL
SPARQL support
– SPARQL Protocol
– SPARQL Query Language
SPARQL Protocol
– Query and update operations based on HTTP
– Between client and SPARQL endpoint
SPARQL Query Language
– Queries: SELECT, ASK, DESCRIBE, CONSTRUCT
– Updates: INSERT, DELETE
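How the protocol carries a query or an update over HTTP can be sketched as follows. The sketch only builds the requests and does not send them. The endpoint URLs and the `example.org` IRIs are placeholders, and the parameter names `query` and `update` follow the SPARQL 1.1 Protocol; individual stores may expose different paths or require authentication.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint URLs; real stores publish their own.
QUERY_ENDPOINT = "http://localhost:8080/sparql"
UPDATE_ENDPOINT = "http://localhost:8080/update"

SELECT_QUERY = """
SELECT ?pet WHERE { ?pet <http://example.org/petOf> <http://example.org/GroundChuck> }
"""

INSERT_UPDATE = """
INSERT DATA { <http://example.org/Angel> <http://example.org/petOf> <http://example.org/GroundChuck> }
"""


def build_query_request(endpoint, query):
    """Query via HTTP GET: the query text travels in the 'query' parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def build_update_request(endpoint, update):
    """Update via HTTP POST: the update text travels form-encoded in 'update'."""
    body = urlencode({"update": update}).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


query_request = build_query_request(QUERY_ENDPOINT, SELECT_QUERY)
update_request = build_update_request(UPDATE_ENDPOINT, INSERT_UPDATE)
print(query_request.get_method(), query_request.full_url)
print(update_request.get_method(), update_request.full_url)
# Sending would be urllib.request.urlopen(query_request); it is omitted here
# because the endpoints above are placeholders.
```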
Triple Stores …
The agony of choice …
Are there differences?
Is one of them “the right one”?
How to choose one for the project?
- Requirements / criteria?
- Environment of use?
- Performance?
- Costs?
- …
Set some criteria …
Scalability
- Persistent stores scale better than in-memory stores
Interoperability & portability
- Programming language!
- commit to using the entire stack of a store
Optimization
- native stores vs. 3rd-party stores
License, support, community, …
… only a few are left!
AllegroGraph v4.9
load, store, query RDF data
includes an implementation of Prolog
runs natively on 64-bit Linux (x86-64)
Interfaces: Java, Python, Ruby, Perl, C#, Clojure, Common Lisp
Tools: AGWebView, Gruff, …
License: free for fewer than 50 million triples
AllegroGraph v4.9
http://www.franz.com/agraph/allegrograph/ag_client-server_arch_4.2.2.png
OpenLink Virtuoso v6.2
high-performance object-relational SQL database
written in C
distributions for Unix & Windows
Access through: Jena & Sesame
Tools: ISQL, Graphical Conductor
License: GPL v2 & commercial
OpenLink Virtuoso v6.2
http://virtuoso.openlinksw.com/images/varch625.jpg
Jena
Java-based open source framework
represents RDF graphs as native models:
- In-memory
- other data sources (file, database)
Framework includes:
- RDF API
- Reading and writing RDF in RDF/XML, N3 and N-Triples
- OWL API
- In-memory and persistent storage
- Rule-based inference engine
- Query engine compliant with the SPARQL specification
Jena TDB
high-performance, pure-Java, non-SQL storage subsystem
persistent graph storage layer for Jena
works with the Jena SPARQL query engine (ARQ)
a number of extensions (e.g. property functions, aggregates, arbitrary-length property paths)
custom implementation of B+ trees
License: BSD-License
Jena SDB
is basically a Java loader
Multiple stores supported
- e.g. MySQL, PostgreSQL, Oracle, DB2, Apache Derby, …
provides for:
- scalable storage & query of RDF datasets using conventional SQL databases
database tools for
- load balancing, security, clustering
- backup and administration
can all be used to manage the installation
designed specifically to support SPARQL
Sesame
framework for processing RDF data
- parsing, storing, inference & querying
on top of a variety of storage systems
- relational databases, in-memory, file systems, keyword indexers, …
wide range of tools
- HTTP, SOAP, RMI access
full SPARQL support (since 2008)
supports the main RDF file formats:
- RDF/XML, Turtle, N-Triples, TriG & TriX, …
Sesame
runs as a Java servlet application in Apache Tomcat
clients communicate over HTTP
http://www.openrdf.org/doc/sesame/users/figures/sesame-server.png
Sesame
Sesame’s overall architecture
http://www.openrdf.org/doc/sesame/users/figures/sesame-arch.png
Benchmark
What data should be used?
- Lehigh University Benchmark (LUBM)
  - 14 test queries
- Berlin SPARQL Benchmark (BSBM)
  - 12 test queries
- “real-world” data
  - e.g. DBpedia, WordNet, …
Who is testing?
- no central institution
- tests are (mostly) run, and possibly tuned, only by the stores’ creators
Testing architecture?
Benchmark
Not considered in almost all benchmarks:
- RDFS reasoning
- SPARQL 1.1
- Heavy load
- multiple queries in parallel
Conclusion of every benchmark in advance:
NO store wins in every field!!!
Benchmark example
“Yet Another Triple Store Benchmark”
http://mt.inf.tu-dresden.de/forschung/topics/bm/
Machine
Hardware
– CPU: Intel® Xeon® CPU X5660 @ 2.80GHz x 4
– RAM: 16 GB
– Hard disk: 1 x 34 GB, 1 x 42 GB
Software
– OS: Ubuntu 12.04 LTS / 64 Bit
– JRE: JDK 1.7.0_04
– Apache Tomcat Ver. 7.0.28
Benchmark example stores
Fuseki (Jena TDB SPARQL Server) ver. 0.2.3
- TDB Loader of Jena TDB 0.9.0
NanoSPARQLServer of bigdata ver. 1.2.0
- deployed on a Tomcat server
OWLIM LITE ver. 5.0.5001
- via Sesame 2.6.5 deployed on a Tomcat server
OpenLink Virtuoso Ver. 6.01.3127
Benchmark example dataset
Dataset                     | NYTimes | Jamendo | Movie DB | Yago 2 Core
N-Triples data size (MByte) |    56.2 |   151.0 |    891.6 |     5,427.2
Triples (millions)          |    0.35 |    1.05 |     6.15 |       35.43
Instances (thousands)       |    13.2 |   290.4 |    665.4 |     2,648.4
Classes                     |      19 |      21 |       53 |     292,861
Properties                  |      69 |      47 |      222 |          93
Benchmark example queries
Query 1-6
- generic queries
- same for each dataset
Query 7-13
- SPARQL 1.1 queries specialized for each dataset
Query 14 & 15
- SPARQL Update queries
- delete and insert some data in the graph
Load Time Result
http://mt.inf.tu-dresden.de/forschung/topics/bm/loading.pdf
Memory requirement
http://mt.inf.tu-dresden.de/forschung/topics/bm/memory.pdf
http://mt.inf.tu-dresden.de/forschung/topics/bm/queries_no_inf.pdf
Triple Store
DEMO!!!