Graph Databases, Triple Stores and their uses… San Jose, NoSQL, 2012 Jans Aasman CEO Franz Inc


Page 1: Graph Databases, Triple Stores and their uses…

Graph Databases, Triple Stores and their uses…

San Jose, NoSQL, 2012
Jans Aasman
CEO, Franz Inc

Page 2: Graph Databases, Triple Stores and their uses…

Overview

• Franz Inc
• What is a Graph Database?
• An example to start
• What is a Triple Store?
• Where do people use Graph Databases and Triple Stores?
  – Car manufacturing
  – EPIM: a reporting platform for 31 oil companies
  – Amdocs
• Why do people use Graph databases?
• How do you get a graph out of your relational database?

Page 3: Graph Databases, Triple Stores and their uses…

Franz Inc – Who We Are

• Private, founded 1984
• An AI and Semantic Technology company
• Berkeley/Oakland

Page 4: Graph Databases, Triple Stores and their uses…

Graph database

Page 5: Graph Databases, Triple Stores and their uses…
Page 6: Graph Databases, Triple Stores and their uses…

What is the difference between a relational database and a graph database?

Page 7: Graph Databases, Triple Stores and their uses…
Page 8: Graph Databases, Triple Stores and their uses…
Page 9: Graph Databases, Triple Stores and their uses…

How is it different and why is it more flexible?

• No Schema.
  – Say whatever you want to say but
• No Link Tables
  – because you can do one-to-many relationships directly
• No Indexing Choices
  – Can add new data attributes (predicates) on-the-fly that will be real-time available for querying, because everything is automatically indexed.
• Takes anything you give it: it is trivial to consume
  – Rows and columns from RDB, XML, RDF(S), OWL, Text and Extracted Entities
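
The points on this slide (no schema, no link tables, everything indexed) can be illustrated with a toy in-memory store. This is a minimal sketch, not how any particular product is implemented: the class name `TinyTripleStore`, the predicate names and the sample facts are all made up for illustration.

```python
from collections import defaultdict


class TinyTripleStore:
    """Toy in-memory store: no schema, every position indexed on insert."""

    def __init__(self):
        self.triples = set()
        # One index per triple position, so a new predicate is queryable at once.
        self.index = {"s": defaultdict(set), "p": defaultdict(set), "o": defaultdict(set)}

    def add(self, s, p, o):
        triple = (s, p, o)
        self.triples.add(triple)
        for position, value in zip("spo", triple):
            self.index[position][value].add(triple)

    def match(self, s=None, p=None, o=None):
        """Return sorted triples matching the pattern; None is a wildcard."""
        hits = set(self.triples)
        for position, value in zip("spo", (s, p, o)):
            if value is not None:
                hits &= self.index[position].get(value, set())
        return sorted(hits)


store = TinyTripleStore()
store.add("Franz", "hasOffice", "Berkeley")
store.add("Franz", "hasOffice", "Oakland")   # one-to-many, no link table
store.add("Jans", "worksFor", "Franz")
store.add("Jans", "speaksAt", "NoSQL 2012")  # new predicate added on the fly

print(store.match(s="Franz", p="hasOffice"))
print(store.match(p="speaksAt"))
```

The new `speaksAt` predicate needs no schema change and no index declaration; it is queryable as soon as it is added.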

Page 10: Graph Databases, Triple Stores and their uses…

A triple store is a special type of graph database

• Where nodes and links are (mostly) URIs
  – subject, predicate, object, [graph]
  – Persistent URIs make it straightforward to link datasets, create web of data (LOD)
• Based on W3C recommendations
  – RDFS puts an object layer on top of triples
  – OWL adds first order description logic
  – SPARQL very close to SQL but focusing on graphs
• Most graph databases live in memory; triple stores can be bigger than memory and rely more on indexing and query optimizers
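
To make the "subject, predicate, object, [graph]" shape and the role of persistent URIs concrete, here is a small sketch in plain Python. The `http://example.org/` URIs, the two dataset names and the helper functions are hypothetical; the point is only that two datasets link up as soon as they use the same URI for the same thing.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    graph: str  # the optional fourth element: which dataset the statement came from


EX = "http://example.org/"  # hypothetical namespace

# Two independently produced datasets that happen to share one URI (org/franz).
hr_data = {
    Quad(EX + "person/jans", EX + "vocab/worksFor", EX + "org/franz", EX + "graph/hr"),
}
geo_data = {
    Quad(EX + "org/franz", EX + "vocab/locatedIn", EX + "place/oakland", EX + "graph/geo"),
}

# Linking the datasets is just a set union; no join keys have to be agreed on.
merged = hr_data | geo_data


def objects(quads, subject, predicate):
    """All objects for a given subject and predicate, across all graphs."""
    return {q.obj for q in quads if q.subject == subject and q.predicate == predicate}


def works_in(quads, person):
    """Places of the organisations a person works for (a two-hop traversal)."""
    places = set()
    for org in objects(quads, person, EX + "vocab/worksFor"):
        places |= objects(quads, org, EX + "vocab/locatedIn")
    return places


print(works_in(merged, EX + "person/jans"))
```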

Page 11: Graph Databases, Triple Stores and their uses…

Demo - LOD

Page 12: Graph Databases, Triple Stores and their uses…

Facebook, Bing, Google are all building up big proprietary knowledge graphs

Page 13: Graph Databases, Triple Stores and their uses…

The public version: Linked Open Data

Tim Berners-Lee outlined four principles of linked data:
• Use URIs to identify things.
• Use HTTP URIs so that these things can be referred to and looked up ("dereferenced") by people and user agents.
• Provide useful information about the thing when its URI is dereferenced, using standard formats such as RDF/XML.
• Include links to other, related URIs in the exposed data to improve discovery of other related information on the Web.
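
Dereferencing in the second and third principles usually means an HTTP GET with an `Accept` header that asks for an RDF serialization. The sketch below only builds such a request with the standard library and does not send it; the URI is a hypothetical placeholder, and whether a given server honours the header is an assumption.

```python
from urllib.request import Request


def build_dereference_request(uri, media_type="application/rdf+xml"):
    """Build (but do not send) a GET request asking for RDF/XML about a URI."""
    return Request(uri, headers={"Accept": media_type})


# Hypothetical resource URI, used only to show the shape of the request.
req = build_dereference_request("http://example.org/resource/Berkeley")
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would perform the actual lookup against a real linked-data server.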

Page 14: Graph Databases, Triple Stores and their uses…

Oct 2007

Page 15: Graph Databases, Triple Stores and their uses…

LOD cloud – Sept 22 2010

latest LOD cloud

Page 16: Graph Databases, Triple Stores and their uses…

Demo politics

Page 17: Graph Databases, Triple Stores and their uses…

Who uses this in the enterprise?

Page 18: Graph Databases, Triple Stores and their uses…

DoD and Intelligence Customers

Page 19: Graph Databases, Triple Stores and their uses…
Page 20: Graph Databases, Triple Stores and their uses…

Enterprise Experience

• Amdocs: a Telco platform that knows (almost) everything about every customer in real time.
  – Saves 20% on the total cost of a Customer Care Operation
• Car manufacturer X
  – Warns early for disruptions in the supply chain
• EPIM: a reporting platform for 31 oil companies.
  – Create a flexible unified reporting structure over tens of different proprietary reporting sources.

Page 21: Graph Databases, Triple Stores and their uses…
Page 22: Graph Databases, Triple Stores and their uses…

Risk in Supply Chain Management: determine potential impact of any disruptions to the supply chain.

Page 23: Graph Databases, Triple Stores and their uses…

Questions that an Early Warning System should answer:

– Which parts produced by a (sub-sub-)vendor will be less available due to a flood in China?
– Which of our cars will be affected by political unrest in Thailand?
– How can our competitors disrupt our supply chain by buying up all producers of this chip?
– Did one of our sub-sub-vendors start selling to our competition and what does that mean for us?
– What happened historically with the price of this sub-part when the prices for crude oil or any other raw material went up?
– Is one of the (sub-sub-sub-)vendors in our chain in financial distress and how would that affect us?

Page 24: Graph Databases, Triple Stores and their uses…

We need three graphs (or clouds of data) to come together

• The bills of materials for the cars that we produce, plus the parts tree, plus the names of the first tier vendors that provide parts and if possible our parts inventory and inventory prediction for parts.
• The supply chain for the first tier vendors
  – Who sells the sub-parts to our first tier vendors, and then go recursively down this tree
  – Get the names and geo locations and all other metadata about this network of vendors and suppliers
• Spider the web and business news sources for
  – Every supplier, the countries where they are located, commodities, etc.
  – Analyze the text for risks

Page 25: Graph Databases, Triple Stores and their uses…
Page 26: Graph Databases, Triple Stores and their uses…

The following slide shows the graph as used by power users

• Company X providing an Exhaust Muffler for a car Y
• That is bought from the first tier vendor USAcme
• Who buys it from a vendor in Bangkok (Thai Acme)
• Where we also show a newspaper article that has the news about floods in Thailand
• And because Thailand has the place Bangkok
• We have a potential risk.
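
The chain of reasoning on this slide is a reachability question over one merged graph: can we walk from a car to a place that a news article reports as flooded? The sketch below is one possible way to express that. The predicate names (`hasPart`, `suppliedBy`, `buysFrom`, `locatedIn`, `partOf`, `reportsFloodIn`) and the unaffected `CarZ`/`OhioAcme` entries are assumptions added for illustration; only the muffler, USAcme, Thai Acme, Bangkok and Thailand come from the slide.

```python
from collections import defaultdict, deque

TRIPLES = [
    ("CarY", "hasPart", "ExhaustMuffler"),
    ("ExhaustMuffler", "suppliedBy", "USAcme"),
    ("USAcme", "buysFrom", "ThaiAcme"),
    ("ThaiAcme", "locatedIn", "Bangkok"),
    ("Bangkok", "partOf", "Thailand"),
    ("CarZ", "hasPart", "Radiator"),            # hypothetical unaffected car
    ("Radiator", "suppliedBy", "OhioAcme"),
    ("OhioAcme", "locatedIn", "Ohio"),
    ("NewsArticle1", "reportsFloodIn", "Thailand"),
]


def reachable(triples, start):
    """All nodes reachable from start by following edges subject -> object."""
    outgoing = defaultdict(set)
    for s, _p, o in triples:
        outgoing[s].add(o)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in outgoing[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def cars_at_risk(triples, cars):
    """Cars whose supply chain reaches a place mentioned in a flood report."""
    flooded = {o for _s, p, o in triples if p == "reportsFloodIn"}
    return sorted(car for car in cars if reachable(triples, car) & flooded)


print(cars_at_risk(TRIPLES, ["CarY", "CarZ"]))
```

The traversal has no fixed depth, so the same function works for sub-sub-sub-vendors without any change.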

Page 27: Graph Databases, Triple Stores and their uses…
Page 28: Graph Databases, Triple Stores and their uses…

EPIM ReportingHub Solution
May 11th, 2011

Page 29: Graph Databases, Triple Stores and their uses…
Page 30: Graph Databases, Triple Stores and their uses…

Data Processing Approach

[Diagram: Import → Processing → Export]
• Import: XML (Semantic XML Model), Excel (Semantic Tables Model), RDB (D2RQ Semantic Model), driven by Mapping Rules & Models (SPIN)
• Processing: RDF Repository, Domain Ontologies
• Export: Output Templates & Models (SWP) producing XML, Excel, HTML, JSON
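
The RDB import path in this diagram comes down to turning rows and columns into triples. The sketch below shows the general idea only; it is not D2RQ's mapping language or API, and the function name, the `http://example.org/` base URI and the sample `part` table are assumptions made for illustration.

```python
def rows_to_triples(table, key_column, rows, base="http://example.org/"):
    """Map relational rows to triples: one subject per row, one predicate per column."""
    triples = []
    for row in rows:
        subject = f"{base}{table}/{row[key_column]}"
        for column, value in row.items():
            if column == key_column or value is None:
                continue  # the key becomes the subject URI; NULLs produce no triple
            triples.append((subject, f"{base}{table}#{column}", value))
    return triples


# Hypothetical rows from a "part" table.
rows = [
    {"id": 7, "name": "Exhaust Muffler", "vendor": "USAcme"},
    {"id": 8, "name": "Chip", "vendor": None},
]

part_triples = rows_to_triples("part", "id", rows)
for triple in part_triples:
    print(triple)
```

A NULL column simply yields no triple, which is one reason sparse relational data tends to map cleanly onto a graph.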

Page 31: Graph Databases, Triple Stores and their uses…

Capability Architecture

[Diagram: three layers of ontologies]
• Application Ontologies (Reporting, Normalization, Classification, Validation, Report, Dissemination): Reporting Obligations, Normalization Mappings, Classification Models, Validation Rulesets, Report Templates, Dissemination Obligations
• System Ontologies: Access Control, Transforms, Metrics, Workflow, Logging, Notification
• Resource Ontologies: DDR, ISO 15926, NPD FactPages, PCA RDL, Operators, Partners, Policies

Page 32: Graph Databases, Triple Stores and their uses…

When graph database or triple store?

Page 33: Graph Databases, Triple Stores and their uses…

You have billions of 'same-type' objects and you need to retrieve them extremely fast. Or you need simple analytics.

You have a fixed size, static data set and you need fast graph computations and pattern matching.

You need all the features of an enterprise database but

You need to work with an ontology driven knowledge base and rules, but also the flexibility of a graph database

Page 34: Graph Databases, Triple Stores and their uses…

When Graph Database or Triple Store?

When you need ultimate flexibility
• Modeling knowledge and assets
• Hundreds to thousands of classes with different features
• Every day new classes and new features
• You work with rules and reasoning

When you need ultimate 'linkability'
• For (ad hoc) integration of databases

When you need pattern recognition and network analysis
• Complex networks of people, companies, products, etc.

When you need event processing using geospatial, temporal reasoning and social network analysis combined with flexible metadata

Page 35: Graph Databases, Triple Stores and their uses…

Q1: A reasonably hard query for horizontally scaling stores and rdb, a straightforward query for vertical/parallel store

Select ?a ?b ?c ?d ?e
where {
  Franz send-money ?a
  ?a send-money ?b
  ?b send-money ?c
  ?c send-money Cray
  Cray send-money ?d
  Not (?d = ?c)
  ?d send-money ?e
  Not (?e = ?b)
  ?e send-money Franz }

Page 36: Graph Databases, Triple Stores and their uses…

Q1: A very hard query for horizontally scaling stores and rdb, a straightforward query for vertical/parallel store

Find a money trail from Franz to Cray that is more than two steps, find another money trail from Cray to Franz that is more than two steps, where the two trails are completely different

(Select (?path1 ?path2)
  (path Franz Cray <send-money> >= 2 ?path1)
  (path Cray Franz <send-money> >= 2 ?path2)
  (empty (intersection ?path1 ?path2)))
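
For readers who want to see what the path query asks for, here is a minimal procedural sketch in Python. It is not the store's implementation: the edge data is invented, `>= 2` is read as "at least two edges", and "completely different" is read as "no shared intermediate node" (the endpoints Franz and Cray necessarily appear in both trails).

```python
from itertools import product

# Hypothetical send-money edges: sender -> set of receivers.
SEND_MONEY = {
    "Franz": {"A", "X"},
    "A": {"B"},
    "B": {"Cray", "E"},
    "X": {"Cray"},
    "Cray": {"D", "A"},
    "D": {"E"},
    "E": {"Franz"},
}


def simple_paths(graph, start, goal, min_edges=2):
    """Yield every cycle-free path from start to goal with at least min_edges edges."""
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        for nxt in sorted(graph.get(node, ())):
            if nxt == goal:
                if len(path) >= min_edges:  # path + [goal] has len(path) edges
                    yield path + [goal]
            elif nxt not in path:
                stack.append((nxt, path + [nxt]))


def disjoint_trails(graph, a, b, min_edges=2):
    """Pairs (a->b trail, b->a trail) that share no intermediate node."""
    there = list(simple_paths(graph, a, b, min_edges))
    back = list(simple_paths(graph, b, a, min_edges))
    return [
        (p1, p2)
        for p1, p2 in product(there, back)
        if not set(p1[1:-1]) & set(p2[1:-1])
    ]


for trail_out, trail_back in disjoint_trails(SEND_MONEY, "Franz", "Cray"):
    print(" -> ".join(trail_out), "|", " -> ".join(trail_back))
```

Unlike the fixed pattern on the previous slide, the length of each trail is not known in advance, which is what makes this awkward to express as a fixed number of joins.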

Page 37: Graph Databases, Triple Stores and their uses…

Why is this hard in SQL

• Relational databases are very good at straight joins but less optimal for self-joins of unpredictable length
• Try writing this as a SQL query ☺
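
As a rough illustration of the self-join problem, the fixed-length pattern from slide 35 can be written against a single edge table, at the cost of one self-join per hop. The sketch below uses Python's built-in `sqlite3` with a made-up `send_money(sender, receiver)` table and invented rows; the variable-length trails on slide 36 would not fit this shape and would need a recursive query or procedural code instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE send_money (sender TEXT, receiver TEXT)")
conn.executemany(
    "INSERT INTO send_money VALUES (?, ?)",
    [
        ("Franz", "A"), ("A", "B"), ("B", "C"), ("C", "Cray"),
        ("Cray", "D"), ("D", "E"), ("E", "Franz"),
        # Distractor edges: a return trail that reuses C and B.
        ("Cray", "C"), ("C", "B"), ("B", "Franz"),
    ],
)

# One alias per hop: Franz->a->b->c->Cray->d->e->Franz is seven hops, six self-joins.
Q1_SQL = """
SELECT t1.receiver AS a, t2.receiver AS b, t3.receiver AS c,
       t5.receiver AS d, t6.receiver AS e
FROM send_money t1
JOIN send_money t2 ON t2.sender = t1.receiver
JOIN send_money t3 ON t3.sender = t2.receiver
JOIN send_money t4 ON t4.sender = t3.receiver
JOIN send_money t5 ON t5.sender = t4.receiver
JOIN send_money t6 ON t6.sender = t5.receiver
JOIN send_money t7 ON t7.sender = t6.receiver
WHERE t1.sender = 'Franz'
  AND t4.receiver = 'Cray'
  AND t7.receiver = 'Franz'
  AND t5.receiver <> t3.receiver   -- Not (?d = ?c)
  AND t6.receiver <> t2.receiver   -- Not (?e = ?b)
"""

rows = conn.execute(Q1_SQL).fetchall()
print(rows)
```

Every extra hop adds another alias and join condition, so the query text has to change whenever the trail length does.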

Page 38: Graph Databases, Triple Stores and their uses…

Why is this hard in distributed key/value stores

• Databases like Cassandra are extremely good at retrieving nested objects in a sea of billions of objects but are less optimal for joins.
• Relatively hard to write these as map reduce expressions. Every query has to be expressed as a program; ad hoc querying is therefore discouraged.

Page 39: Graph Databases, Triple Stores and their uses…

A Simple Event Ontology

• A type
  – Meetings, communications event, financial transactions, visit, attack/truce, an insurance claim, a purchase order
  – RDFS++ reasoning
• A list of actors
  – Social Network Analysis
• A place
  – GeoSpatial Reasoning
• A start-time and possibly an end-time
  – Temporal Reasoning
• Anything else that describes the event
  – Goods that changed hands
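
The event ontology above can be pictured as a small record type that flattens into triples. This is only a sketch of the idea: the `Event` class, its field names, the predicate strings and the sample meeting are all hypothetical and are not the ontology used in the talk.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Event:
    event_type: str                 # e.g. Meeting, FinancialTransaction
    actors: list[str]               # who took part
    place: tuple[float, float]      # (latitude, longitude)
    start: datetime
    end: Optional[datetime] = None  # an end-time is optional
    extras: dict[str, str] = field(default_factory=dict)  # anything else

    def to_triples(self, event_id):
        """Flatten the event into (subject, predicate, object) triples."""
        triples = [(event_id, "rdf:type", self.event_type)]
        triples += [(event_id, "actor", actor) for actor in self.actors]
        triples.append((event_id, "place", self.place))
        triples.append((event_id, "startTime", self.start.isoformat()))
        if self.end is not None:
            triples.append((event_id, "endTime", self.end.isoformat()))
        triples += [(event_id, key, value) for key, value in self.extras.items()]
        return triples


meeting = Event(
    event_type="Meeting",
    actors=["Jans", "Ann"],
    place=(37.8715, -122.2730),
    start=datetime(2008, 11, 3, 10, 0),
    end=datetime(2008, 11, 3, 11, 0),
    extras={"goodsExchanged": "contract"},
)

for triple in meeting.to_triples("event:1"):
    print(triple)
```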

Page 40: Graph Databases, Triple Stores and their uses…

Social Network Analysis Answers 4 Questions

• How far is P1 from P2 (and how strong is the relation)?
• To what groups does this person belong (ego groups, cliques)?
• How important is this person in the group?
• Does this group have a leader, how cohesive are they?
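
The four questions map onto standard graph measures. The sketch below uses breadth-first distance, an ego group of a given depth, degree centrality within the group, and edge density as a simple stand-in for cohesion. The `knows` data and the choice of these particular measures are assumptions for illustration; a real SNA library offers many alternatives.

```python
from collections import deque

# Hypothetical undirected "knows" relation.
KNOWS = {
    "jans": {"ann", "bob"},
    "ann": {"jans", "bob", "cat"},
    "bob": {"jans", "ann"},
    "cat": {"ann", "dan"},
    "dan": {"cat"},
}


def distances_from(graph, start):
    """Breadth-first distances: how far is everyone from start?"""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def ego_group(graph, person, depth):
    """Everyone within `depth` steps of person (friends, friends of friends, ...)."""
    return {p for p, d in distances_from(graph, person).items() if d <= depth}


def degree_centrality(graph, group):
    """Importance as the number of connections each member has inside the group."""
    return {p: len(graph[p] & group) for p in group}


def density(graph, group):
    """Cohesion as the fraction of possible in-group ties that actually exist."""
    n = len(group)
    ties = sum(degree_centrality(graph, group).values())  # each tie counted twice
    return ties / (n * (n - 1)) if n > 1 else 0.0


group = ego_group(KNOWS, "jans", 2)
centrality = degree_centrality(KNOWS, group)
leader = max(sorted(centrality), key=centrality.get)
print(distances_from(KNOWS, "jans")["dan"], sorted(group), leader, density(KNOWS, group))
```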

Page 41: Graph Databases, Triple Stores and their uses…

GeoSpatial

• Make the following super efficient
  – Where did something happen?
  – How far was event1 from event2?
  – Find all the events that occurred in a bounding box or radius of M miles?
  – Do these two shapes overlap?
  – Find all the objects in the intersection of two shapes
• On a very large scale
  – when things don't fit in memory
  – millions of events and polygons
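
The first three questions can be answered with a great-circle distance and a box test. The sketch below is a plain linear scan with approximate coordinates and invented event names; it shows what the primitives compute, not the spatial indexing a store would need to make them efficient at the scale the slide describes.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8

# Hypothetical events with approximate (latitude, longitude) positions.
EVENTS = {
    "berkeley_meeting": (37.8715, -122.2730),
    "oakland_meeting": (37.8044, -122.2712),
    "san_jose_talk": (37.3382, -121.8863),
}
BERKELEY = (37.8715, -122.2730)


def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(h))


def within_radius(events, center, miles):
    """Names of events no farther than `miles` from center."""
    return sorted(
        name for name, point in events.items()
        if haversine_miles(center, point) <= miles
    )


def in_bounding_box(point, south, west, north, east):
    """True if the point lies inside the latitude/longitude box."""
    lat, lon = point
    return south <= lat <= north and west <= lon <= east


print(within_radius(EVENTS, BERKELEY, 5))
print(in_bounding_box(EVENTS["san_jose_talk"], 37.0, -122.5, 37.5, -121.5))
```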

Page 42: Graph Databases, Triple Stores and their uses…

Temporal Reasoning

• Adhere to our convention to encode StartTimes and EndTimes and enjoy efficient temporal primitives
• Implementation of Allen's interval logic primitives
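
Allen's interval logic defines thirteen mutually exclusive relations between two intervals. The function below is a compact sketch that classifies a pair of (start, end) intervals; it assumes proper intervals with start strictly before end, and the function name and sample dates are illustrative rather than taken from any product API.

```python
from datetime import date


def allen_relation(a, b):
    """Name the Allen relation of interval a to interval b, each given as (start, end)."""
    (a_start, a_end), (b_start, b_end) = a, b
    if a_end < b_start:
        return "before"
    if b_end < a_start:
        return "after"
    if a_end == b_start:
        return "meets"
    if b_end == a_start:
        return "met-by"
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished-by"
    if b_start < a_start and a_end < b_end:
        return "during"
    if a_start < b_start and b_end < a_end:
        return "contains"
    # Only partial overlap remains at this point.
    return "overlaps" if a_start < b_start else "overlapped-by"


window = (date(2008, 11, 1), date(2008, 11, 6))
meeting = (date(2008, 11, 3), date(2008, 11, 4))

print(allen_relation(meeting, window))
print(allen_relation(window, meeting))
```

Because the relations are exhaustive and exclusive, a query such as "events during this window" becomes a single comparison on the stored start and end times.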

Page 43: Graph Databases, Triple Stores and their uses…

And try this on RDB/Cassandra: Activity Recognition

• Mix SNA with reasoning and temporal/geospatial reasoning.

Find all meetings that happened in November within 5 miles of Berkeley that were attended by the most important person in Jans' friends and friends of friends.

(select (?x)
  (ego-group person:jans knows ?group 2)                  SNA
  (actor-centrality-members ?group knows ?x ?num)         SNA
  (q ?event fr:actor ?x)                                  DB Lookup
  (qs ?event rdf:type fr:Meeting)                         RDFS
  (interval-during ?event "2008-11-01" "2008-11-06")      Temporal
  (geo-box-around geoname:Berkeley ?event 5 miles)        Spatial
  !)

Page 44: Graph Databases, Triple Stores and their uses…

Thanks..