Survey of Graph Database Models. Renzo Angles and Claudio Gutierrez, University of Chile. ACM Computing Surveys, 2008.
Page 1
Renzo Angles and Claudio Gutierrez
University of Chile
ACM Computing Surveys, 2008
Survey of Graph Database Models
Page 2
Introduction
• Graph Data Model?
  – Data and/or the schema are represented by graphs, or by data structures generalizing the notion of graph
  – Data manipulation is expressed by graph-oriented operations
• Graph DB-Model?
  – A model in which the data structures for the schema and/or instances are modeled as a directed, possibly labeled, graph, or generalizations of the graph data structure, where data manipulation is expressed by graph-oriented operations and type constructors, and appropriate integrity constraints can be defined over the graph structure
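The definition above can be made concrete with a minimal Python sketch of a directed, labeled graph. The class name `LabeledGraph`, its methods, and the sample nodes are illustrative assumptions, not part of any model covered in the survey.

```python
from collections import defaultdict


class LabeledGraph:
    """Directed graph whose nodes and edges carry labels (hypothetical sketch)."""

    def __init__(self):
        self.node_labels = {}               # node id -> node label
        self.out_edges = defaultdict(list)  # node id -> [(edge label, target id)]

    def add_node(self, node_id, label):
        self.node_labels[node_id] = label

    def add_edge(self, source, label, target):
        # Integrity constraint defined over the graph structure:
        # both endpoints must already exist as nodes.
        if source not in self.node_labels or target not in self.node_labels:
            raise KeyError("edge endpoints must be existing nodes")
        self.out_edges[source].append((label, target))

    def neighbors(self, node_id, label):
        """Graph-oriented operation: follow outgoing edges with a given label."""
        return [t for (l, t) in self.out_edges[node_id] if l == label]


g = LabeledGraph()
g.add_node("p1", "Person")
g.add_node("p2", "Person")
g.add_edge("p1", "parent", "p2")
parents_of_p1 = g.neighbors("p1", "parent")
```

Here the data lives in the graph itself, and the only way to manipulate it is through operations that add nodes, add edges, or traverse edges.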
Page 3
Why a Graph Data Model?
• Natural modeling of data
  – Able to keep all the information about an entity in a single node and show related information by arcs connected to it
  – Visible to the user and allows a natural way of handling application data
• Queries can refer directly to this graph structure
  – Allows users to express a query at a high level of abstraction
  – A data model where the operations over data are graph transformations
Comparison with other Database models
Page 4
Motivations and Applications
• Graph DBs are motivated by real-life applications where component interconnectivity is a key feature
  – Classical applications
    • ‘See’ data connectivity
    • Managing transportation networks
    • Graphical and visual interfaces
    • On-line hypertext
  – Complex networks
    • Social networks
    • Information networks
    • Biological networks
Page 5
Data structures
• The representation of entities and relations is fundamental to graph DB-models
• A graph DB-model is a framework for the representation of connectivity among entities
  – Directed/undirected graphs, labeled/unlabeled edges and nodes, hypergraphs
• Representation of entities: schema and instance
  – The schema graph defines entity types (nodes labeled with a type name) and relations (edges labeled with relation names)
  – The instance graph contains entities (nodes labeled with an entity type or identifier) and relations (edges labeled according to the schema)
  – Tuples and sets (PaMaL, GDM) and n-ary relations (GOAL, GDM)
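As a rough illustration of the schema/instance split, both graphs can be written as sets of (source, label, target) triples. The entity names, values, and the helper `relations_of` below are made up for this sketch and do not come from any specific model in the survey.

```python
# Schema graph: nodes are entity types, edges are labeled with relation names.
schema_edges = {
    ("Person", "name", "String"),
    ("Person", "lastname", "String"),
    ("Person", "parent", "Person"),
}

# Instance graph: every node is an identifier or value labeled with its entity type.
instance_types = {
    "p1": "Person",
    "p2": "Person",
    "Ana": "String",
    "George": "String",
    "Jones": "String",
}

# Instance edges reuse the relation names defined in the schema.
instance_edges = {
    ("p1", "name", "Ana"),
    ("p1", "lastname", "Jones"),
    ("p2", "name", "George"),
    ("p2", "lastname", "Jones"),
    ("p1", "parent", "p2"),
}


def relations_of(entity_type):
    """Relation names that the schema defines for one entity type."""
    return sorted(label for (src, label, _) in schema_edges if src == entity_type)


person_relations = relations_of("Person")
```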
Page 6
Data structures (cont’d)
• Representation of Relations
  – Attributes
    • Labeled edges directly related to nodes
    • In the case of GROOVY, attributes are <node,edge,node> triples inside hypernodes
  – Entities
    • Most models do not support this feature because relations are represented as simple labeled edges
  – Standard Abstraction
    • Is-part-of, is-composed-by, n-ary relation
  – Derivation
    • ISA, is-of-type
  – Nested
    • This feature is naturally supported by using hypergraph structures
Page 7
Integrity Constraints
• Schema-Instance Consistency
  – Entity Type checking
    • The instance should contain only entities and relations from entity types and relations that were defined in the schema
    • An entity in the instance may only have those relations or properties defined for its entity type
  – Type checking constitute
• Object Identity and Referential Integrity
  – Set-based data models such as the relational model are value-based
  – Object Identity
    • Every node has its own identifier
  – Referential Integrity
    • ‘Only existing entities may be referenced’
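The two constraint families on this slide can be sketched as one check over an instance graph. This is a simplified illustration under assumed data structures (triples and a node-to-type map). The function name `check_instance` and all sample data are hypothetical.

```python
def check_instance(schema_edges, node_types, instance_edges):
    """Return a list of violations; an empty list means the instance is consistent."""
    violations = []
    for source, label, target in sorted(instance_edges):
        # Referential integrity: only existing entities may be referenced.
        missing = [n for n in (source, target) if n not in node_types]
        if missing:
            violations.append(f"dangling reference: {missing[0]}")
            continue
        # Type checking: the edge must match a relation the schema defines
        # for the entity type of its source node.
        typed_edge = (node_types[source], label, node_types[target])
        if typed_edge not in schema_edges:
            violations.append(f"not allowed by schema: {typed_edge}")
    return violations


# Hypothetical schema and instance data.
schema_edges = {("Person", "name", "String"), ("Person", "parent", "Person")}
node_types = {"p1": "Person", "p2": "Person", "Ana": "String"}
good_edges = {("p1", "name", "Ana"), ("p1", "parent", "p2")}
bad_edges = good_edges | {("p1", "parent", "p9"), ("p2", "age", "Ana")}

good_report = check_instance(schema_edges, node_types, good_edges)
bad_report = check_instance(schema_edges, node_types, bad_edges)
```

Object identity is implicit here: nodes are addressed by identifiers such as `p1` rather than by their attribute values, which is what separates this style from value-based relational tuples.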
Page 8
Query and Manipulation Languages
• A query language is a collection of operators or inference rules
• Existing query languages
  – G
    • Based on regular expressions
    • Graphical query: set of labeled directed multigraphs
    • Nodes are variables or constants
    • Edges can be labeled with regular expressions
  – G+
    • Extension of G
    • Graphical query
    • Graph query + summary graph
  – GraphLog (G-log)
    • Extension of G+
    • Adds negation
    • Graph pattern = graph query + edge query + summary graph
    • Includes transitive closure operator
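To give a feel for edges labeled with regular expressions, the sketch below finds nodes reachable from a start node along a simple path whose sequence of edge labels matches a regular expression. It is a brute-force illustration written for this transcript, not the evaluation algorithm of G or G+. The transport data, the function `paths_matching`, and the `max_len` bound are assumptions.

```python
import re

# Hypothetical transportation network: (source, edge label, target).
edges = [
    ("A", "train", "B"),
    ("B", "train", "C"),
    ("C", "bus", "D"),
    ("A", "bus", "E"),
]


def paths_matching(edges, start, pattern, max_len=4):
    """Nodes reachable from `start` by a simple path whose space-joined
    edge labels fully match `pattern` (paths longer than max_len are ignored)."""
    regex = re.compile(pattern)
    found = set()
    stack = [(start, [], {start})]  # (current node, labels so far, visited nodes)
    while stack:
        node, labels, seen = stack.pop()
        if labels and regex.fullmatch(" ".join(labels)):
            found.add(node)
        if len(labels) == max_len:
            continue
        for source, label, target in edges:
            if source == node and target not in seen:
                stack.append((target, labels + [label], seen | {target}))
    return found


by_train_only = paths_matching(edges, "A", r"train( train)*")
ending_with_bus = paths_matching(edges, "A", r"(train )*bus")
```

Enumerating simple paths is exponential in the worst case, so this form is only suitable for small examples.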
Page 9
GraphLog example
• Query A asks for the names of Mary’s grandparents (fixed path query)
• Query B asks for the name of the maternal grandmother of Mary (tree-like query)
• Query C calculates Mary’s ancestors (transitive closure)
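The three query shapes can be imitated with plain relations, that is, sets of (child, parent) pairs. This is not GraphLog syntax. The family facts below are invented (the diagram's data is not reproduced in this transcript), people are identified by first name for brevity, and the helpers `compose`, `transitive_closure`, and `image` are names chosen for this sketch.

```python
# Hypothetical facts as (child, parent) pairs.
mother = {("Mary", "Ana"), ("Ana", "Julia")}
father = {("Mary", "George"), ("George", "David")}
parent = mother | father


def compose(r, s):
    """Relational composition: (a, c) whenever (a, b) is in r and (b, c) is in s."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


def transitive_closure(r):
    """Smallest relation containing r that is closed under composition with r."""
    closure = set(r)
    while True:
        extended = closure | compose(closure, r)
        if extended == closure:
            return closure
        closure = extended


def image(relation, x):
    """All y such that (x, y) is in the relation."""
    return {b for (a, b) in relation if a == x}


# Query A: fixed path of two parent edges.
grandparents = image(compose(parent, parent), "Mary")
# Query B: tree-like pattern restricted to mother edges.
maternal_grandmother = image(compose(mother, mother), "Mary")
# Query C: transitive closure of the parent relation.
ancestors = image(transitive_closure(parent), "Mary")
```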
Page 10
A Genealogy Diagram – an example
Page 11
LDM (Logical Data Model)
• The schema uses two basic type nodes for representing data values (N and L), and two nodes (NL and PP) to establish relations among data values in a relational style
• The instance is a collection of tables, one for each node of the schema.
Page 12
Hypernode Model
• The schema defines a person as a complex object with the properties name and lastname of type string, and parent of type person
• The instance shows the relations in the genealogy among different instances of person
Page 13
GROOVY
• At the schema level, we model an object PERSON as a hypergraph that relates the attributes NAME, LASTNAME and PARENTS
• Value functional dependency NAME,LASTNAME → PARENTS logically represented by the directed hyperedge ({NAME, LASTNAME} {PARENTS})
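One way to read the directed hyperedge is as a pair of attribute sets, (tail, head), where the tail values determine the head values. The sketch below checks that reading against a list of records. It is not GROOVY's own formalism, and the records and the function `satisfies` are hypothetical.

```python
# Directed hyperedge as (tail attribute set, head attribute set).
hyperedge = (frozenset({"NAME", "LASTNAME"}), frozenset({"PARENTS"}))


def satisfies(records, hyperedge):
    """True if equal tail values always come with equal head values."""
    tail, head = hyperedge
    seen = {}
    for record in records:
        key = tuple(record[a] for a in sorted(tail))
        value = tuple(record[a] for a in sorted(head))
        # setdefault stores the first head value seen for this key
        # and returns it on every later lookup.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical PERSON records.
records_ok = [
    {"NAME": "Ana", "LASTNAME": "Jones", "PARENTS": ("George", "Julia")},
    {"NAME": "Ana", "LASTNAME": "Jones", "PARENTS": ("George", "Julia")},
    {"NAME": "Ana", "LASTNAME": "Smith", "PARENTS": ("David", "Mary")},
]
records_bad = records_ok + [
    {"NAME": "Ana", "LASTNAME": "Jones", "PARENTS": ("Tom", "Eve")},
]

dependency_holds = satisfies(records_ok, hyperedge)
dependency_broken = not satisfies(records_bad, hyperedge)
```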
Page 14
Simatic-XT
• This model does not define a schema
• In the first level, the graph contains the relations Name and Lastname to identify people (P1, . . . , P6)
• In the second level we use the abstraction of Person, to compress the attributes Name and Lastname and represent only the relation Parent between people
Page 15
GGL
• Schema and instances are mixed
• Packaged graph nodes (Person1, Person2, . . . ) are used to encapsulate information about the graph defining a Person
• Relations among these packages are established using edges labeled with parent
Page 16
PaMaL
• Schema: basic type (string), class (Person), tuple (X), set (*) nodes for the schema level
• Atomic (George, Ana, etc.), instance (P1, P2, etc), tuple and set nodes for the instance level
• Note the use of edges ∈ to indicate elements in a set, and the edge typ to indicate the type of class Person (these edges are changed to val in the instance level).
Page 17
GRAM
• At the schema level, we use generalized names for definition of entities and relations
• At the instance level, we create instance labels (e.g. PERSON 1) to represent entities, and use the edges (defined in the schema) to express relations between data and entities
Page 18
Object Exchange Model (OEM)
• Schema and instance are mixed
• The data is modeled beginning in a root node &pp, with children person nodes, each of them identified by an Object-ID (e.g. &p2)
• These nodes have children that contain data (name and lastname) or references to other nodes (parent)
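The OEM layout described above can be approximated with a dictionary keyed by object ID, where each object lists labeled children that are either atomic values or references to other object IDs. The root `&pp` and the ID `&p2` come from the slide. The remaining IDs, the names, and the helper functions are assumptions for this sketch.

```python
# Object ID -> list of (label, child). A child is either an atomic value
# or another object ID, recognised here by its leading "&".
objects = {
    "&pp": [("person", "&p1"), ("person", "&p2")],
    "&p1": [("name", "Ana"), ("lastname", "Jones"), ("parent", "&p2")],
    "&p2": [("name", "George"), ("lastname", "Jones")],
}


def is_reference(child):
    """True if the child points at another object instead of holding data."""
    return isinstance(child, str) and child.startswith("&")


def children(oid, label):
    """All children of an object that carry the given label."""
    return [child for (l, child) in objects[oid] if l == label]


def parent_names(oid):
    """Follow parent references and read each parent's name."""
    names = []
    for ref in children(oid, "parent"):
        if is_reference(ref):
            names.extend(children(ref, "name"))
    return names


people = children("&pp", "person")
names_of_p1_parents = parent_names("&p1")
```

Because no separate schema exists, the labels on the children are the only type information available, which is what "schema and instance are mixed" means in practice.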
Page 19
RDF
Page 20