Spatial tree logics to reason about Semistructured Data
Speaker: Giovanni Conforti Joint work with: Giorgio Ghelli
SEBD 2003
Dipartimento di Informatica – Università di Pisa
What I’m going to talk about …
– A gentle introduction to Spatial Tree Logics (STL)
– STL and Semistructured Data (SSD):
  – Properties of SSD (Constraints, Types, Queries) → Spatial Tree Logic (STL) Formulas
  – Decision Problems for SSD → Validity/Satisfiability of STL Formulas
– Presentation of a decidable fragment of the TQL logic
Background: Spatial Logics
– Modal logics to describe properties of structured worlds
– Many applications: Ambient Calculus, π-calculus, tree-structured data, shared data structures, …
– Spatial (and temporal) modal operators to describe structure (and behavior)
– Equivalence, model checking and validity problems have already been studied for many spatial logics
– Many works involving Cardelli, Gordon, Caires, Ghelli, Gardner, …
A Simple Ground Spatial Tree Logic
Worlds = Information trees: unordered (multisets of) labeled trees
F, F’ ::= 0 (empty root)
  | n[F] (an edge labelled n leading to the i.t. F)
  | F | F’ (the i.t. F “next to” the i.t. F’)
Logic = propositional logic connectives + modal operators describing the structure
A, B ::= True | Not A | A And B (propositional connectives)
  | 0 | n[A] | A | B (spatial operators)
Examples
An information tree (a tree labelled book with 3 subtrees):
F = book[ title[Databases[0]] | author[Ghelli[0]] | author[Albano[0]] ]
Some formulas describing trees:
A = book[ author[Ghelli[0]] ]
B = book[ author[Ghelli[0]] | True ]
C = book[ Not (editor[True] | True) ]
D = book[ title[True] And author[True] ]
Satisfaction: F |≠ A, F |= B, F |= C, F |≠ D
(A and D fail because n[A] describes a tree consisting of exactly one edge labelled n, while book has three subtrees.)
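The satisfaction relation for the ground logic can be checked mechanically. The following is a minimal sketch, not the TQL implementation: it assumes a tree is encoded as a tuple of (label, subtree) pairs and a formula as a tagged tuple, and the helper names (`edge`, `par`, `splits`, `sat`) are hypothetical. Composition A | B is decided by trying every way to split the multiset of edges, which is exponential but adequate for small examples.

```python
from itertools import combinations

# An information tree is a tuple of (label, subtree) edges; () is the empty tree 0.
def edge(label, sub=()):
    return ((label, sub),)

def par(*trees):
    """Composition F | F': put the edges of all the trees side by side."""
    return tuple(e for t in trees for e in t)

def splits(tree):
    """Yield every way to divide the edges of `tree` into two parts."""
    idx = range(len(tree))
    for k in range(len(tree) + 1):
        for left in combinations(idx, k):
            yield (tuple(tree[i] for i in left),
                   tuple(tree[i] for i in idx if i not in left))

def sat(tree, f):
    """Decide tree |= f for the ground logic."""
    op = f[0]
    if op == "true":
        return True
    if op == "not":
        return not sat(tree, f[1])
    if op == "and":
        return sat(tree, f[1]) and sat(tree, f[2])
    if op == "zero":
        return tree == ()
    if op == "edge":  # n[A]: exactly one edge, labelled n, whose subtree satisfies A
        return len(tree) == 1 and tree[0][0] == f[1] and sat(tree[0][1], f[2])
    if op == "par":   # A | B: some split of the edges satisfies A and B
        return any(sat(l, f[1]) and sat(r, f[2]) for l, r in splits(tree))
    raise ValueError(op)

TRUE, ZERO = ("true",), ("zero",)
def E(n, a): return ("edge", n, a)
def P(a, b): return ("par", a, b)

F = edge("book", par(edge("title", edge("Databases")),
                     edge("author", edge("Ghelli")),
                     edge("author", edge("Albano"))))
ghelli = E("author", E("Ghelli", ZERO))
A = E("book", ghelli)
B = E("book", P(ghelli, TRUE))
C = E("book", ("not", P(E("editor", TRUE), TRUE)))
D = E("book", ("and", E("title", TRUE), E("author", TRUE)))

print([sat(F, x) for x in (A, B, C, D)])  # [False, True, True, False]
```

Only B and C hold for F: A and D each demand that the content of book be a single edge.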
First order and modal recursion
The full TQL logic extends the ground fragment with:
– X tree variables
– x[A] locations with label variables
– Exists x. A quantification over labels (and trees)
– μξ. A fixpoint (ξ positive in A)
Decision Problems
Given a formula A and a model F:
– Model checking: F |= A ?
– Query answering: find the values of x such that F |= A(x)
– Satisfiability sat(A): does there exist an F’ such that F’ |= A ?
– Validity vld(A): is it true that, for each model F’, F’ |= A ?
Negation in the logic: sat(A) ⇔ Not vld(Not A)
Implication: (∀F. F |= A implies F |= B) ⇔ vld(Not A Or B)
With the simple ground STL all these problems are decidable, but that is not true for satisfiability/validity if we introduce variables and quantification (or fixpoint)
A SSD Data model: labeled trees
[Figure: a labeled tree rooted at articles with two article subtrees; the first has authors Cardelli and Gordon and date Apr, 2000, the second has author Ghelli, title TQL, and a date with month Feb and year 2001.]
The same data written as information trees:
articles[
  article[ author[Cardelli] | author[Gordon] | title[Anywhere] | date[Apr, 2000] ] |
  article[ author[Ghelli] | title[TQL] | conf[ETAPS] | date[ month[Feb] | year[2001] ] ]
]
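To make the "unordered" part of the data model concrete, here is a small sketch under assumptions of ours: trees are tuples of (label, subtree) pairs, leaf values such as Cardelli are labels over the empty tree, and `edge`, `par`, `canon` and `show` are hypothetical helper names. Because a tuple is ordered, two encodings of the same multiset are compared after a recursive sort.

```python
def edge(label, sub=()):
    return ((label, sub),)

def par(*trees):
    """Composition: put the edges of all the trees side by side."""
    return tuple(e for t in trees for e in t)

def canon(tree):
    """Sort edges recursively so that equal multisets get equal encodings."""
    return tuple(sorted((label, canon(sub)) for label, sub in tree))

def show(tree):
    """Render a tree in the term notation of the slides (0 for the empty tree)."""
    return " | ".join(f"{label}[{show(sub)}]" for label, sub in tree) or "0"

a1 = edge("article", par(edge("author", edge("Cardelli")),
                         edge("author", edge("Gordon")),
                         edge("title", edge("Anywhere")),
                         edge("date", edge("Apr, 2000"))))
a2 = edge("article", par(edge("author", edge("Ghelli")),
                         edge("title", edge("TQL")),
                         edge("conf", edge("ETAPS")),
                         edge("date", par(edge("month", edge("Feb")),
                                          edge("year", edge("2001"))))))

db = edge("articles", par(a1, a2))
swapped = edge("articles", par(a2, a1))   # same data, articles listed in the other order

print(show(edge("author", edge("Ghelli"))))  # author[Ghelli[0]]
print(canon(db) == canon(swapped))           # True
```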
SSD Schema and Types
Schemas and types constrain the structure of SSD:
– DTDs;
– XML Schema;
– Regular Expression Types.
A schema:
Article = article[ title[String], author[String]*, date[True]? ]
A recursive type:
Section = section[ init[String], Section*, conc[String] ]
Types in STL
Regular type expressions and DTDs can be expressed (up to document order) in STL extended with modal recursion.
A schema:
article[ title[String], author[String]*, date[True]? ]
In STL:
article[ title[True] | (μξ. 0 Or (author[True] | ξ)) | (date[True] Or 0) ]
SSD Constraints
Integrity constraints on the values of SSD:
– Inclusion constraints;
– Inverse relationship constraints;
– Key constraints.
Path expressions to navigate on SSD:
articles.article.title(x)
root.section*.init(x)
Integrity constraints as inclusion of paths:
student.takes => course.cno
student.takes ⇄ course.taken_by
Key constraints (first order logic with paths):
∀x,y. article.title(x) And article.title(y) And (x = y) => (x == y)
Constraints in STL
Integrity constraints over SSD are easily expressed using STL with variables and quantification.
Examples using the path abbreviation .a[A] = a[A] | True:
– An inclusion constraint: ∀$X. .student.taking[$X] => .course.cno[$X]
– A key constraint for SSD: ∀$X. Not (.article.title[$X] | .article.title[$X])
Combining quantification with recursion we can express complex types and constraints (e.g. binary trees)
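As an illustration of what the two example constraints demand of a data source, the sketch below checks them directly on a tree rather than through the logic. It assumes trees are encoded as tuples of (label, subtree) pairs; `values`, `key_holds` and `inclusion_holds` are hypothetical names, and the course and article data are invented for the example. The key check mirrors the formula: it fails only when two different article edges carry the same title value.

```python
from collections import Counter

def edge(label, sub=()):
    return ((label, sub),)

def par(*trees):
    return tuple(e for t in trees for e in t)

def canon(tree):
    """Order-independent encoding of a tree, so values can be compared as multisets."""
    return tuple(sorted((label, canon(sub)) for label, sub in tree))

def values(tree, outer, inner):
    """All values X such that tree |= .outer.inner[X]."""
    return {canon(x) for l, sub in tree if l == outer
                     for l2, x in sub if l2 == inner}

def inclusion_holds(tree, p, q):
    """Every X reachable along path p is also reachable along path q."""
    return values(tree, *p) <= values(tree, *q)

def key_holds(tree, outer, inner):
    """No value X occurs under `inner` in two different `outer` edges."""
    seen = Counter()
    for label, sub in tree:
        if label == outer:
            # a set, so repeats inside one `outer` edge are counted once
            seen.update({canon(x) for l, x in sub if l == inner})
    return all(n == 1 for n in seen.values())

good = par(edge("article", par(edge("title", edge("TQL")),
                               edge("author", edge("Ghelli")))),
           edge("article", par(edge("title", edge("Anywhere")),
                               edge("author", edge("Cardelli")))))
bad = par(good, edge("article", edge("title", edge("TQL"))))

school = par(edge("student", edge("taking", edge("cs101"))),
             edge("course", edge("cno", edge("cs101"))))

print(key_holds(good, "article", "title"))   # True
print(key_holds(bad, "article", "title"))    # False
print(inclusion_holds(school, ("student", "taking"), ("course", "cno")))  # True
```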
SSD Queries
– Many query languages (XQuery, Lorel, YATL, …): essentially, queries are expressions that select data reachable from paths and construct new results
– TQL: a peculiar query language based on spatial tree logic; selection is done by pattern matching over STL formulas
– The TQL logic expresses all regular path expressions
– Query answering is implemented for the full TQL logic
SSD Decision Problems with STL
Given a data source F, a formula A representing a schema, and formulas B, B’ representing integrity constraints:
– Validation: F |= A, F |= B, F |= A And B
– Schema/constraint consistency: sat(A), sat(B), sat(A And B)
– Constraint implication (inference): vld(B => B’)
– Constraint implication in presence of a schema: vld(A And B => B’)
A decidable TQL sublogic
STL are good for expressing types, constraints and queries over SSD, but:
– Validity in the full TQL logic is undecidable
– The ground logic is decidable, but it is not enough to express all interesting types and constraints
We are looking for a decidable fragment of TQL expressive enough to reason about SSD.
A first step in this direction is the following logic…
A decidable TQL sublogic
A, B ::= True | A And B | Not A | 0 | %[A] | n[A] | A | B
We can define useful operators to describe types and constraints in this decidable logic:
String =def %[0]
Tree =def %[True]
A Or B =def Not (Not A And Not B)
A => B =def Not A Or B
A^exists =def A | True
A^foreach =def Not (Not A | True)
A^foreachTree =def (Tree => A)^foreach
Note: if A => Tree we can use A^foreachTree to express A*
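The derived operators can be tried out on small trees. This is a minimal sketch under our own assumptions: trees are tuples of (label, subtree) pairs, formulas are tagged tuples, the wildcard %[A] is encoded as an edge formula with label `None`, and all function names are hypothetical. The checker enumerates every split of the edges for A | B, so it only illustrates the semantics and says nothing about how validity is decided.

```python
from itertools import combinations

def splits(tree):
    """Yield every way to divide the edges of `tree` into two parts."""
    idx = range(len(tree))
    for k in range(len(tree) + 1):
        for left in combinations(idx, k):
            yield (tuple(tree[i] for i in left),
                   tuple(tree[i] for i in idx if i not in left))

def sat(tree, f):
    op = f[0]
    if op == "true":
        return True
    if op == "not":
        return not sat(tree, f[1])
    if op == "and":
        return sat(tree, f[1]) and sat(tree, f[2])
    if op == "zero":
        return tree == ()
    if op == "edge":  # n[A], or %[A] when the label f[1] is None
        return (len(tree) == 1 and f[1] in (None, tree[0][0])
                and sat(tree[0][1], f[2]))
    if op == "par":
        return any(sat(l, f[1]) and sat(r, f[2]) for l, r in splits(tree))
    raise ValueError(op)

# Primitive connectives of the sublogic
TRUE, ZERO = ("true",), ("zero",)
def Not(a): return ("not", a)
def And(a, b): return ("and", a, b)
def Edge(n, a): return ("edge", n, a)
def Any(a): return ("edge", None, a)       # %[A]
def Par(a, b): return ("par", a, b)

# Derived operators, following the definitions on the slide
STRING = Any(ZERO)
TREE = Any(TRUE)
def Or(a, b): return Not(And(Not(a), Not(b)))
def Implies(a, b): return Or(Not(a), b)
def exists(a): return Par(a, TRUE)
def foreach(a): return Not(Par(Not(a), TRUE))
def foreach_tree(a): return foreach(Implies(TREE, a))

authors = (("author", (("Ghelli", ()),)), ("author", (("Albano", ()),)))
mixed = authors + (("title", (("TQL", ()),)),)
star = foreach_tree(Edge("author", STRING))   # plays the role of author[String]*

print(sat(authors, star), sat(mixed, star), sat((), star))  # True False True
```

Here `star` holds exactly when every top-level edge is an author edge over a string, which is the reading of A* the slide's note refers to.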
Conclusions and Future Directions
STL provide a powerful unified framework for types, constraints, and queries over SSD and XML.
This framework is worth studying; it may lead to:
– A good formalization of “SSD reasoning” in terms of model checking and validity
– Generalization of results on reasoning about types and constraints
– Query optimization strategies guided by types/constraints
(Some) future steps:
– Extend the decidable logic to express integrity constraints
– Modeling ordered trees
Spatial tree logics to reason about Constraints and Types
Speaker: Giovanni Conforti Supervisor: Giorgio Ghelli
Università di Pisa: Ph.D. Proposal
SSD Query Optimization
The TQL pattern clause uses STL formulas…
We can use validated constraints C and types T as information to optimize queries (e.g. statically determining that a result is empty).
A query
from Q |= A select Q’
can be rewritten as
from Q |= B select Q’
for each B such that
(C And T) => (A <=> B)
Research Plan: planning
The challenge is ambitious; it must be understood as a long-term direction of our work.
We address some initial tasks we expect to accomplish:
– Comparison of STL with other formalisms for types and constraints
– Find a “satisfactory” decidable logic fragment to express types (and constraints)
– Write a preliminary formal system for constraint (and type) implication
We plan two stages:
1. (2nd year) In-depth study of basic theories (tree automata, modal logics, description logics) and investigation of the initial tasks
2. (3rd year) Completion of the initial tasks and integration of the results in a unified formal framework
Research Plan: directions
Main directions, investigating:
– Expressivity of Spatial Tree Logics (in particular for standard Types and Constraints specifications)
– Decidability and complexity of model checking and validity for fragments (or extensions) of the TQL logic
– Reformulation (or generalization) of known results about reasoning and optimization over SSD
Other interesting directions:
– Implementation of a query rewriter guided by constraints and types
– Extensions to the logic to model order, data updates, private names
Background: Semi-structured data (SSD)
Semi-structured data (SSD) are used to:
– model and query the web (HTML, XML, …);
– store experimental data;
– integrate heterogeneous databases;
– …
SSD are:
– Self-describing (structure is implicit);
– Irregular;
– Always evolving