spatial tree logics to reason about semistructured data




Spatial tree logics to reason about Semistructured Data

Speaker: Giovanni Conforti
Joint work with: Giorgio Ghelli

SEBD 2003

Dipartimento di Informatica – Università di Pisa


What I’m going to talk about …

A gentle introduction to Spatial Tree Logics (STL)
STL and Semistructured Data (SSD)
– Properties of SSD (constraints, types, queries) as Spatial Tree Logic (STL) formulas
– Decision problems for SSD as validity/satisfiability of STL formulas
Presentation of a decidable fragment of the TQL logic


Background: Spatial Logics

Modal logics to describe properties of structured worlds
Many applications: Ambient Calculus, π-calculus, tree-structured data, shared data structures, …
Spatial (and temporal) modal operators describe structure (and behavior)
Equivalence, model checking and validity problems have already been studied for many spatial logics
Many works involving Cardelli, Gordon, Caires, Ghelli, Gardner, …


A Simple Ground Spatial Tree Logic

Worlds = information trees: unordered (multisets of) labeled trees

F, F' ::= 0          (empty root)
        | n[F]       (an edge labelled n leading to the i.t. F)
        | F | F'     (the i.t. F "next to" the i.t. F')

Logic = propositional connectives + modal operators describing the structure

A, B ::= True | Not A | A And B | 0 | n[A] | A | B
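A minimal Python sketch of the data model above, assuming an encoding of our own that is not part of TQL: an information tree is a tuple of (label, subtree) edges, 0 is the empty tuple, and F | F' is multiset union. Since information trees are unordered, equality is checked on a canonical form whose siblings are sorted.

```python
# Hypothetical encoding of information trees (not from the slides):
# a tree is a tuple of (label, subtree) edges; sibling order is irrelevant.

EMPTY = ()  # 0: the empty tree


def edge(label, sub=EMPTY):
    """n[F]: a single edge labelled `label` leading to the tree `sub`."""
    return ((label, sub),)


def par(*trees):
    """F | F': juxtaposition, i.e. the multiset union of the edges."""
    return tuple(e for tree in trees for e in tree)


def canon(tree):
    """Canonical form: canonicalise every subtree, then sort the siblings."""
    return tuple(sorted((label, canon(sub)) for label, sub in tree))


def same_tree(f, g):
    """Equality of information trees up to the order of siblings."""
    return canon(f) == canon(g)


ghelli = edge("author", edge("Ghelli"))
albano = edge("author", edge("Albano"))
print(same_tree(par(ghelli, albano), par(albano, ghelli)))  # True: | is commutative
print(same_tree(par(ghelli, ghelli), ghelli))  # False: multisets, not sets
print(same_tree(par(ghelli, EMPTY), ghelli))  # True: 0 is the unit of |
```

Tuples are used so that trees are hashable and comparable; the sorting in `canon` works because every edge is a (string, tuple) pair.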


Examples

An information tree: a tree labelled book with 3 subtrees

F = book[ title[Databases[0]] | author[Ghelli[0]] | author[Albano[0]] ]

Some formulas describing trees

A = book[ author[Ghelli[0]] ]

B = book[ author[Ghelli[0]] | True ]

C = book[ Not (editor[True] | True) ]

D = book[ title[True] And author[True] ]

F ⊭ A   F |= B   F |= C   F ⊭ D
(A requires book to contain exactly one subtree; D requires a single edge labelled both title and author.)
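The satisfaction relation of the ground logic can be sketched as a small recursive model checker. This is an illustration under our own assumptions (trees as tuples of (label, subtree) edges, formulas as tagged tuples, helper names `t`, `Loc`, `Par` invented here), not the TQL implementation. The only non-obvious case is A | B, which tries every way of splitting the multiset of edges into two parts.

```python
from itertools import combinations

# Hypothetical encoding for this sketch (not TQL syntax): a tree is a tuple of
# (label, subtree) edges, and t("n", F1, ..., Fk) builds n[F1 | ... | Fk].
def t(label, *kids):
    return ((label, sum(kids, ())),)


# Formulas of the ground logic as tagged tuples.
TRUE, ZERO = ("true",), ("zero",)
def Not(a): return ("not", a)
def And(a, b): return ("and", a, b)
def Loc(n, a): return ("loc", n, a)  # n[A]
def Par(a, b): return ("par", a, b)  # A | B


def splits(f):
    """All ways of writing f as f1 | f2 (each edge goes left or right)."""
    idx = range(len(f))
    for k in range(len(f) + 1):
        for left in combinations(idx, k):
            yield (tuple(f[i] for i in left),
                   tuple(f[i] for i in idx if i not in left))


def models(f, a):
    """F |= A for the ground spatial tree logic."""
    op = a[0]
    if op == "true":
        return True
    if op == "not":
        return not models(f, a[1])
    if op == "and":
        return models(f, a[1]) and models(f, a[2])
    if op == "zero":
        return f == ()
    if op == "loc":  # exactly one edge, labelled n, whose subtree satisfies A
        return len(f) == 1 and f[0][0] == a[1] and models(f[0][1], a[2])
    if op == "par":  # some split F = F1 | F2 with F1 |= A and F2 |= B
        return any(models(f1, a[1]) and models(f2, a[2]) for f1, f2 in splits(f))
    raise ValueError(f"unknown operator {op!r}")


F = t("book", t("title", t("Databases")),
      t("author", t("Ghelli")), t("author", t("Albano")))

A = Loc("book", Loc("author", Loc("Ghelli", ZERO)))
B = Loc("book", Par(Loc("author", Loc("Ghelli", ZERO)), TRUE))
C = Loc("book", Not(Par(Loc("editor", TRUE), TRUE)))
D = Loc("book", And(Loc("title", TRUE), Loc("author", TRUE)))

for name, formula in [("A", A), ("B", B), ("C", C), ("D", D)]:
    print(name, models(F, formula))  # A False, B True, C True, D False
```

The enumeration of splits is exponential in the number of siblings; it is meant to make the semantics concrete, not to be efficient.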


First order and modal recursion

The full TQL logic extends the ground fragment with:
– X: tree variables
– x[A]: locations with label variables
– Exists x. A: quantification over labels (and trees)
– μξ. A: fixpoint (ξ positive in A)


Decision Problems

Given a formula A and a model F:
– Model checking: F |= A ?
– Query answering: find the values of x such that F |= A(x)
– Satisfiability sat(A): is there an F' such that F' |= A ?
– Validity vld(A): does F' |= A hold for every model F' ?
With negation in the logic: sat(A) ⇔ Not vld(Not A)
Implication: (∀F. F |= A implies F |= B) ⇔ vld(Not A Or B)

With the simple ground STL all these problems are decidable, but that is not true for satisfiability/validity if we introduce variables and quantification (or fixpoint)
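The duality between sat and vld can be illustrated with a bounded search. This sketch is not a decision procedure: it only enumerates trees up to a chosen height and width over a label set supplied by the caller (for instance the labels of the formula plus one fresh label, which is our assumption). A witness found by the search proves sat(A); finding none only means that no witness exists within the bounds. The encoding and helper names are hypothetical.

```python
from itertools import combinations, combinations_with_replacement

# Hypothetical encoding: trees are tuples of (label, subtree) edges,
# formulas are tagged tuples.
TRUE, ZERO = ("true",), ("zero",)
def Not(a): return ("not", a)
def And(a, b): return ("and", a, b)
def Or(a, b): return Not(And(Not(a), Not(b)))
def Loc(n, a): return ("loc", n, a)
def Par(a, b): return ("par", a, b)


def splits(f):
    """All ways of writing f as f1 | f2."""
    idx = range(len(f))
    for k in range(len(f) + 1):
        for left in combinations(idx, k):
            yield (tuple(f[i] for i in left),
                   tuple(f[i] for i in idx if i not in left))


def models(f, a):
    """F |= A for the ground logic."""
    op = a[0]
    if op == "true":
        return True
    if op == "not":
        return not models(f, a[1])
    if op == "and":
        return models(f, a[1]) and models(f, a[2])
    if op == "zero":
        return f == ()
    if op == "loc":
        return len(f) == 1 and f[0][0] == a[1] and models(f[0][1], a[2])
    if op == "par":
        return any(models(f1, a[1]) and models(f2, a[2]) for f1, f2 in splits(f))
    raise ValueError(f"unknown operator {op!r}")


def trees(labels, depth, width):
    """All information trees over `labels` of height <= depth with at most
    `width` edges under each node (each multiset generated once)."""
    if depth == 0:
        return [()]
    edges = [(n, sub) for n in labels for sub in trees(labels, depth - 1, width)]
    return [tuple(edges[i] for i in combo)
            for k in range(width + 1)
            for combo in combinations_with_replacement(range(len(edges)), k)]


def bounded_sat(a, labels, depth=2, width=2):
    """Return a witness F with F |= a, or None if none exists within the bounds."""
    return next((f for f in trees(labels, depth, width) if models(f, a)), None)


def bounded_vld(a, labels, depth=2, width=2):
    """vld(a) <=> Not sat(Not a): look for a counterexample to a."""
    return bounded_sat(Not(a), labels, depth, width) is None


def implies(a, b, labels, depth=2, width=2):
    """(forall F. F |= a implies F |= b) <=> vld(Not a Or b), within the bounds."""
    return bounded_vld(Or(Not(a), b), labels, depth, width)


# a[0] implies a[True] | True, but not the other way round.
print(implies(Loc("a", ZERO), Par(Loc("a", TRUE), TRUE), ["a", "b"]))  # True
print(implies(Par(Loc("a", TRUE), TRUE), Loc("a", ZERO), ["a", "b"]))  # False
```

A `False` from `implies` is always backed by a concrete counterexample; a `True` only says that no counterexample of the enumerated size exists.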


An SSD data model: labeled trees

[Figure: the articles data drawn as a labeled tree]

The same data as an information tree:

articles[ article[ author[Cardelli] | author[Gordon] | title[Anywhere] | date[Apr, 2000] ]
        | article[ author[Ghelli] | title[TQL] | conf[ETAPS]
                 | date[ month[Feb] | year[2001] ] ]
        ]


SSD Schema and Types

Schemas and types constrain the structure of SSD:
– DTDs;
– XML Schema;
– Regular Expression Types;

A schema: Article = article[ title[String], author[String]*, date[True]? ]

A recursive type: Section = section[ init[String], Section*, conc[String] ]
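DTD-style schemas like the two above constrain the ordered sequence of children, so they can be checked with a regular expression over child labels. A minimal sketch, assuming a hypothetical ordered encoding where a node is (label, children) and a string value is a leaf (text, []); the function names are ours, and the article check only tests title and author values for String.

```python
import re

# Hypothetical ordered encoding for this sketch: a node is (label, children),
# and a string value is a leaf (text, []).
ARTICLE_CONTENT = re.compile(r"title (author )*(date )?")


def is_string(node):
    """x[String]: exactly one child, and that child is a leaf."""
    _, children = node
    return len(children) == 1 and children[0][1] == []


def valid_article(node):
    """article[ title[String], author[String]*, date[True]? ] (document order matters)."""
    label, children = node
    sequence = "".join(child[0] + " " for child in children)
    return (label == "article"
            and ARTICLE_CONTENT.fullmatch(sequence) is not None
            and all(is_string(c) for c in children if c[0] in ("title", "author")))


def valid_section(node):
    """Section = section[ init[String], Section*, conc[String] ] (recursive)."""
    label, children = node
    if label != "section" or len(children) < 2:
        return False
    first, *middle, last = children
    return (first[0] == "init" and is_string(first)
            and last[0] == "conc" and is_string(last)
            and all(valid_section(c) for c in middle))


article = ("article", [("title", [("TQL", [])]),
                       ("author", [("Ghelli", [])]),
                       ("author", [("Conforti", [])])])
print(valid_article(article))  # True
```

Swapping the title and author children makes `valid_article` fail, which is exactly the document-order sensitivity that the unordered STL encoding gives up.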


Types in STL

Regular type expressions and DTDs can be expressed (up to document order) in STL extended with modal recursion.
A schema: article[ title[String], author[String]*, date[True]? ]
In STL: article[ title[True] | (μξ. 0 Or (author[True] | ξ)) | (date[True] Or 0) ]
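The recursive part μξ. 0 Or (author[True] | ξ) can be model-checked by unfolding the least fixpoint: a tree satisfies it if it is empty, or if it splits into a part satisfying author[True] next to a part that again satisfies the fixpoint. The sketch below adds such a `Star` node to the ground checker. The encoding and names are hypothetical; requiring the peeled-off part to be non-empty is our choice to guarantee termination, and it does not change the least fixpoint.

```python
from itertools import combinations

# Hypothetical encoding: trees are tuples of (label, subtree) edges.
def t(label, *kids):
    return ((label, sum(kids, ())),)


TRUE, ZERO = ("true",), ("zero",)
def Not(a): return ("not", a)
def And(a, b): return ("and", a, b)
def Or(a, b): return Not(And(Not(a), Not(b)))
def Loc(n, a): return ("loc", n, a)
def Par(a, b): return ("par", a, b)
def Star(a): return ("star", a)  # mu xi. 0 Or (a | xi)


def splits(f):
    idx = range(len(f))
    for k in range(len(f) + 1):
        for left in combinations(idx, k):
            yield (tuple(f[i] for i in left),
                   tuple(f[i] for i in idx if i not in left))


def models(f, a):
    op = a[0]
    if op == "true":
        return True
    if op == "not":
        return not models(f, a[1])
    if op == "and":
        return models(f, a[1]) and models(f, a[2])
    if op == "zero":
        return f == ()
    if op == "loc":
        return len(f) == 1 and f[0][0] == a[1] and models(f[0][1], a[2])
    if op == "par":
        return any(models(f1, a[1]) and models(f2, a[2]) for f1, f2 in splits(f))
    if op == "star":
        # Unfold the least fixpoint, peeling off a non-empty part satisfying a.
        if f == ():
            return True
        return any(f1 != () and models(f1, a[1]) and models(f2, a)
                   for f1, f2 in splits(f))
    raise ValueError(f"unknown operator {op!r}")


# article[ title[True] | (mu xi. 0 Or (author[True] | xi)) | (date[True] Or 0) ]
ARTICLE = Loc("article", Par(Loc("title", TRUE),
                             Par(Star(Loc("author", TRUE)),
                                 Or(Loc("date", TRUE), ZERO))))

ok = t("article", t("title", t("TQL")), t("author", t("Ghelli")), t("author", t("Conforti")))
print(models(ok, ARTICLE))  # True
```

Because | is commutative, a date listed before the title is still accepted: this is the "up to document order" caveat on the slide.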


SSD Constraints

Integrity constraints on the values of SSD:
– Inclusion constraints;
– Inverse relationship constraints;
– Key constraints;

Path expressions to navigate SSD: articles.article.title(x), root.section*.init(x)

Integrity constraints as inclusions of paths: student.takes => course.cno; an inverse relationship between student.takes and course.taken_by
Key constraints (first-order logic with paths):

∀x,y. article.title(x) And article.title(y) And (x = y) => (x == y)


Constraints in STL

Integrity constraints over SSD are easily expressed using STL with variables and quantification. Examples using the path abbreviation (.a[A] = a[A] | True):
– An inclusion constraint: ∀$X. .student.taking[$X] => .course.cno[$X]
– A key constraint for SSD:

∀$X. Not (.article.title[$X] | .article.title[$X])

Combining quantification with recursion we can express complex types and constraints (e.g. binary trees)
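The two constraints can be checked by model checking the formulas directly, with the tree variable $X ranging over the subtrees of the data. Restricting $X this way is our own simplification, but it is sound for these two formulas: when X does not occur in the data, the positive pattern .a.b[X] cannot match, so the key body is trivially true and the inclusion's premise is false. The encoding, the `Is` atom and the data layout (students and courses as top-level siblings) are hypothetical.

```python
from itertools import combinations

# Hypothetical encoding: trees are tuples of (label, subtree) edges;
# tuple concatenation f + g plays the role of f | g.
def t(label, *kids):
    return ((label, sum(kids, ())),)


TRUE = ("true",)
def Not(a): return ("not", a)
def And(a, b): return ("and", a, b)
def Loc(n, a): return ("loc", n, a)
def Par(a, b): return ("par", a, b)
def Is(x): return ("is", x)  # the tree variable $X, bound to the tree x


def canon(tree):
    return tuple(sorted((n, canon(sub)) for n, sub in tree))


def splits(f):
    idx = range(len(f))
    for k in range(len(f) + 1):
        for left in combinations(idx, k):
            yield (tuple(f[i] for i in left),
                   tuple(f[i] for i in idx if i not in left))


def models(f, a):
    op = a[0]
    if op == "true":
        return True
    if op == "not":
        return not models(f, a[1])
    if op == "and":
        return models(f, a[1]) and models(f, a[2])
    if op == "is":  # equality of information trees, up to sibling order
        return canon(f) == canon(a[1])
    if op == "loc":
        return len(f) == 1 and f[0][0] == a[1] and models(f[0][1], a[2])
    if op == "par":
        return any(models(f1, a[1]) and models(f2, a[2]) for f1, f2 in splits(f))
    raise ValueError(f"unknown operator {op!r}")


def dot(labels, a):
    """Path abbreviation: .a.b[A] = a[ b[A] | True ] | True."""
    for n in reversed(labels):
        a = Par(Loc(n, a), TRUE)
    return a


def subtrees(f):
    """Every tree found below an edge of f: the candidate values for $X."""
    for _, sub in f:
        yield sub
        yield from subtrees(sub)


def forall_x(f, body):
    return all(models(f, body(x)) for x in subtrees(f))


def key(f):
    """Forall $X. Not (.article.title[$X] | .article.title[$X])"""
    def body(x):
        p = dot(["article", "title"], Is(x))
        return Not(Par(p, p))
    return forall_x(f, body)


def inclusion(f):
    """Forall $X. .student.taking[$X] => .course.cno[$X], with A => B as Not (A And Not B)."""
    def body(x):
        return Not(And(dot(["student", "taking"], Is(x)),
                       Not(dot(["course", "cno"], Is(x)))))
    return forall_x(f, body)


arts = (t("article", t("title", t("TQL")), t("author", t("Ghelli")))
        + t("article", t("title", t("Anywhere")), t("author", t("Cardelli"))))
print(key(arts))  # True: the two titles differ
print(key(arts + t("article", t("title", t("TQL")))))  # False: TQL appears twice
```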


SSD Queries

Many query languages (XQuery, Lorel, YATL, …): essentially, queries are expressions that select data reachable through paths and construct new results
TQL is a peculiar query language based on spatial tree logic: selection is done by pattern matching against STL formulas
The TQL logic expresses all regular path expressions
Query answering is implemented for the full TQL logic
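Query answering with a label variable (find the x such that F |= A(x)) can be sketched by trying candidate labels and model checking each instance of the pattern. This is our own illustration, not how TQL evaluates queries. It only tries labels that occur in F, which is enough for positive patterns like the two below (x must label an existing edge); with negation the set of answers could be infinite. The encoding and helper names are hypothetical.

```python
from itertools import combinations

# Hypothetical encoding: trees are tuples of (label, subtree) edges.
def t(label, *kids):
    return ((label, sum(kids, ())),)


TRUE, ZERO = ("true",), ("zero",)
def Not(a): return ("not", a)
def And(a, b): return ("and", a, b)
def Loc(n, a): return ("loc", n, a)
def Par(a, b): return ("par", a, b)


def splits(f):
    idx = range(len(f))
    for k in range(len(f) + 1):
        for left in combinations(idx, k):
            yield (tuple(f[i] for i in left),
                   tuple(f[i] for i in idx if i not in left))


def models(f, a):
    op = a[0]
    if op == "true":
        return True
    if op == "not":
        return not models(f, a[1])
    if op == "and":
        return models(f, a[1]) and models(f, a[2])
    if op == "zero":
        return f == ()
    if op == "loc":
        return len(f) == 1 and f[0][0] == a[1] and models(f[0][1], a[2])
    if op == "par":
        return any(models(f1, a[1]) and models(f2, a[2]) for f1, f2 in splits(f))
    raise ValueError(f"unknown operator {op!r}")


def labels(f):
    """Every label occurring in the tree f."""
    found = set()
    for n, sub in f:
        found.add(n)
        found |= labels(sub)
    return found


def answers(f, pattern):
    """The labels x (among those occurring in f) such that f |= pattern(x)."""
    return sorted(x for x in labels(f) if models(f, pattern(x)))


def tql_authors(x):
    """articles[ article[ title[TQL[0]] | author[x[0]] | True ] | True ]"""
    return Loc("articles", Par(Loc("article", Par(Loc("title", Loc("TQL", ZERO)),
                                                  Par(Loc("author", Loc(x, ZERO)), TRUE))),
                               TRUE))


def any_author(x):
    """articles[ article[ author[x[0]] | True ] | True ]"""
    return Loc("articles", Par(Loc("article", Par(Loc("author", Loc(x, ZERO)), TRUE)), TRUE))


DB = t("articles",
       t("article", t("author", t("Cardelli")), t("author", t("Gordon")),
         t("title", t("Anywhere")), t("date", t("Apr, 2000"))),
       t("article", t("author", t("Ghelli")), t("title", t("TQL")),
         t("conf", t("ETAPS")), t("date", t("month", t("Feb")), t("year", t("2001")))))

print(answers(DB, tql_authors))  # ['Ghelli']
print(answers(DB, any_author))  # ['Cardelli', 'Ghelli', 'Gordon']
```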


SSD Decision Problems with STL

Given a data source F, a formula A representing a schema, and formulas B, B' representing integrity constraints:

Validation: F |= A, F |= B, F |= A And B

Schema/constraint consistency: sat(A), sat(B), sat(A And B)

Constraint implication (inference): vld(B => B')

Constraint implication in the presence of a schema: vld(A And B => B')


A decidable TQL sublogic

STL are good at expressing types, constraints and queries over SSD, but:
– Validity in the full TQL logic is undecidable
– The ground logic is decidable, but it is not expressive enough for all interesting types and constraints

We are looking for a decidable fragment of TQL expressive enough to reason about SSD. A first step in this direction is the following logic…


A decidable TQL sublogic

A, B ::= True | A And B | Not A | 0 | %[A] | n[A] | A | B

We can define useful operators to describe types and constraints in this decidable logic:
String =def %[0]
Tree =def %[True]
A Or B =def Not (Not A And Not B)
A => B =def Not A Or B
A exists =def A | True
A foreach =def Not (Not A | True)
A foreachTree =def (Tree => A) foreach

Note: if A => Tree, we can use A foreachTree to express A*
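The derived operators can be transcribed literally on top of a checker for the core grammar. In this sketch we read %[A] as "a single edge with any label whose subtree satisfies A", which is what the definitions String =def %[0] and Tree =def %[True] suggest; the encoding and function names are hypothetical. A foreach holds when every part of the tree satisfies A, so A foreachTree holds when every single edge satisfies A.

```python
from itertools import combinations

# Hypothetical encoding: trees are tuples of (label, subtree) edges;
# tuple concatenation f + g plays the role of f | g.
def t(label, *kids):
    return ((label, sum(kids, ())),)


TRUE, ZERO = ("true",), ("zero",)
def Not(a): return ("not", a)
def And(a, b): return ("and", a, b)
def Loc(n, a): return ("loc", n, a)  # n[A]
def AnyLoc(a): return ("any", a)  # %[A], read here as: any single label
def Par(a, b): return ("par", a, b)

# Derived operators, transcribed from the definitions above.
STRING = AnyLoc(ZERO)  # String =def %[0]
TREE = AnyLoc(TRUE)  # Tree =def %[True]
def Or(a, b): return Not(And(Not(a), Not(b)))
def Implies(a, b): return Or(Not(a), b)
def Exists(a): return Par(a, TRUE)  # A exists
def Foreach(a): return Not(Par(Not(a), TRUE))  # A foreach
def ForeachTree(a): return Foreach(Implies(TREE, a))  # A foreachTree


def splits(f):
    idx = range(len(f))
    for k in range(len(f) + 1):
        for left in combinations(idx, k):
            yield (tuple(f[i] for i in left),
                   tuple(f[i] for i in idx if i not in left))


def models(f, a):
    op = a[0]
    if op == "true":
        return True
    if op == "not":
        return not models(f, a[1])
    if op == "and":
        return models(f, a[1]) and models(f, a[2])
    if op == "zero":
        return f == ()
    if op == "loc":
        return len(f) == 1 and f[0][0] == a[1] and models(f[0][1], a[2])
    if op == "any":
        return len(f) == 1 and models(f[0][1], a[1])
    if op == "par":
        return any(models(f1, a[1]) and models(f2, a[2]) for f1, f2 in splits(f))
    raise ValueError(f"unknown operator {op!r}")


authors = t("author", t("Ghelli")) + t("author", t("Albano"))
print(models(authors, ForeachTree(Loc("author", STRING))))  # True
print(models(authors + t("title", t("TQL")), ForeachTree(Loc("author", STRING))))  # False
```

Here ForeachTree(author[String]) plays the role of (author[String])*: it accepts any number of author edges, including none.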


Conclusions and Future Directions

STL provide a powerful unified framework for types, constraints, and queries over SSD and XML
This framework is worth studying; it may lead to:
– A good formalization of "SSD reasoning" in terms of model checking and validity
– Generalization of results on reasoning about types and constraints
– Query optimization strategies guided by types/constraints

(Some) future steps:
– Extend the decidable logic to express integrity constraints
– Model ordered trees


Spatial tree logics to reason about Constraints and Types

Speaker: Giovanni Conforti
Supervisor: Giorgio Ghelli

Università di Pisa: Ph.D. Proposal


SSD Query Optimization

The TQL pattern clause uses STL formulas…
We can use validated constraints C and types T as information to optimize queries (e.g. statically detecting an empty result)
A query from Q |= A select Q' can be rewritten as from Q |= B select Q' for each B such that

vld((C And T) => (A <=> B))


Research Plan: planning

The challenge is ambitious and should be understood as a long-term direction of our work. We identify some initial tasks we expect to accomplish:

– Comparison of STL with other formalisms for types and constraints
– Find a "satisfactory" decidable logic fragment to express types (and constraints)
– Write a preliminary formal system for constraint (and type) implication

We plan two stages:
1. (2nd year) In-depth study of the basic theories (tree automata, modal logics, description logics) and investigation of the initial tasks
2. (3rd year) Completion of the initial tasks and integration of the results into a unified formal framework


Research Plan: directions

Main directions to investigate:
– Expressivity of Spatial Tree Logics (in particular for standard type and constraint specifications)
– Decidability and complexity of model checking and validity for fragments (or extensions) of the TQL logic
– Reformulation (or generalization) of known results about reasoning and optimization over SSD

Other interesting directions:
– Implementation of a query rewriter guided by constraints and types
– Extensions of the logic to model order, data updates, private names


Background: Semi-structured data (SSD)

Semi-structured data (SSD) are used to:
– model and query the web (HTML, XML, …);
– store experimental data;
– integrate heterogeneous databases;
– …

SSD are:
– self-describing (structure is implicit);
– irregular;
– always evolving