Università degli Studi di Pisa
Speaker: Giovanni Conforti
Joint work with: Orlando Ferrara and Giorgio Ghelli
TQL Algebra and its Implementation
IFIP TCS @ 2002 Montreal, 28th August
What I’m going to talk about…
• Short introduction to SSD and SSD query languages.
• Tree logic and TQL overview.
• TQL Algebra motivations.
• TQL Algebra presentation.
• Translation algorithm.
• Translation correctness.
• Our implementation model.
• Conclusions and future work.
Semi-structured Data
• Semi-Structured Data (SSD) are used to:
  • model and query the web (HTML, XML, …);
  • store experimental data;
  • integrate heterogeneous databases;
  • …
• The structure of SSD is:
  • irregular;
  • implicit;
  • always evolving;
  • …
Data model: SSD as labelled trees (Example)
[Figure: the example below drawn as a labelled tree rooted at articles, with article, author, title, conf, date, month and year nodes.]
articles[
  article[ author[Cardelli] | author[Gordon] | title[Anywhere] | date[Apr, 2000] ] |
  article[ author[Ghelli] | title[TQL] | conf[ETAPS] |
           date[ month[Feb] | year[2001] ] ]
]
SSD query languages
• Just as tabular data have SQL and the relational algebra, we want to define a query language and an algebra for SSD.
• Specifying and developing a good query language for SSD (in particular for XML) is one of the main current challenges for the database and web research communities.
• After several proposals (Lorel, YATL, XML-QL, XDuce, etc.), the W3C has introduced the XQuery standard, whose specification and implementation are work in progress.
TQL – the idea
• Extend the ambient logic to describe properties of SSD, obtaining a tree logic.
• The tree logic is a modal logic well suited to express:
  • properties of the horizontal and vertical structure of SSD;
  • properties whose specification requires negation, recursion or universal quantification;
  • constraints and types of SSD.
• Introduce free variables inside tree logic formulas and use a pattern-matching approach to bind these variables to values inside a given data source => a new SSD query strategy: TQL.
TQL – the language
• Based on three clauses:
  • matching;
  • filtering;
  • reconstruction.
  Matching and filtering are fused in the binding operator.
• Integrating logical expressions and queries in the same language gives several advantages in expressivity and optimization (e.g. type-based rewriting).
• This talk, however, is not about the TQL language but about the TQL Algebra, so I will introduce only the aspects of TQL needed to understand our work on the algebra.
• To learn more about TQL, see [WebDB2002] and [ETAPS2000].
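Tree Logics – syntax
$$A, B ::= \mathbf{T} \mid \neg A \mid A \wedge B \mid \exists x.\,A \mid \exists X.\,A \mid \mathbf{0} \mid L[A] \mid A \,|\, B \mid L \sim L' \mid X \mid \xi \mid \mu\xi.\,A$$
Negation allows the definition of derived operators:
$$\mathbf{F} \qquad A \vee B \qquad \forall x.\,A \qquad \forall X.\,A \qquad L\llbracket A \rrbracket \qquad A \,\|\, B$$
Path expressions:
• regular expressions;
• a compact way to express constraints on paths over trees;
• can be defined using tree logic formulas.
E.g. .m.n[A] is defined as m[ n[A] | T ] | T.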
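Reading the derived operators above as De Morgan duals of the primitive ones (our reconstruction: the slide only names them, but these encodings agree with the satisfaction clauses for m⟦A⟧ and A || B on the next slide) gives:

$$
\begin{aligned}
\mathbf{F} &\triangleq \neg\mathbf{T} \\
A \vee B &\triangleq \neg(\neg A \wedge \neg B) \\
\forall x.\,A &\triangleq \neg\exists x.\,\neg A \\
\forall X.\,A &\triangleq \neg\exists X.\,\neg A \\
L\llbracket A \rrbracket &\triangleq \neg L[\neg A] \\
A \,\|\, B &\triangleq \neg(\neg A \,|\, \neg B)
\end{aligned}
$$

For instance, a forest fails ¬A | ¬B exactly when every split F = F' | F'' has F' ⊨ A or F'' ⊨ B, which is the universal reading of ||.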
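Tree Logics – describing sets of trees (forests)
$$
\begin{array}{lll}
F \models \mathbf{0} & \text{iff} & F = \mathbf{0} \\
F \models A \wedge B & \text{iff} & F \models A \text{ and } F \models B \\
F \models m[A] & \text{iff} & F = m[F'] \text{ and } F' \models A \\
F \models A \,|\, B & \text{iff} & \exists F', F''.\; F = F' \,|\, F'' \text{ and } F' \models A \text{ and } F'' \models B \\
F \models m\llbracket A \rrbracket & \text{iff} & \forall F'.\; F = m[F'] \Rightarrow F' \models A \\
F \models A \,\|\, B & \text{iff} & \forall F', F''.\; F = F' \,|\, F'' \Rightarrow F' \models A \text{ or } F'' \models B \\
F \models \mathbf{T} & & \text{always} \\
F \models X & \text{iff} & F = \rho(X) \\
F \models \neg A & \text{iff} & \neg(F \models A)
\end{array}
$$
…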
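The clauses above can be executed directly on finite forests. The following Python sketch is a minimal model checker for the variable-free fragment (0, T, ¬, ∧, m[A], A | B), with m⟦A⟧ and A || B obtained through negation. The tuple encodings of trees and formulas and all function names (`sat`, `splits`, `box`, `par`) are our own illustrative assumptions, not part of the TQL system; forests are treated as multisets by enumerating every split F = F' | F''.

```python
# Minimal model checker for a variable-free fragment of the tree logic.
# Assumed encodings (ours): a tree is (label, forest), a forest is a tuple
# of trees, and a formula is a nested tuple tagged with its operator.
from typing import Iterator, Tuple

ZERO = ("zero",)
TRUE = ("true",)


def edge(m, a):   # m[A]
    return ("edge", m, a)


def comp(a, b):   # A | B
    return ("comp", a, b)


def conj(a, b):   # A ∧ B
    return ("and", a, b)


def neg(a):       # ¬A
    return ("not", a)


def box(m, a):    # m⟦A⟧, encoded as ¬m[¬A]
    return neg(edge(m, neg(a)))


def par(a, b):    # A || B, encoded as ¬(¬A | ¬B)
    return neg(comp(neg(a), neg(b)))


def splits(forest: tuple) -> Iterator[Tuple[tuple, tuple]]:
    """Yield every decomposition F = F' | F'' (forests are multisets)."""
    for mask in range(2 ** len(forest)):
        left = tuple(t for i, t in enumerate(forest) if (mask >> i) & 1)
        right = tuple(t for i, t in enumerate(forest) if not (mask >> i) & 1)
        yield left, right


def sat(forest: tuple, phi: tuple) -> bool:
    """F ⊨ phi, following the satisfaction clauses above."""
    kind = phi[0]
    if kind == "zero":
        return len(forest) == 0
    if kind == "true":
        return True
    if kind == "not":
        return not sat(forest, phi[1])
    if kind == "and":
        return sat(forest, phi[1]) and sat(forest, phi[2])
    if kind == "edge":
        _, m, a = phi
        return len(forest) == 1 and forest[0][0] == m and sat(forest[0][1], a)
    if kind == "comp":
        return any(sat(left, phi[1]) and sat(right, phi[2])
                   for left, right in splits(forest))
    raise ValueError(f"unknown formula {phi!r}")


content = (("b", ()), ("c", ()))   # the forest b[] | c[]
doc = (("a", content),)            # the forest a[ b[] | c[] ]

dot_path = comp(edge("a", comp(edge("b", ZERO), TRUE)), TRUE)  # .a.b[0]
print(sat(doc, dot_path))                                       # True
print(sat(content, comp(edge("b", TRUE), edge("c", TRUE))))     # True
print(sat(content, par(edge("b", TRUE), edge("c", TRUE))))      # False
```

The last check is false because the split F' = 0, F'' = b[] | c[] satisfies neither b[T] on the left nor c[T] on the right, which is exactly what the universal reading of || rules out.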
TQL Queries
Syntax:
$$Q, Q' ::= \mathbf{0} \mid X \mid L[Q] \mid f(Q) \mid Q \,|\, Q' \mid \textbf{from } Q \models A \textbf{ select } Q'$$
Example:
  result[ from $articles ⊨ articles[ article[ title[$T] | date[$D] | T ] | T ]
          select article[ title[$T] | date[$D] ] ]
Bindings produced by the from clause:
  $T = Anywhere,  $D = Apr, 2000
  $T = TQL,       $D = month[Feb] | year[2001]
Result:
  result[ article[ title[Anywhere] | date[Apr, 2000] ] | article[ title[TQL] | date[ month[Feb] | year[2001] ] ] ]
TQL Algebra motivations – in general
In general, an intermediate algebra ensures:
• transformability;
• executability.
[Figure: query processing pipeline: TQL query → Parser → Translation → Execution, with TQL rewriting, the algebraic expression, TQL Algebra rewriting and physical optimization as intermediate stages.]
TQL Algebra motivations – TQL case
• No current algebra for XML supports the TQL operators (negation, quantification, horizontal navigation, etc.) => we write a new one.
• Due to negation and derived operators, this algebra must support infinite bindings (variables bound to an infinite number of values). For instance, the binder Q ⊨ ¬$X binds $X to every forest different from Q.
• We want an algebra whose semantics is formally specified, in order to prove its correctness w.r.t. the TQL semantics.
• We want a running prototype, so we have to implement the data structures and the translation and evaluation algorithms for the TQL Algebra.
TQL Algebra – sorts and their semantics
• It is an algebra of tables and trees, defined on four sorts:
  • label expressions L: denoting labels;
  • tree expressions Q: denoting forests (sets of trees);
  • row expressions R_V: denoting rows over V (tuples with type V);
  • table expressions T_V: denoting finite or infinite tables (sets of rows) with schema V.
• The basic sort is the table sort, which is used to represent the evaluation of a TQL binding operation Q ⊨ A.
• SSD and TQL query results are naturally represented by tree expressions.
TQL Algebra – table expressions
• One-row tables:
$$\{R_V\} \mid \{(x \mapsto L)\} \mid \{(x \mapsto Q)\}$$
• Relational operators (union, cartesian product, projection and restriction):
$$T \cup T' \mid T \times_{V,V'} T' \mid \Pi_V\, T \mid \sigma_{L \sim L'}\, T$$
• Universe and complement:
$$\mathbf{1}_V \mid \mathrm{Co}_V(T)$$
• Vertical test and horizontal iterator on trees:
$$\textbf{if } Q = y[Y] \textbf{ then } T_{Y,y} \textbf{ else } T \;\mid\; \bigcup_{\{Q = Y \,|\, Y'\}} T_{Y,Y'}$$
• Recursion:
$$\textbf{letrec } M = \lambda Y.\, T_{M,Y} \textbf{ in } T_M \mid M(Q)$$
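To make the list concrete, here is a Python sketch of several of these operators on finite tables. Everything here is an assumption for illustration: rows are dicts, tables are duplicate-free lists, the predicate passed to `restrict` stands in for L ~ L', and the universe is an explicit finite list (in the algebra, 1_V and Co_V may denote infinite tables, which this sketch cannot represent). The function names are hypothetical.

```python
# Finite-table sketch of some TQL Algebra table operators (illustration).
# Rows are dicts; tables are duplicate-free lists of rows; forests are
# tuples of (label, children) pairs.


def _key(row):
    return tuple(sorted(row.items()))


def dedup(rows):
    seen, out = set(), []
    for r in rows:
        if _key(r) not in seen:
            seen.add(_key(r))
            out.append(r)
    return out


def union(t1, t2):
    return dedup(t1 + t2)


def product(t1, t2):                # cartesian product, disjoint schemas assumed
    return dedup([{**r1, **r2} for r1 in t1 for r2 in t2])


def project(t, keep):               # Π_V: keep only the columns in `keep`
    return dedup([{v: r[v] for v in keep} for r in t])


def restrict(t, pred):              # σ: `pred` stands in for L ~ L'
    return [r for r in t if pred(r)]


def complement(t, universe):        # Co_V(T), relative to a FINITE universe
    present = {_key(r) for r in t}
    return [r for r in universe if _key(r) not in present]


def vertical_test(q, then_table, else_table):
    """if Q = y[Y] then T_{Y,y} else T; then_table(y, Y) builds T_{Y,y}."""
    if len(q) == 1:
        label, content = q[0]
        return then_table(label, content)
    return else_table


def horizontal_iter(q, table_of):
    """Union of table_of(Y, Y') over every split Q = Y | Y'."""
    out = []
    for mask in range(2 ** len(q)):
        left = tuple(x for i, x in enumerate(q) if (mask >> i) & 1)
        right = tuple(x for i, x in enumerate(q) if not (mask >> i) & 1)
        out.extend(table_of(left, right))
    return dedup(out)


t1 = [{"x": 1}, {"x": 2}]
t2 = [{"y": "a"}]
print(project(product(t1, t2), ["y"]))
```

The horizontal iterator is where A | B gets its meaning: it ranges over all 2^n ways of splitting a forest of n trees, so a real implementation needs something smarter than this brute-force enumeration.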
TQL Algebra – tree expressions
Tree algebra reflects the TQL operators used to build trees (queries). The differences are:
• X does not denote a variable, but the name of a row;
• there is a new metavariable Y ranging over tree variables;
• the from-select clause is replaced by the tree construction (multiset union) Par_{r ∈ T} Q_r, whose informal semantics is: “Compute the union of all Q_r, where r is a row belonging to T.”
$$Q ::= R(X) \mid Y \mid \mathbf{0} \mid Q \,|\, Q' \mid L[Q] \mid f(Q) \mid \mathrm{Par}_{r \in T}\, Q_r$$
TQL Algebra – derived table expressions
• Several useful table expressions can be defined by translation:
  • intersection, junction, extension;
  • co-projection (dual of projection);
  • other structural tests on trees.
• These operators are very useful to translate the derived operators of the tree logic.
• All of them are implemented in the current system.
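Co-projection is the natural target for ∀x.A. Under the simplifying assumption that every column has a finite, explicitly listed domain, the Python sketch below computes it directly (keep the rows all of whose extensions are in the table) and checks that this agrees with the dual formulation Co(Π(Co(T))), mirroring ∀x.A = ¬∃x.¬A. Function names are hypothetical and the finite domains are ours; the algebra itself works with possibly infinite tables.

```python
# Co-projection as the dual of projection, over finite domains (sketch).
from itertools import product


def universe(domains):
    """All rows over the given columns (a finite stand-in for 1_V)."""
    keys = list(domains)
    return [dict(zip(keys, combo))
            for combo in product(*(domains[k] for k in keys))]


def complement(table, domains):
    return [r for r in universe(domains) if r not in table]


def project(table, keep):
    out = []
    for r in table:
        row = {k: r[k] for k in keep}
        if row not in out:
            out.append(row)
    return out


def coproject(table, drop, domains):
    """Rows r over the other columns such that r + {drop: v} is in the
    table for EVERY v in the domain of `drop`."""
    rest = {k: d for k, d in domains.items() if k != drop}
    return [r for r in universe(rest)
            if all({**r, drop: v} in table for v in domains[drop])]


def coproject_via_duality(table, drop, domains):
    """The same table obtained as Co(Π(Co(T)))."""
    rest = {k: d for k, d in domains.items() if k != drop}
    return complement(project(complement(table, domains), list(rest)), rest)


domains = {"x": [0, 1], "y": ["a", "b"]}
T = [{"x": 0, "y": "a"}, {"x": 1, "y": "a"}, {"x": 0, "y": "b"}]
print(coproject(T, "x", domains))
print(coproject_via_duality(T, "x", domains))
```

Both calls keep only y = "a", the single value paired with every x in T; projection, by contrast, would keep both "a" and "b".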
Translation from TQL to TQL Algebra
• The core of the translation is the binder translation. We perform a semantic inversion, transforming a formula (a function from substitutions to sets of trees) into a function that, given a tree, returns a set of substitutions (a table expression):
$$\ulcorner A \urcorner_{Q,\,R_V,\,\Gamma}$$
• The translation is defined by structural recursion on A.
• It depends on the current schema V.
• Q and R are only plugged in somewhere inside the expression.
• Γ is an environment mapping logical recursion variables to algebraic ones.
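The semantic inversion can be illustrated on a small, negation-free fragment. In the Python sketch below (our own encoding; `match`, `join` and the formula tags are hypothetical), a formula is turned into a function from a forest to a list of substitutions, i.e. a finite table: m[A] checks a single tree, A | B takes the union over all splits and joins the two sides, and A ∧ B is a join. Negation is deliberately omitted, since it is what produces infinite tables.

```python
# Semantic inversion for a negation-free fragment (sketch).
# match(phi, forest) returns the substitutions under which forest ⊨ phi.
# Trees are (label, children) pairs; forests are tuples of trees.


def _key(s):
    return tuple(sorted(s.items()))


def join(s1, s2):
    """Natural join of two lists of substitutions (compatible merges only)."""
    out, seen = [], set()
    for a in s1:
        for b in s2:
            if all(a[k] == b[k] for k in a.keys() & b.keys()):
                m = {**a, **b}
                if _key(m) not in seen:
                    seen.add(_key(m))
                    out.append(m)
    return out


def splits(forest):
    for mask in range(2 ** len(forest)):
        yield (tuple(x for i, x in enumerate(forest) if (mask >> i) & 1),
               tuple(x for i, x in enumerate(forest) if not (mask >> i) & 1))


def match(phi, forest):
    kind = phi[0]
    if kind == "zero":                       # 0
        return [{}] if not forest else []
    if kind == "true":                       # T
        return [{}]
    if kind == "var":                        # $X binds the whole forest
        return [{phi[1]: forest}]
    if kind == "edge":                       # m[A], m a constant label
        _, m, a = phi
        if len(forest) == 1 and forest[0][0] == m:
            return match(a, forest[0][1])
        return []
    if kind == "edge_var":                   # x[A], x a label variable
        _, x, a = phi
        if len(forest) != 1:
            return []
        label, content = forest[0]
        return join([{x: label}], match(a, content))
    if kind == "and":                        # A ∧ B: join of both tables
        return join(match(phi[1], forest), match(phi[2], forest))
    if kind == "comp":                       # A | B: union over all splits
        out = []
        for left, right in splits(forest):
            out.extend(join(match(phi[1], left), match(phi[2], right)))
        return join([{}], out)               # joining with {} drops duplicates
    raise ValueError(f"unsupported formula {phi!r}")


forest = (("title", (("TQL", ()),)), ("conf", (("ETAPS", ()),)))
phi = ("comp", ("edge", "title", ("var", "$T")), ("true",))
print(match(phi, forest))
```

Only the split that puts title[TQL] on the left satisfies title[$T], so the resulting table has one row binding $T to the forest TQL.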
Translation from TQL to TQL Algebra - example
• Example:
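$$\ulcorner \textbf{from } Q \models \exists x.\, x[\$Z] \textbf{ select } Q' \urcorner_{R_V} = \mathrm{Par}_{r \in T}\, \ulcorner Q' \urcorner_{R_V;\, r}$$
$$T = \ulcorner \exists x.\, x[\$Z] \urcorner_{Q,\,R_V,\,\Gamma}$$
[Figure: the translation of ∃x. x[$Z] unfolded through x[$Z] into a table with schema {$Z}; …]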
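The shape of this translation (a vertical test producing a table over x and $Z, a projection removing the existentially quantified x, and Par instantiating Q' once per row) can be sketched in Python. The function names and the (label, children) tree encoding are our assumptions, and Q' is modelled as a Python function of the row:

```python
# Algebraic plan for `from Q ⊨ ∃x. x[$Z] select Q'` (sketch).
# Trees are (label, children) pairs; `select` stands in for Q'.


def vertical_test(q):
    """if Q = y[Y] then {(x ↦ y, $Z ↦ Y)} else the empty table."""
    if len(q) == 1:
        label, content = q[0]
        return [{"x": label, "$Z": content}]
    return []


def project_away(table, var):
    """Projection on all columns except `var` (here: ∃x removes x)."""
    out = []
    for row in table:
        r = {k: v for k, v in row.items() if k != var}
        if r not in out:
            out.append(r)
    return out


def par(table, select):
    forest = []
    for row in table:
        forest.extend(select(row))
    return tuple(forest)


def from_select(q, select):
    return par(project_away(vertical_test(q), "x"), select)


q = (("paper", (("TQL", ()),)),)
print(from_select(q, lambda r: (("found", r["$Z"]),)))
```

When Q is not a single tree, the vertical test yields the empty table and the whole query evaluates to the empty forest.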
Translation – operators
Formula       Algebraic Operator    Dual Formula   Dual Algebraic Operator
T             Universe              F              Empty
¬A            Complement
A ∧ B         Junction              A ∨ B          Ext. Union
∃x.A, ∃X.A    Projection            ∀x.A, ∀X.A     Co-Projection
0, L[A]       Test                  L⟦A⟧           Test (inv)
L ~ L'        Restriction
A | B         Union Iterator        A || B         Join Iterator
X             Singleton
μξ.A          Recursion (minfix)    νξ.A           maxfix
Translation correctness
• The formal approach we have taken allows us to prove the correctness of the translation. That is:
Theorem
$$FV(R_V) \subseteq dom(e),\; FV(Q) \subseteq V \;\Rightarrow\; [\![ Q ]\!]_{e(R_V)} = [\![\, \ulcorner Q \urcorner_{R_V} ]\!]_{e}$$
The semantics of the query Q in e(R_V) is equal to the semantics of the translation of Q in R_V.
• The core of the proof is the from-select case, in which we prove the correctness of the binder translation.
Implementing the algebra – model description
• Problem: representing possibly infinite tables in a finite space.
• We use disjunctive constraints (closely related to proposals in constraint databases).
• For each algebraic operator we define and implement a corresponding one that works on disjunctive constraints.
• New algorithms for complex operators (complement, co-projection, tree navigation).
Example of a table with two constraint rows:
  $Y: {a}               $X: {b}
  $Y: NotIn {a, b}      $X: {a}
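The idea of a finite representation can be sketched in Python with one possible encoding of our own: each cell is either In(S) or NotIn(S) for a finite set S, a row is a conjunction of cells, and a table is a disjunction of rows. Complement then stays finite: the complement of a row is a disjunction of rows that each negate one cell, and the complement of a table is the intersection of the complements of its rows. The names `In`, `NotIn`, `complement` and `table_has` are illustrative and not taken from the TQL system.

```python
# Disjunctive-constraint tables (sketch, our own encoding).
# Cell: (positive, values) meaning In(values) or NotIn(values).
# Row: dict column -> cell (conjunction); table: list of rows (disjunction).
from itertools import product


def In(*vs):
    return (True, frozenset(vs))


def NotIn(*vs):
    return (False, frozenset(vs))


ANY = NotIn()                       # unconstrained cell


def cell_has(cell, v):
    positive, vs = cell
    return (v in vs) == positive


def cell_and(a, b):
    (pa, sa), (pb, sb) = a, b
    if pa and pb:
        return (True, sa & sb)
    if pa:
        return (True, sa - sb)
    if pb:
        return (True, sb - sa)
    return (False, sa | sb)


def is_empty(cell):
    return cell[0] and not cell[1]  # In(∅) admits no value


def table_has(table, assignment):
    return any(all(cell_has(c, assignment[v]) for v, c in row.items())
               for row in table)


def table_and(t1, t2):
    """Pairwise row intersection; rows with an empty cell are dropped.
    Assumes every row constrains every column of the schema."""
    out = []
    for r1 in t1:
        for r2 in t2:
            row = {v: cell_and(r1[v], r2[v]) for v in r1}
            if not any(is_empty(c) for c in row.values()):
                out.append(row)
    return out


def complement(table, schema):
    """¬(r1 ∨ … ∨ rn) = ¬r1 ∧ … ∧ ¬rn; ¬r negates one cell at a time."""
    result = [{v: ANY for v in schema}]
    for row in table:
        negated = [{w: ((not row[v][0], row[v][1]) if w == v else ANY)
                    for w in schema} for v in schema]
        result = table_and(result, negated)
    return result


T = [{"$X": In("b"), "$Y": In("a")},
     {"$X": In("a"), "$Y": NotIn("a", "b")}]
C = complement(T, ["$X", "$Y"])
for x, y in product("abc", repeat=2):   # every sample lies in exactly one
    assert table_has(C, {"$X": x, "$Y": y}) != table_has(T, {"$X": x, "$Y": y})
```

Both T and its complement denote infinite sets of rows, yet each is stored as a handful of finite constraints; the value "c" lies outside every constant set, so it exercises the NotIn cells.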
Implementing the algebra – The TQL System
[Figure: architecture of the TQL System, with components Tql Engine, Sys Interface, Tql GUI, Tql Servlet and Tql Applet, and data sources DB, XML, File system and World Wide Web.]
• Implemented in Java and ported to C#.
• Some stats:
  • ~20,000 LoC;
  • 182 classes.
• Download at: http://tql.di.unipi.it/tql
Conclusions
• TQL Algebra:
  • designed as a tool to execute TQL;
  • seems to be quite general;
  • is implemented (with some restrictions);
  • deals with infinite tables.
• Future work:
  • rewritings (with types and constraints);
  • static safety analysis;
  • cost model and physical optimizations;
  • extension to the graph model (graph logic).
The End