XPath and Beyond: Formal Foundations
Jean-Yves Vion-Dury, Xerox Research Centre Europe / INRIA
Pierre Genevès, INRIA
05/2004 XPath and Beyond: Formal foundations
Roadmap: Part 1
  XPath: a cornerstone of the XML architecture
  Theory and Engineering
  Some key problems
  The trends around XPath theoretical studies
  A Logic Based approach
  Mathematical Characterization
  Why use the Coq Proof Assistant?
XPath: a cornerstone of the XML architecture
Expresses node selection and/or structural properties
Currently used in XSLT, XQuery, XML Schema, XLink, XPointer,…
XPath is elegant, compact, effective and powerful
Claim: it will be increasingly used and studied in the future
  Indexing large document bases
  Checking integrity constraints / global structural properties
  Linking increasing document volumes
Theory and Engineering in Computer Sciences
Some decades ago, theoretical studies prepared the ground for engineering
  Relational algebra enabled a huge market around data storage and access
  Information theory prepared digital processing (networks, image and sound processing, compression algorithms,…)
  Linguistics, logic and formal mathematics prepared programming languages
A strange situation today around documents…
  W3C standardization activities produce specifications, and many problems remain open
  Some theoreticians try to capture problems and to understand underlying issues, long after the publication of the specifications!
This induces new difficulties and requires different approaches
  In order to deal with low-level issues, close to implementations
  In order to face the complexity of systems
Some Key Problems around XPath
Formal semantics definition
  Formal model of documents (trees, streams, graphs, strings,…?)
  Precise, useful and simple denotational/operational semantics
Type checking
  Constraints on document structure (tree grammars, graph grammars, pattern matching)
  Valid/invalid path expressions with respect to a particular schema
Rewriting path expressions
  In order to customize compilation/interpretation
  Normalization
Optimization
  Reduction of the complexity of suitable models
  Simplifying expressions while preserving semantics
Equivalence p1 ≈ p2
  Gives a fundamental understanding of the language
Containment p1 ≤ p2
  Gives an even more fundamental view
  Key inference: if p is a key for a schema S, then all p’ such that p’ ≤ p are keys too
Linking Key Problems around XPath
Invalid expression and containment
  p ≤ void
Rewriting and equivalence
  (p1 | p2)/p -> p1/p | p2/p and (p1 | p2)/p ≈ p1/p | p2/p
Optimization and containment
  If p1 ≤ p2 then (p1 | p2)/p -> p2/p
Equivalence and containment
  p1 ≈ p2 iff p1 ≤ p2 and p2 ≤ p1
Containment and type checking
  Structural constraints can be captured in XPath expressions
  Structural constraint satisfaction can thus be checked
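The rewriting rule and the link between equivalence and containment can be tried on a concrete tree. The sketch below is an illustration only, not part of the original slides: the helpers `child`, `seq`, `union`, `contained` and `equivalent` are hypothetical names, and a check on one sample tree can refute a containment but cannot prove it for all trees.

```python
import xml.etree.ElementTree as ET

def child(name):
    """Step child::name ('*' matches any element)."""
    return lambda x: {c for c in x if name in ("*", c.tag)}

def seq(p1, p2):
    """Composition p1/p2."""
    return lambda x: {z for y in p1(x) for z in p2(y)}

def union(p1, p2):
    """Union p1 | p2."""
    return lambda x: p1(x) | p2(x)

def contained(p1, p2, root):
    """p1 <= p2 on this tree: inclusion of results from every context node."""
    return all(p1(x) <= p2(x) for x in root.iter())

def equivalent(p1, p2, root):
    """p1 ~ p2 on this tree, defined as containment in both directions."""
    return contained(p1, p2, root) and contained(p2, p1, root)

doc = ET.fromstring("<r><a><c/></a><b><c/><d/></b></r>")
a, b, c = child("a"), child("b"), child("c")

lhs = seq(union(a, b), c)          # (a | b)/c
rhs = union(seq(a, c), seq(b, c))  # a/c | b/c

print(equivalent(lhs, rhs, doc))       # True
print(contained(seq(a, c), lhs, doc))  # True
print(contained(lhs, seq(a, c), doc))  # False
```

The last line shows why containment is the finer notion: a/c is contained in (a | b)/c, but not the other way round.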
The problem of containment (expression)
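$p_1 \le p_2 \iff \forall t,\ \forall x \in t,\ p_1^x(t) \subseteq p_2^x(t)$
(where $p^x(t)$ is the set of nodes selected by p from the context node x of the tree t)
Examples:
  table/tr/td ≤ table/*/td
  table/tr/td ≤ table//… [right-hand side garbled in extraction]
  a/b/c ≤ … [right-hand side garbled; it uses the qualifier [.//c]]
  table/tr[td] ≤ … [right-hand side garbled; it uses the self:: and parent:: axes]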
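The definition above quantifies over all trees and all context nodes. A minimal sketch, assuming Python's standard `xml.etree.ElementTree` and its limited XPath subset, can test the inclusion on one given tree. The helper `holds_on` and the sample document are hypothetical; a `True` result is only a necessary condition for containment, while a `False` result is a genuine counterexample.

```python
import xml.etree.ElementTree as ET

def holds_on(tree, p1, p2):
    """Check that p1 selects a subset of p2 from every context node of one tree."""
    return all(set(x.findall(p1)) <= set(x.findall(p2)) for x in tree.iter())

# An arbitrary tree: one <td> sits under an element that is not a <tr>.
doc = ET.fromstring(
    "<body><table><tr><td/><td/></tr><caption><td/></caption></table></body>"
)

print(holds_on(doc, "table/tr/td", "table/*/td"))  # True
print(holds_on(doc, "table/tr/td", "table//td"))   # True
print(holds_on(doc, "table/*/td", "table/tr/td"))  # False
```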
The problem of typed containment (expression)
In general: ¬(table/*/td ≤ table/tr/td)
$p_1 \le_S p_2 \iff \forall t \in S,\ \forall x \in t,\ p_1^x(t) \subseteq p_2^x(t)$
With a schema: table/*/td ≤_html table/tr/td
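Typed containment restricts the quantification to trees that conform to a schema S. The sketch below is a bounded illustration, not a proof: `schema_documents` enumerates a few small documents under the assumed constraint that every child of `table` is a `tr`, which stands in for the html schema of the slide. All names are hypothetical.

```python
import xml.etree.ElementTree as ET

def holds_on(tree, p1, p2):
    """Check that p1 selects a subset of p2 from every context node of one tree."""
    return all(set(x.findall(p1)) <= set(x.findall(p2)) for x in tree.iter())

def schema_documents():
    """Yield small documents in which every child of <table> is a <tr> (assumed schema)."""
    for rows in range(1, 4):
        for cells in range(0, 3):
            row = "<tr>" + "<td/>" * cells + "</tr>"
            yield ET.fromstring("<body><table>" + row * rows + "</table></body>")

# Inside the schema, table/*/td never selects more than table/tr/td.
typed = all(holds_on(t, "table/*/td", "table/tr/td") for t in schema_documents())

# Outside the schema, a single tree is enough to break the containment.
outside = ET.fromstring("<body><table><caption><td/></caption></table></body>")
untyped = holds_on(outside, "table/*/td", "table/tr/td")

print(typed, untyped)  # True False
```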
The Trends around XPath Theoretical Studies
Columns: Formal semantics; Rewriting; Optimization; Containment & Equivalence; Typed Containment/Optimization
(the assignment of citations to columns was lost in extraction)
Child and descendant axes:
  [Geneves,vion04] [Flesca03]
  [Miklau,Suciu] [Neven,Schwentick] [Deutsch,Tannen] [Geneves,vion04]
  [Deutsch,Tannen] [Kwong,Gertz]
All axes:
  [Olteanu01] [Olteanu01] [vion,layaida03]
  [Geneves,vion] [vion,layaida03] ?
All axes + position and count:
  [Wadler99] [vion,layaida03] [Gottlob,koch03]
  [Geneves,Rose04]
  [Geneves,vion] [Geneves,vion]
A Logic Based Approach
A set of axioms to reason about term comparison ≤
  As opposed to model-based approaches
A partial equivalence relation to minimize the axiom set
  Fully congruent (e.g. p1 ≤ p2 and p1 == p3 implies p3 ≤ p2)
Theorems for simplifying the containment proofs
  E.g. reflexivity, transitivity
Drawback: the syntactic level is more combinatorial than model-based approaches
Advantage: the syntactic level is more extensible, provided the previous point is addressed
  Gives more indication of the underlying issues due to language peculiarities
XPath: abstract syntax ([Wadler99],[Olteanu01])
Denotational semantics ([Wadler99][Olteanu01])
Basic axioms
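[Inference rules not recoverable from the extracted text. Surviving fragments mention steps a::N and a'::N', the symbol ^, qualified paths p[q], and compositions of paths p1 … p4.]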
Union & Intersection
[Inference rules for union (|) and intersection over paths p1, p2, p3; not recoverable from the extracted text.]
Qualifiers
[Inference rules relating qualified paths p1[q1] and p2[q2]; not recoverable from the extracted text.]
The equivalence relation ([Olteanu01])
Using equivalence in proofs
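[Two inference rules (a) and (b) combining an equivalence between paths with a containment p1 ≤ p2; not recoverable from the extracted text.]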
Mathematical Characterization
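Soundness of the equivalence
  $p_1 \approx p_2 \Rightarrow [\, p_1^x(t) \subseteq p_2^x(t) \;\wedge\; p_2^x(t) \subseteq p_1^x(t) \,]$
Soundness of rules (e.g.)
  $p_1 \le p_2 \mid p_3 \Rightarrow [\, p_1^x(t) \subseteq (p_2 \mid p_3)^x(t) \,]$
Completeness of rule system (e.g.)
  $[\, p_1^x(t) \subseteq (p_2 \mid p_3)^x(t) \,] \Rightarrow p_1 \le p_2 \mid p_3$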
Why Use the Coq Proof Assistant?
Coq (http://coq.inria.fr) is a proof assistant based on the Calculus of Inductive Constructions
  Higher-order logic
  Constructive logic
  Typed
To address the complexity of proofs
  To benefit from the help of the proof assistant in case analysis
  To maintain the whole mathematical architecture during exploratory work
To work in a rigorous framework
  To produce rock-solid and readable results
The challenge:
  Requires powerful data structure modelling capabilities
  Learning Coq is an additional difficulty!
  Developing a proof in Coq is more demanding
But… Coq is quite mature now (v8.0, 25 years of research!) and very expressive
Roadmap: Part 2
  Modelling XPath using inductive constructions
  Formal semantics and interpretations
    Interpreter based on the denotational semantics
    A relational semantics for XPath
  Modelling the containment relation
  Using the proof system: containment checking
  Current work on characterization
  Methodology and expected outcomes
Modelling XPath using inductive constructions
Paths are defined inductively
  “void” and “top” are atoms
  | / … are binary constructors
  [] involves qualifiers
  _true, _false are atoms
  “and”, “or”, “not”: constructors
  “leq” (≤): a cross-inductive definition
Functional notation, example:
  a/b[c] is written slash a (qualif b c)
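The inductive definition above can be mirrored outside Coq. The following sketch is a Python transcription under stated assumptions, not the authors' Coq development: the class names (`Void`, `Top`, `Name`, `Slash`, `Union`, `Qualif`, `QTrue`, `QFalse`, `And`, `Or`, `Not`, `Leq`) and the printer `show` are hypothetical. Paths refer to qualifiers through `Qualif`, and qualifiers refer back to paths through `Leq`, which is the cross-inductive aspect.

```python
from dataclasses import dataclass

# Path constructors (illustrative names following the slide's vocabulary).
@dataclass(frozen=True)
class Void:
    pass

@dataclass(frozen=True)
class Top:
    pass

@dataclass(frozen=True)
class Name:
    label: str

@dataclass(frozen=True)
class Slash:
    left: object
    right: object

@dataclass(frozen=True)
class Union:
    left: object
    right: object

@dataclass(frozen=True)
class Qualif:
    path: object
    cond: object

# Qualifier constructors. A path may also be used directly as a qualifier.
@dataclass(frozen=True)
class QTrue:
    pass

@dataclass(frozen=True)
class QFalse:
    pass

@dataclass(frozen=True)
class And:
    left: object
    right: object

@dataclass(frozen=True)
class Or:
    left: object
    right: object

@dataclass(frozen=True)
class Not:
    cond: object

@dataclass(frozen=True)
class Leq:
    left: object
    right: object

ATOMS = {Void: "void", Top: "top", QTrue: "true", QFalse: "false"}
BINARY = {Slash: "{}/{}", Union: "({} | {})", And: "{} and {}",
          Or: "{} or {}", Leq: "{} <= {}"}

def show(t):
    """Print a term in XPath-like concrete syntax."""
    if type(t) in ATOMS:
        return ATOMS[type(t)]
    if type(t) in BINARY:
        return BINARY[type(t)].format(show(t.left), show(t.right))
    if isinstance(t, Name):
        return t.label
    if isinstance(t, Qualif):
        return f"{show(t.path)}[{show(t.cond)}]"
    if isinstance(t, Not):
        return f"not({show(t.cond)})"
    raise TypeError(f"not a term: {t!r}")

# slash a (qualif b c)
term = Slash(Name("a"), Qualif(Name("b"), Name("c")))
print(show(term))  # a/b[c]
```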
Interpreter based on the denotational semantics
Evaluates a path p from the context node x of the tree t
The evaluation of a path returns a set of nodes
Cross-Recursive and terminating functions
The evaluation of a qualifier returns a boolean
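The shape of such an interpreter can be sketched as two mutually recursive functions: one returns a set of nodes, the other a boolean. This is a minimal Python sketch under stated assumptions, not the authors' Coq code: terms are encoded as plain tuples, only the self, child and descendant axes are covered, and the names `eval_path` and `eval_qualif` are hypothetical.

```python
import xml.etree.ElementTree as ET

def eval_path(p, x):
    """Set of nodes selected by path p from the context node x."""
    kind = p[0]
    if kind == "step":
        _, axis, name = p
        if axis == "child":
            nodes = list(x)
        elif axis == "descendant":
            nodes = [d for d in x.iter() if d is not x]
        elif axis == "self":
            nodes = [x]
        else:
            raise ValueError(f"unsupported axis: {axis}")
        return {n for n in nodes if name in ("*", n.tag)}
    if kind == "slash":
        return {z for y in eval_path(p[1], x) for z in eval_path(p[2], y)}
    if kind == "union":
        return eval_path(p[1], x) | eval_path(p[2], x)
    if kind == "qualif":
        return {y for y in eval_path(p[1], x) if eval_qualif(p[2], y)}
    raise ValueError(f"unknown path constructor: {kind}")

def eval_qualif(q, x):
    """Truth value of qualifier q at the context node x."""
    kind = q[0]
    if kind == "true":
        return True
    if kind == "false":
        return False
    if kind == "and":
        return eval_qualif(q[1], x) and eval_qualif(q[2], x)
    if kind == "or":
        return eval_qualif(q[1], x) or eval_qualif(q[2], x)
    if kind == "not":
        return not eval_qualif(q[1], x)
    # A path used as a qualifier holds when it selects at least one node.
    return bool(eval_path(q, x))

doc = ET.fromstring("<r><a><b><c/></b><b/></a></r>")

# a/b[c]
path = ("slash", ("step", "child", "a"),
        ("qualif", ("step", "child", "b"), ("step", "child", "c")))
result = eval_path(path, doc)
print(len(result))  # 1
```

Termination follows from the fact that every recursive call is made on a strictly smaller term.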
Need for a logic-based semantics
The classical semantics describes an interpreter that computes nodesets
This computational vision leads to useless complexity in proofs
Is there another way to capture XPath Semantics?
A Relational Semantics for XPath
An Interpretation of paths in First-Order Logic
A path is translated into a dyadic formula
Rp holds for all pairs (x,y) of nodes such that y is accessed from x through the path p.
Advantages:
  Interpretations of paths and qualifiers are unified
  Direct translation in Coq
(Mathematical semantics from the paper)
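One way to picture the relational reading is to compute, for a fixed finite tree, the set of pairs (x, y) related by a path. The sketch below is an illustration in Python under stated assumptions, not the first-order translation of the paper: `rel` is a hypothetical name, terms are tuples, and only child, descendant, composition, union and qualification are covered. A qualifier is interpreted by the same function as a path, which reflects the unification mentioned above.

```python
import xml.etree.ElementTree as ET

def rel(p, nodes):
    """Set of pairs (x, y) such that y is reached from x through p."""
    kind = p[0]
    if kind == "child":
        return {(x, y) for x in nodes for y in x if p[1] in ("*", y.tag)}
    if kind == "descendant":
        return {(x, y) for x in nodes for y in x.iter()
                if y is not x and p[1] in ("*", y.tag)}
    if kind == "slash":
        r1, r2 = rel(p[1], nodes), rel(p[2], nodes)
        # Relational composition: join on the intermediate node.
        return {(x, z) for (x, y) in r1 for (y2, z) in r2 if y is y2}
    if kind == "union":
        return rel(p[1], nodes) | rel(p[2], nodes)
    if kind == "qualif":
        r, rq = rel(p[1], nodes), rel(p[2], nodes)
        # p[q] keeps the pairs (x, y) for which q relates y to some node.
        witnesses = {y for (y, _) in rq}
        return {(x, y) for (x, y) in r if y in witnesses}
    raise ValueError(f"unknown constructor: {kind}")

doc = ET.fromstring("<r><a><b/></a><b><b/></b></r>")
nodes = list(doc.iter())

star_b = ("slash", ("child", "*"), ("child", "b"))  # */b
desc_b = ("descendant", "b")                        # descendant::b

# On this tree, containment is inclusion of relations.
print(rel(star_b, nodes) <= rel(desc_b, nodes))  # True
print(rel(desc_b, nodes) <= rel(star_b, nodes))  # False
```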
Modelling the containment relation (1)
A binary logical relation “Ple”
  Gathers all containment rules in a single inductive construction
  Suited for using Coq’s built-in tactics (constructor, inversion)
Modelling the containment relation (2)
The containment relation ≤ for paths
  Is inductive
  Is defined using its dual relation for qualifiers (“Qipl”)
Using the proof system: Containment Checking
We have modelled:
  XPath terms
  Their interpretation
  The containment relation (that gathers our containment axioms)
We can now check containment facts with the proof engine
Demo of a tactical which proves the fact:
  ./*/b ≤ ./descendant::b
Underlying goal: extend the tactical in order to automate the checking of all containment facts
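To give a feel for syntactic containment checking, here is a small rule-based checker in Python. It is a sketch under strong assumptions and does not reproduce the authors' axioms or their Coq tactical: the three rules (step comparison, composition, and absorption of a run of downward steps by one descendant step) are chosen for this example only, paths are lists of (axis, name) steps, and all function names are hypothetical. The rules are sound for the self, child and descendant axes, but the checker is incomplete: a `False` answer means "no derivation found", not "containment fails".

```python
DOWNWARD = {"child", "descendant"}

def name_leq(n1, n2):
    """A name test is below itself and below the wildcard."""
    return n1 == n2 or n2 == "*"

def axis_leq(a1, a2):
    """An axis is below itself, and child is below descendant."""
    return a1 == a2 or (a1 == "child" and a2 == "descendant")

def run_leq_step(run, step):
    """Is a non-empty run of steps contained in a single step?"""
    axis, name = step
    if len(run) == 1:
        return axis_leq(run[0][0], axis) and name_leq(run[0][1], name)
    # Several downward steps in a row only reach proper descendants.
    return (axis == "descendant"
            and all(a in DOWNWARD for a, _ in run)
            and name_leq(run[-1][1], name))

def leq(p, q):
    """Search for a derivation of p <= q by splitting p into runs."""
    if not p or not q:
        return not p and not q
    return any(run_leq_step(p[:k], q[0]) and leq(p[k:], q[1:])
               for k in range(1, len(p) + 1))

# ./*/b  and  ./descendant::b
p = [("self", "*"), ("child", "*"), ("child", "b")]
q = [("self", "*"), ("descendant", "b")]

print(leq(p, q))  # True
print(leq(q, p))  # False
```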
Proving Properties: Characterization
Proving the equivalence of semantics (done)
Current work: proving the validity of our axiomatization:
  Soundness
  Completeness
Finding relevant induction schemes
  Mutual induction (duality between paths and qualifiers)
  Induction on a measure of the term complexity
Finding generic and modular Coq tactics (to reduce combinatorial issues)
Methodology and Possible outcomes
[Decision diagram for the inductive relation Ple; layout lost in extraction. Node and edge labels: Sound / Not sound; Fix wrong rules; Add missing rules; Complete / Incomplete; Intrinsically incomplete; Algorithm / Incomplete algorithm; Extend the fragment; Decidable / Undecidable (twice); why? (twice).]
Conclusion
We proposed a logic-based framework for static analysis of XPath
Modelling with inductive constructions (XPath terms and interpretations, containment relation)
Preliminary result: a simpler semantics
Ongoing work on characterization
Backup slides
Applications
Some Applications (1)
Optimization of XPath queries
  Detecting contradictions (p ≤ void)
  Eliminating redundancies
Example: //a[*/b/c and descendant::b]
  can be rewritten as /descendant::a[*/b/c], since */b/c => descendant::b
An optimization not currently achieved at runtime by XPath engines:
  Xalan C++
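The redundancy in the example can be observed on a sample document. The sketch below assumes Python's standard `xml.etree.ElementTree`, whose path subset has no `and` in predicates, so each qualifier is evaluated with a separate `findall` call. The function names and the sample document are hypothetical, and agreement on one document illustrates the rewrite without proving it.

```python
import xml.etree.ElementTree as ET

def select_original(root):
    """//a[*/b/c and descendant::b]"""
    return [a for a in root.iter("a")
            if a.findall("*/b/c") and a.findall(".//b")]

def select_optimized(root):
    """//a[*/b/c]: the second conjunct is implied by the first."""
    return [a for a in root.iter("a") if a.findall("*/b/c")]

doc = ET.fromstring("<r><a><x><b><c/></b></x></a><a><b/></a><a/></r>")

print(select_original(doc) == select_optimized(doc))  # True
```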
Some Applications (2)
Static analysis of XPath host languages
  Example: XSLT
    Checking XSLT stylesheets
    Optimization of XSLT stylesheets
Extending XPath expressive power with an inclusion constraint: p[p1 ≤ p2]
  Integrity constraint checking: id(//book/@authors) ≤ //persons/@name
Transformation languages strongly based on XPath