xquery - csd.uoc.grhy561/data/lectures/cs561xquery10.pdf · xml query language w3c working group...
TRANSCRIPT
11
1
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
XQuery
2
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
XML Query Language W3C Working Group
� Current technical documents:�XML Query Requirements�XML Query Use Cases �XQuery 1.0: An XML Query Language �XQuery 1.0 and XPath 2.0 Data Model �XQuery 1.0 Formal Semantics �XML Syntax for XQuery 1.0 (XQueryX)
� The goal of the XML Query WG is to produce:�an abstract data model for XML documents, �a set of query operators on that data model,�a query language based on these query operators
22
3
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
� Select portions of a document
� all
� Copy portions of a document while preserving the hierarchy and theorder of the nodes
� UnQL (non-ordered graphs), YaTL, Quilt, XSLT
� Combine (join) two documents
� all except UnQL,XPath and XSLT (only intra-document joins)
� Construct new documents
� all except XPath
� Navigate irregular or unknown documents
� full (vertical) regular expressions: UnQL, Lorel, XML-QL
� simple (vertical) navigation: XPath, XSLT, Quilt
� horizontal regular expressions: YaTL
XML Query Language Requirements
4
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
� Formulate predicates on the tag and attribute names
� all
Query and preserve the nodes global topological order
� Quilt
Apply aggregation and sorting functions
� all except XPath, UnQL, XML-QL
� Apply existential and universal quantifiers
� Lorel, Quilt, XSLT (not explicitelly!)
� Apply full-text predicates and text operations
� none satisfactorily
XML Query Language Requirements
33
5
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
XML Query Use Cases
� Use Case Organization
�Description, DTD/Schema, Input Data, Queries and Results
� Current Use Cases
�"XMP": Experiences and exemplars
�"TREE": Queries that preserve hierarchy
�"SEQ“: Queries based on sequences
�"R“: Access to relational data
�"TEXT": Full-text search
�"NS“: Queries using namespaces
�"PARTS“: Recursive computation
�"REF“: Queries based on references
6
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
XQuery in a Nutshell
� W3C Specification (Working Draft 11 February 2005)�Satisfies the XML Query requirements
� Formal semantics based on XML Abstract Data Model�The input and output of an XQueryare instances of the XML Query Data Model
� XQuery is a functional language in which a query is represented as an expression�The result of the query is the result of the evaluation of the expression
�Expressions are evaluated in a certain environment
�XQuery expressions can be nested with full generality
DOM
SAX
DBMS
XML
Java
COBOL
DOM
SAX
DBMS
XML
Java
COBOL
W3C XML Query Data Model
W3C XML Query Data Model
XQuery
44
7
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
XQuery in a Nutshell
� Strongly and statically typed
�detect statically errors in the queries
�infer the type of the result of valid queries
�ensure statically that the result of a given query is of a given (expected) type if the input dataset is guaranteed to be of a given type
� XQuery is a case-sensitive language
�keywords are in lower-case
� Dual syntax: XML (XQueryX) and non XML
� Influenced by: SQL, ODMG OQL, XQL, Quilt, XPath (2.0)
8
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
The XQuery Abstract Data Model
� Common for XPath 2.0 and XQuery 1.0
� A logical model composed of
� a set of logical entities
� constructors and accessors for each entity
� Based on the notion of an ordered tree
�XML data cannot be modeled as a “simple” tree
� A value is either the error value, or an ordered sequence of zero or more items
� An item is a node or an atomic value
� There are seven kinds of nodes:
�Document, Element, Attribute, Text , Comment, Processing Instruction, Namespace
� Literals (XML Schema DataTypes)�int, real, double, string
� Sequences
55
9
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
XML and the XQuery Data Model
PSVI: Post-Schema Validation Infoset
10
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
Facts about Values
� Examples of values:
�47, <goldfish/>47, <goldfish/>47, <goldfish/>47, <goldfish/>
�( ), (1, 2, 3), (47, <goldfish/>, "Hello")( ), (1, 2, 3), (47, <goldfish/>, "Hello")( ), (1, 2, 3), (47, <goldfish/>, "Hello")( ), (1, 2, 3), (47, <goldfish/>, "Hello")
�An XML document, an attribute standing by itself
�ERROR
� There is no distinction between an item and a sequence of length one
� There are no nested sequences
� There is no null value
� A sequence can be empty
� Sequences can contain heterogeneous values
� All sequences are ordered
66
11
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
The Sequence Type
A sequence of sequences is automatically flattened
12
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
Facts about Nodes
� Nodes have identity (atomic values don't)
� Element and attribute nodes have a type annotation:
�generated by validating the node
�may be a complex type
�type may be unknown ("anyTypeanyTypeanyTypeanyType")
� Each node has a typed value:
�a sequence of atomic values (or ERROR)
�type may be unknown ("anySimpleTypeanySimpleTypeanySimpleTypeanySimpleType")
� There is a document order among nodes:
�ordering among documents and constructed nodes is implementation-defined but stable
77
13
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
XQuery Operators and Functions
� Variables�Notation $$$$var_namevar_namevar_namevar_name
� Infix and Prefix Operators�Logical: andandandand or or or or �Arithmetic: + + + + ---- * div mod* div mod* div mod* div mod (no « //// » )�Comparison: > < = <= >= is> < = <= >= is> < = <= >= is> < = <= >= is isnotisnotisnotisnot (existential semantics)�Node equality: == !==== !==== !==== !==�Node ordering : << >><< >><< >><< >>�Type checking: instance ofinstance ofinstance ofinstance of
� Built-in functions�Sequence values: empty() exists() sequenceempty() exists() sequenceempty() exists() sequenceempty() exists() sequence----deepdeepdeepdeep----equal() equal() equal() equal() sequencesequencesequencesequence----nodenodenodenode----equal()equal()equal()equal()
�String values: compare()compare()compare()compare() concatconcatconcatconcat() () () () startswithstartswithstartswithstartswith() () () () endswithendswithendswithendswith() () () () � Sequence/Node Constructors:
�( ( ( ( ………… ) , { ) , { ) , { ) , { ………… }}}}
� User-defined functions�Syntax fun_name(fun_name(fun_name(fun_name(…………))))
14
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
Basic XQueries
� Literals: """"HelloHelloHelloHello" " " " 47 4.7 4.7E47 4.7 4.7E47 4.7 4.7E47 4.7 4.7E----2222
� Constructed values: true()true()true()true() false()false()false()false() date("date("date("date("2002200220022002----03030303----15151515")")")")
� Constructed sequences: A sequence may be constructed by enclosing zero or more expressions separated by commas
�$a$a$a$a, , , , $b$b$b$b is the same as ( ( ( ( $a$a$a$a, , , , $b$b$b$b )))): denotes a sequence containing two members represented by variables
�((((1111, (, (, (, (2222, , , , 3333), ( ), (), ( ), (), ( ), (), ( ), (4444)))))))) is the same as 1111, , , , 2222, , , , 3333, , , , 4444
�5555 to to to to 8888 is the same as 5555, , , , 6666, , , , 7777, , , , 8888
� Operations:
�1111 + + + + 1111
�$x$x$x$x
�$x$x$x$x////titletitletitletitle
�$x$x$x$x////price price price price ++++ 1111
�docdocdocdoc((((““““www.forth.gr/www.forth.gr/www.forth.gr/www.forth.gr/paper.xmlpaper.xmlpaper.xmlpaper.xml””””))))
88
15
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
XQuery Functions
� Function calls
�threethreethreethree----argumentargumentargumentargument----function(1, 2, 3)function(1, 2, 3)function(1, 2, 3)function(1, 2, 3)
�twotwotwotwo----argumentargumentargumentargument----function(1, (2, 3))function(1, (2, 3))function(1, (2, 3))function(1, (2, 3))
� Functions are not overloaded (except certain built-ins)
� Evaluating a function call
�Convert arguments to expected types and bind parameters
�Evaluate function body
�Convert result to expected result type
� Conversions (if needed):
�Extract typed value from node
�Cast "anySimpleTypeanySimpleTypeanySimpleTypeanySimpleType" argument to expected type
�Promote numerics and derived types
16
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
XQuery Constructors
� To construct an element with a known name and content, use XML syntax:
<book isbn="12345">
<title> The politics of Experience </title>
</book>
� If the content of an element or attribute must be computed, use a nested expression enclosed in { }{ }{ }{ }
<book isbn="{ { { { $x }}}}">
{{{{ $b/title }}}}
</book>
� If both the name and the content must be computed, use a computed constructor:
element { element { element { element { name-expr } { } { } { } { content-expr }}}}
attribute { attribute { attribute { attribute { name-expr } { } { } { } { content-expr }}}}
element copied from source
element copied newly created
99
17
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
XQuery Path Expressions
� In the second chapter of the document named “paper.xml", find the figure(s) with caption “XML”
doc(“paper.xml")/chapter[2]//figure
[caption = “XML"]
� Extended with:
�a new dereference operator
�a range predicate
� Find captions of figures that are referenced by figref elements in the chapter of “paper.xml" with title “XQuery"
doc(“paper.xml")/chapter[title = “XQuery"]
//figref/@refid->fig/caption
18
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
XPath vs XQuery Axes
1010
19
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
XQuery Predicates
� Serve as a filter on a sequence (often used in paths)
� Meaning of E1[E2]E1[E2]E1[E2]E1[E2]:
� For each item eeee in the value of E1E1E1E1, evaluate E2E2E2E2 with:
�Context item = eeee
�Context position = position of eeee within the value of E1E1E1E1
� Retain those items in E1E1E1E1 for which the predicate truth value of E2E2E2E2 is true
� The predicate truth value of an expression EEEE:�If EEEE has a Boolean value: use that value
Example: $$$$books[isbnbooks[isbnbooks[isbnbooks[isbn > 5000]> 5000]> 5000]> 5000]�If EEEE has a numeric value: TRUE if eeee is equal to the context position, otherwise FALSE Example: $books[5]$books[5]$books[5]$books[5]
�If EEEE is an (non-)empty sequence: (TRUE) FALSE Example: $books[acknowledgement]$books[acknowledgement]$books[acknowledgement]$books[acknowledgement]
�Otherwise, return an error
20
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
XQuery Simple Expressions
� Arithmetic operators: + + + + ---- * div mod* div mod* div mod* div mod�Extract typed value from node�Cast "anySimpleTypeanySimpleTypeanySimpleTypeanySimpleType" to double�Promote numeric operands to a common type�Multiple values => error�If operand is ( ) , return ( )�Arithmetic supported for numeric and date/time types
� Boolean operators/functions: andandandand or not()or not()or not()or not()
�Return TRUETRUETRUETRUE or FALSEFALSEFALSEFALSE (2-valued logic)
�"Early-out" semantics (need not evaluate both operands)
� Result depends on effective boolean value of operands
�If operand is of type boolean, it serves as its own EBV
�If operand is ( ), EBV is FALSEFALSEFALSEFALSE
�If operand is a non-empty node sequence, EBV is TRUETRUETRUETRUE
�In any other case, return an error
1111
21
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
XQuery Complex Expressions
� XPath expressions (for navigation)
� reshuffled XPath 2.0� FLWR expressions (for iteration):
� FORFORFORFOR…………LETLETLETLET…………WHEREWHEREWHEREWHERE…………RETURNRETURNRETURNRETURN…………
� Sorting:
� ORDERBYORDERBYORDERBYORDERBY…………ASCENDING/ ASCENDING/ ASCENDING/ ASCENDING/ DESCENDINGDESCENDINGDESCENDINGDESCENDING
� Aggregate Functions:
� COUNT AVG SUM MIN MAXCOUNT AVG SUM MIN MAXCOUNT AVG SUM MIN MAXCOUNT AVG SUM MIN MAX
� Grouping:
� implicit using nested queries
� (Left/Full Outer) Joins:
� Implicit using nested queries
� Set operations:
� UNION INTERSECT EXCEPTUNION INTERSECT EXCEPTUNION INTERSECT EXCEPTUNION INTERSECT EXCEPT
� Conditional Expressions:
� IFIFIFIF…………THENTHENTHENTHEN…………ELSEELSEELSEELSE…………
� Quantifiers:
� SOME/EVERYSOME/EVERYSOME/EVERYSOME/EVERY…………SATISFIESSATISFIESSATISFIESSATISFIES
� Full-text predicates:
� CONTAINSCONTAINSCONTAINSCONTAINS� Type Switches:
� TYPESWITCHTYPESWITCHTYPESWITCHTYPESWITCH…………CASECASECASECASE…………DEFAULTDEFAULTDEFAULTDEFAULT…………� Casting, Treating and Asserting:
� CAST AS, TREAT, ASSERTCAST AS, TREAT, ASSERTCAST AS, TREAT, ASSERTCAST AS, TREAT, ASSERT� Local Functions
� Currently: arbitrary recursion
22
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
XQuery Examples
<bib>
<book year="1994199419941994">
<title>TCP/IP IllustratedTCP/IP IllustratedTCP/IP IllustratedTCP/IP Illustrated</title>
<author> <last>StevensStevensStevensStevens</last> <first>W.W.W.W.</first></author>
<publisher>AddisonAddisonAddisonAddison----WesleyWesleyWesleyWesley</publisher>
<price>65.9565.9565.9565.95</price>
</book>
<book year="1992199219921992">
<title>Advanced Programming the Unix environmentAdvanced Programming the Unix environmentAdvanced Programming the Unix environmentAdvanced Programming the Unix environment</title>
<author> <last>StevensStevensStevensStevens</last> <first>W.W.W.W.</first></author>
<publisher>AddisonAddisonAddisonAddison----WesleyWesleyWesleyWesley</publisher>
<price>65.9565.9565.9565.95</price>
</book>
Example data:list of books
1212
23
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
XQuery Examples
<book year="2000200020002000">
<title>Data on the WebData on the WebData on the WebData on the Web</title><author><last>AbiteboulAbiteboulAbiteboulAbiteboul</last> <first>SergeSergeSergeSerge</first></author><author><last>BunemanBunemanBunemanBuneman</last> <first>PeterPeterPeterPeter</first></author><author><last>SuciuSuciuSuciuSuciu</last> <first>DanDanDanDan</first></author><publisher>Morgan Kaufmann PublishersMorgan Kaufmann PublishersMorgan Kaufmann PublishersMorgan Kaufmann Publishers</publisher><price>39.9539.9539.9539.95</price>
</book><book year="1999199919991999">
<title>The Economics of Technology and Content for DigitalThe Economics of Technology and Content for DigitalThe Economics of Technology and Content for DigitalThe Economics of Technology and Content for DigitalTV TV TV TV </title>
<editor><last>GerbargGerbargGerbargGerbarg</last> <first>DarcyDarcyDarcyDarcy</first><affiliation>CITICITICITICITI</affiliation>
</editor><publisher>KluwerKluwerKluwerKluwer Academic PublishersAcademic PublishersAcademic PublishersAcademic Publishers</publisher><price>129.95129.95129.95129.95</price>
</book></bib>
Example data:list of books
24
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
XQuery FLWR (“Flower”) Expression
FORFORFORFOR and LETLETLETLET clauses generate a list of tuples of bound expressions, preserving document order
WHEREWHEREWHEREWHERE clause applies a predicate eliminating some of the tuples
RETURNSRETURNSRETURNSRETURNS clause is executed for each surviving tuple, generating an ordered list of outputs
FOR FOR FOR FOR varvarvarvar in in in in exprexprexprexpr
LET LET LET LET varvarvarvar:=:=:=:=exprexprexprexpr WHERE WHERE WHERE WHERE exprexprexprexpr
RETURN RETURN RETURN RETURN exprexprexprexpr
List of tuples List of tuples
Instance of XQuerydata model
A FLWR expression binds some expressions, applies a predicate, and constructs a new result
1313
25
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
XQuery FLWR (“Flower”) Expression: Example
� Query1: find all books titles published after 1995
FOR $x IN doc("bib.xml")/bib/book
WHERE $x/year > 1995
RETURN $x/title
� Result:
<title>TCP/IP Illustrated</title>
<title>Advanced Programming the Unix environ…</title><title>Data on the Web</title><title>The Economics of Technology and …</title>
26
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
XQuery FOR vs. LET
� FORFORFORFOR Query: binds var to each element in the list expr
FOR $x IN doc("bib.xml")/bib/book
RETURN <result> $x </result>
�Returns<result> <book>...</book></result><result> <book>...</book></result>
� LETLETLETLET Query: binds var to the entire list expr
LET $x :=:=:=:= doc("bib.xml")/bib/book
RETURN <result> $x </result>
�Returns<result> <book>...</book
<book>...</book>...
</result>�Useful for common subexpressions and for aggregations
1414
27
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
More Complex FLWR Expressions
� Query2: List all books of authors published by Morgan Kaufmann:
FOR $a IN distinct-nodes(doc("bib.xml")
/bib/book[publisher=“Morgan Kaufmann”]/author)
RETURN <result>
$a,
FOR $t IN doc("bib.xml")/bib
/book[author=$a]/title
RETURN $t
</result>
� distinct-nodes (values): eliminates duplicates� Query3: Find books whose price is larger than average:
LET $a = avg(doc("bib.xml")/bib/book/price)
FOR $b in doc("bib.xml")/bib/book
WHERE $b/price > $a
RETURN $b� avg: a (aggregate) function that returns the average
28
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
Nesting/Composing XQuery Expressions
<bibliography>
ExpressionExpressionExpressionExpression
</bibliography>
� Every expression has a value and no side effects
� Expressions are fully composable
� Expressions propagate the error value
�Exception: andandandand, orororor, quantifiers have "early-out"
1515
29
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
Nesting/Composing XQuery Expressions
<bibliography>
{
FOR $b IN ExpressionExpressionExpressionExpression
RETURNExpressionExpressionExpressionExpression
}
</bibliography>
30
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
Nesting/Composing XQuery Expressions
<bibliography>
{
FOR $b IN ExpressionExpressionExpressionExpression
RETURN<book>
{ ExpressionExpressionExpressionExpression , ExpressionExpressionExpressionExpression
}</book> ORDERBY (ExpressionExpressionExpressionExpression , ExpressionExpressionExpressionExpression)
}
</bibliography>
1616
31
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
Nesting/Composing XQuery Expressions
<bibliography> {
FOR $b IN doc("bib.xml")//book
RETURN<book>
{$b/author, $b/title
}</book> ORDERBY (author, title)
}
</bibliography>
32
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
Sequences in XQuery
� Ordered and unordered sequence
�Ordered sequence (list): default semantics
/bib/book/author
�Unordered sequence (set, bag): indicates the order is not significant
distinctdistinctdistinctdistinct----nodesnodesnodesnodes(/bib/book/author)
unordered unordered unordered unordered (/bib/book/author)
� LET $b = /bib/book $b is a sequence
� $b/@isbn a sequence
� $b/price sequence of n prices
� $b/price * 0.7 sequence of n numbers
� $b/price * $b/quantity sequence of n x m numbers
1717
33
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
Sorting in XQuery
<publisher_list>FOR $p IN distinct-values(doc("bib.xml")//publisher) RETURN <publisher>
<name> $p/text() </name> ,FOR $b IN doc("bib.xml")//book
[publisher = $p] RETURN <book>
$b/title ,
$b/price</book> ORDERBYORDERBYORDERBYORDERBY(price DESCENDINGDESCENDINGDESCENDINGDESCENDING)
</publisher> ORDERBYORDERBYORDERBYORDERBY(name) </publisher_list>
�Sorting arguments
�Refer to the name space of the RETURNRETURNRETURNRETURN clause, not the FORFORFORFOR clause
�To sort on an element you don’t want to display
�Return it, then remove it with an additional query
34
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
Group_by and Having in XQuery
� Query4: Group by author name their first ten books, for authors having written more than ten books:
FOR $a IN distinct_nodes( doc("bib.xml")//book/author/lastname)
LET $books := doc("bib.xml")//book[SOME $y INauthor/lastname = $a]
WHERE COUNT( $books )>10
RETURN <result>
{$a} { $books[1 to 10] }
</result>
The query uses an element constructor to enclose each author name and related books in the result element
1818
35
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
Query5: For each book title list their prices offered by amazon and bn
<books-with-prices>
{FOR $a IN doc(“amazon.xml”)/book,
$b IN doc(“bn.xml”)/book
WHERE $b/@isbn = $a/@isbn
RETURN
<book>
{ $a/title }
<price-amazon>{ $a/price }</price-amazon>,
<price-bn>{ $b/price }</price-bn>
</book>
}
</books-with prices>
Joins in XQuery
The concatenation operator (",") is used to combine the two main
parts of the query
36
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
Query5: For each book title list their prices offered by amazon and bn
<books-with-prices>
{FOR $a IN doc(“amazon.xml”)/book
RETURN
<book>
{ $a/title }
<price-amazon>{ $a/price }</price-amazon>,
{FOR $b IN doc(“bn.xml”)/bookWHERE $b/@isbn = $a/@isbnRETURN <price-bn>{ $b/price }</price-bn>}
</book>
}
</books-with prices>
Left-outer joins in XQuery
1919
37
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
Query5: For each book title list their prices offered by amazon and bn
LET $allISBNs:= distinct-values(doc(“amazon.xml”)/book/@isbn uniondoc(“bn.xml”)/book/@isbn )
RETURN<books-with-prices>{FOR $isbn IN $allISBNsRETURN<book> {FOR $a IN doc(“amazon.xml”)/book[@isbn=$isbn]RETURN <price-amazon>{ $a/price }</price-amazon>
} {FOR $b IN doc(“bn.xml”)/book[@isbn=$isbn]RETURN <price-bn>{ $b/price }</price-bn>
}</book>
}</books-with-prices>
Full-outer joins in XQuery
38
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
If-Then-Else in XQuery
�Query6: Make a list of holdings, ordered by title. For journals, include
the editor, and for all other holdings, include the author
FOR $h IN //holding
RETURN <holding>
$h/title,
IFIFIFIF $h/@type = "Journal"
THENTHENTHENTHEN $h/editor
ELSEELSEELSEELSE $h/author
</holding> ORDERBY (title)
2020
39
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
Existential and Universal Quantifiers in XQuery
�Query7: Find titles of books in which both sailing and windsurfing are mentioned in the same paragraph
FOR $b IN doc("bib.xml")//book
WHERE SOMESOMESOMESOME $p IN $b//para SATISFIESSATISFIESSATISFIESSATISFIES
contains($p, "sailing")
AND contains($p, "windsurfing")
RETURN $b/title
�Query8: Find titles of books in which sailing is mentioned in every paragraph
FOR $b IN doc("bib.xml")//book
WHERE EVERYEVERYEVERYEVERY $p IN $b//para SATISFIESSATISFIESSATISFIESSATISFIES
contains($p, "sailing")
RETURN $b/title
40
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
XQuery Local Functions
�Query9:Find the maximum depth of the document named "partlist.xml"
NAMESPACE xsd="http://www.w3.org/2001/XMLSchema-datatypes"
FUNCTIONFUNCTIONFUNCTIONFUNCTION depth(ELEMENT $e) RETURNSRETURNSRETURNSRETURNS xsd:integer
{
-- An empty element has depth 1
-- Otherwise, add 1 to max depth of children
IF empty($e/*) THEN 1
ELSE max(depth($e/*)) + 1
}
depth(doc("partlist.xml"))
2121
41
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
XQuery and Typing
� XML documents contains a wide range of information:
�From loosely typed information (without a schema)
�To rigidly structured data
� So, a language for querying XML must:
�Avoid assumption about what is allowed
�Allow data to be managed without frequent casting of values
�Allow programmer to focus on the document and not on the whims of the type system
� Objective: How to use the XQuery type system to write type-safe queries, how to understand and resolve type errors and how to work around the type system when necessary
42
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
Types in XQuery
� Built-in types
�They are available in any query
�i.e. document, element, attribute, node, text node, ID, IDREF, IDREFS, …
� Types imported from a specific schema
�New types are available in our query
�i.e. define element imdb {
element movie*,
element person*,
element company*
}
� XQuery’s type system is based on XML Schema, but, it is simpler and more concise, for the sake of type checking
2222
43
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
Examples
� “hello” has type xs:string
� 42 has type xs:integer
42.00 has type xs:decimal
4.2e1 has type xs:double
� 42.00 + 6.9e1 has type xs:double
� “One” + “Two” is a static type error (…)
� “one” < “two” has type xs:boolean
� “One” < 2 is a static type error
� (1, “two”, 3.14e1) has type
(xs:integer,xs:string,xs:double)
� (<th>value</th>, <td>1</td>) has type
(element(th),element(td))
44
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
No schema defined
� Example 1:
avg (doc(“imdb.xml”)/imdb/movie/runtime)
Runtime is untyped
Converts each runtime to double
2323
45
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
No schema defined
� Example 2:
define function movies-by-genre($g)
{
for $b in doc(“imdb.xml”)/imdb/movie
where $b/genre = $g
return $b/title
}
for $x in doc(“imdb.xml”)/imdb//genre
return movies-by-genre($x)
for $x in doc(“imdb.xml”)/imdb/person
return movies-by-genre($x)
Empty result
46
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
Formal Semantics of XQuery
� Defines static & dynamic semantics for every type of XQuery expression
� Static type-checking (compile-time): detect common type errors during static analysis instead of being discovered only when the program is run
�Depends only on the query itself compared against imported schemas
�Infers result type based on types of operands
�Static type: is a compile-time property of an expression
�Static type errors: our well-known type errors
May not be required at all conformance levels of XQuery
� Dynamic execution (run-time): specifies the relationship between input data, an XQuery expression, and the output data
�Depends on input data
�Defines the result value based on the operand values
�Dynamic type: the type of an operand that is determined at runtime
�Dynamic error: occurs during evaluation of a query
Causes implicit invocation of an error function -- fn:error()
2424
47
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
Static Typing Example
� Example 3:
define function movies-by-genre($g as element(genre))
as element(title)*
{
for $b in doc(“imdb.xml”)/imdb/movie
where $b/genre = $g
return $b/title
}
48
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
If type is not given …
� Any element that is not explicitly given a simple or complex type has the type xs:anyType
�In XML Schema xs:anyType is the type annotation for every element that has not been validated or for which no more specific type is known
� Any attribute that is not explicitly given a simple or complex type has the type xdt:untypedAtomic
�Is closely related to xs:anySimpleType which is the root of the simple type hierarchy in XML Schema with two differences:
xs:anySimpleType includes non atomic values
xs:anySimpleType includes primitive types (xs:string) as subtypes whereas xdt:untypedAtomic has no subtypes
2525
49
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
Hierarchy of Schema Types used in XQuery
50
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
XQuery Validating Expression
� Definition: the process of determining whether a document instance complies with the definitions & constraints specified by a DTD or Schema�Validation context: controls how the name of a given element is matched against declarations in the imported schemas It is not possible to determine static types if the context used to validate an element constructor was not known statically
So… validation context is part of the static environment� Syntax: validate { validate { validate { validate { expr }}}}� Semantics: evaluate expr (a loaded document or a query), then serialize its value as an XML string and invoke the schema validator on it
� i.e.validatevalidatevalidatevalidate
<movies>{for $m in $imdb/imdb/movie[@mid < “m2”]return
<movie>{ $m/title }</movie>}</movies>
2626
51
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
� Example:
Constructs an element (user) with last name and first name (if it is not too long)
let $first := “Vassilis”,
$last := “Christophides”,
return
<user id=“A123456”>
<name>
{if (fn:string-length($first) > 10) then ( ) else
<first> {$first} </first>}
<last> {$last } </last>
</name>
</user>
Validation example
Globalcontext
Usercontext
User/name
context
User/name
context
52
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
Validation example
� Example (cont.):
Constructs an element (user) with last name and first name (if it is not too long)
let $first := validate context user/name
{<first>Vassilis</first>},
$last := validate context user/name
{<last>Christophides</last>}
return
<user id=“A123456”>
<name>
{if (fn:string-length($first) > 10) then ( ) else
$first, $last}
</name>
</user>
Validation context must be explicitly given
2727
53
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
Validation modes
� Three modes of validation, to allow mixing of valid and well-formed data:
�Strict requires that a declaration be available for the element, and the
element must validate with respect to that declaration
The element is labeled with the declared type, as are all elements
and attributes within the element
�Skip has no constraints; the elements must simply be well formed
The element and all elements within it are labeled with type
xs:anyType, and all attributes within it are labeled with type
xs:anySimpleType
�Lax behaves as strict if there is a declaration available; otherwise, it
behaves as skip
54
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
Testing Types in XQuery
� Instance OfInstance OfInstance OfInstance Of expression returns TRUETRUETRUETRUE if its first operand is an instance of the type named in its second operand, otherwise FALSEFALSEFALSEFALSE:
$animal instance ofinstance ofinstance ofinstance of element dog
� TypeswitchTypeswitchTypeswitchTypeswitch expression executes one branch, based on the type of its operand:
typeswitchtypeswitchtypeswitchtypeswitch($animal)
casecasecasecase element dog return woof($animal)
casecasecasecase element duck return quack($animal)
defaultdefaultdefaultdefault return "No sound"
2828
55
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
Tinkering with Types
� cast ascast ascast ascast as ST ( ST ( ST ( ST ( expr ))))�Convert value to target type�Only for predefined type pairs (simple types) according to conversion table as well as derived to base types open issue: converting complex types
�May return error at run-time
� treat astreat astreat astreat as ST ( ST ( ST ( ST ( expr ))))�Check dynamically whether the type of (expr) matches type�Serves as a compile-time "promise"�At run-time, returns an error if type of expr is not ST
� assert asassert asassert asassert as STSTSTST ( expr )�Check statically whether the type of (expr) matches type�Serves as a compile-time assertion�Compile-time error if static type of expr is not ST
56
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
Examples
� “2000-01-01” cast ascast ascast ascast as xs:date = xs:date("2000-01-01")
� if $a is of type Address
$a/zipcode
gives a static type error,
but …
($a treat astreat astreat astreat as element(*, USAddress))/zipcode
avoids this static type error
� for $m in $imdb/imdb/movie return
assert asassert asassert asassert as element title (
<title>{$m/title/text()}</title>
)
2929
57
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
Importing Schemas
� SCHEMA targetNamespace
SCHEMA targetNamespace AT schemaLocation
import schemas
� Example
schema “http://www.csd.uoc.gr/~hy561/imdb”
schema “http://www.csd.uoc.gr/~hy561/imdb” at
“http://www.csd.uoc.gr/~hy561/Assignment/imdb.xsd”
58
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
Example Schema
define element define element define element define element imdbimdbimdbimdb {element movie*,
element person*,
element company*}
define element define element define element define element moviemoviemoviemovie {
attribute attribute attribute attribute midmidmidmid,,,,
element element element element titletitletitletitle,
element soundmix*,
element language*,
element runtime?,
element certification*,
element genre+,
element color*,
element plot?,
element originCountry*,
element alternativeTitle*,
element plotkeyword*,
element plotwrittenby?,
element productionYear?,
element kind?,
element episodes?,
element movie_award*,
element related_qualifications?}
define attributedefine attributedefine attributedefine attribute mid {xsd:string}
define elementdefine elementdefine elementdefine element title {xsd:string}
....
....
....
define element define element define element define element movie_awardmovie_awardmovie_awardmovie_award {{{{
attribute attribute attribute attribute idrefidrefidrefidref {xsd:string} }{xsd:string} }{xsd:string} }{xsd:string} }
define element related_qualifications
{element qualification*}
3030
59
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
XQuery Global Variables
� define global $imdb {treat as document imdb
(glx:document-validate("/local/imdb.xml", "imdb"))}
<imdb>
<movie>
</movie>
.
.
.
<person>
</person>
</imdb>
$$$$imdbimdbimdbimdb
60
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
XQuery and XML Schema (projection)
� Example 4: Give the names of all the persons
Query:
$imdb/imdb/person/name
Type:
element name { xsd:string } *
Result:
<name>Audrey Tautou</name>,
<name>Artus de Penguern</name>,
.
.
.
sequence
3131
61
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
XQuery and XML Schema (selection)
� Example 5: Give the names of all persons that their id is greater than p4
Query(XPATH):
$imdb/imdb/person[@pid > “p4”]/name
Query(XQuery):
for $a in $imdb/imdb/person
where $a/@pid > “p4”
return $a/name
Type:
element name { xsd:string } *
Result:
<name>Keanu Reeves</name>,
<name>Sandra Bullock</name>,
. . .
sequence
62
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
XQuery and XML Schema (construction)
� Example 6: Give the title of the movies that have been produced after 1994
Query:
<result>
{for $i in $imdb/imdb/movie
where $i/productionYear > 1994
return $i/title }
</result>
Type:
element result {element title {xsd:string}*}
Result:
<result>
<title>Amelie</title>
<title>Dancer in the dark</title>
. . .
XML tree
3232
63
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
XQuery and XML Schema (construction)
� Example 7: Give the production year and the origin country of all movies
Query:
<result>
{for $b in
$imdb/imdb/movie
return
<movie>{ $b/productionYear,
$b/originCountry }</movie>}
</result>
Type:element result {
element movie {element productionYear {xsd:int}?,element originCountry {xsd:string}*
}*}
Result:
<<<<resultresultresultresult>>>>
<<<<moviemoviemoviemovie>>>>
<<<<productionYearproductionYearproductionYearproductionYear>2001</>2001</>2001</>2001</productionYearproductionYearproductionYearproductionYear>>>>
<<<<originCountryoriginCountryoriginCountryoriginCountry>France</>France</>France</>France</originCountryoriginCountryoriginCountryoriginCountry>>>>
</</</</moviemoviemoviemovie>>>>
. . .
XML tree
64
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
XQuery and XML Schema (construction)
� Example 8: Give the movie id and the origin country of all moviesQuery:<result>{for $b in
$imdb/imdb/moviereturn<movie id = “{$b/@mid}”>{ $b/originCountry }</movie>}
</result>Type:element result {
element movie {attribute id {xdt:untypedAtomic},element originCountry {xsd:string}*
}*}
Result:
<<<<resultresultresultresult>>>>
<<<<moviemoviemoviemovie idididid====““““m1">m1">m1">m1">
<<<<originCountryoriginCountryoriginCountryoriginCountry>France</>France</>France</>France</originCountryoriginCountryoriginCountryoriginCountry>>>>
</</</</moviemoviemoviemovie>>>>
. . .
XML tree
xsd:string
3333
65
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
XQuery and XML Schema (Grouping)
� Example 9: Group the movies according to their countries
Query:
<results>
{for $i in distinct-values($imdb//originCountry)
let $p := for $t in $imdb//movie
where $t/originCountry = $i
return <movie>{$t/title/text()}</movie>
return <result country = "{$i}">
{$p}</result>}
</results>
Type:
element results {
element result {attribute country {xdt:untypedAtomic},
element movie {text}*} *}
xsd:string
Join Condition
66
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
Type errors (1)
� A type error occurs when the type of an expression is not compatible with the type required by the context in which the expression is used
� Example 10:
Give the name and the name of the father for each person
Query:
<result>
{for $b in $imdb/imdb/person
return
<person>{ $b/name, $b/fatherName }</person>
}
</result>
Type:
element result {element person {
element name {xsd:string}}*}
3434
67
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
Type errors 1 (cont.)
� Example 10:
Give the name for each person
Query:
<result>
{for $b in $imdb/imdb/persons
return
<person>{ $b/name}</person>
}
</result>
Type:
element result {( )}
68
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
Type errors (2)
� Example 11:
Query:
<result>
{for $b in $imdb/imdb/movie
return
<movie>{ $b/runtime + $b/title}</movie>
}
</result>
Error: xsd:int + xsd:string
3535
69
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
Some remarks…
� Example 12: Return the run time and the title for every movieQuery:
<result>{for $b in $imdb/imdb/moviereturn<movie>
<runtime>{$b/runtime/text()}</runtime>,$b/title/text()
</movie>}</result>
Type: element result {element movie {element runtime {text?},text}*}
Type: element result {
element movie{element runtime{text?},text,element title{xsd:string}}*
}
{$b/title}
70
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
Another example (union op)
� Example 13: Give the origin country and the country of every movie
Query:
<result>
{for $x in $imdb/imdb/movie,
$n in op:union($x/originCountry, $x/origin)
return
<movie>
<runtime>{data($n)}</runtime>
</movie>
}
</result>
Type:
element result {element movie
{element runtime {type xdt:anyAtomicType*}}*}
3636
71
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
XQuery Processing Model Overview
72
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
XQuery Compatibility Issues with XPath 1.0
� Equality (and all the other relational operators) has an implicit existential quantifier:
�$x=3 and $x=4 can evaluate to true
�$x<2 and $x>4 can also evaluate to true
�$x=3 and $x!=3 can evaluate to true
�$x=$y and $y=$z does not imply $x = $z
�$x=$x can evaluate to false
� Nasty consequences:
� high probability of user errors and intense frustration
� good old query evaluation algorithms don’t work anymore
� schema evolution is badly handled
3737
73
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
XQuery Compatibility Issues with XPath 1.0
� Implicit data conversions for dealing with the “semi-structured” aspect of the data by trying to avoid static or dynamic errors as much as possible� from an element to the element content� from an attribute to the attribute content� from a sequence to a value (by taking the first member)� from a typed value to string� from a string to a typed value� from any typed value to a Boolean (e.g. from a node set to Boolean)
� Examples of bad cases:
� //book[price] is not the same as //book[price+0]
� <book>{@price}</book> is not the same as <book>{@price+0}</book>
� Bad consequences:�the result of the evaluation of an expression can depend on the context where the expression appear
�high probability of user errors and intense frustration
74
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
XQuery Compatibility Issues with XPath 1.0
� “////” is not a simple projection:
� (//book orderbyorderbyorderbyorderby @price)/title will be sorted by document order, not by price
� Bad consequences:
� high probability of user errors and more frustration
� “////” often requires materialization (for sorting and duplicate elimination)
� difficult to parallelize and stream
3838
75
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
<xsl:transform
xmlns:xsl="http://www.w3.org/1999/XSL/ http://www.w3.org/1999/XSL/ http://www.w3.org/1999/XSL/ http://www.w3.org/1999/XSL/ TransformTransformTransformTransform" version="1.01.01.01.0">
<xsl:output method="xmlxmlxmlxml" indent="yesyesyesyes" />
<xsl:template match="////">
<xsl:element name="bibbibbibbib">
<xsl:for-each select="doc('xmpdoc('xmpdoc('xmpdoc('xmp----
data.xmldata.xmldata.xmldata.xml')/bib/book')/bib/book')/bib/book')/bib/book">
<xsl:if test="publisher= 'Addisonpublisher= 'Addisonpublisher= 'Addisonpublisher= 'Addison----
WesleyWesleyWesleyWesley‘‘‘‘and @year>'1991'and @year>'1991'and @year>'1991'and @year>'1991'">
<xsl:element name="bookbookbookbook">
<xsl:attribute name="yearyearyearyear">
<xsl:value-of select="@year@year@year@year"/>
</xsl:attribute>
<xsl:copy-of select="titletitletitletitle" />
</xsl:element>
</xsl:if>
</xsl:for-each>
</xsl:element>
</xsl:template>
</xsl:transform>
<bib>{<bib>{<bib>{<bib>{
for $b in for $b in for $b in for $b in doc("data/xmpdoc("data/xmpdoc("data/xmpdoc("data/xmp----data.xml")/bib/bookdata.xml")/bib/bookdata.xml")/bib/bookdata.xml")/bib/book
where $b/publisher="Addisonwhere $b/publisher="Addisonwhere $b/publisher="Addisonwhere $b/publisher="Addison----Wesley" and Wesley" and Wesley" and Wesley" and int($bint($bint($bint($b/@year) > 1991/@year) > 1991/@year) > 1991/@year) > 1991
returnreturnreturnreturn
<book year = {$b/@year}>{<book year = {$b/@year}>{<book year = {$b/@year}>{<book year = {$b/@year}>{
$b/title$b/title$b/title$b/title
}</book>}</book>}</book>}</book>
}</bib>}</bib>}</bib>}</bib>
xslt.xls XMPQ1.xquery
XSLT & XQuery: is There a Difference?
76
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010XSLT & XQUERY
XQuery Result:<?xml version="1.0" ?><?xml version="1.0" ?><?xml version="1.0" ?><?xml version="1.0" ?>
<quip:<quip:<quip:<quip:resultresultresultresult xmlns:quipxmlns:quipxmlns:quipxmlns:quip="http://="http://="http://="http://namespaces.softwareag.com/tamino/quipnamespaces.softwareag.com/tamino/quipnamespaces.softwareag.com/tamino/quipnamespaces.softwareag.com/tamino/quip/">/">/">/">
<<<<bibbibbibbib>>>>
<<<<bookbookbookbook yearyearyearyear="1994">="1994">="1994">="1994">
<<<<titletitletitletitle>TCP/IP Illustrated</>TCP/IP Illustrated</>TCP/IP Illustrated</>TCP/IP Illustrated</titletitletitletitle>>>>
</</</</bookbookbookbook>>>>
<<<<bookbookbookbook yearyearyearyear="1992">="1992">="1992">="1992">
<<<<titletitletitletitle>Advanced Programming in the Unix environment</>Advanced Programming in the Unix environment</>Advanced Programming in the Unix environment</>Advanced Programming in the Unix environment</titletitletitletitle> > > >
</</</</bookbookbookbook>>>>
</</</</bibbibbibbib>>>>
</quip:</quip:</quip:</quip:resultresultresultresult>>>>
XSLT Xalan engine Result:<?xml version="1.0" encoding="UTF<?xml version="1.0" encoding="UTF<?xml version="1.0" encoding="UTF<?xml version="1.0" encoding="UTF----8" ?>8" ?>8" ?>8" ?>
<<<<bibbibbibbib>>>>
<<<<bookbookbookbook yearyearyearyear="1994">="1994">="1994">="1994">
<<<<titletitletitletitle>TCP/IP Illustrated</>TCP/IP Illustrated</>TCP/IP Illustrated</>TCP/IP Illustrated</titletitletitletitle>>>>
</</</</bookbookbookbook>>>>
<<<<bookbookbookbook yearyearyearyear="1992">="1992">="1992">="1992">
<<<<titletitletitletitle>Advanced Programming in the Unix environment</title>>Advanced Programming in the Unix environment</title>>Advanced Programming in the Unix environment</title>>Advanced Programming in the Unix environment</title>
</</</</bookbookbookbook>>>>
</</</</bibbibbibbib>>>>
XSLT & XQuery: is There a Difference?
3939
77
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
XQueryX
� XQueryX is an XML representation of an XQuery
� It was created by mapping the productions of the XQuery abstract syntax directly into XML productions
� XQueryX useful to enable:
� Parser reuse
� Queries on queries
� Generation of queries
� Embedding of queries in XML documents
78
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
XQueryX
LET $authors := /book/author
RETURN <AUTHORS>{ $authors }</AUTHORS>
<q:query xmlns:q="http://www.w3.org/2001/06/xqueryx">
<q:flwr>
<q:letAssignment variable="$authors">
<q:step axis="CHILD">
<q:identifier/>
<q:step axis="CHILD">
<q:identifier>book</q:identifier>
<q:identifier>author</q:identifier>
</q:step>
</q:step>
</q:letAssignment>
4040
79
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
XQueryX
<q:return>
<q:elementConstructor>
<q:tagName>
<q:identifier>AUTHORS</q:identifier>
</q:tagName>
<q:variable>$authors</q:variable>
</q:elementConstructor>
</q:return>
</q:flwr>
</q:query>
80
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
XQuery Status
4141
81
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
XML Support for DBMS: Direction
82
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
Conclusion
� XML is the lingua franca of the Web
� XML is the next big challenge for the database community
� Large quantities of a new type of data
� textual, irregular, self-organizing, distributed, replicated, etc.
� Many orders of magnitude larger:
� the volume of XML data
� the number of XML data repositories
� The need for such a technology is here
� The solutions are not here !
� Myriad of standards and products issued from industry
4242
83
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
Use Case “XMP”: Experiences and Exemplars
� Query: “List books published by Addison-Wesley after 1991, including their year and title”
� Result:<?xml version="1.0" ?>
<result>
<bib>
<book year="1994199419941994">
<title>TCP/IP IllustratedTCP/IP IllustratedTCP/IP IllustratedTCP/IP Illustrated</title>
</book>
<book year="1992199219921992">
<title>Advanced Programming in the UnixAdvanced Programming in the UnixAdvanced Programming in the UnixAdvanced Programming in the Unix
environmentenvironmentenvironmentenvironment</title>
</book>
</bib>
</result>
84
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010Use Case “XMP”: Experiences and Exemplars
<bib>
<book year="1994199419941994">
<title>TCP/IP IllustratedTCP/IP IllustratedTCP/IP IllustratedTCP/IP Illustrated</title>
<author><last>StevensStevensStevensStevens</last> <first>W.W.W.W.</first></author>
<publisher>AddisonAddisonAddisonAddison----WesleyWesleyWesleyWesley</publisher>
<price>65.9565.9565.9565.95</price>
</book>
<book year="1992199219921992">
<title>Advanced Advanced Advanced Advanced ProgProgProgProg………… the Unix environmentthe Unix environmentthe Unix environmentthe Unix environment</title>
<author><last>StevensStevensStevensStevens</last><first>W.W.W.W.</first> </author>
<publisher>AddisonAddisonAddisonAddison----WesleyWesleyWesleyWesley</publisher>
<price>65.9565.9565.9565.95</price>
</book>
<book year="2000200020002000">
<title>Data on the WebData on the WebData on the WebData on the Web</title>
<author><last>AbiteboulAbiteboulAbiteboulAbiteboul</last><first>SergeSergeSergeSerge</first></author>
<author><last>BunemanBunemanBunemanBuneman</last><first>PeterPeterPeterPeter</first></author>
<author><last>SuciuSuciuSuciuSuciu</last><first>DanDanDanDan</first></author>
<publisher>Morgan Kaufmann PublishersMorgan Kaufmann PublishersMorgan Kaufmann PublishersMorgan Kaufmann Publishers</publisher>
<price>39.9539.9539.9539.95</price>
</book>- <book year="1999199919991999">
<title>The Economics of Technology and Content for The Economics of Technology and Content for The Economics of Technology and Content for The Economics of Technology and Content for Digital TVDigital TVDigital TVDigital TV</title>
<editor><last>GerbargGerbargGerbargGerbarg</last><first>DarcyDarcyDarcyDarcy</first>
<affiliation>CITICITICITICITI</affiliation> </editor>
<publisher>KluwerKluwerKluwerKluwer Academic PublishersAcademic PublishersAcademic PublishersAcademic Publishers</publisher>
<price>129.95129.95129.95129.95</price>
</book>
</bib>
<bib>{
for $b in doc("data/xmp-data.xml")/bib/book
where $b/publisher = "Addison-Wesley" and int($b/@year) > 1991
return
<book year = {$b/@year}>{
$b/title
}</book>
}</bib>
xmp-data.xml XMPQ1.xquery
Use Case “XMP”: Experiences and Exemplars
4343
85
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010Use Case “TREE”: Qs that preserve hierarchy
<?xml version="1.0" ?> <!DOCTYPE book SYSTEM “book.dtd”>
<book>
<title>Data on the WebData on the WebData on the WebData on the Web</title>
<author>Serge Serge Serge Serge AbiteboulAbiteboulAbiteboulAbiteboul</author>
<author>Peter Peter Peter Peter BunemanBunemanBunemanBuneman</author>
<author>Dan Dan Dan Dan SuciuSuciuSuciuSuciu</author>
<section id="introintrointrointro" difficulty="easyeasyeasyeasy">
<title>IntroductionIntroductionIntroductionIntroduction</title>
<p>Text ...Text ...Text ...Text ...</p>
<section>
<title>AudienceAudienceAudienceAudience</title>
<p>Text ...Text ...Text ...Text ...</p>
</section>
<section>
<title>Web Data and the Two CulturesWeb Data and the Two CulturesWeb Data and the Two CulturesWeb Data and the Two Cultures</title>
<p>Text ...Text ...Text ...Text ...</p>
<figure height="400400400400" width="400400400400">
<title>Traditional client/server architecture Traditional client/server architecture Traditional client/server architecture Traditional client/server architecture
</title>
<image source="csarch.gifcsarch.gifcsarch.gifcsarch.gif" />
</figure>
<p>Text ...Text ...Text ...Text ...</p>
</section>
</section>
<section id="syntaxsyntaxsyntaxsyntax" difficulty="mediummediummediummedium">
….
</section>
….
</book>
<figlist>{
for $f in doc("data/tree-data.xml")//figure
return
<figure>{
($f/@*
,$f/title
)
}</figure>
}</figlist>
tree-data.xml TREEQ2.xquery
Use Case “TREE”: Qs that preserve hierarchy
86
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010Use Case “TREE”: Qs that preserve hierarchy
<?xml version="1.0" ?>
<quip:result xmlns:quip="http://http://http://http://namespaces.softwareag.com/tamino/quipnamespaces.softwareag.com/tamino/quipnamespaces.softwareag.com/tamino/quipnamespaces.softwareag.com/tamino/quip////">
<figlist>
<figure height="400400400400" width="400400400400">
<title>Traditional client/server architectureTraditional client/server architectureTraditional client/server architectureTraditional client/server architecture</title>
</figure>
<figure height="200200200200" width="500500500500">
<title>Graph representations of structuresGraph representations of structuresGraph representations of structuresGraph representations of structures</title>
</figure>
<figure height="250250250250" width="400400400400">
<title>Examples of RelationsExamples of RelationsExamples of RelationsExamples of Relations</title>
</figure>
</figlist>
</quip:result>
� Query: Prepare a (flat) figure list for first book, listing all the figures and their titles. Preserve the original attributes of each <figure> element, if any
� Result:
Use Case “TREE”: Qs that preserve hierarchy
4444
87
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010Use Case “TREE”: Qs that preserve hierarchy
(<section_count>{
count(doc("data/tree-data.xml")//section)
}</section_count>
, <figure_count>{
count(doc("data/tree-data.xml")//figure)
}</figure_count>
)<?xml version="1.0" ?> -
<quip:resultxmlns:quip="http://namespaces.http://namespaces.http://namespaces.http://namespaces.
softwareag.com/tamino/quipsoftwareag.com/tamino/quipsoftwareag.com/tamino/quipsoftwareag.com/tamino/quip////">
<section_count>7777</section_count>
<figure_count>3333</figure_count>
</quip:result>
TREEQ3.xquery/Result TREEQ4.xquery/Result
<top_section_count>{
count(doc("data/tree-data.xml")/book/section)
}</top_section_count>
<?xml version="1.0" ?>
<quip:resultxmlns:quip="http://namespaces.http://namespaces.http://namespaces.http://namespaces.
softwareag.com/tamino/quipsoftwareag.com/tamino/quipsoftwareag.com/tamino/quipsoftwareag.com/tamino/quip////"><top_section_count>2222</top_section_count>
</quip:result>
Use Case “TREE”: Qs that preserve hierarchy
88
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010Use Case “SEQ”: Queries based on sequence
<report>
<section>
<section.title>ProcedureProcedureProcedureProcedure</section.title>
<section.content> The patient was taken to the The patient was taken to the The patient was taken to the The patient was taken to the operating room where she was placed in supine position operating room where she was placed in supine position operating room where she was placed in supine position operating room where she was placed in supine position and and and and <anesthesia>induced under general anesthesia. induced under general anesthesia. induced under general anesthesia. induced under general anesthesia. </anesthesia>
<prep>
<action>A Foley catheter was placed to A Foley catheter was placed to A Foley catheter was placed to A Foley catheter was placed to decompress the bladderdecompress the bladderdecompress the bladderdecompress the bladder
</action> and the abdomen was then prepped and and the abdomen was then prepped and and the abdomen was then prepped and and the abdomen was then prepped and draped in sterile fashion. draped in sterile fashion. draped in sterile fashion. draped in sterile fashion.
</prep>
<incision> A curvilinear incision was made A curvilinear incision was made A curvilinear incision was made A curvilinear incision was made
<geography>in the midline immediatelyin the midline immediatelyin the midline immediatelyin the midline immediately
infraumbilicalinfraumbilicalinfraumbilicalinfraumbilical </geography>
and the subcutaneous tissue was divided and the subcutaneous tissue was divided and the subcutaneous tissue was divided and the subcutaneous tissue was divided
<instrument>using using using using electrocauteryelectrocauteryelectrocauteryelectrocautery....</instrument>
</incision>
The fascia was identified and The fascia was identified and The fascia was identified and The fascia was identified and
<action>#2 0 #2 0 #2 0 #2 0 MaxonMaxonMaxonMaxon stay sutures were placed on each stay sutures were placed on each stay sutures were placed on each stay sutures were placed on each
side of the midline. side of the midline. side of the midline. side of the midline. </action>
<incision> The fascia was divided using The fascia was divided using The fascia was divided using The fascia was divided using
<instrument>electrocauteryelectrocauteryelectrocauteryelectrocautery</instrument> and the and the and the and the peritoneum was entered. peritoneum was entered. peritoneum was entered. peritoneum was entered.
</incision>
. . .. . .. . .. . .
</section.content>
</section>
</report>
for $s in
doc("data/report1.xml")//
section[section.title="Procedure"]
let $instruments:=$s//instrument
for $i in 1 to 2
return $instruments[$i]
Query: In the Procedure section of Report1, what are the first two Instruments to be used?
Result:<?xml version="1.0" ?>
<quip:result xmlns:quip="http://namespaces.http://namespaces.http://namespaces.http://namespaces.
softwareag.com/tamino/quipsoftwareag.com/tamino/quipsoftwareag.com/tamino/quipsoftwareag.com/tamino/quip////">
<instrument>using using using using electrocauteryelectrocauteryelectrocauteryelectrocautery....
</instrument>
<instrument>electrocauteryelectrocauteryelectrocauteryelectrocautery</instrument>
</quip:result>
report1.xml SEQQ2.xquery
Use Case “SEQ”: Queries based on sequence
4545
89
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
Use Case “R”: Access to Relational DataUSERS
USERID NAME RATING
U01 Tom Jones B
U02 Mary Doe A
U03 Dee Linquent D
U04 Roger Smith C
U05 Jack Sprat B
U06 Rip Van Winkle B
ITEMS
ITEMNO DESCR O_BY DATE PRICE
1001 Red Bicycle U01 99-01-05 99-01-20 40
1002 Motorcycle U02 99-02-11 99-03-15 500
1003 Old Bicycle U02 99-01-10 99-02-20 25
1004 Tricycle U01 99-02-25 99-03-08 15
1005 Tennis Racket U03 99-03-19 99-04-30 20
1006 Helicopter U03 99-05-05 99-05-25 50000
1007 Racing Bicycle U04 99-01-20 99-02-20 200
1008 Broken Bicycle U01 99-02-05 99-03-06 25
BIDS
USERID ITEMNO BID BID_DATE
U02 1001 35 99-01-07
U04 1001 40 99-01-08
U02 1001 45 99-01-11
U04 1001 50 99-01-13
U02 1001 55 99-01-15
U01 1002 400 99-02-14
….
<result> {
for $u in doc("users.xml")//user_tuple
for $i in doc("items.xml")//item_tuple
where $u/rating > "C" and $i/reserve_price > 1000 and $i/offered_by = $u/userid
return <warning>
{ $u/name } { $u/rating } { $i/description }
{ $i/reserve_price } </warning> }
</result>
relational data RQ2.xquery
Use Case “R”: Access to Relational Data
90
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
<?xml version="1.0" ?>
<quip:result …>
<result>-
<warning>
<name>Dee Dee Dee Dee LinquentLinquentLinquentLinquent</name>
<rating>DDDD</rating>
<description>HelicopterHelicopterHelicopterHelicopter</description>
<reserve_price>50000500005000050000</reserve_price>
</warning>
</result>
</quip:result>
Query: Find cases where a user with a rating worse (alphabetically, greater) than "C" is offering an item with a reserve price of more than 1000
Result:
Use Case “R”: Access to Relational DataUse Case “R”: Access to Relational Data
4646
91
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
<result>{
for $u in doc("data/R-users.xml")//user_tuple
let $b := doc("data/R-bids.xml")//bid_tuple[userid = $u/useridand int(string-value(bid)) >= 100]
where count($b) > 1
return
<big_spender>{ $u/name/text() }</big_spender>
}</result>
Result:<?xml version="1.0" ?>
<quip:result xmlns:quip="http://http://http://http://namespaces.softwareag.com/tamino/quipnamespaces.softwareag.com/tamino/quipnamespaces.softwareag.com/tamino/quipnamespaces.softwareag.com/tamino/quip////">
<result>
<big_spender>Mary DoeMary DoeMary DoeMary Doe</big_spender>
<big_spender>Dee Dee Dee Dee LinquentLinquentLinquentLinquent</big_spender>
<big_spender>Roger SmithRoger SmithRoger SmithRoger Smith</big_spender>
</result>
</quip:result>
Query: List names of users who have placed multiple bids of at least $100 each
Use Case “R”: Access to Relational DataUse Case “R”: Access to Relational Data
92
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
<?xml version="1.0" ?> -
<quip:result xmlns:quip="http://http://http://http://namespaces.softwareag.com/tamino/quipnamespaces.softwareag.com/tamino/quipnamespaces.softwareag.com/tamino/quipnamespaces.softwareag.com/tamino/quip////">
<part partid="0000" name="carcarcarcar">
<part partid="1111" name="engineengineengineengine">
<part partid="3333" name="pistonpistonpistonpiston" />
</part>
<part partid="2222" name="doordoordoordoor">
<part partid="4444" name="windowwindowwindowwindow" />
<part partid="5555" name="locklocklocklock" />
</part>
</part>
<part partid="10101010" name="skateboardskateboardskateboardskateboard">
<part partid="11111111" name="boardboardboardboard" />
<part partid="12121212" name="wheelwheelwheelwheel" />
</part>
<part partid="20202020" name="canoecanoecanoecanoe" />
</quip:result>
Query: Convert the sample document from "partlist" format to "parttree" format (see DTD section for definitions). In the result document, part containment is represented by containment of one <part> element inside another. Each part that is not part of any other part should appear as a separate top-level element in the output document
Result:
Use Case “PARTS”: Recursive Parts ExplosionUse Case “PARTS”: Recursive Parts Explosion
4747
93
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010Use Case “PARTS”: Recursive Parts Explosion
<?xml version="1.0" encoding="ISO-8859-1" ?>
<partlist>
<part partid="0000" name="carcarcarcar" />
<part partid="1111" partof="0000" name="engineengineengineengine" />
<part partid="2222" partof="0000" name="doordoordoordoor" />
<part partid="3333" partof="1111" name="pistonpistonpistonpiston" />
<part partid="4444" partof="2222" name="windowwindowwindowwindow" />
<part partid="5555" partof="2222" name="locklocklocklock" />
<part partid="10101010" name="skateboardskateboardskateboardskateboard" />
<part partid="11111111" partof="10101010" name="boardboardboardboard" />
<part partid="12121212" partof="10101010" name="wheelwheelwheelwheel" />
<part partid="20202020" name="canoecanoecanoecanoe" />
</partlist>
define function one_level(xs:AnyType $p, xs:AnyType $ps) returns xs:AnyType {
<part>{
($p/@partid
,$p/@name
,for $s in
$ps[$p/@partid = @partof]
return one_level($s,$ps)
)
}</part>
}
let $ps := doc("data/parts-data.xml")/partlist/part
for $p in $ps[not(@partof)]
return one_level($p,$ps)
parts-data.xml PARTSQ1a.xquery
Use Case “PARTS”: Recursive Parts Explosion
94
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
References
� “XML Data: From Research to Standards” Daniela Florescu, Jerome Simeon Tutorial VLDB 2000
� “XML Queries and Updates” Daniela Florescu Tutorial XML World 2001
� “Query Languages for XML” Malika Mahoui Lecture Notes CSE330 University of Pennsylvania, Fall 2001
� “Progress Report on XQuery” Don Chamberlin Almaden Research Center May 24, 2002
4848
95
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
XQuery Software
� QuiP
� http://www.softwareag.com/developer/downloads/default.htm
� Software AG
Windows and Linux on x86
� Features
Latest W3C syntax
Graphical user interface
� Kweelt
� http://kweelt.sourceforge.net/
� Open Source
Runs on all Java platforms
� Problems
Older syntax, from previous W3C requirements
No graphical user interface
96
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
� General questions:
� Which programming style ? (most powerful and most natural)
� Which type of syntax ? (beauty contest...)
� Unclear technical issues:
� Should the query semantics depend on the presence of the schema?
� Automatic type coercion?
� Type inference and type checking?
� Recursive functions?
� Full regular expressions?
� Fusion operator?
� Copy semantics or identity-based semantics?
� Copy by reacheability?
� Dereferencing operator semantics?
� Extensibility: protocol and exception handling ?
Open Questions
4949
97
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
� Update languages for XML
� XML views of object-relational
databases
� Storing XML data in object-
relational DBMSs
� Alternative storage methods
for XML data
� Indexing XML
� Query processing algorithms
for XML data
� Mixing structured search with
full-text search
XML-related Research Problems
� XML benchmarks
� Distributed execution of XML
queries
� XML-based information
mediation
� XML data cleaning
� XML data compression
� Efficient (streamed) processing
of XML transformations
� XML-based information
brokering
� XML-based workflow systems
and many more...
98
ICS-FORTH & Univ. of Crete V. Christophides I. Fundulaki
CS561 Spring 2010
Formal Semantics of XQuery
� If a query passes static type checking, it may still return the error value
�It may divide by zero
�Casts may fail. Example:
�cast as integer($x) where value of $x is "garbage"
� If a query fails static type checking, it may still execute successfully and compute a useful result. Example (with no schema):
�$emp/salary + 1000
�Static semantics says this is a type error
�Dynamic semantics executes it successfully if $emp has exactly one salary subelement with a numeric value