xml model and processing transparency no. 1 query xml documents with xquery cheng-chia chen
TRANSCRIPT
XML Model and Processing
Transparency No. 1
Query XML Documentswith XQuery
Cheng-Chia Chen
Introduction to XLink
Transparency No. 2
Objectives
How XML generalizes relational databasesThe XQuery languageHow XML may be supported in databases
Introduction to XLink
Transparency No. 3
XQuery 1.0
XML documents naturally generalize database relationsXQuery is the corresponding generalization of SQL
Introduction to XLink
Transparency No. 4
Queries on XML documents
XML documents generalize relational data:
relations (tables) tuples (record; rows) attributes (fields; columns)
Introduction to XLink
Transparency No. 5
From Relations to Trees
Introduction to XLink
Transparency No. 6
Only Some Trees are Relations
A relation is a tree of height two with an unbounded number of children (rows) all of which have the same fixed number of child nodes (columns)
The database community has been looking for a richer data model than relations. Hierarchical, object-oriented, or multi-dimensional databases have emerged, but neither has reached consensus.
A DTD for RDB : <!ELEMENT PeopleDataBase (PeopleTable, …) > <!ELEMENT PeopleTable (PeopleRow*) > <!ELEMENT PeopleRow (%Column;) >
<!ENTITY % Column “FirstName, FastName, Age “ !> <!ELEMENT FirstName (#PCDATA)> <!ELEMENT LastName (#PCDATA)> <!ELEMENT Age (#PCDATA)>
Introduction to XLink
Transparency No. 7
Trees Are Not Relations
Not all XML Document trees satisfy the previous characterization Trees are ordered, while both rows and columns of tables
may be permuted without changing the meaning of the data
Trees may have height > 2. Trees is in general not homogeneous.
Elements of the same type may have different number and types of children.
This does not mean that we cannot store these documents in traditional data base but only that a more complex encoding of xml documents into data base tables is necessary.
Introduction to XLink
Transparency No. 8
An examlpe student grade records
<students> <student id="100026"> <name>Joe Average</name> <age>21</age> <major>Biology</major> <grades> <result course="Math 101" grade="C-"/> <result course="Biology 101" grade="C+"/> <result course="Statistics 101" grade="D"/> </grades> </student> <student id="100078"> <name>Jack Doe</name> <age>18</age> <major>Physics</major><major>XML Science</major> <grades> <result course="Math 101" grade="A"/> <result course="XML 101" grade="A-"/> <result course="Physics 101" grade="B+"/> <result course="XML 102" grade="A"/> </grades> </student></students>
Introduction to XLink
Transparency No. 9
A Corresponding Student Database
key point of data base design: decompose general xml tree into flat data base tables
Introduction to XLink
Transparency No. 10
For Relational Data base, we have well-established theory and practice, can we have the same achievement for XML ? database XML DDL(DataDefintionLang) XML Schema DQL(DataQueryLang;SQL) XQuery
How should query languages like SQL be similarly generalized?
Introduction to XLink
Transparency No. 11
Usage Scenarios of an XML query language
For data-oriented (highly structured) languages We want to carry over the kinds of queries that we
performed in the original relational model query (virtual) XML representations of databases, transform data into new XML representations, integrate data from multiple heterogeneous data sources
For document-oriented (semi-structured) languages Queries could be used to retrieve parts of documents to provide dynamic indexes to perform context-sensitive searching to generate new documents as combinations of existing documents Studied in the domain of informational retrieval (IR)
Introduction to XLink
Transparency No. 12
Usage Scenarios
For programming/protocol languages Queries could be used to automatically generate
documentation, like javadoc.For hybrid languages
Queries could be used to data mine hybrid data, such as patient records
to perform queries on documents with embedded data, such as catalogs, patient health records, employment records, or business analysis documents
Introduction to XLink
Transparency No. 13
XQuery Design Requirements
Must have at least one XML syntax and at least one human-readable syntax
Must be declarativeMust be namespace awareMust coordinate with XML SchemaMust support simple and complex datatypesMust combine information from multiple documentsMust be able to transform and create XML trees
Introduction to XLink
Transparency No. 14
The XQuery language
Developed by W3C, currently at the level of a Working Draft.
Derived from several previous proposals: XML-QL YATL Lorel Quilt
which all agree on the fundamental principles. XQuery relies on XPath and XML Schema datatype
s. Although not finalized yet, many
commercial implementations have been released. Two formats
XQuery : plaintext syntax XQueryX : XML syntax.
Introduction to XLink
Transparency No. 15
Relationship to XPath
XQuery 1.0 is a strict superset of XPath 2.0 Every XPath 2.0 expression is directly an XQuery 1.0
expression (a query)The extra expressive power is the ability to
join information from different sources and generate new XML fragments
Introduction to XLink
Transparency No. 16
Relationship to XSLT
XQuery and XSLT are both domain-specific languages for combining and transforming XML data from multiple sources
They are vastly different in design, partly for historical reasons XQuery is designed from scratch, XSLT is an intellectual descendant of CSS (cascaded Style
Sheet) – about how to render a document.Technically, they may emulate each other.XQuery Grammar
Introduction to XLink
Transparency No. 17
XQuery concepts
A query in XQuery is an expression that: reads a sequence of XML fragments or atomic values returns a sequence of XML fragments or atomic values
The principal forms of XQuery expressions are: path expressions element constructors
FLWR ("flower") expressions list expressions conditional expressions quantified expressions datatype expressions
Introduction to XLink
Transparency No. 18
XQuery Modules and Prologs
An XQuery is: a main module + zero or more library modules. MainModule ::= Prolog QueryBoby LibraryModule ::= ModuleDecl Prolog Module ::= VersionDecl? (MainModule | LibraryModule)
Like XPath expressions, XQuery expressions are evaluated relatively to a context
This is explicitly provided by a prologSettings define various parameters for the XQuery
processor language, such as:
xquery version "1.0"; (: this is verisondecl :) declare xmlspace preserve; declare xmlspace strip;
Introduction to XLink
Transparency No. 19
More From the Prolog
Prolog ::= ( (DefaultNamespaceDecl | Setter | NamespaceDecl | Import) Separator )* ( (VarDecl | FunctionDecl | OptionDecl) Separator )*
[21] SchemaImport ::= "import" "schema" SchemaPrefix? URILiteral ("at" URILiteral ("," URILiteral)*)?
[22] SchemaPrefix ::= ("namespace" NCName "=") | ("default" "element" "namespace")[23] ModuleImport ::= "import" "module" ("namespace" NCName "=")? URILiteral ("at"
URILiteral ("," URILiteral)*)?
(: for element and type :)declare default element namespace URI; declare default function namespace URI;import schema namespace soap="http://.../soap-envelope" at "http://.../soap-envelope/";
declare namespace NCName = URI;
Introduction to XLink
Transparency No. 20
Implicit Declarations
declare namespace xml = "http://www.w3.org/XML/1998/namespace";
(: for xml schema and schema instance :)xs = "http://www.w3.org/2001/XMLSchema";xsi = "http://www.w3.org/2001/XMLSchema-instance";(: for xpath fucntion and data types :)fn = "http://www.w3.org/2005/11/xpath-functions";(: for xquery fucntions :)local="http://www.w3.org/2005/11/xquery-local-functions";
only xml prefix cannot be redefined.
Introduction to XLink
Transparency No. 21
Xquery Conecpts
QueryBody = Expr[31] Expr ::= ExprSingle ("," ExprSingle)*[32] ExprSingle ::= FLWORExpr
| QuantifiedExpr | TypeswitchExpr| IfExpr | OrExpr
Expressions are evaluated relative to a context: namespaces variables functions date and time context item (current node or atomic value) context position (in the sequence being processed) context size (of the sequence being processed)
Introduction to XLink
Transparency No. 22
XPath Expressions
XPath expressions are also legal XQuery expressions same data model : sequence of items items : nodes or atomic data
The XQuery prolog gives the required static context namespace, function library, variable bindings
The initial context node, position, and size are undefined Since XQuery is intended to operate on multple docuemnts. initial context info can be obtained by starting from fn:doc
()
Introduction to XLink
Transparency No. 23
Datatype Expressions
Same atomic values as XPath 2.0Also lots of primitive simple values: xs:string("XML is fun") xs:boolean("true") xs:decimal("3.1415") xs:float("6.02214199E23") xs:dateTime("1999-05-31T13:20:00-05:00") xs:time("13:20:00-05:00") xs:date("1999-05-31") xs:gYearMonth("1999-05") xs:gYear("1999") xs:hexBinary("48656c6c6f0a") xs:base64Binary("SGVsbG8K") xs:anyURI("http://www.brics.dk/ixwt/") xs:QName("rcp:recipe") Functions for theses types can be found in fn:XXX().
Introduction to XLink
Transparency No. 24
XML Expressions
XQuery expressions may compute new XML nodesExpressions may denote
element, character data, comment, and processing instruction nodes
Each node is created with a unique node identityConstructors may be either
direct (literal template ) or computed
Introduction to XLink
Transparency No. 25
Uses the standard XML syntaxEx:
The expression
<foo><bar/>baz</foo> evaluates to the given XML fragment
Note that
<foo/> is <foo/> evaluates to false why ? ( unique identity of each node )
Direct Constructors
Introduction to XLink
Transparency No. 26
Namespaces in Constructors (1/3)
The following three constructions have the same effect.
declare default element namespace "http://businesscard.org";
<card> <name>John Doe</name> <title>CEO, Widget Inc.</title> <email>[email protected]</email> <phone>(202) 555-1414</phone> <logo uri="widget.gif"/></card>
Introduction to XLink
Transparency No. 27
Namespaces in Constructors (2/3)
declare namespace b = "http://businesscard.org";<b:card> <b:name>John Doe</b:name> <b:title>CEO, Widget Inc.</b:title> <b:email>[email protected]</b:email> <b:phone>(202) 555-1414</b:phone> <b:logo uri="widget.gif"/></b:card>
Introduction to XLink
Transparency No. 28
Namespaces in Constructors (3/3)
<card xmlns="http://businesscard.org"> <name>John Doe</name> <title>CEO, Widget Inc.</title> <email>[email protected]</email> <phone>(202) 555-1414</phone> <logo uri="widget.gif"/></card>
Introduction to XLink
Transparency No. 29
Enclosed Expressions
Called attribuate value template in XSLT can be appled to element contents also.
<foo>1 2 3 4 5</foo><foo>{1, 2, 3, 4, 5}</foo><foo>{1, "2", 3, 4, 5}</foo><foo>{1 to 5}</foo><foo>1 {1+1} {" "} {"3"} {" "} {4 to 5}</foo>
<foo bar="1 2 3 4 5"/><foo bar="{1, 2, 3, 4, 5}"/><foo bar="1 {2 to 4} 5"/>‘{‘ now has special meaning and must be escaped using 
23;
Introduction to XLink
Transparency No. 30
Explicit Constructors
<card xmlns="http://businesscard.org"> <name>John Doe</name> <title>CEO, Widget Inc.</title> <email>[email protected]</email> <phone>(202) 555-1414</phone> <logo uri="widget.gif"/></card> element card {
namespace { "http://businesscard.org" }, element name { text { "John Doe" } }, element title { text { "CEO, Widget Inc." } } , element email { text { "[email protected]" } }, element phone { text { "(202) 555-1414" } }, element logo { attribute uri { "widget.gif" } }}
Introduction to XLink
Transparency No. 31
Computed QNames
element { "card" } { namespace { "http://businesscard.org" }, element { "name" } { text { "John Doe" } }, element { "title" } { text { "CEO, Widget Inc." } }, element { "email" } { text { "[email protected]" } }, element { "phone" } { text { "(202) 555-1414" } }, element { "logo" } { attribute { "uri" } { "widget.gif" } }}
Introduction to XLink
Transparency No. 32
element { if ($lang=”zh_TW“) then ”名片 " else "card" } { namespace { "http://businesscard.org" },
element { if ($lang=“zh_TW”) then ”姓名 " else "name" } { text { "John Doe" } },
element { if ($lang=”zh_TW “) then ”頭銜 " else "title" }
{ text { "CEO, Widget Inc." } },
element { "email" } { text { "[email protected]" } },
element { if ($lang=“zh_TW”) then ”電話 " else "phone"} { text { "(202) 456-1414" } },
element { "logo" } { attribute { "uri" } { "widget.gif" } }}
Biliingual Business Cards
Introduction to XLink
Transparency No. 33
FLWOR Expressions
Used for general queries:Find the names of all students which have more than
one major and order them by student id. <doubles> { for $s in fn:doc("students.xml")//student let $m := $s/major where fn:count($m) ge 2 order by $s/@id return <double> { $s/name/text() } </double> } </doubles>
Introduction to XLink
Transparency No. 34
The Difference Between For and Let (1/4)
for $x in (1, 2, 3, 4)let $y := ("a", "b", "c")return ($x, $y)
1, a, b, c, 2, a, b, c, 3, a, b, c, 4, a, b, c
Introduction to XLink
Transparency No. 35
The Difference Between For and Let (2/4)
let $x in (1, 2, 3, 4)for $y := ("a", "b", "c")return ($x, $y)
1, 2, 3, 4, a, 1, 2, 3, 4, b, 1, 2, 3, 4, c
Introduction to XLink
Transparency No. 36
The Difference Between For and Let (3/4)
for $x in (1, 2, 3, 4)for $y in ("a", "b", "c")return ($x, $y)
1, a, 1, b, 1, c, 2, a, 2, b, 2, c,3, a, 3, b, 3, c, 4, a, 4, b, 4, c
Introduction to XLink
Transparency No. 37
The Difference Between For and Let (4/4)
let $x := (1, 2, 3, 4)let $y := ("a", "b", "c")return ($x, $y)
1, 2, 3, 4, a, b, c
Introduction to XLink
Transparency No. 38
Computing Joins
Find distinct names of recipses that make use of at least one stuff in a refrigerator?
declare namespace rcp = "http://www.brics.dk/ixwt/recipes";for $r in fn:doc("recipes.xml")//rcp:recipefor $i in $r//rcp:ingredient/@namefor $s in fn:doc("fridge.xml")//stuff[text()=$i]return fn:distinct($r/rcp:title/text())
<fridge> <stuff>eggs</stuff> <stuff>olive oil</stuff> <stuff>ketchup</stuff> <stuff>unrecognizable moldy thing</stuff></fridge>
Introduction to XLink
Transparency No. 39
Inverting a Relation
From Recipe ingredients to ingredietn recipes
declare namespace rcp = "http://www.brics.dk/ixwt/recipes";<ingredients> { for $i in distinct-values( fn:doc("recipes.xml")//rcp:ingredient/@name ) return <ingredient name="{$i}"> { for $r in fn:doc("recipes.xml")//rcp:recipe where $r//rcp:ingredient[@name=$i] return <title>$r/rcp:title/text()</title> } </ingredient> }</ingredients>
Introduction to XLink
Transparency No. 40
Sorting the Results
declare namespace rcp = "http://www.brics.dk/ixwt/recipes";<ingredients> { for $i in distinct-values( fn:doc("recipes.xml")//rcp:ingredient/@name ) order by $i return <ingredient name="{$i}"> { for $r in fn:doc("recipes.xml")//rcp:recipe where $r//rcp:ingredient[@name=$i] order by $r/rcp:title/text() return <title>$r/rcp:title/text()</title> } </ingredient> }</ingredients>
Introduction to XLink
Transparency No. 41
A More Complicated Sorting
for $s in document("students.xml")//student
order by fn:count($s/results/result[fn:contains(@grade,"A")])
descending, fn:count($s/major) descending, xs:integer($s/age/text()) ascending
return $s/name/text()
Introduction to XLink
Transparency No. 42
Using Functions
declare function local:grade($g) { if ($g="A") then 4.0 else if ($g="A-") then 3.7 else if ($g="B+") then 3.3 else if ($g="B") then 3.0 else if ($g="B-") then 2.7 else if ($g="C+") then 2.3 else if ($g="C") then 2.0 else if ($g="C-") then 1.7 else if ($g="D+") then 1.3 else if ($g="D") then 1.0 else if ($g="D-") then 0.7 else 0};declare function local:gpa($s) { fn:avg(for $g in $s/results/result/@grade return
local:grade($g))};<gpas> { for $s in fn:doc("students.xml")//student return <gpa id="{$s/@id}" gpa="{local:gpa($s) }"/> }</gpas>
Introduction to XLink
Transparency No. 43
A Height Function
Find the heigth of an node tree.
declare function local:height($x) { if (fn:empty($x/*)) then 1 else fn:max(for $y in $x/* return local:height($y))+1};
Introduction to XLink
Transparency No. 44
A Textual Outline
intended textual outline of a recipe unusual in that it generate plain text instead of xml output.Cailles en Sarcophages pastry chilled unsalted butter flour salt ice water filling baked chicken marinated chicken small chickens, cut up Herbes de Provence dry white wine orange juice minced garlic truffle oil...
Introduction to XLink
Transparency No. 45
Computing Textual Outlines
declare namespace rcp = "http://www.brics.dk/ixwt/recipes";
declare function local:ingredients($i,$p){ fn:string-join( (:$p is iden spaces :) for $j in $i/rcp:ingredient return fn:string-join(($p,$j/@name,"",local:ingredients($j,fn:concat($p," "))),""),"")};
declare function local:recipes($r) { fn:concat( $r/rcp:title/text(),"", local:ingredients($r," "))};fn:string-join( for $r in fn:doc("recipes.xml")//rcp:recipe[5] return local:recipes($r),””)
Introduction to XLink
Transparency No. 46
Sequence Types
2 instance of xs:integer2 instance of item()2 instance of xs:integer?() instance of empty()() instance of xs:integer*(1,2,3,4) instance of xs:integer*(1,2,3,4) instance of xs:integer+<foo/> instance of item()<foo/> instance of node()<foo/> instance of element()<foo/> instance of element(foo)<foo bar="baz"/> instance of element(foo)<foo bar="baz"/>/@bar instance of attribute()<foo bar="baz"/>/@bar instance of attribute(bar)fn:doc("recipes.xml")//rcp:ingredient instance of element()+fn:doc("recipes.xml")//rcp:ingredient instance of element(rcp:ingredient)+
Introduction to XLink
Transparency No. 47
An Untyped Function
declare function local:grade($g) { if ($g="A") then 4.0 else if ($g="A-") then 3.7 else if ($g="B+") then 3.3 else if ($g="B") then 3.0 else if ($g="B-") then 2.7 else if ($g="C+") then 2.3 else if ($g="C") then 2.0 else if ($g="C-") then 1.7 else if ($g="D+") then 1.3 else if ($g="D") then 1.0 else if ($g="D-") then 0.7 else 0};
Introduction to XLink
Transparency No. 48
A Default Typed Function
declare function local:grade($g as item()*) as item()* { if ($g="A") then 4.0 else if ($g="A-") then 3.7 else if ($g="B+") then 3.3 else if ($g="B") then 3.0 else if ($g="B-") then 2.7 else if ($g="C+") then 2.3 else if ($g="C") then 2.0 else if ($g="C-") then 1.7 else if ($g="D+") then 1.3 else if ($g="D") then 1.0 else if ($g="D-") then 0.7 else 0};
Introduction to XLink
Transparency No. 49
A Precisely Typed Function
declare function local:grade($g as xs:string) as xs:decimal { if ($g="A") then 4.0 else if ($g="A-") then 3.7 else if ($g="B+") then 3.3 else if ($g="B") then 3.0 else if ($g="B-") then 2.7 else if ($g="C+") then 2.3 else if ($g="C") then 2.0 else if ($g="C-") then 1.7 else if ($g="D+") then 1.3 else if ($g="D") then 1.0 else if ($g="D-") then 0.7 else 0};
Introduction to XLink
Transparency No. 50
Another Typed Function
declare function local:grades($s as element(students)) as attribute(grade)* { $s/student/results/result/@grade};
Introduction to XLink
Transparency No. 51
Runtime Type Checks
Type annotations are checked during runtimeA runtime type error is provoked when
an actual argument value does not match the declared type a function result value does not match the declared type a valued assigned to a variable does not match the
declared type
Introduction to XLink
Transparency No. 52
Built-In Functions Have Signatures
fn:contains($x as xs:string?, $y as xs:string?) as xs:boolean
op:union($x as node()*, $y as node()*) as node()*
Introduction to XLink
Transparency No. 53
XQueryX
for $t in fn:doc("recipes.xml")/rcp:collection/rcp:recipe/rcp:titlereturn $t
<xqx:module xmlns:xqx="http://www.w3.org/2003/12/XQueryX" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2003/12/XQueryX xqueryx.xsd"> <xqx:mainModule> <xqx:queryBody> <xqx:expr xsi:type="xqx:flwrExpr"> <xqx:forClause> <xqx:forClauseItem> <xqx:typedVariableBinding> <xqx:varName>t</xqx:varName> </xqx:typedVariableBinding> <xqx:forExpr> <xqx:expr xsi:type="xqx:pathExpr"> <xqx:expr xsi:type="xqx:functionCallExpr"> <xqx:functionName>doc</xqx:functionName> <xqx:parameters> <xqx:expr xsi:type="xqx:stringConstantExpr"> <xqx:value>recipes.xml</xqx:value> </xqx:expr> </xqx:parameters>
<xqx:stepExpr> <xqx:xpathAxis>child</xqx:xpathAxis> <xqx:elementTest> <xqx:nodeName> <xqx:QName>rcp:collection</xqx:QName> </xqx:nodeName> </xqx:elementTest> </xqx:stepExpr> <xqx:stepExpr> <xqx:xpathAxis>child</xqx:xpathAxis> <xqx:elementTest> <xqx:nodeName> <xqx:QName>rcp:recipe</xqx:QName> </xqx:nodeName> </xqx:elementTest> </xqx:stepExpr> <xqx:stepExpr> <xqx:xpathAxis>child</xqx:xpathAxis>
xqx:nodeName> <xqx:QName>rcp:title</xqx:QName> </xqx:nodeName> </xqx:elementTest> </xqx:stepExpr> </xqx:expr> </xqx:forExpr> </xqx:forClauseItem> </xqx:forClause> <xqx:returnClause> <xqx:expr xsi:type="xqx:variable"> <xqx:name>t</xqx:name> </xqx:expr> </xqx:returnClause> </xqx:expr> </xqx:elementContent> </xqx:expr> </xqx:queryBody> </xqx:mainModule></xqx:module>
Introduction to XLink
Transparency No. 54
XML Databases
How can XML and databases be merged?
Several different approaches: extract XML views of relations use SQL to generate XML shred XML into relational databases
Introduction to XLink
Transparency No. 55
The Student Database Again
Introduction to XLink
Transparency No. 56
Automatic XML Views (1/2)
Columns as attributes<Students> <record id="100026" name="Joe Average" age="21"/> <record id="100078" name="Jack Doe" age="18"/></Students>
Introduction to XLink
Transparency No. 57
Automatic XML Views (2/2)
Column as elements<Students> <record> <id>100026</id> <name>Joe Average</name> <age>21</age> </record> <record> <id>100078</id> <name>Jack Doe</name> <age>18</age> </record></Students>
Introduction to XLink
Transparency No. 58
Programmable Views
wrap query results by xml elements SQL/XMLxmlelement(name, "Students", select xmlelement(name, "record", xmlattributes(s.id, s.name, s.age)) from Students)
xmlelement(name, "Students", select xmlelement(name, "record", xmlforest(s.id, s.name, s.age)) from Students)
Introduction to XLink
Transparency No. 59
XML Shredding
Shredding an XML documents into multiple data base Tables
Each element type is represented by a relationEach element node is assigned a unique key in
document order so siblings can be ordered.
Each element node contains the key of its parent so parent –child relations can be maintained
The possible attributes are represented as fields, where absent attributes have the null value
Contents consisting of a single character data node (#PCDATA only) is inlined as a field.
Introduction to XLink
Transparency No. 60
From XQuery to SQL
Any XML document can be faithfully representedThis takes advantage of the existing database
implementationQueries must now be phrased in ordinary SQL rather
than XQueryBut an automatic translation is possible
//rcp:ingredient[@name="butter"]/@amount
select ingredient.amountfrom ingredientwhere ingredient.name="butter"
Introduction to XLink
Transparency No. 61
Summary
XML trees generalize relational tablesXQuery similarly generalizes SQL
XQuery and XSLT have roughly the same expressive power
But they are suited for different application domains: data-centric vs. document-centric
Introduction to XLink
Transparency No. 62
Essential Online Resources
http://www.w3.org/TR/xquery/http://www.w3.org/XML/Query/miplementations:
http://www.galaxquery.org/ http://exists.sf.net http://saxon.sf.net