module 3: xml query and manipulation · module 3: xml query and manipulation key xml query and...
TRANSCRIPT
Module 3: XML Query and Manipulation
Key XML query and manipulation languagesinclude
XPathXQueryXSLTSQL/XML
c©Munindar P. Singh, CSC 513, Spring 2010 p.45
Metaphors for Handling XML: 1
How we conceptualize XML documentsdetermines our approach for handling them
Text: an XML document is textIgnore any structure and perform simplepattern matches
Tags: an XML document is text interspersedwith tags
Treat each tag as an “event” duringreading a document, as in SAX (SimpleAPI for XML)Construct regular expressions as inscreen scraping
c©Munindar P. Singh, CSC 513, Spring 2010 p.46
Metaphors for Handling XML: 2
Tree: an XML document is a treeWalk the tree using DOM (DocumentObject Model)
Template: an XML document has regularstructure
Let XPath, XSLT, XQuery do the workThought: an XML document represents aninformation model
Access knowledge via RDF or OWL
c©Munindar P. Singh, CSC 513, Spring 2010 p.47
XPath
Used as part of XPointer, SQL/XML, XQuery,and XSLT
Models XML documents as trees with nodesElementsAttributesText (PCDATA)CommentsRoot node: above root of document
c©Munindar P. Singh, CSC 513, Spring 2010 p.48
Achtung!
Parent in XPath is like parent as traditionallyin computer scienceChild in XPath is confusing:
An attribute is not a child of its parentMakes a difference for recursion (e.g., inXSLT apply-templates)
Our terminology follows computer science:e-children, a-children, t-childrenSets via et-, ta-, and so on
c©Munindar P. Singh, CSC 513, Spring 2010 p.49
XPath Location Paths: 1
Relative or absoluteReminiscent of file system paths, but muchmore subtle
Name of an element to walk downLeading /: root/: indicates walking down a tree.: currently matched (context) node..: parent node
c©Munindar P. Singh, CSC 513, Spring 2010 p.50
XPath Location Paths: 2
@attr: to check existence or access value ofthe given attributetext(): extract the textcomment(): extract the comment
[ ]: generalized array accessors
Variety of axes, discussed below
c©Munindar P. Singh, CSC 513, Spring 2010 p.51
XPath Navigation
Select children according to position, e.g., [j],where j could be 1 . . . last()Descendant-or-self operator, //
.//elem finds all elems under the currentnode//elem finds all elems in the document
Wildcard, *:collects e-children (subelements) of thenode where it is applied, but omits thet-children@*: finds all attribute values
c©Munindar P. Singh, CSC 513, Spring 2010 p.52
XPath Queries (Selection Conditions)Attributes: //Song[@genre="jazz"]Text: //Song[starts-with(.//group, "Led")]
Existence of attribute: //Song[@genre]Existence of subelement: //Song[group]Boolean operators: and, not, orSet operator: union (|), analogous to choiceArithmetic operators: >, <, . . .String functions: contains(), concat(),length(), starts-with(), ends-with()distinct-values()Aggregates: sum(), count()
c©Munindar P. Singh, CSC 513, Spring 2010 p.53
XPath Axes: 1
Axes are addressable node sets based on thedocument tree and the current node
Axes facilitate navigation of a treeSeveral are definedMostly straightforward but some of themorder the nodes as the reverse of othersSome captured via special notation
current, child, parent, attribute, . . .
c©Munindar P. Singh, CSC 513, Spring 2010 p.54
XPath Axes: 2
preceding: nodes that precede the start ofthe context node (not ancestors, attributes,namespace nodes)following: nodes that follow the end of thecontext node (not descendants, attributes,namespace nodes)preceding-sibling: preceding nodes thatare children of the same parent, in reversedocument orderfollowing-sibling: following nodes that arechildren of the same parent
c©Munindar P. Singh, CSC 513, Spring 2010 p.55
XPath Axes: 3
ancestor: proper ancestors, i.e., elementnodes (other than the context node) thatcontain the context node, in reversedocument orderdescendant: proper descendantsancestor-or-self: ancestors, including self (ifit matches the next condition)descendant-or-self: descendants, includingself (if it matches the next condition)
c©Munindar P. Singh, CSC 513, Spring 2010 p.56
XPath Axes: 4
Longer syntax: child::SongSome captured via special notation
self::*:child::node(): node() matches all nodespreceding::*descendant::text()ancestor::Songdescendant-or-self::node(), whichabbreviates to //Compare /descendant-or-self::Song[1](first descendant Song) and //Song[1](first Songs (children of their parents))
c©Munindar P. Singh, CSC 513, Spring 2010 p.57
XPath Axes: 5
Each axis has a principal node kindattribute: attributenamespace: namespaceAll other axes: element
* matches whatever is the principal nodekind of the current axisnode() matches all nodes
c©Munindar P. Singh, CSC 513, Spring 2010 p.58
XPointer
Enables pointing to specific parts of documentsCombines XPath with URLsURL to get to a document; XPath to walkdown the documentCan be used to formulate queries, e.g.,
Song-URL#xpointer(//Song[@genre="jazz"])The part after # is a fragment identifier
Fine-grained addressability enhances theWeb architecture
High-level “conceptual” identification of node sets
c©Munindar P. Singh, CSC 513, Spring 2010 p.59
XQuery
The official query language for XML, now aW3C recommendation, as version 1.0Given a non-XML syntax, easier on thehuman eye than XMLAn XML rendition, XqueryX, is in the works
c©Munindar P. Singh, CSC 513, Spring 2010 p.60
XQuery Basic Paradigm
The basic paradigm mimics the SQL(SELECT–FROM–WHERE) clausef o r $x i n doc ( ’ q2 . xml ’ ) / / Songwhere $x / @lg = ’ en ’r e t u r n
4 <Engl ish−Sgr name= ’ { $x / Sgr /@name} ’ t i = ’ { $x / @ti } ’ / >
c©Munindar P. Singh, CSC 513, Spring 2010 p.61
FLWOR Expressions
Pronounced “flower”For: iterative binding of variables over rangeof valuesLet: one shot binding of variables over vectorof valuesWhere (optional)Order by (sort: optional)Return (required)
Need at least one of for or let
c©Munindar P. Singh, CSC 513, Spring 2010 p.62
XQuery For Clause
The for clauseIntroduces one or more variablesGenerates possible bindings for eachvariableActs as a mapping functor or iterator
In essence, all possible combinations ofbindings are generated: like a Cartesianproduct in relational algebraThe bindings form an ordered list
c©Munindar P. Singh, CSC 513, Spring 2010 p.63
XQuery Where Clause
The where clauseSelects the combinations of bindings that aredesiredBehaves like the where clause in SQL, inessence producing a join based on theCartesian product
c©Munindar P. Singh, CSC 513, Spring 2010 p.64
XQuery Return Clause
The return clauseSpecifies what node-sets are returned basedon the selected combinations of bindings
c©Munindar P. Singh, CSC 513, Spring 2010 p.65
XQuery Let Clause
The let clauseLike for, introduces one or more variablesLike for, generates possible bindings foreach variableUnlike for, generates the bindings as a list inone shot (no iteration)
c©Munindar P. Singh, CSC 513, Spring 2010 p.66
XQuery Order By Clause
The order by clauseSpecifies how the vector of variable bindingsis to be sorted before the return clauseSorting expressions can be nested byseparating them with commas
Variants allow specifyingdescending or ascending (default)empty greatest or empty least toaccommodate empty elementsstable sorts: stable order bycollations: order by $t collationcollation-URI: (obscure, so skip)
c©Munindar P. Singh, CSC 513, Spring 2010 p.67
XQuery Positional Variables
The for clause can be enhanced with a positionalvariable
A positional variable captures the position ofthe main variable in the given for clause withrespect to the expression from which themain variable is generated
Introduce a positional variable via the at $varconstruct
c©Munindar P. Singh, CSC 513, Spring 2010 p.68
XQuery Declarations
The declare clause specifies things likeNamespaces: declare namespacepref=’value’
Predefined prefixes include XML, XMLSchema, XML Schema-Instance, XPath,and local
Settings: declare boundary-space preserve(or strip)Default collation: a URI to be used forcollation when no collation is specified
c©Munindar P. Singh, CSC 513, Spring 2010 p.69
XQuery Quantification: 1
Two quantifiers some and everyEach quantifier expression evaluates to trueor falseEach quantifier introduces a bound variable,analogous to for
1 f o r $x i n . . .where some $y i n . . .s a t i s f i e s $y . . . $xr e t u r n . . .
Here the second $x refers to the same variableas the first
c©Munindar P. Singh, CSC 513, Spring 2010 p.70
XQuery Quantification: 2
A typical useful quantified expression would usevariables that were introduced outside of itsscope
The order of evaluation isimplementation-dependent: enablesoptimizationIf some bindings produce errors, this canmattersome: trivially false if no variable bindingsare found that satisfy itevery: trivially true if no variable bindings arefound
c©Munindar P. Singh, CSC 513, Spring 2010 p.71
Variables: Scoping, Bound, and Free
for, let, some, and every introduce variablesThe visibility variable follows typical scopingrulesA variable referenced within a scope is
Bound if it is declared within the scopeFree if it not declared within the scope
1 f o r $x i n . . .where some $x i n . . .s a t i s f i e s . . .r e t u r n . . .
Here the two $x refer to different variables
c©Munindar P. Singh, CSC 513, Spring 2010 p.72
XQuery Conditionals
Like a classical if-then-else clauseThe else is not optionalEmpty sequences or node sets, written ( ),indicate that nothing is returned
c©Munindar P. Singh, CSC 513, Spring 2010 p.73
XQuery Constructors
Braces { } to delimit expressions that areevaluated to generate the content to be included;analogous to macros
document { }: to create a document nodewith the specified contentselement { } { }: to create an element
element foo { ’bar’ }: creates<foo>Bar</foo>element { ’foo’ } { ’bar’ }: also evaluatesthe name expression
attribute { } { }: likewisetext { body}: simpler, because anonymous
c©Munindar P. Singh, CSC 513, Spring 2010 p.74
XQuery Effective Boolean Value
Analogous to Lisp, a general value can betreated as if it were a Boolean
A xs:boolean value maps to itselfAn empty sequence maps to falseA sequence whose first member is a nodemaps to trueA numeric that is 0 or NaN maps to false,else to trueAn empty string maps to false, others to true
c©Munindar P. Singh, CSC 513, Spring 2010 p.75
Defining Functions
1 dec lare f u n c t i o n l o c a l : i t em f top ( $ t ){
l o c a l : i t em f ( $t , ( ) )} ;
Here local: is the namespace of the queryThe arguments are specified in parentheses
All of XQuery may be used within thedefining bracesSuch functions can be used in place ofXPath expressions
c©Munindar P. Singh, CSC 513, Spring 2010 p.76
Functions with Types
1 dec lare f u n c t i o n l o c a l : i t em f top ( $ t as element ( ) )as element ( ) ∗
{l o c a l : i t em f ( $t , ( ) )
} ;
Return types as aboveAlso possible for parameters, but ignore suchfor this course
c©Munindar P. Singh, CSC 513, Spring 2010 p.77
XSLT
A programming language with a functional flavorSpecifies (stylesheet) transforms fromdocuments to documentsCan be included in a document (best not to)
<?xml vers ion ="1.0"? ><?xml−s t y l eshee t type =" t e x t / x s l "
h r e f ="URL−to−xs l−sheet "?><main−element >
5 . . .</main−element >
c©Munindar P. Singh, CSC 513, Spring 2010 p.78
XQuery versus XSLT: 1
Competitors in some ways, butShare a basis in XPathConsequently share the same data modelSame type systems (in the type-sensitiveversions)XSLT got out first and has a sizablefollowing, but XQuery has strong backingamong vendors and researchers
c©Munindar P. Singh, CSC 513, Spring 2010 p.79
XQuery versus XSLT: 2
XQuery is geared for querying databasesSupported by major relational DBMSvendors in their XML offeringsSupported by native XML DBMSsOffers superior coverage of processingjoinsIs more logical (like SQL) and potentiallymore optimizable
XSLT is geared for transforming documentsIs functional rather than declarativeBased on template matching
c©Munindar P. Singh, CSC 513, Spring 2010 p.80
XQuery versus XSLT: 3
There is a bit of an arms race between themTypes
XSLT 1.0 didn’t support typesXQuery 1.0 doesXSLT 2.0 does too
XQuery presumably will be enhanced withcapabilities to make updates, but XSLT couldtoo
c©Munindar P. Singh, CSC 513, Spring 2010 p.81
XSLT Stylesheets
A programming language that follows XMLsyntax
Use the XSLT namespace (conventionallyabbreviated xsl)Includes a large number of primitives,especially:
<copy-of> (deep copy)<copy> (shallow copy)<value-of><for-each select="..."><if test="..."><choose>
c©Munindar P. Singh, CSC 513, Spring 2010 p.82
XSLT Templates: 1
A pattern to specify where the giventransform should apply: an XPath expression
This match only works on the root:< x s l : template match =" / " >
. . .</ x s l : template >
Example: Duplicate text in an element< x s l : template match=" t e x t ( ) " >
2 < x s l : value−of s e l e c t = ’ . ’ / >< x s l : value−of s e l e c t = ’ . ’ / >
</ x s l : template >
c©Munindar P. Singh, CSC 513, Spring 2010 p.83
XSLT Templates: 2
If no pattern is specified, apply recursively onet-children via <xsl:apply-templates/>By default, if no other template matches,recursively apply to et-children of currentnode (ignores attributes) and to root:
1 < x s l : template match = "∗ | / " >< x s l : apply−templates / >
</ x s l : template >
c©Munindar P. Singh, CSC 513, Spring 2010 p.84
XSLT Templates: 3
Copy text node by defaultUse an empty template to override thedefault:< x s l : template match="X" / >
2 <!−− X = des i red pa t t e r n −−>
Confine ourselves to the examples discussed inclass (ignore explicit priorities, for example)
c©Munindar P. Singh, CSC 513, Spring 2010 p.85
XSLT Templates: 4
Templates can be namedTemplates can have parameters
Values for parameters are supplied atinvocationEmpty node sets by defaultAdditional parameters are ignored
c©Munindar P. Singh, CSC 513, Spring 2010 p.86
XSLT Variables
Explicitly declaredValues are node setsConvenient way to document templates
c©Munindar P. Singh, CSC 513, Spring 2010 p.87
Integrity Constraints in XML
Entity: xsd:unique and xsd:keyReferential: xsd:keyrefData type: XML Schema specificationsValue: Solve custom queries using XPath orXQuery
Entity and referential constraints are based onXPath
c©Munindar P. Singh, CSC 513, Spring 2010 p.88
XML Keys: 1
Keys serve as generalized identifiers, and arecaptured via XML Schema elements:
Unique: candidate keyThe selected elements yield unique fieldtuples
Key: primary key, which means candidatekey plus
The tuples exist for each selected elementKeyref: foreign key
Each tuple of fields of a selected elementcorresponds to an element in thereferenced key
c©Munindar P. Singh, CSC 513, Spring 2010 p.89
XML Keys: 2
Two subelements built using restrictedapplication of XPath from within XML Schema
Selector: specify a set of objects: this is thescope over which uniqueness appliesField: specify what is unique for eachmember of the above set: this is the identifierwithin the targeted scope
Multiple fields are treated as ordered toproduce a tuple of values for eachmember of the setThe order matters for matching keyref tokey
c©Munindar P. Singh, CSC 513, Spring 2010 p.90
Selector XPath Expression
A selector finds descendant elements of thecontext node
The sublanguage of XPath used allowsChildren via ./child or ./* or childDescendants via .// (not within a path)Choice via |
The subset of XPath used does not allowParents or ancestorstext()AttributesFancy axes such as preceding,preceding-sibling, . . .
c©Munindar P. Singh, CSC 513, Spring 2010 p.91
Field XPath Expression
A field finds a unique descendant element(simple type only) or attribute of the context node
The subset of XPath used allowsChildren via ./child or ./*Descendants via .// (not within a path)Choice via |Attributes via @attribute or @*
The subset of XPath used does not allowParents or ancestorstext()Fancy axes such as preceding, . . .
An element yields its text()c©Munindar P. Singh, CSC 513, Spring 2010 p.92
XML Foreign Keys
<keyre f name = " . . . " r e f e r =" pr imary−key−name">< s e l e c t o r xpath = " . . . " / >
3 < f i e l d name = " . . . " / ></ keyref >
Relational requirement: foreign keys don’thave to be unique or non-null, but if onecomponent is null, then all components mustbe null.
c©Munindar P. Singh, CSC 513, Spring 2010 p.93
Document Object Model (DOM)
Basis for parsing XML, which provides anode-labeled tree in its API
Conceptually simple: traverse by requestingelement, its attribute values, and its childrenProcessing program reflects documentstructure, as in recursive descentCan edit documentsInefficient for large documents: parses themfirst entirely even if a tiny part is neededCan validate with respect to a schema
c©Munindar P. Singh, CSC 513, Spring 2010 p.94
DOM Example
1 DOMParser p = new DOMParser ( ) ;p . parse ( " f i lename " ) ;Document d = p . getDocument ( )Element s = d . getDocumentElement ( ) ;NodeList l = s . getElementsByTagName ( " member " ) ;
6 Element m = ( Element ) l . i tem ( 0 ) ;i n t code = m. g e t A t t r i b u t e ( " code " ) ;NodeList k ids = m. getChildNodes ( ) ;Node k id = k ids . i tem ( 0 ) ;S t r i n g elemName = ( ( Element ) k i d ) . getTagName ( ) ; . . .
c©Munindar P. Singh, CSC 513, Spring 2010 p.95
Simple API for XML (SAX)
Parser generates a sequence of events:startElement, endElement, . . .
Programmer implements these as callbacksMore control for the programmer
Processing program does not necessarilyreflect document structure
c©Munindar P. Singh, CSC 513, Spring 2010 p.96
SAX Example: 1
c lass MemberProcess extends Defau l tHand ler {p u b l i c vo id s ta r tE lement ( S t r i n g u r i , S t r i n g n ,
S t r i n g qName, A t t r i b u t e s a t t r s ) {i f ( n . equals ( " member " ) ) code = a t t r s . getValue ( " code " ) ;
5 i f ( n . equals ( " p r o j e c t " ) ) i n P r o j e c t = t r ue ;b u f f e r . rese t ( ) ;
}
. . .
c©Munindar P. Singh, CSC 513, Spring 2010 p.97
SAX Example: 2
1
. . .
p u b l i c vo id endElement ( S t r i n g u r i , S t r i n g n ,S t r i n g qName) {
6 i f ( n . equals ( " p r o j e c t " ) ) i n P r o j e c t = f a l s e ;i f ( n . equals ( " member " ) && ! i n P r o j e c t )
. . . do something . . .}
}
c©Munindar P. Singh, CSC 513, Spring 2010 p.98
SAX Filters
A component that mediates between anXMLReader (parser) and a client
A filter would present a modified set ofevents to the clientTypical uses:
Make minor modifications to the structureSearch for patterns efficiently
What kinds of patterns, though?
Ideally modularize treatment of differentevent patternsIn general, a filter can alter the structure ofthe document
c©Munindar P. Singh, CSC 513, Spring 2010 p.99
Programming with XML
LimitationsDifficult to construct and maintaindocumentsInternal structures are cumbersome;hence the criticisms of DOM parsers
Emerging approaches provide superiorbinding from XML to
Programming languagesRelational databases
Check pull-based versus push-based parsers
c©Munindar P. Singh, CSC 513, Spring 2010 p.100