1
Technologies for an Information Age: .opennet
Extensible Style Language—XSLT
Fall Semester 2001 MW 5:00 pm - 6:20 pm CENTRAL (not Indiana) Time
Bryan Carpenter and Geoffrey Fox
PTLIU Laboratory for Community Grids
Computer Science,
Informatics, Physics
Indiana University
Bloomington IN 47404
2
XSL Transformations
XSL is the Extensible Stylesheet Language. It has two parts: the transformation language and the formatting language.
In these lectures we are most concerned with XSL transformations, or XSLT.
– The formatting language may be discussed later.
XSLT provides a syntax for defining rules that transform an XML document to another document.
– For example, to an HTML document.
An XSLT “style sheet” consists primarily of a set of template rules that are used to transform nodes matching some patterns.
3
Transforming XML
Consider the important application of transforming an XML document to HTML, for display purposes.
This could happen in various places:
– In a Web server. A JSP page, for example, might convert XML documents stored on the server to HTML documents that are sent to the client via HTTP.
– In a Web browser. An XSL-enabled browser may convert XML downloaded from the server to HTML, prior to display. Currently Internet Explorer supports a subset of XSLT.
– In a standalone program. XML stored in or generated from a database, say, may be “manually” converted to HTML before placing it in the server’s document directory.
In any case, a suitable program takes an XML document as input, together with an XSLT “style-sheet”.
4
Example
Here is an XML source file that includes a “stylesheet” declaration:
<?xml-stylesheet type="text/xsl" href="emptable.xsl" version="1.0"?>
<EMPLOYEE>
  <PERSON>
    <EMPNO>100</EMPNO>
    <ENAME>JAMES </ENAME>
    <JOB>PRESIDENT</JOB>
  </PERSON>
  <PERSON>
    <EMPNO>201</EMPNO>
    <ENAME>KELLY </ENAME>
    <JOB>CLERK</JOB>
  </PERSON>
  . . .
</EMPLOYEE>
5
Style sheet emptable.xsl
<?xml version="1.0" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
  <xsl:template match="/">
    <HTML>
      <BODY>
        <TABLE WIDTH="100%" ALIGN="center" BORDER="1">
          <TR>
            <TH>Number</TH>
            <TH>Name</TH>
            <TH>Job</TH>
          </TR>
          <xsl:for-each select="EMPLOYEE/PERSON">
            <TR>
              <TD> <xsl:value-of select="EMPNO" /> </TD>
              <TD> <xsl:value-of select="ENAME" /> </TD>
              <TD> <xsl:value-of select="JOB" /> </TD>
            </TR>
          </xsl:for-each>
        </TABLE>
      </BODY>
    </HTML>
  </xsl:template>
</xsl:stylesheet>
6
Remarks
The xml-stylesheet processing instruction in the XML instance references an XSL style sheet.
In general, children of the stylesheet element in a stylesheet are templates.
A template specifies a pattern; the template is applied to nodes in the XML source document that match this pattern.
– In this example the only pattern is trivial: the pattern “/” matches the root node of the document.
In the transformed document, the body of the template element replaces the matched node in the source document.
In addition to text, the body may contain further XSL terms, e.g.:
– xsl:for-each repeats the template text it encloses for each member of a selected set of sub-nodes.
– xsl:value-of extracts data from selected sub-nodes.
7
Viewing Example With Internet Explorer
8
References
Inside XML, Chapter 13: “XSL Transformations”.
“XSL Transformations (XSLT)”, version 1.0: http://www.w3.org/TR/xslt
“XML Path Language (XPath)”, version 1.0: http://www.w3.org/TR/xpath
9
Simple Examples
10
Motivating Examples
XSL is quite sophisticated and in detail depends on a separate specification called XPath.
Before diving into technical details, we will go through a few simple examples, taken from Inside XML.
11
An Input Document
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="eg.xsl"?>
<planets>
  <planet>
    <name>Mercury</name>
    <mass>0.0553</mass>
    <day units="days">58.65</day>
    <radius units="miles">1516</radius>
    <density>0.983</density>
  </planet>
  <planet>
    <name>Venus</name>
    <mass>0.815</mass>
    <day units="days">116.75</day>
    <radius units="miles">3716</radius>
    <density>0.943</density>
  </planet>
  <planet>
    <name>Earth</name>
    <mass>1</mass>
    <day units="days">1</day>
    <radius units="miles">2107</radius>
    <density>1</density>
  </planet>
</planets>
Simplified version of example from the “Inside XML” book (complete with astronomical errors).
12
Possible Style Sheet eg.xsl
<?xml version="1.0" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
  <xsl:template match="planets">
    <HTML>
      <xsl:apply-templates />
    </HTML>
  </xsl:template>
  <xsl:template match="planet">
    <P> <xsl:value-of select="name" /> </P>
  </xsl:template>
</xsl:stylesheet>
13
Remarks
There are two template rules: one matches elements called planets and the other matches elements called planet. (XML names are case-sensitive, so patterns and selections must use the same lower-case names as the input document.)
– The rule for planets matches the top-level element of the input XML data.
– The body of the rule is emitted as output.
The <xsl:apply-templates /> instruction in the body of the rule causes the processing to continue on all children of the current node.
– The rule for planet matches the child nodes.
– For each, the body of the rule is emitted as output.
This output includes the result of the <xsl:value-of select="name" /> instruction, which yields the text of the nested name element.
14
Output
So the resulting document should be:
<HTML>
  <P>Mercury</P>
  <P>Venus</P>
  <P>Earth</P>
</HTML>
If you visited the original XML page using Internet Explorer, you would expect to see something like:
15
Processing Model for Templates
The formal “processing model” for template rules is given in the XSLT specification. For our purposes the following simplified interpretation should do:
1. Start at the document root node.
2. Search through all the template rules in the style sheet, looking for any that match the current node.
– If more than one rule applies, choose the most specific match.
3. If a match is found, replace the current node with the body of the template, following any embedded XSLT instructions.
– Recurse to embedded nodes only as specified in embedded XSLT instructions.
4. If no match is found, apply the procedure starting at 2, recursively, to all children of “the current node”.
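Step 4 corresponds to the built-in template rules that the XSLT 1.0 recommendation defines for nodes with no explicit match. As a rough sketch, a conforming processor behaves as if the following rules were present at the lowest precedence (they are written out here only for illustration; a style sheet does not normally need to include them):

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Elements and the root node: emit nothing, just carry on with the children -->
  <xsl:template match="*|/">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Text and attribute nodes: copy their text to the output -->
  <xsl:template match="text()|@*">
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- Comments and processing instructions: emit nothing -->
  <xsl:template match="processing-instruction()|comment()"/>

</xsl:stylesheet>
```

Any rule written explicitly in a style sheet takes precedence over these defaults.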
16
XSL in Internet Explorer
That is how it is supposed to work. Microsoft (bless their hearts) seem to have done
something different. At least, I could only get the example to work in
IE by adding an extra explicit rule:
<xsl:template match="/"> <xsl:apply-templates/> </xsl:template>
– Apparently IE doesn’t automatically recurse through the document? You have to be very explicit/procedural?
17
XPath
18
XML Path Language
The XML Path Language, or XPath, is a language for addressing parts of an XML document.
It is an essential part of XSLT, and also XPointer (as well as being used in XML Schema).
The patterns and other node selections appearing in XSLT rules are represented using XPath syntax.
In simple cases an XPath expression looks like a UNIX path name, with nested directory names replaced by nested element names:
– “/” is the root node of a document (the parent of the top-level element)
– expressions may be absolute (relative to the root) or relative to some context node.
19
Simple Examples
The XPath:
/planets/planet
represents the set of all elements named planet directly nested in a root element called planets.
The XPath:
/planets/planet/density
represents the set of all elements named density that are directly nested in any element named planet, which is directly nested in a root element called planets.
The XPath:
/planets/planet/*
represents the set of all elements that are directly nested in any element named planet, which is directly nested in a root element called planets.
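To see what these three paths select, one possible check is a small style sheet that counts the nodes in each set. This is only a sketch: it assumes the planets input document shown earlier and the XSLT 1.0 namespace introduced later in the lecture.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- every planet element directly inside the root planets element: 3 -->
    <xsl:value-of select="count(/planets/planet)"/>
    <xsl:text> planet elements&#10;</xsl:text>

    <!-- every density element directly inside one of those planets: 3 -->
    <xsl:value-of select="count(/planets/planet/density)"/>
    <xsl:text> density elements&#10;</xsl:text>

    <!-- every element child of a planet (name, mass, day, radius, density): 15 -->
    <xsl:value-of select="count(/planets/planet/*)"/>
    <xsl:text> child elements of planet&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```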
20
Types of XPath Expressions
In full generality, XPath expressions evaluate to one of four possible kinds of thing:
– A node-set: a collection of nodes in the XML document. The nodes that can appear in a node-set include:
» element nodes and attribute nodes,
» text nodes (plain text children of some element), and also
» root nodes, namespace nodes, processing instruction nodes and comment nodes.
– A boolean value: true or false.
– A number: internally a double-precision floating point, although it may be written and used as an integer.
– A string.
In the end we are interested in XPath expressions that evaluate to a node-set, although the other expression types can appear in subexpressions.
21
Location Paths and Location Steps
The most important kind of XPath expression is the location path.
In general a location path consists of a series of location steps separated by the slash “/”.
– Compare to a UNIX path, where the individual “step” is the name of an immediately nested subdirectory or file, or a wildcard (*), or a move up into the parent directory (..), etc.
22
Steps and Context Nodes
While a UNIX path may be relative to some current directory, an XPath expression is generally evaluated relative to some context node.
Starting from the context node of the path, the location path takes a series of steps in various possible “directions”, e.g.
– into the set of child elements of the current element,
– into the set of attributes of the current element,
– into the set of siblings of the current element, etc.
Each individual step has its own context node, determined by preceding steps in the path.
23
Syntax for Location Steps
The commonest example of a location step, analogous to a UNIX directory name, is an XML element name.
– Implicitly this should be the name of an element that is an immediate child of the context node.
Example: the XPath:
/planets/planet
has two steps. It references all elements named planet directly nested in a root element called planets.
Actually this common case is an example of what is called abbreviated syntax.
To be systematic, we will first describe the general, unabbreviated syntax for location paths.
24
Parts of a Location Step
We said that a location path is split into a series of /-separated location steps.
In general an individual location step is itself divided into three parts:
– The axis: a keyword which, loosely speaking, describes the “dimension” this location step takes us into.
» Simple examples are child and attribute which, respectively, say that this step enters the set of children or the set of attributes of an element.
– The node test: this is typically an element or attribute name, selecting within the chosen axis. It may also be less specific: a node type or wildcard.
– Optionally, one or more predicates, which use arbitrary tests (boolean expressions) to further refine the set of selected nodes.
25
Syntax for a Location Step
The unabbreviated syntax for a location step is:
axis :: node-test [predicate1] [predicate2] . . .
For example:
child :: para [position() > 1]
axis = child
node test = para
predicate = position() > 1
26
Axes
Any location step starts from some context node; the axis is relative to this node.
The available axes are:
– child: contains the children of the context node.
– descendant: contains children and all descendants of children.
– parent: contains the parent of the context node.
– ancestor: contains parent and ancestors of parent.
– attribute: contains the attributes of the context node.
– following: all nodes that start after the end of the context node, in document order (so its descendants are not included).
– preceding: all nodes that end before the start of the context node, in document order (so its ancestors are not included).
– following-sibling, preceding-sibling: nodes at the same syntactic level (children of the same parent).
– namespace: contains namespace nodes of context node.
– self, descendant-or-self, ancestor-or-self
Note the spelling: XPath writes these axis names as descendant and descendant-or-self, with an “a”.
27
The Node Test
After choosing the axis, we refine the selection with a node test.
The most common cases for the node-test field are:
– an element or attribute name, selecting nodes in the axis with the given name, or
– the wildcard, “*”, selecting all nodes of the “principal type” for this axis (typically, all element nodes, or all attribute nodes if axis is attribute).
The node-test field may also be a node type expression:
– comment(), text(), processing-instruction(), node()
– Optionally, the processing-instruction() function may include a literal string specifying a particular type of instruction.
28
Example Location Steps
child :: para
– Child elements of the context node, named para
child :: *
– All element children of the context node
child :: text()
– All text node children of the context node
child :: node()
– All children, regardless of node type
attribute :: name
– The attribute of the context node named name
attribute :: *
– All attributes of the context node
descendant :: para
– Descendant elements of the context node named para
ancestor :: div
– div element ancestors of the context node
29
Location Paths, Using Unabbreviated Syntax
child :: chapter/descendant :: para
– para element descendants of chapter element children of the context node.
child :: */child :: para
– All para element grandchildren of the context node.
/descendant :: para
– All para elements in this document.
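The chapter and para names above are generic. As an illustrative sketch, the same unabbreviated syntax can be tried on the planets document from the earlier slides; the style sheet below is an assumption made for illustration, not part of the original lecture.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- axis = descendant, node test = radius: all radius elements in the document -->
    <xsl:for-each select="/descendant::radius">
      <!-- step up the parent axis to the planet, then down the child axis to its name -->
      <xsl:value-of select="parent::planet/child::name"/>
      <xsl:text>: </xsl:text>
      <!-- self axis: the radius element itself (its text content) -->
      <xsl:value-of select="self::node()"/>
      <xsl:text> </xsl:text>
      <!-- attribute axis: the units attribute of the radius element -->
      <xsl:value-of select="attribute::units"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

For the three planets in the example input this should print one line per planet, such as "Mercury: 1516 miles".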
30
Predicates
The node test is optionally followed by a series of predicate expressions.
Each expression appears in []s.
The predicates are computed successively to further filter the set of selected nodes: after each predicate is applied, the selected node set is reduced to exclude those nodes for which the expression evaluates to false.
In the next slide we give some examples. Technical details of how these examples work are in following slides.
31
Examples with Predicates
child :: para [position() = 1]
– First para element child of the context node.
child :: para [position() = last()]
– Last para element child of the context node.
child :: para [position() > 1]
– All para element children of the context node, except the first.
/child :: doc/child :: chapter [position() = 5]/child :: section
– section elements of 5th chapter element of root doc element.
child :: para [attribute :: type = 'warning'] [position() = 5]
– 5th para child of context node having type attribute value “warning”.
32
Context Size and Position
Various functions are available in predicate expressions. They include:
– last(): the current context size.
– position(): the current context position.
The definition of size and position is slightly subtle: they are relative to the set of nodes generated by starting at the initial context node for this location step, following the axis, and applying any earlier filters in the step (node test and preceding predicates, if any).
Positions are labelled beginning at 1, and ordered according to proximity in the text to (the start of) the context node.
33
Subscripting Notation
If the XPath expression that appears in the predicate of a path step has numeric type, it is treated as true if its value is equal to position() and false otherwise.
This is a perverse way of saying that the location step:
child :: para [5]
is equivalent to:
child :: para [position() = 5]
Or in other words, that numeric expressions appearing in predicates behave like subscripts (relative to the node set described on the previous slide).
34
Subscripting Notation
This convention allows us to simplify some of the examples given earlier, e.g.:
child :: para [1]
– First para element child of the context node.
child :: para [last()]
– Last para element child of the context node.
/child :: doc/child :: chapter [5]/child :: section
– All section elements of 5th chapter element of root doc element.
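A short sketch of positional predicates in use, again assuming the planets input document from the earlier slides (the style sheet itself is hypothetical):

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/planets">
    <!-- [1] is the subscript form of [position() = 1] -->
    <xsl:text>first: </xsl:text>
    <xsl:value-of select="child::planet[1]/child::name"/>

    <!-- [last()] is the subscript form of [position() = last()] -->
    <xsl:text>&#10;last: </xsl:text>
    <xsl:value-of select="child::planet[last()]/child::name"/>

    <!-- a boolean predicate: every planet child except the first -->
    <xsl:text>&#10;all but the first:</xsl:text>
    <xsl:for-each select="child::planet[position() &gt; 1]">
      <xsl:text> </xsl:text>
      <xsl:value-of select="child::name"/>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

With the example input, the first planet is Mercury, the last is Earth, and the final line lists Venus and Earth.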
35
Booleans
In general, an XPath expression is converted to a boolean, if context demands, by the following rules:
– A number (if not a top level predicate expression) converts to true if non-zero, false if zero or NaN.
– A non-empty string converts to true, empty to false.
– A non-empty node set converts to true, empty to false.
According to the third rule, in:
child :: section [child :: para]
the predicate is true if the context node for the predicate (the section element) has a non-empty set of para children.
The operators and and or, and the function not(), are available.
36
Comparisons
Numeric and string comparisons in XPath predicates follow fairly obvious rules.
Comparisons involving node sets are defined to be true if the comparison would hold true for the string-value of any node in the sets involved.
– Note, the string-value of an element node is the concatenation of the string-values of all the text nodes nested inside it.
For example, in
child :: para [attribute :: type = 'warning']
the predicate is true iff the node set attribute :: type includes a node with string-value “warning”.
So this location step extracts all para children decorated with the attribute type="warning".
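The boolean and comparison rules can be sketched on the planets document from the earlier slides. The style sheet is a hypothetical example; the element and attribute names come from that input document.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/planets">
    <!-- node-set as boolean: true for planets that have at least one day child -->
    <xsl:value-of select="count(child::planet[child::day])"/>
    <xsl:text> planets have a day element&#10;</xsl:text>

    <!-- node-set compared with a number: the mass text is converted to a number -->
    <xsl:text>mass above 0.5:</xsl:text>
    <xsl:for-each select="child::planet[child::mass &gt; 0.5]">
      <xsl:text> </xsl:text>
      <xsl:value-of select="child::name"/>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>

    <!-- node-set compared with a string, combined with not() -->
    <xsl:value-of select="count(child::planet[not(child::radius/attribute::units = 'miles')])"/>
    <xsl:text> planets give the radius in other units&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

In the example input all three planets have a day element, Venus and Earth have a mass above 0.5, and every radius is given in miles, so the last count is 0.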
37
Unions
The operator “|” forms the union of two node sets.
e.g. child :: chapter [child :: section | child :: para]
selects chapters that directly contain a section or a para.
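A union can also appear at the top level of a select expression. The following sketch, which assumes the planets input document from the earlier slides, visits the name and radius children of the first planet; the nodes of a union are processed in document order.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- union of two node sets: the name and the radius of the first planet -->
    <xsl:for-each select="planets/planet[1]/name | planets/planet[1]/radius">
      <!-- name() returns the element name of the current node -->
      <xsl:value-of select="name()"/>
      <xsl:text> = </xsl:text>
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```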
38
Abbreviated Syntax for Paths
Together, the following abbreviations allow the UNIX-like path syntax seen earlier:
– The axis selector child :: can always be omitted: a node test alone implicitly refers to the child axis.
– The location step “.” is short for self :: node().
– The location step “..” is short for parent :: node().
Other useful abbreviations are:
– The axis selector attribute :: can be abbreviated to @.
– // is short for /descendant-or-self :: node()/
» e.g. //para selects every para element in the document.
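As a sketch of how the two notations line up, each pair of expressions below should select the same nodes in the planets document from the earlier slides (the style sheet is an illustrative assumption):

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- "//" versus descendant-or-self::node() -->
    <xsl:value-of select="count(//planet)"/>
    <xsl:text> = </xsl:text>
    <xsl:value-of select="count(/descendant-or-self::node()/child::planet)"/>
    <xsl:text>&#10;</xsl:text>

    <!-- omitted child axis and "@" versus child:: and attribute:: -->
    <xsl:value-of select="planets/planet[2]/day/@units"/>
    <xsl:text> = </xsl:text>
    <xsl:value-of select="child::planets/child::planet[2]/child::day/attribute::units"/>
    <xsl:text>&#10;</xsl:text>

    <!-- ".." versus parent::node(), evaluated from each density element -->
    <xsl:for-each select="//density">
      <xsl:value-of select="../name"/>
      <xsl:text> = </xsl:text>
      <xsl:value-of select="parent::node()/child::name"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```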
39
Summary
In its full generality, XPath is a fairly complex (but powerful) language for addressing subsets of the nodes of an XML document.
The generality of the underlying language should not be too intimidating. The most common idioms (which have special abbreviated syntax) do “what you expect”.
If you take the time to understand XPath well, the rest of XSLT is relatively straightforward.
40
XSLT
41
Format of a Style Sheet
Not surprisingly, an XSLT style sheet is itself an XML document.
In this lecture we will be using the XSLT elements from the namespace: http://www.w3.org/1999/XSL/Transform.
– As a matter of convention we use the prefix xsl: for this namespace.
42
The Style Sheet Document
The document element in an XSLT style sheet is an xsl:stylesheet element, e.g.:
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >
  . . .
</xsl:stylesheet>
– A synonym for xsl:stylesheet is xsl:transform.
Several kinds of element can be nested inside xsl:stylesheet, but by far the most important is the xsl:template element.
43
The xsl:template Element
The xsl:template element has a few possible attributes, but the only one we will consider is the match attribute, which is generally required
– unless the template has a name attribute instead, in which case it is invoked explicitly with xsl:call-template.
So all our templates will have the form:
<xsl:template match="pattern">
  template body
</xsl:template>
The pattern is an XPath expression describing the nodes to which the template can be applied.
The processor scans the input document for nodes matching this pattern, and replaces them with the text included in the template body.
In a nutshell, this explains the whole operation of XSLT.
44
Patterns
The value of the match attribute is a (slightly restricted) XPath expression.
The expression must evaluate to a node set, and the top-level of the expression should normally be a location path.
– A few other cases are allowed, e.g. unions of location paths.
– Arbitrary XPath expressions can appear nested in predicates.
Location steps in the top-level location paths may only use the child or attribute axes.
– The unabbreviated descendant axis cannot be used, but the special case syntax // is allowed.
45
Matching Patterns
Formally: a pattern is defined to match a node iff evaluating the XPath expression with the node itself, or one of its ancestors, as context gives a node set that contains the node.
In practice the rules are easy to understand, e.g.:
para
– Matches all para elements in the search space.
text()
– Matches all text nodes in the search space.
@name
– All name attributes of all elements in the search space.
para [position() = 1]
– All para elements that are the first para child of their parent element.
chapter [5]//section
– All section elements nested at any depth in a chapter element that is the 5th chapter child of its parent.
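When two patterns match the same node, XSLT 1.0 prefers the more specific one (a pattern with a predicate has a higher default priority than a bare element name). The sketch below assumes the planets input document from the earlier slides; the output labels are made up for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- matches only a planet that is the first planet child of its parent -->
  <xsl:template match="planet[1]">
    <xsl:text>first: </xsl:text>
    <xsl:value-of select="name"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- matches every planet; used when the more specific rule does not apply -->
  <xsl:template match="planet">
    <xsl:text>other: </xsl:text>
    <xsl:value-of select="name"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- suppress the whitespace text nodes between the planet elements -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
```

With the example input, Mercury is handled by the first rule and Venus and Earth by the second.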
46
The Template Body
When a node is matched, the body of the xsl:template element is evaluated, and the resulting text placed in the output document.
The template body may include plain text, which is copied straight to the output document.
It may also include nested XSLT elements that need to be further processed.
– These will be evaluated in the context of the matched node.
Important examples of XSLT elements that may appear in the body of a template:
– The xsl:value-of element.
– The xsl:apply-templates element.
– The xsl:element and xsl:attribute elements.
– More “procedural” elements, like xsl:for-each, xsl:if.
47
An input document
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xml" href="eg.xsl"?>
<planets>
  <planet>
    <name>Mercury</name>
    <mass>0.0553</mass>
    <day units="days">58.65</day>
    <radius units="miles">1516</radius>
    <density>0.983</density>
  </planet>
  <planet>
    <name>Venus</name>
    <mass>0.815</mass>
    <day units="days">116.75</day>
    <radius units="miles">3716</radius>
    <density>0.943</density>
  </planet>
  <planet>
    <name>Earth</name>
    <mass>1</mass>
    <day units="days">1</day>
    <radius units="miles">2107</radius>
    <density>1</density>
  </planet>
</planets>
Simplified version of example from the “Inside XML” book (complete with astronomical errors).
48
Using an empty style sheet
Consider the example where there are no templates explicitly specified, i.e. eg.xsl has the form:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >
</xsl:stylesheet>
The transformation of the input document is:
Mercury0.055358.6515160.983Venus0.815116.7537160.943Earth1121071
i.e. just the concatenated string values in all text nodes.
This happens because elements with no matching template are simply recursed into, and there is a default template rule for text:
<xsl:template match="text()">
  <xsl:value-of select="."/>
</xsl:template>
49
Templates without embedded XSLT
Now consider a single template, with no embedded XSLT commands:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >
  <xsl:template match="planet">
    <p>planet discovered</p>
  </xsl:template>
</xsl:stylesheet>
The transformation of the input document is:
<?xml version="1.0" encoding="UTF-16"?><p>planet discovered</p><p>planet discovered</p><p>planet discovered</p>
This is valid HTML, but not very readable (as text). We can add the command:
<xsl:output indent="yes"/>
to the xsl:stylesheet element to get prettier output formatting.
50
The xsl:apply-templates element
Suppose a second template matching the planets element is added:
<xsl:template match="planet">
  <p>planet discovered</p>
</xsl:template>
<xsl:template match="planets">
  <h1>All Known Planets</h1>
</xsl:template>
The output now only contains the header:
<h1>All Known Planets</h1>
not the “planet discovered” messages from processing the nested planet elements.
Once a match is found, nested elements are not processed unless there is an explicit <xsl:apply-templates> instruction:
<xsl:template match="planets">
  <h1>All Known Planets</h1>
  <xsl:apply-templates/>
</xsl:template>
51
The xsl:value-of element
We can now match arbitrary nodes in the source document, but we don’t yet have a way to extract data from those nodes.
To do this we need the xsl:value-of element, e.g.:
<xsl:template match="planet">
  <p>planet <xsl:value-of select="name"/> discovered</p>
</xsl:template>
<xsl:template match="planets">
  <h1>All Known Planets</h1>
  <xsl:apply-templates/>
</xsl:template>
We now get the more interesting output:
<h1>All Known Planets</h1>
<p>planet Mercury discovered</p>
<p>planet Venus discovered</p>
<p>planet Earth discovered</p>
52
Selections
The select attribute of the xsl:value-of element is a general XPath expression.
Its result, which may be a node set or other allowed value, is converted to a string and included in the output.
For example, the selection can be an attribute node, a set of elements, or it could be the result of a numeric computation.
If the selection is a set of elements, XSLT 1.0 uses only the first element in document order: the text contents of that element's body, including nested elements, are concatenated and returned as the value.
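The different kinds of selection can be sketched as follows, assuming the planets input document from the earlier slides. Note in particular that in XSLT 1.0 xsl:value-of outputs only the first node of a node set, so xsl:for-each is used to visit every node.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- a node set of three name elements: only the first (Mercury) is output -->
    <xsl:value-of select="planets/planet/name"/>
    <xsl:text>&#10;</xsl:text>

    <!-- an attribute node: its value is output -->
    <xsl:value-of select="planets/planet[1]/radius/@units"/>
    <xsl:text>&#10;</xsl:text>

    <!-- a numeric computation: the number is converted to a string -->
    <xsl:value-of select="count(planets/planet) * 2"/>
    <xsl:text>&#10;</xsl:text>

    <!-- to output every selected node, iterate over the node set -->
    <xsl:for-each select="planets/planet/name">
      <xsl:value-of select="."/>
      <xsl:text> </xsl:text>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```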
53
XPath Expressions in Attributes
Suppose we want to generate an XML element in the output with an attribute whose value is computed from source data.
One might be tempted to try a template like:
<planet name = "<xsl:value-of select='name'/>" > Status: discovered </planet>
This is ill-formed XML: we cannot have an XML element as an attribute value.
Instead {}s can be used in an attribute value to surround an XPath expression:
<planet name = "{name}" > Status: discovered </planet>
The XPath expression name is evaluated exactly as for the select attribute of an xsl:value-of element, and interpolated into the attribute value string.
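A complete style sheet using this notation might look like the sketch below. It assumes the planets input document from the earlier slides; the radius attribute on the output element is an invented illustration showing that one attribute value may mix several {} expressions with literal text.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="planets">
    <planets>
      <xsl:apply-templates select="planet"/>
    </planets>
  </xsl:template>

  <xsl:template match="planet">
    <!-- {name} and {radius} are evaluated relative to the matched planet -->
    <planet name="{name}" radius="{radius} {radius/@units}">Status: discovered</planet>
  </xsl:template>
</xsl:stylesheet>
```

For Mercury this should produce an element with name="Mercury" and radius="1516 miles".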
54
The xsl:element element
For similar reasons we cannot use <xsl:value-of> to compute an expression that is used as the name of an element in the generated file.
Instead one can use the xsl:element element.
– These can optionally include nested xsl:attribute elements (as their first children):
<xsl:template match="planet">
  <xsl:element name="{name}">
    <xsl:attribute name="distance">
      <xsl:value-of select="distance"/>
    </xsl:attribute>
    Status: discovered
  </xsl:element>
</xsl:template>
– When this template matches a planet, it generates an XML element whose name is the planet, with a distance attribute.
– The simplified input document shown earlier has no distance element, so with that input the attribute value would be empty.
55
A Table of Planets
<xsl:template match="planets">
  <html><body>
    <h1>All Known Planets</h1>
    <table width="100%" align="center" border="1">
      <tr><th>name</th><th>mass</th><th>density</th><th>radius</th></tr>
      <xsl:apply-templates/> <!-- rows of table -->
      <tr>
        <td>AVERAGES</td>
        <td></td>
        <!-- average density, placed under the density column -->
        <td> <xsl:value-of select="sum(planet/density) div count(planet)"/> </td>
        <td></td>
      </tr>
    </table>
  </body></html>
</xsl:template>
56
A row of the table
<xsl:template match="planet">
  <tr>
    <td><xsl:value-of select="name"/></td>
    <td><xsl:value-of select="mass"/></td>
    <td><xsl:value-of select="density"/></td>
    <td><xsl:value-of select="radius"/></td>
  </tr>
</xsl:template>
57
The Display