Post on 20-Dec-2015


Processing XML

• Processing XML using XSLT

• Processing XML documents with Java (DOM)

• Next week -- Processing XML documents with Java (SAX)

Processing XML using XSLT

To use James Clark’s xt program, visit his site at http://www.jclark.com/ and click on XML.

The following programs were tested with the command line

C:>xt somefile.xml somefile.xsl resultfile.html

The xt classes (and XSLT processing) may also be accessed via a servlet.
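The same kind of transformation can also be run from Java code. The sketch below is an assumption-laden illustration, not the xt API: it uses the JAXP classes (javax.xml.transform) that ship with the JDK, and the class name XsltRunner, its constants and its transform method are made up for this example. The output method is forced to plain XML without a declaration so the result is easy to compare with the slides.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: applies a stylesheet to a document using JAXP (not xt).
class XsltRunner {
    // The book document from the slides, written without whitespace between elements.
    static final String BOOK_XML =
          "<book><title>The Catcher in the Rye</title>"
        + "<author>J. D. Salinger</author>"
        + "<publisher>Little, Brown and Company</publisher></book>";

    // The stylesheet from the slides.
    static final String BOOK_XSL =
          "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
        + "<xsl:template match='book'><HTML><BODY><xsl:apply-templates/></BODY></HTML></xsl:template>"
        + "<xsl:template match='title'><H1><xsl:apply-templates/></H1></xsl:template>"
        + "<xsl:template match='author'><H3><xsl:apply-templates/></H3></xsl:template>"
        + "<xsl:template match='publisher'><P><I><xsl:apply-templates/></I></P></xsl:template>"
        + "</xsl:stylesheet>";

    // Runs the transformation and returns the result tree serialized as a string.
    static String transform(String xml, String xsl) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            // Serialize as XML, with no declaration and no added indentation.
            t.setOutputProperty(OutputKeys.METHOD, "xml");
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            t.setOutputProperty(OutputKeys.INDENT, "no");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(BOOK_XML, BOOK_XSL));
    }
}
```

Running main prints the HTML fragment for the book on a single line; a servlet could call the same transform method and write the string to its response.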

<?xml version="1.0" ?><?xml-stylesheet type="text/xsl" href="demo1.xsl"?><book> <title>The Catcher in the Rye</title> <author>J. D. Salinger</author> <publisher>Little, Brown and Company</publisher> </book>

Input

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<xsl:template match = "book"> <HTML><BODY><xsl:apply-templates/></BODY></HTML> </xsl:template>

<xsl:template match = "title"> <H1><xsl:apply-templates/></H1> </xsl:template>

<xsl:template match = "author"> <H3><xsl:apply-templates/></H3> </xsl:template>

<xsl:template match = "publisher"> <P><I><xsl:apply-templates/></I></P> </xsl:template>

</xsl:stylesheet>

Processing

<HTML><BODY> <H1>The Catcher in the Rye</H1> <H3>J. D. Salinger</H3> <P><I>Little, Brown and Company</I></P> </BODY></HTML>

Output

<?xml version="1.0" ?><?xml-stylesheet type="text/xsl" href="demo1.xsl"?><library><block><book> <title>The Catcher in the Rye</title> <author>J. D. Salinger</author> <publisher>Little, Brown and Company</publisher> </book></block></library>

Input

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

version="1.0">

<xsl:template match = "book">

<HTML><BODY><xsl:apply-templates/></BODY></HTML>

</xsl:template>

<xsl:template match = "title">

<H1><xsl:apply-templates/></H1>

</xsl:template>

<xsl:template match = "author">

<H3><xsl:apply-templates/></H3>

</xsl:template>

<xsl:template match = "publisher">

<P><I><xsl:apply-templates/></I></P>

</xsl:template>

</xsl:stylesheet>

The default rules match the root, library and block elements.

<HTML><BODY> <H1>The Catcher in the Rye</H1> <H3>J. D. Salinger</H3> <P><I>Little, Brown and Company</I></P> </BODY></HTML>

The output is the same.

<?xml version="1.0" ?><?xml-stylesheet type="text/xsl" href="demo1.xsl"?>

<book> <title>The Catcher in the Rye</title> <author>J. D. Salinger</author> <publisher>Little, Brown and Company</publisher> <book>Cliff Notes on The Catcher in the Rye</book> </book>

Two books in the input

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<xsl:template match = "book"> <HTML><BODY><xsl:apply-templates/></BODY></HTML> </xsl:template>

<xsl:template match = "title"> <H1><xsl:apply-templates/></H1> </xsl:template>

<xsl:template match = "author"> <H3><xsl:apply-templates/></H3> </xsl:template>

<xsl:template match = "publisher"> <P><I><xsl:apply-templates/></I></P> </xsl:template>

</xsl:stylesheet>

What’s the output?

<HTML><BODY> <H1>The Catcher in the Rye</H1> <H3>J. D. Salinger</H3> <P><I>Little, Brown and Company</I></P> <HTML><BODY>Cliff Notes on The Catcher in the Rye</BODY></HTML> </BODY></HTML>

Illegal HTML

<?xml version="1.0" ?><?xml-stylesheet type="text/xsl" href="demo1.xsl"?>

<book> <title>The Catcher in the Rye</title> <author>J. D. Salinger</author> <publisher>Little, Brown and Company</publisher> </book>

Input

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<xsl:template match = "book"> <HTML><BODY><xsl:apply-templates/></BODY></HTML> </xsl:template>

<xsl:template match = "title"> <H1><xsl:apply-templates/></H1> </xsl:template>

<xsl:template match = "author"> <H3><xsl:apply-templates/></H3> </xsl:template>

<!-- <xsl:template match = "publisher"> <P><I><xsl:apply-templates/></I></P> </xsl:template> -->

</xsl:stylesheet>

We are not matching on publisher.

<HTML><BODY> <H1>The Catcher in the Rye</H1> <H3>J. D. Salinger</H3> Little, Brown and Company </BODY></HTML>

The default rule matches the publisher element and then prints its text child.

<?xml version="1.0" ?><?xml-stylesheet type="text/xsl" href="demo1.xsl"?>

<book> <title>The Catcher in the Rye</title> <author>J. D. Salinger</author> <publisher>Little, Brown and Company</publisher> </book>

Input

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<xsl:template match = "book"> <HTML><BODY><xsl:apply-templates/></BODY></HTML> </xsl:template>

<xsl:template match = "title"> <H1><xsl:apply-templates/></H1> </xsl:template>

<xsl:template match = "author"> <H3><xsl:apply-templates/></H3> </xsl:template>

<xsl:template match = "publisher"> <!-- Skip the publisher --> </xsl:template>

</xsl:stylesheet>

We can skip the publisher by matching it and stopping the recursion.

<HTML><BODY> <H1>The Catcher in the Rye</H1> <H3>J. D. Salinger</H3> </BODY></HTML>
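The difference between falling through to the default rules and matching with an empty template can be checked with a small Java sketch. This is only an illustration under stated assumptions: it uses the JDK's JAXP transformer rather than xt, the class BuiltInRulesDemo and its constants are hypothetical, the stylesheets are simplified versions of the ones above, and the output method is set to text so that only character data appears.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo: built-in rules versus an empty template for publisher.
class BuiltInRulesDemo {
    static final String BOOK_XML =
          "<book><title>The Catcher in the Rye</title>"
        + "<author>J. D. Salinger</author>"
        + "<publisher>Little, Brown and Company</publisher></book>";

    static final String HEAD =
        "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>";
    static final String TITLE_RULE =
        "<xsl:template match='title'>[<xsl:apply-templates/>]</xsl:template>";

    // Only title has a rule; author and publisher fall through to the default rules.
    static final String DEFAULT_FOR_PUBLISHER = HEAD + TITLE_RULE + "</xsl:stylesheet>";

    // An empty template matches publisher and stops the recursion there.
    static final String SKIP_PUBLISHER =
        HEAD + TITLE_RULE + "<xsl:template match='publisher'/>" + "</xsl:stylesheet>";

    // Applies the given stylesheet to BOOK_XML and returns the text output.
    static String run(String xsl) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            t.setOutputProperty(OutputKeys.METHOD, "text");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(BOOK_XML)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Prints: [The Catcher in the Rye]J. D. SalingerLittle, Brown and Company
        System.out.println(run(DEFAULT_FOR_PUBLISHER));
        // Prints: [The Catcher in the Rye]J. D. Salinger
        System.out.println(run(SKIP_PUBLISHER));
    }
}
```

The brackets mark text produced by an explicit rule; everything outside them was copied by the default rule for text nodes.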

<?xml version="1.0" ?><?xml-stylesheet type="text/xsl" href="demo1.xsl"?><shelf> <book> <title>The Catcher in the Rye</title> <author>J. D. Salinger</author> <publisher>Little, Brown and Company</publisher> </book> <book> <title>The Catcher in the Rye</title> <author>J. D. Salinger</author> <publisher>Little, Brown and Company</publisher> </book> <book> <title>The Catcher in the Rye</title> <author>J. D. Salinger</author> <publisher>Little, Brown and Company</publisher> </book></shelf>

A shelf has many books.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<xsl:template match = "book"> <HTML><BODY><xsl:apply-templates/></BODY></HTML> </xsl:template>

<xsl:template match = "title"> <H1><xsl:apply-templates/></H1> </xsl:template>

<xsl:template match = "author"> <H3><xsl:apply-templates/></H3> </xsl:template>

<xsl:template match = "publisher"> <i><xsl:apply-templates/></i> </xsl:template>

</xsl:stylesheet>

Will this do the job?

<HTML> <BODY> <H1>The Catcher in the Rye</H1> <H3>J. D. Salinger</H3> <i>Little, Brown and Company</i> </BODY></HTML><HTML> <BODY> <H1>The Catcher in the Rye</H1> <H3>J. D. Salinger</H3> <i>Little, Brown and Company</i> </BODY></HTML><HTML> <BODY> <H1>The Catcher in the Rye</H1> <H3>J. D. Salinger</H3> <i>Little, Brown and Company</i> </BODY></HTML>

This is not what we want.

<?xml version="1.0" ?><?xml-stylesheet type="text/xsl" href="demo1.xsl"?><shelf> <book> <title>The Catcher in the Rye</title> <author>J. D. Salinger</author> <publisher>Little, Brown and Company</publisher> </book> <book> <title>The Catcher in the Rye</title> <author>J. D. Salinger</author> <publisher>Little, Brown and Company</publisher> </book> <book> <title>The Catcher in the Rye</title> <author>J. D. Salinger</author> <publisher>Little, Brown and Company</publisher> </book></shelf>

Same input.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<xsl:template match = "shelf"> <HTML><BODY>Found a shelf</BODY></HTML> </xsl:template>

</xsl:stylesheet>

Checks for a shelf and quits.

<HTML><BODY>Found a shelf</BODY></HTML>

Output

<?xml version="1.0" ?><?xml-stylesheet type="text/xsl" href="demo1.xsl"?><shelf> <book> <title>The Catcher in the Rye</title> <author>J. D. Salinger</author> <publisher>Little, Brown and Company</publisher> </book> <book> <title>The Catcher in the Rye</title> <author>J. D. Salinger</author> <publisher>Little, Brown and Company</publisher> </book> <book> <title>The Catcher in the Rye</title> <author>J. D. Salinger</author> <publisher>Little, Brown and Company</publisher> </book></shelf>

Same input.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<xsl:template match = "shelf">
  <HTML> <BODY>
    <b>These are a few of my favorite books</b>
    <table width = "640" border = "5">
      <xsl:apply-templates/>
    </table>
  </BODY> </HTML>
</xsl:template>

<xsl:template match = "book">
  <tr> <td> <xsl:number/> </td> <xsl:apply-templates/> </tr>
</xsl:template>

<xsl:template match = "title | author | publisher">
  <td><xsl:apply-templates/></td>
</xsl:template>

</xsl:stylesheet>

Produce a table of books.

<HTML><BODY><b>These are a few of my favorite books</b><table width="640" border="5">
<tr><td>1</td> <td>The Catcher in the Rye</td> <td>J. D. Salinger</td> <td>Little, Brown and Company</td> </tr>
<tr><td>2</td> <td>The XSLT Programmer's Reference</td> <td>Michael Kay</td> <td>Wrox Press</td> </tr>
<tr><td>3</td> <td>Computer Organization and Design</td> <td>Patterson and Hennessy</td> <td>Morgan Kaufmann</td> </tr>
</table></BODY></HTML>

XPATH

• A non-XML language used to identify particular parts of an XML document

• Used by XSLT for matching and selecting particular elements to be copied into the result tree.

• Used by XPointer to identify a particular point in or part of an XML document that an XLink links to.

Slides adapted from “XML in a Nutshell” by Harold

XPATH

First, we’ll look at three commonly used XSLT instructions:

xsl:value-of xsl:template xsl:apply-templates

XPATH

<xsl:value-of select = "XPathExpression" />

The xsl:value-of element computes the string value of an XPath expression and inserts it into the result tree. XPath allows us to select nodes in the tree, and different node types produce different values.

XPATH

<xsl:value-of select = "XPathExpression" />

element => the text content of the element after all tags are stripped
text => the text of the node
attribute => the value of the attribute
root => the value of the root (all text in the document, tags stripped)
processing-instruction => the processing instruction data (<?, ?>, and the target are not included)
comment => the text of the comment (no comment symbols)
namespace => the namespace URI
node set => the value of the first node in the set
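These string values can be observed directly from Java. The following is a minimal sketch, assuming the javax.xml.xpath API from the JDK; the class StringValueDemo, its stringValue method and the small book document (with an added year attribute and comment) are invented for illustration and do not come from the slides.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo: the string value XPath assigns to different node types.
class StringValueDemo {
    // Illustrative document: one attribute, one comment, two child elements.
    static final String XML =
          "<book year='1951'><!--classic-->"
        + "<title>The Catcher in the Rye</title>"
        + "<author>J. D. Salinger</author></book>";

    // Evaluates the expression against XML and converts the result to a string,
    // which is what xsl:value-of does with its select attribute.
    static String stringValue(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expr, doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(stringValue("/book"));            // element: all text, tags stripped
        System.out.println(stringValue("/book/@year"));      // attribute: 1951
        System.out.println(stringValue("/book/comment()"));  // comment: classic
        System.out.println(stringValue("/book/*"));          // node set: value of the first node
        System.out.println(stringValue("/"));                // root: same text as /book
    }
}
```

Note that the comment text is not part of the element's string value; only text nodes contribute to it.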

XPATH

<xsl:template match = "pattern" />

The xsl:template top-level element is the key to all of XSLT. The match attribute contains a pattern (location path) against which nodes are compared as they’re processed. If the pattern matches a node, then the contents of the template are instantiated.

XPATH

<xsl:apply-templates select = "XPath node set expression" />

For each node selected by the node set expression, find and apply the highest-priority template that matches that node.

If the select attribute is not present, then all children of the context node are processed.

The Tree Structure of an XML Document

<?xml version="1.0" ?>
<?xml-stylesheet type="text/xsl" href="some.xsl" ?>
<people>
  <person born="1912" died="1954" id="p342">
    <name> <first_name>Alan</first_name> <last_name>Turing</last_name> </name>
    <profession>computer scientist</profession>
    <profession>mathematician</profession>
    <profession>cryptographer</profession>
  </person>

  <person born="1918" died="1988" id="p4567">
    <name> <first_name>Richard</first_name> <middle_initial>&#x4D;</middle_initial> <last_name>Feynman</last_name> </name>
    <profession>physicist</profession>
    <hobby>Playing the bongoes</hobby>
  </person>
</people>

[Tree diagram of the document]

/
  <?xml-stylesheet type="text/xsl" href="some.xsl" ?>
  people
    person (born = "1912", died = "1954", id = "p342")
      name
        first_name
          Alan
      <!-- Did the word "computer scientist" exist in Turing's day? -->
      profession
    person

Nodes seen by XPath:
The root
Element nodes
Text nodes
Attribute nodes
Comment nodes
Processing instructions
Namespace nodes

Constructs not seen by XPath:
CDATA sections
Entity references
Document type declarations

Location Paths

• The root

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="/">
  <a>matched the root</a>
</xsl:template>
</xsl:stylesheet>

<?xml version="1.0" encoding="utf-8"?><a>matched the root</a>

Location Paths

• Child element location paths (relative to context node)

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="/"> <xsl:value-of select = "people/person/profession" /></xsl:template>
</xsl:stylesheet>

computer scientist

Location Paths

• Attribute location paths (relative to context node)

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="/"> <xsl:value-of select = "people/person/@born" /></xsl:template>
</xsl:stylesheet>

<?xml version="1.0" encoding="utf-8"?>1912

Location Paths

• Attribute location paths (relative to context node)

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="/"> <xsl:apply-templates select = "people/person" /></xsl:template>
<xsl:template match = "person"> <date> <xsl:value-of select = "@born" /> </date></xsl:template>
</xsl:stylesheet>

<date>1912</date><date>1918</date>

Location Paths

• Comment Location Step (comments don’t have names)

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="/"> <xsl:value-of select = "people/person/comment()" /></xsl:template>
</xsl:stylesheet>

<?xml version="1.0" encoding="utf-8"?> Did the word "computer scientist" exist in Turing's day?

Location Paths

• Comment Location Step

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match = "comment()" > <i>comment deleted</i></xsl:template>
</xsl:stylesheet>

The output is the document content with each comment replaced as shown. By default, comments are not output.

Location Paths

• Text Location Step (Text nodes don’t have names)

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="/"> <xsl:value-of select = "people/person/profession/text()" /></xsl:template>
</xsl:stylesheet>

computer scientist

Location Paths

• Processing Instruction Location Step

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="/"> <xsl:value-of select = "processing-instruction()" /></xsl:template>
</xsl:stylesheet>

<?xml version="1.0" encoding="utf-8"?>type="text/xsl" href = "pi.xsl"

Location Paths

• Wild cards

There are three wild cards: *, node(), @*

The * matches any element node. It will not match attributes, text nodes, comments or processing instruction nodes.

Location Paths

• Matching with *

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match = "*" > <xsl:apply-templates select ="*" /></xsl:template>
</xsl:stylesheet>

Matches all elements and requests calls on sub-elements only. Nothing is displayed. The text nodes are never reached.

Location Paths

• Matching with node()

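The node() wild card matches nodes of every type: element nodes, text nodes, attribute nodes, processing instruction nodes, namespace nodes and comment nodes. It is applied along an axis, so on the default child axis it does not reach attribute or namespace nodes; those need the attribute or namespace axis (for example @*).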

Not implemented in XT

Location Paths

• Matching with @*

The @* wild card matches all attribute nodes.

XT does not like it in an <xsl:template match ..> but likes it in an <xsl:apply-templates select=…>

Location Paths

• Matching with @*

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match = "person" > <b> <xsl:apply-templates select = "@*" /> </b></xsl:template>
</xsl:stylesheet>

<?xml version="1.0" encoding="utf-8"?>

<b>19121954p342</b>

<b>19181988p4567</b>
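One way to see what each wild card selects is to count the matching nodes. The sketch below is an illustration only: it assumes the JDK's javax.xml.xpath API instead of XT, the class WildcardDemo and its count method are hypothetical, and the people document is written without whitespace between elements so that no whitespace-only text nodes are counted.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo: counting the nodes selected by *, @*, text() and node().
class WildcardDemo {
    // The people document from the slides, with inter-element whitespace removed.
    static final String PEOPLE_XML =
          "<people>"
        + "<person born='1912' died='1954' id='p342'>"
        + "<name><first_name>Alan</first_name><last_name>Turing</last_name></name>"
        + "<profession>computer scientist</profession>"
        + "<profession>mathematician</profession>"
        + "<profession>cryptographer</profession>"
        + "</person>"
        + "<person born='1918' died='1988' id='p4567'>"
        + "<name><first_name>Richard</first_name><middle_initial>&#x4D;</middle_initial>"
        + "<last_name>Feynman</last_name></name>"
        + "<profession>physicist</profession>"
        + "<hobby>Playing the bongoes</hobby>"
        + "</person>"
        + "</people>";

    // Returns count(expr) evaluated against the people document.
    static int count(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(PEOPLE_XML)));
            Double n = (Double) XPathFactory.newInstance().newXPath()
                .evaluate("count(" + expr + ")", doc, XPathConstants.NUMBER);
            return n.intValue();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("elements   //*      : " + count("//*"));       // 15
        System.out.println("attributes //@*     : " + count("//@*"));      // 6
        System.out.println("text nodes //text() : " + count("//text()"));  // 10
        // node() on the child axis: elements plus text nodes, but no attributes.
        System.out.println("children   //node() : " + count("//node()"));  // 25
    }
}
```

The last count equals the element count plus the text node count, which shows that //node() does not pick up the six attributes.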

Location Paths

• Multiple matches with |

<xsl:template match = "profession|hobby" > <activity> <xsl:value-of select = "text()"/> </activity></xsl:template>
<xsl:template match = "*" > <xsl:apply-templates /></xsl:template>
<xsl:template match = "text()" ></xsl:template>
</xsl:stylesheet>

Matches all the elements. Skips the text nodes unless they describe a profession or hobby.

Location Paths

• Selecting from all descendants with //

// selects from all descendants of the context node as well as the context node itself. At the beginning of an XPath expression, it selects from all descendants of the root node.

Location Paths

• Selecting from all descendants with //

<xsl:template match = "//name/last_name/text()" > <xsl:value-of select = "." /></xsl:template><xsl:template match = "text()" ></xsl:template></xsl:stylesheet>

<?xml version="1.0" encoding="utf-8"?>TuringFeynman

Location Paths

• Selecting from all descendants with //

<xsl:template match = "/" >

<xsl:value-of select = "//first_name/text()" />

</xsl:template>

</xsl:stylesheet>

<?xml version="1.0" encoding="utf-8"?>Alan

Location Paths

• Selecting from all descendants with //

<xsl:template match = "/" >

<xsl:apply-templates select = "//first_name/text()" />

</xsl:template>

<xsl:template match = "text()" >

<xsl:value-of select = "." />

</xsl:template>

</xsl:stylesheet>

<?xml version="1.0" encoding="utf-8"?>AlanRichard

Location Paths

• Selecting from all descendants with //

<xsl:template match = "/" >

<xsl:apply-templates select = "//middle_initial/../first_name" />

</xsl:template>

<xsl:template match = "text()" >

<xsl:value-of select = "." />

</xsl:template>

</xsl:stylesheet>

<?xml version="1.0" encoding="utf-8"?>Richard
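The difference between taking the value of a // expression (first node only) and visiting every selected node can be sketched in Java. This assumes the JDK's javax.xml.xpath API; the class DescendantDemo and its methods first and selectAll are illustrative names, and only the name elements of the people document are reproduced here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo: first node versus all nodes selected with //.
class DescendantDemo {
    // Only the name elements from the slides' people document.
    static final String PEOPLE_XML =
          "<people>"
        + "<person><name><first_name>Alan</first_name><last_name>Turing</last_name></name></person>"
        + "<person><name><first_name>Richard</first_name><middle_initial>M</middle_initial>"
        + "<last_name>Feynman</last_name></name></person>"
        + "</people>";

    static Document parse() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(PEOPLE_XML)));
    }

    // Like xsl:value-of: the string value of the first selected node.
    static String first(String expr) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expr, parse());
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    // Like xsl:apply-templates: every selected node is visited.
    static List<String> selectAll(String expr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, parse(), XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(first("//first_name/text()"));                // Alan
        System.out.println(selectAll("//first_name/text()"));            // [Alan, Richard]
        System.out.println(selectAll("//middle_initial/../first_name")); // [Richard]
    }
}
```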

Predicates

In general, an XPath expression may refer to more than one node. Predicates allow us to reduce the number of nodes we are interested in.

Each step in a location path may have a predicate that selects from the node list that is current at that step in the expression.

The boolean expression in the predicate is tested against each node in the context node list. If the expression is false, then that node is deleted from the list.
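Predicate filtering can be tried out without a stylesheet. The following sketch assumes the JDK's javax.xml.xpath API; PredicateDemo and selectAll are hypothetical names, and the document is a trimmed copy of the people document. In a Java string the < and > operators are written directly; the &lt; and &gt; escapes are only needed inside an XML attribute.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo: predicates remove nodes for which the test is false.
class PredicateDemo {
    // Trimmed copy of the people document from the slides.
    static final String PEOPLE_XML =
          "<people>"
        + "<person born='1912' died='1954' id='p342'>"
        + "<name><first_name>Alan</first_name><last_name>Turing</last_name></name>"
        + "<profession>computer scientist</profession>"
        + "</person>"
        + "<person born='1918' died='1988' id='p4567'>"
        + "<name><first_name>Richard</first_name><last_name>Feynman</last_name></name>"
        + "<profession>physicist</profession>"
        + "<hobby>Playing the bongoes</hobby>"
        + "</person>"
        + "</people>";

    // Returns the text content of every node selected by the expression.
    static List<String> selectAll(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(PEOPLE_XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Without a predicate both persons are selected.
        System.out.println(selectAll("//person/name/last_name"));                 // [Turing, Feynman]
        // Numeric comparison on an attribute.
        System.out.println(selectAll("//person[@born <= 1915]/name/last_name"));  // [Turing]
        // Two tests combined with 'and'.
        System.out.println(selectAll(
            "//person[@born <= 1919 and @born >= 1917]/name/last_name"));         // [Feynman]
        // String comparison on the node itself, then a step up to the parent.
        System.out.println(selectAll(
            "//profession[.='physicist']/../name/last_name"));                    // [Feynman]
    }
}
```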

Predicates

<xsl:template match = "/" >

<xsl:apply-templates select = "//profession[.='physicist']/../name" />

</xsl:template>

<xsl:template match = "text()" >

<xsl:value-of select = "." />

</xsl:template>

</xsl:stylesheet>

<?xml version="1.0" encoding="utf-8"?>

Richard M Feynman

Predicates

<xsl:template match = "/" >

<xsl:apply-templates select = "//person[@id='p4567']" />

</xsl:template>

<xsl:template match = "text()" >

<xsl:value-of select = "." />

</xsl:template>

</xsl:stylesheet>

<?xml version="1.0" encoding="utf-8"?>

Richard M Feynman

physicist Playing the bongoes

Predicates

<xsl:template match = "/" >

<xsl:apply-templates select = "//person[@born &lt;= 1915]" />

</xsl:template>

<xsl:template match = "text()" >

<xsl:value-of select = "." />

</xsl:template>

</xsl:stylesheet>

<?xml version="1.0" encoding="utf-8"?>

Alan Turing

computer scientist mathematician cryptographer

Predicates

<xsl:template match = "/" >

<xsl:apply-templates select = "//person[@born &lt;= 1919 and @born &gt;= 1917]" />

</xsl:template>

<xsl:template match = "text()" >

<xsl:value-of select = "." />

</xsl:template>

</xsl:stylesheet>

<?xml version="1.0" encoding="utf-8"?>

Richard M Feynman

physicist Playing the bongoes

Predicates

<xsl:template match = "/" >

<xsl:apply-templates select = "/people/person[@born &lt; 1950]/name[first_name='Alan']" />

</xsl:template>

</xsl:stylesheet>

<?xml version="1.0" encoding="utf-8"?>

Alan Turing

General XPath Expressions

XPath expressions that do not evaluate to node sets can’t be used in the match attribute of an xsl:template element.

They can be used as values of the select attribute of xsl:value-of elements and in location path predicates.

General XPath Expressions

<xsl:template match = "/" > <xsl:apply-templates select = "/people/person" /></xsl:template>

<xsl:template match = "person"> <xsl:value-of select="@born div 10" /></xsl:template>

<xsl:template match = "text()"></xsl:template>

</xsl:stylesheet>

<?xml version="1.0" encoding="utf-8"?>191.2191.8

General XPath Expressions: XPath Functions

<xsl:template match = "/" > <xsl:apply-templates select = "/people/person" /></xsl:template>

<xsl:template match = "person"> Person <xsl:value-of select="position()" /></xsl:template>

<xsl:template match = "text()"></xsl:template>

</xsl:stylesheet>

<?xml version="1.0" encoding="utf-8"?>

Person 1

Person 2

General XPath Expressions: XPath Functions

<xsl:template match = "/" > <xsl:apply-templates select = "//name[starts-with(last_name,'T')]"/></xsl:template>

<xsl:template match = "name"> Mr. T. <xsl:value-of select="." /></xsl:template></xsl:stylesheet>

<?xml version="1.0" encoding="utf-8"?>

Mr. T. Alan Turing

Node set converted to string
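General expressions return numbers, booleans or strings rather than node sets. A minimal Java sketch of the three cases follows, assuming the JDK's javax.xml.xpath API; the class GeneralExprDemo and its helper methods are hypothetical, and the document is a trimmed copy of the people document.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo: XPath expressions whose results are not node sets.
class GeneralExprDemo {
    // Trimmed copy of the people document from the slides.
    static final String PEOPLE_XML =
          "<people>"
        + "<person born='1912'><name><first_name>Alan</first_name>"
        + "<last_name>Turing</last_name></name></person>"
        + "<person born='1918'><name><first_name>Richard</first_name>"
        + "<last_name>Feynman</last_name></name></person>"
        + "</people>";

    // Evaluates the expression and asks for the given XPath result type.
    static Object eval(String expr, QName type) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(PEOPLE_XML)));
            return XPathFactory.newInstance().newXPath().evaluate(expr, doc, type);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    static double number(String expr)   { return (Double) eval(expr, XPathConstants.NUMBER); }
    static boolean bool(String expr)    { return (Boolean) eval(expr, XPathConstants.BOOLEAN); }
    static String string(String expr)   { return (String) eval(expr, XPathConstants.STRING); }

    public static void main(String[] args) {
        // Arithmetic: the attribute value is converted to a number first.
        System.out.println(number("/people/person[1]/@born div 10"));      // 191.2
        // A function returning a number.
        System.out.println(number("count(//person)"));                     // 2.0
        // A function returning a boolean; the node set is converted to the
        // string value of its first node ("Turing") before the comparison.
        System.out.println(bool("starts-with(//last_name, 'T')"));         // true
        // A node set converted to a string: the value of the first name element.
        System.out.println(string("//name[starts-with(last_name,'T')]"));  // AlanTuring
    }
}
```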

Escaping to Java

Extension functions provide a mechanism for extending the capabilities of XSLT by escaping into another language such as Java or JavaScript.

If there is no namespace prefix on the function, then it must be a core function built into XSLT.

Otherwise, it’s an extension function.

General XPath Expressions: Extended XPath Functions

<xsl:template match = "/" >

<xsl:call-template name = "show-date"/>

</xsl:template>

<xsl:template name = "show-date" xmlns:Date = "http://www.jclark.com/xt/java/java.util.Date"> <xsl:variable name = "today" select = "Date:new()" /> <xsl:value-of select = "Date:toString($today)" />

</xsl:template>

</xsl:stylesheet>

Escaping to Java

<?xml version="1.0" encoding="utf-8"?>Mon Mar 19 10:46:17 EST 2001

Escaping to Java

// A simple bean saved under Www/beans/MyDate.java
// The classpath c:\Jigsaw\Jigsaw\Jigsaw\Www\beans
import java.util.*;

public class MyDate {

    Date d;

    public MyDate() {
        d = new Date();
    }

    public Date getDate() { return d; }

    public String toString() { return "The date is " + d.toString(); }

    public static void main(String a[]) {
        MyDate x = new MyDate();
        System.out.println(x);
    }
}

Escaping to Java

<xsl:template match = "/" > <xsl:call-template name = "show-date"/></xsl:template>

<xsl:template name = "show-date" xmlns:Date = "http://www.jclark.com/xt/java/MyDate"> <xsl:variable name = "today" select = "Date:new()" /> <xsl:value-of select = "Date:toString($today)" />

</xsl:template></xsl:stylesheet>

<?xml version="1.0" encoding="utf-8"?>

The date is Mon Mar 19 11:17:24 EST 2001