essential guide xml pdf
7/29/2019 Essential Guide XML PDF
THE ESSENTIAL GUIDE TO XML
BY SHARON L. HOFFMAN, AUGUST 2005
SUPPLEMENT TO iSeries NEWS 2005
XML is a key technology for sharing data between business entities
because it bridges different ways of storing and referencing data.
Although XML can be described as a language, the extensible nature of
XML means that it's more correctly classified as a standard.
Many interrelated standards (for a list, see "Essential XML Standards"
on page 4) complement XML and expand its capabilities. XML is also a
fundamental building block for other standards. For example, many
Web-services standards, such as Simple Object Access Protocol (SOAP)
and Web Services Description Language (WSDL), are based on XML. To give
you a sense of how you might use XML in your own applications, let's
start with a quick look at XML syntax and how XML compares with
languages used for related tasks.
XML in Context
An XML document is made up of XML elements. Each element contains a
starting tag, an ending tag, and (usually) data nested between the two
tags. By choosing descriptive names for elements, you can make your XML
documents more human-readable and therefore self-documenting. In
Figure 1, the highlighted line is a single element called product_code.
If a document contains more than one element of the same type, the tags
will be repeated for each element, as shown for the product_code and
requested_qty elements in Figure 1. For more information about XML
syntax, see "Essential XML Syntax and Terminology" on page 3.
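The element structure described above can be sketched in a few lines of Python. This is an illustration only, not a reproduction of Figure 1: the element names product_code, requested_qty, customer_reference, date_required, customer, requested_products, and quote_request appear elsewhere in this guide, but the nesting, the last_name element, and all data values shown here are assumptions.

```python
import xml.etree.ElementTree as ET

# Illustrative document: element names follow the guide where possible,
# but the layout, last_name, and every value are assumptions.
SAMPLE = """\
<quote_request>
  <customer_reference>bike component availability</customer_reference>
  <date_required>9/1/2005</date_required>
  <customer>
    <customer_name>
      <first_name>Sharon</first_name>
      <last_name>Hoffman</last_name>
    </customer_name>
  </customer>
  <requested_products>
    <product_code>12345</product_code>
    <requested_qty>2</requested_qty>
    <product_code>56789</product_code>
    <requested_qty>25</requested_qty>
  </requested_products>
</quote_request>
"""

root = ET.fromstring(SAMPLE)
products = root.find("requested_products")

# Each repeated element carries its own start tag, end tag, and data.
codes = [e.text for e in products.findall("product_code")]
quantities = [int(e.text) for e in products.findall("requested_qty")]

# Pair each product code with the quantity that follows it.
order_lines = list(zip(codes, quantities))
print(order_lines)
```

Because every value is wrapped in its own descriptive tags, the program needs no separate record layout to find the product codes and quantities.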
Repeating the data description for every element means
that XML documents are entirely self-contained; you
won't need to refer to a database layout, for example.
However, the overhead of repeating all
the element-description information
quickly becomes unwieldy. As a result,
most developers prefer using data-description languages (e.g., SQL, DDS)
to define databases. However, XML shines
in data-transfer applications that involve
relatively small amounts of data (these are
typically single transactions such as an
inventory inquiry or a purchase order).
Data transfer is by far the most common
XML application in iSeries environments.
However, you can also use XML to add
meaning to text within documents. Used
in this way, XML becomes a powerful
Figure 1: Sample XML document (figure not reproduced)
THE ESSENTIAL GUIDE TO SOFTWARE XML
1 XML is case sensitive.
2 Generally, white space (e.g., indents, blank lines) in an XML document is ignored.
3 You can choose any element names you like as long as they conform to a few basic rules:
  Element names cannot contain spaces.
  Element names must begin with a letter or an underscore.
  After the first character, element names can contain numbers, hyphens, periods, colons, letters, and underscores. (Colons are usually avoided in element names because they have special meaning within XML.)
  Element names cannot begin with the letters xml, regardless of case (i.e., xml, XML, xMl, and Xml are all invalid).
4 Elements can contain one or more attributes. In many cases, the XML designer may choose whether to use elements or attributes to define a particular structure. As a rule of thumb, attributes should be used for information that is not integral to the element.
5 An element cannot contain more than one attribute with the same name.
6 Both starting and ending tags are required for all elements except empty elements. Empty elements occur most often when an element is completely defined by its attributes.
7 Elements must be properly nested (i.e., once an inner element tag is opened, it must be closed before any outer tags).
  The following nesting is correct:
  Sharon
  Hoffman
  The following nesting is syntactically correct, although it doesn't make much sense:
  Sharon
  The following nesting is syntactically incorrect:
  Sharon
  Hoffman
8 The outermost element in any XML document is referred to as the root element.
9 The root element may be preceded by a document declaration and processing instructions.
10 Built-in XML entities are used to include a character that has special meaning in XML (e.g., a greater-than sign) within XML content. You can also define additional entities as short-hand for text and structures that you use repeatedly.
11 An XML document that has correct syntax is well formed.
12 An XML document that conforms to the structure defined by its Document Type Definition (DTD) or schema is valid. It is possible for an XML document to be well formed but invalid, but the reverse is not possible.
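Several of the rules above (case sensitivity, unique attribute names, proper nesting, built-in entities) can be tried out with any XML parser. The following Python sketch uses the standard-library parser to test whether a string is well formed. The tag names are hypothetical, chosen only to illustrate the three nesting cases in rule 7; they are not taken from the original examples.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical tag names, used for illustration only.
correct = ("<customer_name><first_name>Sharon</first_name>"
           "<last_name>Hoffman</last_name></customer_name>")
odd_but_legal = "<first_name><last_name>Sharon</last_name></first_name>"
bad_nesting = "<first_name>Sharon<last_name></first_name>Hoffman</last_name>"

# Rule 1: tag names are case sensitive, so these tags do not match.
case_mismatch = "<Name>Sharon</name>"

# Rule 5: an element cannot repeat an attribute name.
duplicate_attribute = '<item code="1" code="2"/>'

# Rule 10: the built-in entity &gt; stands for a greater-than sign.
escaped = "<note>5 &gt; 3</note>"

for label, sample in [("correct", correct),
                      ("odd but legal", odd_but_legal),
                      ("bad nesting", bad_nesting),
                      ("case mismatch", case_mismatch),
                      ("duplicate attribute", duplicate_attribute),
                      ("escaped", escaped)]:
    print(label, is_well_formed(sample))
```

Note that this checks only whether a document is well formed (rule 11). Checking validity against a DTD or schema (rule 12) requires a validating parser, which this sketch does not use.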
tool for organizing information and improving search
capabilities. To understand the benefits of an XML-
encoded document, you should consider the differences
between XML and HTML.
Although the two languages are syntactically similar because they have
the same antecedents (see "Essential XML History" on page 5 for
information), they have different strengths. HTML is best used to
format information for display, while the descriptive information in
XML tags makes it easier to deal with document content. For example,
suppose you have a document containing a list of PC printers and the
features of each printer model. If the document is stored in HTML, it's
difficult to create a search that finds all printers that support color
printing, support duplex printing, and can print at least 10 pages per
minute. Conversely, if you store the same document using XML, you would
probably create separate elements for each important feature (e.g.,
maximum_print_speed) and could easily develop an application that
searches for all printers that meet your criteria. Of course, a
database is ideal for such a search, but XML provides database-like
search capabilities for information that is stored in documents such as
user manuals or marketing brochures. As you'll see in the following
section, the XML data can easily be converted into HTML for display
purposes.
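The printer search described above might look something like the following Python sketch. Only the element name maximum_print_speed comes from the text; the other element names (printers, printer, model, color, duplex), the yes/no convention, and the sample data are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical printer list; only maximum_print_speed is named in the text.
PRINTERS = """\
<printers>
  <printer>
    <model>Alpha 100</model>
    <color>yes</color>
    <duplex>yes</duplex>
    <maximum_print_speed>12</maximum_print_speed>
  </printer>
  <printer>
    <model>Beta 200</model>
    <color>yes</color>
    <duplex>no</duplex>
    <maximum_print_speed>20</maximum_print_speed>
  </printer>
  <printer>
    <model>Gamma 300</model>
    <color>no</color>
    <duplex>yes</duplex>
    <maximum_print_speed>15</maximum_print_speed>
  </printer>
  <printer>
    <model>Delta 400</model>
    <color>yes</color>
    <duplex>yes</duplex>
    <maximum_print_speed>8</maximum_print_speed>
  </printer>
</printers>
"""

def find_printers(xml_text, min_speed=10):
    """Return models that print in color, support duplex, and meet min_speed."""
    root = ET.fromstring(xml_text)
    matches = []
    for printer in root.findall("printer"):
        speed = int(printer.findtext("maximum_print_speed"))
        if (printer.findtext("color") == "yes"
                and printer.findtext("duplex") == "yes"
                and speed >= min_speed):
            matches.append(printer.findtext("model"))
    return matches

matching_models = find_printers(PRINTERS)
print(matching_models)
```

The search works because each feature lives in its own named element. The same information in HTML would typically be spread across table cells or paragraphs whose tags describe layout rather than meaning.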
Because XML documents are plain text, you can write
XML using any text editor (e.g., Notepad). However, as you
begin working with XML, you'll quickly find that an XML-aware
editor is a big time-saver. An XML editor should help you write
XML by providing syntax-checking and
document-generation capabilities. For example, if you begin
to create a new element, some editors will automatically
generate the ending tag for you.
An XML document can stand entirely on its own, without
any related documents. More often, though, an XML
document is part of a larger application architecture that
includes components that define the structure required for
a particular type of XML document, solutions that reformat
XML data (e.g., create an HTML document for display using
data from an XML document), and applications that process
ESSENTIAL XML SYNTAX AND TERMINOLOGY
ESSENTIAL XML STANDARDS
XML itself is a standard, but it also involves many related standards. Here are some of the most widely used XML standards.
XLINK is a standard for defining hyperlinks in XML.
XML Namespaces make it possible to create unique element names.
XML Schemas define the rules for the specialized XML documents used to define the structure of other XML documents.
XPATH addresses each part of an XML document via a hierarchical structure (e.g., first_name within customer_name within quote_request).
XQUERY is a relatively new standard that provides SQL-like query capabilities for XML documents.
Extensible Stylesheet Language (XSL) formats XML documents for display. There are two components of the XSL standard: XSL Transformations (XSLT) and XSL Formatting Objects (XSL FO).
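To make the XPATH entry more concrete, here is a short Python sketch of hierarchical addressing. Python's standard ElementTree module supports only a limited subset of XPath, so treat this as an approximation of the idea rather than a full XPath implementation. The sample document, including the last_name element, is an assumption built around the first_name, customer_name, and quote_request names mentioned above.

```python
import xml.etree.ElementTree as ET

# Hypothetical document built around the element names in the XPATH entry.
DOC = """\
<quote_request>
  <customer>
    <customer_name>
      <first_name>Sharon</first_name>
      <last_name>Hoffman</last_name>
    </customer_name>
  </customer>
</quote_request>
"""

root = ET.fromstring(DOC)  # root is the quote_request element

# A path relative to the root: each step names a child of the previous one.
by_full_path = root.findtext("customer/customer_name/first_name")

# ".//" searches all descendants, so intermediate levels can be skipped.
by_descendant = root.findtext(".//customer_name/first_name")

print(by_full_path, by_descendant)
```

Both expressions reach the same element; the first spells out every level of the hierarchy, while the second matches customer_name wherever it appears beneath the root.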
XML documents. Understanding how these pieces work
together is vital to understanding XML.
The Big Picture
An XML document is almost always associated with a
second document that defines the valid structure for a
particular type of document. For example, an XML
document might contain a particular inventory inquiry from
XYZ Company, but the structural-definition document would
define the format for all inventory inquiry documents.
There are two standards for these structural-definition
documents: DTD is the older and simpler standard, whereas
XML schema is the newer standard. DTDs and schemas
serve the same purpose, but their complexity and capabilities
vary significantly.
Figure 2 contains a DTD that you could use to define the
XML document in Figure 1, and Figure 3 contains the schema
for the same document. Both the DTD and the schema were
generated using an XML editor (WebSphere Development
Studio Client for iSeries, or WDSc, in this case). You'll find
that creating a sample document (e.g., an inventory inquiry)
and using it to generate an initial version of the DTD or
schema is often the simplest way to create a
structural-definition document. While you may need to clean up the
generated code, it will give you a good starting point for
developing the DTD or schema.
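The difference between a well-formed document and one that also follows an agreed structure can be sketched in Python. This is not DTD or schema validation: the standard-library parser used here is non-validating, so the function below hand-codes a single structural rule as a stand-in. The child sequence (customer_reference, date_required, customer, requested_products) is the fragment that survives from Figure 2; the root element name quote_request and the sample documents are assumptions.

```python
import xml.etree.ElementTree as ET

# Assumed root name, plus the child sequence that survives from Figure 2.
ROOT_NAME = "quote_request"
REQUIRED_CHILDREN = ["customer_reference", "date_required",
                     "customer", "requested_products"]

def matches_structure(xml_text: str) -> bool:
    """Check well-formedness plus one hand-coded structural rule.

    A stand-in for what a validating parser would do with a DTD or schema.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False  # not even well formed
    child_names = [child.tag for child in root]
    return root.tag == ROOT_NAME and child_names == REQUIRED_CHILDREN

complete = ("<quote_request><customer_reference>ref</customer_reference>"
            "<date_required>9/1/2005</date_required><customer/>"
            "<requested_products/></quote_request>")

# Well formed, but date_required is missing, so the structure rule fails.
incomplete = ("<quote_request><customer_reference>ref</customer_reference>"
              "<customer/><requested_products/></quote_request>")

print(matches_structure(complete), matches_structure(incomplete))
```

In practice, the DTD or schema carries these rules, and a validating parser applies them, so the same check works for every document of that type without custom code.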
Whether you use a DTD or a schema, there is typically a one-to-many
relationship between the DTD or schema and the XML documents. For
example, you could publish a DTD or a schema (or both) specifying the
format for incoming inventory inquiries and, hopefully, many of your
customers would then begin to send you inventory inquiries in XML
format. DTDs and schemas for external documents (versus documents that
are internal to a particular company) are usually published online so
that they can be shared more easily.
Ideally, everybody would use the same structure
for the same type of document (e.g., inventory inquiries), but
that's not always the case, not even within a single industry.
Fortunately, many industry groups are working on standards
that should help alleviate some of the Tower-of-Babel aspects
of XML. You'll find the latest information on industry-specific
XML structures online at xml.org.
In addition to DTDs and schemas, other components can
be associated with XML documents. For example, if you
plan to display an XML document in a Web page, you'll
probably want to first convert the XML document into
an HTML document. Similarly, you often might need to
create multiple XML documents that contain the same
general information but use slightly different structures.
If you need to convert lots of documents between the same
two structures, it makes sense to automate the process. The
simplest way to do this is via an Extensible Stylesheet
Language Transformations (XSLT) document that defines
how input elements should be formatted in the output (XML
or HTML) document. For example, if several of your vendors
accept inventory inquiries in XML, but each uses a slightly
different schema, you could develop a generic XML
inventory inquiry, then create the variations using XSLT.
As with DTDs and schemas, your XML editor should include
tools to help you create XSLT documents.
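To give a feel for what such a transformation does, the Python sketch below converts a small XML inquiry into an HTML table. It is a hand-written transformation that plays the role an XSLT stylesheet and XSLT processor would play; it is not XSLT itself, because Python's standard library does not include an XSLT processor. The document layout, the data values, and the choice of an HTML table are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; element names follow those used in this guide.
INQUIRY = """\
<quote_request>
  <requested_products>
    <product_code>12345</product_code>
    <requested_qty>2</requested_qty>
    <product_code>56789</product_code>
    <requested_qty>25</requested_qty>
  </requested_products>
</quote_request>
"""

def inquiry_to_html(xml_text: str) -> str:
    """Map each product_code/requested_qty pair to a row of an HTML table."""
    root = ET.fromstring(xml_text)
    products = root.find("requested_products")

    table = ET.Element("table")
    header = ET.SubElement(table, "tr")
    ET.SubElement(header, "th").text = "Product code"
    ET.SubElement(header, "th").text = "Quantity"

    codes = products.findall("product_code")
    quantities = products.findall("requested_qty")
    for code, qty in zip(codes, quantities):
        row = ET.SubElement(table, "tr")
        ET.SubElement(row, "td").text = code.text
        ET.SubElement(row, "td").text = qty.text

    # method="html" serializes the tree using HTML conventions.
    return ET.tostring(table, encoding="unicode", method="html")

html = inquiry_to_html(INQUIRY)
print(html)
```

With XSLT, the mapping rules in this function would live in a separate stylesheet document, so the output format could be changed without modifying application code.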
An XSLT document works in conjunction with an XSLT
Figure 2: A DTD generated by WDSc for the XML document in Figure 1 (figure not reproduced; the only surviving fragment is the content model (customer_reference,date_required,customer,requested_products))
ESSENTIAL XML PARSER CONCEPTS
Although most XML editors include an XML parser, you'll also need an XML parser for production applications. XML parsers may be part of a Web application server, or they may be available as separate software options. There are two general standards for XML parsers: Document Object Model (DOM) and Simple API for XML (SAX).
The only functional difference between DOM parsers and SAX parsers is that DOM parsers can modify an XML document, while SAX parsers are read-only (of course, an application that uses a SAX parser can always write out a new XML document in a different format than the incoming XML document). The other differences between DOM and SAX parsers don't affect their capabilities, but they can have an impact on ease-of-use and, in some cases, performance.
SAX parsers are event-driven and are best suited for applications that need to choose specific elements from a larger XML document. You'll find the SAX parsers more intuitive if your programming background includes languages that have event-driven capabilities (e.g., Visual Basic, Java).
DOM parsers read an entire XML document into an application where the elements can be referenced, much as an RPG program might reference fields in a record format. Therefore, DOM parsers have an advantage over SAX parsers when you need to process a high percentage of the elements in an XML document. In addition, DOM parsers generally feel more natural than SAX parsers if your programming background includes procedural languages such as RPG and Cobol.
Essential XML History
The histories of individual computer languages are mostly just curiosities, but XML's history provides a glimpse into its syntax as well. XML is part of the same family of languages as HTML and is based on Standard Generalized Markup Language (SGML). SGML is a direct descendant of Generalized Markup Language, which was developed by IBM researchers in the 1960s.
The concept behind markup languages is to separate document content from document structure and display. Thus in both XML and HTML, the tags contain information about data: formatting information in HTML, and context information in XML.
SGML became an ISO standard in 1986. HTML, which evolved somewhat independently but incorporates many SGML concepts, is slowly being brought back into compliance with the larger SGML standard.
In 1996, developers began working on a simplified version of SGML that focuses on document structure rather than document format. That project is the basis for XML, which became a World Wide Web Consortium standard in 1998.
The Essential XML Resources
Charles F. Goldfarb's All the XML Books in Print
Goldfarb, one of the developers of SGML, attempted to list all the XML books in print. Although the list was last updated in early 2004, it's still a useful resource.
xmlbooks.com
The CoverPages
The XML CoverPages include XML news, background material, and technical tips.
xml.coverpages.org
DevX.com
XML FAQs, articles, discussion groups and more.
devx.com/xml
World Wide Web Consortium XML page
w3.org/XML
XML.com
O'Reilly Media, Inc., a premier technical book publisher, maintains this XML information site.
xml.com
IBM RESOURCES
Developerworks XML site
www-106.ibm.com/developerworks/xml
iSeries XML information home page
www-1.ibm.com/servers/enable/site/xml/iseries/index.html
Two IBM white papers illustrate how to process XML documents using RPG or Cobol:
Parsing XML documents using the new V5R3 ILE COBOL syntax
www-1.ibm.com/servers/enable/site/education/abstracts/3db2_abs.html
XML Interface for RPG maps XML into DB2 UDB for iSeries
www-1.ibm.com/servers/enable/site/education/ibo/record.html?xmlface
processor, which is software that applies the rules defined in the
XSLT document to an incoming XML document and produces an output
document in HTML, XML, or text format. An XSLT processor is typically
bundled into a Web application server such as WebSphere Application
Server (WAS) and can be accessed by calling APIs in an application.
Most XML editors also include an XSLT processor for testing purposes.
From XML to the Database and Vice-Versa
In an iSeries environment, XML projects almost invariably involve
extracting data from DB2 UDB for iSeries or moving data from XML
documents into the database. While it's possible to store entire XML
documents in iSeries files, more often you'll need to separate the data
for one or more elements from its tags and store the data itself as a
field or fields within existing iSeries database records. You'll also
find lots of requirements for the opposite task: creating XML documents
using data from one or more database records.
The underlying software that is used to separate an
XML document into data and data-description components
is an XML parser. An XML parser understands the rules
of XML syntax, just as the parser that is part of the RPG
compiler understands RPG syntax. For more about XML
parsers, see "Essential XML Parser Concepts" on page 5.
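As a rough illustration of the DOM style of parsing, the Python sketch below loads a whole document into memory, reads the product codes, and then modifies the quantities in place. It uses Python's standard minidom module as a stand-in for whichever DOM parser you use on your platform; the sample document and the doubling of quantities are assumptions made for the example.

```python
from xml.dom import minidom

# Hypothetical document; element names follow those used in this guide.
SAMPLE = (
    "<requested_products>"
    "<product_code>12345</product_code><requested_qty>2</requested_qty>"
    "<product_code>56789</product_code><requested_qty>25</requested_qty>"
    "</requested_products>"
)

# A DOM parser reads the entire document into an in-memory tree.
doc = minidom.parseString(SAMPLE)

# Elements can then be referenced by name, in any order, as often as needed.
codes = [node.firstChild.data
         for node in doc.getElementsByTagName("product_code")]

# Because the whole tree is in memory, it can also be modified.
for node in doc.getElementsByTagName("requested_qty"):
    node.firstChild.data = str(int(node.firstChild.data) * 2)

# Serialize the modified tree back to XML text.
updated = doc.documentElement.toxml()
print(codes)
print(updated)
```

The parser, not the application, deals with the tags: the program works only with element names and their data.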
As you begin developing in XML, you might not even realize that you're
using an XML parser. For example, when an XML editor validates an XML
document against its associated DTD or schema, an XML parser is invoked
to perform the validation. XML parsers, including those for iSeries,
are typically free. The iSeries-specific XML parser support is packaged
in the no-charge licensed program product, XML Toolkit for iSeries
(5733-XT1).
If you're working with very low document volumes, it may be possible to
assemble and disassemble XML documents using the tools built into an
XML editor. However, for production processing of XML documents, you'll
usually need to develop code that moves data back and forth between a
particular type of XML document (e.g., an inventory inquiry) and the
associated database records.
You can create an XML document using a
variety of techniques. At one end of the spectrum,
you could write an RPG program that creates an XML
document as an iSeries database file by hand-coding the
tags and their contents. Then, you could convert the database
file to a stream file using the CPYTOSTMF (Copy to Stream
File) CL command. Other options include using APIs to
output a stream file from an RPG program, generating an
XML document using the results of an SQL query, or
writing a Java application that builds an XML document.
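One of the options above, generating an XML document from the results of an SQL query, can be sketched as follows. The example uses Python with an in-memory SQLite database purely as a stand-in for DB2 UDB for iSeries; the table name inquiry_lines, its columns, and the data are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

def build_inquiry(conn, customer_reference: str) -> str:
    """Build an XML document from the rows returned by an SQL query."""
    root = ET.Element("quote_request")
    ET.SubElement(root, "customer_reference").text = customer_reference
    products = ET.SubElement(root, "requested_products")

    rows = conn.execute(
        "SELECT product_code, requested_qty "
        "FROM inquiry_lines ORDER BY line_no"
    )
    # Each database row becomes a pair of elements in the document.
    for code, qty in rows:
        ET.SubElement(products, "product_code").text = code
        ET.SubElement(products, "requested_qty").text = str(qty)

    return ET.tostring(root, encoding="unicode")

# In-memory SQLite stands in for the real database (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inquiry_lines "
    "(line_no INTEGER, product_code TEXT, requested_qty INTEGER)"
)
conn.executemany(
    "INSERT INTO inquiry_lines VALUES (?, ?, ?)",
    [(1, "12345", 2), (2, "56789", 25)],
)

xml_text = build_inquiry(conn, "bike component availability")
print(xml_text)
```

Building the document through a library, rather than concatenating tag strings by hand, means special characters in the data are escaped automatically and every start tag gets its matching end tag.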
Although you can write custom code to extract data from an XML
document, it's simpler to leverage the capabilities of an XML parser.
For example, you might write code that invokes specific parser
functions such as reading the data for a particular type of element
(e.g., product_code).
Java is the language of choice for working with XML because it includes
extensive support for accessing parser APIs. However, you can also
invoke parser APIs using RPG or Cobol, and products are available that
will automate part of the process of assembling or disassembling
XML documents.
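Reading the data for one particular type of element is a natural fit for an event-driven (SAX) parser. The sketch below uses Python's standard xml.sax module as a stand-in for the parser APIs mentioned above; the handler class name and the sample document are assumptions.

```python
import xml.sax

# Hypothetical document; element names follow those used in this guide.
SAMPLE = (
    "<requested_products>"
    "<product_code>12345</product_code><requested_qty>2</requested_qty>"
    "<product_code>56789</product_code><requested_qty>25</requested_qty>"
    "</requested_products>"
)

class ProductCodeHandler(xml.sax.ContentHandler):
    """Collect the text of every product_code element, ignoring the rest."""

    def __init__(self):
        super().__init__()
        self.codes = []
        self._buffer = None  # a list while inside product_code, else None

    def startElement(self, name, attrs):
        if name == "product_code":
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several pieces, so accumulate it.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "product_code":
            self.codes.append("".join(self._buffer))
            self._buffer = None

handler = ProductCodeHandler()
# The parser calls the handler methods as it encounters each tag.
xml.sax.parseString(SAMPLE.encode("utf-8"), handler)
print(handler.codes)
```

The parser never builds the whole document in memory; the application simply reacts to the events it cares about, which is why this style suits picking specific elements out of a larger document.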
Explore XML
XML is a powerful tool for communicating data between applications
using different databases and running on different platforms, and it is
rapidly becoming the medium of choice for transaction-level data
transfer. XML can also organize information within a document, thus
making it easier to modify and search large amounts of text. For all
its strengths, XML is still a relatively new technology with a maze of
confusing, and sometimes competing, standards. To take advantage of
XML, it helps to have a clearly defined goal and the flexibility to
experiment with various tools and techniques. It's also useful to
understand how other businesses are using XML. To explore the
opportunities XML offers, visit the Web sites listed in "Essential XML
Resources" on page 5.
Sharon L. Hoffman is a senior technical editor for iSeries NEWS.
Figure 3: An XML schema generated by WDSc for the XML document in Figure 1 (figure not reproduced)