XML OVERVIEW e-logistics 2009 Eduard Rodés Gubern Port de Barcelona

Post on 15-Jan-2015

Page 1: O9xml

XML OVERVIEW

e-logistics 2009

Eduard Rodés Gubern

Port de Barcelona

Page 2: O9xml

What is XML?

EXtensible Markup Language (XML) is a way to apply structure to a document. XML provides a standard open format and mechanisms for structuring a document so that it can be exchanged and manipulated.

Page 3: O9xml

XML History

The concept behind XML is over 30 years old, beginning in the 1960's. Its origins are in GENCODE, the standardized typesetting codes used by the publishing industry.

In the 1970's, Dr. C. F. Goldfarb proposed a method of describing text that was not specific to an application or hardware. He created Generalized Markup Language (GML). The basic tenets of GML were:

• Markup should emphasize the document structure, not format or style

• Simple input syntax for markup using <> and </> tags

• Markup syntax rules should be strictly controlled so that the code could be easily read by humans or software programs

Originally the number of document types supported by GML was limited, so the addition of any new tags and document types was relatively simple. By the 1980's, however, these numbers grew to such an extent that GENCODE and GML proponents formed the ANSI Committee on Computer Languages for the Processing of Text.

Page 4: O9xml

XML History

In 1986 this committee promulgated Standard Generalized Markup Language (SGML), which standardized the use of <> and </> tags, as well as Document Type Definitions (DTD). As with GENCODE and GML, the primary use of SGML was for large-scale publishing.

As interest in the Internet grew, and the functionality of Internet browsers evolved, the need for a standardized hypertext application increased. In the mid-1990's the World Wide Web Consortium (W3C) adopted HyperText Markup Language (HTML) as the standard. HTML is an application of SGML: its tag set is defined by an SGML DTD.

As web communities have grown, so has the need to publish new types of documents. Many of these documents are community specific. Unfortunately, HTML cannot be extended to accommodate new document types, and browsers do not support SGML. These needs prompted the W3C to sponsor the development of an "eXtensible Markup Language."

Page 5: O9xml

XML Design Goals

The design goals for XML were proposed by the World Wide Web Consortium (W3C) and published in the XML 1.0 Recommendation of February 1998. A synopsis of these design goals is as follows:

• XML shall be straightforwardly usable over the Internet

• XML shall support a wide variety of applications

• XML shall be compatible with SGML

• It shall be easy to write programs which process XML documents

• The number of optional features in XML is to be kept to the absolute minimum, ideally zero

• XML documents should be human-legible and reasonably clear

• The XML design should be prepared quickly

• The design of XML shall be formal and concise

• XML documents should be easy to create

• Terseness in XML markup is of minimal importance

Page 6: O9xml

The Basics

A markup language is a set of rules. It declares what constitutes markup in a document, and defines exactly what the markup means. It also provides a description of document layout and logical structure.

There are three types of markup:

• Stylistic: how a document is presented (e.g., the HTML tags <I> for italics, <B> for bold, and <U> for underline)

• Structural: how the document is to be structured (e.g., the HTML tags <P> for paragraph, <SPAN> for creating ad hoc styles in a document, and <DIV> for grouping structures aligned in the same way)

• Semantic: describes the content of the data (e.g., the HTML tags <TITLE> for page title, <HEAD> for page header information, and <SCRIPT> to indicate JavaScript in a page)

In XML the only type of markup that we are concerned with is structural.

Page 7: O9xml

Prolog & Document Type Definition

XML documents should begin with an XML declaration, which specifies the version.

Optionally it may also include:

• Encoding (recommended)

• Stand-alone declaration

The document type declaration, which points to the Document Type Definition, typically comes next:

<?xml version="1.0" encoding='UTF-8' standalone='no' ?>
<!DOCTYPE root SYSTEM "myDocs.dtd" >
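As a rough illustration of what a parser sees in the prolog, the sketch below feeds the two lines above (plus an empty <root/> element, added here only so that the document is complete) to the expat parser in Python's standard library. The function name read_prolog and the dictionary keys are made up for this example. expat is non-validating and does not fetch myDocs.dtd, so the file does not need to exist.

```python
import xml.parsers.expat

# The prolog from the slide, followed by an empty root element (added for this sketch).
DOCUMENT = b"""<?xml version="1.0" encoding='UTF-8' standalone='no' ?>
<!DOCTYPE root SYSTEM "myDocs.dtd" >
<root/>"""


def read_prolog(data):
    """Return the XML declaration fields and DOCTYPE details found in `data`."""
    found = {}

    def on_xml_decl(version, encoding, standalone):
        # standalone is 1 for 'yes', 0 for 'no', -1 when the clause is omitted
        found["version"] = version
        found["encoding"] = encoding
        found["standalone"] = standalone

    def on_doctype(name, system_id, public_id, has_internal_subset):
        found["doctype_name"] = name
        found["system_id"] = system_id

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_decl
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(data, True)
    return found


prolog = read_prolog(DOCUMENT)
print(prolog)
```

The handlers only record what the declaration says; nothing here checks the document against the DTD.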

Page 8: O9xml

Tags

Tags carry the smallest unit of meaning signifying structure, format or style of the data. They are always enclosed within angle brackets '< >'.

Tags are case-sensitive. This means that the tags <friend>, <Friend>, <FRIEND> carry different meanings and cannot be used interchangeably.

All tags must be paired so that they have a start <friend> and an end </friend>; an element with no content may use the shorthand <friend/>. Tags combined with data form elements.

Page 9: O9xml

Tags

There are some basic rules to naming XML tags:

• XML is case sensitive

• Element names may start with any letter or an underscore (_)

• After the first character, element names may contain letters, numbers, periods (.), hyphens (-), underscores (_) and colons (:)

• Element names may not contain white space

• Element names may not start with "XML" or any case variations of these letters. These are reserved by the World Wide Web Consortium (W3C)
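The naming rules above can be sketched as a small check in Python. This is a simplification that assumes ASCII letters only (the XML specification also admits many Unicode letters), and the function name is_valid_element_name is invented for the example.

```python
import re

# First character: letter or underscore. Later characters: letters, digits,
# periods, hyphens, underscores and colons. ASCII only in this sketch.
_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9._:\-]*")


def is_valid_element_name(name):
    """Return True if `name` follows the naming rules listed above."""
    if name.lower().startswith("xml"):
        return False  # "xml" in any case variation is reserved
    return _NAME_RE.fullmatch(name) is not None


for candidate in ["friend", "_note", "zip.code", "2fast", "first name", "XmlData"]:
    print(candidate, is_valid_element_name(candidate))
```

Names such as 2fast (starts with a digit), first name (contains white space) and XmlData (reserved prefix) are rejected; the others pass.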

Page 10: O9xml

Elements

Elements are the building blocks of a document. An element consists of a start-tag, an end-tag and the content between them:

<friend>El Soussy</friend>

Within this single element there may be multiple levels of nested sub-elements which keep the individual pieces of data in a logical and easy to manage structure:

<?xml version="1.0"?>
<friend>
  <name>El Soussy</name>
  <address>
    <street>Palestinian Gardens</street>
    <city>Alexandria</city>
    <country>EG</country>
    <zip>90210</zip>
  </address>
</friend>

This is a well-formed XML document.
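One way to check well-formedness in practice is to hand the text to a parser and see whether it is accepted. The sketch below uses ElementTree from Python's standard library; the helper name is_well_formed and the broken sample are made up for illustration.

```python
import xml.etree.ElementTree as ET

FRIEND_XML = """<?xml version="1.0"?>
<friend>
  <name>El Soussy</name>
  <address>
    <street>Palestinian Gardens</street>
    <city>Alexandria</city>
    <country>EG</country>
    <zip>90210</zip>
  </address>
</friend>"""


def is_well_formed(text):
    """Return True if the parser accepts `text` as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(FRIEND_XML))
# The <name> start-tag below is never closed, so the parser rejects it.
print(is_well_formed("<friend><name>El Soussy</friend>"))

# Nested sub-elements can be reached by path once the document is parsed.
city = ET.fromstring(FRIEND_XML).find("address/city").text
print(city)
```

Well-formedness says nothing about which elements are allowed; that is the job of the DTD.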

Page 11: O9xml

Attributes

Attributes are used to describe the element.

If elements are akin to nouns, think of attributes as adjectives modifying the noun.

Attributes can be used to embellish content, or to associate added content with an element.

Attributes are written in an element's start tag with the name of the attribute, followed by an equal sign (=) and a value given to that attribute.

An HTML example would be <hr width="50%">. This tag tells the browser to put a horizontal rule half the width of the page.

Attribute naming rules are the same as those for element names. In addition, a tag may not have two attributes with the same name. Attribute names beginning with "xml" are reserved.

All attribute values must be quoted.

EXAMPLE of an element with an attribute:

<book call_no="PZ3.S8195Gr6">
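A short sketch of how an application might read the attribute above, again with Python's ElementTree. The element content, the isbn attribute and the helper name parses are assumptions added for the example; the slide shows only the start tag.

```python
import xml.etree.ElementTree as ET

# The start tag from the slide, completed with hypothetical content and an end tag.
book = ET.fromstring('<book call_no="PZ3.S8195Gr6">A hypothetical title</book>')

call_no = book.get("call_no")   # value of an attribute that is present
missing = book.get("isbn")      # None: the element has no such attribute
print(call_no, missing)


def parses(text):
    """Return True if the parser accepts `text`."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Two attributes with the same name, or an unquoted value, are not well-formed.
print(parses('<book call_no="a" call_no="b"/>'))
print(parses('<book call_no=PZ3/>'))
```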

Page 12: O9xml

Attributes

A question often arises as to whether to make something a child element or an attribute.

There's no rule that says you have to do it one way or another.

A good way to decide is to know the function of each.

Element content, generally speaking, is the data that is meant to be displayed on the screen.

Think of attributes as data about the data; that is, information that is more important to the parser than to the reader of the data, so it's not rendered on the screen.

Page 13: O9xml

Attributes Declaration

As with elements, each attribute must be declared. An element's attribute list must be declared outside the element declaration, each attribute with its own declaration:

<!ATTLIST   elementName   attributeName   type   default >

elementName is the element containing the attribute, and attributeName is the name of the attribute.

An attribute declared with the CDATA type simply contains character data. This is similar to an element's PCDATA, except that the attribute value is not parsed for markup.

Another common type is an enumeration: a list of possible values that may be used with an attribute. For example, the HTML <hr> tag has an align attribute which may contain only a left, right, or center value. If we were to write an attribute declaration for this tag, its type would be listed as (left|right|center).

Page 14: O9xml

Attributes

The attribute declaration's default is either:

• a default value: the value used if this attribute isn't explicitly specified when the element is used, or

• a default value keyword: a keyword that indicates the usage of the attribute

Generally, you use a keyword when you don't have a specific value to set as a default.

There are three possible keywords:

Keyword          Explanation
#REQUIRED        The attribute must be used in the element.
#IMPLIED         The attribute is not required.
#FIXED "value"   Whether or not the attribute is explicitly used, this element will have the fixed value as its default and this value cannot be changed.
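To see a declared default take effect, the sketch below declares the align attribute of the <hr> example in an internal DTD and parses a small document with Python's ElementTree. The <page> wrapper element is invented for the example. The standard-library parser is non-validating: it reads the internal DTD and fills in declared defaults, but it does not enforce #REQUIRED or the list of allowed values.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE page [
  <!ELEMENT page (hr*)>
  <!ELEMENT hr EMPTY>
  <!ATTLIST hr align (left|right|center) "left">
]>
<page>
  <hr/>
  <hr align="center"/>
</page>"""

root = ET.fromstring(DOC)

# The first <hr/> does not specify align, so the declared default is supplied.
alignments = [hr.get("align") for hr in root.findall("hr")]
print(alignments)
```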

Page 15: O9xml

Character entities

Whenever the XML parser encounters certain characters, like the < and > symbols, it interprets them as markup.

To use these symbols in your content text, you have to use their entity references.

In XML, only five character entities have been predefined:

&gt;     >    greater than
&lt;     <    less than
&amp;    &    ampersand
&apos;   '    apostrophe
&quot;   "    double quote
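In code, the substitution is usually left to a library rather than done by hand. A minimal sketch with Python's xml.sax.saxutils follows; escape() handles &, < and > on its own, and the two quote characters are passed in as an extra mapping. The sample string and the <note> element are made up for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

# escape() covers &, < and > by default; the quote characters are added here.
QUOTES = {'"': "&quot;", "'": "&apos;"}
REVERSE = {"&quot;": '"', "&apos;": "'"}

raw = 'if a < b && name == "El Soussy"'
escaped = escape(raw, QUOTES)
print(escaped)

# The round trip gives back the original text.
restored = unescape(escaped, REVERSE)

# A parser resolves the entity references when it reads the element content.
parsed = ET.fromstring("<note>" + escaped + "</note>").text
print(parsed == raw)
```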

Page 16: O9xml

Document Type Definitions

In addition to well-formed documents, there are 'valid' XML documents. This means the documents follow a more formal structure. The main difference between well-formed XML and valid XML is the Document Type Definition (DTD). The DTD is a set of rules that define the elements that may be used, and where they may be applied in relation to each other.

To indicate that an element's contents contain other elements, simply list those child elements in the order they should appear. There are 2 symbols that can be used to separate the listed child elements:

, (comma)         Each subsequent element follows the preceding element
| (pipe symbol)   One or the other element may be used

Page 17: O9xml

Document Type Definitions

Every element, and how that element may be used in the tag set, has to be declared. This is the formula for declaring an element:

<!ELEMENT   elementName   elementContents >

Each element to be used in a valid XML document must be declared in the DTD. If it's not in there, it can't be used.

In this element declaration, elementName is the name of the element, and elementContents indicates what contents the element may contain:

(other elements)   Elements that can be nested are listed within parentheses.
ANY                Indicates this element may contain any combination of elements or data.
EMPTY              Indicates this element contains no data or elements.
(#PCDATA)          Indicates this element contains parsed character data.

Page 18: O9xml

Document Type Definitions

There are also ways to indicate the number of times an element may appear in a document.

Place the frequency indicator after the element name listed in the elementContents area:

(no indicator)      Element must appear once and only once
? (question mark)   Element may appear once or not at all
+ (plus sign)       Element may appear one or more times
* (asterisk)        Element may appear any number of times or not at all
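The separators and frequency indicators behave much like regular-expression operators, which suggests a small hand-rolled check. The sketch below is not a validating parser: it assumes element-only content models (no #PCDATA, ANY or EMPTY) and only looks at the direct children of one element. The function names model_to_regex and children_match are invented for the example.

```python
import re
import xml.etree.ElementTree as ET


def model_to_regex(model):
    """Translate an element-only content model into a regex over child tag names."""
    parts = []
    for token in re.findall(r"[A-Za-z_][\w.:\-]*|[()|,?*+]", model):
        if token == ",":
            continue  # a sequence is plain concatenation in a regex
        if token == "(":
            parts.append("(?:")
        elif token in ")|?*+":
            parts.append(token)  # same meaning in DTDs and in regexes
        else:
            # each child name is matched together with one trailing space
            parts.append("(?:" + re.escape(token) + " )")
    return re.compile("".join(parts))


def children_match(element, model):
    """Return True if the child tags of `element` fit the content model."""
    sequence = "".join(child.tag + " " for child in element)
    return model_to_regex(model).fullmatch(sequence) is not None


two_addresses = ET.fromstring("<friend><name/><address/><address/></friend>")
no_address = ET.fromstring("<friend><name/></friend>")

print(children_match(two_addresses, "(name, address+)"))
print(children_match(no_address, "(name, address+)"))
print(children_match(no_address, "(name, address?)"))
```

With address+ at least one address is required, so the second check fails, while address? accepts the same element.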

Page 19: O9xml

Document Type Definitions

The DTD can be either an external DTD or an internal DTD, or both.

The external DTD exists outside the content of a document and carries the extension .dtd. This type of DTD could be created for use by a particular community, providing a standardized document format for all members. The DTD reference, added at the beginning of the XML file, tells the XML processor where to find the external DTD, information about its creator, the purpose of the DTD, and the language used.

Page 20: O9xml

Document Type Definitions

The internal DTD is written directly in the XML document:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE friend [
  <!ELEMENT friend (name, address+)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT address (street, city, country, zip)>
  <!ELEMENT street (#PCDATA)>
  <!ELEMENT city (#PCDATA)>
  <!ELEMENT country (#PCDATA)>
  <!ELEMENT zip (#PCDATA)>
]>
<friend>
  <name>El Soussy</name>
  <address>
    <street>Palestinian Gardens</street>
    <city>Alexandria</city>
    <country>EG</country>
    <zip>90210</zip>
  </address>
</friend>

Page 21: O9xml

Document Type Definitions

<!ELEMENT friend (name, address+)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT address (street, city, country, zip)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT country (#PCDATA)>
<!ELEMENT zip (#PCDATA)>

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE friend SYSTEM "http://www.galactinav.com/friends/dtd/friends2009.dtd">
<friend>
  <name>El Soussy</name>
  <address>
    <street>Palestinian Gardens</street>
    <city>Alexandria</city>
    <country>EG</country>
    <zip>90210</zip>
  </address>
</friend>

<!DOCTYPE friend PUBLIC "-//friends//DTD Standard /EN" "http://www.galactinav.com/friends/dtd/friends2009.dtd">

The system identifier "http://www.galactinav.com/friends/dtd/friends2009.dtd" gives the address (a URI reference) of a DTD for the document.

Page 22: O9xml

XML Style Sheets

In order to format and view an XML document, you must combine the document with a style sheet. The document can then be viewed in the appropriate browser.

Style sheets contain the rules that declare how the data of an XML document should appear or be interpreted by the user agent (browser, printer, text-to-speech converter, etc.). This is done by assigning a style to a tag. The style is then applied to the data contained within the tag.

Style sheets can be written in several languages. Two of these are:

• Cascading Style Sheets (CSS), a style language originally designed for HTML

• Extensible Stylesheet Language (XSL), an XML specific styling language

Page 23: O9xml

XSL

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <head>
        <title>Friend</title>
      </head>
      <body bgcolor="#ffffff">
        <h1 align="center">Alex 2009</h1>
        <xsl:for-each select="friend">
          <h2><xsl:value-of select="name"/></h2>
          <p>
            <xsl:value-of select="address/street"/><br/>
            <xsl:value-of select="address/city"/>,
            <xsl:value-of select="address/country"/>
            <xsl:value-of select="address/zip"/>
            <hr/>
          </p>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE friend SYSTEM "http://www.galactinav.com/friends/dtd/friends2009.dtd">
<?xml-stylesheet href="friend.xsl" type="text/xsl"?>
<friend>
  <name>El Soussy</name>
  <address>
    <street>Palestinian Gardens</street>
    <city>Alexandria</city>
    <country>EG</country>
    <zip>90210</zip>
  </address>
</friend>

Page 24: O9xml

XSL

Page 25: O9xml

Namespaces

A namespace is a collection of names that can be used in XML documents as element or attribute names. They identify the name as being from a particular domain (standards group, company, industry, etc.).

Namespaces are identified in XML by a Uniform Resource Identifier (URI). URIs include both Uniform Resource Names (URNs) and Uniform Resource Locators (URLs). URL's have become very common in the Internet world. A URN is a number or name that identifies something in a universally unique way. While not as common as URL's, they will be used more as XML is adopted and used.

Page 26: O9xml

Namespaces

Namespaces help standardize and uniquely brand elements and attributes. The namespace URI serves as a unique identifier for the user-agent (browser, XML parser, XML application, etc.); the user-agent is not required to retrieve anything from it, and the URI does not by itself point to the DTD against which the XML document is checked for validity.

A namespace is declared with the reserved attribute 'xmlns'. The complete syntax looks like this:

xmlns:[prefix]="[URI of namespace]"

The prefix can be any characters allowed in an XML tag, except it may not start with xml.

Here is an example using the xmlns syntax:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

In the XML document the following statements occur:

<xsl:value-of select="address/street"/>

When the document is processed it tells the parser:

• for the element <xsl:value-of>, use the value-of element from the namespace http://www.w3.org/1999/XSL/Transform

• for an unprefixed element <value-of>, use the (possibly) undeclared value-of element, which belongs to no namespace
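The difference between the prefixed and the unprefixed element can be made visible with a namespace-aware parser. The sketch below uses Python's ElementTree, which reports each element name together with its namespace URI in the form {URI}name. The snippet being parsed is a cut-down stylesheet fragment assembled for the example, not a working stylesheet.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

SNIPPET = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:value-of select="address/street"/>
  <value-of select="address/street"/>
</xsl:stylesheet>"""

root = ET.fromstring(SNIPPET)

# The prefixed child carries the namespace URI; the unprefixed one has none.
tags = [child.tag for child in root]
print(tags)

# A lookup can use any prefix, as long as it is mapped to the same URI.
prefixed = root.find("x:value-of", {"x": XSL_NS})
print(prefixed.get("select"))
```

The prefix itself (xsl here, x in the lookup) is only a local shorthand; it is the URI that identifies the namespace.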