Introduction to XML
Disadvantages of HTML – Need for XML
HTML lacks syntax checking
HTML lacks structure
HTML is not suitable for data interchange
HTML is not context aware: it does not allow us to describe the information content or the semantics of the document
HTML is not object-oriented
HTML is not re-usable
HTML is not extensible
Introduction to XML
XML – Extensible Markup Language
Extensible – capable of being extended
Markup – a way of adding information to the text that indicates the logical components of a document
How is it different from HTML?
HTML was designed to display data
XML was designed to store, describe and transport data
XML is also a markup language like HTML
XML tags are not predefined – we must design our own tags.
Differences between HTML and XML
HTML | XML
1. Designed to display data | 1. Designed to store and transport data between applications and databases
2. Focus is on how data looks | 2. Focus is on what data is
3. Has predefined tags such as <B>, <LI>, etc. | 3. No predefined tags; all tags are defined by the user, e.g., <TO>, <FROM>, <BOOKNAME>
4. Used to display information | 4. Used to describe information
5. Not every tag needs a closing tag | 5. Every start tag must have a matching closing tag
6. Not case sensitive | 6. Case sensitive
7. HTML is for humans | 7. XML is for computers
Advantages (Features) of XML - 1
XML simplifies data sharing
Since XML data is stored in plain text format, it can be shared easily among different hardware and software platforms.
XML separates data from HTML
To display dynamic data in HTML, the code must be rewritten each time the data changes. With XML, the data can be stored in separate files, so the HTML layout is designed only once and changed data is picked up without editing the HTML.
Advantages (Features) of XML - 2
XML simplifies data transport
With XML, data can be easily exchanged between different platforms.
XML makes data more available
Since XML is independent of hardware, software and application, XML can make your data more available and useful.
Different applications, not only HTML pages, can access your data.
XML provides a means to package almost any type of information (binary, text, voice, video) for delivery to a receiving end.
Advantages (Features) of XML - 3
Internationality
HTML has traditionally relied heavily on ASCII, which makes non-English characters awkward to use. XML uses Unicode, so text in many European and Asian languages is handled easily.
XML Document – Example 1
<?xml version="1.0" encoding="ISO-8859-1"?>
<class_list>
  <student>
    <name>Anamika</name>
    <grade>A+</grade>
  </student>
  <student>
    <name>Veena</name>
    <grade>B+</grade>
  </student>
</class_list>
XML Document–Example 1 - Explained
The first line is the XML declaration: <?xml version="1.0" encoding="ISO-8859-1"?>
It defines the XML version (1.0).
It gives the encoding used (ISO-8859-1 = Latin-1/West European character set).
The XML declaration looks like a processing instruction (PI): it is identified by the <? at its start and the ?> at its end. Strictly speaking, the XML specification treats the declaration as a separate construct.
The next line opens the root element of the document (like saying: "this document is a class_list").
The following lines describe two student child elements of the root, each containing a name and a grade element.
And finally the last line defines the end of the root element: </class_list>
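The walkthrough above can be checked with a small parser script. This is only a sketch using Python's standard xml.etree.ElementTree module; the function name read_grades is made up for illustration and is not part of the original slides.

```python
import xml.etree.ElementTree as ET

# The class list from Example 1, kept as bytes so the parser
# honours the encoding named in the XML declaration.
CLASS_LIST = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<class_list>
  <student><name>Anamika</name><grade>A+</grade></student>
  <student><name>Veena</name><grade>B+</grade></student>
</class_list>"""


def read_grades(xml_bytes):
    """Return the root tag and a list of (name, grade) pairs."""
    root = ET.fromstring(xml_bytes)  # root is the <class_list> element
    rows = [
        (student.findtext("name"), student.findtext("grade"))
        for student in root.findall("student")  # direct children of the root
    ]
    return root.tag, rows


if __name__ == "__main__":
    print(read_grades(CLASS_LIST))
```

The XML declaration itself does not appear in the parsed tree; only the root element and its children do.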
Logical Structure
XML uses its start tags and end tags as containers.
The start tag, the content and the end tag form an element
Elements are the building blocks out of which an XML document is assembled.
An XML document has a tree-like structure, with the root element at the top and all other elements nested inside it.
Tree structure
XML documents form a tree structure.
XML documents must contain a root element. This element is "the parent" of all other elements.
The elements in an XML document form a document tree. The tree starts at the root and branches to the lowest level of the tree.
All elements can have sub elements (child elements):
<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>
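To make the tree idea concrete, the sketch below walks a parsed document recursively and lists each element indented by its depth. It assumes Python's standard xml.etree.ElementTree module; the helper name outline is hypothetical.

```python
import xml.etree.ElementTree as ET


def outline(element, depth=0):
    """Return one line per element, indented two spaces per tree level."""
    lines = ["  " * depth + element.tag]
    for child in element:  # iterating an element yields its direct children
        lines.extend(outline(child, depth + 1))
    return lines


if __name__ == "__main__":
    root = ET.fromstring("<root><child><subchild>text</subchild></child></root>")
    print("\n".join(outline(root)))
```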
XML – Example 2
<bookstore>
  <book category = "COOKING">
    <title lang = "en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category = "CHILDREN">
    <title lang = "en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category = "WEB">
    <title lang = "en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>
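As a rough illustration of how an application might read this document, the sketch below pulls the category attribute and two child elements out of each book. It uses Python's standard xml.etree.ElementTree module and a shortened copy of the bookstore; the function name summarize is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the bookstore from Example 2.
BOOKSTORE = """<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="WEB">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>"""


def summarize(xml_text):
    """Return (category, title, price) for every <book> element."""
    root = ET.fromstring(xml_text)
    return [
        (
            book.get("category"),            # attribute of <book>
            book.findtext("title"),          # text of the <title> child
            float(book.findtext("price")),   # element text is always a string
        )
        for book in root.iter("book")
    ]


if __name__ == "__main__":
    for row in summarize(BOOKSTORE):
        print(row)
```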
Important Definitions
XML Element
An element is a start tag, content, and an end tag. E.g., <greeting>Hello World</greeting>
XML Attribute
An attribute provides additional information about an element. E.g., <note priority = "high">
Important Definitions
Child elements – XML elements may have child elements
<employee id = "100">
  <name>
    <first>Anita</first>
    <initial>D</initial>
    <last>Singh</last>
  </name>
</employee>
Here name is a parent element; first, initial and last are its children.
XML Element
An XML element is everything from the element's start tag to the element's end tag.
An element can contain other elements, simple text or a mixture of both.
Elements can also have attributes.
XML Syntax Rules
All XML elements must have a closing tag.
XML tags are case sensitive. The tag <Book> is different from the tag <book>.
Opening and closing tags must be written with the same case:
<Message>This is incorrect</message>
<message>This is correct</message>
XML Syntax Rules
XML elements must be properly nested.
HTML permits this:
<B><I>This text is bold and italic</B></I>
But in XML this is not well formed. All elements must be properly nested within one another:
<B><I>This text is bold and italic</I></B>
XML documents must have a root element. It is the parent of all other elements.
<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>
XML Syntax Rules
XML Entity References
Some characters have a special meaning in XML. E.g., if you place a character like "<" inside an XML element, it will generate an error because the parser interprets it as the start of a new element.
<message>if salary < 1000 then</message>
To avoid this error, replace the "<" character with an entity reference:
<message>if salary &lt; 1000 then</message>
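The salary example can be reproduced in a few lines. The sketch below assumes Python's standard library: xml.sax.saxutils.escape replaces &, < and > with entity references, and xml.etree.ElementTree parses the result. The helper names make_message and parses are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def make_message(text):
    """Wrap text in a <message> element, escaping markup characters."""
    return "<message>" + escape(text) + "</message>"


def parses(xml_text):
    """Return True if the parser accepts xml_text."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    raw = "<message>if salary < 1000 then</message>"
    safe = make_message("if salary < 1000 then")
    print(parses(raw))                 # the bare "<" is rejected
    print(safe)                        # the "<" is now an entity reference
    print(ET.fromstring(safe).text)    # the parser turns it back into "<"
```

The parser converts the entity reference back to the original character, so the application reading the element never sees the escaped form.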
XML Syntax Rules
XML Entity References
There are 5 predefined entity references in XML:
Entity | Symbol | Description
&lt; | < | Less than
&gt; | > | Greater than
&amp; | & | Ampersand
&apos; | ' | Apostrophe
&quot; | " | Quotation mark
XML Syntax Rules
Comments in XML (similar to HTML)
<!-- This is a comment -->
White space in element content is preserved in XML, whereas HTML collapses runs of white space.
XML Naming Rules
Names can contain letters, numbers, and other characters
Names cannot start with a number or a punctuation character (an underscore is allowed)
Names cannot start with the letters xml (or XML, or Xml, etc)
Names cannot contain spaces
Any name can be used, no words are reserved.
XML Markup Delimiters
Every XML element is made up of the following parts:
Symbol | Description
< | Start tag open delimiter
</ | End tag open delimiter
something | Element name
> | Tag close delimiter
/> | Empty tag close delimiter
Different Types of XML Markups
5 Types of Markup in XML
Elements
Entities
Comments
Processing Instructions
Ignored Sections
Element Markup
It is composed of 3 parts: the start tag, the content, and the end tag.
Example: <name>Neetu</name>
The start tag and the end tag can be treated as wrappers.
The element name that appears in the start tag must be exactly the same as the name that appears in the end tag.
Incorrect example (the case differs): <Name>Neetu</name>
Different Types of XML Markups
Attribute Markup
Attributes are used to attach information to the information contained in an element.
The general syntax for attributes is:
<elementname property = 'value'>
or
<elementname property = "value">
An attribute value must be enclosed within quotation marks.
Use either single quotes or double quotes, but the opening and closing quote of a value must be of the same kind.
Attribute Markup
If attributes for the same element are specified more than once, the specifications are merged.
<?xml version = "1.0"?>
<myparas>
  <para num = "first">This is Para 1</para>
  <para num = 'second' color = "red">This is Para 2</para>
</myparas>
Attribute Markup
When the XML processor encounters line 3, it records the fact that the para element has the num attribute.
When it encounters line 4, it records the fact that the para element also has the color attribute.
Reserved Attribute
The xml:lang attribute is reserved to identify the human language in which the content of the element is written.
The value of the attribute is a language code, for example:
en English
fr French
de German
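A short sketch of how a program might read xml:lang, assuming Python's standard xml.etree.ElementTree module. The xml prefix is bound to the namespace http://www.w3.org/XML/1998/namespace, so ElementTree exposes the attribute under that qualified name. The sample document and the function name languages are invented for illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the namespace bound to the xml prefix.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample document, not taken from the slides.
GREETINGS = """<greetings>
  <hello xml:lang="en">Hello</hello>
  <hello xml:lang="fr">Bonjour</hello>
  <hello xml:lang="de">Hallo</hello>
</greetings>"""


def languages(xml_text):
    """Return (language code, text) for every element carrying xml:lang."""
    root = ET.fromstring(xml_text)
    return [
        (el.get(XML_LANG), el.text)
        for el in root.iter()
        if el.get(XML_LANG) is not None
    ]


if __name__ == "__main__":
    print(languages(GREETINGS))
```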
XML Attributes
An attribute provides additional information about the element.
This is similar to attributes in HTML, e.g., <IMG SRC="sky.jpg">. Here SRC is the attribute.
XML attribute values must be quoted
XML elements can have attributes in name/value pairs just like in HTML. In XML the attribute value must always be quoted.
Invalid, because the value is not quoted:
<note date = 01/01/2010>
  <to>Priya</to>
  <from>Deeali</from>
</note>
OK, since the value is enclosed in double quotes:
<note date = "01/01/2010">
Also OK, since the value is enclosed in single quotes:
<note date = '01/01/2010'>
XML Attributes and Elements
Consider the following example:
Gender is an attribute:
<person gender = "female">
  <firstname>Geeta</firstname>
  <lastname>Shah</lastname>
</person>
Gender is an element:
<person>
  <gender>female</gender>
  <firstname>Geeta</firstname>
  <lastname>Shah</lastname>
</person>
Problems with XML Attributes
Attributes cannot contain multiple values whereas elements can
Attributes cannot contain tree structures
Attributes are not easily expandable (for future changes)
Attributes are difficult to read and maintain
Use elements for data.
Use attributes for information that is not relevant to the data.
Illustrating Problematic Attributes
Consider the following example:
<note day="03" month="02" year="2010" to="Tina" from="Yasmin" heading="Reminder" body="Happy Birthday!"></note>
Better way:
<note>
  <date>
    <day>03</day>
    <month>02</month>
    <year>2010</year>
  </date>
  <to>Tina</to>
  <from>Yasmin</from>
  <heading>Reminder</heading>
  <body>Happy Birthday!</body>
</note>
When to use Attributes?
XML attributes can be used to assign ID references to elements. The ID can then be used to identify the XML element.
Metadata (data about data) should be stored as attributes.
<messages>
  <note id="501">
    <to>Tina</to>
    <from>Yasmin</from>
    <heading>Reminder</heading>
    <body>Happy Birthday!</body>
  </note>
  <note id="502">
    <to>Yasmin</to>
    <from>Tina</from>
    <heading>Re: Reminder</heading>
    <body>Thank you, my dear</body>
  </note>
</messages>
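One possible way to use such an id to pick out a single note is sketched below, using the limited XPath support in Python's standard xml.etree.ElementTree module. The function name find_note is hypothetical, and the sample data is a trimmed copy of the messages above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the messages document.
MESSAGES = """<messages>
  <note id="501">
    <to>Tina</to><from>Yasmin</from>
    <heading>Reminder</heading><body>Happy Birthday!</body>
  </note>
  <note id="502">
    <to>Yasmin</to><from>Tina</from>
    <heading>Re: Reminder</heading><body>Thank you, my dear</body>
  </note>
</messages>"""


def find_note(xml_text, note_id):
    """Return the <note> element whose id attribute equals note_id, or None."""
    root = ET.fromstring(xml_text)
    # [@id='...'] is an attribute predicate supported by ElementTree.
    return root.find(f".//note[@id='{note_id}']")


if __name__ == "__main__":
    note = find_note(MESSAGES, "502")
    print(note.findtext("from"), "->", note.findtext("to"))
```

The id lives in an attribute because it identifies the note rather than being part of the message itself.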
What does Extensible mean in XML?
Consider the following XML example:
<note>
  <to>Anita</to>
  <from>Veena</from>
  <body>You have an exam tomorrow</body>
</note>
Suppose we create an application that extracts the <to>, <from> and <body> elements from the XML document to produce the result:
MESSAGE
To: Anita
From: Veena
You have an exam tomorrow
What does Extensible mean in XML?
Now suppose the author of the XML document added some extra information to it:
<note>
  <date>2008-01-10</date>
  <to>Anita</to>
  <from>Veena</from>
  <heading>Reminder</heading>
  <body>You have an exam tomorrow</body>
</note>
What does Extensible mean in XML?
The application will not crash, because it still finds the <to>, <from> and <body> elements in the XML document and produces the same output.
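A minimal sketch of such an application, assuming Python's standard xml.etree.ElementTree module. The function name render is invented here. Because it looks elements up by name, the added <date> and <heading> elements are simply ignored.

```python
import xml.etree.ElementTree as ET

ORIGINAL = """<note>
  <to>Anita</to><from>Veena</from>
  <body>You have an exam tomorrow</body>
</note>"""

EXTENDED = """<note>
  <date>2008-01-10</date>
  <to>Anita</to><from>Veena</from>
  <heading>Reminder</heading>
  <body>You have an exam tomorrow</body>
</note>"""


def render(xml_text):
    """Build the MESSAGE text from the <to>, <from> and <body> elements."""
    note = ET.fromstring(xml_text)
    return "MESSAGE\nTo: {}\nFrom: {}\n{}".format(
        note.findtext("to"),    # lookup by name, not by position
        note.findtext("from"),
        note.findtext("body"),
    )


if __name__ == "__main__":
    print(render(ORIGINAL))
    print(render(ORIGINAL) == render(EXTENDED))
```

An application that read children by position instead (first child, second child, and so on) would not be as tolerant of added elements.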
XML Validation
What is a "well formed" XML document?
A "well formed" XML document is one with correct XML syntax:
XML documents must have a root element
XML elements must have a closing tag
XML tags are case sensitive
XML elements must be properly nested
XML attribute values must be quoted
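The rules above can be tried out with any XML parser, since a conforming parser rejects documents that are not well formed. The sketch below uses Python's standard xml.etree.ElementTree module; the function name is_well_formed and the sample documents are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False on a syntax error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# One deliberately broken document per rule listed above.
BROKEN = {
    "two root elements": "<a>1</a><b>2</b>",
    "missing closing tag": "<note><to>Tina</note>",
    "case mismatch": "<Message>text</message>",
    "bad nesting": "<b><i>text</b></i>",
    "unquoted attribute": "<note date=01/01/2010></note>",
}

if __name__ == "__main__":
    print("good document:", is_well_formed("<note><to>Tina</to></note>"))
    for rule, doc in BROKEN.items():
        print(rule, "->", is_well_formed(doc))
```

This checks syntax only; it says nothing about whether the document follows a particular DTD.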
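Wellformed Document - Rule 1
Elements are case-sensitive.
If you define your language to use lowercase elements, then all instances of those elements must be in lowercase.
Bad Examples…
<H1>Sample Heading</H1> (the tags match, but they are uppercase in a language defined with lowercase elements)
<h1>Sample Heading</H1> (start and end tags differ in case)
<H1>Sample Heading</h1> (start and end tags differ in case)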
Rule 2:
All elements that contain text or other elements must have both start and ending tags.
Rule 3:
All empty elements (commonly known as standalone tags) must have a slash (/) before the end of the tag, e.g., <br/>.
Rule 4:
All attribute values must be contained in quotes, either single or double – no exceptions!
Rule 5:
Elements may not overlap.
Elements must be nested properly within other elements and cannot start before a sub-element and end within the sub-element.
Rule 6:
Isolated markup characters (characters essential to creating markup documents) may not appear in parsed content as is.
Isolated markup characters should be represented as a character entity and include the following: <, [, ], >, ', " and &. Strictly, only < and & must always be escaped in element content; escaping the others is a safe habit.
Isolated Markup Characters
&lt; <
&#91; [
&#93; ]
&gt; >
&apos; '
&quot; "
&amp; &
Bad Examples…
<h1>Jack &amp Jill</h1>
<equation>5 &lt 2</equation>
These examples are invalid because both omit the semicolon that must follow the character entity.
Good Examples…
<h1>Jack &amp; Jill</h1>
<equation>5 &lt; 2</equation>
Rule 7:
Element (and attribute) names must start with either a letter (uppercase or lowercase) or an underscore.
Element names may contain letters, numbers, hyphens, periods and underscores.
BAD EXAMPLES
<bad*characters>
<illegal space>
<99number-start>
GOOD EXAMPLES
<example-one>
<_example2>
<Example.Three>
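Rule 7 as stated on the slide can be approximated with a regular expression. The sketch below is an ASCII-only simplification written for illustration: the real XML name grammar also allows many non-ASCII letters and the colon, so this is not a full implementation of the specification. The names NAME_PATTERN and is_valid_name are hypothetical.

```python
import re

# ASCII-only approximation of Rule 7: first character is a letter or
# underscore; later characters may also be digits, periods or hyphens.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_valid_name(name):
    """Return True if name satisfies the simplified Rule 7 pattern."""
    return NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for name in ["bad*characters", "illegal space", "99number-start",
                 "example-one", "_example2", "Example.Three"]:
        print(name, "->", is_valid_name(name))
```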
XML Validation
A "valid" XML document is a "well formed" XML document that also conforms to the rules of a Document Type Definition (DTD).
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE note SYSTEM "Note.dtd">
<note>
  <to>Tina</to>
  <from>Yasmin</from>
  <heading>Reminder</heading>
  <body>Happy Birthday!</body>
</note>
Viewing XML Files - 1
Viewing XML Files - 2
The XML document will be displayed with color-coded root and child elements.
A plus (+) or minus sign (-) to the left of the elements can be clicked to expand or collapse the element structure.
To view the raw XML source (without the + and - signs), select "View Page Source" or "View Source" from the browser menu.
Viewing XML Files - 3
Why do XML documents display like this?
XML documents do not carry information about how to display the data.
Since XML tags are created by the author of the XML document, browsers do not know if a tag like <table> describes an HTML table or a dining table.
Without any information about how to display the data, most browsers will just display the XML document as it is.
Using CSS to display XML Files
CSS (Cascading Style Sheets) can be used to format an XML document. Consider this XML document:
Displaying Formatted XML document-1
<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type = "text/css" href = "birthdate.css"?>
<birthdate>
  <person>
    <name>
      <first>Anokhi</first>
      <last>Parikh</last>
    </name>
    <date>
      <month>01</month>
      <day>21</day>
      <year>1992</year>
    </date>
  </person>
</birthdate>
Displaying Formatted XML document-2
birthdate {
  background-color: #ffffff;
  width: 100%;
}
person {
  margin-left: 0;
}
name {
  color: #FF0000;
  font-size: 20pt;
}
month, day, year {
  display: block;
  color: #000000;
  margin-left: 20pt;
}
Stylesheet – birthdate.css
Final Output
XSLT
XSL is a language for style sheets.
An XSL style sheet is a file that describes how to display an XML document.
XSL contains a transformation language for XML documents: XSLT.
XSLT is used for generating HTML web pages from XML data.
XSLT - eXtensible Stylesheet Language Transformations
XSLT is used to transform an XML document into an HTML document
XSLT is the recommended style sheet language for XML