
Upload: mukesh-tekwani

Posted on 09-Dec-2014


Introduction to XML. Lecture notes for TYBSc (Computer Science) and TYBSC (IT)


XML

Prof. Mukesh N. Tekwani Page 1 of 11

XML

1 What are the disadvantages of HTML?

• HTML lacks syntax checking

• HTML lacks structure

• HTML is not suitable for data interchange

• HTML is not context aware: HTML does not allow us to describe the information content or the semantics of the document

• HTML is not object-oriented

• HTML is not re-usable

• HTML is not extensible

• HTML is suitable only for displaying content, not for describing "what" the content is about.

• HTML has only a few tags that describe the meaning of the text, such as <ADDRESS>

• HTML is not flexible enough to mark up a wide variety of documents. HTML can describe only <HEAD> and <BODY>; it cannot describe abstracts, chapters, parts, sections, etc.

2 What is XML?

• XML: Extensible Markup Language

o Extensible: capable of being extended. We can make our own elements/tags.

o Markup: a way of adding information to the text to indicate the logical components of a document.

• How is it different from HTML?

o HTML was designed to display data.

o XML was designed to store, describe and transport data.

o XML separates data from HTML.

• XML is also a markup language like HTML.

• XML tags are not predefined; we must design our own tags.

• XML is portable: it is easy to produce files that capture the rules of your markup and enable other programs to properly read or process your XML documents.

• XML by itself does not do anything; unlike HTML, it does not say how anything should be displayed. XML was created to structure, store, and transport information.

• XML is not a replacement for HTML; they do different things.
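Because XML only describes data, a program that wants to use it must parse it and decide for itself what the tags mean. A minimal sketch, assuming Python 3 and the standard-library xml.etree.ElementTree module; the <note> document and the read_note helper are illustrative names, not part of any standard:

```python
import xml.etree.ElementTree as ET

# A small document whose tags (<note>, <to>, <from>, <body>) are invented
# for this example; XML itself predefines none of them.
NOTE_XML = """<note>
  <to>Raja</to>
  <from>Jani</from>
  <body>Don't forget me this weekend!</body>
</note>"""

def read_note(xml_text):
    """Return each child element of the root as a tag -> text entry."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}

note = read_note(NOTE_XML)
print(note["to"], "->", note["from"])
```

The parser recovers only the structure; nothing in XML says that <to> means "recipient". That meaning lives in the application.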

3 State the differences between HTML and XML.

1 HTML: Designed to display data.
  XML: Designed to store and transport data between applications and databases. Transport here means that data can be exchanged between incompatible systems, over the Internet.

2 HTML: Focus is on how data looks.
  XML: Focus is on what data is.

3 HTML: It has predefined tags such as <B>, <LI>, etc.
  XML: No predefined tags; all tags must be defined by the user. E.g., we can create tags such as <TO>, <FROM>, <BOOKNAME>, etc.

4 HTML: Used to display information.
  XML: Used to describe information.

5 HTML: Some elements do not need a closing tag.
  XML: Every element must have a closing tag (an empty element may use the short form <tag/>).

6 HTML: Not case sensitive.
  XML: Case sensitive.

7 HTML: For humans.
  XML: For computers.

4 What are the advantages of XML? OR What are the features of XML?

• XML simplifies data sharing: Since XML data is stored in plain text format, data can be easily shared among different hardware and software platforms.

• XML separates data from HTML: To display dynamic data in HTML, the code must be rewritten each time the data changes. With XML, data can be stored in separate files, so that whenever the data changes it is automatically displayed correctly. We have to design the HTML for layout only once.

• XML simplifies data transport: Data can be easily exchanged between different platforms.

• XML makes data more available:

o Since XML is independent of hardware, software and application, XML can make your data more available and useful.

o Different applications can access your data, not only through HTML pages.

• XML provides a means to package almost any type of information (binary, text, voice, video) for delivery to a receiving end. Binary data must first be encoded as text (e.g., Base64) before it can be placed in an XML document.

• Internationality: HTML relies heavily on ASCII, which makes using foreign characters very difficult. XML uses Unicode, so many European and Asian languages are also handled easily.

5 What are the types of XML markup?

There are 5 types of XML markup: elements, entities, comments, processing instructions and ignored sections.

Elements:

1. XML elements describe the meaning of the text they contain.

2. Elements occur in pairs, with a start tag and an end tag that enclose the text they mark up.

3. Inside the start tag, a keyword (the element name) indicates the meaning of the markup. The end tag contains the same keyword preceded by a forward slash (/). Both tags start with a less-than sign (<) and end with a greater-than sign (>):

<LETTER>...</LETTER>

4. Some elements do not occur in pairs. These elements are said to be empty. The tag for such an element ends with />, e.g., <BR/>

5. Some elements take attributes that modify or expand on the meaning they impart to the content they contain. An attribute is set equal to a value enclosed in quotation marks, e.g., <file type="gif">.

Entities:

1. In HTML we use entities such as &gt;, &lt;, &nbsp;, etc. Entities in XML are very similar to entities in HTML.

2. Some characters have a special meaning in XML. E.g., if you place a character like "<" inside an XML element, it will generate an error, because the parser interprets it as the start of a new element:

<message>if salary < 1000 then</message>

3. To avoid this error, replace the "<" character with an entity reference:

<message>if salary &lt; 1000 then</message>

4. XML also enables us to use any Unicode character we want, so we can produce documents in languages other than English.

5. XML entities can be defined in the XML file itself or in an external file, and then used in the XML file.

6. The predefined entities in XML are:

Entity    Symbol    Description
&lt;      <         Less than
&gt;      >         Greater than
&amp;     &         Ampersand
&apos;    '         Apostrophe
&quot;    "         Quotation mark
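The points above mention that entities can also be defined inside the XML file itself. A minimal sketch, assuming Python 3's standard-library xml.etree.ElementTree, which does not validate but does expand entities declared in the document's internal DTD; the entity name college and its replacement text are hypothetical:

```python
import xml.etree.ElementTree as ET

# The DOCTYPE declares a user-defined entity, "college" (a made-up name).
# &lt; and &amp; are predefined, so they need no declaration.
DOC = """<!DOCTYPE message [
  <!ENTITY college "Example College">
]>
<message>&college;: if salary &lt; 1000 then R&amp;D</message>"""

message = ET.fromstring(DOC)
# The parser replaces every entity reference with the text it stands for.
print(message.text)
```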


Comments: Comments are written the same way as in HTML: <!-- comment text -->

Processing instructions: Processing instructions (PIs) enable us to embed, right in our XML document, information that is to be passed to an application. The syntax is <?name data?>.

The name, or PI target, should be a name that the processing application will recognize. Targets beginning with xml (in any combination of upper and lower case) are reserved for standardization purposes.

The data component of a PI can be anything that the processing application understands.
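Simple data-reading code usually skips comments and PIs, but a DOM parser keeps them as nodes that an application can inspect. A minimal sketch using Python's standard-library xml.dom.minidom; the PI target report-tool and its data page-break are hypothetical, standing in for whatever the real processing application recognizes:

```python
from xml.dom import Node, minidom

# "report-tool" is a made-up PI target; the comment and the PI sit inside <report>.
DOC = """<?xml version="1.0"?>
<report>
  <!-- generated nightly -->
  <?report-tool page-break?>
  <total>42</total>
</report>"""

dom = minidom.parseString(DOC)
found = []
for node in dom.documentElement.childNodes:
    if node.nodeType == Node.COMMENT_NODE:
        found.append(("comment", node.data.strip()))
    elif node.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
        # target = the PI name; data = the rest of the PI, up to "?>"
        found.append(("pi", node.target, node.data))
print(found)
```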

Ignored sections: Sometimes, e.g., in a mathematical expression, it becomes necessary to use characters that are reserved in XML. If you put them into an ignored section (a CDATA section) like this:

<![CDATA[4 < 3 is false.]]>

the expression, including the less-than sign, is passed to the application as plain text instead of being parsed as markup.

All ignored sections start with <![CDATA[ and end with ]]>
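To see what "passed to the application" means in practice, here is a minimal sketch using Python's standard-library xml.etree.ElementTree; the <formula> element name is made up for the example:

```python
import xml.etree.ElementTree as ET

# Inside <![CDATA[ ... ]]> the "<" is ordinary text, so no entity reference is needed.
DOC = "<formula><![CDATA[4 < 3 is false.]]></formula>"

formula = ET.fromstring(DOC)
print(formula.text)   # prints: 4 < 3 is false.

# Without the CDATA section, the same text is not well formed.
try:
    ET.fromstring("<formula>4 < 3 is false.</formula>")
    plain_ok = True
except ET.ParseError:
    plain_ok = False
print(plain_ok)       # prints: False
```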

6 Simple example of an XML document:

<?xml version="1.0" encoding="ISO-8859-1"?>
<class_list>
  <student>
    <name>Anamika</name>
    <grade>A+</grade>
  </student>
  <student>
    <name>Veena</name>
    <grade>B+</grade>
  </student>
</class_list>

• The first line is the XML declaration.

o It defines the XML version (1.0).

o It gives the encoding used (ISO-8859-1 = Latin-1/West European character set).

o The XML declaration has the same form as a processing instruction (PI): it is identified by the ? at its start and end.

• The next line describes the root element of the document (like saying: "this document is a class_list"). Every XML document must have exactly one root element. The root element is like the parent element: all other elements must be completely enclosed within it. In our example, the root element is <class_list>.

• In XML, a non-empty element must consist of three things: a start tag, content (either text or other elements) and an end tag. The name that you use in the start tag must exactly match (including case) the name you use in the end tag.

• The <student> elements are children of the root, and each <student> element has the child elements <name> and <grade>.

• Finally, the last line defines the end of the root element: </class_list>.

• XML documents can contain empty XML elements.

Example:

<banner source="topbanner.gif"/>
<rule/>
<footer source="foot.gif"/>

With empty elements, the closing delimiter /> is used, or you can use a closing tag as follows: <empty_element></empty_element>
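A minimal sketch of how a program reads the class_list example, assuming Python 3's standard-library xml.etree.ElementTree. It also checks that the two spellings of an empty element described above parse to the same thing:

```python
import xml.etree.ElementTree as ET

# Given as bytes, so the parser honours the ISO-8859-1 encoding in the declaration.
CLASS_LIST = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<class_list>
  <student><name>Anamika</name><grade>A+</grade></student>
  <student><name>Veena</name><grade>B+</grade></student>
</class_list>"""

root = ET.fromstring(CLASS_LIST)                      # the single root element
grades = {student.findtext("name"): student.findtext("grade")
          for student in root.findall("student")}     # direct children only
print(root.tag, grades)

# <rule/> and <rule></rule> are the same empty element: no text, no children.
short_form = ET.fromstring("<rule/>")
long_form = ET.fromstring("<rule></rule>")
print(short_form.text, long_form.text, len(short_form), len(long_form))

# An empty element can still carry attributes.
banner = ET.fromstring('<banner source="topbanner.gif"/>')
print(banner.get("source"))
```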

Attributes:

XML elements can have attributes. An attribute provides additional information about an element. Attributes provide information that is not a part of the data. In the example below, the file type is irrelevant to the data, but can be important to the software that wants to manipulate the element: <file type="gif">computer.gif</file>

7 Describe the logical structure / tree structure of XML documents.

There is a big difference between XML and HTML markup. With a few exceptions, most HTML tags perform functions related to how the content is displayed. XML markup, on the other hand, is meant to convey what the content means.

Each XML document must have only one root element, and all other elements must be perfectly nested inside that element. Perfectly nested means that if an element contains other elements, those elements must be completely enclosed within that element.

If we sketch the structure of the elements in an XML document, we obtain a tree structure. The root element <class_list> is at the top of the tree. All elements that are inside this element are neatly contained within each other. An XML document can contain only one root element, and no element can be either partially or completely outside this element. An element is a parent of the elements that it contains. The elements inside an element are called children. Elements that share the same parent element are called siblings.

In our example, <class_list> is the parent of the <student> elements, and every other element lies inside it. <student> is the parent of <name>, <name> is a child of <student>, and <name> and <grade> are siblings. Each child element must be fully contained within its parent element. Sibling elements may not overlap.

The arrangement of elements in XML is called its logical structure.

Tree Structure:

• XML documents form a tree structure.

• XML documents must contain a root element. This element is "the parent" of all other elements.

• The elements in an XML document form a document tree. The tree starts at the root and branches to the lowest level of the tree.

• All elements can have sub-elements (child elements):

<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>


Example of tree structure:

In the tree for one book in the XML document below, <book> is at the top and its child elements branch out beneath it. The whole document is:

<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="CHILDREN">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="WEB">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>

The <book> element has 4 children: <title>, <author>, <year>, and <price>.
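The parent/child/sibling relationships can be made visible by walking the tree. A minimal sketch, assuming Python 3's xml.etree.ElementTree; tree_lines is a hypothetical helper, and the document is cut down to two of the three books to keep it short:

```python
import xml.etree.ElementTree as ET

BOOKSTORE = """<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="WEB">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>"""

def tree_lines(element, depth=0):
    """Depth-first walk: each element is indented one level below its parent."""
    lines = ["  " * depth + element.tag]
    for child in element:                     # children in document order
        lines.extend(tree_lines(child, depth + 1))
    return lines

root = ET.fromstring(BOOKSTORE)
print("\n".join(tree_lines(root)))

first_book = root[0]
siblings = [child.tag for child in first_book]   # the four children of <book>
print(first_book.get("category"), siblings)
```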


8 State the XML syntax Rules

All XML Elements Must Have a Closing Tag

In HTML, elements do not have to have a closing tag:

<p>This is a paragraph

<p>This is another paragraph

In XML, it is illegal to omit the closing tag. All elements must have a closing tag:

<p>This is a paragraph</p>

<p>This is another paragraph</p>

XML Tags are Case Sensitive

XML tags are case sensitive. The tag <Letter> is different from the tag <letter>.

Opening and closing tags must be written with the same case:

<Message>This is incorrect</message>

<message>This is correct</message>

"Opening and closing tags" are also called "start and end tags".

XML Elements Must be Properly Nested

In HTML, you might see improperly nested elements:

<b><i>This text is bold and italic</b></i>

In XML, all elements must be properly nested within each other:

<b><i>This text is bold and italic</i></b>

In the example above, "properly nested" simply means that, since the <i> element is opened inside the <b> element, it must be closed inside the <b> element.

XML Documents Must Have a Root Element

XML documents must contain one element that is the parent of all other elements. This element is called the root element.

<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>


XML Attribute Values Must be Quoted

XML elements can have attributes in name/value pairs, just like in HTML. In XML, the attribute values must always be quoted.

In the two XML documents below, the first one is incorrect and the second is correct:

<note date=12/11/2007>
  <to>Raja</to>
  <from>Jani</from>
</note>

<note date="12/11/2007">
  <to>Raja</to>
  <from>Jani</from>
</note>

The error in the first document is that the date attribute in the note element is not quoted.
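The syntax rules above are exactly what an XML parser enforces, so one way to check a fragment is simply to try parsing it. A minimal sketch, assuming Python 3's xml.etree.ElementTree, whose parser raises ParseError for input that is not well formed; is_well_formed is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# One example per rule, plus a correct document.
examples = {
    "missing closing tag": "<p>This is a paragraph",
    "case mismatch": "<Message>This is incorrect</message>",
    "improper nesting": "<b><i>bold and italic</b></i>",
    "two root elements": "<child/><child/>",
    "unquoted attribute": "<note date=12/11/2007><to>Raja</to></note>",
    "correct": '<note date="12/11/2007"><to>Raja</to></note>',
}
for rule, text in examples.items():
    print(f"{rule:20} -> {is_well_formed(text)}")
```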

Entity References

Some characters have a special meaning in XML. If you place a character like "<" inside an XML element, it will generate an error, because the parser interprets it as the start of a new element.

This will generate an XML error:

<message>if salary < 1000 then</message>

To avoid this error, replace the "<" character with an entity reference:

<message>if salary &lt; 1000 then</message>

There are 5 predefined entity references in XML:

&lt;      <    less than
&gt;      >    greater than
&amp;     &    ampersand
&apos;    '    apostrophe
&quot;    "    quotation mark

Note: Only the characters "<" and "&" are strictly illegal in XML text. The greater-than character is legal, but it is a good habit to replace it.
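In program code, this replacement is usually done by a library function rather than by hand. A minimal sketch, assuming Python 3's standard library: xml.sax.saxutils.escape replaces &, < and > with entity references, and replaces the quote characters only when given an extra mapping:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

text = "if salary < 1000 then"
safe = escape(text)                  # "<" becomes "&lt;" ("&" and ">" are escaped too)
print(safe)                          # prints: if salary &lt; 1000 then

doc = "<message>" + safe + "</message>"
parsed = ET.fromstring(doc).text     # the parser turns &lt; back into "<"
print(parsed == text)                # prints: True

# Quote characters are escaped only when asked for, e.g. for attribute values.
quotes = escape("it's \"quoted\"", {"'": "&apos;", '"': "&quot;"})
print(quotes)
```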

Comments in XML

The syntax for writing comments in XML is similar to that of HTML:

<!-- This is a comment -->

White-space is Preserved in XML

HTML collapses multiple white-space characters into a single space:

HTML:   Hello           Tove
Output: Hello Tove

With XML, the white-space in a document is not collapsed; it is passed to the application as it is.

XML Stores New Line as LF

In Windows applications, a new line is normally stored as a pair of characters: carriage return (CR) and line feed (LF). In Unix applications, a new line is normally stored as an LF character. XML stores a new line as LF: an XML parser converts a CR LF pair (or a lone CR) into a single LF.
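Both behaviours can be observed directly. A minimal sketch, assuming Python 3's xml.etree.ElementTree (built on the expat parser); the element names greeting and lines are made up for the example:

```python
import xml.etree.ElementTree as ET

# Six spaces between the words: unlike an HTML browser, the parser keeps them all.
spaced = ET.fromstring("<greeting>Hello" + " " * 6 + "Tove</greeting>").text
print(repr(spaced))

# A Windows-style CR LF pair in the content is reported as a single LF.
crlf = ET.fromstring(b"<lines>first\r\nsecond</lines>").text
print(repr(crlf))                    # prints: 'first\nsecond'
```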

9 State the XML naming rules.

XML elements must follow these naming rules:

• Names can contain letters, numbers, and other characters such as hyphens, underscores and periods.

• Names cannot start with a number or a punctuation character (the underscore is an exception: a name may start with "_").

• Names cannot start with the letters xml (or XML, or Xml, etc.).

• Names cannot contain spaces.

• Apart from names starting with xml, any name can be used; no words are reserved.
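The rules can be turned into a quick check. This is a simplified, hypothetical helper in Python, not a full implementation of the XML specification: it accepts only ASCII letters, digits, "_", "-" and ".", so it will wrongly reject non-English names that XML itself allows.

```python
import re

# Simplified, ASCII-only name pattern (an assumption, narrower than real XML):
# first character a letter or "_", then letters, digits, "_", "." or "-".
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")

def is_acceptable_name(name):
    """Check a proposed element name against the naming rules listed above."""
    if not _NAME.fullmatch(name):        # bad first character, spaces, other symbols
        return False
    if name.lower().startswith("xml"):   # reserved prefix, in any case
        return False
    return True

for candidate in ["first_name", "book_title", "2nd_book", "first name", "XmlData", "_id"]:
    print(candidate, is_acceptable_name(candidate))
```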

Best Naming Practices

Make names descriptive. Names with an underscore separator are nice: <first_name>, <last_name>.

Names should be short and simple, like this: <book_title>, not like this: <the_title_of_the_book>.

Avoid "-" characters. If you name something "first-name", some software may think you want to subtract "name" from "first".

Avoid "." characters. If you name something "first.name", some software may think that "name" is a property of the object "first".

Avoid ":" characters. Colons are reserved for use with namespaces (more on this later).

XML documents often have a corresponding database. A good practice is to use the naming rules of your database for the elements in the XML documents.

Non-English letters like éòá are perfectly legal in XML, but watch out for problems if your software vendor doesn't support them.

10 XML elements are extensible. Explain this statement.

XML's flexibility comes from its capability to let you make up your own XML elements. This means that you can introduce new tags into an XML document, and XML elements can be extended to carry more information.

Look at the following XML example:

<note>
  <to>Raja</to>
  <from>Jani</from>
  <body>Don't forget me this weekend!</body>
</note>

Let's imagine that we created an application that extracted the <to>, <from>, and <body> elements from the XML document to produce this output:

MESSAGE
To: Raja
From: Jani
Don't forget me this weekend!

Imagine that the author of the XML document added some extra information to it:

<note>
  <date>2008-01-10</date>
  <to>Raja</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

The application will not break because of these additions. It can still find the <to>, <from>, and <body> elements in the XML document and produce the same output. This is the concept of extensibility.
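The extensibility argument can be tested directly. A minimal sketch, assuming Python 3's xml.etree.ElementTree; format_message is a hypothetical stand-in for "the application" above. It looks elements up by name, so extra elements such as <date> and <heading> are simply ignored:

```python
import xml.etree.ElementTree as ET

ORIGINAL = """<note><to>Raja</to><from>Jani</from>
<body>Don't forget me this weekend!</body></note>"""

EXTENDED = """<note><date>2008-01-10</date><to>Raja</to><from>Jani</from>
<heading>Reminder</heading><body>Don't forget me this weekend!</body></note>"""

def format_message(xml_text):
    """Build the MESSAGE output from <to>, <from> and <body>; other elements are ignored."""
    note = ET.fromstring(xml_text)
    return "\n".join([
        "MESSAGE",
        "To: " + note.findtext("to"),
        "From: " + note.findtext("from"),
        note.findtext("body"),
    ])

print(format_message(EXTENDED))
print(format_message(ORIGINAL) == format_message(EXTENDED))   # prints: True
```

Code that located elements by position instead (for example, "the first child is <to>") would break once <date> is inserted first, which is why lookup by name is the safer design.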

11 Write a note on XML attributes.

XML elements can have attributes. An attribute provides additional information about an element. Attributes provide information that is not a part of the data. In the example below, the file type is irrelevant to the data, but can be important to the software that wants to manipulate the element:

<file type="gif">computer.gif</file>

XML Attributes Must be Quoted

Attribute values must always be quoted. Either single or double quotes can be used. For a person's gender, the person element can be written like this:

<person gender="female"> or <person gender='female'>
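A minimal sketch, assuming Python 3's xml.etree.ElementTree, showing that single and double quotes give the same attribute value, that the other kind of quote may then appear inside the value, and that typographic ("curly") quotes are not accepted as delimiters; the <remark> element is made up for the example:

```python
import xml.etree.ElementTree as ET

file_elem = ET.fromstring('<file type="gif">computer.gif</file>')
print(file_elem.get("type"), file_elem.text)      # attribute vs. content

double = ET.fromstring('<person gender="female"/>')
single = ET.fromstring("<person gender='female'/>")
print(double.get("gender") == single.get("gender"))

# Single quotes around a value that itself contains double quotes.
remark = ET.fromstring("""<remark text='She said "hi"'/>""")
print(remark.get("text"))

# Curly quotes (U+201C, U+201D), as inserted by word processors, are not valid delimiters.
try:
    ET.fromstring("<person gender=\u201cfemale\u201d/>")
    curly_ok = True
except ET.ParseError:
    curly_ok = False
print(curly_ok)
```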


XML attributes should be avoided for storing data, for the following reasons:

• attributes cannot contain multiple values (elements can)

• attributes cannot contain tree structures (elements can)

• attributes are not easily expandable (for future changes)

• attributes are difficult to read and maintain.

Use elements for data. Use attributes for information that is not relevant to the data.

12 What is the difference between XML elements and attributes?

XML has no rules about when to use elements and when to use attributes. Consider the following examples:

<person gender="female">
  <firstname>Anita</firstname>
  <lastname>Shah</lastname>
</person>

<person>
  <gender>female</gender>
  <firstname>Anita</firstname>
  <lastname>Shah</lastname>
</person>

In the first example, gender is an attribute. In the second example, gender is an element. Both examples provide the same information. Generally, we avoid using attributes in XML and prefer to use elements instead.

Another example:

Consider the following XML documents.

Using a date attribute:

<note date="10/01/2008">
  <to>Raja</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

Using a date element:

<note>
  <date>10/01/2008</date>
  <to>Raja</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

We now expand the date element in the next code:

<note>
  <date>
    <day>10</day>
    <month>01</month>
    <year>2008</year>
  </date>
  <to>Raja</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
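The choice shows up in the code that reads the date. A minimal sketch, assuming Python 3's xml.etree.ElementTree and the day/month/year order used above; note_date is a hypothetical helper, and the documents are shortened to the parts it needs:

```python
import xml.etree.ElementTree as ET

AS_ATTRIBUTE = '<note date="10/01/2008"><to>Raja</to></note>'
AS_ELEMENT = '<note><date>10/01/2008</date><to>Raja</to></note>'
EXPANDED = ('<note><date><day>10</day><month>01</month><year>2008</year></date>'
            '<to>Raja</to></note>')

def note_date(xml_text):
    """Return (day, month, year) for any of the three layouts above."""
    note = ET.fromstring(xml_text)
    if note.get("date") is not None:                  # attribute form
        return tuple(note.get("date").split("/"))
    date = note.find("date")
    if date.find("day") is not None:                  # expanded element form
        return (date.findtext("day"), date.findtext("month"), date.findtext("year"))
    return tuple(date.text.split("/"))                # simple element form

for doc in (AS_ATTRIBUTE, AS_ELEMENT, EXPANDED):
    print(note_date(doc))
```

Only the expanded form hands the program the day, month and year directly; the other two need the string to be split, and an attribute value could never be expanded into sub-parts this way.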

13 What are the specifications needed for a document to be a valid and well-formed XML document?

An XML document with correct syntax is called a "well-formed" XML document. A well-formed document that has also been validated against a DTD is called a "valid" document.

Well formed document:

A "well-formed" XML document has correct XML syntax. These syntax rules are:

• XML documents must have a root element

• XML elements must have a closing tag

• XML tags are case sensitive

• XML elements must be properly nested

• XML attribute values must be quoted

Valid XML document:

A "valid" XML document is a "well-formed" XML document which also conforms to the rules of a Document Type Definition (DTD):

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE note SYSTEM "Note.dtd">
<note>
  <to>Raja</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

The DOCTYPE declaration in the example above is a reference to an external DTD file, Note.dtd. The purpose of a DTD is to define the structure of an XML document; it does so with a list of legal elements. The declarations for the note document are shown below, written as an internal DTD (inside a DOCTYPE declaration). An external file such as Note.dtd would contain only the <!ELEMENT ...> lines:

<!DOCTYPE note
[
  <!ELEMENT note (to, from, heading, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
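Python's standard-library XML parsers are non-validating: they check well-formedness but do not check a document against a DTD (full validation needs a validating parser, such as the third-party lxml package). As a rough illustration only, the hypothetical helper below hand-codes the note declarations above: the root must be <note>, its children must be exactly to, from, heading, body in that order, and each child may hold only text (#PCDATA). Attributes are not checked.

```python
import xml.etree.ElementTree as ET

# Hand-coded copy of: <!ELEMENT note (to, from, heading, body)>
NOTE_CHILDREN = ["to", "from", "heading", "body"]

def matches_note_dtd(xml_text):
    """Rough stand-in for DTD validation of the note document."""
    root = ET.fromstring(xml_text)            # raises ParseError if not well formed
    if root.tag != "note":
        return False
    if [child.tag for child in root] != NOTE_CHILDREN:
        return False                           # wrong order, missing or extra element
    # (#PCDATA): each child contains text only, no nested elements
    return all(len(child) == 0 for child in root)

# The DOCTYPE line is accepted, but the external Note.dtd is never read.
VALID = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE note SYSTEM "Note.dtd">
<note><to>Raja</to><from>Jani</from><heading>Reminder</heading>
<body>Don't forget me this weekend!</body></note>"""

NO_HEADING = "<note><to>Raja</to><from>Jani</from><body>Hi</body></note>"

print(matches_note_dtd(VALID), matches_note_dtd(NO_HEADING))
```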