Introduction to XML: Part I By Sandeep Jangity CS 157B, Section 2 Dr. Lee

Upload: ethelbert-ryan

Post on 24-Dec-2015


Page 1: Introduction to XML: Part I By Sandeep Jangity CS 157B, Section 2 Dr. Lee

Introduction to XML: Part I

By Sandeep Jangity

CS 157B, Section 2

Dr. Lee


Overview

• What is XML?
• Why is XML popular?
• How do you write an XML document?
• How do you write XML DTDs and Schemas?


What is XML?

• eXtensible Markup Language
• XML is a standard developed by the W3C
• XML is a syntax for expressing structured data in a text format
• XML is not a language on its own. Instead, XML is used to build markup languages.
• XML resembles HTML, but unlike HTML tags, XML tags convey the meaning of the data they enclose


Structured Data

• Structured data refers to data that is tagged for its content, meaning, or use

• Includes: spreadsheets, address books, databases, PDF documents, …

• Stored in binary or text format


XML Technology Model

• Data is modeled in XML

• The structure and constraints are modeled using DTDs or Schemas

• The document format can be modeled using XSL (Extensible Stylesheet Language)

• REMEMBER: XML allows us to separate data from presentation!


Why use XML?

• Interoperability – XML is independent of operating system, platform, and language
• Separates content from presentation
• Well supported by most browsers
• Simple XML documents are human-readable and can also be easily parsed by machines
• Easily converted to other formats, e.g. XML to PDF or Microsoft CHM
• Can represent almost any kind of data
– Many, many applications: math, science, etc.
– (continued: next slide)


MyMathML

4 + (5 * 3)

<expression>
  <operator> add </operator>
  <expression> <number> 4 </number> </expression>
  <expression>
    <operator> mult </operator>
    <expression> <number> 5 </number> </expression>
    <expression> <number> 3 </number> </expression>
  </expression>
</expression>
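Because the markup mirrors the structure of the arithmetic, a program can walk the tree and compute the value. The following is a minimal sketch in Python using the standard library's `xml.etree.ElementTree`; the function name `evaluate` is an illustrative choice, and only the two operators that appear on the slide (`add`, `mult`) are handled.

```python
import xml.etree.ElementTree as ET

# The MyMathML document from the slide: 4 + (5 * 3)
DOC = """
<expression>
  <operator> add </operator>
  <expression> <number> 4 </number> </expression>
  <expression>
    <operator> mult </operator>
    <expression> <number> 5 </number> </expression>
    <expression> <number> 3 </number> </expression>
  </expression>
</expression>
"""

def evaluate(expr):
    """Recursively evaluate one <expression> element."""
    number = expr.find("number")
    if number is not None:
        # Leaf case: the expression wraps a single <number>.
        return int(number.text.strip())
    op = expr.find("operator").text.strip()
    operands = [evaluate(child) for child in expr.findall("expression")]
    if op == "add":
        return sum(operands)
    if op == "mult":
        product = 1
        for value in operands:
            product *= value
        return product
    raise ValueError(f"unknown operator: {op}")

result = evaluate(ET.fromstring(DOC))
print(result)
```

The tags carry the meaning (operator versus number), so the evaluator never has to parse the infix text "4 + (5 * 3)".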


MyChemML

ChemML (tracking experiments)

<experiment date="03-15-2003">
  <introduction>
    The compound under investigation is common water:
    <molecule>
      <atom symbol="H" number="2"/>
      <atom symbol="O" number="1"/>
    </molecule>
    It boils at 100 degrees and freezes at 0 degrees!
    For more information about this amazing compound
    see the March 2003 issue of:
    <reference type="simple" href="http://www.ww.com">Water World</reference>
  </introduction>
  <!-- etc -->
</experiment>

(Now the technical stuff)


XML document syntax

• Every document has exactly one root element

• Elements and attributes are case sensitive

• Elements must be correctly nested

• Attribute values must be in quotes

• Tags must be closed

• Spaces are not allowed in element and attribute names
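Each of these rules can be checked by handing a small snippet to an XML parser and seeing whether it is accepted. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` and the sample snippets are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# One snippet per rule; only the first one follows all of them.
samples = {
    "ok": '<note id="1"><to>Sue</to></note>',
    "two roots": '<a/><b/>',
    "case mismatch": '<Note></note>',
    "bad nesting": '<a><b></a></b>',
    "unquoted attribute": '<note id=1/>',
    "unclosed tag": '<note><to>Sue</note>',
    "space in name": '<my note/>',
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'rejected'}")
```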


XML Example

<?xml version="1.0"?>
<Bookstore>
  <Book ID="101">
    <Title>Introduction to XML</Title>
    <Author>John Doe</Author>
    <Date>12 June 2001</Date>
    <ISBN>121232323</ISBN>
    <Publisher>XYZ</Publisher>
  </Book>
  <Book ID="102">
    <Title>Introduction to XSL</Title>
    <Author>Foo Bar</Author>
    <Date>12 June 2001</Date>
    <ISBN>12323573</ISBN>
    <Publisher>ABC</Publisher>
  </Book>
</Bookstore>
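To see how a program consumes a document like this, here is a short sketch that reads the Bookstore with Python's standard `xml.etree.ElementTree` and pulls out each book's `ID` attribute, title, and author. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

BOOKSTORE = """<?xml version="1.0"?>
<Bookstore>
  <Book ID="101">
    <Title>Introduction to XML</Title>
    <Author>John Doe</Author>
    <Date>12 June 2001</Date>
    <ISBN>121232323</ISBN>
    <Publisher>XYZ</Publisher>
  </Book>
  <Book ID="102">
    <Title>Introduction to XSL</Title>
    <Author>Foo Bar</Author>
    <Date>12 June 2001</Date>
    <ISBN>12323573</ISBN>
    <Publisher>ABC</Publisher>
  </Book>
</Bookstore>
"""

root = ET.fromstring(BOOKSTORE)

# Attributes are read with get(); child element text with findtext().
books = [
    (book.get("ID"), book.findtext("Title"), book.findtext("Author"))
    for book in root.findall("Book")
]

for book_id, title, author in books:
    print(book_id, title, author)
```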


Well-formed vs Valid

• Syntax and semantic checking
• Well-formed (syntax):
– Properties: (1) every start tag has a matching end tag, and (2) elements are properly nested
– An XML document may be “well-formed” without being “valid”, but every “valid” document is “well-formed”
• Valid (semantic):
– A valid XML document conforms to the vocabulary constraints defined in a DTD or Schema


Well-Formed (cont’d)

Well formed?

<?xml version="1.0"?>

<memo>

<from> Bill

<to> Sue

</from>

</to> Dinner tonight?

</nemo>
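One way to answer the question is to let a parser try the memo. The sketch below, using Python's standard `xml.etree.ElementTree`, reports where the first well-formedness error is detected; `first_error` and the `FIXED` variant are illustrative, and `FIXED` is only one possible repair. The parser stops at the first problem (the `</from>` that closes while `<to>` is still open), so the misspelled `</nemo>` is never reached.

```python
import xml.etree.ElementTree as ET

# The memo from the slide, with straight quotes in the XML declaration.
MEMO = """<?xml version="1.0"?>
<memo>
<from> Bill
<to> Sue
</from>
</to> Dinner tonight?
</nemo>
"""

# One possible repair: close each element before the next one opens
# and spell the root's end tag correctly.
FIXED = """<?xml version="1.0"?>
<memo>
<from> Bill </from>
<to> Sue </to>
Dinner tonight?
</memo>
"""

def first_error(text):
    """Return (line, column) of the first error, or None if well-formed."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return err.position
    return None

print("memo: ", first_error(MEMO))
print("fixed:", first_error(FIXED))
```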


Definition and Validation

• Two ways to define the structure of an XML document
– DTDs
– Schemas

• Each set of rules specifies an XML vocabulary


What is a DTD?

• Document Type Definitions (DTD)
– Emphasis on the structure of the XML: which elements and attributes can appear, and their relationships

– Difficult to work with

– No support for data types

– Not extensible


Bookstore Example

<Bookstore>
  <Book ID="101">
    <Title>Introduction to XML</Title>
    <Author>John Doe</Author>
    <Date>12 June 2001</Date>
    <ISBN>121232323</ISBN>
    <Publisher>XYZ</Publisher>
  </Book>
  <Book ID="102">
    <Title>Introduction to XSL</Title>
    <Author>Foo Bar</Author>
    <Date>12 June 2001</Date>
    <ISBN>12323573</ISBN>
    <Publisher>ABC</Publisher>
  </Book>
</Bookstore>

<!ELEMENT Bookstore (Book)*>
<!ELEMENT Book (Title, Author+, Date, ISBN, Publisher)>
<!ATTLIST Book ID CDATA #REQUIRED>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Author (#PCDATA)>
<!ELEMENT Date (#PCDATA)>
<!ELEMENT ISBN (#PCDATA)>
<!ELEMENT Publisher (#PCDATA)>
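A DTD content model such as `(Title, Author+, Date, ISBN, Publisher)` is essentially a regular expression over child element names, so order and repetition both matter. The sketch below imitates that one rule with Python's `re` module; it is a hand-rolled illustration, not a DTD validator (the standard library's `xml.etree.ElementTree` does not validate against DTDs). The names `BOOK_MODEL` and `valid_book`, and the second author "Jane Roe", are made up for the example.

```python
import re
import xml.etree.ElementTree as ET

# (Title, Author+, Date, ISBN, Publisher) written as a regex over
# comma-terminated child names.
BOOK_MODEL = re.compile(r"Title,(Author,)+Date,ISBN,Publisher,")

def valid_book(book):
    """Check a Book's child order/count and its required ID attribute."""
    child_names = "".join(child.tag + "," for child in book)
    has_required_id = "ID" in book.attrib
    return has_required_id and BOOK_MODEL.fullmatch(child_names) is not None

two_authors = ET.fromstring(
    '<Book ID="101"><Title>Introduction to XML</Title>'
    '<Author>John Doe</Author><Author>Jane Roe</Author>'
    '<Date>12 June 2001</Date><ISBN>121232323</ISBN>'
    '<Publisher>XYZ</Publisher></Book>'
)
wrong_order = ET.fromstring(
    '<Book ID="102"><Author>Foo Bar</Author>'
    '<Title>Introduction to XSL</Title>'
    '<Date>12 June 2001</Date><ISBN>12323573</ISBN>'
    '<Publisher>ABC</Publisher></Book>'
)
missing_id = ET.fromstring(
    '<Book><Title>Introduction to XSL</Title><Author>Foo Bar</Author>'
    '<Date>12 June 2001</Date><ISBN>12323573</ISBN>'
    '<Publisher>ABC</Publisher></Book>'
)

print(valid_book(two_authors), valid_book(wrong_order), valid_book(missing_id))
```

Note that every value is still just text: nothing here, or in the DTD, can say that ISBN must be a number.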


Problems with DTDs

It's not XML syntax
• You write your XML document using one syntax and the DTD using another: inconsistent, and more work for the parsers.

Limited set of primitive datatypes
• Desire a set of datatypes compatible with those found in databases
– One of the main weaknesses of DTDs is the lack of support for data types beyond character strings (PCDATA).

Limited support for applying constraints
• Can express only occurrence constraints such as “+” (1 or more occurrences), “?” (0 or 1 occurrences), and “*” (0 or more occurrences). No facility for constraints on element content like those found in databases (value ranges, string lengths, enumerated element values, etc.)


What are Schemas?

• Schemas

– More complex than DTDs

– Specify structure

– Support for precise data type constraints

– Allows for user-defined data types (complex/simple types)

– Enhanced datatypes (unlike PCDATA in DTDs):

• Wider range of primitive data types, supporting those found in databases (string, boolean, decimal, integer, date, etc.)

• Can create your own datatypes (complexType)

– Support namespaces for extensibility


Schema Example (schema on next slide)

<Bookstore>
  <Book ID="101">
    <Title>Introduction to XML</Title>
    <Author>John Doe</Author>
    <Date>12 June 2001</Date>
    <ISBN>121232323</ISBN>
    <Publisher>XYZ</Publisher>
  </Book>
  <Book ID="102">
    <Title>Introduction to XSL</Title>
    <Author>Foo Bar</Author>
    <Date>12 June 2001</Date>
    <ISBN>12323573</ISBN>
    <Publisher>ABC</Publisher>
  </Book>
</Bookstore>


<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.books.org"
            xmlns="http://www.books.org">
  <xsd:element name="Bookstore">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="Book" minOccurs="1" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="Book">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="Title" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="Author" minOccurs="1" maxOccurs="unbounded"/>
        <xsd:element ref="Date" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="ISBN" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="Publisher" minOccurs="1" maxOccurs="1"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="Title" type="xsd:string"/>
  <xsd:element name="Author" type="xsd:string"/>
  <xsd:element name="Date" type="xsd:date"/>
  <xsd:element name="ISBN" type="xsd:integer"/>
  <xsd:element name="Publisher" type="xsd:string"/>
</xsd:schema>


XML Namespaces: Code-reuse

• Namespaces
– Identify an XML vocabulary by a URI (Uniform Resource Identifier)
– Allow reuse of XML markup
– Resolve problems with recognition and collision of tags that have the same name, which can happen if you're combining elements from multiple documents.

(see previous slide)
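A small sketch of the collision problem and how namespaces resolve it, using Python's standard `xml.etree.ElementTree`. Two vocabularies both define a `Title` element; the URI `http://www.books.org` comes from the schema slide, while `http://www.example.org/people`, the prefixes `bk` and `hr`, and the `order` wrapper element are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

DOC = """
<order xmlns:bk="http://www.books.org"
       xmlns:hr="http://www.example.org/people">
  <bk:Title>Introduction to XML</bk:Title>
  <hr:Title>Dr.</hr:Title>
</order>
"""

# Prefix-to-URI map used for lookups; the prefixes are local shorthand,
# the URIs are what actually identify each vocabulary.
NS = {"bk": "http://www.books.org", "hr": "http://www.example.org/people"}

root = ET.fromstring(DOC)
book_title = root.findtext("bk:Title", namespaces=NS)
person_title = root.findtext("hr:Title", namespaces=NS)

# ElementTree stores each tag in expanded form: {namespace-uri}localname
expanded = [child.tag for child in root]

print(book_title, "|", person_title)
print(expanded)
```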


Cool XML Application: RSS

<rss version="0.91">
  <channel>
    <title>XML.com</title>
    <link>http://www.xml.com/</link>
    <description>XML.com features a rich mix of information and services for the XML community.</description>
    <language>en-us</language>
    <item>
      <title>Normalizing XML, Part 2</title>
      <link>http://www.xml.com/pub/a/2002/12/04/normalizing.html</link>
      <description>In this second and final look at applying relational normalization techniques to W3C XML Schema data modeling, Will Provost discusses when not to normalize, the scope of uniqueness and the fourth and fifth normal forms.</description>
    </item>
    <item>
      <title>The .NET Schema Object Model</title>
      <link>http://www.xml.com/pub/a/2002/12/04/som.html</link>
      <description>Priya Lakshminarayanan describes in detail the use of the .NET Schema Object Model for programmatic manipulation of W3C XML Schemas.</description>
    </item>
  </channel>
</rss>
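A feed reader only needs a few lines to turn a feed like this into a list of headlines. The sketch below uses Python's standard `xml.etree.ElementTree` on a shortened copy of the feed (descriptions omitted for brevity); the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the feed above, without the descriptions.
FEED = """<rss version="0.91">
  <channel>
    <title>XML.com</title>
    <link>http://www.xml.com/</link>
    <item>
      <title>Normalizing XML, Part 2</title>
      <link>http://www.xml.com/pub/a/2002/12/04/normalizing.html</link>
    </item>
    <item>
      <title>The .NET Schema Object Model</title>
      <link>http://www.xml.com/pub/a/2002/12/04/som.html</link>
    </item>
  </channel>
</rss>
"""

channel = ET.fromstring(FEED).find("channel")
site = channel.findtext("title")

# Each <item> is one headline with its link.
headlines = [
    (item.findtext("title"), item.findtext("link"))
    for item in channel.findall("item")
]

print(site)
for title, link in headlines:
    print("-", title, link)
```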


??

Almost done …


Tools/Software

XML Spy
By far the most comprehensive editor. Handles XML files, DTDs, XSL files, as well as XSD (XML Schema). Unfortunately only a 30-day trial version.
http://www.xmlspy.com/download.html

XML Notepad
Microsoft XML Notepad is a simple application for building and editing small sets of XML-based data. Freeware.
http://msdn.microsoft.com/xml/notepad/download.asp

XML Pro
XML Pro is a top-notch XML editor but it doesn’t include as many features as XML Spy. Shareware.
http://www.vervet.com/demo.html

You can also validate your XML files by just opening them with IE 5.0 or above. It checks whether the XML file is well-formed, and also validates against a DTD (if one is specified in the DOCTYPE declaration).

Great links:
www.w3schools.com
http://www.cs.sjsu.edu/faculty/pearce/web/front.htm


Conclusion

• You thought HTML was easy? XML just got easier!

• Get XML certified before you graduate!
Visit: http://www.whizlabs.com/articles/xml-article.html
