Post on 24-Dec-2015


Topics

• The "bigger picture"
  – The "XML sales pitch"
  – XML/XHTML vs. SGML/HTML
  – XML in electronic publishing
  – XML and the future, web 2.0

• XML basics:
  – Building blocks: elements, attributes, …
  – Structural constraints: Well-formed XML
  – Character sets
  – Namespaces
  – Validity: DTDs and XML schemas

Week 0534 Introduction to XML 1

Why Use XML (1)

• Consider a line from a .dat file:

    2394287410|Verbatim|DataLife MF 2HD|10|3.5"|black

• or the XML fragment:

    <product barcode="2394287410">
      <manufacturer>Verbatim</manufacturer>
      <name>DataLife MF 2HD</name>
      <quantity>10</quantity>
      <size>3.5"</size>
      <color>black</color>
    </product>

• Which one is easier to interpret, more robust, and easier to use for complex structures?
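
The comparison can be tried out in a few lines. The following is a minimal sketch in Python using the standard-library xml.etree.ElementTree module; the choice of Python and all variable names are assumptions made for illustration and are not part of the slides. The delimited line is only usable if the reader already knows the column order, while the XML fragment carries its own labels.

```python
import xml.etree.ElementTree as ET

# The same product record in both formats (data taken from the slide).
dat_line = '2394287410|Verbatim|DataLife MF 2HD|10|3.5"|black'
xml_fragment = """
<product barcode="2394287410">
  <manufacturer>Verbatim</manufacturer>
  <name>DataLife MF 2HD</name>
  <quantity>10</quantity>
  <size>3.5"</size>
  <color>black</color>
</product>
"""

# Delimited text: the meaning of each field depends on its position.
fields = dat_line.split("|")
name_from_dat = fields[2]

# XML: each value is looked up by the name of its element or attribute.
product = ET.fromstring(xml_fragment)
name_from_xml = product.findtext("name")
barcode = product.get("barcode")

print(name_from_dat, name_from_xml, barcode)
```

If a new field were added in the middle of the record, the index `2` would silently point at the wrong value, whereas the lookup by element name would keep working.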

Why Use XML (2)

• Simple syntax
• Self-documenting format
• Support for hierarchical structures
• Simple debugging (for both users and machines)
• Language and platform independent
• Many different tools
• Growing library of "standard" formats

Main Types of XML Documents

• Narrative-Centric Documents:
  – Largely irregular structure, for instance a novel

• Data-Centric Documents:
  – Regular structure, for instance a telephone directory

• Hybrid Documents:
  – Typically contain highly regular parts mixed with irregular content, e.g. a product catalog

XML/XHTML vs. SGML/HTML

• Problems with SGML/HTML:
  – SGML is a complex markup language
  – HTML is only suitable for narrative documents
  – HTML became a bad mix of structure and layout
  – HTML browsers are too tolerant of markup errors

• The XML/XHTML promise:
  – XML has a simple and extensible structure
  – Suitable for both data and narrative documents
  – XHTML is for structure only; CSS is for layout
  – Enforces strict rules

XML in Electronic Publishing

• Some important XML applications:
  – Text transformation/printing: XSLT, XSL-FO, SVG, …
  – Content: GML, MathML, NewsML, DocBook, …
  – Data exchange: SOAP, AJAX, xCAL, …
  – Semantics: RDF, Dublin Core, …

Web 2.0

• The next generation of the web is about data, not documents!
  – Read the O'Reilly Web 2.0 article

XML Basics

• Core literature:
  – An Introduction to XML and Web Technologies:
    • Chapters 1-2
  – XML in a Nutshell:
    • Chapters 1-2, 4-7, 26

What XML consists of

• Elements
• Attributes
• Entities and entity references
• Text
• CDATA sections
• Processing instructions
• Comments

XML declaration

• XML documents should begin with an XML declaration that gives information about:
  – XML version
  – Encoding
  – Whether the document relies on external markup declarations, such as an external DTD (standalone)

• Example:

    <?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>

XML elements

• The basic building block of XML
• Consists of a start-tag, content and an end-tag
• Simple content:

    <title>Web page for IMT4501</title>

• Mixed content:

    <p><strong>No</strong>, you can't do <em>that</em>!</p>

• Empty element:

    <br></br>
    <br /> <!-- Short version -->
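
How a parser exposes these content types can be sketched in Python with the standard-library xml.etree.ElementTree module (an assumption for illustration; other APIs such as DOM model mixed content differently). In ElementTree, text that follows a child element is stored on that child as its `tail`.

```python
import xml.etree.ElementTree as ET

# Mixed content: text and child elements interleaved.
p = ET.fromstring("<p><strong>No</strong>, you can't do <em>that</em>!</p>")

# ElementTree stores text that follows a child element in the child's .tail.
strong, em = list(p)
pieces = [strong.text, strong.tail, em.text, em.tail]

# itertext() walks the element and yields all text in document order.
full_text = "".join(p.itertext())

# Both spellings of an empty element parse to an element without content.
long_form = ET.fromstring("<br></br>")
short_form = ET.fromstring("<br />")

print(pieces)
print(full_text)
```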

Attributes

• Extra information about an element
• Example:

    <img height="240" width="320" src="logo.gif" />

• Values are enclosed in matching pairs of double or single quotes:

    <a href="http://www.hig.no/">HiG</a>
    <a href='http://www.oa.no/'>Oppland Arbeiderblad</a>

• But not in mismatched pairs:

    <a href="http://www.vg.no/'>VG</a>
    <a href='http://www.cnn.no/">CNN</a>

Well-formed XML

• One root element
• Correct nesting of elements
• A matching end-tag for every start-tag
• Case-sensitive names
• Attribute values in quotes
• An attribute can't appear more than once in the same element
• No comments inside tags
• No unescaped < or & inside text content
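
Most of these rules can be checked by simply handing a document to an XML parser. The sketch below assumes Python and its standard-library xml.etree.ElementTree module; the helper `is_well_formed` and the sample strings are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True if the parser accepts the document as well-formed XML."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# One well-formed sample, then one violation per rule from the list.
samples = {
    "ok": "<a><b>text</b></a>",
    "two roots": "<a/><b/>",
    "bad nesting": "<a><b></a></b>",
    "missing end-tag": "<a><b></a>",
    "case mismatch": "<a></A>",
    "unquoted attribute": "<a x=1/>",
    "duplicate attribute": '<a x="1" x="2"/>',
    "unescaped ampersand": "<a>fish & chips</a>",
}
results = {label: is_well_formed(doc) for label, doc in samples.items()}

for label, accepted in results.items():
    print(f"{label}: {'well-formed' if accepted else 'rejected'}")
```

A conforming parser stops at the first well-formedness error, which is the strictness that tolerant HTML browsers lack.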

Sample XML structure

    <?xml version="1.0" standalone="yes"?>
    <higDB>
      <department abbr="IMT" ... >
        <subject code="IMT4131" ... />
        <subject code="REA4001" ... />
        <employee empNo="eNo287" ... >
          <subject subjRef="IMT4131" />
        </employee>
        <employee empNo="eNo293" ... >
          <subject subjRef="REA4001" />
        </employee>
        <employee empNo="eNo307" ... >
          <subject subjRef="IMT4131" />
        </employee>
      </department>
      <!-- ... -->
    </higDB>
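
One way to work with such a structure is to follow the subjRef references back to the subject codes. This is a hedged sketch in Python with xml.etree.ElementTree: the elided attributes ("...") from the slide are simply left out, and the dictionary name `teachers` is an assumption about what the references mean.

```python
import xml.etree.ElementTree as ET

# The slide's sample, with the elided attributes ("...") left out.
sample = """<?xml version="1.0" standalone="yes"?>
<higDB>
  <department abbr="IMT">
    <subject code="IMT4131" />
    <subject code="REA4001" />
    <employee empNo="eNo287"><subject subjRef="IMT4131" /></employee>
    <employee empNo="eNo293"><subject subjRef="REA4001" /></employee>
    <employee empNo="eNo307"><subject subjRef="IMT4131" /></employee>
  </department>
</higDB>
"""

root = ET.fromstring(sample)

# Map each subject code to the employees that reference it.
teachers = {}
for department in root.findall("department"):
    for subject in department.findall("subject"):
        teachers[subject.get("code")] = []
    for employee in department.findall("employee"):
        for ref in employee.findall("subject"):
            teachers[ref.get("subjRef")].append(employee.get("empNo"))

print(teachers)
```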

Tree for the example

XML names

• Have to start with '_' or a letter
• Followed by digits, letters, '_', '-' or '.'
• Names beginning with "xml" (regardless of capitalization) are reserved
• Acceptable names:

    <språk>
    <fugl-eller-fisk>
    <_level1.melding>

• Unacceptable names:

    <tittel$språk>
    <fugl eller fisk>
    <2erStilling>
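
The naming rules can be approximated with a regular expression. This is a simplified sketch in Python, not the exact production from the XML specification: `\w` is only an approximation of XML's definition of letters and digits, the colon (used for namespace prefixes) is left out, and `is_acceptable_name` is a hypothetical helper.

```python
import re

# Simplified rule from the slide: a letter or '_' first, then letters,
# digits, '_', '-' or '.'. [^\W\d] means "a word character that is not a digit".
NAME_PATTERN = re.compile(r"[^\W\d][\w.\-]*")


def is_acceptable_name(name: str) -> bool:
    """Check a name against the simplified rules listed above."""
    if name.lower().startswith("xml"):
        return False  # reserved, regardless of capitalization
    return NAME_PATTERN.fullmatch(name) is not None


acceptable = ["språk", "fugl-eller-fisk", "_level1.melding"]
unacceptable = ["tittel$språk", "fugl eller fisk", "2erStilling", "XmlData"]

for name in acceptable + unacceptable:
    print(name, is_acceptable_name(name))
```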

Entities and entity references

• Five predefined entity references in XML
• Other entity references can be defined in a DTD (internal or external)

XHTML Entities

    Å  &Aring;   (unicode: &#197;)
    Æ  &AElig;   (unicode: &#198;)
    Ø  &Oslash;  (unicode: &#216;)
    å  &aring;   (unicode: &#229;)
    æ  &aelig;   (unicode: &#230;)
    ø  &oslash;  (unicode: &#248;)

Predefined XML Entities

    <  &lt;    (less than)
    >  &gt;    (greater than)
    &  &amp;   (ampersand)
    "  &quot;  (quotation mark)
    '  &apos;  (apostrophe)

Unicode Entities

    ©  &#169;   (xhtml: &copy;)
    α  &#945;   (xhtml: &alpha;)
    €  &#8364;  (xhtml: &euro;)
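
The difference between the three groups shows up as soon as a document is parsed without a DTD. The sketch below assumes Python's standard-library xml.etree.ElementTree and xml.sax.saxutils; the element name `t` and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Predefined entities and numeric character references need no DTD:
# the parser replaces them with the characters they stand for.
element = ET.fromstring(
    "<t>&lt;b&gt; &amp; &quot;q&quot; &apos;a&apos; &#169; &#8364; &#229;</t>"
)
parsed_text = element.text

# An XHTML entity such as &Aring; is unknown unless a DTD declares it.
try:
    ET.fromstring("<t>&Aring;</t>")
    aring_known = True
except ET.ParseError:
    aring_known = False

# Going the other way: escape() replaces &, < and > before text is written out.
escaped = escape("fish & chips < 5")

print(parsed_text)
print(aring_known)
print(escaped)
```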

Text and character parsing

• Text is basically PCDATA (Parsed Character Data):
  – The parser replaces entity references with their values

• A CDATA section can be used where we do not want the parser to interpret the character data as markup:

    <logiskUttrykk>
      <![CDATA[(len > 0) && (len < 256)]]>
    </logiskUttrykk>
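
A CDATA section and escaped PCDATA are two spellings of the same text. A minimal sketch, assuming Python's xml.etree.ElementTree (the whitespace around the CDATA section is dropped here so the two results can be compared directly):

```python
import xml.etree.ElementTree as ET

# Inside CDATA, < and & are taken literally.
with_cdata = ET.fromstring(
    "<logiskUttrykk><![CDATA[(len > 0) && (len < 256)]]></logiskUttrykk>"
)

# Without CDATA, the same characters have to be written as entity references.
with_entities = ET.fromstring(
    "<logiskUttrykk>(len &gt; 0) &amp;&amp; (len &lt; 256)</logiskUttrykk>"
)

print(with_cdata.text)
print(with_entities.text)
```

The application receives the same string in both cases; CDATA only spares the author the escaping.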

Comments

• Enclosed by <!-- and -->
• Must not appear inside a tag
• A double hyphen -- cannot appear anywhere inside the comment
• Are meant for human readers, not for the application
• Correct use:

    <FotoDB>
      <!-- Example of image database dump -->
      <Image_series> ...

• Wrong use:

    <FotoDB <!-- Example of image database dump -->>
    <!-- Not finished -- look at it later -->

Processing instructions

• Enclosed by <? and ?>
• The target follows right after <?
• Can be used to send information to the application
• Comments were used for this before, but XML parsers can choose not to pass comments on to the application
• Example:

    <?php
      $logged_in = $_SESSION["logged_in"];
      if (!$logged_in) {
        echo "You have to <a href='login.php'>log in</a> first";
      }
    ?>
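
How an application sees a processing instruction can be sketched with Python's standard-library xml.dom.minidom. The example uses the xml-stylesheet target rather than the PHP example above; the file name style.css and the element name page are placeholders.

```python
from xml.dom import minidom

# A document whose first node is a processing instruction.
doc = minidom.parseString(
    '<?xml-stylesheet type="text/css" href="style.css"?><page/>'
)

pi = doc.firstChild
is_pi = pi.nodeType == pi.PROCESSING_INSTRUCTION_NODE

# The target names the application the instruction is meant for;
# everything after it is handed over as an uninterpreted string.
print(is_pi, pi.target, pi.data)
```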

Exercise

• Complete the ZVON XML tutorial:
  http://www.zvon.org/xxl/XMLTutorial/General/contents.html

Character Sets

• Historically, character encoding has been a challenge:
  – The same code has been used for different characters on different systems

• Now, there are standards:
  – ISO-8859-1 (ISO Latin), "default" on the web
  – Unicode defines a larger character set, used by XML by default:
    • UTF-8, efficient for western languages
    • UTF-16
    • UTF-32
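
Both points can be illustrated with Python's built-in codecs. This is a small sketch: the sample word is a hypothetical choice that happens to contain å, æ and ø, and code page 437 stands in for "a different system".

```python
text = "Blåbærsyltetøy"  # hypothetical sample: 14 characters, 3 of them non-ASCII

# Number of bytes needed for the same text in each encoding.
sizes = {
    encoding: len(text.encode(encoding))
    for encoding in ("iso-8859-1", "utf-8", "utf-16-le", "utf-32-le")
}

# The historical problem: one byte value, different characters.
same_byte = b"\xe5"
as_latin1 = same_byte.decode("iso-8859-1")  # the letter å
as_cp437 = same_byte.decode("cp437")        # some other character

print(sizes)
print(as_latin1, as_cp437)
```

UTF-8 spends one byte on each ASCII character and two on each of å, æ and ø, which is why it is compact for western-language text.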

Namespaces – why?

• Distinguish between elements and attributes from different XML vocabularies

• Namespaces allow two or more XML vocabularies to be used in the same document

• Group all related elements and attributes from a single XML application, making them easier for software to recognize

Namespaces – how?

• A prefix is bound to a vocabulary (identified by a URI) with an xmlns: attribute:

    <Description xmlns:dc="http://purl.org/dc/">

• The prefix is in scope within the subtree rooted at the element that declares it

• Elements in a vocabulary are identified by the prefix:

    <Description xmlns:dc="http://purl.org/dc/">
      <dc:title>XML in a Nutshell</dc:title>
      <dc:creator>Elliotte Rusty Harold</dc:creator>
      <dc:creator>W. Scott Means</dc:creator>
      <dc:date>2002</dc:date>
    </Description>
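
A namespace-aware parser replaces the prefix with the URI it is bound to. The sketch below assumes Python's xml.etree.ElementTree, which writes expanded names as {URI}local-name; the query prefix `dublin` is deliberately different from `dc` and is an arbitrary choice.

```python
import xml.etree.ElementTree as ET

description = ET.fromstring("""
<Description xmlns:dc="http://purl.org/dc/">
  <dc:title>XML in a Nutshell</dc:title>
  <dc:creator>Elliotte Rusty Harold</dc:creator>
  <dc:creator>W. Scott Means</dc:creator>
  <dc:date>2002</dc:date>
</Description>
""")

# ElementTree expands each prefixed name to {namespace-URI}local-name.
title = description.find("{http://purl.org/dc/}title")

# The prefix used in a query is a local choice; only the URI has to match.
ns = {"dublin": "http://purl.org/dc/"}
creators = [c.text for c in description.findall("dublin:creator", ns)]

print(title.tag, title.text)
print(creators)
```

The unprefixed Description element is in no namespace here, so its tag stays a plain name.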

More about the prefix

• You choose the name of the prefix, the URI identifies the vocabulary

• The prefix has to be a legal XML name

Namespaces – what is it really?

• A vocabulary identified by a fixed Uniform Resource Identifier:
  – http://...
  – ftp://...
  – …

• The URI has to be unique to make the vocabulary unique

• The URI does not need to point at any defined document

Example scope

Default namespace

• A default namespace can be used where all non-prefixed elements belong to a fixed vocabulary

• Example:

    <RDF xmlns="http://www.w3.org/TC/REC-rdf-syntax#">
      <Description xmlns:dc="http://purl.org/dc/">
        <dc:title>XML in a Nutshell</dc:title>
        <dc:creator>Elliotte Rusty Harold</dc:creator>
        <dc:creator>W. Scott Means</dc:creator>
        <dc:date>2002</dc:date>
      </Description>
    </RDF>
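
The effect of the default namespace can be observed by listing the expanded element names. A minimal sketch, assuming Python's xml.etree.ElementTree and a shortened copy of the example; the namespace URIs are copied from the slide as-is.

```python
import xml.etree.ElementTree as ET

rdf = ET.fromstring("""
<RDF xmlns="http://www.w3.org/TC/REC-rdf-syntax#">
  <Description xmlns:dc="http://purl.org/dc/">
    <dc:title>XML in a Nutshell</dc:title>
  </Description>
</RDF>
""")

# iter() visits the root and all descendants in document order.
tags = [element.tag for element in rdf.iter()]

for tag in tags:
    print(tag)
```

Both unprefixed elements, RDF and Description, end up in the default namespace, while dc:title keeps the namespace bound to its prefix.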

Exercise

• Complete the ZVON XML tutorial:
  http://www.zvon.org/xxl/NamespaceTutorial/Output/contents.html
