

XHTML

Steven Pemberton

CWI, Amsterdam

Chair, W3C HTML Working Group

Overview

History

Philosophy

XML and related technologies

XHTML 1.0

Modularisation

XHTML Basic

XHTML 1.1

The Future

HTML 1

The original HTML was designed in the early 1990s for scientific reports

Each document was a single resource (not even <IMG>)

(This explains much about HTTP by the way)

(HTML 1)

It is amazing how much we have been able to do with a language with such beginnings

It was described using SGML

HTML as an SGML Application

SGML: an international standard in 1986

It is a meta-language that describes data formats, using DTDs (Document Type Definitions)

Describes structure, not presentation

<H1>HTML as SGML Application</H1>

Example of a DTD fragment

<!ELEMENT table (caption?, (col*|colgroup*), thead?, tfoot?, (tbody+|tr+))>
<!ELEMENT caption %Inline;>
<!ELEMENT thead (tr)+>
...

Attributes

<!ATTLIST TABLE
  %attrs;                        -- %coreattrs, %i18n, %events --
  summary    %Text;     #IMPLIED
  width      %Length;   #IMPLIED
  border     %Pixels;   #IMPLIED
  …
  >

Entities

<!ENTITY % fontstyle "TT | I | B | BIG | SMALL">
<!ENTITY % inline "#PCDATA | %fontstyle; | %phrase; | %special; | %formctrl;">
<!ENTITY % Length "CDATA" -- nn for pixels or nn% for percentage length -->

Problems with SGML

Arcane syntax

Very difficult to implement fully

No support for types

Changes to HTML

Netscape and Microsoft started adding to HTML: mostly presentation-oriented tags (like <BLINK>, <CENTER>), and frames

The World Wide Web Consortium (W3C) started an effort to:

Keep HTML pure

Do presentation via style sheets

Separating content and presentation

HTML was designed as a data-structuring language, but the later changes undermined this.

Separating content from presentation has distinct advantages

For the author

Easier to write your documents

Easier to change your documents

Easy to change the look of your documents

Access to professional designs

Your documents are smaller

Visible on more devices

Visible to more people

For the webmaster

Separation of concerns

Simpler HTML, less training

Cheaper to produce, easier to manage

Easy to change house style

Reach more people

Search engines find your stuff easier

Visible on more devices

For the reader

Faster download (one of the top 4 reasons for liking a site)

Easier to find information

You can actually read the information if you are sight-impaired

Information more accessible

You can use more devices

For the implementor

Improves the implementation (separation of concerns)

Can produce smaller browsers

Changes to HTML (2)

Another change that Netscape made, with insufficient thought, was Frames

Frames create significant problems with web pages

The problems with frames

Can’t bookmark framesets

[Back] does odd things

[Page up] and [page down] work oddly

[Reload] often doesn’t work right

Security is compromised

Nested frames are hard to deal with (how do you get out?)

What frames can do

Search and show interfaces

Keeping script variables in a hidden frame

Style languages

W3C’s first action was to start an activity on Style Sheets (Nov 1995)

This produced CSS1 initially (Dec 1996), then CSS2 (May 1998) (CSS3 is in preparation)

Later produced XSL, an XML-based language, as complementary to CSS

CSS

CSS is a separate language from HTML that allows you to specify how an HTML document, or set of documents, should look

Separates content from presentation

HTML can be a structure language again

Examples of CSS

h1 { font-weight: bold; font-size: 2em }
h2 { font-weight: bold; font-size: 1.5em }
em { background-color: yellow }
body { margin-left: 20% }

Using CSS

Use the following at the top of an XML document:

<?xml-stylesheet type="text/css" href="mystyle.css"?>

Or this in the <head> of an HTML document:

<link rel="stylesheet" type="text/css" href="mystyle.css" />

Advantages of CSS

Makes HTML easier to write (and read)

You can define a house style

Compatible: you can still see the content on non-CSS browsers

Pages are much smaller

Accessible to sight-impaired

...

By the way...

Check your logs: more than 95% of people browsing now use a CSS-enabled browser

The current generation of browsers (IE 5, NS 6, Opera 4) have excellent support for CSS.

You never need to use the <FONT> and <FONTFACE> elements again!

Documents

As mentioned, HTML was designed for just one sort of document (scientific reports), but is now being used for all sorts of different documents

You could use SGML to define other sorts of document, but SGML is notoriously hard to fully implement

Enter XML

Enter XML

XML is a W3C effort to simplify SGML

It is a meta-language: a language for defining languages

It is a subset of SGML

One of the aims is to allow everyone to invent their own tags

DTD is optional: a DTD can be inferred from a document

Consequences

The requirement of being able to infer a DTD from a document has an effect on the languages you can define:

Closing tags are now required: <LI>....</LI> <P>....</P>

Empty tags are marked specially: <IMG SRC="pic.gif"/> <BR/> <HR/> (or <HR></HR> etc)
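The two rules above can be checked mechanically. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` parser as a stand-in for any XML parser; the helper name `is_well_formed` and the sample strings are illustrative and not part of the original slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the string parses as a well-formed XML document."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# SGML-style HTML: the closing </LI> tags are omitted.
html_style = "<UL><LI>one<LI>two</UL>"
# XML style: every element is explicitly closed.
xml_style = "<UL><LI>one</LI><LI>two</LI></UL>"
# An empty element that is not marked as empty.
empty_unmarked = '<P><IMG SRC="pic.gif"></P>'
# Empty elements marked with the trailing slash.
empty_marked = '<P><IMG SRC="pic.gif"/><BR/></P>'

results = {
    "html_style": is_well_formed(html_style),
    "xml_style": is_well_formed(xml_style),
    "empty_unmarked": is_well_formed(empty_unmarked),
    "empty_marked": is_well_formed(empty_marked),
}
print(results)
```

Only the two XML-style strings parse; an XML parser has no DTD to tell it that `LI` may be left open or that `IMG` is always empty, so it has to reject the other two.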

Consequences 2

CDATA sections must be marked as such (only necessary if they contain "<", "&" etc.):

<SCRIPT><![CDATA[
  ... script content ...
]]></SCRIPT>
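As a rough illustration of why the marking matters, the sketch below feeds the same script text to Python's standard `xml.etree.ElementTree` parser with and without a CDATA section. The script content itself is made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical script content containing "<" and "&".
marked = "<SCRIPT><![CDATA[ if (a < b && c) run(); ]]></SCRIPT>"
unmarked = "<SCRIPT> if (a < b && c) run(); </SCRIPT>"

# Inside a CDATA section the characters are passed through as plain text.
script_text = ET.fromstring(marked).text

# Without the CDATA section, "<" is read as the start of a tag.
try:
    ET.fromstring(unmarked)
    unmarked_parses = True
except ET.ParseError:
    unmarked_parses = False

print(repr(script_text), unmarked_parses)
```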

By the way: <P> is not like <BR>

Not Like This

<H1>XML</H1>
An underlying problem with HTML is that ...
<P>You could use SGML to define ...

But Like This

<H1>XML</H1>
<P>An underlying problem with HTML is that … </P>
<P>You could use SGML to define ...</P>

Consequence of XML

Anyone can now design their own (Web-delivered) languages

CSS makes them viewable

<address>
  <name>Steven Pemberton</name>
  <company>CWI</company>
  <street>Kruislaan 413</street>
  <postcode>1098 SJ</postcode>
  <city>Amsterdam</city>
  <speaker/>
</address>

So do we still need HTML?

Workshop in May 1998

XML is still a meta-language

There is still a perceived need for a base-line mark-up

HTML has some useful semantics, both implied and explicit (search engines gladly use it, for instance)

HTML as XML application

Clean up (get rid of historical flotsam)

Modularise: split into separate parts

  Allows other XML applications to use parts

  Allows special purpose devices to use subset

Add any required new functionality (forms, better event handling, Ruby)

The HTML Working group

International membership, around 20 members

Many major players (IBM, Microsoft, Netscape, etc)

Meets weekly by phone, quarterly face-to-face

Group experience

There was more to be worked out than we anticipated

XHTML is the first major application of XML, so the world’s eyes are on us

XML still needs the wrinkles ironed out

Philosophy of XHTML

Transition from ‘old world’ to XML

Clean up the language

Return to structure only

Use generic XML as much as possible

Modularise

Address wider needs (International, Accessibility)

Add new functionality

Plan of action

HTML 4.01: corrected version

XHTML 1.0: transitional version of HTML 4.01 in 3 flavours

Modularisation: agreement on split and methodology

XHTML Basic: Small devices

XHTML 1.1: clean version of 1.0 strict

(plan of action)

Events: accessible and device-independent

Ruby: needed Asian markup

Forms: more control

XHTML 2.0: Putting it all together

Differences HTML:XHTML

Because of the difference between SGML and XML, there are some necessary differences, for instance:

Use lower case: <p> not <P>

Attributes are always quoted: <th colspan="2">

Anchors use the id attribute, not name (and not just on <a>, by the way): <a id="index"> <p id="top">

Example XHTML 1.0

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head><title>Virtual Library</title></head>
  <body>
    <p>Moved to <a href="http://vlib.org/">vlib.org</a>.</p>
  </body>
</html>
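Because an XHTML 1.0 document is XML, any generic XML parser can read it. The sketch below is one way to see that, assuming Python's standard `xml.etree.ElementTree`; the DOCTYPE is left out of the sample string because this parser does not validate, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
XML_NS = "http://www.w3.org/XML/1998/namespace"  # bound to the xml: prefix

document = """<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head><title>Virtual Library</title></head>
  <body>
    <p>Moved to <a href="http://vlib.org/">vlib.org</a>.</p>
  </body>
</html>"""

root = ET.fromstring(document)

# ElementTree reports element names as "{namespace-uri}local-name".
root_tag = root.tag
title = root.find(f"{{{XHTML_NS}}}head/{{{XHTML_NS}}}title").text
link_target = root.find(f".//{{{XHTML_NS}}}a").get("href")
language = root.get(f"{{{XML_NS}}}lang")

print(root_tag, title, link_target, language)
```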

Namespaces

Namespaces have been added to XML to allow you to mix fragments from different languages (e.g. HTML + Maths)

In the same way that object-oriented languages allow you to identify which function you are using, namespaces allow you to identify which tags you are using.

Example of nesting

<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>A Math Example</title></head>
  <body>
    <p>The following is MathML markup:</p>
    <math xmlns="http://www.w3.org/TR/REC-MathML">
      <apply><log/><logbase><cn> 3 </cn></logbase>
        <ci> x </ci>
      </apply>
    </math>
  </body>
</html>

Example of colonising

<math xmlns="http://www.w3.org/TR/REC-MathML"
      xmlns:html="http://www.w3.org/1999/xhtml">
  <apply><log/><logbase><cn> 3 </cn></logbase>
    <ci> x </ci>
  </apply>
  <html:p>This is a paragraph</html:p>
</math>
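Both styles, nesting with a default namespace and "colonising" with a prefix, resolve to the same thing: every element ends up with a namespace URI plus a local name. A minimal sketch, assuming Python's standard `xml.etree.ElementTree`; the helper `namespaces_used` is hypothetical and assumes every element in the input is in some namespace.

```python
import xml.etree.ElementTree as ET

nested = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p>The following is MathML markup:</p>
    <math xmlns="http://www.w3.org/TR/REC-MathML">
      <apply><log/><logbase><cn> 3 </cn></logbase><ci> x </ci></apply>
    </math>
  </body>
</html>"""

prefixed = """<math xmlns="http://www.w3.org/TR/REC-MathML"
      xmlns:html="http://www.w3.org/1999/xhtml">
  <apply><log/><logbase><cn> 3 </cn></logbase><ci> x </ci></apply>
  <html:p>This is a paragraph</html:p>
</math>"""


def namespaces_used(markup):
    """Map each namespace URI to the sorted local names of its elements."""
    found = {}
    for element in ET.fromstring(markup).iter():
        # Tags look like "{uri}local"; split them at the closing brace.
        uri, _, local = element.tag[1:].partition("}")
        found.setdefault(uri, set()).add(local)
    return {uri: sorted(names) for uri, names in found.items()}


nested_names = namespaces_used(nested)
prefixed_names = namespaces_used(prefixed)
print(nested_names)
print(prefixed_names)
```

The prefix (`html:`) is only a local abbreviation; what identifies the paragraph as an XHTML paragraph is the URI it is bound to.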

Namespaced attributes

Attributes normally come from the element itself:

<html:a href="next.xml">

But you may also use ‘global’ attributes from a namespace:

<pointer html:href="x.xml">
<music style="classical" html:style="color: red">Beethoven’s 5th</music>
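To a parser, the element's own `style` and the namespaced `html:style` are two distinct attributes. A small sketch, assuming Python's standard `xml.etree.ElementTree`; the `xmlns:html` declaration is added here as an assumption, since the fragment on the slide omits it and the prefix must be declared for the document to parse.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# The xmlns:html declaration is an assumption added for this example.
markup = """<music xmlns:html="http://www.w3.org/1999/xhtml"
       style="classical" html:style="color: red">Beethoven's 5th</music>"""

element = ET.fromstring(markup)

own_style = element.get("style")                    # attribute of <music> itself
html_style = element.get(f"{{{XHTML_NS}}}style")    # 'global' attribute from XHTML

print(own_style, "|", html_style)
```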

XML ‘namespace’

XML also has its own pseudo-namespace for reserved attributes:<para xml:lang="en">

Using ‘generic’ XML

Presentation: use CSS

Links: use Xlink or Schemas

Forms: use CSS?

Images etc.: use Xlink or Schemas

(Natural) language of elements: use xml:lang attribute

Xlink?

HTML has several ‘built-in’ hyperlinks: <a>, <img>, <object>, <link>, etc.

Since XML allows you to define your own elements, a browser doesn’t know which are links

Xlink was started to solve this problem.

Xlink

Xlink started as a method of describing which attributes of an element were a link

It later changed into a language of links, so it could no longer be used to describe XHTML

The current plan is now to introduce types into Schemas to describe links

Example of Xlink

<crossReference xmlns:xlink="http://www.w3.org/1999/xlink"
    xlink:type="simple"
    xlink:href="students.xml"
    xlink:role="studentlist"
    xlink:title="Student List"
    xlink:show="new"
    xlink:actuate="onRequest">Current List of Students</crossReference>
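The point of the markup above is that software can find the link without knowing anything about `crossReference`. A rough sketch of that idea, assuming Python's standard `xml.etree.ElementTree`; the function `find_simple_links` is hypothetical and only looks at simple links.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"

markup = """<crossReference xmlns:xlink="http://www.w3.org/1999/xlink"
    xlink:type="simple"
    xlink:href="students.xml"
    xlink:role="studentlist"
    xlink:title="Student List"
    xlink:show="new"
    xlink:actuate="onRequest">Current List of Students</crossReference>"""


def find_simple_links(root):
    """Return (href, title, text) for every element marked as a simple link."""
    links = []
    for element in root.iter():
        # Only the xlink:* attributes are inspected, never the element name.
        if element.get(f"{{{XLINK_NS}}}type") == "simple":
            links.append((
                element.get(f"{{{XLINK_NS}}}href"),
                element.get(f"{{{XLINK_NS}}}title"),
                element.text,
            ))
    return links


links = find_simple_links(ET.fromstring(markup))
print(links)
```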

Schemas

Schemas are a new technology to replace much of DTDs.

Schemas are expressed in XML

They have support for data types

Much easier to parse and implement than DTDs

Schemas: but

They don’t support the definition of entities (&eacute;)

Not easy to read (or write)

Schema fragment

<elementType name='table'>
  <refines>
    <archetypeRef name='common'/>
    <archetypeRef name='simpleBlockDisplay'/>
  </refines>

more>>>

(schema fragment)

  <sequence>
    <elementTypeRef name='caption' minOccur='0' maxOccur='1'/>
    <choice>
      <elementTypeRef name='col' minOccur='0' maxOccur='*'/>
      <elementTypeRef name='colgroup' minOccur='0' maxOccur='*'/>
    </choice>

more >>>

(schema fragment)

    <choice>
      <sequence>
        <elementTypeRef name='thead' minOccur='0' maxOccur='1'/>
        <elementTypeRef name='tfoot' minOccur='0' maxOccur='1'/>
        <elementTypeRef name='tbody' minOccur='1' maxOccur='*'/>
      </sequence>
      <elementTypeRef name='tr' minOccur='1' maxOccur='*'/>
    </choice>
  </sequence>
</elementType>

(equivalent DTD)

<!ELEMENT table (caption?, (col*|colgroup*), thead?, tfoot?, (tbody+|tr+))>
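A DTD content model like the one above is essentially a regular expression over child element names. The sketch below makes that concrete by translating the `table` model by hand into a Python regular expression; this is an illustration of the idea, not how a validating parser is actually built, and the names `TABLE_MODEL` and `table_is_valid` are made up for the example.

```python
import re
import xml.etree.ElementTree as ET

# (caption?, (col*|colgroup*), thead?, tfoot?, (tbody+|tr+))
# Each child name is followed by a space so that "col" cannot match "colgroup".
TABLE_MODEL = re.compile(
    r"(caption )?((col )*|(colgroup )*)(thead )?(tfoot )?((tbody )+|(tr )+)"
)


def table_is_valid(markup):
    """Check the direct children of <table> against the content model."""
    table = ET.fromstring(markup)
    children = "".join(child.tag + " " for child in table)
    return TABLE_MODEL.fullmatch(children) is not None


checks = {
    "caption_and_rows": table_is_valid("<table><caption/><tr/><tr/></table>"),
    "head_foot_body": table_is_valid("<table><thead/><tfoot/><tbody/></table>"),
    "col_and_colgroup_mixed": table_is_valid("<table><col/><colgroup/><tr/></table>"),
    "no_rows": table_is_valid("<table><caption/></table>"),
    "tbody_and_tr_mixed": table_is_valid("<table><tbody/><tr/></table>"),
}
print(checks)
```

The check only covers the direct children of `table`; the child elements are left empty here for brevity, so their own content models (for example `thead (tr)+`) are not tested.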

XHTML 1.0

XHTML 1.0 is an XML-ised version of HTML 4.01

Just like HTML 4.01, there are 3 versions: ‘strict’, ‘transitional’ (also known as ‘loose’), and ‘frameset’

Transitional version

XHTML 1.0 has been carefully designed to make use of ‘quirks’ in existing HTML browsers

Use of a small number of guidelines allows XHTML to be served to HTML user agents as well as XML user agents

Examples of Guidelines

Use space before / of empty elements: <br /> <hr /> <img src="foo.gif" />

Don’t use <hr></hr> form

Use name= and id= on <a>: <a name="index" id="index"> … </a>

Serving XHTML 1.0

An XHTML 1.0 document that follows the guidelines can be served up either as HTML, or as XML

But beware: CSS has slightly different rules for HTML and XML

Similarly, the DOM has differences for HTML and XML

Modularisation

XHTML has been divided into a number of modules.

A module is a collection of elements and/or attributes that can be used as building blocks to build a DTD.

(modularisation)

A language can be built by using just XHTML modules, or adding your own

We had originally defined Modularisation just for our own use, but it has turned out useful for other groups as well

XHTML modules

Structure: html, head, title, body

Text: abbr, acronym, address, blockquote, br, cite, code, dfn, div, em, h1, h2, h3, h4, h5, h6, kbd, p, pre, q, samp, span, strong, var

Hypertext: a

List: ol, ul, dl, li, dt, dd

(modules)

Applet (deprecated): applet, param

Presentation: b, i, hr, big, small, sub, sup, tt

Edit: del, ins

Bi-directional Text: bdo

(modules)

Basic Forms: simple forms

Forms: full forms

Basic Tables: simple tables

Tables: full tables

(modules)

Image: img

Client-side Image Map: map, +

Server-side Image Map: change to img

Object: object, param

Frames

Target: attribute

Iframe

(modules)

Intrinsic Events: adds events attributes

Metainformation: meta

Scripting: script

Stylesheet: style

Style Attribute

Link: link

(modules)

Base: base

Name Identification: name attribute

Legacy: basefont, center, font, s, strike, u, plus loads of attributes (eg align)

Ruby: Asian markup

Note on modules

Note that some modules consist of a single element, or just add some attributes to existing elements

Not all modules are independent: if you use some modules, they bring other modules with them, or change other modules

Future modules are planned (eg extended forms, events)

The XHTML family

To still be called an XHTML language you must use Structure, Hypertext, Basic Text, and List modules (you may define your own Structure module)

Example integration languages

SMIL is planning a module to integrate SMIL and HTML

Likewise for MathML

Creating a DTD

It is not expected that creating XHTML-based languages will be a daily activity

Not the place to describe the method here: it depends on understanding DTDs.

The Modularisation document has extensive examples

Future versions will also use Schemas (we hope…)

XHTML Basic

XHTML Basic is the first XHTML family-member to be defined using Modularisation

It is designed for small devices, typically mobile telephones

XHTML Basic Modules

Structure Module* body, head, html, title

Text Module* abbr, acronym, address, blockquote, br, cite, code, dfn, div, em, h1, h2, h3, h4, h5, h6, kbd, p, pre, q, samp, span, strong, var

(XHTML Basic Modules)

Hypertext Module* a

List Module* dl, dt, dd, ol, ul, li

Basic Forms Module form, input, label, select, option, textarea

(XHTML Basic Modules)

Basic Tables Module caption, table, td, th, tr

Image Module img

Object Module object, param

(XHTML Basic Modules)

Metainformation Module meta

Link Module link

Base Module base

XHTML Basic usage

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML Basic 1.0//EN"
    "http://www.w3.org/TR/xhtml-basic/xhtml-basic10.dtd">

XHTML 1.1

XHTML 1.1 is the second family member to be defined using Modularisation

Its main aim is to present a cleaned-up, non-transitional version of XHTML 1.0 strict (no frames)

It also adds Ruby markup

Otherwise: no new functionality

XHTML 1.1 Modules

Structure, Text, Hypertext, List, Object, Presentation, Edit, Bidirectional Text, Forms, Tables, Image, Client-side Image Map, Server-side Image Map, Intrinsic Events, Metainformation, Scripting, Stylesheet, Style Attribute (deprecated), Link, Base, Ruby.

Example XHTML 1.1

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
    "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head>
    <title>Virtual Library</title>
  </head>
  <body>
    <p>Moved to <a href="http://vlib.org/">vlib.org</a>.</p>
  </body>
</html>

Ruby

Example Ruby markup

<ruby>
  <rb>WWW</rb>
  <rp>(</rp><rt>World Wide Web</rt><rp>)</rp>
</ruby>

(Use CSS to describe presentation)

XHTML 2.0

XHTML 2.0 is still in preparation

New forms

New events

More accessibility

Forms

Being produced by a separate group

Consists of three parts:

  data model

  instances

  user interface

Will allow you to:

  save and restore forms

  download multi-page forms

(Forms)

Will include much more client-side checking

Form data will be sent to the server as XML

Separates content from presentation (e.g. a radio button and a select box both allow you to select one from many, and you may want to use different choices on different devices)

Events

Current events are almost all in terms of mouse: onclick, onmouseover, onfocus, etc.

Future event model will be device independent, and allow you to define your own new events

Uses the DOM event model

The DOM

Document Object Model: how you access a document via scripting

Currently only an XML DOM

An XHTML DOM is being investigated
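For a feel of what access "via scripting" looks like through the generic XML DOM, here is a minimal sketch using Python's standard `xml.dom.minidom`, which implements the W3C DOM core interfaces. The sample document reuses the Virtual Library page from earlier; the replacement URL `http://example.org/` is a placeholder.

```python
from xml.dom import minidom

document = minidom.parseString(
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Virtual Library</title></head>"
    '<body><p id="top">Moved to <a href="http://vlib.org/">vlib.org</a>.</p></body>'
    "</html>"
)

# Locate the link through the DOM and read one of its attributes.
anchor = document.getElementsByTagName("a")[0]
old_href = anchor.getAttribute("href")

# Change the document in place: new target, new link text.
anchor.setAttribute("href", "http://example.org/")  # placeholder URL
anchor.replaceChild(document.createTextNode("example.org"), anchor.firstChild)

print(old_href, "->", anchor.getAttribute("href"))
print(anchor.toxml())
```

The same method names (`getElementsByTagName`, `getAttribute`, `setAttribute`, `createTextNode`, `replaceChild`) are what a script in a browser would call, which is the sense in which the DOM is language-independent.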

Accessibility and Internationalisation

W3C has an accessibility group that checks that new recommendations address people with accessibility needs

There is also an internationalisation group that does the same for cultural issues (which produced <ruby>)

Accessibility problems

A sighted person can work out the structure from the visual presentation

A non-sighted person cannot: the structure must be present in the markup

That is why new features were added to forms and tables in HTML 4, like <caption>

Structure

Text would also benefit from such a treatment: not h1, h2 etc (which are subject to misuse) but nested sections with their own headings

Example of structure

<section>
  <h>XHTML</h>
  …
  <section>
    <h>Structure</h>
    …
  </section>
</section>

CSS can still handle it

section h { how an h1 should look }
section section h { h2 }
section section section h { h3 }
etc.
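The CSS selectors above work because a heading's level is simply the number of `section` elements enclosing it. The sketch below computes that level directly, assuming Python's standard `xml.etree.ElementTree`; `section` and `h` are the proposed element names from the slide, and `heading_levels` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

document = """<section><h>XHTML</h>
  <section><h>Structure</h></section>
</section>"""


def heading_levels(element, depth=0):
    """Yield (level, heading text), where level counts enclosing sections."""
    if element.tag == "section":
        depth += 1
    for child in element:
        if child.tag == "h":
            yield depth, child.text
        else:
            yield from heading_levels(child, depth)


levels = list(heading_levels(ET.fromstring(document)))
print(levels)
```

Moving a subsection elsewhere in the document changes its level automatically, which is the misuse-proofing that numbered h1 to h6 elements cannot offer.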

Conclusions

XML with related technologies gives you the freedom to define and deliver your own document types

HTML is still needed as a base-line markup

The new HTML gives a transition path to the future

The State of Things

New generation of XML+CSS browsers emerging

Many XML applications appearing

Major companies planning XML as output (Adobe PDF, MS Office 2000)

Now: HTML4, XHTML 1.0, Modularisation, Basic, 1.1

To Find Out More

All XHTML developments are made public at www.w3.org/Markup

Members of W3C can also look at www.w3.org/Markup/Group