XHTML
Steven Pemberton, CWI, Amsterdam
Chair, W3C HTML Working Group
Overview
History
Philosophy
XML and related technologies
XHTML 1.0
Modularisation
XHTML Basic
XHTML 1.1
The Future
HTML 1
The original HTML was designed in the early 1990s for scientific reports
Each document was a single resource (not even <IMG>)
(This explains much about HTTP by the way)
(HTML 1)
It is amazing how much we have been able to do with a language with such beginnings
It was described using SGML
HTML as an SGML Application
SGML: an international standard in 1986
It is a meta-language that describes data formats, using DTDs (Document Type Definitions)
Describes structure, not presentation:
<H1>HTML as SGML Application</H1>
Example of a DTD fragment
<!ELEMENT table (caption?, (col*|colgroup*), thead?, tfoot?, (tbody+|tr+))>
<!ELEMENT caption %Inline;>
<!ELEMENT thead (tr)+>
...
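A content model like the one for table is a regular expression over child element names. The following is a minimal Python sketch of that idea. It only illustrates what the model permits and is not how SGML or XML validators are built. The function `table_children_ok` is hypothetical.

```python
import re

# Content model from the DTD fragment:
#   (caption?, (col*|colgroup*), thead?, tfoot?, (tbody+|tr+))
# Each child name is written as "name," so the pattern matches whole names.
TABLE_MODEL = re.compile(
    r"(caption,)?"
    r"((col,)*|(colgroup,)*)"
    r"(thead,)?"
    r"(tfoot,)?"
    r"((tbody,)+|(tr,)+)"
)

def table_children_ok(children):
    """Return True if the sequence of child element names fits the model."""
    text = "".join(name + "," for name in children)
    return TABLE_MODEL.fullmatch(text) is not None

print(table_children_ok(["caption", "tr", "tr"]))      # True
print(table_children_ok(["thead", "tfoot", "tbody"]))  # True
print(table_children_ok(["tr", "caption"]))            # False: caption must come first
print(table_children_ok(["caption"]))                  # False: needs tbody+ or tr+
```

Note that `(col*|colgroup*)` means a table may use col elements or colgroup elements, but not a mixture of the two.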
Attributes
<!ATTLIST TABLE
  %attrs;              -- %coreattrs, %i18n, %events --
  summary   %Text;     #IMPLIED
  width     %Length;   #IMPLIED
  border    %Pixels;   #IMPLIED
  …
  >
Entities
<!ENTITY % fontstyle "TT | I | B | BIG | SMALL">
<!ENTITY % inline "#PCDATA | %fontstyle; | %phrase; | %special; | %formctrl;">
<!ENTITY % Length "CDATA" -- nn for pixels or nn% for percentage length -->
Changes to HTML
Netscape and Microsoft started adding to HTML: mostly presentation-oriented tags (like <BLINK>, <CENTER>), and frames
The World Wide Web Consortium (W3C) started an effort to:
keep HTML pure
do presentation via style sheets
Separating content and presentation
HTML was designed as a data-structuring language, but the later changes undermined this.
Separating content from presentation has distinct advantages
For the author
Easier to write your documents
Easier to change your documents
Easy to change the look of your documents
Access to professional designs
Your documents are smaller
Visible on more devices
Visible to more people
For the webmaster
Separation of concerns
Simpler HTML, less training
Cheaper to produce, easier to manage
Easy to change house style
Reach more people
Search engines find your stuff more easily
Visible on more devices
For the reader
Faster download (one of the top 4 reasons for liking a site)
Easier to find information
You can actually read the information if you are sight-impaired
Information more accessible
You can use more devices
For the implementor
Improves the implementation (separation of concerns)
Can produce smaller browsers
Changes to HTML (2)
Another change that Netscape made, with insufficient thought, was frames
Frames create significant problems with web pages
The problems with frames
Can’t bookmark framesets
[Back] does odd things
[Page up] and [Page down] work oddly
[Reload] often doesn’t work right
Security is compromised
Nested frames are hard to deal with (how do you get out?)
Style languages
The first thing W3C did was to start an activity on style sheets (Nov 1995)
This produced CSS1 initially (Dec 1996), then CSS2 (May 1998) (CSS3 is in preparation)
Later produced XSL, an XML-based language, as complementary to CSS
CSS
CSS is a separate language from HTML that allows you to specify how an HTML document, or set of documents, should look
Separates content from presentation
HTML can be a structure language again
Examples of CSS
h1 { font-weight: bold; font-size: 2em }
h2 { font-weight: bold; font-size: 1.5em }
em { background-color: yellow }
body { margin-left: 20% }
Using CSS
Use the following at the top of an XML document:
<?xml-stylesheet type="text/css" href="mystyle.css"?>
Or this in the <head> of an HTML document:
<link rel="stylesheet" type="text/css" href="mystyle.css" />
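An XML user agent finds the style sheet by reading the xml-stylesheet processing instruction in the document's prolog. The following is a small Python sketch of that lookup using the standard `xml.dom.minidom` module. The helper name `stylesheet_hrefs` is hypothetical. The pseudo-attributes are picked out with a simple regular expression, which is only an approximation of full parsing.

```python
import re
from xml.dom import minidom

def stylesheet_hrefs(document):
    """Return the href of every xml-stylesheet processing instruction
    that appears at document level (before the root element)."""
    hrefs = []
    for node in document.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            # A PI's "attributes" are plain text, so extract href by hand.
            match = re.search(r'href="([^"]*)"', node.data)
            if match:
                hrefs.append(match.group(1))
    return hrefs

doc = minidom.parseString(
    '<?xml-stylesheet type="text/css" href="mystyle.css"?>\n'
    '<address><name>Steven Pemberton</name></address>')
print(stylesheet_hrefs(doc))  # ['mystyle.css']
```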
Advantages of CSS
Makes HTML easier to write (and read)
You can define a house style
Compatible: you can still see the content on non-CSS browsers
Pages are much smaller
Accessible to the sight-impaired...
By the way...
Check your logs: more than 95% of people browsing now use a CSS-enabled browser
The current generation of browsers (IE 5, NS 6, Opera 4) have excellent support for CSS.
You never need to use the <FONT> and <BASEFONT> elements again!
Documents
As mentioned, HTML was designed for just one sort of document (scientific reports), but is now being used for all sorts of different documents
You could use SGML to define other sorts of document, but SGML is notoriously hard to fully implement
Enter XML
XML is a W3C effort to simplify SGML
It is a meta-language: a language for defining languages
It is a subset of SGML
One of the aims is to allow everyone to invent their own tags
DTD is optional: a DTD can be inferred from a document
Consequences
The requirement of being able to infer a DTD from a document has an effect on the languages you can define:
Closing tags are now required: <LI>...</LI> <P>...</P>
Empty tags are marked specially: <IMG SRC="pic.gif"/> <BR/> <HR/> (or <HR></HR> etc.)
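Any XML parser enforces these two rules. The following is a short sketch using Python's standard `xml.etree.ElementTree`. An unclosed element is rejected, and the two spellings of an empty element produce the same element. The helper `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup):
    """Return True if an XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# Closing tags are required: an unclosed <LI> is an error in XML.
print(is_well_formed("<UL><LI>one</LI><LI>two</LI></UL>"))  # True
print(is_well_formed("<UL><LI>one<LI>two</UL>"))            # False

# Empty elements must be marked as such.
print(is_well_formed("<P>line<BR/>next</P>"))  # True
print(is_well_formed("<P>line<BR>next</P>"))   # False

# <BR/> and <BR></BR> parse to the same (empty) element.
print(ET.tostring(ET.fromstring("<BR/>")) ==
      ET.tostring(ET.fromstring("<BR></BR>")))  # True
```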
Consequences 2
CDATA sections must be marked as such (only necessary if they contain “<”, “&” etc.):
<SCRIPT><![CDATA[
... script content ...
]]></SCRIPT>
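The following sketch shows why the CDATA section matters. It uses Python's standard `xml.etree.ElementTree`, and the script text is an invented example. The same script is parsed once as plain content and once wrapped in a CDATA section.

```python
import xml.etree.ElementTree as ET

script = "if (a < b && b < c) { go(); }"

# Without a CDATA section, the "<" and "&" characters break the parse.
try:
    ET.fromstring("<script>" + script + "</script>")
    plain_ok = True
except ET.ParseError:
    plain_ok = False

# Inside a CDATA section they are taken literally.
elem = ET.fromstring("<script><![CDATA[" + script + "]]></script>")

print(plain_ok)             # False
print(elem.text == script)  # True
```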
By the way: <P> is not like <BR>
Not like this:
<H1>XML</H1>
An underlying problem with HTML is that ...
<P>You could use SGML to define ...
But like this:
<H1>XML</H1>
<P>An underlying problem with HTML is that ...</P>
<P>You could use SGML to define ...</P>
Consequence of XML
Anyone can now design their own (Web-delivered) languages
CSS makes them viewable:
<address>
 <name>Steven Pemberton</name>
 <company>CWI</company>
 <street>Kruislaan 413</street>
 <postcode>1098 SJ</postcode>
 <city>Amsterdam</city>
 <speaker/>
</address>
So do we still need HTML?
Workshop in May 1998
XML is still a meta-language
There is still a perceived need for a base-line mark-up
HTML has some useful semantics, both implied and explicit (search engines gladly use it, for instance)
HTML as XML application
Clean up (get rid of historical flotsam)
Modularise: split into separate parts
Allows other XML applications to use parts
Allows special-purpose devices to use a subset
Add any required new functionality (forms, better event handling, Ruby)
The HTML Working group
International membership, around 20 members
Many major players (IBM, Microsoft, Netscape, etc)
Meets weekly by phone, quarterly face-to-face
Group experience
There was more to be worked out than we anticipated
XHTML is the first major application of XML, so the world’s eyes are on us
XML still needs the wrinkles ironed out
Philosophy of XHTML
Transition from ‘old world’ to XML
Clean up the language
Return to structure only
Use generic XML as much as possible
Modularise
Address wider needs (internationalisation, accessibility)
Add new functionality
Plan of action
HTML 4.01: corrected version
XHTML 1.0: transitional version of HTML 4.01 in 3 flavours
Modularisation: agreement on split and methodology
XHTML Basic: small devices
XHTML 1.1: clean version of 1.0 strict
(plan of action)
Events: accessible and device-independent
Ruby: needed Asian markup
Forms: more control
XHTML 2.0: putting it all together
Differences HTML:XHTML
Because of the differences between SGML and XML, XHTML must differ from HTML in some ways, for instance:
Use lower case: <p> not <P>
Attributes are always quoted: <th colspan="2">
Anchors use the id attribute, not name (and not just on <a>, by the way): <a id="index"> <p id="top">
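These rules follow directly from XML well-formedness. The following sketch checks them with Python's standard `xml.etree.ElementTree`, which is stricter than a browser's HTML parser. The helper `parses` is hypothetical.

```python
import xml.etree.ElementTree as ET

def parses(markup):
    """Return True if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

print(parses('<p>text</p>'))             # True
print(parses('<P>text</p>'))             # False: XML names are case-sensitive
print(parses('<th colspan="2">x</th>'))  # True
print(parses('<th colspan=2>x</th>'))    # False: attribute values must be quoted

# Any element with an id can act as a link target.
doc = ET.fromstring('<body><p id="top">Hi</p><a id="index">Index</a></body>')
targets = [e.get("id") for e in doc.iter() if e.get("id")]
print(targets)  # ['top', 'index']
```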
Example XHTML 1.0
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
 <head><title>Virtual Library</title></head>
 <body>
  <p>Moved to <a href="http://vlib.org/">vlib.org</a>.</p>
 </body>
</html>
Namespaces
Namespaces have been added to XML to allow you to mix fragments from different languages (e.g. HTML + Maths)
In the same way that object-oriented languages allow you to identify which function you are using, namespaces allow you to identify which tags you are using.
Example of nesting
<html xmlns="http://www.w3.org/1999/xhtml">
 <head><title>A Math Example</title></head>
 <body>
  <p>The following is MathML markup:</p>
  <math xmlns="http://www.w3.org/1998/Math/MathML">
   <apply><log/><logbase><cn> 3 </cn></logbase><ci> x </ci></apply>
  </math>
 </body>
</html>
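A namespace-aware parser identifies each element by its namespace URI plus its local name. The following Python sketch parses a nested XHTML + MathML document with the standard `xml.etree.ElementTree`. It assumes http://www.w3.org/1998/Math/MathML as the MathML namespace URI. The prefixes `h` and `m` in the lookup map are arbitrary choices for the query, not part of the document.

```python
import xml.etree.ElementTree as ET

doc = """<html xmlns="http://www.w3.org/1999/xhtml">
 <head><title>A Math Example</title></head>
 <body>
  <p>The following is MathML markup:</p>
  <math xmlns="http://www.w3.org/1998/Math/MathML">
   <apply><log/><logbase><cn> 3 </cn></logbase><ci> x </ci></apply>
  </math>
 </body>
</html>"""

root = ET.fromstring(doc)
ns = {"h": "http://www.w3.org/1999/xhtml",
      "m": "http://www.w3.org/1998/Math/MathML"}

# ElementTree reports each element as {namespace-URI}local-name.
print(root.tag)                                    # {http://www.w3.org/1999/xhtml}html
print(root.find("h:body/h:p", ns).text)            # The following is MathML markup:
print(root.find("h:body/m:math", ns) is not None)  # True
# Elements inside <math> inherit the MathML default namespace.
print(root.find(".//m:ci", ns).text.strip())       # x
```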
Example of colonising
<math xmlns="http://www.w3.org/1998/Math/MathML"
      xmlns:html="http://www.w3.org/1999/xhtml">
 <apply><log/><logbase><cn> 3 </cn></logbase><ci> x </ci></apply>
 <html:p>This is a paragraph</html:p>
</math>
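A prefix such as `html:` is only a local abbreviation for a namespace URI. The following sketch uses Python's standard `xml.etree.ElementTree`, with http://www.w3.org/1998/Math/MathML assumed as the MathML namespace. It shows that the prefixed paragraph is the same element type as an XHTML paragraph written in the default namespace.

```python
import xml.etree.ElementTree as ET

doc = """<math xmlns="http://www.w3.org/1998/Math/MathML"
      xmlns:html="http://www.w3.org/1999/xhtml">
 <apply><log/><logbase><cn> 3 </cn></logbase><ci> x </ci></apply>
 <html:p>This is a paragraph</html:p>
</math>"""

root = ET.fromstring(doc)
for child in root:
    print(child.tag)
# Prints:
# {http://www.w3.org/1998/Math/MathML}apply
# {http://www.w3.org/1999/xhtml}p

# The same element written with a default namespace instead of a prefix.
same = ET.fromstring('<p xmlns="http://www.w3.org/1999/xhtml">This is a paragraph</p>')
print(same.tag == root[1].tag)  # True
```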
Namespaced attributes
Attributes normally come from the element itself:
<html:a href="next.xml">
But you may also use ‘global’ attributes from a namespace:
<pointer html:href="x.xml">
<music style="classical" html:style="color: red">Beethoven’s 5th</music>
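The following sketch, using Python's standard `xml.etree.ElementTree`, shows how the two style attributes on `<music>` stay distinct. The slide leaves out the xmlns:html declaration. The sketch adds it, because a parser rejects an undeclared prefix.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"

doc = ('<music xmlns:html="http://www.w3.org/1999/xhtml" '
       'style="classical" html:style="color: red">Beethoven’s 5th</music>')

music = ET.fromstring(doc)
# An unprefixed attribute has no namespace: it belongs to the element itself.
print(music.get("style"))              # classical
# A prefixed attribute is keyed by {namespace-URI}local-name.
print(music.get("{%s}style" % XHTML))  # color: red
```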
Using ‘generic’ XML
Presentation: use CSS
Links: use XLink or Schemas
Forms: use CSS?
Images etc.: use XLink or Schemas
(Natural) language of elements: use the xml:lang attribute
XLink?
HTML has several ‘built-in’ hyperlinks: <a>, <img>, <object>, <link>, etc.
Since XML allows you to define your own elements, a browser doesn’t know which are links
XLink was started to solve this problem.
XLink
XLink started as a method of describing which attributes of an element were links
It later changed into a language of links, so it could no longer be used to describe XHTML
The current plan is now to introduce types into Schemas to describe links
Example of XLink
<crossReference
  xmlns:xlink="http://www.w3.org/1999/xlink"
  xlink:type="simple"
  xlink:href="students.xml"
  xlink:role="studentlist"
  xlink:title="Student List"
  xlink:show="new"
  xlink:actuate="onRequest">Current List of Students</crossReference>
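This is how XLink answers the question of which elements are links. A program does not need to know what `<crossReference>` means, only to look for the XLink attributes. The following Python sketch shows the idea using the standard `xml.etree.ElementTree`. It handles only simple links, and the function `simple_links` is hypothetical.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

doc = """<crossReference xmlns:xlink="http://www.w3.org/1999/xlink"
    xlink:type="simple" xlink:href="students.xml"
    xlink:role="studentlist" xlink:title="Student List"
    xlink:show="new" xlink:actuate="onRequest">Current List of Students</crossReference>"""

def simple_links(root):
    """Yield (element text, href) for every element marked as a simple XLink,
    whatever the element itself is called."""
    for elem in root.iter():
        if elem.get("{%s}type" % XLINK) == "simple":
            yield elem.text, elem.get("{%s}href" % XLINK)

links = list(simple_links(ET.fromstring(doc)))
print(links)  # [('Current List of Students', 'students.xml')]
```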
Schemas
Schemas are a new technology intended to take over much of the role of DTDs.
Schemas are expressed in XML
They have support for data types
They are much easier to parse and implement than DTDs
Schema fragment
<elementType name='table'>
 <refines>
  <archetypeRef name='common'/>
  <archetypeRef name='simpleBlockDisplay'/>
 </refines>
 <sequence>
  <elementTypeRef name='caption' minOccur='0' maxOccur='1'/>
  <choice>
   <elementTypeRef name='col' minOccur='0' maxOccur='*'/>
   <elementTypeRef name='colgroup' minOccur='0' maxOccur='*'/>
  </choice>
  <choice>
   <sequence>
    <elementTypeRef name='thead' minOccur='0' maxOccur='1'/>
    <elementTypeRef name='tfoot' minOccur='0' maxOccur='1'/>
    <elementTypeRef name='tbody' minOccur='1' maxOccur='*'/>
   </sequence>
   <elementTypeRef name='tr' minOccur='1' maxOccur='*'/>
  </choice>
 </sequence>
</elementType>
XHTML 1.0
XHTML 1.0 is an XML-ised version of HTML 4.01
Just like HTML 4.01, there are 3 versions: ‘strict’, ‘transitional’ (also called ‘loose’), and ‘frameset’
Transitional version
XHTML 1.0 has been carefully designed to make use of ‘quirks’ in existing HTML browsers
Use of a small number of guidelines allows XHTML to be served to HTML user agents as well as XML user agents
Examples of Guidelines
Use a space before the / of empty elements:
<br /> <hr /> <img src="foo.gif" />
Don’t use the <hr></hr> form
Use both name= and id= on <a>:
<a name="index" id="index"> … </a>
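Guidelines like these are easy to check mechanically. The following is a rough Python sketch of a lint for the first two guidelines, based on regular expressions. It is an approximation rather than a parser. The function name and the list of empty element names are assumptions made for illustration.

```python
import re

# An empty-element tag whose "/>" has no space before it, e.g. <br/>.
NO_SPACE_BEFORE_SLASH = re.compile(r"<([A-Za-z][\w.:-]*)(?:\s[^<>]*)?(?<!\s)/>")
# An empty element written as a start-tag/end-tag pair, e.g. <hr></hr>.
# The list of empty element names here is a hypothetical subset.
EMPTY_PAIR = re.compile(r"<(br|hr|img|meta|link|input)(?:\s[^<>]*)?></\1>")

def guideline_problems(markup):
    """Return human-readable warnings for two of the compatibility guidelines."""
    problems = []
    for m in NO_SPACE_BEFORE_SLASH.finditer(markup):
        problems.append(f"missing space before '/>' in <{m.group(1)}>")
    for m in EMPTY_PAIR.finditer(markup):
        name = m.group(1)
        problems.append(f"use <{name} /> instead of <{name}></{name}>")
    return problems

for warning in guideline_problems('<p>a<br/>b<hr></hr><img src="foo.gif" /></p>'):
    print(warning)
# missing space before '/>' in <br>
# use <hr /> instead of <hr></hr>
```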
Serving XHTML 1.0
An XHTML 1.0 document that follows the guidelines can be served up either as HTML, or as XML
But beware: CSS has slightly different rules for HTML and XML
Similarly, the DOM has differences for HTML and XML
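A server that holds such a document still has to decide which way to serve it. The following Python sketch shows one possible policy. It assumes the media types text/html for HTML and application/xhtml+xml for XML, the latter registered later in RFC 3236. The Accept-header parsing is deliberately simplified (no wildcard handling), and the function `choose_media_type` is hypothetical.

```python
def choose_media_type(accept_header):
    """Pick a media type for an XHTML 1.0 document that follows the guidelines.

    Hypothetical policy: serve as XML only when the client explicitly lists
    application/xhtml+xml with a nonzero quality value; otherwise fall back
    to text/html, which HTML user agents understand.
    """
    for part in accept_header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if fields[0].lower() != "application/xhtml+xml":
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:
            return "application/xhtml+xml"
    return "text/html"

print(choose_media_type("application/xhtml+xml,text/html;q=0.9"))  # application/xhtml+xml
print(choose_media_type("text/html,*/*;q=0.8"))                    # text/html
print(choose_media_type("application/xhtml+xml;q=0"))              # text/html
```

Whichever type is chosen, the CSS and DOM differences noted above still apply to the receiving user agent.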
Modularisation
XHTML has been divided into a number of modules.
A module is a collection of elements and/or attributes that can be used as building blocks to build a DTD.
(modularisation)
A language can be built by using just XHTML modules, or adding your own
We had originally defined Modularisation just for our own use, but it has turned out to be useful for other groups as well
XHTML modules
Structure: html, head, title, body
Text: abbr, acronym, address, blockquote, br, cite, code, dfn, div, em, h1, h2, h3, h4, h5, h6, kbd, p, pre, q, samp, span, strong, var
Hypertext: a
List: ol, ul, dl, li, dt, dd
(modules)
Applet (deprecated): applet, param
Presentation: b, i, hr, big, small, sub, sup, tt
Edit: del, ins
Bi-directional Text: bdo
(modules)
Image: img
Client-side Image Map: map, +
Server-side Image Map: change to img
Object: object, param
Frames
Target: attribute
Iframe
(modules)
Intrinsic Events: adds events attributes
Metainformation: meta
Scripting: script
Stylesheet: style
Style Attribute
Link: link
(modules)
Base: base
Name Identification: name attribute
Legacy: basefont, center, font, s, strike, u, plus loads of attributes (e.g. align)
Ruby: Asian markup
Note on modules
Note that some modules consist of a single element, or just add some attributes to existing elements
Not all modules are independent: if you use some modules, they bring other modules with them, or change other modules
Future modules are planned (eg extended forms, events)
The XHTML family
To still be called an XHTML language you must use Structure, Hypertext, Basic Text, and List modules (you may define your own Structure module)
Example integration languages
SMIL is planning a module to integrate SMIL and HTML
Likewise for MathML
Creating a DTD
It is not expected that creating XHTML-based languages will be a daily activity
This is not the place to describe the method: it depends on understanding DTDs.
The Modularisation document has extensive examples
Future versions will also use Schemas (we hope…)
XHTML Basic
XHTML Basic is the first XHTML family-member to be defined using Modularisation
It is designed for small devices, typically mobile telephones
XHTML Basic Modules
Structure Module* body, head, html, title
Text Module* abbr, acronym, address, blockquote, br, cite, code, dfn, div, em, h1, h2, h3, h4, h5, h6, kbd, p, pre, q, samp, span, strong, var
(XHTML Basic Modules)
Hypertext Module* a
List Module* dl, dt, dd, ol, ul, li
Basic Forms Module form, input, label, select, option, textarea
(XHTML Basic Modules)
Basic Tables Module caption, table, td, th, tr
Image Module img
Object Module object, param
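The family rule stated earlier says that an XHTML language must use the Structure, Hypertext, Text and List modules. Checking a proposed language against that rule is a set difference. The following is a tiny Python sketch using the module names as listed on these slides. The function name is hypothetical, and the sketch ignores the option of supplying your own Structure module.

```python
# Module names as listed on these slides (plain string labels).
REQUIRED = {"Structure", "Text", "Hypertext", "List"}

XHTML_BASIC = {
    "Structure", "Text", "Hypertext", "List",
    "Basic Forms", "Basic Tables", "Image", "Object",
}

def missing_required(modules):
    """Return, sorted, the required modules that a proposed language lacks."""
    return sorted(REQUIRED - set(modules))

print(missing_required(XHTML_BASIC))                     # []
print(missing_required({"Structure", "Text", "Image"}))  # ['Hypertext', 'List']
```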
XHTML Basic usage
<!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML Basic 1.0//EN" "http://www.w3.org/TR/xhtml-basic/xhtml-basic10.dtd">
XHTML 1.1
XHTML 1.1 is the second family member to be defined using Modularisation
Its main aim is to present a cleaned-up, non-transitional version of XHTML 1.0 strict (no frames)
It also adds Ruby markup
Otherwise: no new functionality
XHTML 1.1 Modules
Structure, Text, Hypertext, List, Object, Presentation, Edit, Bidirectional Text, Forms, Tables, Image, Client-side Image Map, Server-side Image Map, Intrinsic Events, Metainformation, Scripting, Stylesheet, Style Attribute (deprecated), Link, Base, Ruby.
Example XHTML 1.1
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
 <head>
  <title>Virtual Library</title>
 </head>
 <body>
  <p>Moved to <a href="http://vlib.org/">vlib.org</a>.</p>
 </body>
</html>
Example Ruby markup
<ruby>
 <rb>WWW</rb>
 <rp>(</rp><rt>World Wide Web</rt><rp>)</rp>
</ruby>
(Use CSS to describe presentation)
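The `<rp>` elements exist for browsers that do not support ruby. Such a browser simply shows all the text in order, so the annotation still appears in parentheses. The following sketch illustrates this with Python's standard `xml.etree.ElementTree`. The XHTML namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET

ruby = ET.fromstring(
    "<ruby><rb>WWW</rb><rp>(</rp><rt>World Wide Web</rt><rp>)</rp></ruby>")

# A ruby-aware renderer pairs the base text with its annotation...
base = ruby.findtext("rb")
annotation = ruby.findtext("rt")
# ...while a renderer without ruby support just shows all the text in order.
fallback = "".join(ruby.itertext())

print(base, "/", annotation)  # WWW / World Wide Web
print(fallback)               # WWW(World Wide Web)
```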
Forms
Being produced by a separate group
Consists of three parts: data model, instances, user interface
Will allow you to save and restore forms, and to download multi-page forms
(Forms)
Will include much more client-side checking
Form data will be sent to the server as XML
Separates content from presentation (e.g. a radio button and a select box both allow you to select one from many, and you may want to use different choices on different devices)
Events
Current events are almost all in terms of the mouse: onclick, onmouseover, onfocus, etc.
Future event model will be device independent, and allow you to define your own new events
Uses the DOM event model
The DOM
Document Object Model: how you access a document via scripting
Currently there is only an XML DOM
An XHTML DOM is being investigated
Accessibility and Internationalisation
W3C has an accessibility group that checks that new recommendations address people with accessibility needs
There is also an internationalisation group that does the same for cultural issues (which produced <ruby>)
Accessibility problems
A sighted person can work out the structure from the visual presentation
A non-sighted person cannot: the structure must be present in the markup
That is why new features were added to forms and tables in HTML 4, like <caption>
Structure
Text would also benefit from such a treatment: not h1, h2 etc (which are subject to misuse) but nested sections with their own headings
CSS can still handle it
section h { how an h1 should look }
section section h { h2 }
section section section h { h3 }
etc.
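The CSS rules work because a heading's level is just the number of sections that enclose it. The following Python sketch computes that level with the standard `xml.etree.ElementTree`. The `section` and `h` element names are the slide's proposal, not part of XHTML 1.x, and the sample document and function name are hypothetical.

```python
import xml.etree.ElementTree as ET

doc = """<body>
 <section><h>XHTML</h>
  <section><h>History</h>
   <section><h>HTML 1</h></section>
  </section>
  <section><h>Philosophy</h></section>
 </section>
</body>"""

def heading_levels(elem, depth=0):
    """Yield (level, heading text), where level = number of enclosing sections."""
    for child in elem:
        if child.tag == "section":
            yield from heading_levels(child, depth + 1)
        elif child.tag == "h":
            yield depth, child.text

for level, text in heading_levels(ET.fromstring(doc)):
    print("  " * (level - 1) + f"h{level}: {text}")
# h1: XHTML
#   h2: History
#     h3: HTML 1
#   h2: Philosophy
```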
Conclusions
XML with related technologies gives you the freedom to define and deliver your own document types
HTML is still needed as a base-line markup
The new HTML gives a transition path to the future
The State of Things
New generation of XML+CSS browsers emerging
Many XML applications appearing
Major companies planning XML as output (Adobe PDF, MS Office 2000)
Now: HTML4, XHTML 1.0, Modularisation, Basic, 1.1
To Find Out More
All XHTML developments are made public at www.w3.org/Markup
Members of W3C can also look at www.w3.org/Markup/Group