eXtensible Markup Language (XML)
Spring Technology Workshops, March 1998
MacKenzie Smith
Office for Information Systems
XML Definition
• XML: eXtensible Markup Language
• New standard developed by the World Wide Web Consortium (W3C)
• Issued as a “recommendation” February 10, 1998 after broad industry review
• Meant as a replacement standard for HTML to render documents on the WWW
XML Definition
• Sponsored by major software companies (Microsoft, Netscape, Adobe)
• Hardware companies (Sun, HP, Fuji Xerox)
• Electronic publishing companies (Inso, ArborText, Texcel, SoftQuad)
• And other organizations involved in WWW development (W3C, NCSA)
XML Definition
• Both XML and HTML are based on SGML (Standard Generalized Markup Language)
XML and SGML
What is SGML?
• ISO 8879, published in 1986, and based on IBM’s GML markup language
• Widely used in Government (especially defense industry), publishing industry, and academia for publishing materials both online and in print.
XML and SGML
• Separates a document’s structure from its display (rendering in a particular medium)
• Mark up a document once, then publish it many times, in many media, with different “style sheets” customized for each medium.
• Encode intellectual aspects of the document, which facilitates indexing and database applications
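The idea of marking up once and publishing many times can be sketched in a few lines of Python. This is only an illustration, not how any SGML publishing system works: the letter fragment, the two "style sheets" (plain dictionaries of templates), and the render function are all hypothetical, and the standard-library XML parser stands in for an SGML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical letter fragment, marked up once by structure, not by appearance.
SOURCE = (
    "<letter>"
    "<salutation>Dear Reader,</salutation>"
    "<paragraph>XML is coming.</paragraph>"
    "</letter>"
)

# Two hypothetical "style sheets": each maps a tag name to an output template.
PRINT_STYLE = {"salutation": "{}\n", "paragraph": "    {}\n"}
WEB_STYLE = {"salutation": "<b>{}</b>", "paragraph": "<p>{}</p>"}


def render(xml_text, style):
    """Render each child of the root element using the given style sheet."""
    root = ET.fromstring(xml_text)
    return "".join(style[child.tag].format(child.text) for child in root)


print(render(SOURCE, PRINT_STYLE))
print(render(SOURCE, WEB_STYLE))
```

The source text never changes; only the style sheet passed to render differs between the two outputs.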
XML and SGML
• Each type of document has a Document Type Definition (DTD) which defines its structure, and what tags may be used to encode it, e.g.
– Encoded Archival Description (EAD)
– Text Encoding Initiative (TEI)
– AAP (Association of American Publishers)
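As a rough sketch of what "defines its structure" means, the Python below checks a document against a table of permitted child elements. The table is a hypothetical, drastically simplified stand-in for a DTD (a real DTD also constrains order, repetition, and attributes), and the element names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a DTD: which children each element may contain.
ALLOWED_CHILDREN = {
    "letter": {"date", "salutation", "paragraph", "closing"},
    "date": set(),
    "salutation": set(),
    "paragraph": set(),
    "closing": set(),
}


def violations(xml_text):
    """List elements that are unknown or that appear where they are not allowed."""
    root = ET.fromstring(xml_text)
    problems = []
    for parent in root.iter():
        allowed = ALLOWED_CHILDREN.get(parent.tag)
        if allowed is None:
            problems.append(f"unknown element <{parent.tag}>")
            continue
        for child in parent:
            if child.tag not in allowed:
                problems.append(f"<{child.tag}> not allowed inside <{parent.tag}>")
    return problems


print(violations("<letter><date>1998-03-12</date><blink>Hi</blink></letter>"))
```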
XML and SGML
• Tag sets (from the DTD) should identify the structural components of the document
– e.g. in a letter, they could include: <recipient>, <return-address>, <salutation>, <paragraph>, <closing>, <signature>
– but also <date>, <name>, <company>, etc.
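Structural tags like these are what make indexing straightforward. The sketch below, in Python, pulls a small index record out of a letter; the letter content, the nesting of <name> and <company> inside <recipient>, and the index_record function are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical letter using the kinds of structural tags listed above.
LETTER = """
<letter>
  <date>1998-03-12</date>
  <recipient><name>Jane Doe</name><company>Example Press</company></recipient>
  <salutation>Dear Ms. Doe,</salutation>
  <paragraph>Thank you for your manuscript.</paragraph>
  <closing>Sincerely,</closing>
  <signature><name>John Smith</name></signature>
</letter>
"""


def index_record(xml_text):
    """Extract the fields a catalog or database might index."""
    root = ET.fromstring(xml_text)
    return {
        "date": root.findtext("date"),
        "recipient": root.findtext("recipient/name"),
        "company": root.findtext("recipient/company"),
        "signed_by": root.findtext("signature/name"),
    }


print(index_record(LETTER))
```

Because <name> is tagged in both places, the same element can be indexed differently depending on whether it sits under <recipient> or <signature>.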
XML and SGML
Example of an SGML-encoded document
<!DOCTYPE EAD PUBLIC "-//Society of American Archivists//DTD ead.dtd (Encoded Archival Description (EAD))//EN" "ead.dtd">
<EAD>
<EADHEADER><EADID>bak00006</EADID>
<FILEDESC><TITLESTMT><TITLEPROPER>Ezra F. Beal Diary, 1850-1862</TITLEPROPER></TITLESTMT></FILEDESC>
<PROFILEDESC><CREATION><DATE>12/27/1996</DATE>Anna Koch</CREATION></PROFILEDESC>
</EADHEADER>
<FRONTMATTER>
<TITLEPAGE>
<TITLEPROPER>Ezra F. Beal. Diary, 1850-1862.</TITLEPROPER>
<AUTHOR>Baker Library</AUTHOR>
<PUBLISHER>Harvard Business School, Boston, MA 02163</PUBLISHER>
<P>© The President and Fellows of Harvard College</P>
</TITLEPAGE>
</FRONTMATTER>....
XML and SGML
• SGML-encoded documents are published today in print, on CD-ROM, or on various networked media, including the WWW
• Each of these media requires a separate “style sheet” to decide how best to render each part of the document
XML and SGML
• To publish an SGML document on the Web today, readers need an SGML “viewer” configured for their web browser, such as SoftQuad’s Panorama software
• Panorama downloads the document, its DTD, and a style sheet from a web server, then renders the document on your monitor
XML and SGML
• It’s clunky and slow, and it requires readers to do extra work to configure their web browsers correctly. In other words, we don’t want to use it.
• Adding SGML support directly into web browsers is considered far too difficult (parsing “real” SGML is very complicated)
XML and HTML
What about HTML?
• HTML is a simple SGML DTD, but
• HTML’s tag set is a mix of structural and display aspects of documents, with each tag tied to exactly one way of displaying it in a particular browser
XML and HTML
• Some examples of this:
– <H1>, <H2>, and <H3> are different sizes of headings
– <BlockQuote> indents the current paragraph
– <HR> puts a line across the page
– Tables are used for spacing text legibly more often than for tabular information
XML and HTML
• HTML has been repeatedly “extended” to make it more useful, but at the cost of making it non-standard across different browsers, e.g.
– “Blinking” text (Netscape 1.0)
– Frames (Netscape 2.0)
– Forms (HTML 3.0)
– JavaScript
– MetaHTML
XML and HTML
• So HTML is at the end of its extensibility (though it probably won’t go away)
• HTML doesn’t allow
– different tag sets for different domains (like libraries vs. the defense industry vs. the medical industry)
– complex document structures (e.g. nesting)
– optional DTD support when you want it (for databases of web documents, etc.)
XML and HTML
• but as we’ve seen, SGML is too complex for the WWW
• and HTML isn’t flexible enough
• enter XML...
XML
What is XML?
• Based on SGML, but greatly simplified
• Disallows some things from SGML that made it very difficult to use on the Web
• Allows for a DTD, but doesn’t require it for rendering a document on the web
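A minimal sketch of what "doesn’t require a DTD" means in practice: an XML parser can check that a document is well-formed (every tag closed and properly nested) with no DTD at all. The example uses Python’s standard-library parser; the sample documents and the is_well_formed helper are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as XML; no DTD is consulted."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Properly nested and closed: accepted without any DTD.
print(is_well_formed("<note><para>Closed properly</para></note>"))

# HTML-style omitted end tag: rejected, because XML requires every tag to be closed.
print(is_well_formed("<note><para>Never closed</note>"))
```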
XML
• Web browsers (Netscape, Microsoft) will support XML directly
– no need for a separate “viewer” or “plugin”
– will have a default style sheet for common tag sets like HTML
• MS Internet Explorer already supports XML (with a few tricks)
XML
• Documents will be marked up with a tag set from a DTD, as is currently done with SGML and HTML (although a DTD is not required)
• But they’ll also come with a style sheet and hyperlinks using two related standards:
– XSL (eXtensible Stylesheet Language)
– XLL (eXtensible Linking Language)
XML and XSL
eXtensible Stylesheet Language (XSL)
• Based on ISO/IEC 10179, the Document Style Semantics and Specification Language (DSSSL) standard
• Builds on the current Cascading Style Sheets (CSS) mechanism for displaying HTML, and will likely coexist with it
XML and XSL
• XSL supports display based on things like:
– formatting of source elements based on ancestry/descendency, position, and uniqueness
– creation of formatting constructs including generated text and graphics
– definition of reusable formatting macros and an extensible set of formatting objects
– writing-direction independent stylesheets (for internationalization)
XML and XLL
eXtensible Linking Language (XLL)
• based on ISO/IEC 10744:1992, the HyTime standard for hypermedia (an extension to SGML)
• the XLL working group has split its work in two: XPointer and XLink. Draft recommendations for both are expected within the next few days, but the final versions won’t be available for a while yet.
XML and XLL
• specifies the mechanism for supporting true hyperlinks in XML documents, including:
– links to other documents (to replace the HTML tag <A HREF="URL">)
– links within documents
– location-independent naming (i.e. URNs)
– bidirectional linking
– link management outside of the documents to which the links apply (i.e. in a database)
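As a rough illustration of the simplest case (a link to another document), the Python sketch below finds linking elements by their attributes rather than by a fixed tag name. It assumes the XML-LINK="SIMPLE" and HREF attribute spelling used in this talk’s XML example; the XLL drafts are still changing, so treat the attribute names as provisional. The simple_links function is hypothetical.

```python
import xml.etree.ElementTree as ET

# Fragment modeled on the XML example in this talk.
DOC = """
<CONTENT>
  <PARA>Introduction to the home page</PARA>
  <A XML-LINK="SIMPLE" HREF="http://server.harvard.edu/document">Current List</A>
  <PARA>Next part of the document</PARA>
</CONTENT>
"""


def simple_links(xml_text):
    """Return (target, link text) for every element marked as a simple link."""
    root = ET.fromstring(xml_text)
    return [
        (element.get("HREF"), (element.text or "").strip())
        for element in root.iter()
        if element.get("XML-LINK") == "SIMPLE"
    ]


print(simple_links(DOC))
```

The function never looks at the tag name, so an element other than <A> could carry a link in the same way.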
HTML example
Example HTML for a simple document:
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML Strict//EN//2.0">
<HTML>
<HEAD><TITLE>Home Page Title</TITLE></HEAD>
<BODY>
<H1 align=center>Home Page Title</H1>
<P>Introduction to the home page
<H1><A HREF="anotherpage.html">Current List</A></H1>
<P>Next part of the document
</HTML>
XML example
Example XML for the document, using a new DTD:
<!DOCTYPE MyDTD PUBLIC "-//IETF//DTD MyDTD Strict//EN//2.0">
<MyDTD>
<HEAD><TITLE>Home Page Title</TITLE> <AUTHOR>Webmaster</AUTHOR></HEAD>
<CONTENT>
<PAGEHEAD>Home Page Title</PAGEHEAD>
<PARA>Introduction to the home page</PARA>
<A XML-LINK="SIMPLE" HREF="http://server.harvard.edu/document">
Current List</A>
<PARA>Next part of the document</PARA>
</CONTENT>
</MyDTD>
XSL Example
<xsl>
<rule><root/>
<HTML><BODY font-family="Arial, helvetica, sans-serif" font-size="12pt" background-color="#EEEEEE">
<children/>
</BODY></HTML>
</rule>
<rule><!-- Hides the eadheader --><target-element type="eadheader"/>
<empty/>
</rule>
<rule><target-element type="p"/>
<p color="black">
<children/>
</p>
</rule>
</xsl>
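To make the three rules concrete, here is a small Python sketch that imitates them by hand: wrap the root’s content in HTML and BODY, drop the eadheader, and turn each p into a black paragraph. It is not an XSL processor. The transform function is hypothetical, tag names are lower-cased on the assumption that the EAD source uses upper-case tags, elements with no rule simply pass their children through (also an assumption), and the BODY font attributes are omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical, cut-down EAD document for the demonstration.
SOURCE = (
    "<EAD>"
    "<EADHEADER><EADID>bak00006</EADID></EADHEADER>"
    "<FRONTMATTER><TITLEPAGE><P>Copyright notice</P></TITLEPAGE></FRONTMATTER>"
    "</EAD>"
)


def transform(element, is_root=True):
    """Apply hand-written equivalents of the three rules to an element tree."""
    if is_root:
        # Rule 1: the root becomes an HTML page wrapping its processed children.
        inner = "".join(transform(child, False) for child in element)
        return "<HTML><BODY>" + inner + "</BODY></HTML>"
    tag = element.tag.lower()
    if tag == "eadheader":
        # Rule 2: the header is hidden entirely.
        return ""
    if tag == "p":
        # Rule 3: paragraphs are emitted in black (text only, for simplicity).
        return '<p color="black">' + (element.text or "") + "</p>"
    # Assumed default: no rule matches, so only the children are processed.
    return "".join(transform(child, False) for child in element)


print(transform(ET.fromstring(SOURCE)))
```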
XML and Libraries
• So, other than using XML when HTML isn’t enough, why else should you care about XML?
XML and Metadata
• Resource Description Framework (RDF)
– new standard from the W3C RDF Model and Syntax Working Group
– chaired by Eric Miller from OCLC
– supersedes the “Warwick Framework” (mostly)
– currently a draft specification, in final review
– also supported by web industry software giants (Microsoft, Netscape, IBM, etc.)
XML and Metadata
• RDF
– defines a way to put searchable metadata on the web (with or without the web resources that are described by the metadata)
– and a way for structural metadata to relate groups of digital objects into logical constructs (like a set of scanned image files into an article)
XML and Metadata
• RDF
– specifies a generalized data model to handle all kinds of metadata, including Dublin Core, MARC, etc., and a transport syntax based on XML
– encodes sets of properties of web resources
  • properties describe resources using various attributes
  • and also relate resources to each other
XML and Metadata
RDF core record example, using Dublin Core:
<?xml:namespace name="http://purl.org/metadata/dublin_core#" as="DC"?>
<?xml:namespace name="http://www.w3.org/TR/WD-rdf-syntax#" as="RDF"?>
<RDF:RDF>
<RDF:Description RDF:HREF="http://www.somewhere.edu/some.doc">
<DC:Creator>John Smith</DC:Creator>
<DC:Title>John’s document</DC:Title>
<DC:Date>03/12/98</DC:Date>
</RDF:Description>
</RDF:RDF>
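The data model behind this record is a set of (resource, property, value) statements. The Python sketch below reads them back out. One loud assumption: it declares the namespaces with xmlns attributes instead of the draft’s xml:namespace processing instructions, because that is the form Python’s standard parser understands. The namespace URIs are the ones from the example; the triples function is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/TR/WD-rdf-syntax#"
DC_NS = "http://purl.org/metadata/dublin_core#"

# Same record as above, with namespaces declared via xmlns (an assumption).
RECORD = f"""
<RDF:RDF xmlns:RDF="{RDF_NS}" xmlns:DC="{DC_NS}">
  <RDF:Description RDF:HREF="http://www.somewhere.edu/some.doc">
    <DC:Creator>John Smith</DC:Creator>
    <DC:Title>John's document</DC:Title>
    <DC:Date>03/12/98</DC:Date>
  </RDF:Description>
</RDF:RDF>
"""


def triples(xml_text):
    """Return (resource, property, value) for each property in each Description."""
    root = ET.fromstring(xml_text)
    result = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        resource = description.get(f"{{{RDF_NS}}}HREF")
        for prop in description:
            # Tags look like "{namespace}Name"; keep only the local name.
            name = prop.tag.split("}", 1)[1]
            result.append((resource, name, prop.text))
    return result


for statement in triples(RECORD):
    print(statement)
```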
XML and Metadata
RDF aggregates record example:
<?xml:namespace name="http://purl.org/metadata/dublin_core#" as="DC"?>
<?xml:namespace name="http://www.w3.org/TR/WD-rdf-syntax#" as="RDF"?>
<RDF:RDF>
<RDF:Description RDF:HREF="http://www.somewhere.edu/.html">
<DC:Creator>
<RDF:Seq ID="CreatorsAlphabeticallyBySurname">
<RDF:LI>John Jones</RDF:LI>
<RDF:LI>John Smith</RDF:LI>
</RDF:Seq>
</DC:Creator>
</RDF:Description>
</RDF:RDF>
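A sequence keeps its members in order, which matters for something like an author list. The Python sketch below collects each sequence in the record by its ID. As an assumption, it declares the namespaces with xmlns attributes rather than the draft’s xml:namespace processing instructions so that Python’s standard parser can read it; the sequences function is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/TR/WD-rdf-syntax#"
DC_NS = "http://purl.org/metadata/dublin_core#"

# Same record as above, with namespaces declared via xmlns (an assumption).
RECORD = f"""
<RDF:RDF xmlns:RDF="{RDF_NS}" xmlns:DC="{DC_NS}">
  <RDF:Description RDF:HREF="http://www.somewhere.edu/.html">
    <DC:Creator>
      <RDF:Seq ID="CreatorsAlphabeticallyBySurname">
        <RDF:LI>John Jones</RDF:LI>
        <RDF:LI>John Smith</RDF:LI>
      </RDF:Seq>
    </DC:Creator>
  </RDF:Description>
</RDF:RDF>
"""


def sequences(xml_text):
    """Map each sequence ID to its members, preserving document order."""
    root = ET.fromstring(xml_text)
    found = {}
    for seq in root.iter(f"{{{RDF_NS}}}Seq"):
        found[seq.get("ID")] = [li.text for li in seq.findall(f"{{{RDF_NS}}}LI")]
    return found


print(sequences(RECORD))
```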
XML and Harvard
• Digital Finding Aids Project
– began in the summer of 1995
– using SGML to encode archival finding aids with the Encoded Archival Description (EAD) DTD
– EAD version 1 just released, will be fully XML compliant
– Publishing both HTML and SGML for display today
XML and Harvard
• Digital Finding Aids Project
– Susan Von Salis, Manuscripts Department, Schlesinger Library will talk about using SGML (and XML) in the library
XML and Libraries
• So, other than using XML when HTML isn’t enough, why else should you care about XML?
• Resource Description Framework (RDF)
• Library finding aids encoded with the Encoded Archival Description (EAD)
• Web-accessible full-texts encoded with the TEI
• Lots of other possibilities (MARC records, A&I databases, etc.)
XML
• Stay tuned for (many) more developments...
For more information, see:
http://www.w3.org/XML/