intro xml for archivists (2011)
DESCRIPTION
A short introduction to XML, intended to be used as part of a course on EAD (XML for archives).
TRANSCRIPT
Archives Hub workshop 2011
An Introduction to XML
XML
eXtensible Markup Language
Define XML
XML syntax and rules
XML DTDs and Schemas
Displaying XML
Why use XML?
What is XML?
XML is a grammatical system for creating languages… a meta-language
Use XML to design your own markup language, consisting of meaningful tags that describe the data they contain
Create a language for describing anything: archives, books, government services, properties…
What is interoperability?
the ability to exchange/share data
provides advantages of cross-searching, so user can easily search across and retrieve resources from a variety of different systems
allows users to move beyond individual websites for individual resources
integrates information resources presented in different formats
XML facilitates interoperability
Something to remember about XML
XML does not do anything itself. It is pure information wrapped in XML tags.
You must use other means to send, receive or display the data
XML is used by XML technologies to create:
Detailed description to view in a browser
Summary entry to view in a browser
PDF for print
XML: elements
<language>English</language>
<tag>content</tag>
XML attributes
Attributes are simple name/value pairs associated with an element
<tag attribute_name="attribute_value">content</tag>
<language …………….. >English</language>
<language langcode="eng">English</language>
<date>20 Sept 2004</date>
<date normal="2004">20 Sept 2004</date>
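As a rough illustration of how software sees elements and attributes, the two examples above can be read with Python's standard `xml.etree.ElementTree` module. Python is an assumption here; the slides do not name a tool, and any XML parser would behave similarly.

```python
import xml.etree.ElementTree as ET

# Parse a single element that carries an attribute.
language = ET.fromstring('<language langcode="eng">English</language>')

print(language.tag)              # the element name
print(language.text)             # the element content
print(language.get("langcode"))  # the attribute value

# The attribute holds a normalised value for machines to search or sort on,
# while the element content keeps the date as it was written.
date = ET.fromstring('<date normal="2004">20 Sept 2004</date>')
print(date.get("normal"), "|", date.text)
```

The content stays readable for people, while the attribute gives a program a predictable value to work with.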
XML and Content
XML is essentially about structure. It focuses on what the data is
The structure enables content to be identified by machines so they can process the data
XML is not primarily about content, though there might be some restrictions on content
Sample Content
Papers of John Ruskin
1864-1888
10 boxes
Held at the University of London Library
Table
Title Papers of John Ruskin
Dates 1864-1888
Extent 10 boxes
Held At University of London Library
XML: Structure
<catalog>
<title>Papers of John Ruskin</title>
<date>1864-1888</date>
<extent>10 boxes</extent>
<location>University of London Library</location>
</catalog>
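To show why this structure matters to a machine, here is a minimal sketch that reads the catalog record above and pulls out each labelled field. It assumes Python's standard `xml.etree.ElementTree` module; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

CATALOG_XML = """<catalog>
<title>Papers of John Ruskin</title>
<date>1864-1888</date>
<extent>10 boxes</extent>
<location>University of London Library</location>
</catalog>"""

# The root element is <catalog>; iterating over it yields its child elements.
catalog = ET.fromstring(CATALOG_XML)

# Build a simple field-name -> value mapping from the tags and their content.
record = {child.tag: child.text for child in catalog}

for name, value in record.items():
    print(f"{name}: {value}")
```

Because every piece of data sits inside a named tag, the program does not have to guess which line is the title and which is the extent.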
Well-formed XML
a root element is required
<catalog> all content </catalog>
closing tags are required
elements must be properly nested
tag names are case-sensitive, so opening and closing tags must match exactly
attribute values must be in quotation marks
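The rules above can be tried out with any XML parser, since a parser refuses input that is not well-formed. The sketch below assumes Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# One good sample, then one sample breaking each rule in turn.
samples = {
    "good": "<catalog><title>Papers</title></catalog>",
    "no single root": "<title>Papers</title><date>1864</date>",
    "missing closing tag": "<catalog><title>Papers</catalog>",
    "badly nested": "<catalog><title><date>1864</title></date></catalog>",
    "inconsistent case": "<catalog><Title>Papers</title></catalog>",
    "unquoted attribute": "<date normal=2004>20 Sept 2004</date>",
}

results = {label: is_well_formed(xml) for label, xml in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'NOT well-formed'}")
```

Only the first sample is accepted; each of the others breaks one rule from the list.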
Create tags for your data
Hands-On
Valid XML (1)
Valid XML: rules specify elements and attributes and how they are used
Valid XML provides consistency and facilitates the exchange of data
Valid XML is important for displaying, processing and exchanging XML in a wider environment
Valid XML (2)
Must conform to a Document Type Definition (DTD) or Schema
Archives: Encoded Archival Description - EAD version 1; EAD 2002
e-learning: IEEE Learning Object Metadata Schema (LOM)
Government: Council Roadworks Schema
DTDs/Schemas
A Document Type Definition or Schema defines the building blocks of an XML document
It specifies elements and attributes and defines how they can be used
People can agree to use a common DTD/schema for interchanging data
The XML document usually points to an external DTD/schema
Schemas
Schemas perform the same task as DTDs
Schemas use XML syntax
Schemas support complex data types
Schemas are extensible
One XML document can point to more than one schema
A simple XML document
<?xml version="1.0"?>
<note>
<to>Rachel</to>
<from>John</from>
<heading>Reminder</heading>
<body>Don't forget the concert!</body>
</note>
Example of a simple Schema
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://www.w3schools.com"
    xmlns="http://www.w3schools.com"
    elementFormDefault="qualified">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="heading" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
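To give a feel for what the schema's sequence rule means, the sketch below checks by hand that a note contains `to`, `from`, `heading` and `body` in that order. This is not schema validation: Python's standard library has no XML Schema validator, so the check is a deliberately crude stand-in that ignores namespaces and data types. In practice a validating parser or a third-party library would do this work. The names `follows_note_sequence` and `local_name` are hypothetical.

```python
import xml.etree.ElementTree as ET

# The child elements the schema's <xs:sequence> expects, in order.
EXPECTED_CHILDREN = ["to", "from", "heading", "body"]


def local_name(tag: str) -> str:
    """Drop any '{namespace}' prefix that ElementTree adds to a tag name."""
    return tag.rsplit("}", 1)[-1]


def follows_note_sequence(text: str) -> bool:
    """Crude stand-in for validation: root is <note>, children in the expected order."""
    root = ET.fromstring(text)
    if local_name(root.tag) != "note":
        return False
    return [local_name(child.tag) for child in root] == EXPECTED_CHILDREN


good_note = """<note>
<to>Rachel</to><from>John</from>
<heading>Reminder</heading><body>Don't forget the concert!</body>
</note>"""

# Well-formed XML, but <from> comes before <to>, which the sequence does not allow.
wrong_order = """<note>
<from>John</from><to>Rachel</to>
<heading>Reminder</heading><body>Don't forget the concert!</body>
</note>"""

print(follows_note_sequence(good_note))
print(follows_note_sequence(wrong_order))
```

The second note is perfectly well-formed, which is the point: well-formedness is about syntax, while validity is about following the agreed rules.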
What about display?
[Diagram: XML file + DTD or Schema → Valid XML → displayed as a "Blue Elephant Papers" description and as a "Blue Elephant Papers" entry in a Browse List]
Displaying XML
XML technologies – for displaying, retrieving, transforming, manipulating
DOM, SAX, XForms, XLink, XPointer
XSL-FO – Extensible Stylesheet Language Formatting Objects
XSLT – Extensible Stylesheet Language Transformations
CSS – a less sophisticated way to display XML
Transformation of XML
Transformation involves reading an XML file and an XSLT file into a processor, which can then generate some output – typically HTML
[Diagram: XML + XSLT → processor → HTML output]
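The idea of the transformation step (XML in, HTML out) can be sketched without an XSLT processor. The example below is not XSLT: it uses Python's standard `xml.etree.ElementTree` as a stand-in for the processor, and the function `catalog_to_html` plays the part that a stylesheet would normally play. Both the function name and the choice of HTML layout are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

CATALOG_XML = """<catalog>
<title>Papers of John Ruskin</title>
<date>1864-1888</date>
<extent>10 boxes</extent>
<location>University of London Library</location>
</catalog>"""


def catalog_to_html(xml_text: str) -> str:
    """Read a <catalog> record and generate a small HTML page from it."""
    catalog = ET.fromstring(xml_text)

    html = ET.Element("html")
    body = ET.SubElement(html, "body")

    # The title becomes the page heading.
    heading = ET.SubElement(body, "h1")
    heading.text = catalog.findtext("title")

    # Every other field becomes a list item.
    details = ET.SubElement(body, "ul")
    for child in catalog:
        if child.tag == "title":
            continue
        item = ET.SubElement(details, "li")
        item.text = f"{child.tag}: {child.text}"

    return ET.tostring(html, encoding="unicode", method="html")


page = catalog_to_html(CATALOG_XML)
print(page)
```

The XML record itself is untouched. A different function (or, with XSLT, a different stylesheet) could produce a summary entry or a print layout from the same data.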
HTML vs. XML
HTML is ONLY for display, typically in a Web browser
Browsers display XML but not necessarily as HTML (http://www.w3schools.com/xml/simple.xml)
HTML tags do not describe the content
Data cannot easily be extracted from HTML
Store the data separately as XML files and change the presentation with HTML
Why use XML?
International standard, supported by the W3C
A very widely used means of transmitting data
XML is open, licence free and platform neutral
XML is human and machine readable
XML documents are text documents: independent of hardware and software
More reasons to use XML
Separation of content and presentation
With proprietary systems content is inextricably bound up with format
Use XSLT (Extensible Stylesheet Language Transformations) to present XML data
Flexibility to manipulate and customise
…and hierarchy
Hierarchical structure
<collection>
  <part>
    <item>One item</item>
  </part>
</collection>
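A short sketch of how a program can follow this hierarchy, assuming Python's standard `xml.etree.ElementTree`; the helper name `outline` is made up for illustration. It walks the tree from the root down and indents each element by its depth.

```python
import xml.etree.ElementTree as ET


def outline(element, depth=0):
    """Return one indented line per element, parents before their children."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


collection = ET.fromstring(
    "<collection><part><item>One item</item></part></collection>"
)

for line in outline(collection):
    print(line)
```

The same few lines would work for a collection with many parts and items, which is what makes nested XML a natural fit for multi-level archival descriptions.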
…as well as sharing data
XML is the main basis for defining data exchange languages
Meaningful/consistent tags facilitate extraction
Different incompatible systems can access and use the same data
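As a small illustration of consistent tags making extraction and cross-searching possible, the sketch below searches the titles of two records as if they came from two different systems. The records, the variable names `SYSTEM_A` and `SYSTEM_B`, and the function `search_titles` are invented for the example, and Python's standard `xml.etree.ElementTree` is an assumed choice of tool.

```python
import xml.etree.ElementTree as ET

# Two records, imagined as coming from two separate systems that share the same tags.
SYSTEM_A = "<catalog><title>Papers of John Ruskin</title><date>1864-1888</date></catalog>"
SYSTEM_B = "<catalog><title>Blue Elephant Papers</title></catalog>"


def search_titles(documents, term):
    """Return the titles, across all documents, that contain the search term."""
    hits = []
    for text in documents:
        title = ET.fromstring(text).findtext("title", default="")
        if term.lower() in title.lower():
            hits.append(title)
    return hits


print(search_titles([SYSTEM_A, SYSTEM_B], "papers"))
print(search_titles([SYSTEM_A, SYSTEM_B], "ruskin"))
```

The search works only because both records use the same `<title>` tag, which is the practical benefit of agreeing on a common DTD or schema.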
Summary
XML must be well-formed, and should be valid against a DTD or Schema if it is to be exchanged reliably
DTDs and Schemas provide tags, attributes and rules
XML requires other technologies to display and process it
XSLT can transform XML
XML is simple, flexible and great for data exchange
It is an efficient route to a sustainable system