Computers & Geosciences 27 (2001) 839–849
XML: an opportunity for <meaningful> data standards in the geosciences
Simon W. Houlding*
Geoscience Modeling Consultant, 8625 Saffron Place, Burnaby, BC, Canada V5A 4H9
Received 16 June 1999; received in revised form 1 January 2000; accepted 20 June 2000
Abstract
Extensible markup language (XML) is a recently introduced meta-language standard on the Web. It provides the rules for development of metadata (markup) standards for information transfer in specific fields. XML allows development of markup languages that describe what information is rather than how it should be presented. This allows computer applications to process the information in intelligent ways. In contrast, hypertext markup language (HTML), which fuelled the initial growth of the Web, is a metadata standard concerned exclusively with presentation of information. Besides its potential for revolutionizing Web activities, XML provides an opportunity for development of meaningful data standards in specific application fields. The rapid endorsement of XML by science, industry and e-commerce has already spawned new metadata standards in such fields as mathematics, chemistry, astronomy, multimedia and Web micro-payments. Development of XML-based data standards in the geosciences would significantly reduce the effort currently wasted on manipulating and reformatting data between different computer platforms and applications and would ensure compatibility with the new generation of Web browsers. This paper explores the evolution, benefits and status of XML and related standards in the more general context of Web activities and uses this as a platform for discussion of its potential for development of data standards in the geosciences. Some of the advantages of XML are illustrated by a simple, browser-compatible demonstration of XML functionality applied to a borehole log dataset. The XML dataset and the associated stylesheet and schema declarations are available for FTP download. © 2001 Elsevier Science Ltd. All rights reserved.
Keywords: XML; Metadata; Standard; Geoscience; Internet
1. Introduction
The rapid expansion of the Web has been fuelled in large part by the successful introduction of hypertext markup language (HTML). Unfortunately, the very success of HTML has created new problems for the Web. Network traffic volume on the supposedly speed-of-light Internet is now such that it frequently moves at a crawl and, although nearly every possible kind of information is available somewhere on-line, it is becoming increasingly difficult to find the piece one needs.
Both problems arise from the nature of HTML. Despite being the most successful electronic-publishing language invented, HTML is superficial in its concern with information presentation as opposed to information content. HTML merely describes how a Web browser should arrange text and images on a page; it provides no useful information about the content itself. HTML’s concern with presentation makes it relatively easy to learn, but has inherent costs. The familiar phrase "what you see is what you get" ironically highlights the problem: with HTML, "what you see is all you’ve got!"
Dataset available from server at http://www.iamg.org/CGEditor/index.htm
*Tel.: +1-604-420-0811; fax: +1-604-420-6840.
E-mail address: [email protected] (S.W. Houlding).
0098-3004/01/$ - see front matter © 2001 Elsevier Science Ltd. All rights reserved.
PII: S0098-3004(00)00145-X
To work effectively with information, computers need to be told exactly what the information is, how it is related and how to deal with it. Extensible markup language (XML) is a new meta-language designed to do just that: to make information self-describing to the computer. This apparently simple change in how computers operate on and communicate information has the potential to dramatically increase the capacity and efficiency of the Web and to extend it beyond information delivery to many other kinds of human activity. The XML standard was completed in early 1998 by the W3C (World Wide Web Consortium) and is already spreading rapidly through certain science disciplines and industries ranging from manufacturing to medicine.
XML is based on the concept of metadata, i.e. data about data. Metadata is a description of the characteristics of data that have been collected for a specific purpose. XML employs metadata tags to describe what information is, not (like HTML) what it should look like. For example, HTML would tag the components of an order document for a shirt as boldface, paragraph, row and column; in contrast, an XML implementation tags them as price, size, quantity and color. Computer applications can then recognize the document as a customer order and take appropriate action: display it in appropriate ways to management or production, put it through an accounting system, and issue delivery instructions.
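The shirt order can be made concrete with a short sketch. The tag names price, size, quantity and color come from the example above; the enclosing order and item tags, the values and the two helper functions are assumptions introduced purely for illustration. The point is that an application finds each value by what it is, not by where it sits on a page.

```python
import xml.etree.ElementTree as ET

# Hypothetical order document. Only price, size, quantity and color are
# tag names taken from the text; everything else is made up.
ORDER_XML = """<order>
  <item>shirt</item>
  <price>24.50</price>
  <size>M</size>
  <quantity>3</quantity>
  <color>blue</color>
</order>"""


def order_total(xml_text):
    """Accounting view: price times quantity, located by tag name."""
    order = ET.fromstring(xml_text)
    price = float(order.findtext("price"))
    quantity = int(order.findtext("quantity"))
    return price * quantity


def delivery_note(xml_text):
    """Production view: a one-line instruction built from the same document."""
    order = ET.fromstring(xml_text)
    return "Ship {} x {} {} ({})".format(
        order.findtext("quantity"),
        order.findtext("color"),
        order.findtext("item"),
        order.findtext("size"),
    )


print(order_total(ORDER_XML))
print(delivery_note(ORDER_XML))
```

Neither function depends on how the order would be displayed, which is the property the HTML tagging (boldface, paragraph, row, column) cannot offer.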
Another advantage of XML is its reliance on the new Unicode standard, a character-encoding system that supports intermingling of text in the world’s major languages. In HTML a document is generally in one particular language, whether English, Japanese or Arabic. Applications that cannot read the characters of that language cannot do anything with the document. But applications that read XML properly can deal with any combination of any of these character sets. Thus, XML will enable exchange of information not only between different computer systems but also across national and cultural boundaries.
XML has implications that extend well beyond the Web and browser applications. It has already gained significant acceptance within e-commerce, industry and certain science disciplines as a data standard for interfacing between computer applications. This is largely because the XML standard includes specifications of how an XML document should be parsed and represented within a computer (any computer, irrespective of type or operating system). This internal representation of an XML document, called the document object model (DOM), allows a single document to be accessed in the same way by different applications running on different computer platforms. XML parsers (the software that creates the DOM) are now readily available for incorporation into application software in most programming languages.
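As a hedged illustration of the last two paragraphs, the sketch below writes one small document in two Unicode encodings and reads both back with the parser shipped in Python's standard library. The greeting tag and the sample text are invented for the example; the only claim illustrated is that a conforming parser hands the application the same content regardless of how the bytes were encoded.

```python
import xml.etree.ElementTree as ET

# Made-up sample text mixing English, Japanese and Arabic in one element.
TEXT = "Hello / こんにちは / مرحبا"


def encode_document(encoding):
    """Serialize the same one-element document in the given encoding."""
    doc = '<?xml version="1.0" encoding="{}"?><greeting>{}</greeting>'.format(
        encoding, TEXT
    )
    return doc.encode(encoding)


def read_greeting(raw_bytes):
    """Parse raw bytes; the parser determines the encoding itself."""
    return ET.fromstring(raw_bytes).text


utf8_bytes = encode_document("UTF-8")
utf16_bytes = encode_document("UTF-16")

# The byte streams differ, but the parsed content is identical.
print(utf8_bytes != utf16_bytes)
print(read_greeting(utf8_bytes) == read_greeting(utf16_bytes) == TEXT)
```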
All of these advantages combine to provide an ideal platform for development of long-awaited and meaningful data standards in the geosciences. The benefits in terms of efficiencies resulting from being able to move information freely between computer applications require little elaboration. The additional benefits of browser compatibility, being able to display a single dataset (a borehole log or map for example) in different ways for different audiences, and the ability to conduct meaningful and efficient information searches are added incentives.
This paper traces the evolution of XML, discusses the benefits and advantages and summarizes the current status of XML and related standards. This provides a platform for discussion of XML, and its potential for development of new data standards, in a geoscience context. The paper concludes with a demonstration of XML based on a simple borehole log example. The references include links to a number of useful XML-related websites for readers who wish to follow up with their own XML research or to acquire XML-related software.
2. XML evolution
XML has evolved from standard generalized markup language (SGML), a meta-language (language about languages) which in turn evolved within the printing and publishing industries. HTML is an implementation of SGML developed specifically for presentation and linking of documents on the Web. It has been so successful that its limitations are beginning to restrict Web growth. Hence the need for a more generalized approach like XML.
2.1. SGML and the publishing industry
For generations, printers and editors scribbled notes on manuscripts to instruct typesetters. This "markup" evolved on its own until the mid-1980s, when it was formalized as SGML, an International Organization for Standardization (ISO) approved standard for creation of new markup languages.
SGML has since proved useful in many large publishing applications where it is used to define the structure of electronic documents. HTML was defined using SGML when the need for a simple markup language arose on the Web. The problem with SGML is that it is too general and full of features designed to minimize keystrokes in an era when every byte had to be accounted for. It is more complex than Web browsers and average users can cope with.
2.2. HTML and the World Wide Web
HTML is an implementation of SGML designed to provide Web authors with a relatively simple and efficient means of publishing documents for Web distribution. The SGML declaration for HTML is implicit among Web implementations.
In HTML documents, tags define the start and end of documents, headings, paragraphs, lists, hypertext links, etc. HTML elements are generally identified in a document as a start tag, which gives the element name and attributes, followed by the content, followed by an end tag. Start tags are delimited by < and >, and end tags are delimited by </ and >. For example:
<H1>This is a Heading</H1>
<P>This is a paragraph</P>
The content of an element is a sequence of characters (text) and nested elements. Some elements, such as anchors, cannot be nested. The content model for a tag defines the syntax permitted for the content. HTML is designed to be flexible in that the closing tags of some elements may be omitted when they are clearly implied by the context, and tags and their attributes are case insensitive.
HTML has become the lingua franca for publishing hypertext on the Web. It is a non-proprietary format that can be created and processed by a wide range of tools, from simple plain text editors to sophisticated WYSIWYG authoring tools and Web browsers. In terms of what it was originally designed to do and its acceptance by the Web community, HTML has been highly successful. However, it tells the computer nothing about the content of a document other than how it should be displayed.
This is extremely wasteful in terms of computer processing. Client-side computers are reduced to platforms for document display, and server-side computers are required to endlessly produce and communicate documents to feed the demand. It is also wasteful in terms of Web search efficiency. With HTML a search engine cannot distinguish between references to a book by Benjamin Franklin and a book about Benjamin Franklin, which is why the results of a Web search are invariably cluttered with many useless and inappropriate links.
2.3. Separation of content from style
The solution is simple: use tags that say what the information is, not how it looks, and separate the content of a document from its presentation (or style). XML does exactly this: it allows use of tags that are descriptive of the contents of a document and it separates the description of structure and content from information concerning presentation. The former is in the document, while the latter is in a stylesheet that the document links to. This makes it much easier to have, and to change, a common presentation across a set of documents, or to have different presentations of the same information for different audiences. Only one stylesheet is required to render many XML documents; conversely, a single XML document may be rendered in many ways by different stylesheets.
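The separation can be sketched in a few lines. This is not an XSL stylesheet: the two Python functions below merely stand in for two stylesheets, and their names and output formats are assumptions. The content uses the NAME, PROPERTY and DATE tags from the borehole example in Section 2.4.

```python
import xml.etree.ElementTree as ET

# Content only: no presentation information appears in the document.
IDENTITY_XML = """<IDENTITY>
  <NAME>1008</NAME>
  <PROPERTY>Las Estrellas (Norte)</PROPERTY>
  <DATE>November 13 1998</DATE>
</IDENTITY>"""


def render_plain(root):
    """Stand-in for stylesheet 1: a one-line plain text summary."""
    return "Borehole {} at {}, dated {}".format(
        root.findtext("NAME"), root.findtext("PROPERTY"), root.findtext("DATE")
    )


def render_html(root):
    """Stand-in for stylesheet 2: an HTML fragment for a Web page."""
    return "<h1>Borehole {}</h1><p>{}, {}</p>".format(
        root.findtext("NAME"), root.findtext("PROPERTY"), root.findtext("DATE")
    )


root = ET.fromstring(IDENTITY_XML)
for render in (render_plain, render_html):
    print(render(root))
```

Changing either presentation means editing one function, while the document itself is untouched; the same functions would also render any other document that uses these tags.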
2.4. XML and the next-generation Web
XML was created by removing frills from SGML to arrive at a more streamlined, digestible meta-language. XML consists of simple rules that allow a markup language (tag-set) to be created from scratch. The rules ensure that a single compact program, called a parser, can process any conforming language.
Whereas HTML is a tag-set built from the SGML meta-language, XML is not a tag-set at all; rather, it is a more easily used form of meta-language, derived directly from SGML. XML does not provide a set of tags to use, as HTML does; instead it provides rules for building tag-sets that suit information requirements. For example, with XML a computer application can readily distinguish between <author>Benjamin Franklin</author> and <subject>Benjamin Franklin</subject>.
Another key difference is that an XML-compliant browser has no hard-coded knowledge of the tag-set in a document that it will be expected to display. Instead, the XML document includes a pointer to a stylesheet, a file that accompanies the document and defines how the content of the tags should be rendered. The XML document may (optionally) also contain a pointer to a document type definition (DTD), a declaration of allowable tags, dependencies and content type (more on this later).
The XML rules can be summarized as follows:
* every XML document must have a root element (tag) that encloses the contents;
* every start tag must have a closing tag;
* tags must nest cleanly;
* empty tags have a different form to make it clear that these are tags with no closing tag;
* all attribute values must be in quotation marks;
* tags are case sensitive and must match;
* XML documents should begin with a declaration that signals what they are.
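Most of these rules can be checked mechanically by handing a document to any conforming parser. The sketch below uses the parser in Python's standard library; the is_well_formed helper and the sample snippets are assumptions made up for the illustration, one snippet per rule.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Made-up snippets: one conforming document, then one violation per rule.
samples = {
    "conforming": '<?xml version="1.0"?><log><depth units="m">12</depth><core/></log>',
    "missing closing tag": "<log><depth>12</log>",
    "tags do not nest": "<log><a><b></a></b></log>",
    "attribute not quoted": "<log><depth units=m>12</depth></log>",
    "case mismatch": "<log><Depth>12</depth></log>",
    "two root elements": "<log/><log/>",
}

for label, text in samples.items():
    print(label, is_well_formed(text))
```

Note that <core/> in the first snippet is the special form for an empty tag mentioned in the list.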
An XML document that conforms to these rules, as determined by an XML parser, is classified as well-formed. The rules are significantly stricter than those implemented within HTML, i.e. current HTML documents do not satisfy the XML rules. This does not mean that HTML will be replaced by XML, since HTML is still useful for presentation purposes. HTML will remain in its current form but will converge towards XML-conforming HTML.
The nesting rule automatically forces a certain simplicity on every XML document, which takes on the structure known in computer science as a tree. As with a genealogical tree, each graphic and bit of text in the document represents a parent, child or sibling of some other element; relationships are unambiguous. Trees cannot represent every kind of information, but they can represent most kinds that computers are required to understand. A tree representation of information, moreover, makes it extremely convenient for programmers to generate software for accessing the information. For example, the XML data for a portion of a borehole log document might be:
<BOREHOLE>
  ...
  <IDENTITY>
    <NAME>1008</NAME>
    <PROPERTY>Las Estrellas (Norte)</PROPERTY>
    <DATE>November 13 1998</DATE>
  </IDENTITY>
  ...
</BOREHOLE>
The graphical tree representation of this information is shown in Fig. 1.
This tree representation of XML content, which is generated within a computer by an XML parser, is called the DOM. It is a key component of the XML standard and provides an efficient basis for software applications to access and manipulate the XML content in standard ways through XML tag references. In a software context, the DOM is the application programming interface (API) for an XML document.
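A minimal sketch of this idea, using the DOM implementation in Python's standard library, is given below. The document is the IDENTITY portion of the borehole example, with the elided content simply left out; the outline helper is an assumption written for this illustration, not part of the DOM standard.

```python
from xml.dom import minidom

# IDENTITY portion of the borehole log example from the text.
BOREHOLE_XML = """<BOREHOLE>
  <IDENTITY>
    <NAME>1008</NAME>
    <PROPERTY>Las Estrellas (Norte)</PROPERTY>
    <DATE>November 13 1998</DATE>
  </IDENTITY>
</BOREHOLE>"""


def outline(node, depth=0):
    """Return (depth, tag, text) for every element below node, in document order."""
    rows = []
    for child in node.childNodes:
        if child.nodeType != child.ELEMENT_NODE:
            continue  # text nodes are gathered from their parent element below
        text = "".join(
            n.data for n in child.childNodes if n.nodeType == n.TEXT_NODE
        ).strip()
        rows.append((depth, child.tagName, text))
        rows.extend(outline(child, depth + 1))
    return rows


document = minidom.parseString(BOREHOLE_XML)
for depth, tag, text in outline(document):
    print("  " * depth + tag + (": " + text if text else ""))

# Any element can also be reached directly through its tag name,
# and its parent and siblings are then one step away in the tree.
name_node = document.getElementsByTagName("NAME")[0]
print(name_node.parentNode.tagName)
```

The indented listing printed by the loop is a text rendering of the same tree that Fig. 1 shows graphically.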
In a browser context, the DOM is a platform- and language-neutral interface that allows software scripts to dynamically access and update the content, structure and style of documents. The document can be further processed and the results of that processing can be incorporated back into the presented page. Thus the content of an XML document can be manipulated (formatted, re-calculated, sorted, etc.) within a client-side browser to suit varying presentation requirements.
3. XML advantages
The combination of more efficient distribution of processing, more accurate searching and more flexible linking will revolutionize the structure of the Web and make possible completely new ways of accessing information. Users will find this new Web faster, more powerful and more useful than the Web of today.
Of equal importance, the XML DOM will standardize the way in which information is passed between computer applications, both on the Web and beyond.
3.1. Reduced web traffic
As XML spreads, the Web will become noticeably more responsive. At present, client-side computers connected to the Web, whether they are powerful desktops or handheld devices, cannot do much more than get an HTML form, fill it out and then swap it back and forth with a Web server until a task is completed. The structural and semantic information that can be added with XML allows these client-side devices to do much more processing themselves, without recourse to a Web server. All of the information required for a particular client operation can be dispatched by the server as a single dataset to be processed and presented in different ways by the client computer. This will significantly reduce both network traffic and the load on Web servers.
3.2. More efficient Web search functionality
As more of the information on the Net is labeled with field-specific XML tags, it will become easier to find exactly what is needed. Librarians determined a long time ago that the way to find information quickly is to look not at the information itself but rather at much smaller, more focused sets of data that point to useful sources; hence the library card catalogue, a metadata approach.
From the outset, part of the XML project has been to create a standard for metadata itself. Resource description framework (RDF) will do for Web data what catalogue cards do for library books. RDF integrates a variety of web-based metadata activities including sitemaps, content ratings, channel definitions, search engine data collection (web crawling), digital library collections and distributed authoring, using XML as an interchange syntax. Deployed across the Web, RDF metadata will make searching much faster and more accurate than it is at present.
Fig. 1. Tree representation of XML metadata tags and contents.
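The gain in search precision can be sketched with the author and subject tags used in Section 2.4. The catalogue below is hypothetical: the catalogue, book and title tags, the titles and the second author are placeholders, and the two search functions are assumptions written for the comparison. This is not RDF, only an illustration of searching on tagged content.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue; only <author> and <subject> come from the text.
CATALOGUE_XML = """<catalogue>
  <book>
    <title>Book A</title>
    <author>Benjamin Franklin</author>
    <subject>Electricity</subject>
  </book>
  <book>
    <title>Book B</title>
    <author>Jane Doe</author>
    <subject>Benjamin Franklin</subject>
  </book>
</catalogue>"""


def titles_where(root, tag, value):
    """Tag-aware search: titles of books whose <tag> element equals value."""
    return [
        book.findtext("title")
        for book in root.iter("book")
        if book.findtext(tag) == value
    ]


def text_search(root, value):
    """Tag-blind search: titles of books mentioning value anywhere."""
    return [
        book.findtext("title")
        for book in root.iter("book")
        if value in "".join(book.itertext())
    ]


root = ET.fromstring(CATALOGUE_XML)
print(titles_where(root, "author", "Benjamin Franklin"))   # books by Franklin
print(titles_where(root, "subject", "Benjamin Franklin"))  # books about Franklin
print(text_search(root, "Benjamin Franklin"))              # cannot tell them apart
```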
3.3. More efficient Web linking
Hyperlinks will also do more when powered by XML. A standard for XML-based hypertext, named extensible linking language (XLL), will provide a choice from a list of multiple destinations. Other kinds of hyperlinks will allow insertion of linked text or images within a displayed document, instead of forcing a move to a new document.
Perhaps most useful, XLL will enable authors to use indirect links that point to entries in a central database rather than to the linked documents themselves. When a document’s address changes, the author will be able to update all the links that point to it by editing just one database record. This will significantly reduce the familiar "404 File Not Found" error that signals a broken hyperlink.
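The indirection described above can be sketched independently of any linking syntax. The fragment below is a plain Python analogy, not XLL: the registry dictionary stands in for the central database, and every key, document name and URL is a placeholder.

```python
# Hypothetical link registry standing in for the "central database".
link_registry = {"log-1008": "http://example.org/archive/1008.xml"}

# Each document stores only the stable key, never the address itself.
documents = {
    "site-report": ["log-1008"],
    "drilling-summary": ["log-1008"],
}


def resolve_links(document_name):
    """Look up the current address of every link in a document."""
    return [link_registry[key] for key in documents[document_name]]


before = resolve_links("site-report")

# The target moves: one registry record is edited, no document is touched.
link_registry["log-1008"] = "http://example.org/logs/1008.xml"

after = [resolve_links(name) for name in documents]
print(before)
print(after)
```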
3.4. Meaningful data standards
Both on the Web and beyond, the greatest impact of XML is likely to be in data transfer efficiencies achieved through development of new data standards. These will be based on the ready availability of XML parsers for all software languages and computer platforms, and the ease with which the information in the XML DOM can be accessed.
XML allows anyone to design a new, custom-built markup language, but designing languages is a challenge that cannot be undertaken lightly. The design is just the beginning: the meanings of tags are not going to be obvious to others unless accompanied by declarations that explain them, nor to computers unless given software to process them.
What XML does is simple but effective. It lays down ground rules that strip away a layer of programming detail so that users with similar interests can concentrate on the hard part: agreeing on how they want to represent the information they commonly exchange. This is not an easy problem to solve, but it is not a new one either.
Such agreements will be made, because the proliferation of incompatible computer systems and applications has imposed delays, costs and confusion on nearly every area of human activity. Users want to share information and ideas and do business without all having to use the same computer platform and software; field-specific interchange languages go a long way toward making that possible.
Before drafting a new markup language with XML, designers must agree on three things: which tags will be allowed, their content type and how tagged elements may nest within one another. These declarations are typically codified in a DTD. An XML document that conforms to a DTD, as determined by an XML parser, is classified as "valid". The XML standard does not compel language designers to use a DTD, but most new markup languages will probably have them, because they make it much easier for programmers to write software applications that understand the tags and do intelligent things with the content. A DTD, in effect, becomes the metadata standard (or schema) for a particular field of activity. HTML, for instance, has a DTD that is incorporated into all Web browsers.
4. XML and related standards
The W3C was founded in October 1994 to lead the Web to its full potential by developing common protocols that promote its evolution and ensure its interoperability. It is funded by member organizations, and is vendor neutral, working with the global community to produce specifications and reference software that is made freely available throughout the world.
4.1. Extensible markup language (XML)
The current W3C recommendation is XML 1.0, February 1998. As announced by W3C:
"XML is primarily intended to meet the requirements of large-scale Web content providers for industry-specific markup, vendor-neutral data exchange, media-independent publishing, one-on-one marketing, workflow management in collaborative authoring environments, and the processing of Web documents by intelligent clients. It is also expected to find use in certain metadata applications. XML is fully internationalized for both European and Asian languages, with all conforming processors required to support the Unicode character set in both its UTF-8 and UTF-16 encodings. The language is designed for the quickest possible client-side processing consistent with its primary purpose as an electronic publishing and data interchange format."
XML includes recommendations for the DOM, DTD and XLL. However, the DTD and XLL specifications in particular are still evolving and subject to change.
4.2. Extensible stylesheet language (XSL)
XSL is a language for expressing stylesheets and is itself an implementation of XML. As defined by W3C in the current working draft, it consists of two parts:
* a language for transforming XML documents;
* an XML vocabulary for specifying formatting semantics.
An XSL stylesheet specifies the browser presentation of XML documents by describing how an instance is transformed into an XML document that includes the formatting vocabulary. Thus XSL includes appropriate tags for selection, evaluation, formatting and presentation of XML content.
4.3. Emerging XML-based data standards
There is already a rapidly developing body of XML tag-sets and applications. MathML is a tag-set that allows integration and display of mathematical expressions in a browser document. Similarly, chemML allows use of molecular symbols. W3C announced in February 1999 that it has initiated the process to define extensible 3D (X3D), a next-generation 3D standard for virtual reality modeling language (VRML) that includes integration with XML.
Microsoft is using XML to create the channel definition format for managing its browser-based subscription news channels. There are also specialized XML tag-sets in existence for astronomy (astronomical markup language, ASL) and DNA sequencing (bioinformatic markup language, BML).
The synchronized multimedia integration language (SMIL) is already a W3C recommendation. This employs an XML implementation to coordinate, integrate and synchronize digital files in different media (video, audio, images and text) into a multimedia presentation, with tags that control what to play, when and for how long.
In the context of e-commerce, W3C has a working draft for a Common Markup for Web Micropayment Systems. This specification provides an extensible way to embed in a Web page all the information necessary to initialize a micropayment (amounts, currencies, payment systems, etc.).
4.4. Available XML software applications
Microsoft has released Internet Explorer 5 (beta) as an XML-compliant browser, with its own interpretation of the W3C working drafts for XSL and DTD, i.e. these implementations are still subject to change. Netscape has announced imminent release of its own XML-compliant browser.
A variety of shareware and commercial XML document editors, XSL stylesheet editors and DTD editors are already available. XML parsers are also available as shareware for incorporation into application software. IBM recently announced the availability of the first XML-powered search engine.
Major database software vendors are implementing XML interfaces, as evidenced by Oracle’s stated strategy of delivering a platform for software developers to build and deploy scalable Web applications that exploit XML. More recently, Oracle announced a complete infrastructure, based on XML, for the exchange and management of information associated with all aspects of e-commerce.
5. Data standards in the geosciences
5.1. Current focus on GIS standards
The principal focus in the development of data standards in the geosciences has been on GIS datasets, involving both raster and vector data types. As emphasized by Albrecht (1999), there is a plethora of organizations concerned with GIS standards and a corresponding number of proposed standards, i.e. there is no real standard in the strict meaning of the term. With no apologies for the ever-growing propensity for acronyms, the list of proposed standards includes DIGEST, GDF, SAIF, SDTS, CEN/TC287, ISO/TC211, OGIS, SQL3-MM, GRIB and BUFR. As noted by Huber and Schneider (1999), many of these standardization efforts are dominated by efforts to prolong the life of legacy systems rather than to ensure the interoperability of GIS datasets.
On the brighter side, much useful work, based on object-oriented approaches, has been accomplished by these standardization efforts in terms of establishing the data types and data dependencies inherent to GIS datasets. XML provides a relatively simple means of leveraging these accomplishments to achieve meaningful data standards.
Although a good place to start, GIS datasets are only part of the story; they are generally limited to 2D (at best 2.5D), and 3D datasets such as borehole logs and geophysical surveys must also be provided for. Anyone familiar with computer applications in the geosciences is only too painfully aware of the scarcity of appropriate data standards. Considerably more time is expended on manipulating and reformatting data between applications than on running the applications themselves.
Despite the volume of data processed, there have as yet, with the exceptions discussed in Section 5.4, been no concerted efforts to develop comprehensive metadata standards in the geosciences. This applies to all of the basic data types dealt with, from vector maps, DTMs and raster images to borehole and well logs and geophysical surveys. As stated by Strand (1995) in a GIS context:
"Until definition and standards are adopted, the definition and development of GIS applications will remain an admixture of art, science and perspiration."
5.2. XML simplification of the development process
Emergence of the XML standard, and its enthusiastic reception by science, industry and commerce, provides an ideal opportunity to rectify this situation. This comes about not necessarily because of the advantages of Web compatibility (see Section 5.3), but rather because of the ready availability of XML parsers (APIs) for interfacing with XML content in a standard way. As a result, XML simplifies the task of developing a new metadata language to the following steps:
* agreement on tag names, e.g. <BOREHOLE>, <INTERVAL>, <SAMPLE>;
* formulation of tag dependencies, e.g. a BOREHOLE must have only one IDENTITY, but may have zero or many downhole INTERVALs, each of which may have zero or many DATA or TEXT elements containing sample values and observations;
* specification of tag contents, e.g. a <NAME> tag contains text, a <DISTANCE> tag contains a fixed value, a <BOREHOLE> tag only contains other tags.
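The dependency and content rules listed above can be sketched as a small conformance check. This is a hand-written stand-in for DTD validation, not a DTD and not a validating parser: the function names, the sample documents and their values are assumptions, and "only one IDENTITY" is read here as exactly one.

```python
import xml.etree.ElementTree as ET


def check_borehole(xml_text):
    """Return a list of rule violations; an empty list means none were found."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != "BOREHOLE":
        problems.append("root element must be BOREHOLE")
    identities = root.findall("IDENTITY")
    if len(identities) != 1:
        problems.append(
            "expected exactly one IDENTITY, found {}".format(len(identities))
        )
    # A BOREHOLE may hold only other tags, so no stray character data.
    stray = [root.text] + [child.tail for child in root]
    if any(s and s.strip() for s in stray):
        problems.append("BOREHOLE must contain only other tags")
    return problems


def count_observations(xml_text):
    """Number of DATA or TEXT elements in each INTERVAL (zero or many allowed)."""
    root = ET.fromstring(xml_text)
    return [
        len(interval.findall("DATA")) + len(interval.findall("TEXT"))
        for interval in root.findall("INTERVAL")
    ]


# Made-up documents for illustration only.
GOOD = (
    "<BOREHOLE><IDENTITY><NAME>1008</NAME></IDENTITY>"
    "<INTERVAL><DATA>1.2</DATA><TEXT>sandstone</TEXT></INTERVAL>"
    "<INTERVAL/></BOREHOLE>"
)
BAD = "<BOREHOLE>stray text<INTERVAL/></BOREHOLE>"

print(check_borehole(GOOD), count_observations(GOOD))
print(check_borehole(BAD))
```

In practice these rules would be declared once in a DTD or schema and enforced by a validating parser, so that individual applications need not repeat checks of this kind.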
Once these specifications have been accepted and published, any application can access the contents of a conforming document (or dataset) by incorporating an XML parser. The specifications should also be declared in a DTD or schema, firstly so that documents can be validated, and secondly so that the DTD can be made available to the appropriate applications to ensure conformance.
As stated earlier, one of XML’s advantages is that it separates content from presentation (style). This eliminates from the development process any concern with how the content is presented and allows development of standards to focus on the principal objective of ensuring the interoperability of content between applications. Initially at least, concern with processing, presentation and development of stylesheets (or their equivalent) then becomes the responsibility of individual applications and audiences. This separation of concern with content from concerns with processing and presentation is a necessary simplifying step in the successful development and introduction of meaningful standards.
5.3. Advantages of Web compatibility
Just as HTML fuelled the first-generation Web, XML is set to fuel the next generation. The Web is already a primary means of communicating and disseminating information. The advantages of compatibility between geoscience datasets and the Web appear obvious and are likely to become even more so. The advantages of being able to perform an intelligent search on available datasets alone are significant. The current lack of appropriate graphical display functionality for geoscience datasets will be alleviated by the imminent release of X3D, an XML implementation of VRML. And for those concerned with security of information, the Web already makes better provision than most proprietary software applications.
5.4. Existing metadata implementations
Several ongoing attempts to develop metadata standards in the geosciences are discussed briefly below.
Starting in 1996, the Federal Geographic Data Committee (FGDC) in the US has developed a "Content Standard for Digital Geospatial Metadata" based on SGML:
"The objectives of the standard are to provide a common set of terminology and definitions for the documentation of digital geospatial data. The standard establishes the names of data elements and compound elements (groups of data elements) to be used for these purposes, the definitions of these compound elements and data elements, and information about the values that are to be provided for the data elements.
"The standard was developed from the perspective of defining the information required by a prospective user to determine the availability of a set of geospatial data, to determine the fitness of the set of geospatial data for an intended use, to determine the means of accessing the set of geospatial data, and to successfully transfer the set of geospatial data. As such, the standard establishes the names of data elements (tags) and compound elements to be used for these purposes, the definitions of these data elements and compound elements, and information about the values that are to be provided for the data elements. The standard does not specify the means by which this information is organized in a computer system or in a data transfer, nor the means by which this information is transmitted, communicated, or presented to the user."
The standard is limited to GIS datasets and, based on available discussion, appears overly complex (it employs a total of 334 different tags). The complaints regarding complexity are likely due at least in part to its use of SGML and dependence on SGML software tools.
The Australia and New Zealand Land Information Commission (ANZLIC) has adopted a simpler approach, originally based on SGML and now compatible with XML:
ANZLIC, through its Metadata Working Group, is actively pursuing an objective to implement a distributed national directory system to form a
foundation for the Australian and New Zealand Spatial Data Infrastructures. The various State and Commonwealth jurisdictions are currently collecting metadata, as per the ANZLIC metadata standard (1996), to provide an extensive national picture of available spatial data which is available through a distributed directory of the Australian Spatial Data Directory and accessible over the Internet, managed jointly by all the ANZLIC jurisdictions.
The US approach, developed by the Federal Geographic Data Committee (FGDC), specifies the structure and expected content of some 220 elements (tags) which are intended to describe digital geospatial datasets adequately for all purposes. The ANZLIC approach is deliberately less ambitious than what has been attempted in the US. Arguments advanced in support of the more modest objective rely on experience to date with the creation of high-level directories in Australia.
Users need a level of detail, clarity and accuracy in the metadata sufficient for them to judge whether or not to make further inquiries of the contact organisation responsible for a dataset. Maintaining a comprehensive directory, however, imposes a significant burden on custodians. Experience indicates that a balance needs to be struck between these two factors.
The ANZLIC standard is primarily concerned with information regarding dataset accessibility and quality rather than the dataset itself. In summary, both of the metadata implementations discussed above are, by design, closer in function to information retrieval systems than to operational data standards.
6. XML borehole data demonstration
A small borehole dataset is employed for a simple demonstration of XML functionality in a Web browser. The demonstration is performed with Microsoft’s Internet Explorer 5 (beta) and includes both an XSL stylesheet and a DTD (or schema in Microsoft terminology). It should be noted that certain features of the current Internet Explorer 5 implementation of XML and related standards are not yet fully compliant with W3C recommendations due to the evolutionary status of the standards. However, the principles of XML are adequately demonstrated.
Fig. 2. Tree representation of XML tags and contents in XML borehole data demonstration file with Microsoft XMLNotepad editor.
The XML dataset is included in file borehole.xml. A color-coded presentation of portions of the dataset is shown in Fig. 3. This is one of the presentation options provided by the stylesheet, which in reality is several stylesheets in one. Both the stylesheet and the DTD are referenced at the head of the XML document.
The XML dataset was compiled with Microsoft’s XMLNotepad editor. A tree representation of the XML tag dependencies is shown in Fig. 2. This is a graphical representation of the DOM produced by an XML parser.
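As a rough illustration of the tree that a parser builds, the following Python sketch parses a small borehole fragment and lists its elements indented by depth. The element names follow those described for the demonstration, but the fragment, its values and the helper name outline are assumptions made for illustration and are not taken from borehole.xml. Python's ElementTree is used here in place of the W3C DOM interface.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: element names follow the paper's description,
# but the values are invented for illustration.
SAMPLE = """
<BOREHOLE>
  <IDENTITY>
    <NAME>BH-01</NAME>
    <PROPERTY>Example Property</PROPERTY>
    <DATE>1999-06-16</DATE>
  </IDENTITY>
  <INTERVAL>
    <FROM>0.0</FROM>
    <TO>2.5</TO>
    <TEXT>Overburden</TEXT>
  </INTERVAL>
</BOREHOLE>
"""


def outline(element, depth=0):
    """Return one line per element, indented by its depth in the tree."""
    text = (element.text or "").strip()
    label = element.tag + (": " + text if text else "")
    lines = ["  " * depth + label]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(SAMPLE)
tree_lines = outline(root)
print("\n".join(tree_lines))
```

The indented listing plays the same role as the tree view in Fig. 2: container elements appear as branches and text-bearing elements as leaves.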
Features demonstrated by this simple XML implementation include the following:
* Variable stylesheet formatting of XML content: For example, the Lithology values in the interval records shown in Fig. 4 are color-coded according to value and the interval records are highlighted according to a cut-off test applied to the Lead values.
* Stylesheet formatting and ordering of XML content based on data values: For example, the interval records shown in Fig. 4 can be dynamically re-ordered and sorted in ascending order by clicking on one of the column headers.
* Stylesheet computation of new values based on document content: For example, the $Equivalent values included in the interval records shown in Fig. 4 are obtained by combining the Lead and Zinc values in a mathematical expression.
* Dynamic reformatting via script manipulation of the stylesheet DOM: For example, the stylesheet accommodates client-side presentation of the content either in a color-coded XML listing format (refer Fig. 3) or in a simple borehole log format (refer Fig. 4); the stylesheet also accommodates optional display of Collar and Survey information.
Fig. 3. Optional color-coded presentation of XML tags, attributes and contents of XML borehole data demonstration file with Microsoft Internet Explorer 5 browser.
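The first three features in the list above (a cut-off test, sorting on a data value, and a computed value) can be sketched outside the browser in a few lines of Python. This is not the XSL used in the demonstration. The name attribute on DATA elements, the sample values, the cut-off and the weighting factors are all assumptions chosen only to make the example run.

```python
import xml.etree.ElementTree as ET

# Hypothetical interval records; the name attribute on DATA is an assumption.
SAMPLE = """
<BOREHOLE>
  <INTERVAL><FROM>0.0</FROM><TO>2.0</TO>
    <DATA name="Lead">1.5</DATA><DATA name="Zinc">4.0</DATA></INTERVAL>
  <INTERVAL><FROM>2.0</FROM><TO>4.0</TO>
    <DATA name="Lead">6.0</DATA><DATA name="Zinc">2.0</DATA></INTERVAL>
  <INTERVAL><FROM>4.0</FROM><TO>6.0</TO>
    <DATA name="Lead">3.0</DATA><DATA name="Zinc">1.0</DATA></INTERVAL>
</BOREHOLE>
"""

LEAD_CUTOFF = 2.5   # assumed cut-off for highlighting, illustration only
LEAD_WEIGHT = 10.0  # assumed value per unit of Lead
ZINC_WEIGHT = 5.0   # assumed value per unit of Zinc


def read_intervals(root):
    """Build one record per INTERVAL with a computed value and a highlight flag."""
    records = []
    for interval in root.findall("INTERVAL"):
        values = {d.get("name"): float(d.text) for d in interval.findall("DATA")}
        lead, zinc = values["Lead"], values["Zinc"]
        records.append({
            "from": float(interval.findtext("FROM")),
            "to": float(interval.findtext("TO")),
            "lead": lead,
            "zinc": zinc,
            # New value computed from document content
            "equivalent": LEAD_WEIGHT * lead + ZINC_WEIGHT * zinc,
            # Cut-off test applied to the Lead value
            "highlight": lead >= LEAD_CUTOFF,
        })
    return records


records = read_intervals(ET.fromstring(SAMPLE))
# Re-order the records in ascending order of one column, here Lead
by_lead = sorted(records, key=lambda r: r["lead"])
for r in by_lead:
    flag = "*" if r["highlight"] else " "
    print(f'{flag} {r["from"]:5.1f} {r["to"]:5.1f} {r["lead"]:5.1f} '
          f'{r["zinc"]:5.1f} {r["equivalent"]:7.1f}')
```

Because the source data are left untouched and every view is derived from them, changing the sort key or the cut-off only changes the presentation, which is the point the stylesheet demonstration makes.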
The stylesheet for the demonstration is included in file borehole.xsl. It employs XSL tags to wrap HTML presentation and style information around the XML content prior to browser display (see Harold (1998), Pardi (1999)). Stylesheet access to the XML document DOM is achieved through Javascript code embedded directly in the stylesheet. Dynamic reformatting of the browser presentation is achieved by Javascript code embedded in the resulting HTML document by the stylesheet. This code accesses the stylesheet DOM to reset the values of display flags and sort criteria. It should be noted that XSL is itself an implementation of XML and therefore has its own accessible DOM.
Fig. 4. Optional borehole log presentation of contents of XML borehole data demonstration file with Microsoft Internet Explorer 5 browser.
The schema (DTD) for the demonstration is included in file borehole-schema.xml. It specifies tag dependencies and content requirements. As an example of how the schema works: a <BOREHOLE> element (tag) has no content other than other elements, one of which must be an <IDENTITY> element which must include one each of <NAME>, <PROPERTY> and <DATE> elements, each of which contains text. In contrast, a <BOREHOLE> element may contain many <INTERVAL> elements, each of which must contain one each of <FROM> and <TO> elements and may optionally contain zero or many <DATA> elements (containing fixed values) and zero or many <TEXT> elements (containing text).
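The content rules just described can be expressed as a small hand-written check, sketched below in Python. A validating parser would normally enforce such rules from the schema itself; the standard-library ElementTree parser does not validate, so the rules are coded directly here. The function name check_borehole, the error messages and both sample documents are assumptions for illustration and do not reproduce borehole-schema.xml.

```python
import xml.etree.ElementTree as ET


def check_borehole(root):
    """Return a list of violations of the content rules described in the text."""
    if root.tag != "BOREHOLE":
        return ["root element must be BOREHOLE"]
    errors = []
    # BOREHOLE may hold elements only, so any stray text is a violation
    stray = [(root.text or "")] + [(child.tail or "") for child in root]
    if any(piece.strip() for piece in stray):
        errors.append("BOREHOLE must contain only elements")
    identities = root.findall("IDENTITY")
    if not identities:
        errors.append("BOREHOLE must contain an IDENTITY")
    for identity in identities:
        for tag in ("NAME", "PROPERTY", "DATE"):
            found = identity.findall(tag)
            if len(found) != 1:
                errors.append(f"IDENTITY needs exactly one {tag}")
            elif not (found[0].text or "").strip():
                errors.append(f"{tag} must contain text")
    for number, interval in enumerate(root.findall("INTERVAL"), start=1):
        for tag in ("FROM", "TO"):
            if len(interval.findall(tag)) != 1:
                errors.append(f"INTERVAL {number} needs exactly one {tag}")
        # DATA and TEXT are optional and repeatable; anything else is flagged
        for child in interval:
            if child.tag not in ("FROM", "TO", "DATA", "TEXT"):
                errors.append(f"INTERVAL {number} has unexpected {child.tag}")
    return errors


# Hypothetical documents: one that follows the rules and one that does not
GOOD = ("<BOREHOLE><IDENTITY><NAME>BH-01</NAME><PROPERTY>P</PROPERTY>"
        "<DATE>1999</DATE></IDENTITY><INTERVAL><FROM>0</FROM><TO>1</TO>"
        "<DATA>Q</DATA><TEXT>note</TEXT></INTERVAL></BOREHOLE>")
BAD = ("<BOREHOLE><IDENTITY><NAME>BH-02</NAME></IDENTITY>"
       "<INTERVAL><FROM>0</FROM></INTERVAL></BOREHOLE>")

good_errors = check_borehole(ET.fromstring(GOOD))
bad_errors = check_borehole(ET.fromstring(BAD))
print(good_errors)
print(bad_errors)
```

Declaring these rules once in a schema, rather than re-coding them in every application as done here, is the advantage the schema approach offers.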
7. Conclusions
XML is a new metadata standard that will increase the efficiency of the Web by reducing network traffic and allowing intelligent searches. Perhaps of greater importance is its potential for creating data standards for e-commerce, industry and science that will allow efficient movement of information between computer platforms and applications.
XML presents an opportunity for development of meaningful data standards in the geosciences that builds upon previous achievements in the field. For GIS datasets in particular, much of the development in terms of establishing data dependencies and data types has already been achieved. The benefits to the geosciences in terms of efficient, unrestricted movement of information between computer applications require little elaboration. The additional benefits of browser compatibility, being able to display a single document or dataset in different ways for different audiences, and the ability to conduct meaningful and efficient information searches on the Web are added incentives. XML tools that facilitate these objectives are now readily available. The challenge will be to establish an appropriate body within the geosciences to coordinate development of data standards.
References
Albrecht, J., 1999. Geospatial information standards. A comparative study of approaches in the standardisation of geospatial information. Computers & Geosciences 25 (1), 9–24.
Harold, E.R., 1998. XML Extensible Markup Language. IDG Books Worldwide Inc., Foster City, CA, 426pp.
Huber, M., Schneider, D., 1999. Spatial data standards in view of models of space and the functions operating on them. Computers & Geosciences 25 (1), 25–38.
Pardi, W.J., 1999. XML in Action. Microsoft Press, Redmond, WA, 329pp.
Strand, E., 1995. GIS application transfer rooted in trees. GIS World 6 (2), 28–30.
Further reading
ANZLIC. SGML/XML Document Type Definition (DTD) for geospatial metadata in Australasia, http://www.environment.gov.au/net/anzmeta/.
Bosak, J., Bray, T., 1999. XML and the second-generation Web. Scientific American 5, http://www.sciam.com/1999/0599issue/0599bosak.html.
FGDC. Content Standard for Digital Geospatial Metadata, http://www.fgdc.gov/metadata/contstan.html.
Microsoft. Internet Explorer 5 download, http://www.microsoft.com/windows/ie/download/windows.htm.
Microsoft. XML Developer’s Guide, http://msdn.microsoft.com/xml/XMLGuide/default.asp.
Microsoft. XSL Developer’s Guide, http://msdn.microsoft.com/xml/XSLGuide/default.asp.
The XML Files (XML tutorials), http://www.webdeveloper.com/xml/.
World Wide Web Consortium (W3C). Extensible Markup Language (XML) 1.0, http://www.w3.org/TR/REC-xml.html.
XML.COM (XML news reports), http://www.xml.com/xml/pub.
XML for Fun and Profit (XML overview), http://etext.virginia.edu/helpsheets/xml-basic.html.
XML Resource (XML-related links), http://www.kric.ac.kr/~wslee/xml/xml_resource.html.
XML Repository (XML-related links), http://xmlrepository.com/.
XML Software (reviews and downloads), http://www.xmlsoftware.com/.