avenues for developing the uk’s national geospatial metadata service

Upload: makreal

Post on 14-Apr-2018


  • 7/27/2019 Avenues for developing the UKs National Geospatial Metadata Service


    Title: Avenues for developing the UK's National Geospatial Metadata Service

    Authors:

    James K. Batcheller

    [email protected]

    Bruce M. Gittings

    [email protected]

    Institute of Geography

    School of GeoSciences

    University of Edinburgh

    Drummond Street

    Edinburgh EH8 9XP

    Tel: +44 (0) 131 650 2558

    FAX: +44 (0) 131 650 2524

    Corresponding author

    Abstract:

    The state of public sector geospatial data sharing and exchange in the UK, as facilitated

    by the gigateway service, is currently at a crossroads. Ambiguities surrounding its

    purpose, direction, funding and custodianship continue to persist in the face of

    increasing demands placed upon the service, such as legal requirements (INSPIRE, PSI)

    and rising user expectations. A well-defined strategy addressing the political,

    commercial and technological considerations involved in advancing the service is

    therefore needed if these uncertainties are to be countered and demands met. The

    current work aims to provide for the technical aspects of such a strategy by considering

    potential avenues for development. Accordingly, proprietary and open source

    approaches are examined in the context of facilitating metadata publication (production,

    integrity, delivery), enhancing the service infrastructure (interoperability, future-

    proofing) as well as addressing end-user considerations (data visualisation, data access).

    The resulting roadmap outlines a technical evolution of gigateway, proposing a service

    better equipped to face the challenges of both the present and the future.


    Keywords: gigateway, geospatial metadata, metadata service, SDI.


    Introduction

    The advent of the World Wide Web (WWW) and the Internet has revolutionised how

    all kinds of information can be accessed and exchanged - geospatial information no less

    than other forms. From modest beginnings as point-to-point transfer via FTP¹ and

    e-mail, through the origins of customised interactive web-based mapping, as seen in the

    postings of Xerox's Palo Alto Research Centre (PARC) in 1993 (Putz, 1994; Harder,

    1998) to distributed online metadata services and clearinghouses offering catalogues of

    records detailing geospatial dataset attributes and how to procure them, and geospatial

    one-stop shops offering an integrated access point to disparate geospatial data resources,

    widespread data dissemination is currently driven as never before. Sourcing, accessing

    and retrieving data for analysis and display have been made easier, with implications for

    public, private and academic sectors ranging from the stimulation of intellectual

    endeavours to improved data management practices and enhanced visibility of potentially

    marketable geospatial products.

    In the public sector, efforts have been given further impetus through the introduction of

    legislation at both national and international level. In the United States for instance,

    President Clinton's Executive Order 12906 (1994)² demanded the creation of a

    coordinated National Spatial Data Infrastructure (NSDI) to support public and private

    sector applications of geospatial data, with the key goals of avoiding wasteful

    duplication of effort and promoting effective and economical management of

    resources. More recently, European Union directives such as the sharing of Public

    1 File Transfer Protocol

    2 http://govinfo.library.unt.edu/npr/library/direct/orders/20fa.html


    Sector Information (PSI, 2003) and the INfrastructure for SPatial InfoRmation in

    Europe (INSPIRE, 2004) have formalised requirements that member states facilitate

    location of and access to geospatial assets for the purpose of formulation,

    implementation, monitoring and evaluation of Community policy-making³.

    Such has been the perceived worth of web-enabling geospatial holdings that the

    forerunning national initiatives have in recent times been augmented by local, regional

    and international schemes, as well as those in the private and academic sectors (Guptill,

    1999; Tulloch and Robinson, 2000; Higgins et al., 2003). Prime examples include the

    UK's public sector geospatial metadata portal gigateway, its academic counterpart

    Go-Geo!⁴, the Environmental Systems Research Institute's (ESRI) Geography Network⁵ and

    the Federal Geographic Data Committee's (FGDC) National Geospatial Data

    Clearinghouse⁶ (precipitated by Clinton's Executive Order).

    The benefits of web-enabling data assets are nevertheless not without their own

    particular problems. Questions as to whether users can effectively find quality,

    compatible and appropriate data for their needs are balanced by resource,

    implementation and maintenance issues for data providers. Additional complications

    arise on consideration of the political issues involved in supporting a geospatial data

    sharing initiative, particularly in governmental sectors. Concerns as to where service

    ownership lies, its strategic goals, its sources of revenue, how it is promoted and who

    3 http://www.ec-gis.org/inspire/

    4 http://www.gogeo.ac.uk/

    5 http://www.geographynetwork.com/

    6 http://clearinghouse1.fdgc.gov/


    constitute the target community are just some of the factors which impact upon a

    service's performance.

    Such are the challenges that currently face the UK's national geospatial data sharing

    initiative gigateway. With the rapid and ongoing evolution of spatially aware software

    and services offered over the Internet, it can be reasoned that end-user expectations

    have also evolved, arguably past what the service can currently offer. From a data

    provider's perspective, active participation is arguably driven more by the desire to be

    seen to contribute or through some form of compulsion (e.g. contractual obligations,

    mandates from a higher authority, legislation) than the recognition of potential benefits

    that may be accrued. As for the gigateway service itself, it is currently at a crossroads.

    Ambiguities surrounding its purpose, technological expectations, ongoing source of

    funding (as currently enshrined within the NIMSA⁷ agreement), coupled with doubts as

    to whether the Association for Geographic Information (AGI) shall continue to act as

    custodian have led to the national geospatial metadata service facing a somewhat

    uncertain future.

    It is in this light that a re-examination of how public geospatial metadata is

    published in the UK via the gigateway service is argued to be timely. If confidence in

    the service is to be maintained, particularly amongst those on whom gigateway's

    ongoing success is dependent (i.e. the contributing community), it is crucial that the

    7 The National Interest Mapping Services Agreement: a contract between the Office of the Deputy Prime Minister

    (ODPM, now the Department for Communities and Local Government) and the Ordnance Survey (OS) under which the

    ODPM funds, or part funds, (mapping) activities that meet established criteria for being in the national interest (NIMSA

    Review Group Report, 2004).


    initiative be seen to move forward with vision and purpose. A well-defined strategy

    addressing the political, commercial and technical considerations involved in advancing

    gigateway is therefore imperative if the investment and goodwill already accrued by the

    national geospatial metadata service is to be maintained. It is the aim of the current

    work to provide a basis for the technical aspects of such a strategy by analysing the

    current service, identifying improvement opportunities and elaborating potential

    development paths. Each stage of the geospatial metadata lifecycle, from production to

    publication and beyond, is consequently investigated, with the goal of eliminating,

    circumventing or diminishing barriers to metadata delivery.

    Background

    Metadata

    The increased availability of geospatial computing technologies has not only fed the

    demand for geospatial data with which to perform required analyses (Guptill, 1999;

    Deng, 2002), it has resulted in large volumes of such data being produced - not only by

    GIS professionals and organisations, but also by those not traditionally considered as

    geodata producers (Schweitzer, 1998; Mathys, 2004). As data are clearly critical to the

    functioning of GIS, enough so to be referred to as its "fuel" (Vermeij, 2001; ESRI, 2002),

    this surfeit could be viewed positively. Nevertheless there are complications. As Tsou

    (2002) observes, the storage and management of geospatial data are in themselves major

    challenges. How data are located in what can amount to a needle in a geospatial

    haystack; whether such geodata, once located, are fit for the desired purpose;

    whether they are compatible, up-to-date and of sufficient quality, all impart their own


    particular issues, even without contemplating data accessibility, copyright, licensing,

    potential procurement costs and training.

    Regardless of information medium or application domain, it is clearly important to

    document data assets so as to facilitate efficient storage and management (Göbel and

    Lutze, 1998). Geospatial data are documented by metadata, or "data that describes data"

    (Hart and Phillips, 2001; Vermeij, 2001; Tsou, 2002; Hobona et al., 2004). Just as

    geospatial data are abstractions of the real world, for requirements such as analyses and

    representation, geospatial metadata are similar abstractions of the data itself. Used not

    only to describe a range of dataset attributes, metadata also assist in the location,

    evaluation, comparison, access and exploitation of geographical datasets (Luo et al.,

    2003; OGC, 2005).

    The gigateway metadata service

    Arising from several predecessors, most notably the National Geospatial Data

    Framework (NGDF) and askGIraffe, gigateway's raisons d'être remain those of its

    forerunners: to increase the use of geospatial data; to facilitate development of markets

    for data and services; and to future-proof investments and enhance decision-making

    through use of better information (Gigateway, 2003). The service works towards these

    objectives through the support of a distributed web-based network, focussed on serving

    discovery metadata: a subset or profile of a more elaborate metadata standard,

    designed to provide a means of identifying where the data described might be found.

    Users query metadata through a web-based form on a central portal (see Figure 1) using

    keywords and geographical extents. Queries are then transmitted to the clients (nodes)


    of participating organisations, which execute searches on indexed metadata. Results are

    returned to the central gateway (portal) where they are collated and sent to the user's

    browser. Retrieved metadata specify where the original data may be located.

    Figure 1. The distributed gigateway service architecture.
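The query flow described above (portal sends the query to each node, each node searches its own index, the portal collates the results) can be sketched as follows. This is an illustrative sketch only: the node names, record fields and the `search_node` function are hypothetical stand-ins for a Z39.50 search against a node's metadata index, and no real Z39.50 client library is used.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for the metadata indexes held at each node.
NODE_INDEXES = {
    "node_a": [
        {"title": "Scottish rivers", "keywords": {"hydrology", "rivers"},
         "bbox": (-7.0, 54.5, -1.0, 60.0)},
    ],
    "node_b": [
        {"title": "Welsh land cover", "keywords": {"land cover"},
         "bbox": (-5.5, 51.3, -2.6, 53.5)},
        {"title": "UK river gauging stations", "keywords": {"hydrology", "rivers"},
         "bbox": (-8.0, 49.9, 1.8, 60.9)},
    ],
}


def boxes_overlap(a, b):
    """Return True if two (west, south, east, north) extents intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search_node(node, keyword, bbox):
    """Stand-in for a node-side search of its indexed discovery metadata."""
    return [
        {"node": node, "title": rec["title"]}
        for rec in NODE_INDEXES[node]
        if keyword in rec["keywords"] and boxes_overlap(rec["bbox"], bbox)
    ]


def portal_search(keyword, bbox):
    """Send the query to every node in parallel and collate the results."""
    nodes = sorted(NODE_INDEXES)
    with ThreadPoolExecutor() as pool:
        per_node = pool.map(lambda n: search_node(n, keyword, bbox), nodes)
        collated = [hit for hits in per_node for hit in hits]
    return sorted(collated, key=lambda hit: hit["title"])


results = portal_search("rivers", (-4.0, 55.0, -3.0, 56.0))
```

In this sketch the portal holds no metadata of its own; it only fans the query out and merges what comes back, which mirrors the distributed arrangement in Figure 1.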

    Context and rationale

    The NGDF was initially conceived as a fully-fledged NSDI (Davey and Murray, 1996),

    but budgetary constraints, the lack of integrated GI-centric solutions and the need for

    progress led to the identification of a National Metadata Service as the priority technical

    deliverable⁸. The first tangible service created was askGIraffe in mid 2000, based on a

    distributed search standard developed by the library community and manifested in the

    8 A fully-fledged NSDI is being revisited through the UK GI Strategy developed in 2006.

    [Figure 1 diagram labels: Browser; Central Portal; Z39.50 Search Engine (three nodes); Metadata Indexes (three nodes); Metadata Management Systems]


    freely available Isite⁹ package previously deployed successfully in the US by the

    FGDC. While Isite still forms the core of gigateway, a number of proprietary and open

    source solutions have since become available, some of which were developed

    specifically with the geospatial community in mind. Consequently, some of the

    technical barriers to developing the UK's initiative beyond the basic metadata service

    currently in operation have been removed.

    Despite these circumstances and the resources afforded to the service since its inception,

    there has been a notable deficit of comprehensive analyses aimed at reviewing the

    technical options open to what has now evolved into gigateway. The deficit may be

    considered even more curious in light of the aforementioned PSI and INSPIRE

    directives, which are predicted to place significant demands on member states,

    including the provision of metadata services (Rackham, 2004). Furthermore, as one of

    INSPIRE's goals is the establishment of an EU-wide data framework based upon the

    SDIs of member states, there is a clear need to consider the technological options

    relating to not only how gigateway may be moved forward, but also to what can be done

    to address some of the challenges it currently faces.

    The gigateway metadata publication workflow

    The provision of geospatial metadata sustains gigateway; the continued success of the

    service therefore relies on those who use it to publish their metadata records. To

    safeguard existing contributions and attract new ones, perceived or actual barriers to

    participation must be addressed. Currently the path from metadata production to

    9 http://www.awcubed.com/Isite/index.html


    publication is characterised by a series of distinct steps punctuated by extensive human

    intervention (Figure 2). Whilst human input is important in assessing quality,

    opportunities for increased automation certainly exist, speeding the publication process

    and hence removing an obstacle to metadata contribution.

    [Figure 2 diagram labels: geospatial dataset; detailed metadata; discovery metadata; metadata repository; local gigateway node; remote gigateway node. Arrow labels: create, update; document, update; store; subset; subset, retrieve; host; post; index, publish.]

    Figure 2. The gigateway metadata publication workflow. Metadata are created / updated on creation / update of datasets. Datasets may be internally documented by detailed or discovery metadata, but only discovery metadata are indexed and exposed to the gigateway service. A formal metadata repository may be used but is not currently assumed to be exposed for query directly.


    Gigateway's communication infrastructure

    The ISO Z39.50¹⁰ communication protocol, embodied in Isite, remains central to

    gigateway. Influenced at the time by its success in the US and low cost of

    implementation, the choice of Isite was also motivated by the lack of any workable

    alternative. Subsequent developments have however seen an increase in the number of

    commercial and non-commercial solutions. These potentially offer the opportunity to

    reinvigorate the service, thus enhancing the number of metadata records available, as

    well as providing a future development path. Accordingly, means for advancing the

    service are examined in the context of metadata publication (production, integrity,

    delivery), the service infrastructure (interoperability, future-proofing) as well as end-

    user considerations (data visualisation, data access).

    Metadata characteristics

    The success of the service (or indeed any service which depends on metadata) relies on

    three critical aspects: quality, quantity and accessibility. Metadata quality refers not

    only to whether a metadata record is manifested in a way that is compliant with a

    specific standard (and hence is exchangeable) but whether it is unambiguously

    indicative of the dataset it depicts, is complete and up-to-date. Consistent provision of

    quality, fit-for-purpose metadata helps to assure user confidence in the service,

    providing impetus for return visits and in turn enhancing its reputation (Rackham,

    2004).

    10 ANSI/NISO Z39.50-1995 Information Retrieval (Z39.50): Application Service Definition and Protocol Specification.

    Also known as the OGC Web Catalog Services protocol Version 1 or ISO 23950.


    For a service to be of any utility, the quantity of metadata records offered should meet

    users' expectations. A paucity of records provides little motivation to use the service, as

    chances of locating appropriate data will be low.

    Metadata records are of minimal utility if they are not accessible, regardless of quality

    or quantity. Metadata accessibility in this context relates not only to the ability to locate

    and retrieve the desired items, but also to their presentation in a consistent format that

    conforms to employed standards. A combination of a well-designed user interface and an

    effective underlying search engine is necessary to ensure that the user is presented

    with the best-fit records, ordered appropriately. Metadata that users find complicated or

    time-consuming to locate, access or understand will do little to popularise the hosting

    service.

    The aforementioned factors are clearly inter-related. A vast quantity of metadata is

    pointless in the absence of assurance of quality, whilst a restricted set of high quality

    metadata is of limited value¹¹.

    Development approaches

    Metadata generation

    The perception of metadata generation as being a tedious, expensive or unnecessary

    drain on time and resources presents a significant obstacle to the production process -

    even where the need for quality metadata is recognised. Streamlining the overall process

    would serve to alleviate such concerns and help counter the human bottleneck.

    11 Gigateway Advisory Group Meeting, 17th November 2004: http://www.gigateway.org.uk/aboutus/aboutus.html


    The ability of modern GIS packages to handle metadata has enabled tighter integration

    of data editing and metadata composition into standard workflows. Once completed,

    metadata either resides with the data (easing the management and update of both), or it

    is copied to a central organisational repository or database. Metadata destined for

    exposure via gigateway should comply with the UK GEMINI¹² standard, a profile of

    the ISO standard 19115/19139 Geographic Information: Metadata and the UK's

    e-Government Metadata Standard (eGMS). Currently, metadata stored in most GIS

    packages would need to be manually copied into an appropriate metadata editor (e.g. the

    gigateway-sponsored MetaGenie¹³), and / or manually augmented to achieve

    compliance. Preparation of discovery metadata thus represents at least a duplication of

    effort, as record elements existing elsewhere must be re-entered. The consequent

    requirement to populate even the minimum required fields manually clearly tends

    toward the tedious.

    If geospatial datasets are created, manipulated and documented in proprietary GIS

    software, then the development environments included within such packages can be

    leveraged to programmatically populate metadata elements gleaned from the user's

    computing environment on dataset creation or update. Completed metadata can then be

    output, validated automatically, complemented with human-mediated quality control

    measures and exported for eventual publication on an organisational, sectoral or

    national portal.

    12 GEo-spatial Metadata INteroperability Initiative

    13 http://www.gigateway.org.uk/metadata/metagenie.html


    As the UK's current market-leading GIS software, used extensively across the public,

    private and academic sectors, ESRI's ArcGIS suite is an obvious candidate for a

    solution oriented towards the gigateway service. In an approach similar to that outlined

    by Vermeij (2001), the ArcCatalog component of ArcGIS can be tailored using a

    custom metadata editing screen. Metadata elements displayed for completion are

    dictated by entries contained in an XML Stylesheet (XSL) conforming to a detailed

    metadata standard (ISO, eGMS). Mandatory GEMINI fields can be made compulsory to

    ensure that metadata later extracted for publication purposes comply with gigateway's

    discovery format.

    Metadata items may be automatically populated through the programmatic

    interpretation of dataset elements and system variables (inherent metadata),

    complemented by pre-prepared metadata templates for commonly-used values (author

    metadata) and completed manually by the metadata creator (descriptive metadata,

    necessitating human intervention). The conceptual steps are outlined in Figure 3.

    Completed metadata can then be validated against an appropriate schema that checks

    compliance and verifies that all mandatory elements are populated. Thus what was once

    a time-consuming endeavour for the metadata creator can be reduced to a limited

    authoring step followed by quality control.


    extract inherent metadata from dataset
    + complement with pre-defined author metadata
    + complete with descriptive metadata
    = a minimum set of mandatory fields

    Figure 3: Stages of automating metadata production. Elements requiring user input are reduced to those of recyclable author metadata and dataset-specific descriptive metadata.
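The three stages summarised in Figure 3 can be illustrated with a short sketch: inherent metadata are derived from the dataset, a reusable author template is merged in, descriptive fields are supplied by the creator, and the result is checked for mandatory elements. The field names and the `MANDATORY_FIELDS` list are assumptions made for illustration; they are not the actual GEMINI element set, and the sketch does not use the ArcCatalog customisation described above.

```python
import os
from datetime import date

# Assumed, illustrative subset of mandatory discovery fields; the actual
# GEMINI element list is not reproduced here.
MANDATORY_FIELDS = ("title", "abstract", "originator", "extent", "metadata_date")


def inherent_metadata(dataset):
    """Derive what can be read from the dataset and the system itself."""
    xs = [x for x, _ in dataset["coordinates"]]
    ys = [y for _, y in dataset["coordinates"]]
    return {
        "title": os.path.splitext(os.path.basename(dataset["path"]))[0],
        "extent": (min(xs), min(ys), max(xs), max(ys)),
        "metadata_date": date.today().isoformat(),
    }


def build_record(dataset, author_template, descriptive):
    """Merge the three sources; later sources override earlier ones."""
    record = {}
    record.update(inherent_metadata(dataset))
    record.update(author_template)
    record.update(descriptive)
    return record


def missing_fields(record):
    """Return the mandatory fields that are absent or empty."""
    return [f for f in MANDATORY_FIELDS if not record.get(f)]


dataset = {"path": "data/rivers.shp",
           "coordinates": [(-3.2, 55.9), (-4.3, 55.8), (-2.1, 57.1)]}
author_template = {"originator": "Example Council GIS Unit"}

# Without descriptive input the record is incomplete; with it, it validates.
draft = build_record(dataset, author_template, {})
complete = build_record(dataset, author_template,
                        {"abstract": "Main river centrelines."})
```

Here `missing_fields(draft)` reports only the abstract, which is the one element the creator still has to write by hand; everything else comes from the dataset or the template.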

    Metadata integrity

    Once preparation of standard compliant metadata is complete, the question of

    management arises. For a contributing organisation, aligning the

    provision of metadata to gigateway with internal metadata services is critical to ensure

    the initiative's continuing success¹⁴. Given the range of contributing organisations, this

    alignment is not trivial: with their own particular internal procedures, resources and

    guidelines, it is not surprising that storage techniques diverge from one organisation to

    the next (Tyler, 2002). Metadata can be stored alongside the data they describe,

    facilitating easy update; detached from the data within a DBMS in order to take

    advantage of inherent data management features; as text-based files to enable upkeep

    via simple text editors, or in any combination of the aforementioned. Additional

    difficulties appear as metadata are infrequently authored or edited where they are

    exposed, resulting in multiple metadata instances embodied in one or more standards.

    Here, metadata must not only be copied to where they are indexed and exposed but also

    transformed to conform to discovery metadata specifications. No matter the scenario,

    14 Gigateway Advisory Group Meeting, 17th November 2004: http://www.gigateway.org.uk/aboutus/aboutus.html


    redundancy results in potential loss of integrity and the formation of discrete

    information silos which suffer from update latency, requiring cascading updates and

    related version control management.

    Metadata integrity issues can be addressed by migrating storage in its entirety to the

    database paradigm. By merging multiple metadata instances into one database

    repository, or a formal distributed database, potential sources of inconsistency are

    eliminated, while providing a secure, robust and manageable storage solution. With

    most GIS vendors offering DBMS-driven solutions, metadata composition could be

    closely integrated within data editing workflows.

    Access to database-held discovery metadata necessary for participation in gigateway

    can be achieved in two principal ways. Where organisations wish to exercise the full

    benefits of formal database management (Date, 2003), metadata can be exposed directly,

    although this will involve the provision of a Z39.50 interface¹⁵, with possible

    performance implications. Exporting database-held metadata as text files which may

    then be exposed remains more straightforward: once the relevant database record is

    updated, a new file is exported, indexed and made available to the Z39.50 service as

    normal.
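The export route described above can be sketched with Python's built-in `sqlite3` module: a single master record is held in the database, and every save re-exports its discovery subset as a text file ready for indexing. The table layout, the choice of discovery columns and the plain `name: value` file format are assumptions for illustration; the indexing and Z39.50 steps themselves are not modelled.

```python
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical split between discovery fields and the fuller detailed record.
DISCOVERY_COLUMNS = ("title", "abstract", "updated")


def open_repository():
    """Create an in-memory repository with a single metadata table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE metadata ("
        "id TEXT PRIMARY KEY, title TEXT, abstract TEXT, "
        "updated TEXT, lineage TEXT)"
    )
    return db


def export_discovery(db, record_id, out_dir):
    """Write the discovery subset of one record to a text file for indexing."""
    row = db.execute(
        "SELECT title, abstract, updated FROM metadata WHERE id = ?",
        (record_id,),
    ).fetchone()
    lines = [f"{name}: {value}" for name, value in zip(DISCOVERY_COLUMNS, row)]
    path = Path(out_dir) / f"{record_id}.txt"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


def save_record(db, out_dir, record_id, **fields):
    """Insert or update the single master record, then re-export it."""
    db.execute(
        "INSERT OR REPLACE INTO metadata (id, title, abstract, updated, lineage) "
        "VALUES (?, ?, ?, ?, ?)",
        (record_id, fields["title"], fields["abstract"],
         fields["updated"], fields["lineage"]),
    )
    db.commit()
    return export_discovery(db, record_id, out_dir)


db = open_repository()
out_dir = tempfile.mkdtemp()
save_record(db, out_dir, "rivers-001", title="Rivers", abstract="Draft.",
            updated="2006-01-10", lineage="Digitised from 1:50 000 mapping.")
exported = save_record(db, out_dir, "rivers-001", title="Rivers",
                       abstract="Main river centrelines.",
                       updated="2006-03-02",
                       lineage="Digitised from 1:50 000 mapping.")
```

Because the export is triggered by the save itself, the exposed file cannot lag behind the database record, which is the integrity property the database approach is meant to secure.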

    For organisations wishing to maintain current management practices based on a range

    of unconnected tools, metadata integrity problems and data silos may be addressed

    using a system based around formal synchronisation (Figure 4). Developing the

    15 For example Compusult's MetaManager Toolkit, ESRI's ArcIMS Metadata Service and Intergraph's GeoConnect

    Metadata Management Server


    approach of Dunfey et al. (in press), a highly formalised procedure, a synchronisation

    file which acts as a "road map" for the system, or a synchronisation daemon can provide a

    means of reconciling otherwise unconnected metadata instances. However, complexities

    associated with the synchronisation of multiple files would suggest that a way forward

    based on a centralised DBMS is preferable.
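A synchronisation file of the kind mentioned above can be pictured as a list of targets, each paired with the action needed to bring it into line with the master copy. The sketch below is a minimal illustration under that assumption; the store names, the `SYNC_MAP` structure and the discovery field list are hypothetical and are not taken from Dunfey et al.

```python
import copy

# Hypothetical synchronisation "road map": each entry names a metadata store
# and the action that reconciles it with the master copy.
SYNC_MAP = [
    {"target": "flat_file_store", "action": "copy"},
    {"target": "database_store", "action": "copy"},
    {"target": "gigateway_node", "action": "transform"},
]

# Assumed discovery subset; not the actual GEMINI element list.
DISCOVERY_FIELDS = ("title", "abstract", "updated")


def to_discovery(record):
    """Reduce a detailed record to its discovery-level fields."""
    return {name: record[name] for name in DISCOVERY_FIELDS}


def synchronise(master, stores):
    """Propagate the master record to every store named in SYNC_MAP."""
    for entry in SYNC_MAP:
        if entry["action"] == "transform":
            payload = to_discovery(master)
        else:
            payload = copy.deepcopy(master)
        # Existing instances are overwritten; new ones are simply created.
        stores[entry["target"]][master["id"]] = payload
    return stores


stores = {"flat_file_store": {}, "database_store": {}, "gigateway_node": {}}
master = {"id": "rivers-001", "title": "Rivers",
          "abstract": "Main river centrelines.", "updated": "2006-03-02",
          "lineage": "Digitised from 1:50 000 mapping."}
synchronise(master, stores)

# A later edit to the master is pushed out by running the same step again.
master["updated"] = "2006-04-11"
synchronise(master, stores)
```

Even in this small form, every additional store adds another entry that must be kept correct, which gives some sense of why the text favours a single centralised DBMS.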

    Figure 4: Metadata synchronisation. A metadata master copy is updated / created and synchronisation is initiated. Pre-existing metadata is updated or overwritten according to storage strategy employed; new instances are imported or copied. Discovery metadata can be directly copied or exported to the gigateway node; detailed metadata must first be transformed.

    [Figure 4 diagram labels: metadata source(s); synchronisation file; daemon; flat-file storage; database storage; gigateway node. Arrow labels: create; update; import; export; copy; copy / export; overwrite; transform; query / response.]

    Node hosting

    Organisations can contribute metadata to gigateway by transferring records to an

    existing node (e.g. gigateway's centrally managed repository) or by exposing them on a

    node of their own. A distributed service architecture, where organisations are

    encouraged to host their own node, was always a design goal aimed to foster a sense of

    proprietorship and participation amongst contributors (Nanson et al., 1995). Despite this

    encouragement, there remain institutions with significant data holdings that cite

    internal political problems and technical issues¹⁶ as the cause of their inability or

    unwillingness to host a node. While surmounting these political obstacles may well

    pose the greater challenge, options do exist to address the technological concerns

    relating to node installation and maintenance.

    Currently, mounting a node involves the installation and configuration of a number of

    distinct software packages. Perceptions that the process requires a high level of

    expertise result in setup being left to IT departments, outsourced to consultancies or

    indefinitely postponed where financial resources are insufficient. Adoption of the

    solutions proposed here will further exacerbate this. To circumvent these problems, the

    necessary components can be bundled into an automated installation, empowering

    non-specialists to easily set up and configure contributory nodes. A barrier to contribution

    amongst potential contributors can thereby be lowered, providing for the exposure of

    previously untapped geospatial resources. Nevertheless, important preconditions such as

    service level agreements and quality guarantees should be enforced to guard against

    casual participation which could negatively impact upon the gigateway service and

    users' confidence in it.

    This model does not suit all, however: organisations may not have sufficient numbers

    of metadata records to justify contributing in such a way, they may not have the

    16 AGI gigateway Advisory Group Meeting Minutes, 18th May 2005: http://www.gigateway.org.uk/aboutus/aboutus.html


    resources to install and maintain the required hardware and software, or they may

    simply be unwilling for reasons of cost, effort and so on. While transferring hosting

    responsibility elsewhere may get round local issues of node maintenance and the related

    costs, what will result is a further disconnect between the metadata and the data they

    describe.

    Hosting by proxy

    Participating organisations opting not to host their own metadata may expose their

    holdings from nodes mounted elsewhere, e.g. the central repository managed by the

    gigateway service. Submission may be by bulk transfer (e-mail, FTP, CD/DVD); those

    choosing the gigateway repository have the further option of submitting via the

    MetaGenie online editor. This comprises a web-based form which is completed to

    describe each dataset, generating records which still need to be manually processed.

    Regardless of approach, resources are necessary to assure the metadata is appropriate

    for publication.

    To counter these manual processing requirements and consequent update latency

    concerns, an automated metadata harvesting facility could be introduced to the metadata

    generation-publication workflow, using for instance the library community's Open

    Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)¹⁷. Standing in

    contrast to the approach employed by Z39.50 solutions, OAI-PMH retrieves metadata in

    bulk into a central repository. Conceptually it may be viewed as substituting one node

    type (Z39.50) for another (OAI-PMH), but as there is little maintenance overhead

    17 http://www.openarchives.org/

    aside from creating a web accessible folder, the approach may mitigate some concerns

    associated with its management. Moreover, as the protocol is HTTP-based, no

    additional configuration and security measures are necessary beyond those required

    for standard web servers (Amin, 2003). Using this as a method for contribution,

    participating organisations can at will deposit validated metadata into the web

    accessible folder, from where they will be automatically harvested; no dialogue need be

    opened between supplier and host.
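
    The harvesting step described above can be sketched as follows. This is a minimal illustration rather than a description of any gigateway component: the base URL and the sample response are hypothetical, and the oai_dc prefix is simply the Dublin Core format that every OAI-PMH repository must support (a GEMINI-specific prefix would have to be agreed between host and suppliers).

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 response documents.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix="oai_dc",
                     from_date=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (plain HTTP GET)."""
    if resumption_token:
        # When continuing a partial list, the token is sent on its own.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            # Selective harvesting: only records changed since this date.
            params["from"] = from_date
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return ([(identifier, datestamp), ...], resumption_token or None)."""
    root = ET.fromstring(xml_text)
    records = []
    for header in root.iter(OAI_NS + "header"):
        records.append((header.findtext(OAI_NS + "identifier"),
                        header.findtext(OAI_NS + "datestamp")))
    token = root.findtext(OAI_NS + "ListRecords/" + OAI_NS + "resumptionToken")
    return records, (token or None)


# Hypothetical response from a supplier's repository, for illustration only.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:dataset-1</identifier>
        <datestamp>2006-03-01</datestamp>
      </header>
    </record>
    <resumptionToken>abc123</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

harvest_url = list_records_url("http://example.org/oai", from_date="2006-01-01")
harvested, next_token = parse_list_records(SAMPLE_RESPONSE)
```

    A scheduled job at the host site would request harvest_url, store the returned records, and repeat the request with the resumption token until none is returned. Passing the date of the previous harvest as from_date keeps each run incremental.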

    Update latency concerns meanwhile can be alleviated by scheduling frequent harvests.

    Furthermore, as long as metadata quality, validity and adherence to GEMINI can be

    assured, processing resources at the host site are spared and metadata can be exposed

    immediately.

    Metadata currency and quality

    The role of providers does not end with metadata submission; they are responsible for

    ensuring that their metadata continue to accurately reflect the associated data. Within

    the GEMINI standard, currency is partially catered for through the Date of update of

    metadata element. There is an argument that, no matter how frequently a dataset is

    revised (indeed, if it is revised at all), a regularly maintained Date of update of metadata field

    confers confidence in the currency of the metadata record. For static or infrequently

    updated datasets, however, an old Date of update of metadata is likely to suggest that the

    data asset is outmoded and therefore less useful. Given that it would be inappropriate to

    update the Date of update of metadata field where a review has been undertaken, but no

    actual update has taken place, this highlights the need for a Date of metadata reviewed

    field, currently absent from GEMINI.

    Evidence from observations of the service18 nevertheless suggests that such elements are

    mostly ignored by producers after publishing their metadata. This could be tackled

    using a regular automated email notification mechanism based upon the Date element

    and Email address of distributor metadata fields. A quality stamp19 associated with each

    metadata item would complement this and enhance user confidence. Providing a system

    by which metadata can be rated for quality, either independently or by user feedback,

    would allow records to be evaluated at a glance as well as place an onus on contributors

    to maintain metadata quality, thus assuring their reputation. Additionally, using the

    Date elements as criteria for evaluating quality provides impetus to distributors to

    review and maintain records on a systematic basis.
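
    The notification mechanism suggested above might be sketched as below. The record structure, the one-year threshold and the message wording are all assumptions made for illustration; only the two field names are taken from GEMINI as quoted in the text. Actual delivery (e.g. via an SMTP server) is deliberately left out.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class MetadataRecord:
    title: str
    date_of_update: date      # GEMINI "Date of update of metadata"
    distributor_email: str    # GEMINI "Email address of distributor"


def stale_records(records, today, max_age_days=365):
    """Return records whose metadata were last updated before the cutoff.

    The 365-day default is an arbitrary, illustrative threshold.
    """
    cutoff = today - timedelta(days=max_age_days)
    return [r for r in records if r.date_of_update < cutoff]


def reminder_text(record, today):
    """Compose a plain-text reminder addressed to the record's distributor."""
    age_days = (today - record.date_of_update).days
    return (
        f"To: {record.distributor_email}\n"
        f"Subject: Please review the metadata for '{record.title}'\n\n"
        f"This record was last updated {age_days} days ago. "
        f"Please confirm that it still describes the dataset accurately."
    )


# Hypothetical holdings, for illustration only.
holdings = [
    MetadataRecord("Dataset A", date(2004, 1, 1), "a@example.org"),
    MetadataRecord("Dataset B", date(2006, 5, 1), "b@example.org"),
]
run_date = date(2006, 6, 1)
reminders = [reminder_text(r, run_date) for r in stale_records(holdings, run_date)]
```

    The same age calculation could feed a simple quality indicator, so that records left unreviewed beyond the threshold are flagged to users as well as to their distributors.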

    Data access

    Complications relating to metadata not accurately reflecting the underlying data extend

    beyond contributors, affecting the end-users of the service. There is a twofold problem:

    do the records returned unambiguously represent the data sought, and if so, how are the

    data obtained? Whilst some standards (e.g. gigateway's Discovery Metadata

    Specification, the forerunner to GEMINI) attempt to address these concerns by

    providing a sample field (containing a visual representation of the data) as well as the

    18 AGI gigateway Advisory Group Meeting Minutes, 17th Nov 2004: http://www.gigateway.org.uk/aboutus/aboutus.html

    19 Guidelines for creating gigateway approved metadata exist but the proposed quality stamp has yet to be effectively

    associated with hosted records. Further details may be found in:

    http://www.gigateway.org.uk/metadata/downloads/Gigateway_metadata_guidelines_ukgemini.pdf

    contact details for the data distributor, neither fully addresses this issue. Evidence from

    existing records suggests that the sample field is rarely used, arguably as it requires

    more effort. Similarly, supplier contact details may not guide the user directly to the

    data even when traditional contact routes (telephone, FAX, postal address) are

    supplemented by a web URL20. The latter typically signposts the distributor's

    homepage, where the data must again be located. The degree of separation between

    data and metadata consequently disrupts workflow efficiency for the prospective user

    and can result in the ordering of an inappropriate product.

    Providing an efficient means of accessing a more current representation of the data prior

    to procurement will go some way towards alleviating this problem. The UK GEMINI

    standard presents an improved treatment for visualisation by providing a field21 for a

    URL pointing directly to a representation of the data, not currently exploited by

    gigateway. Whether licensed or freely available, presenting a shop window for data

    via a live preview enhances the probability that the data shall be pursued.

    Complementing this with a facility for immediate download will boost workflow

    efficiency as well as help realise one of the basic objectives of gigateway: to promote

    geospatial asset exchange.

    The use of the Browse graphic element could be extended to contain a URL pointing to,

    for instance, an OGC22-compliant Web Feature Service (WFS) such that custodians of

    unlicensed data have an opportunity to deliver the actual data via the same means as a

    20 Uniform Resource Locator

    21 The Browse graphic element

    22 Open Geospatial Consortium

    visualisation. For the provider, resources required to administer such a service are offset

    by the time spared from fulfilling data requests; for the user, waiting times associated

    with data procurement are significantly reduced.

    Providing access to commercial data clearly requires the inclusion of a transaction

    model, which takes account both of direct payments and of those subject to service-level

    licensing agreements (Figure 5). Visualisation can be permitted via an OGC-compliant

    Web Map Service (WMS), which renders a picture of the data, and not the data itself.

    Subscribing organisations could download the data following a secure login, enabling

    retrieval in volumes or units dictated by the licensing model agreed. Individual users

    can be catered for through solutions provided by Internet Payment Service Providers

    (IPSPs e.g. PayPal).
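
    To make the distinction between previewing and downloading concrete, the sketch below builds the two kinds of request a Browse graphic URL might carry. The server addresses and the layer name are hypothetical; the parameter names follow the key-value encodings of WMS 1.1.1 and WFS 1.1.0, and EPSG:27700 (British National Grid) is assumed as the reference system. Authentication and payment are outside its scope.

```python
from urllib.parse import urlencode


def wms_getmap_url(base_url, layer, bbox, width=400, height=400,
                   srs="EPSG:27700"):
    """Build a WMS 1.1.1 GetMap request: returns a rendered picture only."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",                      # empty value = server default style
        "SRS": srs,
        "BBOX": ",".join(str(v) for v in bbox),   # minx,miny,maxx,maxy
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return base_url + "?" + urlencode(params)


def wfs_getfeature_url(base_url, type_name, max_features=None):
    """Build a WFS 1.1.0 GetFeature request: returns the features themselves."""
    params = {
        "SERVICE": "WFS",
        "VERSION": "1.1.0",
        "REQUEST": "GetFeature",
        "TYPENAME": type_name,
    }
    if max_features is not None:
        params["MAXFEATURES"] = max_features
    return base_url + "?" + urlencode(params)


# Hypothetical endpoints and layer name, for illustration only.
preview_url = wms_getmap_url("http://example.org/wms", "rivers",
                             (0, 0, 700000, 1300000))
download_url = wfs_getfeature_url("http://example.org/wfs", "rivers",
                                  max_features=10)
```

    Under the model in Figure 5, preview_url could be exposed to any user, whereas download_url would be released only once the transaction processor has verified the account or payment.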

    Underlying architecture

    While provided in the context of the architecture of gigateway, none of the solutions

    elaborated herein are considered tightly-bound to the current service infrastructure. The

    reality is that all geospatial data-sharing initiatives based on metadata will encounter

    similar issues regarding metadata authoring, quality, currency, accessibility and

    integrity. With this in mind, the beginnings of a more extensive overhaul of gigateway

    can be contemplated.

    Since the rollout of askGIraffe, the UK metadata service has exclusively employed the

    Z39.50-based Isite. Flexible and efficient for near transparent querying of multiple

    metadata repositories, it is nevertheless argued by some that the Z39.50 protocol is

    [Figure 5 diagram. Components: user, user / corporate account, transaction processor, WMS, WFS, metadata, dataset. Flows: account creation, account verification, license agreement, approve transaction, procure, visualise, download.]

    Figure 5: Adding data visualisation, access and transaction support to gigateway. OGC-compliant Web Feature Services (WFS) provide visualisation and access to free and purchased data. Web Map Services (WMS) provide a means of visualising licensed content without providing access to the underlying data. Purchased vector data can be downloaded as a feature set via WFS; imagery can be downloaded in compressed file format.

    functionally limited in a number of ways, particularly when it comes to the

    representation of results (Tsou, 2002). Troll and Moen (2001) question Z39.50's

    ongoing utility given its complexity and interoperability handicaps, while different

    flavours provide varying degrees of support for spatial searching and its ability to scale

    is also called into question (Medyckyj-Scott et al., 2001; Amin, 2003). Rocha and

    Henriques (2004) meanwhile argue that the changing face of geographical information

    services, with increased demand for mobile solutions, real-time, data-ready applications

    and the long-term aim of data retrieval in the absence of human mediation, dictates the

    adoption of a different paradigm.

    The emerging OGC Catalogue Service Specification 2.x (OGC, 2005) aims to provide

    for such a different paradigm. Adhering to the trend in which the development of

    geographical information technologies continues to be more closely aligned with the

    mainstream IT industry and interoperability efforts (Higgins et al., 2005), the

    Specification details an open, standard interface that enables diverse but conformant

    applications to perform discovery, browse and query operations against distributed and

    potentially heterogeneous catalog servers23. Defining a number of communication

    protocols (bindings) based on CORBA, HTTP and a new iteration of Z39.50, adherence

    to the Specification enables creation of custom applications through the use of

    application profiles. Interoperability between different bindings is enabled through the

    use of a minimal abstract OGC_Common Catalogue Query Language, providing further

    support for spatial query constructs including DISJOINT, INTERSECT, WITHIN and

    OVERLAP (OGC, 2005).
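
    The meaning of the four spatial constructs named above can be illustrated on axis-aligned bounding boxes, the form in which discovery metadata usually record extent. This is a simplified sketch of the predicate semantics only; it is not taken from the Specification, which defines the operators over general geometries, and the function and class names are chosen here for illustration.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    minx: float
    miny: float
    maxx: float
    maxy: float


def disjoint(a, b):
    """True if the boxes share no point at all."""
    return (a.maxx < b.minx or b.maxx < a.minx or
            a.maxy < b.miny or b.maxy < a.miny)


def intersect(a, b):
    """True if the boxes share at least one point (the negation of disjoint)."""
    return not disjoint(a, b)


def within(a, b):
    """True if box a lies entirely inside box b (boundaries may touch)."""
    return (a.minx >= b.minx and a.miny >= b.miny and
            a.maxx <= b.maxx and a.maxy <= b.maxy)


def overlap(a, b):
    """True if the interiors meet but neither box contains the other."""
    interiors_meet = (a.minx < b.maxx and b.minx < a.maxx and
                      a.miny < b.maxy and b.miny < a.maxy)
    return interiors_meet and not within(a, b) and not within(b, a)


# Illustrative extents in arbitrary units.
search_area = BBox(0, 0, 10, 10)
inner = BBox(2, 2, 4, 4)
straddling = BBox(8, 8, 12, 12)
remote = BBox(20, 20, 30, 30)
```

    A catalogue query constrained with WITHIN would return only a record whose extent is like inner, whereas INTERSECT would also return one like straddling; a record like remote satisfies only DISJOINT.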

    23 OGC Press release http://www.opengeospatial.org/press/?page=pressrelease&prid=188

    Despite outlining a more sophisticated, yet open, treatment for geospatial resource

    discovery, the Catalogue Service Specification 2.x remains an abstract specification

    with few well-tested or mature implementations; the communication protocol

    predominantly relied upon remains a legacy version of Z39.50. OAI-PMH is a notable

    exception, but is promoted as a complementary rather than an alternative technology

    (Breeding, 2002). Commercially-developed solutions24 meanwhile do provide

    sophisticated alternatives that integrate data storage, querying, middleware, desktop and

    Web clients into a coherent software stack, but concerns relating to cross-platform

    support and community acceptance may preclude their adoption.

    However, the availability of the OGC Geospatial Portal Reference Architecture (OGC,

    2004; Figure 6) provides a new basis for commercial and open source solutions. The

    architecture offers specifications which allow a core system to be implemented, for

    example the GeoNetwork Metadata Catalogue Server, a collaborative development

    effort led by the FAO, UNEP and WFP25. Implementing the architecture's portal and

    catalogue components, GeoNetwork continues to be based on Z39.50, thus offering the

    potential for incorporation within or replacement of the current gigateway architecture.

    Whilst not offering a departure from the protocol as espoused by Tsou (2002) or Rocha

    and Henriques (2004), its open, modular architecture provides scope to replace the

    communication protocol as laid out in the OGC's Catalogue Service Specification, as

    well as allowing interoperability with national and international schemes, as

    propounded by the INSPIRE directive.

    24 For instance ESRI's GIS Portal Toolkit and products from MapInfo and Intergraph

    25 Food and Agriculture Organisation, the United Nations Environment Programme and World Food Programme

    Further considerations

    Prospective efforts to reinvigorate the gigateway service will predictably be fraught

    with difficulty. Future visions of how the service is manifested aside, questions as to the

    prudence of jeopardising a long history of investment in the current technology,

    infrastructure and expertise certainly arise. Even with its oft-perceived limitations, the

    current infrastructure's track record is proven within the UK context, underpinning what

    remains a popular and dependable service. Considering the diversity of gigateway's

    stakeholders and the resistance to change witnessed in some quarters, strong reasoning

    for any proposed modifications will be necessary. Even if a consensus is forthcoming,

    damage to gigateway's reputation could prove fatal if an enhanced service proves

    [Figure 6 diagram. Portal Services: viewers, web query interfaces, access management. Data Services: content access, data processing. Catalog Services: data discovery, service discovery, data querying. Portrayal Services: features, coverages, maps. All are connected via the Internet.]

    Figure 6: The OGC's Portal Reference Architecture (adapted from GeoNetwork homepage http://193.43.36.138/). GeoNetwork provides for core Portal and Catalog Services, into which existing Portrayal Services (e.g. MapServer, GeoServer) and emerging Data Services may be incorporated.

    unreliable or does not live up to user expectations. Of course, initiating and maintaining

    prospective changes in service paradigm are contingent on whether the necessary

    financial and human resources are forthcoming.

    Some of the development paths elaborated above raise further issues. With respect to

    coupling automated metadata generation with dataset editing workflows, the lack of

    open, standard geo-interfaces or Application Programming Interfaces (APIs) across the

    GIS industry currently precludes the creation of a universal solution, thereby

    necessitating the development of package-specific strategies.

    Automating metadata management and submission processes will serve to reduce the

    resources necessary for contribution to gigateway, but does underline the need for quality

    and validation safeguards to ensure that inappropriate records are not exposed on the

    service. Any implementation of the solutions suggested above should therefore be

    supplemented with systematic human-mediated quality control performed by

    appropriately trained users, whether on a spot-check basis or by brute-force evaluation of

    all metadata items processed. Similar deliberations are necessary if there is to be a

    system supporting the independent accreditation of metadata posted on the service. As

    for quality benchmarks, steps aimed at converging the current gigateway approved

    stamp with international accreditation schemes (such as those guided by ISO) should be

    made to facilitate cross-application compatibility and adoption.

    For organisations with few records, or datasets that change infrequently, manually

    generating, updating and submitting records may well represent the preferred way

    forward. Similarly, preference for retaining some manual control over automated

    processes should not be discounted, particularly for those with well-defined

    protocols already in place or those reluctant to yield control to what may be perceived as a

    black box procedure. In any case, focus should remain on promoting quality

    contribution to gigateway, not the excessive imposition of further layers of complexity

    on the process where it is neither wanted nor warranted.

    DBMS techniques would by their nature provide for better management and integrity of

    metadata within the gigateway service. While there is an argument that suggests this

    would significantly add to the complexity of the system, the well-established interfaces

    to DBMS based on SQL (Structured Query Language) should render such components

    appropriately modular. Although cost may be raised as a concern, free and open

    source software (FOSS) packages such as MySQL and PostgreSQL are viable options.
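
    As an indication of how a DBMS can enforce metadata integrity at the point of entry, the sketch below declares constraints on a small table and shows that malformed records are rejected. SQLite (bundled with Python) stands in here for the MySQL or Postgres systems mentioned above; the table and column names are hypothetical and only loosely modelled on discovery metadata elements.

```python
import sqlite3

# Hypothetical, much-reduced record structure with declarative constraints.
SCHEMA = """
CREATE TABLE metadata_record (
    id                INTEGER PRIMARY KEY,
    title             TEXT NOT NULL CHECK (length(title) > 0),
    distributor_email TEXT NOT NULL CHECK (distributor_email LIKE '%_@_%'),
    date_of_update    TEXT NOT NULL CHECK (date(date_of_update) IS NOT NULL),
    west  REAL NOT NULL,
    east  REAL NOT NULL,
    south REAL NOT NULL,
    north REAL NOT NULL,
    CHECK (west <= east AND south <= north)
);
"""

INSERT_SQL = """
INSERT INTO metadata_record
    (title, distributor_email, date_of_update, west, east, south, north)
VALUES (?, ?, ?, ?, ?, ?, ?)
"""


def try_insert(conn, row):
    """Insert one record; return False if the DBMS rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(INSERT_SQL, row)
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

good = ("Rivers", "gis@example.org", "2006-04-08", -8.0, 2.0, 49.0, 61.0)
bad_extent = ("Rivers", "gis@example.org", "2006-04-08", 2.0, -8.0, 49.0, 61.0)
bad_date = ("Rivers", "gis@example.org", "not a date", -8.0, 2.0, 49.0, 61.0)

results = [try_insert(conn, r) for r in (good, bad_extent, bad_date)]
stored = conn.execute("SELECT COUNT(*) FROM metadata_record").fetchone()[0]
```

    Because the rules live in the schema rather than in application code, every submission route (online editor, bulk transfer or harvest) is subject to the same checks.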

    Z39.50 has been criticised for its failings in relation to geographical metadata. However,

    the advent of geo-centric extensions, together with more recent developments associated

    with the OGC Catalogue Service Specifications, overcomes some of these concerns. The

    ability to access metadata and aggregate search results through more modern mechanisms

    such as HTTP GET and POST requests integrates metadata access more closely with

    standard web-based systems. The key, for gigateway, is to provide a transparent

    transition from the old to the new.
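
    By way of illustration, a catalogue search issued as an ordinary HTTP GET request might be assembled as below. The endpoint and the search constraint are hypothetical. The parameter names follow the key-value encoding of the later 2.0.2 revision of the Catalogue Service Specification and should be treated as assumptions to be checked against whichever version is deployed.

```python
from urllib.parse import urlencode


def csw_getrecords_url(base_url, cql_constraint, max_records=10,
                       version="2.0.2"):
    """Build a catalogue GetRecords request as a key-value-pair GET URL.

    Parameter names are assumed from the CSW 2.0.2 KVP encoding.
    """
    params = {
        "service": "CSW",
        "version": version,
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "elementSetName": "brief",          # brief | summary | full
        "resultType": "results",
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",
        "constraint": cql_constraint,
        "maxRecords": max_records,
    }
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and free-text constraint, for illustration only.
search_url = csw_getrecords_url("http://example.org/csw",
                                "csw:AnyText LIKE '%river%'",
                                max_records=5)
```

    Since the request is an ordinary URL, it can be issued from a browser, a script or another portal, which is the sense in which such protocols sit more comfortably alongside standard web-based systems than a dedicated Z39.50 client does.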

    Both proprietary and FOSS solutions have been discussed; each has its place, with

    its own particular advantages and disadvantages. Proprietary systems can be argued to

    offer stability and less risk, and to provide buy-in to a ready-made, hopefully well-tested

    product complete with support. Yet they can prove expensive. FOSS can provide a less

    expensive alternative, although it is rarely completely free, often requiring specialised

    expertise whether in-house or out-sourced. What is crucial is to ensure the modularity of

    components linked by standardised interfaces such that there should be no dependence

    on either proprietary or FOSS because these components can be readily replaced.

    Additional flexibility can be conferred by providing the aforementioned software

    complete with their source code, whether crafted in proprietary or open environments.

    While universally applicable solutions are presented, enabling access to the inner

    workings of such software will ease integration efforts with incumbent configurations

    that invariably differ between organisations. Moreover, by providing support for

    facilities similar to those of the online open source communities (e.g. SourceForge),

    namely a code repository and a user forum, enthusiastic participants can further

    develop, discuss and distribute provided solutions in a collaborative setting to the

    benefit of the wider participating community.

    While the current work is presented in the context of gigateway and the GEMINI

    discovery schema, it is important that any implemented solution not be tightly bound to

    any one particular standard. The state of flux and delays associated with standard

    stabilisation efforts (GEMINI itself is yet to be finalised), the need to implement

    metadata profile extensions and the emergence of new profiles and schemas all make

    the ability to substitute one standard for another a functional requirement for the

    adopted solutions.

    accessing the data such surrogates depict and to provide linkages to other, similar

    schemes at national and international level.

    Considering the diverse nature of stakeholders involved in gigateway, any decision on

    how to evolve the service will never be based purely on the technological. Indeed, there

    remains an urgent need to resolve the aforementioned political issues and to garner

    consensus amongst both those directing the service and those contributing to it, not only

    regarding a future direction but also where the funding for service upkeep, improvement

    and potential overhaul shall be sourced. Fundamental decisions must be made

    relating to the overall objectives of the service and how it should be manifested, such as

    whether it should persist as a metadata service, or whether opportunities presented by

    promising technologies should be taken to broaden gigateway's scope, as suggested

    above. Any assessment shall clearly be tempered by a number of considerations.

    Interoperability with other services must remain a critical factor, particularly in light of

    legislative requirements at national and European level. The need to maintain the

    service's standing in the face of emerging initiatives more in tune with both contributor

    and consumer expectations is also crucial to avoid perceptions of complacency and the

    resulting implications for the numbers contributing to and exploiting gigateway. Whatever

    path is ultimately taken, the overall objective should not only be the realisation of a

    service befitting that of an internationally visible initiative, but one that its users view as

    being fit for purpose.

    References

    Amin, S. (2003). The Open Archives Initiative Protocol for Metadata Harvesting: An

    Introduction. DRTC Workshop on Digital Libraries: Theory and Practice, Bangalore,

    India: DRTC.

    Breeding, M. (2002). The Open Archives Initiative.

    Accessed 04-06-2006.

    Date, C. J. (2003). An Introduction to Database Systems, Eighth Edition. Boston, MA:

    Addison Wesley.

    Davey, A. and Murray, K. (1996). Update on the National Geospatial Database -

    Collaboration between Organisations. In AGI 96 Conference Proceedings: Geographic

    Information Towards the Millenium, Birmingham, UK. AGI.

    Deng, Y. (2002). The Metadata Architecture for Data Management in Web-based

    Choropleth Maps.

    Accessed 04-08-2006.

    Dunfey, R. I., Gittings, B. M. and Batcheller, J. K. (In press). Towards an Open

    Architecture Vector GIS. Computers and GeoSciences.

    ESRI (2002). Metadata and GIS. Available from http://www.esri.com/.

    Gigateway (2003). Discovery Metadata Specifications. Available from

    http://www.gigateway.org.uk/.

    Göbel, S. and Lutze, K. (1998). Development of meta databases for geospatial data in

    the WWW. In Proceedings of the 6th international symposium on Advances in

    geographic information systems, Washington, United States. ACM Press.

    Guptill, S. G. (1999). Metadata and data catalogues. In P. Longley, M. F. Goodchild, D.

    J. Maguire and D. W. Rhind, Geographical Information Systems (pp.677-692).

    Chichester: Wiley.

    Harder, C. (1998). Serving Maps on the Internet - Geographic Information on the World

    Wide Web. Redlands, CA: ESRI, Inc.

    Hart, D. and Phillips, H. (2001). Metadata Primer - A "How To" Guide on Metadata

    Implementation. Accessed 04-08-2006.

    Higgins, C., Medyckyj-Scott, D. and Reid, J. (2003). A Community Specific SDI - the

    Case of UK Academia. In Geodaten- und Geodienste-Infrastrukturen - von der

    Forschung zur praktischen Anwendung, Münster, Germany. University of Münster.

    Higgins, C., Robertson, A. and McGarva, G. (2005). Edinburgh University Data

    Library Geographic Information Standards: Final Report. Available from

    http://www.edina.ac.uk/.

    Hobona, G., James, P. and Fairbairn, D. (2004). Facilitating Data Discovery In

    Environmental Data Clearinghouses Through Spatial Data Mining. In Proceedings of

    the GIS Research UK 12th Annual Conference, Norwich, UK. University of East

    Anglia.

    Luo, Y., Wang, X. and Xu, Z. (2003). Extension of Spatial Metadata and Agent-based

    Spatial Data Navigation Mechanism. In GIS'03: Proceedings of the 11th ACM

    international symposium on Advances in geographic information systems, New Orleans,

    LA, USA. ACM.

    Mathys, T. (2004). The Go-Geo! Portal Metadata Initiatives. In Proceedings of the GIS

    Research UK 12th Annual Conference, Norwich, UK. University of East Anglia.

    Medyckyj-Scott, D., Chappell, C., Pradhan, A. and O'Hanlon, C. (2001). A geo-spatial

    data resource discovery tool for UK Further and Higher Education - Project Overview

    and Recommendations. Available from http://www.edina.ac.uk/.

    Nanson, B., Smith, N. and Davey, A. (1995). What is the British National Geospatial

    Database? In AGI 95 Conference Proceedings: Expanding Your World, Birmingham,

    UK. AGI.

    OGC (2004). Geospatial portal reference architecture: a community guide to

    implementing standards-based geospatial portals (OGC Draft Report No OGC 04-039).

    Open Geospatial Consortium.

    OGC (2005). OGC Catalogue Services Specification 2.0.1. (OGC implementation

    specification 04-021r3).

    Putz, S. (1994). Interactive information services using world-wide web hypertext.

    Computer Networks and ISDN Systems, 27, 273-280.

    Rackham, L. (2004). An Independent Review of the Sustainability of a UK Metadata

    Service for Geographically Related Information. Available from

    http://www.gigateway.org.uk/.

    Rocha, J. G. and Henriques, P. R. (2004). Towards XML Web Services based

    Clearinghouses. In Proceedings 7th Global Spatial Data Infrastructure Conference,

    Bangalore, India.

    Schweitzer, P. N. (1998). GIS and Metadata - Putting Metadata in Plain Language.

    Accessed 04-06-2006.

    Troll, D. and Moen, B. (2001). Report to the DLF on the Z39.50 Implementers' Group -

    Moving Towards the Future of Z39.50. Issues and Options Based on ZIG Meeting

    Discussions December 6-7, 2000. Available from http://www.diglib.org/.

    Tsou, M.-H. (2002). An Operational Metadata Framework for Searching, Indexing, and

    Retrieving Distributed Geographic Information Services on the Internet. In Egenhofer

    M. and Mark, D. Geographic Information Science (GIScience 2002): Lecture Notes in

    Computer Science Vol. 2478 (pp. 313-332). Berlin: Springer-Verlag.

    Tulloch, D. L. and Robinson, M. (2000). A progress report on a U.S. National Survey of

    Geospatial Framework Data. Journal of Government Information, 27, 285-298.

    Tyler, G. T. (2002). Managing Metadata: Developing technical solutions for the

    askGIraffe geospatial metadata gateway. Unpublished MSc. Thesis, University of

    Edinburgh, Edinburgh.

    Vermeij, B. (2001). Implementing European Metadata Using ArcCatalog: ArcUser

    July-September 2001. Available from http://www.esri.com/news/arcuser/.