introduction to seadatanet metadata
Post on 13-Jan-2016
Introduction to SeaDataNet Metadata
Roy Lowry
British Oceanographic Data Centre
SeaDataNet Training Course
Overview
• An introduction to the SeaDataNet metadata formats covering
Purpose
Entity definition
History
Population
Strengths
Weaknesses
Overview
• SeaDataNet metadata formats
European Directory of Marine Organisations (EDMO)
Cruise Summary Report (formerly ROSCOP)
European Directory of Marine Environmental Datasets (EDMED)
European Directory of the Ocean Observing System (EDIOS)
SeaDataNet Common Data Index (CDI)
European Directory of Marine Environmental Research Projects (EDMERP)
EDMO
• Purpose
Provides SeaDataNet with an address book of organisations associated with marine data
Provides descriptions of these organisations
• Entity definition
Any group of people sharing a common postal address engaged in activities associated with marine data acquisition and use
• History
Developed by Maris during SEA-SEARCH in response to a need to improve address metadata management across the project
EDMO
• Population
On-line Content Management System fronted by a web form (http://www.sea-search.net/organisations/)
Partners are responsible for maintenance of their national record set
Management supported by a reasonably sophisticated access control system that authenticates users and grants access to the appropriate database subset
EDMO
• Strengths
The maintenance tool. Please use it to look after the entries for your country
Provides a single point of entry for SeaDataNet metadata documents associated with a given organisation
Centralisation of metadata common to other catalogues, replacing four independently maintained address metadata repositories
Rich information content, including descriptions, logos and spatial location information
EDMO
• Weaknesses
Simple data model is poorly equipped for the management of organisational evolution
Organisations merge, fragment, rename and move
All we can do in EDMO is document this using plain language fields
Text fields contain embedded markup
These look very nice when displayed through the search interface
However, the markup causes problems generating XML documents for record transport between systems
Examples including graphics and relative URLs break when transported by copy/paste
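The markup problem can be illustrated with a minimal Python sketch. The element names (organisation, description) and the sample text are hypothetical and are not taken from the EDMO schema; the point is only that pasting marked-up text straight into an XML document can produce something that no longer parses, whereas letting an XML library escape the text keeps the document well-formed.

```python
import xml.etree.ElementTree as ET

# Hypothetical plain-language field with embedded HTML markup.
description = 'Renamed after a merger. See <a href="history.html">history</a> & <b>contacts</b>.'

# Naive approach: paste the field into the document as-is.
naive = "<organisation><description>" + description + "</description></organisation>"
try:
    ET.fromstring(naive)
    naive_ok = True
except ET.ParseError:
    # The bare ampersand makes the document ill-formed.
    naive_ok = False

# Safer approach: let the library escape the text when serialising.
record = ET.Element("organisation")  # hypothetical element name
ET.SubElement(record, "description").text = description
xml_text = ET.tostring(record, encoding="unicode")

# The original text is recovered unchanged after a round trip.
recovered = ET.fromstring(xml_text).find("description").text

print(naive_ok)
print(xml_text)
```

Escaping keeps the transport document valid, but the receiving system still gets markup as text; relative URLs and graphics inside it remain unresolved, which is the weakness described above.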
CSR
• Purpose
To document the operational and data generation activities of an oceanographic research cruise
• Entity definition
A subject of some controversy
I am a metadata purist and support the definition of a ‘cruise’ as the interval of time between leaving port and returning to port
Thus for a 3-leg cruise I would generate 3 CSR records whilst others would generate just one. I do this because:
Combining records is easier than splitting them
Cruise ‘legs’ for some ships can be VERY different (e.g. 3 legs of a Meteor cruise: one JGOFS, one OMEX, one WOCE)
Merging ‘legs’ is a slippery slope – I’ve even encountered a single record covering the activities of two ships three months apart
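The claim that combining records is easier than splitting them can be sketched in Python. The LegRecord structure, the ship name and the dates are hypothetical and far simpler than a real CSR; the sketch only shows that a combined record follows mechanically from per-leg records, while nothing in a combined record says where one leg ended and the next began.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LegRecord:
    """Hypothetical, much-reduced stand-in for one port-to-port CSR record."""
    ship: str
    sailed: date
    docked: date
    activities: frozenset


def combine_legs(legs):
    """Merge per-leg records into one summary; refuses to mix ships."""
    if not legs:
        raise ValueError("no legs supplied")
    if len({leg.ship for leg in legs}) != 1:
        raise ValueError("legs from different ships should not be merged")
    ordered = sorted(legs, key=lambda leg: leg.sailed)
    return LegRecord(
        ship=ordered[0].ship,
        sailed=ordered[0].sailed,
        docked=max(leg.docked for leg in ordered),
        activities=frozenset().union(*(leg.activities for leg in ordered)),
    )


# Illustrative data only: three legs with different scientific programmes.
legs = [
    LegRecord("RV Example", date(2000, 1, 1), date(2000, 1, 20), frozenset({"JGOFS"})),
    LegRecord("RV Example", date(2000, 1, 25), date(2000, 2, 15), frozenset({"OMEX"})),
    LegRecord("RV Example", date(2000, 2, 20), date(2000, 3, 15), frozenset({"WOCE"})),
]
combined = combine_legs(legs)
print(combined)
```

Going the other way would require information (the intermediate port calls and which activity belonged to which leg) that the combined record no longer holds.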
CSR
• Entity definition (continued)
The problem with my definition is that the real world creates grey areas. For example, does a personnel change by pilot boat in an estuary count as ‘docking’?
Others extend the definition to cover any activity collecting oceanographic data (shoehorning)
I believe this is a very bad thing to do
The activity super-class and other activity sub-classes are much better described by other metadata standards (e.g. in OGC Observations and Measurements)
Later on in SeaDataNet we could consider incorporating some of these to further enrich our metadata portfolio
In the meantime remember that it is NOT necessary to have every measurement covered by a CSR. If it isn’t appropriate, don’t create one.
CSR
• History
Originally a paper form developed by IOC called a ROSCOP
Replaced in 1990 by the Cruise Summary Report with richer content (but the name ROSCOP stuck)
Numerous on-line databases developed during the 1990s
Primary repositories now DOD for SeaDataNet partners and ICES for non-SeaDataNet
CSR
• Population
On-line web-form (http://www.sea-search.net/roscop/welcome.html)
XML schema available for bulk transfers
• Strengths
Flexible population mechanisms
Long history with a massive legacy population
Cruise is (or should be) a well defined concept to oceanographers
CSR
• Weaknesses
“Parameter” vocabulary
Really a vocabulary describing shipborne activities
No clear equivalent elsewhere for interoperability, but ontological mapping to multiple vocabularies might provide a solution
On-line systems developed using plaintext fields when controlled vocabularies would have made interoperability between repositories more straightforward
Spatial coverage limitations
Coarse-grained
Described using Marsden Squares but BODC has deployed a Web Service to convert these to ISO19115/DIF standard bounding boxes
EDMED
• Purpose
To describe marine environmental datasets to promote their discovery
• Entity definition
A dataset, but what is a dataset?
ISO19101 defines a dataset as ‘an identifiable collection of data’, which covers everything from the parameters measured on a single water sample to the 7,500,000 CTDs in the USNODC World Ocean Database
Sound judgement is needed to decide upon appropriate granularity
Best approach is to establish objective criteria
Worth remembering that a measurement may be included in more than one dataset
Posing this question to metadata specialists can provide good sport!
EDMED
• History
Developed by BODC in late 80s
Adopted by EU MAST Data Committee, then SEA-SEARCH and now SeaDataNet
• Population
Form interface to stand-alone Access database that is submitted to BODC for ingestion
XML schema available for bulk transfers
• Strengths
Content quality controlled on ingestion, therefore standards are high
Rich content developed during SEA-SEARCH
EDMED
• Weaknesses
Developed in splendid isolation, including vocabularies, therefore interoperability with other systems is difficult
Heavy dependence on plaintext fields: a problem that should be addressed during SeaDataNet
EDIOS
• Purpose
To describe marine environmental datasets comprising data that are collected repeatedly, regularly and routinely in order to promote their discovery (initially for operational planning purposes)
• Entity definition
A dataset comprised of data that are collected repeatedly, regularly and routinely, but what is a dataset (c.f. EDMED)?
• History
Developed as an EU project led by EuroGOOS
Inherited by SeaDataNet
EDIOS
• Population
Currently an issue
There is a Word-based form (the MIF)
– Developed in parallel to the data model and database with no evidence of communication
– Completed MIFs entered into the database at BODC, requiring significant interpretation and information rehashing (long and painful process)
SeaDataNet work in progress
– IFREMER/BODC working to produce an XML schema to facilitate large-scale transfer
– Maris/BODC developing a web-form based content management system along the lines of EDMO
EDIOS
• Strengths
Rich data model based on structured fields with minimal plaintext
Data model includes hierarchical relationships between entities (project one-to-many observing programmes one-to-many measurement series)
Data model includes support for complex spatial objects (polygons not boxes)
Data model is particularly well suited to the description of operational oceanographic systems
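The hierarchy described above (project one-to-many observing programmes one-to-many measurement series, with polygon footprints) can be sketched with Python dataclasses. The class and field names are assumptions made for illustration and do not reproduce the EDIOS relational schema; the names and coordinates in the example are invented.

```python
from dataclasses import dataclass, field


@dataclass
class MeasurementSeries:
    name: str
    polygon: list  # (longitude, latitude) vertices; a polygon, not just a box


@dataclass
class ObservingProgramme:
    name: str
    series: list = field(default_factory=list)


@dataclass
class Project:
    name: str
    programmes: list = field(default_factory=list)


def all_series(project):
    """Walk the one-to-many links and return every measurement series."""
    return [s for programme in project.programmes for s in programme.series]


def bounding_box(polygon):
    """Reduce a polygon to (west, south, east, north); detail is lost."""
    lons = [lon for lon, _ in polygon]
    lats = [lat for _, lat in polygon]
    return (min(lons), min(lats), max(lons), max(lats))


# Illustrative content only.
triangle = [(-6.0, 50.0), (-4.0, 50.5), (-5.0, 52.0)]
project = Project("Example project", [
    ObservingProgramme("Tide gauges", [
        MeasurementSeries("Gauge A", triangle),
        MeasurementSeries("Gauge B", triangle),
    ]),
    ObservingProgramme("Moorings", [MeasurementSeries("Mooring 1", triangle)]),
])

series_count = len(all_series(project))
box = bounding_box(triangle)
print(series_count, box)
```

A box can always be derived from a polygon, as bounding_box shows, but not the reverse, which is one reason a richer spatial model is counted as a strength.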
EDIOS
• Weaknesses
At the start of SeaDataNet EDIOS had 17 local vocabularies
Extremely poor content governance
Undergoing replacement with managed SeaDataNet standard vocabularies (6 down 11 to go)
Legacy content has not been systematically quality controlled
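Replacing a local vocabulary with a managed one amounts to applying a mapping table and reviewing whatever fails to map. The Python sketch below shows that idea only; the local terms and the "standard" terms are invented for illustration and are not entries from any SeaDataNet vocabulary.

```python
# Hypothetical mapping from local vocabulary terms to managed standard terms.
LOCAL_TO_STANDARD = {
    "temp": "sea temperature",
    "sea temp": "sea temperature",
    "sal": "salinity",
    "waves": "wave height",
}


def harmonise(terms, mapping):
    """Return (mapped, unmapped): standard terms, plus originals needing review."""
    mapped, unmapped = [], []
    for term in terms:
        key = term.strip().lower()
        if key in mapping:
            mapped.append(mapping[key])
        else:
            unmapped.append(term)
    return mapped, unmapped


legacy_terms = ["Temp", " sal ", "Currents", "sea temp"]
mapped, unmapped = harmonise(legacy_terms, LOCAL_TO_STANDARD)
print(mapped)
print(unmapped)
```

The unmapped list is the part that needs human attention, which is where content governance and quality control of legacy records come in.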
EDIOS
• How is EDIOS different from EDMED?
Both are content standards designed to describe datasets
Any dataset described by an EDMED document could be described by an EDIOS document and vice versa
Once vocabularies have been harmonised and some mappings set up it should be possible to generate an EDMED document from an EDIOS document
Generation of an EDIOS document from an EDMED document will never be possible
EDIOS
• How is EDIOS different from EDMED?
SeaDataNet convention is to use EDIOS for ‘qualifying’ datasets and EDMED for everything else
EDMED currently has a working population mechanism, but EDIOS does not
Advice to partners
Identify datasets to be described by EDIOS documents, map them to the EDIOS data model (relational schema and Access prototype on BSCW) and gather together the necessary information
Prepare EDMED documents for all other data sets and get them into BODC
Submit EDIOS entries to BODC once the necessary systems are operational
CDI
• Purpose
To provide an ultra-light discovery metadata description of accessible SeaDataNet data objects
Used to build a manageable fine-grained index of discrete data objects (millions of entries)
• Entity definition
The fundamental SeaDataNet data delivery unit such as a current meter record or a CTD profile
• History
Developed by SEA-SEARCH as a pilot for SeaDataNet
CDI
• Population
XML schema describing files that should be generated automatically from existing digital indexes
• Strengths
Light content makes efficient handling of large numbers of records possible
• Weaknesses
Light content restricts available information
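Generating CDI files automatically from an existing digital index could look something like the Python sketch below. It assumes the index is available as CSV; the column names, element names and sample rows are all hypothetical and do not follow the actual CDI XML schema.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical index export; real indexes and the CDI schema will differ.
INDEX_CSV = """\
id,type,latitude,longitude,start
A001,CTD profile,50.25,-4.20,1999-06-01
A002,current meter record,51.00,-6.50,1999-06-03
"""


def index_to_xml(csv_text):
    """Build one light XML entry per index row (element names are invented)."""
    root = ET.Element("index")
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = ET.SubElement(root, "entry", id=row["id"])
        for name in ("type", "latitude", "longitude", "start"):
            ET.SubElement(entry, name).text = row[name]
    return root


root = index_to_xml(INDEX_CSV)
entries = root.findall("entry")
print(ET.tostring(root, encoding="unicode"))
```

Because each entry carries only a handful of fields, the same loop scales to very large indexes, at the cost of the limited information noted under Weaknesses.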
EDMERP
• Purpose
Description of European marine research projects and programmes
• Entity definition
A co-ordinated collection of marine data acquisition activities in Europe
• History
Developed by Maris during SEA-SEARCH
EDMERP
• Population
Access form: resulting mdb file submitted to Maris
On-line content management system planned
• Strengths
Provides centralised project metadata
• Weaknesses
Local vocabularies and plaintext
That’s All Folks!
Questions or Geoff?