Shoaib Sufi CCLRC e-Science Centre CCLRC Scientific Metadata (CSMD) Model April 2004 NESC



Page 1

Shoaib Sufi

CCLRC e-Science Centre

CCLRC Scientific Metadata (CSMD) Model

April 2004 NESC

Page 2

Model Motivation

• A common general format/standard for Scientific Studies and data holdings metadata does not exist
• By proposing a Model and Implementation:
  – Form a specification for the types of metadata that should be captured by Scientific Studies
  – Ease citation, collaboration, exploitation and integration
  – Allow easy integration of distributed heterogeneous metadata systems into a homogeneous (albeit virtual) platform

Page 3

Structure of Metadata Model

• The CCLRC Scientific Metadata Model (CSMD) is a study-dataset orientated model:
  – Indexing
  – Provenance
  – Data Description
  – Data Location
  – Access Conditions
  – Related Material

Page 4

What influenced CSMD

• CIP from Earth Observation
• DDI from Social Sciences
• DublinCore from the Library community
  – Publication only metadata
• XSIL as used on LIGO
  – Low level ‘Scientific Data Objects’ focus
• CERA from the MPIM
  – A bit specific to Earth Sciences but close
• … hence the need to develop our own General Model – CCLRC Scientific Metadata Model

Page 5

Some Model Aims

• Abstract class orientated description of the types of metadata that should be captured by Scientific Studies

• Create a common denominator for Scientific Study metadata, which forms a specification

• At the NIEES metadata workshop in 2002, a discussion on metadata standards asked whether people were capturing metadata at the moment; the simple answer given was no!

Page 6

CSMD Used on DataPortal

• XML Implementation used as Data Interface for DataPortal
• Single view of heterogeneous systems/schemas
• Acts as a stress test of the model
  – Limitations feed into Model Requirements
  – New requirements fed back into implementation
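
As a rough illustration of what an XML data interface for a study record might look like, the sketch below serialises a few study fields with Python's standard `xml.etree.ElementTree`. The element names (`Study`, `Name`, `Institution`, `Investigator`) are assumptions made for this example and are not taken from the actual DataPortal XML schema.

```python
import xml.etree.ElementTree as ET


def study_to_xml(name: str, institution: str, investigator: str) -> str:
    """Serialise a minimal study record; element names are hypothetical."""
    study = ET.Element("Study")
    ET.SubElement(study, "Name").text = name
    ET.SubElement(study, "Institution").text = institution
    ET.SubElement(study, "Investigator").text = investigator
    return ET.tostring(study, encoding="unicode")


# Round trip: serialise, then parse the text back into an element tree.
xml_text = study_to_xml("Example study", "Example institution", "A. Researcher")
parsed = ET.fromstring(xml_text)
```

A client that only understands this one format can then present records from differently structured back ends, provided each back end is mapped onto it.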

Page 7

Model Breakdown: Provenance

• The Study contains the following metadata:
  – The Study Name
  – The Study Institution
  – The Investigator
  – Extended Study Information
    • Abstract
    • Funding
    • Start and End times
  – Investigations
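
The Study structure listed above could be sketched as Python dataclasses. This is only an illustration: the class and field names, the use of plain strings for dates, and the representation of investigations as a list of names are assumptions, not part of the CSMD specification.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ExtendedStudyInformation:
    abstract: Optional[str] = None
    funding: Optional[str] = None
    start_time: Optional[str] = None  # assumed to be a date string
    end_time: Optional[str] = None


@dataclass
class Study:
    name: str
    institution: str
    investigator: str
    extended: ExtendedStudyInformation = field(default_factory=ExtendedStudyInformation)
    # Investigation names only, to keep this sketch short.
    investigations: List[str] = field(default_factory=list)


study = Study("Example study", "Example institution", "A. Researcher")
study.extended.funding = "Example funder"
study.investigations.append("Example experiment")
```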

Page 8

Investigations

• A Study can have more than one Investigation; possible enumerations (types) are experiment, simulation, measurements, etc. Each Investigation contains:
  – Name
  – Investigation Type
  – Abstract
  – Resource
  – Link to DataHolding
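
One possible way to express an Investigation and its type enumeration is shown below. The enum members follow the examples named on the slide; the field names and the use of a string reference for the DataHolding link are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class InvestigationType(Enum):
    EXPERIMENT = "experiment"
    SIMULATION = "simulation"
    MEASUREMENT = "measurement"


@dataclass
class Investigation:
    name: str
    investigation_type: InvestigationType
    abstract: Optional[str] = None
    resource: Optional[str] = None
    data_holding_ref: Optional[str] = None  # hypothetical link to the DataHolding


investigations = [
    Investigation("Run A", InvestigationType.EXPERIMENT, data_holding_ref="holding-1"),
    Investigation("Model B", InvestigationType.SIMULATION),
]

# Select investigations of one type, as a client filtering a study might do.
simulations = [
    inv.name
    for inv in investigations
    if inv.investigation_type is InvestigationType.SIMULATION
]
```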

Page 9

Topic (for indexing)

• Keywords
  – Discipline (i.e. domain)
  – Keyword Source (e.g. domain dictionary)
  – Keyword
• Subjects
  – Discipline
  – Subject Source (e.g. domain taxonomy)
  – Subject
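
The Topic structure above might be modelled as follows. The class names, the `matches` helper and the sample terms are hypothetical; the sketch only shows how keywords and subjects, each tagged with a discipline and a source, could support a simple index lookup.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Keyword:
    discipline: str
    source: str  # e.g. a domain dictionary
    keyword: str


@dataclass(frozen=True)
class Subject:
    discipline: str
    source: str  # e.g. a domain taxonomy
    subject: str


@dataclass
class Topic:
    keywords: List[Keyword] = field(default_factory=list)
    subjects: List[Subject] = field(default_factory=list)

    def matches(self, term: str) -> bool:
        """Case-insensitive exact match against any keyword or subject."""
        wanted = term.lower()
        terms = [k.keyword for k in self.keywords] + [s.subject for s in self.subjects]
        return any(t.lower() == wanted for t in terms)


topic = Topic(
    keywords=[Keyword("chemistry", "example dictionary", "Zeolite")],
    subjects=[Subject("chemistry", "example taxonomy", "Catalysis")],
)
```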

Page 10

Access Condition & Related Material

• Access Conditions
  – Contains a list of users or groups who are allowed access to the metadata and data, or a pointer to an access control system which contains such data for this study
• Related Material
  – One or many links and/or textual descriptions of material related to this study, e.g. earlier studies or parallel studies
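
A minimal sketch of the two Access Conditions options (a local list of users or groups, or a pointer to an external access control system) is given below. The class, the `is_allowed` method and the convention of returning `None` to signal that the decision must be deferred to the external system are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class AccessConditions:
    users: Set[str] = field(default_factory=set)
    groups: Set[str] = field(default_factory=set)
    external_system: Optional[str] = None  # pointer to an access control system

    def is_allowed(self, user: str, user_groups: Set[str]) -> Optional[bool]:
        """Decide from the local lists, or return None to defer to the external system."""
        if self.external_system is not None:
            return None
        return user in self.users or bool(self.groups & user_groups)


conditions = AccessConditions(users={"alice"}, groups={"project-x"})
deferred = AccessConditions(external_system="acs://example")
```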

Page 11

Data

• Data Description holds a logical description of the Study’s data:
  – Data Name
  – Type of Data
  – Status
  – Data Topic
  – Parameters
  – Related Data Ref
  – Relation type (e.g. derived)
• Data Location contains the link between the logical name and physical URIs
  – Data Name
  – Locator(s)
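
The split between a logical description and a physical location can be sketched as two record types joined on the data name. The field names, the `resolve` helper and the sample URIs are hypothetical, and Data Topic and Parameters are left out to keep the example short.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataDescription:
    name: str
    data_type: str
    status: str
    related_data_ref: Optional[str] = None
    relation_type: Optional[str] = None  # e.g. "derived"


@dataclass
class DataLocation:
    name: str  # the same logical name as in DataDescription
    locators: List[str] = field(default_factory=list)


def resolve(name: str, locations: List[DataLocation]) -> List[str]:
    """Return every physical URI recorded for a logical data name."""
    return [uri for loc in locations if loc.name == name for uri in loc.locators]


description = DataDescription("run-001", "raw", "complete")
locations = [
    DataLocation("run-001", ["file:///data/run-001.dat", "http://example.org/run-001.dat"]),
    DataLocation("run-002", []),
]
uris = resolve(description.name, locations)
```

Keeping the locators separate means a dataset can be moved or replicated without touching its logical description.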

Page 12

More on Parameters

• Parameters contain a lot of information about the data objects (DO) and collections

• A collection/DO can have many parameter entries; each parameter entry contains:
  – Parameter derivation (e.g. measured/fixed)
  – The value
  – The units
  – Range
  – Error margin

• Parameter aggregation is also supported
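
A parameter entry with the fields listed above might look like the following sketch. The `name` field, the numeric types and the two helper methods are assumptions added for illustration; the slide does not say how range or error margin are represented.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Parameter:
    name: str  # assumed identifier, not listed on the slide
    derivation: str  # e.g. "measured" or "fixed"
    value: float
    units: str
    value_range: Optional[Tuple[float, float]] = None
    error_margin: Optional[float] = None

    def in_range(self) -> bool:
        """True if no range is given or the value lies within it (inclusive)."""
        if self.value_range is None:
            return True
        low, high = self.value_range
        return low <= self.value <= high

    def bounds(self) -> Tuple[float, float]:
        """Value minus and plus the error margin (zero if none is recorded)."""
        err = self.error_margin or 0.0
        return (self.value - err, self.value + err)


temperature = Parameter("temperature", "measured", 290.0, "K", (0.0, 500.0), 5.0)
```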

Page 13

Cardinality Issues

• The model recommends a certain cardinality of elements

• Certain metadata components are necessary for one to have an instance of the implemented model – treating everything as optional is not acceptable

• It is thought that implementations may modify this to suit their needs; the model attempts to remain ideal (i.e. to give the most common cardinality)
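
Cardinality rules of this kind can be checked mechanically. The sketch below uses a table of (minimum, maximum) occurrences per element; the element names and the specific cardinalities are invented for the example and are not the ones the model recommends.

```python
from typing import Dict, List, Optional, Tuple

# (minimum, maximum) occurrences; None means unbounded. Values are illustrative only.
CARDINALITY: Dict[str, Tuple[int, Optional[int]]] = {
    "study_name": (1, 1),
    "investigator": (1, None),
    "investigation": (1, None),
    "related_material": (0, None),
}


def cardinality_errors(record: Dict[str, List[str]]) -> List[str]:
    """List every element whose occurrence count violates the table above."""
    errors = []
    for element, (minimum, maximum) in CARDINALITY.items():
        count = len(record.get(element, []))
        if count < minimum:
            errors.append(f"{element}: expected at least {minimum}, found {count}")
        elif maximum is not None and count > maximum:
            errors.append(f"{element}: expected at most {maximum}, found {count}")
    return errors


record = {"study_name": ["A", "B"], "investigator": ["X"]}
problems = cardinality_errors(record)
```

An implementation that relaxes or tightens the recommended cardinalities would only need to change the table, not the check.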

Page 14

Enumeration Issues

• Enumerations (or controlled vocabularies), e.g. types of investigator or types of institution, are distinct from the model, just as taxonomies are

• However, they are necessary for the model to work, so implementations (e.g. the CCLRC DataPortal XML implementation of the model) propose some enumerations for common things

• It is hoped that implementations will use recognised and relevant controlled vocabularies where they are available
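
To show how enumerations can live outside the model yet still be enforced by an implementation, the sketch below keeps vocabularies in a separate lookup table and validates terms against it. The vocabulary names and their members are hypothetical and do not reproduce the enumerations proposed by the DataPortal implementation.

```python
from typing import Dict, FrozenSet

# Hypothetical vocabularies supplied by an implementation, not by the model itself.
VOCABULARIES: Dict[str, FrozenSet[str]] = {
    "investigator_type": frozenset({"principal investigator", "co-investigator"}),
    "institution_type": frozenset({"university", "research laboratory"}),
}


def is_valid_term(vocabulary: str, term: str) -> bool:
    """Check a term against a named vocabulary; an unknown vocabulary raises KeyError."""
    return term.strip().lower() in VOCABULARIES[vocabulary]
```

Swapping in a recognised external vocabulary then means replacing an entry in the table rather than changing the model.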

Page 15

Conformance Level

• For a complete metadata study-dataset record a large amount of metadata has to be stored/processed

• So it’s useful to have conformance levels

• Model uses 5 levels

• Each successive level specifies that more metadata (and indexing information) should be held

Page 16

Level 1

• Type of Information captured:

– Study and Investigation metadata with indexing at the Study level

• Level 1 metadata is similar to library/publication style metadata (e.g. DublinCore)

Page 17

Level 2

• Type of Information captured:

– Level 1 + DataHolding metadata (i.e. DataSets and DataObjects)

Page 18

Level 3

• Type of Information captured:

– Level 2 + related material, Access condition, indexing to data collection levels

Page 19

Level 4

• Type of Information captured:

– Level 3 + indexing to data object level and data object parameter information

Page 20

Level 5

• Type of Information captured:

– Level 4 + all remaining metadata components filled in, e.g. funding, resources used, facilities used
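
Because each level builds on the one before it, the conformance level of a record can be computed by checking cumulative requirements. In the sketch below the feature labels are hypothetical shorthand for the metadata named at each level, and treating the levels as strictly cumulative is an assumption.

```python
from typing import List, Set

# Features each level adds on top of the previous one; the labels are hypothetical.
LEVEL_REQUIREMENTS: List[Set[str]] = [
    {"study", "investigation", "study_indexing"},  # Level 1
    {"data_holding"},  # Level 2
    {"related_material", "access_conditions", "collection_indexing"},  # Level 3
    {"object_indexing", "object_parameters"},  # Level 4
    {"funding", "resources_used", "facilities_used"},  # Level 5
]


def conformance_level(present: Set[str]) -> int:
    """Return the highest level whose cumulative requirements are all met (0 if none)."""
    level = 0
    for required in LEVEL_REQUIREMENTS:
        if not required <= present:
            break
        level += 1
    return level


level_two_record = {"study", "investigation", "study_indexing", "data_holding"}
everything = set().union(*LEVEL_REQUIREMENTS)
```

Under this strict reading, a system that holds some higher-level metadata but is missing part of a lower level is reported at the lower level.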

Page 21

Conformance Levels

• L1 is similar to library/publication style metadata (e.g. DublinCore)

• The current DataPortal uses somewhere between L2 and L3 – indexing at study level moving towards collection level but with parameter information

• It is envisaged that only new systems designed with CSMD will conform to L4 and above

• Benefit of conformance levels: the higher the level of conformance to the CSMD, the richer the clients that operate on the data can be
  – e.g. identifying datasets and objects which link directly to keywords/taxonomies and not just studies

Page 23

Facilities using CSMD

• CCLRC Facilities (via CCLRC DataPortal):
  – ISIS - Neutron Spallation at Rutherford Appleton Laboratory (test)
  – SR - Synchrotron Radiation source at Daresbury Laboratory (test)
  – British Atmospheric Data Centre (BADC) at RAL (prototype)
• External Facilities (via CCLRC DataPortal):
  – Max-Planck-Institut für Meteorologie (MPIM) in Hamburg
• External Projects using CSMD:
  – NERC funded E-mineral ‘environment from the molecular level’
  – EPSRC funded E-materials project
  – Manchester MyGrid project uses an adapted version
  – ISIS (RAL) have taken data needs inhouse and use a model based heavily on CSMD

Page 24

The Future

• Increased use/recommendation for use of Controlled vocabularies

• Increased support for formal identification systems

• Feeding in relevant ideas from other standards

• Update XML and Relational implementations so they more closely track the model

• Look into internationalisation issues and see if these affect the model or the implementations

Page 25

More information

• Latest Model description

– http://www-dienst.rl.ac.uk/library/2002/tr/dltr-2002001.pdf

• For an XML implementation, a Relational implementation, or a newer draft of the model documentation, e-mail [email protected] with the subject containing [metadata model request]