Towards an information model for I2S2 Brian Matthews, Leader, Scientific Applications Group, E-Science Centre, STFC Rutherford Appleton Laboratory brian.matthews@stfc.ac.uk


Page 1

Towards an information model for I2S2

Brian Matthews, Leader, Scientific Applications Group, E-Science Centre, STFC Rutherford Appleton Laboratory

brian.matthews@stfc.ac.uk

Page 2

Facilities Process

Stages:
• Proposal: scientist submits application for beamtime
• Approval: facility committee approves application
• Scheduling: facility registers, trains, and schedules scientist’s visit
• Experiment: scientist visits, facility runs experiment
• Data storage: raw data filtered, cleansed and stored
• Record Publication: subsequent publication registered with facility
• Data analysis: tools for processing made available

Characteristics:
- formal application
- set processes
- central infrastructure
- standard tools
- hierarchical control
- dedicated staff
  • user office
  • instrument scientists
  • library and IT support
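The facilities process on this slide is a fixed, ordered sequence of stages. A minimal Python sketch of that ordering follows. The names `Stage` and `next_stage` are hypothetical and do not come from any facility system, and the pairing of each stage with its description is inferred from the slide layout.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    """Stages of the facilities process, in slide order (illustrative names)."""
    PROPOSAL = "Scientist submits application for beamtime"
    APPROVAL = "Facility committee approves application"
    SCHEDULING = "Facility registers, trains, and schedules scientist's visit"
    EXPERIMENT = "Scientist visits, facility runs experiment"
    DATA_STORAGE = "Raw data filtered, cleansed and stored"
    RECORD_PUBLICATION = "Subsequent publication registered with facility"


def next_stage(current: Stage) -> Optional[Stage]:
    """Return the stage that follows `current`, or None after the last one."""
    stages = list(Stage)  # Enum members iterate in definition order
    index = stages.index(current)
    return stages[index + 1] if index + 1 < len(stages) else None


if __name__ == "__main__":
    stage: Optional[Stage] = Stage.PROPOSAL
    while stage is not None:
        print(f"{stage.name}: {stage.value}")
        stage = next_stage(stage)
```

Modelling the stages as an ordered enumeration reflects the "set processes" and "hierarchical control" characteristics: a visit cannot skip a step.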

Page 3

Requirements

• Secure access to user’s data
• Flexible data searching
• Scalable architecture
• Extensible architecture
• Integration with analysis tools
• Access to high-performance resources
• Linking to other scientific outputs
• Data policy aware

Page 4

Principles

Diagram components:
• Online Proposal System
• User Office System: User Database, Scheduling, Health and Safety, Proposal Management
• Metadata Catalogue
• Data Acquisition System
• Storage Management System
• Data Access Portal
• Single Sign On Account Creation and Management

ICAT Software Suite, providing the crucial integration of key functions.

The ICAT software suite

• Catalogues all experiment related information
• Metadata gathered via integration with existing IT systems
  – proposal systems
  – data acquisition
• Provides a well defined API for easy embedding into any applications.

• Access data anywhere via the web
• Annotate and search for data
• Share data with colleagues
• Access data via user’s own programs
• Utilise integrated e-Science resources
• Link to data from your publications

Page 5

Component architecture

Page 6

ICAT Deployment

Diagram components:
• RDBMS
• Web Services API
• ICAT API
• Command Line Tools
• Glassfish / JBOSS
• Java, C++, Fortran
• Data Storage / Delivery System
• Single Sign On
• User Database System
• Proposal System
• Publication System
• e-Science Services
• Software Repository

Page 7

Data Portal

Page 8

TopCat

Page 9

Towards an Information Model

Page 10

Methodology

The Singapore Framework for Dublin Core Application Profiles. Mikael Nilsson, Tom Baker, Pete Johnston
http://dublincore.org/documents/singapore-framework/

Page 11

Functional requirements

Page 12

A Metadata Model for Facilities Science

A common general format/standard for Scientific Studies and data holdings metadata did not exist.

By proposing a model:
– A specification for the types of metadata to capture Scientific Studies
– Cataloguing data holdings: provide access for the Data Owner
– Ease citation, sharing, collaboration, and integration
– Allow easy federation of distributed heterogeneous metadata systems into a homogeneous (virtual) platform

Therefore, the Core Scientific Metadata Model (CSMD) was developed.

Page 13

A Domain Model

Page 14

Modelling Scientific Activity

Page 15
Page 16
Page 17

Core Scientific Metadata Model

Damian Flannery

Entities and attributes (diagram labels):
• Investigation: Reference / Proposal Id, Previous Reference, Facility, Instrument, Title, Abstract, etc.
• Investigator: User Id, Role
• Publication: Full Reference, URL, Repository
• Keyword: Name
• Topic: Name, Parent Id, Topic Level
• Sample: Name, Chemical Formula, Safety Information
• Sample Parameter: Name, Units, String Value, Numeric Value, Range Top, Range Bottom, Error
• Dataset: Name, Sample Id, Description
• Dataset Parameter: Name, Units, String Value, Numeric Value, Range Top, Range Bottom, Error
• Datafile: Name, Description, Version, Location, Format, Format Version, Create Time, Modify Time, Size, Checksum
• Datafile Parameter: Name, Units, String Value, Numeric Value, Range Top, Range Bottom, Error
• Related Datafile: Source Datafile Id, Destination Datafile Id, Relation
• S/W Application, S/W Version
• Parameter: Name/Units/Value etc., Searchable, Is Sample Parameter, Is Dataset Parameter, Is Datafile Parameter, Verified
• Authorisation: User Id, Role (e.g. Admin, Deleter, Updater, Reader, Creator, Downloader, etc.), Element Type, Element Id
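The entity labels on this slide can be read as a small object model. The sketch below is illustrative only: it uses Python dataclasses, covers a subset of the attributes, and assumes the nesting Investigation → Dataset → Datafile from the entity names. It is not the ICAT schema or API, and all class and field names are hypothetical renderings of the slide labels.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Parameter:
    """A named value; fields follow the parameter attributes on the slide."""
    name: str
    units: str
    string_value: Optional[str] = None
    numeric_value: Optional[float] = None
    range_top: Optional[float] = None
    range_bottom: Optional[float] = None
    error: Optional[float] = None


@dataclass
class Datafile:
    name: str
    location: str
    format: Optional[str] = None
    size: Optional[int] = None
    checksum: Optional[str] = None
    parameters: List[Parameter] = field(default_factory=list)


@dataclass
class Dataset:
    name: str
    description: str = ""
    sample_id: Optional[int] = None
    datafiles: List[Datafile] = field(default_factory=list)
    parameters: List[Parameter] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    facility: str
    instrument: str
    abstract: str = ""
    datasets: List[Dataset] = field(default_factory=list)

    def datafile_count(self) -> int:
        """Total number of datafiles across all datasets."""
        return sum(len(ds.datafiles) for ds in self.datasets)


if __name__ == "__main__":
    # All values below are made up for illustration.
    run = Datafile(name="run-0001.raw", location="/archive/run-0001.raw")
    ds = Dataset(name="raw", datafiles=[run])
    inv = Investigation(title="Example study", facility="ISIS",
                        instrument="EXAMPLE", datasets=[ds])
    print(inv.datafile_count())
```

Attaching parameters at the dataset and datafile level mirrors the separate Dataset Parameter and Datafile Parameter entities on the slide.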

Page 18

Description set profile

Page 19

Metadata granule

• Topic: keywords providing an index on what the study is about.
• Study Description: provenance about what the study is, who did it and when.
• Access Conditions: conditions of use providing information on who can access the data and how.
• Data Description: detailed description of the organisation of the data into datasets and files.
• Data Location: locations providing a navigational aid to where the data on the study can be found.
• Related Material: references into the literature and community providing context about the study.
• Legal Note: copyright, patents and conditions of use etc. relating to the study and the data in the study.
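The seven sections of the metadata granule can be sketched as a single record. This is a minimal Python illustration, not a format defined by the slides. The class name `MetadataGranule`, the field types (free text versus lists), and the helper `missing_sections` are all assumptions.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class MetadataGranule:
    """One record per study; section names follow the slide, types are assumed."""
    topic: List[str] = field(default_factory=list)             # keywords
    study_description: str = ""                                 # what, who, when
    access_conditions: str = ""                                 # who may access, and how
    data_description: str = ""                                  # datasets and files
    data_location: List[str] = field(default_factory=list)     # where the data lives
    related_material: List[str] = field(default_factory=list)  # literature references
    legal_note: str = ""                                        # copyright, patents

    def missing_sections(self) -> List[str]:
        """Names of sections that are still empty, in declaration order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


if __name__ == "__main__":
    granule = MetadataGranule(topic=["crystallography"],
                              study_description="Example study description")
    print(granule.missing_sections())
```

A completeness check like `missing_sections` is one simple way to see which parts of a study record still need input from the scientist or the facility.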

Page 20

ICAT 3.3 Schema – Study (2)

Page 21

Syntax and metadata formats

Page 22

ICAT API and XML format

Page 23

ICAT 3.3 Database Schema

Page 24

CSMD History

• First pilot of the model developed in 2001!
• Now in ICAT 3.3
• Serving data from STFC Facilities (ISIS, DLS)
• Model proven robust: simple yet expressive
  – http://code.google.com/p/icatproject/

Page 25

I2S2 - Infrastructure for Integration in Structural Sciences

Bridging the gap between raw and derived data

“Lone” researcher scenario
• data sharing with colleagues via email
• little or no infrastructure
• little management of raw or derived data

EPSRC National Crystallography Service
• service provision function
• operates across institutions
• moderate infrastructure

Diamond & ISIS
• operates on behalf of multiple institutions
• processes for experiments
• large infrastructure engineered to manage raw data
• derived data taken off site on laptops / removable drives

Page 26

Interactions between research processes

Research process (diagram labels): Grant Proposal, Facilities Proposal, Facilities Experiment, Data cleansing, Record Publication, Data analysis, Local experiments, Simulation, Sample Preparation, Literature Review, Publication

Facilities process (diagram labels): Proposal, Approval, Scheduling, Facilities Experiment, Data storage, Record Publication, Analysis Tools

CSMD

Cover the scientist’s research lifecycle as well as the facilities.

Extend to:
• laboratory based science
• secondary analysis data
• preservation information
• publication data
• domain specific vocabularies

By being:
- standardised
- modular
- extensible

Page 27

Methodology

The Singapore Framework for Dublin Core Application Profiles. Mikael Nilsson, Tom Baker, Pete Johnston
http://dublincore.org/documents/singapore-framework/

Page 28

Issues

• Metadata model
• Framework for developing metadata model
• Modularisation mechanisms and extensions
• Formats

• Model supporting laboratory tools
  – How does the model fit?
  – Flexibility to handle local processes
    • Ad hoc, partial, un-ordered
  – What needs changing in the model?
  – What needs changing in tools?

• Data input and maintenance?
• Simple ways of inputting the data
• Lab books?

Page 29

Extension areas:

• Secondary analysis data
• Preservation data
• Publication data
• Topic data
  • chemistry
• Controlled lists (ontologies) for
  • Instruments
  • Facilities
  • Methods
• Access control
• Safety data
• Blogs and notebooks

Page 30

ISIS - ICAT

Part of ISIS study

Gudrun

• Control file
• Correction data
• Sample data
• Calibration data

Scattering function data

User inputs

Page 31

Derived Data

Generalised model

Managing the links between data

Inputs of data sets

Associated with a software item with a set of parameters

Managing this?
- lab-books?
- simple tools?
- VRE?
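The generalised model for derived data (input data sets, a software item, and a set of parameters producing an output) can be sketched as a simple provenance record. This is a hedged Python illustration, not part of CSMD or ICAT. The names `Derivation` and `lineage` are hypothetical, and the sketch assumes each output data set is produced by at most one recorded step.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Derivation:
    """One processing step: input data sets -> software run -> output data set."""
    inputs: List[str]
    output: str
    software: str
    software_version: str
    parameters: Dict[str, str] = field(default_factory=dict)


def lineage(target: str, steps: List[Derivation]) -> List[str]:
    """Return every data set `target` was derived from, directly or indirectly."""
    by_output = {step.output: step for step in steps}
    found: List[str] = []
    pending = [target]
    while pending:
        current = pending.pop()
        step = by_output.get(current)
        if step is None:
            continue  # raw data: no upstream step recorded
        for name in step.inputs:
            if name not in found:  # also guards against cycles
                found.append(name)
                pending.append(name)
    return found


if __name__ == "__main__":
    # Data set names and parameters below are made up for illustration.
    steps = [
        Derivation(inputs=["raw"], output="corrected",
                   software="example-tool", software_version="1.0"),
        Derivation(inputs=["corrected", "calibration"], output="scattering",
                   software="example-tool", software_version="1.0",
                   parameters={"mode": "default"}),
    ]
    print(sorted(lineage("scattering", steps)))
```

Recording each step with its software version and parameters is what allows a derived result to be traced back to the raw data held by the facility.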