An On-line Collaborative Data Management System

Roger Curry (1), Cameron Kiddle (1), Rob Simmonds (1) and Gilberto Z. Pastorello Jr. (2)

(1) Grid Research Centre, University of Calgary
(2) Centre for Earth Observation Science, University of Alberta

DESCRIPTION

A presentation I prepared that was presented by Rob Simmonds at the Gateway Computing Environments 2010 Workshop in New Orleans on November 14, 2010. It provides an overview of a data management system that was developed for GeoChronos - an on-line collaborative platform for Earth observation scientists.

TRANSCRIPT

Page 1: An On-line Collaborative Data Management System

An On-line Collaborative Data Management System

Roger Curry (1), Cameron Kiddle (1), Rob Simmonds (1) and Gilberto Z. Pastorello Jr. (2)

(1) Grid Research Centre, University of Calgary
(2) Centre for Earth Observation Science, University of Alberta

Page 2: An On-line Collaborative Data Management System

Outline

- Data Challenges
- Related Work
- Data Management System
- Use Case: GeoChronos
- Summary and Future Work

GCE 2010 Nov. 14, 2010 2

Page 3: An On-line Collaborative Data Management System

Data Challenges - I

Data Acquisition
- Much scientific data stored on off-line media
  - Cumbersome and time consuming to access
- Making data available on-line difficult
  - Insufficient storage and bandwidth

Sharing of Data
- Lack of willingness to share data
- Proprietary data - need for controlled access

Page 4: An On-line Collaborative Data Management System

Data Challenges - II

Usability of Data
- Insufficient metadata to describe data
- Various metadata standards in some domains, but many domains lack metadata standards - many scientists use their own metadata format

Finding Data
- Difficult to find data that you need
- Different data organized / stored differently
- Tools to browse, search, visualize data often lacking

Page 5: An On-line Collaborative Data Management System

Related Work - I

Content Management Systems
- e.g., Drupal, Joomla!, Microsoft SharePoint, Plone, ...
- Offer rich set of features but do not provide:
  - Meaningful support for specific data formats
  - Efficient association of metadata and ancillary files with data sets
  - Access to a variety of data processing tools
  - Uniform handling of outputs from processing tools

Spectral Libraries
- e.g., USGS, ASTER, Vegetation Spectral Library (VSL)
- Are available on-line but lack:
  - Ability to dynamically restructure metadata for browsing
  - Collaboration features enabled by social networking

Page 6: An On-line Collaborative Data Management System

Related Work - II

Spectral Library Tools
- e.g., DLR-DFD Spectral Archive, SPECCHIO
- Flexible in creating / handling metadata but:
  - Have a fixed metadata schema - do not support new metadata needs

Data repositories for other domains
- e.g., Astrophysics Data System, FLUXNET, European Bioinformatics (EBI) Databases
- Offer wide range of functionality but:
  - Primarily focus on data that is already validated and structured
  - Do not handle preliminary, intermediate, untested data (i.e., research in progress)

Digital Libraries
- e.g., Planetary Data Systems, NCore, SciPort
- Have flexible functionality but:
  - Most focus on well-defined digital artefacts
  - Limited in handling collaboration on evolving data, metadata and schemas

Page 7: An On-line Collaborative Data Management System

Data Management System - Overview

Supports the following functionality:
- On-line access to data
- Enables scientists to share data while maintaining control of who sees it
- Ability to add and edit metadata while working with multiple schemas
- Collaboratively create new schemas to facilitate consistent/accurate recording of metadata
- Dynamically restructure the way data is browsed

Page 8: An On-line Collaborative Data Management System

Data Management System - Framework

User & Data:
- User acquires data from sensor and uploads to portal
- Direct acquisition of data also possible

Elgg Portal:
- Built on top of Elgg - open source social networking platform
- Fine grained access control
- Flexible data model

Data Storage:
- Currently local NFS storage
- Working on distributed iRODS based system

Data Ingestion Service:
- Creates records, parses metadata, establishes ancillary relationships
- Deployed on cloud-based Condor pool
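
The three ingestion steps named on this slide (create records, parse metadata, establish ancillary relationships) could be sketched roughly as follows. This is an illustration only, not the GeoChronos implementation: the "key: value" header format, the rule that files sharing a base name are ancillary to each other, and the names `parse_metadata` and `ingest` are all assumptions made for the example.

```python
from pathlib import PurePath


def parse_metadata(text):
    """Read leading 'key: value' lines; stop at the first line without a colon.

    The header format is an assumption for this sketch.
    """
    metadata = {}
    for line in text.splitlines():
        if ":" not in line:
            break
        key, value = line.split(":", 1)
        metadata[key.strip()] = value.strip()
    return metadata


def ingest(files, data_suffix=".txt"):
    """Create one record per data file.

    files maps file name -> text content. Other uploaded files with the same
    base name are attached as ancillary files (a hypothetical pairing rule).
    """
    records = []
    for name in sorted(files):
        path = PurePath(name)
        if path.suffix != data_suffix:
            continue
        ancillary = sorted(
            other for other in files
            if other != name and PurePath(other).stem == path.stem
        )
        records.append({
            "file": name,
            "metadata": parse_metadata(files[name]),
            "ancillary": ancillary,
        })
    return records


uploads = {
    "leaf01.txt": "species: Populus tremuloides\nsensor: UNISPEC\n350 0.041\n351 0.043",
    "leaf01.jpg": "<photo bytes>",
    "leaf02.txt": "species: Picea glauca\n350 0.038",
}
records = ingest(uploads)
```

Here `leaf01.jpg` ends up attached to the `leaf01.txt` record, while `leaf02.txt` becomes a record with no ancillary files.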

Page 9: An On-line Collaborative Data Management System

Data Management System - Data Model

(Figure: Elgg data model. Source: http://docs.Elgg.org/wiki/File:Elgg_data_model.png)

- Arbitrary metadata can be assigned to any entity
- Annotations allow users to comment on entities not owned by them
- Data management system adds three new types of ElggObjects:
  - Schema
  - Collection
  - Record
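
The ideas on this slide (entities carrying arbitrary metadata, annotations from non-owners, and three added object types) can be pictured with a small sketch. Elgg itself is a PHP platform, so the Python classes below are not Elgg's API; every class and field name is hypothetical and chosen only to mirror the slide.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entity:
    """Stand-in for a generic entity: an owner, free-form metadata, comments."""
    owner: str
    metadata: dict = field(default_factory=dict)
    annotations: list = field(default_factory=list)

    def annotate(self, user, text):
        # Any user may comment, including users who do not own the entity.
        self.annotations.append((user, text))


@dataclass
class Schema(Entity):
    namespace: str = ""


@dataclass
class Record(Entity):
    # A record usually points at one file, but the file is optional.
    file_name: Optional[str] = None


@dataclass
class Collection(Entity):
    name: str = ""
    records: list = field(default_factory=list)


record = Record(owner="alice", file_name="leaf01.txt")
record.metadata["species"] = "Populus tremuloides"
record.annotate("bob", "Was this measured in full sun?")
library = Collection(owner="alice", name="Aspen leaves", records=[record])
```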

Page 10: An On-line Collaborative Data Management System

Data Management System - Schemas

Create schemas
- Custom or standards-based (e.g., Dublin Core)
- Individually or as a collaborative team

Schemas consist of
- Namespace
- Description
- Read/write access permissions
- Series of metadata keys

Metadata keys consist of
- Name
- Description
- Type (text, latlong, ancillary)
- Optionality: required, recommended, optional
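
A schema as described on this slide could be modelled along these lines. The sketch assumes that metadata keys are stored qualified by namespace (for example `dc:title`), which the slides do not state, and it leaves out access permissions; `MetadataKey`, `Schema` and `check` are hypothetical names.

```python
from dataclasses import dataclass, field

KEY_TYPES = {"text", "latlong", "ancillary"}
OPTIONALITY = {"required", "recommended", "optional"}


@dataclass
class MetadataKey:
    name: str
    description: str
    type: str = "text"
    optionality: str = "optional"

    def __post_init__(self):
        if self.type not in KEY_TYPES:
            raise ValueError(f"unknown key type: {self.type}")
        if self.optionality not in OPTIONALITY:
            raise ValueError(f"unknown optionality: {self.optionality}")


@dataclass
class Schema:
    namespace: str
    description: str
    keys: list = field(default_factory=list)

    def check(self, metadata):
        """List the required and recommended keys absent from `metadata`."""
        missing = {"required": [], "recommended": []}
        for key in self.keys:
            qualified = f"{self.namespace}:{key.name}"  # assumed naming rule
            if qualified not in metadata and key.optionality in missing:
                missing[key.optionality].append(qualified)
        return missing


dc = Schema("dc", "Subset of Dublin Core", [
    MetadataKey("title", "Name of the resource", optionality="required"),
    MetadataKey("creator", "Who made the resource", optionality="recommended"),
    MetadataKey("coverage", "Where it was collected", type="latlong"),
])
report = dc.check({"dc:title": "Aspen leaf 01"})
```

With only `dc:title` supplied, `report` flags `dc:creator` as a missing recommended key and reports no missing required keys; the optional `dc:coverage` is not reported.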

Page 11: An On-line Collaborative Data Management System

Data Management System - Collections

Group of related data
- e.g., spectral library, set of satellite data

Collection consists of
- Name, description, read/write access permissions, metadata, records

Page 12: An On-line Collaborative Data Management System

Data Management System - Records

- Atomic unit of data management system
- Usually represents a single file, but does not need to be associated with a file
- Tabbed interface for viewing:
  - Spectral plot, metadata, ancillary data, map, comments
  - Custom tabs based on data type

Page 13: An On-line Collaborative Data Management System

Data Management System - Virtual Directory Structure

- Dynamic restructuring of data for browsing purposes
- Folders based on metadata keys/values
- User can customize the metadata keys used to establish the directory hierarchy
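
One way to picture the virtual directory structure is as a grouping of records by an ordered list of metadata keys, where changing the order of keys changes the folder hierarchy without touching the stored data. The function below is a minimal sketch under that reading; `build_virtual_tree`, the record layout and the `(unspecified)` folder for missing values are assumptions for the example.

```python
def build_virtual_tree(records, keys, missing="(unspecified)"):
    """Group record names into nested folders, one folder level per key."""
    if not keys:
        raise ValueError("at least one metadata key is needed")
    tree = {}
    for record in records:
        node = tree
        # Every key except the last creates an intermediate folder.
        for key in keys[:-1]:
            folder = record["metadata"].get(key, missing)
            node = node.setdefault(folder, {})
        # The last key names the folder that holds the records themselves.
        leaf = record["metadata"].get(keys[-1], missing)
        node.setdefault(leaf, []).append(record["name"])
    return tree


records = [
    {"name": "r1", "metadata": {"site": "Peace River", "species": "aspen"}},
    {"name": "r2", "metadata": {"site": "Peace River", "species": "spruce"}},
    {"name": "r3", "metadata": {"site": "Kananaskis", "species": "aspen"}},
    {"name": "r4", "metadata": {"species": "aspen"}},
]

by_site = build_virtual_tree(records, ["site", "species"])
by_species = build_virtual_tree(records, ["species", "site"])
```

The same four records appear under site folders in `by_site` and under species folders in `by_species`; the record with no `site` value lands in the `(unspecified)` folder.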

Page 14: An On-line Collaborative Data Management System

Use Case - GeoChronos

(http://geochronos.org/)

Page 15: An On-line Collaborative Data Management System

GeoChronos - Overview

An on-line platform

For:
- Earth Observation Scientists

Facilitating:
- Collaboration between scientists
- Data access, management and sharing
- Application access, management and sharing

Leveraging:
- Web 2.0 and social networking technologies
- Cloud computing technologies

Funded by:
- CANARIE - Network Enabled Platform (NEP-1) program
- Cybera

Page 16: An On-line Collaborative Data Management System

GeoChronos - Project Team

Principal Investigators
- Dr. Arturo Sanchez-Azofeifa, University of Alberta
- Dr. John Gamon, University of Alberta
- Dr. Benoit Rivard, University of Alberta
- Dr. Rob Simmonds, University of Calgary

Team roles: Project Coordination, Platform Development, Domain Scientists

Page 17: An On-line Collaborative Data Management System

GeoChronos - Virtual Organization

Page 18: An On-line Collaborative Data Management System

GeoChronos - Spectral Libraries

Libraries created
- Ingested some existing on-line libraries
  - USGS, ASTER, Vegetation Spectral Library (VSL)
  - Many enhanced features as part of GeoChronos Spectral Library module - improved browsing, dynamic plotting, mapping, annotations, ...
- Domain scientists have contributed libraries
  - Rock samples, tar sand samples, lichen samples, vegetation samples, alfalfa/barley field samples

Data formats / parsers supported
- ENVI, UNISPEC, ASD, several ASCII formats

Schemas incorporated
- Library specific - USGS, ASTER, VSL, ...
- Sensor/Format specific - UNISPEC, ENVI, ...
- Other Standards - Dublin Core

Currently hosting (including MODIS data)
- 10+ schemas, 20+ collections (libraries), 20,000+ records

Page 19: An On-line Collaborative Data Management System

GeoChronos - MODIS Satellite Data

- Developed automated workflow service for mosaicing, subsetting, reprojecting and masking MODIS satellite data
- Significantly reduces time that scientists have spent manually doing such workflows
- Data management system used to store raw MODIS satellite data and data products derived from the workflow
- Parsers/schemas specific to MODIS data have been added to system
- User provided with same powerful interface as Spectral Libraries for browsing, accessing and viewing data
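
To make the workflow steps concrete, here is a toy sketch of mosaicing, subsetting and masking on tiny grids held as nested lists. It is not the GeoChronos workflow service, which the slides do not describe in detail: real MODIS processing would use geospatial tooling, and reprojection is left out here because it depends on a proper projection library. The function names and the tile values are made up for the example.

```python
def mosaic(left, right):
    """Join two tiles of equal height side by side."""
    if len(left) != len(right):
        raise ValueError("tiles must have the same number of rows")
    return [l_row + r_row for l_row, r_row in zip(left, right)]


def subset(grid, rows, cols):
    """Keep half-open row and column ranges, e.g. rows=(0, 2)."""
    return [row[cols[0]:cols[1]] for row in grid[rows[0]:rows[1]]]


def mask(grid, valid, fill=None):
    """Replace cells whose flag in `valid` is False with `fill`."""
    return [
        [value if ok else fill for value, ok in zip(row, flags)]
        for row, flags in zip(grid, valid)
    ]


tile_a = [[1, 2], [3, 4]]
tile_b = [[5, 6], [7, 8]]

merged = mosaic(tile_a, tile_b)                    # 2 rows x 4 columns
window = subset(merged, rows=(0, 2), cols=(1, 3))  # middle two columns
clear = [[True, False], [True, True]]              # e.g. a cloud-free flag
product = mask(window, clear)
```

Chaining the steps this way mirrors the idea of an automated workflow: each stage takes the previous stage's output, so the sequence can run without manual intervention.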

Page 20: An On-line Collaborative Data Management System

GeoChronos - User Feedback

- Have developed data management system in an interactive, iterative fashion
- Domain scientists on project have provided much guidance, testing and feedback
- Have customized, enhanced the data management system based on feedback received

Page 21: An On-line Collaborative Data Management System

Summary

- Identified data related challenges facing scientists
- Discussed some related efforts and shortcomings of these approaches
- Presented an on-line collaborative data management system addressing many data challenges
- Showed example usage of the data management system by GeoChronos

Page 22: An On-line Collaborative Data Management System

Next Steps

- Currently have a single local data repository
  - Working on extending data management system to work with distributed data repositories using iRODS
- Currently have powerful browsing functionality
  - Need to add search functionality across collections and based on metadata values
- Currently support custom metadata schemas
  - Plan to make use of Semantic Web technologies to better relate data and provide ontological mapping between different metadata schemas / standards
- Currently work with spectral and MODIS satellite data
  - Plan to incorporate other data such as carbon flux data, other satellite data, meteorological data, phenology tower data
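
The planned search across collections, combined with a mapping between schemas, might look something like the sketch below. It uses a flat key-renaming table, which is far simpler than the Semantic Web approach the slide proposes and is swapped in here only to show the idea. The key names (`libA:species_name`, `libB:taxon`), the collection layout and the functions `normalize` and `search` are all hypothetical.

```python
def normalize(metadata, mapping):
    """Rename schema-specific keys to a shared vocabulary; keep unmapped keys."""
    return {mapping.get(key, key): value for key, value in metadata.items()}


def search(collections, mapping, criteria):
    """Return (collection, record) names whose metadata matches every criterion."""
    hits = []
    for collection_name, records in collections.items():
        for record in records:
            metadata = normalize(record["metadata"], mapping)
            if all(metadata.get(k) == v for k, v in criteria.items()):
                hits.append((collection_name, record["name"]))
    return hits


# Two libraries that record the same concept under different keys.
mapping = {"libA:species_name": "species", "libB:taxon": "species"}
collections = {
    "Library A": [
        {"name": "a1", "metadata": {"libA:species_name": "aspen"}},
    ],
    "Library B": [
        {"name": "b1", "metadata": {"libB:taxon": "aspen"}},
        {"name": "b2", "metadata": {"libB:taxon": "spruce"}},
    ],
}
hits = search(collections, mapping, {"species": "aspen"})
```

A single query on `species` finds matching records in both libraries even though each library uses its own key name.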

Page 23: An On-line Collaborative Data Management System

Contact Information

http://geochronos.org/
[email protected]

http://grid.ucalgary.ca/
http://ceos.ualberta.ca/
http://www.cybera.ca/