Accessing Biodiversity Resources in Computational Environments from Workflow Application

J. S. Pahwa, R. J. White, A. C. Jones, M. Burgess, W. A. Gray, N. J. Fiddian, T. Sutton, P. Brewer, C. Yesson, N. Caithness, A. Culham, F. A. Bisby, M. Scoble, P. Williams and S. Bhagwat

WORKS 2006, Paris



Overview

• The Biodiversity World (BDW) Project

• The three exemplars chosen for BDW

• BDW Architectural Components
  a) Resource Wrappers
  b) BiodiversityWorld-GRID Interface (BGI) Communications Layer
  c) BDW Datatypes
  d) The Metadata Repository (MTR)

• Using BDW for bioclimatic modelling

• Access to computational resources in BDW environment

• Further Work & Conclusions

The BDW System

• A framework for biodiversity problem-solving

• Provides access to widely dispersed, disparate data sources and analytical tools

• Intended particularly for analysis and modelling of biodiversity patterns

• Provides access to resources originally designed for use in isolation

• Resources may be composed into complex workflows

BDW Exemplars

A. Biodiversity richness analysis and conservation evaluation

B. Bioclimatic modelling and global climate change

C. Phylogenetic analysis and biogeography

Biodiversity Richness Analysis and Conservation Evaluation

Aim:

• Analysis of biodiversity richness patterns for a particular taxon (e.g. group of species) around the world

The BDW System enables:

• Taxonomic verification using the Species 2000 Catalogue of Life service

• Composition of distribution datasets for the chosen taxon from various sources around the world

• Use of the WorldMap System to
  • visualise the distribution datasets, and
  • help identify priority areas for biodiversity conservation

Bioclimatic Modelling and Global Climate Change

Aim:

• Understand impact of global climate change on distribution and diversity of plant & animal species

• Identify climatic & ecological conditions under which a single species lives, extrapolating from known occurrences

• Hence calculate a potentially wider set of areas where the species might occur, or predict future distribution under anticipated climatic conditions

• A bioclimatic modelling workflow example follows later

Phylogenetic Analysis and Biogeography

Aim:

• Discover ancestral relationships between groups of organisms using methods of phylogenetic analysis

• Estimate ages of species

• Use estimates of historical climate to produce plausible estimates of geographical distributions

• Assess historical relationships between changing climate and development of new species

The BDW System provides (1):

• A flexible and extensible problem solving environment (PSE)

• Means of
  • bringing together heterogeneous, globally distributed, biodiversity-related resources & analytical tools
  • assembling resources into workflows to perform complex scientific analyses

• Consistent mechanisms to achieve interoperability of system components

The BDW System provides (2):

• Uniform interfaces for heterogeneous resources (resource wrappers)

• Mechanism for data packaging & transfer

• Compatibility with the Triana Workflow System for assembling and executing workflows

• Web Services-based Grid middleware for accessing remote computational resources

The BDW System Architecture

BDW architectural components (1)

Resource Wrappers

• Provide consistent interface to local & remote resources, and standard resource access/invocation mechanism

• Insulate the core BDWorld System from resource heterogeneity

• Wrap various kinds of resources and analytical tools, and can be deployed in a Grid/Web Services environment

• Give consistent form to data retrieved by encapsulating them into BDWorld data types

• Resources wrapped include AVH, GBIF, OpenModeller, etc.
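
The wrapper idea above can be illustrated with a minimal Python sketch. It is not the BDW implementation: the class names `ResourceWrapper`, `BDWData` and `InMemoryOccurrenceWrapper`, the operation name `getOccurrences` and the `OccurrenceList` type label are all hypothetical, chosen only to show how a uniform invocation interface can hide the differences between resources.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class BDWData:
    """Hypothetical stand-in for a BDWorld data type: a tagged payload."""
    datatype: str
    payload: object


class ResourceWrapper(ABC):
    """Uniform interface that hides how each underlying resource is reached."""

    @abstractmethod
    def operations(self) -> list[str]:
        """Names of the operations this wrapper exposes."""

    @abstractmethod
    def invoke(self, operation: str, **params) -> BDWData:
        """Run one operation and return the result wrapped as BDWData."""


class InMemoryOccurrenceWrapper(ResourceWrapper):
    """Toy wrapper around a local dict; a real one would call a remote service."""

    def __init__(self, records: dict[str, list[tuple[float, float]]]):
        self._records = records

    def operations(self) -> list[str]:
        return ["getOccurrences"]

    def invoke(self, operation: str, **params) -> BDWData:
        if operation != "getOccurrences":
            raise ValueError(f"unsupported operation: {operation}")
        # Unknown species yield an empty list rather than an error.
        points = self._records.get(params["species"], [])
        return BDWData(datatype="OccurrenceList", payload=points)
```

A caller only ever sees `operations()` and `invoke()`, so swapping the in-memory dictionary for a remote data source would not change client code.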

Resource Wrapper Architecture

BDW architectural components (2)

BDW-GRID Interface (BGI) Layer

• Provides standard mechanisms for invoking operations on heterogeneous resources

• Acts as an integrated mechanism for accessing all resource wrappers

• Isolates resource wrapper implementation to a separate layer to enable the use of web services/grid technologies

BDW architectural components (3)

BDW Datatypes

• Encapsulate different types of data and sub-datatypes for transporting data between end points

• Can be transformed into XML representations which can be easily serialised

• Flexible enough to encapsulate user-defined XML documents or data in a string representation

• Extensible; new datatypes can be incorporated
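
As a rough illustration of a datatype that can be turned into XML and back, the following sketch uses Python's standard `xml.etree.ElementTree` module. The element name `bdwData`, the `type` attribute and the field names are assumptions made for this example; the actual BDW XML representation is not shown in these slides.

```python
import xml.etree.ElementTree as ET


def to_xml(datatype: str, fields: dict[str, str]) -> str:
    """Serialise a flat record as XML; the 'bdwData' element name is hypothetical."""
    root = ET.Element("bdwData", {"type": datatype})
    for name, value in fields.items():
        child = ET.SubElement(root, name)
        child.text = value
    return ET.tostring(root, encoding="unicode")


def from_xml(text: str) -> tuple[str, dict[str, str]]:
    """Recover the datatype label and the fields from the XML string."""
    root = ET.fromstring(text)
    return root.get("type"), {child.tag: child.text for child in root}
```

For example, `from_xml(to_xml("SpeciesName", {"genus": "Trifolium", "epithet": "patens"}))` returns the original type label and fields, which is the round-trip property needed when data is transported between end points.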

BDW Datatypes

BDW architectural components (4)

BDW Metadata Repository

• A specialised BDWorld resource

• Provides information such as:
  • Available resources
  • Operations supported by each resource
  • Data types used by operations
  • Location of resource wrapper

• Stores semantic information in the BDWorld ontology, to answer questions such as
  • ‘Which resources can provide me with species data?’
  • ‘Which available operations can accept the outputs from a specific operation?’
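
The two example questions can be sketched as lookups over a small registry. This is a simplification: a plain Python list stands in for the metadata repository and its ontology, and every resource, operation and datatype name below is invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    resource: str            # wrapper that offers the operation
    name: str                # operation name
    inputs: tuple[str, ...]  # datatypes accepted
    output: str              # datatype produced


# Hypothetical registry contents, not the real BDW metadata.
REGISTRY = [
    Operation("NameChecker", "verifyName", ("SpeciesName",), "SpeciesName"),
    Operation("OccurrenceSource", "getOccurrences", ("SpeciesName",), "OccurrenceList"),
    Operation("Modeller", "buildModel", ("OccurrenceList", "ClimateLayers"), "Model"),
]


def resources_providing(datatype: str) -> list[str]:
    """Which resources can provide data of the given type?"""
    return sorted({op.resource for op in REGISTRY if op.output == datatype})


def operations_accepting_output_of(op_name: str) -> list[str]:
    """Which operations can accept the output of the named operation?"""
    produced = {op.output for op in REGISTRY if op.name == op_name}
    return [op.name for op in REGISTRY if produced & set(op.inputs)]
```

A workflow editor could use the second query to suggest which tasks may legally follow the one the user has just placed.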

Bioclimatic Modelling (1)

• By using the known localities of a species, a climate preference profile is produced by cross-referencing with present day climate data

• This climate preference profile is then used to locate other areas where such a climate exists, indicating areas climatically suitable for the species
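
The two steps above can be sketched with a simple min/max climate envelope. This is a deliberately simplified stand-in, not the GARP algorithm used for the model output shown later, and the cell identifiers and variable names (`temp`, `rain`) are made up for the example.

```python
def climate_profile(localities, climate):
    """Min/max of each climate variable over the cells where the species is known."""
    samples = [climate[cell] for cell in localities if cell in climate]
    if not samples:
        raise ValueError("no known locality has climate data")
    variables = samples[0].keys()
    return {v: (min(s[v] for s in samples), max(s[v] for s in samples))
            for v in variables}


def suitable_cells(profile, climate):
    """Cells whose every climate variable falls inside the profile's range."""
    return sorted(
        cell for cell, values in climate.items()
        if all(lo <= values[v] <= hi for v, (lo, hi) in profile.items())
    )


# Toy present-day climate grid: cell id -> climate variables (illustrative values).
climate = {
    "A": {"temp": 10.0, "rain": 600.0},
    "B": {"temp": 12.0, "rain": 700.0},
    "C": {"temp": 11.0, "rain": 650.0},
    "D": {"temp": 25.0, "rain": 200.0},
}
profile = climate_profile(["A", "B"], climate)
candidates = suitable_cells(profile, climate)
```

Here the species is known from cells A and B; cell C falls inside the resulting envelope and is flagged as climatically suitable, while cell D does not. Passing a future or past climate grid to `suitable_cells` instead of the present-day one gives the projection use cases.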

Bioclimatic Modelling (2)

• Using present-day climate:
  • assess areas under threat from invasive species, or
  • those that may benefit from the introduction of a new crop

• Using climate predictions for the future:
  • assess possible effects of global climate change on the distribution of study species

• Using climate predictions for the past:
  • assess changes caused by natural factors in the past

Bioclimatic Modelling Workflow performed by Triana workflow package in BDW system

Example model output for the clover species Trifolium patens Schreber (a member of the bean family). The map shows areas (shaded regions across Central and Eastern Europe, South America, Asia and Australia) predicted to be suitable for the species in the 2050s using the bioclimatic modelling algorithm GARP and the Hadley Centre climate model using the SRES A1F climate scenario.

The Current BDW Architecture:

Enables execution of BDW workflow tasks on remote nodes, but with limited scope:

- Does not give the user sufficient control and flexibility.

- Does not provide the functionality of distributing user jobs across several nodes.

- Depends on libraries at the client side.

The new BDW System architecture (1):

• Provides the user with access to:
  - Biodiversity resources.
  - Computational resources.

• Uses the existing mechanism of invoking operations on remote resources via resource wrapper web services.

• Also uses Condor middleware for utilising computational resources and distributing workload across available nodes.

The new BDW System architecture (2):

• Provides access to the condor pool via the web service interface.

• Gives the user the flexibility to choose an available computational node by using the Ganglia cluster monitoring toolkit.

• Enables matching of workflow task with preferred resource(s).
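
The matching of a workflow task with a preferred node might look roughly like the sketch below. It does not call Condor or Ganglia; it assumes node metrics (free memory, one-minute load) have already been collected from a monitoring source, and the `Node` and `Task` structures and the selection rule are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_memory_mb: int
    load_one: float  # one-minute load average, as a monitoring tool might report


@dataclass
class Task:
    name: str
    min_memory_mb: int
    preferred: tuple[str, ...] = ()  # node names the user would like to use


def match(task, nodes):
    """Pick a node with enough memory, favouring preferred nodes, then low load."""
    eligible = [n for n in nodes if n.free_memory_mb >= task.min_memory_mb]
    if not eligible:
        return None
    # False sorts before True, so preferred nodes win; ties break on load.
    best = min(eligible, key=lambda n: (n.name not in task.preferred, n.load_one))
    return best.name


# Illustrative pool snapshot.
nodes = [Node("n1", 2048, 0.9), Node("n2", 4096, 0.2), Node("n3", 512, 0.0)]
```

With this snapshot, a task needing 1024 MB goes to `n2` by default, but to `n1` if the user lists `n1` as preferred; a task needing more memory than any node offers is left unmatched.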


Conclusions and Further Work

• BDW brings together varied, distributed resources and analytical tools for biodiversity researchers to analyse biodiversity patterns

• Disparate resources can be accessed in the Web-Service enabled BDW PSE.

• The BDW PSE has uniform access to heterogeneous resources

• BDW allows linking of tools and resources in a workflow to automate different activities of an experiment

• Three current exemplar study areas

• The new BDW architecture also provides access to computational resources.

• Security – Shibboleth/chroot

Acknowledgements

• BDW team

• Species 2000

• OpenModeller Community (including CRIA)

• BBSRC

• …