eudat user forum-london-11march2013-biovel-v3

WEB SERVICES INFRASTRUCTURES FOR BIODIVERSITY SCIENCE

Alex Hardisty, Coordinator, Cardiff University

EUDAT User Forum, 11-12th March 2013, London

Biodiversity Virtual e-Laboratory

An e-Infrastructure and e-Science environment supporting research on biodiversity

Products are “services” and “workflows”

• Workflows make it possible to process vast amounts of data, repeatedly
– Build your own workflow: select and apply successive “services” (data analysis and processing steps)
– Import data from one’s own research and/or from existing libraries (e.g. GBIF, Catalogue of Life)
• Access a library of workflows and re-use existing workflows
– Improves efficiency by reducing research time and overhead expenses

Figure: Part of a workflow to study the ecological niche of the horseshoe crab
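The idea of building a workflow by applying successive services can be sketched in a few lines of Python. This is only an illustration: the step functions (`drop_incomplete`, `within_bounding_box`), the bounding box and the occurrence records are made up for the example, and real BioVeL workflows are built and run with Taverna rather than with code like this.

```python
from functools import reduce
from typing import Callable, Dict, List

Record = Dict[str, object]
Service = Callable[[List[Record]], List[Record]]


def drop_incomplete(records: List[Record]) -> List[Record]:
    """Hypothetical refinement step: keep records that have coordinates."""
    return [r for r in records if r.get("lat") is not None and r.get("lon") is not None]


def within_bounding_box(records: List[Record]) -> List[Record]:
    """Hypothetical selection step: keep records inside an assumed study area."""
    return [r for r in records if 20.0 <= r["lat"] <= 50.0 and -100.0 <= r["lon"] <= -60.0]


def run_workflow(steps: List[Service], data: List[Record]) -> List[Record]:
    """Apply each service in order, feeding its output to the next one."""
    return reduce(lambda current, step: step(current), steps, data)


# Made-up occurrence records, standing in for data imported from a library.
occurrences = [
    {"species": "Limulus polyphemus", "lat": 39.1, "lon": -75.2},
    {"species": "Limulus polyphemus", "lat": None, "lon": -70.0},
    {"species": "Limulus polyphemus", "lat": 60.0, "lon": -75.0},
]
refined = run_workflow([drop_incomplete, within_bounding_box], occurrences)
print(len(refined))
```

Because a workflow here is just an ordered list of steps, the same list can be stored, shared and re-run on new data, which is the re-use argument made above.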

Creates powerful data processing tools for biodiversity research

• Carbon Sequestration
• Ecosystem Functioning and Valuation
• Invasive Species Management

• Aims to foster cooperation in the community by:
– Discussing scientific use cases
– Identifying and deploying important Web Services
– Designing and offering workflows
– Training scientists

An international virtual network of experts connecting 2 scientific communities: biodiversity and ICT

Ecological Niche Modelling, Biogeochemical Modelling, Metagenomics, Phylogenetics, Population Modelling, Taxonomy, Geospatial Visualization

• NoE: ALTER-Net, EDIT/PESI, LTER-Europe, EuroMarine, etc.

• Projects: 4D4Life, agINFRA, Aquamaps, ArtDataBanken, BioFresh, Envri, EU BON, EUBrazilOpenBio, Fauna Iberica, i4Life, iMarine, Micro B3, OpenPlantBio, ViBRANT

• Global: CAMERA, Catalogue of Life, COOPEUS, CReATIVE-B, EoL, GBIF, GSC Biodiversity WG, TreeBase, and many more

Fits into a portfolio of initiatives

Supported by many friends

Important contribution to infrastructure

BioVeL Tool Spectrum

Figure: BioVeL tool spectrum. User roles: Technical PAL, Science PAL, Domain Scientist. Axes: concept knowledge (workflow design and compute, through to domain science) and workflow visibility (high to low). Interfaces (design and launch tools): Taverna Workbench, Component Builder, Taverna Lite / Server, domain-specific website (Taverna Player). Run-time execution: Interaction Server, Taverna Server, services (COTS, shim, domain, cloud). Catalogues and repositories: BioCatalogue, BiodiversityCatalogue (services, workflows, components). Data management: Data Mgt Workspace, Authentication Management System, local file stores, local data sets. Deployment infrastructure: hosting, compute, storage. Channels: curators, makers, users in the field, third party.

We’re at the halfway point

• Several workflows maturing nicely
– Public, shared: data refinement, population modelling, ecological niche modelling
– Beta: phylogenetic inferencing
– In the pipeline: biogeochemical process modelling, metagenomics, …
• Using Web services from GBIF, CoL, CRIA, Fraunhofer, INFN, …
– Developing new services: visualisation and data selection, phylogenetics, metagenomics, Biome-BGC modelling, population modelling
• A curated public catalogue of Web services
– www.biodiversitycatalogue.org
• AWS cloud infrastructure, new user interfaces (tavlite1.biovel.eu)
• Growing profile in community
– Steady enquiries from potential users and public training workshops

http://www.biodiversitycatalogue.org/

http://tavlite1.biovel.eu/

4 questions to address

1. How to use distributed centres to efficiently run distributed processing chains?
2. Is there a problem of data exchange? (And how to solve this)
3. Deploying codes close to data
4. Access and security issues around managing protected services

How to use distributed centres to efficiently run distributed processing chains?

Users’ workflows and applications

Service and Data Providers (INFN, BioVeL, GBIF, CoL, EBI, BGBM, etc.)

Resource Providers (EUDAT, EGI.eu, PRACE, commercial cloud, etc.)

Is there a problem of data exchange? (And how to solve this)

• At the simplest level, we need for the user:
– A "starting place", where a workflow can find the data it needs
– An "ending place", where a workflow can put its results
– A "transient place", where temporary data / intermediate results can be put and retrieved
• For services we need:
– Temporary spaces associated with specific services, supporting data movements between services
– Separation of users and separation of workflow runs
• To summarise:
– A replicated distributed storage space, accessible to BioVeL services (and hence workflows) for both reading and writing, which presents to the user as a filespace native to the user’s local environment
• = Dropbox for services, with fast replication between known service locations. Today, typically GB not TB
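One way to picture the starting, ending and transient places, together with the separation of users and workflow runs, is as a directory layout per user and per run. The sketch below is an assumption for illustration only: the folder names (`input`, `output`, `scratch`) and the `run_places` helper are hypothetical, and it says nothing about how replication between service locations would be done.

```python
import tempfile
from pathlib import Path
from typing import Dict


def run_places(root: Path, user: str, run_id: str) -> Dict[str, Path]:
    """Create and return the three places for one user's workflow run."""
    base = root / user / run_id          # separates users and separates runs
    places = {
        "starting": base / "input",      # where the workflow finds its data
        "ending": base / "output",       # where the workflow puts its results
        "transient": base / "scratch",   # temporary / intermediate results
    }
    for path in places.values():
        path.mkdir(parents=True, exist_ok=True)
    return places


# A temporary directory stands in for the shared storage space.
root = Path(tempfile.mkdtemp())
run_a = run_places(root, "alice", "run-001")
run_b = run_places(root, "alice", "run-002")

# An intermediate result written by one run is not visible to the other.
(run_a["transient"] / "step1.csv").write_text("id,value\n1,42\n")
print((run_b["transient"] / "step1.csv").exists())
```

Keying every path on both user and run is what gives the separation asked for above; a replicated store would need to preserve the same keys at each service location.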

Deploying codes close to data

• BioVeL Appliance
– A service packaged for DCI, deployed on-demand
– Working with EGI Fedcloud on this
– Could be deployed close to data, but this only makes sense if it would be quicker than moving the data
• So where is the break-even point?
• Taverna Server deployments
– In connection with Web Services hosting
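The break-even question can be made concrete with back-of-envelope arithmetic. The sketch below rests on simplifying assumptions that are not from the slides: the appliance image travels over the same link as the data would, start-up overhead is a fixed number of seconds, and compute time and the transfer of results are ignored. All figures in the example call are invented.

```python
def break_even_data_size_gb(appliance_gb: float, startup_s: float,
                            bandwidth_gb_per_s: float) -> float:
    """Data size at which moving the data takes as long as deploying the appliance.

    Moving data:        data_gb / bandwidth
    Deploying the code: appliance_gb / bandwidth + startup_s
    Setting the two equal gives data_gb = appliance_gb + startup_s * bandwidth.
    """
    return appliance_gb + startup_s * bandwidth_gb_per_s


def quicker_to_deploy(data_gb: float, appliance_gb: float, startup_s: float,
                      bandwidth_gb_per_s: float) -> bool:
    """True when shipping the appliance to the data beats shipping the data."""
    move_data_s = data_gb / bandwidth_gb_per_s
    deploy_s = appliance_gb / bandwidth_gb_per_s + startup_s
    return deploy_s < move_data_s


# Invented figures: 2 GB appliance image, 2 minutes start-up, 0.1 GB/s link.
threshold = break_even_data_size_gb(appliance_gb=2.0, startup_s=120.0,
                                    bandwidth_gb_per_s=0.1)
print(threshold)
print(quicker_to_deploy(50.0, 2.0, 120.0, 0.1))
```

Under this model the break-even data size grows with bandwidth, so on fast links more data can be moved before deploying close to it pays off.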

Taverna Server

Access and security issues around managing protected services

• We need a lightweight and standard solution for
– User management & single sign-on to our Service Network
– Permissions system for authorizing access to services

• Same for Workspace Access Service (user workspace)

Figure: contracts linking the User, SP (service provider) and RP (resource provider)


• 3-legged OAuth, extended
– resource / service is independent of BioVeL OAuth provider
• Adopt from megx.net
– marine ecological genomics
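As a rough picture of what a three-legged flow with a separate protected service involves, here is a toy in-memory model. It is not a real OAuth implementation and is not taken from BioVeL or megx.net: the class and method names are hypothetical, and signatures, token expiry, redirects and network transport are all left out.

```python
import secrets
from typing import Dict, Optional, Set


class AuthProvider:
    """Hypothetical stand-in for an OAuth provider; keeps everything in memory."""

    def __init__(self) -> None:
        self._pending: Dict[str, Optional[str]] = {}  # request token -> approving user
        self._access: Dict[str, str] = {}             # access token -> user

    def request_token(self) -> str:
        """Leg 1: the client obtains a temporary request token."""
        token = secrets.token_hex(8)
        self._pending[token] = None
        return token

    def authorize(self, request_token: str, user: str) -> None:
        """Leg 2: the user approves the client's request at the provider."""
        if request_token not in self._pending:
            raise KeyError("unknown request token")
        self._pending[request_token] = user

    def access_token(self, request_token: str) -> str:
        """Leg 3: the client swaps the approved request token for an access token."""
        user = self._pending.pop(request_token, None)
        if user is None:
            raise PermissionError("request token was not authorised")
        token = secrets.token_hex(16)
        self._access[token] = user
        return token

    def user_for(self, access_token: str) -> Optional[str]:
        """Tell a service which user, if any, an access token belongs to."""
        return self._access.get(access_token)


class ProtectedService:
    """A service separate from the provider; it holds its own permission list."""

    def __init__(self, provider: AuthProvider, allowed_users: Set[str]) -> None:
        self._provider = provider
        self._allowed = allowed_users

    def call(self, access_token: str) -> str:
        user = self._provider.user_for(access_token)
        if user is None or user not in self._allowed:
            raise PermissionError("access denied")
        return f"result for {user}"


provider = AuthProvider()
service = ProtectedService(provider, allowed_users={"alice"})
rt = provider.request_token()        # client asks for a request token
provider.authorize(rt, "alice")      # user signs in and approves
at = provider.access_token(rt)       # client exchanges it for an access token
print(service.call(at))
```

The service never sees the user's credentials; it only checks a token with the provider and then applies its own permissions, which is one reading of the service being independent of the OAuth provider.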

BioVeL is funded by the European Commission 7th Framework Programme (FP7). It is part of its e-Infrastructures activity.

BioVeL contributes to LifeWatch and GEO BON.

BioVeL products are free to access.

Questions?

Under FP7, the e-Infrastructures activity is part of the Research Infrastructures programme, funded under the FP7 'Capacities' Specific Programme. It focuses on the further development and evolution of the high-capacity and high-performance communication network (GÉANT), distributed computing infrastructures (grids and clouds), supercomputer infrastructures, simulation software, scientific data infrastructures, e-Science services as well as on the adoption of e-Infrastructures by user communities.
