eudat user forum-london-11march2013-biovel-v3
WEB SERVICES INFRASTRUCTURES FOR BIODIVERSITY SCIENCE
Alex Hardisty, Coordinator, Cardiff University
EUDAT User Forum, 11-12th March 2013, London
Biodiversity Virtual e-Laboratory
An e-Infrastructure and e-Science environment supporting research on biodiversity
Products are “services” and “workflows”
• Workflows allow vast amounts of data to be processed, repeatedly
– Build your own workflow: select and apply successive “services” (data analysis and processing steps)
– Import data from one’s own research and/or from existing libraries (e.g. GBIF, Catalogue of Life)
• Access a library of workflows and re-use existing workflows
– Improves efficiency by reducing research time and overhead expenses
Figure: Part of a workflow to study the ecological niche of the horseshoe crab
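The idea of building a workflow by applying successive services can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Taverna or the BioVeL services are implemented: the step names `drop_incomplete` and `deduplicate`, the record fields, and the sample occurrence records are all hypothetical.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Service = Callable[[List[Record]], List[Record]]


def drop_incomplete(records: List[Record]) -> List[Record]:
    """Hypothetical refinement step: keep only records that have coordinates."""
    return [r for r in records if r.get("lat") is not None and r.get("lon") is not None]


def deduplicate(records: List[Record]) -> List[Record]:
    """Hypothetical refinement step: drop repeated (species, lat, lon) records."""
    seen = set()
    unique = []
    for r in records:
        key = (r["species"], r["lat"], r["lon"])
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


def build_workflow(*services: Service) -> Service:
    """Chain services so that each one receives the previous one's output."""
    def run(records: List[Record]) -> List[Record]:
        for service in services:
            records = service(records)
        return records
    return run


# Made-up occurrence records, standing in for data imported from a library.
occurrences = [
    {"species": "Limulus polyphemus", "lat": 39.0, "lon": -75.1},
    {"species": "Limulus polyphemus", "lat": 39.0, "lon": -75.1},  # duplicate
    {"species": "Limulus polyphemus", "lat": None, "lon": None},   # incomplete
    {"species": "Limulus polyphemus", "lat": 41.5, "lon": -70.7},
]

refine = build_workflow(drop_incomplete, deduplicate)
refined = refine(occurrences)
```

Because a workflow built this way is itself a service, it can be stored, shared and re-used as a step inside a larger workflow.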
Creates powerful data processing tools for biodiversity research
• Carbon Sequestration
• Ecosystem Functioning and Valuation
• Invasive Species Management
• Aims to foster cooperation in the community by:
– Discussing scientific use cases
– Identifying and deploying important Web Services
– Designing and offering workflows
– Training scientists
An international virtual network of experts connecting 2 scientific communities: biodiversity and ICT
• Ecological Niche Modelling
• Biogeochemical modelling
• Metagenomics
• Phylogenetics
• Population Modelling
• Taxonomy
• Geospatial Visualization
• NoE: ALTER-Net, EDIT/PESI, LTER-Europe, EuroMarine, etc.
• Projects: 4D4Life, agINFRA, Aquamaps, ArtDataBanken, BioFresh, Envri, EU BON, EUBrazilOpenBio, Fauna Iberica, i4Life, iMarine, Micro B3, OpenPlantBio, ViBRANT
• Global: CAMERA, Catalogue of Life, COOPEUS, CReATIVE-B, EoL, GBIF, GSC Biodiversity WG, TreeBase, and many more
Fits into a portfolio of initiatives
Supported by many friends
Important contribution to infrastructure
BioVeL Tool Spectrum
[Figure: tool spectrum. User types: Technical PAL, Science PAL, Domain Scientist. Tools: Taverna Workbench, Component Builder, Taverna Lite / Server, Domain-Specific Website (Taverna Player). Axes: Workflow Visibility (High to Low); Concept Knowledge (workflow design, compute; domain science).]
[Figure: architecture diagram. Labels recovered: Interfaces (design & launch tools): Taverna Workbench, Taverna Lite, third-party channels; users: Curators, Pro Makers, In the Field Users. Catalogues & Repositories: Workflows, Components, Services, BioCatalogue, BiodiversityCatalogue. Run time execution: Interaction Server, Taverna Server, Servers, Services (COTS, Shim, Domain, Cloud). Data Mgt: Data Mgt Workspace, Authentication Management System, Local File Stores, Local Data Sets (Local, Public, BioVeL). Deployment infrastructure: hosting, compute, storage.]
We’re at the halfway point
• Several workflows maturing nicely
– Public Shared: Data refinement, Population modelling, Ecol. niche modelling
– Beta: Phylogenetic inferencing
– In the pipe: Biogeochemical process modelling, metagenomics, …
• Using Web services from GBIF, CoL, CRIA, Fraunhofer, INFN, …
– Developing new services: viz and data selection, phylo, metagenomics, Biome-BGC modelling, pop modelling
• A curated public catalogue of Web services
– www.biodiversitycatalogue.org
• AWS cloud infrastructure, new user interfaces (tavlite1.biovel.eu)
• Growing profile in community
– Steady enquiries from potential users and public training workshops
4 questions to address
1. How to use distributed centres to efficiently run distributed processing chains?
2. Is there a problem of data exchange? (And how to solve this)
3. Deploying codes close to data
4. Access and security issues around managing protected services
How to use distributed centres to efficiently run distributed processing chains?
Users’ workflows and applications
Service and Data Providers (INFN, BioVeL, GBIF, CoL, EBI, BGBM, etc.)
Resource Providers (EUDAT, EGI.eu, PRACE, commercial cloud, etc.)
Is there a problem of data exchange? (And how to solve this)
• At simplest level, we need for the user:
– A "starting place", where a workflow can find the data it needs
– An "ending place", where a workflow can put its results
– A "transient place" where temporary data / intermediate results can be put and retrieved
• For services we need:
– Temporary spaces associated with specific services, supporting data movements between services
– Separation of users and separation of workflow runs
• Summarise as:
– A replicated distributed storage space, accessible to BioVeL services (hence workflows) for both reading and writing, which presents to the user as a filespace, native to the user’s local environment.
• = Dropbox for services, with fast replication between known service locations. Today, typically GB not TB
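One way to picture the starting, ending and transient places is as a directory layout keyed by user and workflow run. The sketch below is a local-filesystem stand-in only; it says nothing about replication or about how the BioVeL workspace is actually built. The function names `run_workspace` and `service_scratch`, and the directory names `input`, `transient` and `output`, are assumptions made for illustration.

```python
from pathlib import Path
import tempfile
from typing import Dict


def run_workspace(root: Path, user: str, run_id: str) -> Dict[str, Path]:
    """Create the three places for one workflow run of one user.

    Keeping <user>/<run_id> in the path separates users from each other
    and separates the runs of a single user.
    """
    base = root / user / run_id
    places = {name: base / name for name in ("input", "transient", "output")}
    for path in places.values():
        path.mkdir(parents=True, exist_ok=True)
    return places


def service_scratch(places: Dict[str, Path], service_name: str) -> Path:
    """Create a temporary space tied to one service inside the transient place."""
    scratch = places["transient"] / service_name
    scratch.mkdir(parents=True, exist_ok=True)
    return scratch


if __name__ == "__main__":
    # Demo in a throwaway directory; user, run and service names are made up.
    root = Path(tempfile.mkdtemp())
    places = run_workspace(root, "alice", "run-001")
    scratch = service_scratch(places, "niche-model")
    print(scratch.relative_to(root))
```

In a replicated, distributed version the same logical paths would resolve at every known service location, which is the "Dropbox for services" behaviour described above.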
Deploying codes close to data
• BioVeL Appliance
– A service packaged for DCI, deployed on-demand
– Working with EGI Fedcloud on this
– Could be deployed close to data but this only makes sense if this would be quicker than moving the data
• So where is the break-even point?
• Taverna Server deployments
– In connection with Web Services hosting
Taverna Server
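A first estimate of the break-even point can be worked out with simple arithmetic. The sketch below rests on assumptions that are not in the slides: processing takes the same time in either location, the appliance image and the data travel over the same link, and deployment adds a fixed start-up time. All numbers in the example are made up.

```python
def transfer_seconds(size_gb: float, bandwidth_gbit_s: float) -> float:
    """Time to move size_gb gigabytes over a link of bandwidth_gbit_s Gbit/s."""
    return size_gb * 8 / bandwidth_gbit_s


def deploy_seconds(image_gb: float, startup_s: float, bandwidth_gbit_s: float) -> float:
    """Time to ship the appliance image to the data and start it there."""
    return transfer_seconds(image_gb, bandwidth_gbit_s) + startup_s


def break_even_gb(image_gb: float, startup_s: float, bandwidth_gbit_s: float) -> float:
    """Data size above which deploying next to the data beats moving the data.

    Solves transfer_seconds(data) == deploy_seconds(image) for the data size.
    """
    return image_gb + startup_s * bandwidth_gbit_s / 8


# Made-up example: 2 GB appliance image, 300 s start-up, 1 Gbit/s link.
threshold = break_even_gb(image_gb=2.0, startup_s=300.0, bandwidth_gbit_s=1.0)
print(f"Deploy close to the data above about {threshold:.1f} GB")
```

With these numbers the threshold is 39.5 GB: below it, moving the data is quicker; above it, moving the appliance is quicker. A faster link or a slower start-up pushes the threshold up.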
Access and security issues around managing protected services
• We need a lightweight and standard solution for
– User management & single sign-on to our Service Network
– Permissions system for authorizing access to services
• Same for Workspace Access Service (user workspace)
[Figure: User, SP and RP, with two links labelled "Contract"]
• 3-legged OAuth, extended
– resource / service is independent of BioVeL OAuth provider
• Adopt from megx.net
– marine ecological genomics
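The three legs, with the protected service kept separate from the OAuth provider, can be sketched as an in-memory simulation. This is a generic authorization-code style illustration, not the megx.net or BioVeL implementation; the slides do not say how the "extended" flow works. The class names `OAuthProvider` and `ProtectedService`, the `introspect` call, and the client, user and service names are all assumptions, and no signatures, expiry or transport security are modelled.

```python
import secrets
from typing import Dict, FrozenSet, Optional, Tuple


class OAuthProvider:
    """Issues codes and tokens; it hosts no services itself."""

    def __init__(self) -> None:
        self._codes: Dict[str, Tuple[str, str, FrozenSet[str]]] = {}
        self._tokens: Dict[str, Tuple[str, FrozenSet[str]]] = {}

    def authorize(self, user: str, client_id: str, scope: FrozenSet[str],
                  user_consents: bool) -> str:
        """Leg 1: the user approves the client and a one-time code is issued."""
        if not user_consents:
            raise PermissionError("user declined")
        code = secrets.token_urlsafe(16)
        self._codes[code] = (user, client_id, scope)
        return code

    def exchange(self, code: str, client_id: str) -> str:
        """Leg 2: the client swaps the code for an access token."""
        entry = self._codes.pop(code, None)  # a code can be used only once
        if entry is None or entry[1] != client_id:
            raise PermissionError("invalid code for this client")
        token = secrets.token_urlsafe(24)
        self._tokens[token] = (entry[0], entry[2])
        return token

    def introspect(self, token: str) -> Optional[Tuple[str, FrozenSet[str]]]:
        """Let an independent service ask who a token belongs to and its scope."""
        return self._tokens.get(token)


class ProtectedService:
    """A service hosted elsewhere that trusts the provider's answer."""

    def __init__(self, name: str, provider: OAuthProvider) -> None:
        self.name = name
        self.provider = provider

    def call(self, token: str) -> str:
        """Leg 3: the client presents the token; the service checks the scope."""
        grant = self.provider.introspect(token)
        if grant is None or self.name not in grant[1]:
            raise PermissionError("token does not grant access to " + self.name)
        return f"{self.name} ran for {grant[0]}"


provider = OAuthProvider()
service = ProtectedService("niche-model", provider)
code = provider.authorize("alice", "workflow-engine", frozenset({"niche-model"}), True)
token = provider.exchange(code, "workflow-engine")
result = service.call(token)
```

The point of the separation is that a service provider only needs to trust the provider's answer about a token; it never sees the user's credentials, so new services can join the network without sharing a user database.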
BioVeL is funded by the European Commission 7th Framework Programme (FP7). It is part of its e-Infrastructures activity.
BioVeL contributes to LifeWatch and GEO BON.
BioVeL products are free to access.
Questions?
Under FP7, the e-Infrastructures activity is part of the Research Infrastructures programme, funded under the FP7 'Capacities' Specific Programme. It focuses on the further development and evolution of the high-capacity and high-performance communication network (GÉANT), distributed computing infrastructures (grids and clouds), supercomputer infrastructures, simulation software, scientific data infrastructures, e-Science services as well as on the adoption of e-Infrastructures by user communities.