Virtual Geophysics Laboratory
Scientific workflows exploiting the cloud
Ryan Fraser, Terry Rankine, Lesley Wyborn, Joshua Vote, Ben Evans...
Presented by Robert Woodcock
October 2012
CSIRO | MINERALS DOWN UNDER FLAGSHIP
Gather data, process it, publish results
Simple, isn’t it?
bedrock
surficial
mineral
geochemical
geochronologic
hydrogeological
Geo-information
geophysical
knowledge data
Recognise the complete picture....
Virtual Geophysics Laboratory | Robert Woodcock 4 |
Introducing The Virtual Geophysics Laboratory
Data discovery
Layers discovered via remote registries
Layers consist of numerous remote data services
Data discovery
Some data services support subsetting
Some data services support reformatting e.g. CSV, NetCDF, GeoTIFF
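A subsetting and reformatting request might look like the following minimal sketch. The slides do not name the protocol, so this assumes an OGC WCS 1.0.0 key-value GetCoverage request; the endpoint, coverage name, and the `build_subset_request` helper are hypothetical.

```python
from urllib.parse import urlencode


def build_subset_request(endpoint, coverage, bbox, fmt, width=256, height=256):
    """Build a WCS 1.0.0 GetCoverage URL for a bounding-box subset.

    bbox is (min_lon, min_lat, max_lon, max_lat) in EPSG:4326.
    The protocol choice is an assumption, not something the slides state.
    """
    params = {
        "service": "WCS",
        "version": "1.0.0",
        "request": "GetCoverage",
        "coverage": coverage,
        "crs": "EPSG:4326",
        "bbox": ",".join(str(v) for v in bbox),  # spatial subset
        "width": width,
        "height": height,
        "format": fmt,  # accepted format names depend on the server
    }
    return endpoint + "?" + urlencode(params)


url = build_subset_request(
    "https://example.org/wcs",      # hypothetical service endpoint
    "magnetic_anomaly",             # hypothetical coverage name
    (130.0, -25.0, 135.0, -20.0),
    "GeoTIFF",
)
print(url)
```

Only the requested region, in the requested format, would then need to be transferred, rather than the whole data set.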
Data discovery
Some data is only registered with flat files
Powered by the Spatial Information Services Stack Common Platform
Marine Environment, Water
Groundwater
Geology Geophysics
What just happened?
Data processing
A variety of different scientific codes are already available in the form of “Toolboxes”
Data processing
Further input files can be uploaded.
Input files are passed directly into the cloud
Data processing
The steps so far have been building an environment to run a processing script
Either write your own...
...or build from existing templates
What just happened?
Processing script/small input files uploaded
Start processing
Download big data sets
Perform data processing
Download Job Script/user input files
Upload processing results
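The stages labelled above can be sketched as a small pipeline. The ordering is inferred from the diagram labels (fetch the job files, fetch the large data sets, process, upload), and every function name and the in-memory `storage` dictionary are hypothetical stand-ins, not VGL's actual code.

```python
def run_job(fetch_job_files, fetch_datasets, process, upload_results):
    """Run the four stages in order and return whatever the upload stage reports."""
    job_files = fetch_job_files()            # job script and small user inputs
    datasets = fetch_datasets(job_files)     # large data pulled from remote services
    results = process(job_files, datasets)   # run the processing script
    return upload_results(results)           # persist outputs to storage


# Stand-in for cloud storage holding the uploaded script and inputs.
storage = {"job/script.py": "mean", "job/params.txt": "survey-a"}


def fetch_job_files():
    return {k: v for k, v in storage.items() if k.startswith("job/")}


def fetch_datasets(job_files):
    # Stand-in for a large download from a remote data service.
    return {job_files["job/params.txt"]: [1.0, 2.0, 3.0]}


def process(job_files, datasets):
    values = datasets[job_files["job/params.txt"]]
    return {"results/mean.txt": sum(values) / len(values)}


def upload_results(results):
    storage.update(results)
    return sorted(results)


uploaded = run_job(fetch_job_files, fetch_datasets, process, upload_results)
print(uploaded)
```

Passing the stages in as functions keeps the ordering in one place, so a different storage or data service could be swapped in without changing the pipeline.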
Managing results - provenance
All of a job’s outputs are also accessible
Each job has a lifecycle that can be managed
Successful jobs can have their entire process captured in an ISO 19115 ‘provenance record’
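A job lifecycle with a publish step gated on success could be modelled roughly as below. The state names, allowed transitions, and the `Job` class are all hypothetical, and the returned dictionary is only a plain summary of inputs and outputs; it is not an ISO 19115 document.

```python
from dataclasses import dataclass, field
from enum import Enum


class JobStatus(Enum):
    SAVED = "saved"
    PENDING = "pending"
    ACTIVE = "active"
    DONE = "done"
    FAILED = "failed"


# Hypothetical transition table: which states may follow which.
TRANSITIONS = {
    JobStatus.SAVED: {JobStatus.PENDING},
    JobStatus.PENDING: {JobStatus.ACTIVE, JobStatus.FAILED},
    JobStatus.ACTIVE: {JobStatus.DONE, JobStatus.FAILED},
    JobStatus.DONE: set(),
    JobStatus.FAILED: set(),
}


@dataclass
class Job:
    name: str
    inputs: list
    outputs: list = field(default_factory=list)
    status: JobStatus = JobStatus.SAVED

    def advance(self, new_status):
        """Move to new_status, rejecting transitions the table does not allow."""
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(
                f"cannot move from {self.status.value} to {new_status.value}"
            )
        self.status = new_status

    def provenance(self):
        """Summarise the job; only successful jobs may be published."""
        if self.status is not JobStatus.DONE:
            raise ValueError("only successful jobs can be published")
        return {
            "job": self.name,
            "inputs": list(self.inputs),
            "outputs": list(self.outputs),
        }


job = Job("inversion-01", inputs=["survey.nc"])
job.advance(JobStatus.PENDING)
job.advance(JobStatus.ACTIVE)
job.outputs.append("model.nc")
job.advance(JobStatus.DONE)
record = job.provenance()
print(record)
```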
What just happened?
What’s the processing status?
What are the job input/outputs?
Publish the job’s process and results
Cloud storage will persist the final artefacts
Still under construction
Courtesy - www.textfiles.com
What’s left?
• BYO cloud allocation
  • Users should be able to authorise VGL to start jobs using their own compute/storage resources.
• Confidential Data
  • How do you get access to ‘restricted data’ in a secure manner?
  • Where can you store the results? (geographical restrictions)
• Massive Horizontal Scaling
  • What’s the best way to set up a truly elastic pool of CPUs for jobs to utilise?
• A Common Processing Services Platform – SISS-like?
[Diagram: layer labels only]
Sustainable Resources Policy Societal Need
Integrated Virtual Labs: Virtual Solid Earth Sciences Laboratory, Environment V. Lab
Virtual Laboratories: Virtual Geophysical Laboratory, Virtual Core Laboratory, Virtual Geodesy Laboratory, Virtual Climate Laboratory, Virtual Water Laboratory
Virtual Libraries: Geophysics, Borehole data, Geodesy, Climate Modelling, Water Monitoring
Processing Services and Data (label pair repeated five times)
Modelling & analytic tools