An Australian Geoscience Data Cube
Aaron Sedgmen
Geoscience Australia
Overview
• Organisational background
• Data cube concept
• Geoscience Australia’s data cube implementation
• The shift from traditional methods of managing EO data
• Example applications of the data cube
• Where to with the data cube
Organisational Background
Geoscience Australia – a government agency providing advice and information to the Australian Government and geoscientific information to industry and other stakeholders.
National Earth Observations Group – provides earth observation products and services, as well as expert advice and information for decision makers.
The Space-Time Data Cube: a new paradigm for managing and using environmental data
[Figure: data cube with dimension labels 191610 (x), 575 (t) x 7 (λ)]
The Data Cube concept
“Cubing” Landsat images
[Figure: Landsat images are diced into square tiles (space axis) and the tiles are stacked through time (time axis)]
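The dice-and-stack idea can be sketched in a few lines of NumPy. This is an illustration only, not GA's implementation: the function names `dice` and `stack`, the tile size, and the synthetic scenes are all assumptions, and the sketch assumes every scene already sits on the same grid (real scenes would first need resampling onto a common grid).

```python
import numpy as np

def dice(scene, tile_size):
    """Split a 2-D scene into square tiles keyed by (row, col) tile index.

    Assumes the scene dimensions are exact multiples of tile_size.
    """
    rows, cols = scene.shape
    tiles = {}
    for r in range(0, rows, tile_size):
        for c in range(0, cols, tile_size):
            key = (r // tile_size, c // tile_size)
            tiles[key] = scene[r:r + tile_size, c:c + tile_size]
    return tiles

def stack(scenes, tile_size):
    """Dice each scene, then stack matching tiles along a new time axis."""
    diced = [dice(s, tile_size) for s in scenes]
    return {key: np.stack([d[key] for d in diced], axis=0) for key in diced[0]}

# Three synthetic 4x4 "scenes" acquired at three different times (hypothetical data).
scenes = [np.full((4, 4), t, dtype=np.int16) for t in range(3)]
cube = stack(scenes, tile_size=2)

# Each tile now holds its full history with shape (time, y, x).
print(cube[(0, 0)].shape)
```

Keying the stacks by tile index means a time series for any location can be read from a single tile stack without touching the rest of the archive.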
GA’s data cube implementation
• GA developed a working data cube prototype in early 2013 to undertake time-series analysis of Landsat data
• Contains fifteen years (1998-2012) of the Landsat 5 & 7 archive covering the Australian land mass
• 3,960,528 tiles sourced from a total of 550,537 Level 1T, ARG25, Pixel Quality & some Fractional Cover datasets
• 110 TB of compressed GeoTIFF files
• Access to the cube is via a Python API that enables generation of mosaiced time slices, and temporal stacks of derived quantities
• Users can apply their own algorithms via the API for generating derived quantities
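The kind of access described above can be sketched as follows. This is not the GA data cube API, whose function names and signatures are not given here: `temporal_stack`, `mosaic_time_slice`, the `ndvi` example algorithm, the band positions, and the random test data are all hypothetical stand-ins.

```python
import numpy as np

def temporal_stack(tile_stack, algorithm):
    """Apply a user-supplied algorithm to every time step of one tile.

    tile_stack has shape (time, band, y, x); the result has shape (time, y, x).
    """
    return np.stack([algorithm(obs) for obs in tile_stack], axis=0)

def ndvi(obs, red=0, nir=1):
    """Example user algorithm: NDVI from hypothetical red and NIR band positions."""
    red_b = obs[red].astype(np.float64)
    nir_b = obs[nir].astype(np.float64)
    total = nir_b + red_b
    # Leave cells at 0 where both bands are zero, to avoid dividing by zero.
    return np.divide(nir_b - red_b, total, out=np.zeros_like(total), where=total != 0)

def mosaic_time_slice(tiles, t):
    """Mosaic time step t from a dict of {(row, col): (time, y, x) stack}."""
    n_rows = max(r for r, _ in tiles) + 1
    n_cols = max(c for _, c in tiles) + 1
    return np.block([[tiles[(r, c)][t] for c in range(n_cols)]
                     for r in range(n_rows)])

# Hypothetical 2x2 grid of tiles, each with 5 time steps, 2 bands, 3x3 cells.
rng = np.random.default_rng(0)
raw = {(r, c): rng.integers(1, 1000, size=(5, 2, 3, 3))
       for r in range(2) for c in range(2)}

derived = {key: temporal_stack(s, ndvi) for key, s in raw.items()}
slice_0 = mosaic_time_slice(derived, t=0)
print(slice_0.shape)
```

Passing the algorithm in as a function keeps the data handling separate from the science, so a different derived quantity only requires a different function.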
Hosting of the data cube at NCI
• The GA data cube is hosted on the National Computational Infrastructure (NCI), located at the Australian National University in Canberra.
• The Raijin supercomputer at the NCI is currently ranked around 27th in the world, based on the following specifications:
  • 57,472 cores
  • 160 Tbytes memory
  • 10 Pbytes spinning disk
  • 1.2 Pflops computer performance
• The storage and processing power available at NCI is a critical enabler for the data cube
GA’s Traditional EO product process
EO products have traditionally been produced on demand for areas of interest from tape archives of scene-based raw data:
• Client requests product
• Identify footprint of product in space or time
• Search catalogue, order scenes
• 1 petabyte hierarchical archive: millions of individual scenes, tape store accessed by robot
• Orthorectification, calibration, cloud masking, atmospheric correction, mosaicing
• Feature extraction, algorithm application, spectral unmixing
• Product packaging and delivery
A paradigm shift from traditional methods
• The data cube holds multiple Landsat products for the entire archive – removes the need to generate products at time of request
• Hosting the data cube at NCI co-locates “big data” with high performance computing – enables in-situ analysis of the whole archive
• Computational analysis is moved from the scientist’s local environment to a central HPC facility
• Removes the need to download and replicate the data
• Provides computing power not otherwise available to many scientists
• Opens up possibilities to integrate the Landsat archive with other “big data” datasets hosted at the HPC facility
Surface water
Menindee Lakes time series, 1998-2012
• Total observations per grid cell: ~600-1200
• 4000 × 4000 grid cells
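A per-cell observation count like the one quoted above can be computed from a stack of boolean masks. The sketch below is an assumption about how such a count might be made, not the method used for these results: the function names, the idea of a "clear" mask derived from pixel quality data, the water-frequency measure, and the tiny test arrays are all illustrative.

```python
import numpy as np

def observation_counts(clear_mask):
    """Count clear observations per grid cell over a (time, y, x) boolean stack."""
    return clear_mask.sum(axis=0)

def water_frequency(water_mask, clear_mask):
    """Fraction of clear observations flagged as water; NaN where none are clear."""
    water = (water_mask & clear_mask).sum(axis=0)
    clear = clear_mask.sum(axis=0)
    return np.divide(water, clear,
                     out=np.full(clear.shape, np.nan), where=clear > 0)

# Hypothetical 3-date, 2x2-cell example.
clear = np.array([[[True, True], [False, True]],
                  [[True, False], [False, True]],
                  [[True, True], [False, False]]])
water = np.array([[[True, False], [True, True]],
                  [[False, False], [False, True]],
                  [[True, True], [False, True]]])

counts = observation_counts(clear)
freq = water_frequency(water, clear)
print(counts)
```

For scale, a 4000 × 4000 grid with ~600-1200 observations per cell amounts to roughly 9.6 to 19.2 billion individual cell observations.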
Continental-Scale surface water results
Time series analysis of entire 15yr archive of ARG25 data at 25 m resolution.
~2 days processing time (pre Raijin HPC facility)
What the GA data cube is not (yet)
• A publicly available production system
  • Still a working prototype being used for internal environmental science projects
• A real-time delivery system for time-series data serving large numbers of concurrent users (i.e. a web-delivery system)
  • A number of OGC specifications, including CF-netCDF, Web Coverage Service (WCS), Web Processing Service (WPS) and Web Coverage Processing Service (WCPS), are being investigated for enabling this capability.
• Yet another system for delivering “pretty pictures” (a la GeoServer or Google Earth Engine)
  • The data cube environment is optimised for scientific analysis. The delivery of portrayal data (e.g. map images via WMS) is best served by systems optimised for data distribution.
Acknowledgements
• Dr Stuart Minchin – Chief, Environmental Geoscience Division, Geoscience Australia
• Alex Ip – Senior Developer, eResearch Infrastructure, Geoscience Australia
Phone: +61 2 6249 9576
Web: www.ga.gov.au
Email: [email protected]
Address: Cnr Jerrabomberra Avenue and Hindmarsh Drive, Symonston ACT 2609
Postal Address: GPO Box 378, Canberra ACT 2601
Questions?
Thank you