Extending the Common Framework for Earth Observation Data to Other Disciplinary Data and Programmatic Access
Ben Evans, Kelsey Druken, Clare Richards, Claire Trenham, Jingbo Wang, Lesley Wyborn
IN22A-05
nci.org.au
Earth Observation Common Data Framework
The COMMON FRAMEWORK FOR EARTH-OBSERVATION DATA was finalised on March 23, 2016
https://www.whitehouse.gov/sites/default/files/microsites/ostp/common_framework_for_earth_observation_data.pdf
A few key statements:
• Federal Earth-observation data are public goods, paid for by the American people, and that free, full, and open access to these data significantly enhances their value. The return on our annual Earth-observation investment increases in accordance with the data’s widespread use in public and private sector decision-making.
• Each year, the member agencies of the U.S. Group on Earth Observations (USGEO) invest more than $4 billion dollars in civil Earth observations.
• …. international engagement provides U.S. entities with access to valuable new sources of Earth-observation data.
• The success (…) rests on effective (…) data management practices, at a time when increasing data volumes introduce new management challenges.
• The (..) Framework for Earth Observations Data provides guidance (…) for improving and standardizing their data management practices.
• By standardizing the protocols (…) will make it easier to obtain and assemble data from diverse sources for improved analysis, understanding, decision-making, community resilience, and commercial use.
© National Computational Infrastructure 2016 Ben Evans, AGU 2016
Focus of the recommendations
There are 10 Data Management Principles (DMP 1-10) that the International Group on Earth Observations (GEO) has developed, and the Common Data Framework considered 5 of these in detail.
The recommendations (…) are focused on the discoverability, accessibility, and usability of Earth observations and derived data products. Topics addressed include
• Data Search and Discovery Services,
• Data Access Services,
• Data Documentation, and
• Compatible Formats and Vocabularies.
Within each section there are:
• Standards and Protocols - officially endorsed standards for use in Earth-observation data management
• Methods and Practices - recommended ways to use the endorsed standards, including making data open by default
• Implementations - available software to use in realizing the standards and examples of use of the standards
FAIR (Findable, Accessible, Interoperable, Reusable) is similar.
Earth and Solid Earth: Examples of priority areas for research at NCI
• Tropical Cyclones: Cyclone Winston, 20-21 Feb, 2016
• Volcanic Ash: Manam Eruption, 31 July, 2015
• Bush Fires: Wye Valley & Lorne Fires, 25-31 Dec, 2015
• Flooding: St George, QLD, February, 2011
• 3D Geophysical Models
Requires cross-domain research
• Modelling Extreme & High Impact events – BoM
• NWP, Climate Coupled Systems & Data Assimilation – BoM, CSIRO, Research Collaborations
• Hazards – Geoscience Australia, BoM, States
• Geophysics, Potential Fields, Seismic, Electromag – Geoscience Australia, Universities
• Monitoring the Environment & Ocean – ANU, BoM, CSIRO, GA, Research, Fed/State
• Agriculture/food security issues
NCI National Earth Systems Research Data Collections

Data Collections                                                        Approx. Capacity
CMIP5, CORDEX, ACCESS Models                                            5 PBytes
Satellite Earth Obs: LANDSAT, Himawari-8, Sentinel, MODIS, INSAR        2 PBytes
Digital Elevation, Bathymetry, Onshore/Offshore Geophysics              1 PBytes
Seasonal Climate                                                        700 TBytes
Bureau of Meteorology Observations                                      350 TBytes
Bureau of Meteorology Ocean-Marine                                      350 TBytes
Terrestrial Ecosystem                                                   290 TBytes
Reanalysis products                                                     100 TBytes

1. Climate/ESS Model Assets and Data Products
2. Earth and Marine Observations and Data Products
3. Geoscience Collections
4. Terrestrial Ecosystems Collections
5. Water Management and Hydrology Collections
https://datacatalogue.nci.org.au
Paradigm shift: Data easily accessible across disciplines
As presented at AGU last year, we have been working on a “Transdisciplinary” approach to our data.
10+ PB of data across the different science domains
[Figure: data holdings by science domain and contributing organisation]
Contributors: BOM, GA, CSIRO, ANU, International, Other National
• CMIP5: 3 PB
• Atmosphere: 2.4 PB
• Earth Observ.: 2 PB
• Water Ocean: 1.5 PB
• Weather: 340 TB
• Astronomy (Optical): 200 TB
• Bathy, DEM: 100 TB
• Marine Videos: 10 TB
• Geophysics
NERDIP – simplified view
[Figure: the NERDIP Data Platform and its Data Services. Labels: Compute Intensive; Virtual Laboratories; Fast/Deep Data Access; Portal views; Machine Connected; Program access; Server-side functions]
National Environmental Research Data Interoperability Platform (NERDIP)
[Figure: layered architecture diagram, listed here from top to bottom]
• Workflow Engines, Virtual Laboratories (VLs), Science Gateways: Biodiversity & Climate Change VL, Climate & Weather Science Lab, eMAST Speddexes, eReefs, AGDC VL, AllSky Virtual Observatory, VGL, Globe Claritas, VHIRL, Open Nav Surface
• Data Portals: ANDS/RDA Portal, AODN/IMOS Portal, TERN Portal, AuScope Portal, Data.gov.au, Digital Bathymetry & Elevation Portal
• Tools: Models (Fortran, C, C++, MPI, OpenMP); Python, R, MatLab, IDL; Ferret, NCO, GDL, GDAL, GRASS, QGIS; Visualisation (Drishti)
• Direct Access
• Services Layer: OGC WMS, OGC WCS, OGC WFS, OGC W*PS, OGC W*TS, OGC SWE, CS-W, OPeNDAP, RDF, LD, Vocab Service, PROV Service, fast “whole-of-library” catalogue
• API Layers: NetCDF-4 Climate, NetCDF-4 Weather, NetCDF-4 Oceans, NetCDF-4 EO, NetCDF-4 Bathy, HDF5??, GDAL, [SEG-Y], [Airborne Geophysics], [FITS], [LAS LiDAR], [HDF4-EOS]
• Data Conventions: netCDF-CF
• ISO 19115, ACDD, RIF-CS, DCAT, etc.
• HP Data Library Layer: HDF5
• Lustre; Other Storage (e.g., HDFS)
Applying the Recommendations – 1. Data Search and Discovery
DMP-1: Data and all associated metadata will be discoverable through catalogs and search engines, and data access and use conditions, including licenses, will be clearly indicated.
Interfaces:
• Open Geospatial Consortium (OGC, http://www.opengeospatial.org/) Catalog Services for the Web (CSW, http://www.ogcnetwork.net/node/630)
• OpenSearch (http://www.opensearch.org/)
• Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH, https://www.openarchives.org/pmh/)
Software:
• GeoNetwork (http://geonetwork-opensource.org/)
Suggested Update to Framework
• Improved standards for APIs for geospatial indexes that have emerged that provide “data cube” access.
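As an illustration of programmatic discovery through CSW, the following is a minimal sketch that builds a CSW 2.0.2 GetRecords request in key-value form. The endpoint URL, the function name and the search term are assumptions made for this example, not NCI's actual service; only the URL is constructed, and no request is sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical catalogue endpoint, used only for illustration.
CSW_ENDPOINT = "https://catalogue.example.org/csw"


def build_csw_getrecords_url(endpoint, search_text, max_records=10):
    """Build a CSW 2.0.2 GetRecords URL that runs a full-text CQL search."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "resultType": "results",
        "elementSetName": "summary",
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",
        # AnyText is the CSW queryable that covers all text fields.
        "constraint": f"AnyText like '%{search_text}%'",
        "maxRecords": str(max_records),
    }
    return f"{endpoint}?{urlencode(params)}"


url = build_csw_getrecords_url(CSW_ENDPOINT, "Landsat")
# Decode the query string again to confirm the parameters survived encoding.
query = parse_qs(urlsplit(url).query)
```

A client would fetch this URL and parse the XML response; which queryables and output schemas are available depends on the catalogue software behind the endpoint.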
Applying the Recommendations – 1. Data Search and Discovery (cont)
DMP-10: Data will be assigned appropriate persistent, resolvable identifiers to enable documents to cite the data on which they are based, and to enable data providers to receive acknowledgement of use of their data.
Interfaces:
• Digital Object Identifiers (DOIs) using the DataCite service are assigned to datasets
Software:
• DOI Services through the Australian National Data Service (ANDS), which is compliant with DataCite.
Suggested Update to Framework
• Persistent Identifier (PID) Services are needed to enable programmatic access to that data via data service end points.
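One way a program can start from a DOI today is HTTP content negotiation against the DOI resolver. The sketch below prepares such a request for DataCite metadata in JSON. The DOI shown is a placeholder and the function name is hypothetical. Note that this returns metadata about the dataset rather than a data service end point, which is the gap the suggested update describes.

```python
from urllib.parse import quote
from urllib.request import Request

DOI_RESOLVER = "https://doi.org/"
# Media type used for DataCite metadata via DOI content negotiation.
DATACITE_JSON = "application/vnd.datacite.datacite+json"


def build_doi_metadata_request(doi):
    """Prepare (but do not send) a request for machine-readable DOI metadata."""
    # Keep the slash between prefix and suffix; escape anything else unsafe.
    url = DOI_RESOLVER + quote(doi, safe="/")
    return Request(url, headers={"Accept": DATACITE_JSON})


# Placeholder DOI, for illustration only.
request = build_doi_metadata_request("10.1234/example-dataset")
```

Passing `request` to `urllib.request.urlopen` would perform the lookup; that step is omitted so the sketch runs without network access.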
Applying the Recommendations – 2. Data Access Services
DMP-2: Data will be accessible via online services, including, at minimum, direct download, but preferably user-customizable services for visualization and computation.
Interfaces:
• OGC Web Map Service (WMS, http://www.opengeospatial.org/standards/wms)
• Open-source Project for a Network Data Access Protocol (OPeNDAP, http://www.opendap.org/support)
• OGC Web Coverage Service (WCS, http://www.opengeospatial.org/standards/wcs)
• OGC Web Feature Service (WFS, http://www.opengeospatial.org/standards/wfs)
• TableDAP (http://coastwatch.pfeg.noaa.gov/erddap/tabledap/)
Software:
• DAP: TDS, Hyrax, ERDDAP
• WMS: ERDDAP, GeoServer, ncWMS (TDS), GSKY, Rasdaman
• WCS: GeoServer, Rasdaman
• WFS: GeoServer
• TableDAP: ERDDAP
Suggested Update to Framework
• WPS services are now an essential part of the online services.
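To make the idea of user-customizable access concrete, here is a small sketch of server-side subsetting with OPeNDAP: a DAP2 constraint expression asks the server for an index range of one variable instead of the whole file. The server URL, variable name and index ranges are assumptions chosen for the example.

```python
def build_dap_subset_url(dataset_url, variable, index_ranges, suffix="ascii"):
    """Build a DAP2 URL requesting a hyperslab of one variable.

    index_ranges holds one (start, stride, stop) tuple per dimension;
    in DAP2 the stop index is inclusive.
    """
    hyperslab = "".join(
        f"[{start}:{stride}:{stop}]" for start, stride, stop in index_ranges
    )
    # The suffix selects the response type, e.g. "ascii" or "dods" (binary).
    return f"{dataset_url}.{suffix}?{variable}{hyperslab}"


# Hypothetical THREDDS-style dataset URL and variable name.
DATASET_URL = "https://thredds.example.org/thredds/dodsC/collection/file.nc"

subset_url = build_dap_subset_url(
    DATASET_URL,
    "temperature",
    [(0, 1, 0), (100, 1, 199), (200, 1, 299)],  # time, lat, lon index ranges
)
```

In practice most users let a DAP-aware library generate these requests from array slicing, but the URL form shows what travels over the wire.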
Applying the Recommendations – 3. Data Documentation
DMP-4: Data will be comprehensively documented, including all elements necessary to access, use, understand, and process, preferably via formal structured metadata based on international or community-approved standards. To the extent possible, data will also be described in peer-reviewed publications referenced in the metadata record.
Interfaces:
• ISO 19115-1
• ISO 19115-3 defines the Extensible Markup Language (XML) encoding for 19115-1
Software:
• GeoNetwork
Applying the Recommendations – 4. Compatible Standards and Vocabularies
DMP-3: Data should be structured using encodings that are widely accepted in the target user community and aligned with organizational needs and observing methods, with preference given to non-proprietary international standards.
Data Formats:
• NetCDF4/HDF5 (numerical)
• GeoTIFF -> NetCDF4/HDF5 (imagery)
• (extensions beyond EO): Point data: LAS -> HDF5. Seismic data: HDF5. Optical astro data: FITS
Controlled Vocabularies and Parameter names
• CF convention
• ACDD
• NASA’s Global Change Master Directory (GCMD, http://gcmd.nasa.gov/learn/keyword_list.html) (partially applied)
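A compliance check against these conventions can be quite small. The sketch below tests a file's global attributes, represented here as a plain dictionary, for the four attributes that ACDD 1.3 lists as highly recommended and for a CF entry in `Conventions`. The function name and the sample attribute values are illustrative assumptions; this is not NCI's checker.

```python
# Global attributes that ACDD 1.3 lists as "highly recommended".
ACDD_HIGHLY_RECOMMENDED = ("title", "summary", "keywords", "Conventions")


def check_global_attributes(attrs):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = [
        f"missing global attribute: {name}"
        for name in ACDD_HIGHLY_RECOMMENDED
        if not str(attrs.get(name, "")).strip()
    ]
    # CF-compliant files name the CF version in Conventions, e.g. "CF-1.6".
    if "CF-" not in str(attrs.get("Conventions", "")):
        problems.append("Conventions does not name a CF version")
    return problems


# Sample attributes, invented for this example.
good = {
    "title": "Example gridded product",
    "summary": "Illustrative dataset description.",
    "keywords": "EARTH SCIENCE > ATMOSPHERE",
    "Conventions": "CF-1.6, ACDD-1.3",
}
incomplete = {"title": "Example gridded product", "Conventions": "ACDD-1.3"}

good_report = check_global_attributes(good)
incomplete_report = check_global_attributes(incomplete)
```

With a netCDF library the dictionary would be filled from the file's global attributes; keeping the check on a plain mapping makes it usable for HDF5 or other containers too.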
Suggested Update to Framework
What is a Data collection / Dataset?
NCI Definitions
Dataset: a compilation of data that constitutes a programmable data unit that has been collected and organised using the one process. It must have
• a named Data Owner,
• a single license,
• one set of semantics, ontologies, vocabularies,
• a single data format and internal data convention,
• its version.
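The definition above reads naturally as a record type in which every field is mandatory. The following is a minimal sketch of that reading; the class and field names are hypothetical and are not taken from any NCI schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """A programmable data unit, following the definition above."""

    name: str
    data_owner: str        # the named Data Owner
    license: str           # exactly one license
    vocabulary: str        # one set of semantics, ontologies, vocabularies
    data_format: str       # a single data format
    data_convention: str   # a single internal data convention
    version: str           # the version must be included

    def __post_init__(self):
        # Every element of the definition is required, so reject blanks.
        for field_name, value in vars(self).items():
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"Dataset requires a non-empty {field_name}")


# Example values are invented for illustration.
example = Dataset(
    name="Example surface reflectance",
    data_owner="Example Agency",
    license="CC-BY-4.0",
    vocabulary="CF standard names",
    data_format="NetCDF4/HDF5",
    data_convention="CF-1.6",
    version="1.0",
)
```

Making the record immutable reflects the idea that a change to the license, format or convention yields a new dataset version rather than an edit in place.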
Data Quality Strategy (DQS)
What does it involve?
1. Underlying HPD file format
2. Close collaboration with data custodians and managers
   • Planning, designing, or reassessing the data collections
3. Quality control through compliance with recognised community standards
4. Data assurance through demonstrated functionality across common platforms, tools, and services
Functionality tests
Getting serious about performance
• Quality Assurance against performance metrics
• We need to scale data so that you can analyse in real-time and in-situ.
• We need to combine/overlay/slice-dice all manner of data at high precision, from vast reference collections to highly specific data.
• We need faster, automated systems for real world activities and decision making capability, using smart new algorithms and programmatic techniques:
  • real data feeds, cross-referencing longitudinal data, geospatial or other key “metadata” queries.
[Figure: Independent Read. Throughput in MB/s (axis 0 to 7000) against stripe count (1, 8, 16, 32, 64, 128) for HDF5, MPIIO and POSIX]
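A performance metric of this kind reduces to bytes read divided by elapsed time. The sketch below measures sequential read throughput for a single file through plain POSIX-style reads. It is a toy illustration under stated assumptions: it uses a small temporary file, so the numbers mostly reflect the operating system's page cache, and it does not reproduce the parallel HDF5, MPI-IO or Lustre striping tests behind the figure.

```python
import os
import tempfile
import time


def measure_read_throughput(path, block_size=1024 * 1024):
    """Read a file sequentially and return (bytes_read, megabytes_per_second)."""
    total = 0
    start = time.perf_counter()
    # buffering=0 gives unbuffered reads, closest to raw POSIX read() calls.
    with open(path, "rb", buffering=0) as handle:
        while True:
            block = handle.read(block_size)
            if not block:
                break
            total += len(block)
    # Guard against a zero interval on very fast reads.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total, total / elapsed / 1e6


# Create a 4 MiB scratch file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(4 * 1024 * 1024))
    sample_path = tmp.name

try:
    bytes_read, mb_per_s = measure_read_throughput(sample_path)
finally:
    os.remove(sample_path)
```

A meaningful benchmark would use files much larger than memory, repeat the runs, and vary one factor at a time, such as stripe count or access library.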
Tested against Virtual Labs and web tools
[Figure: example tools, including ArcGIS and Godiva]
Conclusions
• The Common Framework for Earth Observation Data is readily applicable to other domains
• It is not only extendible; we think extending it is necessary in a transdisciplinary world and to enable programmatic access
• Some items should be refreshed for emerging use of data, particularly to enable programmatic access