Environmental Data Archival: Practices and Benefits. Graham Parton. Royal Meteorological Society SIG Meeting, BAS, 5th October 2011: Transmission, presentation and archiving of meteorological data.
![Page 1: Overview](https://reader036.vdocuments.mx/reader036/viewer/2022070420/56815ebc550346895dcd4129/html5/thumbnails/1.jpg)
VO Sandpit, November 2009
Environmental Data Archival: Practices and Benefits
Graham Parton
Royal Meteorological Society SIG Meeting, BAS, 5th October 2011:
Transmission, presentation and archiving of meteorological data
![Page 2: Overview](https://reader036.vdocuments.mx/reader036/viewer/2022070420/56815ebc550346895dcd4129/html5/thumbnails/2.jpg)
Overview
What is data archival?
Why do it?
How do we do it within CEDA?
![Page 3: Overview](https://reader036.vdocuments.mx/reader036/viewer/2022070420/56815ebc550346895dcd4129/html5/thumbnails/3.jpg)
What do we call “data archival”?
Placing data into a repository which is:
• Backed up
• Robust (able to identify data corruption)
• Catalogued
• Recognised
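One common way for a repository to identify data corruption is a fixity check: record a checksum when a file is ingested and recompute it later. The sketch below is only an illustration of that idea using Python's standard library; the function names are hypothetical and nothing here is claimed to be how CEDA implements it.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large files never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path: Path, recorded_digest: str) -> bool:
    """Compare the file's current digest with the one recorded at ingest."""
    return sha256_of(path) == recorded_digest
```

In practice the recorded digest would be stored separately from the file itself, so that a damaged file cannot also damage its own reference value.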
![Page 4: Overview](https://reader036.vdocuments.mx/reader036/viewer/2022070420/56815ebc550346895dcd4129/html5/thumbnails/4.jpg)
Why archive data
• Making data public: openness of results and repeatability are essential for scientific rigour
• Place to share data with project participants
• Re-purposing data
• Additional services (often for free!)
• May be required for legal reasons
• Secure
• Get credit
And because if you don’t…
![Page 5: Overview](https://reader036.vdocuments.mx/reader036/viewer/2022070420/56815ebc550346895dcd4129/html5/thumbnails/5.jpg)
Why archive data
![Page 6: Overview](https://reader036.vdocuments.mx/reader036/viewer/2022070420/56815ebc550346895dcd4129/html5/thumbnails/6.jpg)
Scale of CEDA operations
• >100,000,000 files holding ~1 PB of data
• ~38,000,000 files downloaded since October 2010
• 19,000+ registered users, of which ~3600 are currently ‘active’ users
• 250+ datasets
• 26 staff
• Responsible for [logos on slide], plus other services and projects (e.g. UKCIP, CMIP5 partner)
… i.e. we are highly reliant on scripted systems and a well-structured archive
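A little arithmetic on the quoted figures helps show why scripting matters at this scale. The values below are the slide's own approximations and lower bounds; treating 1 PB as 10^15 bytes is an assumption, so the results are rough indications only.

```python
# Figures quoted on the slide (all approximate or lower bounds).
files = 100_000_000        # ">100,000,000 files"
archive_bytes = 1e15       # "~1 PB", assuming decimal petabytes
registered_users = 19_000  # "19,000+"
active_users = 3_600       # "~3600"
staff = 26

# Mean file size in megabytes; an upper estimate, since the file count is a lower bound.
mean_file_mb = archive_bytes / files / 1e6

# Share of registered users who are currently active.
active_fraction = active_users / registered_users

# Files per member of staff, far beyond what manual handling could cover.
files_per_staff = files / staff

print(f"mean file size: about {mean_file_mb:.0f} MB")
print(f"active users: about {active_fraction:.0%}")
print(f"files per staff member: about {files_per_staff:,.0f}")
```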
![Page 7: Overview](https://reader036.vdocuments.mx/reader036/viewer/2022070420/56815ebc550346895dcd4129/html5/thumbnails/7.jpg)
[Diagram: CEDA data flow. Labels: Data Suppliers, 3rd Party Data Providers, Arrivals, Ingest, Archive (×3), Backup (×3), Catalogue, metadata, External discovery service, Web service (download, view, discovery), External Users]
![Page 8: Overview](https://reader036.vdocuments.mx/reader036/viewer/2022070420/56815ebc550346895dcd4129/html5/thumbnails/8.jpg)
Data Preparation
[Diagram: CEDA data flow, data preparation portion. Labels: Data Suppliers, 3rd Party Data Providers, Arrivals, Ingest, Archive (×3)]
![Page 9: Overview](https://reader036.vdocuments.mx/reader036/viewer/2022070420/56815ebc550346895dcd4129/html5/thumbnails/9.jpg)
Data Preparation
• Data Management Plans including delivery schedules
• Conditions of Use/Licensing
• Support suppliers in data preparation
• Capture supporting documentation (formats, calibration information, flight logs, etc.)
• File naming and archive structure
• Set up ingest routes
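Agreeing a file naming convention up front lets ingest scripts accept or reject files automatically. The sketch below checks names against a purely hypothetical convention, `<instrument>_<location>_<YYYYMMDD>_<content>.<ext>`; the pattern, field names and example file name are assumptions for illustration, not a CEDA specification.

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical convention: <instrument>_<location>_<YYYYMMDD>_<content>.<ext>
FILENAME_PATTERN = re.compile(
    r"^(?P<instrument>[a-z0-9-]+)_"
    r"(?P<location>[a-z0-9-]+)_"
    r"(?P<date>\d{8})_"
    r"(?P<content>[a-z0-9-]+)\."
    r"(?P<ext>[a-z0-9]+)$"
)


def parse_filename(name: str) -> Optional[dict]:
    """Return the fields encoded in a file name, or None if it does not conform."""
    match = FILENAME_PATTERN.match(name)
    if match is None:
        return None
    fields = match.groupdict()
    try:
        # Reject impossible dates such as 20111345.
        fields["date"] = datetime.strptime(fields["date"], "%Y%m%d").date()
    except ValueError:
        return None
    return fields


print(parse_filename("anemometer_site-a_20110105_surface-wind.csv"))
print(parse_filename("sw010203"))  # does not conform, so None
```

A name that parses cleanly tells both people and scripts what the file holds and where it belongs in the archive structure; an opaque name such as “sw010203” cannot be routed without asking the supplier.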
![Page 10: Overview](https://reader036.vdocuments.mx/reader036/viewer/2022070420/56815ebc550346895dcd4129/html5/thumbnails/10.jpg)
Data Preparation - File structure
Take the bad data challenge… File “sw010203”
What are these data? Guess surface winds, but on what day?
What are the units? Any convention?
How do we read the file? Is this spatial or temporal data?
… 1440 pairs of data in a file
4.31 155.3 3.92 136.1 5.15 140.2 4.23 137.1 4.75 150.2 4.71 137.9 4.35 146.5 4.52 138.0 4.83 153.7 5.40 145.8 4.63 141.0 4.90 137.3 4.31 143.3 4.58 157.0 4.94 141.7 4.65 143.1 4.63 143.0 4.88 149.5 5.42 148.5 4.92 140.4 4.04 146.7 3.92 151.5 5.02 135.3 5.06 151.6 4.65 152.3 4.31 168.8 3.79 145.3 5.92 152.9 5.02 145.8 4.77 161.6 4.79 144.1 4.60 147.5 5.33 150.1 4.81 141.0 6.02 146.9 4.38 149.0 4.42 142.5 4.58 133.4 4.35 150.5 4.96 149.8 5.56 143.4 5.08 148.5 5.19 141.6 4.40 142.4 4.10 152.6 5.02 134.0 4.94 142.9 5.27 144.4 5.38 141.5 5.88 144.8 6.00 140.1 4.75 158.3 5.08 148.1 5.46 163.5 4.27 150.8 4.69 138.8 5.71 144.0 5.21 138.8 5.00 132.4 5.06 144.4
![Page 11: Overview](https://reader036.vdocuments.mx/reader036/viewer/2022070420/56815ebc550346895dcd4129/html5/thumbnails/11.jpg)
Supported Formats
Highly structured metadata
Standard Names
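To illustrate what structured metadata and standard names buy you, the sketch below attaches a small metadata block to the first few value pairs from the “sw010203” example. Reading each pair as (wind speed, wind direction) and the units given are guesses made for illustration only; the variable names follow the CF standard name style (`wind_speed`, `wind_from_direction`), and a JSON layout is used just to keep the example dependency-free, not because it is one of the supported formats.

```python
import json

# ASSUMPTION: each pair is (speed, direction). Nothing in the original file
# confirms this, which is exactly the problem metadata is meant to solve.
record = {
    "title": "Surface wind observations (illustrative)",
    "variables": [
        {"standard_name": "wind_speed", "units": "m s-1"},          # assumed units
        {"standard_name": "wind_from_direction", "units": "degree"},  # assumed units
    ],
    "data": [[4.31, 155.3], [3.92, 136.1], [5.15, 140.2]],
}


def describe(rec: dict) -> list:
    """Summarise each variable using only the metadata carried with the data."""
    lines = []
    for index, variable in enumerate(rec["variables"]):
        values = [row[index] for row in rec["data"]]
        lines.append(
            f'{variable["standard_name"]} [{variable["units"]}]: '
            f"min={min(values)}, max={max(values)}"
        )
    return lines


# The record survives a round trip through a plain-text serialisation.
restored = json.loads(json.dumps(record))
for line in describe(restored):
    print(line)
```

With names and units travelling alongside the numbers, a reader or a script no longer has to guess what each column means.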
![Page 12: Overview](https://reader036.vdocuments.mx/reader036/viewer/2022070420/56815ebc550346895dcd4129/html5/thumbnails/12.jpg)
Data Discovery
[Diagram: CEDA data flow, discovery portion. Labels: Data Suppliers, 3rd Party Data Providers, Arrivals, Ingest, Archive (×3), Catalogue, metadata, External discovery service, Web service (discovery), External Users]
![Page 13: Overview](https://reader036.vdocuments.mx/reader036/viewer/2022070420/56815ebc550346895dcd4129/html5/thumbnails/13.jpg)
CEDA Catalogue
![Page 14: Overview](https://reader036.vdocuments.mx/reader036/viewer/2022070420/56815ebc550346895dcd4129/html5/thumbnails/14.jpg)
NERC Data Discovery Service
data-search.nerc.ac.uk
![Page 15: Overview](https://reader036.vdocuments.mx/reader036/viewer/2022070420/56815ebc550346895dcd4129/html5/thumbnails/15.jpg)
CEDA Document Repository
cedadocs.badc.rl.ac.uk
![Page 16: Overview](https://reader036.vdocuments.mx/reader036/viewer/2022070420/56815ebc550346895dcd4129/html5/thumbnails/16.jpg)
Citations for Data Creators: DOIs
Citation (and DOI)
Data Citation and DOI… but only if in a recognised repository
![Page 17: Overview](https://reader036.vdocuments.mx/reader036/viewer/2022070420/56815ebc550346895dcd4129/html5/thumbnails/17.jpg)
Data Services
[Diagram: CEDA data flow, services portion. Labels: Data Suppliers, 3rd Party Data Providers, Arrivals, Ingest, Archive (×3), Catalogue, metadata, External discovery service, Web service (download, view, discovery), External Users]
![Page 18: Overview](https://reader036.vdocuments.mx/reader036/viewer/2022070420/56815ebc550346895dcd4129/html5/thumbnails/18.jpg)
Visualisation Services
![Page 19: Overview](https://reader036.vdocuments.mx/reader036/viewer/2022070420/56815ebc550346895dcd4129/html5/thumbnails/19.jpg)
Visualisation Services
ISIC Video Wall
![Page 20: Overview](https://reader036.vdocuments.mx/reader036/viewer/2022070420/56815ebc550346895dcd4129/html5/thumbnails/20.jpg)
Visualisation Services
![Page 21: Overview](https://reader036.vdocuments.mx/reader036/viewer/2022070420/56815ebc550346895dcd4129/html5/thumbnails/21.jpg)
Processing Services
CEDA WPS: ceda-wps2.badc.rl.ac.uk/ui/home
• Chain services together
• Download result
• Job either runs straight away or is sent to run on a backend service
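The two ideas on this slide, chaining services and choosing between immediate and backend execution, can be sketched in a few lines. Everything below is hypothetical: the cost threshold, the class and function names, and the toy processing steps are invented for illustration and do not describe the CEDA WPS internals.

```python
from collections import deque
from typing import Callable, Deque, List, Optional, Tuple

Step = Callable[[list], list]

# Hypothetical threshold: jobs estimated to cost more than this are deferred.
INLINE_COST_LIMIT = 10


def chain(*steps: Step) -> Step:
    """Compose processing steps so that each output feeds the next step."""
    def run(data: list) -> list:
        for step in steps:
            data = step(data)
        return data
    return run


class Dispatcher:
    """Run cheap jobs straight away; queue expensive ones for a backend."""

    def __init__(self) -> None:
        self.backend_queue: Deque[Tuple[Step, list]] = deque()

    def submit(self, job: Step, data: list, estimated_cost: int) -> Tuple[str, Optional[list]]:
        if estimated_cost <= INLINE_COST_LIMIT:
            return "done", job(data)
        self.backend_queue.append((job, data))
        return "queued", None

    def drain(self) -> List[list]:
        """Stand-in for the backend: run every queued job in order."""
        results = []
        while self.backend_queue:
            job, data = self.backend_queue.popleft()
            results.append(job(data))
        return results


# Toy steps: take the first three values, then average them.
pipeline = chain(lambda d: d[:3], lambda d: [sum(d) / len(d)])
dispatcher = Dispatcher()
print(dispatcher.submit(pipeline, [1, 2, 3, 4], estimated_cost=1))
print(dispatcher.submit(pipeline, [1, 2, 3, 4], estimated_cost=100))
print(dispatcher.drain())
```

A real service would return a job identifier for queued work so the user can come back and download the result later.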
![Page 22: Overview](https://reader036.vdocuments.mx/reader036/viewer/2022070420/56815ebc550346895dcd4129/html5/thumbnails/22.jpg)
Processing Services
Trajectory Service
![Page 23: Overview](https://reader036.vdocuments.mx/reader036/viewer/2022070420/56815ebc550346895dcd4129/html5/thumbnails/23.jpg)
OPeNDAP Service
With security layer
• Navigable and scriptable interface to archive
• CEDA has applied security shell using “OpenID” technology
• Gives powerful sub-setting service for large datasets
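Sub-setting over OPeNDAP works by appending a constraint expression to the dataset URL, so only the requested slice is transferred. The sketch below builds such a URL using the DAP2 hyperslab form `variable[start:stride:stop]` (stop inclusive); the host, path and variable name are placeholders, and the authentication step required by the security layer is not shown.

```python
from typing import Sequence, Tuple

IndexRange = Tuple[int, int, int]  # (start, stride, stop), stop is inclusive


def hyperslab(variable: str, ranges: Sequence[IndexRange]) -> str:
    """Build a DAP2 projection such as temperature[0:1:9][10:2:20]."""
    return variable + "".join(
        f"[{start}:{stride}:{stop}]" for start, stride, stop in ranges
    )


def subset_url(
    dataset_url: str,
    variable: str,
    ranges: Sequence[IndexRange],
    response: str = "ascii",
) -> str:
    """Return a URL requesting one sub-array in the given response form."""
    return f"{dataset_url}.{response}?{hyperslab(variable, ranges)}"


# Placeholder dataset location and variable name, for illustration only.
url = subset_url(
    "https://example.org/opendap/archive/file.nc",
    "temperature",
    [(0, 1, 9), (10, 2, 20)],
)
print(url)
```

Some HTTP clients expect the square brackets to be percent-encoded, so a real script may need to encode the query part before sending the request.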
![Page 24: Overview](https://reader036.vdocuments.mx/reader036/viewer/2022070420/56815ebc550346895dcd4129/html5/thumbnails/24.jpg)
What’s on the horizon?
Continue to develop visualisation and data processing services
Data volumes are becoming too large to move around
Hosting services – provide virtual environments for people to work on the data without downloading
From Petascale to Exascale
But all this NEEDS well-structured data that uses standards-driven metadata and formats
![Page 25: Overview](https://reader036.vdocuments.mx/reader036/viewer/2022070420/56815ebc550346895dcd4129/html5/thumbnails/25.jpg)
Take Home Messages
Team Digital Preservation Video
• Plan for data management
• Tap into standards when preparing data
• Get data catalogued for data discovery
• Data in supported repositories leads to recognition for efforts preparing data
• A suite of additional services adds value to existing data