ELAG workshop session 3 (v4)
3TU.Datacentrum
Workshop session 3
“Darelux”: Data Archiving River Environment LUXembourg
• Data collection:
  – Hydrology measurements
  – Several institutes
  – Different sensors at multiple locations
  – Long-term value
• Programme started 2003
Once upon a time …
Maisbich: 1.2 km², Ardennes Massif (slate)
Huewelerbach: catchment, basin area approx. 2.7 km², mainly sandstone
Centre Recherche Gabriel Lippmann
April 12, 2023 3
[Figure: catchment water-balance schematic. Labels: Precipitation, Evaporation, Interception, Floor, Shallow ground, Deep ground, River]
Measurements
• Utrecht University: soil moisture
• Gabriel Lippmann, TUD: piezometers
• TUD: runoff over the road
• University of Luxembourg, Gabriel Lippmann, TUD: measuring weir
• University of Luxembourg, Gabriel Lippmann, TUD: tracers
• 3 Soil moisture probes
• 16 temperature sensors + 1 fibre optic
• 5 V-notch discharge meters
• Meteo station, pluviometer, interception measurement device, groundwater level piezometer, …
• Data per month
• ASCII files from sensors
Sensors and data
‘Evolution’ in 3TU.DC
5 steps
Each step: considerations & solutions…
• Very high importance attached to long-term preservation
• Limited bandwidth for downloads
• No known standards from the community
The beginning … (1a/5)
• All information (data AND metadata) in single containers.
  – Relations were considered too risky
  – ‘Homebrew’ XML with very specific tags
• Container size limited to 2 MB
  – Measurements stored per month per sensor * location.
XML containers … (1b/5)
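As an illustration of the single-container approach described above, here is a minimal sketch in Python. The original ‘homebrew’ tags are not shown on the slides, so every tag and attribute name below, the sensor name, and the reading of “2 MB” as 2 × 1024 × 1024 bytes are assumptions made for this example only.

```python
import xml.etree.ElementTree as ET

# Assumption: "2 MB" is read here as 2 * 1024 * 1024 bytes.
MAX_CONTAINER_BYTES = 2 * 1024 * 1024


def build_container(sensor, location, month, readings):
    """Build one self-contained XML container (metadata plus data).

    One container covers one sensor at one location for one month.
    All tag and attribute names are hypothetical, not the original schema.
    """
    root = ET.Element("container")
    meta = ET.SubElement(root, "metadata")
    ET.SubElement(meta, "sensor").text = sensor
    ET.SubElement(meta, "location").text = location
    ET.SubElement(meta, "month").text = month
    data = ET.SubElement(root, "data")
    for timestamp, value in readings:
        measurement = ET.SubElement(data, "measurement", time=timestamp)
        measurement.text = repr(value)
    payload = ET.tostring(root, encoding="utf-8")
    # Enforce the container size limit before the container is stored.
    if len(payload) > MAX_CONTAINER_BYTES:
        raise ValueError("container exceeds 2 MB; split the period further")
    return payload


readings = [("2004-05-01T00:00:00", 12.5), ("2004-05-01T00:10:00", 12.7)]
xml_bytes = build_container("soil-moisture-1", "Huewelerbach", "2004-05", readings)
print(xml_bytes.decode("utf-8"))
```

Note that the sensor and location metadata travel inside every monthly container, which is the repetition the next step sets out to remove.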
Questions?
Differences:
• Need for simpler metadata
• No repetition of metadata (sensor and location for every month)
• Less afraid of long term risks with (archive internal) relations
Another step … (2a/5)
• New data model and ‘cleaned’ XML
  – Datasets, Measuring instruments, Locations, Time
Linked XML … (2b/5)
[Figure: linked data model for an example dataset. Node labels: Collectie (collection) C1; Apparaat (instrument) A1; Dataset D1; Dataset D2; Periode (period): dag (day), maand (month), jaar (year); Plaats (place): punt (point), gebied (area); DATA datafile (three times). Relation labels: isMemberOfCollection, measuredBy, calculatedFrom, temporal, locatedAt, isPartOf, title, creator, created, mimeType. Literal values: longitude 4.3742; latitude 51.9973; titles “Dak van EWI” (roof of EWI), “windmeter” (anemometer), “Delft”; mimeType application/x-netcdf; created 2011-01-01T00:00:00. Legend: information resource, non-information resource.]
• 3 URIs:
  – the non-information resource (NIR, #)
  – HTML representation
  – RDF (ORE)
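The point of the linked model is that an instrument or location is described once and then referenced. A minimal sketch of that idea, using plain (subject, predicate, object) triples: the predicate names and literal values are taken from the diagram, but the identifier `P1` and the choice of which node links to which are assumptions for illustration.

```python
# Triples of (subject, predicate, object). Predicate names and literal values
# appear on the slide; the wiring between nodes and the id "P1" are assumptions.
TRIPLES = [
    ("D1", "isMemberOfCollection", "C1"),
    ("D1", "measuredBy", "A1"),
    ("D2", "calculatedFrom", "D1"),
    ("A1", "title", "windmeter"),
    ("A1", "locatedAt", "P1"),
    ("P1", "title", "Dak van EWI"),
    ("P1", "latitude", "51.9973"),
    ("P1", "longitude", "4.3742"),
    ("P1", "isPartOf", "Delft"),
]


def objects(subject, predicate):
    """Return every object linked to `subject` through `predicate`."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def coordinates_of(dataset):
    """Follow dataset -> instrument -> place and return (latitude, longitude)."""
    instrument = objects(dataset, "measuredBy")[0]
    place = objects(instrument, "locatedAt")[0]
    return objects(place, "latitude")[0], objects(place, "longitude")[0]


print(coordinates_of("D1"))
```

The coordinates are stored once on the place and reached by following relations, instead of being copied into every monthly dataset.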
Questions?
Differences:
• Need for more generalisation (find suitable standard for numerical data)
• Binary formats considered too risky for long term preservation
Halfway there … (3a/5)
• NcML (XML representation of NetCDF)
NcML … (3b/5)
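To give an idea of what an NcML document looks like, the sketch below writes a tiny one with Python's standard library. The element names (`netcdf`, `dimension`, `variable`, `attribute`, `values`) and the namespace are those of NcML 2.2; the variable name, unit and values are made-up example data, and the helper `to_ncml` is hypothetical.

```python
import xml.etree.ElementTree as ET

NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"


def to_ncml(var_name, units, values):
    """Serialise a 1-D float series along a 'time' dimension as NcML text."""
    ET.register_namespace("", NCML_NS)  # emit NcML as the default namespace

    def tag(name):
        return f"{{{NCML_NS}}}{name}"

    root = ET.Element(tag("netcdf"))
    ET.SubElement(root, tag("dimension"), name="time", length=str(len(values)))
    var = ET.SubElement(root, tag("variable"),
                        name=var_name, shape="time", type="float")
    ET.SubElement(var, tag("attribute"), name="units", value=units)
    ET.SubElement(var, tag("values")).text = " ".join(str(v) for v in values)
    return ET.tostring(root, encoding="unicode")


# Example data (invented): three temperature readings.
doc = to_ncml("temperature", "degC", [1.0, 2.5, 3.0])
print(doc)
```

Every value is written out as text, which keeps the archive human-readable but makes files larger and requires conversion at ingest and dissemination.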
Questions?
Differences:
• Reduce processing (at ingest and dissemination)
• Increase usability
Almost there … (4a/5)
• NetCDF (binary) format
  – Directly usable with common tools (Matlab, Python/Java/C, …) after download.
Binary data streams … (4b/5)
Is NetCDF a Good Archive Format?
NetCDF classic or 64-bit offset formats can be used as a general-purpose archive format for storing arrays.
Compression of data is possible with netCDF (e.g., using arrays of eight-bit or 16-bit integers to encode low-resolution floating-point numbers instead of arrays of 32-bit numbers), or the resulting data file may be compressed before storage (but must be uncompressed before it is read). Hence, using these netCDF formats may require more space than special-purpose archive formats that exploit knowledge of particular characteristics of specific datasets.
With netCDF-4/HDF5 format, the zlib library can provide compression on a per-variable basis. That is, some variables may be compressed, others not. In this case the compression and decompression of data happen transparently to the user, and the data may be stored, read, and written compressed.
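The integer-packing idea mentioned in the quote can be sketched as follows. This is a plain-Python illustration, not netCDF library code: the functions `pack16` and `unpack16` are hypothetical, and the scale and offset simply mirror the common `scale_factor` / `add_offset` attribute convention (unpacked = packed × scale + offset). The temperature values are invented.

```python
from array import array


def pack16(values):
    """Encode floats as unsigned 16-bit integers plus a scale and an offset."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 65535 or 1.0  # fall back to 1.0 for constant data
    packed = array("H", (round((v - lo) / scale) for v in values))
    return packed, scale, lo


def unpack16(packed, scale, offset):
    """Decode: unpacked = packed * scale + offset."""
    return [p * scale + offset for p in packed]


temps = [12.31, 12.35, 12.40, 13.02, 11.98]  # invented example readings
packed, scale, offset = pack16(temps)
restored = unpack16(packed, scale, offset)

# Two bytes per value instead of four for a 32-bit float.
print(packed.itemsize, array("f", temps).itemsize)
# Worst-case rounding error is half a quantisation step.
print(max(abs(a - b) for a, b in zip(temps, restored)), scale / 2)
```

The trade-off is a bounded loss of precision (at most half of `scale`) in exchange for halving the storage per value.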
Questions?
Differences:
• Wish for ‘Flexible’ granularity
• Further increase usability (lower threshold after steeper learning curve)
• Reduce storage requirements
The ‘last’ step … (5a/5)
• OPeNDAP
  – Combine sets server side (1 year of data = 1 download instead of 12)
  – URL queries from common tools
  – Inspect metadata for each variable or dimension
  – ‘Cut, slice & sample’ in datasets server side
• Only binary formats stored
  – Approx. 75% reduction in size (compared to the XML-only format)
Interaction with NetCDF … (5b/5)
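A ‘cut, slice & sample’ request is expressed in the URL itself. The sketch below builds such a URL using the DAP2 hyperslab syntax `variable[start:stride:stop]`, where `stop` is inclusive. The server address, file name and variable name are hypothetical, and the helper functions are written for this example only.

```python
def opendap_url(base, variable, start, stop, stride=1, response="ascii"):
    """Build a DAP2 request for variable[start:stride:stop] (stop inclusive)."""
    return f"{base}.{response}?{variable}[{start}:{stride}:{stop}]"


def hyperslab_length(start, stop, stride=1):
    """Number of samples the server returns for the given hyperslab."""
    return (stop - start) // stride + 1


# Hypothetical server and dataset: one year of hourly values in a single file.
BASE = "http://example.org/opendap/darelux/temperature_2010.nc"

# Sample every 24th value of the year, selected server side.
url = opendap_url(BASE, "temperature", 0, 8759, stride=24)
print(url)
print(hyperslab_length(0, 8759, stride=24))
```

Only the selected samples travel over the network, which addresses both the limited download bandwidth and the wish for flexible granularity.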
Questions?
Differences: