elag workshop sessie 3 v4

3TU.Datacentrum Workshop session 3 “Darelux”

Upload: jeroen-rombouts

Post on 26-May-2015

TRANSCRIPT

Page 1: Elag workshop sessie 3 v4

3TU.Datacentrum

Workshop session 3

“Darelux”

Page 2: Elag workshop sessie 3 v4

• Data collection:
  – Hydrology measurements
  – Several institutes
  – Different sensors at multiple locations
  – Long-term value

• Programme started 2003

Darelux: Data Archiving River Environment LUXembourg

Once upon a time …

Maisbich: 1.2 km²

Ardennes Massif (slate)

Huewelerbach: catchment, basin area approx. 2.7 km², mainly sandstone.

Page 3: Elag workshop sessie 3 v4

Centre Recherche Gabriel Lippmann


[Diagram labels: Precipitation, Evaporation, Interception, Floor, Shallow ground, Deep ground, River]

Measurements:
  – Universiteit Utrecht: soil moisture
  – Gabriel Lippmann, TUD: piezometers
  – TUD: runoff over the road
  – Universiteit Luxemburg, Gabriel Lippmann, TUD: measuring weir
  – Universiteit Luxemburg, Gabriel Lippmann, TUD: tracers

Page 4: Elag workshop sessie 3 v4

• 3 Soil moisture probes

• 16 temperature sensors + 1 fibre optic

• 5 V-notch discharge meters

• Meteo station, pluviometer, interception measurement device, groundwater level piezometer, …

• Data per month

• ASCII files from sensors

Sensors and data

Page 5: Elag workshop sessie 3 v4
Page 6: Elag workshop sessie 3 v4

‘Evolution’ in 3TU.DC

5 steps

Each step: considerations & solutions…

Page 7: Elag workshop sessie 3 v4

• Very high importance attached to long-term preservation

• Limited bandwidth for downloads

• No known standards from the community

The beginning … (1a/5)

Page 8: Elag workshop sessie 3 v4

• All information (data AND metadata) in single containers.
  – Relations were considered too risky
  – ‘Homebrew’ XML with very specific tags

• Container size limited to 2 MB
  – Measurements stored per month per sensor × location.

XML containers … (1b/5)
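The container approach can be illustrated with a minimal sketch. The slides do not show the actual ‘homebrew’ XML, so every tag and attribute name below (`container`, `sensor`, `location`, `measurements`, `measurement`) is hypothetical, as are the sample readings; the 2 MB limit is taken as 2 × 1024 × 1024 bytes, which is also an assumption.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

MAX_CONTAINER_BYTES = 2 * 1024 * 1024  # 2 MB limit from the slide (assumed binary MB)


def build_containers(readings):
    """Group readings per (month, sensor, location) and serialise each group.

    `readings` is an iterable of (timestamp, sensor, location, value) tuples
    with ISO 8601 timestamps such as "2004-03-01T12:00:00".
    All tag and attribute names are hypothetical.
    """
    groups = defaultdict(list)
    for timestamp, sensor, location, value in readings:
        month = timestamp[:7]  # "YYYY-MM"
        groups[(month, sensor, location)].append((timestamp, value))

    containers = {}
    for (month, sensor, location), rows in sorted(groups.items()):
        root = ET.Element("container", month=month)
        # Metadata is repeated inside every container: no external relations.
        ET.SubElement(root, "sensor").text = sensor
        ET.SubElement(root, "location").text = location
        data = ET.SubElement(root, "measurements")
        for timestamp, value in rows:
            ET.SubElement(data, "measurement", time=timestamp).text = str(value)
        payload = ET.tostring(root, encoding="utf-8")
        if len(payload) > MAX_CONTAINER_BYTES:
            raise ValueError(f"container {month}/{sensor}/{location} exceeds 2 MB")
        containers[(month, sensor, location)] = payload
    return containers


sample = [
    ("2004-03-01T12:00:00", "soil-moisture-1", "Huewelerbach", 0.31),
    ("2004-03-02T12:00:00", "soil-moisture-1", "Huewelerbach", 0.29),
    ("2004-04-01T12:00:00", "soil-moisture-1", "Huewelerbach", 0.27),
]
containers = build_containers(sample)
```

Each container is self-describing, which is what made the approach attractive for preservation, at the cost of repeating the sensor and location metadata every month.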

Page 9: Elag workshop sessie 3 v4

Questions?

Differences:

Page 10: Elag workshop sessie 3 v4

• Need for simpler metadata

• No repetition of metadata (sensor and location for every month)

• Less concern about the long-term risks of (archive-internal) relations

Another step … (2a/5)

Page 11: Elag workshop sessie 3 v4

• New data model and ‘cleaned’ XML
  – Datasets, Measuring instruments, Locations, Time

Linked XML … (2b/5)

[Diagram: linked data model]

Nodes: Collection C1; Instrument A1; Dataset D1; Dataset D2; Period (day), Period (month), Period (year); Place (point), Place (area); DATA (data file)

Relations: measuredBy, isMemberOfCollection, temporal, locatedAt, isPartOf, calculatedFrom

Properties shown: title (“Dak van EWI”, “windmeter”, “Delft”), creator, longitude 4.3742, latitude 51.9973, mimeType application/x-netcdf, created 2011-01-01T00:00:00

Legend: information resource / non-information resource

3 URIs:
  – the NIR (#)
  – HTML representation
  – RDF (ORE)
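One possible reading of the linked model is sketched below as subject/predicate/object triples. The node and relation names come from the slide, but the arrows were lost in the transcript, so which node links to which is an assumption here, as are the helper functions `objects` and `ancestors`.

```python
# Assumed reading of the flattened diagram: the edges are not confirmed by the source.
TRIPLES = [
    ("DatasetD1", "isMemberOfCollection", "CollectieC1"),
    ("DatasetD1", "measuredBy", "ApparaatA1"),
    ("DatasetD1", "temporal", "Periode(dag)"),
    ("DatasetD1", "locatedAt", "Plaats(punt)"),
    ("DatasetD2", "calculatedFrom", "DatasetD1"),
    ("Periode(dag)", "isPartOf", "Periode(maand)"),
    ("Periode(maand)", "isPartOf", "Periode(jaar)"),
    ("Plaats(punt)", "isPartOf", "Plaats(gebied)"),
]


def objects(subject, predicate):
    """Return every object linked from `subject` via `predicate`."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def ancestors(node):
    """Follow isPartOf links upwards, e.g. day -> month -> year."""
    chain = []
    current = node
    while True:
        parents = objects(current, "isPartOf")
        if not parents:
            return chain
        current = parents[0]
        chain.append(current)


instrument = objects("DatasetD1", "measuredBy")
period_chain = ancestors("Periode(dag)")
```

Because the instrument and the location are separate nodes, their metadata is stored once and referenced by every dataset instead of being repeated per month.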

Page 12: Elag workshop sessie 3 v4

Questions?

Differences:

Page 13: Elag workshop sessie 3 v4

• Need for more generalisation (find a suitable standard for numerical data)

• Binary formats considered too risky for long-term preservation

Halfway there … (3a/5)

Page 14: Elag workshop sessie 3 v4

• NcML (XML representation of NetCDF)

NcML … (3b/5)
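To give an idea of what an NcML document looks like, the sketch below writes a single one-dimensional variable as NcML using only the Python standard library. The variable name `water_level`, the unit and the values are hypothetical; the element names (`netcdf`, `dimension`, `variable`, `attribute`, `values`) follow the NcML 2.2 schema as far as I know it, so treat this as an illustration rather than a validated document.

```python
import xml.etree.ElementTree as ET

NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"


def to_ncml(var_name, units, values):
    """Serialise one float variable along a `time` dimension as NcML text."""
    root = ET.Element("netcdf", xmlns=NCML_NS)
    ET.SubElement(root, "dimension", name="time", length=str(len(values)))
    var = ET.SubElement(root, "variable", name=var_name, shape="time", type="float")
    ET.SubElement(var, "attribute", name="units", value=units)
    # NcML stores the numbers as whitespace-separated text.
    ET.SubElement(var, "values").text = " ".join(str(v) for v in values)
    return ET.tostring(root, encoding="unicode")


ncml = to_ncml("water_level", "m", [0.42, 0.45, 0.43])
print(ncml)
```

All numbers are stored as text, which keeps the file human-readable but makes it considerably larger than the binary equivalent.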

Page 15: Elag workshop sessie 3 v4

Questions?

Differences:

Page 16: Elag workshop sessie 3 v4

• Reduce processing (at ingest and dissemination)

• Increase usability

Almost there … (4a/5)

Page 17: Elag workshop sessie 3 v4

• NetCDF (binary) format
  – Directly usable with common tools (Matlab, Python/Java/C, …) after download.

Binary data streams … (4b/5)

Is NetCDF a Good Archive Format?

NetCDF classic or 64-bit offset formats can be used as a general-purpose archive format for storing arrays.

Compression of data is possible with netCDF (e.g., using arrays of eight-bit or 16-bit integers to encode low-resolution floating-point numbers instead of arrays of 32-bit numbers), or the resulting data file may be compressed before storage (but must be uncompressed before it is read). Hence, using these netCDF formats may require more space than special-purpose archive formats that exploit knowledge of particular characteristics of specific datasets.

With netCDF-4/HDF5 format, the zlib library can provide compression on a per-variable basis. That is, some variables may be compressed, others not. In this case the compression and decompression of data happen transparently to the user, and the data may be stored, read, and written compressed.
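The integer encoding mentioned above is commonly done with a scale factor and an offset, where the original value is recovered as `packed * scale_factor + add_offset`. The sketch below shows that arithmetic in plain Python for signed 16-bit integers; the function names and the sample temperatures are hypothetical, and no netCDF library is involved.

```python
import struct


def packing_params(vmin, vmax, bits=16):
    """Scale and offset that map [vmin, vmax] onto the signed `bits`-bit range."""
    scale_factor = (vmax - vmin) / (2 ** bits - 1)
    add_offset = vmin + 2 ** (bits - 1) * scale_factor
    return scale_factor, add_offset


def pack(values, scale_factor, add_offset):
    """Encode floats as integers (lossy: resolution is one scale_factor step)."""
    return [round((v - add_offset) / scale_factor) for v in values]


def unpack(packed, scale_factor, add_offset):
    """Decode integers back to approximate floats."""
    return [p * scale_factor + add_offset for p in packed]


temperatures = [-5.0, 12.34, 30.0]  # hypothetical readings in degrees Celsius
scale, offset = packing_params(-5.0, 30.0)
packed = pack(temperatures, scale, offset)
restored = unpack(packed, scale, offset)

# Size on disk: 2 bytes per packed value versus 4 bytes per 32-bit float.
packed_bytes = struct.pack(f"<{len(packed)}h", *packed)
float_bytes = struct.pack(f"<{len(temperatures)}f", *temperatures)
```

The packed form halves the storage for these values, at the cost of rounding each value to the nearest step of `scale`.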

Page 18: Elag workshop sessie 3 v4

Questions?

Differences:

Page 19: Elag workshop sessie 3 v4

• Wish for ‘Flexible’ granularity

• Further increase usability (lower threshold after steeper learning curve)

• Reduce storage requirements

The ‘last’ step … (5a/5)

Page 20: Elag workshop sessie 3 v4

• OPeNDAP
  – Combine sets server side (1 year of data = 1 download instead of 12)
  – URL queries from common tools
  – Inspect metadata for each variable or dimension
  – ‘Cut, slice & sample’ in datasets server side

• Only binary formats stored
  – Approx. 75% reduction in size (compared to XML-only format)

Interaction with NetCDF … (5b/5)
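The URL queries listed above can be sketched as follows. The server address, file path and variable name are hypothetical; the response suffixes (`.dds`, `.das`, `.ascii`) and the `[start:stride:stop]` index syntax are the usual DAP2 conventions, but check them against the server actually in use. The code only builds URLs and does not contact any server.

```python
# Hypothetical dataset location; not a real 3TU.Datacentrum address.
BASE = "https://opendap.example.org/darelux/huewelerbach_2004.nc"


def opendap_url(base_url, response="ascii", variable=None,
                start=None, stop=None, stride=1):
    """Build a DAP2-style request URL.

    response: "dds" (structure), "das" (attributes) or "ascii" (values as text).
    variable/start/stop/stride: optional server-side subset of one variable.
    """
    url = f"{base_url}.{response}"
    if variable is not None:
        constraint = variable
        if start is not None and stop is not None:
            constraint += f"[{start}:{stride}:{stop}]"
        url += "?" + constraint
    return url


# Inspect the variables and dimensions without downloading any data.
structure_url = opendap_url(BASE, "dds")
# 'Cut, slice & sample': every 10th value of the first 100, selected server side.
sample_url = opendap_url(BASE, "ascii", "water_level", start=0, stop=99, stride=10)
```

Because the subset is selected on the server, only the requested slice crosses the network, which addresses the limited download bandwidth noted at the start.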

Page 21: Elag workshop sessie 3 v4
Page 22: Elag workshop sessie 3 v4

Questions?

Differences: