Datawarehouse Workflow: ETLP
Extract Transform Load Provide
Make user-friendly formats
Dynamic database
Charts & Maps
Tools & websites
Archive native formats
[Diagram: data providers = data users; data, models and tools; add meta information; transform to netCDF; netCDF on web server; netCDF on OPeNDAP server]
DATA = RAW DATA + PROCESSING
• RAW DATA (volts): one parameter at one place at one time. History will never change!
• PROCESSING: interpretation does change, e.g. instrument deterioration, recalibration
NASA satellite data with open source SeaDAS processing toolkit (in IDL)
• L0: dump of recorded voltages, only averaged over 16 pixels
  • LAC: MLAC
  • GAC
• L1: voltages + satellite track
• L2 ~ physical quantities
• L3 ~ binned in space (1 grid instead of zillions of warped photos)
• L4 ~ binned in time (climatology)
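The step from L2 to L3 and L4 can be illustrated with a minimal sketch. It is not the SeaDAS algorithm: it assumes plain averaging on a regular 1-degree grid, and the sample positions and values are invented for illustration.

```python
from collections import defaultdict
from statistics import mean


def bin_in_space(samples, cell_deg=1.0):
    """L2 -> L3: average scattered (lon, lat, value) samples onto one regular grid."""
    cells = defaultdict(list)
    for lon, lat, value in samples:
        key = (int(lon // cell_deg), int(lat // cell_deg))
        cells[key].append(value)
    return {key: mean(values) for key, values in cells.items()}


def bin_in_time(grids):
    """L3 -> L4: average a series of gridded fields cell by cell."""
    cells = defaultdict(list)
    for grid in grids:
        for key, value in grid.items():
            cells[key].append(value)
    return {key: mean(values) for key, values in cells.items()}


# Two hypothetical satellite passes (invented numbers)
pass_1 = [(4.2, 52.1, 1.0), (4.7, 52.9, 3.0), (5.1, 52.3, 5.0)]
pass_2 = [(4.5, 52.5, 4.0)]

grid_1 = bin_in_space(pass_1)
grid_2 = bin_in_space(pass_2)
climatology = bin_in_time([grid_1, grid_2])
print(climatology)
```

Each pass has its own sample positions, but after spatial binning all passes share the same grid cells, which is what makes the averaging in time possible.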
Deltares Aukepc for flumes
• Stored raw data
• With calibration coefficients
• Allows for recalibration
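A minimal sketch of DATA = RAW DATA + PROCESSING: the raw voltages are stored once, and a recalibration only changes the processing. The linear calibration, the class name `Calibration` and all numbers are assumptions for illustration, not the Aukepc format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Calibration:
    """Hypothetical linear calibration: value = gain * volts + offset."""
    gain: float
    offset: float


def process(raw_volts, calibration):
    """Turn stored raw voltages into physical values; the raw data stay untouched."""
    return [calibration.gain * v + calibration.offset for v in raw_volts]


raw_volts = (0.5, 1.0, 1.5)                    # stored once, never edited (invented values)
first = Calibration(gain=2.0, offset=0.0)      # original coefficients
revised = Calibration(gain=2.0, offset=0.5)    # e.g. after a recalibration

values_v1 = process(raw_volts, first)
values_v2 = process(raw_volts, revised)
print(values_v1, values_v2)
```

Because the volts are archived together with the coefficients, the second result can be produced at any later time without a new measurement.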
Datawarehouse Workflow
Extract Transform Load Provide
Subversion repository
Programme today, and current session
[Diagram: workflow overview with numbered markers 1, 2 and 3 (3 at D:\... and at http://…), the ETLP steps (Extract Transform Load Provide) and the Subversion repository]
Repository username
• Get a username and password.
• Why, OpenEarth is open, right? Yes, but it is a closed community
• For best quality all actions are logged:
• Nothing can be lost, only temporarily disabled
• So anyone can be allowed to join
Every file is logged …
… and every line in every file is logged.
REPOSITORY basics
[Diagram: central database (repos.deltares.nl) and local copy (D:\ E:\ F:\), with the actions browse, checkout, update, commit, add, copy, delete]
REPOSITORY browse
REPOSITORY checkout
• Not handy to get files one by one with a browser
• Get them all at once with a free program
REPOSITORY checkout
• Download and install TortoiseSVN (http://tortoisesvn.net/)
• Make a checkout in e.g. F:\checkouts\
• No need to back this up, it’s only a copy ...
REPOSITORY checkout
• Copy the URL from the browser (case sensitive!)
• Make sure that the tree of the local copy resembles the server
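For those who prefer the command-line `svn` client over the Tortoise menus, the checkout can be sketched as follows. The helper `checkout_command` is hypothetical, and dropping the leading `repos` segment is an assumption about this server's URL layout. The command is only built here, not executed.

```python
from pathlib import PureWindowsPath
from urllib.parse import urlsplit


def checkout_command(url, checkout_root):
    """Build an `svn checkout` command whose local path mirrors the server tree."""
    parts = [p for p in urlsplit(url).path.split("/") if p]
    # Assumption: the first path segment ("repos") is server plumbing, not part of the tree
    if parts and parts[0] == "repos":
        parts = parts[1:]
    local = PureWindowsPath(checkout_root).joinpath(*parts)
    # The URL is passed on unchanged, because it is case sensitive
    return ["svn", "checkout", url, str(local)]


cmd = checkout_command(
    "http://repos.deltares.nl/repos/OpenEarthTools/sandbox", r"F:\checkouts"
)
print(" ".join(cmd))
```

Deriving the local folder from the URL keeps the local tree recognisable next to the server tree, whatever part of the repository is checked out.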
REPOSITORY commit
• Each file in the local copy is marked as either up to date or modified
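The same up to date / modified information is available from the command-line client through `svn status`, which lists only files that differ from the last checkout or update. A minimal sketch of reading that output follows; the sample lines and file names are invented, and only a few of the status letters are covered.

```python
# First-column status letters of `svn status` (a subset)
STATUS_MEANING = {
    "M": "modified",
    "A": "added",
    "D": "deleted",
    "?": "not under version control",
}


def parse_status(text):
    """Map each path in `svn status` output to a readable state."""
    states = {}
    for line in text.splitlines():
        if len(line) < 8:
            continue  # skip blank or truncated lines
        code = line[0]            # first column: status of the item itself
        path = line[7:].strip()   # the path follows the status columns
        states[path] = STATUS_MEANING.get(code, "other")
    return states


# Invented sample output: one status letter, padding, then the path
sample = "\n".join(f"{code:<8}{path}" for code, path in [
    ("M", "report.txt"),
    ("?", "scratch.txt"),
])
states = parse_status(sample)
print(states)
```

Files that do not appear in the output are up to date with respect to the shadow copy; only the modified ones need a commit.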
REPOSITORY update
REPOSITORY statistics
REPOSITORY add
REPOSITORY add a raw dataset
• OpenEarthRawData is very big: don’t make a full checkout
• To add something, first make an empty checkout of the destination.
REPOSITORY add a raw dataset
• There are 2 copies of 1 file on your PC:
  • Visible working copy, for editing
  • Hidden shadow copy, to detect changes
• Before adding a file to the server, a shadow copy must be created.
• Allows for offline working
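Why the shadow copy allows offline working can be shown with a toy model: changes are detected by comparing the working copy with the shadow copy, so no server is needed. This is a sketch of the idea only, not Subversion's actual bookkeeping (which lives in the hidden `.svn` folder); the file names and contents are invented.

```python
import hashlib


def digest(data):
    """Fingerprint of a file's contents."""
    return hashlib.sha1(data).hexdigest()


def local_status(working, shadow):
    """Compare working-copy contents with shadow-copy fingerprints, fully offline."""
    status = {}
    for name, data in working.items():
        if name not in shadow:
            status[name] = "not yet added"
        elif digest(data) != shadow[name]:
            status[name] = "modified"
        else:
            status[name] = "up to date"
    for name in shadow:
        if name not in working:
            status[name] = "missing"
    return status


# Shadow copy as it was at the last checkout (invented contents)
shadow = {"a.txt": digest(b"1 2 3\n"), "b.txt": digest(b"same\n")}
# Working copy after some editing
working = {"a.txt": b"1 2 4\n", "b.txt": b"same\n", "c.txt": b"new file\n"}

status = local_status(working, shadow)
print(status)
```

In this model a new file stays "not yet added" until the shadow copy knows about it, which is the local step that has to precede a commit.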
REPOSITORY add a raw dataset
• Now the addition simply has to be committed, like any other change
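With the command-line `svn` client the three steps (empty checkout, add, commit) could look like the sketch below. The helper `add_file_commands`, the local folder and the file name are hypothetical; the sandbox URL is used as destination so that nothing real is touched. The commands are only printed, not executed.

```python
import shlex

SANDBOX_URL = "http://repos.deltares.nl/repos/OpenEarthTools/sandbox"


def add_file_commands(destination_url, local_dir, file_name, message):
    """Return the svn commands for adding one file via an empty checkout."""
    return [
        # 1. Empty checkout: creates local_dir with bookkeeping only, no files
        ["svn", "checkout", "--depth", "empty", destination_url, local_dir],
        # 2. After copying the file into local_dir: register it in the shadow copy
        ["svn", "add", f"{local_dir}/{file_name}"],
        # 3. Send the addition to the server, with a log message
        ["svn", "commit", "-m", message, local_dir],
    ]


commands = add_file_commands(SANDBOX_URL, "sandbox", "my_test.txt", "add test file")
for command in commands:
    print(shlex.join(command))
```

Step 2 is purely local; only step 3 needs a connection to the server.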
REPOSITORY add
• The repository is supposed to be working at any time
• Do not play with the actual repository
• All advanced users will be annoyed by this
• But then, how can I learn how to work with it?
• Solution: use the sandbox
• Play around at the highest level as much as you like
• And clean up afterwards (delete)
• With your browser: http://repos.deltares.nl/repos/OpenEarthTools/sandbox
REPOSITORY delete
REPOSITORY delete a raw dataset
• There are 2 copies of 1 file on your PC:
  • Visible working copy, for editing
  • Hidden shadow copy, to detect changes
• When deleting a file on the server, your shadow copy must be informed
• Allows for working offline
REPOSITORY delete a raw dataset
• Now the deletion simply has to be committed, like any other change
REPOSITORY delete
• Now delete the addition you made in http://repos.deltares.nl/repos/OpenEarthTools/sandbox
• And check the log file, to see what colleagues did.
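A command-line sketch of this clean-up, assuming the `svn` client and a local checkout of the sandbox. The helper `delete_file_commands`, the folder and the file name are hypothetical; the commands are only printed, not executed.

```python
import shlex

SANDBOX_URL = "http://repos.deltares.nl/repos/OpenEarthTools/sandbox"


def delete_file_commands(url, local_dir, file_name, message, log_entries=10):
    """Return the svn commands to delete one file and then inspect the log."""
    return [
        # Local step: removes the working file and informs the shadow copy
        ["svn", "delete", f"{local_dir}/{file_name}"],
        # Server step: the deletion is committed like any other change
        ["svn", "commit", "-m", message, local_dir],
        # Show the most recent log entries, including those of colleagues
        ["svn", "log", "--limit", str(log_entries), url],
    ]


commands = delete_file_commands(SANDBOX_URL, "sandbox", "my_test.txt", "clean up test file")
for command in commands:
    print(shlex.join(command))
```

The deleted file remains retrievable from earlier revisions, in line with "nothing can be lost, only temporarily disabled".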
REPOSITORY copy
• Again: first inform the shadow copy locally, then commit to the server …
• Drag with right-mouse button
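The same two steps with the command-line `svn` client, as a sketch: `svn copy` informs the shadow copy, `svn commit` sends the result to the server. The helper `copy_commands` and the file names are hypothetical; the commands are only printed, not executed.

```python
import shlex


def copy_commands(source, destination, message):
    """Return the svn commands for a versioned copy inside a local checkout."""
    return [
        # Local step: copy the file and record the copy in the shadow copy
        ["svn", "copy", source, destination],
        # Server step: commit the copy like any other change
        ["svn", "commit", "-m", message, destination],
    ]


commands = copy_commands("sandbox/my_test.txt", "sandbox/my_test_copy.txt", "copy test file")
for command in commands:
    print(shlex.join(command))
```

A copy made this way keeps the logged history of the original, which an ordinary file copy followed by an add would not.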
OpenEarthRawData
Raw data are stored under https://repos.deltares.nl/repos/OpenEarthRawData/trunk/
• Data are stored with the copyright holder as main directory. This allows
  • copyright holders to maintain their own data
  • copyright holders to shift easily from private to open source
  • users to identify whom to acknowledge
• Data should also contain
  • dedicated processing scripts (if not in OpenEarthTools)
  • url file to web source
  • INSPIRE XML meta-data file
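A small sketch of how this layout could be checked before a commit. The dataset level below the copyright holder, the `.url` and `.xml` file suffixes, and all names in the example are assumptions for illustration; only the trunk URL comes from the slide.

```python
from pathlib import PurePosixPath

TRUNK = "https://repos.deltares.nl/repos/OpenEarthRawData/trunk/"


def dataset_url(copyright_holder, dataset):
    """Copyright holder is the main directory; the dataset level is an assumption."""
    return f"{TRUNK}{copyright_holder}/{dataset}/"


def missing_items(file_names):
    """List the accompanying files that appear to be absent (assumed suffixes)."""
    suffixes = {PurePosixPath(name).suffix.lower() for name in file_names}
    missing = []
    if ".url" not in suffixes:
        missing.append("url file to web source")
    if ".xml" not in suffixes:
        missing.append("INSPIRE XML meta-data file")
    return missing


# Hypothetical holder, dataset and file names
url = dataset_url("some_holder", "some_dataset")
todo = missing_items(["raw/measurements.csv", "source.url"])
print(url)
print(todo)
```

The check cannot tell whether an XML file really is INSPIRE meta-data; it only flags that nothing with that suffix is present.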
Summary: current session
Next: use OpenEarthTools to make netCDF