Post on 01-Apr-2015

Datawarehouse Workflow: ETLP

Extract Transform Load Provide

Make user-friendly formats

Dynamic database

Charts & Maps

Tools & websites

Archive native formats

Datawarehouse Workflow: ETLP

Extract Transform Load Provide

Diagram labels: data, models, tools; transform to netCDF; add meta information; netCDF on web server; netCDF on OPeNDAP server; data providers = data users

DATA = RAW DATA + PROCESSING

RAW DATA (volts)

• History will never change!
• One parameter at one place at one time

PROCESSING

• Interpretation does change!
• e.g. instrument deterioration, recalibration

NASA satellite data with open source SeaDAS processing toolkit (in IDL)

• L0: dump of recorded voltages, only averaged over 16 pixels
• LAC: MLAC
• GAC
• L1: voltages + satellite track
• L2 ~ physical quantities
• L3 ~ binned in space (1 grid instead of zillions of warped photos)
• L4 ~ binned in time (climatology)
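The binning idea behind L3 can be illustrated with a small sketch. This is not SeaDAS code (SeaDAS is written in IDL); the function name `bin_in_space`, the cell size and the sample values are hypothetical and only show how scattered samples might be averaged onto one regular grid.

```python
from collections import defaultdict
import math


def bin_in_space(samples, cell_size):
    """Average scattered (lon, lat, value) samples onto a regular grid.

    Returns a dict mapping (column, row) cell indices to the mean value
    of all samples that fall inside that cell.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lon, lat, value in samples:
        cell = (math.floor(lon / cell_size), math.floor(lat / cell_size))
        sums[cell] += value
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


# Hypothetical L2-like samples: (longitude, latitude, physical quantity)
samples = [(4.1, 52.2, 1.0), (4.4, 52.3, 3.0), (5.2, 52.1, 10.0)]
grid = bin_in_space(samples, cell_size=1.0)
```

Binning in time (L4) could follow the same pattern, with a time interval as the key instead of a grid cell.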

Deltares Aukepc for flumes

• Stored raw data
• With calibration coefficients
• Allows for recalibration
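A minimal sketch of why storing raw data together with calibration coefficients allows for recalibration. The linear formula, the function name `calibrate` and all numbers are assumptions made for illustration; they are not taken from Aukepc.

```python
def calibrate(raw_volts, gain, offset):
    """Convert raw voltages to physical values.

    Assumption for this sketch: a linear calibration,
    value = gain * volts + offset. A real instrument may differ.
    """
    return [gain * v + offset for v in raw_volts]


# RAW DATA: stored once and never edited.
raw_volts = [0.5, 1.0, 2.0]

# PROCESSING: the coefficients can be replaced after a recalibration,
# and the physical values are simply recomputed from the same raw data.
original = calibrate(raw_volts, gain=2.0, offset=0.0)
recalibrated = calibrate(raw_volts, gain=2.0, offset=0.5)
```

Because only the coefficients change, the raw record stays identical while the interpretation is updated.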

Datawarehouse Workflow

Extract Transform Load Provide

Subversion repository

Programme today, and current session

Diagram: the ETLP workflow (Extract Transform Load Provide) with numbered markers 1, 2 and 3 (3 at both D:\... and http://…). Current session: Subversion repository

Repository username

• Get username and password.
• Why? OpenEarth is open, right? Yes, but it is a closed community
• For best quality all actions are logged:
• Nothing can be lost, only temporarily disabled
• So anyone can be allowed to join

Every file is logged …

… and every line in every file is logged.

REPOSITORY basics

Diagram: central database (repos.deltares.nl) and local copy (D:\ E:\ F:\), with the actions browse, checkout, update, commit, add, copy, delete

REPOSITORY browse

REPOSITORY checkout

• Not handy to get files one by one with a browser
• Get them all at once with a free program
• Download and install TortoiseSVN (http://tortoisesvn.net/)
• Make a checkout in e.g. F:\checkouts\
• No need to back this up, it’s only a copy ...
• Copy the url from the browser (case sensitive!)
• Make sure that the tree of the local copy resembles the server
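One way to keep the local tree resembling the server is to derive the local folder from the url itself. The sketch below is only an illustration: the helper `local_checkout_path` is hypothetical, and dropping the leading `repos` segment is an assumption about how one might lay out F:\checkouts\.

```python
from pathlib import PureWindowsPath
from urllib.parse import urlparse


def local_checkout_path(url, checkout_root):
    """Map a repository url to a local folder with the same tree.

    Assumption: everything after the leading 'repos' path segment is
    mirrored below checkout_root, keeping the case used on the server.
    """
    parts = [p for p in urlparse(url).path.split("/") if p]
    if parts and parts[0] == "repos":
        parts = parts[1:]
    return PureWindowsPath(checkout_root, *parts)


path = local_checkout_path(
    "http://repos.deltares.nl/repos/OpenEarthTools/sandbox",
    "F:\\checkouts",
)
print(path)
```

The url is copied unchanged, so the case of every folder name matches the server.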

REPOSITORY commit

up to date

modified

REPOSITORY update

REPOSITORY statistics

REPOSITORY add

REPOSITORY add a raw dataset

• OpenEarthRawData is very big: don’t make a full checkout
• To add something, first make an empty checkout of the destination.
• There are 2 copies of 1 file on your PC:
  • Visible working copy, for editing
  • Hidden shadow copy, to detect changes
• Before adding a file to the server, a shadow copy must be created.
• This allows for offline working
• Now the addition must simply be committed like any other change
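The slides use the TortoiseSVN menus; the same three steps can be sketched with the Subversion command-line client. The helper below only builds the command lines and does not run them. The function name, the local folder and the dataset name `my_dataset` are hypothetical, and the sandbox url is used as the destination so that nothing real is touched.

```python
import shlex


def add_dataset_commands(destination_url, local_dir, new_item, message):
    """Return svn command lines for adding one item without a full checkout.

    The new item has to be copied into local_dir between the checkout
    and the add; that copy step is not part of this sketch.
    """
    return [
        # 1. Empty checkout: creates the working copy without its contents.
        ["svn", "checkout", "--depth", "empty", destination_url, local_dir],
        # 2. Register the new item locally (works offline).
        ["svn", "add", f"{local_dir}/{new_item}"],
        # 3. Send the addition to the server like any other change.
        ["svn", "commit", "-m", message, local_dir],
    ]


commands = add_dataset_commands(
    "http://repos.deltares.nl/repos/OpenEarthTools/sandbox",
    "sandbox",
    "my_dataset",
    "Added my_dataset as a test",
)
for command in commands:
    print(shlex.join(command))
```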

REPOSITORY add

• The repository is supposed to be working at any time
• Do not play with the actual repository
• All advanced users will be annoyed by this
• But then, how can I learn how to work with it?
• Solution: use the sandbox
• Play around at the highest level as much as you like
• And clean up afterwards (delete)
• With your browser: http://repos.deltares.nl/repos/OpenEarthTools/sandbox

REPOSITORY delete

• There are 2 copies of 1 file on your PC:
  • Visible working copy, for editing
  • Hidden shadow copy, to detect changes
• When deleting a file on the server, your shadow copy must be informed
• This allows for working offline
• Now the deletion must simply be committed like any other change
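The role of the hidden shadow copy can be illustrated with a toy comparison. This is not how Subversion is implemented internally; the folder names and the function `local_status` are hypothetical. It only shows that changes and deletions can be detected on your own PC, without contacting the server.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path):
    """Return a checksum of the file contents."""
    return hashlib.sha1(Path(path).read_bytes()).hexdigest()


def local_status(working_dir, shadow_dir):
    """Compare each shadow copy with its working copy, without a server."""
    status = {}
    for shadow in sorted(Path(shadow_dir).iterdir()):
        working = Path(working_dir) / shadow.name
        if not working.exists():
            status[shadow.name] = "missing"
        elif file_digest(working) != file_digest(shadow):
            status[shadow.name] = "modified"
        else:
            status[shadow.name] = "up to date"
    return status


def demo():
    """Build a throwaway working copy and shadow copy, then compare them."""
    with tempfile.TemporaryDirectory() as tmp:
        working = Path(tmp, "working")
        shadow = Path(tmp, "shadow")
        working.mkdir()
        shadow.mkdir()
        for name in ("a.txt", "b.txt", "c.txt"):
            (shadow / name).write_text("original")
            (working / name).write_text("original")
        (working / "b.txt").write_text("edited")  # edit one file
        (working / "c.txt").unlink()              # remove another
        return local_status(working, shadow)


result = demo()
print(result)
```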

REPOSITORY delete

• Now delete the addition you made in http://repos.deltares.nl/repos/OpenEarthTools/sandbox
• And check the log file, to see what colleagues did.

REPOSITORY copy

• Again: first inform shadow copy locally, then commit to server …
• Drag with right-mouse button

OpenEarthRawData

Raw data are stored under https://repos.deltares.nl/repos/OpenEarthRawData/trunk/

• Data are stored with copyright holder as main directory. This allows
  • copyright holders to maintain their own data
  • copyright holders to shift easily from private to open source
  • users to identify whom to acknowledge
• Data should also contain
  • dedicated processing scripts (if not in OpenEarthTools)
  • url file to web source
  • INSPIRE XML meta-data file
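A dataset folder could be checked against this list before committing. The sketch below is hypothetical: the slides do not specify file names, so the `.url` and `.xml` extensions, the folder names and the function `missing_companions` are all assumptions. Processing scripts are not checked because they may live in OpenEarthTools instead.

```python
import tempfile
from pathlib import Path


def missing_companions(dataset_dir):
    """List expected companion files that are absent from a dataset folder.

    Assumption: the url file ends in '.url' and the INSPIRE meta-data
    file ends in '.xml'.
    """
    dataset_dir = Path(dataset_dir)
    expected = {
        "url file": "*.url",
        "INSPIRE XML meta-data file": "*.xml",
    }
    return [label for label, pattern in expected.items()
            if not any(dataset_dir.glob(pattern))]


def demo():
    """Create a throwaway <copyright holder>/<dataset> folder and check it."""
    with tempfile.TemporaryDirectory() as tmp:
        dataset = Path(tmp, "copyright_holder", "dataset")
        dataset.mkdir(parents=True)
        (dataset / "source.url").write_text("placeholder")
        return missing_companions(dataset)


result = demo()
print(result)
```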

Summary: current session

Extract Transform Load Provide

Subversion repository

Next: use OpenEarthTools to make netCDF
