Datawarehouse Workflow: ETLP (Extract, Transform, Load, Provide)

Upload: ricardo-marks

Post on 01-Apr-2015


Page 1

Datawarehouse Workflow: ETLP

Extract, Transform, Load, Provide

• Make user-friendly formats
• Dynamic database
• Charts & Maps
• Tools & websites
• Archive native formats

Page 2

Datawarehouse Workflow: ETLP

[Diagram: Extract, Transform, Load, Provide. Labels: data; tools; models; add meta information; transform to netCDF; netCDF on web server; netCDF on OPeNDAP server; data providers = data users. Captions: Make user-friendly formats; Dynamic database; Charts & Maps; Tools & websites; Archive native formats.]

Page 3

DATA = RAW DATA + PROCESSING

RAW DATA (volts)
• History will never change!
• One parameter at one place at one time

PROCESSING
• Interpretation does change!
• E.g. instrument deterioration, recalibration

NASA satellite data with the open source SeaDAS processing toolkit (in IDL)
• L0: dump of recorded voltages, only averaged over 16 pixels
  • LAC: MLAC
  • GAC
• L1: voltages + satellite track
• L2 ~ physical quantities
• L3 ~ binned in space (1 grid instead of zillions of warped photos)
• L4 ~ binned in time (climatology)

Deltares Aukepc for flumes
• Stored raw data
• With calibration coefficients
• Allows for recalibration
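The split between raw data and processing can be sketched in a few lines of Python. This is only an illustration, assuming a linear calibration (value = gain × volts + offset). The names `Calibration` and `apply_calibration` and all numbers are hypothetical and are not taken from Aukepc or SeaDAS.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Calibration:
    """Hypothetical linear calibration: value = gain * volts + offset."""
    gain: float
    offset: float


def apply_calibration(raw_volts: List[float], cal: Calibration) -> List[float]:
    """Return processed values; the raw voltages themselves are not modified."""
    return [cal.gain * v + cal.offset for v in raw_volts]


# Raw data: stored once and never changed (made-up numbers).
raw_volts = [0.5, 1.0, 1.5]

# Processing: the interpretation may change, e.g. after a recalibration.
first = apply_calibration(raw_volts, Calibration(gain=2.0, offset=0.0))
recalibrated = apply_calibration(raw_volts, Calibration(gain=2.0, offset=0.5))
```

Because only the coefficients differ between the two calls, a recalibration never requires touching the stored voltages.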

Page 4

Datawarehouse Workflow

[Diagram: Extract, Transform, Load, Provide, with the Subversion repository. Labels: data; tools; models; add meta information; transform to netCDF; netCDF on web server; netCDF on OPeNDAP server; data providers = data users. Captions: Make user-friendly formats; Dynamic database; Charts & Maps; Tools & websites; Archive native formats.]

Page 5

Programme today, and current session

[Diagram: Extract, Transform, Load, Provide, with the Subversion repository. Labels: data; tools; models; add meta information; transform to netCDF; netCDF on web server; netCDF on OPeNDAP server; data providers = data users. Numbered markers: 1; 2; 3 D:\...; 3 http://…]

Page 6

Repository username

• Get a username and password.
• Why? OpenEarth is open, right? Yes, but it is a closed community.
• For best quality, all actions are logged:
  • Nothing can be lost, only temporarily disabled.
  • So anyone can be allowed to join.

Every file is logged …

… and every line in every file is logged.
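The two levels of logging can also be inspected outside a graphical client. As a minimal sketch, the Python helpers below build the argument lists for the Subversion command-line client: `svn log` shows the history of a file or folder, and `svn blame` shows the last revision and author of every line. Using the command-line client here is an assumption, and the file name is hypothetical.

```python
from typing import List


def log_command(target: str) -> List[str]:
    """Arguments for 'svn log': who changed a file or folder, when, and why."""
    return ["svn", "log", target]


def blame_command(target: str) -> List[str]:
    """Arguments for 'svn blame': last revision and author of every line."""
    return ["svn", "blame", target]


# Hypothetical file inside a working copy.
example = "my_script.m"
print(" ".join(log_command(example)))
print(" ".join(blame_command(example)))
```

With a Subversion client installed, either list can be passed to `subprocess.run(cmd, check=True)`.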

Page 7

REPOSITORY basics

[Diagram: central database (repos.deltares.nl), local copy (D:\, E:\, F:\) and the actions browse, checkout, update, commit, add, copy and delete.]

Page 8

REPOSITORY browse

[Diagram: central database (repos.deltares.nl), local copy (D:\, E:\, F:\) and the actions browse, checkout, update, commit, add, copy and delete.]

Page 9

REPOSITORY browse

Page 10

REPOSITORY checkout

[Diagram: central database (repos.deltares.nl), local copy (D:\, E:\, F:\) and the actions browse, checkout, update, commit, add, copy and delete.]

• Not handy to get files one by one with a browser
• Get them all at once with a free program

Page 11

REPOSITORY checkout

• Download and install TortoiseSVN (http://tortoisesvn.net/)
• Make a checkout in e.g. F:\checkouts\
• No need to back this up, it’s only a copy ...

Page 12

REPOSITORY checkout

• Copy the url from the browser (case sensitive!)
• Make sure that the tree of the local copy resembles the server
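One way to keep the local tree resembling the server is to derive the local folder from the url itself. The sketch below does this in Python and builds the arguments for `svn checkout URL PATH` of the Subversion command-line client. The slides use TortoiseSVN, so the command-line form, the helper names and the choice to drop the leading `repos` folder are all assumptions made for illustration.

```python
from pathlib import PureWindowsPath
from typing import List
from urllib.parse import urlparse


def local_checkout_path(url: str, root: str = r"F:\checkouts") -> PureWindowsPath:
    """Mirror the server path of `url` below the local checkout folder."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    # Assumption: drop the leading 'repos' folder of the server path.
    # The case of every folder name is kept exactly as in the url.
    if parts and parts[0] == "repos":
        parts = parts[1:]
    return PureWindowsPath(root, *parts)


def checkout_command(url: str, root: str = r"F:\checkouts") -> List[str]:
    """Arguments for 'svn checkout URL PATH'."""
    return ["svn", "checkout", url, str(local_checkout_path(url, root))]


url = "http://repos.deltares.nl/repos/OpenEarthTools/sandbox"
print(checkout_command(url))
```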

Page 13

REPOSITORY commit

[Diagram: central database (repos.deltares.nl), local copy (D:\, E:\, F:\) and the actions browse, checkout, update, commit, add, copy and delete.]

Page 14

REPOSITORY commit

[Figure labels: up to date; modified.]

Page 15

REPOSITORY commit

Page 16

REPOSITORY update

[Diagram: central database (repos.deltares.nl), local copy (D:\, E:\, F:\) and the actions browse, checkout, update, commit, add, copy and delete.]

Page 17

REPOSITORY update

Page 18

REPOSITORY update

Page 19

REPOSITORY statistics

Page 20

REPOSITORY add

[Diagram: central database (repos.deltares.nl), local copy (D:\, E:\, F:\) and the actions browse, checkout, update, commit, add, copy and delete.]

Page 21

REPOSITORY add a raw dataset

• OpenEarthRawData is very big: don’t make a full checkout
• To add a thing, first make an empty checkout of the destination.

Page 22

REPOSITORY add a raw dataset

• There are 2 copies of 1 file on your PC:
  • Visible working copy, for editing
  • Hidden shadow copy, to detect changes
• Before adding a file to the server, a shadow copy must be created.
• Allows for offline working

Page 23

REPOSITORY add a raw dataset

• Now the addition must simply be committed, like any change
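The three steps for adding a raw dataset (empty checkout, local add, commit) can be written out as Subversion command-line calls. This is a hedged sketch: the slides use TortoiseSVN, and the function name, folder names and log message below are hypothetical. `--depth empty` is the command-line option that checks out a folder without its contents.

```python
from typing import List


def add_dataset_commands(url: str, workdir: str, new_folder: str,
                         message: str) -> List[List[str]]:
    """Command lines for adding one dataset folder without a full checkout."""
    return [
        # 1. Empty checkout of the destination: no existing data is downloaded.
        ["svn", "checkout", "--depth", "empty", url, workdir],
        # 2. After copying the files into `new_folder` (inside `workdir`):
        #    register them locally, which needs no server connection.
        ["svn", "add", new_folder],
        # 3. Send the addition to the server like any other change.
        ["svn", "commit", "-m", message, new_folder],
    ]


commands = add_dataset_commands(
    "https://repos.deltares.nl/repos/OpenEarthRawData/trunk/",
    r"F:\checkouts\OpenEarthRawData",                 # hypothetical local folder
    r"F:\checkouts\OpenEarthRawData\my_institute",    # hypothetical new dataset
    "Added raw dataset",                              # hypothetical log message
)
for cmd in commands:
    print(" ".join(cmd))
```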

Page 24

REPOSITORY add

• The repository is supposed to be working at any time
• Do not play with the actual repository
• All advanced users will be annoyed by this
• But then, how can I learn how to work with it?
• Solution: use the sandbox
• Play around at the highest level as much as you like
• And clean up afterwards (delete)
• With your browser: http://repos.deltares.nl/repos/OpenEarthTools/sandbox

Page 25

REPOSITORY delete

[Diagram: central database (repos.deltares.nl), local copy (D:\, E:\, F:\) and the actions browse, checkout, update, commit, add, copy and delete.]

Page 26

REPOSITORY delete

• There are 2 copies of 1 file on your PC:
  • Visible working copy, for editing
  • Hidden shadow copy, to detect changes
• When deleting a file on the server, your shadow copy must be informed.
• Allows for working offline

Page 27

REPOSITORY delete

• Now the deletion must simply be committed, like any change

Page 28

REPOSITORY delete

• Now delete the addition you made in http://repos.deltares.nl/repos/OpenEarthTools/sandbox
• And check the log file, to see what colleagues did.
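For the sandbox exercise, the delete and the log check could look as follows with the Subversion command-line client. This is a sketch under the assumption that a command-line client is used instead of TortoiseSVN. The helper names, the local file name and the log message are hypothetical.

```python
from typing import List

SANDBOX_URL = "http://repos.deltares.nl/repos/OpenEarthTools/sandbox"


def delete_commands(path: str, message: str) -> List[List[str]]:
    """Schedule a local deletion, then commit it to the server."""
    return [
        ["svn", "delete", path],
        ["svn", "commit", "-m", message, path],
    ]


def sandbox_log_command(limit: int = 10) -> List[str]:
    """Arguments for showing the most recent log entries of the sandbox."""
    return ["svn", "log", "--limit", str(limit), SANDBOX_URL]


# Hypothetical test file inside a local checkout of the sandbox.
for cmd in delete_commands("my_test_file.txt", "Cleaned up sandbox test"):
    print(" ".join(cmd))
print(" ".join(sandbox_log_command()))
```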

Page 29

REPOSITORY copy

[Diagram: central database (repos.deltares.nl), local copy (D:\, E:\, F:\) and the actions browse, checkout, update, commit, add, copy and delete.]

Page 30

REPOSITORY copy

• Again: first inform the shadow copy locally, then commit to the server …
• Drag with the right mouse button
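The same two-step pattern applies to a copy made with the Subversion command-line client: `svn copy` registers the copy locally and keeps the file's history, and the commit sends it to the server. The helper below is only a sketch, with hypothetical file names and log message.

```python
from typing import List


def copy_commands(source: str, destination: str, message: str) -> List[List[str]]:
    """Register a copy in the working copy, then commit it to the server."""
    return [
        ["svn", "copy", source, destination],
        ["svn", "commit", "-m", message, destination],
    ]


# Hypothetical file names inside one working copy.
for cmd in copy_commands("old_name.m", "new_name.m", "Copied script"):
    print(" ".join(cmd))
```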

Page 31

OpenEarthRawData

Raw data are stored under https://repos.deltares.nl/repos/OpenEarthRawData/trunk/

• Data are stored with the copyright holder as main directory. This allows
  • copyright holders to maintain their own data
  • copyright holders to shift easily from private to open source
  • users to identify whom to acknowledge
• Data should also contain
  • dedicated processing scripts (if not in OpenEarthTools)
  • url file to web source
  • INSPIRE XML meta-data file
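Before committing a dataset, a quick local check can list which of the recommended companion files are still missing. The sketch below assumes that the web source is stored in a file ending in `.url` and the meta-data in a file ending in `.xml`. Both extensions and the function name are assumptions, and the check does not validate the XML against INSPIRE.

```python
from pathlib import Path
from typing import List


def missing_items(dataset_dir: Path) -> List[str]:
    """Return the recommended companion files not found below `dataset_dir`."""
    missing = []
    # Assumed convention: the link to the web source is a '*.url' file.
    if not any(dataset_dir.rglob("*.url")):
        missing.append("url file to web source")
    # Assumed convention: the INSPIRE meta-data is a '*.xml' file.
    if not any(dataset_dir.rglob("*.xml")):
        missing.append("INSPIRE XML meta-data file")
    return missing
```

For example, `missing_items(Path(r"F:\checkouts\OpenEarthRawData\my_institute"))` would report what to add to a hypothetical dataset folder before the commit.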

Page 32

Summary: current session

[Diagram: Extract, Transform, Load, Provide, with the Subversion repository. Labels: data; tools; models; add meta information; transform to netCDF; netCDF on web server; netCDF on OPeNDAP server; data providers = data users. Numbered markers: 1; 2; 3 D:\...; 3 http://…]

Page 33

Next: use OpenEarthTools to make netCDF

[Diagram: Extract, Transform, Load, Provide, with the Subversion repository. Labels: data; tools; models; add meta information; transform to netCDF; netCDF on web server; netCDF on OPeNDAP server; data providers = data users. Numbered markers: 1; 2; 3 D:\...; 3 http://…]