A PPARC funded project

The Grid Data Warehouse

Description of prototype work in progress by AstroGrid.

Access-Grid lecture to Universities of Leeds and Sheffield by Guy Rixon on 2004-02-04

04-02-2004 GDW description: access-grid lecture 2

AstroGrid: the UK Virtual Observatory

Seven UK astronomy departments collaborating to build a Virtual Observatory (VO) for the use of the entire astronomical community.

IVOA: the community of VO projects

Purpose of the virtual observatory

To combine data from all sources into a data grid.

Data grid

Private files

Archives

Live feeds

Bibliographies

Data sets can be images (mainly in files) or tabular (mainly in RDBMS).

Example of VO use

“Find brown dwarf candidates: combine optical (e.g. APM catalogue) and IR (e.g. 2MASS) data to select by colour. Combine multi-epoch data to determine proper motions; select high-PM fraction of colour-selected sample. Then use that sample to…”

Optical archive

IR archive

2nd epoch

Colour sample

Refined sample

3rd epoch
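The two-stage selection in the example (a colour cut, then a proper-motion cut on the colour sample) can be sketched as follows. This is a minimal illustration only: the `Source` record, its field names, the V-K colour index, and both thresholds are assumptions made for the sketch, not values or structures from the lecture or from the APM or 2MASS catalogues.

```python
import math
from dataclasses import dataclass


@dataclass
class Source:
    """One cross-matched object (hypothetical layout, for illustration only)."""
    name: str
    v_mag: float     # optical magnitude
    k_mag: float     # infrared magnitude
    ra1_deg: float   # position at the first epoch
    dec1_deg: float
    ra2_deg: float   # position at the second epoch
    dec2_deg: float


def proper_motion_mas_per_yr(src: Source, baseline_yr: float) -> float:
    """Total proper motion from two epochs, small-angle approximation."""
    dec_mid = math.radians((src.dec1_deg + src.dec2_deg) / 2.0)
    d_ra_deg = (src.ra2_deg - src.ra1_deg) * math.cos(dec_mid)
    d_dec_deg = src.dec2_deg - src.dec1_deg
    # 1 degree = 3.6e6 milliarcseconds
    return math.hypot(d_ra_deg, d_dec_deg) * 3.6e6 / baseline_yr


def select_candidates(sources, baseline_yr, min_colour=4.5, min_pm=100.0):
    """Colour sample first, then the high-proper-motion part of that sample."""
    colour_sample = [s for s in sources if s.v_mag - s.k_mag > min_colour]
    return [s for s in colour_sample
            if proper_motion_mas_per_yr(s, baseline_yr) >= min_pm]


# Made-up test objects: A is red and moving, B is red but static, C is blue.
catalogue = [
    Source("A", 20.0, 14.0, 10.0, 0.0, 10.001, 0.0),
    Source("B", 19.5, 14.5, 50.0, 5.0, 50.0, 5.0),
    Source("C", 15.0, 14.0, 80.0, -3.0, 80.002, -3.0),
]
candidates = select_candidates(catalogue, baseline_yr=20.0)
print([s.name for s in candidates])
```

In the VO setting each stage would draw on a different archive, which is why the intermediate samples need somewhere to live between steps.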

VO as collection of web sites: no good

Each site has different query protocol

Results only go to the browser, not to an RDBMS or on to reprocessing

Results in HTML etc. are not machine-readable

Basic web sites are not sufficient for the VO.

Grid metaphor: electricity supply

Loadsa complex equipment

Simple delivery to consumer

Get your power from any supplier: commodity

Commodities in astronomy data grid

Common s/w on desktop

Algorithms

Archives

Writeable Storage

Registry of resources

(Processors)

Bulk data transport; machine-readable results; combined inside grid

Metadata transport

AstroGrid topology

Portal

Registry

Algorithms

Writeable storage

Archives

Workflow

Difficult RDBMS operations

“Select objects with V-K > 4.5…” (i.e. find ‘red’ objects).

Optical archive service: U, B, V, R

IR archive service: J, H, K

No standard way of combining DBs.

No standard way of storing results in RDBMS.
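A minimal sketch of why the V-K query is awkward: with the two bands held by separate services, the colour cut cannot be written as one SQL statement, so the client has to fetch both result sets and match them by hand. Two in-memory SQLite databases stand in for the archive services here; the table names, column names, magnitudes, and the shared `obj_id` key are all assumptions for illustration (real catalogues would usually need a positional cross-match instead).

```python
import sqlite3


def make_archive(table, band, rows):
    """Build a stand-in archive: one private database holding one band."""
    db = sqlite3.connect(":memory:")
    db.execute(f"CREATE TABLE {table} (obj_id INTEGER PRIMARY KEY, {band} REAL)")
    db.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
    return db


# Made-up magnitudes; object 3 has no IR match, object 4 has no optical match.
optical = make_archive("optical", "v_mag", [(1, 20.1), (2, 15.0), (3, 19.0)])
infrared = make_archive("infrared", "k_mag", [(1, 14.9), (2, 14.2), (4, 12.0)])

# Neither database can see the other's table, so pull everything to the client.
v_by_id = dict(optical.execute("SELECT obj_id, v_mag FROM optical"))
k_by_id = dict(infrared.execute("SELECT obj_id, k_mag FROM infrared"))

# Hand-written join and colour cut, outside any RDBMS.
red_ids = sorted(
    obj_id
    for obj_id in v_by_id.keys() & k_by_id.keys()
    if v_by_id[obj_id] - k_by_id[obj_id] > 4.5
)
print(red_ids)
```

The result ends up as a Python list on the client rather than as a table, which is the second gap the slide points at.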

Need for data warehouse

Join across internet (RDBMS to RDBMS)

Join inside warehouse DB (single RDBMS)

1000x speed gains
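The warehouse idea can be sketched as follows: once copies of both tables sit in one database, the join and the colour cut become a single SQL statement, and the result can be stored as a table for later steps. SQLite stands in for the warehouse RDBMS here; the schema, the data, and the shared `obj_id` key are illustrative assumptions, and the sketch does not model OGSA-DAI or make any claim about the size of the speed-up.

```python
import sqlite3

# One database plays the role of the warehouse.
warehouse = sqlite3.connect(":memory:")
warehouse.executescript("""
    CREATE TABLE optical  (obj_id INTEGER PRIMARY KEY, v_mag REAL);
    CREATE TABLE infrared (obj_id INTEGER PRIMARY KEY, k_mag REAL);
""")

# Made-up rows, standing in for tables copied from the two archives.
warehouse.executemany("INSERT INTO optical VALUES (?, ?)",
                      [(1, 20.1), (2, 15.0), (3, 19.0)])
warehouse.executemany("INSERT INTO infrared VALUES (?, ?)",
                      [(1, 14.9), (2, 14.2), (4, 12.0)])

# The join runs inside the database and the result is kept as a new table.
warehouse.execute("""
    CREATE TABLE red_objects AS
    SELECT o.obj_id, o.v_mag - i.k_mag AS v_minus_k
    FROM optical AS o
    JOIN infrared AS i ON i.obj_id = o.obj_id
    WHERE o.v_mag - i.k_mag > 4.5
""")

red = warehouse.execute(
    "SELECT obj_id FROM red_objects ORDER BY obj_id"
).fetchall()
print(red)
```

Because `red_objects` is an ordinary table, a later query (for example against a second-epoch catalogue) can join against it without the data leaving the database.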

GDW topology extends AstroGrid

Portal

File storage

Archive

Workflow

Registry

Grid-DB (OGSA-DAI)

Warehouse controller

Grid-DB (OGSA-DAI)

GDW people

Kona Andrews (Cambridge)

Elizabeth Auden (MSSL)

Martin Hill (Edinburgh)

Tony Linde (Leicester)

Clive Page (Leicester)

Guy Rixon (Cambridge)

Noel Winstanley (Jodrell Bank)

Current system

Portal

File storage

Archive

Workflow

Registry

Grid-DB (OGSA-DAI)

Warehouse controller

Grid-DB (OGSA-DAI)

Link not implemented yet

DB tables preloaded; read-only DB

Link temporarily redirected

Next system (3Q2004)

Portal

File storage

Archive

Workflow

Registry

Grid-DB (OGSA-DAI)

Warehouse controller

Grid-DB (OGSA-DAI)

Limited choice

Links implemented properly (GridFTP)

Two dedicated installations inside AstroGrid; multi-user

Ultimate system (2005+)

Portal

File storage

Archive

Workflow

Registry

Warehouse controller

Grid-DB (OGSA-DAI)

AstroGrid

UK e-Science grid / EGEE

One node per user; any storage node

Assessment

Basic idea is sound

Coding of GDW was quite simple

Very difficult to get it all integrated

Problems with OGSA-DAI: performance; data-size limits; can’t get higher functions to work yet

Proceed? Yes; need to experiment further

Still expect to get science out of it

Can one use it?

Beta testers invited

Wait for release of “Iteration 4.1” system (soon!)

Wait for release of “Iteration 5” system (3Q2004) to see GDW useful for science

AstroGrid final release is at the end of 2004

http://wiki.astrogrid.org/bin/view/Astrogrid/BetaTesting

That’s all folks!