A PPARC funded project
The Grid Data Warehouse
Description of prototype work in progress by AstroGrid.
Access-Grid lecture to Universities of Leeds and Sheffield by Guy Rixon on 2004-02-04
04-02-2004 GDW description: access-grid lecture 2
AstroGrid: the UK Virtual Observatory
Seven UK astronomy departments collaborating to build a Virtual Observatory (VO) for the use of the entire astronomical community.
IVOA: the community of VO projects
Purpose of the virtual observatory
To combine data from all sources into a data grid.
Data grid
Private files
Archives
Live feeds
Bibliographies
Data sets can be images (mainly in files) or tabular (mainly in RDBMS).
Example of VO use
“Find brown dwarf candidates: combine optical (e.g. APM catalogue) and IR (e.g. 2MASS) data to select by colour. Combine multi-epoch data to determine proper motions; select high-PM fraction of colour-selected sample. Then use that sample to…”
Optical archive
IR archive
2nd epoch
Colour sample
Refined sample
3rd epoch
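As a rough illustration of the workflow in the example above, the following Python sketch selects a sample by colour and then refines it by proper motion. Everything in it is an assumption made for illustration: the toy catalogues, the shared object IDs (a real VO would cross-match by sky position), the V-K cut of 4.5 and the 0.1 arcsec/yr proper-motion threshold.

```python
import math

# Hypothetical toy catalogues keyed by a shared object ID.
optical = {"a": {"V": 20.0}, "b": {"V": 15.0}, "c": {"V": 19.5}}
infrared = {"a": {"K": 14.0}, "b": {"K": 14.5}, "c": {"K": 14.0}}

# Hypothetical positions: (RA in degrees, Dec in degrees, epoch in decimal years).
first_epoch = {"a": (10.0, 20.0, 1990.0), "c": (11.0, 21.0, 1990.0)}
second_epoch = {"a": (10.0, 20.001, 2000.0), "c": (11.0, 21.00001, 2000.0)}


def colour_sample(optical, infrared, min_v_minus_k):
    """Return IDs present in both catalogues whose V-K colour exceeds the cut."""
    common = optical.keys() & infrared.keys()
    return sorted(
        obj for obj in common
        if optical[obj]["V"] - infrared[obj]["K"] > min_v_minus_k
    )


def proper_motion(p1, p2):
    """Total proper motion in arcsec/yr, using a small-angle approximation."""
    ra1, dec1, t1 = p1
    ra2, dec2, t2 = p2
    # Scale the RA offset by cos(Dec) so both offsets are true angular distances.
    d_ra = (ra2 - ra1) * math.cos(math.radians((dec1 + dec2) / 2))
    d_dec = dec2 - dec1
    return math.hypot(d_ra, d_dec) * 3600.0 / (t2 - t1)


def refine(sample, epoch_a, epoch_b, min_pm):
    """Keep only objects seen at both epochs that move faster than min_pm."""
    return [
        obj for obj in sample
        if obj in epoch_a and obj in epoch_b
        and proper_motion(epoch_a[obj], epoch_b[obj]) > min_pm
    ]


colour_selected = colour_sample(optical, infrared, 4.5)
refined = refine(colour_selected, first_epoch, second_epoch, 0.1)
print(colour_selected, refined)
```

Each stage consumes the output of the previous one, which is why the intermediate samples need to live somewhere machine-readable rather than in a browser window.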
VO as collection of web sites: no good
Each site has different query protocol
Results only go to the browser, not to an RDBMS or on to reprocessing
Results in HTML etc. are not machine-readable
Basic web sites are not sufficient for the VO.
Grid metaphor: electricity supply
Loadsa complex equipment
Simple delivery to consumer
Get your power from any supplier: commodity
Commodities in astronomy data grid
Common s/w on desktop
Algorithms
Archives
Writeable Storage
Registry of resources
(Processors)
Bulk data transport; machine-readable results; combined inside grid
Metadata transport
AstroGrid topology
Portal
Registry
Algorithms
Writeable storage
Archives
Workflow
Difficult RDBMS operations
“Select objects with V-K > 4.5…” (i.e. find ‘red’ objects).
Optical archive service: U, B, V, R
IR archive service: J, H, K
No standard way of combining DBs.
No standard way of storing results in RDBMS.
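To make the difficulty concrete, here is a minimal Python sketch in which two independent in-memory SQLite databases stand in for the optical and IR archive services. The table names, column names and magnitudes are hypothetical, and SQLite is only a stand-in for whatever RDBMS each archive actually runs. Neither service holds both V and K, so neither can evaluate the V-K cut itself; the client has to pull both columns in full and combine them by hand.

```python
import sqlite3

# Stand-in for the optical archive service (hypothetical schema and data).
optical_db = sqlite3.connect(":memory:")
optical_db.execute("CREATE TABLE optical (obj_id TEXT PRIMARY KEY, v REAL)")
optical_db.executemany(
    "INSERT INTO optical VALUES (?, ?)",
    [("a", 20.0), ("b", 15.0), ("c", 19.5)],
)

# Stand-in for the IR archive service: a completely separate database.
ir_db = sqlite3.connect(":memory:")
ir_db.execute("CREATE TABLE infrared (obj_id TEXT PRIMARY KEY, k REAL)")
ir_db.executemany(
    "INSERT INTO infrared VALUES (?, ?)",
    [("a", 14.0), ("b", 14.5), ("c", 14.0)],
)

# The predicate V-K > 4.5 spans both databases, so every row has to be
# shipped to the client before a single object can be rejected.
v_mag = dict(optical_db.execute("SELECT obj_id, v FROM optical"))
k_mag = dict(ir_db.execute("SELECT obj_id, k FROM infrared"))

red_objects = sorted(
    obj for obj in v_mag.keys() & k_mag.keys()
    if v_mag[obj] - k_mag[obj] > 4.5
)
print(red_objects)
```

With catalogues of realistic size, transferring every row to the client is the expensive part, and the result still ends up outside any database.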
Need for data warehouse
Join across internet
Join inside warehouse DB
1000x speed gains
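The warehouse idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the GDW implementation: a single in-memory SQLite database plays the warehouse, the schema and data are hypothetical, and the bulk load is a plain insert rather than a grid transfer. Once copies of both tables sit in one database, the join and the colour cut run inside the database engine, and the result is kept as a table for later steps.

```python
import sqlite3

# One database plays the role of the warehouse (hypothetical schema).
warehouse = sqlite3.connect(":memory:")
warehouse.executescript(
    """
    CREATE TABLE optical (obj_id TEXT PRIMARY KEY, v REAL);
    CREATE TABLE infrared (obj_id TEXT PRIMARY KEY, k REAL);
    """
)

# Bulk-load copies of the archive tables into the warehouse.
warehouse.executemany(
    "INSERT INTO optical VALUES (?, ?)",
    [("a", 20.0), ("b", 15.0), ("c", 19.5)],
)
warehouse.executemany(
    "INSERT INTO infrared VALUES (?, ?)",
    [("a", 14.0), ("b", 14.5), ("c", 14.0)],
)

# Join inside the warehouse and store the result as a new table.
warehouse.execute(
    """
    CREATE TABLE red_objects AS
    SELECT o.obj_id, o.v - i.k AS v_minus_k
    FROM optical AS o
    JOIN infrared AS i ON i.obj_id = o.obj_id
    WHERE o.v - i.k > 4.5
    """
)

rows = warehouse.execute(
    "SELECT obj_id, v_minus_k FROM red_objects ORDER BY obj_id"
).fetchall()
red_ids = [obj_id for obj_id, _ in rows]
print(red_ids)
```

Because the result is itself a table, a follow-up query (for example a proper-motion cut) can use it directly without another round trip to the client.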
GDW topology extends AstroGrid
Portal
File storage Archive
Workflow
Registry
Grid-DB (OGSA-DAI)
Warehouse controller
Grid-DB (OGSA-DAI)
GDW people
Kona Andrews (Cambridge)
Elizabeth Auden (MSSL)
Martin Hill (Edinburgh)
Tony Linde (Leicester)
Clive Page (Leicester)
Guy Rixon (Cambridge)
Noel Winstanley (Jodrell Bank)
Current system
Portal
File storage Archive
Workflow
Registry
Grid-DB (OGSA-DAI)
Warehouse controller
Grid-DB (OGSA-DAI)
Link not implemented yet
DB tables preloaded; read-only DB
Link temporarily redirected
Next system (3Q2004)
Portal
File storage Archive
Workflow
Registry
Grid-DB (OGSA-DAI)
Warehouse controller
Grid-DB (OGSA-DAI)
Limited choice
Links implemented properly (GridFTP)
Two dedicated installations inside AstroGrid; multi-user
Ultimate system (2005+)
Portal
File storage Archive
Workflow
Registry
Warehouse controller
Grid-DB (OGSA-DAI)
AstroGrid
UK e-Science grid / EGEE
One node per user; any storage node
Assessment
Basic idea is sound
Coding of GDW was quite simple
Very difficult to get it all integrated
Problems with OGSA-DAI:
Performance
Data-size limits
Can’t get higher functions to work yet
Proceed?
Yes; need to experiment further
Still expect to get science out of it
Can one use it?
Beta testers invited
Wait for release of “Iteration 4.1” system (soon!)
Wait for release of “Iteration 5” system (3Q2004) to see GDW useful for science
AstroGrid final release is at the end of 2004
http://wiki.astrogrid.org/bin/view/Astrogrid/BetaTesting
That’s all folks!