The Quantum Chromodynamics Grid
James Perry, Andrew Jackson, Matthew Egbert, Stephen Booth, Lorna Smith
EPCC, The University Of Edinburgh
Overview
Overview
The data grid
The metadata catalogue and browser
Conclusions
References
Overview
Aim
– To implement a 'QCDGrid' to become a production environment for UKQCD, a collaboration of UK scientists carrying out Quantum Chromodynamics (QCD) simulations
The Grid
– This multi-terabyte storage system will support distributed data management across four UK sites: Edinburgh, Glasgow, Liverpool and Swansea
Funding
– QCDGrid is part of the GridPP project, a PPARC-funded initiative
Why build a QCD Grid?
QCD currently generates terabytes to petabytes of data
– Especially once UKQCD's purpose-built HPC system, QCDOC, comes online
– Post-processing is highly diverse and distributed
– Involves multinational collaborations
The challenge is to store and access this data
– A secure, reliable and expandable distributed storage system is required
Initially, the QCDGrid project aims to address this issue
– Develop a multi-terabyte storage system, supporting distributed data management across different UK sites
The QCDGrid
Stage 1: Implement a multi-site data storage Grid
– Globus toolkit for basic grid operations, e.g. data transfer, security
– Globus replica catalogue to maintain a directory of files on the Grid
– Intend to use EDG software in the future, e.g. for file replication
Stage 2: Develop structured data which describes the characteristics of the raw data (metadata)
– Develop an XML schema for lattice QCD calculations
– Implement a metadata catalogue
– Develop a metadata catalogue browser
The QCDGrid Structure
Basic DataGrid Requirements
The data grid must distribute data across the four sites
Robustly
– Each file must be replicated at two or more sites
Efficiently
– Where possible, files should be stored close to where they are needed most often
Transparently
– End users should not need to be concerned with how the data grid is implemented
DataGrid Implementation
Hardware
– Storage elements are PCs
– Data stored in RAID arrays, which are cheap and offer built-in redundancy
Software
– Red Hat Linux 7.2 OS
– Globus Toolkit 2.0 used for low-level grid services
– European DataGrid software intended to be used in the next phase for data replication/job submission
– Custom-written QCDGrid software builds on Globus to implement QCDGrid client tools and control thread
Data Grid Structure
Simple Use Case – Adding a File
The user issues a ‘put’ command
The software chooses a suitable storage element and copies the file to its ‘NEW’ directory
On its next scan, the control thread finds the new file and moves it to its actual home, registering it with the replica catalogue
On its next scan, the control thread finds there is only one copy of the file and makes another one at a suitable site, registering it with the replica catalogue
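The 'put' sequence above can be sketched in a few lines of Python. This is only an illustration of the two-scan behaviour, not the QCDGrid implementation: the real system uses the Globus replica catalogue and GridFTP transfers, whereas here the catalogue is a plain dictionary and "copying" just updates in-memory sets. The names StorageElement, put, scan and MIN_REPLICAS are hypothetical.

```python
from dataclasses import dataclass, field

MIN_REPLICAS = 2  # each file must be held at two or more sites


@dataclass
class StorageElement:
    name: str
    site: str
    new_files: set = field(default_factory=set)  # stands in for the 'NEW' directory
    files: set = field(default_factory=set)      # files in their actual home


def put(filename, elements):
    """Client side: place the file in the 'NEW' directory of a storage element."""
    target = elements[0]  # assumption: a real policy for "suitable" would go here
    target.new_files.add(filename)
    return target


def scan(elements, catalogue):
    """One pass of the control thread over all storage elements."""
    by_name = {e.name: e for e in elements}

    # Files registered on an earlier scan that still have too few copies
    # get one more replica at a site that does not hold the file yet.
    for filename, holders in catalogue.items():
        if len(holders) >= MIN_REPLICAS:
            continue
        used_sites = {by_name[h].site for h in holders}
        for candidate in elements:
            if candidate.site not in used_sites:
                candidate.files.add(filename)
                holders.append(candidate.name)
                break

    # Newly arrived files are moved to their home and registered.
    for element in elements:
        for filename in sorted(element.new_files):
            element.files.add(filename)
            catalogue.setdefault(filename, []).append(element.name)
        element.new_files.clear()
```

With one storage element in Edinburgh and one in Liverpool, a 'put' followed by one call to scan registers a single copy. A second call to scan creates and registers the replica at the other site.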
Simple Use Case – Getting a File
The user issues a ‘get’ command on a client machine
The software looks up the replica catalogue to find the nearest copy of the file
The file is transferred from that copy
If the transfer fails, the software looks up the replica catalogue again to find the next nearest copy, and tries to transfer that instead
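The retry behaviour of 'get' can be sketched as follows. This is a minimal sketch under stated assumptions: lookup, distance and transfer are hypothetical callables supplied by the caller (standing in for the replica catalogue query, some measure of network closeness, and the actual file transfer), and a failed transfer is assumed to raise OSError.

```python
def get(filename, lookup, distance, transfer):
    """Fetch a file from the nearest working replica and return that location."""
    tried = set()
    while True:
        # Consult the catalogue again on every attempt, skipping copies
        # that have already failed.
        replicas = [site for site in lookup(filename) if site not in tried]
        if not replicas:
            raise FileNotFoundError(f"no reachable replica of {filename}")
        site = min(replicas, key=distance)
        tried.add(site)
        try:
            transfer(site, filename)
            return site
        except OSError:
            continue  # fall back to the next nearest copy
```

Re-querying the catalogue on each attempt means a replica created while the client was retrying can still be picked up.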
Fault Tolerance
Probably the most important requirement of QCDGrid
Central control thread
– Constantly monitoring nodes to make sure they are still working
Node fails without warning
– E-mail sent to the system administrator
– Control thread begins to replicate the files that were on the node elsewhere
Nodes can be temporarily disabled if they have to be shut down or rebooted
– Prevents the grid moving data around unnecessarily
A secondary node is constantly monitoring the central node
– Backing up the replica catalogue and configuration files
– Grid can still be accessed (albeit read-only) if the central node goes down
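The difference between an unexpected failure and a planned shutdown can be illustrated with a small planning function. This is a sketch only: plan_recovery and its parameters are hypothetical, the catalogue is a plain dictionary mapping file names to the nodes holding them, and the e-mail notification and site-placement policy are left out.

```python
def plan_recovery(catalogue, nodes, down, disabled=frozenset(), min_replicas=2):
    """Return the (filename, source, target) copies needed after `down` stops responding."""
    if down in disabled:
        return []  # planned outage: leave the data where it is

    plan = []
    for filename, holders in sorted(catalogue.items()):
        if down not in holders:
            continue
        survivors = [n for n in holders if n != down]
        if not survivors or len(survivors) >= min_replicas:
            continue  # nothing to copy from, or still enough copies
        targets = [n for n in nodes
                   if n not in holders and n != down and n not in disabled]
        if targets:
            plan.append((filename, survivors[0], targets[0]))
    return plan
```

For example, with files a.dat on n1 and n2, and b.dat on n2 and n3, the loss of n2 yields one copy operation per file, while the same call with n2 listed as disabled yields none.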
Current Progress
Data grid software has been implemented and is undergoing testing
A four-node test grid has been set up across two of the sites (Edinburgh and Liverpool)
A web-based status monitor exists, allowing users to check the state of the data grid
Metadata
Storing metadata which describes the actual data
– This allows users to see what is on the grid and find what they want more easily
Data described by XML metadata files
– A schema is being developed for the QCD metadata
The XML files are stored centrally in an XML database, the QCDGrid metadata catalogue
– Using Apache Xindice
The XML files will also be submitted to the data grid itself
– Ensures there is a backup copy of the metadata
– Metadata catalogue can be reconstructed from the data grid in the event that it is lost
Implementation of Metadata
Data submitted to the grid must be accompanied by a valid metadata file
This can be enforced by checking it against the schema
A submission tool (graphical or command line) takes care of sending the data and metadata to the right places
The Xindice XML database is accessed as a grid service
The API for this is being developed by the OGSA-DAI project
A graphical metadata browser will allow easy access to data stored on the grid, based on meaningful characteristics
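The idea of rejecting a submission whose metadata file is not valid can be sketched with the Python standard library. Note the assumptions: the standard library has no XML Schema validator, so this stand-in only checks that the file is well-formed and that a few required elements are present. The element names are hypothetical and are not taken from the QCDGrid schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names, for illustration only.
REQUIRED = ("collaboration", "lattice_size", "data_file")


def check_metadata(xml_text):
    """Return a list of problems; an empty list means the submission may proceed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed XML: {err}"]
    # Report every required child element that is absent from the root.
    return [f"missing <{name}>" for name in REQUIRED if root.find(name) is None]
```

A submission tool could call such a check before sending anything to the grid, and refuse the upload if the list of problems is not empty. Full validation against the schema would need a schema-aware XML library.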
Current Progress
XML schema development is well advanced
– Prototype available
Metadata browser applet exists
– May require modification due to changes in APIs used
Metadata catalogue
– The OGSA-DAI project is providing grid service software to QCDGrid
Conclusions
Aim
– To implement a 'QCDGrid' to become a production environment for UKQCD
Developed a prototype distributed data grid
– Adding ‘real’ data to the grid this month
Developed a prototype XML schema and browser
Utilising the OGSA-DAI grid service software for the XML metadata catalogue
References
QCDGrid
– Software mailing list: [email protected]
– Project information e-mail: [email protected]
– Or see: http://www.epcc.ed.ac.uk/computing/research_activities/grid/qcdgrid/
– Example schema, see: http://www.ph.ed.ac.uk/ukqcd/community/the_grid/xml_schema/xml_schema.html
GridPP
– http://www.gridpp.ac.uk