GridPP Collaboration Meeting, 5th November 2001: Non-LHC and Non-US-Collider Experiments’ Requirements. Dan Tovey, University of Sheffield


GridPP Collaboration Meeting, 5th November 2001

Non-LHC and Non-US-Collider Experiments’ Requirements

Dan Tovey, University of Sheffield


‘Other’ Experiments

Representing non-LHC and non-US-collider experiments. Includes ANTARES, MINOS and UKDMC.

In general such experiments have few resources to devote exclusively to Grid activities (although much effort is targeted at e-Science-related issues, e.g. analysis code development).

At present, analysis of data is predominantly carried out locally or at central facilities; there is no requirement as yet to move to large-scale distributed data processing.

That said, keen interest exists in testing and making use of Grid tools if they will improve data handling within existing analysis frameworks.

The situation is likely to change in the next few years, given larger data rates and mass uptake of Grid technology by central facilities.


Application 1: Transfer of Data Between Mass Storage Facilities

Experiments in general need to transfer large volumes of data quickly and conveniently between physically separated sites.

Sites may or may not possess high-speed network connections (e.g. RAL and Boulby Mine).

In the former case a grid-based transfer protocol may be appropriate; in the latter, data needs to be transferred by some means other than the network.

This problem is common to many HEP experiments and mirrors that faced by the LHC experiments.

It is hoped that common solutions can be found (e.g. using EDG testbed components such as GridFTP and WP5 protocols).


Example Use Case (Data Discovery and Transfer)

1. Large subsets of data held in data storage facilities at collaborating institutes.

2. User wishes to transfer large subsets of this data (not necessarily co-located) to CPU location (datastore) prior to analysis.

3. User logs onto local machine.

4. User accesses collaboration-wide web-page providing front-end to generic data discovery and transfer tool.

5. User logs onto site (password required; automatic authentication is not required at this stage).

6. Software presents query form to user.

7. User specifies datasets of interest by ‘run’ properties (e.g. ‘I wish to download all calibration data taken between 01/06/02 and 01/07/02 by detector XXX’). Specification by run number or (if necessary) file name also possible.

8. Software accesses collaboration metadata catalogue to match query to file names. Metadata catalogue probably updated manually in the first instance as part of data book-keeping process. Entries (in plain-text format) give e.g. run type, run number, start time, end time for each file.
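Steps 7 and 8 (matching a query on run properties to file names) could look roughly like the sketch below. It is an illustration under stated assumptions, not the tool's design: the slides say only that catalogue entries are plain text and give run type, run number, start time and end time for each file. The whitespace-separated layout, the ISO timestamps, the file names, the reading of 01/06/02 as 1 June 2002, and the names `Entry`, `parse_catalogue` and `match_runs` are all hypothetical.

```python
from datetime import datetime
from typing import NamedTuple


class Entry(NamedTuple):
    filename: str
    run_type: str
    run_number: int
    start: datetime
    end: datetime


# Hypothetical plain-text catalogue, one file per line:
# filename  run_type  run_number  start_time  end_time
CATALOGUE = """\
run0101.dat calibration 101 2002-06-03T10:00 2002-06-03T12:00
run0102.dat physics     102 2002-06-03T12:30 2002-06-04T08:00
run0103.dat calibration 103 2002-06-20T09:00 2002-06-20T11:00
run0104.dat calibration 104 2002-07-05T09:00 2002-07-05T10:00
"""


def parse_catalogue(text):
    """Turn the plain-text catalogue into a list of Entry records."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, run_type, number, start, end = line.split()
        entries.append(Entry(name, run_type, int(number),
                             datetime.fromisoformat(start),
                             datetime.fromisoformat(end)))
    return entries


def match_runs(entries, run_type, after, before):
    """File names of runs of the given type lying wholly within [after, before]."""
    return [e.filename for e in entries
            if e.run_type == run_type and e.start >= after and e.end <= before]


entries = parse_catalogue(CATALOGUE)
# "All calibration data taken between 01/06/02 and 01/07/02"
files = match_runs(entries, "calibration",
                   datetime(2002, 6, 1), datetime(2002, 7, 1))
print(files)
```

The resulting list of file names is what the later replica-catalogue lookup would take as input.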


Example Use Case (Data Discovery and Transfer), continued

9. Software queries replica catalogue to discover location of required files.

10. Software starts up transfer protocol (e.g. GridFTP).

11. Software initiates FTP-like connection between source site(s) and destination site (not necessarily local to user). Source and destination sites must be members of list of ‘approved’ collaboration datastores (i.e. not possible to transfer data to arbitrary location, for security reasons).

12. Software ‘gets’ files efficiently, reliably and securely from source(s) to destination.

13. Software notifies user of status of transfer via front-end (e.g. total data volume, total volume transferred, volume remaining, estimated time required, time taken, estimated time remaining, current mean transfer rate).

14. Software notifies user if faults occur: keeps trying until time-out, then returns to user with meaningful error message (i.e. suspected reason for error) if still failing. Must permit automatic partial transfer if faults only occur for certain files / locations (i.e. fully transferred files remain, partially transferred files deleted).

15. Software updates replica catalogue and transfer log file.

16. Software notifies user when transfer complete.
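The fault-handling behaviour asked for in steps 11, 12 and 14 (approved sites only, retry until time-out, keep complete files, delete partial ones) can be sketched as below. This is a minimal sketch under assumptions, not a GridFTP client: the actual transfer is an injected stand-in function, and the site names, `APPROVED_SITES`, `TransferError`, `transfer_all`, `fake_copy` and the time-out values are all hypothetical.

```python
import time

# Hypothetical list of 'approved' collaboration datastores.
APPROVED_SITES = {"RAL", "Sheffield", "Manchester"}


class TransferError(Exception):
    """Raised by the transfer stand-in when a copy fails."""


def transfer_all(replicas, destination, copy_file, delete_partial,
                 timeout_s=60.0, retry_wait_s=1.0):
    """Copy each file to destination, retrying each one until its time-out.

    replicas: {filename: source_site}, as a replica-catalogue lookup might return.
    copy_file(name, source, destination): stand-in for the real transfer call.
    delete_partial(name, destination): removes a partially transferred file.
    Returns (done, failed): transferred file names, and {filename: reason}.
    """
    if destination not in APPROVED_SITES:
        raise ValueError(f"{destination} is not an approved datastore")
    done, failed = [], {}
    for name, source in replicas.items():
        if source not in APPROVED_SITES:
            failed[name] = f"source {source} is not an approved datastore"
            continue
        deadline = time.monotonic() + timeout_s
        while True:
            try:
                copy_file(name, source, destination)
            except TransferError as err:
                delete_partial(name, destination)  # never leave partial files
                if time.monotonic() >= deadline:
                    failed[name] = f"timed out; last error: {err}"
                    break
                time.sleep(retry_wait_s)
            else:
                done.append(name)  # fully transferred files remain
                break
    return done, failed


# Demonstration with a fake transfer: one transient and one persistent fault.
attempts = {}


def fake_copy(name, source, destination):
    attempts[name] = attempts.get(name, 0) + 1
    if name == "run0103.dat" and attempts[name] == 1:
        raise TransferError("connection reset")
    if name == "run0104.dat":
        raise TransferError("file unreadable at source")


deleted = []
replicas = {"run0101.dat": "RAL", "run0103.dat": "Manchester",
            "run0104.dat": "RAL"}
done, failed = transfer_all(replicas, "Sheffield", fake_copy,
                            lambda name, dest: deleted.append(name),
                            timeout_s=0.05, retry_wait_s=0.01)
print("transferred:", done)
print("failed:", failed)
```

Passing the transfer call in as a parameter keeps the retry and clean-up logic independent of whichever protocol is eventually chosen. A real tool would also update the replica catalogue and transfer log (step 15) from `done`.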


Data Transfer Requirements

1. MSM software should be supplied that is capable of transferring certain specified datasets, but not others, onto specific physical tapes. Specification must be possible on the basis of file metadata as well as physical filename.

2. MSM software should decide which tapes are most suitable for this purpose on the basis of the time taken to prepare tapes and/or the total number of tapes required.

3. A common translation module for file metadata should be provided, so that the content, format and status of given transfer tapes can be assessed automatically by any specified system (there may be more than one) and used to position to and read files or segments of files from those tapes.

4. A simple-to-use FTP-like protocol with a web-based front-end is needed, suitable for reliable, transparent, efficient and secure transfer of large datasets between multiple specified collaboration sites. The software must discover names and locations of files from specified run metadata using metadata and replica catalogues.
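Requirement 2 leaves open how the software would choose tapes. One simple way to keep the total number of tapes low is a first-fit-decreasing packing, sketched below. The slides do not specify this heuristic, and it does not guarantee the minimum number of tapes. The function name `pack_tapes`, the file names, the sizes and the tape capacity are all hypothetical, and tape preparation time is ignored.

```python
def pack_tapes(file_sizes, tape_capacity):
    """Assign files to tapes by first-fit decreasing.

    file_sizes: {filename: size}, in the same units as tape_capacity.
    Returns a list of tapes, each a list of file names.
    """
    tapes = []  # file names on each tape
    free = []   # remaining capacity of each tape
    # Place the largest files first; each goes on the first tape with room.
    for name, size in sorted(file_sizes.items(),
                             key=lambda item: item[1], reverse=True):
        if size > tape_capacity:
            raise ValueError(f"{name} does not fit on a single tape")
        for i, space in enumerate(free):
            if size <= space:
                tapes[i].append(name)
                free[i] -= size
                break
        else:
            # No existing tape has room: start a new one.
            tapes.append([name])
            free.append(tape_capacity - size)
    return tapes


# Hypothetical selection of files (sizes in GB) and a 50 GB tape.
sizes = {"a.dat": 30, "b.dat": 25, "c.dat": 20, "d.dat": 15, "e.dat": 10}
tapes = pack_tapes(sizes, tape_capacity=50)
for number, contents in enumerate(tapes, start=1):
    print(f"tape {number}: {contents}")
```

The files passed in would be those selected by metadata under requirement 1, so the packing step stays separate from the selection step.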


Collaborative Project: Generic Data Discovery and Transfer Tool

Last requirement (generic data discovery and transfer tool) is common to several experiments (ANTARES, UKDMC and MINOS). Therefore hoped that a common solution can be found.

Generic nature of the requirement suggests that solution will also be of interest to other groups, in particular UKQCD.

'Mainstream' experiments (e.g. BaBar and LHC collaborations) have similar data transfer requirements, so the tool may be of further interest here.

Have therefore proposed a collaborative project between several experiments, including ANTARES, UKDMC and MINOS and possibly also UKQCD, BaBar and others.

The project will deliver, on a 1-2 year timescale, a fully functioning web-based data discovery and transfer tool providing an automated interface to appropriate grid applications (metadata and replica cataloguing and file transfer services).


Application 2: Remote Control of Underground Experiments

Novel application distinct from others proposed within GridPP.

UKDMC and MINOS identified need for remote access to and control of underground experiments.

Involves remote configuring, monitoring and debugging of DAQ code (possibly also remote high-level trigger for low background experiments).

Methodology is similar to that suggested for a Global Accelerator Network for running the next generation of colliders.

There may also be commonality with grid based remote control applications specified by AstroGrid and other collaborations.


Application 3: Fast Access to Remote Data Sets

Simple grid-like application identified by MINOS.

Would like to perform interactive ROOT analyses in UK on selected data sets held at location(s) in US.

Would involve accessing and merging data from remotely held files => already possible in PAW (Manchester), ROOT also?

Also desire to perform batch reduction in UK on remotely held files, possibly using grid-open type command => AFS alternative?


Summary

The 'Other' Experiments are keen to make full use ASAP of tools provided by GridPP and other initiatives in order to simplify existing analysis procedures.

Interested in developing full grid-based analyses in longer term (> 2 years).

We want to learn to walk (before we can run)!