TRANSCRIPT
Slide 1: The DataGrid WorkPackage 8
F. Carminati, FOCUS, 28 June 2001
Slide 2: Distributed computing

Basic principle: physicists must in principle have equal access to data and resources.

The system will be extremely complex:
- large number of sites, and many components at each site
- different tasks performed in parallel: simulation, reconstruction, scheduled and unscheduled analysis

Bad news: the basic tools are missing:
- distributed authentication & resource management
- distributed namespace for files and objects
- local resource management of large clusters
- data replication and caching
- WAN/LAN monitoring and logging

Good news: we are not alone. These issues are central to the developments pursued in the US and in Europe under the collective name of GRID.
Slide 3: The GRID

"Dependable, consistent, pervasive access to [high-end] resources"

- Dependable: can provide performance and functionality guarantees
- Consistent: uniform interfaces to a wide variety of resources
- Pervasive: ability to "plug in" from anywhere
Slide 4: GLOBUS hourglass

Focus on architecture issues:
- low participation cost, local control, and support for adaptation
- used to construct high-level, domain-specific solutions

A set of toolkit services:
- Security (GSI)
- Resource management (GRAM)
- Information services (MDS; see the query sketch below)
- Remote file management (GASS)
- Communication (I/O, Nexus)
- Process monitoring (HBM)
[Hourglass diagram: applications at the top, built on diverse global services; the core Globus services form the neck, resting on the local OS.]
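To make one of these toolkit services concrete: MDS is an LDAP-based information service, so it can be queried with the standard LDAP C API. The sketch below does exactly that; the host name, port 2135, and base DN are illustrative assumptions, not values from the talk.

```cpp
// Minimal sketch: querying a Globus MDS (LDAP-based) information server.
// Host, port and base DN are illustrative assumptions.
#define LDAP_DEPRECATED 1   // expose the classic ldap_init/ldap_search_s API
#include <ldap.h>
#include <cstdio>

int main() {
    // MDS servers are plain LDAP servers; 2135 is assumed as the MDS port.
    LDAP *ld = ldap_init("testbed.example.org", 2135);
    if (!ld) { std::perror("ldap_init"); return 1; }

    // Anonymous bind, as is typical for reading published information.
    if (ldap_simple_bind_s(ld, nullptr, nullptr) != LDAP_SUCCESS) return 1;

    // Ask for every entry under a hypothetical base DN.
    LDAPMessage *res = nullptr;
    if (ldap_search_s(ld, "o=Grid", LDAP_SCOPE_SUBTREE,
                      "(objectclass=*)", nullptr, 0, &res) != LDAP_SUCCESS)
        return 1;

    // Print the distinguished name of each entry found.
    for (LDAPMessage *e = ldap_first_entry(ld, res); e; e = ldap_next_entry(ld, e)) {
        char *dn = ldap_get_dn(ld, e);
        std::printf("%s\n", dn);
        ldap_memfree(dn);
    }
    ldap_msgfree(res);
    ldap_unbind(ld);
    return 0;
}
```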
Slide 5: DataGRID project

Huge project: 21 partners, 10 MEuro, 3 years, 12 workpackages.

Global design still evolving: an ATF (Architecture Task Force) has been mandated to design the architecture of the system, to be delivered at PM6 (= now!). Continuous feedback from users is needed.

Users are in three workpackages:
- HEP in WP8
- Earth Observation in WP9
- Biology in WP10
Slide 6: DataGRID project workpackages (funded/total effort)

WP1   Grid Workload Management                  11/18.6
WP2   Grid Data Management                      6/17.6
WP3   Grid Monitoring Services                  7/10
WP4   Fabric Management                         5.7/15.6
WP5   Mass Storage Management                   1/5.75
WP6   Integration Testbed                       6/27
WP7   Network Services                          0.5/9.4
WP8   High Energy Physics Applications          1.7/23.2
WP9   Earth Observation Science Applications    3/9.4
WP10  Biology Science Applications              1.5/6
WP11  Dissemination and Exploitation            1.2/1.7
WP12  Project Management                        2.5/2.5
Slide 7: WP8 mandate

Partners: CNRS, INFN, NIKHEF, PPARC, CERN.

- Coordinate exploitation of the testbed by the HEP experiments
  - installation kits have been developed by all experiments
- Coordinate interaction of the experiments with the other WPs and the ATF
  - monthly meetings of the WP8 TWG
  - common feedback to the WPs and to the ATF
- Identify common components that could be integrated in an HEP "upper middleware" layer

Large unfunded effort (776 m/m, ~22 people); very small funded effort (60 m/m, ~1.5 people).
Slide 8: The multi-layer hourglass

[Layer diagram, bottom to top: OS & network services; bag of services (GLOBUS), owned by the GLOBUS team; DataGRID middleware, alongside PPDG, GriPhyN and EuroGRID, designed by the DataGRID ATF; an HEP VO common application layer, with Earth Observation and Biology counterparts, coordinated by the WP8-9-10 TWG; and a specific application layer: ALICE, ATLAS, CMS, LHCb, WP9, WP10.]
Slide 9: ALICE site list & resources

Site          CPU (SI95)  Disk (GB)  Tape (TB)    Network (Mb/s)  People
Bari          150         100        -            12              2
Birmingham    800         1000       (RAL?)       ?               1
Bologna       140         100        -            34              2
Cagliari      200         400        -            4               2
Catania       1100        600        1.2          4               6
CERN          350         400        5 (CASTOR)   ?               2
Dubna         800         1500       -            2               3
GSI (D)       1150        1000       ? (robot)    34              2
IRB (Kr)      400         2000       -            2               5
Lyon          38000       1000       30 (HPSS)    155             2
Merida (MX)   35          40         -            0.002           1
NIKHEF (NL)   850         1000       ? (robot)    10              1+
OSU (US)      3400        100        130 (HPSS)   622             2
Padova        210         200        -            16              1
Saclay        ?           ?          -            ?               1
Torino        850         1200       -            12              6
UNAM (MX)     35          40         -            2               2
Slide 10: ALICE testbed preparation

- ALICE installation kit (M. Sitta)
- List of ALICE users (& certificates)
- Distributed production test (R. Barbera)
- HPSS/CASTOR storage (Y. Schutz)
- Logging/bookkeeping (Y. Schutz):
  - input taken from stdout & stderr
  - stored in MySQL/ORACLE (see the sketch below)
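A minimal sketch of that bookkeeping idea: read a job's output line by line from stdin and insert each line into a MySQL table through the MySQL C API. The host, credentials, and table schema are hypothetical; the slide only states that stdout/stderr are captured and stored in MySQL/ORACLE.

```cpp
// Bookkeeping sketch: pipe a job's stdout/stderr into this program,
// which stores each line in MySQL. Connection parameters and the
// "joblog" table are illustrative assumptions.
#include <mysql/mysql.h>
#include <iostream>
#include <string>

int main() {
    MYSQL *db = mysql_init(nullptr);
    if (!mysql_real_connect(db, "bookkeeping.example.org", "alice", "secret",
                            "production", 0, nullptr, nullptr)) {
        std::cerr << mysql_error(db) << '\n';
        return 1;
    }
    std::string line;
    while (std::getline(std::cin, line)) {
        // Escape the line so arbitrary job output cannot break the SQL.
        std::string esc(2 * line.size() + 1, '\0');
        unsigned long n =
            mysql_real_escape_string(db, &esc[0], line.c_str(), line.size());
        esc.resize(n);
        std::string sql = "INSERT INTO joblog (message) VALUES ('" + esc + "')";
        if (mysql_query(db, sql.c_str()))
            std::cerr << mysql_error(db) << '\n';
    }
    mysql_close(db);
    return 0;
}
```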
Integration in DataGRID WP6:
- 6/2001: list of ALICE users (& certificates) to WP6
- 7/2001: distributed test with the bookkeeping system
- 8/2001: distributed test with PROOF (reconstruction)
- 9/2001: test with DataGRID WP6 resources
- 12/2001: test of the M9 GRID services release (preliminary)
Slide 11: ALICE distributed analysis model
[Diagram: a local selection (parameters + procedure, Proc.C) is resolved against a TagDB/RDB catalogue; PROOF ships copies of Proc.C to the remote CPUs, each of which processes its local part of the distributed data (DB1-DB6), and only the results travel back.]

Bring the KB to the PB and not the PB to the KB.
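A minimal ROOT macro sketching this model: attach to a PROOF cluster and let it ship the selection procedure (Proc.C) to the CPUs holding the data. The cluster host, tree name, and file URLs are hypothetical, and the call pattern follows the PROOF interface of ROOT versions of that era.

```cpp
// runProc.C -- ROOT macro sketch of "bring the KB to the PB".
// Host, tree and file names are illustrative assumptions.
{
   // Chain together event files that live at the remote sites.
   TChain chain("ESD");
   chain.Add("root://site1.example.org//data/run01.root");
   chain.Add("root://site2.example.org//data/run02.root");

   // Open a session with the PROOF master; the slaves sit next to the data.
   gROOT->Proof("proofmaster.example.org");

   // Only Proc.C (the KB) travels to the data (the PB); each slave
   // processes its local files and the merged results come back.
   chain.Process("Proc.C");
}
```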
Slide 12: AliRoot - DataGRID services

- Reconstruction & analysis for the ALICE Physics Performance Report will be the short-term use case
- ALICE Data Catalog being developed: data selection on files, to become objects soon
- TGlobus class almost completed: interface to the GLOBUS security from ROOT
- TLDAP class being developed: interface to access IMS from ROOT
- Parallel socket classes (TPServer, TPServerSocket) and TFTP class, with the ROOT daemon modified accordingly: parallel data transfer (see the sketch below)
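A short sketch of the parallel-transfer side: ROOT's TFTP class opens several sockets to a ROOT daemon and moves files over all of them at once. The host, socket count, paths and method usage below are a sketch under the assumption that the era's TFTP interface (PutFile/GetFile/ChangeDirectory) applies.

```cpp
// pcopy.C -- ROOT macro sketch of parallel file transfer with TFTP.
// Host, directory and file names are illustrative assumptions.
{
   // Connect to a (modified) ROOT daemon using 8 parallel sockets.
   TFTP ftp("root://datastore.example.org", 8);

   ftp.ChangeDirectory("/data/alice");        // remote working directory
   ftp.PutFile("galice.root");                // upload a local file in parallel
   ftp.GetFile("galice.root", "copy.root");   // fetch it back under another name
}
```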
Slide 13: ATLAS Grid activities (1)

- ATLAS physicists are involved, and play relevant roles, in the EU and US grid projects started in the last year: EU DataGrid, US GriPhyN and PPDG; Japan is also active
- The Computing Model is based on Tier-1, Tier-2 and local Tier-3 centres connected using the grid
- Data Challenges are planned for testing and tuning of the grid and of the Model

Work done so far:
- collection of requirements and use cases
- first tests of the available grid services
- interaction with the different projects, aiming at cooperation and at least interoperability
Slide 14: ATLAS Grid activities (2)

- The ATLAS Grid Working Group (EU, US, Japan, etc.) meets regularly during the s/w weeks, and two ad hoc workshops have been held
- Jobs have been run between different ATLAS sites using Globus plus additions
- A s/w kit for ATLAS Geant3 simulation + reconstruction on any Linux 6.2 or 7.1 machine is available and installed at the INFN sites, CERN, Glasgow and Grenoble; inclusion of Athena-Atlfast is being studied
- Production-style jobs have been run between Milan, CERN and Rome; runs using the kit are planned for July, also including Glasgow, Grenoble and possibly Lund
Slide 15: ATLAS Grid activities (3)

- Grid tests also using Objectivity federations are planned for summer-autumn
- ATLAS is involved in the proposed EU DataTag and US iVDGL projects; transatlantic testbeds are among the deliverables
  - very important for testing EU-US interoperability and for closer cooperation between the grid projects
  - links are needed at the level of the projects: at the experiment level the links within ATLAS are excellent, but this is not enough
- Tests are already going on between the US and the EU, but a lot of details (authorization, support) remain to be sorted out; more CERN involvement is requested
Slide 16: CMS issues

- CMS is a "single" collaboration: coordination is required among DataGrid, GriPhyN, PPDG and also INFN-Grid
- "Production" (Data Challenges) and the CPT projects are the driving efforts
  - CPT = Computing & Core Software (CCS) + Physics Reconstruction & Selection (PRS) + Trigger and Data Acquisition Systems (TriDAS)
- Grid is only one of the CMS computing activities, so not all CMS sites and people are committed to Grid; therefore (only) some of the CMS sites will be involved in the first testbed activities
- The "grid" of sites naturally includes EU and US sites (plus other countries/continents)
Slide 17: Current CMS Grid activities

- Definition of the CMS application requirements on the Grid
- Planning and implementation of the first tests on "proto-Grid" tools:
  - use of GDMP in a "real production" distributed environment [done]
  - use of CA authorization in a distributed-sites scenario [tested]
  - use of some Condor distributed (remote) submission schemes [done]
  - evaluation of the PM9 deliverables of WP1 and WP2 (Grid Scheduler and enhanced Replica Management)
- Test dedicated resources to be provided by the participating sites
  - to provide a reasonably complex and powerful trial
  - to allow "quasi-real" environment testing for the applications
- CERN has a central role in these tests, and adequate resource support has to be provided in addition to the support needed for the experiments' Data Challenges
Slide 18: Current LHCb DataGrid activities

- Distributed MC production at CERN, RAL, Liverpool and Lyon; extending to NIKHEF, Bologna and Edinburgh/Glasgow by end 2001
- Current Testbed-0 tests use CERN and RAL (Globus problems encountered); will extend to other sites in the next few months
- Parts of the MC production system are used for the 'short-term use case'
- Active participation in WP8 from the UK, France, Italy, NIKHEF and CERN (15 people part-time); more dedicated effort is needed
Slide 19: LHCb short-term use case for GRID

1. Production is started by filling out a Web form:
   - version of the software
   - acceptance cuts
   - database
   - channel
   - number of events to generate
   - number of jobs to run
   - centre where the jobs should run
2. The Web form calls a Java servlet that:
   - creates a job script (one per job)
   - creates a cards file (one to three per job) with random-number seeds and job options (the cards files need to be accessible by the running job)
   - issues a job-submit command to run the script in batch -> WP1
3. The script does the following:
   - copies the executable, detector database and cards files
   - executes the executable, which creates the output dataset
   - copies the output to the local mass store -> WP5
   - copies the log file to a web-browsable area
   - calls a Java program (see 4)
4. The Java program calls a servlet at CERN to:
   - transfer the data back to CERN -> WP2
   - update the meta-database at CERN -> WP2

A sketch of the job-preparation logic of step 2 follows.
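The sketch below (in C++, for consistency with the other examples here, although the slide says the real implementation is a Java servlet) shows what step 2 amounts to: for each job, write a script and a cards file with distinct random-number seeds, then issue the batch submit command. All file names, paths, parameter values and the `bsub` submit command are illustrative assumptions.

```cpp
// Job-preparation sketch for step 2: one script and one cards file per
// job, with distinct random-number seeds. Names, paths, values and the
// submit command are illustrative assumptions.
#include <cstdlib>
#include <fstream>
#include <string>

int main() {
    const int nJobs   = 10;    // "number of jobs to run" from the Web form
    const int nEvents = 500;   // "number of events to generate" per job

    for (int job = 0; job < nJobs; ++job) {
        std::string cards  = "job" + std::to_string(job) + ".cards";
        std::string script = "job" + std::to_string(job) + ".sh";

        // Cards file: seeds and job options, readable by the running job.
        std::ofstream c(cards);
        c << "SEED1 "   << 1000 + 2 * job + 1 << "\n"
          << "SEED2 "   << 2000 + 2 * job + 1 << "\n"
          << "NEVENTS " << nEvents << "\n";

        // Job script: fetch inputs, run, ship outputs (steps 3-4 above).
        std::ofstream s(script);
        s << "#!/bin/sh\n"
          << "cp /experiment/sw/simul.exe .\n"   // hypothetical executable path
          << "./simul.exe " << cards << "\n"
          << "cp output.dst /mass/store/\n";     // -> WP5 above

        // Submit to the local batch system (command is an assumption).
        std::system(("bsub < " + script).c_str());
    }
    return 0;
}
```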
Slide 20: LHCb medium- to long-term planning

- Would like to work with the middleware WPs from now on, helping to test early prototype software with the use case (parts of the MC production system)
- More generally, would like to be involved in the development of the architecture for the long term
- Writing a planning document based on 'use scenarios'; will develop a library of use scenarios for testing DataGrid software
- Starting a major project interfacing the LHCb OO software framework GAUDI to GRID services; awaiting recruitment of 'new blood'
Slide 21: GAUDI and external services (some GRID-based)

[Architecture diagram of the GAUDI framework: the ApplicationManager steers the Algorithms. Algorithms see the transient event, detector and histogram stores through the Event Data, Detector Data and Histogram services; Persistency services and Converters connect these transient stores to external components: mass storage (via the OS), the event database, the PDG database and the DataSet DB. Further services: Message, JobOptions, Particle Properties, Event Selector, Monitoring, Job and Configuration services, a Histogram Presenter, and others; several of these would be GRID-based. An external analysis program accesses the same stores.]
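To anchor the diagram, here is a minimal GAUDI-style algorithm skeleton: concrete algorithms derive from the Algorithm base class, the ApplicationManager drives their initialize/execute/finalize transitions, and the services in the figure are reached through the base class. The class name, property and cut value are hypothetical; only the base-class interface follows GAUDI.

```cpp
// Minimal GAUDI-style algorithm skeleton. The ApplicationManager calls
// initialize()/execute()/finalize(); services from the diagram (Message
// service, Event Data service, ...) are obtained via the base class.
// Class and property names are hypothetical.
#include "GaudiKernel/Algorithm.h"
#include "GaudiKernel/MsgStream.h"

class SelectEvents : public Algorithm {
public:
    SelectEvents(const std::string& name, ISvcLocator* svcLoc)
        : Algorithm(name, svcLoc) {
        // Properties are set at run time through the JobOptions service.
        declareProperty("PtCut", m_ptCut = 1.0);
    }

    StatusCode initialize() override {
        MsgStream log(msgSvc(), name());   // Message service from the diagram
        log << MSG::INFO << "PtCut = " << m_ptCut << endreq;
        return StatusCode::SUCCESS;
    }

    StatusCode execute() override {
        // Event data would be read here through the Event Data service
        // (eventSvc()) and the transient event store.
        return StatusCode::SUCCESS;
    }

    StatusCode finalize() override { return StatusCode::SUCCESS; }

private:
    double m_ptCut;   // hypothetical selection cut, set via job options
};
```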
Slide 22: Issues and concerns

- After a very difficult start, the project now seems to be picking up speed; interaction with the WPs and the ATF was difficult to establish
- The experiments are very active: a large unfunded-effort activity is ongoing, approximately 5-6 people per experiment
- Funded effort is still very slow in coming (2 out of 5!)
- The CERN testbed is still insufficiently staffed & equipped
- DataGrid should have a central role in the LHC computing project; this is not the case now
- The GRID testbed and the LHC testbed should merge soon
- Interoperability with the US Grids is very important: we have to keep talking with them and open the testbed to US colleagues