Building a distributed software environment for
CDF within the ESLEA framework
V. Bartsch, M. Lancaster
University College London
CDF experiment
located at Fermilab, close to Chicago
proton/anti-proton collisions at the Tevatron at a centre-of-mass energy of 1.96 TeV
CDF
multipurpose detector with discovery potential for the Higgs, studies of b physics and measurement of standard model parameters
luminosity of about 1 fb⁻¹ per year
Principle of data analysis
data rate: 40 MB/s, 2 TB/day
raw data → reco data: assign particle momentum, tracks etc.
reco data → user data: user selection
MC (Monte Carlo): simulation of the events
user analysis
analysis performed by ~800 physicists in ~60 institutes
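As a rough cross-check of the two rates quoted on this slide, the sketch below converts a sustained 40 MB/s into a daily volume and works out what fraction of that 2 TB/day would be. Decimal units (1 TB = 10^6 MB) and the reading of 2 TB/day as a daily average are assumptions, not statements from the slide.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_tb(rate_mb_per_s: float) -> float:
    """Volume written in one day at a sustained rate, in decimal TB."""
    return rate_mb_per_s * SECONDS_PER_DAY / 1e6


# Volume if 40 MB/s were sustained for a full day (assumption: no downtime).
peak_tb_per_day = daily_volume_tb(40.0)
# Fraction of that volume represented by the quoted 2 TB/day.
duty_fraction = 2.0 / peak_tb_per_day

print(f"{peak_tb_per_day:.2f} TB/day at a sustained 40 MB/s")
print(f"2 TB/day is about {duty_fraction:.0%} of that")
```

Under these assumptions the two figures are compatible if data is written for a bit more than half of the day.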
CDF – data handling requirements
The experiment has ~800 physicists, of whom ~50 are in the UK.
The experiment produces large amounts of data, which are stored in the US:
• ~1000 TB per year
• ~2000 TB of data stored to date, expected to rise to 10,000 TB by 2008
UK physicists need to be able to:
• copy datasets (~0.5-10 TB) quickly to the UK
• create MC data within the UK for other UK physicists and other CDF physicists worldwide
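To give a feeling for what "quickly" means for datasets of 0.5-10 TB, here is a minimal sketch of the transfer-time arithmetic. The link speeds (100 Mbit/s and 1 Gbit/s) and the `efficiency` parameter are illustrative assumptions; the slides do not quote a link speed.

```python
def transfer_hours(dataset_tb: float, link_gbit_per_s: float,
                   efficiency: float = 1.0) -> float:
    """Hours needed to move dataset_tb (decimal TB) over a link of the given speed."""
    if link_gbit_per_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    bits = dataset_tb * 1e12 * 8
    seconds = bits / (link_gbit_per_s * 1e9 * efficiency)
    return seconds / 3600


for size_tb in (0.5, 10.0):          # dataset sizes quoted on the slide
    for link in (0.1, 1.0):          # hypothetical link speeds in Gbit/s
        hours = transfer_hours(size_tb, link)
        print(f"{size_tb:>4} TB over {link} Gbit/s: {hours:.1f} h")
```

At a fully used 1 Gbit/s a 10 TB dataset takes roughly a day; at 100 Mbit/s it takes more than a week, which is why the achievable network throughput matters for this use case.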
data handling numbers
• CDF has acquired:
  – 590 TB raw data
  – 660 TB reconstructed data
  – 280 TB MC
  – 1530 TB total
• currently produces 1 PB/year, expected to rise to 10 PB by 2008
• Fermilab alone is serving about 18 TB/day
[Plot: bytes read, in TB]
CDF batch computing
• 2 types of activities
  – organized processing
    • raw data reconstruction
    • data reduction for different physics groups
    • MC production
  – user analysis
    • need to be able to copy datasets (0.5-10 TB)
• both use a large amount of CPU
• use the same tools for all
CDF Grid Philosophy
• CDF adopted Grid concepts quite late, while the experiment was already running and had mature software
• look & feel of the old data handling system is maintained
• reliability is the main issue
• use existing infrastructure as portal and change the software underneath
CDF Analysis Farm (CAF)
Submit and forget until receiving a mail
Does all the job handling and negotiation with the data handling system without the user knowing
• a CDF batch job contains a tarball with all the needed scripts, binaries and shared libraries, and sends a tarball to the output location
• users need to authenticate with their Kerberos ticket
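The idea of a self-contained job tarball can be sketched with the Python standard library. This is only an illustration of the packaging step, not the CAF submission tool; the function name `build_job_tarball` and all file names are hypothetical.

```python
import tarfile
import tempfile
from pathlib import Path


def build_job_tarball(job_dir: Path, tarball: Path) -> list[str]:
    """Pack every file under job_dir into a gzip-compressed tarball.

    Returns the sorted member names so the caller can check what is shipped.
    """
    with tarfile.open(tarball, "w:gz") as tar:
        for path in sorted(job_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to job_dir so the job unpacks cleanly.
                tar.add(path, arcname=path.relative_to(job_dir).as_posix())
    with tarfile.open(tarball, "r:gz") as tar:
        return sorted(tar.getnames())


# Usage with a throwaway directory; the contents are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    job_dir = Path(tmp) / "job"
    (job_dir / "bin").mkdir(parents=True)
    (job_dir / "lib").mkdir()
    (job_dir / "run.sh").write_text("#!/bin/sh\n./bin/analysis\n")
    (job_dir / "bin" / "analysis").write_text("placeholder binary\n")
    (job_dir / "lib" / "libexample.so").write_text("placeholder library\n")
    members = build_job_tarball(job_dir, Path(tmp) / "job.tgz")
    print(members)
```

Shipping scripts, binaries and shared libraries together means the worker node needs no pre-installed experiment software, at the cost of a larger upload per job.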
CAF - evolution over time
CDF used several batch systems and distribution mechanisms:
• FBSNG
• Condor
• Condor with Globus
• gLite WMS
• the CAF could be distributed and run on non-dedicated resources
• gLite WMS helps to run on EGEE sites
[Diagram labels: Grid based; Used as production systems]
Condor-based GRID CAF
Pull model
[Diagram: glide-ins. Labels: Collector, Negotiator, user priorities, Schedd, user jobs, Globus, Starter and User Job on the Grid nodes]
• Negotiator assigns nodes to jobs
• Globus assigns nodes to VOs
gLite WMS-based GRID CAF
Push model
[Diagram labels: user jobs, Schedd, Globus, Resource Broker, User Job on the Grid nodes]
Pros & Cons
Condor based Grid CAF
Pros:
• Globally managed user and job priorities within CDF
• Broken nodes kill condor daemons, not user jobs
• Resource selection done after a batch slot is secured
Cons:
• Uses a single service proxy for all jobs to enter Grid sites
• Requires outgoing connectivity
gLite WMS-based Grid CAF
Pros:
• LCG-backed tools
• No need for external connectivity
• Grid sites can manage users
Cons:
• No global fair share for CDF
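The difference between the pull model (glide-ins) and the push model (resource broker) can be illustrated with a toy simulation. This is not Condor or gLite code: the functions `push_model` and `pull_model`, the node names and the single "healthy" flag are all assumptions, and resubmission, priorities and fair share are ignored.

```python
from collections import deque


def push_model(jobs, nodes):
    """Early binding: a broker assigns each job to a node before the node is tested."""
    done, failed = [], []
    for i, job in enumerate(jobs):
        name, healthy = nodes[i % len(nodes)]
        (done if healthy else failed).append((job, name))
    return done, failed


def pull_model(jobs, nodes):
    """Late binding: a pilot starts on each node and pulls jobs only if it survives."""
    queue = deque(jobs)
    done, lost_pilots, live = [], [], []
    for name, healthy in nodes:
        if healthy:
            live.append(name)
        else:
            lost_pilots.append(name)  # the pilot dies; no user job was bound to it
    while queue and live:
        for name in live:
            if not queue:
                break
            done.append((queue.popleft(), name))
    return done, lost_pilots


nodes = [("node-a", True), ("node-b", False), ("node-c", True)]  # hypothetical nodes
jobs = [f"job-{n}" for n in range(6)]

pushed_done, pushed_failed = push_model(jobs, nodes)
pulled_done, lost_pilots = pull_model(jobs, nodes)
print(len(pushed_done), "done,", len(pushed_failed), "failed with push")
print(len(pulled_done), "done,", len(lost_pilots), "pilot lost with pull")
```

In this toy setup the broken node costs two user jobs in the push model but only a pilot in the pull model, which mirrors the "broken nodes kill condor daemons, not user jobs" point; a real push system would typically resubmit the failed jobs.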
gLite WMS-based GRID CAF
• at FNAL, CAF worker nodes used to have the CDF software distribution NFS-mounted, but this is not an option in the Grid world
• all production jobs are now self-contained
• trying Parrot to distribute CDF software over HTTP in analysis jobs
Some numbers
[Chart: avg. usable VMs (virtual machines) and number of jobs on the CAF]
• categories: FNAL, remote dedicated resources, Condor based Grid CAFs, LCGCaf, FermiGrid
• values shown: 2610, 1335, 858, 800, 261
Data handling system SAM
• SAM manages file storage
  – Data files are stored in tape systems at FNAL and elsewhere (most use ENSTORE at FNAL)
  – Files are cached around the world for fast access
• SAM manages file delivery
  – Users at FNAL and remote sites retrieve files transparently out of file storage. SAM handles caching for efficiency
• SAM manages file cataloging
  – SAM DB holds meta-data for each file, transparent to the user
• SAM manages analysis bookkeeping
  – SAM remembers what files you ran over, what files you processed, what applications you ran, when you ran them and where
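The four roles listed for SAM (storage, delivery, cataloging, bookkeeping) can be pictured with a small in-memory toy. This is not the SAM API: the class `ToyCatalog`, its methods and the example file are hypothetical, and the sketch leaves out the "when" and "where" of each access.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCatalog:
    """Hypothetical stand-in for a file catalogue with caching and bookkeeping."""
    metadata: dict = field(default_factory=dict)  # file name -> metadata
    cache: set = field(default_factory=set)       # files already cached locally
    history: list = field(default_factory=list)   # (user, application, file, source)

    def declare(self, name: str, **meta) -> None:
        """Register a file and its metadata in the catalogue."""
        self.metadata[name] = meta

    def deliver(self, user: str, application: str, name: str) -> str:
        """Return where the file came from and record the access."""
        if name not in self.metadata:
            raise KeyError(f"{name} is not declared in the catalogue")
        source = "cache" if name in self.cache else "tape"
        self.cache.add(name)  # later requests are served from the cache
        self.history.append((user, application, name, source))
        return source

    def files_used_by(self, user: str) -> list:
        """Bookkeeping query: every file this user has run over, in order."""
        return [name for u, _, name, _ in self.history if u == user]


catalog = ToyCatalog()
catalog.declare("run1234.raw", dataset="example", size_gb=1.0)  # made-up entry
first = catalog.deliver("alice", "ana_v1", "run1234.raw")
second = catalog.deliver("alice", "ana_v2", "run1234.raw")
print(first, second, catalog.files_used_by("alice"))
```

The user only asks for a file by name; whether it comes from tape or from a cache is decided behind the call, which is the sense in which delivery is transparent.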
world wide distribution of SAM stations
[Map: selected SAM stations, with labels "FNAL" and "test deployment"]
CDF:
• 10k/20k files declared/day
• 15k files consumed/day
• 8 TByte of files consumed/day
• main consumption of data still central
• remote use on the rise
[Plot: Total CDF Files To User, label 300 TB]
summary & outlook
• UCL-HEP cluster deployed
• UCL-CCC cluster still to come
• need a better integration of SAM and the CAF
• user feedback needs to be collated