CCP 2006 - Gyeongju, Korea - CDF computing - by Igor Sfiligoi


CDF Computing

Presented by Igor Sfiligoi

for the CDF CAF group


CDF Computing Philosophy
● Strictly distinguish interactive and batch computing
  – Process bulk of data in batch mode
  – Analyze digested data in interactive mode
● Interactive analysis
  – works on small amounts of data (<1TB)
  – uses only small amounts of CPU (GHz hours)
  ➔ performed on the desktop


CDF batch computing
● Two types of activities
  – Organized processing: raw data reconstruction, data reduction for different physics groups, MC production
  – User analysis
● Both use large amounts of CPU
  ➔ Use the same tools for all
  ➔ Must plan for the most flexible


CDF Analysis Farm (CAF)
● Develop, debug AND submit from personal PC
● Output to any place
● No need to stay connected
● Submit and forget until receiving a mail


CDF batch job
● A user provides:
  – A tarball containing all the needed scripts, binaries and shared libraries
  – The name of the startup script, plus parameters
  – The number of desired sections
  – An output location
● A tarball with whatever is left at the end of a section is sent to the output location
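
The four items a user supplies map naturally onto a small record. The sketch below is illustrative only: `CafJob`, its field names, the example paths, and the idea of passing the section number as the last argument are all assumptions, not part of the actual CAF submission tools.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CafJob:
    """Hypothetical description of a CAF batch job (names are illustrative)."""
    tarball: str            # archive with scripts, binaries and shared libraries
    startup_script: str     # script started at the beginning of each section
    sections: int           # number of desired sections
    output_location: str    # where the end-of-section tarball is sent
    parameters: List[str] = field(default_factory=list)

    def section_commands(self) -> List[List[str]]:
        """Return one command line per section, numbered from 1.

        Appending the section number as the last argument is an
        assumption made for this sketch.
        """
        if self.sections < 1:
            raise ValueError("a job needs at least one section")
        return [
            [self.startup_script, *self.parameters, str(n)]
            for n in range(1, self.sections + 1)
        ]


job = CafJob(
    tarball="analysis.tgz",
    startup_script="./run.sh",
    sections=3,
    output_location="user@fileserver:/data/out",
    parameters=["--dataset", "example"],
)
commands = job.section_commands()
```

Each section is independent, so a failed section can be rerun without touching the others.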


The CAF head node
[Diagram: the CAF head node, with components labelled Submitter, Monitor, Mailer, Web Server, Kerberos and Batch System, connected to the worker nodes. Caption: Just a portal.]


The CAF wrapper
[Diagram: the worker nodes are divided into batch slots. Inside a batch slot, the CAF wrapper is labelled with: Setup security, Prepare environment, User job, Return output, plus a link to the Monitor.]


CAF monitoring
● Job monitoring essential to maximize physics output
● The CAF provides both
  – command line and
  – web based monitoring
● Limited pseudo-interactive monitoring available
  – directory listing, tail of any file
  – ps, top


CAF evolution over time
● CAF just a portal
  – Allows changing the underlying batch system without changing the user interface
● CDF used several batch systems
  – FBSNG
  – Condor
  – Condor over Globus (Grid based)
  – gLite WMS (Grid based)
[Slide annotations: Production systems; Grid based]
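
The "just a portal" idea can be sketched as an adapter pattern: the user-facing entry point stays fixed while the batch system behind it is swapped. Everything below is hypothetical (`BatchBackend`, `Portal`, the two fake adapters and their identifier format); it is a minimal illustration, not the CAF code.

```python
from abc import ABC, abstractmethod
from typing import List


class BatchBackend(ABC):
    """Hypothetical interface each batch system adapter would implement."""

    @abstractmethod
    def submit(self, tarball: str, sections: int) -> List[str]:
        """Queue the sections and return one identifier per section."""


class FakeCondorBackend(BatchBackend):
    """Stand-in adapter; a real one would call the batch system's tools."""

    def submit(self, tarball: str, sections: int) -> List[str]:
        return [f"condor-{tarball}-{n}" for n in range(1, sections + 1)]


class FakeWmsBackend(BatchBackend):
    """Second stand-in adapter with the same interface."""

    def submit(self, tarball: str, sections: int) -> List[str]:
        return [f"wms-{tarball}-{n}" for n in range(1, sections + 1)]


class Portal:
    """User-facing entry point whose interface does not change."""

    def __init__(self, backend: BatchBackend) -> None:
        self._backend = backend

    def submit(self, tarball: str, sections: int) -> List[str]:
        return self._backend.submit(tarball, sections)


# The caller's code is identical for both back-ends.
ids_condor = Portal(FakeCondorBackend()).submit("job.tgz", 2)
ids_wms = Portal(FakeWmsBackend()).submit("job.tgz", 2)
```

Replacing the batch system then means writing one new adapter, with no change to what users type.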


Condor based CAF
[Diagram: the CAF head node (Submitter, Monitor, Mailer) with Condor, connected to the worker nodes.]


Condor based Grid CAF - Overview
[Diagram: the CAF head node (Submitter, Monitor, Mailer) with Condor and a Glidekeeper; three Grid sites provide virtual private CDF worker nodes.]


Condor based Grid CAF - Details
[Diagram: the CAF head node (Submitter, Monitor, Mailer, Condor) and a Glidekeeper; three Grid sites, each fronted by Globus and running Condor with a user job inside. Label: Pull model.]


gLite WMS based Grid CAF
[Diagram: the CAF head node (Submitter, Monitor, Mailer) and a Resource Broker; three Grid sites, each fronted by Globus and running a user job. Label: Push model.]


Condor based Grid CAF
● Pros:
  – Real fair share
  – Globally managed user and job priorities
  – Broken nodes kill Condor daemons, not user jobs
  – Resource selection done after a batch slot is secured
● Cons:
  – Uses a single service proxy for all jobs to enter Grid sites
  – Requires outgoing connectivity
  – Not (yet) a blessed Grid service


gLite WMS based CAF
● Cons:
  – No global fair share
  – Grid-local policy managed at site level
  – Relies heavily on information published by Grid sites for resource selection
● Pros:
  – LCG-backed tools
  – No need for external connectivity
  – Grid sites can manage users


Grid CAF summary
● The two Grid CAFs complement each other:
  – Condor based Grid CAF is more flexible
  – gLite WMS CAF is more Grid-compliant
● We plan to use the Condor based one wherever possible, and run on all other Grid sites with the WMS based one.
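
One way to read "wherever possible" is through the two constraints listed as cons of the Condor based Grid CAF: a site must allow outgoing connectivity and must accept a single service proxy. The sketch below encodes that reading; `GridSite`, `choose_caf` and the site names are hypothetical, and the real decision may involve other factors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GridSite:
    """Hypothetical summary of what a Grid site permits."""
    name: str
    outgoing_connectivity: bool   # worker nodes can open outbound connections
    accepts_service_proxy: bool   # site admits jobs under one shared proxy


def choose_caf(site: GridSite) -> str:
    """Prefer the Condor based CAF, fall back to the WMS based one."""
    if site.outgoing_connectivity and site.accepts_service_proxy:
        return "condor"
    return "glite-wms"


sites = [
    GridSite("site-a", outgoing_connectivity=True, accepts_service_proxy=True),
    GridSite("site-b", outgoing_connectivity=False, accepts_service_proxy=True),
    GridSite("site-c", outgoing_connectivity=True, accepts_service_proxy=False),
]
plan = {site.name: choose_caf(site) for site in sites}
```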


CDF batch computing in numbers
● Up to 6000 batch slots
  – Approx. half coming from Grid resources
● Almost 800 registered users
  – Approx. 100 active at any given time


CDF software distribution
● CDF software fairly stable by now
  – Several revisions were needed to get there
  – Some complex analyses still rely on old versions
● CDF software repository available on all interactive nodes
  – Several user scripts use it to load production binaries, shared libraries and helper scripts


CDF software distribution (2)

● The CAF worker nodes used to have the CDF software distribution NFS mounted
  – Not an option in the Grid world
● All production jobs are now self contained
● We are trying out Parrot for seamlessly distributing CDF software over HTTP for analysis jobs
  – Currently in beta stage


CDF input data handling
● All official data managed by SAM
  – A centralized system able to cache data near the worker nodes
● Unofficial data reside on group fileservers
  – Rootd or kerberized rcp used to access them
● Final analysis performed on local disks on users' own desktops


SAM
● Sequential data Access via Meta-data (SAM)
  – File based, with metadata attached
  – Files are organized in datasets
● Users can create their own datasets
● Each CAF job runs on a single dataset
  – SAM will provide files in the optimal order
  – A recovery project can be automatically generated to recover from failed sections
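
The recovery idea can be illustrated with a toy model: treat a dataset as an ordered list of file names and recover whatever was not consumed successfully. This is not the SAM API; `recovery_dataset` and the file names are hypothetical, and the real bookkeeping is done by SAM itself.

```python
from typing import Iterable, List


def recovery_dataset(dataset: List[str], consumed_ok: Iterable[str]) -> List[str]:
    """Return the files still to be processed, in dataset order.

    `consumed_ok` holds the files that sections finished successfully;
    everything else belongs to the recovery pass.
    """
    done = set(consumed_ok)
    return [name for name in dataset if name not in done]


dataset = ["a.root", "b.root", "c.root", "d.root"]
# Assume the sections handling b.root and d.root failed.
to_recover = recovery_dataset(dataset, consumed_ok=["a.root", "c.root"])
```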


SAM caching
● Each site serving data needs to have a SAM station
● A SAM station manages its own data
  – Over a local filesystem
  – Over a distributed system, like dCache
● At Fermilab, CDF owns a 290 TB dCache pool
● Outside Fermilab, only a total of 80 TB
  – Most data intensive jobs performed at Fermilab


CDF outgoing data handling
● CAF wrapper tars the resulting directory of a section and copies it with rcp to a user specified fileserver
  – Most users rcp their data files by themselves, too
● MC production has its own upload mechanism
  – Still based on kerberized rcp
  – Automates metadata creation and storage to tape
● Looking into using more Grid-aware tools
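
The tar step of the wrapper can be sketched with Python's standard `tarfile` module. This is only an illustration: `pack_section_output` and the archive naming are hypothetical, and the copy to the fileserver (kerberized rcp in the talk) is deliberately left out.

```python
import tarfile
import tempfile
from pathlib import Path


def pack_section_output(work_dir: Path, section: int, dest_dir: Path) -> Path:
    """Tar everything left in work_dir into dest_dir and return the archive path."""
    archive = dest_dir / f"section_{section}.tgz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname="." stores member paths relative to the working directory.
        tar.add(work_dir, arcname=".")
    return archive


# Small demonstration using temporary directories.
with tempfile.TemporaryDirectory() as work, tempfile.TemporaryDirectory() as dest:
    (Path(work) / "histos.root").write_bytes(b"fake output")
    archive = pack_section_output(Path(work), 1, Path(dest))
    with tarfile.open(archive) as tar:
        members = sorted(tar.getnames())
```

Writing the archive outside the working directory avoids the archive trying to include itself.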


CDF data handling in numbers

● CDF has acquired 1.4 fb⁻¹ of data
● Raw data amounts to 590 TB
● Reconstructed data amounts to 660 TB
● MC data amounts to 280 TB
● Fermilab alone is serving ~18 TB a day
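
For a sense of scale, the quoted figures can be combined in a few lines. This assumes the sizes are terabytes and treats the approximate daily rate as exactly 18; the variable names are illustrative.

```python
# Sizes as quoted on the slide, assumed to be terabytes.
raw_tb, reco_tb, mc_tb = 590, 660, 280
total_tb = raw_tb + reco_tb + mc_tb                   # 1530

served_per_day_tb = 18                                # "~18" taken as exact
days_to_serve_total = total_tb / served_per_day_tb    # 85.0
```

At that rate, Fermilab serves a volume equal to the whole data store roughly every 85 days.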


Summary
● Most CDF computing done in batch mode
● User analysis as important as production
● Grid computing is an important piece of our strategy
  – Using the tools that best serve us
● Data handling under control
  – Looking at Grid tools to better handle job output


References
● CDF CAF Main Page
  http://cdfcaf.fnal.gov/index.html#doc
● Conference proceedings
  – CDF papers at CHEP 2006
    http://www.tifr.res.in/~chep06/


Thank you