GridPP11 Liverpool Sept04
SAMGrid
GridPP11, Liverpool, Sept 2004
Gavin Davies, Imperial College London
Introduction
• Tevatron
– Less data than LHC, but still PBs/experiment and growing
– Running experiments
• SAM (Sequential Access via Metadata)
– Well developed metadata and distributed data replication system
– Developed by DØ & FNAL-CD
• JIM (Job Information and Monitoring)
– Handles job submission and monitoring (all but data handling)
– SAM + JIM → SAMGrid, a computational grid
• Runjob
– Handles job workflow management
See http://cdinternal.fnal.gov/RUNIIRev2004/runIIMP.asp
SAM plots
• DØ usage: up to 200 TB/month, over 2 PB in last yr
• CDF usage now similar; have just topped the PB
• Active SAM sites: 40 DØ, 26 CDF
SAMGrid plots
http://samgrid.fnal.gov:8080/ (09/09/04)
JIM: Active execution sites: 11 DØ, 1 CDF in testing
DØ – Production – MC
• All DØ MC always produced off-site
• SAMGrid now default (went into production in Mar 04)
– Based on request system and jobmanager-mc_runjob
– MC software package retrieved via SAM
– Currently running at (multiple) sites in Cz, Fr, UK, USA (10 in total + FNAL)
• more on way, inc central farm
– Average production efficiency ~90%
– Average inefficiency due to grid infrastructure ~1-5%
• For more details, see
– GridPP10 DØ talk by Peter Love
– http://www-d0.fnal.gov/computing/grid/deployment-issues.html
DØ – Production – Reprocessing
• P14 Autumn 2003
– 25M events in UK
– Based around mc_runjob
– Distributed computing rather than Grid
– UK effort key to project success
• P17 Autumn 2004
– x10 larger, use of db proxy servers
– SAMGrid as default
– Use LCG resources
DØ – Production – LCG
• Increasing effort to ensure SAMGrid / LCG interoperability
– MC generated on EDG/LCG and other shared resources (inc Imperial, RAL) “by hand”
– Demo of sam_client functionality on LCG at London workshop in Apr
– Will use LCG resources for p17 data reprocessing
All Nikhef MC produced this way
(DØ –) Runjob
• mc_runjob currently used by SAMGrid for MC and reprocessing
• DØrunjob – the rewrite
• Joint CDF, CMS, DØ, FNAL-CD project
• Base classes from common Runjob package
• DØrunjob available this autumn
– Will incorporate Sandbox as a separate module
• For details see: http://projects.fnal.gov/runjob/
[Diagram: common Runjob package, with CDFRunjob, CMSRunjob and DØRunjob built on it]
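The "base classes from common Runjob package" idea can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the class layout, the method names (`steps`, `run`) and the step names are hypothetical and are not taken from the real Runjob, mc_runjob or DØrunjob code. The sketch only shows shared workflow logic sitting in a common base class, with each experiment supplying its own steps.

```python
from abc import ABC, abstractmethod
from typing import List


class Runjob(ABC):
    """Hypothetical stand-in for a base class in the common Runjob package."""

    def __init__(self, request_id: str) -> None:
        self.request_id = request_id

    @abstractmethod
    def steps(self) -> List[str]:
        """Return the ordered workflow steps for one experiment."""

    def run(self) -> List[str]:
        # Shared logic lives in the base class: tag every step with the
        # request it belongs to, preserving the order given by steps().
        return [f"{self.request_id}:{step}" for step in self.steps()]


class D0Runjob(Runjob):
    """Hypothetical experiment-specific subclass (step names are illustrative)."""

    def steps(self) -> List[str]:
        return ["generate", "simulate", "reconstruct"]


class CDFRunjob(Runjob):
    """Hypothetical experiment-specific subclass (step names are illustrative)."""

    def steps(self) -> List[str]:
        return ["simulate", "reconstruct"]


if __name__ == "__main__":
    for job in (D0Runjob("req1"), CDFRunjob("req2")):
        print(type(job).__name__, job.run())
```

The design point is that only `steps` differs per experiment, so improvements to the shared `run` logic would reach every experiment's subclass at once.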
CDF – Production – I
• See Mòrag Burgon-Lyon’s GridPP10 talk for details
• Goal 1: 25% of computing offsite by June 2004
– Done, using DCAF and SAM
• DCAF = de-centralised CDF analysis farm, core of 7 sites, more on way
• Goal 2: 50% by June 2005, using Grid
– Resources being identified / pledged
• JIM deployment
– Originally planned for Oct 15th
– Problematic, look at grid3 as possible alternative
CDF – Production – II
• Migration of DCAF sites to Condor
• Migration to SAM V6
– Switch to new internal dbserve code under test
– Roll out to global sites expected soon
• FroNTier – new way to serve database contents to remote institutes
– Should lower load on central CDF Oracle servers
• Studying methods to lower load and avoid fragmentation on remote file servers due to simultaneous network writes
(CDF –) SAM TV
• SAM TV used by CDF & DØ to monitor SAM and SAM stations
– Currently created from log files
– Version in dev created from MIS database, filled by new MIS server
Summary / plans
DØ
• SAM & SAMGrid critical
– GridPP key part of effort
• SAMGrid default for
– MC production
– Data reprocessing from autumn
– Analysis to follow
• dØ tools, dØrte, sandboxing
• Interoperability
– Good progress
CDF
• 25% of computing off-site
– Most with DCAF/SAM
– GridPP effort key part of effort
• Increase to 50% for June 2005
– More DCAF installations
• Encourage user migration
UKLight – 10 Gbit/s – “data reprocessing”