May 11, 2002 / U.S. CMS Collaboration Meeting / Lothar A T Bauerdick, Fermilab
US CMS Software and Computing Project
US CMS Collaboration Meeting at FSU, May 2002
Lothar A T Bauerdick / Fermilab
Project Manager
Scope and Deliverables
Provide Computing Infrastructure in the U.S. — that needs R&D
Provide software engineering support for CMS
Mission is to develop and build “User Facilities” for CMS physics in the U.S.:
To provide the enabling IT infrastructure that will allow U.S. physicists to fully participate in the physics program of CMS
To provide the U.S. share of the framework and infrastructure software
Tier-1 center at Fermilab provides computing resources and support
User support for the “CMS physics community”, e.g. software distribution, help desk
Support for Tier-2 centers, and for the physics analysis center at Fermilab
Five Tier-2 centers in the U.S.
Together they will provide the same CPU/disk resources as the Tier-1
Facilitate “involvement of collaboration” in S&C development
Prototyping and test-bed effort very successful
Universities will “bid” to host a Tier-2 center, taking advantage of existing resources and expertise
Tier-2 centers to be funded through the NSF program for “empowering universities”
Proposal to the NSF submitted Nov 2001
Project Milestones and Schedules
Prototyping, test-beds, R&D started in 2000: “Developing the LHC Computing Grid” in the U.S.
R&D systems, funded in FY2002 and FY2003, used for the “5% data challenge” (end 2003)
Release of the Software and Computing TDR (Technical Design Report)
Prototype T1/T2 systems, funded in FY2004, for the “20% data challenge” (end 2004)
End of “Phase 1”, Regional Center TDR, start of deployment
Deployment: 2005-2007, at 30%, 30%, 40% of costs
Fully functional Tier-1/2 funded in FY2005 through FY2007, ready for the LHC physics run
Start of Physics Program
S&C Maintenance and Operations: 2007 on
US CMS S&C Since UCR
Consolidation of the project, shaping the R&D program
Project baselined in Nov 2001: workplan for CAS, UF, Grids endorsed
CMS has become the “lead experiment” for Grid work (Koen, Greg, Rick)
US Grid projects PPDG, GriPhyN and iVDGL
EU Grid projects DataGrid, DataTAG
LHC Computing Grid Project
Fermilab UF team, Tier-2 prototypes, US CMS testbed
Major production efforts, PRS support
Objectivity goes, LCG comes
We do have a working software and computing system! Physics analysis
CCS will drive much of the common LCG Application Area
Major challenges to manage and execute the project
Since fall 2001 we knew the LHC start would be delayed; new date April 2007
Proposal to NSF in Oct 2001; things are probably moving now
New DOE funding guidance (and lack thereof from NSF) is starving us in 2002-2004
Very strong support for the project from individuals in CMS, Fermilab, Grids, FA
Other New Developments
NSF proposal guidance AND DOE guidance are (S&C+M&O)
That prompted a change in US CMS line management: the Program Manager will oversee both the Construction Project and the S&C Project
New DOE guidance for S&C+M&O is much below the S&C baseline + M&O request
Europeans have achieved major UF funding, significantly larger relative to the U.S.
LCG started; expects the U.S. to partner with European projects
LCG Application Area possibly imposes issues on the CAS structure
Many developments and changes that invalidate or challenge much of what the PM tried to achieve
Opportunity to take stock of where we stand in US CMS S&C before we try to understand where we need to go
Vivian has left S&C
Thanks and appreciation for Vivian’s work of bringing the UF project to the successful baseline
New scientist position opened at Fermilab for UF L2 manager and physics!
Other assignments:
Hans Wenzel: Tier-1 Manager
Jorge Rodriguez: U. Florida pT2 L3 manager
Greg Graham: CMS GIT Production Task Lead
Rick Cavanaugh: US CMS Testbed Coordinator
Project Status
User Facilities status and successes:
US CMS prototype systems: Tier-1, Tier-2, testbed
Intense collaboration with US Grid projects, Grid-enabled MC production system
User support: facilities, software, operations for PRS studies
Core Application Software status and successes: see Ian’s talk
Project Office started
Project Engineer hired, to work on WBS, schedule, budget, reporting, documenting
SOWs in place with CAS universities — MOUs, subcontracts, invoicing coming
In process of signing the MOUs
Have a draft MOU with iVDGL on prototype Tier-2 funding
Successful Base-lining Review
“The Committee endorses the proposed project scope, schedule, budgets and management plan”
Endorsement for the “scrubbed” project plan following the DOE/NSF guidance:
$3.5M DOE + $2M NSF in FY2003, and $5.5M DOE + $3M NSF in FY2004!
Findings & Recommendations of the Project Management Subcommittee
US CMS Project Management is in place and working well
Project appears well defined
Scope can be achieved with proposed resources
Budget matches agency guidance profile
Schedule takes advantage of unofficial LHC slip; appears achievable (but with little or no margin)
Findings & Recommendations of the Project Management Subcommittee (2)
US CMS has taken an excellent first step in defining what services they require from grid-developed SW packages
US CMS needs to implement the tracking procedure to assess grid-project progress and its impact on the US CMS schedule
Committee is concerned about the potential impact on US CMS of future design/specification decisions made by CERN, especially, e.g., in the area of data persistence models and Grid technology
Simulated events by site (TOTAL = 8.4 M):
  Caltech       2.50 M
  FNAL          1.65 M
  Bristol/RAL   1.27 M
  CERN          1.10 M
  INFN          0.76 M
  Moscow        0.43 M
  IN2P3         0.31 M
  Helsinki      0.13 M
  Wisconsin     0.07 M
  UCSD          0.06 M
  UFL           0.05 M

TYPICAL EVENT SIZES
  Simulated: 1 CMSIM event = 1 OOHit event = 1.4 MB
  Reconstructed: 1 “10^33” event = 1.2 MB; 1 “2x10^33” event = 1.6 MB; 1 “10^34” event = 5.6 MB

Reconstructed with pile-up, by site (TOTAL = 29 TB):
  CERN         14 TB
  FNAL         12 TB
  Caltech      0.60 TB
  Moscow       0.45 TB
  INFN         0.40 TB
  Bristol/RAL  0.22 TB
  UCSD         0.20 TB
  IN2P3        0.10 TB
  UFL          0.08 TB
  Wisconsin    0.05 TB
  Helsinki     --
CMS Produced Data in 2001
These fully simulated data samples are essential for physics and trigger studies
Technical Design Report for DAQ and Higher Level Triggers
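The per-event sizes above make the quoted totals easy to sanity-check; a minimal arithmetic sketch (the helper function is ours, not part of the CMS tooling):

```python
# Rough cross-check of the 2001 production volumes quoted above, using the
# per-event size from the slide: 1.4 MB per fully simulated (CMSIM/OOHit) event.

SIM_EVENT_MB = 1.4

def sample_size_tb(n_events_millions, event_size_mb):
    """Approximate sample size in TB for n million events of a given size."""
    return n_events_millions * 1e6 * event_size_mb / 1e6  # MB -> TB

# The 8.4M simulated events correspond to roughly 12 TB of simulation output;
# the 29 TB total also includes the reconstructed, pile-up samples.
print(f"{sample_size_tb(8.4, SIM_EVENT_MB):.1f} TB")  # ~11.8 TB
```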
Production Operations
Production efforts are manpower intensive!
Fermilab Tier-1 production operations: ∑ = 1.7 FTE sustained effort to fill those 8 roles
+ the system support people that need to help if something goes wrong!!!
At Fermilab (US CMS, PPDG)
Greg Graham, Shafqat Aziz, Yujun Wu, Moacyr Souza, Hans Wenzel, Michael Ernst, Shahzad Muzaffar + staff
At U Florida (GriPhyN, iVDGL)
Dimitri Bourilkov, Jorge Rodriguez, Rick Cavanaugh + staff
At Caltech (GriPhyN, PPDG, iVDGL, USCMS)
Vladimir Litvin, Suresh Singh et al
At UCSD (PPDG, iVDGL)
Ian Fisk, James Letts + staff
At Wisconsin
Pam Chumney, R. Gowrishankara, David Mulvihill + Peter Couvares, Alain Roy et al
At CERN (USCMS)
Tony Wildish + many
US CMS Prototypes and Test-beds
Tier-1 and Tier-2 prototypes and test-beds operational
Facilities for event simulation, including reconstruction
Sophisticated processing for pile-up simulation
User cluster and hosting of data samples for physics studies
Facilities and Grid R&D
Tier-1 Equipment
(Photo: IBM and Dell servers (CMSUN1); hosts Chocolat, Snickers, Chimichanga, Chalupa; Winchester RAID.)
Tier-1 Equipment
(Photo: “popcorn” nodes for MC production, “fry” nodes for users, “gyoza” nodes for testing.)
Using the Tier-1 System: User System
Until the Grid becomes reality (maybe soon!), people who want to use computing facilities at Fermilab need to obtain an account
That requires registration as a Fermilab user (DOE requirement)
We will make sure that turn-around times are reasonably short; no complaints heard yet
Go to http://computing.fnal.gov/cms/ and click on the "CMS Account" button, which will guide you through the process:
Step 1: Get a valid Fermilab ID
Step 2: Get an fnalu account and CMS account
Step 3: Get a Kerberos principal and CryptoCard
Step 4: Information for first-time CMS account users
http://consult.cern.ch/writeup/form01/
Got > 100 users, currently about 1 new user per week
US CMS User Cluster
(Diagram: eight FRY nodes behind the BIGMAC switch; GigaBit uplink, 100 Mbps to the nodes; 250 GB SCSI-160 RAID.)
R&D on “reliable i/a service” — OS: Mosix? Batch system: FBSNG? Storage: disk farm?
To be released June 2002! nTuple, Objectivity analysis, etc.
User Access to Tier-1 Data
Hosting of Jets/MET data; muons will be coming soon
(Diagram: Enstore STKEN silo; “Snickers” AMD server with 1 TB IDE RAID as the AMD/Enstore interface; network; users; > 10 TB of objects.)
Working on providing a powerful disk cache
Host redirection protocol allows adding more servers --> scaling + load balancing
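The host-redirection idea can be sketched as follows. This is a hedged illustration of the scaling pattern, not the actual AMS server implementation; the class, server names, and load metric are all hypothetical:

```python
# Hedged sketch of redirection-based load balancing: a front-end answers
# each client request with the address of the least-loaded data server,
# so more servers can be added without changing the clients.
# Names and the "active transfers" load metric are illustrative only.

class RedirectingFrontEnd:
    def __init__(self):
        self.servers = {}  # server name -> number of active transfers

    def add_server(self, name):
        """Adding capacity is a single registration call."""
        self.servers[name] = 0

    def redirect(self):
        """Return the least-loaded server and account for the new transfer."""
        name = min(self.servers, key=self.servers.get)
        self.servers[name] += 1
        return name

    def done(self, name):
        self.servers[name] -= 1

fe = RedirectingFrontEnd()
fe.add_server("amd-server-1")
fe.add_server("amd-server-2")
print([fe.redirect() for _ in range(4)])  # requests alternate between servers
```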
US CMS T2 Prototypes and Test-beds
Tier-1 and Tier-2 prototypes and test-beds operational
California Prototype Tier-2 Setup
(Diagram: UCSD and Caltech sites.)
Benefits of US Tier-2 Centers
Bring computing resources close to user communities
Provide dedicated resources to regions (of interest and geographical)
More control over localized resources, more opportunities to pursue physics goals
Leverage additional resources that exist at the universities and labs
Reduce computing requirements on CERN (supposed to account for 1/3 of total LHC facilities!)
Help meet the LHC Computing Challenge
Provide a diverse collection of sites, equipment, and expertise for development and testing
Provide much-needed computing resources
US CMS plans for about 2 FTE at each Tier-2 site + equipment funding, supplemented with Grid, university, and lab funds (BTW: no I/S costs in the US CMS plan)
Problem: how do you run a center with only two people that will have much greater processing power than CERN has currently?
This involves facilities and operations R&D to reduce the operations personnel required to run the center, e.g. investigating cluster management software
U.S. Tier-1/2 System Operational
CMS Grid integration and deployment on the U.S. CMS testbed
Data challenges and production runs on Tier-1/2 prototype systems
“Spring Production 2002” finishing: physics, trigger, detector studies
Produce 10M events and 15 TB of data, plus 10M minimum-bias events, fully simulated including pile-up, fully reconstructed
Large assignment to U.S. CMS
Successful production in 2001: 8.4M events fully simulated, including pile-up, 50% in the U.S.; 29 TB of data processed, 13 TB in the U.S.
US CMS Prototypes and Test-beds
All U.S. CMS S&C institutions are involved in DOE and NSF Grid projects
Integrating Grid software into CMS systems
Bringing CMS production onto the Grid
Understanding the operational issues
CMS directly profits from Grid funding
Deliverables of Grid projects become useful for LHC in the “real world”
Major success: MOP, GDMP
Grid-enabled CMS Production
Successful collaboration with Grid projects!
MOP (Fermilab, U. Wisconsin/Condor): remote job execution with Condor-G, DAGMan
GDMP (Fermilab, European DataGrid WP2): file replication and replica catalog (Globus)
Successfully used on the CMS testbed
First real CMS production use finishing now!
Recent Successes with the Grid
Grid-enabled CMS production environment
NB: MOP = “Grid-ified” IMPALA, a vertically integrated CMS application
Brings together US CMS with all three US Grid projects:
PPDG: Grid developers (Condor, DAGMan), GDMP (w/ WP2)
GriPhyN: VDT, in the future also the virtual data catalog
iVDGL: pT2 sites and US CMS testbed
CMS Spring 2002 production assignment of 200k events to MOP
Half-way through; next week transfer back to CERN
This is being considered a major success — for US CMS and Grids!
Many bugs in Condor and Globus found and fixed
Many operational issues that needed, and still need, to be sorted out
MOP will be moved into the production Tier-1/Tier-2 environment
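MOP expresses production chains as DAGMan dependency graphs driven through Condor-G. As a hedged illustration of that idea (not MOP's actual files: the job and submit-file names here are invented), a minimal two-step DAG description could be generated like this:

```python
# Hedged sketch: emit a minimal Condor DAGMan description for a two-step
# production chain (simulate, then transfer the output), the kind of
# dependency MOP expresses via DAGMan. JOB/PARENT/CHILD is real DAGMan
# syntax; the job names and submit files are illustrative only.

def make_dag(run_id):
    """Return DAGMan text chaining a simulation job to a transfer job."""
    lines = [
        f"JOB sim_{run_id} cmsim.submit",
        f"JOB copy_{run_id} transfer.submit",
        f"PARENT sim_{run_id} CHILD copy_{run_id}",  # copy runs after sim succeeds
    ]
    return "\n".join(lines)

print(make_dag(42))
```

DAGMan then handles ordering and retries, which is what lets a vertically integrated application like IMPALA be "Grid-ified" without rewriting its individual steps.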
Successes: Grid-enabled Production
Major milestone for US CMS and PPDG. From the PPDG internal review of MOP:
“From the Grid perspective, MOP has been outstanding. It has both legitimized the idea of using Grid tools such as DAGMAN, Condor-G, GDMP, and Globus in a real production environment outside of prototypes and trade show demonstrations. Furthermore, it has motivated the use of Grid tools such as DAGMAN, Condor-G, GDMP, and Globus in novel environments leading to the discovery of many bugs which would otherwise have prevented these tools from being taken seriously in a real production environment.
From the CMS perspective, MOP won early respect for taking on real production problems, and is soon ready to deliver real events. In fact, today or early next week we will update the RefDB at CERN which tracks production at various regional centers. This has been delayed because of the numerous bugs that, while being tracked down, involved several cycles of development and redeployment. The end of the current CMS production cycle is in three weeks, and MOP will be able to demonstrate some grid enabled production capability by then. We are confident that this will happen. It is not necessary at this stage to have a perfect MOP system for CMS Production; IMPALA also has some failover capability and we will use that where possible. However, it has been a very useful exercise and we believe that we are among the first team to tackle Globus and Condor-G in such a stringent and HEP specific environment.”
Successes: File Transfers
In 2001 we were observing typical rates for large data transfers of e.g. CERN - FNAL 4.7 GB/hour
After network tuning, using Grid tools (Globus URL copy), we gain a factor of 10!
Today we are transferring 1.5 TByte of simulated data from UCSD to FNAL at rates of 10 MByte/second!
That almost saturates the network interfaces out of Fermilab (155 Mbps) and at UCSD (Fast Ethernet)…
The ability to transfer a TeraByte in a day is crucial for the Tier-1/Tier-2 system
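These rates are easy to cross-check; a minimal arithmetic sketch (helper names are ours):

```python
# Cross-check of the transfer rates quoted above.

def gb_per_hour(mb_per_s):
    """Convert a sustained rate in MB/s to GB/hour."""
    return mb_per_s * 3600 / 1000

def days_to_transfer(tb, mb_per_s):
    """Days needed to move a sample of the given size at the given rate."""
    return tb * 1e6 / mb_per_s / 86400

# 10 MB/s sustained is 36 GB/hour, roughly 8x the 2001 rate of 4.7 GB/hour,
# and puts a terabyte within reach of about a day of continuous transfer.
print(f"{gb_per_hour(10):.0f} GB/hour; 1 TB in {days_to_transfer(1.0, 10):.2f} days")
```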
Many operational issues remain to be solved:
GDMP is a Grid tool for file replication, developed jointly between the US and the EU
“Show case” application for EU DataGrid WP2: data replication
Needs more work and strong support by the VDT team (PPDG, GriPhyN, iVDGL)
e.g. a CMS “GDMP heartbeat” for debugging new installations and monitoring old ones
Installation and configuration issues — releases of underlying software like Globus
Issues with site security and e.g. firewalls
Uses the Globus Security Infrastructure, which demands a “VO” Certification Authority infrastructure for CMS
Etc.
This needs to be developed, tested, and deployed, and shows that the US CMS testbed is invaluable!
DOE/NSF Grid R&D Funding for CMS

                                              2001   2002   2003   2004   2005   2006
GriPhyN  Total, incl. CS and all experiments  2543   2543   2543   2241
         CMS staff                             582    582    582    582
iVDGL    Total, incl. CS and all experiments         2650   2750   2750   2750   2750
         CMS equipment                                 232    192    187     57     65
         CMS staff                                     234    336    358    390    390
PPDG     Total, incl. CS and all experiments  3180   3180   3180
         Caltech                               187    187    187
         FNAL                                  132    132    132
         UCSD                                   80     80     80
         Total CMS                             399    399    399
Farm Setup
Almost any computer can run the CMKIN and CMSIM steps, using the CMS binary distribution system (US CMS DAR)
As long as ample storage is available, the problem scales well
This step is “almost trivially” put on the Grid — almost…
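The reason the simulation step scales so well is that each job is independent once it has a unique run number and random seed. A hedged sketch of the bookkeeping (parameter names are illustrative, not the actual CMKIN/CMSIM configuration):

```python
# Hedged sketch of why the CMKIN/CMSIM step is "almost trivially" parallel:
# each job needs only a unique run number and random seed, so a farm (or
# the Grid) can run all jobs independently. Field names are illustrative.

def make_jobs(first_run, n_jobs, events_per_job, base_seed=12345):
    """Split a production request into independent job descriptions."""
    return [
        {
            "run": first_run + i,
            "seed": base_seed + i,      # distinct seed per job
            "nevents": events_per_job,
        }
        for i in range(n_jobs)
    ]

jobs = make_jobs(first_run=2000, n_jobs=4, events_per_job=500)
print(jobs[0], jobs[-1])
```

The "almost" is everything around the jobs: staging the binaries (DAR helps there), collecting the output, and tracking what ran where.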
(Diagram: TeraGrid/DTF site resources — NCSA/PACI: 8 TF, 240 TB; SDSC: 4.1 TF, 225 TB; Caltech; Argonne; HPSS and UniTree mass storage; external networks. www.teragrid.org)
e.g. on the 13.6 TF, $53M TeraGrid?
Farm Setup for Reconstruction
The first step of the reconstruction is Hit Formatting, where simulated data is taken from the Fortran files, formatted, and entered into the Objectivity database.
The process is sufficiently fast, and involves enough data, that more than 10-20 jobs will bog down the database server.
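That 10-20 job ceiling is a classic shared-resource bottleneck. One common mitigation (a generic pattern, not necessarily what CMS production used) is to cap concurrency with a semaphore so extra jobs queue instead of overloading the server:

```python
# Hedged sketch: limit the number of jobs writing to the database server
# at once, so a large farm does not bog it down. The limit of 10 mirrors
# the 10-20 job range quoted above; the job body is a placeholder.
import threading

DB_SLOTS = threading.Semaphore(10)   # at most 10 concurrent DB writers
peak = 0
active = 0
lock = threading.Lock()

def hit_formatting_job(job_id):
    global peak, active
    with DB_SLOTS:                   # blocks until a writer slot is free
        with lock:
            active += 1
            peak = max(peak, active)
        # ... read Fortran file, format hits, write to the database ...
        with lock:
            active -= 1

threads = [threading.Thread(target=hit_formatting_job, args=(i,)) for i in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("peak concurrent DB writers:", peak)   # never exceeds 10
```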
(Event display: all charged tracks with pT > 2 GeV; reconstructed tracks with pT > 25 GeV; +30 minimum-bias events.)
This makes a CPU-limited task (event simulation) VERY I/O intensive!
Pile-up simulation!
Unique at LHC due to high luminosity and short bunch-crossing time
Up to 200 “minimum bias” events overlaid on interesting triggers
Leads to “pile-up” in the detectors, which needs to be simulated!
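The overlay step can be sketched as sampling events from a pre-simulated minimum-bias pool and merging their hits with the signal event. This is an illustrative data-flow sketch only; real CMS digitization merges detector hits channel by channel, and the hit representation here is invented:

```python
# Hedged sketch of pile-up overlay: draw n_pileup minimum-bias events from
# a pre-simulated pool and merge their hits with the signal event. At full
# luminosity n_pileup is up to ~200, which is why this step is so I/O heavy:
# every signal event drags in hundreds of stored min-bias events.
import random

def overlay_pileup(signal_hits, minbias_pool, n_pileup, rng=None):
    """Return the signal hits merged with hits from n_pileup sampled events."""
    rng = rng or random.Random(0)
    merged = list(signal_hits)
    for _ in range(n_pileup):
        merged.extend(rng.choice(minbias_pool))  # one sampled min-bias event
    return merged

# Toy pool: three min-bias events, one (subdetector, energy) hit each.
pool = [[("ecal", 0.3)], [("hcal", 1.1)], [("tracker", 0.0)]]
event = overlay_pileup([("ecal", 42.0)], pool, n_pileup=200)
print(len(event))   # 1 signal hit + 200 overlaid hits = 201
```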
Farm Setup for Pile-up Digitization
The most advanced production step is digitization with pile-up
The response of the detector is digitized, the physics objects are reconstructed and stored persistently, and at full luminosity 200 minimum-bias events are combined with the signal events
Due to the large number of minimum-bias events, multiple Objectivity AMS data servers are needed. Several configurations have been tried.
Objy Server Deployment: Complex
4 production federations at FNAL (uses the catalog only to locate database files)
3 FNAL servers plus several worker nodes used in this configuration
3 federation hosts with attached RAID partitions, 2 lock servers, 4 journal servers, 9 pile-up servers
Example of CMS Physics Studies
Resolution studies for jet reconstruction
Full detector simulation essential to understand jet resolutions
Indispensable to design realistic triggers and understand rates at high luminosity
(Plots: QCD 2-jet events with FSR, full simulation with tracks and HCAL noise, vs. QCD 2-jet events without FSR, no pile-up, no track reconstruction, no HCAL noise.)
Pile-up & Jet Energy Resolution
Pile-up contributions to jets are large and have large variations
Can be estimated event-by-event from the total energy in the event
Large improvement if the pile-up correction is applied (red curve), e.g. 50% → 35% at ET = 40 GeV
Physics studies depend on full detailed detector simulation: realistic pile-up processing is essential!
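The event-by-event idea can be illustrated with a toy calculation (not the actual ORCA correction; the linear model, the cone-area fraction, and all numbers below are invented for illustration): attribute the soft part of the event's total transverse energy to pile-up, estimate the cone's share of it, and subtract that from the raw jet ET.

```python
def pileup_corrected_jet_et(raw_jet_et, total_event_et,
                            cone_area_fraction, hard_scatter_et_estimate):
    """Estimate the pile-up ET inside the jet cone as the cone's share of the
    event's soft (non-hard-scatter) transverse energy and subtract it."""
    soft_et = max(total_event_et - hard_scatter_et_estimate, 0.0)
    pileup_in_cone = soft_et * cone_area_fraction
    return max(raw_jet_et - pileup_in_cone, 0.0)

# Invented numbers: a 40 GeV raw jet in an event with 600 GeV total ET,
# of which ~100 GeV is attributed to the hard scatter; the cone covers
# ~2% of the calorimeter acceptance, so (600-100)*0.02 = 10 GeV is subtracted.
corrected = pileup_corrected_jet_et(40.0, 600.0, 0.02, 100.0)
print(corrected)  # 30.0
```

Because the subtracted amount is recomputed per event, the correction tracks the large event-to-event pile-up fluctuations that a fixed average offset would miss.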
Tutorial at UCSD
Very successful 4-day tutorial with ~40 people attending
Covering use of CMS software, including CMKIN/CMSIM, ORCA, OSCAR, IGUANA
Covering physics code examples from all PRS groups
Covering production tools and environment and Grid tools
Opportunity to get people together: UF and CAS engineers with PRS physicists, Grid developers and CMS users
The tutorials have been very well thought through and are very useful for self-study, so they will be maintained
It is amazing what we already can do with CMS software, e.g. impressive to see the IGUANA visualization environment, including “home made” visualizations
However, our system is (too? still too?) complex
We may need more people taking a day off to go through the self-guided tutorials
Allocation for FY2002

                                  Caltech    UCSD    FNAL     NEU   Princeton  UC Davis  UFlorida    TOTAL
CAS FTE                              2.0       -      2.0     3.0      1.0        1.0        -         9.0
UF FTE                               1.0      1.0     5.5      -        -          -        1.5        9.0
TOTAL FTE                            3.0      1.0     7.5     4.0      1.0        1.0       1.5       18.0
CAS Personnel (salary, PC, travel)  310.0      -    270.0   620.0    150.0      185.0        -     1,535.0
UF Personnel (salary, PC, travel)   155.0    155.0  795.0      -        -          -      232.0    1,337.0
UF Tier 1 Equipment                    -       -    140.0      -        -          -         -       140.0
UF Tier 2 Equipment                 120.0      -       -       -        -          -      112.0      232.0
Project Office, Management Reserve     -       -    390.0   274.0       -          -         -       664.0
TOTAL COST [AY$ x 1000]             585.0    155.0 1,595.0  894.0    150.0      185.0     344.0    3,908.0
FY2002 UF Funding
Excellent initial effort and DOE support for User Facilities
Fermilab established as Tier-1 prototype and major Grid node for LHC computing
Tier-2 sites and testbeds are operational and are contributing to production and R&D
Head start for U.S. efforts has pushed CERN commitment to support remote sites
The FY2002 funding has given major headaches to the PM
DOE funding of $2.24M was insufficient to ramp the Tier-1 to baseline size
The NSF contribution is unknown as of today
According to plan we should have more people and equipment at the Fermilab Tier-1
Need some 7 additional FTEs and more equipment funding
This has been strongly endorsed by the baseline reviews
All European RCs (DE, FR, IT, UK, even RU!) have support at this level of effort
Plans For 2002-2003
Finish Spring production challenge by June: user cluster, user federations; upgrade of facilities ($300k)
Develop CMS Grid environment toward LCG Production Grid
Move CMS Grid environment from testbed to facilities
Prepare for first LCG-USUF milestone, November?
Tier-2 / iVDGL milestones with ATLAS, SC2002
LCG-USUF Production Grid milestone in May 2003
Bring Tier-1/Tier-2 prototypes up to scale
Serving the user community: user cluster, federations, Grid-enabled user environment
UF studies with persistency framework
Start of physics DCs and computing DCs
CAS: LCG “everything is on the table, but the table is not empty”
Persistency framework: prototype in September 2002, release in July 2003
DDD and OSCAR/Geant4 releases
New strategy for visualization / IGUANA
Develop distributed analysis environment with Caltech et al.
Funding for UF R&D Phase
There is lack of funding and lack of guidance for 2003-2005
NSF proposal guidance AND DOE guidance are (S&C + M&O)
New DOE guidance for S&C + M&O is much below the S&C baseline + M&O request
Fermilab USCMS projects oversight has proposed minimal M&O for 2003-2004 and large cuts for S&C given the new DOE guidance
The NSF has “ventilated the idea” of applying a rule of 81/250 × DOE funding
This would lead to very serious problems in every year of the project: we would lack 1/3 of the requested funding ($14.0M/$21.2M)
DOE/NSF Funding Shortfall
[Chart: NSF Project Costs (11/2001) vs. NSF guidance using the 81/250 rule, in Million AY$ by FY2002-2007. Guidance runs 0.73, 0.94, 1.13, ... while project costs reach 2.92, 4.13, 4.13. Components: CAS NSF, UF Tier2 labor, UF Tier2 h/w, Project Office NSF, Mgmt Reserve NSF, NSF (assumed).]
[Chart: DOE Project Costs (11/2001) vs. DOE-FNAL Funding Profile (4/2002), in Million AY$ by FY2002-2007. The funding profile runs 2.2, 2.9, 3.5, ... while project costs reach 9.0, 12.8, 12.8. Components: CAS DOE, UF Tier1 labor, UF Tier1 h/w, Project Office DOE, Mgmt Reserve DOE, DOE-FNAL (2002 S&C).]
FY2003 Allocation à la Nov 2001
[Chart: FY2003 Budget Allocation (11/2001), $M by category: Software Labor, Facilities Labor, Facilities Equipment, Project Office, Management Reserve; split across DOE, NSF, iVDGL.]
Total costs $5.98M: DOE $4.53M, NSF $1.45M
Europeans Achieved Major UF Funding
Funding for European User Facilities in their countries now looks significantly larger than UF funding in the U.S.
This statement is true relative to the size of their respective communities
It is in some cases even true in absolute terms!!
Given our funding situation: are we going to be a partner for those efforts?
BTW: USATLAS proposes major cuts in UF/Tier-1 “pilot flame” at BNL
Forschungszentrum Karlsruhe, Technik und Umwelt
Regional Data and Computing Centre Germany (RDCCG)
RDCCG evolution (available capacity): 30% rolling upgrade each year after 2007
[Table: RDCCG CPU (kSI95), disk (TByte) and tape (TByte) capacity by month/year from 11/2001 through 2007+, for LHC and non-LHC experiments.]
FTE evolution 2002-2005: support 5-30, development 8-10; new office building to accommodate 130 FTE in 2005
Networking evolution 2002-2005:
1) RDCCG to CERN/Fermilab/SLAC (permanent point-to-point): 1 GBit/s → 10 GBit/s; 2 GBit/s could be arranged on a very short timescale
2) RDCCG to general Internet: 34 MBit/s → 100 MBit/s; current situation, generally less affordable than 1)
How About The Others: DE
How About The Others: IT

TIER1 Resources: HARDWARE

Year     FARM (SI2000)   DISKS (TB)   TAPES (TB)
2001          60,000          10            10
2002         200,000          80            50
2003         900,000         120           300
2004       1,550,000         192           600
2005       3,100,000         380         2,000
2006       4,000,000         480         3,000

Tier2 will have almost the same amount of CPU & disks.

TIER1 Resources: PERSONNEL

Type                                    N.   New/Outsourced
Manager                                  1        -
Deputy                                   1        -
LHC Experiments Software                 2        -
Programs, Tools, Procedures              2        2
FARM Management & Planning               2        2
ODB & Data Management                    2        1
Network (LAN+WAN)                        2        2
Other Services (Web, Security, etc.)     2        1
Administration                           2        1
System Managers & Operators              6        6
Total                                   22    9 new + 6 outsourced

Tier2 personnel is of the same order of magnitude.
How About The Others: RU

Russian Tier2-Cluster
Cluster of institutional computing centers with Tier2 functionality and summary resources at the 50-70% level of the canonical Tier1 center for each experiment (ALICE, ATLAS, CMS, LHCb): analysis, simulations, users data support.

Participating institutes:
Moscow: SINP MSU, ITEP, KI, ...
Moscow region: JINR, IHEP, INR RAS
St. Petersburg: PNPI RAS, ...
Novosibirsk: BINP SB RAS

Coherent use of distributed resources by means of DataGrid technologies.
Active participation in the LCG Phase1 prototyping and Data Challenges (at the 5-10% level).

                  2002 Q1      2002 Q4      2004 Q4      2007
CPU kSI95             5            10         25-35        410
Disk TB               6            12         50-70        850
Tape TB              10            20        50-100      1,250
Network Mbps       2-3/10    20-30/15-30    155/...   Gbps/...   (LCG/commodity)

FTE: 10-12 (2002 Q1), 12-15 (2002 Q2), 25-30 (2004 Q4)
How About The Others: UK

John Gordon, LCG, 13th March 2002, n° 5
UK Tier1/A Status

Hardware purchase for delivery today:
156 dual 1.4 GHz, 1 GB RAM, 30 GB disks (312 CPUs)
26 disk servers (dual 1.266 GHz), 1.9 TB disk each
Expand the capacity of the tape robot by 35 TB

Current EDG testbed setup:
14 dual 1 GHz PIII, 500 MB RAM, 40 GB disks
Compute Element (CE), Storage Element (SE), User Interfaces (UI), Information Node (IN), + Worker Nodes (WN)

+ Central facilities (non-Grid): 250 CPUs, 10 TB disk, 35 TB tape (capacity 330 TB)

John Gordon, LCG, 13th March 2002, n° 8
Projected Staff Effort [SY]

Area                          GridPP                               @CERN         CS
WP1 Workload Management       0.5   [IC]                           2.0 [IC]
WP2 Data Management           1.5++ [Ggo]                          1.0 [Oxf]
WP3 Monitoring Services       5.0++ [RAL, QMW]                     1.0 [HW]
Security                      ++    [RAL]                          1.0 [Oxf]
WP4 Fabric Management         1.5   [Edin., L'pool]
WP5 Mass Storage              3.5++ [RAL, L'pool]
WP6 Integration Testbed       5.0++ [RAL/M'cr/IC/Bristol]
WP7 Network Services          2.0   [UCL/M'cr]                     1.0 [UCL]
WP8 Applications              17.0
  ATLAS/LHCb (Gaudi/Athena)   6.5   [Oxf, Cam, RHUL, B'ham, RAL]
  CMS                         3.0   [IC, Bristol, Brunel]
  CDF/D0 (SAM)                4.0   [IC, Ggo, Oxf, Lanc]
  BaBar                       2.5   [IC, M'cr, Bristol]
  UKQCD                       1.0   [Edin.]
Tier1/A                       13.0  [RAL]
Total                         49.0++                               10.0 -> 25.0  6.0   = 80++
How About The Others: FR

LCG workshop, 13 March 2002, Denis Linglin
CC-IN2P3

ONE computing centre for IN2P3-CNRS & DSM-CEA (HEP, astroparticle, NP, ...)
National: 18 laboratories, 40 experiments, 2,500 people/users
International: Tier-1 / Tier-A status for several US, CERN and astroparticle experiments

~700 CPUs, 20 kSI-95; 40 TB disk
0.5 PBytes; databases, hierarchical storage
Network & QoS; custom services “à la carte”
45 people

Budget: ~6-7 M€/year, plus ~2 M€ for personnel
Compared to European efforts the US CMS UF efforts are very small
In FY2002 the US CMS Tier-1 is sized at 4 kSI95 CPU and 5 TB storage
The Tier-1 effort is 5.5 FTE; in addition there are 2 FTE CAS and 1 FTE Grid
S&C baseline 2003/2004: the Tier-1 effort needs to be at least $1M/year above FY2002 to sustain the UF R&D and become a full part of the LHC Physics Research Grid
Need some 7 additional FTEs and more equipment funding at the Tier-1
Part of this effort would go directly into user support
Essential areas are insufficiently covered now and need to be addressed in 2003 at the latest:
Fabric management • Storage resource mgmt • Networking • System configuration management • Collaborative tools • Interfacing to Grid i/s • System management & operations support
This has been strongly endorsed by the S&C baseline review of Nov 2001
All European RCs (DE, FR, IT, UK, even RU!) have support at this level of effort
FY2002 - FY2004 Are Critical in the US
The U.S. User Facilities Will Seriously Fall
Behind European Tier-1 Efforts
Given The Funding Situation!

To Keep US Leadership and
Not Put US-Based Science at a Disadvantage,
Additional Funding Is Required:
at least $1M/year at Tier-1 Sites
LHC Computing Grid Project
$36M project 2002-2004, half equipment, half personnel
“Successful” RRB
Expect to ramp to >30 FTE in 2002, and ~60 FTE in 2004
About $2M/year equipment
e.g. the UK delivers 26.5% of LCG funding AT CERN ($9.6M)
US CMS has requested $11.7M IN THE US, plus CAS $5.89M
Current allocation (assuming CAS, iVDGL) would be $7.1M IN THE US
Largest personnel fraction in LCG Applications Area; “all” personnel to be at CERN
“People staying at CERN for less than 6 months are counted at a 50% level, regardless of their experience.”
CCS will work on LCG AA projects; US CMS will contribute to LCG
This brings up several issues that US CMS S&C should deal with
Europeans have decided to strongly support the LCG Application Area
But at the same time we do not see more support for the CCS efforts
CMS and US CMS will have to do, at some level, a rough accounting of LCG AA vs CAS and LCG facilities vs US UF
Impact of LHC Delay
Funding shortages in FY2001 and FY2002 have already led to significant delays
Others have done more: we are seriously understaffed and do not do enough now
We lack 7 FTEs already this year, and can start hiring only in FY2003
This has led to delays and will further delay our efforts
Long-term: unknown; predictions of equipment costs are too uncertain to evaluate possible cost savings from a delay of roughly a year
However, schedules become more realistic
Medium term: major facilities (LCG) milestones shift by about 6 months
1st LCG prototype grid moved to end of 2002 --> more realistic now
End of R&D moves from end 2004 to mid 2005
Detailed schedule and work plan expected from the LCG project and CMS CCS (June)
No significant overall cost savings for the R&D phase
We are already significantly delayed, and not even at half the effort of what other countries are doing (UK, IT, DE, RU!!)
Catching up on our delayed schedule is feasible if we can manage to hire 7 people in FY2003 and support this level of effort in FY2004
Major issue with lack of equipment funding
Re-evaluation of equipment deployment will be done during 2002 (PASTA)
US S&C Minimal Requirements
The DOE funding guidance for the preparation of the US LHC research program approaches adequate funding levels around when the LHC starts in 2007, but is heavily back-loaded and does not accommodate the base-lined software and computing project and the needs for pre-operations of the detector in 2002-2005.
We take up the charge to better understand the minimum requirements, and to consider non-standard scenarios for reducing some of the funding shortfalls, but ask the funding agencies to explore all available avenues to raise the funding level.
The LHC computing model of a worldwide distributed system is new and needs significant R&D. The experiments are approaching this with a series of "data challenges" that will test the developing systems and will eventually yield a system that works.
US CMS S&C has to be part of the data challenges (DC), to provide support for trigger and detector studies (UF subproject), and to deliver engineering support for CMS core software (CAS subproject).
UF Needs
The UF subproject is centered on a Tier-1 facility at Fermilab, which will be driving the US CMS participation in these Data Challenges.
The prototype Tier-2 centers will become integrated parts of the US CMS Tier-1/Tier-2 facilities.
Fermilab will be a physics analysis center for CMS. LHC physics with CMS will be an important component of Fermilab's research program. Therefore Fermilab needs to play a strong role as a Tier-1 center in the upcoming CMS and LHC data challenges.
The minimal Tier-1 effort would require at least doubling the current Tier-1 FTEs at Fermilab, and granting at least $300k of yearly funding for equipment. This level represents the critical threshold.
The yearly costs for this minimally sized Tier-1 center at Fermilab would approach $2M after an initial $1.6M in FY03 (hiring delays). The minimal Tier-2 prototypes would need $400k support for operations, the rest would come out of iVDGL funds.
CAS Needs
Ramping down the CAS effort is not an option, as we would face very adverse effects on CMS. CCS manpower is now even more needed to be able to drive and profit from the new LCG project; there is no reason to believe that the LCG will provide a "CMS-ready" solution without CCS being heavily involved in the process. We can even less allow for slips or delays.
Possible savings from the new close collaboration between CMS and ATLAS through the LCG project may give some contingency to the engineering effort that is to date missing in the project plan. That contingency (which would first have to be earned) could not be released before the end of 2005.
The yearly costs of keeping the current level for CAS are about $1.7M per year (DOE $1000k, NSF $700k), including escalation and reserve.
Minimal US CMS S&C until 2005
Definition of minimal: if we can't afford even this, the US will not participate in the CMS Data Challenges and LCG Milestones in 2002-2004
For US CMS S&C the minimal funding for the R&D phase (until 2005) would include (PRELIMINARY):
Tier1: $1,600k in FY03 and $2,000k in the following years
Tier2: $400k per year from the NSF to sustain the support for Tier2 manpower
CAS: $1M from DOE and $700k from the NSF
Project Office: $300k (includes reserve)
A failure to provide this level of funding would lead to severe delays and inefficiencies in the US LHC physics program. Considering the large investments in the detectors, and the large yearly costs of the research program, such an approach would not be cost-efficient or productive.
The ramp-up of the UF to the final system, beyond 2005, will need to be aligned with the plans of CERN and other regional centers. After 2005 the funding profile seems to approach the demand.
Where do we stand?
Setup of an efficient and competent s/w engineering support for CMS
David is happy and CCS is doing well
“Proposal-driven” support for detector/PRS engineering support
Setup of a User Support organization out of UF (and CAS) staff
PRS is happy (but needs more)
“Proposal-driven” provision of resources: data servers, user cluster
Staff to provide data sets and nTuples for PRS, small specialized production
Accounts, software releases, distribution, help desk etc. pp.
Tutorials done at Tier-1 and Tier-2 sites
Implemented & commissioned a first Tier-1/Tier-2 system of RCs: UCSD, Caltech, U.Florida, U.Wisconsin, Fermilab
Shown that Grid tools can be used in “production” and greatly contribute to the success of Grid projects and middleware
Validated use of the network between Tier-1 and Tier-2: 1 TB/day!
Developing a production-quality Grid-enabled User Facility
“Impressive organization” for running production in the US
Team at Fermilab and individual efforts at Tier-2 centers
Grid technology helps to reduce the effort
Close collaboration with Grid projects infuses additional effort into US CMS
Collaboration between sites (including ATLAS, like BNL) for facility issues
Starting to address many of the “real” issues of Grids for physics:
Code/binary distribution and configuration, remote job execution, data replication
Authentication, authorization, accounting and VO services
Remote database access for analysis
What have we achieved?
We are participating in and driving a world-wide CMS production DC
We are driving a large part of the US Grid integration and deployment work, which goes beyond the LHC and even HEP
We have shown that the Tier-1/Tier-2 User Facility system in the US can work!
We definitely are on the map for LHC computing and the LCG
We also are threatened to be starved over the next years
The funding agencies have failed to recognize the opportunity for continued US leadership in this field, as others like the UK are realizing and supporting!
We are thrown back to a minimal funding level, and even that has been challenged
But this is the time when our partners at CERN will expect to see us deliver and work with the LCG
Conclusions
The US CMS S&C Project looks technically pretty sound
Our customers (CCS and US CMS users) appear to be happy, but want more
We also need more R&D to build the system, and we need to do more to measure up to our partners
We started in 1998 with some supplemental funds; we are a DOE line item now
We have received less than requested for a couple of years now, but this FY2002 the project has become bitterly under-funded, cf. the reviewed and endorsed baseline
The funding agencies have fallen short on providing funding for US S&C and on providing FA guidance for US User Facilities
The ball is in our (US CMS) court now
It is not an option to do “just a little bit” of S&C
The S&C R&D is a project: baseline plans, funding profiles, change control
It is up to US CMS to decide
I ask you to support my request to build up the User Facilities in the US
THE END
UF Equipment Costs
Detailed information on Tier-1 facility costing: see document in your handouts!
All numbers in FY2002 $k
Fiscal Year                2002   2003   2004   2005   2006   2007   Total  2008 (Ops)
1.1 T1 Regional Center        0      0      0  2,866  2,984  2,938   8,788   2,647
1.2 System Support           29     23      0     35      0      0      87      15
1.3 O&M                       0      0      0      0      0      0       0       0
1.4 T2 Regional Centers     232    240    870  1,870  1,500  1,750   6,462   1,250
1.5 T1 Networking            61     54     42    512    462    528   1,658     485
1.6 Computing R&D           511    472    492      0      0      0   1,476       0
1.7 Det. Con. Support        84     53     52      0      0      0     189       0
1.8 Local Comp. Supp.        12     95    128     23     52     23     333      48
Total                       929    938  1,584  5,306  4,998  5,239  18,992   4,446
Total T1 only               697    698    714  3,436  3,498  3,489  12,530   3,196
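As a quick sanity check on the table above, the "Total" column of each WBS line can be recomputed from the yearly figures; the slide's totals agree to within $1-2k of rounding. The sketch below (values transcribed from the table, tolerance chosen to absorb that rounding) is illustrative only:

```python
# Sanity check on the UF equipment cost table (all numbers in FY2002 $k).
# The slide's per-year figures are rounded, so row totals may differ by $1-2k.
rows = {
    "1.1 T1 Regional Center":  ([0, 0, 0, 2866, 2984, 2938], 8788),
    "1.2 System Support":      ([29, 23, 0, 35, 0, 0], 87),
    "1.4 T2 Regional Centers": ([232, 240, 870, 1870, 1500, 1750], 6462),
    "1.5 T1 Networking":       ([61, 54, 42, 512, 462, 528], 1658),
    "1.6 Computing R&D":       ([511, 472, 492, 0, 0, 0], 1476),
    "1.7 Det. Con. Support":   ([84, 53, 52, 0, 0, 0], 189),
    "1.8 Local Comp. Supp.":   ([12, 95, 128, 23, 52, 23], 333),
}
for name, (per_year, total) in rows.items():
    # allow $2k slack for rounding in the published figures
    assert abs(sum(per_year) - total) <= 2, name
print("all row totals consistent to within $2k rounding")
```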
Total Project Costs
In AY $M:

Fiscal Year           2002   2003   2004   2005   2006   2007
Project Office        0.32   0.48   0.49   0.51   0.52   0.54
  DOE                 0.15   0.31   0.32   0.33   0.34   0.35
  NSF                 0.17   0.17   0.18   0.18   0.19   0.19
Software Personnel    1.49   1.72   2.14   2.25   2.36   2.48
  DOE                 0.87   0.91   0.96   1.01   1.06   1.11
  NSF                 0.62   0.81   1.18   1.24   1.30   1.37
UF Personnel          1.14   2.26   3.00   5.26   6.99   7.89
  Tier-1, DOE         0.83   1.97   2.48   4.33   5.42   6.28
  Tier-2, NSF         0.31   0.29   0.52   0.93   1.57   1.62
UF Equipment          0.45   0.75   1.51   5.35   5.19   5.52
  Tier-1, DOE         0.45   0.70   0.71   3.44   3.50   3.49
  Tier-2, NSF         0.00   0.05   0.80   1.91   1.69   2.03
Management Reserve    0.34   0.77   0.71   1.34   1.51   1.96
  DOE                 0.23   0.64   0.45   0.91   1.03   1.44
  NSF                 0.11   0.13   0.27   0.43   0.47   0.52
Total Costs           3.73   5.98   7.86  14.71  16.57  18.39
  Total DOE           2.53   4.53   4.91  10.02  11.35  12.67
  Total NSF           1.20   1.45   2.94   4.69   5.22   5.73
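The DOE and NSF splits above should reproduce each category total year by year; they do, to within the $0.01M rounding of the published figures. A minimal check, with values transcribed from the table and a tolerance chosen to absorb that rounding:

```python
# Check that DOE + NSF splits reproduce each category total (AY $M).
# Published figures are rounded to $0.01M, so allow a small tolerance.
splits = {
    # name: (DOE per year, NSF per year, category total per year)
    "Project Office":     ([0.15, 0.31, 0.32, 0.33, 0.34, 0.35],
                           [0.17, 0.17, 0.18, 0.18, 0.19, 0.19],
                           [0.32, 0.48, 0.49, 0.51, 0.52, 0.54]),
    "Software Personnel": ([0.87, 0.91, 0.96, 1.01, 1.06, 1.11],
                           [0.62, 0.81, 1.18, 1.24, 1.30, 1.37],
                           [1.49, 1.72, 2.14, 2.25, 2.36, 2.48]),
    "UF Personnel":       ([0.83, 1.97, 2.48, 4.33, 5.42, 6.28],
                           [0.31, 0.29, 0.52, 0.93, 1.57, 1.62],
                           [1.14, 2.26, 3.00, 5.26, 6.99, 7.89]),
    "UF Equipment":       ([0.45, 0.70, 0.71, 3.44, 3.50, 3.49],
                           [0.00, 0.05, 0.80, 1.91, 1.69, 2.03],
                           [0.45, 0.75, 1.51, 5.35, 5.19, 5.52]),
    "Management Reserve": ([0.23, 0.64, 0.45, 0.91, 1.03, 1.44],
                           [0.11, 0.13, 0.27, 0.43, 0.47, 0.52],
                           [0.34, 0.77, 0.71, 1.34, 1.51, 1.96]),
}
for name, (doe, nsf, total) in splits.items():
    for d, n, t in zip(doe, nsf, total):
        # tolerance covers $0.01M rounding plus float noise
        assert abs((d + n) - t) <= 0.015, name
print("all DOE + NSF splits consistent with category totals")
```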
U.S. CMS Tier-1 RC Installed Capacity

Fiscal Year             2002   2003   2004   2005    2006    2007
Simulation CPU (SI95)  2,000  3,000  4,000  7,200  28,800  72,000
Analysis CPU (SI95)      750  2,100  4,000  8,000  32,000  80,000
Server CPU (SI95)         50    140    270  1,500   6,000  15,000
Disk (TB)                 16     31     46     65     260     650

Total U.S. CMS resources (Tier-1 and all Tier-2): 310,000 SI95 CPU, 1,400 TB disk.

[Chart: capacity ramp from R&D Systems (5% Data Challenge) through Prototype Systems (20% Data Challenge) to Fully Functional Facilities]

310 kSI95 today is ~10,000 PCs.
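The "310 kSI95 is ~10,000 PCs" statement implies an assumed per-box rating, which is easy to back out. A back-of-envelope sketch (the ~31 SI95/PC figure is derived here, not stated on the slide):

```python
# Back out the implied per-PC rating from "310 kSI95 is ~10,000 PCs".
# This suggests a typical 2002-era PC was counted as roughly 31 SI95.
total_si95 = 310_000   # total U.S. CMS CPU target (Tier-1 + all Tier-2)
n_pcs = 10_000         # equivalent number of PCs quoted on the slide
si95_per_pc = total_si95 / n_pcs
print(f"implied rating: ~{si95_per_pc:.0f} SI95 per PC")  # ~31 SI95 per PC
```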
Alternative Scenarios

Q: Revise the plans so that CMS and ATLAS do not have identical scope?
- Never been tried in HEP: experiments have always been competitive.
- The UF model is NOT to run a computer center, but to have an experiment-driven effort to get the physics environment in place.
- S&C is engineering support for the physics project; outsourcing engineering to a non-experiment-driven (common) project would mean a complete revision of the physics activities. This would require fundamental changes to experiment management and structure that are not in the purview of the US part of the collaboration.
- Specifically, the data challenges are not done only or primarily for the S&C project; they are conducted as a coherent effort of the physics, detector AND S&C groups, with the goal of advancing the physics, detector AND S&C efforts.
- The data challenges are why we are here. If we could not participate, there would be no point in an experiment-driven UF.
Alternative Scenarios

Q: Are Tier-2 resources spread too thin?
- The Tier-2 effort should be as broad as we can afford. We are including university (non-funded) groups, like Princeton.
- If the role of the Tier-2 centers were just to provide computing resources, we would not distribute them but would concentrate on the Tier-1 center. Instead, the model is to put some resources at the prototype T2 centers, which allows us to pull in additional resources at those sites. This model seems to be rather successful.
- iVDGL funds are being used for much of the effort at the prototype T2 centers. Hardware investments at the Tier-2 sites up to now have been small. The project planned to fund 1.5 FTE at each site (this funding is not yet there). In CMS we see several additional FTE at those sites, coming out of the base program and attracted from the CS and other communities through involvement in Grid projects.
Alternative Scenarios

Q: Should additional software development activities be combined?
- This will certainly happen. Concretely, we have already started to plan the first large-scale ATLAS-CMS common software project: the new persistency framework. Do we expect significant savings in manpower? These could be on the order of 20-30%, if the efforts could be closely managed. However, the management is not in US hands, but in the purview of the LCG project. Also, this very project is ADDITIONAL effort that was not necessary when Objectivity was meant to provide the persistency solution.
- Generally, we do not expect very significant changes in the estimates of the total engineering manpower required to complete the core software effort. The possible savings would give a minimal contingency to the engineering effort that is to date missing in the project plan -> to be earned first, then released in 2005.
Alternative Scenarios

Q: Are we losing, or are there real cost benefits?
- Any experiment that does not have a kernel of people to run the data challenges will lose significantly.
- The commodity is people, not equipment.
- Sharing of resources is possible (and will happen), but we need to keep minimal R&D equipment. $300k/year for each T1 is very little funding for doing that. Below that we should just go home…
- Tier-2: the mission of the Tier-2 centers is to enable universities to be part of the LHC research program. That function will be cut in proportion to cuts in its funding.
- On separating the running of the facilities from the experiment’s effort: this is a model we are developing for our interactions with Fermilab CD -- this is the ramping to “35 FTE” in 2007, not the “13 FTE” now; some services are already being “effort-reported” to CD-CMS. We have to get the structures in place to get this right; there will be overheads involved.
- I do not see real cost benefits in any of these for the R&D phase. I prefer not to discuss the model for 2007 now, but we should stay open-minded. However, if we want to approach unconventional scenarios we need to carefully prepare for them. That may start in 2003-2004?
UF + PM
Control room logbook
Code dist, dar, role for grid
T2 work
CCS schedule?
More comments on the job
Nucleation point vs T1 user community
New hires
Tony’s assignment, prod running
Disk tests, benchmarking, common work w/ BNL and iVDGL facility grp
Monitoring, ngop, ganglia, Iosif’s stuff
Mention challenges to test bed/MOP: config, certificates, installations, and help we get from grid projects: VO, ESNET CA, VDT
UF workplan