May 23, 2007 ALICE DOE Review - Computing 1
ALICE-USA Computing
Overview of Hard and Soft Computing Resources Needed to Achieve
Research Goals
1. Calibration
2. Transfer
3. Reconstruction
4. Storage
5. Analysis
6. Simulations
ALICE Computing models and projections to be used where possible
EmCal Offline - Status and Plans
• Geometry
– Structures implemented
– Revisit following installation
• Data
– Raw and Event Summary Data structures defined
• Alignment
– Alignment tools implemented
– Revisit after survey (installation)
• Calibration
– Data structures defined
– Procedures to be implemented
• Reconstruction
– First-iteration clusterizer implemented
– Optimization underway
• PID
– Preliminary γ/h discrimination and π0 ID implemented
– Electron ID under study
Offline FTE Projections
• Totals represent only support for Physics Analysis
– Current effort from LBNL, WS, Yale, UH
• Some positions filled outside DOE-NP
– Current Offline Coordinators: Jennifer Klay (CalPoly-NSF), Gustavo Conesa Balbastre (Frascati)

Task                FY08  FY09  FY10  FY11  FY12
Calibrations        0.25  0.50  0.50  0.25  0.25
Trigger             0.25  0.25  0.25  0.25  0.25
Reconstruction      0.50  0.50  0.50  0.50  0.50
Simulation          0.50  0.50  0.50  0.50  0.50
Analysis            0.00  0.00  0.25  0.50  0.50
Offline Coordinator 0.25  0.25  0.25  0.25  0.25
Total               1.75  1.75  2.25  2.25  2.25
Computing Hardware Requirements
• ALICE Computing Model submitted to WLCG 2007
• Based on LHC data projections × reconstruction times
• ALICE-USA contribution ~8% (40/500 Ph.D. fraction)
• Distribute 8% across four U.S. computing facilities
– 4.0% NERSC direct DOE-NP investment
– 1.5% TLC through UH DOE-NP proposal
– 1.5% OSC through OSU NSF proposal
– 1.0% LC through LLNL internal resource
• Network needs are modest
– total ESD data export rate is 120 MB/s (4-month shutdown)
– test data transfer in July for OSC, LC
              2008    2009    2010    2011
CPU [MSI2K]  22.54   34.12   48.46   63.00
~cpu-equiv   9,392  12,186  14,253  15,366
Disk [PB]     5.60   10.55   15.47   19.20
HPSS [PB]     6.16   13.22   20.74   28.26
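The 8% contribution and its distribution across the four facilities can be sketched as a quick calculation. A minimal sketch, assuming the yearly CPU totals from the table above and the site percentages from the bullet list (variable names are illustrative, not from the plan):

```python
# Sketch: ALICE-USA share of projected total ALICE CPU needs,
# split across the four U.S. facilities per the planned percentages.
TOTAL_CPU_MSI2K = {2008: 22.54, 2009: 34.12, 2010: 48.46, 2011: 63.00}
US_FRACTION = 0.08  # ~40/500 Ph.D. fraction

# Planned split of the 8%, in percentage points of the ALICE total
SITE_SHARE_PTS = {"NERSC": 4.0, "TLC": 1.5, "OSC": 1.5, "LC": 1.0}

for year, total in TOTAL_CPU_MSI2K.items():
    us_target = US_FRACTION * total
    split = {site: round(total * pts / 100.0, 2)
             for site, pts in SITE_SHARE_PTS.items()}
    print(f"{year}: US target {us_target:.2f} MSI2K, per-site {split}")
```

By 2011 the 8% target works out to about 5.0 MSI2K, consistent with the "8% Total" curve on the summary chart later in the deck.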
NERSC Projections (Preliminary)
P. Jacobs
• Storage provided by NERSC allocation
• 2.0 FTE support needed (RNC proposal process)
Fiscal Year                   2007     2008     2009     2010     2011

CPU
CPU added per year [MSi2K]     0.1      0.5      0.6      0.9      0.9
Cumulative CPU [MSi2K]         0.1      0.6      1.2      2.1      3.0
Cores purchased                 51      211      211      264      220
Nodes purchased (new SI2K)      13       27       27       17       14
Nodes purchased (refresh)        0        0        0     2.37      8.3
Total nodes purchased           13       27       27    19.37     22.3
Base cost                  $50,050 $103,950 $103,950  $74,575  $85,855
PDSF supplemental costs    $13,853  $28,772  $28,772  $20,641  $23,763
LBNL Purchase Burden       $10,360  $21,518  $21,518  $15,437  $17,772
Total cost CPU             $74,263 $154,240 $154,240 $110,653 $127,390

DISK
Disk added per year [PB]         0      0.1      0.2      0.3      0.3
Cumulative Disk [PB]             0      0.1      0.3      0.6      0.9
New storage purchased (TB)     0.7      102      200      300      300
Storage units purchased       0.08     6.48     9.52    11.90    10.20
Base cost                   $1,172  $91,055 $133,905 $167,381 $143,469
PDSF Supplemental costs       $688  $53,493  $78,667  $98,333  $84,286
LBNL Purchase Burden          $243  $18,848  $27,718  $34,648  $29,698
Total cost Storage          $2,103 $163,397 $240,290 $300,362 $257,453

Total Cost DOE             $76,366 $317,637 $394,529 $411,015 $384,844
Texas Learning & Computation Center
• 150 nodes Itanium and 20 nodes Opteron
• HPSS, 24/7 support
• TLC highly regarded for past achievements
• AliEn installed and running in PDC07
              2008     2009     2010     2011
UH/TLC     $91,401  $45,701  $36,389  $70,265
CPU MSI2K      0.3      0.5      0.7      0.9
Disk PB        0.1      0.2      0.2      0.3
HPSS PB        0.0      0.0      0.0      0.0
• Projected total computing to be provided in UH proposal
– costs shown are for cpu only, disk match from TLC
– 0.75 FTE match from TLC for system support
– 1.0 FTE match from UH for software development
Livermore Computing (LLNL)
• Assume 15% sustained use of 640 cpu Opteron cluster (equals 1% of aggregate cpu)
• 24/7 support, HPSS
• 700 TB green data oasis outside firewall
• AliEn software install proceeding
– requires LLNL account to submit jobs (AliEn compliant)
– discussions underway between Jason Newby, Brian Carnes (allocations), and Jeff Long (data oasis)
            2008  2009  2010  2011
LLNL/LC
CPU MSI2K    0.2   0.2   0.5   0.5
Disk PB      0.1   0.1   0.2   0.2
HPSS PB      0.1   0.1   0.2   0.3
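The stated "1% of aggregate cpu" follows from simple arithmetic. A minimal check, assuming the 640-cpu cluster size and 15% sustained-use figure from the bullet above, and taking the ~9,400 cpu-equivalent aggregate from the 2008 column of the earlier hardware-requirements table:

```python
# Sanity check: 15% sustained use of LLNL's 640-cpu Opteron cluster
# as a fraction of the ~9,400 cpu-equivalent 2008 aggregate projection.
cluster_cpus = 640
sustained_fraction = 0.15
aggregate_cpus = 9392  # 2008 cpu-equivalent projection

sustained_cpus = sustained_fraction * cluster_cpus  # 96 cpu
share = sustained_cpus / aggregate_cpus
print(f"{sustained_cpus:.0f} cpu sustained -> {share:.1%} of aggregate")
```

96 sustained cpus against a ~9,400-cpu aggregate is indeed about 1%, as the slide claims.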
Ohio Supercomputer Center (OSU)
• Supported by State of Ohio, Industrial Partners, DoD
• Current systems: 252 node Pentium, 260 node Itanium
• AliEn software running; HPSS, 24/7 support, and funding for Bjorn Neilsen on offline
• New 4212 core Opteron system in 2008
• Projections beyond 2008 in OSU NSF proposal
– $100k/yr and 1:1 match, or
– $50k/yr and 2:1:1 match via OHIO Technical Action Fund (TAF)
• Reasons to include OSU in Computing Plan
– cpu/PhD ratio above ALICE-USA average
– small scale DOE-NSF collaboration
            2008  2009  2010  2011
OSU/OSC
CPU MSI2K    0.2   0.3   0.5   0.6
Disk PB      0.1   0.2   0.2   0.3
HPSS PB      0.0   0.2   0.3   0.4
Computer Hardware Summary
• Summed projections meet/exceed 8% contribution by 2010
[Figures: Cumulative CPU (MSI2K) and Cumulative Disk (PB) vs. year, 2008-2011, showing stacked contributions from LLNL, OSC, TLC, and NERSC against the "8% Total" target.]
FTE Summary
Task               FY08  FY09  FY10  FY11  FY12
Calibrations       0.25  0.50  0.50  0.25  0.25
Trigger            0.25  0.25  0.25  0.25  0.25
Reconstruction     0.50  0.50  0.50  0.50  0.50
Simulation         0.50  0.50  0.50  0.50  0.50
Analysis           0.00  0.00  0.25  0.50  0.50
Comp. Coord.       0.25  0.25  0.25  0.25  0.25
Offline Coord.     0.25  0.25  0.25  0.25  0.25
Grid Coord.        0.50  0.25  0.25  0.25  0.25
Site Coord. (4x)   1.00  1.00  1.00  1.00  1.00
Total              3.50  3.50  3.75  3.75  3.75
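The Total row of the FTE summary can be cross-checked mechanically. A minimal sketch, with the values transcribed from the table above:

```python
# Cross-check of the FTE summary: each fiscal-year column (FY08-FY12)
# should sum to the stated Total row. Values transcribed from the table.
fte = {
    "Calibrations":     [0.25, 0.50, 0.50, 0.25, 0.25],
    "Trigger":          [0.25, 0.25, 0.25, 0.25, 0.25],
    "Reconstruction":   [0.50, 0.50, 0.50, 0.50, 0.50],
    "Simulation":       [0.50, 0.50, 0.50, 0.50, 0.50],
    "Analysis":         [0.00, 0.00, 0.25, 0.50, 0.50],
    "Comp. Coord.":     [0.25, 0.25, 0.25, 0.25, 0.25],
    "Offline Coord.":   [0.25, 0.25, 0.25, 0.25, 0.25],
    "Grid Coord.":      [0.50, 0.25, 0.25, 0.25, 0.25],
    "Site Coord. (4x)": [1.00, 1.00, 1.00, 1.00, 1.00],
}
totals = [round(sum(col), 2) for col in zip(*fte.values())]
print(totals)  # -> [3.5, 3.5, 3.75, 3.75, 3.75], matching the Total row
```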
• Most is re-direction, some effort outside DOE-SC
• Grid Coord. responsible for OSG interface
• Additional NERSC 2.0 FTE to cover PDSF site coord.
Computing Organization
Jason Newby
Risks & Mitigations
• One or more facilities fall below projections
– recruit additional facility, e.g. ORNL
– distribute load to remaining facilities
– distribute load across grid (reason for OSG work)
• EmCal Offline milestones unmet (low)
– redirect effort to offline
Summary
• EmCal Offline in respectable shape at this time
• Hardware
– PDSF investment plan underway
– UH proposal to DOE
– LC implementation making progress
– OSC proposal to NSF