Computing for AIACE, HERMES and PANDA: status and future directions
Federico Ronchetti, Laboratori Nazionali di Frascati
• Computing for Hadronic Physics
• Resource Distribution:
  • Off-site (foreign labs: JLAB, DESY, GSI)
  • On-site (INFN):
    − Computing and Network
    − Data Storage
    − Resource Management
• Future Development
• Outlook and Conclusions
Terzo Workshop sul Calcolo INFN, Cagliari, 25-28 October 2004
Nuclear and Hadronic Physics
Probing hadronic structure via electromagnetic or strong interactions.
Progress in recent years:
• new electron accelerators (100% duty cycle, high currents)
• increased detector complexity (>10^4 channels)
• higher luminosities (10^33-10^35 cm^-2 s^-1)
• faster detector readout electronics (FastBus, VME, PCI)
Massive production of raw data and detector simulation data:

  Year        Experiment      Data rate
  1998        HERMES          ~50 MB/day
  2004        CLAS            ~0.6 TB/day
  2008/2010   CLAS++/PANDA    >>1 TB/day
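As a rough illustration (my own sketch, not part of the talk), the daily rates above can be turned into yearly raw-data volumes; the 200 running days per year is an assumed duty factor.

```python
# Convert the quoted daily data rates into rough yearly volumes.
# The 200 running days per year is an assumed duty factor.
daily_rate_gb = {
    "HERMES (1998)": 0.05,                 # ~50 MB/day
    "CLAS (2004)": 600.0,                  # ~0.6 TB/day
    "CLAS++/PANDA (2008/2010)": 1000.0,    # >>1 TB/day, taken as a lower bound
}

RUNNING_DAYS_PER_YEAR = 200  # assumption

for experiment, gb_per_day in daily_rate_gb.items():
    tb_per_year = gb_per_day * RUNNING_DAYS_PER_YEAR / 1000.0
    print(f"{experiment}: ~{tb_per_year:.2f} TB/year")
```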
Off-Site Data Production (AIACE/CLAS)
[Diagram: JLab data flow]
• Batch and interactive Linux farm: ~15,000 SPECint95, 180 nodes for the 3 JLab Halls
• DAQ: CLAS (40,000 channels), PPC readout (VME/FastBus), Sun UltraSparc, ~1 TB disk
• 2 StorageTek Powderhorn tape robots (6,000 tapes)
Off-Site Data Production (HERMES)
[Diagram: DESY data flow]
• HERMES (30,000 channels), readout modules
• Linux farm: 50 CPUs
• 1 StorageTek Powderhorn tape robot
Off-Site Data Processing and Transfers
Off-Site Computing Resources: AIACE mini farm @ JLAB (LNF + Genova)
• 8 P-IV Xeon HT CPUs (2.8-3.2 GHz), 2 GB RAM/node
• 3 TB RAID5 SATA file servers
Data Types:
• AIACE (GE/LNF):
  − skimmed (filtered) data files
  − production of GEANT simulation files (local farms)
  − final n-tuples, 1-2 TB/experiment
  − example: the dpn channel in CLAS, 1 TB @ LNF (1999 ≈ 40% of 2004)
• HERMES (mainly LNF):
  − 4 calorimeter calibration data sets per year (0.2 TB)
  − DST data files
  − production of MC files (local farm)
  − final n-tuples
Data Transfers:
• AIACE: mostly DLT tapes or Internet (5 Mbit/s); 1 TB external hard disks (1394a/b)
• HERMES: mostly network (AFS and/or SCP)
AIACE/GE: Computing and Network Resources
File Server: Sun Enterprise ('96), user disk space, UNIX services (NIS)
Linux Farm: 6x2 Pentium III ('00) + 4x2 Pentium IV Xeon (2003)
  − Linux nodes clustered via DQS (2004)
  − plan to move to PBS
Storage System:
  − Disk: 1 TB RAID5 SCSI ('00)
  − Tape: 7-slot DLT tape library ('00)
Network: cluster and workstations on a 100 Mbit/s LAN
AIACE/GE: Computing Overview
[Diagram: 100 Mbit/s switched LAN connecting the Linux farm, the Sun server (user accounts, application software), user PCs, and RAID mass storage (internal/external SCSI)]

System usage:
• Data analysis
• Data transfers: 5 Mbit/s WAN
• Mail and web access
Data Transfers
• DLT type IV 35/70 GB
• Medium-size runs (150 GB): Internet connection with JLab (~2 weeks; see the estimate below)
• With improved long-distance bandwidth, all skimmed files could be moved over the network
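A quick consistency check of these numbers (my own arithmetic, not part of the talk): moving 150 GB in roughly two weeks corresponds to an effective sustained throughput of about 1 Mbit/s, well below the nominal 5 Mbit/s WAN link.

```python
# Transfer time for a 150 GB "medium size run" at a given sustained throughput.
def transfer_days(size_gb: float, throughput_mbit_s: float) -> float:
    bits = size_gb * 1e9 * 8                      # payload in bits
    seconds = bits / (throughput_mbit_s * 1e6)    # ideal transfer time
    return seconds / 86400

print(f"{transfer_days(150, 5.0):.1f} days")  # ~2.8 days at the nominal 5 Mbit/s
print(f"{transfer_days(150, 1.0):.1f} days")  # ~13.9 days, i.e. the quoted ~2 weeks
```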
AIACE & HERMES/LNF: Computing

Storage: NAS Procom Netforce 1750
• 2.2 TB SCSI HW RAID5 (Mylex)
• single P-III CPU, 3 GB RAM, running a proprietary FreeBSD-based kernel
• Gigabit Ethernet NICs
• easy management: WWW and telnet/ssh interface
Cluster:
• Computing nodes: 12 HT 2.4 GHz P-IV Xeon (2 GB RAM/unit)
• Disk storage: 2x32 GB Ultra 320 SCSI local disks (OS only); critical data reside on the NAS (higher redundancy, easy sharing)
• Network: dual Gigabit Ethernet NICs
  − private VLAN: optimized for NAS transfers and batch job execution
  − public: user interactive jobs
Software:
• OS: RedHat Linux 9.0 (last free version of RedHat Linux!); possible migration to Red Hat Enterprise Linux (CLAS)
• Clustering: Altair openPBS queuing system (job-submission sketch below)
• Backup: 10-slot / 100 GB LTO SCSI autoloader (NAS-controlled, not optimal); upgrade: backup server node with mirroring and autoloader control
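As an illustration of how analysis jobs might be fed to such an openPBS queue, here is a minimal sketch (not taken from the farm's actual configuration); the job name, queue name `prod`, resource request and the analysis command are all assumptions.

```python
# Minimal sketch: submit a batch job to an openPBS/Torque-style queue via qsub.
# Queue name, resource limits and the analysis command are illustrative only.
import subprocess
import tempfile

job_script = """#!/bin/sh
#PBS -N clas_skim
#PBS -q prod
#PBS -l nodes=1:ppn=1,walltime=12:00:00
cd $PBS_O_WORKDIR
./analyze_skim input.dat output.ntup   # placeholder analysis command
"""

# Write the job script to a temporary file and hand it to qsub;
# qsub prints the job identifier assigned by the PBS server.
with tempfile.NamedTemporaryFile("w", suffix=".pbs", delete=False) as f:
    f.write(job_script)
    script_path = f.name

subprocess.run(["qsub", script_path], check=True)
```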
Resource Access (LNF)
User access to computing resources: integration between Unix and Windows; minimization of installation and management of user workstations.

Access via X11/RDP:
• Linux-based thin clients
• cloned PCs running Linux
• maximum integration between Linux and Windows environments: password synchronization (SFU NIS), shared Windows/Unix homes (NAS)
• reduction of the number of Windows workstations (faster obsolescence, tougher to clone, prone to virus spreading, need X11 emulation)
Server side:
• server nodes: 4x 2.4 GHz P-IV Xeon HT CPUs with 4 GB RAM
• OS: Windows 2000 Server (Domain Controller), Windows 2003 (Terminal Servers)
• 6 Linux cluster nodes acting as X11 servers

Client side:
• Linux PC clients cloned via the LNF Rembo 2.0 service (→ M. Pistoni)
• Linux diskless thin clients with X11 and rdesktop
• a few public-use Windows workstations (print servers, scanners) attached to the domain
Common Network, Computing and Storage for AIACE and HERMES (LNF)
[Diagram: common LNF infrastructure]
• Catalyst 6000 switch (10/100/1000): Gigabit private VLAN, Fast Ethernet public VLAN
• 2.2 TB NAS
• 12 HT-CPU Linux cluster running PBS
• Windows 2000 Advanced Server: SFU (NIS/NFS), DC, AD, K5
• Windows 2003 nodes
• 18 thin clients + 17 PCs
Expected growth in DAQ Rate (CLAS)
[Plot: CLAS data acquisition rate vs. time, reaching ~8 kHz by the end of 2004; rough daily-volume estimate below]
Factors driving the higher volume of data production:
• detector upgrades (additional channels)
• longer runs (search for finer effects)

Next 10 years:
• energy upgrade: 12 GeV
• further increase in detector complexity
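To make the link between acquisition rate and data volume explicit, a back-of-the-envelope sketch (not from the talk; the event size and daily live time are assumed values that roughly reproduce the ~0.6 TB/day quoted earlier for CLAS in 2004):

```python
# Daily raw-data volume implied by the DAQ event rate.
# Event size (~1 kB) and 20 h/day of effective data taking are assumptions.
EVENT_RATE_HZ = 8_000            # ~8 kHz by the end of 2004
EVENT_SIZE_BYTES = 1_000         # assumed average event size
LIVE_SECONDS_PER_DAY = 20 * 3600

daily_tb = EVENT_RATE_HZ * EVENT_SIZE_BYTES * LIVE_SECONDS_PER_DAY / 1e12
print(f"~{daily_tb:.2f} TB/day")  # ~0.58 TB/day, close to the quoted ~0.6 TB/day
```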
AIACE/HERMES Upgrade Plans

AIACE/GE (search for 5q, GGDH, ...):
• DISK (homes, backup and DST data storage):
  − +8 TB NAS/SATA (GE: +3 TB 2005, +5 TB 2006)
  − +2 TB NAS/SATA (JLAB: +1 TB 2005, +1 TB 2006)
• CPU:
  − +12 P-IV Xeon CPUs or better (12 P-III CPUs, 2005)
  − +2 Linux P-IV Xeon CPUs (3 Sun CPUs, 2004, CNSIII)
  − +4 Linux P-IV Xeon CPUs (JLAB: N P-II/III CPUs, 2004, sezione)
• NETWORK: upgrade to Gigabit Ethernet (2004, sezione)
AIACE/LNF (new ST counter, search for 5q, SSA):
• DISK (homes, backup and DST data storage):
  − +6 TB NAS/SCSI/SATA (+1 TB 2004 CNSIII, +2 TB 2005, +3 TB 2006)
  − +2 TB NAS/SATA (JLAB, 2004 CNSIII)
• Conditioning:
  − electric: 12 kW UPS 220 V, 2-rack APC InfraStruXure (2004); +12 kW, +2 racks (2005)
  − air: +12,000 BTU (2004)
• CPU:
  − +4 P-IV Xeon CPUs or better (JLAB, 4 P-III CPUs, 2004 CNSIII)
  − +2 P-IV Xeon CPUs (backup server)
HERMES/LNF (recoil detector, exclusive processes):
• DISK (EC and recoil calibration data, DST storage):
  − +5 TB NAS/EIDE (+3 TB 2005, +2 TB 2006)
• CPU:
  − +6 P-IV Xeon CPUs or better (4 ES40, 2x 6-CPU Alpha, 2006)
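Summing the disk requests listed above (my own tally of the figures on these slides), the three groups plan roughly 23 TB of additional storage over 2004-2006:

```python
# Tally of the planned disk additions (TB) listed in the upgrade slides.
planned_disk_tb = {
    "AIACE/GE":   {2005: 3 + 1, 2006: 5 + 1},       # GE NAS + JLab mini farm
    "AIACE/LNF":  {2004: 1 + 2, 2005: 2, 2006: 3},  # LNF NAS + JLab (2004 CNSIII)
    "HERMES/LNF": {2005: 3, 2006: 2},
}

total = 0
for group, by_year in planned_disk_tb.items():
    group_total = sum(by_year.values())
    total += group_total
    print(f"{group}: +{group_total} TB")
print(f"Total planned 2004-2006: +{total} TB")
```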
JLAB GRID Project
JLAB participates in the US PPDG (Particle Physics Data Grid) project: development of a high-level User Job Description Language (UJDL, in collaboration with STAR@BNL).
The PANDA experiment @ GSI
• p̄ p(N) reactions
• Initial PANDA planned acquisition rate: ~¼ PB/yr
• Use of GRID technology
PANDA-GRID:
• Flexible: uses AliEn (a lightweight GRID implementation developed for ALICE@CERN)
• Up-to-date: the new EGEE project contains parts of AliEn (and the most successful tools of the old EDG)
Main features:
• platform independent (open source)
• multiple user interfaces (command line, GUI, Portal)
• supports several authentication methods
• secure file transport and replication services
• distributed file catalog
• smart package management
GRID status:
• Smoothly functioning core system at two sites: Glasgow and GSI
• Dedicated web portal: http://panda.physics.gla.ac.uk
• Customized documentation for the PANDA VO
• Possibility of multiple installations (RH Linux, Debian, SuSE)
• Experimented with the GENIUS portal (INFN Catania)

Problems:
• file transfer glitches
• certificate authentication
• manpower resources
Outlook and Conclusions
• Large amount of production data (multi-TB), increasing detector complexity and beam energies
• Computing resources widely available:
  − high-speed IA32 clustering (→ IA64/Opteron)
  − high-speed 1 Gbit/s LANs (→ 10 Gbit/s)
  − high-availability multi-TB data storage
• Quantitative apparatus testing is difficult:
  − manpower deficit
  − new nodes become immediately crowded
Future Directions
• The hope is that GRID technology will allow the exploitation of a large amount of currently disjoint computing resources
• Next-generation experiments are already designed to make use of existing computing GRIDs
• Mature experiments have GRID-based projects but appear far from the production phase