Computing for AIACE, HERMES and PANDA: status and future directions
Federico Ronchetti, Laboratori Nazionali di Frascati
• Computing for Hadronic Physics
• Resource Distribution:
  • Off-site (foreign labs: JLAB, DESY, GSI)
  • On-site (INFN):
    − Computing and Network
    − Data Storage
    − Resource Management
• Future Development
• Outlook and Conclusions
Terzo Workshop sul Calcolo INFN, Cagliari, 25-28 October 2004
Nuclear and Hadronic Physics
Probing hadronic structure via electromagnetic or strong interactions.
Progress in recent years:
• new electron accelerators (100% duty cycle, high currents)
• increased detector complexity (>10^4 channels)
• higher luminosities (10^33-10^35 cm^-2 s^-1)
• faster detector readout electronics (FastBus, VME, PCI)
Massive production of raw data and detector simulation data:

  Year        Experiment      Data rate
  1998        HERMES          ~50 MB/day
  2004        CLAS            ~0.6 TB/day
  2008/2010   CLAS++/PANDA    >>1 TB/day
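As a rough illustration (my own sketch, not part of the talk), the daily rates above can be turned into yearly raw-data volumes; the 200 running days per year is an assumed duty factor.

```python
# Convert the quoted daily data rates into rough yearly volumes.
# The 200 running days per year is an assumed duty factor.
daily_rate_gb = {
    "HERMES (1998)": 0.05,                 # ~50 MB/day
    "CLAS (2004)": 600.0,                  # ~0.6 TB/day
    "CLAS++/PANDA (2008/2010)": 1000.0,    # >>1 TB/day, taken as a lower bound
}

RUNNING_DAYS_PER_YEAR = 200  # assumption

for experiment, gb_per_day in daily_rate_gb.items():
    tb_per_year = gb_per_day * RUNNING_DAYS_PER_YEAR / 1000.0
    print(f"{experiment}: ~{tb_per_year:.2f} TB/year")
```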
Off-Site Data Production (AIACE/CLAS)
[Diagram: JLab data flow]
• Batch and interactive Linux farm: ~15,000 SPECint95, 180 nodes for the 3 JLab Halls
• DAQ: CLAS (40,000 channels), PPC readout (VME/FastBus), Sun UltraSparc, ~1 TB disk
• 2 StorageTek Powderhorn tape robots (6,000 tapes)
Off-Site Data Production (HERMES)
[Diagram: DESY data flow]
• HERMES (30,000 channels), readout modules
• Linux farm: 50 CPUs
• 1 StorageTek Powderhorn tape robot
Off-Site Data Processing and Transfers
Off-Site Computing Resources: AIACE mini farm @ JLAB (LNF + Genova)
• 8 P-IV Xeon HT CPUs (2.8-3.2 GHz), 2 GB RAM/node
• 3 TB RAID5 SATA file servers
Data Types:
• AIACE (GE/LNF):
  − skimmed (filtered) data files
  − production of GEANT simulation files (local farms)
  − final n-tuples, 1-2 TB/experiment
  − example: the dpn channel in CLAS, 1 TB @ LNF (1999 ≈ 40% of 2004)
• HERMES (mainly LNF):
  − 4 calorimeter calibration data sets per year (0.2 TB)
  − DST data files
  − production of MC files (local farm)
  − final n-tuples
Data Transfers:
• AIACE: mostly DLT tapes or Internet (5 Mbit/s); 1 TB external hard disks (1394a/b)
• HERMES: mostly network (AFS and/or SCP)
AIACE/GE: Computing and Network Resources
File Server: Sun Enterprise ('96), user disk space, UNIX services (NIS)
Linux Farm: 6x2 Pentium III ('00) + 4x2 Pentium IV Xeon (2003)
  − Linux nodes clustered via DQS (2004)
  − plan to move to PBS
Storage System:
  − Disk: 1 TB RAID5 SCSI ('00)
  − Tape: 7-slot DLT tape library ('00)
Network: cluster and workstations on a 100 Mbit/s LAN
AIACE/GE: Computing Overview
[Diagram: 100 Mbit/s switched LAN connecting the Linux farm, the Sun server (user accounts, application software), user PCs, and RAID mass storage (internal/external SCSI)]

System usage:
• Data analysis
• Data transfers: 5 Mbit/s WAN
• Mail and web access
Data Transfers
• DLT type IV 35/70 GB
• Medium-size runs (150 GB): Internet connection with JLab (~2 weeks; see the estimate below)
• With improved long-distance bandwidth, all skimmed files could be moved over the network
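A quick consistency check of these numbers (my own arithmetic, not part of the talk): moving 150 GB in roughly two weeks corresponds to an effective sustained throughput of about 1 Mbit/s, well below the nominal 5 Mbit/s WAN link.

```python
# Transfer time for a 150 GB "medium size run" at a given sustained throughput.
def transfer_days(size_gb: float, throughput_mbit_s: float) -> float:
    bits = size_gb * 1e9 * 8                      # payload in bits
    seconds = bits / (throughput_mbit_s * 1e6)    # ideal transfer time
    return seconds / 86400

print(f"{transfer_days(150, 5.0):.1f} days")  # ~2.8 days at the nominal 5 Mbit/s
print(f"{transfer_days(150, 1.0):.1f} days")  # ~13.9 days, i.e. the quoted ~2 weeks
```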
AIACE & HERMES/LNF: Computing

Storage: NAS Procom Netforce 1750
• 2.2 TB SCSI HW RAID5 (Mylex)
• single P-III CPU, 3 GB RAM, running a proprietary FreeBSD-based kernel
• Gigabit Ethernet NICs
• easy management: WWW and telnet/ssh interface
Cluster:
• Computing nodes: 12 HT 2.4 GHz P-IV Xeon (2 GB RAM/unit)
• Disk storage: 2x32 GB Ultra 320 SCSI local disks (OS only); critical data reside on the NAS (higher redundancy, easy sharing)
• Network: dual Gigabit Ethernet NICs
  − private VLAN: optimized for NAS transfers and batch job execution
  − public: user interactive jobs
Software:
• OS: RedHat Linux 9.0 (last free version of RedHat Linux!); possible migration to Red Hat Enterprise Linux (CLAS)
• Clustering: Altair openPBS queuing system (job-submission sketch below)
• Backup: 10-slot / 100 GB LTO SCSI autoloader (NAS-controlled, not optimal); upgrade: backup server node with mirroring and autoloader control
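As an illustration of how analysis jobs might be fed to such an openPBS queue, here is a minimal sketch (not taken from the farm's actual configuration); the job name, queue name `prod`, resource request and the analysis command are all assumptions.

```python
# Minimal sketch: submit a batch job to an openPBS/Torque-style queue via qsub.
# Queue name, resource limits and the analysis command are illustrative only.
import subprocess
import tempfile

job_script = """#!/bin/sh
#PBS -N clas_skim
#PBS -q prod
#PBS -l nodes=1:ppn=1,walltime=12:00:00
cd $PBS_O_WORKDIR
./analyze_skim input.dat output.ntup   # placeholder analysis command
"""

# Write the job script to a temporary file and hand it to qsub;
# qsub prints the job identifier assigned by the PBS server.
with tempfile.NamedTemporaryFile("w", suffix=".pbs", delete=False) as f:
    f.write(job_script)
    script_path = f.name

subprocess.run(["qsub", script_path], check=True)
```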
Resource Access (LNF)
User access to computing resources: integration between Unix and Windows; minimization of installation and management of user workstations.

Access via X11/RDP:
• Linux-based thin clients
• cloned PCs running Linux
• maximum integration between Linux and Windows environments: password synchronization (SFU NIS), shared Windows/Unix homes (NAS)
• reduction of the number of Windows workstations (faster obsolescence, tougher to clone, prone to virus spreading, need X11 emulation)
Server side:
• server nodes: 4x 2.4 GHz P-IV Xeon HT CPUs with 4 GB RAM
• OS: Windows 2000 Server (Domain Controller), Windows 2003 (Terminal Servers)
• 6 Linux cluster nodes acting as X11 servers

Client side:
• Linux PC clients cloned via the LNF Rembo 2.0 service (→ M. Pistoni)
• Linux diskless thin clients with X11 and rdesktop
• a few public-use Windows workstations (print servers, scanners) attached to the domain
Common Network, Computing and Storage for AIACE and HERMES (LNF)
[Diagram: common LNF infrastructure]
• Catalyst 6000 switch (10/100/1000): Gigabit private VLAN, Fast Ethernet public VLAN
• 2.2 TB NAS
• 12 HT-CPU Linux cluster running PBS
• Windows 2000 Advanced Server: SFU (NIS/NFS), DC, AD, K5
• Windows 2003 nodes
• 18 thin clients + 17 PCs
Expected growth in DAQ Rate (CLAS)
[Plot: CLAS data acquisition rate vs. time, reaching ~8 kHz by the end of 2004; rough daily-volume estimate below]
Factors driving the higher volume of data production:
• detector upgrades (additional channels)
• longer runs (search for finer effects)

Next 10 years:
• energy upgrade: 12 GeV
• further increase in detector complexity
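To make the link between acquisition rate and data volume explicit, a back-of-the-envelope sketch (not from the talk; the event size and daily live time are assumed values that roughly reproduce the ~0.6 TB/day quoted earlier for CLAS in 2004):

```python
# Daily raw-data volume implied by the DAQ event rate.
# Event size (~1 kB) and 20 h/day of effective data taking are assumptions.
EVENT_RATE_HZ = 8_000            # ~8 kHz by the end of 2004
EVENT_SIZE_BYTES = 1_000         # assumed average event size
LIVE_SECONDS_PER_DAY = 20 * 3600

daily_tb = EVENT_RATE_HZ * EVENT_SIZE_BYTES * LIVE_SECONDS_PER_DAY / 1e12
print(f"~{daily_tb:.2f} TB/day")  # ~0.58 TB/day, close to the quoted ~0.6 TB/day
```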
AIACE/HERMES Upgrade Plans

AIACE/GE (search for 5q, GGDH, ...):
• DISK (homes, backup and DST data storage):
  − +8 TB NAS/SATA (GE: +3 TB 2005, +5 TB 2006)
  − +2 TB NAS/SATA (JLAB: +1 TB 2005, +1 TB 2006)
• CPU:
  − +12 P-IV Xeon CPUs or better (12 P-III CPUs, 2005)
  − +2 Linux P-IV Xeon CPUs (3 Sun CPUs, 2004, CNSIII)
  − +4 Linux P-IV Xeon CPUs (JLAB: N P-II/III CPUs, 2004, sezione)
• NETWORK: upgrade to Gigabit Ethernet (2004, sezione)
AIACE/LNF (new ST counter, search for 5q, SSA):
• DISK (homes, backup and DST data storage):
  − +6 TB NAS/SCSI/SATA (+1 TB 2004 CNSIII, +2 TB 2005, +3 TB 2006)
  − +2 TB NAS/SATA (JLAB, 2004 CNSIII)
• Conditioning:
  − electric: 12 kW UPS 220 V, 2-rack APC InfraStruXure (2004); +12 kW, +2 racks (2005)
  − air: +12,000 BTU (2004)
• CPU:
  − +4 P-IV Xeon CPUs or better (JLAB, 4 P-III CPUs, 2004 CNSIII)
  − +2 P-IV Xeon CPUs (backup server)
HERMES/LNF (recoil detector, exclusive processes):
• DISK (EC and recoil calibration data, DST storage):
  − +5 TB NAS/EIDE (+3 TB 2005, +2 TB 2006)
• CPU:
  − +6 P-IV Xeon CPUs or better (4 ES40, 2x 6-CPU Alpha, 2006)
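Summing the disk requests listed above (my own tally of the figures on these slides), the three groups plan roughly 23 TB of additional storage over 2004-2006:

```python
# Tally of the planned disk additions (TB) listed in the upgrade slides.
planned_disk_tb = {
    "AIACE/GE":   {2005: 3 + 1, 2006: 5 + 1},       # GE NAS + JLab mini farm
    "AIACE/LNF":  {2004: 1 + 2, 2005: 2, 2006: 3},  # LNF NAS + JLab (2004 CNSIII)
    "HERMES/LNF": {2005: 3, 2006: 2},
}

total = 0
for group, by_year in planned_disk_tb.items():
    group_total = sum(by_year.values())
    total += group_total
    print(f"{group}: +{group_total} TB")
print(f"Total planned 2004-2006: +{total} TB")
```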
JLAB GRID Project
JLAB participates in the US PPDG (Particle Physics Data Grid) project: development of a high-level User Job Description Language (UJDL, in collaboration with STAR@BNL).
The PANDA experiment @ GSI
• p̄ p(N) reactions
• Initial PANDA planned acquisition rate: ~¼ PB/yr
• Use of GRID technology
PANDA-GRID:
• Flexible: uses AliEn (a lightweight GRID implementation developed for ALICE@CERN)
• Up-to-date: the new EGEE project contains parts of AliEn (and the most successful tools of the old EDG)
Main features:
• platform independent (open source)
• multiple user interfaces (command line, GUI, Portal)
• supports several authentication methods
• secure file transport and replication services
• distributed file catalog
• smart package management
GRID status:
• Smoothly functioning core system at two sites: Glasgow and GSI
• Dedicated web portal: http://panda.physics.gla.ac.uk
• Customized documentation for the PANDA VO
• Possibility of multiple installations (RH Linux, Debian, SuSE)
• Experimented with the GENIUS portal (INFN Catania)

Problems:
• file transfer glitches
• certificate authentication
• manpower resources
Outlook and Conclusions
• Large amount of production data (multi-TB), increasing detector complexity and beam energies
• Computing resources widely available:
  − high-speed IA32 clustering (→ IA64/Opteron)
  − high-speed 1 Gbit/s LANs (→ 10 Gbit/s)
  − high-availability multi-TB data storage
• Quantitative apparatus testing is difficult:
  − manpower deficit
  − new nodes become immediately crowded
Future Directions
• The hope is that GRID technology will allow the exploitation of a large amount of currently disjoint computing resources
• Next-generation experiments are already designed to make use of existing computing GRIDs
• Mature experiments have GRID-based projects but appear far from the production phase