CERN IT Department
CH-1211 Genève 23
Switzerland
www.cern.ch/it
Summary of the HEPiX Autumn 2013 Meeting
Arne Wiebalck
Afroditi Xafi
Thomas Oulevey
CERN ITTF
November 22, 2013
Wiebalck, Xafi, Oulevey: Summary of the HEPiX Autumn 2013 Meeting - 2
Outline
• Miscellaneous
• Site reports
• Storage
• Basic IT Services
• Computing & Batch Systems
• IT facilities
• End User Services
• Clouds & Virtualisation
• Networking & Security
Arne
Afroditi
Thomas
HEPiX – www.hepix.org
• Global organization of service managers and support staff providing computing facilities for HEP community
• Participating sites include BNL, CERN, DESY,
FNAL, IN2P3, NIKHEF, RAL, SLAC, TRIUMF …
• Meetings are held twice per year
– Spring: Europe, Autumn: U.S./Asia
• Exchange of experiences, reports on recent work, work in progress & future plans
– Usually no showing-off
Next HEPiX Meetings
• Spring 2014
– LAPP, Annecy, France
– May 19 – May 23, 2014
• Autumn 2014
– University of Nebraska (NE), U.S.
– Final approval needed, dates to be determined
• Spring 2015
– U.K. discussed as an option
HEPiX Autumn 2013
• Oct 28 - Nov 1 at U Michigan, Ann Arbor (MI)
– Very well organized, pretty rich program
– Network access: eduroam (as in Bologna)
• 115 (!) registered participants
– Europe: 48, U.S./Canada: 47, Asia: 3, Australia: 2 (CERN: 13)
– Many first timers, several North American WLCG Tier-2 universities
– DoE labs could mostly participate, only a few cancellations (ZFS)
– 15 participants from 9 companies
• 65 presentations from 35 institutes
– 26 hours of presentations
– Many offline discussions
• Sponsors: WD, UMICH, DDN, NetApp, and Univa
Updates from the WGs (1)
• Storage
– WG terminated, no summary as Andrei could not participate
• Batch
– WG terminated, updates to Wiki will continue
• IPv6
– Big ISPs move to IPv6 (CH: >10% of Google traffic already via IPv6)
– CERN seems well prepared, some smaller labs have not even started
– IPv6 support in batch systems?
– A lot of testing ongoing, including the experiments, test bed growing
– https://indico.cern.ch/getFile.py/access?contribId=26&sessionId=2&resId=1&materialId=slides&confId=247864
Updates from the WGs (2)
• Benchmarking
– New SPEC CPU benchmark suite planned for Oct 2014
– Plan is to start working with the experiments early (to identify apps to validate)
• Bit preservation
– New working group led by CERN (German Cancio) and DESY (Dimitry Ozerov)
– Follow-up on DPHEP presentation from J. Shiers during Bologna meeting
– Focus on technical advice on bit preservation
– https://indico.cern.ch/getFile.py/access?contribId=45&sessionId=3&resId=1&materialId=slides&confId=247864
• Configuration Management
– No update (chairs could not participate)
• Energy efficiency
– On hold for now, little feedback, no interest or no resources?
– To be re-discussed in Annecy
Site reports (1)
• Configuration Management (Puppet) “hot topic”
– Sites come from Rocks, Quattor, home-grown scripts, …
– Interesting: master-less Puppet at FNAL
– Other sites discuss similar topics as we do (workflow, secrets, …)
– Little synergy in the community so far, WG activity needed!
• Batch system reviews ongoing
– Univa GridEngine & HTCondor take the lead (SLURM did not survive testing at various sites)
– IPv6 and job authentication remain open issues
• Broad use of cloud services & virtualization
– Clouds move into production everywhere
– Complete virtualization of services (e.g. AFS at UMICH)
Site reports (2)
• “Dropbox”-like service at GridKA
– For 55’000 users from several universities (10GB quota)
– Powerfolder was picked as their solution
• Lustre/Hadoop established at various sites
– Lustre: GSI (10PB), IHEP (3PB), FNAL (0.2PB), JLAB, …
– Hadoop: smaller sites, PB installations
• Interest in & investigations around Ceph
– Mostly for OpenStack VMs, but also other use cases (RBD), backend for dCache, NFS replacement, CASTOR complement …
– Most sites still at an early stage
Site reports (3)
• Scientific Linux 6
– Many sites finished migration (of batch) to SL6: RAL, GridKA, INFN, …
– Significantly improved performance on older systems
Storage (1)
• dCache update
– Support for NFS v4.1/pNFS currently being tested (looks OK)
– xroot and HTTP/WebDAV federations
– Backend testing (DDN, Ceph)
• Summary of FNAL USCMS T1 storage investigation
– Seeking solutions for online (2GB, POSIX) and nearline (1TB w/ tape)
– Currently on BlueArc & dCache & Lustre & EOS
– Goal: consolidation of storage solutions
– Evaluated: the current systems plus NetApp, GPFS, Nexsan, SnapScale
– Result: dCache for T1 production, EOS for LPC analysis, HNAS for home
Storage (2)
• Western Digital on disk drive technology
– Giving insights on difficulties when doing macroscopic mechanics on nano-scale
    • Platter ‘non-flatness’ plus unequal lube distribution can cause problems
    • Heads usually fly at 10nm and “descend” to ~2nm for actual I/O (by thermal expansion!)
– Introducing a new reliability metric (MPbF): disk failure rate dependent on load (not on power-on-hours)
– http://indico.cern.ch/getFile.py/access?contribId=37&sessionId=3&resId=3&materialId=slides&confId=247864
• 3 presentations on AFS
– OpenAFS status report
    • 1.6 released in Sep 2011, slow (server-side) uptake
    • Security advisories
– YFS: new security, new Rx (WAN), IPv4/IPv6, limits removed, …
– Summary of IPv6 investigations & survey, concluding that dual-stack seems to be the solution to the “IPv6/AFS issue”
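
As an illustration of the dual-stack conclusion above, the following is a minimal sketch (not taken from any of the presentations) of how one might check whether a service host publishes both IPv4 and IPv6 addresses. The function names and the host name in the usage note are hypothetical; only the Python standard library is used.

```python
import ipaddress
import socket


def address_families(addresses):
    """Return the set of IP versions (4 and/or 6) found in a list of address strings."""
    versions = set()
    for address in addresses:
        # Drop an optional zone suffix such as "%eth0" on link-local IPv6 addresses.
        bare = address.split("%", 1)[0]
        versions.add(ipaddress.ip_address(bare).version)
    return versions


def is_dual_stack(addresses):
    """True if the list contains at least one IPv4 and at least one IPv6 address."""
    return address_families(addresses) == {4, 6}


def resolve(host, port=None):
    """Look up all addresses published for a host (needs working DNS)."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})
```

Usage, assuming a hypothetical server name: `is_dual_stack(resolve("afs.example.org"))`. A host that is only reachable over IPv4 would yield `False`, which is the situation dual-stack deployment is meant to avoid during the transition.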
Questions?
• “We built the first data centre with heaters!” (from Ulf Tigerstedt’s presentation on building the Kajaani DC)
• “Controlling a disk head is like flying a Jumbo 747 above a highway at a distance of less than 1 inch for 5 years!” (from Amit Chattopadhyay’s presentation on Disk Load Monitoring)