Computing & Networking User Group Meeting

Roy Whitney, Andy Kowalski, Sandy Philpott, Chip Watson

17 June 2008

TRANSCRIPT

Page 1: Computing & Networking User Group Meeting

Computing & Networking User Group Meeting

Roy Whitney

Andy Kowalski

Sandy Philpott

Chip Watson

17 June 2008

Page 2: Computing & Networking User Group Meeting

Users and JLab IT

• Ed Brash is User Group Board of Directors’ representative on the IT Steering Committee.

• Physics Computing Committee (Sandy Philpott)

• Helpdesk and CCPR requests and activities

• Challenges
  – Constrained budget
    • Staffing
    • Aging infrastructure
  – Cyber Security

Page 3: Computing & Networking User Group Meeting


Computing and Networking Infrastructure

Andy Kowalski

Page 4: Computing & Networking User Group Meeting


CNI Outline

• Helpdesk

• Computing

• Wide Area Network

• Cyber Security

• Networking and Asset Management

Page 5: Computing & Networking User Group Meeting


Helpdesk

• Hours: 8am-12pm M-F
  – Submit a CCPR via http://cc.jlab.org/
  – Dial x7155
  – Send email to [email protected]

• Windows XP, Vista and RHEL5 Supported Desktops
  – Migrating older desktops

• Mac Support?

Page 6: Computing & Networking User Group Meeting


Computing

• Email Servers Upgraded
  – Dovecot IMAP Server (Indexing)
  – New File Server and IMAP Servers (Farm Nodes)

• Servers Migrating to Virtual Machines

• Printing
  – Centralized Access via jlabprt.jlab.org
  – Accounting Coming Soon

• Video Conferencing (working on EVO)

Page 7: Computing & Networking User Group Meeting

Wide Area Network

• Bandwidth
  – 10Gbps WAN and LAN backbone
  – Offsite Data Transfer Servers
    • scigw.jlab.org (bbftp)
    • qcdgw.jlab.org (bbcp)
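As a hedged illustration of how a user might move data through these gateways, the sketch below assembles typical bbftp and bbcp command lines; the user name, file paths, and tuning options are hypothetical, and exact flags should be verified against the bbftp/bbcp documentation.

```python
# Sketch: building offsite-transfer commands for the JLab gateways.
# User name, paths, and option values here are illustrative assumptions.
import subprocess  # used only if you uncomment the run line below


def bbftp_put(user, host, local, remote):
    """Assemble a bbftp command that uploads one file (not executed here)."""
    return ["bbftp", "-u", user, "-e", f"put {local} {remote}", host]


def bbcp_put(user, host, local, remote, streams=4):
    """Assemble a bbcp command using several parallel TCP streams."""
    return ["bbcp", "-s", str(streams), local, f"{user}@{host}:{remote}"]


cmd = bbftp_put("jdoe", "scigw.jlab.org", "run1234.dat", "/cache/run1234.dat")
# subprocess.run(cmd, check=True)  # uncomment on a machine with bbftp installed
print(" ".join(cmd))
```

bbcp's parallel streams are what make it attractive for the 10Gbps path; how many streams help in practice depends on the WAN round-trip time.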

Page 8: Computing & Networking User Group Meeting


Cyber Security Challenge

• The threat: sophistication and volume of attacks continue to increase.
  – Phishing Attacks
    • Spear Phishing/Whaling are now being observed at JLab.

• Federal requirements, including DOE's, for meeting the cyber security challenge call for additional measures.

• JLab uses a risk-based approach that balances achieving the mission with addressing the threat.

Page 9: Computing & Networking User Group Meeting


Cyber Security

• Managed Desktops

– Skype Allowed From Managed Desktops On Certain Enclaves

• Network Scanning

• Intrusion Detection

• PII/SUI (CUI) Management

Page 10: Computing & Networking User Group Meeting


Networking and IT Asset Management

• Network Segmentation/Enclaves
  – Firewalls

• Computer Registration
  – https://reggie.jlab.org/user/index.php

• Managing IP Addresses
  – DHCP
    • Assigns all IP addresses (most static)
    • Integrated with registration

• Automatic Port Configuration
  – Rolling out now
  – Uses registration database

Page 11: Computing & Networking User Group Meeting


Scientific Computing

Chip Watson & Sandy Philpott

Page 12: Computing & Networking User Group Meeting

Farm Evolution Motivation

• Capacity upgrades
  – Re-use of HPC clusters

• Movement to Open Source
  – O/S upgrade
  – Change from LSF to PBS

Page 13: Computing & Networking User Group Meeting


Farm Evolution Timetable

Nov 07: Auger/PBS available – RHEL3 – 35 nodes

Jan 08: Fedora 8 (F8) available – 50 nodes

May 08: Friendly-user mode; IFARML4,5

Jun 08: Production

– F8 only; IFARML3 + 60 nodes from LSF; IFARML alias

Jul 08: IFARML2 + 60 nodes from LSF

Aug 08: IFARML1 + 60 nodes from LSF

Sep 08: RHEL3/LSF->F8/PBS Migration complete

– No renewal of LSF or RHEL for cluster nodes

Page 14: Computing & Networking User Group Meeting


Farm F8/PBS Differences

• Code must be recompiled
  – 2.6 kernel
  – gcc 4

• Software installed locally via yum
  – cernlib
  – MySQL

• Time limits: 1 day default, 3 days max

• stdout/stderr to ~/farm_out

• Email notification
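The 1-day default / 3-day maximum means jobs that need more than a day must request it explicitly. A minimal sketch, assuming a plain PBS batch script (Auger adds its own job-description layer on top, so the exact directives used at JLab may differ; the queue-less script and job command are illustrative):

```python
# Sketch: generating a PBS batch script that requests the stated 3-day
# walltime maximum (the 1-day default applies when no limit is given).
# Job name and command are illustrative assumptions.

def pbs_script(job_name, command, walltime="72:00:00"):
    """Return the text of a simple PBS submission script."""
    return "\n".join([
        "#!/bin/sh",
        f"#PBS -N {job_name}",
        f"#PBS -l walltime={walltime}",  # wall-clock limit (farm max: 3 days)
        "#PBS -j oe",                    # merge stdout/stderr into one file
        command,
        "",
    ])


script = pbs_script("recon_pass1", "./analyze run1234.dat")
print(script)
# Submit with: qsub recon_pass1.pbs
```

Without the `-j oe` merge, stdout and stderr land in separate files; on the farm they end up under ~/farm_out as noted above.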

Page 15: Computing & Networking User Group Meeting


Farm Future Plans

• Additional nodes – from HPC clusters

  • CY08: ~120 4g nodes
  • CY09-10: ~60 6n nodes

– Purchase as budgets allow

• Support for 64 bit systems when feasible & needed

Page 16: Computing & Networking User Group Meeting


Storage Evolution

• Deployment of Sun x4500 “thumpers”

• Decommissioning of Panasas (old /work server)

• Planned replacement of old cache nodes

Page 17: Computing & Networking User Group Meeting


Tape Library

• Current STK “Powderhorn” silo is nearing end-of-life
  – Reaching capacity & running out of blank tapes
  – Doesn’t support upgrade to higher-density cartridges
  – Is officially end-of-life December 2010

• Market trends
  – LTO (Linear Tape Open) Standard has proliferated since 2000
  – LTO-4 is 4x the density, capacity/$, and bandwidth of 9940b:
    800 GB/tape, $100/TB, 120 MB/s
  – LTO-5, out next year, will double capacity, 1.5x bandwidth:
    1600 GB/tape, 180 MB/s
  – LTO-6 will be out prior to the 12 GeV era:
    3200 GB/tape, 270 MB/s
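The quoted LTO figures can be sanity-checked with a little arithmetic; for instance, streaming at full rate, an LTO-4 drive takes just under two hours to fill an 800 GB cartridge (decimal units assumed, 1 GB = 1e9 bytes):

```python
# Cross-check of the quoted tape figures (decimal units: 1 GB = 1000 MB).
lto4_capacity_gb = 800
lto4_rate_mb_s = 120

fill_time_h = (lto4_capacity_gb * 1000) / lto4_rate_mb_s / 3600
print(f"LTO-4 fill time at full rate: {fill_time_h:.2f} h")  # ~1.85 h

# LTO-5 as stated on the slide: double the capacity, 1.5x the bandwidth
assert 2 * lto4_capacity_gb == 1600     # GB/tape
assert 1.5 * lto4_rate_mb_s == 180      # MB/s
```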

Page 18: Computing & Networking User Group Meeting


Tape Library Replacement

• Competitive procurement now in progress
  – Replace old system, support 10x growth over 5 years

• Phase 1 in August
  – System integration, software evolution
  – Begin data transfers, re-use 9940b tapes

• Tape swap through January

• 2 PB capacity by November

• DAQ to LTO-4 in January 2009

• Old silo gone in March 2009

End result: breakeven on cost by the end of 2009!
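For scale, the 2 PB capacity target works out to a modest cartridge count at LTO-4 density (a back-of-the-envelope check using the figures quoted earlier, decimal units assumed):

```python
# How many 800 GB LTO-4 cartridges hold 2 PB? (decimal: 1 PB = 1e6 GB)
target_pb = 2
lto4_gb = 800

tapes = target_pb * 1_000_000 / lto4_gb
print(f"{tapes:.0f} LTO-4 cartridges for {target_pb} PB")  # 2500

# At the quoted $100/TB, the media cost for that capacity:
media_cost = target_pb * 1000 * 100
print(f"Media cost: ${media_cost:,}")  # $200,000
```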

Page 19: Computing & Networking User Group Meeting


Long Term Planning

• Continue to increase compute & storage capacity in the most cost-effective manner

• Improve processes & planning
  – PAC submission process
  – 12 GeV Planning…

Page 20: Computing & Networking User Group Meeting

E.g.: Hall B Requirements

Event Simulation               2012     2013     2014     2015     2016
SPECint_rate2006 sec/event     1.8      1.8      1.8      1.8      1.8
Number of events               1.00E+12 1.00E+12 1.00E+12 1.00E+12 1.00E+12
Event size (KB)                20       20       20       20       20
% Stored Long Term             10%      25%      25%      25%      25%
Total CPU (SPECint_rate2006)   5.7E+04  5.7E+04  5.7E+04  5.7E+04  5.7E+04
Petabytes / year (PB)          2        5        5        5        5

Data Acquisition               2012     2013     2014     2015     2016
Average event size (KB)        20       20       20       20       20
Max sustained event rate (kHz) 0        0        10       10       20
Average event rate (kHz)       0        0        10       10       10
Average 24-hour duty factor    0%       0%       50%      60%      65%
Weeks of operation / year      0        0        0        30       30
Network (n*10gigE)             1        1        1        1        1
Petabytes / year               0.0      0.0      0.0      2.2      2.4

1st Pass Analysis              2012     2013     2014     2015     2016
SPECint_rate2006 sec/event     1.5      1.5      1.5      1.5      1.5
Number of analysis passes      0        0        1.5      1.5      1.5
Event size out / event size in 2        2        2        2        2
Total CPU (SPECint_rate2006)   0.0E+00  0.0E+00  0.0E+00  7.8E-03  8.4E-03
Silo Bandwidth (MB/s)          0        0        900      900      1800
Petabytes / year               0.0      0.0      0.0      4.4      4.7

Total SPECint_rate2006         5.7E+04  5.7E+04  5.7E+04  5.7E+04  5.7E+04
SPECint_rate2006 / node        600      900      1350     2025     3038
# nodes needed (current year)  95       63       42       28       19
Petabytes / year               2        5        5        12       12
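The headline simulation numbers are internally consistent, which the following check confirms (assuming roughly 3.156e7 seconds in a year of continuous running; all input figures are taken from the requirements above):

```python
# Cross-check of the Hall B event-simulation requirements.
sec_per_event = 1.8          # SPECint_rate2006 sec/event
events_per_year = 1.0e12
event_size_bytes = 20e3      # 20 KB/event
stored_fraction = 0.25       # fraction stored long term, 2013 onward
seconds_per_year = 3.156e7   # assumed: ~one calendar year of running

cpu = sec_per_event * events_per_year / seconds_per_year
print(f"Total CPU: {cpu:.1e} SPECint_rate2006")      # ~5.7e+04

storage_pb = events_per_year * event_size_bytes * stored_fraction / 1e15
print(f"Stored: {storage_pb:.0f} PB/year")           # ~5 PB

nodes_2012 = round(cpu / 600)  # 600 SPECint_rate2006 per node in 2012
print(f"Nodes needed in 2012: {nodes_2012}")         # ~95
```

The shrinking node count across the years comes entirely from the assumed per-node performance growth (600 to 3038 SPECint_rate2006), since the total CPU requirement is flat.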

Page 21: Computing & Networking User Group Meeting


LQCD Computing

• JLab operates 3 clusters with nearly 1100 nodes, primarily for LQCD plus some accelerator modeling

• National LQCD Computing Project (2006-2009: BNL, FNAL, JLab; USQCD Collaboration)

• LQCD II proposal 2010-2014 would double the hardware budget to enable key calculations

• JLab Experimental Physics & LQCD computing share staff (operations & software development) & tape silo, providing efficiencies for both