

Page 1: Jefferson Lab Site Report Kelvin Edwards Thomas Jefferson National Accelerator Facility Newport News, Virginia USA Kelvin.Edwards@jlab.org 757-269-7770

Jefferson Lab Site Report

Kelvin Edwards

Thomas Jefferson National Accelerator Facility

Newport News, Virginia USA

Kelvin.Edwards@jlab.org

757-269-7770

http://cc.jlab.org

HEPiX – October, 2004


Central Computing

• Email
  – Distracted by SPAM problem
  – Evaluated and purchased MXLogic
    • Offsite solution
    • Filters virus/spam before getting to Lab
  – Upgraded our email hardware
• Windows builds
  – Purchased MS Enterprise Agreement
  – Developed an automatic build process
  – Upgrading all of our systems to Windows XP
  – Still evaluating SP2, problems with CAD, etc.


File Server Storage

• Adaptec 2200S RAID and Linux XFS
  – Linux kernel 2.6 and Adaptec firmware (build 7244)
    • It doesn’t work (I/O errors, etc.)
  – RedHat EL3 WS kernel works fine, but no XFS support
  – Tested ext3 performance
    • Unacceptable (20 MB/s read, 34 MB/s write)
    • XFS performance (approx. 100 MB/s read/write)
  – Dropped back to prior Adaptec BIOS and 2.6 kernel works fine


File Server Storage (cont)

• Purchased 2 StorageTek B280 systems
  – 14 TB of disk space
  – 4 Sun V210 head units
  – Stable, but slow, NFS performance
    • Aggregate: 6 MB/s write, 63 MB/s read
    • Each node: 0.13 MB/s write, 1.4 MB/s read (average)
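The slide does not say how many client nodes took part in the test, but the aggregate and per-node figures hint at it. A minimal sketch of that arithmetic in Python, assuming the per-node averages are simply the aggregate rate split evenly across the nodes:

```python
# Throughput figures from the slide, in MB/s.
AGGREGATE = {"write": 6.0, "read": 63.0}
PER_NODE = {"write": 0.13, "read": 1.4}


def implied_nodes(aggregate_mb_s: float, per_node_mb_s: float) -> float:
    """Estimate the client count, assuming an even split of the aggregate rate."""
    return aggregate_mb_s / per_node_mb_s


# One estimate per direction; the two should roughly agree if the assumption holds.
estimates = {op: implied_nodes(AGGREGATE[op], PER_NODE[op]) for op in AGGREGATE}

for op, nodes in estimates.items():
    print(f"{op}: about {nodes:.0f} nodes")
```

Both directions point to roughly 45 client nodes, which suggests the two sets of figures come from the same test run.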


File Server Storage (cont)

• Evaluating 10 TB Panasas system
  – Tested 2 protocols (directFLOW and NFS)
  – No directFLOW problems
  – NFS finally stable at version 2.1.4c
  – Good performance with either
    • Aggregate: 160-185 MB/s write, 100-180 MB/s read
    • Each node: 3.5-5 MB/s write, 2.5-4.5 MB/s read
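To put these per-node rates next to the StorageTek B280 NFS figures reported earlier (0.13 MB/s write, 1.4 MB/s read per node), the ratio can be worked out directly. This is only a rough sketch: it assumes both tests used comparable clients and workloads, which the slides do not confirm.

```python
# Per-node throughput in MB/s, taken from the slides.
STORAGETEK_NODE = {"write": 0.13, "read": 1.4}
# Panasas figures are given as (low, high) ranges.
PANASAS_NODE = {"write": (3.5, 5.0), "read": (2.5, 4.5)}


def speedup_range(baseline: float, candidate: tuple) -> tuple:
    """Return the (low, high) ratio of the candidate range to the baseline rate."""
    low, high = candidate
    return low / baseline, high / baseline


speedups = {
    op: speedup_range(STORAGETEK_NODE[op], PANASAS_NODE[op])
    for op in STORAGETEK_NODE
}

for op, (low, high) in speedups.items():
    print(f"{op}: {low:.1f}x to {high:.1f}x per node")
```

Under that assumption the gain is much larger for writes than for reads, which matches the slides' description of the StorageTek NFS performance as slow mainly on the write side.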


Jasmine Changes

• Jasmine is JLab’s mass storage system (disk + tape); it stores ~1 PB and can routinely move 20 TB/day.
• Disk cache system recently rewritten for performance and reliability
  – I/O load spread out over pool of many disk servers
  – Files belong to file groups (per experiment) with quotas
  – Quotas may be exceeded if there is enough disk space; allows more flexible use of disk
  – Files deleted from servers in a modified LRU fashion
  – Files may be pinned until used by the batch farm
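The cache rules above (soft per-group quotas, pinning, LRU-style deletion) can be illustrated with a toy model. This is not Jasmine's code: the slides do not say how its LRU is modified, so the sketch assumes one plausible variant in which files from over-quota groups are deleted first. The class and method names are hypothetical.

```python
from __future__ import annotations

from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class CachedFile:
    group: str
    size: int
    pinned: bool = False


class DiskCache:
    """Toy cache with soft per-group quotas, pinning, and LRU-style eviction."""

    def __init__(self, capacity: int, quotas: dict):
        self.capacity = capacity
        self.quotas = quotas
        self.files = OrderedDict()  # name -> CachedFile, least recently used first

    def used(self, group: str | None = None) -> int:
        """Space used by one group, or by the whole cache if group is None."""
        return sum(f.size for f in self.files.values()
                   if group is None or f.group == group)

    def touch(self, name: str) -> None:
        self.files.move_to_end(name)  # mark as most recently used

    def unpin(self, name: str) -> None:
        self.files[name].pinned = False

    def _victim(self) -> str | None:
        unpinned = [n for n, f in self.files.items() if not f.pinned]
        # Assumed "modification": over-quota groups lose files first.
        # A group with no configured quota is treated as having quota 0.
        over = [n for n in unpinned
                if self.used(self.files[n].group)
                > self.quotas.get(self.files[n].group, 0)]
        candidates = over or unpinned
        return candidates[0] if candidates else None

    def add(self, name: str, group: str, size: int, pinned: bool = False) -> bool:
        """Add a file, evicting until it fits; quotas never block an add."""
        while self.used() + size > self.capacity:
            victim = self._victim()
            if victim is None:
                return False  # only pinned files remain; nothing can be freed
            del self.files[victim]
        self.files[name] = CachedFile(group, size, pinned)
        return True


cache = DiskCache(capacity=100, quotas={"expA": 40, "expB": 40})
cache.add("a1", "expA", 30)
cache.add("a2", "expA", 30)               # expA exceeds its quota; allowed, space is free
cache.add("b1", "expB", 30, pinned=True)
cache.add("b2", "expB", 30)               # cache is full: a1 (oldest, over-quota group) goes
print(list(cache.files))
```

In this toy version a failed add may already have evicted some files; a real system would presumably check feasibility first.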


Jasmine Changes (2)

• New programmatic interfaces for
  – Batch Farm (Auger)
  – Other services that need to move files (SRM, DAQ, LQCD disk cache)
• More reliance on MySQL database; concurrency and load are challenging
• Writing 9940B tapes
• Experiment data rates now ~30 MB/s
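For scale, the ~30 MB/s experiment rate can be converted to a daily volume and set against the 20 TB/day that Jasmine is said to move routinely. A small sketch, assuming decimal units (1 TB = 1,000,000 MB) and rates sustained around the clock, neither of which the slides state:

```python
SECONDS_PER_DAY = 24 * 60 * 60
MB_PER_TB = 1_000_000  # decimal units; an assumption, the slides do not specify


def mb_per_s_to_tb_per_day(rate_mb_s: float) -> float:
    """Daily volume if the rate were sustained for a full day."""
    return rate_mb_s * SECONDS_PER_DAY / MB_PER_TB


def tb_per_day_to_mb_per_s(volume_tb: float) -> float:
    """Average rate needed to move the given volume in one day."""
    return volume_tb * MB_PER_TB / SECONDS_PER_DAY


daq_tb_per_day = mb_per_s_to_tb_per_day(30)    # experiment data rate from this slide
jasmine_mb_per_s = tb_per_day_to_mb_per_s(20)  # routine movement from the earlier slide

print(f"30 MB/s sustained is about {daq_tb_per_day:.2f} TB/day")
print(f"20 TB/day averages about {jasmine_mb_per_s:.0f} MB/s")
```

On these assumptions, incoming experiment data is a small fraction of the total daily traffic, with the rest presumably coming from the batch farm and other services.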


Auger Changes

• Auger is JLab’s batch farm management system.
• Uses LSF to run jobs; keeps accounting in a database for web or command-line presentation.
• Users can submit thousands of jobs using a compact job description that includes file retrieval and storage.
• Interfaces with Jasmine to stage files to disk before the job runs on the farm, to keep CPUs busy
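The idea of a compact job description that fans out into many jobs, each held until its input is on disk, can be sketched as follows. The description format, field names, and paths are all hypothetical; the slides do not show Auger's actual syntax or its interface to Jasmine.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    command: str
    stage_in: str   # file to retrieve from mass storage before the job runs
    stage_out: str  # where the result is stored afterwards


def expand(description: dict) -> list[Job]:
    """Expand one compact description into one job per input file."""
    return [
        Job(
            command=description["command"],
            stage_in=path,
            stage_out=f"{description['output_dir']}/{path.rsplit('/', 1)[-1]}.out",
        )
        for path in description["inputs"]
    ]


def ready_to_run(jobs: list[Job], staged: set) -> list[Job]:
    """Only jobs whose input is already on disk are released to the farm."""
    return [job for job in jobs if job.stage_in in staged]


# Hypothetical description: one command applied to 1000 input files.
description = {
    "command": "analyze",
    "output_dir": "/mss/out",
    "inputs": [f"/mss/raw/run{n:04d}.dat" for n in range(1, 1001)],
}

jobs = expand(description)
staged = {"/mss/raw/run0001.dat", "/mss/raw/run0002.dat"}
ready = ready_to_run(jobs, staged)
print(len(jobs), "jobs described,", len(ready), "ready to run")
```

Holding a job back until its file is staged means a CPU slot is never occupied by a job that is only waiting on tape.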


Jasmine & Auger Web Interface

• Java Server Pages


Projects

• Email upgrade
  – Still evaluating software/hardware
• Desktop systems
  – MacOS-X
  – Linux, Unix
  – Windows
• Power/Cooling issues
  – Reached limit of current Computer Room
  – New Computer Center to open in Jan 2006
  – Increased power requirements for 800 MHz FSB systems
    • 1.3 A to 2.1 A (single CPU)
    • 1.6 A to 2.8 A (dual CPU)
  – Shutdown problems with non-ACPI enabled systems
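The current figures above translate into a sizeable relative increase per system. A short sketch of the arithmetic; the system count in the last step is a made-up illustration, not a figure from the slides.

```python
# Current draw per system in amperes (before, after), from the slide.
DRAW = {"single CPU": (1.3, 2.1), "dual CPU": (1.6, 2.8)}


def percent_increase(before: float, after: float) -> float:
    """Relative increase in current draw, as a percentage."""
    return (after - before) / before * 100


def added_current(count: int, before: float, after: float) -> float:
    """Extra amperes needed if `count` systems move from the old to the new draw."""
    return count * (after - before)


increases = {kind: percent_increase(*amps) for kind, amps in DRAW.items()}

for kind, pct in increases.items():
    print(f"{kind}: +{pct:.0f}%")

# Hypothetical example: 100 dual-CPU systems replaced one for one.
extra = added_current(100, *DRAW["dual CPU"])
print(f"100 dual-CPU systems: about {extra:.0f} A more")
```

That is roughly a 62% rise for single-CPU and 75% for dual-CPU systems, which helps explain why the existing computer room has reached its limit.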