HEPiX Workshop Summary
Brookhaven National Laboratory, October 18-22 2004
http://www.rhic.bnl.gov/hepix/agenda.shtml
Chris Brew, RAL PPD, CCLRC - RAL


Page 1: Chris Brew CCLRC - RAL

Chris Brew

RAL PPD

HEPiX Workshop Summary

Brookhaven National Laboratory, October 18-22 2004

http://www.rhic.bnl.gov/hepix/agenda.shtml

Chris Brew

CCLRC - RAL

Page 2: Chris Brew CCLRC - RAL


Highlights

• Spread of Scientific Linux
• Opterons have better price/performance ratio than Xeon
• XFS on Performance Disk servers
• Spam is a major problem for a lot of labs
• CHOS for maintaining old Linux Versions
• Ranger as a better SWATCH
• AFS
• Computer (In)Security

Page 3: Chris Brew CCLRC - RAL


Spread of Scientific Linux

• Only 5 months on from the “Edinburgh Accord”, Scientific Linux is spreading throughout HEP

• Mentioned in at least 9 large site reports including CERN, DESY and SLAC

• Next release of LCG will be primarily on SL so even more sites will soon be running it

• Only concern for future is compatibility between CERN and Core version

Page 4: Chris Brew CCLRC - RAL


PDSF – Other Changes

• New hardware will run SL (3.03)

• CHOS already installed and will help ease transition to SL for users

• New nodes will run under Sun GridEngine
  – PDSF did not renew LSF maintenance
  – LSF nodes will slowly be transitioned over to SGE

CAS/CRS Farm

● Farm of 1423 dual-CPU (Intel) systems

– Added 335 machines this year

● ~245 TB local disk storage (SCSI and IDE)

● Upgrade of RHIC Central Analysis Servers/Central Reconstruction Servers (CAS/CRS) to Scientific Linux 3.0.2 (+updates) underway: should be complete before next RHIC run

LINUX at TRIUMF

Distribution | Use | Available ISO CD’s | Kickstart | Auto Updates
Scientific Linux 3.0.3 | Desktop. Future Servers & Desktops – Support! | Yes | Yes | Yes (~36 months for hardware, ~60 months for errata by RH)
Fedora Core 2 | Leading Desktop, Special Needs Servers | Yes | Yes | Yes (errata – 18 months)
RH9 | Servers, Desktop | Yes | Yes | Yes (only errata / no new hardware)

TRIUMF strongly supports Scientific Linux

TRIUMF Site Report for HEPiX, BNL, 18-22 October 2004 – Corrie Kost

HEPiX BNL, Brookhaven, 18 October 2004

Linux

• SLC3
  – Desktop
  – Servers

CCIN2P3 Site Report - HEPiX/HEPNT @ BNL, Oct 18, 2004

Supported platforms:
• Linux RedHat 7.2 SL3
• Solaris 2.8 Solaris 2.9
• AIX 5.1

11/9/2004 Len Moss

Linux Status, cont’d.

• Weekly Red Hat phone meetings very useful
  – Have opened about 50 issues, currently about 16 active
• Updates
  – Cron job to pull all updates from Red Hat Network
  – Use yum to update onsite systems
  – Provide RHN entitlements to update mobile and offsite systems
• Starting to look at Scientific Linux, so far only for a few build and interactive servers

19/10/2004 LAL Site Report - HEPix - BNL 2004

Main Resources Changes

• More Linux CPUs
  – 25 dual Opteron 2,2 installed (IBM e325, 1U)
• Linux upgrade to Scientific Linux
  – Currently installed on all new machines (i386 or amd64)
  – Proposed on desktop, preconfigured with Kickstart
• Network: new switch with 10 Gb/s uplink
  – Extreme Networks Summit 400 (48 Gb ports)
  – Used by Opteron farm
  – Plan to replace central switch (Cabletron SSR 8000) next year

18-20 October 2004 HEPiX - Brookhaven

Software

• Transition to SL3
• Farms:
  – Scientific Linux 3 (Fermi)
    • Babar batch, prototype frontend
  – RedHat 7.n
    • 7.3: LCG batch, Tier1 batch, frontend systems
    • 7.2: Babar frontend systems
• Servers:
  – SL3
    • Systems services (mail, NIS, loggers, scheduler)
  – Redhat 7.2/7.3
    • Disk servers (custom Kernels)
  – Fedora Core
    • Consoles, personal desktops
  – Solaris 2.6, 8, 9
    • SUN systems
  – AIX
    • AFS cell

Page 5: Chris Brew CCLRC - RAL


Opteron/Nocona Comparison

• http://www.rhic.bnl.gov/hepix/talks/041021am/wiesand.pdf

• DESY and BNL independently ran performance tests of the next generation of 64-bit x86 chips from AMD and Intel

• Issue of porting HEP software to 64 bit and supporting what is effectively an extra OS

• Overall 64bit

Page 6: Chris Brew CCLRC - RAL


Page 7: Chris Brew CCLRC - RAL

P. Nevski - Oct. 2004 - BNL

Results

ATLSIM (seconds/event)
Machine | 1 job | 2 jobs
Intel Nocona | 394 | 484, 484
AMD Opteron | 389 | 389, 389

SixTrack (seconds/run)
1 job | 2 jobs
165 | 165, 166
185 | 218, 218

CERN Units
1 job | 2 jobs
528 | 528, 528
491 | 375, 399

While both machines behave in a similar way when only one job is run, the situation changes in a visible manner in the case of two jobs. It may take up to 30% more time to run two simultaneous jobs on Intel, while on AMD there is a notable absence of any visible performance drop.
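The two-job slowdown can be checked directly from the ATLSIM figures in the Results table. This is a small illustrative sketch, not part of the original talk: the function name `slowdown_percent` and the dictionary layout are made up here, and only the ATLSIM numbers are used because their rows are clearly labelled by machine.

```python
# ATLSIM timings (seconds/event) as quoted on the Results slide.
# Each entry: (time with 1 job, times of the 2 simultaneous jobs).
ATLSIM = {
    "Intel Nocona": (394, (484, 484)),
    "AMD Opteron": (389, (389, 389)),
}


def slowdown_percent(single, dual):
    """Extra time per event, in percent, when two jobs run at once.

    The slower of the two simultaneous jobs is taken as the worst case.
    """
    return (max(dual) / single - 1.0) * 100.0


if __name__ == "__main__":
    for machine, (single, dual) in ATLSIM.items():
        extra = slowdown_percent(single, dual)
        print(f"{machine}: {extra:.1f}% more time per event with 2 jobs")
```

For ATLSIM this works out to roughly 23% extra time on the Nocona and no change on the Opteron, which sits inside the "up to 30%" quoted on the slide.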

Page 8: Chris Brew CCLRC - RAL


Page 9: Chris Brew CCLRC - RAL


XFS on Performance Disk Servers

• There were a number of talks on disk server performance
  – http://www.rhic.bnl.gov/hepix/talks/041021am/iven.ppt
  – http://www.rhic.bnl.gov/hepix/talks/041019pm/vaneldik.ppt
  – http://www.rhic.bnl.gov/hepix/talks/041019pm/schoen.pdf

• All either simply used XFS as the file system (presumably on the basis of previous tests) or, where it was tested, found that it came out significantly better

Page 10: Chris Brew CCLRC - RAL


Page 11: Chris Brew CCLRC - RAL


SPAM!

• Large sites are expending a great deal of effort to try to reduce the amount of spam received by their users

• One site (JLab) has gone as far as contracting an external company, MXLogic, to filter all mail offsite to block spam

• CERN, having reached the limits of content filtering, has now started implementing active low level blocks:
  – Reverse DNS lookup (increased spam rejection from 55% to 85%)
  – Reverse SMTP connect (should remove 25% more)
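As an illustration of the reverse DNS idea only, the sketch below rejects a connecting host whose IP address has no reverse (PTR) entry. It is not CERN's implementation: the function names, the "accept"/"reject" policy and the stand-in resolvers are all assumptions made for this example. The only real library call is `socket.gethostbyaddr`.

```python
import socket


def has_reverse_dns(ip_address, resolver=socket.gethostbyaddr):
    """Return True if ip_address resolves back to a host name.

    `resolver` defaults to socket.gethostbyaddr; it can be swapped for a
    stand-in so the logic can be exercised without network access.
    """
    try:
        hostname, _aliases, _addresses = resolver(ip_address)
    except (socket.herror, socket.gaierror):
        # No reverse mapping, or the lookup failed.
        return False
    return bool(hostname)


def smtp_decision(ip_address, resolver=socket.gethostbyaddr):
    """Hypothetical policy: reject senders that have no reverse DNS entry."""
    return "accept" if has_reverse_dns(ip_address, resolver) else "reject"


# Stand-in resolvers for an offline demonstration (illustrative only).
def fake_resolver_ok(ip_address):
    return ("mail.example.org", [], [ip_address])


def fake_resolver_missing(ip_address):
    raise socket.herror(1, "Unknown host")


if __name__ == "__main__":
    print(smtp_decision("192.0.2.1", fake_resolver_ok))
    print(smtp_decision("192.0.2.2", fake_resolver_missing))
```

A production mail server would apply this kind of check inside the MTA at connection time; the reverse SMTP connect rule mentioned on the slide is a separate test and is not sketched here.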

Page 12: Chris Brew CCLRC - RAL

HEPiX meeting 2004, Rafal Otto (IT/IS)

Current Status
• Content based detection is not worth improving
  – Increasing 1% requires a lot of work, and may produce false positives.
• Focus on low level Spam Rejection
  – Reverse DNS activated on 15th June: increase of Spam rejection from 55% to 85%.
  – Reverse SMTP connect rule activated on 6th October.

Next steps:
• Try and identify new techniques: SPF, SenderID, DomainKeys.
• Try to reject evident Spams, detected by SpamKiller, CERN Content based Spam detection engine.

Page 13: Chris Brew CCLRC - RAL


CHOS – Change OS

http://www.rhic.bnl.gov/hepix/talks/041020am/canon.ppt

• CHOS was written at NERSC to aid in securely supporting multiple Linux versions on one machine

• Allows divorcing the system OS from the user OS

• Basically chroot’ing to a different OS, but it’s integrated with the batch system and PAM so it’s transparent to the users
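To make the chroot idea concrete, here is a minimal sketch of picking an OS tree for a user and running a command inside it. It does not reproduce how CHOS itself works (see the linked talk for that): the OS names, the `/os/...` paths and both function names are hypothetical, and the batch system and PAM integration is not shown.

```python
import os

# Hypothetical mapping from an OS name a user might request to the
# directory holding that OS installation. Names and paths are made up.
OS_ROOTS = {
    "rh73": "/os/rh73",
    "sl303": "/os/sl303",
}


def resolve_os_root(requested, default="sl303", roots=OS_ROOTS):
    """Pick the root directory for the requested OS.

    Falls back to the default OS when the request is missing or unknown.
    """
    return roots.get(requested, roots[default])


def run_in_user_os(requested, argv):
    """Replace the current process with argv running inside the chosen tree.

    Needs root privileges, because os.chroot() does.
    """
    root = resolve_os_root(requested)
    os.chroot(root)            # change the apparent filesystem root
    os.chdir("/")              # keep the working directory inside the new root
    os.execvp(argv[0], argv)   # does not return on success
```

In a real setup the process would also drop root privileges before running user code; that step is left out to keep the sketch short.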

Page 14: Chris Brew CCLRC - RAL


Ranger as a better swatch

http://www.rhic.bnl.gov/hepix/talks/041018pm/boeheim.ppt

• The latest enhancements to the SLAC Ranger package implement extended Swatch-like functionality

• Talk slides give examples of uses and rulesets

What is Ranger?

• Ranger is a monitoring system used at SLAC for > 10 years
• Written entirely in Perl
• Implements language constructs to make it easy to
  – Collect monitoring observations
  – Write rules to test for conditions
  – Take actions triggered by rules

What is Swatch?

• Swatch is the Simple Watcher originally written by Todd Atkins of Stanford
• It had a simple pattern-action syntax and limited actions

/(larry|moe|curly)/&&/panic/ mail=action,exec="/etc/page" 05:00 0:16

/(larry|moe|curly)/&&/reboot/ mail=action,exec="/etc/page" 05:00 0:16
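The pattern-action idea behind these two rules can be sketched in a few lines. This is an illustration only, not Swatch or Ranger code (both are Perl): the `RULES` layout and the `actions_for` name are invented here, the action strings are copied verbatim from the rules above without being interpreted, and the trailing time fields are left out because the slide does not explain them.

```python
import re

# Each rule: (patterns that must ALL match the log line, actions to trigger).
RULES = [
    ([r"(larry|moe|curly)", r"panic"], ["mail=action", 'exec="/etc/page"']),
    ([r"(larry|moe|curly)", r"reboot"], ["mail=action", 'exec="/etc/page"']),
]


def actions_for(line, rules=RULES):
    """Return the actions of every rule whose patterns all match the line."""
    triggered = []
    for patterns, actions in rules:
        if all(re.search(pattern, line) for pattern in patterns):
            triggered.extend(actions)
    return triggered


if __name__ == "__main__":
    for log_line in ("moe: kernel panic", "larry logged in"):
        print(log_line, "->", actions_for(log_line))
```

A line has to match both patterns of a rule (the `&&` in the original syntax) before any action fires, so a host name on its own triggers nothing.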

Page 15: Chris Brew CCLRC - RAL


Conclusions

• Watch collector provides more powerful mechanism for reacting to log entries
• State and context allow more intelligent decisions to be made
• Duplicate message detection helps deal with information floods
• Available at ftp://ftp.slac.stanford.edu/software/sysadmin/ranger.5.0beta1.tgz
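Duplicate message detection, as mentioned in the conclusions, can be illustrated with a small time-window filter. This is a generic sketch under assumed behaviour, not Ranger's actual mechanism: the class name, the 60 second default window and the choice to key on the exact message text are all assumptions.

```python
class DuplicateFilter:
    """Suppress repeats of the same message seen within `window` seconds."""

    def __init__(self, window=60.0):
        self.window = window
        self._last_reported = {}  # message text -> time it was last reported

    def should_report(self, message, now):
        """Return True if the message should be reported at time `now`."""
        last = self._last_reported.get(message)
        if last is not None and now - last < self.window:
            return False  # same message inside the window: stay quiet
        self._last_reported[message] = now
        return True


if __name__ == "__main__":
    log_filter = DuplicateFilter(window=60)
    for timestamp, text in [(0, "disk full"), (10, "disk full"), (61, "disk full")]:
        print(timestamp, text, log_filter.should_report(text, timestamp))
```

Passing the timestamp in explicitly, instead of reading the clock inside the class, keeps the filter easy to test and lets it be replayed over old log files.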

Page 16: Chris Brew CCLRC - RAL


AFS

• AFS still going strong
  – Many sites at various stages on the Transarc AFS → OpenAFS → OpenAFS+Kerberos 5 path
  – SLAC has Perl modules for doing AFS admin
    http://www.rhic.bnl.gov/hepix/talks/041020am/wachsmann.pdf

Page 17: Chris Brew CCLRC - RAL


Computer (In)Security

• Bob Cowles (SLAC) gave his customary talk to terrify the rest of the admins

• Long lists of vulnerabilities in Windows, Linux and MacOSX

• Good examples of Phishing scams

Page 18: Chris Brew CCLRC - RAL


Phishing 1

18 October 2004 HEPiX - Fall 2004 6

Recent Phishing E-mail


Don’t Take the Bait

Page 19: Chris Brew CCLRC - RAL


Forged FDIC E-mail

Page 20: Chris Brew CCLRC - RAL


Fake FDIC Website

Page 21: Chris Brew CCLRC - RAL


Real FDIC Website

Page 22: Chris Brew CCLRC - RAL


Final Thoughts

Attacks coming faster; attackers getting smarter

No simple solution works
– Patching helps
– Firewalls help
– AV & attachment removal help
– Encrypted passwords/tunnels help

You can’t be “secure”; only “more secure”

We must share information better