Jens G Jensen CCLRC/RAL HEPiX Meta-Summary (and Storage Middleware) Jens Jensen GridPP16 QMUL 27-29 June 2006


Page 1: Jens G Jensen CCLRC/RAL AHM Nottingham 2005 Storage Middleware HEPiX Meta-Summary (and Storage Middleware) Jens Jensen GridPP16 QMUL 27-29 June 2006

Jens G Jensen, CCLRC/RAL

HEPiX Meta-Summary (and Storage Middleware)

Jens Jensen, GridPP16, QMUL, 27-29 June 2006


SRM status

• dCache @ HEPiX
  – Both Patrick Fuhrmann and Michael Ernst attended
  – Chimera should improve the database
    • Removes the need to run du on PNFS

• No DPM @ HEPiX (except for GridPP talks)


dCache features

• Promised SRM 2.1 again
  – We need to test
    • Because we can
    • To improve 2.2 maturity

• PNFS cleanup promised

• xrootd in beta, no security

• VOMS ready for testing

• Performance
  – Good transfer rates
  – Scales to large numbers of files
  – Scales to large files
  – Request rate can improve (DESY ~50, T1 ~11)


Procurement

• Procure by SPECint rather than by number of boxen

• Maximise performance/running-costs ratio
  – Running costs = power
  – Opteron vs Xeon, dual core vs hyperthreading, brand names, blade systems

• Fast parallelism – debugging/algorithms
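The following minimal Python sketch illustrates two ideas from this slide: procuring by SPECint rather than by box count, and ranking offers by performance per unit of running cost. Following the slide, running cost is taken to be power. The offer names, SPECint figures, wattages and target capacity are all hypothetical placeholders, not numbers from the HEPiX talks.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """One vendor offer; every figure used with it is a hypothetical placeholder."""
    name: str
    specint: float  # SPECint rating per box
    watts: float    # power draw per box, standing in for running cost


def perf_per_watt(c: Candidate) -> float:
    """Performance/running-costs ratio, with running cost approximated by power."""
    return c.specint / c.watts


def boxes_for_target(c: Candidate, target_specint: float) -> int:
    """Procure by SPECint: smallest number of boxes reaching the target capacity."""
    return math.ceil(target_specint / c.specint)


def rank(candidates: list[Candidate]) -> list[Candidate]:
    """Order offers by performance per watt, best first."""
    return sorted(candidates, key=perf_per_watt, reverse=True)


if __name__ == "__main__":
    offers = [
        Candidate("offer A", specint=40.0, watts=250.0),
        Candidate("offer B", specint=30.0, watts=280.0),
    ]
    target = 1000.0  # total SPECint the site wants to buy (hypothetical)
    for c in rank(offers):
        print(f"{c.name}: {perf_per_watt(c):.3f} SPECint/W, "
              f"{boxes_for_target(c, target)} boxes")
```

Fixing the purchase in SPECint pins down the capacity delivered. Ranking by SPECint per watt then favours the offer that is cheapest to run for that capacity.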


CPU

• FNAL running 32- and 64-bit software on 64-bit CPUs
  – “32 bit apps on 64 bit OS can improve performance”

• SL will support x86_64 but not IA64


Storage

• Tape vs disk
  – “Adapt more easily to customer demand with tape”
  – Easy to expand (cost of media)
  – HSM facilitates utilisation
  – Different mechanics

• Distributed filesystems
  – Security
  – Performance
  – Applicability


Filesystems

• Most SEs’ performance improves with XFS
  – But XFS is not supported in SL

• Look at journaling filesystems
  – Size of code
  – Maturity of code
  – Application to small/large files
  – Fragmentation and extents
  – Size of developer base
  – Width of deployment
  – Reports of stability and robustness
  – Performance


Loose ends

• Good GridPP representation
  – Technical talks on storage, data transfer, optimisation and monitoring

• Randall Sobie from IHEPCCC: “interested in non-HEP use of HEP Grid”
  – Told him about biomed running on GridPP

• SL4 discussions: roadmap

• Networking: GARR (Italian NREN)
  – Reservations promised again


Authentication

• Enough interest to keep authentication as a track

• OTP (one-time password) implementation integrated at BNL

• Certificates and Kerberos at CERN
  – Windows integrated

• Certificates and Kerberos (Active Directory) at RAL


HEPiX references

• More technical summary at spring hepsysman
  – http://hepwww.rl.ac.uk/sysman/may2006/agenda.html

• HEPiX web site
  – http://hepix.caspur.it/spring2006/agenda.php


CASTOR

• Finally seems to be stabilising

• Upgraded yesterday
  – SRM not working yet; a different problem this time

• Being tested by CMS (thanks Dave N!)

• Developer version vs ops version
  – And the CERN strange loops
  – Support comes from developers, not ops

• Needs site expertise to run and diagnose


Recommendations for Allocations

• Ongoing between storage group and UB
  – Very good input & response from (a subset of) UB

• Provide canonical ratio of storage
  – Most sites have a storage shortfall
  – Absolute number unimportant

• Physical allocations
  – So support the “large” site VOs “physically”
  – The rest can share the allocation
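Only the ratio matters, so a site can scale a canonical ratio to whatever capacity it actually has. The minimal sketch below shows one way to do that. Large VOs get dedicated ("physical") allocations, and all remaining VOs are pooled into one shared allocation. The VO names, shares and function name are hypothetical.

```python
def allocate(capacity_tb: float,
             ratios: dict[str, float],
             large_vos: set[str]) -> dict[str, float]:
    """Scale a canonical storage ratio to a site's actual capacity.

    VOs in large_vos receive their own ("physical") allocation; all other
    VOs are lumped into a single pool returned under the key "shared".
    """
    total_share = sum(ratios.values())
    if total_share <= 0:
        raise ValueError("ratios must contain at least one positive share")
    allocation: dict[str, float] = {}
    shared_tb = 0.0
    for vo, share in ratios.items():
        portion = capacity_tb * share / total_share
        if vo in large_vos:
            allocation[vo] = portion
        else:
            shared_tb += portion
    if shared_tb > 0:
        allocation["shared"] = shared_tb
    return allocation


if __name__ == "__main__":
    # Hypothetical canonical ratio; only the relative sizes matter.
    ratio = {"vo_a": 4.0, "vo_b": 3.0, "vo_c": 2.0, "vo_d": 1.0}
    print(allocate(100.0, ratio, large_vos={"vo_a", "vo_b"}))
```

A site with half the capacity gets exactly half of every figure. This is why the absolute number is unimportant.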


Recommendations for Allocations

• Alternatives to physical allocations brainstormed
  – Ask for quotas to be implemented
  – Make (some) SRM2 spaces enforce quotas
  – Smallish allocations that can be reallocated
  – Use soft or filesystem quotas
  – Cheat: publish only the appropriate fraction as free

• Recommendations document
  – 2nd draft in preparation
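The last alternative, publishing only an appropriate fraction of the space as free, needs no enforcement in the storage element itself. The minimal sketch below shows the arithmetic. It assumes that each VO has a notional fraction of the total and that the VO's current usage is known. The function name and parameters are hypothetical.

```python
def published_free(total: float, free: float,
                   vo_fraction: float, vo_used: float) -> float:
    """Free space to advertise to one VO sharing a storage element.

    The VO is shown only the unused part of its notional share
    (vo_fraction of total), and never more than is actually free.
    Nothing is enforced: this only shapes what gets published.
    """
    if not 0.0 <= vo_fraction <= 1.0:
        raise ValueError("vo_fraction must be between 0 and 1")
    remaining_share = vo_fraction * total - vo_used
    return max(0.0, min(free, remaining_share))


if __name__ == "__main__":
    # Hypothetical SE: 100 TB total, 60 TB free; VO has a 25% share, uses 10 TB.
    print(published_free(100.0, 60.0, 0.25, 10.0))
```

Nothing stops a VO from writing past its share, which is why the slide calls this a cheat. Real enforcement would need one of the quota options listed above.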


Monitoring

• Need to monitor more than SFT (Site Functional Tests)
  – SFT tests the local SE only
  – Depends on availability of the CE

• Asked Dave Kant to deploy a test
  – Currently runs put, get, advDel
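The test's sequence of operations can be sketched independently of any particular client. In the minimal Python sketch below, StorageElement and FakeSE are hypothetical stand-ins, not a real SRM client API. "advDel" is read as the SRM v1.1 advisoryDelete operation. Each step is recorded as passed or failed, so a broken SE shows which operation failed.

```python
from typing import Protocol


class StorageElement(Protocol):
    """Hypothetical client interface; not the API of any real SRM client."""

    def put(self, surl: str, data: bytes) -> None: ...
    def get(self, surl: str) -> bytes: ...
    def advisory_delete(self, surl: str) -> None: ...


class FakeSE:
    """In-memory stand-in so the test sequence can run without a real SE."""

    def __init__(self) -> None:
        self.files: dict[str, bytes] = {}

    def put(self, surl: str, data: bytes) -> None:
        self.files[surl] = data

    def get(self, surl: str) -> bytes:
        return self.files[surl]

    def advisory_delete(self, surl: str) -> None:
        self.files.pop(surl, None)


def run_se_test(se: StorageElement, surl: str, payload: bytes) -> dict[str, bool]:
    """Run put, get (with a content check) and advisory delete, in that order.

    A step that raises is left marked False, and so are all steps after it.
    """
    results = {"put": False, "get": False, "advDel": False}
    try:
        se.put(surl, payload)
        results["put"] = True
        results["get"] = se.get(surl) == payload
        se.advisory_delete(surl)
        results["advDel"] = True
    except Exception:
        pass  # a monitoring probe reports failure rather than crashing
    return results


if __name__ == "__main__":
    se = FakeSE()
    print(run_se_test(se, "srm://se.example.org/dteam/probe", b"test data"))
```

Running a probe like this from a central host, rather than as a job on the site's CE, would presumably avoid the dependence on CE availability noted for SFT.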


Metrics

• Space available

• Space used
  – Files in “permanent” storage

• Number of files written, read

• Data rates
  – But those depend on other stuff
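Most of these metrics can be derived from periodic snapshots of each storage element. The minimal sketch below assumes a hypothetical per-interval record; the field names are illustrative and not taken from any real information provider. The data rate is averaged over the interval, which is presumably part of why it depends on more than the SE itself.

```python
from dataclasses import dataclass


@dataclass
class SEMetrics:
    """One reporting interval for one storage element (field names are hypothetical)."""
    space_total_gb: float
    space_used_gb: float
    files_written: int
    files_read: int
    bytes_transferred: int
    interval_s: float

    @property
    def space_available_gb(self) -> float:
        return self.space_total_gb - self.space_used_gb

    @property
    def used_fraction(self) -> float:
        return self.space_used_gb / self.space_total_gb

    @property
    def mean_rate_mb_s(self) -> float:
        # Average over the whole interval in MB/s (10**6 bytes). This also
        # reflects network, client and load conditions, not just the SE.
        return self.bytes_transferred / 1e6 / self.interval_s


if __name__ == "__main__":
    m = SEMetrics(space_total_gb=1000.0, space_used_gb=250.0,
                  files_written=120, files_read=480,
                  bytes_transferred=3_600_000_000, interval_s=3600.0)
    print(m.space_available_gb, m.used_fraction, m.mean_rate_mb_s)
```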


Storage Middleware in GridPP3

• DPM
  – ~1 year old, also SRM
  – Very small developer/support team

• dCache
  – Mature, but SRM 2.x not deployed yet

• CASTOR
  – Lots of recent changes with the new stager
  – RAL ran HEAD until recently, which is probably not a good idea
  – Support issues


Software Maturity

        Disk cache     GridFTP        SRM 1.1          SRM 2.1             SRM 2.2
DPM     ~1 year        ~1 year        ~1 year          0 years
dCache  ~10 years      Java, simple   Simple, ~1 year  0 years             -1 years
CASTOR  Large rewrite  Globus wuftpd  Simple, ~1 year  6 months, untested  0 years


Support

• Support is often reactive
  – Queries are expected to come in
  – Admins (or users) are expected to submit them

• Proactive during SRM deployment
  – Bugging admins
  – Sending support people out to help admins

• Support for deployment
  – Is more proactive; needs to align with releases


Support

• Carrots and sticks
  – Some admins go solo
  – Don’t benefit from existing experience
    • Optimising, setup, filesystems, …

• Needs much more effort to do a good job
  – SEs often remain broken in weird setups

• Storage support in GridPP3
  – Seems to be entirely reactive
  – No way to influence deployment


Support

• How to make support proactive
  – The Stick (LCG monitoring, accounting, uptime)
    • Escalation … ?
  – Work with DB and dteam
  – Superheroes (faster than a speeding bullet)

• Focus on essential SRM implementations
  – dCache, DPM, CASTOR


Storage Group

• Mailing list
  – ~50 members

• Weekly phone conference
  – Core group ~7-10

• Loads of wiki stuff

• Publications

• Presentations and dissemination

• Special thanks to core:
  – Edinburgh: Greig
  – Glasgow: Graeme
  – Durham: Mark
  – Lancaster: Matt, Brian
  – RAL Tier1: Derek
  – RAL Storage: Owen, Jiri


Standardisation

• SRM working group
  – SRM is the protocol
  – All implementations are represented

• LCG data management
  – Covers interactions with higher layers

• gin-data
  – Organisation formerly known as GGF
  – Currently not high priority; SRB and SRM islands remain separate


NGS convergence

• Is happening more in SRM than in SRB (storage)

• NGS deploying SRM (DPM)
  – Oxford (OK)
  – IC LeSC, Belfast, Soton
  – 64-bit kit, recompiling on Solaris, etc.

• Dedicated mailing list ETF-SRM

• We provide support


Problems

• Unreachable disk
  – What does “available” mean?

• Meeting experiment CPU/storage ratios

• Can we prevent people from shooting themselves in the foot?

• Accounting != rocket science, but needs higher priority
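Two of these problems are partly questions of definition. The minimal sketch below shows how counting only reachable disk as "available" changes whether a site meets a storage/CPU ratio. The pool sizes are hypothetical, and so is the experiment ratio, which is expressed in TB per kSI2k of CPU.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    size_tb: float
    reachable: bool  # e.g. its disk server is up and the pool is online


def capacity_tb(pools: list[Pool], reachable_only: bool) -> float:
    """Installed capacity, or only the part that can actually be reached."""
    return sum(p.size_tb for p in pools if p.reachable or not reachable_only)


def storage_shortfall_tb(cpu_ksi2k: float, tb_per_ksi2k: float,
                         available_tb: float) -> float:
    """Disk still needed to meet an experiment's storage/CPU ratio (0 if met)."""
    return max(0.0, cpu_ksi2k * tb_per_ksi2k - available_tb)


if __name__ == "__main__":
    pools = [Pool(40.0, True), Pool(20.0, False)]  # hypothetical site
    for reachable_only in (False, True):
        avail = capacity_tb(pools, reachable_only)
        gap = storage_shortfall_tb(100.0, 0.5, avail)
        print(f"reachable_only={reachable_only}: "
              f"available={avail} TB, shortfall={gap} TB")
```

The same hypothetical site meets the ratio under one definition of "available" and falls 10 TB short under the other.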