sc3 experiences

Post on 23-Jan-2016

SC3 experiences

Ron Trompert

SARA

SC3 Infrastructure

Starting point: DMF-based HSM

DMF has no SRM implementation

DMF does not support functionality promised by the SRM standard, like file pinning.

SC3 Infrastructure

dCache

dCache provides an SRM interface

dCache provides flexibility with respect to HSM backends, in case we need to switch to another HSM setup for some reason

SC3 Infrastructure: throughput phase

SC3 Throughput phase

Disk2disk: 100-110 MB/s

Problems with stability of the nodes: solved by limiting the number of I/O movers

Disk2tape: 50 MB/s

Not enough bandwidth, SAN not dedicated
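To put the throughput figures in perspective, the sketch below converts a sustained rate in MB/s into a daily volume in TB. It assumes decimal units (1 TB = 1,000,000 MB) and a rate held constant for 24 hours; neither assumption comes from the slides, and the function name is purely illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def tb_per_day(rate_mb_s):
    """Daily volume in TB for a sustained rate in MB/s (decimal units assumed)."""
    return rate_mb_s * SECONDS_PER_DAY / 1_000_000


# Rates quoted on the throughput-phase slide.
RATES_MB_S = {
    "disk2disk (low)": 100,
    "disk2disk (high)": 110,
    "disk2tape": 50,
}

for label, rate in RATES_MB_S.items():
    print(f"{label}: {rate} MB/s is about {tb_per_day(rate):.2f} TB/day")
```

Under these assumptions the disk2disk rates correspond to roughly 8.6 to 9.5 TB per day and the disk2tape rate to roughly 4.3 TB per day.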

SC3 Infrastructure: service phase

SC3 service phase statistics

Percentage of computational resources used (October-December)

          LHCb   ATLAS
SARA      28     0
NIKHEF    21     39

SC3 service phase statistics

             LHCb   ATLAS
GBs in       7638   881
GBs out      5      0
GB stored    3334   900

SC3 service phase statistics

Setting up the infrastructure took longer than we had hoped, so unfortunately we missed ALICE.

Sizes and number of files transferred to SRM SE

                                           LHCb     ATLAS
Average file size                          188 MB   211 MB
# inbound transfers                        41508    4277
# inbound transfers, file size < 100 MB    5013     3526
# inbound transfers, file size < 1 MB      4922     3261
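The figures above can be cross-checked with a little arithmetic. The sketch below recomputes the average file size from the "GBs in" totals and the number of inbound transfers, and works out what share of transfers were small files. The conversion factor of 1024 MB per GB is an assumption (the slides do not state how the averages were derived), as are the dictionary layout and function names.

```python
# Figures copied from the service-phase statistics slides.
STATS = {
    "LHCb": {"gb_in": 7638, "transfers": 41508, "under_100mb": 5013, "under_1mb": 4922},
    "ATLAS": {"gb_in": 881, "transfers": 4277, "under_100mb": 3526, "under_1mb": 3261},
}


def average_file_size_mb(gb_in, transfers, mb_per_gb=1024):
    """Mean size per inbound transfer in MB (1024 MB per GB is an assumption)."""
    return gb_in * mb_per_gb / transfers


def share_percent(part, total):
    """Percentage of `total` represented by `part`."""
    return 100.0 * part / total


for vo, s in STATS.items():
    avg = average_file_size_mb(s["gb_in"], s["transfers"])
    small = share_percent(s["under_100mb"], s["transfers"])
    tiny = share_percent(s["under_1mb"], s["transfers"])
    print(f"{vo}: average {avg:.0f} MB, {small:.1f}% under 100 MB, {tiny:.1f}% under 1 MB")
```

With this conversion factor the recomputed averages round to the 188 MB and 211 MB quoted above. The shares also show that roughly three quarters of the ATLAS transfers were smaller than 1 MB, against roughly 12% for LHCb.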

SC3 service phase observations

Networking problems

Hardware problems

The 10GE link to CERN was dedicated, but the 10G switch was not. Switching back and forth between the dedicated 10GE link and Geant.

Routing problems

Considerably less data stored for ATLAS than expected.

The plans on the Wiki mention 20 TB

SC3 service phase observations

Communication problems

Network changes not reported

We were not informed of changes in subnets.

Problems are not always reported

Failed transfers are not always reported

Network outage CERN-SARA between Xmas and New Year: nobody informed us

Monitoring: experiment monitoring websites are listed in the Wiki, but we also found other monitoring website URLs in emails.

Not clear what the experiments' exact plans are

When there are no transfers and no problems are reported, it is not clear whether something is wrong or things are going just as planned.

SC3 service phase observations

Transfers failed when attempting to overwrite existing files

Overwriting is not allowed by PNFS

At dCache sites running a gridftp door on their SRM node, files can be removed immediately using edg-gridftp-rm or glite-gridftp-rm

At dCache sites that don't run a gridftp door on the SRM node, an advisory delete can be done, but then files are not deleted immediately.

SC3 service phase observations

dCache security

(gsi)dcap

Using dccp, anyone can get anything in /pnfs/grid.sara.nl/data/<vo>

Unix permissions on directories are not honoured

Files in a directory with -rwxr-x--- are world readable.

File permissions are honoured, but when data is copied into /pnfs it gets -rw-r--r--.

Using gsidcap you are authenticated, but the behaviour above stays the same.

Write permissions are OK.

Maybe this is OK for HEP VOs, but for some VOs this is too liberal.
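For comparison, the sketch below shows what the two permission strings above would imply under plain POSIX semantics, using Python's standard `stat` module. It assumes the directory mode is 0750 and the file mode is 0644, considers a single directory level only, and ignores ACLs and superuser access; the helper name is illustrative and is not part of dCache or PNFS.

```python
import stat

# Modes matching the permission strings on the slide (assumed octal values).
DIR_MODE = stat.S_IFDIR | 0o750   # directory with rwxr-x---
FILE_MODE = stat.S_IFREG | 0o644  # file with rw-r--r--


def others_can_read(dir_mode, file_mode):
    """True if 'other' users could read the file under POSIX rules.

    They need the search (execute) bit on the directory and the
    read bit on the file itself.
    """
    can_traverse = bool(dir_mode & stat.S_IXOTH)
    can_read = bool(file_mode & stat.S_IROTH)
    return can_traverse and can_read


print(stat.filemode(DIR_MODE))    # symbolic form of the directory mode
print(stat.filemode(FILE_MODE))   # symbolic form of the file mode
print(others_can_read(DIR_MODE, FILE_MODE))
```

On a regular POSIX filesystem the directory mode alone would keep users outside the owner and group away from the file, even though the file itself is world readable. The behaviour reported above differs because the directory permissions are not taken into account.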

SC3 service phase observations

Oracle database

Every now and then it just hangs and needs to be restarted.

Backups didn’t work but FTS and LFC did.

SC3 service phase observations

A user wanted to run a job using ROOT I/O, which is rfio/dcap based.

rfio and dcap are unauthenticated data access protocols

rfio is installed automatically when installing a classic SE with YAIM.

We don’t really like it, but what do the other T1s think about this?

SC4 Outlook

Current plans (being updated)

- Replace old SE by SRM SE

- Set up DB node for FTS/LFC

- Set up T2 tests

- Separate T1 tape storage from general storage