SC3 experiences
Ron Trompert
SARA
SC3 Infrastructure
Starting point: DMF-based HSM
DMF has no SRM implementation
DMF does not support functionality promised by the SRM standard, like file pinning.
SC3 Infrastructure
dCache
dCache provides an SRM interface
dCache provides flexibility with respect to HSM backends, in case we need to switch to another HSM setup for some reason
SC3 Infrastructure: throughput phase
SC3 Throughput phase
Disk2disk: 100-110 MB/s
Problems with stability of the nodes: solved by limiting the number of I/O movers
Disk2tape: 50 MB/s
Not enough bandwidth, SAN not dedicated
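To put the throughput figures above in perspective, the following minimal sketch converts them into daily volumes. It assumes decimal units (1 TB = 1,000,000 MB) and a rate sustained for a full 24 hours; neither assumption comes from the slides, and the function name is purely illustrative.

```python
# Convert the sustained rates quoted on the slide into daily volumes.
# Assumption (not from the slides): 1 TB = 1,000,000 MB and the rate
# is held constant for a full day.
SECONDS_PER_DAY = 24 * 60 * 60


def tb_per_day(rate_mb_per_s: float) -> float:
    """Volume moved in one day at a constant rate, in decimal terabytes."""
    return rate_mb_per_s * SECONDS_PER_DAY / 1_000_000


rates_mb_per_s = {
    "disk2disk (low)": 100,
    "disk2disk (high)": 110,
    "disk2tape": 50,
}

daily_tb = {name: tb_per_day(rate) for name, rate in rates_mb_per_s.items()}

for name, volume in daily_tb.items():
    print(f"{name}: {volume:.2f} TB/day")
```

Under these assumptions the disk2disk rate corresponds to roughly 8.6 to 9.5 TB per day, and the disk2tape rate to about 4.3 TB per day.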
SC3 Infrastructure: service phase
SC3 service phase statistics
Percentage of computational resources used (October-December)
          LHCb   ATLAS
SARA      28     0
NIKHEF    21     39
SC3 service phase statistics
             LHCb   ATLAS
GBs in       7638   881
GBs out      5      0
GBs stored   3334   900
SC3 service phase statistics
Setting up the infrastructure took longer than we had hoped so unfortunately we missed ALICE.
Sizes and number of files transferred to the SRM SE
                                          LHCb     ATLAS
Average file size                         188 MB   211 MB
# inbound transfers                       41508    4277
# inbound transfers, file size < 100 MB   5013     3526
# inbound transfers, file size < 1 MB     4922     3261
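The share of small files is easier to see as a percentage. The sketch below derives it from the numbers in the tables above and, as a rough consistency check, estimates the inbound volume from the transfer count and the average file size. The 1 GB = 1024 MB convention and the assumption that the average covers all inbound transfers are mine, not the slides'.

```python
# Figures copied from the service phase tables in these slides.
stats = {
    "LHCb": {"transfers": 41508, "below_100mb": 5013, "below_1mb": 4922,
             "avg_mb": 188, "gb_in": 7638},
    "ATLAS": {"transfers": 4277, "below_100mb": 3526, "below_1mb": 3261,
              "avg_mb": 211, "gb_in": 881},
}


def summarise(s: dict) -> dict:
    """Derive small-file percentages and an estimated inbound volume."""
    return {
        "pct_below_100mb": 100 * s["below_100mb"] / s["transfers"],
        "pct_below_1mb": 100 * s["below_1mb"] / s["transfers"],
        # Assumption: 1 GB = 1024 MB, and the average file size applies
        # to every inbound transfer.
        "est_gb_in": s["transfers"] * s["avg_mb"] / 1024,
    }


summary = {vo: summarise(s) for vo, s in stats.items()}

for vo, result in summary.items():
    print(f"{vo}: {result['pct_below_100mb']:.1f}% below 100 MB, "
          f"{result['pct_below_1mb']:.1f}% below 1 MB, "
          f"estimated {result['est_gb_in']:.0f} GB in "
          f"(reported {stats[vo]['gb_in']} GB)")
```

With these inputs about 12% of the LHCb transfers and about 82% of the ATLAS transfers were below 100 MB, and the estimated inbound volumes land within roughly 1% of the reported "GBs in" values.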
SC3 service phase observations
Networking problems
Hardware problems
The 10GE link to CERN was dedicated but the 10G switch was not. We were switching back and forth between the dedicated 10GE link and Geant.
Routing problems
Considerably less data was stored for ATLAS than expected.
The plans on the Wiki mentioned 20 TB.
SC3 service phase observations
Communication problems
Network changes not reported
We were not informed of changes in subnets.
Problems are not always reported
Failed transfers are not always reported
Network outage CERN-SARA between Xmas and New Year: nobody informed us
Monitoring: experiment monitoring websites are listed in the Wiki, but we also found other monitoring website URLs in emails.
It is not clear what the experiments' exact plans are
When there are no transfers and no problems are reported, it is not clear whether something is wrong or things are going just as planned.
SC3 service phase observations
Failed transfers caused by attempts to overwrite files
Overwriting is not allowed by PNFS
At dCache sites running a gridftp door on their SRM node, files can be deleted immediately using edg-gridftp-rm or glite-gridftp-rm.
At dCache sites that don’t run a gridftp door on the SRM node, an advisory delete can be done, but then files are not deleted immediately.
SC3 service phase observations
dCache security: (gsi)dcap
Using dccp, anyone can get anything in /pnfs/grid.sara.nl/data/<vo>
Unix permissions on directories are not honoured: files in a directory with -rwxr-x--- are world readable.
File permissions are honoured, but when data is copied into /pnfs it gets -rw-r--r--.
Using gsidcap you are authenticated, but the behaviour above stays the same.
Write permissions are OK.
Maybe this is OK for HEP VOs but for some VOs this is too liberal.
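For comparison, this is what plain POSIX semantics would give for the two modes mentioned above. The sketch is a simplification: it only looks at the "other" permission bits of one directory and one file, ignoring parent directories, ownership, and ACLs, and the helper names are illustrative.

```python
import stat


def others_can_read(mode: int) -> bool:
    """True if the 'other' class has read permission."""
    return bool(mode & stat.S_IROTH)


def others_can_traverse(mode: int) -> bool:
    """True if the 'other' class has search (execute) permission."""
    return bool(mode & stat.S_IXOTH)


def posix_world_readable(dir_mode: int, file_mode: int) -> bool:
    """Under POSIX rules a user outside owner and group needs search
    permission on the directory and read permission on the file."""
    return others_can_traverse(dir_mode) and others_can_read(file_mode)


dir_mode = stat.S_IFDIR | 0o750   # directory with rwxr-x---
file_mode = stat.S_IFREG | 0o644  # file as it ends up in /pnfs: rw-r--r--

print(stat.filemode(dir_mode), stat.filemode(file_mode))
print("world readable under POSIX rules:",
      posix_world_readable(dir_mode, file_mode))
```

Under POSIX rules the file would not be reachable by other users because the directory blocks traversal. The behaviour reported for dcap corresponds to evaluating only the file's own mode.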
SC3 service phase observations
Oracle database
Every now and then it just hangs and needs to be restarted.
Backups didn’t work but FTS and LFC did.
SC3 service phase observations
A user wanted to run a job using ROOT I/O, which is rfio/dcap based.
Rfio and dcap are unauthenticated data access protocols.
Rfio comes automatically when installing a classic SE with yaim.
We don’t really like it but what do the other T1s think about this?
SC4 Outlook
Current plans (being updated)
- Replace old SE by SRM SE
- Set up DB node for FTS/LFC
- Set up T2 tests
- Separate T1 tape storage from general storage