Using ssh as portal - The CMS CRAB over glideinWMS experience

CHEP 2013: Using ssh as portal. The CMS CRAB over glideinWMS experience, by I Sfiligoi (1), S Belforte (2), J Letts (1), T Martin (1), M D Saiz Santos (1) and F Fanzago (3). (1) University of California San Diego, (2) Università e INFN di Trieste, (3) Università e INFN di Padova

Upload: igor-sfiligoi

Post on 15-Jan-2015


DESCRIPTION

The User Analysis of the CMS experiment is performed in a distributed way using both Grid and dedicated resources. In order to insulate the users from the details of the computing fabric, CMS relies on the CRAB (CMS Remote Analysis Builder) package as an abstraction layer. CMS has recently switched from a client-server version of CRAB to a purely client-based solution, with ssh being used to interface with either HTCondor or glideinWMS batch systems. This switch has resulted in a significant improvement in user satisfaction, as well as a significant simplification of the CRAB code base. This presentation covers the reasoning behind the change as well as the experience with the new setup. Presented at CHEP2013 in Amsterdam.

TRANSCRIPT

Page 1: Using ssh as portal - The CMS CRAB over glideinWMS experience

CHEP 2013

Using ssh as portal

The CMS CRAB over glideinWMS experience

by I Sfiligoi (1), S Belforte (2), J Letts (1), T Martin (1), M D Saiz Santos (1) and F Fanzago (3)

(1) University of California San Diego
(2) Università e INFN di Trieste
(3) Università e INFN di Padova

Page 2: Using ssh as portal - The CMS CRAB over glideinWMS experience

Agenda

● How we got there
● What we did
● How it worked out

Page 3: Using ssh as portal - The CMS CRAB over glideinWMS experience

CMS AnaOps

● CMS is one of the 4 experiments at the Large Hadron Collider
● CMS Analysis Operations (AnaOps) is a computing task focused on the operational aspects of enabling physics data analysis
  – i.e. it provides the computing infrastructure used by the actual physicists

Page 4: Using ssh as portal - The CMS CRAB over glideinWMS experience

CRAB: the CMS AnaOps submission tool

● CMS AnaOps has been using CRAB as the user interface to the Grid since 2004
● Initially it was a purely client interface
  – CRAB was basically just a fancy wrapper
  – So the Client machine had to have the full Grid stack to function

Page 5: Using ssh as portal - The CMS CRAB over glideinWMS experience

CRAB2 Server

● CRAB2 Server introduced in 2007
● Client does not need the Grid stack anymore

Page 6: Using ssh as portal - The CMS CRAB over glideinWMS experience

glideinWMS

● In 2009 CMS added glideinWMS to the list of supported middleware
● glideinWMS, based on HTCondor, does not support remote job submission
  – CRAB2 Server had to sit on the same node as (one of) the HTCondor scheduler(s)

Page 7: Using ssh as portal - The CMS CRAB over glideinWMS experience

glideinWMS and CRAB2 Server

[Figure: a schematic view. Labels: HTCondor Scheduler, glideinWMS]

Page 8: Using ssh as portal - The CMS CRAB over glideinWMS experience

Operational experience

● CMS AnaOps experienced a lot of operational issues with CRAB2 Server submitting through glideinWMS
  – One major problem being the server losing track of job status
● Users were complaining a lot

Page 9: Using ssh as portal - The CMS CRAB over glideinWMS experience

A look at the CRAB2 Server

● Designed to be very modular and flexible
  – But that made it complex, too
● CMS decided to give up on fixing it
  – CRAB3 would simply replace it

Page 10: Using ssh as portal - The CMS CRAB over glideinWMS experience

A look at the CRAB2 Server

But by late 2012, CRAB3 still did not exist

Page 11: Using ssh as portal - The CMS CRAB over glideinWMS experience

What was wrong with CRAB2 Client?

● The CRAB2 Client never went away
  – A significant fraction of CMS users kept using it after CRAB2 Server was put in production
  – And they were, by and large, liking it
● It just could not be used to submit to glideinWMS
  – Due to lack of (native) remote submission to HTCondor

Page 12: Using ssh as portal - The CMS CRAB over glideinWMS experience

Why not just use ssh?

● The major stated benefit of CRAB2 Server was “remote submission using a standard interface” (there were of course other stated benefits, but they had very little impact when paired with glideinWMS)
● But what do users normally use for remote access?
  – SSH
● So, what if we used ssh in CRAB2 client as well?
  – The gsissh dialect, to support x509 proxies

Page 13: Using ssh as portal - The CMS CRAB over glideinWMS experience

The ssh-enabled CRAB2 Client

● There are three stages in CRAB2 Client
  – Job submission
    ● Includes uploading the input sandbox
  – Job monitoring
  – Output log fetching
● The Client uses ssh and/or scp to talk to a node running the HTCondor scheduler
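The three stages can be sketched as plain command lines built on the client. This is only an illustration, not the actual CRAB2 code: the host name, the remote directory layout, the function names and the use of the `gsissh`/`gsiscp` binaries from GSI-OpenSSH are all assumptions made for the example.

```python
import shlex

# Hypothetical values, not taken from the slides.
SCHEDD_HOST = "submit.example.org"   # node running the HTCondor scheduler
REMOTE_DIR = "crab_task_001"         # per-task directory on that node


def upload_sandbox_cmd(local_sandbox, host=SCHEDD_HOST, remote_dir=REMOTE_DIR):
    """Stage 1a: copy the input sandbox to the scheduler node."""
    return ["gsiscp", local_sandbox, f"{host}:{remote_dir}/"]


def submit_cmd(submit_file, host=SCHEDD_HOST, remote_dir=REMOTE_DIR):
    """Stage 1b: run condor_submit on the scheduler node."""
    remote = f"cd {shlex.quote(remote_dir)} && condor_submit {shlex.quote(submit_file)}"
    return ["gsissh", host, remote]


def monitor_cmd(cluster_id, host=SCHEDD_HOST):
    """Stage 2: query the queue for one cluster of jobs."""
    return ["gsissh", host, f"condor_q {int(cluster_id)}"]


def fetch_logs_cmd(local_dir, host=SCHEDD_HOST, remote_dir=REMOTE_DIR):
    """Stage 3: copy the output logs back (the 'logs' subdirectory is assumed)."""
    return ["gsiscp", "-r", f"{host}:{remote_dir}/logs", local_dir]
```

Each function only returns an argument list; a caller would hand it to something like `subprocess.run(cmd, check=True)` once a valid x509 proxy is in place.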

Page 14: Using ssh as portal - The CMS CRAB over glideinWMS experience

The ssh-enabled CRAB2 Client

[Figure: a schematic view. On the Client node, crab_submit, crab_monitor and crab_fetch reach the HTCondor Scheduler over ssh; the scheduler feeds glideinWMS]
  – crab_submit: scp input_sandbox, ssh condor_submit
  – crab_monitor: ssh condor_q
  – crab_fetch: scp output_logs
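For the `ssh condor_q` step, the client has to turn the text coming back over ssh into job states. The slides do not say how CRAB2 does this; the sketch below assumes the remote command is `condor_q -format "%d." ClusterId -format "%d " ProcId -format "%d\n" JobStatus`, which prints one `cluster.proc status` pair per line, and maps the standard HTCondor `JobStatus` codes to names. The function name is hypothetical.

```python
# Standard HTCondor JobStatus codes.
JOB_STATUS = {
    1: "Idle",
    2: "Running",
    3: "Removed",
    4: "Completed",
    5: "Held",
    6: "Transferring Output",
    7: "Suspended",
}


def parse_status(output):
    """Map 'cluster.proc' job ids to status names.

    Lines that do not look like '<id> <integer code>' are skipped, so
    banners or warnings mixed into the ssh output do not break parsing.
    """
    jobs = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 2 or not parts[1].isdigit():
            continue
        job_id, code = parts
        jobs[job_id] = JOB_STATUS.get(int(code), "Unknown")
    return jobs
```

For example, `parse_status("12.0 2\n12.1 4\n")` yields a dictionary marking job 12.0 as Running and job 12.1 as Completed.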

Page 15: Using ssh as portal - The CMS CRAB over glideinWMS experience

Server side setup

● gsisshd is a standard package
  – Just installed it from RPM
● Using lcmaps for authorization
  – With callbacks to GUMS in the USA, and Argus in Europe
● Each user gets a standard Linux account
  – No technical limits on what the user can run, but we would kill any offenders, if spotted

Page 16: Using ssh as portal - The CMS CRAB over glideinWMS experience

Users liked it!

● Took only 4 months for all users to completely switch from Server to ssh-Client for glideinWMS
● And another 4 months to get 90% of all work

[Figure label: ssh-based Client]

Page 17: Using ssh as portal - The CMS CRAB over glideinWMS experience

Smooth operations

● We have not seen any major problems with the ssh-based CRAB2 client itself
  – Now almost all problems are in the Grid layer
● Maybe not too surprising
  – CRAB2 Client is really just a fancy wrapper
  – SSH is a very mature tool

Operational load is now at a historical low, in spite of increasing usage

Page 18: Using ssh as portal - The CMS CRAB over glideinWMS experience

Works at scale

[Figure: two plots, labeled “Includes non-ssh jobs as well” and “Essentially ssh-only”]

Page 19: Using ssh as portal - The CMS CRAB over glideinWMS experience

Hiccups during the voyage

● We occasionally hit an ssh quirk
  – In order to speed up ssh connections, we re-use the ssh control session (-S)
  – If it got stuck, the user had to remove it manually
● CRAB2 Client now automatically detects and fixes the problem
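A minimal sketch of what such detect-and-fix logic could look like with stock OpenSSH options. The slides do not describe the actual CRAB2 implementation; the function names, the 10-second timeout and the `ControlPersist` value are assumptions, and the sketch also assumes that gsissh accepts the same `-S` and `-O check` options as plain `ssh`.

```python
import os
import subprocess


def control_opts(socket_path):
    """ssh options that share one connection through a control socket."""
    return ["-o", "ControlMaster=auto", "-o", "ControlPersist=600", "-S", socket_path]


def master_is_alive(host, socket_path, ssh="ssh", timeout=10):
    """Ask the master process behind the socket whether it still responds."""
    try:
        result = subprocess.run(
            [ssh, "-S", socket_path, "-O", "check", host],
            capture_output=True,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        # A hung master, or an ssh binary that cannot be run, counts as dead.
        return False
    return result.returncode == 0


def clear_stale_socket(host, socket_path, ssh="ssh", timeout=10):
    """Remove the control socket if its master no longer answers.

    Returns True when a stale socket was removed, False otherwise.
    """
    if os.path.exists(socket_path) and not master_is_alive(host, socket_path, ssh, timeout):
        os.remove(socket_path)
        return True
    return False
```

Calling `clear_stale_socket(host, path)` before each `ssh` or `scp` invocation means that a stuck session costs one timeout and one fresh connection, instead of requiring the user to find and delete the socket by hand.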

Page 20: Using ssh as portal - The CMS CRAB over glideinWMS experience

Remaining issue

● The major remaining issue is the distribution of the CRAB2 Client code
  – This still must be installed on the client node
  – Any change must be pushed to ALL the users
    ● Making server-side changes challenging

Page 21: Using ssh as portal - The CMS CRAB over glideinWMS experience

Conclusions

● Using (gsi)ssh as a portal has proven to be very effective
● By removing complexity from the code we have to maintain, life got much easier

Page 22: Using ssh as portal - The CMS CRAB over glideinWMS experience


Acknowledgments

● The idea of using ssh came from the RCondor package (see poster #322)

● This work was partially sponsored by the US National Science Foundation under Grants No. PHY-1148698 and PHY-1120138.