using ssh as portal - the cms crab over glideinwms experience
Post on 15-Jan-2015
102 Views
Preview:
DESCRIPTION
TRANSCRIPT
CHEP 2013 Using SSH as portal for CMS 1
CHEP 2013
Using ssh as portal
The CMS CRAB over glideinWMS experience
byI Sfiligoi1, S Belforte2,
J Letts1, T Martin1, M D Saiz Santos1 and F Fanzago3 1University of California San Diego 2Università e INFN di Trieste
3Università e INFN di Padova
CHEP 2013 Using SSH as portal for CMS 2
Agenda
● How we got there?● What we did?● How it worked out?
CHEP 2013 Using SSH as portal for CMS 3
CMS AnaOps
● CMS is one of the 4 experiments at theLarge Hadron Collider
● CMS Analysis Operations (AnaOps) is a computing task focused on the operational aspects of enabling physics data analysis– i.e. provides the computing infrastructure
used by the actual physicists
CHEP 2013 Using SSH as portal for CMS 4
CRABCMS AnaOps submission tool
● CMS AnaOps has been using CRABas the user interface to the Gridsince 2004
● Initially was a purely client interface– CRAB basically just a fancy wrapper
– So the Client machine had to havethe full Grid stack to function
CHEP 2013 Using SSH as portal for CMS 5
CRAB2 Server
● CRAB2 Server introduced in 2007● Client does not need the Grid stack anymore
CHEP 2013 Using SSH as portal for CMS 6
glideinWMS
● In 2009 CMS added glideinWMS to the list of supported middleware
● glideinWMS, based on HTCondor, does not support remote job submission– CRAB2 Server had to sit on the same node as
(one of) the HTCondor scheduler(s)
CHEP 2013 Using SSH as portal for CMS 7
glideinWMS and CRAB2 Server
HTCondorScheduler
glideinWMS
A schematic view
CHEP 2013 Using SSH as portal for CMS 8
Operational experience
● CMS AnaOps experienced a lot of operational issues with CRAB2 Server submitting through glideinWMS– One major problem being the server
losing track of job status
● Users were complaining a lot
CHEP 2013 Using SSH as portal for CMS 9
A look at the CRAB2 Server
● Designed to be very modular and flexible– But that made it
complex, too
● CMS decided togive up on fixing it– CRAB3 would simply
replace it
CHEP 2013 Using SSH as portal for CMS 10
A look at the CRAB2 Server
● Designed to be very modular and flexible– But that made it
complex, too
● CMS decided togive up on fixing it– CRAB3 would simply
replace it
But by late 2012,CRAB3 still did
not exist
CHEP 2013 Using SSH as portal for CMS 11
What was wrong with CRAB2 Client?
● The CRAB2 Client never went away– A significant fraction of CMS users kept using it
after CRAB2 Server was put in production
– And they were, by and large, liking it
● It just could not be used to submit to glideinWMS– Due to lack of
(native) remote submission to HTCondor
CHEP 2013 Using SSH as portal for CMS 12
Why not just use ssh?
● The major stated benefit of CRAB2 Server was “remote submission using a standard interface”(there were of course other stated benefits, but they had very little impact when paired with glideinWMS)
● But what users normally use for remote access?
– SSH● So, what if we used ssh in CRAB2 client as well?
– The gsissh dialect to support x509 proxies
CHEP 2013 Using SSH as portal for CMS 13
The ssh-enabled CRAB2 Client
● There are three stages in CRAB2 Client– Job submission
● Includes uploading the input sandbox
– Job monitoring
– Output log fetching
● The Client uses ssh and/or scp to talk to a node running the HTCondor scheduler
CHEP 2013 Using SSH as portal for CMS 14
The ssh-enabled CRAB2 Client
crab_submit
crab_monitor
crab_fetch
HTCondorScheduler
glideinWMS
ssh
scp input_sandox
ssh condor_submit
ssh condor_q
scp output_logs
Client node
A schematic view
CHEP 2013 Using SSH as portal for CMS 15
Server side setup
● gsisshd is a standard package– Just installed it from RPM
● Using lcmaps for authorization– With callbacks to
GUMS in the USA, andArgus in Europe
● Each user gets a standard Linux account– No technical limits on what the user can run,
but we would kill any offenders, if spotted
CHEP 2013 Using SSH as portal for CMS 16
Users liked it!
● Took only 4 months for all users to completely switch from Server to ssh-Client for glideinWMS
● And another 4 months to get 90% of all work
ssh-based Client
CHEP 2013 Using SSH as portal for CMS 17
Smooth operations
● We have not seen any major problems with ssh-based CRAB2 client itself– Now almost all problems are in the Grid layer
● Maybe not too surprising– CRAB2 Client is really
just a fancy wrapper
– SSH is a very mature tool Operational load now at
historical lowin spite of increasing usage
CHEP 2013 Using SSH as portal for CMS 18
Works at scale
Includes non-ssh jobs as wellIncludes non-ssh jobs as well
Essentially ssh-only Essentially ssh-only
CHEP 2013 Using SSH as portal for CMS 19
Hickups during the voyage
● We occasionally hit a ssh quirk– In order to speed up ssh connections,
we re-use the ssh control session (-S)
– If it gets stuck, user needed to manually remove it
● CRAB2 Client now automatically detects and fixes the problem
CHEP 2013 Using SSH as portal for CMS 20
Remaining issue
● The major remaining issue is the distribution of the CRAB2 Client code– This still must be installed on the client node
– Any change must be pushed to ALL the users● Making server-side changes challenging
CHEP 2013 Using SSH as portal for CMS 21
Conclusions
● Using (gsi)ssh as a portalhas proven to be very effective
● By removing complexity out ofthe code we have to maintain,life got much easier
CHEP 2013 Using SSH as portal for CMS 22
Acknowledgments
● The idea of using ssh came from the RCondor package (see poster #322)
● This work was partially sponsored by the US National Science Foundation under Grants No. PHY-1148698 and PHY-1120138.
top related