Feasibility study – server side
EGI-InSPIRE RI-261323 (www.egi.eu)
Feasibility study – server side
Fernando H. Barreiro Megino
Mattia Cinquilli
Daniele Spiga
Daniel C. van der Ster
CERN IT-ES-VOS
News
• Meetings with the experts to review the analysis frameworks used by ATLAS and CMS
• This week focusing on the server side (PanDA server and CMS WMS)
• Thanks to Paul Nilsson, Tadashi Maeno, Steve Foulkes, Simone Campana & Eric Vaandering for their time
• Information is tracked in our document
• Now readable by anyone who has the link: https://docs.google.com/document/d/1PJDBuH4gd5w5CzUJ5n2i7uOaFrhcYo5-5YErQdpzjvo/ed
• This presentation should give an overview of our findings and recommendations so far
• Please interrupt for discussion, questions and corrections
PanDA architecture
CMS analysis framework
[Diagram: central and distributed components of the CMS analysis framework]
Resource management and brokerage
Information system:
• Discover CEs – ATLAS: Pilot Factory configuration; CMS: WMSes
• SW installed – ATLAS: Schedconfig; CMS: assumes the production version of the SW is installed on pledged sites
• Occupancy – ATLAS: PanDA job table (global view); CMS: tracked by WMAgent (local view)
• Site information – ATLAS: Schedconfig; CMS: Trivial File Catalog

Brokerage:
• ATLAS:
  • The client discovers data locations
  • Followed by load-based site brokering using a weight function
  • Site capacity is measured dynamically
  • PanDA picks the best site at submit time
  • PanDA tries to process the whole dataset at one site
• CMS:
  • The GlobalWQ asks PhEDEx/DBS for data locations
  • Either WMAgent assigns sites based on static, local pledges, or delegates the final site decision to the WMSes
  • CMS sends a list of sites to the WMS
  • CMS will spread work across sites
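The load-based site brokering described for ATLAS can be sketched roughly as follows. This is an illustration, not PanDA's actual code: the weight formula, the site record fields and the site names are all invented for the example.

```python
# Sketch of load-based site brokering with a weight function.
# All names and the formula are invented for illustration.

def choose_site(sites, dataset_locations):
    """Pick the best-weighted site among those that hold the data."""
    candidates = [s for s in sites if s["name"] in dataset_locations]
    # Hypothetical weight: capacity relative to queued work, so a site
    # already full of waiting jobs gets a low weight.
    def weight(site):
        return site["running_capacity"] / (1 + site["queued_jobs"])
    return max(candidates, key=weight)["name"]

sites = [
    {"name": "SITE_A", "running_capacity": 1000, "queued_jobs": 500},
    {"name": "SITE_B", "running_capacity": 400,  "queued_jobs": 20},
]
# Data is at both sites; SITE_B wins because it has far less queued work.
best = choose_site(sites, dataset_locations={"SITE_A", "SITE_B"})
```

Measuring "capacity" and "queued work" dynamically, as PanDA does, is what distinguishes this from CMS's more static, pledge-based assignment.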
PD2P and rebrokerage
Dynamic Data Placement:
• ATLAS: a data distribution/pre-placement model that relies on dynamic data placement; PD2P: when a jobset is submitted, PanDA can decide to trigger a replica request
• CMS: –

Rebrokerage:
• ATLAS: jobs waiting longer than x hours can be reassigned to another site
• CMS: locations for jobsets in the GlobalWQ are continuously refreshed; once a job is in the LocalWQ, its locations are fixed
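The ATLAS rebrokerage rule ("jobs waiting longer than x hours can be reassigned") can be outlined in a few lines. The threshold value and the job record structure below are assumptions made for the sketch.

```python
import time

# Sketch of the rebrokerage rule: jobs waiting beyond a threshold are
# released for reassignment to another site. Threshold is hypothetical.

REBROKER_AFTER_H = 12  # the "x hours" of the rule; value invented

def jobs_to_rebroker(waiting_jobs, now=None):
    """Return ids of waiting jobs eligible for reassignment."""
    now = now if now is not None else time.time()
    return [j["id"] for j in waiting_jobs
            if (now - j["submitted"]) / 3600.0 > REBROKER_AFTER_H]
```

The CMS model needs no such rule at the global level, since GlobalWQ locations are refreshed continuously; only jobs already handed to a LocalWQ are pinned.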
Priorities and fairshares
Priorities and fairshares:
• ATLAS:
  • Users get x CPU-hours per 24h; additional jobs are de-prioritized
  • Priority boosts / beyond-pledge allocations for users and groups at particular resources
  • At submission: jobs in a jobset get decreasing priorities (so that a few run right away to check for errors)
  • Waiting jobs: priority increases while jobs wait, to prevent starvation
  • Retried jobs get a lower priority to delay them slightly
  • Prod/analy balance set at site level
• CMS:
  • Priority is set by operators
  • The RequestManager processes requests in order of priority
  • The GlobalWQ fetches in order of priority
  • The Global and Local WQs are FIFO
  • Prod/analy balance set at site level
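Two of the ATLAS rules above (decreasing priorities within a jobset at submission, and aging of waiting jobs) can be illustrated with a small sketch. The constants are invented for the example, not PanDA's real values.

```python
# Sketch of two PanDA-style priority rules; all constants are invented.

BASE_PRIORITY = 1000

def submission_priorities(n_jobs, step=1):
    """Decreasing priorities within a jobset, so that a few jobs run
    right away and expose errors before the bulk is scheduled."""
    return [BASE_PRIORITY - i * step for i in range(n_jobs)]

def aged_priority(priority, hours_waiting, boost_per_hour=5):
    """Waiting jobs slowly climb back up the queue to prevent starvation."""
    return priority + int(hours_waiting * boost_per_hour)
```

The CMS side needs neither rule at job level: ordering happens per request in the RequestManager, and the work queues themselves are FIFO.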
Data handling in the server
Input:
• ATLAS:
  • The pilot queries the LFC to get PFNs
  • Flexible input data handling configured in Schedconfig: copy2scratch vs streaming I/O
• CMS:
  • Input handling is completely delegated to CMSSW
  • CMSSW uses the Trivial File Catalog

Output:
• ATLAS:
  • DQ2 for detector and user data
  • Copied to the local SE by the pilot; registered by the client
  • Optional additional copies via DaTRI
• CMS:
  • DBS/PhEDEx primarily for detector data
  • CRAB handles asynchronous stage-out and optional DBS publication
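A Trivial File Catalog lookup of the kind CMSSW performs is, in essence, an ordered list of LFN-to-PFN rewrite rules. The sketch below uses one invented rule and an invented SE hostname; real TFCs are site-specific XML files.

```python
import re

# Sketch of a TFC-style LFN -> PFN lookup. The rule and the storage
# endpoint are invented; real catalogs are site-specific XML.

RULES = [
    (re.compile(r"^/store/(.*)"), r"root://se.example.org//cms/store/\1"),
]

def lfn_to_pfn(lfn):
    """Apply the first matching rewrite rule to turn an LFN into a PFN."""
    for pattern, template in RULES:
        if pattern.match(lfn):
            return pattern.sub(template, lfn)
    raise KeyError(f"no TFC rule matches {lfn}")
```

Because the mapping is purely rule-based, no central file catalog service is needed on the CMS side, in contrast to the ATLAS pilot's LFC query.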
Site status
Site status:
• ATLAS: PanDA queue status is modified by operators, AFT and the SSB Switcher
• CMS: manual
Bookkeeping and redundancy
Bookkeeping:
• ATLAS:
  • CLI for job/task bookkeeping; WWW PanDA monitor/Dashboard for historical jobs
  • CLI to kill and retry jobsets
  • Jobset progress is tracked in PandaDB (i.e. which files have been read)
• CMS:
  • Client to kill and retry a request
  • WMAgent handles job retries based on ACDC (i.e. which files are left to process)

Redundancy:
• ATLAS:
  • PanDA@CERN is a single point of failure
  • CERN outage: no new jobs; running jobs ~OK; completing jobs may fail
• CMS:
  • Distributed, with n independent queues holding enough work for one day
  • CERN outage: no new jobs
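The two bookkeeping styles above track complementary sets: PanDA records which files of a jobset have been read, while ACDC-style resubmission acts on what is left to process. A minimal illustration, with invented data structures:

```python
# Sketch contrasting the two progress views; structures are invented.

def files_done(all_files, processed):
    """PanDA-style view: which files of the jobset have been read."""
    return set(all_files) & set(processed)

def files_left(all_files, processed):
    """ACDC-style view: resubmission acts only on what remains."""
    return set(all_files) - set(processed)
```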
Ideas about a Common Approach: Data I/O Issues
• After comparing the functionalities, we asked each of the server experts how their system could be used as a common solution
• The existing tight coupling between the data management and workload management systems was previously seen as a potential showstopper, so we focused on this
PanDA Data I/O Solution
• Could the PanDA server handle CMS data?
  • The DQ2/PanDA coupling is not very tight
  • New libraries would have to be written
• Input files:
  • Store just the LFN in PandaDB
  • CMSSW queries the Trivial File Catalog (TFC) and stages the data
  • The PanDA pilot would use a no-op mover (CMSSW reads the LFNs directly)
• Output files:
  • The wrapper/pilot copies files to the SE
  • Files are placed according to the Trivial File Catalog
  • LFNs and the storage site name are written back to PanDA
  • Optional DBS registration and an external asynchronous stage-out service are still needed
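The proposed "no-op mover" can be contrasted with a conventional copy-to-scratch mover in a short sketch. The class and method names are hypothetical, not actual pilot code.

```python
# Sketch of the no-op mover idea; all names are hypothetical.

class NoOpMover:
    """Pilot-side mover that performs no transfer: the bare LFNs are
    handed to CMSSW, which resolves PFNs itself through the TFC."""
    def get_input(self, lfns):
        return list(lfns)

class CopyToScratchMover:
    """Conventional mover that stages each input file locally first."""
    def __init__(self, copy_cmd):
        self.copy_cmd = copy_cmd  # callable mapping LFN -> local path
    def get_input(self, lfns):
        return [self.copy_cmd(lfn) for lfn in lfns]
```

Selecting the no-op variant for CMS jobs would leave the rest of the pilot workflow untouched, which is what makes the coupling look manageable.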
CMS WMSystem Data I/O Solution
• Could the CMS WMSystem handle ATLAS data?
• Data discovery/registration:
  • The DM/WM coupling is not very tight
  • The interface to the DM service is pluggable
• Input and output would remain the responsibility of the pilot "wrapper script"
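The pluggable DM interface could be pictured as an abstract plugin API behind which either PhEDEx/DBS or DQ2 sits. All class names and return values below are illustrative placeholders, not the real CMS interface.

```python
from abc import ABC, abstractmethod

# Sketch of a pluggable data-management interface; names and the
# placeholder lookups are invented for illustration.

class DataManagementPlugin(ABC):
    @abstractmethod
    def locations(self, dataset):
        """Return the sites that host the dataset."""

class PhedexDbsPlugin(DataManagementPlugin):
    def locations(self, dataset):
        return ["T2_EXAMPLE_A"]          # placeholder PhEDEx/DBS lookup

class Dq2Plugin(DataManagementPlugin):
    def locations(self, dataset):
        return ["EXAMPLE_DATADISK"]      # placeholder DQ2 lookup
```

Under this picture, supporting ATLAS data is "only" a matter of writing a DQ2-backed plugin, since the workload-management code never talks to the catalog directly.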
Priority & Fairshare Issues
• PanDA has flexible priority mechanisms that have been implemented and used in production for several years
• CMS WMSystem priorities extend to the requests – nothing in the model prevents priorities from being implemented down to the Local Agents
Conclusions
• This week focused on PanDA and the CMS WMSystem on the server side
• Main differences between the two systems:
  • Complexity of the systems and levels of queuing
  • CMS designed a distributed architecture to achieve scalability and fault tolerance
  • PanDA has a simple, central architecture and has demonstrated scalability
  • Clear tradeoffs: a central service has a global view and control but is a single point of failure; a distributed service has higher scalability and reliability but lacks the global view and control
  • Resource allocation: dynamic brokerage in PanDA, more static in the WMSystem given its distributed character
• No showstoppers detected, and a positive attitude from the experts
• Next weeks: investigate the pilot frameworks and glideinWMS