workload management - infn sezione di padovalacaprar/talks/ccs_20041102.pdf · need input also from...

22
CMS CCS CERN, Tuesday 2 Nov 2004 WorkLoad Management Stefano Lacaprara [email protected] INFN and Padova University Stefano Lacaprara – CCS, CERN, Tuesday 2 Nov 2004 – WorkLoad Management – p.1/22

Upload: others

Post on 20-Jun-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: WorkLoad Management - INFN Sezione di Padovalacaprar/talks/CCS_20041102.pdf · Need input also from Data Management (discussed during DM workshop) Probably enough people Need more

CMS

CCS

CERN, Tuesday 2 Nov 2004

WorkLoad Management

Stefano [email protected]

INFN and Padova University

Stefano Lacaprara – CCS, CERN, Tuesday 2 Nov 2004 – WorkLoad Management – p.1/22

Page 2: WorkLoad Management - INFN Sezione di Padovalacaprar/talks/CCS_20041102.pdf · Need input also from Data Management (discussed during DM workshop) Probably enough people Need more

CMS

OutlineNews,

Draft milestones and dates,

People and activity in various area,

Stefano Lacaprara – CCS, CERN, Tuesday 2 Nov 2004 – WorkLoad Management – p.2/22

Page 3: WorkLoad Management - INFN Sezione di Padovalacaprar/talks/CCS_20041102.pdf · Need input also from Data Management (discussed during DM workshop) Probably enough people Need more

CMS

Draft milestones with datesAfter gridPP (UK) and Lucas request, tentative“agenda” for next year for Workload ManagementobjectiveTry to define areas of work (already advanced) tasksTry to define what should be achieved in what timeShould match as much as possible CMS/CCSmilestones (not yet synchronized...)Try to identify people/institute for each task/sub taskFind eventual man power shortageMuch of the work for Workload managementdepends on other’s workplan

Stefano Lacaprara – CCS, CERN, Tuesday 2 Nov 2004 – WorkLoad Management – p.3/22

Page 4: WorkLoad Management - INFN Sezione di Padovalacaprar/talks/CCS_20041102.pdf · Need input also from Data Management (discussed during DM workshop) Probably enough people Need more

CMS WM dependenciesData management

Access to data depends a lot on how data will bedistributedDifferent pattern leads to different distributedscenarios

Infrastructure deploymentLCG/Grid3/NorduGridgLite EGEE deplyoment

Analysis modelStill not fully defined for CMSMore or less ok for next few monthsWill stay like this until PTDR?Will change considerably afterward?

Working scenario: much depends on otherHard to look very far in futureSo, define deliverables and timescale until end 2005(PTDR)

Stefano Lacaprara – CCS, CERN, Tuesday 2 Nov 2004 – WorkLoad Management – p.4/22

Page 5: WorkLoad Management - INFN Sezione di Padovalacaprar/talks/CCS_20041102.pdf · Need input also from Data Management (discussed during DM workshop) Probably enough people Need more

CMS WM Areas of activityData PublicationAccess to resourcesCMS software deploymentJob preparationJob splittingMonitoring and bookkeepingOutput retrieval, storage and publicationTraining and documentation

Stefano Lacaprara – CCS, CERN, Tuesday 2 Nov 2004 – WorkLoad Management – p.5/22

Page 6: WorkLoad Management - INFN Sezione di Padovalacaprar/talks/CCS_20041102.pdf · Need input also from Data Management (discussed during DM workshop) Probably enough people Need more

CMS High level deliverablesEnd 2004 Prototypes for high level tool for WM andsimple use cases03/2005 Extension of usage of prototypes to a wideaudience of PRS members to access remote data w/odata movement06/2005 Evaluation of new functionality of LCG projects(EGEE) for job submission and analysis framework09/2005 Extension to cope with more complex datadistribution scenarios, including data movement ondemand09/2005 Effective, wide usage by PRS for PTDR studies03/2006(??) Ready for DC06: effective, quasi on-line,realistic analysis during DC06 data challenge...??/2007(?) Ready for data taking

Stefano Lacaprara – CCS, CERN, Tuesday 2 Nov 2004 – WorkLoad Management – p.6/22

Page 7: WorkLoad Management - INFN Sezione di Padovalacaprar/talks/CCS_20041102.pdf · Need input also from Data Management (discussed during DM workshop) Probably enough people Need more

CMS

Lower level deliverablesMore precision of WM deliverables wrt to the high levelone (pretty general)To be reviewed, re-discussed and eventually modifiedor redefined as the project unfoldsDependency on other projects, notable Data Management,is very strictAPROM discussion to coordinate timescale,requirements, etc...Will present deliverables and timescale for each areasdefined beforeWill also discuss actual status and future plan

Stefano Lacaprara – CCS, CERN, Tuesday 2 Nov 2004 – WorkLoad Management – p.7/22

Page 8: WorkLoad Management - INFN Sezione di Padovalacaprar/talks/CCS_20041102.pdf · Need input also from Data Management (discussed during DM workshop) Probably enough people Need more

CMS Data publicationEnd 2004 Definition of requirements for publications.What, where and how. Prototype usage for jobpreparation/splitting.03/2005 Evaluation of LCG dataset catalogs for CMSpublication.Implication of DataManagement prototype forPublication schema: synchronization, duplication ofinfo, etc...Publication of private or group-wide data.09/2005 Deployment and testing for data published forPTDR, including complex schema, such as distributeddataset, streams, skims, etc

Stefano Lacaprara – CCS, CERN, Tuesday 2 Nov 2004 – WorkLoad Management – p.8/22

Page 9: WorkLoad Management - INFN Sezione di Padovalacaprar/talks/CCS_20041102.pdf · Need input also from Data Management (discussed during DM workshop) Probably enough people Need more

CMS PeopleProduction people: Alexei, Julia, Tony, ...WM: Alessandra will focus on thatRequirements, input, discussion also from WM tooldeveloping groups (grape)Need input also from Data Management (discussedduring DM workshop)Probably enough peopleNeed more involvement by APROM (integration withDM)

Stefano Lacaprara – CCS, CERN, Tuesday 2 Nov 2004 – WorkLoad Management – p.9/22

Page 10: WorkLoad Management - INFN Sezione di Padovalacaprar/talks/CCS_20041102.pdf · Need input also from Data Management (discussed during DM workshop) Probably enough people Need more

CMS

StatusVery active discussion about requirementsWhat should go into CMS publication systemand what is grid responsibilityat which level should we integrate with DMProblem in achieving a effective discussion!(common to many CMS groups!)

Difficult by mail exchange (achieved rate

mail/

min– a MailChallenge?)Difficult by “standard” meeting with presentation,even useful to drive discussionThis week (tomorrow?) technical discussion aboutscope and roles for PubDB

Stefano Lacaprara – CCS, CERN, Tuesday 2 Nov 2004 – WorkLoad Management – p.10/22

Page 11: WorkLoad Management - INFN Sezione di Padovalacaprar/talks/CCS_20041102.pdf · Need input also from Data Management (discussed during DM workshop) Probably enough people Need more

CMS

Access to resourcesEnd 2004 Usage of available testbed on various T1,eventually T203/2005 Deployment of RB able to interface to CMSdataset catalogsDeployment and initial test of gLite testbed09/2005 Wide usage of remote resources by PRSusers.

People:T1 center people . . .Need to define better responsabilities, but actual sistuation is already

� fineNeed to integrate with grid deployment: drive test bed for our needs

Big issue:How to deal with priority?Today only possible at CE level, on the hand of local site manager (ifLBS allows)Already today some delay in testing due to farm usage by non-CMSexperiment!

Stefano Lacaprara – CCS, CERN, Tuesday 2 Nov 2004 – WorkLoad Management – p.11/22

Page 12: WorkLoad Management - INFN Sezione di Padovalacaprar/talks/CCS_20041102.pdf · Need input also from Data Management (discussed during DM workshop) Probably enough people Need more

CMS

Software distributionEnd 2004 Definition of requirements for sw deploymentand tag publication03/2005 Test of Sw deployment on GRID resources09/2005 Prototype for (semi-)automatic sw deploymentworld-wide

PeopleNick, Claudio, Karlsruhe, DAR-groupWell covered – maybe too much :)Nice start with Claudio documents on requirements,need to move further on integration of different existingtoolsGAG software installation document athttp://cern.ch/fca/DCFeedBack.doc|pdf

Stefano Lacaprara – CCS, CERN, Tuesday 2 Nov 2004 – WorkLoad Management – p.12/22

Page 13: WorkLoad Management - INFN Sezione di Padovalacaprar/talks/CCS_20041102.pdf · Need input also from Data Management (discussed during DM workshop) Probably enough people Need more

CMS Job preparationEnd 2004 Prototype working. Simple use case,including private sw and executables. Access topublished information03/2005 Enhanced prototype

more complex configuration as defined by userfeedback on simple case.Training of PRS community.Access to Dataset catalogs by LCG RB.

09/2005 Full working toolPeople

Grape, Gross for LCGRunJob for UAF �MOP �LCGMario Kadastik on NorduGrid: first experiencesJulia for EGEE (need more!)Well covered, need more integrationThis week, meet with David Collins for Grape/Grosscommon developing and integration: focused on LCG

Stefano Lacaprara – CCS, CERN, Tuesday 2 Nov 2004 – WorkLoad Management – p.13/22

Page 14: WorkLoad Management - INFN Sezione di Padovalacaprar/talks/CCS_20041102.pdf · Need input also from Data Management (discussed during DM workshop) Probably enough people Need more

CMS Job preparationStatus

Progressing, bit slower than expected!Last week first successful jobs submission with grapeusing PubDB as dataset catalog (first time!)Progress slowed by many problems found (and beingaddressed)Biggest was interface with PubDBPubDB is evolving, need stable interfaceOnly hitting real problems we are understanding whatwe need from PubDB (Not a surprise!)Advantage of approach: start developing somethingsimple and try to make it works. Then learn from itGood status for RB �PubDB interface by Heintz andFlavia: prototype ready, asked for dedicateddeployment of RB for test

Stefano Lacaprara – CCS, CERN, Tuesday 2 Nov 2004 – WorkLoad Management – p.14/22

Page 15: WorkLoad Management - INFN Sezione di Padovalacaprar/talks/CCS_20041102.pdf · Need input also from Data Management (discussed during DM workshop) Probably enough people Need more

CMS Job preparation (II)Two experience on Grid3 (Rick) and NorduGrid (Mario)Very interesting results!Approach is to use “opportunistic” resources:paratrooper approach

Carry with you all you needInstall sw privatelyMove chunk of data on remoteresourcesRun on it

Different wrt what I presented as WMworkplanGOOD! We do need to play aroundwith different scenarios!

Stefano Lacaprara – CCS, CERN, Tuesday 2 Nov 2004 – WorkLoad Management – p.15/22

Page 16: WorkLoad Management - INFN Sezione di Padovalacaprar/talks/CCS_20041102.pdf · Need input also from Data Management (discussed during DM workshop) Probably enough people Need more

CMS Job preparation (III)Is this approach a choice or a forced solution given theGrid3/NorduGrid architecture?Risk: need too much resources to use resources!Install al CMS sw: very expensive! Then move data(cost depends on data size, if small no problem)Run on data: if running is fast, overhead can beimpressive!Can be improved if cluster-job is doneMother job install sw, sons use itRequirements from this experience: tool to split data insmall independent chunks (even smaller than a singlerun)Do we need this?

Stefano Lacaprara – CCS, CERN, Tuesday 2 Nov 2004 – WorkLoad Management – p.16/22

Page 17: WorkLoad Management - INFN Sezione di Padovalacaprar/talks/CCS_20041102.pdf · Need input also from Data Management (discussed during DM workshop) Probably enough people Need more

CMS Job preparation (IV)Much depends on data/event modelActual model assumes dataset complete on a TnMust evaluate carefully priority. I don’t think that this is afirst order priority.First need to move datasets with appropriate movementpublicationPhedex provide functionality: need to integrate withpublication schemaThen move on fancier data movement scenarios

Stefano Lacaprara – CCS, CERN, Tuesday 2 Nov 2004 – WorkLoad Management – p.17/22

Page 18: WorkLoad Management - INFN Sezione di Padovalacaprar/talks/CCS_20041102.pdf · Need input also from Data Management (discussed during DM workshop) Probably enough people Need more

CMS Job preparation (V)Related issueDo we need dataset splitting for job splitting? (AlsoEGEE approach)Not mandatory!Complete Dataset (or big part of it) can be on a SE, andaccessed by many splitted jobs from WNs of close CEIf data is splitted, then the job splitting will follow morestrictly data splittingIf, for better resource usage (eg very fast job readingmany runs), need to re-join small pieces of dataset.Can be complex!Not guaranteed that best use of resources is done.Need to ensure that balancing is doneImplication for data and event model

Stefano Lacaprara – CCS, CERN, Tuesday 2 Nov 2004 – WorkLoad Management – p.18/22

Page 19: WorkLoad Management - INFN Sezione di Padovalacaprar/talks/CCS_20041102.pdf · Need input also from Data Management (discussed during DM workshop) Probably enough people Need more

CMS Job splittingEnd 2004 Simple splitting based on user configuration,assuming un-splitted dataset03/2005

More complex splitting assuming also partialdatasets distribution among different site.Evaluation of new LCG functionality for job splitting.Prototype for job clusters submission.

09/2005Integration of CMS specific knowledge in job splittingwith resource matching done by LCGSplitting done according to data and resourcedistribution

PeopleSame as job preparationNot yet idea about how to do a real job clusteringmatching data and resources in an optimal way

Stefano Lacaprara – CCS, CERN, Tuesday 2 Nov 2004 – WorkLoad Management – p.19/22

Page 20: WorkLoad Management - INFN Sezione di Padovalacaprar/talks/CCS_20041102.pdf · Need input also from Data Management (discussed during DM workshop) Probably enough people Need more

CMS Monitoring and bookkeepingEnd 2004 Very simple prototype working. Definition ofrequirements for application monitoring andbookkeeping.03/2005 Evaluation and performance analysis ofexisting tools (BOSS, MonaLisa, JAM,...). Integrationwith job submission for resubmission on fault, detectionof black holes, etc...09/2005 GUI/WEB front-end for physicist. ...

PeopleMonaLisa, BOSS, JAMWell covered –too much? :) for basic servicesStill low experience on what user really want to monitorand bookkeep

Stefano Lacaprara – CCS, CERN, Tuesday 2 Nov 2004 – WorkLoad Management – p.20/22

Page 21: WorkLoad Management - INFN Sezione di Padovalacaprar/talks/CCS_20041102.pdf · Need input also from Data Management (discussed during DM workshop) Probably enough people Need more

CMS Output retrieval and publicationEnd 2004 User get back its output easily.03/2005 User can also publish output on grid storageresources with coherent publication09/2005 as required by users...

PeopleSame people as job preparationLimited experience on publicationNo experience, and limited ideas for publication

Stefano Lacaprara – CCS, CERN, Tuesday 2 Nov 2004 – WorkLoad Management – p.21/22

Page 22: WorkLoad Management - INFN Sezione di Padovalacaprar/talks/CCS_20041102.pdf · Need input also from Data Management (discussed during DM workshop) Probably enough people Need more

CMS

GAG: software installation documentAvailable athttp://cern.ch/fca/DCFeedBack.dochttp://cern.ch/fca/DCFeedBack.pdf

We are requested to give feedback/approval/etc...Feedback already given in the past, many changeswent into documentGoal: we want to be able to do in future what we aredoing nowIf service provided to do better job, fineDocument is quite general (obviously), we should checkif this is fine with our viewIMHO is (now) fine, but I’d like to have also other peoplelooking at it

Stefano Lacaprara – CCS, CERN, Tuesday 2 Nov 2004 – WorkLoad Management – p.22/22