Future of Distributed Production in US Facilities

Kaushik De, Univ. of Texas at Arlington
US ATLAS Distributed Facility Workshop, Santa Cruz
November 13, 2012


Page 1: Future of Distributed Production in US Facilities

Future of Distributed Production in US Facilities

Kaushik De

Univ. of Texas at Arlington

US ATLAS Distributed Facility Workshop, Santa Cruz

November 13, 2012

Page 2: Future of Distributed Production in US Facilities

Background

Distributed production requires many different ATLAS-specific SW components/applications
- Athena and Transformations – core software
- ProdSys – task management system
- AMI – Production Tags and Metadata
- PanDA – job execution system
- DQ2 – data management system
- Monitoring of tasks, data and jobs

They utilize common tools like Globus, VDT, XRootD, dCache, CVMFS, … deployed at our facilities

Page 3: Future of Distributed Production in US Facilities

Overview

Many distributed production components used in ATLAS are being upgraded after ~5 years of continuous use

In this talk we will focus on their evolution in 2013-2014
- Athena on many fronts: AthenaMP, Athena64, AthenaGPU, AthenaPhi, Athena event service
- trf -> tf
- DQ2 -> Rucio
- ProdSys -> ProdSys II
- PanDA -> CAF
- PanDA -> BigData
- New monitoring capabilities

Page 4: Future of Distributed Production in US Facilities

AthenaXX

Many future paths for Athena driven by hardware – will not talk about them here

Interesting topic for distributed production – event service
- Basic unit of measurement in HEP is events – not bits, bytes or files
- Multi-core is the new paradigm (same as the old one)
- Caching technologies may be best optimized at event level

Started discussions during SW week for event service
- Client-server architecture in Athena desirable long term
- PanDA server with Athena client will be first step to try
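The client-server idea above can be sketched in a few lines. This is only an illustration under stated assumptions: `EventRangeServer`, `next_range`, `report_done` and `run_client` are hypothetical names, not PanDA or Athena APIs, and the server here is an in-memory stand-in for whatever service would hand out work at the event level.

```python
from collections import deque
from typing import Callable, Deque, List, Optional, Tuple

EventRange = Tuple[int, int]  # half-open range [first, last) of event indices


class EventRangeServer:
    """Hypothetical dispatcher that hands out event ranges instead of whole files."""

    def __init__(self, total_events: int, chunk_size: int) -> None:
        # Pre-split the event space into contiguous chunks.
        self._pending: Deque[EventRange] = deque(
            (start, min(start + chunk_size, total_events))
            for start in range(0, total_events, chunk_size)
        )
        self.done: List[EventRange] = []

    def next_range(self) -> Optional[EventRange]:
        """Return the next unprocessed range, or None when nothing is left."""
        return self._pending.popleft() if self._pending else None

    def report_done(self, event_range: EventRange) -> None:
        """Record that a client finished a range."""
        self.done.append(event_range)


def run_client(server: EventRangeServer, process_event: Callable[[int], None]) -> int:
    """Pull ranges from the server until it is empty; return events processed."""
    processed = 0
    while True:
        event_range = server.next_range()
        if event_range is None:
            return processed
        first, last = event_range
        for event_id in range(first, last):
            process_event(event_id)
            processed += 1
        server.report_done(event_range)
```

In this sketch several clients (for example one per core) could pull from the same server, so the unit of scheduling is the event range rather than the file.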

Page 5: Future of Distributed Production in US Facilities

Job Transforms

Job transforms – trf – workflow wrapper around Athena

All production jobs use trf

Most major ATLAS workloads are supported
- Including multi-step jobs
- New workloads like overlay, FTK … are being added
- Major changes underway

See recent talks by Graeme Stewart
- https://indico.cern.ch/getFile.py/access?contribId=35&sessionId=19&resId=0&materialId=slides&confId=169697
- https://indico.cern.ch/getFile.py/access?contribId=7&resId=0&materialId=slides&confId=214562

Highlights of future changes in next few slides

Pages 6-18: no slide text captured in this transcript.

Source cited on page 14: https://indico.cern.ch/getFile.py/access?contribId=1&sessionId=5&resId=2&materialId=slides&confId=169697

Page 19: Future of Distributed Production in US Facilities

What is ProdSys

Task management system
- Interface to request production tasks
- Generate jobs for execution by PanDA
- Manage task completion

Consisting of many scripts
- Web interface for task request
- Bulk task submission interface
- Auto generation of jobs from tasks
- Scripts for task completion
- Interacts with AMI and DQ2

And add-ons
- Task-list creation scripts developed by production managers
- Task monitoring

Page 20: Future of Distributed Production in US Facilities

Current System

[Diagram labels: Production Manager submits tasks; ProdSys; Bamboo; PanDA; jobs; users]

Page 21: Future of Distributed Production in US Facilities

What is ProdSys II

Split ProdSys into two parts

DEfT – task request and task definition
- Some components will be taken from current ProdSys

JeDi – dynamic job definition and task execution
- Integrated with PanDA (replaces Bamboo)
- Will also be the engine for user analysis tasks

Need to work closely with Transforms & Rucio groups
- All three systems should evolve together

Integration with monitoring
- Will be planned from the beginning

Page 22: Future of Distributed Production in US Facilities

Future System

[Diagram labels: Production Manager; DEfT; JeDi; PanDA; users]

Page 23: Future of Distributed Production in US Facilities

DEfT

Key features
- Web UI for simplified interactive task request
- Task request system based on physics requirements
- Managers/users insulated from execution details
- Deprecate/remove script based task submission
- Error checking of task requests
- Built-in authentication and approval mechanisms
- Creates task according to a new simplified schema

Page 24: Future of Distributed Production in US Facilities

Tasks, Meta-tasks, Basket-tasks

New extensions to the concept of task

Task – basic unit
- Input dataset -> Output dataset

Meta-task – chain of tasks, which will be auto-generated
- Manager/user makes single request
- Successive processing steps (transforms) created by DEfT
- Intermediate steps in chain may be specified as transient

Basket-task – group of related tasks (e.g. same tag)
- Manager/user can define basket of tasks
- Manager/user makes single request for execution

Ability to clone tasks, meta-tasks and basket-tasks
- From previous tasks, meta-tasks and basket-tasks
- Or from predefined templates
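A minimal sketch of how a single meta-task request might expand into a chain of tasks, where each step's output dataset becomes the next step's input. The `Task` fields, the `expand_meta_task` function and the dataset naming scheme are all assumptions made for illustration; the slides do not define the DEfT schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    transform: str        # processing step, e.g. "simul" (illustrative name)
    input_dataset: str
    output_dataset: str
    transient: bool = False  # True for intermediate output not meant to be kept


def expand_meta_task(input_dataset: str, transforms: List[str],
                     keep_intermediate: bool = False) -> List[Task]:
    """Expand one request into a chain: output of step i is input of step i+1."""
    tasks: List[Task] = []
    current = input_dataset
    for index, transform in enumerate(transforms):
        # Hypothetical naming: append the step name to the input dataset name.
        output = f"{current}.{transform}"
        is_last = index == len(transforms) - 1
        tasks.append(Task(
            transform=transform,
            input_dataset=current,
            output_dataset=output,
            transient=(not is_last) and (not keep_intermediate),
        ))
        current = output
    return tasks
```

For example, `expand_meta_task("mc12.evgen", ["simul", "digit", "reco"])` yields three linked tasks, with the first two outputs flagged as transient and only the final output retained.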

Page 25: Future of Distributed Production in US Facilities

JeDi

Key features
- JeDi will be core component of PanDA
- Generate jobs dynamically from DEfT tasks
  - Jobs are defined to match execution environment and specified constraints (e.g. number of cores, duration, file size, dataset size…)
  - Number of events varies per job
  - Jobs are not predefined with fixed number of events – key feature
- PanDA responsible for optimal task execution
- PanDA responsible for task completion
- Auto-merging if requested
- Data will be collected by PanDA to optimize job execution and completion (expanded concept of scout jobs)
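The dynamic job definition described above can be illustrated with a small sketch: the number of events per job is derived from the execution environment rather than fixed in advance. `SiteConstraints`, `events_per_job` and `define_jobs` are hypothetical names, and the sizing rule (walltime times cores divided by a per-event time, as might be measured by scout jobs) is an assumed simplification, not JeDi's actual algorithm.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SiteConstraints:
    cores: int            # cores available to one job
    max_walltime_s: int   # longest job duration allowed at the site


def events_per_job(site: SiteConstraints, sec_per_event: float) -> int:
    """Events one job can process in the walltime limit.

    Assumes ideal scaling across cores; sec_per_event would come from
    measurements such as scout jobs.
    """
    return max(1, int(site.max_walltime_s * site.cores / sec_per_event))


def define_jobs(total_events: int, site: SiteConstraints,
                sec_per_event: float) -> List[Tuple[int, int]]:
    """Split a task into half-open event ranges sized for the given site."""
    size = events_per_job(site, sec_per_event)
    return [(start, min(start + size, total_events))
            for start in range(0, total_events, size)]
```

With these assumptions the same 10000-event task becomes 4 jobs on an 8-core slot with a one-hour limit at 10 s per event, but 28 jobs on a single-core slot, which is the sense in which jobs are not predefined with a fixed number of events.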

Page 26: Future of Distributed Production in US Facilities

Common Analysis Framework

Task force to evaluate suitability of PanDA for an LHC common user analysis framework

Latest report: https://indico.cern.ch/getFile.py/access?contribId=7&sessionId=19&resId=1&materialId=slides&confId=169697

Pages 27-32: no slide text captured in this transcript.

Page 33: Future of Distributed Production in US Facilities

Conclusion

Many updates/improvements planned 2013-2014

Some applications will be completely re-written
- But based on past 5 years of LHC experience

Plans and teams are in place

Will lead to better software running at facilities

Waiting for current LHC run to end

Stay tuned for more