Distributed Analysis Tools: Panda & Ganga
Tadashi Maeno (Brookhaven National Laboratory)

Distributed Analysis on Grid
Local analysis (script, exe): limited resources
– 1~4 CPUs
– ~1 TB of disk
Grid: a framework to utilize large-scale and geographically distributed resources

Distributed Analysis on Grid
Run own analysis on distributed resources
– Parallelization for fast turnaround
  • 1 CPU × 800 hours → 800 CPUs × 1 hour
– Unavoidably distributed data
  • 10 T1 computing centers for ATLAS, but no T1 can host all data
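
The turnaround argument above is simple arithmetic. A minimal sketch, assuming the work divides evenly across CPUs and ignoring scheduling and data-transfer overheads (the function name is illustrative and not part of Panda or Ganga):

```python
def ideal_wall_hours(total_cpu_hours: float, n_cpus: int) -> float:
    """Wall-clock time if the work divides evenly across n_cpus."""
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    return total_cpu_hours / n_cpus


# The example from the slide: 800 CPU-hours of analysis.
print(ideal_wall_hours(800, 1))    # one local CPU
print(ideal_wall_hours(800, 800))  # 800 grid CPUs
```

In practice the gain is smaller than this ideal figure, since queueing and input staging add time to every sub-job.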

Traditional Procedure of DA
[Diagram: a local analysis job goes through the following elements of the Grid]
– File upload
– Authentication
– Brokerage: site selection
– Gate Keeper = Computing Element
– CPUs = Worker Nodes
– Job execution
– Storage: input/output data
What differs from one site to another
– Grid flavor: the procedure is grid-middleware dependent
– Local batch system: Condor, LSF, PBS, …
– Storage: local storage for EGEE/OSG (dcap, dpm, xrootd, castor), remote storage for NDGF

Three middleware
Grid = EGEE backend, OSG backend, NDGF backend
– e.g., upload a file using a different protocol
Not entirely true! Just for intuitive understanding

Simple Implementation of Common User I/F for Various Backends
    # egeeModule, osgModule, ndgfModule: one plug-in module per backend
    def upload(file, backendType):
        if backendType == 'EGEE':
            egeeModule.upload(file)
        elif backendType == 'OSG':
            osgModule.upload(file)
        elif backendType == 'NDGF':
            ndgfModule.upload(file)
        else:
            raise ValueError('unknown backend: %s' % backendType)
Prepare a plug-in module for each backend
– Implementation of GANGA to support multiple backends
– Easily extended for other backends

Simple Implementation of Common User I/F for Various Backends
Ultimately users have to understand each backend
– e.g., on a connection failure: each backend uses a different port → check each port
3 backends → expertise and support/development work are 3 times more
– Limited manpower
Capability for easy extension is useful in the R&D phase but is redundant in the production phase

Common I/F using pilot (PANDA System)
[Diagram: the end-user submits an analysis job to the Panda server over https. Pilot factories send pilots to EGEE and OSG, and aCT sends pilots to NDGF via arc. Pilots pull jobs from the Panda server over https.]
– Interaction with backends is done centrally
– Users access a common server using a single protocol

Operation/Service Model of PANDA
End-users are insulated from the GRID
– Communicate with the Panda (HTTP) server
– Lower threshold, especially for physicists
Pilot factory sends pilots using GRID middleware
– Only the operator of the scheduler needs to have enough expertise on GRID
Production and Analysis run on the same infrastructure
– Production should suffer from the same problems as analysis
– Once the production team (one shift crew) fixes a problem for official production, analysis gets cured automatically → no additional manpower is needed for analysis

Panda
PANDA = Production ANd Distributed Analysis system
– Designed for analysis as well as production
– Project started Aug 2005, prototype Sep 2005, production Dec 2005
– The backend for all ATLAS production jobs
– The primary backend for all ATLAS analysis jobs
A single task queue and pilots
– Apache-based Central Server
– Pilots retrieve jobs from the server as soon as CPU is available → low latency
Highly automated, has an integrated monitoring system, and requires low operation manpower
Integrated with ATLAS Distributed Data Management (DDM) system
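
The "single task queue and pilots" model can be sketched in a few lines. This is an in-memory toy under stated assumptions: the class and function names are hypothetical, jobs are plain Python callables, and the https communication between pilot and server is replaced by a direct method call.

```python
from collections import deque
from typing import Callable, Deque, List, Optional


class TaskQueue:
    """Stand-in for the central server's single task queue."""

    def __init__(self) -> None:
        self._jobs: Deque[Callable[[], str]] = deque()

    def submit(self, job: Callable[[], str]) -> None:
        # Analysis and production jobs go into the same queue.
        self._jobs.append(job)

    def get_job(self) -> Optional[Callable[[], str]]:
        # Called by a pilot; returns None when nothing is waiting.
        return self._jobs.popleft() if self._jobs else None


def run_pilot(queue: TaskQueue, site: str) -> Optional[str]:
    """A pilot already holds a CPU, so it pulls a job and runs it at once."""
    job = queue.get_job()
    if job is None:
        return None  # no work available: the pilot simply exits
    return f"{site}: {job()}"


queue = TaskQueue()
queue.submit(lambda: "analysis job done")
queue.submit(lambda: "production job done")

results: List[Optional[str]] = [
    run_pilot(queue, site) for site in ("EGEE", "OSG", "NDGF")
]
print(results)
```

Because the job is bound to a CPU only when a pilot asks for it, the job never waits in a site's batch queue, which is where the low latency comes from.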

Panda System Overview
[Diagram: end-users submit analysis jobs to the Panda server over https; production jobs come from ProdDB through bamboo. autopyfactory submits pilots via condor-g to worker nodes on EGEE and OSG, and aCT submits pilots via arc to NDGF. Pilots pull jobs from the Panda server over https. Other components shown: logger, LFC, DQ2.]

Ownership Issue
GK maps each job to an individual UNIX ID
– In the traditional model, each user sends jobs to the GK → possible to know who runs a process
– In pilot models, the pilot factory sends pilots to the GK → impossible to distinguish processes using UID. Note that each role is mapped to a different UID, and thus it is possible to distinguish role-ed users from end-users
Separation between physical/logical layers is popular
– Virtualization (e.g., cloud, LVM, …)
– But conflicts with a “policy”
WLCG is going to introduce glexec, which changes the UID on the WN
– Each site admin will be able to see who runs a process without peeking into the logical layer
– File ownership is unrelated to UID since SRM itself sets the owner using the proxy
– Only proxy delegation is required (glexec requires proxy delegation)

panda-client
Tools to submit or manage analysis jobs on Panda
Five tools
– pathena
  • Athena jobs
– prun
  • General jobs (ROOT, python, sh, exe, …)
– pbook
  • Bookkeeping
– psequencer
  • Analysis chain (e.g., submit job + download output)
– puserinfo
  • Access control

pathena (1/2)
To submit Athena jobs to Panda
A simple command-line tool, but contains advanced capabilities for more complex needs
Provides a consistent interface to users who are familiar with Athena
    $ athena jobO.py
    $ pathena jobO.py --inDS inputDatasetName --outDS outputDatasetName

pathena (2/2)
What pathena does
1. Extract job configuration by running Athena with a fake application manager
2. Collect source/jobO files in the local working area
3. Assign the job to a site where
   – the Athena version is available
   – the input dataset is available
   – CPUs are free
4. Prepare one buildJob to compile source files, and one or many runAthena jobs to run Athena
5. Send them to Panda
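
Step 3 above is a filter over candidate sites. A minimal sketch of that filter, assuming a hypothetical `Site` record and made-up site names, version strings and dataset names; picking the eligible site with the most free CPUs is an assumption of this example, not a statement about how Panda actually ranks sites.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Site:
    # All fields are hypothetical; real brokerage uses richer information.
    name: str
    athena_versions: Set[str] = field(default_factory=set)
    datasets: Set[str] = field(default_factory=set)
    free_cpus: int = 0


def choose_site(sites: List[Site], athena_version: str,
                input_dataset: str) -> Optional[Site]:
    """Return the eligible site with the most free CPUs, or None."""
    eligible = [
        s for s in sites
        if athena_version in s.athena_versions  # Athena version is available
        and input_dataset in s.datasets         # input dataset is available
        and s.free_cpus > 0                     # CPUs are free
    ]
    return max(eligible, key=lambda s: s.free_cpus, default=None)


sites = [
    Site("SITE_A", {"15.6.9"}, {"data.AOD"}, free_cpus=40),
    Site("SITE_B", {"15.6.9"}, {"other.AOD"}, free_cpus=500),
    Site("SITE_C", {"15.6.9", "16.0.2"}, {"data.AOD"}, free_cpus=120),
]
best = choose_site(sites, "15.6.9", "data.AOD")
print(best.name if best else "no eligible site")
```

SITE_B has the most free CPUs but does not hold the input dataset, so it is filtered out before ranking.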

What happens when job is submitted (1/2)
[Diagram: pathena submits the local sources. A single job = buildJob × 1 + runAthena × N. On the remote site, buildJob compiles the sources into binaries and triggers the runAthena jobs, which use those binaries. The input dataset is automatically split into inputs for the runAthena jobs, and their outputs are collected in the output dataset in storage, from which they are downloaded with dq2.]
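
The automatic split of an input dataset into N runAthena sub-jobs can be pictured as chunking a file list. A minimal sketch, assuming a fixed number of files per sub-job; the function name and file names are invented for illustration, and the real splitting logic may use other criteria.

```python
from typing import List


def split_inputs(files: List[str], files_per_job: int) -> List[List[str]]:
    """Group input files into consecutive chunks, one chunk per sub-job."""
    if files_per_job < 1:
        raise ValueError("files_per_job must be at least 1")
    return [files[i:i + files_per_job]
            for i in range(0, len(files), files_per_job)]


# Seven made-up input files, three files per sub-job.
input_dataset = [f"AOD._{n:05d}.pool.root" for n in range(1, 8)]
sub_jobs = split_inputs(input_dataset, files_per_job=3)
print(len(sub_jobs))  # N, the number of runAthena sub-jobs
```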

What happens when job is submitted (2/2)
Why is buildJob required?
– Platform (OS, CPU architecture) may be different between local and remote
  • SL5/64-bit binaries cannot run on SL4/32-bit
– Athena creates some absolute links in InstallArea, i.e., generally not relocatable
– The total time of (buildJob + N × runAthena) is shorter than N × (buildJob + runAthena)
  • Use CPUs more efficiently
– buildJob can be skipped using an option if you know the step is not required
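
The timing claim above follows from counting how often the build step runs: sharing one buildJob saves (N − 1) builds. A small worked example with illustrative numbers (the durations are invented, and the function names are not part of panda-client):

```python
def shared_build_cpu_time(build: float, run: float, n: int) -> float:
    """CPU time when one buildJob serves all N runAthena jobs."""
    return build + n * run


def per_job_build_cpu_time(build: float, run: float, n: int) -> float:
    """CPU time when every one of the N jobs compiles for itself."""
    return n * (build + run)


# Illustrative numbers only: 10-minute build, 60-minute run, 100 sub-jobs.
build, run, n = 10.0, 60.0, 100
shared = shared_build_cpu_time(build, run, n)
repeated = per_job_build_cpu_time(build, run, n)
print(shared, repeated, repeated - shared)
```

The difference equals (N − 1) × build, so the saving grows with the number of sub-jobs.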

prun
To submit general jobs to Panda
– ROOT (ARA), Python, shell script, exe, …
Two-staged Analysis Model of ATLAS
– Run Athena on AOD/ESD to produce DPD → pathena
– Run ROOT or something on DPD to produce final plots → prun
In principle you can do anything, but please avoid careless network operations unless you know the scalability of those operations well
– svn co, wget, lcg-cp, …
– A single job is split into many sub-jobs running in parallel, which can easily break remote servers

pbook
Bookkeeping of Panda jobs
– Browsing
– Kill
– Retry
Makes a local sqlite3 repository to keep personal job information
– IMAP-like sync-diff mechanism
– Not scanning the global Panda repository → quick response
Dual user interface
– Command-line
– Graphical
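
The sync-diff idea can be sketched with Python's standard sqlite3 module: the local repository remembers how far it has synchronized and asks only for what changed since then. This is not pbook's actual schema or protocol; the table layout, the `modified` change counter and the `fetch_changed_since` call are all assumptions made for the example, and the remote repository is faked with a dictionary.

```python
import sqlite3
from typing import Dict, List, Tuple

# Stand-in for the remote Panda repository: job_id -> (status, modified).
# "modified" is a hypothetical, monotonically increasing change counter.
REMOTE: Dict[int, Tuple[str, int]] = {
    101: ("finished", 5),
    102: ("running", 7),
    103: ("failed", 9),
}


def fetch_changed_since(stamp: int) -> List[Tuple[int, str, int]]:
    """Pretend server call: return only jobs modified after stamp."""
    return [(jid, st, mod) for jid, (st, mod) in REMOTE.items() if mod > stamp]


def sync(conn: sqlite3.Connection) -> int:
    """Pull the diff since the last sync into the local repository."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs "
        "(job_id INTEGER PRIMARY KEY, status TEXT, modified INTEGER)"
    )
    (last,) = conn.execute(
        "SELECT COALESCE(MAX(modified), 0) FROM jobs"
    ).fetchone()
    changed = fetch_changed_since(last)
    conn.executemany(
        "INSERT OR REPLACE INTO jobs (job_id, status, modified) "
        "VALUES (?, ?, ?)",
        changed,
    )
    conn.commit()
    return len(changed)


conn = sqlite3.connect(":memory:")
first = sync(conn)              # empty local repository: everything is new
REMOTE[102] = ("finished", 12)  # one job changes on the server
second = sync(conn)             # only the changed job is transferred
print(first, second)
```

Browsing then reads from the local table only, which is why the response is quick.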

GangaPanda
Plug-in to access PANDA
    # egeeModule, osgModule, ndgfModule, pandaModule: one plug-in module per backend
    def upload(file, backendType):
        if backendType == 'EGEE':
            egeeModule.upload(file)
        elif backendType == 'OSG':
            osgModule.upload(file)
        elif backendType == 'NDGF':
            ndgfModule.upload(file)
        elif backendType == 'PANDA':
            pandaModule.upload(file)
        else:
            raise ValueError('unknown backend: %s' % backendType)
All ATLAS backends will be consolidated to PANDA
Other backends are still maintained for some reason

Links
– User [email protected]
– Bug report: Savannah
– Documentation: panda-client package, pathena, prun, pbook, ganga