A Component Framework for Distributed Data Analysis in HEP

Jakub T. Moscicki
CERN IT/API, [email protected]
ACAT2002, June, Moscow
CERN IT/API R&D Project

goals:
- study requirements of semi-interactive parallel analysis in HEP
- middleware technology evaluation & choice: CORBA, MPI, Condor, LSF...
- also see how to integrate API products with the Grid
- prototyping (focus on ntuple analysis)

young project:
- June 2001: start (0.5 FTE)
- June 2002: running prototype exists (1.5 FTE)
  - sample ntuple analysis with Anaphe
  - run-level parallel Geant4 simulation (soon)
How does it fit with the Grid?

Grid-enabled framework for HEP applications:
- this framework will be a Grid component ...via a gateway that understands Grid/JDL
- the framework uses lower-level Grid components: authentication, security, load balancing

distribution aspects:
- parallel cluster computation: "institute" or "workgroup" level (Tier 1-3), local computing center
- remote analysis: geographically unlimited
Distributed Analysis: Motivation

why do we want distributed data analysis?
- move processing close to data:
  - an ntuple job description is ~ kB; the data itself is ~ MB, GB, TB ...
  - rather than downloading gigabytes of data, let the remote server do the job
- do it in parallel - faster:
  - clusters of cheap PCs
Topology of an I/O intensive app.

- ntuple analysis is mostly I/O intensive rather than CPU intensive
- fast DB access from the cluster, slow network from the user to the cluster
- very small amount of data exchanged between the tasks, in comparison to the "input" data
Parallel ntuple analysis

- data driven
- all workers perform the same task (similar to SPMD)
- synchronization quite simple (independent workers)
- master/worker model
HEP public/workgroup clusters

features:
- many users, many jobs
- diverse applications: ntuple analysis, simulation, ...
- interactive ... semi-interactive ... batch
- ~100s of machines
- dynamic environment: users may submit their own analysis code
- mixed CPU and I/O intensive workloads
- some applications may be preconfigured: general analysis (e.g. ntuple projections) or experiment-specific apps
- load balancing important

thanks to the Anaphe team
Example of ntuple projection

- example of semi-interactive analysis
- data: 30 MB HBOOK ntuple / 37K rows / 160 columns
- time: minutes .. hours

timings:
- desktop (400 MHz, 128 MB RAM) - ca. 4 minutes
- standalone lxplus (800 MHz, SMP, 512 MB RAM) - ca. 45 sec
- 6 lxplus workers - ca. 18 sec

why do 6 workers give 18 sec rather than 45/6 = 7.5 sec?
- the job is small, so a big fraction of the time is compilation and DLL loading rather than computation (see the estimate below)
- pre-installing the application would improve the speed
- caveat: example running on AFS and on public machines
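A rough back-of-the-envelope check of this explanation, assuming a fixed per-job overhead t0 (compilation, DLL loading) plus perfectly parallelizable work T: t0 + T = 45 sec standalone and t0 + T/6 = 18 sec with 6 workers give T ≈ 32 sec and t0 ≈ 13 sec, so the fixed overhead alone accounts for most of the 18-sec run.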
Medicine applications

example: brachytherapy
- optimization of the treatment planning by MC simulation

features:
- CPU intensive
- few users, few jobs
- one preconfigured application
- interactive: seconds .. minutes
- ~10s of machines

- ongoing joint collaboration with Geant4 and hospital units in Torino, Italy
- to be deployed soon

thanks to M.G. Pia
Space science applications

example: LISA
- MC simulation for a gravitational waves experiment

features:
- CPU intensive
- big jobs (10 processor-years)
- preconfigured applications
- batch: days
- 1000+ machines

requirements:
- error recovery important
- monitoring and diagnostics

thanks to A. Howard
Master/Worker model
- applications share the same computation model
- so they also share a big part of the framework code
- but they have different non-functional requirements
[diagram: master/worker data flow between Client, WorkPlanner, Workers and Integrator; JobData and JobResult are exchanged with the Client, InputData and OutputData flow through the Workers]
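To make the division of roles concrete, here is a minimal sketch of how the components in the diagram could look as abstract interfaces; the class names follow the diagram, but every method signature is an assumption, not the framework's actual API:

    // Hypothetical interfaces for the diagram's roles; signatures are assumptions.
    #include <string>
    #include <vector>

    struct JobData    { std::string xml; };     // job description from the Client
    struct JobResult  { std::string xml; };     // final result for the Client
    struct InputData  { std::string chunk; };   // one partition of the input data
    struct OutputData { std::string partial; }; // partial result of one task

    class Worker {
    public:
        // every worker runs the same task on its own chunk (SPMD-like)
        virtual OutputData process(const InputData& in) = 0;
        virtual ~Worker() {}
    };

    class Integrator {
    public:
        // merge partial results, e.g. sum partial histograms
        virtual void add(const OutputData& out) = 0;
        virtual JobResult result() const = 0;
        virtual ~Integrator() {}
    };

    class WorkPlanner {
    public:
        // split the job into independent tasks to dispatch to workers
        virtual std::vector<InputData> partition(const JobData& job) = 0;
        virtual ~WorkPlanner() {}
    };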
Architecture principles

- framework core 100% application independent: e.g. Anaphe/Lizard ntuple analysis is just one application
- thin client approach:
  - just create a well-formed job description in XML
  - send it via CORBA and read the results back in XML
  - so the client may be a standalone application in C++ or Python, or integrated into an analysis framework (e.g. Lizard)
- dynamic application repository (see the sketch below):
  - plugin repository in XML
  - dynamic loading on the server side
  - plus meta-tools (admin)
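A minimal sketch of what dynamic loading on the server side could look like using plain Unix facilities; the createWorker entry point and the Worker type are assumptions carried over from the master/worker sketch, not the framework's actual plugin API:

    // Hypothetical server-side plugin loading via dlopen/dlsym.
    #include <dlfcn.h>
    #include <stdexcept>
    #include <string>

    class Worker; // abstract worker interface (see the master/worker sketch)

    // each application plugin is assumed to export a factory function
    typedef Worker* (*WorkerFactory)();

    Worker* loadApplication(const std::string& libraryPath)
    {
        // the library path would come from the XML plugin repository
        void* handle = dlopen(libraryPath.c_str(), RTLD_NOW);
        if (!handle)
            throw std::runtime_error(dlerror());

        // resolve the factory symbol exported by the plugin
        WorkerFactory create =
            reinterpret_cast<WorkerFactory>(dlsym(handle, "createWorker"));
        if (!create)
            throw std::runtime_error("no createWorker entry point in " + libraryPath);

        return create(); // application-specific Worker instance
    }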
Architecture principles (2)

- component design of the core framework:
  - find the common parts for all use cases
  - plug in use-case specific components
  - do not over-generalize
- AIDA-based analysis:
  - applications using Lizard/Anaphe, but any AIDA-compliant tool could be used (JAS, OpenScientist)
  - see the ACAT talks by V. Serbo ("AIDA") and M. Sang ("Anaphe")
  - integrated into a Python environment
Deployment of Distributed Components

layering:
- abstract middleware
- dynamic application loading
- plugin components
Using CORBA and XML

inter-operability (shown in the prototype ntuple application):
- cross-release (many thanks, XML!): client running Lizard/Anaphe 3.6.6, server running 4.0.0-pre1
- cross-language (many thanks, CORBA!): Python CORBA client (~30 lines), C++ CORBA server

compact XML data messages:
- 500 bytes of XML to the server, 22k bytes of XML back from the server
- a factor of ~10^3 less than the original data (30 MB ntuple)

thin client:
- no need to run Lizard on the client side, as an alternative use-case scenario (see the sketch below)
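As an illustration of how small such a thin client can be, here is a hypothetical C++ counterpart of the ~30-line Python client; the JobServer interface, its submit() operation and the XML tags are all invented for this sketch, not taken from the prototype's IDL:

    // Hypothetical thin client: build an XML job description, send it
    // over CORBA, read the XML result back. No Lizard needed client-side.
    #include <iostream>
    // #include "JobServer.hh"  // stubs generated from the (assumed) IDL

    int main(int argc, char** argv)
    {
        CORBA::ORB_var orb = CORBA::ORB_init(argc, argv);

        // a well-formed XML job description, ~500 bytes in the prototype;
        // the tags below are invented for the example
        const char* jobXml =
            "<job application='NtupleProjection'>"
            " <ntuple file='data.hbook' id='10'/>"
            " <histogram nbins='100' xmin='0' xmax='50' column='px'/>"
            "</job>";

        CORBA::Object_var obj = orb->string_to_object(argv[1]); // server IOR
        JobServer_var server = JobServer::_narrow(obj);

        CORBA::String_var resultXml = server->submit(jobXml);   // blocking call
        std::cout << resultXml.in() << std::endl;               // XML histogram
        return 0;
    }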
Facade for end-user analysis

3 groups of user roles:
- developers of distributed analysis applications: brand new applications, e.g. simulation
- advanced users with custom ntuple analysis code: similar to the Lizard Analyzer; execute a custom algorithm in the parallel ntuple scan
- interactive users doing the standard projections: just specify the histogram and the ntuple to project

user-friendly means:
- show only the relevant details
- hide the complexity of the underlying system
Choices for back-end s/w

- for LHC not yet certain (outcome of LCG)

Batch Job System (e.g. LSF):
- limited control -> submit jobs (black box)
- job queues with CPU limits
- automatic load balancing, scheduling (task creation and dispatch)
- prototype: deployed (~10s of workers)

Dedicated Interactive Cluster:
- custom daemons
- more control -> explicit creation of tasks
- load-balancing callbacks into the specific application
- prototype: custom PULL load balancing (~10s of workers), sketched below
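A minimal sketch of the PULL idea, reusing the hypothetical Worker/Integrator/InputData types from the master/worker sketch earlier; TaskQueue and nextTask() are likewise assumptions:

    // PULL load balancing: idle workers ask for work, so fast workers
    // naturally take on more tasks and the load balances itself.
    class TaskQueue {
    public:
        // hand out the next task; returns false when the job is finished
        virtual bool nextTask(InputData& task) = 0;
        virtual ~TaskQueue() {}
    };

    void workerLoop(Worker& worker, TaskQueue& queue, Integrator& integrator)
    {
        InputData task;
        while (queue.nextTask(task))              // block until work is handed out
            integrator.add(worker.process(task)); // run it, hand off the partial result
    }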
Dedicated Interactive Cluster (1)

- daemons per node
- dynamic process allocation
Dedicated Interactive Cluster (2)

- daemons per user per node
- thread pools, per-user policies
Towards a flexible architecture

- CORBA Component Model (CCM): pluggable components & services
- make a truly component-based system at the core architecture level
- a common interface to the service components is difficult due to the different nature of the service implementations
- example: load-balancing service (see the sketch below):
  - Condor - process migration
  - LSF - black-box load balancing
  - custom PULL implementation - active load balancing
- but the first results are very encouraging
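To illustrate why a single interface is awkward here, a tiny hypothetical sketch; all names are invented:

    // The common part of all three back-ends is little more than "submit":
    class LoadBalancingService {
    public:
        virtual void submit(const JobData& job) = 0; // every back-end accepts work
        virtual ~LoadBalancingService() {}
    };
    // LSF:         submit() feeds a batch queue; scheduling is a black box.
    // Condor:      the service may additionally migrate running processes.
    // custom PULL: the service actively hands tasks to idle workers, so it
    //              needs callbacks into the application (see the worker loop above).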
Error recovery service

the mechanisms:
- daemon control layer:
  - makes sure that the core framework processes are alive
  - periodical ping - needs to be organized hierarchically to be scalable
- worker sandbox (see the sketch below):
  - protects from seg-faults in the user applications: memory corruption, exceptions, signals
  - based on standard Unix mechanisms: child processes and signals
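A minimal sketch of the sandbox idea using only the standard Unix mechanisms named above; runTask() is a hypothetical stand-in for the user's analysis code:

    // Run an untrusted task in a child process so that a crash (e.g. a
    // seg-fault from memory corruption) cannot kill the framework daemon.
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include <cstdlib>

    extern void runTask(); // user application code; may crash

    bool runSandboxed()
    {
        pid_t child = fork();
        if (child == 0) {            // child: execute the user code
            runTask();
            _exit(EXIT_SUCCESS);
        }

        int status = 0;
        waitpid(child, &status, 0);  // parent: wait for the outcome

        if (WIFSIGNALED(status))     // killed by a signal, e.g. SIGSEGV
            return false;            // report failure; the daemon survives
        return WIFEXITED(status) && WEXITSTATUS(status) == EXIT_SUCCESS;
    }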
thanks to G. Chwajol
Other services

Interactive data analysis:
- connection-oriented vs connectionless
- monitoring and fault recovery

User environment replication:
- do not rely on a common filesystem (e.g. AFS)
- distribution of application code: binary exchange is possible for homogeneous clusters
- distribution of local setup data: configuration files, etc.
- binary dependencies (shared libraries etc.)
Optimization

Optimizing distributed I/O:
- access to data: clustering of the data in the DB on a per-task basis
- depends on the experiment-specific I/O solution

Load balancing:
- the framework does not directly address low-level issues
- ...but the design must be LB-aware:
  - partition the initial data set and assign data chunks to tasks (see the sketch below)
  - how big should the chunks be? static or adaptive algorithm?
  - push vs pull model for dispatching tasks, etc.
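For illustration, a static partitioning of an ntuple's rows into fixed-size chunks; all names are invented, and the chunk size is exactly the trade-off raised above:

    // Static partitioning: smaller chunks balance load better but cost
    // more dispatch round trips; an adaptive algorithm would vary the size.
    #include <vector>

    struct RowRange { long first, last; }; // rows [first, last) of the ntuple

    std::vector<RowRange> partitionRows(long nRows, long chunkSize)
    {
        std::vector<RowRange> chunks;
        for (long r = 0; r < nRows; r += chunkSize) {
            RowRange c;
            c.first = r;
            c.last  = (r + chunkSize < nRows) ? r + chunkSize : nRows;
            chunks.push_back(c);
        }
        return chunks; // e.g. 37K rows with chunkSize 5000 -> 8 chunks
    }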
Long term evolution

- full production in 2007 (LHC startup)
- software evolution and policy:
  - distributed technology (CORBA, RMI, DCOM, sockets, ...)
  - persistency technology (LCG RTAGs -> ODBMS, RDBMS, RIO)
  - programming/scripting languages (C++, Java, Python, ...)
- hardware evolution
- what will come out of the Grid? Globus, LCG, DataGrid, CrossGrid (interactive apps), ...
Limitations

- model limited to Master/Worker
- more complex synchronization patterns: some particularly CPU-intensive applications require fine-grained synchronization between workers - this is NOT provided by the framework and must be achieved by other means (e.g. MPI)
- intra-cluster scope: NOT a global metacomputer
- a Grid-enabled gateway is needed to enter the Grid universe; otherwise the framework is Grid-independent, thanks to Abstract Interfaces
Similar projects in HEP

- PIAF (history) - using PAW
- TOP-C - G4 examples for parallelism at the event level
- BlueOx - Java, using JAS for analysis; some space for commonality via AIDA
- PROOF - based on ROOT
Summary

- first prototype ready and working: proof of concept for up to 50 workers; ~1000 workers still needs to be checked
- deployment coming soon: integration with the Lizard analysis tool, medical apps
- active R&D in component architecture
- relation to LCG (?)
That's about it

- cern.ch/moscicki/work
- cern.ch/anaphe
- aida.freehep.org
Data Exchange Protocol API
/* NTupleProtocol.h */

class HistogramParams : public DXP::DataObject {
public:
    HistogramParams(DXP::DataObject* parent)
        : DXP::DataObject(parent), nbins(this), xmin(this), xmax(this) {}

    DXP::Long   nbins;  // number of histogram bins
    DXP::Double xmin;   // lower edge of the histogram range
    DXP::Double xmax;   // upper edge of the histogram range
};

class JobResult : public DXP::DataObject {
public:
    JobResult(DXP::DataObject* parent)
        : DXP::DataObject(parent), histoXML(this), jobData(this) {}

    DXP::String histoXML; // resulting histogram serialized as XML
    JobData     jobData;  // job description (JobData is defined elsewhere in the header)
};
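Presumably each DXP field registers itself with its parent DXP::DataObject via the `this` pointer passed in the initializer list, which is what lets a whole object tree be serialized into the compact XML messages shown earlier. A hypothetical use on the server side (toXML() is an assumed DXP method, not confirmed by the slide):

    JobResult result(0);                 // top-level object, no parent
    result.histoXML = histogramAsXML;    // fill in the computed histogram
    std::string reply = result.toXML();  // serialize the tree for the CORBA reply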