A Component Framework for Distributed Data Analysis in HEP

Jakub T. Moscicki
CERN IT/API, [email protected]
ACAT2002, June, Moscow
CERN IT/API R&D Project

goals:
- study requirements of semi-interactive parallel analysis in HEP
- middleware technology evaluation & choice: CORBA, MPI, Condor, LSF...
- also see how to integrate API products with the Grid
- prototyping (focus on ntuple analysis)

young project:
- June 2001: start (0.5 FTE)
- June 2002: running prototype exists (1.5 FTE)
  - sample ntuple analysis with Anaphe
  - run-level parallel Geant4 simulation (soon)
How does it fit with the Grid?

Grid-enabled framework for HEP applications:
- this framework will be a Grid component ...via a gateway that understands Grid/JDL
- the framework uses lower-level Grid components: authentication, security, load balancing

distribution aspects:
- parallel cluster computation: "institute" or "workgroup" level (Tier 1-3), local computing center
- remote analysis: geographically unlimited
Distributed Analysis: Motivation

why do we want distributed data analysis?
- move processing close to data:
  - an ntuple job description is ~ kB; the data itself is ~ MB, GB, TB ...
  - rather than downloading gigabytes of data, let the remote server do the job
- do it in parallel - faster:
  - clusters of cheap PCs
Topology of an I/O intensive app.

- ntuple analysis is mostly I/O intensive rather than CPU intensive
- fast DB access from the cluster, slow network from the user to the cluster
- very small amount of data exchanged between the tasks, in comparison to the "input" data
Parallel ntuple analysis

- data driven
- all workers perform the same task (similar to SPMD)
- synchronization quite simple (independent workers)
- master/worker model
HEP public/workgroup clusters

features:
- many users, many jobs
- diverse applications: ntuple analysis, simulation, ...
- interactive ... semi-interactive ... batch
- ~100s of machines
- dynamic environment: users may submit their own analysis code
- mixed CPU and I/O intensive workloads
- some applications may be preconfigured: general analysis (e.g. ntuple projections) or experiment-specific apps
- load balancing important

thanks to the Anaphe team
Example of ntuple projection

- example of semi-interactive analysis
- data: 30 MB HBOOK ntuple / 37K rows / 160 columns
- time: minutes .. hours

timings:
- desktop (400 MHz, 128 MB RAM) - ca. 4 minutes
- standalone lxplus (800 MHz, SMP, 512 MB RAM) - ca. 45 sec
- 6 lxplus workers - ca. 18 sec

why do 6 workers give 18 sec rather than 45/6 = 7.5 sec?
- the job is small, so a big fraction of the time is compilation and DLL loading rather than computation (see the estimate below)
- pre-installing the application would improve the speed
- caveat: example running on AFS and on public machines
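A rough back-of-the-envelope check of this explanation, assuming a fixed per-job overhead t0 (compilation, DLL loading) plus perfectly parallelizable work T: t0 + T = 45 sec standalone and t0 + T/6 = 18 sec with 6 workers give T ≈ 32 sec and t0 ≈ 13 sec, so the fixed overhead alone accounts for most of the 18-sec run.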
Medicine applications

example: brachytherapy
- optimization of the treatment planning by MC simulation

features:
- CPU intensive
- few users, few jobs
- one preconfigured application
- interactive: seconds .. minutes
- ~10s of machines

- ongoing joint collaboration with Geant4 and hospital units in Torino, Italy
- to be deployed soon

thanks to M.G. Pia
Space science applications

example: LISA
- MC simulation for a gravitational waves experiment

features:
- CPU intensive
- big jobs (10 processor-years)
- preconfigured applications
- batch: days
- 1000+ machines

requirements:
- error recovery important
- monitoring and diagnostics

thanks to A. Howard
Master/Worker model
- applications share the same computation model
- so they also share a big part of the framework code
- but they have different non-functional requirements
[diagram: master/worker data flow between Client, WorkPlanner, Workers and Integrator; JobData and JobResult are exchanged with the Client, InputData and OutputData flow through the Workers]
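To make the division of roles concrete, here is a minimal sketch of how the components in the diagram could look as abstract interfaces; the class names follow the diagram, but every method signature is an assumption, not the framework's actual API:

    // Hypothetical interfaces for the diagram's roles; signatures are assumptions.
    #include <string>
    #include <vector>

    struct JobData    { std::string xml; };     // job description from the Client
    struct JobResult  { std::string xml; };     // final result for the Client
    struct InputData  { std::string chunk; };   // one partition of the input data
    struct OutputData { std::string partial; }; // partial result of one task

    class Worker {
    public:
        // every worker runs the same task on its own chunk (SPMD-like)
        virtual OutputData process(const InputData& in) = 0;
        virtual ~Worker() {}
    };

    class Integrator {
    public:
        // merge partial results, e.g. sum partial histograms
        virtual void add(const OutputData& out) = 0;
        virtual JobResult result() const = 0;
        virtual ~Integrator() {}
    };

    class WorkPlanner {
    public:
        // split the job into independent tasks to dispatch to workers
        virtual std::vector<InputData> partition(const JobData& job) = 0;
        virtual ~WorkPlanner() {}
    };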
Architecture principles

- framework core 100% application independent: e.g. Anaphe/Lizard ntuple analysis is just one application
- thin client approach:
  - just create a well-formed job description in XML
  - send it via CORBA and read the results back in XML
  - so the client may be a standalone application in C++ or Python, or integrated into an analysis framework (e.g. Lizard)
- dynamic application repository (see the sketch below):
  - plugin repository in XML
  - dynamic loading on the server side
  - plus meta-tools (admin)
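A minimal sketch of what dynamic loading on the server side could look like using plain Unix facilities; the createWorker entry point and the Worker type are assumptions carried over from the master/worker sketch, not the framework's actual plugin API:

    // Hypothetical server-side plugin loading via dlopen/dlsym.
    #include <dlfcn.h>
    #include <stdexcept>
    #include <string>

    class Worker; // abstract worker interface (see the master/worker sketch)

    // each application plugin is assumed to export a factory function
    typedef Worker* (*WorkerFactory)();

    Worker* loadApplication(const std::string& libraryPath)
    {
        // the library path would come from the XML plugin repository
        void* handle = dlopen(libraryPath.c_str(), RTLD_NOW);
        if (!handle)
            throw std::runtime_error(dlerror());

        // resolve the factory symbol exported by the plugin
        WorkerFactory create =
            reinterpret_cast<WorkerFactory>(dlsym(handle, "createWorker"));
        if (!create)
            throw std::runtime_error("no createWorker entry point in " + libraryPath);

        return create(); // application-specific Worker instance
    }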
Architecture principles (2)

- component design of the core framework:
  - find the common parts for all use cases
  - plug in use-case specific components
  - do not over-generalize
- AIDA-based analysis:
  - applications using Lizard/Anaphe, but any AIDA-compliant tool could be used (JAS, OpenScientist)
  - see the ACAT talks by V. Serbo ("AIDA") and M. Sang ("Anaphe")
  - integrated into a Python environment
Deployment of Distributed Components

layering:
- abstract middleware
- dynamic application loading
- plugin components
Using CORBA and XML

inter-operability (shown in the prototype ntuple application):
- cross-release (many thanks, XML!): client running Lizard/Anaphe 3.6.6, server running 4.0.0-pre1
- cross-language (many thanks, CORBA!): Python CORBA client (~30 lines), C++ CORBA server

compact XML data messages:
- 500 bytes of XML to the server, 22k bytes of XML back from the server
- a factor of ~10^3 less than the original data (30 MB ntuple)

thin client:
- no need to run Lizard on the client side, as an alternative use-case scenario (see the sketch below)
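As an illustration of how small such a thin client can be, here is a hypothetical C++ counterpart of the ~30-line Python client; the JobServer interface, its submit() operation and the XML tags are all invented for this sketch, not taken from the prototype's IDL:

    // Hypothetical thin client: build an XML job description, send it
    // over CORBA, read the XML result back. No Lizard needed client-side.
    #include <iostream>
    // #include "JobServer.hh"  // stubs generated from the (assumed) IDL

    int main(int argc, char** argv)
    {
        CORBA::ORB_var orb = CORBA::ORB_init(argc, argv);

        // a well-formed XML job description, ~500 bytes in the prototype;
        // the tags below are invented for the example
        const char* jobXml =
            "<job application='NtupleProjection'>"
            " <ntuple file='data.hbook' id='10'/>"
            " <histogram nbins='100' xmin='0' xmax='50' column='px'/>"
            "</job>";

        CORBA::Object_var obj = orb->string_to_object(argv[1]); // server IOR
        JobServer_var server = JobServer::_narrow(obj);

        CORBA::String_var resultXml = server->submit(jobXml);   // blocking call
        std::cout << resultXml.in() << std::endl;               // XML histogram
        return 0;
    }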
Facade for end-user analysis

3 groups of user roles:
- developers of distributed analysis applications: brand new applications, e.g. simulation
- advanced users with custom ntuple analysis code: similar to the Lizard Analyzer; execute a custom algorithm in the parallel ntuple scan
- interactive users doing the standard projections: just specify the histogram and the ntuple to project

user-friendly means:
- show only the relevant details
- hide the complexity of the underlying system
Choices for back-end s/w

- for LHC not yet certain (outcome of LCG)

Batch Job System (e.g. LSF):
- limited control -> submit jobs (black box)
- job queues with CPU limits
- automatic load balancing, scheduling (task creation and dispatch)
- prototype: deployed (~10s of workers)

Dedicated Interactive Cluster:
- custom daemons
- more control -> explicit creation of tasks
- load-balancing callbacks into the specific application
- prototype: custom PULL load balancing (~10s of workers), sketched below
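A minimal sketch of the PULL idea, reusing the hypothetical Worker/Integrator/InputData types from the master/worker sketch earlier; TaskQueue and nextTask() are likewise assumptions:

    // PULL load balancing: idle workers ask for work, so fast workers
    // naturally take on more tasks and the load balances itself.
    class TaskQueue {
    public:
        // hand out the next task; returns false when the job is finished
        virtual bool nextTask(InputData& task) = 0;
        virtual ~TaskQueue() {}
    };

    void workerLoop(Worker& worker, TaskQueue& queue, Integrator& integrator)
    {
        InputData task;
        while (queue.nextTask(task))              // block until work is handed out
            integrator.add(worker.process(task)); // run it, hand off the partial result
    }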
Dedicated Interactive Cluster (1)

- daemons per node
- dynamic process allocation
Dedicated Interactive Cluster (2)

- daemons per user per node
- thread pools, per-user policies
Towards a flexible architecture

- CORBA Component Model (CCM): pluggable components & services
- make a truly component-based system at the core architecture level
- a common interface to the service components is difficult due to the different nature of the service implementations
- example: load-balancing service (see the sketch below):
  - Condor - process migration
  - LSF - black-box load balancing
  - custom PULL implementation - active load balancing
- but the first results are very encouraging
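To illustrate why a single interface is awkward here, a tiny hypothetical sketch; all names are invented:

    // The common part of all three back-ends is little more than "submit":
    class LoadBalancingService {
    public:
        virtual void submit(const JobData& job) = 0; // every back-end accepts work
        virtual ~LoadBalancingService() {}
    };
    // LSF:         submit() feeds a batch queue; scheduling is a black box.
    // Condor:      the service may additionally migrate running processes.
    // custom PULL: the service actively hands tasks to idle workers, so it
    //              needs callbacks into the application (see the worker loop above).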
Error recovery service

the mechanisms:
- daemon control layer:
  - makes sure that the core framework processes are alive
  - periodical ping - needs to be organized hierarchically to be scalable
- worker sandbox (see the sketch below):
  - protects from seg-faults in the user applications: memory corruption, exceptions, signals
  - based on standard Unix mechanisms: child processes and signals
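A minimal sketch of the sandbox idea using only the standard Unix mechanisms named above; runTask() is a hypothetical stand-in for the user's analysis code:

    // Run an untrusted task in a child process so that a crash (e.g. a
    // seg-fault from memory corruption) cannot kill the framework daemon.
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include <cstdlib>

    extern void runTask(); // user application code; may crash

    bool runSandboxed()
    {
        pid_t child = fork();
        if (child == 0) {            // child: execute the user code
            runTask();
            _exit(EXIT_SUCCESS);
        }

        int status = 0;
        waitpid(child, &status, 0);  // parent: wait for the outcome

        if (WIFSIGNALED(status))     // killed by a signal, e.g. SIGSEGV
            return false;            // report failure; the daemon survives
        return WIFEXITED(status) && WEXITSTATUS(status) == EXIT_SUCCESS;
    }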
thanks to G. Chwajol
Other services

Interactive data analysis:
- connection-oriented vs connectionless
- monitoring and fault recovery

User environment replication:
- do not rely on a common filesystem (e.g. AFS)
- distribution of application code: binary exchange is possible for homogeneous clusters
- distribution of local setup data: configuration files, etc.
- binary dependencies (shared libraries etc.)
Optimization

Optimizing distributed I/O:
- access to data: clustering of the data in the DB on a per-task basis
- depends on the experiment-specific I/O solution

Load balancing:
- the framework does not directly address low-level issues
- ...but the design must be LB-aware:
  - partition the initial data set and assign data chunks to tasks (see the sketch below)
  - how big should the chunks be? static or adaptive algorithm?
  - push vs pull model for dispatching tasks, etc.
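For illustration, a static partitioning of an ntuple's rows into fixed-size chunks; all names are invented, and the chunk size is exactly the trade-off raised above:

    // Static partitioning: smaller chunks balance load better but cost
    // more dispatch round trips; an adaptive algorithm would vary the size.
    #include <vector>

    struct RowRange { long first, last; }; // rows [first, last) of the ntuple

    std::vector<RowRange> partitionRows(long nRows, long chunkSize)
    {
        std::vector<RowRange> chunks;
        for (long r = 0; r < nRows; r += chunkSize) {
            RowRange c;
            c.first = r;
            c.last  = (r + chunkSize < nRows) ? r + chunkSize : nRows;
            chunks.push_back(c);
        }
        return chunks; // e.g. 37K rows with chunkSize 5000 -> 8 chunks
    }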
Long term evolution

- full production in 2007 (LHC startup)
- software evolution and policy:
  - distributed technology (CORBA, RMI, DCOM, sockets, ...)
  - persistency technology (LCG RTAGs -> ODBMS, RDBMS, RIO)
  - programming/scripting languages (C++, Java, Python, ...)
- hardware evolution
- what will come out of the Grid? Globus, LCG, DataGrid, CrossGrid (interactive apps), ...
Limitations

- model limited to Master/Worker
- more complex synchronization patterns: some particularly CPU-intensive applications require fine-grained synchronization between workers - this is NOT provided by the framework and must be achieved by other means (e.g. MPI)
- intra-cluster scope: NOT a global metacomputer
- a Grid-enabled gateway is needed to enter the Grid universe; otherwise the framework is Grid-independent, thanks to Abstract Interfaces
Similar projects in HEP

- PIAF (history) - using PAW
- TOP-C - G4 examples for parallelism at the event level
- BlueOx - Java, using JAS for analysis; some space for commonality via AIDA
- PROOF - based on ROOT
Summary

- first prototype ready and working: proof of concept for up to 50 workers; ~1000 workers still needs to be checked
- deployment coming soon: integration with the Lizard analysis tool, medical apps
- active R&D in component architecture
- relation to LCG (?)
That's about it

- cern.ch/moscicki/work
- cern.ch/anaphe
- aida.freehep.org
Data Exchange Protocol API
/* NTupleProtocol.h */

class HistogramParams : public DXP::DataObject {
public:
    HistogramParams(DXP::DataObject* parent)
        : DXP::DataObject(parent), nbins(this), xmin(this), xmax(this) {}

    DXP::Long   nbins;  // number of histogram bins
    DXP::Double xmin;   // lower edge of the histogram range
    DXP::Double xmax;   // upper edge of the histogram range
};

class JobResult : public DXP::DataObject {
public:
    JobResult(DXP::DataObject* parent)
        : DXP::DataObject(parent), histoXML(this), jobData(this) {}

    DXP::String histoXML; // resulting histogram serialized as XML
    JobData     jobData;  // job description (JobData is defined elsewhere in the header)
};
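Presumably each DXP field registers itself with its parent DXP::DataObject via the `this` pointer passed in the initializer list, which is what lets a whole object tree be serialized into the compact XML messages shown earlier. A hypothetical use on the server side (toXML() is an assumed DXP method, not confirmed by the slide):

    JobResult result(0);                 // top-level object, no parent
    result.histoXML = histogramAsXML;    // fill in the computed histogram
    std::string reply = result.toXML();  // serialize the tree for the CORBA reply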