December, 2003 ACAT'03 1
PROOF and Condor
Fons Rademakers
http://root.cern.ch
December, 2003 ACAT'03 2
PROOF – Parallel ROOT Facility
Collaboration between core ROOT group at CERN and MIT Heavy Ion Group
Part of and based on the ROOT framework
Makes heavy use of ROOT networking and other infrastructure classes
December, 2003 ACAT'03 3
Main Motivation
Design a system for the interactive analysis of very large sets of ROOT data files on a cluster of computers
The main idea is to speed up query processing by employing parallelism
In the Grid context, this model will be extended from a local cluster to a wide-area “virtual cluster”. The emphasis in that case is not so much on interactive response as on transparency
With a single query, a user can analyze a globally distributed data set and get back a “single” result
The main design goals are transparency, scalability and adaptability
December, 2003 ACAT'03 5
Parallel Chain Analysis

[Figure: a local PC runs a ROOT session and connects via TNetFile to a remote PROOF cluster: a master server (proof) drives slave servers (proof) on node1..node4, each reading its local *.root files through TFile; results (stdout/objects) are returned to the client.]

The cluster is described by a configuration file:

#proof.conf
slave node1
slave node2
slave node3
slave node4

The session in the figure progresses from local to parallel analysis:

$ root
root [0] tree.Process("ana.C")      // local tree analysis
root [1] gROOT->Proof("remote")     // connect to the PROOF cluster
root [2] chain.Process("ana.C")     // same macro ana.C, now run in parallel on the chain
December, 2003 ACAT'03 6
PROOF - Architecture
Data Access Strategies: local data first; also rootd, rfio, dCache, SAN/NAS
Transparency: input objects are copied from the client; output objects are merged and returned to the client
Scalability and Adaptability: vary the packet size (with the specific workload, slave performance and dynamic load); heterogeneous servers; support for multi-site configurations
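The transparency point can be made concrete with a client-side sketch: input objects are shipped to every slave's fInput list, and the merged fOutput objects come back to the client. The AddInput()/GetOutputList() calls and the gProof global are taken from later ROOT versions and are assumptions here, not necessarily the 2003 interface; the histogram name hPt is hypothetical.

// A client-side sketch of the transparency model; see assumptions above.
root[0] gROOT->Proof("cluster.cern.ch");
root[1] TDSet *set = new TDSet("TTree", "AOD");
root[2] set->AddQuery("lfn:/alice/simulation/2003-04", "V0.6*.root");
root[3] gProof->AddInput(new TNamed("cut", "pt > 1"));  // copied into every slave's fInput
root[4] set->Process("mySelector.C");
root[5] TList *out = gProof->GetOutputList();           // merged fOutput objects from the slaves
root[6] ((TH1F*) out->FindObject("hPt"))->Draw();       // e.g. draw a merged histogram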
December, 2003 ACAT'03 7
Workflow For Tree Analysis – Pull Architecture

[Figure: the client issues Process("ana.C"); the master forwards it to slaves 1..N. After initialization each slave repeatedly calls GetNextPacket() on the master's packet generator, processes the packet, and asks for the next one. Packets are (first event, number of events) pairs sized to each slave's speed, e.g. (0,100), (200,100), (340,100), (490,100) for slave 1 and (100,100), (300,40), (440,50), (590,60) for slave N. When the data set is exhausted, each slave returns its results with SendObject(histo); the master adds the histograms, displays them, and waits for the next command.]
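The pull loop in the figure can be sketched in plain C++. This is a standalone illustration, not the actual PROOF classes; the PacketGenerator name is made up here. The point is that faster slaves simply ask for packets more often, which balances the load automatically.

// Standalone sketch of the pull architecture (assumed logic, see above).
#include <cstdio>
#include <utility>

class PacketGenerator {               // master-side packet generator
   long fNext;                        // first unassigned event
   long fTotal;                       // total events in the data set
   long fPacketSize;                  // nominal packet size (could vary per slave)
public:
   PacketGenerator(long total, long size)
      : fNext(0), fTotal(total), fPacketSize(size) { }
   // Returns (first event, number of events); 0 events signals "no more work".
   std::pair<long,long> GetNextPacket() {
      long first = fNext;
      long num   = (fTotal - fNext < fPacketSize) ? fTotal - fNext : fPacketSize;
      fNext += num;
      return std::make_pair(first, num);
   }
};

int main() {
   PacketGenerator gen(340, 100);     // e.g. 340 events, packets of 100
   // Each slave would run this loop; here a single loop stands in for all.
   for (;;) {
      std::pair<long,long> p = gen.GetNextPacket();
      if (p.second == 0) break;       // no more packets: send results back
      std::printf("process events %ld..%ld\n", p.first, p.first + p.second - 1);
   }
   return 0;
}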
December, 2003 ACAT'03 8
Data Access Strategies
Each slave is assigned, as far as possible, packets representing data in files local to that slave
If no (more) local data is available, remote data is accessed via rootd, rfiod or dCache (this needs a fast LAN, e.g. Gigabit Ethernet)
In the case of SAN/NAS a simple round-robin strategy is used
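The strategy can be sketched as follows. This is assumed logic for illustration, not the actual PROOF packetizer; FileInfo and NextFile are made-up names.

// Sketch of the data access strategy above: local files first, then a
// remote fallback (served via rootd/rfiod/dCache). With SAN/NAS storage
// no file has a host, so slaves are effectively served in simple rotation.
#include <string>
#include <vector>

struct FileInfo {
   std::string path;
   std::string host;   // node holding the file; empty for SAN/NAS
};

// Pick the next file for 'slave': a local file if any remains, else a remote one.
const FileInfo *NextFile(const std::vector<FileInfo> &files,
                         std::vector<bool> &done, const std::string &slave) {
   int remote = -1;
   for (size_t i = 0; i < files.size(); ++i) {
      if (done[i]) continue;
      if (files[i].host == slave) { done[i] = true; return &files[i]; } // local hit
      if (remote < 0) remote = (int)i;      // remember a remote fallback
   }
   if (remote >= 0) { done[remote] = true; return &files[remote]; }     // remote access
   return 0;                                // nothing left to assign
}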
December, 2003 ACAT'03 9
Additional Issues
Error handling Death of master and/or slaves Ctrl-C interrupt
Authentication Globus, ssh, kerb5, SRP, clear passwd,
uid/gid matching Sandbox and package manager
Remote user environment
December, 2003 ACAT'03 10
Running a PROOF Job
Specify a collection of TTrees or files with objects
root[0] gROOT->Proof("cluster.cern.ch");
root[1] TDSet *set = new TDSet("TTree", "AOD");
root[2] set->AddQuery("lfn:/alice/simulation/2003-04", "V0.6*.root");
…
root[10] set->Print("a");
root[11] set->Process("mySelector.C");
The queries are typically returned by a DB or file catalog lookup; logical filenames (“lfn:…”) are used.
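If the catalog lookup is done by hand, the same data set could be filled file by file; a minimal sketch, where TDSet::Add() is assumed from the ROOT interface and the file names are hypothetical placeholders:

root[3] set->Add("lfn:/alice/simulation/2003-04/V0.6.01.root");  // hypothetical file name
root[4] set->Add("lfn:/alice/simulation/2003-04/V0.6.02.root");  // hypothetical file name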
December, 2003 ACAT'03 11
The Selector
Basic ROOT TSelector, created via TTree::MakeSelector()
// Abbreviated version
class TSelector : public TObject {
protected:
   TList *fInput;
   TList *fOutput;
public:
   void Init(TTree*);
   void Begin(TTree*);
   void SlaveBegin(TTree*);
   Bool_t Process(int entry);
   void SlaveTerminate();
   void Terminate();
};
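A concrete selector following this interface might look as below: a minimal sketch, assuming a tree with a branch named "pt"; the class, branch and histogram names are illustrative, and TTree::MakeSelector() would generate the real skeleton, including the fChain member used here.

// mySelector.C -- minimal selector sketch (assumed names, see above).
#include "TSelector.h"
#include "TTree.h"
#include "TH1F.h"

class mySelector : public TSelector {
   TTree  *fChain;   // tree being processed, set in Init()
   TH1F   *fHistPt;  // histogram produced on each slave
   Float_t fPt;      // branch variable (assumed branch name "pt")
public:
   mySelector() : fChain(0), fHistPt(0), fPt(0) { }
   void Init(TTree *tree) {                 // called for each new tree/file
      fChain = tree;
      fChain->SetBranchAddress("pt", &fPt);
   }
   void SlaveBegin(TTree *) {               // once per slave, before processing
      fHistPt = new TH1F("hPt", "p_{T}", 100, 0, 10);
      fOutput->Add(fHistPt);                // registered for merging
   }
   Bool_t Process(int entry) {              // once per event
      fChain->GetEntry(entry);
      fHistPt->Fill(fPt);
      return kTRUE;
   }
   void Terminate() {                       // on the client, after the merge
      ((TH1F*) fOutput->FindObject("hPt"))->Draw();
   }
};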
December, 2003 ACAT'03 12
PROOF Scalability
32 nodes, each with dual 1 GHz Itanium II CPUs, 2 GB RAM, 2x75 GB 15K SCSI disks and Fast Ethernet
Each node holds one copy of the data set (4 files, 277 MB in total); across the 32 nodes: 8.8 GB in 128 files, 9 million events
Processing the full 8.8 GB (128 files) takes 325 s on 1 node and 12 s on 32 nodes in parallel, a speedup of roughly 27
December, 2003 ACAT'03 13
PROOF and Data Grids
Many Grid services are a good fit: authentication; file catalog and replication services; resource brokers; job schedulers; monitoring
Use abstract interfaces
December, 2003 ACAT'03 14
The Condor Batch System
Full-featured batch system: job queuing, scheduling policy, priority scheme, resource monitoring and management
Flexible, distributed architecture: dedicated clusters and/or idle desktops; transparent I/O and file transfer
Based on 15 years of advanced research; platform for ongoing CS research
Production quality, in use around the world, with pools of 100s to 1000s of nodes
See: http://www.cs.wisc.edu/condor
December, 2003 ACAT'03 15
COD - Computing On Demand
Active, ongoing research and development
Shares a batch resource with interactive use: most of the time the node does normal Condor batch work; an interactive job “borrows” the resource for a short time
Integrated into the Condor infrastructure
Benefits: a large amount of resources for interactive bursts; efficient (100%) use of resources
December, 2003 ACAT'03 16
COD - Operations
[Figure: the sequence of COD operations on a node, and what runs at each step:
Normal batch operation (batch job running)
Request claim (batch job still running)
Activate claim (COD job runs alongside the batch job)
Suspend claim (COD job suspended)
Resume claim (COD job runs again)
Deactivate claim (COD job done, batch only)
Release claim (back to normal batch operation)]
December, 2003 ACAT'03 17
PROOF and COD
Integrate PROOF and Condor COD; great cooperation with the Condor team
The master starts the slaves as COD jobs
Standard connection from master to slave
The master resumes and suspends the slaves as needed around queries
Condor or an external resource manager is used to allocate nodes (VMs)
December, 2003 ACAT'03 18
PROOF and COD

[Figure: a PROOF client connects to the master node; the master and each slave run as Condor COD jobs on batch nodes, side by side with the normal batch workload.]
December, 2003 ACAT'03 19
PROOF and COD Status
Status: basic implementation finished; successfully demonstrated at SC'03 with 45 slaves as part of PEAC
TODO: further improve the interface between PROOF and COD; implement resource accounting
December, 2003 ACAT'03 20
PEAC – PROOF Enabled Analysis Cluster
Complete event analysis solution: data catalog and data management, resource broker, PROOF
Components used: SAM catalog, dCache, a new global resource broker, Condor+COD, PROOF
Multiple computing sites with independent storage systems
December, 2003 ACAT'03 21
PEAC System Overview
December, 2003 ACAT'03 22
PEAC Status
Successful demo at SC'03: four sites, up to 25 nodes, a real CDF StNtuple-based analysis, COD tested with 45 slaves
Doing a post mortem and planning the next design and implementation phases; available manpower will determine the timeline
Plan to use a 250-node cluster at FNAL and another cluster at UCSD
December, 2003 ACAT'03 23
Conclusions
PROOF is maturing; there is a lot of interest from experiments with large data sets
COD is essential for sharing batch and interactive work on the same cluster; it maximizes resource utilization
PROOF turns out to be a powerful application for using, and showing, the power of Grid middleware to its full extent
See tomorrow's talk by Andreas Peters on PROOF and AliEn