Posted on 29-Jan-2016

TRANSCRIPT
Interactive Data Analysis with PROOF: Bleeding Edge Physics with Bleeding Edge Computing
Fons Rademakers, CERN
[Figure: DVD stack with one year of LHC data (~20 km), compared to Mt. Blanc (4.8 km)]
LHC Data Challenge
• The LHC generates:
  ■ 40 million collisions per second
• Combined, the 4 experiments record:
  ■ After filtering, ~100 interesting collisions per second
  ■ From 1 to 12 MB per collision ⇒ from 0.1 to 1.2 GB/s
  ■ 10^10 collisions registered every year
  ■ ~10 PetaBytes (10^15 bytes) per year
  ■ LHC data correspond to 20 million DVDs per year!
  ■ Computing power equivalent to 100,000 of today's PCs
  ■ Disk space equivalent to 400,000 large PC disks
[Figure labels, same scale comparison: Balloon (30 km), Airplane (10 km)]
HEP Data Analysis
• Typical HEP analysis needs a continuous algorithm refinement cycle:

  implement algorithm → run over data set → make improvements → (repeat)
HEP Data Analysis
• Ranging from I/O bound to CPU bound
• Need many disks to get the needed I/O rate
• Need many CPUs for processing
• Need a lot of memory to cache as much as possible
Some ALICE Numbers
• 1.5 PB of raw data per year
• 360 TB of ESD+AOD per year (20% of raw)
• One pass using 400 disks at 15 MB/s will take 16 hours
• Using parallelism is the only way to analyze this amount of data in a reasonable amount of time
PROOF Design Goals
• System for running ROOT queries in parallel on a large number of distributed computers or multi-core machines
• PROOF is designed to be a transparent, scalable and adaptable extension of the local interactive ROOT analysis session
• Extends the interactive model to long-running "interactive batch" queries
Where to Use PROOF
• CERN Analysis Facility (CAF)
• Departmental workgroups (Tier-2's)
• Multi-core, multi-disk desktops (Tier-3/4's)
The Traditional Batch Approach

[Diagram: a query goes to the batch scheduler, which consults the file catalog; a job splitter splits the analysis job into N stand-alone sub-jobs that run on the batch cluster (queue, CPUs, storage); a job merger collects the sub-jobs and merges them into a single output]

• Split analysis task in N batch jobs
• Job submission is sequential
• Very hard to get feedback during processing
• Analysis is finished when the last sub-job finishes
The PROOF Approach

[Diagram: a PROOF query (data file list + mySelector.C) goes to the master, which uses the scheduler and file catalog to drive the PROOF cluster (CPUs, storage) and returns feedback and the merged final output]

• Cluster perceived as extension of the local PC
• Same macro and syntax as in a local session
• More dynamic use of resources
• Real-time feedback
• Automatic splitting and merging
Multi-Tier Architecture
• Adapts to wide-area virtual clusters: geographically separated domains, heterogeneous machines
• Network performance: less important between geographically separated domains, VERY important within a cluster
• Optimize for data locality or high-bandwidth data server access
The ROOT Data Model: Trees & Selectors

[Diagram: a chain of trees; each event has branches and leaves, and Process() reads only the needed parts while looping over events 1, 2, ..., n, ..., last]

Begin()
- Create histograms
- Define output list

Process()
- Preselection
- Analysis (if preselection is OK, fill the output list)

Terminate()
- Finalize analysis (fitting, ...)
TSelector - User Code

// Abbreviated version
class TSelector : public TObject {
protected:
   TList *fInput;
   TList *fOutput;
public:
   void   Notify(TTree*);
   void   Begin(TTree*);
   void   SlaveBegin(TTree*);
   Bool_t Process(int entry);
   void   SlaveTerminate();
   void   Terminate();
};
TSelector::Process()

   ...
   // select event
   b_nlhk->GetEntry(entry);
   if (nlhk[ik] <= 0.1) return kFALSE;
   b_nlhpi->GetEntry(entry);
   if (nlhpi[ipi] <= 0.1) return kFALSE;
   b_ipis->GetEntry(entry);
   ipis--;
   if (nlhpi[ipis] <= 0.1) return kFALSE;
   b_njets->GetEntry(entry);
   if (njets < 1) return kFALSE;

   // selection made, now analyze event
   b_dm_d->GetEntry(entry);    // read branch holding dm_d
   b_rpd0_t->GetEntry(entry);  // read branch holding rpd0_t
   b_ptd0_d->GetEntry(entry);  // read branch holding ptd0_d

   // fill some histograms
   hdmd->Fill(dm_d);
   h2->Fill(dm_d, rpd0_t/0.029979*1.8646/ptd0_d);
   ...
The Packetizer
• The packetizer is the heart of the system
• It runs on the master and hands out work to the workers
• Different packetizers allow for different data access policies:
  ■ All data on disk, allow network access
  ■ All data on disk, no network access
  ■ Data on mass storage, go file-by-file
  ■ Data on Grid, distribute per Storage Element
• Makes sure all workers end at the same time
• Pull architecture: workers ask for work, so there is no complex worker state in the master
PROOF Scalability on Multi-Core Machines
The current version of Mac OS X is fully 8-core capable; I have been running my Mac Pro with dual quad-core CPUs for 4 months.
Production Usage in Phobos
Interactive Batch
• Allow submission of long-running queries
• Allow client/master disconnect and reconnect
• Allow interaction and feedback at any time during processing
Analysis Scenario

Monday at 10:15, ROOT session on my laptop:
  AQ1: 1s query produces a local histogram
  AQ2: a 10m query submitted to PROOF1
  AQ3 - AQ7: short queries
  AQ8: a 10h query submitted to PROOF2

Monday at 16:25, ROOT session on my desktop:
  BQ1: browse results of AQ2
  BQ2: browse intermediate results of AQ8
  BQ3 - BQ6: submit 4 10m queries to PROOF1

Wednesday at 8:40, browse from any web browser:
  CQ1: browse results of AQ8, BQ3 - BQ6
Session Viewer GUI
• Open/close sessions
• Define a chain
• Submit a query, execute a command
• Query editor
• Online monitoring of feedback histograms
• Browse folders with query results
• Retrieve, archive and delete query results
Monitoring
• MonALISA based monitoring
  ■ Each host reports to MonALISA
  ■ Each worker reports to MonALISA
• Internal monitoring
  ■ File access rate, packet generation time and latency, processing time, etc.
  ■ Produces a tree for further analysis
Query Monitoring
[Plots] The same plots exist for: CPU usage, cluster usage, memory, event rate, local/remote MB/s and files/s
ALICE CAF Test Setup
• Evaluation of the CAF test setup since May
  ■ 33 machines, 2 CPUs each, 200 GB disk
• Tests performed
  ■ Usability tests
  ■ Simple speedup plot
  ■ Evaluation of different query types
  ■ Evaluation of the system when running a combination of query types
Query Type Cocktail
• 4 different query types:
  ■ 20% very short queries
  ■ 40% short queries
  ■ 20% medium queries
  ■ 20% long queries
• User mix:
  ■ 33 nodes
  ■ 10 users, 10 or 30 workers/user, max. average speedup = 6.6
  ■ 5 users, 20 workers/user
  ■ 15 users, 7 workers/user
Relative Speedup

[Speedup plots for the user-mix tests, each showing the measured speedup against the average expected speedup]
Cluster Efficiency
Multi-User Scheduling
• A scheduler is needed to control the use of available resources in multi-user environments
• Decisions are taken per query, based on the following metric:
  ■ Overall cluster load
  ■ Resources needed by the query
  ■ User quotas and priorities
• Requires dynamic cluster reconfiguration
• Generic interface to external schedulers planned (Condor, LSF, ...)
Conclusions
• The LHC will generate data on a scale not seen anywhere before
• LHC experiments will critically depend on parallel solutions to analyze their enormous amounts of data
• Grids will very likely not provide the stability and reliability we need for repeatable high-statistics analysis
• Wish us good luck!