2015-10-17 ATLAS@home: Wenjing Wu, Andrej Filipčič, David Cameron, Eric Lancon, Claire Adam Bourdarios...

Post on 04-Jan-2016


23/4/20

ATLAS@home

Wenjing Wu, Andrej Filipčič, David Cameron, Eric Lancon, Claire Adam Bourdarios & others

ATLAS: Elementary Particle Physics
• One of the biggest experiments at CERN
• Trying to understand the origin of mass, which completes the Standard Model
• In 2012, ATLAS and CMS discovered the Higgs boson


Data processing flow in ATLAS

Why ATLAS@home

• It's free! Well, almost.
• Public outreach – volunteers want to know more about the project they participate in
• Good for ATLAS visibility
• Can add significant computing power to WLCG
• A brief history
– Started at the end of 2013, on a test instance at IHEP, Beijing
– Migrated to CERN and officially launched June 2014
– Jobs are continuously running.


ATLAS@home
• Goal: to run ATLAS simulation jobs on volunteer computers.
• Challenges:
– Big ATLAS software base (~10 GB), and very platform dependent: runs on Scientific Linux
– Volunteer computing resources should be integrated into the current Grid computing infrastructure. In other words, all the volunteer computers should appear as a WLCG site, and jobs are submitted from PanDA (the ATLAS Grid computing portal).
– Grid computing relies heavily on personal credentials, but these credentials should not be put on volunteer computers

Solutions

• Use VirtualBox + vmwrapper to virtualize volunteer hosts
• Use the network file system CVMFS to distribute the ATLAS software; since CVMFS supports on-demand file caching, it helps to reduce the image size.
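The on-demand caching idea can be illustrated with a toy sketch (this is not CVMFS itself, just the access pattern it exploits): a file is fetched over the network only on first access and then served from a local cache, so the full ~10 GB software tree never has to ship inside the VM image.

```python
from pathlib import Path

class OnDemandCache:
    """Toy model of on-demand file caching (the pattern CVMFS uses)."""

    def __init__(self, fetch, cache_dir):
        self.fetch = fetch            # callable: path -> bytes (stands in for the network)
        self.cache = Path(cache_dir)  # local cache directory
        self.fetch_count = 0          # how many files actually crossed the network

    def read(self, relpath):
        local = self.cache / relpath
        if not local.exists():        # cache miss: fetch once, store locally
            self.fetch_count += 1
            local.parent.mkdir(parents=True, exist_ok=True)
            local.write_bytes(self.fetch(relpath))
        return local.read_bytes()     # subsequent reads are served from disk
```

A job that touches only a small subset of the software tree therefore transfers only that subset, which is why the VM image can stay small.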

• In order to avoid placing credentials on the volunteer hosts, ARC CE is introduced into the architecture together with BOINC
– ARC CE is Grid middleware; it interacts with the ATLAS central Grid services and manages different LRMSes (Local Resource Management Systems), such as Condor and PBS, through specific LRMS plugins
– A BOINC plugin was developed to forward "Grid jobs" to the BOINC server and convert the job results into Grid format.

Architecture


ATLAS Workload Management System

BOINC ARC plugin (1)
• Converts an ARC CE job into a BOINC job
• The plugin includes:
– Submit/scan/cancel job
– Information provider (total CPUs, CPU usage, job status)
• Submit
– ARC CE job: all input files go into one tar.gz file
– Copy the input file from the ARC CE session directory into the BOINC internal directory
– Set up the BOINC environment and call the BOINC command to generate a job based on job templates/input files
– Write the job ID back to the ARC CE job control directory
– Upon job completion, the BOINC services put the desired output files back into the ARC CE session directory
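The staging part of the submit step could be sketched roughly as follows. The directory layout, the work-unit name, and the template path are hypothetical; the final shell-out corresponds to BOINC's `create_work` tool, which the real plugin would invoke from the project directory.

```python
import tarfile
from pathlib import Path

def stage_and_build_command(session_dir, boinc_project_dir, job_name,
                            app="ATLAS", template="templates/ATLAS_IN"):
    """Pack all ARC CE session inputs into one tar.gz in the BOINC
    download area and return the create_work command line to run.
    Paths and names here are illustrative, not the plugin's actual ones."""
    session = Path(session_dir)
    staged = Path(boinc_project_dir) / "download" / f"{job_name}.tar.gz"
    staged.parent.mkdir(parents=True, exist_ok=True)
    with tarfile.open(staged, "w:gz") as tar:      # one tar.gz per job
        for f in sorted(session.iterdir()):
            tar.add(f, arcname=f.name)
    # the caller would execute this from the BOINC project directory
    return ["bin/create_work", "--appname", app, "--wu_name", job_name,
            "--wu_template", template, staged.name]
```

The returned job name is what the plugin would write back into the ARC CE job control directory so that later scan calls can match BOINC work units to Grid jobs.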

BOINC ARC CE plugin (2)
• Scan
– Scan the job diag file (in the session directory), get the exit code, upload output files to the designated SE, and update the ARC CE job status
• Cancel
– Cancel a BOINC job
• Information provider
– Query the BOINC DB for the total CPU number, CPU usage, and status of each job
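The scan step's exit-code extraction might look like this sketch, assuming the diag file is a plain key=value text file and that the relevant field is named `exitcode` (both are assumptions about the file format, not taken from the slides):

```python
def read_exit_code(diag_text):
    """Return the integer exit code recorded in a job diag file, or None
    if no exitcode line is present. Assumes key=value lines."""
    for line in diag_text.splitlines():
        key, _, value = line.partition("=")
        if key.strip() == "exitcode":
            return int(value.strip())
    return None
```

On a zero exit code the plugin would proceed to upload the outputs to the SE and mark the ARC CE job finished; a missing or nonzero code would mark it failed.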

Current status
• Gained CPU hours: 103,355
• Daily resource: 3% of Grid computing

Current status: the whole ATLAS computing

ATLAS jobs
• Full ATLAS simulation jobs
– 10 events/job initially
– Now 100 events/job
• A typical ATLAS simulation job
– 40–80 MB input data
– 10–30 MB output data
– On average, 92 minutes CPU time, 114 minutes elapsed time
• CPU efficiency lower than on the Grid
– Slow home network → significant initialization time
– CPUs not available all the time
• Jobs ran in an SLC5 64-bit image, upgraded to SLC6 (µCernVM)
• Virtualization on Windows, Linux, Mac
• Any kind of job could run on ATLAS@home
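A quick back-of-envelope check of the averages quoted above: CPU efficiency is CPU time divided by elapsed (wall-clock) time.

```python
def cpu_efficiency(cpu_minutes, elapsed_minutes):
    """Fraction of wall-clock time spent actually computing."""
    return cpu_minutes / elapsed_minutes

# Average ATLAS@home simulation job from the slide: 92 min CPU, 114 min elapsed
eff = cpu_efficiency(92, 114)   # ≈ 0.81, i.e. about 81%
```

The ~19% overhead is consistent with the causes listed above: image/software initialization over slow home networks and CPUs not being available the whole time.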


How Grid People see ATLAS@home

• Volunteers want to earn credits for their contribution; they want their PCs to work optimally
– This is true for the Grid sites as well, at least it should be
– But volunteers are better shifters than we are
• Different from what we are used to:
– On the Grid: jobs are failing, please fix the sites!
– On BOINC: jobs suck, please fix your code!
• ATLAS@home is the first BOINC project with massive I/O demands, even for less intensive jobs
– Server infrastructure needs to be carefully planned to cope with a high load
– Credentials must not be passed to PCs
• Jobs can stay in execution mode for a long time, depending on the volunteer computer's preferences, so this is not suitable for high-priority tasks


ATLAS outreach
• Outreach website: https://atlasphysathome.web.cern.ch/
• Feedback mailing list: atlas-comp-contact-home@cern.ch

Future Effort (1)

• Customize the VM image to reduce the network traffic and speed up the initialization

• Optimize the file transfers, server load and job efficiency on the PCs

• Test and migrate to the LHC@home infrastructure
• Test whether BOINC can replace the small Grid sites
• Investigate the use of BOINC on local batch clusters to run ATLAS jobs
• Investigate running various workflows (longer jobs, multi-core jobs) on virtual machines


Future Effort (2)
• Provide an event display, and possibly a screen saver, that would let people see what they are running.

Acknowledgements

• David and Rom for all the support and suggestions.

• CERN IT for providing server and storage resources for ATLAS@home, and for working on integrating ATLAS@home with LHC@home
