Distributed Monte Carlo Production for Joel Snow Langston University DOE Review March 2011


Distributed Monte Carlo Production for

Joel Snow
Langston University

DOE Review March 2011


Outline
● Introduction
● FNAL SAM
● SAMGrid
● Interoperability with OSG and LCG
● Production System
● Production Results
● LUHEP Computing
● Summary


Introduction
● Covers my tenure as MC production coordinator
● Simulation data (MC) crucial to physics analysis
● Tevatron luminosity and hence raw data volume are at record levels
● Challenge for analysts and production
● Personnel & computing resources migrating to LHC experiments
● DZero strategy
– Increase automation
– Leverage resources and support


Evolution
● Mature experiment, but nimble
– history of adopting innovative technologies
   ● distributed data handling - SAM
   ● early adopter of the grid for production - SAMGrid
– significant investment in these technologies
● Grid technology allows opportunistic usage
– DZero can mix “traditional” dedicated and opportunistic resources
● Grid interoperability
– Leverages resources and support, reduces personnel needs per CPU hour


Sequential data Access via Metadata

● Fermilab system first used by DZero

● SAM distributed data handling system predates grid

● Set of servers working together to store and retrieve files and metadata

● Permanent storage and local disk caches

● Database tracks location, metadata of files, job processing history

● Delivers files to jobs (using GridFTP over WAN), provides job submission capabilities
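
The bookkeeping described in these bullets can be pictured with a toy in-memory catalog. This is only an illustrative sketch: the class names, fields and the `deliver` policy are hypothetical and are not SAM's schema or API. It shows a store that tracks file metadata and locations, answers metadata queries, and records which job consumed which file.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    """One catalogued file: its metadata and every place a copy lives."""
    name: str
    metadata: dict
    locations: set = field(default_factory=set)


class ToyCatalog:
    """Hypothetical stand-in for a metadata database; not SAM's schema."""

    def __init__(self):
        self._files = {}
        self.history = []  # (job_id, file_name) pairs: the processing history

    def declare(self, name, metadata, location):
        """Register a file, or add another known location for it."""
        record = self._files.setdefault(name, FileRecord(name, dict(metadata)))
        record.locations.add(location)

    def query(self, **wanted):
        """Return names of files whose metadata matches every key=value given."""
        return sorted(
            r.name for r in self._files.values()
            if all(r.metadata.get(k) == v for k, v in wanted.items())
        )

    def deliver(self, job_id, name, local_cache):
        """Pick a source for the file, preferring the job's local cache."""
        record = self._files[name]
        if local_cache in record.locations:
            source = local_cache
        else:
            source = sorted(record.locations)[0]
        record.locations.add(local_cache)    # the file is now cached locally
        self.history.append((job_id, name))  # remember which job consumed it
        return source
```

A second request for the same file at the same site is then served from the local cache rather than from permanent storage, which is the point of pairing the two in the list above.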


SAMGrid
● Fermilab-developed grid, first used by DZero for global MC production in 2004
● SAMGrid = SAM + Job and Information Management (JIM) components
● Provides the user with transparent remote job submission, data processing and status monitoring
● VDT based (Globus + Condor)
● Logically consists of
– Multiple execution sites
– Resource selector
– Multiple Job Submission (Scheduler) sites
– Multiple Clients (User Interface) to Submission site


SAMGrid Interoperability
● As Open Science Grid (OSG) and LHC Computing Grid (LCG) became operational it was desirable to leverage these resources for DZero
● FNAL and DZero developed and deployed SAMGrid interoperability with both LCG and OSG resources
● Execution site acts as a Forwarding node
– packages SAMGrid jobs for OSG/LCG job submission via Condor-G
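
To make the forwarding step concrete, the sketch below renders a Condor-G submit description for one job aimed at a Globus GT2 gatekeeper. The keywords (`universe = grid`, `grid_resource = gt2 ...`, `queue`) are standard HTCondor submit syntax; everything else is an assumption for illustration: the function name, the file names and the gatekeeper host are hypothetical, and the slides do not say what the real forwarding node writes.

```python
def condor_g_submit_description(executable, arguments, gatekeeper,
                                jobmanager="jobmanager-condor"):
    """Render a Condor-G submit description for one forwarded job.

    Uses standard HTCondor grid-universe keywords; the gatekeeper host and
    file names passed in are placeholders chosen by the caller.
    """
    lines = [
        "universe = grid",
        f"grid_resource = gt2 {gatekeeper}/{jobmanager}",
        f"executable = {executable}",
        f"arguments = {arguments}",
        "output = job.out",
        "error = job.err",
        "log = job.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical usage: wrap one packaged job for a remote gatekeeper.
print(condor_g_submit_description("run_mc.sh", "--request 42", "gate.example.org"))
```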


Consolidation, Automation, Exploitation
● SAMGrid sites require operational manpower and expert support
● People power and FNAL support migrating to LHC experiments
● Increase automation - Automc
● Reduce number of SAMGrid sites, increase use of OSG and LCG
– comes with support ✔
– provides opportunistic job slots ✔


Production System

MC production gets work from the SAM Request System

Physics groups' MC requests are parametrized and prioritized as a Python object
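
As a rough picture of a request that is parametrized and prioritized as a Python object, the sketch below uses a dataclass whose ordering is driven by priority. The class name, the field names, the example parameters and the convention that a lower number means more urgent are all assumptions for illustration; the real request object is not shown in the slides.

```python
from dataclasses import dataclass, field


@dataclass(order=True)
class MCRequest:
    """Hypothetical request record; field names are illustrative only."""
    priority: int        # assumed convention: lower number = more urgent
    request_id: int
    events: int = field(compare=False)
    parameters: dict = field(compare=False, default_factory=dict)


requests = [
    MCRequest(priority=2, request_id=101, events=500_000, parameters={"process": "ttbar"}),
    MCRequest(priority=1, request_id=102, events=200_000, parameters={"process": "wjets"}),
    MCRequest(priority=2, request_id=100, events=1_000_000),
]

# Sorting compares (priority, request_id), so ties go to the lower request id.
work_queue = sorted(requests)
print([r.request_id for r in work_queue])
```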


Automatic Monte Carlo Request Processing
● Developed Automc System – in use at FNAL
– Handles official DZero MC production at all but 2 sites
● From approved request to final data storage
● Easy to use – minimizes manpower needs
● Site independent
– deploy for any grid site (SAMGrid, OSG, LCG)
– capable of managing many sites
● Handles recovery of common failures
● Integrates with existing MC request priority protocol
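
The combination of priority ordering, multi-site operation and automatic recovery listed above can be sketched as a small loop. This is a simplified illustration under stated assumptions, not the Automc implementation: `run_requests`, the `submit` callback, the site rotation rule and the retry limit are all hypothetical.

```python
import heapq


def run_requests(requests, sites, submit, max_attempts=3):
    """Process (priority, request_id) pairs in priority order across sites.

    `submit(request_id, site)` is a caller-supplied function returning True on
    success; it stands in for whatever really launches and checks a job.
    """
    queue = list(requests)
    heapq.heapify(queue)                  # smallest priority value comes out first
    attempts = {}                         # request_id -> number of tries so far
    done, failed = [], []
    while queue:
        priority, request_id = heapq.heappop(queue)
        tries = attempts.get(request_id, 0)
        site = sites[tries % len(sites)]  # rotate to another site on each retry
        attempts[request_id] = tries + 1
        if submit(request_id, site):
            done.append((request_id, site))
        elif attempts[request_id] < max_attempts:
            heapq.heappush(queue, (priority, request_id))  # recovery: resubmit
        else:
            failed.append(request_id)     # give up and leave for a human to inspect
    return done, failed
```

Only requests that exhaust their retries need operator attention, which is one way an automated system can keep manpower needs low.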


AutoMC Monitoring

Running at FNAL & managing production at 39 sites

http://www-d0.fnal.gov/computing/mcprod/dajd/dajd_status.html


Production System Resources
● MC production uses a variety of dedicated and opportunistic resources on 4 continents
– Non-grid site at ccin2p3 Lyon (FR) – very productive, flexible
– Native Samgrid sites: FZU (CZ), GridKa (DE), LUHEP (US), USTC (CN)
– LCG resources: CEs, SEs, and Samgrid-LCG infrastructure in FR, UK, NL
– OSG resources: CEs, SEs, and Samgrid-OSG infrastructure in US


MC Production Results

Looking back at the last 30 days

Averaging 5.8M events per day and totaling 172.8M events in 30 days


MC Production Results

Looking back at the last year

Averaging 49M events per week and totaling 2.6B events in a year

Plot labels: cumulative since September 2005; 2010/02/14 - 2011/02/14


MC Production Results

Looking back at the last year by production segment

52 week averages per week (2010/02/14 - 2011/02/14)

Non-grid: 19.8M, OSG: 11.4M, Samgrid: 12.6M, LCG: 4.9M


MC Production Results

Looking back at the last year by production segment

52 week totals (2010/02/14 - 2011/02/14)

Production Last Year By Segment
Segment    Events   Share
Non-grid   1041M    40.8%
OSG         596M    23.3%
Samgrid     658M    25.8%
LCG         257M    10.1%

Plot label: cumulative since September, 2005
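
The segment shares follow directly from the 52 week totals. The short check below recomputes them from the rounded totals quoted above; nothing beyond those four numbers is assumed, and the variable names are mine.

```python
totals_millions = {"Non-grid": 1041, "OSG": 596, "Samgrid": 658, "LCG": 257}

grand_total = sum(totals_millions.values())
shares = {name: 100 * count / grand_total for name, count in totals_millions.items()}

for name, share in shares.items():
    print(f"{name:9s} {totals_millions[name]:5d}M  {share:5.1f}%")
print(f"{'Total':9s} {grand_total:5d}M  ({grand_total / 52:.1f}M per week)")
```

Because the inputs are rounded to the nearest million events, the recomputed shares agree with the quoted percentages only to within about 0.1 percentage point. The total is also consistent with the quoted average of about 49M events per week.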


MC Production Geographic Distribution

Events Last Year (2010/02/14 - 2011/02/14):

Region       Events   Share
Europe       1925M    75.4%
N. America    574M    22.5%
Asia           29M     1.1%
S. America     24M     0.9%


MC Production Results

Looking back at the last 5.5 years

Averaging 19.2M events per week and totaling 2.82B events

Plot labels: cumulative since September 2005; 2005/09/05 - 2011/02/14


MC Production Results

Looking back at the last 5.5 years by production segment

5.5 year averages per week (2005/09/05 - 2011/02/14)

Non-grid: 8.0M, OSG: 4.8M, Samgrid: 5.3M, LCG: 1.1M


MC Production Results

Looking back at the last 5.5 years by production segment

5.5 year totals (2005/09/05 - 2011/02/14)

Segment    Events   Share
Non-grid   2.26B    41.5%
OSG        1.37B    25.2%
Samgrid    1.51B    27.7%
LCG        306M      5.6%

Plot label: cumulative since September, 2005
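
The 5.5 year weekly averages and the 5.5 year segment totals can be cross-checked against each other. The sketch below counts the weeks in the stated period and compares the two routes to a grand total. It uses only the figures quoted on these slides, and the variable names are mine.

```python
from datetime import date

# Length of the reporting period, 2005/09/05 to 2011/02/14, in weeks.
weeks = (date(2011, 2, 14) - date(2005, 9, 5)).days / 7

weekly_avg_millions = {"Non-grid": 8.0, "OSG": 4.8, "Samgrid": 5.3, "LCG": 1.1}
totals_millions = {"Non-grid": 2260, "OSG": 1370, "Samgrid": 1510, "LCG": 306}

from_averages = sum(weekly_avg_millions.values()) * weeks
from_totals = sum(totals_millions.values())

print(f"weeks in period:         {weeks:.1f}")
print(f"weekly average x weeks:  {from_averages / 1000:.2f}B events")
print(f"sum of segment totals:   {from_totals / 1000:.2f}B events")
```

The two routes agree with each other to better than 1 percent, at roughly 5.4B to 5.5B events. The 2.82B total quoted alongside the 19.2M weekly average is not reproduced by this arithmetic, so it may refer to a different quantity or period.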


Production Results Last 7 Years

DZero MC Production in Millions of Events per year ending 12/26

Year   Total    Non-Grid  SAMGrid  OSG     LCG
2010   2388.5   1011.2    614.8    539.2   223.3
2009   1122.6    540.3    217.9    364.2     0.3
2008    794.8    315.6    213.6    259.7     5.8
2007    398.2    109.1    158.1     96.5    34.4
2006    348.0    144.4    195.5      0.5     7.6
2005     98.1     68.6     29.5      0.0     0.0
2004     42.4     41.8      0.6      0.0     0.0

Chart: DZero MC Production in Millions of Events, 2004 to 2010, by segment (Non-Grid, SAMGrid, OSG, LCG)


Production Results Last 7 Years

DZero MC Production in Terabytes of Data per year ending 12/26

Year   Total   Non-Grid  SAMGrid  OSG    LCG
2010   221.0   83.3      61.8     53.7   22.3
2009    95.3   42.7      19.8     32.8    0.0
2008    67.8   26.9      18.4     22.0    0.5
2007    31.6    7.3      13.2      8.2    2.9
2006    23.0    9.4      13.1      0.0    0.5
2005     6.0    4.1       1.9      0.0    0.0
2004     1.9    1.9       0.0      0.0    0.0

Chart: DZero MC Production in Terabytes of Data, 2004 to 2010, by segment (Non-Grid, SAMGrid, OSG, LCG)
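
Combining the yearly totals in events with the yearly totals in terabytes gives two derived quantities that the slides do not state: the average stored size per event and the year-over-year growth in events. The sketch below computes both from the Total columns only. It assumes decimal units (1 TB = 1e12 bytes, 1 kB = 1e3 bytes), which the slides do not specify.

```python
events_millions = {2004: 42.4, 2005: 98.1, 2006: 348.0, 2007: 398.2,
                   2008: 794.8, 2009: 1122.6, 2010: 2388.5}
data_terabytes = {2004: 1.9, 2005: 6.0, 2006: 23.0, 2007: 31.6,
                  2008: 67.8, 2009: 95.3, 2010: 221.0}

previous = None
for year in sorted(events_millions):
    # Assumed units: 1 TB = 1e12 bytes, 1 kB = 1e3 bytes.
    kb_per_event = data_terabytes[year] * 1e12 / (events_millions[year] * 1e6) / 1e3
    growth = "" if previous is None else f"  x{events_millions[year] / previous:.2f} vs previous year"
    print(f"{year}: {kb_per_event:6.1f} kB/event{growth}")
    previous = events_millions[year]
```

On these figures, 2010 production was a little over twice that of 2009, at roughly 90 kB of stored data per event.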


OU DZero MC Production

OUHEP produced 306 M events and 28.4 TB data

Last year OUHEP produced 139 M events and 14.0 TB data

Plot labels: 2005/09/05 - 2011/02/14; cumulative since Sept. 2005; 2010/02/14 – 2011/02/14


LU DZero MC Production

LUHEP produced 15.5 M events and 1.36 TB data

Last year LUHEP produced 4.6 M events and 450 GB data

Plot labels: 2005/09/05 - 2011/02/14; cumulative since Sept. 2005; 2010/02/14 – 2011/02/14


LUHEP Computing
● 2 grid-enabled clusters, both producing DØ MC
● Old Samgrid cluster - 12 job slots
● New OSG cluster - 12 job slots with small associated SE used as DØ cache


Condor Q's at LUHEP

Plot labels: SAMGrid; OSG; Last Year


Summary
● DZero's early deployment of grid technology and automation has dramatically increased MC production
– First deployment of the SAM distributed data handling system
– Early SAMGrid deployment
– Use of OSG and LCG resources through interoperability with SAMGrid
– First opportunistic usage of OSG Storage Elements
– Automated MC production system
● Anticipate adequate MC through the last analysis