RefDB: The Reference Database for CMS Monte Carlo Production

Véronique Lefébure, CERN & HIP

CHEP 2003, San Diego, California, 25th of March 2003




Véronique Lefébure - CHEP2003 2

Functionalities of RefDB

1. Management of Physics Production Requests
2. Distribution, Coordination and Progress Tracking of Production around the World: Production Assignments
3. Definition of Production Instructions for the workflow planner
4. Catalogue Publication of Real and Virtual Data

Implementation: MySQL database hosted at CERN, accessed through a web server with PHP scripts and .htaccess access control


General Data Flow

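[Figure: data-flow diagram built around RefDB and its Web Interface (http://cmsdoc.cern.ch/…./*.php). Physicists (many) enter Requests. The Production Coordinator (one) creates Assignments. Production Operators (many) run the Workflow Planner*, which sends jobs to the CPUs. Each job returns a RUN Summary by e-mail to a mail box.]

*IMPALA, McRunjob, CMSProd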


Statistics

• RefDB was designed and implemented in November and December 2001, and has been used intensively by CMS since January 2002:
  – DAQ TDR Spring 2002 Production
  – 2003 Production in preparation for the 2004 Data Challenge

• ~20 Requestors
• > 20 Regional Centres, > 40 Production sites, 70 Production Operators
• > 2000 Requests and Assignments
• > 300 Parameter Files, > 1300 Parameter Values
• ~24 MB of MySQL data


Physics Production Request

• Definition of an Atomic Production Request (“Derivation”):
  1. Executable (“Transformation”)
  2. Input Physics Parameters
  3. Input Data and Number of Events
  4. Input Production Parameters

• Items 1 to 3 are defined by the Physicist; item 4 is defined by the Production Coordinator.
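The following is a minimal Python sketch of the four-part atomic request. It shows which part comes from whom. The class and field names (`Executable`, `AtomicRequest`, `is_complete`) and the parameter and collection names are hypothetical. RefDB itself stores requests in MySQL tables, not in objects like these.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Executable:
    """1. The "Transformation", selected by software name, version and executable name."""
    software: str
    version: str
    name: str


@dataclass
class AtomicRequest:
    """One "Derivation". Field names are illustrative, not RefDB column names."""
    # Items 1-3: filled in by the Physicist.
    executable: Executable
    physics_parameters: Dict[str, str]   # 2. (Name, Value) pairs
    input_collection: Optional[str]      # 3. logical name; None when a new dataset is defined
    n_events: int                        # 3. events to produce or process
    # Item 4: filled in later by the Production Coordinator.
    production_parameters: Dict[str, str] = field(default_factory=dict)

    def is_complete(self) -> bool:
        """True once both the Physicist's and the Coordinator's parts are present."""
        return self.n_events > 0 and bool(self.production_parameters)


req = AtomicRequest(
    executable=Executable("ORCA", "ORCA_7_1_1", "writeAllDigis"),
    physics_parameters={"EXAMPLE_PARAMETER": "1"},   # hypothetical parameter
    input_collection="example_hits_collection",      # hypothetical collection name
    n_events=1000,
)
print(req.is_complete())    # False: production parameters not yet set
req.production_parameters["EVENTS_PER_RUN"] = "250"  # hypothetical parameter
print(req.is_complete())    # True
```

Keeping item 4 as a separately filled field mirrors the split of responsibility: a request can exist before the coordinator has decided how it will be produced.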


Physics Production Request: 1. Executable

• Selected according to:
  – Software Name
  – Software Version
  – Executable Name
  (e.g. “ORCA ORCA_7_1_1 writeAllDigis”)
• Binaries, distributed with the DAR* tool
• Based on tagged code (CVS, SCRAM), but private code may be supported (system for loading and archiving code)
• I/O File-Type constraints
• Monitoring Schema and Algorithm (can be used by BOSS**)

* DAR: “Distribution After Release” (http://computing.fnal.gov/cms/natasha/DAR)
** BOSS: “Batch Object Submission System” (http://www.bo.infn.it/cms/computing/BOSS)
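The I/O file-type constraints allow a chain of executables to be checked before anything runs: each executable can only read the file type that the preceding step writes. The sketch below is minimal and assumes that each (software, version, executable) triple declares exactly one input and one output file type. Apart from the ORCA triple quoted above, the executables and all file-type names are made up.

```python
from typing import Dict, List, Optional, Tuple

ExeKey = Tuple[str, str, str]   # (software name, software version, executable name)

# Hypothetical registry: only the ORCA key comes from the slide; the other
# executables and every file-type name are invented for illustration.
REGISTRY: Dict[ExeKey, Tuple[Optional[str], str]] = {
    ("GenSW", "v1", "generate"): (None, "kine"),    # no input file: event generation
    ("SimSW", "v1", "simulate"): ("kine", "hits"),
    ("ORCA", "ORCA_7_1_1", "writeAllDigis"): ("hits", "digis"),
}


def check_chain(steps: List[ExeKey]) -> List[str]:
    """Return constraint violations for executables run in sequence (empty = OK)."""
    errors: List[str] = []
    previous_output: Optional[str] = None
    for i, key in enumerate(steps):
        if key not in REGISTRY:
            errors.append(f"step {i}: unknown executable {key}")
            previous_output = None
            continue
        needed, produced = REGISTRY[key]
        # The first step may read an existing collection, so only later steps
        # are compared with the output of the step before them.
        if i > 0 and needed != previous_output:
            errors.append(f"step {i}: {key[2]} needs {needed!r}, "
                          f"previous step gives {previous_output!r}")
        previous_output = produced
    return errors


print(check_chain([("GenSW", "v1", "generate"),
                   ("SimSW", "v1", "simulate"),
                   ("ORCA", "ORCA_7_1_1", "writeAllDigis")]))   # []
print(check_chain([("GenSW", "v1", "generate"),
                   ("ORCA", "ORCA_7_1_1", "writeAllDigis")]))   # one violation
```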


Tables: Software & Executable

• Software: Name, Version, Dates
• SoftwareType: Name
• SoftwareMap
• DARFile: Name, Dates, Status
• DarFileElement
• Executable: Name, Package
• ExecutableUse
• FileType: Name (input and output file types of an executable)
• MonitoringDefinition: Schema, Algorithms
• ProductionStep: Name, Shortname
• Distribution

Tables are filled through web forms.


Tables: Monitoring

• MonitoringBlock: Regular Expression, Piece of code
• MonitoringDefinition
• MonitoringProcess
• MonitoringProcessType: pre, run, post
• MonitoringSchema
• MonitoringObject: Name, Type, Description
• ProductionStep
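A monitoring block pairs a regular expression with a piece of code. The sketch below keeps only the regular-expression half. Each hypothetical block extracts one MonitoringObject value from a job log. The extracted values are then formatted as the kind of short summary a job could e-mail back. The log lines, object names and `Name=Value` summary layout are all assumptions for illustration.

```python
import re
from typing import Dict

# Hypothetical MonitoringBlocks: MonitoringObject name -> regular expression whose
# single capture group is the value. The log format below is invented.
MONITORING_BLOCKS: Dict[str, str] = {
    "RunNumber": r"Run number\s*=\s*(\d+)",
    "NbOfOutputEvents": r"Events written:\s*(\d+)",
    "ExitStatus": r"Exit status\s*:\s*(\w+)",
}


def parse_log(log_text: str, blocks: Dict[str, str]) -> Dict[str, str]:
    """Apply every block to the job log; if a pattern matches several times, keep the last."""
    values: Dict[str, str] = {}
    for obj_name, pattern in blocks.items():
        matches = re.findall(pattern, log_text)
        if matches:
            values[obj_name] = matches[-1]
    return values


def summary_body(values: Dict[str, str]) -> str:
    """Format the extracted values as sorted 'Name=Value' lines for an e-mail summary."""
    return "\n".join(f"{name}={value}" for name, value in sorted(values.items()))


log = "Run number = 1234\nprocessing ...\nEvents written: 250\nExit status : OK\n"
print(summary_body(parse_log(log, MONITORING_BLOCKS)))
```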


Physics Production Request: 2. Physics Parameters

• The Input Parameter File is made of one or more File Fragments:
  – Modularity:
    • Detector parameters
    • Beam-luminosity parameters, …
• Parameter File Fragment: a list of (Name, Value) pairs, one per parameter
  – Specialised scripts for file formatting
  – Uniqueness checked
• Single Parameter and its Value:
  – selected by the Physicist
  – or a new parameter and/or new value entered by him/her
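Fragment uniqueness can be checked by comparing a canonical form of each fragment rather than its raw text. The sketch below is minimal and assumes two things: a `NAME = VALUE` line format, and a SHA-256 hash of the sorted pairs as the uniqueness key. The real formatting was done by software-specific scripts, and the parameter names here are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple

Fragment = List[Tuple[str, str]]   # (Name, Value) pairs


def format_fragment(fragment: Fragment) -> str:
    """Canonical text form: one 'NAME = VALUE' line per parameter, sorted by name."""
    names = [name for name, _ in fragment]
    if len(names) != len(set(names)):
        raise ValueError("a parameter appears twice in one fragment")
    return "\n".join(f"{name} = {value}" for name, value in sorted(fragment))


class FragmentStore:
    """Stores each distinct fragment once, keyed by a hash of its canonical form."""

    def __init__(self) -> None:
        self._fragments: Dict[str, str] = {}

    def register(self, fragment: Fragment) -> Tuple[str, bool]:
        """Return (key, is_new); an identical fragment is reused, not duplicated."""
        text = format_fragment(fragment)
        key = hashlib.sha256(text.encode()).hexdigest()
        is_new = key not in self._fragments
        if is_new:
            self._fragments[key] = text
        return key, is_new


def build_parameter_file(fragments: List[Fragment]) -> str:
    """Concatenate fragments (e.g. detector + beam luminosity) into one file."""
    all_names = [name for fragment in fragments for name, _ in fragment]
    if len(all_names) != len(set(all_names)):
        raise ValueError("a parameter is set by more than one fragment")
    return "\n".join(format_fragment(f) for f in fragments) + "\n"


store = FragmentStore()
k1, new1 = store.register([("B_FIELD_TESLA", "4.0"), ("GEOMETRY", "example_v1")])
k2, new2 = store.register([("GEOMETRY", "example_v1"), ("B_FIELD_TESLA", "4.0")])
print(new1, new2, k1 == k2)   # True False True: same content, different order
print(build_parameter_file([[("B_FIELD_TESLA", "4.0")], [("LUMINOSITY", "2e33")]]))
```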


Tables: Input Parameters

• Parameter: Name, Description
• ParameterFile: ListOfParameterValues, Location, URL
• ParameterType: Name
• ParameterValue: Value, Description
• SoftwareType: Name
• ParameterMap

Tables are filled through web forms.


Physics Production Request: 3. Input Data

• Number of Events to be produced or processed
• Input Data, either:
  – selection of the Logical Name of an Input Data Collection (Real or Virtual Data); type checked
  or
  – definition of the Name of a new Dataset; uniqueness checked
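The two checks on the input data can be written as one validation step. This sketch rests on two assumptions. First, `check_input` and its arguments are hypothetical names. Second, “type checked” is taken to mean that the collection's data type must match the input type the chosen executable expects.

```python
from typing import Dict, List, Optional, Set


def check_input(n_events: int,
                input_collection: Optional[str],
                new_dataset: Optional[str],
                collection_types: Dict[str, str],
                required_type: Optional[str],
                existing_datasets: Set[str]) -> List[str]:
    """Return the problems with the 'Input Data' part of a request (empty = OK).

    collection_types maps each known logical collection name to its data type;
    required_type is the input type the chosen executable expects.
    """
    problems: List[str] = []
    if n_events <= 0:
        problems.append("number of events must be positive")
    if (input_collection is None) == (new_dataset is None):
        problems.append("give either an input collection or a new dataset name, "
                        "not both or neither")
    elif input_collection is not None:
        actual = collection_types.get(input_collection)
        if actual is None:
            problems.append(f"unknown collection {input_collection!r}")
        elif actual != required_type:           # type check
            problems.append(f"{input_collection!r} holds {actual!r}, "
                            f"executable expects {required_type!r}")
    elif new_dataset in existing_datasets:      # uniqueness check
        problems.append(f"dataset name {new_dataset!r} already exists")
    return problems


known = {"example_hits": "hits"}    # hypothetical collection -> data type
datasets = {"example_channel"}      # hypothetical existing dataset
print(check_input(1000, "example_hits", None, known, "hits", datasets))    # []
print(check_input(1000, None, "example_channel", known, None, datasets))
```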


Datasets and Collections

• Dataset
  – Physics Channel: primary interactions
  – Detector Configuration (geometry, material, magnetic field)
• Collection
  – For particle tracking through the detector, track reconstruction and physics reconstruction, one can change:
    • Software
    • Software versions
    • Parameters
• 1 Dataset, many Collections (re-processing, beam luminosities, filtering, cloning and adding new objects, analysis ntuples, …)

Production Cycle
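The one-dataset, many-collections relation forms a tree of input links, in which each collection records the collection it was produced from. The minimal sketch below uses hypothetical collection names. It shows one set of simulated hits digitised under two pile-up conditions, and recovers the processing history of a collection by following the links.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Collection:
    name: str
    dataset: str                             # every collection belongs to one dataset
    software: str
    version: str
    input_collection: Optional[str] = None   # None for the first step of the chain


def collections_of(dataset: str, catalogue: Dict[str, Collection]) -> List[str]:
    """All collections derived for one dataset (one dataset, many collections)."""
    return sorted(c.name for c in catalogue.values() if c.dataset == dataset)


def lineage(name: str, catalogue: Dict[str, Collection]) -> List[str]:
    """Follow input-collection links back to the first step; oldest first."""
    chain: List[str] = []
    current: Optional[str] = name
    while current is not None:
        if current in chain:
            raise ValueError(f"cycle through {current!r}")
        chain.append(current)
        current = catalogue[current].input_collection
    return chain[::-1]


# Hypothetical names: the same simulated hits digitised under two pile-up
# conditions, i.e. one dataset with several collections.
cat = {c.name: c for c in [
    Collection("hits", "example_channel", "SimSW", "v1"),
    Collection("digis_2e33", "example_channel", "ORCA", "ORCA_7_1_1", "hits"),
    Collection("digis_1e34", "example_channel", "ORCA", "ORCA_7_1_1", "hits"),
]}
print(collections_of("example_channel", cat))   # ['digis_1e34', 'digis_2e33', 'hits']
print(lineage("digis_1e34", cat))               # ['hits', 'digis_1e34']
```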


Tables: Dataset & Collection

• Dataset: Name, Description, Validity, Date, Cross-section, NbOfEvents
• DataType
• DatasetMap
• Collection: DatasetName, CollectionName, Status, NbOfEvents
• Geometry
• Software
• Executable
• ParameterFile
• Owner: Name
• ProductionCycle: Name, Calo/Tk/MuDigis (on/off)
• PUCondition
• Input Collection


Tables: Pile-Up Conditions

• Dataset
• DataType
• DatasetMap
• Collection
• ParameterFile
• PUCondition: Name

“Minimum Bias”


Physics Production Request: 4. Production Parameters

• Data Clustering
• Commit Interval
• Monitoring
• Job Splitting

Placeholders in the Parameter file:
  – for defining
    • Output file names
    • input/output run numbers, random number seeds, …
  – overwritten by
    • the PHP script that gives access to the Parameter file
    • the workflow planner, with values defined by RefDB
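The placeholders are filled in two stages: first by the script that serves the parameter file, then by the workflow planner. This can be sketched as a substitution that leaves unknown placeholders in place until the last stage. The `@NAME@` syntax and the parameter names are assumptions, because the slide does not give the real placeholder format.

```python
import re
from typing import Dict, Set

# Hypothetical placeholder syntax: @NAME@ (the real RefDB syntax is not given).
PLACEHOLDER = re.compile(r"@([A-Z_]+)@")


def fill(template: str, values: Dict[str, str], strict: bool = False) -> str:
    """Replace known placeholders; leave the others for a later stage unless strict."""
    def replace(match):
        key = match.group(1)
        if key in values:
            return values[key]
        if strict:
            raise KeyError(f"no value for placeholder {key}")
        return match.group(0)   # keep it untouched for the next stage
    return PLACEHOLDER.sub(replace, template)


def unresolved(text: str) -> Set[str]:
    """Placeholders still present in the text."""
    return set(PLACEHOLDER.findall(text))


template = "OutputFile = @OUTPUT_FILE@\nRunNumber = @RUN@\nRandomSeed = @SEED@\n"
# Stage 1: the script serving the parameter file fills what it knows.
stage1 = fill(template, {"OUTPUT_FILE": "example_dataset.out"})
print(sorted(unresolved(stage1)))   # ['RUN', 'SEED']
# Stage 2: the workflow planner fills per-job values defined by RefDB.
job_file = fill(stage1, {"RUN": "1234", "SEED": "987654"}, strict=True)
print(job_file)
```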

Job decomposition is defined either
  – by the granularity of the input data (runs), or
  – by an adequate number of events per run, giving a reasonable job CPU time and output data size
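For the second option, the number of events per run must satisfy both limits at once, and the request is then cut into runs of that size. This is a minimal sketch. The per-event CPU time, per-event output size and job limits are hypothetical inputs, not RefDB values.

```python
import math
from typing import List, Tuple


def events_per_run(sec_per_event: float, mb_per_event: float,
                   max_cpu_hours: float, max_output_mb: float) -> int:
    """Largest events-per-run that respects both the CPU-time and output-size limits."""
    by_cpu = math.floor(max_cpu_hours * 3600 / sec_per_event)
    by_size = math.floor(max_output_mb / mb_per_event)
    n = min(by_cpu, by_size)
    if n < 1:
        raise ValueError("a single event already exceeds the job limits")
    return n


def split_into_runs(n_events: int, per_run: int,
                    first_run: int = 1) -> List[Tuple[int, int]]:
    """(run number, events) pairs covering n_events; the last run may be shorter."""
    runs = []
    for i, start in enumerate(range(0, n_events, per_run)):
        runs.append((first_run + i, min(per_run, n_events - start)))
    return runs


# Hypothetical figures: 60 s and 2 MB per event, jobs limited to 12 h and 1000 MB.
per_run = events_per_run(60.0, 2.0, 12.0, 1000.0)
print(per_run)                          # 500 (size-limited; CPU alone would allow 720)
print(split_into_runs(1200, per_run))   # [(1, 500), (2, 500), (3, 200)]
```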


Physics Production Request: Procedure

• All steps via web forms
• Pre-registered “Requestors” for each Physics Group: .htaccess permissions
• Creation of Parameter File(s) or selection of existing ones
• Request web form starting from any point in the production chain: atomic or chain requests
  – Selection of Identity (Name, Group)
  – Selection of Software, Version, Executable
  – Selection of Parameter file(s)
  – Selection of Input Collection, or definition of Dataset Name + Description for new Physics Channels
  – Uniqueness of Request checked
• Email notification to Requestor, Group Coordinator and Production Coordinator
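The request uniqueness check and the notification list can be sketched together. All of the following are hypothetical assumptions. A request counts as a duplicate when its software, version, executable, parameter files and input match an earlier request, whoever submits it. Coordinators are identified by e-mail addresses.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Request:
    requestor: str                   # e-mail address of the Requestor
    group: str                       # Physics Group
    software: str
    version: str
    executable: str
    parameter_files: Tuple[str, ...]
    input_name: str                  # input collection, or new dataset name

    def work_key(self) -> Tuple:
        # Assumed duplicate rule: same processing of the same input,
        # whoever submits it; parameter-file order does not matter.
        return (self.software, self.version, self.executable,
                tuple(sorted(self.parameter_files)), self.input_name)


class RequestBook:
    def __init__(self, group_coordinators: Dict[str, str],
                 production_coordinator: str) -> None:
        self.group_coordinators = group_coordinators
        self.production_coordinator = production_coordinator
        self._seen: Set[Tuple] = set()

    def submit(self, req: Request) -> List[str]:
        """Reject duplicates; otherwise return the addresses to notify."""
        key = req.work_key()
        if key in self._seen:
            raise ValueError("an identical request already exists")
        self._seen.add(key)
        return [req.requestor, self.group_coordinators[req.group],
                self.production_coordinator]


book = RequestBook({"ExampleGroup": "group-coord@example.org"},
                   "prod-coord@example.org")
first = Request("alice@example.org", "ExampleGroup", "ORCA", "ORCA_7_1_1",
                "writeAllDigis", ("b.par", "a.par"), "example_hits")
print(book.submit(first))
again = Request("bob@example.org", "ExampleGroup", "ORCA", "ORCA_7_1_1",
                "writeAllDigis", ("a.par", "b.par"), "example_hits")
try:
    book.submit(again)
except ValueError as err:
    print("rejected:", err)
```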


Production Assignments

• Assignment of (slices of) Requests to Regional Centres
• Assignments created centrally by the Production Coordinator, taking into account:
  – minimising file transfers
  – local physics interest
  – farm performance and status, as a function of time
  – local manpower availability, as a function of time
  – priority of the request
• RC = 1 farm, many farms, or Grid
  – Assignments can be re-assigned by the local production coordinator to local production sites
• Assignment Status updated quasi-online
  – Job Monitoring: log file parsed, summary sent by email
  – Estimation of local and global production rate
• AssignmentID = key for Production Instructions

[Figure: production progress plot; annotations: 2x10^33 pile-up, 4 million events, April 12th to June 6th, 2 months, 1.2 seconds per event]
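Suppose the figure shows 4 million events produced between April 12th and June 6th. The annotations are then mutually consistent. That period is 55 days, or 4,752,000 s, and dividing by 4 million events gives about 1.19 s per event. This matches the quoted 1.2 seconds per event over roughly 2 months. The sketch below reproduces that arithmetic as a simple global-rate estimate. It assumes the year is 2002; the year does not change the day count. The remaining-work figure is hypothetical.

```python
from datetime import date


def seconds_per_event(n_events: int, start: date, end: date) -> float:
    """Average wall-clock seconds per produced event over a period, all sites together."""
    return (end - start).total_seconds() / n_events


def days_to_finish(remaining_events: int, sec_per_event: float) -> float:
    """Wall-clock days left if the same global rate is sustained."""
    return remaining_events * sec_per_event / 86400


# Figure annotations; the year 2002 is an assumption (no 29 February in between).
rate = seconds_per_event(4_000_000, date(2002, 4, 12), date(2002, 6, 6))
print(f"{rate:.2f} s/event")   # 1.19 s/event
# Hypothetical: 1 million events still to produce at that rate.
print(f"{days_to_finish(1_000_000, rate):.1f} days")
```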


Tables: Request & Assignment

• Assignment: Dates (assignment, start, end), Status, NbOfEvents (assigned, produced), NbOfEventsperRun, ChainAssignment, MasterCopyLocation
• RegionalCentre: Name, NickName, HostRC, MotherRCID, Dates (start, end)
• Request: Dates (request, delivery), NbOfEvents (requested, produced), NtupleOnly, Status
• Person
• PersonType
• PersonMap
• PhysicsGroup
• Dataset, Collection, ProductionCycle (input and output)
• MonitoringDefinition


Production Instructions

• Production Instructions contain:
  – Executable Name, Software, Version
  – Parameter File URL
  – Job Splitting Instructions URL
    • Table of Placeholders versus Values
  – Monitoring Instructions URL
    • Parsing script for the email summary
    • Parsing scripts and schema for BOSS (optional)
  – URL for the Geometry File or META files, i.e. Detector Configuration (pre-created)
  – Dataset Name, Production Cycle
• NB: the workflow planner knows which output files are to be saved
• Chain Assignments: for running several executables sequentially in one job
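The sketch below shows how a workflow planner might expand one set of instructions into jobs, with a chain assignment running its executables one after another inside each job. It is a sketch only. The classes, the one-job-per-run rule and the example URLs (on the reserved `example` domain) are assumptions. Fields such as the job-splitting and geometry URLs are left out for brevity.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Step:
    """One executable and where to fetch its inputs (URLs are placeholders)."""
    software: str
    version: str
    executable: str
    parameter_file_url: str
    monitoring_url: str


@dataclass(frozen=True)
class Instructions:
    """Instructions looked up by AssignmentID; several steps = chain assignment."""
    assignment_id: int
    dataset: str
    production_cycle: str
    steps: Tuple[Step, ...]


def plan_jobs(instr: Instructions, runs: List[int]) -> List[List[str]]:
    """One job per run; inside each job the steps run in the listed order."""
    return [[f"run {run}: {s.software}/{s.version}/{s.executable}" for s in instr.steps]
            for run in runs]


instr = Instructions(
    assignment_id=42,                   # hypothetical AssignmentID
    dataset="example_channel",
    production_cycle="example_cycle",
    steps=(
        Step("SimSW", "v1", "simulate",
             "https://refdb.example/params/1", "https://refdb.example/monitor/1"),
        Step("ORCA", "ORCA_7_1_1", "writeAllDigis",
             "https://refdb.example/params/2", "https://refdb.example/monitor/2"),
    ),
)
for job in plan_jobs(instr, [1, 2]):
    print(job)
```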


Production Book-Keeping

• One Table per Dataset, one Row per Generation Run
• For each Production Cycle:
  – Run Number
  – Seeds
  – (Cross-section)
  – LFN
  – Status
  – Assignment ID
  – Number of input Events
  – Number of output Events
• Monitored values sent by email at the end of successful jobs
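The following minimal sqlite3 sketch illustrates the book-keeping layout above: one table per dataset, one row per generation run, and a group of columns per production cycle. Each column group is added the first time that cycle reports. SQLite stands in for MySQL here. The column names, SQL types and example values are assumptions.

```python
import re
import sqlite3
from typing import Dict

# One column group per production cycle, from the slide; SQL types are assumed.
CYCLE_FIELDS = {
    "run_number": "INTEGER", "seeds": "TEXT", "cross_section": "REAL",
    "lfn": "TEXT", "status": "TEXT", "assignment_id": "INTEGER",
    "n_input_events": "INTEGER", "n_output_events": "INTEGER",
}


def safe(identifier: str) -> str:
    """Table/column names cannot be bound as SQL parameters, so validate them."""
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", identifier):
        raise ValueError(f"unsafe identifier {identifier!r}")
    return identifier


def record_run(conn: sqlite3.Connection, dataset: str, cycle: str,
               generation_run: int, values: Dict[str, object]) -> None:
    """Store one production cycle's values in the row of its generation run."""
    table, cycle = safe(dataset), safe(cycle)
    unknown = set(values) - set(CYCLE_FIELDS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} "
                 "(generation_run INTEGER PRIMARY KEY)")
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    for name, sql_type in CYCLE_FIELDS.items():
        column = f"{cycle}_{name}"
        if column not in existing:          # first report for this cycle
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {sql_type}")
    conn.execute(f"INSERT OR IGNORE INTO {table} (generation_run) VALUES (?)",
                 (generation_run,))
    if values:
        assignments = ", ".join(f"{cycle}_{name} = ?" for name in values)
        conn.execute(f"UPDATE {table} SET {assignments} WHERE generation_run = ?",
                     (*values.values(), generation_run))
    conn.commit()


conn = sqlite3.connect(":memory:")
record_run(conn, "example_channel", "hits", 17,
           {"run_number": 17, "seeds": "12345 67890",
            "status": "Done", "n_output_events": 250})
record_run(conn, "example_channel", "digis2e33", 17,
           {"run_number": 5017, "n_input_events": 250,
            "n_output_events": 250, "status": "Done"})
print(conn.execute("SELECT hits_n_output_events, digis2e33_run_number "
                   "FROM example_channel WHERE generation_run = 17").fetchone())
```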


Data Catalogue

• RefDB Tables:
  – List of Catalogues
    • Objectivity/DB, POOL
    • disk or tapes
  – Catalogue/Publication Site Map
  – Catalogue/Collection Map
• Completeness checking
• Scripts for Dataset queries


Prospects

• Local installation of RefDB for “private” productions
• Extend I/O file-type checking to Software compatibility


[Figure: overview of the RefDB schema, showing the tables Software, Executable, ExecutableUse, FileType, ProductionStep, MonitoringBlock, MonitoringDefinition, MonitoringProcess, MonitoringSchema, MonitoringObject, Parameter, ParameterFile, ParameterValue, Dataset, Collection, Geometry, ProductionCycle, PUCondition, Assignment, RegionalCentre, Request, Person and PhysicsGroup]