Metadata Services, I/O, and Persistence
David Malon, [email protected], Argonne National Laboratory
20 February 2012
David Malon
Metadata and metadata services
Components in support of metadata grew organically
Use cases that motivated the current Athena metadata service architecture were related to luminosity and cross-section calculation
There IS a current metadata service architecture; it is not just a bricolage
– BUT its design and implementation were highly constrained by the (legacy) Gaudi/Athena architecture
We have an opportunity to rethink this
– And we may need to in any case
Metadata and the “objects” with which they are associated
Recall that our model is to process event collections
– In some senses, files are incidental
– The collection of events that happen to reside in this file, or in this list of files
– The collection of events pointed to by these TAGs
– The collection of events coming via this pipe
– <multi-process extensions: the collection of events coming from this source …>
– …
Metadata are most often associated with collections of events
– The lumi range from which they were selected
– Conditions for events in this run
We need to support this, and maintain/retain/propagate the associations
Current metadata “flow”
Events like opening a new input file or opening a new TAG collection are asynchronous to Gaudi/Athena global state transitions
We therefore currently use incidents to notify listeners when these things happen
Input metadata services make input metadata objects available/retrievable in a transient input metadata store
Listeners can then do what they want
– Check which new lumi blocks are being processed, for example
Listeners typically accumulate data read from input metadata
– E.g., build a list of all lumi blocks used as input
At some point, they write this to an output metadata store
Currently, output metadata are written via a shadow stream, the properties of which mirror those of an output event data stream
– Same filename, for example
But the shadow stream “writes” from the output metadata store at finalize rather than at the end of each event
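As a rough illustration of this flow, here is a minimal Python sketch of the incident/listener pattern just described. The names (`IncidentService`, `BeginInputFile`, `LumiBlockAccumulator`) and the dict-based stores are hypothetical stand-ins, not the real Gaudi/Athena interfaces:

```python
class IncidentService:
    """Dispatches named incidents to registered listeners."""
    def __init__(self):
        self._listeners = {}

    def add_listener(self, incident_name, listener):
        self._listeners.setdefault(incident_name, []).append(listener)

    def fire(self, incident_name, payload):
        # Incidents fire asynchronously to global state transitions,
        # e.g., whenever a new input file is opened.
        for listener in self._listeners.get(incident_name, []):
            listener.handle(incident_name, payload)


class LumiBlockAccumulator:
    """Listener that accumulates lumi blocks seen in input metadata,
    to be written to the output metadata store at finalize."""
    def __init__(self):
        self.lumi_blocks = set()

    def handle(self, incident_name, input_metadata_store):
        # On "BeginInputFile", read lumi-block info from the
        # transient input metadata store.
        self.lumi_blocks.update(input_metadata_store.get("lumi_blocks", []))

    def finalize(self, output_metadata_store):
        # The "shadow stream" writes accumulated metadata at
        # finalize, not at the end of each event.
        output_metadata_store["lumi_blocks"] = sorted(self.lumi_blocks)


svc = IncidentService()
acc = LumiBlockAccumulator()
svc.add_listener("BeginInputFile", acc)

# Two input files arrive, each carrying its own in-file metadata:
svc.fire("BeginInputFile", {"lumi_blocks": [101, 102]})
svc.fire("BeginInputFile", {"lumi_blocks": [102, 103]})

out_store = {}
acc.finalize(out_store)
print(out_store["lumi_blocks"])  # [101, 102, 103]
```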
(continued)
An outstream architecture question:
Should we really be using separate streams, relying upon job configuration to keep them consistent?
Should we consider an approach in which an outstream can have multiple itemlists, with different “write at” policies, coming from different stores?
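A minimal sketch of what such a multi-itemlist outstream might look like. `OutStream`, `ItemList`, and the policy names are invented for illustration and do not correspond to existing Athena classes; the point is simply that one stream owns several itemlists, each with its own source store and write policy:

```python
# Hypothetical "write at" policies for an itemlist.
EVERY_EVENT = "every_event"
AT_FINALIZE = "at_finalize"


class ItemList:
    def __init__(self, store, keys, write_at):
        self.store = store        # which transient store to read from
        self.keys = keys          # which objects to persist
        self.write_at = write_at  # when to persist them


class OutStream:
    """One output stream owning several itemlists, so event data and
    metadata stay consistent without separate-stream configuration."""
    def __init__(self, filename, itemlists):
        self.filename = filename
        self.itemlists = itemlists
        self.written = []  # stand-in for the output file contents

    def _write(self, when):
        for il in self.itemlists:
            if il.write_at == when:
                for key in il.keys:
                    self.written.append((when, key, il.store[key]))

    def end_event(self):
        self._write(EVERY_EVENT)

    def finalize(self):
        self._write(AT_FINALIZE)


event_store = {"EventInfo": "evt-1"}
metadata_store = {"LumiBlocks": [101, 102]}
stream = OutStream("output.root", [
    ItemList(event_store, ["EventInfo"], EVERY_EVENT),
    ItemList(metadata_store, ["LumiBlocks"], AT_FINALIZE),
])
stream.end_event()   # event data written per event
stream.finalize()    # metadata written once, at finalize
```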
Incidents make sense, since the arrival of new event collections or the start of new files is asynchronous to Gaudi/Athena concepts of state
BUT: can we really do this in multiprocess, multithread environments?
– Maybe, at least in single-reader architectures, but …
It would be helpful to have a clear model for incident handling and messaging and error handling in such environments
Peeking
We use peeking into input files extensively, principally (I believe) for job configuration purposes
At multiple stages:
– Before Athena starts, to set Athena job options
– After Athena starts, while components are initializing
• E.g., to determine correct conditions
Before we had in-file metadata, this often involved peeking at data in the first event
Now this is less common (more can be determined from in-file metadata than in the past), but it has not gone away entirely
And this is not entirely robust, e.g., when one is skipping events or doing direct navigation to selected events
Can we (should we) put first-event metadata in in-file metadata, so one never needs to peek at the first event?
Peeking (continued)
Isn’t it true that, in general, when a typical dataset is used as input, the job configuration should be the same for all files in the dataset?
Shouldn’t we therefore be able to figure out how to configure the jobs from dataset-level metadata alone, without peeking into the data files?
And mightn’t this be more efficient as well, if, say, at the task-to-job stage, the grid could already configure the jobs?
What do we need to do to make this possible?
Can we provide a means for jobs to access task-level or input-dataset-level external metadata?
– Right now a job can’t even discover the name of the input dataset
• Though the Event Selector knows the name of its input file
We’re starting to peek at output files, too
– Mainly for event counting
Should we be emitting metadata instead? (Pros and cons)
Should we be worried about building too many technology dependencies into our metadata peeking tools?
Metadata output
Jobs return metadata today
– But where it goes is … complicated
This began as metadata.xml files following the POOL file catalog DTD
– It began as a way to record the files that were written by the job, and their GUIDs
When additional metadata were needed, the DTD constrained us to writing per-file free metadata strings
– Often the same metadata for all jobs in the task
– Often the same metadata for all files produced by a given job
Alvin Tan worked to improve this in the transform infrastructure, but …
Tier 0 moved some of this to jobReport (pickle) files
Extensible output metadata?
Should Athena be able to emit a metadatum for return by the job?
It turns out that there is a hack in place that makes this possible
– A specific string pattern to look for when grepping log files
Shouldn’t there be a service for this?
Separately, when Ilija needed to emit performance statistics and get them to a database, he developed his own machinery (writing to a special file, not via a general service) and provided his own post-processing to get the information into AMI
Should we try to think about this problem more generally?
– Or is performance metadata a unique use case, with no others foreseen?
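The log-grepping hack might look something like the following sketch. The `##JOBMETADATA##` marker, function names, and key=value format are all made up for illustration; the actual string pattern used is not specified here:

```python
import re

# Hypothetical marker the post-processing step greps for.
METADATA_MARKER = "##JOBMETADATA##"


def emit_metadatum(key, value, log_lines):
    """In the job: write a tagged key=value line to the 'log'."""
    log_lines.append(f"{METADATA_MARKER} {key}={value}")


def harvest_metadata(log_lines):
    """Post-processing: grep the log for tagged lines and
    collect them into a dict (all values come back as strings)."""
    pattern = re.compile(re.escape(METADATA_MARKER) + r" (\S+)=(\S+)")
    found = {}
    for line in log_lines:
        m = pattern.match(line)
        if m:
            found[m.group(1)] = m.group(2)
    return found


log = ["INFO  initializing..."]
emit_metadatum("nEventsProcessed", 5000, log)
print(harvest_metadata(log))  # {'nEventsProcessed': '5000'}
```

A proper service would replace both halves: the job would hand the metadatum to a framework component, and the transform would collect it through a defined interface rather than by grepping logs.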
Metadata merging
Merging metadata is often harder than merging event data
– Merging event data may require no semantic knowledge, just a larger “array” of events
• Like chaining TTrees with the same structure
• Yeah, it’s not quite that simple, but you know what I mean
Merging metadata may require semantic knowledge
– Summing event counts is a trivial example
– Merging lumiblock ranges is a bit harder, and deciding whether lumiblocks are complete may be harder still
– And so on
We do “hybrid” merging now, but what might we do differently, knowing that metadata will often eventually be merged?
Look at ROOT’s (new) type-specific support for merging?
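The contrast can be made concrete with a small sketch. The function names are illustrative, and real lumiblock merging would also have to track run numbers and completeness:

```python
def merge_event_counts(counts):
    """Trivial semantic merge: event counts just sum."""
    return sum(counts)


def merge_lb_ranges(ranges):
    """Union inclusive (first, last) lumiblock ranges from several
    inputs, coalescing overlapping or adjacent ranges.  This already
    needs semantic knowledge that plain 'array' merging lacks."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or abuts the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Two files' worth of metadata:
print(merge_event_counts([1000, 2500]))            # 3500
print(merge_lb_ranges([(1, 10), (8, 15), (20, 25)]))  # [(1, 15), (20, 25)]
```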
Bytestream metadata
In-file metadata is different in bytestream
– Header information, plus free metadata strings
What do we need to do to make this more coherent with other in-file metadata and in-file metadata architecture?
Metadata in downstream data products
Metadata has been gradually added to products downstream of AOD
– D3PD
A bit ad hoc sometimes
– And work continues here in PAT venues and elsewhere
Can we make our metadata storage and retrieval strategy and components more coherent?
Miscellany
An asymmetry: we can read in-file metadata from TAG files and process it “correctly”
– Example: query a range of runs and lumi blocks within those runs, select events from only some of them, but retain the list of queried {run #, LB #} for cross-section calculation
– But can we write in-file metadata into TAG files from Athena?
– We can write in-TAG-file metadata from Oracle, and from specific event collection utilities, but …