Hybrid Event Store

David Adams ATLAS Hybrid Event Store David Adams BNL March 7, 2002 ATLAS Software Week Database Session

TRANSCRIPT

Page 1: Hybrid Event Store

David Adams

ATLAS

Hybrid Event Store

David Adams

BNL

March 7, 2002

ATLAS Software Week

Database Session

Page 2: Hybrid Event Store

March 7, 2002 Hybrid Event Store SW week – DB session 2

Contents

• What does hybrid mean?

• Files and their contents

– File

– Event data object (EDO)

– Object ID

– Event

– Placement category (PC)

– File, PC and EDO associations

– File interface

• Catalogs

• Reading and writing

– Input stream

– Output stream

– Store view

• HES components

• Tasks and schedule

Page 3: Hybrid Event Store

What does “hybrid” mean?

Hybrid merges

• Files that manage event data objects (EDO’s) and references to EDO’s with

• Relational DB’s used to catalog the files and EDO’s.

Files are self-describing

• The data in a file can be traversed without consulting any file catalog.

• References between objects in files can be resolved without consulting file catalogs.

Page 4: Hybrid Event Store

File

File type

• HES supports files of different types (formats).

• Each file type is responsible for providing its own means to write and read data.

File replica

• A file replica contains the same data as the original file.

• The replica may be a simple bitwise copy or

• It may be of a type different from the original.

Page 5: Hybrid Event Store

File (cont)

File ID and names

• Each file carries

– A unique ID

– A unique logical name

– A locally unique physical name

• A file replica carries the ID and logical name of its original.

• Normally any replica may be used in place of the original file.
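The naming rules on this slide can be sketched in a few lines of Python. This is only an illustration: the class `FileIdentity`, the helper names and the example paths are hypothetical and are not part of HES.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FileIdentity:
    """Hypothetical record of the three names a HES file carries."""
    file_id: int        # unique ID
    logical_name: str   # unique logical name
    physical_name: str  # unique only locally


def make_replica(original: FileIdentity, physical_name: str) -> FileIdentity:
    """A replica keeps the ID and logical name of its original;
    only the physical name differs."""
    return replace(original, physical_name=physical_name)


def interchangeable(a: FileIdentity, b: FileIdentity) -> bool:
    """Files sharing ID and logical name may normally stand in for each other."""
    return (a.file_id, a.logical_name) == (b.file_id, b.logical_name)


# Example values are made up for illustration.
original = FileIdentity(1001, "dc1.sim.0001", "/data/bnl/dc1.sim.0001.root")
copy = make_replica(original, "/scratch/dc1.sim.0001.root")
```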

Page 6: Hybrid Event Store

Event data object (EDO)

What is an EDO?

• Collection of data associated with a particular beam crossing (event ID)

• Typically a homogeneous collection, e.g. tracks, jets or electrons

EDO’s in HES

• HES doesn’t care what is in an EDO.

• HES provides an interface for file types that

– write transient EDO’s to files and

– read them back in from files.

Page 7: Hybrid Event Store

EDO (cont)

EDO ID

• Each EDO is assigned a unique ID.

• The ID specifies the:

– ID of the file that owns the EDO

– Event ID

– EDO type (and version?)

– String key

• Any type-key may appear no more than once for any event ID in any file.

• An EDO is retrieved from a file with its ID.
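A minimal sketch of the EDO ID and the type-key uniqueness rule, assuming an in-memory stand-in for a file. The names `EdoId` and `EdoFile` and the example type strings are hypothetical; the optional version field mentioned on the slide is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EdoId:
    file_id: int    # ID of the file that owns the EDO
    event_id: int
    edo_type: str   # e.g. "TrackCollection" (illustrative)
    key: str        # string key


class EdoFile:
    """Toy in-memory stand-in for a file holding EDO's by value."""

    def __init__(self, file_id):
        self.file_id = file_id
        self._data = {}       # EdoId -> EDO data
        self._slots = set()   # (event_id, edo_type, key) already used

    def put(self, edo_id, data):
        # A type-key may appear at most once per event ID in a file.
        slot = (edo_id.event_id, edo_id.edo_type, edo_id.key)
        if slot in self._slots:
            raise ValueError(f"type-key already present for event {edo_id.event_id}")
        self._slots.add(slot)
        self._data[edo_id] = data

    def get(self, edo_id):
        # An EDO is retrieved from a file with its ID.
        return self._data[edo_id]
```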

Page 8: Hybrid Event Store

EDO (cont)

EDO reference

• A file holds an EDO by reference if it holds an ID for that EDO but does not hold the data.

• The file owning an EDO must hold that EDO by value, not just by reference.

Page 9: Hybrid Event Store

EDO (cont)

EDO replica

• A file which does not own an EDO may hold a replica.

– The replica has a copy of the EDO data and may be used in place of the original.

– The replica carries the same ID as the original.

• A reference to an EDO may be satisfied by the file owning the EDO or any file carrying a replica.
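One way a reference could be satisfied is sketched below. The function `resolve`, the tuple layout of the EDO ID and the choice to try the owning file first are assumptions made for illustration; the slides only say that the owner or any replica holder will do.

```python
def resolve(edo_id, files):
    """Return (file_id, data) from an available file holding edo_id by value.

    edo_id: (owner_file_id, event_id, edo_type, key) tuple (assumed layout).
    files:  mapping file_id -> {edo_id: data} for EDO's held by value;
            a file that holds only a reference has no entry for the ID.
    Returns None if neither the original nor a replica is available.
    """
    owner = edo_id[0]
    # Assumption: prefer the owning file, then fall back to replica holders.
    candidates = [owner] + [fid for fid in files if fid != owner]
    for fid in candidates:
        held = files.get(fid, {})
        if edo_id in held:
            return fid, held[edo_id]
    return None
```

Because a replica carries the same ID as the original, the lookup key is identical in every file.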

Page 10: Hybrid Event Store

Object ID

Requirement

• EDO’s contain objects

• Objects in one EDO need to reference those in another EDO

– Pointer or reference in the transient world

Solution

• HES defines an object ID:

– ID of the EDO holding the referenced object plus

– Index indicating the location of the referenced object in its EDO
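As a rough sketch, the object ID is a pair of EDO ID and index. The tuple types and the `dereference` helper below are hypothetical names used only to show the idea.

```python
from typing import NamedTuple


class EdoId(NamedTuple):
    file_id: int
    event_id: int
    edo_type: str
    key: str


class ObjectId(NamedTuple):
    edo_id: EdoId  # EDO holding the referenced object
    index: int     # location of the referenced object in its EDO


def dereference(obj_id, loaded_edos):
    """Look up the referenced object.

    loaded_edos maps EdoId -> list of objects (the EDO in transient form).
    """
    return loaded_edos[obj_id.edo_id][obj_id.index]
```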

Page 11: Hybrid Event Store

Object ID (cont)

Size considerations

• The EDO ID carries a lot of information and is fairly large (~200 bytes).

• There may be very many object ID’s.

• EDO’s in files may store a small EDO index in place of the EDO ID.

– The index is only valid in the context of the EDO.

– Probably 8 bits to allow 256 referenced EDO’s (9 bits would allow 512).

– Converted to full EDO ID when the object is converted to transient form.
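The space saving can be illustrated with a per-EDO lookup table. This is a sketch under stated assumptions: the class `EdoIndexTable` and its method names are invented here, and the table would have to be stored once alongside the EDO.

```python
class EdoIndexTable:
    """Per-EDO table mapping a small integer index to a full EDO ID."""

    def __init__(self, bits=8):
        self.capacity = 1 << bits  # 8 bits -> 256 distinct indices
        self._ids = []             # index -> full EDO ID
        self._index = {}           # full EDO ID -> index

    def compress(self, edo_id):
        """Return the small index for edo_id, adding it on first use."""
        if edo_id not in self._index:
            if len(self._ids) >= self.capacity:
                raise OverflowError("too many referenced EDO's for the index width")
            self._index[edo_id] = len(self._ids)
            self._ids.append(edo_id)
        return self._index[edo_id]

    def expand(self, index):
        """Recover the full EDO ID when converting to transient form."""
        return self._ids[index]
```

Each stored object ID then costs one small index plus the object index, while the large EDO ID is written only once per referenced EDO.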

Page 12: Hybrid Event Store

Event

Events in HES files

• Each file holds data for a specified collection of event ID’s.

• Each EDO in a file is associated with exactly one event ID.

– Add type and key to specify an EDO.

• “Event holds an EDO” means the EDO is associated with the ID of that event.

– There are no event objects.

Page 13: Hybrid Event Store

Placement category

PC’s in events

• Each EDO in a file (by value or reference) is associated with a named placement category (PC) in an event.

– This is a hint to the file that EDO’s in the same PC are likely to be accessed together.

– Files can share data at the level of a PC.

• Each event in a file is associated with (“holds”) the same collection of PC names (types)

Page 14: Hybrid Event Store

Placement Category (cont)

PC’s in events (cont)

• Each PC holds a collection of EDO ID’s indexed by type-key.

– File may choose to organize EDO data by PC.

• The union of these type-keys (or EDO ID’s) for all PC’s in an event constitutes the view of the event for that file.

– Users may restrict this view to a subset of the PC’s.
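The view of an event can be sketched as a union over PC’s. The function `event_view` and the dictionary layout are assumptions for illustration only.

```python
def event_view(pcs, accepted=None):
    """Build the view of one event in one file.

    pcs:      mapping PC name -> {(edo_type, key): edo_id}
    accepted: optional list of PC names to restrict the view to;
              None means all PC's in the event.
    """
    names = list(pcs) if accepted is None else [n for n in accepted if n in pcs]
    view = {}
    for name in names:
        for type_key, edo_id in pcs[name].items():
            # A type-key appears at most once per event in a file.
            if type_key in view:
                raise ValueError(f"type-key {type_key} appears in more than one PC")
            view[type_key] = edo_id
    return view
```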

Page 15: Hybrid Event Store

Placement Category (cont)

PC type

• Each PC is an instance of a PC type

• The type defines

– the PC name and

– the allowed type-keys (the type-keys in the PC will be a subset of these)

• The file holds the definition of all types that appear in that file

• Each event “holds” one PC of each type

Page 16: Hybrid Event Store

Placement Category (cont)

Sharing categories

• The ATLAS DB architecture design distinguishes between “placement categories” and “sharing categories”.

• We have merged the two into PC.

– This was agreed to at an ANL meeting last October and no objections were raised there or subsequently.

– We will go back and make this separation in HES if the need arises.

Page 17: Hybrid Event Store

Placement Category (cont)

PC references

• Any PC in an event in a file may be replaced with a PC reference.

– The referenced PC has the same name and event ID as the PC reference.

– The referenced PC must be held by value

• The file holding the referenced PC must be accessed to construct the view of the event

– (To know which type-keys are included.)

• Reference may only be satisfied in the original file (no PC replicas).
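Following a PC reference might look like the sketch below. The function `resolve_pc` and the `("value", ...)` / `("ref", ...)` encoding are hypothetical; the point is that the lookup goes to exactly one file, the original, with the same event ID and PC name.

```python
def resolve_pc(files, file_id, event_id, pc_name):
    """Return the type-key -> EDO ID map for a PC, following one PC reference.

    files: mapping file_id -> {(event_id, pc_name): entry}, where entry is
           ("value", {type_key: edo_id}) or ("ref", target_file_id).
    """
    kind, payload = files[file_id][(event_id, pc_name)]
    if kind == "value":
        return payload
    # The referenced PC has the same name and event ID, in the original file.
    target = files.get(payload)
    if target is None:
        # No PC replicas: without the original file the view cannot be built.
        raise LookupError(f"file {payload} holding PC {pc_name!r} is not available")
    kind, payload = target[(event_id, pc_name)]
    if kind != "value":
        raise ValueError("referenced PC must be held by value")
    return payload
```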

Page 18: Hybrid Event Store

File, PC and EDO associations

The following figure illustrates some allowed associations between files, PC’s and EDO’s.

• The first event in the first file holds all EDO’s by value.

• The second file holds only references.

– The first PC holds an EDO by reference.

– The second PC is held by reference.

– The EDO reference in the second event may be satisfied by the original EDO in the third file or its replica in the first.

Page 19: Hybrid Event Store

File, PC and EDO associations (cont)

[Figure: Example of possible associations between HES files, placement categories (PC's) and event data objects (EDO's). Labels recoverable from the figure: File f1 with event e1 (PC p1, p2, p3; EDO 1 to EDO 6) and event e2 (PC p1, p2, p3; EDO 7', EDO 8', EDO 10, EDO 11, EDO 12); File f2 with event e1 (PC p1, p3) and event e2 (PC p1); File f3 with event e2 (PC p1, p2, p3; EDO 7, EDO 8, EDO 9).]

Page 20: Hybrid Event Store

File interface

The following figure illustrates the file structure implied by the file interface.

• Ovals on the right indicate data that can be obtained from the file on the left.

– Labels on the line indicate the key required to specify the data.

• Blue indicates data which is not specific to an event.

• Yellow indicates the collection of event ID’s.

• Remaining is data associated with an event ID.

Page 21: Hybrid Event Store

File interface

[Figure: file structure implied by the file interface. Labels recoverable from the figure: File, File type, Physical file name, Logical file name, File ID, Stream name, PC type, PC name, EDO type-key, Event ID, PC ID, PC, EDO handle, EDO ID, EDO history, EDO data, ref index, parent index.]

Page 22: Hybrid Event Store

Catalogs

File location catalog

• Also known as replica catalog

• Enables the user to locate the physical file(s) corresponding to a logical file name

• Table is crude first pass

• Expect this to be implemented in the GRID environment

• Table columns: Logical file name, Site, Directory, Physical filename
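A relational version of the four-column location catalog could be prototyped with SQLite as follows. This is only a sketch: the table name, column names, primary key and sample rows are assumptions, and the slides expect the real catalog to come from the GRID environment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE file_location (
           logical_name  TEXT NOT NULL,
           site          TEXT NOT NULL,
           directory     TEXT NOT NULL,
           physical_name TEXT NOT NULL,
           PRIMARY KEY (site, directory, physical_name)
       )"""
)
# Sample rows are invented for illustration.
conn.executemany(
    "INSERT INTO file_location VALUES (?, ?, ?, ?)",
    [
        ("dc1.sim.0001", "BNL", "/data/dc1", "dc1.sim.0001.root"),
        ("dc1.sim.0001", "CERN", "/castor/dc1", "dc1.sim.0001.copy.root"),
    ],
)


def locate(logical_name):
    """Return (site, full path) for every known replica of a logical file."""
    rows = conn.execute(
        "SELECT site, directory || '/' || physical_name FROM file_location "
        "WHERE logical_name = ? ORDER BY site",
        (logical_name,),
    )
    return rows.fetchall()
```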

Page 23: Hybrid Event Store

Catalogs (cont)

File content catalog

• Enables users to locate logical file name based on ID

• Enables users to locate logical files based on stream type, event and production

• Example columns: Logical file name, File ID, Stream type name, Min event ID, Max event ID, Job ID, Production thread ID, Production environment ID
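The two lookups the content catalog is meant to support can be sketched with SQLite. The schema below is an assumption that keeps only some of the listed columns (the production thread and environment IDs are omitted for brevity), and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE file_content (
           logical_name TEXT PRIMARY KEY,
           file_id      INTEGER UNIQUE NOT NULL,
           stream_type  TEXT NOT NULL,
           min_event_id INTEGER NOT NULL,
           max_event_id INTEGER NOT NULL,
           job_id       INTEGER
       )"""
)
conn.executemany(
    "INSERT INTO file_content VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("reco.0001", 1, "reco", 1, 1000, 17),
        ("reco.0002", 2, "reco", 1001, 2000, 17),
        ("sim.0001", 3, "sim", 1, 2000, 12),
    ],
)


def logical_name_for(file_id):
    """Locate the logical file name from a file ID."""
    row = conn.execute(
        "SELECT logical_name FROM file_content WHERE file_id = ?", (file_id,)
    ).fetchone()
    return row[0] if row else None


def files_for_event(stream_type, event_id):
    """Logical files of a stream type whose event range covers event_id."""
    rows = conn.execute(
        "SELECT logical_name FROM file_content "
        "WHERE stream_type = ? AND ? BETWEEN min_event_id AND max_event_id "
        "ORDER BY logical_name",
        (stream_type, event_id),
    )
    return [name for (name,) in rows]
```

The min/max range is only a coarse filter: a file whose range covers an event ID need not actually hold that event, so the file itself still has to be consulted.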

Page 24: Hybrid Event Store

Catalogs (cont)

Stream catalog

• Specifies which placement category types are included in which stream types.

• Example columns: Stream type name, PC name

PC catalog

• Specifies which type-keys are included in which placement category types.

• Example columns: PC name, EDO type, EDO key

Page 25: Hybrid Event Store

Catalogs (cont)

EDO catalog

• Enables users to locate the file holding a particular EDO.

• Unlikely this would be created for all data but would be used for subsets such as datasets.

• Original EDO ID relevant for regenerated data

• Example columns: EDO ID (derived?), Original EDO ID, EDO type, EDO key, Event ID, PC name, File ID

Page 26: Hybrid Event Store

Input stream

Collection of files to define events

• All files have same stream type

– Stream type = set of PC types

• Any event ID appears at most once

Placement categories

• Specify which PC’s are accepted or omitted

Next event ID

• Can be externally specified

• Stream provides means to generate

Page 27: Hybrid Event Store

Input stream (cont)

Event

• The input data for an event in a stream includes all EDO’s in accepted PC’s for the event ID

• PC’s and PC references are taken from one file

• PC and EDO references can be satisfied in a separate collection of “reference files”

– The event cannot be defined (set of type-keys discovered) if any PC’s cannot be found

– If neither an EDO nor any of its replicas is found, the event is defined but the data for that EDO is inaccessible
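The two failure modes on this slide behave differently, which the following sketch tries to make concrete. The function `define_event` and its argument layout are hypothetical; a missing PC is modelled as `None` and inaccessible EDO data as a `None` value.

```python
def define_event(pcs, accepted, edo_store):
    """Define one input event from the accepted PC's.

    pcs:       mapping PC name -> {type_key: edo_id}, or None when the PC
               (or the file its reference points to) could not be found.
    accepted:  PC names accepted by the stream.
    edo_store: mapping edo_id -> data for every EDO that can be reached,
               either the original or a replica.
    Returns {type_key: data}, with None where the data is inaccessible.
    """
    event = {}
    for name in accepted:
        entries = pcs.get(name)
        if entries is None:
            # Without the PC the set of type-keys is unknown.
            raise LookupError(f"PC {name!r} not found: event cannot be defined")
        for type_key, edo_id in entries.items():
            # The event is still defined; only this EDO's data is missing.
            event[type_key] = edo_store.get(edo_id)
    return event
```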

Page 28: Hybrid Event Store

Output stream

Type

• Each output stream is of a named type which specifies the included PC types

– Each event added to the stream will include one PC of each type

– Each PC type specifies the allowed type-keys

– User (see view) may choose whether or not to write an EDO of allowed type-key

Page 29: Hybrid Event Store

Output stream (cont)

Files

• Output stream includes a series of files to which data is added for each accepted event

• Stream has policies for

– Deciding when a file is full and opening a new file for the stream

– Providing ID’s and logical and physical names for these files
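A very simple pair of policies is sketched below: a file is "full" after a fixed number of events, and names are derived from the stream type and a running file ID. Both policies, the class `OutputStream` and the `/tmp` path are assumptions chosen for illustration; the slides do not say what the policies are.

```python
class OutputStream:
    """Toy output stream that rolls over to a new file every N events."""

    def __init__(self, stream_type, max_events_per_file, first_file_id=1):
        self.stream_type = stream_type
        self.max_events_per_file = max_events_per_file
        self._next_id = first_file_id
        self.files = []  # one dict per file opened so far

    def _open_new_file(self):
        # Naming policy (assumed): "<stream type>.<4-digit file ID>".
        file_id = self._next_id
        self._next_id += 1
        logical = f"{self.stream_type}.{file_id:04d}"
        self.files.append({
            "file_id": file_id,
            "logical_name": logical,
            "physical_name": f"/tmp/{logical}.dat",
            "event_ids": [],
        })

    def add_event(self, event_id):
        """Record an accepted event; return the logical name of its file."""
        # Fullness policy (assumed): fixed number of events per file.
        if not self.files or len(self.files[-1]["event_ids"]) >= self.max_events_per_file:
            self._open_new_file()
        self.files[-1]["event_ids"].append(event_id)
        return self.files[-1]["logical_name"]
```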

Page 30: Hybrid Event Store

Store view

Contents

• One or more input streams

• Collection of files to be used for chasing PC and EDO references

• One or more output streams

Event selection

• The view can assign the ID for the next event

– By iterating over a user-defined list or

– Asking one of its streams to make this event

Page 31: Hybrid Event Store

Store view (cont)

Reading the event

• Data extracted using the same event ID for all input streams

• The input event is defined as the union of the input events in each stream

– No type-key may be duplicated
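Reading through a view amounts to merging the per-stream events for one event ID. In this sketch the function `read_event` and the nested-dictionary representation of a stream are assumptions.

```python
def read_event(event_id, input_streams):
    """Merge the input events of all streams for one event ID.

    input_streams: list of mappings event_id -> {type_key: edo}.
    """
    merged = {}
    for number, stream in enumerate(input_streams):
        for type_key, edo in stream.get(event_id, {}).items():
            # The same type-key must not arrive from two streams.
            if type_key in merged:
                raise ValueError(
                    f"type-key {type_key} duplicated in input stream {number}"
                )
            merged[type_key] = edo
    return merged
```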

Page 32: Hybrid Event Store

Store view (cont)

Writing the event

• User specifies which streams are accepted for each event

• Event data is written for all accepted streams

• View assigns a stream to own each new EDO that is to be written

• View has policy for deciding for each stream:

– Whether each PC is written by value or reference

– Which EDO’s are written by value

– Which EDO’s are written by reference

Page 33: Hybrid Event Store

HES components

[Figure: HES components. Labels recoverable from the figure: StoreView, InputStream, OutputStream, File, PC, EDO, EDO ID, ObjectID (ref).]

Page 34: Hybrid Event Store

Tasks and schedule

Plan:

• To deliver an initial version of HES that

– is sufficient to meet the needs of DC1-2

– and serves as prototype for the LCG common hybrid event store

• Attempt a design that can evolve to meet the long-term goals of both ATLAS and the LCG

• Cooperate with the LCG

– to whatever extent possible in the short term

– fully in the long term

Page 35: Hybrid Event Store

Tasks and schedule (cont)

DC1-2 functionality

• HES core

– Base (ID’s, PC, …)

– File interface

– Simple implementation of input and output streams

– Simple implementation of view

• Athena/StoreGate integration

– See talk for EDM meeting

• ROOT storage type with HES interface

• Sufficient cataloging

Page 36: Hybrid Event Store

Tasks and schedule (cont)

First release

• Deliver June 1, 2002

– In time for users to test and discover any design flaws well in advance of DC1-2

• Effort required is 20 FTE-weeks

– HES: 4 FTE-weeks

– Athena/SG integration: 7 FTE-weeks

– ROOT: 7 FTE-weeks

– ZEBRA: 2 FTE-weeks

– Cataloging: ?

– Plus testing

– 2X contingency implied by work thus far

Page 37: Hybrid Event Store

Tasks and schedule (cont)

Completed to date:

• Design sufficient to begin the first implementation

– See the HES page at http://www.usatlas.bnl.gov/~dladams/hybrid

• HES ID’s, placement category and file interface have been implemented (see HES page)

• ROOT persistency (but not HES interface) is far along

Page 38: Hybrid Event Store

Tasks and schedule (cont)

Resources

• BNL PAS group is focusing on HES core and ROOT

– Outside help is welcome

• Need allocation of priority (and volunteers) to implement Athena/SG integration

– BNL can provide some of this

• Cataloging (RDB) needs to be better understood

– Again BNL would like to be involved but expects to share the effort