Irina Sourikova
Brookhaven National Laboratory for the
PHENIX collaboration
Migrating PHENIX databases from object to relational model
Sept 27 CHEP’04 Interlaken, CH Irina Sourikova 2
Introduction
PHENIX is one of two large experiments at RHIC and produces hundreds of TB of data per year
In four years of running, PHENIX accumulated tens of GB of calibration (condition) data, which used to be archived in an Objectivity database
For a variety of reasons (licensing and compiler issues among them), the decision was made to change the underlying storage technology and use an open source RDB instead of a proprietary OODB
Main constraints: avoid any downtime for production and provide backward compatibility by migrating old Objectivity-based calibration data to the RDB of choice
Where do we store data if not in Objy?
One option is to store metadata in an RDB and data in flat files (STAR)
Another option is to store calibrations in BLOBs (Binary Large Objects). PHOBOS keeps its calibration data in BLOBs in Oracle
Data consistency, data replication and performance considerations led us to the decision to store the calibration data itself, not only the metadata, in the database
PostgreSQL was chosen as RDBMS
What’s involved in the database transition
Design a relational schema that supports our data and queries on it.
Migrate large amounts of old Objectivity-based data to a new DB. That requires I/O from objects in memory to tables in RDB
Preserve the existing API by providing a new implementation
Calibration data
Our calibrations differ widely in shape and size but have the same structure: they are arrays (“banks”) of individual channels
Example: a lookup table for slewing corrections for a PMT in the ZDC can be a channel
A bank is a unit of information which is stored and retrieved based on validity ranges. For example, all PMTs in the ZDC form a bank
[Figure: PHENIX ZDC]
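The bank/channel structure and validity-based retrieval described on this slide can be sketched roughly as follows. This is a minimal illustration, not the PHENIX API: the `Bank` class, the `find_bank` helper, the bank name and the integer time units are all hypothetical, and the tie-breaking rule for overlapping ranges is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Bank:
    """A calibration bank: an array of channels valid for a time range."""
    name: str                        # hypothetical identifier, e.g. "zdc.slewing"
    valid_from: int                  # start of validity (inclusive), arbitrary units
    valid_until: int                 # end of validity (exclusive)
    channels: List[Sequence[float]]  # one entry per channel, e.g. one lookup table per PMT

    def covers(self, t: int) -> bool:
        return self.valid_from <= t < self.valid_until


def find_bank(banks: List[Bank], name: str, t: int) -> Optional[Bank]:
    """Return the bank with the given name whose validity range contains t."""
    matches = [b for b in banks if b.name == name and b.covers(t)]
    # If several ranges overlap, prefer the one that starts latest
    # (an assumption of this sketch, not a documented PHENIX rule).
    return max(matches, key=lambda b: b.valid_from, default=None)
```

The point of the sketch is that the whole bank, not a single channel, is the unit that gets looked up by validity time.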
Relational schema
Most direct approach: map channel data members to columns in a table
Makes data transparent, suitable for Web display
Why this doesn’t work for us:
RDBs have a limit on the number of columns; the large size of some channels makes this approach problematic
One possibility could be to use the PostgreSQL array type to store a bank, but the array implementation is not optimized for large arrays. Moreover, other RDBs do not support an array type
I/O is still a problem
BLOBs
Another approach is to store calibration banks in BLOBs (Binary Large Objects) and calibration metadata as simple types
Solves the I/O problem: ROOT I/O can be used to serialize banks into BLOBs and RDBC (ROOT DataBase Connectivity) to send BLOBs to the database
Makes rewriting of the calibration DB interface easy
Allows fast index-based calibration retrieval
The only thing we lose is “transparency” (Web display)
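To make the idea of serializing a bank into a BLOB concrete, here is a minimal round-trip sketch. It uses Python's `struct` module purely as a stand-in for ROOT I/O streamers; the byte layout and the `bank_to_blob` / `blob_to_bank` helpers are hypothetical and have nothing to do with the actual ROOT format.

```python
import struct
from typing import List


def bank_to_blob(channels: List[List[float]]) -> bytes:
    """Pack a bank (list of channels, each a list of floats) into bytes."""
    parts = [struct.pack("<I", len(channels))]        # number of channels
    for values in channels:
        parts.append(struct.pack("<I", len(values)))  # length of this channel
        parts.append(struct.pack(f"<{len(values)}d", *values))
    return b"".join(parts)


def blob_to_bank(blob: bytes) -> List[List[float]]:
    """Inverse of bank_to_blob: rebuild the list of channels."""
    (n_channels,) = struct.unpack_from("<I", blob, 0)
    offset = 4
    channels = []
    for _ in range(n_channels):
        (n_values,) = struct.unpack_from("<I", blob, offset)
        offset += 4
        values = struct.unpack_from(f"<{n_values}d", blob, offset)
        offset += 8 * n_values
        channels.append(list(values))
    return channels
```

Because channels of any size collapse into one opaque byte string, the database no longer needs a column per data member, which is also why the content stops being directly visible to SQL queries or a Web display.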
Final relational schema
Decided to proceed with BLOBs
Each Objy db mapped into a relational db table
Each object in an Objy container mapped to a row in a table
Each calibration header data member mapped to a column in a table
All tables have the same schema: Metadata columns | BLOBptr
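A table of this shape (metadata columns plus one BLOB column, with an index on the metadata) might look like the sketch below. It uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so that it runs anywhere; the table name, the column names and the "latest insertion wins" rule are assumptions, since the slides do not list the actual header data members.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE zdc_calib (      -- hypothetical table, one per former Objy db
        bank_id     INTEGER,      -- metadata: which bank
        valid_from  INTEGER,      -- metadata: start of validity
        valid_until INTEGER,      -- metadata: end of validity
        inserted_at INTEGER,      -- metadata: insertion time
        bank_blob   BLOB          -- the serialized bank
    )
""")
conn.execute(
    "CREATE INDEX zdc_calib_validity "
    "ON zdc_calib (bank_id, valid_from, valid_until)"
)


def store(bank_id, valid_from, valid_until, inserted_at, blob):
    """Insert one row: metadata as simple types, the bank as a BLOB."""
    conn.execute(
        "INSERT INTO zdc_calib VALUES (?, ?, ?, ?, ?)",
        (bank_id, valid_from, valid_until, inserted_at, blob),
    )
    conn.commit()


def fetch(bank_id, t):
    """Return the most recently inserted BLOB valid at time t, or None."""
    row = conn.execute(
        "SELECT bank_blob FROM zdc_calib "
        "WHERE bank_id = ? AND valid_from <= ? AND ? < valid_until "
        "ORDER BY inserted_at DESC LIMIT 1",
        (bank_id, t, t),
    ).fetchone()
    return row[0] if row else None
```

Only the metadata columns take part in the query, so retrieval can be driven by the index while the BLOB is simply carried along.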
Software layers
A couple of months were spent on installing and testing new software
After fixing a few bugs, adopted the following:
RDBC - talks to RDBs from ROOT
libodbc++ - C++ library for accessing RDBs, runs on top of ODBC, simplifies the code
unixODBC - free ODBC interface
psqlodbc - official PostgreSQL ODBC driver
Layer stack, top to bottom: User application, PhenixDB API, RDBC, libodbc++, unixODBC, psqlodbc, DB
New calibration API implementation
Top calibration abstract base class inherits from TObject in order to use the RDBC method SetObject(int, TObject *)
A ClassDef macro added to calibration headers to equip calibration classes with streamers
New calibration DB API was made ODBC-compliant to ease possible future technology changes
Data migration code was generated by a Perl script that takes the Objy db name and calibration class name as arguments
One new method introduced to benefit from finer commit granularity available in RDBs
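The slides do not name the new method, so the following is only an illustration of what finer commit granularity buys in a relational database. It again uses `sqlite3` as a stand-in for PostgreSQL; `store_banks`, `count_after_failure` and the `calib` table are hypothetical.

```python
import sqlite3


def store_banks(conn, rows, commit_each):
    """Insert (bank_id, blob) rows; commit per row or once at the end."""
    for bank_id, blob in rows:
        if blob is None:
            raise ValueError(f"bank {bank_id} has no payload")
        conn.execute("INSERT INTO calib VALUES (?, ?)", (bank_id, blob))
        if commit_each:
            conn.commit()  # fine-grained: this row is now durable
    conn.commit()          # coarse-grained: all rows become durable together


def count_after_failure(commit_each):
    """Store three banks, the last one faulty, and count what survived."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE calib (bank_id INTEGER, bank_blob BLOB)")
    conn.commit()
    rows = [(1, b"a"), (2, b"b"), (3, None)]  # third row triggers a failure
    try:
        store_banks(conn, rows, commit_each)
    except ValueError:
        conn.rollback()    # discard whatever was not yet committed
    return conn.execute("SELECT COUNT(*) FROM calib").fetchone()[0]
```

With a commit per bank, the two good banks survive the failure of the third; with a single commit at the end, the rollback discards all of them. Which behaviour is preferable depends on whether a set of banks must be stored all-or-nothing.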
Old data transfer
A clone of the Objy federation was made and its schema evolved to reflect a change in the inheritance hierarchy (all calibration classes got TObject as a parent)
A CVS branch was created for code development against the new replica Objy federation
About 13 GB of old data were transferred from Objy to Postgres, which took a few days
Validating new framework
Validating the new framework took a lot of time due to very active code development and Objectivity updates
Non-atomic CVS operations (tagging the code) added to the complexity of comparing reconstruction output in the old and new frameworks
After byte-by-byte comparisons, Postgres-based calibrations are now used in production
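A byte-by-byte comparison of two outputs can be sketched as below. The `first_difference` helper is hypothetical and only illustrates the kind of check involved; the slides do not describe the tooling actually used.

```python
from typing import Optional


def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        return min(len(a), len(b))  # one stream is a prefix of the other
    return None
```

Reporting the first differing offset, rather than a plain yes/no, makes it easier to trace a mismatch back to a particular object in the output.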
Database replication
Due to PostgreSQL source code availability and ease of administration, it was not very hard to install local database servers at 6 off-site institutions and make them slave databases
This was possible without synchronizing compiler versions and paying license fees
PHENIX can run reconstruction and simulations at more sites than before
Summary
Objectivity/DB has not been used in PHENIX production since July 2004
Transition from Objectivity to Postgres was relatively transparent to the Collaboration and took about 1 year of 1 FTE
Newly adopted software saved code development time, but now we must pay a maintenance price
Web display with BLOBs requires more work, but is possible
Many thanks to Laurent Aphecetche, Saskia Mioduszewski, Chris Pinkenburg and Martin Purschke for their help