Migration of ATLAS PanDA to CERN

Graeme Stewart: ATLAS Computing

Graeme Stewart, Alexei Klimentov, Birger Koblitz, Massimo Lamanna, Tadashi Maeno, Pavel Nevski, Marcin Nowak, Pedro Salgado, Torre Wenaus, Mikhail Titov
Outline

PanDA Review
  PanDA History
  PanDA Architecture
First Steps of Migration to CERN
  Infrastructure Setup
  PanDA Monitor
  Task Request Database
Second Phase Migration
  PanDA Server and Bamboo
  Database Bombshells
  Migration, Tuning and Tweaks
Conclusions
PanDA Recent History

PanDA was developed by US ATLAS in 2005
Became the executor of all ATLAS production in EGEE during 2008
March 2009: executes production for ATLAS in NDGF as well, using the ARC Control Tower (aCT)
As PanDA had become central to ATLAS operations, it was decided in late 2008 to re-locate it to CERN

35k simultaneous running jobs; 150k jobs finished per day
PanDA Server Architecture

PanDA (Production and Distributed Analysis) is a pilot job system
Executes jobs from the ATLAS production system and from users
Brokers jobs to sites based on available compute resource and data
  Can move and stage data if necessary
  Triggers data movement back to Tier-1s for dataset aggregation

[Architecture diagram: Panda Server, Bamboo, ATLAS ProdDB, Panda Databases, Panda Client, Panda Monitor, Pilot Factory and Computing Site; pilots get jobs from the server]
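The brokerage idea above can be sketched in a few lines: prefer a site with free slots that already holds the job's input data, otherwise pick any site with capacity and stage the data. All names and fields here are illustrative assumptions, not PanDA's actual API.

```python
# Illustrative sketch of PanDA-style brokerage (hypothetical names,
# not the real PanDA code): choose a site with free CPU slots that
# already holds the input dataset; fall back to any site with
# capacity, in which case the data would be staged in.

def broker_job(job, sites):
    """Return the name of the best site for `job`, or None if no capacity."""
    candidates = [s for s in sites if s["free_slots"] > 0]
    if not candidates:
        return None
    # Prefer sites that already hold the job's input dataset.
    with_data = [s for s in candidates if job["dataset"] in s["datasets"]]
    pool = with_data or candidates
    # Among eligible sites, take the one with the most free slots.
    return max(pool, key=lambda s: s["free_slots"])["name"]

sites = [
    {"name": "CERN", "free_slots": 120, "datasets": {"mc08.ttbar"}},
    {"name": "BNL",  "free_slots": 300, "datasets": set()},
]
print(broker_job({"dataset": "mc08.ttbar"}, sites))  # CERN (holds the data)
print(broker_job({"dataset": "mc08.zee"}, sites))    # BNL (most free slots)
```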
PanDA Monitor

PanDA Monitor is the web interface to the PanDA system
Provides summaries of processing per cloud/site
  Drill down to individual job logs
  And directly view logfiles
Task status
Also provides a web interface to request actions from the system
  Task requests
  Dataset subscriptions
Task Request Database

Task request interface is hosted as part of the PanDA monitor
  Allows physicists to define MC production tasks
Backend database exists separately from the rest of PanDA
  Prime candidate for migration from MySQL at BNL to Oracle at CERN

[Diagram: before — AKTR (MySQL), ProdDB (Oracle), PandaDB (MySQL); after — AKTR (Oracle), ProdDB (Oracle), PandaDB (MySQL)]
Migration – Phase 1

Target was migration of the task request database and PanDA monitor
First step was to prepare infrastructure for services:
  3 server-class machines to host PanDA monitors
    Dual quad-core Intel E5410 CPUs, 16GB RAM, 500GB HDD
Set up as much as possible as standard machines supported by CERN FIO
  Quattor templates
  Lemon monitoring
  Alarms for host problems
Utilise CERN Arbitrating DNS to balance load across all machines
  Picks the 2 'best' machines of 3 with a configurable metric
Also migrated to the ATLAS standard Python environment
  Python 2.5, 64-bit
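The "2 best of 3" arbitration amounts to ranking hosts by a load metric and publishing the two least-loaded ones in the DNS alias. A minimal sketch, with invented host names and metric values:

```python
# Sketch of the arbitrating-DNS selection: given a configurable load
# metric per host (lower = less loaded), publish the n best hosts.
# Host names and metric values below are invented for illustration.

def best_hosts(metrics, n=2):
    """Return the n host names with the lowest metric value."""
    return sorted(metrics, key=metrics.get)[:n]

load = {"monitor01": 0.7, "monitor02": 0.2, "monitor03": 0.4}
print(best_hosts(load))  # ['monitor02', 'monitor03']
```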
Parallel Monitors

PanDA was always architected to have multiple stateless monitors
  Each monitor queries the backend database to retrieve user-requested information and display it
Thus setting up a parallel monitor infrastructure at CERN was relatively easy
  Once external dependencies were sorted:
    ATLAS Distributed Data Management (DDM)
    Grid User Interface tools
This was deployed at the beginning of December 2008
Task Request Database

First real step was to migrate the TR DB from MySQL to Oracle
  This is not quite as trivial as one first imagines
  Each database supports some non-standard SQL features
    And these are not entirely compatible
  Optimising databases is quite specific to the database engine
First attempts ran into trouble
  MySQL dump from BNL to CERN resulted in connections being dropped
  Had to dump data at BNL and scp it to CERN
Schema required some cleaning up
  Dropped unused tables
  Removed null constraints, CLOB->VARCHAR, resized some text fields
However, after a couple of trial migrations we were confident that data could be migrated in just a couple of hours
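The schema clean-up is essentially a type-translation pass over the MySQL DDL. A hedged sketch of that idea, with an illustrative mapping table that is an assumption, not the actual migration script:

```python
# Illustrative sketch of the type translation done while cleaning the
# MySQL schema for Oracle (e.g. unbounded text fields become bounded
# VARCHAR2 rather than CLOB, to avoid LOB overhead). The mapping table
# is a hypothetical example, not the real migration tooling.

TYPE_MAP = {
    "TEXT": "VARCHAR2(4000)",        # CLOB -> VARCHAR-style resizing
    "MEDIUMTEXT": "VARCHAR2(4000)",
    "INT": "NUMBER(10)",
    "DATETIME": "DATE",
}

def oracle_type(mysql_type):
    """Map a MySQL column type to its Oracle equivalent (pass through
    anything not in the table)."""
    return TYPE_MAP.get(mysql_type.upper(), mysql_type)

print(oracle_type("text"))      # VARCHAR2(4000)
print(oracle_type("datetime"))  # DATE
```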
Migration

Migration occurred on Monday December 8th
  Database data was migrated in a couple of hours
  Two days were then used to iron out glitches
    In the Task Request interfaces
    In the scripts which manage the Task Request to ProdDB interface
Could this all have been prepared in advance?
  In theory yes, but we were migrating a live system
  So there is only a limited amount of test data which can be inserted into the system
    Real tasks trigger real jobs
System was live again and accepting task requests on Wednesday
  Latency of tasks in the production system is usually several days, even for short tasks
  Acceptable to the community
A Tale of Two Infrastructures

New PanDA monitor setup required DB plugins to talk both to MySQL and to Oracle
  The MySQLdb module is bog standard
  The cx_Oracle module much less so
In addition, Python 2.4 was the supported infrastructure at BNL, as opposed to Python 2.5 at CERN
This meant that after the TR migration the BNL monitors started to have more limited functionality
  This had definitely not been in the plan!
PanDA Servers

Some preliminary work on the PanDA server had already been done in 2008
However, much still remained to be done to migrate the full suite of PanDA server databases:
  PandaDB – holds live job information and status ('fast buffer')
  LogDB – holds pilot logfile extracts
  MetaDB – holds PanDA scheduler information on sites and queues
  ArchiveDB – ultimate resting place of any PanDA job (big!)
For most databases the data volume was minimal and the main work was in the schema details
  Including the setup of Oracle triggers
For the infrastructure side we copied the BNL setup, with multiple PanDA servers running on the same machines as the monitors
  We knew the load was low and the machines were capable
We also required Bamboo, the server component which interfaces between the PanDA servers and ProdDB
  Same machine template worked fine
ArchiveDB

In MySQL, because of constraints on table performance vs. size, an explicit partitioning had been adopted
  One ArchiveDB table for every two months of jobs
    Jan_Feb_2007, Mar_Apr_2007, …, Jan_Feb_2009
In Oracle internal range partitioning is supported:

  CREATE TABLE jobs_archived (<list of columns>)
  PARTITION BY RANGE (MODIFICATIONTIME) (
    PARTITION jobs_archived_jan_2006 VALUES LESS THAN (TO_DATE('01-FEB-2006','DD-MON-YYYY')),
    PARTITION jobs_archived_feb_2006 VALUES LESS THAN (TO_DATE('01-MAR-2006','DD-MON-YYYY')),
    PARTITION jobs_archived_mar_2006 VALUES LESS THAN (TO_DATE('01-APR-2006','DD-MON-YYYY')),
    …

This allows for considerable simplification of the client code in the panda monitor
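To see the simplification: under the old MySQL layout, client code had to compute which two-month table held a given job before it could query anything, along the lines of the sketch below (an illustration of the naming scheme shown above, not the actual monitor code). With Oracle range partitioning, a single query against jobs_archived replaces all of this.

```python
# Sketch of the table-name arithmetic the explicit two-month MySQL
# tables (Jan_Feb_2007, Mar_Apr_2007, ...) forced on client code.
# Oracle's internal range partitioning makes this lookup unnecessary:
# one jobs_archived table, and the engine prunes partitions itself.
import datetime

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def archive_table(when):
    """Name of the two-month archive table covering date `when`."""
    first = (when.month - 1) // 2 * 2   # index of the pair's first month
    return "%s_%s_%d" % (MONTHS[first], MONTHS[first + 1], when.year)

print(archive_table(datetime.date(2007, 4, 12)))  # Mar_Apr_2007
print(archive_table(datetime.date(2009, 1, 31)))  # Jan_Feb_2009
```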
Integrate, Integrate, …

By late February trial migrations of the databases had happened to integration databases hosted at CERN (the INTR database)
  Trial jobs had been run through the PanDA server, proving basic functionality
A decision now had to be made on the final migration strategy
  This could be 'big bang' (move the whole system at once) or 'inflation' (gradually migrate clouds one by one)
  Big bang would be easier for, e.g., the PanDA monitor
  But would carry greater risks – suddenly loading the system with 35k running jobs was unwise
    If things went very wrong it might leave us with a big mess to recover from
An external constraint was the ATLAS cosmics reprocessing campaign due to start 9th March
We decided to migrate piecemeal
Final Preparations

In fact PanDA already had two heads
  IT and CERN clouds had been run from a parallel MySQL setup since early 2008
  This was an expensive infrastructure to maintain, as it did not tap into CERN IT supported services
It was obvious that migrating these two clouds would be a natural first step
Plans were made to migrate to the ATLAS production database at CERN (aka ATLR)
Things seemed to be under control a few days before…
DBAs

The Friday before we were due to migrate, the CERN DBAs asked us not to do so
  They were worried that not enough testing of the Oracle setup in INTR had been done
This triggered a somewhat frantic weekend of work, resulting in several thousand jobs being run through the CERN and IT clouds using the INTR databases
  From our side this testing looked to be successful
However, we reached a subsequent compromise:
  We would migrate the CERN and IT clouds to PanDA running against INTR
  They would start backups on the INTR database, giving us the confidence to run production for ATLAS through this setup
  Subsequent migration from INTR to ATLR could be achieved much more rapidly, as the data was already in the correct Oracle formats
Tuning and Tweaking

Migration of PandaDB, LogDB and MetaDB was very quick
  There was one unexpected piece of client code which hung during the migration process (polling of CERN MySQL servers)
Migration and index building of ArchiveDB was far slower
  However, we disabled access to ArchiveDB and could bring the system up live within half a day
Since then a number of small improvements have been made in the PanDA code to help optimise use of Oracle
  Connections are much more expensive in Oracle than in MySQL
    Restructured code to use a connection pool
    Created common reader and writer accounts so all database schemas are accessed from the one connection
  Migration away from triggers to .nextval() syntax
Despite fears, migration of the PanDA server to Oracle has been relatively painless and was achieved without significant loss of capacity
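Because an Oracle connection is expensive to open, the pooling change above pays off quickly: connections are created once and handed out on demand. A minimal, generic sketch of that pattern, using a stub connection factory so it is self-contained; the real code would hold cx_Oracle connections (e.g. via cx_Oracle's own session pooling) instead.

```python
# Minimal sketch of a connection pool: connections are opened once at
# startup and reused, instead of being opened per request. A stub
# factory stands in for a real cx_Oracle connect call so the sketch
# runs anywhere.
import queue

class ConnectionPool:
    def __init__(self, factory, size=4):
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(factory())   # open all connections up front

    def acquire(self):
        return self._pool.get()         # blocks if every connection is in use

    def release(self, conn):
        self._pool.put(conn)            # return the connection for reuse

# Stub standing in for an expensive cx_Oracle connect; counts opens.
counter = {"opened": 0}
def connect():
    counter["opened"] += 1
    return object()

pool = ConnectionPool(connect, size=2)
c = pool.acquire()
pool.release(c)
c = pool.acquire()          # reused: no new connection is opened
print(counter["opened"])    # 2 — only the initial connections exist
```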
Cloud Migration

Initial migration was for the CERN and IT clouds
We added NG, the new NorduGrid cloud, which came up from a standing start
We added DE after a major intervention in which the cloud was taken offline
Similarly TW will come up in the CERN Oracle instance
UK was the interesting case where we migrated a cloud live:
  Switched the Bamboo instance to send jobs to the CERN Oracle servers
    Current jobs are left being handled by the old Bamboo and servers
  Started sending pilots to the UK asking for jobs from the CERN Oracle servers
  Forced the failure of jobs not yet started in the old instance
    These return to ProdDB and are then picked up again by PanDA using the new Bamboo
  Old running jobs are handled correctly by the 'old' system
    There will be a subsequent re-merge into the CERN ArchiveDB
Monitor Blues

A number of problems did arise in the new monitor setup required for the migrated clouds
  Coincident with the migration there was a repository change from CVS to SVN
  However, the MySQL monitor was deployed from CVS and the Oracle monitor from SVN
  This led to a number of accidents and minor confusions which took a while to recover from
New security features caused some loss of functionality at times, as it was hard to check all the use cases
  And the repository problems compounded this
However, these issues are now mostly resolved and ultimately the system will in fact become simpler
Conclusions

Migration of the PanDA infrastructure from BNL to CERN has underlined how difficult the transition of a large-scale, live, distributed computing system is
A very pragmatic approach was adopted in order to get the migration done in a reasonable time
  Although it always takes longer than you think
    (This is true even when you try to factor in knowledge of the above)
Much has been achieved
  Monitor and task request database fully migrated
  CERN PanDA server infrastructure moved to Oracle
  Now running 5 (6) of the 11 ATLAS clouds: CERN, DE, IT, NG, UK, (TW)
Remaining migration steps are now a matter of scaling and simplifying
We learned a lot
  Love your DBAs, of course
  If we have to do this again, now we know how
But there is still considerable work to do
  Mainly in improving service stability, monitoring and support procedures