
Page 1: Migration of ATLAS PanDA to CERN

Graeme Stewart, Alexei Klimentov, Birger Koblitz, Massimo Lamanna, Tadashi Maeno, Pavel Nevski, Marcin Nowak, Pedro Salgado, Torre Wenaus, Mikhail Titov

Page 2: Outline

PanDA Review

PanDA History

PanDA Architecture

First steps of Migration to CERN

Infrastructure Setup

PanDA Monitor

Task Request Database

Second Phase Migration

PanDA Server and Bamboo

Database bombshells

Migration, Tuning and Tweaks

Conclusions


Page 3: PanDA Recent History

PanDA was developed by US ATLAS in 2005

Became the executor of all ATLAS production in EGEE during 2008

March 2009: executes production for ATLAS in NDGF as well, using the ARC Control Tower (aCT)

As PanDA had become central to ATLAS operations, it was decided in late 2008 to re-locate it to CERN

35k simultaneous running jobs; 150k jobs finished per day

Page 4: PanDA Server Architecture

PanDA (Production and Distributed Analysis) is a pilot job system

Executes jobs from the ATLAS production system and from users

Brokers jobs to sites based on available compute resources and data

Can move and stage data if necessary

Triggers data movement back to Tier-1s for dataset aggregation

[Architecture diagram: Panda Client, Panda Monitor, Panda Server, Panda Databases, Bamboo, ATLAS ProdDB, Pilot Factory, Computing Site; pilots get jobs from the server]

Page 5: PanDA Monitor

PanDA Monitor is the web interface to the PanDA system

Provides summaries of processing per cloud/site

Drill down to individual job logs, and directly view logfiles

Task status

Also provides a web interface to request actions from the system: task requests, dataset subscriptions

Page 6: Task Request Database

The task request interface is hosted as part of the panda monitor, and allows physicists to define MC production tasks

The backend database exists separately from the rest of panda

Prime candidate for migration from MySQL at BNL to Oracle at CERN

[Migration diagram – before: AKTR (MySQL), ProdDB (Oracle), PandaDB (MySQL); after: AKTR (Oracle), ProdDB (Oracle), PandaDB (MySQL)]

Page 7: Migration – Phase 1

Target was migration of the task request database and panda monitor

First step was to prepare infrastructure for the services: 3 server-class machines to host the panda monitors

Dual-CPU, quad-core Intel E5410 CPUs, 16GB RAM, 500GB HDD

Set up as much as possible as standard machines supported by CERN FIO: Quattor templates, Lemon monitoring, alarms for host problems

Utilise CERN Arbitrating DNS to balance load across all machines: picks the 2 'best' machines of 3 with a configurable metric

Also migrated to the ATLAS standard python environment: Python 2.5, 64 bit
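The best-2-of-3 selection behind the arbitrated DNS alias can be sketched as follows; the host names and the load metric here are purely illustrative, not the actual CERN DNS arbitration code:

```python
# Sketch of best-2-of-3 host selection by a configurable metric.
# Host names and load values are hypothetical.
def pick_best(hosts, metric, n=2):
    """Return the n hosts with the lowest metric value."""
    return sorted(hosts, key=metric)[:n]

# Three monitor machines with illustrative load figures
loads = {"monitor1": 0.7, "monitor2": 0.2, "monitor3": 1.5}
best = pick_best(loads, loads.get)
# The DNS alias would then resolve to these two machines
```

Any metric (load, health check results, response time) can be plugged in without changing the selection logic.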

Page 8: Parallel Monitors

PanDA was always architected to have multiple stateless monitors

Each monitor queries the backend database to retrieve user-requested information and display it

Thus setting up a parallel monitor infrastructure at CERN was relatively easy, once the external dependencies were sorted: ATLAS Distributed Data Management (DDM), Grid User Interface tools

This was deployed at the beginning of December 2008


Page 9: Task Request Database

First real step was to migrate the TR DB from MySQL to Oracle

This is not quite as trivial as one first imagines: each database supports some non-standard SQL features, and these are not entirely compatible

Optimising databases is quite specific to the database engine

First attempts ran into trouble: a MySQL dump from BNL to CERN resulted in connections being dropped, so we had to dump the data at BNL and scp it to CERN

The schema required some cleaning up: dropped unused tables; removed null constraints; CLOB -> VARCHAR; resized some text fields

However, after a couple of trial migrations we were confident that the data could be migrated in just a couple of hours

Page 10: Migration

Migration occurred on Monday December 8th

Database data was migrated in a couple of hours

Two days were then used to iron out glitches in the Task Request interfaces and in the scripts which manage the Task Request to ProdDB interface

Could this all have been prepared in advance? In theory yes, but we were migrating a live system, so there is only a limited amount of test data which can be inserted into the system: real tasks trigger real jobs

The system was live again and accepting task requests on Wednesday

Latency of tasks in the production system is usually several days, even for short tasks, so this was acceptable to the community

Page 11: A Tale of Two Infrastructures

The new panda monitor setup required DB plugins to talk both to MySQL and to Oracle

The MySQLdb module is bog standard; the cx_Oracle module much less so

In addition, Python 2.4 was the supported infrastructure at BNL, as opposed to Python 2.5 at CERN

This meant that after the TR migration the BNL monitors started to have more limited functionality

This had definitely not been in the plan!

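One concrete incompatibility between the two plugins is the DB-API paramstyle: MySQLdb takes format-style %s placeholders, while cx_Oracle expects bind variables such as :1, :2. A minimal sketch of the kind of glue a dual-backend monitor needs (hypothetical helper, not the actual panda plugin code):

```python
# Hypothetical helper converting MySQLdb-style '%s' placeholders into
# Oracle-style numbered binds (':1', ':2', ...).
def to_oracle_binds(query):
    parts = query.split("%s")
    out = parts[0]
    for i, part in enumerate(parts[1:], start=1):
        out += ":%d%s" % (i, part)
    return out

sql = "SELECT jobid FROM jobs WHERE cloud = %s AND status = %s"
converted = to_oracle_binds(sql)
# -> "SELECT jobid FROM jobs WHERE cloud = :1 AND status = :2"
```

A real shim would also have to handle literal percent signs and the differing parameter-passing conventions of the two drivers.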

Page 12: PanDA Servers

Some preliminary work on the panda server had already been done in 2008

However, much still remained to be done to migrate the full suite of panda server databases:

PandaDB – holds live job information and status ('fast buffer')

LogDB – holds pilot logfile extracts

MetaDB – holds panda scheduler information on sites and queues

ArchiveDB – ultimate resting place of any panda job (big!)

For most databases the data volume was minimal and the main work was in the schema details, including the setup of Oracle triggers

For the infrastructure side we copied the BNL setup, with multiple panda servers running on the same machines as the monitors; we knew the load was low and the machines were capable

We also required one server component, bamboo, which interfaces between the panda servers and ProdDB; the same machine template worked fine

Page 13: ArchiveDB

In MySQL, because of constraints on table performance vs. size, an explicit partitioning had been adopted: one ArchiveDB table for every two months of jobs (Jan_Feb_2007, Mar_Apr_2007, …, Jan_Feb_2009)

In Oracle, internal partitioning is supported:

CREATE TABLE jobs_archived (<list of columns>)
PARTITION BY RANGE (MODIFICATIONTIME) (
  PARTITION jobs_archived_jan_2006 VALUES LESS THAN (TO_DATE('01-FEB-2006','DD-MON-YYYY')),
  PARTITION jobs_archived_feb_2006 VALUES LESS THAN (TO_DATE('01-MAR-2006','DD-MON-YYYY')),
  PARTITION jobs_archived_mar_2006 VALUES LESS THAN (TO_DATE('01-APR-2006','DD-MON-YYYY')),
  …

This allows for considerable simplification of the client code in the panda monitor
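For instance, where the MySQL client code had to compute a per-period table name before it could query, the Oracle client issues one statement against the single partitioned table and lets the engine prune partitions. A sketch, with a hypothetical helper name and simplified query strings:

```python
from datetime import date

# Old MySQL scheme: compute the two-month table name for each date
# (hypothetical helper, for illustration only)
def archive_table(d):
    first = d.month if d.month % 2 == 1 else d.month - 1  # odd month opening the pair
    months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
    return "%s_%s_%d" % (months[first - 1], months[first], d.year)

# New Oracle scheme: one query; partition pruning happens inside Oracle
query = ("SELECT * FROM jobs_archived "
         "WHERE MODIFICATIONTIME >= :t1 AND MODIFICATIONTIME < :t2")
```

Queries spanning several periods no longer need to be stitched together across tables in the client.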

Page 14: Integrate, Integrate, …

By late February, trial migrations of the databases had been made to integration databases hosted at CERN (the INTR database)

Trial jobs had been run through the panda server, proving basic functionality

A decision now had to be made on the final migration strategy

This could be 'big bang' (move the whole system at once) or 'inflation' (gradually migrate clouds one by one)

Big bang would be easier for, e.g., the panda monitor, but would carry greater risks – suddenly loading the system with 35k running jobs was unwise, and if things went very wrong it might leave us with a big mess to recover from

An external constraint was the ATLAS cosmics re-reprocessing campaign due to start on 9th March

We decided to migrate piecemeal

Page 15: Final Preparations

In fact PanDA already had two heads: the IT and CERN clouds had been run from a parallel MySQL setup since early 2008

This was an expensive infrastructure to maintain, as it did not tap into CERN IT supported services

It was obvious that migrating these two clouds would be a natural first step

Plans were made to migrate to the ATLAS production database at CERN (aka ATLR)

Things seemed to be under control a few days before…

Page 16: DBAs

The Friday before we were due to migrate, the CERN DBAs asked us not to do so

They were worried that not enough testing of the Oracle setup in INTR had been done

This triggered a somewhat frantic weekend of work, resulting in several thousand jobs being run through the CERN and IT clouds using the INTR databases

From our side this testing looked to be successful

However, we reached a subsequent compromise:

We would migrate the CERN and IT clouds to panda running against INTR

They would start backups on the INTR database, giving us the confidence to run production for ATLAS through this setup

Subsequent migration from INTR to ATLR could then be achieved much more rapidly, as the data would already be in the correct Oracle formats

Page 17: Tuning and Tweaking

Migration of PandaDB, LogDB and MetaDB was very quick

There was one unexpected piece of client code which hung during the migration process (polling of CERN MySQL servers)

Migration and index building of ArchiveDB was far slower; however, we disabled access to ArchiveDB and could bring the system up live within half a day

Since then a number of small improvements in the panda code have been made to help optimise use of Oracle:

Connections are much more expensive in Oracle than in MySQL, so the code was restructured to use a connection pool

Common reader and writer accounts were created for access to all database schemas from the one connection

Migration away from triggers to .nextval() syntax

Despite fears, migration of the panda server to Oracle has been relatively painless, and was achieved without significant loss of capacity
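The pooling pattern is straightforward to sketch with standard-library pieces; here sqlite3 in-memory connections stand in for cx_Oracle sessions (the real panda code and cx_Oracle's own session pooling differ in detail):

```python
import queue
import sqlite3

class ConnectionPool:
    """Minimal fixed-size pool: open the expensive connections once,
    then hand them out and take them back, instead of connecting per request."""
    def __init__(self, factory, size=4):
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(factory())

    def acquire(self):
        return self._pool.get()   # blocks if every connection is in use

    def release(self, conn):
        self._pool.put(conn)

# sqlite3 stands in for an Oracle session factory here
pool = ConnectionPool(lambda: sqlite3.connect(":memory:"), size=2)
conn = pool.acquire()
result = conn.execute("SELECT 1").fetchone()[0]
pool.release(conn)
```

The pool caps the number of open sessions, which matters precisely because Oracle connections are expensive.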

Page 18: Cloud Migration

Initial migration was for the CERN and IT clouds

We added NG, the new NorduGrid cloud, which started from a standing start

We added DE after a major intervention in which the cloud was taken offline

Similarly, TW will come up in the CERN Oracle instance

UK was the interesting case, where we migrated a cloud live:

Switch the bamboo instance to send jobs to the CERN Oracle servers; current jobs are left being handled by the old bamboo and servers

Start sending pilots to UK asking for jobs from the CERN Oracle servers

Force the failure of jobs not yet started in the old instance; these return to ProdDB and are then picked up again by panda using the new bamboo

Old running jobs are handled correctly by the 'old' system; there will be a subsequent re-merge into the CERN ArchiveDB
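The live cut-over can be summarised as an ordered sequence; the function names below are illustrative stubs, not real panda or bamboo entry points:

```python
# Illustrative stubs recording the order of the UK live cut-over steps;
# old running jobs drain on the old system while these proceed.
steps = []

def switch_bamboo_to_oracle():
    steps.append("switch bamboo to CERN Oracle")

def send_pilots_against_oracle():
    steps.append("pilots ask CERN Oracle for jobs")

def fail_unstarted_old_jobs():
    steps.append("fail unstarted jobs in old instance")

switch_bamboo_to_oracle()
send_pilots_against_oracle()
fail_unstarted_old_jobs()
```

The ordering matters: new jobs must already be flowing through the Oracle side before the unstarted jobs in the old instance are failed back to ProdDB.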

Page 19: Monitor Blues

A number of problems did arise in the new monitor setup required for the migrated clouds

Coincident with the migration there was a repository change from CVS to SVN; however, the MySQL monitor was deployed from CVS and the Oracle monitor from SVN

This led to a number of accidents and minor confusions which took a while to recover from

New security features caused some loss of functionality at times, as it was hard to check all the use cases, and the repository problems compounded this

However, these issues are now mostly resolved, and ultimately the system will in fact become simpler

Page 20: Conclusions

Migration of the panda infrastructure from BNL to CERN has underlined how difficult the transition of a large-scale, live, distributed computing system is

A very pragmatic approach was adopted in order to get the migration done in a reasonable time

Although it always takes longer than you think (this is true even when you try to factor in knowledge of the above)

Much has been achieved: monitor and task request database fully migrated; CERN panda server infrastructure moved to Oracle

Now running 5 (6) of the 11 ATLAS clouds: CERN, DE, IT, NG, UK, (TW)

The remaining migration steps are now a matter of scaling and simplifying

We learned a lot: love your DBAs, of course; and if we have to do this again, now we know how

But there is still considerable work to do, mainly in improving service stability, monitoring and support procedures