Issues in managing HEP Software Development in a distributed environment

Page 1: Issues in managing HEP Software Development in a distributed environment

2/8/00 E. Buckley-Geer, CHEP 2000 1

Issues in managing HEP Software Development in a distributed environment

Elizabeth Buckley-Geer

Fermilab

CHEP 2000, Padova, Italy

Page 2: Issues in managing HEP Software Development in a distributed environment

Contents

Characterizing the problem

Key issues and solutions from CDF/D0 Collider Run II

Some thoughts on the development process

Conclusions

Page 3: Issues in managing HEP Software Development in a distributed environment

Characterizing the problem

Developer community of about 150 people (both collaborations) from North and South America, Europe, Asia, India, Russia

Widely varying quality of network connections between FNAL and remote locations

Widely varying abilities of groups to afford to purchase commercial tools

Page 4: Issues in managing HEP Software Development in a distributed environment

Characterizing the problem

One common denominator since mid-1997: everyone can buy a cheap PC and run Linux on it

No more $10-20K workstations. Every member of the group can have a PC

They don’t want to rely on connecting to a central machine at FNAL to do code development

They want to make use of these PCs at their own location to do their code development

First release of CDF code for Linux was January 1998 – several years after the basic development environment was designed

Page 5: Issues in managing HEP Software Development in a distributed environment

The situation during Run I (CDF - but similar for D0)

Highly centralized code development.

Could only realistically develop code on central machine at FNAL (VMS cluster) – no distributed development was supported even on other VMS systems

Code was ported to run on IRIX and AIX but only frozen releases were available on these platforms

Frozen releases were distributed to remote sites as tar files or VMS save sets

Development version of the code was available to desktop VMS nodes at FNAL from 1993 onwards but code could not be committed to repository from these machines

Page 6: Issues in managing HEP Software Development in a distributed environment

Run I development tools

Code was mostly Fortran with some small amounts of C. About 50 packages.

Used proprietary VMS tools for version control and package building (CMS and MMS)

Used vendor compilers and debuggers. Only UNIX vendors who supported VMS extensions were considered. Luckily the list was sufficiently long!

No serious use of design tools – some early attempts at D0 but didn’t survive

No tools to locate memory leaks due to the nature of the memory management packages in use – YBOS and ZEBRA

Page 7: Issues in managing HEP Software Development in a distributed environment

Goals for Run II development environment – early 1996

Obviously needed to migrate from VMS as a primary platform

Provide ability to do remote development – recognized as important even before the Linux revolution

Reduce the need for proprietary tools for base system

Handle move from Fortran to C++

Identify useful software engineering tools

Page 8: Issues in managing HEP Software Development in a distributed environment

Configuration Management Joint Project

Formed joint D0, CDF, FNAL Computing Division working group to study configuration management in early 1996 (see E248 for more on Run II joint projects)

Charge was to find and implement a common solution for CDF and D0 for software management

Version control

Package and release organization

Building packages

Distribution

Validation

Page 9: Issues in managing HEP Software Development in a distributed environment

Configuration Management Joint Project

Group looked at existing tools in use in HEP and elsewhere

Chose

CVS for version control with customizations from Sloan Digital Sky Survey (SDSS)

SoftRelTools from BaBar for package organization and building

UPS/UPD from FNAL for product setup and distribution tools

Page 10: Issues in managing HEP Software Development in a distributed environment

CVS

Run in client/server mode – adopted from SDSS

Repository on server + cvsuser pseudo account running a restricted shell CVSH that only allows cvs commands to be executed

Local and remote access are identical so users do not need to be on a FNAL computer to access repository – necessary condition for remote development
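
The idea of a restricted shell that accepts only cvs commands can be illustrated with a small filter. This is a minimal sketch, not the actual CVSH: the whitelist, the set of rejected characters and the function name `is_allowed` are all assumptions made for illustration.

```python
import shlex

# Hypothetical whitelist: the only program the pseudo account may run.
ALLOWED_PROGRAMS = {"cvs"}

# Characters that would let a caller chain, redirect or substitute commands.
FORBIDDEN_CHARS = set(";|&<>`$\n")


def is_allowed(command_line: str) -> bool:
    """Return True only for a single, plain invocation of an allowed program."""
    if any(ch in FORBIDDEN_CHARS for ch in command_line):
        return False
    try:
        words = shlex.split(command_line)
    except ValueError:  # for example, unbalanced quotes
        return False
    return bool(words) and words[0] in ALLOWED_PROGRAMS


if __name__ == "__main__":
    for line in ("cvs server", "cvs update -d", "ls /", "cvs update; rm -rf ~"):
        verdict = "accept" if is_allowed(line) else "reject"
        print(f"{line!r}: {verdict}")
```

Because the check is the same for every connection, local and remote users go through an identical path, which is the property the slide highlights.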

Page 11: Issues in managing HEP Software Development in a distributed environment

SoftRelTools (SRT)

Adapted from BaBar experiment

Uses cpp to create dependencies and gmake to build libraries & binaries

BaBar and FNAL agreed to diverge on development

It was becoming difficult to add new features given the original structure of the package

Have since done a re-write (Spring 1999) of the package at FNAL to make it more maintainable
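
As a rough illustration of what "create dependencies" means, the sketch below turns the quoted `#include` lines of one source file into a make-style rule. It is a simplified stand-in for what a preprocessor-based dependency pass produces, not SoftRelTools code: it does not follow nested includes, and the file names are hypothetical.

```python
import re

# Matches lines such as:  #include "Track.hh"   (system headers in <...> are skipped)
INCLUDE_RE = re.compile(r'^\s*#\s*include\s+"([^"]+)"', re.MULTILINE)


def make_rule(source_name: str, source_text: str) -> str:
    """Build a make-style dependency rule for one source file."""
    headers = INCLUDE_RE.findall(source_text)
    object_name = source_name.rsplit(".", 1)[0] + ".o"
    return f"{object_name}: {' '.join([source_name] + headers)}"


if __name__ == "__main__":
    example = (
        "#include <vector>\n"
        '#include "Track.hh"\n'
        '  #  include "Hit.hh"\n'
        "int main() { return 0; }\n"
    )
    print(make_rule("TrackFit.cc", example))
```

With rules like this in place, gmake rebuilds an object file only when the source or one of the listed headers has changed.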

Page 12: Issues in managing HEP Software Development in a distributed environment

UPS – Unix Product Setup

FNAL product in use since 1991

Supports existence of multiple versions of a product. Choice is made using a ‘setup’ command.

Re-write for Run II

Completed in summer 1998

In use by both CDF and D0
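
The concept of several installed versions with one selected at setup time can be sketched as a toy registry. This is not the UPS implementation or its database format: the class, its method names, the product name and the environment variable names are all hypothetical.

```python
class ProductRegistry:
    """Toy registry of installed product versions (not the real UPS database)."""

    def __init__(self):
        self._installs = {}  # product name -> {version: install directory}
        self._default = {}   # product name -> version used when none is requested

    def declare(self, product, version, install_dir, default=False):
        """Record an installed version; the first one declared becomes the default."""
        self._installs.setdefault(product, {})[version] = install_dir
        if default or product not in self._default:
            self._default[product] = version

    def setup(self, product, version=None):
        """Return the environment settings that select one version of a product."""
        versions = self._installs[product]
        chosen = version if version is not None else self._default[product]
        if chosen not in versions:
            raise KeyError(f"{product} {chosen} is not installed")
        prefix = product.upper()
        return {f"{prefix}_DIR": versions[chosen], f"{prefix}_VERSION": chosen}


if __name__ == "__main__":
    registry = ProductRegistry()
    registry.declare("exampleprod", "v1_0", "/products/exampleprod/v1_0")
    registry.declare("exampleprod", "v2_0", "/products/exampleprod/v2_0", default=True)
    print(registry.setup("exampleprod"))          # default version
    print(registry.setup("exampleprod", "v1_0"))  # explicit older version
```

Keeping the selection in environment settings rather than in fixed paths is what lets two users on the same machine work against different versions at the same time.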

Page 13: Issues in managing HEP Software Development in a distributed environment

Use of these tools at CDF

~65 code developers

1.3 million lines of code

71% C++, 20% Fortran, 8% C, 0.6% Java + external packages

144 packages

Development release built every night on IRIX, TRU64, SUN, Linux

Daily build logs scanned for errors and reported to developers. Build logs are posted on web

Development builds lead to timely detection and fixing of bugs

Create frozen releases about every 2 months. Also create releases to capture code used for certain milestones.
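
The daily scan of build logs mentioned on this slide could look roughly like the sketch below. It is an assumption-laden illustration, not the CDF scripts: the package marker line, the error patterns and the package names are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical marker that a build script prints before each package.
PACKAGE_RE = re.compile(r"^=== Building package (\S+) ===$")

# A few patterns that commonly indicate a failed compile or link step.
ERROR_RE = re.compile(
    r"\b(error|undefined reference|No rule to make target)\b", re.IGNORECASE
)


def scan_build_log(lines):
    """Group suspicious log lines by the package being built when they appeared."""
    errors = defaultdict(list)
    package = "(unknown)"
    for raw in lines:
        line = raw.rstrip("\n")
        marker = PACKAGE_RE.match(line)
        if marker:
            package = marker.group(1)
        elif ERROR_RE.search(line):
            errors[package].append(line)
    return dict(errors)


if __name__ == "__main__":
    log = [
        "=== Building package TrackingMods ===",
        "TrackFit.cc:12: error: 'x' was not declared",
        "=== Building package CalorObjects ===",
        "compiled ok",
    ]
    for pkg, problems in scan_build_log(log).items():
        print(pkg, "->", len(problems), "problem line(s)")
```

Grouping by package makes it straightforward to send each report to the developers responsible for that package.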

Page 14: Issues in managing HEP Software Development in a distributed environment

Use of these tools at CDF

Success of development rebuild varies. Somewhat correlated with number of files changed

[Figure: % of successful development builds by month, Jul-98 to Sep-99 (vertical axis 0 to 90)]

Page 15: Issues in managing HEP Software Development in a distributed environment

Use of these tools at D0

~60 code developers have write access to repository

Essentially 100% C++ except for external packages

280 packages – but big variation in size

Test release of entire package weekly on IRIX and Linux. Goal is to have operational reconstruction exe at the end of every release. Currently 80% success rate.

Production releases occur at intervals determined by the management. Used to capture important milestones and provide stable working versions.

5 production releases to date

Page 16: Issues in managing HEP Software Development in a distributed environment

Code Distribution

CDF has a set of custom scripts to distribute code to remote sites.

Both frozen releases and development are distributed

Fairly straightforward to get distribution.

Currently fairly manpower intensive for development release on remote nodes – ½ FTE devoted for fixing problems

Working on switching to UPD for ease of maintenance

No significant automatic code distribution happening in D0 yet

Page 17: Issues in managing HEP Software Development in a distributed environment

Code Distribution

Majority of distribution is to Linux machines

                Linux  IRIX  TRU64  Solaris
Development        44     7      3        2
Frozen release    115    13      6        2
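
The claim that most distribution goes to Linux can be checked with simple arithmetic on the counts from this slide, read as Development 44/7/3/2 and Frozen release 115/13/6/2 across Linux, IRIX, TRU64 and Solaris. The slide does not say what unit is counted, so the sketch only computes shares; the variable and function names are illustrative.

```python
# Counts as given on the slide, per platform.
counts = {
    "Development": {"Linux": 44, "IRIX": 7, "TRU64": 3, "Solaris": 2},
    "Frozen release": {"Linux": 115, "IRIX": 13, "TRU64": 6, "Solaris": 2},
}


def linux_share(row):
    """Percentage of a row's total that goes to Linux."""
    return 100.0 * row["Linux"] / sum(row.values())


if __name__ == "__main__":
    for name, row in counts.items():
        print(f"{name}: {linux_share(row):.1f}% Linux of {sum(row.values())} total")
```

Linux accounts for 44 of 56 development entries and 115 of 136 frozen-release entries, roughly 79% and 85% respectively.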

Page 18: Issues in managing HEP Software Development in a distributed environment

Compilers

We wanted to write code that adhered to the C++ ANSI standard – not get into the Fortran extensions quagmire!

GCC and vendor compilers were not thought sufficiently compliant in summer 1997

Chose KAI compiler from Kuck and Associates

Compiler was available on the relevant platforms – including Linux

Has led to issues with availability of KAI versions of external products that must be built with the CDF/D0 software – e.g. we paid for a port of Open Inventor

We still believe it was the right choice at the time but expect to use EGCS and vendor compilers in the future

Page 19: Issues in managing HEP Software Development in a distributed environment

Debuggers and other tools

Quality of the debugging tools has left a lot to be desired

This was one of the few downsides of choosing KAI. Things have been particularly problematic on Linux

Have purchased TotalView which is in use on IRIX and will shortly be available for Linux – seems to improve the situation

CASE tools – used GDPro and Rational Rose

Mostly used to document design – did not use automatic code generation features

Purify and Insure++ used to look for memory leaks – but not currently available for Linux

Page 20: Issues in managing HEP Software Development in a distributed environment

Licensed products

Has been very beneficial to negotiate license agreements that cover use of a product by all Run II developers independent of their location

Have done this with KAI, Open Inventor

Get better price – all licenses must be ordered through Fermilab

Page 21: Issues in managing HEP Software Development in a distributed environment

Thoughts on the development process

Borrowing from the terminology and observations presented in “The Cathedral and the Bazaar” by Eric Raymond – O’Reilly Books

Our code is clearly Open Source because (by and large) it is freely available to anyone who wants to use it from another experiment

However, both CDF and D0 software projects are run using the traditional “cathedral” style of software development

This is necessitated by the requirements to provide schedules, obtain manpower resources from a limited pool, meet milestones and convince review committees that you know what you are doing

We can make some comparisons between aspects of the Open Source (aka Linux) model and what we are doing in HEP

Page 22: Issues in managing HEP Software Development in a distributed environment

Thoughts on the development process

“Treat your users as co-developers”

Two user communities in an experiment

Those working on the software project – programmers and physicists

The rest of the experiment – the physicist-user

The first group tends to be like the Linux community – working on the project because they are interested in the problem and want to improve the product

The second group just want to use the software to get physics results – they want to improve their physics analysis software but not the infrastructure

Page 23: Issues in managing HEP Software Development in a distributed environment

Thoughts on the development process

“Release early, release often”

CDF has shown that this leads to more timely bug fixes and shorter integration time and is very desirable for the project developers

However, it drives the physicist-user to distraction because he/she just wants something that works!

Have to have stable frozen releases in addition

Page 24: Issues in managing HEP Software Development in a distributed environment

Thoughts on the development process

Some of the skills necessary to co-ordinate a successful Open Source project are relevant to managing an HEP computing project

Must have good people and communication skills

Need to be able to attract people to the project and keep them interested and happy

These can often be more important than possessing great technical prowess

It often feels like we are in a bazaar rather than a cathedral!

Page 25: Issues in managing HEP Software Development in a distributed environment

Conclusions

CDF and D0 are successfully managing their software development projects with ~ 60 – 70 developers per experiment and 1 million lines of C++ each

We are expected to have schedules, milestones and reviews which makes it unlikely that we can ever manage a project using the bazaar model

However, some of the Open Source concepts are applicable to HEP projects

Page 26: Issues in managing HEP Software Development in a distributed environment

Use of these tools at CDF

On days when the development release builds successfully we create a rawhide release. This satisfies developers who need the up-to-date code but also need the whole release to actually build