distributed development, centralised delivery - sagrid jenkins + cvmfs

Jenkins + CVMFS: Distributed Development, Centralised Delivery
Bruce Becker | [email protected] | Coordinator: SAGrid | SANREN, Meraka Institute, CSIR
Stefanus Riekert | [email protected] | HPC Application Engineer | University of the Free State

Upload: bruce-becker

Post on 10-May-2015

DESCRIPTION

Presentation on the status of the SAGrid application porting platform based on Jenkins and CVMFS, given at the EGI Community Forum 2014.

TRANSCRIPT

Bruce Becker: Coordinator, SAGrid | [email protected] | http://www.sagrid.ac.za

Outline
● What users want
● SAGrid VO: a catch-all VO with many applications
● Problem statements:
    ● Problem 1: "the usual problem" - maintaining applications in a distributed computing environment
    ● Problem 2: "another usual problem" - maintaining a complex application inventory
● General solution: CVMFS + Jenkins
● Some specifics of the SAGrid CI platform
● Outlook

SAGrid as a catch-all VO

● The South African National Grid operates a catch-all VO which all South African researchers can use to access computing and data resources.

● SAGrid VO is not a domain-specific VO, so:
    ● The applications supported by this VO serve several widely varying uses
    ● Applications are requested by users or communities themselves

What users want

● Amazing infrastructure
● Some users want a highly varied, modular application selection
● Vertically integrated, highly specialised applications
● Highly trained support

What users get sometimes

The problem (1) - "the usual problem"

● Software distribution was done mostly "by hand":
    ● Someone from the ops team develops a script to install the application
    ● Apps are installed via job submission
    ● Tags are applied via script or by the job itself

● Issues:
    ● Major overhead of work
    ● Inconsistent installation procedures between applications and sites
    ● Bottleneck in porting applications (has to be done by someone in the VO)
    ● Duplication of effort, especially in dependencies of applications
    ● Difficult to manage application lifecycles

The problem (2) - what about the community ?

● Managing the inventory in a catch-all VO can be complex when there are many applications

● Prioritising porting requests depends on the knowledge of the expert porting the application
    ● This can lead to major delays in porting and deploying applications

● However, a user or community usually has an expert who knows how to tune, port and configure the application properly, as well as its dependencies
    ● Usually, "they" have to conform to "us": learn grid tools and terminology, etc.

Problem (3): Changes to the playing field

● New middleware stacks

● New architectures – GPGPU, ARM

Questions to answer

● How do we lower the barrier to entry to the grid or cloud infrastructure?

● How can the application expert prove to the resource provider that the application will actually run in the execution environment of the site?

● How can we manage the lifecycle of applications across multiple versions, architectures and configurations?

● How can we ensure that once applications are "certified", they are actually available on as many sites as possible?

General Solution: Jenkins + CVMFS

● The issues outlined are "typical" of a large software project

● They are usually solved by judicious use of a Continuous Integration system

● Once applications have been "ported", put them into a trusted repository

● Previously we built RPMs, but these required site-admin intervention

● With CVMFS, a site needs only a one-time configuration
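The one-time site configuration might look roughly like the sketch below, which writes a minimal CVMFS client configuration enabling a single repository. `CVMFS_REPOSITORIES` and `CVMFS_HTTP_PROXY` are standard CVMFS client parameters, and the repository name `sagrid.ac.za` is taken from the next-steps slide; the proxy URL and the `write_cvmfs_client_config` helper are hypothetical. The repository's server URL and public key also have to be configured on a real site and are left out here.

```shell
# Sketch only: write a minimal CVMFS client configuration.
# write_cvmfs_client_config is a hypothetical helper, not part of CVMFS.
write_cvmfs_client_config() (
    conf_file=$1      # on a real worker node: /etc/cvmfs/default.local
    repo=$2           # repository to enable
    proxy=$3          # site squid proxy URL, or DIRECT
    mkdir -p "$(dirname "$conf_file")" || exit 1
    {
        printf 'CVMFS_REPOSITORIES=%s\n' "$repo"
        printf 'CVMFS_HTTP_PROXY="%s"\n' "$proxy"
    } > "$conf_file"
)

# Demo against a scratch directory so that nothing under /etc is touched.
# The proxy URL below is a placeholder.
demo_dir=$(mktemp -d)
write_cvmfs_client_config "$demo_dir/default.local" sagrid.ac.za "http://squid.example.org:3128"
cat "$demo_dir/default.local"
```

On a real node the file would normally be followed by `cvmfs_config setup` and `cvmfs_config probe`, after which no further site-admin action should be needed when new applications are published.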

First, some changes

● Distribute the effort, centralise the tools

● Move the repository from a "closed" SVN repo
    – https://ops.sagrid.ac.za/trac/svn/repo

● to git
    – https://github.com/SAGridOps/SoftwareInstallation

● Don't have to give write access to a single repo; instead, accept pull requests

● Take advantage of all the GitHub infrastructure
● Expand possible contributors to those "outside" the infrastructure
● Recognise individuals' contributions

Recognise individuals...

Decentralise the team

Collaborate with code

Let the robots do the work

● Define what we want to deploy, and let the experts take care of how to deploy it

● DevOps paradigm: the same review/tag/release mechanisms on operations code as we have for scientific applications
    ● Teach a marketable skill
    ● Allow specialisation
    ● Enable remote management of complex services
    ● Ensure that the published methodology is the adopted methodology

Quality Control and feedback

● Ensure that requested applications are included in the repo

● Provide testing and QA infrastructure

● Self-serve to users

The CI environment

● Jenkins is extremely flexible... can do almost anything
● AuthN/AuthZ
    ● Currently using GitHub OAuth
    ● Take advantage of future Identity Federation

● We wanted to simulate different execution environments
    ● Already in production
    ● Planned for future

● Track and re-use dependencies

Matrix-based builds

● Independent builds and build statuses for different configurations:
    ● Application name
    ● Version
    ● OS
    ● Architecture
    ● … can add specific tuning configurations...

● We can see exactly what's broken where, and so build more resilient integration code.
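The bookkeeping behind a build matrix can be sketched in plain shell: loop over every combination of the axes and record one status per cell. In Jenkins this is what a multi-configuration (matrix) job provides; the loop below only illustrates the idea. The OS and architecture values, and the `build_cell` and `run_matrix` functions, are hypothetical placeholders.

```shell
# Sketch only: one independent build status per matrix cell.
# build_cell is a placeholder; a real job would run the build script here.
build_cell() {
    # Arguments: name version os arch.
    # Pretend that ARM builds are still broken, to show a mixed result.
    [ "$4" != "armv7" ]
}

run_matrix() (
    name=$1
    version=$2
    for os in $3; do              # $3: space-separated list of OS labels
        for arch in $4; do        # $4: space-separated list of architectures
            if build_cell "$name" "$version" "$os" "$arch"; then
                status=SUCCESS
            else
                status=FAILURE
            fi
            printf '%s %s %s %s: %s\n' "$name" "$version" "$os" "$arch" "$status"
        done
    done
)

run_matrix fftw 2.1.5 "sl6 u1404" "x86_64 armv7"
```

Because every cell reports separately, a failure on one architecture does not hide a success on another, which is the "what's broken where" view described above.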

Typical workflow

[Workflow diagram. Actors: infrastructure expert, application developer. Labels: defines relevant tests in Jenkins; reads description of execution environment tests; writes code to pass required tests; dev/stage environment; testing matrix; promote a build to CVMFS.]
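The final promotion step could be sketched as below. This is an assumption about how it might be done, not the SAGrid implementation: it supposes the script runs on the repository's release manager machine and that the artifact is unpacked directly under `/cvmfs/<repo>`. `cvmfs_server transaction`, `publish` and `abort` are the standard CVMFS publishing commands; `run`, `promote_build` and the artifact path are hypothetical. By default the commands are only printed.

```shell
# Sketch only: promote a tested artifact into a CVMFS repository.
# With DRY_RUN=1 (the default here) commands are printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

promote_build() (
    artifact=$1    # e.g. the build.tar.gz produced by the build script
    repo=$2        # e.g. sagrid.ac.za
    # Open a writable transaction on the repository.
    run cvmfs_server transaction "$repo" || exit 1
    if run tar xzf "$artifact" -C "/cvmfs/$repo"; then
        # Sign and publish the new repository revision.
        run cvmfs_server publish "$repo"
    else
        # Roll back so a half-unpacked build is never published.
        run cvmfs_server abort -f "$repo"
        exit 1
    fi
)

promote_build /repo/build.tar.gz sagrid.ac.za
```

Wrapping the unpack step in a transaction means sites only ever see complete builds: either the whole artifact is published or nothing changes.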

Dependency management: simple case

● Common problem with applications : need a specific version of a compiler

● Compiling the compiler can itself be tricky...

● Jenkins tests the full dependency chain necessary
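One way to make the dependency chain explicit is to check, before a build starts, that every upstream artifact is present. The sketch below assumes the `/repo/$SITE/$OS/$ARCH/<name>/<version>/build.tar.gz` layout used by the generic build script; the `check_deps` function, the `REPO_ROOT` override and the demo values (including the GSL version) are hypothetical.

```shell
# Sketch only: verify that every dependency artifact exists before building.
# REPO_ROOT defaults to /repo, the artifact root used by the build script.
check_deps() (
    missing=0
    for dep in "$@"; do            # each dependency is given as name/version
        artifact="${REPO_ROOT:-/repo}/$SITE/$OS/$ARCH/$dep/build.tar.gz"
        if [ ! -f "$artifact" ]; then
            echo "missing dependency artifact: $artifact" >&2
            missing=1
        fi
    done
    exit "$missing"
)

# Demo against a scratch artifact tree containing only fftw/2.1.5.
REPO_ROOT=$(mktemp -d)
SITE=generic
OS=sl6
ARCH=x86_64
mkdir -p "$REPO_ROOT/$SITE/$OS/$ARCH/fftw/2.1.5"
: > "$REPO_ROOT/$SITE/$OS/$ARCH/fftw/2.1.5/build.tar.gz"
check_deps fftw/2.1.5 && echo "dependencies present"
check_deps fftw/2.1.5 gsl/1.16 || echo "build would be skipped"
```

In a CI setting the same effect is usually achieved by chaining jobs, so that a downstream build is only triggered once its upstream builds have succeeded.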

Real-world application

● GADGET – astrophysics hydrodynamic simulations

● Many (levels of) dependencies

Public Application Dashboard

Authenticated view

Generic build script

Set up the environment:

# GADGET requires HDF5 FFTW2 ZLIB and openmpi
module add ci
module add fftw/2.1.5
module add hdf5
module add openmpi
module add gsl

Clean build, retrieve dependency artifacts:

rm -rf $FFTW_DIR
tar xvfz /repo/$SITE/$OS/$ARCH/fftw/$FFTW_VERSION/build.tar.gz -C /
rm -rf $HDF5_DIR
tar xvfz /repo/$SITE/$OS/$ARCH/hdf5/$HDF5_VERSION/build.tar.gz -C /
rm -rf $OPENMPI_DIR
tar xvfz /repo/$SITE/$OS/$ARCH/openmpi/$OPENMPI_VERSION/build.tar.gz -C /
rm -rf $GSL_DIR
tar xvfz /repo/$SITE/$OS/$ARCH/gsl/$GSL_VERSION/build.tar.gz -C /

Generic build script

Actually build... create the artifact:

make install DESTDIR=$WORKSPACE/build
mkdir -p $REPO_DIR
rm -rf $REPO_DIR/*
tar -cvzf $REPO_DIR/build.tar.gz -C $WORKSPACE/build apprepo

Create the modulefile:

# Shell expands $NAME and $VERSION now; $::env(...) is left for Tcl at module load time
mkdir -p modules
(
cat <<MODULE_FILE
#%Module1.0
## $NAME modulefile
##
proc ModulesHelp { } {
    puts stderr "       This module does nothing but alert the user"
    puts stderr "       that the [module-info name] module is not available"
}
prereq gsl
prereq fftw/2.1.5
prereq hdf5
module-whatis   "$NAME $VERSION."
setenv       GSL_VERSION       $VERSION
setenv       GSL_DIR           /apprepo/$::env(SITE)/$::env(OS)/$::env(ARCH)/$NAME/$VERSION
prepend-path LD_LIBRARY_PATH   $::env(GSL_DIR)/lib
MODULE_FILE
) > modules/$VERSION

So, it works! … almost

Next steps

● We have an open, collaborative, low-barrier platform for researchers to bring applications to the grid

● Small technical tasks:
    ● Implement a promoted-builds mechanism to populate the sagrid.ac.za CVMFS repo
    ● Implement SAML AuthN, integrate IdF
    ● Probes to check that CVMFS is mounted on sites (?)

● Operating in "stealth mode" at the moment: not advertising, but open to anyone who is interested, in order to collect feedback

● Addressing specific user communities to test-drive the system:
    ● Machine learning astro applications (rapid prototyping)
    ● Bioinformatics application suites (complex ecosystem)

● Present the next phase of the project in November in Cape Town: move to production
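A probe for the "is CVMFS mounted on the site?" task could be as small as the sketch below. It is a hypothetical illustration: `probe_cvmfs` and the `CVMFS_ROOT` override are invented here, and the exit codes follow the usual Nagios plugin convention (0 for OK, 2 for CRITICAL) on the assumption that the result would feed a monitoring system of that kind.

```shell
# Sketch only: report whether a CVMFS repository is usable on this node.
# probe_cvmfs is hypothetical; CVMFS_ROOT exists only to ease testing.
probe_cvmfs() (
    repo=$1
    root=${CVMFS_ROOT:-/cvmfs}
    # Listing the directory makes autofs mount the repository on demand.
    if ls "$root/$repo" >/dev/null 2>&1; then
        echo "OK: $root/$repo is accessible"
        exit 0
    fi
    echo "CRITICAL: $root/$repo is not accessible"
    exit 2
)

probe_cvmfs sagrid.ac.za
echo "probe exit code: $?"
```

Submitted as a short grid job, a check like this would show which sites can actually see the applications once they are published.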