Post on 13-Mar-2020

Worker Node Software Management: the VO perspective

Mark Santcroos, Dennis van Dok

Introduction

•  e-BioScience group
   –  Bioinformatics Laboratory
   –  Clinical Epidemiology, Biostatistics and Bioinformatics
   –  Academic Medical Centre, Amsterdam

•  Intermediate between medical researchers and Dutch NGI

•  Support a wide range of applications in Next Generation Sequencing and Medical Imaging

Worker Node Software

•  Running on 15 sites in the Netherlands

•  Base worker node installation (glite-WN)

•  Proof of Concept (PoC) software installation, heritage of Virtual Laboratory for e-Science (ended 2009)

Perspective

•  Dennis van Dok is part of the team that developed and managed the PoC environment at BiG Grid

•  Mark Santcroos is a VO manager for the vlemed VO

Job / Application Scenarios

•  Use installed software

•  Application in Job Sandbox

•  Fetch Application using wrapper

•  Upgrade versions in PoC distribution

•  Lobby for new versions with Site admins

Limitations

•  Sandbox solution has size limits

•  Sandbox and wrapper have network overhead

•  Installed version out of date / too new

•  Responsibility of maintaining applications for end-user not always preferable

•  Site admins have to be in the loop

High Level Goal

•  Have a flexible solution to make software available on the grid for end users that is also manageable from a VO admin perspective.

Packaging Requirements

•  Automatic dependency resolution

•  Supported on Linux

•  Tools for install/update/remove/status

•  Runs entirely in userspace, unprivileged

•  Multiple installed versions of the same software

Unsuitable candidates

•  rpm/yum

•  deb/apt

•  portage

•  Arch User Repository

•  pacman

•  …

•  Reasons: too OS specific, difficult to manage unprivileged

Pkgsrc

•  Originating in NetBSD

•  Supported on Linux

•  Self contained

•  Actively maintained

•  Can be used as a non-privileged user

•  Large collection of applications already packaged

•  Can make use of system provided dependencies

•  Allows maintaining a local set of packages

•  Could add packages to the main distribution

•  Supports binary and source packages

Creating a package

DISTNAME=       vlet-1.3.2
CATEGORIES=     local
MASTER_SITES=   http://orange.ebioscience.amc.nl/pkgsrc/distfiles/
EXTRACT_SUFX=   .zip

MAINTAINER=     m.a.santcroos@amc.uva.nl
HOMEPAGE=       http://orange.ebioscience.amc.nl/pkgsrc/distfiles/
COMMENT=        This is the VL-e Toolkit
LICENSE=        apache-2.0

NO_CONFIGURE=   yes
NO_BUILD=       yes
PKG_DESTDIR_SUPPORT=    user-destdir
INSTALLATION_DIRS=      bin lib

post-extract:
	${CP} ${FILESDIR}/Makefile ${WRKSRC}/Makefile

.include "../../mk/bsd.pkg.mk"

Package Tree Management

•  update-tree.sh
   –  Pull upstream pkgsrc changes
   –  Create tarball
   –  Put on website

Implementation Principles

•  $VO_[VONAME]_SW_DIR is a directory shared between all worker nodes on a site

•  Run with a Software (VO) Manager proxy

•  Install packages per site / cluster / CE
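As a minimal sketch of how a job might locate the shared software area: the helper below builds the variable name from the VO name and looks it up. The normalization rule (upper-casing and mapping other characters to underscores) and both function names are assumptions made for illustration, not taken from the project code.

```python
import os
import re


def sw_dir_variable(vo_name: str) -> str:
    """Return the environment variable name for a VO's shared software area."""
    # Assumption: upper-case the VO name and map anything outside [A-Z0-9] to "_".
    normalized = re.sub(r"[^A-Z0-9]", "_", vo_name.upper())
    return f"VO_{normalized}_SW_DIR"


def shared_software_dir(vo_name: str, environ=None) -> str:
    """Look up the shared software directory, failing loudly if it is unset."""
    env = os.environ if environ is None else environ
    variable = sw_dir_variable(vo_name)
    path = env.get(variable)
    if not path:
        raise RuntimeError(f"{variable} is not set on this worker node")
    return path
```

Passing a plain dictionary as `environ` makes the lookup easy to exercise outside a grid job, for example `shared_software_dir("vlemed", {"VO_VLEMED_SW_DIR": "/data/sw"})`.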

Architecture

[Diagram: Server (UI), Management Jobs, Worker Nodes, Shared Storage Area (mounted on the worker nodes)]

Managing packages

•  site-pkgtool.sh
   –  Program to manage packages centrally
   –  Initiates grid jobs

•  Install, Remove, Update

•  Init, Reinit, Check, Dump, Info, Version

Script on the worker node

•  pkgsrc-cmd.sh
   –  Wrapper program that runs on the worker node

•  Running as a grid job

Information Management

•  list-installed-packages.sh
   –  Display information about installed packages for sites

•  get-site-status.sh
   –  Gather information from all supported sites

•  verify-package.sh
   –  Check if a certain package is available on a site

•  get-tags.sh
   –  Get all the package tags for the configured sites

Installing a package

•  Check if distribution is fresh

•  Extract tree in scratch space

•  Build package and dependencies

•  Install package in shared software area

•  Install modulefile
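The installation sequence above is ordered, and each step only makes sense if the previous one succeeded. A minimal sketch of that control flow follows; the `run_install` helper and the placeholder actions are hypothetical and do not reproduce what pkgsrc-cmd.sh actually does.

```python
from typing import Callable, List, Tuple

# A step is a human-readable name plus an action that reports success.
Step = Tuple[str, Callable[[], bool]]


def run_install(steps: List[Step]) -> List[str]:
    """Run steps in order, stop at the first failure, return the names completed."""
    completed = []
    for name, action in steps:
        if not action():
            break
        completed.append(name)
    return completed


# Step names follow the slide; the actions are placeholders for illustration.
example_steps: List[Step] = [
    ("check distribution freshness", lambda: True),
    ("extract tree in scratch space", lambda: True),
    ("build package and dependencies", lambda: False),  # simulated failure
    ("install package in shared area", lambda: True),
    ("install modulefile", lambda: True),
]
```

With the simulated build failure, `run_install(example_steps)` returns only the first two step names, so nothing is written to the shared software area after a failed build.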

Environment Modules

•  “The Environment Modules package provides for the dynamic modification of a user's environment via modulefiles.”

•  Select versions

•  Setup environment

•  Integrates with system provided setup
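To illustrate what an installed modulefile could contain, here is a small sketch that renders one for a package version. The directory layout (`<prefix>/bin`, `<prefix>/lib`, one modulefile per `package/version`) and the function names are assumptions; only the `#%Module1.0` header and the `module-whatis` and `prepend-path` directives come from Environment Modules itself.

```python
import os


def render_modulefile(package: str, version: str, prefix: str) -> str:
    """Return modulefile text that puts one installed package version on the paths."""
    lines = [
        "#%Module1.0",
        f'module-whatis "{package} {version}"',
        f"prepend-path PATH {prefix}/bin",
        f"prepend-path LD_LIBRARY_PATH {prefix}/lib",
    ]
    return "\n".join(lines) + "\n"


def modulefile_path(modules_root: str, package: str, version: str) -> str:
    """Assumed layout: one file per version, grouped in a directory per package."""
    return os.path.join(modules_root, package, version)
```

With this layout a user would select a version with something like `module load vlet/1.3.2`, and several versions of the same package can coexist side by side.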

Tags

•  Software Tags in Information System (BDII)

•  Publish installed software versions per CE

•  Used for resource selection by adding it to the “Requirements” of a JDL

•  Use lcg-ManageVOTag tool to publish tag

•  Structure of tags is VO-${vo}_SW_${package}
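As a sketch of how a tag feeds into resource selection, the helpers below build a tag in the structure given above and wrap it in a JDL `Requirements` line. Using `name-version` as the package part of the tag is an assumption for illustration; the `Member(...)` test against `other.GlueHostApplicationSoftwareRunTimeEnvironment` is the usual JDL form for matching software tags.

```python
def software_tag(vo: str, package: str) -> str:
    """Build a software tag following the VO-${vo}_SW_${package} structure."""
    return f"VO-{vo}_SW_{package}"


def jdl_requirement(tag: str) -> str:
    """Return a JDL Requirements line that selects CEs publishing the tag."""
    return (
        f'Requirements = Member("{tag}", '
        "other.GlueHostApplicationSoftwareRunTimeEnvironment);"
    )


# Assumption: the package part carries the version, e.g. "vlet-1.3.2".
example_jdl_line = jdl_requirement(software_tag("vlemed", "vlet-1.3.2"))
```

The resulting line can be pasted into a job's JDL so that only CEs advertising that exact tag are considered.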

Practical issues

•  Tags are not omnipresent

•  Shared area can become bottleneck

•  No intelligent matching on tags
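Because tag matching is an exact string comparison, a request such as "version 1.3 or newer" cannot be expressed directly. One possible workaround, sketched here under assumptions, is to filter the published tags on the client side before submitting. The tag layout (`name-version` with a dotted numeric version) and both function names are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple


def parse_tag(tag: str, vo: str) -> Optional[Tuple[str, Tuple[int, ...]]]:
    """Split a tag into (package name, numeric version), or None if it does not fit."""
    prefix = f"VO-{vo}_SW_"
    if not tag.startswith(prefix):
        return None
    name, sep, version = tag[len(prefix):].rpartition("-")
    if not sep or not re.fullmatch(r"\d+(\.\d+)*", version):
        return None
    return name, tuple(int(part) for part in version.split("."))


def sites_with_at_least(
    site_tags: Dict[str, List[str]], vo: str, package: str, minimum: Tuple[int, ...]
) -> List[str]:
    """Return the sites that publish the package at the minimum version or newer."""
    matches = []
    for site, tags in site_tags.items():
        for tag in tags:
            parsed = parse_tag(tag, vo)
            if parsed and parsed[0] == package and parsed[1] >= minimum:
                matches.append(site)
                break
    return sorted(matches)
```

Versions are compared as integer tuples, so `1.10` sorts after `1.9`, which plain string comparison would get wrong.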

Conclusions

•  Flexible software management system

•  Relieves burden from user

•  Creating packages is still labor-intensive work

Discussion

•  One size fits all? (Did we reinvent the wheel?)

•  Connect to EGI AppDB?

•  EMI Community Repositories?

•  Usable for data distribution?

•  Other mechanism for matching?

Links

•  pkgsrc –  http://www.netbsd.org/docs/software/packages.html

•  Modules –  http://modules.sourceforge.net/

•  BiG Grid –  http://www.biggrid.nl/

•  Bioinformatics Laboratory –  http://www.bioinformaticslaboratory.nl/

•  Project Code –  http://dvandok.github.com/userspace-package-management/

Acknowledgements

•  AMC Bioinformatics Laboratory
   –  Prof. dr. Antoine van Kampen
   –  Dr. Silvia Delgado Olabarriaga
   –  Barbera van Schaik

•  BiG Grid / Nikhef
   –  Jan Just Keijser

Thanks!
