Experiences deploying Clusterfinder on the grid Arthur Carlson (MPE) 7th AstroGrid-D Meeting TUM, 11th-12th June 2007

Experiences deploying Clusterfinder on the grid

Arthur Carlson (MPE)

7th AstroGrid-D Meeting

TUM, 11th-12th June 2007

Experiences deploying Clusterfinder on the grid

• What is the deployment problem?
• A prototype solution using
  – “grid-modules”
  – “environments”
• Status and conclusions

Deployment is when ...

each of many users
can (build and) run each of many applications
on each of many hosts.

“each” = >90%
“many” = >10

• certificates/password files
• VOs (update of grid-mapfile, sharing software)
• firewalls

• repository/distribution/version control
• data access
• “standard software” (compiler, ...)
• environment

grid-modules

A prototype system for getting software from where it is maintained to where it is used.

• Inspired by environment modules package
  – load/unload (PATH)
  – initadd/initclear (.profile)
• for software from a remote repository
  – update/deinstall
  – build/clean
  – test

grid-modules: install and use

• grid-modules-clone NEWHOST(LIST)

• also copies ~/.subversion for passwords

• grid-module [update|load|initadd|build|test] [gridmod|env|gmon|cf|proc|gat]
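The command form above takes one action and one module. As a minimal sketch of how such arguments could be validated (the function name `check_args` and the two list variables are hypothetical and not part of grid-modules; only the action and module names come from the slide):

```shell
# Sketch only: validate "grid-module ACTION MODULE" style arguments against
# the lists shown on the slide. check_args is a hypothetical helper.
valid_actions="update load initadd build test"
valid_modules="gridmod env gmon cf proc gat"

check_args() {
    action=$1
    module=$2
    # Pad with spaces so that only whole words match.
    case " $valid_actions " in
        *" $action "*) ;;
        *) echo "unknown action: $action" >&2; return 1;;
    esac
    case " $valid_modules " in
        *" $module "*) ;;
        *) echo "unknown module: $module" >&2; return 1;;
    esac
    echo "$action $module"
}

check_args build cf
```

A real front end would dispatch to the action after this check; the sketch only echoes the accepted pair.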

grid-modules: adding modules

• set_module_info

    agd_rep='svn://svn.gac-grid.org/software'
    planck_rep='http://www.mpa-garching.mpg.de/svn/planck-group/planckbranches'
    all_modules='gridmod cf proc'
    case $module in
      gridmod) rep=$agd_rep/grid-modules; frag=gridmod/bin;;
      cf)      rep=$agd_rep/clusterfinder; frag=unknown;;
      proc)    rep=$planck_rep/ProC-2.3; frag=proc/build/dist/bin;;
      *)       rep=unknown; frag=unknown;;
    esac

• customization scripts

    ===== proc.build =====
    cd ~/grid-modules/proc/ProC-base
    ant

    ===== proc.load =====
    mkdir -p $HOME/.planck
    echo "allowIncompleteConf = true" > "$HOME/.planck/pipelinecoordinator.pref"

    ===== proc.unload =====
    rm -r $HOME/.planck

environments

A prototype system for making different hosts look alike.

• Does a required software package exist on a remote host, and where is it installed?
    export IMAGEMAGICK_HOME=/usr/local/ImageMagick-6.3.2
• Make it available!
    export PATH=$PATH:/usr/local/ImageMagick-6.3.2/bin
• Host-specific information must be maintained by somebody somewhere.
  – require modules or take the bull by the horns
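The two questions above (does the package exist, and where?) can be sketched as a small probe. This is an illustration under assumptions, not part of the environments system: the helper `find_home` and the candidate directory list are hypothetical, and `convert` is used as the ImageMagick executable to look for.

```shell
# Print the first candidate directory that contains an executable bin/TOOL.
# find_home is a hypothetical helper for this sketch.
find_home() {
    tool=$1
    shift
    for dir in "$@"; do
        if [ -x "$dir/bin/$tool" ]; then
            echo "$dir"
            return 0
        fi
    done
    return 1
}

# Usage: record where ImageMagick lives and make it available.
# The candidate directories are assumptions for illustration.
if IMAGEMAGICK_HOME=$(find_home convert /usr/local/ImageMagick-6.3.2 /usr/local /usr); then
    export IMAGEMAGICK_HOME
    PATH=$PATH:$IMAGEMAGICK_HOME/bin
fi
```

A probe like this only helps when the install locations are guessable; otherwise the host-specific answer still has to be written down somewhere, which is the point of the last bullet.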

environments: load_env

The trick is to find the right scripts to execute for each host.

    if ! hostname=`hostname -f 2>/dev/null`; then hostname=`hostname`; fi

    scripts=`sed -n "s/^ *$hostname *//p" <<EOF
    astrogrid.aei.mpg.de aei
    buran.aei.mpg.de aei
    lx32i1.cos.lrz-muenchen.de lrz g95 lrz-32
    lx64a2.cos.lrz-muenchen.de lrz g95 lrz-64
    ...
    EOF`

    cd ~/grid-modules/env/bin

    source ./default

    if [[ -f local ]]; then
      echo sourcing local environment script
      source local
    elif [[ "$scripts" ]]; then
      echo For $hostname sourcing these scripts: $scripts
      for script in $scripts; do source ./$script; done
    fi

This may need to be changed when adding a new host.
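One property of the `sed` lookup in load_env is that the host name is used as a regular expression, so each "." matches any character and a name that is a prefix of another entry also matches. The sketch below shows an exact-match variant using `awk`. The host names, script names, and the functions `host_table` and `lookup_scripts` are placeholders invented for illustration, not entries from the real table.

```shell
# Illustrative host table: first field is the host, the rest are script names.
# All entries are placeholders.
host_table() {
    cat <<'EOF'
node1.example.org siteA
node2.example.org siteB g95
EOF
}

# Print the script names listed for exactly this host (nothing if unlisted).
lookup_scripts() {
    host_table | awk -v host="$1" '$1 == host { $1 = ""; sub(/^ +/, ""); print }'
}

lookup_scripts node2.example.org   # prints: siteB g95
```

The string comparison `$1 == host` avoids the regular-expression pitfalls, at the cost of requiring the table to list fully matching names.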

environments: scripts

The work is done in the scripts.

    ===== default =====
    export GSL_INCL=-I/usr/include
    export GSL_LIBS=-L/usr/lib

    export IMAGEMAGICK_INCL=-I/usr/include/
    export IMAGEMAGICK_LIBS=-L/usr/lib/

    export FC='gfortran -std=gnu -fno-second-underscore'
    export F_PORTABILITY_FLAGS=-DPLANCK_GFORTRAN
    export F_COMMONFLAGS='-W -Wall -Wno-uninitialized -Wno-unused -O2 -Wfatal-errors $(F_PORTABILITY_FLAGS)'
    export FCFLAGS='-c $(F_COMMONFLAGS) -I$(INCDIR)'

    export CC=gcc
    export CCFLAGS_NO_C='-W -Wall -I$(INCDIR) $(GSL_INCL) $(IMAGEMAGICK_INCL) -fno-strict-aliasing -O2 -g0 -s -ffast-math'
    export CCFLAGS='$(CCFLAGS_NO_C) -c'

    ===== lrz =====
    export GSL_INCL='$(GSL_INC)'
    export GSL_LIBS='$(GSL_SHLIB) $(GSL_BLAS_SHLIB)'
    export ANT_HOME=/lrz/sys/apache-ant-1.6.5

    module load gsl
    module load java
    module load gcc/4.1.0
    module load g95
    module load mpi.shmem/gcc

    export PATH=/lrz/sys/jdk1.5.0_07/bin:${PATH}

    ===== g95 =====
    export FC=g95
    export F_PORTABILITY_FLAGS=-DPLANCK_G95

Defaults can be overridden.

Defaults work in most cases.

Cooperates with modules.

New scripts may need to be written for new hosts.
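The overriding works simply because scripts are sourced in order, so a later assignment replaces an earlier one. A minimal, self-contained sketch of that layering follows; it writes cut-down copies of the `default` and `g95` scripts into a temporary directory (the directory and the shortened script contents are assumptions for the demonstration) and sources them the way load_env does.

```shell
# Demonstration of layering: "default" first, host-specific scripts afterwards.
envdir=$(mktemp -d)

cat > "$envdir/default" <<'EOF'
export FC='gfortran -std=gnu -fno-second-underscore'
export F_PORTABILITY_FLAGS=-DPLANCK_GFORTRAN
export CC=gcc
EOF

cat > "$envdir/g95" <<'EOF'
export FC=g95
export F_PORTABILITY_FLAGS=-DPLANCK_G95
EOF

. "$envdir/default"
for script in g95; do
    . "$envdir/$script"
done

# FC and F_PORTABILITY_FLAGS now come from g95; CC is still the default.
echo "FC=$FC CC=$CC F_PORTABILITY_FLAGS=$F_PORTABILITY_FLAGS"
rm -r "$envdir"
```

Because only the variables that differ need to appear in a host script, a new host usually requires a short script or none at all.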

Status

• ca. 23 AGD hosts + 9 DGI hosts are accessible
• F90 build of Clusterfinder successful on 22 hosts (70%)
• Some of the problems experienced:
  – difficulty finding FQDNs of resources, hosts listed by mistake
  – gsissh disabled
  – default job factory type disabled for globusrun-ws
  – no gsiscp installed, or unexpected default ports
  – svn not installed, too old, or not allowed connections
  – shell not bash, .profile not processed with batch jobs
  – file quota too small
  – some hosts (lx[32|64]ia1 at LRZ) share a file system
  – no F90 compiler installed, or hard to find
  – deep changes in grid-modules are hard to update
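Several of the problems listed above come down to a tool being absent on a host. As a hedged sketch, a preflight check run on each host could report missing commands before a build is attempted. The helper `missing_tools` is hypothetical and not part of grid-modules, and which tool names to check is a per-site decision.

```shell
# Print the names of the given commands that are not found on this host.
# missing_tools is a hypothetical helper for this sketch.
missing_tools() {
    missing=""
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
    done
    echo "${missing# }"
}

# Example call; the tool list is an assumption drawn from the problems above:
#   missing_tools svn gsissh gsiscp globusrun-ws gfortran
```

This only detects absence. Problems such as an svn client that is too old, blocked connections, or a small file quota would need separate checks.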

Conclusions

• Clusterfinder has been deployed on “many” hosts using a prototype deployment system that is “easily” extendable to many users and many applications.

• The system handles diversity without standing in the way of defining standards.

• AGD should use this system or decide on something better, but should not diverge.