Rob Gardner • University of Chicago Bring OSG pools to your Cluster. Share your HTCondor pools with OSG. Introduction to High Throughput Computing for Users and System Administrators #TechEX15 Cleveland OH, October 8, 2015


Page 1: Bring OSG pools to your Cluster. Share your HTCondor pools with

Rob Gardner • University of Chicago

Bring OSG pools to your Cluster. Share your HTCondor pools with OSG.

Introduction to High Throughput Computing for Users and System Administrators

#TechEX15 Cleveland OH, October 8, 2015

Page 2: Bring OSG pools to your Cluster. Share your HTCondor pools with

Goals for this session

● Quick introduction to the OSG
● Let’s submit to OSG from your campus
● Connecting your cluster to the OSG

Page 3: Bring OSG pools to your Cluster. Share your HTCondor pools with

What is the OSG?


Page 4: Bring OSG pools to your Cluster. Share your HTCondor pools with

Open Science Grid

● A distributed computing partnership for data-intensive research

● 140+ resource providers (a few in Asia and Europe)

Rough scale: more than 100k HTC jobs running concurrently

Page 5: Bring OSG pools to your Cluster. Share your HTCondor pools with

OSG profile

● 2014 stats
  ○ 67% the size of XD, 35% of Blue Waters
  ○ 2.5 million CPU hours/day
  ○ 800M hours/year
  ○ 125M/year provided opportunistically
● >1 petabyte of data transferred per day
● 50+ research groups
● Thousands of users
● XD service provider for XSEDE

Rudi Eigenmann, Program Director, Division of Advanced Cyberinfrastructure (ACI), NSF CISE. CASC Meeting, April 1, 2015

Page 6: Bring OSG pools to your Cluster. Share your HTCondor pools with

What is OSG Connect?

● http://osgconnect.net site
● Login node for job management: login.osgconnect.net
● Stash storage platform, common storage for:
  ○ scp, rsync
  ○ http
  ○ Globus (gridftp)

● Recommended path for (most) new OSG users


Page 7: Bring OSG pools to your Cluster. Share your HTCondor pools with

OSG Connect Service

Has an identity bridge: local campus identity (CILogon) ‣ OSG Connect identity (Globus) ‣ virtual organization (OSG)

+ HTCondor Glidein Overlay

⇒ Goal is to provide a virtual HTC cluster experience

Page 8: Bring OSG pools to your Cluster. Share your HTCondor pools with

Bring OSG pools to Campus


Page 9: Bring OSG pools to your Cluster. Share your HTCondor pools with

OSG: 140 sites

Page 10: Bring OSG pools to your Cluster. Share your HTCondor pools with

OSG: 140 sites

Page 11: Bring OSG pools to your Cluster. Share your HTCondor pools with

OSG: 140 sites

But where are the users?

Page 12: Bring OSG pools to your Cluster. Share your HTCondor pools with

OSG: 140 sites

But where are the users? On campuses everywhere!

Page 13: Bring OSG pools to your Cluster. Share your HTCondor pools with

OSG Connect - onboard quickly

Researchers without local HPC or HTC can login to OSG Connect directly

but….

Page 14: Bring OSG pools to your Cluster. Share your HTCondor pools with

Bring the submit point to campus?

Some researchers computing on campus HPC find this unnatural or inconvenient

Page 15: Bring OSG pools to your Cluster. Share your HTCondor pools with

The comforts of home

Local HPC provides a fully functional ecosystem
● Local standard configuration
● Local standard data management
● Local standard software access tools
● First point of contact for scientific computing consulting

Page 16: Bring OSG pools to your Cluster. Share your HTCondor pools with

Connect Client brings pools to campus

The idea is to bring the submit point to the “home” campus

Submit locally, run globally

Heavy lifting done by OSG hosted services

Page 17: Bring OSG pools to your Cluster. Share your HTCondor pools with

What is Connect Client?

On the OSG Connect login host, we encapsulate many common operations under the connect command:

● connect status
● connect watch
● connect histogram
● connect project

Page 18: Bring OSG pools to your Cluster. Share your HTCondor pools with

What is Connect Client?

We bring those commands and a bit more to your campus:

● connect setup
● connect pull
● connect submit
● connect q
● connect history
● connect rm
● connect status
● ...

Page 19: Bring OSG pools to your Cluster. Share your HTCondor pools with

Command summary

● connect setup remote-username
  ○ One-time authorization setup. (Creates a new SSH key pair and uses your password to authorize it.)
  ○ connect test can validate access at any time
● connect pull / push
  ○ Lightweight access means no service can monitor file readiness for transfer
  ○ Instead, we have explicit commands for uni- or bi-directional file synchronization between local and remote (the “connected” server). The sync occurs over a secure SSH channel.

Page 20: Bring OSG pools to your Cluster. Share your HTCondor pools with

Command summary

● connect submit
  ○ Like condor_submit, submits a job from a job control file (submit script). Implicitly performs a push beforehand.
● connect q
  ○ Runs condor_q remotely
● connect history
● connect status
● connect rm
  ○ Also condor_* wrappers
● connect shell
  ○ Gives you a login on the connect server at the location of your job
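The commands summarized above could be chained into a small wrapper. This is only a sketch under stated assumptions: submit_and_fetch, run, and the DRY_RUN variable are hypothetical names (not part of the Connect Client), and it assumes that connect pull is the step that brings results back once the jobs have finished. With DRY_RUN set, the commands are printed instead of executed.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN is set, otherwise run it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the connect commands listed above.
submit_and_fetch() {
    submit_file="$1"
    # connect submit implicitly pushes local files before submitting.
    run connect submit "$submit_file" || return 1
    # Show the remote queue (condor_q run on the Connect server).
    run connect q || return 1
    # Synchronize remote files back to the local directory.
    # In practice you would wait for the jobs to finish first.
    run connect pull
}
```

A dry run such as `DRY_RUN=1 submit_and_fetch tutorial03.submit` shows the three commands in order without contacting any server.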


Page 21: Bring OSG pools to your Cluster. Share your HTCondor pools with

Get an OSG Connect credential

● Sign up at http://osgconnect.net/
● Test that you can log in:

$ ssh [email protected]

● You can work from here, but to submit from your campus cluster, log out for now


Page 22: Bring OSG pools to your Cluster. Share your HTCondor pools with

Set up your campus cluster

● We will install the Connect Client
● Works for CentOS 6.x and similar
● You will need system privileges (unless the prerequisites are already installed)
● You can practice today using our Docker service and a vanilla CentOS 6 container image
● So, either log in to your home cluster, or follow the instructions on the next slide...

Page 23: Bring OSG pools to your Cluster. Share your HTCondor pools with

Aside: practice within a container

First you will need to login via SSH to docker.osgconnect.net using your OSG Connect credentials.

Once there, create a new Docker container:

docker run -ti connect-client-base

Once your container is ready, you will see a prompt similar to this:

[root@a15283b661aa ~]#

You are now inside your container as the super user. You may need root access to install software. Note that you do not need root privileges to use the client — that’s just the only user in our Docker environment.


Page 24: Bring OSG pools to your Cluster. Share your HTCondor pools with

Aside: practice within a container

Inside your container, you can interact just as with any Linux server (although very little software is installed in this particular image).

● Typing exit will return you to your host shell and stop the container.
● Typing ^P^Q (control-P control-Q) will return you to the host shell without stopping the container.
● To resume: docker attach a15283b661aa
  ○ a15283b661aa is the container ID, and it appears in your shell prompt inside the container: [root@a15283b661aa ~]#

Page 25: Bring OSG pools to your Cluster. Share your HTCondor pools with

Prerequisite software

Before installing the Connect Client, we will need to install some other prerequisite software:

yum install -y python-paramiko curl tar

A lot of text will scroll by as the packages are downloaded and installed into your container. When it’s finished, you will see the following message and be returned to your shell:

Complete!

[root@a15283b661aa ~]#
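Before running the installer, it can help to confirm that the command-line prerequisites are actually on PATH. A minimal sketch, where missing_tools is a hypothetical helper name; it checks only for executables such as curl and tar, and does not verify the python-paramiko module.

```shell
#!/bin/sh
# Hypothetical helper: print each named command that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Prints nothing when both tools are available.
missing_tools curl tar
```

Any name printed by the last line would need to be installed (for example with the yum command above) before continuing.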


Page 26: Bring OSG pools to your Cluster. Share your HTCondor pools with

Get the Connect Client

Now that the dependencies are installed, you can fetch the Connect Client distribution:

cd
curl -L http://osg.link/connect-client-0.5.2.tar.gz | tar xzf -
cd connect-client-0.5.2

Page 27: Bring OSG pools to your Cluster. Share your HTCondor pools with

Install the Connect Client

You are now ready to install the client:

./install.sh ~/connect-client

Success looks like this:

[install] Setting up the Connect module v0.5.2

Connect modulefile is in {your home directory}/connect-client/connect-client

[install] Installing Connect user commands

[install] ... connect command

[install] ... paramiko (for connect command)

== Paramiko installed.

[install] ... tutorial command


Page 28: Bring OSG pools to your Cluster. Share your HTCondor pools with

Setup environment

Once installed, you will need to add the Connect Client to PATH:

cd

export PATH=~/connect-client/bin:$PATH

To use the client on future logins, add it to your ~/.profile:

echo 'export PATH="$HOME/connect-client/bin:$PATH"' >> ~/.profile
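Appending to ~/.profile more than once would leave duplicate PATH entries. One way to guard against that is sketched below; add_connect_path is a hypothetical helper name, and the sketch writes $HOME rather than ~ because a tilde inside double quotes is not expanded when the variable is assigned.

```shell
#!/bin/sh
# Hypothetical helper: append the PATH line to a profile file only if absent.
add_connect_path() {
    profile="$1"
    # Single quotes keep $HOME and $PATH literal so they expand at login time.
    line='export PATH="$HOME/connect-client/bin:$PATH"'
    touch "$profile"
    # -x matches the whole line, -F treats the pattern as a fixed string.
    grep -qxF "$line" "$profile" || printf '%s\n' "$line" >> "$profile"
}
```

Calling `add_connect_path ~/.profile` repeatedly leaves exactly one copy of the line in the file.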


Page 29: Bring OSG pools to your Cluster. Share your HTCondor pools with

Aside: catching up

If you’ve had any delays up to this point, we have another

Docker container image with the client already installed.

Exit your current container if needed, then run:

docker run -ti connect-client-ready

You can begin the following steps from this point.


Page 30: Bring OSG pools to your Cluster. Share your HTCondor pools with

Setup the connection to OSG

[root@a15283b661aa ~]# connect setup

Please enter the user name that you created during Connect registration. Note that it consists only of letters and numbers, with no @ symbol.

You will be connecting via the connect-client.osgconnect.net server.

Enter your Connect username: rwg

Password for [email protected]:

notice: Ongoing client access has been authorized at connect-client.osgconnect.net.

notice: Use "connect test" to verify access.

[root@a15283b661aa ~]# connect test

Success! Your client access to connect-client.osgconnect.net is working.

[root@a15283b661aa ~]#


Page 31: Bring OSG pools to your Cluster. Share your HTCondor pools with

What pools can I now reach?

$ connect status


Page 32: Bring OSG pools to your Cluster. Share your HTCondor pools with

[root@a15283b661aa ~]# cd

[root@a15283b661aa ~]# tutorial quickstart

Installing quickstart (master)...

Tutorial files installed in ./tutorial-quickstart.

Running setup in ./tutorial-quickstart…

[root@a15283b661aa ~]# cd tutorial-quickstart

Try out the quickstart $ tutorial


Page 33: Bring OSG pools to your Cluster. Share your HTCondor pools with

Prepare to submit 10 jobs

[root@a15283b661aa ~]# cat tutorial03.submit

Universe = vanilla

Executable = short.sh

Arguments = 5 # to sleep 5 seconds

Error = log/job.err.$(Cluster)-$(Process)

Output = log/job.out.$(Cluster)-$(Process)

Log = log/job.log.$(Cluster)

#+ProjectName="ConnectTrain"

Queue 100

Change arg to 60 seconds

Change ConnectTrain to TechEX15

Change Queue value to 10
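The three changes can be made in any editor. As a non-authoritative sketch, they could also be scripted with sed. The example recreates the submit file in a temporary directory so that it is self-contained; the workdir and submit variable names are illustrative. The slide does not say whether the leading # on the ProjectName line should also be removed, so this sketch leaves it in place.

```shell
#!/bin/sh
# Work in a scratch directory so no existing file is overwritten.
workdir="$(mktemp -d)"
submit="$workdir/tutorial03.submit"

# The submit file as shown on the slide.
cat > "$submit" <<'EOF'
Universe = vanilla
Executable = short.sh
Arguments = 5 # to sleep 5 seconds
Error = log/job.err.$(Cluster)-$(Process)
Output = log/job.out.$(Cluster)-$(Process)
Log = log/job.log.$(Cluster)
#+ProjectName="ConnectTrain"
Queue 100
EOF

# Apply the three edits: argument 5 -> 60, project name, queue 100 -> 10.
sed -e 's/^Arguments = 5 /Arguments = 60 /' \
    -e 's/ConnectTrain/TechEX15/' \
    -e 's/^Queue 100$/Queue 10/' \
    "$submit" > "$submit.new" && mv "$submit.new" "$submit"

cat "$submit"
```

The queue listing later in the deck shows the command as "short.sh 60 # to s", which suggests the trailing comment on the Arguments line travels with the arguments; that is harmless here because short.sh reads only its first argument.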


Page 34: Bring OSG pools to your Cluster. Share your HTCondor pools with

Inspect the job script itself

[root@a15283b661aa ~]# cat short.sh

#!/bin/bash

# short.sh: a short discovery job

printf "Start time: "; /bin/date

printf "Job is running on node: "; /bin/hostname

printf "Job running as user: "; /usr/bin/id

printf "Job is running in directory: "; /bin/pwd

echo

echo "Working hard..."

sleep ${1-15}

echo "Science complete!"

Script that runs on OSG pools
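The sleep line relies on the shell's default-value expansion: ${1-15} uses the first argument if one was passed and falls back to 15 otherwise. A small sketch of that behavior, using a hypothetical function name sleep_seconds that is not part of the tutorial:

```shell
#!/bin/sh
# Mirrors the ${1-15} expansion in short.sh: use $1 if it is set, else 15.
sleep_seconds() {
    echo "${1-15}"
}

sleep_seconds       # no argument: falls back to 15
sleep_seconds 60    # argument given: 60
```

Because the form is ${1-15} rather than ${1:-15}, an argument that is set but empty is kept as empty instead of being replaced by the default.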


Page 35: Bring OSG pools to your Cluster. Share your HTCondor pools with

[root@a15283b661aa ~]# connect submit tutorial03.submit

+++++.+.+++

9 objects sent; 2 objects up to date; 0 errors

Submitting job(s)..........

10 job(s) submitted to cluster 4070.

Submit $ connect submit
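HTCondor replaces $(Cluster) with the cluster ID and $(Process) with the job's index within the cluster, starting at 0. For the submission above (cluster 4070, 10 jobs), the Output setting from the submit file would therefore yield ten distinct file names. The helper name expected_outputs below is hypothetical and only illustrates the naming pattern.

```shell
#!/bin/sh
# Hypothetical helper: list the Output file names that
# log/job.out.$(Cluster)-$(Process) would produce for one cluster.
expected_outputs() {
    cluster="$1"
    count="$2"
    proc=0
    while [ "$proc" -lt "$count" ]; do
        printf 'log/job.out.%s-%s\n' "$cluster" "$proc"
        proc=$((proc + 1))
    done
}

expected_outputs 4070 10
```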


Page 36: Bring OSG pools to your Cluster. Share your HTCondor pools with

Check queue

[root@a15283b661aa ~]# connect q

-- Submitter: login02.osgconnect.net : <192.170.227.251:37303> : login02.osgconnect.net

ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD

4070.0 rwg 10/8 04:28 0+00:00:00 I 0 0.0 short.sh 60 # to s

4070.1 rwg 10/8 04:28 0+00:00:00 I 0 0.0 short.sh 60 # to s

4070.2 rwg 10/8 04:28 0+00:00:00 I 0 0.0 short.sh 60 # to s

4070.3 rwg 10/8 04:28 0+00:00:00 I 0 0.0 short.sh 60 # to s

4070.4 rwg 10/8 04:28 0+00:00:00 I 0 0.0 short.sh 60 # to s

4070.5 rwg 10/8 04:28 0+00:00:00 I 0 0.0 short.sh 60 # to s

4070.6 rwg 10/8 04:28 0+00:00:00 I 0 0.0 short.sh 60 # to s

4070.7 rwg 10/8 04:28 0+00:00:00 I 0 0.0 short.sh 60 # to s

4070.8 rwg 10/8 04:28 0+00:00:00 I 0 0.0 short.sh 60 # to s

4070.9 rwg 10/8 04:28 0+00:00:00 I 0 0.0 short.sh 60 # to s

10 jobs; 0 completed, 0 removed, 10 idle, 0 running, 0 held, 0 suspended

$ connect q
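When a cluster has many jobs, a per-state tally can be easier to read than the full listing. The sketch below counts rows by the ST column; count_states is a hypothetical helper, and it assumes the column layout shown in the listing above, where the state is the sixth whitespace-separated field.

```shell
#!/bin/sh
# Hypothetical helper: count jobs per state from condor_q-style text on stdin.
# Only rows whose first field looks like "cluster.process" are counted.
count_states() {
    awk '$1 ~ /^[0-9]+\.[0-9]+$/ { n[$6]++ } END { for (s in n) print s, n[s] }'
}

# Sample rows copied from the listing above.
count_states <<'EOF'
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
4070.0 rwg 10/8 04:28 0+00:00:00 I 0 0.0 short.sh 60 # to s
4070.1 rwg 10/8 04:28 0+00:00:00 I 0 0.0 short.sh 60 # to s
EOF
```

With the client installed, the same function could presumably be fed from `connect q`, provided its output keeps this layout.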


Page 37: Bring OSG pools to your Cluster. Share your HTCondor pools with

Where did the jobs run?

$ connect histogram

or

$ connect histogram --last

Page 38: Bring OSG pools to your Cluster. Share your HTCondor pools with

Where are the results?

● Nothing was returned to the client host automatically:

[root@a15283b661aa ~]# ls

README.md log short.sh tutorial01.submit tutorial02.submit tutorial03.submit

[root@a15283b661aa ~]# ls log/

[root@a15283b661aa ~]#

● Results are sitting on the OSG Connect server


Page 39: Bring OSG pools to your Cluster. Share your HTCondor pools with

Check and bring back

On OSG Connect

On campus

$ connect shell


Page 40: Bring OSG pools to your Cluster. Share your HTCondor pools with

connect command

● connect show-projects
    show projects you have access to
● connect project
    set your accounting project
● connect status
    show condor_status in all pools
● connect q
    check progress of your job queue
● connect histogram [--last]
    shows where your jobs have been run
● connect history [clusterid.subjob]
    condor_history information for your jobs

Page 41: Bring OSG pools to your Cluster. Share your HTCondor pools with

tutorial command

$ tutorial

usage: tutorial list                  - show available tutorials
       tutorial info <tutorial-name>  - show details of a tutorial
       tutorial <tutorial-name>       - set up a tutorial

Currently available tutorials:

AutoDockVina .......... Ligand-Receptor docking with AutoDock Vina

R ..................... Estimate Pi using the R programming language

ScalingUp-R ........... Scaling up compute resources - R example

blast ................. blast sequence analysis

cp2k .................. How-to for the electronic structure package CP2K

dagman-namd ........... Launch a series of NAMD simulations via an HTCondor DAG

error101 .............. Use condor_q -better-analyze to analyze stuck jobs


Page 42: Bring OSG pools to your Cluster. Share your HTCondor pools with

tutorial command

● Tutorials are maintained in GitHub and downloaded on demand
● Each tutorial’s README is in the OSG Support site
  ○ http://osg.link/connect/userguide
  ○ http://osg.link/connect/recipes
● These are recommended for learning new techniques on OSG Connect

Page 43: Bring OSG pools to your Cluster. Share your HTCondor pools with

Help desk - search, ticket, chat


Page 44: Bring OSG pools to your Cluster. Share your HTCondor pools with

Connecting campus to national ACI


Page 45: Bring OSG pools to your Cluster. Share your HTCondor pools with

OSG for resource providers

● Connect your campus users to the OSG
  ○ connect-client: job submit client for the local cluster
  ○ Provide “burst”-like capability for HTC jobs to shared opportunistic resources
● Connect campus cluster to OSG
  ○ Lightweight connect: OSG sends “glidein” jobs to your cluster, using a simple user account
    ■ No local software or services needed!
  ○ Large scale: deploy the OSG software stack
    ■ Support more science communities at larger scale

Page 46: Bring OSG pools to your Cluster. Share your HTCondor pools with

“Quick Connect” Process

● Phone call to discuss particulars of the cluster
  ○ Does not need to be HTCondor; Slurm, PBS, and others are supported
  ○ Nodes need outbound network connectivity
● Create an osgconnect account that the OSG team uses for access

Page 47: Bring OSG pools to your Cluster. Share your HTCondor pools with


Page 48: Bring OSG pools to your Cluster. Share your HTCondor pools with

@OsgUsers
support.opensciencegrid.org
user-support@opensciencegrid.org

opensciencegrid

Page 49: Bring OSG pools to your Cluster. Share your HTCondor pools with

Extra slides


Page 50: Bring OSG pools to your Cluster. Share your HTCondor pools with

Delivered CPU hours/month


Page 51: Bring OSG pools to your Cluster. Share your HTCondor pools with

Shared Science Throughput


Page 52: Bring OSG pools to your Cluster. Share your HTCondor pools with

User approach: cluster as abstraction


Page 53: Bring OSG pools to your Cluster. Share your HTCondor pools with

OSG Connect Service

View OSG as an HTC cluster

★ Login host
★ Job scheduler
★ Software
★ Storage

Page 54: Bring OSG pools to your Cluster. Share your HTCondor pools with

Software & tools on the OSG

● Distributed software file system OASIS
● Special module command
  ○ identical software on all clusters
  ○ 170 libraries

#!/bin/bash
switchmodules oasis
module load R
module load matlab
...

Page 55: Bring OSG pools to your Cluster. Share your HTCondor pools with

Submit jobs to OSG with HTCondor

● Simple HTCondor submission
● Complexity hidden from the user
● No grid (X509) certificates required
● Uses HTCondor ClassAd and glidein technology
● DAGMan and other workflow tools

Page 56: Bring OSG pools to your Cluster. Share your HTCondor pools with

Campus ACI as hybrids of: “on premise”, national ACI, national HPC, and cloud