Page 1: NEEShub Simulation Capabilities February 17, 2012 Webinar nees/resources/4079

NEEShub Simulation Capabilities
February 17, 2012 Webinar

http://nees.org/resources/4079

George E. Brown, Jr. Network for Earthquake Engineering Simulation

Gregory Rodgers, Ph.D.
NEESComm IT

Purdue University, West Lafayette, IN

Post-webinar updates

Page 2

Webinar Introduction

Audience: Simulation tool developers and NEES power users who:

• have very large simulations, or many simulations that exceed 30 minutes of run time
• need to script parameter sweeps
• run a structural analysis with a large suite of ground motions

Prerequisite: An understanding of command line interfaces such as Linux bash

Summary: This webinar introduces advanced users to new NEEShub capabilities in the area of simulation and batch processing. Power users often write a script to orchestrate a set of simulation runs that covers many different test cases. Batch processing services have recently been added to NEEShub to make this easy and to provide access to large scratch space. Upon completion of this webinar, a user will be able to write scripts that submit one or more jobs to multiple execution venues and so use the high performance computing resources available to NEES.

Page 3

Agenda

HOUR 1
• How simulation fits into the NEES cyberinfrastructure
• Introduction to the Linux workspace tool on NEEShub
• Manual execution (command line) of applications
• Manual execution (command line) of the opensees simulator
• Use of the new batchsubmit command to run opensees
• Use of batchsubmit to run other applications
• The batchstatus command
• Demonstration of how HOME directory space is linked to scratch space
• Advanced batchsubmit options and scripting the execution of batchsubmit

HOUR 2 (advanced)
• How to build a bash command file, including editors available on Linux
• Simple parallel execution (the --ncpus argument to batchsubmit)
• Parallel opensees (how to modify sequential input to be parallel input)
• How to use batchsubmit for other venues
• Overview of the various NEES High Performance Computing (HPC) execution venues: local hub execution, osg, hansen, steele, kraken, and ranger
• How the openseeslab user interface uses batchsubmit
• Advanced batchsubmit options review
• Scratch cleanup algorithm

Page 4

NEES Cyber Infrastructure

[Diagram: components of the NEES cyberinfrastructure. Labels shown: Web Browser; A. Site Operations Tools; B. NEEShub Web Server; C. Cloud / Simulation Environment; D. The NEES Project Warehouse; E. EOT; F. Spreadsheet DBs; Hub Tool Sessions; Personal Space; Group Space; Scratch Space; Synchronees; site/personal data; NEES Web Services Server; PEN; Custom WS tools; Experiment Data; Project Editor; Resources; Collaboration; Purdue Hansen; Open Science Grid; NSF Xsede.]

Page 5

Introduction to the Linux workspace tool on NEEShub

• Start a workspace from this page: http://nees.org/resources/workspace
  Click “Launch”.
• You must be part of a special group. If you are not in this group, open a ticket stating that you need workspace access and provide justification, and we will add you to the group.
• The window can be resized and popped out of the browser.
• Multiple terminals can be opened in the same window.
• A workspace session is persistent. You can leave the browser and return to an existing workspace from the myneeshub page at any time.
• This session can also be shared with other users or administrators.

Page 6

Execution of Applications from the command line

• Simple utilities

  date             Print the date
  env              List environment variables
  ls (and ls -l)   Show list of files (long list)
  cd               Change working directory
  pwd              Show working directory
  mkdir            Make a directory
  rm (rmdir)       Remove a file (directory)
  cat              Write contents of a file to the screen
  cp               Copy a file
  man <command>    Show help about a command (man pages)
  exit             Terminate your session

• Use the arrow keys to recall previous commands

Page 7

Putting commands in a script file

• A list of commands can be put into a script
  – Avoid retyping
  – Loop through commands
  – To make the script executable, use the command:

      chmod 755 <filename>

• 3 important scripting languages to consider:

  bash     Linux commands (also csh)
  Tcl/Tk   The language for opensees
  Python   Advanced high performance language

Page 8

Manual Execution of OpenSees

• OpenSees Tcl prompt versus Linux command prompt
• Start opensees with a Tcl prompt (no argument)
• Start opensees to execute a file of Tcl commands (one argument)
• The binary OpenSees versus the wrapper shell script called opensees

    opensees <input TCL file>

The binary is spelled OpenSees. The lowercase opensees is a wrapper that sets up the environment correctly and then calls OpenSees.

Page 9

High Volume Batch on NEEShub

• Consistent and asynchronous submission to multiple venues: local, osg, hansen, steele, and kraken. Steele and kraken are part of the new XSEDE system that replaces TeraGrid.
• Asynchronous: a job is submitted and control returns to the submitter without waiting for the job to complete.
• Your run directories in $HOME/scratch will be symbolic links to a large (>30TB) shared space. Runs will be compressed or purged by a cleanup algorithm as needed.
• Only the submitting user will have access to the run directories.

Page 10

batchsubmit

• The batchsubmit command is a wrapper around any command that executes it as an asynchronous batch job.

    batchsubmit <batchsubmitoptions> command <command options>

• batchsubmitoptions begin with a double dash.
• batchsubmit prints one line of output: the name of the newly created directory where BOTH the job input is located and the job output will be found.
• The help for batchsubmit gives an example of how to run opensees:

    batchsubmit –h
    batchsubmit –h | more

• Example submissions:

    batchsubmit date
    batchsubmit opensees /apps/demo/sine/sine.tcl
    batchsubmit --appdir /apps/openseesbuild/osg OpenSees /apps/demo/sine/sine.tcl
    batchsubmit --jobtype sine --onlyinfile opensees /apps/demo/sine/sine.tcl
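When many submissions are scripted, the command shape shown on this slide (batchsubmit, then batchsubmit options, then the application command and its own options) can be assembled programmatically. The following is a minimal Python sketch, not part of batchsubmit itself. The helper name build_batchsubmit is hypothetical, option names are passed through without validation, and nothing is executed: the code only builds and prints the command line.

```python
import shlex


def build_batchsubmit(app_command, app_args=(), **options):
    """Build an argument list: batchsubmit <options> command <command options>.

    Hypothetical helper. Each keyword becomes a double-dash option;
    a value of True means the option takes no value.
    """
    argv = ["batchsubmit"]
    for name, value in options.items():
        if value is True:
            argv.append("--" + name)
        elif value is not False and value is not None:
            argv.extend(["--" + name, str(value)])
    argv.append(app_command)
    argv.extend(app_args)
    return argv


# Rebuild the last example from the slide.
argv = build_batchsubmit(
    "opensees", ["/apps/demo/sine/sine.tcl"], jobtype="sine", onlyinfile=True
)
command_line = shlex.join(argv)
print(command_line)
```

Building a list first and joining it with shlex.join keeps file names that contain spaces or special characters correctly quoted when the line is later handed to a shell.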

Page 11

Input Processing

• Default: The first argument after the application command is considered an input file. All files from the input file's directory are copied to the scratch run directory. Two other options:

  --onlyinfile   Only copy the input file.
  --rcopyindir   Recursively copy all files and directories from the same directory as the input file.

• Note: The input file is not allowed to be in your home directory unless --onlyinfile is specified. You should create a directory for your opensees Tcl file. A separate directory for each simulation is recommended.

• What if you have an application where the first argument is NOT the input file (unlike opensees)?

  --infilearg    Indicates which argument is the input file.
  --infile       Use this file as the input file when the file is implied by the application command and is therefore not one of its arguments.
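The three copy behaviours described on this slide can be compared on a throwaway directory. The sketch below is only an illustration in Python of which files each mode would select; it is not how batchsubmit is implemented. The function name files_to_copy, the mode strings, and the sample file names are assumptions, and the default mode is assumed to be non-recursive because a separate recursive option exists.

```python
import tempfile
from pathlib import Path


def files_to_copy(input_file, mode="default"):
    """Return the files that would be staged for a given copy mode (illustrative)."""
    input_file = Path(input_file)
    indir = input_file.parent
    if mode == "onlyinfile":
        # Only the input file itself.
        return [input_file]
    if mode == "rcopyindir":
        # Everything under the input file's directory, including subdirectories.
        return [p for p in indir.rglob("*") if p.is_file()]
    if mode == "default":
        # Files directly in the input file's directory (assumed non-recursive).
        return [p for p in indir.iterdir() if p.is_file()]
    raise ValueError(f"unknown mode: {mode}")


def demo():
    """Create a small simulation directory and list what each mode selects."""
    with tempfile.TemporaryDirectory() as tmp:
        simdir = Path(tmp) / "sine"
        (simdir / "records").mkdir(parents=True)
        (simdir / "sine.tcl").write_text("# model\n")
        (simdir / "notes.txt").write_text("notes\n")
        (simdir / "records" / "gm1.AT2").write_text("0.0\n")
        infile = simdir / "sine.tcl"
        return {
            mode: sorted(p.relative_to(simdir).as_posix()
                         for p in files_to_copy(infile, mode))
            for mode in ("default", "onlyinfile", "rcopyindir")
        }


result = demo()
for mode, names in result.items():
    print(mode, names)
```

The comparison shows why one directory per simulation is recommended: in the default mode, every file sitting next to the input file travels with the job.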

Page 12

batchsubmit files/dirs

• Job input exists in a new scratch directory upon completion of the batchsubmit command. There is one scratch directory for each batchsubmit command (each job). The directory name follows this template:

    $HOME/scratch/<jobtype>/<jobname>/

• Job output exists when the job is completed.
  – You will get an email when the job starts and when it completes unless you specify --nonotify.

• Review the various output files generated in a job run directory:

  <jobname>.stdout    Standard output: what would have been printed to the screen.
  <jobname>.stderr    Standard error.
  The run directory   Has the same directory name as the directory where the input file was found. Note: your input file is in this directory.
  @STATUS             The job status, as stored by the system.
  joblog              Information about the environment in which the job was run.
  .log                Statistics recorded about this job.
  .born_on_date       Used for scratch cleanup.
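The directory template and the file names on this slide are enough to predict where a job's output will appear. Here is a small Python sketch under stated assumptions: the helper names are hypothetical, and the HOME path, job type, and job name in the example are made-up values (real job names are generated by batchsubmit).

```python
from pathlib import PurePosixPath


def job_directory(home, jobtype, jobname):
    """Apply the template $HOME/scratch/<jobtype>/<jobname>/."""
    return PurePosixPath(home) / "scratch" / jobtype / jobname


def output_files(home, jobtype, jobname):
    """Paths of the per-job files named on the slide."""
    jobdir = job_directory(home, jobtype, jobname)
    return {
        "stdout": jobdir / f"{jobname}.stdout",
        "stderr": jobdir / f"{jobname}.stderr",
        "status": jobdir / "@STATUS",
    }


# Hypothetical values for illustration only.
files = output_files("/home/neeshub/alice", "sine", "job1")
for label, path in files.items():
    print(label, path)
```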

Page 13

Job lifecycle

• The system uses the file @STATUS to store the job status.
• States:

  Presubmit   Only for remote venues.
  Submitted   Waiting to start. Only for remote venues.
  Started     Application is running. Remote venues will actually update this file.
  Completed   All results are returned.
  Deleted     Job has been removed from the shared scratch space but your scratch directory still shows it.
  Saved       Job was moved to your HOME directory. The symbolic link to shared scratch space is gone. A saved job counts against your quota.
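A script that waits on jobs needs to map the contents of @STATUS onto these six states. The Python sketch below is a guess at how that could look. The slide does not show the file's format, so the assumption that the state name is the first word in the file is hypothetical, as are the names JobState, parse_status, and is_finished.

```python
from enum import Enum


class JobState(Enum):
    PRESUBMIT = "Presubmit"
    SUBMITTED = "Submitted"
    STARTED = "Started"
    COMPLETED = "Completed"
    DELETED = "Deleted"
    SAVED = "Saved"


# States the slide says occur only on remote venues.
REMOTE_ONLY_STATES = {JobState.PRESUBMIT, JobState.SUBMITTED}

# States in which the application is no longer pending or running.
FINISHED_STATES = {JobState.COMPLETED, JobState.DELETED, JobState.SAVED}


def parse_status(text):
    """Map @STATUS text to a JobState, assuming the state is the first word."""
    words = text.split()
    if not words:
        raise ValueError("empty status text")
    return JobState(words[0].capitalize())


def is_finished(state):
    return state in FINISHED_STATES


state = parse_status("Completed\n")
print(state.value, is_finished(state))
```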

Page 14

batchstatus and batchcancel

• Other batchsubmit utilities:

  batchstatus   Shows the status of each of your jobs.
  batchcancel   Cancel a job. This command is not released yet.
  batchsave     Remove a job from scratch space and save it to your HOME directory space.

Page 15

NEEShub Disk Space

• NEEShub data locations:
  HOME space
  Group space
  Scratch space
  Warehouse

• Use synchronees to upload and download between your workstation and NEEShub spaces.

• Advice: Use relative names for input and output files so your job can run on venues other than “local”.

[Diagram: workstation data connected to the NEEShub spaces. Labels shown: Synchronees; webdav; batchsubmit; HOME Space; Group Space; Scratch Space. Paths shown: /home/neeshub/<youruserid>, /data/groups/<groupname>, $HOME/scratch, /nees/home/<PROJ-DIR>.]

Page 16

Advanced batchsubmit options

--wait     Only for venues local and osg. This option holds the completion of batchsubmit until the job is COMPLETED. Standard output and standard error are printed on the screen.

--appdir   A directory containing the application, with bin and lib subdirectories. The app_command must be in the appdir/bin subdirectory. Both the bin and lib directories are sent to the execution machine for every run, so be careful not to specify a large installed application directory. This option eliminates the need to install apps on venues other than local. See your application provider.

--envars   List of environment variable names separated by commas. Only specify names here. Values must be set before calling the batchsubmit command, which allows them to contain special characters. For local execution, all environment variables are passed to the job.
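Because --envars takes names only, a submission script has to make sure each named variable is already set before it calls batchsubmit. The Python sketch below shows one way to check that and build the option. The helper name envars_option and the variable names in the example are hypothetical.

```python
import os


def envars_option(names, environ=None):
    """Return ["--envars", "A,B,..."] after checking every name is set.

    Values are never placed on the command line; only names are listed,
    as the option description requires.
    """
    environ = os.environ if environ is None else environ
    missing = [name for name in names if name not in environ]
    if missing:
        raise KeyError("set these variables before submitting: " + ", ".join(missing))
    return ["--envars", ",".join(names)]


# Hypothetical variables; the second value contains spaces and a dollar sign.
example_env = {"GM_DIR": "GM", "RUN_LABEL": "sweep $1 of 4"}
option = envars_option(["GM_DIR", "RUN_LABEL"], example_env)
print(option)
```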

Page 17

HOUR 2 Agenda

• Simple parallel execution (the --ncpus argument to batchsubmit)
• Parallel opensees (how to modify sequential input to be parallel input)
• How to use batchsubmit for other venues
• Overview of the various NEES High Performance Computing (HPC) execution venues:

  local               Use for testing small jobs less than 4 hours   ncpus<16
  osg                 Use for many moderate size jobs                ncpus=1
  hansen              Use for large parallel jobs                    ncpus<=48
  steele              Use for many parallel jobs                     ncpus<=8
  kraken and ranger   (pending)

• Advanced batchsubmit options and scripting the execution of batchsubmit
• Building bash scripts to save typing
• Scratch cleanup algorithm

Page 18

Simple Parallel Execution

--ncpus <value>

• This option causes your application command to execute <value> times in parallel. Example:

    batchsubmit --ncpus 4 date

• What good is it to run the same thing ncpus times? None, unless your application is aware that it is running in parallel.
• A parallel-aware application does only 1/N of the work, knowing that the other processors will do the other parts.
• It is not hard to make your application parallel aware, especially with a scripting language like Tcl.

Page 19

Simple Parallel Execution

Example: Run the same model through 27 ground motions. We want to divide the ground motions among 8 processors, PID = 0..7

  P0   P1   P2   P3   P4   P5   P6   P7
   0    1    2    3    4    5    6    7
   8    9   10   11   12   13   14   15
  16   17   18   19   20   21   22   23
  24   25   26

/* If PID is the processor number, then this can be run on all 8 processors */
For count = 0 to 26
    if (count % 8) == PID then    /* % gets the remainder from division */
        Do analysis for ground motion #count.
    else
        skip
    end
end
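The round-robin split in the pseudocode can be checked by computing the assignment for every processor at once. This is a small Python sketch of the same remainder test. The function name assign_round_robin is hypothetical, and a real run would have each processor evaluate only its own PID rather than all eight.

```python
def assign_round_robin(num_items, num_procs):
    """Map each processor ID to the item numbers it should analyze.

    Item `count` goes to the processor whose ID equals count % num_procs.
    """
    return {
        pid: [count for count in range(num_items) if count % num_procs == pid]
        for pid in range(num_procs)
    }


# 27 ground motions divided among 8 processors, as in the table.
assignment = assign_round_robin(27, 8)
for pid, motions in assignment.items():
    print(f"P{pid}: {motions}")
```

Processors 0 to 2 each receive four ground motions and processors 3 to 7 receive three, so no processor carries more than one extra analysis.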

Page 20

set pid [getPID]
set numP [getNP]
set count 0;
source ReadRecord.tcl
set g 384.4
foreach scaleFactor {0.25 0.5 0.75 1.0} {
    foreach gMotion [glob -nocomplain -directory GM *.AT2] {
        if {[expr $count % $numP] == $pid} {
            source model.tcl
            source analysis.tcl
            set ok [doGravity]
            loadConst -time 0.0
            if {$ok == 0} {
                set gMotionName [string range $gMotion 0 end-4 ]
                ReadRecord ./$gMotionName.AT2 ./$gMotionName$scaleFactor.dat dT nPts
                timeSeries Path 1 -filePath $gMotionName$scaleFactor.dat -dt $dT -factor [expr $g*$scaleFactor]
                if {$nPts != 0} {
                    recorder EnvelopeNode -file $gMotionName$scaleFactor.out -node 3 4 -dof 1 2 3 disp
                    doDynamic $dT $nPts
                    file delete $gMotionName$scaleFactor.dat
                    if {$ok == 0} {
                        puts "$gMotionName with factor: $scaleFactor OK"
                    } else {
                        puts "$gMotionName with factor: $scaleFactor FAILED"
                    }
                } else {
                    puts "$gMotion - NO RECORD"
                }
            }
            wipe
        }
        incr count 1;
    }
}

Yellow highlighted code is possible in OpenSeesMP. You can remove the yellow and run in OpenSees, but it will take much longer. The value of numP will be the --ncpus value provided to batchsubmit.

Page 21

How to use batchsubmit for other execution venues

--venue hansen | steele | osg

Future values will include kraken and ranger.

Note: The batchsubmit options --nn and --ppn are not yet functional. In the future, they will allow extremely large values of --ncpus: --ncpus will be the product of --nn and --ppn.

--mpiargs   Specifies additional arguments to mpirun. Wrap these arguments in single quotes. Typically no additional MPI arguments are needed.

Page 22

Sample parallel jobs

To save typing, I created the following scripts:

/apps/demo/bin/ex1
/apps/demo/bin/ex2

The above will just print the batchsubmit examples but not run them. The following scripts will print and run the commands:

/apps/demo/bin/ex1.sh
/apps/demo/bin/ex2.sh

Let's take time to study these examples.

Page 23

Venue guidelines

Venue     Guidance                                       --ncpus
--------- ---------------------------------------------- -------------
local     Use for testing small jobs less than 4 hours   --ncpus<16
osg       Use for many moderate size jobs.               --ncpus=1
hansen    Use for large parallel jobs                    --ncpus<=48
steele    Use for many parallel jobs                     --ncpus<=8

• Future venues to include kraken and ranger.
• Xsede (formerly TeraGrid) venues are steele, kraken, and ranger. Xsede and hansen use PBS for job submission. PBS job submission is automated by batchsubmit.
• This batchsubmit option can change the PBS queue: --xdqueue
  The default queue for steele is "standby". The default queue for hansen is "nees".
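A submission script can check a requested --ncpus value against the venue table before calling batchsubmit. The Python sketch below encodes only the four limits listed above. The names MAX_NCPUS and check_ncpus are hypothetical, and the limits may have changed since the webinar.

```python
# Largest --ncpus value per venue, taken from the guideline table:
# local <16, osg =1, hansen <=48, steele <=8.
MAX_NCPUS = {"local": 15, "osg": 1, "hansen": 48, "steele": 8}


def check_ncpus(venue, ncpus):
    """Return True if ncpus is within the guideline for the venue."""
    if venue not in MAX_NCPUS:
        raise ValueError(f"no guideline listed for venue: {venue}")
    return 1 <= ncpus <= MAX_NCPUS[venue]


for venue, ncpus in [("local", 16), ("osg", 1), ("hansen", 48), ("steele", 9)]:
    print(venue, ncpus, check_ncpus(venue, ncpus))
```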

Page 24

Advanced batchsubmit options

--jnpref    Job name prefix. The default is "job". Blanks are not allowed. The environment variable JNPREF will also override this. Try export JNPREF="run_" before batchsubmit.

--jobname   Specify the jobname and override the autoincrement-generated jobname. Using this option is not recommended, to avoid jobname collisions. However, if a collision occurs with an existing scratch dir, batchsubmit will create a new directory.

--xdqueue   Queue for xsede machines (steele or hansen). The default queue for steele is "standby". The default queue for hansen is "nees".

Page 25

Building bash scripts

• Commands can be stored in a file, and these files can be executed.
• A file can be “sourced” or executed.
• We recommend that you store your personal scripts in $HOME/bin.
• Text editors available on NEEShub:
  gedit
  nano
  vi

Page 26

Scratch Cleanup Algorithm

IF used < 75% THEN EXIT, report no activity required
Delete all jobs > 1yr old, log action
FOR ACTION = compress, archive (phase 1, phase 2)
    FOR T = 6m, 5m, 4m, 3m, 2m, 4w, 3w (pass 1, pass 2, …)
        FOR X = 5, 10, 20, 40, ALL
            Calculate set of top X users of scratch space
            FOR SIZE = 128G, 32G, 8G, 2G, 512M, 128M, 32M
                FOREACH rundirectory
                    IF rundirectory size > SIZE AND
                       rundirectory is owned by X AND
                       rundirectory lifetime > T
                    THEN ACTION rundirectory, log action
                IF used < 50% THEN
                    EXIT, report SIZE, X, T, A thresholds
IF used > 50% THEN report policy failure and revise policy

Values in red are policy parameters that can be revised by management as needed
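The nested loops above can be exercised on a toy set of run directories. What follows is a simplified Python sketch, not the production cleanup code. Several things are assumptions made only for illustration: months and weeks are approximated in days, compression is assumed to halve a directory's size, phase 2 is treated as deletion (as the lemmas slide describes it), and all names (RunDir, cleanup, and so on) are hypothetical.

```python
from dataclasses import dataclass

MB, GB, TB = 2**20, 2**30, 2**40
AGES_DAYS = [180, 150, 120, 90, 60, 28, 21]      # 6m, 5m, 4m, 3m, 2m, 4w, 3w
TOP_USERS = [5, 10, 20, 40, None]                # None stands for ALL users
SIZES = [128 * GB, 32 * GB, 8 * GB, 2 * GB, 512 * MB, 128 * MB, 32 * MB]
COMPRESSION_RATIO = 0.5                          # assumed, not from the slide


@dataclass
class RunDir:
    owner: str
    size: int          # bytes
    age_days: int
    compressed: bool = False
    deleted: bool = False


def used_fraction(rundirs, capacity):
    return sum(r.size for r in rundirs if not r.deleted) / capacity


def top_users(rundirs, n):
    """Owners ranked by total scratch usage; n=None returns all owners."""
    totals = {}
    for r in rundirs:
        if not r.deleted:
            totals[r.owner] = totals.get(r.owner, 0) + r.size
    ranked = sorted(totals, key=totals.get, reverse=True)
    return set(ranked if n is None else ranked[:n])


def apply_action(action, r, log):
    if action == "compress":
        if r.compressed:
            return                                # already compressed once
        r.size = int(r.size * COMPRESSION_RATIO)
        r.compressed = True
    else:
        r.deleted = True
    log.append((action, r.owner, r.age_days))


def cleanup(rundirs, capacity):
    """Return (status, log) after one cleanup run."""
    log = []
    if used_fraction(rundirs, capacity) < 0.75:
        return "no activity required", log
    for r in rundirs:
        if r.age_days > 365:
            apply_action("delete", r, log)
    for action in ("compress", "delete"):         # phase 1, phase 2
        for min_age in AGES_DAYS:                 # pass 1, pass 2, ...
            for n in TOP_USERS:
                owners = top_users(rundirs, n)
                for min_size in SIZES:
                    for r in rundirs:
                        if (not r.deleted and r.size > min_size
                                and r.owner in owners and r.age_days > min_age):
                            apply_action(action, r, log)
                    if used_fraction(rundirs, capacity) < 0.50:
                        return "done", log
    return "policy failure", log


# Toy example: one old 20 TB run and one recent 5 TB run on a 32 TB scratch.
dirs = [RunDir("a", 20 * TB, 200), RunDir("b", 5 * TB, 10)]
status, log = cleanup(dirs, 32 * TB)
print(status, log)
```

In the toy run, compressing the single old directory brings usage under 50%, so the algorithm exits in its first pass and never touches the recent job.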

Page 27

GC Algorithm Lemmas

• No jobs < 3 weeks old will ever be deleted or compressed without a policy change.
• Very small jobs (< 32MB compressed) will never be deleted by the system.
  – Worst case: 500,000 32MB jobs would consume 50% of a 32TB scratch.
• Process largest to smallest jobs for a fixed set of users and older than a specific age (inner loop)
• Process sets of large users with jobs older than a specific age (middle loop)
• Outer loop
  – Pass 1: Process jobs > 6 months
  – Pass 2: Process jobs > 5 months
  – …
• No jobs are deleted until all jobs > 3 weeks old and > 32MB have been compressed. Compression is phase 1, deletion is phase 2.
• Example report stream:
  – Day 1: No activity, 74% used
  – Day 2: > 2GB, Top 10 users, > 3 months old, compressed, 50% used
  – Day 3: > 32MB, All users, > 3 weeks old, compressed, 50% used (closest call to deletion)
  – Day 4: > 32GB, Top 5 users, > 2 months old, deleted, 45% used
  – Day 5: No activity, 65% used
  – Day X-1: > 2GB, All users, > 3 weeks old, deleted, 50% used (close to policy failure)
  – Day X: > 32M, All users, > 3 weeks old, deleted, 60% used, POLICY FAILURE. Policy parameters need adjustment.
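The worst-case figure in the lemmas can be verified with a few lines of arithmetic. This Python check assumes decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes); the slide does not say which unit convention it uses.

```python
MB = 10**6
TB = 10**12

small_jobs = 500_000
job_size = 32 * MB            # largest job the system never deletes
scratch_size = 32 * TB

total = small_jobs * job_size             # 16 TB of undeletable small jobs
fraction = total / scratch_size

print(f"{total / TB:.0f} TB, {fraction:.0%} of scratch")
```

So half a million of these small jobs are enough to hold scratch at the 50% target on their own, which is the point at which the policy can no longer succeed without a parameter change.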

Page 28

Topics not covered in this webinar

• Use of batchsubmit to build a user interface
• Use of pegasus for workflow management
  – This is in development and test. A single pegasus job can submit many jobs that have inter-job dependencies.
• Creation of an appdir for portable applications. The only functional appdir today is /apps/openseesbuild/osg
• Modification of OpenSees source to create a personal copy of OpenSees with custom materials and models.
  – Process in development with Prof. Elwood’s graduate student.