03 sge training

Upload: william-agudelo

Post on 01-Jun-2018

  • 8/9/2019 03 SGE Training

    1/39

    Introduction to Sun Grid Engine (SGE)

    What is SGE?
    • Sun Grid Engine (SGE) is an open source community effort to facilitate the adoption of distributed computing solutions, sponsored by Sun Microsystems.
     – Features:
        • Automatic computing resource selection
        • Resource accounting
        • Support for parallel computing (MPI)
        • Support for grid computing

    SGE Job Management

    Job Management in SGE
    1. Each user submits jobs to the SGE scheduler. There is no need to wait for a job to finish.
    2. SGE chooses the node(s) on which to run the job.
    3. The job's output and errors are written to an output file and an error file.

    SGE Architecture & Components

    SGE Components
    • Host types
     – Master Host
        • Controls all jobs
        • Runs on the frontend node
     – Execution Host
        • Host that computes the job(s)
        • Runs on a compute node
     – Submit Host
        • Where users log in and submit their jobs
        • In ROCKS, the frontend is also the Submit Host
     – Administrative Host
        • Where the admin logs in and performs administrative tasks for SGE
        • Also the frontend in ROCKS

    SGE Components
    • SGE software components
     – sge_commd - Communication daemon that centralizes all communication. Runs on all nodes.
     – sge_qmaster - Entry point for all commands (qsub, qstat, etc.). Runs on the Master Host (frontend).
     – sge_execd - Execution daemon. Runs only on remote computing resources, i.e. on the Execution Hosts (compute nodes).
     – SGE utilities (qsub, qdel, qstat, etc.) - Utility commands for user job submission and statistics. Installed on the Submit Host and Administrative Host only.

    SGE Components
    • Queue
     – A container for a class of jobs allowed to execute on a host concurrently
     – A queue determines job types
        • CPU (itanium.q, xeon.q)
        • Memory (himem.q)
        • Time (short.q, long.q)
        • Licences (Fluent.q)
     – No need to submit a job to a particular queue!
        • You only need to specify your job requirements: OS, software, memory
        • SGE will dispatch the job to a suitable queue on a lightly loaded host
     – ROCKS automatically sets up queues for you!

    Basic SGE Commands
    • qsub - Job submission
    • qstat - View job statistics
    • qdel - Delete a job from the queue
    • qhost - Show the hosts currently online
    • qalter - Alter job parameters

    Basic Job Submission
    • NOTE: You must submit jobs as an ordinary (non-root) user!
    • Example: create a simple “job script” to submit as the job
        #!/bin/sh
        date
        echo "Hello world"
    • Save it to a file named “simplejob”
    • Then submit the job using
     – qsub simplejob

    Basic Job Submission (cont.)
    • The job ID is shown after the job is submitted
    • After the job finishes, its output is placed in “simplejob.o<job ID>” and its errors in “simplejob.e<job ID>”
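The naming of the result files can be sketched in shell. This is only an illustration, assuming SGE's default pattern of job name, then ".o" or ".e", then the numeric job ID. The helper name default_job_files and the job ID 42 are made up for the example.

```shell
#!/bin/sh
# Hypothetical helper: print the default stdout and stderr file names
# for a job, assuming the pattern <job name>.o<job ID> / <job name>.e<job ID>.
default_job_files() {
    job_name=$1
    job_id=$2
    echo "${job_name}.o${job_id}"
    echo "${job_name}.e${job_id}"
}

# Example: the files to look for if "qsub simplejob" reported job ID 42.
default_job_files simplejob 42
```

The first line printed is the output file and the second is the error file.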

    Job Statistics
    • Now create another job script called “simplejob2” with the following content
        #!/bin/sh
        date
        echo "sleep 1000 seconds"
        sleep 1000
    • Submit the job
        qsub simplejob2

    Job Statistics (cont.)
    • Now, let’s see the status of our job with “qstat”
    • State “qw” means the job is waiting in the queue (SGE is allocating a node for the job). Now try “qstat” again
    • State “t” means the job is starting; “r” means the job is running
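As a quick reference, the three state codes above can be wrapped in a small lookup. This is a sketch only: describe_state is a hypothetical helper, not an SGE command, and it covers just the codes named on this slide.

```shell
#!/bin/sh
# Hypothetical helper: translate a qstat state code into the meaning
# given in the slides. Codes not covered there are reported as such.
describe_state() {
    case "$1" in
        qw) echo "waiting in the queue" ;;
        t)  echo "starting" ;;
        r)  echo "running" ;;
        *)  echo "not covered here (see man qstat)" ;;
    esac
}

for code in qw t r; do
    echo "$code: $(describe_state "$code")"
done
```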

    Job Statistics (cont.)
    • Important fields in the job statistics
     – Job ID - the job’s ID
     – Name - job script name
     – user name - owner of the job
     – state - job state
     – queue - queue name (in ROCKS, it is usually a node name)

    Job Deletion
    • Use “qstat” to see the job ID of “simplejob2”
    • Now, let’s delete the job with
        qdel <job ID>

    Job Deletion (cont.)
    • Job output and errors produced before the job was killed are placed in simplejob2.o<job ID> and simplejob2.e<job ID>.

    What is a Job Script?
    • A job script is a shell script that describes the job
     – The program command
     – Some job parameters (a.k.a. “qsub” options)
     – May include the command to start a parallel job (such as “mpirun”)

    More on Job Submission
    • Let’s see what we can do at job submission
    • Create a directory named “myproject”, then cd to that directory
     – mkdir myproject
     – cd myproject
    • Then, create a program source file “myprog.c” [program listing not reproduced]
    • Compile this program into “myprog”
     – gcc myprog.c -o myprog

    More on Job Submission (cont.)
    • Now let’s create a job script “advancejob” [script listing not reproduced]
    • Note the “./myprog” line

    More on Job Submission (cont.)
    • Now, try submitting the job with the same command
        qsub advancejob
    • Now, let’s see the output

    More on Job Submission (cont.)
    • By default, SGE runs the job in the user’s home directory
    • The output and error files are also placed in the user’s home directory
    • You need to supply “-cwd”, “-o”, and “-e” to fix this problem
     – -cwd - change to the current working directory before doing anything
     – -o, -e - specify the output and error file names (instead of xx.{o,e})

    More on Job Submission (cont.)
    • Now let’s submit the job again with the following command
        qsub -cwd -o ./advancejob.out -e ./advancejob.err advancejob arg1 arg2 arg3
     – NOTE: you can pass arguments to the job script, such as “arg1 arg2 arg3” in this example
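How the extra words on the qsub command line reach the job script can be sketched as follows. The script body is a hypothetical stand-in, not the original “advancejob”, and it assumes only that the words after the script name arrive as the script's positional parameters.

```shell
#!/bin/sh
# Hypothetical job script body: report the working directory and each
# argument that followed the script name on the qsub command line.
show_job_args() {
    echo "working directory: $(pwd)"
    echo "number of arguments: $#"
    for arg in "$@"; do
        echo "argument: $arg"
    done
}

show_job_args "$@"
```

Running the file locally with sh and three arguments is a cheap way to check the script before submitting it. With -cwd, the first line of the job's output should name the directory the job was submitted from.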

    More Job Options
    • qsub -N theadvancejob -a 03121500 -cwd -S /bin/sh -o advance.out -j y advancejob arg1 arg2 arg3
     – -N - specify the job name
     – -a - specify the job start date ([YY]MMDDHHMM[.ss])
     – -S - specify the shell interpreter for the job script
     – -j y - merge standard error into the output file (advance.out in this case)
    • Try submitting the job and see the result!

    Placing Job Options in the Script
    • You can specify job options in the job script by prefixing each option line with “#$”
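A minimal sketch of such a script follows. It reuses options from the earlier qsub command line (-N, -cwd, -S, -o, -j) around the simple date/echo job. The two helper functions and the temporary file name are hypothetical and exist only to make the example self-checking. Because each “#$” line is an ordinary shell comment, the script also runs unchanged outside SGE.

```shell
#!/bin/sh
# Write an example job script whose qsub options are embedded as "#$" lines.
# The options mirror the earlier command-line example; the script body is
# the simple date/echo job.
write_job_script() {
    cat > "$1" <<'EOF'
#!/bin/sh
#$ -N theadvancejob
#$ -cwd
#$ -S /bin/sh
#$ -o advance.out
#$ -j y
date
echo "Hello world"
EOF
}

# Print the embedded options, one per line, with the "#$" prefix removed.
list_embedded_options() {
    sed -n 's/^#\$[[:space:]]*//p' "$1"
}

script="${TMPDIR:-/tmp}/embeddedjob.$$"
write_job_script "$script"
list_embedded_options "$script"
rm -f "$script"
```

With the options embedded this way, the job would be submitted with a plain qsub followed by the script name, with no further command-line options.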

    Altering the Job
    • You can alter a job’s parameters after it has been queued
    • Only some parameters can be altered after the job has been launched!
    • Use the “qalter” command to alter a job, with the same arguments and options as “qsub”

    Altering the Job Parameters
    • Please consult the man page (man qalter) for the list of options that can be altered after the job has launched (in the “t” or “r” state)

    Job Suspension
    • You can suspend a job at any time
     – Suspending a queued job stops that job from being launched
    • When to suspend a job?
     – You need to run another, more important job, but the old job consumes all the resources
     – The admin wants to suspend some job because it consumes too many resources on the system

    Job Suspension (cont.)
    • Use the “qhold” command to hold a job
     – qhold <job ID>
    • Use the “qrls” command to release a held job
     – qrls <job ID>

    The “qhost” Command
    • You can use the “qhost” command to see the online nodes in SGE
     – qhost
     – Try supplying the -j option and see what happens (try it after submitting some jobs)

    “qmon”: SGE in Graphics Mode
    • The previous sections introduced using SGE via the command line
    • We can also use SGE comfortably via a Graphical User Interface (GUI), qmon
    • The facilities provided by qmon include submitting jobs, managing jobs, managing hosts, and managing job queues

    Running qmon
    • qmon requires X-Windows to provide its GUI
    • Start X-Windows with “startx”
    • Start qmon with “qmon”

    Submitting a Job via QMON
    • Click the job submission icon; the submit job window will appear

    Job Control via QMON
    • Click the job control icon to view job status and control jobs

    Queue Control
    • Each compute node usually has only one queue, but you can add more queues or remove existing ones
    • Slot management
     – The slots of a queue are its capacity for handling concurrent jobs
     – A common setting is “number of slots of a queue = number of processors of the compute node”
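The “slots = processors” rule of thumb can be checked on a compute node with a short sketch like the one below. The helper name suggest_slots is hypothetical, and the getconf variable _NPROCESSORS_ONLN is an assumption about the platform: it is available on Linux and macOS but is not guaranteed everywhere, so the sketch falls back to 1.

```shell
#!/bin/sh
# Hypothetical helper: suggest a slot count equal to the number of
# processors currently online on this node.
suggest_slots() {
    # _NPROCESSORS_ONLN is a common extension, not guaranteed everywhere;
    # fall back to 1 if the lookup fails or returns something non-numeric.
    n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=1
    case "$n" in
        ''|*[!0-9]*) n=1 ;;
    esac
    echo "$n"
}

echo "suggested slots for $(uname -n): $(suggest_slots)"
```

The result is only a starting value for the queue's slot count, which an administrator would still set through the queue configuration.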

    Queue Control via SGE
    • Click the queue control icon to control queues

    Queue Control via SGE (cont.)
    • This icon represents a queue named ‘compute0’ prepared for a host named ‘comp-pvfs-0-0’
    • This queue consists of only one slot
    • You can modify the properties of this queue by highlighting its icon and clicking the ‘Modify’ button
    * Normal users cannot control queues

    Queue Control via SGE (cont.)
    • Modify the properties of a queue
    • Try modifying the number of slots

    Lab 1: Batch Scheduler
    • Write a small program that calculates the multiplication table. Save the file as multab.c
     – The program takes one argument, which is the number used to generate the multiplication table
        • multab 2 - generates the multiplication table for the number 2
     – Print the multiplication table to standard output
    • Use SGE to submit the job. Calculate the multiplication tables of 2 to 12
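One possible way to organise the submission step is sketched below as a dry run. It assumes a hypothetical wrapper job script named “multabjob” that simply runs ./multab "$1", and it uses only the qsub options introduced earlier (-cwd, -N, -o, -j). The job and output file names are made up for the example.

```shell
#!/bin/sh
# Print one qsub command per multiplication table, for the numbers 2 to 12.
# "multabjob" is a hypothetical job script that runs ./multab "$1".
multab_commands() {
    n=2
    while [ "$n" -le 12 ]; do
        echo "qsub -cwd -N multab$n -o multab$n.out -j y multabjob $n"
        n=$((n + 1))
    done
}

# Dry run: show the commands. On a submit host, pipe them to sh to submit:
#   multab_commands | sh
multab_commands
```

Printing the commands first makes it easy to check the job names and output files before anything is queued.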

    The End