03 sge training
TRANSCRIPT
-
8/9/2019 03 SGE Training
Introduction to Sun Grid Engine (SGE)
-
What is SGE?
• Sun Grid Engine (SGE) is an open source community effort to facilitate the adoption of distributed computing solutions, sponsored by Sun Microsystems
  – Features:
    • Automatic computing resource selection
    • Resource accounting
    • Support for parallel computing (MPI)
    • Support for grid computing
-
SGE Job Management
-
Job management in SGE
1. Each user submits jobs to the SGE scheduler. There is no need to wait for a job to finish.
2. SGE chooses the node(s) on which to run the job.
3. The job's output and errors are placed in an output file and an error file.
-
SGE Architecture & Components
-
SGE Components
• Host types
  – Master Host
    • Controls all jobs
    • Runs on the frontend node
  – Execution Host
    • Host that computes the job(s)
    • Runs on a compute node
  – Submit Host
    • Where users log in and submit their jobs
    • In ROCKS, the frontend is also the Submit Host
  – Administrative Host
    • Where the admin logs in and performs administrative tasks on SGE
    • Also the frontend in ROCKS
-
SGE Components
• SGE software components
  – sge_commd - Communication daemon. Centralizes all communication. Runs on all nodes
  – sge_qmaster - Entry point for all commands (qsub, qstat, etc.). Runs on the Master Host (frontend)
  – sge_execd - Execution daemon. Runs only on remote computing resources, i.e. on Execution Hosts (compute nodes)
  – SGE utilities (qsub, qdel, qstat, etc.) - Utility commands for user job submission and statistics. Installed on the Submit Host and Administrative Host only
-
SGE Components
• Queue
  – A container for a class of jobs allowed to execute on a host concurrently
  – A queue determines job types
    • CPU (itanium.q, xeon.q)
    • Memory (himem.q)
    • Time (short.q, long.q)
    • Licenses (Fluent.q)
  – No need to submit a job to a particular queue!
    • You only need to specify your job requirements
      – OS, software, memory
    • SGE will dispatch the job to a suitable queue on a lightly loaded host
  – ROCKS automatically sets up queues for you!
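Job requirements are normally passed to qsub with its -l option. The following is a minimal sketch, assuming the resource attributes "arch" and "mem_free" are requestable at your site; the job script name "reqjob" and the values shown are illustrative and not taken from the slides.

```shell
#!/bin/sh
# Hypothetical job script, used only for this illustration.
cat > reqjob <<'EOF'
#!/bin/sh
echo "running on $(uname -n)"
EOF

# Describe what the job needs instead of naming a queue.
# The attribute names and accepted values depend on the site configuration.
REQUEST="arch=lx24-amd64,mem_free=1G"

if command -v qsub >/dev/null 2>&1; then
    qsub -l "$REQUEST" reqjob
else
    echo "qsub not found; would run: qsub -l $REQUEST reqjob"
fi
```

SGE then looks for a queue instance whose host satisfies the request; if none matches, the job stays pending.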
-
Basic SGE Commands
• qsub - Job submission
• qstat - View job statistics
• qdel - Delete a job from the queue
• qhost - Show the hosts currently online
• qalter - Alter job parameters
-
Basic Job Submission
• NOTE: You must submit jobs as an ordinary (non-root) user!
• Example: Create a simple “Job Script” to submit the job
    #!/bin/sh
    date
    echo "Hello world"
• Save it to a file named “simplejob”
• Then submit the job using
  – qsub simplejob
-
Basic job submission (con’t)
• The job ID is shown after the job is submitted
• After the job has finished, its output is placed in “simplejob.o<job_id>” and its errors in “simplejob.e<job_id>”
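The submit-and-inspect cycle can be sketched as follows. This is a minimal sketch: it recreates the “simplejob” script, submits it when qsub is available, and otherwise runs it directly so you can preview what the output file will contain. It assumes the default file naming (job name, then ".o" or ".e", then the job ID) and the home directory as the output location.

```shell
#!/bin/sh
# Recreate the "simplejob" script.
cat > simplejob <<'EOF'
#!/bin/sh
date
echo "Hello world"
EOF

if command -v qsub >/dev/null 2>&1; then
    qsub simplejob
    # After the job has finished, look for the default output files:
    #   simplejob.o<job_id>  standard output
    #   simplejob.e<job_id>  standard error
    ls -l "$HOME"/simplejob.o* "$HOME"/simplejob.e* 2>/dev/null
else
    # Without SGE, run the script directly to preview its output.
    sh simplejob
fi
```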
-
Job statistics
• Now create another job script called “simplejob2” with the following content
    #!/bin/sh
    date
    echo "sleep 1000 seconds"
    sleep 1000
• Submit the job
    qsub simplejob2
-
Job statistics (con’t)
• Now, let’s see the status of our job with “qstat”
• State “qw” means the job is waiting in the queue (SGE is allocating a node for the job). Now try “qstat” again
• State “t” means the job is starting. “r” means the job is running
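The state codes above can be kept at hand with a small helper. This is only an illustrative sketch: the function name describe_state is hypothetical, and it covers just the three states mentioned here.

```shell
#!/bin/sh
# describe_state: translate a qstat state code into a short description.
describe_state() {
    case "$1" in
        qw) echo "waiting in the queue" ;;
        t)  echo "starting" ;;
        r)  echo "running" ;;
        *)  echo "other state: $1" ;;
    esac
}

describe_state qw
describe_state t
describe_state r

# If SGE is installed, list only the current user's jobs.
if command -v qstat >/dev/null 2>&1; then
    qstat -u "$(id -un)"
fi
```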
-
Job statistics (con’t)
• Important fields in job statistics
  – Job ID - the ID number of the job
  – Name - job script name
  – User name - owner of the job
  – State - job state
  – Queue - queue name (in ROCKS, it is usually a node name)
-
Job deletion
• Use “qstat” to see the job ID of “simplejob2”
• Now, let’s delete the job with
    qdel <job_id>
-
Job deletion (con’t)
• Job output and errors (up to the point where the job was killed) will be placed in simplejob2.o<job_id> and simplejob2.e<job_id>
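Instead of reading the job ID off “qstat” by hand, a script can take it from qsub's confirmation message. This is a sketch under one stated assumption: that the message has the form `Your job 42 ("simplejob2") has been submitted`, so the ID is the third word. The helper name job_id_from_message is hypothetical.

```shell
#!/bin/sh
# job_id_from_message: extract the job ID (third word) from qsub's
# confirmation line. Assumes the message format described above.
job_id_from_message() {
    echo "$1" | awk '{ print $3 }'
}

if command -v qsub >/dev/null 2>&1 && [ -f simplejob2 ]; then
    message=$(qsub simplejob2)
    job_id=$(job_id_from_message "$message")
    echo "submitted job $job_id"
    qdel "$job_id"
else
    # Without SGE, demonstrate the parsing on a sample message.
    job_id_from_message 'Your job 42 ("simplejob2") has been submitted'
fi
```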
-
What is a Job Script?
• A job script is a shell script that describes the job
  – The program command
  – Some job parameters (a.k.a. “qsub” options)
  – May include the command to start a parallel job (such as “mpirun”)
-
More on job submission
• Let’s see what we can do on job submission
• Create a directory named “myproject” then cd to that directory
  – mkdir myproject
  – cd myproject
• Then, create a program “myprog.c” with the following content
• Compile this program into “myprog”
  – gcc myprog.c -o myprog
-
More on job submission (con’t)
• Now let’s create a job script “advancejob”
• Note the “./myprog” line
-
More on job submission (con’t)
• Now, try submitting the job with the same command
    qsub advancejob
• Now, let’s see the output
-
More on job submission (con’t)
• By default, SGE runs the job in the user’s home directory
• The output and error files are also placed in the user’s home directory
• You need to supply “-cwd”, “-o”, and “-e” to fix this problem
  – -cwd - Run the job from the current working directory (the directory where qsub was invoked)
  – -o, -e - Specify the output and error file names (instead of xx.{o,e})
-
More on job submission (con’t)
• Now let’s submit the job again with the following command
    qsub -cwd -o ./advancejob.out -e ./advancejob.err advancejob arg1 arg2 arg3
  – NOTE: You can pass arguments to the job script, as with “arg1 arg2 arg3” in this example
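Arguments given after the script name reach the job script as ordinary positional parameters. The sketch below uses a hypothetical script named “argjob” (it is not the “advancejob” script from the slides) that prints its working directory and arguments, so the effect of “-cwd” and of the arguments is visible in the output file.

```shell
#!/bin/sh
# Hypothetical job script that reports where it runs and what it received.
cat > argjob <<'EOF'
#!/bin/sh
echo "working directory: $(pwd)"
echo "number of arguments: $#"
for arg in "$@"; do
    echo "argument: $arg"
done
EOF

if command -v qsub >/dev/null 2>&1; then
    qsub -cwd -o ./argjob.out -e ./argjob.err argjob arg1 arg2 arg3
else
    # Without SGE, run the script directly with the same arguments.
    sh argjob arg1 arg2 arg3
fi
```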
-
More job options
• qsub -N theadvancejob -a 03121500 -cwd -S /bin/sh -o advance.out -j y advancejob arg1 arg2 arg3
  – -N - specify the job name
  – -a - specify the job start date and time ([YY]MMDDHHMM[.ss])
  – -S - specify the shell interpreter for the job script
  – -j y - merge standard error into the output file (advance.out in this case)
• Try to submit the job and see the result!
-
Placing job options in the script
• You can specify job options in the job script by prefixing the line with “#$”
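A minimal sketch of such a script, reusing the options from the earlier qsub command line. The script name “embeddedjob” is hypothetical. The shell treats the “#$” lines as plain comments, so the same file also runs unchanged outside SGE.

```shell
#!/bin/sh
# Write a job script whose qsub options are embedded as "#$" lines.
cat > embeddedjob <<'EOF'
#!/bin/sh
#$ -N theadvancejob
#$ -cwd
#$ -S /bin/sh
#$ -o advance.out
#$ -j y
date
echo "arguments: $*"
EOF

if command -v qsub >/dev/null 2>&1; then
    # No options needed on the command line; qsub reads them from the script.
    qsub embeddedjob arg1 arg2 arg3
else
    sh embeddedjob arg1 arg2 arg3
fi
```

Options given on the qsub command line take precedence over the ones embedded in the script.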
-
Altering the job
• You can alter job parameters after the job has been queued
• Only some parameters can be altered after the job has been launched!
• Use the “qalter” command to alter a job, with the same arguments and options as “qsub”
-
Altering the job parameter
• Please consult the man page (man qalter) for the list of options that can be altered after the job has launched (in the “t” or “r” state)
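As an illustration, renaming a queued job might look like the sketch below. The job ID 42 and the name “renamedjob” are placeholders; take the real ID from “qstat”.

```shell
#!/bin/sh
JOB_ID=${1:-42}             # placeholder job ID
NEW_NAME=${2:-renamedjob}   # placeholder job name

if command -v qalter >/dev/null 2>&1; then
    # Same -N option as qsub, followed by the ID of the job to change.
    qalter -N "$NEW_NAME" "$JOB_ID"
    qstat
else
    echo "qalter not found; would run: qalter -N $NEW_NAME $JOB_ID"
fi
```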
-
Job suspension
• You can suspend a job at any time
  – Suspending a queued job stops that job from being launched
• When to suspend a job?
  – You need to run another, more important job, but the old job consumes all resources
  – The admin wants to suspend some jobs because they consume too many resources on the system
-
Job suspension (con’t)
• Use the “qhold” command to hold a job
  – qhold <job_id>
• Use the “qrls” command to release a held job
  – qrls <job_id>
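A hold-and-release round trip can be sketched as follows. The job ID 42 is a placeholder; use the ID of one of your own pending jobs.

```shell
#!/bin/sh
JOB_ID=${1:-42}   # placeholder job ID; take a real one from qstat

if command -v qhold >/dev/null 2>&1; then
    qhold "$JOB_ID"   # the job will not be launched while it is held
    qstat             # a held pending job is listed in state "hqw"
    qrls "$JOB_ID"    # release the hold; the job can be scheduled again
    qstat
else
    echo "SGE commands not found; would run: qhold $JOB_ID, then qrls $JOB_ID"
fi
```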
-
The “qhost” command
• You can use the “qhost” command to see the online nodes in SGE
  – qhost
  – Try supplying the -j option and see what happens (try it after submitting some jobs)
-
“qmon”: SGE in Graphics Mode
• The previous sections introduced using SGE via the command line
• We can also use SGE comfortably through a Graphical User Interface (GUI), qmon
• Among the facilities provided by qmon are submitting jobs, managing jobs, managing hosts, and managing job queues
-
Running qmon
• X-Windows is required by qmon to provide the GUI
• Start X-Windows with “startx”
• Start qmon with “qmon”
-
Submitting a Job via QMON
• Click the job submission icon; the submit job window will appear
-
Job Control via QMON
• Click the job control icon to view job status and control jobs
-
Queue Control
• A compute node usually has only one queue, but you can add more queues or remove existing queues
• Slot management
  – The number of slots is the capacity of a queue: how many jobs it can run concurrently
  – A common setting is “number of slots of a queue = number of processors of the compute node”
-
Queue Control via SGE
• Click the queue control icon to control queues
-
Queue Control via SGE (Cont…)
• This icon represents a queue named ‘compute0’ prepared for a host named ‘comp-pvfs-0-0’
• This queue consists of only one slot
• You can modify the properties of this queue by highlighting its icon and clicking the ‘Modify’ button
* Normal users cannot control queues
-
Queue Control via SGE (Cont…)
• Modify the properties of a queue
• Try to modify the number of slots
-
Lab 1: Batch scheduler
• Write a small program that calculates the multiplication table. Save the file in multab.c
  – The program takes one argument, which is the number used to generate the multiplication table
    • multab 2 - generates the multiplication table for the number 2
  – Print the multiplication table to standard output
• Use SGE to submit the job. Calculate the multiplication tables of 2 to 12
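One possible shape for the submission step, as a sketch only: it assumes you have already compiled multab.c into ./multab in the current directory, and it uses a hypothetical wrapper script “multabjob” plus illustrative output file names. Writing the program itself is left to the lab.

```shell
#!/bin/sh
# Hypothetical wrapper job script: runs ./multab with the number it is given.
cat > multabjob <<'EOF'
#!/bin/sh
./multab "$1"
EOF

# Submit one job per table, for the numbers 2 through 12.
n=2
while [ "$n" -le 12 ]; do
    if command -v qsub >/dev/null 2>&1; then
        qsub -cwd -o "multab.$n.out" -e "multab.$n.err" multabjob "$n"
    else
        echo "would run: qsub -cwd -o multab.$n.out -e multab.$n.err multabjob $n"
    fi
    n=$((n + 1))
done
```

Each job writes its own output file, so the eleven tables do not overwrite one another.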
-
The End