SQMS is a simple queueing management system developed to simplify task management in a Beowulf cluster. It was part of SCE (Scalable Cluster Environment, http://smile.cpe.ku.ac.th/research/sce). SQMS provides a set of commands that let developers in a Beowulf cluster environment submit programs to the computing node(s), query their status, and delete them. SQMS is released under an open-source license (BSD-based).
# gzip -dc sqms-1.x.x.tar.gz | tar -xvf -
or
# tar xzvf sqms-1.x.x.tar.gz
After unpacking, change into the extracted SQMS source directory and configure the build:
# ./configure
or
# ./configure --prefix=<install directory>
# make
# make install
where <install directory> is the directory where you want SQMS installed. The default is /usr/software/sqms-<version>.
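After installation, the install tree should contain at least the following directories (a sketch inferred from the paths used elsewhere in this guide; the exact contents may vary between releases):

<install directory>/
    bin/     user commands: sqsub, sqstat, sqdel
    sbin/    daemons: scheduler, loadbalancer, dispatcher
    etc/     init script (rc.d/init.d/sqms) and host list (sce/sqms_hostlist)

Next, add the bin directory to your PATH: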
export PATH=$PATH:<install directory>/sqms-1.x.x-x/bin
Example:
export PATH=$PATH:/usr/software/sqms-1.0b-2/bin
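To make this setting persistent across logins, you can append the export line to your shell startup file (assuming bash is your login shell; adjust for your shell):

$ echo 'export PATH=$PATH:/usr/software/sqms-1.0b-2/bin' >> ~/.bashrc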
After configuring SQMS, you can start it by:
# $etcdir/rc.d/init.d/sqms start
***************** Start each daemon *****************
Starting SQMS scheduler                       [ OK ]
Starting SQMS loadbalancer                    [ OK ]
Starting SQMS dispatcher                      [ OK ]
Or you may start it manually by:
# $sbindir/scheduler
# $sbindir/loadbalancer
# $sbindir/dispatcher
If you want to shut down SQMS, you can do it by:
# $etcdir/rc.d/init.d/sqms stop
or
# killall scheduler loadbalancer dispatcher
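To verify that all three daemons have actually stopped (or, conversely, that they are running), a standard process listing works; the names below are the daemons started above:

# ps ax | grep -E 'scheduler|loadbalancer|dispatcher' | grep -v grep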
sqsub - submits a job to SQMS. A job is any executable program. For example:
$ sqsub myprog 1
Standard output and standard error will be redirected to myprog.1out and myprog.1err, where 1 is the job ID. If you want to pass arguments to your program, create a shell script that wraps your program and its arguments. For example, you might create a file called 'myprog.sh' which contains:
#!/bin/sh
myprog arg1 arg2 arg3 arg4 arg5
and submit this script to SQMS by:
$ sqsub myprog.sh
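Note that the wrapper script must be executable before the computing nodes can run it (a standard Unix requirement; whether sqsub sets the permission for you is not specified here):

$ chmod +x myprog.sh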
For more information, try man sqsub
sqdel - deletes a job from SQMS. For example:
$ sqdel 1
where 1 is the job ID. For more information, try man sqdel
sqstat - shows the status of your jobs. Just type:
$ sqstat
JOBID   OWNER   COMMAND   JOBTYPE   STATUS   RUNHOSTS
28      ssy     test      local     wait
29      ssy     test      local     wait
30      ssy     test      local     wait
3       ssy     test      local     run      amata11
5       ssy     test      local     run      amata1
6       ssy     test      local     run      amata2
7       ssy     test      local     run      amata3
8       ssy     test      local     run      amata4
9       ssy     test      local     run      amata5
10      ssy     test      local     run      amata6
11      ssy     test      local     run      amata7
15      ssy     test      local     run      amata11
17      ssy     test      local     run      amata1
18      ssy     test      local     run      amata2
19      ssy     test      local     run      amata3
20      ssy     test      local     run      amata4
21      ssy     test      local     run      amata12
22      ssy     test      local     run      amata1
23      ssy     test      local     run      amata2
24      ssy     test      local     run      amata3
25      ssy     test      local     run      amata4
26      ssy     test      local     run      amata5
27      ssy     test      local     run      amata6
For more information, please try man sqstat
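Putting the three commands together, a typical session might look like the sketch below. The job ID 31 and the output file name are illustrative, following the naming pattern described under sqsub:

$ sqsub myprog.sh          # submit; suppose SQMS assigns job ID 31
$ sqstat                   # check whether job 31 is waiting or running
$ cat myprog.sh.31out      # after it finishes, inspect the captured output
$ sqdel 31                 # or cancel the job if it is no longer needed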
Normally, SQMS reads its configuration from two files:
$etcdir/sce/sqms_hostlist - This file contains the host list of the cluster, one node per line. You can weight a host by adding duplicate entries to the list. $etcdir is the etc directory that was set at compile time with the --with-etcdir option. After a fresh SCE installation, the SQMS RPM will determine the installed hostname automatically, and the SCE wizard will update sqms_hostlist to match the cluster configuration. If you wish, you can also edit sqms_hostlist by hand. This file should exist on every node in the system; otherwise, users will not be able to submit jobs from any host other than the one where SQMS was installed.

The second file is the SQMS configuration file itself. An example is shown below:
#############################################################################
# Configuration file for SQMS
# $Id: sqms.tex,v 1.5 2001/06/21 08:55:22 b40sup Exp $
#############################################################################

# Below is the current working directory of SQMS. If not set, SQMS will
# determine the working directory from the program's name.
#working_directory /usr/local/share/sqms

# Default scheduler, dispatcher, and load balancer addresses and ports.
# These daemons might run on different hosts, but the hosts should share
# the working directory of SQMS.

# Scheduler address
scheduler_host amata1.cpe.ku.ac.th
scheduler_port 3456

# Dispatcher address
dispatcher_host amata1.cpe.ku.ac.th
dispatcher_port 3457

# Load balancer address
loadbal_host amata1.cpe.ku.ac.th
loadbal_port 3458

# Define the maximum number of executing jobs
max_exec_job 20

An example $etcdir/sce/sqms_hostlist:

amata1
amata2
amata2
amata2
amata3
amata4
amata5
amata6
amata7
amata8
amata9
amata10
amata11
amata12
In this example, amata2 will receive more jobs than the other hosts.
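Concretely, amata2 appears three times among the 14 entries, so, assuming the load balancer treats every entry in the list equally, it should receive roughly 3/14, or about 21%, of the dispatched jobs.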