I keep trying to finish the documentation about our computing cluster at SANBI and SGE (Sun Grid Engine)[1] and how to run jobs. In the meantime, however, here’s how to run a job on the cluster at SANBI.
Firstly, the structure of the cluster. Our storage, for now, is provided by a storage server and shared across the whole cluster. This means that your home directory and the
/cip0 storage area are shared across the whole cluster. We still need to implement better research data management practices, but you should do your work in a
scratch directory and store your results in your
research directory. I’m not going to talk more about that now because the system is in flux.
Secondly, the cluster has a number of compute nodes and a single submit node. The submit node is
queue00.sanbi.ac.za, so log in there to submit your job. It is a smallish virtual machine, so don’t run anything substantial on the submit node!
So let’s imagine that you want to run a tool like
fastqc on the cluster. First, is the tool available? We use a system called environment modules to manage the available software. This allows us to install software in a central place and just add the relevant environment variables to run the tool you need. The
module avail command lists the available modules, so
module avail 2>&1 | grep fastqc will show us that
fastqc is indeed available.
Next we need a working directory (e.g.
/cip0/scratch/pvh) and a script that will run the command. Here is a little script (let’s imagine it is called run_fastqc.sh):

```shell
#!/bin/sh
. /etc/profile.d/module.sh
module add fastqc
if [ ! -d out ] ; then
    mkdir out
fi
fastqc -t 1 -o `pwd`/out -f fastq data.fastq
```
. /etc/profile.d/module.sh ensures that the
module command is available. The
module add fastqc line adds the path to
fastqc so that it is available to our script. You can use
module add outside a script if you want to examine how to run a command. It also sets a variable,
FASTQC_HOME, pointing to where the command is installed, so you can
ls $FASTQC_HOME to see if there is perhaps a README or other useful data in that directory.
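As a sketch, an interactive exploration session on the submit node might look like this (assuming the fastqc module exists on your cluster, as the module avail check above showed):

```shell
module add fastqc      # put fastqc on your PATH and set FASTQC_HOME
ls "$FASTQC_HOME"      # look for a README or other useful files
module list            # show which modules are currently loaded
module rm fastqc       # unload the module when you are done
```

This only affects your current shell; your job script still needs its own module add line, as in the example above.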
fastqc needs you to create the output directory; it won’t create it itself. The script does that, creating a directory under the current directory, named
out. Now you need to send the script to the cluster’s scheduler:
qsub -wd $(pwd) -q all.q -N myfastqc run_fastqc.sh
This will send
run_fastqc.sh to the scheduler and tell it to run it on
all.q (which happens to be the default queue) with the job named
myfastqc. This queue has a time limit of 8 hours, so if you need to run for longer than that you need to add
-q long.q to the qsub command. The
long.q queue has no time limit but fewer CPUs available. The
-wd flag sets the job’s working directory, in this example the directory you are in when you submit the job.
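Putting that together, submitting the same example job to the long queue would look like this (same script and job name as above):

```shell
qsub -wd $(pwd) -q long.q -N myfastqc run_fastqc.sh
```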
You can check the status of your job with
qstat. Job output is, by default, written into the working directory in two files, one for
stderr and the other for stdout (named <jobname>.e<jobid> and <jobname>.o<jobid> respectively).
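A typical check-up session might look like this sketch (the job id shown by qstat will differ; <jobid> is a placeholder):

```shell
qstat                        # list your jobs: state 'qw' = queued, 'r' = running
qstat -j <jobid>             # detailed information about one job
ls myfastqc.o* myfastqc.e*   # the stdout and stderr files for the job above
cat myfastqc.e*              # check stderr if something went wrong
```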
There’s much more to say about the cluster and the use of the
qhost command, but this should be enough to get you started with your first cluster job. The rest will have to wait till I’ve got time to write more extensive documentation.
[1] Sun Grid Engine is part of the Grid Engine family of job schedulers, which has undergone a complex evolution in recent years due to Sun’s takeover by Oracle and the subsequent forking of the codebase. See the Grid Engine Wiki for details.