Using PBS on nyx

For general PBS Usage (how it works, how to write scripts, etc), please see the general PBS page. The following are the specifics of using PBS on nyx.

The Resources

The nyx cluster has processors ranging from 1.4GHz (Opteron 240) to 2.8GHz (Opteron 254), with different amounts of memory and different numbers of processors per node (some with 2 single-core cpus, others with 1 dual-core cpu, and others with 2 dual-core cpus). Make sure you specify the resources your job requires.

When requesting memory, specify it in mb rather than gb, and treat 1gb as 1000mb. For example, our 2gb nodes have only about 2025mb available after system usage, so a request for 2gb (2048mb) will never be honored. In a nutshell, round your memory estimates down a little.
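
The rounding rule above can be sketched as a small shell helper. This is only an illustration, not part of PBS: the function name pbs_mem_mb is hypothetical, and it does nothing more than apply the 1gb = 1000mb rule described above.

```shell
#!/bin/sh
# Hypothetical helper (not a PBS command): convert whole gigabytes into a
# memory request in mb, using 1gb = 1000mb instead of 1024mb so that the
# request still fits on a node after system usage.
pbs_mem_mb() {
    gb=$1
    echo "$((gb * 1000))mb"
}

# For a "2gb" node, request 2000mb rather than 2048mb.
pbs_mem_mb 2
```

The printed value can then be used as the mem= part of a resource request.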

The Queues

In general, on nyx you should submit to the route queue. It feeds your job into the most appropriate queue, so you don't have to worry about queue definitions changing. However, if you want to submit to a specific queue (for example, because you have special permission to use a set of nodes), look at the output of qstat -q to see what the queue definitions and limits are.

Other Considerations:

You can request specific attributes, such as number of nodes, memory or job runtime. Memory is a cumulative number across the whole job (2 nodes * 2GB = 4GB). Set the walltime slightly longer than you expect the job to run. For example:
#PBS -l nodes=2:ppn=2,mem=4gb,walltime=1:00:00
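
Because the memory request is cumulative, it can help to derive it from a per-node figure. The following is a minimal sketch under that assumption; pbs_resource_line is a hypothetical helper name, and the per-node value of 2000mb is only an example of a figure rounded down to fit on a 2gb node.

```shell
#!/bin/sh
# Hypothetical helper: build a "#PBS -l" line from a per-node memory figure.
# Arguments: nodes, processors per node, memory per node in mb, walltime.
pbs_resource_line() {
    nodes=$1
    ppn=$2
    mb_per_node=$3
    walltime=$4
    # mem= is the total for the whole job, so multiply by the node count.
    total_mb=$((nodes * mb_per_node))
    echo "#PBS -l nodes=${nodes}:ppn=${ppn},mem=${total_mb}mb,walltime=${walltime}"
}

# 2 nodes, 2 processors each, 2000mb per node, 1 hour.
pbs_resource_line 2 2 2000 1:00:00
```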

You can see all other attributes by reading the pbs_resources man page.

 

Using Preemption

If you run jobs that are re-startable or are relatively short (a couple of hours) or you don't mind risking having your jobs killed, you can have them start sooner by marking them as preemptible.

Preemptible jobs will first try to run where they won't be killed, but if they can't start there they will be run on nodes that are privately owned. If the owner of those nodes submits a job that needs one of the nodes the preemptible job is on, the preemptible job will be killed (and, by default, requeued).

While this sounds like a large risk, many preemptible jobs run to completion with no trouble; the only way to see if this will work for you is to try it.

For more information and instructions, please see our Preemption web page.

The Importance of Estimating Your Job's Runtime

You need to estimate how long your job will run. If you do not specify the wall clock time required by your run (e.g. walltime=45:00), PBS will terminate your job after 15 minutes. However, if you specify an excessively long runtime, your job may wait in the queue longer than necessary. Therefore, please try to estimate your wall clock runtime accurately. A modest amount of overestimation (10-20%) is probably ideal.
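
As a rough illustration of the overestimation advice, the sketch below pads an estimate given in minutes by 15% and prints it in HH:MM:SS form. The function name padded_walltime and the choice of 15% (a value inside the 10-20% range mentioned above) are assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical helper: add 15% to an estimated runtime (in minutes) and
# format the result as HH:MM:SS for use in a walltime= request.
padded_walltime() {
    minutes=$1
    seconds=$((minutes * 60 * 115 / 100))
    printf '%02d:%02d:%02d\n' \
        $((seconds / 3600)) $((seconds % 3600 / 60)) $((seconds % 60))
}

# A job expected to take about 45 minutes.
padded_walltime 45
```

The result can be pasted into the resource request, for example as walltime=00:51:45 for the 45 minute estimate.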

How to Write a PBS Batch Script

PBS scripts are rather simple. The following is an MPI example for user your-user-name, using 14 processes. Only use the /tmp directory section if your code doesn't require a shared filespace:

Example: MPI Code

#!/bin/sh
#PBS -S /bin/sh
#PBS -N your-mpi-job
#PBS -l nodes=7:ppn=2,walltime=1:00:00
#PBS -q route
#PBS -M your-email-address
#PBS -m abe
#PBS -V
#
echo "I ran on:"
cat $PBS_NODEFILE
# Create a local directory to run and copy your files to local.
# Let PBS handle your output
mkdir /tmp/${PBS_JOBID}
cd /tmp/${PBS_JOBID}
cp ~/your_stuff .


# Use mpirun to run with 7 nodes for 1 hour
mpirun -np 14 ./your-mpi-program

#Clean up your files
cd
/bin/rm -rf /tmp/${PBS_JOBID}

 
The PBS script parameters are as follows:
#PBS -N your-mpi-job
     Name of the job in the queue is "your-mpi-job". This can be anything as long as it is less than 13 characters long; you should make it descriptive so you know which of your jobs are running and queued.
#PBS -l nodes=7:ppn=2,walltime=1:00:00
     Reserve 7 machines (14 processors), for 1 hour.
#PBS -S /path/to/shell
     Run the script under the given shell, /bin/sh in the example (see below).
#PBS -q route
     Submit to the queue named route.
#PBS -M your-email-address
     Email me at this address.
#PBS -m abe
     Email me when the job aborts, begins, and ends.
#PBS -j oe
     Join your stdout and stderr output into one file, to be placed in your home directory.

For complete information on PBS flags, use "man qsub". For further information on PBS, use "man pbs".

The MPI (mpirun) parameters are as follows:
-np    Number of processes.
-stdin <filename>    Use "filename" as standard input.
-t   Test but do not execute.

 

Example: Serial Code

If you have a serial code (e.g. octave) just set 'nodes=1:ppn=1'.
For example:

#PBS -N your-serial-job
#PBS -l nodes=1:ppn=1,walltime=24:00
#PBS -q route
#PBS -M your-email-address
#PBS -m abe
#PBS -V
#
# Create a local directory to run and copy your files to local.
# Let PBS handle your output
mkdir /tmp/${PBS_JOBID}
cd /tmp/${PBS_JOBID}
cp ~/your_stuff .

octave < input.m > out.mat

#Clean up your files
cd
# Retrieve your output
cp /tmp/${PBS_JOBID}/* ~/your_stuff
/bin/rm -rf /tmp/${PBS_JOBID}
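
If the program exits with an error, you will usually still want the scratch directory removed. The following is one possible sketch of that pattern, not something PBS provides: run_in_scratch is a hypothetical helper, and the fallback directory name used when PBS_JOBID is unset exists only so the example can be tried outside a job.

```shell
#!/bin/sh
# Hypothetical helper: run a command inside a per-job scratch directory,
# remove the directory afterwards, and pass on the command's exit status.
run_in_scratch() {
    scratch="/tmp/${PBS_JOBID:-manual.$$}"
    mkdir -p "$scratch" || return 1
    # The subshell keeps the cd from affecting the rest of the script.
    ( cd "$scratch" && "$@" )
    status=$?
    rm -rf "$scratch"
    return $status
}

# Demonstration: the command runs with the scratch directory as its cwd.
run_in_scratch sh -c 'echo "working in $(pwd)"'
```

Any results you need must be copied out of the scratch directory by the command itself, since the directory is removed as soon as the command returns.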

How to Submit a PBS Batch Script

To submit a PBS script simply type:
qsub your-scriptname

where your-scriptname is the name of your PBS script. Note that PBS runs your script under your login shell unless told otherwise (with #PBS -S). One benefit of running under /bin/sh is that csh is arguably broken in how it handles terminal-disconnected jobs (the same goes for tcsh). Using csh or tcsh is fine, but you will receive warnings at the beginning of your output file:

Warning: no access to stty (Bad file descriptor).
Thus no job control in this shell.

 

How to Check the Status of a PBS Batch Job

To check the status of your job in the queue, type:

qstat your-job-id

To see all jobs in the queue, type:

qstat -a

To see detailed info on each job, type:

qstat -f

To see the number of idle nodes in the queue, type:

freenodes

How to Cancel a PBS Batch Job

If you realize that you made a mistake in your script file, or if you've modified your program since you submitted your job, and you want to cancel the job, first get the "Job ID" by typing qstat, then pass it to qdel. If you encounter an error while using qdel, send us email and we'll delete the job for you.

For example:

qdel 203
    or
qdel 203.nyx.engin.umich.edu
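
Both forms refer to the same job. If you keep job IDs in a script and want the short numeric form, a sketch like the following can strip the server name; short_job_id is a hypothetical helper name, not a PBS command.

```shell
#!/bin/sh
# Hypothetical helper: reduce a full job ID to its leading numeric part
# by removing everything from the first "." onwards.
short_job_id() {
    echo "${1%%.*}"
}

short_job_id 203.nyx.engin.umich.edu
short_job_id 203
```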

 

How to Query the PBS Queues

To see the names of the available queues and their current parameters, type:
qstat -q

The notable parameters in the output are the queue names (in the Queue column) and the wall clock time limits (in the Walltime column).

Queuing Policy

At the CAC we strive to promote equitable access to our resources. Because all jobs run on the CAC systems are submitted to a batch queuing system, we enforce this fairness by controlling several parameters to the scheduling algorithm used by the queuing system.

When a job is submitted to the queuing system, the queuing system looks for free nodes on which to run it. If it can't find any nodes that are suitable for your job, your job stays in the queued state (in PBS this is denoted by the letter "Q"). While your job is queued, its position in the queue is adjusted relative to the other jobs in the queue based on two primary factors: limits and priority.

In the general access partition on each cluster, we limit the number of cpus that any one person can use at a time. On nyx this is 32 cpus. However, to get the maximum use out of the CAC systems, these limits are soft, so if no one else is waiting, it is possible for one person to use more nodes than the soft limit.

We also limit the number of jobs that are considered for scheduling. We will schedule 2 jobs per person at a time. This means that if USER-A submits 50 jobs with job IDs 101 through 150, and USER-B later submits a job with job ID 151, the scheduler will consider only jobs 101, 102 and 151 for scheduling. When one of USER-A's jobs starts, her next job becomes eligible for scheduling; until then, that job is waiting and does not accumulate priority.
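
The per-person limit can be pictured with a small simulation. This is only an illustration of the rule as described above, not how the scheduler is implemented; the function name eligible_jobs and the "job-id user" input format are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical illustration: read "job-id user" pairs in submission order
# and print the job IDs that fall within the first 2 jobs of each user.
eligible_jobs() {
    awk -v limit=2 '{ seen[$2]++; if (seen[$2] <= limit) print $1 }'
}

# USER-A submits jobs 101 through 150, then USER-B submits job 151.
awk 'BEGIN {
    for (i = 101; i <= 150; i++) print i, "USER-A"
    print 151, "USER-B"
}' | eligible_jobs
```

Run as written, this prints 101, 102 and 151, one per line.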

To further promote fair use of the CAC resources, jobs in the queued state are ordered by their priority. The priority of a job is computed from several factors:

  • The amount of time the job has been in the queue; the longer the time, the greater the priority. However, only one of your jobs at a time accumulates priority based on how long it has been in the queue.
  • Your usage over the past 30 days. If you have used a large amount of wallclock time on the cluster in the past month, people who have used less will receive a higher priority. This is known as "fairshare" and attempts to ensure that the widest possible range of users will have access to the CAC resources.

There are exceptions to these rules, the largest being that for people who have purchased nodes or dedicated time on the cluster the limits do not apply. Fairshare still applies to promote fair use within the group of people with access to the private set of nodes.

If you have questions about this policy or feel that it is not fairly enforced, please contact us at cac-support@umich.edu