Frequently Asked Questions
Commonly Received Messages
- What does this e-mail mean: moab job resource violation: "job ####### exceeded MEM usage soft limit"?
- This message is sent when you use more memory than you asked for (default is 768 MB per core).
- You can request additional memory by adding "#PBS -l pmem=###MB" to your PBS script, which requests ###MB of memory per process (e.g., if you asked for 2 nodes with ppn=2 and pmem=3000MB, you will have asked for 12000MB of memory in total). This value replaces the default; it is not added to it.
- Why did my job die, and why does the notice I received contain this error: "job ####### exceeded MEM usage hard limit"?
- Jobs that use far more memory than they asked for are killed by the system to prevent memory overruns from harming other jobs and the nodes they run on.
- You can request additional memory by adding "#PBS -l pmem=###MB" to your PBS script, which requests ###MB of memory per process (e.g., if you asked for 2 nodes with ppn=2 and pmem=3000MB, you will have asked for 12000MB of memory in total). This value replaces the default; it is not added to it.
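As a rough illustration of the arithmetic above, the following sketch of a PBS script requests 2 nodes with ppn=2 and pmem=3000MB and works out the total. The job name and the shell variables are hypothetical and only mirror the request; the scheduler reads the "#PBS" lines, not the variables.

```shell
#!/bin/sh
#PBS -N mem_example
#PBS -l nodes=2:ppn=2
#PBS -l pmem=3000MB

# Hypothetical variables that mirror the request above, used only to
# show how the total is derived: nodes x processes per node x pmem.
nodes=2
ppn=2
pmem_mb=3000

total_mb=$(( nodes * ppn * pmem_mb ))
echo "Total memory requested: ${total_mb}MB"
```

With these values the script reports 12000MB, matching the example in the answer above.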
- What does the error "A high-performance Open MPI point-to-point messaging module was unable to find any relevant network interfaces: ... Another transport will be used instead, although this may result in lower performance." mean?
- This message is not actually an error. It is an informational message that InfiniBand, a high-performance network, is not available on the nodes your job is running on. The message can safely be ignored, and it can be prevented by disabling InfiniBand support with the flags '--mca btl ^openib' in your call to mpirun. Please note that if you disable InfiniBand and your job lands on nodes that are connected by InfiniBand, it will not use that network and so will not see any increase in performance.
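One possible way to apply this setting once per job script, offered as a sketch: Open MPI also reads MCA parameters from environment variables named OMPI_MCA_<parameter>, so exporting the variable below should have the same effect as the command-line flags. The process count and the program name are placeholders, not values from this page.

```shell
# Same effect as passing '--mca btl ^openib' to mpirun.
# The value is quoted because '^' is special in some shells.
export OMPI_MCA_btl='^openib'

# Placeholder launch line: it only runs where mpirun is installed and
# the hypothetical program ./my_mpi_prog exists.
if command -v mpirun >/dev/null 2>&1 && [ -x ./my_mpi_prog ]; then
    mpirun -np 4 ./my_mpi_prog
fi
```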
- When using the Intel compilers I get the error: libimf.so: warning: warning:feupdateenv is not implemented and will always fail
- This message is harmless and can be ignored. Users can avoid the message by compiling with -shared-intel. For details see the OpenMPI FAQ Page.
Usage
- Can I run non-parallel (serial) jobs on the clusters?
- Yes. Many of the applications run on our clusters are serial jobs.
Connecting to a cluster
- How do I connect to your clusters?
- First, you need an account (see the CAC account application). Once you have your account, you can connect to a cluster using ssh.
- I'm having problems connecting to nyx from off campus.
- Nyx is unavailable from off campus. Either use the U of M VPN client, or ssh to login.itd.umich.edu or login.engin.umich.edu and then log in to nyx-login from there.
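The two-step login described above could be sketched as follows. The account name is a placeholder and the variable names are hypothetical; the host names are the ones given in the answer. The script only prints the commands, so it can be read and run without opening a connection.

```shell
# Placeholder account name; substitute your own.
user="uniqname"
gateway="login.engin.umich.edu"   # login.itd.umich.edu works the same way
target="nyx-login"

# Step 1: log in to the campus gateway.
# Step 2: from the gateway, log in to nyx.
step1="ssh ${user}@${gateway}"
step2="ssh ${user}@${target}"
printf '%s\n%s\n' "$step1" "$step2"

# Both steps as one command; -t allocates a terminal for the second ssh.
combined="ssh -t ${user}@${gateway} ssh ${user}@${target}"
printf '%s\n' "$combined"
```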
- I'm having problems connecting with scp.
- If your login script calls a different shell, check that it does so only for interactive shells. You can do so with the following test:
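# Switch shells only when the session is interactive (PS1 is set).
# $PS1 is quoted so that a prompt containing spaces does not break the test.
if [ -n "$PS1" ] ; then
    if [ -x /usr/bin/tcsh ] ; then
        exec /usr/bin/tcsh
    fi
fi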
PS1 is set only for interactive shells, so tcsh is invoked only when you log in interactively. Non-interactive sessions such as scp are left alone.
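An alternative check, sketched here under the assumption that the login shell is a POSIX-style sh or bash: the special parameter "$-" holds the current shell flags and contains "i" when the shell is interactive. The function name is hypothetical.

```shell
# Return success if the given shell flags (the value of $-) contain "i".
has_interactive_flag() {
    case "$1" in
        *i*) return 0 ;;
        *)   return 1 ;;
    esac
}

# Hand over to tcsh only for interactive sessions, and only if tcsh is installed.
if has_interactive_flag "$-" && [ -x /usr/bin/tcsh ]; then
    exec /usr/bin/tcsh
fi
```

Passing the flags as an argument keeps the check easy to try by hand, for example has_interactive_flag "himBH" succeeds while has_interactive_flag "hB" fails.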
Limits on the number of jobs submitted?
- Is there any limit on the number of jobs someone can submit? Isn't submitting too many jobs abusive?
- One person can have at most 999 jobs queued at any one time. However, this limit has no effect on how jobs are actually scheduled, so it doesn't affect when your jobs run. There is some information on this at: http://engin.umich.edu/cac/resources/systems/nyxV2/pbs.html#policy
If you want to see the real state of jobs that are running, waiting, and blocked you can type: showq | less and you'll see the running jobs in order of last-to-finish to first-to-finish, the eligible jobs in order of start priority, and the blocked jobs in the order they were submitted.
Additionally, the more jobs any one person runs, the lower the priority is for their following jobs (this is called "fairshare" and is addressed somewhat at the URL above) so when their jobs do become eligible for scheduling, they are low on the list to start.
Because of all of those controls, the number of jobs any one person submits doesn't affect other people's jobs on the system, and it is often more convenient for people to submit large numbers of jobs at a time and let the system manage them. In fact, the more jobs the scheduler has to work with, the more efficient the system is and the fairer the overall usage is.
C++ runtime abort: internal error: static object marked for destruction more than once
- I'm writing C++ MPI code and when I run it I get the error C++ runtime abort: internal error: static object marked for destruction more than once; what do I do?
- This is a known problem with C++ code that you can work around by adding the -fPIC option to the end of your compile line, for example:
mpiCC -o myProg.exe mySub1.cpp mySub2.cpp mySub3.cpp -fPIC
SIGILL When running on nyx
- When running code in batch on nyx some executables will fail with SIGILL (4).
- When code built with the PGI compilers is run on a machine with an older CPU than the one the executable was built on, the executable may contain instructions that are not valid on that CPU. To work around this, tell the PGI compilers exactly which CPU models to support (several can be listed). See the list of PGI CPU types.
pgf90 -tp x64,amd64e,barcelona-64 source.f90



