Matlab/DCT
MathWorks' Matlab Distributed Computing Toolbox (DCT) has an interface to batch queuing systems. Given the appropriate Matlab commands, it submits jobs to your queuing system and performs Matlab operations in a parallel or distributed manner. For more on this see http://www.mathworks.com/access/helpdesk/help/toolbox/distcomp/.
For Matlab to be able to do this, it needs some understanding of the queuing system, and the end-user of MatlabDCT must follow some simple steps to stage the distributed jobs.
This document assumes that you have run module load matlab/2008b or newer.
Distributed Jobs
A distributed job is one whose tasks do not directly communicate with each other. The tasks do not need to run simultaneously, and a worker might run several tasks of the same job in succession. Typically, all tasks perform the same or similar functions on different data sets in an embarrassingly parallel configuration.
The CAC recommends distributed jobs over parallel jobs because the cluster can run many single-CPU jobs more readily than larger multi-CPU jobs. Use parallel jobs only when your tasks must communicate with each other.
Configure the Torque(PBS) Scheduler
Distributed jobs are submitted from Matlab, so Matlab creates
and submits your PBS scripts for you. Distributed jobs allow you to
shut down the parent Matlab session, wait for the jobs to finish, and read in
the values from the jobs at a later time.
%create the sched object
sched=findResource('scheduler', 'type', 'torque')
%set PBS options 'man qsub' for valid options
%add -l qos=preempt to use preemption
%Do NOT set nodes, ppn or tpn
set(sched, 'SubmitArguments', '-l walltime=24:00:00 -q cac')
%Create a job
job1=createJob(sched)
%add a few tasks, each task will be a one cpu job in PBS
createTask(job1, @rand, 1, {3,3});
createTask(job1, @rand, 1, {3,3});
get(job1, 'Tasks')
%Submit job and its tasks to the cluster
submit(job1)
%you may now exit matlab
Check on the status of your jobs using qstat -u uniquename. When some or all of the tasks have finished, start Matlab again, create the sched object, and use functions such as findJob and getAllOutputArguments to get the results from the finished tasks. See the Matlab DCT documentation for usage.
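As a rough sketch, the retrieval step in a fresh Matlab session might look like the following. It assumes the new session sees the same scheduler data location as the session that submitted the job, and that the job you want is the first one reported as finished. The variable names finished, job1, and results are illustrative only.

```matlab
%re-create the sched object in the new matlab session
sched=findResource('scheduler', 'type', 'torque')
%find the jobs known to this scheduler that have finished
%(assumes the same data location as the submitting session)
finished=findJob(sched, 'State', 'finished')
%pick the job of interest; index 1 is an assumption for illustration
job1=finished(1)
%cell array with one row of outputs per task
results=getAllOutputArguments(job1)
%output of the first task
results{1}
%optionally remove the job and its files once the results are saved
destroy(job1)
```

If findJob returns an empty array, no job has finished yet and indexing with finished(1) will fail; check qstat again before retrying.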
Parallel Jobs
Note: Parallel jobs require that matlab be in your default modules.
Parallel jobs are those in which the workers (or labs) can communicate with each other during the evaluation of their tasks.
The example will use the file colsum.m. You will need to place this on the cluster or your lab machine.
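The file itself is not reproduced on this page. As a sketch only, a colsum.m along the lines of the column-sum example in the MathWorks parallel job documentation could look like this; the file linked from this page may differ.

```matlab
function total_sum = colsum
%lab 1 creates a magic square and broadcasts it to the other labs
if labindex == 1
    A = labBroadcast(1, magic(numlabs));
else
    A = labBroadcast(1);
end
%each lab sums the column that matches its own lab index
column_sum = sum(A(:, labindex));
%gplus adds the per-lab values and returns the total on every lab
total_sum = gplus(column_sum);
```

The function takes no inputs, which matches the empty argument list {} passed to createTask, and returns one output per lab. With 4 workers the matrix is magic(4), whose entries 1 to 16 sum to 136, so each lab should return 136.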
%create parallel sched object
sched=findResource('scheduler', 'type', 'torque')
%set PBS options 'man qsub' for valid options
%add -l qos=preempt to use preemption
%Do NOT set nodes, ppn or tpn
set(sched, 'SubmitArguments', '-l walltime=24:00:00 -q cac -M email@umich.edu -m abe')
%create parallel job, set the number of CPUs to use
pjob=createParallelJob(sched);
set(pjob, 'MaximumNumberOfWorkers', 4)
set(pjob, 'MinimumNumberOfWorkers', 4)
%create parallel task using colsum.m from above
set(pjob, 'FileDependencies', {'colsum.m'})
t=createTask(pjob, @colsum, 1, {})
%submit a 4 cpu PBS job (matches the number of workers set above)
submit(pjob)
waitForState(pjob)
results=getAllOutputArguments(pjob)
destroy(pjob)
Parallel jobs submitted from Matlab also allow the submitting Matlab process running on the login node to shut down while the tasks run. Recovering results from a new Matlab session works the same way as for distributed jobs.
matlabpool (parfor)
Currently, matlabpool can only be used with the local configuration. The local configuration runs in shared-memory parallel and is limited to 4 cores (8 if using matlab/2009a or newer).
The most common use of matlabpool is for the Matlab command parfor. For an introduction to parfor see MathWorks' Getting Started with parfor.
"A parfor-loop is useful in situations where you need many loop iterations of a simple calculation, such as a Monte Carlo simulation. parfor divides the loop iterations into groups so that each worker executes some portion of the total number of iterations. parfor-loops are also useful when you have loop iterations that take a long time to execute, because the workers can execute iterations simultaneously."
matlabpool and PBS
Be sure to set the number of cores to be used by matlab in your PBS file. Follow the example on our PBS page but change the resources line.
#PBS -l nodes=1:ppn=4,gres=matlab
The value of ppn= must equal the number of labs (workers in Matlab). Note that this only applies to matlabpool/parfor jobs.
For more specific instructions on using Matlab at the CAC, please see our Matlab page.
%matlab example for matlab pool
%the size of the pool (4) should equal ppn in PBS
matlabpool open local 4
%
clear A
parfor i=1:100
A(i)=i;
end
A
%
%reduction example
x=0;
parfor i=1:100
x=x+i;
end
x
%
%close the pool
matlabpool close



