Task Farming
What is task farming?
Task farming is a technique allowing users to achieve high throughput of serial (single processor) tasks on large parallel computers. ICHEC provides a utility called taskfarm2 to do this. As ICHEC policies do not facilitate individual serial jobs users must use the taskfarm to run multiple serial processes in a single job.
There are two versions of the ICHEC taskfarm utility.
- taskfarm2, supercedes taskfarm and has a number of advantages. Most importantly for the user it does not require a dedicated management node, so in each job an additonal node is available for executing a user's tasks.
- taskfarm, is ICHEC's original task farm utility it is a small MPI application that allows a single master process to distribute tasks to a number of slave processes. It is now provided for legacy support only.
How to access the taskfarm
The taskfarm & taskfarm2 utilities are not in your default PATH but can be loaded when required using an environment module, this command can be entered before submission if the PBS script uses the -V directive to propagate environment settings or it can be added to the job submission script itself.
module load taskfarm2
or
module load taskfarm
What is a task?
In this context a task is a single command or a group of consecutively executed commands. An example of a group of commands as a task:
cd dir01; ../my_executable input_file > output_file
This simple task changes to a specific subdirectory and runs an executable from the parent directory with an input file. The output from the executable is redirected to a file. It's important that all the components of the tasks are separated by semicolons and are all on the one line. A more complex tasks involving multiple different steps can be created. A more complex example involving extra commands:
cd dir01; pwd; date; ../my_executable input_file > output_file; ../my_postprocessor output_file > processed_output_file; date
A taskfarm2 example
This is an example of using taskfarm2 running on 8 processors on Stokes. For the purpose of the example we will assume that we have a directory that contains the following:
- a PBS script to run the taskfarm
- the taskfarm2 input file
- the executable(s) being run in the tasks
- a subdirectory for each task containing its input file (task01, task02, ... )
First the PBS script (called tasks.pbs in this example). It runs a job on eight processors, as found on Stoney, for up to one hour. For more information on batch processing see here and here.
#!/bin/sh
#PBS -N Taskfarm_job_name
#PBS -A projectname
#PBS -r n
#PBS -V
#PBS -j oe
#PBS -m bea
#PBS -M me@my_email.ie
# Note: ppn should equal 8 for Stoney and 12 for Stokes
#PBS -l nodes=1:ppn=8,walltime=1:00:00
# This job's working directory
cd $PBS_O_WORKDIR
module load taskfarm2
taskfarm tasks
Next the taskfarm input file (called tasks in this example).Here 14 tasks are included. When the taskfarm first starts the first eight tasks will be started then each time a task exits the next available task will be started. As Stokes has 12 cores per node it will initially start 12 tasks.
cd task01; ../my_executable input_file > output_file
cd task02; ../my_executable input_file > output_file
cd task03; ../my_executable input_file > output_file
cd task04; ../my_executable input_file > output_file
cd task05; ../my_executable input_file > output_file
cd task06; ../my_executable input_file > output_file
cd task07; ../my_executable input_file > output_file
cd task08; ../my_executable input_file > output_file
cd task09; ../my_executable input_file > output_file
cd task10; ../my_executable input_file > output_file
cd task11; ../my_executable input_file > output_file
cd task12; ../my_executable input_file > output_file
cd task13; ../my_executable input_file > output_file
cd task14; ../my_executable input_file > output_file
Finally submit the job.
qsub tasks.pbs
If you are using the original taskfarm utility you must load the appropriate module as mentioned and preface it in the pbs script with mpiexec, as it is an mpi program.
In this example the taskfarm stderr and stdout will be written to spool files in the main parent directory and each task has an indivdual output files in it's subdirectory.
It is also possible to run the taskfarm in an interactive PBS job for debugging purposes.
Additional taskfarm2 options
With taskfarm2 the following options are available:
- TASKFARM_PPN, if this environment variable is set to an integer value between 1 and 8 for Stoney and 1 and 12 for Stokes the taskfarm utility will use only that number of processors on each node allocated to it. This can be very useful if you need to limit the number of tasks run on a node at once because of memory requirments. By default 8 processors are used on Stoney and 12 on Stokes. If you find you must use this option on the Stokes system please contact us to discuss using the Stoney system which has three times more memory per core.
- TASKFARM_SILENT, if this evironment varible is set then the taskfarm utility will not produce output, however tasks may still produce their own output.
- %TASKFARM_TASKNUM%, if the proceeding string is used in a task line the utility will replace it with the number of the task, starting from zero for the first task. This can be used to create files or directories that are named based on the tasknumber for instance.
- TASKFARM_SMT, if this environment variable is set then the taskfarm utility will allocate as many tasks per core as threads it supports in SMT mode. This option can be useful if tasks are heavy on IO but light on computation and thus each core can sustain multiple threads.
Efficiency considerations
- For latest version of the task farm utility, the minimum number of tasks specified should be the number of CPUs requested for the batch job.
- For the original taskfarm utility the minimum number of tasks specified should be the number of CPUs requested for the batch job minus one. For example a 32 CPU job would require 31 tasks or more.
- The tasks (or series of sub-tasks) to be executed within the task farm should preferably have no dependencies on each other.
- It is important to be careful that the tasks are reasonably well balanced. If one task has a substantially longer runtime it may stay running long after the others have completed wasting a lot of CPU time. Users who have tasks with random runtimes are best adivsed to put a large number of tasks in their input file and run for a longer period as this will yield a higher average efficiency albeit with some killed tasks to be cleaned up when the walltime expires.