Using Conda on Kay
Python and Anaconda/Conda are among the most popular tools for machine learning and deep learning research. They offer a user-friendly way to write programs and load existing code bases, and together they provide a powerful environment for training and running models efficiently, particularly because they also support GPUs.
We show here the basics of creating, loading and working with Conda environments on Kay. Follow the commands exactly as shown: there are several ways to activate environments, not all of them are supported on Kay, and using the right one can be the difference between success and initial frustration.
The first step is always to load the Conda module, whether at the command prompt or in an sbatch script. There is just one Conda version currently on Kay.
module load conda/2
Next, any customization of Conda (installing libraries, for example) requires an environment. Create one in a subfolder of your work directory as follows:
conda create --prefix /ichec/work/project/env_name python=3.8
This creates an environment called env_name at the location /ichec/work/project/env_name and sets it up to use Python 3.8.
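Before creating or reusing an environment, it can help to check whether one already exists at the chosen path. The sketch below assumes that a Conda environment folder contains a conda-meta subfolder, which is how Conda normally lays out its environments. The helper name is_conda_env is our own and is not part of Conda.

```shell
# Illustrative helper (not a Conda command): succeeds if the directory
# looks like a Conda environment, i.e. it contains a conda-meta folder.
is_conda_env() {
    [ -d "$1/conda-meta" ]
}

ENV_PATH=/ichec/work/project/env_name   # example path from the text above

if is_conda_env "$ENV_PATH"; then
    echo "Environment already exists at $ENV_PATH"
else
    echo "No environment at $ENV_PATH yet; create it with conda create --prefix"
fi
```

Replace ENV_PATH with the folder in your own project work directory.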
Now here is the most important step. On Kay, this is the correct way to activate a Conda environment:
source activate /ichec/work/project/env_name
Notice that we have specified the full path to the environment install folder, not just the environment name, and that we have not run 'source activate' on its own. For the moment you can disregard any system banners that advise using the command 'conda activate', which does not work correctly on Kay.
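One way to confirm that activation worked is to check which python the shell now finds first. The sketch below assumes that activating an environment places its bin folder at the front of PATH, which is Conda's usual behaviour on Linux. The helper name check_env_active is our own and is not part of Conda.

```shell
# Illustrative helper (our own name): succeeds if the python found first
# on PATH is the one inside the given environment folder.
check_env_active() {
    if [ "$(command -v python)" = "$1/bin/python" ]; then
        echo "active: $1"
    else
        echo "not active: $1"
        return 1
    fi
}
```

After running source activate with the full path, calling check_env_active with that same path should report the environment as active. If it does not, check that the path is complete and that the conda module is loaded.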
There are a couple of different ways to install libraries in Conda. For example, you can install TensorFlow by activating an environment and executing
pip install tensorflow
Another possibility (e.g. to install PyTorch) is
conda install pytorch
The conda install route is the more general of the two, since it also handles non-Python dependencies.
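After installing, a quick import test shows whether the libraries are usable from the active environment. This is a minimal sketch: the helper name have_module is our own, and it assumes that python on PATH is the environment's interpreter, which is only the case after activation.

```shell
# Illustrative helper (our own name): succeeds if the given Python module
# can be imported by the python currently on PATH.
have_module() {
    python -c "import $1" >/dev/null 2>&1
}

# tensorflow and torch are the import names of the two libraries above.
for module in tensorflow torch; do
    if have_module "$module"; then
        echo "$module: importable"
    else
        echo "$module: not importable"
    fi
done
```

Note that PyTorch is imported as torch, even though the package is installed under the name pytorch.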
Lastly, anything you do at the terminal, such as the above, can also be done in a Slurm script. For example, to run a Python program named train.py in an environment named py37 (created as described above), the script could be:
#!/bin/bash
#SBATCH -p GpuQ
#SBATCH -N 1
#SBATCH -t 00:30:00
#SBATCH -A account
cd $SLURM_SUBMIT_DIR
module load cuda/11.2
module load conda/2
source activate /ichec/work/project/py37
echo "This is the GpuQ run."
time python train.py
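Once the script is saved to a file, it is submitted with sbatch and can be monitored with squeue, both standard Slurm commands. The sketch below assumes the script was saved as job.sh, a file name chosen here purely for illustration. It uses a small helper of our own, job_id_from, to extract the job ID from the confirmation line that sbatch normally prints.

```shell
# Illustrative helper (our own name): print the last word of sbatch's
# confirmation line, which normally reads "Submitted batch job <id>".
job_id_from() {
    printf '%s\n' "$1" | awk '{print $NF}'
}

JOB_SCRIPT=job.sh   # assumed file name for the script above

# Only attempt a submission where Slurm and the script are both available.
if command -v sbatch >/dev/null 2>&1 && [ -f "$JOB_SCRIPT" ]; then
    confirmation=$(sbatch "$JOB_SCRIPT")
    job_id=$(job_id_from "$confirmation")
    echo "Submitted job $job_id"
    squeue -j "$job_id"   # show the job's state in the queue
fi
```

Unless the script sets an output file, Slurm writes the job's output to a file named slurm-<jobid>.out in the directory from which the job was submitted.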
You might also be interested in our PyTorch tutorial, which goes into more detail on deep learning specifics but carries out the above commands as a matter of routine:
https://www.ichec.ie/academic/national-hpc/documentation/tutorials/training-pytorch-net-gpus