PyTorch is a Python package that provides two high-level features:
- Tensor computation (like NumPy) with strong GPU acceleration
- Deep neural networks built on a tape-based autograd system
PyTorch has a distant connection with Torch, but for all practical purposes you can treat them as separate packages.
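To make the phrase "tape-based autograd" concrete, here is a toy sketch in plain Python. It records each operation on a list (the "tape") and replays the list in reverse to accumulate gradients. The names `Var` and `backward` are hypothetical and chosen for illustration; this is not how PyTorch implements autograd internally, only the general idea.

```python
# Toy reverse-mode automatic differentiation with an explicit tape.
# Illustration only: Var and backward are hypothetical names, not PyTorch APIs.

class Var:
    """A scalar value that records the operations applied to it."""

    def __init__(self, value, tape):
        self.value = value
        self.grad = 0.0
        self.tape = tape

    def __add__(self, other):
        out = Var(self.value + other.value, self.tape)
        # Local derivatives: d(out)/d(self) = 1 and d(out)/d(other) = 1
        self.tape.append((out, [(self, 1.0), (other, 1.0)]))
        return out

    def __mul__(self, other):
        out = Var(self.value * other.value, self.tape)
        # Local derivatives: d(out)/d(self) = other and d(out)/d(other) = self
        self.tape.append((out, [(self, other.value), (other, self.value)]))
        return out


def backward(output):
    """Replay the tape in reverse, accumulating gradients with the chain rule."""
    output.grad = 1.0
    for out, parents in reversed(output.tape):
        for parent, local_grad in parents:
            parent.grad += local_grad * out.grad


tape = []
x = Var(3.0, tape)
y = Var(4.0, tape)
z = x * y + x  # z = x*y + x, so dz/dx = y + 1 and dz/dy = x
backward(z)
print(z.value, x.grad, y.grad)
```

Because entries are appended in the order the operations run, walking the tape backwards always visits a result before the values it was computed from.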
Latest available wheels
To see the latest version of PyTorch that we have built:
[name@server ~]$ avail_wheels "torch*"
For more information on listing wheels, see listing available wheels.
Installing Compute Canada wheel
The preferred option is to install it using the Python wheel as follows:
1. Load a Python module, either python/2.7, python/3.5, python/3.6 or python/3.7.
2. Create and start a virtual environment.
3. Install PyTorch in the virtual environment with
GPU and CPU
(venv) [name@server ~]$ pip install torch --no-index
In addition to torch, you can install torchvision, torchtext and torchaudio:
(venv) [name@server ~]$ pip install torch torchvision torchtext torchaudio --no-index
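After installing, a quick check can confirm which of these packages are visible to the interpreter in the active virtual environment. The sketch below assumes Python 3 and uses only the standard library; `installed_status` is a hypothetical helper name, not part of any of these packages.

```python
import importlib.util


def installed_status(names):
    """Map each top-level module name to True if it can be found for import."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# The four packages named above; run this inside the activated virtualenv.
status = installed_status(["torch", "torchvision", "torchtext", "torchaudio"])
for name, found in status.items():
    print("{}: {}".format(name, "installed" if found else "missing"))
```

This only locates the modules and does not import them, so it returns quickly even on a login node.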
libtorch.so is included in the wheel. Once PyTorch is installed in a virtual environment, you can find it at $VIRTUAL_ENV/lib/python3.6/site-packages/torch/lib/libtorch.so. The python3.6 component of the path matches the Python version of the virtual environment.
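Since the path contains the Python version, it can be built from the running interpreter rather than typed by hand. The following is a minimal sketch that assumes the directory layout stated above; `libtorch_path` is a hypothetical helper name.

```python
import os
import sys


def libtorch_path(venv_dir, version_info=sys.version_info):
    """Build the libtorch.so path for a virtualenv and a Python version."""
    python_dir = "python{}.{}".format(version_info[0], version_info[1])
    return os.path.join(venv_dir, "lib", python_dir,
                        "site-packages", "torch", "lib", "libtorch.so")


venv = os.environ.get("VIRTUAL_ENV")
if venv is None:
    print("No virtual environment is active.")
else:
    path = libtorch_path(venv)
    print(path, "(exists)" if os.path.exists(path) else "(not found)")
```

Run it with the virtual environment activated so that `VIRTUAL_ENV` is set.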
Here is an example of a job submission script that uses the Python wheel and creates the virtual environment inside the job:
#!/bin/bash
#SBATCH --gres=gpu:1        # Request GPU "generic resources"
#SBATCH --cpus-per-task=6   # Cores proportional to GPUs: 6 on Cedar, 16 on Graham.
#SBATCH --mem=32000M        # Memory proportional to GPUs: 32000 Cedar, 64000 Graham.
#SBATCH --time=0-03:00
#SBATCH --output=%N-%j.out

module load python/3.6
virtualenv --no-download $SLURM_TMPDIR/env
source $SLURM_TMPDIR/env/bin/activate
pip install torch --no-index

python pytorch-test.py
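The comments in the job script say that cores and memory should be proportional to the number of GPUs. The sketch below reads that as simple linear scaling from the per-GPU figures given for Cedar and Graham; that reading is an assumption, as is the helper name `sbatch_directives`. Check the cluster documentation for the number of GPUs actually available per node before requesting more than one.

```python
# Per-GPU values copied from the comments in the job script above.
PER_GPU = {
    "cedar": {"cpus": 6, "mem_mb": 32000},
    "graham": {"cpus": 16, "mem_mb": 64000},
}


def sbatch_directives(cluster, gpus):
    """Return #SBATCH lines that scale cores and memory with the GPU count."""
    per_gpu = PER_GPU[cluster.lower()]
    return [
        "#SBATCH --gres=gpu:{}".format(gpus),
        "#SBATCH --cpus-per-task={}".format(per_gpu["cpus"] * gpus),
        "#SBATCH --mem={}M".format(per_gpu["mem_mb"] * gpus),
    ]


# Example: the three resource lines for a 2-GPU job on Cedar.
print("\n".join(sbatch_directives("cedar", 2)))
```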
The Python script pytorch-test.py has the form
import torch
x = torch.Tensor(5, 3)
print(x)
y = torch.rand(5, 3)
print(y)
# let us run the following only if CUDA is available
if torch.cuda.is_available():
    x = x.cuda()
    y = y.cuda()
    print(x + y)
You can then submit a PyTorch job with:
[name@server ~]$ sbatch pytorch-test.sh
This section gives ResNet-18 benchmark results on different clusters with various configurations.
All numbers are images per second per GPU, using DistributedDataParallel and NCCL.
These results are provisional and vary considerably from one measurement to the next. Work is under way to get a clearer picture.
| Batch Size | 1 Node, 1 GPU (baseline) | 1 Node, 2 GPUs | 2 * (1 Node, 2 GPUs) | 3 * (1 Node, 2 GPUs) |
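For readers who want to take comparable measurements, the unit used here (images per second per GPU) can be computed as in the sketch below. This is not the benchmark code behind these results. `time_batches` and its `step` argument are hypothetical: `step` stands for whatever function runs one training iteration. It also assumes `num_batches * batch_size` counts the images processed across all GPUs.

```python
import time


def images_per_second_per_gpu(num_images, elapsed_seconds, num_gpus):
    """Throughput normalized by GPU count, the unit used in this section."""
    return num_images / elapsed_seconds / num_gpus


def time_batches(step, num_batches, batch_size, num_gpus=1):
    """Time `num_batches` calls to `step` and return images/s per GPU.

    Note: GPU work is launched asynchronously, so a real `step` should wait
    for the device to finish (for example with torch.cuda.synchronize())
    before the clock is read.
    """
    start = time.perf_counter()
    for _ in range(num_batches):
        step()
    elapsed = time.perf_counter() - start
    return images_per_second_per_gpu(num_batches * batch_size, elapsed, num_gpus)


# Stand-in workload so the sketch runs anywhere; replace with a training step.
rate = time_batches(lambda: sum(range(1000)), num_batches=10, batch_size=32)
print("{:.1f} images/s per GPU".format(rate))
```

Repeating the measurement several times and reporting the spread helps with the run-to-run variance mentioned above.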
On AVX512 hardware (Béluga, Skylake or V100 nodes), versions of PyTorch older than v1.0.1 that use older libraries (cuDNN < v7.5 or MAGMA < v2.5) may leak a considerable amount of memory, resulting in an out-of-memory exception and the death of your tasks. Please upgrade to the latest torch version.
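A job can check at start-up whether the installed version predates v1.0.1. The sketch below compares version strings using only the standard library; `is_older_than` is a hypothetical helper name. It covers only the PyTorch version, not the cuDNN or MAGMA versions mentioned above.

```python
import re


def is_older_than(version, minimum=(1, 0, 1)):
    """True if a version string such as '1.0.0' is below `minimum`."""
    numbers = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if numbers is None:
        raise ValueError("unrecognized version string: {!r}".format(version))
    # A missing patch component (e.g. '1.0') is treated as 0.
    parsed = tuple(int(part or 0) for part in numbers.groups())
    return parsed < tuple(minimum)


# In a job, pass the installed version instead of a literal:
#     import torch
#     if is_older_than(torch.__version__): print("Please upgrade torch.")
print(is_older_than("1.0.0"), is_older_than("1.0.1"))
```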