Using cloud vGPUs

From CC Doc

This guide describes how to

  • allocate vGPU resources to a virtual machine (VM),
  • install the necessary drivers and
  • check whether the vGPU can be used.

Access to the repositories and to the vGPUs is currently available only within Arbutus Cloud. Note that the instructions below cover only the vGPU driver installation; the CUDA toolkit is not pre-installed. It can be installed directly from NVIDIA or used from the CVMFS software stack.

Supported flavors

To use a vGPU within a VM, the instance needs to be deployed on one of the flavors listed below. The vGPU will be available to the operating system via the PCI bus.

  • g1-8gb-c4-22gb
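Because the vGPU appears as an ordinary PCI device, a quick check inside the VM is to look for NVIDIA's PCI vendor ID, 10de. The "Device Id" 0x1DB610DE reported by nvidia-smi -q later in this guide combines device 1db6 with vendor 10de. The following is a minimal sketch. has_nvidia_pci_device is a hypothetical helper, and it assumes the lspci tool (package pciutils) is installed, which may not be the case on minimal cloud images.

```shell
# Hypothetical helper: reads `lspci -n` output on stdin and succeeds if any
# line lists a device with NVIDIA's PCI vendor ID 10de. Such lines have the
# form "<bus address> <class>: <vendor>:<device>", e.g. "00:05.0 0302: 10de:1db6".
has_nvidia_pci_device() {
  grep -Eiq ' 10de:[0-9a-f]{4}'
}
```

Example, after pasting the function into a shell on the VM:

lspci -n | has_nvidia_pci_device && echo "NVIDIA device found on the PCI bus"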

Preparation of a VM running CentOS7

Once the VM is available, update the OS and all installed software, including the kernel, to the latest available versions. Then reboot the VM so that the latest kernel is running.

[root@centos7]# yum -y update && reboot
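Once the VM is back up, you can confirm that the kernel the boot loader selects by default is the one actually running. After an update, the default is normally the newest kernel just installed. The following is a minimal sketch for CentOS. It assumes the grubby tool is present (it usually is on CentOS 7 and 8) and that the update made the new kernel the default. default_kernel_is_running is a hypothetical helper.

```shell
# Hypothetical helper.
# Usage: default_kernel_is_running DEFAULT_KERNEL_PATH RUNNING_RELEASE
#   DEFAULT_KERNEL_PATH: output of `grubby --default-kernel`,
#                        e.g. /boot/vmlinuz-3.10.0-1160.42.2.el7.x86_64
#   RUNNING_RELEASE:     output of `uname -r`
# Succeeds if the boot-loader default kernel is the running kernel.
default_kernel_is_running() {
  # Strip everything up to and including "/vmlinuz-" to get the release.
  local default_release=${1##*/vmlinuz-}
  [ -n "$default_release" ] && [ "$default_release" = "$2" ]
}
```

[root@centos7]# default_kernel_is_running "$(grubby --default-kernel)" "$(uname -r)" && echo "running the default kernel"

If the check fails, the VM was most likely not rebooted after the kernel update.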

Since the proprietary NVIDIA drivers need to be compiled against the running kernel, the dkms package is required. It is provided by the EPEL repository, which is enabled with:

[root@centos7]# yum -y install epel-release

Install the Arbutus Cloud repository. This also installs the public key used to sign the packages, so that their authenticity can be verified. These drivers and user-space tools are carefully tested against the infrastructure before they are made available.

[root@centos7]# yum -y install http://repo.arbutus.cloud.computecanada.ca/pulp/repos/centos/arbutus-cloud-vgpu-repo.el7.noarch.rpm

The last step is to install the NVIDIA vGPU packages. Installing the kernel module package 'nvidia-vgpu-kmod' takes a few minutes, because the required kernel modules are compiled in the background.

[root@centos7]# yum -y install nvidia-vgpu-kmod nvidia-vgpu-gridd nvidia-vgpu-tools
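Because the kernel modules are compiled in the background, nvidia-smi may not work immediately after the packages are installed. The following minimal sketch polls /proc/modules until a given module is listed. It assumes the vGPU guest driver module is named nvidia, as with NVIDIA's Linux drivers in general. It only detects that the module has been loaded; it does not load the module itself. wait_for_module is a hypothetical helper.

```shell
# Hypothetical helper.
# Usage: wait_for_module NAME TIMEOUT_SECONDS [MODULES_FILE]
# Checks MODULES_FILE (default /proc/modules) about once per second until a
# loaded module called NAME is listed, giving up after TIMEOUT_SECONDS.
wait_for_module() {
  local name=$1 timeout=$2 file=${3:-/proc/modules}
  local waited=0
  while :; do
    # The first field of each line in /proc/modules is the module name.
    if awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$file"; then
      return 0
    fi
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

[root@centos7]# wait_for_module nvidia 600 && nvidia-smi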

If your installation was successful, the vGPU will be accessible and licensed. Test by running nvidia-smi:

[root@centos7]# nvidia-smi         
Tue Sep 21 17:40:33 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.91.03    Driver Version: 460.91.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GRID V100D-8C       On   | 00000000:00:05.0 Off |                  N/A |
| N/A   N/A    P0    N/A /  N/A |    560MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
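For scripted checks, nvidia-smi can also print selected fields as CSV through its --query-gpu and --format options. The following minimal sketch confirms that the vGPU reports at least the expected framebuffer, which is 8192 MiB for the GRID V100D-8C profile in the output above. has_gpu_memory is a hypothetical helper.

```shell
# Hypothetical helper.
# Usage: has_gpu_memory MIN_MIB
# Reads lines such as "GRID V100D-8C, 8192", i.e. the output of
#   nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits
# on stdin and succeeds if at least one GPU reports MIN_MIB MiB or more.
has_gpu_memory() {
  awk -F', *' -v min="$1" '
    # The last field is the total memory in MiB.
    $NF + 0 >= min + 0 { found = 1 }
    END { exit !found }'
}
```

[root@centos7]# nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits | has_gpu_memory 8192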

To check the license status and other information about the vGPU:

[root@centos7]# nvidia-smi -q |less
==============NVSMI LOG==============

Timestamp                                 : Tue Sep 21 17:41:48 2021
Driver Version                            : 460.91.03
CUDA Version                              : 11.2

Attached GPUs                             : 1
GPU 00000000:00:05.0
    Product Name                          : GRID V100D-8C
    Product Brand                         : NVIDIA Virtual Compute Server
    Display Mode                          : Enabled
    Display Active                        : Disabled
    Persistence Mode                      : Enabled
    MIG Mode
        Current                           : N/A
        Pending                           : N/A
    Accounting Mode                       : Disabled
    Accounting Mode Buffer Size           : 4000
    Driver Model
        Current                           : N/A
        Pending                           : N/A
    Serial Number                         : N/A
    GPU UUID                              : GPU-c6d5d6c1-1b00-11ec-b031-a89a79e5169c
    Minor Number                          : 0
    VBIOS Version                         : 00.00.00.00.00
    MultiGPU Board                        : No
    Board ID                              : 0x5
    GPU Part Number                       : N/A
    Inforom Version
        Image Version                     : N/A
        OEM Object                        : N/A
        ECC Object                        : N/A
        Power Management Object           : N/A
    GPU Operation Mode
        Current                           : N/A
        Pending                           : N/A
    GPU Virtualization Mode
        Virtualization Mode               : VGPU
        Host VGPU Mode                    : N/A
    vGPU Software Licensed Product
        Product Name                      : NVIDIA Virtual Compute Server
        License Status                    : Licensed
    IBMNPU
        Relaxed Ordering Mode             : N/A
    PCI
        Bus                               : 0x00
        Device                            : 0x05
        Domain                            : 0x0000
        Device Id                         : 0x1DB610DE
        Bus Id                            : 00000000:00:05.0
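A monitoring script may need only the license state. In that case, the License Status field can be extracted from the nvidia-smi -q output. The following is a minimal sketch that assumes the field keeps the "License Status : value" layout shown above. vgpu_license_status is a hypothetical helper.

```shell
# Hypothetical helper: reads `nvidia-smi -q` output on stdin, prints the value
# of the first "License Status" field (e.g. "Licensed"), and fails if the
# field is absent.
vgpu_license_status() {
  awk -F' *: *' '
    $1 ~ /^ *License Status$/ { print $2; found = 1; exit }
    END { exit !found }'
}
```

[root@centos7]# nvidia-smi -q | vgpu_license_status

Any result other than Licensed indicates that the vGPU has not (yet) been licensed.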

Preparation of a VM running CentOS8

Once the VM is available, update the OS and all installed software, including the kernel, to the latest available versions. Then reboot the VM so that the latest kernel is running.

[root@centos8]# dnf -y update && reboot

Since the proprietary NVIDIA drivers need to be compiled against the running kernel, the dkms package is required. It is provided by the EPEL repository, which is enabled with:

[root@centos8]# dnf -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm

Install the Arbutus Cloud repository. This also installs the public key used to sign the packages, so that their authenticity can be verified. These drivers and user-space tools are carefully tested against the infrastructure before they are made available.

[root@centos8]# dnf -y install http://repo.arbutus.cloud.computecanada.ca/pulp/repos/centos/arbutus-cloud-vgpu-repo.el8.noarch.rpm
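The CentOS 7 and CentOS 8 repository packages differ only in their el7/el8 suffix. The following minimal sketch selects the matching URL from the EL major release. It assumes that rpm -E %rhel reports that release (7 or 8) on these systems. arbutus_vgpu_repo_rpm_url is a hypothetical helper.

```shell
# Hypothetical helper: prints the Arbutus Cloud vGPU repository package URL
# for EL major release $1. Only 7 and 8 are covered, matching this guide.
arbutus_vgpu_repo_rpm_url() {
  case $1 in
    7|8)
      printf 'http://repo.arbutus.cloud.computecanada.ca/pulp/repos/centos/arbutus-cloud-vgpu-repo.el%s.noarch.rpm\n' "$1"
      ;;
    *)
      return 1
      ;;
  esac
}
```

[root@centos8]# url=$(arbutus_vgpu_repo_rpm_url "$(rpm -E %rhel)") && dnf -y install "$url"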

The last step is to install the NVIDIA vGPU packages. Installing the kernel module package 'nvidia-vgpu-kmod' takes a few minutes, because the required kernel modules are compiled in the background.

[root@centos8]# dnf -y install nvidia-vgpu-kmod nvidia-vgpu-gridd nvidia-vgpu-tools

If your installation was successful, the vGPU will be accessible and licensed. Test by running nvidia-smi as shown above for CentOS7.

Preparation of a VM running Debian10

Ensure that the latest packages are installed and the system has been booted with the latest stable kernel, as dkms will request the latest one available from the Debian repositories.

root@debian10:~# apt-get update && apt-get -y dist-upgrade && reboot

After a successful reboot, the latest available kernel should be running, and the repository can be added by installing the arbutus-cloud-repo package. This package also contains the GPG key used to sign all packages. Install gnupg first, because apt-key needs it to import that key.

root@debian10:~# apt-get -y install gnupg
root@debian10:~# wget http://repo.arbutus.cloud.computecanada.ca/pulp/deb/debian/pool/main/arbutus-cloud-repo_0.1_all.deb
root@debian10:~# dpkg -i arbutus-cloud-repo_0.1_all.deb

Installing the package displays a warning, because the key is imported directly (for convenience) by the package's post-installation script:

Setting up arbutus-cloud-repo (0.1) ...
Warning: apt-key should not be used in scripts (called from postinst maintainerscript of the package arbutus-cloud-repo)
OK

Update the local apt cache and install the vGPU packages:

root@debian10:~# apt-get update && apt-get -y install nvidia-vgpu-kmod nvidia-vgpu-tools nvidia-vgpu-gridd
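To confirm that all three vGPU packages are installed, use dpkg-query to report each package's status. The following is a minimal sketch. missing_debs is a hypothetical helper.

```shell
# Hypothetical helper.
# Usage: missing_debs PACKAGE...
# Prints every PACKAGE whose dpkg status is not "install ok installed"
# (including packages dpkg does not know) and fails if any was printed.
missing_debs() {
  local pkg status rc=0
  for pkg in "$@"; do
    status=$(dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null)
    if [ "$status" != "install ok installed" ]; then
      printf '%s\n' "$pkg"
      rc=1
    fi
  done
  return "$rc"
}
```

root@debian10:~# missing_debs nvidia-vgpu-kmod nvidia-vgpu-tools nvidia-vgpu-gridd && echo "all vGPU packages installed"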

If your installation was successful, the vGPU will be accessible and licensed. Test by running nvidia-smi as shown above for CentOS7.

Preparation of a VM running Ubuntu20

Ensure that the OS is up to date, that all the latest patches are installed, and that the latest stable kernel is running.

root@ubuntu20:~# apt-get update && apt-get -y dist-upgrade && reboot
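On Ubuntu, the update-notifier-common hooks create /var/run/reboot-required when an update such as a new kernel needs a reboot. They also list the packages responsible in /var/run/reboot-required.pkgs. Checking for that file after the reboot is a quick way to confirm that nothing is still pending. The following minimal sketch assumes that convention, which may not apply on other distributions. reboot_pending is a hypothetical helper.

```shell
# Hypothetical helper.
# Usage: reboot_pending [FLAG_FILE]
# Succeeds if FLAG_FILE (default /var/run/reboot-required) exists, printing
# the triggering packages from FLAG_FILE.pkgs when that list is readable.
reboot_pending() {
  local flag=${1:-/var/run/reboot-required}
  [ -e "$flag" ] || return 1
  if [ -r "$flag.pkgs" ]; then
    sort -u "$flag.pkgs"
  fi
  return 0
}
```

root@ubuntu20:~# reboot_pending && echo "reboot again before installing the vGPU packages"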

After a successful reboot, the latest available kernel should be running. Now add the repository by installing the arbutus-cloud-repo package, which also contains the GPG key used to sign all packages.

root@ubuntu20:~# wget http://repo.arbutus.cloud.computecanada.ca/pulp/deb/ubuntu/pool/main/arbutus-cloud-repo_0.1ubuntu20_all.deb
root@ubuntu20:~# dpkg -i arbutus-cloud-repo_0.1ubuntu20_all.deb

A warning is displayed because the signing key is added during the post-installation stage; it can be ignored. Update the local apt cache and install the vGPU packages:

root@ubuntu20:~# apt-get update && apt-get -y install nvidia-vgpu-kmod nvidia-vgpu-tools nvidia-vgpu-gridd

If your installation was successful, the vGPU will be accessible and licensed. Test by running nvidia-smi as shown above for CentOS7.

Preparation of a VM running Ubuntu18

Ensure that the OS is up to date, that all the latest patches are installed, and that the latest stable kernel is running.

root@ubuntu18:~# apt-get update && apt-get -y dist-upgrade && reboot

After a successful reboot, the latest available kernel should be running. Now add the repository by installing the arbutus-cloud-repo package, which also contains the GPG key used to sign all packages.

root@ubuntu18:~# wget http://repo.arbutus.cloud.computecanada.ca/pulp/deb/ubuntu18/pool/main/arbutus-cloud-repo_0.1ubuntu18_all.deb
root@ubuntu18:~# dpkg -i arbutus-cloud-repo_0.1ubuntu18_all.deb
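The repository packages for Debian 10, Ubuntu 20 and Ubuntu 18 are stored under different paths. The following minimal sketch maps the ID and VERSION_ID values from /etc/os-release to the URLs given in this guide. It assumes that Ubuntu 20 and Ubuntu 18 here mean the 20.04 and 18.04 releases. arbutus_repo_deb_url is a hypothetical helper.

```shell
# Hypothetical helper.
# Usage: arbutus_repo_deb_url ID VERSION_ID
# Prints the arbutus-cloud-repo package URL for the given /etc/os-release
# ID/VERSION_ID pair; fails for releases this guide does not list.
arbutus_repo_deb_url() {
  local base=http://repo.arbutus.cloud.computecanada.ca/pulp/deb
  case "$1:$2" in
    debian:10)    echo "$base/debian/pool/main/arbutus-cloud-repo_0.1_all.deb" ;;
    ubuntu:20.04) echo "$base/ubuntu/pool/main/arbutus-cloud-repo_0.1ubuntu20_all.deb" ;;
    ubuntu:18.04) echo "$base/ubuntu18/pool/main/arbutus-cloud-repo_0.1ubuntu18_all.deb" ;;
    *)            return 1 ;;
  esac
}
```

root@ubuntu18:~# . /etc/os-release && url=$(arbutus_repo_deb_url "$ID" "$VERSION_ID") && wget "$url" && dpkg -i "${url##*/}"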

A warning is displayed because the signing key is added during the post-installation stage; it can be ignored. Update the local apt cache and install the vGPU packages:

root@ubuntu18:~# apt-get update && apt-get -y install nvidia-vgpu-kmod nvidia-vgpu-tools nvidia-vgpu-gridd

If your installation was successful, the vGPU will be accessible and licensed. Test by running nvidia-smi as shown above for CentOS7.