Nvidia GPU Passthrough on Windows Server 2022 to a Gen2 Hyper-V Linux VM
If you landed here, there is a high probability that you are facing the same issue I did. The scenario: you have a GPU installed in a Windows Server machine (or another Windows version) that acts as the physical Hyper-V host for a number of VMs. In our case it is a Dell PowerEdge server with an Nvidia A100 80GB GPU. Nvidia charges for its vGPU software to share such a card across multiple VMs, which, in my mind, makes it unreasonably expensive just to use your own hardware. Since our developers mostly work with Docker or Kubernetes on VMs set up for experiments, these are mostly Linux machines. We wanted to pass the GPU through to one Linux VM, install the appropriate drivers and let a few teams of researchers use it for their AI experiments. Unfortunately, neither Nvidia’s guides nor the others we found seemed complete, and we either had to do extra digging or assume certain things. Here, I will attempt to provide a complete admin’s guide to setting it up from start to finish, without the extra digging.
On the Hyper-V Windows host
Assuming you have installed the GPU, you need to find its Location Path:
- Open Device Manager, right-click the GPU and click Properties
- Open the Details tab, expand the Property drop-down list and select Location Paths
There may be one or two there. You need to note the one that looks like this:
"PCIROOT(3A)#PCI(0000)#PCI(0000)"
Then you need to disable the device: right-click the GPU in Device Manager and select Disable device. The next step is to dismount it from the host using an elevated PowerShell prompt and this command:
Dismount-VMHostAssignableDevice -LocationPath "Your GPU Location Path" -Force
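If you prefer to do the whole disable-and-dismount step from PowerShell instead of Device Manager, a minimal sketch like the following should work. It assumes the A100 is the only Nvidia device in the Display class and that the first entry of the LocationPaths property is the PCIROOT(...) one, so double-check both against Device Manager:
# Find the GPU and read its PCI location path (assumes a single Nvidia display device)
$gpu = Get-PnpDevice -Class Display | Where-Object { $_.FriendlyName -like "*NVIDIA*" }
$locationPath = (Get-PnpDeviceProperty -InstanceId $gpu.InstanceId -KeyName DEVPKEY_Device_LocationPaths).Data[0]
# Disable the device on the host, then dismount it so it can be assigned to a VM
Disable-PnpDevice -InstanceId $gpu.InstanceId -Confirm:$false
Dismount-VMHostAssignableDevice -LocationPath $locationPath -Force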
Now we need to set up our Gen2 VM with a few options. The VM’s Automatic Stop Action needs to be set to Turn Off. In the commands below, Name-of-your-VM should be replaced with your VM’s name.
Set-VM -Name Name-of-your-VM -AutomaticStopAction TurnOff
Then you need to make sure that RAM and Minimum RAM are equal, i.e. that the VM gets a fixed amount of memory, by setting these values in the VM’s memory settings.
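If you prefer to script the memory settings as well, something along these lines should work; the 64GB value below is only a placeholder, not a recommendation:
# Disable Dynamic Memory and give the VM a fixed amount of RAM (64GB is just an example value)
Set-VMMemory -VMName Name-of-your-VM -DynamicMemoryEnabled $false -StartupBytes 64GB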
The next step is to set up the MMIO space so that the VM can access the GPU’s memory. For older GPUs with less VRAM, most online instructions suggest up to 62GB of address space; here we need much more. We tried 256GB for our 80GB A100 and it worked.
Set-VM -GuestControlledCacheTypes $true -VMName Name-of-your-VM
Set-VM -LowMemoryMappedIoSpace 3Gb -VMName Name-of-your-VM
Set-VM -HighMemoryMappedIoSpace 256Gb -VMName Name-of-your-VM
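To double-check that the values were applied, the VM object should expose them as properties named after the same parameters (verify the property names on your own host):
# Inspect the MMIO-related settings of the VM
Get-VM -Name Name-of-your-VM | Format-List GuestControlledCacheTypes, LowMemoryMappedIoSpace, HighMemoryMappedIoSpace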
After that, you should be ready to assign the GPU to your VM with the following command, substituting the Location Path and the name of your VM accordingly:
Add-VMAssignableDevice -LocationPath "PCIROOT(3A)#PCI(0000)#PCI(0000)" -VMName Name-of-your-VM
To verify that the assignment has taken place, use the following command:
Get-VMAssignableDevice -VMName Name-of-your-VM
You are ready to power on your VM.
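As an aside, if you ever need to return the GPU to the host (for example to update the host or decommission the VM), the assignment can be reversed with the matching cmdlets. Run these with the VM turned off and then re-enable the device in Device Manager:
# Remove the GPU from the VM and remount it on the host (VM must be off)
Remove-VMAssignableDevice -LocationPath "PCIROOT(3A)#PCI(0000)#PCI(0000)" -VMName Name-of-your-VM
Mount-VMHostAssignableDevice -LocationPath "PCIROOT(3A)#PCI(0000)#PCI(0000)"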
On the Linux VM
Our example here doesn’t come with a UI, but I will mention the part of Nvidia’s instructions that involves one. I am assuming you are connecting to the VM using SSH.
To check whether the GPU is visible to the machine, use either of the following two commands:
lspci
sudo lshw -C Display
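If the output is long, filtering it for Nvidia makes the check quicker:
lspci | grep -i nvidia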
If the list is empty, as it was for us for quite some time, chances are that the HighMemoryMappedIoSpace value isn’t large enough for your GPU. Turn the VM off, run the same Set-VM command as above with a larger value, and turn it on again. For comparison, on a Windows VM the GPU shows up in the guest’s Device Manager with a “Code 12” error, which means “This device cannot find enough free resources that it can use. If you want to use this device, you will need to disable one of the other devices on this system.”.
After you’ve made sure the MMIO settings are correct, you need to install the compiler toolchain:
sudo apt update
sudo apt install build-essential
and the kernel headers:
sudo apt-get install linux-headers-$(uname -r)
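To confirm that the headers match the running kernel, you can check that the kernel’s build directory exists and points to them:
ls -l /lib/modules/$(uname -r)/build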
If you have a UI, you need to exit the X server and terminate all OpenGL applications. For Ubuntu, this is done by switching to a console login prompt with CTRL+ALT+F1 and running (on newer Ubuntu releases the display manager is gdm3 rather than lightdm, so stop that service instead):
sudo service lightdm stop
On Red Hat Enterprise Linux and CentOS this is done by running:
sudo init 3
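As an alternative (not from Nvidia’s instructions), any systemd-based distribution should let you drop out of the graphical session by switching to the multi-user target:
sudo systemctl isolate multi-user.target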
Although there is an option to install the drivers directly, I opted to download them and install them locally, as per Nvidia’s instructions. We downloaded NVIDIA-Linux-x86_64-550.54.15.run and installed it with:
sudo sh ./NVIDIA-Linux-x86_64-550.54.15.run
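For reference, here is a sketch of the download-and-run sequence from the command line; the URL below follows the usual layout of Nvidia’s data-center driver downloads for this version, but take the actual link from Nvidia’s driver download page:
# Download, make executable and run the installer (verify the URL on Nvidia's download page first)
wget https://us.download.nvidia.com/tesla/550.54.15/NVIDIA-Linux-x86_64-550.54.15.run
chmod +x NVIDIA-Linux-x86_64-550.54.15.run
sudo sh ./NVIDIA-Linux-x86_64-550.54.15.run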
Since Gen2 VMs have Secure Boot enabled, the installer will ask to generate a signing key and save the certificate (a .der file) in /usr/share/nvidia/. You then need to enroll that certificate with mokutil using the following command (our file was nvidia-modsign-crt-FD856451.der):
sudo mokutil --import /usr/share/nvidia/nvidia-modsign-crt-FD856451.der
You will be prompted to set a password during this process; you will need it again at the next boot. Then you need to shut down the VM.
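Before doing so, you can confirm that the enrollment request has been queued, since mokutil can list the keys pending enrollment:
mokutil --list-new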
On the Hyper-V Windows host
Connect to the VM’s console directly from the Hyper-V interface before turning the VM on; in our tests, the MOK manager doesn’t show up unless you follow this exact sequence. When the VM boots, MOK Management will appear and you should select “Enroll MOK” > “Continue” > “Yes”. Enter the password you set earlier and then reboot.
On the Linux VM
To confirm that the driver was installed correctly and the GPU is accessible, run:
nvidia-smi
Your system should be able to use your GPU from this point onwards.
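If nvidia-smi does not behave as expected, a useful first check is whether the kernel modules are actually loaded and which driver version they report:
lsmod | grep nvidia
cat /proc/driver/nvidia/version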
Important Note
From the tests I have performed, irrespective of whether the drivers are signed or not, the driver needs to be re-installed in the VM after any reboot of the host, e.g. after an update. If someone has found a workaround for this, I would appreciate hearing about it in the comments below.