As our CTO Ray O’Farrell recently mentioned, VMware is committed to helping customers build intelligent infrastructure, which includes the ability to take advantage of Machine Learning within their private and hybrid cloud environments. As part of delivering this vision, the Office of the CTO collaborates with customers and with VMware R&D teams to ensure the appropriate computing capabilities are available to support these workloads.
The use of high-end GPUs and other PCI devices to accelerate compute-intensive tasks has become increasingly important in Machine Learning and also in Science, Engineering, and Research, as well as in Finance. In vSphere, such devices are accessed via VM Direct Path I/O (passthrough mode), which lets the VM access these PCI devices directly to unlock the massively parallel capabilities of the hardware and to achieve near-native performance for these workloads.
This blog post explains in detail the steps needed to unlock the power of these devices on vSphere.
To enable these devices for passthrough, your host BIOS must be configured correctly, and the virtual machine destined to run these accelerated workloads must meet specific requirements. In addition, the instructions provided here are only required to enable very high-end devices. This section describes all of these requirements.
Note that while VMware supports VM Direct Path I/O (passthrough) as an ESXi feature and a device listed here may work correctly in passthrough mode, you should contact your device vendor if you require a formal statement of support for a particular device model.
This article is only relevant if your PCI device maps memory regions whose sizes total more than 16GB. Devices in this class include the nVidia K40m, K80, and P100; Intel Xeon Phi; and some FPGA cards. Devices for which these instructions should not be necessary include the nVidia K40c, K2, K20m, and AMD FirePro W9100, S7150x2. For these cards, simply follow the published instructions to enable passthrough devices under vSphere.
These memory mappings are specified in the PCI BARs (Base Address Registers) for the device. If you aren’t sure how much memory your device maps, you can check the vendor’s website or try enabling the device in passthrough mode in your VM. While this operation will presumably fail (otherwise you would not be reading this article), you can still examine the VM’s vmware.log file and look for lines like the following:
2017-03-07T07:40:38.467Z| vmx| I125: PCIPassthru: Device 0000:09:00.0 barIndex 0 type 2 realaddr 0xc6000000 size 16777216 flags 0
2017-03-07T07:40:38.467Z| vmx| I125: PCIPassthru: Device 0000:09:00.0 barIndex 1 type 3 realaddr 0x3b800000000 size 17179869184 flags 12
2017-03-07T07:40:38.467Z| vmx| I125: PCIPassthru: Device 0000:09:00.0 barIndex 3 type 3 realaddr 0x3bc00000000 size 33554432 flags 12
In this example, the PCI device at address 0000:09:00.0 is requesting to map a total of just over 16GB: the sum of the three decimal sizes shown here (16MB, 16GB, and 32MB). This is a high-end card; in fact, it is an nVidia P100. As a general rule, cards that require more than 16GB of memory are high-end cards, and you should follow the instructions in this article to enable them for use in passthrough mode within a virtual machine.
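If the log contains many such lines, the sum can be computed rather than added by hand. The following is a minimal sketch, assuming the log lines follow the same layout as the excerpt above; the function name `total_bar_bytes` and the regular expression are illustrative and not part of any VMware tool.

```python
import re
from collections import defaultdict

# Matches PCIPassthru BAR lines laid out like the excerpt above. Only the
# device address, the BAR index, and the decimal size are captured.
BAR_LINE = re.compile(
    r"PCIPassthru: Device (?P<dev>[0-9a-fA-F:.]+) "
    r"barIndex (?P<bar>\d+) .*? size (?P<size>\d+)"
)

def total_bar_bytes(log_text):
    """Return {device address: total bytes requested across its BARs}."""
    totals = defaultdict(int)
    for line in log_text.splitlines():
        match = BAR_LINE.search(line)
        if match:
            totals[match.group("dev")] += int(match.group("size"))
    return dict(totals)

# The three lines from the excerpt above, cut off after the size field.
sample = """\
2017-03-07T07:40:38.467Z| vmx| I125: PCIPassthru: Device 0000:09:00.0 barIndex 0 type 2 realaddr 0xc6000000 size 16777216
2017-03-07T07:40:38.467Z| vmx| I125: PCIPassthru: Device 0000:09:00.0 barIndex 1 type 3 realaddr 0x3b800000000 size 17179869184
2017-03-07T07:40:38.467Z| vmx| I125: PCIPassthru: Device 0000:09:00.0 barIndex 3 type 3 realaddr 0x3bc00000000 size 33554432
"""

totals = total_bar_bytes(sample)
for dev, nbytes in totals.items():
    # Report the total in bytes and in binary gigabytes (2**30 bytes).
    print(f"{dev}: {nbytes} bytes ({nbytes / 2**30:.2f} GiB)")
```

For the excerpt, the three sizes add up to 17230200832 bytes for device 0000:09:00.0, which is 16GB plus 48MB.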
Your host BIOS must be configured to support the large memory regions needed by these high-end PCI devices. To enable this, find the host BIOS setting for “above 4G decoding” or “memory mapped I/O above 4GB” or “PCI 64 bit resource handling above 4G” and enable it. The exact wording of this option varies by system vendor, though the option is often found in the PCI section of the BIOS menu. Consult your system provider if necessary to enable this option.
To access these large memory mappings, your guest OS must boot with EFI. That is, you must enable EFI in the VM and then do an EFI installation of the guest OS. Some earlier published advice on this blog stated incorrectly that a non-EFI guest OS installation could be used by adding some “legacy” entries to the VMX file and setting the firmware type to “efi”. This information was incorrect and that approach should never be used, especially in a production environment. You must create an EFI VM.
Enabling High-end Devices
With the above requirements satisfied, two entries must be added to the VM’s VMX file, either by modifying the file directly or by using the vSphere client to add these capabilities. The first entry is:
Specifying the second entry requires a simple calculation. Count the number of high-end PCI devices(*) you intend to pass into the VM, multiply that number by 16, and then round up to the next power of two. For example, to use passthrough with two devices, the value would be: 2 * 16 = 32, rounded up to the next power of two to yield 64. For a single device, use 32. Use this value in the second entry:
With these two changes to the VMX file, follow the standard vSphere instructions for enabling passthrough devices at the host level and for specifying which devices should be passed into your VM. The VM should now boot correctly with your device(s) in passthrough mode.
(*) Note that some products, like the nVidia K80 GPU, have two PCI devices on each physical PCI card. Count the number of PCI devices you intend to pass into the VM and not the number of PCI cards. For example, if you intend to use both of the K80’s GPU devices within a single VM, then your device count is two, not one.
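The size calculation described above can be sketched in a few lines. This is a minimal illustration; the function name `mmio_size_gb` is hypothetical. It assumes that "round up to the next power of two" means strictly greater than the product, because that is the only reading consistent with both worked examples (one device gives 32, two devices give 64).

```python
def mmio_size_gb(device_count):
    """Apply the rule from the text: 16 per device, then the next power of two."""
    if device_count < 1:
        raise ValueError("device_count must be at least 1")
    needed = device_count * 16
    # Assumption: "next power of two" is strictly greater than `needed`,
    # matching the examples in the text (16 -> 32 and 32 -> 64).
    # For a positive integer n, 1 << n.bit_length() is the smallest power
    # of two that is strictly greater than n.
    return 1 << needed.bit_length()

# Count PCI devices, not cards: both GPUs of one K80 count as two devices.
for count in (1, 2, 4):
    print(f"{count} device(s): {mmio_size_gb(count)}")
```

Under this reading, one device yields 32 and two devices yield 64, as in the text, and four devices would yield 128.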
If you have followed the above instructions and your VM still does not boot correctly with the devices enabled, the material in this section may be helpful. If you have tried the suggestions below and are still having problems, please contact me directly at “simons at vmware dot com” and I’ll be happy to help you.
If you see an error similar to the following in the VM’s vmware.log file:
then your BIOS settings do not meet ESXi 6.5 requirements for enabling this type of passthrough device. Specifically, ESXi requires that memory mapped for PCI devices all be below 16TB. It may be possible to work around this problem if your BIOS supports the ability to control how high in the host’s memory address space PCI memory regions are mapped. Some manufacturers — SuperMicro, for example — have BIOS options to change how high this memory is mapped. On SuperMicro systems, the MMIOHBase parameter can be changed to a lower value from its default of 56TB. Sugon systems also have a similar (hidden) BIOS setting. Check with your system vendor to learn whether your BIOS supports this remapping feature.
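To see whether a given BAR falls under this limit, its real address and size from the PCIPassthru log lines can be compared against 16TB. The sketch below is illustrative only: it assumes "16TB" means 16 × 2^40 bytes and that the whole mapped range must end at or below that boundary. The helper name `bar_fits_below_limit` is hypothetical, and ESXi's own check may differ in detail.

```python
TIB = 2**40
LIMIT = 16 * TIB  # assumption: the 16TB limit is read as 16 * 2**40 bytes

def bar_fits_below_limit(realaddr, size, limit=LIMIT):
    """True if the range [realaddr, realaddr + size) lies entirely below limit."""
    return realaddr + size <= limit

# The 16GB BAR from the log excerpt earlier in this post.
print(bar_fits_below_limit(0x3b800000000, 17179869184))

# A hypothetical 16GB BAR mapped at 56TB, the SuperMicro MMIOHBase default
# mentioned above.
print(bar_fits_below_limit(56 * TIB, 17179869184))
```

The first call prints True, because that BAR sits below 4TB. The second prints False, which is the situation that lowering the mapping base in the BIOS is meant to fix.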
An error in the vmware.log file of the following form:
indicates that you have not correctly enabled “above 4GB” mappings in your host BIOS as described in the “Host BIOS” section above.
Cannot Use Device
If you have followed all of the above instructions and your VM has booted correctly, but you see a message similar to the following when running the nvidia-smi utility in your guest OS:
then we suggest contacting nVidia directly or performing a web search using this string to find additional information that may be of help.