[UPDATE: Feb, 2017: This blog entry has been updated to correct an error. To use advanced features like passthrough of large-BAR PCI devices, you must use a UEFI-enabled VM and guest OS.]

Compute accelerators — whether GPUs, Intel Xeon Phi, or FPGAs — are increasingly common in HPC, so it is important that we assess the use of such technologies from within VMware vSphere® as part of our efforts to virtualize research computing and other HPC environments.

Last year, when I tested Intel Xeon Phi in passthrough mode (VM Direct Path I/O) with VMware ESX® 5.5, I found that it didn’t work due to some limitations in our passthrough implementation. However, using an engineering build of ESX, I was able to successfully configure the device in passthrough mode, run Intel’s bundled Phi performance tests, and demonstrate good performance. While this wasn’t of practical use to customers, since I was not testing with a released version of ESX, it did validate that with appropriate engineering changes it would be possible to use Intel Xeon Phi with ESX and achieve good performance. It was a promising first step.

With the release of ESX 6.0, Na Zhang recently validated that 1) ESX 6.0 now correctly allows access to Intel Xeon Phi in passthrough mode, and 2) performance is generally very good. This blog entry shares our performance results and explains how to expose Intel Xeon Phi in passthrough mode, which is a bit more involved than merely adding a PCI device to the guest.

Performance

We ran the Intel micperf tests on an Intel-supplied prototype machine, comparing bare-metal and virtualized performance in passthrough mode. All tests were run on RedHat 6.4 with MPSS 3.1.2, a now-old version of the Intel Manycore Platform Software Stack. The micperf utility reports our board SKU as C0-7120 P/A/X/D (Knights Corner). Power management was disabled on both the host and the Phi card to generate the best and most stable performance.
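For readers who want to reproduce this kind of comparison, the benchmarks are driven with the micperf tooling bundled with MPSS. The following is only a rough sketch; the runner and its options vary between MPSS releases, so treat the invocation as illustrative and consult micprun --help on your own installation:

# Run the bundled micperf benchmark suite on the card; individual
# kernels (stream, dgemm, linpack, shoc, ...) can be selected via
# the options documented in 'micprun --help' for your MPSS release.
micprun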

In the graphs below, the data series are each described with two descriptors. The first descriptor (“pragma”, “native”, or “scif”) refers to the Phi programming mode used for the test. The second descriptor (“baremetal” or “virtual_6.0”) refers to whether the data were generated on un-virtualized RedHat or on RedHat running in a VM on ESX 6.0.

The virtual STREAM, LINPACK, and DGEMM (and SGEMM — not shown) results are essentially the same as their un-virtualized counterparts. Since STREAM measures local (Phi) memory bandwidth, and execution of both LINPACK and DGEMM is dominated by local computation on the Phi card, these results are not surprising.

SHOC Download and Readback measure data transfer speeds between the host and Phi card over a range of message sizes. As the graphs show, when the low-level SCIF interfaces are used, virtual and bare-metal performance is identical. When the higher-level pragma programming interface is used, we see a drop in bandwidth at some message sizes, which we have yet to analyze.

Judging from these tests, we are able to deliver very close to native Intel Xeon Phi performance from a virtual machine running on ESX 6.0, which is important for HPC customers who want the benefits of virtualization while retaining access to compute accelerators, whose use is becoming increasingly common. The next section describes how to configure a VM to use the Intel Xeon Phi.

[Figure: Intel Xeon Phi virtual/baremetal comparison: Stream]

[Figure: Intel Xeon Phi virtual/baremetal comparison: SHOC Readback]

[Figure: Intel Xeon Phi virtual/baremetal comparison: SHOC Download]

[Figure: Intel Xeon Phi virtual/baremetal comparison: LINPACK]

[Figure: Intel Xeon Phi virtual/baremetal comparison: DGEMM]

Configuration

Before enabling passthrough mode for the Phi device, the following configuration changes must be in place:

To use an Intel Xeon Phi card in passthrough mode with ESX 6.0, your hardware BIOS must be set to allow use of large MMIO PCI BARs (base address registers) — this is a requirement of the Phi card rather than an ESX requirement.

There are several VMX parameters that must be set for the virtual machine. The settings are detailed below, but in essence two changes are required: support for large (64-bit) BARs must be enabled, and the VM must be a UEFI VM with the guest OS installed to boot via UEFI so that the large BARs required by the Phi card can be properly mapped to the top of the guest’s physical address space.

It is also important to explicitly change the default topology of the VM by specifying a number of cores per socket that matches the underlying hardware of your system. If this setting is not included in the VMX file, then programs attempting to use the Phi will hang. We are currently working with Intel to understand this interaction, but for now this workaround is effective.
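One way to find the right cores-per-socket value is to ask the host for its physical CPU topology. The following is a minimal sketch using esxcli from an ESXi shell (or over SSH); divide CPU Cores by CPU Packages to get the physical cores per socket:

# Query the host's physical CPU topology.
# cores per socket = "CPU Cores" / "CPU Packages"
esxcli hardware cpu global get

For example, a host reporting 2 CPU Packages and 16 CPU Cores has 8 cores per socket, matching the value used in the VMX entries below.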

Once these changes are in place, enable the Xeon Phi card for passthrough use in the usual way via the vSphere client.
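If you also want to confirm from the command line that the host sees the coprocessor before toggling passthrough, a quick check from an ESXi shell is sketched below; the Phi is typically reported as an Intel co-processor device, though the exact device name string depends on the card:

# List the host's PCI devices; locate the Xeon Phi entry and note
# its PCI address so you can identify the device when enabling
# passthrough in the vSphere client.
esxcli hardware pci list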

Here are the required VMX file entries:

# Boot with EFI, not BIOS. This line will be included automatically
# when you create a UEFI-enabled VM.
firmware="efi"

# Enable 64-bit BARs
pciPassthru.use64bitMMIO="TRUE"

# Set the topology to mimic that of the underlying hardware
# (replace '8' with the number of physical cores per CPU in
# your system). Failure to do this will result in Xeon Phi
# programs hanging when run within a VM.
cpuid.coresPerSocket=8
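Once the VM boots, it is worth verifying that the guest sees the card and that MPSS can bring it online. The checks below are a minimal sketch of what we would run inside the RedHat guest, assuming an MPSS 3.x installation; utility names and output details differ slightly between MPSS releases:

# Confirm the passthrough device is visible to the guest kernel.
lspci | grep -i "co-processor"

# Start the MPSS service (as root) and check that mic0 comes online.
service mpss start
micctrl -s

# Print card, driver, and flash details as a final sanity check.
micinfo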