We just announced a pile of new vSphere features at VMworld 2012 in San Francisco this week, so I thought I'd take a few moments to describe several of the capabilities that will be of particular interest to High Performance Computing (HPC) customers.
Bigger Virtual Machines
First, monster VMs got a bit more monstrous in vSphere 5.1. While we continue to support up to 1 TB of memory per VM, we have doubled the number of supported vCPUs to 64, which will be a boon to those wanting to run large-scale OpenMP or other threaded applications in a virtual environment. Those of you who regularly read the HPC entries on the CTO blog will recall that last year we presented a paper on vNUMA at the HPCVirt 2011 Workshop in Bordeaux showing SPECOMP performance results in virtual machines with up to 64 vCPUs, so this enhancement should come as no surprise. That paper, by the way, is here, and I recommend reading it if you intend to run NUMA-sensitive workloads in VMs that span more than one physical socket.
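To give a concrete feel for the sort of workload that benefits, here is a minimal OpenMP sketch (illustrative only, not taken from the paper) whose loop iterations simply spread across however many vCPUs the guest operating system exposes, up to 64 on a maximally configured vSphere 5.1 VM:

```c
#include <omp.h>
#include <stdio.h>

int main(void)
{
    const long N = 1L << 28;   /* arbitrary problem size, for illustration only */
    double sum = 0.0;

    /* OpenMP spreads these iterations across all the vCPUs the guest sees;
       with vSphere 5.1 that can now be as many as 64. */
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < N; i++)
        sum += 1.0 / (double)(i + 1);

    printf("max OpenMP threads: %d, result: %.6f\n",
           omp_get_max_threads(), sum);
    return 0;
}
```

Build it with something like gcc -fopenmp and set OMP_NUM_THREADS as usual; vNUMA matters most when such a VM spans more than one physical socket, which is exactly the case the paper addresses.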
Performance Counters
Second (and this is very cool), we've introduced support for virtualized CPU performance counters, which means tools like the Performance Application Programming Interface (PAPI) should now allow profilers to gather performance data on applications running within VMs on vSphere. In fact, we've been funding the University of Tennessee to do exactly that as part of our VMware Academic Program (VMAP).
Our implementation provides several options to control how these performance counters behave in a virtual environment. By default, the "instructions retired" and "branches retired" events count only guest instructions; they do not increment while hypervisor instructions are executing. All other events increment whenever the physical CPU is executing either guest or hypervisor instructions on behalf of the virtual machine. It is also possible to ignore all hypervisor code when incrementing the counters, or to increment the counters regardless of whether guest or hypervisor code is executing. This flexibility supports different use cases for performance data in a virtualized environment, from end users interested in application performance to implementors interested in understanding virtualization overheads.
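To make that concrete, here is a minimal sketch of how an application inside a vSphere 5.1 VM might read those two default guest-only events through PAPI's low-level interface. The error handling is deliberately simplified, and which preset events are actually available will depend on the guest OS, the PAPI build, and how the virtual performance counters are configured, so treat this as illustrative rather than definitive:

```c
#include <papi.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int evset = PAPI_NULL;
    long long counts[2];

    /* Initialize PAPI and build an event set with the two events that the
       default configuration counts for guest code only. */
    if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT)
        exit(1);
    if (PAPI_create_eventset(&evset) != PAPI_OK ||
        PAPI_add_event(evset, PAPI_TOT_INS) != PAPI_OK ||  /* instructions retired */
        PAPI_add_event(evset, PAPI_BR_INS)  != PAPI_OK)    /* branches retired */
        exit(1);

    PAPI_start(evset);

    /* Region of interest: replace with the kernel you want to profile. */
    volatile double x = 0.0;
    for (long i = 0; i < 10 * 1000 * 1000; i++)
        x += (double)i;

    PAPI_stop(evset, counts);
    printf("instructions retired: %lld, branches retired: %lld\n",
           counts[0], counts[1]);
    return 0;
}
```

Profilers built on top of PAPI can then attribute these counts to functions or loops much as they would on physical hardware.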
Enhanced vMotion
Enhancements to vMotion in vSphere 5.1 provide a new level of flexibility for live virtual machine migrations. Specifically, vMotion no longer requires shared storage, which matters for HPC clusters in which important application state resides on the local storage of cluster nodes, usually for performance reasons.

With enhanced vMotion, a migration copies both the virtual machine's memory and its disks over the network to the destination host. This type of migration can be done within a DRS cluster in small environments or across DRS clusters, as would often be the case in large virtualized HPC compute farms.
PCI Device Sharing (SR-IOV)
Single Root I/O Virtualization (SR-IOV) is a part of the PCI standard that enables one PCI Express (PCIe) adapter to be presented as multiple separate logical devices to virtual machines. The hypervisor manages the physical function (PF) while the virtual functions (VFs) are exposed to the virtual machines. Previously, one could use VMware DirectPath I/O (passthrough) to expose an entire InfiniBand HCA to a single virtual machine, but the HCA could not be shared among VMs. Such sharing is sometimes useful in multi-tenant HPC environments, or when individual MPI ranks cannot by themselves consume the increasingly large amount of parallelism available from modern multicore systems, and SR-IOV now makes it possible.
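From the guest's point of view, a VF looks like an ordinary InfiniBand device, so existing verbs-based tools and MPI stacks should see it without modification, assuming the appropriate VF driver is loaded in the guest. A minimal libibverbs sketch that lists whatever devices the guest sees, VF or physical, might look like this:

```c
#include <infiniband/verbs.h>
#include <stdio.h>

int main(void)
{
    int num = 0;
    struct ibv_device **list = ibv_get_device_list(&num);

    if (!list)
        return 1;

    /* Inside the guest, an SR-IOV virtual function shows up in this list
       just as a physical HCA would on bare metal. */
    for (int i = 0; i < num; i++) {
        struct ibv_context *ctx = ibv_open_device(list[i]);
        struct ibv_device_attr attr;

        if (ctx && ibv_query_device(ctx, &attr) == 0)
            printf("%s: %d port(s), max_qp=%d\n",
                   ibv_get_device_name(list[i]),
                   (int)attr.phys_port_cnt, attr.max_qp);
        if (ctx)
            ibv_close_device(ctx);
    }

    ibv_free_device_list(list);
    return 0;
}
```

The point is that this code is unchanged from what you would run on bare metal: the guest programs against the same device model while the hypervisor shares the physical HCA among VMs via the PF/VF split.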
Note that using SR-IOV to gain direct guest access to a virtual slice of an HCA still disables use of vMotion and Snapshots, the same restrictions imposed by use of VMware DirectPath I/O. This is why the research work being done on vRDMA by my colleague Bhavesh Davda in the Office of the CTO is so important: it aims to deliver CPU offload capabilities as well as bandwidths and latencies similar to those achievable with passthrough, while maintaining vMotion and Snapshot capabilities, both of which can enable interesting use cases for virtualization in HPC environments.
Auto Deploy
vSphere 5 included our first release of Auto Deploy, which supports provisioning stateless/diskless ESXi hosts using PXE boot and host profiles to customize each host. With vSphere 5.1 we've added two additional modes: stateless caching and stateful installs. Stateless caching stores a backup copy of the in-memory image on a dedicated boot device (local disk, SAN, USB) and uses this cached copy on reboot when the PXE infrastructure or other components required to run in stateless mode are unavailable. The stateful install mode allows administrators to leverage Auto Deploy's provisioning capabilities to configure a new host on first boot and then boot the system from its dedicated boot device thereafter. In that scenario, the system will always try to boot from its boot device first and will fail over to a network boot only in the event of an error.
More Information
For additional information on the features mentioned above as well as the other enhancements and new capabilities included in vSphere 5.1, see the technical white papers available here.