HPC Performance in the Cloud: Status and Future Prospects

I spoke last week at ISC Cloud 2012 in Mannheim, Germany about the performance of HPC applications in the cloud, citing results from several studies. I have summarized the talk below and my PDF slide deck is available here. For full details of ISC Cloud 2012, I recommend the coverage at HPC in the Cloud.

I gave my talk to share some specific data about HPC performance in virtual environments, but I first described what new capabilities can be gained by virtualizing an HPC environment. I felt this was necessary because HPC discussions often treat virtualization only in a negative context: how much performance is it going to cost me? That’s a legitimate question, but a proper cost-benefit analysis also requires understanding what additional capabilities might be gained in return.

Those additional capabilities include:

  • “Bring your own software stack”: Rather than mandating what OS distribution will be run across an entire HPC cluster, a virtualized HPC environment can run whatever operating system, middleware, and applications each user needs. This is especially important in multi-department environments where different groups require different software configurations or have requirements that change over time. For example, organizations attempting to centralize their HPC compute resources to increase efficiency and reduce costs can use virtualization to address a significant concern of groups being asked to give up control of their dedicated resources: that they will be forced to use a standard software environment that does not match their specific requirements.
  • Separate workloads: Applications can be isolated from each other by running them within their own virtual machine instances. This separation provides both security and fault isolation between running jobs, which is not possible in bare-metal environments where slot-based batch schedulers run jobs from multiple users within the same operating system instance. This isolation is especially important in environments where multiple jobs must be run on each physical machine to use all available compute resources and where data leakage between jobs is unacceptable or cross-job interference significantly affects efficiency.
  • Use resources more efficiently: In bare-metal HPC environments, jobs launched by batch schedulers are constrained to run where they are placed, and they remain on those resources until they complete, fail, or are killed. In contrast, virtual environments allow running workloads to be migrated from one physical machine to another, enabling policy-based, dynamic load balancing that uses resources more efficiently.
  • Protect applications from failures: Since the virtual machine abstraction encapsulates the full state of the operating system and application, virtualization can be used to checkpoint applications so they can be restarted if they fail due to hardware or software problems. In addition to supporting this “fail and recover” approach, virtualization could also use live migration to proactively move applications (or pieces of MPI applications) from failing to healthy machines. This “move and continue” approach would reduce the need for full checkpoints, which take significant time and space to execute and force lost work to be redone. Redoing work can be expensive if third-party software licenses are involved.

I focused on performance in the second half of the talk, making the point that single-process HPC applications across a range of vertical markets (Life Sciences, Digital Content Creation, Electronic Design Automation) generally show slowdowns of about 0-5% relative to bare-metal performance. For specific Life Sciences data, I referenced the paper Pragmatics of Virtual Machines for High-Performance Computing: A Quantitative Study of Basic Overheads by Cam MacDonell and Paul Lu of the University of Alberta. Here are their results for HMMer, an important biosequencing code:

And for GROMACS, a molecular dynamics code:

I then discussed distributed applications, starting with Hadoop, a workload we have seen run faster in some cases when virtualized. We previously published a technical white paper about this, which can be found here.

Turning to more challenging cases, I presented results from Intel experiments in 2009 that measured HPCC and STAR-CD performance using InfiniBand in passthrough (VM DirectPath I/O) mode. HPCC generally ran well, with two exceptions. One, MPIRandomAccess, we have explored and understand: it should be run with large pages to reduce TLB miss rates. The other, NaturallyOrderedRingBandwidth, has not yet been investigated. The HPCC results are shown below for two, four, and eight-node configurations.
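As an aside on the large-pages remark for MPIRandomAccess: the benefit comes from TLB reach, the amount of memory a fully populated TLB can map without a miss. The sketch below is only an illustration of that arithmetic. The entry count of 64 is a hypothetical value chosen for the example, not a figure from the Intel tests, and the 4 KiB and 2 MiB page sizes are the common x86-64 small and large page sizes.

```python
# TLB reach = number of TLB entries * page size.
# ENTRIES is a hypothetical value for illustration only; real TLB sizes
# vary by processor and were not reported in the talk.

KIB = 1024
MIB = 1024 * KIB
ENTRIES = 64  # assumption, not measured


def tlb_reach_bytes(entries: int, page_size_bytes: int) -> int:
    """Return how many bytes a TLB with `entries` slots can map at once."""
    return entries * page_size_bytes


small_reach = tlb_reach_bytes(ENTRIES, 4 * KIB)   # small pages
large_reach = tlb_reach_bytes(ENTRIES, 2 * MIB)   # large pages

print(f"4 KiB pages: {small_reach // KIB} KiB of reach")
print(f"2 MiB pages: {large_reach // MIB} MiB of reach")
print(f"ratio: {large_reach // small_reach}x")
```

A benchmark that touches memory at random over a large table will miss far less often when the same number of TLB entries covers several hundred times more memory.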

STAR-CD, a computational fluid dynamics code, ran with an overhead of about 15% in the tested configuration. The results for an eight-node case are shown below. As mentioned in the slides, STAR-CD is moderately latency sensitive due to the number of small messages it exchanges. Less sensitive applications will see less overhead, and more sensitive applications will see more.
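For readers who want to compare their own runs against the overhead figures quoted in this post, here is a minimal sketch of how slowdown relative to bare metal is commonly computed from wall-clock runtimes. The function name and the sample timings are hypothetical and are not data from any of the studies cited here.

```python
def slowdown_percent(bare_metal_seconds: float, virtual_seconds: float) -> float:
    """Percent by which the virtualized run is slower than bare metal.

    A negative result means the virtualized run was faster.
    """
    if bare_metal_seconds <= 0:
        raise ValueError("bare-metal runtime must be positive")
    return (virtual_seconds - bare_metal_seconds) / bare_metal_seconds * 100.0


# Hypothetical timings, chosen only to illustrate the calculation.
runs = {
    "single-process job": (200.0, 206.0),
    "latency-sensitive MPI job": (100.0, 115.0),
}

for name, (bare, virt) in runs.items():
    print(f"{name}: {slowdown_percent(bare, virt):.1f}% slowdown")
```

Measuring both configurations on the same hardware, with the same input and several repetitions, keeps the comparison meaningful.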

Intel’s tests were run on DDR InfiniBand using ESX4, both of which are dated technologies today. To partially address this, I also shared our results from QDR InfiniBand experiments in 2011 that showed we could achieve ping-pong latencies under two microseconds with passthrough mode using vSphere 5.1. These results were recently reported in a research note, available here. We are also working with Intel and Mellanox to run additional MPI tests with newer hardware and software and expect to report those results soon.

My message to ISC Cloud attendees was that today’s cloud is not tomorrow’s cloud: As virtualized performance continues to advance and as cloud providers see a business value in deploying high-bandwidth, low-latency interconnects, the number of applications that will run well in a cloud environment will continue to expand. In the meantime, many single-process applications run very well in virtualized environments, MPI overheads may be acceptable for some applications, and virtualization can offer new capabilities not available in traditional, bare-metal HPC environments.

