My name is Josh Simons. I work for Steve Herrod in the Office of the CTO here at VMware, having joined the company in March of this year from Sun Microsystems. I’m an HPC guy. That’s High Performance Computing, for those not familiar. HPC is all about compute and storage intensive workloads, which are often floating-point heavy, usually technical, very often parallel, and often scientific. But not always.
While some of the largest computers on the planet are HPC systems, HPC is not all about the high end. Many companies and other organizations use HPC techniques to gain business and other advantages in their own fields. According to IDC, many of these are enterprise customers who have realized that they are running an increasing amount of compute-intensive workload in their environments for a variety of reasons.
I am sometimes asked what someone with over 20 years of experience in HPC is doing at VMware. Why the career change? Actually, I’ve joined VMware to lead a new HPC effort, one based on bringing the value of virtualization to the HPC community. I spent roughly the last year looking at the strengths and weaknesses of virtualization for HPC and came to the conclusion that vHPC — virtualized HPC — may be uniquely able to address several important pain points for HPC customers while also offering entirely new capabilities that can be leveraged to great advantage.
I’d like to use this space within the Office of the CTO community page to explore the above topic in more detail through a series of pieces that I hope will attract some commentary and discussion. While I am enthusiastic about the potential of vHPC, you will not find fanboi giddiness here. I’m quite aware of both the benefits and the considerable challenges of delivering a vHPC vision, and I intend to present a balanced view that covers all aspects.
Back in June I was in Hamburg, Germany at the 25th annual International Supercomputing Conference (ISC ’10) where I ran a Birds-of-a-Feather session titled Virtualization and HPC that was very well attended. The talk outlined some of the major potential uses of virtualization for HPC and touched on some performance-related topics as well. As this is very appropriate material for this introductory piece, I’ve reproduced much of what I covered below. Readers already familiar with the basics of virtualization and its capabilities can skip to the second section.
Since some readers may not be very familiar with virtualization, I’d like to first define what we mean by the term and also briefly discuss how virtualization is used in an enterprise setting. With this context we can then look at the uses of virtualization for HPC in more detail.
OS virtualization (Fig 1) involves two basic concepts: the insertion of a virtualization layer that manages the physical resources of a server and the encapsulation of an operating system and its applications in a container called a virtual machine. Once this encapsulation is done it is then possible to run multiple virtual machines on a single physical system with access to underlying resources controlled by the virtualization layer. Many readers will be familiar with desktop virtualization through the use of VMware Fusion, Parallels Desktop, or Oracle VM VirtualBox. With such products, one’s guest operating system runs within a virtual machine on a host operating system — for example, Windows XP running in VMware Fusion on Mac OS X. While the principles are similar, in the server virtualization case there is not a host operating system running under the virtual machine, but rather a thin and highly optimized piece of software called a hypervisor. As we will discuss later, the nature of the interface between the hypervisor and the virtual machine is important for both enterprise and HPC uses of virtualization.
Within the enterprise, virtualization is primarily used to reduce the number of physical servers needed in a datacenter by consolidating workload from multiple systems onto a smaller number of servers. Depending on the nature of the workload being consolidated and the hardware available, it may be possible to run tens of virtual machines or more per physical server. While this massive over-subscription of resources can work well in the enterprise due to the relatively low resource requirements of many enterprise applications, HPC applications need dedicated compute resources to achieve high performance. However, as chip vendors continue to aggressively increase the number of processor cores per socket, some degree of consolidation will be required for HPC workloads that cannot take advantage of the degree of parallelism available on even single-socket systems. This will be especially true for organizations with throughput-oriented workloads, such as those in Life Sciences, EDA, DCC, Energy, and Financial Services. In such cases, the degree of consolidation will be chosen so that the total number of virtual CPUs instantiated across all virtual machines matches the total number of physical cores in the underlying system, avoiding over-subscription and delivering high performance to each virtual machine.
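The non-oversubscription rule just described amounts to a simple accounting check. Here is a minimal sketch; the function name and the host configuration are hypothetical, not any VMware API:

```python
def fits_without_oversubscription(physical_cores, vcpus_per_vm):
    """True if the total vCPU count across all VMs does not exceed
    the host's physical core count (i.e., no over-subscription)."""
    return sum(vcpus_per_vm) <= physical_cores

# Example: a two-socket, 12-cores-per-socket host running four single-job VMs.
host_cores = 24
vm_vcpus = [4, 4, 8, 8]        # vCPUs requested by each VM
print(fits_without_oversubscription(host_cores, vm_vcpus))  # True: 24 vCPUs on 24 cores
```

Adding a fifth VM of any size to this host would tip it into over-subscription, which is exactly what this use case avoids.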
Two additional virtualization capabilities should be mentioned before we turn to a discussion of HPC use cases. The first is snapshots and the second is live workload migration. Both of these capabilities are enabled by the clean, well-defined nature of the interface that exists between the hypervisor and virtual machine.
Virtual machines can be suspended, written to disk, and then restarted later as anyone who has used desktop virtualization products is aware. This snapshot capability can add significant flexibility in an enterprise environment, allowing applications and services to be scaled up and down as needed.
Live workload migration – or VMotion as we call it at VMware – is perhaps one of the slickest virtualization technologies. With VMotion one can migrate running virtual machines (and their running applications) from one physical machine to another. The virtual machine continues to run as its memory footprint is copied from the source machine to the destination machine in multiple rounds. When the vast majority of the memory has been transferred, the virtual machine is paused very briefly while the remainder of its working set is transferred to the destination machine, and then the virtual machine continues execution on the new hardware. This capability is used in the enterprise in a variety of ways, including dynamic balancing of workload between machines, power management, and system maintenance.
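The iterative pre-copy loop described above can be sketched as a toy simulation. The page counts and the re-dirtying model are invented for illustration; this is not how VMotion is implemented internally:

```python
def precopy_migration(total_pages=1_000_000, redirty_fraction=0.1,
                      stop_threshold=1_000, max_rounds=30):
    """Iterative pre-copy: each round copies the currently dirty pages
    while the guest keeps running; a fraction of the copied pages gets
    re-dirtied in the meantime. Once the dirty set falls to or below
    stop_threshold, the VM is paused for a brief final stop-and-copy."""
    dirty = total_pages                 # round 1 copies all guest memory
    rounds = 0
    while dirty > stop_threshold and rounds < max_rounds:
        rounds += 1
        copied = dirty
        # Pages the running guest re-dirtied while this round's copy was in flight:
        dirty = int(copied * redirty_fraction)
    return rounds, dirty                # dirty = pages left for the brief pause

rounds, remainder = precopy_migration()
print(rounds, remainder)   # 3 1000: three copy rounds, then a short pause for 1,000 pages
```

Because each round only recopies what was dirtied during the previous round, the amount of memory transferred shrinks geometrically, which is why the final pause can be kept very short.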
Virtualization and HPC
Let’s turn now to how virtualization can be used to both address significant pain points in current HPC environments and to add new capabilities not available without virtualization. We will then discuss the performance of HPC applications on a virtualized infrastructure.
Virtualization is the on-ramp to cloud computing – it enables access to potentially large amounts of computational power without the capital expenditures traditionally required to run large-scale computations. Thus, a desire to exploit cloud computing resources is perhaps one of the most common reasons members of the HPC community are beginning to express interest in virtualization technologies. While accessing remote compute resources has been a common HPC model for many years, the use of virtualization on those resources is a new aspect which requires a better understanding of both the costs and values of virtualization for compute-intensive workloads.
Current HPC clusters are for the most part homogeneous masses of compute nodes all running an identical version of a pre-selected operating system. While this uniformity greatly reduces administrative complexity and allows supported applications to be scheduled to any node, it is also extremely inflexible. Consider, for example, a large HPC installation designed to serve a wide user base with potentially disparate application requirements. These applications may require Windows, Solaris, or an older version of Linux than is installed on the cluster. Since temporarily re-provisioning, rebooting, and reconfiguring nodes to satisfy these user requirements is not feasible, these users are unable to use the shared resource for their work, forcing them to either look elsewhere for appropriate resources or to port their application to the site’s supported operating system. Virtualization technology can greatly improve this situation.
By deploying a virtualized infrastructure across all compute nodes rather than installing a standard operating system, administrators free users to run virtual machines on the shared resource. Because the virtual machine is a black box from the site administrator’s perspective, users are free to run any operating system with any HPC software stack they desire to support their work. Old operating system revisions, completely different operating systems, old MPI libraries, experimental MPI libraries: users are completely free to make any such choices when building and configuring their virtual machines for later deployment on the shared cluster resource.
Of course, a site may also elect to supply a standard virtual machine with a standard operating system and software stack for those end users who do not require this additional degree of flexibility, thus supporting existing users while also adding the ability to address the requirements of a much broader user base.
This shift from a physical to virtual cluster infrastructure has other benefits as well. There are several in particular that I collectively refer to as clean computing, which address several existing HPC pain points related to the current practice of running multiple applications on nodes in a shared cluster resource.
Current physical compute clusters use standard HPC distributed resource managers like Oracle Grid Engine or Platform LSF to schedule jobs onto nodes. This can work well when no problems occur, but in situations in which jobs abort and leave the system in a corrupted state (e.g., /tmp has filled, or an aborted job has not shut down properly and is holding a critical resource), workload throughput can be greatly reduced as further jobs are scheduled onto these broken systems. System administrators spend significant time crafting homegrown approaches for dealing with these problems, including periodic node reboots to reset the node to a known good state. This problem does not exist in a virtualized cluster because when virtual machines are launched on a node they start in a well-defined and pristine state and cannot be affected by the previous job that ran on the node. Similarly, when launching an MPI job using virtual machines to encapsulate the job’s MPI processes, all ranks are guaranteed to start in a consistent state.
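A toy model makes the pristine-state guarantee concrete. All names and numbers here are invented for illustration only:

```python
import copy

PRISTINE = {"tmp_free_mb": 10_000, "stale_locks": 0}   # known-good starting state

def job_aborts(state):
    """A crashing job fills much of /tmp and leaves a lock behind."""
    state["tmp_free_mb"] -= 8_000
    state["stale_locks"] += 1

# Physical node: state persists, so the next scheduled job inherits the debris.
node = copy.deepcopy(PRISTINE)
job_aborts(node)
next_job_sees_physical = node

# Virtualized node: each job boots a fresh clone of the pristine image,
# and the crashed VM's debris is discarded along with that VM.
vm = copy.deepcopy(PRISTINE)
job_aborts(vm)
next_job_sees_virtual = copy.deepcopy(PRISTINE)

print(next_job_sees_physical["stale_locks"], next_job_sees_virtual["stale_locks"])  # 1 0
```

On the physical node the debris accumulates until an administrator intervenes; in the virtualized case every job starts from the same clean image by construction.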
Security is another aspect of clean computing. As was mentioned earlier, non-oversubscribed consolidation – running a set of virtual machines whose total virtual CPU count does not exceed the total number of physical cores in a server – is an important use case for some HPC customer workloads due to the rapidly increasing number of cores available per socket. Virtualization allows multiple jobs to be run on the same physical hardware and provides a security barrier to prevent information leakage between jobs or users. This is especially important in shared environments in which multiple populations must be separated; for example, two departments within a company or academic and industrial users on a shared university resource.
Virtualization can be used by developers in some interesting ways as well. The ability to use snapshots to checkpoint a virtual machine and its application just prior to a failure lets a programmer quickly and repeatedly trigger the failure and therefore debug the problem more efficiently.
For teams doing distributed development, virtualization can be used to guarantee that all developers are working in identically configured environments, which would go well beyond the benefits of a centralized code repository and version management system. I don’t know of anyone using virtualization in this way, but it would be an interesting area to explore.
From an HPC-specific perspective, virtualization can be used with “reverse consolidation” to debug high-scaling applications as they are developed. For example, the correctness of a new MPI application can be explored on a modest-sized test cluster prior to running the full-scale job on a large, expensive, shared resource by running a very large number of virtual machines on each node of the test cluster to simulate a full-scale application run. While performance clearly could not be tested this way, placing each MPI process within its own virtual machine and operating system instance would allow correctness testing in an environment that more closely mimics the production environment than other physically-based approaches.
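The sizing arithmetic for such a rehearsal is straightforward. This is a hypothetical helper, assuming one MPI rank per virtual machine:

```python
import math

def vms_per_test_node(target_ranks, test_nodes):
    """Number of single-rank VMs each test node must host to mimic
    a full-scale MPI run of target_ranks processes."""
    return math.ceil(target_ranks / test_nodes)

# Example: rehearse a 4,096-rank production run on a 32-node test cluster.
print(vms_per_test_node(4096, 32))  # 128 VMs per node
```

The resulting degree of oversubscription (here, 128 virtual machines per physical node) is exactly what makes this a correctness rehearsal rather than a performance test.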
End of Part I
Because this piece has become quite long, I’ve decided to break it into two parts. The second installment will cover the remainder of the primary use-cases for virtualization in HPC (checkpointing and dynamic resource utilization) and will also touch on some performance-related issues, which I know are of paramount concern to HPC users. Stay tuned for Part II!