Virtualized High Performance Computing in 2014
As we complete another circuit around the sun, it’s time again to make some predictions about what we’ll see over the course of our next orbit. Since my work in the Office of the CTO centers on High Performance (Technical) Computing, I’ll concentrate there. But before looking ahead, let’s take a minute to look back and see how last year’s predictions fared.
2013 Predictions Scorecard
Last year, my predictions covered public cloud, private cloud, RDMA in the cloud, virtual machine evolution, and HPC acquisitions. My biggest “miss” was also my most specific prediction: that Amazon would announce RDMA support in 2013 to better support latency-sensitive MPI applications. They did not, leaving Azure the only major cloud service provider that offers this important capability. At VMware, we continue working to expand our RDMA support beyond what is currently available through VM DirectPath I/O (i.e., passthrough or direct assignment). I expect to update our earlier QDR InfiniBand performance study with results using vSphere 5.5 and newer hardware, including FDR InfiniBand and 40 Gb RoCE.
I also predicted the acquisition of HPC assets by providers of broader enterprise capabilities in the areas of provisioning, monitoring, management, and scheduling at massive horizontal scale. I’m not aware of any such acquisitions in 2013, but I still believe this prediction will prove correct in the longer term. It would be a shame not to leverage, for more general enterprise use, the substantial intellectual and product investments the HPC community has made in tools for massive horizontal scale, especially as requirements in the two realms continue to converge.
On the plus side of the prediction ledger, both of my bellwethers for the health of HPC public cloud, Bitbrains in the Netherlands and EngineRoom.io in Australia, made significant advances this year. Bitbrains spoke at VMworld Europe (“vCloud Powered HPC is Better and Outperforming Physical”) and at ISC Cloud 2013 in Heidelberg (“Mission-Critical Financial Insurance Services in the Cloud”), and is expanding its physical infrastructure to handle increased demand for its cloud-based HPC services. EngineRoom.io is firing on all cylinders and is successfully delivering Big Data services to an array of “name brand” customers in the areas of fraud & forensics, applied analytics, and customer insight analytics.

In another significant milestone for public cloud, complementing the high-touch, high-value, customized solutions offered by partners like Bitbrains and EngineRoom.io, VMware this year officially launched its own public cloud service, VMware vCloud Hybrid Service (vCHS), which enables organizations to seamlessly extend their on-premises data centers into the cloud. vCHS infrastructure can support HPC throughput workloads now and will become even more capable in 2014 as enhancements to the service continue.
I was also right about the importance of private cloud for HPC, and that is the focus of my predictions for 2014.
I’m doubling down this year on the continued rise of internal use of virtualization for HPC workloads, whether as vCloud Automation Center (vCAC)-based private clouds or as vSphere deployments. Why? Because in 2013 we saw a big jump in interest within our customer base in virtualizing HPC workloads, especially in Electronic Design Automation (EDA), Life Sciences, and Financial Services. Beyond interest, we saw actual customer deployments in EDA and Defense, and we launched two proof-of-concept collaborations within the University of California system focused on Life Sciences workloads. Looking forward, I expect to engage in at least two additional collaborations in 2014, one in EDA and one in academia.
The Defense example mentioned above was presented at VMworld USA by Edmond DeMattia, Virtualization Architect at the Johns Hopkins Applied Physics Laboratory, which runs many Monte Carlo simulations for Air and Missile Defense purposes. The facility has limited (and expensive) data center space, which had been used to host two water-cooled HPC clusters totaling over 3700 cores: one for Linux workloads and one for Windows. The use case for virtualization was simple and elegant, and it depended on the fact that the Linux and Windows workloads are uncorrelated. By combining the two physical clusters into one larger, virtualized environment, the Windows and Linux workloads could be mixed on the same hardware. This gave each workload class access to more physical resources and avoided the earlier inefficiency of one cluster sitting idle while the other was heavily loaded and running at capacity. Allowing two OSes to share hardware resulted in better utilization of compute resources and allowed higher peak demands to be satisfied, and it did so with better performance than native: roughly a 2.2% increase in application throughput. To the Windows and Linux end users, pooling and virtualizing the hardware made it appear as though both of their clusters had suddenly grown larger, all within the same physical footprint.
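The utilization argument behind this case can be illustrated with a toy model. The sketch below is not the APL team’s analysis; the per-cluster capacity of 1850 cores (an even split of roughly 3700) and the hourly demand figures are purely hypothetical, chosen so that the two workloads peak at different times. It compares the core-hours served when each workload is confined to its own cluster against the core-hours served when both share the combined hardware.

```python
# Toy model of pooling two clusters whose workloads are uncorrelated.
# All capacities and demand figures are hypothetical, for illustration only.

def served_split(linux, windows, cap_linux, cap_windows):
    """Core-hours served when each workload may use only its own cluster."""
    return sum(min(l, cap_linux) + min(w, cap_windows)
               for l, w in zip(linux, windows))


def served_pooled(linux, windows, cap_linux, cap_windows):
    """Core-hours served when both workloads share the combined hardware."""
    total_cap = cap_linux + cap_windows
    return sum(min(l + w, total_cap) for l, w in zip(linux, windows))


# Hypothetical hourly demand (cores): Linux peaks early, Windows peaks late.
linux_demand = [2500, 2000, 600, 300]
windows_demand = [400, 900, 2400, 3000]
CAP = 1850  # hypothetical cores per physical cluster (about 3700 in total)

split = served_split(linux_demand, windows_demand, CAP, CAP)
pooled = served_pooled(linux_demand, windows_demand, CAP, CAP)
capacity = 2 * CAP * len(linux_demand)  # total core-hours available

print(f"split:  {split} core-hours, utilization {split / capacity:.1%}")
print(f"pooled: {pooled} core-hours, utilization {pooled / capacity:.1%}")
# split:  9600 core-hours, utilization 64.9%
# pooled: 12100 core-hours, utilization 81.8%
```

Pooling can never serve less than the split arrangement, because min(a, A) + min(b, B) is always at most min(a + b, A + B). The benefit is largest when one workload’s peaks line up with the other’s troughs. This model deliberately ignores virtualization overhead and says nothing about the measured 2.2% throughput gain, which is a separate effect.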
Private clouds are also gaining traction beyond our customer base. I was happy, for example, to learn from Narayan Desai at Argonne National Laboratory that although the Magellan project concluded in 2011, Argonne has since been working to create a private/community cloud to address the midrange needs of its Department of Energy (DOE) research community. There is clear value in virtualizing internal compute resources, and it is good to see that acknowledged (and acted upon) by the DOE.
As further evidence that this trend is taking hold, a panel titled “And You Thought HPC Virtualization was never going to happen” was held in Dell’s booth at SC’13 in Denver last month. Take a break from listening to me talk about the benefits of virtualization for HPC, and instead listen to representatives from five organizations explain why they are deploying virtualized HPC resources to meet their users’ requirements.
In short, 2014 is going to be the breakout year for the use of virtualization in HPC, specifically in private cloud deployments.
Have a different opinion about any of this? Post a comment!