It’s been a busy few months on the HPC front here at VMware. I’ve been spending increasing amounts of time answering questions from our field people and talking with customers about the benefits of using virtualization and cloud technologies for HPC workloads. The two hottest application areas are clearly Life Sciences and Electronic Design Automation (EDA), with Financial Services as a third area where serious interest is starting to ramp. Customers in these sectors are actively evaluating or deploying HPC solutions based on VMware technology.
Some of this work will be presented at VMworld 2013, August 25-29 in San Francisco. Here are the talks that will be of special interest to organizations whose job mixes include the technical, scientific, or engineering workloads that are the mainstay of High Performance Computing.
VAPP5419 – High-Performance Computing (HPC) in the Virtualized Data Center
Edmond DeMattia, Johns Hopkins University Applied Physics Laboratory
The Air and Missile Defense Department’s Combat Systems Development Facility at JHU Applied Physics Laboratory relies on large-scale Monte Carlo simulations to perform classified combat systems performance assessments and concept studies for multiple Department of Defense sponsors. The increasing demand for modeling and simulation work drives an ever-growing requirement for additional high-performance computing (HPC) capabilities, but reduced IT budgets and tribally managed infrastructures dictated that we learn how to use existing resources more efficiently. This session explains how we successfully pooled the resources of independent Linux and Windows HPC grids into a 2720-core, fully virtualized, high-performance computing platform that has allowed our engineers to reduce runtimes by an order of magnitude. Beyond the improved utilization gained by consolidating two disjoint clusters, the ESXi abstraction layer reveals a specific use case that realizes a 2.2% performance increase over its native hardware configuration. We examine the technical and non-technical hurdles we had to overcome, such as scaling storage and network infrastructure to handle the increased number of simulations, modifying user workflows for the expanded HPC grid resources, and the cultural challenges that naturally come with sharing computing resources. By leveraging VMware vSphere, we pooled disjoint HPC clusters without compromising performance or guest OS integrity while continuing to meet strict Defense Security Service (DSS) NISPOM requirements, demonstrating that VMware virtualization can reshape the path for scientific computing.
VSVC5272 – How UC San Francisco Delivered ‘Science as a Service’ with Private Cloud for HPC
Brad Dispensa, UCSF; Andy Nelson, VMware
The University of California, San Francisco (UCSF) has a goal of being the world’s preeminent health sciences innovator. To this end, multiple High Performance Computing (HPC) environments exist at UCSF, each administered by different departments. These physical environments are expensive to maintain and difficult to scale. They do not meet the scheduling needs of academic researchers, and lack the flexibility to support disparate workloads. As a result they do not keep pace with the rapid developments and aspirational goals of Life Science research. To avoid these limitations, UCSF and VMware piloted a private cloud for HPC workloads. This environment was built to serve the needs of multiple research groups from a centralized pool of IT resources. Through virtualization and private cloud technologies the University hopes to transform how academic researchers consume HPC resources so that time and funding go toward scientific discovery rather than to maintaining physical silos of compute.
VAPP5402 – Beyond Mission Critical: Virtualizing Big-Data, Hadoop, HPC, Cloud-scale Apps
Richard McDougall, VMware; Chris Greer, FedEx
This session will share VMware’s strategy and capabilities as a platform for next-generation applications, including big data, high-performance computing, low-latency applications, and distributed web apps.
There will also be several talks about running latency- and jitter-sensitive applications on vSphere, which is a big focus of our engineering efforts and, of course, highly relevant for many HPC workloads.
VSVC5187 – Silent Killer: How Latency Destroys Performance… And What to Do About It
Bhavesh Davda, VMware; Josh Simons, VMware
If your virtualized application is running slower than expected, latency may be the culprit. In this talk, we will explain what latency is, describe where it can appear, and demonstrate how it can degrade application performance, sometimes significantly. We will discuss all three primary areas in which latency appears: networking, storage, and memory. We will then discuss best practices for reducing latencies on vSphere and conclude by describing upcoming enhancements to further improve the performance of latency-sensitive applications on vSphere. This talk is designed for technical people, but it does not assume any particular expertise.
VSVC5596 – Extreme Performance Series: Network Speed Ahead
Lenin Singaravelu, VMware; Haoqiang Zheng, VMware
Extremely latency-sensitive applications such as distributed in-memory data management and stock trading have long been thought to be incompatible with virtualization because their latency and jitter requirements are on the order of a few microseconds to a few tens of microseconds. The key aspects of virtualization, hardware abstraction and resource sharing, tend to introduce latency overhead and jitter on the order of a few microseconds to a few hundred microseconds. To support applications with such extreme latency requirements, vSphere 5.5 introduces a new feature called latency-sensitivity that allows VMs to achieve near-physical latency by removing the major sources of overhead in two ways: 1) allowing VMs to exclusively own physical resources to eliminate contention and 2) bypassing virtualization layers. This session starts out by introducing the state of network performance in vSphere, including the latest performance numbers, new network stack optimizations in vSphere 5.5, key tunables to achieve even better performance and consolidation ratios, and two troubleshooting tools. The second part of the talk will focus on the new latency-sensitivity feature, where we will talk about the sources of overhead and how we went about addressing them. Finally, we will discuss the benefits and limitations of the feature and conclude with a series of best practices.
We are also holding a High Performance Computing on VMware roundtable session on Tuesday, August 27th from 3:30-4:30pm Pacific, at which I and others will be speaking. Space is limited, so registration is required. Here is the session description:
VMware helps Enterprises deploy, run, and manage High Performance Compute workloads on a common virtualized infrastructure.
I will be at VMworld all week so if anyone wants to sit down and talk HPC, you can reach me any time as “simons” in the obvious domain.