
SC’14 in New Orleans: After-Action Report

Our first official presence at Supercomputing this week — SC’14 in New Orleans — was very gratifying. We had almost a dozen pre-scheduled meetings with customers and partners, and a steady stream of visitors at our demo station within the EMC booth on the exhibit floor. We also had a demo station in the Mellanox booth — more on that below. In addition to myself and Andy Nelson from the Office of the CTO, we were joined by several members of our field organization — Matt Herreras, Jeff Adams, and Mike Pileggi — as well as Ranjit Sawant from our Ecosystem and Solutions Engineering (EASE) team.

We showed four things at the show:

1. A demo of our self-service, private-cloud approach to delivering HPC cycles to scientists and researchers using VMware vRealize Automation, integrated with our Office of the CTO cluster provisioning prototype (video).
2. A demo of a more comprehensive and complementary Research-as-a-Service approach being developed by our EASE team in collaboration with several of our customers.
3. The use of InfiniBand SR-IOV to enable access to a Lustre parallel file system from multiple VMs on a single host.
4. Performance data demonstrating that a wide range of HPC applications run at very close to native speed on ESXi — and that we are continuing to enhance the platform to address even more applications in the future.

The overwhelming reaction to our message was simple: “This is exactly what we are looking to do” — a far cry from the rampant skepticism about virtualization and cloud computing of even a few years ago. We’ve come a long way, as you can hear in this HPC update video interview Matt Herreras and I did with Rich Brueckner of insideHPC at the SC’14 booth. And for those interested in an overview of virtualization for HPC, I recommend insideHPC’s Guide to Virtualization, the Cloud, and HPC.

I did have several conversations with customers who expressed concern about virtualized performance. After sharing our results (some of which are included below), they expressed interest in testing some of their specific applications and models on vSphere within their own environments. We encourage this: while our testing is intended to show broad application performance characteristics, we can’t possibly cover all of the codes (or even types of codes) of interest to the HPC community. It is, therefore, very important for organizations to evaluate performance on their own key applications and data sets where possible. We, of course, can help ensure that our platform is tuned correctly — an essential step in delivering good performance on latency-sensitive HPC applications. As a result of these meetings, I expect that we will establish some new customer collaborations to explore the value of deploying cloud-based solutions for HPC workloads. Stay tuned for details.
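To give a flavor of the kind of tuning we mean, the sketch below shows per-VM settings of the sort we typically check for latency-sensitive workloads. This is an illustrative fragment, not a complete recipe: the option names reflect vSphere 5.5-era `.vmx` settings, and the specific values (reservation sizes, NUMA node) are assumptions that must be sized to your own hosts — verify everything against VMware’s latency-sensitivity tuning documentation before use.

```
# Illustrative .vmx entries for a latency-sensitive HPC VM (vSphere 5.5 era).
# Values below are placeholders to be sized to your own hardware.
sched.cpu.latencySensitivity = "high"   # enable the per-VM latency-sensitivity feature
sched.cpu.min = "32000"                 # full CPU reservation, in MHz
sched.mem.min = "131072"                # full memory reservation, in MB (128 GB VM)
numa.nodeAffinity = "0"                 # optional: constrain to one NUMA node if the VM fits
```

Full CPU and memory reservations are generally prerequisites for the latency-sensitivity feature to take effect, which is why they appear together here.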

The bottom line on SC’14: It was energizing to talk with so many excited customers and to be able to show the progress we’ve been making in addressing their needs.

RH 6.5 bare-metal versus a 16-vCPU VM with ESXi 5.5u1 on an HP DL380 G8 16-core (2S) IVB host with 128 GB.

RH 6.5 bare-metal versus ESX 5.5u1 using four DL380 G8 16-core (2S) 128 GB cluster nodes with FDR InfiniBand. One 16-vCPU VM per host, with MPI processes varied from 4 to 64.

FDR IB send (OFED) latencies between two DL380 G8 IVB 16-core (2S) 128 GB nodes, comparing bare-metal to an engineering build of ESX. 16-vCPU VMs with VM Direct Path I/O. Results are essentially identical.

12-core bare-metal CentOS 6.4 versus three 4-vCPU VMs, Lustre IOR bandwidth measurement; configuration details as shown. Varying the number of IOR readers/writers shows comparable write performance and up to a 50% increase in read performance on virtual. Very preliminary results, with additional analysis underway.
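For readers reproducing comparisons like those captioned above, the arithmetic is straightforward: express each virtual result relative to its bare-metal counterpart. A minimal sketch follows — the timings in it are hypothetical placeholders for illustration, not our measured data:

```python
# Compare virtual vs. bare-metal benchmark results as percentage overhead.
# The sample numbers below are hypothetical placeholders, NOT measured data.

def overhead_pct(native_seconds: float, virtual_seconds: float) -> float:
    """Percent slowdown of the virtual run relative to bare metal.

    Negative values mean the virtual run was faster, as can happen in
    preliminary results (e.g., some IOR read measurements).
    """
    return (virtual_seconds - native_seconds) / native_seconds * 100.0

# Hypothetical wall-clock times (seconds) for paired runs of one benchmark.
runs = {
    "run_a": (100.0, 102.0),
    "run_b": (100.0, 98.0),
}

for name, (native, virtual) in runs.items():
    print(f"{name}: {overhead_pct(native, virtual):+.1f}% vs. bare metal")
```

Running each comparison several times and reporting the spread, not just a single pair, is what makes “close to native” a defensible claim for a given code.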