I attended the OpenFabrics Workshop in Monterey earlier this week and delivered a talk on converging requirements between HPC, Enterprise, and Cloud with an emphasis on interconnect-specific issues, specifically RDMA. With its lower latencies and lower CPU utilization, along with the potential for higher bandwidths, RDMA is of increasing interest for addressing the requirements of important new Enterprise application classes (discussed more below).
For those not familiar, the OpenFabrics Alliance is an organization wrapped around the OpenFabrics Enterprise Distribution (OFED) open source community that creates and tests binary releases of the OFED stack for Linux and Windows. They also coordinate patches, offer training courses, and conduct marketing activities to promote the OFED brand. OFED is the software stack that supports user-level and kernel-level access to iWARP, RoCE, and InfiniBand through a unified set of interfaces.
What is RDMA and why is it interesting? RDMA (remote direct memory access) allows data to be transferred from the memory of one machine to the memory of another machine without CPU intervention — the transfer is handled by the interconnect endpoints themselves rather than the CPU. This approach can greatly reduce data transfer latencies and increase the efficiency of high-throughput data motion. RDMA has been used for many years within the HPC community, most notably with InfiniBand, to build high-performance compute clusters. But it could be used to advantage in a (virtualized) Enterprise environment as well. While VMware does not currently support it, there are clear benefits to doing so.
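To make the "no CPU intervention" point concrete, here is a rough pseudocode sketch of what a one-sided RDMA write looks like through the verbs interface that OFED exposes. Names follow the libibverbs API, but all setup (device open, queue pair creation, connection establishment, error handling) is omitted, and `pd`, `qp`, `remote_addr`, and `rkey` are assumed to already exist:

```
/* Pseudocode sketch: one-sided RDMA write via the verbs API (libibverbs) */

/* Register the local buffer so the NIC can DMA directly from it */
mr = ibv_reg_mr(pd, local_buf, len, IBV_ACCESS_LOCAL_WRITE);

/* Describe the transfer: local source... */
sge.addr   = (uintptr_t) local_buf;
sge.length = len;
sge.lkey   = mr->lkey;

/* ...and remote destination, obtained out of band during setup */
wr.opcode              = IBV_WR_RDMA_WRITE;   /* one-sided write */
wr.sg_list             = &sge;
wr.num_sge             = 1;
wr.wr.rdma.remote_addr = remote_addr;
wr.wr.rdma.rkey        = rkey;                /* remote memory key */

/* Hand the request to the adapter; the NICs move the data into the
 * remote machine's memory with no involvement from either OS kernel
 * or the remote CPU */
ibv_post_send(qp, &wr, &bad_wr);
```

The key property is that once the work request is posted, the CPUs on both sides are free to do other work while the hardware completes the transfer — which is exactly why RDMA delivers both lower latency and lower CPU utilization.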
At the virtual infrastructure level (i.e. at the vmkernel or hypervisor level), RDMA could be used to accelerate several distributed services within the platform. For example, it could significantly improve the performance of vMotion by increasing overall vMotion speed while also decreasing VM pause time. Furthermore, because the faster transfer of state data gives the VM less time to dirty additional memory pages, less overall data is actually transmitted over the wire when using RDMA for vMotion. We have run experiments internally to validate these benefits. In addition, DK Panda's team at Ohio State published an excellent paper that looks at these issues using Xen: High Performance Virtual Machine Migration with RDMA over Modern Interconnects [PDF].
Exposing RDMA at the guest level would enable many distributed parallel (MPI) HPC applications to run on our virtual platform. While certainly useful to our existing vSphere customers who have such workloads and to HPC customers who are not yet virtualized, it is perhaps even more interesting from a business perspective to contemplate how RDMA might benefit the broader Enterprise computing market. With the rise of many scale-out middleware frameworks and applications, it is becoming increasingly clear that RDMA can add value for such workloads, which include GemFire, memcached, Hadoop, etc. DK Panda's group presented an interesting talk on accelerating memcached and Hadoop with RDMA at the OFA Workshop, entitled Can HPC Interconnects Benefit Memcached and Hadoop? The slides should eventually be posted here, and a detailed paper is currently under review.
I will talk more about my presentation and requirements for virtual RDMA in a subsequent blog entry. I’ll also discuss what Oracle said on this topic at the workshop. Stay tuned!