Virtualizing Big Data

Analysis of large-scale, often unstructured data is becoming increasingly important within both the Enterprise and the HPC community. This is perhaps one of the clearest areas where HPC and Enterprise requirements converge, since the tools and algorithmic approaches they need are often the same or very similar. I imagine, for example, that the large-scale, graph-oriented “social network” analyses done by companies like Facebook are quite similar to the “anti-social network” analyses done by Homeland Security and the Intelligence community.

Unsurprisingly, many VMware customers are interested in running Big Data workloads and are looking for guidance about how best to do this in a virtual environment. To help, we have published a whitepaper that examines Hadoop performance using local storage in a vSphere environment, the first in what will eventually be a series of whitepapers in this area. The current paper is available here.

For those interested in a broader discussion of Big Data, NoSQL databases, and virtualization, I recommend an audio recording of the Big Data panel that was held at VMworld in Las Vegas. Our panelists were luminaries from across the Big Data space: Amr Awadallah, CTO, Cloudera; Clint Green, Principal Engineer, Data Tactics; Paul Kent, VP Platform R&D, SAS; Luke Lonergan, CTO, Greenplum/EMC; and Richard McDougall, Technical Architect for Big Data, VMware. It was a real treat to have all of these experts together in one panel session.

The audio is available here (free registration required).
