Virtualizing Big Data
Analysis of large-scale, often unstructured data is becoming increasingly important within both the Enterprise and the HPC community. This is perhaps one of the clearest areas of convergence between HPC and Enterprise requirements, because the tools and algorithmic approaches required are often the same or very similar. I imagine, for example, that the large-scale, graph-oriented “social network” analyses done by companies like Facebook are quite similar to the “anti-social network” analyses done by Homeland Security and the Intelligence community.
Unsurprisingly, many VMware customers are interested in running Big Data workloads and are looking for guidance about how best to do this in a virtual environment. To help, we have published a whitepaper that examines Hadoop performance using local storage in a vSphere environment, the first in what will eventually be a series of whitepapers in this area. The current paper is available here.
For those interested in a broader discussion of Big Data, NoSQL databases, and virtualization, I recommend an audio recording of the Big Data panel that was held at VMworld in Las Vegas. Our panelists were luminaries from across the Big Data space: Amr Awadallah, CTO, Cloudera; Clint Green, Principal Engineer, Data Tactics; Paul Kent, VP Platform R&D, SAS; Luke Lonergan, CTO, Greenplum/EMC; and Richard McDougall, Technical Architect for Big Data, VMware. It was a real treat to have all of these experts together in one panel session.
The audio is available here (free registration required).