I am excited that we have now started our joint HPC exploration with our partner, AMAX. Based on an initial meeting on the show floor at VMworld in San Francisco last year, we decided to work together to examine several aspects of virtualized HPC of mutual interest. Areas where we see converging requirements between HPC and Enterprise customers are of particular interest both to VMware, as an Enterprise software company looking at broader markets, and to AMAX, as a dynamic computing solutions provider to HPC, Enterprise, and now Cloud customers.
We are starting with Hadoop since scale-out data analytics is rapidly becoming an important workload in the Enterprise, while Data Intensive Computing is simultaneously rising in importance within the HPC community. We are in the process of installing both RHEL and ESX on the cluster and have already done some initial Terabyte Sort (TeraSort) runs to help tune and calibrate the system and to verify our configuration before starting more formal benchmarking.
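For readers who want to run the same kind of sanity check, the sketch below shows roughly how a Terabyte Sort run can be driven from the standard Hadoop examples jar. It is a minimal outline only; the jar path, HDFS directories, and row count are illustrative assumptions, not our actual benchmark settings.

```python
#!/usr/bin/env python
# Minimal sketch of a Terabyte Sort (TeraSort) sanity run, assuming a working
# Hadoop installation with the standard examples jar on the cluster. The paths
# and sizes below are placeholders for illustration only.
import subprocess

HADOOP = "hadoop"                                      # assumes 'hadoop' is on PATH
EXAMPLES_JAR = "/usr/lib/hadoop/hadoop-examples.jar"   # hypothetical jar location
ROWS = 10 * 1000 * 1000 * 1000                         # 10 billion 100-byte rows ~= 1 TB
IN_DIR, OUT_DIR, REPORT_DIR = "/terasort/in", "/terasort/out", "/terasort/validate"

def run(*args):
    """Run a hadoop command and fail loudly if it returns non-zero."""
    cmd = [HADOOP] + list(args)
    print("Running:", " ".join(cmd))
    subprocess.check_call(cmd)

# 1. Generate the input data set with TeraGen.
run("jar", EXAMPLES_JAR, "teragen", str(ROWS), IN_DIR)

# 2. Sort it with TeraSort; the elapsed time of this phase is what gets compared.
run("jar", EXAMPLES_JAR, "terasort", IN_DIR, OUT_DIR)

# 3. Verify that the output is globally sorted with TeraValidate.
run("jar", EXAMPLES_JAR, "teravalidate", OUT_DIR, REPORT_DIR)
```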
The AMAX hardware we are using for our collaboration is physically at AMAX’s facility in Fremont, CA. It is an eight-node cluster (AMAX ClusterMax architecture) with two 2.66 GHz Xeon X5650 processors, 96 GB of 1333 MHz DDR3 memory, and twelve 500 GB SATA II disks per node. The nodes are connected with Mellanox dual DDR InfiniBand / 10 GbE HCAs and Voltaire IB and 10 GbE switches. All of the networking gear has been loaned to us by Mellanox.
We defined our cluster node configuration based on the hardware recommendations for Hadoop published by Cloudera, specifying systems that are a combination of their Storage Heavy and Compute Intensive configurations, enabling us to include a wide variety of benchmarks in our analysis.
InfiniBand and 10 GbE are not typical interconnects for Hadoop clusters and have been included to support other important aspects of our virtualized HPC testing. In particular, we are very interested in experimenting with both RoCE and InfiniBand to understand how well they perform in a virtualized HPC environment. In fact, we have a PhD student from the Georgia Institute of Technology arriving next month for a summer internship who will focus on this aspect of the analysis. While RDMA is traditionally considered an HPC-only interconnect technology, the recent emergence of latency-sensitive, scale-out Enterprise frameworks (e.g., GemFire and memcached) presages the expanding applicability of these high-bandwidth, low-latency interconnects.
The final area we’d like to explore is GPU use from a virtualized environment since GPGPU is such an active area of interest within the HPC community.
I’ll post more entries about this collaboration as we start to generate results and best practices based on our work.