I just got back from the 37th IDC HPC User Forum meeting in Seattle, where I had been invited to participate on an HPC Cloud panel. The meeting was, as usual, a good event that brought together users and vendors for presentations and informal discussions on a variety of topics. There were a few presentations I found particularly interesting, which I'll cover in separate posts; this post focuses on the Cloud panel.

Our HPC Cloud panel, which was well organized by Sharan Kalwani from KAUST, focused on three issues: security, data transfer, and performance. I'll briefly summarize my comments on each topic below.

Having just returned from VMworld 2010 in San Francisco and being a newcomer to the enterprise software space (just six months at VMware), I shared my surprise at how much larger the VMworld conference was than Supercomputing — almost twice as big, in fact. I mentioned this to emphasize that both the Enterprise customer base and the size of vendor investments in that base dwarf those of the HPC community.

It is precisely these size disparities that ensure Cloud Computing will be defined by and driven by Enterprise IT requirements. It would be a mistake for “HPC Cloud” to be anything other than a use of “Enterprise Cloud” for HPC workloads. To do otherwise would be to lose the benefit of leveraging the large Cloud investments currently underway for the Enterprise. Work will of course still be required to ensure that these Cloud infrastructures are capable of adequately supporting HPC workloads.

Since this was an HPC audience, I explained that “virtualization” goes well beyond just the hypervisor — that it involves virtualization of the entire datacenter, including virtual machines, management and provisioning, network, storage, and security. With respect to security, I described the shift we are seeing from physical security devices to a virtualized security infrastructure that can offer a “defense in depth” approach for Cloud security — protection at the VM, Application, and Virtual Datacenter level. See Allwyn’s post here for more details.
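To make the "defense in depth" idea a bit more concrete, here is a minimal conceptual sketch (deliberately generic, not any vendor's actual API) in which a traffic flow must be permitted by policy at each of three nested scopes: the virtual datacenter perimeter, the application tier, and the individual VM. The VM names, ports, and rules are all made up for illustration.

```python
# Conceptual sketch of "defense in depth" policy layering for a virtualized
# datacenter. The layers, names, and rules are illustrative only, not any
# real product's API.
from dataclasses import dataclass

@dataclass
class Flow:
    src: str    # source ("internet" or a VM name)
    dst: str    # destination VM name
    port: int   # destination TCP port

def vdc_layer(flow: Flow) -> bool:
    # Virtual datacenter perimeter: external sources may only reach
    # web and ssh ports; VM-to-VM traffic is judged by the inner layers.
    return flow.src != "internet" or flow.port in (22, 80, 443)

def app_layer(flow: Flow) -> bool:
    # Application tier: only the web tier may talk to the database tier.
    return flow.dst != "db-vm" or flow.src == "web-vm"

def vm_layer(flow: Flow) -> bool:
    # Per-VM rule: the database VM accepts only its own service port.
    return flow.dst != "db-vm" or flow.port == 5432

LAYERS = (vdc_layer, app_layer, vm_layer)

def permitted(flow: Flow) -> bool:
    """A flow is allowed only if every layer allows it."""
    return all(layer(flow) for layer in LAYERS)

print(permitted(Flow("web-vm", "db-vm", 5432)))    # True: passes all layers
print(permitted(Flow("internet", "db-vm", 5432)))  # False: stopped at the perimeter
```

The point of the layering is that a misconfiguration or compromise at one level does not expose everything behind it; each scope enforces its own, narrower policy.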

The panel talked about two aspects of data as related to the cloud: the difficulty of moving large amounts of data into or out of the cloud, and the expense of doing so under current pricing models (with EC2 as the obvious example). There was general consensus among panel members that Amazon's model is just one model — others will arise as the market demands. As for data volume, there were several common themes, among them the use of physical media for data transfer (e.g., Amazon's Import/Export service) and the co-location or peering of data-generating sources with large cloud infrastructures. There are also cases in which large community datasets (e.g., genetic sequence data) have been moved into Amazon S3 and made available for shared use by customers. I pointed out that this data transfer issue isn't a Cloud problem per se. Rather, it is a remote computing problem, and the HPC community arguably has the most experience with it, given that we've been running large, national shared supercomputing centers for almost three decades, serving a widely distributed user base with a huge range of dataset sizes.
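As a concrete illustration of the shared-dataset approach, here is a minimal sketch using boto3, the AWS SDK for Python, to browse and fetch a single object from a public S3 bucket rather than moving an entire dataset over the wide-area network. The bucket and key names are hypothetical stand-ins for a real community dataset.

```python
# Minimal sketch: reading from a public community dataset hosted in Amazon S3.
# The bucket and key names below are hypothetical examples.
import boto3
from botocore import UNSIGNED
from botocore.config import Config

# An unsigned config lets us read a public bucket without AWS credentials.
s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

BUCKET = "example-public-genomes"   # hypothetical public bucket
KEY = "reference/chr1.fa.gz"        # hypothetical object key

# List a few objects to see what the dataset contains...
resp = s3.list_objects_v2(Bucket=BUCKET, Prefix="reference/", MaxKeys=5)
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])

# ...then download only the piece we need, next to the compute that will use it.
s3.download_file(BUCKET, KEY, "chr1.fa.gz")
```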

The performance discussion started with the supposition that cloud adoption is forcing HPC users to the x86 model. That hypothesis was fairly uniformly rejected, since it is clear that x86 has been broadly adopted within the HPC community for reasons having nothing to do with Cloud. Some, in fact, argued that attempting to create non-x86 clouds for HPC use would pull the community away from the cloud mainstream, to its detriment.

I took the opportunity to briefly outline the benefits of using virtualization for HPC — primarily application resiliency and much more dynamic and efficient resource utilization. I also touched on performance — specifically, the fact that much straight-line computation now runs at essentially full native speed thanks to hardware and software virtualization advances — while acknowledging challenges related to performance accelerators. In this case, by "accelerator" I was referring to both InfiniBand and GPGPU, since direct access to these devices is required to deliver accelerated performance for HPC workloads. While we do have the capability to punch devices like these through the VM abstraction and give the guest OS direct access to the hardware, the open issue is how to offer that capability while still allowing virtual machines to be live-migrated from one physical machine to another, since live migration is key to delivering some of the resilience and dynamic resource management benefits of virtualization.
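For readers curious what "punching a device through the VM abstraction" looks like in practice, below is a hedged sketch using the libvirt Python bindings on KVM (a different stack than the one I work on, but the PCI passthrough mechanism is generic). The domain name and PCI address are placeholders; note that once the device is attached, the VM generally cannot be live-migrated until it is detached again, which is exactly the tension described above.

```python
# Hedged sketch: hot-adding a host PCI device (e.g. an InfiniBand HCA or a GPU)
# to a running VM using the libvirt Python bindings on KVM. This illustrates
# the generic passthrough mechanism, not VMware's implementation; the domain
# name and PCI bus/slot/function are made-up placeholders.
import libvirt

# XML fragment describing the host PCI device to hand to the guest.
HOSTDEV_XML = """
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x41' slot='0x00' function='0x0'/>
  </source>
</hostdev>
"""

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("hpc-node-01")   # hypothetical VM name

# Attach the device to the live guest. With the device attached, the
# hypervisor will typically refuse to live-migrate this VM -- the trade-off
# between raw device access and dynamic resource management noted above.
dom.attachDeviceFlags(HOSTDEV_XML, libvirt.VIR_DOMAIN_AFFECT_LIVE)

conn.close()
```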