VMware’s AI/ML direction and Hardware Acceleration with vSphere Bitfusion

Machine Learning (ML) workloads are emerging as increasingly important for our customers as the competitive value of predictive modeling becomes manifest. We see these workloads from edge to data center to cloud, depending on a host of variables and customer requirements. CPU-based ML is quite common, and microprocessor vendors continue to enhance their processors with new instructions and data types specifically designed to accelerate ML workloads, extending the reach of CPUs for these workloads. However, there are important cases in which additional, more powerful hardware acceleration is required. This is the realm of hardware accelerators like GPUs, FPGAs, and an increasing number of domain-specific ASICs (DSAs) from an array of startups. One can see this need for additional acceleration acknowledged, for example, in Intel’s acquisition of Habana Labs, a producer of DSAs for both ML training and inference.

Hardware-accelerated ML

The emergence of these hardware accelerators — with GPUs leading the way — has been an enabler of some of the most impressive advances in AI over the last several years. The term Deep Learning has come to refer to the subset of ML in which the models being trained and the data sets used for training are both so large that, practically speaking, hardware accelerators become a requirement for creating and deploying these models.

VMware has long supported vSphere VMs accessing hardware accelerator devices for High Performance Computing and now for ML workloads. Longstanding work by VMware’s Office of the CTO, VMware engineers, and partners has created a rich set of offerings in this space, focused on delivering accelerator access to VMs running on hosts with installed hardware accelerators. In August of 2019, VMware acquired Bitfusion to further enhance our accelerator options and provide even more flexibility to customers.

Bitfusion Device Pooling and Sharing

Unlike our other supported mechanisms, Bitfusion allows Deep Learning applications running anywhere in the data center to consume single, multiple, or fractional GPU resources on other hosts, allowing physical GPUs to be aggregated into a centralized hardware pool. This pooling allows customers to drive up overall utilization of expensive GPU resources by avoiding situations in which under-utilized GPUs lie scattered across an organization, dedicated to separate teams who may not be using their resources optimally.
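To make the pooling model concrete, here is a toy allocator sketch in Python. This is not Bitfusion’s actual implementation — the `GpuPool` class and its methods are hypothetical — but it illustrates the basic idea of a centralized pool handing out whole or fractional shares of remote GPUs to clients on demand:

```python
# Toy illustration of the GPU-pooling idea: a central pool tracks how much
# fractional capacity (0.0-1.0) remains on each GPU and services requests
# for whole or partial GPUs. Hypothetical sketch, not Bitfusion's code.

class GpuPool:
    """Tracks remaining fractional capacity on each pooled GPU."""

    def __init__(self, gpu_ids):
        self.free = {gpu_id: 1.0 for gpu_id in gpu_ids}

    def allocate(self, fraction):
        """Reserve `fraction` of a single GPU; return its id, or None if
        no GPU has enough free capacity for this request."""
        for gpu_id, capacity in self.free.items():
            if capacity + 1e-9 >= fraction:
                self.free[gpu_id] = capacity - fraction
                return gpu_id
        return None

    def release(self, gpu_id, fraction):
        """Return a previously allocated fraction to the pool."""
        self.free[gpu_id] = min(1.0, self.free[gpu_id] + fraction)


pool = GpuPool(["gpu0", "gpu1"])
a = pool.allocate(0.5)   # half of gpu0
b = pool.allocate(0.75)  # too big for gpu0's remainder, lands on gpu1
c = pool.allocate(0.5)   # fits in gpu0's remaining half
print(a, b, c)           # -> gpu0 gpu1 gpu0
```

The point of the sketch is the utilization argument from the paragraph above: with a shared pool, the second and third requests pack onto GPUs that would otherwise sit partially idle if each team owned its own dedicated device.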

A few scenarios might make the value more obvious. Imagine you are a retailer deploying next-generation product scanners that augment barcode scanning with ML-based image classification to combat fraud. Rather than deploying a dedicated GPU for each checkout device, Bitfusion could provide Deep Learning inference for multiple checkout devices by sharing access to a single powerful GPU, while the checkout logic runs across multiple VMs and hosts for fault resilience. Or perhaps you are an educational institution wanting to give students in a class access to fractional GPUs from their assigned generic VMs, which have no local GPUs of their own.

As the leader of VMware’s Machine Learning Program Office, I was on the Bitfusion acquisition team last year and was a strong proponent for bringing the technology into VMware. As VMware Bitfusion nears its first release, I thought I’d share some broad thoughts on how the product will likely grow and morph over time.

Whither Bitfusion?

One theme that will be evident is a deepening integration into vSphere, primarily in the management area. An obvious first step is providing visibility into Bitfusion’s GPU resource allocations via a vCenter plugin. But there are other opportunities. Consider, for example, that Bitfusion has its own resource management capability for mapping user requests for GPU resources to available GPUs. In vSphere, that type of resource management falls under the purview of DRS and so one might expect to see tighter ties between these two components in the future.

While GPUs are currently the most popular Deep Learning hardware acceleration option, both FPGAs and the veritable Cambrian explosion of DSAs that have begun to emerge have the potential to bring significant value to Machine Learning. Happily, Bitfusion is well-positioned to embrace these emerging technologies through tooling it had developed prior to acquisition that helps ease the burden of supporting APIs for new devices. For example, Bitfusion did early work enabling remote access to FPGAs via the Open Programmable Acceleration Engine (OPAE) interface. Similar work was done to enable access to OpenCL-based devices. While the initial release focuses solely on GPU access, one should expect to see additional device enablement over time.

It’s an exciting time for Machine Learning as customers begin to adopt these approaches in earnest. With our acquisition of Bitfusion, VMware has doubled down on providing the most agile, secure, and flexible infrastructure for ML workloads to our customers, whether running on the edge, in the data center, or in the cloud.
