AMD is excited to team up with VMware on Project Radium! It is a giant step towards helping enterprises get the most from their GPU investments. The combination enables pooling and sharing of GPU resources in virtualized environments. Project Radium helps make all AI/ML accelerator resources available for users by providing a virtualization and management layer, giving users the ability to choose and pool the right GPU resource for the task at hand. AMD is proud to share that Project Radium will support the newly announced AMD Instinct™ MI200 series accelerators.
A closer look at the AMD Instinct MI200 accelerators
The AMD Instinct accelerators are built on AMD CDNA™ architecture. This purpose-built and optimized architecture is designed to do one thing extremely well: run compute-intensive workloads. The latest AMD CDNA 2 architecture offers exceptional performance for artificial intelligence/machine learning (AI/ML) workloads, as well as high-performance computing (HPC) applications. It improves accelerator performance and scaling by leveraging the unique AMD Infinity Fabric™ to increase the number of high-speed links between the Graphics Compute Dies within the accelerator, as well as between accelerators in a compute node. In addition, the AMD CDNA 2 architecture improves on the Matrix Core Technology introduced in the previous generation for FP32, FP16, and INT8 matrix math by extending the technology to FP64 and BF16. As a result of these and other features of the AMD CDNA 2 architecture, the highest-end AMD accelerator supported on Radium, the AMD Instinct MI250X, comes with 220 Compute Units and 128GB of HBM2e memory (to support large models), and delivers a theoretical peak of 95.7 TFLOPS of high-precision Matrix FP32, 383 TFLOPS of Matrix FP16 and BF16, and 383 TOPS of INT4/INT8 performance. Visit AMD.com to learn more about the AMD CDNA 2 architecture.
AMD ROCm™: An ecosystem without borders
AMD ROCm is the open software platform allowing users to tap into the power of AMD Instinct accelerators to drive insight and discovery. The ROCm platform is built on the foundation of open portability, supporting environments across multiple accelerator vendors and architectures. With AMD ROCm, developers can take advantage of open compute languages, compilers, libraries, and tools designed to accelerate code development and solve today’s toughest challenges.
Within the AI ecosystem, all the major ML frameworks have ROCm-supported binaries that are fully upstreamed. That means there is no converting, porting, or recompiling of the frameworks needed for users to enable AMD Instinct accelerators. Additionally, tools, guidance, and insights are shared freely across the ROCm GitHub community and forums. Read more about the AMD ROCm platform on AMD.com.
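As a small illustration of what "no converting or porting" can look like in practice, the sketch below checks which GPU backend a PyTorch install is using. It relies on the fact that ROCm builds of PyTorch expose AMD GPUs through the same `torch.cuda` interface and report a HIP version in `torch.version.hip`. The helper name `describe_accelerator` is hypothetical, and the import is guarded so the snippet also runs on a machine without PyTorch.

```python
def describe_accelerator():
    """Return a short description of the GPU backend PyTorch can see."""
    try:
        import torch  # the same import works for ROCm and CUDA builds
    except ImportError:
        return "PyTorch is not installed"

    if not torch.cuda.is_available():
        return "No GPU visible; falling back to CPU"

    # On ROCm builds torch.version.hip is a version string; otherwise it is None.
    backend = "ROCm/HIP" if getattr(torch.version, "hip", None) else "CUDA"
    return f"{backend} device: {torch.cuda.get_device_name(0)}"


print(describe_accelerator())
```

Model code that selects its device with `torch.device("cuda")` would typically run unchanged on either backend, which is the point the paragraph above makes.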
Accelerating enterprise AI/ML with Project Radium
The accelerator-agnostic approach of Project Radium allows customers to quickly enable and take advantage of new hardware architectures, like the AMD Instinct MI200 accelerators, without changing existing software. Through virtualization, workloads dynamically attach to accelerators over a standard network (10GbE and above) and require no code or workflow changes. With the latest AMD accelerators supporting the major frameworks, including TensorFlow, PyTorch and ONNX Runtime, data scientists and engineers can remain focused on models without having to worry about compilers, vendor drivers, or tuning each model for a different device. As a result, for the first time, users can quickly integrate new, high-performance AI accelerators, taking advantage of the latest technology to work with larger, more complex data sets and to reduce the time needed to train models and discover new insights. For example, in knowledge distillation, where the predictions of a large, complex teacher model are distilled into a smaller model, training on sample datasets with TensorFlow ran 9.9X faster on the AMD Instinct MI100 GPU than on a single 64-core CPU.* This combination of VMware Project Radium and AMD Instinct accelerators is a compelling solution for enterprises to accelerate their AI/ML workloads, maximizing the use of these new, powerful accelerators.
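For readers unfamiliar with knowledge distillation, here is a minimal sketch of the loss a student model is commonly trained on, using only the Python standard library. It assumes the widely used formulation in which teacher and student logits are softened with a temperature and compared by KL divergence scaled by the temperature squared; this is not necessarily what the benchmark scripts used. The function names, logits, and temperature value are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened outputs, scaled by temperature squared."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Illustrative logits for one sample with three classes (assumed values).
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]  # roughly agrees with the teacher
far_student = [0.5, 1.0, 4.0]    # disagrees with the teacher

loss_close = distillation_loss(teacher, close_student)
loss_far = distillation_loss(teacher, far_student)
print(f"close student loss: {loss_close:.4f}")
print(f"far student loss:   {loss_far:.4f}")
```

The student that mimics the teacher's predictions gets the lower loss. Evaluating this loss over many samples requires teacher and student forward passes for every batch, which is the compute-heavy part that benefits from an accelerator.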
* VMware tested remote performance with standard 10GbE without RDMA enabled. Baseline is local (non-remoted) CPU-only training throughput with AMD EPYC 7742 64-Core with 256GB DDR system, TensorFlow 2.4, default scripts for Keras, knowledge distillation examples. AMD Instinct MI100 testing completed using remote access to the AMD Accelerator Cloud. AMD has not independently reviewed or verified these results.