For years, financial-service providers’ trading platforms have run on physical, purpose-built, high-performance hardware. But as customers explore moving to more agile development and operations models, they want to leverage their existing investments in the software-defined datacenter model already in use throughout their organizations, including VMware Cloud Foundation (VCF).
That’s why VMware’s Office of the CTO and Performance Engineering team collaborated with strategic partners HPE and Xilinx and a financial-services customer to demonstrate the VMware vSphere platform’s ability to host a low-latency trading application previously thought to require a bare-metal solution.
The benchmarking occurred in the HPE Customer Innovation Center in Geneva, Switzerland. HPE provided the server infrastructure based on the HPE ProLiant DL380 Gen10 platform. Connectivity between hosts used the HPE FlexFabric 5950 switch series, Xilinx XtremeScale™ X2522-25G-PLUS Ethernet adapters, and the OpenOnload network stack.
Test 1: Xilinx in a bare-metal environment
We established a baseline for comparison by benchmarking a bare-metal environment. First, we ran the Xilinx performance microbenchmark and found the average round-trip latency through the switch was 3μs (1.5μs one-way, or ½ RTT).
Next, using the customer’s application-layer microbenchmark tool, we again measured round-trip times between physical hosts. Note that both end-to-end latency and the consistency of that latency (jitter) are important metrics for this application type.
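Both metrics fall out of the same set of round-trip samples. As a rough sketch (not the customer's tool, and with illustrative rather than measured values), the kind of summary these tables report, including the tail percentiles where jitter shows up, can be computed like this:

```python
# Sketch: summarize round-trip latency samples (in microseconds) into the
# statistics a latency benchmark typically reports: min, mean, tail
# percentiles, and max. Sample values here are illustrative, not measured.

def percentile(samples, p):
    """Nearest-rank percentile of a list of latency samples."""
    s = sorted(samples)
    k = max(0, min(len(s) - 1, round(p / 100 * (len(s) - 1))))
    return s[k]

def summarize(rtt_us):
    """Return the latency statistics for one benchmark run."""
    return {
        "min": min(rtt_us),
        "mean": sum(rtt_us) / len(rtt_us),
        "p99": percentile(rtt_us, 99),
        "p99.9": percentile(rtt_us, 99.9),
        "max": max(rtt_us),
    }

if __name__ == "__main__":
    samples = [4.0, 4.1, 4.2, 4.0, 4.3, 5.0, 4.1, 4.2, 6.5, 4.0]
    stats = summarize(samples)
    # Jitter appears as the gap between the tail percentiles and the minimum:
    # a tight distribution keeps p99/p99.9 close to the min.
    print(stats)
```

Comparing runs at the p99 and p99.9 points, rather than only at the mean, is what distinguishes the bare-metal and virtual configurations in the results that follow.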
The test consisted of three multi-hour runs at different message rates and sizes, with the following results:
You can see that the minimum round-trip latency through the switch at the application layer was consistently measured at 4μs. Average latencies varied between 4μs and 5μs.
Test 2: vSphere with Xilinx adapters in virtual machines
Next, we installed VMware vSphere and used VM DirectPath I/O technology to pass the Xilinx adapters directly into the guest virtual machines running on the same physical infrastructure. We then re-executed the tests, with the following results:
You can see that both the minimum and average round-trip latencies for the virtual configuration match the bare-metal results. There are, however, some differences within the tail of the latency distribution. Let’s look at this graphically:
Comparing the bare-metal result with the vSphere result shows that the minimum and average latencies are the same. With respect to jitter, vSphere has slightly higher latencies in the intermediate percentiles, while the maximum latency is actually lower in virtual than in bare metal. Generally, the performance in both cases demonstrated both low and well-controlled latencies, showing that virtualization is a viable option for this latency-sensitive workload.
Test 3: VCF with Xilinx adapters in virtual machines
Finally, we tested with the same virtual-machine configuration using the full VMware Cloud Foundation (VCF) software stack. In this model, VCF provides additional capabilities on top of vSphere, but it does require more host resources. The results are shown below. The tail latencies are slightly higher with the additional resource consumption, although the impact is acceptable, and maximum latency remains well controlled.
We optimized the hardware and software configurations to help achieve this deterministic performance, including using the following high-level techniques:
Host and Network:
- The Xilinx XtremeScale adapter was installed in a PCIe slot aligned with a specific processor socket and configured with the full-feature (ff) firmware
- The network switch was configured with the following features: undo port fec enable, cut-through enable, burst-mode enable
- The Server BIOS was configured with the settings shown in Table 1
Table 1: Host BIOS settings used for all tests
- The VMs were sized to fit within a single processor socket and configured with proper CPU/NUMA topology
- DirectPath I/O was used to pass the XtremeScale adapter directly to the guest OS
- Full CPU and memory reservations were granted to each VM
- The VM latency sensitivity policy was set to “High”
- The guest OS was optimized in alignment with the customer’s application benchmark requirements and best practices; these optimizations were identical in the virtual and bare-metal scenarios
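Several of the VM-level settings above correspond to per-VM configuration (.vmx) entries. The fragment below is a hypothetical illustration of that mapping; the reservation values and PCI address are placeholders, not values from this test:

```
# Hypothetical .vmx fragment illustrating the VM tuning above (placeholder values).
sched.cpu.latencySensitivity = "high"   # latency-sensitivity policy set to High
sched.cpu.min = "24000"                 # full CPU reservation, in MHz (placeholder)
sched.mem.min = "65536"                 # full memory reservation, in MB (placeholder)
sched.mem.pin = "TRUE"                  # pin guest memory to the host
pciPassthru0.present = "TRUE"           # DirectPath I/O for the XtremeScale adapter
pciPassthru0.id = "..."                 # PCI address of the adapter (host-specific)
```

In practice these settings are typically applied through the vSphere Client or APIs rather than by editing the .vmx file directly; the High latency-sensitivity policy requires the full CPU and memory reservations shown.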
The Results: virtualization is viable for latency-sensitive workloads
These benchmarks show that VMware vSphere, together with the appropriate underlying infrastructure, can achieve latency and jitter very similar to bare metal. We concluded that it is no longer necessary to rely solely on dedicated bare-metal infrastructure, in separate environments with different operational processes, to support this type of workload.
VMware continues to invest in optimizing the vSphere platform to further enhance performance and to demonstrate that full application-stack results satisfy this customer’s KPIs.
To learn more about this demonstration or get more specific details, please reach out to your VMware team.
This post was jointly authored by the contributors below, members of the HPCML Group, part of the Office of the CTO’s Advanced Technology Group. The group focuses on enabling important emerging workloads, primarily High-Performance Computing and Machine Learning, by working collaboratively with VMware R&D, partners, and customers. It explores the use of virtualization for these workloads, creating technical proof points and whitepapers, and working with customers and partners to demonstrate the value of the VMware virtual platform.