
Virtualizing Low-Latency Trading Platforms with VMware vSphere

For years, financial-services providers have run their trading platforms on physical, purpose-built, high-performance hardware. But as these customers move toward more agile development and operations models, they want to leverage their existing investment in the software-defined data center model already in use throughout their organizations, including VMware Cloud Foundation (VCF).

That’s why VMware’s Office of the CTO and Performance Engineering team collaborated with strategic partners HPE and Xilinx, together with a financial-services customer, to demonstrate the VMware vSphere platform’s ability to host a low-latency trading application previously thought to require a bare-metal solution.

The benchmarking took place at the HPE Customer Innovation Center in Geneva, Switzerland. HPE provided the server infrastructure, based on the HPE ProLiant DL380 Gen10 platform. Connectivity between hosts used the HPE FlexFabric 5950 switch series, Xilinx XtremeScale™ X2522-25G-PLUS Ethernet adapters, and the OpenOnload network stack.

Test 1: Xilinx in a bare-metal environment

We established a baseline for comparison by benchmarking the bare-metal environment. First, we ran the Xilinx performance microbenchmark and found that the average round-trip latency through the switch was 3μs (1.5μs one-way, or 1/2 RTT).
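To make that measurement concrete, below is a minimal sketch of the ping-pong pattern such microbenchmarks use: one side echoes small UDP messages while the other timestamps each send/receive pair and reports the round trip and its half. This is purely illustrative and is not the Xilinx tool; the port, message size, and iteration count are placeholder values. Because OpenOnload accelerates unmodified BSD sockets, the same binary can also be launched under the onload wrapper to exercise the accelerated path.

    /*
     * Minimal UDP ping-pong latency sketch (illustrative only; not the Xilinx
     * microbenchmark used in these tests). Run the echo side on one host and
     * the ping side on the other. Port, message size, and iteration count are
     * placeholders.
     *
     *   host A:  ./pingpong echo
     *   host B:  ./pingpong ping <hostA-ip>
     *
     * OpenOnload accelerates unmodified BSD sockets, so the same binary can be
     * run under the onload wrapper, e.g. "onload ./pingpong ping <hostA-ip>".
     */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <time.h>

    #define PORT      9000      /* placeholder UDP port              */
    #define MSG_SIZE  64        /* small message, typical of a tick  */
    #define ITERS     100000    /* round trips to measure            */

    static double now_us(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec * 1e6 + ts.tv_nsec / 1e3;
    }

    int main(int argc, char **argv)
    {
        char buf[MSG_SIZE] = {0};
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        struct sockaddr_in addr = { .sin_family = AF_INET,
                                    .sin_port   = htons(PORT) };

        if (argc >= 2 && strcmp(argv[1], "echo") == 0) {
            /* Echo side: bind and reflect every datagram back to its sender. */
            addr.sin_addr.s_addr = INADDR_ANY;
            bind(fd, (struct sockaddr *)&addr, sizeof(addr));
            for (;;) {
                struct sockaddr_in peer;
                socklen_t len = sizeof(peer);
                ssize_t n = recvfrom(fd, buf, sizeof(buf), 0,
                                     (struct sockaddr *)&peer, &len);
                if (n > 0)
                    sendto(fd, buf, n, 0, (struct sockaddr *)&peer, len);
            }
        } else if (argc >= 3 && strcmp(argv[1], "ping") == 0) {
            /* Ping side: send, wait for the echo, accumulate round-trip times. */
            inet_pton(AF_INET, argv[2], &addr.sin_addr);
            connect(fd, (struct sockaddr *)&addr, sizeof(addr));
            double min = 1e9, sum = 0.0;
            for (int i = 0; i < ITERS; i++) {
                double t0 = now_us();
                send(fd, buf, sizeof(buf), 0);
                recv(fd, buf, sizeof(buf), 0);
                double rtt = now_us() - t0;
                sum += rtt;
                if (rtt < min)
                    min = rtt;
            }
            printf("min RTT %.2f us (%.2f us 1/2 RTT), avg RTT %.2f us\n",
                   min, min / 2.0, sum / ITERS);
        } else {
            fprintf(stderr, "usage: %s echo | ping <server-ip>\n", argv[0]);
            return 1;
        }
        return 0;
    }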

Next, using the customer’s application-layer microbenchmark tool, we again measured round-trip times between physical hosts. Note that both end-to-end latency and the consistency of that latency (jitter) are important metrics for this type of application.

The test consisted of three multi-hour runs with different message rates and sizes, yielding the following results:


You can see that the minimum round-trip latency through the switch at the application layer was consistently measured at 4μs. Average latencies varied between 4μs and 5μs.

Test 2: vSphere with Xilinx adapters in virtual machines

Next, we installed VMware vSphere and used VM DirectPath I/O technology to pass the Xilinx adapters directly into the guest virtual machines running on the same physical infrastructure. We then re-executed the tests, with the following results:


You can see that both the minimum and average round-trip latencies for the virtual configuration match the bare-metal results. There are, however, some differences within the tail of the latency distribution. Let’s look at this graphically:


Comparing the bare-metal result with the vSphere result shows that the minimum and average latencies are the same. With respect to jitter, vSphere has slightly higher latencies in the intermediate percentiles, while the maximum latency is actually lower in the virtual configuration than on bare metal. In both cases, the latencies were low and well controlled, showing that virtualization is a viable option for this latency-sensitive workload.
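For readers who want to reproduce this kind of jitter comparison, a percentile summary of the recorded samples is the usual tool. The sketch below is a minimal example, assuming round-trip samples in microseconds collected by a harness such as the ping-pong sketch above; it reports the minimum, average, selected percentiles, and maximum.

    /*
     * Summarize a latency distribution for a jitter comparison: sort the
     * recorded round-trip samples, then report min, average, selected
     * percentiles, and max. Assumes samples in microseconds collected by a
     * harness such as the ping-pong sketch shown earlier.
     */
    #include <stdio.h>
    #include <stdlib.h>

    static int cmp_double(const void *a, const void *b)
    {
        double x = *(const double *)a, y = *(const double *)b;
        return (x > y) - (x < y);
    }

    static void report_latency(double *rtt_us, size_t n)
    {
        const double pcts[] = { 50.0, 90.0, 99.0, 99.9, 99.99 };
        double sum = 0.0;

        qsort(rtt_us, n, sizeof(rtt_us[0]), cmp_double);
        for (size_t i = 0; i < n; i++)
            sum += rtt_us[i];

        printf("min %.2f us  avg %.2f us  max %.2f us\n",
               rtt_us[0], sum / n, rtt_us[n - 1]);
        for (size_t i = 0; i < sizeof(pcts) / sizeof(pcts[0]); i++) {
            size_t idx = (size_t)((pcts[i] / 100.0) * (n - 1)); /* index into sorted samples */
            printf("p%-6g %.2f us\n", pcts[i], rtt_us[idx]);
        }
    }

    int main(void)
    {
        /* Tiny synthetic sample set for illustration; in practice, pass the
         * full set of samples from a multi-hour run. */
        double samples[] = { 4.1, 4.0, 4.3, 4.2, 5.9, 4.1, 4.0, 4.4, 4.2, 7.5 };
        report_latency(samples, sizeof(samples) / sizeof(samples[0]));
        return 0;
    }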

Test 3: VCF with Xilinx adapters in virtual machines

Finally, we tested the same virtual-machine configuration with the full VMware Cloud Foundation (VCF) software stack. In this model, VCF provides additional capabilities on top of vSphere, but it also requires more host resources. The results are shown below: tail latencies are slightly higher due to the additional resource consumption, but the impact is acceptable and the maximum latency remains well controlled.


Configuration details

We optimized the hardware and software configurations to help achieve this deterministic performance, including using the following high-level techniques:

Host and Network:

  • The Xilinx XtremeScale adapter was installed in a PCIe slot aligned with a specific processor socket and was configured with the full-feature (ff) firmware (see the NUMA-alignment sketch following Table 1)
  • The network switch was configured with the following features: undo port fec enable, cut-through enable, burst-mode enable
  • The server BIOS was configured with the settings shown in Table 1


Table 1: Host BIOS settings used for all tests
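As noted in the first bullet above, aligning the adapter’s PCIe slot with the processor socket that runs the workload keeps the I/O path local to a single NUMA node. The following sketch, which assumes a Linux host or guest and takes a placeholder PCI address on the command line, reads the NUMA node and local CPU list that the kernel exposes for that device under sysfs, so the alignment can be verified.

    /*
     * Confirm the adapter's PCIe slot / NUMA alignment on Linux: the kernel
     * exposes the NUMA node and local CPU list of each PCI function under
     * sysfs. The PCI address below is a placeholder; substitute the
     * XtremeScale adapter's address as reported by lspci.
     *
     *   ./numa_check 0000:5e:00.0
     */
    #include <stdio.h>

    static void print_attr(const char *bdf, const char *attr)
    {
        char path[256], line[256];
        FILE *f;

        snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/%s", bdf, attr);
        f = fopen(path, "r");
        if (f && fgets(line, sizeof(line), f))
            printf("%-14s %s", attr, line);   /* sysfs values end with '\n' */
        if (f)
            fclose(f);
    }

    int main(int argc, char **argv)
    {
        if (argc != 2) {
            fprintf(stderr, "usage: %s <pci-address>\n", argv[0]);
            return 1;
        }
        /* numa_node should match the socket the latency-sensitive VM or
         * process is pinned to; local_cpulist lists the CPUs on that socket. */
        print_attr(argv[1], "numa_node");
        print_attr(argv[1], "local_cpulist");
        return 0;
    }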

Virtual machine: 

  • The VMs were sized to fit within a single processor socket and configured with proper CPU/NUMA topology
  • VM DirectPath I/O was used to pass the XtremeScale adapter directly to the guest OS (see the sketch after this list)
  • Full CPU and memory reservations were granted to each VM
  • The VM latency sensitivity policy was set to “High”
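As a quick sanity check for the DirectPath I/O item above, the sketch below (assuming a Linux guest) walks the guest’s network interfaces and flags any whose PCI vendor ID matches 0x1924, the Solarflare vendor ID these XtremeScale adapters report; treat that ID as an assumption to confirm in your own environment. With passthrough, the guest sees the physical adapter just as a bare-metal host would.

    /*
     * Verify (in a Linux guest) that the passed-through XtremeScale adapter is
     * visible as a native PCI NIC. With VM DirectPath I/O the guest sees the
     * physical device, so its PCI vendor ID appears under sysfs exactly as it
     * would on bare metal. 0x1924 is the Solarflare vendor ID used by these
     * adapters (verify locally; treated here as an assumption).
     */
    #include <dirent.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        DIR *d = opendir("/sys/class/net");
        struct dirent *e;

        if (!d) {
            perror("/sys/class/net");
            return 1;
        }
        while ((e = readdir(d)) != NULL) {
            char path[512], vendor[32] = "";
            FILE *f;

            if (e->d_name[0] == '.')
                continue;
            snprintf(path, sizeof(path),
                     "/sys/class/net/%s/device/vendor", e->d_name);
            f = fopen(path, "r");       /* absent for purely virtual interfaces */
            if (!f)
                continue;
            if (fgets(vendor, sizeof(vendor), f))
                vendor[strcspn(vendor, "\n")] = '\0';
            fclose(f);
            printf("%-12s vendor %s%s\n", e->d_name, vendor,
                   strcmp(vendor, "0x1924") == 0 ? "  <-- XtremeScale" : "");
        }
        closedir(d);
        return 0;
    }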

Guest OS: 

  • The Guest OS was optimized in alignment with the customer’s application benchmark requirements and best practices. These optimizations were the same in virtual and bare-metal scenarios.

The Results: virtualization is viable for latency-sensitive workloads

These benchmarks show that VMware vSphere, together with the appropriate underlying infrastructure, can deliver latency and jitter very similar to bare metal. We concluded that it is no longer necessary to rely solely on dedicated bare-metal infrastructure, in separate environments with different operational processes, to support this type of workload.

VMware continues to invest in optimizing the vSphere platform for this customer, both to further enhance performance and to demonstrate that full application-stack results satisfy the customer’s KPIs.

To learn more about this demonstration or get more specific details, please reach out to your VMware team.

 

Authors:

This post was jointly authored by the contributors below, members of the HPCML Group within the Office of the CTO’s Advanced Technology Group. The group focuses on enabling important emerging workloads, primarily high-performance computing and machine learning, by working collaboratively with VMware R&D, partners, and customers. It explores the use of virtualization for these workloads, creates technical proof points and whitepapers, and works with customers and partners to demonstrate the value of the VMware virtual platform for these workloads.

  Mark Achtemichuk currently works as a Staff 2 Engineer within VMware’s Engineering Services (VES) Performance team, focusing on education, benchmarking, collateral, and performance architectures. He is recognized as an industry expert and holds a VMware Certified Design Expert (VCDX#50) certification. He has also held various performance-focused field, specialist, and technical marketing positions within VMware over the last 10 years. His experience and expertise, from infrastructure to application, help customers ensure that performance is no longer a perceived or real barrier to virtualizing and operating an organization’s software-defined assets.
  Michael Cui is a Senior Member of Technical Staff in the VMware Office of the CTO, focusing on virtualizing high-performance computing. His expertise spans broadly across distributed systems and parallel computing. His daily work includes integrating various software and hardware solutions, conducting proof-of-concept studies, performance testing and tuning, and publishing technical papers. Michael serves on Hyperion’s HPC Advisory Panel and participates in paper reviewing in several international conferences and journals, such as IPCCC, TC, and TSC. He holds both a PhD and a Master’s degree in computer science from the University of Pittsburgh.
Josh Simons leads an effort within VMware’s Office of the CTO to bring the full value of virtualization to HPC. With over 20 years of experience in high-performance computing, he was a Distinguished Engineer at Sun Microsystems with broad responsibilities for HPC direction and strategy. Josh has worked on developer tools for distributed parallel computing, including language and compiler design, scalable parallel debugger design and development, and MPI. Josh has an undergraduate degree in Engineering from Harvard College and a Master’s in Computer Science from Harvard University. He has served as a member of the OpenMP ARB Board of Directors since 2002.