The continual increase in the quantity of data that systems must process places a burden on traditional system architectures and creates performance bottlenecks. In this blog post, I’ll explain how computational storage technology can help existing servers gain performance and energy efficiency, and open new technical and business opportunities at the edge.
What is computational storage?
Computational storage is an architecture that embeds compute functions within storage devices. Computational storage devices (CSx) extend traditional storage devices with compute functionality that can improve application performance and drive infrastructure efficiency. They reduce data movement between storage and the main CPU while augmenting the overall compute capability of a given system. Similar to the advantages an onboard CPU gives SmartNICs, CSx bring low-latency, efficient processing of vast amounts of data to existing servers.
CSx are part of a broader industry trend in system design focused on moving compute to where the data resides. We see this pattern at a macro scale in the evolution of edge computing, and at the device level in the rise of SmartNICs, which move computation closer to where network packets arrive. All of these patterns share the same goal: moving compute to where data is stored or generated, rather than always moving the data to a centralized compute location. CSx let us process data where it is stored instead of moving it across the system bus to the host CPU, reducing the energy spent on massive data transfers and increasing the parallelism of bulk data-processing operations.
There are a number of reasons these devices make sense and are starting to gain traction in today’s application and infrastructure designs:
- They offload some processing from the host CPU onto the storage devices. Each computational storage drive (CSD) may have its own CPU; for example, a four-core ARM A72 with 6GB of RAM per device. Other types of CSx carry a field-programmable gate array (FPGA) and perform specialized acceleration functions, such as deep data scanning for database queries. Both general-purpose onboard CPUs and fixed-function FPGA accelerators have important uses, and we’re likely to see more CSx blending both capabilities in the future. Offloading the main CPU allows real-time workloads and data-intensive analytics applications to run side by side on the same system.
- They reduce data movement, because data can be processed where it resides: on the storage itself. A system with 24 onboard CSx could gain up to 24GB/sec of additional I/O bandwidth, enabling intensive data processing without placing high-bandwidth transfers on the system PCIe bus.
- They add more compute to smaller server systems. At the edge, this means a 2U server with up to 24 CPU-enhanced NVMe (Non-Volatile Memory Express) solid-state drives (SSDs) that process data in parallel and operate collectively on up to 1PB of data. Such a system can add 96GHz of ARM CPU capability at the storage layer to augment the approximately 32GHz provided by the main x86 CPU.
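The figures in the list above can be checked with a little arithmetic. The sketch below is a back-of-the-envelope calculation, not a benchmark: the per-device numbers (1GB/sec of I/O per drive and roughly 1GHz per ARM core) are assumptions derived by dividing the post’s totals by 24 drives, and summing clock speeds across cores and architectures is only a rough proxy for usable compute.

```python
# Back-of-the-envelope math for the 24-drive, 2U edge server described above.
# ASSUMPTION: per-device figures are derived by dividing the post's totals by 24;
# they are not published specs for any particular drive.

NUM_DEVICES = 24
CORES_PER_DEVICE = 4       # four-core ARM A72 per drive (from the post)
GHZ_PER_CORE = 1.0         # assumed: 96GHz / (24 drives * 4 cores)
GBPS_PER_DEVICE = 1.0      # assumed: 24GB/sec / 24 drives
HOST_X86_GHZ = 32.0        # approximate host x86 capacity (from the post)


def csx_capacity(num_devices: int) -> dict:
    """Estimate the extra ARM compute and I/O bandwidth a CSx layer contributes."""
    arm_ghz = num_devices * CORES_PER_DEVICE * GHZ_PER_CORE
    io_gbps = num_devices * GBPS_PER_DEVICE
    total_ghz = arm_ghz + HOST_X86_GHZ
    return {
        "arm_ghz": arm_ghz,
        "io_gbps": io_gbps,
        "total_ghz": total_ghz,
        # Ratio of (host + storage) GHz to host-only GHz; a crude proxy, not a speedup.
        "ghz_multiplier": total_ghz / HOST_X86_GHZ,
    }


if __name__ == "__main__":
    print(csx_capacity(NUM_DEVICES))
```

With 24 drives, this yields 96GHz of ARM capacity and 24GB/sec of I/O, four times the nominal GHz of the 32GHz host CPU alone.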
The following diagram illustrates the way these factors compare to the “traditional” architecture.
Why computational storage makes sense for the edge
As mentioned earlier, CSx are becoming an important part of system architecture because rapidly growing volumes of data must be processed faster, more energy-efficiently, and in smaller form factors. Today’s edge systems can gather vast amounts of data but struggle to perform the needed analytics where that data resides without the benefit of larger systems and compute clusters.
Instead of moving the data to a central location for processing, we move the processor to the data! By processing raw data locally, CSx (SSDs with embedded compute, like those provided by NGD Systems) reduce the amount of data that must be moved around and processed by the main CPU. Pushing analytics and raw data processing down to the storage layer frees up the main CPU for other essential and real-time tasks.
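To make the “move the processor to the data” idea concrete, here is a toy Python model of filter pushdown. The `SimulatedCSD` class, the record layout, and the 16-byte record size are all hypothetical; no vendor SDK is involved. It compares how many bytes cross the bus when the host filters everything itself versus when each drive filters its own shard and returns only the matches.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# HYPOTHETICAL record layout and size, chosen only for illustration.
Record = Tuple[int, float]   # (sensor_id, reading)
RECORD_BYTES = 16            # assumed on-the-wire size of one record
Predicate = Callable[[Record], bool]


@dataclass
class SimulatedCSD:
    """Toy stand-in for one computational storage drive holding a shard of records."""
    records: List[Record]

    def read_all(self) -> List[Record]:
        # Traditional path: the drive returns every record to the host.
        return list(self.records)

    def scan(self, predicate: Predicate) -> List[Record]:
        # CSx path: the drive's onboard CPU filters locally, returning only matches.
        return [r for r in self.records if predicate(r)]


def host_side_filter(drives: List[SimulatedCSD], predicate: Predicate) -> Tuple[List[Record], int]:
    """Move all data to the host CPU, then filter; returns (matches, bytes moved)."""
    matches: List[Record] = []
    bytes_moved = 0
    for drive in drives:
        batch = drive.read_all()
        bytes_moved += len(batch) * RECORD_BYTES
        matches.extend(r for r in batch if predicate(r))
    return matches, bytes_moved


def device_side_filter(drives: List[SimulatedCSD], predicate: Predicate) -> Tuple[List[Record], int]:
    """Push the filter down to each drive; only matching records cross the bus."""
    matches: List[Record] = []
    bytes_moved = 0
    for drive in drives:
        batch = drive.scan(predicate)
        bytes_moved += len(batch) * RECORD_BYTES
        matches.extend(batch)
    return matches, bytes_moved


def is_hot(record: Record) -> bool:
    # Select only the rare high readings.
    return record[1] >= 99.0


if __name__ == "__main__":
    # Four drives, each holding 1,000 readings cycling through 0..99.
    drives = [SimulatedCSD([(s, float(i % 100)) for i in range(1000)]) for s in range(4)]
    host_rows, host_bytes = host_side_filter(drives, is_hot)
    dev_rows, dev_bytes = device_side_filter(drives, is_hot)
    print(len(host_rows), host_bytes)  # 40 64000
    print(len(dev_rows), dev_bytes)    # 40 640
```

In this example both paths return the same 40 records, but the device-side path moves 640 bytes instead of 64,000. The savings in a real system depend on how selective the query is.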
How are CSx being used today?
Many basic use cases, some as simple as offloading compression to the devices, are available today. More advanced applications include running databases, machine-learning (ML) models, and other forms of analytics directly on storage. Running a full Linux OS on the device, for example, allows AI/ML and other advanced applications to execute on the drive itself. We at VMware have started exploratory work to run ESXi on ARM directly on NVMe storage devices. Perhaps in the future we’ll be able to run a vSAN cluster spanning NVMe devices.
One use case we are currently investigating with NGD Systems is running a parallel database (such as VMware’s Greenplum) directly on the storage layer. This offloads the main CPU and allows both traditional and ML inference-driven queries to execute on the storage layer. Greenplum shards data across nodes to minimize cross-traffic and distributes each query across all data segments for highly parallel performance. For more information on our progress, check out this awesome VMworld session recording.
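The shard-then-scatter pattern described above can be sketched as a toy model. This is not Greenplum’s implementation or API; it is a hypothetical illustration that treats each CSx as a database segment. Rows are hash-distributed by key (here with `zlib.crc32`, an arbitrary choice), each segment computes a partial aggregate over its local shard, and only the small partial results travel back to the host to be merged. Threads stand in for independent drives; in CPython they show the scatter/gather shape, not a real speedup.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# HYPOTHETICAL row layout: (customer_id, order_amount).
Row = Tuple[str, float]
Partial = Dict[str, Tuple[float, int]]  # key -> (sum, count)


def shard_rows(rows: List[Row], num_segments: int) -> List[List[Row]]:
    """Hash-distribute rows by key so all rows for a key land on the same segment."""
    segments: List[List[Row]] = [[] for _ in range(num_segments)]
    for key, amount in rows:
        segments[zlib.crc32(key.encode()) % num_segments].append((key, amount))
    return segments


def segment_partial(segment: List[Row]) -> Partial:
    """Runs 'on the drive': per-key (sum, count) over the local shard only."""
    partial: Partial = {}
    for key, amount in segment:
        total, count = partial.get(key, (0.0, 0))
        partial[key] = (total + amount, count + 1)
    return partial


def distributed_average(segments: List[List[Row]]) -> Dict[str, float]:
    """Scatter the aggregate to every segment, then merge the small partials on the host."""
    # Threads stand in for independent drives working at the same time.
    with ThreadPoolExecutor(max_workers=max(1, len(segments))) as pool:
        partials = list(pool.map(segment_partial, segments))
    # The merge is general: it would also work if a key were split across segments.
    merged: Dict[str, List[float]] = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for key, (total, count) in partial.items():
            merged[key][0] += total
            merged[key][1] += count
    return {key: total / count for key, (total, count) in merged.items()}


if __name__ == "__main__":
    rows = [("a", 10.0), ("b", 20.0), ("a", 30.0), ("c", 5.0), ("b", 40.0)]
    segments = shard_rows(rows, num_segments=24)  # e.g. one segment per CSx
    print(distributed_average(segments))
```

Hash distribution keeps every row for a given key on one segment, which is what lets each drive compute its share of the answer without talking to its neighbors.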
Other areas where we are exploring CSx include video collection and analytics, IoT analytics, and emerging automotive technology. For locations with limited compute space, such as small offices, branch offices, ships, and satellites in space, CSx-enabled systems can augment traditional system resources, reducing the number of servers needed and enabling higher density and more sustainable computing.
VMware’s vSphere platform and our Tanzu modern-apps technology stand to benefit from the opportunities that CSx-enabled solutions will provide. We’re exploring these emerging opportunities to bring new industry-leading solutions to market.
What does this mean for infrastructure architects and consumers of CSx?
Below are two of the most frequent questions we have come across with this new technology:
Can I put CSx in any server? Yes. These devices were intentionally designed to be “plug and play” replacements for traditional storage devices, lowering the barrier to entry and increasing deployment flexibility.
Do I need to change my app architecture to use CSx? It depends on the device. Some devices have locked, fixed-function features preloaded on the drive; each feature performs a single function and requires some programming to enable. Other devices offer much more open, programmable resources and might simply require cross-compiling your code from x86 to ARM, avoiding the need to rewrite the entire application or function.
Computational storage is still in its infancy, but VMware researchers see it as part of the broader industry trend of moving compute to where the data resides. There are a number of interesting use cases for combining VMware software offerings with this type of hardware, and the ability to pack more horsepower into existing servers offers exciting possibilities. Both VMware and NGD Systems are also working with SNIA on its Computational Storage standards efforts. Check out the SNIA website for more information.
We have created the following quick video to give you an overview of the exciting use cases and where computational storage makes sense. Let us know what you think!
Do you have use cases that you think CSx can address? Are you interested in learning more about VMware’s use cases? Please feel free to reach out to me via email at email@example.com and let’s discuss!