What if the concepts Abstract, Pool, and Automate are applied to Storage?
The foundation of the modern data center is pooled, automated resources that are sufficiently abstracted from physical hardware, leading to significant gains in operational efficiency and resource utilization. This is true for almost every virtualization and cloud platform today.
The advantages come from many dimensions, most significantly the ability to radically simplify the development and operational management of applications in the data center. By pooling resources, we no longer need to deal with individual servers or specific hardware configurations. Rather, we can provision by policy onto a pooled set of resources, and then eliminate much of the ongoing management through automation. Overloaded servers and hotspots are automatically rebalanced, and resources are reallocated where they are needed.
Arguably, we’ve done a great job of abstracting, pooling and automating compute resources. The path forward for virtualization is twofold: to expand the applicability of virtualization to new workloads, for example big data and mission-critical applications, and to extend the abstract, pool and automate capabilities to networking and storage.
With networking, we’re bringing together the best of our vCNS layer 4-7 networking technologies and the Nicira platform to build VMware NSX, our integrated control plane for virtual network infrastructure, allowing full software-control of the networking substrate.
VMware’s software-defined storage strategy will bring the same levels of abstraction and automation to the storage infrastructure, giving storage a level of simplicity never seen before and introducing storage at a much lower cost.
Application Diversity leads to new Storage Models
As I’ve written in the past, we’re seeing significant new storage types emerging, two key types being object storage for applications and Big Data file systems for analytics. Those new architectures are often fueled by the need for a radical increase in capacity, scale-out and better economics. For example, traditional storage systems range from $2 to $10 per gigabyte of provisioned space, while most Big Data storage systems are in the range of $0.10 to $0.50 per gigabyte and are deployed at petabyte scale. The economics of storage at this scale bring a new set of needs for deploying storage on industry standard servers, which drives the second wave of software-defined storage, in which a significant portion of storage runs on industry standard servers.
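To put those price ranges in perspective, here is a rough back-of-the-envelope calculation for one petabyte of provisioned space. It uses only the per-gigabyte ranges quoted above and assumes decimal units (1 PB = 1,000,000 GB); the function name is purely illustrative.

```python
GB_PER_PB = 1_000_000  # assumption for this sketch: decimal units, 1 PB = 10^6 GB


def cost_range_per_pb(low_cents_per_gb, high_cents_per_gb):
    """Return the (low, high) cost in whole dollars of provisioning one petabyte.

    Prices are passed in cents per gigabyte so the arithmetic stays in integers.
    """
    return (
        low_cents_per_gb * GB_PER_PB // 100,
        high_cents_per_gb * GB_PER_PB // 100,
    )


traditional = cost_range_per_pb(200, 1000)  # $2 to $10 per GB
big_data = cost_range_per_pb(10, 50)        # $0.10 to $0.50 per GB

print(f"Traditional: ${traditional[0]:,} to ${traditional[1]:,} per PB")
print(f"Big Data:    ${big_data[0]:,} to ${big_data[1]:,} per PB")
```

At petabyte scale the gap is the difference between millions of dollars and hundreds of thousands, which is why the lower price point pushes storage toward industry standard servers.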
Looking forward, the key storage classes will include different types of storage from both VMware and our partners.
Through the software-defined storage journey, we apply pooling and automation capabilities to traditional block storage, i.e. the storage behind the VM’s virtual disks, and increasingly the same pooling and automation will be used for the new types of storage.
Software-defined storage is the automation and pooling of storage through a software control plane, and the ability to provide storage from industry standard servers. This offers a significant simplification to the way storage is provisioned and managed, and also paves the way for storage on industry standard servers at a fraction of the cost.
Today, storage is typically managed in dedicated hardware silos for individual products. This leads to increasingly complex and fragmented management, deployment processes, and skillsets for each silo. With software-defined storage, the opportunity is to provide a unified control plane across these silos, to simplify the provisioning of applications with the best storage system.
Applications and Policies
Just as we’ve experienced with pooled compute resources, we want to be able to provision new storage as part of an application workflow, using application-aware policies for performance, cost, availability and recovery.
With software-defined storage, a control plane allows central control of storage provided by heterogeneous storage back-ends. We use an application policy to drive the placement and control of storage according to the application’s needs at the time of provisioning, and adjust it as ongoing needs for capacity and performance change.
Heterogeneous storage resources are abstracted into logical pools where they are consumed and managed through app-centric policies.
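As a minimal sketch of what policy-driven placement could look like, the following picks the cheapest logical pool that satisfies an application's policy. Every name here (StoragePool, AppPolicy, place, the pool names and their numbers) is hypothetical and chosen for illustration; it is not a VMware API, and a real control plane would weigh many more factors.

```python
from dataclasses import dataclass


@dataclass
class StoragePool:
    """A logical pool abstracted from some backend (hypothetical fields)."""
    name: str
    free_gb: int
    max_iops: int
    cost_per_gb: float
    replicated: bool


@dataclass
class AppPolicy:
    """What the application asks for at provisioning time."""
    capacity_gb: int
    min_iops: int
    needs_replication: bool


def place(policy, pools):
    """Return the cheapest pool that meets the policy, or None if none does."""
    candidates = [
        p for p in pools
        if p.free_gb >= policy.capacity_gb
        and p.max_iops >= policy.min_iops
        and (p.replicated or not policy.needs_replication)
    ]
    return min(candidates, key=lambda p: p.cost_per_gb, default=None)


pools = [
    StoragePool("san-tier1", free_gb=5_000, max_iops=50_000, cost_per_gb=8.0, replicated=True),
    StoragePool("nas-tier2", free_gb=20_000, max_iops=5_000, cost_per_gb=3.0, replicated=True),
    StoragePool("server-local", free_gb=100_000, max_iops=2_000, cost_per_gb=0.5, replicated=False),
]

database = AppPolicy(capacity_gb=500, min_iops=20_000, needs_replication=True)
archive = AppPolicy(capacity_gb=10_000, min_iops=500, needs_replication=False)

print(place(database, pools).name)  # san-tier1
print(place(archive, pools).name)   # server-local
```

The point of the sketch is that the application states what it needs, and the choice of backend falls out of the policy rather than being made by hand for each silo.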
Adding Data Services through the Virtual Data Plane
We add data services to storage through the Virtual Data Plane. Through this layer, we enrich data services with layered functionality and by integrating with the backend storage. For example, efficient clones and snapshots can be offloaded to the backend storage through storage APIs.
At VMworld, we will talk about a number of technologies that are part of our strategy for software-defined storage. Let me offer a short summary here.
Extending Policy through to the Array with Virtual Volumes (VVol)
The interfaces to external storage are being enriched to include policy and automation capabilities through VVols, a new storage API between the control plane and the backend storage providers. Through VVols, external storage can become VM-aware and distinguish between individual VMs. This enables negotiation and enforcement of policy for performance, replication and availability.
Through the VVols interface, an external storage device can provide snapshots, replication and other operations at VM granularity. In addition, the virtual disks now have a native representation on the storage device, allowing further optimization.
VVols work with both external SAN and NAS storage, as well as with the new VSAN storage discussed later in this post.
Flash Read Cache
One of the significant trends in storage is an increasing amount of flash memory in the storage hierarchy. Flash memory’s notable characteristic is that it delivers orders of magnitude higher random read performance than a traditional magnetic disk. For example, a traditional magnetic disk delivers between 100 and 200 IOPS, whereas a flash-based solid-state disk can deliver tens of thousands of IOPS for random reads.
While flash memory is also used at the storage tier, there is a trend toward more flash at or close to the compute layer. Today, flash memory can be connected by using a solid-state disk or a PCI Express card, and just around the corner is PCI Express connected flash in removable disk form factors.
VMware’s strategy for flash is to complement storage-side flash with caching on the compute tier.
Flash Read Cache provides a compute-tier read cache using Flash technologies. The cache is fully integrated with vSphere, and embedded in the hypervisor layer to provide optimal low latency performance.
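To illustrate the general idea of a compute-tier read cache, here is a minimal read-through cache with least-recently-used eviction. This is a generic sketch, not how Flash Read Cache is implemented: the LRU policy, the ReadCache class, and the backend_read callback are all assumptions made for the example.

```python
from collections import OrderedDict


class ReadCache:
    """Read-through block cache with LRU eviction (illustrative only)."""

    def __init__(self, capacity_blocks, backend_read):
        self.capacity = capacity_blocks
        self.backend_read = backend_read  # callable: block_id -> data (slow path)
        self.blocks = OrderedDict()       # block_id -> data, oldest entry first
        self.hits = 0
        self.misses = 0

    def read(self, block_id):
        if block_id in self.blocks:
            # Cache hit: serve locally and mark the block as recently used.
            self.blocks.move_to_end(block_id)
            self.hits += 1
            return self.blocks[block_id]
        # Cache miss: fetch from the backend storage and keep a copy.
        self.misses += 1
        data = self.backend_read(block_id)
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return data

    def invalidate(self, block_id):
        """Drop a cached block, e.g. after the underlying block is written."""
        self.blocks.pop(block_id, None)


backend = {n: f"data-{n}" for n in range(10)}  # stand-in for external storage
cache = ReadCache(capacity_blocks=2, backend_read=backend.__getitem__)
for block in [1, 2, 1, 3, 2]:
    cache.read(block)
print(cache.hits, cache.misses)  # 1 4
```

Repeated reads of hot blocks are served from the local cache, so only misses pay the latency of a round trip to the storage tier.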
Furthermore, Virsto adds a new performance optimization layer between the VMs and external storage, and also enables fast yet efficient snapshots of VM state.
The Virsto storage model uses a journaling technique to optimize performance. Random writes are consolidated in a linear log, transforming random writes into sequential writes on disk, which in some cases delivers an order of magnitude increase in virtual write IOPS.
In addition to performance, Virsto provides scalable snapshots: the time taken to create each snapshot is very low, and performance remains consistent as snapshots are taken.
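The general journaling idea can be sketched in a few lines: writes to scattered block addresses are appended to a log in arrival order, and an index remembers where the latest copy of each block lives. This is a simplified illustration of log-structured writing in general, not Virsto's actual design; the WriteLog class and its fields are hypothetical.

```python
class WriteLog:
    """Append-only log that absorbs random block writes sequentially."""

    def __init__(self):
        self.log = []    # sequential log of (block_id, data) entries
        self.index = {}  # block_id -> position of its latest entry in the log

    def write(self, block_id, data):
        # Whatever the block address, the write lands at the tail of the log.
        self.index[block_id] = len(self.log)
        self.log.append((block_id, data))

    def read(self, block_id):
        # Follow the index to the most recent copy of the block.
        pos = self.index.get(block_id)
        return None if pos is None else self.log[pos][1]


wlog = WriteLog()
for block_id, data in [(907, "a"), (12, "b"), (455, "c"), (12, "d")]:
    wlog.write(block_id, data)

print([entry[0] for entry in wlog.log])  # [907, 12, 455, 12]
print(wlog.read(12))                     # d
```

The block addresses are random, but the device only ever sees appends, which is the access pattern magnetic disks handle best. A real system would also need to reclaim space from superseded entries, which this sketch omits.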
The Virsto product is part of our announcements this week at VMworld, and we have several great sessions to attend.
While storage has traditionally been external to the compute layer, there is a growing opportunity for a converged architecture with both compute and storage in the same layer. Today’s industry standard storage servers can pack a whole lot of capacity. In fact, I’ve recently seen up to 320TB in a single 4U rack. But the trade-off with local server based storage is that while it’s low cost, it’s unreliable.
Enter VSAN! With the right software reliability layer, we can turn a cluster of industry standard servers into enterprise grade reliable storage. VSAN uses software-replication techniques to create reliable distributed storage across multiple servers. No special networking is required: VSAN uses regular 10GbE Ethernet between hosts in the cluster to keep data consistent and fully accessible to all VMs in the cluster.
VSAN uses a converged storage model, where compute and storage resources are combined in a single hardware platform. It provides reliable storage for virtual disks using a cluster of machines as a pool of storage resources. An individual virtual disk is stored across the cluster using Network RAID, so that a failure of an individual disk or an entire host does not impact the durability of the data.
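A toy model shows why keeping copies on separate hosts protects against a host failure. The placement rule below (a simple deterministic rotation over the host list) and the function names are assumptions made for illustration; VSAN's actual placement and replication logic is considerably more sophisticated.

```python
def place_replicas(disk_id, hosts, copies=2):
    """Choose `copies` distinct hosts to hold replicas of a virtual disk."""
    if copies > len(hosts):
        raise ValueError("not enough hosts for the requested number of copies")
    # Deterministic starting point derived from the disk name (illustrative).
    start = sum(disk_id.encode()) % len(hosts)
    return [hosts[(start + i) % len(hosts)] for i in range(copies)]


def is_available(replica_hosts, failed_hosts):
    """Data stays accessible as long as one replica is on a healthy host."""
    return any(host not in failed_hosts for host in replica_hosts)


hosts = ["esx-01", "esx-02", "esx-03", "esx-04"]  # hypothetical host names
replicas = place_replicas("vm42-disk0", hosts, copies=2)

# Any single host can fail without losing access to the virtual disk.
for failed in hosts:
    assert is_available(replicas, {failed})

# Only losing every host that holds a copy makes the disk unavailable.
print(is_available(replicas, set(replicas)))  # False
```

Because the copies never share a host, the failure of one disk or one entire server leaves at least one intact replica reachable over the network.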
Supporting the trend of increasingly ubiquitous flash at the compute layer, VSAN uses local flash to cache reads and writes, offering the performance advantages of flash-based storage with a hybrid of flash and magnetic devices. And the VSAN layer can handily optimize the placement of compute, flash and magnetic disk copies of the data, to ensure best performance and locality.
With VSAN, gone are the days of dealing with many layers of storage across different groups. To simplify provisioning of storage, VSAN allows a policy to be applied to the VM’s storage, so that storage is allocated and balanced automatically across the cluster. For example, a virtual disk can be specified with an SLA containing declarations for capacity, the type of replication, and the amount of SSD to be used to accelerate performance. In fact, a single datastore can support many different virtual-disk SLAs within the same store.
Configuration of a VSAN cluster is also simplified and integrated into the core vSphere workflow. Creating a VSAN cluster follows the same procedure as creating a regular vSphere cluster – a few extra steps allow the storage within the hosts of the cluster to participate in the distributed storage data store.
VSAN is fully integrated with important vSphere features, including VM snapshots, HA, DRS, vMotion, Storage vMotion, SRM/VR and VDP/VDPA. VSAN was designed from day one as a scale-out architecture – every aspect of VSAN is distributed across the hosts in the cluster, and thus there are no special “master nodes” or other components that might lead to scaling limitations or single points of failure. Initially, VSAN scales to eight nodes; it is designed to scale much higher, and that limit will increase in subsequent releases.
For more on the original motivation for VSAN, check out this blog by my colleague Christos Karamanolis. And Chuck Hollis also wrote a blog on VSAN’s key capabilities.
I think you’ll agree that we’re entering a very interesting time for storage, a disruptive stage that will lead to significant simplification and economic change for the provisioning and management of storage. I’m happy to announce some of those new technologies are here today. If you’re at VMworld this week, I highly recommend you attend one of these key storage sessions to learn more about VMware’s storage strategy and technologies:
- STO1001-GD: VSAN with Cormac Hogan and VMware R&D Engineers
- STO4973: VMware Virtual SAN Panel Discussion
- STO5391: VMware Virtual SAN Overview
- STO5027: VMware Virtual SAN Technical Best Practices
- STO5559: The Future of Storage : A Panel Discussion
- STO5715: Software-defined Storage – The Next Phase in the Evolution of Enterprise Storage
- STO1004-GD: vSphere Flash Read Cache, VMware VSAN™, VMware Virsto, Software-Defined Storage Architecture with Rawlinson Rivera and VMware R&D Engineers
- STO5359: VMware Virsto Technical Overview: Optimizing Your SAN Infrastructure for VDI and Virtual Datacenter Environments
The full list of VMware storage sessions can be found here. Folks from our Storage and Big Data teams will be at VMworld this year to demo the new and upcoming storage technologies, so I encourage you to stop by the VMware booth, attend a session and learn more.