Availability in a Software-Defined Datacenter

Last week, VMware hosted one of its most exciting VMworld conferences to date. The keynotes emphasized a bold vision for the future of IT: the software-defined datacenter (SDDC). The SDDC is a datacenter in which the infrastructure, including compute, storage, and networking, has been virtualized and fully automated by software, enabling IT to be delivered as a service. As Steve Herrod summarized in his Day 1 keynote, the SDDC is based on the ability to abstract applications from the underlying infrastructure, pool physical resources into shared logical resources, and automate IT operations.

The promise of the SDDC is to enable the delivery of IT services with greater agility, lower cost, and increased reliability. On the reliability front, the objective of the software-defined datacenter is to provide all applications with availability and business continuity as an automated infrastructure service.

Availability, BC/DR and the SDDC
Traditional solutions for availability and business continuity are complex and expensive, not only in hardware and software, but also in day-to-day operations.

The vCloud Suite, announced at VMworld 2012, provides a comprehensive suite of capabilities to ensure the availability of business-critical applications. These include vSphere High Availability (HA) and Fault Tolerance (FT) to automatically protect applications against local infrastructure failures; vSphere Data Protection (VDP) and vSphere Replication (VR) for data protection; and vCenter Site Recovery Manager (SRM) to orchestrate disaster recovery processes.

vSphere HA has long been a core feature of the vSphere platform for protecting all of the VMs in your environment from physical server failures. HA is a great example to illustrate the value of abstraction, pooling, and automation. Thanks to the abstraction provided by vSphere, VMs are portable across all the physical servers within the pool of virtualized compute resources (i.e., the cluster). In the event of host failure, VMs are automatically restarted on another host according to HA policies.
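To make the abstraction, pooling, and automation idea concrete, here is a minimal Python sketch of a restart-on-failure policy. It is not how vSphere HA is implemented: the `Host` class, the memory-only capacity model, the numeric restart priorities, and the "most free memory wins" placement rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical model of a physical server in a cluster."""
    name: str
    capacity_gb: int                          # total memory, illustrative unit
    vms: dict = field(default_factory=dict)   # VM name -> memory demand in GB
    alive: bool = True

    def free_gb(self) -> int:
        return self.capacity_gb - sum(self.vms.values())


def restart_failed_vms(cluster, failed_host, priority):
    """Place the failed host's VMs on surviving hosts.

    VMs with a lower priority number are placed first (assumed policy).
    Returns (placements, unplaced): a dict of VM -> host name, and a list
    of VMs for which no surviving host had enough free memory.
    """
    failed_host.alive = False
    placements, unplaced = {}, []
    for vm in sorted(failed_host.vms, key=lambda v: priority.get(v, 99)):
        demand = failed_host.vms[vm]
        candidates = [h for h in cluster if h.alive and h.free_gb() >= demand]
        if not candidates:
            unplaced.append(vm)
            continue
        # Assumed placement rule: pick the host with the most free memory.
        target = max(candidates, key=lambda h: h.free_gb())
        target.vms[vm] = demand
        placements[vm] = target.name
    return placements, unplaced


cluster = [
    Host("esx1", 64, {"db": 32, "web": 8}),
    Host("esx2", 64, {"app": 40}),
    Host("esx3", 64, {"cache": 16}),
]
placed, unplaced = restart_failed_vms(cluster, cluster[0], {"db": 1, "web": 2})
print(placed)    # {'db': 'esx3', 'web': 'esx2'}
print(unplaced)  # []
```

Because the VMs are described only by their resource demand and not by the server they happen to run on, any surviving host with enough headroom is a valid restart target, which is the point of pooling.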

Disaster Recovery Solutions
Traditional disaster recovery (DR) solutions, designed for non-virtualized datacenters, are frustratingly complex and expensive. IT operations teams typically have to deploy a carbon copy of the production datacenter in the recovery location – an expensive proposition, akin to purchasing a second home for fire insurance. The recovery process is typically documented in complex runbooks, and relies on error-prone manual intervention. At the end of the day, despite spending considerable resources on DR, most IT organizations are unsure whether they will be able to meet business requirements in the event of a disaster.

The vCloud Suite provides a complete set of products to implement a cost-effective, reliable, and fully orchestrated DR solution. The solution for vSphere environments includes two components: data replication and orchestration.

Data replication technology
In order to recover from a disaster, you need to have a copy of your data in a second location. Traditionally, data replication has been provided by storage array-based replication from the storage vendors. Last year, in the SRM 5.0 release, we introduced vSphere Replication, the industry’s first hypervisor-based replication product. VR is a great illustration of the SDDC vision. It enables replication at the virtual machine level, with a software-only solution, managed directly from vCenter, and independent of the underlying physical storage.

A complete DR solution must provide much more than just data replication. After a disaster, a decision is made to fail over from the primary datacenter to a secondary site. The data may already be at the secondary site, but the typical DR plan also includes a complex “runbook” of steps required to ensure that applications can actually be recovered at the DR site. These steps can include restarting critical services and applications in a specific order to satisfy dependencies, reconfiguring IP addresses in virtual machines, and so on. Given the complexity of the DR plan, many companies schedule a regular DR test to ensure that the recovery will go smoothly in the event of an actual disaster. These DR tests, though, require taking down the production site.
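The "specific order to satisfy dependencies" part of a runbook is essentially a topological sort. As a rough sketch (not SRM's actual mechanism), the following Python uses the standard-library `graphlib` module (Python 3.9+) to group services into restart waves. The service names and the dependency map are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical runbook: each service starts only after the services it lists.
dependencies = {
    "database": set(),
    "directory": set(),
    "app-server": {"database", "directory"},
    "web-frontend": {"app-server"},
}


def recovery_waves(deps):
    """Group services into waves; services in the same wave have no
    dependencies on each other and could be started in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(recovery_waves(dependencies), start=1):
    print(f"wave {number}: {', '.join(wave)}")
# wave 1: database, directory
# wave 2: app-server
# wave 3: web-frontend
```

Encoding the order as data rather than as prose in a runbook is what makes it possible to execute and test the plan automatically instead of relying on someone following the steps by hand.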

Site Recovery Manager addresses these issues by orchestrating the end-to-end DR process. SRM integrates with and manages the data replication component, supporting both storage array-based replication and vSphere Replication. SRM also provides a valuable “test” capability, which creates a test bubble environment on the recovery site, where the recovery can be tested without disrupting the production environment.

With the introduction of vCloud Suite 5.1, VMware continues to improve the availability solutions for software-defined datacenters. More specifically, we made a number of significant announcements on vSphere Replication.

vSphere Replication Enhancements
Site Recovery Manager 5.1, the new version of SRM, provides enhanced capabilities for vSphere Replication, including reprotect, automated failback, VSS application quiescing, and forced recovery. In addition, vSphere Replication is now available separately from SRM, as a standalone replication solution included with vSphere 5.1.

vSphere Replication is a key technology for enabling availability in the software-defined datacenter because it provides two essential capabilities:

  1. Data replication as a software solution, abstracted from the physical storage infrastructure
     A storage array-agnostic solution provides the greatest degree of flexibility for protecting VM data by providing more abstraction from the underlying physical storage system. VMs can be protected across arrays from different vendors; across arrays of different types (e.g., FC SAN primary datacenter to NFS in the secondary site); and across a broad set of locations. A software-based solution like VR is also a key enabler for DR-to-the-cloud solutions.
  2. Data replication on a per VM (rather than per LUN) basis
     Traditional storage-based replication technology works at the granularity of a LUN, rather than at the granularity of a virtual machine or virtual disk. This means that all of the VMDKs on a single LUN are replicated together and also have to be failed over together. Unfortunately, this mismatch introduces a number of operational headaches when testing and managing a DR plan. vSphere Replication avoids these problems by replicating at the VM level.
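The granularity mismatch in the second point is easy to see with a toy model. The Python sketch below is illustrative only: the VM names, LUN names, and both helper functions are hypothetical, and it models nothing more than which VMs end up moving together.

```python
# Hypothetical layout: the LUN that holds each VM's virtual disks.
vm_to_lun = {
    "erp-db": "lun-01",
    "erp-app": "lun-01",
    "test-vm": "lun-01",
    "mail": "lun-02",
}


def failover_set_per_lun(selected, placement):
    """LUN-granular replication: every VM that shares a LUN with a
    selected VM has to fail over along with it."""
    luns = {placement[vm] for vm in selected}
    return sorted(vm for vm, lun in placement.items() if lun in luns)


def failover_set_per_vm(selected, placement):
    """VM-granular replication: only the selected VMs fail over."""
    return sorted(vm for vm in selected if vm in placement)


wanted = ["erp-db", "erp-app"]
print(failover_set_per_lun(wanted, vm_to_lun))  # ['erp-app', 'erp-db', 'test-vm']
print(failover_set_per_vm(wanted, vm_to_lun))   # ['erp-app', 'erp-db']
```

In this example, `test-vm` is dragged into the failover only because it happens to share `lun-01` with the ERP VMs. With LUN-level replication, avoiding that means planning LUN layout around DR groupings; with VM-level replication, the protection group can simply be the set of VMs you care about.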

Storage array-based replication solutions are unable to support these requirements today. However, we recognize that array-based replication has had and will continue to have its place in the SDDC, especially for supporting large, mission-critical environments. Our objective is to enable customers to choose the best solution to meet their needs by supporting the development of a rich ecosystem of solutions around the vCloud Suite. With that objective in mind, we are engaging with our storage partners to address these gaps through technology initiatives such as virtual volumes and storage policy-based management that were previewed at VMworld.

The availability team at VMware is actively working on turning the SDDC vision into reality through our suite of availability products. We highlighted some of the more prominent aspects of our latest launch, but stay tuned as we are working on many more exciting projects in software-defined storage and availability!

