Last week, Andreessen Horowitz published a much-discussed blog post entitled The Cost of Cloud, a Trillion Dollar Paradox. The full post and the follow-up Twitter thread are worth a read, but the tl;dr is that for some software companies, the cost of cloud eats into profit margins and thus suppresses stock price and valuation, potentially to the tune of hundreds of billions of dollars in aggregate across all companies. The post treats repatriation as a special one-off event. I want to apply the multi-cloud strategy concepts I’ve been talking about in previous posts to the ideas it discusses. I think that taking the broader multi-cloud perspective sets you up well for repatriation, should it ever be needed.
In the post, Andreessen Horowitz noted that for this market cap impact to apply to a company, cloud costs must dominate cost of revenue (COR) or cost of goods sold (COGS). And indeed, we see many SaaS vendors in this situation, as discussed in the post. Their massive financial outlay on cloud, and the scale of their digital operations, mean that there is huge potential upside for them to repatriate workloads, i.e., take them off the cloud and back on-prem. The thesis is that since public cloud vendors make profit margins of 30% or more on their cloud services (with some of them upwards of 75%), a company could save money by operating the infrastructure itself on-prem. The company’s own operating costs would be higher than the public cloud provider’s, though, so it could not necessarily realize the full 30+% as savings, but the realizable savings are still substantial.
While the market cap hit discussed in the post only applies to at-scale SaaS companies, cloud costs are real and affect everyone, even when they are not so dramatic as to overwhelm COR or COGS. In fact, a big part of a properly executed multi-cloud strategy is the ability to choose a cloud based on factors such as security, compliance, location, performance, and, yes, cost. This means that multi-cloud really is relevant to everyone and goes to the heart of the cloud economics discussion. I want to walk through the considerations the post laid out for dealing with the “ballooning cost of cloud,” and what we at VMware are doing about them. We will see that the right multi-cloud strategy, along with VMware technology, can address many of these considerations:
Cloud Spend as a KPI
The Andreessen Horowitz post argues that you should make cloud spend a first-class metric and KPI for your business. It’s something everyone, from the CEO on down, should be focused on. This has been a key focus area for CloudHealth by VMware. The average CloudHealth customer saves 25% on their cloud bill because CloudHealth provides visibility back to the business and identifies optimizations. The cool part about CloudHealth is that you can give engineering teams direct access to it, so they can see their services’ precise cloud spend. This feedback loop allows teams to independently monitor their spend and consequently work to lower it. CloudHealth is multi-cloud, so teams using it get the benefits of this feedback loop across all their apps in the cloud.
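To make the idea of cloud spend as a KPI concrete, here is a minimal sketch in Python of the kind of per-team rollup that such a feedback loop relies on. It is not how CloudHealth works internally; the line-item fields (`team`, `cost`) and the function names are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def spend_by_team(line_items):
    """Sum cost per team from billing line items.

    Each line item is assumed (hypothetically) to be a dict with a
    'cost' value and an optional 'team' tag.
    """
    totals = defaultdict(float)
    for item in line_items:
        # Untagged spend gets its own bucket so it stays visible
        # instead of silently disappearing from the report.
        team = item.get("team") or "untagged"
        totals[team] += item["cost"]
    return dict(totals)


def spend_share_of_revenue(total_spend, revenue):
    """Express cloud spend as a fraction of revenue, a simple KPI."""
    if revenue <= 0:
        raise ValueError("revenue must be positive")
    return total_spend / revenue
```

Grouping untagged spend explicitly is a deliberate choice here: a KPI that hides unattributed cost tends to understate the problem it is meant to surface.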
Incentivize the Right Behaviors
The example of a SPIFF (essentially a monetary incentive) for developers who reduce or optimize their cloud spend is a key point in the post. While this will certainly drive the right types of behavior, our experience has been that just giving developers direct visibility (through CloudHealth) will, by itself, have a big impact on driving good behavior. People naturally want to save money, but historically they have been in the dark about exactly how much they are spending. Providing that visibility helps to close the information gap.
You can also get the right behaviors by making them easy. A common security example is a library of pre-vetted, secure, and compliant images, which lets developers quickly get the functionality they need in a way that complies with enterprise security policy. We can make it just as easy for developers to save on costs. For instance, your company can set a policy that EBS volumes may not stay unattached for more than 30 days. With that policy in place, any EBS volume left unattached for more than 30 days is automatically deleted. You don’t need to send reports to teams for them to act on, or follow up with them if they didn’t get to it in a timely manner. These sorts of default policies and automatic behaviors drive great savings over time.
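The 30-day policy described above can be sketched as a small selection function. This is an illustration in Python under stated assumptions, not a description of any particular product’s policy engine: the volume records and their fields (`id`, `attached`, `detached_at`) are hypothetical, and the detach timestamp is assumed to come from your own inventory or audit data.

```python
from datetime import datetime, timedelta, timezone

# Policy limit taken from the example in the text: 30 days unattached.
MAX_UNATTACHED = timedelta(days=30)


def volumes_to_delete(volumes, now):
    """Return IDs of volumes unattached for longer than MAX_UNATTACHED.

    Each volume is a hypothetical dict with 'id', 'attached' (bool) and
    'detached_at' (timezone-aware datetime, or None if unknown).
    """
    expired = []
    for vol in volumes:
        detached_at = vol.get("detached_at")
        # Skip volumes that are in use or whose detach time is unknown;
        # deleting on missing data would be unsafe.
        if vol.get("attached") or detached_at is None:
            continue
        if now - detached_at > MAX_UNATTACHED:
            expired.append(vol["id"])
    return expired


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    inventory = [
        {"id": "vol-old", "attached": False,
         "detached_at": now - timedelta(days=45)},
        {"id": "vol-new", "attached": False,
         "detached_at": now - timedelta(days=3)},
    ]
    print(volumes_to_delete(inventory, now))
```

Keeping the selection logic separate from the actual delete call makes the policy easy to run in a report-only mode first, before enabling automatic deletion.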
Optimization, Optimization, Optimization
There are plentiful opportunities for optimization in the cloud. Because cloud consumption is fast and self-service, it’s natural that your cloud spend is not optimized. Here CloudHealth really shines, as it can automatically identify optimizations in your cloud environment. One example is moving from on-demand VM instances to 1- or 3-year reserved instances. Another is automatically identifying waste, such as unused VMs, which continue to run and be billed even though your business isn’t deriving any value from them. It’s these sorts of optimizations that enable customers to easily save 25% or more on their cloud spend.
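The two optimizations mentioned above, reserved instances and waste detection, both come down to simple arithmetic. The Python sketch below shows one way to reason about them; the rates in the usage example, the 5% CPU threshold, and the function names are all hypothetical, not actual cloud prices or CloudHealth rules.

```python
HOURS_PER_YEAR = 8760


def breakeven_utilization(on_demand_hourly, reserved_annual_cost):
    """Fraction of the year an instance must run before a 1-year
    reservation becomes cheaper than paying on demand."""
    if on_demand_hourly <= 0:
        raise ValueError("on_demand_hourly must be positive")
    return reserved_annual_cost / (on_demand_hourly * HOURS_PER_YEAR)


def is_idle(cpu_samples, threshold=0.05):
    """Flag a VM as a waste candidate if its average CPU utilization
    (samples in the range 0.0 to 1.0) is below the threshold.

    A VM with no samples is not flagged, since there is no evidence.
    """
    if not cpu_samples:
        return False
    return sum(cpu_samples) / len(cpu_samples) < threshold


if __name__ == "__main__":
    # Hypothetical prices: $0.10/hour on demand vs. $525.60/year reserved.
    print(breakeven_utilization(0.10, 525.60))
    print(is_idle([0.01, 0.02, 0.03]))
```

A workload that runs for a larger share of the year than the break-even fraction is a candidate for a reservation; one that is flagged idle is a candidate for shutdown instead.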
Think About Repatriation Up Front
The post suggests that system architects design the app such that it can more easily be repatriated, specifically mentioning Kubernetes as an example of a technology that can enable greater portability. VMware’s focus on multi-cloud strategy enables choice of the abstraction layer where you want to drive consistency, and thus app portability, at two levels:
- Infrastructure: we provide consistent infrastructure through our standardized SDDC platform as well as with Kubernetes via Tanzu Kubernetes Grid. This means applications that run on the SDDC or on Kubernetes can now run anywhere that the SDDC or Kubernetes is (which is basically everywhere), without modification. In fact, the operational tooling can also remain consistent.
- Application: we enable application consistency in two ways: app architecture and DevSecOps pipeline standardization. For app architecture, we provide integrated offerings like Tanzu Application Service and Spring for turn-key, microservice-based applications. For DevSecOps pipeline standardization, we offer Tanzu Advanced Edition, which delivers consistent DevSecOps tooling that works across every cloud, including your on-prem datacenter. This ensures that the way your app gets built is consistent and that the tooling for updating your app also works after repatriation.
Incrementally Repatriate
The post makes the great point that repatriation is not an all-or-nothing proposition. Indeed, most businesses today are hybrid, with some workloads on-prem and others in the cloud. The exact balance differs by company and can change over time as the environment evolves. And again, having a well-thought-out multi-cloud strategy is key to enabling incremental repatriation. The consistency options explained above apply as well to the datacenter as they do to public cloud, enabling easy workload mobility within that hybrid architecture.
The other question the post raises is which type of workload is best to repatriate. It suggests the most resource-intensive ones. I would add that workloads relying on commodity IaaS services are the best suited, and the infrastructure consistency discussed above should support those applications well. Apps that use a lot of higher-level services specific to one public cloud, such as Lambda on AWS, will likely be much more difficult to repatriate, so you should look for lower-hanging fruit first.
Choice of On-Prem Economic Model
Historically, running workloads on-prem meant a large upfront CapEx expenditure to buy servers, network gear, and other equipment, and possibly a physical location for the datacenter. As the post notes, today there are OpEx options as well. Using VMware Cloud on Dell EMC (VMC on Dell), either in your existing datacenter or in a colocation facility (for full-stack OpEx), delivers the same VMware SDDC infrastructure that you normally get on-prem, but as a service. This means that VMware handles the task of operating that infrastructure, freeing your team to focus on your applications. This is especially important for companies that don’t have strong in-house infrastructure talent.
However, if your company does have that talent, then we can help them through better automation of the underlying infrastructure. Here VMware Cloud Foundation (VCF) is best, as it provides the VMware SDDC as automated software, allowing your operations team to manage it, but more efficiently.
In terms of economics, both VCF and VMC on Dell are priced based on hardware characteristics (sockets for VCF and overall host configuration for VMC on Dell) and not on the number of VMs or workloads you run on them. This is in stark contrast to public cloud, where you are charged based on the number of VMs, Kubernetes cluster instances, requests per second, and so forth; in other words, on aspects of workload use. The important difference is that once you buy an additional server for VCF or VMC on Dell, you want to maximize its use to maximize the value of your spend. You can leverage all of vSphere’s great resource management capabilities to increase the number of workloads per server. In this way, you can optimize to save money compared to public cloud.
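A short Python sketch shows why workload density matters under hardware-based pricing. All figures in the usage example are hypothetical and chosen only to illustrate the arithmetic; they are not VMware or public cloud prices, and the function names are made up for this example.

```python
import math


def cost_per_vm_on_host(host_monthly_cost, vms_per_host):
    """Effective monthly cost of one VM when you pay per host."""
    if vms_per_host <= 0:
        raise ValueError("vms_per_host must be positive")
    return host_monthly_cost / vms_per_host


def fleet_cost_per_host_pricing(host_monthly_cost, vm_count, vms_per_host):
    """Monthly cost of a fleet when pricing follows hardware: you pay
    for whole hosts, so the host count is rounded up."""
    if vms_per_host <= 0:
        raise ValueError("vms_per_host must be positive")
    hosts = math.ceil(vm_count / vms_per_host)
    return hosts * host_monthly_cost


def fleet_cost_per_vm_pricing(vm_monthly_price, vm_count):
    """Monthly cost of a fleet when pricing follows workload count."""
    return vm_monthly_price * vm_count


if __name__ == "__main__":
    # Hypothetical: a $3,000/month host, at two consolidation ratios.
    for density in (20, 40):
        print(density, cost_per_vm_on_host(3000, density))
```

Under per-host pricing, doubling the number of VMs on a host halves the effective cost of each one, while under per-VM pricing the bill scales with the VM count regardless of how well the underlying hardware is used.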
Summary
As we have seen, a good multi-cloud strategy can help address the considerations outlined in the post. The reality is that every business is a multi-cloud business, whether it intends to be or not. It’s best to have a multi-cloud strategy for addressing the challenges that come with managing across many clouds. And as your cloud use and your cloud spend grow, having a multi-cloud strategy will give you greater flexibility in workload placement and cost management.