In today’s general session keynote at VMworld 2020, VMware CTO Greg Lavender walked through how Tanzu Service Mesh (TSM) autoscaling was implemented to make a cloud native application more resilient.
Key highlights of the demo:
- Ability to configure autoscaling functionality without intruding on application logic.
- Visualize the ACME cloud native application from within TSM.
- Inspect performance charts of how each microservice is scaling.
The demo shows ACME Inc., a cloud native application, working as expected under normal traffic conditions and without autoscaling. Once traffic rapidly increases, however, the application starts to perform poorly. A quick inspection determines that autoscaling is not configured on the application. To remediate, an administrator applies an autoscaling YAML definition that activates TSM autoscaling at runtime, without needing to redeploy the application. Immediately after autoscaling is turned on, microservice instances are scaled out and latency returns to normal levels. The demo then shows that when traffic subsides, the TSM autoscaler starts to scale the microservice instances back down without causing latency or performance issues. Finally, the demo finishes with a quick sneak peek at the Service Level Objectives (SLO) feature of TSM.
The rest of this post walks through how to set this up in a five-step process.
Step 1: Inspect ACME Application Service Graph in Tanzu Service Mesh
Step 2: Navigate the application under normal traffic conditions, without any autoscaling configured
In the demo, we navigate through the application and inspect its various performance charts. Figure-2 shows the application up and running, with all functionality working as expected. We can then use the TSM performance charts: Figure-3 shows the service instance count (the number of scaled-out instances of the microservice; in this case no autoscaling is configured), Figure-4 shows the service request count (essentially the amount of traffic against the service), Figure-5 shows the latency chart of a microservice, and Figure-6 shows the microservice CPU chart. Together, these charts show that the ACME application, under normal traffic levels, is operating well with latency under 100ms, as shown in Figure-7. At this point the application has no autoscaling configured, so in Step 3 we will generate load against it and see how it performs.
In Figure-4, the request count chart shows traffic being steadily processed by the application.
Step 3: Generate load against the ACME application and inspect its performance
Let’s generate traffic to see if we can negatively impact the application’s performance by applying the quick command shown in Figure-8. Doing so shows that as traffic builds up, no scaling takes place (the service instance count stays constant). We then see latencies rapidly increase, causing various performance issues in the application (see Figure-9 for the results).
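The exact command from Figure-8 is not reproduced here, but a simple load generator along the following lines would have a similar effect. The URL and parameters are illustrative assumptions, not the demo’s actual values:

```shell
#!/usr/bin/env bash
# Hypothetical load generator: fire bursts of parallel requests at the ACME
# front end. ACME_URL, CONCURRENCY, and DURATION are illustrative values.
ACME_URL="http://acme.example.com/catalog"
CONCURRENCY=50      # parallel requests per burst
DURATION=120        # total run time in seconds

generate_load() {
  local end=$((SECONDS + DURATION))
  while [ "$SECONDS" -lt "$end" ]; do
    for _ in $(seq "$CONCURRENCY"); do
      curl -s -o /dev/null "$ACME_URL" &   # fire-and-forget request
    done
    wait   # let the burst finish before starting the next one
  done
}

# generate_load   # uncomment to run against a live deployment
```

Enough sustained concurrency will push the microservices’ CPU usage up and, with no autoscaler in place, drive latencies into the unhealthy range seen in Figure-9.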
Step 4: Configure TSM autoscaling and improve ACME application performance under heavy traffic
In Figure-10, we show the autoscale.yaml, which specifies a minimum of 1 microservice instance and a maximum scale-out of 10 instances. It also specifies a scale-up CPU threshold of 60% and a scale-down CPU threshold of 40%. Then in Figure-11, we use a quick kubectl command to apply autoscale.yaml to the ACME application. This is applied live at runtime, without needing to redeploy the application.
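The autoscale.yaml shown in Figure-10 could look roughly like the sketch below. The apiVersion, kind, and field names here are illustrative assumptions based on the thresholds described above, not the exact TSM schema; consult the Tanzu Service Mesh autoscaling documentation for the authoritative format.

```yaml
# Illustrative sketch of a TSM-style autoscaling policy (field names are
# assumptions, not the exact TSM schema). It encodes the values from the demo:
# 1-10 instances, scale up above 60% CPU, scale down below 40% CPU.
apiVersion: autoscaling.tsm.tanzu.vmware.com/v1alpha1   # assumed API group
kind: Definition                                        # assumed kind
metadata:
  name: acme-catalog-autoscaler    # hypothetical policy name
  namespace: default
spec:
  scaleTargetRef:
    kubernetes:
      apiVersion: apps/v1
      kind: Deployment
      name: catalog                # hypothetical ACME microservice
  scaleRule:
    enabled: true
    instances:
      min: 1                       # minimum instances (from the demo)
      max: 10                      # maximum scale-out (from the demo)
    trigger:
      metric:
        name: CPUUsagePercent      # assumed metric name
        scaleUp: 60                # scale up above 60% CPU
        scaleDown: 40              # scale down below 40% CPU
```

Because this is just a Kubernetes resource applied with kubectl, nothing in the application image or Deployment spec has to change, which is what makes the remediation non-intrusive.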
In Figure-12, we immediately see from the performance charts that autoscaling is working (service instance counts are increasing) and latencies are back down to normal levels even though traffic continues to increase. We just demonstrated that, with a quick configuration by an SRE, a non-scaling or underperforming application can be made more resilient by enabling autoscaling on it, without touching the business logic and without hard-coding anything that would require redeploying the application. Autoscaling essentially becomes a platform-level resiliency feature offered to any application service that wants it turned on.
In Figure-13, we see that as traffic subsides, the TSM autoscaler starts to scale down the number of instances in a way that does not negatively impact performance.
Now that the ACME application has been healed with a quick, non-intrusive TSM autoscaling configuration that required no code changes, we leave you with how you can set up SLOs in TSM.
Step 5: Introducing how you can quickly configure SLOs in TSM
Figure-14 shows how we can quickly set up an SLO for P-90 latency of less than 120ms. To see the full demo on how to create SLOs in TSM, go here.
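In the demo the SLO is created through the TSM UI, but conceptually the policy from Figure-14 boils down to a target like the following. This is an illustrative pseudo-config, not an actual TSM API object; every field name here is a hypothetical:

```yaml
# Conceptual SLO definition (illustrative only; the demo configures this via the TSM UI).
slo:
  name: acme-latency-slo          # hypothetical SLO name
  service: catalog                # hypothetical ACME microservice
  indicator:
    metric: latency
    percentile: 90                # P-90, as in the demo
  objective:
    threshold: 120ms              # latency target from Figure-14
    comparison: less-than
```

The key idea is the same as with autoscaling: the reliability target is declared as configuration against the mesh, not encoded in the application itself.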
To learn more about what the Office of the CTO Cloud Architecture team is doing at VMworld 2020, please take a look at this blog post.
Co-innovation in Action
This demo would not have been possible without the great collaboration and co-innovation between the Office of the CTO (OCTO) Cloud Architecture team and the Networking and Security Business Unit (NSBU). We are grateful for the support of our executive sponsors, VMware CTO Greg Lavender and VMware NSBU CTO Pere Monclus, and to the entire engineering team across both groups who worked on the PRTC, TSM Autoscaling, and SLO product features of TSM.