A Deep Dive into the Tanzu Service Mesh Autoscaling VMworld 2020 Keynote Demo

In today’s general session keynote at VMworld 2020, VMware CTO Greg Lavender walked through how Tanzu Service Mesh (TSM) autoscaling was used to make a cloud native application more resilient.


Key highlights of the demo:

  1. Configure autoscaling without intruding on application logic.
  2. Visualize the ACME cloud native application from within TSM.
  3. Inspect performance charts that show how each microservice is scaling.


The demo shows ACME Inc., a cloud native application, working as expected under normal traffic conditions without autoscaling. Once traffic rapidly increases, however, the application starts to perform poorly. A quick inspection determines that autoscaling is not configured on the application, so to remediate, an administrator applies an autoscaling YAML file that activates TSM autoscaling at runtime without needing to redeploy the application. Immediately after autoscaling is turned on, microservice instances are scaled out and latency returns to normal levels. The demo then shows that when traffic subsides, the TSM autoscaler scales the microservice instances back in without causing latency or performance issues. Finally, the demo finishes with a sneak peek at the Service Level Objectives (SLO) feature of TSM.

The rest of this post walks through the demo in five steps.

Step 1: Inspect ACME Application Service Graph in Tanzu Service Mesh

ACME is a typical cloud native application made of several microservices written in various languages, such as Node.js, Python, Go, and JavaScript. Figure-1 shows a visualization of the ACME application. The application also uses a MongoDB NoSQL cluster for persistence. What is great about this visual is that you can see which services are communicating with each other and navigate quickly to the service of concern. Also shown on the diagram is the load generator that is used later to generate traffic against this application.

Figure-1 ACME service graph as configured in TSM

Step 2: Navigate application under normal traffic conditions and without any autoscaling being configured

In the demo, we navigate through the application and inspect its various performance charts. Figure-2 shows that the application is up and running and that all of its functionality appears to work. We can then use the TSM performance charts: Figure-3 shows the service instance count (the number of scaled-out instances of the microservice; in this case there is no autoscaling configured), Figure-4 shows the service request count (essentially the amount of traffic against the service), Figure-5 shows the latency chart of a microservice, and Figure-6 shows the microservice CPU chart. Together, these charts show that the ACME application operates well under normal traffic levels, with latency below 100ms, as shown in Figure-7. At this point the application has no autoscaling configured, so in Step 3 we generate load against it and see how it performs.


Figure-2 ACME Application UI


Figure-3 ACME Application Service Instance Count chart of a microservice in TSM

Figure-4 shows the request count chart, where we see traffic being processed steadily by the application.

Figure-4 ACME Application Service Request Count chart of a microservice in TSM
Figure-5 ACME Application Service Latency chart of a microservice in TSM
Figure-6 ACME Application Service CPU chart of a microservice in TSM
Figure-7 ACME Application Service Latencies showing less than 100ms

Step 3: Generate load against ACME Application and inspect its performance

Let’s generate traffic to see if we can negatively impact the application’s performance by running the command shown in Figure-8. As traffic builds up, no scaling takes place (the service instance count stays constant). We then see latencies increase rapidly, causing various performance issues in the application (see Figure-9 for the results).

Figure-8 Generate traffic against the ACME application
Figure-9 ACME Application performance charts show a decline in its performance
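
To build some intuition for why latency climbs when the instance count stays fixed, here is a minimal sketch based on a textbook M/M/1 queueing approximation. It is not how TSM measures or predicts latency; the function name, the even split of traffic across instances, and the 10 ms service time are all assumptions chosen for illustration.

```python
def mm1_latency_ms(total_rps: float, instances: int, service_time_ms: float) -> float:
    """Approximate mean response time when traffic is split evenly across
    `instances` identical servers, each modeled as an M/M/1 queue.

    Hypothetical helper for illustration only; not part of TSM.
    """
    if instances < 1:
        raise ValueError("instances must be at least 1")
    # Requests per second that a single instance can serve on average.
    capacity_rps = 1000.0 / service_time_ms
    # Fraction of one instance's capacity that is in use.
    utilization = (total_rps / instances) / capacity_rps
    if utilization >= 1.0:
        # The instance is saturated: the queue grows without bound.
        return float("inf")
    # M/M/1 mean response time: service time divided by the idle fraction.
    return service_time_ms / (1.0 - utilization)


if __name__ == "__main__":
    # Same instance count, rising traffic: latency grows sharply near saturation.
    for rps in (50, 90, 99):
        print(rps, "rps, 1 instance ->", round(mm1_latency_ms(rps, 1, 10.0), 1), "ms")
    # Same traffic, more instances: latency drops back toward the service time.
    print(90, "rps, 3 instances ->", round(mm1_latency_ms(90, 3, 10.0), 1), "ms")
```

Under these assumptions, latency rises slowly at first and then very steeply as a single instance approaches saturation, which matches the shape of the behavior described in this step.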

Step 4: Configure TSM Autoscaling and Improve ACME Application Performance when under heavy traffic

In Figure-10, we show the autoscale.yaml, which sets the minimum number of microservice instances to 1 and the maximum to scale out to at 10. It also specifies a scaleUp CPU threshold of 60% and a scaleDown CPU threshold of 40%. Then in Figure-11, we use a quick kubectl command to apply the autoscale.yaml to the ACME application. This was applied live at runtime without needing to redeploy the application.
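
The way these four settings interact can be sketched as a small decision function. This is only an illustration under stated assumptions: the class and function names are hypothetical, the one-instance-per-decision step size is assumed, and the source does not describe the algorithm the TSM autoscaler actually uses. Only the values (minimum 1, maximum 10, 60% and 40%) come from the configuration described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutoscalePolicy:
    """Hypothetical container for the values described for autoscale.yaml."""
    min_instances: int = 1
    max_instances: int = 10
    scale_up_cpu_pct: float = 60.0
    scale_down_cpu_pct: float = 40.0


def desired_instances(current: int, avg_cpu_pct: float,
                      policy: AutoscalePolicy = AutoscalePolicy()) -> int:
    """Return the next instance count for one evaluation cycle.

    Assumption: the count changes by one instance per cycle. The gap between
    the two thresholds acts as a dead band that avoids flapping.
    """
    if avg_cpu_pct > policy.scale_up_cpu_pct:
        target = current + 1      # CPU above the scale-up threshold
    elif avg_cpu_pct < policy.scale_down_cpu_pct:
        target = current - 1      # CPU below the scale-down threshold
    else:
        target = current          # inside the dead band: hold steady
    # Never go outside the configured minimum and maximum.
    return max(policy.min_instances, min(policy.max_instances, target))


if __name__ == "__main__":
    print(desired_instances(1, 85.0))   # heavy load on a single instance
    print(desired_instances(3, 50.0))   # between thresholds
    print(desired_instances(3, 20.0))   # traffic has subsided
```

Keeping the scale-down threshold well below the scale-up threshold is a common design choice: a service whose CPU hovers around a single threshold would otherwise add and remove instances on every cycle.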

In Figure-12, we immediately see from the performance charts that autoscaling is working (service instance counts are increasing), and latencies are back down to normal levels even though traffic continues to increase. We just demonstrated that with a quick configuration by an SRE, a non-scaling or poorly performing application can be made more resilient by enabling autoscaling on it, without touching the business logic and without hard coding anything that would require redeploying the application. In effect, autoscaling becomes a platform-level resiliency feature available to any application service that wants it turned on.

In Figure-13, we see that as traffic subsides, the TSM autoscaler starts to scale in the number of instances in a way that does not negatively impact performance.


Figure-10 The autoscale.yaml configuration that enables TSM autoscaling
Figure-11 Applying the autoscale.yaml via kubectl
Figure-12 TSM autoscaling scaling out the ACME application
Figure-13 Scaling in as traffic subsides

Now that the ACME application has been healed with TSM autoscaling, quickly and without any changes to its code base, we leave you with a look at how you can set up SLOs in TSM.

Step 5: Quickly configure SLOs in TSM

Figure-14 shows how we can quickly set up an SLO for P90 latency of less than 120ms. To see the full demo on how to create SLOs in TSM go here.
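
To make the meaning of that objective concrete, the sketch below checks a set of latency samples against a 90th-percentile target of 120ms. It is a hedged illustration, not TSM's implementation: the function names are hypothetical, the nearest-rank percentile method is an assumption, and the source does not say how TSM computes percentiles or over what time window.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], pct: float) -> float:
    """Return the pct-th percentile using the nearest-rank method.

    Hypothetical helper; other percentile definitions interpolate instead.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest rank: the smallest value with at least pct% of samples at or below it.
    rank = math.ceil(pct * len(ordered) / 100.0)
    return ordered[rank - 1]


def meets_latency_slo(latencies_ms: Sequence[float],
                      pct: float = 90.0,
                      threshold_ms: float = 120.0) -> bool:
    """True when the pct-th percentile latency is below the threshold."""
    return percentile_nearest_rank(latencies_ms, pct) < threshold_ms


if __name__ == "__main__":
    # One slow outlier out of ten requests does not break a P90 objective.
    healthy = [80, 95, 100, 105, 110, 90, 85, 115, 118, 400]
    # Two slow requests out of ten push the 90th percentile over the limit.
    degraded = [80, 95, 100, 105, 110, 90, 85, 115, 300, 400]
    print(meets_latency_slo(healthy), meets_latency_slo(degraded))
```

A percentile-based objective like this tolerates a small share of slow requests while still catching broad degradation, which a simple average can hide.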

Figure-14 TSM SLO feature

To learn more about what the Office of the CTO Cloud Architecture team is doing at VMworld 2020, please take a look at this blog post.

Co-innovation in Action

This demo would not have been possible without the great collaboration and co-innovation between the Office of the CTO (OCTO) Cloud Architecture team and the Networking and Security Business Unit (NSBU). We are grateful for the support of our executive sponsors, VMware CTO Greg Lavender and VMware NSBU CTO Pere Monclus, and to the entire engineering team across both groups who worked on the PRTC, TSM autoscaling, and SLO product features of TSM.