Service mesh is fast becoming one of those hot topics where every industry player must have an offering. Open source service mesh projects like Linkerd and Istio, along with others like Consul from HashiCorp and Universal Service Mesh from Avi Networks (now a VMware company!), are all trying to answer many of the challenges we see with microservices architecture. Istio specifically is holding the momentum and is being packaged and/or further developed by multiple cloud and Kubernetes vendors, from the mega clouds like Google, which owns Istio and Anthos, AWS with App Mesh and Azure with Service Fabric Mesh, to the “on-premises” and hybrid cloud vendors like Pivotal, Red Hat and others.
With such a crowded landscape, one may ask: what does VMware have to offer that provides differentiating value?
But first, in a nutshell, what is service mesh?
As microservices architecture becomes more prevalent in software development, the challenges it brings are becoming more apparent, and they will need to be solved. Just one small example can be seen in this excellent blog by Target, “On Infrastructure at Scale: A Cascading Failure of Distributed Systems,” which demonstrates how complicated “modern applications”, or cloud-native applications, can be at scale, and how dangerous these challenges can be to the business.
The most popular way of deploying microservices today is with Kubernetes.
Kubernetes allows you to take a declarative description of your microservices architecture (intended state) and implement that on the underlying IaaS provider. Kubernetes will take care of finding the right node, launching and maintaining your services in the cluster without the operator needing to specify the details. Kubernetes also takes care of basic service discovery where services can find each other using a name (instead of IPs). This takes advantage of the internal DNS within Kubernetes. When a service is reaching out to another service, the traffic will go through the east-west load-balancing within Kubernetes (single hop).
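To make the name-based discovery concrete, here is a minimal sketch of how a client could build and resolve the DNS name of a Kubernetes Service. The service name `backend` and namespace `shop` are made up for illustration, `cluster.local` is assumed to be the cluster domain (the usual default), and the `resolve` helper only returns something useful when run from inside a cluster.

```python
import socket
from typing import List


def service_fqdn(service: str, namespace: str = "default",
                 cluster_domain: str = "cluster.local") -> str:
    """Build the DNS name under which a Kubernetes Service is published."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def resolve(hostname: str, port: int) -> List[str]:
    """Resolve a name to IP addresses using the resolver of the current host.

    Inside a pod this goes through the cluster's internal DNS; outside a
    cluster the lookup is expected to fail.
    """
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


# Hypothetical names, for illustration only.
name = service_fqdn("backend", "shop")
```

A caller in the same namespace can normally use the short name `backend`; the fully qualified form is what makes the lookup unambiguous across namespaces.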
Service mesh brings a new level of connectivity between services. With service mesh, we inject a proxy in front of each service; in Istio, for example, this is done using a “sidecar” within the pod.
There are other implementations of the proxy, but for now we will focus on the most common one, the “sidecar”. This sidecar proxy intercepts the traffic between the pods and, by looking into the HTTP headers, does the load balancing at the sidecar level within the pod, without adding a hop to the east-west communication flow.
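As a rough illustration of load balancing at the sidecar, the toy model below picks a destination pod from the HTTP `Host` header using round-robin. It is a sketch only: the class name, the endpoint table and the addresses are hypothetical, and a real sidecar proxy such as the one Istio uses is far more sophisticated.

```python
import itertools
from typing import Dict, Iterator, List


class SidecarRouter:
    """Toy model of a sidecar that picks a destination pod per request."""

    def __init__(self, endpoints: Dict[str, List[str]]) -> None:
        # One independent round-robin cycle per destination service.
        self._cycles: Dict[str, Iterator[str]] = {
            host: itertools.cycle(pods) for host, pods in endpoints.items()
        }

    def route(self, headers: Dict[str, str]) -> str:
        # The decision is made on the HTTP Host header (L7), not on an IP.
        host = headers["Host"].split(":")[0].lower()
        if host not in self._cycles:
            raise LookupError(f"no known endpoints for {host!r}")
        return next(self._cycles[host])


# Hypothetical endpoint table; in a real mesh this is pushed by the control plane.
router = SidecarRouter({"backend": ["10.0.0.1:8080", "10.0.0.2:8080"]})
```

Because the choice is made inside the calling pod, the request goes straight to the selected pod instead of passing through a separate load-balancing hop.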
The main benefits of service mesh are:
- Security – Each service is assigned a name and a certificate by the service mesh CA (in Istio that CA is called Citadel), and now services talk with each other over mTLS (encrypted with a verified identity). We can also create L7 micro-segmentation policies based on the service identity (Service “frontend” can speak with service “backend”) which adds another layer of security to IP-based segmentation. All of this is done without the need to manage the certificates and key rotation; the CA component of the service mesh takes care of it.
- Observability – The sidecars can see the traffic that flows between them and can report the round-trip latency, error rates and other performance metrics between the microservices.
- Traffic Control – When we are doing lifecycle management of our microservices, we can control where traffic will go using routing rules. The main use cases are traffic shifting (canary testing), traffic splitting between versions based on application-level attributes, circuit breaking and more.
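The traffic shifting and attribute-based splitting mentioned under Traffic Control can be sketched as follows. This is an illustrative model, not how any particular mesh implements routing: the version names, the weights and the `x-canary` header are assumptions made up for the example.

```python
import random
from typing import Dict, Optional


def pick_version(weights: Dict[str, int], roll: Optional[float] = None) -> str:
    """Pick a version with probability proportional to its weight."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    if roll is None:
        roll = random.random() * total  # uniform in [0, total)
    if not 0 <= roll < total:
        raise ValueError("roll must be in [0, total)")
    cumulative = 0
    for version, weight in weights.items():
        cumulative += weight
        if roll < cumulative:
            return version
    raise AssertionError("unreachable: roll is always below the final sum")


def route(headers: Dict[str, str], weights: Dict[str, int],
          canary: str, roll: Optional[float] = None) -> str:
    """Apply an attribute-based rule first, then fall back to the weights."""
    # Hypothetical header: testers opt in to the canary version explicitly.
    if headers.get("x-canary") == "true":
        return canary
    return pick_version(weights, roll)
```

With weights of `{"v1": 90, "v2": 10}`, roughly one request in ten reaches `v2`; shifting traffic is then just a matter of changing the numbers.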
Today’s service meshes address the above aspects for service-to-service communication in a distributed architecture only. By abstracting the L7 communication to a “proxy” component, the above tasks can now be handled by an abstraction layer through declarative configuration that is agnostic of any vendor libraries or coding language, which reduces the developer overhead.
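To show what “declarative” means here, the sketch below expresses an identity-based segmentation policy as plain data and evaluates it with a default-deny check. The service names and the policy shape are hypothetical; in a real mesh the source identity would come from the peer's mTLS certificate rather than from a string passed in by the caller.

```python
from typing import Dict, Set

# Declarative intent: which source identities may call each destination.
# Service names are illustrative only.
POLICY: Dict[str, Set[str]] = {
    "backend": {"frontend"},
    "database": {"backend"},
}


def is_allowed(source: str, destination: str,
               policy: Dict[str, Set[str]] = POLICY) -> bool:
    """Default-deny check on (already verified) service identities."""
    return source in policy.get(destination, set())
```

The application code never sees this check; changing who may talk to whom means editing the policy data, not the services.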
So, have we found the be-all, end-all solution for all software development challenges?
Not quite. Any reader of this blog post would probably agree that while this sounds great, and it is, service mesh brings new challenges of its own, and the road to enjoying these benefits is long and tough. We see the following main challenges in the enterprise (the list is not exhaustive):
- Implementing service mesh is hard! The problems range from applications failing to take advantage of it because of misconfiguration or specific requirements imposed by Istio, to mTLS configuration, to difficulties with bootstrapping root certificates and managing certificates, and more. Yes, it removes overhead from the developers, but at scale, with potentially hundreds or even thousands of YAML files to manage, it’s still far from easy. Considering that many organizations are still struggling to implement even basic Kubernetes in the enterprise, the complexity of deployment and operations is a blocker for adopting service mesh.
- In addition to operations, you also have silos for security policies and enforcement: most of today’s service meshes are tied to a Kubernetes cluster or to a cloud vendor, so we are once again left with silos. An organization that has set out to use multiple clouds and deploys its applications in many Kubernetes environments needs to be able to manage and operate all of these clouds and the services that run on them. On the other hand, most customers we talk with say that while they do want better manageability, they would prefer that no dependencies exist between clouds/zones and Kubernetes clusters, and that each one continues to operate even if the others are unavailable.
- There’s more to cloud-native than just Kubernetes. Yet all service meshes today address service-to-service communication, and mostly in Kubernetes, as it is the most widely used runtime for distributed applications. But most application workloads still run on VMs, which will continue to be the case for the foreseeable future; there are other workload types like FaaS (serverless); and tomorrow there might be something new. We cannot assume that the whole world will switch to Kubernetes, and the service meshes of today do not address that.
VMware NSX Service Mesh — Go Long, Go Wide
The NSX Service Mesh solution from VMware, which will join the very successful NSX portfolio, is unique in the fast-evolving service mesh landscape. Here’s how:
Global Namespaces
This is the primary construct in NSX Service Mesh and one of its main points of differentiation. We discussed how Kubernetes provides service discovery and scheduling and how service mesh does that in a fully distributed way, but what happens when services run on more than one Kubernetes cluster? When it comes to running applications on Kubernetes, you will likely find yourself running an application in multiple Kubernetes clusters, whether for resiliency, for separation between production, test and dev, or even for disaster recovery. Istio and almost all service mesh offerings today are bound to one Kubernetes cluster, as organizations want to keep their Kubernetes clusters independent from one another. With federation we can now create mTLS relationships between clusters, as well as between service mesh providers, and apply observability and control across clusters. (Federation is explained in more detail in its own section below.)
NSX Service Mesh will automatically group users into user groups, data into data groups, and services into service groups. Arranging these objects in groups enables us to create a “virtual sandbox” for an application that includes all of its components. We call this a “Global Namespace”, or GNS for short. It is like a Kubernetes namespace (a grouping of objects and settings in Kubernetes used for a specific tenant or application), but instead of being tied to one Kubernetes cluster, NSX Service Mesh elevates it above the physical world and above any particular Kubernetes cluster.
There are obvious use cases for this. For example, we can now upgrade a Kubernetes cluster seamlessly by first placing its services in a different Kubernetes cluster; NSX Service Mesh will perform all the required wiring automatically so that the flow of our application is not broken. Complete freedom of movement.
Another glaring use case is a disaster recovery situation where you need to shift traffic at the external load balancer, which sits outside of Kubernetes, when a zone is down. With NSX Service Mesh, both the internal load balancer and the external one are handled within our abstraction, the Global Namespace, so everything can happen automatically.
Each GNS has its own service discovery system, observability, encryption, policies and SLAs; all of these are GNS features in NSX Service Mesh.
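To give a feel for the idea of a namespace that spans clusters, here is a purely conceptual sketch. It is not the NSX Service Mesh API or data model: the class, its fields, and the cluster and service names are all hypothetical, and it covers only service discovery and the skipping of an unavailable cluster.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class GlobalNamespace:
    """Hypothetical model of an application scope that spans clusters."""
    name: str
    # service name -> list of (cluster, address) pairs
    services: Dict[str, List[Tuple[str, str]]] = field(default_factory=dict)
    # clusters currently considered unreachable
    unavailable: Set[str] = field(default_factory=set)

    def register(self, service: str, cluster: str, address: str) -> None:
        self.services.setdefault(service, []).append((cluster, address))

    def discover(self, service: str) -> List[str]:
        """Return addresses for a service from every reachable cluster."""
        return [addr for cluster, addr in self.services.get(service, [])
                if cluster not in self.unavailable]


# Illustrative data only.
gns = GlobalNamespace("shop")
gns.register("cart", "cluster-east", "10.1.0.5")
gns.register("cart", "cluster-west", "10.2.0.5")
```

Marking `cluster-east` as unavailable makes `discover("cart")` return only the address in `cluster-west`, which is the kind of automatic rerouting the disaster recovery use case describes.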
VMware is THE abstraction company. It’s in our DNA, what we evolved from, where we started.
As such, we understand how to build solutions that are not tied to any specific cloud or platform; we have taken this to the point where we abstract even our own products. We also understand that, as it goes with abstraction layers, you start with one use case and end up with a hundred more. Service mesh is an abstraction layer, and no company is better situated to bring a service mesh solution to the enterprise market.
Federation
This is the basis of how we are able to create Global Namespaces across clouds. Today’s service meshes are tied to a specific Kubernetes cluster or cloud. As enterprises need to connect across disjoint environments, VMware understands that any solution must be inclusive and interoperable. This is our federation.
NSX Service Mesh federates in two ways:
- NSX Service Mesh will be able to federate between Kubernetes clusters, which will enable us to apply a unified set of features across clusters on Global Namespaces.
- We can also federate with external service meshes such as Pivotal Service Mesh, Red Hat, Amazon etc. At the latest Google Next conference, VMware and Google demonstrated how NSX Service Mesh federates with Google Anthos. You can see that session here. We are working with many companies, including Scytale, HashiCorp, Google and Microsoft, on an initiative to create an open source definition of how to federate between different service mesh implementations.
More than just Kubernetes or Istio
VMware’s NSX Service Mesh will extend service mesh to include a lot more than just Kubernetes, out to any platform, cloud, or workload type. VMware is not only a VM company or a Kubernetes company; we are also a networking and hybrid cloud company. While we will start with support for Istio and different flavors of Kubernetes, we are uniquely positioned to also support VMs, gateways, multiple clouds and other types of service meshes in our data plane as we continue to develop the NSX Service Mesh solution.
Users and Data
Global Namespaces within NSX Service Mesh will give enterprises deep visibility into how users are accessing applications and data, along with the ability to apply rich access control policies for data protection and privacy. How is that done?
Whereas service meshes today only address service-to-service control, security, and observability, NSX Service Mesh also introduces users and data as primary objects, so that we can create policies for, secure, and monitor users accessing data through services across all the constructs specified above. VMware is upstreaming a lot of this work on extending service mesh to data access from the proxy, to allow other vendors to extend it as well. We will now be able to see and control not only which services are accessing other services, but also which users are accessing which datastores and services. As explained above with GNS, NSX Service Mesh will be able to add all the services, users and datastores that an application spans across clouds to a GNS and apply observability, security and policies to them.
Predictable Response Time
This is where we start creating new use cases beyond what we initially envisioned with service mesh. When compute virtualization started, the only use case was “consolidation”, and now virtualization is the basis of IaaS with all its use cases; the same goes for service mesh, where we will start to see new use cases we haven’t yet envisioned. One of the things we are working on is using service mesh to assign latency SLA policies to our services, and having our controllers automatically optimize and self-heal distributed microservices applications to achieve the SLA. A huge challenge with microservices and distributed applications is cascading failures and the ripple effect of any change you make; for example, scaling out one service affects all the other services in the mesh. At VMware we are drawing on our vast experience in optimizing applications to create an engine that ensures predictable response times and resiliency across microservices applications. You can read an abstract of this work and get a feel for what it is here.
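As a small illustration of what checking a latency SLA could look like, the sketch below computes a nearest-rank percentile over observed latencies and compares it with a target. This is not the VMware engine described above, which is not specified in this post; the sample values, the 99th percentile default and the function names are assumptions for the example.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def meets_sla(latencies_ms: Sequence[float], target_ms: float,
              pct: float = 99.0) -> bool:
    """True if the chosen latency percentile is within the SLA target."""
    return percentile(latencies_ms, pct) <= target_ms


# Made-up round-trip latencies in milliseconds, with one slow outlier.
observed = [10, 12, 11, 13, 250, 12, 11, 10, 12, 11]
```

A controller could run a check like `meets_sla(observed, target_ms=100)` on the metrics reported by the sidecars and trigger a corrective action, such as scaling or rerouting, when it returns `False`.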
Final Thoughts
Service mesh is an abstraction layer by itself that brings a lot of promise and, along with it, some new challenges. To really be able to enjoy its benefits, it needs to become simple and be elevated above any physical anchors. NSX Service Mesh provides a way to harness the power of service mesh across clouds and vendors, and extends it to provide visibility and control across the end-to-end transaction, from users to data.
To learn more, come to see us at VMworld in the following sessions:
Introduction to NSX Service Mesh – CNET1033BU
Why networking and Service Mesh matter for the future of apps – CNET2741BU
Niran (@niranec on Twitter) is a Principal SE in the Office of the CTO at VMware. Niran devotes his time to helping enterprise and global customers on their digital transformation journey, with a focus on network and security. Niran is a frequent speaker at industry events and conferences including VMworld, VMUGs, SQL Saturdays and more.