[This post was written by Bruce Davie, Martin Casado and Brad Hedlund.]
Some months ago, we wrote a post over on Network Heresy explaining the relationship between various tunneling protocols that are used in support of network virtualization. Because this issue of the encapsulation used for network virtualization seems to keep on causing confusion, we’re providing an updated version of the post here.
To boil this down to its essence, there are just two main issues:
- Network Virtualization is an architectural change in how networks are built and operated, and tunnel encapsulations are a small, modular part of a network virtualization solution.
- As encapsulations, VXLAN and STT each have their respective merits, and we see them co-existing for the next few years.
To talk about VXLAN and STT (or NVGRE) as competing architectures is to miss the point. The tunnel encapsulation is a piece of mechanism, and not even the most important one. To make this very concrete: the network virtualization platform (NVP) that we ship today supports both GRE and STT tunnel types, and will soon support VXLAN as well. We also support IPSEC as a tunneling option in some scenarios. The choice of any one tunnel type to interconnect a pair of endpoints in a virtual network has negligible impact on the overall solution. You can even mix tunnel types in a single virtual network.
In a nutshell, the choice of tunnel type to build a virtual network is analogous to making a choice of cable type to build a physical network. The cable you choose is a decision based on the interface type of the two devices requiring a connection. Independent of the choice, the features and capabilities of the network are delivered by the network devices and their control CPUs, not by the cables that connect them together. Similarly, network virtualization delivers a set of capabilities that is largely independent of the chosen tunneling protocol.
In the following paragraphs, then, let’s keep in mind the fact that a complete network virtualization solution is about much more than how the bits look on the wire. Network virtualization entails (at least) a control plane, a management plane, and a set of new abstractions for networking, all of which aim to change the operational model of networks from the traditional, physical model. We’ve written about these aspects of network virtualization before (e.g., here).
Now, however, we do want to talk about tunneling encapsulations, for reasons that will probably be readily apparent. There is more than one viable encapsulation in the marketplace now, and that will be the case for some time to come. Does it make any difference which one is used? In our opinion, it does, but it’s not a simple beauty contest in which one encapsulation will be declared the winner. We will explore some of the tradeoffs in this post.
There are three main encapsulation formats that have been proposed for network virtualization: VXLAN, NVGRE, and STT. We’ll focus on VXLAN and STT here: NVP already supports STT and will soon support VXLAN, and the two represent quite distinct points in the design space. Each encapsulation has its merits, as we’ll see below.
One of the salient advantages of VXLAN is that it’s gained traction with a solid number of vendors in a relatively short period. There were demonstrations of several vendors’ implementations at VMworld 2012. It fills an important market need, by providing a straightforward way to encapsulate Layer 2 payloads such that the logical semantics of a LAN can be provided among virtual machines without concern for the limitations of physical layer 2 networks. For example, a VXLAN can provide logical L2 semantics among machines spread across a large data center network, without requiring the physical network to provide arbitrarily large L2 segments.
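To make the encapsulation concrete, here is a minimal Python sketch of the VXLAN header itself. The layout (an 8-byte header carrying a flags byte and a 24-bit VXLAN Network Identifier, or VNI) comes from the VXLAN specification rather than from this post. The function names are ours and purely illustrative, and the outer UDP, IP and MAC headers that a real endpoint would add are omitted.

```python
import struct
from typing import Tuple

VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field carries a valid value


def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a 24-bit VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) followed by 24 reserved bits.
    # Word 2: VNI (24 bits) followed by 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def encapsulate(vni: int, inner_frame: bytes) -> bytes:
    """Prepend the VXLAN header to an inner Ethernet frame (outer headers omitted)."""
    return vxlan_header(vni) + inner_frame


def decapsulate(payload: bytes) -> Tuple[int, bytes]:
    """Split a VXLAN payload into (vni, inner_frame)."""
    word1, word2 = struct.unpack("!II", payload[:8])
    if not (word1 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag is not set")
    return word2 >> 8, payload[8:]
```

The inner frame is carried untouched, which is why the virtual machines see ordinary LAN semantics while the physical network sees only the outer headers.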
At the risk of stating the obvious, the fact that VXLAN has been implemented by multiple vendors makes it an ideal choice for multi-vendor deployments. But we should be clear what “multi-vendor” means in this case. Network virtualization entails tunneling packets through the data center routers and switches, and those devices only forward based on the outer header of the tunnel – a plain old IP (or MAC) header. So the entities that need to terminate tunnels for network virtualization are the ones that we are concerned about here.
In many virtualized data center deployments, most of the traffic flows from VM to VM (“east-west” traffic), in which case the tunnels are terminated in vswitches. It is very rare for those vswitches to be from different vendors, so in this case, one might not be so concerned about multi-vendor support for the tunnel encapsulation. Other issues, such as efficiency and the ability to evolve quickly, might be more important, as we’ll discuss below.
Of course, there are plenty of cases where traffic doesn’t just flow east-west. It might need to go out of the data center to the Internet (or some other WAN), i.e. “north-south”. It might also need to be sent to some sort of appliance such as a load balancer, firewall, intrusion detection system, etc. And there are also plenty of cases where a tunnel does need to be terminated on a switch or router, such as to connect non-virtualized workloads to the virtualized network. In all of these cases, we’re clearly likely to run into multi-vendor situations for tunnel termination. Hence the need for a common, stable, and straightforward approach to tunneling among all those devices.
Now, getting back to server-server traffic, why wouldn’t you just use VXLAN? One clear reason is efficiency, as we’ve discussed here. Since tunneling between hypervisors is required for network virtualization, it’s essential that tunneling not impose too high an overhead in terms of CPU load and network throughput. STT was designed with those goals in mind and performs very well on those dimensions using today’s commodity NICs. Given the general lack of multi-vendor issues when tunneling between hypervisors, STT’s significant performance advantage makes it a better fit in this scenario.
The performance advantage of STT may be viewed as somewhat temporary – it’s a result of STT’s ability to leverage TCP segmentation offload (TSO) in today’s NICs. Given the rise in importance of tunneling, and the momentum behind VXLAN, we expect future NICs to support other tunnel encapsulations without disabling TSO. When that happens, performance differences between STT and VXLAN should (mostly) disappear, given appropriate software to leverage the new NICs.
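A rough back-of-the-envelope model shows why segmentation offload matters so much. The send size and segment size below are illustrative assumptions, not measurements, and the function names are hypothetical. The model only counts how many packets host software has to build for one large send, ignoring every other cost.

```python
def wire_packets(send_bytes: int, segment_bytes: int) -> int:
    """Number of packets that end up on the wire for one large send."""
    return -(-send_bytes // segment_bytes)  # integer ceiling division


def packets_built_by_host(send_bytes: int, segment_bytes: int, tso: bool) -> int:
    """Packets the host CPU must construct itself.

    With TSO the stack hands one large segment to the NIC, which splits it
    in hardware. Without TSO the software has to build every packet.
    """
    return 1 if tso else wire_packets(send_bytes, segment_bytes)


# Assumed numbers: a 64,000-byte send split into 1,400-byte segments.
SEND_BYTES = 64_000
SEGMENT_BYTES = 1_400

with_offload = packets_built_by_host(SEND_BYTES, SEGMENT_BYTES, tso=True)
without_offload = packets_built_by_host(SEND_BYTES, SEGMENT_BYTES, tso=False)
print(with_offload, without_offload)  # prints: 1 46
```

Under these assumptions, an encapsulation that keeps TSO usable lets the host build one large segment where it would otherwise build dozens of packets in software. That gap is what closes once NICs can offload segmentation for other encapsulations too.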
Another factor that comes into play when tunneling traffic from server to server is that we may want to change the semantics of the encapsulation from time to time as new features and capabilities make their way into the network virtualization platform. Indeed, one of the overall advantages of network virtualization is the ease with which the capabilities of the network can be upgraded over time, since they are all implemented in software that is completely independent of the underlying hardware. To make the most of this potential for new feature deployment, it’s helpful to have a tunnel header with fields that can be modified as required by new capabilities. An encapsulation that typically operates between the vswitches of a single vendor (like STT) can meet this goal, while one designed to facilitate multi-vendor scenarios (like VXLAN) needs to have the meaning of every header field pretty well nailed down.
So, where does that leave us? In essence, with two good solutions for tunneling, each of which meets a subset of the total needs of the market, but which can be used side-by-side with no ill effect. Consequently, we believe that VXLAN will continue to be a good solution for the multi-vendor environments that often occur in data center deployments, while STT will, for at least a couple of years, be the best approach for hypervisor-to-hypervisor tunnels. A complete network virtualization solution will need to use both encapsulations. There’s nothing wrong with that – building tunnels of the correct encapsulation type can be handled by the controller, without the need for user involvement. And, of course, we need to remember that a complete solution is about much more than just the bits on the wire.
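As a closing illustration, here is one way a controller might pick the encapsulation for each tunnel without user involvement. This is a hypothetical sketch, not NVP’s actual logic: the `Endpoint` class, the preference order and the endpoint names are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Assumed preference order: STT where both ends support it, otherwise fall back.
PREFERENCE = ("stt", "vxlan", "gre")


@dataclass(frozen=True)
class Endpoint:
    """A tunnel endpoint and the encapsulations it can terminate (hypothetical)."""
    name: str
    supported: FrozenSet[str]


def choose_tunnel_type(a: Endpoint, b: Endpoint) -> str:
    """Return the most preferred encapsulation that both endpoints support."""
    common = a.supported & b.supported
    for tunnel_type in PREFERENCE:
        if tunnel_type in common:
            return tunnel_type
    raise ValueError(f"no common tunnel type between {a.name} and {b.name}")


hv1 = Endpoint("hypervisor-1", frozenset({"stt", "vxlan", "gre"}))
hv2 = Endpoint("hypervisor-2", frozenset({"stt", "vxlan", "gre"}))
tor = Endpoint("third-party-switch", frozenset({"vxlan"}))

print(choose_tunnel_type(hv1, hv2))  # prints: stt
print(choose_tunnel_type(hv1, tor))  # prints: vxlan
```

In this sketch the two hypervisors get an STT tunnel, while the same hypervisor talks VXLAN to the third-party switch, so both encapsulations coexist in one virtual network.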