Enterprise CI/CD: A Holistic View

Software companies have an ongoing need to accelerate development and integrate code to keep pace with market demands. Many techniques and tools aim to meet this need, but one approach has become imperative in cloud-native development: Continuous Integration/Continuous Delivery (CI/CD). Why? Because CI/CD is one of the most effective ways to deliver continuous improvements to customers.

Companies that have adopted the CI/CD methodology report significant improvements to the operational efficiency of their teams, processes, technology, and tools. In the VMware Office of the CTO (OCTO) Cloud Architecture group, we have wholeheartedly embraced CI/CD. In this post, we explore CI/CD from the viewpoints of various infrastructure types and application segments, as well as by the personas involved in the full lifecycle of application development.

CI/CD process infrastructure and tools

There are many moving parts and personas involved in the full lifecycle of application development. We have watched CI/CD become the new normal in cloud-native application development and give rise to a plethora of tools with robust feature sets, including automated builds, deployments, rollbacks, and support for multiple infrastructures as deployment targets. Some of these tools also provide advanced progressive-delivery capabilities, such as canary and blue-green deployments. Figure 1 highlights the tools available on the infrastructure and platform side versus the application side.

Figure 1: Enterprise CI/CD Tools

In our experience across multiple programs and projects, we have used several of these tools: SaltStack, Puppet, vRealize Automation/Cloud Assembly, VMware Code Stream, and Concourse in the infrastructure/platform segment, and Bamboo, Jenkins, and GitLab CI/CD in the application segment. It is also common practice to build a CI/CD platform on top of the APIs and extensions these tools provide, giving users a more integrated experience.

In Figure 2, we depict enterprise CI/CD maturity levels in various areas. We have seen companies attain high levels of maturity in application rollouts and infrastructure provisioning (indicated by the darker green boxes on the right side of the diagram), thanks to the modern tools and solutions that have evolved in these areas. The lighter green boxes indicate opportunity areas for CI/CD products and solutions. Addressing these areas can help reduce automation silos and provide an integrated application CI/CD experience.

Figure 2: Enterprise CI/CD Segments

Who’s involved with CI/CD?

While it is important to address and evolve CI/CD across these segments, it is also vital that the various personas involved in application development and delivery play their parts. These roles include infrastructure operators, application-platform operators, application developers, application architects, quality engineers, release engineers, and site-reliability engineers. These personas have one great thing in common: they all write code, which is key to rapid development and higher feature velocity.

People in certain functional roles, such as business-systems analysts, often think that their work cannot be codified. However, there are methodologies they can use to enable rapid iteration and full automation. It is a matter of breaking the problem down into smaller, manageable functions that can be automated.

Functional roles play a key part in CI/CD. If they can codify their work, it will boost project efficiency. Using the example above, a business-systems analyst can complement functional user stories with scenarios written in the well-defined “given-when-then” structure. A program or tool can then read these scenarios and assist in autogenerating test cases, although the details of such tooling are beyond the scope of this post.

Infrastructure/application platform operators: Modern applications run on modern infrastructure and platforms. All the infrastructure operations, including Day 1 provisioning and Day 2 activities — such as scaling, OS patches, application/database software updates, log aggregation, and backups — are typically codified to enable faster and more reliable large-scale operations. Like any other code, this code must follow CI/CD in order to guarantee continuous reliability.

In Figure 3, we show how a code operator can execute an example CI/CD workflow, preferably one implemented with pipeline automation. Following that, we will review the role each persona plays in this workflow.

Figure 3: CI/CD Application Code Promotion

Application architects and developers: Architects and developers need to ensure that every change in code and configuration related to applications is committed into the version-control system, including database scripts. The changes must go through essential validations to confirm that they meet requirements for coding standards, performance, security, data validation, health checks, and so on. Running these validations early lets the pipeline fail fast and avoids more expensive fixes later in delivery.
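The fail-fast idea can be sketched as an ordered list of checks that stops at the first failure, so later and typically more expensive stages never run on a change that is already known to be bad. This is a simplified illustration under stated assumptions: the check names, the thresholds, and the shape of the change dictionary are hypothetical, and a real pipeline would call linters, scanners, and test runners rather than inspect a dictionary.

```python
from typing import Callable, NamedTuple


class Check(NamedTuple):
    """A named validation that returns True when the change passes."""
    name: str
    run: Callable[[dict], bool]


def run_fail_fast(checks: list[Check], change: dict) -> tuple[bool, list[str]]:
    """Run checks in order and stop at the first failure.

    Returns (passed, names of the checks that were executed).
    """
    executed = []
    for check in checks:
        executed.append(check.name)
        if not check.run(change):
            return False, executed
    return True, executed


# Hypothetical checks, ordered from cheapest to most expensive.
CHECKS = [
    Check("coding-standards", lambda c: c["lint_errors"] == 0),
    Check("security-scan", lambda c: c["critical_vulnerabilities"] == 0),
    Check("health-check", lambda c: c["has_health_endpoint"]),
    Check("performance", lambda c: c["p95_latency_ms"] <= 250),
]

# Hypothetical summary of a commit that introduces a vulnerable dependency.
change = {
    "lint_errors": 0,
    "critical_vulnerabilities": 2,
    "has_health_endpoint": True,
    "p95_latency_ms": 900,
}

passed, executed = run_fail_fast(CHECKS, change)
print(passed, executed)
```

In this example the run stops at the security scan, so the slower health and performance stages are skipped for that commit.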

The CI/CD workflow for application delivery should incorporate all the required safety nets, preferably as automated tasks, or as manual tasks where automation is not possible. (Note: every attempt should be made to adhere to the first principle of CI/CD: automate everything, and if something cannot be automated, break it down into smaller automatable pieces.) Developer tests must be extensive and should include unit tests, integration tests, and consumer-driven contract tests. These tests must run on each commit as part of the CI/CD pipeline, as well as in local CI on developer machines.
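Of the test types listed above, consumer-driven contract tests are probably the least familiar, so a small sketch may help. The consumer states which response fields it relies on, and the provider's pipeline checks every build against that statement. The verify_contract helper, the field names, and the sample response below are hypothetical; dedicated tools such as Pact cover far more (request matching, contract sharing between teams, versioning) than this illustration does.

```python
def verify_contract(contract: dict[str, type], response: dict) -> list[str]:
    """Return a list of contract violations; an empty list means compatible.

    Extra fields in the response are allowed, because a consumer should
    only depend on the fields it declared.
    """
    problems = []
    for name, expected in contract.items():
        if name not in response:
            problems.append(f"missing field: {name}")
        elif not isinstance(response[name], expected):
            actual = type(response[name]).__name__
            problems.append(f"{name}: expected {expected.__name__}, got {actual}")
    return problems


# Hypothetical contract published by a consumer of an order service.
ORDER_CONTRACT = {"order_id": str, "total_cents": int}

# Hypothetical provider response after a refactoring changed a field type.
provider_response = {"order_id": "A-1", "total_cents": "1999", "currency": "USD"}

violations = verify_contract(ORDER_CONTRACT, provider_response)
print(violations)
```

Running a check like this on every provider commit surfaces a breaking change before any consumer is deployed against it.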

Solutions like Tanzu Service Mesh Autoscaling help empower developers even more, because they can configure application autoscaling rules and policies in Tanzu Service Mesh as Kubernetes CRDs that move along with their application code as part of CI/CD, without breaking their existing CI/CD workflows. Check out this Tanzu Service Mesh Autoscaling use case and example. This allows for a great developer experience, while ensuring powerful integrated service monitoring and control for operators.

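In addition, the application API and data architecture should support backward compatibility when progressive-delivery techniques (blue-green deployments, canary deployments, and so on) are applied, because the old and new versions of the application serve traffic at the same time.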
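One common way to keep an API backward compatible during a rollout is for the reader to accept both the old and the new payload shapes and to ignore fields it does not know. The sketch below assumes a hypothetical customer payload in which version 1 sends first_name and last_name, while version 2 sends full_name and an optional tier; none of these names come from a real service.

```python
def read_customer(payload: dict) -> dict:
    """Normalize a customer payload from either version of the API."""
    if "full_name" in payload:
        # Newer shape (hypothetical v2).
        full_name = payload["full_name"]
    else:
        # Older shape (hypothetical v1), still produced by instances
        # that have not been upgraded yet.
        full_name = f'{payload["first_name"]} {payload["last_name"]}'
    return {
        "id": payload["id"],
        "full_name": full_name,
        # New optional field: default it so older payloads stay valid.
        "tier": payload.get("tier", "standard"),
    }


old_payload = {"id": 7, "first_name": "Ada", "last_name": "Lovelace"}
new_payload = {"id": 7, "full_name": "Ada Lovelace", "tier": "gold", "locale": "en-GB"}

print(read_customer(old_payload))
print(read_customer(new_payload))
```

Because both shapes normalize to the same structure, traffic can shift gradually between versions, and a rollback does not strand requests that were produced by the newer version.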

Quality engineers: Test automation is key to CI/CD. Test engineers keep test suites efficient by adding behavior tests to existing automation suites as part of new feature rollouts. At the same time, they should retire tests once the features they cover become dormant. Furthermore, large test suites must be broken into modular suites. This allows large suites to run in parallel and test tasks in CI/CD pipelines to complete faster. Another key to successful testing in CI/CD is manual exploratory testing by test engineers. Beyond the regular functional automation and manual tests, quality functions must also periodically design and execute scale/performance tests, failover tests, chaos experiments, and so on. This contributes to application stability and availability, helping meet production SLOs.
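As an illustration of breaking a large suite into modular suites for parallel execution, the sketch below distributes suites across a fixed number of parallel pipeline jobs using a greedy longest-first heuristic, so that each job finishes at roughly the same time. The suite names and durations are hypothetical, and the heuristic is one reasonable choice rather than a prescribed method; many CI systems offer their own test-splitting features.

```python
import heapq


def shard_tests(durations: dict[str, float], shards: int) -> list[list[str]]:
    """Assign suites to shards, longest first, always to the lightest shard."""
    if shards < 1:
        raise ValueError("shards must be at least 1")
    # Min-heap of (total seconds assigned so far, shard index).
    heap = [(0.0, index) for index in range(shards)]
    heapq.heapify(heap)
    buckets = [[] for _ in range(shards)]
    # Sort by descending duration; break ties by name for stable output.
    ordered = sorted(durations.items(), key=lambda item: (-item[1], item[0]))
    for name, seconds in ordered:
        total, index = heapq.heappop(heap)
        buckets[index].append(name)
        heapq.heappush(heap, (total + seconds, index))
    return buckets


# Hypothetical suite durations in seconds, e.g. taken from earlier runs.
SUITE_SECONDS = {
    "checkout": 300,
    "search": 200,
    "login": 120,
    "profile": 100,
    "cart": 80,
}

plan = shard_tests(SUITE_SECONDS, shards=2)
for number, suites in enumerate(plan, start=1):
    print(number, suites, sum(SUITE_SECONDS[s] for s in suites))
```

With these numbers, run one after another the suites would take 800 seconds; split across two jobs, each job carries 400 seconds of tests.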

Release engineers: Release engineers play a pivotal role in CI/CD. They handle a long list of tasks: providing curated application images and pipeline templates so that developers can easily onboard new applications, defining application scaling and versioning policies, automating application updates and rollbacks, managing automated progressive-delivery controls, and so on.

The bottom line

Reliability and repeatability are key aspects of CI/CD that require automation. We recommend questioning every manual task in a greenfield project. Implementing automation from the start is easier than performing major revamps later to clear technical debt across multiple components of a distributed system. In a brownfield scenario, we recommend formulating an automation strategy that best suits the teams and the technology stack. The best strategy is to automate low-effort, high-value tasks first.
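One simple way to apply the "low effort, high value first" rule in a brownfield project is to rank manual tasks by the time they would save per unit of automation effort. The sketch below is only an illustration: the ManualTask fields, the task names, and the estimates are hypothetical, and a team may well weigh other factors such as risk or error rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ManualTask:
    name: str
    effort_days: float            # estimated effort to automate the task
    hours_saved_per_month: float  # manual time the automation would save


def automation_backlog(tasks: list[ManualTask]) -> list[ManualTask]:
    """Order tasks by hours saved per day of automation effort, best first."""
    for task in tasks:
        if task.effort_days <= 0:
            raise ValueError(f"{task.name}: effort_days must be positive")
    return sorted(
        tasks,
        key=lambda t: t.hours_saved_per_month / t.effort_days,
        reverse=True,
    )


# Hypothetical estimates gathered from the teams.
tasks = [
    ManualTask("rebuild staging environment", effort_days=10, hours_saved_per_month=20),
    ManualTask("rotate service credentials", effort_days=2, hours_saved_per_month=10),
    ManualTask("tag and publish release notes", effort_days=1, hours_saved_per_month=8),
]

backlog = automation_backlog(tasks)
for task in backlog:
    print(task.name, task.hours_saved_per_month / task.effort_days)
```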