A VMware Research team led by Lalith Suresh created a tool that enables programmers to specify cluster management logic in a high-level declarative language. The tool then synthesizes the code to compute policy-compliant configurations automatically and efficiently. Using this new tool, called Declarative Cluster Management (DCM), data center developers can use SQL to easily add, remove, and modify constraints and policies—the essential work of a cluster manager.
With DCM, programmers can write policies in a declarative style, using SQL. DCM then automates the details and algorithms that they would otherwise have to write by hand. Behind the scenes, a compiler takes care of all the heavy lifting. As a result, DCM significantly lowers the barrier to building schedulers declaratively, something no other tool has been able to do until now.
Why DCM?
Modern cluster management systems like Kubernetes, DRS, OpenStack, and OpenShift are responsible for configuring a complex, distributed system and allocating resources efficiently. Whether juggling containers, virtual machines, microservices, virtual network appliances, or serverless functions, these systems must enforce numerous cluster management policies.
Currently, developers implement such systems by designing custom, application-specific heuristics. This approach has proved unsustainable, since ad-hoc heuristics both perform poorly and introduce overwhelming complexity. They make it challenging to add important new features, and they must be adapted continuously to work for arbitrary combinations of policies, which makes such systems hard to evolve over time.
With DCM, the developer maintains the application state in a relational database and specifies constraints in the form of database queries in SQL. The compiler then generates code that can be used to efficiently find configurations that satisfy those constraints. As a result, DCM hides much of the complexity of building such systems, making it very easy for programmers to write their own cluster scheduler. Typically, schedulers take several years to stabilize, but the VMware Research team expects to cut that time significantly with DCM.
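To make the idea of "state in tables, policies as queries" concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. The table names, column names, and the capacity policy are all hypothetical and are not taken from DCM. The sketch also only checks a placement that already exists; it does not use DCM's actual constraint syntax or API, and it does not perform the code synthesis that DCM's compiler does.

```python
import sqlite3

# Hypothetical schema: cluster state kept in two relational tables.
SCHEMA = """
CREATE TABLE nodes (name TEXT PRIMARY KEY, cpu_capacity INTEGER NOT NULL);
CREATE TABLE pods  (name TEXT PRIMARY KEY, cpu_request INTEGER NOT NULL,
                    node_name TEXT REFERENCES nodes(name));
"""

# Hypothetical capacity policy, written as a query that lists every node
# whose assigned pods request more CPU than the node offers.
CAPACITY_VIOLATIONS = """
SELECT n.name, SUM(p.cpu_request) AS requested, n.cpu_capacity
FROM nodes n JOIN pods p ON p.node_name = n.name
GROUP BY n.name, n.cpu_capacity
HAVING SUM(p.cpu_request) > n.cpu_capacity
ORDER BY n.name
"""


def make_cluster():
    """Create an in-memory database holding a small example cluster."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO nodes VALUES (?, ?)",
                     [("n1", 4), ("n2", 8)])
    conn.executemany("INSERT INTO pods VALUES (?, ?, ?)",
                     [("web", 3, "n1"), ("db", 4, "n1"), ("cache", 2, "n2")])
    return conn


def capacity_violations(conn):
    """Return (node, requested_cpu, capacity) rows that break the policy."""
    return conn.execute(CAPACITY_VIOLATIONS).fetchall()
```

In this sketch, changing the policy means editing one query rather than reworking scheduling code, which is the property the declarative approach is after.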
DCM Compiler Architecture
The DCM compiler uses structural information extracted from the SQL specifications to generate code that efficiently translates the state in the database into an optimization model of the problem. At runtime, when a system configuration decision is to be made, the generated code extracts the current state of the system from the database, encodes it as an optimization model, solves that model using an off-the-shelf solver, and produces a new configuration that satisfies all the specified constraints.
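The runtime steps described above (extract state, build a model, solve, emit a configuration) can be illustrated with a small hand-written sketch in Python. Everything here is an assumption made for illustration: the schema, the function names, the single capacity constraint, and the load-balancing objective. An exhaustive search stands in for the off-the-shelf solver, and in DCM the equivalent code is generated by the compiler rather than written by hand.

```python
import itertools
import sqlite3

# Hypothetical schema; pods with node_name NULL are waiting to be placed.
SCHEMA = """
CREATE TABLE nodes (name TEXT PRIMARY KEY, cpu_capacity INTEGER NOT NULL);
CREATE TABLE pods  (name TEXT PRIMARY KEY, cpu_request INTEGER NOT NULL,
                    node_name TEXT);
"""


def load_state(conn):
    """Step 1: extract the current cluster state from the database."""
    free = dict(conn.execute("""
        SELECT n.name, n.cpu_capacity - COALESCE(SUM(p.cpu_request), 0)
        FROM nodes n LEFT JOIN pods p ON p.node_name = n.name
        GROUP BY n.name, n.cpu_capacity ORDER BY n.name"""))
    pending = dict(conn.execute(
        "SELECT name, cpu_request FROM pods "
        "WHERE node_name IS NULL ORDER BY name"))
    return free, pending


def solve(free, pending):
    """Step 2: find a placement; brute force stands in for a real solver."""
    pod_names = list(pending)
    best, best_peak = None, None
    for choice in itertools.product(free, repeat=len(pod_names)):
        load = dict.fromkeys(free, 0)
        for pod, node in zip(pod_names, choice):
            load[node] += pending[pod]
        if any(load[n] > free[n] for n in free):
            continue  # hard constraint: never exceed a node's free CPU
        peak = max(load.values(), default=0)  # assumed objective: balance
        if best is None or peak < best_peak:
            best, best_peak = dict(zip(pod_names, choice)), peak
    return best


def apply_placement(conn, placement):
    """Step 3: write the new configuration back to the database."""
    conn.executemany("UPDATE pods SET node_name = ? WHERE name = ?",
                     [(node, pod) for pod, node in placement.items()])


def schedule(conn):
    """Run one scheduling decision end to end."""
    free, pending = load_state(conn)
    placement = solve(free, pending)
    if placement is None:
        raise RuntimeError("no placement satisfies the capacity constraint")
    apply_placement(conn, placement)
    return placement
```

Exhaustive search grows exponentially with the number of pending pods, so it is only usable for toy inputs; handing the model to a dedicated solver, as the article describes, is what makes the approach practical.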
Use Case: Kubernetes Scheduler
In a paper published in 2019, the VMware Research team built a Kubernetes scheduler to show how DCM automates cluster management. The scheduler operates as a drop-in replacement for the default Kubernetes scheduler, supporting all its capabilities and adding new ones.
The VMware Research team found that building a scheduler with DCM was significantly easier than building a Kubernetes scheduler in the conventional way, which demands more than 10,000 lines of code. With DCM, a programmer can build the same scheduler with about a thousand lines of Java code and a couple of hundred lines of SQL.
Until now, there was no easy way to build a policy-based cluster manager without coding all of it from scratch. With DCM, programmers specify policies in SQL, and the tool automates the work of turning those policies into optimal decisions.
VMware Research released DCM as open-source code, and it is available for use on the VMware GitHub repository at: https://github.com/vmware/declarative-cluster-management/
For more details on DCM, download the paper Synthesizing Cluster Management Code for Distributed Systems at: https://dl.acm.org/doi/10.1145/3317550.3321444
The Team
VMware Researcher Lalith Suresh led the DCM project after many years of thinking about how to simplify developing cluster managers. The VMware Research team also included senior researchers Nina Narodytska and Leonid Ryzhyk, post-doctoral researcher Sangeetha Abdu Jyothi, and research interns João Loff and Faria Kalim.