Cloud Native Federated Machine Learning with KubeFATE

The success of Artificial Intelligence (AI) relies heavily on the quantity and quality of the data used to train prediction models. In practice, the data owned by a single organization is often too limited to build meaningful models. For example, a bank may want to create a model to detect transactions suspected of money laundering. However, the ratio of suspicious transactions is usually very low, which makes positive samples scarce. Hence, the bank’s own data is not enough to construct a good model.

One solution is to gather data from different financial institutions to create a data set large enough for training a model. However, organizations often cannot share data with each other, either for business reasons or because of privacy laws and regulations. As a result, organizations are usually left with limited or poor-quality data, which makes it hard to create precise models. One of the most promising machine learning technologies for solving the problems of data silos, data privacy and security is privacy-preserving Federated Learning (FL). It allows collaborating organizations to create models jointly without exposing their private data. In the example above, the bank can partner with other banks in federated training, which allows them to obtain a prediction model based on the data each bank has without exposing that data to the others.
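As a rough illustration of the idea, the sketch below simulates federated averaging, one common way to train a shared model, in plain Python. The toy data, the function names and the one-parameter linear model are assumptions made for illustration. This is not FATE's API, and it omits the encryption and secure aggregation that a privacy-preserving framework would add. Each bank trains on its own records, and only the resulting model weight is sent for aggregation.

```python
from typing import List, Tuple

# (feature, label) pairs held privately by one party
Dataset = List[Tuple[float, float]]


def local_update(weight: float, data: Dataset, lr: float = 0.01, epochs: int = 5) -> float:
    """Run a few gradient-descent steps for y ~ weight * x on one party's own data."""
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to the weight.
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
    return weight


def federated_average(updates: List[Tuple[float, int]]) -> float:
    """Combine local weights, weighting each party by its number of samples."""
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total


def train(parties: List[Dataset], rounds: int = 50) -> float:
    """Simulate federated training rounds over all parties."""
    weight = 0.0
    for _ in range(rounds):
        # Each party trains locally; only the updated weight leaves the party.
        updates = [(local_update(weight, data), len(data)) for data in parties]
        weight = federated_average(updates)
    return weight


# Hypothetical toy data: three banks whose records all follow y = 3x.
bank_a = [(1.0, 3.0), (2.0, 6.0)]
bank_b = [(3.0, 9.0)]
bank_c = [(0.5, 1.5), (1.5, 4.5), (2.5, 7.5)]

global_weight = train([bank_a, bank_b, bank_c])
```

After training, global_weight ends up close to 3, the slope shared by all three toy data sets, even though no bank's records were ever pooled in one place.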

The distributed nature of a federated learning system brings new operational challenges in managing and coordinating the multiple parties involved in federated training. An efficient operational tool is needed to address these challenges.

An Enterprise Managed Solution

The Cloud Native Lab (CNL) within VMware’s Office of the CTO China developed Project KubeFATE to solve these challenges by providing an enterprise managed solution that builds federated learning on Kubernetes in datacenters. KubeFATE is part of the open source project Federated AI Technology Enabler (FATE), which is hosted by the Linux Foundation, and it orchestrates the infrastructure and services for federated learning across organizations.

FATE is the first open source framework to support both horizontal and vertical federated machine learning, as well as online and offline federated inference. KubeFATE supports deploying FATE with Docker Compose for dev/test environments and on Kubernetes for production use.
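To make the two terms concrete, the following sketch shows how the same data would be partitioned in each setting. In horizontal federated learning the parties hold different samples described by the same features; in vertical federated learning they hold different features for the same samples. The table contents, party names and helper functions are hypothetical and are not part of FATE.

```python
from typing import Dict, List

Record = Dict[str, float]
Table = Dict[str, Record]

# Hypothetical full table: rows are customers, columns are features.
full_table: Table = {
    "cust1": {"income": 50.0, "balance": 10.0, "claims": 0.0},
    "cust2": {"income": 60.0, "balance": 20.0, "claims": 1.0},
    "cust3": {"income": 70.0, "balance": 30.0, "claims": 2.0},
    "cust4": {"income": 80.0, "balance": 40.0, "claims": 3.0},
}


def horizontal_split(table: Table, ids_by_party: List[List[str]]) -> List[Table]:
    """Each party holds different samples (rows) with the same features."""
    return [{cid: dict(table[cid]) for cid in ids} for ids in ids_by_party]


def vertical_split(table: Table, features_by_party: List[List[str]]) -> List[Table]:
    """Each party holds different features (columns) for the same samples."""
    return [
        {cid: {f: row[f] for f in feats} for cid, row in table.items()}
        for feats in features_by_party
    ]


# Horizontal: two banks with the same schema but different customers.
bank_a, bank_b = horizontal_split(full_table, [["cust1", "cust2"], ["cust3", "cust4"]])

# Vertical: a bank and an insurer that know different things about the same customers.
bank, insurer = vertical_split(full_table, [["income", "balance"], ["claims"]])
```

In the horizontal case the parties can train the same model structure on their own rows, while in the vertical case they first need to agree on which customers they have in common before combining feature columns.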

The latest release, KubeFATE v1.4.0, brings a new user experience for deploying and managing the full lifecycle of a federated learning platform on Kubernetes. The release includes a command line tool that offers common management operations for FATE clusters, including deploying, cleaning and updating clusters. Modular deployment has also been enhanced to fit users’ complex IT environments. Furthermore, data scientists can use the Jupyter Notebook integrated with KubeFATE to develop federated learning models.

KubeFATE includes two parts:

  1. The KubeFATE command line tool (CLI), which offers the most common management operations for FATE clusters.
  2. The KubeFATE service, which is deployed as an application in Kubernetes. It exposes REST APIs that are designed to be easily extended and integrated into existing cloud management systems.

Above is a high-level architecture diagram of KubeFATE.

Many users in the open source FATE community use KubeFATE to operate their federated learning platforms. They come from industries including banking, insurance, education and cloud services.

What’s Next?

KubeFATE follows FATE’s monthly release cadence. It is a core component for managing and operating a federated learning platform based on FATE. Future development of KubeFATE will focus on monitoring and logging features to provide full capabilities for operating federated machine learning clusters across clouds.

Go here to download KubeFATE. If you have any feedback, please submit it through GitHub.


Henry, Director of Cloud Native Lab, leads the development and incubation of solutions based on emerging technologies, including AI/ML, cloud native applications and blockchain. Henry is the creator of Harbor, an open source cloud native registry. He has spoken at KubeCon EU/NA/China multiple times. Henry is the coauthor of the book “Blockchain Technical Guide” (in Chinese).


Layne Peng, Staff Engineer, is an architect at Cloud Native Lab, VMware OCTO China, responsible for the Cloud Native ML/FML and Cloud Native Infrastructure initiatives. Layne is an active open source contributor and technical speaker. He holds more than 10 granted patents in data center management, cloud computing and big data. Layne is the coauthor of the book “Big Data: Strategy, Technology, Application” (in Chinese).