
Three Approaches to Machine Learning

At VMware, we view machine learning (ML) as a transformative technology — both for our customers and within VMware itself. Because of its import and its enormous potential, we created a Machine Learning Program Office within the Office of the CTO to coordinate and drive ML-related activities internally and externally.

This article is the first in a series dedicated to ML techniques, written by members of our ML community. We start with this introductory piece and will continue with future articles on a wide variety of ML and related data-science topics. We hope you will enjoy them and find them useful.

Today’s enterprise captures a massive amount of data from its customers and transactions, including web traffic, purchase histories, processing time (of many varieties), accounts payable, human resources, and more. The ability to harness, analyze, and use this data is key to winning in a competitive market.

Data science has emerged to fill this demand gap. Tailor-made for big and unstructured data, this discipline combines and adds a layer of intelligence on top of more traditional data-analysis techniques, such as statistics and operations research. That layer of intelligence comes from machine learning (ML).

This blog post introduces machine learning’s three most common uses: prediction, pattern recognition, and optimization. We’ll explore the rationale, key assumptions, and final goals of each. Since each of these is an active research area encompassing a wide range of solution approaches, the treatment in this article will be at a relatively high level.

Prediction Modeling

Also known as “supervised learning,” prediction modeling is the most common approach in machine learning. It has a wide range of applications, for example:

  • Distinguishing legitimate emails from spam
  • Dispatching customer-reported problems to appropriate support teams, based on symptom description
  • Determining whether a person is allowed entry to a building, based on real-time facial recognition

Technically speaking, our goal is to discover the hidden relationships between a set of known input data attributes and an output attribute. This is done through a training process that creates a prediction model.

The supervised learning process consists of the following steps:

  1. We first collect many examples of real-life data with corresponding input attributes as well as the labeled output attribute. These examples define the ground truth relationship between inputs and outputs – the relationship that we wish to model. The set of examples with known input and output attributes is called “training data.”
  2. The training data is fed into a training process, which uses one of many available parameterized models (e.g., a linear regression model with coefficients as parameters).
  3. As part of the training process, the model’s parameters will be iteratively adjusted so that it can learn to produce an output close to the labeled output for each training example. When the model cannot further improve its accuracy, the training process is complete, and we save the trained model.
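The three steps above can be sketched with a tiny linear-regression example. The synthetic data, the ground-truth relationship, and the gradient-descent details below are all assumptions made purely for illustration:

```python
import numpy as np

# Step 1: training data -- input attribute x and labeled output y
# (the hidden ground-truth relationship here is y = 3x + 1, plus noise)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 3.0 * x + 1.0 + rng.normal(0, 0.5, size=100)

# Step 2: a parameterized model -- linear regression with slope w and intercept b
w, b = 0.0, 0.0

# Step 3: iteratively adjust the parameters so the model's output moves
# closer to the labeled output (here via simple gradient descent)
learning_rate = 0.01
for _ in range(2000):
    error = (w * x + b) - y
    w -= learning_rate * np.mean(error * x)
    b -= learning_rate * np.mean(error)

# Inference: predict the output for an input never seen during training
prediction = w * 4.0 + b  # should land near the ground-truth 3*4 + 1 = 13
```

The trained parameters `w` and `b` together are the "saved model"; inference is just evaluating them on new inputs.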

The trained model is then deployed into a production environment (usually SaaS), where the model is served in a highly scalable service-oriented architecture. Once deployed, the model can then be used to predict outputs for new input data that have not been seen during the training process. This prediction process is called inference. The serving framework extracts input attributes from the incoming request, executes the model to estimate the output attributes, and returns its predicted output to the requester.

Pattern Recognition

Based on unsupervised learning (which doesn’t require data to contain an explicit output label), pattern recognition is about extracting higher-level insights from raw data. Extracted insights are then combined with human knowledge to make strategic decisions. Pattern recognition has a wide range of applications and can be further grouped into the following categories:

Discovering hidden relationships. The unsupervised learning mechanism examines different combinations of attribute variables to see how one variable depends on another. It reports pairs of variables with a strong interrelationship, for example: “teenagers spend more time on Instagram, while older generations spend more time on Facebook,” or “customers who purchase product x and product y tend to upgrade product x within a year.” These findings can be used to confirm or correct current beliefs. When combined with existing human knowledge, a finding may generate new hypotheses and ideas, and we can conduct experiments to explore emerging opportunities.
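As a toy illustration of discovering such relationships, one could compute pairwise correlations over usage data. The variables and numbers below are invented for illustration only:

```python
import numpy as np

# Invented usage data: minutes per week on two platforms versus customer age
rng = np.random.default_rng(1)
age = rng.uniform(15, 70, size=500)
instagram_minutes = 300 - 3.0 * age + rng.normal(0, 20, size=500)
facebook_minutes = 50 + 2.5 * age + rng.normal(0, 20, size=500)

# Pairwise correlations surface which variables move together
corr = np.corrcoef(np.vstack([age, instagram_minutes, facebook_minutes]))

# Strong negative correlation between age and Instagram time;
# strong positive correlation between age and Facebook time
age_vs_instagram = corr[0, 1]
age_vs_facebook = corr[0, 2]
```

Real association mining (e.g., market-basket analysis) uses richer techniques, but the idea is the same: let the data report which variables move together.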

Grouping similar items. Based on a user-defined “distance” function, we can measure how similar two items are (shorter distance means more similar). We can then partition items such that similar items will fall into the same group. For example, we could use grouping to design different marketing campaigns to target different groups of customers or to identify trending topics in online forums. Once the items are grouped, we can create more effective strategies at the group level.

K-Means is a simple algorithm for creating such a grouping. It iteratively alternates between assigning each data point to its nearest group center and recalculating each center from the adjusted membership. After many rounds of iteration, the data is partitioned into multiple groups.
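A minimal NumPy sketch of that K-Means loop follows; the two well-separated synthetic “blobs” of points are an assumption for illustration:

```python
import numpy as np

def kmeans(points, k, iterations=20, seed=0):
    """Minimal K-Means: alternate between assignment and center update."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iterations):
        # Assignment step: each point joins the group of its nearest center
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: recalculate each center from its adjusted membership
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return centers, labels

# Two well-separated blobs of 2-D points
rng = np.random.default_rng(0)
blob_a = rng.normal([0, 0], 0.5, size=(50, 2))
blob_b = rng.normal([5, 5], 0.5, size=(50, 2))
points = np.vstack([blob_a, blob_b])

centers, labels = kmeans(points, k=2)
```

Here the Euclidean norm serves as the user-defined “distance” function; swapping in a different distance changes what “similar” means.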


Detecting anomalies. After we identify frequent patterns in the data, we may be interested in detecting deviation from the pattern which we now consider “normal.” Such deviation may indicate the occurrence of rare but suspicious activities that we would like to identify or predict, such as fraudulent transactions, customer churn, or even a planned terrorist attack. Anomaly detection follows a common pattern of building a model to capture the characteristics of “normality” and then using the model to evaluate new data to see whether its distribution differs from the training data beyond a threshold.


Optimization

In business, our ultimate goal is to create value. Optimization is about determining the best decision, given all available information. Examples include:

  • How to allocate salespeople to potential customers with a high propensity to purchase
  • How to assign support engineers to customer problems while minimizing each problem’s impact on the customer’s IT operations
  • How to configure our datacenter infrastructure to maximize the workload throughput while minimizing energy consumption

One could, for example, use the output of a predictive model to drive an optimized decision-making process. One straightforward approach is to feed the output of a prediction model to the application logic, which codifies how the decision is made under different conditions.
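A hypothetical example of such hand-coded decision logic, driven by a churn-prediction model’s output; the thresholds and actions are invented for illustration:

```python
def churn_decision(churn_probability):
    """Hand-coded decision logic driven by a prediction model's output."""
    if churn_probability > 0.8:
        return "assign account manager"
    elif churn_probability > 0.5:
        return "send retention offer"
    return "no action"

# In production, churn_probability would come from the trained model's inference
high_risk = churn_decision(0.9)
medium_risk = churn_decision(0.6)
```

Each new condition or action multiplies the number of branches, which is exactly the manageability problem described next.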


Hard coding decision logic into applications can quickly become unmanageable when there are many possible combinations of actions and conditions. To address this issue, ML can also take over the decision-making steps via “reinforcement learning,” an emerging ML approach that is structured as an interaction between an agent (the ML portion) and the environment (the real world).

The agent mimics the way human beings continuously learn from past experiences.
Reinforcement learning is built on the following basic concepts:

  • “State” represents the conditions of the environment at the moment when a decision is made. It may be discrete or continuous.
  • “Action” represents the decision itself. It may be discrete (for example, “which car do I want to drive today?”) or continuous (for example, “how fast should I drive the car?”).
  • “Reward” represents the benefit the agent obtains from the environment after taking the action at that state.

“Environment” is the world in which the agent lives. It provides the agent with the current state, then accepts the action taken by the agent. It calculates a reward for the agent and then transitions to the next state. We assume the transition has the Markov property, meaning that the probability of the next state depends only on the current state and action and is independent of previous states (e.g., where your car is located after 5 seconds depends on its location now and your steering wheel’s angle). After this, the cycle of interaction repeats.
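These concepts can be sketched as a toy environment. The `LineWorld` class below is a made-up example, not a standard API:

```python
class LineWorld:
    """Toy environment: the agent moves left or right along a line of cells
    and earns a reward on reaching the rightmost cell (the goal).
    The next state depends only on the current state and the action taken,
    so the transition has the Markov property."""

    def __init__(self, size=5):
        self.size = size
        self.state = 0  # start at the left end

    def step(self, action):
        """Accept an action (-1 = left, +1 = right); return (next state, reward)."""
        self.state = min(max(self.state + action, 0), self.size - 1)
        reward = 1.0 if self.state == self.size - 1 else 0.0
        return self.state, reward

# The cycle of interaction: observe the state, act, receive a reward, repeat
env = LineWorld(size=3)
state, reward = env.step(+1)  # move right: now in state 1, no reward yet
state, reward = env.step(+1)  # reach the goal: state 2, reward 1.0
```

Here the state is discrete (a cell index) and the action is discrete (left or right); continuous variants simply replace these with real-valued vectors.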


The agent’s goal is to learn an optimal “policy.” A policy is how the agent decides which action to take in a particular state. An “optimal policy” is one that, when followed, produces the maximum total reward.

During the learning process, the agent must predict the future states that particular actions will produce, then take the actions that lead to the states generating the maximum reward. Reinforcement learning differs from other ML approaches in the following ways:

  • It contains both prediction and optimization.
  • It doesn’t have isolated learning (when the model is trained) and inference (when the model makes a decision) phases. Learning and inference are coupled together in an ongoing interaction.
  • The relationship between an action and its reward is often unclear. There is usually an unknown delay between the two.
  • Training data collected for reinforcement learning is inherently biased, because the agent’s experience depends heavily on the actions it takes. (For example, if a self-driving car drives well and never has an accident, the agent never learns how to avoid serious damage in one.)
  • The learning process itself may carry high risk when the agent explores unknown territory to gain more experience. The agent must balance risk against potential reward.

Reinforcement learning is an active research area, for which there are many proposed architectures. One popular agent architecture involves a policy module and a reward-estimator module. After observing the state from the environment, the policy module consults the reward estimator to evaluate the benefits of each possible action, so it can take the action with the best outlook. After taking the action and receiving the reward from the environment, the agent writes both action and reward to an experience log. The estimator learns from this log, using it to improve its future decisions.
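One way to make this architecture concrete is a tabular Q-learning sketch, in which a Q-table serves as a simple reward estimator and an epsilon-greedy policy consults it before acting. The toy chain environment and all hyperparameters below are assumptions for illustration:

```python
import random

# Toy chain of 5 states; the goal is the rightmost state.
n_states, actions = 5, [-1, +1]

# The Q-table plays the role of the reward estimator: it stores the
# estimated long-term benefit of each (state, action) pair.
q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.3
rng = random.Random(0)

for _ in range(500):                          # episodes of experience
    s = rng.randrange(n_states - 1)           # start from a random non-goal state
    for _ in range(20):
        if rng.random() < epsilon:            # explore unknown territory
            a = rng.choice(actions)
        else:                                 # exploit: action with best outlook
            a = max(actions, key=lambda b: q[(s, b)])
        s_next = min(max(s + a, 0), n_states - 1)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # Learn from the (state, action, reward, next state) experience
        best_next = max(q[(s_next, b)] for b in actions)
        q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
        s = s_next
        if s == n_states - 1:
            break                             # episode ends at the goal

# The learned greedy policy should move right (toward the goal) in every state
policy = {s: max(actions, key=lambda b: q[(s, b)]) for s in range(n_states - 1)}
```

The epsilon parameter embodies the risk-versus-reward balance discussed above: higher values explore more unknown territory, lower values exploit the current estimates.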

Next Steps

So far, we have described three groups of scenarios where machine learning is commonly used today. We hope to have given you a broader view of ML’s capabilities and an appreciation of how it can be applied across a wide spectrum of problem settings. In future blog posts, we will further expand on the different areas of ML.
