PolarSPARC
Machine Learning - Understanding Ensemble Learning
Bhaskar S | 07/03/2022
Overview
Ensemble means a group of entities viewed as a whole rather than individually.
So, what has this got to do with machine learning ???
We know that some machine learning algorithms perform better than others on a given data set. The natural question then is - why can't we combine these different machine learning algorithms to create a better performing meta model ??? This, in essence, is the core idea behind Ensemble Learning.
Ensemble Learning
An Ensemble Learning model combines multiple weak (poorly performing) machine learning models to create an enhanced machine learning model with better prediction accuracy.
In other words, an Ensemble Learning model aims to optimize the bias/variance trade-off by combining multiple base models.
Now, the next question to pop in one's mind is - how does one combine the various machine learning models ???
The following are the two most popular ways of combining the machine learning models:
Bagging - the models are trained and evaluated in parallel
Boosting - the models are trained and evaluated in sequence
The Ensemble Learning methods are typically used with Decision Trees, since trees can be applied to both classification and regression problems.
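To make the distinction concrete, below is a minimal sketch (not from this article) that wraps Decision Trees in the two types of ensembles using scikit-learn; the Iris data set and the hyper-parameter values are purely illustrative assumptions:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative data set (an assumption, not from the article)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Bagging - full-depth trees trained independently (in parallel) on random samples of the data
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=42)

# Boosting - shallow trees trained one after another, each focusing on the samples
# the previous trees got wrong
boosting = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=50, random_state=42)

for name, model in [('Bagging', bagging), ('Boosting', boosting)]:
    model.fit(X_train, y_train)
    print(f'{name} test accuracy: {model.score(X_test, y_test):.3f}')
```

Both ensembles use the same type of base model; the difference lies purely in how the individual trees are trained and how their predictions are combined.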
Bagging
Bagging is short for Bootstrap AGGregatING.
The following are the details behind the bagging technique:
Train each model in the bagging ensemble with random samples of data from the training data set. The method of taking random samples from a data set with replacement is referred to as Bootstrapping.
In addition, each model is trained with a random subset of the feature variables from the training data set. If $N$ is the total number of feature variables in the data set, then as a general rule of thumb, one would randomly select $\sqrt{N}$ features for each model in the ensemble for classification problems AND $\large{\frac{N}{3}}$ features for each model in the ensemble for regression problems.
The illustration below depicts this step:
Note that with bootstrapping, it is possible that some samples appear more than once and some samples are never chosen. The unselected (or unseen) samples are referred to as Out of Bag samples.
The illustration below depicts an example of Out of Bag samples:
The Bagging technique can internally use the out of bag samples to perform validation on the ensemble model.
To predict the outcome for a new data sample, evaluate the sample through each of the trained models in the ensemble in parallel.
The illustration below depicts this step:
Aggregate the individual predictions from each of the trained models in the ensemble. For classification problems, the aggregation method is mode (majority vote) and for regression problems, the aggregation method is mean (average).
The illustration below depicts this step:
One of the popular machine learning algorithms using the bagging method is Random Forest.
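Below is a minimal sketch (assuming scikit-learn and its built-in breast cancer data set, neither of which is part of this article) showing how the bagging steps above map onto the commonly used Random Forest hyper-parameters:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Illustrative data set (an assumption, not from the article)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

model = RandomForestClassifier(
    n_estimators=100,     # number of trees trained in parallel
    bootstrap=True,       # sample the training rows with replacement (Bootstrapping)
    max_features='sqrt',  # consider roughly sqrt(N) of the features (classification rule of thumb)
    oob_score=True,       # validate internally using the Out of Bag samples
    random_state=42)
model.fit(X_train, y_train)

print(f'Out of Bag score : {model.oob_score_:.3f}')             # validation on the unseen samples
print(f'Test accuracy    : {model.score(X_test, y_test):.3f}')  # majority vote across all the trees
```

Note that scikit-learn applies the max_features rule at every split of every tree rather than choosing one feature subset per tree, but the intent - decorrelating the individual trees - is the same.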
Boosting
The following are the details behind the boosting technique:
Add a feature to the training data set that represents the weight of each of the samples in the training data set. During the training of a model, these weights influence which samples the model should focus on - the greater the weight, the greater the focus. Initialize the weights to the same value for all the samples in the training data set.
Pass the training data set through the first Weak Model to fit it. Think of the Weak Model as a very simple Decision Tree of depth one, meaning the tree has one decision node and two leaves. This is often referred to as a Decision Stump.
The illustration below depicts this step:
The weak model may predict accurately for some samples (the blue square in the green region) and poorly for the remaining samples (the other shapes in the blue region).
Adjust the weights of the samples in the training data set based on the predictions from the previous weak model. The weights of the samples predicted correctly are DECREASED and the weights of the samples predicted with errors are INCREASED.
The illustration below depicts this step:
Notice the size of the blue square is reduced since it was predicted correctly.
Pass the weight adjusted training data set through the next Weak Model to fit it. Based on its prediction accuracy, once again adjust the weights of the samples in the training data set.
The illustration below depicts this step:
The general idea is that the next weak model will learn from the mistakes of the previous weak model.
Continue this process of adjusting the weights and training the next weak model until the errors are sufficiently reduced or a preset threshold (such as the maximum number of weak models) has been reached.
The illustration below depicts this step:
To predict the outcome for a new data sample, pass the sample through each of the trained weak models in the ensemble (in sequence) and combine their individual predictions to arrive at the target outcome.
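Below is a simplified sketch of the above loop (not the article's code), using depth-one decision stumps from scikit-learn and the standard AdaBoost style weight update; the synthetic data set and the number of boosting rounds are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Illustrative synthetic data set (an assumption, not from the article)
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
y = np.where(y == 0, -1, 1)                  # use -1/+1 labels for the weight update math

n_rounds = 25                                # number of weak models (an illustrative threshold)
weights = np.full(len(X), 1.0 / len(X))      # initialize all sample weights to the same value
stumps, alphas = [], []

for _ in range(n_rounds):
    stump = DecisionTreeClassifier(max_depth=1)     # a weak model: one decision node, two leaves
    stump.fit(X, y, sample_weight=weights)          # the weights steer the focus of this model
    pred = stump.predict(X)

    err = np.sum(weights[pred != y]) / np.sum(weights)   # weighted error of this weak model
    err = np.clip(err, 1e-10, 1.0 - 1e-10)
    alpha = 0.5 * np.log((1.0 - err) / err)              # the say (vote weight) of this weak model

    # INCREASE the weights of the misclassified samples, DECREASE those of the correct ones
    weights *= np.exp(-alpha * y * pred)
    weights /= weights.sum()

    stumps.append(stump)
    alphas.append(alpha)

# Prediction: pass the sample through every weak model in sequence and take the weighted vote
def predict(samples):
    scores = sum(a * s.predict(samples) for a, s in zip(alphas, stumps))
    return np.where(scores >= 0, 1, -1)

print(f'Training accuracy: {np.mean(predict(X) == y):.3f}')
```

The final prediction is a weighted vote, with the better performing weak models getting a larger say.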
Some of the popular machine learning algorithms using the boosting method are as follows:
AdaBoost
Gradient Boosting Machine
XGBoost
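Below is a minimal sketch of the three algorithms listed above (the data set and hyper-parameter values are illustrative assumptions); note that XGBoost comes from the separate xgboost package, while the other two ship with scikit-learn:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier   # assumes the separate 'xgboost' package is installed

# Illustrative data set (an assumption, not from the article)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

models = {
    'AdaBoost': AdaBoostClassifier(n_estimators=100, random_state=42),
    'Gradient Boosting Machine': GradientBoostingClassifier(n_estimators=100, random_state=42),
    'XGBoost': XGBClassifier(n_estimators=100, random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(f'{name:26s} test accuracy: {model.score(X_test, y_test):.3f}')
```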