PolarSPARC
Introduction to Deep Learning - Part 6
Bhaskar S | 07/22/2023
Introduction
In Introduction to Deep Learning - Part 5 of this series, we continued our journey with PyTorch, covering the following topics:
Tensor Shape Manipulation
Autograd Feature
Building Basic Linear Models (with and without GPU)
In this article, we will continue the journey further by building a non-linear model in PyTorch.
Hands-on PyTorch
We will now move on to the next section on building a non-linear Binary Classification model.
Binary Classification Loss Function
In Introduction to Deep Learning - Part 2 of this series, we described the Loss Function as a measure of the deviation of the predicted value from the actual target value.
For Regression problems, one would either use the Mean Absolute Error (also known as the L1 Loss) or the Mean Squared Error loss.
However, for Classification problems, one would typically use the Cross Entropy loss, in particular the Binary Cross Entropy loss for the Binary Classification problems.
The Binary Cross Entropy loss $L(x, W, b)$ is defined as follows:
$L(x, W, b) = -[y \cdot \log(\sigma(x, W, b)) + (1 - y) \cdot \log(1 - \sigma(x, W, b))]$
where $x$ is the input, $W$ is the weights, $b$ is the biases, $\sigma$ is the activation function whose output is the predicted value, and $y$ is the actual target class.
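To make the definition concrete, the following is a minimal sketch (with hypothetical values) that computes the Binary Cross Entropy loss directly from the formula above and compares it with the result from PyTorch's nn.BCELoss:

import torch
from torch import nn

# Hypothetical predicted probabilities (output of the Sigmoid activation)
y_hat = torch.tensor([0.9, 0.2, 0.7])

# Corresponding actual target classes
y = torch.tensor([1.0, 0.0, 1.0])

# Binary Cross Entropy computed directly from the formula
manual_bce = -(y * torch.log(y_hat) + (1 - y) * torch.log(1 - y_hat)).mean()

# Binary Cross Entropy computed using the PyTorch module
module_bce = nn.BCELoss()(y_hat, y)

print(manual_bce, module_bce)  # both should print approximately 0.2284

Note that PyTorch also provides nn.BCEWithLogitsLoss, which combines the Sigmoid activation and the Binary Cross Entropy loss in a single, numerically more stable operation.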
PyTorch Model Basics - Part 2
For the non-linear Binary Classification use-case, we will leverage the Scikit-Learn make_moons function to create the non-linear synthetic data.
To import the necessary Python module(s), execute the following code snippet:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_moons
from sklearn.metrics import accuracy_score
import torch
from torch import nn
To create the non-linear synthetic data for the Binary Classification with two features, execute the following code snippet:
num_samples = 500

np_Xcp, np_ycp = make_moons(num_samples, noise=0.15, random_state=101)
To create the tensor dataset, execute the following code snippet:
Xcp = torch.tensor(np_Xcp, dtype=torch.float)
ycp = torch.tensor(np_ycp, dtype=torch.float).unsqueeze(dim=1)
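Note that unsqueeze(dim=1) reshapes the target tensor from shape (500,) to a column of shape (500, 1), so that it matches the shape of the model output, as required by nn.BCELoss. A quick sanity check:

# Verify the tensor shapes match what nn.BCELoss expects
print(Xcp.shape, ycp.shape)  # torch.Size([500, 2]) torch.Size([500, 1])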
To create the training and testing samples, execute the following code snippet:
Xcp_train, Xcp_test, ycp_train, ycp_test = train_test_split(Xcp, ycp, test_size=0.2, random_state=101)
The following illustration (Figure.1) shows the scatter plot of the training set:
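The original illustration is an image; a similar scatter plot of the training set can be generated with a minimal sketch like the following (using the training tensors created above):

# Scatter plot of the two features, colored by the target class
plt.scatter(Xcp_train[:, 0], Xcp_train[:, 1], c=ycp_train.squeeze(), cmap=plt.cm.RdBu, alpha=0.5)
plt.show()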
To initialize variables for the number of features, the number of targets, and the number of epochs, execute the following code snippet:
num_features_3 = 2
num_target_3 = 1
num_epochs_3 = 2001
To create a simple non-linear Binary Classification model without any hidden layers for the above case, execute the following code snippet:
class BinaryClassNonLinearModel(nn.Module):
    def __init__(self):
        super(BinaryClassNonLinearModel, self).__init__()
        self.nn_layers = nn.Sequential(
            nn.Linear(num_features_3, num_target_3),
            nn.Sigmoid()
        )

    def forward(self, cx: torch.Tensor) -> torch.Tensor:
        return self.nn_layers(cx)
To create an instance of BinaryClassNonLinearModel, execute the following code snippet:
nl_model = BinaryClassNonLinearModel()
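To verify the layer structure of the model instance, one can simply print it, which should display output along the lines of:

print(nl_model)
# BinaryClassNonLinearModel(
#   (nn_layers): Sequential(
#     (0): Linear(in_features=2, out_features=1, bias=True)
#     (1): Sigmoid()
#   )
# )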
To create an instance of the Binary Cross Entropy loss function, execute the following code snippet:
nl_criterion = nn.BCELoss()
To create an instance of the gradient descent function, execute the following code snippet:
nl_optimizer = torch.optim.SGD(nl_model.parameters(), lr=0.05)
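As an aside, for plain SGD (no momentum), the optimizer step is equivalent to the following hedged sketch of a manual parameter update (once loss.backward() has populated the gradients):

# Sketch: what nl_optimizer.step() does for plain SGD (no momentum)
with torch.no_grad():
    for param in nl_model.parameters():
        param -= 0.05 * param.grad  # param = param - lr * gradient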
To implement the iterative training loop for the forward pass to predict, compute the loss, and execute the backward pass to adjust the parameters, execute the following code snippet:
for epoch in range(1, num_epochs_3):
    nl_model.train()
    nl_optimizer.zero_grad()
    ycp_predict = nl_model(Xcp_train)
    loss = nl_criterion(ycp_predict, ycp_train)
    if epoch % 100 == 0:
        print(f'Non-Linear Model [1] -> Epoch: {epoch}, Loss: {loss}')
    loss.backward()
    nl_optimizer.step()
The following would be a typical output:
Non-Linear Model [1] -> Epoch: 100, Loss: 0.4583107829093933
Non-Linear Model [1] -> Epoch: 200, Loss: 0.38705599308013916
Non-Linear Model [1] -> Epoch: 300, Loss: 0.35396769642829895
Non-Linear Model [1] -> Epoch: 400, Loss: 0.333740770816803
Non-Linear Model [1] -> Epoch: 500, Loss: 0.3197369873523712
Non-Linear Model [1] -> Epoch: 600, Loss: 0.3093145191669464
Non-Linear Model [1] -> Epoch: 700, Loss: 0.30119335651397705
Non-Linear Model [1] -> Epoch: 800, Loss: 0.29466748237609863
Non-Linear Model [1] -> Epoch: 900, Loss: 0.2893083989620209
Non-Linear Model [1] -> Epoch: 1000, Loss: 0.2848363220691681
Non-Linear Model [1] -> Epoch: 1100, Loss: 0.28105801343917847
Non-Linear Model [1] -> Epoch: 1200, Loss: 0.27783405780792236
Non-Linear Model [1] -> Epoch: 1300, Loss: 0.27506041526794434
Non-Linear Model [1] -> Epoch: 1400, Loss: 0.2726573050022125
Non-Linear Model [1] -> Epoch: 1500, Loss: 0.2705625891685486
Non-Linear Model [1] -> Epoch: 1600, Loss: 0.26872673630714417
Non-Linear Model [1] -> Epoch: 1700, Loss: 0.2671099007129669
Non-Linear Model [1] -> Epoch: 1800, Loss: 0.26567962765693665
Non-Linear Model [1] -> Epoch: 1900, Loss: 0.2644093632698059
Non-Linear Model [1] -> Epoch: 2000, Loss: 0.26327699422836304
To predict the target values using the trained model, execute the following code snippet:
nl_model.eval()
with torch.no_grad():
    y_predict_nl = nl_model(Xcp_test)
    y_predict_nl = torch.round(y_predict_nl)
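Since the Sigmoid output is a probability in the range (0, 1), torch.round effectively applies a 0.5 threshold to pick the predicted class; the following line would be equivalent:

# Equivalent thresholding at 0.5 to get the predicted class
y_predict_nl = (y_predict_nl > 0.5).float()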
To display the model prediction accuracy, execute the following code snippet:
print(f'Non-Linear Model [1] -> Accuracy: {accuracy_score(ycp_test, y_predict_nl)}')
The following would be a typical output:
Non-Linear Model [1] -> Accuracy: 0.86
A visual plot of the decision boundary that segregates the two classes would be very useful.
To define the method to display the decision boundary along with the scatter plot, execute the following code snippet:
def plot_with_decision_boundary(model, X, y):
    margin = 0.1
    # Set the grid bounds - identify min and max values with some margin
    x_min, x_max = X[:, 0].min() - margin, X[:, 0].max() + margin
    y_min, y_max = X[:, 1].min() - margin, X[:, 1].max() + margin
    # Create the x and y scale with spacing
    space = 0.1
    x_scale = np.arange(x_min, x_max, space)
    y_scale = np.arange(y_min, y_max, space)
    # Create the x and y grid
    x_grid, y_grid = np.meshgrid(x_scale, y_scale)
    # Flatten the x and y grid to vectors
    x_flat = x_grid.ravel()
    y_flat = y_grid.ravel()
    # Predict using the model for the combined x and y vectors
    y_p = model(torch.tensor(np.c_[x_flat, y_flat], dtype=torch.float)).numpy()
    y_p = y_p.reshape(x_grid.shape)
    # Plot the contour to display the boundary
    plt.contourf(x_grid, y_grid, y_p, cmap=plt.cm.RdBu, alpha=0.3)
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.RdBu, alpha=0.5)
    plt.show()
To plot the decision boundary along with the scatter plot on the training data using the model we just created above, execute the following code snippet:
nl_model.eval()
with torch.no_grad():
    plot_with_decision_boundary(nl_model, Xcp_train, ycp_train)
The following illustration (Figure.2) depicts the scatter plot with the decision boundary as predicted by the model:
Given that the model uses a single linear layer, it is not surprising to see a linear demarcation between the two classes in the plot in Figure.2 above.
Let us see if we can improve the model by adding a hidden layer.
To create a non-linear Binary Classification model with one hidden layer consisting of 8 neurons, execute the following code snippet:
num_hidden_3 = 8

class BinaryClassNonLinearModel_2(nn.Module):
    def __init__(self):
        super(BinaryClassNonLinearModel_2, self).__init__()
        self.nn_layers = nn.Sequential(
            nn.Linear(num_features_3, num_hidden_3),
            nn.ReLU(),
            nn.Linear(num_hidden_3, num_target_3),
            nn.Sigmoid()
        )

    def forward(self, cx: torch.Tensor) -> torch.Tensor:
        return self.nn_layers(cx)
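Note that the nn.ReLU() activation, defined as $ReLU(x) = max(0, x)$, is what introduces the non-linearity between the two linear layers, allowing the model to learn a curved decision boundary.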
To create an instance of BinaryClassNonLinearModel_2, execute the following code snippet:
nl_model_2 = BinaryClassNonLinearModel_2()
To create an instance of the Binary Cross Entropy loss function, execute the following code snippet:
nl_criterion_2 = nn.BCELoss()
To create an instance of the gradient descent function, execute the following code snippet:
nl_optimizer_2 = torch.optim.SGD(nl_model_2.parameters(), lr=0.05)
To implement the iterative training loop for the forward pass to predict, compute the loss, and execute the backward pass to adjust the parameters, execute the following code snippet:
for epoch in range(1, num_epochs_3):
    nl_model_2.train()
    nl_optimizer_2.zero_grad()
    ycp_predict = nl_model_2(Xcp_train)
    loss = nl_criterion_2(ycp_predict, ycp_train)
    if epoch % 100 == 0:
        print(f'Non-Linear Model [2] -> Epoch: {epoch}, Loss: {loss}')
    loss.backward()
    nl_optimizer_2.step()
The following would be a typical output:
Non-Linear Model [2] -> Epoch: 200, Loss: 0.3945632874965668
Non-Linear Model [2] -> Epoch: 300, Loss: 0.33575549721717834
Non-Linear Model [2] -> Epoch: 400, Loss: 0.3044913411140442
Non-Linear Model [2] -> Epoch: 500, Loss: 0.2857321798801422
Non-Linear Model [2] -> Epoch: 600, Loss: 0.27366170287132263
Non-Linear Model [2] -> Epoch: 700, Loss: 0.26533225178718567
Non-Linear Model [2] -> Epoch: 800, Loss: 0.2591221332550049
Non-Linear Model [2] -> Epoch: 900, Loss: 0.2543095052242279
Non-Linear Model [2] -> Epoch: 1000, Loss: 0.2502143681049347
Non-Linear Model [2] -> Epoch: 1100, Loss: 0.24629908800125122
Non-Linear Model [2] -> Epoch: 1200, Loss: 0.24235355854034424
Non-Linear Model [2] -> Epoch: 1300, Loss: 0.23860971629619598
Non-Linear Model [2] -> Epoch: 1400, Loss: 0.23535583913326263
Non-Linear Model [2] -> Epoch: 1500, Loss: 0.23240886628627777
Non-Linear Model [2] -> Epoch: 1600, Loss: 0.2295774221420288
Non-Linear Model [2] -> Epoch: 1700, Loss: 0.22672024369239807
Non-Linear Model [2] -> Epoch: 1800, Loss: 0.2237909436225891
Non-Linear Model [2] -> Epoch: 1900, Loss: 0.22083352506160736
Non-Linear Model [2] -> Epoch: 2000, Loss: 0.21780216693878174
To predict the target values using the trained model, execute the following code snippet:
nl_model_2.eval()
with torch.no_grad():
    y_predict_nl_2 = nl_model_2(Xcp_test)
    y_predict_nl_2 = torch.round(y_predict_nl_2)
To display the model prediction accuracy, execute the following code snippet:
print(f'Non-Linear Model [2] -> Accuracy: {accuracy_score(ycp_test, y_predict_nl_2)}')
The following would be a typical output:
Non-Linear Model [2] -> Accuracy: 0.89
To plot the decision boundary along with the scatter plot on the training data using the model we just created above, execute the following code snippet:
nl_model_2.eval()
with torch.no_grad():
    plot_with_decision_boundary(nl_model_2, Xcp_train, ycp_train)
The following illustration (Figure.3) depicts the scatter plot with the decision boundary as predicted by the model:
The prediction accuracy has improved a little. Also, we observe a better demarcation between the two classes in the plot in Figure.3 above.
One last time, let us see if we can improve the model by adding one more hidden layer for a total of two.
To create a non-linear Binary Classification model with two hidden layers - the first hidden layer consisting of 16 neurons and the second hidden layer consisting of 8 neurons, execute the following code snippet:
num_hidden_1_3 = 16
num_hidden_2_3 = 8

class BinaryClassNonLinearModel_3(nn.Module):
    def __init__(self):
        super(BinaryClassNonLinearModel_3, self).__init__()
        self.hidden_layer = nn.Sequential(
            nn.Linear(num_features_3, num_hidden_1_3),
            nn.ReLU(),
            nn.Linear(num_hidden_1_3, num_hidden_2_3),
            nn.ReLU(),
            nn.Linear(num_hidden_2_3, num_target_3),
            nn.Sigmoid()
        )

    def forward(self, cx: torch.Tensor) -> torch.Tensor:
        return self.hidden_layer(cx)
To create an instance of BinaryClassNonLinearModel_3, execute the following code snippet:
nl_model_3 = BinaryClassNonLinearModel_3()
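As a quick aside, the following sketch (using a hypothetical count_params helper) compares the number of trainable parameters across the three models:

def count_params(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print(count_params(nl_model))    # 2*1 + 1 = 3 (no hidden layer)
print(count_params(nl_model_2))  # 2*8 + 8 + 8*1 + 1 = 33 (one hidden layer)
print(count_params(nl_model_3))  # 2*16 + 16 + 16*8 + 8 + 8*1 + 1 = 193 (two hidden layers)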
To create an instance of the Binary Cross Entropy loss function, execute the following code snippet:
nl_criterion_3 = nn.BCELoss()
To create an instance of the gradient descent function, execute the following code snippet:
nl_optimizer_3 = torch.optim.SGD(nl_model_3.parameters(), lr=0.05)
To implement the iterative training loop for the forward pass to predict, compute the loss, and execute the backward pass to adjust the parameters, execute the following code snippet:
for epoch in range(1, num_epochs_3):
    nl_model_3.train()
    nl_optimizer_3.zero_grad()
    ycp_predict = nl_model_3(Xcp_train)
    loss = nl_criterion_3(ycp_predict, ycp_train)
    if epoch % 100 == 0:
        print(f'Non-Linear Model [3] -> Epoch: {epoch}, Loss: {loss}')
    loss.backward()
    nl_optimizer_3.step()
The following would be a typical output:
Non-Linear Model [3] -> Epoch: 200, Loss: 0.32728901505470276
Non-Linear Model [3] -> Epoch: 300, Loss: 0.26362675428390503
Non-Linear Model [3] -> Epoch: 400, Loss: 0.2460920661687851
Non-Linear Model [3] -> Epoch: 500, Loss: 0.23748859763145447
Non-Linear Model [3] -> Epoch: 600, Loss: 0.22980321943759918
Non-Linear Model [3] -> Epoch: 700, Loss: 0.2213612049818039
Non-Linear Model [3] -> Epoch: 800, Loss: 0.21136374771595
Non-Linear Model [3] -> Epoch: 900, Loss: 0.1993836611509323
Non-Linear Model [3] -> Epoch: 1000, Loss: 0.1850656270980835
Non-Linear Model [3] -> Epoch: 1100, Loss: 0.1690329909324646
Non-Linear Model [3] -> Epoch: 1200, Loss: 0.15195232629776
Non-Linear Model [3] -> Epoch: 1300, Loss: 0.1345341056585312
Non-Linear Model [3] -> Epoch: 1400, Loss: 0.1175856664776802
Non-Linear Model [3] -> Epoch: 1500, Loss: 0.10193011909723282
Non-Linear Model [3] -> Epoch: 1600, Loss: 0.08826065808534622
Non-Linear Model [3] -> Epoch: 1700, Loss: 0.07683814316987991
Non-Linear Model [3] -> Epoch: 1800, Loss: 0.06754428148269653
Non-Linear Model [3] -> Epoch: 1900, Loss: 0.060103677213191986
Non-Linear Model [3] -> Epoch: 2000, Loss: 0.05398537218570709
From Output.5 above, we see the loss reduce quite a bit, which is great !!!
To predict the target values using the trained model, execute the following code snippet:
nl_model_3.eval()
with torch.no_grad():
    y_predict_nl_3 = nl_model_3(Xcp_test)
    y_predict_nl_3 = torch.round(y_predict_nl_3)
To display the model prediction accuracy, execute the following code snippet:
print(f'Non-Linear Model [3] -> Accuracy: {accuracy_score(ycp_test, y_predict_nl_3)}')
The following would be a typical output:
Non-Linear Model [3] -> Accuracy: 0.96
From Output.6 above, we clearly see a BETTER performing model.
To plot the decision boundary along with the scatter plot on the training data using the model we just created above, execute the following code snippet:
nl_model_3.eval()
with torch.no_grad():
    plot_with_decision_boundary(nl_model_3, Xcp_train, ycp_train)
The following illustration (Figure.4) depicts the scatter plot with the decision boundary as predicted by the model:
VOILA !!! We observe a much better demarcation between the two classes in the plot in Figure.4 above.
References
Introduction to Deep Learning - Part 5
Introduction to Deep Learning - Part 4
Introduction to Deep Learning - Part 3