1. Introduction

This post belongs to a new series of posts dedicated to a broad and popular topic in machine learning: fully connected neural networks.

The general series scope is three-fold:

  1. visualize the model features and characteristics with schematic pictures and charts
  2. learn to implement the model with different levels of abstraction, given by the framework used
  3. have some fun with one of the hottest topics right now!

In this new post, we are going to analyze how to train a neural network on toy examples with Scikit-learn. We will go through the following steps:

  1. training setting
  2. define the network architecture: dense layer, activation function and stack of layers
  3. train: loss and accuracy functions, optimizer and learning process
  4. visualize prediction

Point 2 implies creating a layer class with the corresponding weights and biases that need to be learned during the training step.

The whole code to create a synthetic dataset and train a neural network model with any of the libraries covered in this series is wrapped into a Python class, trainFCNN(), and can be found in my Github repo.

2. Installing and importing

First of all, we need to install the required libraries with the pip command and import the corresponding packages.

Figure 1 - Scikit-learn logo

$ pip install numpy matplotlib scikit-learn

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.neural_network import MLPClassifier, MLPRegressor

3. Train function

Now we need to define the network itself with any of the libraries covered in the series; here we focus on Scikit-learn. The functionality and the whole process are created and handled within the train function.

We initialize the main parameters for this function as follows:

def train(self, nb_epochs=100, dims=[2], activation='sigmoid'):
    self.LR = 0.005  # learning rate
    self.nb_epochs = nb_epochs
    self.nb_batch = 100  # batch size
    self.activation = activation
    self.nb_layer = len(self.dims) - 1  # number of layers with learnable parameters (self.dims is assembled below)

dims defines the number of units for every hidden layer: it is a list of integers, one integer per hidden layer.

If we want a network with one input layer, three hidden layers and one output layer, we can just feed a list of 3 integers, such as [2, 4, 4]. Internally, the code also appends one dimension for the input layer, whose number of units is fixed to 2 because there are only two input features, and one for the output layer. The output dimension is 1 if we have a regression or a binary classification, and equal to the number of classes self.nb_class if we have a multi-class problem.

Just recall that, for a binary problem, we only need to output the probability that the input belongs to one of the two classes; that is why a single output unit is enough!

self.dims = [2] + dims + [self.nb_class if self.kind=='multiCls' else 1]
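As a minimal sketch of how this list is assembled (nb_class is assumed to be 3 for the multi-class case, purely for illustration):

dims = [2, 4, 4]   # three hidden layers with 2, 4 and 4 units
nb_class = 3       # assumed number of classes for the multi-class case

dims_regression = [2] + dims + [1]         # -> [2, 2, 4, 4, 1]
dims_binary     = [2] + dims + [1]         # -> [2, 2, 4, 4, 1]
dims_multiclass = [2] + dims + [nb_class]  # -> [2, 2, 4, 4, 3]

# only the hidden part will later be passed to Sklearn:
tuple(dims_regression[1:-1])               # -> (2, 4, 4)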

We want to differentiate the activation function for the output layer self.lastActFun. For regression that has to be a linear activation, since we just need to take the dense layer output and use it as the response variable.

We need a sigmoid function for a binary classification, because we want to squeeze the dense layer output into the 0-1 range so that it represents the probability that the input belongs to one of the two classes.

In the last case, a multi-class problem, we need a softmax function for the last layer.

self.lastActFun = 'sigmoid' if self.kind == 'binCls' else 'softmax' if self.kind == 'multiCls' else 'linear'
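Scikit-learn picks the output activation internally (there is no constructor argument for it), so self.lastActFun matters mostly for the other libraries in the series; still, just to make the three options concrete, they could be written with Numpy as:

import numpy as np

def linear(z):                      # regression: the raw dense output is the response
    return z

def sigmoid(z):                     # binary classification: probability of the positive class
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):                     # multi-class: one probability per class, summing to 1
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=-1, keepdims=True)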

4. Building and training the model with Sklearn

We start by analyzing how to define a model with Sklearn. If we have a regression problem, we just need to use the regressor class MLPRegressor. Instead, if we have a classification problem, either binary or multi-class, we need to use the classifier class MLPClassifier. MLP stands for multi-layer perceptron. Both classes can be imported from sklearn.neural_network.

Apart from that, the set of arguments we need to define is exactly the same in both cases, so we just need to understand one of them. What follows applies to regression.

mdl = MLPRegressor(hidden_layer_sizes=tuple(self.dims[1:-1]), max_iter=self.nb_epochs,\
                   alpha=0, activation=self.activation, learning_rate_init=self.LR, solver=optName)

The next one applies to classification.

mdl = MLPClassifier(hidden_layer_sizes=tuple(self.dims[1:-1]), max_iter=self.nb_epochs,\
                    alpha=0, activation=self.activation, learning_rate_init=self.LR, solver=optName)

The first argument we need to specify, hidden_layer_sizes, is a tuple with the number of units of each hidden layer, so we slice dims from the second element up to the last but one, because we only need to feed whatever is related to the hidden layers.

Then we need to specify the number of epochs self.nb_epochs as max_iter. We do not want regularization to be active here, so we set alpha to 0, activation to the user-defined activation function and the initial learning rate learning_rate_init to self.LR.

By default, the learning rate does not change during the training process, but there is also the option to let it decrease over time, which can lead to a different convergence behaviour.
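In Sklearn this is controlled by the learning_rate argument, which accepts 'constant' (the default), 'invscaling' or 'adaptive' and is only taken into account when the solver is 'sgd'. A sketch of a non-constant schedule (the layer sizes here are just placeholders):

from sklearn.neural_network import MLPRegressor

# 'adaptive' keeps the learning rate constant and divides it by 5
# whenever the training loss stops decreasing.
mdl = MLPRegressor(hidden_layer_sizes=(4, 4), solver='sgd',
                   learning_rate='adaptive', learning_rate_init=0.005,
                   max_iter=500)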

We let the user select either sgd or adam as the optimizer. Please note that the Sklearn solver here refers to the same concept as the optimizer in Tensorflow, Keras and Pytorch. Solver aims at sorting out the optimization problem!

Now everything is stored in the mdl instance: the model is built and lives in this object. From it, we just need to call the fit method, feeding X as input and Y as output. Since our Y is structured as a 2D array in every case, we first convert it into a 1D array with Numpy's ravel() function. By the end of this line, Sklearn has learnt the optimal weights and biases of the entire network: the whole learning process happens within this single call.

The next step is to call the predict() method on the entire grid of inputs, so that we can visualize the model behaviour over the whole two-dimensional space and compare it to the dataset points.
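The grid itself is built inside the trainFCNN class; purely as an illustration of what such a grid of 2D inputs could look like (ranges and resolution are assumptions, not taken from the repo):

import numpy as np

# one row per grid point, two columns for the two input features
x1, x2 = np.meshgrid(np.linspace(-1, 1, 100), np.linspace(-1, 1, 100))
XXgrd = np.column_stack([x1.ravel(), x2.ravel()])   # shape (10000, 2)
# Ygrd = mdl.predict(XXgrd)                         # then reshape to (100, 100) for plotting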

Finally, we store the history of the loss in lossHistory with respect to the number of epochs, so we can understand the behaviour of the model over time during the learning process.

If we also want to have a look at the model parameters nn_prms, we can access the weights and biases through coefs_ and intercepts_, respectively.

YY = self.YY.ravel()                          # flatten the 2D targets into a 1D array
mdl.fit(self.XX, YY)                          # the whole learning process happens here
self.nn_Ygrd = mdl.predict(self.XXgrd)        # predictions over the 2D input grid
self.lossHistory = mdl.loss_curve_            # loss value for every training iteration
self.nn_prms = mdl.coefs_ + mdl.intercepts_   # weight matrices followed by bias vectors
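To see the same calls outside the trainFCNN wrapper, here is a self-contained sketch on an assumed toy regression target (the product of the two inputs); dataset size and hyper-parameters simply mirror the values used later in the post:

import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(2500, 2))    # assumed toy dataset: 2500 random 2D points
y = X[:, 0] * X[:, 1]                     # regression target: product of the two inputs

mdl = MLPRegressor(hidden_layer_sizes=(6,), activation='relu', solver='adam',
                   alpha=0, learning_rate_init=0.005, max_iter=500)
mdl.fit(X, y)

print(len(mdl.loss_curve_))                # one loss value per training iteration
print([w.shape for w in mdl.coefs_])       # [(2, 6), (6, 1)]: one weight matrix per layer
print([b.shape for b in mdl.intercepts_])  # [(6,), (1,)]: one bias vector per layer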

5. Visualize some results

5.1 NN model with a regression problem

We visualize the loss history and the model prediction over the 2D grid for the product problem (regression).

tnn = trainFCNN(nb_pnt=2500, dataset='prod')
tnn.train(lib='skl', dims=[6], activation='relu', nb_epochs=500, lr=0.005)
plt.plot(tnn.lossHistory)
plt.title(tnn.mdlDescription());


tnn.plotModelEstimate(figsize=(16, 9))


5.2 NN model with binary classification

We visualize the loss history and the model prediction over the 2D grid for the XOR problem (binary classification).

tnn = trainFCNN(nb_pnt=2500, dataset='xor')
tnn.train(lib='skl', dims=[6], activation='relu', nb_epochs=500, lr=0.005)
plt.plot(tnn.lossHistory)
plt.title(tnn.mdlDescription());


tnn.plotModelEstimate(figsize=(16, 9))
