1. Introduction

Welcome back to the FCNN series!

In this new post, we are going to dig in and see what happens inside a feed-forward neural network that has been trained to return the right output.

We train the neural network on some basic examples with TensorFlow. If you are new to this library, please check these two posts out, 1 and 2, as well as my introductory post on linear regression and the one on neural networks.

The whole code to create a synthetic dataset and train a neural network model with any of the four libraries mentioned above is wrapped into a Python class, trainFCNN(), and can be found in my GitHub repo.

We are going through the following steps:

  1. What do we want to achieve and how?
  2. Create a dataset, train the network and extract the final values of the parameters and intermediate variables for a given set of inputs.
  3. Visualize two samples for a binary classification problem.
  4. Describe the Python class, visFCNN(), step-by-step. The source code can be found in my GitHub repo.
  5. Play a bit with the class to visualize different neural network architectures for some fun toy cases.

Point 4 involves creating the dictionary fcnn that contains everything required to visualize the network, defining the initial settings, drawing a neuron and a bias term, drawing the line that represents a weight of a linear layer, defining the colors of each object, and drawing the input layer, each hidden layer and the output layer.
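To give a feel for what this entails, here is a rough, hypothetical skeleton of such a class. The method names are made up for illustration only; the actual implementation lives in my GitHub repo.

import matplotlib.pyplot as plt

class visFCNN:
    def __init__(self):
        self.settings = {}          # geometric settings, described in the next section

    def draw_neuron(self, ax, x, y, value):
        """Draw a circle at (x, y), coloured by the neuron value."""

    def draw_bias(self, ax, x, y, value):
        """Draw a small rectangle representing a bias term."""

    def draw_weight(self, ax, p_from, p_to, value):
        """Draw a line between two neurons, coloured by the weight value."""

    def visualize(self, fcnn, sample, figsize):
        """Loop over input, hidden and output layers and draw every element."""
        fig, ax = plt.subplots(figsize=figsize)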

2. What do we want to achieve and how?

We want to visualize what happens inside a neural network for specific inputs. We are going to use the class we have been working on throughout the last posts to train the network on specific tasks. We will then play with it by feeding in an input sample and inspecting what the neurons look like for a fixed configuration of parameters, i.e., the weights connecting the neurons (purple lines in the below scheme) and the biases (yellow rectangles).

We simplify the nomenclature as follows: every variable (input, output, or hidden variable) is drawn as a circle and modelled as a neuron. In the below scheme, the input x is green, each output z of the affine transformation is cyan, and the output a of the activation function (black rightward arrow) is blue. The last activation output is the actual response of the network, y.

png

Figure 1 - Scheme of the neural network that we are going to visualize

We list here each geometric parameter together with the variable name used in the class code; a short sketch after the list shows how they combine:

  1. hor_mrg: horizontal distance between the previous activation layer $a^{(kk-1)}$ and the next dense layer $z^{(kk)}$. It gives the horizontal length of the weights’ grid. For the first layer $a^{(0)} = x$.
  2. vert_mrg: vertical distance between two consecutive neurons, no matter which layer.
  3. neuron_radius: radius of the circle representing a neuron.
  4. bias_size: side of the rectangle representing a bias term.
  5. hor_mrgLayer: horizontal distance between the dense layer $z^{(kk)}$ and the activation layer $a^{(kk)}$. It gives the horizontal length of the activation arrow.
  6. layerDist: horizontal distance between two consecutive activation layers, $a^{(kk-1)}$ and $a^{(kk)}$. It is the sum of hor_mrg and hor_mrgLayer.
  7. Nneuron_max: maximum number of neurons in a single layer. In the below scheme, it is equal to 3, in the first hidden layer.
  8. bottomMargin: vertical offset between the lowest neuron of the current layer kk and the lowest neuron from the largest layer. It is required to make sure every neuron layer is vertically centred with fixed space between neurons throughout the whole network. In the below scheme, the first hidden layer H1 comes with the largest number of neurons. The second hidden layer H2 is offset by bottomMargin (yBH in the figure).
  9. Nlayers: number of layers, excluding the input layer. In this case, two hidden layers and one output layer, so Nlayers is 3.
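As a quick sketch of how these quantities combine (illustrative example values and helper functions, not the actual repo code):

hor_mrg, hor_mrgLayer = 3.0, 1.5
vert_mrg, neuron_radius = 2.0, 0.5
Nneuron_max = 3

# horizontal distance between two consecutive activation layers
layerDist = hor_mrg + hor_mrgLayer

def bottomMargin(n_neurons):
    """Vertical offset that centres a layer of n_neurons against the largest layer."""
    return 0.5 * (Nneuron_max - n_neurons) * vert_mrg

def neuron_position(kk, jj, n_neurons):
    """x/y position of the jj-th neuron of the kk-th activation layer."""
    return kk * layerDist, bottomMargin(n_neurons) + jj * vert_mrg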

In the below chart, we give a visual description of the geometric parameters we need to define the network layout.

png

Figure 2 - Geometric parameters to visualize the neural network

3. Installing and importing

First of all, we need to install the required libraries with the pip command and import the packages we are going to use.

$ pip install numpy matplotlib tensorflow

import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
from matplotlib import cm
from matplotlib.colors import Normalize as mColNorm
import tensorflow as tf
from tensorflow.keras import utils as np_utils  # Keras helpers (e.g. to_categorical) via tf.keras

4. Training

In this section, we create the dataset for a binary classification problem, stripe. We divide the domain into three areas by drawing two parallel lines. Further details are given in this post.
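For intuition, a stripe-like labelling could be generated along these lines (illustrative only; the exact lines and the class assignment used by trainFCNN may differ):

import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(2500, 2))
# middle stripe between two parallel lines -> class 1, outer areas -> class 0
y = ((X[:, 1] > X[:, 0] - 0.5) & (X[:, 1] < X[:, 0] + 0.5)).astype(int)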

As a reminder, the whole code to create the synthetic dataset and train the neural network model is wrapped into the Python class trainFCNN(), which can be found in my GitHub repo.

tnn = trainFCNN(nb_pnt=2500, dataset='stripe')
tnn.plotPoints(idx=0)

png

We then use the TensorFlow library to train a fully connected neural network. We define the architecture with the dims attribute. Since we want two hidden layers, with 3 and 2 neurons respectively, we set dims=[3, 2]. Input and output dimensions are inferred from the dataset itself.

The training happens within the train method.

At the end of the training stage, we visualize the loss history to check whether training has converged, and we plot the model outcome over the whole domain grid with plotModelEstimate.

tnn.train(lib='tf', dims=[3, 2], activation='relu', nb_epochs=250, lr=0.005)
The final model loss is 0.000722280761692673
plt.plot(tnn.lossHistory)
plt.title(tnn.mdlDescription());

png

tnn.plotModelEstimate(figsize=(16, 9))

png

Amazing, so far so good!

Now we need to extract the network parameters and the values of each variable within it to visualize the forward flow through the network. Parameters and variables are stored in nn_prms and nn_vars, respectively.

We print out the dimension of each object: a series of 2D weights and 1D biases. The first weight matrix, (2, 3), transforms the two inputs into three hidden variables in the first hidden layer. That's why a (3,) bias comes next. The second transformation goes from 3 neurons to 2, while the last one goes from 2 to the sole output, 1.

{objNm: objVal.shape for objNm, objVal in zip(['W1', 'b1', 'W2', 'b2', 'W3', 'b3'], tnn.nn_prms)}
{'W1': (2, 3), 'b1': (3,), 'W2': (3, 2), 'b2': (2,), 'W3': (2, 1), 'b3': (1,)}

Since we have fed the entire dataset of 2500 points, each variable is a 2D array with 2500 rows. The second dimension stems from the layer dimensionality.

{objNm: objVal.shape for objNm, objVal in zip(['X', 'z1', 'a1', 'z2', 'a2', 'z3', 'a3=Y'], [tnn.XX]+tnn.nn_vars)}
{'X': (2500, 2),
 'z1': (2500, 3),
 'a1': (2500, 3),
 'z2': (2500, 2),
 'a2': (2500, 2),
 'z3': (2500, 1),
 'a3=Y': (2500, 1)}
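As a quick sanity check (a sketch, assuming the parameters are NumPy arrays in the order [W1, b1, W2, b2, W3, b3] printed above), the shapes chain as expected:

# each dense layer maps (N, d_in) @ (d_in, d_out) + (d_out,) -> (N, d_out)
W1, b1 = tnn.nn_prms[0], tnn.nn_prms[1]
z1 = tnn.XX @ W1 + b1      # (2500, 2) @ (2, 3) + (3,) -> (2500, 3)
print(z1.shape)            # (2500, 3), the same shape as tnn.nn_vars[0]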

We define the structure fcnn, a Python dictionary, containing all the information required to visualize the network flow.

We treat the input as the output of a fictitious previous activation layer. The set of activation layers, $a^{(kk)}$, is retrieved from tnn.nn_vars[1::2], while the set of dense layers, $z^{(kk)}$, comes from tnn.nn_vars[::2]. In a similar fashion, we extract weights and biases from tnn.nn_prms.

fcnn = {'activations': [tnn.XX] + tnn.nn_vars[1::2],
        'linNeurons': tnn.nn_vars[::2],
        'weights': tnn.nn_prms[::2],
        'biases': tnn.nn_prms[1::2]}

5. What does this visualization class look like?

Let’s have a look at what we are going to build!

Here is the dataset of 2500 points, where the selected sample (idx=10) is highlighted with a black circle.

idx = 10
tnn.plotPoints(idx=idx)

png

We create an instance of the class and call the visualize method, passing the figure size as a tuple. The input coordinates are (0.44, 1.6) and the outcome is 0, the probability of the input belonging to class 1 (blue). Let's have a look at the top-most neuron of the first dense layer, -3.17. We get it as:

$$ 1.6\cdot (-2.21) + 0.44\cdot 2.23 - 0.62 \approx -3.17 $$

Since the activation function is ReLU, whatever is negative becomes 0, while positive values pass through unchanged. The output layer employs a sigmoid function to return the probability of the input belonging to class 1 (blue).

The logit is quite negative, -6.82, so the probability is basically 0: the input belongs to class 0 (red) with probability essentially equal to 1.

vnn = visFCNN()
vnn.visualize(fcnn, sample=idx, figsize=(15, 8))

png
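We can double-check these numbers by replaying the forward pass by hand with NumPy. This is a minimal sketch, assuming tnn.nn_prms stores the parameters in the order [W1, b1, W2, b2, W3, b3] shown above and tnn.XX holds the inputs:

# replay the forward pass for the selected sample
W1, b1, W2, b2, W3, b3 = tnn.nn_prms
relu = lambda v: np.maximum(v, 0.0)
sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))

x = tnn.XX[idx]            # input coordinates, (0.44, 1.6) for idx=10
z1 = x @ W1 + b1           # first dense layer; one entry ~ -3.17 (the top-most neuron in the figure)
a1 = relu(z1)
z2 = a1 @ W2 + b2
a2 = relu(z2)
z3 = a2 @ W3 + b3          # final logit, ~ -6.82
y_hat = sigmoid(z3)        # probability of class 1, essentially 0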

We now take a different point, this time from class 1.

idx = 500
tnn.plotPoints(idx=idx)

png

Of the first hidden layer, only the bottom-most neuron survives the ReLU; its value, weighted by the connections from the first to the second hidden layer, essentially determines the second-layer outputs (the biases are negligible). Highly positive weights at the end give a final logit of 29.67, which means the input belongs to class 1 with essentially 100% probability.
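Plugging this logit into the sigmoid shows how saturated the output is:

$$ \sigma(29.67) = \frac{1}{1 + e^{-29.67}} \approx 1 - 1.3\cdot 10^{-13} $$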

Nice!

vnn.visualize(fcnn, sample=idx, figsize=(15, 8))

png