## Neural Networks with scikit

### Multi-layer Perceptron

We will start with examples using the multilayer perceptron (MLP). An MLP is a feedforward artificial neural network model that maps sets of input data onto a set of appropriate outputs. It consists of multiple layers, each of which is fully connected to the following one. The nodes of the layers are neurons with nonlinear activation functions, except for the nodes of the input layer. Between the input and the output layer there can be one or more nonlinear hidden layers.

```python
from sklearn.neural_network import MLPClassifier

X = [[0., 0.], [0., 1.], [1., 0.], [1., 1.]]
y = [0, 0, 0, 1]
clf = MLPClassifier(solver='lbfgs', alpha=1e-5,
                    hidden_layer_sizes=(5, 2), random_state=1)
print(clf.fit(X, y))
```

```
MLPClassifier(activation='relu', alpha=1e-05, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(5, 2), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=1, shuffle=True,
       solver='lbfgs', tol=0.0001, validation_fraction=0.1, verbose=False,
       warm_start=False)
```

The following diagram depicts the neural network that we have trained for our classifier clf. We have two input nodes $X_0$ and $X_1$, called the input layer, and one output neuron 'Out'. There are two hidden layers: the first one with the neurons $H_{00}$ ... $H_{04}$ and the second one consisting of $H_{10}$ and $H_{11}$. Each neuron of the hidden layers and the output neuron possesses a corresponding bias, i.e. $B_{00}$ is the bias corresponding to the neuron $H_{00}$, $B_{01}$ is the bias corresponding to the neuron $H_{01}$, and so on.

Each neuron of the hidden layers receives the output from every neuron of the previous layer and transforms these values with a weighted linear summation
$$\sum_{i=0}^{n-1}w_ix_i = w_0x_0 + w_1x_1 + ... + w_{n-1}x_{n-1}$$
into an output value, where n is the number of neurons of the previous layer and $w_i$ corresponds to the $i$-th component of the weight vector.
The output layer receives the values from the last hidden layer. It also performs a weighted linear summation, and a non-linear activation function
$$g(\cdot): \mathbb{R} \rightarrow \mathbb{R}$$
like the hyperbolic tangent function is applied to the summation result.
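Before we look at the trained network, let's make this computation concrete with a small sketch in plain NumPy. The input values, weights and bias below are made-up numbers for illustration only; what matters is the structure of the computation: a weighted sum plus a bias, followed by a non-linear activation function.

```python
import numpy as np

def neuron_output(x, w, b, activation=np.tanh):
    # weighted linear summation of the inputs plus the bias,
    # passed through a non-linear activation function
    return activation(np.dot(w, x) + b)

# hypothetical values, for illustration only
x = np.array([0.5, 1.0])   # outputs of the previous layer
w = np.array([0.2, -0.7])  # weight vector of this neuron
b = 0.1                    # bias of this neuron
print(neuron_output(x, w, b))
```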

```python
%matplotlib inline
from IPython.display import Image
Image(filename='images/mlp_example_layer.png')
```

The attribute coefs_ contains a list of weight matrices for every layer. The weight matrix at index i holds the weights between the layer i and layer i + 1.
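Before printing the full matrices, we can check their shapes. With two input nodes, hidden layers of size 5 and 2, and one output neuron, we expect the shapes (2, 5), (5, 2) and (2, 1):

```python
# shape of the weight matrix between layer i and layer i + 1
for i, coef in enumerate(clf.coefs_):
    print(i, coef.shape)
```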

```python
print(clf.coefs_)
```

The previous code returned the following output:

```
[array([[-0.14203691, -1.18304359, -0.85567518, -4.53250719, -0.60466275],
       [-0.69781111, -3.5850093 , -0.26436018, -4.39161248,  0.06644423]]),
 array([[ 0.29179638, -0.14155284],
       [ 4.02666592, -0.61556475],
       [-0.51677234,  0.51479708],
       [ 7.37215202, -0.31936965],
       [ 0.32920668,  0.64428109]]),
 array([[-4.96774269],
       [-0.86330397]])]
```

The summation formula of the neuron $H_{00}$ is defined by:

$$w_0x_0 + w_1x_1 + w_{B_{00}} \cdot B_{00}$$

which can be written as

$$w_0x_0 + w_1x_1 + w_{B_{00}}$$ because $B_{00} = 1$.

We can get the values for $w_0$ and $w_1$ from clf.coefs_ like this:

$w_0 =$ clf.coefs_[0][0][0] and $w_1 =$ clf.coefs_[0][1][0]

print("w0 = ", clf.coefs_[0][0][0]) print("w1 = ", clf.coefs_[0][1][0])

```
w0 = -0.142036912678
w1 = -0.697811114978
```

The weight vector of $H_{00}$ can be accessed with

```python
clf.coefs_[0][:,0]
```

The above code returned the following:

```
array([-0.14203691, -0.69781111])
```

We can generalize the above to access a neuron $H_{ij}$ in the following way:

```python
for i in range(len(clf.coefs_)):
    number_neurons_in_layer = clf.coefs_[i].shape[1]
    for j in range(number_neurons_in_layer):
        weights = clf.coefs_[i][:,j]
        print(i, j, weights, end=", ")
        print()
    print()
```

```
0 0 [-0.14203691 -0.69781111],
0 1 [-1.18304359 -3.5850093 ],
0 2 [-0.85567518 -0.26436018],
0 3 [-4.53250719 -4.39161248],
0 4 [-0.60466275  0.06644423],

1 0 [ 0.29179638  4.02666592 -0.51677234  7.37215202  0.32920668],
1 1 [-0.14155284 -0.61556475  0.51479708 -0.31936965  0.64428109],

2 0 [-4.96774269 -0.86330397],
```

intercepts_ is a list of bias vectors, where the vector at index i represents the bias values added to layer i+1.

```python
print(clf.intercepts_)
```

The above Python code returned the following output:

```
[array([-0.14962269, -0.59232707, -0.5472481 ,  7.02667699, -0.87510813]),
 array([-3.61417672, -0.76834882]),
 array([ 8.48188176])]
```
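Using the weights and the bias values, we can reproduce the summation formula of $H_{00}$ for a concrete input, e.g. $(x_0, x_1) = (1, 1)$. This is only a small sketch to connect the formula above with the fitted parameters; the bias weight $w_{B_{00}}$ is the first entry of clf.intercepts_[0]:

```python
x = [1., 1.]
# net input of H00: w0 * x0 + w1 * x1 + bias weight
net_input_H00 = (clf.coefs_[0][0][0] * x[0]
                 + clf.coefs_[0][1][0] * x[1]
                 + clf.intercepts_[0][0])
print(net_input_H00)
```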

The main reason why we train a classifier is to predict results for new samples. We can do this with the predict method. The method returns a predicted class for a sample, in our case a "0" or a "1":

```python
result = clf.predict([[0, 0], [0, 1], [1, 0], [0, 1],
                      [1, 1], [2., 2.], [1.3, 1.3], [2, 4.8]])
```

Instead of just looking at the class results, we can also use the predict_proba method to get the probability estimates.

```python
prob_results = clf.predict_proba([[0, 0], [0, 1], [1, 0], [0, 1],
                                  [1, 1], [2., 2.], [1.3, 1.3], [2, 4.8]])
print(prob_results)
```

```
[[  1.00000000e+000   5.25723951e-101]
 [  1.00000000e+000   3.71534882e-031]
 [  1.00000000e+000   6.47069178e-029]
 [  1.00000000e+000   3.71534882e-031]
 [  2.07145538e-004   9.99792854e-001]
 [  2.07145538e-004   9.99792854e-001]
 [  2.07145538e-004   9.99792854e-001]
 [  2.07145538e-004   9.99792854e-001]]
```

prob_results[i][0] gives us the probability for class 0, i.e. a "0", and prob_results[i][1] the probability for a "1". i corresponds to the $i$-th sample.
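To see where these probabilities come from, we can reproduce the forward pass by hand from coefs_ and intercepts_. The following is a minimal sketch, assuming the default 'relu' activation in the hidden layers (visible in the repr of clf above) and a logistic sigmoid in the output layer, which scikit-learn uses for binary classification:

```python
import numpy as np

def manual_forward(clf, x):
    a = np.array(x, dtype=float)
    # hidden layers: weighted sum plus bias, then relu
    for w, b in zip(clf.coefs_[:-1], clf.intercepts_[:-1]):
        a = np.maximum(0, a @ w + b)
    # output layer: weighted sum plus bias, then logistic sigmoid
    z = (a @ clf.coefs_[-1] + clf.intercepts_[-1])[0]
    p1 = 1.0 / (1.0 + np.exp(-z))
    return np.array([1 - p1, p1])   # [P(class 0), P(class 1)]

# should be close to clf.predict_proba([[1., 1.]])
print(manual_forward(clf, [1., 1.]))
```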

### Another Example

We will populate two clusters (class 0 and class 1) in a two-dimensional space.

```python
import numpy as np
from matplotlib import pyplot as plt

npoints = 50
X, Y = [], []
# class 0
X.append(np.random.uniform(low=-2.5, high=2.3, size=(npoints,)))
Y.append(np.random.uniform(low=-1.7, high=2.8, size=(npoints,)))
# class 1
X.append(np.random.uniform(low=-7.2, high=-4.4, size=(npoints,)))
Y.append(np.random.uniform(low=3, high=6.5, size=(npoints,)))

learnset = []
learnlabels = []
for i in range(2):
    # adding points of class i to learnset
    points = zip(X[i], Y[i])
    for p in points:
        learnset.append(p)
        learnlabels.append(i)

npoints_test = 3 * npoints
TestX = np.random.uniform(low=-7.2, high=5, size=(npoints_test,))
TestY = np.random.uniform(low=-4, high=9, size=(npoints_test,))
testset = []
points = zip(TestX, TestY)
for p in points:
    testset.append(p)

colours = ["b", "r"]
for i in range(2):
    plt.scatter(X[i], Y[i], c=colours[i])
plt.scatter(TestX, TestY, c="g")
plt.show()
```

We will train an MLPClassifier for our two classes:

```python
import matplotlib.pyplot as plt
from sklearn.neural_network import MLPClassifier

# mlp = MLPClassifier(hidden_layer_sizes=(100, 100), max_iter=400, alpha=1e-4,
#                     solver='sgd', verbose=10, tol=1e-4, random_state=1)
mlp = MLPClassifier(hidden_layer_sizes=(20, 3), max_iter=150, alpha=1e-4,
                    solver='sgd', verbose=10, tol=1e-4, random_state=1,
                    learning_rate_init=.1)
mlp.fit(learnset, learnlabels)
print("Training set score: %f" % mlp.score(learnset, learnlabels))
print("Test set score: %f" % mlp.score(learnset, learnlabels))
mlp.classes_
```

```
Iteration 1, loss = 0.47209614
Iteration 2, loss = 0.44614294
Iteration 3, loss = 0.41336245
Iteration 4, loss = 0.37903617
Iteration 5, loss = 0.34893492
Iteration 6, loss = 0.31801372
Iteration 7, loss = 0.28795204
Iteration 8, loss = 0.25973898
Iteration 9, loss = 0.23339132
Iteration 10, loss = 0.20923182
Iteration 11, loss = 0.18742655
Iteration 12, loss = 0.16785779
Iteration 13, loss = 0.15037921
Iteration 14, loss = 0.13479158
Iteration 15, loss = 0.12095939
Iteration 16, loss = 0.10880727
Iteration 17, loss = 0.09810485
Iteration 18, loss = 0.08870370
Iteration 19, loss = 0.08049147
Iteration 20, loss = 0.07329201
Iteration 21, loss = 0.06696649
Iteration 22, loss = 0.06140222
Iteration 23, loss = 0.05650041
Iteration 24, loss = 0.05217473
Iteration 25, loss = 0.04835234
Iteration 26, loss = 0.04497095
Iteration 27, loss = 0.04196786
Iteration 28, loss = 0.03929475
Iteration 29, loss = 0.03690869
Iteration 30, loss = 0.03477277
Iteration 31, loss = 0.03285525
Iteration 32, loss = 0.03112890
Iteration 33, loss = 0.02957041
Iteration 34, loss = 0.02815974
Iteration 35, loss = 0.02687962
Iteration 36, loss = 0.02571506
Iteration 37, loss = 0.02465300
Iteration 38, loss = 0.02368203
Iteration 39, loss = 0.02279213
Iteration 40, loss = 0.02197453
Iteration 41, loss = 0.02122149
Iteration 42, loss = 0.02052625
Iteration 43, loss = 0.01988283
Iteration 44, loss = 0.01928600
Iteration 45, loss = 0.01873112
Iteration 46, loss = 0.01821413
Iteration 47, loss = 0.01773141
Iteration 48, loss = 0.01727976
Iteration 49, loss = 0.01685633
Iteration 50, loss = 0.01645859
Iteration 51, loss = 0.01608425
Iteration 52, loss = 0.01573129
Iteration 53, loss = 0.01539788
Iteration 54, loss = 0.01508238
Iteration 55, loss = 0.01478333
Iteration 56, loss = 0.01449938
Iteration 57, loss = 0.01422935
Iteration 58, loss = 0.01397216
Iteration 59, loss = 0.01372683
Iteration 60, loss = 0.01349248
Iteration 61, loss = 0.01326831
Iteration 62, loss = 0.01305360
Iteration 63, loss = 0.01284768
Iteration 64, loss = 0.01264995
Iteration 65, loss = 0.01245986
Iteration 66, loss = 0.01227692
Iteration 67, loss = 0.01210067
Iteration 68, loss = 0.01193067
Iteration 69, loss = 0.01176657
Iteration 70, loss = 0.01160798
Iteration 71, loss = 0.01145461
Iteration 72, loss = 0.01130613
Iteration 73, loss = 0.01116229
Iteration 74, loss = 0.01102282
Iteration 75, loss = 0.01088750
Iteration 76, loss = 0.01075610
Iteration 77, loss = 0.01062842
Iteration 78, loss = 0.01050428
Iteration 79, loss = 0.01038351
Iteration 80, loss = 0.01026593
Iteration 81, loss = 0.01015136
Iteration 82, loss = 0.01003970
Iteration 83, loss = 0.00993082
Iteration 84, loss = 0.00982460
Iteration 85, loss = 0.00972093
Iteration 86, loss = 0.00961971
Iteration 87, loss = 0.00952082
Iteration 88, loss = 0.00942417
Iteration 89, loss = 0.00932969
Training loss did not improve more than tol=0.000100 for two consecutive epochs. Stopping.
Training set score: 1.000000
Test set score: 1.000000
```

The previous Python code returned the following:

```
array([0, 1])
```

```python
predictions = mlp.predict(testset)
predictions
```

The previous Python code returned the following result:

```
array([1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1,
       1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1,
       1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1,
       1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0,
       1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1,
       1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0,
       0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0])
```

```python
testset = np.array(testset)
testset[predictions==1]
colours = ['#C0FFFF', "#FFC8C8"]
for i in range(2):
    plt.scatter(X[i], Y[i], c=colours[i])
colours = ["b", "r"]
for i in range(2):
    cls = testset[predictions==i]
    Xt, Yt = zip(*cls)
    plt.scatter(Xt, Yt, marker="D", c=colours[i])
```
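Note that the "test set score" above was computed on the training data again, because our green test points have no labels. If labelled data are available, a hold-out evaluation could look like the following sketch; it uses scikit-learn's train_test_split and re-trains a classifier with the same hyperparameters (variable names are illustrative):

```python
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# split the labelled points into a training and a test part
X_train, X_test, y_train, y_test = train_test_split(
    learnset, learnlabels, test_size=0.25, random_state=1)

mlp2 = MLPClassifier(hidden_layer_sizes=(20, 3), max_iter=150, alpha=1e-4,
                     solver='sgd', tol=1e-4, random_state=1,
                     learning_rate_init=.1)
mlp2.fit(X_train, y_train)
print("Training set score: %f" % mlp2.score(X_train, y_train))
print("Test set score: %f" % mlp2.score(X_test, y_test))
```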