## Neural Networks

### Introduction

When we say "Neural Networks", we mean artificial neural networks (ANN). The idea of an ANN is based on biological neural networks such as the brain.

The basic structure of a neural network is the neuron. A neuron in biology consists of three major parts: the soma (cell body), the dendrites, and the axon.

The dendrites branch off from the soma in a tree-like way and get thinner with every branch. They receive signals (impulses) from other neurons at synapses. The axon (there is always only one) also leaves the soma and usually extends over longer distances than the dendrites.

The following image by Quasar Jarosz, courtesy of Wikipedia, illustrates this:

Even though the above image is already an abstraction for a biologist, we can further abstract it:

A perceptron in an artificial neural network simulates a biological neuron.

What goes on inside the body of a perceptron is amazingly simple. The input signals get multiplied by weight values, i.e. each input has its corresponding weight. This way the input can be adjusted individually for every \(x_i\). We can see all the inputs as an input vector and the corresponding weights as the weight vector.

When a signal comes in, it gets multiplied by the weight value assigned to this particular input. That is, if a neuron has three inputs, then it has three weights that can be adjusted individually. The weights usually get adjusted during the learning phase.

After this, the modified input signals are summed up. It is also possible to add a so-called bias \(b\) to this sum. The bias is a value which can also be adjusted during the learning phase.

Finally, the actual output has to be determined. For this purpose an activation or step function Φ is used.

The simplest form of an activation function is a binary function. If the result of the summation is greater than some threshold s, the result of \(\Phi\) will be 1, otherwise 0.

\[ \Phi(x) = \left\{ \begin{array}{ll} 1 & \mbox{if } w \cdot x + b > s \\ 0 & \mbox{otherwise} \end{array} \right. \]
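The computation described above (weight the inputs, sum them, add the bias, apply the threshold) can be sketched in a few lines. This is a minimal illustration rather than code from this tutorial: the function name `perceptron_output` and all sample values are assumptions chosen for demonstration.

```python
import numpy as np

def perceptron_output(x, w, b=0.0, s=0.0):
    """Return 1 if the weighted sum w.x + b exceeds the threshold s, else 0."""
    weighted_sum = np.dot(w, x) + b   # multiply inputs by weights, sum, add bias
    return 1 if weighted_sum > s else 0

# hypothetical example values
x = np.array([1.0, 0.0, 1.0])      # input vector
w = np.array([0.4, 0.9, -0.1])     # one weight per input

# weighted sum is about 0.3; with the bias 0.2 it is about 0.5, above s = 0.4
print(perceptron_output(x, w, b=0.2, s=0.4))
```

Without the bias the same input stays below the threshold, which shows how the bias shifts the point at which the neuron fires.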

## A Simple Neural Network

The following image shows the general building principle of a simple artificial neural network:

We will write a very simple Neural Network implementing the logical "And" and "Or" functions.

Let's start with the "And" function. It is defined for two inputs:

Input1 | Input2 | Output |
---|---|---|
0 | 0 | 0 |
0 | 1 | 0 |
1 | 0 | 0 |
1 | 1 | 1 |

```python
import numpy as np


class Perceptron:

    def __init__(self, input_length, weights=None):
        if weights is None:
            self.weights = np.ones(input_length) * 0.5
        else:
            self.weights = weights

    @staticmethod
    def unit_step_function(x):
        if x > 0.5:
            return 1
        return 0

    def __call__(self, in_data):
        weighted_input = self.weights * in_data
        weighted_sum = weighted_input.sum()
        return Perceptron.unit_step_function(weighted_sum)


p = Perceptron(2, np.array([0.5, 0.5]))
for x in [np.array([0, 0]), np.array([0, 1]),
          np.array([1, 0]), np.array([1, 1])]:
    y = p(np.array(x))
    print(x, y)
```

This gets us the following output:

```
[0 0] 0
[0 1] 0
[1 0] 0
[1 1] 1
```
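The text above also announces an "Or" function. One way to obtain it, sketched here under the assumption that we keep the threshold of 0.5 and change only the weights, is to give each input a weight of 1 so that a single active input already exceeds the threshold. The helper name `threshold_unit` and the weight values are hypothetical; the function mirrors the `Perceptron` class above in a self-contained form.

```python
import numpy as np

def threshold_unit(weights, in_data, threshold=0.5):
    """Fire (return 1) if the weighted sum of the inputs exceeds the threshold."""
    weighted_sum = (np.asarray(weights) * np.asarray(in_data)).sum()
    return 1 if weighted_sum > threshold else 0

or_weights = [1, 1]   # assumed weights: one active input is enough to fire

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, threshold_unit(or_weights, x))
```

With the weights `[0.5, 0.5]` the same function behaves like "And", because only two active inputs together push the sum above 0.5.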

## Line Separation

In the following program, we train a neural network to classify two clusters in a 2-dimensional space. We show this in the following diagram with the two classes class1 and class2. We will create those points randomly with the help of a line: the points of class2 will be above the line and the points of class1 will be below it.

We will see that the neural network will find a line that separates the two classes. This line should not be mistaken for the line which we used to create the points.

This line is called a **decision boundary**.

```python
import numpy as np
from collections import Counter


class Perceptron:

    def __init__(self, input_length, weights=None):
        if weights is None:
            # random initial weights in the interval [-1, 1)
            self.weights = np.random.random((input_length)) * 2 - 1
        else:
            # use the weights supplied by the caller
            self.weights = np.array(weights, dtype=float)
        self.learning_rate = 0.1

    @staticmethod
    def unit_step_function(x):
        if x < 0:
            return 0
        return 1

    def __call__(self, in_data):
        weighted_input = self.weights * in_data
        weighted_sum = weighted_input.sum()
        return Perceptron.unit_step_function(weighted_sum)

    def adjust(self, target_result, calculated_result, in_data):
        # shift every weight in proportion to the error and its input
        error = target_result - calculated_result
        for i in range(len(in_data)):
            correction = error * in_data[i] * self.learning_rate
            self.weights[i] += correction


def above_line(point, line_func):
    x, y = point
    if y > line_func(x):
        return 1
    else:
        return 0


points = np.random.randint(1, 100, (100, 2))
p = Perceptron(2)


def lin1(x):
    return x + 4


# training: one pass over all points
for point in points:
    p.adjust(above_line(point, lin1), p(point), point)

# evaluation on the same points
evaluation = Counter()
for point in points:
    if p(point) == above_line(point, lin1):
        evaluation["correct"] += 1
    else:
        evaluation["wrong"] += 1

print(evaluation.most_common())
```

The previous code returned the following output (the exact numbers vary from run to run, because the points and the initial weights are random):

```
[('correct', 86), ('wrong', 14)]
```
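The `adjust` method shifts each weight by error × input × learning rate, where the error is the target minus the calculated output. A single update step can be traced by hand. The starting weights and the sample point below are made-up values for illustration only.

```python
import numpy as np

learning_rate = 0.1
weights = np.array([0.5, -0.5])     # hypothetical starting weights
point = np.array([2.0, 7.0])        # lies above the line y = x + 4, so the target is 1
target = 1

# forward pass with the step function used above (0 if the sum is negative, else 1)
weighted_sum = (weights * point).sum()        # 0.5*2 - 0.5*7 = -2.5
calculated = 0 if weighted_sum < 0 else 1     # 0, i.e. a misclassification

# update rule: w_i += error * x_i * learning_rate
error = target - calculated                   # 1 - 0 = 1
weights += error * point * learning_rate      # adds [0.2, 0.7]
print(weights)
```

If the point had been classified correctly, the error would be 0 and the weights would stay unchanged.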

The decision boundary of our previous network can be calculated by looking at the following condition:

\[x_1 w_1 + x_2w_2 = 0\]

We can change the equation into

\[ x_2 = -\frac{w_1}{w_2}x_1\]

When we look at the general form of a straight line \(y = mx + b\), we can easily see that our equation corresponds to the definition of a line, where the slope (aka gradient) \(m\) is \(-\frac{w_1}{w_2}\) and \(b\) is equal to 0.

As the constant term \(b\) determines the point at which a line crosses the y-axis, i.e. the y-intercept, we can see that our network can only calculate lines which pass through the origin, i.e. the point (0, 0). We will need a bias to get other lines as well.
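To make the derivation concrete, the following sketch computes the slope of the boundary from a pair of weights and checks that points on the resulting line give a weighted sum of zero. The weights are arbitrary illustrative values, and `boundary_slope` is a hypothetical helper name.

```python
import numpy as np

def boundary_slope(weights):
    """Slope m = -w1/w2 of the line x1*w1 + x2*w2 = 0 (requires w2 != 0)."""
    w1, w2 = weights
    return -w1 / w2

weights = np.array([-3.0, 2.0])     # illustrative values
m = boundary_slope(weights)          # -(-3) / 2 = 1.5

# every point (x1, m*x1) on the line gives a weighted sum of zero;
# x1 = 0 yields the origin, which is why the intercept is fixed at 0
for x1 in (-2.0, 0.0, 4.0):
    on_line = np.array([x1, m * x1])
    print(on_line, np.dot(weights, on_line))
```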

We now add some code to plot the points and the dividing line according to the previous equation:

```python
# the following line is only needed
# if you use "ipython notebook" (uncomment it there):
# %matplotlib inline

from matplotlib import pyplot as plt

# split the points into the two classes
cls = [[], []]
for point in points:
    cls[above_line(point, lin1)].append(tuple(point))

colours = ("r", "b")
for i in range(2):
    X, Y = zip(*cls[i])
    plt.scatter(X, Y, c=colours[i])

X = np.arange(-3, 120)

# slope of the decision boundary found by the perceptron
m = -p.weights[0] / p.weights[1]
print(m)
plt.plot(X, m*X, label="ANN line")
plt.plot(X, lin1(X), label="line1")
plt.legend()
plt.show()
```

After having executed the Python code above we received the following:

```
1.45889598341
```

We create a new dataset for our next experiments:

```python
from matplotlib import pyplot as plt

class1 = [(3, 4), (4.2, 5.3), (4, 3), (6, 5), (4, 6), (3.7, 5.8),
          (3.2, 4.6), (5.2, 5.9), (5, 4), (7, 4), (3, 7), (4.3, 4.3)]
class2 = [(-3, -4), (-2, -3.5), (-1, -6), (-3, -4.3), (-4, -5.6),
          (-3.2, -4.8), (-2.3, -4.3), (-2.7, -2.6), (-1.5, -3.6),
          (-3.6, -5.6), (-4.5, -4.6), (-3.7, -5.8)]

X, Y = zip(*class1)
plt.scatter(X, Y, c="r")
X, Y = zip(*class2)
plt.scatter(X, Y, c="b")
plt.show()
```

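```python
from itertools import chain

p = Perceptron(2)


def lin1(x):
    return x + 4


# training: class1 has the target 1, class2 the target 0
for point in class1:
    p.adjust(1, p(point), point)
for point in class2:
    p.adjust(0, p(point), point)

# note: this loop counts an output of 1 as "correct" for every point,
# it does not compare the output with the true label of the point
evaluation = Counter()
for point in chain(class1, class2):
    if p(point) == 1:
        evaluation["correct"] += 1
    else:
        evaluation["wrong"] += 1

testpoints = [(3.9, 6.9), (-2.9, -5.9)]
for point in testpoints:
    print(p(point))

print(evaluation.most_common())
```

The previous Python code returned the following output:

```
1
0
[('correct', 12), ('wrong', 12)]
```

Note that the evaluation loop treats an output of 1 as "correct" regardless of the class a point belongs to. The 12 entries under "wrong" are therefore simply the points for which the perceptron returned 0, not necessarily misclassified points.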

```python
from matplotlib import pyplot as plt

X, Y = zip(*class1)
plt.scatter(X, Y, c="r")
X, Y = zip(*class2)
plt.scatter(X, Y, c="b")

x = np.arange(-7, 10)
y = 5*x + 10
# decision boundary through the origin with slope -w1/w2
m = -p.weights[0] / p.weights[1]
plt.plot(x, m*x)
plt.show()
```

```python
from matplotlib import pyplot as plt

class1 = [(3, 4, 3), (4.2, 5.3, 2.5), (4, 3, 3.8), (6, 5, 2.7),
          (4, 6, 2.9), (3.7, 5.8, 4.2), (3.2, 4.6, 1.9), (5.2, 5.9, 2.7),
          (5, 4, 3.5), (7, 4, 2.7), (3, 7, 3.1), (4.3, 4.3, 3.8)]
class2 = [(-3, -4, 7.6), (-2, -3.5, 6.9), (-1, -6, 8.6), (-3, -4.3, 7.4),
          (-4, -5.6, 7.9), (-3.2, -4.8, 5.3), (-2.3, -4.3, 8.1),
          (-2.7, -2.6, 7.3), (-1.5, -3.6, 7.8), (-3.6, -5.6, 6.8),
          (-4.5, -4.6, 8.3), (-3.7, -5.8, 8.7)]

# note: the third positional argument of plt.scatter is the marker size,
# so the third coordinate Z is shown as the size of each point
X, Y, Z = zip(*class1)
plt.scatter(X, Y, Z, c="r")
X, Y, Z = zip(*class2)
plt.scatter(X, Y, Z, c="b")
plt.show()
```

## Single Layer with Bias

If two data clusters (classes) can be separated by a decision boundary in the form of a linear equation

\[\sum_{i=1}^{n} x_i \cdot w_i = 0\]

they are called linearly separable.

Otherwise, i.e. if such a decision boundary does not exist, the two classes are called linearly inseparable. In this case, we cannot use a simple neural network.

For this purpose, we need neural networks with bias nodes, like the one in the following diagram:

Now, the linear equation for a perceptron contains a bias:

\[b + \sum_{i=1}^{n} x_i \cdot w_i = 0\]
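One common way to handle the bias in code, sketched here as an assumption rather than as the only possible design, is to append a constant input of 1 to every input vector, so that the bias becomes just one more weight. The function names and sample values below are hypothetical.

```python
import numpy as np

def weighted_sum_with_bias(x, w, b):
    """b + sum_i x_i * w_i, computed directly."""
    return b + np.dot(x, w)

def weighted_sum_augmented(x, w_aug):
    """Same value, with the bias stored as the last entry of w_aug."""
    x_aug = np.append(x, 1.0)        # constant input 1 for the bias weight
    return np.dot(x_aug, w_aug)

x = np.array([2.0, -1.0])            # illustrative input
w = np.array([0.5, 1.5])             # illustrative weights
b = 0.25                             # illustrative bias

direct = weighted_sum_with_bias(x, w, b)
augmented = weighted_sum_augmented(x, np.append(w, b))
print(direct, augmented)             # both formulations give the same value
```

Storing the bias as an additional weight means that the same update rule can adjust it during training, just like the other weights.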

In the following section, we will introduce the XOR problem for neural networks. It is the simplest example of a problem that is not linearly separable. It can be solved with an additional layer of neurons, which is called a hidden layer.

## The XOR Problem for Neural Networks

The XOR (exclusive or) function is defined by the following truth table:

Input1 | Input2 | XOR Output |
---|---|---|
0 | 0 | 0 |
0 | 1 | 1 |
1 | 0 | 1 |
1 | 1 | 0 |

This problem can't be solved with a simple neural network. We need to introduce a new type of neural network, a network with so-called hidden layers. A hidden layer allows the network to reorganize or rearrange the input data.
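The claim can be illustrated with a small experiment. The sketch below tries every combination of two weights and a bias on a coarse grid for a single threshold neuron. It finds settings for "And" and "Or" but none for XOR. The grid, the function names, and the firing condition (sum greater than 0) are assumptions made for this illustration, and a grid search is a demonstration, not a proof.

```python
import itertools
import numpy as np

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = {"AND": [0, 0, 0, 1], "OR": [0, 1, 1, 1], "XOR": [0, 1, 1, 0]}

def single_unit(w1, w2, b, x):
    """One threshold neuron with bias: fires if w1*x1 + w2*x2 + b > 0."""
    return 1 if w1 * x[0] + w2 * x[1] + b > 0 else 0

def find_weights(target, grid):
    """Return the first (w1, w2, b) on the grid reproducing target, or None."""
    for w1, w2, b in itertools.product(grid, repeat=3):
        if [single_unit(w1, w2, b, x) for x in inputs] == target:
            return (w1, w2, b)
    return None

grid = np.linspace(-2, 2, 17)        # coarse grid in steps of 0.25
for name, target in targets.items():
    print(name, find_weights(target, grid))
```

For XOR the search returns `None`: the outputs for (0, 1) and (1, 0) require \(w_1 + b > 0\) and \(w_2 + b > 0\), while (0, 0) requires \(b \le 0\), so \(w_1 + w_2 + b\) is necessarily positive and the neuron fires for (1, 1) as well.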

ANN with hidden layers:

The task is to find a line which separates the orange points from the blue points. No single line can do this, but the points can be separated by two lines, e.g. \(L_1\) and \(L_2\) in the following diagram:

To solve this problem, we need a network of the following kind, i.e. with a hidden layer consisting of the neurons \(N_1\) and \(N_2\):

The neuron \(N_1\) will determine one line, e.g. \(L_1\), and the neuron \(N_2\) will determine the other line \(L_2\). \(N_3\) will finally solve our problem:

### Neural Network with Bias Values

We will come back now to our initial example with the random points above and below a line. We will rewrite the code using a bias value.

First we will create two classes with random points, which are not separable by a line crossing the origin.

We will add a bias \(b\) to our neural network. This leads us to the following condition:

\[x_1 w_1 + x_2w_2 + b w_3= 0\]

We can change the equation into:

\[ x_2 = -\frac{w_1}{w_2}x_1 -\frac{w_3}{w_2}b\]
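The rearranged equation gives both the slope and the intercept of the decision boundary. The following sketch evaluates them for made-up weights and checks that a point on the resulting line satisfies the original condition. `boundary_with_bias` is a hypothetical helper, and the weights are not taken from a trained network.

```python
def boundary_with_bias(w1, w2, w3, b=1.0):
    """Slope and intercept of x1*w1 + x2*w2 + b*w3 = 0, solved for x2 (w2 != 0)."""
    slope = -w1 / w2
    intercept = -w3 / w2 * b
    return slope, intercept

# illustrative weights
w1, w2, w3 = 1.0, -2.0, 4.0
slope, intercept = boundary_with_bias(w1, w2, w3)
print(slope, intercept)              # slope 0.5, intercept 2.0

# a point on the line satisfies the original condition
x1 = 3.0
x2 = slope * x1 + intercept          # 3.5
print(x1 * w1 + x2 * w2 + 1.0 * w3)  # 3 - 7 + 4 = 0
```

Because the intercept is no longer forced to be 0, the boundary can now be any line with a finite slope, not only one through the origin.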

```python
import numpy as np
from matplotlib import pyplot as plt

npoints = 50
X, Y = [], []
# class 0
X.append(np.random.uniform(low=-2.5, high=2.3, size=(npoints,)))
Y.append(np.random.uniform(low=-1.7, high=2.8, size=(npoints,)))
# class 1
X.append(np.random.uniform(low=-7.2, high=-4.4, size=(npoints,)))
Y.append(np.random.uniform(low=3, high=6.5, size=(npoints,)))

learnset = []
for i in range(2):
    # adding points of class i to learnset
    points = zip(X[i], Y[i])
    for p in points:
        learnset.append((p, i))

colours = ["b", "r"]
for i in range(2):
    plt.scatter(X[i], Y[i], c=colours[i])
```

```python
import numpy as np
from collections import Counter


class Perceptron:

    def __init__(self, input_length, weights=None):
        if weights is None:
            self.weights = np.random.random((input_length)) * 2 - 1
        else:
            # use the weights supplied by the caller
            self.weights = np.array(weights, dtype=float)
        self.learning_rate = 0.1

    @staticmethod
    def unit_step_function(x):
        if x < 0:
            return 0
        return 1

    def __call__(self, in_data):
        weighted_input = self.weights * in_data
        weighted_sum = weighted_input.sum()
        return Perceptron.unit_step_function(weighted_sum)

    def adjust(self, target_result, calculated_result, in_data):
        error = target_result - calculated_result
        for i in range(len(in_data)):
            correction = error * in_data[i] * self.learning_rate
            self.weights[i] += correction


p = Perceptron(2)

for point, label in learnset:
    p.adjust(label, p(point), point)

evaluation = Counter()
for point, label in learnset:
    if p(point) == label:
        evaluation["correct"] += 1
    else:
        evaluation["wrong"] += 1

print(evaluation.most_common())

colours = ["b", "r"]
for i in range(2):
    plt.scatter(X[i], Y[i], c=colours[i])

XR = np.arange(-8, 4)
# without a bias the boundary is forced through the origin
m = -p.weights[0] / p.weights[1]
print(m)
plt.plot(XR, m*XR, label="decision boundary")
plt.legend()
plt.show()
```

The above Python code returned the following result:

```
[('correct', 77), ('wrong', 23)]
3.10186712936
```

```python
import numpy as np
from collections import Counter


class Perceptron:

    def __init__(self, input_length, weights=None):
        if weights is None:
            # input_length + 1 because bias needs a weight as well
            self.weights = np.random.random((input_length + 1)) * 2 - 1
        else:
            # expects input_length + 1 values, the last one for the bias
            self.weights = np.array(weights, dtype=float)
        self.learning_rate = 0.05
        self.bias = 1

    @staticmethod
    def sigmoid_function(x):
        res = 1 / (1 + np.power(np.e, -x))
        return 0 if res < 0.5 else 1

    def __call__(self, in_data):
        weighted_input = self.weights[:-1] * in_data
        weighted_sum = weighted_input.sum() + self.bias * self.weights[-1]
        return Perceptron.sigmoid_function(weighted_sum)

    def adjust(self, target_result, calculated_result, in_data):
        error = target_result - calculated_result
        for i in range(len(in_data)):
            correction = error * in_data[i] * self.learning_rate
            #print("weights: ", self.weights)
            #print(target_result, calculated_result, in_data, error, correction)
            self.weights[i] += correction
        # correct the bias:
        correction = error * self.bias * self.learning_rate
        self.weights[-1] += correction


p = Perceptron(2)

for point, label in learnset:
    p.adjust(label, p(point), point)

evaluation = Counter()
for point, label in learnset:
    if p(point) == label:
        evaluation["correct"] += 1
    else:
        evaluation["wrong"] += 1

print(evaluation.most_common())

colours = ["b", "r"]
for i in range(2):
    plt.scatter(X[i], Y[i], c=colours[i])

XR = np.arange(-8, 4)
# slope and intercept of the decision boundary (bias input is 1)
m = -p.weights[0] / p.weights[1]
b = -p.weights[-1] / p.weights[1]
print(m, b)
plt.plot(XR, m*XR + b, label="decision boundary")
plt.legend()
plt.show()
```

The above code returned the following result:

```
[('correct', 90), ('wrong', 10)]
-5.07932788718 -6.08697420041
```

For the XOR problem, we will need only one hidden layer with two neurons. One works like an AND gate and the other one like an OR gate. The output will "fire" when the OR gate fires and the AND gate doesn't.
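The construction with an AND gate and an OR gate can be written down directly with hand-picked weights, without any training. The weights and biases below are assumptions chosen so that each neuron behaves like the gate named in its comment. Other values work as well.

```python
def step(x):
    """Threshold activation: fire if the weighted sum is positive."""
    return 1 if x > 0 else 0

def xor_network(x1, x2):
    # hidden layer (hand-picked weights and biases, for illustration only)
    n_or = step(1.0 * x1 + 1.0 * x2 - 0.5)     # fires if at least one input is 1
    n_and = step(1.0 * x1 + 1.0 * x2 - 1.5)    # fires only if both inputs are 1
    # output neuron: fires when OR fires and AND does not
    return step(1.0 * n_or - 1.0 * n_and - 0.5)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, xor_network(x1, x2))
```

The printed third column reproduces the XOR truth table: 0, 1, 1, 0.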