## Neural Networks

### Introduction

When we say "Neural Networks", we mean artificial neural networks (ANNs). The idea behind ANNs is based on biological neural networks such as the brain.

The basic structure of a neural network is the neuron. A neuron in biology consists of three major parts: the soma (cell body), the dendrites, and the axon.

The dendrites branch off from the soma in a tree-like way and get thinner with every branch. They receive signals (impulses) from other neurons at synapses. The axon (there is always only one) also leaves the soma and usually extends over longer distances than the dendrites. The axon sends the output of the neuron to other neurons or, more precisely, to the synapses of other neurons.

The following image by Quasar Jarosz, courtesy of Wikipedia, illustrates this:

Even though the above image is already an abstraction for a biologist, we can further abstract it:

A perceptron in an artificial neural network simulates a biological neuron.

What goes on inside a perceptron or neuron is amazingly simple. The input signals get multiplied by weight values, i.e. each input has its corresponding weight. This way every input $x_i$ can be adjusted individually. We can see all the inputs as an input vector and the corresponding weights as the weight vector.

When a signal comes in, it gets multiplied by the weight value that is assigned to this particular input. That is, if a neuron has three inputs, then it has three weights that can be adjusted individually. The weights usually get adjusted during the learning phase.

After this, the weighted input signals are summed up. It is also possible to add a so-called bias $b$ to this sum. The bias is a value which can also be adjusted during the learning phase.

Finally, the actual output has to be determined. For this purpose an activation or step function $\Phi$ is applied to the weighted sum of the input values.

The simplest form of an activation function is a binary function. If the result of the summation is greater than some threshold s, the result of $\Phi$ will be 1, otherwise 0.
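The steps described so far (weighting, summing, adding a bias, thresholding) can be sketched in a few lines of plain Python. This is only a minimal illustration: the function name `neuron_output` and the example numbers are assumptions made here, not part of the original text.

```python
def neuron_output(inputs, weights, bias=0.0, threshold=0.0):
    """Return 1 if the weighted sum plus bias exceeds the threshold, else 0."""
    # multiply every input by its own weight and add everything up
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    # binary activation: fire only above the threshold
    return 1 if weighted_sum + bias > threshold else 0

# hypothetical example values: three inputs with three individual weights
inputs = [1.0, 0.5, -2.0]
weights = [0.5, 1.0, 0.25]
print(neuron_output(inputs, weights, bias=0.25, threshold=0.5))
```

Here the weighted sum is 0.5 and the bias lifts it to 0.75, which is above the threshold of 0.5, so the neuron fires. Without the bias the sum would only equal the threshold and the output would be 0.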

$$ \Phi(x) = \left\{ \begin{array}{rl} 1 & \mbox{if } w \cdot x + b > s \\ 0 & \mbox{otherwise} \end{array} \right. $$

## A Simple Neural Network

The following image shows the general building principle of a simple artificial neural network:

We will write a very simple Neural Network implementing the logical "And" and "Or" functions.

Let's start with the "And" function. It is defined for two inputs:

Input1 | Input2 | Output |
---|---|---|
0 | 0 | 0 |
0 | 1 | 0 |
1 | 0 | 0 |
1 | 1 | 1 |

## Line Separation

You could imagine that you have two attributes describing an edible object like a fruit, for example "sweetness" and "sourness".

We could describe this by points in a two-dimensional space, with the x-axis for the sweetness and the y-axis for the sourness. Imagine now that we have two fruits as points in this space, e.g. an orange at position (3.5, 1.8) and a lemon at (1.1, 3.9).

We could draw dividing lines to decide which points are more lemon-like and which are more orange-like. The following program calculates and renders a bunch of lines. The red ones are completely unusable for this purpose, because they do not separate the classes. Yet it is obvious that even the green ones are not all equally useful.

```python
import numpy as np
import matplotlib.pyplot as plt


def create_distance_function(a, b, c):
    """ 0 = ax + by + c """
    def distance(x, y):
        """ returns tuple (d, pos)
            d is the distance
            If pos == -1 point is below the line,
            0 on the line and +1 if above the line
        """
        nom = a * x + b * y + c
        if nom == 0:
            pos = 0
        elif (nom < 0 and b < 0) or (nom > 0 and b > 0):
            # y is larger than the y-value of the line at x
            pos = 1
        else:
            pos = -1
        return (np.absolute(nom) / np.sqrt(a ** 2 + b ** 2), pos)
    return distance


points = [(3.5, 1.8), (1.1, 3.9)]

fig, ax = plt.subplots()
ax.set_xlabel("sweetness")
ax.set_ylabel("sourness")
ax.set_xlim([-1, 6])
ax.set_ylim([-1, 8])
X = np.arange(-0.5, 5, 0.1)

# plot the two samples: the orange first, then the lemon
size = 10
for (index, (x, y)) in enumerate(points):
    if index == 0:
        ax.plot(x, y, "o", color="darkorange", markersize=size)
    else:
        ax.plot(x, y, "oy", markersize=size)

step = 0.05
for x in np.arange(0, 1 + step, step):
    slope = np.tan(np.arccos(x))
    dist4line1 = create_distance_function(slope, -1, 0)
    Y = slope * X

    results = []
    for point in points:
        results.append(dist4line1(*point))
    # green if the two points lie on different sides of the line
    if results[0][1] != results[1][1]:
        ax.plot(X, Y, "g-")
    else:
        ax.plot(X, Y, "r-")

plt.show()
```

In the following program, we train a neural network to classify two clusters in a two-dimensional space. We show this in the following diagram with the two classes class1 and class2. We will create those points randomly with the help of a line: the points of class2 will be above the line and the points of class1 will be below it.

```python
import numpy as np


class Perceptron:

    def __init__(self, input_length, weights=None):
        if weights is None:
            self.weights = np.ones(input_length) * 0.5
        else:
            self.weights = weights

    @staticmethod
    def unit_step_function(x):
        if x > 0.5:
            return 1
        return 0

    def __call__(self, in_data):
        weighted_input = self.weights * in_data
        weighted_sum = weighted_input.sum()
        return Perceptron.unit_step_function(weighted_sum)


p = Perceptron(2, np.array([0.5, 0.5]))
for in1 in range(2):
    for in2 in range(2):
        data_in = (in1, in2)
        data_out = p(data_in)
        print(data_in, data_out)
```

```
(0, 0) 0
(0, 1) 0
(1, 0) 0
(1, 1) 1
```
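The same idea carries over to the "Or" function mentioned earlier. The sketch below is self-contained and does not reuse the `Perceptron` class. The weights (1, 1) with a threshold of 0.5 are one possible hand-picked choice, assumed here for illustration, and the function name `or_perceptron` is hypothetical.

```python
def or_perceptron(in1, in2, weights=(1.0, 1.0), threshold=0.5):
    """Hand-weighted perceptron: fires if at least one input is 1."""
    weighted_sum = in1 * weights[0] + in2 * weights[1]
    return 1 if weighted_sum > threshold else 0

for in1 in range(2):
    for in2 in range(2):
        print((in1, in2), or_perceptron(in1, in2))
```

Only the input (0, 0) stays below the threshold, so the output is 0 for this pair and 1 for the other three.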

We will see that the neural network finds a line that separates the two classes. This line should not be mistaken for the line which we used to create the points.

This line is called a **decision boundary**.

```python
import numpy as np
from collections import Counter


class Perceptron:

    def __init__(self, input_length, weights=None):
        if weights is None:
            # random initial weights in the interval [-1, 1)
            self.weights = np.random.random(input_length) * 2 - 1
        else:
            self.weights = np.array(weights, dtype=float)
        self.learning_rate = 0.1

    @staticmethod
    def unit_step_function(x):
        if x < 0:
            return 0
        return 1

    def __call__(self, in_data):
        weighted_input = self.weights * in_data
        weighted_sum = weighted_input.sum()
        return Perceptron.unit_step_function(weighted_sum)

    def adjust(self, target_result, calculated_result, in_data):
        error = target_result - calculated_result
        for i in range(len(in_data)):
            correction = error * in_data[i] * self.learning_rate
            self.weights[i] += correction


def above_line(point, line_func):
    x, y = point
    if y > line_func(x):
        return 1
    else:
        return 0


points = np.random.randint(1, 100, (100, 2))
p = Perceptron(2)


def lin1(x):
    return x + 4


# one training pass over all points
for point in points:
    p.adjust(above_line(point, lin1), p(point), point)

# evaluate on the same points
evaluation = Counter()
for point in points:
    if p(point) == above_line(point, lin1):
        evaluation["correct"] += 1
    else:
        evaluation["wrong"] += 1

print(evaluation.most_common())
```

[('correct', 100)]
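The `adjust` method in the code above applies what is commonly called the perceptron learning rule. Written as a formula, with $\eta$ for the learning rate, $t$ for the target result and $o$ for the calculated result (symbol names chosen here for illustration), each weight is updated as follows:

$$ w_i \leftarrow w_i + \eta \cdot (t - o) \cdot x_i $$

For example, with $\eta = 0.1$, $t = 1$, $o = 0$ and $x_i = 20$, the weight $w_i$ grows by $0.1 \cdot 1 \cdot 20 = 2$. If the prediction is already correct, then $t - o = 0$ and the weight stays unchanged.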

The decision boundary of our previous network can be calculated by looking at the following condition:

$$x_1 w_1 + x_2 w_2 = 0$$

We can change the equation into

$$ x_2 = -\frac{w_1}{w_2}x_1$$

When we look at the general form of a straight line $y = mx + b$, we can easily see that our equation corresponds to the definition of a line, where the slope (aka gradient) $m$ is $-\frac{w_1}{w_2}$ and $b$ is equal to 0.
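To make the relationship between weights and slope concrete, the sketch below picks hypothetical weights, computes the slope $m = -\frac{w_1}{w_2}$ and checks that points on the resulting line produce a weighted sum of zero. The helper name `boundary_slope` and the weight values are assumptions made for this example.

```python
def boundary_slope(w1, w2):
    """Slope of the decision boundary x1*w1 + x2*w2 = 0 (requires w2 != 0)."""
    return -w1 / w2

# hypothetical weights, chosen so that all values are exact in floating point
w1, w2 = 0.5, -0.25
m = boundary_slope(w1, w2)
print(m)

# every point (x1, m * x1) lies on the boundary: its weighted sum is zero
for x1 in (-2.0, 0.0, 3.0):
    x2 = m * x1
    print(x1, x2, x1 * w1 + x2 * w2)
```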

### Single Layer with Bias

As the constant term $b$ determines the point at which a line crosses the y-axis, i.e. the y-intercept, we can see that our network can only calculate lines which pass through the origin, i.e. the point (0, 0). We will need a bias to get other lines as well, i.e. lines which don't go through the origin. A neural network with bias nodes can look like this:

Now, the linear equation for a perceptron contains a bias:

$$b + \sum_{i=1}^{n} x_i \cdot w_i = 0$$

We now add some code to print the points and the dividing line according to the previous equation:

```python
# the following line is only needed
# if you use "ipython notebook":
# %matplotlib inline

from matplotlib import pyplot as plt

# sort the points into the two classes (below / above lin1)
cls = [[], []]
for point in points:
    cls[above_line(point, lin1)].append(tuple(point))

colours = ("r", "b")
for i in range(2):
    X, Y = zip(*cls[i])
    plt.scatter(X, Y, c=colours[i])

X = np.arange(-3, 120)

# slope of the decision boundary found by the perceptron
m = -p.weights[0] / p.weights[1]
print(m)
plt.plot(X, m*X, label="ANN line")
plt.plot(X, lin1(X), label="line1")
plt.legend()
plt.show()
```

1.11082111934

We create a new dataset for our next experiments:

```python
from matplotlib import pyplot as plt

class1 = [(3, 4), (4.2, 5.3), (4, 3), (6, 5), (4, 6), (3.7, 5.8),
          (3.2, 4.6), (5.2, 5.9), (5, 4), (7, 4), (3, 7), (4.3, 4.3)]
class2 = [(-3, -4), (-2, -3.5), (-1, -6), (-3, -4.3), (-4, -5.6),
          (-3.2, -4.8), (-2.3, -4.3), (-2.7, -2.6), (-1.5, -3.6),
          (-3.6, -5.6), (-4.5, -4.6), (-3.7, -5.8)]

X, Y = zip(*class1)
plt.scatter(X, Y, c="r")
X, Y = zip(*class2)
plt.scatter(X, Y, c="b")
plt.show()
```

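```python
from itertools import chain

p = Perceptron(2)

# train with target 1 for class1 and target 0 for class2
for point in class1:
    p.adjust(1, p(point), point)
for point in class2:
    p.adjust(0, p(point), point)

# Note: this tally counts every prediction of 1 as "correct".
# The points of class2 (target 0) are therefore listed under "wrong"
# even when the perceptron classifies them as 0.
evaluation = Counter()
for point in chain(class1, class2):
    if p(point) == 1:
        evaluation["correct"] += 1
    else:
        evaluation["wrong"] += 1

testpoints = [(3.9, 6.9), (-2.9, -5.9)]
for point in testpoints:
    print(p(point))

print(evaluation.most_common())
```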

```
1
0
[('correct', 12), ('wrong', 12)]
```

```python
from matplotlib import pyplot as plt

X, Y = zip(*class1)
plt.scatter(X, Y, c="r")
X, Y = zip(*class2)
plt.scatter(X, Y, c="b")

# decision boundary through the origin with slope -w1/w2
x = np.arange(-7, 10)
m = -p.weights[0] / p.weights[1]
plt.plot(x, m*x)
plt.show()
```

```python
from matplotlib import pyplot as plt

class1 = [(3, 4, 3), (4.2, 5.3, 2.5), (4, 3, 3.8), (6, 5, 2.7),
          (4, 6, 2.9), (3.7, 5.8, 4.2), (3.2, 4.6, 1.9), (5.2, 5.9, 2.7),
          (5, 4, 3.5), (7, 4, 2.7), (3, 7, 3.1), (4.3, 4.3, 3.8)]
class2 = [(-3, -4, 7.6), (-2, -3.5, 6.9), (-1, -6, 8.6), (-3, -4.3, 7.4),
          (-4, -5.6, 7.9), (-3.2, -4.8, 5.3), (-2.3, -4.3, 8.1),
          (-2.7, -2.6, 7.3), (-1.5, -3.6, 7.8), (-3.6, -5.6, 6.8),
          (-4.5, -4.6, 8.3), (-3.7, -5.8, 8.7)]

# the points have three coordinates, so they are drawn on 3D axes;
# on 2D axes the third positional argument of scatter is the marker size
fig = plt.figure()
ax = fig.add_subplot(projection="3d")
X, Y, Z = zip(*class1)
ax.scatter(X, Y, Z, c="r")
X, Y, Z = zip(*class2)
ax.scatter(X, Y, Z, c="b")
plt.show()
```

### Linearly Separable and Inseparable Neural Networks

If two data clusters (classes) can be separated by a decision boundary in the form of a linear equation

$$\sum_{i=1}^{n} x_i \cdot w_i = 0$$

they are called linearly separable.

Otherwise, i.e. if such a decision boundary does not exist, the two classes are called linearly inseparable. In this case, we cannot use a simple neural network.
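A small experiment can illustrate the difference. The sketch below trains a single perceptron with a bias term (as introduced earlier) using the perceptron learning rule, once on the AND function and once on the XOR function. All names (`predict`, `train`, `count_correct`), the learning rate of 1 and the choice of 50 epochs are assumptions made for this example.

```python
def predict(weights, bias, point):
    """Step activation on the weighted sum plus bias."""
    total = sum(w * x for w, x in zip(weights, point)) + bias
    return 1 if total > 0 else 0

def train(samples, epochs=50):
    """Perceptron learning rule with learning rate 1 (integer arithmetic)."""
    weights, bias = [0, 0], 0
    for _ in range(epochs):
        for point, target in samples:
            error = target - predict(weights, bias, point)
            weights = [w + error * x for w, x in zip(weights, point)]
            bias += error
    return weights, bias

def count_correct(samples, weights, bias):
    """Number of samples whose prediction matches the target."""
    return sum(predict(weights, bias, p) == t for p, t in samples)

and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
xor_samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

for name, samples in (("AND", and_samples), ("XOR", xor_samples)):
    weights, bias = train(samples)
    print(name, count_correct(samples, weights, bias), "of 4 correct")
```

For AND the training settles on weights that classify all four inputs correctly. For XOR at least one input always stays misclassified, whatever the number of epochs, because no single line separates the two classes.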

In the following section, we will introduce the XOR problem for neural networks. It is the simplest example of a problem that is not linearly separable. It can be solved with an additional layer of neurons, which is called a hidden layer.

### The XOR Problem for Neural Networks

The XOR (exclusive or) function is defined by the following truth table:

Input1 | Input2 | XOR Output |
---|---|---|
0 | 0 | 0 |
0 | 1 | 1 |
1 | 0 | 1 |
1 | 1 | 0 |

This problem can't be solved with a simple neural network. We need to introduce a new type of neural network, a network with so-called hidden layers. A hidden layer allows the network to reorganize or rearrange the input data.

We will need only one hidden layer with two neurons. One works like an AND gate and the other one like an OR gate. The output will "fire", when the OR gate fires and the AND gate doesn't.
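The construction just described can be sketched with three hand-weighted neurons. The weights and thresholds below are one possible choice, assumed for illustration, and the function names are hypothetical.

```python
def step(weighted_sum, threshold):
    """Binary activation: 1 above the threshold, otherwise 0."""
    return 1 if weighted_sum > threshold else 0

def and_neuron(x1, x2):
    return step(x1 + x2, 1.5)      # fires only for (1, 1)

def or_neuron(x1, x2):
    return step(x1 + x2, 0.5)      # fires if at least one input is 1

def xor_network(x1, x2):
    # hidden layer: one AND-like and one OR-like neuron
    h_and, h_or = and_neuron(x1, x2), or_neuron(x1, x2)
    # output neuron: weight +1 for OR, weight -1 for AND,
    # so it fires when OR fires and AND does not
    return step(h_or - h_and, 0.5)

for x1 in range(2):
    for x2 in range(2):
        print((x1, x2), xor_network(x1, x2))
```

The printed values reproduce the XOR truth table: 0 for (0, 0) and (1, 1), 1 for the two mixed inputs.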

ANN with hidden layers:

The task is to find a line which separates the orange points from the blue points. No single line can do this, but the points can be separated by two lines, e.g. $L_1$ and $L_2$ in the following diagram:

To solve this problem, we need a network of the following kind, i.e. with a hidden layer consisting of $N_1$ and $N_2$:

The neuron $N_1$ will determine one line, e.g. $L_1$, and the neuron $N_2$ will determine the other line $L_2$. $N_3$ will finally solve our problem:

### Neural Network with Bias Values

We will come back now to our initial example with the random points above and below a line. We will rewrite the code using a bias value.

First we will create two classes with random points, which are not separable by a line crossing the origin.

We will add a bias $b$ to our neural network. This leads us to the following condition:

$$x_1 w_1 + x_2 w_2 + b w_3 = 0$$

We can change the equation into

$$ x_2 = -\frac{w_1}{w_2}x_1 -\frac{w_3}{w_2}b$$

```python
import numpy as np
from matplotlib import pyplot as plt

npoints = 50
X, Y = [], []
# class 0
X.append(np.random.uniform(low=-2.5, high=2.3, size=(npoints,)))
Y.append(np.random.uniform(low=-1.7, high=2.8, size=(npoints,)))
# class 1
X.append(np.random.uniform(low=-7.2, high=-4.4, size=(npoints,)))
Y.append(np.random.uniform(low=3, high=6.5, size=(npoints,)))

learnset = []
for i in range(2):
    # adding points of class i to learnset
    points = zip(X[i], Y[i])
    for p in points:
        learnset.append((p, i))

colours = ["b", "r"]
for i in range(2):
    plt.scatter(X[i], Y[i], c=colours[i])
plt.show()
```

```python
import numpy as np
from collections import Counter
from matplotlib import pyplot as plt


class Perceptron:

    def __init__(self, input_length, weights=None):
        if weights is None:
            # random initial weights in the interval [-1, 1)
            self.weights = np.random.random(input_length) * 2 - 1
        else:
            self.weights = np.array(weights, dtype=float)
        self.learning_rate = 0.1

    @staticmethod
    def unit_step_function(x):
        if x < 0:
            return 0
        return 1

    def __call__(self, in_data):
        weighted_input = self.weights * in_data
        weighted_sum = weighted_input.sum()
        return Perceptron.unit_step_function(weighted_sum)

    def adjust(self, target_result, calculated_result, in_data):
        error = target_result - calculated_result
        for i in range(len(in_data)):
            correction = error * in_data[i] * self.learning_rate
            self.weights[i] += correction


p = Perceptron(2)

for point, label in learnset:
    p.adjust(label, p(point), point)

evaluation = Counter()
for point, label in learnset:
    if p(point) == label:
        evaluation["correct"] += 1
    else:
        evaluation["wrong"] += 1

print(evaluation.most_common())

colours = ["b", "r"]
for i in range(2):
    plt.scatter(X[i], Y[i], c=colours[i])

# without a bias the decision boundary passes through the origin
XR = np.arange(-8, 4)
m = -p.weights[0] / p.weights[1]
print(m)
plt.plot(XR, m*XR, label="decision boundary")
plt.legend()
plt.show()
```

```
[('correct', 77), ('wrong', 23)]
3.10186712936
```

It is not possible to find a solution with one neuron and without a bias node. The reason is that the blue data points are spread around the origin. Without bias nodes we get only lines going through the origin, as mentioned earlier. It is easy to see that no line going through the origin can separate the blue from the red data.
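One way to see this: without a bias, the weighted sum of a point and that of its mirror image through the origin differ only in sign, so the two points end up on opposite sides of the boundary (unless they lie exactly on it). A class that surrounds the origin therefore cannot receive a single label. The sketch below checks this for a few hypothetical weight vectors; the function name `classify` and all numbers are assumptions made for illustration.

```python
def classify(weights, point):
    """Perceptron without bias: a weighted sum below 0 gives 0, otherwise 1."""
    weighted_sum = sum(w * x for w, x in zip(weights, point))
    return 0 if weighted_sum < 0 else 1

point = (1.5, -0.5)
mirrored = (-1.5, 0.5)          # the same point reflected through the origin

# whatever the weights are, the two points get different labels
weight_choices = [(1.0, 1.0), (-2.0, 0.5), (0.25, 3.0)]
for weights in weight_choices:
    print(weights, classify(weights, point), classify(weights, mirrored))
```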

The following class uses bias nodes and solves this problem:

```python
import numpy as np
from collections import Counter
from matplotlib import pyplot as plt


class Perceptron:

    def __init__(self, input_length, weights=None):
        if weights is None:
            # input_length + 1 because bias needs a weight as well
            self.weights = np.random.random(input_length + 1) * 2 - 1
        else:
            self.weights = np.array(weights, dtype=float)
        self.learning_rate = 0.05
        self.bias = 1

    @staticmethod
    def sigmoid_function(x):
        # sigmoid followed by a threshold at 0.5
        res = 1 / (1 + np.exp(-x))
        return 0 if res < 0.5 else 1

    def __call__(self, in_data):
        weighted_input = self.weights[:-1] * in_data
        weighted_sum = weighted_input.sum() + self.bias * self.weights[-1]
        return Perceptron.sigmoid_function(weighted_sum)

    def adjust(self, target_result, calculated_result, in_data):
        error = target_result - calculated_result
        for i in range(len(in_data)):
            correction = error * in_data[i] * self.learning_rate
            self.weights[i] += correction
        # correct the bias:
        correction = error * self.bias * self.learning_rate
        self.weights[-1] += correction


p = Perceptron(2)

for point, label in learnset:
    p.adjust(label, p(point), point)

evaluation = Counter()
for point, label in learnset:
    if p(point) == label:
        evaluation["correct"] += 1
    else:
        evaluation["wrong"] += 1

print(evaluation.most_common())

colours = ["b", "r"]
for i in range(2):
    plt.scatter(X[i], Y[i], c=colours[i])

# slope and y-intercept of the decision boundary
XR = np.arange(-8, 4)
m = -p.weights[0] / p.weights[1]
b = -p.weights[-1] / p.weights[1]
print(m, b)
plt.plot(XR, m*XR + b, label="decision boundary")
plt.legend()
plt.show()
```

```
[('correct', 90), ('wrong', 10)]
-5.07932788718 -6.08697420041
```
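The slope and intercept printed by the program follow directly from the weights. As a sketch with hypothetical weights (not the trained values above), the helper below converts weights $w_1, w_2, w_3$ and a bias input $b$ into the line $x_2 = m x_1 + c$ and checks that a point on that line yields a weighted sum of zero. The name `boundary_line` is an assumption made for this example.

```python
def boundary_line(w1, w2, w3, bias=1.0):
    """Return slope and intercept of x1*w1 + x2*w2 + bias*w3 = 0 (w2 != 0)."""
    slope = -w1 / w2
    intercept = -w3 * bias / w2
    return slope, intercept

# hypothetical weights, chosen so that all values are exact in floating point
w1, w2, w3 = 1.0, 0.5, -2.0
m, c = boundary_line(w1, w2, w3)
print(m, c)

x1 = 3.0
x2 = m * x1 + c                      # a point on the boundary
print(x1 * w1 + x2 * w2 + 1.0 * w3)  # weighted sum including the bias term
```

With these weights the line has slope -2 and crosses the y-axis at 4, so unlike the bias-free case it no longer has to pass through the origin.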