Numpy Tutorial


Introduction

Visualisation of a matrix using a Hinton diagram

NumPy is an open-source extension module for Python. It provides fast, precompiled functions for numerical routines and adds support for large multi-dimensional arrays and matrices. Besides that, it supplies a large library of high-level mathematical functions that operate on these arrays and matrices.

SciPy (Scientific Python) is often mentioned in the same breath as NumPy. SciPy extends the capabilities of NumPy with further useful functions for minimization, regression, Fourier transformation and many other tasks.

NumPy and SciPy are usually not installed by default, and NumPy has to be installed before SciPy. NumPy can be downloaded from the website:

http://www.numpy.org

(Comment: The image on the right side is a graphical visualisation of a matrix with 14 rows and 20 columns, a so-called Hinton diagram. The size of each square corresponds to the magnitude of the corresponding matrix value, and its colour indicates whether the value is positive or negative. In our example, red denotes negative values and green denotes positive values.)

NumPy is based on two earlier Python modules dealing with arrays. One of these is Numeric. Like NumPy, Numeric is a Python module for high-performance numeric computing, but it is obsolete nowadays. Another predecessor of NumPy is Numarray, which is a complete rewrite of Numeric but is deprecated as well. NumPy is a merger of the two, i.e. it is built on the code of Numeric and the features of Numarray.


The Python Alternative to Matlab

Python in combination with NumPy, SciPy and Matplotlib can be used as a replacement for MATLAB. This combination is a free (meaning both "free" as in "free beer" and "free" as in "freedom") alternative to MATLAB. Even though MATLAB has a huge number of additional toolboxes available, NumPy has the advantage that Python is a more modern and complete programming language and, as we have already said, is open source. SciPy adds even more MATLAB-like functionality to Python. Matplotlib, which provides MATLAB-like plotting, rounds out Python in the direction of MATLAB.

Relationship between NumPy, SciPy, Matplotlib and MATLAB


Comparison between Core Python and Numpy

When we say "Core Python", we mean Python without any special modules, i.e. especially without NumPy.

The advantages of Core Python:

Advantages of using Numpy with Python:


A Simple Numpy Example

Before we can use NumPy we will have to import it. It has to be imported like any other module:

import numpy

But you will hardly ever see this. NumPy is usually imported under the alias np:

import numpy as np

We have a list with values, e.g. temperatures in Celsius:

cvalues = [25.3, 24.8, 26.9, 23.9]

We will turn this into a one-dimensional numpy array:

C = np.array(cvalues)
print(C)
Output:
[ 25.3  24.8  26.9  23.9]

Let's assume we want to convert the values into degrees Fahrenheit. This is very easy with a NumPy array, which we can regard as a vector in this case. The solution is simple scalar arithmetic, which NumPy applies to every element of the array:

print(C * 9 / 5 + 32)
Output:
[ 77.54  76.64  80.42  75.02]

Compared to this, the solution for our Python list is more cumbersome:

fvalues = [ x*9/5 + 32 for x in cvalues] 
print(fvalues)
Output:
[77.54, 76.64, 80.42, 75.02]
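
The advantage goes beyond brevity, because arithmetic operators mean different things for lists and arrays. The following minimal sketch reuses the temperature values from above. It relies on np.allclose, which compares floating-point values with a small tolerance and is not otherwise used in this tutorial:

```python
import numpy as np

cvalues = [25.3, 24.8, 26.9, 23.9]
C = np.array(cvalues)

# For a list, "*" with an integer repeats the list (eight elements here) ...
print(cvalues * 2)
# ... whereas for an array it multiplies every element by 2
print(C * 2)

# The vectorized conversion agrees with the list comprehension
fahrenheit = C * 9 / 5 + 32
fvalues = [x * 9 / 5 + 32 for x in cvalues]
print(np.allclose(fahrenheit, fvalues))  # True
```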


Time Comparison between Python Lists and Numpy Arrays

One of the main advantages of NumPy is its speed compared to standard Python. Let's look at the following functions:

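import time
import numpy as np
size_of_vec = 1000
# time.perf_counter() is intended for measuring short durations
# and uses the clock with the highest available resolution
def pure_python_version():
    t1 = time.perf_counter()
    X = range(size_of_vec)
    Y = range(size_of_vec)
    Z = []
    for i in range(len(X)):
        Z.append(X[i] + Y[i])
    return time.perf_counter() - t1
def numpy_version():
    t1 = time.perf_counter()
    X = np.arange(size_of_vec)
    Y = np.arange(size_of_vec)
    Z = X + Y
    return time.perf_counter() - t1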

Let's call these functions and see the time consumption:

t1 = pure_python_version()
t2 = numpy_version()
print(t1, t2)
print("NumPy is " + str(t1/t2) + " times faster in this example!")
Output:
0.0002086162567138672 3.695487976074219e-05
NumPy is 5.645161290322581 times faster in this example!

An easier and, above all, more reliable way to measure execution times is the timeit module. We will use its Timer class in the following script.

The constructor of a Timer object takes a statement to be timed, an additional statement used for setup, and a timer function. Both statements default to 'pass'.

The statements may contain newlines, as long as they don't contain multi-line string literals.

import numpy as np
from timeit import Timer
size_of_vec = 1000
def pure_python_version():
    X = range(size_of_vec)
    Y = range(size_of_vec)
    Z = []
    for i in range(len(X)):
        Z.append(X[i] + Y[i])
def numpy_version():
    X = np.arange(size_of_vec)
    Y = np.arange(size_of_vec)
    Z = X + Y
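# Timer(stmt, setup): setup runs once before the timed statement;
# here it imports the functions to be timed from __main__
timer_obj1 = Timer("pure_python_version()", "from __main__ import pure_python_version")
timer_obj2 = Timer("numpy_version()", "from __main__ import numpy_version")
# timeit(10) returns the total time in seconds for 10 executions
print(timer_obj1.timeit(10))
print(timer_obj2.timeit(10))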
Output:
0.004545956995571032
7.534799806308001e-05
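
Single measurements fluctuate, so it is common to repeat a timing several times and keep the minimum. The sketch below uses timeit.repeat and assumes Python 3.5 or later. From that version on, the globals parameter lets the timed statement see the functions of the current module, so no import in a setup string is needed. The functions are simplified variants of the ones above, and the measured numbers will vary from machine to machine:

```python
import timeit
import numpy as np

size_of_vec = 1000

def pure_python_version():
    X = range(size_of_vec)
    Y = range(size_of_vec)
    return [X[i] + Y[i] for i in range(len(X))]

def numpy_version():
    X = np.arange(size_of_vec)
    Y = np.arange(size_of_vec)
    return X + Y

# 5 timing runs, each executing the statement 100 times;
# globals=globals() makes the functions above visible to the statement
py_times = timeit.repeat("pure_python_version()", repeat=5, number=100, globals=globals())
np_times = timeit.repeat("numpy_version()", repeat=5, number=100, globals=globals())

# The minimum is usually the run least disturbed by other processes
best_py, best_np = min(py_times), min(np_times)
print(f"Python: {best_py:.6f} s, NumPy: {best_np:.6f} s per 100 calls")
print(f"Speed-up: {best_py / best_np:.1f}x")
```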


Creating Arrays

Zero-dimensional Arrays in Numpy

NumPy arrays can have any number of dimensions, including zero: scalars are zero-dimensional. In the following example, we create the scalar 42. Applying the function np.ndim to it returns the number of dimensions of the array. We can also see that its type is numpy.ndarray.

x = np.array(42)
print(type(x))
print(np.ndim(x))
Output:
<class 'numpy.ndarray'>
0
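
A zero-dimensional array holds only a single value, but it is still an ndarray with the usual attributes. The following short sketch also uses the item() method to obtain a plain Python number. That method is not otherwise covered in this tutorial:

```python
import numpy as np

x = np.array(42)

print(x.ndim)   # 0: the attribute form of np.ndim(x)
print(x.size)   # 1: a 0-d array still holds exactly one element

# .item() converts the single element into a plain Python object
value = x.item()
print(type(value))  # <class 'int'>
```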

One-dimensional Arrays

We have already encountered a one-dimensional array, better known to some as a vector, in our initial example. What we have not mentioned so far, but what you may have assumed, is that NumPy arrays are containers of items of the same type. The homogeneous type of an array can be determined with the attribute "dtype", as the following example shows:

F = np.array([1, 1, 2, 3, 5, 8, 13, 21])
V = np.array([3.4, 6.9, 99.8, 12.8])
print(F.dtype)
print(V.dtype)
print(np.ndim(F))
print(np.ndim(V))
Output:
int64
float64
1
1
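
NumPy chooses a common type for all items of an array, and you can also set the type explicitly. Note that the default integer type depends on the platform and the NumPy version. For example, older NumPy releases on Windows use int32 rather than int64, so the output of F.dtype above may differ on your machine. The astype method in this small sketch is an addition not discussed elsewhere in this section:

```python
import numpy as np

# Mixing integers and floats: NumPy picks a common type (float64) for all items
M = np.array([1, 2, 3.5])
print(M.dtype)   # float64

# The dtype can also be requested explicitly ...
G = np.array([1, 2, 3], dtype=np.float32)
print(G.dtype)   # float32

# ... or changed later; astype returns a new array and leaves G untouched
H = G.astype(np.int16)
print(H.dtype, G.dtype)   # int16 float32
```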

Two- and Multidimensional Arrays

Of course, NumPy arrays are not limited to one dimension; they can have an arbitrary number of dimensions. We create them by passing nested lists (or tuples) to NumPy's array function.

A = np.array([ [3.4, 8.7, 9.9], 
               [1.1, -7.8, -0.7],
               [4.1, 12.3, 4.8]])
print(A)
print(A.ndim)
Output:
[[  3.4   8.7   9.9]
 [  1.1  -7.8  -0.7]
 [  4.1  12.3   4.8]]
2
B = np.array([ [[111, 112], [121, 122]],
               [[211, 212], [221, 222]],
               [[311, 312], [321, 322]] ])
print(B)
print(B.ndim)
Output:
[[[111 112]
  [121 122]]
 [[211 212]
  [221 222]]
 [[311 312]
  [321 322]]]
3


Shape of an Array

Shape of a two-dimensional array

The function np.shape returns the shape of an array. The shape is a tuple of integers, which denote the lengths of the corresponding array dimensions. In other words, the shape of an array is a tuple with the number of elements per axis (dimension). In our example, the shape is (6, 3), i.e. we have 6 rows and 3 columns.

x = np.array([ [67, 63, 87],
               [77, 69, 59],
               [85, 87, 99],
               [79, 72, 71],
               [63, 89, 93],
               [68, 92, 78]])
print(np.shape(x))
Output:
(6, 3)

There is also an equivalent array property:

print(x.shape)
Output:
(6, 3)
Numbering of axes

The shape also tells us the order in which the indices are processed: the first entry belongs to axis 0 (the rows of a two-dimensional array), the second to axis 1 (the columns), and any further entries to further axes.

Assigning a new tuple to the "shape" attribute changes the shape of an array in place:

x.shape = (3, 6)
print(x)
Output:
[[67 63 87 77 69 59]
 [85 87 99 79 72 71]
 [63 89 93 68 92 78]]
x.shape = (2, 9)
print(x)
Output:
[[67 63 87 77 69 59 85 87 99]
 [79 72 71 63 89 93 68 92 78]]
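
Assigning to shape changes the array in place. The reshape method, by contrast, returns a new array object (a view on the same data where possible) and leaves the original untouched. The NumPy documentation itself discourages setting the shape attribute in favour of reshape. Here is a minimal sketch, including the -1 placeholder that lets NumPy infer one dimension:

```python
import numpy as np

x = np.arange(18)          # 18 elements: 0, 1, ..., 17

# reshape() returns a new array object and leaves x unchanged
y = x.reshape(3, 6)
print(x.shape, y.shape)    # (18,) (3, 6)

# One dimension may be given as -1; NumPy infers it from the total size
z = x.reshape(2, -1)
print(z.shape)             # (2, 9)

# Assigning to .shape, by contrast, modifies x itself
x.shape = (9, 2)
print(x.shape)             # (9, 2)
```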

You might have guessed by now that the new shape must match the number of elements of the array, i.e. the total size of the new array must be the same as the old one. NumPy raises a ValueError if this is not the case (the exact wording of the message depends on the NumPy version):

x.shape = (4, 4)
Output:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-44-5c4497921b8c> in <module>()
----> 1 x.shape = (4, 4)
ValueError: total size of new array must be unchanged

Let's look at some further examples.

The shape of a scalar is an empty tuple:

x = np.array(11)
print(np.shape(x))
Output:
()
B = np.array([ [[111, 112], [121, 122]],
               [[211, 212], [221, 222]],
               [[311, 312], [321, 322]] ])
print(B.shape)
Output:
(3, 2, 2)
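
You can unpack the entries of the shape tuple directly, which makes the axis order explicit. Here is a small sketch with the 6 x 3 array from above. It also shows len() and the size attribute for comparison:

```python
import numpy as np

x = np.array([[67, 63, 87],
              [77, 69, 59],
              [85, 87, 99],
              [79, 72, 71],
              [63, 89, 93],
              [68, 92, 78]])

rows, cols = x.shape   # axis 0 -> rows, axis 1 -> columns
print(rows, cols)      # 6 3
print(len(x))          # 6: len() counts the entries along axis 0 only
print(x.size)          # 18: total number of elements, i.e. rows * cols
```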


Indexing and Slicing

Assigning to and accessing the elements of an array works much like it does for Python's other sequential data types, i.e. lists and tuples. NumPy also offers many additional indexing options, which make indexing in NumPy very powerful.

Single indexing works the way you would most probably expect:

F = np.array([1, 1, 2, 3, 5, 8, 13, 21])
# print the first element of F, i.e. the element with the index 0
print(F[0])
# print the last element of F
print(F[-1])
B = np.array([ [[111, 112], [121, 122]],
               [[211, 212], [221, 222]],
               [[311, 312], [321, 322]] ])
print(B[0][1][0])
Output:
1
21
121

Indexing multidimensional arrays:

A = np.array([ [3.4, 8.7, 9.9], 
               [1.1, -7.8, -0.7],
               [4.1, 12.3, 4.8]])
print(A[1][0])
Output:
1.1

We accessed the element in the second row, i.e. the row with the index 1, and the first column (index 0), in the same way as we would access an element of a nested Python list. There is another way to access elements of multidimensional arrays in NumPy: we use only one pair of square brackets and separate all indices by commas:

print(A[1, 0])
Output:
1.1

Be aware that the second way is more efficient. In the first case, we create an intermediate array A[1], from which we then access the element with the index 0. So it behaves similarly to this:

tmp = A[1]
print(tmp)
print(tmp[0])
Output:
[ 1.1 -7.8 -0.7]
1.1
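
The difference between the two notations matters even more with slices. In the following sketch, A[:] is the whole array, so A[:][0] selects the first row. A[:, 0], on the other hand, applies the slice to axis 0 and the index to axis 1, and thus selects the first column:

```python
import numpy as np

A = np.array([[3.4, 8.7, 9.9],
              [1.1, -7.8, -0.7],
              [4.1, 12.3, 4.8]])

# For single elements both notations agree ...
print(A[1][0] == A[1, 0])   # True

# ... but with slices they do not:
print(A[:][0])              # the first ROW: 3.4, 8.7, 9.9
print(A[:, 0])              # the first COLUMN: 3.4, 1.1, 4.1
```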

We assume that you are familiar with the slicing of lists and tuples. The syntax is the same in NumPy for one-dimensional arrays, but it can be applied to multiple dimensions as well.

The general syntax for a one-dimensional array A looks like this:

A[start:stop:step]

We illustrate the operating principle of "slicing" with some examples.

We start with the easiest case, i.e. the slicing of a one-dimensional array:

S = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
print(S[2:5])
print(S[:4])
print(S[6:])
print(S[:])
Output:
[2 3 4]
[0 1 2 3]
[6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]
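
The step parameter, negative indices and out-of-range bounds follow the same rules as for lists. A few more one-dimensional examples:

```python
import numpy as np

S = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

print(S[2:8:2])    # start 2, stop 8 (exclusive), step 2: [2 4 6]
print(S[-3:])      # negative indices count from the end: [7 8 9]
print(S[8:2:-2])   # a negative step walks backwards: [8 6 4]
print(S[5:100])    # out-of-range bounds are clipped: [5 6 7 8 9]
```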

We will illustrate the multidimensional slicing in the following examples. The ranges for each dimension are separated by commas:

A = np.array([
[11,12,13,14,15],
[21,22,23,24,25],
[31,32,33,34,35],
[41,42,43,44,45],
[51,52,53,54,55]])
print(A[:3,2:])
Output:
[[13 14 15]
 [23 24 25]
 [33 34 35]]
Picture of first example of two-dimensional slicing of arrays in numpy
print(A[3:,:])
Output:
[[41 42 43 44 45]
 [51 52 53 54 55]]
Picture of second example of two-dimensional slicing of arrays in numpy
print(A[:,4:])
Output:
[[15]
 [25]
 [35]
 [45]
 [55]]
Picture of third example of two-dimensional slicing of arrays in numpy
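
Note the difference between slicing a single column and indexing it. A slice such as 4: keeps the axis, while an integer index removes it. Here is a minimal sketch with the array A from above:

```python
import numpy as np

A = np.array([[11, 12, 13, 14, 15],
              [21, 22, 23, 24, 25],
              [31, 32, 33, 34, 35],
              [41, 42, 43, 44, 45],
              [51, 52, 53, 54, 55]])

col_slice = A[:, 4:]    # a slice keeps the axis: 2-D
col_index = A[:, 4]     # an integer index removes the axis: 1-D
print(col_slice.shape)  # (5, 1)
print(col_index.shape)  # (5,)
print(col_index)        # [15 25 35 45 55]
```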

The following two examples use the third parameter, "step". The reshape method is used to construct the two-dimensional array; we will explain it in the following subchapter:

X = np.arange(28).reshape(4,7)
print(X)
Output:
[[ 0  1  2  3  4  5  6]
 [ 7  8  9 10 11 12 13]
 [14 15 16 17 18 19 20]
 [21 22 23 24 25 26 27]]
print(X[::2, ::3])
Output:
[[ 0  3  6]
 [14 17 20]]
Picture of fourth example of two-dimensional slicing of arrays in numpy
print(X[::, ::3])
Output:
[[ 0  3  6]
 [ 7 10 13]
 [14 17 20]
 [21 24 27]]
Picture of fifth example of two-dimensional slicing of arrays in numpy

Warning Comment

Attention: Whereas slicing a list or tuple creates a new object, a slicing operation on an array creates a view on the original array. A view is simply another way to access the array, or rather a part of it. It follows that if we modify a view, the original array is modified as well. If you want to check whether two arrays share the same memory block, you can use the function np.may_share_memory:

may_share_memory(A, B)

To determine whether two arrays can share memory, the function computes the memory bounds of A and B. It returns True if they overlap and False otherwise.

The function may give false positives: if it returns True, this only means that the arrays may share memory.

A = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
S = A[2:6]
S[0] = 22
S[1] = 23
print(A)
Output:
[ 0  1 22 23  4  5  6  7  8  9]

Doing the same with lists shows that slicing a list produces a copy:

lst = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
lst2 = lst[2:6]
lst2[0] = 22
lst2[1] = 23
print(lst)
Output:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
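
The following sketch applies np.may_share_memory to the situation above. It contrasts that function with np.shares_memory, a related function that solves the overlap problem exactly instead of only comparing memory bounds. It also shows that the copy() method yields an independent array:

```python
import numpy as np

A = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
part_view = A[2:6]          # a view on part of A
part_copy = A[2:6].copy()   # an independent copy of the same elements

print(np.may_share_memory(A, part_view))  # True
print(np.may_share_memory(A, part_copy))  # False

# Memory bounds can overlap without any element being shared:
evens, odds = A[::2], A[1::2]
print(np.may_share_memory(evens, odds))  # True (only the bounds overlap)
print(np.shares_memory(evens, odds))     # False (exact check)

part_copy[0] = 99   # modifying the copy ...
print(A[2])         # ... leaves A unchanged: 2
```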


Ones and Zeros

NumPy provides the functions ones() and zeros() for initializing arrays with ones or zeros. The function ones(t) takes a tuple t with the shape of the array and fills the array accordingly with ones. By default, the ones are of type float. If you need integer ones, set the optional parameter dtype to int:

import numpy as np
E = np.ones((2,3))
print(E)
F = np.ones((3,4),dtype=int)
print(F)
Output:
[[ 1.  1.  1.]
 [ 1.  1.  1.]]
[[1 1 1 1]
 [1 1 1 1]
 [1 1 1 1]]

Everything we have said about ones() applies analogously to zeros(), as we can see in the following example:

Z = np.zeros((2,4))
print(Z)
Output:
[[ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]]
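
Two further variations: for a one-dimensional array you can pass a single integer as the shape, and the dtype parameter works for zeros() just as it does for ones(). The related function np.full, not discussed elsewhere in this tutorial, fills an array with an arbitrary value:

```python
import numpy as np

# For a one-dimensional array a single integer is enough as shape
z1 = np.zeros(3)
print(z1.shape, z1.dtype)   # (3,) float64

# dtype works for zeros() exactly as for ones()
z2 = np.zeros((2, 4), dtype=int)
print(z2)

# np.full() fills an array of the given shape with any value
f = np.full((2, 3), 7)
print(f)
```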

There is another interesting way to create an array of ones or zeros if it has to have the same shape as an existing array: NumPy supplies the functions ones_like(a) and zeros_like(a) for this purpose.

x = np.array([2,5,18,14,4])
E = np.ones_like(x)
print(E)
Output:
[1 1 1 1 1]
Z = np.zeros_like(x)
print(Z)
Output:
[0 0 0 0 0]
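
ones_like and zeros_like take over not only the shape but also the dtype of the template array, and the dtype parameter overrides it. A short sketch:

```python
import numpy as np

x = np.array([2, 5, 18, 14, 4])     # an integer array

E = np.ones_like(x)                 # inherits shape AND dtype of x
print(E.dtype == x.dtype)           # True

# The inherited dtype can be overridden with the dtype parameter
Zf = np.zeros_like(x, dtype=float)
print(Zf.dtype)                     # float64

# The template may be multidimensional; only its shape and dtype are used
M = np.ones_like(np.zeros((2, 3)))
print(M.shape)                      # (2, 3)
```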


Copying Arrays


numpy.copy()

numpy.copy(obj, order='K')

Returns an array copy of the given object 'obj'.

Parameter   Meaning
obj         Array-like input data.
order       The possible values are {'C', 'F', 'A', 'K'}. It controls the memory layout of the copy: 'C' means C order (row-major), 'F' means Fortran order (column-major), 'A' means 'F' if 'obj' is Fortran contiguous and 'C' otherwise, and 'K' means match the layout of 'obj' as closely as possible.

import numpy as np
x = np.array([[42,22,12],[44,53,66]], order='F')
y = np.copy(x)   # default order='K': y keeps the Fortran layout of x
x[0,0] = 1001
print(x)
print(y)
Output:
[[1001   22   12]
 [  44   53   66]]
[[42 22 12]
 [44 53 66]]
print(x.flags['C_CONTIGUOUS'])
print(y.flags['C_CONTIGUOUS'])
Output:
False
False

ndarray.copy()

There is also an ndarray method 'copy', which can be applied directly to an array. It is similar to the function above, but the default value of the order argument is different:

a.copy(order='C')

Returns a copy of the array 'a'.

Parameter   Meaning
order       The same as with numpy.copy, but 'C' is the default value for order.

import numpy as np
x = np.array([[42,22,12],[44,53,66]], order='F')
y = x.copy()
x[0,0] = 1001
print(x)
print(y)
print(x.flags['C_CONTIGUOUS'])
print(y.flags['C_CONTIGUOUS'])
Output:
[[1001   22   12]
 [  44   53   66]]
[[42 22 12]
 [44 53 66]]
False
True
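
The two copies differ only in their memory layout. The following side-by-side sketch applies the function and the method to the same Fortran-ordered array. It checks the F_CONTIGUOUS flag in addition to C_CONTIGUOUS:

```python
import numpy as np

x = np.array([[42, 22, 12], [44, 53, 66]], order='F')   # Fortran-ordered

a = np.copy(x)   # function: order='K' keeps the layout of x
b = x.copy()     # method: order='C' produces a C-ordered copy

print(a.flags['F_CONTIGUOUS'], a.flags['C_CONTIGUOUS'])  # True False
print(b.flags['F_CONTIGUOUS'], b.flags['C_CONTIGUOUS'])  # False True

# The layout differs, but the values are identical
print(np.array_equal(a, b))   # True
```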


Identity Array

An identity array is a square array with ones on its main diagonal and zeros elsewhere. There are two ways to create identity arrays.

The identity Function

We can create identity arrays with the function identity:

identity(n, dtype=None)

The parameters:

Parameter   Meaning
n           An integer defining the number of rows and columns of the output, i.e. 'n' x 'n'.
dtype       An optional argument defining the data-type of the output. The default is 'float'.

The output of identity is an 'n' x 'n' array with its main diagonal set to one and all other elements set to zero.

import numpy as np
np.identity(4)
Output:
array([[ 1.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.],
       [ 0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  1.]])
np.identity(4, dtype=int) # equivalent to np.identity(4, int)
Output:
array([[1, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 1]])
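
The defining property of the identity array is that multiplying a matrix by it leaves the matrix unchanged. The following minimal sketch assumes Python 3.5 or later for the @ matrix multiplication operator, and it uses np.allclose to compare the floating-point results:

```python
import numpy as np

A = np.array([[3.4, 8.7, 9.9],
              [1.1, -7.8, -0.7],
              [4.1, 12.3, 4.8]])
I = np.identity(3)

# Multiplying by the identity matrix (@ is matrix multiplication)
# returns the original matrix, from either side
print(np.allclose(A @ I, A))   # True
print(np.allclose(I @ A, A))   # True
```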

The eye Function

The function eye provides another way to create identity arrays. It returns a 2-D array with ones on the diagonal and zeros elsewhere.

eye(N, M=None, k=0, dtype=float)

Parameter   Meaning
N           An integer defining the number of rows of the output array.
M           An optional integer for setting the number of columns in the output. If it is None, it defaults to 'N'.
k           The position of the diagonal. The default is 0, which refers to the main diagonal. A positive value refers to an upper diagonal, a negative value to a lower diagonal.
dtype       Optional data-type of the returned array.

eye returns an ndarray of shape (N,M). All elements of this array are equal to zero, except for the 'k'-th diagonal, whose values are equal to one.

import numpy as np
np.eye(5, 8, k=1, dtype=int)
Output:
array([[0, 1, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 0, 0, 0],
       [0, 0, 0, 0, 0, 1, 0, 0]])
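
Here are a few more variations of eye. Without M and k, it produces the same array as identity. A negative k selects a diagonal below the main diagonal, and M makes the array rectangular. A short sketch:

```python
import numpy as np

# Without M and k, eye() produces the same identity array as identity()
print(np.array_equal(np.eye(4), np.identity(4)))   # True

# A negative k selects a diagonal below the main diagonal
lower = np.eye(4, k=-1, dtype=int)
print(lower)

# With 4 rows but only 3 columns, the main diagonal holds just 3 ones
rect = np.eye(4, 3, dtype=int)
print(rect)
```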

The following diagram illustrates how the parameter 'k' of the eye function works:

Principle of operation of the eye function and the parameter k



Exercises:

  1. Create an arbitrary one-dimensional array called "v".

  2. Create a new array which consists of the elements at the odd indices of the previously created array "v".

  3. Create a new array from "v" with the elements in reverse order.

  4. What will be the output of the following code:
       a = np.array([1, 2, 3, 4, 5])
       b = a[1:4]
       b[0] = 200
       print(a[1])

  5. Create a two-dimensional array called "m".

  6. Create a new array from m in which the elements of each row are in reverse order.

  7. Create another array from m in which the rows are in reverse order.

  8. Create an array from m in which both the columns and the rows are in reverse order.

  9. Cut off the first and last row and the first and last column.



Solutions to the Exercises:

  1. import numpy as np
     v = np.array([3,8,12,18,7,11,30])

  2. odd_elements = v[1::2]

  3. reverse_order = v[::-1]

  4. The output will be 200, because slices in NumPy are views, not copies. This is different from slicing lists or tuples in Python.

  5. m = np.array([ [11, 12, 13, 14], [21, 22, 23, 24], [31, 32, 33, 34]])

  6. m[::,::-1]

  7. m[::-1]

  8. m[::-1,::-1]

  9. m[1:-1,1:-1]