Numerical & Scientific Computing with Python: Histograms with Python

## Matplotlib Tutorial, Histograms

This chapter of our tutorial deals with histograms. It's hard to open a newspaper or magazine without seeing histograms that show the number of smokers in certain age groups, the number of births per year and so on. They are a great way to depict facts without having to use too many words, but on the downside they can also be used to manipulate or "lie with statistics".

What is a histogram? A formal definition can be: It's a graphical representation of the frequency distribution of some numerical data. It consists of rectangles of equal width whose heights correspond to the frequencies of the values in each interval.

To construct a histogram, we first divide the range of possible x values into adjacent intervals, called bins, which are usually of equal size.
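
The binning step can be tried out without any plotting. The following is a minimal sketch, assuming NumPy's `np.histogram` as the counting tool; the data in `values` and the choice of four bins are made up for illustration and are not part of the tutorial's example:

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration
values = np.array([1.0, 1.5, 2.0, 2.5, 2.7, 3.1, 3.9, 4.0, 4.8, 5.0])

# Split the range [1, 5] into 4 adjacent bins of equal width.
# Each bin is half-open [a, b), except the last one,
# which also includes its right edge.
counts, edges = np.histogram(values, bins=4, range=(1.0, 5.0))

print(edges)   # the 5 edges that delimit the 4 bins
print(counts)  # how many values fall into each bin
```

Note that there is always one more edge than there are bins, because every bin needs a left and a right boundary.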

We start now with a practical Python program. We create a histogram with random numbers:

# the next "inline" statement is only needed,
# if you are working with "ipython notebook"
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
gaussian_numbers = np.random.normal(size=10000)
plt.hist(gaussian_numbers)
plt.title("Gaussian Histogram")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()

n, bins, patches = plt.hist(gaussian_numbers)
print("n: ", n, sum(n))
print("bins: ", bins)
for i in range(len(bins)-1):
    print(bins[i+1] - bins[i])
print("patches: ", patches)
print(patches[1])

n:  [    6.    85.   509.  1605.  2924.  2899.  1495.   412.    59.     6.] 10000.0
bins:  [-4.00466342 -3.19549377 -2.38632411 -1.57715446 -0.7679848   0.04118485
0.85035451  1.65952416  2.46869382  3.27786347  4.08703313]
0.809169654613
0.809169654613
0.809169654613
0.809169654613
0.809169654613
0.809169654613
0.809169654613
0.809169654613
0.809169654613
0.809169654613
patches:  <a list of 10 Patch objects>
Rectangle(-3.19549,0;0.80917x85)
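
The output shows that all ten bins have the same width and that the counts add up to the number of values. As a cross-check, here is a small sketch that assumes `np.histogram`, which Matplotlib's hist uses for the counting according to its documentation. The seed and the names `sample` and `expected_edges` are our own choices, so the numbers will differ from the output above:

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # arbitrary seed, for reproducibility
sample = rng.normal(size=10000)

counts, edges = np.histogram(sample, bins=10)
widths = np.diff(edges)

# With an integer number of bins, the edges run from the smallest
# to the largest value in equal steps.
expected_edges = np.linspace(sample.min(), sample.max(), 11)

print(np.allclose(edges, expected_edges))  # edges are evenly spaced
print(np.allclose(widths, widths[0]))      # all bins have the same width
print(counts.sum())                        # every value is counted once
```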


Let's increase the number of bins. 10 bins is not a lot if you consider that we have 10,000 random values. To do so, we set the keyword parameter bins to 100:

plt.hist(gaussian_numbers, bins=100)
plt.show()


Another important keyword parameter of hist is "density". In older Matplotlib versions it was called "normed"; that name was deprecated in version 2.1 and removed in 3.1. "density" is optional and the default value is 'False'. If it is set to 'True', the first element of the return tuple will be the counts normalized to form a probability density, i.e., "n / (len(x) * dbin)", so that the integral of the histogram is 1.

plt.hist(gaussian_numbers, bins=100, density=True)
plt.show()
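
To see what this normalization does to the numbers, the following sketch computes it by hand and compares the result with the `density` argument of `np.histogram`. It is only an illustration under our own assumptions: the seed and the names `sample` and `density_by_hand` do not appear in the tutorial's code.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # arbitrary seed
sample = rng.normal(size=10000)

counts, edges = np.histogram(sample, bins=100)
widths = np.diff(edges)

# Normalize by hand: counts / (number of values * bin width)
density_by_hand = counts / (len(sample) * widths)

# Let NumPy do the same normalization
density, _ = np.histogram(sample, bins=100, density=True)

print(np.allclose(density, density_by_hand))
# Total area of all bars; this should be 1 up to rounding errors
print(np.sum(density * widths))
```

The heights themselves no longer add up to 1. It is the area, height times bin width, that does.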


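If both the parameters 'density' and 'stacked' are set to 'True', the sum of the histograms is normalized to 1. This only makes a difference when several data sets are passed to hist; with a single data set, as in the following example, 'stacked' has no visible effect.

plt.hist(gaussian_numbers,
         bins=100,
         density=True,   # called 'normed' before Matplotlib 3.1
         stacked=True,
         edgecolor="#6A9662",
         color="#DDFFDD")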
plt.show()
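
The effect of 'stacked' becomes visible with more than one data set. The following sketch assumes a Matplotlib version that accepts the `density` keyword (2.1 or later) and uses two made-up samples, `left` and `right`, which are not part of the tutorial's data:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=1)          # arbitrary seed
left = rng.normal(loc=-1.0, size=6000)       # hypothetical data set 1
right = rng.normal(loc=2.0, size=4000)       # hypothetical data set 2

# Passing a list of arrays draws one histogram per data set;
# with stacked=True the second one is drawn on top of the first.
heights, edges, _ = plt.hist([left, right],
                             bins=50,
                             density=True,
                             stacked=True)

# heights[-1] holds the tops of the complete stack.
# The area under the whole stack, not under each part, is 1.
total_area = np.sum(heights[-1] * np.diff(edges))
print(total_area)
```

Add a call to plt.show() to display the figure when working outside of a notebook.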


Would you rather see the cumulative values? We can plot the histogram as a cumulative distribution function as well by setting the parameter 'cumulative' to 'True'.

plt.hist(gaussian_numbers,
         bins=100,
         density=True,   # called 'normed' before Matplotlib 3.1
         stacked=True,
         cumulative=True)
plt.show()
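
The cumulative values can also be reproduced as a running sum over the bar areas of the normalized histogram. This is a rough sketch with NumPy only, assuming an arbitrary seed and our own names `sample` and `cumulative`:

```python
import numpy as np

rng = np.random.default_rng(seed=2)   # arbitrary seed
sample = rng.normal(size=10000)

density, edges = np.histogram(sample, bins=100, density=True)

# Area of each bar, accumulated from left to right
cumulative = np.cumsum(density * np.diff(edges))

# The values never decrease and the last one is 1 (up to rounding)
print(cumulative[0], cumulative[-1])
```

Each entry is the fraction of all values that lie at or below the right edge of the corresponding bin, which is why the curve approaches 1 on the right.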


### Bar Plots

bars = plt.bar([1,2,3,4], [1,4,9,16])
bars[0].set_color('green')
plt.show()

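f = plt.figure()
ax = f.add_subplot(1, 1, 1)   # create an Axes object to draw on
ax.bar([1,2,3,4], [1,4,9,16])
# get_children() returns all artists of the axes, including the bars
children = ax.get_children()
children[3].set_color('g')
plt.show()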

import matplotlib.pyplot as plt
import numpy as np
years = ('2010', '2011', '2012', '2013', '2014')
visitors = (1241, 50927, 162242, 222093, 296665 / 8 * 12)
index = np.arange(len(visitors))
bar_width = 1.0
plt.bar(index, visitors, bar_width,  color="green")
plt.xticks(index, years) # bars are centered on index by default (Matplotlib >= 2.0)
plt.show()