Numerical & Scientific Computing with Python: Pandas Tutorial

Introduction to Pandas

Playing Pandas

The pandas we are writing about in this chapter have nothing to do with the cute panda bears, and panda bears are not what our visitors are expecting in a Python tutorial either. Pandas is a Python module which rounds out the capabilities of Numpy, Scipy and Matplotlib. The word pandas is an acronym derived from "Python and data analysis" and "panel data".

There is often some confusion about whether Pandas is an alternative to Numpy, SciPy and Matplotlib. The truth is that it is built on top of Numpy. This means that Numpy is required by pandas. Scipy and Matplotlib, on the other hand, are not required by pandas, but they are extremely useful. That's why the Pandas project lists them as "optional dependencies".

Pandas is a software library written for the Python programming language. It is used for data manipulation and analysis. It provides special data structures and operations for the manipulation of numerical tables and time series. Pandas is free software released under the three-clause BSD license.

Data Structures

We will start with the following two important data structures of Pandas:

Series

A Series is a one-dimensional labelled array-like object. It is capable of holding any data type, e.g. integers, floats, strings, Python objects, and so on. It can be seen as a data structure with two arrays: one functioning as the index, i.e. the labels, and the other one containing the actual data.

We define a simple Series object in the following example by instantiating a Pandas Series object with a list. We will later see that we can also use other data objects, for example Numpy arrays and dictionaries, to instantiate a Series object.

import pandas as pd
S = pd.Series([11, 28, 72, 3, 5, 8])
S
We received the following result:
0    11
1    28
2    72
3     3
4     5
5     8
dtype: int64

We haven't defined an index in our example, but we see two columns in our output: The right column contains our data, whereas the left column contains the index. Pandas created a default index starting with 0 going to 5, which is the length of the data minus 1.

We can directly access the index and the values of our Series S:

print(S.index)
print(S.values)
RangeIndex(start=0, stop=6, step=1)
[11 28 72  3  5  8]

If we compare this to creating an array in numpy, there are still lots of similarities:

import numpy as np
X = np.array([11, 28, 72, 3, 5, 8])
print(X)
print(S.values)
# both are the same type:
print(type(S.values), type(X))
[11 28 72  3  5  8]
[11 28 72  3  5  8]
<class 'numpy.ndarray'> <class 'numpy.ndarray'>

So far our Series have not been very different to ndarrays of Numpy. This changes, as soon as we start defining Series objects with individual indices:

fruits = ['apples', 'oranges', 'cherries', 'pears']
quantities = [20, 33, 52, 10]
S = pd.Series(quantities, index=fruits)
S
The above Python code returned the following output:
apples      20
oranges     33
cherries    52
pears       10
dtype: int64

A big advantage over NumPy arrays is obvious from the previous example: We can use arbitrary indices.
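
As a small sketch of what arbitrary indices make possible, the following example accesses the same element once by its label and once by its position. It uses the accessors .loc and .iloc, which do not appear in the text above, and the variable names by_label and by_position are chosen only for illustration:

```python
import pandas as pd

fruits = ['apples', 'oranges', 'cherries', 'pears']
S = pd.Series([20, 33, 52, 10], index=fruits)

# label-based access with .loc
by_label = S.loc['cherries']
# position-based access with .iloc (third element, counting from 0)
by_position = S.iloc[2]
print(by_label, by_position)

# a label slice includes its end label, a position slice does not
print(S.loc['oranges':'cherries'])
print(S.iloc[1:2])
```

Both lookups refer to the same element. The two slices differ in length, because the label-based slice contains 'cherries' as well.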

If we add two series with the same indices, we get a new series with the same index, and the corresponding values will be added:

fruits = ['apples', 'oranges', 'cherries', 'pears']
S = pd.Series([20, 33, 52, 10], index=fruits)
S2 = pd.Series([17, 13, 31, 32], index=fruits)
print(S + S2)
print("sum of S: ", sum(S))
apples      37
oranges     46
cherries    83
pears       42
dtype: int64
sum of S:  115

The indices do not have to be the same for the Series addition. The index of the result will be the "union" of both indices. If a label doesn't occur in both Series, the corresponding value in the result will be NaN:

fruits = ['peaches', 'oranges', 'cherries', 'pears']
fruits2 = ['raspberries', 'oranges', 'cherries', 'pears']
S = pd.Series([20, 33, 52, 10], index=fruits)
S2 = pd.Series([17, 13, 31, 32], index=fruits2)
print(S + S2)
cherries       83.0
oranges        46.0
peaches         NaN
pears          42.0
raspberries     NaN
dtype: float64
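
In principle, the indices can be completely different, as in the following example with English and Greek fruit names. In this case every value of the result is NaN:

fruits = ['apples', 'oranges', 'cherries', 'pears']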
fruits_gr = ['μήλα', 'πορτοκάλια', 'κεράσια', 'αχλάδια']
S = pd.Series([20, 33, 52, 10], index=fruits)
S2 = pd.Series([17, 13, 31, 32], index=fruits_gr)
print(S+S2)
apples       NaN
cherries     NaN
oranges      NaN
pears        NaN
αχλάδια      NaN
κεράσια      NaN
μήλα         NaN
πορτοκάλια   NaN
dtype: float64
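
If NaN is not the desired result for labels which occur in only one of the two Series, one possible approach is the method add with the parameter fill_value. The following sketch reuses the peaches and raspberries data from above and assumes that a missing quantity should count as 0. The name total is chosen only for illustration:

```python
import pandas as pd

S = pd.Series([20, 33, 52, 10],
              index=['peaches', 'oranges', 'cherries', 'pears'])
S2 = pd.Series([17, 13, 31, 32],
               index=['raspberries', 'oranges', 'cherries', 'pears'])

# a label missing in one Series is treated as 0 before adding
total = S.add(S2, fill_value=0)
print(total)
```

The result still has the union of both indices, but it no longer contains NaN values.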

It's possible to access a single value of a Series, or several values at once by passing a list of indices:

print(S['apples'])
20
print(S[['apples', 'oranges', 'cherries']])
apples      20
oranges     33
cherries    52
dtype: int64

Similar to Numpy we can use scalar operations or mathematical functions on a series:

import numpy as np
print((S + 3) * 4)
print("======================")
print(np.sin(S))
apples       92
oranges     144
cherries    220
pears        52
dtype: int64
======================
apples      0.912945
oranges     0.999912
cherries    0.986628
pears      -0.544021
dtype: float64

pandas.Series.apply

Series.apply(func, convert_dtype=True, args=(), **kwds)

The function "func" will be applied to the Series and it returns either a Series or a DataFrame, depending on "func".

Parameter       Meaning
func            a function, which can be a NumPy function that will be applied to the entire Series or a Python function that will be applied to every single value of the series
convert_dtype   A boolean value. If it is set to True (default), apply will try to find better dtype for elementwise function results. If False, leave as dtype=object
args            Positional arguments which will be passed to the function "func" additionally to the values from the series.
**kwds          Additional keyword arguments will be passed as keywords to the function

Example:

S.apply(np.sin)
The previous Python code returned the following output:
apples      0.912945
oranges     0.999912
cherries    0.986628
pears      -0.544021
dtype: float64

We can also use Python lambda functions. Let's assume we have the following task: we check the amount of fruit of every kind, and if 50 or fewer are available, we augment the stock by 10:

S.apply(lambda x: x if x > 50 else x+10 )
This gets us the following output:
apples      30
oranges     43
cherries    52
pears       20
dtype: int64
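
The parameters args and **kwds from the table above can be sketched in the same way. The function restock and its parameters threshold and amount are hypothetical names invented for this example:

```python
import pandas as pd

S = pd.Series([20, 33, 52, 10],
              index=['apples', 'oranges', 'cherries', 'pears'])

def restock(quantity, threshold, amount=10):
    """Return quantity, increased by amount if it is at most threshold."""
    return quantity + amount if quantity <= threshold else quantity

# threshold is passed positionally via args, amount as a keyword argument
restocked = S.apply(restock, args=(50,), amount=5)
print(restocked)
```

Each value of the Series is passed as the first argument of restock, followed by the entries of args and the keyword arguments.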

Filtering with a boolean array:

S[S>30]
After having executed the Python code above we received the following output:
oranges     33
cherries    52
dtype: int64
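
Several conditions can be combined into one boolean filter. The following sketch uses the operators & (and), | (or) and ~ (not). Each condition needs its own parentheses, because these operators bind more tightly than the comparisons. The variable names are chosen only for illustration:

```python
import pandas as pd

S = pd.Series([20, 33, 52, 10],
              index=['apples', 'oranges', 'cherries', 'pears'])

# both conditions have to hold
mid_stock = S[(S > 15) & (S < 50)]
# at least one condition has to hold
extremes = S[(S <= 10) | (S >= 50)]
# negation of a condition
not_large = S[~(S > 30)]

print(mid_stock)
print(extremes)
print(not_large)
```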

A series can be seen as an ordered Python dictionary with a fixed length.

"apples" in S
The previous code returned the following output:
True
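
The comparison with a dictionary goes a bit further. As a minimal sketch, the methods get, items and to_dict of a Series behave much like their dictionary counterparts. The label 'bananas' is a made-up example of a missing key:

```python
import pandas as pd

S = pd.Series([20, 33, 52, 10],
              index=['apples', 'oranges', 'cherries', 'pears'])

# get returns a default value instead of raising a KeyError
print(S.get('bananas', 0))

# iterate over (label, value) pairs
for fruit, quantity in S.items():
    print(fruit, quantity)

# convert the Series into a plain dictionary
as_dict = S.to_dict()
print(as_dict)
```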

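We can even pass a dictionary to a Series object when we create it. We get a Series with the dict's keys as the indices. In the output below the indices are sorted, which is how older pandas versions behaved; newer versions (pandas 0.23 or later on Python 3.6 or later) keep the insertion order of the dictionary.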

cities = {"London":   8615246, 
          "Berlin":   3562166, 
          "Madrid":   3165235, 
          "Rome":     2874038, 
          "Paris":    2273305, 
          "Vienna":   1805681, 
          "Bucharest":1803425, 
          "Hamburg":  1760433,
          "Budapest": 1754000,
          "Warsaw":   1740119,
          "Barcelona":1602386,
          "Munich":   1493900,
          "Milan":    1350680}
city_series = pd.Series(cities)
print(city_series)
Barcelona    1602386
Berlin       3562166
Bucharest    1803425
Budapest     1754000
Hamburg      1760433
London       8615246
Madrid       3165235
Milan        1350680
Munich       1493900
Paris        2273305
Rome         2874038
Vienna       1805681
Warsaw       1740119
dtype: int64

We have already seen that we can pass a list or a tuple to the keyword argument 'index'. In this case, the list (or tuple) passed to index does not have to match the keys of the dictionary, e.g. there may be fewer or more entries in index:

my_cities = ["London", "Paris", "Zurich", "Berlin", 
             "Stuttgart", "Hamburg"]
my_city_series = pd.Series(cities, index=my_cities)
print(my_city_series)
London       8615246.0
Paris        2273305.0
Zurich             NaN
Berlin       3562166.0
Stuttgart          NaN
Hamburg      1760433.0
dtype: float64

We can see that the cities which are not included in the dictionary get the value NaN assigned. NaN stands for "not a number". It can also be seen as meaning "missing" in our example.

We can check for missing values with the methods isnull and notnull:

print(my_city_series.isnull())
London       False
Paris        False
Zurich        True
Berlin       False
Stuttgart     True
Hamburg      False
dtype: bool
print(my_city_series.notnull())
London        True
Paris         True
Zurich       False
Berlin        True
Stuttgart    False
Hamburg       True
dtype: bool

We also get a NaN if a value in the dictionary is None:

d = {"a":23, "b":45, "c":None, "d":0}
S = pd.Series(d)
print(S)
a    23.0
b    45.0
c     NaN
d     0.0
dtype: float64
pd.isnull(S)
The previous code returned the following result:
a    False
b    False
c     True
d    False
dtype: bool
pd.notnull(S)
The Python code above returned the following:
a     True
b     True
c    False
d     True
dtype: bool

DataFrame

The underlying idea of a DataFrame is based on spreadsheets. We can see the data structure of a DataFrame as tabular and spreadsheet-like. It contains an ordered collection of columns. Each column has a single data type, but different columns can have different types, e.g. the first column may consist of integers, while the second one consists of boolean values and so on.

A DataFrame has a row and column index; it's like a dict of Series with a common index.
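
The "dict of Series" view can be taken literally. The following sketch builds a small DataFrame from two Series whose indices overlap only partly. The two Series and the name frame are made up for this illustration:

```python
import pandas as pd

population = pd.Series({"London": 8615246,
                        "Berlin": 3562166,
                        "Madrid": 3165235})
country = pd.Series({"London": "England",
                     "Berlin": "Germany",
                     "Paris": "France"})

# the Series are aligned on the union of their indices
frame = pd.DataFrame({"population": population, "country": country})
print(frame)
```

Every Series becomes a column. A label which is missing in one of the Series gets NaN in the corresponding column.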

cities = {"name": ["London", "Berlin", "Madrid", "Rome", 
                   "Paris", "Vienna", "Bucharest", "Hamburg", 
                   "Budapest", "Warsaw", "Barcelona", 
                   "Munich", "Milan"],
          "population": [8615246, 3562166, 3165235, 2874038,
                         2273305, 1805681, 1803425, 1760433,
                         1754000, 1740119, 1602386, 1493900,
                         1350680],
          "country": ["England", "Germany", "Spain", "Italy",
                      "France", "Austria", "Romania", 
                      "Germany", "Hungary", "Poland", "Spain",
                      "Germany", "Italy"]}
city_frame = pd.DataFrame(cities)
print(city_frame)
    country       name  population
0   England     London     8615246
1   Germany     Berlin     3562166
2     Spain     Madrid     3165235
3     Italy       Rome     2874038
4    France      Paris     2273305
5   Austria     Vienna     1805681
6   Romania  Bucharest     1803425
7   Germany    Hamburg     1760433
8   Hungary   Budapest     1754000
9    Poland     Warsaw     1740119
10    Spain  Barcelona     1602386
11  Germany     Munich     1493900
12    Italy      Milan     1350680

We can see that an index (0,1,2, ...) has been automatically assigned to the DataFrame. We can also assign a custom index to the DataFrame object:

ordinals = ["first", "second", "third", "fourth",
            "fifth", "sixth", "seventh", "eigth",
            "ninth", "tenth", "eleventh", "twelvth",
            "thirteenth"]
city_frame = pd.DataFrame(cities, index=ordinals)
print(city_frame)
            country       name  population
first       England     London     8615246
second      Germany     Berlin     3562166
third         Spain     Madrid     3165235
fourth        Italy       Rome     2874038
fifth        France      Paris     2273305
sixth       Austria     Vienna     1805681
seventh     Romania  Bucharest     1803425
eigth       Germany    Hamburg     1760433
ninth       Hungary   Budapest     1754000
tenth        Poland     Warsaw     1740119
eleventh      Spain  Barcelona     1602386
twelvth     Germany     Munich     1493900
thirteenth    Italy      Milan     1350680

We can also define or rearrange the order of the columns.

city_frame = pd.DataFrame(cities,
                          columns=["name", 
                                   "country", 
                                   "population"],
                          index=ordinals)
print(city_frame)
                 name  country  population
first          London  England     8615246
second         Berlin  Germany     3562166
third          Madrid    Spain     3165235
fourth           Rome    Italy     2874038
fifth           Paris   France     2273305
sixth          Vienna  Austria     1805681
seventh     Bucharest  Romania     1803425
eigth         Hamburg  Germany     1760433
ninth        Budapest  Hungary     1754000
tenth          Warsaw   Poland     1740119
eleventh    Barcelona    Spain     1602386
twelvth        Munich  Germany     1493900
thirteenth      Milan    Italy     1350680

We can calculate the sum of all the columns of a DataFrame or the sum of certain columns:

city_frame.sum()
The previous Python code returned the following:
name          LondonBerlinMadridRomeParisViennaBucharestHamb...
country       EnglandGermanySpainItalyFranceAustriaRomaniaGe...
population                                             33800614
dtype: object
city_frame["population"].sum()
After having executed the Python code above we received the following result:
33800614
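
Summing the string columns just concatenates the names, which is rarely useful. One way to avoid this, sketched here on the first three cities only, is to restrict the sum to the numeric columns with the parameter numeric_only or to select the columns explicitly:

```python
import pandas as pd

city_frame = pd.DataFrame({"name": ["London", "Berlin", "Madrid"],
                           "country": ["England", "Germany", "Spain"],
                           "population": [8615246, 3562166, 3165235]})

# sum only the columns with a numeric data type
numeric_totals = city_frame.sum(numeric_only=True)
print(numeric_totals)

# sum an explicitly selected list of columns
selected_totals = city_frame[["population"]].sum()
print(selected_totals)
```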

We can use "cumsum" to calculate the cumulative sum:

x = city_frame["population"].cumsum()
print(x)
first          8615246
second        12177412
third         15342647
fourth        18216685
fifth         20489990
sixth         22295671
seventh       24099096
eigth         25859529
ninth         27613529
tenth         29353648
eleventh      30956034
twelvth       32449934
thirteenth    33800614
Name: population, dtype: int64

x is a Pandas Series. We can reassign it to the population column:

city_frame["population"] = x
print(city_frame)
                 name  country  population
first          London  England     8615246
second         Berlin  Germany    12177412
third          Madrid    Spain    15342647
fourth           Rome    Italy    18216685
fifth           Paris   France    20489990
sixth          Vienna  Austria    22295671
seventh     Bucharest  Romania    24099096
eigth         Hamburg  Germany    25859529
ninth        Budapest  Hungary    27613529
tenth          Warsaw   Poland    29353648
eleventh    Barcelona    Spain    30956034
twelvth        Munich  Germany    32449934
thirteenth      Milan    Italy    33800614

We can also include a column name which is not contained in the dictionary. In this case, all the values of this column will be set to NaN:

city_frame = pd.DataFrame(cities,
                          columns=["name", 
                                   "country", 
                                   "area",
                                   "population"],
                          index=ordinals)
print(city_frame)
                 name  country area  population
first          London  England  NaN     8615246
second         Berlin  Germany  NaN     3562166
third          Madrid    Spain  NaN     3165235
fourth           Rome    Italy  NaN     2874038
fifth           Paris   France  NaN     2273305
sixth          Vienna  Austria  NaN     1805681
seventh     Bucharest  Romania  NaN     1803425
eigth         Hamburg  Germany  NaN     1760433
ninth        Budapest  Hungary  NaN     1754000
tenth          Warsaw   Poland  NaN     1740119
eleventh    Barcelona    Spain  NaN     1602386
twelvth        Munich  Germany  NaN     1493900
thirteenth      Milan    Italy  NaN     1350680

There are two ways to access a column of a DataFrame. The result is in both cases a Series:

# in a dictionary-like way:
print(city_frame["population"])
first         8615246
second        3562166
third         3165235
fourth        2874038
fifth         2273305
sixth         1805681
seventh       1803425
eigth         1760433
ninth         1754000
tenth         1740119
eleventh      1602386
twelvth       1493900
thirteenth    1350680
Name: population, dtype: int64
# as an attribute
print(city_frame.population)
first         8615246
second        3562166
third         3165235
fourth        2874038
fifth         2273305
sixth         1805681
seventh       1803425
eigth         1760433
ninth         1754000
tenth         1740119
eleventh      1602386
twelvth       1493900
thirteenth    1350680
Name: population, dtype: int64
print(type(city_frame.population))
<class 'pandas.core.series.Series'>
city_frame.population
After having executed the Python code above we received the following output:
first         8615246
second        3562166
third         3165235
fourth        2874038
fifth         2273305
sixth         1805681
seventh       1803425
eigth         1760433
ninth         1754000
tenth         1740119
eleventh      1602386
twelvth       1493900
thirteenth    1350680
Name: population, dtype: int64
city_frame["population", "first"] = 9000000
print(city_frame)
                 name  country area  population  (population, first)
first          London  England  NaN     8615246              9000000
second         Berlin  Germany  NaN     3562166              9000000
third          Madrid    Spain  NaN     3165235              9000000
fourth           Rome    Italy  NaN     2874038              9000000
fifth           Paris   France  NaN     2273305              9000000
sixth          Vienna  Austria  NaN     1805681              9000000
seventh     Bucharest  Romania  NaN     1803425              9000000
eigth         Hamburg  Germany  NaN     1760433              9000000
ninth        Budapest  Hungary  NaN     1754000              9000000
tenth          Warsaw   Poland  NaN     1740119              9000000
eleventh    Barcelona    Spain  NaN     1602386              9000000
twelvth        Munich  Germany  NaN     1493900              9000000
thirteenth      Milan    Italy  NaN     1350680              9000000

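From the previous example, we can see that the assignment did not change the population of the first city. Because the key is the tuple ("population", "first"), pandas created a new column with this name and set all of its values to 9000000. A single cell can be changed with city_frame.loc["first", "population"] instead.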
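
To change a single cell rather than add a column, a row label and a column label can be passed together to .loc. A minimal sketch on a reduced version of the city data:

```python
import pandas as pd

city_frame = pd.DataFrame({"name": ["London", "Berlin", "Madrid"],
                           "population": [8615246, 3562166, 3165235]},
                          index=["first", "second", "third"])

# row label first, column label second
city_frame.loc["first", "population"] = 9000000
print(city_frame)
```

Only the population of London changes, and no additional column is created.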

We can also access the rows directly. We access the info of the fourth city in the following way:

# .ix has been removed from current pandas versions; .loc selects rows by label
city_frame.loc['fourth']
We received the following output:
name                      Rome
country                  Italy
area                       NaN
population             2874038
(population, first)    9000000
Name: fourth, dtype: object
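
Rows can be selected by label with .loc or by integer position with .iloc. The following sketch shows both on a reduced version of the city data, as well as the selection of several rows and columns at once. The variable names are chosen only for illustration:

```python
import pandas as pd

city_frame = pd.DataFrame({"name": ["London", "Berlin", "Madrid"],
                           "country": ["England", "Germany", "Spain"],
                           "population": [8615246, 3562166, 3165235]},
                          index=["first", "second", "third"])

# one row by label and the same row by position
row_by_label = city_frame.loc["second"]
row_by_position = city_frame.iloc[1]
print(row_by_label)

# several rows and columns at once
subset = city_frame.loc[["first", "third"], ["name", "population"]]
print(subset)
```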

The column area is still not defined. We can set all elements of the column to the same value:

city_frame["area"] = 1572
print(city_frame)
                 name  country  area  population  (population, first)
first          London  England  1572     8615246              9000000
second         Berlin  Germany  1572     3562166              9000000
third          Madrid    Spain  1572     3165235              9000000
fourth           Rome    Italy  1572     2874038              9000000
fifth           Paris   France  1572     2273305              9000000
sixth          Vienna  Austria  1572     1805681              9000000
seventh     Bucharest  Romania  1572     1803425              9000000
eigth         Hamburg  Germany  1572     1760433              9000000
ninth        Budapest  Hungary  1572     1754000              9000000
tenth          Warsaw   Poland  1572     1740119              9000000
eleventh    Barcelona    Spain  1572     1602386              9000000
twelvth        Munich  Germany  1572     1493900              9000000
thirteenth      Milan    Italy  1572     1350680              9000000

In this case, it is definitely better to assign the exact area to each city. The list with the area values needs to have the same length as the number of rows in our DataFrame.

# area in square km:
area = [1572, 891.85, 605.77, 1285, 
        105.4, 414.6, 228, 755, 
        525.2, 517, 101.9, 310.4, 
        181.8]
city_frame["area"] = area
print(city_frame)
                 name  country     area  population  (population, first)
first          London  England  1572.00     8615246              9000000
second         Berlin  Germany   891.85     3562166              9000000
third          Madrid    Spain   605.77     3165235              9000000
fourth           Rome    Italy  1285.00     2874038              9000000
fifth           Paris   France   105.40     2273305              9000000
sixth          Vienna  Austria   414.60     1805681              9000000
seventh     Bucharest  Romania   228.00     1803425              9000000
eigth         Hamburg  Germany   755.00     1760433              9000000
ninth        Budapest  Hungary   525.20     1754000              9000000
tenth          Warsaw   Poland   517.00     1740119              9000000
eleventh    Barcelona    Spain   101.90     1602386              9000000
twelvth        Munich  Germany   310.40     1493900              9000000
thirteenth      Milan    Italy   181.80     1350680              9000000

Let's sort our DataFrame according to the city area:

city_frame = city_frame.sort_values(by="area", ascending=False)
print(city_frame)
                 name  country     area  population  (population, first)
first          London  England  1572.00     8615246              9000000
fourth           Rome    Italy  1285.00     2874038              9000000
second         Berlin  Germany   891.85     3562166              9000000
eigth         Hamburg  Germany   755.00     1760433              9000000
third          Madrid    Spain   605.77     3165235              9000000
ninth        Budapest  Hungary   525.20     1754000              9000000
tenth          Warsaw   Poland   517.00     1740119              9000000
sixth          Vienna  Austria   414.60     1805681              9000000
twelvth        Munich  Germany   310.40     1493900              9000000
seventh     Bucharest  Romania   228.00     1803425              9000000
thirteenth      Milan    Italy   181.80     1350680              9000000
fifth           Paris   France   105.40     2273305              9000000
eleventh    Barcelona    Spain   101.90     1602386              9000000
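
sort_values also accepts a list of columns. As a sketch, the following example sorts a few of the cities by country first and, within each country, by population in descending order. The name by_country is chosen only for illustration:

```python
import pandas as pd

city_frame = pd.DataFrame({"name": ["Berlin", "Madrid", "Hamburg",
                                    "Barcelona", "Munich"],
                           "country": ["Germany", "Spain", "Germany",
                                       "Spain", "Germany"],
                           "population": [3562166, 3165235, 1760433,
                                          1602386, 1493900]})

# one ascending flag per sort column
by_country = city_frame.sort_values(by=["country", "population"],
                                    ascending=[True, False])
print(by_country)
```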

Let's assume we have only the areas of London, Hamburg and Milan. The areas are in a series with the correct indices. We can assign this series as well:

city_frame = pd.DataFrame(cities,
                          columns=["name", 
                                   "country", 
                                   "area",
                                   "population"],
                          index=ordinals)
some_areas = pd.Series([1572, 755, 181.8], 
                    index=['first', 'eigth', 'thirteenth'])
city_frame['area'] = some_areas
print(city_frame)
                 name  country    area  population
first          London  England  1572.0     8615246
second         Berlin  Germany     NaN     3562166
third          Madrid    Spain     NaN     3165235
fourth           Rome    Italy     NaN     2874038
fifth           Paris   France     NaN     2273305
sixth          Vienna  Austria     NaN     1805681
seventh     Bucharest  Romania     NaN     1803425
eigth         Hamburg  Germany   755.0     1760433
ninth        Budapest  Hungary     NaN     1754000
tenth          Warsaw   Poland     NaN     1740119
eleventh    Barcelona    Spain     NaN     1602386
twelvth        Munich  Germany     NaN     1493900
thirteenth      Milan    Italy   181.8     1350680

A nested dictionary of dicts can be passed to a DataFrame as well. The keys of the outer dictionary are taken as the columns, and the inner keys, i.e. the keys of the nested dictionaries, are used as the row indices:

growth = {"Switzerland": {"2010": 3.0, "2011": 1.8, "2012": 1.1, "2013": 1.9},
          "Germany": {"2010": 4.1, "2011": 3.6, "2012": 0.4, "2013": 0.1},
          "France": {"2010": 2.0, "2011": 2.1, "2012": 0.3, "2013": 0.3},
          "Greece": {"2010": -5.4, "2011": -8.9, "2012": -6.6, "2013": -3.3},
          "Italy": {"2010": 1.7, "2011": 0.6, "2012": -2.3, "2013": -1.9}
          }
growth_frame = pd.DataFrame(growth)
growth_frame
This gets us the following result:
      France  Germany  Greece  Italy  Switzerland
2010     2.0      4.1    -5.4    1.7          3.0
2011     2.1      3.6    -8.9    0.6          1.8
2012     0.3      0.4    -6.6   -2.3          1.1
2013     0.3      0.1    -3.3   -1.9          1.9

Would you like to have the years in the columns and the countries in the rows? No problem, you can transpose the data:

growth_frame.T
The previous code returned the following:
             2010  2011  2012  2013
France        2.0   2.1   0.3   0.3
Germany       4.1   3.6   0.4   0.1
Greece       -5.4  -8.9  -6.6  -3.3
Italy         1.7   0.6  -2.3  -1.9
Switzerland   3.0   1.8   1.1   1.9

We can use reindex to rearrange the rows, e.g. to list the years in descending order:

growth_frame = pd.DataFrame(growth)
growth_frame.reindex(["2013", "2012", "2011", "2010"])
After having executed the Python code above we received the following:
      France  Germany  Greece  Italy  Switzerland
2013     0.3      0.1    -3.3   -1.9          1.9
2012     0.3      0.4    -6.6   -2.3          1.1
2011     2.1      3.6    -8.9    0.6          1.8
2010     2.0      4.1    -5.4    1.7          3.0
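
reindex can also be called with labels which do not exist in the DataFrame. The following sketch uses a reduced version of the growth data and the made-up label "2009". By default the new row is filled with NaN, and the parameter fill_value supplies a different value:

```python
import pandas as pd

growth = {"Switzerland": {"2010": 3.0, "2011": 1.8},
          "Germany": {"2010": 4.1, "2011": 3.6}}
growth_frame = pd.DataFrame(growth)

# "2009" is not in the index, so its row consists of NaN values
extended = growth_frame.reindex(["2011", "2010", "2009"])
print(extended)

# the same call with an explicit fill value
filled = growth_frame.reindex(["2011", "2010", "2009"], fill_value=0.0)
print(filled)
```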

Filling a DataFrame with random values:

df = pd.DataFrame(np.random.randn(10, 5),
                  columns=['a', 'b', 'c', 'd', 'e'])
df
The Python code above returned the following:
          a         b         c         d         e
0  0.295727  0.291932  0.075200  0.803098 -0.149588
1 -0.551503 -0.058048 -1.555376  1.176424 -1.154612
2  0.066221  0.794367  0.652569 -0.980411  0.300658
3  0.162977 -0.461645 -0.718161  1.557218 -0.822975
4  0.363506 -0.396961 -1.797157  0.504989 -0.392984
5  0.574621 -1.637724  0.674160  1.330299 -0.908328
6 -0.749366 -0.091287 -1.018482  0.722062 -1.466433
7 -0.390965  1.136712 -0.382102 -2.233372  1.274391
8  0.028914 -0.320607 -0.213236 -0.075069  1.516394
9 -1.174272  0.782454 -0.771434 -2.441267  0.456506
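
The values above will differ on every run. If reproducible numbers are needed, one option is a seeded random number generator. The following sketch uses NumPy's default_rng instead of np.random.randn, and the seed 42 is an arbitrary choice:

```python
import numpy as np
import pandas as pd

# a generator with a fixed seed always produces the same sequence
rng = np.random.default_rng(42)
df = pd.DataFrame(rng.standard_normal((10, 5)),
                  columns=['a', 'b', 'c', 'd', 'e'])

# a second generator with the same seed yields an identical DataFrame
rng_again = np.random.default_rng(42)
df_again = pd.DataFrame(rng_again.standard_normal((10, 5)),
                        columns=['a', 'b', 'c', 'd', 'e'])

print(df.equals(df_again))
```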

We want to read in a csv file with the population data of all countries (July 2014). The delimiter of the file is a space, and commas are used to separate groups of thousands in the numbers:

pop = pd.read_csv("countries_population.csv", 
                  quotechar="'", 
                  sep=" ", 
                  thousands=",")
pop
The previous Python code returned the following result:
China 1,355,692,576
0 India 1236344631
1 European Union 511434812
2 United States 318892103
3 Indonesia 253609643
4 Brazil 202656788
5 Pakistan 196174380
6 Nigeria 177155754
7 Bangladesh 166280712
8 Russia 142470272
9 Japan 127103388
10 Mexico 120286655
11 Philippines 107668231
12 Ethiopia 96633458
13 Vietnam 93421835
14 Egypt 86895099
15 Turkey 81619392
16 Germany 80996685
17 Iran 80840713
18 Congo, Democratic Republic of the 77433744
19 Thailand 67741401
20 France 66259012
21 United Kingdom 63742977
22 Italy 61680122
23 Burma 55746253
24 Tanzania 49639138
25 Korea, South 49039986
26 South Africa 48375645
27 Spain 47737941
28 Colombia 46245297
29 Kenya 45010056
... ... ...
207 Saint Kitts and Nevis 51538
208 Northern Mariana Islands 51483
209 Faroe Islands 49947
210 Turks and Caicos Islands 49070
211 Sint Maarten 39689
212 Liechtenstein 37313
213 San Marino 32742
214 British Virgin Islands 32680
215 Saint Martin 31530
216 Monaco 30508
217 Gibraltar 29185
218 Palau 21186
219 Anguilla 16086
220 Wallis and Futuna 15561
221 Tuvalu 10782
222 Cook Islands 10134
223 Nauru 9488
224 Saint Helena, Ascension, and Tristan da Cunha 7776
225 Saint Barthelemy 7267
226 Saint Pierre and Miquelon 5716
227 Montserrat 5215
228 Falkland Islands (Islas Malvinas) 3361
229 Norfolk Island 2210
230 Svalbard 1872
231 Christmas Island 1530
232 Tokelau 1337
233 Niue 1190
234 Holy See (Vatican City) 842
235 Cocos (Keeling) Islands 596
236 Pitcairn Islands 48

237 rows × 2 columns
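
In the output above, the first line of the file (China) has been used as the column header, because read_csv expects a header line by default. A possible way around this is to pass header=None together with column names. The sketch below reads a small in-memory sample instead of the file. The exact layout of countries_population.csv is not shown here, so the sample lines and the column names "country" and "population" are assumptions:

```python
import io
import pandas as pd

# hypothetical sample in the assumed format of the file
sample = io.StringIO(
    "'China' 1,355,692,576\n"
    "'India' 1,236,344,631\n"
    "'European Union' 511,434,812\n"
)

pop = pd.read_csv(sample,
                  header=None,                      # the data has no header line
                  names=["country", "population"],  # assumed column names
                  quotechar="'",
                  sep=" ",
                  thousands=",")
print(pop)
```

With header=None the first line is kept as data, and the population column is parsed into integers.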
