# Introduction to Pandas

The pandas we are writing about in this chapter have nothing to do with the cute panda bears, and cute bears are not what our visitors expect in a Python tutorial anyway. Pandas is a Python module that rounds out the capabilities of NumPy, SciPy and Matplotlib. The word pandas is derived from "panel data" and "Python data analysis".

There is often some confusion about whether Pandas is an alternative to NumPy, SciPy and Matplotlib. The truth is that it is built on top of NumPy. This means that NumPy is required by pandas. SciPy and Matplotlib, on the other hand, are not required by pandas, but they are extremely useful. That's why the Pandas project lists them as "optional dependencies".

Pandas is a software library written for the Python programming language. It is used for data manipulation and analysis. It provides special data structures and operations for the manipulation of numerical tables and time series. Pandas is free software released under the three-clause BSD license.

## Data Structures

We will start with the following two important data structures of Pandas:

- Series and
- DataFrame

### Series

A Series is a one-dimensional labelled array-like object. It is capable of holding any data type, e.g. integers, floats, strings, Python objects, and so on. It can be seen as a data structure with two arrays: one functioning as the index, i.e. the labels, and the other one containing the actual data.

We define a simple Series object in the following example by instantiating a Pandas Series object with a list. We will later see that we can use other data objects, for example NumPy arrays and dictionaries, to instantiate a Series object as well.

    import pandas as pd

    S = pd.Series([11, 28, 72, 3, 5, 8])
    S

We received the following result:

    0    11
    1    28
    2    72
    3     3
    4     5
    5     8
    dtype: int64

We haven't defined an index in our example, but we see two columns in our output: the right column contains our data, whereas the left column contains the index. Pandas created a default index running from 0 to 5, i.e. from 0 to the length of the data minus 1.

We can directly access the index and the values of our Series S:

    print(S.index)
    print(S.values)

Output:

    RangeIndex(start=0, stop=6, step=1)
    [11 28 72  3  5  8]

If we compare this with creating an array in NumPy, we see many similarities:

    import numpy as np

    X = np.array([11, 28, 72, 3, 5, 8])
    print(X)
    print(S.values)
    # both are of the same type:
    print(type(S.values), type(X))

Output:

    [11 28 72  3  5  8]
    [11 28 72  3  5  8]
    <class 'numpy.ndarray'> <class 'numpy.ndarray'>

So far our Series has not been very different from a NumPy ndarray. This changes as soon as we define Series objects with individual indices:

    fruits = ['apples', 'oranges', 'cherries', 'pears']
    quantities = [20, 33, 52, 10]
    S = pd.Series(quantities, index=fruits)
    S

The above Python code returned the following output:

    apples      20
    oranges     33
    cherries    52
    pears       10
    dtype: int64

The previous example shows a big advantage over NumPy arrays: we can use arbitrary indices.
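Here is a short sketch, assuming a reasonably current pandas version. With custom labels, it becomes important to distinguish access by label from access by position. The .loc accessor selects by label and .iloc by integer position. The data is the fruit Series from the example above.

```python
import pandas as pd

fruits = ['apples', 'oranges', 'cherries', 'pears']
S = pd.Series([20, 33, 52, 10], index=fruits)

# .loc selects by index label, .iloc by integer position
by_label = S.loc['cherries']
by_position = S.iloc[2]
print(by_label, by_position)   # both refer to the cherries entry

# the labels are kept when selecting a range by position
print(S.iloc[1:3])
```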

If we add two Series with the same index, we get a new Series with the same index, and the corresponding values are added:

    fruits = ['apples', 'oranges', 'cherries', 'pears']
    S = pd.Series([20, 33, 52, 10], index=fruits)
    S2 = pd.Series([17, 13, 31, 32], index=fruits)
    print(S + S2)
    print("sum of S: ", sum(S))

Output:

    apples      37
    oranges     46
    cherries    83
    pears       42
    dtype: int64
    sum of S:  115

The indices do not have to be the same for a Series addition. The index of the result will be the "union" of both indices. If an index label doesn't occur in both Series, the resulting value for this label will be NaN:

    fruits = ['peaches', 'oranges', 'cherries', 'pears']
    fruits2 = ['raspberries', 'oranges', 'cherries', 'pears']
    S = pd.Series([20, 33, 52, 10], index=fruits)
    S2 = pd.Series([17, 13, 31, 32], index=fruits2)
    print(S + S2)

Output:

    cherries       83.0
    oranges        46.0
    peaches         NaN
    pears          42.0
    raspberries     NaN
    dtype: float64

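If the two indices have no labels in common at all, every value of the result is NaN. The following example uses the Greek names of the same fruits as the second index:

    fruits = ['apples', 'oranges', 'cherries', 'pears']
    fruits_gr = ['μήλα', 'πορτοκάλια', 'κεράσια', 'αχλάδια']
    S = pd.Series([20, 33, 52, 10], index=fruits)
    S2 = pd.Series([17, 13, 31, 32], index=fruits_gr)
    print(S + S2)

Output:

    apples        NaN
    cherries      NaN
    oranges       NaN
    pears         NaN
    αχλάδια       NaN
    κεράσια       NaN
    μήλα          NaN
    πορτοκάλια    NaN
    dtype: float64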
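If you do not want the NaN entries, you can use the method form of the addition instead of the + operator. The following sketch uses Series.add with its fill_value parameter. This parameter treats a label that is missing in one of the two Series as 0. Labels missing in both Series would still yield NaN.

```python
import pandas as pd

S = pd.Series([20, 33, 52, 10], index=['peaches', 'oranges', 'cherries', 'pears'])
S2 = pd.Series([17, 13, 31, 32], index=['raspberries', 'oranges', 'cherries', 'pears'])

# The + operator yields NaN for labels that occur in only one Series.
# Series.add with fill_value treats the missing entry as 0 instead.
total = S.add(S2, fill_value=0)
print(total)
```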

We can access a single value of a Series by its index label, or several values at once by passing a list of labels:

    print(S['apples'])

Output:

    20

Passing a list of labels returns a new Series:

    print(S[['apples', 'oranges', 'cherries']])

Output:

    apples      20
    oranges     33
    cherries    52
    dtype: int64
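Besides lists of labels, you can also use slices. Here is a small sketch, again with the fruit Series. A slice of labels includes its end label. Positional slicing with .iloc is different: it follows the usual Python convention and excludes the end.

```python
import pandas as pd

S = pd.Series([20, 33, 52, 10], index=['apples', 'oranges', 'cherries', 'pears'])

# Slicing with labels includes the end label ...
by_labels = S.loc['oranges':'pears']
# ... whereas slicing by position excludes the end position, as with Python lists
by_positions = S.iloc[1:3]
print(by_labels)
print(by_positions)
```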

As with NumPy, we can apply scalar operations and mathematical functions to a Series:

    import numpy as np

    print((S + 3) * 4)
    print("======================")
    print(np.sin(S))

Output:

    apples       92
    oranges     144
    cherries    220
    pears        52
    dtype: int64
    ======================
    apples      0.912945
    oranges     0.999912
    cherries    0.986628
    pears      -0.544021
    dtype: float64

#### pandas.Series.apply

    Series.apply(func, convert_dtype=True, args=(), **kwds)

The function "func" is applied to the Series, and apply returns either a Series or a DataFrame, depending on what "func" returns.

| Parameter | Meaning |
|---|---|
| func | A function. This can be a NumPy function, which will be applied to the entire Series, or a Python function, which will be applied to every single value of the Series. |
| convert_dtype | A boolean value. If it is set to True (default), apply tries to find a better dtype for the results of an elementwise function. If False, the results are left as dtype=object. This parameter has been deprecated since pandas 2.1. |
| args | Positional arguments which are passed to the function "func" in addition to the value from the Series. |
| **kwds | Additional keyword arguments, which are passed as keywords to the function. |

Example:

    S.apply(np.sin)

The previous Python code returned the following output:

    apples      0.912945
    oranges     0.999912
    cherries    0.986628
    pears      -0.544021
    dtype: float64

We can also use Python lambda functions. Let's assume we have the following task: we check the amount of fruit for every kind, and if there are 50 or fewer available, we increase the stock by 10:

    S.apply(lambda x: x if x > 50 else x + 10)

This gets us the following output:

    apples      30
    oranges     43
    cherries    52
    pears       20
    dtype: int64
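The args and keyword parameters from the table above can pass the threshold and the increment to the function, so they do not have to be hard-coded in a lambda. The helper function restock below is hypothetical and was made up for this sketch. The last lines show that the vectorized Series.where method gives the same result without calling a Python function for every element.

```python
import pandas as pd

S = pd.Series([20, 33, 52, 10], index=['apples', 'oranges', 'cherries', 'pears'])

# Hypothetical helper: the threshold is supplied via args (positional),
# the increment via a keyword argument forwarded by apply.
def restock(x, threshold, amount=10):
    return x if x > threshold else x + amount

via_apply = S.apply(restock, args=(50,), amount=10)

# Vectorized alternative: keep values where the condition holds,
# replace all other values by S + 10.
via_where = S.where(S > 50, S + 10)

print(via_apply)
print(via_where.tolist() == via_apply.tolist())
```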

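Filtering with a boolean array: S > 30 returns a boolean Series. Indexing S with it keeps only the entries for which the condition is True: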

    S[S > 30]

After having executed the Python code above we received the following output:

    oranges     33
    cherries    52
    dtype: int64
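You can combine several conditions in one filter. In the following sketch, & stands for "and" and ~ for "not". Each comparison has to be parenthesized, because in Python these operators bind more tightly than comparison operators.

```python
import pandas as pd

S = pd.Series([20, 33, 52, 10], index=['apples', 'oranges', 'cherries', 'pears'])

# Conditions are combined elementwise; the parentheses are required.
between = S[(S > 15) & (S < 50)]
outside = S[~((S > 15) & (S < 50))]
print(between)
print(outside)
```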

A Series can be seen as an ordered Python dictionary with a fixed length. Accordingly, the in operator checks the index labels, not the values:

    "apples" in S

The previous code returned the following output:

    True
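Because the in operator looks at the index, it does not find a value that is contained in the Series. Here is a minimal check using the fruit Series from above:

```python
import pandas as pd

S = pd.Series([20, 33, 52, 10], index=['apples', 'oranges', 'cherries', 'pears'])

print('apples' in S)     # True: 'apples' is an index label
print(20 in S)           # False: 20 is a value, not a label
print(20 in S.values)    # True: search the values explicitly
```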

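We can even pass a dictionary to a Series object when we create it. We get a Series with the dict's keys as the index. Since pandas 0.23 (with Python 3.6 or newer), the index keeps the insertion order of the dictionary. Older versions sorted the keys.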

    cities = {"London": 8615246, "Berlin": 3562166, "Madrid": 3165235,
              "Rome": 2874038, "Paris": 2273305, "Vienna": 1805681,
              "Bucharest": 1803425, "Hamburg": 1760433,
              "Budapest": 1754000, "Warsaw": 1740119,
              "Barcelona": 1602386, "Munich": 1493900,
              "Milan": 1350680}
    city_series = pd.Series(cities)
    print(city_series)

Output:

    London       8615246
    Berlin       3562166
    Madrid       3165235
    Rome         2874038
    Paris        2273305
    Vienna       1805681
    Bucharest    1803425
    Hamburg      1760433
    Budapest     1754000
    Warsaw       1740119
    Barcelona    1602386
    Munich       1493900
    Milan        1350680
    dtype: int64

We have already seen that we can pass a list or a tuple to the keyword argument 'index'. When we create a Series from a dictionary, the labels passed to index do not have to match the keys of the dictionary. There may be fewer or more entries in index than there are keys:

    my_cities = ["London", "Paris", "Zurich", "Berlin",
                 "Stuttgart", "Hamburg"]
    my_city_series = pd.Series(cities, index=my_cities)
    print(my_city_series)

Output:

    London       8615246.0
    Paris        2273305.0
    Zurich             NaN
    Berlin       3562166.0
    Stuttgart          NaN
    Hamburg      1760433.0
    dtype: float64
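For a Series that already exists, you can make the same selection with the reindex method. The sketch below uses a shortened version of the cities dictionary. Labels without a match in the original index again become NaN.

```python
import pandas as pd

# shortened excerpt of the cities dictionary
cities = {"London": 8615246, "Berlin": 3562166, "Paris": 2273305, "Hamburg": 1760433}
city_series = pd.Series(cities)

# reindex gives the same result for an existing Series: labels that are
# missing in the original get NaN, which also turns the dtype into float64.
my_cities = ["London", "Paris", "Zurich", "Berlin", "Stuttgart", "Hamburg"]
my_city_series = city_series.reindex(my_cities)
print(my_city_series)
```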

We can see that the cities which are not included in the dictionary are assigned the value NaN. NaN stands for "not a number". In our example, it can be read as "missing". Because NaN is a floating-point value, the dtype of the whole Series has become float64.

We can check for missing values with the methods isnull and notnull:

    print(my_city_series.isnull())

Output:

    London       False
    Paris        False
    Zurich        True
    Berlin       False
    Stuttgart     True
    Hamburg      False
    dtype: bool

The method notnull returns the inverse:

    print(my_city_series.notnull())

Output:

    London        True
    Paris         True
    Zurich       False
    Berlin        True
    Stuttgart    False
    Hamburg       True
    dtype: bool

We also get NaN if a value in the dictionary is None:

    d = {"a": 23, "b": 45, "c": None, "d": 0}
    S = pd.Series(d)
    print(S)

Output:

    a    23.0
    b    45.0
    c     NaN
    d     0.0
    dtype: float64

The same checks are also available as the functions pd.isnull and pd.notnull:

    pd.isnull(S)

The previous code returned the following result:

    a    False
    b    False
    c     True
    d    False
    dtype: bool

Correspondingly, pd.notnull yields the inverse:

    pd.notnull(S)

The Python code above returned the following:

    a     True
    b     True
    c    False
    d     True
    dtype: bool
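Once you have detected missing values, you usually need to count, remove or replace them. Here is a short sketch using isna (an alias of isnull), dropna and fillna on the Series from above:

```python
import pandas as pd

S = pd.Series({"a": 23, "b": 45, "c": None, "d": 0})

print(S.isna().sum())   # number of missing values
print(S.dropna())       # drop the entries with missing values
print(S.fillna(0))      # replace missing values by 0
```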

### DataFrame

The underlying idea of a DataFrame is based on spreadsheets. We can see the data structure of a DataFrame as tabular and spreadsheet-like. It contains an ordered collection of columns. All values within a column share one data type, but different columns can have different types, e.g. the first column may consist of integers, while the second one consists of boolean values and so on.

A DataFrame has a row and column index; it's like a dict of Series with a common index.
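To illustrate the "dict of Series" view, the following sketch builds a DataFrame from two Series whose indices only partly overlap. The city data is a small excerpt chosen for this example. The rows of the result are the union of both indices, and missing combinations are filled with NaN.

```python
import pandas as pd

# Two Series with partly different index labels
population = pd.Series({"London": 8615246, "Berlin": 3562166, "Madrid": 3165235})
country = pd.Series({"London": "England", "Berlin": "Germany", "Rome": "Italy"})

# Each Series becomes a column; the row index is the union of both indices,
# and missing combinations are filled with NaN.
frame = pd.DataFrame({"population": population, "country": country})
print(frame)
```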

    cities = {"name": ["London", "Berlin", "Madrid", "Rome",
                       "Paris", "Vienna", "Bucharest", "Hamburg",
                       "Budapest", "Warsaw", "Barcelona",
                       "Munich", "Milan"],
              "population": [8615246, 3562166, 3165235, 2874038,
                             2273305, 1805681, 1803425, 1760433,
                             1754000, 1740119, 1602386, 1493900,
                             1350680],
              "country": ["England", "Germany", "Spain", "Italy",
                          "France", "Austria", "Romania",
                          "Germany", "Hungary", "Poland", "Spain",
                          "Germany", "Italy"]}
    city_frame = pd.DataFrame(cities)
    print(city_frame)

Output (the columns appear in the insertion order of the dictionary):

             name  population  country
    0      London     8615246  England
    1      Berlin     3562166  Germany
    2      Madrid     3165235    Spain
    3        Rome     2874038    Italy
    4       Paris     2273305   France
    5      Vienna     1805681  Austria
    6   Bucharest     1803425  Romania
    7     Hamburg     1760433  Germany
    8    Budapest     1754000  Hungary
    9      Warsaw     1740119   Poland
    10  Barcelona     1602386    Spain
    11     Munich     1493900  Germany
    12      Milan     1350680    Italy

We can see that an index (0,1,2, ...) has been automatically assigned to the DataFrame. We can also assign a custom index to the DataFrame object:

    ordinals = ["first", "second", "third", "fourth", "fifth",
                "sixth", "seventh", "eigth", "ninth", "tenth",
                "eleventh", "twelvth", "thirteenth"]
    city_frame = pd.DataFrame(cities, index=ordinals)
    print(city_frame)

Output:

                     name  population  country
    first          London     8615246  England
    second         Berlin     3562166  Germany
    third          Madrid     3165235    Spain
    fourth           Rome     2874038    Italy
    fifth           Paris     2273305   France
    sixth          Vienna     1805681  Austria
    seventh     Bucharest     1803425  Romania
    eigth         Hamburg     1760433  Germany
    ninth        Budapest     1754000  Hungary
    tenth          Warsaw     1740119   Poland
    eleventh    Barcelona     1602386    Spain
    twelvth        Munich     1493900  Germany
    thirteenth      Milan     1350680    Italy

We can also define or rearrange the order of the columns.

    city_frame = pd.DataFrame(cities,
                              columns=["name", "country", "population"],
                              index=ordinals)
    print(city_frame)

Output:

                     name  country  population
    first          London  England     8615246
    second         Berlin  Germany     3562166
    third          Madrid    Spain     3165235
    fourth           Rome    Italy     2874038
    fifth           Paris   France     2273305
    sixth          Vienna  Austria     1805681
    seventh     Bucharest  Romania     1803425
    eigth         Hamburg  Germany     1760433
    ninth        Budapest  Hungary     1754000
    tenth          Warsaw   Poland     1740119
    eleventh    Barcelona    Spain     1602386
    twelvth        Munich  Germany     1493900
    thirteenth      Milan    Italy     1350680

We can calculate the sum of all the columns of a DataFrame or the sum of certain columns. For columns containing strings, the "sum" is the concatenation of all the strings:

    city_frame.sum()

The previous Python code returned the following:

    name          LondonBerlinMadridRomeParisViennaBucharestHamb...
    country       EnglandGermanySpainItalyFranceAustriaRomaniaGe...
    population                                             33800614
    dtype: object

    city_frame["population"].sum()

After having executed the Python code above we received the following result:

    33800614
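If you only want to sum the numeric columns, you can avoid the string concatenation. Here is a sketch with a reduced version of the city data, using the numeric_only parameter of DataFrame.sum:

```python
import pandas as pd

city_frame = pd.DataFrame({
    "name": ["London", "Berlin", "Madrid"],
    "country": ["England", "Germany", "Spain"],
    "population": [8615246, 3562166, 3165235],
})

# numeric_only=True skips the string columns instead of concatenating them
totals = city_frame.sum(numeric_only=True)
print(totals)

# aggregating only selected columns by passing a list of column names
print(city_frame[["population"]].sum())
```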

We can use "cumsum" to calculate the cumulative sum:

    x = city_frame["population"].cumsum()
    print(x)

Output:

    first          8615246
    second        12177412
    third         15342647
    fourth        18216685
    fifth         20489990
    sixth         22295671
    seventh       24099096
    eigth         25859529
    ninth         27613529
    tenth         29353648
    eleventh      30956034
    twelvth       32449934
    thirteenth    33800614
    Name: population, dtype: int64

x is a Pandas Series. We can reassign it to the population column:

    city_frame["population"] = x
    print(city_frame)

Output:

                     name  country  population
    first          London  England     8615246
    second         Berlin  Germany    12177412
    third          Madrid    Spain    15342647
    fourth           Rome    Italy    18216685
    fifth           Paris   France    20489990
    sixth          Vienna  Austria    22295671
    seventh     Bucharest  Romania    24099096
    eigth         Hamburg  Germany    25859529
    ninth        Budapest  Hungary    27613529
    tenth          Warsaw   Poland    29353648
    eleventh    Barcelona    Spain    30956034
    twelvth        Munich  Germany    32449934
    thirteenth      Milan    Italy    33800614

We can also include a column name which is not contained in the dictionary. In this case, all the values of this column will be set to NaN:

    city_frame = pd.DataFrame(cities,
                              columns=["name", "country", "area", "population"],
                              index=ordinals)
    print(city_frame)

Output:

                     name  country  area  population
    first          London  England   NaN     8615246
    second         Berlin  Germany   NaN     3562166
    third          Madrid    Spain   NaN     3165235
    fourth           Rome    Italy   NaN     2874038
    fifth           Paris   France   NaN     2273305
    sixth          Vienna  Austria   NaN     1805681
    seventh     Bucharest  Romania   NaN     1803425
    eigth         Hamburg  Germany   NaN     1760433
    ninth        Budapest  Hungary   NaN     1754000
    tenth          Warsaw   Poland   NaN     1740119
    eleventh    Barcelona    Spain   NaN     1602386
    twelvth        Munich  Germany   NaN     1493900
    thirteenth      Milan    Italy   NaN     1350680

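There are two ways to access a column of a DataFrame. In both cases, the result is a Series. The attribute notation only works if the column name is a valid Python identifier that does not clash with an existing DataFrame attribute or method: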

    # in a dictionary-like way:
    print(city_frame["population"])

Output:

    first         8615246
    second        3562166
    third         3165235
    fourth        2874038
    fifth         2273305
    sixth         1805681
    seventh       1803425
    eigth         1760433
    ninth         1754000
    tenth         1740119
    eleventh      1602386
    twelvth       1493900
    thirteenth    1350680
    Name: population, dtype: int64

The same column, accessed as an attribute:

    # as an attribute
    print(city_frame.population)

Output:

    first         8615246
    second        3562166
    third         3165235
    fourth        2874038
    fifth         2273305
    sixth         1805681
    seventh       1803425
    eigth         1760433
    ninth         1754000
    tenth         1740119
    eleventh      1602386
    twelvth       1493900
    thirteenth    1350680
    Name: population, dtype: int64

    print(type(city_frame.population))

Output:

    <class 'pandas.core.series.Series'>

In an interactive environment such as a Jupyter notebook, we can also display the column by simply evaluating the expression:

    city_frame.population

After having executed the Python code above we received the following output:

    first         8615246
    second        3562166
    third         3165235
    fourth        2874038
    fifth         2273305
    sixth         1805681
    seventh       1803425
    eigth         1760433
    ninth         1754000
    tenth         1740119
    eleventh      1602386
    twelvth       1493900
    thirteenth    1350680
    Name: population, dtype: int64
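If you pass a list of column names instead of a single name, you get a DataFrame rather than a Series. Here is a minimal sketch with two rows of the city data:

```python
import pandas as pd

city_frame = pd.DataFrame(
    {"name": ["London", "Berlin"], "country": ["England", "Germany"],
     "population": [8615246, 3562166]},
    index=["first", "second"])

single = city_frame["population"]              # one label -> Series
several = city_frame[["name", "population"]]   # list of labels -> DataFrame
print(type(single).__name__, type(several).__name__)
print(several)
```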

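Be careful when you try to change a single value in the following way:

    city_frame["population", "first"] = 9000000
    print(city_frame)

Output:

                     name  country  area  population  (population, first)
    first          London  England   NaN     8615246              9000000
    second         Berlin  Germany   NaN     3562166              9000000
    third          Madrid    Spain   NaN     3165235              9000000
    fourth           Rome    Italy   NaN     2874038              9000000
    fifth           Paris   France   NaN     2273305              9000000
    sixth          Vienna  Austria   NaN     1805681              9000000
    seventh     Bucharest  Romania   NaN     1803425              9000000
    eigth         Hamburg  Germany   NaN     1760433              9000000
    ninth        Budapest  Hungary   NaN     1754000              9000000
    tenth          Warsaw   Poland   NaN     1740119              9000000
    eleventh    Barcelona    Spain   NaN     1602386              9000000
    twelvth        Munich  Germany   NaN     1493900              9000000
    thirteenth      Milan    Italy   NaN     1350680              9000000

Pandas does not interpret the tuple ("population", "first") as a column label plus a row label. Instead, it uses the whole tuple as the label of a new column. As a result, a column named (population, first) has been added in which every entry is 9000000, while the population of London is unchanged. To change a single value, you have to pass the row label and the column label to the .loc indexer, e.g. city_frame.loc["first", "population"] = 9000000.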
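Here is a minimal sketch of changing single values, using a reduced two-row version of city_frame. The .loc indexer takes a row label and a column label, and .at does the same for exactly one cell. Neither adds a new column.

```python
import pandas as pd

city_frame = pd.DataFrame(
    {"name": ["London", "Berlin"], "population": [8615246, 3562166]},
    index=["first", "second"])

# .loc takes a row label and a column label and sets exactly one cell
city_frame.loc["first", "population"] = 9000000
# .at is an accessor restricted to a single cell
city_frame.at["second", "population"] = 3600000
print(city_frame)
```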

We can also access the rows directly. We access the info of the fourth city in the following way:

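    city_frame.loc['fourth']

We received the following output:

    name                      Rome
    country                  Italy
    area                       NaN
    population             2874038
    (population, first)    9000000
    Name: fourth, dtype: object

Older versions of pandas also provided the indexer .ix for this purpose. It was deprecated in pandas 0.20 and removed in pandas 1.0. The label-based indexer .loc works in all current versions.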

The column area is still not defined. We can set all elements of the column to the same value:

    city_frame["area"] = 1572
    print(city_frame)

Output:

                     name  country  area  population  (population, first)
    first          London  England  1572     8615246              9000000
    second         Berlin  Germany  1572     3562166              9000000
    third          Madrid    Spain  1572     3165235              9000000
    fourth           Rome    Italy  1572     2874038              9000000
    fifth           Paris   France  1572     2273305              9000000
    sixth          Vienna  Austria  1572     1805681              9000000
    seventh     Bucharest  Romania  1572     1803425              9000000
    eigth         Hamburg  Germany  1572     1760433              9000000
    ninth        Budapest  Hungary  1572     1754000              9000000
    tenth          Warsaw   Poland  1572     1740119              9000000
    eleventh    Barcelona    Spain  1572     1602386              9000000
    twelvth        Munich  Germany  1572     1493900              9000000
    thirteenth      Milan    Italy  1572     1350680              9000000

Of course, it is much better to assign the actual area of each city. The list with the area values needs to have the same length as the number of rows in our DataFrame.

    # area in square km:
    area = [1572, 891.85, 605.77, 1285, 105.4, 414.6, 228, 755,
            525.2, 517, 101.9, 310.4, 181.8]
    city_frame["area"] = area
    print(city_frame)

Output:

                     name  country     area  population  (population, first)
    first          London  England  1572.00     8615246              9000000
    second         Berlin  Germany   891.85     3562166              9000000
    third          Madrid    Spain   605.77     3165235              9000000
    fourth           Rome    Italy  1285.00     2874038              9000000
    fifth           Paris   France   105.40     2273305              9000000
    sixth          Vienna  Austria   414.60     1805681              9000000
    seventh     Bucharest  Romania   228.00     1803425              9000000
    eigth         Hamburg  Germany   755.00     1760433              9000000
    ninth        Budapest  Hungary   525.20     1754000              9000000
    tenth          Warsaw   Poland   517.00     1740119              9000000
    eleventh    Barcelona    Spain   101.90     1602386              9000000
    twelvth        Munich  Germany   310.40     1493900              9000000
    thirteenth      Milan    Italy   181.80     1350680              9000000
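What happens if the lengths do not match? In the following sketch, assigning a list with two values to a DataFrame with three rows raises a ValueError, and pandas does not create the column. The exact wording of the error message depends on the pandas version, so the code only prints it.

```python
import pandas as pd

frame = pd.DataFrame({"name": ["London", "Berlin", "Madrid"]})

try:
    # only two values for three rows
    frame["area"] = [1572, 891.85]
except ValueError as err:
    print("ValueError:", err)
    failed = True
else:
    failed = False
```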

Let's sort our DataFrame according to the city area:

    city_frame = city_frame.sort_values(by="area", ascending=False)
    print(city_frame)

Output:

                     name  country     area  population  (population, first)
    first          London  England  1572.00     8615246              9000000
    fourth           Rome    Italy  1285.00     2874038              9000000
    second         Berlin  Germany   891.85     3562166              9000000
    eigth         Hamburg  Germany   755.00     1760433              9000000
    third          Madrid    Spain   605.77     3165235              9000000
    ninth        Budapest  Hungary   525.20     1754000              9000000
    tenth          Warsaw   Poland   517.00     1740119              9000000
    sixth          Vienna  Austria   414.60     1805681              9000000
    twelvth        Munich  Germany   310.40     1493900              9000000
    seventh     Bucharest  Romania   228.00     1803425              9000000
    thirteenth      Milan    Italy   181.80     1350680              9000000
    fifth           Paris   France   105.40     2273305              9000000
    eleventh    Barcelona    Spain   101.90     1602386              9000000

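Let's assume we only have the areas of London, Hamburg and Milan. The areas are stored in a Series whose index labels match those of the DataFrame. We can assign this Series as well. Pandas aligns the values by index label, and all rows without a matching label get NaN: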

    city_frame = pd.DataFrame(cities,
                              columns=["name", "country", "area", "population"],
                              index=ordinals)
    some_areas = pd.Series([1572, 755, 181.8],
                           index=['first', 'eigth', 'thirteenth'])
    city_frame['area'] = some_areas
    print(city_frame)

Output:

                     name  country    area  population
    first          London  England  1572.0     8615246
    second         Berlin  Germany     NaN     3562166
    third          Madrid    Spain     NaN     3165235
    fourth           Rome    Italy     NaN     2874038
    fifth           Paris   France     NaN     2273305
    sixth          Vienna  Austria     NaN     1805681
    seventh     Bucharest  Romania     NaN     1803425
    eigth         Hamburg  Germany   755.0     1760433
    ninth        Budapest  Hungary     NaN     1754000
    tenth          Warsaw   Poland     NaN     1740119
    eleventh    Barcelona    Spain     NaN     1602386
    twelvth        Munich  Germany     NaN     1493900
    thirteenth      Milan    Italy   181.8     1350680

You can pass a nested dictionary of dicts to a DataFrame as well. The keys of the outer dictionary are taken as the columns, and the inner keys, i.e. the keys of the nested dictionaries, are used as the row index:

    growth = {"Switzerland": {"2010": 3.0, "2011": 1.8, "2012": 1.1, "2013": 1.9},
              "Germany": {"2010": 4.1, "2011": 3.6, "2012": 0.4, "2013": 0.1},
              "France": {"2010": 2.0, "2011": 2.1, "2012": 0.3, "2013": 0.3},
              "Greece": {"2010": -5.4, "2011": -8.9, "2012": -6.6, "2013": -3.3},
              "Italy": {"2010": 1.7, "2011": 0.6, "2012": -2.3, "2013": -1.9}}

    growth_frame = pd.DataFrame(growth)
    growth_frame

This gets us the following result (the columns follow the insertion order of the dictionary):

|  | Switzerland | Germany | France | Greece | Italy |
|---|---|---|---|---|---|
| 2010 | 3.0 | 4.1 | 2.0 | -5.4 | 1.7 |
| 2011 | 1.8 | 3.6 | 2.1 | -8.9 | 0.6 |
| 2012 | 1.1 | 0.4 | 0.3 | -6.6 | -2.3 |
| 2013 | 1.9 | 0.1 | 0.3 | -3.3 | -1.9 |

Would you like to have the years in the columns and the countries in the rows? No problem, you can transpose the data:

    growth_frame.T

The previous code returned the following:

|  | 2010 | 2011 | 2012 | 2013 |
|---|---|---|---|---|
| Switzerland | 3.0 | 1.8 | 1.1 | 1.9 |
| Germany | 4.1 | 3.6 | 0.4 | 0.1 |
| France | 2.0 | 2.1 | 0.3 | 0.3 |
| Greece | -5.4 | -8.9 | -6.6 | -3.3 |
| Italy | 1.7 | 0.6 | -2.3 | -1.9 |
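Instead of transposing afterwards, you can place the countries in the rows directly when you create the DataFrame. The sketch below uses DataFrame.from_dict with orient="index" on an excerpt of the growth data:

```python
import pandas as pd

# excerpt of the growth data
growth = {"Switzerland": {"2010": 3.0, "2011": 1.8},
          "Greece": {"2010": -5.4, "2011": -8.9}}

# orient="index" uses the outer keys as row labels and the inner keys as
# columns, which gives the transposed layout without calling .T
by_country = pd.DataFrame.from_dict(growth, orient="index")
print(by_country)
```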

With reindex we can also rearrange the rows, e.g. to list the years in descending order:

    growth_frame = pd.DataFrame(growth)
    growth_frame.reindex(["2013", "2012", "2011", "2010"])

After having executed the Python code above we received the following:

|  | Switzerland | Germany | France | Greece | Italy |
|---|---|---|---|---|---|
| 2013 | 1.9 | 0.1 | 0.3 | -3.3 | -1.9 |
| 2012 | 1.1 | 0.4 | 0.3 | -6.6 | -2.3 |
| 2011 | 1.8 | 3.6 | 2.1 | -8.9 | 0.6 |
| 2010 | 3.0 | 4.1 | 2.0 | -5.4 | 1.7 |
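The reindex method is not limited to rearranging existing rows. The following sketch uses an excerpt of the growth data and requests a year that is not contained in the data. That year appears as a new row filled with NaN, unless a fill_value is given. The columns keyword rearranges the columns in the same way.

```python
import pandas as pd

# excerpt of the growth data
growth = {"Germany": {"2010": 4.1, "2011": 3.6},
          "France": {"2010": 2.0, "2011": 2.1}}
growth_frame = pd.DataFrame(growth)

# A label that does not exist yet ("2012") becomes a new row filled with NaN ...
print(growth_frame.reindex(["2012", "2011", "2010"]))
# ... unless a fill_value is given
filled = growth_frame.reindex(["2012", "2011", "2010"], fill_value=0.0)
print(filled)
# reindex can also rearrange the columns
print(growth_frame.reindex(columns=["France", "Germany"]))
```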

Filling a DataFrame with random values (since the values are drawn randomly, your numbers will differ):

    df = pd.DataFrame(np.random.randn(10, 5),
                      columns=['a', 'b', 'c', 'd', 'e'])
    df

The Python code above returned the following:

|  | a | b | c | d | e |
|---|---|---|---|---|---|
| 0 | 0.295727 | 0.291932 | 0.075200 | 0.803098 | -0.149588 |
| 1 | -0.551503 | -0.058048 | -1.555376 | 1.176424 | -1.154612 |
| 2 | 0.066221 | 0.794367 | 0.652569 | -0.980411 | 0.300658 |
| 3 | 0.162977 | -0.461645 | -0.718161 | 1.557218 | -0.822975 |
| 4 | 0.363506 | -0.396961 | -1.797157 | 0.504989 | -0.392984 |
| 5 | 0.574621 | -1.637724 | 0.674160 | 1.330299 | -0.908328 |
| 6 | -0.749366 | -0.091287 | -1.018482 | 0.722062 | -1.466433 |
| 7 | -0.390965 | 1.136712 | -0.382102 | -2.233372 | 1.274391 |
| 8 | 0.028914 | -0.320607 | -0.213236 | -0.075069 | 1.516394 |
| 9 | -1.174272 | 0.782454 | -0.771434 | -2.441267 | 0.456506 |
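Because np.random.randn draws new numbers on every call, the table above will look different on each run. If you need reproducible values, you can use a seeded generator. The following sketch assumes NumPy 1.17 or later, where np.random.default_rng is available. The seed 42 is an arbitrary choice.

```python
import numpy as np
import pandas as pd

# A seeded generator makes the "random" values reproducible across runs
rng = np.random.default_rng(seed=42)
df = pd.DataFrame(rng.standard_normal((10, 5)), columns=['a', 'b', 'c', 'd', 'e'])

rng_again = np.random.default_rng(seed=42)
df_again = pd.DataFrame(rng_again.standard_normal((10, 5)), columns=['a', 'b', 'c', 'd', 'e'])
print(df.equals(df_again))   # True: same seed, same numbers
```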

We want to read in a CSV file with the population data of all countries (July 2014). The fields in the file are separated by a space, single quotes serve as the quote character, and commas are used to separate groups of thousands in the numbers:

    pop = pd.read_csv("countries_population.csv",
                      quotechar="'", sep=" ", thousands=",")
    pop

The previous Python code returned the following result:

|  | China | 1,355,692,576 |
|---|---|---|
| 0 | India | 1236344631 |
| 1 | European Union | 511434812 |
| 2 | United States | 318892103 |
| 3 | Indonesia | 253609643 |
| 4 | Brazil | 202656788 |
| 5 | Pakistan | 196174380 |
| 6 | Nigeria | 177155754 |
| 7 | Bangladesh | 166280712 |
| 8 | Russia | 142470272 |
| 9 | Japan | 127103388 |
| 10 | Mexico | 120286655 |
| 11 | Philippines | 107668231 |
| 12 | Ethiopia | 96633458 |
| 13 | Vietnam | 93421835 |
| 14 | Egypt | 86895099 |
| 15 | Turkey | 81619392 |
| 16 | Germany | 80996685 |
| 17 | Iran | 80840713 |
| 18 | Congo, Democratic Republic of the | 77433744 |
| 19 | Thailand | 67741401 |
| 20 | France | 66259012 |
| 21 | United Kingdom | 63742977 |
| 22 | Italy | 61680122 |
| 23 | Burma | 55746253 |
| 24 | Tanzania | 49639138 |
| 25 | Korea, South | 49039986 |
| 26 | South Africa | 48375645 |
| 27 | Spain | 47737941 |
| 28 | Colombia | 46245297 |
| 29 | Kenya | 45010056 |
| ... | ... | ... |
| 207 | Saint Kitts and Nevis | 51538 |
| 208 | Northern Mariana Islands | 51483 |
| 209 | Faroe Islands | 49947 |
| 210 | Turks and Caicos Islands | 49070 |
| 211 | Sint Maarten | 39689 |
| 212 | Liechtenstein | 37313 |
| 213 | San Marino | 32742 |
| 214 | British Virgin Islands | 32680 |
| 215 | Saint Martin | 31530 |
| 216 | Monaco | 30508 |
| 217 | Gibraltar | 29185 |
| 218 | Palau | 21186 |
| 219 | Anguilla | 16086 |
| 220 | Wallis and Futuna | 15561 |
| 221 | Tuvalu | 10782 |
| 222 | Cook Islands | 10134 |
| 223 | Nauru | 9488 |
| 224 | Saint Helena, Ascension, and Tristan da Cunha | 7776 |
| 225 | Saint Barthelemy | 7267 |
| 226 | Saint Pierre and Miquelon | 5716 |
| 227 | Montserrat | 5215 |
| 228 | Falkland Islands (Islas Malvinas) | 3361 |
| 229 | Norfolk Island | 2210 |
| 230 | Svalbard | 1872 |
| 231 | Christmas Island | 1530 |
| 232 | Tokelau | 1337 |
| 233 | Niue | 1190 |
| 234 | Holy See (Vatican City) | 842 |
| 235 | Cocos (Keeling) Islands | 596 |
| 236 | Pitcairn Islands | 48 |

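237 rows × 2 columns

The file has no header line, so read_csv has taken its first line (China and its population) as the column labels. This is also why the number in the header still contains its thousands separators: the thousands conversion is only applied to the data rows. To keep China as the first data row, pass header=None together with a list of column labels via the names parameter, e.g. header=None, names=["country", "population"].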
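Here is a sketch of how to read a file without a header line so that its first row is kept as data. The actual file is not shown, so the three lines below are hypothetical sample data in the format implied by the read_csv arguments: space-separated fields, single quotes around names containing spaces, and commas as thousands separators. The header=None argument disables header detection, and names supplies the column labels.

```python
import io
import pandas as pd

# Hypothetical sample in the assumed file format
sample = """China 1,355,692,576
India 1,236,344,631
'United States' 318,892,103
"""

pop = pd.read_csv(io.StringIO(sample), quotechar="'", sep=" ",
                  thousands=",", header=None, names=["country", "population"])
print(pop)
```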
