Numerical & Scientific Computing with Python: Tutorial on Time Series

## Python, Pandas and Time Series

### Introduction

The next chapter of our Pandas tutorial deals with time series. A time series is a series of data points which are listed (or indexed) in time order. Usually, a time series is a sequence of values taken at equally spaced points in time. Any measured data that is recorded together with the corresponding time can be seen as a time series. Measurements can be taken irregularly, but in most cases time series have a fixed frequency. This means that data is measured or taken in a regular pattern, for example every 5 milliseconds, every 10 seconds, or every hour. Time series are often plotted as line charts.

In this chapter of our tutorial on Python with Pandas, we will introduce the Pandas tools for working with time series. You will learn how to cope with large time series and how to modify time series.

Before you continue reading, it might be useful to go through our tutorial on the standard Python modules dealing with time processing, i.e. datetime, time and calendar.

### Time Series in Pandas and Python

We could define a Pandas Series, which is built with an index consisting of time stamps.

import numpy as np
import pandas as pd
from datetime import datetime, timedelta as delta
ndays = 10
start = datetime(2017, 3, 31)
dates = [start - delta(days=x) for x in range(0, ndays)]
values = [25, 50, 15, 67, 70, 9, 28, 30, 32, 12]
ts = pd.Series(values, index=dates)
ts

The code above returned the following:
2017-03-31    25
2017-03-30    50
2017-03-29    15
2017-03-28    67
2017-03-27    70
2017-03-26     9
2017-03-25    28
2017-03-24    30
2017-03-23    32
2017-03-22    12
dtype: int64

type(ts)

The Python code above returned the following:
pandas.core.series.Series

ts.index

After having executed the Python code above we received the following:
DatetimeIndex(['2017-03-31', '2017-03-30', '2017-03-29', '2017-03-28',
               '2017-03-27', '2017-03-26', '2017-03-25', '2017-03-24',
               '2017-03-23', '2017-03-22'],
              dtype='datetime64[ns]', freq=None)

We now define a second series with the same index:

values2 = [32, 54, 18, 61, 72, 19, 21, 33, 29, 17]
ts2 = pd.Series(values2, index=dates)
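
Because such a series is indexed by a DatetimeIndex, individual days or whole date ranges can be selected by label. The following is a minimal sketch that rebuilds the first series from above; the names `ts_sorted`, `single` and `window` are our own choices for illustration. It sorts the index first, on the assumption that a chronological order is what we want for slicing.

```python
from datetime import datetime, timedelta

import pandas as pd

# Rebuild the first series from above: ten daily values, newest date first.
start = datetime(2017, 3, 31)
dates = [start - timedelta(days=x) for x in range(10)]
values = [25, 50, 15, 67, 70, 9, 28, 30, 32, 12]
ts = pd.Series(values, index=dates)

# Sort into chronological order so that date slices are well defined.
ts_sorted = ts.sort_index()

# Look up a single day with a Timestamp label.
single = ts_sorted.loc[pd.Timestamp("2017-03-28")]

# Slice a date range; label-based slicing includes both end points.
window = ts_sorted.loc["2017-03-25":"2017-03-27"]

print(single)
print(window)
```

Sorting is not strictly required for looking up a single day, but it keeps range slices predictable.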


It is possible to use arithmetic operations on time series like we did with other series. We can, for example, add our two time series:

ts + ts2

The above code returned the following:
2017-03-31     57
2017-03-30    104
2017-03-29     33
2017-03-28    128
2017-03-27    142
2017-03-26     28
2017-03-25     49
2017-03-24     63
2017-03-23     61
2017-03-22     29
dtype: int64

We can also compute the arithmetic mean of the two series, i.e. the mean of their values for each date:

(ts + ts2) / 2

The above code returned the following result:
2017-03-31    28.5
2017-03-30    52.0
2017-03-29    16.5
2017-03-28    64.0
2017-03-27    71.0
2017-03-26    14.0
2017-03-25    24.5
2017-03-24    31.5
2017-03-23    30.5
2017-03-22    14.5
dtype: float64

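As with other series, the indices don't have to be the same. Pandas aligns the two series on their index labels, and every date that occurs in only one of them gets the value NaN in the result: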

import pandas as pd
from datetime import datetime, timedelta as delta
ndays = 10
start = datetime(2017, 3, 31)
dates = [start - delta(days=x) for x in range(0, ndays)]
start2 = datetime(2017, 3, 26)
dates2 = [start2 - delta(days=x) for x in range(0, ndays)]
values = [25, 50, 15, 67, 70, 9, 28, 30, 32, 12]
values2 = [32, 54, 18, 61, 72, 19, 21, 33, 29, 17]
ts = pd.Series(values, index=dates)
ts2 = pd.Series(values2, index=dates2)
ts + ts2

After having executed the Python code above we received the following output:
2017-03-17     NaN
2017-03-18     NaN
2017-03-19     NaN
2017-03-20     NaN
2017-03-21     NaN
2017-03-22    84.0
2017-03-23    93.0
2017-03-24    48.0
2017-03-25    82.0
2017-03-26    41.0
2017-03-27     NaN
2017-03-28     NaN
2017-03-29     NaN
2017-03-30     NaN
2017-03-31     NaN
dtype: float64
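
If the NaN values are not wanted, there are at least two ways to deal with them. The sketch below is one possible approach, not the only one: `Series.add` with `fill_value=0` treats a missing value as 0, whereas `dropna` keeps only the dates present in both series. The names `total` and `overlap` are our own.

```python
from datetime import datetime, timedelta

import pandas as pd

# The same two partially overlapping series as in the example above.
ndays = 10
dates = [datetime(2017, 3, 31) - timedelta(days=x) for x in range(ndays)]
dates2 = [datetime(2017, 3, 26) - timedelta(days=x) for x in range(ndays)]
ts = pd.Series([25, 50, 15, 67, 70, 9, 28, 30, 32, 12], index=dates)
ts2 = pd.Series([32, 54, 18, 61, 72, 19, 21, 33, 29, 17], index=dates2)

# Treat a date that is missing in one series as 0 instead of NaN.
total = ts.add(ts2, fill_value=0)

# Alternatively, keep only the dates that occur in both series.
overlap = (ts + ts2).dropna()

print(total)
print(overlap)
```

Which variant is appropriate depends on the data: filling with 0 is only sensible if a missing measurement really means "nothing measured".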

### Create Date Ranges

The date_range function of the pandas module can be used to generate a DatetimeIndex:

import pandas as pd
index = pd.date_range('12/24/1970', '01/03/1971')
index

The above Python code returned the following:
DatetimeIndex(['1970-12-24', '1970-12-25', '1970-12-26', '1970-12-27',
               '1970-12-28', '1970-12-29', '1970-12-30', '1970-12-31',
               '1971-01-01', '1971-01-02', '1971-01-03'],
              dtype='datetime64[ns]', freq='D')

We have passed a start and an end date to date_range in our previous example. It is also possible to pass only a start or only an end date to the function. In this case, we have to specify the number of periods to generate by setting the keyword parameter 'periods':

index = pd.date_range(start='12/24/1970', periods=4)
print(index)

DatetimeIndex(['1970-12-24', '1970-12-25', '1970-12-26', '1970-12-27'], dtype='datetime64[ns]', freq='D')

index = pd.date_range(end='12/24/1970', periods=3)
print(index)

DatetimeIndex(['1970-12-22', '1970-12-23', '1970-12-24'], dtype='datetime64[ns]', freq='D')
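
The same technique can replace the list comprehension we used at the beginning of this chapter to build an index. The following is a sketch under the assumption that we want the ten days ending on 2017-03-31 in chronological order; the name `ts_chrono` is our own. Since date_range produces ascending dates, the values from the first example are reversed.

```python
import pandas as pd

# Values from the first example, originally listed newest date first.
values = [25, 50, 15, 67, 70, 9, 28, 30, 32, 12]

# Ten consecutive days ending on 2017-03-31, in ascending order.
index = pd.date_range(end="2017-03-31", periods=len(values))

# Reverse the values so that each one keeps its original date.
ts_chrono = pd.Series(values[::-1], index=index)

print(ts_chrono)
print(index.freqstr)
```

In contrast to the index built from a list of datetime objects, this index knows its frequency: freqstr is 'D' instead of None.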


We can also create date ranges which consist only of business days, for example, by setting the keyword parameter 'freq' to the string 'B':

index = pd.date_range('2017-04-07', '2017-04-13', freq="B")
print(index)

DatetimeIndex(['2017-04-07', '2017-04-10', '2017-04-11', '2017-04-12',
               '2017-04-13'],
              dtype='datetime64[ns]', freq='B')
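
The effect of 'B' can be checked by looking at the weekday names of the generated dates. The following is a small sketch; the variable names are our own, and it relies on the default English names returned by `day_name()`. Note that 'B' only skips Saturdays and Sundays; it knows nothing about public holidays.

```python
import pandas as pd

# Business days between a Friday and the following Thursday.
index = pd.date_range("2017-04-07", "2017-04-13", freq="B")

# Weekday name for every generated date.
day_names = index.day_name()

# dayofweek counts Monday as 0, so 5 and 6 are Saturday and Sunday.
only_weekdays = bool((index.dayofweek < 5).all())

print(list(day_names))
print(only_weekdays)
```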


In the following example, we create a date range which contains the month ends between two dates. We can see that the year 2016 contained the 29th of February, because it was a leap year:

index = pd.date_range('2016-02-25', '2016-07-02', freq="M")
index

This gets us the following:
DatetimeIndex(['2016-02-29', '2016-03-31', '2016-04-30', '2016-05-31',
               '2016-06-30'],
              dtype='datetime64[ns]', freq='M')
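
Instead of an alias string, date_range also accepts an offset object as 'freq'. The sketch below uses `pd.offsets.MonthEnd()`, which we assume is a reasonable alternative to the string "M" because, as far as we know, recent pandas releases (2.2 and later) emit a deprecation warning for "M" and prefer "ME". The variable names are our own.

```python
import pandas as pd

# Month ends between the two dates, using an offset object instead of "M".
month_ends = pd.date_range("2016-02-25", "2016-07-02", freq=pd.offsets.MonthEnd())

# Every generated date should be the last day of its month.
all_month_ends = bool(month_ends.is_month_end.all())

# 2016 was a leap year, so the first month end is the 29th of February.
first_is_leap_day = month_ends[0] == pd.Timestamp("2016-02-29")

print(month_ends)
print(all_month_ends, first_is_leap_day)
```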

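Other aliases:

| Alias | Description |
|-------|-------------|
| C | custom business day frequency (experimental) |
| D | calendar day frequency |
| W | weekly frequency |
| M | month end frequency |
| MS | month start frequency |
| Q | quarter end frequency |
| QS | quarter start frequency |
| A | year end frequency |
| AS | year start frequency |
| H | hourly frequency |
| T | minutely frequency |
| S | secondly frequency |
| L | milliseconds |
| U | microseconds |

Newer pandas releases (2.2 and later) deprecate some of these spellings in favour of new ones, for example 'ME' instead of 'M', 'h' instead of 'H', 'min' instead of 'T' and 's' instead of 'S'.

Weekly frequencies can be anchored to a weekday. The alias 'W-Mon', for example, yields every Monday in the given range:

index = pd.date_range('2017-02-05', '2017-04-13', freq="W-Mon")
index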

DatetimeIndex(['2017-02-06', '2017-02-13', '2017-02-20', '2017-02-27',
               '2017-03-06', '2017-03-13', '2017-03-20', '2017-03-27',
               '2017-04-03', '2017-04-10'],
              dtype='datetime64[ns]', freq='W-MON')
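
To get a feeling for the aliases, it can help to generate a few periods with several of them from the same start date. The following sketch does this for 'D', 'W', 'MS' and 'QS'. The start date and the dictionary name `ranges` are our own choices, and the selection is restricted to aliases which, to our knowledge, are spelled the same way in older and newer pandas versions.

```python
import pandas as pd

# 2017-01-01 was a Sunday and also the start of a month and of a quarter,
# so it is a valid first date for all four aliases.
start = "2017-01-01"
aliases = ["D", "W", "MS", "QS"]

# Three periods per alias, stored as lists of date strings.
ranges = {
    alias: pd.date_range(start=start, periods=3, freq=alias)
    .strftime("%Y-%m-%d")
    .tolist()
    for alias in aliases
}

for alias, days in ranges.items():
    print(alias, days)
```

A plain 'W' is anchored on Sundays, which is why the weekly dates here fall on Sundays, whereas 'W-Mon' above produced Mondays.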