ARIMA Model Python Example - Time Series Forecasting

May 25, 2019

ARIMA Model Python Example - Time Series Forecasting

The ability to make predictions based upon historical observations creates a competitive advantage. For example, if an organization has the capacity to better forecast the sales quantities of a product, it will be in a more favourable position to optimize inventory levels. This can result in an increased liquidity of the organizations cash reserves, decrease of working capital and improved customer satisfaction by decreasing the backlog of orders.

In the domain of machine learning, there’s a specific collection of methods and techniques particularly well suited for predicting the value of a dependent variable according to time. In the proceeding article, we’ll cover AutoRegressive Integrated Moving Average (ARIMA).

We refer to a series of data points indexed (or graphed) in time order as a time series. A time series can be broken down into 3 components.

  • Trend: Upward & downward movement of the data with time over a large period of time (i.e. house appreciation)
  • Seasonality: Seasonal variance (i.e. an increase in demand for ice cream during summer)
  • Noise: Spikes & troughs at random intervals

Before applying any statistical model on a time series, we want to ensure it’s stationary.

If a time series is stationary and has a particular behaviour over a given time interval, then it is safe to assume that it will have same behaviour at some later point in time. Most statistical modelling methods assume or require the time series to be stationary.


The statsmodels library provides a suite of functions for working with time series data.

import numpy as np  
import pandas as pd  
from matplotlib import pyplot as plt  
from statsmodels.tsa.stattools import adfuller  
from statsmodels.tsa.seasonal import seasonal_decompose  
from statsmodels.tsa.arima_model import ARIMA  
from pandas.plotting import register_matplotlib_converters  

We’ll be working with a dataset that contains the number of airplane passengers on a given day.

df = pd.read_csv('air_passengers.csv', parse_dates = ['Month'], index_col = ['Month'])
plt.ylabel('Number of air passengers')  

As mentioned previously, before we can build a model, we must ensure that the time series is stationary. There are two primary way to determine whether a given time series is stationary.

  • Rolling Statistics: Plot the rolling mean and rolling standard deviation. The time series is stationary if they remain constant with time (with the naked eye look to see if the lines are straight and parallel to the x-axis).
  • Augmented Dickey-Fuller Test: The time series is considered stationary if the p-value is low (according to the null hypothesis) and the critical values at 1%, 5%, 10% confidence intervals are as close as possible to the ADF Statistics

For those who don’t understand the difference between average and rolling average, a 10-day rolling average would average out the closing prices for the first 10 days as the first data point. The next data point would drop the earliest price, add the price on day 11 and take the average, and so on as shown below.

``` rolling_mean = df.rolling(window = 12).mean() rolling_std = df.rolling(window = 12).std() ``` ``` plt.plot(df, color = 'blue', label = 'Original') plt.plot(rolling_mean, color = 'red', label = 'Rolling Mean') plt.plot(rolling_std, color = 'black', label = 'Rolling Std') plt.legend(loc = 'best') plt.title('Rolling Mean & Rolling Standard Deviation') ```

As you can see, the rolling mean and rolling standard deviation increase with time. Therefore, we can conclude that the time series is not stationary.

result = adfuller(df['Passengers'])
print('ADF Statistic: {}'.format(result[0]))  
print('p-value: {}'.format(result[1]))  
print('Critical Values:')  
for key, value in result[4].items():  
    print('\t{}: {}'.format(key, value))

The ADF Statistic is far from the critical values and the p-value is greater than the threshold (0.05). Thus, we can conclude that the time series is not stationary.

Taking the log of the dependent variable is as simple way of lowering the rate at which rolling mean increases.

df_log = np.log(df)  

Let’s create a function to run the two tests which determine whether a given time series is stationary.

def get_stationarity(timeseries):  
    # rolling statistics  
    rolling_mean = timeseries.rolling(window=12).mean()  
    rolling_std = timeseries.rolling(window=12).std()  
    # rolling statistics plot  
    original = plt.plot(timeseries, color='blue', label='Original')  
    mean = plt.plot(rolling_mean, color='red', label='Rolling Mean')  
    std = plt.plot(rolling_std, color='black', label='Rolling Std')  
    plt.title('Rolling Mean & Standard Deviation')  
    # Dickey–Fuller test:  
    result = adfuller(timeseries['Passengers'])  
    print('ADF Statistic: {}'.format(result[0]))  
    print('p-value: {}'.format(result[1]))  
    print('Critical Values:')  
    for key, value in result[4].items():  
        print('\t{}: {}'.format(key, value))

There are multiple transformations that we can apply to a time series to render it stationary. For instance, we subtract the rolling mean.

rolling_mean = df_log.rolling(window=12).mean()  
df_log_minus_mean = df_log - rolling_mean  

As we can see, after subtracting the mean, the rolling mean and standard deviation are approximately horizontal. The p-value is below the threshold of 0.05 and the ADF Statistic is close to the critical values. Therefore, the time series is stationary.

Applying exponential decay is another way of transforming a time series such that it is stationary.

rolling_mean_exp_decay = df_log.ewm(halflife=12, min_periods=0, adjust=True).mean()  
df_log_exp_decay = df_log - rolling_mean_exp_decay  

Exponential decay performed worse than subtracting the rolling mean. However, it is still more stationary than the original.

Let’s try one more method to determine whether an even better solution exists. When applying time shifting, we subtract every the point by the one that preceded it.

null, (x1−x0), (x2−x1), (x3−x2), (x4−x3), …, (xn−xn−1)

df_log_shift = df_log - df_log.shift()  

Time shifting performed worse than subtracting the rolling mean. However, it is still more stationary than the original.

AutoRegressive Model (AR)

Autoregressive models operate under the premise that past values have an effect on current values. AR models are commonly used in analyzing nature, economics, and other time-varying processes. As long as the assumption holds, we can build a linear regression model that attempts to predict value of a dependent variable today, given the values it had on previous days.

The order of the AR model corresponds to the number of days incorporated in the formula.

Moving Average Model (MA)

Assumes the value of the dependent variable on the current day depends on the previous days error terms. The formula can be expressed as:

You’ll also come across the equation written as:

where μ is the mean of the series, the _θ_1, …, θq are the parameters of the model and the εt, _εt_−1,…, _εt_−q are white noise error terms. The value of q is called the order of the MA model.

Auto Regressive Moving Average (ARMA)

The ARMA model is simply the combination of the AR and MA models.

AutoRegressive Integrated Moving Average Model (ARIMA)

The ARIMA (aka Box-Jenkins) model adds differencing to an ARMA model. Differencing subtracts the current value from the previous and can be used to transform a time series into one that’s stationary. For example, first-order differencing addresses linear trends, and employs the transformation zi = yi — yi-1. Second-order differencing addresses quadratic trends and employs a first-order difference of a first-order difference, namely zi = (yi — yi-1) — (yi-1 — yi-2), and so on.

Three integers (p, d, q) are typically used to parametrize ARIMA models.

  • p: number of autoregressive terms (AR order)
  • d: number of nonseasonal differences (differencing order)
  • q: number of moving-average terms (MA order)

Auto Correlation Function (ACF)

The correlation between the observations at the current point in time and the observations at all previous points in time. We can use ACF to determine the optimal number of MA terms. The number of terms determines the order of the model.

Partial Auto Correlation Function (PACF)

As the name implies, PACF is a subset of ACF. PACF expresses the correlation between observations made at two points in time while accounting for any influence from other data points. We can use PACF to determine the optimal number of terms to use in the AR model. The number of terms determines the order of the model.

Let’s take a look at an example. Recall, that PACF can be used to figure out the best order of the AR model. The horizontal blue dashed lines represent the significance thresholds. The vertical lines represent the ACF and PACF values at in point in time. Only the vertical lines that exceed the horizontal lines are considered significant.

Thus, we’d use the preceding two days in the autoregression equation.

Recall, that ACF can be used to figure out the best order of the MA model.

Thus, we’d only use yesterday in the moving average equation.

Going back to our example, we can create and fit an ARIMA model with AR of order 2, differencing of order 1 and MA of order 2.

decomposition = seasonal_decompose(df_log)   
model = ARIMA(df_log, order=(2,1,2))  
results =  
plt.plot(results.fittedvalues, color='red')

Then, we can see how the model compares to the original time series.

predictions_ARIMA_diff = pd.Series(results.fittedvalues, copy=True)  
predictions_ARIMA_diff_cumsum = predictions_ARIMA_diff.cumsum()  
predictions_ARIMA_log = pd.Series(df_log['Passengers'].iloc[0], index=df_log.index)  
predictions_ARIMA_log = predictions_ARIMA_log.add(predictions_ARIMA_diff_cumsum, fill_value=0)  
predictions_ARIMA = np.exp(predictions_ARIMA_log)  

Given that we have data going for every month going back 12 years and want to forecast the number of passengers for the next 10 years, we use (12 x12)+ (12 x 10) = 264.


Final Thoughts

In the domain of machine learning, there is a collection techniques for manipulating and interpreting variables that depend on time. Among these include ARIMA which can remove the trend component in order to accurately predict future values.

Profile picture

Written by Cory Maklin Genius is making complex ideas simple, not making simple ideas complex - Albert Einstein You should follow them on Twitter