Irregular timestamps

When working with time series data, the frequency of the timestamps is a crucial factor that can significantly impact the forecasting results. Regular frequencies like daily, weekly, or monthly are straightforward to handle. However, irregular frequencies like business days, which exclude weekends, can be challenging for time series forecasting methods.

Our forecast method is equipped to handle this kind of irregular time series data, as long as you specify the frequency of the series. For example, in the case of business days, the frequency should be passed as ‘B’. Without this, the method might fail to automatically detect the frequency, especially when the timestamps are irregular.
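As a small illustration of why stating the frequency explicitly helps, the sketch below uses only pandas and does not call TimeGPT. The dates are made up for the example, and pandas' own inference is used as a stand-in; how the client infers frequencies internally is not covered in this tutorial. It builds a business-day index, asks pandas to infer its frequency, and then removes a single day, as a market holiday would.

```python
import pandas as pd

# Ten consecutive business days (illustrative dates, not taken from the dataset).
bdays = pd.date_range("2023-09-18", periods=10, freq="B")

# With an unbroken Monday-to-Friday pattern pandas can recognise the frequency.
inferred_regular = pd.infer_freq(bdays)

# Drop one day, as a market holiday would, and try again.
with_holiday = bdays.delete(3)
inferred_with_gap = pd.infer_freq(with_holiday)

print("regular business days:", inferred_regular)
print("with a missing day:   ", inferred_with_gap)
```

When inference is unreliable like this, passing the frequency yourself removes the ambiguity.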

1. Import packages

First, we import the required packages and initialize the Nixtla client.

import pandas as pd
from nixtla import NixtlaClient
nixtla_client = NixtlaClient(
    # defaults to os.environ.get("NIXTLA_API_KEY")
    api_key = 'my_api_key_provided_by_nixtla'
)

2. Load data

Next, load your time series data. The data must include timestamps and the associated values. For instance, you might be working with stock prices, as in the following example, which uses data obtained from OpenBB.

pltr_df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/openbb/pltr.csv')
pltr_df['date'] = pd.to_datetime(pltr_df['date'])
pltr_df.head()
        date   Open   High   Low  Close  Adj Close     Volume  Dividends  Stock Splits
0 2020-09-30  10.00  11.41  9.11   9.50       9.50  338584400        0.0           0.0
1 2020-10-01   9.69  10.10  9.23   9.46       9.46  124297600        0.0           0.0
2 2020-10-02   9.06   9.28  8.94   9.20       9.20   55018300        0.0           0.0
3 2020-10-05   9.43   9.49  8.92   9.03       9.03   36316900        0.0           0.0
4 2020-10-06   9.04  10.18  8.90   9.90       9.90   90864000        0.0           0.0

Let’s check that this dataset has irregular timestamps. The dayofweek attribute of pandas’ datetime accessor returns the day of the week with Monday=0,…,Sunday=6. So, checking if dayofweek > 4 is checking whether the date falls on a Saturday (5) or Sunday (6), which are typically non-business days (weekends).

(pltr_df['date'].dt.dayofweek > 4).sum()
0

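The count is 0, so no date falls on a weekend: the series skips Saturdays and Sundays, and its timestamps are therefore not evenly spaced in calendar days. Let’s inspect the Close series.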
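Weekends are not the only source of gaps, since market holidays also remove weekdays. The following sketch, on a toy frame with made-up dates, counts observations per weekday and lists the business days inside the observed range that have no observation. The names toy_df and missing_bdays are introduced here for illustration only.

```python
import pandas as pd

# Toy frame with made-up dates: two trading weeks with one weekday missing.
dates = pd.bdate_range("2023-09-18", "2023-09-29").delete(6)  # removes 2023-09-26
toy_df = pd.DataFrame({"date": dates})

# How many observations fall on each weekday (Monday=0, ..., Sunday=6).
weekday_counts = toy_df["date"].dt.dayofweek.value_counts().sort_index()

# Business days inside the observed range that have no observation.
expected = pd.bdate_range(toy_df["date"].min(), toy_df["date"].max())
missing_bdays = expected.difference(pd.DatetimeIndex(toy_df["date"]))

print(weekday_counts)
print(missing_bdays)
```

The same two checks can be run on pltr_df by replacing toy_df with it.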

nixtla_client.plot(pltr_df, time_col='date', target_col='Close')

3. Forecast with irregular timestamps

To forecast this data, you can use our forecast method. Importantly, remember to specify the frequency of the data using the freq argument. In this case, it is ‘B’ for business days. We also need to set time_col to the column that holds the timestamps (ds by default) and target_col to the variable we want to forecast, in this case Close:

fcst_pltr_df = nixtla_client.forecast(
    df=pltr_df, h=14, freq='B',
    time_col='date', target_col='Close',
)
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
WARNING:nixtla.nixtla_client:The specified horizon "h" exceeds the model horizon. This may lead to less accurate forecasts. Please consider using a smaller horizon.
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
fcst_pltr_df.head()
        date    TimeGPT
0 2023-09-25  14.688427
1 2023-09-26  14.742798
2 2023-09-27  14.781240
3 2023-09-28  14.824156
4 2023-09-29  14.795214

Remember, for business days, the frequency is ‘B’. For other frequencies, you can refer to the pandas offset aliases documentation.

By specifying the frequency, you’re helping the forecast method better understand the pattern in your data, resulting in more accurate and reliable forecasts.
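To see what the frequency means for the forecast dates, here is a pandas-only sketch. It assumes, for illustration, that the last observed trading day is Friday 2023-09-22, which is consistent with the first forecast date shown above but is not stated in the data excerpt. It compares the 14 steps generated with ‘B’ against the same number of steps generated with ‘D’.

```python
import pandas as pd

# Assumption for illustration: the last observed trading day is Friday 2023-09-22.
last_observed = pd.Timestamp("2023-09-22")
h = 14

# The h business days that follow the last observation.
future_dates = pd.date_range(
    last_observed + pd.offsets.BDay(1), periods=h, freq="B"
)

# The same horizon counted in calendar days would include weekends.
calendar_dates = pd.date_range(
    last_observed + pd.Timedelta(days=1), periods=h, freq="D"
)

print(future_dates[0].date(), "to", future_dates[-1].date())
print("weekend days with freq='D':", int((calendar_dates.dayofweek > 4).sum()))
```

With ‘B’ every forecast step lands on a weekday, whereas a daily frequency would place some steps on days when the market is closed.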

Let’s plot the forecasts generated by TimeGPT.

nixtla_client.plot(
    pltr_df, 
    fcst_pltr_df, 
    time_col='date',
    target_col='Close',
    max_insample_length=90, 
)

You can also add uncertainty quantification to your forecasts using the level argument:

fcst_pltr_levels_df = nixtla_client.forecast(
    df=pltr_df, h=42, freq='B',
    time_col='date', target_col='Close',
    add_history=True,
    level=[80, 90],
)
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
WARNING:nixtla.nixtla_client:The specified horizon "h" exceeds the model horizon. This may lead to less accurate forecasts. Please consider using a smaller horizon.
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
INFO:nixtla.nixtla_client:Calling Historical Forecast Endpoint...
nixtla_client.plot(
    pltr_df, 
    fcst_pltr_levels_df, 
    time_col='date',
    target_col='Close',
    level=[80, 90],
)
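The interval bounds can also be inspected numerically. The sketch below is an assumption-heavy illustration: it assumes the bounds are returned in columns named like TimeGPT-lo-80 and TimeGPT-hi-80, which this tutorial does not show, and it uses made-up numbers rather than real output. The helper interval_width is hypothetical.

```python
import pandas as pd

# Made-up numbers; the column names are an assumption about the output layout.
toy_fcst = pd.DataFrame({
    "date": pd.date_range("2023-09-25", periods=3, freq="B"),
    "TimeGPT": [14.7, 14.8, 14.9],
    "TimeGPT-lo-90": [13.9, 13.8, 13.7],
    "TimeGPT-lo-80": [14.1, 14.0, 13.9],
    "TimeGPT-hi-80": [15.3, 15.6, 15.9],
    "TimeGPT-hi-90": [15.5, 15.8, 16.1],
})


def interval_width(fcst, level, model="TimeGPT"):
    """Width of the prediction interval for one level (hypothetical helper)."""
    return fcst[f"{model}-hi-{level}"] - fcst[f"{model}-lo-{level}"]


width_80 = interval_width(toy_fcst, 80)
width_90 = interval_width(toy_fcst, 90)

# A higher level should give an interval at least as wide.
nested = bool((width_90 >= width_80).all())
print(nested)
```

Check the actual column names with fcst_pltr_levels_df.columns before relying on this layout.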

If you want to forecast another variable, just change the target_col parameter. Let’s forecast Volume now:

fcst_pltr_df = nixtla_client.forecast(
    df=pltr_df, h=14, freq='B',
    time_col='date', target_col='Volume',
)
nixtla_client.plot(
    pltr_df, 
    fcst_pltr_df, 
    time_col='date',
    max_insample_length=90,
    target_col='Volume',
)
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
WARNING:nixtla.nixtla_client:The specified horizon "h" exceeds the model horizon. This may lead to less accurate forecasts. Please consider using a smaller horizon.
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...

But what if we want to predict all the time series at once? We can do that by reshaping our dataframe. Currently, the dataframe is in wide format (each series is a column), but we need it in long format (the series stacked on top of each other). We can do it with:

pltr_long_df = pd.melt(
    pltr_df, 
    id_vars=['date'],
    var_name='series_id'
)
pltr_long_df.head()
        date series_id  value
0 2020-09-30      Open  10.00
1 2020-10-01      Open   9.69
2 2020-10-02      Open   9.06
3 2020-10-05      Open   9.43
4 2020-10-06      Open   9.04
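To make the reshaping concrete, here is the same melt call on a tiny wide frame with two series and a few small illustrative values. The frame and variable names are introduced only for this sketch.

```python
import pandas as pd

# Tiny wide frame with illustrative values: one column per series.
wide = pd.DataFrame({
    "date": pd.to_datetime(["2020-09-30", "2020-10-01"]),
    "Open": [10.00, 9.69],
    "Close": [9.50, 9.46],
})

# Long format: one row per (date, series) pair.
long = pd.melt(wide, id_vars=["date"], var_name="series_id")

# Every non-id column becomes a series, so rows = dates x series.
n_series = long["series_id"].nunique()
print(long)
print(n_series)
```

Because every column other than date is melted, columns such as Dividends and Stock Splits also become series in pltr_long_df. The value_vars argument of pd.melt can restrict the reshaping to selected columns if that is not wanted.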

Then we simply call the forecast method, specifying the id_col parameter.

fcst_pltr_long_df = nixtla_client.forecast(
    df=pltr_long_df, h=14, freq='B',
    id_col='series_id', time_col='date', target_col='value',
)
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
WARNING:nixtla.nixtla_client:The specified horizon "h" exceeds the model horizon. This may lead to less accurate forecasts. Please consider using a smaller horizon.
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
fcst_pltr_long_df.head()
   series_id       date    TimeGPT
0  Adj Close 2023-09-25  14.688427
1  Adj Close 2023-09-26  14.742798
2  Adj Close 2023-09-27  14.781240
3  Adj Close 2023-09-28  14.824156
4  Adj Close 2023-09-29  14.795214

Then we can plot the forecast for the Open series:

nixtla_client.plot(
    pltr_long_df, 
    fcst_pltr_long_df, 
    id_col='series_id',
    time_col='date',
    target_col='value',
    unique_ids=['Open'],
    max_insample_length=90,
)
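Outside of plotting, the long forecast frame can be filtered or reshaped with ordinary pandas operations. The sketch below uses made-up forecast values in the series_id, date, TimeGPT layout to select one series and to pivot the result back to one column per series.

```python
import pandas as pd

# Made-up long-format forecasts in the series_id / date / TimeGPT layout.
toy_long_fcst = pd.DataFrame({
    "series_id": ["Close", "Close", "Open", "Open"],
    "date": pd.to_datetime(["2023-09-25", "2023-09-26"] * 2),
    "TimeGPT": [14.7, 14.8, 14.6, 14.9],
})

# One column per series, one row per forecast date.
wide_fcst = toy_long_fcst.pivot(index="date", columns="series_id", values="TimeGPT")

# Forecasts for a single series.
open_fcst = toy_long_fcst[toy_long_fcst["series_id"] == "Open"]

print(wide_fcst)
print(open_fcst)
```

The wide layout is convenient for comparing series side by side, while the long layout is the one the forecast method expects as input.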

Adding extra information

In time series forecasting, the variables that we predict are often influenced not just by their past values, but also by other factors or variables. These external variables, known as exogenous variables, can provide additional context that may improve the accuracy of our forecasts. One such factor, and the focus of this section, is the company’s revenue. Revenue figures are a key indicator of a company’s financial health and growth potential, both of which can heavily influence its stock price. We can obtain this data from OpenBB.

revenue_pltr = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/openbb/revenue-pltr.csv')
revenue_pltr.tail()
  fiscalDateEnding  totalRevenue
5       2022-06-30   473010000.0
6       2022-09-30   477880000.0
7       2022-12-31   508624000.0
8       2023-03-31   525186000.0
9       2023-06-30   533317000.0

The first thing we observe in our dataset is that revenue is available only up to the end of the second quarter of 2023 (2023-06-30). The data has a quarterly frequency, and our goal is to use it to forecast the daily stock prices for the 14 business days after that date.

However, to accurately compute such a forecast that includes the revenue as an exogenous variable, we need to have an understanding of the future values of the revenue. This is critical because these future revenue values can significantly influence the stock price.

Since we’re aiming to predict 14 daily stock prices, we only need to forecast the revenue for the upcoming quarter. This approach allows us to create a cohesive forecasting pipeline where the output of one forecast (revenue) is used as an input to another (stock price), thereby leveraging all available information for the most accurate predictions possible.

fcst_pltr_revenue = nixtla_client.forecast(revenue_pltr, h=1, time_col='fiscalDateEnding', target_col='totalRevenue')
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Inferred freq: Q-DEC
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
fcst_pltr_revenue.head()
  fiscalDateEnding    TimeGPT
0       2023-09-30  547264448

The next step in our forecasting pipeline is to adjust the frequency of the revenue data to match that of the stock prices, which are recorded on business days. To accomplish this, we need to resample both the historical and the forecasted revenue data.

We can achieve this using the following code:

revenue_pltr['fiscalDateEnding'] = pd.to_datetime(revenue_pltr['fiscalDateEnding'])
revenue_pltr = revenue_pltr.set_index('fiscalDateEnding').resample('B').ffill().reset_index()

IMPORTANT NOTE: In this process, forward filling carries each quarterly revenue value from its fiscal end date over every business day until the next quarterly value, so all of those days share the same revenue value. This simplification is necessary due to the disparity in granularity between quarterly revenue data and daily stock price data. However, it’s vital to treat this assumption with caution in practical applications. The impact of quarterly revenue figures on daily stock prices can vary significantly within the quarter based on a range of factors, including changing market expectations, other financial news, and events. In this tutorial, we use this assumption to illustrate the process of incorporating exogenous variables into our forecasting model, but in real-world scenarios, a more nuanced approach may be needed, depending on the available data and the specific use case.
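The effect of the resampling step can be checked on a two-row example. The sketch below reuses the last two revenue values from the table above (both dates fall on a Friday) and applies the same set_index, resample, ffill chain. The variable names are introduced only for this illustration.

```python
import pandas as pd

# Two quarterly observations (both dates fall on a Friday).
quarterly = pd.DataFrame({
    "fiscalDateEnding": pd.to_datetime(["2023-03-31", "2023-06-30"]),
    "totalRevenue": [525186000.0, 533317000.0],
})

# Upsample to business days, carrying each value forward until the next one.
daily = (
    quarterly.set_index("fiscalDateEnding")
    .resample("B")
    .ffill()
    .reset_index()
)

# Number of business-day rows that carry each quarterly value.
rows_per_value = daily["totalRevenue"].value_counts()

print(daily.tail(3))
print(rows_per_value)
```

Two details are worth noticing. The value dated 2023-03-31 is carried over the business days that follow it, and the most recent value appears in a single row because resampling stops at the last available timestamp.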

Then we can create the full historic dataset.

pltr_revenue_df = pltr_df.merge(revenue_pltr.rename(columns={'fiscalDateEnding': 'date'}))
pltr_revenue_df.head()
        date       Open       High        Low      Close  Adj Close    Volume  Dividends  Stock Splits  totalRevenue
0 2021-03-31  22.500000  23.850000  22.379999  23.290001  23.290001  61458500        0.0           0.0   341234000.0
1 2021-04-01  23.950001  23.950001  22.730000  23.070000  23.070000  51788800        0.0           0.0   341234000.0
2 2021-04-05  23.780001  24.450001  23.340000  23.440001  23.440001  65374300        0.0           0.0   341234000.0
3 2021-04-06  23.549999  23.610001  22.830000  23.270000  23.270000  41933500        0.0           0.0   341234000.0
4 2021-04-07  23.000000  23.549999  22.809999  22.900000  22.900000  32766200        0.0           0.0   341234000.0
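Note that the merged data starts later than the price data. Called without arguments, merge joins on the shared column with an inner join, so price rows without a matching revenue date are dropped. The sketch below shows this on made-up frames and uses a left join to count how many rows would otherwise lack the exogenous value.

```python
import pandas as pd

# Made-up prices for four business days.
prices = pd.DataFrame({
    "date": pd.to_datetime(["2021-03-29", "2021-03-30", "2021-03-31", "2021-04-01"]),
    "Close": [22.0, 22.4, 23.3, 23.1],
})

# Made-up exogenous values that only start on 2021-03-31.
revenue = pd.DataFrame({
    "date": pd.to_datetime(["2021-03-31", "2021-04-01"]),
    "totalRevenue": [341234000.0, 341234000.0],
})

# With no arguments, merge joins on the shared column using an inner join.
inner = prices.merge(revenue)

# A left join keeps every price row and exposes the gaps as NaN.
left = prices.merge(revenue, on="date", how="left")
n_missing = int(left["totalRevenue"].isna().sum())

print(len(prices), len(inner), n_missing)
```

Comparing len(pltr_df) with len(pltr_revenue_df) is a quick way to see how much history the inner join removed.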

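Next, we build the dataframe with the future revenue values:

import numpy as np

horizon = 14
future_df = pd.DataFrame({
    # the `horizon` business days that follow the last historical date
    'date': pd.date_range(pltr_revenue_df['date'].iloc[-1], periods=horizon + 1, freq='B')[-horizon:],
    # the forecasted quarterly revenue, repeated for every future day
    'totalRevenue': np.repeat(fcst_pltr_revenue.iloc[0]['TimeGPT'], horizon)
})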
future_df.head()
        date  totalRevenue
0 2023-07-03     547264448
1 2023-07-04     547264448
2 2023-07-05     547264448
3 2023-07-06     547264448
4 2023-07-07     547264448
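Before passing a future dataframe to the forecast method, it can help to confirm that it lines up with the history. The following sketch defines a hypothetical helper, check_future_frame, which is not part of the Nixtla client. It assumes the last historical date lies on the given frequency and checks that the future frame has exactly horizon rows on the dates that follow it.

```python
import pandas as pd


def check_future_frame(history, future, horizon, time_col="date", freq="B"):
    """Hypothetical helper: verify that `future` continues `history`.

    Assumes the last date in `history` lies on the given frequency.
    """
    expected = pd.date_range(
        history[time_col].iloc[-1], periods=horizon + 1, freq=freq
    )[1:]
    actual = pd.DatetimeIndex(future[time_col])
    return len(actual) == horizon and bool((actual == expected).all())


# Made-up frames: history ends on Friday 2023-06-30.
history = pd.DataFrame({"date": pd.bdate_range("2023-06-26", "2023-06-30")})
good_future = pd.DataFrame({"date": pd.bdate_range("2023-07-03", periods=3)})
bad_future = pd.DataFrame({"date": pd.date_range("2023-07-01", periods=3, freq="D")})

ok_good = check_future_frame(history, good_future, horizon=3)
ok_bad = check_future_frame(history, bad_future, horizon=3)
print(ok_good, ok_bad)
```

In this tutorial the equivalent call would be check_future_frame(pltr_revenue_df, future_df, horizon).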

Then we can pass the future revenue to the forecast method using the X_df argument. Since the historic dataframe also contains the revenue, the model uses that information as well.

fcst_pltr_df = nixtla_client.forecast(
    pltr_revenue_df, h=horizon, 
    freq='B',
    time_col='date', 
    target_col='Close',
    X_df=future_df,
)
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
WARNING:nixtla.nixtla_client:The specified horizon "h" exceeds the model horizon. This may lead to less accurate forecasts. Please consider using a smaller horizon.
INFO:nixtla.nixtla_client:Using the following exogenous variables: totalRevenue
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
nixtla_client.plot(
    pltr_revenue_df, 
    fcst_pltr_df, 
    time_col='date',
    target_col='Close',
    max_insample_length=90,
)

We can also see the importance of the revenue:

nixtla_client.weights_x.plot.barh(x='features', y='weights')

From the feature importance plot, we can conclude that the revenue is an important factor in the model’s predictions, meaning changes in revenue will impact the forecast outcome.