Irregular timestamps

When working with time series data, the frequency of the timestamps is a crucial factor that can significantly impact the forecasting results. Regular frequencies like daily, weekly, or monthly are straightforward to handle. However, irregular frequencies like business days, which exclude weekends, can be challenging for time series forecasting methods.

Our forecast method is equipped to handle this kind of irregular time series data, as long as you specify the frequency of the series. For example, in the case of business days, the frequency should be passed as ‘B’. Without this, the method might fail to automatically detect the frequency, especially when the timestamps are irregular.

1. Import packages

First, we import the required packages and initialize the Nixtla client

import pandas as pd
from nixtla import NixtlaClient

nixtla_client = NixtlaClient(
    # defaults to os.environ.get("NIXTLA_API_KEY")
    api_key = 'my_api_key_provided_by_nixtla'
)

👍
Use an Azure AI endpoint
To use an Azure AI endpoint, set the base_url argument:
nixtla_client = NixtlaClient(base_url="you azure ai endpoint", api_key="your api_key")

2. Load data

The first step is to fetch your time series data. The data must include timestamps and the associated values. For instance, you might be working with stock prices, and your data could look something like the following. In this example we use OpenBB.

pltr_df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/openbb/pltr.csv')
pltr_df['date'] = pd.to_datetime(pltr_df['date'])

pltr_df.head()

	date	Open	High	Low	Close	Adj Close	Volume
0	2020-09-30	10.00	11.41	9.11	9.50	9.50	338584400
1	2020-10-01	9.69	10.10	9.23	9.46	9.46	124297600
2	2020-10-02	9.06	9.28	8.94	9.20	9.20	55018300
3	2020-10-05	9.43	9.49	8.92	9.03	9.03	36316900
4	2020-10-06	9.04	10.18	8.90	9.90	9.90	90864000

Let’s see that this dataset has irregular timestamps. The dayofweek attribute from pandas’ DatetimeIndex returns the day of the week with Monday=0,…,Sunday=6. So, checking if dayofweek > 4 is essentially checking if the date falls on a Saturday (5) or Sunday (6), which are typically non-business days (weekends).

(pltr_df['date'].dt.dayofweek > 4).sum()

As we can see the timestamp is irregular. Let’s inspect the Close series.

nixtla_client.plot(pltr_df, time_col='date', target_col='Close')

3. Forecast with irregular timestamps

To forecast this data, you can use our forecast method. Importantly, remember to specify the frequency of the data using the freq argument. In this case, it would be ‘B’ for business days. We also need to define the time_col to select the index of the series (by default is ds), and the target_col to forecast our target variable, in this case we will forecast Close:

fcst_pltr_df = nixtla_client.forecast(
    df=pltr_df, h=14, freq='B',
    time_col='date', target_col='Close',
)

INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
WARNING:nixtla.nixtla_client:The specified horizon "h" exceeds the model horizon. This may lead to less accurate forecasts. Please consider using a smaller horizon.
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...

📘
Available models in Azure AI
If you are using an Azure AI endpoint, please be sure to set model="azureai":
nixtla_client.forecast(..., model="azureai")
For the public API, we support two models: timegpt-1 and timegpt-1-long-horizon.
By default, timegpt-1 is used. Please see this tutorial on how and when to use timegpt-1-long-horizon.

fcst_pltr_df.head()

	date	TimeGPT
0	2023-09-25	14.688427
1	2023-09-26	14.742798
2	2023-09-27	14.781240
3	2023-09-28	14.824156
4	2023-09-29	14.795214

Remember, for business days, the frequency is ‘B’. For other frequencies, you can refer to the pandas offset aliases documentation.

By specifying the frequency, you’re helping the forecast method better understand the pattern in your data, resulting in more accurate and reliable forecasts.

Let’s plot the forecasts generated by TimeGPT.

nixtla_client.plot(
    pltr_df, 
    fcst_pltr_df, 
    time_col='date',
    target_col='Close',
    max_insample_length=90, 
)

You can also add uncertainty quantification to your forecasts using the level argument:

fcst_pltr_levels_df = nixtla_client.forecast(
    df=pltr_df, h=42, freq='B',
    time_col='date', target_col='Close',
    add_history=True,
    level=[80, 90],
)

INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
WARNING:nixtla.nixtla_client:The specified horizon "h" exceeds the model horizon. This may lead to less accurate forecasts. Please consider using a smaller horizon.
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
INFO:nixtla.nixtla_client:Calling Historical Forecast Endpoint...

nixtla_client.plot(
    pltr_df, 
    fcst_pltr_levels_df, 
    time_col='date',
    target_col='Close',
    level=[80, 90],
)

If you want to forecast another variable just change the target_col parameter. Let’s forecast Volume now:

fcst_pltr_df = nixtla_client.forecast(
    df=pltr_df, h=14, freq='B',
    time_col='date', target_col='Volume',
)
nixtla_client.plot(
    pltr_df, 
    fcst_pltr_df, 
    time_col='date',
    max_insample_length=90,
    target_col='Volume',
)

INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
WARNING:nixtla.nixtla_client:The specified horizon "h" exceeds the model horizon. This may lead to less accurate forecasts. Please consider using a smaller horizon.
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...

But what if we want to predict all the time series at once? We can do that reshaping our dataframe. Currently, the dataframe is in wide format (each series is a column), but we need to have them in long format (stacked one each other). We can do it with:

pltr_long_df = pd.melt(
    pltr_df, 
    id_vars=['date'],
    var_name='series_id'
)

pltr_long_df.head()

	date	series_id	value
0	2020-09-30	Open	10.00
1	2020-10-01	Open	9.69
2	2020-10-02	Open	9.06
3	2020-10-05	Open	9.43
4	2020-10-06	Open	9.04

Then we just simply call the forecast method specifying the id_col parameter.

fcst_pltr_long_df = nixtla_client.forecast(
    df=pltr_long_df, h=14, freq='B',
    id_col='series_id', time_col='date', target_col='value',
)

INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
WARNING:nixtla.nixtla_client:The specified horizon "h" exceeds the model horizon. This may lead to less accurate forecasts. Please consider using a smaller horizon.
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...

fcst_pltr_long_df.head()

	series_id	date	TimeGPT
0	Adj Close	2023-09-25	14.688427
1	Adj Close	2023-09-26	14.742798
2	Adj Close	2023-09-27	14.781240
3	Adj Close	2023-09-28	14.824156
4	Adj Close	2023-09-29	14.795214

Then we can forecast the Open series:

nixtla_client.plot(
    pltr_long_df, 
    fcst_pltr_long_df, 
    id_col='series_id',
    time_col='date',
    target_col='value',
    unique_ids=['Open'],
    max_insample_length=90,
)

Adding extra information

In time series forecasting, the variables that we predict are often influenced not just by their past values, but also by other factors or variables. These external variables, known as exogenous variables, can provide vital additional context that can significantly improve the accuracy of our forecasts. One such factor, and the focus of this tutorial, is the company’s revenue. Revenue figures can provide a key indicator of a company’s financial health and growth potential, both of which can heavily influence its stock price. That we can obtain from OpenBB.

revenue_pltr = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/openbb/revenue-pltr.csv')

revenue_pltr.tail()

	fiscalDateEnding	totalRevenue
5	2022-06-30	473010000.0
6	2022-09-30	477880000.0
7	2022-12-31	508624000.0
8	2023-03-31	525186000.0
9	2023-06-30	533317000.0

The first thing we observe in our dataset is that we have information available only up until the end of the first quarter of 2023. Our data is represented in a quarterly frequency, and our goal is to leverage this information to forecast the daily stock prices for the next 14 days beyond this date.

However, to accurately compute such a forecast that includes the revenue as an exogenous variable, we need to have an understanding of the future values of the revenue. This is critical because these future revenue values can significantly influence the stock price.

Since we’re aiming to predict 14 daily stock prices, we only need to forecast the revenue for the upcoming quarter. This approach allows us to create a cohesive forecasting pipeline where the output of one forecast (revenue) is used as an input to another (stock price), thereby leveraging all available information for the most accurate predictions possible.

fcst_pltr_revenue = nixtla_client.forecast(revenue_pltr, h=1, time_col='fiscalDateEnding', target_col='totalRevenue')

INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Inferred freq: Q-DEC
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...

fcst_pltr_revenue.head()

	fiscalDateEnding	TimeGPT
0	2023-09-30	547264448

Continuing from where we left off, the next crucial step in our forecasting pipeline is to adjust the frequency of our data to match the stock prices’ frequency, which is represented on a business day basis. To accomplish this, we need to resample both the historical and future forecasted revenue data.

We can achieve this using the following code

revenue_pltr['fiscalDateEnding'] = pd.to_datetime(revenue_pltr['fiscalDateEnding'])
revenue_pltr = revenue_pltr.set_index('fiscalDateEnding').resample('B').ffill().reset_index()

IMPORTANT NOTE: It’s crucial to highlight that in this process, we are assigning the same revenue value to all days within the given quarter. This simplification is necessary due to the disparity in granularity between quarterly revenue data and daily stock price data. However, it’s vital to treat this assumption with caution in practical applications. The impact of quarterly revenue figures on daily stock prices can vary significantly within the quarter based on a range of factors, including changing market expectations, other financial news, and events. In this tutorial, we use this assumption to illustrate the process of incorporating exogenous variables into our forecasting model, but in real-world scenarios, a more nuanced approach may be needed, depending on the available data and the specific use case.

Then we can create the full historic dataset.

pltr_revenue_df = pltr_df.merge(revenue_pltr.rename(columns={'fiscalDateEnding': 'date'}))

pltr_revenue_df.head()

	date	Open	High	Low	Close	Adj Close	Volume	totalRevenue
0	2021-03-31	22.500000	23.850000	22.379999	23.290001	23.290001	61458500	341234000.0
1	2021-04-01	23.950001	23.950001	22.730000	23.070000	23.070000	51788800	341234000.0
2	2021-04-05	23.780001	24.450001	23.340000	23.440001	23.440001	65374300	341234000.0
3	2021-04-06	23.549999	23.610001	22.830000	23.270000	23.270000	41933500	341234000.0
4	2021-04-07	23.000000	23.549999	22.809999	22.900000	22.900000	32766200	341234000.0

To calculate the dataframe of the future revenue:

horizon = 14

import numpy as np

future_df = pd.DataFrame({
    'date': pd.date_range(pltr_revenue_df['date'].iloc[-1], periods=horizon + 1, freq='B')[-horizon:],
    'totalRevenue': np.repeat(fcst_pltr_revenue.iloc[0]['TimeGPT'], horizon)
})

future_df.head()

	date	totalRevenue
0	2023-07-03	547264448
1	2023-07-04	547264448
2	2023-07-05	547264448
3	2023-07-06	547264448
4	2023-07-07	547264448

And then we can pass the future revenue in the forecast method using the X_df argument. Since the revenue is in the historic dataframe, that information will be used in the model.

fcst_pltr_df = nixtla_client.forecast(
    pltr_revenue_df, h=horizon, 
    freq='B',
    time_col='date', 
    target_col='Close',
    X_df=future_df,
)

INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
WARNING:nixtla.nixtla_client:The specified horizon "h" exceeds the model horizon. This may lead to less accurate forecasts. Please consider using a smaller horizon.
INFO:nixtla.nixtla_client:Using the following exogenous variables: totalRevenue
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...

nixtla_client.plot(
    pltr_revenue_df, 
    fcst_pltr_df, 
    id_col='series_id',
    time_col='date',
    target_col='Close',
    max_insample_length=90,
)

We can also see the importance of the revenue:

nixtla_client.weights_x.plot.barh(x='features', y='weights')

From the feature importance plot, we can conclude that the revenue is an important factor in the model’s predictions, meaning changes in revenue will impact the forecast outcome.

Irregular timestamps

1. Import packages

👍
Use an Azure AI endpoint

2. Load data

3. Forecast with irregular timestamps

📘
Available models in Azure AI

Adding extra information

1. Import packages

👍Use an Azure AI endpoint

2. Load data

3. Forecast with irregular timestamps

📘Available models in Azure AI

Adding extra information

👍
Use an Azure AI endpoint

📘
Available models in Azure AI