Cross-validation

A central challenge in time series forecasting is that patterns change over time, so a model's accuracy must be validated before its forecasts are trusted. Cross-validation is well suited to this task: it estimates how a model will perform on unseen data by repeatedly forecasting held-out portions of the historical series, before the model is deployed in real-world scenarios.

TimeGPT provides the cross_validation method to streamline this validation process. It lets practitioners test forecasts against historical data and assess their accuracy while tuning for better performance. This tutorial walks through cross-validation with the NixtlaClient class.

1. Import packages

First, install and import the required packages, then initialize an instance of NixtlaClient.

import pandas as pd
from IPython.display import display
from nixtla import NixtlaClient

nixtla_client = NixtlaClient(
    # defaults to os.environ.get("NIXTLA_API_KEY")
    api_key='my_api_key_provided_by_nixtla'
)

👍

Use an Azure AI endpoint

To use an Azure AI endpoint, remember to also set the base_url argument:

nixtla_client = NixtlaClient(base_url="your azure ai endpoint", api_key="your api_key")

2. Load data

Let's see an example using the Peyton Manning dataset.

pm_df = pd.read_csv('https://datasets-nixtla.s3.amazonaws.com/peyton-manning.csv')
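
Before sending data to the API, it can help to confirm that the frame has the expected layout. The sketch below is a hypothetical helper (`check_long_format` is not part of the nixtla package). It assumes the long format with `unique_id`, `ds`, and `y` columns, which are the column names that appear in the outputs later in this tutorial. It runs on three rows copied from those outputs, so it needs no network access.

```python
import pandas as pd


def check_long_format(df, id_col="unique_id", time_col="ds", target_col="y"):
    """Hypothetical pre-flight check; not part of the nixtla package."""
    missing = {id_col, time_col, target_col} - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    out = df.copy()
    # Parse timestamps so that sorting and frequency inference behave as expected.
    out[time_col] = pd.to_datetime(out[time_col])
    if out[target_col].isna().any():
        raise ValueError("target column contains missing values")
    if out.duplicated([id_col, time_col]).any():
        raise ValueError("duplicate timestamps within a series")
    # Return the rows in time order within each series.
    return out.sort_values([id_col, time_col]).reset_index(drop=True)


# Three rows taken from the outputs shown later, deliberately out of order.
toy = pd.DataFrame({
    "unique_id": [0, 0, 0],
    "ds": ["2015-12-18", "2015-12-17", "2015-12-19"],
    "y": [7.528869, 7.591862, 7.171657],
})
checked = check_long_format(toy)
print(checked)
```

With the real data, the same call would be `check_long_format(pm_df)`, assuming the CSV uses those column names.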

3. Cross-validation

The cross_validation method of the NixtlaClient class performs systematic validation of time series forecasts. It takes a dataframe of time-ordered data and uses a rolling-window scheme to evaluate the model's performance across different time periods, which helps assess the model's reliability and stability over time. The animation below shows how TimeGPT performs cross-validation.

Rolling-window cross-validation

Key parameters include freq, which denotes the data's frequency and is inferred automatically if not specified. The id_col, time_col, and target_col parameters designate the columns holding each series' identifier, time step, and target values. The n_windows parameter sets the number of separate time windows on which the model is assessed, and step_size sets the gap between consecutive windows. If step_size is unspecified, it defaults to the forecast horizon h.
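
To make the interaction between h, n_windows, and step_size concrete, here is a minimal sketch of how rolling-window cutoffs can be derived. It follows a common rolling-origin convention in which the last window covers the final h observations; it is not taken from the library's internals, and `rolling_cutoffs` is a hypothetical helper. The end date of 2016-01-20 is an assumption about the dataset, chosen because it reproduces the first cutoff of 2015-12-16 that appears in the output in this section.

```python
import pandas as pd


def rolling_cutoffs(last_timestamp, h, n_windows, step_size=None, freq="D"):
    """Hypothetical helper: cutoff timestamps for rolling-window validation."""
    # step_size defaults to the horizon, which yields non-overlapping windows.
    step_size = h if step_size is None else step_size
    offset = pd.tseries.frequencies.to_offset(freq)
    last = pd.Timestamp(last_timestamp)
    cutoffs = []
    for i in range(n_windows):
        # The earliest window sits furthest back; the last one ends at the series end.
        steps_back = h + (n_windows - 1 - i) * step_size
        cutoffs.append(last - steps_back * offset)
    return cutoffs


# Assumed series end date; h and n_windows match the call used in this tutorial.
cutoffs = rolling_cutoffs("2016-01-20", h=7, n_windows=5)
for cutoff in cutoffs:
    print(cutoff.date())
```

Setting step_size below h would make consecutive test windows overlap, while a larger value would leave gaps between them.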

The method also supports model refinement via finetune_steps, which specifies the number of iterations for fine-tuning the model on new data. The clean_ex_first parameter controls whether the exogenous signal is cleaned before forecasting. The date_features parameter adds features derived from the timestamps: it can generate standard date-related features automatically or accept custom functions. The date_features_to_one_hot parameter one-hot encodes categorical date features so that the model can consume them.
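
As an illustration of a custom function for date_features, the sketch below defines a weekend indicator. The function name `is_weekend` is hypothetical, and the exact type the library passes to such a function is an assumption here, which is why the input is converted to a DatetimeIndex first. The API call is left as a comment because it requires a valid key.

```python
import pandas as pd


def is_weekend(dates):
    """Hypothetical custom date feature: 1 on Saturdays and Sundays, else 0."""
    # Convert defensively, since the input type is an assumption.
    dates = pd.DatetimeIndex(dates)
    return (dates.dayofweek >= 5).astype(int)


# Five consecutive days starting on a Thursday.
sample_dates = pd.date_range("2015-12-17", periods=5, freq="D")
weekend_flags = is_weekend(sample_dates)
print(list(weekend_flags))

# Assumed usage, following the parameter description above (not run here):
# nixtla_client.cross_validation(
#     pm_df, h=7, n_windows=5, freq='D',
#     date_features=['month', is_weekend],
# )
```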

During execution, cross_validation evaluates the model's forecasts in each window, giving a view of how performance varies over time and of potential overfitting. Evaluating across several windows shows whether the forecasts are consistently accurate across different periods.

timegpt_cv_df = nixtla_client.cross_validation(
    pm_df, 
    h=7, 
    n_windows=5, 
    freq='D',
)
timegpt_cv_df.head()
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Querying model metadata...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Restricting input...
INFO:nixtla.nixtla_client:Calling Cross Validation Endpoint...
   unique_id          ds      cutoff         y   TimeGPT
0          0  2015-12-17  2015-12-16  7.591862  7.939553
1          0  2015-12-18  2015-12-16  7.528869  7.887512
2          0  2015-12-19  2015-12-16  7.171657  7.766617
3          0  2015-12-20  2015-12-16  7.891331  7.931502
4          0  2015-12-21  2015-12-16  8.360071  8.312632
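
Because the result contains both the actual values (y) and the forecasts (TimeGPT) for every cutoff, accuracy can be computed per window with plain pandas. The sketch below is one possible way to do it; `mae_per_window` is a hypothetical helper, not part of the nixtla package. It uses only the five rows printed above, so the resulting number describes part of a single window and is not a full evaluation.

```python
import pandas as pd

# The five rows printed above, re-entered by hand.
cv_head = pd.DataFrame({
    "unique_id": [0] * 5,
    "ds": pd.to_datetime(
        ["2015-12-17", "2015-12-18", "2015-12-19", "2015-12-20", "2015-12-21"]
    ),
    "cutoff": pd.to_datetime(["2015-12-16"] * 5),
    "y": [7.591862, 7.528869, 7.171657, 7.891331, 8.360071],
    "TimeGPT": [7.939553, 7.887512, 7.766617, 7.931502, 8.312632],
})


def mae_per_window(cv_df, model_col="TimeGPT", target_col="y"):
    """Hypothetical helper: mean absolute error per series and cutoff."""
    abs_err = (cv_df[target_col] - cv_df[model_col]).abs()
    grouped = abs_err.groupby([cv_df["unique_id"], cv_df["cutoff"]]).mean()
    return grouped.rename("mae").reset_index()


window_scores = mae_per_window(cv_head)
print(window_scores)
```

On the full output, the same call would be `mae_per_window(timegpt_cv_df)`, giving one row per cutoff so that windows can be compared.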

📘

Available models in Azure AI

If you are using an Azure AI endpoint, please be sure to set model="azureai":

nixtla_client.cross_validation(..., model="azureai")

For the public API, we support two models: timegpt-1 and timegpt-1-long-horizon.

By default, timegpt-1 is used. Please see this tutorial on how and when to use timegpt-1-long-horizon.

cutoffs = timegpt_cv_df['cutoff'].unique()
for cutoff in cutoffs:
    fig = nixtla_client.plot(
        pm_df.tail(100), 
        timegpt_cv_df.query('cutoff == @cutoff').drop(columns=['cutoff', 'y']),
    )
    display(fig)

4. Cross-validation with prediction intervals

It is also possible to generate prediction intervals during cross-validation. To do so, we simply use the level argument.

timegpt_cv_df = nixtla_client.cross_validation(
    pm_df, 
    h=7, 
    n_windows=5, 
    freq='D',
    level=[80, 90],
)
timegpt_cv_df.head()
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Restricting input...
INFO:nixtla.nixtla_client:Calling Cross Validation Endpoint...
   unique_id          ds      cutoff         y   TimeGPT  TimeGPT-hi-80  TimeGPT-hi-90  TimeGPT-lo-80  TimeGPT-lo-90
0          0  2015-12-17  2015-12-16  7.591862  7.939553       8.201465       8.314956       7.677642       7.564151
1          0  2015-12-18  2015-12-16  7.528869  7.887512       8.175414       8.207470       7.599609       7.567553
2          0  2015-12-19  2015-12-16  7.171657  7.766617       8.267363       8.386674       7.265871       7.146560
3          0  2015-12-20  2015-12-16  7.891331  7.931502       8.205929       8.369983       7.657075       7.493020
4          0  2015-12-21  2015-12-16  8.360071  8.312632       9.184893       9.625794       7.440371       6.999469
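
One way to check prediction intervals is to measure how often the actual value falls inside them. The sketch below computes this empirical coverage from the lo/hi columns; `empirical_coverage` is a hypothetical helper, and the column naming pattern `TimeGPT-lo-<level>` / `TimeGPT-hi-<level>` is taken from the output above. With only the five printed rows the result is illustrative and far too small a sample to judge calibration.

```python
import pandas as pd

# Actuals and interval bounds for the five rows printed above.
pi_head = pd.DataFrame({
    "y": [7.591862, 7.528869, 7.171657, 7.891331, 8.360071],
    "TimeGPT-lo-80": [7.677642, 7.599609, 7.265871, 7.657075, 7.440371],
    "TimeGPT-hi-80": [8.201465, 8.175414, 8.267363, 8.205929, 9.184893],
    "TimeGPT-lo-90": [7.564151, 7.567553, 7.146560, 7.493020, 6.999469],
    "TimeGPT-hi-90": [8.314956, 8.207470, 8.386674, 8.369983, 9.625794],
})


def empirical_coverage(df, level, model="TimeGPT", target_col="y"):
    """Hypothetical helper: share of actuals inside the given interval."""
    lo = df[f"{model}-lo-{level}"]
    hi = df[f"{model}-hi-{level}"]
    # between() is inclusive on both ends by default.
    return df[target_col].between(lo, hi).mean()


coverage_80 = empirical_coverage(pi_head, 80)
coverage_90 = empirical_coverage(pi_head, 90)
print(coverage_80, coverage_90)
```

On the full cross-validation output, coverage close to the nominal level (0.80 and 0.90 here) would suggest well-calibrated intervals.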

cutoffs = timegpt_cv_df['cutoff'].unique()
for cutoff in cutoffs:
    fig = nixtla_client.plot(
        pm_df.tail(100), 
        timegpt_cv_df.query('cutoff == @cutoff').drop(columns=['cutoff', 'y']),
        level=[80, 90],
        models=['TimeGPT']
    )
    display(fig)

5. Cross-validation with exogenous variables

Time features

It is possible to include exogenous variables when performing cross-validation. Here we use the date_features parameter to create labels for each month. These features are then used by the model to make predictions during cross-validation.

timegpt_cv_df = nixtla_client.cross_validation(
    pm_df, 
    h=7, 
    n_windows=5,  
    freq='D',
    level=[80, 90],
    date_features=['month'],
    date_features_to_one_hot=True,
)
timegpt_cv_df.head()
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Using the following exogenous features: ['month_1.0', 'month_2.0', 'month_3.0', 'month_4.0', 'month_5.0', 'month_6.0', 'month_7.0', 'month_8.0', 'month_9.0', 'month_10.0', 'month_11.0', 'month_12.0']
INFO:nixtla.nixtla_client:Calling Cross Validation Endpoint...
   unique_id          ds      cutoff         y   TimeGPT  TimeGPT-hi-80  TimeGPT-hi-90  TimeGPT-lo-80  TimeGPT-lo-90
0        0.0  2015-12-17  2015-12-16  7.591862  8.426320       8.721996       8.824101       8.130644       8.028540
1        0.0  2015-12-18  2015-12-16  7.528869  8.049962       8.452083       8.658603       7.647842       7.441321
2        0.0  2015-12-19  2015-12-16  7.171657  7.509098       7.984788       8.138017       7.033409       6.880180
3        0.0  2015-12-20  2015-12-16  7.891331  7.739536       8.306914       8.641355       7.172158       6.837718
4        0.0  2015-12-21  2015-12-16  8.360071  8.027471       8.722828       9.152306       7.332113       6.902636

cutoffs = timegpt_cv_df['cutoff'].unique()
for cutoff in cutoffs:
    fig = nixtla_client.plot(
        pm_df.tail(100), 
        timegpt_cv_df.query('cutoff == @cutoff').drop(columns=['cutoff', 'y']),
        level=[80, 90],
        models=['TimeGPT']
    )
    display(fig)
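
To show what one-hot encoded month features look like, here is a small pandas sketch. It is not the library's internal implementation: the log above reports feature names such as `month_1.0`, whereas this sketch produces `month_1`, and the three sample dates are chosen only for illustration.

```python
import pandas as pd

dates = pd.Series(
    pd.to_datetime(["2015-12-17", "2015-12-31", "2016-01-01"]), name="ds"
)

# Declare all 12 months as categories so every month gets a column,
# even when it does not occur in the sample.
month = pd.Series(
    pd.Categorical(dates.dt.month, categories=list(range(1, 13)))
)
month_one_hot = pd.get_dummies(month, prefix="month").astype(int)
month_one_hot.insert(0, "ds", dates)

print(month_one_hot[["ds", "month_1", "month_12"]])
```

Each row has exactly one month column set to 1, which is the form in which a categorical date feature is passed to the model as an exogenous variable.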

Dynamic features

Additionally, you can pass dynamic exogenous variables to better inform TimeGPT about the data. Simply add the exogenous regressors as columns after the target column.

Y_df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity.csv')
X_df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/exogenous-vars-electricity.csv')
df = Y_df.merge(X_df)
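
The merge above joins on whatever columns the two frames share. A more defensive variant names the join keys explicitly and checks the result, as in the sketch below. The toy frames and their values are made up for illustration; only the column names `unique_id`, `ds`, `y`, and `Exogenous1` and the series id "BE" come from this tutorial.

```python
import pandas as pd

timestamps = pd.to_datetime(
    ["2016-10-22 00:00", "2016-10-22 01:00", "2016-10-22 02:00"]
)

# Toy target and exogenous frames with made-up values.
toy_y = pd.DataFrame(
    {"unique_id": ["BE"] * 3, "ds": timestamps, "y": [1.0, 2.0, 3.0]}
)
toy_x = pd.DataFrame(
    {"unique_id": ["BE"] * 3, "ds": timestamps, "Exogenous1": [10.0, 20.0, 30.0]}
)

# Explicit keys plus validate= raise an error on duplicated key pairs
# instead of silently multiplying rows.
merged = toy_y.merge(
    toy_x, on=["unique_id", "ds"], how="left", validate="one_to_one"
)

# Every target row should have found its exogenous values.
if merged["Exogenous1"].isna().any():
    raise ValueError("some timestamps have no exogenous values")

print(merged.columns.tolist())
```

The resulting column order places the exogenous regressor after the target column, which is the layout described above.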

Now let's cross-validate TimeGPT using this information.

timegpt_cv_df_x = nixtla_client.cross_validation(
    df.groupby('unique_id').tail(100 * 48), 
    h=48, 
    n_windows=2,
    level=[80, 90]
)
cutoffs = timegpt_cv_df_x.query('unique_id == "BE"')['cutoff'].unique()
for cutoff in cutoffs:
    fig = nixtla_client.plot(
        df.query('unique_id == "BE"').tail(24 * 7), 
        timegpt_cv_df_x.query('cutoff == @cutoff & unique_id == "BE"').drop(columns=['cutoff', 'y']),
        models=['TimeGPT'],
        level=[80, 90],
    )
    display(fig)
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Inferred freq: h
INFO:nixtla.nixtla_client:Querying model metadata...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Using the following exogenous features: ['Exogenous1', 'Exogenous2', 'day_0', 'day_1', 'day_2', 'day_3', 'day_4', 'day_5', 'day_6']
INFO:nixtla.nixtla_client:Calling Cross Validation Endpoint...

6. Cross-validation with different TimeGPT instances

You can also run cross-validation for different instances of TimeGPT using the model argument. Here we use the base model and the model for long-horizon forecasting.

timegpt_cv_df_x_long_horizon = nixtla_client.cross_validation(
    df.groupby('unique_id').tail(100 * 48), 
    h=48, 
    n_windows=2,
    level=[80, 90],
    model='timegpt-1-long-horizon',
)
timegpt_cv_df_x_long_horizon.columns = timegpt_cv_df_x_long_horizon.columns.str.replace('TimeGPT', 'TimeGPT-LongHorizon')
timegpt_cv_df_x_models = timegpt_cv_df_x_long_horizon.merge(timegpt_cv_df_x)
cutoffs = timegpt_cv_df_x_models.query('unique_id == "BE"')['cutoff'].unique()
for cutoff in cutoffs:
    fig = nixtla_client.plot(
        df.query('unique_id == "BE"').tail(24 * 7), 
        timegpt_cv_df_x_models.query('cutoff == @cutoff & unique_id == "BE"').drop(columns=['cutoff', 'y']),
        models=['TimeGPT', 'TimeGPT-LongHorizon'],
        level=[80, 90],
    )
    display(fig)
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Inferred freq: h
INFO:nixtla.nixtla_client:Querying model metadata...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Using the following exogenous features: ['Exogenous1', 'Exogenous2', 'day_0', 'day_1', 'day_2', 'day_3', 'day_4', 'day_5', 'day_6']
INFO:nixtla.nixtla_client:Calling Cross Validation Endpoint...
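
The rename-and-merge pattern used above yields one forecast column per model, which makes side-by-side scoring straightforward. Since this tutorial does not print the long-horizon forecasts, the sketch below reuses the five printed rows from sections 3 and 5 (the plain forecast and the forecast with month features) purely as stand-in inputs. The `TimeGPT-month` column name and the `mae_by_model` helper are hypothetical.

```python
import pandas as pd

# Columns shared by both result frames (values from the printed outputs).
keys = {
    "unique_id": [0] * 5,
    "ds": pd.to_datetime(
        ["2015-12-17", "2015-12-18", "2015-12-19", "2015-12-20", "2015-12-21"]
    ),
    "cutoff": pd.to_datetime(["2015-12-16"] * 5),
    "y": [7.591862, 7.528869, 7.171657, 7.891331, 8.360071],
}
base_cv = pd.DataFrame(
    {**keys, "TimeGPT": [7.939553, 7.887512, 7.766617, 7.931502, 8.312632]}
)
month_cv = pd.DataFrame(
    {**keys, "TimeGPT": [8.426320, 8.049962, 7.509098, 7.739536, 8.027471]}
)

# Same pattern as above: rename one model's columns, then merge on the
# shared columns (unique_id, ds, cutoff, y).
month_cv.columns = month_cv.columns.str.replace("TimeGPT", "TimeGPT-month")
both = month_cv.merge(base_cv)


def mae_by_model(cv_df, models, target_col="y"):
    """Hypothetical helper: mean absolute error for each forecast column."""
    return pd.Series(
        {m: (cv_df[target_col] - cv_df[m]).abs().mean() for m in models},
        name="mae",
    )


scores = mae_by_model(both, ["TimeGPT", "TimeGPT-month"])
print(scores)
```

For the comparison in this section, the equivalent call would be `mae_by_model(timegpt_cv_df_x_models, ['TimeGPT', 'TimeGPT-LongHorizon'])`.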
