Quantile forecasts

In forecasting, we are often interested in a distribution of predictions rather than only a point prediction, because we want to have a notion of the uncertainty around the forecast.

To this end, we can create quantile forecasts.

Quantile forecasts have an intuitive interpretation: each one represents a specific percentile of the forecast distribution. This allows us to make statements such as ‘we expect 90% of our observations of air passengers to be above 100’, which corresponds to a 10th-percentile forecast of 100. This approach is helpful for planning under uncertainty, because it provides a range of possible future values rather than a single number and lets users weigh the likely outcomes when making decisions.

With TimeGPT, we can create a distribution of forecasts and extract the forecast for any specified quantile. For instance, the 0.25 and 0.75 quantiles (the 25th and 75th percentiles) give insight into the lower and upper quartiles of expected outcomes, respectively, while the 0.5 quantile, or median, offers a central estimate.

TimeGPT uses conformal prediction to produce the quantiles.
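
To give a feel for how conformal prediction can turn a point forecast into quantiles, here is a minimal sketch of a split-conformal-style calculation. It is an illustration under stated assumptions, not TimeGPT's internal implementation: the function names, the calibration errors, and the point forecast of 100 are all hypothetical. The idea is to collect the absolute errors of past forecasts, pick the error magnitude that covers a chosen share of them, and place the quantile that far below or above the point forecast.

```python
import math


def conformal_half_width(abs_errors, level):
    """Return the k-th smallest absolute error, with k = ceil((n + 1) * level)."""
    if level == 0:
        return 0.0
    scores = sorted(abs_errors)
    n = len(scores)
    k = math.ceil((n + 1) * level)
    # With too few calibration errors, no finite width reaches the coverage.
    return scores[k - 1] if k <= n else math.inf


def conformal_quantiles(point_forecast, calibration_errors, quantiles):
    """Map each quantile q to the point forecast minus or plus a half-width."""
    abs_errors = [abs(e) for e in calibration_errors]
    result = {}
    for q in quantiles:
        # Quantiles q and 1 - q bound a central interval with coverage |2q - 1|.
        level = abs(2 * q - 1)
        half_width = conformal_half_width(abs_errors, level)
        if q < 0.5:
            result[q] = point_forecast - half_width
        elif q > 0.5:
            result[q] = point_forecast + half_width
        else:
            result[q] = point_forecast
    return result


# Hypothetical errors (actual minus forecast) from eight past forecasts.
calibration_errors = [-4.0, 2.0, -1.0, 3.0, 5.0, -6.0, 7.0, -8.0]
sketch = conformal_quantiles(100.0, calibration_errors, [0.1, 0.25, 0.5, 0.75, 0.9])
print(sketch)
```

In this sketch the quantiles sit symmetrically around the point forecast, so the 0.5 quantile coincides with it. The forecast outputs in this tutorial show the same pattern: the TimeGPT-q-50 column equals the TimeGPT column.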

1. Import packages

First, we import the required packages and initialize the Nixtla client.

import pandas as pd
from IPython.display import display
from nixtla import NixtlaClient

nixtla_client = NixtlaClient(
    # defaults to os.environ.get("NIXTLA_API_KEY")
    api_key='my_api_key_provided_by_nixtla'
)

2. Load data

df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv')
df.head()
    timestamp  value
0  1949-01-01    112
1  1949-02-01    118
2  1949-03-01    132
3  1949-04-01    129
4  1949-05-01    121

3. Forecast with quantiles

When using TimeGPT for time series forecasting, you can choose the quantiles you want to predict by passing them, as values between 0 and 1, to the quantiles argument. Here’s how you could do it:

quantiles = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
timegpt_quantile_fcst_df = nixtla_client.forecast(
    df=df, h=12, 
    quantiles=quantiles, 
    time_col='timestamp', target_col='value',
)
timegpt_quantile_fcst_df.head()
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Inferred freq: MS
INFO:nixtla.nixtla_client:Restricting input...
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
    timestamp     TimeGPT  TimeGPT-q-10  TimeGPT-q-20  TimeGPT-q-30  TimeGPT-q-40  TimeGPT-q-50  TimeGPT-q-60  TimeGPT-q-70  TimeGPT-q-80  TimeGPT-q-90
0  1961-01-01  437.837952    431.987091    435.043799    435.384363    436.402155    437.837952    439.273749    440.291541    440.632104    443.688812
1  1961-02-01  426.062744    412.704956    414.832837    416.042432    421.719196    426.062744    430.406293    436.083057    437.292651    439.420532
2  1961-03-01  463.116577    437.412564    444.234985    446.420233    450.705762    463.116577    475.527393    479.812921    481.998169    488.820590
3  1961-04-01  478.244507    448.726837    455.428375    465.570038    469.879114    478.244507    486.609900    490.918976    501.060638    507.762177
4  1961-05-01  505.646484    478.409872    493.154315    497.990848    499.138708    505.646484    512.154260    513.302121    518.138654    532.883096

For each quantile q, TimeGPT returns a forecast column named TimeGPT-q-{int(100 * q)}.
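
As a small illustration of this naming rule, the snippet below builds the expected column names for a list of quantiles and uses them to read an interval off one forecast row. The row is copied from the first line of the forecast output above; the helper name `quantile_column` is hypothetical and not part of the Nixtla client.

```python
def quantile_column(q, model="TimeGPT"):
    """Column name for quantile q, following the TimeGPT-q-{int(100 * q)} pattern."""
    return f"{model}-q-{int(100 * q)}"


quantiles = [0.1, 0.5, 0.9]
columns = [quantile_column(q) for q in quantiles]
print(columns)

# First forecast row from the output above (1961-01-01), as a plain dict.
row = {
    "TimeGPT": 437.837952,
    "TimeGPT-q-10": 431.987091,
    "TimeGPT-q-50": 437.837952,
    "TimeGPT-q-90": 443.688812,
}

# The 0.1 and 0.9 quantiles bound a central 80% interval.
lower, upper = row[quantile_column(0.1)], row[quantile_column(0.9)]
print(f"80% interval: [{lower}, {upper}], width {upper - lower:.6f}")
```

With the forecast dataframe from this tutorial, the same list of names can be used to select the quantile columns, for example `timegpt_quantile_fcst_df[columns]`.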

nixtla_client.plot(
    df, timegpt_quantile_fcst_df, 
    time_col='timestamp', target_col='value',
)

The choice of quantile (or quantiles) depends on your specific use case. For high-stakes predictions, you might lean towards quantiles far from the median to ensure you’re prepared for worst-case scenarios: a low quantile, such as the 10th or 20th percentile, when the danger is that the outcome falls short, or a high quantile, such as the 80th or 90th percentile, when the danger is that the outcome exceeds what you planned for. On the other hand, if the cost of over-preparation is high, you might choose a quantile closer to the median, like the 50th percentile, to balance caution and efficiency.

For instance, if you are managing inventory for a retail business during a big sale event, opting for a higher quantile of demand might help you avoid running out of stock, even if it means you might overstock a bit. But if you are scheduling staff for a restaurant, you might go with a quantile closer to the middle to ensure you have enough staff on hand without significantly overstaffing.

Ultimately, the choice comes down to understanding the balance between risk and cost in your specific context. Quantile forecasts from TimeGPT let you tailor your strategy to that balance.
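
One common way to make this trade-off concrete is the critical ratio from the newsvendor model: if being one unit short costs c_u and having one unit too many costs c_o, the cost-minimizing quantile is c_u / (c_u + c_o). The sketch below applies that rule and then picks the closest quantile that was actually forecast. The cost figures and helper names are hypothetical, and the rule assumes costs grow linearly with the size of the shortfall or surplus.

```python
def critical_ratio(underage_cost, overage_cost):
    """Quantile that minimizes expected cost under linear over/under costs."""
    return underage_cost / (underage_cost + overage_cost)


def closest_quantile_column(target, available_quantiles):
    """Name of the forecast column whose quantile is nearest to the target."""
    q = min(available_quantiles, key=lambda candidate: abs(candidate - target))
    return f"TimeGPT-q-{int(100 * q)}"


available = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

# Retail sale event: a lost sale (cost 9) hurts far more than an unsold unit (cost 1).
retail_target = critical_ratio(underage_cost=9.0, overage_cost=1.0)
retail_column = closest_quantile_column(retail_target, available)

# Restaurant staffing: being short and being overstaffed cost about the same.
staffing_target = critical_ratio(underage_cost=1.0, overage_cost=1.0)
staffing_column = closest_quantile_column(staffing_target, available)

print(retail_target, retail_column)
print(staffing_target, staffing_column)
```

The more lopsided the two costs, the further the chosen quantile moves from the median.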

Historical Forecast

You can also compute quantile forecasts over the historical data by adding the add_history=True parameter, as follows:

timegpt_quantile_fcst_df = nixtla_client.forecast(
    df=df, h=12, 
    quantiles=quantiles, 
    time_col='timestamp', target_col='value',
    add_history=True,
)
timegpt_quantile_fcst_df.head()
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Inferred freq: MS
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
INFO:nixtla.nixtla_client:Calling Historical Forecast Endpoint...
    timestamp     TimeGPT  TimeGPT-q-10  TimeGPT-q-20  TimeGPT-q-30  TimeGPT-q-40  TimeGPT-q-50  TimeGPT-q-60  TimeGPT-q-70  TimeGPT-q-80  TimeGPT-q-90
0  1951-01-01  135.483673    111.937768    120.020593    125.848879    130.828935    135.483673    140.138411    145.118467    150.946753    159.029579
1  1951-02-01  144.442398    120.896493    128.979318    134.807604    139.787660    144.442398    149.097136    154.077192    159.905478    167.988304
2  1951-03-01  157.191910    133.646004    141.728830    147.557116    152.537172    157.191910    161.846648    166.826703    172.654990    180.737815
3  1951-04-01  148.769363    125.223458    133.306284    139.134570    144.114625    148.769363    153.424102    158.404157    164.232443    172.315269
4  1951-05-01  140.472946    116.927041    125.009866    130.838152    135.818208    140.472946    145.127684    150.107740    155.936026    164.018852

nixtla_client.plot(
    df, timegpt_quantile_fcst_df, 
    time_col='timestamp', target_col='value',
)
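
Historical quantile forecasts make it possible to check calibration: if a quantile forecast is well calibrated, roughly a share q of the observed values should fall at or below the q column. The sketch below computes that share for aligned sequences of actuals and forecasts. The function name and the toy numbers are hypothetical; with the dataframes from this tutorial you would first join `df` and `timegpt_quantile_fcst_df` on `timestamp`, so that each actual lines up with its forecast.

```python
def empirical_coverage(actuals, quantile_forecasts):
    """Share of actual values that fall at or below their quantile forecast."""
    if len(actuals) == 0 or len(actuals) != len(quantile_forecasts):
        raise ValueError("actuals and forecasts must be non-empty and equally long")
    hits = sum(a <= f for a, f in zip(actuals, quantile_forecasts))
    return hits / len(actuals)


# Toy data: ten observations with hypothetical 0.1 and 0.9 quantile forecasts.
actuals = [10, 12, 11, 15, 14, 13, 16, 18, 17, 19]
q10_forecasts = [9, 11, 12, 14, 13, 12, 15, 17, 16, 18]
q90_forecasts = [12, 14, 13, 17, 16, 15, 18, 17, 19, 21]

coverage_q10 = empirical_coverage(actuals, q10_forecasts)
coverage_q90 = empirical_coverage(actuals, q90_forecasts)
print(coverage_q10, coverage_q90)
```

A coverage far from q on a long history suggests that the corresponding quantile is too wide or too narrow for that series.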

Cross Validation

The quantiles argument can also be passed to the cross_validation method, which lets you compare the performance of TimeGPT across different windows and different quantiles.

timegpt_cv_quantile_fcst_df = nixtla_client.cross_validation(
    df=df, 
    h=12, 
    n_windows=5,
    quantiles=quantiles, 
    time_col='timestamp', 
    target_col='value',
)
timegpt_cv_quantile_fcst_df.head()
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Inferred freq: MS
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Inferred freq: MS
INFO:nixtla.nixtla_client:Restricting input...
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Inferred freq: MS
INFO:nixtla.nixtla_client:Restricting input...
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Inferred freq: MS
INFO:nixtla.nixtla_client:Restricting input...
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Inferred freq: MS
INFO:nixtla.nixtla_client:Restricting input...
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Inferred freq: MS
INFO:nixtla.nixtla_client:Restricting input...
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
INFO:nixtla.nixtla_client:Validating inputs...

cutoffs = timegpt_cv_quantile_fcst_df['cutoff'].unique()
for cutoff in cutoffs:
    fig = nixtla_client.plot(
        df.tail(100), 
        timegpt_cv_quantile_fcst_df.query('cutoff == @cutoff').drop(columns=['cutoff', 'value']),
        time_col='timestamp', 
        target_col='value'
    )
    display(fig)
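
To compare quantile forecasts across windows numerically, a standard metric is the pinball (quantile) loss, which charges q per unit of under-forecast and 1 - q per unit of over-forecast. The sketch below is a plain-Python version with a hypothetical function name and toy numbers. On the cross-validation output you would apply it per `cutoff`, using the `value` column as actuals and each `TimeGPT-q-*` column as forecasts (converted to lists, for example with `.tolist()`).

```python
def pinball_loss(actuals, forecasts, q):
    """Mean pinball loss of quantile-q forecasts; lower is better."""
    if len(actuals) == 0 or len(actuals) != len(forecasts):
        raise ValueError("actuals and forecasts must be non-empty and equally long")
    total = 0.0
    for y, f in zip(actuals, forecasts):
        diff = y - f
        # Under-forecast (y > f) costs q per unit; over-forecast costs 1 - q.
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(actuals)


# Toy data: four observations and hypothetical 0.9-quantile forecasts.
actuals = [100.0, 110.0, 120.0, 130.0]
q90_forecasts = [105.0, 115.0, 118.0, 140.0]

loss_q90 = pinball_loss(actuals, q90_forecasts, 0.9)
print(f"pinball loss at q=0.9: {loss_q90:.4f}")
```

At q = 0.5 the pinball loss equals half the mean absolute error, so the same function also scores the median forecast.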