Forecasting Time Series with Irregular Timestamps
When working with time series data, the frequency of the timestamps is a crucial factor that can significantly impact the forecasting results. Regular frequencies like daily, weekly, or monthly are straightforward to handle. However, irregular frequencies like business days, which exclude weekends, can be challenging for time series forecasting methods. Our forecast method is equipped to handle this kind of irregular time series data, as long as you specify the frequency of the series. For example, in the case of business days, the frequency should be passed as ‘B’. Without this, the method might fail to automatically detect the frequency, especially when the timestamps are irregular.
import pandas as pd
from nixtlats import TimeGPT
/home/ubuntu/miniconda/envs/nixtlats/lib/python3.11/site-packages/statsforecast/core.py:25: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
from tqdm.autonotebook import tqdm
timegpt = TimeGPT(
# defaults to os.environ.get("TIMEGPT_TOKEN")
token = 'my_token_provided_by_nixtla'
)
The first step is to fetch your time series data. The data must include timestamps and the associated values. For instance, you might be working with stock prices, and your data could look something like the following. In this example we use OpenBB.
pltr_df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/openbb/pltr.csv')
pltr_df['date'] = pd.to_datetime(pltr_df['date'])
pltr_df.head()
date | Open | High | Low | Close | Adj Close | Volume | Dividends | Stock Splits | |
---|---|---|---|---|---|---|---|---|---|
0 | 2020-09-30 | 10.00 | 11.41 | 9.11 | 9.50 | 9.50 | 338584400 | 0.0 | 0.0 |
1 | 2020-10-01 | 9.69 | 10.10 | 9.23 | 9.46 | 9.46 | 124297600 | 0.0 | 0.0 |
2 | 2020-10-02 | 9.06 | 9.28 | 8.94 | 9.20 | 9.20 | 55018300 | 0.0 | 0.0 |
3 | 2020-10-05 | 9.43 | 9.49 | 8.92 | 9.03 | 9.03 | 36316900 | 0.0 | 0.0 |
4 | 2020-10-06 | 9.04 | 10.18 | 8.90 | 9.90 | 9.90 | 90864000 | 0.0 | 0.0 |
(pltr_df['date'].dt.dayofweek > 4).sum()
0
As we can see the timestamp is irregular. Let’s inspect the Close
series.
timegpt.plot(pltr_df, time_col='date', target_col='Close')
To forecast this data, you can use our forecast
method. Importantly, remember to specify the frequency of the data using the freq
argument. In this case, it would be ‘B’ for business days. We also need to define the time_col
to select the index of the series (by default is ds
), and the target_col
to forecast our target variable, in this case we will forecast Close
:
fcst_pltr_df = timegpt.forecast(
df=pltr_df, h=14, freq='B',
time_col='date', target_col='Close',
)
INFO:nixtlats.timegpt:Validating inputs...
INFO:nixtlats.timegpt:Preprocessing dataframes...
WARNING:nixtlats.timegpt:The specified horizon "h" exceeds the model horizon. This may lead to less accurate forecasts. Please consider using a smaller horizon.
INFO:nixtlats.timegpt:Calling Forecast Endpoint...
fcst_pltr_df.head()
date | TimeGPT | |
---|---|---|
0 | 2023-09-25 | 14.688427 |
1 | 2023-09-26 | 14.742798 |
2 | 2023-09-27 | 14.781240 |
3 | 2023-09-28 | 14.824156 |
4 | 2023-09-29 | 14.795214 |
timegpt.plot(
pltr_df,
fcst_pltr_df,
time_col='date',
target_col='Close',
max_insample_length=90,
)
You can also add uncertainty quantification to your forecasts using the level
argument:
fcst_pltr_levels_df = timegpt.forecast(
df=pltr_df, h=42, freq='B',
time_col='date', target_col='Close',
add_history=True,
level=[40.66, 90],
)
INFO:nixtlats.timegpt:Validating inputs...
INFO:nixtlats.timegpt:Preprocessing dataframes...
WARNING:nixtlats.timegpt:The specified horizon "h" exceeds the model horizon. This may lead to less accurate forecasts. Please consider using a smaller horizon.
INFO:nixtlats.timegpt:Calling Forecast Endpoint...
INFO:nixtlats.timegpt:Calling Historical Forecast Endpoint...
timegpt.plot(
pltr_df,
fcst_pltr_levels_df,
time_col='date',
target_col='Close',
level=[40.66, 90],
)
If you want to forecast another just change the target_col
parameter. Let’s forecast Volume
now:
fcst_pltr_df = timegpt.forecast(
df=pltr_df, h=14, freq='B',
time_col='date', target_col='Volume',
)
timegpt.plot(
pltr_df,
fcst_pltr_df,
time_col='date',
max_insample_length=90,
target_col='Volume',
)
INFO:nixtlats.timegpt:Validating inputs...
INFO:nixtlats.timegpt:Preprocessing dataframes...
WARNING:nixtlats.timegpt:The specified horizon "h" exceeds the model horizon. This may lead to less accurate forecasts. Please consider using a smaller horizon.
INFO:nixtlats.timegpt:Calling Forecast Endpoint...
But what if we want to predict all the time series at once? We can do that reshaping our dataframe. Currently, the dataframe is in wide format (each series is a column), but we need to have them in long format (stacked one each other). We can do it with:
pltr_long_df = pd.melt(
pltr_df,
id_vars=['date'],
var_name='series_id'
)
pltr_long_df.head()
date | series_id | value | |
---|---|---|---|
0 | 2020-09-30 | Open | 10.00 |
1 | 2020-10-01 | Open | 9.69 |
2 | 2020-10-02 | Open | 9.06 |
3 | 2020-10-05 | Open | 9.43 |
4 | 2020-10-06 | Open | 9.04 |
fcst_pltr_long_df = timegpt.forecast(
df=pltr_long_df, h=14, freq='B',
id_col='series_id', time_col='date', target_col='value',
)
INFO:nixtlats.timegpt:Validating inputs...
INFO:nixtlats.timegpt:Preprocessing dataframes...
WARNING:nixtlats.timegpt:The specified horizon "h" exceeds the model horizon. This may lead to less accurate forecasts. Please consider using a smaller horizon.
INFO:nixtlats.timegpt:Calling Forecast Endpoint...
fcst_pltr_long_df.head()
series_id | date | TimeGPT | |
---|---|---|---|
0 | Adj Close | 2023-09-25 | 14.688427 |
1 | Adj Close | 2023-09-26 | 14.742798 |
2 | Adj Close | 2023-09-27 | 14.781240 |
3 | Adj Close | 2023-09-28 | 14.824156 |
4 | Adj Close | 2023-09-29 | 14.795214 |
timegpt.plot(
pltr_long_df,
fcst_pltr_long_df,
id_col='series_id',
time_col='date',
target_col='value',
unique_ids=['Open'],
max_insample_length=90,
)
Adding extra information
In time series forecasting, the variables that we predict are often influenced not just by their past values, but also by other factors or variables. These external variables, known as exogenous variables, can provide vital additional context that can significantly improve the accuracy of our forecasts. One such factor, and the focus of this tutorial, is the company’s revenue. Revenue figures can provide a key indicator of a company’s financial health and growth potential, both of which can heavily influence its stock price. That we can obtain from openbb.
revenue_pltr = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/openbb/revenue-pltr.csv')
revenue_pltr.tail()
fiscalDateEnding | totalRevenue | |
---|---|---|
5 | 2022-06-30 | 473010000.0 |
6 | 2022-09-30 | 477880000.0 |
7 | 2022-12-31 | 508624000.0 |
8 | 2023-03-31 | 525186000.0 |
9 | 2023-06-30 | 533317000.0 |
fcst_pltr_revenue = timegpt.forecast(revenue_pltr, h=1, time_col='fiscalDateEnding', target_col='totalRevenue')
INFO:nixtlats.timegpt:Validating inputs...
INFO:nixtlats.timegpt:Preprocessing dataframes...
INFO:nixtlats.timegpt:Inferred freq: Q-DEC
INFO:nixtlats.timegpt:Calling Forecast Endpoint...
fcst_pltr_revenue.head()
fiscalDateEnding | TimeGPT | |
---|---|---|
0 | 2023-09-30 | 540005888 |
revenue_pltr['fiscalDateEnding'] = pd.to_datetime(revenue_pltr['fiscalDateEnding'])
revenue_pltr = revenue_pltr.set_index('fiscalDateEnding').resample('B').ffill().reset_index()
IMPORTANT NOTE: It’s crucial to highlight that in this process, we
are assigning the same revenue value to all days within the given quarter. This simplification is necessary due to the disparity in granularity between quarterly revenue data and daily stock price data. However, it’s vital to treat this assumption with caution in practical applications. The impact of quarterly revenue figures on daily stock prices can vary significantly within the quarter based on a range of factors, including changing market expectations, other financial news, and events. In this tutorial, we use this assumption to illustrate the process of incorporating exogenous variables into our forecasting model, but in real-world scenarios, a more nuanced approach may be needed, depending on the available data and the specific use case. Then we can create the full historic dataset.
pltr_revenue_df = pltr_df.merge(revenue_pltr.rename(columns={'fiscalDateEnding': 'date'}))
pltr_revenue_df.head()
date | Open | High | Low | Close | Adj Close | Volume | Dividends | Stock Splits | totalRevenue | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 2021-03-31 | 22.500000 | 23.850000 | 22.379999 | 23.290001 | 23.290001 | 61458500 | 0.0 | 0.0 | 341234000.0 |
1 | 2021-04-01 | 23.950001 | 23.950001 | 22.730000 | 23.070000 | 23.070000 | 51788800 | 0.0 | 0.0 | 341234000.0 |
2 | 2021-04-05 | 23.780001 | 24.450001 | 23.340000 | 23.440001 | 23.440001 | 65374300 | 0.0 | 0.0 | 341234000.0 |
3 | 2021-04-06 | 23.549999 | 23.610001 | 22.830000 | 23.270000 | 23.270000 | 41933500 | 0.0 | 0.0 | 341234000.0 |
4 | 2021-04-07 | 23.000000 | 23.549999 | 22.809999 | 22.900000 | 22.900000 | 32766200 | 0.0 | 0.0 | 341234000.0 |
horizon = 14
import numpy as np
future_df = pd.DataFrame({
'date': pd.date_range(pltr_revenue_df['date'].iloc[-1], periods=horizon + 1, freq='B')[-horizon:],
'totalRevenue': np.repeat(fcst_pltr_revenue.iloc[0]['TimeGPT'], horizon)
})
future_df.head()
date | totalRevenue | |
---|---|---|
0 | 2023-07-03 | 540005888 |
1 | 2023-07-04 | 540005888 |
2 | 2023-07-05 | 540005888 |
3 | 2023-07-06 | 540005888 |
4 | 2023-07-07 | 540005888 |
fcst_pltr_df = timegpt.forecast(
pltr_revenue_df, h=horizon,
freq='B',
time_col='date',
target_col='Close',
X_df=future_df,
)
INFO:nixtlats.timegpt:Validating inputs...
INFO:nixtlats.timegpt:Preprocessing dataframes...
WARNING:nixtlats.timegpt:The specified horizon "h" exceeds the model horizon. This may lead to less accurate forecasts. Please consider using a smaller horizon.
INFO:nixtlats.timegpt:Calling Forecast Endpoint...
timegpt.plot(
pltr_revenue_df,
fcst_pltr_df,
id_col='series_id',
time_col='date',
target_col='Close',
max_insample_length=90,
)
We can also see the importance of the revenue:
timegpt.weights_x.plot.barh(x='features', y='weights')
<Axes: ylabel='features'>
Updated about 1 month ago