Dask

Run TimeGPT in a distributed manner on top of Dask

Dask is an open-source parallel computing library for Python. This guide explains how to use TimeGPT on top of Dask.

Outline:

1. Installation
2. Load Data
3. Import Dask
4. Use TimeGPT on Dask

1. Installation

Install Dask through Fugue. Fugue provides an easy-to-use interface for distributed computing that lets users execute Python code on top of several distributed computing frameworks, including Dask.

Note

You can install Fugue with pip:
pip install fugue[dask]

If executing on a distributed Dask cluster, ensure that the nixtla library is installed across all the workers.
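
One way to confirm that requirement is to check which packages can be imported in a given Python environment. The sketch below is a hypothetical helper (`missing_packages` is not part of nixtla, Fugue, or Dask) that uses only the standard library; the default package names are assumptions based on this guide.

```python
import importlib.util


def missing_packages(names=("nixtla", "fugue", "dask")):
    """Return the names from `names` that cannot be imported here."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Missing packages:", missing_packages())
```

If you already have a `dask.distributed.Client` connected to your cluster, passing this function to its `run` method should execute it on every worker and return the results keyed by worker address.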

2. Load Data

You can load your data as a pandas DataFrame. In this tutorial, we will use a dataset that contains hourly electricity prices from different markets.

import pandas as pd

df = pd.read_csv(
    'https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short.csv',
    parse_dates=['ds'],
) 
df.head()

	unique_id	ds	y
0	BE	2016-10-22 00:00:00	70.00
1	BE	2016-10-22 01:00:00	37.10
2	BE	2016-10-22 02:00:00	37.10
3	BE	2016-10-22 03:00:00	44.75
4	BE	2016-10-22 04:00:00	37.10
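
The preview shows the long format used throughout this guide: one row per series identifier (`unique_id`) and timestamp (`ds`), with the observed value in `y`. A minimal sketch of a sanity check to run before sending data to TimeGPT follows. `missing_columns` is a hypothetical helper, and it assumes you keep these column names. It only reads the `.columns` attribute, so it should work for pandas and Dask DataFrames alike.

```python
REQUIRED_COLUMNS = ("unique_id", "ds", "y")


def missing_columns(frame, required=REQUIRED_COLUMNS):
    """Return the required column names that `frame.columns` lacks."""
    present = set(frame.columns)
    return [col for col in required if col not in present]
```

For the dataset loaded above, `missing_columns(df)` should return an empty list.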

3. Import Dask

Import Dask and convert the pandas DataFrame to a Dask DataFrame.

import dask.dataframe as dd

dask_df = dd.from_pandas(df, npartitions=2)
dask_df

Dask DataFrame Structure:

	unique_id	ds	y
npartitions=2
0	string	string	float64
4200	...	...	...
8399	...	...	...

Dask Name: to_pyarrow_string, 2 graph layers
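
The index values 0, 4200, and 8399 in the output are the partition boundaries (Dask calls them divisions): the first partition starts at row 0, the second at row 4200, and 8399 is the last index value. The sketch below reproduces those numbers for an evenly split 0..n-1 index. It illustrates the idea and is not Dask's actual implementation; `even_divisions` is a hypothetical helper.

```python
import math


def even_divisions(n_rows, npartitions):
    """Boundaries for splitting index 0..n_rows-1 into equal-sized chunks."""
    if n_rows < 1 or npartitions < 1:
        raise ValueError("n_rows and npartitions must be positive")
    chunk = math.ceil(n_rows / npartitions)
    starts = list(range(0, n_rows, chunk))
    # The final boundary is the last index value, which is inclusive.
    return starts + [n_rows - 1]


print(even_divisions(8400, 2))  # [0, 4200, 8399]
```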

4. Use TimeGPT on Dask

Using TimeGPT on top of Dask is almost identical to the non-distributed case. The only difference is that you need to use a Dask DataFrame, which we already defined in the previous step.

First, instantiate the NixtlaClient class.

from nixtla import NixtlaClient

nixtla_client = NixtlaClient(
    # defaults to os.environ.get("NIXTLA_API_KEY")
    api_key = 'my_api_key_provided_by_nixtla'
)
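
The comment in the snippet above notes that the key defaults to the `NIXTLA_API_KEY` environment variable. If you prefer to fail early when neither source is set, something like the following could work. `resolve_api_key` is a hypothetical helper, not part of the nixtla library.

```python
import os


def resolve_api_key(explicit_key=None, env_var="NIXTLA_API_KEY"):
    """Return the explicit key if given, else the environment variable."""
    key = explicit_key or os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"No API key passed and {env_var} is not set")
    return key
```

You could then write `NixtlaClient(api_key=resolve_api_key())` so that a missing key surfaces before any work is sent to the cluster.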

👍
Use an Azure AI endpoint
To use an Azure AI endpoint, set the base_url argument:
nixtla_client = NixtlaClient(base_url="your azure ai endpoint", api_key="your api_key")

Then use any method from the NixtlaClient class, such as forecast or cross_validation.

fcst_df = nixtla_client.forecast(dask_df, h=12)
fcst_df.compute().head()

	unique_id	ds	TimeGPT
0	BE	2016-12-31 00:00:00	45.190453
1	BE	2016-12-31 01:00:00	43.244446
2	BE	2016-12-31 02:00:00	41.958389
3	BE	2016-12-31 03:00:00	39.796486
4	BE	2016-12-31 04:00:00	39.204533
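
With `h=12` on hourly data, each series gets 12 forecasts, one per hour after its last observation. The output above starts at 2016-12-31 00:00:00, which suggests that the last observed timestamp for BE is 2016-12-30 23:00:00 (an inference from the output, not something the dataset description states). A small sketch of that timestamp arithmetic, using a hypothetical helper `forecast_timestamps`:

```python
from datetime import datetime, timedelta


def forecast_timestamps(last_observed, h, freq=timedelta(hours=1)):
    """Timestamps of the h steps that follow `last_observed`."""
    return [last_observed + freq * step for step in range(1, h + 1)]


# Assumed last observation, inferred from the forecast output above.
stamps = forecast_timestamps(datetime(2016, 12, 30, 23), h=12)
print(stamps[0], stamps[-1])
```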

📘
Available models in Azure AI
If you are using an Azure AI endpoint, please be sure to set model="azureai":
nixtla_client.forecast(..., model="azureai")
For the public API, we support two models: timegpt-1 and timegpt-1-long-horizon.
By default, timegpt-1 is used. Please see this tutorial on how and when to use timegpt-1-long-horizon.

cv_df = nixtla_client.cross_validation(dask_df, h=12, n_windows=5, step_size=2)
cv_df.compute().head()

	unique_id	ds	cutoff	TimeGPT
0	BE	2016-12-30 04:00:00	2016-12-30 03:00:00	39.375439
1	BE	2016-12-30 05:00:00	2016-12-30 03:00:00	40.039215
2	BE	2016-12-30 06:00:00	2016-12-30 03:00:00	43.455849
3	BE	2016-12-30 07:00:00	2016-12-30 03:00:00	47.716408
4	BE	2016-12-30 08:00:00	2016-12-30 03:00:00	50.31665
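
The `cutoff` column marks the last timestamp each validation window was allowed to see. Under the usual rolling-origin convention, the final window ends at the last observation and earlier windows step back by `step_size`. The sketch below applies that convention, assuming hourly data and a last observation at 2016-12-30 23:00:00 (inferred from the forecast output). `cv_cutoffs` is a hypothetical helper, not a nixtla function. Its earliest cutoff matches the 2016-12-30 03:00:00 shown above.

```python
from datetime import datetime, timedelta


def cv_cutoffs(last_observed, h, n_windows, step_size, freq=timedelta(hours=1)):
    """Cutoff timestamps for rolling-origin windows, earliest first."""
    # The final window forecasts the last h observed steps.
    final_cutoff = last_observed - freq * h
    return [
        final_cutoff - freq * step_size * (n_windows - 1 - i)
        for i in range(n_windows)
    ]


cutoffs = cv_cutoffs(datetime(2016, 12, 30, 23), h=12, n_windows=5, step_size=2)
print(cutoffs[0], cutoffs[-1])
```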

You can also use exogenous variables with TimeGPT on top of Dask. To do this, refer to the Exogenous Variables tutorial, and pass a Dask DataFrame wherever that tutorial uses a pandas DataFrame.
