Dask
Run TimeGPT in a distributed fashion on top of Dask
Dask is an open-source parallel computing library for Python. This guide explains how to use TimeGPT on top of Dask.
1. Installation
Install Dask through Fugue. Fugue provides an easy-to-use interface for distributed computing that lets users execute Python code on top of several distributed computing frameworks, including Dask.
Note
You can install fugue with pip:
pip install fugue[dask]
If executing on a distributed Dask cluster, ensure that the nixtla library is installed across all the workers.
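One way to confirm this requirement is to check which packages can be imported in a given Python process. The sketch below uses only the standard library; the helper name `missing_packages` is hypothetical, and the commented-out lines assume a reachable Dask scheduler at a placeholder address.

```python
import importlib.util


def missing_packages(names=("nixtla", "fugue", "dask")):
    """Return the top-level packages in `names` that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Check the local environment first.
print("Missing locally:", missing_packages())

# Assumption: on a distributed cluster, the same function can be sent to every
# worker with dask.distributed's Client.run, which returns one result per worker.
# from dask.distributed import Client
# client = Client("tcp://<scheduler-address>:8786")  # placeholder address
# print(client.run(missing_packages))
```

An empty list means every listed package is importable in that process.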
2. Load Data
You can load your data as a pandas DataFrame. In this tutorial, we will use a dataset that contains hourly electricity prices from different markets.
import pandas as pd

df = pd.read_csv(
    'https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short.csv',
    parse_dates=['ds'],
)
df.head()
| | unique_id | ds | y |
|---|---|---|---|
| 0 | BE | 2016-10-22 00:00:00 | 70.00 |
| 1 | BE | 2016-10-22 01:00:00 | 37.10 |
| 2 | BE | 2016-10-22 02:00:00 | 37.10 |
| 3 | BE | 2016-10-22 03:00:00 | 44.75 |
| 4 | BE | 2016-10-22 04:00:00 | 37.10 |
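The data is in long format: one row per series identifier (`unique_id`), timestamp (`ds`), and value (`y`). As a minimal sketch, a quick sanity check of that layout could look like the following. The helper `check_long_format` is hypothetical and mirrors only the layout used in this tutorial; it is not part of the nixtla library.

```python
import pandas as pd

REQUIRED_COLUMNS = ("unique_id", "ds", "y")


def check_long_format(df):
    """Raise ValueError if `df` does not follow the layout used in this tutorial."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    if not pd.api.types.is_datetime64_any_dtype(df["ds"]):
        raise ValueError("column 'ds' should be parsed as datetimes")
    if not pd.api.types.is_numeric_dtype(df["y"]):
        raise ValueError("column 'y' should be numeric")
    return df


# Tiny hand-made frame in the same layout (values copied from the first rows
# of the electricity dataset).
sample = pd.DataFrame(
    {
        "unique_id": ["BE", "BE"],
        "ds": pd.to_datetime(["2016-10-22 00:00:00", "2016-10-22 01:00:00"]),
        "y": [70.00, 37.10],
    }
)
check_long_format(sample)

# Summarize the time span and row count of each series.
print(sample.groupby("unique_id")["ds"].agg(["min", "max", "count"]))
```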
3. Import Dask
Import Dask and convert the pandas DataFrame to a Dask DataFrame.
import dask.dataframe as dd
dask_df = dd.from_pandas(df, npartitions=2)
dask_df
| | unique_id | ds | y |
|---|---|---|---|
| npartitions=2 | | | |
| 0 | string | string | float64 |
| 4200 | ... | ... | ... |
| 8399 | ... | ... | ... |
4. Use TimeGPT on Dask
Using TimeGPT on top of Dask is almost identical to the non-distributed case. The only difference is that you need to use a Dask DataFrame, which we already defined in the previous step.
First, instantiate the NixtlaClient class.
from nixtla import NixtlaClient

nixtla_client = NixtlaClient(
    # defaults to os.environ.get("NIXTLA_API_KEY")
    api_key='my_api_key_provided_by_nixtla'
)
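Because the key defaults to the `NIXTLA_API_KEY` environment variable, it does not have to be hard-coded in the script. The following is a minimal sketch of resolving it explicitly so that a missing key fails early with a clear message. The helpers `get_api_key` and `make_client` are hypothetical names, not part of the nixtla library.

```python
import os


def get_api_key(env_var="NIXTLA_API_KEY"):
    """Return the API key stored in `env_var`, or raise if it is unset or empty."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key


def make_client():
    """Build a NixtlaClient with an explicitly resolved key."""
    # Imported lazily so get_api_key() can be used without nixtla installed.
    from nixtla import NixtlaClient

    return NixtlaClient(api_key=get_api_key())
```

With the variable exported in the shell, `nixtla_client = make_client()` would replace the hard-coded key.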
Use an Azure AI endpoint
To use an Azure AI endpoint, set the base_url argument:
nixtla_client = NixtlaClient(base_url="your azure ai endpoint", api_key="your api_key")
Then use any method from the NixtlaClient class, such as forecast or cross_validation.
fcst_df = nixtla_client.forecast(dask_df, h=12)
fcst_df.compute().head()
| | unique_id | ds | TimeGPT |
|---|---|---|---|
| 0 | BE | 2016-12-31 00:00:00 | 45.190453 |
| 1 | BE | 2016-12-31 01:00:00 | 43.244446 |
| 2 | BE | 2016-12-31 02:00:00 | 41.958389 |
| 3 | BE | 2016-12-31 03:00:00 | 39.796486 |
| 4 | BE | 2016-12-31 04:00:00 | 39.204533 |
Available models in Azure AI
If you are using an Azure AI endpoint, be sure to set model="azureai":
nixtla_client.forecast(..., model="azureai")
For the public API, we support two models: timegpt-1 and timegpt-1-long-horizon. By default, timegpt-1 is used. Please see this tutorial on how and when to use timegpt-1-long-horizon.
cv_df = nixtla_client.cross_validation(dask_df, h=12, n_windows=5, step_size=2)
cv_df.compute().head()
| | unique_id | ds | cutoff | TimeGPT |
|---|---|---|---|---|
| 0 | BE | 2016-12-30 04:00:00 | 2016-12-30 03:00:00 | 39.375439 |
| 1 | BE | 2016-12-30 05:00:00 | 2016-12-30 03:00:00 | 40.039215 |
| 2 | BE | 2016-12-30 06:00:00 | 2016-12-30 03:00:00 | 43.455849 |
| 3 | BE | 2016-12-30 07:00:00 | 2016-12-30 03:00:00 | 47.716408 |
| 4 | BE | 2016-12-30 08:00:00 | 2016-12-30 03:00:00 | 50.31665 |
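The cutoff column follows from h, n_windows, and step_size. The sketch below reconstructs the cutoffs under two assumptions: the usual rolling-origin scheme, in which the last window's forecasts end at the final observation, and a last BE observation at 2016-12-30 23:00:00 (inferred from the forecast output, which starts at 2016-12-31 00:00:00). The helper `expected_cutoffs` is hypothetical and is not part of the nixtla library.

```python
import pandas as pd


def expected_cutoffs(last_timestamp, h, n_windows, step_size,
                     period=pd.Timedelta(hours=1)):
    """List cutoff timestamps for a rolling-origin cross-validation.

    Assumes the last window ends at `last_timestamp` and each earlier
    window starts `step_size` periods before the next one.
    """
    last = pd.Timestamp(last_timestamp)
    return [
        last - (h + (n_windows - 1 - i) * step_size) * period
        for i in range(n_windows)
    ]


cutoffs = expected_cutoffs("2016-12-30 23:00:00", h=12, n_windows=5, step_size=2)
for cutoff in cutoffs:
    print(cutoff)
# The first cutoff is 2016-12-30 03:00:00, matching the cutoff column above.
```

Under these assumptions, the five windows have cutoffs two hours apart, from 03:00 to 11:00 on 2016-12-30, and each window forecasts the 12 hours after its cutoff.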
You can also use exogenous variables with TimeGPT on top of Dask. To do this, please refer to the Exogenous Variables tutorial. Just keep in mind that you need to use a Dask DataFrame instead of a pandas DataFrame.