Ray
Run TimeGPT in a distributed manner on top of Ray
Ray is an open-source unified compute framework for scaling Python workloads. This guide explains how to use TimeGPT on top of Ray.
1. Installation
Install Ray through Fugue. Fugue provides an easy-to-use interface for distributed computing that lets users execute Python code on top of several distributed computing frameworks, including Ray.
Note
You can install fugue with pip: pip install fugue[ray]
If executing on a distributed Ray cluster, ensure that the nixtla library is installed across all the workers.
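One way to confirm that an environment has the packages this guide relies on is a small import check. The sketch below uses only the standard library; the helper name missing_packages is hypothetical and not part of nixtla, fugue, or Ray. On a cluster it would need to run inside each worker environment (for example, wrapped in a Ray remote task), which is an assumption about your setup rather than something this snippet does.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level packages in `names` that cannot be imported here."""
    return [name for name in names if find_spec(name) is None]


# Packages used in this guide; an empty list means all of them are importable
# in the current environment.
print(missing_packages(["nixtla", "fugue", "ray"]))
```

An empty list only describes the environment where the check ran, so it says nothing about other nodes unless the check is executed there as well.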
2. Load Data
You can load your data as a pandas DataFrame. In this tutorial, we will use a dataset that contains hourly electricity prices from different markets.
import pandas as pd
df = pd.read_csv(
'https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short.csv',
parse_dates=['ds'],
)
df.head()
|   | unique_id | ds | y |
|---|---|---|---|
| 0 | BE | 2016-10-22 00:00:00 | 70.00 |
| 1 | BE | 2016-10-22 01:00:00 | 37.10 |
| 2 | BE | 2016-10-22 02:00:00 | 37.10 |
| 3 | BE | 2016-10-22 03:00:00 | 44.75 |
| 4 | BE | 2016-10-22 04:00:00 | 37.10 |
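The dataset above is in long format, with a series identifier (unique_id), a timestamp (ds), and a target value (y). A quick check that these columns are present before sending data to the cluster can catch naming mistakes early. This is a minimal sketch; the helper missing_columns and the REQUIRED_COLUMNS constant are hypothetical names introduced here, and the column list simply mirrors the table above.

```python
# Column names taken from the example dataset shown above.
REQUIRED_COLUMNS = ("unique_id", "ds", "y")


def missing_columns(columns, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in required if name not in present]


print(missing_columns(["unique_id", "ds", "y"]))  # prints []
print(missing_columns(["ds", "value"]))           # prints ['unique_id', 'y']
```

With the DataFrame loaded above, the same check could be called as missing_columns(df.columns).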
3. Initialize Ray
Initialize Ray and convert the pandas DataFrame to a Ray DataFrame (a ray.data.Dataset).
import ray
from ray.cluster_utils import Cluster
ray_cluster = Cluster(
initialize_head=True,
head_node_args={"num_cpus": 2}
)
ray.init(address=ray_cluster.address, ignore_reinit_error=True)
2024-05-10 11:09:17,240 WARNING cluster_utils.py:157 -- Ray cluster mode is currently experimental and untested on Windows. If you are using it and running into issues please file a report at https://github.com/ray-project/ray/issues.
2024-05-10 11:09:19,076 INFO worker.py:1564 -- Connecting to existing Ray cluster at address: 127.0.0.1:63694...
2024-05-10 11:09:19,092 INFO worker.py:1740 -- Connected to Ray cluster. View the dashboard at 127.0.0.1:8265
ray_df = ray.data.from_pandas(df)
ray_df
MaterializedDataset(
num_blocks=1,
num_rows=8400,
schema={unique_id: object, ds: object, y: float64}
)
4. Use TimeGPT on Ray
Using TimeGPT on top of Ray is almost identical to the non-distributed case. The only difference is that you need to use a Ray DataFrame.
First, instantiate the NixtlaClient class.
from nixtla import NixtlaClient
nixtla_client = NixtlaClient(
# defaults to os.environ.get("NIXTLA_API_KEY")
api_key = 'my_api_key_provided_by_nixtla'
)
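As the comment in the snippet above notes, the client falls back to the NIXTLA_API_KEY environment variable. If you prefer to fail early with a clear message instead of hard-coding the key, a small helper like the following could be used. This is only a sketch: load_api_key is a hypothetical name, not part of the nixtla library.

```python
import os


def load_api_key(env_var: str = "NIXTLA_API_KEY") -> str:
    """Return the API key stored in `env_var`, raising if it is unset or empty."""
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before creating the client."
        )
    return api_key


# Hypothetical usage with the client shown above:
# nixtla_client = NixtlaClient(api_key=load_api_key())
```

Keeping the key in the environment also avoids committing it to notebooks or scripts.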
Use an Azure AI endpoint
To use an Azure AI endpoint, set the base_url argument:
nixtla_client = NixtlaClient(base_url="your azure ai endpoint", api_key="your api_key")
Then use any method from the NixtlaClient class, such as forecast or cross_validation.
%%capture
fcst_df = nixtla_client.forecast(ray_df, h=12)
Available models in Azure AI
If you are using an Azure AI endpoint, be sure to set model="azureai":
nixtla_client.forecast(..., model="azureai")
For the public API, two models are supported: timegpt-1 and timegpt-1-long-horizon. By default, timegpt-1 is used. Please see this tutorial on how and when to use timegpt-1-long-horizon.
To visualize the result, use the to_pandas method to convert the Ray output to a pandas DataFrame.
fcst_df.to_pandas().tail()
|   | unique_id | ds | TimeGPT |
|---|---|---|---|
| 55 | NP | 2018-12-24 07:00:00 | 55.387066 |
| 56 | NP | 2018-12-24 08:00:00 | 56.115517 |
| 57 | NP | 2018-12-24 09:00:00 | 56.090714 |
| 58 | NP | 2018-12-24 10:00:00 | 55.813717 |
| 59 | NP | 2018-12-24 11:00:00 | 55.528519 |
%%capture
cv_df = nixtla_client.cross_validation(ray_df, h=12, freq='H', n_windows=5, step_size=2)
cv_df.to_pandas().tail()
|   | unique_id | ds | cutoff | TimeGPT |
|---|---|---|---|---|
| 295 | NP | 2018-12-23 19:00:00 | 2018-12-23 11:00:00 | 53.632019 |
| 296 | NP | 2018-12-23 20:00:00 | 2018-12-23 11:00:00 | 52.512775 |
| 297 | NP | 2018-12-23 21:00:00 | 2018-12-23 11:00:00 | 51.894035 |
| 298 | NP | 2018-12-23 22:00:00 | 2018-12-23 11:00:00 | 51.06572 |
| 299 | NP | 2018-12-23 23:00:00 | 2018-12-23 11:00:00 | 50.32592 |
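The cutoff column in the cross-validation output marks where each validation window begins: the forecasts for a window cover the h timestamps after its cutoff. As a rough sketch, assuming the common rolling-origin layout in which the final window ends at the last observation and each earlier window is shifted back by step_size periods, the cutoffs for the call above (h=12, n_windows=5, step_size=2, hourly data) can be reproduced with the standard library. The helper cv_cutoffs is a hypothetical name and not part of nixtla; the last timestamp is read off the table above.

```python
from datetime import datetime, timedelta


def cv_cutoffs(last_timestamp, h, n_windows, step_size, freq=timedelta(hours=1)):
    """Return window cutoffs in chronological order (assumed rolling-origin layout)."""
    # The final window forecasts the last h periods, so its cutoff sits h periods back.
    last_cutoff = last_timestamp - h * freq
    # Each earlier window is shifted back by step_size periods.
    return [last_cutoff - i * step_size * freq for i in reversed(range(n_windows))]


cutoffs = cv_cutoffs(datetime(2018, 12, 23, 23), h=12, n_windows=5, step_size=2)
for cutoff in cutoffs:
    print(cutoff)
```

Under this assumption the final cutoff is 2018-12-23 11:00:00, which matches the table. With 5 series, 5 windows, and 12 steps per window, the layout yields 300 rows, consistent with the index of the last row (299).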
You can also use exogenous variables with TimeGPT on top of Ray. To do this, please refer to the Exogenous Variables tutorial. Just keep in mind that you need to use a Ray DataFrame instead of a pandas DataFrame.
5. Shut Down Ray
When you are done, shut down the Ray session.
ray.shutdown()