Vertex AI Experiment Tracker Integration #3260
Open
nkhusainov wants to merge 21 commits into zenml-io:develop from nkhusainov:feature/vertexai-experiment-tracker
Changes from 18 commits
0c3482e
add vertexai experiment tracker flavor
nkhusainov 945c8c6
add tests for vertexai experiment tracker
nkhusainov 71e3d50
add optional configuration parameters
nkhusainov 9e6b273
update type hint for tensorboard resource name method
nkhusainov 7f0b381
update default value of experiment_tensorboard to None
nkhusainov 18a4613
add validation for GCP location in VertexExperimentTrackerConfig
nkhusainov 50214aa
change experiment run_url with dashboard_url
nkhusainov 482f36e
add tests for formatting and validation of experiment and run names i…
nkhusainov c75e73f
update VertexExperimentTrackerConfig to use SecretField for sensitive…
nkhusainov 2422fa7
add service connector requirements to VertexExperimentTrackerFlavor
nkhusainov a8e3d84
fix: add missing newline at end of file in VertexExperimentTrackerFlavor
nkhusainov 8101977
update logo URL in VertexExperimentTrackerFlavor
nkhusainov e3928ae
refactor: update default location format and remove service_account a…
nkhusainov e8c0117
docs: add documentation for Vertex AI Experiment Tracker integration
nkhusainov d574b82
fix incorrect flavor type
nkhusainov 0806057
delete an extra line
nkhusainov 4673b36
improve code formatting and readability in VertexExperimentTracker an…
nkhusainov dd03e92
docs: enhance Vertex AI Experiment Tracker documentation with usage e…
nkhusainov 0190b31
set experiment and run names at prepare_step_run
nkhusainov 378fae8
update Vertex AI Experiment Tracker guide to use dynamic experiment a…
nkhusainov 0e69570
Merge branch 'develop' into feature/vertexai-experiment-tracker
htahir1
308 changes: 308 additions & 0 deletions
docs/book/component-guide/experiment-trackers/vertexai.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,308 @@ | ||
--- | ||
description: Logging and visualizing experiments with Vertex AI Experiment Tracker. | ||
--- | ||
|
||
# Vertex AI Experiment Tracker | ||
|
||
The Vertex AI Experiment Tracker is an [Experiment Tracker](./experiment-trackers.md) flavor provided with the GCP ZenML integration. It uses the [Vertex AI tracking service](https://cloud.google.com/vertex-ai/docs/experiments/intro-vertex-ai-experiments) to log and visualize information from your pipeline steps (e.g., models, parameters, metrics). | ||
|
||
## When would you want to use it? | ||
|
||
[Vertex AI Experiment Tracker](https://cloud.google.com/vertex-ai/docs/experiments/intro-vertex-ai-experiments) is a managed service by Google Cloud that you would normally use in the iterative ML experimentation phase to track and visualize experiment results. That doesn't mean that it cannot be repurposed to track and visualize the results produced by your automated pipeline runs, as you make the transition toward a more production-oriented workflow. | ||
|
||
You should use the Vertex AI Experiment Tracker: | ||
|
||
* if you have already been using Vertex AI to track experiment results for your project and would like to continue doing so as you incorporate MLOps workflows and best practices in your project through ZenML. | ||
* if you are looking for a more visually interactive way of navigating the results produced from your ZenML pipeline runs (e.g., models, metrics, datasets). | ||
* if you are building machine learning workflows in the Google Cloud ecosystem and want a managed experiment tracking solution tightly integrated with other Google Cloud services. | ||
|
||
You should consider one of the other [Experiment Tracker flavors](./experiment-trackers.md#experiment-tracker-flavors) if you have never worked with Vertex AI before and would rather use another experiment tracking tool that you are more familiar with, or if you are not using GCP or are using another cloud provider. | ||
|
||
## How do you configure it? | ||
|
||
The Vertex AI Experiment Tracker flavor is provided by the GCP ZenML integration. You need to install it on your local machine to be able to register a Vertex AI Experiment Tracker and add it to your stack: | ||
|
||
```shell | ||
zenml integration install gcp -y | ||
``` | ||
|
||
### Configuration Options | ||
|
||
To properly register the Vertex AI Experiment Tracker, you can provide several configuration options tailored to your needs. Here are the main configurations you may want to set: | ||
|
||
* `project`: Optional. GCP project name. If `None`, it will be inferred from the environment. | ||
* `location`: Optional. GCP location where your experiments will be created. If not set, defaults to `us-central1`. | ||
* `staging_bucket`: Optional. The default staging bucket to use to stage artifacts, in the form `gs://...`. | ||
* `service_account_path`: Optional. A path to the service account credential json file to be used to interact with Vertex AI Experiment Tracker. Please check the [Authentication Methods](vertexai.md#authentication-methods) chapter for more details. | ||
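
As a rough illustration of how a `location` value could be checked before it is used, here is a minimal sketch of a region-format validator. The regular expression and the helper name `validate_location` are assumptions made for this example; they are not the integration's actual validation logic.

```python
import re

# Assumed pattern for GCP region names such as "us-central1" or "europe-west4".
_REGION_PATTERN = re.compile(r"^[a-z]+-[a-z]+\d+$")

# Default documented in the configuration options above.
DEFAULT_LOCATION = "us-central1"


def validate_location(location=None):
    """Return a region string, falling back to the documented default.

    Hypothetical helper: raises ValueError if the value does not look
    like a GCP region name.
    """
    if location is None:
        return DEFAULT_LOCATION
    if not _REGION_PATTERN.match(location):
        raise ValueError(f"'{location}' does not look like a GCP region")
    return location
```

Failing early on a malformed region gives a clearer error than letting the request reach the Vertex AI API.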
|
||
With the project, location and staging_bucket, registering the Vertex AI Experiment Tracker can be done as follows: | ||
|
||
```shell | ||
# Register the Vertex AI Experiment Tracker | ||
zenml experiment-tracker register vertex_experiment_tracker \ | ||
--flavor=vertex \ | ||
--project=<GCP_PROJECT_ID> \ | ||
--location=<GCP_LOCATION> \ | ||
--staging_bucket=gs://<GCS_BUCKET-NAME> | ||
|
||
# Register and set a stack with the new experiment tracker | ||
zenml stack register custom_stack -e vertex_experiment_tracker ... --set | ||
``` | ||
|
||
### Authentication Methods | ||
|
||
Integrating and using a Vertex AI Experiment Tracker in your pipelines is not possible without employing some form of authentication. If you're looking for a quick way to get started locally, you can use the _Implicit Authentication_ method. However, the recommended way to authenticate to the Google Cloud Platform is through a [GCP Service Connector](../../how-to/infrastructure-deployment/auth-management/gcp-service-connector.md). This is particularly useful if you are configuring ZenML stacks that combine the Vertex AI Experiment Tracker with other remote stack components also running in GCP. | ||
|
||
> **Note**: Regardless of your chosen authentication method, you must grant your account the necessary roles to use Vertex AI Experiment Tracking. | ||
> * `roles/aiplatform.user` role on your project, which allows you to create, manage, and track your experiments within Vertex AI. | ||
> * `roles/storage.objectAdmin` role on your GCS bucket, granting the ability to read and write experiment artifacts, such as models and datasets, to the storage bucket. | ||
|
||
{% tabs %} | ||
{% tab title="Implicit Authentication" %} | ||
This configuration method assumes that you have authenticated locally to GCP using the [`gcloud` CLI](https://cloud.google.com/sdk/gcloud) (e.g., by running `gcloud auth login`). | ||
|
||
> **Note**: This method is quick for local setups but is unsuitable for team collaborations or production environments due to its lack of portability. | ||
|
||
We can then register the experiment tracker as follows: | ||
|
||
```shell | ||
# Register the Vertex AI Experiment Tracker | ||
zenml experiment-tracker register <EXPERIMENT_TRACKER_NAME> \ | ||
--flavor=vertex \ | ||
--project=<GCP_PROJECT_ID> \ | ||
--location=<GCP_LOCATION> \ | ||
--staging_bucket=gs://<GCS_BUCKET-NAME> | ||
|
||
# Register and set a stack with the new experiment tracker | ||
zenml stack register custom_stack -e <EXPERIMENT_TRACKER_NAME> ... --set | ||
``` | ||
|
||
{% endtab %} | ||
|
||
{% tab title="GCP Service Connector (recommended)" %} | ||
To set up the Vertex AI Experiment Tracker to authenticate to GCP, it is recommended to leverage the many features provided by the [GCP Service Connector](../../how-to/infrastructure-deployment/auth-management/gcp-service-connector.md) such as auto-configuration, best security practices regarding long-lived credentials and reusing the same credentials across multiple stack components. | ||
|
||
If you don't already have a GCP Service Connector configured in your ZenML deployment, you can register one using the interactive CLI command. You have the option to configure a GCP Service Connector that can be used to access more than one type of GCP resource: | ||
|
||
```sh | ||
# Register a GCP Service Connector interactively | ||
zenml service-connector register --type gcp -i | ||
``` | ||
|
||
After having set up or decided on a GCP Service Connector to use, you can register the Vertex AI Experiment Tracker as follows: | ||
|
||
```shell | ||
# Register the Vertex AI Experiment Tracker | ||
zenml experiment-tracker register <EXPERIMENT_TRACKER_NAME> \ | ||
--flavor=vertex \ | ||
--project=<GCP_PROJECT_ID> \ | ||
--location=<GCP_LOCATION> \ | ||
--staging_bucket=gs://<GCS_BUCKET-NAME> | ||
|
||
zenml experiment-tracker connect <EXPERIMENT_TRACKER_NAME> --connector <CONNECTOR_NAME> | ||
|
||
# Register and set a stack with the new experiment tracker | ||
zenml stack register custom_stack -e <EXPERIMENT_TRACKER_NAME> ... --set | ||
``` | ||
|
||
{% endtab %} | ||
|
||
{% tab title="GCP Credentials" %} | ||
When you register the Vertex AI Experiment Tracker, you can [generate a GCP Service Account Key](https://cloud.google.com/docs/authentication/application-default-credentials#attached-sa), store it in a [ZenML Secret](../../getting-started/deploying-zenml/secret-management.md) and then reference it in the Experiment Tracker configuration. | ||
|
||
This method has some advantages over the implicit authentication method: | ||
|
||
* you don't need to install and configure the GCP CLI on your host | ||
* you don't need to care about enabling your other stack components (orchestrators, step operators and model deployers) to have access to the experiment tracker through GCP Service Accounts and Workload Identity | ||
* you can combine the Vertex AI Experiment Tracker with other stack components that are not running in GCP | ||
|
||
For this method, you need to [create a user-managed GCP service account](https://cloud.google.com/iam/docs/service-accounts-create) and then [create a service account key](https://cloud.google.com/iam/docs/keys-create-delete#creating). | ||
|
||
With the service account key downloaded to a local file, you can register a ZenML secret and reference it in the Vertex AI Experiment Tracker configuration as follows: | ||
|
||
```shell | ||
# Register the Vertex AI Experiment Tracker and reference the ZenML secret | ||
zenml experiment-tracker register <EXPERIMENT_TRACKER_NAME> \ | ||
--flavor=vertex \ | ||
--project=<GCP_PROJECT_ID> \ | ||
--location=<GCP_LOCATION> \ | ||
--staging_bucket=gs://<GCS_BUCKET-NAME> \ | ||
--service_account_path=path/to/service_account_key.json | ||
|
||
# Register and set a stack with the new experiment tracker | ||
zenml stack register custom_stack -e <EXPERIMENT_TRACKER_NAME> ... --set | ||
``` | ||
|
||
{% endtab %} | ||
{% endtabs %} | ||
|
||
## How do you use it? | ||
|
||
To be able to log information from a ZenML pipeline step using the Vertex AI Experiment Tracker component in the active stack, you need to enable the experiment tracker using the `@step` decorator. Then use Vertex AI's logging or auto-logging capabilities as you normally would. | ||
|
||
Here are two examples demonstrating how to use the experiment tracker: | ||
|
||
### Example 1: Logging Metrics Using Built-in Methods | ||
|
||
This example demonstrates how to log time-series metrics using `aiplatform.log_time_series_metrics` from within a Keras callback, as well as using `aiplatform.log_metrics` to log specific metrics and `aiplatform.log_params` to log experiment parameters. The logged metrics can then be visualised in the UI of Vertex AI Experiment Tracker and integrated TensorBoard instance. | ||
|
||
```python | ||
import numpy as np | ||
import tensorflow as tf | ||
from google.cloud import aiplatform | ||
from zenml import step | ||
|
||
|
||
class VertexAICallback(tf.keras.callbacks.Callback): | ||
    def on_epoch_end(self, epoch, logs=None): | ||
        # Forward only scalar numeric metrics as time-series data for this epoch | ||
        logs = logs or {} | ||
        metrics = {key: value for key, value in logs.items() if isinstance(value, (int, float))} | ||
        aiplatform.log_time_series_metrics(metrics=metrics, step=epoch) | ||
|
||
|
||
|
||
# `TrainerConfig` is a user-defined configuration class (not shown here) | ||
@step(experiment_tracker="<VERTEXAI_TRACKER_STACK_COMPONENT_NAME>") | ||
def train_model( | ||
    config: TrainerConfig, | ||
    x_train: np.ndarray, | ||
    y_train: np.ndarray, | ||
    x_val: np.ndarray, | ||
    y_val: np.ndarray, | ||
): | ||
    aiplatform.autolog() | ||
|
||
|
||
    ... | ||
|
||
|
||
    # Train the model, using the custom callback to log metrics into experiment tracker | ||
    model.fit( | ||
        x_train, | ||
        y_train, | ||
        validation_data=(x_val, y_val), | ||
        epochs=config.epochs, | ||
        batch_size=config.batch_size, | ||
        callbacks=[VertexAICallback()] | ||
    ) | ||
|
||
|
||
    ... | ||
|
||
|
||
    # Log specific metrics and parameters | ||
    aiplatform.log_metrics(...) | ||
    aiplatform.log_params(...) | ||
``` | ||
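
To make the filtering step in the callback above easier to follow, here is a minimal standalone sketch of the same dictionary comprehension, written so it can run without TensorFlow or Vertex AI. The helper name `numeric_metrics` and the sample `epoch_logs` values are made up for illustration.

```python
def numeric_metrics(logs):
    """Keep only int/float values, mirroring the filter used in the callback."""
    logs = logs or {}
    return {key: value for key, value in logs.items() if isinstance(value, (int, float))}


# Hypothetical logs dictionary, similar in shape to what Keras passes to on_epoch_end
epoch_logs = {"loss": 0.42, "accuracy": 0.9, "note": "warmup", "lr": None}

print(numeric_metrics(epoch_logs))  # {'loss': 0.42, 'accuracy': 0.9}
```

Note that `bool` is a subclass of `int` in Python, so boolean values would also pass this filter.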
|
||
### Example 2: Uploading TensorBoard Logs | ||
|
||
This example demonstrates how to use an integrated TensorBoard instance to directly upload training logs. This is particularly useful if you're already using TensorBoard in your projects and want to benefit from its detailed visualizations during training. You can initiate the upload using `aiplatform.start_upload_tb_log` and conclude it with `aiplatform.end_upload_tb_log`. Similar to the first example, you can also log specific metrics and parameters directly. | ||
|
||
```python | ||
import numpy as np | ||
import tensorflow as tf | ||
from google.cloud import aiplatform | ||
from zenml import step | ||
|
||
|
||
|
||
# `TrainerConfig` is a user-defined configuration class (not shown here) | ||
@step(experiment_tracker="<VERTEXAI_TRACKER_STACK_COMPONENT_NAME>") | ||
def train_model( | ||
    config: TrainerConfig, | ||
    gcs_path: str, | ||
    x_train: np.ndarray, | ||
    y_train: np.ndarray, | ||
    x_val: np.ndarray, | ||
    y_val: np.ndarray, | ||
): | ||
    aiplatform.autolog() | ||
|
||
|
||
    ... | ||
|
||
|
||
    experiment_name = ... | ||
    experiment_run_name = ... | ||
|
||
|
||
    # define a TensorBoard callback, logs are written to gcs_path. | ||
    tensorboard_callback = tf.keras.callbacks.TensorBoard( | ||
        log_dir=gcs_path, | ||
        histogram_freq=1 | ||
    ) | ||
    # start the TensorBoard log upload | ||
    aiplatform.start_upload_tb_log( | ||
        tensorboard_experiment_name=experiment_name, | ||
        logdir=gcs_path, | ||
        run_name_prefix=f"{experiment_run_name}_", | ||
    ) | ||
    # pass the TensorBoard callback so that logs are actually written during training | ||
    model.fit( | ||
        x_train, | ||
        y_train, | ||
        validation_data=(x_val, y_val), | ||
        epochs=config.epochs, | ||
        batch_size=config.batch_size, | ||
        callbacks=[tensorboard_callback], | ||
    ) | ||
|
||
|
||
    ... | ||
|
||
|
||
    # end the TensorBoard log upload | ||
    aiplatform.end_upload_tb_log() | ||
|
||
|
||
    aiplatform.log_metrics(...) | ||
    aiplatform.log_params(...) | ||
``` | ||
|
||
|
||
{% hint style="info" %} | ||
Instead of hardcoding an experiment tracker name, you can also use the [Client](../../reference/python-client.md) to dynamically use the experiment tracker of your active stack: | ||
|
||
```python | ||
from zenml import step | ||
from zenml.client import Client | ||
|
||
|
||
experiment_tracker = Client().active_stack.experiment_tracker | ||
|
||
|
||
@step(experiment_tracker=experiment_tracker.name) | ||
def tf_trainer(...): | ||
    ... | ||
``` | ||
|
||
{% endhint %} | ||
|
||
### Experiment Tracker UI | ||
|
||
You can find the URL of the Vertex AI experiment linked to a specific ZenML run via the metadata of the step in which the experiment tracker was used: | ||
|
||
```python | ||
from zenml.client import Client | ||
|
||
client = Client() | ||
last_run = client.get_pipeline("<PIPELINE_NAME>").last_run | ||
trainer_step = last_run.steps.get("<STEP_NAME>") | ||
tracking_url = trainer_step.run_metadata["experiment_tracker_url"].value | ||
print(tracking_url) | ||
``` | ||
|
||
This will be the URL of the corresponding experiment in Vertex AI Experiment Tracker. | ||
|
||
Below are examples of the UI for the Vertex AI Experiment Tracker and the integrated TensorBoard instance. | ||
|
||
**Vertex AI Experiment Tracker UI** | ||
![Vertex AI UI](../../.gitbook/assets/vertexai_experiment_tracker_ui.png) | ||
|
||
**TensorBoard UI** | ||
![TensorBoard UI](../../.gitbook/assets/vertexai_experiment_tracker_tb.png) | ||
|
||
### Additional configuration | ||
|
||
For additional configuration of the Vertex AI Experiment Tracker, you can pass `VertexExperimentTrackerSettings` to specify an experiment name or choose a previously created TensorBoard instance. | ||
|
||
> **Note**: By default, Vertex AI will use the default TensorBoard instance in your project if you don't explicitly specify one. | ||
|
||
```python | ||
import numpy as np | ||
from zenml import step | ||
from zenml.integrations.gcp.flavors.vertex_experiment_tracker_flavor import VertexExperimentTrackerSettings | ||
|
||
|
||
|
||
vertexai_settings = VertexExperimentTrackerSettings( | ||
    experiment="<YOUR_EXPERIMENT_NAME>", | ||
    experiment_tensorboard="<TENSORBOARD_RESOURCE_NAME>" | ||
) | ||
|
||
|
||
@step( | ||
    experiment_tracker="<VERTEXAI_TRACKER_STACK_COMPONENT_NAME>", | ||
    settings={"experiment_tracker": vertexai_settings}, | ||
) | ||
def step_one( | ||
    data: np.ndarray, | ||
) -> np.ndarray: | ||
    ... | ||
``` | ||
|
||
Check out [this docs page](../../how-to/pipeline-development/use-configuration-files/runtime-configuration.md) for more information on how to specify settings. | ||
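
Vertex AI restricts which characters an experiment name may contain. The sketch below assumes the pattern `[a-z0-9][a-z0-9-]{0,127}` (lowercase letters, digits and hyphens, at most 128 characters) and shows one way to normalize an arbitrary string before passing it as `experiment`. The helper `format_experiment_name` is hypothetical and is not part of ZenML or the Vertex AI SDK; please check the Vertex AI documentation for the authoritative naming rules.

```python
import re

# Assumed naming rule for Vertex AI experiment names (see the note above).
_VALID_NAME = re.compile(r"^[a-z0-9][a-z0-9-]{0,127}$")


def format_experiment_name(name):
    """Normalize a free-form string into the assumed experiment name format."""
    name = name.lower()
    # Replace every run of unsupported characters with a single hyphen.
    name = re.sub(r"[^a-z0-9-]+", "-", name)
    # The first character must be a letter or digit; cap the length at 128.
    name = name.lstrip("-")[:128]
    if not _VALID_NAME.match(name):
        raise ValueError("cannot derive a valid experiment name from the input")
    return name


print(format_experiment_name("My_Training Pipeline"))  # my-training-pipeline
```

The result could then be passed as `experiment=format_experiment_name("<YOUR_EXPERIMENT_NAME>")` in the settings shown above.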
|
||
<figure><img src="https://static.scarf.sh/a.png?x-pxid=f0b4f458-0a54-4fcd-aa95-d5ee424815bc" alt="ZenML Scarf"><figcaption></figcaption></figure> |
18 changes: 18 additions & 0 deletions
src/zenml/integrations/gcp/experiment_trackers/__init__.py
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at: | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express | ||
# or implied. See the License for the specific language governing | ||
# permissions and limitations under the License. | ||
"""Initialization for the VertexAI experiment tracker.""" | ||
|
||
from zenml.integrations.gcp.experiment_trackers.vertex_experiment_tracker import (  # noqa | ||
    VertexExperimentTracker, | ||
) | ||
|
||
__all__ = ["VertexExperimentTracker"] |
This is great - but just as a small wish, I'd love to see a screenshot of what it looks like on Vertex, just to show it to users.
Also, there is nothing about TensorBoard, and I know that it is a parameter in the implementation. Should that maybe stand out a bit as a special case?
Thanks for the feedback, good suggestions! I'll add Tensorboard examples and several screenshots
I've added 2 examples. The `start_upload_tb_log` method requires experiment and run names, which are set at runtime.
Question: What is the best way to access these variables? One option is to set them as object attributes during `prepare_step_run`, but this approach feels suboptimal. Are there cleaner alternatives?
Hmm good questions. I'm also not sure - probably the best way right now is to set them in the experiment tracker class and then fetch them using client.active_stack.experiment_tracker.XYZ...
In the case of credentials, one could leverage service connectors, but I think it's a safe assumption that the Vertex execution role has the required credentials, so I'd leave that be for now.
Done
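
For readers following the thread, the approach discussed above (setting the names on the tracker during `prepare_step_run` and reading them back from the active stack's experiment tracker inside the step) could look roughly like the sketch below. `SketchTracker`, its attribute names, and the simplified `prepare_step_run` signature are all hypothetical stand-ins so that the example runs without ZenML; the real component and its hook signature will differ.

```python
class SketchTracker:
    """Hypothetical stand-in for an experiment tracker stack component."""

    def __init__(self):
        self.experiment_name = None
        self.run_name = None

    def prepare_step_run(self, pipeline_name, step_name, run_id):
        # Resolve the names once per step run and keep them on the instance,
        # so that step code can read them later.
        self.experiment_name = pipeline_name.replace("_", "-")
        self.run_name = f"{step_name}-{run_id}".replace("_", "-")


def train_step(tracker):
    # In a real step, `tracker` would be obtained from the active stack,
    # e.g. Client().active_stack.experiment_tracker.
    # The returned values correspond to the arguments passed to
    # aiplatform.start_upload_tb_log in the docs example.
    return tracker.experiment_name, f"{tracker.run_name}_"


tracker = SketchTracker()
tracker.prepare_step_run("training_pipeline", "train_model", "42")
print(train_step(tracker))  # ('training-pipeline', 'train-model-42_')
```

Keeping the names on the component avoids recomputing them in every step, at the cost of the component carrying per-run state.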