Add docs on common stacks best practices #3092

Merged (9 commits) on Oct 17, 2024
87 changes: 86 additions & 1 deletion docs/book/how-to/popular-integrations/aws-guide.md
@@ -286,6 +286,91 @@ Now that you have a functional AWS stack set up with ZenML, you can explore more
* Explore ZenML's [integrations](../../component-guide/README.md) with other popular tools and frameworks in the machine learning ecosystem.
* Join the [ZenML community](https://zenml.io/slack) to connect with other users, ask questions, and get support.

By leveraging the power of AWS and ZenML, you can streamline your machine learning workflows, improve collaboration, and deploy production-ready pipelines with ease. What follows is a set of best practices for using your AWS stack with ZenML.

## Best Practices for Using an AWS Stack with ZenML

When working with an AWS stack in ZenML, consider the following best practices
to optimize your workflow, enhance security, and improve cost-efficiency. These
are all things you might want to do or amend in your own setup once you have
tried running some pipelines on your AWS stack.

### Use IAM Roles and Least Privilege Principle

Always adhere to the principle of least privilege when setting up IAM roles. Only grant the minimum permissions necessary for your ZenML pipelines to function. Regularly review and audit your [IAM roles](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html) to ensure they remain appropriate and secure.
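
For example, you might attach only narrowly-scoped policies to the role your ZenML stack components assume and audit them on a schedule. The role and policy names below are placeholders, not values ZenML requires:

```shell
# Attach only the specific policy a stack component needs
# (role name and policy ARN are placeholders for your own setup)
aws iam attach-role-policy \
    --role-name zenml-pipeline-role \
    --policy-arn arn:aws:iam::123456789012:policy/zenml-artifact-store-access

# Periodically audit what is attached to the role
aws iam list-attached-role-policies --role-name zenml-pipeline-role
```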

### Leverage AWS Resource Tagging

Implement a [consistent tagging strategy](https://aws.amazon.com/solutions/guidance/tagging-on-aws/) for all the AWS resources you use for your pipelines. For example, if you use S3 as an artifact store in your stack, you could tag the bucket as shown below:

```shell
aws s3api put-bucket-tagging --bucket your-bucket-name --tagging 'TagSet=[{Key=Project,Value=ZenML},{Key=Environment,Value=Production}]'
```

These tags help with billing and cost-allocation tracking, and they also make any later cleanup efforts easier.
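
Consistent tags also let you find everything that belongs to a project in a single call, which is handy when tearing a stack down. Assuming the `Project=ZenML` tag used above:

```shell
# List all resources tagged Project=ZenML, e.g. before a cleanup
aws resourcegroupstaggingapi get-resources --tag-filters Key=Project,Values=ZenML
```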

### Implement Cost Management Strategies

Use [AWS Cost Explorer](https://aws.amazon.com/aws-cost-management/aws-cost-explorer/) and [AWS Budgets](https://aws.amazon.com/aws-cost-management/aws-budgets/) to monitor and manage your spending. To create a cost budget:

1. Create a JSON file (e.g., `budget-config.json`) defining the budget:

```json
{
    "BudgetLimit": {
        "Amount": "100",
        "Unit": "USD"
    },
    "BudgetName": "ZenML Monthly Budget",
    "BudgetType": "COST",
    "CostFilters": {
        "TagKeyValue": [
            "user:Project$ZenML"
        ]
    },
    "CostTypes": {
        "IncludeTax": true,
        "IncludeSubscription": true,
        "UseBlended": false
    },
    "TimeUnit": "MONTHLY"
}
```

2. Create the cost budget:

```shell
aws budgets create-budget --account-id your-account-id --budget file://budget-config.json
```

Once your tags are activated as cost allocation tags in the AWS Billing console, you can also group the spend for your ZenML projects under a cost category:

```shell
aws ce create-cost-category-definition --name ZenML-Projects --rule-version CostCategoryExpression.v1 --rules file://rules.json
```
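
ZenML does not prescribe the contents of `rules.json`; as a rough sketch using the Cost Explorer expression syntax and the `Project=ZenML` tag from above, it could look something like this:

```shell
# Hypothetical rules.json: map everything tagged Project=ZenML to a "ZenML"
# cost category value (adjust the tag key and values to your own setup)
cat > rules.json <<'EOF'
[
  {
    "Value": "ZenML",
    "Rule": {
      "Tags": {
        "Key": "Project",
        "Values": ["ZenML"],
        "MatchOptions": ["EQUALS"]
      }
    }
  }
]
EOF
```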

### Use Warm Pools for your SageMaker Pipelines

[Warm Pools in SageMaker](../../component-guide/orchestrators/sagemaker.md#using-warm-pools-for-your-pipelines) can significantly reduce the startup time of your pipeline steps, leading to faster iterations and improved development efficiency. This feature keeps compute instances in a "warm" state, ready to quickly start new jobs.

To enable Warm Pools, use the `SagemakerOrchestratorSettings` class:

```python
from zenml.integrations.aws.flavors.sagemaker_orchestrator_flavor import SagemakerOrchestratorSettings

sagemaker_orchestrator_settings = SagemakerOrchestratorSettings(
    keep_alive_period_in_seconds=300,  # 5 minutes, the default value
)
```

This configuration keeps instances warm for 5 minutes after each job completes, allowing subsequent jobs to start faster if initiated within this timeframe. The reduced startup time can be particularly beneficial for iterative development processes or frequently run pipelines.

### Implement a Robust Backup Strategy

Regularly backup your critical data and configurations. For S3, enable versioning and consider using [cross-region replication](https://docs.aws.amazon.com/AmazonS3/latest/userguide/replication.html) for disaster recovery.
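
For example, to enable versioning on the bucket backing your artifact store (the bucket name is a placeholder):

```shell
# Enable object versioning on the artifact store bucket
aws s3api put-bucket-versioning --bucket your-bucket-name --versioning-configuration Status=Enabled
```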

By following these best practices and implementing the provided examples, you can create a more secure, efficient, and cost-effective AWS stack for your ZenML projects. Remember to regularly review and update your practices as your projects evolve and as AWS introduces new features and services.

<figure><img src="https://static.scarf.sh/a.png?x-pxid=f0b4f458-0a54-4fcd-aa95-d5ee424815bc" alt="ZenML Scarf"><figcaption></figcaption></figure>
74 changes: 74 additions & 0 deletions docs/book/how-to/popular-integrations/gcp-guide.md
@@ -175,4 +175,78 @@ If you do not want to use any of the created resources in the future, simply del
gcloud projects delete <PROJECT_ID_OR_NUMBER>
```

## Best Practices for Using a GCP Stack with ZenML

When working with a GCP stack in ZenML, consider the following best practices to optimize your workflow, enhance security, and improve cost-efficiency. These are all things you might want to do or amend in your own setup once you have tried running some pipelines on your GCP stack.

### Use IAM and Least Privilege Principle

Always adhere to the principle of least privilege when setting up IAM roles. Only grant the minimum permissions necessary for your ZenML pipelines to function. Regularly review and audit your IAM roles to ensure they remain appropriate and secure.
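
For example, you might grant your stack's service account only the specific roles it needs and review the project's IAM policy on a schedule. The project, service account, and role below are placeholders, not values ZenML requires:

```shell
# Grant a single, narrowly-scoped role to the service account used by your stack
# (project, service account, and role are placeholders for your own setup)
gcloud projects add-iam-policy-binding your-project-id \
    --member="serviceAccount:zenml-sa@your-project-id.iam.gserviceaccount.com" \
    --role="roles/storage.objectAdmin"

# Periodically audit who holds which roles on the project
gcloud projects get-iam-policy your-project-id
```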

### Leverage GCP Resource Labeling

Implement a consistent labeling strategy for your GCP resources. To label a GCS bucket, for example:

```shell
gcloud storage buckets update gs://your-bucket-name --update-labels=project=zenml,environment=production
```

This command adds two labels to the bucket:
- A label with key "project" and value "zenml"
- A label with key "environment" and value "production"

You can add or update multiple labels in a single command by separating them with commas.

To remove a label, use the `--remove-labels` flag:

```shell
gcloud storage buckets update gs://your-bucket-name --remove-labels=label-to-remove
```

These labels help with billing and cost-allocation tracking, and they also make any later cleanup efforts easier.
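
Consistent labels also make cleanup easier; for example, assuming the `project=zenml` label used above, you can list all matching buckets (the `--filter` expression is a sketch you may need to adapt):

```shell
# List all buckets labeled project=zenml, e.g. before a cleanup
gcloud storage buckets list --filter="labels.project=zenml"
```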

To view the labels on a bucket:

```shell
gcloud storage buckets describe gs://your-bucket-name --format="default(labels)"
```

This will display all labels currently set on the specified bucket.

### Implement Cost Management Strategies

Use Google Cloud's [Cost Management tools](https://cloud.google.com/docs/costs-usage) to monitor and manage your spending. To set up a budget alert:

1. Navigate to the Google Cloud Console
2. Go to Billing > Budgets & Alerts
3. Click "Create Budget"
4. Set your budget amount, scope (project, product, etc.), and alert thresholds

You can also use the `gcloud` CLI to create a budget:

```shell
gcloud billing budgets create --billing-account=BILLING_ACCOUNT_ID --display-name="ZenML Monthly Budget" --budget-amount=1000USD --threshold-rule=percent=0.9
```

Set up cost allocation labels to track expenses related to your ZenML projects in the Google Cloud Billing Console.

### Implement a Robust Backup Strategy

Regularly backup your critical data and configurations. For GCS, for example, enable object versioning and consider storing critical data in a dual-region or multi-region bucket, or periodically copying it to a bucket in another region, for disaster recovery.

To enable versioning on a GCS bucket:

```shell
gsutil versioning set on gs://your-bucket-name
```

To copy data to a bucket in another region (for example, on a schedule) as a simple disaster-recovery measure:

```shell
gsutil rsync -r gs://source-bucket gs://destination-bucket
```

By following these best practices and implementing the provided examples, you can create a more secure, efficient, and cost-effective GCP stack for your ZenML projects. Remember to regularly review and update your practices as your projects evolve and as GCP introduces new features and services.


<figure><img src="https://static.scarf.sh/a.png?x-pxid=f0b4f458-0a54-4fcd-aa95-d5ee424815bc" alt="ZenML Scarf"><figcaption></figcaption></figure>
2 changes: 1 addition & 1 deletion docs/book/how-to/popular-integrations/skypilot.md
@@ -77,7 +77,7 @@ This allows specifying VM size, spot usage, region, and more.

You can also configure resources per step:

```python
high_resource_settings = Skypilot<PROVIDER>OrchestratorSettings(...)

@step(settings={"orchestrator": high_resource_settings})