docs: Refactor BentoCloud docs #4525

Merged
merged 2 commits into from
Feb 23, 2024
Binary file not shown.
59 changes: 0 additions & 59 deletions docs/source/bentocloud/best-practices/cost-optimization.rst

This file was deleted.

21 changes: 0 additions & 21 deletions docs/source/bentocloud/best-practices/index.rst

This file was deleted.

61 changes: 61 additions & 0 deletions docs/source/bentocloud/how-tos/autoscaling.rst
@@ -0,0 +1,61 @@
===========
Autoscaling
===========

The autoscaling feature of BentoCloud dynamically adjusts the number of Service replicas within the specified minimum and maximum limits. This document explains how to set autoscaling for Deployments.

Set the minimum and maximum replica counts to bound scaling; the autoscaler adds or removes replicas within that range as needed. Scaling to zero replicas is supported. You can also specify the metric thresholds that the autoscaler uses to decide when to adjust the number of replicas. The available ``metrics`` types are:

- ``cpu``: The CPU utilization percentage.
- ``memory``: The memory utilization.
- ``gpu``: The GPU utilization percentage.
- ``qps``: The queries per second.

Setting values for these fields instructs the autoscaler to keep the average of each metric at or below the specified threshold. For example, if you set the CPU value to ``80``, the autoscaler targets an average CPU utilization of 80%.
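
The text does not spell out how a target value translates into a replica count. As a rough illustration only, the sketch below assumes the autoscaler follows the standard Kubernetes Horizontal Pod Autoscaler rule (an assumption; this document does not confirm it). The function name ``desired_replicas`` and its parameters are hypothetical.

```python
import math


def desired_replicas(current_replicas, current_avg, target, min_replicas, max_replicas):
    """Estimate the replica count needed to bring the average metric back to target.

    Assumption: the Kubernetes HPA rule ceil(current_replicas * current_avg / target),
    clamped to the configured [min_replicas, max_replicas] range.
    """
    if target <= 0:
        raise ValueError("target must be positive")
    raw = math.ceil(current_replicas * current_avg / target)
    # Never go outside the boundaries set in the scaling configuration.
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # Two replicas averaging 120% CPU against a target of 80.
    print(desired_replicas(2, 120, 80, min_replicas=1, max_replicas=10))
```

Under this assumption, two replicas averaging 120% CPU against a target of ``80`` would be scaled to three replicas, and the result is always capped by the configured maximum.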

Allowed scaling-up behaviors (``scale_up_behavior``):

- ``fast`` (default): There is no stabilization window, so the autoscaler can increase the number of Pods immediately if necessary. It can increase the number of Pods by 100% or by 4 Pods, whichever is higher, every 15 seconds.
- ``stable``: The autoscaler can increase the number of Pods, but it will stabilize the number of Pods for 300 seconds (5 minutes) before deciding to scale up further. It can increase the number of Pods by 100% every 15 seconds.
- ``disabled``: Scaling-up is turned off.
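
The scale-up limits above can be sketched as a small calculation. This is a minimal illustration of the per-period ceiling described in the list, not BentoCloud's implementation; ``max_after_scale_up`` is a hypothetical helper, and the 300-second stabilization window of ``stable`` is deliberately not modeled.

```python
PERIOD_SECONDS = 15  # the text states that each limit applies per 15-second period


def max_after_scale_up(current, behavior="fast", max_replicas=None):
    """Return the largest Pod count reachable after one 15-second period."""
    if behavior == "disabled":
        limit = current  # scaling up is turned off
    elif behavior == "fast":
        # +100% or +4 Pods, whichever is higher
        limit = max(current * 2, current + 4)
    elif behavior == "stable":
        # +100% only
        limit = current * 2
    else:
        raise ValueError(f"unknown scale_up_behavior: {behavior!r}")
    if max_replicas is not None:
        limit = min(limit, max_replicas)  # respect the configured maximum
    return limit


if __name__ == "__main__":
    for behavior in ("fast", "stable", "disabled"):
        print(behavior, max_after_scale_up(3, behavior))
```

For small Deployments the "+4 Pods" rule dominates under ``fast`` (one Pod can become five in a single period), while for ten or more Pods doubling is the larger step.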

Allowed scaling-down behaviors (``scale_down_behavior``):

- ``fast``: There is no stabilization window, so the autoscaler can reduce the number of Pods immediately if necessary. It can decrease the number of Pods by 100% or by 4 Pods, whichever is higher, every 15 seconds.
- ``stable`` (default): The autoscaler can reduce the number of Pods, but it will stabilize the number of Pods for 300 seconds (5 minutes) before deciding to scale down further. It can decrease the number of Pods by 100% every 15 seconds.
- ``disabled``: Scaling-down is turned off.

To set autoscaling, you need to configure the above fields in a separate YAML or JSON file. For example:

.. code-block:: yaml
    :caption: `config-file.yaml`

    services:
      MyBentoService: # The Service name
        scaling:
          max_replicas: 2
          min_replicas: 1
          policy:
            metrics:
              - type: "cpu | memory | gpu | qps" # Specify the type here
                value: "string" # Specify the value here
            scale_down_behavior: "disabled | stable | fast" # Choose the behavior
            scale_up_behavior: "disabled | stable | fast" # Choose the behavior

You can then deploy your project by referencing this file.

.. tab-set::

    .. tab-item:: BentoML CLI

        .. code-block:: bash

            bentoml deploy . -f config-file.yaml

    .. tab-item:: Python API

        .. code-block:: python

            import bentoml
            # Set `bento` to the Bento name if it already exists
            bentoml.deployment.create(bento="./path_to_your_project", config_file="config-file.yaml")
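
Since the configuration may also be a JSON file, the following sketch builds one concrete configuration with placeholder values filled in and checks it against the options listed in this document. The chosen values (CPU target ``"80"``, ``fast`` and ``stable`` behaviors), the nesting under ``policy``, and the helpers ``check_scaling`` and ``render_config`` are illustrative assumptions, not part of BentoML.

```python
import json

ALLOWED_METRICS = {"cpu", "memory", "gpu", "qps"}
ALLOWED_BEHAVIORS = {"disabled", "stable", "fast"}

# Example values only; the key layout is inferred from the YAML template in this document.
config = {
    "services": {
        "MyBentoService": {  # The Service name
            "scaling": {
                "min_replicas": 1,
                "max_replicas": 2,
                "policy": {
                    "metrics": [{"type": "cpu", "value": "80"}],
                    "scale_up_behavior": "fast",
                    "scale_down_behavior": "stable",
                },
            }
        }
    }
}


def check_scaling(scaling):
    """Raise ValueError if the scaling block uses a value not listed in the docs."""
    if scaling["min_replicas"] > scaling["max_replicas"]:
        raise ValueError("min_replicas must not exceed max_replicas")
    policy = scaling["policy"]
    for metric in policy["metrics"]:
        if metric["type"] not in ALLOWED_METRICS:
            raise ValueError(f"unknown metric type: {metric['type']!r}")
    for key in ("scale_up_behavior", "scale_down_behavior"):
        if policy[key] not in ALLOWED_BEHAVIORS:
            raise ValueError(f"unknown {key}: {policy[key]!r}")


def render_config(cfg):
    """Validate every Service's scaling block and return the config as JSON text."""
    for service in cfg["services"].values():
        check_scaling(service["scaling"])
    return json.dumps(cfg, indent=2)


if __name__ == "__main__":
    # Write the file, then pass it to the CLI or Python API in place of the YAML file.
    with open("config-file.json", "w") as f:
        f.write(render_config(config))
```

Validating locally before deploying catches a misspelled metric type or behavior name early, rather than at deployment time.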
6 changes: 3 additions & 3 deletions docs/source/bentocloud/how-tos/byoc.rst
@@ -1,6 +1,6 @@
====
BYOC
====
====================
Bring your own cloud
====================

BentoCloud provides Bring Your Own Cloud (BYOC) as a part of the Enterprise plan, which allows you to run BentoCloud services within your
private cloud environment. This means the BentoCloud Control Plane and the Data Plane are separated, enabling you to stay closer to your data