[docs] Provide guidelines for Many Model Training (#31517)
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
Closes #31486
richardliaw authored Jan 10, 2023
1 parent 34a14a9 commit d970332
Showing 4 changed files with 29 additions and 1 deletion.
7 changes: 7 additions & 0 deletions doc/source/data/examples/batch_training.ipynb
@@ -4,6 +4,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"(mmt-datasets)=\n",
"\n",
"# Batch Training with Ray Datasets"
]
},
@@ -15,6 +17,11 @@
"\n",
"This notebook showcases how to conduct batch training regression algorithms from [XGBoost](https://docs.ray.io/en/latest/tune/examples/tune-xgboost.html) and [Scikit-learn](https://docs.ray.io/en/latest/ray-more-libs/joblib.html) with **[Ray Datasets](https://docs.ray.io/en/latest/data/dataset.html)**. **XGBoost** is a popular open-source library used for regression and classification. **Scikit-learn** is a popular open-source library with a vast assortment of well-known ML algorithms.\n",
"\n",
"```{tip}\n",
"The workload showcased in this notebook can be expressed using different Ray components, such as Ray Data, Ray Tune and Ray Core.\n",
"For more information, including best practices, see {ref}`ref-use-cases-mmt`.\n",
"```\n",
"\n",
"![Batch training diagram](../../data/examples/images/batch-training.svg)\n",
"\n",
"For the data, we will use the [NYC Taxi dataset](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page). This popular tabular dataset contains historical taxi pickups by timestamp and location in NYC.\n",
7 changes: 7 additions & 0 deletions doc/source/ray-air/examples/batch_tuning.ipynb
@@ -5,6 +5,8 @@
"id": "1ad6c41c",
"metadata": {},
"source": [
"(mmt-tune)=\n",
"\n",
"# Batch training & tuning on Ray Tune"
]
},
@@ -17,6 +19,11 @@
"\n",
"This notebook showcases how to conduct batch training regression algorithms from [XGBoost](https://docs.ray.io/en/latest/tune/examples/tune-xgboost.html) and [Scikit-learn](https://docs.ray.io/en/latest/ray-more-libs/joblib.html) with **[Ray Tune](https://docs.ray.io/en/latest/tune/index.html)**. **XGBoost** is a popular open-source library used for regression and classification. **Scikit-learn** is a popular open-source library with a vast assortment of well-known ML algorithms.\n",
"\n",
"```{tip}\n",
"The workload showcased in this notebook can be expressed using different Ray components, such as Ray Data, Ray Tune and Ray Core.\n",
"For best practices, see {ref}`ref-use-cases-mmt`.\n",
"```\n",
"\n",
"![Batch training diagram](../../data/examples/images/batch-training.svg)\n",
"\n",
"For the data, we will use the [NYC Taxi dataset](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page). This popular tabular dataset contains historical taxi pickups by timestamp and location in NYC.\n",
5 changes: 4 additions & 1 deletion doc/source/ray-core/examples/batch_training.ipynb
@@ -4,6 +4,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"(mmt-core)=\n",
"\n",
"# Batch Training with Ray Core"
]
},
@@ -12,7 +14,8 @@
"metadata": {},
"source": [
"```{tip}\n",
"We strongly recommend using {doc}`Ray Datasets </data/examples/batch_training>` and [AIR Trainers](air-trainers) to develop batch training, which will enable you to build it faster and more easily, and get the built-in benefits like auto-scaling actor pool. If you think your use case cannot be supported by Ray Datasets or AIR, we'd love to get your feedback e.g. through a [Ray GitHub issue](https://github.com/ray-project/ray/issues).\n",
"The workload showcased in this notebook can be expressed using different Ray components, such as Ray Data, Ray Tune and Ray Core.\n",
"For best practices, see {ref}`ref-use-cases-mmt`.\n",
"```\n",
"\n",
"Batch training and tuning are common tasks in simple machine learning use-cases such as time series forecasting. They require fitting of simple models on multiple data batches corresponding to locations, products, etc. This notebook showcases how to conduct batch training on the [NYC Taxi Dataset](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page) using only Ray Core and stateless Ray tasks."
11 changes: 11 additions & 0 deletions doc/source/ray-overview/use-cases.rst
@@ -57,12 +57,23 @@ Batch inference refers to generating model predictions over a set of input obser
:text: [Example] Batch OCR processing using Ray Data
:classes: btn-link btn-block stretched-link

.. _ref-use-cases-mmt:

Many Model Training
-------------------

Many model training is common in ML use cases such as time series forecasting, which require fitting of models on multiple data batches corresponding to locations, products, etc.
Here, the focus is on training many models on subsets of a dataset. This is in contrast to training a single model on the entire dataset.
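
As a rough illustration of the idea, the sketch below groups rows by a key and fits one independent model per group. It is plain Python rather than Ray code, and the data, the ``fit_line`` helper and the ``train_many_models`` helper are hypothetical names introduced only for this example. Because each group's fit does not depend on any other group, the per-group calls are the unit of work that a framework such as Ray could run in parallel.

```python
from collections import defaultdict


def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept (hypothetical helper)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


def train_many_models(rows):
    """Split rows into one batch per key, then fit a separate model on each batch."""
    batches = defaultdict(list)
    for key, x, y in rows:
        batches[key].append((x, y))
    # Each fit is independent of the others, so this loop is trivially parallelizable.
    return {key: fit_line(points) for key, points in batches.items()}


# Made-up rows of (location, hour, trip_count); not taken from the NYC Taxi dataset.
rows = [
    ("airport", 0, 1.0), ("airport", 1, 3.0), ("airport", 2, 5.0),
    ("downtown", 0, 10.0), ("downtown", 1, 9.0), ("downtown", 2, 8.0),
]
models = train_many_models(rows)
for location, (slope, intercept) in sorted(models.items()):
    print(location, slope, intercept)
```

Training a single model on the entire dataset would instead call ``fit_line`` once on all rows, ignoring the location key.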

How do I do many model training on Ray?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There are three ways of using Ray to express this workload.

1. If you have a large amount of data, use Ray Data (:ref:`Tutorial <mmt-datasets>`).
2. If you have a small amount of data (<10GB), want to integrate with tools such as wandb and mlflow, and have fewer than 20,000 models, use Ray Tune (:ref:`Tutorial <mmt-tune>`).
3. If your use case does not fit any of the above categories, for example if you need to scale up to 1 million models, use Ray Core (:ref:`Tutorial <mmt-core>`), which gives you finer-grained control over the application. Note, however, that this option is for advanced users and requires an understanding of Ray Core :ref:`design patterns and anti-patterns <core-patterns>`.
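
The three options above can be read as a simple decision rule. The sketch below is one possible encoding, not an official Ray API: the function name and constants are hypothetical, "large" is assumed to mean 10 GB or more (the guideline only defines "small" as <10GB), and the preference for tool integration with wandb or mlflow is not modeled.

```python
SMALL_DATA_LIMIT_GB = 10      # "small amount of data (<10GB)" from the guideline
TUNE_MODEL_LIMIT = 20_000     # "less than 20,000 models" from the guideline


def suggest_ray_component(data_gb, num_models):
    """Return the Ray component suggested by one reading of the guideline."""
    if data_gb < SMALL_DATA_LIMIT_GB and num_models < TUNE_MODEL_LIMIT:
        return "Ray Tune"
    if data_gb >= SMALL_DATA_LIMIT_GB:
        # Assumption: any dataset that is not "small" counts as "large".
        return "Ray Data"
    # Everything else, e.g. small data but on the order of 1 million models.
    return "Ray Core"


print(suggest_ray_component(data_gb=500, num_models=100))
print(suggest_ray_component(data_gb=2, num_models=1_000))
print(suggest_ray_component(data_gb=2, num_models=1_000_000))
```

Treat the thresholds as rules of thumb taken from the text rather than hard limits; a workload near a boundary may reasonably fit more than one option.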

.. TODO
Add link to many model training blog.
