[docs] Provide guidelines for Many Model Training #31517

richardliaw · 2023-01-07T10:17:36Z

Why are these changes needed?

Closes #31486 by providing basic guidelines for usage.

Related issue number

Checks

- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- I've included any doc changes needed for https://docs.ray.io/en/master/.
- I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - Unit tests
  - Release tests
  - This PR is not tested :(

Signed-off-by: Richard Liaw <rliaw@berkeley.edu>

Yard1 · 2023-01-09T16:59:41Z

Should we link to the guidelines from the notebooks themselves?

richardliaw · 2023-01-09T17:37:54Z

Yeah, that’s a good idea. Feel free to push something that corresponds to what you’re thinking about.

Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>

Yard1 · 2023-01-09T19:52:25Z

@richardliaw added, PTAL

richardliaw · 2023-01-09T22:07:57Z

awesome can you approve?

matthewdeng · 2023-01-09T23:42:35Z

doc/source/ray-core/examples/batch_training.ipynb

@@ -24,7 +26,12 @@
   "source": [
    "Batch training in the context of this notebook is understood as creating the same model(s) for different and separate datasets or subsets of a dataset. This task is naively parallelizable and can be easily scaled with Ray.\n",
    "\n",
-    "![Batch training diagram](./images/batch-training.svg)"
+    "![Batch training diagram](./images/batch-training.svg)\n",


Should we remove the tip in line 17 now to avoid contradiction?

c21

LGTM

stephanie-wang · 2023-01-10T01:09:22Z

doc/source/ray-overview/use-cases.rst

+1. If you have a large amount of data, use Ray Data (:ref:`Tutorial <mmt-datasets>`).
+2. If you want to integrate with tools, such as wandb and mlflow, and if you have less than 20,000 models, use Ray Tune (:ref:`Tutorial <mmt-tune>`).
+3. If you want lower level control, better scale (up to 1 million models), maybe faster performance, use Ray Core (:ref:`Tutorial <mmt-core>`). Note that this requires you to be more careful about implementation.


I am a bit confused by this categorization. Are they mutually exclusive? I thought we would want to use Ray Data and Tune together. Maybe this is what you mean, but I think it could be more clear; one suggestion is to list Ray Core as a separate paragraph, more like an afterthought instead of equivalent to the other two.

Also confused by the "less than 20,000 models" part. What happens if you have more?

RE: Tune vs Ray Data -- these refer to two different, mutually exclusive APIs: the Ray Data map_groups API and the standard Tune grid_search sweep.

RE: 20k -- Tune performance starts to degrade as the number of models approaches or exceeds that.
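To make the three-way guideline discussed above concrete, here is a minimal sketch of it as a plain Python helper. It does not call Ray. The names `Workload`, `suggest_api`, and `TUNE_MODEL_LIMIT` are hypothetical, the order of the checks simply follows the numbered list in the diff, and the 20,000 threshold is the approximate figure quoted in this thread, not a hard limit.

```python
from dataclasses import dataclass

# Approximate figure from the thread: Tune performance degrades near or beyond this.
TUNE_MODEL_LIMIT = 20_000


@dataclass
class Workload:
    """Hypothetical description of a many-model training job."""
    num_models: int
    large_data: bool = False              # "a large amount of data"
    needs_tool_integration: bool = False  # e.g. wandb or mlflow


def suggest_api(w: Workload) -> str:
    """Return the library the guideline would point to (illustrative only)."""
    # 1. Large data: the guideline points to Ray Data.
    if w.large_data:
        return "Ray Data"
    # 2. Tool integration and fewer than ~20,000 models: Ray Tune.
    if w.needs_tool_integration and w.num_models < TUNE_MODEL_LIMIT:
        return "Ray Tune"
    # 3. Otherwise (lower-level control, larger scale): Ray Core.
    return "Ray Core"
```

For example, `suggest_api(Workload(num_models=50_000, needs_tool_integration=True))` falls through to Ray Core because the model count exceeds the Tune threshold.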

Signed-off-by: Richard Liaw <rliaw@berkeley.edu>

richardliaw · 2023-01-12T01:46:18Z

Yes! open a pr?

On Wed, Jan 11, 2023 at 9:10 AM Jules S. Damji wrote: Should we reference the blog on MMT https://www.anyscale.com/blog/training-one-million-machine-learning-models-in-record-time-with-ray

Co-authored-by: Antoni Baum <antoni.baum@protonmail.com> Closes #31486

Provide guidelines

1969c26

Signed-off-by: Richard Liaw <rliaw@berkeley.edu>

richardliaw requested review from gjoliver, krfricke, xwjiang2010, amogkam, matthewdeng, Yard1, maxpumperla, a team, ericl, scv119, clarkzinzow, jjyao, jianoaix and c21 as code owners January 7, 2023 10:17

Yard1 added 2 commits January 9, 2023 19:48

Link back to use cases from notebooks

72b89a9

Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>

Merge branch 'master' into pr/richardliaw/31517

78fd49f

Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>

matthewdeng approved these changes Jan 9, 2023

c21 approved these changes Jan 9, 2023

pcmoritz approved these changes Jan 10, 2023

stephanie-wang self-assigned this Jan 10, 2023

stephanie-wang reviewed Jan 10, 2023

richardliaw added 2 commits January 9, 2023 17:57

update-for-clarity

b60215f

Signed-off-by: Richard Liaw <rliaw@berkeley.edu>

fix

62df687

Signed-off-by: Richard Liaw <rliaw@berkeley.edu>

richardliaw merged commit d970332 into ray-project:master Jan 10, 2023

richardliaw deleted the provide-mmt-guides branch January 10, 2023 20:04

AmeerHajAli pushed a commit that referenced this pull request Jan 12, 2023

[docs] Provide guidelines for Many Model Training (#31517)

73ab006

Co-authored-by: Antoni Baum <antoni.baum@protonmail.com> Closes #31486
