[docs] Provide guidelines for Many Model Training #31517
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
Should we link to the guidelines from the notebooks themselves?
Yeah, that’s a good idea. Feel free to push something that corresponds to
what you’re thinking about.
…On Mon, Jan 9, 2023 at 8:59 AM Antoni Baum ***@***.***> wrote:
Should we link to the guidelines from the notebooks themselves?
Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
@richardliaw added, PTAL
awesome can you approve?
…On Mon, Jan 9, 2023 at 11:52 AM Antoni Baum ***@***.***> wrote:
@richardliaw <https://github.com/richardliaw> added, PTAL
@@ -24,7 +26,12 @@
"source": [
"Batch training in the context of this notebook is understood as creating the same model(s) for different and separate datasets or subsets of a dataset. This task is naively parallelizable and can be easily scaled with Ray.\n",
"\n",
""
"\n",
Should we remove the tip in line 17 now to avoid contradiction?
LGTM
1. If you have a large amount of data, use Ray Data (:ref:`Tutorial <mmt-datasets>`).
2. If you want to integrate with tools, such as wandb and mlflow, and if you have less than 20,000 models, use Ray Tune (:ref:`Tutorial <mmt-tune>`).
3. If you want lower level control, better scale (up to 1 million models), maybe faster performance, use Ray Core (:ref:`Tutorial <mmt-core>`). Note that this requires you to be more careful about implementation.
I am a bit confused by this categorization. Are they mutually exclusive? I thought we would want to use Ray Data and Tune together. Maybe this is what you mean, but I think it could be more clear; one suggestion is to list Ray Core as a separate paragraph, more like an afterthought instead of equivalent to the other two.
Also confused by the "less than 20,000 models" part. What happens if you have more?
RE: Tune vs Ray Data -- we are talking about two different, mutually exclusive APIs: Ray Data's map_groups and Tune's standard grid_search sweep.
RE: 20k -- basically, Tune performance starts to degrade close to or beyond that.
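To make the distinction between the two API shapes concrete, here is a minimal plain-Python sketch. It does not use Ray: `map_groups_sketch`, `sweep_sketch`, `train`, `trainable`, and the toy `rows` data are hypothetical names chosen for illustration, and the "model" is just a per-group mean. The first shape groups the data once and applies a function to each group (the idea behind a group-wise map); the second sweeps over the group keys as if they were a grid of configs, with each trial selecting its own subset.

```python
from collections import defaultdict
from statistics import mean

# Toy dataset (hypothetical): one row per observation, keyed by "store".
rows = [
    {"store": "a", "sales": 10.0},
    {"store": "a", "sales": 20.0},
    {"store": "b", "sales": 5.0},
]


def train(group_rows):
    """Stand-in for model fitting: the 'model' is the mean of the target."""
    return mean(r["sales"] for r in group_rows)


def map_groups_sketch(data, key, fn):
    """Shape 1: group the data by `key`, then apply `fn` to each group."""
    groups = defaultdict(list)
    for r in data:
        groups[r[key]].append(r)
    return {k: fn(g) for k, g in groups.items()}


def trainable(config):
    """One trial of the sweep: select this trial's subset, then train on it."""
    subset = [r for r in rows if r["store"] == config["store"]]
    return train(subset)


def sweep_sketch(keys, fn):
    """Shape 2: treat every key as one point in a grid and run one trial each."""
    return {k: fn({"store": k}) for k in keys}


grouped = map_groups_sketch(rows, "store", train)
swept = sweep_sketch(["a", "b"], trainable)
print(grouped)
print(swept)
```

Both shapes yield one model per group; they differ in who partitions the data (the framework in the first, each trial in the second), which is why a given training job would typically be written against one or the other.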
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
Yes! open a pr?
…On Wed, Jan 11, 2023 at 9:10 AM Jules S. Damji ***@***.***> wrote:
Should we reference the blog on MMT
https://www.anyscale.com/blog/training-one-million-machine-learning-models-in-record-time-with-ray
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com> Closes #31486
Why are these changes needed?
Closes #31486 by providing basic guidelines for usage.
Related issue number
Checks
I've signed off every commit (git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.