[Docs][KubeRay] Update RayJob doc for backoffLimit and DELETE_RAYJOB_CR_AFTER_JOB_FINISHES #47445
Can you add some comments indicating which version started supporting these APIs?
* Automatic resource cleanup
  * `shutdownAfterJobFinishes` (Optional): Determines whether to recycle the RayCluster after the Ray job finishes. The default value is false.
  * `ttlSecondsAfterFinished` (Optional): Only works if `shutdownAfterJobFinishes` is true. The KubeRay operator deletes the RayCluster and the submitter `ttlSecondsAfterFinished` seconds after the Ray job finishes. The default value is 0.
  * `activeDeadlineSeconds` (Optional): If the RayJob doesn't transition the `JobDeploymentStatus` to `Complete` or `Failed` within `activeDeadlineSeconds`, the KubeRay operator transitions the `JobDeploymentStatus` to `Failed`, citing `DeadlineExceeded` as the reason.
  * `DELETE_RAYJOB_CR_AFTER_JOB_FINISHES` (Optional): If this environment variable is set to true, the RayJob custom resource itself will be deleted if `shutdownAfterJobFinishes` is also set to true. Note that all resources created by the RayJob will be deleted, including the K8s Job.
I think this environment variable should be set in the KubeRay operator, not the CR. Could we clarify that?
Done.
doc/source/cluster/kubernetes/getting-started/rayjob-quick-start.md
* Automatic resource cleanup
  * `shutdownAfterJobFinishes` (Optional): Determines whether to recycle the RayCluster after the Ray job finishes. The default value is false.
  * `ttlSecondsAfterFinished` (Optional): Only works if `shutdownAfterJobFinishes` is true. The KubeRay operator deletes the RayCluster and the submitter `ttlSecondsAfterFinished` seconds after the Ray job finishes. The default value is 0.
  * `activeDeadlineSeconds` (Optional): If the RayJob doesn't transition the `JobDeploymentStatus` to `Complete` or `Failed` within `activeDeadlineSeconds`, the KubeRay operator transitions the `JobDeploymentStatus` to `Failed`, citing `DeadlineExceeded` as the reason.
  * `DELETE_RAYJOB_CR_AFTER_JOB_FINISHES` (Optional, added in version 1.2.0): This environment variable should be set for the KubeRay operator, not the RayJob resource. If this environment variable is set to true, the RayJob custom resource itself will be deleted if `shutdownAfterJobFinishes` is also set to true. Note that all resources created by the RayJob will be deleted, including the K8s Job.
I still find it confusing to add `DELETE_RAYJOB_CR_AFTER_JOB_FINISHES` here, because the other entries are for the RayJob API, whereas this is an environment variable in KubeRay.
Maybe it's time for a dedicated section in the KubeRay installation docs about environment variables?
Agreed, but I couldn't find a good place to put the KubeRay-related configuration. @kevin85421 @andrewsykim where do you recommend putting it?
Maybe not a blocker for this PR, but we should consider a "Configuring KubeRay" page. There are other environment variables that are not documented yet.
Opened an issue, ray-project/kuberay#2356, to track the progress.
Just some style nits. Please consider using Vale to find these issues in the future. Please excuse any inaccuracies I introduced in my suggestions and correct as needed. Happy to answer any questions you have about the suggestions. Thanks for your contribution!
* Automatic resource cleanup
  * `shutdownAfterJobFinishes` (Optional): Determines whether to recycle the RayCluster after the Ray job finishes. The default value is false.
  * `ttlSecondsAfterFinished` (Optional): Only works if `shutdownAfterJobFinishes` is true. The KubeRay operator deletes the RayCluster and the submitter `ttlSecondsAfterFinished` seconds after the Ray job finishes. The default value is 0.
  * `activeDeadlineSeconds` (Optional): If the RayJob doesn't transition the `JobDeploymentStatus` to `Complete` or `Failed` within `activeDeadlineSeconds`, the KubeRay operator transitions the `JobDeploymentStatus` to `Failed`, citing `DeadlineExceeded` as the reason.
  * `DELETE_RAYJOB_CR_AFTER_JOB_FINISHES` (Optional, added in version 1.2.0): This environment variable should be set for the KubeRay operator, not the RayJob resource. If this environment variable is set to true, the RayJob custom resource itself will be deleted if `shutdownAfterJobFinishes` is also set to true. Note that all resources created by the RayJob will be deleted, including the K8s Job.
Suggested change:
  * `DELETE_RAYJOB_CR_AFTER_JOB_FINISHES` (Optional, added in version 1.2.0): Set this environment variable for the KubeRay operator, not the RayJob resource. If you set this environment variable to true, the RayJob custom resource itself is deleted if you also set `shutdownAfterJobFinishes` to true. Note that KubeRay deletes all resources created by the RayJob, including the Kubernetes Job.
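To make the distinction discussed in this thread concrete, here is a minimal sketch of where each setting might live. The three cleanup fields go in the RayJob `spec`, while `DELETE_RAYJOB_CR_AFTER_JOB_FINISHES` goes in the environment of the KubeRay operator container. The resource names, the entrypoint, and the numeric values are illustrative assumptions, not taken from this PR, and the operator Deployment name and container name may differ depending on how KubeRay was installed.

```yaml
# Sketch 1: RayJob fragment showing the cleanup-related fields.
# Names and values below are hypothetical and chosen for illustration only.
apiVersion: ray.io/v1
kind: RayJob
metadata:
  name: rayjob-sample                 # hypothetical name
spec:
  entrypoint: python my_script.py     # hypothetical entrypoint
  shutdownAfterJobFinishes: true      # recycle the RayCluster after the job finishes
  ttlSecondsAfterFinished: 60         # only takes effect when shutdownAfterJobFinishes is true
  activeDeadlineSeconds: 600          # mark the RayJob Failed (DeadlineExceeded) after this long
  # rayClusterSpec omitted for brevity; a real manifest needs it
---
# Sketch 2: fragment of the KubeRay operator Deployment.
# The Deployment and container names are assumptions about a typical install.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kuberay-operator              # assumed name
spec:
  template:
    spec:
      containers:
        - name: kuberay-operator      # assumed name
          env:
            # Set on the operator, not on the RayJob resource.
            - name: DELETE_RAYJOB_CR_AFTER_JOB_FINISHES
              value: "true"
```

Both documents are fragments rather than complete manifests, so they are meant to be merged into existing resources and not applied as written.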
Oh, I thought I had no style errors, because the page linked here says that Vale enforces the editorial style in CI, and the CI passed without errors. I've pushed updates. Is the CI broken, so that it didn't catch this error?
…CR_AFTER_JOB_FINISHES Signed-off-by: Chi-Sheng Liu <chishengliu@chishengliu.com>
…rt.md Signed-off-by: Kai-Hsun Chen <kaihsun@apache.org>
Signed-off-by: Chi-Sheng Liu <chishengliu@chishengliu.com>
…CR_AFTER_JOB_FINISHES (ray-project#47445) Signed-off-by: Chi-Sheng Liu <chishengliu@chishengliu.com> Signed-off-by: Kai-Hsun Chen <kaihsun@apache.org> Co-authored-by: Kai-Hsun Chen <kaihsun@apache.org> Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
Why are these changes needed?
Update RayJob doc for backoffLimit and DELETE_RAYJOB_CR_AFTER_JOB_FINISHES behaviors in the following PRs:
Related issue number
N/A
Checks
* I've signed off every commit (`git commit -s`) in this PR.
* I've run `scripts/format.sh` to lint the changes in this PR.
* If I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.