Implement v2 controller that sets up SSH for communication #373
Hi, could you please join the kubeflow org (https://www.kubeflow.org/docs/about/contributing/#joining-the-kubeflow-github-org)? Then we won't need to trigger the CI/CD for your PR manually.
Sent PR kubeflow/internal-acls#473. Thanks for the suggestion.
I verified that images
@alculquicondor Has the community discussed the tradeoffs of using a Job vs. a plain Pod for the launcher, and StatefulSets vs. plain Pods for the workers?
Yes, for the launcher; see the discussion in #386. For the workers, it's still open for discussion. We could use StatefulSets, but I think plain Pods might be fine for now. We might migrate to Indexed Jobs at some point, but since they are only available in k8s 1.22, it's a bit early to discuss.
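Whichever workload type backs the workers (StatefulSets, plain Pods, or Indexed Jobs), the launcher reaches them over SSH, so each worker needs a stable, resolvable hostname. A minimal Go sketch of the hostfile a launcher could hand to `mpirun`, assuming a hypothetical naming scheme where worker Pods are named `<job>-worker-<i>` and share a headless Service named `<job>-worker` (so each Pod resolves as `<hostname>.<subdomain>` within the namespace). The function name `workerHostfile` and the naming scheme are illustrative assumptions, not the operator's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// workerHostfile builds Open MPI-style hostfile lines ("<host> slots=<n>")
// for a fixed number of workers. Hypothetical naming: pod "<job>-worker-<i>"
// behind a headless Service "<job>-worker", giving the DNS name
// "<job>-worker-<i>.<job>-worker" inside the namespace.
func workerHostfile(jobName string, workers, slots int) string {
	var b strings.Builder
	svc := jobName + "-worker" // headless Service name, also the Pod subdomain
	for i := 0; i < workers; i++ {
		fmt.Fprintf(&b, "%s-%d.%s slots=%d\n", svc, i, svc, slots)
	}
	return b.String()
}

func main() {
	fmt.Print(workerHostfile("pi", 2, 1))
}
```

Running it prints one line per worker (`pi-worker-0.pi-worker slots=1`, then `pi-worker-1.pi-worker slots=1`). Because the names depend only on the job name and index, the same hostfile works whether the controller creates the Pods directly or an Indexed Job does, as long as the hostname and subdomain are set consistently.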
I think this is pretty much ready. The last things I would like to do are:
There's this page: https://www.kubeflow.org/docs/components/training/mpi/
Maybe we can introduce Indexed Jobs to mpi-operator v2 once kubernetes/enhancements#3715 graduates to beta.
Consider introducing JobSet instead of managing raw Pods for the workers: https://github.com/kubernetes-sigs/jobset
Implements the proposal at https://github.com/kubeflow/mpi-operator/blob/master/proposals/scalable-robust-operator.md