How can I submit tf-job in armada? #536

Closed
denkensk opened this issue Mar 17, 2021 · 6 comments
Labels
component/scheduling: Armada Server, Scheduler and Scheduler Injester
no on roadmap: Good ideas that are not currently planned for implementation
type/design: Design / Architecture suggestions

Comments

@denkensk

denkensk commented Mar 17, 2021

I tested the example in https://github.com/G-Research/armada/blob/master/example/jobs.yaml

Is there a way to submit a TF job? That kind of plain job object isn't used frequently, whereas TensorFlow and PyTorch jobs are widely used, for example: https://github.com/kubeflow/tf-operator/blob/master/examples/v1/dist-mnist/tf_job_mnist.yaml


@denkensk
Author

/help

@jankaspar
Collaborator

Hi, Armada currently does not support TF Jobs. The closest you can get is a job with multiple podSpecs; it's possible to submit jobs like this:

queue: test
jobSetId: job-set-1
jobs:
  - priority: 0
    podSpecs:
      - containers:
          ...
        resources:
          ...
      - containers:
          ...
        resources:
          ...

Armada will aim to schedule all of these pods at the same time, in one of the clusters.
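As a rough illustration of the job shape described above, here is a minimal Python sketch that builds a job file with one job and several podSpecs. The helper names (`make_container`, `build_job_file`), the image, the command, and the resource values are all hypothetical. Placing `resources` inside each container follows the standard Kubernetes PodSpec layout and is an assumption about what the elided fields contain. The output is JSON only to stay within the standard library; whether it can be submitted as-is has not been checked against Armada.

```python
import json


def make_container(name, image, command, cpu, memory):
    """Build one container entry; resources are given as requests and limits."""
    return {
        "name": name,
        "image": image,
        "command": command,
        "resources": {
            "requests": {"cpu": cpu, "memory": memory},
            "limits": {"cpu": cpu, "memory": memory},
        },
    }


def build_job_file(queue, job_set_id, pods, priority=0):
    """Return a job-file dict with a single job holding one podSpec per pod.

    `pods` is a list of container lists, one list per pod.
    """
    pod_specs = [{"containers": containers} for containers in pods]
    return {
        "queue": queue,
        "jobSetId": job_set_id,
        "jobs": [{"priority": priority, "podSpecs": pod_specs}],
    }


# Hypothetical two-pod job: image and command are placeholders.
pods = [
    [make_container(f"worker-{i}", "example/trainer:latest",
                    ["python", "train.py"], "1", "1Gi")]
    for i in range(2)
]
job_file = build_job_file("test", "job-set-1", pods)
print(json.dumps(job_file, indent=2))
```

Each entry in `pods` becomes one podSpec under the same job, mirroring the two `containers` entries in the YAML above.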

Adding support for custom job specifications such as TF Jobs and other Kubeflow types (https://www.kubeflow.org/docs/components/training/) is something we have considered but have not implemented yet.

Are you planning to use Armada for a specific use case?

@denkensk
Author

> Hi, Armada currently does not support TF Jobs. The closest you can get is a job with multiple podSpecs; it's possible to submit jobs like this:

But if I define the TF job as a job with multiple podSpecs, how can tf-operator manage it?

> Are you planning to use Armada for a specific use case?

Yes, but our jobs are mainly AI/big-data workloads such as TFJob, PyTorch, and Spark.

> Armada will aim to schedule multiple pods in one of the clusters at the same time.
> Adding support for custom job specifications such as TF Jobs and other Kubeflow types (https://www.kubeflow.org/docs/components/training/) is something we have considered but have not implemented yet.

I'm looking forward to this feature. Do you have a rough timeline?

@denkensk
Author

@jankaspar Thanks

@jankaspar
Collaborator

Hi, sorry for the late reply.
We don't have any schedule for supporting additional job types, but we would be happy to accept PRs.

You are right, it's not possible to use the TensorFlow operator, but you can use TensorFlow in multi-node jobs without the operator.
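For context on what running without the operator might involve: TensorFlow's multi-worker strategies read the cluster layout from a `TF_CONFIG` environment variable, which the operator would normally inject. The sketch below, which is only an illustration and not something the thread confirms works on Armada, generates that variable for each pod. The worker addresses are hypothetical placeholders; how pods in an Armada job discover each other's addresses is not covered here.

```python
import json


def tf_config(worker_addresses, index):
    """Return the TF_CONFIG JSON string for the worker at position `index`."""
    cluster = {"worker": list(worker_addresses)}
    task = {"type": "worker", "index": index}
    return json.dumps({"cluster": cluster, "task": task})


def tf_config_env(worker_addresses, index):
    """Return a Kubernetes-style env entry to add to a container's `env` list."""
    return {"name": "TF_CONFIG", "value": tf_config(worker_addresses, index)}


# Hypothetical addresses; real ones depend on how the pods are exposed.
workers = ["trainer-0.example:2222", "trainer-1.example:2222"]
env_per_pod = [tf_config_env(workers, i) for i in range(len(workers))]
for env in env_per_pod:
    print(env["value"])
```

Every pod gets the same `cluster` section and a different `task.index`, so each podSpec in the job would carry its own env entry.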

@Sharpz7 added the type/design, component/scheduling, and no on roadmap labels on Aug 24, 2023
@richscott
Member

Closing this ticket due to its age and because this feature is still not planned. Please reopen if there is still strong interest in this feature.

@richscott closed this as not planned on Jan 15, 2024