[SDK] Create Job From Docker API #1878

Open
andreyvelich opened this issue Aug 2, 2023 · 10 comments

@andreyvelich
Member

andreyvelich commented Aug 2, 2023

Previously, we created the create_job_from_func API: #1659.
This API is useful for users who want to quickly convert their training function to a Kubeflow Distributed Training Job, but it is hard to use for large models, since all imports and code must be self-contained within the function.

Similar to KFP Containerized Python Components, we can introduce a new API called create_job_from_docker, which helps users convert their training code to a Kubeflow Training Job.

Initially, we can have the following signature:

def create_job_from_docker(
    self,
    name: str,
    namespace: Optional[str] = None,
    job_kind: Optional[str] = None,
    base_image: str = constants.PYTORCHJOB_BASE_IMAGE,
    command: Optional[str] = None,
    num_worker_replicas: Optional[int] = None,
):
    ...

This simply constructs a Training Job from the base image.
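
To make the idea concrete, here is a minimal sketch of what "constructing a Training Job from a base image" could look like for the PyTorchJob case. It is not the SDK implementation: the helper name build_pytorchjob_manifest, the placeholder image, the "default" namespace fallback, and passing command as a list rather than a string are all assumptions made for illustration.

```python
import copy
from typing import Any, Dict, List, Optional

# Hypothetical placeholder; the real constants.PYTORCHJOB_BASE_IMAGE may differ.
HYPOTHETICAL_BASE_IMAGE = "example.com/training/pytorch-base:latest"


def build_pytorchjob_manifest(
    name: str,
    namespace: str = "default",
    base_image: str = HYPOTHETICAL_BASE_IMAGE,
    command: Optional[List[str]] = None,
    num_worker_replicas: Optional[int] = None,
) -> Dict[str, Any]:
    """Return a PyTorchJob-shaped dict that runs `command` inside `base_image`."""
    container: Dict[str, Any] = {"name": "pytorch", "image": base_image}
    if command:
        container["command"] = list(command)

    def replica_spec(replicas: int) -> Dict[str, Any]:
        # Each replica type gets its own copy of the container definition.
        return {
            "replicas": replicas,
            "restartPolicy": "OnFailure",
            "template": {"spec": {"containers": [copy.deepcopy(container)]}},
        }

    # Always one master; workers are added only when requested.
    replica_specs = {"Master": replica_spec(1)}
    if num_worker_replicas:
        replica_specs["Worker"] = replica_spec(num_worker_replicas)

    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "PyTorchJob",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"pytorchReplicaSpecs": replica_specs},
    }


manifest = build_pytorchjob_manifest(
    "demo", command=["python", "train.py"], num_worker_replicas=2
)
```

The resulting dict could then be submitted with whatever client the SDK already uses to create custom resources; that step is left out here.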

In the future, we can introduce parameters such as target_image and packages_to_install, which would allow the SDK to build a Docker image on the fly using the Docker client.
Users would need a running Docker daemon to use this.
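
As a rough illustration of the image-building step, the sketch below only renders the Dockerfile text that such a feature might generate from base_image and packages_to_install. The function name render_dockerfile and the source_dir and workdir parameters are hypothetical, and the actual build (which is what requires the Docker daemon) is deliberately omitted.

```python
import shlex
from typing import Optional, Sequence


def render_dockerfile(
    base_image: str,
    packages_to_install: Optional[Sequence[str]] = None,
    source_dir: str = ".",       # hypothetical: where the training code lives
    workdir: str = "/app",       # hypothetical: where it is copied in the image
) -> str:
    """Render Dockerfile text layering packages and code on top of base_image."""
    lines = [f"FROM {base_image}", f"WORKDIR {workdir}"]
    if packages_to_install:
        # Quote each requirement so specifiers like ">=" are not read by the shell.
        packages = " ".join(shlex.quote(p) for p in packages_to_install)
        lines.append(f"RUN pip install --no-cache-dir {packages}")
    lines.append(f"COPY {source_dir} {workdir}")
    return "\n".join(lines) + "\n"


dockerfile_text = render_dockerfile(
    "example.com/training/pytorch-base:latest",
    packages_to_install=["torch==2.0.1", "numpy>=1.24"],
)
```

The rendered text could be written to a build context and handed to a Docker client, with the resulting tag taken from target_image.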

Related: kubeflow/common#66.

What do you think @kubeflow/wg-training-leads @tenzen-y @kuizhiqing @yaobaiwei @zw0610 @droctothorpe ?

@terrytangyuan
Member

+1 for ease of use, although I would avoid mentioning "docker", which is implementation-specific.

@andreyvelich
Member Author

Makes sense. Any suggestions, @terrytangyuan (e.g. create_job_from_image)?

@terrytangyuan
Member

What about create_job(func, img) that calls the underlying implementation?

@andreyvelich
Member Author

Makes sense. So we just provide users with one API called create_job, where they can set a Custom Resource, a function, or an image, and we process the request accordingly, right?

@terrytangyuan
Member

Yep, exactly. This will avoid exploding the list of public APIs.
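
A minimal sketch of the single-entry-point idea discussed above, to show how one create_job could route to different code paths. Everything here is an assumption for illustration: the parameter names job, func, and image, the rule that exactly one of them may be set (a real design might instead allow a function together with an image), and the return value, which merely names the path that would be taken.

```python
from typing import Any, Callable, Dict, Optional


def create_job(
    job: Optional[Dict[str, Any]] = None,         # a ready-made Custom Resource
    func: Optional[Callable[..., Any]] = None,    # a training function
    image: Optional[str] = None,                  # a prebuilt container image
) -> str:
    """Pick the code path for the single input that was provided."""
    provided = {
        key: value
        for key, value in {"job": job, "func": func, "image": image}.items()
        if value is not None
    }
    # Assumption: the three inputs are mutually exclusive.
    if len(provided) != 1:
        raise ValueError(
            f"expected exactly one of job, func, image; got {sorted(provided)}"
        )
    # A real implementation would call the matching builder here.
    return next(iter(provided))


chosen_path = create_job(image="example.com/training/my-model:latest")
```

Keeping the dispatch in one place means new input kinds can be added as keyword arguments without growing the list of public methods.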

@tenzen-y
Member

tenzen-y commented Aug 2, 2023

It's a good idea. SGTM

> In the future, we can introduce parameters such as target_image and packages_to_install, which would allow the SDK to build a Docker image on the fly using the Docker client.
> Users would need a running Docker daemon to use this.

In future work, it might be better to add a parameter that controls whether the built image is pushed to the registry.

@johnugeorge
Member

/cc @gaocegege

@andreyvelich
Member Author

/assign @andreyvelich

github-actions bot commented Dec 3, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@tenzen-y
Member

tenzen-y commented Dec 4, 2023

/lifecycle frozen
