Fix default label for Training Operators #1808

Merged

andreyvelich
Member

This PR fixes the default label for Kubeflow Training Operators.
After this change, we should be able to run the Kubeflow E2E example: https://github.com/kubeflow/katib/blob/master/examples/v1beta1/kubeflow-pipelines/kubeflow-e2e-mnist.ipynb

We have to cherry-pick this change onto the release-0.13 branch and cut a new RC.

/assign @kimwnasptd @tenzen-y @gaocegege @johnugeorge

@coveralls

coveralls commented Feb 14, 2022

Coverage Status

Coverage decreased (-0.1%) to 74.138% when pulling 1517c4a on andreyvelich:fix-kubeflow-training-label into 35ea563 on kubeflow:master.

@andreyvelich
Member Author

cc @kubeflow/wg-training-leads FYI

Member

@tenzen-y tenzen-y left a comment

Nice catch @andreyvelich!

Should we modify the description in the following docs?

1. https://github.com/kubeflow/katib/blob/6339e48f01847372132477d123d6fe48dc72f164/docs/proposals/metrics-collector.md#mutating-webhook

   > In **Pod Level Injecting**,
   >
   > 1. Job operators (_e.x. TFjob/PyTorchjob_) tag the `job-role: master` ([#1064](https://github.com/kubeflow/tf-operator/pull/1064)) label on the master pod.

2. https://github.com/kubeflow/katib/blob/6339e48f01847372132477d123d6fe48dc72f164/docs/proposals/trial-custom-crd.md#primary-pod-label-location

   > For example, for TFJob:
   >
   >     . . .
   >     PrimaryPodLabel:
   >       "job-role": "master"
   >     . . .
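For readers following along, here is a minimal sketch of why a stale default label matters: a primary pod is found by matching its labels against the configured `PrimaryPodLabel` entries, so if the operator tags pods with a different key, nothing matches. The helper `is_primary_pod` and the prefixed key `training.kubeflow.org/job-role` are assumptions made for illustration; neither is taken from this thread or from Katib's code.

```python
from typing import Dict


def is_primary_pod(pod_labels: Dict[str, str],
                   primary_pod_labels: Dict[str, str]) -> bool:
    """Return True when every configured label is on the pod with the same value.

    Hypothetical helper for illustration only. Note that an empty selector
    matches every pod in this sketch (all() over nothing is True).
    """
    return all(pod_labels.get(key) == value
               for key, value in primary_pod_labels.items())


# Label documented in the proposals quoted above.
OLD_DEFAULT = {"job-role": "master"}

# Assumed new-style key, used here only to show the mismatch.
NEW_DEFAULT = {"training.kubeflow.org/job-role": "master"}

# A pod as the operator might tag it under the assumed new scheme.
master_pod = {"training.kubeflow.org/job-role": "master"}

print(is_primary_pod(master_pod, OLD_DEFAULT))
print(is_primary_pod(master_pod, NEW_DEFAULT))
```

Under these assumptions, the old default never selects the master pod, which is the kind of mismatch a default-label fix addresses.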

@andreyvelich
Member Author

Sure, let's change the proposals also.

Member

@terrytangyuan terrytangyuan left a comment

LGTM

@google-oss-prow

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andreyvelich, terrytangyuan

@tenzen-y
Member

/retest

@johnugeorge
Member

/lgtm

@google-oss-prow google-oss-prow bot added the lgtm label Feb 15, 2022
@google-oss-prow google-oss-prow bot merged commit 6a36763 into kubeflow:master Feb 15, 2022
@andreyvelich andreyvelich deleted the fix-kubeflow-training-label branch February 15, 2022 14:53
google-oss-prow bot pushed a commit that referenced this pull request Feb 15, 2022
…ors on release-0.13 (#1813)

* Fix default label for Training Operators

* Fix version comment

* Change the docs

* Change git command