
Support leader election for katib-controller #1713


@tenzen-y (Member) commented Oct 13, 2021

What this PR does / why we need it:
Support leader election for katib-controller.
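Conceptually, leader election lets several katib-controller replicas run at once (the e2e output later in this thread shows katib-controller at 2/2 ready) while only the replica holding a shared lock does the reconciling. The other replicas wait and take over if the holder stops renewing. The snippet is a minimal, single-process Python sketch of that lease-based idea, not the code in this PR. Katib's controller is written in Go, and in Kubernetes the lock is typically a coordination.k8s.io Lease (or ConfigMap) object managed through client-go's leaderelection package. The names Lease and try_acquire_or_renew, the pod names, and the 15-second lease duration are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    """Shared lock record (a hypothetical stand-in for a Kubernetes Lease object)."""
    holder: Optional[str] = None  # identity of the current leader, if any
    renew_time: float = 0.0       # last time the holder renewed, in seconds


def try_acquire_or_renew(lease: Lease, candidate: str, now: float,
                         lease_duration: float) -> bool:
    """Return True if `candidate` is the leader after this attempt.

    A candidate may take the lease when it is free, when it already holds it
    (renewal), or when the current holder has not renewed within
    `lease_duration` seconds (failover).
    """
    expired = now - lease.renew_time > lease_duration
    if lease.holder is None or lease.holder == candidate or expired:
        lease.holder = candidate
        lease.renew_time = now
        return True
    return False


if __name__ == "__main__":
    lease = Lease()
    # Two hypothetical katib-controller replicas compete for leadership.
    print(try_acquire_or_renew(lease, "controller-a", now=0.0, lease_duration=15.0))   # True
    print(try_acquire_or_renew(lease, "controller-b", now=1.0, lease_duration=15.0))   # False
    # controller-a stops renewing; once the lease expires, controller-b takes over.
    print(try_acquire_or_renew(lease, "controller-b", now=20.0, lease_duration=15.0))  # True
```

In a real cluster the read-check-write in try_acquire_or_renew must be atomic across replicas. Kubernetes provides this with optimistic concurrency on the lock object's resourceVersion, which a single-process sketch does not need.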

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #1705

Checklist:

  • Docs included if any changes are user facing

/assign
/assign @andreyvelich @gaocegege @johnugeorge

@coveralls commented Oct 13, 2021

Coverage decreased (-0.05%) to 73.523% when pulling 7798d48 on tenzen-y:issue-1705-support-leader-election-for-katib-controller into 0a38e87 on kubeflow:master.

@gaocegege (Member) left a comment

LGTM

Thanks for your contribution! 🎉 👍

@gaocegege (Member)

/hold

/cc @andreyvelich

@google-cla bot commented Oct 13, 2021

All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter.

We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only @googlebot I consent. in this pull request.

Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the cla label to yes (if enabled on your project).


@tenzen-y (Member Author)

@googlebot I consent.

@tenzen-y (Member Author)

@gaocegege, can you please comment with "@googlebot I consent."?

@gaocegege (Member)

@googlebot I consent.

@tenzen-y (Member Author)

/retest

@tenzen-y force-pushed the issue-1705-support-leader-election-for-katib-controller branch from 0a3296d to c81858e on October 13, 2021 09:55
@tenzen-y (Member Author)

I fixed setup-katib.sh in the e2e tests.

job.batch/katib-cert-generator created
persistentvolumeclaim/katib-mysql created
validatingwebhookconfiguration.admissionregistration.k8s.io/katib.kubeflow.org created
Pod Status 0/7
All Katib components are running.
Katib deployments
NAME               READY   UP-TO-DATE   AVAILABLE   AGE
katib-controller   2/2     2            2           13s
katib-db-manager   1/1     1            1           13s
katib-mysql        0/1     1            0           13s
katib-ui           1/1     1            1           13s
pytorch-operator   1/1     1            1           15s
tf-job-operator    1/1     1            1           16s
job.batch/katib-cert-generator created
persistentvolumeclaim/katib-mysql created
validatingwebhookconfiguration.admissionregistration.k8s.io/katib.kubeflow.org created
Pod Status 0/8
Pod Status 6/8
Pod Status 6/8
All Katib components are running.
Katib deployments
NAME               READY   UP-TO-DATE   AVAILABLE   AGE
katib-controller   2/2     2            2           34s
katib-db-manager   1/1     1            1           34s
katib-mysql        0/1     1            0           34s
katib-ui           1/1     1            1           34s
pytorch-operator   1/1     1            1           36s
tf-job-operator    1/1     1            1           37s

@tenzen-y (Member Author)

These errors are caused by the Docker Hub pull rate limit:

TOOMANYREQUESTS: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit

@tenzen-y (Member Author)

/retest

@johnugeorge (Member)

/lgtm
/approve

@google-oss-robot commented
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gaocegege, johnugeorge, tenzen-y

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [gaocegege,johnugeorge]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@andreyvelich (Member) left a comment

Thank you for pushing this @tenzen-y!
I left a few comments.
/hold for the review

Review threads (outdated, resolved):
  • test/e2e/v1beta1/scripts/setup-katib.sh
  • manifests/v1beta1/components/controller/controller.yaml
@tenzen-y (Member Author)

I modified the e2e test scripts and fixed the directory structure for the HA manifests.
@andreyvelich, can you take a look at the changes?

@tenzen-y force-pushed the issue-1705-support-leader-election-for-katib-controller branch 2 times, most recently from a48b4f1 to bd18869 on October 28, 2021 16:24
@tenzen-y (Member Author) commented Oct 28, 2021

@andreyvelich
I've modified it based on your suggestions.
Can you check the changes, please?

If it looks good to you, I would like to open a PR in the website repository that adds docs for Katib leader election to the Katib setup section.

@tenzen-y force-pushed the issue-1705-support-leader-election-for-katib-controller branch from bd18869 to c55bd42 on October 28, 2021 16:40
@andreyvelich (Member) left a comment

Thank you for the updates @tenzen-y!
Small comments from me.

tenzen-y and others added 3 commits October 29, 2021 07:19
…on-rbac.yaml

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
…on-rbac.yaml

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
…on-rbac.yaml

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
@tenzen-y force-pushed the issue-1705-support-leader-election-for-katib-controller branch from 3127f1f to ee8fd73 on October 28, 2021 22:22
@google-cla bot commented Oct 28, 2021 with the same request that all commit authors confirm consent.

@tenzen-y (Member Author)

@andreyvelich, can you comment with "@googlebot I consent.", please?

@andreyvelich (Member)

@googlebot I consent.

…on-rbac.yaml

Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
@andreyvelich (Member)

Thank you for pushing this @tenzen-y!
Please submit a PR to update the Kubeflow website with this new install as you mentioned before: #1713 (comment).

/lgtm
/assign @gaocegege @johnugeorge

@johnugeorge (Member)

/approve

@andreyvelich (Member)

I think we can merge it.
Thank you for your contribution @tenzen-y!
/hold cancel

@google-oss-robot merged commit f802295 into kubeflow:master Nov 2, 2021
@tenzen-y deleted the issue-1705-support-leader-election-for-katib-controller branch November 3, 2021 03:46
Projects: none yet
Linked issue (may be closed by merging this pull request): Support leader-election (HA) for Katib Components
6 participants