Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

try get get old pg when new pg not exist #2400

Merged
merged 2 commits into from
Aug 1, 2022

Conversation

Akiqqqqqqq
Copy link

@Akiqqqqqqq Akiqqqqqqq commented Jul 30, 2022

Signed-off-by: Qiuyu Wu qiuyu.wu@shopee.com

fix:#2389

what happen:
after this feature #2140 applied, error comes when upgrading volcano from 1.4 to 1.6, details showed in #2389.

this is because volcano-job-controller creates extra pgs with job uid suffix for existing jobs, which are already running, and thus two pgs exist in cluster for one vj. when existing jobs are using over half of the cluster resource already, the new pgs with uid will cause inqueue get larger in proportion AddJobEnqueueableFn (line 334 in pkg/scheduler/plugins/proportion/proportion.go). after inqueue gets larger than realCapability, the new coming jobs will pend forever

@volcano-sh-bot volcano-sh-bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jul 30, 2022
Qiuyu Wu added 2 commits July 30, 2022 10:42
Signed-off-by: Qiuyu Wu <qiuyu.wu@shopee.com>
Signed-off-by: Qiuyu Wu <qiuyu.wu@shopee.com>
@Thor-wl
Copy link
Contributor

Thor-wl commented Aug 1, 2022

Thanks for your contribution. I think it is important for users who upgrades volcano versions from below v1.6.0 to v1.6.0.

@Thor-wl
Copy link
Contributor

Thor-wl commented Aug 1, 2022

/lgtm

@volcano-sh-bot volcano-sh-bot added the lgtm Indicates that a PR is ready to be merged. label Aug 1, 2022
Copy link
Collaborator

@hzxuzhonghu hzxuzhonghu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall lgtm, just one comment i think the PG construction code can be resued, maybe abstract a buildPodGroup function

Copy link
Member

@william-wang william-wang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@volcano-sh-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: william-wang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@volcano-sh-bot volcano-sh-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 1, 2022
@volcano-sh-bot volcano-sh-bot merged commit dba2d6e into volcano-sh:master Aug 1, 2022
william-wang pushed a commit to william-wang/volcano that referenced this pull request Aug 1, 2022
william-wang pushed a commit to william-wang/volcano that referenced this pull request Aug 1, 2022
try get get old pg when new pg not exist

Signed-off-by: william-wang <wang.platform@gmail.com>
QingyaFan added a commit to QingyaFan/volcano that referenced this pull request Aug 5, 2024
v1.5 changed the naming logics of pod group by adding UID into the name: volcano-sh#2140, and there is also another fix regarding handling the already created pod group without UID in create or update: volcano-sh#2400. But a similar fix does not exist in the syncJob function.

Fixes volcano-sh#3640
QingyaFan added a commit to QingyaFan/volcano that referenced this pull request Aug 5, 2024
v1.5 changed the naming logics of pod group by adding UID into the name,
and there is a fix handling the already created pod group without UID in
create or update: volcano-sh#2400. But similar fix does not exist in syncJob function.

Fixes volcano-sh#3640
QingyaFan added a commit to QingyaFan/volcano that referenced this pull request Aug 5, 2024
v1.5 changed the naming logics of pod group by adding UID into the name,
and there is a fix handling the already created pod group without UID in
create or update: volcano-sh#2400. But similar fix does not exist in syncJob function.

Fixes volcano-sh#3640

Signed-off-by: cheerfun <qingyafan@outlook.com>
QingyaFan added a commit to QingyaFan/volcano that referenced this pull request Aug 6, 2024
v1.5 changed the naming logics of pod group by adding UID into the name,
and there is a fix handling the already created pod group without UID in
create or update: volcano-sh#2400. But similar fix does not exist in syncJob function.

Fixes volcano-sh#3640

Signed-off-by: cheerfun <qingyafan@outlook.com>
QingyaFan added a commit to QingyaFan/volcano that referenced this pull request Aug 6, 2024
v1.5 changed the naming logics of pod group by adding UID into the name,
and there is a fix handling the already created pod group without UID in
create or update: volcano-sh#2400. But similar fix does not exist in syncJob function.

Fixes volcano-sh#3640

Signed-off-by: cheerfun <qingyafan@outlook.com>
QingyaFan added a commit to QingyaFan/volcano that referenced this pull request Aug 8, 2024
…#3640

v1.5 changed the naming logics of pod group by adding UID into the
name: volcano-sh#2140, and there is also another fix regarding handling the
already created pod group without UID in create or update: volcano-sh#2400.
But a similar fix does not exist in the syncJob function.

Fixes volcano-sh#3640
QingyaFan added a commit to QingyaFan/volcano that referenced this pull request Aug 8, 2024
…#3640

v1.5 changed the naming logics of pod group by adding UID into the
name: volcano-sh#2140, and there is also another fix regarding handling the
already created pod group without UID in create or update: volcano-sh#2400.
But a similar fix does not exist in the syncJob function.

Fixes volcano-sh#3640

Signed-off-by: cheerfun <qingyafan@outlook.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants