Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add admitPod and PGController #165

Closed
wants to merge 1 commit into from

Conversation

wangyuqing4
Copy link
Contributor

@wangyuqing4 wangyuqing4 commented May 15, 2019

Which issue(s) this PR fixes :
Fixes #135 #134

Special notes for your reviewer:

  1. new func AdmitPod in admission controller

  2. new PGcontroller in controller

  3. delete Inqueue job phase

  4. fix UT

Release note:


1. add ValidatingWebhookConfiguration volcano-validate-pod, only limit CREATE pods, allow pods to create when:
- pod.spec.schedulerName is default-scheduler
- podgroup phase isn't Pending
- normal job, no podgroup

2. new PGcontroller, create pg for normal job when use kube-batch.

3. if create job, job phase will be Pending->Running...... , so fix UT

@k82cn
Copy link
Member

k82cn commented May 15, 2019

/cc @TommyLike @hzxuzhonghu

@hzxuzhonghu
Copy link
Collaborator

Have not yet look at the pr, serveral question:

new PGcontroller, create pg for normal job when use kube-batch

Is this for other users instead of volcano job controller? If the user do not use volcano job, how should he disable the job controller in volcano?

if create job, job phase will be Pending->Running...... , so fix UT

Missing the background, just remember the inqueue phase is added days ago, why remove it ?

@wangyuqing4
Copy link
Contributor Author

  1. user can use other controller to create pod

  2. the pr only delete job inqueue phase,pg inqueue phase is retained, can simplify job state convert

@wangyuqing4
Copy link
Contributor Author

/cc @lminzhw

@TommyLike
Copy link
Contributor

In order not to refactor this again, I think we need some clarifications/discussions on why we need do in this way. pasted as below, hope someone can explain this.

  1. Why we need admission to prevent creating pod? that sounds like part of the sync work, and can be easily moved into our controller logic, Place it here will make the controller keeping creating pods and deny, again and again. What's the benefits from using admission service here?

  2. PodGroupController sounds more like (original)JobController here, we need to watch job and create podgroup if needed.

@k82cn @wangyuqing4 @hzxuzhonghu

@k82cn
Copy link
Member

k82cn commented May 15, 2019

xref #135

Currently, we delay pod creattion in job controller which make it hard for two scenarios:

  1. vk.job can not work with other scheduler
  2. enqueue can not support other operators

@TommyLike
Copy link
Contributor

TommyLike commented May 15, 2019

xref #135

Currently, we delay pod creattion in job controller which make it hard for two scenarios:

  1. vk.job can not work with other scheduler
  2. enqueue can not support other operators

This can be addressed by removing enqueue state plus waiting for podgroup status if it's a volcano job while keep creating for the others. Anything missing?

@hzxuzhonghu
Copy link
Collaborator

PodGroupController sounds more like (original)JobController here, we need to watch job and create podgroup if needed.

Same as what I said, if this is for users who are only using kube-batch, this should be a separate component. I donot see it is necessary in volcano.

BTW, I guess the first question is also for kube-batch user only.

@k82cn
Copy link
Member

k82cn commented May 15, 2019

Same as what I said, if this is for users who are only using kube-batch, this should be a separate component. I donot see it is necessary in volcano.

hm... 'kube-batch' is part of volcano. Volcano should be open for others who only use scheduler part, e.g. kubeflow.

This can be addressed by removing enqueue state plus waiting for podgroup status if it's a volcano job while keep creating for the others. Anything missing?

not sure how that works :(, we should not bind controller & scheduler.

@TommyLike
Copy link
Contributor

Same as what I said, if this is for users who are only using kube-batch, this should be a separate component. I donot see it is necessary in volcano.

hm... 'kube-batch' is part of volcano. Volcano should be open for others who only use scheduler part, e.g. kubeflow.

This can be addressed by removing enqueue state plus waiting for podgroup status if it's a volcano job while keep creating for the others. Anything missing?

not sure how that works :(, we should not bind controller & scheduler.

hmm with a quick thought, the pseudo code would be:

function syncJob(job) {
//1. Create job related resources, for example, service, plugin, configmap
//2. Create pods logic as below.
if IsVolcanoJob() && ! IsPodGroupEnqueue(job) {
   // Skip creating pods
} else {
  //Create pods as required.
}
}

@hzxuzhonghu
Copy link
Collaborator

hm... 'kube-batch' is part of volcano. Volcano should be open for others who only use scheduler part, e.g. kubeflow.

I know, i mean, if others not use Volcano Job, why do we enforce them to deploy all the volcano controllers? We should allow to specify which controller to start.

@k82cn
Copy link
Member

k82cn commented May 15, 2019

why do we enforce them to deploy all the volcano controllers? We should allow to specify which controller to start.

I'd like to control the number of binaries; that should be ok if they do not create volcano Job by yaml file :)

@k82cn
Copy link
Member

k82cn commented May 15, 2019

hmm with a quick thought, the pseudo code would be:

PodGroupController does not create pods.

@TommyLike
Copy link
Contributor

why do we enforce them to deploy all the volcano controllers? We should allow to specify which controller to start.

I'd like to control the number of binaries; that should be ok if they do not create volcano Job by yaml file :)

hmm with a quick thought, the pseudo code would be:

PodGroupController does not create pods.

I was mean in the job controller, but I think I get the point now, the new added logic in admission service is used to block ANY pods that are created with non-default scheduler? This sounds like a pure kube-batch logic but we address this in volcano?

@k82cn
Copy link
Member

k82cn commented May 15, 2019

kube-batch is part of volcano!!! * 3 :)

Copy link
Contributor

@TommyLike TommyLike left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

pkg/controllers/podgroup/pg_controller_handler.go Outdated Show resolved Hide resolved
pkg/controllers/podgroup/pg_controller.go Outdated Show resolved Hide resolved
cmd/controllers/app/server.go Outdated Show resolved Hide resolved
pkg/controllers/job/job_controller_actions.go Outdated Show resolved Hide resolved
pkg/admission/admit_pod.go Outdated Show resolved Hide resolved
pkg/apis/helpers/helpers.go Outdated Show resolved Hide resolved
@k82cn
Copy link
Member

k82cn commented May 17, 2019

please resolve the conflict.

@volcano-sh-bot volcano-sh-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 27, 2019
@volcano-sh-bot volcano-sh-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 15, 2019
@TravisBuddy
Copy link

Hey @wangyuqing4,
Something went wrong with the build.

TravisCI finished with status errored, which means the build failed because of something unrelated to the tests, such as a problem with a dependency or the build process itself.

View build log

TravisBuddy Request Identifier: 8ac87940-a6bb-11e9-a3be-7d74802580dc

@volcano-sh-bot volcano-sh-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 15, 2019
@volcano-sh-bot volcano-sh-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 16, 2019
@TommyLike
Copy link
Contributor

lgtm, but we need other reviews since it's a huge change.

/assign @hzxuzhonghu

@@ -44,6 +44,7 @@ type Config struct {
PrintVersion bool
AdmissionServiceName string
AdmissionServiceNamespace string
SchedulerName string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's better to be an label selector; schedulerName is not enough.

@@ -177,6 +183,42 @@ func RegisterWebhooks(c *Config, clienset *kubernetes.Clientset, cabundle []byte
return err
}

// Prepare validate pods
path = "/pods"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest /validating-pods, should consider if we need mutating in the future

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

like job,
image

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

legacy

cmd/admission/app/server.go Outdated Show resolved Hide resolved
hack/e2e-admission-config.yaml Outdated Show resolved Hide resolved
installer/helm/chart/volcano/values.yaml Outdated Show resolved Hide resolved
pkg/admission/admission_controller.go Outdated Show resolved Hide resolved
return true
}
return false
default:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is another case, when delete event miss, please refer to DeletedFinalStateUnknown

nvm, it only handles pod add

pkg/controllers/podgroup/pg_controller.go Show resolved Hide resolved
pkg/controllers/podgroup/pg_controller.go Outdated Show resolved Hide resolved
ObjectMeta: metav1.ObjectMeta{
Namespace: pod.Namespace,
Name: pgName,
OwnerReferences: pod.OwnerReferences,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do not understand why set pg's OwnerReferences same with pod's

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

show the pg and pod belong the same thing

},
}

if _, err := cc.kbClients.SchedulingV1alpha1().PodGroups(pod.Namespace).Create(pg); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

previously PG is created by job controller, right?

Why also create it here? If a pod already created, is there a need to create a PG with MinMember equals 1? The pod maybe already running.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the queue, isn't vk job but use volcano scheduler, so create pg here.

@hzxuzhonghu
Copy link
Collaborator

overall lgtm

pkg/admission/admit_pod.go Outdated Show resolved Hide resolved
@volcano-sh-bot volcano-sh-bot added the lgtm Indicates that a PR is ready to be merged. label Jul 17, 2019
@volcano-sh-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: hzxuzhonghu, wangyuqing4
To complete the pull request process, please assign hex108
You can assign the PR to them by writing /assign @hex108 in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@hzxuzhonghu
Copy link
Collaborator

@k82cn any other comments

if job.Status.State.Phase != "" {
return job, nil
return false, nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we remove bool?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the func initJobStatus in createJob need UpdateStatus, if remove bool, will go on UpdateStatus and return err

@k82cn
Copy link
Member

k82cn commented Jul 20, 2019

let split this huge PR into some smaller ones:

  1. PodGroup Controller
  2. Remove ShadowPodGroup in scheduler
  3. Add admission controller
  4. Remove inqueue state in job & add e2e test for deployment

Each PR should include UT for it.

@volcano-sh-bot volcano-sh-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 20, 2019
@volcano-sh-bot
Copy link
Contributor

@wangyuqing4: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm Indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Refactor Delay Pod Creation by admission controller
7 participants