
add podgroup controller #401

Merged (1 commit) on Aug 1, 2019
Conversation

wangyuqing4 (Contributor):

Fixes #134

If a normal Pod/Job/... uses the volcano scheduler, the podgroup controller watches the Pod and can create a PodGroup for it.

@volcano-sh-bot volcano-sh-bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jul 31, 2019
@hzxuzhonghu (Collaborator):

Can you add a description of the relation with #165 and #370? I don't know which one to review.

@wangyuqing4 (Contributor, Author) commented Jul 31, 2019

#165 is huge, so it was split into smaller ones: #401 is part 1 and #370 is part 3. @hzxuzhonghu you can review #401 first.


@TravisBuddy:

Hey @wangyuqing4,
Something went wrong with the build.

TravisCI finished with status errored, which means the build failed because of something unrelated to the tests, such as a problem with a dependency or the build process itself.

View build log

TravisBuddy Request Identifier: 395cfd80-b34d-11e9-a522-656c855f12dd

@hzxuzhonghu (Collaborator):

ok

switch obj.(type) {
case *v1.Pod:
pod := obj.(*v1.Pod)
if pod.Spec.SchedulerName == "volcano" &&
Member:

We should not hard-code the scheduler name here.

Contributor (Author):

done
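One way to address the review comment above is to keep the scheduler name in a controller field populated at startup (e.g. from a command-line flag) instead of a string literal. The sketch below is illustrative only, with local stand-in types rather than volcano's actual `Controller` and client-go's `v1.Pod`:

```go
package main

import "fmt"

// Stand-ins for the client-go Pod types used in the real controller.
type PodSpec struct{ SchedulerName string }
type Pod struct{ Spec PodSpec }

// Controller keeps the scheduler name as a field, so it can be set
// from configuration instead of being hard-coded at the call site.
type Controller struct {
	schedulerName string
}

// isManagedPod reports whether this controller should handle the pod.
func (cc *Controller) isManagedPod(pod *Pod) bool {
	return pod.Spec.SchedulerName == cc.schedulerName
}

func main() {
	// The name would come from a flag such as --scheduler-name in practice.
	cc := &Controller{schedulerName: "volcano"}
	fmt.Println(cc.isManagedPod(&Pod{Spec: PodSpec{SchedulerName: "volcano"}}))
	fmt.Println(cc.isManagedPod(&Pod{Spec: PodSpec{SchedulerName: "default-scheduler"}}))
}
```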

queue: workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter()),
}

cc.sharedInformers = informers.NewSharedInformerFactory(cc.kubeClients, 0)
Collaborator:

One point: we should prevent duplicate informers across different controllers; they are very costly.

Contributor (Author):

OK, a follow-up PR will optimize this part; the gc/queue/job/pg controllers will remove the duplicated code.

Collaborator:

agreed, please file an issue to track

@hzxuzhonghu (Collaborator) left a comment:

Overall LGTM, but a few nits.

func (cc *Controller) Run(stopCh <-chan struct{}) {
go cc.sharedInformers.Start(stopCh)
go cc.podInformer.Informer().Run(stopCh)
go cc.pgInformer.Informer().Run(stopCh)
Collaborator:

We do not need to Run the informers separately, as sharedInformers.Start will run the informers created from the informer factory.

Collaborator:

The same happens in other controllers.
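The reviewer's point is that a shared informer factory's Start launches every informer created from it, so the extra `go cc.podInformer.Informer().Run(stopCh)` calls are redundant. A toy stand-in (not client-go's actual factory) demonstrating that one Start call covers all registered informers:

```go
package main

import "fmt"

// Toy informer; client-go's real informers run in goroutines.
type informer struct{ name string }

func (i *informer) Run(stopCh <-chan struct{}) { fmt.Println("running", i.name) }

// Toy shared factory: it remembers every informer it hands out.
type sharedInformerFactory struct{ informers []*informer }

func (f *sharedInformerFactory) InformerFor(name string) *informer {
	inf := &informer{name: name}
	f.informers = append(f.informers, inf)
	return inf
}

// Start runs all informers created from this factory, so callers
// never need to Run each one individually.
func (f *sharedInformerFactory) Start(stopCh <-chan struct{}) {
	for _, inf := range f.informers {
		inf.Run(stopCh)
	}
}

func main() {
	f := &sharedInformerFactory{}
	f.InformerFor("pods")
	f.InformerFor("podgroups")
	stopCh := make(chan struct{})
	// One Start call is enough for both informers.
	f.Start(stopCh)
}
```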

}

req := podRequest{
pod: pod,
Collaborator:

Actually, you just need the pod ns/name

func (cc *Controller) Run(stopCh <-chan struct{}) {
go cc.sharedInformers.Start(stopCh)
go cc.podInformer.Informer().Run(stopCh)
go cc.pgInformer.Informer().Run(stopCh)
Collaborator:

cc.sharedInformers.Start will run all its informers.

)

type podRequest struct {
pod *v1.Pod
Collaborator:

only need pod ns/name

Member:

We cannot use a pointer here: the same pod may map to different objects in the cache.
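Both reviewers converge on the same fix: queue only the pod's namespace/name and let the worker re-fetch the pod from the lister, instead of holding a possibly stale `*v1.Pod`. A minimal sketch of that shape (field and helper names are illustrative, not volcano's actual code):

```go
package main

import "fmt"

// podRequest carries only the key needed to look the pod up again.
// Storing a *Pod pointer would pin a stale cached object.
type podRequest struct {
	podName      string
	podNamespace string
}

// keyFor builds the canonical namespace/name cache key, the same
// format client-go's cache.MetaNamespaceKeyFunc produces.
func keyFor(req podRequest) string {
	return req.podNamespace + "/" + req.podName
}

func main() {
	req := podRequest{podName: "nginx-0", podNamespace: "default"}
	fmt.Println(keyFor(req)) // the worker would pass this key to a lister
}
```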

// Run starts the podgroup controller.
func (cc *Controller) Run(stopCh <-chan struct{}) {
go cc.sharedInformers.Start(stopCh)
go cc.podInformer.Informer().Run(stopCh)
Member:

If we start the sharedInformer, it's not necessary to start the pod informer again.

Collaborator:

The same thing happens in other controllers.

if pod.Annotations[scheduling.GroupNameAnnotationKey] == "" {
pod.Annotations[scheduling.GroupNameAnnotationKey] = pgName
} else {
return nil
Member:

Log an error message if pod.Annotations[scheduling.GroupNameAnnotationKey] != pgName.
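The suggestion is to distinguish "annotation already correct" from "annotation points at a different podgroup" and surface the latter, rather than returning nil silently in both cases. A hedged sketch with a plain map and an illustrative annotation key (not volcano's actual `scheduling.GroupNameAnnotationKey` value):

```go
package main

import "fmt"

// Illustrative annotation key; the real controller uses
// scheduling.GroupNameAnnotationKey from the volcano API package.
const groupNameAnnotationKey = "scheduling.volcano.sh/group-name"

// syncAnnotation sets the podgroup annotation if absent, and reports
// a mismatch instead of silently returning when it disagrees.
func syncAnnotation(annotations map[string]string, pgName string) string {
	switch existing := annotations[groupNameAnnotationKey]; {
	case existing == "":
		annotations[groupNameAnnotationKey] = pgName
		return "set"
	case existing != pgName:
		// The reviewer's request: log this instead of dropping it.
		return fmt.Sprintf("error: pod already in podgroup %s, expected %s", existing, pgName)
	default:
		return "ok"
	}
}

func main() {
	ann := map[string]string{}
	fmt.Println(syncAnnotation(ann, "podgroup-abc"))
	fmt.Println(syncAnnotation(ann, "podgroup-abc"))
	fmt.Println(syncAnnotation(ann, "podgroup-xyz"))
}
```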

ObjectMeta: metav1.ObjectMeta{
Namespace: pod.Namespace,
Name: pgName,
OwnerReferences: pod.OwnerReferences,
Member:

If the pod's ownerReferences is empty, set the PodGroup's owner references to this Pod for GC.

func generatePodgroupName(pod *v1.Pod) string {
pgName := vkbatchv1.PodgroupNamePrefix
if len(pod.OwnerReferences) != 0 {
pgName += string(pod.OwnerReferences[0].UID)
Member:

We should find the controlling owner.
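The point is that `OwnerReferences[0]` is not necessarily the controlling owner; in apimachinery this lookup is what `metav1.GetControllerOf` does (it returns the reference with `Controller: true`). A self-contained sketch with a stand-in `OwnerReference` type:

```go
package main

import "fmt"

// Stand-in for metav1.OwnerReference; only the fields needed here.
type OwnerReference struct {
	UID        string
	Controller *bool
}

// getControllerOf picks the owner marked Controller=true, rather than
// blindly taking the first entry of the slice.
func getControllerOf(refs []OwnerReference) *OwnerReference {
	for i := range refs {
		if refs[i].Controller != nil && *refs[i].Controller {
			return &refs[i]
		}
	}
	return nil
}

func main() {
	yes := true
	refs := []OwnerReference{
		{UID: "uid-helper"},                       // non-controlling owner
		{UID: "uid-replicaset", Controller: &yes}, // the controlling owner
	}
	if owner := getControllerOf(refs); owner != nil {
		// The podgroup name would be derived from the controller's UID.
		fmt.Println("podgroup-" + owner.UID)
	}
}
```

In the real controller, using `metav1.GetControllerOf(pod)` directly avoids reimplementing this loop.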

@TravisBuddy:

Hey @wangyuqing4,
Something went wrong with the build.

TravisCI finished with status errored, which means the build failed because of something unrelated to the tests, such as a problem with a dependency or the build process itself.

View build log

TravisBuddy Request Identifier: e001cd50-b43d-11e9-a5f8-ad46f5ea24b0


func newPGOwnerReferences(pod *v1.Pod) []metav1.OwnerReference {
if len(pod.OwnerReferences) != 0 {
return pod.OwnerReferences
Member:

There may be no controlling ownerReference there.
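Combining this comment with the earlier GC remark: return the controlling owner when one exists, and otherwise fall back to a reference to the pod itself so an orphan pod's PodGroup is garbage-collected along with it. A hedged sketch with stand-in types (the real code would build the fallback with `metav1.NewControllerRef`):

```go
package main

import "fmt"

// Stand-ins for metav1.OwnerReference and v1.Pod.
type OwnerReference struct {
	Name       string
	Controller *bool
}
type Pod struct {
	Name            string
	OwnerReferences []OwnerReference
}

// newPGOwnerReferences returns the pod's controlling owner if present;
// otherwise it makes the pod itself the PodGroup's owner for GC.
func newPGOwnerReferences(pod *Pod) []OwnerReference {
	for i := range pod.OwnerReferences {
		ref := pod.OwnerReferences[i]
		if ref.Controller != nil && *ref.Controller {
			return []OwnerReference{ref}
		}
	}
	isController := true
	return []OwnerReference{{Name: pod.Name, Controller: &isController}}
}

func main() {
	orphan := &Pod{Name: "standalone-pod"}
	fmt.Println(newPGOwnerReferences(orphan)[0].Name)
}
```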

@hzxuzhonghu (Collaborator):

/lgtm

@volcano-sh-bot volcano-sh-bot added the lgtm Indicates that a PR is ready to be merged. label Aug 1, 2019
@k82cn (Member) commented Aug 1, 2019

/approve

@volcano-sh-bot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: k82cn, wangyuqing4

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@volcano-sh-bot volcano-sh-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 1, 2019
@volcano-sh-bot volcano-sh-bot merged commit 7243054 into volcano-sh:master Aug 1, 2019
Labels
approved: Indicates a PR has been approved by an approver from all required OWNERS files.
lgtm: Indicates that a PR is ready to be merged.
size/XL: Denotes a PR that changes 500-999 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Add PodGroupController to create shadow PodGroup
5 participants