Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for adjusting OOM score adjustment. #94

Merged
merged 1 commit into from
Aug 19, 2024

Conversation

zxj874478410
Copy link
Contributor

@zxj874478410 zxj874478410 commented Jul 10, 2024

The OomScoreAdj field is added to the LinuxContainerAdjustment structure of the NRI plugin. This field records the value of oom_score_adj that needs to be adjusted during CreateContainer. Added the appropriate set method and the method to write back to the NRI response.
Fix #92

TEST METHOD:
We build a NRI plugin with CreateContainer function like this:

func (p *plugin) CreateContainer(_ context.Context, pod *api.PodSandbox, ctr *api.Container) (*api.ContainerAdjustment, []*api.ContainerUpdate, error) {
	log.Infof("Creating container %s/%s/%s...", pod.GetNamespace(), pod.GetName(), ctr.GetName())

	//
	// This is the container creation request handler. Because the container
	// has not been created yet, this is the lifecycle event which allows you
	// the largest set of changes to the container's configuration, including
	// some of the later immautable parameters. Take a look at the adjustment
	// functions in pkg/api/adjustment.go to see the available controls.
	//
	// In addition to reconfiguring the container being created, you are also
	// allowed to update other existing containers. Take a look at the update
	// functions in pkg/api/update.go to see the available controls.
	//

	adjustment := &api.ContainerAdjustment{}

	adjustment.SetLinuxOomScoreAdj(121)
	log.Infof("set oom score: %v", 121)
	return adjustment, nil, nil
}

Then run this NRI plugin with containerd. Create a k8s pod of apache.

[root@master bin]# kubectl get po -A |grep apache
kube-system    apache-66f99dd558-c9bgm          1/1     Running   0               4h5m

Run the docker inspect command to query the process ID of the Apache container is 3445531.

At last, query the oom_score_adj of this pid. It has been set to 121 successfully.

[root@master bin]# cat /proc/3445531/oom_score_adj
121

pkg/api/api.proto Outdated Show resolved Hide resolved
pkg/api/api.proto Outdated Show resolved Hide resolved
Copy link
Member

@klihub klihub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank's for the PR @zxj874478410 ! I have a few nits, but otherwise it looks good to me.

@zxj874478410
Copy link
Contributor Author

Thank's for the PR @zxj874478410 ! I have a few nits, but otherwise it looks good to me.

Thanks for the review @klihub . I've fixed these faults.

pkg/api/adjustment.go Outdated Show resolved Hide resolved
Copy link

@chenzhiwei chenzhiwei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@klihub klihub self-requested a review July 11, 2024 13:06
@klihub klihub changed the title NRI plugins support adjust oom_score_adj NRI support for adjusting OOM score adjustment. Jul 12, 2024
@klihub klihub changed the title NRI support for adjusting OOM score adjustment. Add support for adjusting OOM score adjustment. Jul 12, 2024
klihub
klihub previously approved these changes Jul 19, 2024
Copy link
Member

@mikebrow mikebrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wondering if we should be able to nil out the oooscoreadj

pkg/api/adjustment.go Outdated Show resolved Hide resolved
pkg/runtime-tools/generate/generate.go Show resolved Hide resolved
pkg/api/adjustment.go Outdated Show resolved Hide resolved
func (g *Generator) AdjustOomScoreAdj(score *nri.OptionalInt) {
if score != nil {
g.SetProcessOOMScoreAdj(int(score.Value))
}
Copy link
Contributor Author

@zxj874478410 zxj874478410 Jul 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion! After we set OomScoreAdj to nil, We can handle the nil here.
I try to call the func g.initConfigProcess() by calling func g.SetProcessOOMScoreAdj(0), and then set g.Config.Process.OOMScoreAdj to nil. Will it work? @mikebrow

Suggested change
}
} else {
g.SetProcessOOMScoreAdj(0)
g.Config.Process.OOMScoreAdj = nil
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sgtm let's try it / test it out

Copy link
Contributor Author

@zxj874478410 zxj874478410 Aug 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We tested it with the following code

oomScoreAdj:= 555
adjustment.SetLinuxOomScoreAdj(&oomScoreAdj)
adjustment.SetLinuxOomScoreAdj(nil)

The second set unset the oomscoreadj with the api. Finally, the oom_score_adj is set to 996 by kubelet.

cat /proc/2213925/oom_score_adj
996

This is work! @mikebrow

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code path appears to be the source of #117. @mikebrow, do we need to support a plugin both setting an explicit adjustment and then clearing its own adjustment?

Copy link
Member

@mikebrow mikebrow Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nod.. while I think it's a valid path to unset given 0 / 999 etc.. is not unset.. we need to ensure we are not accidentally unsetting a valid oom from kubelet. @klihub FYI

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

manual:

           Sets the adjustment value for the Linux kernel's
           Out-Of-Memory (OOM) killer score for executed processes.
           Takes an integer between -1000 (to disable OOM killing of
           processes of this unit) and 1000 (to make killing of
           processes of this unit under memory pressure very likely).
           See The /proc Filesystem[6] for details. If not specified
           defaults to the OOM score adjustment level of the service
           manager itself, which is normally at 0.

note behavior normally 0 when unset, but is configurable making unset a unique state

@zxj874478410 zxj874478410 force-pushed the main-dev branch 2 times, most recently from 94de1db to f3f0bd7 Compare August 6, 2024 02:05
Copy link
Member

@mikebrow mikebrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zxj874478410 zxj874478410 force-pushed the main-dev branch 2 times, most recently from 1712c1a to 83f6ab0 Compare August 7, 2024 02:16
@zxj874478410
Copy link
Contributor Author

zxj874478410 commented Aug 7, 2024

@klihub @mikebrow Just rebased code and fixed conflicts, please take a look, thanks.

Copy link
Member

@mikebrow mikebrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Signed-off-by: Xiaojin Zhang <874478410@qq.com>
@zxj874478410
Copy link
Contributor Author

LGTM

@zxj874478410
Copy link
Contributor Author

@klihub @mikebrow Sorry for conflict again. Just fixed conflicts, please take a look, thanks.

@zxj874478410 zxj874478410 reopened this Aug 19, 2024
@klihub
Copy link
Member

klihub commented Aug 19, 2024

@zxj874478410 This fix is not sufficient in it's current form as it breaks replacing a mount with another one to the same destination. I think we'd need a bit of additional code to prevent that from happening. Maybe something like this, where 8f509e7 is an updated version of your commit, and 7ea0a47 adds a new test case for mount removal.

@klihub klihub dismissed their stale review August 19, 2024 10:42

@zxj874478410 PTAL at the latest review comments and the suggested fixes/improvements.

@zxj874478410
Copy link
Contributor Author

@zxj874478410 This fix is not sufficient in it's current form as it breaks replacing a mount with another one to the same destination. I think we'd need a bit of additional code to prevent that from happening. Maybe something like this, where 8f509e7 is an updated version of your commit, and 7ea0a47 adds a new test case for mount removal.

Thanks for the detailed instructions, but does this problem refer to another pr #87 ? This one refers to OOM score adjustment.

@klihub
Copy link
Member

klihub commented Aug 19, 2024

@zxj874478410 This fix is not sufficient in it's current form as it breaks replacing a mount with another one to the same destination. I think we'd need a bit of additional code to prevent that from happening. Maybe something like this, where 8f509e7 is an updated version of your commit, and 7ea0a47 adds a new test case for mount removal.

Thanks for the detailed instructions, but does this problem refer to another pr #87 ? This one refers to OOM score adjustment.

Argh... Sorry about that. Yes, you are absolutely right, it is about PR #87. Time to renew my glasses or get a brain transplant...

@klihub klihub self-requested a review August 19, 2024 11:40
Copy link
Member

@mikebrow mikebrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zxj874478410

This comment was marked as off-topic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support adjust oom_Score_adj for NRI plugins
5 participants