Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[extension/cgroupruntime] Be aware of ECS task and CPU limits #36920

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

r0mdau
Copy link

@r0mdau r0mdau commented Dec 21, 2024

Description

Allow the cgroupruntime extension to set GOMAXPROCS based on AWS ECS metadata. See related issue for detailed informations.

Link to tracking issue

Fixes #36814

Testing

Added integration test with httptest for the ECS metadata endpoint.

Something to clarify: #36617 (comment)

Documentation

Added extension name and link in the README.

Signed-off-by: Romain Dauby <romain.dauby@gmail.com>
Signed-off-by: Romain Dauby <romain.dauby@gmail.com>
Copy link
Contributor

@rogercoll rogercoll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this, the code lgtm, just left a couple of nits/question.

Comment on lines +46 to 57
var undo func()
var err error
if gomaxecs.IsECS() {
undo, err = gomaxecs.Set(gomaxecs.WithLogger(func(str string, params ...any) {
set.Logger.Debug(fmt.Sprintf(str, params))
}))
} else {
undo, err = maxprocs.Set(maxprocs.Logger(func(str string, params ...any) {
set.Logger.Debug(fmt.Sprintf(str, params))
}))
}
return undoFunc(undo), err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Suggested change
var undo func()
var err error
if gomaxecs.IsECS() {
undo, err = gomaxecs.Set(gomaxecs.WithLogger(func(str string, params ...any) {
set.Logger.Debug(fmt.Sprintf(str, params))
}))
} else {
undo, err = maxprocs.Set(maxprocs.Logger(func(str string, params ...any) {
set.Logger.Debug(fmt.Sprintf(str, params))
}))
}
return undoFunc(undo), err
if gomaxecs.IsECS() {
return gomaxecs.Set(gomaxecs.WithLogger(func(str string, params ...any) {
set.Logger.Debug(fmt.Sprintf(str, params))
}))
} else {
return maxprocs.Set(maxprocs.Logger(func(str string, params ...any) {
set.Logger.Debug(fmt.Sprintf(str, params))
}))
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, today is nit day, wdyt ?

			if gomaxecs.IsECS() {
				return gomaxecs.Set(gomaxecs.WithLogger(func(str string, params ...any) {
					set.Logger.Debug(fmt.Sprintf(str, params))
				}))
			}
			return maxprocs.Set(maxprocs.Logger(func(str string, params ...any) {
				set.Logger.Debug(fmt.Sprintf(str, params))
			}))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me 🚀

@@ -1,10 +1,11 @@
module github.com/open-telemetry/opentelemetry-collector-contrib/extension/cgroupruntimeextension

go 1.22.0
go 1.22.4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially, I thought that the component's Go version needed to updated all at once, but it seems there are some components already on go 1.22.7. 👍

@@ -220,6 +257,13 @@ func TestCgroupV2SudoIntegration(t *testing.T) {
})
require.NoError(t, err)

if test.setECSMetadataURI {
server := startMockECSServer()
defer server.Close()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wdyt if we move this defer calls to the t.Cleanup function? That way is aligned with the other resources rollback strategy, and we are also safe on panics.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unsure how to implement it as I see multiple options. I need to clean only if test.setECSMetadataURI is true. Because I find it handy to start the mock http server only if this same bool is true.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me, I think you could append the cleanup function inside the conditional. Something like:

			if test.setECSMetadataURI {
				server := startMockECSServer()
				os.Setenv("ECS_CONTAINER_METADATA_URI_V4", server.URL)
				t.Cleanup(func() {
					server.Close()
					os.Unsetenv("ECS_CONTAINER_METADATA_URI_V4")
				})
			}

@@ -63,6 +66,21 @@ func cgroupMaxCpu(filename string) (quota int64, period uint64, err error) {
return quota, period, err
}

func startMockECSServer() *httptest.Server {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the integration test interact with this server? Is it a dependency for gomaxecs.IsECS()?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the gomaxecs library yes, it is making a http call to the ECS metadata uri.

Does it truly interacts ? I am not sure even for the github action CI, see my comment in the integration test PR.

Commands launched in Github action CI:

gotestsum --rerun-fails=1 --packages="./..." -- -race -timeout 360s -parallel 4 -skip Sudo -tags=integration,""
gotestsum --rerun-fails=1 --packages="./..." -- -race -timeout 360s -parallel 4 -skip Sudo -exec sudo -run Sudo -tags=integration,""

On my local workstation it is Skipped, compared to the github ci, I removed the -skip Sudo flag, the output:

$ cd extension/cgroupruntimeextension
$ ../../.tools/gotestsum --rerun-fails=1 --packages="./..." -- -race -timeout 360s -parallel 4 -tags=integration,""
∅  internal/metadata
✓  . (1.056s)

=== Skipped
=== SKIP: . TestCgroupV2SudoIntegration (0.00s)
    integration_test.go:48: System running in hybrid or cgroupv1 mode

DONE 15 tests, 1 skipped in 5.366s

I could run the tests in a docker container with some cgroups, open to any suggestion for making the integration test pass.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you help me identify where the gomaxecs library does the http call to the ECS metadata URI? I quickly checked the IsECS method and brought me to this GetECSMetadataURI method, which just looks for the environment variable.

I could run the tests in a docker container with some cgroups, open to any suggestion for making the integration test pass.

Nice catch, just pushed this small fix to make them run #36941

It seems that a changelog file is missing (make ch-new) and the linting CI is complaining. Could you please take a look?

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[extension/cgroupruntime] Be aware of ECS task and CPU limits
3 participants