-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[extension/cgroupruntime]: Initial implementation #35472
[extension/cgroupruntime]: Initial implementation #35472
Conversation
Manually tested with a modified version of the OpenTelemetry collector that exposes the internal runtime metrics: https://pkg.go.dev/runtime/metrics Deployed with very limited cgroup resources:
Resulting runtime metrics:
|
This PR was marked stale due to lack of activity. It will be closed in 14 days. |
Closed as inactive. Feel free to reopen if this PR is still being worked on. |
@mx-psi would you still be able to sponsor this new component? |
@rogercoll Yes! Happy to review this whenever you let me know it is ready |
@mx-psi Great! Moved the PR to ready for review, either for this PR or for a follow-up I think it would be great to have a way to create and run the corresponding integration tests. Thanks! |
Extension "info" logs:
|
8496b92
to
1f0c26f
Compare
Co-authored-by: Pablo Baeyens <pbaeyens31+github@gmail.com>
Co-authored-by: Pablo Baeyens <pbaeyens31+github@gmail.com>
I think this can be merged without the integration tests, but some sort of tests should be a requirement for inclusion in the contrib distro. @rogercoll Could you file a separate issue for this so we can track it? |
Makes sense to me, tracking issue #36545 |
**Description:** <Describe what has changed.> <!--Ex. Fixing a bug - Describe the bug and how this fixes the issue. Ex. Adding a feature - Explain what this achieves.--> This PR adds the initial implementation of a new component to dynamically set the values of `GOMEMLIMIT` and `GOMAXPROCS` used by the Go runtime. Those values are normally manually aligned with the cgroup resource limit to prevent cpu throttling or out of memory scenarios. The component would ease the manual steps of configuring these environment variables in K8s deployments (e.g Helm [templates](https://github.com/open-telemetry/opentelemetry-helm-charts/blob/main/charts/opentelemetry-collector/templates/_helpers.tpl#L169)) in addition to have fine-grained values (e.g. 90% of the resource memory limits). **Link to tracking Issue:** <Issue number if applicable> open-telemetry#30289 **Testing:** <Describe what testing was performed and which tests were added.> Unit testing for the component has been added (config and extension start/stop). But ideally, an integration test that actually asserts the runtime modifications should be added as well. The extension relies on "github.com/KimMachineGun/automemlimit/memlimit" and "go.uber.org/automaxprocs/maxprocs" packages for the runtime modifications, but they don't provide a way to mock the "cgroups" file system which is the one they read to get the resource quota limits. - Automemlimit package tests expect to run in a cgroup environment: https://github.com/KimMachineGun/automemlimit/blob/main/memlimit/cgroups_test.go#L18 - Automaxprocs does not expose the cpu quota retrieval https://github.com/uber-go/automaxprocs/blob/master/maxprocs/maxprocs.go#L41 Any suggestion on how to perform this integration tests in the contrib repository? One possibility is to use the https://github.com/containerd/cgroups package to set the quota, but this requires privileged permissions (also in the GHA) **Documentation:** <Describe the documentation added.> --------- Co-authored-by: Pablo Baeyens <pbaeyens31+github@gmail.com>
**Description:** <Describe what has changed.> <!--Ex. Fixing a bug - Describe the bug and how this fixes the issue. Ex. Adding a feature - Explain what this achieves.--> This PR adds the initial implementation of a new component to dynamically set the values of `GOMEMLIMIT` and `GOMAXPROCS` used by the Go runtime. Those values are normally manually aligned with the cgroup resource limit to prevent cpu throttling or out of memory scenarios. The component would ease the manual steps of configuring these environment variables in K8s deployments (e.g Helm [templates](https://github.com/open-telemetry/opentelemetry-helm-charts/blob/main/charts/opentelemetry-collector/templates/_helpers.tpl#L169)) in addition to have fine-grained values (e.g. 90% of the resource memory limits). **Link to tracking Issue:** <Issue number if applicable> open-telemetry#30289 **Testing:** <Describe what testing was performed and which tests were added.> Unit testing for the component has been added (config and extension start/stop). But ideally, an integration test that actually asserts the runtime modifications should be added as well. The extension relies on "github.com/KimMachineGun/automemlimit/memlimit" and "go.uber.org/automaxprocs/maxprocs" packages for the runtime modifications, but they don't provide a way to mock the "cgroups" file system which is the one they read to get the resource quota limits. - Automemlimit package tests expect to run in a cgroup environment: https://github.com/KimMachineGun/automemlimit/blob/main/memlimit/cgroups_test.go#L18 - Automaxprocs does not expose the cpu quota retrieval https://github.com/uber-go/automaxprocs/blob/master/maxprocs/maxprocs.go#L41 Any suggestion on how to perform this integration tests in the contrib repository? One possibility is to use the https://github.com/containerd/cgroups package to set the quota, but this requires privileged permissions (also in the GHA) **Documentation:** <Describe the documentation added.> --------- Co-authored-by: Pablo Baeyens <pbaeyens31+github@gmail.com>
**Description:** <Describe what has changed.> <!--Ex. Fixing a bug - Describe the bug and how this fixes the issue. Ex. Adding a feature - Explain what this achieves.--> This PR adds the initial implementation of a new component to dynamically set the values of `GOMEMLIMIT` and `GOMAXPROCS` used by the Go runtime. Those values are normally manually aligned with the cgroup resource limit to prevent cpu throttling or out of memory scenarios. The component would ease the manual steps of configuring these environment variables in K8s deployments (e.g Helm [templates](https://github.com/open-telemetry/opentelemetry-helm-charts/blob/main/charts/opentelemetry-collector/templates/_helpers.tpl#L169)) in addition to have fine-grained values (e.g. 90% of the resource memory limits). **Link to tracking Issue:** <Issue number if applicable> open-telemetry#30289 **Testing:** <Describe what testing was performed and which tests were added.> Unit testing for the component has been added (config and extension start/stop). But ideally, an integration test that actually asserts the runtime modifications should be added as well. The extension relies on "github.com/KimMachineGun/automemlimit/memlimit" and "go.uber.org/automaxprocs/maxprocs" packages for the runtime modifications, but they don't provide a way to mock the "cgroups" file system which is the one they read to get the resource quota limits. - Automemlimit package tests expect to run in a cgroup environment: https://github.com/KimMachineGun/automemlimit/blob/main/memlimit/cgroups_test.go#L18 - Automaxprocs does not expose the cpu quota retrieval https://github.com/uber-go/automaxprocs/blob/master/maxprocs/maxprocs.go#L41 Any suggestion on how to perform this integration tests in the contrib repository? One possibility is to use the https://github.com/containerd/cgroups package to set the quota, but this requires privileged permissions (also in the GHA) **Documentation:** <Describe the documentation added.> --------- Co-authored-by: Pablo Baeyens <pbaeyens31+github@gmail.com>
Description:
This PR adds the initial implementation of a new component to dynamically set the values of
GOMEMLIMIT
andGOMAXPROCS
used by the Go runtime. Those values are normally manually aligned with the cgroup resource limit to prevent cpu throttling or out of memory scenarios.The component would ease the manual steps of configuring these environment variables in K8s deployments (e.g Helm templates) in addition to have fine-grained values (e.g. 90% of the resource memory limits).
Link to tracking Issue: #30289
Testing: Unit testing for the component has been added (config and extension start/stop). But ideally, an integration test that actually asserts the runtime modifications should be added as well. The extension relies on "github.com/KimMachineGun/automemlimit/memlimit" and "go.uber.org/automaxprocs/maxprocs" packages for the runtime modifications, but they don't provide a way to mock the "cgroups" file system which is the one they read to get the resource quota limits.
Any suggestion on how to perform this integration tests in the contrib repository? One possibility is to use the https://github.com/containerd/cgroups package to set the quota, but this requires privileged permissions (also in the GHA)
Documentation: