-
Notifications
You must be signed in to change notification settings - Fork 736
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Memory management is not correct when using cgroup v2 #14190
Comments
@tajila fyi |
@babsingh Can you take a look at this one |
@babsingh I have got an internal sev1 case with the same issue.can you please have a look at it? |
Yes, I will take a look. |
As per eclipse-omr/omr#1281, we only added support for cgroup v1 in OMR: eclipse-omr/omr#1310. Support for cgroup v2 is still pending. @ashu-mehra Can you point us to any resources (design docs, unmerged code, etc.) which can be used to support cgroup v2 in OMR? |
@babsingh I didn't get any chance to explore cgroup v2 mechanism. I did create an issue for it but due to time constraint it never got worked on. The best resource I can refer to would be the man page for cgroups which has description for both versions and take it from there. |
@babsingh can you make a clear statement about the workaround? |
|
- Switch to Java 17 - Change default docker base image to `eclipse-temurin:17-jre` - Create OpenJ9 images based on `ibm-semeru-runtimes:open-17-jre` Note: The default `eclipse-temurin:17-jre` based images properly support cgroups v2 to limit cpu and memory resources. `1.0-SNAPSHOT-openj9` tagged images based on `ibm-semeru-runtimes:open-17-jre` don't yet, as of eclipse-openj9/openj9#14190
In JDK17+ extensions repo, there is a Java API which supports Cgroup V1 and V2: https://github.com/ibmruntimes/openj9-openjdk-jdk17/tree/openj9/src/java.base/linux/classes/jdk/internal/platform. There is a possibility to backport this API to Java 8 and Java 11 if this API can replace uses of OMR Cgroup native API within OpenJ9. This Java API has only one native dependency: OpenJ9 should be able to use this API since it has its own implementation of OpenJ9 uses the OMR Cgroup native API in the following locations:
Can we replace the OMR Cgroup native API with the extensions repo Cgroup Java API?Invoking the extensions repo Cgroup Java API in the above OpenJ9 locations may not be possible because the above locations won't allow execution of Java code through call-ins. Also, it may not be feasible to translate from the OMR Cgroup native API to the extensions repo Cgroup Java API (missing functionality). So, our best bet is to update the OMR Cgroup native API to support Cgroup v2. |
Not sure if the previous is actually a question or not, it's arrived at the right conclusion. The usage in jvminit.c occurs during option processing before java code is running. Similarly for GC and probably for JIT as well. |
From @mpirvu: I discovered that published Semeru container images do not load the embedded AOT code because of various AOT incompatibility issues. This happens because the AOT code is not in "portable" format. This is most likely due to the fact that Semeru images were built in an environment with cgroups v2 and OpenJ9 is not able to detect that is running in containers.
I am raising the priority of this issue since many customers running OpenJ9 in containers will not be able to use AOT and experience a start-up slowdown. As I understand, we do not have much control on the dockerhub process/environment that builds the images.
The problem is that we don't control the process of building the Semeru images. Once those images have been produced, they contain non-portable AOT. |
@mpirvu This work is being given high priority. We should have minimal support by April. This should address customer issues. Global tracker for the OMR implementation |
For the linked docker issue, is the openj9/runtime/gc_base/GCExtensions.cpp Lines 254 to 255 in cc586f0
Is this correct? |
As you pointed correctly in |
In The following was a docker with v1 cgroup:
To confirm, the memory limit was not set for this process (125):
So even with cgroup v1 the max heap size is 512M. I'm not sure this is because cgroups v2 is not implemented. |
@EricYangIBM Study the behaviour reported in ibmruntimes/ci.docker#124 (comment). @dmitripivkine In ibmruntimes/ci.docker#124 (comment), |
Maybe that container has a cgroup limit enforced? That would explain the heap extended by openj9/runtime/gc_base/GCExtensions.cpp Lines 254 to 255 in cc586f0
|
Yes, I think it is enforced via |
Task 6 for cgroup v2 is complete (reference: eclipse-omr/omr#1281 (comment)). This means that cgroup v2 has the same functionality as cgroup v1. We should be able to verify if customer issues are fixed. re #14190 (comment): @mpirvu Can you please confirm if the |
I will once nightly builds are available with the PR that was merged today |
Referring to eclipse-omr/omr#1281 (comment), the AOT issue should have been resolved by Tasks 1 and 2. Today's PR only updates the cgroup stats in the javacore. |
For the original issue, @EricYangIBM had verified that ibmruntimes/ci.docker#124 was resolved in eclipse-omr/omr#6432 (comment). So, it should also have been resolved by Tasks 1 and 2 in eclipse-omr/omr#1281 (comment). fyi @lgrateau |
I verified on a Ubuntu 22.04 machine that the latest nightly build of OpenJ9 correctly determines that is running in container and sees the correct CPU and memory limits (on the same machine, the existing OpenJ9 0.32.0 does not read these limits correctly) |
@tajila Why did we move this issue from the 0.33 to 0.34 release? As of now, cgroup v2 support is identical to cgroup v1 support. So, this issue can be closed once the end users verify the fix. |
Keeping it to track the tests changes. If you have another item for that we can close this. |
Those tests are OMR specific. In addition to those tests, all pending cgroup related OMR work is being tracked in eclipse-omr/omr#1281 (comment). This OpenJ9 issue was primarily opened to address cgroup related user issues. Since those issues have been resolved, we should close this issue. |
See ibmruntimes/ci.docker#124
The text was updated successfully, but these errors were encountered: