Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory management is not correct when using cgroup v2 #14190

Closed
pshipton opened this issue Dec 20, 2021 · 26 comments
Closed

Memory management is not correct when using cgroup v2 #14190

pshipton opened this issue Dec 20, 2021 · 26 comments

Comments

@pshipton
Copy link
Member

See ibmruntimes/ci.docker#124

@pshipton
Copy link
Member Author

@tajila fyi

@tajila
Copy link
Contributor

tajila commented Jan 5, 2022

@babsingh Can you take a look at this one

@sharanpatil123
Copy link

@babsingh I have got an internal sev1 case with the same issue.can you please have a look at it?

@babsingh
Copy link
Contributor

Yes, I will take a look.

@babsingh
Copy link
Contributor

As per eclipse-omr/omr#1281, we only added support for cgroup v1 in OMR: eclipse-omr/omr#1310. Support for cgroup v2 is still pending. @ashu-mehra Can you point us to any resources (design docs, unmerged code, etc.) which can be used to support cgroup v2 in OMR?

@ashu-mehra
Copy link
Contributor

@babsingh I didn't get any chance to explore cgroup v2 mechanism. I did create an issue for it but due to time constraint it never got worked on. The best resource I can refer to would be the man page for cgroups which has description for both versions and take it from there.

@DanHeidinga
Copy link
Member

@babsingh can you make a clear statement about the workaround?

@babsingh
Copy link
Contributor

babsingh commented Jan 11, 2022

groldan added a commit to groldan/geoserver-cloud that referenced this issue Feb 23, 2022
- Switch to Java 17
- Change default docker base image to `eclipse-temurin:17-jre`
- Create OpenJ9 images based on `ibm-semeru-runtimes:open-17-jre`

Note:
The default `eclipse-temurin:17-jre` based images properly support
cgroups v2 to limit cpu and memory resources.

`1.0-SNAPSHOT-openj9` tagged images based on
`ibm-semeru-runtimes:open-17-jre` don't yet, as
of  eclipse-openj9/openj9#14190
@babsingh
Copy link
Contributor

babsingh commented Feb 25, 2022

In JDK17+ extensions repo, there is a Java API which supports Cgroup V1 and V2: https://github.com/ibmruntimes/openj9-openjdk-jdk17/tree/openj9/src/java.base/linux/classes/jdk/internal/platform. There is a possibility to backport this API to Java 8 and Java 11 if this API can replace uses of OMR Cgroup native API within OpenJ9.

This Java API has only one native dependency: CgroupMetrics.isUseContainerSupport -> JVM_IsUseContainerSupport.

OpenJ9 should be able to use this API since it has its own implementation of JVM_IsUseContainerSupport.

OpenJ9 uses the OMR Cgroup native API in the following locations:

1. runtime/rasdump/javadump.cpp (writeCgroupMetrics)
	OMR Cgroup functions used:
		- omrsysinfo_cgroup_get_enabled_subsystems
		- omrsysinfo_cgroup_is_system_available
		- omrsysinfo_get_cgroup_subsystem_list
		- omrsysinfo_cgroup_subsystem_iterator_init
		- omrsysinfo_cgroup_subsystem_iterator_hasNext
		- omrsysinfo_cgroup_subsystem_iterator_metricKey
		- omrsysinfo_cgroup_subsystem_iterator_destroy

	OMR Cgroup data structs used:
		- OMRCgroupEntry
		- OMRCgroupMetricIteratorState
		- OMRCgroupMetricElement

2. runtime/gc_base/GCExtensions.cpp
	OMR Cgroup functions used:
		- omrsysinfo_cgroup_are_subsystems_enabled
		- omrsysinfo_cgroup_is_memlimit_set

3. runtime/vm/jvminit.c
	OMR Cgroup functions used:
		- omrsysinfo_cgroup_enable_subsystems
		- omrsysinfo_cgroup_get_available_subsystems

4. runtime/compiler/control/CompilationThread.cpp
	OMR Cgroup functions used:
		- omrsysinfo_cgroup_are_subsystems_enabled (OMR_CGROUP_SUBSYSTEM_MEMORY)

Can we replace the OMR Cgroup native API with the extensions repo Cgroup Java API?

Invoking the extensions repo Cgroup Java API in the above OpenJ9 locations may not be possible because the above locations won't allow execution of Java code through call-ins. Also, it may not be feasible to translate from the OMR Cgroup native API to the extensions repo Cgroup Java API (missing functionality). So, our best bet is to update the OMR Cgroup native API to support Cgroup v2.

@pshipton
Copy link
Member Author

Not sure if the previous is actually a question or not, it's arrived at the right conclusion. The usage in jvminit.c occurs during option processing before java code is running. Similarly for GC and probably for JIT as well.

@babsingh
Copy link
Contributor

From @mpirvu:

I discovered that published Semeru container images do not load the embedded AOT code because of various AOT incompatibility issues. This happens because the AOT code is not in "portable" format. This is most likely due to the fact that Semeru images were built in an environment with cgroups v2 and OpenJ9 is not able to detect that is running in containers.
I used a machine with cgroups v2 and generated a javacore while running in containers. The javacore showed:

1CICONTINFO    Running in container : FALSE
1CICGRPINFO    JVM support for cgroups enabled : FALSE

I am raising the priority of this issue since many customers running OpenJ9 in containers will not be able to use AOT and experience a start-up slowdown. As I understand, we do not have much control on the dockerhub process/environment that builds the images.

There is a temporary workaround for this issue: #14190 (comment). Docker containers can be switched back to cgroup v1 since both cgroup v1 and v2 should be available in newer containers.

The problem is that we don't control the process of building the Semeru images. Once those images have been produced, they contain non-portable AOT.

@babsingh
Copy link
Contributor

@mpirvu This work is being given high priority. We should have minimal support by April. This should address customer issues.

Global tracker for the OMR implementation

eclipse-omr/omr#1281

@EricYangIBM
Copy link
Contributor

For the linked docker issue, is the Max. Heap Size (Estimated): 512.00M being calculated from j9gc_get_maximum_heap_size? Where memoryMax is calculated at https://github.com/eclipse/omr/blob/644b9078b90a441a73da0ab9da66dc60d283033a/gc/base/GCExtensionsBase.cpp#L305-L324
and is assigned a maximum of 512M, then in MM_GCExtensions::computeDefaultMaxHeapForJava gets expanded if cgroup is enabled:

memoryMax = (uintptr_t)OMR_MAX((int64_t)(usablePhysicalMemory / 2), (int64_t)(usablePhysicalMemory - OPENJ9_IN_CGROUP_NATIVE_FOOTPRINT_EXCLUDING_HEAP));
memoryMax = (uintptr_t)OMR_MIN(memoryMax, (usablePhysicalMemory / 4) * 3);

Is this correct?

@dmitripivkine
Copy link
Contributor

As you pointed correctly in MM_GCExtensions::computeDefaultMaxHeapForJava() we set memoryMax for cgroup and adjust it later to 25% of available RAM if it is larger (for Java 11 and up only, Java 8 has hardcoded 512m limit). I guess you should not touch this logic but make omrsysinfo_cgroup_are_subsystems_enabled() and omrsysinfo_cgroup_is_memlimit_set() work for cgroup v2

@EricYangIBM
Copy link
Contributor

In omrsysinfo_cgroup_is_memlimit_set: https://github.com/eclipse/omr/blob/644b9078b90a441a73da0ab9da66dc60d283033a/port/unix/omrsysinfo.c#L5967-L5974
My cgroups v1 machine had a memory.limit_in_bytes as 9223372036854771712 for a simple java test process which would trigger this error. So basically if the cgroup memlimit is not set we don't expand the heap size (the usablePhysicalMemory was not obtained from the cgroup memory limit file).

The following was a docker with v1 cgroup:

root@658e20312005:~/hostdir/openj9-openjdk-jdk8# build/linux-x86_64-normal-server-release/images/j2sdk-image/bin/java -XshowSettings:vm -XX:+OriginalJDK8HeapSizeCompatibilityMode -version
VM settings:
    Max. Heap Size (Estimated): 512.00M
...

To confirm, the memory limit was not set for this process (125):

root@658e20312005:/# cat /sys/fs/cgroup/memory/cgroup.procs 
1
115
125
153
root@658e20312005:/# cat /sys/fs/cgroup/memory/memory.limit_in_bytes 
9223372036854771712

So even with cgroup v1 the max heap size is 512M. I'm not sure this is because cgroups v2 is not implemented.

@babsingh
Copy link
Contributor

babsingh commented Mar 22, 2022

So even with cgroup v1 the max heap size is 512M. I'm not sure this is because cgroups v2 is not implemented.

@EricYangIBM Study the behaviour reported in ibmruntimes/ci.docker#124 (comment).

@dmitripivkine In ibmruntimes/ci.docker#124 (comment), Max. Heap Size (Estimated): 3.00G is shown for J9 Java8. The hardcoded 512m limit is not being enforced for Java8.

@EricYangIBM
Copy link
Contributor

Maybe that container has a cgroup limit enforced? That would explain the heap extended by

memoryMax = (uintptr_t)OMR_MAX((int64_t)(usablePhysicalMemory / 2), (int64_t)(usablePhysicalMemory - OPENJ9_IN_CGROUP_NATIVE_FOOTPRINT_EXCLUDING_HEAP));
memoryMax = (uintptr_t)OMR_MIN(memoryMax, (usablePhysicalMemory / 4) * 3);

@babsingh
Copy link
Contributor

Maybe that container has a cgroup limit enforced?

Yes, I think it is enforced via docker run -m 4GB.

@babsingh
Copy link
Contributor

Task 6 for cgroup v2 is complete (reference: eclipse-omr/omr#1281 (comment)). This means that cgroup v2 has the same functionality as cgroup v1. We should be able to verify if customer issues are fixed.

re #14190 (comment): @mpirvu Can you please confirm if the not be able to use AOT and experience a start-up slowdown issue is resolved?

@mpirvu
Copy link
Contributor

mpirvu commented May 17, 2022

Can you please confirm if the not be able to use AOT and experience a start-up slowdown issue is resolved?

I will once nightly builds are available with the PR that was merged today

@babsingh
Copy link
Contributor

I will once nightly builds are available with the PR that was merged today

Referring to eclipse-omr/omr#1281 (comment), the AOT issue should have been resolved by Tasks 1 and 2. Today's PR only updates the cgroup stats in the javacore.

@babsingh
Copy link
Contributor

babsingh commented May 17, 2022

For the original issue, @EricYangIBM had verified that ibmruntimes/ci.docker#124 was resolved in eclipse-omr/omr#6432 (comment). So, it should also have been resolved by Tasks 1 and 2 in eclipse-omr/omr#1281 (comment).

fyi @lgrateau

@mpirvu
Copy link
Contributor

mpirvu commented May 17, 2022

I verified on a Ubuntu 22.04 machine that the latest nightly build of OpenJ9 correctly determines that is running in container and sees the correct CPU and memory limits (on the same machine, the existing OpenJ9 0.32.0 does not read these limits correctly)

@babsingh
Copy link
Contributor

@tajila Why did we move this issue from the 0.33 to 0.34 release? As of now, cgroup v2 support is identical to cgroup v1 support. So, this issue can be closed once the end users verify the fix.

@tajila
Copy link
Contributor

tajila commented May 20, 2022

Keeping it to track the tests changes. If you have another item for that we can close this.

@babsingh
Copy link
Contributor

If you have another item for that we can close this.

Those tests are OMR specific. In addition to those tests, all pending cgroup related OMR work is being tracked in eclipse-omr/omr#1281 (comment).

This OpenJ9 issue was primarily opened to address cgroup related user issues. Since those issues have been resolved, we should close this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants