Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for dynamically adjusting scratch memory space of compilations #2546

Merged
merged 1 commit into from
Sep 19, 2018

Conversation

mpirvu
Copy link
Contributor

@mpirvu mpirvu commented Aug 2, 2018

The JIT is allowed to use up to 256 MB of 'scratch' memory for a compilation
while JSR292 compilations increase this limit to 1.5 GB. These limits create
footprint spikes during an application run which can make resource
provisioning more cumbersome.

This commit adds the following changes:

  1. The JIT will use portlib APIs to compute the amount of physical memory
    available (taking into consideration any container limits). This value
    is cached (to minimize API overhead) and recomputed with a desired
    frequency (0.5 sec by default)

  2. At the beginning of a compilation the JIT checks the amount of available
    physical memory and sets a limit for the scratch space accordingly. If the
    amount of physical memory is so low that a relatively cheap compilation cannot
    be performed, the compilation request is aborted right there.

  3. When a compilation thread needs to allocate a new scratch memory segment
    (of size 16 MB) it checks again the available physical memory. If this is low,
    the compilation is aborted and will be retried at a lower optimization level.
    At the same time a flag will be set which will make one compilation thread
    to suspend itself (however, the JIT always makes sure there is at least one
    compilation thread active).

Signed-off-by: Marius Pirvu mpirvu@ca.ibm.com

@fjeremic
Copy link
Contributor

fjeremic commented Aug 2, 2018

@dsouzai FYI for review.

@mpirvu mpirvu force-pushed the autoscratchlimit branch from f08383d to 49d1af5 Compare August 2, 2018 21:06
@mpirvu
Copy link
Contributor Author

mpirvu commented Aug 2, 2018

Note that at the moment the code is not ready to be merged (hence the WIP heading).
Also, the fix from eclipse-omr/omr#2785 is needed for this commit to work correctly.

@mpirvu
Copy link
Contributor Author

mpirvu commented Aug 2, 2018

@ashu-mehra FYI for review

// Compute free physical memory taking into account container limits
// To limit overhead, this function caches the computed value and
// only refreshes it if more than 0.5 sec have passed since the last update
// or if "now" parameter is true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is the "now" parameter mentioned in this comment?

@dsouzai
Copy link
Contributor

dsouzai commented Aug 3, 2018

The changes lgtm.

@fjeremic fjeremic self-assigned this Aug 20, 2018
@fjeremic
Copy link
Contributor

@mpirvu what is the state of this PR?

@mpirvu
Copy link
Contributor Author

mpirvu commented Sep 12, 2018

I just had a stab at it last night but in my local experiments I found the j9sysinfo_get_memory_info API to give the same results for host and container memory (which is wrong). I had a discussion with Ashutosh and the API seems to be working fine for him. Now it's on me to determine why j9sysinfo_get_memory_info gives me bad information.

@mpirvu
Copy link
Contributor Author

mpirvu commented Sep 13, 2018

With some instrumentation I determined that
rc = readCgroupSubsystemFile(portLibrary, OMR_CGROUP_SUBSYSTEM_MEMORY, CGROUP_MEMORY_SWAP_LIMIT_IN_BYTES_FILE, numItemsToRead, "%lu", &cgroupMemInfo->memoryAndSwapLimit);
returns an error code and therefore retrieveLinuxCgroupMemoryStats returns error even though the memoryLimit and memoryUsage were correctly read.

@ashu-mehra
In my opinion retrieveLinuxCgroupMemoryStats should always return 'success' and the validity of various memory statistics can be determined by comparison with OMRPORT_MEMINFO_NOT_AVAILABLE

@mpirvu
Copy link
Contributor Author

mpirvu commented Sep 13, 2018

I should also mention that on two Ubuntu 16.04 systems where I tried, when launching a container with a memory limit I get this warning:
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.

@mpirvu
Copy link
Contributor Author

mpirvu commented Sep 14, 2018

Jenkins test all all jdk8

@mpirvu
Copy link
Contributor Author

mpirvu commented Sep 14, 2018

Could I get another review please: @ashu-mehra @dsouzai

@mpirvu mpirvu changed the title WIP: Support for dynamically adjusting scratch memory space of compilations Support for dynamically adjusting scratch memory space of compilations Sep 14, 2018
if (memInfo.buffered != OMRPORT_MEMINFO_NOT_AVAILABLE)
freePhysicalMemory += memInfo.buffered;
else
incomplete = !_cgroupMemorySubsystemEnabled;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible for incomplete to be true from the line 10510, but then set to false in this line? If so, Shouldn't there be a check to not set it to false?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. I need to correct the behavior to never change it to 'false' once we set it to 'true'

Copy link
Contributor

@dsouzai dsouzai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM

{
// Set a flag that will determine one compilation thread to suspend itself
// Even if multiple threads enter his path we still want to suspend only
// on thread, not several. This is why we use a flag and not a counter.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: on -> one

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@ashu-mehra
Copy link
Contributor

Looks fine to me.

@ashu-mehra
Copy link
Contributor

In my opinion retrieveLinuxCgroupMemoryStats should always return 'success' and the validity of various memory statistics can be determined by comparison with OMRPORT_MEMINFO_NOT_AVAILABLE

I think we also need to set memoryAndSwapLimit to 0 if the file does not exist. And if we do that then that change alone will fix the issue.

The JIT is allowed to use up to 256 MB of 'scratch' memory for a compilation
while JSR292 compilations increase this limit to 1.5 GB. These limits create
footprint spikes during an application run which can make resource
provisioning more cumbersome.

This commit adds the following changes:
1) The JIT will use portlib APIs to compute the amount of physical memory
available (taking into consideration any container limits). This value
is cached (to minimize API overhead) and recomputed with a desired
frequency (0.5 sec by default)

2) At the beginning of a compilation the JIT checks the amount of available
physical memory and sets a limit for the scratch space accordingly. If the
amount of physical memory is so low that a relatively cheap compilation cannot
be performed, the compilation request is aborted right there.

3) When a compilation thread neeeds to allocate a new scratch memory segment
(of size 16 MB) it checks again the available physical memory. If this is low,
the compilation is aborted and will be retried at a lower optimization level.
At the same time a flag will be set which will make one compilation thread
to suspend itself (however, the JIT always makes sure there is at least one
compilation thread active).

4) If the amount of free physical memory is low, the JIT will not activate
additional compilation threads which can make the memory shortage even worse.

'low' phyical memory is determined by TR::Options::getSafeReservePhysicalMemoryValue()
which currently defaults to 50MB and can be changed with
-Xjit:safeReservePhysicalMemoryValue=<nnn>  value in KB

Signed-off-by: Marius Pirvu <mpirvu@ca.ibm.com>
@fjeremic
Copy link
Contributor

Jenkins test sanity all JDK8

@fjeremic fjeremic merged commit 5310edf into eclipse-openj9:master Sep 19, 2018
@mpirvu mpirvu deleted the autoscratchlimit branch December 13, 2018 15:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants