Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip frozen segments for commit accounting #75738

Merged
merged 1 commit into from
Sep 20, 2022

Conversation

cshung
Copy link
Member

@cshung cshung commented Sep 16, 2022

Frozen segments should be excluded from the commit accounting verification.

@cshung cshung self-assigned this Sep 16, 2022
@ghost
Copy link

ghost commented Sep 16, 2022

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

Issue Details

Frozen segments should be excluded from the commit accounting verification.

Author: cshung
Assignees: cshung
Labels:

area-GC-coreclr

Milestone: -

@EgorBo
Copy link
Member

EgorBo commented Sep 16, 2022

I assume it makes GCHeap::UpdateFrozenSegment that I added in #49576 useless, right? (it reports to gc when vm commits more memory for a frozen segment)

@cshung
Copy link
Member Author

cshung commented Sep 16, 2022

I assume it makes GCHeap::UpdateFrozenSegment that I added in #49576 useless, right? (it reports to gc when vm commits more memory for a frozen segment)

It depends on what was your goal when you introduced UpdateFrozenSegment.

Let's get started with the concept of commit accounting.

In the case of heap_hard_limit (i.e. COMPlus_GCHeapHardLimit), we wanted to restrict the GC heap from committing too much memory. To do so, we keep track of how much memory is used by the GC heap by having some accounting variables (e.g. current_total_committed, committed_by_oh, ...). These variables represent how much memory is committed for a certain purpose.

To keep these variables up-to-date, we add and subtract them whenever we commit/decommit within the GC. Sometimes, the GC repurposes memory (e.g. moving a region from a heap to a free list), and we adjust our accounting in those cases as well.

One of the concerns is that when the GC code base evolves, the accounting might be wrong, and that's why we have various methods (e.g. the two places we are editing in the code) to check if the accounting was done correctly using asserts. The idea is that if I loop through the heap data structures, the total committed bytes in those regions should match exactly with the accounting variables.

Now, let's get back to this change.

Since we didn't update the accounting variables on UpdateFrozenSegment, we need to adjust the check such that it matches.

Assuming your goal is to update the heap_segment_committed and heap_segment_allocated such that it reflects correctly on SOS, this change doesn't change anything for that.

On the other hand, if your goal is to make sure committed memory used for the frozen segments is counted towards heap_hard_limit, then we will probably want an alternative change that updates the accounting variables on UpdateFrozenSegment/RegisterFrozenSegment/UnregisterFrozenSegment instead.

In any case, this bug can be easily reproed by running under a checked build with DOTNET_GCHeapHardLimit turned on.

@jkotas
Copy link
Member

jkotas commented Sep 17, 2022

Frozen segments allocated at runtime, like that ones added by Egor for string literals, should count towards GC hard limit. They are no different from other memory managed by the GC from the user point of view. The fact that we populate this memory using different mechanism is an internal implementation detail.

We may want to have a flag on frozen segments that says whether or not it counts towards GC hard limit. Frozen segments created that are backed by read-only file-mappings should not count towards GC hard limit.

@cshung
Copy link
Member Author

cshung commented Sep 17, 2022

If we count the committed bytes towards heap_hard_limit, there is a possibility that we ran out of limit on these calls. @EgorBo, is it okay to throw OutOfMemory at the spot you call RegisterFrozenSegment or UpdateFrozenSegment?

@EgorBo
Copy link
Member

EgorBo commented Sep 17, 2022

is it okay to throw OutOfMemory at the spot you call RegisterFrozenSegment or UpdateFrozenSegment?

I think so, we already throw OOM in some cases from those call sites

@jkotas
Copy link
Member

jkotas commented Sep 17, 2022

UpdateFrozenSegment would need some work to implement the backout correctly. Its call site assumes that it will never fail. Also, the extra memory was committed already when UpdateFrozenSegment is called, so the "damage" was done already.

Would it be easier for GC to ignore the fact that the frozen segment overshot the limit?

@cshung
Copy link
Member Author

cshung commented Sep 19, 2022

@Maoni0

@Maoni0
Copy link
Member

Maoni0 commented Sep 20, 2022

Would it be easier for GC to ignore the fact that the frozen segment overshot the limit?

it would be much more desirable for GC to simply ignore the frozen segments. from GC's POV, these segments are something we only need to care about when we have to, ie, when they are in range, meaning if they are between gc's g_lowest_address and g_highest_address, in which case GC will need to have the bookkeeping datastructure for them. but other than that, GC just ignores them. GC isn't in charge of allocating them or managing objects on them. they are almost like native memory so it'd be better for components that allocate them to make sure there's memory for them.

UpdateFrozenSegment doesn't just update heap_segment_committed, it also updates heap_segment_allocated and GC does need the allocated value to do the minimal bookkeeping (ie, clear the mark bits in sweep_ro_segments).

@jkotas
Copy link
Member

jkotas commented Sep 20, 2022

From outside the runtime point of view, the string literals are managed heap memory just like any other. I expect that the diagnostic tools will be treating them as such. If the GC APIs exclude them, we may be questions about discrepancies between GC heap sizes reported by GC APIs vs. what's reported by diagnostic tools as sum of all managed object sizes.

@Maoni0
Copy link
Member

Maoni0 commented Sep 20, 2022

From outside the runtime point of view, the string literals are managed heap memory just like any other. I expect that the diagnostic tools will be treating them as such

actually diagnostic tools are already distinguishing between different types of heaps (eg SOH vs LOH) so I'd imagine this would be another type of heap and tools can display objects in this heap like they do with SOH/LOH/POH (we'll of course need to make sure we expose all the necessary info for tools to read ro segments info).

if someone wants to set a hardlimit for their app, they usually look at the heap in a memory tool. it seems like we could just tell them that ro segments are not charged against hardlimit in tooling. the only customer we currently have for ro segments allocates this memory themselves and knows clearly it's not charged against hardlimit.

in any case, my understanding is there'll need to be some diag work that needs to happen for ro segs now it's used by the runtime. I'm interested to see what the diag folks have to say what they think the experience should be.

@noahfalk
Copy link
Member

noahfalk commented Sep 20, 2022

I commented over in #49576 (comment), but I think this issue is raising similar concerns. We need to decide how we will conceptualize this heap for users and then we want to make all of our accounting and quotas match that story. GCHeapHardLimit is one of ~10 different ways users might interact with this accounting.

I think there are three different stories we could tell:

a. Treat the Frozen Object Heap as a subdivision of some other part of the GC heap, for example considering it a subset of SOH gen2, or a subset of the pinned object heap. Two variations here: (i) We could disclose that this subdivision exists in our public docs/APIs/tools or (ii) we could treat it as an undisclosed implementation detail.
b. Disclose the Frozen Object Heap as a new top level category within the GC. We would now have 4 GC regions - SOH, POH, LOH, and Frozen Object Heap (FOH).
c. Disclose the Frozen Object Heap as a new memory region that holds managed objects, but is distinct from the GC entirely.

I think (a) and (b) require less effort from users to understand because it preserves the existing invariant that "GC heap" == "the pool of memory in which all managed objects are allocated". For people who only want/need a simple understanding of .NET memory this is a nice abstraction.

In terms of labor anything that exposes the Frozen Object Heap as a new developer visible concept that they observe and reason about through tools/APIs/docs will add a bunch of touch points (example dotnet/diagnostics#1408).

If the Frozen Object Heap will typically be small and not very impactful for performance investigation I lean towards option a-ii to minimize the total work. If the Frozen Object Heap is large enough that it needs explicit treatment then I lean towards option b.

We may want to have a flag on frozen segments that says whether or not it counts towards GC hard limit. Frozen segments created that are backed by read-only file-mappings should not count towards GC hard limit.

How would we conceptualize this for users? This one seems like we've conceptually split the FOH into FileMapped FOH and non-FileMapped FOH, treated the FileMapped FOH as option (c) and treated the non-FileMapped FOH as (a) or (b). We would be adding a decent amount of user-visible complexity in service of (so far) one perf optimization.

@cshung
Copy link
Member Author

cshung commented Sep 20, 2022

While the discussion is still in flux, can we merge this change for now?

Without this change, any app running under heap_hard_limit using the frozen object heap is failing with FATAL_GC_ERROR right now whenever it runs a GC.

@jkotas
Copy link
Member

jkotas commented Sep 20, 2022

While the discussion is still in flux, can we merge this change for now?

Sounds good to me.

@cshung
Copy link
Member Author

cshung commented Sep 20, 2022

The CI is ready, once I have a code review approval I can get this merged.

@cshung cshung merged commit 4e6244c into dotnet:main Sep 20, 2022
@cshung cshung deleted the public/skip-frozen-accounting branch September 20, 2022 18:20
@ghost ghost locked as resolved and limited conversation to collaborators Oct 20, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants