
Add an option to not generate precise GC info #88811

Closed · wants to merge 2 commits

Conversation

@MichalStrehovsky (Member)

Revived #75817 with a small change.

Follow up to #75803.

If enabled, conservative GC stack scanning will be used and metadata related to GC stack reporting will not be generated. The generated executable file will be smaller, but the GC will be less efficient (garbage collection might take longer and keep objects alive for longer periods of time than usual).

Saves 4.4% in size on a Hello World. Saves 6.7% on Stage 1. I'll take that.

The main difference from the previous PR is that I'm no longer dropping GC info on `UnmanagedCallersOnly` methods, and that seems to be enough to make the Smoke tests pass. Let's see what the rest of the testing thinks.

Cc @dotnet/ilc-contrib

@ghost commented Jul 13, 2023

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.



@MichalStrehovsky (Member, Author)

/azp run runtime-extra-platforms

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@MichalStrehovsky (Member, Author)

Hmm, the GC is not happy.

I was hoping that if this doesn't look problematic, I would propose enabling this when OptimizationPreference=Size, but if we're running into GC issues in conservative mode, that gives me pause.

Pulled this from `runfo get-helix-payload -j 491a31dc-0b73-4882-b873-3591543550be -w System.Collections.Concurrent.Tests -o c:\hell`:

```
0:000> k
*** WARNING: Unable to verify timestamp for libpthread-2.31.so
*** WARNING: Unable to verify timestamp for libc-2.31.so
 # Child-SP          RetAddr               Call Site
00 0000fedf`294d3220 0000ab49`b4d9f9b0     System_Collections_Concurrent!WKS::gc_heap::relocate_address+0xc8
01 0000fedf`294d3240 0000ab49`b4d7edf0     System_Collections_Concurrent!WKS::GCHeap::Relocate+0xc8
02 0000fedf`294d3280 0000ab49`b4d7ecac     System_Collections_Concurrent!Thread::GcScanRootsWorker+0x114
03 0000fedf`294d3310 0000ab49`b4d7ab1c     System_Collections_Concurrent!Thread::GcScanRoots+0x80
04 0000fedf`294d35c0 0000ab49`b4da80d8     System_Collections_Concurrent!GCToEEInterface::GcScanRoots+0x188
05 0000fedf`294d3630 0000ab49`b4d991ac     System_Collections_Concurrent!WKS::gc_heap::relocate_phase+0x9c
06 0000fedf`294d36d0 0000ab49`b4d90260     System_Collections_Concurrent!WKS::gc_heap::plan_phase+0x478c
07 0000fedf`294d3b60 0000ab49`b4d9c358     System_Collections_Concurrent!WKS::gc_heap::gc1+0x318
08 0000fedf`294d3c20 0000ab49`b4d8c1f0     System_Collections_Concurrent!WKS::gc_heap::garbage_collect+0x73c
09 0000fedf`294d3ce0 0000ab49`b4d8d454     System_Collections_Concurrent!WKS::GCHeap::GarbageCollectGeneration+0x434
0a 0000fedf`294d3d50 0000ab49`b4d8e0c0     System_Collections_Concurrent!WKS::gc_heap::trigger_gc_for_alloc+0x58
0b 0000fedf`294d3d70 0000ab49`b4db4340     System_Collections_Concurrent!WKS::gc_heap::try_allocate_more_space+0x33c
0c 0000fedf`294d3dd0 0000ab49`b4d79afc     System_Collections_Concurrent!WKS::GCHeap::Alloc+0xbc
0d 0000fedf`294d3e10 0000ab49`b4dcff40     System_Collections_Concurrent!RhpGcAlloc+0x118
0e 0000fedf`294d3e50 0000ab49`b53d12f0     System_Collections_Concurrent!RhpNewObject+0x3c [/__w/1/s/src/coreclr/nativeaot/Runtime/arm64/AllocFast.S @ 91] 
0f 0000fedf`294d3ed0 0000ab49`b56eff30     System_Collections_Concurrent!xunit_assert_Xunit_Sdk_AssertEqualityComparer_1<Int32>::.ctor+0x40 [/_/src/xunit.assert/Asserts/Sdk/AssertEqualityComparer.cs @ 37] 
10 0000fedf`294d3f00 0000ab49`b4ef62a8     System_Collections_Concurrent!xunit_assert_Xunit_Assert::DoesNotContain+0x60 [/_/src/xunit.assert/Asserts/CollectionAsserts.cs @ 364] 
```

@jkotas (Member) commented Jul 13, 2023

> Hmm, the GC is not happy.

The JIT depends on the GC not happening in no-GC regions. It is free to produce code that has GC references in SIMD registers or temporary byrefs pointing outside the object. We would need to introduce a conservative GC reporting mode for codegen to make this work reliably.

@jkotas (Member) commented Jul 13, 2023

I think that the conservative GC mode is useful as a debugging and bring-up aid. I do not think it is interesting to spend time on trying to make it work reliably. It is too unpredictable performance-wise to be taken seriously.

@VSadov (Member) commented Jul 13, 2023

> It is free to produce code that has GC references in SIMD registers or temporary byrefs pointing outside the object.

I think this should be tolerated. The GC must validate every input and treat it as checked+pinned.

> The JIT depends on the GC not happening in no-GC regions.

This could be a problem, in theory. If there are regions of code that must be GC-atomic and rely on GC info for that, they may no longer be GC-atomic.

Does the same scenario work if GC info is present?

(I am just concerned that conservative mode has issues. It should work, or we should know the reasons why not, as there could be bugs affecting precise mode as well.)
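
To make "validate and treat as checked+pinned" concrete, here is a minimal C# sketch of conservative stack scanning, written under assumptions rather than taken from the runtime: `IsInGcHeap` and `ReportPinnedRoot` are hypothetical stand-ins for runtime internals.

```csharp
static unsafe class ConservativeScanSketch
{
    // Treat every pointer-aligned word in the stack range as a potential
    // object reference; validate it, then report it as pinned.
    static void ConservativelyScanStack(nint* stackLow, nint* stackHigh)
    {
        for (nint* slot = stackLow; slot < stackHigh; slot++)
        {
            nint candidate = *slot;

            // Reject values that cannot point into the GC heap; a random
            // integer may merely look like a pointer.
            if (!IsInGcHeap(candidate))
                continue;

            // The word might be an interior pointer or coincidental bits,
            // so the referent must be pinned rather than relocated.
            ReportPinnedRoot(candidate);
        }
    }

    // Hypothetical placeholders so the sketch compiles; the real checks
    // live inside the GC.
    static bool IsInGcHeap(nint address) => false;
    static void ReportPinnedRoot(nint address) { }
}
```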

@VSadov (Member) commented Jul 13, 2023

When I ran tests in conservative mode in the past, I saw failures in tests that observe object lifetimes (finalizers, weak references, …), but no crashes.

I wonder whether we have some unexpected changes in the logic/invariants, or whether the missing GC info is itself the cause.
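
As an illustration of the kind of lifetime-observing test being described, a minimal sketch (not from the actual test suite; assumes an optimized build):

```csharp
using System;

class Finalizable
{
    public static bool Finalized;
    ~Finalizable() => Finalized = true;
}

class Program
{
    static void Main()
    {
        new Finalizable();               // reference dropped immediately
        GC.Collect();
        GC.WaitForPendingFinalizers();

        // Precise scanning: the object is provably unreachable, so this
        // reliably prints True. Conservative scanning: a leftover stack
        // slot or register still holding the pointer can keep the object
        // alive, so the result becomes intermittent.
        Console.WriteLine(Finalizable.Finalized);
    }
}
```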

@jkotas (Member) commented Jul 13, 2023

> I think this should be tolerated. The GC must validate every input and treat it as checked+pinned.

Right, you can make it work. It is not obvious to me that it works today. Does the conservative reporting report everything in SIMD registers?

> This could be a problem, in theory. If there are regions of code that must be GC-atomic and rely on GC info for that, they may no longer be GC-atomic.
> Does the same scenario work if GC info is present?

Yes, it works fine if GC info is present. GC info has "no-GC regions" where the runtime is not allowed to suspend for GC. Look for emitDisableGC in the JIT.

@VSadov (Member) commented Jul 13, 2023

> Does the conservative reporting report everything in SIMD registers?

I do not think so. See ForEachPossibleObjectRef.
Can we have managed references in floating-point/SIMD registers at all?

Also, we might see a regression from #86917, but I think that will not impact conservative reporting, since that is in unwinding.

Another thing that bothers me is that the crash appears to be in pointer updating. It means the GC was OK when marking through the same reference, but unhappy when seeing it again in the plan phase (all refs that need updating should have been seen when marking; "updated" is a subset of "live").
Also, assuming this is a conservatively reported ref, why is it even affected by relocation? It should be pinned, and ideally not even reported for relocation.
It could be a secondary crash though: we got into a bad state already and are now crashing in a random place.

@VSadov (Member) commented Jul 13, 2023

> I think that the conservative GC mode is useful as a debugging and bring-up aid.

Also, special modes like conservative tend to challenge assumed invariants and point to fragilities in the design, or even lurking stress issues. It is useful to have it working, or at least to know the conditions under which it does not work.

I'd not recommend it as a kind of optimization though. It is an observable change; we have tests that will reliably fail in this mode.
You may squeeze out a bit more perf and file size at the cost of a slightly larger heap, but you may also end up with quite gnarly issues, like files not closing once in a while and such.

@jkotas (Member) commented Jul 13, 2023

> Can we have managed references in floating-point/SIMD registers at all?

Yes. I believe that I have seen the JIT using SIMD registers to copy GC references in no-GC regions.

> Another thing that bothers me is that the crash appears to be in pointer updating.

The values in the managed portion of the stack can be updated while the GC is in progress. Consider a ref void* argument passed to a P/Invoke that points into the managed portion of the stack. The P/Invoke can be writing random values into it, so the GC can see new bogus pointers during relocation. The GC would need to ignore these in the conservative mode that we are talking about here.
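
For illustration, a minimal sketch of the scenario described above; `native_fill` and "nativelib" are hypothetical, not a real API:

```csharp
using System.Runtime.InteropServices;

static unsafe class StackWriteSketch
{
    // Hypothetical native entry point that keeps writing through 'slot'
    // for the duration of the call.
    [DllImport("nativelib")]
    static extern void native_fill(void** slot);

    static void Demo()
    {
        void* scratch = null;

        // While this call is blocked in native code, a GC can run. The
        // native side may store arbitrary bit patterns into 'scratch'
        // between the mark and relocate stack walks, so a conservative
        // relocation scan could see "pointers" that were never marked.
        native_fill(&scratch);
    }
}
```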

@jkotas (Member) commented Jul 13, 2023

> The GC would need to ignore these in the conservative mode that we are talking about here.

Actually, I think the relocation stackwalk should be suppressed in conservative mode altogether. There is nothing to relocate.

@VSadov (Member) commented Jul 13, 2023

> Actually, I think the relocation stackwalk should be suppressed in conservative mode altogether. There is nothing to relocate.

We report some stack pointers precisely even in conservative mode. NativeAOT reports thread-static roots precisely, for example, and maybe a few more things.
CoreCLR has a lot more cases. I remember some things for tailcalls that are logically on the stack but physically not, and some exception-frame stuff that could be outside the current stack range, if I remember correctly.

As long as we report some things precisely, that same set needs to be reported for relocation. The rest does not need to be reported, but can be, for simplicity or because no one bothered to sort that out (since the GC would sort it out anyway).

@MichalStrehovsky (Member, Author)

The reason why I think this is interesting is scenarios where Mono is used. Mono never needs to pay for the size of precise GC info because it doesn't do precise GC in codegen. People will already observe similar lifetime issues there.

Having this option allows a more apples-to-apples size comparison. But if it's too problematic for CoreCLR, it's probably not worth the effort.

@VSadov (Member) commented Jul 14, 2023

> The reason why I think this is interesting is scenarios where Mono is used

I like the option. It also could be used by pause-sensitive apps like games.

But tying it to OptimizationPreference=Size seems a bit too far, as that is not supposed to require additional care or testing; it's basically just a throughput/size dial.
If my app broke only with OptimizationPreference=Size, I'd report it as a bug.

@MichalStrehovsky (Member, Author)

> But tying it to OptimizationPreference=Size seems a bit too far, as that is not supposed to require additional care or testing; it's basically just a throughput/size dial.
> If my app broke only with OptimizationPreference=Size, I'd report it as a bug.

The JIT is generally free to extend lifetimes of things however it sees fit. Is this different from lifetimes becoming different because the JIT devirtualized and inlined something in one configuration and didn't do it in the other? It feels like in either case something can unpredictably change. One should ideally not rely on these things. We've been resolving issues where people were complaining about lifetime differences as By Design in the past (e.g. #37064, but there are many of those).

@jkotas (Member) commented Jul 14, 2023

> Is this different from lifetimes becoming different because the JIT devirtualized and inlined something in one configuration and didn't do it in the other?

It is different in that:

  • The behavior is unpredictable; it can vary a lot from run to run.
  • It is impossible to work around the lifetime extensions in critical places by wrapping them in NoInline methods.

@VSadov (Member) commented Jul 14, 2023

> The JIT is generally free to extend lifetimes of things

That is impossible to fix, thus it is a Feature, not a Bug. :-)
Most automated memory management schemes have this issue.

Reachability in general is not decidable (whether a program will use a given object is the halting problem), so we rely on liveness instead. But the moment when the JIT kills variables, while intuitive, is not well defined; it is affected by codegen strategy and optimizations.
In practice this is rarely breaking and thus "precise enough", and as a last resort liveness can be forced: leaving an uninlinable method will certainly kill its locals.

Many programs will work fine when switched to conservative reporting, but the chance of breakage is higher and fixes may be nontrivial.
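
As a sketch of the "leaving an uninlinable method kills its locals" workaround (illustrative only, not from the thread):

```csharp
using System;
using System.Runtime.CompilerServices;

class NoInlineLifetimeSketch
{
    // With precise GC info, returning from a non-inlined method
    // guarantees its locals are no longer reported as roots.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static WeakReference AllocateAndDrop()
    {
        var data = new byte[1024];
        return new WeakReference(data);
    }

    static void Main()
    {
        WeakReference wr = AllocateAndDrop();
        GC.Collect();

        // Precise mode: prints False, because 'data' is dead once the
        // call returns. Conservative mode: a stale copy of the pointer
        // left anywhere on the stack can still keep the array alive.
        Console.WriteLine(wr.IsAlive);
    }
}
```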

@MichalStrehovsky (Member, Author)

> It is impossible to work around the lifetime extensions in critical places by wrapping them in NoInline methods.

Once the method returns, its locals would be considered dead by a conservative scan too, wouldn't they (my assumption is that we do take the stack pointer into account)?

> The behavior is unpredictable; it can vary a lot from run to run.

This also feels like "it's a range". It does depend on when the GC happens, but the same is true of the arbitrary lifetime extensions done by the JIT, multiplied by the number of times the code gets paused.

I'll concede to the opinions of the two of you because you both know a lot more about the GC than I do. But I'm puzzled that our bug tracker is not filled with people complaining about lifetimes on Mono if this is such a big problem (the lifetime issues I see are almost exclusively CoreCLR issues; I don't even remember a Mono issue about this).

@jkotas (Member) commented Jul 14, 2023

> Once the method returns, its locals would be considered dead by a conservative scan too, wouldn't they (my assumption is that we do take the stack pointer into account)?

It is not the case. The "bad" pointer can be stuck in some random place on the stack that you have no idea about.

> I'm puzzled that our bug tracker is not filled with people complaining about lifetimes on Mono

Here are some examples of issues:

IIRC, there are also places in the Mono runtime where optimizations are suppressed or the stack is cleared explicitly to reduce the severity of the problem. If we were to introduce a codegen mode for conservative GC and see it actually used, we would probably end up doing things like that too.

@MichalStrehovsky (Member, Author)

Makes sense, thanks! I'm going to close this. It's still good to know we have a 5+% opportunity if we need it, but we'll need to make a bigger investment for it.

ghost locked as resolved and limited conversation to collaborators on Aug 14, 2023
MichalStrehovsky deleted the conservative branch on January 29, 2024