[mono][aot] Investigate RAM consumption in Mono AOT compiler #95791
Comments
@lateralusX when you're ready, can you share your findings with @kotlarmilos?
Memory usage in the AOT compiler consists of:
Some ideas for improvements:
Been looking at this for a couple of days:
I will fix the things I hit so far.
Could you help me to understand this issue? The APIs in question are recursive and so Mono "must" have existing handling for them, even if just to cause them to return Is the issue that LLVM is doing other optimizations first, rather than an early elimination of dead code blocks?
--
In RyuJIT we handle the general mapping to constant true/false here: There is then a fallback for other special APIs, which may be unrecognized here: RyuJIT then likewise does an early removal of dead code blocks as part of importation, to improve throughput for later phases.
Correct, there are fallbacks for We do miss support for intrinsic for Main issue is that we don't run the regular JIT dead code elimination pass on the generated CFG when we use LLVM. In that case we rely on LLVM opt, but we only run that as an out-of-process step after generating the full LLVM module (covering the whole assembly), so we end up generating all code and consuming a lot of memory. It would be great if we could eliminate dead code blocks early in Mono as well; that would reduce the memory footprint when compiling assemblies like S.P.C. I'm still investigating this issue, so I will find more details when I start to eliminate the initially detected issues.
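To make the idea of early dead block elimination concrete, here is a minimal sketch in Python. It is not the Mono or RyuJIT implementation; the CFG representation, the terminator tuples, and the names `successors` and `prune_dead_blocks` are all assumptions made for illustration. It shows how a branch whose condition has already been folded to a constant (for example an IsSupported-style check) lets the compiler drop the untaken blocks before any code is generated for them.

```python
# Hypothetical CFG model: each block maps to its terminator.
#   ("jump", target)
#   ("branch", cond, if_true, if_false)  cond is True/False when folded to a
#                                        constant, or a string when unknown
#   ("ret",)

def successors(term):
    """Return the blocks a terminator can transfer control to."""
    kind = term[0]
    if kind == "jump":
        return [term[1]]
    if kind == "branch":
        _, cond, if_true, if_false = term
        if cond is True:
            return [if_true]       # false edge is dead
        if cond is False:
            return [if_false]      # true edge is dead
        return [if_true, if_false]  # unknown condition: keep both edges
    return []                       # "ret" has no successors


def prune_dead_blocks(cfg, entry):
    """Keep only the blocks reachable from entry after constant folding."""
    reachable = set()
    worklist = [entry]
    while worklist:
        block = worklist.pop()
        if block in reachable:
            continue
        reachable.add(block)
        worklist.extend(successors(cfg[block]))
    return {name: term for name, term in cfg.items() if name in reachable}


# Illustrative input: the hardware check was folded to False.
cfg = {
    "entry": ("branch", False, "vector_path", "scalar_path"),
    "vector_path": ("jump", "exit"),
    "scalar_path": ("jump", "exit"),
    "exit": ("ret",),
}
pruned = prune_dead_blocks(cfg, "entry")
```

In this example `vector_path` is dropped, so nothing downstream (in the scenario above, the LLVM module) would ever need to hold code for it.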
I used an unlinked S.P.C while investigating potential fixes that could significantly reduce the memory usage of the cross compiler. Before any changes, a full AOT compile of S.P.C targeting x64 ended up with a cross compiler memory usage of ~6 GB in .NET 8. I have investigated and implemented the following fixes, which dramatically reduce the memory usage in this scenario targeting x64:
With the above changes, compiling unlinked S.P.C ends up at ~1.3 GB of memory usage, a dramatic improvement over the original 6 GB. As part of this effort I will also look into adding a new driver option to the AOT cross compiler. It would run the AOT compiler as a separate process using asm-only + no_opt, and then run opt, llc, and asm as separate processes. That makes sure the memory used by the cross compiler is released before the LLVM tooling (opt/llc) runs, which should improve scalability and parallelization on build machines. I am still working on the fixes in a downstream repo, so I will complete the work there first and then upstream the relevant changes.
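The staged approach described here can be sketched as a small driver that runs each tool as its own process and waits for it to exit before starting the next. This is only an illustration of the idea, not the driver mode that was implemented: `run_stages` is a hypothetical helper, and the commands below are stand-ins, since the real command lines for the AOT compiler, opt, llc, and the assembler are not given in this thread.

```python
import subprocess
import sys
from typing import List, Sequence


def run_stages(stages: Sequence[Sequence[str]]) -> List[int]:
    """Run each command to completion before starting the next one.

    Because every stage is a separate process, the memory it used is
    returned to the OS when it exits, so the peak is roughly the largest
    single stage rather than the sum of all stages.
    """
    exit_codes = []
    for argv in stages:
        # check=True raises CalledProcessError on a non-zero exit code,
        # which stops the pipeline at the first failing stage.
        completed = subprocess.run(list(argv), check=True)
        exit_codes.append(completed.returncode)
    return exit_codes


# Stand-in commands (assumption): replace with the real tool invocations.
stages = [
    [sys.executable, "-c", "print('stage 1: aot compiler')"],
    [sys.executable, "-c", "print('stage 2: opt')"],
    [sys.executable, "-c", "print('stage 3: llc')"],
    [sys.executable, "-c", "print('stage 4: assembler')"],
]
exit_codes = run_stages(stages)
```

A driver like this trades a little process start-up overhead for a lower peak, which matters when several assemblies are compiled in parallel on one machine.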
PR reducing memory footprint: #97096. I will add driver mode to the AOT compiler in a separate PR.
PR implementing driver mode in the Mono AOT cross compiler: #97226. It reduces machine memory usage per compiled assembly by not keeping the cross compiler instance alive while tools like opt and llc run.
…is resolved (dotnet#96875) * Exclude System.Numerics.Tensors.Tests from wasm aot until dotnet#95791 is resolved * Also exclude System.Numerics.Tensors.Net8.Tests
Motivation and Background
The Mono AOT compiler currently requires a minimum of 16GB of RAM to compile System.Private.CoreLib.dll on a Linux machine. Currently, this limitation is preventing us from running fullAOT tests in our CI. The purpose of this issue is to explore ways to reduce the RAM consumption of the AOT compiler.
Analysis
We conducted a simple experiment with the free -m command inside a Docker container using cbl-mariner-2.0-cross-amd64 on a machine with 32GB of RAM. The graph below presents the RAM consumption during consecutive AOT compilation of the runningmono.dll and System.Private.CoreLib.dll assemblies using the LLVM configuration in release mode.
The graph indicates a clear pattern of memory consumption during the AOT compilation of these two assemblies. The memory consumption for compiling runningmono.dll is smaller as it is less complex, but the pattern is the same. However, there are two noticeable spikes that are worth investigating.
runningmono.dll:
JIT time: 6518 ms
Generation time: 47876 ms
Assembly+Link time: 127 ms
System.Private.CoreLib.dll:
JIT time: 53617 ms
Generation time: 388484 ms
Assembly+Link time: 2845 ms
The first spike, which is likely related to the JIT phase, does not have a constant steepness. This irregularity may indicate a potential issue with memory handling during this phase. The second spike likely corresponds to the generation phase, and it appears to be consistent. By addressing these spikes, we aim to reduce the RAM usage of the AOT compiler and improve its efficiency.
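A measurement like the one described in this analysis could be scripted roughly as follows. This is a sketch under assumptions, not the script that produced the graph: the helper names `parse_free_m` and `sample_used_mib` are hypothetical, and the sample output contains made-up numbers that only illustrate the usual column layout of free -m.

```python
import subprocess


def parse_free_m(output: str) -> dict:
    """Map the column headers of `free -m` to the values of the Mem: row (MiB)."""
    lines = output.strip().splitlines()
    headers = lines[0].split()
    for line in lines[1:]:
        if line.startswith("Mem:"):
            values = [int(v) for v in line.split()[1:]]
            return dict(zip(headers, values))
    raise ValueError("no 'Mem:' row found in free output")


def sample_used_mib() -> int:
    """Run `free -m` once and return the 'used' column (Linux only)."""
    result = subprocess.run(
        ["free", "-m"], capture_output=True, text=True, check=True
    )
    return parse_free_m(result.stdout)["used"]


# Illustrative output only; the numbers are invented for this example.
SAMPLE = """\
              total        used        free      shared  buff/cache   available
Mem:          32000        1200       25000         100        5800       30000
Swap:          2047           0        2047
"""
mem = parse_free_m(SAMPLE)
```

Calling `sample_used_mib()` at a fixed interval while the AOT compiler runs and recording each value with a timestamp would give a usage curve over time; note that free reports system-wide memory, so other processes in the container contribute to the numbers as well.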
Tasks
/cc: @vargaz @lambdageek @BrzVlad @ivanpovazan @fanyang-mono @matouskozak @SamMonoRT @steveisok