Pathological GC time on M1 mac #48473
I assume this is on master? cc: @d-netto
I'm not sure, but we're seeing bad behaviour on 1.9 as well.
Yes, this was master.
bench = "list.jl"
No Changes to `~/GCBenchmarks/benches/serial/linked/Project.toml`
No Changes to `~/GCBenchmarks/benches/serial/linked/Manifest.toml`
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│ │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│ │ ms │ ms │ ms │ ms │ ms │ us │ MB │ % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │ 9253 │ 6099 │ 5380 │ 704 │ 1981 │ 12 │ 3055 │ 63 │
│ median │ 9342 │ 6118 │ 5396 │ 722 │ 1995 │ 13 │ 3136 │ 64 │
│ maximum │ 9424 │ 6226 │ 5503 │ 725 │ 2031 │ 18 │ 3140 │ 65 │
│ stdev │ 62 │ 53 │ 51 │ 6 │ 19 │ 2 │ 26 │ 0 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
Julia Version 1.9.0-beta3
Commit 24204a73447 (2023-01-18 07:20 UTC)
Platform Info:
OS: Linux (aarch64-linux-gnu)
CPU: 4 × Neoverse-N1
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-14.0.6 (ORCJIT, neoverse-n1)
Threads: 1 on 4 virtual cores
This doesn't seem to be aarch64-related. It would be nice to test on an x86 Mac as well.
bench = "append.jl"
No Changes to `~/GCBenchmarks/benches/serial/append/Project.toml`
No Changes to `~/GCBenchmarks/benches/serial/append/Manifest.toml`
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│ │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│ │ ms │ ms │ ms │ ms │ ms │ us │ MB │ % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │ 1963 │ 199 │ 158 │ 40 │ 98 │ 15 │ 1470 │ 3 │
│ median │ 1975 │ 204 │ 163 │ 41 │ 99 │ 17 │ 1470 │ 3 │
│ maximum │ 2016 │ 211 │ 170 │ 42 │ 101 │ 21 │ 1470 │ 4 │
│ stdev │ 16 │ 3 │ 3 │ 1 │ 1 │ 2 │ 0 │ 0 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
For reference: it's slower than expected, but not pathological.
The same benchmark showed the opposite trend on an AMD/x86-64 machine:
It would be interesting to test M1 Linux as well. I guess a VM might be fine just to get a rough idea.
bench = "list.jl"
No Changes to `~/gctest/Resources/julia/bin/GCBenchmarks/benches/serial/linked/Project.toml`
No Changes to `~/gctest/Resources/julia/bin/GCBenchmarks/benches/serial/linked/Manifest.toml`
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬──────
│ │ total time │ gc time │ mark time │ sweep time │ max GC pause │ tim ⋯
│ │ ms │ ms │ ms │ ms │ ms │ ⋯
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼──────
│ minimum │ 7592 │ 5487 │ 4825 │ 662 │ 1792 │ ⋯
│ median │ 7912 │ 5772 │ 5075 │ 695 │ 1888 │ ⋯
│ maximum │ 8352 │ 6050 │ 5339 │ 720 │ 1982 │ ⋯
│ stdev │ 266 │ 184 │ 173 │ 19 │ 69 │ ⋯
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴──────
3 columns omitted
bench = "append.jl"
No Changes to `~/gctest/Resources/julia/bin/GCBenchmarks/benches/serial/append/Project.toml`
No Changes to `~/gctest/Resources/julia/bin/GCBenchmarks/benches/serial/append/Manifest.toml`
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬──────
│ │ total time │ gc time │ mark time │ sweep time │ max GC pause │ tim ⋯
│ │ ms │ ms │ ms │ ms │ ms │ ⋯
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼──────
│ minimum │ 2167 │ 240 │ 86 │ 153 │ 58 │ ⋯
│ median │ 2222 │ 251 │ 90 │ 162 │ 63 │ ⋯
│ maximum │ 2356 │ 263 │ 95 │ 172 │ 65 │ ⋯
│ stdev │ 55 │ 7 │ 3 │ 7 │ 2 │ ⋯
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴──────
julia> versioninfo()
Julia Version 1.9.0-beta3
Commit 24204a73447 (2023-01-18 07:20 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin21.4.0)
CPU: 12 × Intel(R) Core(TM) i7-8700B CPU @ 3.20GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-14.0.6 (ORCJIT, skylake)
Threads: 1 on 12 virtual cores
Environment:
DYLD_FALLBACK_LIBRARY_PATH = /Users/julia/lib:/usr/local/lib:/lib:/usr/lib:/Users/julia/Library/Python/3.7/lib
JULIA_PKG_PRECOMPILE_AUTO = 1
JULIA_PKG_SERVER = https://pkg.julialang.org
It seems to be M1-related 🤔
It's surprising that there would be such a big OS difference here... We're not really making any system calls. I guess we might be stressing the memory subsystem, but jeez...
I did a bare-metal Linux run and the results match @gbaraldi's VM results.
GC log on M1/macOS:
vs. x86_64 Linux:
For some reason, on macOS we're doing far more collections, and in particular an excessive number of full collections.
What's the status on 1.8? Just wondering whether there is something to bisect.
Looks like 1.8 is fine (M1/aarch64):
Could be related to #44805. For reference, after commenting out:
// If the live data outgrows the suggested max_total_memory
// we keep going with minimum intervals and full gcs until
// we either free some space or get an OOM error.
if (live_bytes > max_total_memory) {
    sweep_full = 1;
}
and:
// We need this for 32 bit but will be useful to set limits on 64 bit
if (gc_num.interval + live_bytes > max_total_memory) {
    if (live_bytes < max_total_memory) {
        gc_num.interval = max_total_memory - live_bytes;
    } else {
        // We can't stay under our goal so let's go back to
        // the minimum interval and hope things get better
        gc_num.interval = default_collect_interval;
    }
}
on master, it goes to
on the M1/macOS.
I wonder if we're getting the wrong values from libuv here.
Are we detecting max memory incorrectly on Mac?
Yes,
It seems like that's the case (rather than an issue with the heuristics). Hardcoding 2 GB (which is closer to what I have on my machine) into
on M1/macOS.
Free memory on macOS, or at least on the M1, doesn't seem to be a reliable thing to check.
With JuliaLang/libuv#34 I get:
bench = "list.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│ │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│ │ ms │ ms │ ms │ ms │ ms │ us │ MB │ % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │ 5112 │ 3703 │ 3223 │ 466 │ 1222 │ 7 │ 2701 │ 71 │
│ median │ 5150 │ 3747 │ 3262 │ 488 │ 1236 │ 9 │ 2701 │ 72 │
│ maximum │ 5270 │ 3868 │ 3372 │ 507 │ 1323 │ 15 │ 2705 │ 72 │
│ stdev │ 57 │ 50 │ 42 │ 12 │ 34 │ 2 │ 2 │ 0 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
This seems more reasonable.
Duplicate of #47684 |
Should we be calling
I am in favor of that. It's odd to take the state of the system when Julia starts as the high-water mark. Constrained memory seems like the right concept (particularly since cgroups are a thing).
macOS has no concept of constrained memory and returns 0 here.
Then we use total memory.
After some discussion: should we just remove lines 3258 to 3262 in d72a9a1?
Remove the high-water-mark logic, because it doesn't really make sense, and allow up to 60% of system memory to be used before aggressive GC kicks in. Should fix #48473.
M1 Mac:
x86 server:
For comparison, on other benchmarks, the M1 is faster:
vs
Benchmarks are from https://github.com/JuliaCI/GCBenchmarks.