Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible regression in GC time in 1.10 #53018

Open
sumiya11 opened this issue Jan 23, 2024 · 12 comments
Open

Possible regression in GC time in 1.10 #53018

sumiya11 opened this issue Jan 23, 2024 · 12 comments
Labels
GC Garbage collector kind:regression Regression in behavior compared to a previous version performance Must go faster

Comments

@sumiya11
Copy link
Contributor

Hi!

Out of curiousity, I checked the performance of my package (Groebner) on recent 1.11.

It occurred to me that there could be a regression, and that the root of it could be somewhere in 1.10.
My back of the envelope benchmarks show a 2-5x slowdown in 1.10 compared to 1.9, see below.

I run the script

using Groebner, AbstractAlgebra

model = Groebner.cyclicn(8, k=GF(2^30 + 3), ordering=:degrevlex);

@time groebner(model);
@time groebner(model);
@time groebner(model);
@time groebner(model);
@time groebner(model);
@time groebner(model);

And get the output:

1.9.2 (official release):

  5.102363 seconds (1.96 M allocations: 338.489 MiB, 2.31% gc time, 73.30% compilation time)
  1.302451 seconds (159.63 k allocations: 221.873 MiB, 2.55% gc time)
  1.172796 seconds (159.63 k allocations: 221.873 MiB, 0.87% gc time)
  1.157940 seconds (159.63 k allocations: 221.873 MiB, 0.66% gc time)
  1.232590 seconds (159.63 k allocations: 221.873 MiB, 1.08% gc time)
  1.151201 seconds (159.63 k allocations: 221.873 MiB, 0.67% gc time)

1.10.0 (official release):

  8.735243 seconds (1.79 M allocations: 331.722 MiB, 44.51% gc time, 43.72% compilation time)
  1.465188 seconds (159.65 k allocations: 221.874 MiB)
  3.934508 seconds (159.65 k allocations: 164.941 MiB, 59.77% gc time)
  2.728948 seconds (159.65 k allocations: 162.659 MiB, 42.54% gc time)
  2.651898 seconds (159.65 k allocations: 162.659 MiB, 43.89% gc time)
  4.909563 seconds (159.65 k allocations: 221.874 MiB, 68.95% gc time)

1.11 (1 day old master):

  6.671727 seconds (3.48 M allocations: 456.577 MiB, 3.70% gc time, 75.23% compilation time)
  2.353556 seconds (264.26 k allocations: 294.034 MiB, 39.44% gc time)
  1.538453 seconds (264.26 k allocations: 294.034 MiB, 6.80% gc time)
  2.392527 seconds (264.26 k allocations: 294.034 MiB, 38.40% gc time)
  2.007164 seconds (264.26 k allocations: 294.034 MiB, 43.94% gc time)
  1.672259 seconds (264.26 k allocations: 294.034 MiB, 34.68% gc time)

One feature of the Groebner.groebner function is that it allocates a lot of small vectors throughout execution.
I can try to isolate this into a smaller example if it would make sense.

All julia versions use AbstractAlgebra v0.35.3 and Groebner v0.6.3.
I run julia without cli arguments and without a startup file.
My machine is:

Julia Version 1.11.0-DEV.1355
Commit 4d26be0847 (2024-01-22 21:30 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 64 × Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, skylake-avx512)
Threads: 1 default, 0 interactive, 1 GC (on 64 virtual cores)
Environment:
  LD_LIBRARY_PATH = :/home/demin/libs2/usr/local/lib/
@nsajko nsajko added the performance Must go faster label Jan 23, 2024
@ViralBShah ViralBShah added the regression 1.10 Regression in the 1.10 release label Jan 23, 2024
@oscardssmith
Copy link
Member

oscardssmith commented Jan 23, 2024

I think this might be fixed by #52994.

Edit. this is unlikely to have fixed it since this is single threaded.

@sumiya11
Copy link
Contributor Author

Another quick comment: groebner is more or less pure, except that it mutates a couple of globals for logging, etc.

@gbaraldi
Copy link
Member

gbaraldi commented Jan 23, 2024

The difference is that we run way too many full collections here, both in 1.10 and master. Specially in 1.10

@KristofferC KristofferC added kind:regression Regression in behavior compared to a previous version GC Garbage collector labels Jan 23, 2024
@KristofferC KristofferC added this to the 1.11 milestone Jan 23, 2024
@gbaraldi gbaraldi removed the regression 1.10 Regression in the 1.10 release label Jan 23, 2024
@sumiya11
Copy link
Contributor Author

This should very roughly resemble the workflow in Groebner.jl:

# Very important work
function work0(m, n, k)
    for _ in 1:k
        polys = [rand(UInt32, rand(1:m)) for _ in 1:n]
        work1(polys)
    end
end

function work1(polys)
    res = 0
    for poly in polys
        new_poly = work2(poly)
        res += sum(new_poly)
    end
    res
end

function work2(poly)
    new_poly = similar(poly)
    work3!(new_poly)
end

function work3!(poly::AbstractVector{T}) where {T}
    poly[1] = one(T)
    for i in 2:length(poly)
        poly[i] = convert(T, i)^3 - poly[i - 1]
    end
    poly
end

using InteractiveUtils; versioninfo()
@time work0(100, 100_000, 10)
@time work0(100, 100_000, 10)
@time work0(100, 100_000, 10)
@time work0(100, 100_000, 10)
@time work0(100, 100_000, 10)

Resulting in

1.9:

Julia Version 1.9.2
Commit e4ee485e909 (2023-07-05 09:39 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 64 × Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, skylake-avx512)
  Threads: 1 on 64 virtual cores
Environment:
  LD_LIBRARY_PATH = :/home/demin/libs2/usr/local/lib/
  2.161450 seconds (2.00 M allocations: 519.583 MiB, 34.35% gc time)
  1.992795 seconds (2.00 M allocations: 519.507 MiB, 32.74% gc time)
  1.951021 seconds (2.00 M allocations: 519.505 MiB, 32.37% gc time)
  1.949284 seconds (2.00 M allocations: 519.529 MiB, 32.36% gc time)
  1.855381 seconds (2.00 M allocations: 519.498 MiB, 30.36% gc time)

1.10:

Julia Version 1.10.0
Commit 3120989f39b (2023-12-25 18:01 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 64 × Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, skylake-avx512)
  Threads: 1 on 64 virtual cores
Environment:
  LD_LIBRARY_PATH = :/home/demin/libs2/usr/local/lib/
  4.185146 seconds (2.00 M allocations: 519.578 MiB, 53.11% gc time)
  4.822002 seconds (2.00 M allocations: 519.339 MiB, 60.95% gc time)
  4.269982 seconds (2.00 M allocations: 520.013 MiB, 57.77% gc time)
  4.377889 seconds (2.00 M allocations: 519.624 MiB, 57.68% gc time)
  4.543688 seconds (2.00 M allocations: 519.801 MiB, 57.88% gc time)

@oscardssmith
Copy link
Member

for a more minimal mwe,

julia> function work0(m, n, k)
           res=0.0
           for _ in 1:k
               res += sum(sum.([rand(rand(1:m)) for _ in 1:n]))
           end
           res
       end
work0 (generic function with 1 method)

# 1.9 
  0.490611 seconds (1.00 M allocations: 470.900 MiB, 31.94% gc time)
2.5250631225162618e7

# 1.10 with gcthreads=1
julia> @time work0(100, 100_000, 10)
  0.666685 seconds (1.00 M allocations: 470.683 MiB, 47.04% gc time)
2.523670469905652e7

@d-netto
Copy link
Member

d-netto commented Jan 24, 2024

Didn't investigate master yet.

The difference between 1.9 and 1.10 seems to be mostly on the number of full collections we're running (0 on 1.9, 3 on 1.10).

The regression is certainly still there, but it's not really surprising given that we returned to more aggressive GC heuristics in 1.10.

Also, note that we seem to be able to recover some of the performance in this MWE by using GC threads, at least on the M2.

MWE:

function work0(m, n, k)
   res=0.0
   for _ in 1:k
       res += sum(sum.([rand(rand(1:m)) for _ in 1:n]))
   end
   res
end

GC.enable_logging()
@time work0(100, 100_000, 10)
  • 1.9:
GC: pause 31.11ms. collected 29.561881MB. incr 
GC: pause 5.35ms. collected 0.000936MB. incr 
GC: pause 6.60ms. collected 49.412432MB. incr 
GC: pause 4.48ms. collected 49.289232MB. incr 
GC: pause 3.89ms. collected 49.260032MB. incr 
GC: pause 4.60ms. collected 18446744073709.531250MB. incr 
GC: pause 7.14ms. collected 0.816240MB. incr 
GC: pause 7.80ms. collected 96.496656MB. incr 
GC: pause 6.02ms. collected 49.454624MB. incr 
  0.225451 seconds (1.00 M allocations: 470.933 MiB, 34.15% gc time)
  • 1.10 (1 GC thread):

GC: pause 35.18ms. collected 25.071297MB. incr 

GC: pause 5.34ms. collected 0.000511MB. incr 

GC: pause 16.09ms. collected 0.763397MB. full 

GC: pause 43.02ms. collected 140.385666MB. full 

GC: pause 35.74ms. collected 94.071075MB. incr 

GC: pause 6.00ms. collected 46.402496MB. incr 

GC: pause 9.18ms. collected 1.424820MB. full 
  0.292375 seconds (1.00 M allocations: 470.905 MiB, 51.49% gc time)
  • 1.10 (4 GC threads):

GC: pause 14.64ms. collected 25.071297MB. incr 

GC: pause 8.54ms. collected 0.000511MB. incr 

GC: pause 17.84ms. collected 0.763397MB. full 

GC: pause 29.04ms. collected 140.754700MB. full 

GC: pause 14.73ms. collected 94.285538MB. incr 

GC: pause 10.96ms. collected 46.489609MB. incr 

GC: pause 12.92ms. collected 1.427139MB. full 
  0.258420 seconds (1.00 M allocations: 471.498 MiB, 42.05% gc time)

@sumiya11
Copy link
Contributor Author

Also, note that we seem to be able to recover some of the performance in this MWE by using GC threads, at least on the M2.

Right, I also observe this

@JeffBezanson
Copy link
Sponsor Member

Do we know what commit changed this? Maybe a bisect is warranted? I think this is a pretty unacceptable regression.

@sumiya11
Copy link
Contributor Author

From 1.9.4 to 1.10.0, according to my bisect, 3b97715 is the first commit with this regression.

The output for 3b97715:

Julia Version 1.10.0-beta3.24
Commit 3b9771507a (2023-10-20 20:43 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 64 × Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, skylake-avx512)
  Threads: 1 on 64 virtual cores
Environment:
  LD_LIBRARY_PATH = :/home/demin/libs2/usr/local/lib/
  3.512954 seconds (2.00 M allocations: 519.278 MiB, 54.29% gc time)
  3.799685 seconds (2.00 M allocations: 520.242 MiB, 61.02% gc time)
  3.384975 seconds (2.00 M allocations: 519.617 MiB, 58.15% gc time)
  3.464177 seconds (2.00 M allocations: 519.612 MiB, 59.38% gc time)
  3.584769 seconds (2.00 M allocations: 519.452 MiB, 60.04% gc time)

The output for the previous commit:

Julia Version 1.10.0-beta3.23
Commit d4809e5b77* (2023-10-20 09:45 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 64 × Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, skylake-avx512)
  Threads: 1 on 64 virtual cores
Environment:
  LD_LIBRARY_PATH = :/home/demin/libs2/usr/local/lib/
  2.720464 seconds (2.00 M allocations: 519.775 MiB, 41.72% gc time)
  1.947959 seconds (2.00 M allocations: 519.718 MiB, 26.96% gc time)
  1.850764 seconds (2.00 M allocations: 519.615 MiB, 19.69% gc time)
  1.943128 seconds (2.00 M allocations: 519.803 MiB, 25.03% gc time)
  2.401440 seconds (2.00 M allocations: 519.135 MiB, 43.55% gc time)

@oscardssmith
Copy link
Member

that was what I thought it would be, but @JeffBezanson and I tried 1.8 and it was fast also, so I'm confused

@KristofferC KristofferC changed the title Possible regression in 1.10 Possible regression in GC time in 1.10 Apr 23, 2024
@KristofferC KristofferC removed this from the 1.11 milestone Apr 23, 2024
@KristofferC
Copy link
Sponsor Member

This is an unfortunate slowdown but since it is not new in 1.11 it probably doesn't make sense to hold the 1.11 release for this.

@sumiya11
Copy link
Contributor Author

Hi, is there progress with this?

There is more code in the ecosystem that is affected by the slowdown, as I would imagine. The reproducing example is a fairly common thing to write.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
GC Garbage collector kind:regression Regression in behavior compared to a previous version performance Must go faster
Projects
None yet
Development

No branches or pull requests

8 participants