Possible regression in GC time in 1.10 #53018

sumiya11 · 2024-01-23T12:52:28Z

Hi!

Out of curiousity, I checked the performance of my package (Groebner) on recent 1.11.

It occurred to me that there could be a regression, and that the root of it could be somewhere in 1.10.
My back of the envelope benchmarks show a 2-5x slowdown in 1.10 compared to 1.9, see below.

I run the script

using Groebner, AbstractAlgebra

model = Groebner.cyclicn(8, k=GF(2^30 + 3), ordering=:degrevlex);

@time groebner(model);
@time groebner(model);
@time groebner(model);
@time groebner(model);
@time groebner(model);
@time groebner(model);

And get the output:

1.9.2 (official release):

  5.102363 seconds (1.96 M allocations: 338.489 MiB, 2.31% gc time, 73.30% compilation time)
  1.302451 seconds (159.63 k allocations: 221.873 MiB, 2.55% gc time)
  1.172796 seconds (159.63 k allocations: 221.873 MiB, 0.87% gc time)
  1.157940 seconds (159.63 k allocations: 221.873 MiB, 0.66% gc time)
  1.232590 seconds (159.63 k allocations: 221.873 MiB, 1.08% gc time)
  1.151201 seconds (159.63 k allocations: 221.873 MiB, 0.67% gc time)

1.10.0 (official release):

  8.735243 seconds (1.79 M allocations: 331.722 MiB, 44.51% gc time, 43.72% compilation time)
  1.465188 seconds (159.65 k allocations: 221.874 MiB)
  3.934508 seconds (159.65 k allocations: 164.941 MiB, 59.77% gc time)
  2.728948 seconds (159.65 k allocations: 162.659 MiB, 42.54% gc time)
  2.651898 seconds (159.65 k allocations: 162.659 MiB, 43.89% gc time)
  4.909563 seconds (159.65 k allocations: 221.874 MiB, 68.95% gc time)

1.11 (1 day old master):

  6.671727 seconds (3.48 M allocations: 456.577 MiB, 3.70% gc time, 75.23% compilation time)
  2.353556 seconds (264.26 k allocations: 294.034 MiB, 39.44% gc time)
  1.538453 seconds (264.26 k allocations: 294.034 MiB, 6.80% gc time)
  2.392527 seconds (264.26 k allocations: 294.034 MiB, 38.40% gc time)
  2.007164 seconds (264.26 k allocations: 294.034 MiB, 43.94% gc time)
  1.672259 seconds (264.26 k allocations: 294.034 MiB, 34.68% gc time)

One feature of the Groebner.groebner function is that it allocates a lot of small vectors throughout execution.
I can try to isolate this into a smaller example if it would make sense.

All julia versions use AbstractAlgebra v0.35.3 and Groebner v0.6.3.
I run julia without cli arguments and without a startup file.
My machine is:

Julia Version 1.11.0-DEV.1355
Commit 4d26be0847 (2024-01-22 21:30 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 64 × Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, skylake-avx512)
Threads: 1 default, 0 interactive, 1 GC (on 64 virtual cores)
Environment:
  LD_LIBRARY_PATH = :/home/demin/libs2/usr/local/lib/

The text was updated successfully, but these errors were encountered:

oscardssmith · 2024-01-23T14:39:15Z

I think this might be fixed by #52994.

Edit. this is unlikely to have fixed it since this is single threaded.

sumiya11 · 2024-01-23T15:00:01Z

Another quick comment: groebner is more or less pure, except that it mutates a couple of globals for logging, etc.

gbaraldi · 2024-01-23T15:10:05Z

The difference is that we run way too many full collections here, both in 1.10 and master. Specially in 1.10

sumiya11 · 2024-01-23T15:21:01Z

This should very roughly resemble the workflow in Groebner.jl:

# Very important work
function work0(m, n, k)
    for _ in 1:k
        polys = [rand(UInt32, rand(1:m)) for _ in 1:n]
        work1(polys)
    end
end

function work1(polys)
    res = 0
    for poly in polys
        new_poly = work2(poly)
        res += sum(new_poly)
    end
    res
end

function work2(poly)
    new_poly = similar(poly)
    work3!(new_poly)
end

function work3!(poly::AbstractVector{T}) where {T}
    poly[1] = one(T)
    for i in 2:length(poly)
        poly[i] = convert(T, i)^3 - poly[i - 1]
    end
    poly
end

using InteractiveUtils; versioninfo()
@time work0(100, 100_000, 10)
@time work0(100, 100_000, 10)
@time work0(100, 100_000, 10)
@time work0(100, 100_000, 10)
@time work0(100, 100_000, 10)

Resulting in

1.9:

Julia Version 1.9.2
Commit e4ee485e909 (2023-07-05 09:39 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 64 × Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, skylake-avx512)
  Threads: 1 on 64 virtual cores
Environment:
  LD_LIBRARY_PATH = :/home/demin/libs2/usr/local/lib/
  2.161450 seconds (2.00 M allocations: 519.583 MiB, 34.35% gc time)
  1.992795 seconds (2.00 M allocations: 519.507 MiB, 32.74% gc time)
  1.951021 seconds (2.00 M allocations: 519.505 MiB, 32.37% gc time)
  1.949284 seconds (2.00 M allocations: 519.529 MiB, 32.36% gc time)
  1.855381 seconds (2.00 M allocations: 519.498 MiB, 30.36% gc time)

1.10:

Julia Version 1.10.0
Commit 3120989f39b (2023-12-25 18:01 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 64 × Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, skylake-avx512)
  Threads: 1 on 64 virtual cores
Environment:
  LD_LIBRARY_PATH = :/home/demin/libs2/usr/local/lib/
  4.185146 seconds (2.00 M allocations: 519.578 MiB, 53.11% gc time)
  4.822002 seconds (2.00 M allocations: 519.339 MiB, 60.95% gc time)
  4.269982 seconds (2.00 M allocations: 520.013 MiB, 57.77% gc time)
  4.377889 seconds (2.00 M allocations: 519.624 MiB, 57.68% gc time)
  4.543688 seconds (2.00 M allocations: 519.801 MiB, 57.88% gc time)

oscardssmith · 2024-01-23T15:38:31Z

for a more minimal mwe,

julia> function work0(m, n, k)
           res=0.0
           for _ in 1:k
               res += sum(sum.([rand(rand(1:m)) for _ in 1:n]))
           end
           res
       end
work0 (generic function with 1 method)

# 1.9 
  0.490611 seconds (1.00 M allocations: 470.900 MiB, 31.94% gc time)
2.5250631225162618e7

# 1.10 with gcthreads=1
julia> @time work0(100, 100_000, 10)
  0.666685 seconds (1.00 M allocations: 470.683 MiB, 47.04% gc time)
2.523670469905652e7

d-netto · 2024-01-24T19:39:01Z

Didn't investigate master yet.

The difference between 1.9 and 1.10 seems to be mostly on the number of full collections we're running (0 on 1.9, 3 on 1.10).

The regression is certainly still there, but it's not really surprising given that we returned to more aggressive GC heuristics in 1.10.

Also, note that we seem to be able to recover some of the performance in this MWE by using GC threads, at least on the M2.

MWE:

function work0(m, n, k)
   res=0.0
   for _ in 1:k
       res += sum(sum.([rand(rand(1:m)) for _ in 1:n]))
   end
   res
end

GC.enable_logging()
@time work0(100, 100_000, 10)

1.9:

GC: pause 31.11ms. collected 29.561881MB. incr 
GC: pause 5.35ms. collected 0.000936MB. incr 
GC: pause 6.60ms. collected 49.412432MB. incr 
GC: pause 4.48ms. collected 49.289232MB. incr 
GC: pause 3.89ms. collected 49.260032MB. incr 
GC: pause 4.60ms. collected 18446744073709.531250MB. incr 
GC: pause 7.14ms. collected 0.816240MB. incr 
GC: pause 7.80ms. collected 96.496656MB. incr 
GC: pause 6.02ms. collected 49.454624MB. incr 
  0.225451 seconds (1.00 M allocations: 470.933 MiB, 34.15% gc time)

1.10 (1 GC thread):


GC: pause 35.18ms. collected 25.071297MB. incr 

GC: pause 5.34ms. collected 0.000511MB. incr 

GC: pause 16.09ms. collected 0.763397MB. full 

GC: pause 43.02ms. collected 140.385666MB. full 

GC: pause 35.74ms. collected 94.071075MB. incr 

GC: pause 6.00ms. collected 46.402496MB. incr 

GC: pause 9.18ms. collected 1.424820MB. full 
  0.292375 seconds (1.00 M allocations: 470.905 MiB, 51.49% gc time)

1.10 (4 GC threads):


GC: pause 14.64ms. collected 25.071297MB. incr 

GC: pause 8.54ms. collected 0.000511MB. incr 

GC: pause 17.84ms. collected 0.763397MB. full 

GC: pause 29.04ms. collected 140.754700MB. full 

GC: pause 14.73ms. collected 94.285538MB. incr 

GC: pause 10.96ms. collected 46.489609MB. incr 

GC: pause 12.92ms. collected 1.427139MB. full 
  0.258420 seconds (1.00 M allocations: 471.498 MiB, 42.05% gc time)

sumiya11 · 2024-01-25T11:33:12Z

Also, note that we seem to be able to recover some of the performance in this MWE by using GC threads, at least on the M2.

Right, I also observe this

JeffBezanson · 2024-03-08T19:09:42Z

Do we know what commit changed this? Maybe a bisect is warranted? I think this is a pretty unacceptable regression.

sumiya11 · 2024-03-12T19:13:42Z

From 1.9.4 to 1.10.0, according to my bisect, 3b97715 is the first commit with this regression.

The output for 3b97715:

Julia Version 1.10.0-beta3.24
Commit 3b9771507a (2023-10-20 20:43 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 64 × Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, skylake-avx512)
  Threads: 1 on 64 virtual cores
Environment:
  LD_LIBRARY_PATH = :/home/demin/libs2/usr/local/lib/
  3.512954 seconds (2.00 M allocations: 519.278 MiB, 54.29% gc time)
  3.799685 seconds (2.00 M allocations: 520.242 MiB, 61.02% gc time)
  3.384975 seconds (2.00 M allocations: 519.617 MiB, 58.15% gc time)
  3.464177 seconds (2.00 M allocations: 519.612 MiB, 59.38% gc time)
  3.584769 seconds (2.00 M allocations: 519.452 MiB, 60.04% gc time)

The output for the previous commit:

Julia Version 1.10.0-beta3.23
Commit d4809e5b77* (2023-10-20 09:45 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 64 × Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, skylake-avx512)
  Threads: 1 on 64 virtual cores
Environment:
  LD_LIBRARY_PATH = :/home/demin/libs2/usr/local/lib/
  2.720464 seconds (2.00 M allocations: 519.775 MiB, 41.72% gc time)
  1.947959 seconds (2.00 M allocations: 519.718 MiB, 26.96% gc time)
  1.850764 seconds (2.00 M allocations: 519.615 MiB, 19.69% gc time)
  1.943128 seconds (2.00 M allocations: 519.803 MiB, 25.03% gc time)
  2.401440 seconds (2.00 M allocations: 519.135 MiB, 43.55% gc time)

oscardssmith · 2024-03-13T01:59:48Z

that was what I thought it would be, but @JeffBezanson and I tried 1.8 and it was fast also, so I'm confused

KristofferC · 2024-04-23T15:54:14Z

This is an unfortunate slowdown but since it is not new in 1.11 it probably doesn't make sense to hold the 1.11 release for this.

sumiya11 · 2024-05-26T13:25:32Z

Hi, is there progress with this?

There is more code in the ecosystem that is affected by the slowdown, as I would imagine. The reproducing example is a fairly common thing to write.

nsajko added the performance Must go faster label Jan 23, 2024

ViralBShah added the regression 1.10 Regression in the 1.10 release label Jan 23, 2024

KristofferC added kind:regression Regression in behavior compared to a previous version GC Garbage collector labels Jan 23, 2024

KristofferC added this to the 1.11 milestone Jan 23, 2024

gbaraldi removed the regression 1.10 Regression in the 1.10 release label Jan 23, 2024

KristofferC changed the title ~~Possible regression in 1.10~~ Possible regression in GC time in 1.10 Apr 23, 2024

KristofferC removed this from the 1.11 milestone Apr 23, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Possible regression in GC time in 1.10 #53018

Possible regression in GC time in 1.10 #53018

sumiya11 commented Jan 23, 2024

oscardssmith commented Jan 23, 2024 •

edited

Loading

sumiya11 commented Jan 23, 2024

gbaraldi commented Jan 23, 2024 •

edited

Loading

sumiya11 commented Jan 23, 2024

oscardssmith commented Jan 23, 2024

d-netto commented Jan 24, 2024 •

edited

Loading

sumiya11 commented Jan 25, 2024

JeffBezanson commented Mar 8, 2024

sumiya11 commented Mar 12, 2024

oscardssmith commented Mar 13, 2024

KristofferC commented Apr 23, 2024

sumiya11 commented May 26, 2024

Possible regression in GC time in 1.10 #53018

Possible regression in GC time in 1.10 #53018

Comments

sumiya11 commented Jan 23, 2024

oscardssmith commented Jan 23, 2024 • edited Loading

sumiya11 commented Jan 23, 2024

gbaraldi commented Jan 23, 2024 • edited Loading

sumiya11 commented Jan 23, 2024

oscardssmith commented Jan 23, 2024

d-netto commented Jan 24, 2024 • edited Loading

sumiya11 commented Jan 25, 2024

JeffBezanson commented Mar 8, 2024

sumiya11 commented Mar 12, 2024

oscardssmith commented Mar 13, 2024

KristofferC commented Apr 23, 2024

sumiya11 commented May 26, 2024

oscardssmith commented Jan 23, 2024 •

edited

Loading

gbaraldi commented Jan 23, 2024 •

edited

Loading

d-netto commented Jan 24, 2024 •

edited

Loading