Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression in package loading on Windows starting in Julia 1.10.0-rc2 #52711

Closed
jaakkor2 opened this issue Jan 3, 2024 · 5 comments · Fixed by #52758 or #52759
Closed

Regression in package loading on Windows starting in Julia 1.10.0-rc2 #52711

jaakkor2 opened this issue Jan 3, 2024 · 5 comments · Fixed by #52758 or #52759
Labels
compiler:latency Compiler latency

Comments

@jaakkor2
Copy link
Contributor

jaakkor2 commented Jan 3, 2024

There seems to be a regression in some package loading times and allocations on Windows that started between 1.10.0-rc1 and 1.10.0-rc2. Linux seems to be unaffected.

All binaries are official, from juliaup on Windows and https://julialang.org/downloads/ on Linux. Julia is started with --startup-file=no.

Linux

julia> @time using Makie # Linux, Julia 1.10.0-rc1, Makie 0.20.3
  3.555613 seconds (2.49 M allocations: 167.027 MiB, 3.49% gc time, 1.44% compilation time: 53% of which was recompilation)

julia> @time using Makie # Linux, Julia 1.10.0, Makie 0.20.3
  3.644380 seconds (2.51 M allocations: 168.201 MiB, 2.23% gc time, 1.84% compilation time: 50% of which was recompilation)

Windows

julia> @time using Makie # Windows, Julia 1.10.0-rc1, Makie 0.20.3
  2.182203 seconds (2.55 M allocations: 170.186 MiB, 2.63% gc time, 1.59% compilation time: 46% of which was recompilation)

julia> @time using Makie # Windows, Julia 1.10.0-rc2, Makie 0.20.3
  2.516064 seconds (3.21 M allocations: 216.767 MiB, 2.54% gc time, 14.87% compilation time: 95% of which was recompilation)

julia> @time using Makie # Windows, Julia 1.10.0, Makie 0.20.3
  2.529594 seconds (3.20 M allocations: 215.536 MiB, 2.61% gc time, 14.74% compilation time: 95% of which was recompilation)
julia> @time using Graphs # Windows, Julia 1.10.0-rc1, Graphs 1.9.0
  0.419079 seconds (511.44 k allocations: 34.292 MiB, 1.76% compilation time)

julia> @time using Graphs # Windows, Julia 1.10.0-rc2, Graphs 1.9.0
  0.547158 seconds (698.31 k allocations: 47.068 MiB, 6.43% gc time, 24.97% compilation time: 95% of which was recompilation)
julia> @time using StaticArrays, SparseArrays # Windows, Julia 1.10.0-rc1, StaticArrays 1.9.0, SparseArrays 1.10.0
  0.353384 seconds (463.29 k allocations: 30.355 MiB, 2.10% compilation time)

julia> @time using StaticArrays, SparseArrays # Windows, Julia 1.10.0-rc2, StaticArrays 1.9.0, SparseArrays 1.10.0
  0.483263 seconds (649.90 k allocations: 43.212 MiB, 7.07% gc time, 27.88% compilation time: 95% of which was recompilation)

Some packages does not show this, for example OrdinaryDiffEq. Also not shown by using SparseArrays, StaticArrays.

@KristofferC
Copy link
Sponsor Member

KristofferC commented Jan 3, 2024

Here are the commits that were put into rc2: #52045. If someone feels up for it, doing a bisect to find the offending one would be very helpful.

@jaakkor2
Copy link
Contributor Author

jaakkor2 commented Jan 4, 2024

Offending PR seems to be #51371.

Note: #51698 is cherry-picked to fix compilation on Cygwin mingw64.

Last good

git checkout 99d71c0b5921678130d9520c34d6ad8caa6c5a9a # More helpful error message for empty `cpu_target` in `Base.julia_cmd` (#52217)
git cherry-pick 4ef353c6e4 # Put mingw32 `*.a` files in `private_libdir` (#51698)
julia> @time using Makie # Version 1.10.0-rc1.22 (2024-01-04), HEAD/82e67b2e7c* (fork: 212 commits, 185 days)
  2.183912 seconds (2.54 M allocations: 170.143 MiB, 2.84% gc time, 1.73% compilation time: 50% of which was recompilation)

First bad

git checkout 2638522b7576fce9a951ac54efbdbbacf7148a8d # Memoize `cwstring` when used for env lookup / modification on Windows (#51371)
git cherry-pick 4ef353c6e4 # Put mingw32 `*.a` files in `private_libdir` (#51698)
julia> @time using Makie # Version 1.10.0-rc1.23 (2024-01-04), HEAD/8a31dab105* (fork: 213 commits, 185 days)
  2.479346 seconds (3.20 M allocations: 215.322 MiB, 3.02% gc time, 15.27% compilation time: 95% of which was recompilation)

@KristofferC
Copy link
Sponsor Member

If I look at what gets compiled when loading Makie the difference seems to be that in 1.10.0 the following signatures are compiled (which are not present in 1.10.0-rc1):

precompile(Tuple{typeof(Artifacts._artifact_str), Module, String, Base.SubString{String}, String, Base.Dict{String, Any}, Base.SHA1, Base.BinaryPlatforms.Platform, Any})
precompile(Tuple{typeof(Base.setindex!), Base.EnvDict, String, String})
precompile(Tuple{typeof(Base.getindex), Base.EnvDict, String})
precompile(Tuple{typeof(FFTW.__init__)})
precompile(Tuple{typeof(LazyModules.__init__)})

I think the _artifact_str is quite expensive to compile. Running with @time_imports:

┌ precompile(Tuple{typeof(Artifacts._artifact_str), Module, String, Base.SubString{String}, String, Base.Dict{String, Any}, Base.SHA1, Base.BinaryPlatforms.Platform, Any})
346.7 ms EarCut_jll.__init__() 99.45% compilation time (100% recompilation)
    348.0 ms  EarCut_jll 99.08% compilation time (100% recompilation)

seems to confirm that.

@KristofferC
Copy link
Sponsor Member

KristofferC commented Jan 4, 2024

SnoopCompile also shows

    5: signature Tuple{typeof(convert), Type{Vector{UInt16}}, Any} triggered MethodInstance for setindex!(::IdDict{String, Vector{UInt16}}, ::Any, ::Any) (3456 children)

julia> show(root)
MethodInstance for setindex!(::IdDict{String, Vector{UInt16}}, ::Any, ::Any) (3456 children)
 MethodInstance for Base.memoized_env_lookup(::String) (3455 children)
  MethodInstance for Base.access_env(::Returns{Nothing}, ::String) (81 children)
   MethodInstance for get(::Base.EnvDict, ::String, ::Nothing) (79 children)
   ⋮
  ⋮
  MethodInstance for Base._hasenv(::String) (162 children)

which seems to correspond to the bisected PR. The ::Any are inferred due to the @nospecializes in the IdDict code:

function setindex!(d::IdDict{K,V}, @nospecialize(val), @nospecialize(key)) where {K, V}

Maybe this could be changed to a Dict? Although the situation seems quite fragile. Improving the inferrability of the Artifacts code could probably be another way to fix it.

@KristofferC
Copy link
Sponsor Member

The artifact_str invalidation can be reprod with:

struct Foo{T, N} <: AbstractArray{T, N} end

function Base.convert(::Type{Array{T,N}}, ::Foo) where{T,N}
    Array{T,N}(undef, 0)
end

using EarCut_jll

KristofferC added a commit that referenced this issue Jan 8, 2024
…dows env variables (#52758)

Should fix #52711.

My analysis of the invalidation is as follows:

We added code to cache the conversion to `cwstring` in env handling on
Windows (#51371):

```julia
const env_dict = IdDict{String, Vector{UInt16}}()

function memoized_env_lookup(str::AbstractString)
    ...
    env_dict[str] = cwstring(str)
    ...
end

function access_env(onError::Function, str::AbstractString)
    var = memoized_env_lookup(str)
    ...
end
```

Since `IdDict` has `@nospecialize` on `setindex!` we compile this
method:

```julia
setindex!(::IdDict{String, Vector{UInt16}}, ::Any, ::Any)
```

which has an edge to:

```julia
convert(Type{Vector{Int64}}, Any})
```

But then StaticArrays comes along and adds a method

```julia
convert(::Type{Array{T, N}}, ::StaticArray)
```

which invalidates the `setindex!` (due to the edge to `convert`) which
invalidates the whole env handling on Windows which causes 4k other
methods downstream to be invalidated, in particular, the artifact string
macro which causes a significant delay in the next jll package you load
after loading StaticArrays.

There should be no performance penalty to this since strings already
does a hash for their `objectid`.
KristofferC added a commit that referenced this issue Jan 8, 2024
This was in an attempt to fix
#52711 which ultimately proved
unsuccessful. This might however still be useful in other scenarios.

```
using JET
 report_opt(Tuple{typeof(Artifacts._artifact_str), Module, String, Base.SubString{String}, String, Base.Dict{String, Any}, Base.SHA1, Base.BinaryPlatforms.Platform, Any}; target_modules=[Artifacts])
```

is quite a bit cleaner with this.
KristofferC added a commit that referenced this issue Jan 24, 2024
…dows env variables (#52758)

Should fix #52711.

My analysis of the invalidation is as follows:

We added code to cache the conversion to `cwstring` in env handling on
Windows (#51371):

```julia
const env_dict = IdDict{String, Vector{UInt16}}()

function memoized_env_lookup(str::AbstractString)
    ...
    env_dict[str] = cwstring(str)
    ...
end

function access_env(onError::Function, str::AbstractString)
    var = memoized_env_lookup(str)
    ...
end
```

Since `IdDict` has `@nospecialize` on `setindex!` we compile this
method:

```julia
setindex!(::IdDict{String, Vector{UInt16}}, ::Any, ::Any)
```

which has an edge to:

```julia
convert(Type{Vector{Int64}}, Any})
```

But then StaticArrays comes along and adds a method

```julia
convert(::Type{Array{T, N}}, ::StaticArray)
```

which invalidates the `setindex!` (due to the edge to `convert`) which
invalidates the whole env handling on Windows which causes 4k other
methods downstream to be invalidated, in particular, the artifact string
macro which causes a significant delay in the next jll package you load
after loading StaticArrays.

There should be no performance penalty to this since strings already
does a hash for their `objectid`.

(cherry picked from commit b7c24ed)
Drvi pushed a commit to RelationalAI/julia that referenced this issue Jun 7, 2024
…dows env variables (JuliaLang#52758)

Should fix JuliaLang#52711.

My analysis of the invalidation is as follows:

We added code to cache the conversion to `cwstring` in env handling on
Windows (JuliaLang#51371):

```julia
const env_dict = IdDict{String, Vector{UInt16}}()

function memoized_env_lookup(str::AbstractString)
    ...
    env_dict[str] = cwstring(str)
    ...
end

function access_env(onError::Function, str::AbstractString)
    var = memoized_env_lookup(str)
    ...
end
```

Since `IdDict` has `@nospecialize` on `setindex!` we compile this
method:

```julia
setindex!(::IdDict{String, Vector{UInt16}}, ::Any, ::Any)
```

which has an edge to:

```julia
convert(Type{Vector{Int64}}, Any})
```

But then StaticArrays comes along and adds a method

```julia
convert(::Type{Array{T, N}}, ::StaticArray)
```

which invalidates the `setindex!` (due to the edge to `convert`) which
invalidates the whole env handling on Windows which causes 4k other
methods downstream to be invalidated, in particular, the artifact string
macro which causes a significant delay in the next jll package you load
after loading StaticArrays.

There should be no performance penalty to this since strings already
does a hash for their `objectid`.

(cherry picked from commit b7c24ed)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler:latency Compiler latency
Projects
None yet
2 participants