Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

50% performance regression in map! #35914

Open
stevengj opened this issue May 17, 2020 · 9 comments
Open

50% performance regression in map! #35914

stevengj opened this issue May 17, 2020 · 9 comments
Labels
performance Must go faster regression Regression in behavior compared to a previous version

Comments

@stevengj
Copy link
Member

From discourse, I see a big performance regression compared to Julia 1.0 in a simple map! call starting in Julia 1.2:

using BenchmarkTools
test8!(D, A, B, C) = map!((a, b, c) -> a + b + c, D, A, B, C)
A = rand(1000,1000); B = rand(1000,1000); C = rand(1000,1000); D = zeros(1000,1000);
@btime test8!($D,$A,$B,$C);

Julia 1.0.4 gives 1.817 ms and Julia 1.1.0 gives 1.961 ms, but Julia 1.2.0 gives 3.004 ms, Julia 1.3.0 gives 3.091 ms, and Julia 1.4.0 gives 3.006 ms.

@stevengj stevengj added performance Must go faster regression Regression in behavior compared to a previous version labels May 17, 2020
@abarankab
Copy link

Hi, I did some benchmarks and the cause is this line. If you remove it and rebuild Julia, the test runs in 1.490 ms as compared to 2.927 ms with it. Adding the @inbounds macro didn't help and the runtime was about 3 ms as before. Adding the --check-bounds=no flag speeds up the code to 1.487 ms.

I'm currently investigating why @inbounds didn't elide the @boundscheck block in map_n!. Should be something about inlines or propagating inbounds.

@mbauman
Copy link
Member

mbauman commented May 18, 2020

Hah, wow, that's quite the literal speed bump. There are three issues here:

  • Yes, the @boundscheck will never be elided as that function doesn't inline (and isn't called with @inbounds anyways... and it's a dimension check, not a bounds check.
  • It just checks if the indices are == and doesn't actually throw a DimensionMismatch if they're not
  • LinearIndices doesn't have a specialized == method so it's using the fallback O(n) algorithm from AbstractArray

@stevengj
Copy link
Member Author

A fourth issue is that it assumes fast linear indexing, i.e. it is assuming IndexStyle(args...) isa IndexLinear

@tkf
Copy link
Member

tkf commented May 19, 2020

I think a generic and composable approach to implement IndexStyle-generic map! would be to implement it in terms of foldl. See #35036 for how I used it for foreach(f, arrays...). It's generally better than raw for loops. (But it's more like a possible future improvement rather than a performance bug fix.)

@vtjnash
Copy link
Member

vtjnash commented Jan 28, 2021

From Matt's comment, we see these problems and slow down were introduced by f9645ff, with the belief that the changes would make this function faster.

@KZiemian
Copy link

KZiemian commented Mar 22, 2021

I repeat benchmarking in Julia 1.6.0-rc3, performance of map! where adding matrices is around 40% better than in v. 1.4, but still much slower than simple A + B + C.

using BenchmarkTools

A = rand(1000, 1000);  B = rand(1000, 1000); C = rand(1000, 1000); D = rand(1000, 1000);
test(A, B, C) = A + B + C
@btime test($A, $B, $C);
  3.575 ms (2 allocations: 7.63 MiB)

test8!(D, A, B, C) = map!((a, b, c) -> a + b + c, D, A, B, C)
@btime test8!($D, $A, $B, $C)
  6.457 ms (0 allocations: 0 bytes)

@ViralBShah
Copy link
Member

Still slower on Julia 1.8 compared to 1.0 as originally reported.

@vtjnash
Copy link
Member

vtjnash commented Feb 6, 2024

The problem resolved itself in v1.10

@vtjnash vtjnash closed this as completed Feb 6, 2024
@chriselrod
Copy link
Contributor

chriselrod commented Feb 6, 2024

julia> @btime test($A, $B, $C);
  1.346 ms (2 allocations: 7.63 MiB)

julia> @btime test8!($D, $A, $B, $C);
  2.354 ms (0 allocations: 0 bytes)

julia> versioninfo()
Julia Version 1.10.0
Commit 3120989f39 (2023-12-25 18:01 UTC)
Build Info:

    Note: This is an unofficial build, please report bugs to the project
    responsible for this build and not to the Julia project unless you can
    reproduce the issue using official builds available at https://julialang.org/downloads

Platform Info:
  OS: Linux (x86_64-redhat-linux)
  CPU: 36 × Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, skylake-avx512)
  Threads: 1 on 36 virtual cores

The assembly for test8! shows it is not vectorized. The example from #35914 (comment) has not resolved itself.

@ViralBShah ViralBShah reopened this Feb 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster regression Regression in behavior compared to a previous version
Projects
None yet
Development

No branches or pull requests

8 participants