Turbo findmax #84

Merged (5 commits) on Aug 10, 2023

@chriselrod (Contributor) commented Aug 9, 2023

After:

julia> @benchmark RecursiveFactorization.lu!(copyto!($B,$A), $ipiv,Val(true),Val(false), blocksize=8)
BenchmarkTools.Trial: 2562 samples with 1 evaluation.
 Range (min … max):  384.506 μs … 399.233 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     386.841 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   387.488 μs ±   1.524 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

                ▃▂▄▄▅█▅▂▁▂                                       
  ▂▁▁▁▂▂▂▃▃▄▆▅▇████████████▇▅▄▄▃▃▃▃▃▃▃▄▄▅▄▆▆▇▇▇▇▇▇█▆▇▆▇▅▅▄▃▃▃▃▃ ▄
  385 μs           Histogram: frequency by time          391 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

Before:

julia> @benchmark RecursiveFactorization.lu!(copyto!($B,$A), $ipiv,Val(true),Val(false), blocksize=8)
BenchmarkTools.Trial: 2318 samples with 1 evaluation.
 Range (min … max):  426.292 μs … 445.566 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     428.158 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   428.777 μs ±   1.491 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

           ▁▂▄▆▆█▅▆▆▆▁▃▁                    ▁▁▁▁▃▁ ▂▁            
  ▂▂▂▂▃▅▄▅██████████████▇▇▄▃▃▂▁▂▂▂▂▂▃▅▄▄▄▅▇▆██████▆██▇▆▆▄▄▃▃▃▃▃ ▄
  426 μs           Histogram: frequency by time          432 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

Setup:

using BenchmarkTools  # provides @benchmark
@time using RecursiveFactorization
A = rand(300,300);
B = similar(A);
ipiv = Vector{Int}(undef,300);
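
As a rough cross-check, the two median times above can be converted to GFLOPS for the 300×300 case. This is only a sketch: it assumes the usual (2/3)·n³ flop count for LU, and the helper names `lu_flops` and `gflops` are made up for illustration, not part of RecursiveFactorization.

```julia
# Convert the median times reported above into GFLOPS for n = 300.
# Assumption: an LU factorization costs about (2/3) * n^3 floating-point operations.
lu_flops(n) = 2 * n^3 / 3

# GFLOPS = flops / seconds / 1e9
gflops(n, seconds) = lu_flops(n) / seconds / 1e9

n = 300
t_after  = 386.841e-6   # median "After" time, in seconds
t_before = 428.158e-6   # median "Before" time, in seconds

g_after  = gflops(n, t_after)
g_before = gflops(n, t_before)
speedup  = g_after / g_before   # same as t_before / t_after

println("after:  ", round(g_after;  digits = 1), " GFLOPS")
println("before: ", round(g_before; digits = 1), " GFLOPS")
println("ratio:  ", round(speedup;  digits = 3))
```

Because the flop count cancels in the ratio, the speedup at this size depends only on the two timings, not on the assumed flop model.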

Benchmarks across all sizes 4:500, RF 0.2.19 (top) versus this PR (bottom):
[image: lubenchrf19]
[image: lubenchturbofindma]

Summary stats of relative differences:

julia> StatsBase.summarystats(respr ./ res19)
Summary Stats:
Length:         497
Missing Count:  0
Mean:           1.086197
Minimum:        0.889275
1st Quartile:   1.059688
Median:         1.092518
3rd Quartile:   1.120472
Maximum:        1.254754

So, an 8.6% increase in GFLOPS on average, measured on:

julia> versioninfo()
Julia Version 1.9.3-DEV.0
Commit 6fc1be04ee (2023-07-06 14:55 UTC)
Platform Info:
  OS: Linux (x86_64-generic-linux)
  CPU: 36 × Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, cascadelake)
  Threads: 1 on 36 virtual cores
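
For anyone who wants to produce a comparable ratio summary, here is a minimal sketch of a size sweep. It is not the script used for the plots above: `lu_gflops` is a hypothetical helper, the two timed functions are stand-ins from the `LinearAlgebra` standard library (pivoted and unpivoted `lu!`) rather than two RecursiveFactorization versions, and the sweep is kept small so it runs quickly.

```julia
using LinearAlgebra, Statistics

# Assumed flop count for LU of an n×n matrix.
lu_flops(n) = 2 * n^3 / 3

# Best-of-`reps` GFLOPS for one in-place factorization function `f!` at size n.
function lu_gflops(f!, n; reps = 5)
    A = rand(n, n)
    B = similar(A)
    best = Inf
    for _ in 1:reps
        copyto!(B, A)            # refresh the input outside the timed region
        t = @elapsed f!(B)
        best = min(best, t)      # taking the minimum also discards compile time
    end
    return lu_flops(n) / best / 1e9
end

sizes = 16:16:96                 # small sweep; the PR used 4:500

# Stand-ins for the "old" and "new" implementations being compared.
res_old = [lu_gflops(lu!, n) for n in sizes]
res_new = [lu_gflops(B -> lu!(B, NoPivot()), n) for n in sizes]

ratio = res_new ./ res_old       # > 1 means the second function is faster
println("mean ratio:   ", mean(ratio))
println("median ratio: ", median(ratio))
```

Swapping the two stand-ins for the functions under test and widening `sizes` gives vectors that play the same role as `res19` and `respr` in the summary above.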

@YingboMa (Member) left a comment

This looks pretty good, but the test failure looks real.

@chriselrod (Contributor, Author) commented

> This looks pretty good, but the test failure looks real.

It is. It's specific to x86_64 CPUs without AVX2.

@chriselrod (Contributor, Author) commented

For a long time, I tried to get good performance on x86_64 CPUs without AVX2 when using integers by using Int32 instead of Int64, but that introduced bugs. I think I should just not care about performance there and use Int64.
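
To make the index-type point concrete, here is a minimal scalar sketch of a max-magnitude search that carries a plain `Int` index (Int64 on 64-bit platforms). It assumes the "findmax" in the PR title refers to the pivot search in LU; `argmax_abs` is a hypothetical name, and this is not the vectorized code in the PR.

```julia
# Hypothetical helper: magnitude and index of the largest-|x| entry of a vector.
# The index accumulator is a plain Int (Int64 on 64-bit platforms).
function argmax_abs(x::AbstractVector{<:Real})
    isempty(x) && throw(ArgumentError("collection must be non-empty"))
    best_idx = firstindex(x)
    best_val = abs(x[best_idx])
    for i in eachindex(x)
        v = abs(x[i])
        if v > best_val          # strict > keeps the first maximum on ties
            best_val = v
            best_idx = i
        end
    end
    return best_val, best_idx
end

col = [0.5, -3.0, 2.0, 3.0]
val, idx = argmax_abs(col)

# Cross-check against Base.findmax(f, domain), available since Julia 1.7.
@assert (val, idx) == findmax(abs, col)
```

A SIMD version has to track one index per vector lane alongside the running maxima, which is where the width of the index type starts to matter.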

@chriselrod (Contributor, Author) commented Aug 9, 2023

@YingboMa fixed upstream; all green

@chriselrod chriselrod merged commit 75ca426 into JuliaLinearAlgebra:master Aug 10, 2023
15 checks passed
@chriselrod chriselrod deleted the turbofindmax branch August 10, 2023 00:46