Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sortperm algorithm #124

Closed
henry2004y opened this issue Dec 16, 2022 · 3 comments
Closed

sortperm algorithm #124

henry2004y opened this issue Dec 16, 2022 · 3 comments

Comments

@henry2004y
Copy link
Owner

henry2004y commented Dec 16, 2022

While comparing the metadata loading performance between Julia and Python, I noticed that a large portion of the time is spent in sortperm in Julia and argsort in Numpy. As of Julia 1.8, the default sorting algorithm, unstable QuickSort, is not so fast: using MergeSort has higher memory usage but is faster. argsort in Numpy is implemented in OpenBlas/MKL/Blas.

See JuliaCollections/SortingAlgorithms.jl#33, JuliaLang/julia#939, JuliaLang/julia#47587

The default sorting algorithm is going to change in Julia 1.9. For now, maybe we first use MergeSort and then compare with the new sorting algorithm in Julia 1.9 once it's released.

henry2004y added a commit that referenced this issue Dec 16, 2022
@henry2004y
Copy link
Owner Author

henry2004y commented Dec 18, 2022

It turns out that Numpy uses multithreading by default, whereas Julia uses 1 thread by default. There is no multithreaded Base.sortperm right now, but it can be supported via external libraries, e.g. ThreadsX.jl. This also explains why for small number of cells we have better performance in Julia, but for large number of cells we have better performance in Numpy.

MergeSort:

Number of Cells Serial Base [ms] 1 thread [ms] 2 threads [ms] 4 threads [ms]
6,300 0.24 0.29 0.22 0.18
51,200 1.6 1.9 1.2 0.78
3,966,580 238 283 173 140
4,612,500 231 280 171 125

QuickSort:

Number of Cells Serial Base [ms] 1 thread [ms] 2 threads [ms] 4 threads [ms]
6,300 0.28 0.30 0.30 0.30
51,200 1.9 2.6 1.6 0.98
3,966,580 335 422 244 150
4,612,500 303 442 250 169

Currently with both Julia 1.8 and 1.9, using ThreadsX add ~0.4 s to FFTX. Further tests are needed for the new versions.

@henry2004y
Copy link
Owner Author

henry2004y commented Feb 11, 2023

bulk1.0001298.vlsv from EGI (length(cellid) = 3966580):

julia> @benchmark cellindex = sortperm($cellid, alg=QuickSort)
BenchmarkTools.Trial: 17 samples with 1 evaluation.
 Range (min  max):  302.087 ms  323.626 ms  ┊ GC (min  max): 0.00%  0.05%
 Time  (median):     308.169 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   309.514 ms ±   5.991 ms  ┊ GC (mean ± σ):  0.14% ± 0.26%

  ▁   ▁ ▁ ▁ ▁  ▁▁▁ █   ▁  █ ▁  ▁                              █  
  █▁▁▁█▁█▁█▁█▁▁███▁█▁▁▁█▁▁█▁█▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  302 ms           Histogram: frequency by time          324 ms <

 Memory estimate: 30.26 MiB, allocs estimate: 2.
julia> @benchmark cellindex = sortperm($cellid)
BenchmarkTools.Trial: 19 samples with 1 evaluation.
 Range (min  max):  260.656 ms  272.919 ms  ┊ GC (min  max): 0.98%  0.10%
 Time  (median):     268.170 ms               ┊ GC (median):    0.12%
 Time  (mean ± σ):   267.057 ms ±   4.192 ms  ┊ GC (mean ± σ):  0.51% ± 0.55%

  █ ▁       ▁▁ ▁  ▁        ▁   ▁       ▁     ▁  █▁  ▁ ▁ ▁▁    ▁  
  █▁█▁▁▁▁▁▁▁██▁█▁▁█▁▁▁▁▁▁▁▁█▁▁▁█▁▁▁▁▁▁▁█▁▁▁▁▁█▁▁██▁▁█▁█▁██▁▁▁▁█ ▁
  261 ms           Histogram: frequency by time          273 ms <

 Memory estimate: 60.53 MiB, allocs estimate: 4.
julia> @benchmark cellindex = sortperm($cellid, alg=MergeSort)
BenchmarkTools.Trial: 21 samples with 1 evaluation.
 Range (min  max):  237.798 ms  246.445 ms  ┊ GC (min  max): 1.19%  0.00%
 Time  (median):     242.145 ms               ┊ GC (median):    0.13%
 Time  (mean ± σ):   241.943 ms ±   2.068 ms  ┊ GC (mean ± σ):  0.42% ± 0.49%

                                ▃        █ ▃                     
  ▇▁▁▁▁▁▁▁▁▇▁▇▁▁▇▇▇▁▁▇▁▁▁▁▇▇▁▁▁▁█▁▁▁▇▇▁▁▁█▇█▁▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇ ▁
  238 ms           Histogram: frequency by time          246 ms <

 Memory estimate: 45.39 MiB, allocs estimate: 4.

Merge sort is still the winner here for serial sorting.

@henry2004y
Copy link
Owner Author

Some new attempts that might be useful: https://github.com/LSchwerdt/SimultaneousSortperm.jl

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant