This repository has been archived by the owner on Mar 12, 2021. It is now read-only.
Commit
646: Improve mapreduce performance r=maleadt a=wongalvis14

~More than 3-fold improvement over the latest implementation~

Benchmarking function from #611

First stage: Using the number of "max parallel threads a single block can hold" as the number of blocks, perform reduction with serial iteration if needed

Second stage: Reduction in a single block, no serial iteration

This approach aims to strike an optimal balance between workload of each thread, kernel launch overhead and parallel resource exhaustion.

```
New impl:
julia> @benchmark pi_mc_cu(10000000)
BenchmarkTools.Trial:
  memory estimate:  16.98 KiB
  allocs estimate:  468
  --------------
  minimum time:     2.520 ms (0.00% GC)
  median time:      2.536 ms (0.00% GC)
  mean time:        2.584 ms (0.64% GC)
  maximum time:     15.600 ms (50.62% GC)
  --------------
  samples:          1930
  evals/sample:     1

Old recursion impl:
julia> @benchmark pi_mc_cu(10000000)
BenchmarkTools.Trial:
  memory estimate:  17.05 KiB
  allocs estimate:  472
  --------------
  minimum time:     4.059 ms (0.00% GC)
  median time:      4.076 ms (0.00% GC)
  mean time:        4.130 ms (0.64% GC)
  maximum time:     23.199 ms (63.12% GC)
  --------------
  samples:          1209
  evals/sample:     1

Latest serial impl:
BenchmarkTools.Trial:
  memory estimate:  7.81 KiB
  allocs estimate:  242
  --------------
  minimum time:     8.544 ms (0.00% GC)
  median time:      8.579 ms (0.00% GC)
  mean time:        8.622 ms (0.27% GC)
  maximum time:     26.172 ms (41.80% GC)
  --------------
  samples:          580
  evals/sample:     1
```

Co-authored-by: wongalvis14 <wongalvis14@gmail.com>
Co-authored-by: Tim Besard <tim.besard@gmail.com>
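The two-stage scheme described in the commit message can be illustrated with a CPU-only simulation. This is a rough sketch, not the code from this commit: `block_reduce`, `two_stage_reduce`, and the `threads_per_block` keyword are hypothetical names, the "threads" and "blocks" are plain loops, and the operator is assumed to be associative and commutative.

```julia
# CPU-only simulation of a two-stage reduction; no GPU is needed.
# All names here are illustrative and do not come from the commit.

# Tree reduction inside one simulated block: each step halves the active range.
# Assumes `op` is associative and commutative (operands get reordered).
function block_reduce(op, vals::Vector)
    buf = copy(vals)
    n = length(buf)
    while n > 1
        half = cld(n, 2)
        for t in 1:(n - half)            # each simulated thread combines one pair
            buf[t] = op(buf[t], buf[t + half])
        end
        n = half
    end
    return buf[1]
end

function two_stage_reduce(op, neutral, data::Vector; threads_per_block::Int = 256)
    z = convert(eltype(data), neutral)
    # Stage 1: launch as many blocks as one block has threads, so that
    # stage 2 can hold exactly one partial result per thread.
    nblocks = threads_per_block
    total_threads = nblocks * threads_per_block
    partials = fill(z, nblocks)
    for b in 1:nblocks
        thread_vals = fill(z, threads_per_block)
        for t in 1:threads_per_block
            # Serial iteration: each thread folds a strided slice of the input
            # when there are more elements than threads.
            i = (b - 1) * threads_per_block + t
            acc = z
            while i <= length(data)
                acc = op(acc, data[i])
                i += total_threads
            end
            thread_vals[t] = acc
        end
        partials[b] = block_reduce(op, thread_vals)
    end
    # Stage 2: a single block reduces the partials, with no serial iteration.
    return block_reduce(op, partials)
end

# Usage: 1000 elements over 8 * 8 = 64 simulated threads.
result = two_stage_reduce(+, 0, collect(1:1000); threads_per_block = 8)
println(result == sum(1:1000))
```

Fixing the stage-one block count at the per-block thread limit means the second stage never needs more than one block, so the whole reduction takes two launches regardless of input size. Larger inputs only add serial iterations per thread.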