Split compilation of dense kernels #1375
Conversation
Kudos, SonarCloud Quality Gate passed!
Codecov Report
Patch coverage:
Additional details and impacted files:

@@            Coverage Diff            @@
##           develop    #1375    +/-  ##
===========================================
+ Coverage    91.18%   91.19%   +0.01%
===========================================
  Files          600      608       +8
  Lines        50695    50693       -2
===========================================
+ Hits         46228    46232       +4
+ Misses        4467     4461       -6
===========================================

☔ View full report in Codecov by Sentry.
Can we somehow make this automatic instead of manually splitting this into files? I think this unnecessarily complicates our file structure. How about something like a .tpp file marked with macros, using CMake to automatically generate the separate compilation units at build time to enable parallelism?
@pratikvn I don't believe that's possible in the way we do it, e.g. for Jacobi. We would most likely need to do this on the individual sub-configuration instantiations like we do with jacobi_kernels.cu, which is not really possible with the current file structure of the reduction kernels. At least it would mean separating the kernel and its configurations into two different files.
Co-authored-by: Yuhsiang M. Tsai <yhmtsai@gmail.com>
@pratikvn or maybe I am misunderstanding your suggestion; can you give an example?
I think some script could split all functions marked with the GKO_INSTANTIATION macro into different files: each instantiation creates a file, and those files are then compiled separately.
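For illustration, a rough sketch of what such generated translation units could look like (the file names, the compute_dot kernel, and its signature here are hypothetical, not Ginkgo's actual code):

```cpp
// dense_reduction.tpp -- shared template definition (hypothetical file),
// included by every generated translation unit
#include <cstddef>

template <typename ValueType>
void compute_dot(const ValueType* x, const ValueType* y, std::size_t n,
                 ValueType* result)
{
    ValueType sum{};
    for (std::size_t i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    *result = sum;
}

// dense_reduction_double.cpp -- a script or CMake would generate one such
// file per instantiation, so the build can compile them in parallel
#include "dense_reduction.tpp"

template void compute_dot<double>(const double*, const double*, std::size_t,
                                  double*);
```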
Also, how much of an effect does this splitting have? If possible, I would like to avoid this fragmentation. I think it makes finding functions quite hard, and it is also not really scalable, as each reduction operation now has its own file.
Ah thanks, I like the idea. We only have a single file whose compile time is long enough for this to be necessary, though; maybe we can pick this idea up again if we have more such files at some point?
@pratikvn Compiling the full file takes 5m53.388s; compiling the individual files in parallel takes 1m1.452s. In my tests, the total build was stuck for at least 2-3 minutes on this single file.
In general I would like to avoid this type of fragmentation. But given that the benefit seems to be large, it looks acceptable to me as a temporary measure. Still, I think we should try to do this automatically if possible, so that it is scalable and the build system can handle the splitting into different instantiation units. See the suggestion from @yhmtsai above.
@pratikvn this is not meant as a temporary measure. We already do the same thing for the Jacobi and ParILUT kernels. If we want to split things into definition and instantiation, we don't really need automation (and I don't see any advantage in it), but it still leads to the same number of files, with the small disadvantage that, in my experience, IDEs have a harder time displaying information about function templates when no instantiations are available.
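For reference, a definition/instantiation split in plain C++ usually looks like the following minimal sketch (generic names, not Ginkgo's actual macros or kernels). The extern template declarations keep other translation units from re-instantiating the template, which is also why an IDE may have no definition in view:

```cpp
// reduce.hpp -- declaration only; the extern template lines tell
// including files not to instantiate the template themselves
#include <cstddef>

template <typename ValueType>
void reduce(const ValueType* in, std::size_t n, ValueType* out);

extern template void reduce<float>(const float*, std::size_t, float*);
extern template void reduce<double>(const double*, std::size_t, double*);

// reduce.tpp -- holds the template definition, kept out of the header.
// reduce_float.cpp / reduce_double.cpp -- one explicit instantiation per
// translation unit, each doing:
//
//     #include "reduce.tpp"
//     template void reduce<float>(const float*, std::size_t, float*);
```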
Small note: we may need to do something similar for csr_kernels (at least in HIP), because I currently have a debug build taking 56 minutes.
Splitting definition and instantiation generates more files than the original layout does.
After spending way too much time on this, I think I can conclude that making this work semi-automatically is not worth the effort. The only alternative is what Mike suggested, i.e. putting all template instantiations into separate files and keeping the code in one place. What do you think about that?
superseded by #1378 |
The compilation time, especially for DPC++, is dominated by dense_kernels.cpp, more precisely the reduction kernels with their large number of combinations. By splitting them up, we can improve the parallel efficiency of the builds significantly. I think these changes are also part of #972.
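The resulting layout follows a simple per-kernel pattern. A minimal sketch of one such file, with hypothetical names standing in for the shared reduction infrastructure the real files use:

```cpp
// compute_norm2_kernels.dp.cpp -- one reduction operation per file, so
// several such files can be compiled in parallel
#include <cmath>
#include <cstddef>

template <typename ValueType>
void compute_norm2(const ValueType* x, std::size_t n, ValueType* result)
{
    ValueType sum{};
    for (std::size_t i = 0; i < n; ++i) {
        sum += x[i] * x[i];
    }
    *result = std::sqrt(sum);
}

// Explicit instantiations covered by this translation unit.
template void compute_norm2<float>(const float*, std::size_t, float*);
template void compute_norm2<double>(const double*, std::size_t, double*);
```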