Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split compilation of dense kernels #1375

Closed
wants to merge 2 commits into from
Closed

Split compilation of dense kernels #1375

wants to merge 2 commits into from

Conversation

upsj
Copy link
Member

@upsj upsj commented Jul 30, 2023

The compilation time, especially for DPC++, is dominated by dense_kernels.cpp, more precisely the reduction kernels with their large number of combinations. By splitting them up, we can improve the parallel efficiency of the builds significantly.

I think these changes are also part of #972

@upsj upsj added the 1:ST:ready-for-review This PR is ready for review label Jul 30, 2023
@upsj upsj requested a review from a team July 30, 2023 13:42
@upsj upsj self-assigned this Jul 30, 2023
@ginkgo-bot ginkgo-bot added reg:build This is related to the build system. mod:cuda This is related to the CUDA module. type:matrix-format This is related to the Matrix formats mod:hip This is related to the HIP module. labels Jul 30, 2023
@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 14 Code Smells

59.6% 59.6% Coverage
0.0% 0.0% Duplication

@codecov
Copy link

codecov bot commented Jul 31, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.01% 🎉

Comparison is base (945a4d8) 91.18% compared to head (331a70d) 91.19%.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1375      +/-   ##
===========================================
+ Coverage    91.18%   91.19%   +0.01%     
===========================================
  Files          600      608       +8     
  Lines        50695    50693       -2     
===========================================
+ Hits         46228    46232       +4     
+ Misses        4467     4461       -6     
Files Changed Coverage Δ
common/unified/matrix/dense_kernels.cpp 100.00% <ø> (ø)
common/unified/matrix/dense_kernels_conv.cpp 100.00% <100.00%> (ø)
common/unified/matrix/dense_kernels_conv_ell.cpp 100.00% <100.00%> (ø)
common/unified/matrix/dense_kernels_conv_sellp.cpp 100.00% <100.00%> (ø)
common/unified/matrix/dense_kernels_dot.cpp 100.00% <100.00%> (ø)
common/unified/matrix/dense_kernels_dot_conj.cpp 100.00% <100.00%> (ø)
common/unified/matrix/dense_kernels_norm1.cpp 100.00% <100.00%> (ø)
common/unified/matrix/dense_kernels_norm2.cpp 100.00% <100.00%> (ø)
common/unified/matrix/dense_kernels_norm2_sq.cpp 100.00% <100.00%> (ø)

... and 2 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@pratikvn
Copy link
Member

Can we somehow make this automatic and not manually split this into files ? I think this unnecessarily complicates our file structure. How about something like a .tpp file, which marked with macros and using CMake to automatically generate the separate compilation units at build time to enable parallelism ?

@upsj
Copy link
Member Author

upsj commented Jul 31, 2023

@pratikvn I don't believe that's possible in the way we do it, e.g. for Jacobi. We would most likely need to do this on the individual sub-configuration instantiation like we do with jacobi_kernels.cu, which is not really possible with the current file structure of the reduction kernels. At least it would mean a separation of the kernel and its configurations into two different files.

Co-authored-by: Yuhsiang M. Tsai <yhmtsai@gmail.com>
@upsj
Copy link
Member Author

upsj commented Jul 31, 2023

@pratikvn or maybe I am misunderstanding your suggestion, can you give an example?

@yhmtsai
Copy link
Member

yhmtsai commented Jul 31, 2023

I think some script to split all functions with the GKO_INSTANTIATION function into different files.
for each GKO_INSTANTIATION function,

#include "*.cpp" // to get all necessary compenents
namespace list{
GKO_INSTANTIATION(function name);
}

each instantiation creates a file and compiles those files.
note. it may not work if there's a function without template or inline or anonymous namespace

@pratikvn
Copy link
Member

Also, how much of an effect does this splitting have ? If possible, I would like to avoid this fragmentation. I think it makes finding functions quite hard and also not really scalable as each reduction operation now has its own file.

@upsj
Copy link
Member Author

upsj commented Jul 31, 2023

Ah thanks, I like the idea. We only have a single file with enough compile time where this is necessary though, maybe we can pick this idea up again if we have more such files at some point?

@upsj
Copy link
Member Author

upsj commented Jul 31, 2023

@pratikvn Compiling the full file takes 5m53.388s, compiling the individual files in parallel takes 1m1.452s. In my tests, the total build was stuck at least 2-3 minutes at this single file.

Copy link
Member

@pratikvn pratikvn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I would like to avoid this type of fragmentation. But given that the benefit seems to be large, as a temporary measure, it looks acceptable to be. But I think we should try to do this automatically if possible, so that it is scalable, and the build system can handle the splitting into different instantiation units. See suggestion from @yhmtsai above.

@upsj
Copy link
Member Author

upsj commented Jul 31, 2023

@pratikvn this is not meant as a temporary measure. We do the same thing already for Jacobi and ParILUT kernels. If we want to split things into definition and instantiation, we don't really need to use automation (and I don't see any advantage of that), but it still leads to the same number of files, with the small disadvantage that IDEs will in my experience have a harder time displaying information about the function templates without instantiations available.

@upsj
Copy link
Member Author

upsj commented Jul 31, 2023

small note: We may need to do something similar for csr_kernels (at least in HIP), because I currently have a debug build taking 56 minutes and counting (it finished) on that file

@yhmtsai
Copy link
Member

yhmtsai commented Jul 31, 2023

from the splitting definition and instantiation, it generates more files than the origin.
The automation applies to the develop's version, not this branch.
One Cpp has several def and $n$ instantiation (only count macro)
->
One Cpp + $n$ file for different function instantiation.
For me, the 5 min does not sound like a problem.

@upsj
Copy link
Member Author

upsj commented Jul 31, 2023

After spending way too much time on this, I think I can conclude that making this work semi-automatically is not worth the effort. The only alternative is what Mike suggested, e.g. putting all template instantiations into separate files and keeping the code in one place. What do you think about that?

@upsj
Copy link
Member Author

upsj commented Aug 1, 2023

superseded by #1378

@upsj upsj closed this Aug 1, 2023
@upsj upsj deleted the split_large_files branch August 26, 2024 14:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
1:ST:ready-for-review This PR is ready for review 1:ST:run-full-test mod:cuda This is related to the CUDA module. mod:hip This is related to the HIP module. reg:build This is related to the build system. type:matrix-format This is related to the Matrix formats
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants