Internal compiler error in KokkosBatched::Experimental::TeamGemm #349
@huttered40 I edited your post just to avoid undesired Markdown formatting in the compiler output. In the future, please enclose verbatim text in triple backticks. Thanks!
@huttered40 could you post more info about your configuration and build? I was able to build kokkos-kernels on the Pascal queue (rhel7G) on White. Here is my setup:
Tested with VOTD develop branch of kokkos and kokkos-kernels:
kokkos SHA: kokkos/kokkos@b18689e
kokkos-kernels SHA: b26f446
Modules:
Configuration:
Interactive node session:
Build library then tests:
Interactive node session:
Module:
Kokkos:
Kokkos-kernels:
Relevant part of Makefile:
CXXFLAGS = -O3 --expt-extended-lambda --expt-relaxed-constexpr# -std=c++14
KOKKOS_CXX_STANDARD=c++14 # Currently only works when using the develop branch of kokkos
LINK = ${CXX}
LDFLAGS =
EXE = test.cuda
KOKKOS_DEVICES = "Cuda"
KOKKOS_ARCH = "Power8,Pascal60" # For rhel-7G queue on White
KOKKOS_CUDA_OPTIONS += "enable_lambda"
My application is calling
The error again is:
@huttered40 if you change the way you generate your makefile as below, it should work (it worked for me); use the
@huttered40 oops, I didn't properly set
I don't think that we can handle this compiler error. The code is header-only, so it is compiled within your code. It is a very unlucky case, but I don't think that we can give much help with this compiler error.
We probably have to grind this down to a reproducer for Nvidia, since C++14 should be supported...
@ndellingwood Does Kokkos officially support C++14?
@kyungjoo-kim Good point; there is no nightly testing with C++14 enabled, so we shouldn't claim it is officially supported. I put in PR kokkos/kokkos#1913 so we can enable C++14 through generated makefiles and begin testing.
I am reopening this. We have multiple requests to support C++14. It doesn't have to be every version of every compiler with C++14 support, as this is evolving. However, we do have to support GCC 7.2: Trilinos is moving the PR testing to GCC 7.2 very soon.
Adding @crtrott; he said he'd also help look into this.
Apparently fixed in GCC 7.3; at least the last 4 entries of the call stack are the same:
Debian bug:
KokkosKernels:
What's the chance of a workaround for GCC 7.2?
Looking into it. My guess is that the chances are pretty good that we can work around this. The compiler gets tripped up in something related to figuring out whether something is an rvalue or so. So adding some parentheses or explicit casts, or using a temporary instead of computing the value inline, may avoid the trigger.
OK, I found two options for this original code. The offending thing is capturing idx_j by reference in the innermost layer, where part of idx_j comes from the argument to another inlined lambda.
  Kokkos::parallel_for(Kokkos::TeamThreadRange(team,blockDim_j), [&] (const int j) {
    const int idx_j = offset_j+j;
    Kokkos::parallel_for(Kokkos::ThreadVectorRange(team,blockDim_i), [&] (const int i) {
      const int idx_i = offset_i+i;
      A_scr(i,j) = idx_i<A.extent_int(0) && idx_j<A.extent_int(1) ? A(idx_i,idx_j) : ATV::zero();
    });
  });
Option 1: capture by value in the innermost lambda:
  Kokkos::parallel_for(Kokkos::TeamThreadRange(team,blockDim_j), [&] (const int j) {
    const int idx_j = offset_j+j;
    Kokkos::parallel_for(Kokkos::ThreadVectorRange(team,blockDim_i), [=] (const int i) {
      const int idx_i = offset_i+i;
      A_scr(i,j) = idx_i<A.extent_int(0) && idx_j<A.extent_int(1) ? A(idx_i,idx_j) : ATV::zero();
    });
  });
Option 2: move the offset calculation into the innermost loop:
  Kokkos::parallel_for(Kokkos::TeamThreadRange(team,blockDim_j), [&] (const int j) {
    Kokkos::parallel_for(Kokkos::ThreadVectorRange(team,blockDim_i), [&] (const int i) {
      const int idx_j = offset_j+j;
      const int idx_i = offset_i+i;
      A_scr(i,j) = idx_i<A.extent_int(0) && idx_j<A.extent_int(1) ? A(idx_i,idx_j) : ATV::zero();
    });
  });
My guess is that the second option is better. In any case, we can ifdef this on the C++ standard and GCC version.
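To check that both workarounds fill the scratch block exactly like the original loop, here is a minimal plain C++ sketch that runs without Kokkos. It rests on stated assumptions: `for_range` is a hypothetical serial stand-in for `Kokkos::parallel_for` over `TeamThreadRange`/`ThreadVectorRange`, `View2D` is a hypothetical pointer-backed matrix that loosely mimics the shallow-copy semantics of a `Kokkos::View` (so the `[=]` lambda in Option 1 can still write through its copy), and `0.0` stands in for `ATV::zero()`.

```cpp
// Hypothetical serial stand-in for Kokkos::parallel_for over a
// TeamThreadRange / ThreadVectorRange: calls f(k) for k = 0, ..., n-1.
template <class F>
void for_range(int n, F f) {
  for (int k = 0; k < n; ++k) f(k);
}

// Hypothetical row-major 2-D view. Copies are shallow (only the pointer is
// copied) and operator() is const, loosely mimicking Kokkos::View.
struct View2D {
  double* data;
  int rows;
  int cols;
  double& operator()(int i, int j) const { return data[i * cols + j]; }
  int extent_int(int dim) const { return dim == 0 ? rows : cols; }
};

// Option 1: the innermost lambda captures by value ([=]), so it holds its
// own copy of idx_j instead of a reference to the outer lambda's local.
void pack_option1(View2D A, View2D A_scr, int offset_i, int offset_j) {
  for_range(A_scr.cols, [&](const int j) {
    const int idx_j = offset_j + j;
    for_range(A_scr.rows, [=](const int i) {
      const int idx_i = offset_i + i;
      A_scr(i, j) = idx_i < A.extent_int(0) && idx_j < A.extent_int(1) ? A(idx_i, idx_j) : 0.0;
    });
  });
}

// Option 2: idx_j is computed inside the innermost lambda, so no local
// const int from the outer lambda is captured by reference.
void pack_option2(View2D A, View2D A_scr, int offset_i, int offset_j) {
  for_range(A_scr.cols, [&](const int j) {
    for_range(A_scr.rows, [&](const int i) {
      const int idx_j = offset_j + j;
      const int idx_i = offset_i + i;
      A_scr(i, j) = idx_i < A.extent_int(0) && idx_j < A.extent_int(1) ? A(idx_i, idx_j) : 0.0;
    });
  });
}
```

The view-like type matters for Option 1: with a deep-copying container, `[=]` would copy the whole matrix into every inner lambda and write into the copy, whereas a shallow view copies only a pointer and two integers.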
By the way, this applies to all similar places in the code: 91, 118, 145, ...
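The idea of guarding the workaround on the C++ standard and GCC version could look something like this sketch. The macro name `EXAMPLE_GCC_LAMBDA_CAPTURE_WORKAROUND` and the helper `for_range` are hypothetical, and the version test (GCC 6.x or 7.x compiling C++14 or later) is an assumption drawn from the reports in this thread, not the guard Kokkos Kernels actually uses.

```cpp
// Hypothetical guard: enable the workaround only for GCC 6.x/7.x compiling
// C++14 or later (clang and the Intel compiler also define __GNUC__).
#if defined(__GNUC__) && !defined(__clang__) && !defined(__INTEL_COMPILER) && \
    (__GNUC__ == 6 || __GNUC__ == 7) && __cplusplus >= 201402L
#define EXAMPLE_GCC_LAMBDA_CAPTURE_WORKAROUND 1
#else
#define EXAMPLE_GCC_LAMBDA_CAPTURE_WORKAROUND 0
#endif

// Hypothetical serial stand-in for a nested Kokkos::parallel_for.
template <class F>
void for_range(int n, F f) {
  for (int k = 0; k < n; ++k) f(k);
}

// Returns the sum over j < n_outer and i < n_inner of (offset + j) * i.
inline long long sum_offsets(int n_outer, int n_inner, int offset) {
  long long total = 0;
  for_range(n_outer, [&](const int j) {
#if EXAMPLE_GCC_LAMBDA_CAPTURE_WORKAROUND
    // Workaround path (Option 2): compute idx_j inside the innermost lambda.
    for_range(n_inner, [&](const int i) {
      const int idx_j = offset + j;
      total += static_cast<long long>(idx_j) * i;
    });
#else
    // Original path: idx_j is captured by reference in the innermost lambda.
    const int idx_j = offset + j;
    for_range(n_inner, [&](const int i) {
      total += static_cast<long long>(idx_j) * i;
    });
#endif
  });
  return total;
}
```

Both branches compute the same value, so the guard only changes which form the compiler sees.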
@crtrott Is the original code still legal C++ (nesting two lambdas and capturing values by reference)? I have many places that follow the same pattern.
This is legal C++ (depending on what you do, it might not be legal Kokkos though: remember the code must be valid when capturing by value, but capturing by reference may give better performance).
I also prefer the second option. Anyway, you are a magician. How did you know that the compiler problem is due to capturing values by reference?
If you look at the call stack, the function names indicate that it tries to optimize away expressions (fold), tries to figure out whether something is an rvalue, and then crashes when it tries to optimize some reference access inside parentheses. This is all just educated guessing, but it looks like I guessed right ;-).
Ah, I am working on the proper fix and will issue a pull request.
Thanks.
Found a couple more places that could be resolved by making temporaries non-const...
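Making temporaries non-const can be sketched in plain C++ as follows. This is a hypothetical example (`for_range` and `count_in_bounds` are illustrative names, not Kokkos Kernels code); the only point it shows is that `idx_j` and `idx_i` are declared without `const` while the captures stay by reference, since the later reports describe the trigger as a const integer captured by reference and used inside parentheses.

```cpp
// Hypothetical serial stand-in for a nested Kokkos::parallel_for.
template <class F>
void for_range(int n, F f) {
  for (int k = 0; k < n; ++k) f(k);
}

// Counts the (i, j) pairs of a block_i x block_j tile whose shifted indices
// (offset_i + i, offset_j + j) lie inside an extent_i x extent_j matrix.
inline int count_in_bounds(int block_i, int block_j, int offset_i, int offset_j,
                           int extent_i, int extent_j) {
  int count = 0;
  for_range(block_j, [&](const int j) {
    int idx_j = offset_j + j;  // non-const temporary (the workaround)
    for_range(block_i, [&](const int i) {
      int idx_i = offset_i + i;  // non-const as well
      // idx_j is still captured by reference and used inside parentheses.
      if ((idx_i < extent_i) && (idx_j < extent_j)) ++count;
    });
  });
  return count;
}
```

This keeps the by-reference captures that Kokkos permits for performance, at the cost of dropping `const` on a few locals.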
The same test config passed with the suggested change. I'll test more completely and then put in the PR with the change and updated scripts to make sure gcc/7.4 is also tested.
Of note, it was part of the Trilinos config, and I used
Resurfaced issue #349; update to macro for CXX14 workaround issues.
Added gcc/7.4 to testing on the White testbed.
Added support for the kokkos-dev-2 system for testing.
Extra fixes caught while spot-check testing on kokkos-dev-2:
* Shadow warning of type in KokkosSparse_spmv_struct_impl.hpp
* Missing test for Threads backend in KokkosGraph_color_d2.cpp
Capture by reference is working on my workstation, but fails on Summit with:
arborx/src/details/ArborX_DetailsTreeTraversal.hpp:102:7: internal compiler error: in maybe_undo_parenthesized_ref, at cp/semantics.c:1739
if (i < real_predicates_in_team) {
This is reminiscent of kokkos/kokkos-kernels#349.
The GCC 7 compiler on Summit was failing to compile all of the code. The problematic parts involved lambda functions. I think the compiler has a bug that prevents it from resolving the type of variables captured by reference. The problem seems similar to this bug reported to Kokkos: kokkos/kokkos-kernels#349. Solved the problem by replacing the lambdas with either a named method or inline code. I suspect the problem arose (without anyone's knowledge) with MR !2331, which moved VTK-m to C++14. This GCC error seems to happen with C++14 but not C++11. (The features of lambdas changed between these two versions of C++.)
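Replacing an inner lambda with a named method could look like the following sketch. `AccumulateRow`, `row_sums`, and `for_range` are hypothetical names, not VTK-m code; the functor stores `idx_j` as an ordinary data member, so no lambda capture is involved, and the outer loop is written inline as a plain `for`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical serial stand-in for Kokkos::parallel_for.
template <class F>
void for_range(int n, F f) {
  for (int k = 0; k < n; ++k) f(k);
}

// Named functor used in place of the innermost lambda. idx_j is a plain data
// member, so nothing is captured by reference.
struct AccumulateRow {
  long long* out;
  int idx_j;
  void operator()(const int i) const { *out += static_cast<long long>(idx_j) * i; }
};

// out[j] = sum over i < n_inner of (offset + j) * i; the outer loop is inline code.
inline std::vector<long long> row_sums(int n_outer, int n_inner, int offset) {
  std::vector<long long> out(static_cast<std::size_t>(n_outer), 0LL);
  for (int j = 0; j < n_outer; ++j) {
    const int idx_j = offset + j;
    for_range(n_inner, AccumulateRow{&out[j], idx_j});
  }
  return out;
}
```

A named functor is more verbose than a lambda, but its members are spelled out explicitly, which sidesteps the capture analysis where the compiler bug lives.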
…#14004) The current compute kernel fails to compile with gcc6/7 and c++14/17 due to a known GCC bug. It is triggered when a const integer is captured by reference in a lambda function and is parenthesized in that lambda's code. Capturing the const ints by value fixes this issue. See also: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83204 and kokkos/kokkos-kernels#349 Lead-authored-by: Jin Shang <shangjin1997@gmail.com> Co-authored-by: Antoine Pitrou <pitrou@free.fr> Co-authored-by: jinshang <jinshang@tencent.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
I am getting an internal compiler error when building code that calls KokkosBatched::Experimental::TeamGemm on the White machine (rhel 7G queue). The GCC version is 7.2.0; I tried 6.4.0 as well, with the same issue. The error does not occur on Bowman with GCC 4.9.3. Most of the stack trace is posted below: