Internal compiler error in KokkosBatched::Experimental::TeamGemm #349

Closed
huttered40 opened this issue Nov 26, 2018 · 32 comments

Comments

@huttered40

huttered40 commented Nov 26, 2018

I am getting an internal compiler error when compiling code that calls KokkosBatched::Experimental::TeamGemm on the White machine (rhel7G queue). The GCC compiler version is 7.2.0; I tried 6.4.0 as well, and both fail with the same error. The error does not occur on Bowman with GCC 4.9.3. Most of the stack trace is posted below:

../kokkos-kernels/src/batched/KokkosBatched_Gemm_Team_Internal.hpp:137:27: internal compiler error: in maybe_undo_parenthesized_ref, at cp/semantics.c:1705
             const int i = (ij/nq)*mb;
               ~~~~~~~~~~~~^~~~~~
0x102ebfe3 maybe_undo_parenthesized_ref(tree_node*)
	../.././gcc/cp/semantics.c:1704
0x1034eacf cp_fold
	../.././gcc/cp/cp-gimplify.c:2141
0x1034f8b7 cp_fold_maybe_rvalue
	../.././gcc/cp/cp-gimplify.c:2003
0x1034e5b7 cp_fold
	../.././gcc/cp/cp-gimplify.c:2110
0x1034f8b7 cp_fold_maybe_rvalue
	../.././gcc/cp/cp-gimplify.c:2003
0x1034e27f cp_fold_rvalue
	../.././gcc/cp/cp-gimplify.c:2024
0x1034e27f cp_fold
	../.././gcc/cp/cp-gimplify.c:2242
0x102ba7db cp_build_binary_op(unsigned int, tree_code, tree_node*, tree_node*, int)
	../.././gcc/cp/typeck.c:5243
0x101b430f build_new_op_1
	../.././gcc/cp/call.c:5982
0x101b4eff build_new_op(unsigned int, tree_code, int, tree_node*, tree_node*, tree_node*, tree_node**, int)
	../.././gcc/cp/call.c:6026
0x102af247 build_x_binary_op(unsigned int, tree_code, tree_node*, tree_code, tree_node*, tree_code, tree_node**, int)
	../.././gcc/cp/typeck.c:3928
0x10206f33 tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool)
	../.././gcc/cp/pt.c:16937
0x101f5edf tsubst_expr(tree_node*, tree_node*, int, tree_node*, bool)
	../.././gcc/cp/pt.c:16550
0x101f79c7 tsubst_expr(tree_node*, tree_node*, int, tree_node*, bool)
	../.././gcc/cp/pt.c:15786
0x101f79c7 tsubst_init
	../.././gcc/cp/pt.c:14483
0x101f6edf tsubst_expr(tree_node*, tree_node*, int, tree_node*, bool)
	../.././gcc/cp/pt.c:15907
0x101f489b tsubst_expr(tree_node*, tree_node*, int, tree_node*, bool)
	../.././gcc/cp/pt.c:15801
0x101f4b13 tsubst_expr(tree_node*, tree_node*, int, tree_node*, bool)
	../.././gcc/cp/pt.c:16027
0x101f489b tsubst_expr(tree_node*, tree_node*, int, tree_node*, bool)
	../.././gcc/cp/pt.c:15801
0x101f4b13 tsubst_expr(tree_node*, tree_node*, int, tree_node*, bool)
	../.././gcc/cp/pt.c:16027
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
@mhoemmen
Contributor

@huttered40 I edited your post just to avoid undesired Markdown formatting in the compiler output. In the future, please enclose verbatim text in triple backticks. Thanks!

@ndellingwood
Contributor

@huttered40 could you post more info about your configuration and build? I was able to build kokkos-kernels on the pascal queue (rhel7G) on White. Here is my setup:

Tested with VOTD develop branch of kokkos and kokkos-kernels:

kokkos SHA: kokkos/kokkos@b18689e

kokkos-kernels SHA: b26f446

Modules:
module load devpack/20180521/openmpi/2.1.2/gcc/7.2.0/cuda/9.2.88

Configuration:
I have kokkos and kokkos-kernels cloned to my $HOME directory.

KOKKOS_PATH=${HOME}/kokkos #path to kokkos source
KOKKOSKERNELS_SCALARS=double #the scalar types to instantiate =double,float...
KOKKOSKERNELS_LAYOUTS=LayoutLeft,LayoutRight  #the layout types to instantiate.
KOKKOSKERNELS_ORDINALS=int #ordinal types to instantiate
KOKKOSKERNELS_OFFSETS=int #offset types to instantiate
KOKKOSKERNELS_PATH=../.. #path to kokkos-kernels top directory.
CXX=${KOKKOS_PATH}/bin/nvcc_wrapper #icpc #
KOKKOSKERNELS_OPTIONS=eti-only #options for kokkoskernels  
KOKKOS_DEVICES="Cuda,Serial"
KOKKOS_ARCHS="Power8,Pascal60"
CXXFLAGS="-pedantic -O3 -g -Wshadow -Wsign-compare -Wtype-limits -Wuninitialized"

../../scripts/generate_makefile.bash --kokkoskernels-path=${KOKKOSKERNELS_PATH} --with-scalars=${KOKKOSKERNELS_SCALARS} --with-ordinals=${KOKKOSKERNELS_ORDINALS} --with-offsets=${KOKKOSKERNELS_OFFSETS} --kokkos-path=${KOKKOS_PATH} --with-devices=${KOKKOS_DEVICES} --arch=${KOKKOS_ARCHS} --compiler=${CXX} --with-options=${KOKKOSKERNELS_OPTIONS}  --cxxflags="${CXXFLAGS}"

Interactive node session:
bsub -Is -n 1 -q rhel7G bash

Build library then tests:
make install-lib -j16
cd unit_test
make -j

@huttered40
Author

Interactive node session:
bsub -Is -q rhel7G -n 32 bash

Module:
devpack/20180521/openmpi/3.1.0/gcc/7.2.0/cuda/9.2.88

Kokkos:
branch: develop
most recent commit hash: b18689e

Kokkos-kernels:
branch: develop
most recent commit hash: b26f446

Relevant part of Makefile:

CXXFLAGS = -O3 --expt-extended-lambda --expt-relaxed-constexpr# -std=c++14
KOKKOS_CXX_STANDARD=c++14               # Currently only works when using the develop branch of kokkos
LINK = ${CXX}
LDFLAGS =
EXE = test.cuda
KOKKOS_DEVICES = "Cuda"
KOKKOS_ARCH = "Power8,Pascal60" # For rhel-7G queue on White
KOKKOS_CUDA_OPTIONS += "enable_lambda"

My application is calling KokkosBatched::Experimental::TeamGemm<TransposeAType,TransposeBType,GemmAlgType>::invoke(...)

The error again is:
../kokkos-kernels/src/batched/KokkosBatched_Gemm_Team_Internal.hpp:136:27: internal compiler error: in maybe_undo_parenthesized_ref, at cp/semantics.c:1705 const int i = ij/nq*mb, j = ij%nq*nb;

@ndellingwood
Contributor

@huttered40 if you generate your makefile as shown below, it should work (it worked for me): use the --with-cuda-options argument to set enable_lambda (this takes care of --expt-extended-lambda) and set KOKKOS_CXXFLAGS="--expt-relaxed-constexpr".

KOKKOS_PATH=${HOME}/kokkos #path to kokkos source
KOKKOSKERNELS_SCALARS=double #the scalar types to instantiate =double,float...
KOKKOSKERNELS_LAYOUTS=LayoutLeft,LayoutRight  #the layout types to instantiate.
KOKKOSKERNELS_ORDINALS=int #ordinal types to instantiate
KOKKOSKERNELS_OFFSETS=int #offset types to instantiate
KOKKOSKERNELS_PATH=../.. #path to kokkos-kernels top directory.
CXX=${KOKKOS_PATH}/bin/nvcc_wrapper #icpc #
KOKKOS_CXX_STANDARD=c++14
KOKKOS_CXXFLAGS="--expt-relaxed-constexpr"
KOKKOSKERNELS_OPTIONS=eti-only #options for kokkoskernels  
KOKKOS_DEVICES="Cuda,Serial"
KOKKOS_ARCHS="Power8,Pascal60"
KOKKOS_CUDA_OPTION="enable_lambda" #"enable_lambda,force_uvm,rdc"
CXXFLAGS="-pedantic -O3 -g -Wshadow -Wsign-compare -Wtype-limits -Wuninitialized"

../../scripts/generate_makefile.bash --kokkoskernels-path=${KOKKOSKERNELS_PATH} --with-scalars=${KOKKOSKERNELS_SCALARS} --with-ordinals=${KOKKOSKERNELS_ORDINALS} --with-offsets=${KOKKOSKERNELS_OFFSETS} --kokkos-path=${KOKKOS_PATH} --with-devices=${KOKKOS_DEVICES} --arch=${KOKKOS_ARCHS} --compiler=${CXX} --with-options=${KOKKOSKERNELS_OPTIONS}  --cxxflags="${CXXFLAGS}" --with-cuda-options=${KOKKOS_CUDA_OPTION}

@ndellingwood
Contributor

@huttered40 oops, I didn't properly set KOKKOS_CXX_STANDARD=c++14. When I did that, I saw your failure. Cross-referencing your PR with the fix here: #350

@kyungjoo-kim
Contributor

I don't think that we can handle the compiler error. The code is header-only and is compiled within your code. It is a very unlucky case, but I don't think that we can give much help with this compiler error.

@ndellingwood
Contributor

Probably have to grind this down to a reproducer for Nvidia since c++14 should be supported...

@kyungjoo-kim
Contributor

@ndellingwood Does kokkos officially support c++14 ?

@ndellingwood
Contributor

@kyungjoo-kim good point, there isn't nightly testing with c++14 enabled so we shouldn't claim it is officially supported. I put in PR kokkos/kokkos#1913 so we can enable c++14 through generated makefiles and begin testing.

@srajama1
Contributor

srajama1 commented Dec 5, 2018

I am reopening this. We have multiple requests to support C++14. It doesn't have to be every version of every compiler with C++14 support as this is evolving. However, we do have to support gcc 7.2. Trilinos is moving the PR testing to gcc 7.2 very soon.

@ndellingwood
Contributor

Adding @crtrott, who said he'd also help look into this.

@crtrott
Member

crtrott commented Dec 10, 2018

Apparently fixed in GCC 7.3:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=882855

At least the last 4 entries of the call stack are the same:

Debian Bug:

0x102ebfe3 maybe_undo_parenthesized_ref(tree_node*)
	../.././gcc/cp/semantics.c:1704
0x1034eacf cp_fold
	../.././gcc/cp/cp-gimplify.c:2141
0x1034f8b7 cp_fold_maybe_rvalue
	../.././gcc/cp/cp-gimplify.c:2003
0x1034e5b7 cp_fold
	../.././gcc/cp/cp-gimplify.c:2110
0x1022385b store_init_value(tree_node*, tree_node*, vec<tree_node*, va_gc, vl_embed>**, int)
	../.././gcc/cp/typeck2.c:841

KokkosKernels:

0x102ebfe3 maybe_undo_parenthesized_ref(tree_node*)
	../.././gcc/cp/semantics.c:1704
0x1034eacf cp_fold
	../.././gcc/cp/cp-gimplify.c:2141
0x1034f8b7 cp_fold_maybe_rvalue
	../.././gcc/cp/cp-gimplify.c:2003
0x1034e5b7 cp_fold
	../.././gcc/cp/cp-gimplify.c:2110
0x102ba7db cp_build_binary_op(unsigned int, tree_code, tree_node*, tree_node*, int)
	../.././gcc/cp/typeck.c:5243

@nmhamster
Contributor

What's the chance of a workaround for GCC 7.2?

@crtrott
Member

crtrott commented Dec 10, 2018

Looking into it. My guess is that the chances of a workaround are pretty good. The compiler gets tripped up on something related to figuring out whether an expression is an rvalue. So adding some parentheses, using explicit casts, using a temporary instead of computing the value inline, etc. may avoid the trigger.

@crtrott
Member

crtrott commented Dec 10, 2018

OK, found two options for this original code. The offending thing is capturing idx_j by reference in the innermost lambda, where part of idx_j comes from the argument to another inlined lambda.

      Kokkos::parallel_for(Kokkos::TeamThreadRange(team,blockDim_j), [&] (const int j) {
        const int idx_j = offset_j+j;
        Kokkos::parallel_for(Kokkos::ThreadVectorRange(team,blockDim_i), [&] (const int i) {
          const int idx_i = offset_i+i;
          A_scr(i,j) = idx_i<A.extent_int(0) && idx_j<A.extent_int(1) ? A(idx_i,idx_j) : ATV::zero();
        });
      });

Option 1: Capture by value in the innermost lambda:

      Kokkos::parallel_for(Kokkos::TeamThreadRange(team,blockDim_j), [&] (const int j) {
        const int idx_j = offset_j+j;
        Kokkos::parallel_for(Kokkos::ThreadVectorRange(team,blockDim_i), [=] (const int i) {
          const int idx_i = offset_i+i;
          A_scr(i,j) = idx_i<A.extent_int(0) && idx_j<A.extent_int(1) ? A(idx_i,idx_j) : ATV::zero();
        });
      });

Option 2: Move the offset calculation into the innermost loop:

      Kokkos::parallel_for(Kokkos::TeamThreadRange(team,blockDim_j), [&] (const int j) {
        Kokkos::parallel_for(Kokkos::ThreadVectorRange(team,blockDim_i), [&] (const int i) {
          const int idx_j = offset_j+j;
          const int idx_i = offset_i+i;
          A_scr(i,j) = idx_i<A.extent_int(0) && idx_j<A.extent_int(1) ? A(idx_i,idx_j) : ATV::zero();
        });
      });

My guess is that the second option is better. In any case we can ifdef this with C++ standard and GCC version.
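
The ifdef idea could look roughly like the following sketch. The macro name `EXAMPLE_GCC_LAMBDA_ICE_WORKAROUND`, the helper `for_each_index` (a serial stand-in for the nested `Kokkos::parallel_for` calls), and the function `flat_indices` are all hypothetical and not taken from KokkosKernels. The version range is also an assumption: the thread reports the error with GCC 6.4 and 7.2 under C++14, so the guard simply covers GCC 6 and 7.

```cpp
#include <vector>

// Hypothetical guard: true for GCC 6.x/7.x compiling as C++14 or newer.
// The affected range is an assumption based on the versions named in this thread.
#if defined(__GNUC__) && !defined(__clang__) && \
    (__GNUC__ == 6 || __GNUC__ == 7) && (__cplusplus >= 201402L)
#define EXAMPLE_GCC_LAMBDA_ICE_WORKAROUND 1
#else
#define EXAMPLE_GCC_LAMBDA_ICE_WORKAROUND 0
#endif

// Serial stand-in for a Kokkos parallel_for over [0, n).
template <typename Functor>
void for_each_index(const int n, const Functor& f) {
  for (int i = 0; i < n; ++i) f(i);
}

// Records (offset_j + j) * stride + (offset_i + i) for every (j, i) pair.
inline std::vector<int> flat_indices(const int offset_i, const int offset_j,
                                     const int ni, const int nj,
                                     const int stride) {
  std::vector<int> out;
  for_each_index(nj, [&](const int j) {
#if !EXAMPLE_GCC_LAMBDA_ICE_WORKAROUND
    const int idx_j = offset_j + j;  // original placement: outer lambda
#endif
    for_each_index(ni, [&](const int i) {
#if EXAMPLE_GCC_LAMBDA_ICE_WORKAROUND
      const int idx_j = offset_j + j;  // Option 2: computed in the innermost lambda
#endif
      const int idx_i = offset_i + i;
      out.push_back(idx_j * stride + idx_i);
    });
  });
  return out;
}
```

Both branches compute the same values, so the guard only changes where `idx_j` is declared.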

@crtrott
Member

crtrott commented Dec 10, 2018

Btw. this applies to all similar places in the code: 91, 118, 145, ...

@kyungjoo-kim
Contributor

@crtrott Is the original code still legal under the C++ standard (nesting two lambdas and capturing values by reference)? I have many places that follow this same pattern.

@crtrott
Member

crtrott commented Dec 10, 2018

This is legal C++ (depending on what you do it might not be legal Kokkos though: remember the code must be valid when capturing by value, but capturing by reference may get better performance).
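
To illustrate the by-value point, here is a small sketch in plain C++ with no Kokkos involved. `run_by_value` and `scaled` are hypothetical helpers; copying the functor stands in, as an assumption, for a backend that copies the closure before running it. A body that only reads its captures and writes results through a pointer behaves the same with `[=]` and `[&]`, whereas assigning to a captured local would not compile with `[=]`.

```cpp
#include <vector>

// Calls a private copy of the functor, roughly what happens when a closure
// is copied before being executed.
template <typename Functor>
void run_by_value(const int n, Functor f) {
  for (int i = 0; i < n; ++i) f(i);
}

// Multiplies every element of `in` by `factor`.
inline std::vector<int> scaled(const std::vector<int>& in, const int factor) {
  std::vector<int> out(in.size());
  const int* src = in.data();
  int* dst = out.data();
  // Only reads factor and src and writes through dst, so the body is valid
  // with [=] as well as with [&].
  run_by_value(static_cast<int>(in.size()), [=](const int i) {
    dst[i] = factor * src[i];
  });
  return out;
}
```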

@kyungjoo-kim
Contributor

I also prefer the second option. Anyway, you are a magician. How did you know that the compiler problem is due to capturing values by reference?

@crtrott
Member

crtrott commented Dec 10, 2018

If you look at the call stack, the function names indicate that the compiler tries to optimize away expressions (fold), tries to figure out whether something is an rvalue, and then crashes when it tries to optimize some reference access inside parentheses. These are all just educated guesses, but it looks like I guessed right ;-).

@crtrott
Member

crtrott commented Dec 10, 2018

Ah I am working on the proper fix and will issue a pull request.

@kyungjoo-kim
Contributor

thanks.

crtrott added a commit that referenced this issue Dec 11, 2018
@crtrott
Member

crtrott commented Dec 11, 2018

Found a couple more places which could be resolved by making temporaries non-const ...
I didn't ifdef those but put a comment in.
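
A rough illustration of the non-const variant, in plain C++ rather than KokkosKernels code. `apply` and `block_offsets` are hypothetical names, and the index arithmetic only mirrors the line quoted in the original error message. Whether dropping `const` avoids the error in a given spot is something to check against the affected compiler.

```cpp
#include <vector>

// Serial stand-in for a parallel loop over [0, n).
template <typename Functor>
void apply(const int n, const Functor& f) {
  for (int k = 0; k < n; ++k) f(k);
}

// Splits each flat index ij into a row offset i and a column offset j.
inline std::vector<int> block_offsets(const int nq, const int mb, const int nb,
                                      const int count) {
  std::vector<int> out;
  apply(count, [&](const int ij) {
    // Temporaries declared without const, as described in the comment above.
    int i = (ij / nq) * mb;
    int j = (ij % nq) * nb;
    out.push_back(i);
    out.push_back(j);
  });
  return out;
}
```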

@ndellingwood
Contributor

ndellingwood commented Oct 25, 2019

The same test config passed with the suggested change. I'll test more completely and then put in the PR with the change and updated scripts to make sure gcc/7.4 is also tested.

@aprokop
Contributor

aprokop commented Oct 25, 2019

Of note, it was part of the Trilinos config, and I used -DCMAKE_CXX_STANDARD=14 and did not specify any other C++11-related flags (like -DTrilinos_CXX11_FLAGS).

ndellingwood added a commit that referenced this issue Oct 25, 2019
Resurfaced issue #349, update to macro for CXX14 workaround issues.
Added gcc/7.4 to testing on white testbed.
Added support for kokkos-dev-2 system for testing.
Extra fixes caught while spot-check testing on kokkos-dev-2
* Shadow warning of type in KokkosSparse_spmv_struct_impl.hpp
* Missing test for Threads backend in KokkosGraph_color_d2.cpp
aprokop added a commit to aprokop/ArborX that referenced this issue Aug 14, 2020
Capture by reference is working on my workstation, but fails on Summit
with
arborx/src/details/ArborX_DetailsTreeTraversal.hpp:102:7: internal compiler error: in maybe_undo_parenthesized_ref, at cp/semantics.c:1739
                                  if (i < real_predicates_in_team) {

This is reminiscent of kokkos/kokkos-kernels#349.
kwrobot pushed a commit to Kitware/VTK-m that referenced this issue Feb 25, 2021
The GCC 7 compiler on summit was failing to compile all of the
code. The problematic parts involved using lambda functions.
I think the problem is that the compiler has a bug where it
has a problem resolving the type of variables captured by
reference. The problem seems similar to this bug reported
to Kokkos:

kokkos/kokkos-kernels#349

Solved the problem by removing the lambdas with either a
named method or just inline code.

I suspect the problem arose (without anyone's knowledge) with
MR !2331, which moved VTK-m to C++14. This GCC error seems to
happen with C++14 but not C++11. (The features of lambdas changed
between these two versions of C++.)
cwsmith added a commit to SCOREC/pumi-pic that referenced this issue Apr 14, 2021
pitrou added a commit to apache/arrow that referenced this issue Aug 30, 2022
…#14004)

The current compute kernel fails to compile with gcc6/7 and c++14/17, due to a known bug of gcc. It is triggered when a const integer is captured by reference in a lambda function, and is parenthesized in that lambda code. Capturing the const ints by value fixes this issue.

See also:  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83204 and kokkos/kokkos-kernels#349

Lead-authored-by: Jin Shang <shangjin1997@gmail.com>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Co-authored-by: jinshang <jinshang@tencent.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
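
For readers who want to see the pattern that commit message describes, a minimal sketch follows. It is not a verified reproducer for the GCC bug; `sum_below` and `sum_below_by_value` are hypothetical names, and wrapping the code in a template is an assumption based on the `tsubst_expr` frames in the stack trace at the top of this issue. The first function captures a `const int` by reference and uses it inside parentheses; the second captures it by value, which is the fix the commit reports.

```cpp
// Pattern described as the trigger: a const int captured by reference,
// then used inside parentheses in the lambda body.
template <typename T>
T sum_below(const T bound) {
  const int limit = static_cast<int>(bound);
  T total = 0;
  auto body = [&]() {
    for (int k = 0, end = (limit); k < end; ++k) total += static_cast<T>(k);
  };
  body();
  return total;
}

// Same logic with the const int captured by value instead.
template <typename T>
T sum_below_by_value(const T bound) {
  const int limit = static_cast<int>(bound);
  T total = 0;
  auto body = [limit, &total]() {
    for (int k = 0, end = (limit); k < end; ++k) total += static_cast<T>(k);
  };
  body();
  return total;
}
```

On a compiler without the bug, both functions return the same result.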
anjakefala pushed a commit to anjakefala/arrow that referenced this issue Aug 31, 2022
zagto pushed a commit to zagto/arrow that referenced this issue Oct 7, 2022
fatemehp pushed a commit to fatemehp/arrow that referenced this issue Oct 17, 2022