Error building TensorFlow-2.6.0-foss-2021a-CUDA-11.3.1.eb #13971

Closed
scicomp-moffitt opened this issue Sep 10, 2021 · 2 comments
@scicomp-moffitt
Contributor

scicomp-moffitt commented Sep 10, 2021

eb TensorFlow-2.6.0-foss-2021a-CUDA-11.3.1.eb --robot --cuda-compute-capabilities 7.0,7.2,7.5,8.0,8.6
OS: Rocky Linux 8.4, 6-core Linode VM, AMD EPYC 7542, 16 GB RAM
ude/roctracer -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -fPIC -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -Wall -fno-omit-frame-pointer -no-canonical-prefixes -fno-canonical-system-headers -DNDEBUG -g0 -O2 -ffunction-sections -fdata-sections -w -DAUTOLOAD_DYNAMIC_KERNELS -O2 -ftree-vectorize '-march=native' -fno-math-errno -fPIC -fPIC '-std=c++14' -c tensorflow/compiler/mlir/tensorflow/ir/tf_ops.cc -o bazel-out/k8-opt/bin/tensorflow/compiler/mlir/tensorflow/_objs/tensorflow_ops/tf_ops.pic.o)
Execution platform: @local_execution_config_platform//:platform
tensorflow/compiler/mlir/tensorflow/ir/tf_ops.cc: In member function virtual mlir::Type mlir::TF::TensorFlowDialect::parseType(mlir::DialectAsmParser&) const:
tensorflow/compiler/mlir/tensorflow/ir/tf_ops.cc:353: note: -Wmisleading-indentation is disabled from this point onwards, since column-tracking was disabled due to the size of the code/headers
  353 |     return ret;
      |
gcc: fatal error: Killed signal terminated program cc1plus
compilation terminated.
Target //tensorflow/tools/pip_package:build_pip_package failed to build
INFO: Elapsed time: 10119.445s, Critical Path: 537.21s
INFO: 16798 processes: 4915 internal, 11883 local.
FAILED: Build did NOT complete successfully
FAILED: Build did NOT complete successfully
 (at easybuild/tools/run.py:577 in parse_cmd_output)
== 2021-09-10 22:07:48,940 build_log.py:265 INFO        ... (took 2 hours 48 mins 44 secs)
== 2021-09-10 22:07:48,941 build_log.py:265 INFO ... (took 2 hours 52 mins 15 secs)
== 2021-09-10 22:07:48,943 filetools.py:1884 INFO Removing lock /app/eb/software/.locks/_app_eb_software_TensorFlow_2.6.0-foss-2021a-CUDA-11.3.1.lock...
== 2021-09-10 22:07:48,948 filetools.py:359 INFO Path /app/eb/software/.locks/_app_eb_software_TensorFlow_2.6.0-foss-2021a-CUDA-11.3.1.lock successfully removed.
== 2021-09-10 22:07:48,948 filetools.py:1888 INFO Lock removed: /app/eb/software/.locks/_app_eb_software_TensorFlow_2.6.0-foss-2021a-CUDA-11.3.1.lock
== 2021-09-10 22:07:48,948 easyblock.py:3726 WARNING build failed (first 300 chars): cmd " bazel --output_user_root=/app/eb/build/TensorFlow/2.6.0/foss-2021a-CUDA-11.3.1/tmp3z_8q12q-bazel-tf --host_jvm_args=-Xms512m --host_jvm_args=-Xmx4096m build --config=noaws --config=nogcp --config=nohdfs --compilation_mode=opt --config=opt --subcommands --verbose_failures --jobs=6 --copt="-fPIC
== 2021-09-10 22:07:48,953 easyblock.py:300 INFO Closing log for application name TensorFlow version 2.6.0
@boegel boegel added this to the 4.x milestone Sep 11, 2021
@boegel
Member

boegel commented Sep 11, 2021

@scicomp-moffitt I ran into this problem too. It occurs when the GCC installation you are using is missing a patch that fixes an internal compiler error (ICE) triggered when using nvcc (the CUDA compiler).

The patch was added in #13310 and has been included since EasyBuild v4.4.1.

So you will need to do a forced reinstallation of GCCcore-10.3.0.eb using a recent EasyBuild version to resolve this...
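
As a rough sketch of what that could look like on the command line (not taken from the thread itself): the snippet below checks that the installed EasyBuild is at least v4.4.1 before forcing the GCCcore rebuild. The `version_ge` helper is a hypothetical name introduced for illustration, and the parsing assumes `eb --version` prints a line of the form "This is EasyBuild X.Y.Z (...)" and that GNU `sort -V` is available.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when version $1 is greater than or
# equal to version $2 (relies on GNU sort's -V version ordering).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Only proceed when the eb command is on the PATH.
if command -v eb >/dev/null 2>&1; then
    # Assumption: the version banner contains "EasyBuild X.Y.Z".
    eb_version="$(eb --version | sed -n 's/.*EasyBuild \([0-9][0-9.]*\).*/\1/p')"
    if [ -n "$eb_version" ] && version_ge "$eb_version" "4.4.1"; then
        # Force a reinstallation so the patched GCCcore replaces the old one.
        eb GCCcore-10.3.0.eb --force
    else
        echo "EasyBuild 4.4.1 or newer is needed; found: ${eb_version:-unknown}" >&2
    fi
fi
```

After the rebuild finishes, the original `eb TensorFlow-2.6.0-foss-2021a-CUDA-11.3.1.eb --robot ...` command can be rerun against the patched compiler.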

@boegel boegel closed this as completed Sep 11, 2021
@scicomp-moffitt
Contributor Author

OK, good to know.

I launched the build again with fewer compute capabilities, and it ran to completion without the patch. I'm not sure whether the resulting installation actually works, though.

eb TensorFlow-2.6.0-foss-2021a-CUDA-11.3.1.eb --robot --cuda-compute-capabilities 7.5,8.0
