This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

CI: Attempt fixing illegal instruction errors #17842

Merged
leezu merged 4 commits into apache:master on Mar 16, 2020


leezu
Contributor

leezu commented Mar 16, 2020

Hypothesis: Caching the source-compiled variants of libzstd1 and libb2 via our Docker cache can lead to "Illegal instruction" errors when the Docker image is deployed on older machines. If so, switching to the distribution-provided variants will solve the issue.

Follow-up on #17828

Caching the source-compiled variants via our Docker cache can lead to "Illegal
instruction" errors when the Docker image is deployed on older machines.
@marcoabreu
Contributor

Could you elaborate what that flag exactly does?

@apeforest
Contributor

Will this fix the issue #17840?

leezu merged commit bd6b80e into apache:master on Mar 16, 2020
@leezu
Contributor Author

leezu commented Mar 16, 2020

@marcoabreu the flag disables the AVX instruction set. https://github.com/facebook/zstd contains code that can be vectorized. For whatever reason, gcc emits AVX instructions even when it shouldn't (cf. #14664), so we need to disable them explicitly: thanks to the Docker cache, the resulting binary is run on different machines, some of which lack AVX.
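The exact flag is not shown in this thread; with GCC, `-mno-avx` is one option that stops the compiler from emitting AVX instructions, but treat that as an assumption about what the PR changed. The underlying failure mode is code compiled for a CPU with AVX being executed on one without it, which the kernel reports as "Illegal instruction". A minimal sketch of the usual alternative, not taken from the PR: confine AVX to a single function with GCC's `target` attribute and choose it at run time with `__builtin_cpu_supports`. The functions `sum_ints`, `sum_avx`, `sum_baseline` and `host_has_avx` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Runtime dispatch is only attempted with GCC/Clang on x86. */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_DISPATCH 1
#else
#define HAVE_X86_DISPATCH 0
#endif

/* Baseline variant: compiled for the default target ISA only. */
static long sum_baseline(const int *x, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; ++i) s += x[i];
    return s;
}

#if HAVE_X86_DISPATCH
/* Only this function may contain AVX (VEX-encoded) instructions; at -O3
   gcc may auto-vectorize the loop with them. */
__attribute__((target("avx")))
static long sum_avx(const int *x, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; ++i) s += x[i];
    return s;
}
#endif

/* Returns 1 if the CPU executing this binary supports AVX, else 0. */
static int host_has_avx(void) {
#if HAVE_X86_DISPATCH
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx") != 0;
#else
    return 0;
#endif
}

/* Picks the variant on the machine that runs the code, not the one that
   built it, so a cached binary stays safe on CPUs without AVX. */
long sum_ints(const int *x, size_t n) {
    assert(x != NULL || n == 0);
#if HAVE_X86_DISPATCH
    if (host_has_avx()) return sum_avx(x, n);
#endif
    return sum_baseline(x, n);
}
```

Calling `sum_ints` on `{1, 2, 3, 4}` should return 10 whether or not the host has AVX. Disabling AVX for the whole library, as described above, is simpler and avoids touching third-party code, at the cost of never using the faster path.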

leezu deleted the fixciillegalinstruction branch on March 16, 2020 at 17:40
@leezu
Contributor Author

leezu commented Mar 16, 2020

@apeforest I'm not sure about the TVM problem. I'm attempting to fix the following problem:

[2020-03-15T23:40:48.818Z] FAILED: /usr/local/bin/ccache /usr/bin/c++  -DDMLC_CORE_USE_CMAKE -DDMLC_LOG_FATAL_THROW=1 -DDMLC_LOG_STACK_TRACE_SIZE=0 -DDMLC_MODERN_THREAD_LOCAL=0 -DDMLC_USE_AZURE=0 -DDMLC_USE_CXX11=1 -DDMLC_USE_HDFS=0 -DDMLC_USE_S3=0 -DMSHADOW_IN_CXX11 -DMSHADOW_USE_CBLAS=1 -DMSHADOW_USE_MKL=0 -DMXNET_USE_BLAS_OPEN=1 -DMXNET_USE_LAPACK=1 -DMXNET_USE_LIBJPEG_TURBO=0 -DMXNET_USE_OPENCV=1 -DMXNET_USE_OPENMP=1 -DMXNET_USE_TVM_OP=1 -DNDEBUG=1 -DUSE_CUDNN -D_DARWIN_C_SOURCE -D_POSIX_C_SOURCE=200809L -D_POSIX_SOURCE -D_XOPEN_SOURCE=700 -D__USE_XOPEN2K8 -I/work/mxnet/include -I/work/mxnet/src -I/work/mxnet/3rdparty/nvidia_cub -I/work/mxnet/3rdparty/tvm/nnvm/include -I/work/mxnet/3rdparty/tvm/include -I/work/mxnet/3rdparty/dmlc-core/include -I/work/mxnet/3rdparty/dlpack/include -I3rdparty/dmlc-core/include -isystem /usr/include/opencv -Wall -Wno-unknown-pragmas -Wno-sign-compare -O3  -Wall -Wno-unknown-pragmas -fPIC -O3 -msse2 -std=c++11 -O3 -DNDEBUG -fPIC   -fopenmp -MD -MT 3rdparty/dmlc-core/CMakeFiles/dmlc.dir/src/io/line_split.cc.o -MF 3rdparty/dmlc-core/CMakeFiles/dmlc.dir/src/io/line_split.cc.o.d -o 3rdparty/dmlc-core/CMakeFiles/dmlc.dir/src/io/line_split.cc.o -c /work/mxnet/3rdparty/dmlc-core/src/io/line_split.cc
[2020-03-15T23:40:48.818Z] Illegal instruction (core dumped)
[2020-03-15T23:40:48.818Z] ninja: build stopped: subcommand failed.

http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-gpu/detail/PR-17835/9/pipeline

@marcoabreu
Contributor

It's not really nice to merge a PR while there are still open questions...

Will this globally disable the AVX instruction set?

@leezu
Contributor Author

leezu commented Mar 16, 2020

@marcoabreu this fixes a bug in #17828, which you approved earlier. It's urgent because it blocks all CI runs.
No, this only disables AVX in libzstd, which ccache uses to compress files in its cache.

@apeforest approved this PR, which is why I merged the PR.

@marcoabreu
Contributor

Oh I didn't know it blocked. In that case, fine with me.

Thanks for elaborating. Sounds good.

leezu added a commit to leezu/mxnet that referenced this pull request Mar 17, 2020
MoisesHer pushed a commit to MoisesHer/incubator-mxnet that referenced this pull request Apr 10, 2020
anirudh2290 pushed a commit to anirudh2290/mxnet that referenced this pull request May 29, 2020