FusedOp Failing Static Linked Build #16765
Comments
@ptrendx Can you take a look?
Hmmm, it is strange. https://github.com/apache/incubator-mxnet/blob/master/src/common/rtc.cc already uses those same functions (e.g. here: https://github.com/apache/incubator-mxnet/blob/master/src/common/rtc.cc#L175) and it compiles without any problem...
Trading my assignment with @DickJC123, who is looking into this issue.
I'm able to reproduce the link error in the docker container mentioned above with the command:
I'll continue investigating the root cause. FYI, the following command does not have a similar issue:
Finally figured out what's going on here. The link step for bin/im2rec via ld (as driven by g++) is failing because LDFLAGS is missing '-lcuda -lnvrtc'. The Makefile adds these flags to LDFLAGS (and compiles with MXNET_ENABLE_CUDA_RTC=1) if it sees ENABLE_CUDA_RTC set in config.mk. The maven builds use the flag USE_NVRTC (to no effect), while the pip builds were converted to ENABLE_CUDA_RTC via PR #14250. Not sure why that PR stopped short of converting all the builds.
The functionality of ./src/common/rtc.cc is guarded by MXNET_ENABLE_CUDA_RTC. So the questions for the fusion PR author @ptrendx are: Do you think the pointwise fusion should be similarly guarded by ENABLE_CUDA_RTC (or a different flag)? Should MXNet warn when the user is running on a build that lacks the rtc capability, and under what circumstances (e.g. only when MXNET_USE_FUSION=1 is set explicitly in the environment and on a gpu context)? Should the user expect to be able to run the unittest suite on the no-rtc builds, and how do we detect that?
Description
The build is currently failing for the statically linked build that is used for Scala Maven Publishing. This is blocking the current nightly snapshot and must also be fixed before building the release jars.
See full log at http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/restricted-publish-artifacts/detail/master/287/pipeline/
Main Scala nightly pipeline at http://jenkins.mxnet-ci.amazon-ml.com/job/restricted-publish-artifacts/job/master/
It seems to be a result of #15167. The pip build has also been failing since then, possibly for the same reason.
To Reproduce
This version of the build can be run by following the instructions located at https://github.com/apache/incubator-mxnet/tree/master/tools/staticbuild. The Scala build uses variant cu92mkl by default, but other CUDA builds should have the same problem.
The build is currently run on an Ubuntu 14.04 docker instance using https://github.com/apache/incubator-mxnet/blob/master/ci/docker/Dockerfile.publish.ubuntu1404_cpu.