-
Notifications
You must be signed in to change notification settings - Fork 6.8k
Conversation
Hey @mseth10 , Thanks for submitting the PR
CI supported jobs: [edge, miscellaneous, unix-cpu, centos-gpu, website, sanity, windows-gpu, unix-gpu, clang, centos-cpu, windows-cpu] Note: |
@@ -33,4 +33,4 @@ set(USE_F16C OFF CACHE BOOL "Build with x86 F16C instruction support") | |||
set(USE_LIBJPEG_TURBO ON CACHE BOOL "Build with libjpeg-turbo") | |||
|
|||
set(CUDACXX "/usr/local/cuda-10.0/bin/nvcc" CACHE STRING "Cuda compiler") | |||
set(MXNET_CUDA_ARCH "3.0;5.0;6.0;7.0;7.5" CACHE STRING "Cuda architectures") | |||
set(MXNET_CUDA_ARCH "3.0;5.0;6.0;7.0" CACHE STRING "Cuda architectures") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dropping support for 7.5 cuda arch is it favorable?
For eg for Tesla T4 [G4 instances] cuda arch supported is 7.5
@leezu What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a temporary fix to get the CD working, I checked the builds for cu100 and cu102, both fail because of binary size issues. We should work on adding back 7.5 arch after making sure the build works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sidenote: 7.0 binaries also run on 7.5
if platform.system() == 'Darwin': | ||
shutil.copytree(os.path.join(CURRENT_DIR, 'mxnet-build/3rdparty/mkldnn/build/install/include'), | ||
os.path.join(CURRENT_DIR, 'mxnet/include/mkldnn')) | ||
shutil.copytree(os.path.join(CURRENT_DIR, 'mxnet-build/3rdparty/mkldnn/build/install/include'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious how
- previously when nightly CD was working, why was mkldnn include done only for Darwin
- now, to fix nightly CD, this needs to be done for all OS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This platform condition was added by mistake in some earlier commit. We are fixing that here. No reason to package dnnl header files for only Darwin.
Also @mseth10 do you mind updating the description [removing extraneous content] and also adding the Jenkins Pipeline build where you tested it out or a command you used to reproduce & test this fix. Thanks |
@ChaiBapchya updated the description. Check it out. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mseth10
Retriggered one previously broken pipeline as http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/restricted-mxnet-cd%2Fmxnet-cd-release-job/detail/mxnet-cd-release-job/1048/pipeline |
…he#18205) * use cmake for cd static build, skip running kvstore tests * update dnnl headers stash location * remove unnecessary platform condition * remove 7.5 arch for cu100, cu101, cu102 Co-authored-by: Ubuntu <ubuntu@ip-172-31-3-62.us-west-2.compute.internal>
Description
This PR fixes nightly CD GPU tests by updating the build toolchain to use cmake static build and updating dnnl headers stash location. It removes 7.5 arch for cu100, cu101, cu102 builds to solve oversized libmxnet.so binary issue with cmake builds.
It also fixes the issue with dnnl headers packaging into nightly build artifacts. Fixes #18120
Here's the link to a broken CD pipeline: http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/restricted-mxnet-cd%2Fmxnet-cd-release-job/detail/mxnet-cd-release-job/1024/pipeline/
Commands used to reproduce and test the fix: