change windows build system #16980

yajiedesign · 2019-12-05T08:03:59Z

Description

now, the huge image size causes many troubles.etc large image can't laod. some compile error.
now, the PIP build use compressed fatbin by hack nvcc,but it will cause a lot of memory,and general users have difficulty compiling on their own.
this pr change build system,it spilt different cuda arch to different dll.etc mxnet_30.mxnet_50.Then use a warpdll to automatically identify the gpu sm version. Load the corresponding dll. And forward all calls.
this pr also contains tools to automatically generate warp dll and will be executed automatically during compilation.

tools/gen_warp.cpp

CMakeLists.txt

yajiedesign · 2019-12-17T08:41:10Z

every body,ci test is ok .

CMakeLists.txt

leezu · 2019-12-17T09:32:40Z

ci/build_windows.py

@@ -115,6 +117,7 @@ class BuildFlavour(Enum):
        '-DCUDA_ARCH_NAME=Manual '
        '-DCUDA_ARCH_BIN=52 '
        '-DCUDA_ARCH_PTX=52 '
+        '-DCUDA_ARCH_LIST=5.2 '


Why add CUDA_ARCH_LIST?

the first class cuda use CUDA_ARCH_LIST choose or set arch

I don't think it works currently. I got this warning when testing it:

Manually-specified variables were not used by the project: CUDA_ARCH_LIST

It is fixed as part of #17031

currently works fine on ci.The warning may be caused by other modifications.

I just tried it again. It only works when using the Makefile generator (or apparently the windows generator as well). But doesn't work when using the ninja generator.
It's quite strange.

Where do you expect the variable to be used? Only at https://github.com/apache/incubator-mxnet/blob/8645b9a4a220940bdccba3aeee577eb645a86b34/cmake/FirstClassLangCuda.cmake#L109 ?

yes.only in https://github.com/apache/incubator-mxnet/pull/16980/files#diff-101d74caab6e808c727c01a427bd927eR137

leezu

Could you please elaborate on the motivation on this change?

now, the huge image size causes many troubles.etc large image can't laod. some compile error.

What do you mean?

now, the PIP build use compressed fatbin by hack nvcc

How would this change address the compressed fatbin? The compression was used to make the library size smaller. However, this PR will split the dll into multiple dlls. The total size that needs to be distributed on Pypi doesn't change or does it?

,but it will cause a lot of memory,and general users have difficulty compiling on their own.

What do you mean?

Thank you!

yajiedesign · 2019-12-18T11:01:35Z

@leezu now imageSize is too large. Windows cannot load too large DLL.So we must try to reduce the imagesize size.
previously I reduced the size by modifying nvcc.But this method takes up too much memory when starting multiple processes.So I designed a new scheme

yajiedesign · 2019-12-18T11:02:45Z

@leezu The pip size will be solved by other methods.

leezu · 2019-12-18T11:55:02Z

@yajiedesign do you mean on Windows 32 bit? Do we have any need to support 32 bit build?
I found the following reference https://software.intel.com/en-us/articles/memory-limits-applications-windows
If you have any other reference in mind, please mention it.

Unfortunately Microsoft doesn't seem to publish statistics about the number of users running Windows 32bit. But Steam does https://store.steampowered.com/hwsurvey/ and as of November 2019 less than 1% of users run Windows 7 32bit, Windows 10 32bit.

So I think building for Windows 64 bit may be an easier solution?

Regarding the pip size: Yes, I agree your PR doesn't address it. That's why I wonder why you cite pip size as motivation for this PR. You can edit the motivation and remove it, given that this PR is not related.

leezu · 2019-12-18T12:07:13Z

Regarding the memory size, in my understanding if you use multiple processes, that all load the mxnet dll, the memory among them will be shared.
Or is there some problem that prevents the memory reuse?

yajiedesign · 2019-12-18T12:20:45Z

@leezu the previously reduced size method broken dll shared.

yajiedesign · 2019-12-18T12:25:16Z

@leezu not 32 bit. 64bit dll are also image size limit.This is determined by PE file structure.Reference https://stackoverflow.com/questions/6976693/what-is-the-maximum-size-of-a-pe-file-on-64-bit-windows

yajiedesign · 2019-12-18T12:29:46Z

@leezu Pip is mentioned here because the old method is use to PIP build,and solved pip pkg size problem.

leezu · 2019-12-19T02:01:02Z

@leezu not 32 bit. 64bit dll are also image size limit.This is determined by PE file structure.Reference https://stackoverflow.com/questions/6976693/what-is-the-maximum-size-of-a-pe-file-on-64-bit-windows

Thanks @yajiedesign. It seems one of the root causes is that Windows does not support position independent code and PE format must always use 32bit relative addressing.

The reason I ask you so many questions is that a) I'm concerned that not many people are familiar with Windows, so we need to document the reasoning that went into this patch b) fundamentally the problem you attempt to solve here also affects unix systems and it would be good to have a cross-platform solution. (Thanks to ELF superiority over Windows PE, I understand that the problem is more pressing on Windows and we may need to have a Windows specific intermediate solution at first)

What do you think?

Maybe @samskalicky's dynamic loading of operators effort can solve this in a cross-platform way?

yajiedesign · 2019-12-19T05:14:54Z

Yes, it is urgent under windows. The current scheme is private and has many problems
This solution may be portable to Linux. But I don't know it is need under Linux.I'm not familiar with linux.
I think merge it first to make windows available, and then consider cross platform issues.

leezu · 2019-12-19T05:28:14Z

@yajiedesign can you provide a link for the current scheme? Or is the code not available at all? Where does it run?

Which scheme do you mean? Do you mean the code used to build the windows binary at https://pypi.org/project/mxnet-cu101/1.6.0b20190926/#files

How large is the binary without the current private scheme? What happens without the private scheme?

yajiedesign · 2019-12-19T06:58:12Z

@leezu not current scheme link.it only on my build machine. It's complicated and dirty.
yes current scheme used to build pip pkg.
without current scheme .image size about 1.8gb.it can't load.

leezu · 2019-12-19T10:07:28Z

Thanks for providing more details.

I think it would be good to move the files introduced here from tools to tools/windows or tools/windowsbuild and add a README.md there which documents the motivation (ie. the information you provided during the discussion). Then it should be fine to merge.

You can also consider making the warping feature optional in the Windows build, as it adds overhead and is not needed apart from the pip build that needs to include all cuda arches.

CMakeLists.txt

yajiedesign · 2019-12-22T02:48:28Z

ok.

yajiedesign · 2019-12-26T09:46:25Z

ci test ok

marcoabreu · 2019-12-26T10:51:16Z

tools/windowsbuild/warp_dll.cpp

+
+int find_version()
+{
+	int known_sm[] = { 30,35,37,50,52,60,61,70,75 };


Can you make that dynamic so we don't have to update it in the future?

yajiedesign · 2019-12-27T13:10:07Z

@marcoabreu done

yajiedesign · 2019-12-30T03:11:48Z

can we merge?

add gen_warp cpp version add add_custom_command to run warp_gen add download cmake add option change option add dynamic read mxnet dll

marcoabreu

Great job, thanks!

yajiedesign · 2019-12-31T12:48:22Z

ci test ok.

CMakeLists.txt

yajiedesign · 2020-01-02T01:55:48Z

ping

ptrendx · 2020-01-03T21:38:31Z

@yajiedesign Could you make a PR with backported changes needed for Windows build to 1.6 branch as you mentioned in the mail to dev@?

yajiedesign · 2020-01-04T05:21:32Z

@ptrendx ok

Introduce unified MXNET_CUDA_ARCH option to specify cuda architectures. Previously cuda architecture setting was partially broken and different options were applied to different parts of the build (CUDA_ARCH_NAME CUDA_ARCH_BIN CUDA_ARCH_PTX and CUDA_ARCH_LIST). Include FindCUDAToolkit from CMake 3.17, which replaces the deprecated FindCUDA functionality for finding the cuda toolkit include directories and libraries. Workaround for DLL size limitation on Windows (apache#16980) * change windows build system. add gen_warp cpp version add add_custom_command to run warp_gen add download cmake add option change option add dynamic read mxnet dll

Switch to modern CMake CUDA handling (apache#17031) Introduce unified MXNET_CUDA_ARCH option to specify cuda architectures. Previously cuda architecture setting was partially broken and different options were applied to different parts of the build (CUDA_ARCH_NAME CUDA_ARCH_BIN CUDA_ARCH_PTX and CUDA_ARCH_LIST). Include FindCUDAToolkit from CMake 3.17, which replaces the deprecated FindCUDA functionality for finding the cuda toolkit include directories and libraries. Workaround for DLL size limitation on Windows (apache#16980) * change windows build system. add gen_warp cpp version add add_custom_command to run warp_gen add download cmake add option change option add dynamic read mxnet dll

…ranch Fix CUDNN detection for CMake build (apache#17019) Replace mxnet_option macro with standard CMAKE_DEPENDENT_OPTION (apache#17018) Switch to modern CMake CUDA handling (apache#17031) Introduce unified MXNET_CUDA_ARCH option to specify cuda architectures. Previously cuda architecture setting was partially broken and different options were applied to different parts of the build (CUDA_ARCH_NAME CUDA_ARCH_BIN CUDA_ARCH_PTX and CUDA_ARCH_LIST). Include FindCUDAToolkit from CMake 3.17, which replaces the deprecated FindCUDA functionality for finding the cuda toolkit include directories and libraries. Workaround for DLL size limitation on Windows (apache#16980) * change windows build system. add gen_warp cpp version add add_custom_command to run warp_gen add download cmake add option change option add dynamic read mxnet dll

Fix CUDNN detection for CMake build (#17019) Replace mxnet_option macro with standard CMAKE_DEPENDENT_OPTION (#17018) Switch to modern CMake CUDA handling (#17031) Introduce unified MXNET_CUDA_ARCH option to specify cuda architectures. Previously cuda architecture setting was partially broken and different options were applied to different parts of the build (CUDA_ARCH_NAME CUDA_ARCH_BIN CUDA_ARCH_PTX and CUDA_ARCH_LIST). Include FindCUDAToolkit from CMake 3.17, which replaces the deprecated FindCUDA functionality for finding the cuda toolkit include directories and libraries. Workaround for DLL size limitation on Windows (#16980) * change windows build system. add gen_warp cpp version add add_custom_command to run warp_gen add download cmake add option change option add dynamic read mxnet dll Co-authored-by: Leonard Lausen <leonard@lausen.nl>

lordofgod · 2021-04-28T07:21:51Z

@yajiedesign

这个issue的修改可能会导致cpp_package的OpWrapperGenerator无法正常生成op.h文件, 错误码127。原因是warp_dll.cpp文件的119行HMODULE hm = LoadLibraryW(dll_name);使用了相对路径。调用OpWrapperGenerator时的dll搜索路径不是libmxnet.dll的所在目录，导致这段代码找不到依赖的dll库。这里是不是使用绝对路径更好一些。

This pr will cause the "OpwrapperGenerator.py" failed to generate "op.h" with error code 127, while building with cpp_package. The reason is use of relative dll path at "HMODULE hm = LoadLibraryW(dll_name);" in the new added file warp_dll.cpp, Line 119. "OpWrapperGenerator.py" 's path is different from libmxnet.dll, thus may cause the 127 error while calling the generator script, which infers that sub-dll's load failure in libmxnet.dll. If it's better to use absolute path here, or other strategy for solving this.

dll search path reference: https://docs.microsoft.com/en-us/windows/win32/dlls/dynamic-link-library-search-order

yajiedesign requested a review from szha as a code owner December 5, 2019 08:03

yajiedesign added Build CMake CMake related bugs/issues/improvements labels Dec 5, 2019

lanking520 reviewed Dec 5, 2019

View reviewed changes

tools/gen_warp.cpp Outdated Show resolved Hide resolved

apeforest reviewed Dec 6, 2019

View reviewed changes

CMakeLists.txt Outdated Show resolved Hide resolved

apeforest reviewed Dec 6, 2019

View reviewed changes

CMakeLists.txt Outdated Show resolved Hide resolved

apeforest reviewed Dec 6, 2019

View reviewed changes

CMakeLists.txt Outdated Show resolved Hide resolved

yajiedesign requested review from aaronmarkham and marcoabreu as code owners December 7, 2019 03:18

yajiedesign force-pushed the build_system branch from e81b673 to 91a8e3b Compare December 7, 2019 06:14

yajiedesign mentioned this pull request Dec 10, 2019

Use CMake standard library to handle CUDA #17031

Merged

5 tasks

leezu reviewed Dec 17, 2019

View reviewed changes

CMakeLists.txt Outdated Show resolved Hide resolved

leezu reviewed Dec 17, 2019

View reviewed changes

yajiedesign added the pr-awaiting-review PR is waiting for code review label Dec 18, 2019

leezu reviewed Dec 18, 2019

View reviewed changes

yajiedesign force-pushed the build_system branch 2 times, most recently from d7611b3 to ab1e279 Compare December 20, 2019 11:25

leezu reviewed Dec 20, 2019

View reviewed changes

CMakeLists.txt Outdated Show resolved Hide resolved

yajiedesign force-pushed the build_system branch 2 times, most recently from 93b4c16 to 5a31b19 Compare December 25, 2019 11:49

marcoabreu reviewed Dec 26, 2019

View reviewed changes

yajiedesign added 2 commits December 30, 2019 22:25

change windows build system.

f366d9e

add gen_warp cpp version add add_custom_command to run warp_gen add download cmake add option change option add dynamic read mxnet dll

fix

fc27ce9

yajiedesign force-pushed the build_system branch from 8c7043f to fc27ce9 Compare December 30, 2019 15:01

fix

7272e02

marcoabreu approved these changes Dec 30, 2019

View reviewed changes

fix

ee31ae5

leezu reviewed Dec 31, 2019

View reviewed changes

CMakeLists.txt Outdated Show resolved Hide resolved

leezu reviewed Dec 31, 2019

View reviewed changes

CMakeLists.txt Outdated Show resolved Hide resolved

yajiedesign added 2 commits December 31, 2019 21:44

fix

ec5f961

up

550ac45

leezu merged commit ffeacfa into apache:master Jan 2, 2020

yajiedesign mentioned this pull request Jan 4, 2020

Backport #17031, #16980 to 1.6 branch #17213

Merged

leezu mentioned this pull request Mar 18, 2020

Windows link failures in Debug mode #17633

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

change windows build system #16980

change windows build system #16980

yajiedesign commented Dec 5, 2019

yajiedesign commented Dec 17, 2019

leezu Dec 17, 2019

yajiedesign Dec 18, 2019

leezu Dec 18, 2019

yajiedesign Dec 18, 2019

leezu Dec 18, 2019 •

edited

Loading

yajiedesign Dec 18, 2019

leezu left a comment

yajiedesign commented Dec 18, 2019

yajiedesign commented Dec 18, 2019

leezu commented Dec 18, 2019 •

edited

Loading

leezu commented Dec 18, 2019

yajiedesign commented Dec 18, 2019

yajiedesign commented Dec 18, 2019

yajiedesign commented Dec 18, 2019 •

edited

Loading

leezu commented Dec 19, 2019

yajiedesign commented Dec 19, 2019

leezu commented Dec 19, 2019

yajiedesign commented Dec 19, 2019

leezu commented Dec 19, 2019

yajiedesign commented Dec 22, 2019

yajiedesign commented Dec 26, 2019

marcoabreu Dec 26, 2019

yajiedesign commented Dec 27, 2019

yajiedesign commented Dec 30, 2019

marcoabreu left a comment

yajiedesign commented Dec 31, 2019

yajiedesign commented Jan 2, 2020

ptrendx commented Jan 3, 2020

yajiedesign commented Jan 4, 2020

lordofgod commented Apr 28, 2021

change windows build system #16980

change windows build system #16980

Conversation

yajiedesign commented Dec 5, 2019

Description

yajiedesign commented Dec 17, 2019

leezu Dec 17, 2019

Choose a reason for hiding this comment

yajiedesign Dec 18, 2019

Choose a reason for hiding this comment

leezu Dec 18, 2019

Choose a reason for hiding this comment

yajiedesign Dec 18, 2019

Choose a reason for hiding this comment

leezu Dec 18, 2019 • edited Loading

Choose a reason for hiding this comment

yajiedesign Dec 18, 2019

Choose a reason for hiding this comment

leezu left a comment

Choose a reason for hiding this comment

yajiedesign commented Dec 18, 2019

yajiedesign commented Dec 18, 2019

leezu commented Dec 18, 2019 • edited Loading

leezu commented Dec 18, 2019

yajiedesign commented Dec 18, 2019

yajiedesign commented Dec 18, 2019

yajiedesign commented Dec 18, 2019 • edited Loading

leezu commented Dec 19, 2019

yajiedesign commented Dec 19, 2019

leezu commented Dec 19, 2019

yajiedesign commented Dec 19, 2019

leezu commented Dec 19, 2019

yajiedesign commented Dec 22, 2019

yajiedesign commented Dec 26, 2019

marcoabreu Dec 26, 2019

Choose a reason for hiding this comment

yajiedesign commented Dec 27, 2019

yajiedesign commented Dec 30, 2019

marcoabreu left a comment

Choose a reason for hiding this comment

yajiedesign commented Dec 31, 2019

yajiedesign commented Jan 2, 2020

ptrendx commented Jan 3, 2020

yajiedesign commented Jan 4, 2020

lordofgod commented Apr 28, 2021

leezu Dec 18, 2019 •

edited

Loading

leezu commented Dec 18, 2019 •

edited

Loading

yajiedesign commented Dec 18, 2019 •

edited

Loading