
Add support for including PTX code in PyTorch #2328

Closed
wants to merge 8 commits

Conversation


@Flamefire Flamefire commented Feb 2, 2021

This adds PTX code to PyTorch by default for any architecture newer than the last selected one.
This can be changed via the new EC option "ptx".

  • Discussion about the cuda cache needs resolving (see below)

QUESTION: What about cuda_cache_size? It might be better to make this an EasyBuild option (similar to --cuda-compute-capabilities) instead. For PyTorch, for example, the cache seems to be quite large: running the test test_cpp_extensions_aot_no_ninja alone fills 1 GB.

Framework PR: easybuilders/easybuild-framework#3569. If that is merged, I can remove the option in this EC.

@boegel boegel added this to the next release (4.3.3?) milestone Feb 2, 2021
@Flamefire
Contributor Author

Had to add a bugfix: TORCH_CUDA_ARCH_LIST needs to be set for the tests too, or the build (even the current one) will fail if the GPUs found during the build are newer than what the nvcc in use supports. See https://gist.github.com/3ce737772ff805683c226e500b525c67
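For illustration, pinning the arch list for both phases might look like the sketch below. This is not the actual fix in this PR; the compute capabilities shown are assumptions.

```python
import os

# Sketch: pin TORCH_CUDA_ARCH_LIST before both the build and the test step,
# so PyTorch does not auto-detect GPUs newer than what the installed nvcc
# supports. The capability list below is illustrative only.
os.environ['TORCH_CUDA_ARCH_LIST'] = '6.0;7.0+PTX'
# ... then run the build and the test suite in this same environment, e.g.:
#   python setup.py install
#   python run_test.py
```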

@Flamefire
Contributor Author

Test report by @Flamefire

Overview of tested easyconfigs (in order)

Build succeeded for 0 out of 1 (1 easyconfigs in total)
taurusi8028 - Linux centos linux 7.9.2009, x86_64, AMD EPYC 7352 24-Core Processor, Python 2.7.5
See https://gist.github.com/d30c8f48f9c0948ea0c543797ddff458 for a full test report.

```diff
@@ -51,7 +51,9 @@ def extra_options():
     extra_vars.update({
         'excluded_tests': [{}, 'Mapping of architecture strings to list of tests to be excluded', CUSTOM],
         'custom_opts': [[], 'List of options for the build/install command. Can be used to change the defaults ' +
-                        'set by the PyTorch EasyBlock, for example ["USE_MKLDNN=0"].', CUSTOM]
+                        'set by the PyTorch EasyBlock, for example ["USE_MKLDNN=0"].', CUSTOM],
+        'ptx': ['latest', 'For which compute architectures PTX code should be generated. Can be '
```
Member


The CUDA arches are not guaranteed to be in order, so one of the following changes is required:

  1. latest to become last
  2. The code below changes to add +PTX to the latest CUDA arch
  3. We order the CUDA arches
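A rough sketch of the "last" semantics (appending +PTX to the last listed arch, whatever its order); the helper name is hypothetical and this is not the actual easyblock code:

```python
def build_arch_list(cuda_cc, ptx='last'):
    """Join CUDA compute capabilities into a TORCH_CUDA_ARCH_LIST value,
    appending +PTX to the last listed arch when requested.

    Note: 'last' means the last entry as given, which need not be the
    newest arch, since the list is not guaranteed to be ordered.
    """
    archs = list(cuda_cc)
    if ptx == 'last' and archs:
        archs[-1] += '+PTX'
    return ';'.join(archs)

print(build_arch_list(['7.0', '6.0']))  # the last entry gets +PTX, ordered or not
```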

Contributor Author


Good catch! Using "last", which matches "first" and is the easiest fix.

@Flamefire
Contributor Author

I had to add an option for setting up the CUDA cache, as using PTX (on by default) will now possibly trigger JIT compilation, which writes to the HOME directory. See https://developer.nvidia.com/blog/cuda-pro-tip-understand-fat-binaries-jit-caching/ for details.
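For reference, the driver's JIT cache can be redirected and resized via documented environment variables; a minimal sketch (the path and size chosen here are illustrative, not what the new option actually sets):

```python
import os

# Sketch: redirect the CUDA JIT cache away from $HOME and enlarge it.
# CUDA_CACHE_PATH and CUDA_CACHE_MAXSIZE are documented NVIDIA driver
# environment variables; the default cache lives under ~/.nv/ComputeCache.
os.environ['CUDA_CACHE_PATH'] = os.path.join(os.environ.get('TMPDIR', '/tmp'), 'cuda-cache')
os.environ['CUDA_CACHE_MAXSIZE'] = str(4 * 2**30)  # cache size limit in bytes
```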

There also seems to be a failure which I'm not sure about. It might be because I started the test too early; I will rerun.

@Flamefire
Contributor Author

Test report by @Flamefire

Overview of tested easyconfigs (in order)

  • SUCCESS PyTorch-1.7.1-fosscuda-2019b-Python-3.7.4.eb

Build succeeded for 1 out of 1 (1 easyconfigs in total)
taurusml3 - Linux RHEL 7.6, POWER, 8335-GTX, Python 2.7.5
See https://gist.github.com/a2fd3c5a7ab9956d83e1df29e62cb6f7 for a full test report.

@Flamefire
Contributor Author

Test report by @Flamefire

Overview of tested easyconfigs (in order)

Build succeeded for 0 out of 1 (1 easyconfigs in total)
taurusi8033 - Linux centos linux 7.9.2009, x86_64, AMD EPYC 7352 24-Core Processor, Python 2.7.5
See https://gist.github.com/333e42009f7c32b178b429f547deae8b for a full test report.

@Flamefire
Contributor Author

While this does work, the amount of JIT compiling that would happen when running a PyTorch compiled this way makes it infeasible. So I'd recommend against using this, and I am closing the PR.


3 participants