Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

{devel}[multiple] Define MAX_JOBS in easyconfigs of PyTorch from v1.1.0 to v1.4.0 #10772

Merged

Conversation

lexming
Copy link
Contributor

@lexming lexming commented Jun 9, 2020

(created using eb --new-pr)

Explicitly set MAX_JOBS to the parallel setting from EB to avoid build errors from excessive parallelism. The build of PyTorch by default uses all cores physically present in the system.

@lexming lexming added the bug fix label Jun 9, 2020
@lexming lexming added this to the next release (4.2.2?) milestone Jun 9, 2020
@lexming lexming changed the title Define MAX_JOBS in easyconfigs of PyTorch from v1.1.0 to v1.4.0 {devel}[multiple] Define MAX_JOBS in easyconfigs of PyTorch from v1.1.0 to v1.4.0 Jun 9, 2020
@easybuilders easybuilders deleted a comment from boegelbot Jun 9, 2020
@lexming
Copy link
Contributor Author

lexming commented Jun 9, 2020

@boegel Can you take a look at these failed unit tests please? It's weird, it seems that the easyconfigs in fosscuda fail because at some point the string from prebuildopts

TORCH_CUDA_ARCH_LIST="3.5 3.7 5.2 6.0 6.1 7.0"

gets converted to

TORCH_CUDA_ARCH_LIST="3.5 %(pyshortver)s 5.2 6.0 6.1 7.0"

@boegel
Copy link
Member

boegel commented Jul 6, 2020

@lexming The tests for PyTorch easyconfigs using Python 3.7.x as dependency are indeed failing now because because of the %(pyshortver)s template being injected in the dumped easyconfig, which is kind of silly.

The culprit is the change you are making though, indirectly at least: you're introducing %(parallel)s into buildopts. The value for the %(parallel)s template is not set yet when the dumped easyconfig is compared with the original one in the tests. This makes the test basically compare "raw" values where templates are still in place, as opposed to templated string values. This results in comparing the originalbuildopts value (which includes both %(parallel)s and "3.7") with the value from the dumped easyconfig (where buildopts contains %(pyshortver)s because of overly aggressive template injecting).

An easy workaround here is to inject a value for the %(parallel)s value, so the comparison can be done on string values where templates have been resolved, which is done in 170e391 .

To proper fix is probably to make the template injection in the dumped easyconfig a bit less aggressive, but I doubt that's a trivial thing to fix (I've opened an issue for this at easybuilders/easybuild-framework#3380).

@lexming
Copy link
Contributor Author

lexming commented Jul 6, 2020

@boegel that is much more complex than anticipated, thanks for the workaround

@easybuilders easybuilders deleted a comment from boegelbot Jul 6, 2020
Copy link
Member

@boegel boegel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@boegel
Copy link
Member

boegel commented Jul 6, 2020

Test report by @boegel
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in this PR)
node2303.phanpy.os - Linux centos linux 7.8.2003, x86_64, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (haswell), Python 2.7.5
See https://gist.github.com/22be06fdc895b0a04e0b85cbd3b7dbe9 for a full test report.

@boegel
Copy link
Member

boegel commented Jul 6, 2020

Test report by @boegel
SUCCESS
Build succeeded for 3 out of 3 (3 easyconfigs in this PR)
node3408.kirlia.os - Linux centos linux 7.8.2003, x86_64, Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz (cascadelake), Python 2.7.5
See https://gist.github.com/97b79c216eb808083e4560e4bf7ce6cf for a full test report.

@boegel
Copy link
Member

boegel commented Jul 6, 2020

Test report by @boegel
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in this PR)
node3309.joltik.os - Linux centos linux 7.8.2003, x86_64, Intel(R) Xeon(R) Gold 6242 CPU @ 2.80GHz (cascadelake), Python 3.6.8
See https://gist.github.com/e0c1243a5c91b004bded4fc6b3332986 for a full test report.

@boegel
Copy link
Member

boegel commented Jul 6, 2020

Going in, thanks @lexming!

@boegel boegel merged commit 9a31feb into easybuilders:develop Jul 6, 2020
@lexming lexming deleted the 20200609175609_new_pr_PyTorch110 branch July 7, 2020 09:56
@lexming
Copy link
Contributor Author

lexming commented Jul 7, 2020

Test report by @lexming
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in this PR)
node368.hydra.os - Linux centos linux 7.7.1908, x86_64, Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz, Python 2.7.5
See https://gist.github.com/0c5973b65b53a141956e15f8545e0c36 for a full test report.

@lexming
Copy link
Contributor Author

lexming commented Jul 7, 2020

Test report by @lexming
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in this PR)
node366.hydra.os - Linux centos linux 7.7.1908, x86_64, Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz, Python 2.7.5
See https://gist.github.com/5417963bb8415573a73c89678e110ea5 for a full test report.

@lexming
Copy link
Contributor Author

lexming commented Jul 7, 2020

Test report by @lexming
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in this PR)
node366.hydra.os - Linux centos linux 7.7.1908, x86_64, Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz, Python 2.7.5
See https://gist.github.com/073020871ff0863619553ff222c2d655 for a full test report.

@lexming
Copy link
Contributor Author

lexming commented Jul 7, 2020

Test report by @lexming
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in this PR)
node373.hydra.os - Linux centos linux 7.7.1908, x86_64, Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz, Python 2.7.5
See https://gist.github.com/cfbd14881b951e2e0cf58a8165c9c3e0 for a full test report.

@lexming
Copy link
Contributor Author

lexming commented Jul 8, 2020

Test report by @lexming
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in this PR)
node105.hydra.os - Linux centos linux 7.7.1908, x86_64, Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz, Python 2.7.5
See https://gist.github.com/af152306d32d1fb6a848c2abc97f5d15 for a full test report.

@lexming
Copy link
Contributor Author

lexming commented Jul 8, 2020

Test report by @lexming
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in this PR)
node105.hydra.os - Linux centos linux 7.7.1908, x86_64, Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz, Python 2.7.5
See https://gist.github.com/c42199b6e2791e1deeed626d905d497c for a full test report.

Flamefire added a commit to Flamefire/easybuild-easyblocks that referenced this pull request Jul 23, 2020
Flamefire added a commit to Flamefire/easybuild-easyblocks that referenced this pull request Jul 23, 2020
@Flamefire
Copy link
Contributor

FYI: I moved that into the EasyBlock I'm creating: easybuilders/easybuild-easyblocks@919d22b

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants