limit maxparallel to 16 by default #4606

Merged: 4 commits, Aug 8, 2024

branfosj (Member) commented Aug 8, 2024

We often see the large number of cores available on modern systems cause builds to either progress slowly or fail. I propose adding this default for maxparallel to counter that.
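
A minimal sketch of what such a cap amounts to, assuming the build parallelism is simply the smaller of the detected core count and the cap. This is not EasyBuild's actual implementation: the helper name `effective_parallelism` and its parameters are hypothetical, and only the default value of 16 comes from this PR.

```python
import os

DEFAULT_MAXPARALLEL = 16  # default value proposed in this PR


def effective_parallelism(available_cores=None, maxparallel=DEFAULT_MAXPARALLEL):
    """Return the build parallelism after applying the cap (hypothetical helper)."""
    if available_cores is None:
        # os.cpu_count() can return None when the count cannot be determined
        available_cores = os.cpu_count() or 1
    if maxparallel is None:
        # no cap requested: use every available core
        return max(1, available_cores)
    # never exceed the cap, never drop below one process
    return max(1, min(available_cores, maxparallel))


print(effective_parallelism(available_cores=112))  # large node: capped
print(effective_parallelism(available_cores=8))    # small node: unchanged
```

With a cap like this, small nodes are unaffected and only nodes with more cores than the cap see a change in behaviour.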

@branfosj branfosj added this to the 5.0 milestone Aug 8, 2024
ocaisa (Member) left a comment

LGTM

@ocaisa ocaisa merged commit 3e0b4e5 into easybuilders:5.0.x Aug 8, 2024
35 checks passed
@branfosj branfosj deleted the maxparallel branch August 8, 2024 13:33
boegel (Member) commented Aug 9, 2024

Would be nice to motivate this a bit better, for example by benchmarking the speedups we see with a couple of large projects on systems with say 128 cores, for example GCC, OpenFOAM, Qt5, etc.

16 is not an unreasonable value to use as default, but that's based on gut feeling.

ocaisa (Member) commented Aug 9, 2024

Do we have a template that could be used to auto-set maxparallel to available CPUs for easyconfigs that are known to support it (I imagine the ones where this is the case, and worthwhile, are more the exception than the rule)?

For the examples given, this could also just be done in the easyblocks.

branfosj (Member, Author) commented Aug 9, 2024

I do not have a 128 core node available at the moment, but using a 112 core one:

Node:

  • 2x 56 core Sapphire Rapids (Intel(R) Xeon(R) Platinum 8480CL)
  • 512GB
  • building in /dev/shm
  • installing onto GPFS

GCCcore-14.1.0.eb

| Cores | Time (mm:ss) |
|---|---|
| 112 | 43:01 |
| 56 | 39:13 |
| 28 | 38:56 |
| 21 | 38:54 |
| 14 | 40:38 |
| 7 | 49:01 |

OpenFOAM-11-foss-2023a.eb

| Cores | Time (h:mm:ss) |
|---|---|
| 112 | 2:37:57 |
| 56 | 2:32:02 |
| 28 | 2:25:35 |
| 21 | 2:23:07 |
| 14 | 2:18:44 |
| 7 | 1:57:44 |

Qt5-5.15.13-GCCcore-13.2.0.eb

| Cores | Time (h:mm:ss) |
|---|---|
| 112 | 8:31:29 |
| 56 | 8:16:43 |
| 28 | 7:57:40 |
| 21 | 7:23:10 |
| 14 | 7:07:45 |
| 7 | 5:12:12 |

Rust-1.78.0-GCCcore-13.3.0.eb

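For comparison, I've also tested using /dev/shm as the install directory.

| Cores | Time, installing onto GPFS (mm:ss) | Time, installing onto /dev/shm (mm:ss) |
|---|---|---|
| 112 | 42:43 | 38:37 |
| 56 | 43:17 | 37:29 |
| 28 | 39:27 | 34:26 |
| 21 | 41:42 | 33:10 |
| 14 | 41:17 | 34:32 |
| 7 | 54:31 | 45:54 |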
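
One way to read these tables is to ask which core count was fastest for each build, and how much longer the full 112 cores took in comparison. The sketch below does that arithmetic on the timings copied from this comment (GPFS install column for Rust). The helper names `to_seconds` and `fastest` are illustrative only, and the shortened dictionary keys are labels chosen here, not easyconfig names.

```python
def to_seconds(stamp):
    """Convert 'mm:ss' or 'h:mm:ss' into a number of seconds."""
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


# Timings from the 112-core node above, installing onto GPFS
timings = {
    "GCCcore-14.1.0": {112: "43:01", 56: "39:13", 28: "38:56",
                       21: "38:54", 14: "40:38", 7: "49:01"},
    "OpenFOAM-11": {112: "2:37:57", 56: "2:32:02", 28: "2:25:35",
                    21: "2:23:07", 14: "2:18:44", 7: "1:57:44"},
    "Qt5-5.15.13": {112: "8:31:29", 56: "8:16:43", 28: "7:57:40",
                    21: "7:23:10", 14: "7:07:45", 7: "5:12:12"},
    "Rust-1.78.0": {112: "42:43", 56: "43:17", 28: "39:27",
                    21: "41:42", 14: "41:17", 7: "54:31"},
}


def fastest(times):
    """Return the core count with the shortest build time."""
    return min(times, key=lambda cores: to_seconds(times[cores]))


for name, times in timings.items():
    best = fastest(times)
    ratio = to_seconds(times[112]) / to_seconds(times[best])
    print(f"{name}: fastest at {best} cores, 112 cores took {ratio:.2f}x as long")
```

On this node, none of the four builds was fastest when using all 112 cores.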

jfgrimm (Member) commented Aug 9, 2024

I'll start some jobs for the same software that @branfosj is doing, plus Rust-1.78.0-GCCcore-13.3.0.eb

Node:

  • 2x 48 core AMD EPYC 7643
  • 512GB
  • building in /dev/shm
  • installing onto /dev/shm

GCCcore-14.1.0.eb

| Cores | Time (mm:ss) |
|---|---|
| 96 | 35:46 |
| 64 | 35:42 |
| 32 | 35:53 |
| 16 | 37:14 |
| 8 | 47:51 |

OpenFOAM-11-foss-2023a.eb

| Cores | Time (mm:ss) |
|---|---|
| 96 | 37:36 |
| 64 | 37:37 |
| 32 | 37:48 |
| 16 | 40:23 |
| 8 | 47:54 |

Qt5-5.15.13-GCCcore-13.2.0.eb

| Cores | Time (h:mm:ss) |
|---|---|
| 96 | 1:35:01 |
| 64 | 1:37:16 |
| 32 | 1:42:14 |
| 16 | 1:56:32 |
| 8 | 2:42:36 |

Rust-1.78.0-GCCcore-13.3.0.eb

| Cores | Time (mm:ss) |
|---|---|
| 96 | 21:18 |
| 64 | 22:03 |
| 32 | 23:19 |
| 16 | 29:22 |
| 8 | 43:35 |

@jfgrimm jfgrimm added the EasyBuild-5.0 EasyBuild 5.0 label Aug 12, 2024
jfgrimm (Member) commented Aug 12, 2024

@branfosj interesting that your Qt5 times are so wildly different to mine -- is that just because I installed to /dev/shm?

branfosj (Member, Author) commented

> @branfosj interesting that your Qt5 times are so wildly different to mine -- is that just because I installed to /dev/shm?

There is something weird about building Qt5 in that development environment of mine that causes it to take ages. However, I can also get Qt5 to build much more quickly. Our live build time, using 24 cores, for Qt5-5.15.7-GCCcore-12.2.0.eb is 56 minutes, but my development build of the same easyconfig takes 10 hours. The logs for the two builds are within 6 lines of each other in length.

jfgrimm (Member) commented Aug 12, 2024

> > @branfosj interesting that your Qt5 times are so wildly different to mine -- is that just because I installed to /dev/shm?
>
> There is something weird about building Qt5 in that development environment of mine that causes it to take ages. However, I can also get Qt5 to build much more quickly. Our live build time, using 24 cores, for Qt5-5.15.7-GCCcore-12.2.0.eb is 56 minutes, but my development build of the same easyconfig takes 10 hours. The logs for the two builds are within 6 lines of each other in length.

Wild, that's a huge difference. In any case, 16-32 cores looks like the sweet spot on our system.
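
To put rough numbers on the "sweet spot" remark, the sketch below computes how much extra build time 32 and 16 cores cost relative to all 96 cores, using the timings from the 2x 48 core AMD EPYC 7643 node reported earlier in this thread. The helper names `to_seconds` and `extra_time_percent` are illustrative only, and the shortened dictionary keys are labels chosen here.

```python
def to_seconds(stamp):
    """Convert 'mm:ss' or 'h:mm:ss' into a number of seconds."""
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def extra_time_percent(full, capped):
    """Extra build time (in percent) of the capped run relative to the full run."""
    return 100.0 * (to_seconds(capped) - to_seconds(full)) / to_seconds(full)


# Timings from the 96-core node, building and installing in /dev/shm
timings = {
    "GCCcore-14.1.0": {96: "35:46", 32: "35:53", 16: "37:14"},
    "OpenFOAM-11": {96: "37:36", 32: "37:48", 16: "40:23"},
    "Qt5-5.15.13": {96: "1:35:01", 32: "1:42:14", 16: "1:56:32"},
    "Rust-1.78.0": {96: "21:18", 32: "23:19", 16: "29:22"},
}

for name, times in timings.items():
    at_32 = extra_time_percent(times[96], times[32])
    at_16 = extra_time_percent(times[96], times[16])
    print(f"{name}: +{at_32:.1f}% at 32 cores, +{at_16:.1f}% at 16 cores")
```

On this node the penalty for capping at 16 cores is small for GCCcore and OpenFOAM and more noticeable for Qt5 and Rust, while 32 cores stays within about 10% of the 96-core time for all four builds.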

jfgrimm (Member) commented Aug 14, 2024

Added a few more data points for core counts (including 1):

[Figure: relative installation times (using /dev/shm)]
