Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update cuda arch 8.9 #1733

Merged
merged 2 commits into from
Nov 28, 2024
Merged

Update cuda arch 8.9 #1733

merged 2 commits into from
Nov 28, 2024

Conversation

yhmtsai
Copy link
Member

@yhmtsai yhmtsai commented Nov 25, 2024

This PR adds Ada to sSMtoCores, adds ada and hopper in selector

@ginkgo-bot ginkgo-bot added reg:build This is related to the build system. reg:ci-cd This is related to the continuous integration system. mod:cuda This is related to the CUDA module. mod:hip This is related to the HIP module. labels Nov 25, 2024
@yhmtsai yhmtsai self-assigned this Nov 25, 2024
@yhmtsai yhmtsai added 1:ST:ready-for-review This PR is ready for review 1:ST:run-full-test labels Nov 25, 2024
@yhmtsai yhmtsai requested a review from a team November 26, 2024 08:07
.gitlab-ci.yml Outdated
@@ -108,6 +108,64 @@ build/cuda110/nompi/gcc/cuda/release/shared:
# disable spurious unused argument warning
EXTRA_CMAKE_FLAGS: "-DCMAKE_CUDA_FLAGS=-diag-suppress=177"

# cuda 11.4
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these versions necessary? We already test 11.0, 11.7, 12.0. I could understand if we also add the latest cuda version (12.6), but these seem quite random and unnecessary to me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessary but good to have.
I consider all versions are different, so it will be great if we test all.
Also, something in nvhpc is different from the normal setup in my opinion.
We do not do that actually just limited by the resource.
Also, I only pick up those in the docker hub now because we do not process the our container currently.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't test all versions, so my question still remains, are these versions so important that we need to add extra jobs, each might take ~40--60min, to our CI? Until I see a compelling reason, I will block this particular change.

Also, I wouldn't want to change our CI this close to the release. After the release, we should restructure our CI anyway to be a bit less random in which versions we are using.

Copy link
Member Author

@yhmtsai yhmtsai Nov 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am seeing it in the opposite side. I do not have idea why we drop these version accendidently. The version jump is roughly 3 minor version in nvcc side. They change some components version when updating the version. As #1730 , we do not test it a lot because only two of them only in nvhpc (I still think nvhpc is different from nvcc with the same cuda version). To get more confident with the thrust without giving the user option to turn it off by comments, which actually made something broken and fix recently, I manually run additionaly 4 version to check.

Copy link
Member

@MarcelKoch MarcelKoch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CI changes should be removed, see the comment thread.
Otherwise LGTM.

@yhmtsai yhmtsai changed the title Update cuda arch and bring some cuda version back Update cuda arch 8.9 Nov 27, 2024
@MarcelKoch MarcelKoch added 1:ST:ready-to-merge This PR is ready to merge. and removed 1:ST:ready-for-review This PR is ready for review labels Nov 28, 2024
@yhmtsai yhmtsai merged commit 230ab24 into develop Nov 28, 2024
8 of 11 checks passed
@yhmtsai yhmtsai deleted the update_cuda_arch branch November 28, 2024 12:45
MarcelKoch pushed a commit to MarcelKoch/ginkgo that referenced this pull request Dec 2, 2024
This pr updates cuda arch 8.9 and pick the missing change from custom thrust namespace

Related PR: ginkgo-project#1733
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
1:ST:ready-to-merge This PR is ready to merge. 1:ST:run-full-test mod:cuda This is related to the CUDA module. mod:hip This is related to the HIP module. reg:build This is related to the build system. reg:ci-cd This is related to the continuous integration system.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants