alpaka-related updates #41340
assign heterogeneous |
A new Issue was created by @fwyzard Andrea Bocci. @Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
#41341 should address the second bullet, at least in part. |
On the first point I wonder if we could (at least eventually) utilize scram itself to avoid running the tests in the first place that would fail because of missing hardware? @smuzaffar |
Interesting idea. For a CUDA-only test we could do something like this:
<bin name="cudaTest" file="cudaTest.cu">
  <use name="cuda"/>
  <flags TEST_RUNNER_CMD="cudaIsEnabled && ./cudaTest || echo 'Failed to initialise the CUDA runtime, the test will be skipped.'"/>
</bin>
For an Alpaka-based test there are two problems:
The first problem could be solved if SCRAM could provide an environment variable that expands to the name of the binary (or to its full path) that can be used inside
<flags TEST_RUNNER_CMD="cudaIsEnabled && ./$TEST_BINARY_NAME || echo 'Failed to initialise the CUDA runtime, the test $TEST_BINARY_NAME will be skipped.'"/>
The second problem could be solved if SCRAM could provide an environment variable with the name of the Alpaka backend being tested, like
<flags TEST_RUNNER_CMD="alpakaIsEnabled$TEST_ALPAKA_BACKEND && ./$TEST_BINARY_NAME || echo 'Failed to initialise the $TEST_ALPAKA_BACKEND backend, the test $TEST_BINARY_NAME will be skipped.'"/>
Of course, it would be even nicer if SCRAM could automate this with a single flag, something like
<flags TEST_REQUIRE_ALPAKA_BACKEND/>
and report these tests as |
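One property of the `check && test || echo ...` form may be worth keeping in mind: the shell runs the `echo` whenever the left-hand side fails, which includes the case where the backend is available but the test itself fails. The following is only a small C++ model of that exit-status logic, not SCRAM code; the function names `andOrChainStatus` and `ifElseStatus` are made up for illustration, and 0 stands for success as in the shell.

```cpp
#include <cassert>

// Models the exit status of:  check && test || echo "skipped"
// (hypothetical helper, for illustration only).
int andOrChainStatus(int checkStatus, int testStatus) {
  // `check && test`: the test runs only if the check succeeded.
  int status = (checkStatus == 0) ? testStatus : checkStatus;
  // `|| echo ...`: echo runs whenever the left-hand side failed,
  // and echo itself succeeds, so the failure is replaced by 0.
  if (status != 0) {
    status = 0;
  }
  return status;
}

// Models the exit status of:  if check; then test; else echo "skipped"; fi
// (hypothetical helper, for illustration only).
int ifElseStatus(int checkStatus, int testStatus) {
  // The test status is propagated when the check succeeds;
  // only a failed check leads to the "skipped" message and status 0.
  return (checkStatus == 0) ? testStatus : 0;
}
```

In this model the first form always reports success, while the `if`/`else` form still reports a failing test when the backend is available.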
And to clarify: |
@fwyzard, I was also thinking along the lines of using
SCRAM should run @aandvalenzuela is interested in improving the GPU/Alpaka tests. If the above sounds good, she can already look into implementing it. |
Looks good to me. I would suggest We have actually done the latter for pixeltrack-standalone, but not in CMSSW where we are relying on the plug-in system. |
@makortel @fwyzard , cms-sw/cmssw-config#95 implements the new unit test rules for cuda/rocm/alpaka backends. See cms-sw/cmssw-config#95 (comment) and let me know if this is sufficient? |
@smuzaffar from the description it looks good. Since the change impacts both "cuda" and "alpaka-cuda" tests, maybe it would actually be better to check |
@smuzaffar From the descriptions it looks good to me too. My only question is about tests that run cmsRun (e.g. directly or via a shell script). From cms-sw/cmssw-config#95 I understand the behavior is driven by the dependencies of the test. Is the |
@makortel , For dedicated GPU IBs, we can set |
@smuzaffar Thanks, but I'm still confused :) Let's take a concrete example cmssw/HeterogeneousCore/AlpakaTest/test/BuildFile.xml Lines 6 to 10 in 5b979f2
Here the goal is to run the test script in one way when Should this be expressed along
<test name="testHeterogeneousCoreAlpakaTestModulesCUDA" command="testAlpakaModules.sh cuda">
  <use name="cuda"/>
</test>
<test name="testHeterogeneousCoreAlpakaTestModulesCPU" command="testAlpakaModules.sh cpu"/>
then? (and probably making the |
Yes @makortel, that fragment of the BuildFile should be changed as you suggested.
|
... and adding a ROCm version. |
Profit from cms-sw#41340
By the way, new build rules, to make use of |
@fwyzard This issue has been completed, right? |
I don't remember what it was about... I'll have to re-read the thread and remind myself of the details. |
Looking at the tests, I think we should still introduce a way to make them fail gracefully instead of crashing, when no GPU is available. For example:
|
Playing around with |
This is actually as simple as
if (cms::alpakatools::devices<Platform>().empty()) {
  std::cout << "No devices found, the test will be skipped.\n";
  exit(EXIT_SUCCESS);
}
@makortel, do you think we need to wrap it in a function like |
I've updated an example in #44036 . |
This is similar to what we had at one point (and still have with direct CUDA). My concern for this specific approach is that an infrastructure problem, e.g. in GPU IBs, would go unnoticed because the tests report success (by default If the test would return I don't have strong feelings whether to abstract or copy-paste. Four lines isn't much, but it's still something. On the other hand, e.g. in Catch2-based tests one could do REQUIRE(not cms::alpakatools::devices<Platform>().empty()); which I wouldn't consider worth abstracting. |
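For reference, the "abstract it" option could look roughly like the sketch below. It is only an illustration under stated assumptions: `FakeCpuPlatform`, `FakeGpuPlatform`, `fakeDevices` and `exitCodeIfNoDevices` are hypothetical names, and `fakeDevices<Platform>()` is a stand-in for `cms::alpakatools::devices<Platform>()` so that the snippet compiles without CMSSW or alpaka. The exit code used when no device is found is left as a parameter, since that choice is exactly what is being discussed here. It assumes C++17 for `std::optional`.

```cpp
#include <cassert>
#include <cstdlib>
#include <iostream>
#include <optional>
#include <vector>

// Hypothetical stand-ins for alpaka platforms (illustration only).
struct FakeCpuPlatform {};
struct FakeGpuPlatform {};

// Stub for the device enumeration; in CMSSW this role would be played by
// cms::alpakatools::devices<Platform>().
template <typename Platform>
std::vector<int> fakeDevices();

// The fake CPU platform always reports one device.
template <>
inline std::vector<int> fakeDevices<FakeCpuPlatform>() {
  return {0};
}

// The fake GPU platform reports no devices, as on a machine without a GPU.
template <>
inline std::vector<int> fakeDevices<FakeGpuPlatform>() {
  return {};
}

// Returns the exit code the test should stop with when the platform has no
// devices, or std::nullopt when at least one device is available.
template <typename Platform>
std::optional<int> exitCodeIfNoDevices(int codeWhenMissing) {
  if (fakeDevices<Platform>().empty()) {
    std::cout << "No devices found, the test will be skipped.\n";
    return codeWhenMissing;
  }
  return std::nullopt;
}

// Intended use at the top of a test's main():
//   if (auto code = exitCodeIfNoDevices<Platform>(EXIT_FAILURE)) return *code;
```

Returning the code instead of calling `exit()` inside the helper keeps the decision visible in each test and makes the helper itself easy to test.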
Sure, I'm OK with returning In fact I coded it that way, then I switched to |
(will fix tomorrow) |
Mhm... I don't see the quoted images :-( Note that the message should be the same in all cases, it's just a matter of choice. |
Ok, third attempt to quote pictures... |
Going back to text then. My (not very strong) order of preferences would be
|
OK, so directly |
I'll update the PRs accordingly. |
I'm collecting some "to do" items to improve the alpaka-related developments:
cms::cudatest::requireDevices(); possibly also some explicit support for Catch2 tests.
DataFormats/Portable/interface/Product.h, HeterogeneousCore/AlpakaCore/interface/ScopedContext.h, etc.