How to deal with plugins not being built for some architectures? #28576
Comments
A new Issue was created by @makortel Matti Kortelainen. @davidlange6, @Dr15Jones, @smuzaffar, @fabiocos, @kpedro88 can you please review it and eventually sign/assign? Thanks.
Does CUDA really not exist for ppc64le?
Moving fillDescriptions to a separate .cc file is a possible solution, not as a standard but to avoid issues in cases like this one. It would likely require a plugin.h and a modules.cc file to declare and instantiate the relevant parts, while leaving some methods unresolved if the rest of plugin.cc cannot be compiled.
CUDA is available for ppc64le. I was actually wondering about that myself as well. Maybe @smuzaffar @mrodozov could clarify whether there is anything fundamental preventing us from adding a CUDA external for ppc64le, or whether it just has not been done yet?
I'm not sure we can count on the availability of all foreseen GPUs (NVIDIA, AMD, Intel) on all our current CPU architectures (x86, Arm, Power). A possible way out could be to make the GPU part of the architecture (
I was specifically thinking about a separate library per GPU type, and then loading only the one corresponding to the GPU the machine has (if any).
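To make the "one library per GPU type" idea concrete, here is a minimal sketch in plain Python. It is not an existing CMSSW mechanism: the library names, the `GPU_LIBRARIES` table, and the `detect` callback are all hypothetical, and real detection would have to query the hardware or driver.

```python
from typing import Callable, Dict, Optional

# Hypothetical library names, one per GPU type (assumption, not real CMSSW libraries).
GPU_LIBRARIES: Dict[str, str] = {
    "nvidia": "libPluginsCUDA.so",
    "amd": "libPluginsROCm.so",
    "intel": "libPluginsSYCL.so",
}


def select_gpu_library(detect: Callable[[], Optional[str]]) -> Optional[str]:
    """Return the library matching the detected GPU type.

    `detect` is a hypothetical callback returning "nvidia", "amd", "intel",
    or None when the machine has no GPU. Returns None when there is no GPU
    or no library was built for that GPU type.
    """
    gpu_type = detect()
    if gpu_type is None:
        return None
    return GPU_LIBRARIES.get(gpu_type)
```

Only the selected library would then be handed to the plugin loader; a `None` result means the job runs with the CPU-only plugins.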
And do we have a mechanism for that?
Not today. But that is the question: would we need, or benefit from, such a mechanism?
Going back to the original issue, I understand there are two problems
The first issue (availability of
By the way, I wrote "today" because I have not given any thought to how to handle it if/when we use something like Kokkos or Alpaka.
The second issue (availability of the module itself) would be automatically addressed if we create a dummy module.
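A dummy module could behave roughly as in the sketch below. This is plain Python for illustration only, not CMSSW code; `DummyModule` and `PluginRegistry` are hypothetical names, and the assumption is that a missing plugin should still be constructible from the configuration but must fail loudly if it is ever run.

```python
class DummyModule:
    """Placeholder for a module that was not built for this architecture."""

    def __init__(self, type_name, parameters):
        self.type_name = type_name
        self.parameters = parameters

    def produce(self, event):
        # Constructing the dummy is fine; running it is an error.
        raise RuntimeError(
            f"{self.type_name} is not available on this architecture"
        )


class PluginRegistry:
    """Hypothetical factory that falls back to DummyModule for missing plugins."""

    def __init__(self):
        self._makers = {}

    def register(self, type_name, maker):
        self._makers[type_name] = maker

    def make(self, type_name, parameters):
        maker = self._makers.get(type_name)
        if maker is None:
            return DummyModule(type_name, parameters)
        return maker(parameters)
```

With this shape a configuration that names the module can be loaded everywhere, and the failure is deferred to the point where the module would actually have to do work.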
I am only afraid that we start with "few localized actions" and end up, in a few months or years, applying the pattern to most of the plugins. This is why, in the end, I always fell back to supporting solutions based on SCRAM_ARCH.
assign core
New categories assigned: core @Dr15Jones,@smuzaffar,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks
Here is a proposal based on an earlier discussion between @Dr15Jones and me. We concluded that all the modules specified in the configuration need to be loaded and constructed:
A possible way, that has minimal impact on the existing code, would be
Applying the same constraints to Services and ESProducers implies
In the long run we could think of moving more functionality to
Then we would not need to validate the module configurations, nor call the constructors for the data dependence DAG of modules. Neither of these would help to enforce the
Instead of the module dependence DAG in Python we could think of
I do see a risk here that we continue to bundle everything together, and if something not generally loadable appears, we may have to restructure a lot of code. On the other hand, going for complicated solutions now carries the risk of doing possibly unnecessary work.
Replying here to @fwyzard's comment #27983 (comment)
I don't like it too much either.
There are more issues than just the "configuration hash". If we cannot build an EDModule for some architecture
Summary of a discussion in the core software meeting today:
In the short term (#31261) continue with
For longer term some new thoughts
In the specific case of CUDA11 and gcc10, we could work around with fallback compiler options (use gcc9 with nvcc, and gcc10 elsewhere), but this was considered potentially fragile (e.g. towards C++ modules), and a more general solution was preferred.
The problems with CUDA modules have been addressed in the Alpaka module system. The approach there (ModuleTypeResolver, #40383 and links therein) has some similarities with
but there are also some differences. I think we could close this issue, and if some other system (like ML inference, although there we are already dealing with this problem in a different way) were to face a similar challenge, we can open a new issue then.
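The general idea of resolving a module type to whichever backend-specific plugin exists can be sketched as follows. This is only an illustration in plain Python and not the actual ModuleTypeResolver interface from #40383; the `@portable` suffix, the `backend::Type` naming, and the function name are hypothetical.

```python
from typing import Iterable, Optional, Set

PORTABLE_SUFFIX = "@portable"  # hypothetical marker for backend-agnostic types


def resolve_module_type(
    requested: str,
    available_plugins: Set[str],
    backends: Iterable[str],
) -> Optional[str]:
    """Map a requested module type to a plugin that exists on this machine.

    Types without the portable suffix are looked up as-is. Portable types
    are tried against each backend in priority order, and the first
    backend-specific plugin that was actually built wins.
    """
    if not requested.endswith(PORTABLE_SUFFIX):
        return requested if requested in available_plugins else None
    base = requested[: -len(PORTABLE_SUFFIX)]
    for backend in backends:
        candidate = f"{backend}::{base}"
        if candidate in available_plugins:
            return candidate
    return None
```

On an architecture where the GPU plugin was never built, the same configuration string simply resolves to the CPU variant, so the configuration itself does not need to change.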
+core
cms-bot internal usage
This issue is fully signed and ready to be closed.
PR #28537 introduces first plugins depending on CUDA, and they are ignored for ppc64le. If/when we add some of these (or future) plugins on any configuration fragment used in runTheMatrix.py, the depending workflows will fail on ppc64le (similar to the tests in #28572) already because the fillDescriptions()-generated configuration files are not present (and even if that would succeed, loading of the plugin(s) would fail).
Some options on how to proceed
fillDescriptions() for such modules and go back to hand-written cfi files, avoid also configuration validation
cfi fragments that is always present
CUDAService (or some other mechanism) tells that CUDA is available.
CUDAService itself would need such an indirection as well
More ideas are welcome.