
[FEA]: Proposal to Change MaxSmOccupancy Inline Specification for Enhanced Compatibility Across Shared Libraries Utilizing Thrust/CUB #1391

Closed · Fixed by #1395
eee4017 opened this issue Feb 16, 2024 · 3 comments
Labels
feature request New feature or request.

Comments


eee4017 commented Feb 16, 2024

Is this a duplicate?

Area

CUB

Is your feature request related to a problem? Please describe.

I propose changing the MaxSmOccupancy function's inline specifier from plain inline to forced inlining (using the _CCCL_FORCEINLINE macro) here. This change improves compatibility for projects that link together multiple shared libraries, each of which incorporates Thrust/CUB.

When Thrust/CUB is used across several shared libraries, a cudaErrorInvalidDeviceFunction error can occur if the compiler does not inline the MaxSmOccupancy function. Our project links multiple libraries (e.g., libA.so and libB.so) that both use Thrust/CUB. We discovered that the out-of-line copy of MaxSmOccupancy was resolved to the one in libA.so. When a Thrust function is invoked in libB.so, the kernel pointer (kernel_ptr), which refers to a Thrust device function compiled into libB.so, is passed to and queried by the MaxSmOccupancy copy in libA.so. CUDA rejects this with cudaErrorInvalidDeviceFunction: the CUDA Runtime API does not allow a device function pointer from one library to be queried from another, because the CUDA Driver's CUlibrary structures are opaque and managed per-library by the CUDA Runtime.

To prevent the cudaErrorInvalidDeviceFunction, MaxSmOccupancy must be forcefully inlined. That guarantees cudaOccupancyMaxActiveBlocksPerMultiprocessor is invoked from within the same library that calls the Thrust function, avoiding the error described above.

Describe the solution you'd like

Change inline to _CCCL_FORCEINLINE here

Describe alternatives you've considered

No response

Additional context

No response

eee4017 added the feature request label on Feb 16, 2024
@jrhemstad (Collaborator)

Thanks for the excellent write-up! We've dealt with countless insidious issues that originate from the interplay between symbol visibility across shared libraries and how kernel registration works in the CUDA Runtime. It's been a nasty problem that we had hoped was finally put to rest. You can read about the saga in #443.

I think you may have just identified an area that we missed 🙁.

Similar to how in #443 we had to decorate the thrust::cuda_cub::launcher::triple_chevron kernel launch function with _LIBCUDACXX_HIDDEN (which is ultimately just __attribute__((visibility("hidden")))) to avoid symbol collisions across shared objects, it would seem we need to do the same thing with cub::MaxSmOccupancy. Using force inlining as you suggest would probably work too, but the symbol visibility annotation is the more targeted solution to the real root of the problem.

I'll need @gevtushenko to confirm that this is indeed the right fix and then we'll try and take care of that ASAP.

@gevtushenko (Collaborator)

@eee4017 thank you for reporting the issue! I agree with your analysis. Every function taking kernel pointers should be hidden, and I think it goes beyond the SM occupancy calculator and the triple chevron launcher. The CUB dispatch layer has the same issue, which was likely masked by force inlining; some places (segmented sort) missed the force-inline annotation, potentially leading to linkage issues. I've filed #1395, which hides all functions taking kernel pointers. Please take a look and see whether it addresses the issue for you.

@ZelboK (Contributor) commented Feb 19, 2024

I was going to update https://github.com/NVIDIA/cccl/pull/592, but I just noticed this issue. Should I hold off?
