You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository has been archived by the owner on Dec 22, 2022. It is now read-only.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Move common members into the launch_param_base_t class (shared_memory_bytes).
Remove context as a member and instead use it as an argument of the launch function. This makes sense because we might want to use multiple contexts with the same launch box, and rather that re-instantiating a new launch box for each context, we can use the same object.
Changed the launch function to use a tuple of args rather than a variadic pack of args. This design choice is based around my expectation that most kernels will have a lot of arguments, and it will look cleaner if they are wrapped into a tuple. If this wasn't for launching cuda kernels I would have been able to use std::apply in C++17, but I had to reimplement it to be able to use the chevrons <<<>>>.
The only issue I have still that I'd like to discuss here is how to efficiently declare the launch functions. Currently across the two types of launch boxes the functions are exact copies from one another. The only difference between the classes is that the functions reference member types that are declared in those classes, so they cannot easily be moved into the shared parent class.
3. Changed the launch function to use a tuple of args rather than a variadic pack of args. This design choice is based around my expectation that most kernels will have a lot of arguments, and it will look cleaner if they are wrapped into a tuple. If this wasn't for launching cuda kernels I would have been able to use std::apply in C++17, but I had to reimplement it to be able to use the chevrons <<<>>>.
This is cool, I like std::apply, I am just wondering what is the benefit of the tuple versus direct forwarding the arguments? Isn't it less code if I just forward it?
Changed the launch function to use a tuple of args rather than a variadic pack of args. This design choice is based around my expectation that most kernels will have a lot of arguments, and it will look cleaner if they are wrapped into a tuple. If this wasn't for launching cuda kernels I would have been able to use std::apply in C++17, but I had to reimplement it to be able to use the chevrons <<<>>>.
This is cool, I like std::apply, I am just wondering what is the benefit of the tuple versus direct forwarding the arguments? Isn't it less code if I just forward it?
Yes it is a bit less code from the user perspective when you just forward it directly, but since I was adding the context argument I thought it would make it clearer which args are for the actual kernel in the case where there are a bunch of them.
Changed the launch function to use a tuple of args rather than a variadic pack of args. This design choice is based around my expectation that most kernels will have a lot of arguments, and it will look cleaner if they are wrapped into a tuple. If this wasn't for launching cuda kernels I would have been able to use std::apply in C++17, but I had to reimplement it to be able to use the chevrons <<<>>>.
This is cool, I like std::apply, I am just wondering what is the benefit of the tuple versus direct forwarding the arguments? Isn't it less code if I just forward it?
Yes it is a bit less code from the user perspective when you just forward it directly, but since I was adding the context argument I thought it would make it clearer which args are for the actual kernel in the case where there are a bunch of them.
Sign up for freeto subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Labels
None yet
2 participants
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
I made a few changes
launch_param_base_t
class (shared_memory_bytes
).launch
function. This makes sense because we might want to use multiple contexts with the same launch box, and rather that re-instantiating a new launch box for each context, we can use the same object.launch
function to use a tuple of args rather than a variadic pack of args. This design choice is based around my expectation that most kernels will have a lot of arguments, and it will look cleaner if they are wrapped into a tuple. If this wasn't for launching cuda kernels I would have been able to usestd::apply
in C++17, but I had to reimplement it to be able to use the chevrons<<<>>>
.The only issue I have still that I'd like to discuss here is how to efficiently declare the launch functions. Currently across the two types of launch boxes the functions are exact copies from one another. The only difference between the classes is that the functions reference member types that are declared in those classes, so they cannot easily be moved into the shared parent class.