-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CUDA early exit on empty grid #1729
Conversation
launch jenkins |
Hi there, this is jenkins continuous integration... |
launch jenkins |
@@ -149,19 +149,22 @@ namespace gridtools::fn::backend { | |||
auto strides = sid::get_strides(std::forward<Composite>(composite)); | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe something like the following (as the first thing in the function) would be more readable. What do you think?
if empty_domain(sizes) | |
return; | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and maybe even better to move it outside of gpu.hpp to the common part where we are dispatching to the backends (but didn't check where that is and if it really makes sense).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created an is_domain_empty
function. I usually use an if
block instead of an early return when the code is simple, but I'm happy with either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer Hannes early return in this case as I my eyes don’t have to track any braces opening and closing. And my proposed changes would be even a bit more minimalistic, just before the kernel launch:
if (blocks.x ==0 || blocks.y == 0 || blocks.z == 0)
return;
(of course this does not include the optimization of skipping unnecessary computation of ptr_holder, strides, etc., but I don’t think we need any optimization of such trivial computations for an empty domain)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, uhh, it was with the blocks
before Hannes suggested to make it a function :) Now that I wrote the empty domain function, I'd rather keep it, because that satisfies the requirements independent of how you calculate the blocks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll also put an early return
launch jenkins |
jenkins launch |
launch jenkins |
@@ -12,6 +12,7 @@ | |||
#include <gtest/gtest.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer having this test as a regression test, running for all available backends. You could even just put a new variant of the copy stencil test into tests/regression/fn/fn_copy.cpp
I guess. There are already versions with and without domain offsets, so the one with zero domain size would actually fit well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
…or add a separate one as you also check the scan. I don’t care much, but enabling it for all backends makes sense for sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made a separate regression test (and removed the unit test).
launch jenkins |
good to test |
launch perftests |
launch perftest |
launch perftest |
launch jenkins |
launch jenkins |
launch perftest |
launch perftest |
launch jenkins |
Launching a kernel with a zero gridSize is not allowed by CUDA. This PR adds a check to exit early and not even attempt to launch the kernel if any of the domain sizes are zero.
Launching a kernel with a zero gridSize is not allowed by CUDA. This PR adds a check to exit early and not even attempt to launch the kernel if any of the domain sizes are zero.